CMU-ML-20-100
Machine Learning Department
School of Computer Science, Carnegie Mellon University




Probabilistic Single Cell Lineage Tracing

Chieh Lin

March 2020

Ph.D. Thesis



Keywords: Time-series single-cell RNA-Seq, Graphical models, Regulatory networks, Developmental trajectories, Maximum likelihood, Bayesian hierarchical clustering


Cell lineage tracing is a long-standing open problem in biology. To solve this problem, new technologies that can profile single cells have been introduced in the last decade. Currently, studies attempt to construct lineage relationships using time-series single-cell RNA sequencing (scRNA-Seq) data or by utilizing artificial mutations for marking cells. The former studies rely on pseudo-time ordering, which suffers from shortcomings that can impact their accuracy. The latter often apply phylogeny-based methods, which frequently yield hundreds of candidate trees. There is currently no method to combine single-cell lineage trees from different individuals of the same organism to reconstruct a single invariant lineage for the species.

In this thesis, we present a set of machine learning models that focus on reconstructing single-cell lineages. We developed a probabilistic model based on the Continuous-State Hidden Markov Model (CSHMM) to reconstruct trajectories and branchings from time-series scRNA-Seq data. The model is then extended by learning the dynamics of regulatory interactions that take place during the process being studied (CSHMM-TF). We next present a method that integrates sequence and expression data. In addition, we developed LinTIMaT, a statistical model for reconstructing single-cell lineage trees using both artificial mutations and scRNA-Seq data, and for constructing a general invariant lineage tree from multiple cell lineage trees of the same species. Finally, we apply CSHMM to a new dataset and show that it is capable of reconstructing lineage relationships and provides important novel insights for studying lung development.

209 pages

Thesis Committee:
Ziv Bar-Joseph (Chair)
Roni Rosenfeld
Jian Ma
Darrell Kotton (Boston University)
Killian Hurley (Royal College of Surgeons in Ireland)

Roni Rosenfeld, Head, Machine Learning Department
Martial Hebert, Dean, School of Computer Science


SCS Technical Report Collection
School of Computer Science