Senior Thesis 2024
Computer Science Department
School of Computer Science, Carnegie Mellon University
Optimized semi-deconvolution using reference data
Alan Luo
Senior Thesis
April 2024
Robust and Accurate Deconvolution Single-Cell, or RADs, is an algorithm for integrating bulk and single-cell genomic data in cancer progression studies. Methods like RADs are used to examine the composition of cells making up a tumor and how their behavior is perturbed at different tumor sites, which in turn yields insight into how cancers might be better monitored or treated. RADs performs a technique called semi-deconvolution, which seeks to infer the frequencies of cell types and the evolution of their gene expression over the stages of cancer progression. The first efforts in this research area focused on use cases in which the available data is limited to bulk data, which profiles the average genomic features of mixtures of many distinct cells. Single-cell data has revolutionized the field by allowing one to track the genomic behavior of individual cells in a tumor, but collecting it is not always technically feasible. In some situations one has samples suitable for single-cell methods, such as some recent metastases, alongside samples suitable only for bulk methods, such as biopsies of archived primary tumors that may have been preserved years earlier. RADs targets such scenarios but can have poor resolution for identifying and quantifying the many different cell types that may be found in the bulk data. Hence, the goal of this study was to improve upon RADs by making use of reference single-cell RNA-seq datasets that provide models of gene expression for many known cell types. So far, several new combinations of data have been explored to improve the algorithm, each tested with different penalty weights. For at least one of the combinations, performance worsened as the penalty weight increased. In addition, the changes in cell type composition observed across the penalty weights were somewhat consistent with expectations from prior biological knowledge.
The results indicate that the prior reference-free RADs method could be adapted to accommodate third-party reference data sources. More results are being collected.
15 pages
Advisor
Russell Schwartz