CMU-CS-98-105
Computer Science Department
School of Computer Science, Carnegie Mellon University




Automating Computational Molecular Genetics
Solving the Microsatellite Genotyping Problem

See-Kiong Ng

January 1998

Ph.D. Thesis

CMU-CS-98-105.ps
CMU-CS-98-105.pdf


Keywords: Artificial intelligence, automation software, biotechnology, computational biology, molecular genetics, microsatellite genotyping, pattern matching, FAST-MAP


The Human Genome Project has extended the reach of modern genetics by providing an infrastructure of high-resolution genetic maps. Scientists can now find genes using these maps by genotyping -- experimentally assaying the genome at mapped genetic markers. To track the inheritance patterns of a genetic disorder, individual genomes are genotyped at high resolution using densely distributed genetic markers such as microsatellites. However, because the inheritance patterns of most common human genetic diseases are complex, hundreds of thousands of genotyping experiments are typically required to genetically localize even one disorder on the genome.

The full automation of microsatellite-based genotyping is currently limited by the human scoring bottleneck: every experiment must be inspected by eye. The intricate genotyping data, densely multiplexed for throughput, is confounded by intrinsic data artifacts such as PCR stuttering, and human experts are required to visually decipher the highly complex data patterns that result. It is estimated that over half the cost of microsatellite-based genotyping is due to this human scoring effort.

We have developed and implemented novel computer-based analysis methods that computationally solve the various problems associated with the microsatellite scoring bottleneck. Our system, FAST-MAP, is a platform-independent, fully automated genotyping system that accurately calls alleles from quantitative microsatellite data. FAST-MAP has been extensively tested and used by scientists worldwide to generate genotypes with high accuracy from real data generated in high-throughput genetic laboratories. With FAST-MAP, we have shown that by appropriately modeling and representing genotype data, powerful computational strategies can overcome key molecular biology bottlenecks and significantly advance the rapid localization of genes across the whole human genome.

401 pages

