CMU-CS-05-181
Computer Science Department
School of Computer Science, Carnegie Mellon University



CMU-CS-05-181

FPT Algorithms for Binary Near-Perfect Phylogenetic Trees

Srinath Sridhar, Kedar Dhamdhere*, Guy E. Blelloch,
Eric Halperin**, R. Ravi++, Russell Schwartz+
September 2005

CMU-CS-05-181.ps
CMU-CS-05-181.pdf


Keywords: Phylogenetic trees, Parsimony, Near-perfect phylogeny


We consider the problem of reconstructing near-perfect phylogenetic trees using binary character states
(referred to as BNPP). A perfect phylogeny assumes that every character mutates at most once in the evolutionary tree, yielding an algorithm for binary character states that is computationally e cient but not
robust to imperfections in real data. A near-perfect phylogeny relaxes the perfect phylogeny assumption by allowing at most a constant number of additional mutations. In this paper, we present a simple lower bound for the size of an optimal phylogeny, develop two algorithms for constructing optimal phylogenies and show experimental results for one of the variants. The first algorithm is intuitive and reconstructs an optimal near-perfect phylogenetic tree in time (q + k)O(q)nm + O(nm2) where k is the number f characters that share
four gametes with some other character. A second, more involved algorithm shows the problem to be fixed parameter tractable in q by solving it in time qO(q)nm + O(nm2) where n is the number of taxa and m is the number of characters. This is a significant improvement over the previous best result of nmO(q)2O(q2s2), where
s is the number of states per character (2 for binary). We implement the first algorithm and show that it finds the optimal solution quickly for a selection of population datasets including mitochondrial and Y chromosome samples from humans and other primates. Our results describe the first practical phylogenetic tree reconstruction algorithm that finds guaranteed optimal solutions while being easily implemented and computationally feasible for data sets of biologically meaningful size and complexity.

37 pages

*Google, Inc. Mountain View, CA
**ICSI, University of California, Berkeley
++Tepper School of Business, Carnegie Mellon University
+Department of Biological Sciences, Carnegie Mellon Un iversity


Return to: SCS Technical Report Collection
School of Computer Science

This page maintained by reports@cs.cmu.edu