COMPUTER SCIENCE TECHNICAL REPORT ABSTRACTS

CMU-CS-05-164
Computer Science Department
School of Computer Science, Carnegie Mellon University

A Randomized Algorithm for Learning Mahalanobis Metrics:
Application to Classification and Regression of Biological Data

Christopher James Langmead

July 2005

CMU-CS-05-164.pdf

Keywords: Computational biology, metric learning, classification, regression

We present a randomized algorithm for semi-supervised learning of Mahalanobis metrics over Rⁿ. The inputs to the algorithm are a set, U, of unlabeled points in Rⁿ; a set of pairs of points, S = {(x,y)_i}, x,y ∈ U, that are known to be similar; and a set of pairs of points, D = {(x,y)_i}, x,y ∈ U, that are known to be dissimilar. The algorithm randomly samples S, D, and m-dimensional subspaces of Rⁿ and learns a metric for each subspace. The metric over Rⁿ is a linear combination of the subspace metrics. The randomization addresses issues of efficiency and overfitting. We also discuss extensions of the algorithm to learning non-linear metrics via kernels and to its use as a pre-processing step for dimensionality reduction. The new method is demonstrated on a regression problem (structure-based chemical shift prediction) and a classification problem (predicting clinical outcomes for immunomodulatory strategies for treating severe sepsis).
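
The sampling-and-combining scheme described above can be sketched roughly as follows. This is a minimal illustration, not the report's implementation: the abstract does not say how each subspace metric is learned or how the linear combination is weighted, so the sketch makes three assumptions. Subspaces are drawn as random orthonormal bases, each subspace metric is a stand-in (the inverse of the regularized scatter of similar-pair differences), and each subspace is weighted by how well it separates dissimilar from similar pairs. The names random_subspace_metric, learn_subspace_metric, and mahalanobis, and the parameters rounds, frac, and reg, are hypothetical.

```python
import numpy as np

def learn_subspace_metric(sim_diffs, reg=1e-3):
    """Stand-in learner (assumption, not the report's method):
    inverse of the regularized scatter of similar-pair differences."""
    m = sim_diffs.shape[1]
    scatter = sim_diffs.T @ sim_diffs / len(sim_diffs)
    return np.linalg.inv(scatter + reg * np.eye(m))

def random_subspace_metric(U, S, D, m, rounds=50, frac=0.5, reg=1e-3, seed=0):
    """Combine metrics learned on random m-dimensional subspaces of R^n.

    U: (N, n) array of points; S, D: (k, 2) index pairs into U that are
    known to be similar / dissimilar. Returns an (n, n) PSD matrix A.
    """
    rng = np.random.default_rng(seed)
    S, D = np.asarray(S), np.asarray(D)
    n = U.shape[1]
    A = np.zeros((n, n))
    total_weight = 0.0
    for _ in range(rounds):
        # Random m-dimensional subspace: orthonormal basis via reduced QR.
        P, _ = np.linalg.qr(rng.standard_normal((n, m)))
        # Random subsamples of the similar and dissimilar pairs.
        s_idx = rng.choice(len(S), size=max(1, int(frac * len(S))), replace=False)
        d_idx = rng.choice(len(D), size=max(1, int(frac * len(D))), replace=False)
        # Pair differences projected into the subspace.
        ds = (U[S[s_idx, 0]] - U[S[s_idx, 1]]) @ P
        dd = (U[D[d_idx, 0]] - U[D[d_idx, 1]]) @ P
        M = learn_subspace_metric(ds, reg)
        # Weight (assumption): mean squared dissimilar distance divided by
        # mean squared similar distance under this subspace metric.
        sim = np.einsum("ij,jk,ik->i", ds, M, ds).mean()
        dis = np.einsum("ij,jk,ik->i", dd, M, dd).mean()
        w = dis / (sim + 1e-12)
        # Lift the m x m metric back to R^n and accumulate.
        A += w * (P @ M @ P.T)
        total_weight += w
    return A / total_weight

def mahalanobis(x, y, A):
    """Distance sqrt((x - y)^T A (x - y)) under the learned matrix A."""
    d = x - y
    return float(np.sqrt(d @ A @ d))
```

With U as an (N, n) array and S, D as arrays of index pairs, a call such as A = random_subspace_metric(U, S, D, m=2) returns a matrix that can be passed to mahalanobis(U[i], U[j], A). Because every term in the sum is positive semidefinite and the weights are non-negative, the combined matrix is positive semidefinite as well, which is what makes it a valid (pseudo-)metric.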

15 pages
