Computer Science Department
School of Computer Science, Carnegie Mellon University


A Randomized Algorithm for Learning Mahalanobis Metrics:
Application to Classification and Regression of Biological Data

Christopher James Langmead

July 2005


Keywords: Computational biology, metric learning, classification, regression

We present a randomized algorithm for semi-supervised learning of Mahalanobis metrics over R^n.
The inputs to the algorithm are a set, U, of unlabeled points in R^n; a set of pairs of points,
S = {(x, y)_i}, x, y ∈ U, that are known to be similar; and a set of pairs of points,
D = {(x, y)_i}, x, y ∈ U, that are known to be dissimilar. The algorithm randomly samples S, D,
and m-dimensional subspaces of R^n, and learns a metric for each subspace. The metric over R^n
is a linear combination of the subspace metrics. The randomization addresses issues of
efficiency and overfitting. Extensions of the algorithm to learning non-linear metrics via
kernels, and to use as a pre-processing step for dimensionality reduction, are discussed.
The new method is demonstrated on a regression problem (structure-based chemical shift
prediction) and a classification problem (predicting clinical outcomes for immunomodulatory
strategies for treating severe sepsis).
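
The sampling-and-combination structure described above can be illustrated with a minimal sketch. The abstract does not specify how each subspace metric is learned, how subspaces are drawn, or which weights the linear combination uses, so everything below is an assumption made for illustration: pairs are given as index pairs into U, subspaces are spanned by random orthonormal bases, the per-subspace learner (`subspace_metric`) is a hypothetical stand-in based on the regularized inverse covariance of similar-pair differences, and the combination is a uniform average. The names `randomized_metric`, `rounds`, `frac`, and `reg` are likewise hypothetical and do not come from the report.

```python
import numpy as np


def subspace_metric(diff_s, diff_d, reg=1e-3):
    """Hypothetical stand-in learner for one m-dimensional subspace.

    Assumption, not the report's method: use the regularized inverse
    covariance of similar-pair differences, rescaled so that the sampled
    dissimilar pairs have mean squared distance 1.
    """
    m = diff_s.shape[1]
    cov = diff_s.T @ diff_s / len(diff_s)
    A = np.linalg.inv(cov + reg * np.eye(m))
    # Mean squared distance of the dissimilar pairs under A.
    scale = np.mean(np.einsum("ij,jk,ik->i", diff_d, A, diff_d))
    return A / scale


def randomized_metric(U, S, D, m, rounds=50, frac=0.5, reg=1e-3, seed=0):
    """Average subspace metrics learned from random samples of S and D.

    U: (N, n) array of points; S, D: index pairs into U (assumed encoding).
    """
    rng = np.random.default_rng(seed)
    n = U.shape[1]
    S, D = np.asarray(S), np.asarray(D)
    M = np.zeros((n, n))
    for _ in range(rounds):
        # Random m-dimensional subspace: orthonormal basis P of shape (n, m).
        P, _ = np.linalg.qr(rng.standard_normal((n, m)))
        # Random subsets of the similar and dissimilar pairs.
        s = S[rng.choice(len(S), max(1, int(frac * len(S))), replace=False)]
        d = D[rng.choice(len(D), max(1, int(frac * len(D))), replace=False)]
        # Pair differences, projected into the subspace.
        diff_s = (U[s[:, 0]] - U[s[:, 1]]) @ P
        diff_d = (U[d[:, 0]] - U[d[:, 1]]) @ P
        A = subspace_metric(diff_s, diff_d, reg)
        # Lift the m x m metric back to R^n; uniform weights (assumption).
        M += P @ A @ P.T / rounds
    return M


def mahalanobis(M, x, y):
    """Distance sqrt((x - y)^T M (x - y)) under the matrix M."""
    diff = x - y
    return float(np.sqrt(max(diff @ M @ diff, 0.0)))
```

Each lifted term P A P^T is positive semidefinite whenever A is, so a non-negative combination of such terms is again a valid (pseudo-)metric matrix over R^n. That property is one reason a linear combination of subspace metrics is a natural way to assemble the full metric.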

15 pages
