CMU-CS-05-164 Computer Science Department
 School of Computer Science, Carnegie Mellon University
 
    
     
 
A Randomized Algorithm for Learning Mahalanobis Metrics: Application to Classification and Regression of Biological Data
 
Christopher James Langmead 
July 2005  
Keywords: Computational biology, metric learning, classification, regression
We present a randomized algorithm for semi-supervised learning
of Mahalanobis metrics over $\mathbb{R}^n$. The inputs to the
algorithm are a set, $U$, of unlabeled points in $\mathbb{R}^n$;
a set of pairs of points, $S = \{(x, y)_i\}$, $x, y \in U$, that
are known to be similar; and a set of pairs of points,
$D = \{(x, y)_i\}$, $x, y \in U$, that are known to be dissimilar.
The algorithm randomly samples $S$, $D$, and $m$-dimensional
subspaces of $\mathbb{R}^n$ and learns a metric for each subspace.
The metric over $\mathbb{R}^n$ is a linear combination of the
subspace metrics. The randomization addresses
issues of efficiency and overfitting. Extensions of the algorithm
to learning non-linear metrics via kernels, and its use as a
pre-processing step for dimensionality reduction, are discussed.
The new method is demonstrated on a regression problem
(structure-based chemical shift prediction) and a classification
problem (predicting clinical outcomes for immunomodulatory
strategies for treating severe sepsis).
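
The overall scheme described above (sample pairs and subspaces, learn one metric per subspace, combine linearly) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and is not taken from the report: the per-subspace learner is a simple regularized inverse scatter of similar-pair differences, the combination weights are the ratio of mean dissimilar to mean similar distance, subspaces are drawn through a random orthonormal basis, and the names `random_subspace_metric`, `sq_dist`, `n_rounds`, `frac`, and `reg` are hypothetical.

```python
import numpy as np

def sq_dist(A, X, pairs):
    """Squared Mahalanobis distances (x - y)^T A (x - y) for index pairs."""
    pairs = np.asarray(pairs)
    diffs = X[pairs[:, 0]] - X[pairs[:, 1]]
    return np.einsum("pi,ij,pj->p", diffs, A, diffs)

def random_subspace_metric(X, S, D, m, n_rounds=20, frac=0.5, reg=1e-3, seed=0):
    """Illustrative sketch only; the learner and weights are assumptions.

    X: (N, n) array of points; S, D: lists of (i, j) index pairs into X
    marking similar and dissimilar points; m: subspace dimension.
    Returns an (n, n) symmetric positive semidefinite matrix M.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    S, D = np.asarray(S), np.asarray(D)
    M = np.zeros((n, n))
    total_weight = 0.0
    for _ in range(n_rounds):
        # Random m-dimensional subspace: orthonormal basis P (n x m) via QR.
        P, _r = np.linalg.qr(rng.standard_normal((n, m)))
        # Random subsamples of the similar and dissimilar pairs.
        s = S[rng.choice(len(S), max(1, int(frac * len(S))), replace=False)]
        d = D[rng.choice(len(D), max(1, int(frac * len(D))), replace=False)]
        Z = X @ P  # coordinates of the points in the subspace
        # Assumed learner: inverse (regularized) scatter of similar-pair
        # differences, so directions where similar points vary little
        # receive large weight.
        diffs = Z[s[:, 0]] - Z[s[:, 1]]
        A = np.linalg.inv(diffs.T @ diffs / len(s) + reg * np.eye(m))
        # Assumed weight: how well this subspace metric separates D from S.
        w = sq_dist(A, Z, d).mean() / (sq_dist(A, Z, s).mean() + 1e-12)
        # Lift the m x m metric back to R^n and accumulate.
        M += w * (P @ A @ P.T)
        total_weight += w
    return M / total_weight
```

Because each lifted term `P @ A @ P.T` is positive semidefinite and the weights are non-negative, the combined matrix is itself a valid (pseudo-)metric matrix over the full space. Each round only inverts an `m x m` matrix, which is where a randomized scheme of this kind would save work relative to learning a full `n x n` metric directly.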
 
15 pages 
 