Computer Science Department
School of Computer Science, Carnegie Mellon University


Structure Based Chemical Shift Prediction
using Random Forests Non-linear Regression

K. Arun*, Christopher James Langmead

July 2005


Keywords: Computational biology, structural biology, Nuclear Magnetic Resonance, NMR, chemical shift, regression, Random Forests

Protein nuclear magnetic resonance (NMR) chemical shifts are among the most accurately measurable spectroscopic parameters, and because they depend on the local electronic environment, they are closely correlated with protein structure. The precise nature of this correlation, however, remains largely unknown. Accurate prediction of chemical shifts from the atomic coordinates of existing structures would permit close study of this relationship. This paper presents a novel non-linear regression approach to predicting chemical shifts from protein structure. The regression model combines quantum, classical and empirical variables and yields statistically significant improvements in prediction accuracy over existing chemical shift predictors across protein backbone atom types. The results presented here were obtained by applying the Random Forest regression algorithm to a data set of protein entries derived from the RefDB database of re-referenced chemical shifts.
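The abstract names Random Forest regression but does not describe its mechanics. As a rough illustration only, the pure-Python sketch below shows the generic technique: regression trees grown on bootstrap samples, a random subset of features considered at each split, and predictions averaged across trees. It is not the authors' implementation. The class and parameter names (`RandomForestRegressor`, `n_trees`, `max_features`, `min_leaf`, `max_depth`), the default of one third of the features per split, and the idea of feeding it per-atom structural descriptors are all assumptions made for illustration.

```python
import random
import statistics


def _best_split(rows, targets, feature_ids, min_leaf):
    """Return (sse, feature, threshold) of the lowest-SSE split, or None."""
    n = len(rows)
    total_s = sum(targets)
    total_q = sum(t * t for t in targets)
    best = None
    for f in feature_ids:
        pairs = sorted((r[f], t) for r, t in zip(rows, targets))
        left_s = left_q = 0.0
        for i in range(n - 1):
            v, t = pairs[i]
            left_s += t
            left_q += t * t
            n_left, n_right = i + 1, n - i - 1
            next_v = pairs[i + 1][0]
            # Skip splits that leave a side too small or fall between equal values.
            if n_left < min_leaf or n_right < min_leaf or v == next_v:
                continue
            right_s, right_q = total_s - left_s, total_q - left_q
            # Sum of squared errors about each side's mean, via running sums.
            sse = (left_q - left_s ** 2 / n_left) + (right_q - right_s ** 2 / n_right)
            if best is None or sse < best[0]:
                best = (sse, f, (v + next_v) / 2.0)
    return best


def _build_tree(rows, targets, m, min_leaf, depth, max_depth, rng):
    # Leaf (mean target) at the depth limit, for small nodes, or for pure nodes.
    if depth >= max_depth or len(rows) < 2 * min_leaf or len(set(targets)) == 1:
        return statistics.fmean(targets)
    feature_ids = rng.sample(range(len(rows[0])), m)  # random feature subset
    split = _best_split(rows, targets, feature_ids, min_leaf)
    if split is None:
        return statistics.fmean(targets)
    _, f, thr = split
    left = [i for i, r in enumerate(rows) if r[f] <= thr]
    right = [i for i, r in enumerate(rows) if r[f] > thr]
    grow = lambda idx: _build_tree([rows[i] for i in idx], [targets[i] for i in idx],
                                   m, min_leaf, depth + 1, max_depth, rng)
    return (f, thr, grow(left), grow(right))  # internal node


def _predict_tree(node, x):
    while isinstance(node, tuple):  # internal node: (feature, threshold, left, right)
        f, thr, left, right = node
        node = left if x[f] <= thr else right
    return node


class RandomForestRegressor:
    """Hypothetical minimal random forest; rows of X are feature vectors."""

    def __init__(self, n_trees=50, max_features=None, min_leaf=2, max_depth=8, seed=0):
        self.n_trees, self.max_features = n_trees, max_features
        self.min_leaf, self.max_depth, self.seed = min_leaf, max_depth, seed

    def fit(self, X, y):
        rng = random.Random(self.seed)
        n, p = len(X), len(X[0])
        m = min(p, self.max_features or max(1, p // 3))  # assumed p/3 default
        self.trees = []
        for _ in range(self.n_trees):
            idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
            self.trees.append(_build_tree([X[i] for i in idx], [y[i] for i in idx],
                                          m, self.min_leaf, 0, self.max_depth, rng))
        return self

    def predict(self, X):
        # Forest prediction = average of the individual tree predictions.
        return [statistics.fmean(_predict_tree(t, x) for t in self.trees) for x in X]
```

In use, each row of `X` would hold one atom's descriptors (hypothetically, terms such as backbone dihedral angles or a ring-current contribution) and `y` its observed shift. Each tree is a piecewise-constant, non-linear function of the features, and averaging many trees grown on different bootstrap samples and feature subsets reduces the variance of any single tree.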

14 pages

*Department of Biological Sciences, Carnegie Mellon University
