Computer Science Department
School of Computer Science, Carnegie Mellon University


A Statistical Approach to Algorithmic Analysis
of High-Dimensional Nearest-Neighbor Search

Alexander Gray

February 2004

Keywords: Algorithmic analysis, data-dependent analysis, k-nearest neighbor, range queries, adaptive data structures, high dimensionality, distance distribution

The most commonly used algorithms for spatial data searches such as k-nearest-neighbor and spherical range queries are based on a class of data structures we call space-partitioning trees, which have remained the pragmatic method of choice due to their ability to often empirically provide sub-linear efficiency in reported dimensionalities in the tens and occasionally beyond, in constrast to methods designed for worst-case optimality. Despite long-standing practical interest in a more realistic runtime analysis of such methods, particularly in the high-dimensional case demanded by many modern applications, little further progress has been made since the seminal expected-time analysis of 1977. One fundamental reason for this is that algorithm analysis has not, to date, provided examples of analyses which link algorithmic runtime to probabilistic properties of the input distribution. This paper introduces some basic statistical machinery for making this link, and thereby presents initial steps toward providing a statistically principled framework for distribution-dependent runtime analysis of space-partitioning-based algorithms, with an emphasis on providing explanations for their observed behavior in high-dimensional spaces.

17 pages

Return to: SCS Technical Report Collection
School of Computer Science homepage

This page maintained by