Computer Science Department
School of Computer Science, Carnegie Mellon University


Adventures in Ultra-Small-Space Clustering

Adam Meyerson, Liadan O'Callaghan*, Serge Plotkin*

December 2002

Keywords: Algorithms, approximation, clustering, sampling, k-median, facility location

We give a sampling-based algorithm for the k-Median problem, with running time where [.....] k is the desired cluster number and is a confidence parameter. This is the first k-Median algorithm with fully polynominial running time that is independent of n, the size of the data set. It gives a solution that is, with high probability, an O(1) approximation if each cluster in the optimal solution has [....] points. We give near-matching lower bounds showing that this assumption about cluster size is necessary. We also present a related algorithm for finding a clustering that excludes a small number of outliers.

13 pages

* Department of Computer Science, Stanford University

Return to: SCS Technical Report Collection
School of Computer Science homepage

This page maintained by