CMU-CS-06-142
Computer Science Department
School of Computer Science, Carnegie Mellon University




Detection of Spatial and Spatio-Temporal Clusters

Daniel B. Neill

June 2006

Ph.D. Thesis


Keywords: Cluster detection, data mining, algorithms, biosurveillance, fMRI

This thesis develops a general and powerful statistical framework for the automatic detection of spatial and space-time clusters. Our "generalized spatial scan" framework is a flexible, model-based approach to accurate and computationally efficient cluster detection in diverse application domains. Through the development of the "fast spatial scan" algorithm and new Bayesian cluster detection methods, we can now detect clusters hundreds or thousands of times faster than previous approaches. More timely detection of emerging clusters (with high detection power and low false positive rates) is made possible by the development of "expectation-based" scan statistics, which learn baseline models from past data and then detect regions that are anomalous given these expectations. We apply these cluster detection methods to two real-world problem domains: the early detection of emerging disease epidemics, and the detection of clusters of activity in fMRI brain imaging data. One major contribution of this work is the development of the SSS system for nationwide disease surveillance, which is currently used in daily practice by several state and local health departments. This system receives data (including emergency department records and medication sales) from over 20,000 stores and hospitals nationwide, automatically detects emerging clusters of disease, and reports these results to public health officials. Through retrospective case studies and semi-synthetic testing, we show that our system can detect outbreaks significantly faster than previous disease surveillance methods.
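The expectation-based idea described in the abstract can be illustrated with a small sketch. It is not the thesis's fast spatial scan or its Bayesian methods: it is a naive exhaustive search over axis-aligned rectangles on a small grid, assuming Poisson-distributed counts, a simple moving-average baseline, and the standard Poisson log-likelihood ratio score. All function names and the toy data layout are hypothetical and chosen only for illustration.

```python
import math


def moving_average_baseline(history):
    """Learn a per-cell baseline as the mean of past grids (an assumed, simple model)."""
    n = len(history)
    rows, cols = len(history[0]), len(history[0][0])
    return [[sum(day[r][c] for day in history) / n for c in range(cols)]
            for r in range(rows)]


def poisson_score(count, baseline):
    """Poisson log-likelihood ratio for a region with more counts than expected."""
    if baseline <= 0 or count <= baseline:
        return 0.0  # only increases over the expectation are of interest
    return count * math.log(count / baseline) + baseline - count


def scan_rectangles(counts, baselines):
    """Score every axis-aligned rectangle; return (best_score, (r0, c0, r1, c1))."""
    rows, cols = len(counts), len(counts[0])
    best_score, best_region = 0.0, None
    for r0 in range(rows):
        for r1 in range(r0, rows):
            for c0 in range(cols):
                for c1 in range(c0, cols):
                    cells = [(r, c) for r in range(r0, r1 + 1)
                             for c in range(c0, c1 + 1)]
                    total_count = sum(counts[r][c] for r, c in cells)
                    total_base = sum(baselines[r][c] for r, c in cells)
                    score = poisson_score(total_count, total_base)
                    if score > best_score:
                        best_score, best_region = score, (r0, c0, r1, c1)
    return best_score, best_region


# Toy data: three past days of flat counts, then a spike in the centre cell.
history = [[[10] * 3 for _ in range(3)] for _ in range(3)]
today = [[10, 10, 10], [10, 30, 10], [10, 10, 10]]
baselines = moving_average_baseline(history)
score, region = scan_rectangles(today, baselines)
```

This brute-force search grows quickly with grid size, which is the cost that faster search methods are meant to avoid. In practice the highest score would also be tested for statistical significance (for example by randomization) before a cluster is reported; that step is omitted here.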

158 pages

