CMU-CS-06-100
Computer Science Department
School of Computer Science, Carnegie Mellon University



CMU-CS-06-100

Distributed Pattern Discovery in Multiple Streams

Jimeng Sun, Spiros Papadimitriou*, Christos Faloutsos

January 2006

CMU-CS-06-100.pdf


Keywords: Data mining, stream mining, distributed mining, privacy preserving data mining

Given m groups of streams, where the groups consist of n_1, ..., n_m co-evolving streams respectively, we want to: (i) incrementally find local patterns within a single group, (ii) efficiently obtain global patterns across groups, and, more importantly, (iii) do both in real time while limiting the information shared across groups. In this paper, we present a distributed, hierarchical algorithm that addresses these problems. It first monitors local patterns within each group and then summarizes the local patterns from all groups into global patterns. The global patterns are in turn used to improve and refine the local patterns in a simple and elegant way. Moreover, our method requires only a single pass over the data, without any buffering, and limits information sharing and communication across groups. Our experimental case studies and evaluation confirm that the proposed method performs hierarchical correlation detection efficiently and effectively.
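The two-level structure described in the abstract can be illustrated with a minimal sketch. It is not the algorithm from the report: the abstract does not name the pattern-tracking technique, so the sketch substitutes a normalized Oja's rule (a standard single-pass estimator of one dominant direction) at both levels. The names OjaTracker, HierarchicalMonitor, and the rate parameter are hypothetical, and the step in which global patterns refine the local ones is omitted. What the sketch does show is the data flow: each group summarizes its own streams in one pass with no buffering, and only one number per group per tick reaches the coordinator.

```python
import math


class OjaTracker:
    """Single-pass tracker of one dominant direction (normalized Oja's rule).

    Assumption: a stand-in for whatever local pattern tracker the report uses.
    """

    def __init__(self, dim, rate=0.01):
        # Start from a unit vector with equal weight on every stream.
        self.w = [1.0 / math.sqrt(dim)] * dim
        self.rate = rate

    def update(self, x):
        # Project the new measurement onto the current direction.
        y = sum(wi * xi for wi, xi in zip(self.w, x))
        # Oja's rule: move w toward x in proportion to the projection.
        self.w = [wi + self.rate * y * (xi - y * wi)
                  for wi, xi in zip(self.w, x)]
        # Renormalize so w stays a unit vector.
        norm = math.sqrt(sum(wi * wi for wi in self.w))
        self.w = [wi / norm for wi in self.w]
        return y  # the pattern value for this tick


class HierarchicalMonitor:
    """One local tracker per group plus a coordinator over local patterns."""

    def __init__(self, group_sizes, rate=0.01):
        self.local = [OjaTracker(n, rate) for n in group_sizes]
        self.coordinator = OjaTracker(len(group_sizes), rate)

    def step(self, tick):
        # tick holds one measurement vector per group. Raw measurements
        # never leave the group: only one value per group is passed up.
        local_patterns = [t.update(x) for t, x in zip(self.local, tick)]
        global_pattern = self.coordinator.update(local_patterns)
        return local_patterns, global_pattern


if __name__ == "__main__":
    # Two groups (2 and 3 streams) driven by the same hidden signal.
    monitor = HierarchicalMonitor([2, 3])
    for t in range(2000):
        s = math.sin(0.1 * t)
        monitor.step([[s, 2 * s], [s, s, s]])
    print("local weights, group 0:", monitor.local[0].w)
    print("coordinator weights:", monitor.coordinator.w)
```

In this toy setting the first group's weights should approach the direction of (1, 2), since its second stream is twice the first. The coordinator sees only two numbers per tick, regardless of how many streams each group contains.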

17 pages

*IBM Watson Research Center, New York. This work was done while he was studying at Carnegie Mellon University.

