|   | CMU-CS-12-146 Computer Science Department School of Computer Science, Carnegie Mellon University 
 
 Fast Anomaly Discovery given Duplicates Jay-Yoon Lee, U Kang, Danai Koutra, Christos Faloutsos December 2012 
 Given a large cloud of multi-dimensional points, and an off-the-shelf outlier detection method, why does it take a week to finish? After careful analysis, we discovered that duplicate points create subtle issues, that the literature has ignored: if dmax is the multiplicity of the most overplotted point, typical algorithms are quadratic on dmax. For graph-related outlier detection, all the satellites of a 'star' node will have identical features, and thus create over-plotting with dmax being the highest degree; due to power law degree distributions, this may be huge, for real graph data. We propose several ways to eliminate the problem; we report wall-clock times and our time savings; and we show that our methods give either exact results, or highly accurate approximate ones. 
16 pages 
 
 | 
| 
    Return to: 
	SCS Technical Report Collection This page maintained by reports@cs.cmu.edu | |