COMPUTER SCIENCE TECHNICAL REPORT ABSTRACTS

CMU-CS-12-124R
Computer Science Department
School of Computer Science, Carnegie Mellon University

Islands and Bridges: Making Sense of
Marked Nodes in Large Graphs

Leman Akoglu, Jilles Vreeken*, Hanghang Tong**,
Duen Horng Chau, Nikolaj Tatti***, Christos Faloutsos

January 2013

CMU-CS-12-124R.pdf

This report is a revised version of
Computer Science Technical Report CMU-CS-12-124
from August 2012

Keywords: Link mining, connection subgraphs, sensemaking

Suppose we are given a large graph in which, by some external process, a handful of nodes are marked. What can we say about these marked nodes? Are they all close by in the graph, or are they segregated into multiple groups? How can we automatically determine how many groups, if any, they form, and find simple paths that connect the nodes in each group? We formalize the problem in terms of the Minimum Description Length principle: a set of paths is simple when we need few bits to describe each path from one node to another. For example, we want to avoid a high-degree node unless we need to visit many of its spokes. As such, the best partitioning is the one that requires the fewest bits to describe the paths that visit all marked nodes. We show that our formulation for finding simple paths between groups of nodes has connections to other well-known problems in graph theory, and is NP-hard. We propose fast, effective solutions and introduce DOT2DOT, an efficient algorithm for partitioning marked nodes as well as finding simple paths between nodes within parts. Experimentation shows that DOT2DOT correctly groups nodes for which good connection paths can be constructed, while separating distant nodes.
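
To illustrate why a description-length view penalizes high-degree nodes, the following is a minimal sketch and not the encoding used in the report. It assumes a simplified cost in which each hop out of a node u costs log2(deg(u)) bits, the number of bits needed to name one of u's neighbors. The function names, the toy graph, and the cost model itself are assumptions made only for illustration.

```python
import math
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def path_cost_bits(adj, path):
    """Bits needed to describe `path` hop by hop under the assumed cost model.

    Leaving node u requires choosing one of its deg(u) neighbors,
    which takes log2(deg(u)) bits. This is a simplification, not
    the report's actual encoding.
    """
    bits = 0.0
    for u, v in zip(path, path[1:]):
        if v not in adj[u]:
            raise ValueError(f"{u!r} and {v!r} are not adjacent")
        bits += math.log2(len(adj[u]))
    return bits


# Toy graph: hub "h" has 8 neighbors; "a" and "b" are also joined
# by a longer chain a - c - d - b that uses only degree-2 nodes.
edges = [("h", "a"), ("h", "b")]
edges += [("h", f"x{i}") for i in range(1, 7)]
edges += [("a", "c"), ("c", "d"), ("d", "b")]
adj = build_adjacency(edges)

via_hub = path_cost_bits(adj, ["a", "h", "b"])         # log2(2) + log2(8)
via_chain = path_cost_bits(adj, ["a", "c", "d", "b"])  # 3 * log2(2)
print(f"via hub: {via_hub} bits, via chain: {via_chain} bits")
```

Under this assumed model, the two-hop path through the hub costs 4 bits while the three-hop chain costs 3 bits, so the longer path is the simpler one to describe. The hub becomes worthwhile only when many of its spokes must be visited, which matches the intuition stated above.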

28 pages

*University of Antwerp
**IBM T.J. Watson Research Center
***K.U. Leuven
