CMU-ML-14-100
Machine Learning Department
School of Computer Science, Carnegie Mellon University




Modeling Large Social Networks in Context

Qirong Ho

July 2014

Ph.D. Thesis



Keywords: Social Networks, Statistical Models, Scalable Algorithms, Big Data, Distributed Systems, Cluster Computing, Network Side Information, Triangle Features


Today's social and internet networks contain millions or even billions of nodes, and copious amounts of side information (context) such as text, attribute, temporal, image and video data. A thorough analysis of a social network should consider both the graph and the associated side information, yet we also expect the algorithm to execute in a reasonable amount of time on even the largest networks. Towards the goal of rich analysis on societal-scale networks, this thesis provides (1) modeling and algorithmic techniques for incorporating network context into existing network analysis algorithms based on statistical models, and (2) strategies for network data representation, model design, algorithm design and distributed multi-machine programming that, together, ensure scalability to very large networks. The methods presented herein combine the flexibility of statistical models with key ideas and empirical observations from the data mining and social networks communities, and are supported by software libraries for cluster computing based on original distributed systems research. These efforts culminate in a novel mixed-membership triangle motif model that easily scales to large networks with over 100 million nodes on just a few cluster machines, and can be readily extended to accommodate network context using the other techniques presented in this thesis.

257 pages

Thesis Committee:
Eric P. Xing (Chair)
Christos Faloutsos
William W. Cohen
Mark S. Handcock (UCLA)

