CMU-ML-07-109
Machine Learning Department
School of Computer Science, Carnegie Mellon University



CMU-ML-07-109

Scalable Graphical Models for Social Networks

Anna Goldenberg

May 2007

Ph.D. Thesis

CMU-ML-07-109.pdf


Keywords: Graphical models, Bayesian Networks, social networks, latent variable model, social network evolution

This thesis tackles the problems of efficiently learning large probabilistic models for sparse relational data. Recent dramatic increases in the collection of social network data and the rapid growth in probabilistic and statistical approaches to tractable machine learning made it possible to analyze networks with millions of people.

There are many questions one could ask about the formation, properties and dynamics in social networks. This thesis considers the following three questions: 1) given a set of interactions between people, what can be learned about the relations of these people without knowing the true underlying social network; 2) given additional information about each individual in the network, what can be done to improve understanding of their relations; 3) what are the dynamics underlying the formation and the evolution of social networks.

We introduce new algorithms and models for learning about relations in a social network and evolution of those relations over time. We present a scalable search procedure for learning Bayesian Networks from the binary events data, i.e. this structure learning algorithm is based solely on the information about people's participation in the set of given events. We present learning results on very large (up to three million nodes) Bayesian Networks and show how they can be used to understand more about the underlying social networks. We extend this model by incorporating information about individuals, such as their affiliation and interests. We use block modeling to both improve the quality of our Bayesian Networks and learn more about group interaction patterns.

Finally, we introduce a generative mechanism that provides an explanation of the social network evolution. This dynamic generative model is of exploratory nature.

The described models and learning algorithms have one thing in common: they are all motivated by real life phenomena.

123 pages


SCS Technical Report Collection
School of Computer Science homepage

This page maintained by reports@cs.cmu.edu