CMU-CS-24-139
Computer Science Department
School of Computer Science, Carnegie Mellon University



Deep Learning on Graphs:
Tackling Scalability, Privacy, and Multimodality

Minji Yoon

Ph.D. Thesis

July 2024



Keywords: Deep Learning, Graph Mining, Deep Learning on Graphs, Graph Representation Learning, Graph Neural Networks, Graph Convolution Networks, Graph Neural Architecture Search, Message Passing, Importance Neighborhood Sampling on Graphs, Transfer Learning on Graphs, Heterogeneous Graph Neural Networks, Graph Generative Models, Graph Transformer, Multimodal Graphs, Multimodal Learning, Multimodal Graph Learning, Language Models

Graphs are everywhere, from e-commerce to knowledge graphs, abstracting interactions among individual data entities. Various real-world applications running on graph-structured data require effective representations for each part of the graph – nodes, edges, subgraphs, and the entire graph – that encode its essential characteristics.

In recent years, Deep Learning on Graphs (DLG) has broken ground across diverse domains by learning graph representations that successfully capture the underlying inductive bias in graphs. However, these groundbreaking DLG algorithms sometimes face limitations when applied to real-world scenarios. First, because graphs can be built on any domain with interactions among entities, real-world graphs are diverse; for every new application, domain expertise and tedious hyperparameter tuning are required to find an optimal DLG algorithm. Second, real-world graphs keep growing to billions of nodes and edges and carry unfiltered noise, so DLG must be preceded by cumbersome preprocessing such as graph sampling and noise filtering before it can be deployed in applications. Next, real-world graphs are mostly proprietary, while many DLG algorithms assume full access to external graphs in order to learn their distributions or extract knowledge to transfer to other graphs. Finally, the advent of single-modal foundation models in the language and vision fields has catalyzed the assembly of diverse modalities, resulting in multimodal graphs with diverse modalities on nodes and edges; however, learning on multimodal graphs while exploiting the generative capabilities of each modality's foundation models remains an open question in DLG.

In this thesis, I propose to make DLG more practical across four dimensions: 1) automation, 2) scalability, 3) privacy, and 4) multimodality. First, we automate algorithm search and hyperparameter tuning under the message-passing framework. Second, to address scalability, we propose to sample each node's neighborhood, regulating the computation cost while adaptively filtering out neighbors that are noisy for the target task. For privacy, we redefine conventional problem definitions, including graph generation and transfer learning, to respect the proprietary and privacy-restricted nature of real-world graphs. Finally, we propose a new multimodal graph learning algorithm that is built on unimodal foundation models and generates content based on multimodal neighbor information.
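To make the message-passing framework mentioned above concrete, the sketch below shows one round of mean-aggregation message passing on a tiny graph. This is a generic illustration, not the thesis's algorithm: the function name, the mean aggregator, and the identity weights are all illustrative choices, and real GNN layers typically add self-loops, degree normalization, and learned parameters.

```python
import numpy as np

def message_passing_layer(A, H, W):
    """One round of mean-aggregation message passing (illustrative).

    A: (n, n) binary adjacency matrix, H: (n, d) node features,
    W: (d, d_out) weight matrix. Each node averages its neighbors'
    features, applies a linear transform, then a ReLU nonlinearity.
    """
    deg = A.sum(axis=1, keepdims=True).clip(min=1)  # guard isolated nodes
    messages = (A @ H) / deg                        # mean over neighbors
    return np.maximum(messages @ W, 0.0)            # transform + ReLU

# Tiny 3-node path graph: 0 - 1 - 2
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
H = np.eye(3)   # one-hot node features
W = np.eye(3)   # identity weights, for readability
H1 = message_passing_layer(A, H, W)
# Node 1 ends up with the average of nodes 0 and 2: [0.5, 0, 0.5]
```

Stacking such layers lets each node's representation absorb information from progressively larger neighborhoods, which is also why neighborhood sampling (the scalability technique above) matters: the neighborhood, and hence the cost of `A @ H`, grows rapidly with depth.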

As the data collected by humanity grows in scale and diversity, the relationships among individual elements grow quadratically in scale and complexity. By making DLG more automated, scalable, privacy-certified, and multimodal, we hope to enable better processing of these relationships and positively impact a wide array of domains.

170 pages

Thesis Committee:
Christos Faloutsos (Co-chair)
Ruslan Salakhutdinov (Co-chair)
Tom M. Mitchell
Jure Leskovec (Stanford University)

Srinivasan Seshan, Head, Computer Science Department
Martial Hebert, Dean, School of Computer Science
