COMPUTER SCIENCE TECHNICAL REPORT ABSTRACTS

CMU-CS-25-119
Computer Science Department
School of Computer Science, Carnegie Mellon University

CMU-CS-25-119

Explainable Mining of Graphs and Time Series:
Algorithms and Applications

Meng-Chieh Lee

Ph.D. Thesis

June 2025

Keywords: Graph Mining, Network Effects, Random Walks, Belief Propagation, Node Classification, Graph Neural Networks, Linear Graph Neural Networks, Link Prediction, Information Theory, Usable Information, Large Language Models, Large Language Model Agents, Retrieval Augmented Generation, Knowledge Graphs, Graph Anomaly Detection, Minimum Description Length, Frequent Pattern Mining, Learnable Graph Kernel, Kernel Convolution Networks, Graph Property Prediction, Human-Trafficking Detection, Time Series Analysis, Anomaly Detection, Time Series Anomaly Detection, Self-Supervised Learning, Hyperparameter Optimization, Seizure Detection, Group Anomaly Detection

Given a social network graph, how can we predict connections between users and determine whether they are based on shared hobbies or common friends? Given a database containing molecular graphs, how can we determine whether the graphs inhibit HIV replication based on substructures they frequently share? Similarly, in time series data from EEG recording, how can we identify seizures and explain why they are considered abnormal? Although recent machine learning methods have shown improved performance, many remain black-box models, making explainability challenging. This leads us to explainable artificial intelligence (XAI), which offers valuable insights through its explanations and is more practical for deployment in real-world applications.

In this thesis, we focus on developing explainable machine learning methods tailored for graphs and time series. Each method we propose is either inherently explainable, or designed to automatically provide data analysis or justification for its decisions. In each part, we present effective and general algorithms, and explore a broad range of applications.

In the first part, we focus on node-level graph mining. We propose algorithms to analyze various types of information in graphs, e.g., the network effects of the graph structure, and the usable information in the node features. Our proposed linear methods are not only inherently interpretable and fast, but also outperform baselines in solving node classification and link prediction tasks. In node classification, our method improves the accuracy by 10.3% over the second-best baseline, while being 2.5 times faster. In link prediction, our method achieves an average rank 1.1, outperforming baselines on 11 out of 12 real-world datasets. In the application of graph retrieval-augmented generation, our agentic method achieves an average relative improvement of 51%.

In the second part, we focus on graph-level graph mining. We discover frequent substructures using the minimum description length (MDL) principle and learnable graph kernels. In graph anomaly detection, our MDL-based method is 58 times faster than the second-best baseline, while achieving 1.3 times higher average precision. In graph regression, our method with learnable graph kernels improves the mean absolute error by 14.3%. In the application of human trafficking detection, our method detects human trafficking ads with 84% precision, while requiring only 8 hours to process 4 million documents.

In the third part, we focus on time series mining, with an emphasis on time series anomaly detection. Our self-supervised method effectively identifies the ground truth hyperparameters of anomalies in time series data, resulting in an average rank 2.2 compared to the baselines. In the application involving medical EEG signals, unlike traditional point anomaly detection methods, we focus on identifying group anomalies that occur within a short period and exhibit similar abnormal patterns. Our method is fast and scalable, and discovers and ranks both point and group anomalies in 2 minutes for 1 million data points on a stock machine.

257 pages

Thesis Committee:
Christos Faloutsos (Co-Chair)
Leman Akoglu (Co-Chair)
Geoffrey Gordon
Nina Mishra (Amazon)

Srinivasan Seshan, Head, Computer Science Department
Martial Hebert, Dean, School of Computer Science

Return to: SCS Technical Report Collection
School of Computer Science

This page maintained by reports@cs.cmu.edu