CMU-ML-20-110
Machine Learning Department
School of Computer Science, Carnegie Mellon University




Learning DAGs with Continuous Optimization

Xun Zheng

August 2020

Ph.D. Thesis



Keywords: Bayesian network, structure learning, causal discovery


Learning the structure of directed acyclic graphs (DAGs, also known as Bayesian networks) from data is an important and classical problem in machine learning, with prominent applications in causal inference, fairness, interpretability, and biology. It is a challenging problem because the search space of DAGs is combinatorial and scales superexponentially with the number of nodes, so existing approaches often rely on various local heuristics for enforcing the acyclicity constraint. By contrast, structure learning for undirected graphical models (e.g., Gaussian MRFs) is now recognized as a tractable optimization problem and has seen considerable success in practical domains such as bioinformatics.

In this thesis, we take a first step towards bridging this gap between directed and undirected graphical models. We begin by introducing a fundamentally different strategy for Bayesian network structure learning: we formulate the problem as a purely continuous optimization program over real matrices that avoids the combinatorial constraint entirely. This is achieved by a novel characterization of acyclicity that is not only smooth but also exact. The resulting problem can be efficiently solved by standard numerical algorithms, without imposing any structural assumptions on the graph such as bounded treewidth or in-degree.
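To make this concrete, the characterization in question is the trace-of-matrix-exponential constraint h(W) = tr(exp(W ∘ W)) - d = 0 from the NOTEARS line of work, where ∘ is the elementwise (Hadamard) product and d the number of nodes. The following NumPy/SciPy sketch (function names are ours) shows how little machinery it requires:

    import numpy as np
    from scipy.linalg import expm

    def h(W):
        """Smooth, exact acyclicity measure: h(W) = tr(exp(W * W)) - d.
        h(W) = 0 if and only if W is the weighted adjacency matrix of a
        DAG; any directed cycle makes h(W) strictly positive."""
        d = W.shape[0]
        return np.trace(expm(W * W)) - d

    def h_grad(W):
        """Closed-form gradient of h, used by gradient-based solvers:
        grad h(W) = exp(W * W)^T (elementwise) 2W."""
        return expm(W * W).T * 2 * W

Because h is smooth with a closed-form gradient, the equality constraint h(W) = 0 can be handed to standard machinery such as an augmented Lagrangian method, which is what makes the combinatorial search avoidable.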

We then study the generalization of the above continuous algorithm to learning nonparametric DAGs. We extend the algebraic characterization of acyclicity to nonparametric structural equation models (SEMs) by leveraging a notion of nonparametric sparsity based on partial derivatives, resulting in a continuous optimization problem that applies to a variety of nonparametric and semiparametric models, including GLMs, additive noise models, and index models as special cases.
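As an illustration of how partial-derivative sparsity yields a continuous surrogate, here is a minimal sketch for the MLP special case: if each node j is predicted by its own small MLP, the norms of the first-layer weights attached to each input act as a proxy adjacency matrix, to which the same acyclicity function can be applied (the helper name mlp_adjacency is ours):

    import numpy as np

    def mlp_adjacency(first_layer_weights):
        """first_layer_weights[j]: the (hidden_dim, d) first-layer weight
        matrix of the MLP predicting node j from all d variables.
        Entry W[k, j] aggregates the weights attached to input k; when
        it is zero, the fitted function for node j cannot depend on x_k,
        i.e. its partial derivative with respect to x_k vanishes."""
        d = len(first_layer_weights)
        W = np.zeros((d, d))
        for j, A in enumerate(first_layer_weights):
            W[:, j] = np.linalg.norm(A, axis=0)  # L2 norm per input column
        return W

Enforcing h(W) = 0 on this proxy matrix, with h as in the sketch above, then constrains the fitted nonparametric model to be acyclic.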

Lastly, we introduce a unified view of score-based and ICA-based methods based on the proposed continuous optimization framework. In particular, we show that the popular ICA-based methods, which exploit non-Gaussianity of the independent noise distribution, can be handled by the continuous optimization framework. This framework is conceptually clearer, makes it easier to incorporate prior knowledge, and has the potential to be generalized to models with hidden confounders and feedback loops.
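A minimal sketch of how the ICA-based view fits the same template: swap the least-squares score for a log-likelihood under an explicitly non-Gaussian noise density (standard-logistic noise is used here purely for illustration), keep the acyclicity term, and the problem remains a smooth program. The simple penalty form below stands in for a full augmented Lagrangian, and all names and penalty weights are illustrative:

    import numpy as np
    from scipy.linalg import expm

    def neg_log_likelihood(W, X):
        """Negative log-likelihood of a linear SEM X = XW + E with
        i.i.d. standard-logistic noise. A non-Gaussian density like
        this is what lets ICA-style methods identify edge directions;
        a Gaussian density would recover the least-squares score."""
        R = X - X @ W                         # residuals, column j = node j
        # -log f(r) for the logistic density: r + 2*log(1 + exp(-r))
        return np.sum(R + 2 * np.logaddexp(0.0, -R)) / X.shape[0]

    def penalized_score(W, X, lam=0.1, rho=10.0):
        """Score + L1 sparsity + acyclicity penalty."""
        d = W.shape[0]
        h = np.trace(expm(W * W)) - d         # same smooth acyclicity measure
        return neg_log_likelihood(W, X) + lam * np.abs(W).sum() + rho * h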

99 pages

Thesis Committee:
Eric P. Xing (Co-Chair)
Pradeep Ravikumar (Co-Chair)
Clark Glymour
Kun Zhang
Raquel Urtasun (University of Toronto)

Roni Rosenfeld, Head, Machine Learning Department
Martial Hebert, Dean, School of Computer Science

