CMU-ML-20-110
Machine Learning Department
School of Computer Science, Carnegie Mellon University




Learning DAGs with Continuous Optimization

Xun Zheng

August 2020

Ph.D. Thesis



Keywords: Bayesian network, structure learning, causal discovery


Learning the structure of directed acyclic graphs (DAGs, also known as Bayesian networks) from data is an important and classical problem in machine learning, with prominent applications in causal inference, fairness, interpretability, and biology. It is a challenging problem because the search space of DAGs is combinatorial and scales superexponentially with the number of nodes, so existing approaches often rely on various local heuristics for enforcing the acyclicity constraint. By contrast, structure learning for undirected graphical models (e.g., Gaussian MRFs) is now recognized as a tractable optimization problem and has seen considerable success in practical domains such as bioinformatics.

In this thesis, we take a first step towards bridging this gap between directed and undirected graphical models. We begin by introducing a fundamentally different strategy for Bayesian network structure learning: we formulate the problem as a purely continuous optimization program over real matrices that avoids the combinatorial constraint entirely. This is achieved by a novel characterization of acyclicity that is not only smooth but also exact. The resulting problem can be efficiently solved by standard numerical algorithms, without imposing any structural assumptions on the graph such as bounded treewidth or in-degree.
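To make this concrete, the characterization in question is the trace-of-matrix-exponential constraint h(W) = tr(exp(W ∘ W)) - d = 0 from the NOTEARS line of work, where ∘ is the elementwise (Hadamard) product and d the number of nodes. The following NumPy/SciPy sketch (function names are ours) shows how little machinery it requires:

    import numpy as np
    from scipy.linalg import expm

    def h(W):
        """Smooth, exact acyclicity measure: h(W) = tr(exp(W * W)) - d.
        h(W) = 0 if and only if W is the weighted adjacency matrix of a
        DAG; any directed cycle makes h(W) strictly positive."""
        d = W.shape[0]
        return np.trace(expm(W * W)) - d

    def h_grad(W):
        """Closed-form gradient of h, used by gradient-based solvers:
        grad h(W) = exp(W * W)^T (elementwise) 2W."""
        return expm(W * W).T * 2 * W

Because h is smooth with a closed-form gradient, the equality constraint h(W) = 0 can be handed to standard machinery such as an augmented Lagrangian method, which is what makes the combinatorial search avoidable.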

We then study the generalization of the above continuous algorithm to learning nonparametric DAGs. We extend the algebraic characterization of acyclicity to nonparametric structural equation models (SEMs) by leveraging a notion of nonparametric sparsity based on partial derivatives, resulting in a continuous optimization problem that applies to a variety of nonparametric and semiparametric models, including GLMs, additive noise models, and index models as special cases.
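As an illustration of how partial-derivative sparsity yields a continuous surrogate, here is a minimal sketch for the MLP special case: if each node j is predicted by its own small MLP, the norms of the first-layer weights attached to each input act as a proxy adjacency matrix, to which the same acyclicity function can be applied (the helper name mlp_adjacency is ours):

    import numpy as np

    def mlp_adjacency(first_layer_weights):
        """first_layer_weights[j]: the (hidden_dim, d) first-layer weight
        matrix of the MLP predicting node j from all d variables.
        Entry W[k, j] aggregates the weights attached to input k; when
        it is zero, the fitted function for node j cannot depend on x_k,
        i.e. its partial derivative with respect to x_k vanishes."""
        d = len(first_layer_weights)
        W = np.zeros((d, d))
        for j, A in enumerate(first_layer_weights):
            W[:, j] = np.linalg.norm(A, axis=0)  # L2 norm per input column
        return W

Enforcing h(W) = 0 on this proxy matrix, with h as in the sketch above, then constrains the fitted nonparametric model to be acyclic.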

Lastly, we introduce a unified view of score-based and ICA-based methods based on the proposed continuous optimization framework. In particular, we show that the popular ICA-based methods, which exploit non-Gaussianity of the independent noise distribution, can be handled by the continuous optimization framework. This framework is conceptually clearer, makes it easier to incorporate prior knowledge, and has the potential to be generalized to models with hidden confounders and feedback loops.
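A minimal sketch of how the ICA-based view fits the same template: swap the least-squares score for a log-likelihood under an explicitly non-Gaussian noise density (standard-logistic noise is used here purely for illustration), keep the acyclicity term, and the problem remains a smooth program. The simple penalty form below stands in for a full augmented Lagrangian, and all names and penalty weights are illustrative:

    import numpy as np
    from scipy.linalg import expm

    def neg_log_likelihood(W, X):
        """Negative log-likelihood of a linear SEM X = XW + E with
        i.i.d. standard-logistic noise. A non-Gaussian density like
        this is what lets ICA-style methods identify edge directions;
        a Gaussian density would recover the least-squares score."""
        R = X - X @ W                         # residuals, column j = node j
        # -log f(r) for the logistic density: r + 2*log(1 + exp(-r))
        return np.sum(R + 2 * np.logaddexp(0.0, -R)) / X.shape[0]

    def penalized_score(W, X, lam=0.1, rho=10.0):
        """Score + L1 sparsity + acyclicity penalty."""
        d = W.shape[0]
        h = np.trace(expm(W * W)) - d         # same smooth acyclicity measure
        return neg_log_likelihood(W, X) + lam * np.abs(W).sum() + rho * h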

99 pages

Thesis Committee:
Eric P. Xing (Co-Chair)
Pradeep Ravikumar (Co-Chair)
Clark Glymour
Kun Zhang
Raquel Urtasun (University of Toronto)

Roni Rosenfeld, Head, Machine Learning Department
Martial Hebert, Dean, School of Computer Science

