Machine Learning Department
School of Computer Science, Carnegie Mellon University



CMU-ML-19-113

Post-Inference Methods for Scalable Probabilistic
Modeling and Sequential Decision Making

Willie Neiswanger

August 2019

Ph.D. Thesis



Keywords: Machine learning, probabilistic modeling, Bayesian inference, prior knowledge, sequential decision making, Bayesian optimization, distributed inference, parallel algorithms


Probabilistic modeling refers to a set of techniques for modeling data that allows one to specify assumptions about the processes that generate data, incorporate prior beliefs about models, and infer properties of these models given observed data. Benefits include uncertainty quantification, multiple plausible solutions, reduction of overfitting, better performance given small data or large models, and explicit incorporation of a priori knowledge and problem structure. In recent decades, an array of inference algorithms has been developed to estimate these models.

This thesis focuses on post-inference methods, which are procedures that can be applied after the completion of standard inference algorithms to allow for increased efficiency, accuracy, or parallelism when learning probabilistic models of big data sets. These methods also allow for scalable computation in distributed or online settings, incorporation of complex prior information, and better use of inference results in downstream tasks. A few examples include:

  • Embarrassingly parallel inference. Large data sets are often distributed over a collection of machines. We first compute an inference result (e.g. with Markov chain Monte Carlo or variational inference) on each machine, in parallel, without communication between machines. Afterwards, we combine the results to yield an inference result for the full data set (a simple combination strategy is sketched after this list).

  • Prior swapping. Certain model priors limit the number of applicable inference algorithms, or increase their computational cost. We first choose any "convenient prior" (e.g. a conjugate prior, or a prior that allows for computationally cheap inference), and compute an inference result. Afterwards, we use this result to efficiently perform inference with other, more sophisticated priors or regularizers (see the second sketch after this list).

  • Sequential decision making and optimization. Model-based sequential decision making and optimization methods use models to define acquisition functions. We compute acquisition functions using the inference result from any probabilistic program or model framework, and perform efficient inference in sequential settings (the third sketch after this list shows an acquisition function computed from posterior samples).
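
As a concrete illustration of the embarrassingly parallel example above, the following is a minimal, hypothetical sketch of one simple combination strategy: each machine's subposterior samples are summarized by a Gaussian, and the Gaussians are multiplied analytically. The function name and toy data are illustrative assumptions, not code from the thesis.

    # Hypothetical sketch: combine subposterior samples via a Gaussian product.
    import numpy as np

    def combine_subposteriors(subposterior_samples):
        """subposterior_samples: list of (n_m, d) sample arrays, one per machine."""
        precisions, weighted_means = [], []
        for samples in subposterior_samples:
            mu = samples.mean(axis=0)
            cov = np.cov(samples, rowvar=False)
            prec = np.linalg.inv(cov)
            precisions.append(prec)
            weighted_means.append(prec @ mu)
        # Product of Gaussians: precisions add; the mean is precision-weighted.
        combined_cov = np.linalg.inv(sum(precisions))
        combined_mean = combined_cov @ sum(weighted_means)
        return combined_mean, combined_cov

    # Toy example: three machines, each holding samples over a 2-d parameter.
    rng = np.random.default_rng(0)
    per_machine = [rng.normal(loc=m, scale=1.0, size=(5000, 2)) for m in range(3)]
    mean, cov = combine_subposteriors(per_machine)
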
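For prior swapping, the simplest post-inference correction is to reweight samples obtained under the convenient prior by the ratio of the target prior to the convenient prior (the likelihood terms cancel). The sketch below shows only this baseline reweighting idea under assumed priors; the function name and example priors are hypothetical.

    # Hypothetical sketch: reuse samples computed under a convenient prior.
    import numpy as np

    def swap_prior_weights(samples, log_target_prior, log_convenient_prior):
        """Normalized importance weights for samples from the convenient-prior posterior."""
        log_w = log_target_prior(samples) - log_convenient_prior(samples)
        log_w -= log_w.max()                      # numerical stability
        w = np.exp(log_w)
        return w / w.sum()

    # Toy example: reweight toward a sparsity-inducing Laplace prior.
    rng = np.random.default_rng(1)
    samples = rng.normal(0.5, 0.3, size=10000)
    w = swap_prior_weights(
        samples,
        log_target_prior=lambda t: -np.abs(t) / 0.2,            # Laplace(0, 0.2)
        log_convenient_prior=lambda t: -0.5 * (t / 10.0) ** 2,   # broad Gaussian
    )
    reweighted_mean = np.sum(w * samples)
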
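For the sequential decision making example, the sketch below computes a Monte Carlo expected-improvement acquisition function directly from posterior predictive samples, which could be drawn from any probabilistic program or model framework; the function name, candidate set, and sample values are hypothetical.

    # Hypothetical sketch: acquisition function from posterior predictive samples.
    import numpy as np

    def expected_improvement(pred_samples, best_observed):
        """pred_samples: (n_samples, n_candidates) draws of f at candidate points (minimization)."""
        improvement = np.maximum(best_observed - pred_samples, 0.0)
        return improvement.mean(axis=0)

    # Toy example: choose the next of 100 candidate points to evaluate.
    rng = np.random.default_rng(2)
    pred_samples = rng.normal(size=(2000, 100))   # stand-in for model posterior draws
    acq = expected_improvement(pred_samples, best_observed=-1.0)
    next_candidate = int(np.argmax(acq))
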

We also describe the benefits of combining the above methods, present methodology for applying the embarrassingly parallel procedures when the number of machines is dynamic or unknown at inference time, illustrate how these methods can be applied for spatiotemporal analysis and in covariate-dependent models, show ways to optimize these methods by incorporating test functions of interest, and demonstrate how these methods can be implemented in probabilistic programming frameworks for automatic deployment.

210 pages

Thesis Committee:
Eric P. Xing (Chair)
Jeff Schneider
Ruslan Salakhutdinov
Ryan P. Adams (Princeton University)
Yee Whye Teh (University of Oxford / DeepMind)

Roni Rosenfeld, Head, Machine Learning Department
Martial Hebert, Dean, School of Computer Science

