Computer Science Department
School of Computer Science, Carnegie Mellon University


Learning Generative Models from Incomplete Data

Yao Chong Lim

M.S. Thesis

August 2019


Keywords: Machine learning, generative models, imputation, missing data, variational autoencoders

Real-world machine learning systems must handle missing data gracefully in order to remain robust and reliable. One way to tackle this problem is to build a generative model that can learn from incomplete data; such a model can then be applied to tasks such as image restoration and missing-value imputation. To this end, this thesis introduces a deep generative model, the Variational Auto-decoder (VAD), a variant of the stochastic gradient variational Bayes (SGVB) estimator first introduced by Kingma and Welling in 2013. To improve robustness to varying rates of missing data during training and testing, the VAD framework directly optimizes the parameters of the approximate posterior over the latent variables; unlike the common variational autoencoder (VAE) implementation of SGVB, no encoder network is used to predict these parameters. Through empirical evaluation on six datasets spanning image classification, facial landmark detection, and multimodal language, we show that the VAD framework is more robust to different rates of missing data than previous generative models for incomplete data.
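The core idea described above — optimizing the variational parameters of each example's approximate posterior directly by gradient descent, with the likelihood evaluated only on observed dimensions, rather than predicting those parameters with an encoder network — can be illustrated with a toy sketch. The following is a minimal, hypothetical example (not the thesis's implementation): it assumes an identity "decoder" with unit-variance Gaussian likelihood, a standard-normal prior, and a single data point with a missingness mask, and fits a diagonal Gaussian posterior via the reparameterization trick.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data point with missing entries; mask: 1 = observed, 0 = missing.
x = np.array([1.0, -2.0, 0.5, 3.0])
mask = np.array([1.0, 1.0, 0.0, 1.0])

# VAD-style: the variational parameters (mu, logvar) of q(z|x) are free
# variables optimized directly by gradient descent -- no encoder network.
mu = np.zeros(4)
logvar = np.zeros(4)

def elbo_loss(mu, logvar, eps):
    """Single-sample negative ELBO, likelihood restricted to observed dims."""
    sigma = np.exp(0.5 * logvar)
    z = mu + sigma * eps                        # reparameterization trick
    recon = 0.5 * np.sum(mask * (z - x) ** 2)   # Gaussian NLL on observed dims
    kl = 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)  # KL to N(0, I)
    return recon + kl, z, sigma

lr, losses = 0.05, []
for step in range(500):
    eps = rng.standard_normal(4)
    loss, z, sigma = elbo_loss(mu, logvar, eps)
    losses.append(loss)
    # Analytic gradients of this single-sample ELBO estimate
    g_mu = mask * (z - x) + mu
    g_logvar = mask * (z - x) * 0.5 * sigma * eps + 0.5 * (np.exp(logvar) - 1.0)
    mu -= lr * g_mu
    logvar -= lr * g_logvar

# Observed dims of mu are pulled toward x (shrunk by the prior);
# the missing dim's posterior stays near the prior N(0, 1).
```

In this toy setup the stationary point is the exact Bayesian posterior (mean x/2, variance 1/2 on observed dimensions; the prior on the missing one), which is why decoupling the posterior parameters from an encoder makes the fit insensitive to *which* dimensions are missing — the masked likelihood simply contributes no gradient for them.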

54 pages

Thesis Committee:
Louis-Philippe Morency (Chair)
Alexander Hauptmann

Srinivasan Seshan, Head, Computer Science Department
Martial Hebert, Dean, School of Computer Science
