CMU-CS-19-103
Computer Science Department
School of Computer Science, Carnegie Mellon University



Incorporating Structural Bias into Neural
Networks for Natural Language Processing

Zichao Yang

Ph.D. Thesis

February 2019


Keywords: Natural Language Processing, Neural Networks, Structure Bias, Attention Mechanism, Visual Question Answering, Document Classification, VAE, GAN

In recent years, neural networks have achieved major breakthroughs in natural language processing. Though powerful, neural networks are often statistically inefficient and require large quantities of labeled data to train. One potential reason is that natural language has rich latent structure, and general-purpose neural architectures have difficulty learning the underlying patterns from limited data. In this thesis, we aim to improve the efficiency of neural networks by exploring structural properties of natural language when designing neural model architectures. We accomplish this by embedding prior knowledge into the model itself as a type of inductive bias.

In the first half of this thesis, we explore supervised tasks related to natural language, such as visual question answering and document classification. We find that in these tasks the inputs contain salient features that provide clues to the answers. The salient regions of the inputs, however, are not directly annotated and so cannot be directly leveraged for training. Moreover, the salient features must be reasoned about and discovered from context in a step-by-step manner. By building a dedicated neural network module that uses an iterative attention mechanism, we are able to gradually localize the most important parts of the inputs and use them for prediction. The resulting systems not only achieve state-of-the-art results, but also provide interpretations for their predictions.
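
As a rough illustration of the iterative attention idea described above, the following minimal sketch refines a query vector over several attention "hops". It is not the thesis's model: it assumes plain dot-product scoring with no learned parameters, and the names `iterative_attention`, `features`, `query`, and `hops` are hypothetical choices made for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def iterative_attention(features, query, hops=2):
    """Refine `query` over `hops` rounds of attention over `features`.

    Assumption: dot-product scoring and an additive query update, chosen
    only to keep the sketch small; a real model would use learned layers.
    Returns the final query and the attention weights from each hop.
    """
    history = []
    dim = len(query)
    for _ in range(hops):
        # Score each input region against the current query.
        scores = [sum(f * q for f, q in zip(feat, query)) for feat in features]
        weights = softmax(scores)
        # Attention-weighted summary of the input regions.
        context = [
            sum(w * feat[d] for w, feat in zip(weights, features))
            for d in range(dim)
        ]
        # Fold the summary back into the query for the next hop.
        query = [q + c for q, c in zip(query, context)]
        history.append(weights)
    return query, history


# Three toy "regions" and a query aligned with the first one.
features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
final_query, history = iterative_attention(features, [1.0, 0.0], hops=2)
for hop, weights in enumerate(history, start=1):
    print(hop, [round(w, 3) for w in weights])
```

In this toy setting the weight on the first region grows from the first hop to the second, which mirrors the gradual localization the abstract describes. The per-hop weights are also the kind of signal that can be inspected to interpret a prediction.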

In the second half of this thesis, we explore several unsupervised modeling tasks related to natural language, specifically variational auto-encoders (VAEs) [59] and generative adversarial networks (GANs) [33]. We find that these models, which were designed for continuous inputs such as images, do not perform well with natural language as input. The main challenge is that the existing neural network modules in VAEs and GANs are not good at handling discrete and sequential inputs. To overcome these limitations, we design network modules that take the input structure into consideration. Specifically, we propose to use dilated CNNs as decoders for VAEs to control the contextual capacity. For GANs, we propose to replace the binary classifiers with more structured discriminators that provide better feedback to generators. Altogether, we show that by modifying architectural properties of component modules, we can constrain unsupervised learning problems in a way that makes them more feasible and leads to improved practical results.
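
One way to see how dilation can control a decoder's contextual capacity is to compute the receptive field of a stack of causal 1-D convolutions. The sketch below uses the standard formula for stride-1 layers. The kernel size and dilation schedules are illustrative assumptions, not settings taken from the thesis, and `receptive_field` is a hypothetical helper name.

```python
def receptive_field(kernel_size, dilations):
    """Number of input positions visible to one output position.

    Assumes a stack of stride-1 1-D convolutions that share one kernel
    size, with one dilation rate per layer. Each layer extends the
    visible window by (kernel_size - 1) * dilation positions.
    """
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Illustrative schedules only (not the thesis's configurations).
print(receptive_field(3, [1, 1, 1, 1]))  # 9: four undilated layers
print(receptive_field(3, [1, 2, 4, 8]))  # 31: same depth, doubling dilation
```

With depth and kernel size held fixed, changing only the dilation schedule moves the visible context from 9 to 31 tokens. This makes the schedule a simple knob for how much preceding text a convolutional decoder can condition on.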

90 pages

Thesis Committee:
Eric Xing (Co-Chair)
Taylor Berg-Kirkpatrick (Co-Chair)
Alexander Smola (Amazon)
Ruslan Salakhutdinov
Li Deng (Citadel)

Srinivasan Seshan, Head, Computer Science Department
Tom M. Mitchell, Interim Dean, School of Computer Science

