Computer Science Department
 School of Computer Science, Carnegie Mellon University
 
    
     
 CMU-CS-03-216
 
Modeling Syntax for Parsing and Translation 
Peter Venable 
December 2003  
Ph.D. Thesis 
CMU-CS-03-216.ps
CMU-CS-03-216.ps.gz
CMU-CS-03-216.pdf
 Keywords: Statistical, syntax, parsing, translation
 Syntactic structure is an important component of natural language
utterances, for both form and content.  Therefore, a variety of
applications can benefit from the integration of syntax into their
statistical models of language.  In this thesis, two new syntax-based
models are presented, along with their training algorithms: a
monolingual generative model of sentence structure, and a model of the
relationship between the structure of a sentence in one language and
the structure of its translation into another language.  After these
models are trained and tested on the respective tasks of monolingual
parsing and word-level bilingual corpus alignment, they are
demonstrated in two additional applications.  First, a new statistical
parser is automatically induced for a language for which none was
available, using a bilingual corpus.  Second, a statistical
translation system is augmented with syntax-based models.  Thus the
contributions of this thesis include: a statistical parsing system; a
bilingual parsing system, which infers a structural relationship
between two languages using a bilingual corpus; a method for
automatically building a parser for a language where no parser is
available; and a translation model that incorporates phrase structure.
 
130 pages 