Syntax Augmented Machine Translation
Google Tech Talks
December 17, 2007
Ashish Venugopal, Research Scientist
Probabilistic Synchronous Context-Free Grammars hold significant promise for machine translation: they model context-sensitive translation and reordering effects with simple hierarchical operations learned directly from parallel data. Source-language sentences are transformed into target-language sentences via intermediate nonterminal symbols, typically by bottom-up chart parsing with these grammars.
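As a rough illustration of how such a grammar maps a source sentence to a target sentence, here is a minimal Python sketch of bottom-up chart parsing with a synchronous grammar. The grammar, the French-English example, and the names RULES, match and translate are all hypothetical and chosen only for illustration; they are not taken from the talk. The sketch ignores rule probabilities and assumes the grammar has no rule whose source side is a single nonterminal.

```python
# Toy synchronous grammar (hypothetical). Each rule is (lhs, source_rhs, target_rhs).
# Strings are terminals; (label, index) tuples are co-indexed nonterminal slots.
RULES = [
    ("S",  (("NP", 1), ("VP", 2)),      (("NP", 1), ("VP", 2))),          # monotone
    ("NP", ("le", ("N", 1), ("A", 2)),  ("the", ("A", 2), ("N", 1))),     # reordering
    ("VP", ("ne", ("V", 1), "pas"),     ("does", "not", ("V", 1))),       # discontiguous
    ("N",  ("chat",),                   ("cat",)),
    ("A",  ("noir",),                   ("black",)),
    ("V",  ("dort",),                   ("sleep",)),
]


def match(src_rhs, words, i, j, chart):
    """Yield {slot_index: target_tokens} for each way src_rhs covers words[i:j]."""
    if not src_rhs:
        if i == j:
            yield {}
        return
    head, rest = src_rhs[0], src_rhs[1:]
    if isinstance(head, str):
        # Terminal: must equal the next source word.
        if i < j and words[i] == head:
            yield from match(rest, words, i + 1, j, chart)
    else:
        # Nonterminal: look up already-built chart items for every sub-span.
        label, idx = head
        for k in range(i + 1, j + 1):
            for sub in chart.get((label, i, k), ()):
                for binding in match(rest, words, k, j, chart):
                    yield {idx: sub, **binding}


def translate(sentence, rules, goal="S"):
    """Fill the chart from short spans to long ones; return target strings for goal."""
    words = sentence.split()
    n = len(words)
    chart = {}  # (label, start, end) -> set of target token tuples
    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for lhs, src_rhs, tgt_rhs in rules:
                for binding in match(src_rhs, words, i, j, chart):
                    out = []
                    for sym in tgt_rhs:
                        if isinstance(sym, str):
                            out.append(sym)
                        else:
                            out.extend(binding[sym[1]])  # substitute the sub-translation
                    chart.setdefault((lhs, i, j), set()).add(tuple(out))
    return sorted(" ".join(t) for t in chart.get((goal, 0, n), ()))


print(translate("le chat noir ne dort pas", RULES))
```

In this toy grammar the NP rule swaps the noun and adjective on the target side, and the VP rule translates the discontiguous "ne ... pas" as a unit, which are small instances of the reordering and context-sensitive effects mentioned above.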
Integrating an n-gram language model into this search space introduces dependencies between consecutive chart items, making exact search computationally difficult. We present a two-pass approach motivated by grammars that include a large number of nonterminal symbols, and we evaluate it against a state-of-the-art single-pass approach.
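To see where the dependency between chart items comes from, consider a bigram model: when two adjacent items are combined, the score of the result includes a bigram that straddles the seam, so each item has to remember its boundary words. The Python sketch below illustrates only this point and is not the two-pass method of the talk. The bigram table, the back-off constant, and the names Item, make_item and combine are hypothetical, and sentence-boundary markers are left out for brevity.

```python
import math
from typing import NamedTuple

# Hypothetical bigram log-probabilities; unseen bigrams get a flat back-off score.
BIGRAM_LOGPROB = {
    ("the", "black"): math.log(0.2),
    ("black", "cat"): math.log(0.3),
    ("cat", "does"): math.log(0.1),
    ("does", "not"): math.log(0.4),
    ("not", "sleep"): math.log(0.05),
}
UNSEEN = math.log(1e-4)


def bigram(a, b):
    return BIGRAM_LOGPROB.get((a, b), UNSEEN)


def lm_score(tokens):
    """Bigram log-probability of a full token sequence (no boundary markers)."""
    return sum(bigram(a, b) for a, b in zip(tokens, tokens[1:]))


class Item(NamedTuple):
    first: str    # leftmost target word, needed when something attaches on the left
    last: str     # rightmost target word, needed when something attaches on the right
    score: float  # language model score of the words inside the item


def make_item(tokens):
    return Item(tokens[0], tokens[-1], lm_score(tokens))


def combine(left, right):
    """Concatenate two items; the seam bigram depends on both items' boundary words."""
    seam = bigram(left.last, right.first)
    return Item(left.first, right.last, left.score + right.score + seam)


left = make_item(("the", "black", "cat"))
right = make_item(("does", "not", "sleep"))
whole = combine(left, right)
print(whole)
```

Because the seam score depends on the boundary words, two items with the same nonterminal and span but different boundary words can no longer be treated as interchangeable, which multiplies the number of items the search has to track. For an n-gram model of order n, each item would need to keep n-1 words on each side.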
The motivation for this two-pass approach comes from a desire to include a large number of nonterminal labels in the translation grammar. Initial results using labels from associated phrase structure parse trees are promising, but these labels are often noisy and obtaining them requires human-generated data. We propose a novel method to discriminatively learn nonterminal labels, with the aim of directly improving translation quality.
Speaker: Ashish Venugopal