Development of Large-Scale Grammars Through Corpus Construction (Japanese Audio)

Posted in Science, Companies, Conferences on June 04, 2012



Google Tech Talk
14:00 JST, Oct 27, 2010
At Google Japan (Japanese Audio)

Speaker : Yusuke Miyao (宮尾祐介)
Bio : http://www.nii.ac.jp/en/faculty/digital_content/MIYAO-Yusuke/
Affiliation : National Institute of Informatics (国立情報学研究所)

Language : talk in Japanese, slides in English

Title : Development of large-scale grammars through corpus construction

Abstract :
A crucial bottleneck of grammar-based deep parsing is the difficulty
of developing large-scale grammars that can analyze real-world
sentences. In our approach, the final goal of grammar development is
the construction of a treebank (parsed corpus) that conforms to a
grammar theory. Given an existing corpus (e.g. the Penn Treebank) and
a grammar theory, such a treebank can be constructed at low cost.
Since a large-scale lexicon can then be extracted automatically from
the treebank, a large-scale grammar can be developed in a short
period. In this talk, I give an overview of our method of corpus-based
grammar development, contrasting it with manual grammar development
and grammar learning.
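The abstract does not spell out how the lexicon is extracted. As a rough illustration only, the sketch below assumes a treebank whose leaves already carry a lexical category from the target grammar theory and collects a frequency-annotated lexicon from it. The `(word, category)` input format, the function name `extract_lexicon`, the `min_count` threshold, and the CCG-style category strings are all hypothetical choices made for this example. The formalism and the richer lexical entries used in the work presented in the talk may differ.

```python
from collections import Counter, defaultdict

# Hypothetical input format (an assumption, not from the talk): each sentence
# is a list of (word, lexical_category) pairs read off the leaves of a
# treebank tree that already conforms to the target grammar theory.
Sentence = list[tuple[str, str]]


def extract_lexicon(
    treebank: list[Sentence], min_count: int = 1
) -> dict[str, dict[str, int]]:
    """Map each (lowercased) word to {lexical category: frequency}.

    Word/category pairs seen fewer than `min_count` times are discarded,
    a simple (assumed) filter against rare, possibly noisy annotations.
    """
    # Count every word/category pair that occurs at a leaf.
    counts = Counter(
        (word.lower(), category)
        for sentence in treebank
        for word, category in sentence
    )

    # Group the surviving pairs by word to form lexical entries.
    lexicon: dict[str, dict[str, int]] = defaultdict(dict)
    for (word, category), n in counts.items():
        if n >= min_count:
            lexicon[word][category] = n
    return dict(lexicon)


if __name__ == "__main__":
    # Toy treebank with CCG-style categories, used purely for illustration.
    toy_treebank = [
        [("John", "NP"), ("saw", r"(S\NP)/NP"), ("Mary", "NP")],
        [("Mary", "NP"), ("saw", r"(S\NP)/NP"), ("John", "NP")],
        [("Mary", "NP"), ("slept", r"S\NP")],
    ]
    lex = extract_lexicon(toy_treebank, min_count=2)
    print(sorted(lex))                 # ['john', 'mary', 'saw']
    print(lex["mary"])                 # {'NP': 3}
    print(lex["saw"][r"(S\NP)/NP"])    # 2
```

Keeping frequencies rather than a bare set of categories per word is one possible design. Here the counts are only used to drop rare entries via `min_count`, which is why "slept" (seen once) is missing from the toy output.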

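Japanese title : コーパス構築に基づく大規模文法開発
(English: Large-scale grammar development based on corpus construction)

Japanese abstract (English translation) :
The biggest problem in grammar-based deep parsing is that it is
difficult to implement large-scale grammars capable of analyzing
real-world sentences. In the grammar development method based on
corpus construction, the ultimate goal of grammar development is taken
to be the construction of a treebank (parsed corpus) based on a
grammar theory. Using an existing corpus (such as the Penn Treebank)
and a grammar theory, a treebank can be constructed at relatively low
cost. A large-scale lexicon can then be acquired automatically from
the treebank, which has made it possible to develop a large-scale
grammar in a short period. In this talk, I outline the grammar
development method based on corpus construction, contrasting it with
manual grammar development and grammar learning.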


Tags: Google Tech Talk, computational linguistics, Machine Learning, Google, GoogleTechTalks