The Long Road from Text to Meaning
Computers have given us a new way of thinking about language. Given a large sample of language, or corpus, and computational tools to process it, we can approach language as physicists approach forces and chemists approach chemicals. This approach is noteworthy for missing out what, from a language-user's point of view, is important about a piece of language: its meaning.
I shall present this empiricist approach to the study of language and show how, as we develop accurate tools for lemmatisation, part-of-speech tagging and parsing, we move from the raw input -- a character stream -- to an analysis of that stream in increasingly rich terms: words, lemmas, grammatical structures, Fillmore-style frames. Each step on the journey builds on a large corpus accurately analysed at the previous levels. A distributional thesaurus provides generalisations about lexical behaviour which can then feed into an analysis at the ‘frames' level. The talk will be illustrated with work done within the ‘Sketch Engine' tool.
For much NLP and linguistic theory, meaning is a given. Thus formal semantics assumes meanings for words, in order to address questions of how they combine, and WSD (word sense disambiguation) typically takes a set of meanings (as found in a dictionary) as a starting point and sets itself the challenge of identifying which meaning applies. But, since the birth of philosophy, meaning has been problematic. In our approach meaning is an eventual output of the research programme, not an input.
Author: Adam Kilgarriff