# Bayesian nonparametrics in document and language modeling

Bayesian nonparametric models have garnered significant attention in recent years in both the machine learning and statistics communities. These are highly flexible models whose complexity grows with the amount of data, and they offer an elegant approach to the common problem of model selection. In this talk I shall first give a brief overview of Dirichlet processes and infinite mixture models, the cornerstone of Bayesian nonparametric models. Then I shall introduce the hierarchical Dirichlet process, a Bayesian nonparametric model for problems involving multiple related groups of data. I illustrate the use of hierarchical Dirichlet processes with some applications to document and language modeling.
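The Dirichlet process mentioned above is commonly introduced via the Chinese restaurant process metaphor: each new data point joins an existing cluster ("table") with probability proportional to that cluster's size, or starts a new one with probability proportional to a concentration parameter. A minimal sketch of this sampler is below (the function name, seed, and the choice of `alpha` are illustrative, not from the talk):

```python
import random

def crp_partition(n, alpha, seed=0):
    """Sample a random partition of n customers via the Chinese restaurant process.

    Customer i sits at an existing table with probability proportional to the
    table's current size, or at a new table with probability proportional to alpha.
    """
    rng = random.Random(seed)
    tables = []        # current size of each table (cluster)
    assignments = []   # table index chosen by each customer
    for i in range(n):
        # total unnormalized weight: i customers already seated, plus alpha
        r = rng.uniform(0, i + alpha)
        cum = 0.0
        k = len(tables)            # default: open a new table
        for j, size in enumerate(tables):
            cum += size
            if r < cum:
                k = j
                break
        if k == len(tables):
            tables.append(1)       # new table
        else:
            tables[k] += 1         # join existing table
        assignments.append(k)
    return assignments, tables

assignments, tables = crp_partition(100, alpha=1.0)
print(len(tables))   # number of occupied tables; grows slowly with n
print(sum(tables))   # every customer is seated exactly once
```

The key nonparametric property is visible here: the number of clusters is not fixed in advance but grows (roughly logarithmically, for fixed `alpha`) with the number of data points.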

*Google Tech Talks, August 28, 2008*

**Speaker: Yee Whye Teh**

I am interested in statistical machine learning and its applications. Specifically, I look into theories, models and methodologies to make graphical models applicable to large and complex problems. I am also keen on deploying the knowledge gained in applications ranging from natural language processing to machine vision to biological problems.

I studied Computer Science and Mathematics at the University of Waterloo, obtaining my B.Math. in 1997. I then embarked on my graduate studies at the University of Toronto under the tutelage of Geoffrey Hinton. Between 1999 and 2001 I spent two years in London, England at the Gatsby Computational Neuroscience Unit, where Geoff was the founding director. In 2001 we returned to Toronto and I finished my Ph.D. in 2003. Immediately afterwards I started a postdoc with Michael Jordan at UC Berkeley, finishing in 2004. Since January 2007 I have been a lecturer back at the Gatsby Unit.