Enabling Object Search rather than Page Search
Many users really want to find objects rather than pages. In the past few years, there has been significant progress in research on machine learning methods for information extraction (IE).
I'll describe some of this progress, focusing on recent machine learning advances in entity resolution and schema matching: the combination of logical clauses and probability, significant gains in accuracy, and scalable engineering. I'll also describe the deployment of these ideas in http://rexa.info, a digital library of CS research papers with entities and relations for papers, people, grants, topics, and soon venues and institutions.
Speaker: Andrew McCallum
Andrew McCallum is an Associate Professor at the University of Massachusetts, Amherst. He was previously Vice President of Research and Development at WhizBang Labs, a company that used machine learning for information extraction from the Web. In the late 1990s he was a Research Scientist and Coordinator at Justsystem Pittsburgh Research Center. He was a post-doctoral fellow at Carnegie Mellon University after receiving his PhD from the University of Rochester in 1995. He is on the editorial board of the Journal of Machine Learning Research. For the past eight years, McCallum has been active in research on statistical machine learning applied to text, especially information extraction, document classification, finite state models, and semi-supervised learning.
Google Tech Talks
March 11, 2008