Using Statistics to Search and Annotate Pictures
Google Tech Talks
September 25, 2006
Nuno Vasconcelos is an Assistant Professor in the Electrical and Computer Engineering Department of the University of California, San Diego, where he heads the Statistical Visual Computing Laboratory. Before joining UCSD, he was a member of the research staff at the Compaq Cambridge Research Laboratory, which later became the HP Cambridge Research Laboratory. He received a PhD from MIT in 2000, and his research interests include computer vision, statistical signal processing, machine learning, and multimedia. He is the recipient of a 2005 NSF CAREER award and a Hellman Fellowship.
The last decade has produced significant advances in content-based image retrieval, i.e., the design of computer vision systems for image search. I will review our efforts in the area, with emphasis on semantic retrieval: learning to annotate images in order to support natural-language queries. In particular, I will argue for a retrieval framework that combines the best properties of classical "query by visual example" (QBVE) and more recent semantic methods, and which we call "query by semantic example" (QBSE). Although simple, this framework, when combined with ideas from multiple instance learning, can be quite powerful. It improves semantic retrieval along a number of dimensions, the most notable of which is generalization to out-of-vocabulary queries. It can also be compared directly to query by visual example, making it possible to quantify the gains of representing images in semantic spaces. Our results show that these gains are quite significant, even when the semantic characterization is noisy and somewhat unreliable. This suggests an interesting hypothesis for computer vision: it may suffice to adopt simple visual models, as long as they operate at various levels of abstraction and are learned from large amounts of data.
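To make the QBSE idea concrete, here is a minimal, hypothetical sketch. It assumes each image has already been summarized as a "semantic multinomial" (a distribution over a small concept vocabulary, as would be produced by a bank of trained concept classifiers); the vocabulary, image names, probability values, and the choice of KL divergence as the similarity measure are all illustrative assumptions, not the talk's actual system.

```python
import math

# Hypothetical concept vocabulary; a real system would learn hundreds
# of concepts from annotated training data.
VOCABULARY = ["sky", "water", "people", "building", "grass"]

def kl_divergence(p, q, eps=1e-9):
    """KL divergence D(p || q) between two discrete distributions
    (smaller means q is a better match for p)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def qbse_rank(query_sm, database):
    """Rank database images by closeness of their semantic multinomials
    to the query image's, in the semantic space rather than pixel space."""
    scored = [(name, kl_divergence(query_sm, sm)) for name, sm in database.items()]
    return sorted(scored, key=lambda item: item[1])

# Made-up semantic multinomials, one per image, over VOCABULARY.
query = [0.50, 0.30, 0.05, 0.05, 0.10]          # a beach-like query scene
database = {
    "coast.jpg":  [0.45, 0.35, 0.05, 0.05, 0.10],
    "city.jpg":   [0.10, 0.05, 0.30, 0.50, 0.05],
    "meadow.jpg": [0.30, 0.05, 0.05, 0.10, 0.50],
}

ranking = qbse_rank(query, database)
# The coast image, whose concept distribution is closest to the query's,
# ranks first even though no pixels were compared.
```

The point of the sketch is the contrast with QBVE: retrieval operates on concept probabilities rather than low-level visual features, which is what allows queries to generalize beyond any single image's appearance.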