Large-scale Image Classification: ImageNet and ObjectBank
Google Tech Talk
May 5, 2011
Presented by Professor Fei-Fei Li, Stanford University
A key challenge in visual recognition is recognizing and labeling a large number of visual concepts, such as object and scene classes. In this talk, I'll discuss two recent projects from the Stanford Vision Lab on this topic. ImageNet is a large-scale image ontology built on the backbone structure of WordNet. I will briefly show how ImageNet is constructed using Amazon Mechanical Turk, and then ask: given the largest publicly available dataset, with tens of thousands of image classes, how do today's state-of-the-art computer vision algorithms perform on large-scale image classification? I will then discuss a new image representation called "Object Bank", which is a significant departure from previous image representation techniques, such as Bag-of-Words models built on low-level features (SIFT, HOG, GIST, etc.). We show that with the Object Bank representation, a simple linear SVM classifier achieves superior performance on all standard image classification datasets. Furthermore, sparsity algorithms make the representation more efficient and scalable for large scene datasets, and reveal semantically meaningful feature patterns.
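As a rough illustration of the idea behind an object-based representation, the sketch below turns the response maps of several object detectors into one feature vector by max-pooling each map over coarse spatial grids, and then scores that vector with a linear decision function of the kind a linear SVM learns. This is a toy sketch under stated assumptions, not the implementation presented in the talk: the function names (`max_pool_grid`, `object_bank_features`, `linear_score`), the grid sizes, the response values, and the weights are all hypothetical, and no detector or SVM training is included.

```python
from typing import List, Sequence

# One detector's response at each image location (rows x columns).
ResponseMap = List[List[float]]


def max_pool_grid(response: ResponseMap, grid: int) -> List[float]:
    """Max-pool a 2D response map over a grid x grid partition.

    Assumes the map has at least `grid` rows and `grid` columns.
    """
    rows, cols = len(response), len(response[0])
    pooled = []
    for gy in range(grid):
        for gx in range(grid):
            r0, r1 = gy * rows // grid, (gy + 1) * rows // grid
            c0, c1 = gx * cols // grid, (gx + 1) * cols // grid
            pooled.append(
                max(response[r][c] for r in range(r0, r1) for c in range(c0, c1))
            )
    return pooled


def object_bank_features(
    responses: Sequence[ResponseMap], grids: Sequence[int] = (1, 2)
) -> List[float]:
    """Concatenate pooled responses of every detector over every grid size."""
    features: List[float] = []
    for response in responses:
        for grid in grids:
            features.extend(max_pool_grid(response, grid))
    return features


def linear_score(
    weights: Sequence[float], bias: float, features: Sequence[float]
) -> float:
    """Linear decision function w . x + b; its sign gives the predicted class."""
    return sum(w * x for w, x in zip(weights, features)) + bias


# Hypothetical response maps from two object detectors on one image.
responses = [
    [[0.1, 0.9], [0.2, 0.3]],
    [[0.0, 0.0], [0.5, 0.4]],
]
features = object_bank_features(responses)

# Hypothetical sparse weights: only the first feature is used.
weights = [1.0] + [0.0] * (len(features) - 1)
score = linear_score(weights, 0.0, features)
```

Each detector contributes 1 + 4 pooled values here, so the two detectors give a 10-dimensional vector. A weight vector with many zeros, as in the hypothetical `weights` above, is the kind of result a sparsity-inducing regularizer aims for, since only a few object responses then influence the score.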