Similarity Search: A Web Perspective
Google Tech Talks
October 18, 2007
Similarity search is the problem of preprocessing a database of N objects in such a way that, given a query object, one can efficiently determine its nearest neighbors in the database. The "geometric near-neighbor access tree" data structure, an early work (1995) by Sergey Brin, is one of the best-known solutions to this problem.
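For reference, the problem statement can be sketched as a brute-force linear scan, which does no preprocessing and compares the query against all N objects. This is only a baseline under assumed conventions (objects as numeric tuples, Euclidean distance); the names `euclidean` and `nearest_neighbors` are illustrative and do not come from the talk, and this is not the tree structure mentioned above.

```python
import heapq
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbors(database, query, k=1, distance=euclidean):
    """Return the k database objects closest to the query, nearest first.

    Baseline only: every query scans all N objects, which is the cost
    that preprocessing-based index structures aim to avoid.
    """
    return heapq.nsmallest(k, database, key=lambda obj: distance(obj, query))


if __name__ == "__main__":
    db = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (2.0, 2.0)]
    print(nearest_neighbors(db, (1.2, 1.2), k=2))
```

Any other distance function can be passed in place of `euclidean`, since the scan makes no assumption about the metric beyond being able to compare the returned values.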
Similarity search is an abstraction of many algorithmic problems we face on the web and in data management. In this talk we will focus on:
- Personalized news aggregation: Searching for the news articles that are most similar to the user's profile of interests.
- Behavioral targeting: Searching for the most relevant advertisement to display to a given user.
- Social network analysis: Suggesting new friends.
- Computing co-occurrence similarities.
- "Best match search": Searching resumes, jobs, BF/GF, cars, apartments.
We describe features that make web applications somewhat different from previously studied models, and we therefore re-examine the formalization and the classical algorithms for similarity search. This leads us to new algorithms (we present two of them) and numerous open problems in the field.
Speaker: Yury Lifshits
Yury Lifshits obtained his PhD degree from Steklov Institute of Mathematics at S...