Peer to Peer Web Search with Minerva
Google TechTalks
August 3, 2006

Gerhard Weikum

ABSTRACT

The peer-to-peer (P2P) computing paradigm is an intriguing alternative to Google-style search engines for querying and ranking Web content. In a network with many thousands or millions of peers, the storage and access load per peer is much lighter than for a centralized server farm. On the other hand, P2P Web search also poses major challenges, one of them being the computation, dissemination, and efficient management of the statistical measures that are crucial for good search strategies and ranking algorithms. Statistics (e.g., local and global document frequencies, overlap among peers' contents, PageRank-style authority) need to be acquired and maintained in a decentralized manner for scalability; they need to be compact for efficient communication; and they need to provide sufficiently accurate estimators of the various measures of interest.

This talk will give an overview of our ongoing research on P2P Web search, with emphasis on statistics-driven query routing, decentralized PageRank computation, and exploitation of user behavior. The developed methods have been implemented in the Minerva prototype system, an experimental testbed for P2P research.
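To make the idea of statistics-driven query routing concrete, here is a minimal Python sketch. It is an illustration only, not Minerva's actual routing method: the function names (`local_document_frequencies`, `rank_peers`), the scoring formula (log-damped local document frequency weighted by an inverse peer frequency), and the toy peer collections are all assumptions made for this example. Each peer summarizes its collection as per-term document frequencies, and a query initiator uses those compact summaries to pick the most promising peers.

```python
import math
from collections import Counter


def local_document_frequencies(docs):
    """For each term, count how many of this peer's documents contain it."""
    df = Counter()
    for doc in docs:
        # set() so a term is counted at most once per document
        df.update(set(doc.lower().split()))
    return df


def rank_peers(query_terms, peer_stats, top_k=2):
    """Rank peers for a query using only their published term statistics.

    Hypothetical scoring: for each query term a peer holds, add
    log(1 + local df) * log(1 + n_peers / peers_holding_term).
    """
    n_peers = len(peer_stats)
    # Number of peers whose collection contains each query term.
    peer_freq = {
        t: sum(1 for df in peer_stats.values() if df.get(t, 0) > 0)
        for t in query_terms
    }
    scores = {}
    for peer, df in peer_stats.items():
        score = 0.0
        for t in query_terms:
            if df.get(t, 0) > 0:
                inverse_peer_freq = math.log(1 + n_peers / peer_freq[t])
                score += math.log(1 + df[t]) * inverse_peer_freq
        scores[peer] = score
    # Highest score first; ties broken by peer name for determinism.
    return sorted(scores, key=lambda p: (-scores[p], p))[:top_k]


# Toy collections (made up for illustration).
peers = {
    "peer_a": ["p2p web search", "web search ranking", "p2p query routing"],
    "peer_b": ["cooking pasta", "web recipes"],
    "peer_c": ["p2p file sharing"],
}
stats = {name: local_document_frequencies(docs) for name, docs in peers.items()}
selected = rank_peers(["p2p", "search"], stats)
print(selected)
```

In this sketch the query is forwarded only to the `top_k` peers, so the per-peer summaries must be small enough to disseminate cheaply yet accurate enough to rank peers well, which is the trade-off the abstract describes. Overlap among peers' contents and link-based authority are not modeled here.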