Searching the Web by Discovering and Clustering Related Terms

Posted in Science on September 04, 2008


The amount of information on the web is growing so fast that classical search engines find it increasingly difficult to retrieve relevant information. As the number of webpages written in different languages, and sometimes in mis-interpreted languages, keeps rising, the degree of ambiguity of human language on the web has reached levels unseen so far. Yet people still query these systems with no more than two words on average. New information retrieval systems are therefore needed to reduce the ambiguity of queries, and such systems usually rely on query expansion techniques. In this talk, I will present a system that automatically discovers terms related to the query as a means of helping the user search for relevant information.

This technique belongs to the family of Interactive Query Expansion systems. Unlike other systems, however, we use Web Mining techniques to discover related terms based on features such as association measures, document similarity, and document relevance. In the second part of my talk, I will present the future extensions of our retrieval system, which are based on the automatic discovery of relations between related terms. By using agglomerative clustering techniques and an auto-fed WebWarehouse, we hope to propose less ambiguous query expansion terms than present systems, in which users must sort out for themselves the terms they are interested in.
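To make the two ideas above more concrete, here is a minimal Python sketch, not the system presented in the talk. It assumes pointwise mutual information (PMI) over document co-occurrence as the association measure and average-link agglomerative clustering over Jaccard similarity as the grouping step; the talk does not say which measures are actually used. The toy corpus, the function names, and the threshold are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations
import math

# Hypothetical toy corpus: the query "jaguar" is ambiguous (car vs. animal).
DOCS = [
    "jaguar car engine speed",
    "jaguar car dealer price",
    "jaguar cat jungle prey",
    "jaguar cat habitat jungle",
    "python snake jungle",
]

def build_postings(docs):
    """Map each term to the set of document indices in which it occurs."""
    postings = defaultdict(set)
    for i, doc in enumerate(docs):
        for term in set(doc.lower().split()):
            postings[term].add(i)
    return postings

def pmi(postings, n_docs, a, b):
    """Pointwise mutual information of two terms (document co-occurrence)."""
    joint = len(postings[a] & postings[b])
    if joint == 0:
        return float("-inf")
    return math.log2(joint * n_docs / (len(postings[a]) * len(postings[b])))

def related_terms(docs, query, top_k=5):
    """Rank terms co-occurring with the query by PMI (assumed measure)."""
    postings = build_postings(docs)
    scored = [(pmi(postings, len(docs), query, t), t)
              for t in list(postings) if t != query]
    scored = [(s, t) for s, t in scored if s != float("-inf")]
    scored.sort(key=lambda st: (-st[0], st[1]))
    return [t for _, t in scored[:top_k]]

def jaccard(postings, a, b):
    """Jaccard similarity of the document sets of two terms."""
    union = postings[a] | postings[b]
    return len(postings[a] & postings[b]) / len(union) if union else 0.0

def agglomerative(terms, postings, threshold=0.4):
    """Average-link agglomerative clustering; stops below the threshold."""
    clusters = [[t] for t in terms]

    def link(c1, c2):
        total = sum(jaccard(postings, a, b) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda p: link(clusters[p[0]], clusters[p[1]]))
        if link(clusters[i], clusters[j]) < threshold:
            break
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]                          # safe because i < j
    return clusters
```

On this toy corpus, calling `agglomerative(["car", "dealer", "cat", "jungle"], build_postings(DOCS))` groups the terms into a car sense and an animal sense. This is the kind of grouping that could spare the user from sorting expansion terms by hand. Note that raw PMI tends to favour rare terms, so a real system would likely combine it with other features.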

Web Spider
Web Spider is a system that returns all related terms and links from a given URL and a given query.
The Spider has been developed using the C5.0 machine learning algorithm.
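
As a rough illustration of the spider's input and output, the following Python sketch takes the HTML of a page, its URL, and a query, and returns nearby terms and outgoing links. It does not reproduce the C5.0-based approach mentioned above: a simple word-window heuristic stands in for the learned model. The names `PageParser` and `spider_page`, the stopword list, and the window size are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
import re

# Hypothetical, deliberately tiny stopword list for the illustration.
STOPWORDS = {"the", "is", "a", "an", "of", "and", "to", "in"}

class PageParser(HTMLParser):
    """Collect visible text and absolute link targets from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.text_parts = []
        self._skip = 0  # depth inside <script>/<style> elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.text_parts.append(data)

def spider_page(html, base_url, query, window=4):
    """Return (related_terms, links) for one page and a one-word query.

    Related terms are the non-stopword tokens found within `window`
    tokens of an occurrence of the query (an assumed heuristic).
    """
    parser = PageParser(base_url)
    parser.feed(html)
    parser.close()
    tokens = re.findall(r"[a-z0-9]+", " ".join(parser.text_parts).lower())
    q = query.lower()
    related = []
    for i, tok in enumerate(tokens):
        if tok != q:
            continue
        for near in tokens[max(0, i - window): i + window + 1]:
            if near != q and near not in STOPWORDS and near not in related:
                related.append(near)
    return related, parser.links
```

In a full spider, the page would first be fetched from the given URL and the returned links followed in turn. The sketch leaves fetching out so that it runs without network access.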

Author: Gaël Dias, Beira Interior University

Tags: Science, Lectures, Computer Science, Clustering, Machine Learning, VideoLectures.Net