Dimensionality Reduction by Feature Selection in Machine Learning
Dimensionality reduction is a commonly used step in machine learning, especially when dealing with a high-dimensional feature space. The original feature space is mapped onto a new space of reduced dimensionality, and the examples used by machine learning algorithms are represented in that new space. The mapping is usually performed by selecting a subset of the original features, by constructing new features, or both. This presentation deals with the first approach, feature subset selection. We provide a brief overview of the feature subset selection techniques commonly used in machine learning and give a more detailed description of feature subset selection for machine learning on text data. The performance of some methods used in document categorization is illustrated by an experimental comparison on real-world data collected from the Web.
Author: Dunja Mladenić, Jožef Stefan Institute
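As a rough illustration of feature subset selection on text data, the sketch below scores each word by information gain with respect to the class label and keeps the top-scoring words. The abstract does not name the scoring measures covered in the talk, so information gain is an assumption chosen here only because it is a common filter criterion for text. The toy documents, the labels, and the function names (`entropy`, `information_gain`, `select_features`) are hypothetical.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, word):
    """Reduction in class entropy from knowing whether `word` occurs in a document."""
    n = len(docs)
    with_word = [lab for doc, lab in zip(docs, labels) if word in doc]
    without_word = [lab for doc, lab in zip(docs, labels) if word not in doc]
    prior = entropy(list(Counter(labels).values()))
    conditional = 0.0
    for subset in (with_word, without_word):
        if subset:
            conditional += len(subset) / n * entropy(list(Counter(subset).values()))
    return prior - conditional


def select_features(docs, labels, k):
    """Return the k words with the highest information gain (ties broken alphabetically)."""
    vocabulary = sorted(set().union(*docs))
    ranked = sorted(vocabulary, key=lambda w: (-information_gain(docs, labels, w), w))
    return ranked[:k]


# Toy corpus: each document is represented as a set of words (hypothetical data).
docs = [
    {"cheap", "pills", "buy"},
    {"buy", "cheap", "now"},
    {"meeting", "agenda", "now"},
    {"project", "meeting", "notes"},
]
labels = ["spam", "spam", "ham", "ham"]

selected = select_features(docs, labels, k=3)

# Represent every document in the reduced feature space.
reduced_docs = [doc & set(selected) for doc in docs]
```

The selected words form a subset of the original vocabulary, so the reduced representation stays directly interpretable. Constructing new features, the second approach mentioned in the abstract, would not have this property.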