Robust Projected Clustering with P3C
Clustering is the task of finding groups in data. While traditional clustering algorithms typically measure similarity between objects by considering all attributes (features, dimensions) of the data objects, projected clustering algorithms attempt to find clusters that may exist only in subspaces, i.e., subsets of attributes. The problem of finding projected clusters is motivated by the fact that, in high-dimensional data, notions of similarity become less and less meaningful as the dimensionality increases, and meaningful clusters may exist only in smaller subspaces, possibly different ones for different clusters.
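The motivation above can be illustrated with a small synthetic experiment. The following is only a sketch and is not part of P3C: the function names, the choice of two relevant attributes, and the Gaussian spread of 0.02 are all assumptions made for illustration. It plants a group of points that agree only on two attributes and compares how tight that group looks, relative to uniform noise, when distances use all attributes versus only the relevant subspace.

```python
import math
import random


def euclidean(p, q, dims):
    # Euclidean distance restricted to the attribute indices in `dims`.
    return math.sqrt(sum((p[i] - q[i]) ** 2 for i in dims))


def mean_pairwise(points, dims):
    # Average distance over all unordered pairs of points.
    total, count = 0.0, 0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            total += euclidean(points[i], points[j], dims)
            count += 1
    return total / count


def contrast(n_dims, relevant=(0, 1), n_cluster=40, n_noise=40, seed=0):
    # Hypothetical setup: cluster members are uniform in [0, 1] on every
    # attribute except the `relevant` ones, where they concentrate near 0.5.
    rng = random.Random(seed)
    cluster = []
    for _ in range(n_cluster):
        p = [rng.random() for _ in range(n_dims)]
        for i in relevant:
            p[i] = rng.gauss(0.5, 0.02)
        cluster.append(p)
    noise = [[rng.random() for _ in range(n_dims)] for _ in range(n_noise)]

    # Ratio of intra-cluster distance to noise distance: values near 1 mean
    # the cluster is indistinguishable from noise, values near 0 mean it is tight.
    all_dims = range(n_dims)
    full = mean_pairwise(cluster, all_dims) / mean_pairwise(noise, all_dims)
    sub = mean_pairwise(cluster, relevant) / mean_pairwise(noise, relevant)
    return full, sub


if __name__ == "__main__":
    full, sub = contrast(n_dims=50)
    print(f"full-space ratio: {full:.2f}, subspace ratio: {sub:.2f}")
```

Under these assumptions, the full-space ratio stays close to 1 because the many irrelevant attributes dominate the distance, while the ratio in the relevant subspace is much smaller. A projected clustering algorithm has to discover both the group and its relevant attributes; this sketch is given the relevant attributes and only shows why full-space distances can hide such a group.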
In this talk, I will briefly discuss some prominent approaches to projected clustering and then present in more detail P3C, a projected clustering algorithm that we proposed recently. P3C does not require many parameter values, which are often difficult to set, and, under certain conditions that it shares with most of the approaches proposed in the literature so far, it can discover the true number of projected clusters. P3C is effective in detecting very low-dimensional projected clusters embedded in high-dimensional spaces. It is also one of the few projected clustering algorithms that can be extended to deal with categorical data.
Speaker: Jörg Sander
Jörg Sander is currently an Associate Professor at the University of Alberta, Canada. He received his MS in Computer Science in 1996 and his PhD in Computer Science in 1998, both from the University of Munich, Germany. He has authored more than 30 papers in international conferences and journals. His current research interests include spatial and spatio-temporal databases, as well as knowledge discovery in databases, especially clustering and data mining in spatial and high-dimensional data sets.
Google Tech Talks
March 20, 2008