A theory of similarity functions for learning and clustering
Kernel methods have proven to be very powerful tools in machine learning. In addition, there is a well-developed theory of sufficient conditions for a kernel to be useful for a given learning problem. However, while a kernel function can be thought of as just a pairwise similarity function that satisfies additional mathematical properties, this theory requires viewing kernels as implicit (and often difficult to characterize) maps into high-dimensional spaces. In this talk I will describe a more general theory that applies to more general similarity functions (not just legal kernels) and furthermore describes the usefulness of a given similarity function in terms of more intuitive, direct properties of the induced weighted graph. An interesting feature of the proposed framework is that it can also be applied to learning from purely unlabeled data, i.e., clustering. In particular, one can ask how much stronger the properties of a similarity function should be (in terms of its relation to the unknown desired clustering) so that it can be used to *cluster* well. Investigating this question leads to a number of interesting graph-theoretic properties, and their analysis in the inductive setting uses regularity-lemma type results of [FK99,AFKK03]. This work is joint with Maria-Florina Balcan and Santosh Vempala.
Author: avrim blum, carnegie mellon university