Kernel Methods in Computational Biology
Many problems in computational biology and chemistry can be formalized as classical statistical problems, e.g., pattern recognition, regression or dimension reduction, with the caveat that the data are often not vectors. Objects such as gene sequences, small molecules, protein 3D structures or phylogenetic trees, to name just a few, have particular structures that carry relevant information for the statistical problem but are difficult to encode as finite-dimensional vectors. Kernel methods are a class of algorithms well suited to such problems: they extend many statistical methods originally designed for vectors to virtually any type of data, without requiring an explicit vectorization. The price to pay for this extension to non-vectors is the need to define a positive definite kernel between the objects, which is formally equivalent to an implicit vectorization of the data.
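To make the idea of an implicit vectorization concrete, here is a minimal sketch of one well-known kernel for sequences, the spectrum kernel, which compares two strings through the counts of their length-k substrings. The abstract does not name any particular kernel, so the choice of kernel, the function names, the value k = 3 and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter

def kmer_counts(seq, k=3):
    """Count every length-k substring of seq.

    These counts are the (sparse) feature vector that the kernel
    uses implicitly; it is never stored as a dense vector.
    """
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def spectrum_kernel(x, y, k=3):
    """Inner product of the k-mer count vectors of sequences x and y."""
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    # Only k-mers present in x can contribute; missing keys count as 0.
    return sum(n * cy[kmer] for kmer, n in cx.items())

def gram_matrix(seqs, k=3):
    """Matrix of pairwise kernel values, the input a kernel method needs."""
    return [[spectrum_kernel(a, b, k) for b in seqs] for a in seqs]

# Toy sequences (hypothetical data, for illustration only).
seqs = ["ACGTACGT", "ACGTTT", "GGGCCC"]
K = gram_matrix(seqs)
for row in K:
    print(row)
```

Because each kernel value is an inner product of count vectors, the resulting matrix is symmetric and positive semidefinite by construction, which is the property that lets algorithms designed for vectors operate on the sequences through K alone.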
Author: Jean-Philippe Vert, Ecole des Mines de Paris - Paris Tech