The biological domain poses new challenges for statistical learning. In this talk we will analyze, and explain theoretically, some counter-intuitive experimental and theoretical findings: when switching from training data to independent test data, classifier decisions can be systematically reversed (the phenomenon of anti-learning). We demonstrate this on both natural and synthetic data and show that it is distinct from overfitting. The natural datasets discussed will include prediction of response to chemoradiotherapy for esophageal cancer from gene expression measured by cDNA microarrays, and prediction of genes affecting the aryl hydrocarbon receptor pathway in yeast. The main synthetic classification problem will be the approximation of samples drawn from high-dimensional distributions, for which a theoretical explanation will be outlined.
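As a toy illustration only (not one of the datasets from the talk and not the author's method), the Python sketch below shows what a systematic reversal of classifier decisions can look like. It assumes a hypothetical one-dimensional dataset whose labels alternate along the line, so every point's nearest neighbour belongs to the opposite class. A 1-nearest-neighbour classifier then fits the training data perfectly, yet under leave-one-out evaluation it is wrong on every held-out point. Overfitting would typically drive test accuracy toward chance (50%); here it falls to 0%, so reversing the classifier's outputs would be correct on every held-out point.

```python
def nn_predict(train_x, train_y, x):
    # 1-nearest-neighbour prediction using absolute distance on the real line
    best = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    return train_y[best]


def train_accuracy(xs, ys):
    # Accuracy when predicting the training points themselves
    # (each point is its own nearest neighbour at distance 0)
    correct = sum(nn_predict(xs, ys, x) == y for x, y in zip(xs, ys))
    return correct / len(xs)


def loo_accuracy(xs, ys):
    # Leave-one-out accuracy: hold out each point, predict it from the rest
    correct = 0
    for i in range(len(xs)):
        tx = xs[:i] + xs[i + 1:]
        ty = ys[:i] + ys[i + 1:]
        if nn_predict(tx, ty, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)


# Hypothetical toy data: 20 points on a line with alternating labels +1/-1
xs = list(range(20))
ys = [1 if i % 2 == 0 else -1 for i in xs]

print(train_accuracy(xs, ys))  # 1.0
print(loo_accuracy(xs, ys))    # 0.0
```

The names `nn_predict`, `train_accuracy` and `loo_accuracy` are illustrative. The point of the sketch is that held-out accuracy far below chance is a systematic effect rather than the random degradation usually associated with overfitting.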
Author: Adam Kowalczyk, National ICT Australia