Universal Coding/Prediction and Statistical (In)consistency of Bayesian inference
Part of this talk is based on results of A. Barron (1986) and on recent joint work with J. Langford (2004). We introduce the information-theoretic concepts of universal coding and prediction. Under weak conditions on the prior, Bayesian sequential prediction is universal: a code based on the Bayesian predictive distribution allows one to substantially compress data. We give a simple proof that universality implies consistency of the Bayesian posterior. It follows that Bayesian inconsistency in nonparametric settings (à la Diaconis & Freedman) can occur only if the priors used do not allow for data compression. This gives a frequentist rationale for Rissanen's Minimum Description Length Principle. We also show that, under misspecification, Bayesian predictions can substantially outperform the predictions of the best distribution in the model. Ironically, this implies that the Bayesian posterior can become *inconsistent*: in some sense, good predictive performance implies inconsistency!
Author: Peter Grünwald, National Research Institute for Mathematics and Computer Science
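
The universality claim in the abstract can be illustrated with a small sketch. It is not taken from the talk: it assumes a finite model of five Bernoulli distributions with a uniform prior, and all function and variable names are hypothetical. In that setting the code length of sequential Bayesian prediction exceeds the code length of the best distribution in the model (chosen in hindsight) by at most log2(K) bits, where K is the number of distributions.

```python
import math
import random


def bayes_code_length(data, thetas, prior):
    """Total code length in bits when each outcome is encoded with the
    Bayesian predictive distribution over a finite Bernoulli model."""
    posterior = list(prior)
    total_bits = 0.0
    for x in data:
        # Likelihood of the outcome x under each Bernoulli parameter.
        likes = [t if x == 1 else 1.0 - t for t in thetas]
        # Bayesian predictive probability: posterior-weighted mixture.
        p = sum(w * l for w, l in zip(posterior, likes))
        total_bits += -math.log2(p)
        # Bayes' rule: update the posterior after observing x.
        posterior = [w * l / p for w, l in zip(posterior, likes)]
    return total_bits


def best_code_length(data, thetas):
    """Code length in bits of the single best parameter in hindsight."""
    ones = sum(data)
    zeros = len(data) - ones
    return min(
        -(ones * math.log2(t) + zeros * math.log2(1.0 - t)) for t in thetas
    )


def demo(n=1000, seed=0):
    # Assumed setup: five candidate biases, uniform prior, data drawn
    # from a Bernoulli(0.7) source. These choices are illustrative only.
    rng = random.Random(seed)
    thetas = [0.1, 0.3, 0.5, 0.7, 0.9]
    prior = [1.0 / len(thetas)] * len(thetas)
    data = [1 if rng.random() < 0.7 else 0 for _ in range(n)]
    bayes = bayes_code_length(data, thetas, prior)
    best = best_code_length(data, thetas)
    return bayes, best, math.log2(len(thetas))


if __name__ == "__main__":
    bayes, best, bound = demo()
    print(f"Bayes code length: {bayes:.2f} bits")
    print(f"Best in hindsight: {best:.2f} bits")
    print(f"Regret: {bayes - best:.2f} bits (bound: {bound:.2f})")
```

The extra cost stays bounded while the sequence length grows, so the per-outcome overhead vanishes. The nonparametric and misspecified settings discussed in the talk are not covered by this finite example.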