Special Seminar in Applied Mathematics
Bayesian inference can behave badly if the model under consideration is wrong yet useful: the posterior may fail to concentrate even for large samples, leading to extreme overfitting in practice. We introduce a test that can tell from the data whether we are heading for such a situation. If we are, we adjust the learning rate (equivalently: make the prior lighter-tailed, or penalize the likelihood more) in a data-dependent way. The resulting "safe" estimator continues to achieve good rates with wrong models. In classification problems, it learns faster in easy settings, i.e., when a Tsybakov condition holds. The safe estimator is based on empirical mixability, which generalizes an idea from worst-case online prediction. Thus, safe estimation connects three paradigms: Bayesian inference, (frequentist) statistical learning theory, and (worst-case) online prediction.
* For an informal introduction to the idea, see Larry Wasserman's blog
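
As a rough illustration of what "adjusting the learning rate" means, the abstract's parenthetical can be written out using the standard generalized (tempered) posterior. The notation below is assumed for illustration and does not appear in the abstract: \pi is the prior, p_\theta the model density, z_1, ..., z_n the data, and \eta the learning rate. The data-dependent rule for choosing \eta is not given in the abstract and is not reproduced here.

```latex
% Generalized posterior with learning rate \eta (assumed notation):
\[
  \pi_\eta(\theta \mid z_1,\dots,z_n)
  \;\propto\;
  \pi(\theta)\,\prod_{i=1}^{n} p_\theta(z_i)^{\eta},
  \qquad 0 < \eta \le 1 .
\]
% \eta = 1 recovers standard Bayes. For \eta < 1, raising the right-hand
% side to the power 1/\eta leaves the maximizer over \theta unchanged:
\[
  \Bigl(\pi(\theta)\,\prod_{i=1}^{n} p_\theta(z_i)^{\eta}\Bigr)^{1/\eta}
  \;=\;
  \pi(\theta)^{1/\eta}\,\prod_{i=1}^{n} p_\theta(z_i),
\]
% so the likelihood is unchanged while the prior is raised to a power
% 1/\eta > 1, i.e. made lighter-tailed.
```

Read this way, a smaller \eta gives the prior more weight relative to the likelihood, which matches the abstract's "make the prior lighter-tailed, or penalize the likelihood more".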