Periodic Reporting for period 2 - Stein-ML (Stein’s method and functional inequalities in machine learning)
Reporting period: 2022-12-01 to 2023-11-30
The aim of this research project was to advance the development of quality measures for approximations in machine learning and statistics, using the rich theoretical machinery of mathematical analysis and probability. The project concluded with five papers, each targeting a specific approximation. Specifically, together with collaborators, we achieved the following goals. We constructed fully computable quality guarantees for the Laplace approximation of a Bayesian posterior, with respect to a variety of useful divergences. We also constructed a new functional-data goodness-of-fit test for Gaussian process targets and for measures absolutely continuous with respect to Gaussians. Moreover, we provided a novel targeted accuracy diagnostic for distributional approximations. Furthermore, in another paper, we proved a functional version of the celebrated de Jong theorem, which describes the asymptotic behaviour of U-statistics. Finally, we proved new Berry-Esseen bounds for vector-valued statistics of binomial processes, with respect to the convex distance.
Firstly, we have constructed a new goodness-of-fit test for functional data, applicable to Gaussian process targets and to measures absolutely continuous with respect to Gaussians.

Secondly, we have derived quality guarantees for a common approximate inference method, the Laplace approximation. The idea of this method is to replace an intractable posterior in Bayesian inference with a suitably chosen Gaussian distribution. Our guarantees are fully computable from the data. They also control the quantities most commonly reported by users of Bayesian inference: posterior means, posterior variances and posterior credible sets. We have shown that the sample-size and dimension dependence of our guarantees cannot be improved under assumptions as general as ours.
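The mechanics of the Laplace approximation described above can be illustrated with a minimal sketch. The model and all names below are illustrative toy choices, not the project's code: a one-dimensional conjugate Gaussian model, where the posterior mode and the curvature at the mode are easy to compute.

```python
# Illustrative sketch of the Laplace approximation (toy model, not the
# project's implementation): approximate a posterior by a Gaussian centred
# at the MAP estimate, with variance given by the inverse curvature
# (Hessian) of the negative log posterior at the mode.
import numpy as np
from scipy.optimize import minimize

def neg_log_posterior(theta, data):
    # Toy model: Gaussian likelihood with unit variance, standard normal prior.
    t = theta[0]  # scipy passes a 1-d array
    return 0.5 * np.sum((data - t) ** 2) + 0.5 * t ** 2

def laplace_approximation(data):
    # 1. Locate the posterior mode (the MAP estimate) numerically.
    res = minimize(neg_log_posterior, x0=np.array([0.0]), args=(data,))
    map_estimate = res.x[0]
    # 2. The curvature at the mode gives the Gaussian precision. For this
    #    toy model the second derivative is available in closed form:
    #    d^2/dt^2 [0.5*sum((x-t)^2) + 0.5*t^2] = n + 1.
    precision = len(data) + 1.0
    return map_estimate, 1.0 / precision  # Gaussian mean and variance

data = np.array([0.8, 1.2, 1.0, 0.9])
mean, var = laplace_approximation(data)
```

In this conjugate toy model the posterior is exactly Gaussian, so the Laplace approximation recovers it (mean = sum(x)/(n+1), variance = 1/(n+1)); the guarantees developed in the project address the realistic case where the posterior is not Gaussian and the approximation error must be bounded.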
Thirdly, we have completed a project in which we design a new diagnostic tool for the quality of Bayesian approximations, in particular variational inference. Our diagnostic produces bounds on the error of posterior functionals of interest, such as component-wise means or variances. This contrasts with existing methods, which characterize the quality of the whole variational distribution. In realistic applications, the whole variational distribution is often a poor approximation even when specific posterior functionals are accurate. Our new method fills this gap.
Moreover, we have completed two additional probability theory projects which advance the theory of Gaussian approximation, in both finite- and infinite-dimensional settings. We have proved a novel functional approximation of rescaled degenerate U-statistics by Gaussian processes. We have also provided a new method for bounding the convex distance between Gaussian distributions and functionals of binomial processes.
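For readers unfamiliar with the objects behind de Jong-type limit theorems, the standard definition of a second-order degenerate U-statistic (a textbook definition, not notation taken from the papers themselves) is:

```latex
% Second-order U-statistic with symmetric kernel h and i.i.d. data X_1,...,X_n:
W_n = \sum_{1 \le i < j \le n} h(X_i, X_j),
% where degeneracy means the conditional expectation vanishes:
\mathbb{E}\left[ h(X_i, X_j) \mid X_i \right] = 0 \quad \text{almost surely}.
```

De Jong's classical theorem gives conditions under which the rescaled statistic \(W_n / \sqrt{\operatorname{Var} W_n}\) is asymptotically standard normal; the functional version proved in the project replaces the Gaussian limiting distribution with a Gaussian process limit.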
The above-mentioned results were disseminated via publications in international journals, including Probability Theory and Related Fields, the Annals of Applied Probability and Bernoulli, and in the Proceedings of the 26th International Conference on Artificial Intelligence and Statistics (AISTATS). The results were also presented at a number of international conferences, workshops and seminars. All results are available open access, making them ready to be exploited by other researchers and industry practitioners.