General theory for Big Bayes

Periodic Reporting for period 1 - GTBB (General theory for Big Bayes)

Reporting period: 2019-10-01 to 2021-03-31

In the modern era of large and complex data sets, there is a pressing need for flexible, sound and scalable inferential methods to analyze them. Bayesian approaches have been increasingly used in statistics and machine learning, and in applications such as biostatistics, astrophysics and social science. Their major advantages are their ability to build complex models in a hierarchical way, their coherence, and their ability to deliver not only point estimators but also measures of uncertainty from the posterior distribution, a probability distribution on the parameter space that is at the core of all Bayesian inference. The increasing complexity of data sets raises huge theoretical and computational challenges for Bayesian approaches, and I believe that we are now at a turning point: the tools that have made Bayesian approaches so successful in the last 20 years have reached their limits, and new directions need to be considered. The aim of this project is to develop a general theory for the analysis of Bayesian methods in complex and high (or infinite) dimensional models, covering not only a fine understanding of the posterior distributions but also an analysis of the output of the algorithms used to implement these approaches.
The main objectives of the project are (briefly):
1. Asymptotic analysis of the posterior distribution of complex high dimensional models
2. Asymptotic analysis of approximate computational Bayes approaches: interactions between asymptotics and computations.

In addition, some of the models studied and developed in the project will be applied in various fields of science.
In the first 30 months of this project, I have developed models and theory for 3 aspects of the GTBB project, namely statistics for network models, deep neural networks and Hawkes processes. I have also studied the theoretical properties of approximate Bayesian computation methods in the context of model misspecification. These works have led to 6 publications and 4 preprints.

1. Hawkes processes [Task 1]: In the first 30 months of the project we have finished our first paper on asymptotic theory for Bayesian nonparametric estimation of linear Hawkes processes; this paper has been published in the Annals of Statistics. We have then extended these results to the Bayesian nonparametric estimation of multivariate Hawkes processes, together with the estimation of the graph of interaction. This work is available on arXiv and has been presented in a number of seminars and conferences. Moreover, at the beginning of the Covid pandemic we used a discrete-time Hawkes process to model the early stages of the epidemic; this work has been published in PLOS ONE.
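
For readers unfamiliar with the model, the standard definition of a linear multivariate Hawkes process can be written as follows. The notation ($K$, $\nu_k$, $h_{\ell k}$, $\delta_{\ell k}$) is an assumption chosen for illustration and is not taken from the report itself.

```latex
% Conditional intensity of component k of a K-dimensional linear Hawkes process
% N = (N^1, ..., N^K); nu_k > 0 is the background rate and h_{lk} >= 0 is the
% interaction function describing the influence of component l on component k.
\lambda^{k}_{t} \;=\; \nu_k \;+\; \sum_{\ell=1}^{K} \int_{-\infty}^{t^-} h_{\ell k}(t-s)\, \mathrm{d}N^{\ell}_{s},
\qquad k = 1, \dots, K .

% Graph of interaction: there is an edge from l to k when h_{lk} is not identically zero.
\delta_{\ell k} \;=\; \mathbf{1}\{\, h_{\ell k} \neq 0 \,\}, \qquad 1 \le \ell, k \le K .
```

In this notation, nonparametric estimation targets the functions $h_{\ell k}$ and the rates $\nu_k$, and estimating the graph of interaction amounts to recovering the indicators $\delta_{\ell k}$.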

2. Network models: modelling and inference [Task 1]
Networks are complex objects, and it is well known that real-life networks present characteristics that are not well represented by the models most commonly used and studied in statistics. In particular, in social science, biology and other fields, networks are typically sparse, with power-law degree distributions and various types of clustering behaviour. We have built on the seminal paper of Caron and Fox (2015) to develop new models, to study a wide class of models called graphex models, and to study the theoretical properties of a Bayesian approach. These works have been presented in 4 preprints and 1 published article.

3. Deep neural networks [Task 1]: Deep neural networks are widely used in practice and are well known to perform extremely well in many situations where the aim is prediction. Despite a large amount of work from various branches of mathematics, it is still far from clear why inference based on deep neural networks is so effective. Moreover, there is a very large collection of models, with different architectures, types of activation functions and so on, and all these models are known to be difficult to tune in practice. Following an approach based on the analysis of infinite-width networks, we have studied the impact of initialisation in these regimes, in particular via the renowned Neural Tangent Kernel approach, in three papers, two of which have been published while the third is available on arXiv. In addition, PDRA 2 has studied the construction of penalties that have a prior interpretation in the context of DNNs.

4. Approximate Bayesian approaches [Task 3]: PDRA 1 (Alisa Kirichenko) has worked on making statistical methods more robust: (i) One paper (AISTATS) shows that a Bayesian method can be made robust to misspecification by using generalized Bayesian inference instead. (ii) In the second project we generalise the notion of minimax convergence rates by making them robust to random (and potentially data-dependent) sample sizes. We show that in most cases the time-robust minimax rates differ from the standard ones by at most a logarithmic factor, and we give an example of a problem for which they differ by exactly an iterated logarithmic factor.
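
As a reminder of what generalized Bayesian inference usually refers to, a common form is the tempered (or $\eta$-generalized) posterior sketched below. The report does not state the formula, so the notation ($\pi$, $p_\theta$, $\eta$, $X_1, \dots, X_n$) and the restriction to the likelihood-based version are assumptions made for illustration.

```latex
% Tempered posterior with learning rate eta > 0: the likelihood is raised to the
% power eta before being combined with the prior pi. Taking eta = 1 recovers the
% standard posterior; eta < 1 downweights the likelihood relative to the prior.
\pi_{\eta}(\theta \mid X_1, \dots, X_n) \;\propto\; \pi(\theta) \prod_{i=1}^{n} p_{\theta}(X_i)^{\eta} .
```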
In addition to the results presented above, a number of projects are either well advanced and almost finished, or have been initiated.

In the domain of uncertainty quantification, we have in particular studied the renowned Bernstein-von Mises theorem for non-regular semi-parametric models, where complex priors on the nonparametric part are also considered. We have also developed an asymptotically efficient semi-parametric procedure for semi-parametric hidden Markov models, a widely used class of models whose semi-parametric version has led to interesting recent developments, since it can yield robust segmentation or clustering procedures. These results are first steps into the domain of Bernstein-von Mises results and uncertainty quantification in high dimensional mixture models, where the existence of such results is still mostly unexplored. In addition, we have derived an interesting tool to study deconvolution models (a special class of mixture models), which we have applied to Bayesian approaches. All these results aim at a better understanding of the behaviour of the posterior distribution in complex mixture models, in particular to determine whether Bayesian measures of uncertainty have stable or reliable properties as the amount of information in the data (i.e. the sample size) increases. Finally, we have investigated spatial adaptation in Bayesian methods, and this work has just been posted on arXiv.

Uncertainty quantification for other families of complex models, in particular point processes, will be investigated in the second half of the project. In this domain, we have started investigating families of point processes outside the class of Hawkes processes. The study of these processes has led us to investigate the problem of inference for functions that live on a manifold.

Approximate Bayesian methods are also being developed, and some progress has been made towards the analysis of ABC algorithms in high dimensions, for which theoretical results are still scarce.
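
To make the term concrete, here is a minimal sketch of the basic rejection ABC algorithm on a toy problem. It is not the method analysed in the project: the Gaussian model, the N(0, 10^2) prior, the sample mean as summary statistic and the function name `abc_rejection` are all assumptions chosen for illustration.

```python
import random
import statistics


def abc_rejection(observed, n_draws, tolerance, seed=0):
    """Rejection ABC for the mean theta of a N(theta, 1) model (toy example).

    Assumptions (illustrative only): prior theta ~ N(0, 10^2), summary
    statistic = sample mean, distance = absolute difference of summaries.
    """
    rng = random.Random(seed)
    n = len(observed)
    s_obs = statistics.fmean(observed)  # summary of the observed data
    accepted = []
    for _ in range(n_draws):
        theta = rng.gauss(0.0, 10.0)  # 1. draw a parameter from the prior
        simulated = [rng.gauss(theta, 1.0) for _ in range(n)]  # 2. simulate data
        # 3. keep theta if the simulated summary is close to the observed one
        if abs(statistics.fmean(simulated) - s_obs) <= tolerance:
            accepted.append(theta)
    return accepted  # draws from the ABC approximation of the posterior
```

The accepted draws approximate the posterior more closely as `tolerance` shrinks, at the cost of a lower acceptance rate. With high-dimensional summaries, that acceptance rate deteriorates quickly, which is one reason theoretical results in high dimensions are of interest.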
Judith Rousseau