General theory for Big Bayes

Periodic Reporting for period 2 - GTBB (General theory for Big Bayes)

Reporting period: 2021-04-01 to 2022-09-30

In the modern era of complex and large data sets, there is stringent need for flexible, sound and scalable inferential methods to analyze them. Bayesian approaches have been increasingly used in statistics and machine learning and in all sorts of applications such as biostatistics, astrophysics, social science etc. Major advantages of Bayesian approaches are: their ability to model complex models in a hierarchical way, their coherency and ability to deliver not only point estimators but also measures of uncertainty from the posterior distribution which is a probability distribution on the parameter space at the core of all Bayesian inference. The increasing complexity of the data sets raise huge challenges for Bayesian approaches: theoretical and computational and I believe that we are now at a turning point where the tool which have made Bayesian approaches so successful in the last 20 years have reached their limits and new directions need to be considered. The aim of this project is to develop a general theory for the analysis of Bayesian methods in complex and high (or infinite) dimensional models which will cover not only fine understanding of the posterior distributions but also an analysis of the output of the algorithms used to implement the approaches.
The main objectives of the project are (briefly):
1. Asymptotic analysis of the posterior distribution of complex high dimensional models
2. Asymptotic analysis of approximate computational Bayes approaches – Interractions between asymptotics and computations.

In addition, some of the models studied and developed in the project will be applied in various fields of science.

In the first half of this project, we have developed models and theory for 3 aspects of the GTBB project, namely around statistics for network models, deep neural networks and point processes. We have also studied the theoretical properties of approximate Bayesian computation methods in the context of model misspecification. These works have lead to 5 publications and 15 preprints.

1.[Task 1] Complex modelling and inference. We have studied 4 families of complex models: Point processes (Hawkes), Networks, Diffusion models and Deep Neural networks.
i) Hawkes processes : Bayesian nonparametric methods for multivariate nonlinear Hawkes processes and estimation of the graph of interactions. This work is available on Arxiv. Moreover, at the beginning of the Covid pandemic we have used a discrete time Hawkes process to model the early stages of the epidemic, this work has been published in PLoSone.
(ii) Networks are complex objects and it is common knowledge that in real life the networks encountered present characteristics that are not well represented by the most commonly used and studied models in statistics. In particular in social science, biology etc, networks typical are sparse, with power law degree distributions and various types of clustering behaviour. We have built on the seminal paper Caron and Fox (2015) to develop new models, study a wide class of models called the graphex models and study the theoretical properties of a Bayesian approach. These works have been presented in 3 preprints and 1 published article.
(iii) PDRA 3 has studied Bayesian nonparametric methods for estimating the (gradient) drift vector field in a multi-dimensional (reversible) diffusion models, which are primary physical models that describe the motion of a particle immersed in a potential energy field.
(iv) Deep neural networks and links with bayesian inference [Task 1]: Deep neural networks are widely used in practice and are well known to perform extremely well in a lot of situations when the aim is to make prediction. Despite a large amount of work from various branches of mathematics, it is still not at all clear why inference based on deep neural networks is so effective. Moreover there is a very large collection of models, with different architectures, types of activation functions etc and all these models are known to be difficult to tune in practice. Following an approach based on that analysis of infinitely width networks, we have studied the impact of initialisation in these regimes in 4. Also PDRA 2, has studied the construction of penalties which have a prior interpretation in the context of DNN and various initialisation regimes. both works are available as preprints. Finally DNN model functions which are compositions, PDRA 3 has shown in a continuous regression model that if the true function satisfies a compositional assumption, then the posterior based on any mean-zero Gaussian process can only recover the truth at a rate that is strictly slower than the minimax rate by a factor that is polynomially suboptimal in the sample size. On the contrary, deep learning techniques such as certain deep neural networks have been shown to achieve the minimax rate. Thus, in the presence of composition structures, Gaussian process regression is provably outperformed, in terms of estimation rates, by these alternative techniques.

2.[Task 2] Semiparametric inference and uncertainty quantification.
(i) Non standard loss functions: We have studied concentration of the posterior distribution under non standard loss functions to understand the impact of the prior on local features of the functions of interest. This has lead to 2 preprints (with C. Scricciolo and V. Rockova resp.).
(ii) Bernstein von Mises theorems: we have proposed a pseudo posterior approach to ensure that the resulting distributions behave well for multiple targets, namely the cut - posterior. we have applied this idea to the nonparametric hidden Markov models, thus obtaining an efficient semi-parametric Bernstein von Mises theorem for the transition matrix, together with a well behaved posterior distribution on the emission distributions and the smoothing probabilities. this has lead to 1 preprint.

4. [Task 3] Approximate Bayesian approaches:
(i) Robustness: PDRA 1 has shown shows that one can make a Bayesian method robust towards misspecification by using Generalized Bayesian inference, published in AISTAT. She has also generalised the notion of the minimax convergence rates in the context of random sample sizes, which has lead to 1 preprint.
(ii) Bayes factors: We have studied efficient algorithms to compute the marginal likelihoods in mixture models and have studied their asymptotic behaviour, with important implications for Bayesian tests. This has lead to 1 preprint.
(iii) Coresets: We have proposed and studied a simple procedure to make Bayesian inference in the context of very large data sets, as an alternative to distributed inference.

For Task 1.
Hawkes processes : Our result is the first to derive asymptotic results for nonparametric inference in nonlinear Hawkes processes, which is significantly more complex than linear Hawkes processes. We are currently working on high dimensional Hawkes processes and on variational methods for such models [Task 3]
Other point processes : We are finishing a project on spatial point processes with covariates [PDRA1 and PDRA 3].
PhD2 has completed his first paper on estimation of a density near a manifold. He is currently working on posterior contraction rates under Wasserstein Metrics.
PhD1 is finishing a project on BNP models for non context free grammars.
Deep neural network and Bayesian inference: We have shown for various architectures that initialisation can have a dramatic effect on the learning ability of the network (via the nTK but not only) and we have proposed in the case of residual networks a rescaling which allows for much more stabilisation. With PDRA 3 and co authors have published a paper on optimally learning compositional functions and the inability of Gaussian processes to do that.
For Task 2: Uncertainty and semi-parametric inference. In addition to what was described in the previous section:
We are finishing a paper on Bernstein von Mises (BvM) theorem for non regular semi-parametric models. We are also extending the work on hidden Markov models to the case with unknown number of components.
We have obtained an inversion inequality which allows to understand better Wasserstein neighbourhood of mixing densities in deconvolution problems, which we have extended to the multivariate case This result should be useful for further analysis related to uncertainty quantification in such models.
We have also studied the development of tools to study posterior local properties for functions whose regularity is spatially varying. We have written one paper and used some of these ideas in the context of spatial point processes.
We have obtained an asymptotic precise description of the posterior distribution for a class of sparse networks which allows for uncertainty quantification . This result covers in particular the the misspecified case.
We have also started working on the use of Bayesian bootstrap methods for targeted inference, which is of particular importance in the context of causal inference. I expect that a first paper will be ready next fall.
Uncertainty quantification for other families of complex models will be investigated in the second half of the project, in particular point processes.

For Task 3:
Approximate Bayesian methods are also being developed and some progress have been obtained towards the analysis of ABC algorithms in high-dimensions for which theoretical results are still sparse. We are finishing a paper on extensions of the existing theoretical results on ABC algorithms to heterogeneous and high dimensional summary statistics.
We have also studied and proposed a Bayesian coreset approach for very large data sets. Further extensions to this work will be considered in the second half of the project.

Judith Rousseau

Periodic Reporting for period 2 - GTBB (General theory for Big Bayes)

Share this page

Download