Periodic Reporting for period 4 - NBEB-SSP (Nonparametric Bayes and empirical Bayes for species sampling problems: classical questions, new directions and related issues)
Reporting period: 2023-09-01 to 2024-02-29
RT1) the study of nonparametric Bayes and nonparametric empirical Bayes methodologies for classical species sampling problems, for generalized species sampling problems emerging in the biological and physical sciences, and for related questions in the optimal design of species inventories (a classical estimator from this area is sketched after this list);
RT2) the use of recent mathematical tools from the theory of differential privacy to study the fundamental tradeoff between privacy protection of information, which requires releasing only partial or perturbed data, and Bayesian learning in species sampling problems, which requires accurate data for inference.
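As a concrete illustration of the classical species sampling problems in RT1, the following is a minimal sketch of the textbook Good-Turing estimator of the missing mass, i.e. the probability that the next observation is a previously unseen species. It is a standard example written for this summary, not code produced by the project.

```python
from collections import Counter

def good_turing_missing_mass(sample):
    """Good-Turing estimate of the probability that the next draw is a
    previously unseen species: N1 / n, where N1 is the number of species
    observed exactly once and n is the sample size."""
    counts = Counter(sample)
    n = len(sample)
    n1 = sum(1 for c in counts.values() if c == 1)
    return n1 / n

# toy example: 10 draws from an unknown species composition
sample = ["a", "a", "b", "c", "c", "c", "d", "e", "e", "f"]
print(good_turing_missing_mass(sample))  # 3 singletons / 10 draws = 0.3
```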
With regard to RT2, we have developed the following research lines (RL):
RL7) a nonparametric Bayes methodology and a nonparametric empirical Bayes methodology for disclosure risk assessment (also known as the risk of re-identification), which is at the basis of several modern privacy-preserving mechanisms;
RL8) species sampling problems within the frameworks of global differential privacy and local differential privacy, with respect to suitable perturbation mechanisms, e.g. Laplace and Gaussian noise addition, general exponential mechanisms, generalized randomized response and bit flipping (a textbook sketch of randomized response follows this list);
RL9) a comprehensive theory for goodness-of-fit tests, with emphasis on the power of the test, under global differential privacy and local differential privacy;
RL10) a novel randomized mechanism for releasing private data, based on synthetic data generated from nonparametric posterior distributions, with applications to species sampling problems;
RL11) a computational framework, based on Markov chain Monte Carlo, for Bayesian nonparametric estimation and clustering under local differential privacy and global differential privacy.
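To make the local-privacy perturbation mechanisms named in RL8 concrete, here is a minimal sketch of classical binary randomized response, which is epsilon-locally differentially private. This is the textbook mechanism, not the project's own methodology; the species presence/absence framing in the toy example is our illustrative assumption.

```python
import math
import random

def randomized_response(true_bit, epsilon):
    """Report the true bit with probability e^eps / (e^eps + 1),
    flip it otherwise; this satisfies epsilon-local differential privacy."""
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return true_bit if random.random() < p_truth else 1 - true_bit

def debias_frequency(noisy_mean, epsilon):
    """Unbiased estimate of the true frequency of 1s, inverting
    E[noisy bit] = f * (2p - 1) + (1 - p) for the truth probability p."""
    p = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return (noisy_mean - (1.0 - p)) / (2.0 * p - 1.0)

# toy example: privatize presence/absence records of a species
random.seed(0)
bits = [1] * 200 + [0] * 800                       # true frequency 0.2
noisy = [randomized_response(b, 1.0) for b in bits]
print(debias_frequency(sum(noisy) / len(noisy), 1.0))  # close to 0.2
```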
Under the research themes RT1 and RT2, our work has yielded a wealth of new results, published in top-tier journals spanning (mathematical) statistics, applied probability, and machine learning. These findings have also been presented by the PI and team members at premier conferences and workshops in the field, reaching leading experts and advancing the state of the art.
In addition to the research themes RT1 and RT2, we started a new research theme on the theory of deep Bayesian neural networks, which are nowadays very popular in statistics and machine learning. Several results have been obtained on the large-width and large-depth behaviour of feedforward deep neural networks, also in terms of contraction rates, under both Gaussian random weights and Stable (heavy-tailed) random weights for the network. Quantitative central limit theorems for large-width Gaussian neural networks have also been established by means of Stein-Malliavin calculus and second-order Poincaré inequalities. Other results concern the fundamental problem of training feedforward neural networks via gradient descent, leading to an interesting generalization of the popular notion of the neural tangent kernel and establishing a link with estimation in kernel regression.
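The large-width Gaussian behaviour can be illustrated numerically. The following is a toy sketch, written for this summary under standard assumptions (one hidden ReLU layer, i.i.d. standard Gaussian weights, 1/sqrt(width) output scaling), and not the project's code: the excess kurtosis of the network output, which vanishes for a Gaussian law, shrinks as the width grows, in the spirit of the quantitative central limit theorems mentioned above.

```python
import numpy as np

def wide_relu_outputs(x, width, n_nets, rng):
    """Scalar outputs of n_nets independent one-hidden-layer ReLU networks
    with i.i.d. N(0,1) weights and the standard 1/sqrt(width) scaling."""
    outs = np.empty(n_nets)
    for i in range(n_nets):
        w1 = rng.standard_normal((width, x.shape[0]))  # input-to-hidden weights
        w2 = rng.standard_normal(width)                # hidden-to-output weights
        outs[i] = w2 @ np.maximum(w1 @ x, 0.0) / np.sqrt(width)
    return outs

def excess_kurtosis(s):
    """Excess kurtosis: 0 for a Gaussian, so it tracks distance from normality."""
    z = (s - s.mean()) / s.std()
    return (z ** 4).mean() - 3.0

rng = np.random.default_rng(0)
x = np.array([0.5, -1.0, 2.0])
for width in (4, 64, 1024):
    samples = wide_relu_outputs(x, width, n_nets=5000, rng=rng)
    print(width, round(excess_kurtosis(samples), 3))  # shrinks toward 0 as width grows
```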