Nonparametric Bayes and empirical Bayes for species sampling problems: classical questions, new directions and related issues

Periodic Reporting for period 1 - NBEB-SSP (Nonparametric Bayes and empirical Bayes for species sampling problems: classical questions, new directions and related issues)

Reporting period: 2019-03-01 to 2020-08-31

The object of research is species sampling problems, and generalizations thereof, whose importance has grown considerably in recent years, driven by numerous applications in biosciences, machine learning, theoretical computer science and information theory. Within this broad field, the research focuses on two research themes:

RT1) the study of nonparametric Bayes and nonparametric empirical Bayes methodologies for classical species sampling problems, for generalized species sampling problems emerging in the biological and physical sciences, and for related questions in the optimal design of species inventories;

RT2) the use of recent mathematical tools from the theory of differential privacy to study the fundamental tradeoff between privacy protection, which requires releasing only partial data, and Bayesian learning in species sampling problems, which requires accurate data for inference.
In the period March 2019 - August 2020, both research themes RT1 and RT2 were pushed forward.

With regard to RT1, we started the following research projects: RP1) the development of the theoretical (probabilistic) foundations of nonparametric Bayes and nonparametric empirical Bayes methodologies for classical species sampling problems, with emphasis on large sample asymptotic properties of exchangeable random partitions and discrete random structures thereof; RP2) the extension of nonparametric Bayes and nonparametric empirical Bayes methodologies for species sampling to the more general setting of feature/trait sampling models, with applications to cancer genomics and microbial ecology; RP3) the development of a nonparametric empirical Bayes methodology for classical species sampling models under the assumption of power-law data, with applications to natural language processing.

With regard to RT2, we started the following research projects: RP4) the development of a nonparametric empirical Bayes methodology for disclosure risk assessment, which is at the basis of some modern privacy-preserving mechanisms; RP5) the study and development of species sampling problems in the context of data privatized by means of random hashing mechanisms; RP6) the development of a comprehensive theory for goodness-of-fit tests under the frameworks of differential privacy and local differential privacy.

In addition to RT1 and RT2, we started a new research theme (RT3), under which we aim to investigate the use of deep neural networks, nowadays very popular, in the context of species sampling problems. In this respect, preliminary results have been produced for feedforward and convolutional deep neural networks.
The work in the period March 2019 - August 2020, within research themes RT1, RT2 and RT3, produced results that represent remarkable progress beyond the state of the art. In the coming months, we will continue working on the research projects started in that period. Among these research projects, RP2 and RP5 led to novel, promising ideas that deserve attention and further study. RP2 suggested the investigation of a novel class of nonparametric prior distributions for feature/trait allocation models, which may produce more robust inferences than the competing priors currently known in the literature. RP5 put forward a novel methodology that revealed interesting connections with the following areas: i) sketching algorithms for streaming data; ii) compressed sensing; iii) sparse recovery via sparse matrices; iv) multi-armed bandits for "better machine learning". The research theme RT3 is being explored continuously in the directions of both modeling and inference (learning).

In addition, we started a systematic study to develop a novel approach to Bayesian consistency, tailored to the context of species sampling problems. This is an ambitious project that aims at revisiting the classical approach to Bayesian consistency within the mathematical framework of optimal transport. We expect that this research project will produce interesting results, paving the way to new research directions beyond the context of species sampling problems.