
Flexible Bayesian Non-Parametric Priors

Final Report Summary - FLEXIBNPP (Flexible Bayesian Non-Parametric Priors)

The use of Bayesian non-parametric (BNP) priors in applied statistical modeling has become increasingly popular in the last few years. Since the seminal paper of Ferguson (1973, Annals of Statistics), the Dirichlet Process and its extensions have been increasingly used to address inferential problems in many fields. Examples range from variable selection in genetics to linguistics, psychology, human learning, image segmentation and applications to the neurosciences. The increased interest in non-parametric Bayesian approaches to data analysis is motivated by a number of attractive inferential properties. For example, BNP priors are often used as flexible models to describe the heterogeneity of the population of interest, as they implicitly induce a clustering of the observations into homogeneous groups. In the big data era, there is a growing need for models that describe the main features of the large and non-trivial datasets that are increasingly available thanks to the ease of collecting information through modern networks (for instance, the Internet). This project provided flexible priors for explaining such datasets. In particular, two research lines were developed: 1) vectors of dependent BNP priors for modeling information pooling across units, and 2) non-exchangeable BNP priors for modeling the heterogeneity of the data. The project was developed around three Specific Aims (SA):

SA1. Introduction of Novel BNP priors and Theoretical studies
SA2. New efficient algorithms for addressing inference
SA3. Applications to real datasets
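
The clustering that a Dirichlet Process prior induces, mentioned in the introduction above, can be illustrated through its Chinese restaurant process representation: each new observation joins an existing cluster with probability proportional to that cluster's size, or opens a new cluster with probability proportional to a concentration parameter. The following is a minimal sketch of that standard construction only, not of any prior developed in the project; the function name `crp_partition`, the value `alpha=1.0` and the sample size are illustrative assumptions.

```python
import random


def crp_partition(n, alpha, seed=0):
    """Sample cluster labels for n observations from a Chinese restaurant process.

    alpha is the (positive) concentration parameter: larger values tend to
    produce more clusters. All names here are illustrative.
    """
    rng = random.Random(seed)
    labels = []  # cluster label assigned to each observation
    sizes = []   # current number of observations in each cluster
    for _ in range(n):
        # Existing cluster k has weight sizes[k]; a new cluster has weight alpha.
        weights = sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(1)   # open a new cluster
        else:
            sizes[k] += 1     # join an existing cluster
        labels.append(k)
    return labels, sizes


labels, sizes = crp_partition(n=200, alpha=1.0)
print("number of clusters:", len(sizes))
print("cluster sizes:", sorted(sizes, reverse=True))
```

The number of clusters is not fixed in advance: it is random and grows slowly with the number of observations, which is what makes this family of priors attractive for describing heterogeneous populations.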

As documented on the project website (https://sites.google.com/site/flexibnp/), FLEXIBNPP led to 20 published papers, 1 book chapter, 1 discussion and 4 submitted papers. Novel priors were introduced, such as the Beta-GOS prior (2014, Journal of the American Statistical Association), the Beta-product dependent prior (2014, Journal of Econometrics) and priors based on a novel class of dependent Bayesian non-parametric priors called Compound Random Measures (2017, Journal of the Royal Statistical Society - Series B). In particular, theoretical properties of Compound Random Measures were studied (2018, Statistics and Probability Letters).
A novel Bayesian nonparametric model for conditional copulas was introduced and used to analyse data on twins' cognitive abilities (2018, Journal of the Royal Statistical Society - Series C). The Beta-GOS prior was used to study chromosomal aberrations in breast cancer, and the Beta-product dependent prior was used to study business cycles in related markets.
Furthermore, FLEXIBNPP partially funded a PhD scholarship. Part of the PhD topic focused on applications of vectors of completely random measures to survival analysis. The investigation led to the introduction of a model for survival functions when information from multiple samples is available (2018, Electronic Journal of Statistics) and to a novel survival regression model that accounts for nonproportionality. Theoretical properties of these models, such as posterior characterisations and asymptotic consistency, were studied.
The project achieved important results on computational algorithms for posterior inference in complex models by focusing on methods that make use of approximate likelihoods. In this respect, the Bootstrap likelihood idea (2016, Australian and New Zealand Journal of Statistics) and the survey paper (2018, Statistics Surveys) were pivotal for introducing a novel method based on quantiles (2018, https://arxiv.org/abs/1802.00796). Furthermore, a tailored algorithm for density regression with Compound Random Measures was introduced (2018, Bayesian Analysis). This algorithm represents the first proposal in Bayesian nonparametrics that makes use of a pseudo-marginal approach. As can be seen on the website, the project produced many other outputs that will have an impact on the modelling of complex datasets.
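
For readers unfamiliar with the pseudo-marginal idea mentioned above, the generic scheme can be sketched as follows: a Metropolis-Hastings sampler in which the intractable likelihood is replaced by a non-negative unbiased Monte Carlo estimate, and the estimate attached to the current state is stored and reused rather than recomputed. This is a toy illustration under stated assumptions, not the project's algorithm for Compound Random Measures. The latent-variable model (y | z ~ N(z, 1), z ~ N(theta, 1)), the N(0, 10^2) prior, the tuning values and all function names are hypothetical choices made for the example.

```python
import math
import random


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def log_prior(theta):
    # Assumed weakly informative prior: theta ~ N(0, 10^2), up to a constant.
    return -0.5 * (theta / 10.0) ** 2


def log_lik_estimate(theta, data, rng, n_draws=20):
    """Log of an unbiased Monte Carlo estimate of the likelihood.

    Toy model: y | z ~ N(z, 1) with latent z ~ N(theta, 1). For each y,
    p(y | theta) is estimated by averaging p(y | z) over fresh draws of z.
    """
    total = 0.0
    for y in data:
        est = sum(normal_pdf(y, rng.gauss(theta, 1.0), 1.0)
                  for _ in range(n_draws)) / n_draws
        if est == 0.0:  # guard against numerical underflow
            return float("-inf")
        total += math.log(est)
    return total


def pseudo_marginal_mh(data, n_iter=1000, step=0.5, seed=1):
    rng = random.Random(seed)
    theta = 0.0
    log_post = log_prior(theta) + log_lik_estimate(theta, data, rng)
    chain = []
    for _ in range(n_iter):
        prop = theta + rng.gauss(0.0, step)  # random-walk proposal
        log_post_prop = log_prior(prop) + log_lik_estimate(prop, data, rng)
        # Pseudo-marginal step: log_post is the stored estimate for the
        # current state; it is reused here, never refreshed.
        accept_prob = math.exp(min(0.0, log_post_prop - log_post))
        if rng.random() < accept_prob:
            theta, log_post = prop, log_post_prop
        chain.append(theta)
    return chain


# Synthetic data: marginally y ~ N(theta, 2), with theta = 1.5 assumed here.
data_rng = random.Random(0)
data = [data_rng.gauss(1.5, math.sqrt(2.0)) for _ in range(30)]
chain = pseudo_marginal_mh(data)
kept = chain[300:]  # discard an assumed burn-in period
print("posterior mean estimate:", sum(kept) / len(kept))
print("sample mean of the data:", sum(data) / len(data))
```

In this toy model the exact marginal likelihood is available, so the output can be checked against the sample mean of the data; in realistic applications of the approach no such check exists, and the number of Monte Carlo draws trades off computing cost against how well the chain mixes.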