Project description
Shining light into the black box of Bayesian algorithms for big data
Bayesian analysis, a method of statistical inference that uses probability to update our belief about a model in light of observations, is fundamental to many statistical and machine learning algorithms for big data. It supports the understanding of complex processes, including assessing climate change and tracking the spread of a disease. However, Bayesian methods are reaching their limits in coping with the explosion of available data, and attempts to speed up processing are largely black box solutions. The EU-funded BigBayesUQ project is developing a theory for scalable Bayesian methods that enables quantification of their performance, limitations, and uncertainty. This will improve accuracy and, in turn, win acceptance from a broad community of scientists and researchers.
Objective
Recent years have seen a rapid increase in available information. This has created an urgent need for fast statistical and machine learning methods that can scale up to big data sets. Standard approaches, including the now routinely used Bayesian methods, are becoming computationally infeasible, especially in complex models with many parameters and large data sizes. A variety of algorithms have been proposed to speed up these procedures, but these are typically black box methods with very limited theoretical support. In fact, empirical evidence shows that such methods can perform poorly. This is especially concerning in real-world applications, e.g. in medicine. In this project I shall open up the black box and provide a theory for scalable Bayesian methods, combining recent, state-of-the-art techniques from Bayesian nonparametrics, empirical process theory, and machine learning. I focus on two very important classes of scalable techniques: variational and distributed Bayes. I shall establish guarantees, but also limitations, of these procedures for estimating the parameter of interest and for quantifying the corresponding uncertainty, within a framework that will also be convincing outside the Bayesian paradigm. As a result, scalable Bayesian techniques will perform more accurately and gain better acceptance by a wider community of scientists and practitioners. The proposed research, although motivated by real-world problems, is of a mathematical nature. In the analysis I consider mathematical models that are routinely used in various fields (e.g. high-dimensional linear and logistic regressions are the workhorses in econometrics or genetics). My theoretical results will provide principled new insights that can be used, for instance, in several specific applications I am involved in, including developing novel statistical methods for understanding fundamental questions in cosmology and the early detection of dementia using multiple data sources.
Fields of science (EuroSciVoc)
CORDIS classifies projects with EuroSciVoc, a multilingual taxonomy of fields of science, through a semi-automatic process based on NLP techniques. See: https://op.europa.eu/en/web/eu-vocabularies/euroscivoc.
This project's classification has been validated by the project's team.
- natural sciences > computer and information sciences > data science > big data
- natural sciences > mathematics > applied mathematics > statistics and probability > bayesian statistics
- natural sciences > computer and information sciences > artificial intelligence > machine learning
- natural sciences > mathematics > applied mathematics > mathematical model
Keywords
Project’s keywords as indicated by the project coordinator. Not to be confused with the EuroSciVoc taxonomy (Fields of science)
Programme(s)
Multi-annual funding programmes that define the EU’s priorities for research and innovation.
- HORIZON.1.1 - European Research Council (ERC)
MAIN PROGRAMME
Topic(s)
Calls for proposals are divided into topics. A topic defines a specific subject or area for which applicants can submit proposals. The description of a topic comprises its specific scope and the expected impact of the funded project.
Funding Scheme
Funding scheme (or “Type of Action”) inside a programme with common features. It specifies: the scope of what is funded; the reimbursement rate; specific evaluation criteria to qualify for funding; and the use of simplified forms of costs like lump sums.
HORIZON-ERC - HORIZON ERC Grants
Call for proposal
Procedure for inviting applicants to submit project proposals, with the aim of receiving EU funding.
ERC-2021-STG
Host institution
Net EU financial contribution. The sum of money that the participant receives, deducted by the EU contribution to its linked third party. It considers the distribution of the EU financial contribution between direct beneficiaries of the project and other types of participants, like third-party participants.
20136 Milano
Italy
The total costs incurred by this organisation to participate in the project, including direct and indirect costs. This amount is a subset of the overall project budget.