
Inference in High Dimensions: Light-speed Algorithms and Information Limits

Project description

Developing algorithms for high-dimensional data inference

Advanced computing technology allows for the collection and storage of huge amounts of data. In many scientific applications, where both the data and the machine learning models are high-dimensional, it is increasingly difficult to interpret information and draw accurate inferences using conventional statistical theory. This poses a particular challenge in medical applications. To address this, the ERC-funded INF^2 project aims to develop a theoretical framework for high-dimensional inference in machine learning and data science. Using a mean-field approach, it will determine the fundamental limits of inference, i.e. the minimal data requirements, and devise algorithms that work effectively with this minimal amount of data. These principles will then be adapted to real-life applications in genome-wide association studies.

Objective

Extracting information from data is the key challenge of our time, and in many applications (e.g. genome-wide association studies, data compression, and virtual assistants such as ChatGPT) both the data and the machine learning model used to extract information are increasingly high-dimensional. As traditional statistical theory is ill-equipped to face this explosion in the dimensionality of the problem, machine learning is now predominantly experimental. However, empirical approaches come with huge costs affordable only to large companies, and they lack interpretability, which is especially troublesome in medical applications. To address these issues, the INF^2 project develops information-theoretically principled methods for high-dimensional inference in machine learning and data science. The key insight is that, via a “mean-field” approach, high-dimensional quantities are well approximated by low-dimensional ones, which can then be characterized exactly. Leveraging this characterization, we will (i) establish the fundamental limits of inference, i.e. the minimal amount of data necessary to solve the problem, and (ii) design efficient algorithms requiring only that minimal amount of data. The challenge we tackle is to apply this paradigm to practical settings, in which data are structured and heterogeneous (as in genome-wide association studies), and models consist of complex architectures tailored to applications (auto-encoders for data compression, and transformers for ChatGPT). Through a novel analysis of spectral methods, approximate message passing and gradient descent, INF^2 builds a theoretical framework with conceptual impact and broad applicability in machine learning and information theory. This framework is then brought to the real world via applications in genome-wide association studies. Broadly, our results enable the principled design of machine learning algorithms and models, drastically reducing costs and providing interpretable solutions.
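
To make the algorithmic side concrete, the sketch below shows an approximate message passing (AMP) iteration for a spiked (rank-one) matrix model, one of the algorithm families named above. It is only an illustration under simple assumptions (a ±1 signal, a tanh denoiser, a hand-picked signal-to-noise ratio); it is not the project's actual model, code or results.

import numpy as np

# Minimal AMP sketch for rank-one ("spiked") matrix estimation (illustrative assumptions).
rng = np.random.default_rng(0)
n, snr = 2000, 2.0                          # problem size and signal strength (assumed)
x_true = rng.choice([-1.0, 1.0], size=n)    # planted +/-1 (Rademacher) signal

# Observation: Y = (snr / n) * x x^T + symmetric Gaussian noise of variance 1/n.
g = rng.normal(size=(n, n))
Y = (snr / n) * np.outer(x_true, x_true) + (g + g.T) / np.sqrt(2 * n)

def denoise(r):
    # Posterior-mean denoiser for a +/-1 signal; the exact state-evolution
    # scaling of its argument is omitted to keep the sketch short.
    return np.tanh(r)

x_iter = rng.normal(size=n)                 # random initialisation
m_prev = np.zeros(n)
for _ in range(30):
    m = denoise(x_iter)
    onsager = np.mean(1.0 - m ** 2)         # average derivative of the denoiser
    x_iter = Y @ m - onsager * m_prev       # AMP update with Onsager correction
    m_prev = m

overlap = abs(denoise(x_iter) @ x_true) / n
print(f"overlap with the planted signal: {overlap:.3f}")

The Onsager correction is what keeps each iterate behaving like signal plus Gaussian noise in the high-dimensional limit, which is precisely the low-dimensional ("mean-field") characterization that state-evolution analyses exploit.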

Host institution

INSTITUTE OF SCIENCE AND TECHNOLOGY AUSTRIA
Net EU contribution: € 1 662 400,00
Address: Am Campus 1, 3400 Klosterneuburg, Austria
Region: Ostösterreich > Niederösterreich > Wiener Umland/Nordteil
Activity type: Higher or Secondary Education Establishments
Total cost: € 1 662 400,00

Beneficiaries (1)