Reconciling Classical and Modern (Deep) Machine Learning for Real-World Applications

Project information

APHELEIA

Grant agreement ID: 101087696

DOI

10.3030/101087696

EC signature date 27 June 2023

Start date 1 September 2023

End date 31 August 2028

Funded under

European Research Council (ERC)

Total cost

€ 1 999 375.00

EU contribution

€ 1 999 375.00

Coordinated by

INSTITUT NATIONAL DE RECHERCHE EN INFORMATIQUE ET AUTOMATIQUE
France

Periodic Reporting for period 1 - APHELEIA (Reconciling Classical and Modern (Deep) Machine Learning for Real-World Applications)

Reporting period: 2023-09-01 to 2026-02-28

Despite the undeniable success of machine learning in addressing a wide variety of technological and scientific challenges, the current trend of training predictive models with an ever-growing number of parameters from an ever-growing amount of data is not sustainable. These huge models, often engineered by large corporations benefiting from huge computational resources, typically require learning a billion or more parameters. They have proven to be very effective in solving prediction tasks in computer vision, natural language processing, and computational biology, for example, but they mostly remain black boxes that are hard to interpret, computationally demanding, and not robust to small data perturbations. With a strong emphasis on visual modeling, the grand challenge of APHELEIA is to switch the trajectory of machine learning to a sustainable path, by developing a new generation of models that are robust, interpretable, efficient, and that do not require massive amounts of data to produce accurate predictions. To achieve this objective, we will foster new interactions between classical signal processing, statistics, optimization, and modern deep learning. Our goal is to reduce the need for massive data by enabling scientists and engineers to design trainable machine learning models that directly encode a priori knowledge of the task semantics and data formation process, while automatically preferring simple and stable solutions over complex ones. These models will be built on solid theoretical foundations with convergence and robustness guarantees, which are important for making trustworthy predictions in the wild. We will implement these ideas in an open-source software toolbox readily applicable to visual recognition and inverse imaging problems, which will also handle other modalities.
This will stimulate interdisciplinary collaborations, with the potential to be a game changer in the way scientists and engineers design and solve machine learning problems.

The project has advanced several key areas of machine learning, optimization, and real-world applications.

Trainable algorithms:
A new functional perspective on bilevel optimization was introduced, enabling the use of overparameterized neural inner solvers without requiring strong convexity. This led to scalable algorithms with demonstrated benefits for meta-learning and hyperparameter optimization. An extension to online learning was also proposed. In addition, a long-standing open problem in inverse problems was solved by providing a rigorous theoretical foundation for using pretrained denoisers within iterative algorithms, bridging the gap between empirical heuristics and mathematical guarantees in image restoration.
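
To make the bilevel structure mentioned above concrete, here is a minimal sketch of classical bilevel hyperparameter optimization. It is not the project's functional method: it assumes a strongly convex inner problem (scalar ridge regression with a closed-form solution), which is exactly the restriction the functional perspective is reported to lift. All names (`inner_solution`, `hypergradient`, `tune_penalty`) and the toy data are hypothetical.

```python
# Toy bilevel problem: tune a ridge penalty `lam` (outer variable) so that the
# ridge solution `w` (inner variable) fits held-out validation data.
# All data and names are hypothetical and chosen for illustration only.

def inner_solution(lam, xs, ys):
    """Closed-form minimizer over scalar w of 0.5*sum((w*x - y)^2) + 0.5*lam*w^2."""
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return sxy / (sxx + lam)

def outer_loss(w, xs, ys):
    """Validation loss 0.5*sum((w*x - y)^2)."""
    return 0.5 * sum((w * x - y) ** 2 for x, y in zip(xs, ys))

def hypergradient(lam, train, val):
    """Derivative of the validation loss with respect to lam (implicit differentiation)."""
    (xt, yt), (xv, yv) = train, val
    w = inner_solution(lam, xt, yt)
    sxx = sum(x * x for x in xt)
    dloss_dw = sum((w * x - y) * x for x, y in zip(xv, yv))
    # Differentiating the inner optimality condition (sxx + lam)*w - sxy = 0 in lam:
    dw_dlam = -w / (sxx + lam)
    return dloss_dw * dw_dlam

def tune_penalty(train, val, lam=5.0, lr=20.0, n_steps=200):
    """Projected gradient descent on the outer variable, keeping lam >= 0."""
    for _ in range(n_steps):
        lam = max(0.0, lam - lr * hypergradient(lam, train, val))
    return lam

train = ([1.0, 2.0, 3.0], [1.2, 1.9, 3.2])
val = ([1.0, 2.0], [1.0, 2.0])
lam_star = tune_penalty(train, val)
```

For this toy data the validation loss is minimized when `w = 1`, which corresponds to `lam = 0.6`, so `lam_star` should end up close to that value. The step size `lr` was picked by hand for this example and would need tuning elsewhere.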

Image processing and inverse problems:
Novel methods for solving inverse problems with small, unpaired datasets were proposed, achieving state-of-the-art results in deblurring, blind super-resolution, and PSF calibration. The team also introduced HySUPP, an open-source framework for hyperspectral unmixing, and SpectralEarth, a large-scale dataset for pretraining hyperspectral foundation models, significantly improving downstream tasks such as land-cover and crop mapping. Finally, the methods developed during the project also led to a state-of-the-art approach for fluorescence microscopy.

Self-supervised learning and visual recognition:
The work addressed several challenges in self-supervised and visual model design. This includes identifying and fixing instability issues in vision transformers, proposing new architectures for masked image modeling, and establishing reproducible guidelines for distilling large visual models into compact, task-specific students. In addition, a fast, learning-free pipeline (LUDVIG) was introduced for transforming 2D features into 3D scene representations, enabling efficient segmentation and reconstruction.

Astronomy applications:
In the field of astronomy, a novel framework for exoplanet detection was developed, leveraging cross-observation learning to improve sensitivity and robustness. A physically grounded model of speckle noise was also designed, enabling better detection and characterization of faint exoplanets in challenging datasets.

Graph representations:
A new graph transformer architecture was proposed that extends attention mechanisms to per-channel filters and integrates higher-order topological features directly, achieving strong performance on molecular benchmarks without explicit message-passing.

Scalable non-convex optimization:
New approaches were developed to tackle complex non-convex problems. GloptiNets leverage the spectral structure of smooth target functions to build scalable optimizers that also produce certificates of optimality. Further contributions include efficient solvers for spectral unmixing in hyperspectral image processing and advances in counterfactual risk minimization for continuous actions, improving stability and offline policy selection in real-world systems.

We want to highlight three major achievements spanning theory, real-world applications, and foundational machine learning.

The first addresses a long-standing open problem in inverse problems and image reconstruction. The work on MAP Estimation with Denoisers provides the first rigorous theoretical foundation for widely used Plug-and-Play (PnP) and Regularization-by-Denoising (RED) algorithms. These methods, which replace hand-crafted priors with powerful deep denoisers, have been highly successful in practice but lacked statistical guarantees. This research shows that they can be rigorously interpreted as performing MAP estimation under mild assumptions, proving convergence rates and explaining practical heuristics such as over-smoothing and damping. By bridging practice and theory, it establishes a solid probabilistic framework for developing reliable, trainable reconstruction methods, with potential impact in domains such as medical imaging and astronomy.
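
The generic Plug-and-Play idea described above can be sketched as follows. This is an illustration only, not the project's algorithm or its theoretical setting: it assumes an identity forward operator (pure denoising), and a moving average stands in for the pretrained deep denoiser. The function names and the toy signal are hypothetical.

```python
def moving_average_denoiser(x, width=3):
    """Stand-in denoiser: a moving average. A real PnP system would call a pretrained network."""
    half = width // 2
    out = []
    for i in range(len(x)):
        lo, hi = max(0, i - half), min(len(x), i + half + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out

def pnp_proximal_gradient(y, denoiser, step=0.5, n_iters=50):
    """PnP forward-backward iteration for the data term 0.5*||x - y||^2.

    Assumes an identity forward operator; the denoiser takes the place of
    the proximal operator of a hand-crafted prior.
    """
    x = list(y)
    for _ in range(n_iters):
        # Gradient step on the data-fidelity term.
        z = [xi - step * (xi - yi) for xi, yi in zip(x, y)]
        # Denoising step in place of the proximal step.
        x = denoiser(z)
    return x

noisy = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
restored = pnp_proximal_gradient(noisy, moving_average_denoiser)
```

With a different forward operator (blur, subsampling), only the gradient step changes; the denoising step stays the same, which is what makes the approach modular.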

The second achievement focuses on astronomy and exoplanet detection. A novel, physically grounded multi-scale statistical model of stellar speckle noise was developed to tackle one of the main obstacles in direct exoplanet imaging. Integrated into an end-to-end learnable detection and flux estimation pipeline, this approach significantly improves sensitivity and robustness, enabling the reliable detection of faint exoplanets that were previously hidden by noise. It is now being applied to large datasets from VLT/SPHERE, opening the door to new exoplanet discoveries and deeper insights into the limitations of current high-contrast imaging systems.

Finally, in self-supervised learning, the introduction of a new generation of visual foundation models has had a transformative impact. Their open release has enabled thousands of researchers to apply them across diverse domains, from biology and medicine to Earth observation and astronomy. This line of work was further extended with a breakthrough study on Vision Transformers (ViTs), which identified instability caused by hidden “background tokens” and proposed a simple architectural solution: register tokens. This fix, now widely adopted, has reshaped the design of modern transformers and earned the Best Paper Award at ICLR 2024, underscoring its broad significance for the machine learning community.
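
The register-token mechanism can be illustrated with a toy sketch: extra tokens are added to the patch sequence, take part in attention, and are dropped from the output. The sketch makes strong simplifying assumptions. The attention layer has no learned projections, and the registers are fixed numbers, whereas in an actual ViT they are learned parameters. All names and values are hypothetical.

```python
import math

def self_attention(tokens):
    """Toy single-head self-attention with identity projections (no learned weights)."""
    dim = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in tokens]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[d] for w, v in zip(weights, tokens)) / total
                    for d in range(dim)])
    return out

def forward_with_registers(patch_tokens, register_tokens, n_layers=2):
    """Prepend registers, run the toy attention layers, then discard the register outputs."""
    tokens = [list(t) for t in register_tokens] + [list(t) for t in patch_tokens]
    for _ in range(n_layers):
        tokens = self_attention(tokens)
    return tokens[len(register_tokens):]

patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
registers = [[0.1, 0.1], [-0.1, 0.1]]  # learned in a real model; fixed here
features = forward_with_registers(patches, registers)
```

The output has one feature vector per input patch. The registers only serve as extra slots that attention can use during the forward pass.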
