Periodic Reporting for period 1 - APHELEIA (Reconciling Classical and Modern (Deep) Machine Learning for Real-World Applications)
Reporting period: 2023-09-01 to 2026-02-28
Trainable algorithms:
A new functional perspective on bilevel optimization was introduced, enabling the use of overparameterized neural inner solvers without requiring strong convexity. This led to scalable algorithms with demonstrated benefits for meta-learning and hyperparameter optimization. An extension to online learning was also proposed. In addition, a long-standing open problem in inverse problems was solved by providing a rigorous theoretical foundation for using pretrained denoisers within iterative algorithms, bridging the gap between empirical heuristics and mathematical guarantees in image restoration.
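For readers unfamiliar with the setting, the bilevel problem mentioned above can be written generically as follows. This is a textbook-style sketch rather than the project's exact formulation; the symbols $\lambda$ (outer variable, e.g. hyperparameters), $h$ (inner prediction function), $\mathcal{H}$ (a function space), and the objectives $L_{\text{out}}$ and $L_{\text{in}}$ are notation assumed here for illustration.

```latex
\min_{\lambda \in \Lambda} \; L_{\text{out}}\bigl(\lambda, h^{\star}_{\lambda}\bigr)
\quad \text{subject to} \quad
h^{\star}_{\lambda} \in \operatorname*{arg\,min}_{h \in \mathcal{H}} \; L_{\text{in}}(\lambda, h)
```

In a functional view of this kind, the inner minimization is taken over the function $h$ itself rather than over the weights of the network that represents it, so any regularity assumption is placed on $L_{\text{in}}$ as a function of $h$ and not on the (typically non-convex) dependence on network parameters.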
Image processing and inverse problems:
Novel methods for solving inverse problems with small, unpaired datasets were proposed, achieving state-of-the-art results in deblurring, blind super-resolution, and PSF calibration. The team also introduced HySUPP, an open-source framework for hyperspectral unmixing, and SpectralEarth, a large-scale dataset for pretraining hyperspectral foundation models, significantly improving downstream tasks such as land-cover and crop mapping. Finally, the methods developed during the project also led to a state-of-the-art approach for fluorescence microscopy.
Self-supervised learning and visual recognition:
The work addressed several challenges in self-supervised learning and visual model design. This included identifying and fixing instability issues in vision transformers, proposing new architectures for masked image modeling, and establishing reproducible guidelines for distilling large visual models into compact, task-specific students. In addition, a fast, learning-free pipeline (LUDVIG) was introduced for transforming 2D features into 3D scene representations, enabling efficient segmentation and reconstruction.
Astronomy applications:
A novel framework for exoplanet detection was developed, leveraging cross-observation learning to improve sensitivity and robustness. A physically grounded model of speckle noise was also designed, enabling better detection and characterization of faint exoplanets in challenging datasets.
Graph representations:
A new graph transformer architecture was proposed that extends attention mechanisms to per-channel filters and integrates higher-order topological features directly, achieving strong performance on molecular benchmarks without explicit message-passing.
Scalable non-convex optimization:
New approaches were developed to tackle complex non-convex problems. GloptiNets leverage the spectral structure of smooth target functions to build scalable optimizers that also produce certificates of optimality. Further contributions include efficient solvers for spectral unmixing in hyperspectral image processing and advances in counterfactual risk minimization for continuous actions, improving stability and offline policy selection in real-world systems.
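To make the counterfactual risk minimization setting more concrete, the sketch below shows a standard clipped importance-weighted estimator of a target policy's expected loss from logged data with continuous actions. It is a generic illustration, not the estimator developed in the project; the function names, the Gaussian policies, and the `clip` parameter are assumptions made for the example.

```python
import numpy as np


def gaussian_pdf(a, mean, std):
    """Density of a normal distribution, used here as a continuous-action policy."""
    z = (a - mean) / std
    return np.exp(-0.5 * z ** 2) / (std * np.sqrt(2.0 * np.pi))


def clipped_ips_risk(losses, logging_density, target_density, clip=10.0):
    """Clipped importance-weighted estimate of the target policy's expected loss.

    losses:          observed losses for the logged actions
    logging_density: density of each logged action under the logging policy
    target_density:  density of each logged action under the target policy
    clip:            upper bound on the importance weights (hypothetical default)
    """
    weights = target_density / logging_density  # importance weights
    weights = np.minimum(weights, clip)         # clipping trades bias for lower variance
    return np.mean(weights * losses)
```

When the target policy equals the logging policy, all weights are 1 and the estimate reduces to the average logged loss. Lowering `clip` can only decrease the estimate when losses are non-negative, which is one source of the bias that more refined estimators aim to control.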
The first achievement addresses a long-standing open problem in inverse problems and image reconstruction. The work on MAP Estimation with Denoisers provides the first rigorous theoretical foundation for widely used Plug-and-Play (PnP) and Regularization-by-Denoising (RED) algorithms. These methods, which replace hand-crafted priors with powerful deep denoisers, have been highly successful in practice but lacked statistical guarantees. This research shows that they can be rigorously interpreted as performing MAP estimation under mild assumptions, proving convergence rates and explaining practical heuristics such as over-smoothing and damping. By bridging practice and theory, it establishes a solid probabilistic framework for developing reliable, trainable reconstruction methods, with potential impact in domains such as medical imaging and astronomy.
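As a rough illustration of what a Plug-and-Play iteration looks like, the sketch below alternates a gradient step on a least-squares data-fit term with a denoising step. It is a generic PnP proximal-gradient loop, not the algorithm analyzed in the project: a moving-average filter stands in for a pretrained deep denoiser, and the function names, step size, and iteration count are assumptions chosen for the example.

```python
import numpy as np


def moving_average_denoiser(x, width=5):
    """Stand-in denoiser: moving average with edge padding (NOT a trained network).

    Assumes an odd `width` so the output has the same length as the input.
    """
    pad = width // 2
    padded = np.pad(x, pad, mode="edge")
    kernel = np.ones(width) / width
    return np.convolve(padded, kernel, mode="valid")


def pnp_proximal_gradient(y, A, denoiser, step=0.5, n_iters=50):
    """Plug-and-Play proximal gradient: the proximal step is replaced by a denoiser.

    y:        measurements, modeled as y = A @ x + noise
    A:        forward operator given as a dense matrix
    denoiser: callable mapping a signal to a denoised signal
    """
    x = A.T @ y  # simple initialization from the back-projected measurements
    for _ in range(n_iters):
        grad = A.T @ (A @ x - y)       # gradient of 0.5 * ||A x - y||^2
        x = denoiser(x - step * grad)  # denoiser takes the place of the prior's prox
    return x
```

In practice `denoiser` would be a trained network and `A` a blur, mask, or other measurement operator. The loop itself does not depend on how the denoiser was obtained, which is what makes these methods convenient in practice and their theoretical interpretation non-trivial.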
The second achievement focuses on astronomy and exoplanet detection. A novel, physically grounded multi-scale statistical model of stellar speckle noise was developed to tackle one of the main obstacles in direct exoplanet imaging. Integrated into an end-to-end learnable detection and flux estimation pipeline, this approach significantly improves sensitivity and robustness, enabling the reliable detection of faint exoplanets that were previously hidden by noise. It is now being applied to large datasets from VLT/SPHERE, opening the door to new exoplanet discoveries and deeper insights into the limitations of current high-contrast imaging systems.
Finally, in self-supervised learning, the introduction of a new generation of visual foundation models has had a transformative impact. Their open release has enabled thousands of researchers to apply them across diverse domains, from biology and medicine to Earth observation and astronomy. This line of work was further extended with a breakthrough study on Vision Transformers (ViTs), which identified instability caused by hidden “background tokens” and proposed a simple architectural solution: register tokens. This fix, now widely adopted, has reshaped the design of modern transformers and earned an Outstanding Paper Award at ICLR 2024, underscoring its broad significance for the machine learning community.
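The register-token idea can be sketched in a few lines: extra tokens are appended to the patch sequence, take part in attention, and are dropped at the output. The example below is a minimal single-head NumPy illustration under assumed names and shapes, not the released model code; in a real ViT the registers would be learned parameters shared across images and the block would include multi-head attention, normalization, and an MLP.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention over a (num_tokens, dim) array."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    weights = softmax(q @ k.T / np.sqrt(x.shape[1]))
    return weights @ v


def block_with_registers(patch_tokens, registers, w_q, w_k, w_v):
    """Append register tokens, apply attention with a residual, then drop the registers."""
    n_patches = patch_tokens.shape[0]
    x = np.concatenate([patch_tokens, registers], axis=0)
    x = x + self_attention(x, w_q, w_k, w_v)
    return x[:n_patches]  # registers are discarded at the output
```

The output keeps the shape of the patch sequence, so downstream heads are unaffected; the registers only give attention additional slots to write to.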