Periodic Reporting for period 1 - MIA-NORMAL (Medical Image Analysis with Normative Machine Learning)
Reporting period: 2023-09-01 to 2026-02-28
The MIA-NORMAL project addresses this unmet need by introducing Normative Representation Learning (NRL), a paradigm shift that equips ML models with a foundational sense of normality. Trained exclusively on large datasets of healthy patients, NRL can identify anomalies without requiring prior knowledge of specific diseases. The project develops a robust theoretical framework and practical methods across cross-sectional, sequential, and multi-modal data types, integrating imaging with clinical records and text to create tools that are generalisable, interpretable, and scalable.
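The underlying principle (learn what is normal from healthy data only, then flag deviations) can be illustrated with a deliberately simple sketch. It is not the project's method: NRL relies on learned representations of images, sequences and text, whereas this toy example uses per-feature statistics on made-up numbers. The function names, the feature values and the threshold are all hypothetical and chosen only for illustration.

```python
from statistics import mean, stdev


def fit_normative_model(healthy_samples):
    """Estimate a per-feature mean and standard deviation from healthy data only."""
    columns = list(zip(*healthy_samples))
    return [(mean(col), stdev(col)) for col in columns]


def anomaly_score(model, sample):
    """Return the largest absolute z-score of the sample against the healthy reference."""
    return max(
        abs(x - mu) / max(sigma, 1e-8)  # floor avoids division by zero
        for x, (mu, sigma) in zip(sample, model)
    )


# Made-up feature vectors standing in for measurements from healthy scans.
healthy = [
    [1.0, 10.0],
    [1.2, 11.0],
    [0.8, 9.0],
    [1.1, 10.5],
    [0.9, 9.5],
]
model = fit_normative_model(healthy)

THRESHOLD = 3.0  # hypothetical cut-off, not taken from the project

typical = [1.0, 10.2]  # close to the healthy reference
unusual = [3.0, 10.0]  # first feature far outside the healthy range

typical_score = anomaly_score(model, typical)
unusual_score = anomaly_score(model, unusual)

print("typical flagged:", typical_score > THRESHOLD)
print("unusual flagged:", unusual_score > THRESHOLD)
```

No disease labels are used at any point: the model only ever sees healthy examples, and anything sufficiently far from that reference is flagged, whatever its cause.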
The expected impact is to enable reliable anomaly detection and confirmation of normality at the point of care. By bringing automated expertise to the patient’s side, NRL will democratise access to advanced diagnostics, reduce the burden on specialists, and accelerate access to early, patient-specific, preventive medicine. Ultimately, this will help transform healthcare into a more efficient, equitable, and proactive system that can better meet the growing demands of modern medicine.
Aim 1: Normative Representation Learning Theory
A major advance was D³GM (Dynamical Systems Driven Diffusion Generative Models), introduced at NeurIPS 2024. D³GM reformulates diffusion sampling as a measure-preserving random dynamical system, improving stability and generalisability in inverse problems. This directly addresses instability and reproducibility concerns in generative models. Complementary work at CVPR 2025 introduced the Image Retrieval Score (IRS) and Diversity-Aware Diffusion Models (DiADM), exposing and mitigating diversity collapse. Together, these advances provide principled metrics and modelling strategies ensuring that NRL frameworks capture the full variability of healthy anatomy.
Aim 2: Cross-sectional NRL
The project achieved top performance in international benchmarks such as MICCAI MOOD and VLM3D, confirming the strength of NRL-based anomaly detection. These methods have also been adopted in industrial defect detection, highlighting transferability. Within medicine, new approaches include multi-task synthetic anomaly training and L-FUSION (2025), a fetal ultrasound segmentation framework that combines NRL priors with foundation model features. L-FUSION improves segmentation, generates counterfactual healthy outputs, and provides uncertainty maps for routine quality assurance.
Aim 3: Sequential NRL
Progress in temporal modelling has been published at CVPR and MICCAI. Contributions include unsupervised pose estimation for adult and infant motion, new biomarkers for neonatal care, and improved generative video models addressing diversity collapse in physiological motion synthesis. NRL-based video and dataset summarisation frameworks reduce storage needs and streamline retrospective data analysis. We are thus working towards sequential NRL models that autonomously capture temporal dependencies overlooked by supervised models.
Aim 4: Multi-modal NRL
The project collaborates with CT-RATE, establishing a dataset of many thousands of CT volumes paired with reports, enabling NRL-based vision-language models that surpass supervised baselines for multi-abnormality detection. Beyond radiology, we extended NRL into computational pathology, focusing on kidney transplant assessment. A new dataset combining gigapixel whole-slide images, pathology reports, and molecular data, intended to support the development of NRL-driven pathology tools, is work in progress. Our models capture normative tissue across scales and enable robust data quality control and multi-modal integration.
So far, MIA-NORMAL has delivered theoretical advances in generative modelling, benchmark-leading anomaly detection, advances in temporal modelling, and new multi-modal resources spanning imaging, pathology, and text. These outputs have already led to high-impact publications, benchmark leadership, and open-source tools, validating NRL as a foundation for future clinical AI.
For cross-sectional NRL, the project achieved top performance in benchmarks such as MICCAI MOOD and VLM3D. Data imputation methods developed in the project were adopted in other domains such as industrial defect detection, showing transferability. These results set a benchmark for unsupervised anomaly detection and demonstrate how NRL can provide clinically relevant triage without annotated datasets.
Sequential NRL progressed through CVPR and IEEE papers. One introduced unsupervised pose estimation for adult and infant motion, suggesting biomarkers in neonatal care. Another addressed diversity collapse in video synthesis, improving the representation of physiological motion. A reinforcement-learning-based video summarisation method further reduced storage and reporting costs in ultrasound. Together, these advances show how sequential NRL captures temporal dependencies invisible to supervised models.
In multi-modal NRL, we collaborated on the CT-RATE dataset, the first dataset of more than 50,000 CT volumes paired with radiology reports. This enabled models that surpass supervised baselines in multi-abnormality detection without manual annotation. Building on this, we showed how joint latent spaces for images and text can support anomaly detection, report generation, and retrospective analysis. These advances lay the foundation for NRL vision-language foundation models in medical imaging.
An unexpected outcome was the rise of agentic AI systems equipped with NRL awareness. Our work on Bayesian Decoding Games introduced a game-theoretic framework for consistency in language models, while our MICCAI paper on MESHAgents presented a collaborative system that autonomously identified imaging phenotypes and confounders in cardiovascular cohorts. This work stream shows that normative embeddings can ground autonomous agents as collaborators in biomedical discovery. This inspires further proof-of-concept work, which aims to extend NRL-aware agentic AI to oncology, with pancreatic cancer as the first application.