Periodic Reporting for period 1 - TAIPO (Trustworthy AI tools for personalized oncology)
Reporting period: 2023-05-01 to 2025-10-31
Reliable behaviour of AI systems under real-world conditions is particularly critical in oncology, where treatment decisions have profound consequences. Clinicians need AI tools that not only provide accurate predictions but also reliably communicate uncertainty - tools that "know when they don't know."
TAIPO (Trustworthy AI tools for personalized oncology) addresses this gap through two objectives:
Objective 1: Develop trustworthy AI tools for cancer diagnosis and patient stratification. We assess and enhance model reliability across diverse clinical scenarios through novel auditing methods (ModelAuditor), model transformation techniques (ModelTransformer), and transparent risk communication (ModelMonitor).
Objective 2: Establish frameworks for robust modeling of therapy decisions and outcomes, extending trustworthiness principles to survival prediction and therapy recommendation with meaningful uncertainty estimates.
We demonstrate broad applicability across different cancer types, including melanoma and blood cancers, and across diverse data modalities.
We expect impact along two axes:
Translational Impact: By enabling reliable uncertainty communication with open-source tools, we support efficient human-AI collaboration in which clinicians rely on AI for routine cases while focusing their expertise on complex ones (see the sketch after this list).
Methodological Impact: Our auditing framework and theoretical advances (bias-variance-covariance decomposition) establish new standards for AI trustworthiness.
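As a simple illustration of this collaboration pattern, the sketch below routes cases by predictive confidence; the confidence threshold and the model output format are hypothetical placeholders of ours, not part of the TAIPO tooling.

```python
# Minimal sketch of uncertainty-gated triage: the AI handles confident,
# routine cases and defers uncertain ones to a clinician.
# The 0.95 threshold and probability-matrix interface are illustrative
# assumptions, not TAIPO components.
import numpy as np

def triage(probabilities: np.ndarray, threshold: float = 0.95):
    """Split cases into an AI-handled queue and a clinician-review queue."""
    confidence = probabilities.max(axis=1)             # top-class probability per case
    automated = np.where(confidence >= threshold)[0]   # routine, confident cases
    deferred = np.where(confidence < threshold)[0]     # complex, uncertain cases
    return automated, deferred

# Example: three cases; the second is too uncertain to automate.
probs = np.array([[0.98, 0.02], [0.55, 0.45], [0.03, 0.97]])
auto_idx, defer_idx = triage(probs)
print(f"AI handles cases {auto_idx}, clinician reviews cases {defer_idx}")
```

The key design choice is that the workload split is driven entirely by the model's own uncertainty, which is why calibrated confidence estimates are a precondition for this mode of collaboration.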
Our progress demonstrates that these goals are achievable: successful framework development, novel insights into uncertainty quantification for generative models and foundation model calibration, and clinical validation in lymphoma stratification show a path by which trustworthy AI can become clinical reality.
Development of AI Trustworthiness Framework
We successfully developed ModelAuditor, an AI agent that evaluates whether AI systems work reliably in real hospital settings. This toolkit systematically tests how AI models perform when conditions change - such as when different hospitals use different imaging equipment or serve different patient populations. This addresses a critical challenge: AI systems performing perfectly in one hospital may fail in another due to these variations.
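To illustrate the kind of test such an audit performs, the sketch below applies simulated acquisition shifts and reports the resulting accuracy drop; the corruption functions and the `model.predict` interface are simplified assumptions of ours, not the actual ModelAuditor API.

```python
# Sketch of a distribution-shift audit: apply simulated acquisition shifts
# (scanner noise, intensity changes) and track how accuracy degrades.
# Corruptions and the `model.predict` interface are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def gaussian_noise(x, sigma=0.1):
    """Simulate sensor noise from different imaging equipment."""
    return x + rng.normal(0.0, sigma, size=x.shape)

def intensity_shift(x, delta=0.2):
    """Simulate brightness differences between acquisition protocols."""
    return np.clip(x + delta, 0.0, 1.0)

def audit(model, images, labels):
    """Report accuracy on clean data and under each simulated shift."""
    shifts = {"clean": lambda x: x,
              "gaussian_noise": gaussian_noise,
              "intensity_shift": intensity_shift}
    report = {}
    for name, corrupt in shifts.items():
        preds = model.predict(corrupt(images))
        report[name] = float((preds == labels).mean())
    return report  # e.g., {"clean": 0.93, "gaussian_noise": 0.81, ...}
```

A large gap between the clean score and any shifted score flags exactly the failure mode described above: a model that works in one hospital but degrades under another site's acquisition conditions.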
Discovery in AI Behavior
While developing our framework, we made an unexpected discovery with significant implications for medical AI. We found that the newest generation of AI systems, known as foundation models, exhibit fundamentally different patterns when expressing uncertainty compared to traditional AI systems. This discovery may influence how we design tools to ensure AI reliability in clinical settings.
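One standard way to make such differences in uncertainty behaviour measurable is the expected calibration error (ECE), which compares a model's stated confidence with its observed accuracy. The binned estimator sketched below is generic and not specific to our foundation-model analysis.

```python
# Standard binned estimator of expected calibration error (ECE):
# a model is calibrated if, among cases predicted with confidence p,
# roughly a fraction p is actually correct.
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    confidence = probs.max(axis=1)                  # top-class probability
    predictions = probs.argmax(axis=1)
    accuracy = (predictions == labels).astype(float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidence > lo) & (confidence <= hi)
        if mask.any():
            gap = abs(accuracy[mask].mean() - confidence[mask].mean())
            ece += mask.mean() * gap                # weight gap by bin occupancy
    return ece
```

Computing ECE for a foundation model and a conventional baseline on the same test set turns the qualitative observation of "different uncertainty patterns" into a number that can be tracked across deployments.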
Clinical Applications and Validation
We successfully applied our trustworthy AI framework to real medical challenges. Most notably, we developed a tool for identifying cancer subtypes and validated it for classifying lymphoma patients (DLBCL), helping doctors identify which patients might benefit from specific treatments. This demonstrates that our approach works effectively in actual clinical settings.
Advanced Methods for Predictions and Uncertainty Quantification
Beyond diagnosis, we are currently developing novel methods for predicting patient outcomes over time. Our survival models will provide predictions about treatment outcomes while communicating their uncertainty.
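As a concrete illustration of uncertainty-aware survival prediction, the sketch below fits a standard Cox proportional-hazards model with the open-source lifelines library and reads off confidence intervals for covariate effects; the library, example dataset, and model choice are our illustrative assumptions and do not represent the novel methods under development.

```python
# Sketch: a classical Cox proportional-hazards model whose coefficient
# confidence intervals communicate uncertainty about covariate effects.
# Uses the lifelines library and its bundled example data; this shows the
# general idea only, not TAIPO's novel survival methods.
from lifelines import CoxPHFitter
from lifelines.datasets import load_rossi

data = load_rossi()                        # example data with time and event columns
cph = CoxPHFitter()
cph.fit(data, duration_col="week", event_col="arrest")

cph.print_summary()                        # hazard ratios with 95% confidence intervals

covariates = data.drop(columns=["week", "arrest"])
survival = cph.predict_survival_function(covariates.iloc[:3])  # per-patient curves
print(survival.head())
```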
A significant achievement was extending our calibration framework beyond classification tasks. We introduced a novel bias-variance-covariance decomposition of kernel scores applicable to generative models, published at ICML 2024. This theoretical advance provides a unified framework for uncertainty quantification across diverse AI architectures, including large language models and image generation systems. This helps clinicians understand when to trust AI recommendations versus relying on their clinical expertise.
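For intuition, the classical analogue of this result decomposes the expected squared error of an ensemble of M predictors into averaged bias, variance, and covariance terms (a well-known result due to Ueda and Nakano). Our kernel-score decomposition generalizes this structure beyond squared error; the exact kernel-score form is given in the ICML 2024 paper rather than reproduced here.

```latex
% Classical bias-variance-covariance decomposition (Ueda & Nakano) for an
% ensemble average \bar{f} = \tfrac{1}{M}\sum_{m=1}^{M} f_m of M predictors:
\mathbb{E}\big[(\bar{f} - y)^2\big]
  = \overline{\mathrm{bias}}^2
  + \tfrac{1}{M}\,\overline{\mathrm{var}}
  + \big(1 - \tfrac{1}{M}\big)\,\overline{\mathrm{cov}},
% with the averaged terms defined as
\overline{\mathrm{bias}} = \tfrac{1}{M}\sum_{m}\big(\mathbb{E}[f_m] - y\big),\quad
\overline{\mathrm{var}} = \tfrac{1}{M}\sum_{m}\mathrm{Var}(f_m),\quad
\overline{\mathrm{cov}} = \tfrac{1}{M(M-1)}\sum_{m \neq m'}\mathrm{Cov}(f_m, f_{m'}).
```

In both the classical and the kernel-score setting, the covariance term captures how redundancy among ensemble members inflates error, which is why diverse ensembles yield more informative uncertainty estimates.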
Setting International Standards
Our work contributed to new international recommendations for evaluating medical AI, published in Nature Methods. By leading the calibration component of this community-wide effort, we helped establish standards that influence how practitioners worldwide assess whether AI tools are safe and effective.
Progress Toward Project Goals
These achievements represent significant progress toward creating AI tools that enhance rather than replace clinical expertise. By developing methods to quantify and improve AI trustworthiness, we are bridging the gap between research results and practical tools that have the potential to improve patient care.
The TAIPO project has yielded significant results with strong potential for clinical impact. Our ModelAuditor toolkit provides a comprehensive solution for evaluating AI trustworthiness under real-world clinical conditions, addressing a critical barrier to adoption. The discovery that foundation models for classification exhibit fundamentally different calibration properties has implications beyond our project, potentially influencing how AI is deployed across diverse use cases.
Our theoretical advances, particularly the bias-variance-covariance decomposition (ICML 2024), establish a mathematical foundation for uncertainty quantification across diverse AI architectures, applicable to large language models and generative image models. The clinical validation for lymphoma patient stratification (currently in revision) demonstrates real-world impact.
Our contributions to the Metrics Reloaded recommendations (Nature Methods 2024) ensure our research directly influences international standards for medical AI evaluation, reaching thousands of practitioners worldwide. These recommendations also formed the basis for ModelAuditor.
Key Needs for Further Uptake
Regulatory Integration: Engagement with regulatory bodies (EMA) would help to establish formal standards for AI trustworthiness evaluation based on our frameworks.
Multi-center Validation: While our lymphoma application shows promise, broader clinical validation across multiple institutions is needed to demonstrate generalizability.
Clinical System Integration: Seamless integration with existing hospital IT infrastructure remains challenging. Partnerships with EHR vendors would help to embed our tools into clinical workflows.
Commercialization Strategy: We are exploring spin-off opportunities for our software tools while maintaining open-source components for academic use.
Continued Research: The rapid evolution of AI architectures, particularly multimodal foundation models, requires sustained research efforts to maintain relevance.