Advanced machine learning for Innovative Drug Discovery

Project information

AIDD

Grant agreement ID: 956832

DOI

10.3030/956832

Project closed

EC signature date: 24 August 2020

Start date: 1 January 2021

End date: 31 March 2025

Funded under

EXCELLENT SCIENCE - Marie Skłodowska-Curie Actions

Total cost

€3,926,573.39

EU contribution

€3,926,573.39


Coordinated by

HELMHOLTZ ZENTRUM MUENCHEN DEUTSCHES FORSCHUNGSZENTRUM FUER GESUNDHEIT UND UMWELT GMBH
Germany

Periodic Reporting for period 2 - AIDD (Advanced machine learning for Innovative Drug Discovery)

Reporting period: 2023-01-01 to 2025-03-31

AIDD was a Marie Skłodowska-Curie Initial Training European Industrial Doctorate Network at the interface of chemistry, computer science, and life sciences. It provided well-structured multidisciplinary training and educated highly sought-after machine learning specialists capable of working in interdisciplinary and international research and business settings. Cornerstones of AIDD’s curriculum included online lectures and periodic schools delivered by internationally leading experts drawn from a balanced consortium of researchers in academia, SMEs, and large companies. The scientific goals of the AIDD project were:

1) To develop a suite of interoperable, open-source modular AI tools to support core tasks in computational chemistry and drug discovery, tailored primarily for the pharmaceutical sector.
2) To optimise de novo drug design through multimodal data integration, combining diverse data types to improve molecular design.
3) To implement AI-driven toxicity and side-effect prediction pipelines by developing robust AI modules capable of filtering out candidate molecules with undesirable properties such as toxicity, non-specificity, and adverse side effects.
4) To push innovation in reaction prediction and retrosynthetic planning, streamlining synthetic chemistry.
5) To create next-generation, ML-enhanced tools that accelerate Molecular Dynamics (MD) simulations for binding affinity prediction.
6) To integrate explainability and expert knowledge into ML models, developing methodologies to incorporate human expert reasoning and provide interpretable AI outputs.

AIDD has produced a robust and interoperable suite of open-source tools addressing all of these aims, collectively forming the “One Chemistry” framework, which holds strong exploitation potential across the pharmaceutical industry and academic research.

Here we summarise the contributions of the early-stage researchers (ESRs) to each of these goals, before describing dissemination, communication, and outreach activities.

(1) In developing interoperable, open-source AI tools for tasks such as property prediction, molecule generation, and synthesis planning, ESRs 1–15 contributed modular tools to the “One Chemistry” platform, with integration led by ESRs 1 and 2. These tools were designed for flexibility and reuse across diverse research contexts.
(2) In de novo drug design and multimodal data integration, ESRs 13 and 15 enriched AI models with quantum data, enhancing ADME prediction and enabling inverse molecular design. ESR9 incorporated expert knowledge into generative workflows [3], while ESR5’s PILOT and PoLiGenX tools integrated 3D geometry, binding pockets, and latent molecular features. ESR14 aligned molecular, transcriptomic, and imaging data using contrastive learning to improve toxicity and activity prediction. ESR6 advanced compound prioritization by integrating microscopy images with annotations, and ESR8 developed CLOOME, a contrastive learning framework combining molecular graphs and Cell Painting images to bridge chemical and biological data for drug design.
(3) In toxicity and side-effect prediction, ESR5 applied 3D equivariant GNNs to enhance biological relevance in toxicity modeling. ESRs 9 and 11 co-developed E-GuARD to improve robustness in detecting interference effects, while ESR16 integrated ADMETox filtering into a triaging pipeline at VPC. ESR14 developed multiple toxicity models, addressed dataset imbalance, and introduced novel training schemes for better DILI prediction. ESR6 used phenotypic screening data for early compound triaging, and ESR8’s CLOOME embeddings supported activity prediction and MoA classification, indirectly aiding toxicity profiling.
(4) In reaction and retrosynthesis prediction, ESRs 3 and 7 collaborated on retrosynthesis models and benchmarking tools, leading to the Modelzoo package and its integration into AiZynthFinder. ESR7 also developed a multi-objective Monte Carlo tree search (MCTS) algorithm, while ESR3 focused on convergent multi-step synthesis. ESR4 worked on reaction yield prediction, revealing challenges in model transferability. ESR12 contributed to reagent modeling and visualization tools, supporting integration with retrosynthesis workflows. ESR1 evaluated transformer efficiency in retrosynthesis, and ESR9 began adapting tools to real lab conditions using expert feedback.
(5) ESR15 advanced molecular dynamics (MD) by developing ML-enhanced methods for electrostatics and free energy calculations with DFT-level accuracy, reducing reliance on manual parameterization.
(6) In explainability, ESR9 integrated expert input into molecule generation and co-developed E-GuARD for interpretable data augmentation. ESR1 assessed XAI limitations in toxicity models, while ESR14’s active learning framework improved data efficiency. ESR10 contributed to model trust through calibrated uncertainty, and ESR7’s retrosynthesis algorithm and design workflow embedded expert reasoning. ESR12 enabled visual exploration of reagent space and improved model inference time. ESR13’s work on Graphormers enhanced explainability by revealing how models encode molecular properties.

AIDD’s training program included six PhD schools across Europe, covering cheminformatics, advanced ML and computational methods, and experimental techniques, as well as training in communication, IP, research ethics, and entrepreneurship. ESRs and PIs presented at top conferences (ICLR, ICML, NeurIPS) and at public events such as Pint of Science and Lange Nacht der Forschung. AIDD also co-organised ICANN2024, the 1st AIDD workshop at ICANN2024, the Tox24 Challenge, and a Special Issue of J. Cheminformatics (https://www.biomedcentral.com/collections/AIDR). Dissemination included press releases from TU Dortmund, JKU Linz, SUPSI, and others. AIDD’s Twitter/X account posted 484 tweets, reaching 198,000 views, with 200 reposts and 310 likes. The LinkedIn group shared 60 posts, earning 3,731 likes and 223 reposts. Publications are available at https://scholar.google.com/citations?user=exVyEcUAAAAJ, with 1,244 citations to date.

AIDD successfully trained 16 cross-disciplinary experts in advanced ML/AI for the life sciences. Six have completed their PhDs and secured roles at leading universities, pharmaceutical companies, and start-ups, with the rest expected to defend by mid-2026. The project produced a modular, open-source suite of tools designed for seamless integration into industrial pipelines. Notably, the Modelzoo package has been adopted into AstraZeneca’s AiZynthFinder, and several ESRs have continued their research within industry, demonstrating direct uptake and exploitation of AIDD outputs. AIDD’s contributions represent state-of-the-art advances in AI for drug discovery, including graph transformers pretrained on quantum properties, convergent retrosynthesis algorithms, equivariant diffusion-based and human-in-the-loop generative design, and the integration of chemical structures with phenotypic imaging data. These innovations are accessible through open-source code and open-access publications, supporting further research, teaching, and cross-disciplinary exploration. Societally, AIDD’s tools accelerate early-stage drug discovery, improve compound filtering to reduce animal testing, and promote safer, more efficient drug development. Public access to these validated AI tools fosters a more equitable and impactful model of scientific innovation.

AIDD concept

TMAP visualization of the chemical space of frequent hitters.

Convergent Route Prediction

Human-in-the-loop AL for molecular generation

CLOOME

Active Learning with BERT

t-SNE maps of reaction classes in USPTO and Reaxys (https://doi.org/10.1039/D2SC06798F)

Invariant Graph Neural Networks for Ligand Generation

Pretraining Graphormers with Quantum Properties for ADMET Modelling

Equivariant Graph Neural Networks for Toxicity Prediction

Workflow used to build the winning model of the Kaggle Challenge

Announcement of Tox24 Challenge organised during ICANN2024 (https://e-nns.org/icann2024/challenge/)

