Periodic Reporting for period 2 - AIDD (Advanced machine learning for Innovative Drug Discovery)
Reporting period: 2023-01-01 to 2025-03-31
1) To develop a suite of interoperable, open-source modular AI tools to support core tasks in computational chemistry and drug discovery, tailored primarily for the pharmaceutical sector.
2) To optimise de novo drug design by integrating multi-modal data, so that molecular design draws on diverse data types.
3) To implement AI-driven toxicity and side-effect prediction pipelines by developing robust AI modules capable of filtering out candidate molecules with undesirable properties such as toxicity, non-specificity, and adverse side effects.
4) To push innovation in reaction prediction and retrosynthetic planning, streamlining synthetic chemistry.
5) To create next-generation, ML-enhanced tools that accelerate Molecular Dynamics (MD) simulations for binding affinity prediction.
6) To integrate explainability and expert knowledge into ML models, developing methodologies to incorporate human expert reasoning and provide interpretable AI outputs.
AIDD has produced a robust and interoperable suite of open-source tools addressing all of these aims, collectively forming the “One Chemistry” framework, which holds strong exploitation potential across the pharmaceutical industry and academic research.
(1) In developing interoperable, open-source AI tools for tasks like property prediction, molecule generation, and synthesis planning, ESRs 1–15 contributed modular tools to the “One Chemistry” platform, with integration led by ESRs 1 and 2. These tools were designed for flexibility and reuse across diverse research contexts.
(2) In de novo drug design and multimodal data integration, ESRs 13 and 15 enriched AI models with quantum data, enhancing ADME prediction and enabling inverse molecular design. ESR9 incorporated expert knowledge into generative workflows [3], while ESR5’s PILOT and PoLiGenX tools integrated 3D geometry, binding pockets, and latent molecular features. ESR14 aligned molecular, transcriptomic, and imaging data using contrastive learning to improve toxicity and activity prediction. ESR6 advanced compound prioritization by integrating microscopy images with annotations, and ESR8 developed CLOOME, a contrastive learning framework combining molecular graphs and Cell Painting images to bridge chemical and biological data for drug design.
(3) In toxicity and side-effect prediction, ESR5 applied 3D equivariant GNNs to enhance biological relevance in toxicity modeling. ESRs 9 and 11 co-developed E-GuARD to improve robustness in detecting interference effects, while ESR16 integrated ADMETox filtering into a triaging pipeline at VPC. ESR14 developed multiple toxicity models, addressed dataset imbalance, and introduced novel training schemes for better DILI prediction. ESR6 used phenotypic screening data for early compound triaging, and ESR8’s CLOOME embeddings supported activity prediction and MoA classification, indirectly aiding toxicity profiling.
(4) In reaction and retrosynthesis prediction, ESRs 3 and 7 collaborated on retrosynthesis models and benchmarking tools, leading to the Modelzoo package and its integration into AiZynthFinder. ESR7 also developed a multi-objective MCTS algorithm while ESR3 focused on convergent multi-step synthesis. ESR4 worked on reaction yield prediction, revealing challenges in model transferability. ESR12 contributed to reagent modeling and visualization tools, supporting integration with retrosynthesis workflows. ESR1 evaluated transformer efficiency in retrosynthesis, and ESR9 began adapting tools to real lab conditions using expert feedback.
(5) In molecular dynamics (MD) simulations, ESR15 developed ML-enhanced methods for electrostatics and free energy calculations with DFT-level accuracy, reducing reliance on manual parameterization.
(6) In explainability, ESR9 integrated expert input into molecule generation and co-developed E-GuARD for interpretable data augmentation. ESR1 assessed XAI limitations in toxicity models, while ESR14’s active learning framework improved data efficiency. ESR10 contributed to model trust through calibrated uncertainty, and ESR7’s retrosynthesis algorithm and design workflow embedded expert reasoning. ESR12 enabled visual exploration of reagent space and improved model inference time. ESR13’s work on Graphormers enhanced explainability by revealing how models encode molecular properties.
AIDD’s training program included six PhD schools across Europe, covering cheminformatics, advanced ML and computational methods, and experimental techniques, as well as training in communication, IP, research ethics, and entrepreneurship. ESRs and PIs presented at top conferences (ICLR, ICML, NeurIPS) and at public events such as Pint of Science and Lange Nacht der Forschung. AIDD also co-organised ICANN2024, the 1st AIDD workshop at ICANN2024, the Tox24 Challenge, and a Special Issue of J. Cheminformatics (https://www.biomedcentral.com/collections/AIDR). Dissemination included press releases from TU Dortmund, JKU Linz, SUPSI, and others. AIDD’s Twitter/X account posted 484 tweets, reaching 198,000 views, with 200 reposts and 310 likes. The LinkedIn group shared 60 posts, earning 3,731 likes and 223 reposts. Publications are available at https://scholar.google.com/citations?user=exVyEcUAAAAJ with 1,244 citations to date.