Periodic Reporting for period 1 - AiChemist (Explainable AI for Molecules - AiChemist)
Reporting period: 2023-09-01 to 2025-08-31
The main objectives of AiChemist are:
1. Develop and benchmark explainable molecular, reaction and protein representations that improve accuracy, speed and applicability domain versus conventional physics-based/ML baselines.
2. Advance mechanistic and quantum-informed models (e.g. reaction-outcome predictors, QM-derived descriptors) to ground AI decisions in chemical theory.
3. Bridge AI outputs and chemical intuition through practical explainable AI (XAI) workflows for toxicity, drug response and reaction design—including uncertainty, multi-objective trade-offs, and human-interpretable rationales.
4. Validate on public and proprietary datasets, and release open, privacy-aware tools.
5. Train doctoral candidates (DCs) through coordinated schools and secondments spanning academia and pharma, with regulators involved in the supervisory board, to ensure durable uptake and technology transfer.
By improving trust, portability and efficiency of AI across discovery pipelines, AiChemist aims to reduce experimental iterations and compute budgets; enable safer medicines and chemicals via interpretable toxicity predictions; protect proprietary data while encouraging model exchange; and cultivate a new cohort of researcher-innovators fluent in XAI, open science and responsible research. The expected gains—faster, cheaper and greener design with explanations that chemists and regulators can use—position AiChemist to contribute to Europe’s strategic goals for innovation, safety and sustainability.
In WP2, an automated meta-MD workflow rediscovered the full catalytic cycle of a Buchwald–Hartwig coupling, an important first for complex organometallic mechanisms, and is now being extended to challenging substrates. We curated large condition datasets (~120k amide, ~50k Buchwald–Hartwig, ~20k Suzuki) and learned “condition fingerprints” that cluster settings yielding similar outcomes; combined with CGR reaction features, these embeddings improve feasibility modelling and enable virtual condition screening for practical recommendation. We also delivered multifidelity workflows that automate SMILES→DFT descriptors and a pharmacophore-representation calculator to support low-data selection.
In WP3, we produced a deployable multi-target nano-QSAR model (NanoToxRadar) and initiated a deep model of gut-microbiome drug metabolism using PLM-derived bacterial embeddings. We advanced explainable toxicity modelling by benchmarking five XAI methods on SMILES encoders to establish faithful, consistent attributions, and then used the most reliable signals, augmented with global physicochemical/QM descriptors (e.g. HOMO–LUMO gap, logP), to steer generation of new chemotypes for mutagenicity and cardiotoxicity. We have also built subpopulation-aware cardiotoxicity models from curated FAERS datasets using rigorous nested cross-validation and design choices optimised against DICTrank, and introduced an interpretable multi-instance learning workflow (“MILK”) that quantifies conformer importance for activity to support trustworthy explanations.
Furthermore, AiChemist has co-led two crowdsourcing challenges, namely Tox24 and the 2nd Joint EU-Openscreen/SLAS Challenge. Such challenges accelerate AI4Science research by uniting diverse expertise to develop, benchmark and validate models on real-world scientific data, fostering innovation and reproducibility.