CORDIS - EU research results

Advanced machine learning for Innovative Drug Discovery

Periodic Reporting for period 1 - AIDD (Advanced machine learning for Innovative Drug Discovery)

Reporting period: 2021-01-01 to 2022-12-31

AIDD (http://ai-dd.eu) is a Marie Skłodowska-Curie Innovative Training Network - European Industrial Doctorate (EID) dedicated to the interdisciplinary field at the interface of chemistry and machine learning, with the goal of advancing medicinal chemistry through innovative informatics methods. The main goal of AIDD is to train a new generation of scientists equipped to face the challenges and exploit the opportunities of the field, and to contribute a "One Chemistry" open source software that can predict outcomes ranging from different properties to molecule generation and synthesis.
The work in the AIDD project is structured into three main research work packages (WPs) comprising sixteen individual research topics.

Within the first WP, “Beyond classical machine learning”, the AIDD codebase was developed. The data required for individual projects in all WPs were collected. Promiscuity prediction models were developed. The proteo-chemometric modelling (PCM) of drug-target interactions, in which a HyperNetwork predicts the parameters of a bioactivity prediction model, achieved state-of-the-art results on an established benchmark. Last but not least, AIDD contributed to the team that won the Kaggle First EUOS/SLAS Joint Compound Solubility Challenge.

The “Machine learning for intelligent compound design” WP concentrated on approaches and algorithms to generate or profile chemical compounds. Specifically, work focused on different representations of chemical space, image-based phenotype space, or both, which separately or jointly aim to improve the modelling of project-specific or generic properties. Few-shot learning approaches were used to model activities from only a small amount of data, and their efficacy was investigated. The dose- and time-dependent toxicity space was analysed to improve the prediction of drug-induced liver injury (DILI). In addition to 2D representations, we studied the potential of equivariant graph neural networks to predict chemical toxicity and to provide results similar to those of traditional 2D methods. Combining structural information on molecules with microscopy imaging data was shown to yield transferable representations, as assessed by linear probing for activity prediction, cross-modal retrieval and a zero-shot image classification task.

Within the “Synthesis and prediction of chemical compounds” WP, work focused on yield prediction, where we showed that even state-of-the-art models are currently unable to predict the numerical value of the yield, so classification approaches could be more suitable for this task. The link between single-step and multi-step evaluation metrics was assessed, and practical suggestions for using machine learning models to plan chemical syntheses were provided, along with benchmarking results for four state-of-the-art retrosynthesis prediction methods. A method for completing missing reagents in reactions was developed and published in a high-ranking journal. Last but not least, a method was developed for predicting transition regions, which interpolates reaction-like behaviour and provides estimates of energy barriers.
Within the Curriculum WP, structured training was provided to fellows in four Schools, which included public broadcasting of the non-confidential presentations. The online curriculum included lectures from invited speakers as well as from fellows, who thereby trained their presentation skills and disseminated the results of their work via Zoom to an audience outside the consortium. The consortium website, https://ai-dd.eu, was developed and is used, together with the AIDDONE Twitter account (https://twitter.com/AiddOne), as the main dissemination hub for project activities. Presentations at conferences and congresses, including invited lectures, contributed to further dissemination of project results.
The project has developed several innovative approaches, such as a methodology to predict missing chemical reactants, and has identified requirements for using retrosynthesis models in multistep reaction prediction scenarios. Together with the anticipated developments in yield and reactivity prediction, these findings will be important for developing better computer-assisted organic synthesis systems that facilitate the synthesis of new molecules. We envisage that these models and algorithms will be used daily by chemists in the pharmaceutical industry to reduce the time from idea to synthesis and to save on the chemicals used. The Kaggle First EUOS/SLAS Joint Compound Solubility Challenge allowed us to identify the winning strategy for predicting physico-chemical properties of compounds. Both sets of findings could have a socio-economic impact by speeding up drug discovery and the design of compounds with favourable physico-chemical properties.
AIDD concept
TMAP visualization of frequent hitters chemical space.
t-SNE maps of reactions in USPTO and Reaxys by reaction class https://doi.org/10.1039/D2SC06798F
Missing reactants can degrade reaction outcome prediction https://doi.org/10.1039/D2SC06798F
Workflow used to build the winning Kaggle Challenge model