Periodic Reporting for period 2 - BIGCHEM (Big Data in Chemistry)
Reporting period: 2018-01-01 to 2019-12-31
Within the target discovery and validation phase of drug discovery, BIGCHEM exploited Big Data in machine-learning models for compound activity prediction (ESR1, ESR2), with a special emphasis on model interpretation (ESR1). BIGCHEM's work on new technologies for chemical high-throughput virtual screening (ESR2, ESR3, ESR4) made the screening process more efficient in terms of both time and cost. This work also contributed to a deeper understanding of compound promiscuity, the property of chemical compounds to be active against multiple targets, and produced tools to detect and filter promiscuous compounds with unwanted activity (ESR9, ESR4, ESR5), allowing better interpretation of the results of screening assays.
Furthermore, the BIGCHEM fellows created innovative in silico methods for the visualization and analysis of large-scale compound datasets (ESR3, ESR4), facilitating the detection of new chemical leads in high-throughput screening campaigns.
The BIGCHEM fellows' scientific output also had a notable influence on the lead optimization and de novo design phase. The proposal of new generative models for the creation of compounds with desired properties (ESR6, ESR7, ESR8) and the development of innovative approaches for the planning of synthetic routes (ESR7, ESR9) were key to transforming and advancing this phase.
BIGCHEM also contributed to the creation of new data sharing methodologies, in particular methods to share ADMETox properties by means of graph convolutional deep neural networks and by using Matched Molecular Pairs (ESR10).
The above-mentioned BIGCHEM outcomes were presented at 65 scientific conferences and events and resulted in 52 publications (see http://bigchem.eu). The project also co-organised the Strasbourg Summer School on Chemoinformatics in 2018 and the International Conference on Artificial Neural Networks (ICANN2019) in Munich in 2019, and edited a special issue on “Big data in Chemistry” in the Journal of Cheminformatics. BIGCHEM has had a strong impact on the scientific community: its publications were cited more than 600 times in 2018 alone according to Google Scholar and include one “Hot paper” as well as four “highly cited” articles (source: Web of Science). All ESRs were enrolled in the PhD programs of their respective universities. Three fellows have already received their PhD degrees and the others are working towards theirs.
More specifically, BIGCHEM's contribution to improving the understanding of the decisions of complex machine-learning models is of practical relevance in pharmaceutical research (ESR1). The same is true of the newly developed strategies for combining data sets from heterogeneous sources, which expand the application scenarios for machine learning-based predictive modeling in drug discovery (ESR2). The virtual screening tools developed by BIGCHEM fellows (ESR3) were successfully used in various academic and industrial projects. In particular, the Hierarchical Generative Topographic Mapping (GTM) Zooming approach was applied to compare large chemical libraries and to search for unique chemotypes at Boehringer Ingelheim GmbH & Co KG.
The promiscuity filters developed to target specific assay technologies are of practical use in drug discovery, since they perform better than generic ones (ESR4). The tools and databases newly created by BIGCHEM fellows (ESR6) will help researchers to work with Big Data. The methodology developed within the project facilitated the identification of new compounds for synthesis and of new drug candidates and, more generally, allowed better understanding of and navigation through chemical space. The software tool created to predict the synthetic feasibility of reactants (ESR7) will help the drug discovery field to increase the synthetic accessibility of molecular designs, and will speed up and facilitate the identification of candidates for chemical synthesis in the wet lab. These methodologies (ESR9) have already been successfully validated by designing synthetic routes for compounds in internal drug discovery projects at AstraZeneca. The dissemination of the developed methodology in open access articles allows its widespread use by other interested partners, including academia, SMEs and large industry.
The generative models developed by BIGCHEM for the creation of new compounds with desired properties (ESR6, ESR7, ESR8) will support future drug discovery projects by providing novel, pharmaceutically relevant molecules with desired properties such as polypharmacology (ESR9) in a cost- and time-efficient manner.
The review articles by BIGCHEM partners aimed at non-specialist and non-expert audiences will contribute to public awareness of the impact of this new field and of its potential to improve human health.