Big Data in Chemistry

Deliverables

Open lectures to students of high schools/gymnasia

Lectures for students of high schools and gymnasia will be given to disseminate information about scientific topics and to stimulate future students' interest in scientific studies.

Publication of newsletter

A newsletter giving an overview of project activities for a public audience will be prepared and sent to the related societies.

Organisation of Open Days

Open Days will be organized to promote the MC EID beyond the scientific community.

Web site and application system for fellows

The web site and the web application system for fellows are established and ready to accept applications.

Minutes of the kick-off meeting

This deliverable will be a report on the results of the kick-off meeting.

3rd Winter school report

A report on the results of the "Virtual and HTS screening" school organized by LDC and Boehringer Ingelheim Pharma GmbH & Co. KG will be provided.

Comparison of performances of different data sharing approaches

The report will assess the performance of different data sharing strategies.

2nd Winter school report

A report on the third school, "Computer-Assisted Drug Discovery", organized by the University of Modena, will be provided.

Review of the developed protocols and their performances on public and in house data

A review of the performance of ligand- and structure-based approaches for drug design and discovery will be provided.

1st Summer school report

A report on the results of the second school, "Chemical Data Resources", organized by Uni Bern and Uni Zurich, will be provided.

Benchmarking of developed machine learning approaches

An overview of the benchmarking of methods for data visualization and modelling based on the GTM approach will be provided.

Report of the final closing conference

The report will summarize the results of the final closing conference of the project.

Analysis of target similarity of chemical compounds

An analysis of compound promiscuity and selectivity patterns will be provided.

2nd Summer school report

A report on the "Chemical Space and ADMETox profiling" school organized by HMGU and AZ will be provided.

Overview of strategies for data sharing

The report will summarize the strategies for secure sharing of data that will be developed and validated during the project.

Overview of HTS data

The report will summarize the HTS data that will be available for the development of frequent hitter filters.

Preparation of CDPs

A CDP will be prepared for each employed fellow and provided to REA.

Final report of the project and an overview of the awarded PhDs

The report will summarize the achievements of the project and give an overview of the awarded PhDs and the scientific output.

Analysis of frequent hitters for screening technologies

We will report on the frequent hitters identified for different screening technologies.

1st Winter school report

A report on the first school, "Introduction to chemoinformatics", organized by Uni Bonn, will be provided.

Publications

Application of Generative Autoencoder in De Novo Molecular Design


Published in: ISSN 1868-1743
DOI: 10.1002/minf.201700123

Virtual Exploration of the Ring Systems Chemical Universe

Author(s): Ricardo Visini, Josep Arús-Pous, Mahendra Awale, Jean-Louis Reymond
Published in: Journal of Chemical Information and Modeling, Issue 57/11, 2017, Page(s) 2707-2718, ISSN 1549-9596
DOI: 10.1021/acs.jcim.7b00457

The rise of deep learning in drug discovery

Author(s): Hongming Chen, Ola Engkvist, Yinhai Wang, Marcus Olivecrona, Thomas Blaschke
Published in: Drug Discovery Today, Issue 23, 2018, Page(s) 1241-1250, ISSN 1359-6446
DOI: 10.1016/j.drudis.2018.01.039

Support Vector Machine Classification and Regression Prioritize Different Structural Features for Binary Compound Activity and Potency Value Prediction

Author(s): Raquel Rodríguez-Pérez, Martin Vogt, Jürgen Bajorath
Published in: ACS Omega, Issue 2/10, 2017, Page(s) 6371-6379, ISSN 2470-1343
DOI: 10.1021/acsomega.7b01079

Chemical Space: Big Data Challenge for Molecular Diversity

Author(s): Mahendra Awale, Ricardo Visini, Daniel Probst, Josep Arús-Pous, Jean-Louis Reymond
Published in: CHIMIA International Journal for Chemistry, Issue 71/10, 2017, Page(s) 661-666, ISSN 0009-4293
DOI: 10.2533/chimia.2017.661

Prediction of Compound Profiling Matrices Using Machine Learning

Author(s): Raquel Rodríguez-Pérez, Tomoyuki Miyao, Swarit Jasial, Martin Vogt, Jürgen Bajorath
Published in: ACS Omega, Issue 3/4, 2018, Page(s) 4713-4723, ISSN 2470-1343
DOI: 10.1021/acsomega.8b00462

Selection of protein conformations for structure-based polypharmacology studies

Author(s): Luca Pinzi, Fabiana Caporuscio, Giulio Rastelli
Published in: Drug Discovery Today, Issue 23/11, 2018, Page(s) 1889-1896, ISSN 1359-6446
DOI: 10.1016/j.drudis.2018.08.007

Combining structural and bioactivity-based fingerprints improves prediction performance and scaffold hopping capability

Author(s): Oliver Laufkötter, Noé Sturm, Jürgen Bajorath, Hongming Chen, Ola Engkvist
Published in: Journal of Cheminformatics, Issue 11/1, 2019, ISSN 1758-2946
DOI: 10.1186/s13321-019-0376-1

Luciferase Advisor: High-Accuracy Model To Flag False Positive Hits in Luciferase HTS Assays

Author(s): Dipan Ghosh, Uwe Koch, Kamyar Hadian, Michael Sattler, Igor V. Tetko
Published in: Journal of Chemical Information and Modeling, Issue 58/5, 2018, Page(s) 933-942, ISSN 1549-9596
DOI: 10.1021/acs.jcim.7b00574

Prediction of Compound Profiling Matrices, Part II: Relative Performance of Multitask Deep Learning and Random Forest Classification on the Basis of Varying Amounts of Training Data

Author(s): Raquel Rodríguez-Pérez, Jürgen Bajorath
Published in: ACS Omega, Issue 3/9, 2018, Page(s) 12033-12040, ISSN 2470-1343
DOI: 10.1021/acsomega.8b01682

Large-Scale Comparison of Alternative Similarity Search Strategies with Varying Chemical Information Contents

Author(s): Oliver Laufkötter, Tomoyuki Miyao, Jürgen Bajorath
Published in: ACS Omega, Issue 4/12, 2019, Page(s) 15304-15311, ISSN 2470-1343
DOI: 10.1021/acsomega.9b02470

Multitask Machine Learning for Classifying Highly and Weakly Potent Kinase Inhibitors

Author(s): Raquel Rodríguez-Pérez, Jürgen Bajorath
Published in: ACS Omega, Issue 4/2, 2019, Page(s) 4367-4375, ISSN 2470-1343
DOI: 10.1021/acsomega.9b00298

A Survey of Multi‐task Learning Methods in Chemoinformatics


Published in: ISSN 1868-1743
DOI: 10.1002/minf.201800108

Exploring the GDB-13 chemical space using deep generative models

Author(s): Josep Arús-Pous, Thomas Blaschke, Silas Ulander, Jean-Louis Reymond, Hongming Chen, Ola Engkvist
Published in: Journal of Cheminformatics, Issue 11/1, 2019, Page(s) 11:20, ISSN 1758-2946
DOI: 10.1186/s13321-019-0341-z

Multi-task generative topographic mapping in virtual screening

Author(s): Arkadii Lin, Dragos Horvath, Gilles Marcou, Bernd Beck, Alexandre Varnek
Published in: Journal of Computer-Aided Molecular Design, Issue 33/3, 2019, Page(s) 331-343, ISSN 0920-654X
DOI: 10.1007/s10822-019-00188-x

Diversifying chemical libraries with generative topographic mapping

Author(s): Arkadii Lin, Bernd Beck, Dragos Horvath, Gilles Marcou, Alexandre Varnek
Published in: Journal of Computer-Aided Molecular Design, Issue Aug 12, 2019, Page(s) -, ISSN 0920-654X
DOI: 10.1007/s10822-019-00215-x

Applications of Deep-Learning in Exploiting Large-Scale and Heterogeneous Compound Data in Industrial Pharmaceutical Research

Author(s): Laurianne David, Josep Arús-Pous, Johan Karlsson, Ola Engkvist, Esben Jannik Bjerrum, Thierry Kogej, Jan M. Kriegl, Bernd Beck, Hongming Chen
Published in: Frontiers in Pharmacology, Issue 10, 2019, ISSN 1663-9812
DOI: 10.3389/fphar.2019.01303

Randomized SMILES strings improve the quality of molecular generative models

Author(s): Josep Arús-Pous, Simon Viet Johansson, Oleksii Prykhodko, Esben Jannik Bjerrum, Christian Tyrchan, Jean-Louis Reymond, Hongming Chen, Ola Engkvist
Published in: Journal of Cheminformatics, Issue 11/1, 2019, Page(s) 3-13, ISSN 1758-2946
DOI: 10.1186/s13321-019-0393-0

Prediction of Different Classes of Promiscuous and Nonpromiscuous Compounds Using Machine Learning and Nearest Neighbor Analysis

Author(s): Thomas Blaschke, Filip Miljković, Jürgen Bajorath
Published in: ACS Omega, Issue 4/4, 2019, Page(s) 6883-6890, ISSN 2470-1343
DOI: 10.1021/acsomega.9b00492

A de novo molecular generation method using latent vector based generative adversarial network

Author(s): Oleksii Prykhodko, Simon Viet Johansson, Panagiotis-Christos Kotsias, Josep Arús-Pous, Esben Jannik Bjerrum, Ola Engkvist, Hongming Chen
Published in: Journal of Cheminformatics, Issue 11/1, 2019, ISSN 1758-2946
DOI: 10.1186/s13321-019-0397-9

Interpretation of Compound Activity Predictions from Complex Machine Learning Models Using Local Approximations and Shapley Values

Author(s): Raquel Rodríguez-Pérez, Jürgen Bajorath
Published in: Journal of Medicinal Chemistry, Issue September 12, 2019, 2019, Page(s) NA, ISSN 0022-2623
DOI: 10.1021/acs.jmedchem.9b01101

Datasets and their influence on the development of computer assisted synthesis planning tools in the pharmaceutical domain

Author(s): Thakkar, Amol; Kogej, Thierry; Reymond, Jean-Louis; Engkvist, Ola; Bjerrum, Esben Jannik
Published in: Chemical Science, Issue 3, 2020, Page(s) 154–168, ISSN 2041-6539
DOI: 10.1039/c9sc04944d

Building attention and edge message passing neural networks for bioactivity and physical–chemical property prediction

Author(s): M. Withnall, E. Lindelöf, O. Engkvist, H. Chen
Published in: Journal of Cheminformatics, Issue 12/1, 2020, ISSN 1758-2946
DOI: 10.1186/s13321-019-0407-y

Automating drug discovery

Author(s): Gisbert Schneider
Published in: Nature Reviews Drug Discovery, Issue 17/2, 2018, Page(s) 97-113, ISSN 1474-1776
DOI: 10.1038/nrd.2017.232

Transformer-CNN: Swiss knife for QSAR modeling and interpretation

Author(s): Pavel Karpov, Guillaume Godin, Igor V. Tetko
Published in: Journal of Cheminformatics, Issue 12/1, 2020, ISSN 1758-2946
DOI: 10.1186/s13321-020-00423-w

Assessing the information content of structural and protein-ligand interaction representations for the classification of kinase inhibitor binding modes via machine learning and active learning

Author(s): Raquel Rodríguez-Pérez; Filip Miljković; Jürgen Bajorath
Published in: Journal of Cheminformatics, Issue 3, 2020, ISSN 1758-2946
DOI: 10.5281/zenodo.3759400

Activity landscape image analysis using convolutional neural networks

Author(s): Javed Iqbal; Martin Vogt; Jürgen Bajorath
Published in: Journal of Cheminformatics, Issue 2, 2020, ISSN 1758-2946
DOI: 10.5281/zenodo.3759410

BIGCHEM: Challenges and Opportunities for Big Data Analysis in Chemistry


Published in: ISSN 1868-1743
DOI: 10.1002/minf.201600073

On the Integration of In Silico Drug Design Methods for Drug Repurposing

Author(s): Eric March-Vila, Luca Pinzi, Noé Sturm, Annachiara Tinivella, Ola Engkvist, Hongming Chen, Giulio Rastelli
Published in: Frontiers in Pharmacology, Issue 8, 2017, ISSN 1663-9812
DOI: 10.3389/fphar.2017.00298

Does ‘Big Data’ exist in medicinal chemistry, and if so, how can it be harnessed?

Author(s): Igor V Tetko, Ola Engkvist, Hongming Chen
Published in: Future Medicinal Chemistry, Issue 8/15, 2016, Page(s) 1801-1806, ISSN 1756-8919
DOI: 10.4155/fmc-2016-0163

Influence of Varying Training Set Composition and Size on Support Vector Machine-Based Prediction of Active Compounds

Author(s): Raquel Rodríguez-Pérez, Martin Vogt, Jürgen Bajorath
Published in: Journal of Chemical Information and Modeling, Issue 57/4, 2017, Page(s) 710-716, ISSN 1549-9596
DOI: 10.1021/acs.jcim.7b00088

Molecular de-novo design through deep reinforcement learning

Author(s): Marcus Olivecrona, Thomas Blaschke, Ola Engkvist, Hongming Chen
Published in: Journal of Cheminformatics, Issue 9/1, 2017, Page(s) 48-59, ISSN 1758-2946
DOI: 10.1186/s13321-017-0235-x

Matched Molecular Pair Analysis on Large Melting Point Datasets: A Big Data Perspective


Published in: ISSN 1860-7179
DOI: 10.1002/cmdc.201700303

Analysis and Modelling of False Positives in GPCR Assays

Author(s): Dipan Ghosh, Igor Tetko, Bert Klebl, Peter Nussbaumer, Uwe Koch
Published in: Artificial Neural Networks and Machine Learning – ICANN 2019: Workshop and Special Sessions - 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, Issue 11731, 2019, Page(s) 764-770
DOI: 10.1007/978-3-030-30493-5_71

Neural Network Guided Tree-Search Policies for Synthesis Planning

Author(s): Amol Thakkar, Esben Jannik Bjerrum, Ola Engkvist, Jean-Louis Reymond
Published in: Artificial Neural Networks and Machine Learning – ICANN 2019: Workshop and Special Sessions - 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, Issue 11731, 2019, Page(s) 721-724
DOI: 10.1007/978-3-030-30493-5_64

Augmentation Is What You Need!

Author(s): Igor V. Tetko, Pavel Karpov, Eric Bruno, Talia B. Kimber, Guillaume Godin
Published in: Artificial Neural Networks and Machine Learning – ICANN 2019: Workshop and Special Sessions - 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, Issue 11731, 2019, Page(s) 831-835
DOI: 10.1007/978-3-030-30493-5_79

Improving Deep Generative Models with Randomized SMILES

Author(s): Josep Arús-Pous, Simon Johansson, Oleksii Prykhodko, Esben Jannik Bjerrum, Christian Tyrchan, Jean-Louis Reymond, Hongming Chen, Ola Engkvist
Published in: Artificial Neural Networks and Machine Learning – ICANN 2019: Workshop and Special Sessions - 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, Issue 11731, 2019, Page(s) 747-751
DOI: 10.1007/978-3-030-30493-5_68

A Transformer Model for Retrosynthesis

Author(s): Pavel Karpov, Guillaume Godin, Igor V. Tetko
Published in: Artificial Neural Networks and Machine Learning – ICANN 2019: Workshop and Special Sessions - 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, Issue 11731, 2019, Page(s) 817-830
DOI: 10.1007/978-3-030-30493-5_78

Attention and Edge Memory Convolution for Bioactivity Prediction

Author(s): Michael Withnall, Edvard Lindelöf, Ola Engkvist, Hongming Chen
Published in: Artificial Neural Networks and Machine Learning – ICANN 2019: Workshop and Special Sessions - 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, Issue 11731, 2019, Page(s) 752-757
DOI: 10.1007/978-3-030-30493-5_69

Molecular de novo Design Through Deep Generative Models

Author(s): Engkvist, Ola; Arús-Pous, Josep; Bjerrum, Esben Jannik; Chen, Hongming
Published in: Artificial Intelligence in Drug Discovery, Issue 1, 2020, Page(s) -
DOI: 10.5281/zenodo.3628194

Diversify Libraries Using Generative Topographic Mapping

Author(s): Lin, Arkadii; Beck, Bernd; Horvath, Dragos; Varnek, Alexandre
Published in: Artificial Neural Networks and Machine Learning – ICANN 2019: Workshop and Special Sessions - 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, Issue 11731, 2019, Page(s) 839-341
DOI: 10.5281/zenodo.3515029

Detection of Frequent-Hitters Across Various HTS Technologies

Author(s): David, Laurianne; Walsh, Jarrod; Bajorath, Jürgen; Engkvist, Ola
Published in: Artificial Neural Networks and Machine Learning – ICANN 2019: Workshop and Special Sessions - 28th International Conference on Artificial Neural Networks, Munich, Germany, September 17–19, 2019, Proceedings, Issue 11731, 2019, Page(s) 842-844
DOI: 10.5281/zenodo.3515025

'Ring Breaker': Assessing Synthetic Accessibility of the Ring System Chemical Space

Author(s): Thakkar, Amol; Selmi, Nidhai; Reymond, Jean-Louis; Engkvist, Ola; Bjerrum, Esben Jannik
Published in: ChemRxiv, Issue 11, 2019, Page(s) -
DOI: 10.26434/chemrxiv.9938969.v1

Direct Steering of de novo Molecular Generation using Descriptor Conditional Recurrent Neural Networks (cRNNs)

Author(s): Kotsias, Panagiotis-Christos; Arús-Pous, Josep; Chen, Hongming; Engkvist, Ola; Tyrchan, Christian; Bjerrum, Esben Jannik
Published in: Nature Machine Intelligence, Issue 126, 2019
DOI: 10.26434/chemrxiv.9860906.v2

SMILES-Based Deep Generative Scaffold Decorator for De-Novo Drug Design

Author(s): Josep Arús-Pous, Atanas Patronov, Esben Jannik Bjerrum, Christian Tyrchan, Jean-Louis Reymond, Hongming Chen, Ola Engkvist
Published in: ChemRxiv, Issue 1, 2020, Page(s) 1, ISSN 2573-2293
DOI: 10.26434/chemrxiv.11638383.v1

REINVENT 2.0 – an AI tool for de novo drug design

Author(s): Thomas Blaschke, Josep Arús-Pous, Hongming Chen, Christian Margreitter, Christian Tyrchan, Ola Engkvist, Kostas Papadopoulos, Atanas Patronov
Published in: ChemRxiv, Issue 1, 2020, Page(s) 1, ISSN 2573-2293
DOI: 10.26434/chemrxiv.12058026.v2

Cartographie Topographique Générative: un outil puissant pour la visualisation, l'analyse et la modélisation de données chimiques volumineuses

Author(s): Arkadii Lin
Published in: PhD thesis, 2019

Machine Learning Methodologies for Interpretable Compound Activity Predictions

Author(s): Raquel Rodríguez Pérez
Published in: PhD thesis, 2020

Exploration of synthetically accessible chemical space by de novo design

Author(s): Xuejin Zhang
Published in: PhD thesis, 2019