CORDIS - EU research results

Constructing Intermolecular Potentials by Combining Physics and Machine Learning

Periodic Reporting for period 1 - ML Potentials (Constructing Intermolecular Potentials by Combining Physics and Machine Learning)

Reporting period: 2018-03-15 to 2020-03-14

Molecular interactions play a central role in determining the chemical and physical properties of molecules and materials. Intramolecular interactions (i.e., chemical bonds) bind atoms together to form molecules, whereas intermolecular interactions are the attractive and repulsive forces between non-bonded fragments, either within a single molecule or between different molecules. Even though intermolecular interactions contribute only a tiny fraction of the total energy, quantifying them accurately is essential to correctly model chemical phenomena such as the structure, dynamics, and function of proteins; the binding and metabolism of drugs; the structure and relative stability of supramolecular complexes and crystal polymorphs; and the orientation and reactivity of molecules on surfaces.

Intermolecular potentials can be accurately quantified by applying high-level molecular quantum mechanics methods. Although these methods are physically sound and mathematically rigorous, their computational cost grows very steeply with the size of the system (as a high-order polynomial for methods such as coupled cluster, and exponentially for an exact solution). Conversely, molecular mechanics trades accuracy for speed by approximating intermolecular interactions with a parameterized classical potential energy function called a force field. Although molecular mechanics methods are applicable to systems with millions of atoms, they are unreliable for systems and physicochemical phenomena that differ from those used to parameterize the force field, which significantly limits their range of reliable application.
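To make "parameterizing a classical potential energy function" concrete, the following is a minimal sketch of a typical non-bonded force-field term: a 12-6 Lennard-Jones plus Coulomb pair energy with Lorentz-Berthelot combining rules. The functional form is standard, but the `Atom` class, function names, and any parameter values you pass in are hypothetical illustrations, not taken from any specific force field or from this project.

```python
import math
from dataclasses import dataclass
from itertools import combinations

# Coulomb constant e^2/(4*pi*eps0) in kcal*angstrom/(mol*e^2)
COULOMB_K = 332.0637

@dataclass
class Atom:
    x: float
    y: float
    z: float
    charge: float   # fitted partial charge (units of e)
    sigma: float    # fitted LJ size parameter (angstrom)
    epsilon: float  # fitted LJ well depth (kcal/mol)

def pair_energy(a: Atom, b: Atom) -> float:
    """Lennard-Jones 12-6 plus Coulomb energy of one non-bonded pair (kcal/mol)."""
    r = math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))
    # Lorentz-Berthelot combining rules for unlike atom pairs
    sigma = 0.5 * (a.sigma + b.sigma)
    epsilon = math.sqrt(a.epsilon * b.epsilon)
    sr6 = (sigma / r) ** 6
    lennard_jones = 4.0 * epsilon * (sr6 * sr6 - sr6)
    coulomb = COULOMB_K * a.charge * b.charge / r
    return lennard_jones + coulomb

def nonbonded_energy(atoms: list[Atom]) -> float:
    """Sum pair energies over all atom pairs (no cutoff, no bonded exclusions)."""
    return sum(pair_energy(a, b) for a, b in combinations(atoms, 2))
```

For two neutral atoms placed at the Lennard-Jones minimum, r = 2^(1/6) sigma, the energy is -epsilon. Every interaction is reduced to a few fitted numbers per atom type, which is why such models are fast but only as transferable as their parameters.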

These limitations prompted me to develop state-of-the-art machine-learning (ML) models that predict molecular interaction energies and forces both accurately and rapidly. Just as human chemists learn from past experience to make predictions about the properties of new molecules, in ML a mathematical model is trained on prior experimental and/or computational results to predict the properties of new molecules. This proposal was motivated by the impressive success of recent statistical ML models at accurately predicting molecular energies and forces, and equally by their inherent shortcomings in modeling intermolecular (long-range and non-local) interactions, which cause them to fail when scaled to larger systems. By modeling long-range interactions with quantum mechanics methods (where these are computationally affordable), I developed an ML method that can be trained on small-to-medium-size molecules only, yet applied to larger molecules.
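The idea of pairing a learned short-range model with a physics-based long-range treatment can be sketched as follows. This is a minimal illustration under stated assumptions: the report does not specify how the two contributions are combined, so the pairwise decomposition, the cosine switching function, the radii `r_on`/`r_off`, and the callables `ml_pair_energy` and `physics_pair_energy` are all hypothetical placeholders.

```python
import math
from typing import Callable, Sequence

# (atom index i, atom index j, distance in angstrom)
Pair = tuple[int, int, float]
PairModel = Callable[[int, int, float], float]

def cosine_switch(r: float, r_on: float, r_off: float) -> float:
    """Weight given to the short-range ML model: 1 below r_on, 0 beyond r_off,
    and a smooth cosine taper in between."""
    if r <= r_on:
        return 1.0
    if r >= r_off:
        return 0.0
    t = (r - r_on) / (r_off - r_on)
    return 0.5 * (1.0 + math.cos(math.pi * t))

def hybrid_energy(pairs: Sequence[Pair],
                  ml_pair_energy: PairModel,
                  physics_pair_energy: PairModel,
                  r_on: float = 4.0,
                  r_off: float = 6.0) -> float:
    """Blend a learned short-range term with a physics-based long-range term.

    The ML model is only ever evaluated for pairs closer than r_off, so it never
    needs training data covering long distances; those are left to the physics term.
    """
    total = 0.0
    for i, j, r in pairs:
        s = cosine_switch(r, r_on, r_off)
        if s > 0.0:
            total += s * ml_pair_energy(i, j, r)
        if s < 1.0:
            total += (1.0 - s) * physics_pair_energy(i, j, r)
    return total
```

The design point this illustrates is locality: because the learned part is confined within a finite radius, a model fitted on small molecules sees the same kinds of local environments that occur in larger ones, while the long-range physics is handled by a term that scales without retraining.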

The main objectives achieved within the first half of the proposed action include: 1) developing a new methodology, the Distance-Adapted version of SchNet (DASNet), which improves on existing state-of-the-art neural network architectures; 2) implementing a prototype of DASNet and designing the framework of the final software package for predicting interaction energies with the proposed model; 3) building the infrastructure for running quantum chemistry computations and compiling training data; 4) releasing ChemTools, a free and open-source package providing a collection of interpretive tools for analyzing the outputs of quantum chemistry calculations to gain chemical insight; and 5) disseminating the outcomes of the action through presentations at international workshops and conferences, and by organizing a hands-on workshop in Europe to promote the Python programming language and teach the ChemTools package to a wide range of researchers.
Because the project ended halfway through, most of the work performed consisted of acquiring the essential background and building the technical foundation for the results phase of the project. These preparatory tasks included acquiring expertise in machine-learning methodology and refining the proposal into a detailed list of technical objectives. While developing a detailed algorithm, I was inspired to devise the Distance-Adapted version of SchNet (DASNet) methodology. I then built a prototype software package for implementing, training, and testing DASNet, and developed efficient software for the additive variational Hirshfeld (AVH) method, which computes atomic charges and, more generally, partitions molecular properties into atomic contributions. As a side project, I developed a flexible software package for kernel learning.
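The report does not describe the AVH algorithm itself, so the following is only a sketch of the classic Hirshfeld (stockholder) partitioning, the scheme the Hirshfeld family of methods starts from. Each atom A receives the share w_A(x) = rho_A^0(x) / sum_B rho_B^0(x) of the molecular density, where rho_A^0 is an isolated "pro-atom" density; its population is N_A = integral of w_A(x) rho(x) dx and its charge is q_A = Z_A - N_A. The 1D grid and Gaussian densities are toy assumptions chosen so the example runs without a quantum chemistry package.

```python
import math

def gaussian(x: float, center: float, width: float, n_electrons: float) -> float:
    """Toy 1D 'atomic density': a normalized Gaussian holding n_electrons."""
    norm = n_electrons / (width * math.sqrt(2.0 * math.pi))
    return norm * math.exp(-0.5 * ((x - center) / width) ** 2)

def hirshfeld_charges(rho_mol, proatoms, nuclear_charges, dx):
    """Classic Hirshfeld charges on a uniform 1D grid with spacing dx.

    rho_mol[k]      : molecular electron density at grid point k
    proatoms[a][k]  : isolated (pro-)atom density of atom a at grid point k
    nuclear_charges : Z_a for each atom
    Returns (charges, populations).
    """
    populations = [0.0] * len(proatoms)
    for k, rho in enumerate(rho_mol):
        promolecule = sum(p[k] for p in proatoms)
        if promolecule == 0.0:
            continue  # no pro-atom density here, so no weight to assign
        for a, p in enumerate(proatoms):
            weight = p[k] / promolecule           # stockholder weight w_a(x_k)
            populations[a] += weight * rho * dx   # rectangle-rule integral
    charges = [z - n for z, n in zip(nuclear_charges, populations)]
    return charges, populations
```

If the molecular density equals the promolecule (the sum of pro-atoms), every charge is zero; shifting density toward one atom makes that atom negative, while the charges always sum to the total nuclear charge minus the total electron count.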
While the practical impact of DASNet has not yet been realized, an immediate impact was achieved through the release of the free and open-source ChemTools package, which I developed mainly to provide training data for DASNet. ChemTools has nonetheless had a broad impact of its own: within less than a month of its initial release, it was installed by more than 30 researchers around the world, from fields as diverse as chemistry, biochemistry, and materials science and engineering.
In addition to fulfilling the main goals of the reporting period, I released ChemTools, a free and open-source software package for elucidating molecular electronic structure, intermolecular and intramolecular interactions, and chemical reactivity. I was the lead organizer of a week-long international software workshop featuring ChemTools at Sorbonne University in Paris, with nearly thirty participants (mostly graduate students, but also postdoctoral fellows and professors). I also co-organized separate, shorter hands-on ChemTools sessions at conferences in China and Chile. This dissemination activity was greater than I had anticipated and occurred earlier in the Marie Skłodowska-Curie Action (MSCA) than I expected.
I developed a refined version of the Schütt Network (SchNet), called DASNet, that incorporates chemical knowledge about atomic properties and the key length scales of intramolecular interactions. This refines and improves on the methodology I had anticipated using in my original proposal. I plan to disseminate the free and open-source software package implementing DASNet, which is still under development, in the same way as ChemTools: through hands-on sessions at international conferences and dedicated week-long workshops.
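For readers unfamiliar with SchNet, the sketch below shows the kind of distance featurization that DASNet refines: each interatomic distance d is expanded into Gaussians exp(-gamma (d - mu_k)^2) on a grid of centers mu_k, and the result is damped by a smooth cutoff so that atoms beyond r_cut do not interact. This is the baseline idea only, not DASNet; the report does not give DASNet's distance adaptation, and the grid, `gamma`, and cutoff radius here are illustrative assumptions. The cosine cutoff is the Behler-style function, a common companion to SchNet-type filters.

```python
import math

def gaussian_rbf(d: float, mu_min: float = 0.0, mu_max: float = 5.0,
                 n_centers: int = 50, gamma: float = 10.0) -> list[float]:
    """Expand a distance d (angstrom) into n_centers Gaussians
    exp(-gamma * (d - mu_k)**2), centers evenly spaced in [mu_min, mu_max]."""
    step = (mu_max - mu_min) / (n_centers - 1)
    centers = [mu_min + k * step for k in range(n_centers)]
    return [math.exp(-gamma * (d - mu) ** 2) for mu in centers]

def cosine_cutoff(d: float, r_cut: float = 5.0) -> float:
    """Behler-style cutoff: 1 at d = 0, decays smoothly to 0 at d = r_cut."""
    if d >= r_cut:
        return 0.0
    return 0.5 * (math.cos(math.pi * d / r_cut) + 1.0)

def filter_input(d: float, r_cut: float = 5.0, **rbf_kwargs) -> list[float]:
    """Cutoff-damped radial features that a continuous filter network would consume."""
    c = cosine_cutoff(d, r_cut)
    return [c * e for e in gaussian_rbf(d, **rbf_kwargs)]
```

A fixed, evenly spaced grid treats all distances alike; incorporating knowledge of the relevant chemical length scales, as the report describes for DASNet, is one way such a featurization can be made more physically informed.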
[Figure: Ultimate Goal of Proposal]