
Machine Learning for Catalytic Carbon Dioxide Activation

Periodic Reporting for period 1 - MachineCat (Machine Learning for Catalytic Carbon Dioxide Activation)

Reporting period: 2018-06-01 to 2020-05-31

Computational chemistry offers the possibility to predict chemical phenomena based entirely on computer simulations. As such, it is not only an invaluable tool for understanding chemical properties and the outcomes of reactions, but also provides guidance for the design of new compounds and catalysts. Unfortunately, traditional approaches are subject to a trade-off between predictive accuracy and efficiency. Highly accurate methods are limited to small molecules due to their prohibitive computational cost, while more efficient approaches rely on approximations that make them less reliable. Machine learning (ML) has emerged as a powerful tool to overcome these limitations. ML models can “learn” to behave like highly accurate computational chemistry methods, providing predictions with the same accuracy at only a fraction of the original computational cost. Hence, they can be used to simulate molecular systems at levels of precision beyond the reach of conventional approaches. However, because these models are relatively new and inherently black boxes, little is understood about their inner workings. This makes their predictions hard to rationalize and complicates the systematic improvement of current ML approaches, as it is not clear how to incorporate additional physical knowledge.
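To make the idea of a model that “learns” from an accurate reference method more concrete, the sketch below fits a surrogate to a handful of reference energies and then predicts energies at unseen geometries. It is only an illustration under stated assumptions: kernel ridge regression with a Gaussian kernel is swapped in as a simple, generic learning method (the models developed in MachineCat are neural networks), and a Morse curve with hypothetical, roughly H2-like parameters stands in for the expensive reference method. All function names are hypothetical.

```python
import numpy as np

def morse(r, d_e=4.7, a=1.9, r_e=0.74):
    # Toy "reference" energy curve (eV); hypothetical stand-in for an expensive ab initio method.
    return d_e * (1.0 - np.exp(-a * (r - r_e))) ** 2

def gaussian_kernel(x, y, sigma=0.2):
    # Pairwise Gaussian (RBF) kernel between two 1D arrays of bond lengths.
    return np.exp(-((x[:, None] - y[None, :]) ** 2) / (2.0 * sigma ** 2))

def fit_krr(x_train, y_train, sigma=0.2, lam=1e-6):
    # Solve (K + lam * I) w = (y - mean) for the regression weights.
    offset = y_train.mean()
    k = gaussian_kernel(x_train, x_train, sigma)
    weights = np.linalg.solve(k + lam * np.eye(len(x_train)), y_train - offset)
    return weights, offset

def predict_krr(x_new, x_train, weights, offset, sigma=0.2):
    # A prediction is a kernel-weighted sum over the training points plus the mean offset.
    return gaussian_kernel(x_new, x_train, sigma) @ weights + offset

# A small set of "expensive" reference energies...
x_train = np.linspace(0.5, 2.0, 20)
weights, offset = fit_krr(x_train, morse(x_train))

# ...then cheap predictions at many unseen bond lengths inside the training range.
x_test = np.linspace(0.6, 1.9, 200)
error = np.max(np.abs(predict_krr(x_test, x_train, weights, offset) - morse(x_test)))
print(f"max abs error on unseen geometries: {error:.1e} eV")
```

In a real application each training point would be a costly quantum chemical calculation, while each prediction only requires kernel evaluations. The kernel width `sigma` and the regularizer `lam` are illustrative choices, not tuned values.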

The main objective of Machine Learning for Catalytic Carbon Dioxide Activation (MachineCat) was to deepen our understanding of ML in computational chemistry and to use this knowledge to push the boundaries of existing approaches. To this end, MachineCat studied chemical problems that are challenging for current ML methods, focusing on the organocatalytic conversion of carbon dioxide mediated by a modified chitosan catalyst. This reaction is highly relevant for sustainable chemistry, as it offers cheap access to value-added chemicals, potentially allowing carbon dioxide to replace fossil fuels as the primary carbon source. Yet little is known about how the reaction proceeds, and one objective of MachineCat was to use ML approaches to elucidate the reaction mechanism. As a final objective, MachineCat aimed to explore the potential of ML methods for the rational design of new compounds and improved catalytic systems.

The conclusions of MachineCat demonstrate the utility of modern ML architectures beyond providing efficient and accurate models of complex chemical systems. By incorporating physical relations into the structure of these ML models, their predictions and internal states can be readily understood in terms of fundamental chemical concepts, such as atomic charges and orbitals. Moreover, physical laws can be integrated in such a way that the rich formalism of quantum mechanics can be applied directly to these ML models. This offers access to a vast range of chemical properties and even to the molecular wavefunction itself. Such models provide a direct relationship between chemical structure, composition and properties, which can be leveraged to design compounds with desirable qualities. Finally, by incorporating ideas from the field of ML, it is even possible to construct models that directly generate the structure and composition of novel compounds.
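As an illustration of how a physical relation can be built into a model's structure, the minimal sketch below assumes a network that outputs one raw value per atom. The raw values are shifted so that they sum exactly to the total molecular charge, and the dipole moment is then obtained from the classical relation μ = Σᵢ qᵢ rᵢ, so the per-atom outputs can be read as atomic partial charges. The raw outputs, the geometry and the function names are hypothetical; this is a generic construction, not the specific MachineCat architecture.

```python
import numpy as np

def constrain_charges(raw_charges, total_charge=0.0):
    # Spread the mismatch evenly so the per-atom values sum exactly to the molecular charge.
    raw_charges = np.asarray(raw_charges, dtype=float)
    correction = (total_charge - raw_charges.sum()) / raw_charges.size
    return raw_charges + correction

def dipole_moment(charges, positions):
    # Dipole as the charge-weighted sum of atomic positions: mu = sum_i q_i r_i.
    return np.einsum("i,ij->j", charges, positions)

# Hypothetical raw network outputs (e) for a water-like geometry (Angstrom): O, H, H.
positions = np.array([[0.0, 0.0, 0.0],
                      [0.76, 0.59, 0.0],
                      [-0.76, 0.59, 0.0]])
raw = np.array([-0.70, 0.38, 0.36])

q = constrain_charges(raw)        # now sums to zero for a neutral molecule
mu = dipole_moment(q, positions)  # dipole in e * Angstrom
print(q, mu)
```

Because the constrained charges of a neutral molecule sum to zero, the resulting dipole does not depend on the choice of coordinate origin, which is one reason to enforce the constraint exactly rather than rely on the network learning it.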
Work on MachineCat included the development of an open-source code package for machine learning in chemical systems (SchNetPack), together with other researchers of the host group. This package was then used to compare different ML approaches. Subsequently, the interpretability of ML models in chemistry was investigated based on two prominent approaches and existing data sets. The insights gained in this way were then applied to modeling an organic reaction. This resulted in an entirely new ML model called FieldSchNet, capable of describing the interactions of molecules with external environments and fields. Owing to its physics-inspired structure, FieldSchNet can be used in a plethora of different ways, which were explored in studies of solvent effects on molecular spectroscopy and reactions, as well as in the design of molecular environments that enhance chemical reactions. At the same time, the SchNOrb model for predicting molecular wavefunctions was developed in collaboration with international researchers and members of the host group, bringing ML models even closer to high-level computational methods. Research on these models also sparked the implementation of a new architecture for the simulation of photochemical phenomena (SchNarc), together with researchers from the University of Vienna. In addition, a generative model for molecules (g-SchNet) was developed together with ML experts of the host group; it is the first ML model capable of automatically generating 3D molecular structures. Following these method developments, the carbon dioxide conversion reaction was studied. The nature of the system necessitated further adaptations of the ML models in SchNetPack, as well as a significant extension of the package's simulation capabilities. To make reference computations with accurate computational chemistry methods feasible, a new fragmentation approach was implemented.
Based on these extensions, a potential for the carbon dioxide conversion reaction, capable of modeling the complex reaction dynamics with unprecedented accuracy, is currently being finalized.
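One physical principle that a field-aware model such as FieldSchNet can exploit is that response properties follow from derivatives of the energy with respect to an external electric field F: the dipole moment is μ = -∂E/∂F and the polarizability is α = -∂²E/∂F². The sketch below illustrates only this relation. A hypothetical quadratic energy expression with made-up constants stands in for a trained model, and central finite differences stand in for the automatic differentiation that a neural network framework would normally provide.

```python
import numpy as np

# Hypothetical constants for a toy molecule in a uniform field (arbitrary units).
MU0 = np.array([0.0, 0.0, 0.73])    # permanent dipole
ALPHA = np.diag([9.6, 9.6, 10.4])   # static polarizability

def energy(field, e0=-76.0):
    # Stand-in for a learned E(F): second-order expansion E0 - mu0.F - 1/2 F.alpha.F
    return e0 - MU0 @ field - 0.5 * field @ ALPHA @ field

def dipole(energy_fn, field, h=1e-3):
    # mu = -dE/dF, estimated component-wise with central finite differences.
    grad = np.zeros(3)
    for k in range(3):
        step = np.zeros(3)
        step[k] = h
        grad[k] = (energy_fn(field + step) - energy_fn(field - step)) / (2.0 * h)
    return -grad

def polarizability(energy_fn, field, h=1e-3):
    # alpha = -d2E/dF2, i.e. the field derivative of the dipole.
    alpha = np.zeros((3, 3))
    for k in range(3):
        step = np.zeros(3)
        step[k] = h
        alpha[:, k] = (dipole(energy_fn, field + step, h)
                       - dipole(energy_fn, field - step, h)) / (2.0 * h)
    return alpha

print(dipole(energy, np.zeros(3)))
print(polarizability(energy, np.zeros(3)))
```

Since the toy energy is exactly quadratic in F, `dipole(energy, np.zeros(3))` recovers `MU0` and `polarizability(energy, np.zeros(3))` recovers `ALPHA` up to rounding error; with a trained model, the same derivatives would yield predicted response properties.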

The main results achieved by MachineCat include the development of the SchNetPack code package, which is suitable not only for constructing models but also for running simulations and as a development tool for researchers. The studies of organic reactions, and of carbon dioxide conversion in particular, have shown that ML approaches are able to exceed the limits of conventional methods and yield predictions close to experiment. Over the course of MachineCat, four fundamentally new ML models were developed in the form of FieldSchNet, SchNOrb, g-SchNet and SchNarc, each opening new avenues for research. FieldSchNet's ability to model solvent effects will be exploited together with BASF as an industrial partner as part of the BASLEARN project.

The research undertaken in MachineCat has so far resulted in four publications in peer-reviewed journals (three of them in high-impact journals), two book chapters and one publication at NeurIPS, the leading ML conference. Posters were presented at NeurIPS and at a joint workshop (BBDC, BZML and RIKEN). The researcher gave talks at three workshops (including IPAM and UniSysCat), one at the annual conference of the American Physical Society, three at seminars (host group, BasCat and UniSysCat), one as part of a visiting fellowship at the University of Warwick, and one at the BASF headquarters in Ludwigshafen.
The ML models developed during MachineCat each represent significant progress beyond the current state of the art. FieldSchNet leverages physical principles to predict a vast range of properties and phenomena, making it well suited for a wide variety of chemical problems (reactions, solvent effects, biological systems, inverse design, spectroscopy). SchNOrb is the first ML model to directly predict the electronic Hamiltonian, a central quantity of quantum mechanics. This allows for a much tighter integration of ML with established electronic structure approaches. g-SchNet is the first generative model capable of generating three-dimensional molecular structures. As such, it opens up a completely new avenue of research towards the targeted design of molecules and materials.
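Predicting the Hamiltonian is what ties such a model to established electronic structure methods. Assuming the model supplies Hamiltonian and overlap matrices H and S in a non-orthogonal atomic-orbital basis, orbital energies ε and coefficients C follow from the generalized eigenvalue problem H C = S C ε. The sketch below solves this problem with NumPy via Löwdin symmetric orthogonalization. The 2×2 matrices are hypothetical values, loosely modeled on a minimal-basis H2 molecule, that stand in for model predictions; the function name is an assumption.

```python
import numpy as np

def solve_generalized_eigenproblem(hamiltonian, overlap):
    # Solve H C = S C eps by Loewdin symmetric orthogonalization.
    s_vals, s_vecs = np.linalg.eigh(overlap)
    s_inv_sqrt = s_vecs @ np.diag(s_vals ** -0.5) @ s_vecs.T
    # Transform H to an orthonormal basis and diagonalize the ordinary symmetric problem.
    h_orth = s_inv_sqrt @ hamiltonian @ s_inv_sqrt
    energies, c_orth = np.linalg.eigh(h_orth)
    # Back-transform the eigenvectors to the original atomic-orbital basis.
    coefficients = s_inv_sqrt @ c_orth
    return energies, coefficients

# Hypothetical predicted matrices (Hartree) for a two-orbital, H2-like system.
H = np.array([[-1.12, -0.96],
              [-0.96, -1.12]])
S = np.array([[1.00, 0.66],
              [0.66, 1.00]])

eps, C = solve_generalized_eigenproblem(H, S)
print(eps)  # orbital energies in ascending order
```

For a symmetric two-orbital system, the bonding and antibonding orbital energies reduce to (H11 + H12)/(1 + S12) and (H11 - H12)/(1 - S12), which gives a quick check of the result. The coefficients are normalized such that Cᵀ S C equals the identity.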
Machine learning enables chemical simulations and applications beyond the reach of conventional means