CORDIS - EU research results

Combining Machine Learning and Quantum Chemistry for the Design of Homogeneous Catalysts

Periodic Reporting for period 1 - ML4Catalysis (Combining Machine Learning and Quantum Chemistry for the Design of Homogeneous Catalysts)

Reporting period: 2021-09-01 to 2023-08-31

Catalysis is essential for making processes in the chemical industry, one of the major consumers of energy, more efficient. In recent years, experimental approaches for finding new catalysts have increasingly been complemented by theoretical simulation. However, the most widespread computational methods, like density functional theory (DFT), are too computationally costly to apply to large systems and long timescales, or to screen large numbers of potential catalyst candidates.

In the past decade, machine learning (ML) techniques have reduced the cost of simulations substantially while mostly preserving the accuracy of the parent methods. However, these techniques need large amounts of training data, which requires black-box parent methods, and these are mostly of the so-called single-reference type (e.g. DFT or coupled cluster theory). Many first-row transition metal complexes, which are promising candidates for cheap homogeneous catalysts, have a complicated electronic structure that requires the use of more sophisticated multireference (MR) methods. These are not easily set up in a black-box fashion for a large number of compounds, which is why they are not yet widely used for labeling machine learning data.

The main objective of ML4Catalysis was the automation of MR calculations in order to train so-called Δ-ML potentials that are more accurate for transition metal complexes. These potentials could then be used for the screening and design of new homogeneous catalysts.
In order to improve the use of available data in Δ-ML potentials, we developed a new type of equivariant feature in terms of semiempirical Hamiltonian matrices. As the baseline method for this potential, we chose the promising Neural Equivariant Interatomic Potentials (NequIP) architecture. The benchmarking of this combined equivariant Δ-ML method on the QM9 and tmQMg datasets is still ongoing and is expected to be published soon. We also worked on improving ML models by incorporating physical priors. In particular, we exploited the extensivity of the energy by learning targets that are corrections on top of a baseline obtained by linear fitting. This technique was applied to improve ML models based on natural bond orbital features and was published as a peer-reviewed open-access journal article.
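The idea of learning corrections on top of a linearly fitted baseline can be illustrated with a minimal sketch. It assumes the baseline is a least-squares fit of one energy contribution per element to the atom counts of each molecule, which is one common way to encode extensivity; the project's actual baseline and features may differ. The function names and all numerical values below are invented for illustration only.

```python
import numpy as np

def fit_linear_baseline(counts, energies):
    """Least-squares fit of one energy contribution per element (assumed baseline form)."""
    coeffs, *_ = np.linalg.lstsq(counts, energies, rcond=None)
    return coeffs

def residual_targets(counts, energies, coeffs):
    """Corrections that an ML model would learn on top of the linear baseline."""
    return energies - counts @ coeffs

# Toy data (invented): rows are molecules, columns are numbers of H, C, O atoms.
counts = np.array([
    [4, 1, 0],  # CH4
    [4, 1, 1],  # CH3OH
    [2, 0, 1],  # H2O
    [6, 2, 0],  # C2H6
    [6, 2, 1],  # C2H5OH
], dtype=float)

# Invented total energies (arbitrary units), not results of any real calculation.
energies = np.array([-39.98, -115.01, -75.97, -79.02, -153.99])

coeffs = fit_linear_baseline(counts, energies)
targets = residual_targets(counts, energies, coeffs)

print("per-element baseline contributions:", coeffs)
print("residual learning targets:", targets)
```

Because the baseline scales with the number of atoms, the residual targets are much smaller in magnitude than the total energies, so an ML model only has to learn a small, size-independent correction.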

The autoCAS automated active space selection tool developed by our collaborators at ETH Zürich removes the need for manual intervention in MR calculations. However, this tool is based on a relatively expensive density matrix renormalization group calculation. Therefore, we worked on developing an ML method that can learn active spaces within a set of related compounds belonging to a common reaction network. As an application, we chose the oxidation of hydrocarbons to the corresponding alcohols, for example methane to methanol. This reaction is of high importance for generating valuable feedstock chemicals. It can be catalyzed by the homogeneous catalyst [Fe(TPA)(H2O)]3+ with hydrogen peroxide as the oxidant. The automated generation of the reaction network was performed with the Chemoton software. To limit the combinatorial explosion of the reaction network, a new filter was implemented in Chemoton in the course of ML4Catalysis. This work is still ongoing and will be published as an open-access journal article in the future.

The results of ML4Catalysis were presented in two lectures and one poster contribution at international conferences. Furthermore, research data supporting all our findings has been or will be made openly available.
The use of equivariant features derived from semiempirical information in a Δ-ML context and the automated labeling of data using MR methods represent clear progress beyond the state of the art in the field. The developed methods are applicable not only to finding new catalysts that will make the chemical industry greener and more energy-efficient. More broadly, they have the potential to impact any field of chemistry and materials science. Since chemistry pervades almost every aspect of our daily lives, the project results are expected to have a strong societal impact.