Periodic Reporting for period 1 - aCCuracy (Turning gold standard quantum chemistry into a routine simulation tool: predictive properties for large molecular systems)
Reporting period: 2023-07-01 to 2025-12-31
On the accuracy side, quantum chemistry’s gold standard (GS) model consistently provides chemically reliable predictions that match experiment. However, its traditional form is limited to molecules the size of an amino acid. In contrast, the current workhorse method, density functional theory (DFT), can scale to thousands of atoms and has become one of the ten most widely used tools across science. Yet DFT methods still have well-known difficulties in modeling a number of ubiquitous chemical motifs.
The first objective of this project is to develop models that are both more predictive and more affordable. First, we accelerate GS methods significantly while retaining their intrinsic accuracy. Second, we combine accurate components of GS methods with DFT approaches, thereby increasing the reliability of simulations with only a minor sacrifice in efficiency.
The second challenge is to enable productive dialogue between experiment and simulation. This requires predictive modeling of measurable quantities (thermodynamic, kinetic, and spectroscopic data) under experimentally relevant conditions and in realistic environments. While these properties are accessible at the DFT level, even the simplest of them remain unavailable at the accelerated GS level. Our second aim is therefore to overcome the scientific and technical barriers to GS-level predictions of key molecular properties that govern dynamics and interactions with electromagnetic fields.
The resulting methods will allow us and the wider simulation community to investigate intricate, (bio)chemically relevant processes that are far beyond the reach of any current lower-cost model at chemical accuracy.
In parallel, we devised new ways to combine the strengths of the above quantum chemistry approaches. First, we used components of the accelerated GS-type models to develop a new type of DFT method with an outstanding accuracy-to-cost ratio. Second, we advanced quantum embedding schemes, which apply the more accurate but more expensive GS methods to the chemically most relevant regions (e.g., catalytic centers or close-contact intermolecular interactions) while embedding them quantum mechanically (QM) in a larger, efficiently computed DFT environment that accounts for solvent, biochemical, or crystal environment effects.
The calculation of measurable properties requires advanced techniques to evaluate the derivatives that describe the system's response to external perturbations. By overcoming data storage and communication bottlenecks, we developed a massively parallel and memory-efficient GS gradient code. On a single 128-core node, this code already outperforms the best current alternatives by an order of magnitude. We also introduced a quantum embedding framework that makes it comparatively simple to introduce localization approaches into GS derivative properties, laying the foundation for further advances in the second half of the project.
Beyond enabling novel (bio)chemical applications (see below), our methods provide reference data of unprecedented quality for large, real-world molecules. These references are critical for assessing, improving, and training more affordable methods such as DFT, machine learning (ML), and molecular mechanics (MM).
Our new methods are regularly released in the MRCC package (www.mrcc.hu), which is open source for academic use and also facilitates commercial use.
Our accelerated GS methods consistently push the limits of chemically accurate models in terms of accessible molecule size and complexity, reaching system sizes above 1000 atoms with converged GS accuracy. Moreover, their main impact is to make GS accuracy routinely accessible for practically relevant systems in the 100–200 atom range using widely available computational resources. This range already covers a large portion of modern chemistry, including, e.g., increasingly sophisticated and complex catalytic systems.
This efficiency enables previously inaccessible accuracy for real-life processes, including the necessary environment effects. For instance, we made it possible to model biochemical processes with up to 300–400 QM atoms at converged GS accuracy. For the first time, real-life interactions between drugs and protein binding pockets, as well as enzyme-catalyzed reactions involving large cofactors, have been modeled at the converged GS level of theory. These systems require large active sites and substrates, often involving hundreds of atoms and complex quantum effects that call for GS models to achieve robust accuracy.
Similarly challenging processes in homogeneous catalysis and on crystal surfaces have become routinely accessible with our accelerated GS methods. Here, we can model large catalysts and high-density condensed-phase systems that were previously computationally prohibitive at the GS level.
Altogether, our methods deliver quantitative accuracy on the energy scale of chemical reactions and provide robust, first-principles insight into the relationship between electronic/atomic structure and function. These predictive, open-access, and affordable tools offer a powerful theoretical foundation for understanding and designing complex molecular systems across (bio)chemistry and materials science.