
Statistical Tools for Reaction Efficacy AssessMent: Prediction and Understanding in Organocatalyst Discovery

Periodic Reporting for period 1 - STREAM (Statistical Tools for Reaction Efficacy AssessMent: Prediction and Understanding in Organocatalyst Discovery)

Reporting period: 2018-09-01 to 2020-08-31

The need for chiral compounds, often as single enantiomers, has risen sharply in recent years, driven particularly by the demands of the pharmaceutical industry. For example, two-thirds of prescription drugs are chiral, and the majority of new chiral drugs are administered as single enantiomers. Chiral compounds have also found use as agricultural chemicals, flavors, fragrances, and materials. In response to this widespread demand, chemists have recorded impressive successes in the preparation of stereochemically pure compounds, and both small-molecule catalysts and enzymes have been developed for this purpose. However, it was only relatively recently that asymmetric catalysis with enantiomeric excesses approaching 100% was achieved with synthetic catalysts. Innovations in asymmetric reaction and catalyst development therefore represent a goal of considerable significance for worldwide health and would improve the lives of millions of people. The overarching objective of the action was to develop data science-driven workflows that enable the prediction of experimental enantioselectivity. In these studies, we showed that multivariate linear regression can correlate the structure of every reaction component (substrate, catalyst, and conditions) with enantioselectivity data. The resulting statistical models can anticipate how changing the substrate and catalyst structure alters the reaction outcome. We showed that these general reaction models predict out-of-sample results accurately and provide guidance for extending a reaction to additional substrates.
Reaction development is focused on identifying optimal conditions (reagents, catalyst, solvent, time, temperature, etc.) that facilitate the conversion of starting materials to a desired product. Applying reaction conditions from closely related reactions to the target transformation is one technique that largely drives the current process of new method development. Unfortunately, this approach often fails owing to subtle differences in reaction requirements, so the optimization process continues to be a resource-intensive, empirical endeavor. In this work, we therefore outlined a complementary approach in which classical physical organic techniques and high-level calculations are merged to integrate optimization with the mechanistic assessment of catalytic reactions. In this approach, DFT-derived parameter sets describing the important structural features of the reaction components are related to experimental outputs. The resulting mathematical equation, generally consisting of multiple terms, can be deployed to predict the reaction outcome. The reactions and catalysts under study ranged significantly in structure and application, but the general goal was to understand the interactions responsible for effective catalysis and to develop new data-driven tools that facilitate reaction design. Specifically, we developed such a workflow and applied it to three key problems in chemical synthesis: 1) streamlining the empirical, costly process of reaction optimization (Nature 2019, 571, 343); 2) enabling the application of reactions to additional substrates (JACS 2019, 141, 19178); and 3) providing mechanistic insight into how catalysts and substrates interact (Chem. Sci. 2020, 11, 6450), which is possible because the data-driven tools use physical organic methods to describe molecules numerically, so the resulting correlations can be interpreted.
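The modelling step described above can be illustrated with a minimal sketch. It assumes each reaction is described by a small vector of numerical descriptors and that the experimental output is a single number per reaction. The descriptor values, the coefficients, and the helper names (`fit_mlr`, `predict`, `r_squared`) are hypothetical and use synthetic data; they are not taken from the published models. The sketch fits a multivariate linear regression on a training set and checks it on reactions held out of the fit.

```python
import numpy as np


def fit_mlr(X, y):
    """Fit y = b0 + b1*x1 + ... by ordinary least squares.

    Returns the coefficient vector with the intercept first.
    """
    A = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    """Evaluate the fitted linear model on descriptor matrix X."""
    return coef[0] + X @ coef[1:]


def r_squared(y_true, y_pred):
    """Coefficient of determination for a set of predictions."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-in data (assumption): 40 reactions, 3 descriptors each.
rng = np.random.default_rng(0)
n_reactions, n_descriptors = 40, 3
X = rng.normal(size=(n_reactions, n_descriptors))

# Hypothetical "true" relationship used only to generate the toy outputs.
true_coef = np.array([0.5, 1.2, -0.8, 0.3])  # intercept, then one per descriptor
y = true_coef[0] + X @ true_coef[1:] + rng.normal(scale=0.05, size=n_reactions)

# Hold out the last 10 reactions to mimic out-of-sample prediction.
X_train, y_train = X[:30], y[:30]
X_test, y_test = X[30:], y[30:]

coef = fit_mlr(X_train, y_train)
r2_test = r_squared(y_test, predict(coef, X_test))
print("fitted coefficients:", np.round(coef, 3))
print("held-out R^2:", round(float(r2_test), 3))
```

In a real study the columns of `X` would hold computed descriptors for the substrate, catalyst, and conditions, and the sign and size of each fitted coefficient is what would be inspected for mechanistic interpretation.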
Results from the action were published in three elite chemistry journals, and the key results were further presented at five top US universities. Perhaps more importantly, the supporting information of each published report contains correlation tools, including parameter lists and virtual screening libraries, to facilitate application by any research group and thereby maximize access to and re-use of our research data. These publications are available through open access.
Progress beyond the state-of-the-art and potential impacts:
• Collaboration in organic chemistry is relatively rare, and the merging of our different interests and capabilities in the broad areas of asymmetric catalysis and mechanistic studies allowed for an innovative, modern approach to asymmetric reaction development.
• The projects described were highly multidisciplinary; phases of this proposal required aspects of computational chemistry, chemoinformatics, and statistics in the development and validation of mathematical models, which allowed a wide-ranging skillset to be built.
• The work performed in the granting period is directly relevant to three Horizon 2020 themes.
• The new workflows developed during the granting period are anticipated to be general and applicable in principle to any chemical system.
• The supported work has directly impacted the fellow's career prospects. The researcher will take up a position as assistant professor at the University of British Columbia, Vancouver, Canada.
• Advances in developing predictive strategies for organic synthesis will impact how one performs experiments in the laboratory and accelerate the drug discovery and development process.