## Final Report Summary - IETSOL (Calculation of Pharmacokinetic Properties of Druglike Molecules using Integral Equation Theory)

The aim of the IETSOL project was to develop accurate methods to calculate pharmacokinetic properties of druglike molecules based on the Integral Equation Theory (IET) of molecular liquids. IET is a promising method for computer modelling of molecular solutions based on classical statistical mechanics. The theory allows one to compute structural and thermodynamic properties of arbitrary liquid-phase solutions by solving a set of integral equations together with a closure relationship. It has previously been used in qualitative studies of solution chemistry phenomena ranging from solvation of monatomic ions to solvent effects on biomolecules and supramolecular assemblies. Although IET has been an active topic of academic research for over 40 years, it has not been widely adopted in applied research in either industry or academia, because in its original form it does not afford accurate calculations of solvation thermodynamics across multiple classes of molecules.
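For orientation, the simplest instance of an integral equation paired with a closure is the Ornstein-Zernike equation for a one-component simple fluid, shown here with the hypernetted-chain (HNC) closure. This is given only as a textbook illustration: the molecular (RISM) equations used in the project generalize it to interaction sites, and the choice of HNC as the example closure is an assumption made here, not a statement about the closure used in the project. The symbols are the total correlation function $h$, the direct correlation function $c$, the number density $\rho$, the pair potential $u$, and $\beta = 1/k_B T$.

```latex
% Ornstein-Zernike equation (one-component simple fluid)
h(r) = c(r) + \rho \int c\left(\lvert \mathbf{r} - \mathbf{r}' \rvert\right) h(r') \, d\mathbf{r}'

% Hypernetted-chain closure (the bridge function is set to zero)
h(r) = \exp\left[ -\beta u(r) + h(r) - c(r) \right] - 1
```

The two relations contain two unknown functions, $h$ and $c$, so they are solved together, usually by iteration.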

The IETSOL project focused on developing new methods within the scope of the Reference Interaction Site Model (RISM) formalism of IET. The project brought several fundamental breakthroughs that increased the accuracy and applicability of both 1D and 3D RISM. These included the first RISM methods for: (i) chemically accurate computation of solvation free energies; (ii) ab initio calculation of the solubility of crystalline druglike molecules; (iii) prediction of relative protein-ligand binding free energies. One of these methods was selected as a research highlight by the British Institutes (http://goo.gl/bz3Or7) while another method developed at the outset of the project was highlighted by the American Institute of Physics and the Max Planck Society (http://goo.gl/OaHCTe and http://goo.gl/cTQ6By). Further details about the main results of the project are provided below.

1. 3D RISM/UC free energy functional

We developed a conceptually new free energy functional (3D RISM/UC) in the integral equation theory of molecular liquids that gives accurate calculations of hydration thermodynamics for druglike molecules. The functional provides an improved description of excluded volume effects by incorporating two free coefficients. When the values of these coefficients were obtained from experimental data for simple organic molecules, the hydration free energies of an external test set of druglike molecules could be calculated with an accuracy of about 1 kcal/mol. The 3D RISM/UC method is easily implemented using existing computational software and allows in silico screening of the solvation thermodynamics of potential pharmaceutical molecules at significantly lower computational expense than explicit solvent simulations.
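As a minimal sketch of how two free coefficients might be obtained from experimental data, the following assumes that the correction is linear in a single solute property (a hypothetical dimensionless partial molar volume, `rho_v`) and is added to an uncorrected 3D RISM hydration free energy. The functional form, the function names, and all numbers are illustrative assumptions; the training values are synthetic and are not project data.

```python
def fit_two_coefficients(x, residual):
    """Ordinary least squares fit of residual ~ a * x + b."""
    n = len(x)
    mean_x = sum(x) / n
    mean_r = sum(residual) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxr = sum((xi - mean_x) * (ri - mean_r) for xi, ri in zip(x, residual))
    a = sxr / sxx
    b = mean_r - a * mean_x
    return a, b


def corrected_free_energy(dg_uncorrected, rho_v, a, b):
    """Apply the assumed linear correction to an uncorrected free energy."""
    return dg_uncorrected + a * rho_v + b


# Synthetic training set (kcal/mol); NOT experimental or project data.
rho_v = [2.0, 3.0, 4.0, 5.0]               # hypothetical volume descriptor
dg_uncorrected = [10.0, 16.0, 22.0, 28.0]  # hypothetical uncorrected values
dg_experiment = [-1.0, -2.0, -3.0, -4.0]   # hypothetical reference values

# The coefficients are fitted to the error of the uncorrected model.
residual = [e - u for e, u in zip(dg_experiment, dg_uncorrected)]
a, b = fit_two_coefficients(rho_v, residual)

# Apply the fitted correction to a molecule outside the training set.
dg_new = corrected_free_energy(34.0, 6.0, a, b)
```

Because only two coefficients are fitted, a small training set of simple molecules is enough to determine them, and the corrected model can then be tested on molecules that were not used in the fit.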

2. First Principles Calculation of Solubility

We demonstrated that the intrinsic aqueous solubility of crystalline druglike molecules can be estimated with reasonable accuracy from sublimation free energies calculated using crystal lattice simulations and hydration free energies calculated using the 3D Reference Interaction Site Model (3D-RISM) of the Integral Equation Theory of Molecular Liquids (IET). The solubilities of 25 crystalline druglike molecules taken from different chemical classes were predicted by the model with a correlation coefficient of R = 0.85 and a root mean square error (RMSE) equal to 1.45 log S units, which was significantly more accurate than results obtained using implicit continuum solvent models. The method is not directly parametrized against experimental solubility data, and it offers a full computational characterization of the thermodynamics of transfer of the drug molecule from crystal phase to gas phase to dilute aqueous solution.
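The thermodynamic cycle described above (crystal to gas to dilute aqueous solution) can be sketched as follows. The sketch assumes that the sublimation and hydration free energies are both referenced to a 1 mol/L standard state, that the solution is ideally dilute, and that energies are in kcal/mol; under those assumptions the solubility in mol/L follows directly from the summed free energy. The function names are hypothetical, and the helper functions only show how a correlation coefficient and an RMSE in log S units are computed.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def log10_solubility(dg_sub, dg_hyd, temperature=298.15):
    """log10 of solubility (mol/L) from the cycle crystal -> gas -> solution.

    Assumes both free energies (kcal/mol) use a 1 mol/L standard state.
    """
    dg_sol = dg_sub + dg_hyd  # free energy of solution via the gas phase
    return -dg_sol / (R_KCAL * temperature * math.log(10))


def rmse(pred, obs):
    """Root mean square error between predicted and observed values."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(pred))


def pearson_r(pred, obs):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_o = sum(obs) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(pred, obs))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    return cov / math.sqrt(var_p * var_o)


# Hypothetical molecule: sublimation costs 10 kcal/mol, hydration returns 8.
log_s = log10_solubility(dg_sub=10.0, dg_hyd=-8.0)
```

At room temperature an error of about 1.36 kcal/mol in the summed free energy corresponds to one log S unit, which is why small errors in either leg of the cycle matter.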

3. Computational Alanine Scanning using MM-3DRISM/UC: Binding in the Bovine Chymosin - Bovine κ-Casein complex.

We demonstrated that the relative binding thermodynamics of single-point mutants of a model protein-peptide complex (the bovine chymosin-bovine κ-casein complex) can be calculated accurately and efficiently using molecular integral equation theory. The results were shown to be in good overall agreement with those obtained using implicit continuum solvation models. Unlike the implicit continuum models, however, molecular integral equation theory provides useful information about the distribution of solvent density. The experimentally observed water-binding sites on the surface of bovine chymosin could be identified quickly and accurately from the density distribution functions computed by molecular integral equation theory. The bovine chymosin-bovine κ-casein complex is of industrial interest because bovine chymosin is widely used to cleave bovine κ-casein and to initiate milk clotting in the manufacturing of processed dairy products. The results shed light on the recent discovery that camel chymosin is a more efficient clotting agent than bovine chymosin for bovine milk.
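The bookkeeping behind a relative binding free energy for a single-point mutant can be sketched as below. The sketch assumes an end-point scheme in which the free energy of each species is the sum of a molecular mechanics energy and a solvation free energy, with entropic terms omitted; that decomposition, the class and function names, and all numbers are illustrative assumptions rather than the project's implementation or results.

```python
from dataclasses import dataclass


@dataclass
class Species:
    """End-point energies of one species in kcal/mol (hypothetical values)."""
    e_mm: float     # gas-phase molecular mechanics energy
    dg_solv: float  # solvation free energy, e.g. from a 3D RISM calculation

    @property
    def g(self) -> float:
        return self.e_mm + self.dg_solv


def binding_free_energy(cplx: Species, receptor: Species, ligand: Species) -> float:
    """Delta G_bind = G(complex) - G(receptor) - G(ligand)."""
    return cplx.g - receptor.g - ligand.g


def relative_binding(dg_wild_type: float, dg_mutant: float) -> float:
    """Delta Delta G_bind; positive means the mutation weakens binding."""
    return dg_mutant - dg_wild_type


# Hypothetical wild-type system.
dg_wt = binding_free_energy(
    Species(e_mm=-150.0, dg_solv=50.0),  # complex
    Species(e_mm=-100.0, dg_solv=40.0),  # receptor
    Species(e_mm=-50.0, dg_solv=20.0),   # peptide ligand
)

# Hypothetical mutant in which one peptide residue is replaced by alanine.
dg_mut = binding_free_energy(
    Species(e_mm=-140.0, dg_solv=45.0),
    Species(e_mm=-100.0, dg_solv=40.0),
    Species(e_mm=-46.0, dg_solv=19.0),
)

ddg = relative_binding(dg_wt, dg_mut)
```

Only the difference between mutant and wild type is reported, so systematic errors shared by both calculations largely cancel.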

4. RISM-MOL-INF: a new toolkit for modelling solvation complexes in molecular informatics applications.

We proposed a fast and efficient method to compute molecular descriptors based on the Integral Equation Theory of Molecular Liquids. The new RISM-MOL-INF descriptors are calculated from the solvation complex formed between solute and solvent molecules, and hence they capture molecular-scale solvation and desolvation effects that are omitted by standard molecular descriptors and implicit continuum solvation models. Hydration free energy and intrinsic aqueous solubility of organic molecules can be predicted accurately using machine learning algorithms trained on RISM-MOL-INF descriptors only. Due to the importance of solvation and desolvation effects in biological systems, it is anticipated that the RISM-MOL-INF descriptors will find many applications in biophysical and biomedical property prediction.
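The idea of predicting a property from descriptor vectors alone can be sketched with a deliberately simple learner. The source does not state which machine learning algorithm was used, so k-nearest-neighbour regression is a stand-in chosen here for brevity. The descriptor vectors, target values, and the function name `knn_predict` are hypothetical and do not reproduce the RISM-MOL-INF descriptors themselves.

```python
import math


def knn_predict(train_x, train_y, query, k=3):
    """Predict a property as the mean over the k nearest training molecules.

    train_x: list of fixed-length descriptor vectors
    train_y: list of property values (e.g. hydration free energy)
    query:   descriptor vector of the molecule to predict
    """
    # Rank training molecules by Euclidean distance in descriptor space.
    ranked = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Synthetic two-component descriptors and property values; NOT project data.
train_x = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
train_y = [-1.0, -2.0, -3.0, -10.0]

prediction = knn_predict(train_x, train_y, query=[0.1, 0.1], k=3)
```

Any regression method that accepts fixed-length numeric vectors could replace the learner here; the point is that the solvation information enters only through the descriptors.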
