Development of Virtual Screening Algorithms: Exploring Multiple Ligand Binding Modes Using Spherical Harmonic Consensus Clustering

Final Report Summary - DOVSA (Development of Virtual Screening Algorithms: Exploring Multiple Ligand Binding Modes Using Spherical Harmonic Consensus Clustering)

During the fellowship, I developed virtual screening algorithms, based on a spherical harmonic surface shape approach, to handle cases in which multiple ligands may be associated with multiple pocket sub-sites or may bind multiple targets. The approach was tested and validated using 40 well-known drug targets from the DUD dataset. It was also used to screen ligands belonging to some targets of the MDL Drug Data Report (MDDR) database. I also continued work on the CXCR4 entry inhibitors project with my colleagues at the IQS (Barcelona, Spain).

This work resulted in 11 journal articles, 9 oral presentations and 7 posters at international conferences, and 7 seminars/master courses on identifying and quantifying drug promiscuity by correlating ligand and target shape similarities. I was awarded Best Overall Poster Content at the Medicinal Chemistry conference (Lanzarote, Spain) in 2012 for the work 'Gaussian ensemble screening (GES): A new approach to polypharmacology and virtual screening'. I was Session Chair at the 3rd International Conference on Drug Discovery and Therapy, 7-10 February 2011, and I am a member of the ACS CINF Program Committee for the ACS New Orleans 2013 conference. I am currently supervising a thesis on 'Design of antiviral compounds (HIV). Modelling of potential allosteric inhibitors for the CXCR4 co-receptor', and I have twice received awards for the work 'Discovery of novel non-cyclam polynitrogenated CXCR4 coreceptor inhibitors'. The work carried out during my Marie Curie Fellowship in Nancy has also given me the opportunity to become a member of the scientific committee of Harmonic Pharma SAS.

Below, I present a brief description of the work carried out to achieve the project's objectives, the main results, conclusions and their potential impact and use.

My DOVSA project focused on the development of novel computational approaches to assist in the discovery of new therapeutic drug molecules. The main practical objective was to advance the state of the art in ligand-based virtual screening (VS) by developing new algorithms based on compact and computationally efficient spherical harmonic (SH) representations of molecular shapes and other chemical properties.

Ligand-based shape matching approaches have become established as important and popular VS techniques. However, despite their relative success, many authors have discussed how best to choose the initial query compounds and which of their conformations should be used. Furthermore, it is increasingly the case that pharmaceutical companies have multiple ligands for a given target and these may bind in different ways to the same pocket. Conversely, a given ligand can sometimes bind to multiple targets, and this is clearly of great importance when considering drug side-effects. Traditional shape matching approaches normally use just one conformation of a compound as the query, yet it is not known a priori whether this is the correct query to use to screen an entire database. For example, other compound families could also be active for the same target but they might only be found in the database if a different query conformation is used. In other words, conventional VS assumes there is only one binding mode for a given protein target. This may be true for some targets, but it is certainly not true in all cases. Several recent studies have shown that some protein targets bind different ligands in different ways (CCR5: Kellenberger et al., 2007; CXCR4: Wong et al., 2008; CDK2: Sato et al., 2006; HIVRT: Lewis et al., 2003; FXA: Taha et al., 2005; LXR: Williams et al., 2003). Hence, the DOVSA project focused on the development of new VS approaches which can detect such cases and which can associate specific sub-sets of ligands with their corresponding receptor binding sites. In my thesis, I introduced the notion of SH-based "consensus shapes" to help deal with these questions.

Given that the consensus clustering developed in my thesis seemed to offer a straightforward way to explain how multiple diverse ligands might distribute themselves within the CCR5 pocket, and because I achieved very good results for CCR5 and the related CXCR4 system, in the DOVSA project I extended and improved the approach to other protein targets. I applied the consensus shape clustering approach to perform large-scale clustering and cross-docking experiments on the 40 protein-ligand targets in the DUD dataset using PARASURF/PARAFIT, in order to determine the extent to which clustering may be used to predict high affinity non-cognate ligands. The clustering results showed that in some cases the ligands for a given target were split into two sub-groups, which could suggest they bind to different sub-sites of the same target. In other cases, the clustering grouped together ligands from different targets, which suggested that those ligands could bind to the same targets.

Hence, SH-based clustering can rapidly give cross-docking information while avoiding the expense of performing all-against-all docking calculations. We also reported on the effect of the query conformation on the performance of shape-based screening of the DUD dataset (Directory of Useful Decoys; Huang et al., 2006) and the potential gain in screening performance from using consensus shapes calculated in different ways. We provided details of our analysis of shape-based screening using both PARASURF/PARAFIT [1] and ROCS [2], and we compared the results obtained with shape-based and conventional docking approaches using MSSH/SHEF [3] and GOLD [4]. The utility of each type of query was analyzed using commonly reported statistics such as enrichment factors (EF) and receiver operating characteristic (ROC) plots, as well as other early performance metrics.
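As an illustration of the two statistics named above, the following minimal Python sketch computes an enrichment factor and the area under the ROC curve for a ranked screening list. It is not the code used in the project: the function names, the convention that a higher score means a better match, and the toy data are all assumptions made for this example.

```python
from typing import Sequence


def enrichment_factor(scores: Sequence[float],
                      is_active: Sequence[bool],
                      fraction: float) -> float:
    """Enrichment factor in the top `fraction` of the ranked list.

    Assumes a higher score means a better match. EF is the hit rate in
    the selected top slice divided by the hit rate in the whole list.
    """
    n = len(scores)
    n_actives = sum(is_active)
    if n == 0 or n_actives == 0:
        raise ValueError("need at least one compound and one active")
    n_top = max(1, round(n * fraction))
    ranked = sorted(zip(scores, is_active), key=lambda p: p[0], reverse=True)
    hits_top = sum(active for _, active in ranked[:n_top])
    return (hits_top / n_top) / (n_actives / n)


def roc_auc(scores: Sequence[float], is_active: Sequence[bool]) -> float:
    """Area under the ROC curve via the rank-sum identity.

    Equals the probability that a randomly chosen active outscores a
    randomly chosen decoy; ties count as one half.
    """
    actives = [s for s, a in zip(scores, is_active) if a]
    decoys = [s for s, a in zip(scores, is_active) if not a]
    if not actives or not decoys:
        raise ValueError("need at least one active and one decoy")
    wins = 0.0
    for a in actives:
        for d in decoys:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(actives) * len(decoys))


# Hypothetical toy ranking: 3 actives among 10 compounds.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
toy_labels = [True, True, False, True, False,
              False, False, False, False, False]
ef_20 = enrichment_factor(toy_scores, toy_labels, 0.2)
auc = roc_auc(toy_scores, toy_labels)
```

The pairwise AUC loop is quadratic in the list size, which is acceptable for a sketch; a rank-based formula would be preferable for large screening libraries.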

The DUD consensus clustering results suggested that the P38 target could have multiple sub-sites. Hence, the P38 DUD ligand set was clustered using chemical fingerprints and the SH consensus shapes for each of the resulting 15 groups were calculated. The consensus shapes of these clusters were compared in PARAFIT and the resulting pairwise Tanimoto similarity scores were used in another round of hierarchical clustering to obtain three SC clusters. In order to explore how the members of the SC clusters might distribute themselves in the P38 pocket, the SC pseudo-molecules were rigidly docked into the P38 pocket. This placed the SC A pseudo-molecule on one side of the pocket, the SC B pseudo-molecule on the opposite side and SC C was placed in the same way as SC A. These docking poses are consistent with the two known binding sub-sites (ATP and allosteric) observed crystallographically.
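The second clustering round described above, which groups consensus shapes by their pairwise Tanimoto similarity, can be sketched as follows. The report does not state which linkage criterion was used, so average linkage is an assumption here, and the function name and the toy similarity matrix are hypothetical; in practice the matrix would hold the pairwise scores reported by PARAFIT.

```python
from typing import List


def average_linkage(sim: List[List[float]], n_clusters: int) -> List[List[int]]:
    """Agglomerative clustering on a symmetric similarity matrix.

    Repeatedly merges the two clusters with the highest mean pairwise
    similarity until `n_clusters` remain. Returns the clusters as sorted
    lists of item indices.
    """
    n = len(sim)
    if not 1 <= n_clusters <= n:
        raise ValueError("n_clusters must be between 1 and the number of items")
    clusters = [[i] for i in range(n)]
    while len(clusters) > n_clusters:
        best = None  # (mean similarity, position a, position b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
                mean_sim = sum(sim[i][j] for i, j in pairs) / len(pairs)
                if best is None or mean_sim > best[0]:
                    best = (mean_sim, a, b)
        _, a, b = best
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return sorted(clusters)


# Hypothetical Tanimoto-like similarities between five consensus shapes.
toy_sim = [
    [1.0, 0.9, 0.2, 0.1, 0.3],
    [0.9, 1.0, 0.3, 0.2, 0.2],
    [0.2, 0.3, 1.0, 0.8, 0.1],
    [0.1, 0.2, 0.8, 1.0, 0.2],
    [0.3, 0.2, 0.1, 0.2, 1.0],
]
groups = average_linkage(toy_sim, 3)  # -> [[0, 1], [2, 3], [4]]
```

Working directly on similarities avoids converting to distances, at the cost of recomputing cluster averages in every pass; for 15 consensus shapes this cost is negligible.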

In addition, I extended the consensus clustering approach to calculate consensus receptor pocket shapes. This allowed me to relate receptors to each other by the SH shape similarity of their ligands and their binding pockets. Since shape complementarity is an essential feature for molecular recognition, using ligand and binding pocket shapes should provide a good way to characterise their properties. If two binding pockets of different proteins share a common shape, it is likely that ligands that bind to part of one binding pocket will also be recognized in the corresponding part of the other pocket. Conversely, if two ligands of different proteins share a similar shape, it is likely that both of them will complement the shape of each binding pocket. Hence, by identifying similar ligands and binding pocket shapes, my approach aims to provide a shape-based way to predict promiscuous ligands and targets. I applied the approach to a wide range of ligands which Schuffenhauer (Schuffenhauer et al., 2002) previously selected from the MDL Drug Data Report (MDDR) database (MDL Drug Data Report, 2010.2; MDL Information Systems Inc., San Leandro, CA, 2010), and for which crystallographic protein-ligand complexes exist in the Protein Data Bank (PDB) (Berman et al., 2000).

This gave an annotated list of ligands for 249 protein targets of pharmacological interest. The shape similarity between ligands and between binding pockets for these selected protein targets was quantified according to a similarity threshold, and a rigorous interaction probability was used to predict promiscuity. I analysed the correlation between binding pocket and ligand shape spaces. The promiscuity predicted for androgen, hydroxymethylglutaryl-CoA reductase, and GABA-A alpha subunit is consistent with several existing MDDR activity classes for the ligands related to these classes. I also compared the promiscuity predictions with experimental activity values extracted from the BindingDB database (Liu et al., 2007) (Perez-Nueno et al. The Open Conference Proceedings Journal, 2011).

The ideas and research carried out during the first part of the project culminated in the development of a new and fast polypharmacology tool, Gaussian Ensemble Screening (GES), which predicts polypharmacological relationships between drug classes quantitatively. This approach represents a cluster of molecules with similar spherical harmonic surface shapes as a Gaussian distribution with respect to a selected centre molecule. Calculating the Gaussian overlap between pairs of such clusters allows the similarity between drug classes to be calculated analytically without requiring thousands of bootstrap comparisons, as in current promiscuity prediction approaches. We find that such cluster similarity scores also follow a Gaussian distribution. Hence, a cluster similarity score can be transformed into a probability value, or 'p-value', in order to quantify the relationships between drug classes.
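The two analytical steps described above can be illustrated with a deliberately simplified Python sketch. It is not the GES implementation: it assumes each cluster is reduced to a one-dimensional Gaussian (a mean and a standard deviation along a hypothetical shape axis), uses the standard closed-form integral of the product of two normalised Gaussians as the overlap, and converts a score into an upper-tail p-value against an assumed Gaussian background of scores. All function and parameter names are hypothetical.

```python
import math


def gaussian_overlap(mu_a: float, sigma_a: float,
                     mu_b: float, sigma_b: float) -> float:
    """Integral of the product of two normalised 1-D Gaussians.

    Closed form: a Gaussian in (mu_a - mu_b) with variance
    sigma_a**2 + sigma_b**2, so no sampling is required.
    """
    var = sigma_a ** 2 + sigma_b ** 2
    if var <= 0.0:
        raise ValueError("at least one standard deviation must be positive")
    diff = mu_a - mu_b
    return math.exp(-diff * diff / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def upper_tail_p_value(score: float,
                       background_mean: float,
                       background_std: float) -> float:
    """P(X >= score) for X ~ Normal(background_mean, background_std**2).

    Assumes the background of cluster similarity scores is Gaussian and
    that a higher score means a stronger relationship.
    """
    if background_std <= 0.0:
        raise ValueError("background_std must be positive")
    z = (score - background_mean) / background_std
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Hypothetical clusters: two nearby drug classes and one distant class.
near = gaussian_overlap(0.0, 1.0, 0.5, 1.0)
far = gaussian_overlap(0.0, 1.0, 4.0, 1.0)
```

Because the overlap has a closed form, comparing two classes costs a handful of arithmetic operations, which is the source of the speed advantage over bootstrap-based comparisons noted in the text.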

We applied the GES approach to predict relationships between drug classes in a subset of the MDL Drug Data Report (MDDR) and among RNA splicing inhibitors (in collaboration with Prof. Jamal Tazi, who heads a leading laboratory at the Institut de Génétique Moléculaire de Montpellier (IGMM) working on RNA splicing related to the HIV life cycle). Our results indicate that GES is a useful way to study polypharmacology relationships, and it could provide a novel way to propose new targets for drug repositioning (Perez-Nueno et al. J. Chem. Inf. Model. 2012, doi: 10.1021/ci3000979). Hence, all DOVSA objectives have been fully accomplished and a new generic polypharmacology tool has emerged from the project. This tool has high potential either to be exploited as it is or to be extended easily to represent and compare not only SH shape but also distributions of several other molecular attributes.