Community Research and Development Information Service - CORDIS

Periodic Report Summary 1 - EVO-COUPLINGS (STATISTICAL ANALYSIS OF PROTEIN SEQUENCES TO INFER 3D STRUCTURE AND FUNCTION)

Marie Curie CIG reporting document

Project Objective Summary

Annex 1 of the Grant Agreement contained two central Aims, which were each broken down into a number of sub-projects. The first half of the grant period focused on the project objectives in Aim 1, while the second half of the grant period will focus on those contained in Aim 2.

Aim 1 of Annex 1 of the Grant Agreement identifies criteria for whether a given protein sequence alignment contains sufficient information for protein tertiary structure prediction.

Aims 1.1-1.4 require the development of a framework that describes:
(i) The quality of sequence alignment data available for different protein families, and
(ii) The technical challenge posed by prediction of protein tertiary structure for different protein families.

Aims 1.5-1.6 require analysis of the accuracy of our maximum entropy model, and of the ability of the models built for different protein families to discriminate functional from non-functional sequences.

Ultimately I aim to develop and apply statistical methods that will allow useful information to be reliably extracted from large biological datasets.

Progress Summary

Overall, the sub-parts of Aim 1 have been addressed for protein sequence alignments and, beyond the work envisaged in the grant proposal, in two further contexts: protein-protein interactions and the binding of small-molecule ligands to protein receptors. In the latter context, the aim is to use covariance analysis to build a model of protein-ligand binding. Analogously to protein tertiary structure prediction, this model is used to predict whether a given ligand is likely to bind to a protein receptor of interest.

Results Summary

Bitbol, A. F., Dwyer, R. S., Colwell*, L. J., & Wingreen*, N. S. (2016). Inferring interaction partners from protein sequences. Proceedings of the National Academy of Sciences, 113(43), 12180-12185.

Specific protein−protein interactions are crucial in the cell, both to ensure the formation and stability of multiprotein complexes and to enable signal transduction in various pathways. Functional interactions between proteins result in coevolution between the interaction partners, causing their sequences to be correlated. Here we exploit these correlations to accurately identify, from sequence data alone, which proteins are specific interaction partners. Our general approach uses a pairwise maximum entropy model to infer couplings between residues. We introduce an iterative algorithm to predict specific interaction partners from two protein families whose members are known to interact. We obtain a striking 0.93 true positive fraction on our complete dataset (of bacterial two-component signaling systems and ABC transporter complexes) without any a priori knowledge of interaction partners.
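As a rough illustration of the idea in this abstract, the sketch below scores candidate partner pairs from inter-protein residue statistics and then matches partners within one species. It is not the published algorithm: the log-odds co-occurrence score is a simple stand-in for the couplings of a pairwise maximum entropy model, the matching is a single greedy pass rather than an iteration, and the names make_scorer and match_partners, the pseudocount and the toy sequences are all assumptions made for this example.

```python
import math
from collections import Counter


def make_scorer(paired, pseudocount=0.5, q=21):
    """Build a pair-scoring function from known (seq_a, seq_b) partner pairs.

    Stand-in for inferred couplings: for every inter-protein position pair
    (i, j), the score is the log-odds of seeing residues (a_i, b_j) together
    versus independently, with pseudocounts spread over q residue states.
    """
    n = len(paired)
    len_a, len_b = len(paired[0][0]), len(paired[0][1])
    count_a = [Counter(a[i] for a, _ in paired) for i in range(len_a)]
    count_b = [Counter(b[j] for _, b in paired) for j in range(len_b)]
    count_ab = {
        (i, j): Counter((a[i], b[j]) for a, b in paired)
        for i in range(len_a)
        for j in range(len_b)
    }
    norm = n + pseudocount

    def score(a, b):
        total = 0.0
        for i in range(len_a):
            p_a = (count_a[i][a[i]] + pseudocount / q) / norm
            for j in range(len_b):
                p_b = (count_b[j][b[j]] + pseudocount / q) / norm
                p_ab = (count_ab[i, j][a[i], b[j]] + pseudocount / q**2) / norm
                total += math.log(p_ab / (p_a * p_b))
        return total

    return score


def match_partners(seqs_a, seqs_b, score):
    """Greedily pair sequences within one species, best-scoring pair first."""
    candidates = sorted(
        ((score(a, b), i, j)
         for i, a in enumerate(seqs_a)
         for j, b in enumerate(seqs_b)),
        reverse=True,
    )
    used_b, pairs = set(), {}
    for _, i, j in candidates:
        if i not in pairs and j not in used_b:
            pairs[i] = j
            used_b.add(j)
    return pairs


# Toy training set (hypothetical): K in family A co-occurs with E in family B.
training = [("KA", "EA"), ("KA", "EA"), ("EA", "KA"), ("EA", "KA")]
score = make_scorer(training)
pairs = match_partners(["KA", "EA"], ["KA", "EA"], score)
```

In this toy case the first position of each family carries the covariation signal, so the matching pairs each K-containing sequence with the E-containing one. An iterative scheme could feed confidently matched pairs back into the training set and rescore, but the details of that loop are not given in this summary.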

Lee, A. A., Brenner, M. P., & Colwell, L. J. (2016). Predicting protein–ligand affinity with a random matrix framework. Proceedings of the National Academy of Sciences, 113(48), 13564-13569.

Rapid determination of whether a candidate compound will bind to a particular target receptor remains a stumbling block in drug discovery. We use an approach inspired by random matrix theory to decompose the known ligand set of a target in terms of orthogonal “signals” of salient chemical features, and distinguish these from the much larger set of ligand chemical features that are not relevant for binding to that particular target receptor. After removing the noise caused by finite sampling, we show that the similarity of an unknown ligand to the remaining, cleaned chemical features is a robust predictor of ligand–target affinity, performing as well as or better than any algorithm in the published literature.
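The noise-removal step described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the known ligands are given as a numeric feature matrix, uses the Marchenko-Pastur upper edge (1 + sqrt(p/n))^2 as the cutoff between signal and sampling noise in the feature correlation matrix, and measures similarity as the distance of a new ligand from the retained subspace. The function names fit_signal_subspace and distance_to_subspace are hypothetical.

```python
import numpy as np


def fit_signal_subspace(X):
    """Keep only correlation eigenvectors that rise above sampling noise.

    X is an (n_ligands, n_features) array of chemical features for the
    known binders of one target.
    """
    n, p = X.shape
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    Z = (X - mean) / std
    corr = Z.T @ Z / n
    eigvals, eigvecs = np.linalg.eigh(corr)
    # For purely random features, eigenvalues stay below this edge
    # (asymptotically), so anything above it is treated as signal.
    mp_upper = (1.0 + np.sqrt(p / n)) ** 2
    signal = eigvecs[:, eigvals > mp_upper]
    return mean, std, signal


def distance_to_subspace(x, mean, std, signal):
    """Distance of a standardized ligand from the signal subspace.

    Smaller values mean the ligand resembles the cleaned features of the
    known binders more closely.
    """
    z = (x - mean) / std
    projection = signal @ (signal.T @ z)
    return float(np.linalg.norm(z - projection))
```

A candidate ligand would be scored with distance_to_subspace against the subspace fitted on the known binders of a target; how such a score is thresholded or compared across targets is a design choice not specified in this summary.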

This document will be updated with details of further publications as they become available.

Dissemination Summary and expected final results, including the socio-economic impact and the wider societal implications of the project so far.

Dissemination activities for the major results achieved during this reporting period include publication in peer-reviewed journals and presentations by the PI at meetings, conferences and workshops, as well as at institutions within the EU and further afield. The socio-economic impact beyond academia is reflected in the large number of invited talks that the PI has given at companies across the UK during the first reporting period.

The wider societal implications of the project so far are reflected in the activities of the PI to promote the role of women in science through organization of a departmental seminar series and symposia that feature significant numbers of female speakers at all career stages.

Further societal implications of the project are realized through the efforts of the PI to coordinate and deliver a graduate-level module on computational structural biology. The module includes 16 hours of contact teaching by the PI. MPhil students in computational biology work on small individual research projects, which give them the tools needed to implement the techniques described in the lectures.

Reported by

THE CHANCELLOR, MASTERS AND SCHOLARS OF THE UNIVERSITY OF CAMBRIDGE
United Kingdom

Subjects

Life Sciences