CORDIS - EU research results

Determining in vivo protein structures and understanding genetic interactions using deep mutagenesis

Periodic Reporting for period 2 - MUTANOMICS (Determining in vivo protein structures and understanding genetic interactions using deep mutagenesis)

Reporting period: 2022-07-01 to 2023-12-31

Changes in DNA sequence (mutations) cause thousands of different genetic diseases and underlie evolution. However, after 70 years of molecular biology, we remain limited in our ability to predict how changes in sequence alter the properties of the proteins, the molecular machines, encoded by DNA. This limits clinical genetics, for example the identification of disease-causing mutations, and makes engineering biology difficult and slow.

To address this shortcoming, we are developing methods to quantify the molecular effects of millions of sequence changes across many different proteins. Applied at scale, these approaches will allow us to generate reference atlases of mutational effects for clinical genetics and, more fundamentally, datasets of sufficient size and diversity to allow the ‘encoding’ problems of molecular biology to be tackled directly using computational approaches, including artificial intelligence. The long-term objective is to understand, predict and engineer the sequence-to-activity relationships that underlie essentially all of biology.
My laboratory has developed sequencing-based assays to quantify in parallel the biophysical properties of hundreds of thousands of protein and RNA variants. To date we have developed and validated high-throughput selection assays that quantify the effects of hundreds of thousands of mutations on protein stability, binding affinity and aggregation. We have also built computational pipelines to analyse the resulting data, and approaches that use the combined datasets to infer the biophysical effects of mutations.
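As a rough illustration of what a sequencing-based selection assay measures, the sketch below turns variant read counts before and after selection into log-enrichment scores relative to the wild type. This is a generic calculation commonly used for deep mutational scanning data, not the laboratory's actual pipeline; the function name, the pseudocount and the toy counts are all assumptions made for the example.

```python
import math


def enrichment_scores(input_counts, output_counts, wild_type, pseudocount=0.5):
    """Return each variant's log enrichment relative to the wild type.

    Hypothetical helper: counts are reads per variant before (input) and
    after (output) selection. A pseudocount avoids taking log(0).
    """
    def log_ratio(variant):
        after = output_counts.get(variant, 0) + pseudocount
        before = input_counts[variant] + pseudocount
        return math.log(after / before)

    wt_ratio = log_ratio(wild_type)
    # Scores below zero mean the variant was depleted more than wild type.
    return {variant: log_ratio(variant) - wt_ratio for variant in input_counts}


# Toy, made-up read counts for three variants (not real data).
input_counts = {"WT": 1000, "A24G": 500, "L30P": 800}
output_counts = {"WT": 2000, "A24G": 1000, "L30P": 100}

scores = enrichment_scores(input_counts, output_counts, "WT")
for variant, score in scores.items():
    print(f"{variant}: {score:+.3f}")
```

In this toy example the wild type scores exactly zero by construction, a variant that tracks the wild type scores close to zero, and a strongly depleted variant scores well below zero. Real analyses typically also model replicate error and sequencing depth, which this sketch leaves out.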
Applying these methods has allowed us to produce the first global maps of how mutations alter the stability and binding energies of multiple proteins and the first systematic maps of long-range ‘allosteric’ communication in proteins. We have also produced the first global maps of how mutations alter the aggregation of proteins, showing that these data accurately predict mutations known to cause familial forms of Alzheimer’s disease. We are now scaling up these methods to produce reference atlases of mutational effects across structurally diverse proteins and using them to address the challenge of predicting what happens when many different mutations are combined.
Figure: Allosteric landscape of a protein domain