Fitness landscape of intrinsically disordered proteins

Periodic Reporting for period 1 - FLINDIP (Fitness landscape of intrinsically disordered proteins)

Reporting period: 2020-12-01 to 2022-11-30

What is the problem/issue being addressed?
In this project, we focus on intrinsically disordered proteins, a large group of proteins whose mechanisms of action and roles are poorly understood. Such proteins do not adopt a unique native structure. Instead, they explore numerous conformations depending on external conditions. Recent bioinformatic analyses show that up to 15% of all proteins are intrinsically disordered.
Why is it important for society?
The critical role of intrinsically disordered proteins in cellular functions and in the onset of pathological conditions has generated significant interest in their study. Much effort has been devoted to mapping the effects of particular mutations on protein functionality. However, there has been no attempt to systematically study the genotype-to-phenotype link in intrinsically disordered proteins. Obtaining such information is essential for improving theoretical models of protein folding and molecular evolution, as well as for de novo design of intrinsically disordered proteins with improved activities.

What are the overall objectives?
The aim of this project is to experimentally measure and analyse the genotype-to-phenotype connection for several intrinsically disordered proteins by deep mutational scanning. We focus on disordered proteins that contribute to surviving complete desiccation in their host organisms. Our approach is to generate libraries of cells expressing hundreds of thousands of variants of these proteins and to perform competition assays that measure the functionality of every variant in the library. We then analyse the dataset with a variety of approaches, including machine-learning algorithms, to model the fitness of variants and to design new functional intrinsically disordered proteins.
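As an illustration of how a competition assay can be turned into per-variant scores, the sketch below computes a log2 enrichment for each variant relative to wild type from read counts before and after selection. This is a common convention in deep mutational scanning, not necessarily the scoring used in this project; the function name, the pseudocount value and the example variant names and counts are all hypothetical.

```python
import math


def enrichment_scores(pre_counts, post_counts, wild_type, pseudocount=0.5):
    """Return the log2 enrichment of each variant relative to wild type.

    pre_counts / post_counts map variant names to read counts before and
    after selection. Because every score is expressed relative to wild
    type, the total sequencing depth of each sample cancels out.
    The pseudocount (an assumption) avoids division by zero for variants
    that disappear after selection.
    """
    def log_ratio(variant):
        pre = pre_counts.get(variant, 0) + pseudocount
        post = post_counts.get(variant, 0) + pseudocount
        return math.log2(post / pre)

    wt_ratio = log_ratio(wild_type)
    return {variant: log_ratio(variant) - wt_ratio for variant in pre_counts}


# Hypothetical counts, for illustration only.
pre = {"WT": 1000, "A12V": 500, "G45D": 800}
post = {"WT": 2000, "A12V": 1000, "G45D": 200}
scores = enrichment_scores(pre, post, wild_type="WT")
```

A score near zero means the variant behaved like wild type in the competition; a strongly negative score means it was depleted.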

Conclusion of the action:
The project comprised two work packages. The first consisted of producing libraries of mutants of the target genes. Despite our best efforts, we could never reproduce the phenotype of the target genes previously published by another lab, so we decided to work on different genes. Our lab then closed for a long period because of Covid-19. Upon resuming lab work, preliminary data on the new genes showed promising results, and production of new libraries is ongoing. The second work package is data analysis. Because of the amount of data produced by our approach, we decided to tackle data analysis using machine learning. Our collaborators at the Kondrashov lab shared a dataset similar to the one we plan to obtain (mutations versus phenotype). The results of this study were published, and the pipeline developed for this data is fully reusable on disordered proteins.
The first work package was to generate mutants of our target genes and to measure the phenotypic impact of mutations. The target genes are associated with desiccation resistance in tardigrades, their source organism, but also in bacteria and yeast (Boothby et al., 2017). First, we optimised the sequences for expression in yeast and bacteria. We used the optimised sequences as templates in error-prone PCR to generate libraries of random mutants. The libraries were cloned into expression plasmids and transformed into bacterial and yeast cells, ready for screening. In parallel, we attempted to reproduce the results of Boothby et al., 2017, but could never observe the reported phenotype. At this point, our lab closed due to Covid-19. When lab work resumed, we chose to investigate desiccation-resistance proteins from the late embryogenesis abundant (LEA) family described in Liu et al., 2019. These sequences are extremely rich in repeated regions, which creates difficulties for gene synthesis, assembly and sequencing. Still, our preliminary experiments with these genes showed a promising phenotype in bacteria. We are close to finishing the preparation of the new libraries.
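To give a feel for how the mutation rate in random mutagenesis shapes a library, the following sketch simulates point substitutions on a template and reports the mean number of mutations per variant. It is a toy model under loud assumptions: substitutions are uniform across bases (real error-prone PCR has a biased mutation spectrum), insertions and deletions are ignored, and the template sequence, rate and library size are arbitrary placeholders rather than values from the project.

```python
import random

BASES = "ACGT"


def mutate(template, per_base_rate, rng):
    """Return a copy of template with random point substitutions."""
    out = []
    for base in template:
        if rng.random() < per_base_rate:
            # Substitute with one of the three other bases, chosen uniformly.
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def simulate_library(template, per_base_rate, size, seed=0):
    """Generate `size` mutated copies of template with a seeded generator."""
    rng = random.Random(seed)
    return [mutate(template, per_base_rate, rng) for _ in range(size)]


# Arbitrary placeholder template, not one of the project's genes.
template = "ATGGCTAGCAAAGGAGAAGAACTTTTCACT"
library = simulate_library(template, per_base_rate=0.02, size=1000)
mean_mutations = sum(hamming(template, v) for v in library) / len(library)
unmutated_fraction = sum(v == template for v in library) / len(library)
```

In this model the expected number of mutations per variant is simply the template length multiplied by the per-base rate, so the rate can be tuned to trade off library diversity against the fraction of unmutated copies.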
The second work package was to develop a pipeline to analyse the experimental data. Our collaborators at the Kondrashov lab shared a dataset of mutants versus phenotype for four green fluorescent proteins (GFPs). The results were published (Gonzalez Somermeyer et al., 2022), and the code, available on GitHub, can be reused on intrinsically disordered proteins. In the paper, we characterised the fitness peaks of four GFPs with a broad range of sequence divergence. Two of the studied fitness peaks were highly sensitive to mutations and epistasis, while the other two were mutationally robust. Interestingly, mutationally robust proteins are not optimal templates for machine-learning-driven protein design. Instead, predictions were more accurate for mutationally fragile proteins. This conclusion gives useful insights for protein engineering.
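For readers unfamiliar with the term, epistasis can be quantified as the deviation of a double mutant from the value expected if the two single mutations acted independently. The sketch below uses one common definition on a log-fitness scale; it is not taken from the published pipeline, and the function name and numerical values are hypothetical.

```python
def pairwise_epistasis(f_wt, f_a, f_b, f_ab):
    """Return the epistasis between mutations A and B.

    All arguments are fitness values on a log scale, where independent
    effects are additive. The expected double-mutant fitness is
    f_a + f_b - f_wt; epistasis is the observed minus the expected value.
    Negative values mean the double mutant is worse than expected.
    """
    expected = f_a + f_b - f_wt
    return f_ab - expected


# Hypothetical log-fitness values, for illustration only.
no_interaction = pairwise_epistasis(f_wt=0.0, f_a=-1.0, f_b=-1.0, f_ab=-2.0)
negative = pairwise_epistasis(f_wt=0.0, f_a=-0.5, f_b=-0.5, f_ab=-3.0)
```

Here `no_interaction` is zero because the double mutant matches the additive expectation, whereas `negative` is below zero because the two mutations together are far more damaging than their individual effects suggest.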
(i) In the academic domain
Large datasets on the functionality of proteins are in high demand for studying their structure and evolution. The GFP datasets provided valuable insights into protein fitness landscapes and into how far findings on protein function can be generalised. We showed that a protein's sensitivity to mutations and epistasis did not correlate with sequence divergence.
(ii) In the medical domain
The experimental work on intrinsically disordered proteins planned within this project could not be completed by the end of the grant, but it shows promising progress. We expect our results on intrinsically disordered proteins to have a beneficial impact in the medical domain in the long run, by shedding light on the workings of other intrinsically disordered proteins.
(iii) In the technological domain
Firstly, we showed that mutationally robust proteins are not optimal templates for machine-learning-driven protein design. Our demonstration that mutant prediction is more accurate for fragile proteins with high epistasis provides a critical strategic insight for protein engineering.
Secondly, the methodology and pipeline developed for this project are transferable to other proteins. In the longer run, intrinsically disordered proteins associated with desiccation resistance have high potential biotechnological value, as they could be transferred as an independent functional genetic module to other organisms and confer resistance to desiccation. They could also be used as an additive in enzyme solutions to protect the enzymes from denaturation and loss of activity after desiccation.
All the resources generated by this project have been made publicly available and reusable. Analysis results have been published (Gonzalez Somermeyer et al., 2022) and presented in local and international seminars and conferences; datasets have been deposited in public archives; genetic resources are available upon request; and source code is available from GitHub.
Graphical abstract of the FLINDIP project