Periodic Reporting for period 1 - FLINDIP (Fitness landscape of intrinsically disordered proteins)
Reporting period: 2020-12-01 to 2022-11-30
In this project, we focus on intrinsically disordered proteins, a large group of proteins whose mechanisms of action and roles are poorly understood. Such proteins do not adopt a unique native structure; instead, they explore numerous conformations depending on external conditions. Recent bioinformatic analyses show that up to 15% of all proteins are intrinsically disordered.
Why is it important for society?
The critical role of intrinsically disordered proteins in cellular functions and in the onset of pathological conditions has generated significant interest in their study. Much effort has been devoted to mapping the effects of particular mutations on protein functionality. However, there has been no attempt to systematically study the genotype-to-phenotype link in intrinsically disordered proteins. Obtaining such information is essential for improving theoretical models of protein folding and molecular evolution, as well as for the de novo design of intrinsically disordered proteins with improved activities.
What are the overall objectives?
The aim of this project is to experimentally measure and analyse the genotype-to-phenotype connection for several intrinsically disordered proteins by deep mutational scanning. We are focusing on disordered proteins that help their host organisms survive complete desiccation. Our approach is to generate libraries of cells expressing hundreds of thousands of variants of these proteins and to perform competition assays that measure the functionality of every variant in the library. We then analyse the dataset with a variety of approaches, including machine-learning algorithms that model the fitness of variants and design new functional intrinsically disordered proteins.
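Competition-assay read-outs of this kind are often scored by comparing each variant's frequency before and after selection. The sketch below assumes sequencing read counts per variant from the library before and after the competition step, a pseudocount of 0.5 and normalisation to the wild type. The function name `fitness_scores` and the variant labels are hypothetical, and the scheme is a common deep-mutational-scanning log-ratio score, not necessarily the scoring used in this project.

```python
import math


def fitness_scores(pre_counts, post_counts, wt="WT", pseudocount=0.5):
    """Log2 enrichment of each variant relative to the wild type.

    pre_counts / post_counts: dicts mapping a (hypothetical) variant ID to
    its read count before and after the competition assay.
    """
    def log_ratio(variant):
        # Pseudocount avoids log(0) for variants lost during selection
        pre = pre_counts.get(variant, 0) + pseudocount
        post = post_counts.get(variant, 0) + pseudocount
        return math.log2(post / pre)

    wt_score = log_ratio(wt)
    # 0 = wild-type-like; negative = depleted during the competition
    return {v: log_ratio(v) - wt_score for v in pre_counts}


if __name__ == "__main__":
    pre = {"WT": 1000, "A12G": 1000, "L30P": 1000}
    post = {"WT": 2000, "A12G": 2000, "L30P": 500}
    scores = fitness_scores(pre, post)
    print({v: round(s, 2) for v, s in scores.items()})
    # prints {'WT': 0.0, 'A12G': 0.0, 'L30P': -2.0}
```

Because every score is normalised to the wild type, total library sizes cancel out of the ratio and need not be computed explicitly.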
Conclusion of the action:
The project was made up of two work packages. The first consisted of producing libraries of mutants of the target genes. Despite our best efforts, we could not reproduce the phenotype of the target genes previously reported by another lab, so we decided to work on different genes. Our lab then closed for a long period due to Covid-19. Upon resuming lab work, preliminary data on the new genes showed promising results, and production of the new libraries is ongoing.
The second work package was to develop a pipeline to analyse the experimental data. Because of the amount of data our approach produces, we decided to tackle the analysis with machine learning. Our collaborators at the Kondrashov lab shared a dataset of mutants versus phenotype for four green fluorescent proteins (GFPs), similar to the one we plan to obtain. The results were published (Gonzalez Somermeyer et al., 2022), and the code, available on GitHub, can be reused on intrinsically disordered proteins. In the paper, we characterised the fitness peaks of four GFPs spanning a broad range of sequence divergence. Two of these fitness peaks were highly sensitive to mutations and epistasis, while the other two were mutationally robust. Interestingly, mutationally robust proteins are not optimal templates for machine-learning-driven protein design: predictions were more accurate for mutationally fragile proteins. This finding offers useful guidance for protein engineering.
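Epistasis is commonly quantified as the deviation of a double mutant's fitness from what the two single mutations would predict if they acted independently. This sketch assumes fitness scores on a log scale relative to the wild type (wild type = 0), so that independent effects add. The helper `epistasis_scores` and the mutation labels are hypothetical and are not taken from the published code.

```python
def epistasis_scores(scores):
    """Epistasis for every double mutant whose two single mutants were measured.

    scores: dict mapping a tuple of mutations to log fitness relative to WT.
    Under a no-epistasis model (additive in log space), the expected
    double-mutant score is the sum of the two single-mutant scores.
    """
    result = {}
    for mutations, f_ab in scores.items():
        if len(mutations) != 2:
            continue  # only double mutants are scored here
        a, b = mutations
        if (a,) in scores and (b,) in scores:
            expected = scores[(a,)] + scores[(b,)]
            # Negative value: the combination is worse than expected
            result[mutations] = f_ab - expected
    return result


if __name__ == "__main__":
    measured = {
        ("A12G",): -0.5,
        ("L30P",): -1.0,
        ("A12G", "L30P"): -3.0,
        ("A12G", "K45R"): -0.4,  # K45R single mutant not measured: skipped
    }
    print(epistasis_scores(measured))  # prints {('A12G', 'L30P'): -1.5}
```

A strongly negative value, as in this toy example, indicates that the two mutations together are more damaging than their individual effects predict.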
(i) in the scientific domain
Large datasets on protein functionality are in high demand for studying protein structure and evolution. The GFP datasets provided valuable insights into protein fitness landscapes and into the generalisability of findings on protein function. We showed that a protein's sensitivity to mutations and epistasis did not correlate with its sequence divergence.
(ii) in the medical domain
The experimental work on intrinsically disordered proteins planned within this project could not be completed by the end of the grant, but it is making promising progress. In the long run, we expect our results to benefit the medical domain by shedding light on the workings of other intrinsically disordered proteins.
(iii) in the technological domain
Firstly, we showed that mutationally robust proteins are not optimal templates for machine-learning-driven protein design. Our demonstration that predictions of mutant effects are more accurate for fragile proteins with high epistasis provides a critical strategic insight for protein engineering.
Secondly, the methodology and pipeline developed for this project are transferable to other proteins. In the longer run, intrinsically disordered proteins associated with desiccation resistance have high potential biotechnological value: they could be transferred to other organisms as an independent functional genetic module conferring resistance to desiccation. They could also be used as additives in enzyme solutions to protect the enzymes from denaturation and loss of activity upon desiccation.
All resources generated by this project have been made publicly available and reusable. Analysis results have been published (Gonzalez Somermeyer et al., 2022) and presented at local and international seminars and conferences; datasets have been deposited in public archives, genetic resources are available upon request, and source code is available on GitHub.