CORDIS - EU research results

Protein design from sub-domain sized fragments

Periodic Reporting for period 5 - Protein Lego (Protein design from sub-domain sized fragments)

Reporting period: 2020-08-01 to 2021-08-31

Proteins are ubiquitous, very diverse, and participate in virtually every cellular process. This functional diversity arises from the defined three-dimensional structure that a protein’s polypeptide chain folds into based on its amino acid sequence. The folding process is finely tuned and easily disturbed, as is apparent from the large number of diseases caused by protein misfolding. Given this context, it is remarkable how the variety of proteins evolved. It turns out that Nature achieved this with a trick, namely by reusing successfully folding fragments and recombining them to create diversity. In large proteins we can clearly identify independently folding units called protein domains. Similarly folded domains are found in different combinations in multi-domain proteins. Thus, the domain evidently constitutes an evolutionary unit. Below the size of domains, smaller fragments have also been identified that likely served as building blocks during the evolution of the domains themselves. Duplication and fusion as well as recombination of protein fragments are believed to be the major mechanisms in the evolution of folded proteins. Both have been tested in protein engineering experiments and appear to provide a good template for protein design.
Based on these observations we set out to develop a general protein design approach using the principle of recombination. Using sensitive sequence comparison methods, we identified more than a thousand such evolutionarily successful protein fragments and generated a database called Fuzzle (short for Fold Puzzle) that is easily accessible as a webserver. We further classified the fragments based on their associated functions. These fragments can now be used as building blocks in protein design. To test this, we chose a couple of fragments from different folds and tested them in recombination experiments. We were able to build stable hybrid proteins and even achieved the transfer of functional sites with particular fragments. Based on these insights we implemented a computational tool called Protlego that automates the generation of hybrid proteins and their analysis.
Overall, the Protein Lego project enabled important insights into two related research fields, namely the evolution and the design of proteins. The databases and tools we provide enable protein scientists to explore evolutionary relationships and protein engineers to choose building blocks for the construction of new proteins. In fact, we provide a unique engineering methodology that offers a new route to rationally design proteins by assembling existing protein pieces in a Lego-like manner. We have significantly contributed to the general understanding of sequence-structure-function relations in proteins and to our ability to design complex, custom-made proteins, which has impact in both synthetic biology and bioengineering.
In the ERC Protein Lego project we first performed sensitive sequence-based homology searches on a large scale. Using similarity network analysis, we identified more than 1000 protein fragments of various lengths among different protein folds that represent versatile building blocks for protein design. With the results we set up a database that was made accessible through a webserver (fuzzle.uni-bayreuth.de), allowing users to individually filter the dataset and create customized networks for folds of interest. It is a resource for structural and evolutionary biologists and provides raw material for the design of custom-made proteins. In a second step we extended the database to include ligand information. This addition is a significant asset, since protein fragments that bind specific ligands can now be identified and analyzed. Often the mode of binding a ligand is conserved across proteins, which supports a common evolutionary origin. The same can now be explored for subdomain-sized fragments within this database. The ligand binding information can also be used in protein engineering to graft binding pockets into other protein scaffolds or to transfer functional sites via recombination of a specific fragment.
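The idea behind the similarity network analysis described above can be illustrated with a minimal sketch. It is not the Fuzzle pipeline: the hit list, the domain names, and the probability cutoff are all made up for illustration, and the grouping uses a plain connected-components search as a stand-in for whatever network analysis the project actually applied.

```python
from collections import defaultdict, deque

# Hypothetical pairwise hits: (domain_a, domain_b, match probability in percent).
# These values are invented for illustration and are not taken from Fuzzle.
HITS = [
    ("domA", "domB", 95.0),
    ("domB", "domC", 82.5),
    ("domD", "domE", 91.0),
    ("domA", "domE", 12.0),
]


def build_network(hits, min_probability=70.0):
    """Build an undirected graph that keeps only hits at or above the cutoff."""
    graph = defaultdict(set)
    for dom_a, dom_b, probability in hits:
        if probability >= min_probability and dom_a != dom_b:
            graph[dom_a].add(dom_b)
            graph[dom_b].add(dom_a)
    return graph


def connected_components(graph):
    """Group domains that are linked, directly or indirectly, by retained hits."""
    seen = set()
    components = []
    for start in sorted(graph):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbour in sorted(graph[node]):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(sorted(component))
    return components


if __name__ == "__main__":
    for group in connected_components(build_network(HITS)):
        print(group)
```

Lowering the assumed cutoff merges groups, which mirrors how filtering the dataset changes the resulting network for folds of interest.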
The ability to recombine subdomain-sized fragments is what we explored next. In particular we focused on a set of different protein folds and recombined fragments that originated from these. We successfully generated a number of stable hybrid proteins that were analysed in detail. In these designs we explored the option of transferring functional sites associated with a particular fragment and could show that this is a valid method for the introduction of binding properties. Additionally, we gained insights into evolutionary relationships of the investigated protein folds. Based on these insights we developed a tool to automate the design approach and to computationally test different options for recombination. This new tool called Protlego is publicly available (https://hoecker-lab.github.io/protlego) and allows a user to fetch fragments from the Fuzzle database, choose a particular pair from which to generate possible recombinants, and energetically score these. The program includes tools for subsequent analysis that we also made generally available in a stand-alone webserver (http://proteintools.uni-bayreuth.de).
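As a rough illustration of what generating possible recombinants from a pair of parents means at the sequence level, the sketch below enumerates chimeras at given crossover points. It does not use the Protlego API: the function name, the toy sequences, and the crossover positions are hypothetical, the crossover points are assumed to be equivalent positions inside a fragment shared by both parents, and no energetic scoring is attempted.

```python
def enumerate_chimeras(seq_a, seq_b, crossover_pairs):
    """Return both chimera orientations for every crossover point.

    crossover_pairs holds (cut_a, cut_b) tuples of 0-based cut indices that are
    assumed to mark equivalent positions in the fragment shared by both parents.
    """
    chimeras = []
    for cut_a, cut_b in crossover_pairs:
        if not (0 < cut_a < len(seq_a) and 0 < cut_b < len(seq_b)):
            raise ValueError(f"crossover ({cut_a}, {cut_b}) lies outside a parent")
        # N-terminal part from parent A, C-terminal part from parent B.
        chimeras.append(("A-B", cut_a, cut_b, seq_a[:cut_a] + seq_b[cut_b:]))
        # N-terminal part from parent B, C-terminal part from parent A.
        chimeras.append(("B-A", cut_a, cut_b, seq_b[:cut_b] + seq_a[cut_a:]))
    return chimeras


if __name__ == "__main__":
    # Toy parents made of a single residue type so the origin of each part is visible.
    parent_a = "AAAAAA"
    parent_b = "GGGGGGGG"
    for orientation, cut_a, cut_b, sequence in enumerate_chimeras(
        parent_a, parent_b, [(2, 3), (4, 5)]
    ):
        print(orientation, cut_a, cut_b, sequence)
```

In a real design workflow each candidate would then be modelled and scored structurally; here the list of sequences is only the starting point for that step.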
We presented the results and tools at many international conferences, workshops and seminars. The knowledge is further disseminated through journal articles.
The Fuzzle database, a global survey of inter-fold similarities in which data can be browsed at different levels of complexity, has established a new state of the art. It gives an all-against-all overview of folds, provides sequence alignments and structural superpositions of individual pairs as well as information on associated functionalities, and serves as a resource for choosing protein fragments as building blocks for hybrid designs. This bioinformatic resource, combined with the experimental exploration of hybrid protein design and its first automation, opens a new area for design by recombination.