Proteins are ubiquitous, highly diverse, and participate in virtually every cellular process. This functional diversity arises from the defined three-dimensional structure into which a protein’s polypeptide chain folds, as determined by its amino acid sequence. The folding process is finely tuned and easily disturbed, as is apparent from the large number of diseases caused by protein misfolding. Given this sensitivity, it is remarkable that such a variety of proteins evolved. It turns out that Nature achieved this with a trick, namely by reusing fragments that fold successfully and recombining them to create diversity. In large proteins we can clearly identify independently folding units called protein domains. Similarly folded domains are found in different combinations in multi-domain proteins. Thus, the domain evidently constitutes an evolutionary unit. Below the size of domains, smaller fragments have also been identified that likely served as building blocks during the evolution of the domains themselves. Duplication and fusion, as well as recombination, of protein fragments are believed to be the major mechanisms in the evolution of folded proteins. Both have been tested in protein engineering experiments and appear to provide a good template for protein design.
Based on these observations, we set out to develop a general protein design approach using the principle of recombination. Using sensitive sequence comparison methods, we identified more than a thousand such evolutionarily successful protein fragments and compiled them in a database called Fuzzle (short for Fold Puzzle) that is easily accessible as a web server. We further classified the fragments based on their associated functions. These fragments can now be used as building blocks in protein design. To test this, we chose several fragments from different folds and combined them in recombination experiments. We were able to build stable hybrid proteins and, with particular fragments, even achieved the transfer of functional sites. Based on these insights, we implemented a computational tool called Protlego that automates the generation of hybrid proteins and their analysis.
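The recombination principle described above can be illustrated with a toy sketch. It is not Protlego's implementation or API; the function name `recombine`, the example sequences, and the aligned position pairs are all hypothetical and chosen only for illustration. The sketch assumes that two parent proteins share a fragment and that pairs of equivalent positions within that fragment are already known, and it builds one hybrid sequence per pair by joining the N-terminal part of the first parent to the C-terminal part of the second.

```python
from typing import List, Tuple


def recombine(parent_a: str, parent_b: str,
              aligned_pairs: List[Tuple[int, int]]) -> List[str]:
    """Build one hybrid sequence per aligned position pair (i, j).

    Each hybrid takes residues [0, i) from parent_a and residues
    [j, end) from parent_b. Indices are 0-based. The pairs are assumed
    to come from an alignment of a fragment shared by both parents
    (hypothetical input; no alignment is computed here).
    """
    hybrids = []
    for i, j in aligned_pairs:
        if not (0 < i <= len(parent_a)) or not (0 <= j < len(parent_b)):
            raise ValueError(f"crossover ({i}, {j}) is out of range")
        hybrids.append(parent_a[:i] + parent_b[j:])
    return hybrids


# Made-up example sequences and crossover positions.
parent_a = "MKTAYIAKQR"
parent_b = "GSHMLEDPVK"
pairs = [(3, 4), (5, 6)]

chimeras = recombine(parent_a, parent_b, pairs)
for (i, j), seq in zip(pairs, chimeras):
    print(f"crossover A:{i} / B:{j} -> {seq}")
```

In practice, candidate crossover points would be chosen from a structural superposition of the shared fragment rather than listed by hand, and each resulting hybrid would still need to be evaluated for stability.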
Overall, the Protein Lego project enabled important insights into two related research fields, namely the evolution and the design of proteins. The databases and tools we provide enable protein scientists to explore evolutionary relationships and protein engineers to choose building blocks for the construction of new proteins. In fact, we provide a unique engineering methodology that offers a new route to rationally design proteins by assembling existing protein pieces in a Lego-like manner. We have contributed significantly to the general understanding of sequence-structure-function relationships in proteins and to our ability to design complex, custom-made proteins, which has an impact on both synthetic biology and bioengineering.