Periodic Reporting for period 1 - EMPHABIOSYS (Emergence of New Phases in Biopolymer Systems)
Reporting period: 2020-06-16 to 2022-06-15
The solution of the protein folding problem is of paramount theoretical and practical importance. It will have an immediate impact on molecular biology, drug design and nanotechnology. Understanding and mimicking this biological mechanism, taking our inspiration from nature, would lead to novel ways of fabricating biomaterials. More importantly, it will contribute to the societally important issue of human health through an understanding of the principles behind protein folding (and misfolding), which are responsible for cell function and malfunction. Examples include amyloid formation implicated in neurodegenerative diseases such as Alzheimer’s and Parkinson’s, and in type 2 diabetes, where the pancreatic hormone amylin (islet amyloid polypeptide) misfolds and aggregates.
The protein folding problem is one of formidable complexity: there are 20 different amino acid types, each with its own distinct chemical and geometrical properties; there is the role of water as a solvent; and there is a huge number of degrees of freedom. Scientifically, the protein folding problem combines several questions. The first is the prediction of the native state of a protein from its amino acid sequence. The second is how folding occurs, and why proteins fold so quickly. Experiments clearly establish that biological folding times range from microseconds to seconds, whereas one might naively have expected them to be astronomically long if the native state were found by a random search among the enormous number of possible conformations.
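To make the gap concrete, consider a rough back-of-envelope estimate in the spirit of Levinthal's paradox. The numbers are illustrative assumptions, not results of this project: a chain of 100 residues, three accessible conformations per residue, and a sampling rate of 10^13 conformations per second. An exhaustive random search would then take

\[ t \approx \frac{3^{100}}{10^{13}\,\mathrm{s}^{-1}} \approx \frac{5\times10^{47}}{10^{13}\,\mathrm{s}^{-1}} \approx 5\times10^{34}\,\mathrm{s} \approx 10^{27}\ \text{years}, \]

vastly longer than the age of the universe (roughly 10^10 years), in stark contrast to the observed microsecond-to-second folding times.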
Despite this complexity, the puzzles of the protein problem, and the great diversity of protein native-state folds, all proteins share remarkable common properties: they are made of emergent building blocks, topologically one-dimensional alpha-helices and almost planar, effectively two-dimensional beta-sheets, which are connected by loops and assembled into the three-dimensional native-state structure. Further, proteins fold rapidly and efficiently, and act as remarkably effective molecular machines.
The overall objective of the project is to find the hidden simplicity underlying all the complexity of the protein problem: a framework that would explain from first principles, with a minimum number of essential ingredients, why all proteins share these remarkable common properties.
The existence of such a phase naturally raises more general questions. Does life exist elsewhere in our cosmos? If so, does it have to be based on what we know about life on Earth? Can one imagine creating nifty nanomachines in the lab, without relying on carbon chemistry, based on lessons learned from our framework? And could one conceive of the beginnings of artificial life, enabled by a network of such machines working harmoniously together?
The current state of the art in the field can be summarized in a textbook paradigm: for proteins, sequence determines structure. The novelty of our view is that protein sequences instead select from a menu of folds predetermined by symmetry and geometry. This view represents an enormous simplification for evolution and natural selection. In the theater of life, sequences evolve and so do functionalities, but they do so against the fixed backdrop of immutable protein folds determined by symmetry and geometry.
There has been a breakthrough in using machine learning to connect protein sequence and structure. In particular, AI-based protein structure prediction, including the AlphaFold program developed by Google’s DeepMind, was named the 2021 Breakthrough of the Year by the journal Science. However, such programs cannot reveal the mechanism or rules of protein folding. My work provides a rationale for why machine learning works as well as it does: the menu of distinct folds is limited and consists of assemblies of modular building blocks, whose structure we predict quantitatively with no adjustable parameters. More importantly, our work provides a simple explanation of what underlies the nature of protein and amyloid structures, and once completed it has the potential to improve the predictive power of machine-learning algorithms.
We expect to explore fully the scope and implications of our theoretical framework in the next year. Our work is directly relevant to tackling the societally important problem of human health; it will be useful for making nifty machines in the lab; and it could eventually form the basis for the creation of artificial life.