CORDIS - EU research results

Emergence of New Phases in Biopolymer Systems

Periodic Reporting for period 1 - EMPHABIOSYS (Emergence of New Phases in Biopolymer Systems)

Reporting period: 2020-06-16 to 2022-06-15

The aim of the EMPHABIOSYS project is to understand proteins, the molecular machines of life that are key actors in virtually all biochemical processes within the cell. At a biochemical level, proteins are short chain molecules built from a menu of 20 amino acids, each bearing a distinct sidechain, linked together into a linear polypeptide chain. The proteins we observe in nature today have evolved to perform specific functions. They are physiologically active only when their linear chain folds in aqueous solution into a unique three-dimensional structure characteristic of each protein. Knowledge of this so-called native state structure is crucial for understanding a protein's biological function. How a protein can efficiently, reversibly, and reproducibly acquire its unique native state, starting from an extended, random-coil configuration, is the so-called protein folding problem. It is a remarkable example of a self-assembly process that has so far eluded a complete explanation, despite more than 50 years of intensive research. Protein folding remains one of the fundamental open questions across contemporary molecular biology, biochemistry, and biophysics. On its 125th anniversary in 2005, the journal Science classified it among the top 100 most important and challenging problems facing scientists in the next quarter century.

The solution of the protein folding problem is of paramount theoretical and practical importance. It will have an immediate impact on molecular biology, drug design, and nanotechnology. The ability to understand and mimic this biological mechanism would lead to novel, nature-inspired ways of fabricating biomaterials. More importantly, an understanding of the principles behind protein folding (and misfolding) will contribute to the societally important issue of human health, because these principles are responsible for cell function and malfunction. Examples include amyloid formation, implicated in human neurodegenerative diseases such as Alzheimer’s and Parkinson’s, and type 2 diabetes, attributed to the misfolding of the insulin protein.

The protein folding problem is one of formidable complexity: there are 20 different amino acid types involved, each with its own distinct chemical and geometrical properties; there is the role of water as a solvent; and there is a huge number of degrees of freedom. Scientifically, the protein folding problem is a combination of several questions. The first is the prediction of the native state of a protein from its amino acid sequence. The second is how folding occurs and, in particular, how proteins fold so quickly. It is clearly established experimentally that biological folding times range from microseconds to seconds, whereas one might naively have expected them to be astronomically long if the search for the native state were random among the enormous number of possible conformations.
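The gap between observed folding times and a random search can be made concrete with a back-of-the-envelope estimate. The sketch below is illustrative only: the number of states per residue, the chain length, and the sampling rate are assumed round numbers, not values taken from this report.

```python
import math

# All numbers below are illustrative assumptions, not values from the report.
STATES_PER_RESIDUE = 3      # assumed coarse count of backbone states per residue
N_RESIDUES = 100            # assumed length of a small protein
SAMPLES_PER_SECOND = 1e13   # assumed rate at which conformations are tried
SECONDS_PER_YEAR = 3.156e7


def log10_conformations(n_residues, states_per_residue):
    """log10 of the number of chain conformations, states ** residues."""
    return n_residues * math.log10(states_per_residue)


def log10_search_years(n_residues, states_per_residue, samples_per_second):
    """log10 of the years needed to visit every conformation exactly once."""
    log_seconds = (log10_conformations(n_residues, states_per_residue)
                   - math.log10(samples_per_second))
    return log_seconds - math.log10(SECONDS_PER_YEAR)


if __name__ == "__main__":
    exponent = log10_search_years(N_RESIDUES, STATES_PER_RESIDUE,
                                  SAMPLES_PER_SECOND)
    print(f"Exhaustive random search: about 10^{exponent:.0f} years")
```

Working in logarithms avoids overflow for longer chains. Under these assumptions the exhaustive search time exceeds observed folding times by dozens of orders of magnitude, which is the puzzle the paragraph above describes.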

Despite the complexity, the puzzles of the protein problem, and the great diversity in protein native state folds, there are remarkable common properties that all proteins share: they are made of emergent building blocks of topologically one-dimensional alpha-helices and almost planar, effectively two-dimensional beta-sheets connected by loops and assembled into the three-dimensional native state structure. Further, proteins can fold rapidly and efficiently, and act as amazingly effective molecular machines.

The overall objective of the project is to find the hidden simplicity underlying the complexity of the protein problem: an explanation from first principles, with a minimum number of essential ingredients, of why all proteins share these remarkable common properties.
The native-state structures of globular proteins are stable and well packed, indicating that self-interactions are favored over protein-solvent interactions under folding conditions. We have used this as a guiding principle to derive the geometry of the building blocks of protein structures, namely alpha-helices and strands assembled into beta-sheets, with no adjustable parameters, no amino acid sequence information, and no chemistry. We have found an almost perfect fit between the dictates of mathematics and physics and the rules of quantum chemistry. More specifically, we start from a single constructive hypothesis: the protein backbone should be viewed as a tube of non-zero thickness (one that respects the correct symmetry) that seeks to maximize its self-interaction in its native state by adopting space-filling conformations (the physical driving principle). From this hypothesis we theoretically derive the existence of the protein building blocks and their detailed geometries, which are in excellent accord with experimental data on more than 4000 high-resolution protein structures.
Our results demonstrate that the correct symmetry and geometry, rather than full atomistic details, are the key ingredients that drive protein folding. Moreover, because symmetry plays a key role in determining the phases of matter, and because our results are completely independent of any microscopic details, they point strongly towards the existence of a new phase of matter in which protein native state structures reside.
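To give a feel for what treating the backbone as a discrete curve with a local thickness can look like, the sketch below builds an ideal helix of C-alpha positions and measures the circumradius of each consecutive triplet of points. This is an illustration under stated assumptions, not the project's derivation: the helix parameters are approximate textbook values for an alpha-helix, and using the three-point circumradius as a proxy for local tube radius is a choice made here that the report does not specify.

```python
import math

# Approximate textbook alpha-helix parameters for C-alpha atoms (assumed here).
RADIUS = 2.3                 # helix radius in angstroms
RISE = 1.5                   # axial rise per residue in angstroms
TURN = math.radians(100.0)   # rotation per residue (3.6 residues per turn)


def helix_points(n):
    """Return n C-alpha positions on an ideal helix along the z axis."""
    return [(RADIUS * math.cos(i * TURN),
             RADIUS * math.sin(i * TURN),
             i * RISE) for i in range(n)]


def circumradius(a, b, c):
    """Radius of the circle through three points (infinite if collinear)."""
    ab, bc, ca = math.dist(a, b), math.dist(b, c), math.dist(c, a)
    s = 0.5 * (ab + bc + ca)
    area_sq = s * (s - ab) * (s - bc) * (s - ca)   # Heron's formula, squared
    if area_sq <= 0.0:
        return math.inf
    return ab * bc * ca / (4.0 * math.sqrt(area_sq))


def local_radii(points):
    """Circumradius of every consecutive triplet along the chain."""
    return [circumradius(points[i], points[i + 1], points[i + 2])
            for i in range(len(points) - 2)]


if __name__ == "__main__":
    pts = helix_points(12)
    print(f"neighbor distance: {math.dist(pts[0], pts[1]):.2f} angstroms")
    print(f"local radius:      {min(local_radii(pts)):.2f} angstroms")
```

Because every residue on an ideal helix is geometrically equivalent, all triplets return the same radius. A full thickness calculation would also consider non-consecutive triplets so that distant parts of the chain cannot overlap.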

The existence of such a phase naturally opens more general questions: does life exist elsewhere in our cosmos? If yes, does it have to be based on what we know about life on earth? Can one imagine creating nifty nanomachines, without relying on carbon chemistry, in the lab based on lessons learned from our framework? And could one conceive the beginnings of artificial life facilitated by a network of such machines working harmoniously together?

The current state of the art in the field can be summarized in a textbook paradigm: for proteins, sequence determines structure. The novelty of our view is that protein sequences instead select from a menu of folds predetermined by symmetry and geometry. This view represents an enormous simplification for evolution and natural selection. In the theater of life, sequences evolve and so do functionalities, but they do so against the fixed backdrop of immutable protein folds determined by symmetry and geometry.

There has been a breakthrough in the use of machine learning to connect protein sequence and structure. In particular, the artificial intelligence program AlphaFold, developed by Google’s DeepMind, was chosen by the journal Science as the scientific discovery of the year 2021. However, it cannot reveal the mechanism or rules of protein folding. Our work provides a rationale for why machine learning works as well as it does: the menu of distinct folds is limited and comprises assemblies of modular building blocks, whose structure we predict quantitatively with no adjustable parameters. More importantly, our work provides a simple explanation of what underlies the nature of protein and amyloid structures and, when concluded, will have the potential to improve the predictive power of machine learning algorithms.

We expect to fully explore the scope and implications of our theoretical framework in the next year. Our work is directly relevant to the societally important problem of human health. It will also be useful for making nifty machines in the lab, and it could eventually form the basis for the creation of artificial life.
Theoretical geometries of two types of protein beta-sheets are in excellent accord with experiments.
Theoretically derived geometry of protein alpha-helix is in excellent accord with experimental data.
Illustration of two protein structures described by a space-filling tube of non-zero thickness.