Periodic Reporting for period 1 - REFRACT (Repeat protein Function Refinement, Annotation and Classification of Topologies)
Reporting period: 2019-01-01 to 2022-06-30
Tandem repeat regions are highly diverse, ranging from single amino acid homorepeats, detectable by their low sequence complexity, to cryptic repeats of up to 40 residues per unit and even entire domains. A typical repeat protein sequence folds into a modular protein structure composed of repetitions of the same structural unit. Overall, such sequences are ubiquitous in genomes and overrepresented in complex organisms, e.g. being present in almost a third of human proteins. Tandem repeat proteins have found their functional niche in biological pathways that require fast evolution, such as host-pathogen interactions, and played an essential role in the evolution of eukaryotes by compensating for their lower mutation rates. In addition, numerous studies over the last decade have linked repeat proteins to human diseases and emerging infection threats, underlining the importance of their function.
The research objective of REFRACT is to address major challenges in the tandem repeat protein field: to benchmark and improve the existing methods for detecting repeat regions, and to improve our understanding of the functional mechanisms and evolution of repeat proteins. This aim will be achieved through the following steps: (1) benchmarking existing methods and defining their use cases, (2) coordinating efforts to analyze the roles of repeat proteins in biological pathways and organism evolution, (3) providing a detailed description of their mechanism of function, (4) building and characterizing a commonly accepted classification of repeat proteins from sequence and structure.
The second phase of the project focused on characterizing and describing the repeat datasets in terms of their overlap with other phenomena such as protein disorder, transmembrane regions and low complexity regions. A special effort was made to develop strategies for modelling tandem repeat protein (TRP) regions from sequence by homology, as well as software for the automatic annotation of TRP regions and units. These approaches have provided more comprehensive insight into the characteristics of tandem repeats and their functional relationship with protein disorder, low complexity regions and protein folding.
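As an illustration of the kind of overlap analysis described above, the following is a minimal sketch, not the project's actual software. It assumes that repeat and disorder annotations are available as lists of 1-based, inclusive (start, end) residue intervals on the same protein sequence; the function names and the example coordinates are hypothetical.

```python
from typing import List, Tuple

# Assumption: a region is a 1-based, inclusive (start, end) residue interval.
Region = Tuple[int, int]


def merge_regions(regions: List[Region]) -> List[Region]:
    """Merge overlapping or directly adjacent regions into disjoint ones."""
    merged: List[Region] = []
    for start, end in sorted(regions):
        if merged and start <= merged[-1][1] + 1:
            # Extend the previous region instead of opening a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_length(regions: List[Region]) -> int:
    """Number of residues covered by at least one region."""
    return sum(end - start + 1 for start, end in merge_regions(regions))


def overlap_length(a: List[Region], b: List[Region]) -> int:
    """Number of residues covered by both annotation sets."""
    a, b = merge_regions(a), merge_regions(b)
    total = 0
    i = j = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo <= hi:
            total += hi - lo + 1
        # Advance whichever region ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def overlap_fraction(repeats: List[Region], other: List[Region]) -> float:
    """Fraction of repeat residues that also carry the other annotation."""
    repeat_length = covered_length(repeats)
    if repeat_length == 0:
        return 0.0
    return overlap_length(repeats, other) / repeat_length


# Hypothetical annotations for a single protein.
repeat_regions = [(10, 50), (40, 80)]
disorder_regions = [(1, 20), (70, 100)]
fraction = overlap_fraction(repeat_regions, disorder_regions)
```

Merging each annotation set first avoids counting a residue twice when two predicted regions overlap. The same function could be applied to transmembrane or low complexity annotations under the same interval convention.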
The last phase of the project will focus on understanding TRP evolution and improving the current classification scheme. By using evolutionary information and structural data from highly accurate protein structural models, new TRP folds could be identified and, consequently, the current classification could be extended. Furthermore, the last phase will consolidate the knowledge generated in the previous work and transfer it into core data resources.