CORDIS - EU research results

Repeat protein Function Refinement, Annotation and Classification of Topologies

Periodic Reporting for period 1 - REFRACT (Repeat protein Function Refinement, Annotation and Classification of Topologies)

Reporting period: 2019-01-01 to 2022-06-30

REFRACT is an international consortium that aims to expand our knowledge of the mechanisms underlying tandem repeat protein function and evolution, and to establish a common classification and best practices. Starting from available state-of-the-art computational tools and databases, it aims to take tandem repeat protein characterization to a new level by leveraging the complementary expertise of institutions in Europe and Latin America.
Tandem repeat regions are remarkably diverse. They range from single amino acid homorepeats, which are detectable by their lack of sequence complexity, to cryptic repeats with units of up to 40 residues, and even to entire repeated domains. Typical repeat protein sequences fold into a modular structure composed of repetitions of the same structural unit. Such sequences are ubiquitous in genomes and overrepresented in complex organisms; for example, they are present in almost a third of human proteins. Tandem repeat proteins have found an ideal functional niche in biological pathways that require fast evolution, such as host-pathogen interactions. They also played an essential role in the evolution of eukaryotes by compensating for their lower mutation rates. In addition, numerous studies over the last decade have linked repeat proteins to human diseases and emerging infectious threats, underscoring the importance of their function.
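The sketch below illustrates what "detectable by their lack of sequence complexity" means in practice. It is a minimal, hypothetical Python example that makes two assumptions:

* Homorepeats are treated as runs of a single residue of a chosen minimum length.
* "Low complexity" is approximated by the Shannon entropy of residue composition in a sliding window.

The function names (`shannon_entropy`, `find_homorepeats`, `low_complexity_windows`) and the thresholds `min_length`, `window` and `max_entropy` are illustrative choices. They are not taken from any REFRACT tool.

```python
import math
from collections import Counter


def shannon_entropy(segment: str) -> float:
    """Shannon entropy (in bits) of the residue composition of a segment."""
    counts = Counter(segment)
    total = len(segment)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def find_homorepeats(sequence: str, min_length: int = 5) -> list[tuple[int, int, str]]:
    """Return (start, end, residue) for runs of a single amino acid that are
    at least min_length long. Coordinates are 0-based and end is exclusive."""
    runs = []
    start = 0
    for i in range(1, len(sequence) + 1):
        # A run ends at the end of the sequence or where the residue changes.
        if i == len(sequence) or sequence[i] != sequence[start]:
            if i - start >= min_length:
                runs.append((start, i, sequence[start]))
            start = i
    return runs


def low_complexity_windows(sequence: str, window: int = 12,
                           max_entropy: float = 2.2) -> list[int]:
    """Return start positions of windows whose entropy is <= max_entropy
    (hypothetical thresholds, chosen only for illustration)."""
    return [
        i
        for i in range(len(sequence) - window + 1)
        if shannon_entropy(sequence[i:i + window]) <= max_entropy
    ]
```

For example, in `"MSTA" + "Q" * 10 + "PLVKDE"` the polyglutamine stretch is reported as the run `(4, 14, "Q")`. Its windows also have the lowest entropy. Dedicated low-complexity filters such as SEG use more refined complexity measures and segment-merging rules. Cryptic repeats with longer, divergent units cannot be found this way at all.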
REFRACT's research objective is to address two major challenges in the tandem repeat protein field:

* benchmarking and improving existing repeat region detection methods;
* improving our understanding of the functional mechanisms and evolution of repeat proteins.

This aim will be achieved through the following steps: (1) benchmarking existing methods and defining their use cases, (2) coordinating efforts to analyze the roles of repeat proteins in biological pathways and organism evolution, (3) providing a detailed description of their mechanisms of function, and (4) building and characterizing a commonly accepted classification of repeat proteins based on sequence and structure.
During the first reporting period, REFRACT laid the foundations to detect, characterize and functionally describe tandem repeat proteins (TRPs). Starting from state-of-the-art tools, often developed by the consortium partners, the project implemented new TRP detection algorithms. It also started generating and annotating datasets of two kinds: repeat proteins, detected from protein sequences and/or protein structures, and non-repeat proteins. These datasets can be used to benchmark computational methods and to identify differences and overlaps between the groups. Moreover, new and existing tools for detecting TRP regions in both protein sequences and structures were benchmarked and assessed in this first phase of the project.
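As one illustration of how a detection method can be benchmarked against an annotated dataset, the sketch below compares predicted repeat regions with reference regions residue by residue. It is a minimal, hypothetical Python example and is not REFRACT's actual benchmarking protocol. It assumes the following:

* Regions are half-open `(start, end)` residue intervals on a single protein.
* Residue-level precision, recall and F1 are acceptable summary scores.

The function names `regions_to_residues` and `residue_level_scores` are illustrative.

```python
Region = tuple[int, int]  # half-open (start, end) interval, 0-based


def regions_to_residues(regions: list[Region]) -> set[int]:
    """Expand (start, end) regions into the set of residue indices they cover.
    Overlapping regions are merged implicitly by the set."""
    residues: set[int] = set()
    for start, end in regions:
        residues.update(range(start, end))
    return residues


def residue_level_scores(predicted: list[Region],
                         reference: list[Region]) -> dict[str, float]:
    """Per-residue precision, recall and F1 of predicted vs reference regions."""
    pred = regions_to_residues(predicted)
    ref = regions_to_residues(reference)
    tp = len(pred & ref)  # residues labelled as repeat by both
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}
```

For instance, a predicted region `(20, 60)` against a reference region `(10, 50)` shares 30 of 40 residues, giving precision, recall and F1 of 0.75. Residue-level scoring ignores whether individual unit boundaries agree. A benchmark that also evaluates repeat units would need an additional unit-level criterion.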

The second phase of the project focused on characterizing and describing the repeat datasets in terms of their overlap with other phenomena, such as protein disorder, transmembrane regions and low complexity regions. A special effort was made to develop strategies for modelling TRP regions from sequence by homology, as well as software for the automatic annotation of TRP regions and units. These approaches have provided more comprehensive insight into tandem repeat characteristics and their functional relationship with protein disorder, low complexity regions and protein folding.

The last phase of the project will focus on understanding TRP evolution and improving the current classification schema. By using evolutionary information and structural data from highly accurate protein structure models, new TRP folds could be identified and the current classification extended accordingly. Furthermore, the last phase will consolidate all the knowledge generated in previous work into core data resources.
REFRACT intends to raise awareness of TRPs in the life sciences at large, with a specific focus on young researchers, who will be trained to use the TRP resources. The project successfully established its training impact in two ways: through its first training bootcamp in Lima (Peru), and by openly sharing its training materials. These materials have been accessed even by students who could not participate in the bootcamp. In addition, we are collecting materials to collaborate with training platforms within Elixir (https://elixir-europe.org/), CABANA (https://www.cabana.online) and The Carpentries (https://carpentries.org/) on developing TRP-related programming courses. This plan is in line with our objectives of improving the impact of European science in Latin America and vice versa, and of promoting scientific excellence, open science and best practices in the field.
Methods for tandem repeat protein detection
Examples of tandem repeat protein functions
The challenge of tandem repeat protein detection