Periodic Reporting for period 1 - GENOMEPEP (UNCOVERING PATHOGENIC MICROPEPTIDES FROM THE HUMAN GENOME)
Reporting period: 2020-09-01 to 2023-08-31
The GENOMEPEP project focused on developing analysis pipelines for identifying potentially pathogenic micropeptides. The project leveraged existing data from the Estonian Biobank (EstBB), which includes genotypes of 212,000 volunteers obtained with DNA microarrays, together with imputed data for common and rare variants. With access to comprehensive digital health records dating back to 2004, made possible by the digitalized Estonian healthcare system, the project used diagnostic codes and phenotypes to create detailed study cohorts.
The project's first phase involved compiling these diverse datasets to establish cohorts focused on cardiovascular and related diseases, utilizing additional data such as blood metabolite information and prescription medicine data for cohort stratification. The second phase concentrates on identifying pathogenic variants within the micropeptidome that correlate with cardiovascular phenotypes and other traits through GWAS and subsequent analyses, with contingencies for non-significant findings.
The identification of novel pathogenic genes and the development of guidelines for investigating the micropeptidome will assist the advancement of research, diagnostic medicine, and pharmacology in both the public and private sectors.
The purpose of the first phase of this project was to generate a master dataset containing phenotypic data from the biobank participants, together with their cardiovascular diagnostic information, blood metabolite measurements, drug prescriptions, medical procedures, and surgeries. In total, we created over 600 separate cardiovascular disease (CVD) phenotypes using EstBB data. Because the pipeline for such a large number of phenotypes was established within the framework of GENOMEPEP, we also applied this approach to non-CVD phenotypes, yielding over 10,000 different human traits from the EstBB dataset. To identify peptides potentially associated with these diseases, we developed a computational pipeline to carry out genome-wide association studies (GWASs) and post-GWAS analyses on these phenotypes. All of these phenotypes are currently being analyzed, and the analyses are expected to finish in the first half of 2024. Although these results were planned to be ready by the midpoint of the GENOMEPEP project, the COVID-19 pandemic considerably delayed this crucial phase, as the project depended fully on datasets and labor from external partners. Nevertheless, once these analyses are finished, they will yield the largest number of GWASs from a population-specific biobank in the world. The results of this enormous endeavor, together with the analyses of potential micropeptide-disease associations, are currently being prepared for publication and will be submitted for peer review in the second half of 2024.
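As a rough illustration of how binary disease phenotypes can be derived from diagnostic codes, the sketch below maps ICD-10 code prefixes to case/control labels per participant. It is a minimal example under stated assumptions, not the GENOMEPEP pipeline: the phenotype names, the code prefixes, the function `build_phenotypes`, and the sample records are all hypothetical.

```python
from collections import defaultdict

# Hypothetical phenotype definitions: phenotype name -> ICD-10 code prefixes.
# These are illustrative only and are not the definitions used in the project.
PHENOTYPE_PREFIXES = {
    "myocardial_infarction": ("I21", "I22"),
    "atrial_fibrillation": ("I48",),
}


def build_phenotypes(diagnoses, participants, definitions=PHENOTYPE_PREFIXES):
    """Return {phenotype: {participant_id: 1 for case, 0 for control}}.

    diagnoses    -- iterable of (participant_id, icd10_code) pairs
    participants -- iterable of all participant ids in the cohort
    """
    # Collect the set of diagnosis codes recorded for each participant.
    codes_by_person = defaultdict(set)
    for pid, code in diagnoses:
        codes_by_person[pid].add(code.upper())

    table = {}
    for name, prefixes in definitions.items():
        # A participant is a case if any recorded code starts with a listed prefix.
        table[name] = {
            pid: int(any(c.startswith(prefixes) for c in codes_by_person[pid]))
            for pid in participants
        }
    return table


# Made-up example records for three hypothetical participants.
records = [("P1", "I21.0"), ("P2", "I48"), ("P2", "E11.9"), ("P3", "J45")]
phenos = build_phenotypes(records, ["P1", "P2", "P3"])
```

Each resulting case/control column could then serve as the outcome of a separate GWAS. A real pipeline would additionally need to handle exclusion criteria, dates of diagnosis, and covariates, none of which are shown here.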
In the process of developing the pipeline and testing the validity of the phenotypes, we discovered various interesting gene-disease associations. For example, we identified a rare genetic variant in the POMC gene among 1,739 biobank participants, which results in a lack of certain small peptide hormones in the brain. The lack of these critical peptides in the pituitary gland in turn leads to a decreased feeling of satiety and results, on average, in 3-4 kg higher body weight among the variant carriers. The results of this research have been presented at various scientific conferences around the globe and are currently being prepared for a peer-reviewed publication.
Moreover, some of the resulting GWASs have already been shared in several international collaborative studies. In particular, these analyses were used to validate a novel loss-of-function variant in the lipophilin B peptide, which in turn increases the risk of acquiring Lyme disease by 30% in the general population. Importantly, this risk variant is known to be present among ~55% of Europeans, highlighting the value of the research approach used in GENOMEPEP. This discovery is the first known genetic risk variant for Lyme disease and will provide invaluable information for research focusing on the prevention and treatment of this common infectious disease. The results of our work have been accepted by an open-access peer-reviewed journal and will be published in the first quarter of 2024.
In addition to scientific achievements, GENOMEPEP supported the personal development of the main researcher. As his previous background was in virology, his expertise became invaluable in public outreach during the COVID-19 pandemic. During the GENOMEPEP project he made over 100 public outreach appearances, including on national TV and radio shows, in podcasts, at public events, in social media campaigns and public health advertisements, and in all the main daily and weekly newspapers, either in interviews or through opinion articles. These activities have been of critical importance for his future career, as they provided the experience needed for engaging public outreach and built contacts in the communications field within the context of a small country.
The systematic characterization of micropeptides can be considered a relatively unexplored direction in biology. The GENOMEPEP project concentrated on creating a novel dataset that can be used to identify new molecular mechanisms participating in the etiology of complex diseases. As such, the results of this study can be used to discover new biomolecules that serve as biomarkers in clinical diagnostics or that can be developed into potential pharmaceuticals. From the point of view of population genetics, understanding the mutational landscape within the micropeptidome will support both the biological and medical fields in improving risk assessments associated with common diseases, significantly furthering the implementation of personalized medicine.