The most important achievement for this project has been the establishment of the FinRegistry data resource (https://www.finregistry.fi/).
FinRegistry is a curated, nationwide, register-based data resource for developing statistical and machine learning models, performing high-throughput epidemiological analyses and deriving outcome-specific prediction models.
FinRegistry data are drawn from 19 registries covering public health care visits, health conditions, medications, vaccinations, laboratory test results, demographics, familial relations and socioeconomic variables, with decades of follow-up for most registries.
FinRegistry includes everyone living in Finland on 1 January 2010, as well as their parents, spouses, children and siblings, for a total of approximately 7.2 million individuals.
FinRegistry data are mapped to more than 3,000 clinical endpoints, defined by leveraging multiple registers and clinical expertise as part of the FinnGen project.
The Risteys web portal [https://risteys.finregistry.fi] enables exploration of clinical endpoint definitions, their links to international ontologies and the results of epidemiological analyses to gain insights into disease epidemiology in the Finnish population.
The establishment of this resource and the integration of genetic data provided by FinnGen (https://www.finngen.fi/en) have made it possible to perform several machine learning analyses of direct public health relevance.
First, we considered the effects of 2,890 health, socio-economic and demographic factors in the entire Finnish population aged 30–80 and genome-wide information from 273,765 individuals to identify predictors of COVID-19 vaccination uptake. The strongest predictors of vaccination status were labour income and medication purchase history. Mental health conditions and having unvaccinated first-degree relatives were associated with reduced vaccination. A prediction model combining all predictors achieved good discrimination (area under the receiver operating characteristic curve, 0.801; 95% confidence interval, 0.799–0.803). The 1% of individuals with the highest predicted risk of not vaccinating had an observed vaccination rate of 18.8%, compared with 90.3% in the study population. We identified eight genetic loci associated with vaccination uptake and derived a polygenic score, which was a weak predictor in an independent subset. Our results suggest that individuals at higher risk of suffering the worst consequences of COVID-19 are also less likely to vaccinate.
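The two headline metrics above can be illustrated with a short sketch. It computes the area under the ROC curve in its Mann-Whitney form (the probability that a randomly chosen non-vaccinated individual receives a higher predicted risk than a randomly chosen vaccinated one) and the observed vaccination rate within the top fraction of predicted risk. This is not the project's analysis code: the function names and toy data are hypothetical, and the pairwise AUC loop is quadratic, so it only suits small illustrative inputs.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(score of a random positive > score of a random negative).

    labels: 1 = outcome of interest (here: NOT vaccinated), 0 = otherwise.
    Tied scores count as half a "win", as in the Mann-Whitney U statistic.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:  # quadratic pairwise comparison: fine for toy data only
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def observed_rate_in_top_fraction(
    risk: Sequence[float], vaccinated: Sequence[int], fraction: float = 0.01
) -> float:
    """Observed vaccination rate among the `fraction` of individuals with the
    highest predicted risk of NOT vaccinating."""
    k = max(1, round(len(risk) * fraction))  # size of the top-risk group
    top = sorted(range(len(risk)), key=lambda i: risk[i], reverse=True)[:k]
    return sum(vaccinated[i] for i in top) / k


if __name__ == "__main__":
    # Toy data: predicted risk of not vaccinating, and observed status.
    risk = [0.9, 0.4, 0.7, 0.2, 0.1]
    not_vaccinated = [1, 1, 0, 0, 0]
    vaccinated = [1 - y for y in not_vaccinated]
    print(roc_auc(risk, not_vaccinated))                         # 5 of 6 pairs ranked correctly
    print(observed_rate_in_top_fraction(risk, vaccinated, 0.4))  # top 2 people -> 0.5
    print(sum(vaccinated) / len(vaccinated))                     # overall rate -> 0.6
```

In the toy example, the top-risk group has a lower observed vaccination rate than the population as a whole, which is the same kind of contrast as the 18.8% versus 90.3% reported above.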
Second, we studied the impact of genetic variation on overall disease burden. We introduced an approach to estimate the effect of genetic risk factors on disability-adjusted life years (DALYs; ‘lost healthy life years’), using genetic information from 735,748 individuals and considering 80 diseases. Rare variants had the highest effect on DALYs at the individual level. Among common variants, rs3798220 (LPA) had the strongest individual-level effect, with 1.18 DALYs for carrying one versus zero copies. Being in the top 10% versus the bottom 90% of a polygenic score for multisite chronic pain had an effect of 3.63 DALYs. Some common variants had a population-level effect comparable to that of modifiable risk factors such as high sodium intake and low physical activity. Attributable DALYs varied between males and females for some genetic exposures. Overall, we found that genetic risk factors explain a sizable number of healthy life years lost at both the individual and the population level.
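The study's DALY estimation framework is not spelled out here, so the following is only a simplified illustration of how disease-level genetic associations can be translated into DALYs at the individual and population level. It swaps in the textbook Levin population attributable fraction for a binary exposure (for example, being in the top 10% versus the bottom 90% of a polygenic score) and assumes that relative risks are known, that DALYs per case do not depend on exposure, and that diseases contribute additively. The class, the function names and all input numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class DiseaseInput:
    """Hypothetical per-disease inputs for one binary genetic exposure."""
    name: str
    total_dalys: float    # DALYs lost to this disease in the whole population
    relative_risk: float  # disease risk, exposed vs unexposed (taken as given)


def levin_paf(prevalence: float, rr: float) -> float:
    """Levin's population attributable fraction for a binary exposure."""
    excess = prevalence * (rr - 1.0)
    return excess / (1.0 + excess)


def population_attributable_dalys(
    diseases: Sequence[DiseaseInput], prevalence: float
) -> float:
    """DALYs in the population attributable to the exposure, summed over diseases."""
    return sum(levin_paf(prevalence, d.relative_risk) * d.total_dalys for d in diseases)


def individual_daly_effect(
    diseases: Sequence[DiseaseInput], prevalence: float, population_size: int
) -> float:
    """Expected DALYs of an exposed minus an unexposed individual.

    Assumes DALYs per case are the same in both groups, so expected DALYs per
    person scale with risk: population mean = unexposed * (1 + p * (RR - 1)).
    """
    effect = 0.0
    for d in diseases:
        mean_per_person = d.total_dalys / population_size
        unexposed = mean_per_person / (1.0 + prevalence * (d.relative_risk - 1.0))
        effect += (d.relative_risk - 1.0) * unexposed
    return effect


if __name__ == "__main__":
    # Toy inputs: exposure = top 10% of a polygenic score vs bottom 90%.
    diseases = [
        DiseaseInput("disease A", total_dalys=1000.0, relative_risk=2.0),
        DiseaseInput("disease B", total_dalys=500.0, relative_risk=1.5),
    ]
    print(population_attributable_dalys(diseases, prevalence=0.1))
    print(individual_daly_effect(diseases, prevalence=0.1, population_size=10_000))
```

Under these assumptions the two quantities are linked: population-attributable DALYs equal population size × exposure prevalence × individual-level effect. This is why a common exposure with a modest individual-level effect can still account for many DALYs at the population level, while a rare variant with a large individual-level effect may not.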
These are the two main scientific outputs directly related to this project. We have now started to address the third Aim of the grant, which focuses on establishing a clinical trial named GENEROOS. In this study, we will leverage the unique opportunity offered by Finnish biobank research to re-contact 1,200 individuals with an extreme genetic predisposition to high or low BMI, as measured by a polygenic score for BMI, and evaluate how the effect of a randomized diet intervention differs between the two extreme groups.
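The planned design, selecting individuals from the two tails of a BMI polygenic score and comparing the effect of a randomized diet intervention between them, can be sketched as follows. This illustration rests on assumptions not stated in the report: an equal number of participants per tail, simple 1:1 randomization within each tail, and a continuous outcome (for example, change in BMI) compared as a difference in arm means. The function names and data are hypothetical and do not describe the GENEROOS protocol or analysis plan.

```python
import random
from statistics import mean
from typing import Dict, List, Tuple


def select_extremes(pgs: Dict[str, float], n_per_tail: int) -> Tuple[List[str], List[str]]:
    """Return (low, high): IDs with the n_per_tail lowest and highest scores."""
    if n_per_tail < 1 or 2 * n_per_tail > len(pgs):
        raise ValueError("tails must be non-empty and must not overlap")
    ranked = sorted(pgs, key=pgs.get)  # IDs ordered by ascending score
    return ranked[:n_per_tail], ranked[-n_per_tail:]


def randomize_1to1(ids: List[str], seed: int) -> Dict[str, str]:
    """Assign half of the IDs (rounded down) to 'diet' and the rest to 'control'."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return {pid: ("diet" if k < half else "control") for k, pid in enumerate(shuffled)}


def arm_difference(outcome: Dict[str, float], arm: Dict[str, str], ids: List[str]) -> float:
    """Mean outcome in the diet arm minus mean outcome in the control arm."""
    diet = [outcome[i] for i in ids if arm[i] == "diet"]
    control = [outcome[i] for i in ids if arm[i] == "control"]
    return mean(diet) - mean(control)


def effect_modification(
    outcome: Dict[str, float], arm: Dict[str, str], low: List[str], high: List[str]
) -> float:
    """How much the intervention effect differs between the high and low tails."""
    return arm_difference(outcome, arm, high) - arm_difference(outcome, arm, low)


if __name__ == "__main__":
    # Toy scores for 10 hypothetical participants; 3 selected per tail.
    pgs = {f"id{i}": float(i) for i in range(10)}
    low, high = select_extremes(pgs, n_per_tail=3)
    arm = {**randomize_1to1(low, seed=1), **randomize_1to1(high, seed=2)}
    print(low, high, arm)
```

The same contrast is often estimated as the treatment-by-group interaction term in a regression model, which additionally allows adjustment for baseline covariates.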