Periodic Reporting for period 4 - ImmRisk (Defining how environmental factors influence downstream effects of immune-mediated disease risk-SNPs)
Reporting period: 2020-03-01 to 2021-02-28
Some people are genetically at risk of developing a specific disease. However, not everybody who is genetically at risk will develop the disease. One reason is that, in addition to genetic factors, environmental factors can influence whether a disease develops. The aim of this project is to determine which genetic and environmental factors are involved in the development of complex diseases, and to understand their downstream molecular mechanisms. Understanding how and why these diseases develop will allow the identification of high-risk individuals for whom preventive measures can be taken to reduce their risk of developing specific diseases (e.g. preventing exposure to the identified environmental factors). Moreover, this knowledge may aid in the identification of new drug targets for these diseases.
The overall objectives of this project are:
1) Identify the downstream consequences of genetic variants that are associated with immune-mediated diseases.
2) Identify which environmental stimuli alter the downstream molecular effects of genetic variants that are associated with immune-mediated disease, through the generation of genotype and single-cell RNA-seq data on blood cells from ~120 individuals that have been stimulated with ~3 different pathogens.
3) Identify other environmental risk factors that influence downstream molecular effects of these genetic variants that are associated with immune-mediated disease, by re-analyzing genotype and RNA-seq data from >20,000 samples, generated in the presence and absence of many different (disease) stimuli, to identify those conditions (e.g. bacterial infections, detectable using the RNA-seq data itself).
Objective 1:
We have set up a consortium, the eQTLGen Consortium (http://eqtlgen.org), to perform the largest eQTL meta-analysis to date, encompassing 31,684 whole blood samples from 37 individual RNA expression datasets. This allowed us to identify cis-eQTL effects for 88% and trans-eQTL effects for ~29% of blood-expressed genes. In addition, we calculated polygenic risk scores for 1,267 complex traits and correlated those with gene expression levels (ePRS analysis). We observed a number of significant associations, e.g. the polygenic risk score for HDL cholesterol levels was associated with the expression of genes known to play a role in lipid metabolism (e.g. ABCA1, ABCG1) and familial hypercholesterolemia (e.g. LDLR).
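To illustrate the idea behind the ePRS analysis, the following minimal sketch computes a polygenic risk score per individual as a weighted sum of risk-allele dosages and correlates it with the expression of one gene. All data, weights and function names are hypothetical toy values chosen for illustration; the consortium's actual analysis involved many more variants, samples and statistical corrections.

```python
from math import sqrt

def polygenic_risk_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0, 1 or 2) for one individual."""
    return sum(d * w for d, w in zip(dosages, weights))

def pearson(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical toy data: 5 individuals genotyped at 3 trait-associated SNPs.
genotypes = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [0, 0, 0], [2, 2, 2]]
weights = [0.2, 0.5, 0.1]                  # assumed per-SNP effect sizes
expression = [5.1, 5.0, 4.6, 4.0, 6.2]     # assumed expression of one gene

scores = [polygenic_risk_score(g, weights) for g in genotypes]
r = pearson(scores, expression)
print(f"Correlation between PRS and expression: {r:.2f}")
```

In a real analysis this correlation would be computed for every trait and gene pair and corrected for multiple testing.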
Objective 2:
We generated single-cell RNA-seq (scRNA-seq) data from peripheral blood mononuclear cells from 120 donors. We first studied how genetic variation affects gene expression in single cells that have not been stimulated with pathogens. We subsequently studied how genetic variation affects gene expression in single cells that have been stimulated with Candida albicans, Pseudomonas aeruginosa or Mycobacterium tuberculosis for 3 or 24 h.
Objective 3:
We developed a pipeline to automatically download public RNA-seq FASTQ files, align them to a reference genome, and call genotypes. The pipeline has been tested and validated on 4,002 samples from BBMRI-BIOS, a Dutch biobank. We subsequently applied this pipeline to a large collection of publicly available RNA-seq samples and used it to identify eQTLs in many cell types, tissues and conditions.
- ePRS analysis: polygenic risk scores were calculated for 1,267 complex traits and these were correlated with gene expression levels (eQTLGen Consortium).
- Performed one of the first and largest single-cell eQTL analyses to date (Nature Genetics, 2018: doi: 10.1038/s41588-018-0089-9).
- Developed a novel methodology for single-cell data to identify co-expression QTLs, i.e. genetic variants that affect the co-expression of two genes (Nature Genetics, 2018: doi: 10.1038/s41588-018-0089-9).
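The concept of a co-expression QTL can be sketched as follows: for each donor, the correlation between two genes is computed across that donor's cells, and the per-donor correlation is then regressed on genotype. This is a simplified illustration with hypothetical toy data and plain Pearson correlation; it is not the published method, which may differ in its correlation measure and significance testing.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

# Hypothetical toy data: one entry per donor, with the genotype dosage at a
# SNP and the expression of two genes measured in four single cells.
donors = [
    {"genotype": 0, "gene_a": [1, 2, 3, 4], "gene_b": [4, 3, 2, 1]},
    {"genotype": 1, "gene_a": [1, 2, 3, 4], "gene_b": [2, 1, 4, 3]},
    {"genotype": 2, "gene_a": [1, 2, 3, 4], "gene_b": [1, 2, 3, 4]},
]

# Step 1: co-expression of the two genes within each donor's cells.
coexpression = [pearson(d["gene_a"], d["gene_b"]) for d in donors]

# Step 2: does the co-expression strength change with genotype?
genotype = [d["genotype"] for d in donors]
effect = slope(genotype, coexpression)
print(f"Change in co-expression per allele: {effect:.2f}")
```

A non-zero slope in step 2 suggests that the variant modulates the relationship between the two genes, rather than the expression level of either gene alone.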
Expected results until the end of the project:
Objective 1:
- Context-dependent eQTL analysis in ~32,000 whole blood samples: identification of eQTLs that are modulated by a specific context, which could be, for example, a specific cell type or the expression of another gene.
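As a rough illustration of what a context-dependent eQTL looks like, the sketch below estimates the genotype effect on expression separately in samples with a low and a high value of a context variable (for example, a cell-type proportion or the expression of another gene) and compares the two slopes. This stratified comparison is a simplified stand-in for fitting a genotype-by-context interaction term; the data, the threshold and all names are hypothetical.

```python
def slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

# Hypothetical toy samples: (genotype dosage, context value, expression).
samples = [
    (0, 0.1, 1.0), (1, 0.2, 1.1), (2, 0.1, 0.9),  # low context
    (0, 0.9, 1.0), (1, 0.8, 2.0), (2, 0.9, 3.0),  # high context
]
threshold = 0.5  # assumed cut-off separating low from high context

low = [s for s in samples if s[1] < threshold]
high = [s for s in samples if s[1] >= threshold]

# eQTL effect (expression change per allele) within each stratum.
slope_low = slope([g for g, _, _ in low], [e for _, _, e in low])
slope_high = slope([g for g, _, _ in high], [e for _, _, e in high])

# A large difference indicates that the eQTL depends on the context.
slope_difference = slope_high - slope_low
print(f"low: {slope_low:.2f}, high: {slope_high:.2f}, "
      f"difference: {slope_difference:.2f}")
```

In this toy example the genotype has almost no effect on expression when the context value is low, but a strong effect when it is high.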
Objective 2:
- Identification of cell type-specific and environment-dependent eQTLs and co-expression QTLs.
- Generation of personalized, context-dependent gene regulatory networks.
- Greater understanding of how genetics and environment interact in the context of health and disease.
Objective 3:
- Identification of environmental factors that modulate specific disease risk.