European Genetic Disease Diagnostics

Final Report Summary - EURO-GENE-SCAN (European genetic disease diagnostics)

Executive summary:

Molecular techniques have become more efficient, increasingly precise and much cheaper, resulting in an unprecedented discovery rate of inherited disease genes. In areas such as primary immunodeficiencies (PID), muscle disorders, growth deficiencies, hearing/vision impairments and metabolic diseases, very large numbers of different genes have been found to carry mutations in diseases with heterogeneous clinical presentation. For example, mutations in more than 170 genes have been found to cause PID. This means that even for well-defined subgroups of PID, mutations in different genes result in identical, or overlapping, phenotypes.

Current mutation analysis is very complex, often with many different European laboratories being involved. Thus, individual laboratories carrying out mutation detection normally only cover a few per cent of all disease genes. Obtaining a correct diagnosis is both difficult and time-consuming. If multiple genes need to be analyzed, the cost rises proportionately.

New sequencing approaches have been used for the analysis of whole genomes. We are adapting these technologies, based on massive, parallel sequencing, to specific disease fields. This will involve the development of an innovative multiplexing technology. The proposed prototype area is PID, where significant collaboration has already been underway in Europe over the last two decades. We estimate that, using high-throughput sequencing, the cost for analyzing all known 179 PID genes in a single run could potentially be in the same range as the current cost for mutation detection in single disease genes.

We are also developing chips to identify single nucleotide polymorphisms (SNPs) for the study of modifier genes. In addition, we are developing reverse-phase protein arrays for proteomics approaches in the diagnostics of PID patients during infancy. During the EURO-GENE-SCAN project we are disseminating the information about the developed technologies to the scientific community.

Project context and objectives:

The main objective of the project was to develop high-throughput technologies in order to diagnose the primary immunodeficiency diseases (PIDs). There are more than 170 different forms of PIDs known to date and the number is constantly increasing. An estimated 60 000 patients suffer from PIDs in Europe. In brief, PIDs are rare, genetic, chronic diseases. Many of the autosomal recessive forms affect fewer than 1 in 1 000 000 individuals, while the X-linked recessive forms affect about 1 in 100 000 or fewer.

The project activities were divided into three main tasks:

1. Adapting the new techniques for DNA sequencing to meet our objectives
2. Development of a genome-wide PID-specific SNP-based chip
3. Reverse phase protein (RPP) microarrays for large-scale screening for PID.

During the project the following goals were achieved:

The selector procedure, developed by Beneficiary five, is a deoxyribonucleic acid (DNA) capture technology based on the hybridization of oligonucleotide constructs, called 'selectors', to defined target nucleic acid sequences. The selectors contain target-complementary sequences, and act as ligation templates to direct circularization of target DNA fragments, which can then be amplified. During the project three different designs of selectors were tested. Each design included the complete set of 156 PID genes from the previous design, plus 23 newly discovered genes, resulting in a total of 179 target genes (EGS179). The design called EGS179-2 was considered to be the preferred design, and selected as the final, optimized selector set.

The EGS179-2 design includes 98.55 % of all targeted bases in the 179 PID genes, using a set of 12 702 selector probes. When using this selector set for target enrichment, followed by sequencing at a total depth of 29.6 million reads per sample, 98.9 % of the 98.55 % PID regions targeted by design was covered at least once. This corresponds to 96.4 % coverage of the complete PID target region. Furthermore, to prove the potential use of the designed selector set for amplification and mutation analysis of the 179 PID genes, we subjected a set of 42 patient samples and one HapMap sample (NA12801) to an enrichment experiment, followed by Illumina sequencing (Partner three). For 20 of the patient samples, the disease-causing mutation was known before experiment onset, while it was unknown for 22 of the cases. Data analyses and mutation validation were done by Beneficiary three and one, respectively. Ultra-high-throughput sequencing of EURO-GENE-SCAN samples was performed on state-of-the-art next-generation sequencing machines. The Illumina GAIIx and the improved HiSeq 2000 (http://www.illumina.com/systems/hiseq_2000.ilmn) were used by Beneficiary 3 (SME). During the course of the project, the data analysis pipeline was continually improved by incorporating the observations and evaluations made in previous analyses. This allowed us to learn from each analysis and to fine-tune the steps and parameters used in the final EURO-GENE-SCAN project-specific data analysis pipeline, which includes various processing steps.

Common variable immunodeficiency (CVID) is a primary antibody deficiency with an estimated prevalence of 1 per 25 000 to 1 per 100 000. Patients suffer from hypogammaglobulinemia, a reduction of two serum immunoglobulin levels (IgG and IgA or IgM). Beneficiary two of this project has collected a cohort of families with at least one individual affected with CVID. In these families CVID patients have a mutation in TACI (tumour necrosis factor receptor 13B, TNFRSF13B). The same mutation has been identified in some of their healthy family members. The idea is to compare the genomes of these patients and healthy individuals to identify possible modifier genes that, in addition to TACI, are mutated in patients but not in healthy family members. This is done using a genome-wide single nucleotide polymorphism (SNP) array, the Affymetrix SNP array 6.0, with the Nsp/Sty restriction enzyme assay. The data analysis is expected to show one or more SNPs that are present in patients only, which may then be associated with modifier gene(s). This would mean that a mutation in the identified modifier gene(s), in addition to a mutation in TACI, is responsible for the phenotype in CVID patients. The focus is on TACI mutations C104R and A181E, which are the most common of all TACI mutations and have been found in patients of different ethnic backgrounds.

We have collected 10 pairs, 8 trios, 3 quadruplets, 2 quintets and 3 sixtets, concordant for the genetic variant, but discordant for the phenotype from 35 families with CVID and a genetic variant in TACI. The combination of genetic linkage and association studies revealed exactly one genetic modifier locus on chromosome 5 with a LOD-score of 3.98 under the hypothesis that the modifier acts autosomal recessive, and exactly one genetic modifier locus on chromosome 3 with a LOD-score of 3.61 under the hypothesis that the modifier acts autosomal dominant. Most interestingly, one gene within the linkage regions is involved in TACI-downstream signalling, and the other gene is a poorly characterised transcription factor which also lies in the linkage region of a large autosomal-recessive CVID family with a LOD score of more than 4, indicating that this gene is of importance to the development of CVID. Once the modifier gene to TACI has been identified by Sanger sequencing and gene expression analysis, functional studies will be conducted to study the interaction/synergy between these two proteins/pathways.

Five additional patients out of a novel cohort of 30 have been diagnosed by us with IL10/IL10R deficiency. Comparison of the response to treatment in seven patients with IL10/IL10R deficiency revealed resistance to anti-inflammatory drugs and monoclonal antibodies, resulting in high morbidity, whereas allogeneic HSCT was successful. Non-transplanted patients continued to suffer from severe inflammatory bowel disease and perianal disease. Therefore, HSCT is probably the only therapeutic approach able to cure the disease and give the affected patients a better quality of life.

We were also able to establish the following diagnostic guidelines for DOCK8 deficiency: Possible: NIH HIES score more than 20 plus a weighted score of clinical features more than 12 based on hypereosinophilia and upper respiratory tract infections (weighing positively) and parenchymal lung abnormalities, retained primary teeth, and minimal trauma fractures (weighing negatively). Probable: Above plus consanguineous parents, severe viral infections and/or allergies. Definitive: Homozygous or compound heterozygous mutation in DOCK8 and/or lack of full-length protein expression.

These results are clearly significant, because an early distinction between HIES due to STAT3 mutations and HIES due to DOCK8 mutation is not always possible, but is highly desirable to be able to make important clinical decisions such as for or against stem cell transplantation.

The main aim of this project was to develop the RPP technology for large scale screening for concentrations of serum proteins. In the earlier stages of this work, we screened for single analytes such as serum levels of IgA and C3. The objective of the second period was to extend this work to encompass other proteins (C2), other sources of starting material for the screening (eluates from Guthrie cards from newborns) and to establish a method whereby more than one analyte could be measured simultaneously in the same sample.

The method was developed in the first phase of this project and was validated and used for measuring the IgA levels in a large (approximately 5 000), population-based cohort of Swedish children and adults.

Theoretically, IgA deficiency would be a hallmark of most antibody deficiency syndromes and testing for IgA levels in eluates from DBSS might thus identify a majority of patients with severe combined immunodeficiency (SCID), hyper-IgM syndrome (HIGM), common variable immunodeficiency (CVID) and IgAD. (In total, these patient categories correspond to more than 90 % of all patients with antibody deficiency in Europe.)

In order to screen for PID in newborns, we thus eluted the plasma proteins from DBSS (Guthrie cards) and spotted and tested these in ELISA and in the RPP microarray. Although the ELISA gave reproducible results, the RPP was not sufficiently sensitive to measure the very low levels of IgA in the eluates, which were easily measurable by ELISA screening. This manuscript was rejected by several journals as it was suggested that this technology represented an 'old-fashioned' approach (Borte S, Janzi M, Pan-Hammarström Q, von Döbeln U, Nordvall L, Winiarski J, Fasth A, Hammarström L. Evaluation of lack of IgA as a diagnostic marker for primary immunodeficiency disorders using DBSS eluates. PLoS One submitted) and that the advent of the TREC/KREC technology would make this type of testing superfluous. We thus carried out extensive work on this angle (Borte S, von Döbeln U, Fasth A, Wang N, Janzi M, Winiarski J, Sack U, Pan-Hammarström Q, Borte M, Hammarström L. Neonatal screening for severe primary immunodeficiency diseases using high-throughput triplex real-time PCR. Blood in press 2012) and showed that, contrary to the belief of the referees, most patients with antibody deficiencies (accounting for the vast majority of PID patients) are NOT detected by TREC/KREC testing. Nor are patients with complement deficiencies detected by the TREC/KREC assay. Our manuscript has now been substantially re-written and re-submitted, strongly arguing that the array methodology actually represents the state of the art in newborn screening.

Interestingly, although the array technique works nicely for C3 (Janzi et al. Microarray based analysis of serum proteins in dried blood samples on filter paper (Guthrie cards). PLoS One 4, e5321, 2009), it does not, owing to the lack of suitable reagents, work for quantification of C2. Thus, using ELISA, deficient samples can be easily recognized whereas the array method does not give reliable results. Although a preliminary manuscript was written up, it was never submitted as it was felt that the information was not sufficiently interesting to be published.

The above tests have all been based on detection of a single analyte. The final planned task proposed within this project was to develop a screening method for multiple (five) proteins in the same assay. This project has been quite labor-intensive and involved a change in platform format as the RPP was found to be less suitable for this purpose. Thus, in collaboration with colleagues at SciLife (a core facility within the Karolinska Institutet), we have, using blinded serum samples from PID patients from beneficiary four, successfully developed a Luminex, bead-based technology, theoretically being able to quantify up to 100 different proteins in the same sample/run. Due to limitations in the accessibility of antisera suitable for this assay format, we have been restricted to simultaneous detection of 46 proteins (known to be mutated/missing in PID patients). The results are currently being written up.

Project results:

The main objective of the project was to develop high-throughput technologies in order to diagnose PIDs. There are 179 different forms of PIDs known to date and the number is constantly increasing. There is an estimated number of 60 000 patients suffering from PIDs in Europe. In brief, PIDs are rare, genetic, chronic diseases. Many of the autosomal recessive forms affect less than 1 in 1 000 000 individuals, while the X-linked recessive forms affect about 1 in 100 000 or less.

The hallmark of PIDs is increased susceptibility to infections. However, in certain PIDs the frequency of tumours is significantly enhanced and, in some forms of PIDs, severe autoimmune phenomena predominate. PIDs often show locus heterogeneity, i.e. they fall into subgroups with a similar phenotype, but with different genes affected. These diseases are often difficult to diagnose and frequently require complex treatment regimens. For all the above reasons PIDs are highly suitable for the development of efficient methods for mutation detection. A further important reason is that early diagnosis is a prerequisite for the treatment of these severe genetic disorders.

The project activities were divided into three main tasks:

1. Adapting the new techniques for DNA sequencing to meet our objectives
2. Development of a genome-wide PID-specific SNP-based chip
3. Reverse phase protein (RPP) microarrays for large-scale screening for PID.

The work was divided into six work packages (WPs).

WP 1: Optimisation of selector probes for extended PCR products

The selector procedure is a DNA capture technology based on the hybridization of oligonucleotide constructs, called 'selectors', to defined target nucleic acid sequences. The selectors contain target-complementary sequences, and act as ligation templates to direct circularization of target DNA fragments, which can then be amplified. Here, a non-PCR based version of the selector method is used, applying RCA-based multiple displacement amplification, generating an amplification product which is easily integrated with shotgun library construction for short-read sequencing platforms.

During the first half of the project, two selector designs were made, the first including 94% of a small set of seven PID genes (EGS8), and the second including 97.5% of all 156 PID genes known at that time (EGS156).

Selector-based enrichment and sequencing of EGS8 resulted in 90 % of all targeted bases being covered by a read depth of at least 20x.

Enrichment and sequencing of EGS156 resulted in coverage of at least 80 % of the targeted bases by at least 20x. This somewhat lower coverage could partially be explained by an error found in the design software, which had led to a mismatch in some of the selector probes, hence resulting in a sub-optimal coverage of the targeted genes. In addition, the inclusion of a large portion of repeats in the design due to no repeat masking and allowing one selector probe arm to be located in a repeat led to a competition for available sequencing capacity, further reducing coverage, and complicating data analysis.

These shortcomings were addressed in the second half of the project. To cope with repeats in the best possible way, three different designs were tested. Each design included the complete set of 156 PID genes from the previous design, plus 23 newly discovered genes, resulting in a total of 179 target genes (EGS179). Repeats were handled in the following way:

1. EGS179-1: in this design, all repeats longer than 80 bp are excluded from the target region, and none of the selector arms may be located in a repeat
2. EGS179-2: this design includes the same target regions as EGS179-1, plus several additional regions obtained by iterating the design process for the non-covered parts, this time allowing one selector probe arm in a repeat
3. EGS179-3: in this design, all repeats longer than 80 bp are excluded from the target region, but, in contrast to EGS179-1, one of the probe arms may be located in a repeat. This design differs from EGS179-2 in the sense that only one design round is performed, i.e. in EGS179-3 one probe arm is allowed in a repeat from the start of the design process, whereas in EGS179-2 extra probes are added to the design by iteration.

In a first experiment, aiming at comparing the three designs described above, one HapMap sample (NA18507) was prepared for each of the designs. These samples were then sequenced by GATC.

Results showed that both EGS179-2 and EGS179-3 obtained better coverage than EGS179-1 (96.4 % and 96.2 % in EGS179-2 and 3, versus 92.5 % in EGS179-1, at read depth one). This observation can easily be explained by how the designs were made. As EGS179-1 was more stringent than EGS179-2 and 3, the percentage of the target region covered by design is lower (93.5 % of all target PID regions in EGS179-1, versus 98.55 % and 98.52 % in EGS179-2 and 3), hence resulting in a lower target coverage after enrichment and sequencing.

When normalizing the data for the number of reads obtained per sample we could see that designs EGS179-2 and EGS179-3 performed equally well. Indeed, both have a comparable coverage by design (98.55 % versus 98.52 %), and the number of repeated bases included by these two designs, which strongly influences target coverage by competing for available sequencing capacity, was the same (1319 bp in both designs).

EGS179-2, however, has the advantage over EGS179-3 that there is less chance of repeats being located in one of the probe arms. For this reason, and also considering its slightly higher coverage by design compared to EGS179-3, EGS179-2 was considered to be the preferred design, and selected as the final, optimized selector set.

The EGS179-2 design included 98.55 % of all targeted bases in the 179 PID genes, using a set of 12702 selector probes.

When using this selector set for target enrichment, followed by sequencing at a total depth of 29.6 million reads per sample, 98.9 % of the 98.55 % PID regions targeted by design was covered at least once. This corresponds to 96.4 % coverage of the complete PID target region.

But even at lower sequencing depths an adequate coverage of the target region can be obtained: when sequencing at a total depth of at least 3.3 million reads per sample, matching the capacity of bench-top sequencers such as Illumina's MiSeq or Ion Torrent's PGM, 95 % of the complete 179 PID genes are covered at least once. Increasing the sequencing depth will further increase the number of obtained reads per fragment.

To prove the potential use of the designed selector set for amplification and mutation analysis of the 179 PID genes, we subjected a set of 42 patient samples and one HapMap sample (NA12801) to an enrichment experiment, followed by Illumina sequencing (Partner 3). For 20 of the patient samples, the disease-causing mutation was known before experiment onset, while it was unknown for 22 of the cases. Data analyses and mutation validation were done by Partners three and one, respectively.

WP2: High-throughput mutation analysis of extended PCR products

Ultra high throughput sequencing of EURO-GENE-SCAN samples provided by the partners was performed on state of the art next generation sequencing machines.

The Illumina GAIIx and the improved HiSeq 2000 (http://www.illumina.com/systems/hiseq_2000.ilmn) were used.

Data analysis pipeline

During the course of the project, the data analysis pipeline was continually improved by incorporating the observations and evaluations made in previous analyses. This allowed us to learn from each analysis and to fine-tune the steps and parameters used in the final data analysis pipeline. The pipeline includes various processing steps and is described below in detail.

Sequence quality

The sequence reads generated from each of the samples are checked for quality using the quality score distribution and the nucleotide composition of the called bases. This makes it possible to detect any significant sequencing error prior to variation analysis. The sequencing quality of samples is assessed using the FastQC tool [http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/]. The base quality distribution over the read length is inspected to determine the sequencing success. A good sequencing run will have an average base quality of > Q25 over at least 90 % of the read length. In addition to the standard Illumina HiSeq 2000 quality filtering, the sequencing reads are screened for sequencing adaptors and artifacts related to sequencing library preparation. A final quality score screen is performed to ensure that only high-quality reads are processed.
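The Q25 rule above can be illustrated with a minimal sketch. It is not the FastQC implementation; it assumes Phred+33 encoded quality strings and interprets the rule as "the mean quality per read position exceeds Q25 at 90 % or more of the positions". The function names and the toy reads are hypothetical.

```python
from statistics import mean

def mean_quality_per_position(quality_strings, offset=33):
    """Return the mean Phred quality at each read position (Phred+33 assumed)."""
    max_len = max(len(q) for q in quality_strings)
    means = []
    for pos in range(max_len):
        scores = [ord(q[pos]) - offset for q in quality_strings if pos < len(q)]
        means.append(mean(scores))
    return means

def passes_quality_rule(quality_strings, min_q=25, min_fraction=0.90):
    """True if the mean quality exceeds min_q at >= min_fraction of positions."""
    means = mean_quality_per_position(quality_strings)
    good = sum(1 for m in means if m > min_q)
    return good / len(means) >= min_fraction

# Toy data: 'I' encodes Q40 and '#' encodes Q2 in Phred+33.
reads_ok = ["IIIIIIIIII", "IIIIIIIIII"]
reads_bad = ["IIIII#####", "IIIII#####"]
print(passes_quality_rule(reads_ok), passes_quality_rule(reads_bad))
```

Older Illumina pipelines used a Phred+64 offset, so the `offset` argument would need to be adjusted for such data.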

Mapping to reference

Once the sequencing quality of the samples is satisfactory, the sequences are mapped to the genome reference. For the Euro-Gene-Scan project two versions of the human genome reference are used: hg18 (GRCh36) and hg19 (GRCh37). The chromosomal sequences of the human genome were assembled by the International Human Genome Project sequencing centers and are distributed via the Genome Reference Consortium (GRC) [http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/index.shtml]. The human genome reference sequence is retrieved from UCSC servers [http://hgdownload.cse.ucsc.edu/goldenPath//bigZips/]. Mapping to the whole genome is done using the Burrows-Wheeler Aligner (BWA) with the default parameters [Li H. and Durbin R. (2009) Fast and accurate short read alignment with Burrows-Wheeler Transform. Bioinformatics, 25:1754-60]. BWA is a popular short-read aligner and is used in resequencing projects including the 1000 Genomes Project. The sequence read mapping information is collected from the alignments generated for each sample.

Junction processing

Due to the nature of rolling circle amplification employed by the selector technology, a portion of the reads will cross over the fragment ends (junctions). These reads have to be identified and processed separately to allow accurate mapping to the genome. This is achieved by preparing a human genome reference that includes the junction regions from the PID selector design. Junction regions can be predetermined by using the selector probe co-ordinates for the respective PID selector design. Sequence reads are mapped to the human genome reference including the junctions using BWA. This is done in order to give each read the best chance to align to the correct reference region. The sequence alignments are inspected for reads that match the junction regions and the reads are split at the junction point to generate corrected fragment reads.
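The splitting step can be sketched as follows. This is an illustration only, not the project pipeline: it assumes that a junction reference is modelled as the fragment sequence followed by itself, and that the alignment start of the read on that reference is already known. The function name and the toy sequences are hypothetical.

```python
def split_at_junction(read, read_start, junction_pos):
    """Split a read aligned to a junction reference.

    read_start   -- 0-based position on the junction reference where the read begins
    junction_pos -- 0-based position of the first base after the junction
    Returns (left_part, right_part); one part is empty if the read does not
    cross the junction.
    """
    cut = junction_pos - read_start
    cut = max(0, min(len(read), cut))
    return read[:cut], read[cut:]

# Toy junction reference: the end of a circularized fragment followed by its start.
fragment = "ACGTTGCAAT"
junction_ref = fragment + fragment      # the junction sits at index len(fragment)
read = junction_ref[7:14]               # this read spans the junction
left, right = split_at_junction(read, 7, len(fragment))
print(left, right)
```

The left part matches the end of the fragment and the right part matches its start, so both pieces can be mapped independently to the linear genome.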

Target region processing

The split junction read pairs and the non-junction reads are mapped to the human genome (hg18 or hg19) using BWA with default parameters. The sequence alignments are processed to select only the amplified region of interest (ampROI). Amplified region of interest is defined by each PID selector design. For processing the target region, all the selectors designed in the enrichment were merged or collapsed to obtain non-overlapping genome co-ordinates. The resulting regions are designated as target region and used for downstream processing.
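Collapsing the selectors into non-overlapping co-ordinates is a standard interval merge. The sketch below assumes half-open (chromosome, start, end) tuples; the function name and the example co-ordinates are hypothetical and not taken from the selector designs.

```python
def merge_intervals(intervals):
    """Collapse overlapping or adjacent (chrom, start, end) half-open intervals."""
    merged = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(m) for m in merged]

selectors = [("chr1", 100, 250), ("chr1", 200, 400),
             ("chr1", 500, 600), ("chr2", 50, 80)]
target_region = merge_intervals(selectors)
print(target_region)
```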

Local realignment

The alignment algorithm, BWA, aligns the sequence reads one at a time, yielding the best possible alignment for each individual read only. As it tries to find the best match for each read, it will introduce mismatches. This mode of operation makes it impossible to minimise mismatches across all genomic regions for all reads in one single alignment operation.

Insertions and deletions (InDels) generated by the initial alignment are present across the entire regions of interest. These alignment artifacts can be removed using local realignment. The Smith-Waterman algorithm minimizes mismatches around known InDels by locally realigning all reads across these regions, effectively removing mismatches that would otherwise be interpreted as false positive variants. Thus, local realignment serves to transform regions with misalignments due to InDels into clean reads containing a consensus InDel suitable for standard variant discovery approaches. The Genome Analysis Toolkit (GATK) IndelRealigner module is used for this purpose [Depristo MA, Banks E, Poplin R, Garimella KV, Maguire JR, Hartl C, et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet 2011;43:491-8].
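For orientation, a textbook Smith-Waterman local alignment score can be sketched as below. This is not the GATK IndelRealigner code; the scoring values (match, mismatch and a linear gap penalty) are assumptions chosen for illustration.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

reference = "ACGTACGT"
read_with_deletion = "ACGTCGT"   # lacks one 'A' relative to the reference
print(smith_waterman_score(reference, read_with_deletion))
```

With these scores, opening a single gap (seven matches and one gap) scores higher than forcing the read onto the reference without a gap, which is the effect that realignment around an InDel exploits.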

Base quality recalibration

The sequencing process used to read the fragments is not yet perfect, which means there is a certain degree of uncertainty in calling the correct bases. A probability is computed for each base to indicate how reliable the call is. Due to technical limitations, these probability scores are themselves not computed with full accuracy. These errors have to be corrected to reduce the number of false positive variants. The goal of this step is to improve the base quality scores of reads for downstream processing and also to correct for error covariates such as machine cycle and dinucleotide context. A base quality score represents the probability of a particular base mismatching the reference genome. After recalibration, quality scores are more accurate in that they are closer to the true probability of mismatch. This is achieved by analyzing the covariation among several different features of a base. The reported quality score, sequencing cycle, and sequencing context are considered for this step. The GATK modules CountCovariates, TableRecalibration and AnalyzeCovariates are used for this analysis step.
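The idea behind recalibration can be sketched by binning bases on two covariates (reported quality and machine cycle) and converting the observed mismatch rate of each bin to a Phred score, Q = -10 log10(p). This is a simplified illustration, not the GATK implementation; the add-one smoothing, the function names and the toy counts are assumptions.

```python
import math
from collections import defaultdict

def phred(p):
    """Convert an error probability to a Phred-scaled quality."""
    return -10 * math.log10(p)

def empirical_qualities(observations):
    """observations: iterable of (reported_q, cycle, is_mismatch).

    Returns {(reported_q, cycle): empirical_q}, using add-one smoothing
    (an assumption) so that bins without mismatches stay finite.
    """
    counts = defaultdict(lambda: [0, 0])  # [mismatches, total]
    for reported_q, cycle, is_mismatch in observations:
        bin_counts = counts[(reported_q, cycle)]
        bin_counts[0] += int(is_mismatch)
        bin_counts[1] += 1
    return {key: phred((mm + 1) / (total + 2))
            for key, (mm, total) in counts.items()}

# Toy data: 998 bases reported as Q30 at cycle 1, of which 9 mismatch the reference.
obs = [(30, 1, i < 9) for i in range(998)]
table = empirical_qualities(obs)
print(table[(30, 1)])
```

In this toy bin the reported Q30 is too optimistic: the observed mismatch rate corresponds to roughly Q20, so the scores in that bin would be lowered.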

Coverage assessment

The fully processed sequence alignments are used to compute the sequence coverage at each site in the amplified region. Sequence coverage is determined for each sample separately and coverage profiles are created for doing sample wise comparison. Additionally, the reads and bases are filtered using the mapping results and base quality score thresholds to remove ambiguous variants. In this way only high quality bases and reads are included in the variant analysis. The coverage graph gives a measure of how the bases in the region of interest are covered. Additionally, it also reveals how each of the samples in the design performed with respect to other samples in the same PID selector design.
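A minimal sketch of per-site coverage and of the fraction of the target covered at a given depth is shown below. It assumes alignments are given as half-open (start, end) intervals on a single target region; names and numbers are hypothetical.

```python
def depth_profile(region_start, region_end, reads):
    """Per-base depth over a half-open region from (start, end) read alignments."""
    depth = [0] * (region_end - region_start)
    for start, end in reads:
        for pos in range(max(start, region_start), min(end, region_end)):
            depth[pos - region_start] += 1
    return depth

def fraction_covered(depth, min_depth):
    """Fraction of positions with at least min_depth reads."""
    return sum(1 for d in depth if d >= min_depth) / len(depth)

reads = [(0, 6), (4, 10), (4, 8)]
depth = depth_profile(0, 10, reads)
print(depth, fraction_covered(depth, 1), fraction_covered(depth, 2))
```

The same `fraction_covered` call with `min_depth=20` would give the kind of "covered by at least 20x" figure quoted for the selector designs.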

Variant detection

Realigned and recalibrated sequence alignments are then used for calling variants. Variants, both single nucleotide polymorphisms (SNPs) and insertion-deletions (InDels), are called using the Genome Analysis Toolkit's (GATK's) UnifiedGenotyper module. It uses a Bayesian genotype likelihood model to estimate simultaneously the most likely genotypes and allele frequency in a population of N samples. Reporting a Phred-scaled confidence value (where the Phred base-caller uses a four-phase procedure to determine a sequence of base-calls from the processed trace) at each position, it provides the probability of a variant allele being present as well as determining the genotype of each sample.
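A strongly simplified, single-sample version of a Bayesian genotype likelihood calculation is sketched below. It is not the UnifiedGenotyper model: it assumes a diploid site with one reference and one alternative allele, independent reads, errors spread evenly over the three other bases, and flat genotype priors. All names are hypothetical.

```python
import math

def genotype_posteriors(bases, quals, ref, alt, priors=(1/3, 1/3, 1/3)):
    """Posterior over genotypes (ref/ref, ref/alt, alt/alt) at one site."""
    log_likelihoods = []
    for alt_copies in (0, 1, 2):
        frac_alt = alt_copies / 2
        log_l = 0.0
        for base, q in zip(bases, quals):
            err = 10 ** (-q / 10)                      # Phred score to error probability
            p_ref = (1 - err) if base == ref else err / 3
            p_alt = (1 - err) if base == alt else err / 3
            log_l += math.log((1 - frac_alt) * p_ref + frac_alt * p_alt)
        log_likelihoods.append(log_l)
    weighted = [math.exp(l) * p for l, p in zip(log_likelihoods, priors)]
    total = sum(weighted)
    return [w / total for w in weighted]

def phred_confidence(p_wrong):
    """Phred-scaled confidence for a probability of being wrong."""
    return -10 * math.log10(max(p_wrong, 1e-300))

het_site = genotype_posteriors("AAAAGGGG", [30] * 8, ref="A", alt="G")
ref_site = genotype_posteriors("AAAAAA", [30] * 6, ref="A", alt="G")
print(het_site.index(max(het_site)), ref_site.index(max(ref_site)))
```

An even mix of reference and alternative bases favours the heterozygous genotype, while a pile of reference bases favours the homozygous reference genotype.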

Variant annotation

Called variants are annotated based on their genomic context and by associating variant-specific information with the variant call. For known variant-specific annotations, information from the dbSNP database [http://www.ncbi.nlm.nih.gov/projects/SNP/] is used. In order to determine the effects of the variants on proteins, gene annotations are associated using the snpEff tool [Cingolani, P. "snpEff: Variant effect prediction", http://snpeff.sourceforge.net 2012]. The gene annotations are retrieved from Ensembl [http://www.ensembl.org/Homo_sapiens]. Known variation information (dbSNP) and gene annotations are associated with the detected variants using GATK's VariantAnnotator module. All the annotations about the variants, dbSNP association and gene effects are taken into consideration.

Variant filtering and variants table

Even after extensive corrections and screening of the primary and secondary analysis data, a huge number of variants will be detected in each sample. Most of those variants are false positives. GATK's VariantFiltration module was used to provide a comprehensive and robust platform to filter variants in the Euro-Gene-Scan samples and so reduce the number of false positive variants. All detected variants are represented in a tab-separated value format. The filters applied and the annotations associated are also contained in the table.
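The combination of filter flags and a tab-separated variants table can be sketched as follows. The filter names, thresholds and column names are hypothetical and chosen only for illustration; they are not the settings used with VariantFiltration in the project.

```python
import csv
import io

# Hypothetical filters: a variant is flagged by every filter whose test is True.
FILTERS = {
    "LowQual": lambda v: v["qual"] < 30,
    "LowDepth": lambda v: v["depth"] < 8,
}

def apply_filters(variant):
    """Return 'PASS' or a semicolon-separated list of failed filter names."""
    failed = [name for name, test in FILTERS.items() if test(variant)]
    return ";".join(failed) if failed else "PASS"

def variants_to_tsv(variants):
    """Render variants, including their filter status, as tab-separated text."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["chrom", "pos", "ref", "alt", "qual", "depth", "filter"])
    for v in variants:
        writer.writerow([v["chrom"], v["pos"], v["ref"], v["alt"],
                         v["qual"], v["depth"], apply_filters(v)])
    return out.getvalue()

variants = [
    {"chrom": "chr1", "pos": 100, "ref": "A", "alt": "G", "qual": 50, "depth": 20},
    {"chrom": "chr2", "pos": 200, "ref": "C", "alt": "T", "qual": 10, "depth": 3},
]
print(variants_to_tsv(variants))
```

Keeping flagged variants in the table, rather than deleting them, leaves the applied filters visible next to each call.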

Database design

A database was designed for distributing the Euro-Gene-Scan project data. Variants identified in each of the PID designs are imported into the database and the data are then classified into sample-specific, variant-specific and annotation-specific tables. This classification makes it possible to store the information efficiently and also to do species- or population-specific comparisons.
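One way such a classification could look is sketched below with an in-memory SQLite database. The table and column names, the linking table between samples and variants, and the example rows are all assumptions made for illustration; the actual project schema is not described in this report.

```python
import sqlite3

# Hypothetical schema: one table per class of information, plus a link table.
SCHEMA = """
CREATE TABLE sample (sample_id INTEGER PRIMARY KEY, name TEXT, design TEXT);
CREATE TABLE variant (variant_id INTEGER PRIMARY KEY, chrom TEXT, pos INTEGER,
                      ref TEXT, alt TEXT, UNIQUE (chrom, pos, ref, alt));
CREATE TABLE annotation (variant_id INTEGER REFERENCES variant(variant_id),
                         source TEXT, value TEXT);
CREATE TABLE sample_variant (sample_id INTEGER REFERENCES sample(sample_id),
                             variant_id INTEGER REFERENCES variant(variant_id),
                             genotype TEXT);
"""

def build_demo_db():
    """Create an in-memory database with a few made-up rows."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.execute("INSERT INTO sample VALUES (1, 'patient_A', 'EGS179-2')")
    con.execute("INSERT INTO sample VALUES (2, 'patient_B', 'EGS179-2')")
    con.execute("INSERT INTO variant VALUES (1, 'chr17', 1000, 'G', 'A')")
    con.execute("INSERT INTO annotation VALUES (1, 'dbSNP', 'rs_hypothetical')")
    con.executemany("INSERT INTO sample_variant VALUES (?, ?, ?)",
                    [(1, 1, "0/1"), (2, 1, "1/1")])
    return con

def samples_with_variant(con, variant_id):
    """List (sample name, genotype) for every sample carrying the variant."""
    rows = con.execute(
        "SELECT s.name, sv.genotype FROM sample s "
        "JOIN sample_variant sv USING (sample_id) "
        "WHERE sv.variant_id = ? ORDER BY s.name", (variant_id,))
    return rows.fetchall()

con = build_demo_db()
print(samples_with_variant(con, 1))
```

Storing each variant once and linking it to samples avoids repeating the annotations for every carrier, which is what makes comparisons across samples cheap.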

User interface

An online user interface has been created in order to navigate the database. The access is granted to all Euro-Gene-Scan partners over a secure website.

The database can be accessed at http://egs.gatc-biotech.com using the login information provided by email.

WP3: The immunology of SNP-chip.

The group of Prof Bodo Grimbacher at University College London, as Beneficiary 2 of this project, has collected a cohort of families with at least one individual affected with CVID. In these families, CVID patients have a mutation in TACI (tumour necrosis factor receptor 13B, TNFRSF13B). The same mutation has been identified in some of their healthy family members. We compare the genomes of these patients and healthy individuals to identify possible modifier genes that, in addition to TACI, are mutated in patients but not in healthy family members.

This is done using a genome-wide single nucleotide polymorphism (SNP) array, the Affymetrix SNP array 6.0, with the Nsp/Sty restriction enzyme assay.

The data analysis is expected to show one or more SNPs that are present in patients only, which may then be associated with modifier gene(s). This would mean that a mutation in the identified modifier gene(s), in addition to a mutation in TACI, is responsible for the phenotype in CVID patients.
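The notion of "SNPs present in patients only" can be illustrated with a simple set operation within one family. This is a toy sketch under strong assumptions (each individual is reduced to the set of SNP identifiers at which a variant allele is carried, and a candidate must be shared by all affected members); it is not the linkage and association analysis used in the project, and all identifiers are made up.

```python
def patient_only_snps(genotypes, affected):
    """Return SNPs carried by every affected individual and by no unaffected one.

    genotypes -- {individual: set of SNP ids carrying the variant allele}
    affected  -- set of individuals diagnosed with CVID
    """
    patients = [snps for ind, snps in genotypes.items() if ind in affected]
    healthy = [snps for ind, snps in genotypes.items() if ind not in affected]
    shared = set.intersection(*patients) if patients else set()
    seen_in_healthy = set.union(*healthy) if healthy else set()
    return shared - seen_in_healthy

# Hypothetical family in which all members carry the same TACI variant.
family = {
    "patient_1": {"snp_a", "snp_b", "snp_c"},
    "patient_2": {"snp_a", "snp_c", "snp_d"},
    "healthy_1": {"snp_a", "snp_d"},
}
candidates = patient_only_snps(family, affected={"patient_1", "patient_2"})
print(candidates)
```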

Description of work

Use the Affymetrix SNP-chip to assay common variable immunodeficiency (CVID) patients with genetic alterations in the TACI gene and in age and gender matched healthy donors, most of which are healthy family members with genetic alterations in TACI (equipment and know-how for this is in place).

Achieved:

During the project's duration 122 DNA samples from 36 families with CVID patients from 7 different sources worldwide have been collected. 115 samples have been analyzed by SNP array. The 115 samples include 40 CVID patients, 11 individuals affected by unclassified hypogammaglobulinemia or abnormal immunoglobulin A levels, and 62 healthy family members. Most individuals carry one of the mutations C104R (75 individuals) or A181E (24 individuals) in TACI, and 5 individuals have a compound heterozygous C104R/A181E mutation. 3 individuals carry a tyrosine at amino acid position 104 (C104Y), and 2 individuals have the genotype C104R/C104Y.

Of CVID patients with C104R mutations, 17 have a heterozygous, 5 a homozygous, and 4 a compound heterozygous mutation (2x C104R/C104Y and 2x C104R/A181E). 1 CVID patient has a heterozygous C104Y mutation. A181E mutations are heterozygous in 7 CVID patients and compound heterozygous in 3 (2x C104R/A181E and 1x A181E/I87N). 2 CVID patients have the genotype Y79C/I87N and 3 have no TACI mutation. Of patients with hypogammaglobulinemia, 5 have a heterozygous C104R mutation, 3 a compound heterozygous C104R mutation (2x C104R/P97P and 1x C104R/G168R), and 2 a heterozygous A181E mutation. One patient with selective IgA deficiency has a heterozygous C104R mutation.
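The genotype counts in this paragraph can be cross-checked against the cohort sizes given earlier (40 CVID patients and 11 individuals with hypogammaglobulinemia or abnormal IgA levels). The short Python sketch below tallies them. It assumes that the two C104R/A181E compound heterozygotes mentioned under both C104R and A181E are the same two patients; the dictionary keys are illustrative labels only.

```python
# Genotype counts for CVID patients, taken from the paragraph above.
# Assumption: the 2x C104R/A181E patients are listed under both the C104R
# and the A181E tallies in the text, so they are counted only once here.
cvid_genotypes = {
    "C104R heterozygous": 17,
    "C104R homozygous": 5,
    "C104R/C104Y": 2,
    "C104R/A181E": 2,
    "C104Y heterozygous": 1,
    "A181E heterozygous": 7,
    "A181E/I87N": 1,
    "Y79C/I87N": 2,
    "no TACI mutation": 3,
}

# Counts for the remaining affected individuals.
other_genotypes = {
    "hypogammaglobulinemia, C104R heterozygous": 5,
    "hypogammaglobulinemia, C104R compound heterozygous": 3,
    "hypogammaglobulinemia, A181E heterozygous": 2,
    "selective IgA deficiency, C104R heterozygous": 1,
}

cvid_total = sum(cvid_genotypes.values())
other_total = sum(other_genotypes.values())
print(cvid_total, other_total)  # prints "40 11"
```

Under that assumption, both totals agree with the cohort sizes stated for the 115 analysed samples.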

To identify modifier genes for TACI in CVID, families in which individuals carry a mutation in TACI in more than one generation are most suitable. Several families have individuals with CVID as well as healthy family members with a heterozygous C104R mutation; some even have healthy or affected family members with available DNA samples over three generations.

In other families, homozygous C104R mutations could be detected in affected but also in healthy individuals. This shows that a homozygous mutation in TACI is not the sole cause of the disease. Another example of this is a family with members who carry the homozygous mutation and have a phenotype of dysgammaglobulinemia.

The collection also includes families with milder forms of hypogammaglobulinemia.

A heterozygous C104R mutation in healthy family members and individuals affected by hypogammaglobulinemia was found in several families. One family has healthy individuals, CVID patients and a patient with selective IgA deficiency, all with the same heterozygous C104R mutation. Interestingly, one CVID patient from this family has a wild type TACI genotype.

Families with the A181E TACI mutation comprise approximately one third of our sample collection. Some families have healthy individuals and CVID patients with the heterozygous A181E mutation. In one family, the heterozygous A181E mutation was found in healthy family members as well as in CVID and hypogammaglobulinemia patients, whereas in other families, all patients with the heterozygous A181E mutation have the phenotype of hypogammaglobulinemia.

Two families had a combination of C104R and A181E TACI mutations.

Analysis of the SNPs associated with the CVID phenotype in the presence of a TACI mutation and compilation of SNPs within those above listed genes

DNA of all the individuals listed above was collected and analysed on an Affymetrix 1 million SNP-Chip, followed by multivariate analysis of the results. A list of SNPs was compiled and those SNPs were identified that were present in addition to the TACI mutation in patients with CVID, but absent in healthy individuals with the same TACI mutation.
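The filtering step described above (SNPs present in CVID patients but absent in healthy carriers of the same TACI mutation) can be illustrated with simple set operations. This is only a simplified sketch and not the multivariate analysis used in the project. The function name, the sample labels and the SNP identifiers are hypothetical, and the sketch assumes that a candidate SNP must be carried by every patient in the group.

```python
def candidate_modifier_snps(patient_snps, healthy_carrier_snps):
    """Return SNPs carried by every patient and by no healthy carrier.

    Both arguments map a sample label to the set of SNP identifiers
    observed in that sample (hypothetical input format).
    """
    if not patient_snps:
        return set()
    # SNPs shared by all patients (assumption: "present in patients"
    # is read here as "present in every patient of the group").
    shared_by_patients = set.intersection(*(set(s) for s in patient_snps.values()))
    # SNPs seen in at least one healthy carrier of the same TACI mutation.
    seen_in_healthy = set()
    for snps in healthy_carrier_snps.values():
        seen_in_healthy |= set(snps)
    return shared_by_patients - seen_in_healthy


# Invented example data, for illustration only.
patients = {
    "P1": {"snp_A", "snp_B", "snp_C"},
    "P2": {"snp_A", "snp_C", "snp_D"},
}
healthy_carriers = {
    "H1": {"snp_C", "snp_E"},
}
print(sorted(candidate_modifier_snps(patients, healthy_carriers)))  # ['snp_A']
```

Requiring a SNP in every patient is a strict choice; a family-based analysis such as the linkage approach described next weighs the evidence per family instead.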

Samples that contributed directly to the LOD scores were those of individuals heterozygous for TACI mutations, whereas individuals who are wild type for TACI or have biallelic mutations contributed indirectly by helping to establish the phase. We could only obtain linkage evidence in families with at least two individuals heterozygous for TACI mutations. All genotyped individuals were used to estimate marker allele frequencies.

Achieved:

Multivariate analysis was performed as described, and a list of all SNPs was compiled. We identified one genetic modifier locus on chromosome 5 with a LOD score of 3.98 under the hypothesis that the modifier acts in an autosomal recessive manner, and one genetic modifier locus on chromosome 3 with a LOD score of 3.61 under the hypothesis that the modifier acts in an autosomal dominant manner.
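For readers less familiar with linkage statistics, a LOD score is the base-10 logarithm of the odds in favour of linkage, so the values reported above can be converted back to odds. The Python sketch below does this and also shows the textbook two-point LOD formula for phase-known, fully informative meioses. The function names are illustrative, and the simple two-point model is an assumption made for the example; the report does not state which linkage model or software was used.

```python
import math


def lod_to_odds(lod):
    """Convert a LOD score to the odds in favour of linkage (10 ** LOD)."""
    return 10 ** lod


def two_point_lod(recombinants, meioses, theta):
    """Two-point LOD score for phase-known, fully informative meioses.

    theta is the assumed recombination fraction (0 <= theta <= 0.5);
    the null hypothesis of no linkage corresponds to theta = 0.5.
    """
    if not 0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    non_recombinants = meioses - recombinants
    likelihood_linked = (theta ** recombinants) * ((1 - theta) ** non_recombinants)
    likelihood_unlinked = 0.5 ** meioses
    if likelihood_linked == 0:
        return float("-inf")  # observed data impossible at this theta
    return math.log10(likelihood_linked / likelihood_unlinked)


# Odds corresponding to the two LOD scores reported above.
print(round(lod_to_odds(3.98)))  # roughly 9,550 to 1 (chromosome 5 locus)
print(round(lod_to_odds(3.61)))  # roughly 4,074 to 1 (chromosome 3 locus)
```

Both reported scores exceed 3, the conventional threshold for significant linkage, which corresponds to odds of 1,000 to 1.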

Additional projects:

To diagnose more early-onset colitis patients with IL10- or IL10 receptor-deficiency and to compare the clinical outcome with or without hematopoietic stem cell transplantation.

Children suffering from early-onset colitis with severe auto-inflammation of the large intestine were found to have homozygous mutations in the gene IL10RB encoding the beta-subunit of the IL-10 receptor (Glocker et al. 2009) or in the gene encoding IL-10 (Glocker et al. 2010). Hematopoietic stem cell transplantation (HSCT) had been curative in these children; however, only four children had so far been found with these mutations and only three had received HSCT. We therefore aimed to diagnose more early-onset colitis patients with IL10- or IL10R-deficiency by mutation screening and functional studies of the IL10 signalling pathway, including STAT3 phosphorylation assays and cell stimulation experiments, and to compare the clinical outcome of patients who received HSCT with that of those who did not.

Results: In a cohort of 30 patients with severe enterocolitis of unknown aetiology, severe perianal disease with formation of enterocutaneous and/or rectovaginal fistulae, and an onset within the first four years of life, we found five patients with novel mutations in IL10RA or IL10RB. For the observation of the clinical course and response to treatment, the two previously identified IL10-deficient patients (Glocker et al. 2010) were also included. Six of the seven patients received standard immunosuppressive therapy. All were resistant to anti-inflammatory drugs and monoclonal antibodies. Owing to the life-threatening and life-shortening clinical course, allogeneic HSCT was carried out in three patients for whom a matched donor was available, resulting in sustained remission. The positive outcomes in the transplanted patients suggest that HSCT should be considered in patients with IL10/IL10R deficiency. Successful HSCT is probably the only therapeutic approach that can cure the disease and give affected patients a better quality of life. Patients who were not transplanted continued to suffer from severe inflammatory bowel disease and perianal disease requiring life-long immunosuppressive therapy.

To investigate the hypothesis that auto-antibodies against IL10 play a role in inflammatory bowel diseases.

Introduction: The term inflammatory bowel disease (IBD) describes a heterogeneous group of conditions characterized by a chronic and relapsing inflammation of the small or large intestine with substantial associated morbidity and mortality. The two most prevalent IBD entities are Crohn's disease and ulcerative colitis, both of which give rise to symptoms such as abdominal pain, diarrhea, bleeding and malabsorption (Baumgart and Sandborn 2007). A profound dysbiosis of the intestinal microbiota in genetically susceptible individuals is a key step in the development of chronic relapsing inflammation. The identification of monogenic defects causing IBD in the IL10 receptor α- and β-chain as well as in the IL10 gene (Glocker et al. 2007, Glocker et al. 2010, Moore et al. 2001) emphasizes the pivotal role IL10 signaling plays in the suppression of inflammation in the gut.

IL10 is a pleiotropic cytokine produced by monocytes, macrophages, regulatory T cells and other lineages exerting immunoregulatory functions (Moore et al. 2001). While IL10 possesses various immunostimulatory effects, especially on B cells and CD8 T cells, it also has diverse immunosuppressive functions, rendering IL10 the most important anti-inflammatory cytokine. It is capable of suppressing the expression of pro-inflammatory cytokines such as TNF-α, IFN-γ, IL1β, IL6 and IL12 as well as the expression of cell surface molecules such as MHC II or B7 (Moore et al. 2001, Williams et al. 2004, de Waal Malefyt et al. 1991).

We aimed to test the hypothesis that auto-antibodies bind to IL10 or to the IL10 receptor in IBD patients, thus abrogating IL10 signaling and mimicking a state of IL10 deficiency. An ELISA screen for auto-antibodies of the IgG and the IgA isotype putatively directed against IL10 signaling components was performed on 52 IBD patients, 38 of whom had Crohn's disease (CD) and 14 ulcerative colitis (UC).

Results: Five IBD patients showed increased values for anti-IL10 IgG antibodies. However, the means did not differ significantly from the healthy control group; one-way analysis of variance resulted in a non-significant P-value of 0.1094. For auto-antibodies against the IL10 receptor, some patients proved to have significantly elevated optical densities (ODs) of up to 4 times the mean, suggesting that some IBD patients do produce auto-antibodies against the IL10 receptor α- or β-chain. In the CD group, 8% of patients tested for anti-IL10RB had values that were higher than any value measured in the healthy donor group, compared to 7% of patients in the UC group. For anti-IL10RA, 11% of the CD patients and 7% of the UC patients had elevated values.
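The criterion used above, a patient value higher than any value measured in the healthy donor group, can be sketched in a few lines of Python. The OD readings below are invented purely for illustration and are not project data. The second helper converts the reported percentages into approximate patient counts under the assumption, not stated in the report, that all 38 CD and all 14 UC patients were tested for each antibody.

```python
def fraction_above_control_max(patient_values, control_values):
    """Fraction of patients whose reading exceeds every control reading."""
    threshold = max(control_values)
    elevated = [value for value in patient_values if value > threshold]
    return len(elevated) / len(patient_values)


def approx_count(percent, group_size):
    """Approximate number of patients behind a rounded percentage."""
    return round(percent / 100 * group_size)


# Hypothetical optical densities, invented for this example only.
healthy_od = [0.10, 0.12, 0.15, 0.11]
patient_od = [0.09, 0.14, 0.40, 0.13, 0.16]
print(fraction_above_control_max(patient_od, healthy_od))  # 0.4

# Assumption: every patient in each group was tested (38 CD, 14 UC).
print(approx_count(8, 38))   # anti-IL10RB, CD group: about 3 patients
print(approx_count(7, 14))   # anti-IL10RB or anti-IL10RA, UC group: about 1 patient
print(approx_count(11, 38))  # anti-IL10RA, CD group: about 4 patients
```

With groups this small, each percentage corresponds to only a handful of individuals, which is consistent with the later reference to singleton patients.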

To assess whether the detected IL10 receptor auto-antibodies have functional consequences, their impact on STAT3 phosphorylation, a key event in IL10 signaling, was determined. Analysis of STAT3 phosphorylation upon IL10 stimulation of PBMCs pre-incubated with serum of a healthy donor or serum of two patients with increased values of anti-IL10RA and anti-IL10RB auto-antibodies showed that the concentration of detected antibodies was not sufficient to significantly reduce IL10 downstream signaling. Even when the experiment was repeated with higher concentrations of suspected auto-antibodies, inhibition of IL10-induced STAT3 phosphorylation could not be obtained.

Antibodies of the IgA isotype

IgA is the most abundant immunoglobulin in the gut, and its putative role in IBD has often been speculated upon. We therefore extended our search to anti-IL10 and anti-IL10R auto-antibodies of the IgA isotype. However, neither CD nor UC patients produced significant amounts of IgA auto-antibodies against IL10 or the IL10 receptor. Stratifying patients according to their age of onset or site of manifestation did not change the results.

In summary, even though there was no evidence for an autoimmune disruption of IL10 signaling in the general IBD patient population, in singleton patients auto-antibodies against the IL10 receptor may contribute to perpetuation and exacerbation of disease if present at sufficiently high titers; a tendency is shown for singleton patients in this study.

To identify new mutations in DOCK8. To characterise the clinical phenotype of AR-HIES due to DOCK8 mutations. To determine if AR-HIES patients with or without DOCK8 mutations can be distinguished by clinical features; and to establish diagnostic guidelines to distinguish between DOCK8- and STAT3-deficiency.

Introduction: Hyper-IgE syndromes (HIES) are rare primary immunodeficiencies (Minegishi 2009). Autosomal-dominant HIES (AD-HIES) due to STAT3 defects (Woellner et al. 2010) and autosomal-recessive HIES (AR-HIES) due to DOCK8 defects (Engelhardt et al. 2009) share most but not all clinical features. Distinguishing AR- from AD-HIES is of clinical importance (e.g. indication for transplantation). The mode of inheritance alone is an inadequate guide to the mutated gene because many patients with STAT3 defects have de novo mutations and unaffected parents.

Results: DOCK8 was analysed in 74 patients from 54 families with the phenotype of AR-HIES. In 59 patients from 45 families we identified 28 deletions/insertions, 12 splice site or nonsense mutations, and one gene transcription failure; four others had evidence of DOCK8 deficiency, but the mutations could not be characterized fully. 15 patients from 9 families did not have a DOCK8 mutation. Thus, unlike AD-HIES, AR-HIES shows locus heterogeneity.

Clinical data were collected and compared between 36 index patients with and 10 without DOCK8 mutation using regression analysis. However, the clinical phenotype of AR-HIES was too heterogeneous to distinguish patients with or without DOCK8 mutations based on clinical features.

Furthermore, the 36 index patients with DOCK8 deficiency were compared to 58 AD-HIES index patients with STAT3 mutations. A machine-learning approach was used to identify features that better predict a DOCK8 or STAT3 mutation, respectively. A combination of five clinical features can predict DOCK8 mutations in patients with a diagnosis of HIES, although misclassifications can occur. AR-HIES is not strictly a milder disease than AD-HIES, since some features such as eosinophilia are more severe in DOCK8 deficiency than in STAT3 deficiency. We propose the following diagnostic guidelines for DOCK8 deficiency: Possible: NIH HIES score more than 20 plus a weighted score of clinical features >12 based on hypereosinophilia and upper respiratory tract infections (weighing positively) and parenchymal lung abnormalities, retained primary teeth, and minimal trauma fractures (weighing negatively). Probable: Above plus consanguineous parents, severe viral infections and/or allergies. Definitive: Homozygous or compound heterozygous mutation in DOCK8 and/or lack of full-length protein expression.
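The three-tier guideline proposed above can be written down as a small decision function. The Python sketch below is one possible reading of the criteria, not the project's implementation. The function and parameter names are hypothetical, the weighted score of clinical features is taken as an input because the individual weights are not given here, and the sketch assumes that any one of the listed supporting findings is enough to move from possible to probable.

```python
def classify_dock8_deficiency(
    nih_hies_score,
    weighted_feature_score,
    consanguineous_parents=False,
    severe_viral_infections=False,
    allergies=False,
    biallelic_dock8_mutation=False,
    lacks_full_length_protein=False,
):
    """Return the diagnostic tier suggested by the proposed guidelines."""
    # Definitive: homozygous or compound heterozygous DOCK8 mutation
    # and/or lack of full-length protein expression.
    if biallelic_dock8_mutation or lacks_full_length_protein:
        return "definitive"

    # Possible: NIH HIES score above 20 plus weighted feature score above 12.
    possible = nih_hies_score > 20 and weighted_feature_score > 12
    if not possible:
        return "criteria not met"

    # Probable: "possible" plus supporting findings.
    # Assumption: any single supporting finding is sufficient.
    if consanguineous_parents or severe_viral_infections or allergies:
        return "probable"
    return "possible"


print(classify_dock8_deficiency(35, 14))                    # possible
print(classify_dock8_deficiency(35, 14, allergies=True))     # probable
print(classify_dock8_deficiency(18, 5, biallelic_dock8_mutation=True))  # definitive
```

As noted in the text, misclassifications can occur, so such a function can only prioritise patients for genetic testing and cannot replace it.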

To investigate disease-modifying SNPs in conjunction with monogenic disorders with a known mutation in other primary immunodeficiency diseases.

Depending on the funding, pre-existing, clinically well-characterized cohorts of our collaborators will also be screened on the newly developed immunology SNP-chip. These may include the following samples, which are already available to us: approximately 100 samples of patients with the hyper-IgE syndrome and mutations in STAT3, samples of patients with the autoimmune lymphoproliferative syndrome (ALPS) and mutations in TNFRSF6, samples of patients with Artemis deficiency and mutations in DCLRE1C, more than 500 samples of patients with IgA deficiency, and more than 50 samples of patients with chronic mucocutaneous candidiasis.

Done, analysis pending:

Collection of samples from other PID families was severely delayed, as most collaborators were no longer willing to ship samples for genetic analysis without obtaining an additional ethics vote and patient consent. This had not been anticipated, as practice (not legislation) changed during the course of the project. However, several DNA samples from families with individuals with hyper-IgE syndrome and STAT3 mutation, as well as from families with inflammatory bowel disease and IL10 mutation, were genotyped on the Affymetrix 1 million SNP-chip. Analysis, including compilation of a list of SNPs and identification of linkage regions, is currently in progress.

WP4: Reverse phase protein (RPP) microarrays for large-scale screening for PID.

In WP4, beneficiary one aims to develop the RPP technology for large-scale screening for concentrations of serum proteins. The method has been validated in the first stage of this project and used for measuring the IgA levels in a large number of normal children and controls. Large-scale screening was also carried out on a cohort of patients with Graves' disease (Jorgensen GH, Ornolfsson AE, Johannesson A, Gudmundsson S, Janzi M, Wang N, Hammarström L, Ludviksson BR. Association of immunoglobulin A deficiency and elevated thyrotropin-receptor autoantibodies in two Nordic countries. Hum Immunol 72, 166-172, 2011).

Theoretically, IgA deficiency would be a hallmark of most antibody deficiency syndromes, and testing for IgA levels in eluates from DBSS might identify a majority of patients with severe combined immunodeficiency (SCID), hyper-IgM syndrome (HIGM), common variable immunodeficiency (CVID) and IgAD. (In total, these patient categories correspond to >90% of all patients with antibody deficiency in Europe.) As the IgA in newborns is considered to be of foetal origin, lack of IgA would clearly suggest a deficient patient. DBSS were therefore collected from patients diagnosed with various forms of primary immunodeficiency and a large number of controls. The serum proteins were eluted and subsequently tested using both ELISA and RPP. Although the ELISA gave reproducible results, the RPP was not sufficiently sensitive to measure the very low levels of IgA in the eluates. Surprisingly, IgA was found in most patients, except those born to IgA deficient mothers, suggesting that, contrary to the current dogma, there is actually a small, but appreciable, transport/diffusion of IgA from mother to child during pregnancy. This manuscript was rejected by several journals as it was suggested that this technology represented an 'old-fashioned' approach for newborn screening and the advent of the TREC/KREC technology would make this type of testing superfluous. We thus carried out extensive work on this angle (Borte S, von Döbeln U, Fasth A, Wang N, Janzi M, Winiarski J, Sack U, Pan-Hammarström Q, Borte M, Hammarström L. Neonatal screening for severe primary immunodeficiency diseases using high-throughput triplex real-time PCR. Blood in press 2012) and showed that, contrary to the belief of the referees, most patients with antibody deficiencies (accounting for the vast majority of PID patients) are NOT detected by TREC/KREC testing. Nor are patients with complement deficiencies detected by the TREC/KREC assay. Our manuscript has now been substantially re-written and re-submitted, arguing strongly that our approach actually represents the 'state of the art' in newborn screening (Borte S, Janzi M, Pan-Hammarström Q, von Döbeln U, Nordvall L, Winiarski J, Fasth A, Hammarström L. Evaluation of lack of IgA as a diagnostic marker for primary immunodeficiency disorders using DBSS eluates. PLoS One submitted).

The above tests have all been based on detection of a single analyte. The final planned task proposed within this project was to develop a screening method for multiple proteins (n=5) in the same assay. This task has been quite labor-intensive and involved a change in platform format, as the RPP, although usable for a limited number of analytes, was found to be less suitable for this purpose. Thus, in collaboration with colleagues at SciLife (a core facility within the Karolinska Institutet), and using blinded serum samples from PID patients from partner 4 (Ewa Bernatowska), we have successfully developed a Luminex bead-based technology that is theoretically able to quantify up to 100 different proteins in the same sample/run. Due to limitations in the availability of antisera suitable for this assay format, we have been restricted to simultaneous detection of 46 proteins (known to be mutated/missing in PID patients). The results are currently being written up.

WP5: Conferences for high-throughput technologies for diagnostics

In WP5, beneficiary 4 (CMHI) together with beneficiary 1 (KI) organized a conference on massive parallel DNA sequencing and SNP technology and a conference on proteome-based high-throughput diagnostics in Warsaw, Poland.

Beneficiary four established the Polish Working Group for Primary Immunodeficiency (PID) in March 2005. The Group comprises the nine main Polish centres for the diagnosis and therapy of PID, covering the whole of Poland. The main objectives of the Group's activities are to increase awareness of PID among scientists, clinical immunologists, paediatricians and general practitioners, and to develop channels for the active dissemination of information about the novel technologies among patient organizations, media and public health groups, and governmental and non-governmental organizations. Information about the planned conference on massive parallel DNA sequencing and SNP technology and the conference on proteome-based high-throughput diagnostics was sent well in advance to departments of genetics in both the public and private sectors throughout the country. Information also appeared in medical journals and on university notice boards. Prior to both conferences, CMHI cooperated closely with the Polish Society of Experimental and Clinical Immunology, the Polish Genetic Society and the Society of Diagnostic Laboratories on the dissemination of information. Organizations and laboratories carrying out diagnostics in the fields of muscle disorders, growth deficiencies, hearing or vision impairments and metabolic defects were also invited to participate. Information about both conferences was published in a broad spectrum of medical journals and on websites, and was disseminated by mail to individual scientists.

The aim was to optimise efforts to achieve new diagnostic tools exploiting the knowledge of the human genome in combination with advanced read-out technology and to actively make these available in a disease-neutral way. The technological advances in high-throughput technologies that were implemented in the EURO-GENE-SCAN work were presented at both conferences. The scientific program was prepared by beneficiary one together with beneficiary four. Speakers carrying out frontline research in high-throughput technologies for the development of sensitive and reliable diagnostics were invited.

1. Conference on Massive Parallel DNA Sequencing and SNP Technology, chaired by Beneficiary one (Prof Edvard Smith), 12 May 2011, location: Stanislaw Staszic Palace, Nowy Swiat 72, Warsaw.
2. Conference on Proteome-based High-throughput Diagnostics, chaired by Beneficiary one (Prof Lennart Hammarström), 13 May 2011, location: Stanislaw Staszic Palace, Nowy Swiat 72, Warsaw.

Around 110 persons (medical doctors, researchers, students) participated. The highly professional presentations and discussions were fruitful and very useful for the participants. Owing to the great interest, which exceeded the available capacity, not all who wished to participate in both or even one of the conferences were able to attend.

Numerous inquiries about the possibility of presenting the subject once again led to the organization of an autumn meeting in the southern part of Poland. The second conference on massive parallel DNA sequencing and SNP technology and on proteome-based high-throughput diagnostics was held in Zakopane on 3 and 4 November 2011.

Furthermore, in WP5, beneficiary four (CMHI) together with beneficiary 1 (KI) organized a workshop on high-throughput technologies for diagnosis in Istanbul, from 3 to 6 October 2010, at the biennial meeting of the European Society for Immunodeficiencies (ESID).

WP6: Collecting test samples for massive parallel DNA sequencing and SNP technology.

In WP6, beneficiary four (CMHI) together with beneficiary 1 (KI) were responsible for generating a large set of samples from well-characterized patients, which were used to validate the Roche 454, the Solexa (Illumina 1G) and the SNP-chip based technologies. Beneficiary 4 collected 189 DNA samples corresponding to different PID diseases. More than 10 µg of DNA was prepared from each patient. The quality of each sample was checked by electrophoresis. The samples were shipped to Partner 1, and an additional quality check of the DNA samples was done at the Karolinska Institutet.

Genetic diagnosis of patients was originally done at the Department of Medical Genetics, CMHI, Warsaw; Department of Immunology at the Erasmus University in Rotterdam; Laboratory of Human Genetics of Infectious Diseases, University of Paris René Descartes-INSERM U550; Department of Infectious and Paediatric Immunology, Medical and Health Science Center, University of Debrecen, Hungary; Department of Paediatrics, Oncology, Hematology and Diabetology, Medical University, Lodz, Poland; Health Center - Genomed, Warsaw, Poland.

Potential impact:

The EURO-GENE-SCAN project focuses on innovative high-throughput technologies for genotyping and sequencing, including SNP analysis, in the area of genomics, as well as on new approaches in proteomics. We concentrate on primary immunodeficiency diseases and develop high-throughput diagnostic tools. This is of high importance because, owing to their rarity, PID patients are often misdiagnosed or neglected. Early diagnosis and response are essential, since delayed management of PIDs can lead to severe and irreversible complications or even death of a patient.

The diagnosis is difficult since PIDs are often multifactorial in the sense that they fall into subgroups with a similar phenotype but with different aetiology. Heterogeneous clinical presentation and/or locus heterogeneity is common. This also means that making a correct diagnosis is highly complex. The symptoms and signs presented by a patient may not be typical for a particular disease entity, and even when they are typical, there are frequently a number of genes that could cause a similar disease when mutated. Moreover, for many diseases there are genotype-phenotype correlations, meaning that symptoms and signs are influenced by the location and type of mutation.

Making an early and reliable diagnosis is often crucial for efficient treatment. This, of course, affects dramatically the quality of life of patients suffering from these diseases and also impacts heavily on their family members.

The implications of the identification of immunodeficiency genes/mutations are manifold. From a clinical point of view, mutation analysis, carrier detection and prenatal assessment have now become possible for many diseases. Gene/mutation identification also offers possibilities for the development of novel, far more accurate treatment regimens, such as gene therapy. Thus, the project has the potential to have a direct positive impact on the health of European citizens. Since the technology will also become cheaper, the social impact will be especially pronounced in those regions of Europe where the current costs for mutation detection are prohibitive. Furthermore, the implementation of adequate high-throughput diagnostic strategies will reduce the costs to society. It is possible that, by using high-throughput techniques, all 179 currently known PID genes can be analysed in a single run, considerably reducing current costs for mutation detection.

Another important point is that studies on the aetiology of these genetic diseases contribute to a better understanding of human immunological processes in both health and disease. Throughout the history of immunology, these 'experiments of nature' have been crucial for the understanding of the basic biology of the immune system.

List of websites:

http://www.gatc-biotech.com/en/about-us/research-development/euro-gene-scan.html