Sharing capacity across Europe in high-throughput sequencing technology to explore genetic variation in health and disease

Final Report Summary - GEUVADIS (Sharing capacity across Europe in high-throughput sequencing technology to explore genetic variation in health and disease)

Executive Summary:
An increasing number of research centres in Europe have access to the latest high-throughput next-generation sequencing technologies. Storing and analysing the large amount of data produced generate major challenges. Tackling these challenges requires extensive exchange of data, information and knowledge between a wide range of stakeholders. The GEUVADIS (Genetic EUropean VAriation in DISease) Consortium had four main aims: 1. Develop standards in quality control (QC) and assessment of sequence data (WP2). 2. Develop models for sequencing data storage, access and exchange (WP3). 3. Develop standards for the handling, analysis and interpretation of sequencing data from DNA and RNA (WP4 and 5). 4. Develop guidelines on the handling of ethical, legal and social implications (ELSI) of phenotype prediction from sequence variation (WP6).
Since the start of the project in October 2010, the Consortium has obtained high impact results in all these areas:
1: After a detailed assessment of how GEUVADIS laboratories manage QC of Exome and RNA sequencing processes, we concluded that although the procedures for sample quantification, target enrichment and sequencing library preparation were quite similar across laboratories, there were substantial differences in the subsequent data analysis pipelines with respect to filtering procedures, alignment of sequencing reads and variant calling. Secondly, we collaboratively established best practice in quality control and sequence analysis of mRNA and small RNA. We successfully applied these practices in a large GEUVADIS study (see 4.)
2: We set up and implemented a data access and exchange strategy for the GEUVADIS project RNA sequencing data. The data access scheme for the GEUVADIS RNA sequencing experiment is summarised in the scheme available at http://www.geuvadis.org/web/geuvadis/RNAseq-project#Data_Access. Secondly, NGS data storage and exchange recommendations were developed and reported: Minimum Information about the Next generation Sequencing Experiment (MINSEQE) reporting standard, MAGE-TAB submission format for NGS data, and NGS terms in EFO ontology. Analysis components lists have been created and tested internally at EBI and externally by other GEUVADIS consortium partners. GEUVADIS FTP site for the scientific data exchange was established at EBI and used successfully by all GEUVADIS partners. Five datasets generated within GEUVADIS consortium have been deposited into EBI archives and are publicly available. GEUVADIS Data Browser was created and successfully used for the analysis results visualization.
3: 500 RNA samples were distributed, and sequenced by seven laboratories, using highly standardised protocols and quality control measures. The experiments and the core data analysis took place in a highly collaborative manner. Analysis pipelines developed by GEUVADIS partners were used to deliver a high quality analysis: Lappalainen et al., Nature 2013, ‘t Hoen et al. Nat Biotech, 2013. Four subgroups were formed and collaborated on medical exome sequencing projects focused on Parkinson's disease, chronic inflammatory disorders, Fibromyalgia and intellectual disability. We also published and analysed a European-wide survey on existing exome sequencing technologies, tools, use and data storage, access and sharing policies. In addition, we have created a GEUVADIS European Exome Variant Server (GEEVS), hosted at P1, CRG, now accessible to the public at: http://geevs.crg.eu/.
4: We analysed ELSI aspects of phenotype prediction from sequence variation in various clinical situations and in direct-to-consumer (DTC) whole genome sequencing. We also produced a position paper and policy recommendations on ELSI-related policy guidelines regarding DTC sequencing. Finally, we performed a study of professional attitudes regarding large-scale genetic information generated through next generation sequencing in research.
During GEUVADIS, we organised/co-organised 11 Training Events and workshops, and co-organised two international conferences.
The project results were disseminated to the scientific community through 176 oral and 22 poster presentations, and 43 scientific articles. We also produced a podcast and a video targeting the general public, explaining the main objectives and results of GEUVADIS. The project received important press coverage (23 articles). Overall, the activities performed during the project duration enabled us to design and disseminate standards in operating procedures and biological/medical production and interpretation of RNA and exome sequence data in relation to clinical phenotypes. The consortium brought together cutting-edge knowledge and resources on medical genome sequencing at the European level, and allowed our researchers to develop and test new hypotheses on the genetic basis of disease. We have produced an unprecedented RNA database, as well as an exome variant server, both open to the whole scientific community. Through dissemination, training and ELSI-related activities, our Consortium strongly engaged with society and included all its activities in a wider setting than the restricted genomics research community. Indeed, the rise of sequencing technologies has a strong impact on the daily practice of medicine, on public health systems, and on society in general. Through GEUVADIS, we promoted an efficient, reasonable and responsible use of these technologies in research as well as in clinical practice.  

Project Context and Objectives:
The latest high-throughput next-generation sequencing technologies allow investigators to sequence entire human genomes and transcriptomes at an affordable price and within a short time frame. An increasing number of research centres in Europe have access to these technologies, in-house or through regional, national and international infrastructures. Storing, disseminating and analysing the large amount of data produced generate major challenges. Tackling these challenges requires extensive exchange of data, information and knowledge between sequencing centres, bioinformatics networks, the medical research community and the industry at the European level. The GEUVADIS (Genetic EUropean VAriation in DISease) Consortium has four main aims, each split into several sub-aims:

1. Develop standards in quality control (QC) and assessment of sequence data (WP2).
- To define standards for quality control of sequence from complete transcriptomes and exomes to identify rare mutations.
- To guarantee production of high quality sequence data across the participating centers.
- To make recommendations on standard procedures for harmonized quality control of DNA and RNA sequence data across European laboratories that are applying next generation sequencing technologies to transcriptome and exome sequencing.

2. Develop models for sequencing data storage, access and exchange (WP3)
- To coordinate efforts to develop standards and formats for data representation and exchange.
- To coordinate efforts to develop data structures for storing the vast amount of sequencing data and related metadata in databases
- To develop procedures and pipelines for depositing sequencing data and related metadata to durable archives at the European Bioinformatics Institute (EBI) and controlled data release to the users

3. Develop standards for the handling, analysis and interpretation of sequencing data from DNA and RNA (WP4 and 5)
WP4:
- To develop routines for large scale sequencing projects related to functional genomics
- To coordinate selection of 500 RNA pilot samples from existing collections with WP5
- To coordinate RNA sequencing of 500 samples
- To coordinate the analysis and interpretation of RNA sequencing data
- To extend these methodologies to other functional genomics datasets
WP5:
- To develop routines for large scale sequencing projects related to rare variants
- To coordinate selection of 500 DNA pilot samples from existing collections
- To coordinate exome selection and sequencing of 500 samples
- To coordinate the analysis and interpretation of sequencing data

4. Develop guidelines on the handling of ethical, legal and social implications of phenotype prediction from sequence variation (WP6)
- To identify specific ELSI issues relevant to phenotype prediction from sequence variation in various clinical situations, from the experience of the partners and from bibliographical work
- To map these issues among those already well addressed with established ethical and legal frameworks in genetics and to identify the gaps and difficulties on the three levels (ethics, law, social impact and acceptability)
- To propose policy guidelines for implementation of phenotype prediction from sequence variation into health systems
- To propose an analysis and a white paper on direct to consumer proposal of DNA sequences

Project Results:
WP1: Coordination and Communication Office

Communication and coordination
The CRG dedicated scientific project management team ensured the daily management and coordination of the project. All project activities were carefully monitored through an efficient communication strategy and through several efficient tools:
- Six specific Mailing Lists, through which we could target the recipients of specific information within one Work Package or other specific sub-project group, for example the RNA sequencing analysis group mailing list.
- Regular Teleconferences, with updates on the project's activities and discussion on WP tasks implementation and future collaborations.
- A Website, comprising a public page, including 'resources' sections dedicated to the general public, as well as a podcast explaining the main concept of the project, and a video published by the project XploreHealth.
- An intranet accessible to consortium members only, and gathering all useful information on the project; as well as a Wiki - created by the UNIGE team to monitor all activities within WP4
- 3 Press releases: the first one was published during the kick-off meeting, the second one was published when we gave open access to the WP4 generated RNAseq data, and the third one at the publication of our two collaborative high impact publications.
- Two Interim Reports, in which each partner specifically detailed their participation in all project WPs, were prepared and distributed to the SAB before the first and second Annual Meetings. These reports enabled us to monitor the progress of the project and to get specific feedback from the SAB on how to improve our efficiency within GEUVADIS.

Events
We successfully organized the kick-off meeting in Barcelona, the first annual meeting in Toulouse, and the second annual meeting in Santiago de Compostela, where the general assembly gathered and had an opportunity to discuss project activities with Professor Pui-Yan Kwok, our scientific advisor.
We also organized the final meeting as a satellite event of the Hands-on biobanks 2014 event in The Hague, the Netherlands, which GEUVADIS co-organized with the BBMRI project. There the general assembly gathered and had an opportunity to discuss the impact of the GEUVADIS project and the activities that could be put in place in the future.
The list of these events is detailed in the table below:

Type | Date | Comment
KO Meeting 2011 | 17/12/2010 | Barcelona, Spain
Annual Meeting 2012 | 28/11/2011 | Toulouse, France
Annual Meeting 2013 | 29/10/2012 | Santiago de Compostela, Spain
Final meeting 2013 | 20/11/2013 | The Hague, the Netherlands


WP2: Quality Control of Sequence Data

The aim of WP2 was to establish and disseminate standards for quality controls (QC) for Next Generation Sequencing of RNA and Exomes.

Task 2.1. Survey
Compare already established procedures for quality assessment of the sequence data among the participating laboratories with respect to transcriptome and exome sequencing using NGS platforms available within the consortium; through a systematic survey conducted among the participating sequencing laboratories.
During the first period the main focus has been on identifying areas in the analysis pipeline where quality control checks are important or even vital. We have considered the whole process, beginning with the starting material, library preparation and sequencing, continuing to data analysis and the establishment of the RNA/DNA sequence. Based on input from the other sequencing partners we have defined and listed important QC parameters that are currently applied amongst the partners. We have produced a survey document (D2.1 Quality control of sequence data.)
Based on the parameters and standards defined in D2.1, a free text survey was produced and distributed to the members of the GEUVADIS consortium. The main conclusion from the survey results was that the different laboratories, in general, had very similar laboratory procedures for sample quantification, target enrichment and sequencing library preparation. In contrast, there were substantial differences in the subsequent data analysis pipelines, for example with respect to filtering procedures, alignment of sequencing reads and variant calling.
These findings reflect the fast, continuous development of the various sequencing applications, both by the manufacturers of sequencing equipment as well as by the scientific community.
We also note that the manufacturers typically only support and provide warranty for their own sequencing kits and protocols, whereas the data analysis is very much dependent on the individual study, where the scientists have a plethora of software and parameters to tweak.

Task 2.2. Best practice
Define best-practice procedures for quality control of transcriptome and exome sequence data from the existing sequencing platforms.
As part of WP4, mRNAs and small RNAs of lymphoblastoid cell lines of 465 individuals were sequenced in a distributed manner by P1-CRG, P2-UNIGE, P3-HMGU, P6-MPIMG, P8-UU, P9-CAU and P11-LUMC. The RNA was extracted by P2-UNIGE and subsequently distributed randomly between the partners who sequenced the samples, following strict guidelines with respect to protocols and version of reagent kits to be used.
The main guidelines for standardized mRNA sequencing were to use HiSeq instruments for sequencing, to apply 75 bp paired-end sequencing with a fragment size of 280 bp and to aim at a coverage of 10 million paired reads. Illumina TruSeq protocols and reagents for library preparation, cluster generation and sequencing were used. All libraries were quality controlled by sizing using a Bioanalyzer instrument prior to sequencing.
The success of using this approach has so far manifested itself in two published articles in Nature (Lappalainen et al. (2013) Transcriptome and genome sequencing uncovers functional variation in humans. Nature, 501(7468), 506-511. DOI:10.1038/nature12531) and Nature Biotechnology (‘t Hoen et al.(2013) Reproducibility of high-throughput mRNA and small RNA sequencing across laboratories. Nat Biotech 31(11), 1015-1022. DOI:10.1038/nbt.2702) where the latter particularly focuses on quality control of RNA-seq experiments. The RNA sequence data produced by the seven centers showed that the biological variation between the individual samples was much larger than the variation between the data from the different labs (fig WP2.1) thereby demonstrating that distributed RNA-sequencing is feasible.

Task 2.3 Recommendation for the community
The recommended quality control parameters include monitoring the distribution of nucleotide-level quality scores; the average and distribution of GC content; the average and standard deviation of insert size; the percentage of reads mapping to annotated exons (at least 60% of the mapped reads should map to annotated exons); and 3’ to 5’ coverage bias. Measures should also be taken to detect sample swaps, contamination and outliers.
Table: Important quality checks in mRNA and sRNA sequencing (from ‘t Hoen et al., Nat Biotech, 2013)

Quality checks common to mRNA and sRNA sequencing
- Distribution of base quality scores
- Average and width of the distribution of GC content
- Percentage of reads mapping to the genome
- Checks for sample swaps and contaminations
- Outlier detection: pair-wise correlations in expression quantification between samples
Quality checks specific for mRNA
- The average and standard deviation of insert size
- Percentage of reads mapping to annotated exons
- 5′–3′ trends in coverage across transcripts
Quality checks specific for sRNA
- Length distribution after adaptor clipping
- Percentage of reads mapping to known sRNA genes
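
Some of the checks listed above can be sketched in a few lines of code. The example below is only an illustration and not the pipeline used by the consortium: the function names, the input layout and the correlation threshold are assumptions made for this sketch. Only the 60% minimum for reads mapping to annotated exons comes from the recommendation above.

```python
from math import sqrt
from statistics import mean, stdev


def gc_fraction(read):
    """Fraction of G/C among the called bases (A, C, G, T) of one read."""
    read = read.upper()
    called = sum(read.count(base) for base in "ACGT")
    if called == 0:
        return 0.0
    return (read.count("G") + read.count("C")) / called


def summarize_gc(reads):
    """Average and width (standard deviation) of per-read GC content."""
    fractions = [gc_fraction(r) for r in reads]
    width = stdev(fractions) if len(fractions) > 1 else 0.0
    return mean(fractions), width


def passes_exon_check(reads_mapped, reads_in_exons, min_fraction=0.60):
    """True if at least min_fraction of mapped reads fall in annotated exons."""
    if reads_mapped <= 0:
        return False
    return reads_in_exons / reads_mapped >= min_fraction


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def low_correlation_samples(expression, threshold):
    """Flag samples whose mean pair-wise correlation with all other samples
    is below threshold. `expression` maps a sample name to expression values
    in the same gene order; `threshold` is a hypothetical, study-specific cut-off."""
    flagged = []
    for name, values in expression.items():
        others = [pearson(values, v) for n, v in expression.items() if n != name]
        if others and mean(others) < threshold:
            flagged.append(name)
    return flagged
```

In practice such functions would be run on the reads and expression tables of each library, and flagged samples would be inspected manually rather than discarded automatically.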

WP3: Data Storage, Access and Exchange

Recommendations:
Next Generation Sequencing data storage and exchange recommendations have been developed and reported. These recommendations consist of three parts:
1. MINSEQE (Minimum Information about the Next gEneration Sequencing Experiment) reporting standard is established and used in ArrayExpress database EBI for NGS data.
2. MAGE-TAB submission format is used for NGS data submission and includes a number of recommendations how to describe NGS experiment.
3. NGS terms have been added into EFO – Experimental Factor Ontology in order to support qualitative NGS data annotation.

Analysis components
Analysis components lists have been created and successfully tested internally at EBI and externally by other GEUVADIS consortium members.
The standard analysis components for expression quantification are:
• Quality control procedures
• Alignment
• Summarization by features procedures
• Normalization procedures
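
To make the "Normalization procedures" component more concrete, the sketch below scales raw per-feature read counts to counts per million (CPM). The report does not state which normalization the consortium applied, so CPM is used here purely as a simple, commonly used example, and the function name and input layout are assumptions.

```python
def counts_per_million(counts):
    """Scale raw per-feature read counts of one library to counts per million.

    `counts` maps a feature identifier (e.g. a gene) to its raw read count.
    CPM is an illustrative choice, not necessarily the method used in GEUVADIS.
    """
    total = sum(counts.values())
    if total == 0:
        return {feature: 0.0 for feature in counts}
    return {feature: count * 1_000_000 / total for feature, count in counts.items()}
```

Scaling by library size in this way makes counts comparable between libraries sequenced to different depths, which is the purpose of the normalization step.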

The standard analysis components for the variant discovery are:
• Alignment
• SNP Calling
• Indel Calling
• Structural Variant Discovery

Data deposition
GEUVADIS FTP site for the scientific data exchange was established at EBI and used successfully by all GEUVADIS partners. Data from FTP site have been copied to EBI repositories according to agreements with GEUVADIS RNA-seq project (WP4) partners.
Five datasets generated within GEUVADIS consortium have been deposited into EBI archives and are publicly available:
• GEUVADIS main RNA-seq project mRNA data that passed quality control procedures (ArrayExpress accession E-GEUV-1),
• GEUVADIS main RNA-seq project small RNA data that passed quality control procedures (ArrayExpress accession E-GEUV-2),
• GEUVADIS main RNA-seq project all mRNA and small RNA-seq data regardless of quality control status (ArrayExpress accession E-GEUV-3),
• GEUVADIS pilot RNA-seq project mRNA data (ArrayExpress accession E-GEUV-4) and
• Dataset “Diagnostic Exome Sequencing in Persons with Severe Intellectual Disability” created by GEUVADIS partner RUNMC (Radboud University, Nijmegen Medical Center) from WP5 is deposited into EGA (The European Genome-phenome Archive) under submission EGAS00001000287, dataset's meta-data are available also through EBI ArrayExpress archive under accession E-GEUV-5.

Data visualization
GEUVADIS data browser was created especially for the GEUVADIS RNA-seq project to visualize quantification and QTL results and link them together according to the analysis type (transcript, exon, miRNA): http://www.ebi.ac.uk/Tools/geuvadis-das/
Links to the EBI archives with raw and mapped data, as well as links to the analysis results and genotypes, are available from the GEUVADIS data browser.

EVA (European Variation Archive)
The European Variation Archive is a new EBI project that is currently in development. GEUVADIS partners have submitted example datasets, participated in conference calls that addressed the exome server submission format, and agreed to share the data. EVA plans to follow up with the GEUVADIS project members to initiate new data submissions to the EVA once the submission interface is ready (spring 2014).

WP4: Handling, Analysis and Interpretation of RNA-sequence data and other functional datasets
The main aim of WP4 was to perform the sequencing and data analysis of 500 RNAseq samples and to use the data analysis and eQTL derivation as a means to develop and recommend data production and analysis standards for RNA sequencing.

Study design
The process of sample preparation, distribution to the 7 sites that performed the sequencing experiment, organization of data deposition and primary analysis, and ultimately the core data analysis towards a publication was monitored by UNIGE through a variety of means, including regular Teleconferences, constant update of the GEUVADIS Wiki site and a continuous collaboration with EBI. A specific mailing list was created for the analysis group, to target lab members from the GEUVADIS centers who were specifically involved in the sequencing data analysis. The Wiki site, hosted on the GEUVADIS intranet, is a web-based service which enables both project partners and external collaborators to enter all information relevant for the project.
We also hosted two specific analysis meetings in Geneva and Barcelona to agree on analysis plans, and distribute clear tasks to all team members.

Data analysis
After a collaborative and highly structured study design in the first half of the project, we have achieved the following goals:
1. Analysis of RNAseq data has been completed as part of a coordinated effort among many laboratories in Europe
2. Raw and processed data are available in a public repository for the community to freely use and download (http://www.geuvadis.org/web/geuvadis/RNAseq-project#Data_Access). There is an associated browser (http://www.ebi.ac.uk/Tools/geuvadis-das/) where researchers can visualize both the transcriptome data and the eQTLs discovered in this study.
3. A number of papers have been submitted and two of them have been published in Nature and Nature Biotechnology with the others in various stages of review process. These papers have generated a lot of interest from the community and have become a reference for transcriptome studies both in research and the medical setting.

Main scientific findings
The importance of this project lies in three domains:
1. We show that the distributed RNA sequencing between seven laboratories has yielded data of extremely high quality, with good replication between different laboratories. Thus, RNA sequencing technology is mature enough for large distributed studies. Furthermore, the shared expertise of the partner laboratories in the most recent methods for e.g. read mapping sets an example for the technical execution of future RNA sequencing studies.

2. The dataset that we have created is likely to become one of the most important transcriptome reference datasets for the human genomics community, given the already published genome sequences of these individuals by the 1000 Genomes project. In addition to making the raw data freely accessible, we are also developing more advanced data sharing and visualization approaches with Ensembl.

3. Finally, and most importantly, this project gives us novel insight on how both common and rare genetic variation contributes to quantitative and qualitative transcriptome variation in human populations – not only expression levels, but also e.g. splicing, miRNA-mRNA interactions, fusion genes, and RNA editing. We discover cis-eQTLs for over 8000 genes and alternative splicing QTLs for thousands of genes. The genome sequencing data gives us a more precise view of the spectrum and functional mechanisms of causal regulatory variation. Additionally, analyzing allelic expression and splicing shows how the majority of regulatory variation is rare in the population, thus highlighting the importance of large sample sizes and transcriptome analysis at the level of an individual. Finally, we validate the functional effects of hundreds of loss-of-function variants annotated in these genomes, but also show frequent compensation mechanisms, highlighting how transcriptome sequencing is often essential for proper functional interpretation of genetic variants.
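
As an illustration of what a cis-eQTL test measures, the sketch below regresses the expression of one gene on the genotype dosage (0, 1 or 2 copies of an allele) of one nearby variant. This is a minimal example under stated assumptions, not the analysis pipeline of the published study: the function name, the 0/1/2 dosage encoding and the use of an unadjusted simple linear regression are all choices made here for illustration. A real analysis would also normalize expression, include covariates and correct for multiple testing.

```python
def eqtl_effect(dosages, expression):
    """Least-squares slope and r-squared of expression regressed on dosage.

    `dosages` holds one allele count (0, 1 or 2) per individual and
    `expression` the expression value of one gene in the same individuals.
    """
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need matching inputs for at least three individuals")
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    syy = sum((y - mean_y) ** 2 for y in expression)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    if sxx == 0:
        raise ValueError("variant is monomorphic in this sample")
    slope = sxy / sxx  # change in expression per additional allele copy
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    return slope, r_squared
```

A slope far from zero with a high r-squared suggests that the variant is associated with expression of the gene, while the significance of that association would have to be assessed separately.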


WP5: Biological and medical interpretation of sequence data for rare variants

Sequencing tools in Europe
Sequence analysis pipelines have been established at all GEUVADIS partner sites. These pipelines, with few exceptions, use open source software for mapping of sequencing reads to a reference and for subsequent filtering steps to enrich the output variants for properties such as frequency, functionality or pathogenic potential. Beyond assessing the sequence analysis tools available among the GEUVADIS partners, we decided to run a much wider survey across Europe, with the aim of collecting comprehensive qualitative and quantitative information on the use of Next Generation Sequencing for research and clinical purposes in Europe.
The analysis of the survey results led to three major conclusions:
1. Exome sequencing is a key technology in human genetics research, and to a certain extent also in clinical genetics
2. Illumina HiSeq 2000, and more generally Illumina, is a key technology in human genomics. A focus on this technology is hence justified in Quality Control and data analysis standardization efforts such as the GEUVADIS WP2 and WP4 RNAseq study.
3. Data storage is mostly local, and sequencing data is not often shared. This gives a strong justification for the need to set up user-friendly, secured shared databases of high quality so that the most is made out of the sequencing data produced throughout Europe, especially in a context where most institutions producing sequencing data are publicly funded. Once again, this gives a strong justification for the data storage standardization efforts conducted within GEUVADIS WP3, and for databases such as the GEEVS and the EVA.

Patient consent and data access
Patient consent issues have been intensely discussed between partner sites (see also WP6 ELSI). Principles of necessary and eligible points to be addressed in a sequencing consent have been made available by WP6. Partner sites have established appropriate consent forms.
Access to sequencing data is related to the original patient consents, and the interpretation of such consents through investigators and ethics committees.

Sequencing consortia
Studies of both rare and common variant effects in medicine require large sample sizes. Sequencing projects in which genome-wide data are analyzed in cases and controls have shown that studying rare sequence variation with current sequencing technology requires individual-level data to be analyzed centrally. A pilot project within GEUVADIS focused on exome sequences in 500 cases with Parkinson disease. Analyses of rare variants identified in candidate genes have been performed and will inform the design of customized SNP arrays. Such array studies will in turn enable large-scale association studies testing for both common and rare variants.

GEEVS: The GEUVADIS European Exome Variant Server
Next Generation Sequencing (NGS) methods have paved the way for in-depth analysis of Mendelian and complex diseases. Identification of rare coding and splicing genetic variants is crucial for the identification of causal genes; hence for effective analysis of clinical samples it is indispensable to distinguish between rare and common variants. As a first step in this direction the 1000 genomes project and the NHLBI Exome Sequencing Project (ESP) have released population allele frequencies for SNPs based on several thousands of samples. However allele frequency (AF) for some SNPs can differ drastically between populations and geographical regions. Unfortunately these region specific alleles will pop up as seemingly rare when comparing variants identified in a sample to the common AF databases. In order to compensate for this issue population/region specific AF databases are required. Here we designed a unified database of human genetic variants in patients of European ancestry. We have collected aggregated SNP frequencies from various European regions, which are provided by the members of the GEUVADIS consortium. The database stores aggregated variant frequencies along with allele counts, and functional annotation based on Refseq, Ensembl and UCSC KnownGenes. We have designed a standard (‘best practice’) protocol for variant calling, quality control and variant aggregation across institutes. The aggregated AFs show a high correlation to 1000genomes and Exome Variant Server AFs. The data is made available via an easy to use web interface called GEUVADIS European Exome Variant Server (GEEVS) or via download at http://www.geuvadis.org/geevs. The website further allows for data submission from any European institute interested in participating in this effort. In December 2013 data from more than 1000 cases had been aggregated.
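
The aggregation of allele counts across institutes described above can be sketched as follows. This is a hedged illustration only, not the GEEVS implementation: the function name, the variant key format and the per-site input layout (alternate allele count and total number of alleles genotyped per variant) are assumptions made for this example.

```python
from collections import defaultdict


def aggregate_allele_frequencies(submissions):
    """Pool per-site allele counts into one aggregated frequency per variant.

    `submissions` is a list with one dict per contributing site, mapping a
    variant key (hypothetical format "chrom:pos:ref>alt") to a tuple
    (alternate allele count, total alleles genotyped at that site).
    """
    totals = defaultdict(lambda: [0, 0])
    for site in submissions:
        for variant, (alt_count, allele_number) in site.items():
            if alt_count < 0 or allele_number < alt_count:
                raise ValueError(f"inconsistent counts for {variant}")
            totals[variant][0] += alt_count
            totals[variant][1] += allele_number
    return {
        variant: {
            "ac": alt_count,
            "an": allele_number,
            "af": alt_count / allele_number if allele_number else 0.0,
        }
        for variant, (alt_count, allele_number) in totals.items()
    }
```

Summing counts before dividing weights each site by the number of alleles it contributed, and it only requires sites to share aggregated counts rather than individual-level genotypes.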

WP6: Ethical, Legal and Social Issues of Phenotype prediction from Sequence Variation

Europe has the capacity to play a key role in the integration of sequence data with related phenotypic information, to improve understanding of disease and advance diagnostics and therapy development; this must be implemented in a responsible way and within a harmonized ethics framework that will ensure that these technologies are used for the service of patients and with respect for fundamental values. Much work had already been done on ELSI in genetics and genomics, so the work concentrated on the issues that challenge the classical framework through the developing capacities of sequencing, both for research and for translation into the health systems of European countries. These issues relate to privacy, data protection, results interpretation and communication, and availability of sequence data. In this context we first produced documents to help harmonize the practices in ethical aspects of sequencing within the project through templates for informed consent and material transfer agreements, and, where relevant, an agreed ethics policy was set up and transparently communicated through the website (D6.1). We contributed to the analysis of issues in using high throughput sequencing (Soulier et al. book chapter 2012; D6.3) and to the establishment, through the European Society of Human Genetics and through the Public health genomics European network, of European recommendations for the use of high throughput sequencing in research and clinics (van El et al. Eur J Hum Genet. 2013 Jun;21(6):580-4; Howard et al. Public health genomics, 2013, 16(3):100-9; D6.5). We participated in the international debate about return of results and incidental findings in research and in the clinical context in collaboration with other FP7 EU projects or international consortia such as PHGENII, TECHGENE, ESGI, ICGC, EUROGENTEST, 3Gb-TEST, P3G.
We explored the views, opinions and self-reflexivity of scientists participating in the project about having their genome sequenced for research and the results put in a database; in addition to raising awareness and giving them the possibility to express their views, this led to a publication accepted in Journal of Empirical Research on Human Research Ethics. We also contributed to the general reflection concerning the necessity of an international infrastructure for coordinating and improving the efficiency of ELSI analyses in relation to genomics and society (Kaye et al. Science, 2012, 336, 673-674) and to policy analyses (Meslin et al. Clin Transl Med. 2013 Jul 27;2(1):14). These activities of mapping and coordinating work in the ELSI domain relevant to phenotype prediction from sequence variation allowed us to establish a list of experts and events relevant for the domain (D6.2), to propose training modules, events and resources (D6.4), and to contribute to policy aspects through answers to public consultations on policy documents. Through the organization of several expert meetings (MS19, 20, 21) and educational events we contributed to updating the community on the evolving data protection framework in the EU, to establishing the state of the art on the market of sequencing offered directly to consumers, and to prudent policy in this domain. Finally, this coordination action was successful in allowing the scientific and medical community involved to contribute in a timely manner to the ethical debates and framework that will keep the patient at the center of technology translation.

WP7: Dissemination and Training

In the past few years, next generation sequencing technologies have become mature and widely adopted, both in research and clinical settings. Still, there are many open questions around data standards, quality control, data analysis, and privacy. The GEUVADIS consortium addressed many of these open questions in expert workshops and network meetings.
The GEUVADIS consortium has been very active in dissemination and training. We have put significant effort into increasing awareness among the general public of the impact and ethical implications of the use of NGS technologies. This has been achieved through 3 press releases, 24 general press publications, and 16 oral presentations for the general public. These presentations mainly aimed to educate the general public about NGS technology itself, what people can expect from it, and how it will change diagnostics, health care and treatment. Ethical concerns and privacy and legal issues received special attention during these lectures.
The introduction of NGS technology in research and in clinical settings created a high demand for training. We have organized strong training programs for basic and clinical researchers and for clinicians through 11 training workshops. These were mainly focused on data analysis routines and (clinical) interpretation of NGS data.
We have also been at the forefront of the ethical and legal discussions associated with NGS technology through the organization of 3 expert workshops with participants from all over the world.
Scientific work and coordinating activities achieved wide exposure at scientific conferences, including conferences on genetics, bioinformatics, medicine, law and bioethics. The GEUVADIS consortium presented a total of 176 scientific talks and 22 posters. Much of the research and discussion has been laid down in a total of 43 scientific publications. The average impact factor of these publications was 12.6, indicative of the high quality, impact and outreach of the work. Uniquely to this project, data associated with the RNA-seq publications (WP4) was made available to the community at a very early stage, more than half a year before publication in scientific journals. This early access has spurred further analysis by the scientific community, as is evident from the more than 300 times the data has been downloaded so far. See: http://www.ebi.ac.uk/Tools/geuvadis-das/ and http://www.geuvadis.org/web/geuvadis/rnaseq-project

Again uniquely to this project, the wiki page in which GEUVADIS researchers archived RNA-seq analysis methods and routines has been opened up to the community to achieve full transparency and to enhance reproducibility of the results. See: http://geuvadiswiki.crg.es/index.php/Main_Page

Finally, the GEUVADIS European Exome Variant Server (http://geevs.crg.eu/) has been set up (WP5) and opened to the community for the sharing of genetic variants in a secure and privacy-aware manner. This effort has set a standard in the field and increased the visibility of the GEUVADIS consortium.

Potential Impact:
Impact of our scientific results on the community

One of the main scientific achievements of the GEUVADIS project is its RNA study. This study involved RNA sequencing in seven European partner laboratories. In this project, we have sequenced mRNA and micro-RNA of 500 samples from the 1000 Genomes project, which is thus far the largest study worldwide to combine human transcriptome and genome sequencing data. The project was designed in 2010, the data was collected in 2011-2012, and the analysis of the data was completed by the end of 2012. Two papers appeared in 2013 (Lappalainen et al. Nature and 't Hoen et al. Nature Biotechnology) and a number of other papers have been submitted.

The work under this WP is going to have a fundamental impact on the way we use transcriptome data for medical studies. We have demonstrated that distributing RNA sequencing among different laboratories is feasible. This possibility may be particularly attractive for large population-based and cross-biobank studies, where sequencing efficiency and sample logistics may require the combination of data from different laboratories. We have shown that it is feasible to carry out RNA-seq studies at large scale and to discover substantial biological signal, and have thereby shown the value and high information content of the transcriptome. We have developed a number of methodologies and obtained deep biological insights from which the community will benefit in future studies. Given these results, we anticipate that transcriptome analysis will become a key assay for diagnosis and prognosis in regular medical practice.
These methodologies will also have a higher impact than those produced in smaller-scale studies because we developed them in the context of a consortium that involved the two internationally leading sequencing technology companies to date, namely Illumina and Life Technologies.

This high impact can be illustrated by the article metrics, which can be taken directly from the publishers’ websites. Our Lappalainen et al. publication in Nature has been viewed more than 43 thousand times to date, and has been reported on online more than 140 times. It has also already accumulated many citations, which can be interpreted as a consequence of the early release of the GEUVADIS data one year before the publication was out. The 't Hoen et al. publication has been viewed more than 10 thousand times, and disseminated online 112 times.


Impact of our scientific tools on the community

The GEUVADIS RNA sequencing study was also a success from the perspective of making data public. Specifically, it served as a successful example of a large collaborative project that produced high-quality long RNA-seq and small RNA-seq datasets, made the raw (fastq format) and processed (bam format) files publicly available, published all protocols and analysis results, and created a user-friendly data browser for the analysis results. We expect this example to influence the future of data deposition.

The GEUVADIS data browser was visited more than 1000 times during the first two months of its existence.

The GEUVADIS datasets from the main RNA-seq project have been downloaded around 1000 times since they were released, which makes them the most popular RNA-seq studies in the EBI ArrayExpress archive. This result is particularly significant financially, since those who download the data need a large budget to store it. The total budget needed to store the GEUVADIS data 1000 times largely exceeds the total GEUVADIS budget. Based notably on this simple observation, we claim that we have managed to generate highly valuable data with relatively limited funding, complemented by significant investments from the partner institutions.

As described above, we have set up a GEUVADIS European Exome Variant Server (GEEVS).
The GEEVS is a unified database of human genetic variants in patients of European ancestry. We have collected aggregated SNP frequencies from various European regions, provided by the members of the GEUVADIS consortium. The database stores aggregated variant frequencies along with allele counts, and functional annotation based on RefSeq, Ensembl and UCSC KnownGenes. We have designed a standard (‘best practice’) protocol for variant calling, quality control and variant aggregation across institutes. The data are available through an easy-to-use web interface or via download at http://www.geuvadis.org/geevs. The website also allows data submission from any European institute interested in participating in this effort. In December 2013, data from more than 1000 cases had been aggregated. We foresee that this database will be highly useful for the community, and will significantly impact the way exome-based clinical studies are run in the future.

Impact of the project on society

We see the adoption of standards for quality control, data analysis, data protection and secure data sharing as the most important GEUVADIS achievement and as essential for the implementation of NGS technology in routine diagnostics and health care. GEUVADIS has played a significant role in the societal debate on the impact and the ethical and legal consequences of the introduction of NGS technology, which are essential aspects of its successful implementation. Successful training of biomedical and clinical researchers and clinicians should have helped to ensure the quality of the analysis and interpretation of NGS data.

Indeed, during the course of the project, we have given informative talks at several local secondary education centers. We also held numerous open days in which students were able to see the centers' facilities and talk with GEUVADIS researchers about the general objective of the project and its potential impact on the future of genomic medicine. GEUVADIS activities were also discussed during a number of lectures for biomedical science and medical students, in addition to the numerous courses and workshops we organized for students and professionals. Materials for these courses and workshops were always distributed to the participants.
In addition to the dedicated section for the general public on the GEUVADIS website, J. Veltman wrote a chapter in the book "Wetenschappelijke doorbraken de klas in" or “Scientific breakthroughs in the classroom!” by Peeters et al., ISBN 978-90-818461-1-0.

The podcast we produced on the GEUVADIS project, its objectives and main activities, accessible at http://www.geuvadis.org/web/geuvadis/podcast has been viewed more than 500 times.

The EU project XploreHealth produced a video about GEUVADIS entitled ‘do we speak the same genome’, which had more than 9,000 views in December 2013 and more than 10,000 views in February 2014. This video was produced by the Reporters Company http://reporters.com.es/. The company won an excellence prize for this video: “Premio a una Obra Periodística sobre Bioética de la Fundación Víctor Grífols i Lucas”, or the Prize for a Journalistic Work on Bioethics from the Victor Grifols y Lucas Foundation. http://www.fundaciogrifols.org/portal/es/2/24857/ctnt/dD7/_/_/8z74/Premios-y-Becas-sobre-Bio%C3%A9tica-2012-2013.html. Since the video is accessible on main platforms such as YouTube and Dailymotion, we expect that many viewers will see it after the end of the project.


GEUVADIS after GEUVADIS

The QC guidelines we have produced in the context of the RNA sequencing study will be added to the BBMRI web resource (http://bbmri.eu/), which is updated regularly.

There is also a strong ongoing collaboration between the GEUVADIS RNA sequencing data analysis team and the NIH Genotype-Tissue Expression (GTEx) program, which studies gene expression in human tissues and thus provides a highly complementary approach to the one we adopted in GEUVADIS, which focused on human lymphoblastoid cell lines. GTEx is also the largest publicly funded RNA sequencing project in the world. A number of GEUVADIS researchers have had an important role in this project by bringing the expertise they gathered in the GEUVADIS effort.

Since we are convinced of the potentially strong impact of the GEEVS on the community, we are planning to apply for specific follow-up funding to be able to maintain it at the CRG and extend it significantly by allowing any researcher to submit their data to the server. In addition, we have ensured a strong integration with the EVA (European Variation Archive), which is a fully funded project of the EBI.

Finally, the results obtained in GEUVADIS have also greatly impacted several participating researchers’ careers. Indeed, two postdoctoral researchers, Tuuli Lappalainen, who drove the RNA sequencing study, and Marc Friedlander, who took care of the micro RNA data analysis, obtained independent positions based in large part on the results they obtained in GEUVADIS. In addition, several new collaborations in clinical exome sequencing stemmed from the project.

Moreover, several researchers have been called as experts to monitor large-scale, publicly funded national and international sequencing efforts. This is highly significant, since it shows that the expertise gathered in GEUVADIS has been recognised by high-level decision makers, who are setting the stage for the implementation of clinical genomics at the national, European and international level.


List of Websites:
www.geuvadis.org