Opening Estonian Genome Project for European Research Area

Final Report Summary - OPENGENE (Opening Estonian genome project for European research area)

Executive summary:

The OPENGENE grant no. 245536 (FP7-RegPot-2009-1), awarded to the Estonian Genome Centre (EGC) of the University of Tartu, had a huge impact on the development of the EGC in 2009-2012. This funding instrument was an excellent fit for the situation the EGC was in: a new genome centre with a large biobank, but with too few leading scientists, an outdated technology platform and little recognition in the main European research centres. With the grant we were able to hire several exceptionally good scientists from Cambridge, Oxford, Uppsala, MIT, Helsinki and Singapore, purchase a new Illumina genome analysis platform for genotyping and sequencing, and send our scientists on long- and short-term visits to the leading genome centres in Europe and North America. Finally, by supporting a few conferences and workshops in Estonia on genomics and ethics, we could invite more than 50 top scientists from all over the world to visit the EGC in Tartu, where they all gave lectures and moderated workshops. All this has resulted in new research grants from the EU, the NIH and elsewhere, and has also made us more competitive in Estonian grant applications. The number of publications and citations has risen sharply since 2010; we now publish over 30 papers per year, have applied for one patent and have started collaborating with industry. Moreover, our competence in basic research, genome technology and bioinformatics / statistics has extended from science to practical medicine in two ways. First, we are sequencing the genomes of patients (mostly children) in difficult cases where the cause or mechanism of the disease is not revealed by other diagnostic methods. Second, we have made preparations for the implementation of precision medicine in Estonia, which will use genomic variants for individual disease risk and drug response predictions.

This investment in the EGC has also made it sustainable for the future. The EGC was a leading partner in the consortium 'Translational Genomics', funded by the University of Tartu development fund (EUR 3.5 million for 2012-2016); the EGC has two new FP7 grants from 2013 and one from the NIH, and a few more are under negotiation.

The large investment from FP7-RegPot was also instrumental in our move in 2012 into a new laboratory building, which had been designed according to the requirements of the biobank and the genome technology core lab. The transition from 2009 to 2012 was profound, and in large part it happened because the EGC was awarded this grant.

In conclusion, if we look at the strengths, weaknesses, opportunities and threats (SWOT) analysis presented in the application, it is apparent that we are now much stronger, the weaknesses are no longer there, the opportunities were used in full (and even more so!) and the threats did not materialise at all.

Project context and objectives:

The aim of the project OPENGENE (FP7-REGPOT-2009-1 grant no 245536) was to open the resources of the Estonian Genome Centre, University of Tartu (EGCUT) to the European Research Area (ERA), making it more accessible in terms of upgrading the research infrastructure, promoting further integration into pan-European research and development (R&D) networking and improving research capacity, making it more attractive as a potential partner in scientific research.

To increase the research potential of the EGCUT, three major goals were established.

The first was to purchase an Illumina genotyping system iScan. This instrument is the first in Estonia (and in the Baltic region) and has increased the research potential and competitiveness in the ERA enormously.

The second task was to strengthen human capital in the region by recruiting experienced researchers from abroad and exchanging ideas and researchers at different levels with other European universities, institutes and laboratories. The plan also includes organising three international workshops on genomics, biobanking and related fields, two international conferences on genomics and two conferences in bioethics.

The third task of the proposal was to increase the visibility of the EGCUT among scientists, research partners and the Estonian people.

Description of the work performed and main results

A brief summary of the work performed since the beginning of the project is presented below by work package (WP):

WP1: We have hired two experienced researchers, one from Cambridge, UK (Medical Research Council) and one from Uppsala University, Sweden, Dept. of Medical Sciences.
WP2: We purchased the Illumina iScan platform and hired a researcher to run the instrument.
WP3: Our scientists have made altogether 21 short-term visits to leading human genomics research institutions in Europe and beyond to establish collaboration and to introduce the EGCUT and its infrastructure.

The visits included, among others, the University of Geneva Medical School (Switzerland), the Wellcome Trust Sanger Institute (UK), the Helmholtz Centre, Munich (Germany), Erasmus University Medical Centre (the Netherlands), and McGill University and Génome Québec Innovation Centre (Canada).

Since the beginning of the project, four post-doctoral level researchers and 8 MSc level experienced researchers from the EGCUT have made long-term visits to top centres such as Imperial College London (UK), the University of Geneva Medical School and Lausanne University (Switzerland) and the Broad Institute (USA), with the aim of strengthening our know-how and tightening the collaboration.

The number of ISI publications that refer to the OPENGENE grant has risen to 34, and we have submitted 34 joint international grant applications, of which 7 have been funded. The new international grants play an important role in sustainability and promote further integration into the pan-European R&D network.

WP4: Dissemination and promotional activities included being visible at the European level by participating with a stand at the meetings of the European Society of Human Genetics (ESHG). We introduced the EGCUT and its resources at a booth at the exhibitions of the ESHG meetings in Gothenburg in 2010, in Amsterdam in 2011 and in Nürnberg in 2013. To promote our activities in Estonia and keep the webpage updated, we hired a web editor / public relations (PR) specialist.

WP5: We have organised three international practical courses, two international conferences on genomics ('GENEFORUM') and two international conferences on ethics. 18 EGCUT scientists and 5 scientists from the ethics group have presented their research results, established new contacts and discussed collaboration during their visits to international seminars and conferences.

The 3rd OPENGENE workshop was held in the recently opened new building of the EGCUT in Tartu.

WP6: For the smooth and efficient management of all WPs of the project, one full-time project manager was hired. The international steering committee of OPENGENE project has met 2 times in Estonia and evaluated the project as successful and well-managed.

Project results:

Technology development

Dr Lili Milani, one of the new recruits from Uppsala University (Sweden), has several years of experience in genome-wide studies of deoxyribonucleic acid (DNA) methylation in childhood leukemias and in pharmacogenetics. During her doctoral (PhD) studies at Uppsala University, she worked closely with the research group of Professor Magnus Ingelman-Sundberg at the Department of Physiology and Pharmacology at Karolinska Institutet, and this collaboration has now been brought to the Estonian Genome Centre. Together with Maxim Ivanov, a post-doctoral researcher in this group, Lili Milani developed a new and improved method for sequence capture of bisulphite-treated DNA. DNA methylation is one of the most important epigenetic alterations involved in the control of gene expression, and bisulphite sequencing of genomic DNA is currently the only method to study DNA methylation patterns at single-nucleotide resolution. The new method was important for the cost-efficient sequencing of genes of interest, in our case genes involved in the absorption, distribution, metabolism and excretion of drugs. Unravelling the genetic and epigenetic variation regulating the activity of these drug transporters and metabolising enzymes is crucial for individualised drug therapy, in order to avoid overtreatment and unnecessary side effects of drugs. The method was published in 'Nucleic Acids Research' at the beginning of 2013.

In addition to the survey of DNA methylation, we implemented a recently published method for the genome-wide analysis of 5-hydroxymethylcytosine (5hmC), which is a novel epigenetic mark in mammalian DNA that plays an important role in the control of gene expression. In collaboration with Chunxiao Song, a PhD student in the research group of Professor Chuan He at the Department of Chemistry, University of Chicago, we implemented their target capture method for the detection of 5hmC in human adult and fetal liver samples. By genome-wide mapping of the distribution of 5hmC in human liver samples by next-generation sequencing we detected significant differences in the fetal and adult livers. In adult livers, 5hmC occupancy was overrepresented in genes involved in active catabolic and metabolic processes, whereas the majority of hydroxymethylated genes in fetal livers were more specific to pathways for differentiation and development. Taken together, these findings suggest that 5hmC has an important role in the development and function of the human liver. Moreover, the extent of hydroxymethylation of genes may potentially explain interindividual differences in hepatic function, particularly differences observed in individual responses to drug metabolism and toxicity as well as in susceptibility to liver diseases. The study is now finalised and ready for submission.

Development of the genotyping and sequencing core facility of the EGCUT

Dr Lili Milani's expertise in sequencing technology also facilitated the development of the genotyping and sequencing core facility of the EGCUT. She built up a strong team of lab technicians and specialists (several with a PhD degree in genetics), for the implementation of the latest technologies for whole genome, whole transcriptome, and focused exome sequencing, as well as methods for the study of epigenetics, such as ChIPseq and whole genome and targeted bisulfite sequencing. Protocols were also developed for the deep sequencing of PCR products, increasing the speed and reducing the costs of traditional Sanger sequencing.

The HiScanSQ also allows the analysis of gene expression and DNA methylation using microarrays, which are excellent for the study of large numbers of samples. In collaboration with the group of Professor Pärt Peterson at the Department of General and Molecular Pathology, Faculty of Medicine, University of Tartu, we designed a study for the genome-wide analysis of the regulation of gene expression in immune cells. Human blood contains various subtypes of immune cells with different gene expression profiles and functional roles. Monocytes, together with T and B lymphocytes, orchestrate the majority of immune regulation and are implicated in many pathological and age-related changes of the immune system. Within our project we focused on the analysis of expression and epigenomic changes in three peripheral blood subpopulations: CD14+ monocytes, and CD4+ and CD8+ T-cells. Monocytes represent approximately 2-8 % of human peripheral blood leukocytes and are needed to replenish tissue macrophages and dendritic cells. CD4+ cells (4-20 % of leukocytes) function as regulators of the cell-mediated immune response, and CD8+ cells (2-11 % of leukocytes) have a cytolytic function, killing virus-infected cells. Using AutoMACS equipment and magnetic beads, prof. Peterson's group purified the three cell types (CD14+ monocytes, CD4+ and CD8+ T-cells) from over 500 recontacted individuals from the Estonian biobank. We selected 313 of the most informative individuals for genome-wide ribonucleic acid (RNA) expression profiling, using microarrays from Illumina. In addition, we selected a subset of 100 donors (the 50 youngest and 50 oldest individuals) for the genome-wide analysis of the DNA methylation status of 450 000 CpG sites (Illumina Infinium 450K CpG Methylation Array). The collected data characterise genomic CpG methylation together with gene expression in two T-cell subsets.
The data are currently under bioinformatic analysis together with expression and methylation analysis from whole blood samples and will be further combined with other datasets (SNP information, metabolomic studies, clinical laboratory information). The initial methylation analysis for CD4+ and CD8+ T-cells and for whole blood samples revealed genes that are clearly methylated with age, and genes that lose methylation during ageing. The highest number of differentially methylated sites was found in CD8+ cells, which is the cell population in charge of our adaptive immunity, i.e. the most exposed cell type. The epigenetic markers for ageing are excellent candidates for further study; we are currently exploring the effects of different behavioural phenotypes in the methylation of these genes, in order to identify the effects of smoking or high BMI on our 'molecular age'. A manuscript summarising these results is currently being written.

In addition to the study focusing on immune cells, we also carried out a pilot study of 96 well-selected asthma cases and 96 age-, sex- and BMI-matched controls from the Estonian Biobank. We sought to determine differentially methylated genes in whole blood, using the 450K methylation arrays from Illumina. Although the analysis yielded several interesting candidate genes (incl. TRAPPC9, IGF1R and DICER1), none of the differences were significant after adjustment for multiple testing. Our main conclusion was that there is too much noise in whole blood to identify specific methylation effects: either much larger sample sizes or purified blood cell populations are needed. This dataset is still under analysis; we are testing different algorithms that correct for the proportions of different cell populations using DNA methylation levels of specific CpG sites.
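The adjustment for multiple testing mentioned above can be illustrated with a short sketch. The report does not state which correction was applied, so the Benjamini-Hochberg false discovery rate procedure and the Bonferroni correction shown here are assumptions chosen for illustration, and the p-values are made up.

```python
def bonferroni(p_values):
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-CpG p-values (illustration only, not study data).
example_p = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(example_p))
print(bonferroni(example_p))
```

With 450 000 CpG sites tested, even a nominal p-value of 1e-6 is multiplied by 450 000 under Bonferroni, which is why modest candidate signals in a pilot of 96 cases and 96 controls can fail to survive correction.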

Genotyping and sequencing service

In addition to our own projects, the high-throughput HiScanSQ instrument has been available for projects of many other laboratories at the University of Tartu, and even for other universities in nearby countries. We have, for example, provided SNP genotyping service for groups at the Department of Biotechnology, University of Tartu (prof. Maris Laan, prof. Ants Kurg); the Margarete Fischer-Bosch Institute of Clinical Pharmacology, Stuttgart, Germany (prof. Matthias Schwab); the National Institute for Health and Welfare, Finland (prof. Markus Perola); and the Department of Psychiatry, University of Helsinki, Finland (prof. Tiina Paunio), as well as many other laboratories in Germany, Denmark, the UK, the Netherlands, Latvia, Lithuania, Hungary, the Czech Republic and Russia. The Human 450K DNA methylation arrays from Illumina have also been ordered by some of the groups mentioned above, and RNA expression analyses on Human HT-12 v4 microarrays have been ordered by laboratories at the University of Tartu and a few groups in Finland. In cases of very severe phenotypes, exome sequencing has become very popular, as it costs only a fraction of the sequencing of an entire genome, while the causative mutations are very likely to be found in the protein-coding regions of genes. Professor Ants Kurg from the Department of Biotechnology, University of Tartu, has sequenced over 24 whole exomes at our core facility within the EU project CHERISH, focusing on children with mental retardation. For Professor Pärt Peterson from the Department of General and Molecular Pathology, University of Tartu, we also sequenced the entire exomes of two immunologically well-characterised families with children suffering from chronic mucocutaneous candidiasis. We were able to identify the causative mutations, which were de novo mutations not present in the genes of the parents.
As a next step we are designing ChIPseq experiments in order to reveal how the mutated transcription factor affects the downstream expression of its target genes.

We have also been able to provide sequencing service for groups from other departments at the University of Tartu, for example scientists from the Institute of Ecology and Earth Sciences have sequenced the spores from a mycorrhizal fungus (Dr Maarja Öpik, Prof. Martin Zobel). We have also aided Dr Eve Vedler and Prof. Ain Heinaru, from the Department of Genetics, to sequence the bacterial genomes of different Pseudomonas species and biotypes, and Prof. Andres Merits to sequence several viral genomes at our facility. In addition to the projects where we have been involved in the experimental design, we have also provided sequencing service only to others who have been confident enough to carry out the sample preparation on their own. The flexibility of the sequencing technology is of tremendous value to all researchers in the fields of genomics, transcriptomics, and epigenomics of virtually any living organism. And having the equipment available in Tartu has not only raised the quality of our own research and expanded our knowledge, but also been of assistance to many other research groups here, and has definitely served as a boost for research in our region.

Participation in international consortia for genomics

1. GIANT consortium: The scientists of the EGC, under the leadership of Dr Krista Fischer, who is the head of statistics and was recruited from the University of Cambridge, have actively participated in the activities of the WAIST group of the consortium. The first stage (2010) involved genome-wide association analysis of the EGCUT data (which became possible thanks to the genotyping of more than 3 000 samples with genome-wide arrays and 2 600 samples with the Cardio-Metabochip array). The second stage (2011-2012) involved participation in the meta-analysis of the data. As the consortium is large and many cohorts were involved, the meta-analysis project became quite complex and lengthy, and her statistical expertise was very useful. The publications are at the final pre-submission stage, but we have also co-authored several presentations at large international meetings.
In addition to GIANT Waist subproject she also provided input to GIANT BODY SHAPE project, looking for genetic markers that are associated with main body shape parameters.

2. ENGAGE consortium: Mendelian randomisation (since 2011) and pleiotropy (2012) projects. These are the projects where her expertise in statistics and causal inference was very important and highly regarded.

a) The large-scale Mendelian randomisation project uses body-mass related genotypes (at the first stage FTO only; at the second stage a genetic multi-marker score) to estimate the causal effects of obesity on a large number of disease traits. The analysis is done in more than 20 cohorts, and the results are combined afterwards in a meta-analysis, so that the final sample size exceeds 120 000. Methodologically this is a challenging project, as instrumental variable estimation techniques are non-standard methods that require specific assumptions even when applied in a single cohort. An additional challenge is the use of meta-analysis in such projects. The main paper has been submitted to PLOS Medicine, but the project has also created several side-projects. Prof. Metspalu presented several methodological findings and discussions at the International Biometric Society conference in Kobe, Japan (August 2012). The poster at the meeting of the International Society for Human Genetics in Montreal, 2011, was also on Mendelian randomisation-related topics, indicating differences and similarities between the causal analysis of classical randomised controlled trials and Mendelian randomisation studies. In addition, Dr Krista Fischer has co-authored several project-related posters, presentations and articles in peer-reviewed journals.
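The instrumental variable idea behind the Mendelian randomisation project can be sketched with the simplest estimator, the Wald ratio. The report does not say which estimator the consortium used; the function below, its delta-method standard error and the numbers in the example are assumptions for illustration only.

```python
import math


def wald_ratio(beta_gx, se_gx, beta_gy, se_gy):
    """Wald ratio estimate of the causal effect of an exposure on an outcome.

    beta_gx, se_gx: per-allele effect of the genotype on the exposure (e.g. BMI)
    beta_gy, se_gy: per-allele effect of the genotype on the outcome
    Returns (estimate, standard_error). The standard error uses a first-order
    delta method and treats the two input estimates as independent.
    """
    if beta_gx == 0:
        raise ValueError("genotype must be associated with the exposure")
    estimate = beta_gy / beta_gx
    variance = (se_gy ** 2) / beta_gx ** 2 \
        + (beta_gy ** 2) * (se_gx ** 2) / beta_gx ** 4
    return estimate, math.sqrt(variance)


# Hypothetical summary statistics for a single cohort (not study data).
estimate, se = wald_ratio(beta_gx=0.4, se_gx=0.05, beta_gy=0.02, se_gy=0.01)
low, high = estimate - 1.96 * se, estimate + 1.96 * se
print(f"causal effect per unit of exposure: {estimate:.3f} "
      f"(95% CI {low:.3f} to {high:.3f})")
```

The estimate is only interpretable as causal under the usual instrumental variable assumptions: the genotype affects the exposure, is not associated with confounders, and influences the outcome only through the exposure.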

b) The pleiotropy project also uses novel statistical methodology to identify sets of possibly related traits that are regulated by the same genetic mechanisms. The idea is to use statistical models where the genotype takes the role of the outcome variable and the traits of interest take the roles of independent explanatory variables. If the genotype has an effect on a certain set of traits, the same set of traits would be selected as the optimal model for the genotype, using the Bayesian information criterion for model selection. The main traits of interest are the ones related to metabolic syndrome: body mass and obesity, systolic blood pressure, lipid levels, as well as cardio-metabolic diseases. She proposed ideas to simplify the analysis, as well as to perform a more efficient meta-analysis. The initial pilot phase of the study was successfully completed and the results were presented at the American Society for Human Genetics meeting in November 2012 (San Francisco).
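The reverse-regression idea of the pleiotropy project (genotype as outcome, traits as explanatory variables, model chosen by the Bayesian information criterion) can be sketched as follows. This is a minimal illustration, assuming ordinary least squares and an exhaustive search over trait subsets. The trait names, effect sizes and simulated data are hypothetical, and the effects are deliberately exaggerated so that the example is stable.

```python
import itertools
import math
import random


def ols_rss(columns, y):
    """Residual sum of squares of y regressed on an intercept plus `columns`."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented normal equations [X'X | X'y], solved by Gauss-Jordan elimination.
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    beta = [A[r][p] / A[r][r] for r in range(p)]
    return sum((y[i] - sum(b * x for b, x in zip(beta, X[i]))) ** 2
               for i in range(n))


def bic(rss, n, n_params):
    """Gaussian BIC up to an additive constant: n*ln(RSS/n) + k*ln(n)."""
    return n * math.log(rss / n) + n_params * math.log(n)


def select_traits(genotype, traits):
    """Return the subset of trait names that minimises BIC for genotype ~ traits."""
    n = len(genotype)
    names = sorted(traits)
    best_score, best_subset = None, ()
    for size in range(len(names) + 1):
        for subset in itertools.combinations(names, size):
            rss = ols_rss([traits[t] for t in subset], genotype)
            score = bic(rss, n, len(subset) + 1)
            if best_score is None or score < best_score:
                best_score, best_subset = score, subset
    return best_subset


# Simulated data: the genotype influences "bmi" and "sbp" but not "height".
rng = random.Random(1)
genotype = [float(rng.choice([0, 1, 2])) for _ in range(400)]
traits = {
    "bmi": [0.8 * g + rng.gauss(0, 1) for g in genotype],
    "sbp": [0.8 * g + rng.gauss(0, 1) for g in genotype],
    "height": [rng.gauss(0, 1) for _ in genotype],
}
print(select_traits(genotype, traits))
```

An exhaustive search is only feasible for a handful of traits, since the number of candidate models doubles with every trait added.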

3. Other international consortia

a) CHARGE consortium: Genome-wide association study (GWAS) for caffeine intake-related traits (with Dr Marilyn Cornelis). Dr Krista Fischer has participated in the local analysis of the EGCUT data at both the discovery and the replication stage. There are ongoing discussions on further collaboration.
b) GWAS for educational attainment (in collaboration with Prof. C. Rietveld et al.)
c) GWAS for gene-environment interactions related to smoking and obesity parameters
d) Different projects related to Telomere length measurements (Telomere length has been determined for circa 5 200 EGCUT samples at the University of Leicester, UK), in collaboration with Dr Veryan Codd and her group at Leicester, in the framework of ENGAGE consortium: GWAS (discovery+replication stage), metabolome-wide association study, association of known genetic markers for telomere length with mortality. There are more Telomere-length related studies at the planning stage.

4. Independent projects initiated in the EGCUT

a) Nuclear magnetic resonance (NMR) metabolite concentrations and their association with mortality: About 10 000 EGCUT plasma samples were analysed in Finland, using NMR technology to estimate the concentrations of 112 plasma metabolites (the data were obtained in 2011-2012). The set of metabolites includes a number of lipoprotein measurements (VLDL, LDL and HDL particles, classified into different subtypes and sizes), amino acids, proteins and other low molecular weight particles. As the EGCUT cohort is regularly linked to the Estonian cause-specific mortality registry as well as to the Estonian population registry, up-to-date mortality data are obtained at least twice a year. As the NMR metabolites were analysed for a random subset of the cohort, it became possible to analyse whether some of the metabolite measurements can be used as biomarkers to identify high-risk individuals. Four plasma metabolites appeared to have a highly significant effect on mortality, making it possible to form a risk score that identifies a subset where the risk of all-cause mortality was more than 10 times higher than in the rest of the cohort. In collaboration with colleagues from the Finnish Institute for Molecular Medicine (FIMM), we were able to replicate the effects in the independent FinRisk97 cohort. The manuscript based on these findings has been submitted to a high-impact medical journal.
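The way a biomarker risk score can single out a high-risk subset is sketched below. The report names neither the four metabolites nor the statistical model, so the weighted-sum score, the metabolite names, the weights and the toy data are all hypothetical, and a simple ratio of mortality proportions stands in for whatever survival analysis was actually performed.

```python
def risk_score(biomarkers, weights):
    """Weighted sum of (already standardised) biomarker values for one person."""
    return sum(weights[name] * biomarkers[name] for name in weights)


def relative_risk(scores, died, top_fraction=0.1):
    """Mortality in the top `top_fraction` of scores divided by that in the rest.

    scores: one risk score per individual
    died:   1 if the individual died during follow-up, else 0
    """
    n = len(scores)
    n_top = max(1, round(n * top_fraction))
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    top, rest = ranked[:n_top], ranked[n_top:]
    if not rest:
        raise ValueError("top_fraction leaves no comparison group")
    risk_top = sum(died[i] for i in top) / len(top)
    risk_rest = sum(died[i] for i in rest) / len(rest)
    if risk_rest == 0:
        raise ValueError("no deaths in the comparison group")
    return risk_top / risk_rest


# Hypothetical weights for two placeholder metabolites (not study results).
weights = {"metabolite_1": 0.5, "metabolite_2": -1.0}
person = {"metabolite_1": 1.0, "metabolite_2": 2.0}
print(risk_score(person, weights))

# Toy cohort of ten people: scores 0..9, deaths concentrated at high scores.
toy_scores = [float(i) for i in range(10)]
toy_died = [1, 0, 0, 0, 0, 0, 0, 0, 1, 1]
print(relative_risk(toy_scores, toy_died, top_fraction=0.2))
```

In a real analysis the follow-up time differs between individuals, so a time-to-event model would be more appropriate than a comparison of raw proportions.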

b) NMR metabolites and other phenotypic traits: There are several ongoing projects related to NMR metabolites, in collaboration with colleagues from the University of Helsinki, Finland: association between plasma metabolites and nutrition; association analysis of NMR metabolites and time of menopause (includes assessment of general age effects).

GWAS

This discovery and hypothesis-generating approach to human genomics has been one of the most successful research activities in human genomics worldwide in the last 8 years! The OPENGENE grant enabled the Estonian Genome Centre to update its genotyping facility by obtaining the new Illumina platform. It was a timely and necessary addition to the facility, as the old scanner had 6 times less capacity. By then, Illumina had announced a new generation of genotyping arrays, the Omni family. The new array design had denser bead positioning and required a laser with much higher scanning resolution, which meant that the previous version of the Illumina scanner, the BeadScanner, could no longer be used. Until then the genome-wide genotyped cohort had reached 2 000 Estonian Genome Centre samples, and without the technological update no more samples could have been genotyped. As most genetic risk factors have small effect sizes and reasonably large sample sizes are needed to detect them, the full potential of the Estonian Genome Centre in understanding the genetic architecture of common complex diseases could not otherwise have been realised.

The iScan can also process the new generation of gene expression and DNA methylation BeadChips, array formats that are unreadable by the previous version of the Illumina scanner. These types of analyses are required to estimate the direct effects of DNA sequence variation on downstream gene function and, through that, to identify causal genes and the underlying biological pathways whose malfunction leads to the development of a disease. DNA methylation profiling makes it possible to study epigenetic marks, which are inherited but not encoded in the raw DNA sequence, and provides further insight into disease etiology. Both research topics are currently heavily studied in the field.

Since 2009 almost 20 000 Estonian Genome Centre biobank samples have been genotyped with genome-wide arrays, including the custom-made Illumina platforms Cardio-MetaboChip, ImmunoChip and ExomeChip. These custom-made arrays make it possible to ask fundamental questions about genetic risk factors in metabolic and immunological diseases and, in particular, to evaluate the role of low-frequency DNA sequence variants. We have also analysed the expression profiles of close to 2 000 samples and the DNA methylation patterns of 400 samples, and carried out a detailed analysis with both platforms on 600 fractionated blood cell samples (whole blood, CD4+, CD8+, monocytes).

The increasing amount of genomic variation data provided the necessary starting point for acquiring knowledge of the analytical frameworks that are applied in the genomics of complex diseases and traits. The number of genotyped samples has made the Estonian Genome Centre a valued member of various international genetics consortia (such as ENGAGE, CHARGE, GIANT, the International Blood Pressure Consortium and the Global Lipids Consortium, to mention just a few; the latest addition is the International Psychiatric Consortium). Over the years the Estonian Genome Centre has been (and still is) part of more than 100 active research projects, and our researchers have been co-authors of more than 100 published papers on GWAS and related fields. This research has been extremely fruitful, as more than 1 600 DNA sequence variants have been identified as modulating factors for more than 100 physiological traits and diseases.

In recent years the research focus has shifted from studying disease phenotypes, such as diabetes, cardiovascular disease, hypertension and hyperlipidemia, to research on intermediate phenotypes (also called endophenotypes), such as insulin and glucose levels, hematological parameters, blood pressure and lipid particle profiles. This approach has proven successful in identifying genetic variants that increase the genetic predisposition to a disease. Moreover, once the causal biological pathways and malfunctioning regulatory mechanisms are identified, it may be possible to prevent the disease through counselling and lifestyle changes. A subset of Estonian Genome Centre samples has an extensive set of endophenotype profiles; these in-depth phenotype data, in combination with the high-density genotype, whole-exome and whole-genome sequencing data, made it possible to participate in the following research. The Estonian Genome Centre has been part of a series of genome-wide association analyses of endophenotypes: blood platelet characteristics - Gieger et al (Nature 2011); red blood cell characteristics - van der Harst et al (Nature 2012); blood pressure - Wain et al (Nature Genetics 2011); kidney function and creatinine levels - Pattaro et al (PLoS Genet 2012) and Köttgen et al (Nature Genetics 2012); glycaemic traits - Scott et al (Nature Genetics 2012). These studies illustrate that the detection of DNA sequence variants that contribute to normal variability in healthy individuals can help to understand the etiology of disease. For example, genes that are known to cause severe familial birth defects very often also harbour common variants of small effect that increase the predisposition to a more common but less severe form of the disease.

A substantial part of the heritability of diseases and traits has remained unexplained, although large samples (up to tens and hundreds of thousands) have been studied. It has been argued that more subjects need to be studied and that analysing DNA sequence variants with low allele frequency could reveal most of the remaining genetic variants associated with disease. The EGCUT samples have also been included in analyses of rare forms of extreme BMI. In the papers of Walters et al. (Nature 2010) and Jacquemont et al. (Nature 2011), structural variation in a region on the short arm of chromosome 16 was shown to have a mirror effect on adult BMI. If a fragment of genetic material from that region is lost, the individual is prone to be extremely obese (BMI > 35), whereas a duplication of that region causes susceptibility to clinical underweight (BMI < 18). Previously this deletion had been identified in patients with mental retardation, but it was also present in the general population, although the carriers had modest education levels and had difficulties with self-care. For five individuals (out of 8 000) in the EGCUT biobank and database, the genetic cause of the weight problem is known.

Ongoing research

Several of the genome-wide association analysis projects are still ongoing. One of the reasons is that the turnaround time from launching to publishing is roughly two years. Secondly, as the Estonian Genome Centre is now an active member of many genetic epidemiology consortia, new projects are discussed and launched on a biweekly basis. Using two ongoing studies, on human stature and on schizophrenia, we would like to illustrate the power of analysing a large sample and thus emphasise the need to genotype even more samples. In the case of height, the discovery sample for genome-wide association analyses in the current phase is reaching 250 000 samples (including 12 000 Estonian Genome Centre samples). By meta-analysing the cohort-specific genetic effects, close to 700 sequence variants have been identified. These data have made it possible to reach several important conclusions: (i) there can be up to 9 haplotypes in a genomic region that independently have a modulating effect on the trait; (ii) having more associated genomic regions makes it possible to infer the underlying biological circuits more precisely. A similar example can be given for schizophrenia, a disease with which only close to ten genetic risk variants had been robustly associated. The lack of success has previously been explained by differences in genetic architecture, but also by the extreme heterogeneity of the symptoms. The current round of discovery analysis pooled more than 35 000 cases and more than 80 000 healthy controls (including 270 and 2 500 Estonian Genome Centre samples, respectively). The combined meta-analysis identified more than 80 independent sequence variants and prioritised more than 70 genes for downstream functional research. This study will provide useful insight into the disease etiology and will hopefully contribute to the production of more efficient drugs.
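The gain from meta-analysing cohort-specific effects can be shown with a minimal sketch of a fixed-effect, inverse-variance weighted meta-analysis. The report does not state which meta-analysis model the consortia used, so this choice and the per-cohort numbers are assumptions for illustration.

```python
import math


def fixed_effect_meta(estimates):
    """Inverse-variance weighted fixed-effect meta-analysis.

    estimates: list of (beta, standard_error) pairs, one per cohort,
               all for the same variant and the same effect allele.
    Returns (pooled_beta, pooled_se, z_score).
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total_weight = sum(weights)
    pooled_beta = sum(w * beta for w, (beta, _) in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_beta, pooled_se, pooled_beta / pooled_se


# Hypothetical per-cohort effects of one variant on height (not study data).
cohorts = [(0.10, 0.05), (0.20, 0.05)]
beta, se, z = fixed_effect_meta(cohorts)
print(f"pooled beta = {beta:.3f}, se = {se:.4f}, z = {z:.2f}")
```

Pooling two equally sized cohorts shrinks the standard error by a factor of about 1.41 (the square root of 2), which is why adding more genotyped samples lets small-effect variants cross the significance threshold.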

Short and long term scientific visits

The scientists of the EGC visited many top-level genome centres in Europe and North America, and detailed reports of these visits are provided. Here we would like to demonstrate the value of these visits using the example of a young graduate student who became an internationally known genome scientist and obtained his PhD degree for studies he performed at the EGC. Of course we had several people like him, but we have to say that he was the most talented. Dr Tõnu Esko made a series of research visits while he was a PhD student at the Department of Biotechnology, Institute of Molecular and Cell Biology, University of Tartu; because of his exceptionally good and productive earlier master's studies, he had been recruited as a research specialist at the Estonian Genome Centre.

Several excellent examples can be given of how the OPENGENE grant enabled successful collaboration and opened the Estonian Genome Centre to European researchers. As a first example, Tõnu Esko used OPENGENE grant money to attend the European Society of Human Genetics 2011 conference in Amsterdam, where he was invited to give an oral presentation on the genetics of height. During the conference he met Dr Lude Franke from the University Medical Centre Groningen, Netherlands, who liked the talk and the new approach presented in it. They agreed that Tõnu Esko would join Dr Franke's research group to learn the statistical and computational framework for studying how DNA sequence affects gene expression. As both high-density genotype data and whole-genome gene expression profiles were available for 1 000 Estonian Genome Centre samples, it was agreed that the Estonian data would be combined with the Groningen data to achieve the necessary statistical power. T. Esko spent three months at the University Medical Centre Groningen (from January to April 2012) as a visiting researcher in the computational biology and molecular genetics laboratories of Dr Lude Franke and Prof Cisca Wijmenga. On top of the research collaboration, he also gave a talk about the Estonian Biobank at a joint seminar with the Department of Genetic Epidemiology (organised by Prof Harold Snieder, currently a part-time professor at the EGC in Tartu), attended by close to 100 researchers. The Estonian Biobank now has several ongoing projects with Dr Franke and Prof Wijmenga. The computational tools developed by Dr Franke have been used to conduct the largest expression quantitative trait mapping to date, a meta-analysis of up to 5 500 samples from 9 cohorts. During his stay T. Esko performed bioinformatic look-ups and contributed to writing the paper (the manuscript was recently accepted by Nature Genetics).
This work resulted in a further invited oral presentation at the European Society of Human Genetics 2012 conference in Nuremberg and a research article published in PLoS Genetics (Magadi et al); several manuscripts are in preparation (Pers et al; Westra et al).

As a second example, Dr Esko attended the American Society of Human Genetics 2011 meeting in Montreal. While the main aim of the trip was to meet researchers at McGill University (Prof Knoppers was the main person developing the content of the Estonian Genome Centre's informed consent and the Human Genes Research Act in 1999), Dr Esko also presented a poster at the conference and met several collaborators from Europe and the United States. He and Dr Juan Ramon Gonzalez from the Centre for Research in Environmental Epidemiology, Barcelona, Spain, agreed that Dr Esko would spend some time in the research laboratories of Dr Gonzalez and his close collaborator Prof Luis Perez-Jurado to strengthen the research collaboration on complex structural variants in the human genome, such as copy number changes and inversions. Dr Esko spent three weeks (in July 2012) at the Centre for Research in Environmental Epidemiology.

His visit resulted in several research projects between the Estonian Biobank and Dr Gonzalez and Prof Perez-Jurado. The computational tools developed by Dr Gonzalez have been applied to the genotype information of the Estonian Biobank to study structural variants in the general population. So far the collaborative research has focused on mosaic copy number changes in DNA genotype data from whole blood (a situation in which part or all of a chromosome has a copy number other than two and the aberration is present in only a fraction of the analysed blood cells). Several of these findings are currently being validated with molecular genetics tools in Prof Perez-Jurado's laboratory. The aim is to estimate the effect of these events on normal gene expression levels and on predisposition to various diseases (two manuscripts in preparation). Dr Gonzalez is currently developing tools to detect these events in high-throughput sequencing datasets. Dr Gonzalez has also submitted a European Research Council (ERC) starting grant application, which was discussed at length during Dr Esko's visit, as the Estonian Biobank will be the main collaborator and data provider. A further project was carried out to estimate the presence of genomic inversions in the general population. These analyses were again performed using Dr Gonzalez's tools. An inversion was identified that causes predisposition to asthma in obese individuals (Gonzalez et al - manuscript submitted). Inversions were then predicted on a genome-wide scale, and Estonian Genome Centre gene expression data were used to prioritise the bioinformatic predictions for wet-lab validation, currently performed by Prof Perez-Jurado (manuscript in preparation).

As a last example of the value of the OPENGENE grant, Dr Esko met Prof Joel Hirschhorn from Children's Hospital Boston, Harvard Medical School and the Broad Institute at the American Society of Human Genetics 2011 meeting in Montreal. They agreed that Dr Esko would start as a postdoctoral fellow in Prof Hirschhorn's research group in Boston after finishing his PhD studies at the University of Tartu, and that they would apply for a joint NIH R01 grant. Dr Esko and Prof Metspalu obtained approval from the Ethics Board of the University of Tartu and from the Estonian Government to transfer 6 000 Estonian Genome Centre DNA samples to Boston. The purpose of the last OPENGENE scholarship was to fund the postdoctoral studies of Dr Esko at Children's Hospital Boston / Broad Institute of Harvard and MIT, where he started as a research fellow in Prof Joel Hirschhorn's laboratory. The OPENGENE grant funds only the starting period; from 1 January 2013 he will be funded by the NIH R01 grant (Prof Hirschhorn). Dr Esko has made a good start: his abstract was accepted as a platform presentation at the American Society of Human Genetics 2012 meeting in San Francisco (manuscript in preparation). The main focus of Dr Esko's research will be the population extremes of human stature and body mass index.

He will also coordinate the collaboration between the Estonian Genome Centre and Children's Hospital Boston. So far 2 000 DNA samples have been transferred to Boston; they have been genotyped and the data are back in the EGC database. The remaining 4 000 samples will soon follow the same route. All the samples will be genotyped with the Illumina ExomeChip and followed up with genome sequencing. These experiments will enable us to evaluate how large a proportion of trait variability is explained by rare DNA sequence variants in coding regions. Of course not all scientists are so active and talented, but we have to say that the visits, whether long or short, were very useful and secured the EGC's position in the network of leading genome centres in Europe and North America.

Potential impact:

Alongside the recruitment of new scientists, upgrading the gene analysis technology platform was one of the most crucial steps towards ensuring participation in the GWAS discovery phase of global human genome research.

The acquisition of the HiScanSQ (genotyping and second-generation sequencing) instrument, which was central to the OPENGENE project, brought the technology core of the Estonian Genome Centre of the University of Tartu (EGCUT) to the same level as the most modern genomics research centres in the rest of Europe. The rapid training of the EGCUT laboratory staff allowed us to implement the latest genomics technologies for the genotyping and sequencing of both disease-specific cohorts and healthy control samples from the general population, all selected from the Estonian Biobank. With the use of an autoloader and a microarray staining robot, financed by other grants, we were able to genotype close to 20 000 individual samples during the OPENGENE project. The generated data, in combination with the rich phenotypic information available for each subject, allowed the EGCUT to participate in numerous meta-analysis studies on many different traits, published in high-impact journals (described in detail below). In addition, we genotyped samples for many collaborators from different FP7 projects (ENGAGE) and others, which demonstrates the quality of our work and the trust placed in our sample and data handling and analysis.

To conclude, in these three years a small start-up institute with a biobank has been transformed into an internationally very active, highly regarded and productive genome centre. The OPENGENE grant, which in Estonian terms was quite substantial, served as 'core' funding. It led to the EGC being seen in the country as a serious research institution of European quality, and further investments followed (we still had to write grant applications, but we could demonstrate a better track record), especially once we could hire our top researchers from abroad.

These past three years have made us stronger. We can do well on the international grant market, and we can also reach out to the Estonian medical system by offering new, genome-based approaches to health care during the new funding period, 'Horizon 2020'. We are prepared for this!

Project website: http://www.biobank.ee