Identification of epigenetic markers underlying increased risk of T2D in South Asians

Final Report Summary - EPI-MIGRANT (Identification of epigenetic markers underlying increased risk of T2D in South Asians)

Executive Summary:
Type-2 diabetes (T2D) is a major public health problem in all regions of the world, particularly amongst rapidly urbanising countries such as India [1, 2]. South Asians (people originating from India, Pakistan, Bangladesh or Sri Lanka), who comprise one-quarter of the world’s population, are at up to 4-fold higher risk of T2D compared to North Americans and Europeans [2-4]. The IDF predicts that T2D will affect more than 100 million people in India alone by 2030 [1]. Improved understanding of the mechanisms underlying the high incidence of T2D amongst South Asians is urgently needed to stem the current epidemic of T2D in this population.
DNA methylation at CpG sites (cytosine-guanine nucleotide pairs) influences gene expression, cellular differentiation and molecular response to environmental stressors [5-8]. Our general aim in this proposal is to identify novel epigenetic markers underlying increased risk of T2D in South Asians. We carried out epigenome-wide association using the Infinium HumanMethylation 450K BeadChip (Illumina) in the baseline peripheral venous blood samples from a prospective, nested case-control study of 1,074 South Asians with incident T2D and 1,590 controls. Replication testing of 7 top-ranking signals was done in 1,141 Europeans (377 with T2D). We also separately tested the association of the DNA methylation markers with prevalent T2D amongst 1,720 South Asians (647 with T2D). We used targeted resequencing to fine-map the methylation markers. We investigated the association of methylation with genetic variants to explore possible pathways linking methylation with metabolic phenotypes. Finally, we will examine the extent to which each of the risk factors identified account for prevalent T2D among South Asians, in different regions worldwide.
We first developed a comprehensive pipeline for analysis of data from the Illumina Infinium HumanMethylation450 BeadChip [9]. We developed and validated new approaches to quality control, data normalisation and batch correction through control-probe adjustment, and demonstrated that these improve data quality. The pipeline outperforms existing approaches, enabling accurate identification of methylation quantitative trait loci for follow-up experiments.
We show that methylation markers at 5 genetic loci are associated with future T2D amongst South Asians (relative risks top vs bottom quartile: 1.58 to 2.41; P=3.7x10^-5 to P=1.2x10^-14). A methylation score integrating results from the 5 loci is highly predictive of future T2D (relative risk top vs bottom quartile: 3.51; 95% CI 2.79-4.42; P=1.3x10^-26). Results replicate amongst Europeans. We show that a DNA methylation score identifies an almost four-fold higher risk of future T2D between upper and lower quartiles. We show that DNA methylation improves prediction of future T2D compared to traditional measures, as estimated by area under the curve and positive predictive value. The ability of DNA methylation to discriminate risk of T2D is particularly striking in obese, normoglycaemic South Asians, amongst whom methylation enables identification of a subset of obese individuals with high (>20%) or low (<5%) incidence of T2D during follow-up. We separately show that methylation is unfavourable amongst South Asians compared to Europeans. The differences we observe in methylation at the five loci identify a large proportion of the unexplained increased risk of T2D amongst South Asians compared to Europeans.
Results of the EpiMigrant study enable development of a predictive panel of lifestyle, genetic, environmental, and epigenetic markers underlying susceptibility to diabetes in South Asians. DNA methylation may therefore help identify people who may benefit from early pharmacologic or lifestyle interventions, findings of huge potential translational importance for risk stratification and personalised medicine.

Project Context and Objectives:
Importance of Type-2 diabetes (T2D) amongst South Asians
This project focuses on understanding the lifestyle, environmental, genetic and epigenetic factors underlying the increased risk of type-2 diabetes (T2D) among South Asians, who constitute ~1/4 of people with T2D worldwide. India alone has ~50 million people with T2D, more than any other nation. Conservative estimates based on population growth, aging and rates of urbanisation show that T2D cases in India will rise to ~80 million by 2030 [10]. T2D prevalence is currently ~9% in rural India, ~18% in urban India [11], and ~22% amongst Indians living in Europe (compared to ~6% among indigenous Europeans) [12]. Similar patterns are observed among South Asians in Pakistan, Bangladesh, and Sri Lanka [13]. Though T2D among South Asians was considered to be more prevalent among the more affluent, recent data show that T2D rates are rapidly rising among low and middle income South Asians, who are also more susceptible to T2D complications, due to reduced access to quality health care in these settings [14].
The economic burden of T2D among South Asians makes this an important global clinical and public health challenge. Economic disparities, scarcity of adequate health-care, and low education status present a major obstacle to reducing the burden of T2D in South Asian countries. Cost of care increases substantially with complications of T2D, or when admission to hospital, surgery, or insulin are needed. T2D will also continue to place a heavy burden on the health expenditures of European countries to which South Asians have emigrated in large numbers [15].

Reasons underlying increased T2D in South Asians
Known environmental and genetic factors only account for a small part of the increased risk of T2D among South Asians [4, 12, 16-22]. Our data from studies in migrant (UK) and non-migrant (India) South Asians, of similar genetic background, show that obesity and physical inactivity do not account for the increased risk of T2D. We compared risk factors in migrant South Asians (UK) with non-migrant rural South Asians (India). Participants were assessed by the same research team, using identical equipment and protocols, with biochemical measures in a single core laboratory. We found that T2D risk, insulin resistance and related metabolic disturbances were higher amongst migrant South Asians, compared to non-migrants (18.6% vs 4.3%). T2D risk, insulin, and glucose are higher in migrant, compared to non-migrant, South Asians at all levels of obesity (BMI and WHR) and physical activity, indicating that the increased T2D amongst migrant South Asians is not explained by obesity or physical inactivity.
The London Life Science Population (LOLIPOP) Study is a population based cohort of ~18,606 South Asians and 9,766 European white men and women, aged 35-75 years recruited from the lists of 58 general practitioners in West London [23-25]. Our data show a 3.4 fold (95% CI 3.2-3.6) higher prevalence of T2D amongst South Asians compared to Europeans. Increased rates of T2D are present in every age group, and in both genders, amongst Indian Asians compared to European whites. The prevalence of T2D is higher among South Asians than Europeans at each level of central and generalised obesity, and physical activity. After adjustment for BMI, waist-hip ratio and physical activity there remains a striking 3.2 fold (95% CI 3.1-3.4) excess of T2D in Asians, compared to Europeans. Thus, environmental factors alone do not appear to contribute to the excess of T2D among South Asians.
We have also tested the contribution of known genetic variants to the increased risk of T2D in South Asians. We find that the genetic variants reported to be associated with T2D in genome-wide association studies fail to account for the increased risk of T2D in South Asians, compared to Europeans (OR for T2D in South Asians vs Europeans: 3.1 [95% CI 3.0-3.3] after adjustment for all SNPs tested). We have carried out a well-powered genome-wide association study for T2D amongst South Asians, to identify common genetic variants underlying the increased risk of T2D in this group [4]. We find that common genetic variants at five novel loci (GRB14, VPS26A, HMG20A, AP3S2 and HNF4A) are associated with T2D (P<5x10^-8); SNPs at GRB14 and HNF4A were also associated with insulin sensitivity and pancreatic beta-cell function respectively. Though novel, the genetic loci identified are associated with risk of T2D amongst both South Asians and Europeans, are cosmopolitan and shared, and thus also do not explain the high risk of T2D in Asians.

Epigenetic modification of gene expression underlying T2D
Epigenetic regulation refers to heritable changes in gene expression and phenotype that are not determined by changes in DNA sequence [26]. DNA methylation at cytosine residues in CpG sites is the best understood of the mechanisms for epigenetic regulation of gene expression [27, 28]. DNA methylation patterns are established and modified by specific DNA methyltransferases, such as DNMT1, which transfers patterns of methylation to a newly synthesized strand after DNA replication [29]. Epigenetic changes are thus preserved during somatic cell division, and may also be transmitted from the parental germline to the offspring. Transgenerational epigenetic inheritance is well documented in a wide range of organisms, including prokaryotes, plants, and animals, and provides a mechanism for evolution and adaptation [30].
Recent studies suggest an important role for epigenetic modification in the aetiology of T2D and related metabolic disorders [31, 32]. Dietary manipulation in Avy mice influences agouti gene expression, fur colour, weight, and propensity to develop obesity, diabetes and cancer, providing evidence of epigenetic modification in response to environmental exposure [33]. Both maternal undernutrition and low birth-weight are associated with increased risk of T2D in the off-spring [34-36]. In rats, maternal protein restriction leads to impaired glucose tolerance and insulin resistance in the adult offspring, accompanied by changes in expression for genes involved in insulin-signalling [37]. Experimental intrauterine growth restriction in rats is accompanied by widespread alterations of DNA methylation in pancreatic islets and an increased risk of T2D, and in humans intrauterine growth restriction is associated with changes in DNA methylation at the HNF4A gene locus, a known T2D susceptibility locus [38]. Data from the Dutch Hunger Winter Families study show that individuals exposed to famine in utero during World War II have less methylation of the IGF2 gene in blood DNA and tended to develop obesity later in life at higher rates compared to unexposed same-sex siblings [39]. In rodent models of low birth weight, the increased risk of T2D is passed on to the second generation, providing evidence for transgenerational inheritance of disease risk. Consistent with this hypothesis, risk of T2D is increased amongst the second generation offspring of people exposed to famine as children [40]. Together, these studies provide strong evidence that epigenetic modifications, arising from in utero and transgenerational exposure to a poor nutritional environment, predispose to the development of T2D in adult life. The epigenetic markers which identify susceptibility to T2D remain to be determined.

Evidence to support epigenetic modification in South Asians.
Birth cohort studies show that maternal undernutrition, low birth-weight, and rapid postnatal child growth are all associated with increased risk of T2D in the off-spring [35]. These risks appear to be mediated through epigenetic modification and may be trans-generational.
More than 96 per cent of low birth-weight occurs in the developing world, with highest incidence in South Asians (31% of live births) [41]. Results of The Pune Maternal Nutrition Study, a prospective population-based observational study of rural South Asian women and their offspring, show that Asian mothers are shorter (mean 152cm) and thinner (mean BMI 18.1kg/m2) than their European counterparts, and that full-term South Asian neonates are ~700g lighter (~2SD) than the average European [35]. These neonatal differences are accompanied by changes in body composition with greater reduction in lean, muscle tissue than truncal fat. Birth cohort studies in India confirm the association of maternal undernutrition, low birth-weight and thinness in infancy with later development of impaired glucose tolerance and T2D [35]. Risk of impaired glucose tolerance and T2D is highest amongst South Asians who had shorter mothers, parents with lower BMI, low birth weight, thinness in infancy and greater weight gain during childhood and adolescence, independent of their adult BMI. The risk of IGT / T2D is increased 6-fold amongst South Asians who were in the lowest ~1/3 of BMI as children but progressed to be in the highest ~1/3 of BMI as adults, compared with adults who had high BMI as children, but became thin adults [42]. These observations of adverse intrauterine and postnatal exposures in South Asians raise the possibility that epigenetic modification of gene expression may contribute to age-dependent changes in key metabolic genes, and increased susceptibility to T2D.

Aims and objectives
Our general aim in this proposal is to identify novel epigenetic markers underlying increased risk of T2D in South Asians. We propose carrying out epigenome scans in South Asian T2D cases and controls [23-25], followed by replication testing of top ranking epigenetic markers in non-migrant and migrant South Asians. We will then develop a predictive panel of lifestyle, genetic, environmental, and epigenetic markers underlying susceptibility to T2D in South Asians. Finally, we will examine the extent to which each of the risk factors identified account for prevalent T2D among South Asians, in different regions worldwide.

Our specific aims are:
1. Carry out epigenome-wide scans to identify markers associated with T2D amongst South Asians.
2. Replicate the top-ranking epigenetic markers for association with T2D in independent cohorts of non-migrant and migrant South Asian T2D cases and controls.
3. Develop and validate a set of lifestyle, environmental, genetic, and epigenetic risk factors predictive of incident T2D risk in South Asians.
4. Evaluate the contribution of lifestyle, environmental, genetic, and epigenetic risk factors to T2D risk among South Asians, in different regions of the world.

EpiMigrant study design and work packages
The study design was delivered through 5 work packages organised in interlinked groups:

Coordination and management
WP1 dealt with coordination and management of the project, including i. periodic and final reporting to the European Commission, ii. communication between participants, and iii. creation of the study website.

Data generation – measurement of epigenetic markers
WP2 and WP3 housed data generation, through measurement of epigenetic markers amongst South Asian T2D cases and controls. WP2 used state of the art Illumina 450K Infinium Methylation BeadChips to measure DNA methylation at CpG sites across the whole genome. WP3 carried out replication testing of the strongest methylation signals (from WP2) in further samples of non-migrant and migrant South Asians.

Data analysis to identify epigenetic markers associated with T2D
WP4 housed QC and analysis of the DNA methylation measurements, to identify the epigenetic marks associated with T2D. The analytic team included experts in the analysis of DNA methylation signatures, analysis of whole genome data and the design, conduct and analysis of large-scale population studies. QC measures included identification of samples that are outliers or have high miscall rates. Quantile normalization of probe specific intensity and batch effect adjustment were used to reduce inter-array variation. The association of epigenetic markers (gene specific and individual CpG sites) with T2D was investigated by regression techniques. Results identify, for the first time, epigenetic marks associated with T2D in South Asians.

Risk factors underlying T2D amongst South Asians in different settings
WP5 explored the predictive value of the epigenetic markers for incident T2D, and the contribution of epigenetic and other risk factors to T2D amongst non-migrant and migrant South Asians. Population attributable risks will be calculated to quantify the contribution of lifestyle, environmental, genetic, and epigenetic risk factors to T2D risk among South Asians in different settings.
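The report does not state which estimator is used for population attributable risk. As a minimal sketch, assuming Levin's formula for a single binary exposure, the calculation could look like the following; the function name and the input values are hypothetical.

```python
def population_attributable_risk(prevalence: float, relative_risk: float) -> float:
    """Levin's formula (an assumption here): PAR = p(RR - 1) / (1 + p(RR - 1)).

    prevalence    -- proportion of the population exposed to the risk factor
    relative_risk -- risk of disease in exposed vs unexposed individuals
    """
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must lie between 0 and 1")
    if relative_risk <= 0.0:
        raise ValueError("relative risk must be positive")
    excess = prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)


# Hypothetical inputs: 25% of the population exposed, relative risk of 3.
par = population_attributable_risk(0.25, 3.0)
print(f"PAR = {par:.3f}")
```

With several correlated risk factors, attributable fractions from this single-exposure formula do not simply add up, so a multivariable approach would be needed in practice.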

The EpiMigrant consortium
The EpiMigrant consortium of international collaborators brought together the necessary samples and expertise, and provided a unique opportunity to investigate the epigenetic, genetic and environmental mechanisms underlying T2D in South Asians. Key strengths included: international scientific expertise in the epidemiology and population genetics of type-2 diabetes, in epigenetics, bioinformatics and large scale bio-statistics; well characterised cohorts of South Asian T2D cases and controls from around the world; a track record of successful collaboration; and robust links to international consortia investigating T2D in Europeans using genomic, metabolomic, transcriptomic, epigenomic and other state of the art approaches, providing opportunities for novel research synergies.

Project Results:
-Pilot work
Prior to this project there had been no large scale studies of DNA methylation using the 450K microarray. At the outset of the project we therefore completed a pilot study to enable evaluation of a number of unresolved technical and biological issues. We measured DNA methylation at baseline and 8 years later in 45 individuals. All were initially free from T2D; 30 developed T2D during follow-up. Results of the pilot study demonstrated that: i. age, gender, and white blood cell subsets are major factors influencing methylation levels in DNA from peripheral blood, and failure to take account of white blood cell counts leads to spurious results; ii. methylation changes with T2D onset (reverse causation). This revealed a major potential limitation of using retrospective T2D cases, which had been the initial EpiMigrant design. Use of retrospective cases would lead to identification of methylation markers arising after disease onset, and with little utility for prediction of future T2D.
Based on the results of the pilot study, the EpiMigrant study design was updated to adopt a prospective study design (nested case-control study of incident T2D) using blood samples collected prior to disease onset, and with white blood cell subsets measured.

-Sample selection and shipping
Sample selection criteria
.Discovery
T2D was defined as physician diagnosis, fasting glucose ≥7mmol/L or HbA1c ≥6.5% [47]. Incident T2D cases were defined as individuals who were free from T2D at baseline but developed T2D during follow-up. Controls were free from T2D both at baseline and follow-up, and matched to cases by age and sex.
We searched all available South Asian cohorts for samples reaching these entry criteria; only samples from the LOLIPOP study met the criteria in full. We therefore carried out epigenome-wide association amongst all available 1,074 Indian Asian cases of incident T2D and 1,590 Indian Asian controls. DNA methylation was quantified in the baseline DNA samples collected at study enrolment, when all participants were free from T2D.
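The case and control definitions above can be sketched as a small classifier. This is an illustration only: the thresholds come from the text, while the `Visit` record, the function names, and the handling of missing measurements (treated as not meeting the threshold) are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Visit:
    """Hypothetical record of one assessment (baseline or follow-up)."""
    physician_diagnosis: bool
    fasting_glucose_mmol_l: Optional[float] = None
    hba1c_percent: Optional[float] = None


def has_t2d(visit: Visit) -> bool:
    """T2D: physician diagnosis, fasting glucose >= 7 mmol/L or HbA1c >= 6.5%."""
    if visit.physician_diagnosis:
        return True
    glucose = visit.fasting_glucose_mmol_l
    if glucose is not None and glucose >= 7.0:
        return True
    hba1c = visit.hba1c_percent
    return hba1c is not None and hba1c >= 6.5


def classify(baseline: Visit, follow_up: Visit) -> str:
    """Label a participant for the nested case-control design."""
    if has_t2d(baseline):
        return "excluded"  # T2D already present at baseline
    return "incident_case" if has_t2d(follow_up) else "control"


print(classify(Visit(False, 5.4, 5.8), Visit(False, 7.3, 6.1)))
```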

.Replication
Replication testing was done amongst 1,720 migrant and non-migrant South Asians (647 cases with prevalent T2D) and also 1,141 Europeans from the LOLIPOP study (N=181 incident T2D cases, N=568 controls) and KORA S3 and S4 studies (N=196 incident T2D cases, 196 controls). Samples were matched for age and gender. The incident T2D replication samples were all of European ancestry, to enable replication in a second ethnic group and investigation of the potential contribution of DNA methylation to the increased risk of T2D in South Asians. In these samples, DNA methylation was quantified in the baseline DNA samples collected at study enrolment, when all participants were free from T2D.

.Description of cohorts
London Life Sciences Prospective Population (LOLIPOP) study
LOLIPOP is a prospective population study of Indian Asian (N=17,606) and European (N=7,766) men and women, recruited at age 35-75 years from the lists of 58 General Practitioners in West London, United Kingdom between 2003 and 2008. South Asians had all 4 grandparents born on the Indian subcontinent (India, Pakistan, Sri Lanka or Bangladesh); Europeans were of self-reported white ancestry.
At baseline all participants completed a structured assessment of cardiovascular and metabolic health, including personal and family history, leisure time physical activity, and anthropometry. Participants were seen between 8am and mid-day, after an overnight fast (8 hours), allowing collection of fasting blood samples for complete blood count and measurement of glucose, insulin, HbA1c and lipid profile. Amino acid concentrations were measured by 1H Nuclear Magnetic Resonance [48]. Physical activity was defined as engaging in ≥90 mins of at least moderately vigorous leisure time physical activity (≥3 METS) per week. Homeostasis model assessment of insulin resistance (HOMA-IR) and beta cell function (HOMA-B) were calculated [49]. Aliquots of whole blood were stored at -80°C before extraction of genomic DNA. The LOLIPOP study is approved by the National Research Ethics Service (07/H0712/150) and all participants gave written informed consent.
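The report cites [49] for the HOMA calculations without giving the formulas. A minimal sketch, assuming the widely used original HOMA approximations (fasting glucose in mmol/L, fasting insulin in µU/mL), is shown below; the function names and example values are illustrative.

```python
def homa_ir(glucose_mmol_l: float, insulin_uu_ml: float) -> float:
    """Insulin resistance: (fasting glucose x fasting insulin) / 22.5 (assumed formula)."""
    return glucose_mmol_l * insulin_uu_ml / 22.5


def homa_b(glucose_mmol_l: float, insulin_uu_ml: float) -> float:
    """Beta-cell function (%): 20 x fasting insulin / (fasting glucose - 3.5) (assumed formula)."""
    if glucose_mmol_l <= 3.5:
        raise ValueError("HOMA-B is undefined for fasting glucose <= 3.5 mmol/L")
    return 20.0 * insulin_uu_ml / (glucose_mmol_l - 3.5)


# Hypothetical participant: fasting glucose 5.0 mmol/L, fasting insulin 9.0 uU/mL.
print(homa_ir(5.0, 9.0), homa_b(5.0, 9.0))
```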
At follow-up, electronic health records from primary care practitioners were extracted for each participant, and structured queries used to identify individuals with new onset T2D. In addition 7,640 participants attended clinical evaluation, including questionnaire and fasting blood samples for glucose and HbA1c, enabling identification of people confirmed to be free from T2D at the end of follow-up (not on treatment for T2D and fasting glucose<7mmol/L and HbA1c<6.5%).

Cooperative Health Research in the Region of Augsburg (KORA)
KORA is a research platform of population-based health surveys and subsequent follow-up examinations amongst individuals of German nationality resident in the region of Augsburg in Southern Germany. Written informed consent was obtained from all participants and the studies have been approved by the ethics committee of the Bavarian Medical Association. Study design, sampling method and data collection have been described in detail elsewhere [50]. The surveys S3 and S4 were conducted in 1994/1995 and 1999-2001, respectively, and comprised independent samples of 4,856 and 4,261 subjects aged 25 to 74 years. Both cohorts were reinvestigated in the follow-up examinations F3 and F4 in 2004/2005 and 2006-2008, respectively, with 2,974 and 3,080 participants. Anthropometric variables and clinical parameters were determined at all examinations. Replication testing in samples from the KORA study was done using the Illumina HumanMethylation450 BeadChip with genomic DNA (750ng) as described, from 196 cases and 196 controls, matched for age (±2 years), sex, cohort and observation time till diagnosis of diabetes.

-Generation of whole genome methylation data and replication data
Epigenome-wide scans
Epigenome-wide scans were completed at the Oxford Genomics Centre in two sets: a Pilot Study of 96 samples, followed by a main study of 3,360 samples. Bisulfite conversion of genomic DNA was performed using the EZ DNA methylation kit according to manufacturer's instructions (Zymo Research, Orange, CA). Case and control samples were distributed randomly in all experiments. Methylation of genomic DNA was quantified using the Illumina HumanMethylation450 array according to manufacturer’s instructions. Epigenome-wide scans were completed to specified levels of data quality. First-pass quality control using array internal control probes was performed after each batch was completed. This revealed four samples with low bisulphite conversion. At a detection p-value cut-off of 5%, 21 arrays failed to reach 99% call rate, which included those samples that failed bisulphite conversion. Six samples that performed worst in batches 1-10 were repeated in batch 12, of which five then passed the first-pass QC. After removing failed samples, a principal component analysis showed no clustering by batch (Figure 2). The data were passed to the analytical group (including Oxford participants) to undertake downstream analyses. All files generated by the Oxford Genomics Centre were transferred via a secured FTP connection to an Imperial FTP server. These include intensity files (IDATs) and methylation scores extracted with Illumina GenomeStudio in .txt format.

Replication by pyrosequencing
The first goal of the replication analysis was to find a sensitive and processive method to detect small changes in DNA methylation, and enable assessment of a large number of samples. Before starting the replication phase we carefully re-evaluated the technologies available. Based on expert advice we adopted the well established Pyrosequencing technique, which is a sensitive and reproducible method and has been successfully applied to methylation analysis.
Templates for pyrosequencing were obtained by amplifying bisulfite-treated DNA with biotinylated primers. The biotinylated PCR products were then immobilized on streptavidin-coated Sepharose beads (GE Healthcare, Orsay, France). Pyrosequencing was performed with the PyroMark Q96 MGMT kit (Qiagen, Courtaboeuf, France) on a PSQ™96 MA system (Biotage, Uppsala, Sweden).
In a pilot study involving 95 samples we measured DNA methylation of 16 sequences, running each assay in duplicate, and evaluated the correlation between the two runs, and with the Illumina chip data. The 16 markers showed correlation coefficients >0.7, including all 7 of the markers selected for testing against T2D in the main replication effort. Given the high reproducibility, we were able to run each assay once, rather than in duplicate. Methylation at ABCG1 showed a bimodal pattern due to an artefact (the presence of a SNP downstream of the CpG position of interest). To overcome this technical problem we designed a new reverse primer that excluded the SNP.


-Method development for quality control and data analysis
The Illumina Infinium HumanMethylation450 BeadChip (450K methylation array) makes it possible to measure DNA methylation on a genome-wide scale [51]. However, the 450K methylation array architecture is highly complex as it includes multiple different probe types, each using different chemistry. Furthermore, the methylation assay involves multiple steps that introduce assay variability and batch effects. Multiple methods have been proposed for analysis of the complex data generated by the 450K methylation array [52-58]; however, there is currently no consensus on the optimal analysis pipeline. We therefore developed a comprehensive approach to the analysis of 450K methylation array data. Our pipeline, termed CPACOR (incorporating Control Probe Adjustment and reduction of global CORrelation), outperforms published methods, and provides a blueprint for the analysis of large-scale Epigenome-Wide Association Studies (EWAS) [9].

.Initial quantification and quality control
We analysed two DNA methylation datasets: the population study of type-2 diabetes comprising 2,687 South Asian samples, and the technical replication dataset comprising 36 samples measured in duplicate. Initial and repeat sample analyses were carried out in separate batches, thereby maximising the impact of technical factors.
We performed an initial top-level quality control following analysis recommendations given by Illumina. This led to the exclusion of 22 samples (sample call-rate <98% or incorrect gender). Because distributions for methylation values differ between autosomal and sex chromosome markers we analysed these separately. Markers that are predicted to cross-hybridise [59], that have a SNP in the probe sequence, or that measure methylation at non-CpG sites were retained but flagged.

.Evaluating the detection P-value threshold
We initially used a detection P-value of P<0.05 to call methylation markers, as recommended by Illumina. We noted though that calculated detection P-values reported by minfi [56] range from 1 to 2.2 x 10^-16, with values lower than 2.2 x 10^-16 reported as zero. To investigate the impact of detection P-value threshold, we first evaluated call rates on the Y-chromosome amongst females in the population study; we found that >50% of Y-chromosome markers had non-zero call rates in females. As females have no Y-chromosome, this strongly suggests that the default detection P-value (P<0.05) is not sufficiently stringent, which leads to spurious results. When the detection P-value threshold is lowered to P<10^-16 the proportion of Y-chromosome markers with non-zero call rate in females is reduced from 55% to 10.0%. The majority of these remaining markers represent previously unidentified cross-hybridising probes. The more stringent detection threshold has no material effect on Y-chromosome calling in males.
We also show that this observation applies to autosomal markers by comparing results for the 36 samples that were measured in duplicate. We observe a higher correlation (P<10^-11) between duplicate pairs when a detection P-value threshold of P<10^-16 is applied compared to a threshold of P<0.05. This provides further evidence for improved quantification of methylation with a more stringent detection P-value threshold. Based on our results, we suggest P<10^-16 as the detection P-value threshold, providing high accuracy with minimal loss of data [9].
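The Y-chromosome check described above can be illustrated with a short sketch. It assumes that a measurement is "called" when its detection P-value is below the chosen threshold; the function names, marker identifiers and detection P-values are hypothetical toy data, not study results.

```python
from typing import Dict, Sequence


def call_rate(detection_p: Sequence[float], threshold: float) -> float:
    """Fraction of samples in which a marker is called (detection P < threshold)."""
    if not detection_p:
        raise ValueError("no detection P-values supplied")
    return sum(p < threshold for p in detection_p) / len(detection_p)


def proportion_markers_called(detp_by_marker: Dict[str, Sequence[float]],
                              threshold: float) -> float:
    """Proportion of markers with a non-zero call rate across the samples."""
    called = sum(call_rate(p, threshold) > 0 for p in detp_by_marker.values())
    return called / len(detp_by_marker)


# Toy detection P-values for Y-chromosome markers in three female samples.
# Values reported as zero stand for P below the numerical reporting limit.
female_y = {
    "cgY1": [0.8, 0.03, 0.6],
    "cgY2": [0.4, 0.9, 0.7],
    "cgY3": [0.0, 0.0, 0.2],   # behaves like a cross-hybridising probe
    "cgY4": [0.01, 0.5, 1e-8],
}
for threshold in (0.05, 1e-16):
    print(threshold, proportion_markers_called(female_y, threshold))
```

In this toy example the lenient threshold calls three of the four Y-chromosome markers in females, whereas the stringent threshold leaves only the probe that behaves as if it cross-hybridises.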

.Data normalisation
Data normalisation is a standard way of reducing technical biases across measurements in microarray data analysis. However, there is no established consensus normalisation approach for the 450K methylation array. We therefore assessed the performance of ten different normalisation methods [52, 55, 59-62] using the relationship between beta-values for the 36 samples measured in duplicate. The highest correlations between the paired measurements of methylation were achieved after quantile normalisation of intensity values for markers, subdivided by probe type, probe sub-type and colour channel (Figure 1). Other approaches performed significantly worse, including some that showed little or no improvement compared to non-normalised data.
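As an illustration of the best-performing approach, a minimal quantile normalisation sketch is given below. It assumes it is applied separately to the intensities of each probe category (probe type, probe sub-type and colour channel); the function name and toy values are hypothetical, and ties are broken by position rather than averaged.

```python
from typing import List


def quantile_normalise(samples: List[List[float]]) -> List[List[float]]:
    """Give every sample the same intensity distribution.

    samples[s][m] is the intensity of marker m in sample s; all samples
    must cover the same markers (one probe category at a time).
    """
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all samples must have the same number of markers")
    # Reference distribution: mean of the k-th smallest value across samples.
    sorted_samples = [sorted(s) for s in samples]
    reference = [sum(s[k] for s in sorted_samples) / len(samples) for k in range(n)]
    normalised = []
    for s in samples:
        order = sorted(range(n), key=lambda i: s[i])  # marker indices, lowest first
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]  # replace value by reference value of same rank
        normalised.append(out)
    return normalised


print(quantile_normalise([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]))
```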
Correlation between technical replicates may not assess over-normalization. To quantify the ability to detect true signal after each normalisation method, we therefore performed spike-in simulations based on the population study. For that purpose we randomly assigned case-control status to all samples. For 100 randomly selected markers, beta-values were increased (“spiked”) in the case samples. To measure the performance of each normalisation method, we then determined the proportion of the spiked markers that were ranked in the top 100 methylation markers by univariate regression analysis. As before, quantile normalisation of intensity values performs best. Whereas most methods lead to improved performance, some over-normalise, resulting in a reduction of true signal compared to no normalisation (Figure 2). On the basis of these results, which are in agreement with previous findings [63, 64], we performed quantile normalisation of intensity values for all samples in this study.
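The spike-in procedure can be sketched on simulated data as follows. This is not the study code: Welch's t statistic stands in for the univariate regression, the spike is a fixed additive shift, no normalisation step is applied, and all names and numbers are illustrative assumptions.

```python
import random
import statistics
from typing import List


def welch_t(a: List[float], b: List[float]) -> float:
    """Welch's t statistic for the difference in means between two groups."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return (statistics.fmean(a) - statistics.fmean(b)) / se


def spike_in_recovery(betas: List[List[float]], n_spiked: int,
                      delta: float, seed: int = 0) -> float:
    """Proportion of spiked markers ranked among the top n_spiked markers.

    betas[m][s] is the beta-value of marker m in sample s.
    """
    rng = random.Random(seed)
    n_markers, n_samples = len(betas), len(betas[0])
    is_case = [i < n_samples // 2 for i in range(n_samples)]
    rng.shuffle(is_case)  # random case-control assignment
    spiked = set(rng.sample(range(n_markers), n_spiked))
    scored = []
    for m, row in enumerate(betas):
        shift = delta if m in spiked else 0.0
        cases = [min(1.0, v + shift) for v, c in zip(row, is_case) if c]
        controls = [v for v, c in zip(row, is_case) if not c]
        scored.append((abs(welch_t(cases, controls)), m))
    top = {m for _, m in sorted(scored, reverse=True)[:n_spiked]}
    return len(top & spiked) / n_spiked


# Simulated beta-values (hypothetical): 200 markers x 40 samples.
data_rng = random.Random(1)
betas = [[data_rng.gauss(0.5, 0.05) for _ in range(40)] for _ in range(200)]
print(spike_in_recovery(betas, n_spiked=10, delta=0.2))
```

To compare normalisation methods in this framework, each method would be applied to the spiked data before the association test, and the recovery proportions compared with those obtained without normalisation.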

.Removal of technical biases
Based on normalised data, we used linear regression to compare the paired measurements of beta-values from the 36 samples measured in duplicate. We still observed a high degree of statistical inflation (λ=2.11 Figure 3) indicating residual strong systematic biases between the duplicates. As this analysis is based on duplicate measurements, this result is caused batch and other technical effects.
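The inflation factor λ quoted above is conventionally the median observed test statistic divided by its expected median under the null. A minimal sketch, assuming two-sided P-values converted to 1-degree-of-freedom chi-square statistics, is given below; the function name and example P-values are illustrative.

```python
from statistics import NormalDist, median
from typing import Sequence

_STD_NORMAL = NormalDist()
# Median of a chi-square distribution with 1 degree of freedom (about 0.455).
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def inflation_factor(p_values: Sequence[float]) -> float:
    """Lambda: median chi-square (1 d.f.) statistic over its null expectation."""
    if any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("P-values must lie in (0, 1]")
    # A two-sided P-value p corresponds to chi-square = z^2 with z = Phi^-1(p / 2).
    chi2 = [_STD_NORMAL.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


# Evenly spread P-values mimic the null; squaring them mimics inflation.
uniform = [(i + 0.5) / 1001 for i in range(1001)]
print(round(inflation_factor(uniform), 3))
print(round(inflation_factor([p * p for p in uniform]), 3))
```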
Existing methods for batch correction typically require knowledge of relevant experimental factors such as bisulfite conversion batch, array number, position on array, date or time [65]. These data may not be available, or where available may not accurately measure the technical bias. We therefore developed a new method to correct for technical biases in the 450K methylation data, termed Control Probe Adjustment (CPA). We first retrieved signal intensities for the 450K methylation array control probes, which assess multiple aspects of the chemistry involved in quantification of methylation, such as bisulfite-conversion efficiency (Table 1). To take into account the high degree of correlation between these control probes, we performed a Principal Component Analysis (PCA) of control probe intensities. The principal components (PCs) correlated closely with multiple technical parameters, including bisulfite batch and plates (Figure 4). We then applied CPA to the 36 samples measured in duplicate by including the PCs as linear predictors in the regression analysis. Adjustment for the first 30 PCs almost entirely removed test statistic inflation, consistent with effective correction for batch and technical effects (λ=1.01; Figure 3).
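The idea behind CPA can be sketched in Python on simulated data. This is a deliberately reduced illustration, not the published implementation. It extracts only the first PC of the control probes (by power iteration) and removes it from one marker by simple regression, whereas the analysis above adjusts for the first 30 PCs. All data, names and effect sizes are hypothetical.

```python
import random


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5


def first_pc_scores(matrix, n_iter=100):
    """Sample scores on the first principal component (power iteration)."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    col_means = [sum(row[j] for row in matrix) / n_rows for j in range(n_cols)]
    x = [[row[j] - col_means[j] for j in range(n_cols)] for row in matrix]
    loading = [1.0] * n_cols
    for _ in range(n_iter):
        scores = [sum(v * w for v, w in zip(row, loading)) for row in x]
        loading = [sum(x[i][j] * scores[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = sum(w * w for w in loading) ** 0.5
        loading = [w / norm for w in loading]
    return [sum(v * w for v, w in zip(row, loading)) for row in x]


def residualise(y, covariate):
    """Residuals of y after simple linear regression on one covariate."""
    n = len(y)
    my, mc = sum(y) / n, sum(covariate) / n
    slope = (sum((c - mc) * (v - my) for c, v in zip(covariate, y))
             / sum((c - mc) ** 2 for c in covariate))
    return [(v - my) - slope * (c - mc) for c, v in zip(covariate, y)]


rng = random.Random(0)
batch = [rng.gauss(0, 1) for _ in range(60)]   # hidden technical factor
# Ten simulated control probes, each tracking the hidden factor with noise.
control_probes = [[(1 + 0.1 * j) * b + rng.gauss(0, 0.3) for j in range(10)] for b in batch]
# One methylation marker contaminated by the same technical factor.
marker = [0.5 + 0.05 * b + rng.gauss(0, 0.02) for b in batch]

pc1 = first_pc_scores(control_probes)
adjusted = residualise(marker, pc1)
before = pearson(marker, batch)    # strong correlation with the hidden factor
after = pearson(adjusted, batch)   # close to zero after adjustment
```

The point of the design is that the technical factor never has to be recorded: the control probes carry it, so their PCs can stand in for unknown batch variables.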

.Null hypothesis and global correlation patterns
To determine the P-value distribution under the null hypothesis we randomly shuffled the case-control status amongst all 2,664 samples of the population study. We then performed a logistic regression for each marker, and repeated this 1,000 times to give 1,000 sets of P-values under no association. Even though true association has been removed (by randomising case-control status) we observed substantial departure from the null expectation. This includes both overall statistical deflation for the majority of permutations, but also a small number of permutations with a high degree of statistical inflation (λmedian=0.96; λ2.5%tile=0.84; λ97.5%tile=1.46).
We hypothesised that this departure from the null expectation is caused by correlation between markers. This correlation reduces the number of independent tests and may explain the apparent deflation of P-values. To test this hypothesis, we randomly reassigned beta-values for each marker to re-establish independence between markers. This effectively abolished test-statistic deflation and revealed a narrow prediction interval around the expected value (λmedian=1.00; λ2.5%tile=1.00; λ97.5%tile=1.01).
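The permutation experiment can be sketched in Python on simulated data. Several simplifications are assumptions of this sketch: a squared pooled t statistic replaces logistic regression, λ is taken as the median statistic divided by the 1-df chi-square median, and the marker counts and number of permutations are far smaller than in the study.

```python
import random
from statistics import median

CHI2_1DF_MEDIAN = 0.4549364  # median of chi-square with 1 df


def t_squared(row, is_case):
    """Squared pooled t statistic (approximately 1-df chi-square under the null)."""
    cases = [v for v, c in zip(row, is_case) if c]
    controls = [v for v, c in zip(row, is_case) if not c]
    n1, n0 = len(cases), len(controls)
    m1, m0 = sum(cases) / n1, sum(controls) / n0
    ss = sum((v - m1) ** 2 for v in cases) + sum((v - m0) ** 2 for v in controls)
    pooled_var = ss / (n1 + n0 - 2)
    return (m1 - m0) ** 2 / (pooled_var * (1 / n1 + 1 / n0))


def lambda_interval(betas, n_perm, rng):
    """2.5th and 97.5th percentiles of lambda over label permutations."""
    n = len(betas[0])
    labels = [i < n // 2 for i in range(n)]
    lambdas = []
    for _ in range(n_perm):
        rng.shuffle(labels)  # randomise case-control status
        lambdas.append(median(t_squared(row, labels) for row in betas) / CHI2_1DF_MEDIAN)
    lambdas.sort()
    return lambdas[int(0.025 * n_perm)], lambdas[int(0.975 * n_perm)]


rng = random.Random(7)
shared = [rng.gauss(0, 1) for _ in range(40)]
# 60 markers that share a common factor, so they are strongly correlated.
correlated = [[s + rng.gauss(0, 0.3) for s in shared] for _ in range(60)]
# Reassign values within each marker to re-establish independence between markers.
independent = [rng.sample(row, len(row)) for row in correlated]

corr_lo, corr_hi = lambda_interval(correlated, 100, rng)
ind_lo, ind_hi = lambda_interval(independent, 100, rng)
```

In this toy setting the interval for the correlated markers is much wider than for the decorrelated markers, which mirrors the behaviour described above.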

.Factors driving global correlation patterns
Correlation between methylation markers may arise from technical and biological confounders. To better understand correlation patterns we carried out a PCA of the population study dataset. We then used the PCs to explore relationships of methylation to technical and biological factors. The first three PCs were strongly associated with multiple white-blood cell sub-populations. In addition to measured white-blood cell sub-populations, we generated a complementary set of white-blood cell sub-populations, which were estimated from the methylation data itself [66]. These accurately reproduce white-blood cell measurements (Pearson correlation coefficient r=0.56-0.82) but provide cell type proportions of four additional lymphocyte sub-populations. We also found significant correlations of PCs with age, but not with any other clinical variables.
Adjustment for biological factors reduced the correlation between markers and test statistic inflation. The greatest reduction resulted from adjustment for white-blood cell sub-populations (Table 2). To make a final correction for global covariation that is still unaccounted for by the biological factors included in the regression, we performed a final PCA. Adjustment for the first 5 PCs further reduced the correlation between markers (λmedian=1.00; λ2.5%tile=0.97; λ97.5%tile=1.05). On the basis of these results we calculated a 95% prediction interval and propose an epigenome-wide significance threshold of P<10^-7 that is consistent with ~470,000 independent tests [9].
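The correspondence between the proposed threshold and the number of tests can be checked with a short calculation. The sketch assumes a family-wise error rate of 0.05, which the text does not state explicitly.

```python
alpha = 0.05                    # assumed family-wise error rate
n_independent_tests = 470_000   # approximate number of tests, from the text
bonferroni_threshold = alpha / n_independent_tests
print(f"{bonferroni_threshold:.2e}")  # about 1.06e-07
```

The result, roughly 1.06x10^-7, is in line with the rounded threshold of 10^-7.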

.Impact on local correlation
Nearby CpG-sites (<1kb distance) tend to be correlated, as has been reported by previous studies [67, 68]. These are likely to reflect biologically functional units. We replicated these findings, and also show that our adjustments for technical and biological factors remove correlation between markers with a high genomic distance (>1kb). Correlation between markers in direct genomic neighbourhood (<1kb) is retained, supporting the view that our approach to data analysis preferentially removes the long-range correlations between markers that are more likely to be spurious.

.Performance
We used simulated case-control datasets to assess the performance of the CPACOR analysis pipeline (Table 3). Based on the spike-in approach described above, we show that the proportion of spiked markers achieving high rank is improved successively by each of the stages of our pipeline including quantile normalisation, adjustment for control probes, and adjustment for biological factors. These adjustments reduce statistical inflation and increase the power to identify true association signals.
We also used spike-in simulations to compare our analysis pipeline with published methods [52-58]. Most published methods could not be completed using datasets of >2,000 samples, even on a dedicated high-performance computing cluster with 2TB of RAM. In contrast our approach achieves improved computational performance through parallelisation. Although different methylation studies may require different approaches to analysis, results from spiked data of a smaller dataset (250 cases, 250 controls) indicate that CPACOR performs significantly better than published methods.

.Marker subtypes and sex chromosomes
We analysed the specific properties of three different marker categories: i. non-CpG markers, ii. cross-hybridising markers, and iii. markers with a SNP in the probe-sequence. We found very little evidence to suggest these markers decrease overall data-quality. Including them during quantile normalisation does not change correlation between technical duplicates (mean r=0.9979 in both cases). We recommend retaining, but flagging these markers.
We also analysed markers on sex-chromosomes in an analogous manner. Adjustment for technical and biological factors also reduces correlation between markers on the sex chromosomes, although to a lesser extent. This suggests a higher probability of both Type-1 and Type-2 errors during analysis of sex-chromosome data, compared to autosomal results.

Conclusions
We have developed a comprehensive analysis pipeline termed CPACOR based on data from the Illumina 450K methylation array. We show that the default detection P-value is insufficiently stringent to prevent spurious results, identify the optimal approach to data normalisation, describe a new, highly effective method for dealing with technical bias, and demonstrate the importance of accounting for biological confounders. Using case-control permutations we established an epigenome-wide significance threshold of P<10^-7 that is consistent with Bonferroni correction. Our approach significantly outperforms existing methods for identification of true association. Furthermore, our approach is scalable and, unlike many existing methods, capable of handling large-scale datasets involving several thousand samples. Our comprehensive set of instructions for the analysis of Illumina 450K methylation data will advance the ability of epigenome-wide association studies to accurately identify methylation quantitative trait loci for hypothesis-driven follow-up experiments.

-Results of the epidemiological analyses
.Clinical characteristics
Baseline characteristics of the 1,074 South Asian cases with incident T2D and the 1,590 controls are shown in Table 4. South Asians who developed T2D during follow-up differed significantly from controls in baseline characteristics, including higher body mass index, waist circumference and waist-hip ratio, and higher fasting glucose, insulin and HbA1c (P<0.001).

.Epigenome-wide association and replication
Epigenome-wide association analysis identified markers at 7 genetic regions that were associated with incident T2D at P<5x10^-7 (Figures 5, 6). In replication testing, 5 of the 7 sentinel methylation markers were associated with incident T2D at P<0.05 amongst both South Asians and Europeans. In combined analysis of epigenome-wide discovery and prospective replication data, all 5 markers reached epigenome-wide significance (P<10^-7) for association with incident T2D (Table 5). Relative risks for incident T2D between the top and bottom quartiles of methylation ranged from 1.72 to 2.14 (Figure 7). The 5 replicated markers also showed strong association with prevalent T2D amongst the 1,720 South Asian replication samples (P=6.8x10^-5 to P=9.7x10^-55; Table 6). Current knowledge concerning function of the genes nearest to the sentinel methylation markers is summarised in Table 7. The association of differential methylation at each locus with incident T2D was independent of baseline body mass index, waist-hip ratio, HOMA-IR, HOMA-B, and branched-chain and aromatic amino acid concentrations.

.DNA Methylation as a predictor of T2D
In a combined model, the leading methylation markers were each independently associated with incident T2D. We show that a methylation risk score (MRS) combining results for the 5 markers was highly predictive of future T2D amongst South Asians (relative risk for Q4 vs Q1: 3.5; P=10^-26). The MRS replicated in the independent sample of Europeans with incident T2D, with no evidence for heterogeneity of effect with South Asians (P>0.1). In sensitivity analyses we examined the relationship of methylation score for prediction of T2D amongst i. South Asians without pre-diabetes (glucose<6mmol/l, HbA1c<6%) and ii. Indian Asian cases and controls closely matched for conventional T2D risk factors. In both instances methylation score remained strongly associated with T2D (no pre-diabetes: P=2.3x10^-19; matched: P=2.5x10^-8).
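The report does not state how the 5 markers are combined into the score. The Python sketch below assumes a weighted sum of standardised methylation values and compares the proportion of cases in the top and bottom score quartiles. The weights, data and function names are hypothetical, and the quartile comparison treats the data as a simple cohort.

```python
from statistics import mean, pstdev


def standardise(values):
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def risk_score(marker_values, weights):
    """Weighted sum of standardised markers; one list per marker, same sample order."""
    z = [standardise(v) for v in marker_values]
    return [sum(w * zm[i] for w, zm in zip(weights, z)) for i in range(len(z[0]))]


def quartile_relative_risk(scores, is_case):
    """Proportion of cases in the top score quartile divided by that in the bottom."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    q = len(order) // 4
    bottom, top = order[:q], order[-q:]
    risk_top = sum(is_case[i] for i in top) / q
    risk_bottom = sum(is_case[i] for i in bottom) / q  # must be non-zero
    return risk_top / risk_bottom


# Toy data: two markers, eight samples (hypothetical values and weights).
marker_a = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80]
marker_b = [0.30, 0.35, 0.33, 0.40, 0.45, 0.50, 0.55, 0.60]
is_case = [1, 0, 0, 0, 0, 1, 1, 1]

scores = risk_score([marker_a, marker_b], weights=[1.0, 1.0])
rr_q4_vs_q1 = quartile_relative_risk(scores, is_case)
```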
Area under the curve (AUC) for the T2D methylation score was 0.640 which is similar to that for body mass index or waist-hip ratio (AUCBMI=0.625; AUCWHR=0.648). Addition of DNA methylation improves both positive predictive value (PPV) and sensitivity for future T2D, compared to models that include family history, physical activity, body mass index, waist-hip ratio, glucose and HbA1c (Table 8).
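For reference, an AUC of this kind can be computed as the probability that a randomly chosen case has a higher score than a randomly chosen control. The Python sketch below uses that rank-based definition; the scores are hypothetical and the report does not specify how its AUC values were calculated.

```python
def auc(scores, is_case):
    """Rank-based AUC: P(case score > control score), counting ties as half."""
    cases = [s for s, c in zip(scores, is_case) if c]
    controls = [s for s, c in zip(scores, is_case) if not c]
    wins = sum((a > b) + 0.5 * (a == b) for a in cases for b in controls)
    return wins / (len(cases) * len(controls))


# Hypothetical scores for two controls followed by two cases.
example_auc = auc([0.10, 0.40, 0.35, 0.80], [0, 0, 1, 1])
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect separation of cases from controls.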
We evaluated DNA methylation as a predictor of T2D amongst the 1,932 South Asians who were normoglycaemic at baseline (HbA1c<6% and fasting glucose <6mmol/l). Future risk of T2D is up to 5-fold higher amongst South Asians in the highest quartile vs lowest quartile of methylation (Figure 8), such that DNA methylation strongly discriminates between South Asians at high risk of future T2D (>20% over 10 years) vs those at low risk (<5% over 10 years). DNA methylation substantially improves PPV and sensitivity for future T2D amongst obese normoglycaemic South Asians, compared to models that do not include methylation.

-Functional genomics and bioinformatic analyses
.The relationship between DNA methylation and gene expression
The relationship between methylation and gene expression in peripheral blood leucocytes was investigated in samples from three cohorts: LOLIPOP (South Asians, N=907), KORA (Europeans, N=703) and EnviroGenoMarkers (EGM; Europeans, N=591).

LOLIPOP. Details on the LOLIPOP cohort and methylation analysis have been described earlier. Gene expression analysis was performed with the Illumina HumanHT-12 v4 BeadChip array according to the manufacturer's protocol. Background correction using negative controls was performed, and the data were subsequently quantile normalised and log2 transformed [69, 70]. Linear models were fitted with log transformed gene expression as response variable, and with quantile-normalised beta values (methylation), age, sex, the top 24 control probe PCs from methylation measurement, and technical covariates related to the expression measurement (RIN, RNA extraction batch, RNA conversion batch, scanning batch, array and array position) as covariates. Calculations were performed using R, version 3.0.1.

KORA. Details on the KORA cohort have been described earlier. For a subset of 703 KORA F4 subjects, both methylation (Illumina 450k) and gene expression data were available. Gene expression analysis was performed by the Illumina HumanHT-12 v3 BeadChip array, with blood sample collection, RNA isolation and preparation as well as gene expression measurement described in detail elsewhere [71]. Linear models were fitted with log transformed gene expression as response variable, and DNA methylation, age, sex, smoking state (categorized as smoker, former smoker and never smoker), physical activity (categorized as active, inactive), alcohol intake, the top 20 control probe PCs from methylation measurement and three technical covariates related to the expression measurement, namely amplification batch, sample storage time and RNA integrity number (RIN) as covariates [71]. Calculations were performed using R, version 3.0.1.

EGM. The EnviroGenoMarkers (EGM) project is a nested case-control study of incident breast cancer and B-cell leukaemia [72]. Methylation and gene expression were quantified in the baseline blood samples collected 1-17 years prior to disease onset. Transcriptomic profiles were obtained using the Agilent 4x44K Whole Human Genome Microarray and subjected to extensive quality control procedures. DNA methylation profiles were obtained using the 450K array according to the manufacturer's protocol. Bisulphite conversion was carried out using the Zymo EZ DNA Methylation Kit. Probes that had missing values in more than 20% of the samples were excluded. The final dataset consisted of 29,662 transcripts and 432,633 DNA methylation probes. We used linear regression to determine the association between methylation and expression of nearby genes (within 1 Mb). We inferred statistical significance at P<0.05 after correction for the number of marker-eQTL comparisons made.
We investigated the relationships between methylation at the 5 CpG sites and expression of the nearest gene. In peripheral blood, methylation is strongly associated with expression amongst both South Asians and Europeans (P=3.8x10^-3 to 3.8x10^-21) at 2 of the loci, and also shows evidence for association with expression at 2 other loci (Table 9).

.Cross-tissue patterns of DNA methylation
We examined cross tissue patterns of methylation using publicly available data (GSE48472) from the Gene Expression Omnibus database [73]. This dataset comprises 41 samples from blood and 6 metabolically relevant tissues (liver, muscle, pancreas, subcutaneous fat, omentum, spleen) analysed using the 450K methylation array. We found tissue-specific patterns of DNA methylation. For instance, at two of the loci, DNA methylation is high in blood, fat, spleen, liver and pancreas, but lower in skeletal muscle.

.Heritability and genetic variants influencing methylation
Heritability of methylation markers was assessed amongst 615 UK South Asians from 89 families, ascertained based on a history of coronary artery disease in one or more family members. Family size varied from 3 to 26. Participants were aged 14-80 years, and underwent structured assessment of cardiovascular and metabolic health, including anthropometry, and fasting blood samples for glucose, lipid profile, HbA1c and complete blood count. Aliquots of whole blood were stored at -80°C for extraction of genomic DNA. Characteristics of the South Asian family study participants are shown in Table 10. Results demonstrate that methylation at all 5 loci shows significant heritability (range 0.30 to 0.61), consistent with trans-generational effects.

.Fine-mapping of the region
The 450K array assays <2% of the estimated ~30M CpG sites in the human genome. To better describe the patterns of regional methylation we carried out re-sequencing of the leading locus in 172 samples. We initially used sequence capture and next generation sequencing to assay 99 of the 133 predicted CpG sites within 5kb of the sentinel methylation marker on chr 1. We supplemented this with pyrosequencing of 14 CpG sites adjacent to the sentinel marker, both to improve local coverage and to replicate the findings from next-generation sequencing.
Primers were designed using Sequenom EpiDesigner BETA (www.epidesigner.com). Target DNA enrichment was done using the Fluidigm 48.48 Access Array IFC System, followed by PCR to attach sequence-specific adapters and sample barcodes. Pooled sequencing was done using the Illumina MiSeq platform (150bp paired-end runs). We then used the Burrows-Wheeler Aligner to map the directional, paired-end Illumina sequencing reads to the reference genome (hg19 build) [74], then quantified methylation from the frequencies of converted and unconverted cytosine residues observed in reads mapped to each CpG site.
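The final quantification step can be illustrated with a minimal Python sketch. After bisulfite treatment, methylated cytosines remain unconverted (read as C) and unmethylated cytosines are converted (read as T), so the methylation level at a CpG site is the unconverted fraction of reads. The function name and the minimum-coverage filter are assumptions of this sketch, not details from the report.

```python
def methylation_level(unconverted, converted, min_coverage=10):
    """Fraction of reads showing an unconverted (methylated) cytosine at a CpG site.

    min_coverage is a hypothetical filter; sites with fewer reads return None.
    """
    coverage = unconverted + converted
    if coverage < min_coverage:
        return None
    return unconverted / coverage


level = methylation_level(unconverted=30, converted=10)   # 0.75
too_shallow = methylation_level(unconverted=2, converted=1)
```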
We quantified the pairwise correlation between methylation at the sentinel CpG site with each of the additional CpG sites assayed. We carried forward 8 CpG sites showing r>0.5 with the sentinel marker for pyrosequencing amongst 238 incident T2D cases and 382 controls selected at random from the discovery samples, to quantify their association with T2D (logistic regression) both as single markers and in aggregate (mean methylation across the sites assayed).
Resequencing reveals a cluster of 8 CpG sites in the 3’UTR of the gene which show methylation that correlates closely with methylation at the sentinel marker (Figure 9). Average methylation across these 8 CpG sites is closely associated with risk of future T2D, and this regional association is stronger than for any individual CpG site (relative risk per 1 SD change: discovery marker 1.29±0.07, P=5.2x10^-3; regional score 1.38±0.07, P=7.9x10^-4).
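The site selection and regional averaging described in this section can be sketched in Python as follows. The r>0.5 cut-off is taken from the text; the site names and methylation values are hypothetical.

```python
def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5


def regional_score(sentinel, sites, r_threshold=0.5):
    """Select sites correlated with the sentinel and average them per sample."""
    selected = [name for name, values in sites.items()
                if pearson(sentinel, values) > r_threshold]
    n_samples = len(sentinel)
    means = [sum(sites[name][i] for name in selected) / len(selected)
             for i in range(n_samples)]
    return selected, means


# Toy methylation values for four samples (hypothetical).
sentinel = [0.2, 0.4, 0.6, 0.8]
sites = {
    "cpg_a": [0.25, 0.35, 0.65, 0.75],   # tracks the sentinel closely
    "cpg_b": [0.80, 0.20, 0.70, 0.30],   # does not track the sentinel
    "cpg_c": [0.10, 0.50, 0.50, 0.90],   # tracks the sentinel
}
selected, regional = regional_score(sentinel, sites)
```

The per-sample regional mean can then be tested for association with T2D in the same way as a single marker.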

-DNA Methylation and the increased risk of T2D in South Asians
The incidence of new-onset T2D was 11.8% amongst South Asians and 4.3% amongst Europeans in the LOLIPOP cohort over a mean of 8 years of follow-up. The age- and sex-adjusted relative risk of T2D is 3.4-fold [3.0-3.9] higher amongst South Asians than Europeans (P=1.1x10^-77), and remains 2.5-fold [2.2-2.9] higher after further adjustment for conventional risk factors.
Methylation levels at 4 of the 5 loci identified are unfavourable amongst South Asian compared to European controls (Figure 10). At each locus the direction for differential methylation amongst South Asians was concordant with the direction associated with an increased risk of T2D. In multivariate analysis, DNA methylation score is 0.86±0.06 SD higher amongst South Asians than Europeans after adjustment for age, gender and conventional risk factors (0.05±0.04 vs -0.82±0.04 SD, P=10^-34). This difference in methylation score translates to ~32% [i.e. ln(1.34)/ln(2.5)] of the unexplained excess 2.5-fold risk of T2D amongst South Asians compared to Europeans.
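The bracketed expression can be evaluated directly. The sketch below takes both relative risks as given in the text: 1.34 is the value quoted in the expression (its derivation is not shown in the report) and 2.5 is the adjusted excess risk. The ratio of their logarithms gives the share of the excess risk on the log scale.

```python
import math

rr_from_methylation = 1.34   # value quoted in the text's expression
rr_unexplained = 2.5         # adjusted excess risk, South Asians vs Europeans

fraction_explained = math.log(rr_from_methylation) / math.log(rr_unexplained)
print(f"{fraction_explained:.1%}")  # about 31.9%
```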

Conclusions
We report the results of a large, prospective, epigenome-wide study investigating the relationship between DNA methylation and future T2D. Differential methylation of genomic DNA at five genetic loci predicts incident T2D in both South Asians and Europeans. A DNA methylation score identifies an almost four-fold higher risk of future T2D between upper and lower quartiles. Fine-mapping of the top-ranking locus reveals multiple additional methylation markers that further improve prediction of T2D.
Methylation of DNA at CpG sites regulates gene expression and mediates the biological response to environmental exposures [5-8, 75-77]. The methylation markers identify genes in key pathways underlying T2D and associated metabolic defects. At the leading locus the identified gene is known to be highly glucose sensitive, to regulate cellular glucose entry via GLUT1, and to protect against mitochondrial oxidative stress. The gene may be involved in glucose-induced apoptosis in pancreatic beta-cells, and may contribute to regulation of adiposity and energy expenditure in the hypothalamus. At other loci the genes identified are involved in hepatic lipogenesis, cholesterol and phospholipid transport, and insulin secretion. Animal models have impaired glucose tolerance and insulin secretion with normal insulin sensitivity. The genes may contribute to the dyslipidaemia and hepatic steatosis seen in obesity, insulin resistance and T2D.
Our results also show that DNA methylation in peripheral blood reflects methylation status in metabolically active tissues, but that the relationships between DNA methylation and gene expression are tissue specific. Our findings pave the way for functional studies to define the pathways linking DNA methylation to T2D and its metabolic disturbances.
South Asians are at ~3-fold higher risk of T2D compared to Europeans [3, 4]. Rapid urbanisation and demographic transition on the Indian subcontinent will lead to more than 100 million people being affected by T2D in India alone by 2030 [1]. The reasons underlying the excess risk of T2D amongst South Asians remain unknown. Body mass index, central adiposity, diet and physical activity do not explain the higher risk of T2D amongst South Asians [16, 17]. Genome-wide association studies amongst people of South Asian ancestry do not identify population-specific variants associated with T2D [4, 78, 79]. Instead, they find cosmopolitan variants with similar effect size and risk allele frequency to Europeans, excluding a role for known genetic variants underlying the excess risk of T2D in South Asians. The lack of knowledge of the mechanisms underlying T2D in South Asians represents a major obstacle to the development of effective measures aimed at reducing the burden of T2D in this population.
In the current study we show that methylation is unfavourable at 4 of the 5 genetic loci identified, amongst South Asians compared to Europeans. The differences we observe in methylation at the five loci identify ~1/3rd of the unexplained increased risk of T2D amongst South Asians compared to Europeans. Whether there are population-specific DNA methylation markers predicting T2D in Europeans remains to be investigated. We also show that the DNA methylation markers identified are highly heritable. Our results provide evidence that DNA methylation is a potential novel transgenerational mechanism underlying risk of T2D.
We show that DNA methylation improves prediction of future T2D compared to traditional measures, as estimated by area under the curve and positive predictive value. The ability of DNA methylation to discriminate risk of T2D is particularly striking in obese, normoglycaemic South Asians, amongst whom methylation enables identification of a subset of obese individuals with high (>20%) or low (<5%) incidence of T2D during follow-up. DNA methylation may therefore help identify people who may benefit from early pharmacologic or lifestyle interventions.
We conclude that DNA methylation in blood is a strong and independent predictor of future T2D. Methylation markers identify increased risk of T2D amongst South Asians compared to Europeans. Our findings provide the basis for development of new strategies to tackle the emerging global epidemic of T2D.

Potential Impact:

Overview
The EPI-MIGRANT study aimed to understand the epigenetic, genetic and environmental factors underlying the high prevalence and incidence of type-2 diabetes in South Asians, one of the largest ethnic minority groups in Europe and who show a high burden of diabetes compared to their non-migrant counterparts. The EPI-MIGRANT study was specifically tailored to the needs, priorities and impacts outlined by the call (HEALTH.2011.2.4.3-4) and has delivered on each of its planned strategic impacts including:
•We identify 5 novel epigenetic markers that strongly predict diabetes in South Asians, thus laying the foundations for improved diagnosis and treatment, and potentially the development of novel therapeutic targets.
•We have evaluated both environmental and multigenerational exposures underlying diabetes in South Asians; our research may therefore impact on future strategies to improve health through lifestyle and related interventions.
•We have built and extended a cooperation between the national projects from partner countries, thus ensuring a global impact beyond the specific South Asian populations studied.
•We have engaged the collaboration of an SME with particular interest and expertise in biomarker discovery, and diagnostic / therapeutic innovation, thereby providing opportunities to improve innovation and competitiveness in European health-related industries and businesses.
•To extend upon our findings, and to translate them for clinical benefit in risk stratification and personalised medicine, the consortium have two H2020 proposals awarded / under review.

Better understanding of risk factors underlying diabetes in South Asians
Diabetes represents a major and growing threat to health and well-being among South Asians, as they migrate from rural to urban areas, and in regional settings around the world. Though diabetes among South Asians is more prevalent among the affluent, recent data show that diabetes rates are rapidly rising among low and middle income South Asians, who are also more susceptible to diabetes complications, due to reduced access to quality health care in these settings. Known environmental and genetic risk factors do not explain more than a small part of the increased diabetes among South Asians, who comprise a quarter of the world's population. This represents a major health inequality affecting a large population, many of whom live in poor conditions.
Development of effective risk prediction tools for diabetes among South Asians necessitates understanding of the mechanisms underlying the high rates of diabetes in this population, and the factors that make South Asians susceptible to complications of diabetes. Though the causes of diabetes are poorly understood, it is widely recognised that the disease is characterised by an inadequate beta-cell response to the progressive insulin resistance that typically accompanies advancing age, physical inactivity, and weight gain. However, the mechanisms that underlie individual susceptibility to diabetes remain obscure. The failure to understand the pathophysiological mechanisms underlying diabetes presents a major obstacle to efforts to develop improved risk prediction tools, and new preventive and therapeutic strategies.
In this study we have defined, for the first time, the contribution of epigenetic, genetic and environmental risk factors to risk of T2D amongst South Asians. Our study design had major strengths, including: i. unique, well characterised, well powered prospective population studies of South Asian populations; ii. a technologically advanced epigenome array; and iii. validation of a predictive panel of lifestyle, genetic, environmental, and epigenetic markers increasing diabetes risk, in further prospective population studies, and in other population groups.
We show that differential methylation of genomic DNA at five genetic loci predicts incident T2D in Indian Asians. Results replicate amongst Europeans. We show that a DNA methylation score identifies an almost four-fold higher risk of future T2D between upper and lower quartiles. We show that DNA methylation improves prediction of future T2D compared to traditional measures, as estimated by area under the curve and positive predictive value. The ability of DNA methylation to discriminate risk of T2D is particularly striking in obese, normoglycaemic Indian Asians, amongst whom methylation enables identification of a subset of obese individuals with high (>20%) or low (<5%) incidence of T2D during follow-up. We separately show that methylation is unfavourable amongst Indian Asians compared to Europeans. The differences we observe in methylation at the five loci identify ~1/3rd of the unexplained increased risk of T2D amongst Indian Asians compared to Europeans. Results of this study enable development of a predictive panel of lifestyle, genetic, environmental, and epigenetic markers underlying susceptibility to diabetes in South Asians. Results of EpiMigrant show that DNA methylation may therefore help identify people who may benefit from early pharmacologic or lifestyle interventions, findings of huge potential translational importance for risk stratification and personalised medicine.

Better understanding of molecular mechanisms underlying diabetes in South Asians
The bioinformatic and gene expression data generated in this study show that the methylation markers identify genes in key pathways underlying T2D and associated metabolic defects. This includes genes involved in regulating cellular glucose entry via GLUT1; protection against mitochondrial oxidative stress; pancreatic beta-cell apoptosis; regulation of adiposity and energy expenditure; cholesterol and phospholipid transport; insulin secretion; regulation of hepatic lipogenesis; hepatic steatosis and insulin resistance. Our findings provide potential new insights into the disease mechanisms underlying diabetes, and may enable development of new pharmacological strategies for T2D prevention. Our results show that DNA methylation in peripheral blood reflects methylation status in metabolically active tissues, but that the relationships between DNA methylation and gene expression are tissue specific. Our findings pave the way for functional studies to define the pathways linking DNA methylation to T2D and its metabolic disturbances.
We also show that the DNA methylation markers identified are highly heritable. Our results provide evidence that DNA methylation is a potential novel transgenerational mechanism underlying risk of T2D. Furthermore, fine-mapping of the top-ranking locus reveals multiple additional methylation markers that further improve prediction of T2D. These findings pave the way for future clinical and molecular studies to define the mechanisms underpinning transgenerational inheritance, as well as systematic mapping of risk loci to define not only the optimal marker set that provides the best predictive value but also further insights into molecular mechanism.
Identification of molecular mechanisms increasing susceptibility to diabetes will inform understanding of the processes and pathways involved in disease pathogenesis, and enable development of accurate markers that predict risk of disease, and provide opportunities for drug development, aimed at reducing the burden of disease.

Improving the health of South Asians in Europe and other regions of the world.
EpiMigrant focused on an innovative epigenome-wide search for biomarkers of diabetes among South Asians, and explored the interactions of epigenetic, genetic, environmental, and lifestyle factors, to quantify their contribution to diabetes among South Asians. Our integration of genetic, epigenetic, clinical and biochemical information into risk stratification algorithms enables more precise classification of diabetes and its risk than has been previously possible, as well as providing prognostic and therapeutic benefits for the patient. By improving the predictive tools, and hence prevention and therapeutic options for patients with diabetes, EpiMigrant research will have a strong impact on improving the health of the South Asian population.
Results from this research will also guide specific interventions based on individual diabetes risk profiles. Early identification of diabetes risk, and appropriate treatment, will reduce morbidity associated with diabetes, with improved quality of life, and significant impact on continuing employment and reducing absence from work. The preventative strategies emerging from this research are likely to have a favourable impact in extended families of South Asians, and will enable interventions such as lifestyle change to be implemented across generations.

Translation of scientific discoveries into clinical practice
EpiMigrant researchers represent world-leaders in diabetes research, with a proven track record of bringing biomarker discoveries into day-to-day clinical practice, specifically in diabetes. EpiMigrant research has already shown its potential for clinical translation to improve existing predictive diabetes models. In addition, our identification of molecular mechanisms increasing susceptibility to diabetes has informed understanding of the processes and pathways involved in disease pathogenesis, and has led to close interaction between leading basic and clinical diabetes researchers and clinicians engaged in the consortium, and beyond.
Identification of epigenetic mechanisms in cancer has already led to the development of disease markers and epigenetic therapies, which are already in clinical use. Given that epigenetics is at the heart of phenotypic variation in health and disease, it seems likely that understanding and manipulating the epigenome holds enormous promise for preventing and treating common diseases such as diabetes. Epigenetics also offers an important window to understanding the role of the environment's interactions with the genome in causing disease, and in modulating those interactions to improve human health. This is of particular relevance to diabetes in South Asians, among whom rates increase following migration from rural to urban settings, and overseas.

Strengthening diabetes research in South Asians
Identification of the epigenetic, genetic and environmental mechanisms underlying diabetes in South Asians, including the complex interaction between these processes, necessitates a multidisciplinary approach with high-level expertise. The consortium brought together diabetes research groups from Europe with individual excellence in the fields of diabetes, epigenomics, genomics, epidemiology, bioinformatics, statistics and translational research. The project therefore included researchers from three European Member States, International Cooperation Partner Countries (India, Sri Lanka and Mauritius) and other international experts in diabetes and epigenetics (USA, Australia).
A major impact of this proposal is the bringing together of diabetes research groups from around the world who have individually assembled diabetes case/control collections, with detailed phenotypic characterisation, in different geographic settings. For the most part, diabetes research among South Asians has been carried out in isolation by various groups, and has predominantly focussed on research questions that were limited by the relatively small size of the individual cohorts. The project brings together the world's largest collection of well-characterised diabetes cases and controls, focussed on the key objectives of identification, development and validation of epigenetic markers underlying diabetes in South Asians, when they migrate from rural to urban settings, and in different settings around the world.
The success of the diabetes research collaboration is already measurable. The partners have recently been awarded the iHealth-T2D grant by Horizon 2020, which will evaluate new strategies for risk stratification and intervention to prevent T2D amongst migrant and non-migrant South Asians. In addition, EpiMigrant underpins the Toast-T2D application, which is presently under review by H2020 and which seeks to use the molecular markers for personalised medicine to prevent T2D in South Asians. These represent tangible evidence of the improved network of research into South Asian T2D.

-Impact at a European level
Diabetes research in the EU needs to maintain its international competitiveness. Remaining effective in research into complex diseases requires multidisciplinary academic, institutional and industrial collaborations. The EpiMigrant study brought state-of-the-art research with major methodological advances, a focus on translation to biomarker and drug target discovery to enhance EU SMEs, and strong interaction between clinicians and researchers, and between academia and industry.
Results of this project will separately enhance European competitiveness through transfer from research into commercially successful products in the field of novel biomarkers predicting diabetes in South Asians. CellCentric, our SME partner, made an important contribution to EpiMigrant, and in particular provided guidance on assay development and the prospects for commercialisation and intellectual property. The project innovations will provide a strong basis for growth of this biotechnology company. The market potential and reinforcement of competitiveness would be immense for therapies that reduce the morbidity and mortality of a common disorder affecting South Asians, who form ~1/4 of the world’s population.
EpiMigrant has also made considerable contributions to the knowledge base in terms of novel biomarkers for diabetes. Specifically, we identify a predictive panel of epigenomic biomarkers that validate in Europeans. Our research provides the justification for a systematic search for DNA methylation markers that predict T2D amongst Europeans, and shows that our results are likely to have wider global implications for populations beyond South Asians.


-Dissemination of the project results
In order to achieve the main objective of this project, identification of lifestyle, environmental, genetic and epigenetic markers increasing susceptibility to T2D in South Asians, EpiMigrant has disseminated its results to external stakeholders with the purpose of supporting multicentric studies for translational health research in Europe and elsewhere. All partners of EpiMigrant participate in dissemination to ensure European and international access to our results.
The research results of EpiMigrant have been submitted for publication in international scientific journals. To date, 5 published manuscripts have capitalised on the data generated in EpiMigrant. A further three are under review, and six more are in preparation. EpiMigrant results have been presented at numerous (~20) European, North American, South Asian and other international workshops and conferences in either poster or oral presentation format. Target audiences include policy makers, industry, public health practitioners, clinicians, researchers, service providers, patients and other stakeholders. Our EpiMigrant website comprises an open-access web page with the objectives of the project and the most relevant results. In addition, to create awareness about findings, EpiMigrant investigators have given presentations to meetings of colleague scientists, hospitals, general practitioners and the general public.

-Papers (published/under review)
Lehne et al. A coherent approach for analysis of the Illumina HumanMethylation450 BeadChip improves data quality and performance in epigenome-wide association studies. Genome Biology (2015; in press). We report the results of the comprehensive analysis pipeline for Epigenome-wide Association Studies (EWAS) developed within the EpiMigrant study. This outperforms existing approaches, enabling accurate identification of methylation quantitative trait loci for hypothesis-driven follow-up experiments. We describe new approaches to quality control, data normalisation and batch correction through control-probe adjustment, and demonstrate that these improve data quality. Using permutation testing we also establish a null hypothesis for EWAS, show how it can be affected by correlation between individual methylation markers and present methods to restore statistical independence.

Chambers et al. DNA methylation markers in peripheral blood predict incident Type-2 diabetes amongst Indian Asians and Europeans. Lancet Diabetes and Endocrinology (under review). We report the main results of the EpiMigrant epigenome-wide association study of incident T2D amongst South Asians. We describe methylation markers at 5 loci that were associated with future T2D, with a methylation score integrating results from the 5 loci displaying even stronger association independent of established risk factors. Methylation score was also higher amongst South Asians compared to Europeans, suggesting that these methylation markers may help to account for the increased risk of T2D amongst South Asians compared to Europeans, providing potential new opportunities for risk stratification and prevention of T2D amongst South Asians.
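
As an illustration only, a methylation score that integrates several loci could be sketched as a weighted sum of standardised methylation values. The summary above does not state how the EpiMigrant score was constructed, so the scoring rule, the marker names (locus_1 to locus_5), the weights and the methylation values below are all hypothetical assumptions, not the published method.

```python
from statistics import mean, pstdev


def methylation_score(samples, weights):
    """Weighted sum of z-standardised methylation values, one score per sample.

    samples: list of dicts mapping marker id -> methylation value (0..1)
    weights: dict mapping marker id -> weight (hypothetical, not from the study)
    """
    # Standardise each marker across the cohort so that loci are comparable.
    stats = {}
    for marker in weights:
        values = [sample[marker] for sample in samples]
        stats[marker] = (mean(values), pstdev(values))

    scores = []
    for sample in samples:
        total = 0.0
        for marker, weight in weights.items():
            mu, sd = stats[marker]
            # A marker with no variation carries no information; score it as 0.
            z = (sample[marker] - mu) / sd if sd > 0 else 0.0
            total += weight * z
        scores.append(total)
    return scores


# Hypothetical cohort of three people measured at five hypothetical loci.
example_weights = {f"locus_{i}": 1.0 for i in range(1, 6)}
example_cohort = [
    {"locus_1": 0.40, "locus_2": 0.55, "locus_3": 0.30, "locus_4": 0.70, "locus_5": 0.20},
    {"locus_1": 0.45, "locus_2": 0.60, "locus_3": 0.35, "locus_4": 0.72, "locus_5": 0.25},
    {"locus_1": 0.50, "locus_2": 0.65, "locus_3": 0.40, "locus_4": 0.75, "locus_5": 0.30},
]
print(methylation_score(example_cohort, example_weights))
```

In practice the weights would be estimated from the association analysis (for example, per-marker effect sizes) rather than set by hand, and the score would then be tested against T2D status with adjustment for established risk factors.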

Wood et al. Defining the role of common variation in the genomic and biological architecture of adult human height. Nature Genetics. 2014: 46; 1173-86. Data from EpiMigrant participants contributed to this pioneering large scale investigation of the genetic factors influencing height.
Hoggart et al. Novel approach identifies SNPs in SLC2A10 and KCNK9 with evidence for parent-of-origin effect on body mass index. PLoS Genetics 2014: 10; e1004508. Data from EpiMigrant participants contributed to this innovative study investigating parent-of-origin effects influencing adiposity traits in humans.

Dichgans et al. Shared genetic susceptibility to ischemic stroke and coronary artery disease: a genome-wide analysis of common variants. Stroke. 2013: 45; 24-36. Data from EpiMigrant participants contributed to this pioneering large scale investigation of the genetic factors influencing stroke.

Global lipids consortium. Discovery and refinement of loci associated with lipid levels. Nature Genetics. 2013: 45; 1274-83. Data from EpiMigrant participants contributed to this pioneering large scale investigation of the genetic factors influencing blood lipid levels, a major risk factor for cardiovascular disease.

-Posters/Oral presentations
Lehne B et al. Genome-wide Association Study of DNA-methylation identifies millions of associated loci and reveals global regulatory landscape. American Society of Human Genetics Meeting 2014, San Diego, CA, October 18-22, 2014.
We report a large-scale Genome-wide Association Study (GWAS) for methylation Quantitative Trait Loci (methQTL) to identify DNA sequence variants that influence DNA methylation. Methylation (450K) and genotype data measured in peripheral blood of 1,840 individuals of South Asian descent were used. The identified trans-methQTLs identify multiple SNPs with a widespread impact on DNA methylation across the entire genome, including SNPs in well-known regulatory genes such as CTCF, NFKB and the MHC region. This study provides a comprehensive and genome-wide view of genetic regulation of DNA methylation and constitutes a valuable resource for the understanding of molecular pathways and human disease.

Drong AW et al. Epigenome-wide meta-analysis of over 10,000 individuals reveals extensive perturbations in DNA methylation associated with adiposity. American Society of Human Genetics Meeting 2014, San Diego, CA, October 18-22, 2014. We describe an epigenome-wide association study (EWAS) to investigate the relationship of DNA methylation with body mass index (BMI) to examine the epigenetic perturbations associated with obesity. A total of four studies comprising 5,387 whole-blood samples from European (N=2,707) and South Asian (N=2,680; EpiMigrant data) individuals were included. The primary EWAS yielded 207 independent epigenetic loci associated with BMI at P<1x10^-7, with little evidence for heterogeneity between the populations. Replication testing amongst 4,998 whole-blood samples from 9 independent cohorts yielded 187 epigenetic loci that reached P<0.05 in replication testing and remained at P<1x10^-7 in combined analysis across all stages.

Wahl S et al. Downstream analyses and Mendelian randomization study on methylome-wide associations with BMI reveal biological pathways underlying BMI-related metabolic consequences. American Society of Human Genetics Meeting 2014, San Diego, CA, October 18-22, 2014. In an epigenome-wide study on body mass index (BMI), we have discovered and replicated 187 DNA methylation sites associated with body mass index in more than 10,000 individuals of European and South Asian ancestry. We integrated the identified methylation sites with single nucleotide polymorphism and gene expression data, with directionality of the observed associations explored by means of Mendelian randomization experiments. Methylation at 125 CpG sites showed strong genetic regulation. Mendelian randomization experiments suggest that methylation at the majority of CpG sites might be consequential to changes in BMI. Our findings provide new evidence of the biological pathways underlying adiposity and related metabolic disturbances.

Loh M et al. Importance of Batch and White Blood Cell Subtypes Correction in Analysis of Illumina Infinium 450K Methylation Arrays. American Society of Human Genetics Meeting 2013, Boston, MA, October 22-26, 2013. We report the impact of batch effect and variation in white blood cell (WBC) subsets on methylation association signals in studies of peripheral blood, and evaluate strategies for correcting for technical and biological confounding in epigenome-wide association studies (EWAS). Methylation in DNA extracted from peripheral blood from 1,072 people with incident Type-2 diabetes (T2D; cases) and 1,615 controls was measured using the 450K array, with 36 DNA samples analysed in duplicate. Amongst the duplicate samples, we observed strong association between replicate batch and methylation. Adjustment for 24 different in-built control probes on the array reduces the degree of statistical inflation, with the bisulfite conversion control showing the strongest effect. Simultaneous correction for all control probes corrects for the majority of the inflation. Adjustment for NK cells and monocytes showed the strongest effects.
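
For readers unfamiliar with the term, statistical inflation in association studies is commonly summarised by the genomic inflation factor (lambda): the median observed 1-degree-of-freedom chi-square statistic divided by its expected median under the null (about 0.455). The abstract above does not state which inflation metric was used, so the following is a minimal sketch of the conventional calculation with made-up p-values, not the EpiMigrant pipeline.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def inflation_factor(p_values):
    """Genomic inflation factor (lambda) from two-sided p-values in (0, 1]."""
    norm = NormalDist()
    # Convert each two-sided p-value to a 1-df chi-square statistic (z squared).
    chi2 = [norm.inv_cdf(1.0 - p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


# Illustrative input: evenly spaced p-values mimic a well-calibrated null,
# while squaring them mimics an excess of small p-values (inflation).
n = 1000
calibrated = [(i + 0.5) / n for i in range(n)]
inflated = [p * p for p in calibrated]
print(inflation_factor(calibrated))
print(inflation_factor(inflated))
```

A value close to 1 indicates well-calibrated test statistics; values well above 1 suggest confounding of the kind that adjustment for control probes and white blood cell subsets is intended to remove.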

Drong A et al. Quality Control and Data Normalisation in large Illumina Infinium HumanMethylation450 Datasets. American Society of Human Genetics Meeting 2013, Boston, MA, October 22-26, 2013. We describe methodology for epigenome-wide association studies (EWAS) of DNA methylation using the Illumina 450K platform. Using the EpiMigrant T2D EWAS dataset (N=2,687) and technical replication dataset (N=36 pairs), we find that using a Bonferroni-corrected threshold reduces the number of false methylation calls on the Y chromosome in females. We also found that our proposed normalisation method, which includes adjustment for residuals after fitting control probes together with batch and quantile normalisation, outperforms other methods.

Lehne B et al. Correlation and Null Hypothesis in Epigenome-wide Association Studies (EWAS). American Society of Human Genetics Meeting 2013, Boston, MA, October 22-26, 2013. In this poster, we analyse the p-value distribution that arises in an EWAS under no association based on 450K data for 2,660 individuals of South Asian descent from the EpiMigrant project by permutation testing of disease labels followed by association testing. Under no association we observe a substantial deflation of test statistics. We found that correlation between markers is the consequence of biological and technical confounders, each of which affect the methylation status of multiple markers simultaneously. Correlation between markers affects the null hypothesis underlying an EWAS and the analysis of EWAS data therefore requires careful adjustment for confounding factors.
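
The permutation procedure described above (shuffle disease labels, then repeat the association test) can be sketched as follows. This is a simplified illustration: the case-control difference in mean methylation stands in for the association test, which the abstract does not specify, and the marker name and all data values are hypothetical.

```python
import random
from statistics import mean


def mean_difference(values, labels):
    """Case-minus-control difference in mean methylation (1 = case, 0 = control)."""
    cases = [v for v, lab in zip(values, labels) if lab == 1]
    controls = [v for v, lab in zip(values, labels) if lab == 0]
    return mean(cases) - mean(controls)


def permuted_null(marker_values, labels, n_permutations=200, seed=0):
    """Null statistics per marker, obtained by shuffling disease labels.

    marker_values: dict mapping marker id -> list of values, one per person
    """
    rng = random.Random(seed)
    shuffled = list(labels)
    null = {marker: [] for marker in marker_values}
    for _ in range(n_permutations):
        # The same shuffle is applied to every marker, so any correlation
        # between markers is preserved in the null distribution.
        rng.shuffle(shuffled)
        for marker, values in marker_values.items():
            null[marker].append(mean_difference(values, shuffled))
    return null


def empirical_p_value(observed, null_stats):
    """Two-sided empirical p-value with the usual add-one correction."""
    extreme = sum(1 for s in null_stats if abs(s) >= abs(observed))
    return (extreme + 1) / (len(null_stats) + 1)


# Hypothetical data: 10 cases followed by 10 controls at one marker.
labels = [1] * 10 + [0] * 10
data = {"marker_a": [0.9] * 10 + [0.1] * 10}
null = permuted_null(data, labels)
observed = mean_difference(data["marker_a"], labels)
print(empirical_p_value(observed, null["marker_a"]))
```

Sharing one shuffle across all markers is the design choice that matters here: it keeps the correlation structure between markers intact, which is what allows the permuted results to show how that correlation shapes the null distribution.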

-EpiMigrant database and secondary use of data
The epigenome-wide data generated by the EpiMigrant project provides an invaluable resource for future studies and collaborations. To date, the data has been used in several ongoing projects. We are investigating the genetic variants influencing DNA methylation through genome-wide association. To date, no study has investigated this on a genome-wide basis. Results will improve understanding of the regulatory mechanisms linking genetic variation to phenotypic pattern. We will make this database publicly available and we believe this will be very useful to the scientific community for future studies.
The data is also a core component of our trans-ethnic study of genetic variants influencing blood pressure traits (Nature Genetics - under review; 2014 ASHG/Charles J. Epstein Trainee Award for Excellence in Human Genetics Research – Semifinalist). Here we provide the first evidence for DNA methylation as a potential mediator of the relationship between DNA sequence variation and blood pressure through causal inference testing.
The methylation data generated within EpiMigrant also forms the basis for several upcoming and in-progress international collaborations, including EWAS for body mass index (BMI), blood pressure phenotypes and lipids. The availability of epigenome-wide data from both South Asians and Europeans within the same cohort, analysed on the same array by the same centre (Oxford), also allowed us to investigate ethnic differences in methylation profiles. Such differences could contribute to differences in disease risk between population groups that are currently unaccounted for by genetic or environmental factors (e.g. T2D, coronary heart disease, certain cancer types).
The data generated by the EpiMigrant project will bring value to the scientific community beyond the current project. Data access is regulated through the EpiMigrant data access committee. We plan to place the data into the public domain once the primary reports have been published.

-Further exploitation of the project results:
Results of this study enable development of a predictive panel of lifestyle, genetic, environmental, and epigenetic markers underlying susceptibility to diabetes in South Asians. We are using the results of EpiMigrant to generate and validate new algorithms for identification of people who may benefit from early pharmacologic or lifestyle interventions, findings of huge potential translational importance for risk stratification and personalised medicine.
The results have already been exploited to extend and widen the research collaboration. The partners have recently been awarded the iHealth-T2D grant by Horizon 2020, which will evaluate new strategies for risk stratification and intervention to prevent T2D amongst migrant and non-migrant South Asians. In addition, EpiMigrant underpins the Toast-T2D application, which is presently under review by H2020. Toast-T2D seeks to use the molecular markers for personalised medicine to prevent T2D in South Asians. These represent tangible evidence for exploitation of EpiMigrant outputs.
Identification of molecular mechanisms increasing susceptibility to diabetes will inform understanding of the processes and pathways involved in disease pathogenesis, and enable development of accurate markers that predict risk of disease, and provide opportunities for drug development, aimed at reducing the burden of disease. Academic and industry collaborations are being pursued to exploit our preliminary observations and understand the causal pathways linking DNA methylation with T2D, with the ultimate goal of developing novel therapeutic strategies.
We also show that the DNA methylation markers identified are highly heritable. Our results provide evidence that DNA methylation is a potential novel transgenerational mechanism underlying risk of T2D. Furthermore, fine-mapping of the top-ranking locus reveals multiple additional methylation markers that further improve prediction of T2D. These findings pave the way for future clinical and molecular studies to define the mechanisms underpinning transgenerational inheritance, as well as systematic mapping of risk loci to define not only the optimal marker set that provides the best predictive value but also further insights into molecular mechanisms.

List of Websites:
Project website address: http://www.epimigrant.eu/

Name of the scientific representative of the project's co-ordinator, Title and Organisation: Dr John Chambers, Imperial College London

E-mail: j.chambers@imperial.ac.uk