CORDIS - EU research results
Content archived on 2024-06-18

Global initiative on gene-environment interaction on diabetes/obesity risk

Final Report Summary - INTERCONNECT (Global initiative on gene-environment interaction on diabetes/obesity risk)

Executive Summary:
The concept of InterConnect is founded on the considerable global variation in the risk of diabetes and obesity between populations. Most studies to date have sought to explain variation in risk within populations but the variation between populations is much greater and largely unexplained. Physically bringing data together from cohort studies across the world to investigate this between-population risk is analytically desirable but is constrained by governance, ethical and legal challenges.

InterConnect has, therefore, aimed at developing and implementing a solution that enables cross-cohort analyses in a manner equivalent to a meta-analysis of harmonised individual level data, but without transfer of data. Such an approach has previously been limited by a lack of knowledge of relevant studies, their design and the data available; by methodological diversity in the assessment of exposures and outcomes; and by the lack of a framework for federated meta-analysis. InterConnect has contributed to reducing these limitations.

In the InterConnect project, a number of significant advances have been made. An online searchable study registry, which includes 276 studies from 78 different countries, has been developed and implemented, providing a resource for identifying potential opportunities for cross-cohort analyses. The ability to harmonise methods for the key environmental exposures of diet and physical activity, and for outcome measures assessed through anthropometry, has been enhanced through an online measurement toolkit. A practical framework for federated meta-analysis, in which analytical instructions are sent remotely and analysis is performed locally so that all data stays at source and only results are shared, has been established through the use of exemplar projects. Rather than develop technology in isolation from real scientific questions, InterConnect has used real aetiological and public health examples as a driving force for technology implementation and adoption; this has engaged researchers and enabled understanding of the real-life issues that affect implementation and uptake of the approach. Twenty-two studies from outside the InterConnect consortium have set up their own local data servers and have participated in either completed or ongoing exemplar projects. This infrastructure forms a foundation for new research questions, identified and led by others beyond InterConnect, to be addressed through federated meta-analyses in the future.

The federated meta-analysis approach has analytical advantages compared with more conventional processes for sharing results between cohorts. The analyst running a federated analysis has the flexibility to refine and re-run analyses quickly, and heterogeneity is reduced because analyses are conducted consistently across the participating studies using data that has been harmonised to a common format. Once set up, the infrastructure can be re-used to provide a secure, scalable approach to cross-cohort analyses. Since resources and expertise are required to set up the local servers, it is important that the federated meta-analytical approach is used where it adds scientific value: where the question has not already been answered, where it cannot be answered by meta-analysis of published literature, and where conventional meta-analytical approaches have limitations or barriers.

The degree to which the infrastructure and the federated meta-analytical approach are used in future research depends on a number of further developments, particularly: expansion of the analytical functionality of the software; capacity building in analytical skills so that local researchers can lead, as well as participate in, cross-cohort research; and incentives from funders for the re-use of existing data, together with support for core activities. Notwithstanding this, InterConnect has successfully demonstrated that local investigators can set up servers and host their study data with remote expert support. It has shown the scientific value of including data from studies in settings worldwide and the validity of the analytical results generated through federated meta-analysis.




Project Context and Objectives:
The concept of InterConnect is based on the considerable global variation between populations in the risk of type 1 diabetes mellitus (T1DM), type 2 diabetes mellitus (T2DM) and obesity. The rising prevalence of diabetes is a major public health problem, not only because of the high burden of morbidity and mortality with which it is associated, but also because of the enormous cost of treating this chronic condition and its complications. The condition disproportionately affects disadvantaged groups within populations and has a much higher prevalence in specific ethnic groups. The predominant driver of the rising prevalence of T2DM is obesity, which is itself the subject of a closely linked global epidemic.

The pattern of global variation in both T1DM and T2DM suggests a complex, but distinct, interaction between genetic and environmental risk factors across the life course. Considerable progress has been made in studying the relationship between lifestyle and behavioural factors and the incidence of T2DM, and the discovery of genetic variants associated with susceptibility has made the study of gene-environment interaction possible. However, most studies of interaction to date have sought to explain variation within populations. The major unanswered questions are what explains differences between populations and, in particular, what explains the excess risk of diabetes in certain high-risk populations (Figure 1).

The determinants of risk between populations could, theoretically, be studied by establishing de novo multi-ethnic international prospective cohort studies. However, such studies would need to be so large, and consequently so expensive, that this is not a realistic proposition. The time required for disease outcomes to become manifest is also prohibitive, as it would take many years for results to be produced. The preferable approach is, therefore, to work collaboratively towards enabling the study of gene-environment interaction between populations using existing data. However, current research approaches for bringing data together across studies are not well placed to achieve this:

• Meta-analysis of published literature is unlikely to be a fruitful approach, not only because of heterogeneity in the exposures and outcomes considered, but also because of the limitations of publication bias.

• The study of interaction within individual large-scale cohorts is possible when baseline exposure data are standardised and enough person-years of follow-up have accrued to yield a large number of incident cases. However, such cohorts are sampled from relatively homogeneous populations and do not, by themselves, allow the study of variation in risk between populations.

• Meta-analysis of gene-environment interaction can be successful where data is being brought together from different studies, but is often limited by harmonisation challenges, regulatory difficulties linked to physical data sharing and the analytical burden for participating cohorts.

Therefore, the goal of InterConnect has been to work towards a different solution, one that directly investigates this between-population variation and thus requires meta-analysis of individual participant data (IPD) across studies rather than simply meta-analysis of results across different populations. Such an approach depends on solving a number of problems that were prevalent at the outset of the project; these are described below and provided the context that determined the project objectives.

• Data discovery

Knowledge of relevant studies globally is incomplete. Although funders know the studies that they support, there is no mechanism for identifying study resources globally that would allow this issue to be investigated across population groupings.

There is a lack of knowledge of the design of studies including the populations studied and incomplete information about the nature of the data available in individual studies. Even where studies are known, it is often difficult for researchers outside of those studies to know precisely what design was employed in an individual study, what populations were recruited, what samples were stored and what data is available.

• Method harmonisation

There is considerable heterogeneity in the methods used to assess exposures and outcomes across populations. For studies that have existing data, there is an imperative to work towards the harmonisation of approaches across studies in order to make use of the vast quantities of data that already exist. This process of retrospective harmonisation requires expert knowledge about methods for assessing exposures and outcomes and the creation of an analytical framework which can operationalise the decision tools derived from expert opinion. For studies that are still being planned, the problems are somewhat different, as it is possible to standardise approaches prospectively. However, existing measurement toolkits often narrow down to recommending single instruments rather than providing links to a range of methods that might be used in different contexts. This limits the utility of existing toolkits for global studies.

There are methodological challenges in studying the basis for differences in genetic risk between groups of different ancestry. Most initial studies of genetic main effects and interaction with the environment have been in homogeneous populations in which variation in ancestral origin was regarded as a statistical nuisance phenomenon that could be reduced by the study design or by the analytical strategy. However, if this variation is itself the source of the variation in risk between populations, then this approach ignores the main issue. There is, therefore, a need to identify tools to detect and describe population structure and to incorporate such population structure into genetic analyses so that genetic heterogeneity is leveraged rather than being suppressed or simply accounted for.

• Data management and governance

There are considerable challenges associated with bringing data together across populations when studies have been undertaken in different countries, under different ethical, regulatory and legal arrangements, by researchers operating in multiple institutions with diverse governance structures and processes. A funder requirement to deposit newly acquired data in open-access systems can only be applied prospectively and is unlikely to succeed on a global scale, where there is a plurality of funders and regulatory domains. It is theoretically possible to establish a multi-centre study that brings together data from many individual studies into a single pooled individual participant data meta-analysis. However, this is administratively cumbersome, as it requires institutional-level collaboration and material transfer agreements to deal with the governance of data sharing. The complexity on a global scale would be considerable, and there are often specific restrictions on sharing individual participant genetic data. It would therefore be preferable to find a solution that simulates a pooled individual participant data meta-analysis without requiring central deposition of data, and which could therefore be conducted more easily at a global level without a complex centralised governance structure.

PROJECT SCIENCE & TECHNOLOGY OBJECTIVES

InterConnect aims to change the way that data are used in population research into the causes of diabetes and obesity. It seeks to create the foundation to enable research to move from explaining the differences in the risk of diabetes and obesity within populations to being able to explain differences in risk between populations.

Current strategies for studying variation between populations are limited and InterConnect therefore aims to enable a solution which directly investigates between-population variation via meta-analyses of individual participant data across studies but without physical pooling of the data. Such an approach is currently limited by a lack of knowledge of relevant studies, their design and the data available; by methodological diversity in the assessment of exposures and outcomes; and by the lack of a framework for federated meta-analysis. InterConnect therefore seeks to build the foundation for cross-cohort analysis on a sustainable basis through the following objectives:

• Developing and populating an online registry of studies relevant to the field of the causes of diabetes and obesity; providing a mechanism to link to study-level meta-data including standardised descriptions of the populations studied, assessment measures used and materials stored.

• Creating a virtual forum for harmonisation of methods between relevant studies, including objective approaches to the measurement of key environmental exposures such as diet and physical activity and the use of biomarkers.

• Establishing an appropriately governed framework in which individual participant data from contributing studies can be analysed in a secure setting that protects privacy, is aligned with the consent and legal arrangements of the studies and maximises the utility of the information that has been collected.

• Developing statistical and epidemiological methods for federated individual participant data meta-analysis between populations.

• Establishing a funders’ network to ensure connectivity with the project and a forum for stakeholders who have an interest in the policy, social and economic benefits of the research that will be enabled by InterConnect, acting together to promote cultural change towards a new paradigm of optimising use of existing data.

Project Results:
INTRODUCTION

The S&T activities of InterConnect have been structured through three Action Lines which focused on addressing the project objectives outlined in the preceding section through the development of an online registry of studies, resources for method harmonisation, and data management and governance arrangements for conducting federated meta-analyses.

These Action Lines interfaced with work-packages (WP) focused on engagement of funders and stakeholders, the dissemination of information and also with horizontal scientific themes to ensure activities were linked to current and future scientific direction. Science needs were operationalised through ‘working exemplar projects’. By addressing research questions of aetiological and public health interest, the exemplars served to engage researchers and also enabled understanding of the real-life issues that affect the implementation and uptake of the tools that were being developed. The first exemplar project was set up to investigate the association between physical activity during pregnancy and neonatal anthropometric outcomes. A second exemplar focused on the association of fish intake with the risk of T2DM and provided a foundation for other exemplars investigating the links between different aspects of dietary behaviour and diabetes risk.

The major domains of the project are illustrated in Figure 2.

ACTION LINE 1: STUDY REGISTRY

The overall aim of this Action Line was to build a registry of studies so that researchers are able to identify studies that are relevant to the study of differences in risk of diabetes and obesity between populations, including the study of gene-environment interaction. The work has been completed through three WPs. The role of WP1 was to develop the online registry and populate it with studies in adults in which the outcome of interest was T2DM or obesity. The role of WP2 and WP3 was to populate the registry with studies relevant to specific ethnic minority and migrant populations, and studies relevant to pregnancy and childhood respectively.

Our strategy for the registry was to include only a limited set of information that would generally be readily available from public sources. This approach was designed to enable signposting of a large number of studies without creating a burden on study investigators. We describe below how the registry was set up and populated.

• Setting up the registry

The online registry (Figure 3) was successfully set up using the Mica software, which hosts the registry database and provides the resources needed to enter study meta-data via a web-based data collection form, to have this information verified by study investigators and to display the registry database online in a form that is searchable and accessible to the public (http://www.interconnect-diabetes.eu/data-discovery/).

Descriptive information and study level meta-data relevant for the project were defined. Protocols, which laid out how to identify such information in WP1, 2 and 3 as well as how to populate the registry database with this information via the web-based data collection form, were developed.

Before study meta-data was published in the online registry, study investigators were given the opportunity to review the information through the web-based forms and to make changes if necessary. Study records that were actively verified by investigators were published as ‘verified by the investigator’, and the remaining studies were published as ‘populated from public sources’ after four weeks’ notification. This ensured that there was no undue delay in building the registry resource.
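
The publication rule described above can be expressed as a small decision function. This is a minimal sketch under stated assumptions: the function name and its inputs are hypothetical, the four-week window is taken from the text, and it assumes that an unverified record stays unpublished until that window has elapsed. It does not reproduce the actual Mica-based workflow.

```python
from datetime import date, timedelta

# Four-week notification window, as described in the text.
NOTIFICATION_WINDOW = timedelta(weeks=4)


def publication_status(notified_on: date, today: date, verified: bool):
    """Return the label a study record would be published under, or None.

    Hypothetical helper: `verified` is True when the investigator actively
    reviewed the record through the web-based form.
    """
    if verified:
        return "verified by the investigator"
    if today - notified_on >= NOTIFICATION_WINDOW:
        return "populated from public sources"
    return None  # still inside the review window, so not yet published


print(publication_status(date(2016, 1, 1), date(2016, 2, 1), verified=False))
# prints: populated from public sources
```

Returning None for records still inside the window keeps "not yet published" distinct from either published label.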

• Populating the registry

Each of the WPs used two main approaches to populating the registry with studies relevant to the study of differences in risk of diabetes and obesity between populations. These were systematic reviews to identify studies from the published literature, and a variety of searches (e.g. study websites, grey literature, exemplar projects, personal contacts) to identify relevant studies that were unpublished. The approaches were, therefore, complementary, so that the registry could be made as complete as possible for the purpose of data discovery.

The registry has been populated with descriptions and meta-data from 276 studies relating to diabetes and obesity. Studies from 78 countries are represented in the registry, providing a substantial level of geographic diversity and a useful platform from which researchers may identify studies with specific parameters for cross-cohort analyses (Figure 4).
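
To illustrate how study-level meta-data supports the identification of studies with specific parameters, the following sketch filters simplified study records by outcome and design. The record fields, the example studies and the helper names are all hypothetical; the real registry is searched through its web interface and holds richer meta-data than this.

```python
from dataclasses import dataclass, field


# Hypothetical, simplified study-level meta-data record.
@dataclass(frozen=True)
class StudyRecord:
    name: str
    country: str
    design: str                                   # e.g. "cohort", "case-control"
    outcomes: frozenset = field(default_factory=frozenset)


def find_studies(registry, outcome, design=None):
    """Return studies that assessed `outcome`, optionally restricted to one design."""
    return [s for s in registry
            if outcome in s.outcomes and (design is None or s.design == design)]


def countries_represented(studies):
    """Distinct countries among the selected studies, as a sorted list."""
    return sorted({s.country for s in studies})


# Invented example records, for illustration only.
registry = [
    StudyRecord("Study A", "India", "cohort", frozenset({"T2DM", "obesity"})),
    StudyRecord("Study B", "Mexico", "cohort", frozenset({"T2DM"})),
    StudyRecord("Study C", "Netherlands", "case-control", frozenset({"T2DM"})),
    StudyRecord("Study D", "Ghana", "cohort", frozenset({"obesity"})),
]

print([s.name for s in find_studies(registry, "T2DM", design="cohort")])
```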

Investigators can email InterConnectRegistry@mrc-epid.cam.ac.uk if they would like their study to be added to the registry through the web-based data collection form, with the intention that the registry continues to expand.

• Systematic reviews

The process of populating the study registry required the completion of four systematic reviews which covered the following topics:

- Gene-environment interaction and type 2 diabetes in adults (WP1)
- Gene-environment interactions on diabetes and obesity among ethnic and migrant populations (WP2)
- The contextual factors that are relevant to ethnic and migrant populations (WP2)
- The association between infant formula milk feeding and infant growth (WP3)

In addition to being successful in identifying studies for inclusion in the registry from a diversity of geographic regions and environmental contexts, these systematic reviews have generated findings for publication in peer-reviewed journals. Two reviews have been written up and submitted for publication (WP2) and others are in process (WP1, WP3). The meta-analysis of the association between infant formula milk feeding and infant growth showed that findings from studies from low and middle income countries (LMICs) differ from studies from developed countries. This is an original observation with major public health relevance and it highlights the importance of considering the wide range of evidence and geographic diversity.

• Exemplar projects

As described in the introduction, a connecting theme has been the use of exemplar projects to engage researchers and deliver the objectives of the project. Twenty-two studies have participated, or are currently participating, in exemplar projects. These studies are represented in the InterConnect study registry and so can be readily identified by researchers for future cross-cohort analyses.

The process of conducting the exemplars requires meta-data from the individual studies and the development of code that can be applied remotely to transform the data to a common format. Additional work has been undertaken to expand the registry to make the data dictionaries for the raw data and the harmonisation algorithms available so that resources developed through the exemplar projects are available for re-use. This is described more fully below.

ACTION LINE 2: METHOD HARMONISATION

The overall aim of this Action Line was to develop a framework for retrospective harmonisation of methods in existing studies and for promoting methods that are fit for purpose for the measurement of exposures and outcomes in future studies. WP4 focused on the harmonisation of exposures (risk factors for disease), WP6 on the harmonisation of disease outcomes and WP5 on genetic information, as described below.

Physical activity and diet were identified as priority exposures. Both are important public health exposures that are aetiologically relevant and potentially amenable to modification. However, the assessment of diet and physical activity is complex. There are many subjective and objective methods with varying degrees of validity for different variables, populations and research designs, and this can be confusing for researchers. Anthropometric measures are often used as an outcome or an exposure in epidemiological studies. While validity is less of an issue than for the assessment of diet and physical activity, anthropometry can be affected by observer error and therefore, standardisation, training, and quality assurance / quality control (QA/QC) are important considerations.

Our strategy for the harmonisation of exposures and outcomes was (1) to use exemplar research projects to focus retrospective harmonisation activities so that they were driven by scientific need, and (2) to develop a toolkit that provides researchers with information on the measurement of diet, physical activity and anthropometry. This leaves users better equipped to use and interpret existing data and to reach an appropriate decision about which methods are fit for purpose when planning new studies, thus facilitating both retrospective and prospective harmonisation.

• Use of exemplars for retrospective harmonisation

Real-life exemplar projects have been used to focus retrospective harmonisation and develop resources for re-use that are driven by scientific need and have practical use. These exemplars have investigated the association between physical activity in pregnancy and neonatal anthropometric outcomes and the association between fish intake and the development of T2DM.

We have systematically collected meta-data on methods, exposures and outcomes available for all studies participating in the physical activity and fish exemplar projects. The meta-data routinely collected included details relating to the methods and instruments used for measurement of exposures and outcomes. These comprised: type and name of instruments (tools) if known, time frame, validation and source of information. A meta-data form was created to operationalise this process. The information collected was then used to evaluate harmonisation potential across the data sources. Three steps to data harmonisation were followed:

- Definition of target variables, based on the collection of meta-data from participating studies and agreement about the target variables for harmonisation through evaluation of the meta-data and project objectives;
- Assessment of harmonisation potential by reviewing meta-data in detail to establish which studies can contribute to which target variables;
- Development of data processing algorithms which can be applied to derive common format data.
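
The third step can be illustrated with a toy example in which two studies record fish intake differently and each has its own algorithm mapping raw data to a common target variable in grams per day. The study identifiers, raw variable names and the 100 g portion size are invented for illustration and are not taken from the fish exemplar; a study with no algorithm is treated as unable to contribute to that target variable.

```python
# Hypothetical standard portion size, used only for this illustration.
ASSUMED_PORTION_G = 100.0

# One harmonisation algorithm per study, keyed by a made-up study identifier.
# Each maps one participant's raw record to the target variable: fish intake in g/day.
ALGORITHMS = {
    "study_1": lambda raw: float(raw["fish_g_day"]),                              # already g/day
    "study_2": lambda raw: raw["fish_portions_week"] * ASSUMED_PORTION_G / 7.0,  # portions/week
}


def harmonise(study, raw_rows):
    """Apply a study's algorithm to its raw rows; None if it cannot contribute."""
    algorithm = ALGORITHMS.get(study)
    if algorithm is None:
        return None
    return [algorithm(row) for row in raw_rows]


print(harmonise("study_2", [{"fish_portions_week": 3.5}]))  # [50.0]
```

Keeping one algorithm per study, rather than one shared transformation, mirrors the federated setting in which each study's algorithm would be executed against that study's own data.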

To streamline and operationalise the harmonisation process, we developed tools and procedures:

- A meta-data request form to collect meta-data;
- Regular web conferences with the working groups that were established around each exemplar project;
- A matrix displaying harmonisation potential;
- Use of online collaborative Google spreadsheets so that data analysts and researchers working on harmonisation could share information and instantly review working algorithms.
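
A matrix displaying harmonisation potential can be derived mechanically once meta-data on the items each study collected is available. In this sketch the target variables, source items and the three-level rating ('complete', 'partial', 'impossible') are assumptions for illustration: a target is rated complete when every required item is present, partial when only some are, and impossible when none are. In practice the rating is an expert judgement based on detailed meta-data, not a simple set comparison.

```python
# Hypothetical target variables and the source items each would need.
TARGETS = {
    "fish_g_day": {"fish_frequency", "fish_portion_size"},
    "mvpa_min_week": {"pa_duration", "pa_intensity"},
}

# Hypothetical meta-data: items collected by each participating study.
STUDY_ITEMS = {
    "study_1": {"fish_frequency", "fish_portion_size", "pa_duration"},
    "study_2": {"fish_frequency"},
}


def potential(required, available):
    """Crude three-level rating of whether a target variable can be derived."""
    present = required & available
    if present == required:
        return "complete"
    if present:
        return "partial"
    return "impossible"


def harmonisation_matrix(targets, study_items):
    """Rows are studies, columns are target variables."""
    return {
        study: {target: potential(req, items) for target, req in targets.items()}
        for study, items in study_items.items()
    }


for study, row in harmonisation_matrix(TARGETS, STUDY_ITEMS).items():
    print(study, row)
```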

Harmonisation of exposures and outcomes was completed for the physical activity and fish exemplar projects, for which results of federated meta-analyses have also been generated, as described below in the Action Line on Data Management and Governance.

Extensive work is required to harmonise data that have been collected using different measurement tools and harmonisation is achieved through the use of algorithms specifying transformations that are applied to the raw data. In the federated meta-analysis approach that has been implemented through the InterConnect project, these algorithms are executed as code on each study’s local data server; as such they are accessible to researchers with the appropriate security permissions but not more widely. Given the extensive work involved in data harmonisation, it is important that the resources are made widely available to enable re-use and to reduce the effort needed by others trying to harmonise similar data sets in the future. We have, therefore, used a new, improved version of the software platform hosting the publicly accessible InterConnect registry to catalogue the harmonisation algorithms from the exemplar projects so that they are widely available for re-use by others. This resource is available via https://studies.interconnect-diabetes.eu/registry/ and Figure 5 contains screenshots from the physical activity exemplar to illustrate the information therein.

• Development of a measurement toolkit

The Diet, Anthropometry and Physical Activity (DAPA) Measurement Toolkit (http://www.measurement-toolkit.org/) is an online tool which signposts researchers to an inventory of subjective and objective methods for the assessment of diet and physical activity, key exposures of public health importance, as well as methods for anthropometry including body composition (Figure 6). Our strategy was to create a ‘one-stop shop’ by building on a resource that was originally set up through an MRC-funded project to create a single resource for researchers which is preferable to wide proliferation of multiple toolkits and websites.

The DAPA toolkit does not recommend or promote any specific method or instrument, but rather provides information that leaves users better equipped to use and interpret existing data or to choose methods that are fit for purpose when planning new studies. The toolkit provides a framework to assist researchers, building from basic concepts of measurement theory, through online resources on the measurements and instruments, to guidance on the complex issues of retrospective and prospective data harmonisation.

In terms of specific content, there is an inventory of methods structured by assessment domain. Each method-specific page provides information about the underlying principles, practical and analytical guidelines, and suitability for assessing different populations and variables in various research designs. In addition to this inventory of methods, the toolkit has further sections to introduce the basic concepts of assessment in population health sciences, matrices which summarise important considerations when using specific methods or the resulting data, and an instrument library which provides detailed information on the use of specific measurement instruments. There is also an introduction to concepts in data harmonisation and specific case studies, derived from the exemplar projects.

• Resources for genetic information

The focus of WP5 has been the harmonisation of genetic information across populations with diverse ancestry to enable analyses to combine all available data by leveraging the genetic diversity.

To this end, WP5 defined the descriptive information on self-reported ancestry and available genotypic data, and this was incorporated into the design of the study registry described above. An R script was written to produce principal component analysis plots describing population structure, and the plots produced can be uploaded to the registry. Through literature review and expert consultation, admixture mapping and identity-by-descent (IBD) mapping were identified as the two most commonly used genome-wide approaches for diverse study populations. These methods are complementary to traditional genome-wide association studies (GWAS) but, unlike GWAS, they leverage genetic heterogeneity rather than suppressing or simply accounting for it. Links to software currently used for admixture and IBD mapping have been made available through the InterConnect catalogue of resources (http://www.interconnect-diabetes.eu/catalogue/tools-data-analysis/).
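
The R script itself is not reproduced in this report. As a rough illustration of the underlying technique, the Python sketch below computes principal component scores from a matrix of genotypes coded as allele counts (0, 1, 2), standardising each SNP by its expected binomial standard deviation before a singular value decomposition. This is one common approach, not necessarily the one used in the InterConnect script. The simulated two-population data and the helper names are invented; plotting the first two components against each other would give the kind of population-structure plot described above.

```python
import numpy as np


def genotype_pcs(genotypes, n_components=2):
    """Top principal component scores for an (individuals x SNPs) matrix of 0/1/2 counts."""
    g = np.asarray(genotypes, dtype=float)
    freq = g.mean(axis=0) / 2.0                 # estimated allele frequency per SNP
    keep = (freq > 0) & (freq < 1)              # drop monomorphic SNPs
    g, freq = g[:, keep], freq[keep]
    # Centre each SNP and scale by its expected binomial standard deviation.
    z = (g - 2.0 * freq) / np.sqrt(2.0 * freq * (1.0 - freq))
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def simulate_two_populations(n_per_pop=50, n_snps=500, seed=0):
    """Hypothetical data: two groups with clearly different allele frequencies."""
    rng = np.random.default_rng(seed)
    p_a = rng.uniform(0.1, 0.5, n_snps)
    p_b = rng.uniform(0.5, 0.9, n_snps)
    pop_a = rng.binomial(2, p_a, size=(n_per_pop, n_snps))
    pop_b = rng.binomial(2, p_b, size=(n_per_pop, n_snps))
    return np.vstack([pop_a, pop_b])


pcs = genotype_pcs(simulate_two_populations())
print(pcs.shape)  # (100, 2)
```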

ACTION LINE 3: DATA MANAGEMENT AND GOVERNANCE

The overall aim of this Action Line was to identify feasible data management methods that allow analysis of gene-environment interaction using IPD across studies globally within an appropriate governance framework, and was taken forward by WP7.

The data management strategy built on the work of the FP7-funded BioSHaRE project, Maelstrom Research and DataSHIELD, an open-source software package for remote and non-disclosive analysis of research data, originally developed by Prof Paul Burton of Newcastle University (U.K.).

A fundamental aspect of the InterConnect approach is a federated process. Individual participant data (IPD) from contributing studies are held securely on geographically-dispersed, study based servers; analytical commands are sent as blocks of code from an analysis server to which non-identifiable summary statistics (i.e. results, not data) are returned (Figure 7). Analyses are performed locally so all data stays at source, within the governance structure of the originating study and under their control.
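
The flow of commands and results can be mimicked in a single process to make the principle concrete. The sketch below is a toy simulation, not the DataSHIELD or Opal API. Each hypothetical StudyServer keeps its individual-level values private and returns only a sum and a count. It also refuses to release a result based on fewer observations than an assumed threshold; the value 5 is arbitrary here, and real non-disclosure checks are more extensive. The analysis side then combines the aggregates into a pooled mean.

```python
from dataclasses import dataclass, field

# Arbitrary minimum number of observations before a result may leave a server.
MIN_COUNT = 5


class DisclosureError(Exception):
    """Raised when a result could risk identifying individuals."""


@dataclass
class StudyServer:
    """Hypothetical study server: individual-level data never leave this object."""
    name: str
    values: list = field(repr=False)

    def run(self, command):
        # The command is executed locally, next to the data.
        result = command(self.values)
        if result["n"] < MIN_COUNT:
            raise DisclosureError(f"{self.name}: result based on too few observations")
        return result


def sum_and_count(values):
    """Analytical command sent to each server; returns aggregates, not data."""
    return {"sum": sum(values), "n": len(values)}


def federated_mean(servers):
    """Runs on the analysis server, which only ever sees the returned aggregates."""
    results = [server.run(sum_and_count) for server in servers]
    return sum(r["sum"] for r in results) / sum(r["n"] for r in results)


servers = [StudyServer("A", [1, 2, 3, 4, 5]), StudyServer("B", [6, 7, 8, 9, 10])]
print(federated_mean(servers))  # 5.5, identical to the mean of the pooled values
```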

• Validation of federated meta-analysis using DataSHIELD

Two validation studies of federated meta-analysis using DataSHIELD were conducted. The purpose of these studies was to:

- Build an understanding of what is required to set up a federated analysis;
- Identify any specific issues;
- Verify that results obtained through federated analysis are similar to those obtained through standard analysis.

The first validation study used data from a vitamin D supplementation trial in which data had been collected at two different sites. Two separate Opal servers were configured, one for each dataset, to mimic a project with two separate studies. Data was harmonised through the remote application of algorithms, and a regression analysis was conducted through the federated meta-analysis approach using DataSHIELD. The results obtained from the federated analysis were identical, to two decimal places, to those previously obtained in Stata using conventional data pooling.
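
One reason a federated analysis can reproduce a pooled analysis is that some estimators depend on the data only through sums that can be computed site by site. For ordinary least squares, the pooled normal equations need only X'X and X'y, which are sums over participants, so each site can return these small matrices and the analysis server can add them and solve. The sketch below illustrates this with simulated data and hypothetical function names; it is not the DataSHIELD implementation, and generalised linear models such as logistic regression need an iterative version of the same idea rather than a single exchange.

```python
import numpy as np


def local_summaries(X, y):
    """Computed on each study server; only p x p and length-p aggregates are returned."""
    return X.T @ X, X.T @ y


def federated_ols(summaries):
    """Analysis server: add the site aggregates and solve the normal equations."""
    xtx = sum(s[0] for s in summaries)
    xty = sum(s[1] for s in summaries)
    return np.linalg.solve(xtx, xty)


def pooled_ols(X, y):
    """Conventional analysis on physically pooled data, for comparison."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


# Simulated data for two sites (intercept plus one covariate).
rng = np.random.default_rng(1)
sites = []
for n in (40, 60):
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)
    sites.append((X, y))

fed = federated_ols([local_summaries(X, y) for X, y in sites])
pooled = pooled_ols(np.vstack([X for X, _ in sites]), np.concatenate([y for _, y in sites]))
print(np.allclose(fed, pooled))  # True
```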

The second validation study was an empirical comparison of gene-environment interaction analysis using the federated analysis tool with a more traditional individual participant meta-analysis using a combined dataset from multiple studies. This was tested using data from the FP6-funded InterAct study which, because it included participants from multiple countries, offered a natural setting for combining data from different sources. An individual participant meta-analysis of the interaction between coffee intake and the TCF7L2 gene was undertaken, with the analysis conducted in two different ways:

- On a standalone instance of R where models were run for each country and meta-analysed in a conventional data pooling approach;
- With the dataset split by country into separate Opal projects and a separate central analysis server coordinating the meta-analysis using DataSHIELD.

A logistic regression model was used with T2DM status as the outcome and TCF7L2 as the exposure variable. The gene-diet interaction was modelled assuming an additive effect, by including a multiplicative interaction term for coffee and TCF7L2 in the model. The model was further adjusted for age, sex, physical activity, education, BMI, smoking status, total energy intake, and intake of fruit and vegetables, meat, soft drinks and alcohol. Results were scaled to a coffee intake of 125 g per day. The gene-diet interaction coefficient was estimated in each country's model, and these coefficients were combined using random-effects meta-analysis with the rma() function from the 'metafor' R package.
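The per-country model described above can be sketched as a logistic regression with a product term. This is a hedged Python illustration, not the InterAct analysis code: it assumes TCF7L2 is coded as a risk-allele count (0, 1, 2), replaces the full covariate list with a single simulated, standardised age variable, and fits by Newton-Raphson. Dividing coffee intake by 125 before fitting makes the coffee and interaction coefficients refer to 125 g/day, as in the text.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Newton-Raphson logistic regression; returns coefficients and standard errors."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        info = X.T @ (X * (p * (1.0 - p))[:, None])  # Fisher information matrix
        beta = beta + np.linalg.solve(info, X.T @ (y - p))  # Newton step on the score
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    info = X.T @ (X * (p * (1.0 - p))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    return beta, se


def interaction_design(coffee_g_per_day, risk_alleles, covariates):
    """Columns: intercept, coffee (per 125 g/day), allele count, coffee x allele, covariates."""
    coffee = coffee_g_per_day / 125.0  # scale so coefficients are per 125 g/day
    return np.column_stack([np.ones_like(coffee), coffee, risk_alleles,
                            coffee * risk_alleles, covariates])


# Hypothetical data for one country.
rng = np.random.default_rng(1)
n = 5000
coffee = rng.gamma(shape=2.0, scale=100.0, size=n)   # g/day
alleles = rng.binomial(2, 0.3, size=n)               # assumed additive allele coding
age_std = (rng.normal(55, 8, size=n) - 55) / 8
X = interaction_design(coffee, alleles, age_std)
true_beta = np.array([-2.0, -0.1, 0.35, 0.0, 0.4])   # no true interaction
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ true_beta)))

beta, se = fit_logistic(X, y)
print(f"interaction per 125 g/day: {beta[3]:.3f} (SE {se[3]:.3f})")
```

Column 3 holds the gene-diet interaction; its coefficient and standard error from each country are the inputs to the random-effects step.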

The analysis showed no significant interaction between coffee intake and TCF7L2. The results were identical whether all the models were run on the same machine through a conventional data pooling approach or run as separate projects on Opal, i.e. through a federated meta-analysis approach.
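The random-effects pooling step shared by both arms of this comparison can be sketched in a few lines. Note that this uses the DerSimonian-Laird estimator of between-study variance, which is a swapped-in, simpler method: metafor's rma() defaults to REML and would need `method="DL"` to match this sketch. The function name and returned fields are hypothetical.

```python
import math


def dersimonian_laird(estimates, std_errors):
    """Random-effects pooling of per-study coefficients (DerSimonian-Laird tau^2)."""
    w = [1.0 / se**2 for se in std_errors]                      # inverse-variance weights
    fixed = sum(wi * b for wi, b in zip(w, estimates)) / sum(w)
    q = sum(wi * (b - fixed) ** 2 for wi, b in zip(w, estimates))  # Cochran's Q
    df = len(estimates) - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0             # between-study variance
    w_star = [1.0 / (se**2 + tau2) for se in std_errors]        # random-effects weights
    pooled = sum(wi * b for wi, b in zip(w_star, estimates)) / sum(w_star)
    se_pooled = math.sqrt(1.0 / sum(w_star))
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0               # I^2 heterogeneity
    return {"estimate": pooled, "se": se_pooled, "tau2": tau2, "I2": i2,
            "ci95": (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled)}


# Two hypothetical country-level interaction coefficients with equal SEs.
print(dersimonian_laird([0.0, 2.0], [1.0, 1.0]))
```

Because this step only needs each country's coefficient and standard error, it gives the same answer whether those inputs were produced on one machine or returned from separate Opal servers.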

• Use of exemplar projects to implement federated meta-analysis

Exemplar projects that address current research questions of direct relevance to public health have been used in order to engage researchers in the IPD federated meta-analysis approach. These projects have also served to identify the real-life issues that affect implementation and uptake of the tools and establish the foundations of a potential collaborative network for cross-cohort analysis in diabetes and obesity research.

To engage studies in the exemplar projects, a range of communication materials were developed and WebEx meetings were held with external (i.e. not part of the InterConnect consortium) study investigators to encourage their participation. Communication resources that were developed are made available for re-use through the InterConnect resource catalogue (http://www.interconnect-diabetes.eu/catalogue/tools-communication-stakeholders/).

A total of 22 studies have actively engaged with one or more exemplar research projects. This has required study investigators to: provide meta-data, participate in discussions related to the harmonisation potential of the data and the analysis plans, physically set up their own local data servers and upload the required variables for IPD federated meta-analysis.

To support and enable study participation, extensive technical guidance was provided and this is captured for re-use in the SOP for setting up the server and preparing data for federated meta-analysis. The SOP includes the following topics:

- Specification of the hardware required;
- Configuration of the operating system and prerequisite software;
- Advice on security configurations;
- Tests to ensure that setup was successful;
- Instructions on preparing and uploading data;
- Setting up the server for harmonisation to take place.
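As an illustration of the kind of pre-upload test the SOP describes, the sketch below checks a study's prepared table against an exemplar's variable specification before it is uploaded. The specification format, variable names and limits are hypothetical and are not taken from the InterConnect SOP.

```python
# Hypothetical variable specification for one exemplar project.
SPEC = {
    "sex": {"allowed": {0, 1}},
    "age": {"range": (18, 110)},
    "bmi": {"range": (10, 80)},
}


def check_upload(rows, spec):
    """Return human-readable problems; an empty list means the table passed."""
    problems = []
    present = set().union(*(row.keys() for row in rows)) if rows else set()
    problems += [f"missing variable: {name}" for name in spec if name not in present]
    for i, row in enumerate(rows, start=1):
        for name, rule in spec.items():
            value = row.get(name)
            if value is None:
                continue  # missing values are allowed; missing columns are caught above
            if "allowed" in rule and value not in rule["allowed"]:
                problems.append(f"row {i}: {name}={value!r} not in {sorted(rule['allowed'])}")
            elif "range" in rule:
                lo, hi = rule["range"]
                if not isinstance(value, (int, float)) or not lo <= value <= hi:
                    problems.append(f"row {i}: {name}={value!r} outside {lo}-{hi}")
    return problems


rows = [{"sex": 0, "age": 45, "bmi": 24.1}, {"sex": 2, "age": 61, "bmi": None}]
print(check_upload(rows, SPEC))  # ['row 2: sex=2 not in [0, 1]']
```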

The SOP was continually developed through the implementation of the exemplars, as new real-life implementation and technical communication issues were identified and resolved. The SOP was, therefore, made available on GitHub to allow real-time updates. Links to the SOP and example code for conducting analyses are available through the resource catalogue (http://www.interconnect-diabetes.eu/catalogue/tools-data-analysis/).

A particular implementation challenge was the upload of the required data variables to the local study server, which many studies initially found difficult. The Opal upload tool originally accepted either SPSS files (a simple route for uploading data) or the data and data dictionary as separate Excel files (which required considerable manual effort). WP7, therefore, had to provide support and a bespoke script to help studies convert their data, which were mostly in Stata format, to the Excel format. Based on this practical experience, WP7 added a new feature (in beta testing) to Opal that accepts SAS and Stata files as well as SPSS files. This subsequently smoothed the process in the majority of cases.
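The bespoke conversion script mentioned above is not reproduced here; the following is a hedged sketch of the same idea using pandas. It reads a Stata `.dta` file, writes the data to CSV and writes a simple variable dictionary (name, value type, Stata variable label). The dictionary columns and the value-type names (`integer`, `decimal`, `text`) are assumptions and should be checked against Opal's documentation before use. CSV is used to avoid an extra dependency; `DataFrame.to_excel` could be used instead if `openpyxl` is installed.

```python
import pandas as pd


def opal_value_type(series):
    """Map a pandas dtype onto an assumed Opal-style value type name."""
    if pd.api.types.is_integer_dtype(series):
        return "integer"
    if pd.api.types.is_float_dtype(series):
        return "decimal"
    return "text"


def stata_to_csv_with_dictionary(dta_path, data_csv, dictionary_csv, table="exemplar"):
    """Convert a Stata file into a data CSV plus a variable dictionary CSV."""
    with pd.read_stata(dta_path, iterator=True) as reader:
        labels = reader.variable_labels()  # {variable name: Stata variable label}
        data = reader.read()
    dictionary = pd.DataFrame({
        "table": table,  # hypothetical table name column
        "name": data.columns,
        "valueType": [opal_value_type(data[c]) for c in data.columns],
        "label": [labels.get(c, "") for c in data.columns],
    })
    data.to_csv(data_csv, index=False)
    dictionary.to_csv(dictionary_csv, index=False)
    return dictionary
```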

The implementation work through the exemplar projects also highlighted the need for further development of DataSHIELD, which is led by Prof Paul Burton of Newcastle University. This development is beyond the scope of the FP7-funded InterConnect project. WP7 has worked in close contact with Prof Burton to communicate these needs, and efforts will be made to jointly secure new funding for the development and on-going maintenance of DataSHIELD.

The purpose of the exemplar projects was in part to understand and address implementation issues, as described above, but also to demonstrate the practical utility of the IPD federated meta-analysis approach by producing results of public health significance. Eight population-based studies participated in the first exemplar, on the association between physical activity in pregnancy and neonatal outcomes (Figure 8). Federated meta-analyses of IPD were conducted remotely in each cohort, generating results from c. 73,000 participants without physical pooling of data. This analysis showed that leisure-time physical activity in late, but not early, pregnancy was consistently associated with lower birth weight and a lower risk of large-for-gestational-age babies (Figure 9). The results of this first project have been submitted for publication in a peer-reviewed journal.

Results have also been generated from the second exemplar, on the association of fish intake with risk of T2DM, and this exemplar also paved the way for further exemplar questions relating to other dietary factors (legume intake, dietary patterns, and gene-diet interaction between dietary factors and TCF7L2 on T2DM risk). In total, 22 external studies have participated, or continue to participate, in the exemplar projects. A pipeline of activity initiated by InterConnect is continuing on other dietary research exemplar projects. This forms a potential foundation for new research questions that are identified and led by others beyond the InterConnect consortium, to be addressed through federated meta-analysis in the future.

Overall, the exemplar projects have demonstrated the practical feasibility of conducting cross-cohort analyses in a manner equivalent to a meta-analysis of individual level data but without any direct access to individual-level data. The approach has analytical strengths compared to conventional approaches to sharing of results. The analyst has the flexibility to refine and re-run analyses quickly, heterogeneity is reduced by the ability to include the same types of confounders and all analyses are conducted consistently across participating studies on data that has been harmonised to a common format.

• Consideration of ethical, legal and social issues (ELSI)

Through a variety of meetings and discussion with both experts and researchers participating in the exemplars, WP7 considered the relevant ELSI issues. It was concluded that the federated approach does not raise any issues beyond those that are standard for all research. Those with responsibility for study data must ensure that the proposed analyses are consistent with the terms of consent for the original data collection and that all relevant institutional scientific, data access and ethical approvals are in place; these are standard requirements for all research and not different for the federated meta-analysis approach.

An important distinction was drawn between data sharing and data access: the InterConnect approach provides access to data that are held within institutions and not transferred, and shares results rather than physically sharing data. As such, conventional data sharing agreements are not well suited, and a bespoke "Data Access and Results Sharing Network Agreement" was, therefore, developed. This is available through the catalogue of resources (http://www.interconnect-diabetes.eu/catalogue/) for groups who wish to prospectively establish a formal collaborative group or network. An important aspect of the agreement is the definition of different roles within the federated meta-analysis process and the responsibilities pertaining to those roles; awareness of these can be embedded through a variety of routes and is not limited to formal collaboration agreements.

NETWORKS AND DISSEMINATION

The Action Lines described above interfaced with work-packages (WP) focused on engagement of funders and stakeholders, and the dissemination of information.

Throughout the project, EURADIA has used its network of partners in the Alliance for Diabetes Research and the network created by the FP7-funded DIAMAP (A Roadmap for Diabetes Research in Europe) project to create contact lists and circulate information about InterConnect activity, progress and links to on-going work. A successful joint funder and stakeholder meeting was held in October 2014, and participants agreed to be part of a virtual funder and stakeholder network. Three further symposia were held immediately preceding the annual European Association for the Study of Diabetes (EASD) conferences; these served to communicate progress, discuss the concepts behind the project and further expand the virtual networks. Information about the project has also been disseminated through a wide range of channels, including the project website, exhibitor stands at many health care events and research conferences, events relating to policy at EU level and other EU public health programme initiatives, seminars, newsletters and social media. Tailored information about the project was also developed for potential participants in the working exemplar projects.

The original ambition was for a funder to take on leadership of a network of funders but a specific funder did not come forward. This was something that we could not directly control, as the network needed to be self-determining. Dialogue therefore diversified to engage in particular with the pharmaceutical industry, for which the InterConnect approach may be very attractive, enabling access to commercial clinical trial data and data collected in clinical or academic settings without loss of control by either industry or academia. Discussion has primarily been via the Innovative Medicines Initiative.

Potential Impact:
MAIN OUTPUTS

The InterConnect project has created foundations to enable research into the differences in risk of diabetes and obesity between populations through the following main outputs:

- A study registry which includes over 275 studies from 78 different countries and provides a resource for identifying potential opportunities for cross-cohort analyses;

- The Diet, Anthropometry and Physical Activity (DAPA) Measurement Toolkit, which helps users to use and interpret existing data, or to choose methods that are fit for purpose when planning new studies;

- Harmonisation algorithms that convert data to common formats which have practical use in exemplar projects and have been made available for re-use to reduce the effort needed by others trying to harmonise similar data sets in the future;

- Validation of the IPD federated meta-analysis approach through direct comparison with pooled data, including for the analysis of gene-environment interaction;

- Clarification that the ethical, legal and social issues relating to federated meta-analysis are the same as for standard research processes;

- Proof of concept through the exemplar projects that the federated meta-analysis approach can be used practically by researchers to address questions of public health significance;

- The engagement of 22 studies from 17 countries which have each set up a local server and actively participated in either completed or on-going exemplar projects, so potentially creating a re-useable infrastructure.
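To make the idea of a harmonisation algorithm concrete, the sketch below maps three hypothetical study-specific formats of fish intake onto a common grams-per-day variable, in the spirit of the fish intake exemplar. The portion size, the food-frequency categories and their weekly midpoints are illustrative assumptions only; they are not the algorithms used in InterConnect.

```python
# Hypothetical standard portion and FFQ midpoints, chosen only for illustration.
PORTION_G = 140.0
FFQ_PORTIONS_PER_WEEK = {"never": 0.0, "<1/week": 0.5, "1/week": 1.0,
                         "2-4/week": 3.0, "5+/week": 6.0}


def harmonise_fish_g_per_day(record, source_format):
    """Map one participant's study-specific fish variable to a common g/day value."""
    if source_format == "g_per_day":
        return record.get("fish_g_day")
    if source_format == "portions_per_week":
        portions = record.get("fish_portions_week")
        return None if portions is None else portions * PORTION_G / 7.0
    if source_format == "ffq_category":
        portions = FFQ_PORTIONS_PER_WEEK.get(record.get("fish_ffq"))
        return None if portions is None else portions * PORTION_G / 7.0
    raise ValueError(f"unknown source format: {source_format}")


print(harmonise_fish_g_per_day({"fish_portions_week": 7}, "portions_per_week"))  # 140.0
print(harmonise_fish_g_per_day({"fish_ffq": "1/week"}, "ffq_category"))          # 20.0
```

Keeping each study's mapping as code makes it reusable, which is the point of publishing harmonisation algorithms for others working with similar data.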

In addition to these practical outputs, we have developed an understanding of the strengths of the federated meta-analysis approach and defined the situations in which it may be considered to be the method of choice. Challenges in implementation and new requirements that are needed to support wider use of federated meta-analysis have also been identified. These considerations are outlined below.

POTENTIAL IMPACT

The federated meta-analysis approach has advantages over conventional approaches to cross-cohort analyses, which are based on either data pooling or results sharing. Physically bringing data together from cohort studies across the world is constrained by governance, ethical and legal challenges. Unlike conventional approaches to sharing of results, the analyst running a federated meta-analysis has the flexibility to refine and re-run analyses quickly, heterogeneity is reduced by the ability to include the same types of confounders, and all analyses are conducted consistently across participating studies on data that have been harmonised to a common format. In addition, once set up, the infrastructure can be re-used, with the study team simply providing access to additional subsets of data depending on the specific analytical requirements of each new research question. This gives a secure and scalable approach to cross-cohort analyses.

The active participation of 22 studies in the exemplar projects is testament to the recognition by researchers of the strength of the federated meta-analysis approach since it required them to commit their own resources, both in terms of staff time and IT infrastructure. Some studies were able to participate in a very straightforward manner but others found participation more difficult, particularly where there was a lack of local IT support or the study did not already have good quality meta-data. Given that the federated meta-analysis does require resource and expertise to set up, it is important to define the particular situations in which it is the method of choice. These are largely cross-cohort analyses that investigate ecological variation across different geographies or populations or analyses which have a requirement for the analysis of sensitive data.

A particular strength of the federated meta-analysis approach, once it is set up, is that it potentially enables cross-cohort research to be democratic and equitable. Participating studies remain in full control of their data, they can opt in or out of specific research questions on a case-by-case basis, and the amount of work aligns with the scientific role of the participants. This contrasts with data pooling, where study contributors lose control of their data, and with conventional results sharing approaches, where the burden of work falls on the local study analysts but the kudos resides with those leading the research question, creating inequity. Equity is a particularly important consideration in the analysis of differences in risk between populations, since such research may often require data collected in low- and middle-income countries.

A pipeline of activity initiated by InterConnect is continuing through a number of on-going exemplar projects, and this forms a potential foundation for new research questions, identified and led by others beyond the InterConnect consortium, to be addressed through the federated meta-analysis approach in the future. On-going use of this infrastructure is dependent on a number of developments:

- Expansion of DataSHIELD functionality and resources for roll-out of releases, with appropriate governance;

- Training, refinement of current functionality and capacity building to enable researchers to code harmonisation algorithms and conduct federated analyses using DataSHIELD packages within the R statistical software;

- Wider recognition of 'collaborator status' on publications and by funders in evaluations of research contributions, to incentivise the re-use of existing data without compromising the International Committee of Medical Journal Editors (ICMJE) guidelines on authorship.

FUTURE DISSEMINATION ACTIVITIES AND PLANS FOR USE OF FOREGROUND

Our on-going dissemination activities to encourage uptake and use of the resources developed through the InterConnect project are targeted to different stakeholder groups, as outlined below.

• Researchers

As already highlighted, 22 studies are actively engaged in research using federated meta-analysis through the exemplar projects. Research is on-going for a number of diet-related questions and the grouping that completed the first exemplar question on physical activity during pregnancy is now considering a new research question. This forms a base from which to seed further organic expansion, and publications resulting from these exemplars will increase awareness of the approach in the wider research community.

Our engagement with researchers has influenced the approaches of other study consortia. It is encouraging that the recently H2020-funded LifeCycle Project includes the use of DataSHIELD as a mechanism to continue the development of the EU Birth Cohort Network. This usefully adds critical mass to the research infrastructure for federated meta-analysis and makes it possible to see how a larger network of topic-specific networks or groupings can emerge over time. We have agreed with Prof Vincent Jaddoe (LifeCycle Project Co-ordinator) to share the insights we have gained about the technical aspects of supporting local server set-up and software installation, DataSHIELD analyses and investigator training, data management and harmonisation, and study governance. We will also discuss with Prof Jaddoe how research questions of common interest to the groupings established through InterConnect and the EU Birth Cohort Network can be taken forward together.

• Funders

We are in close contact with Prof Paul Burton (Newcastle University) to discuss opportunities for research grant funding to support the development of DataSHIELD. Discussion is also continuing through the forums of European Medical Informatics Framework (EMIF) project, with a view to potentially influencing the scope of projects that may follow on from the current Innovative Medicines Initiative funding.

The Coordinator is also maintaining a watching brief on academic funding opportunities that may be particularly suited to the approach. An example is the Global Challenges Research Fund in the UK, which provides extensive funding for international research, particularly in low- and middle-income countries, for which the equitable nature of the InterConnect analytical approach is well suited.

• Wider stakeholders

EURADIA has committed to continue disseminating information about the InterConnect project and resources through its communication channels, including regular newsletters, conferences, events and social media.

The Consortium will write an opinion piece for one or more of the major journals once the results of the first exemplar project are published in a peer-reviewed journal; such articles will have maximum value in encouraging uptake of the approach at this point. In parallel with this, EURADIA is drafting a lay summary of the project which will link to the catalogue of resources and be issued as a press release.

List of Websites:
Coordinator: Professor Nick Wareham (Nick.Wareham@mrc-epid.cam.ac.uk)
Project Manager: Dr Rebecca Stratford (Rebecca.Stratford@mrc-epid.cam.ac.uk)

MRC Epidemiology Unit
University of Cambridge School of Clinical Medicine
Box 285 Institute of Metabolic Science
Cambridge Biomedical Campus
Cambridge, CB2 0QQ, UK