
Integrated human data repositories for infectious disease-related international cohorts to foster personalized medicine approaches to infectious disease research


Optimised mapping of seroprevalence data according to time, targeted populations and categorised assays

Task 8.4: Harmonisation of seroprevalence data for COVID-19 and generation of a seroprevalence map of Europe. Lead: AMU; other partners involved: UCD, UKHD, EMBL.

Seroprevalence studies represent a major epidemiological and public health tool. They allow evaluating the attack rate in general or specific populations and estimating the immune status of these populations, and thus their vulnerability to further spread of the outbreak. In particular, data from such surveys are needed over time to track the immune status of populations and to determine if and when the prevalence of antibody-positive individuals is reaching a point at which herd immunity can be anticipated. Before herd immunity is reached, the data can also be used to calibrate mathematical models and inform public health officials about transmission dynamics.

The methodology of assays has been neither harmonised nor completely optimised. Different proteins (mainly the envelope and the nucleoprotein, the latter being more conserved among coronaviruses) and different classes of antibodies (IgG, IgA, IgM) are targeted by ELISA tests. Neutralisation assays infrequently use the reference PRNT technique, which is poorly adapted to large series and cannot be automated. When virus neutralisation tests (VNT) are used in a 96-well format, the amount of virus varies among investigators (most frequently in the range of 50-100 TCID50 per assay), which produces different neutralising titres. In addition, different pseudo-neutralisation assays have been implemented. This is convenient since the assay can be performed outside a BSL-3 laboratory. However, the density of envelope proteins on the surface of pseudoviruses is usually much lower than that observed in wild-type virions. Consequently, low amounts of antibodies can neutralise such pseudoviruses, and pseudo-neutralisation techniques produce much higher neutralising titres than traditional techniques.

Other important sources of heterogeneity between serological surveys are the lack of comparative data between the multitude of both commercial (N 230 on the FIND website) and in-house assays, and the differences between studies with regard to the design and the choice of the target populations. Thus, seroprevalence studies may provide a constellation of unrelated pictures in the absence of harmonisation efforts. It is unrealistic at this stage to propose a full standardisation of assays. Rather, what is currently required is (i) the implementation of a comparator panel that would link the different seroprevalence surveys and would allow proposing a sound dynamic map of seroprevalence across Europe; and (ii) an expert analysis of epidemiological and biological study designs to provide skills and scientific content for optimizing data sharing and data comparability.

Specific subtasks:

Subtask 8.4.1: Situational analysis, to establish, in conjunction with the WHO Solidarity II serology consortium and from other sources if publicly available, a mapping of serosurvey efforts in Europe, including the technical details of testing and study design.

Subtask 8.4.2: Toolbox for sound comparison of independently produced seroprevalence data. This includes a literature analysis to optimise the characterisation of testing methods (i.e. sensitivity, specificity, antigens, cross-reactivity, etc.), reconciling algorithms for heterogeneous seroprevalence data, and a roadmap for comparative analysis.

Subtask 8.4.3: Networking for implementation of comparator panels for improvement of standardisation. We will contact potential partners (European Blood Alliance, European Virus Archive, WHO), using synergies with existing projects (see below), to promote the implementation of comparator panels to improve the comparability of related assays according to antigen and assay type.

Synergies: The partners are members of the WHO Solidarity II serology consortium, which encompasses a large number of ongoing and planned serosurveys in Europe and globally. At
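One adjustment commonly applied when surveys rely on assays of differing accuracy is the Rogan-Gladen correction, which rescales the raw positive fraction using the sensitivity and specificity of the assay. The Python sketch below is an illustration only and is not taken from the project's toolbox; the function name, the survey names and the accuracy figures are all hypothetical.

```python
def rogan_gladen(apparent_prevalence, sensitivity, specificity):
    """Correct a raw test-positive fraction for imperfect assay accuracy.

    Rogan-Gladen estimator: (p_obs + Sp - 1) / (Se + Sp - 1),
    clamped to the interval [0, 1].
    """
    youden = sensitivity + specificity - 1.0
    if youden <= 0:
        raise ValueError("assay must perform better than chance (Se + Sp > 1)")
    adjusted = (apparent_prevalence + specificity - 1.0) / youden
    return min(1.0, max(0.0, adjusted))


# Hypothetical surveys: same raw positivity, different assay characteristics.
surveys = [
    {"name": "survey_A", "positives": 80, "tested": 1000, "se": 0.90, "sp": 0.99},
    {"name": "survey_B", "positives": 80, "tested": 1000, "se": 0.75, "sp": 0.95},
]

for s in surveys:
    raw = s["positives"] / s["tested"]
    adj = rogan_gladen(raw, s["se"], s["sp"])
    print(f'{s["name"]}: raw={raw:.3f} adjusted={adj:.3f}')
```

Two surveys reporting the same raw positivity can therefore imply quite different underlying seroprevalence once assay performance is taken into account, which is one reason the characterisation of testing methods matters for any comparative map.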

Statistical guidance and approaches for dealing with heterogeneity, missing data and measurement error in pooled cohort data sets

Task 3.3: Reconciling measurements of individual cohort participants across heterogeneous data sets.

Key problems when combining multiple data sources arise when cohort studies adopt different variable definitions or measurement methods, when data are prone to measurement error, or when studies are affected by missing data. For this reason, this work package will develop a statistical framework to simultaneously account for all of the aforementioned sources of uncertainty and bias. This framework will integrate state-of-the-art methods for dealing with missing data and measurement error and extend them for application in heterogeneous data sets. Further, new multivariate meta-analysis methods will be developed to reconcile situations where standardization of certain variables is no longer feasible. These methods will adopt advanced penalization schemes to facilitate their applicability in sparse and high-dimensional data sets. Finally, we will integrate input from scientific experts (i.e. immunologists, virologists, statisticians) and teams on the ground to ensure that the underlying data generation processes are properly accounted for. The proposed framework will adopt a Bayesian estimation paradigm to simultaneously propagate all relevant sources of uncertainty and to adapt model complexity as new participant-level data, covariates and/or studies become available. Data curation and statistical methods will work in concert to ensure that both the model complexity and findings are based on the most recent data/evidence.
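To make the idea of pooling across heterogeneous studies concrete, the sketch below implements a classical univariate random-effects meta-analysis with the DerSimonian-Laird estimate of between-study variance. This is a deliberately simpler, non-Bayesian stand-in for illustration and is not the multivariate Bayesian framework the task describes; the function name and the example numbers are hypothetical.

```python
import math


def dersimonian_laird(estimates, variances):
    """Pool study-level estimates under a random-effects model.

    Returns (pooled_estimate, standard_error, tau_squared), where
    tau_squared is the DerSimonian-Laird between-study variance.
    """
    if len(estimates) != len(variances) or len(estimates) < 2:
        raise ValueError("need at least two studies with matching variances")

    # Fixed-effect (inverse-variance) weights and pooled mean.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w

    # Cochran's Q measures heterogeneity around the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = len(estimates) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau2 to each within-study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    sum_w_star = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum_w_star
    se = math.sqrt(1.0 / sum_w_star)
    return pooled, se, tau2


# Hypothetical study-level effect estimates and their variances.
pooled, se, tau2 = dersimonian_laird([0.10, 0.35, 0.22], [0.004, 0.006, 0.005])
print(f"pooled={pooled:.3f} se={se:.3f} tau2={tau2:.4f}")
```

A Bayesian formulation would replace the point estimate of the between-study variance with a prior and a posterior distribution, so that uncertainty about heterogeneity is carried through to the pooled result.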

Report on the 1st round online survey and interviews related to perceived benefits and risks of sharing among cohort investigators

Task 2.1: Elaboration of steps needed for cohorts to participate in a collaborative, decentralized platform – perceived risk versus benefits of sharing CE and HDL data in a cloud-based, federated repository. This task aims to set the scene for the establishment of collaborative and decentralized data sharing platforms. While sharing CE and HDL data is supported and sometimes required by major funding agencies, journals, and other stakeholders, cohort study staff and leadership often do not see data sharing as a net benefit for themselves and their research agenda. Data sharing concerns related to intellectual property, ownership, authorship, and regulatory barriers need to be fully understood and addressed prospectively. Therefore, the focus of this task is to understand the perspective of the potential participants: individual cohort study staff and investigators. We will develop an assessment tool to elicit input and responses from a broad audience of cohort investigators, ranging from those who are motivated to share all of their data to those who are unwilling to share CE and/or HDL data. We will focus our assessment on understanding: (i) the perceived risks vs. benefits of sharing data in a cloud-based, federated repository; (ii) the perceived need for automated workflows that provide regular analyses of CE and HDL data and how these can be incorporated into the shared platform; (iii) the perceived need for collaborative analyses that leverage human HDL data for personalized medicine and how these can be facilitated through the shared platform; (iv) the perceived need for research capacity building in management and analysis of CE and HDL data; (v) the perceived need and strategies for sensitizing cohort staff to the utility of new forms of high-dimensional data for personalized medicine approaches (e.g. microbiome data). We will use an online survey to collect responses and allow for repeated input during the course of the research project. This will be complemented with in-person meetings and especially with one stakeholder conference (see WP6). This task will be managed back-to-back with Task 5.1, which is dedicated to the implementation of a Decentralized Organization. The two tasks will partially overlap, but it is critical to have these issues addressed both at a global, unifying level and in the context of the local implementation of an adapted governance, in order to overcome obstacles arising at the local level. To help in this integration process, the same participants will take part in both tasks.

Launch of unified infectious disease-related cohort data portal

Task 5.2: Unified infectious disease-related cohort data portal. We will implement a unified data portal providing a single point of access for researchers to comprehensive infectious disease cohorts, from the Consortium and beyond, combining detailed descriptions with direct access to datasets held in the underlying data repositories, the European Nucleotide Archive (ENA) and the European Genome-phenome Archive (EGA), and in the respective connected cohort data hubs built upon these repositories. Following an initial back-fill of the portal, new datasets from infectious disease-related cohorts will be automatically identified in each archive or data hub, catalogued in the portal's back-end database and rapidly displayed both through the intuitive web portal interface and through a well-documented programmatic API. This integration of data from multiple archive locations has been successfully implemented in previous large projects such as the European Virus Archive or, more recently, HipSci, which provide extensive metadata, clear visualisation of the available datasets from a range of archive locations and direct access to underlying data in each archive, including support for batch processing and information on applying for access to managed datasets. For this task we will specifically reuse technical components from the HipSci data portal, developed at EMBL-EBI.

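The back-fill followed by automatic identification of new datasets can be pictured as an idempotent cataloguing step: records already known to the back-end database are skipped and only unseen ones are added. The sketch below uses an in-memory SQLite database purely for illustration; the table layout, function names, source labels and `DEMO-` accessions are hypothetical and do not describe the portal's actual implementation or any archive's API.

```python
import sqlite3


def open_catalogue():
    """Create an in-memory catalogue keyed by dataset accession."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE datasets ("
        "accession TEXT PRIMARY KEY, source TEXT NOT NULL, title TEXT NOT NULL)"
    )
    return conn


def catalogue_new(conn, source, records):
    """Insert records not yet catalogued; return the newly added accessions."""
    added = []
    for rec in records:
        cur = conn.execute(
            "INSERT OR IGNORE INTO datasets (accession, source, title) "
            "VALUES (?, ?, ?)",
            (rec["accession"], source, rec["title"]),
        )
        # rowcount is 1 when the row was inserted, 0 when it already existed.
        if cur.rowcount == 1:
            added.append(rec["accession"])
    conn.commit()
    return added


conn = open_catalogue()

# Initial back-fill from a hypothetical archive listing.
backfill = [
    {"accession": "DEMO-0001", "title": "Cohort A sequencing runs"},
    {"accession": "DEMO-0002", "title": "Cohort B sequencing runs"},
]
print(catalogue_new(conn, "archive_a", backfill))

# A later scan returns one known and one new dataset; only the new one is added.
later_scan = [
    {"accession": "DEMO-0002", "title": "Cohort B sequencing runs"},
    {"accession": "DEMO-0003", "title": "Cohort C sequencing runs"},
]
print(catalogue_new(conn, "archive_a", later_scan))
```

Keying the catalogue on the archive accession means the scan can be re-run on any schedule without creating duplicates.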
Website of the project

Task 7.4: External and internal communication / Website of the project and dissemination via Twitter and Facebook. An interactive website will be created and will be available for communication and dissemination of results. This will include extranet and intranet resources. Within the intranet, members will be able to find all the documentation needed for efficient management of the project and to share useful documents with the members of the other WPs. This platform, accessible only to the project members, will provide a single starting point to access internal and external resources such as partners' contacts and mailing lists, administrative documents (templates, budget plan, calendar, reporting tools), internal procedures, communication tools (logos, brochures, etc.), meeting minutes and useful links. Dissemination of the final results of the project will be detailed in a dissemination plan, which will also lead into the final consortium meeting, where we will combine the final stakeholder conference with a dissemination of the results to the scientific public. With regard to external communication, the public website will host the main content related to the project, such as information about the consortium and partners involved, main objectives, latest news and scientific advances in the field. The project's outputs will also be disseminated within the scientific community and to other external stakeholders through social media such as Twitter and Facebook.

Data management plan

Task 7.3: Data Management Plan and plan for dissemination of results. The data management plan is part of the initiative for open science and, as a mandatory deliverable, will be coordinated from WP7. It will be prepared in close collaboration with WP2-5, because the data generated in the project will emerge largely in the form of a product (the searchable platform for CE and HDL data, with a federated data repository in the background).

Create operational decentralised Local Selection Panels in participating clinical centres

Task 5.1: Decentralised organisation for the coordinated management of data and biological resources.

Subtask 5.1.1: The main steps managed at the local level in clinical research programmes include obtaining clearance from local ethics committees and local regulators, enrolling patients, and collecting and then managing both biological samples and clinical-epidemiological information from enrolees. Accordingly, optimising local resources implies a convergence between biological resources and clinical data repositories, which should be managed back-to-back.

We propose to implement a decentralised organisation for the management of data and biological resources that will take advantage of some of the concepts, tools and processes previously developed for EVA and will adapt them to the specific case of sharing data and biological resources. We will generate a catalogue of biological resources and categorised data (see next section) as part of an integrated process aiming at promoting sharing of data and biological resources. This will allow biobanking and clinical information to be associated sustainably. Requests for accessing data and/or biological material will be received via the internet portal and examined by a selection panel including representatives of local stakeholders of the research programme concerned and of the overarching cataloguing organisation. The selection panel will also include scientists who can assess the scientific and statistical soundness of requests. The composition of the local component of this selection committee will be pragmatically adapted to the local situation to include representatives of the relevant stakeholders, such as biobanks, clinical researchers and possibly the local ethics committee. The role of the panel is to examine the application and either make a direct decision or, if necessary, provide recommendations or further requests to decision makers (e.g. the scientific committee of the biobank or the local ethics committee, if concerned). A standardised process and the presence of representatives of local decision makers should expedite this process. The selection panel will review research proposals in a timely fashion using established criteria and will maintain a publicly accessible list of ongoing research projects that includes their submission, revision and approval dates, to ensure accountability to both cohort studies and the larger scientific community.

Criteria examined will include the academic or commercial nature of the demand; its scientific relevance and degree of priority for using the available resources; the categories of data that would be relevantly and safely transmitted to the applicant; the regulatory and ethical considerations related to material transfer; the respect of reciprocity in data sharing; etc. Where additional biological analyses are requested, the panel will also examine which analyses could be performed locally (and funded by the applicant) and which would require transfer of the samples, with respect for the Nagoya Protocol. It will also examine how new data generated by the applicant could be produced in a standardised format and further enrich the local database. A document signed by the legal representatives of both parties should govern the process, clarify duties, and protect rights and ownership.

Subtask 5.1.2 includes creating, implementing and improving the complete procedures and reference documents, and creating a Local Selection Panel (LSP) for the sites included in the study. It also implies promoting inter-site standardisation whilst taking into account the specificity of each local context.

Subtask 5.1.3: Legal aspects of harmonized biobanking and exchange of biological material. Policies and Acts that guide biobanking and information sharing are rapidly evolving and represent significant constraints for both originators and applicants sharing data and biological material. It is important to provide to the project's partners a safe



Current trends in the application of causal inference methods to pooled longitudinal observational infectious disease studies-A protocol for a methodological systematic review.

Author(s): Heather Hufstedler; Ellicott C. Matthay; Sabahat Rahman; Valentijn M.T. de Jong; Harlan Campbell; Paul Gustafson; Thomas P. A. Debray; Thomas Jaenisch; Lauren Maxwell; Till Bärnighausen
Published in: PLoS ONE, 1, 2021, ISSN 1932-6203
Publisher: Public Library of Science
DOI: 10.1371/journal.pone.0250778

The European Nucleotide Archive in 2021.

Author(s): Carla Cummins; Alisha Ahamed; Raheela Aslam; Josephine Burgin; Rajkumar Devraj; Ossama Edbali; Dipayan Gupta; Peter W. Harrison; Muhammad Haseeb; Sam Holt; Talal Ibrahim; Eugene Ivanov; Suran Jayathilaka; Vishnukumar Balavenkataraman Kadhirvelu; Simon Kay; Manish Kumar; Ankur Lathi; Rasko Leinonen; Fábio Madeira; Nandana Madhusoodanan; Milena Mansurova; Colman O’Cathail; Matt Pearce; Stephane P
Published in: Nucleic Acids Research, 2, 2022, ISSN 0305-1048
Publisher: Oxford University Press
DOI: 10.1093/nar/gkab1051

Continual updating and monitoring of clinical prediction models: time for dynamic prediction systems?

Author(s): David A. Jenkins; Glen P. Martin; Matthew Sperrin; Richard D Riley; Thomas P. A. Debray; Gary S. Collins; Niels Peek
Published in: Diagnostic and Prognostic Research, 5/1, 2021, Page(s) 1-7, ISSN 2397-7523
Publisher: Diagnostic and prognostic research.
DOI: 10.1186/s41512-020-00090-3

The COVID-19 Data Portal: accelerating SARS-CoV-2 and COVID-19 research through rapid open access data sharing

Author(s): Harrison PW, Lopez R, Rahman N, Allen SG, Aslam R, Buso N, Cummins C, Fathy Y, Felix E, Glont M, Jayathilaka S, Kadam S, Kumar M, Lauer KB, Malhotra G, Mosaku A, Edbali O, Park YM, Parton A, Pearce M, Estrada Pena JF, Rossetto J, Russell C, Selvakumar S, Sitjà XP, Sokolov A, Thorne R, Ventouratou M, Walter P, Yordanova G, Zadissa A, Cochrane G, Blomberg N, Apweiler R.
Published in: Nucleic Acids Research, 2021, ISSN 0305-1048
Publisher: Oxford University Press
DOI: 10.1093/nar/gkab417

The European Nucleotide Archive in 2020.

Author(s): Peter W. Harrison; Alisha Ahamed; Raheela Aslam; Blaise T. F. Alako; Josephine Burgin; Nicola Buso; Mélanie Courtot; Jun Fan; Dipayan Gupta; Muhammad Haseeb; Sam Holt; Talal Ibrahim; Eugene Ivanov; Suran Jayathilaka; Vishnukumar Balavenkataraman Kadhirvelu; Manish Kumar; Rodrigo Lopez; Simon Kay; Rasko Leinonen; Xin Liu; Colman O’Cathail; Amir Pakseresht; Youngmi Park; Stephane Pesant; Nadim Ra
Published in: Nucleic Acids Research, 2, 2020, ISSN 0305-1048
Publisher: Oxford University Press
DOI: 10.1093/nar/gkaa1028

Pre-pregnancy and pregnancy cohorts: a scoping review protocol

Author(s): Lauren Maxwell; Regina Gilyan; Sayali Arvind Chavan; Marwah Al-Zumair; Shaila Akter; Thomas Jaenisch
Published in: F1000Research 2021, 1, 2021, ISSN 2046-1402
Publisher: F1000 Research Ltd.
DOI: 10.12688/f1000research.55501.1

Current trends in the application of causal inference methods to pooled longitudinal non-randomised data: a protocol for a methodological systematic review

Author(s): Edmund Yeboah, Nicole Sibilla Mauer, Heather Hufstedler, Sinclair Carr, Ellicott C Matthay, Lauren Maxwell, Sabahat Rahman, Thomas Debray, Valentijn M T de Jong, Harlan Campbell, Paul Gustafson, Thomas Jänisch, Till Bärnighausen
Published in: BMJ Open, 2021, ISSN 2044-6055
Publisher: BMJ Publishing Group
DOI: 10.1136/bmjopen-2021-052969

The European Bioinformatics Institute: empowering cooperation in response to a global health crisis

Author(s): Cantelli G, Cochrane G, Brooksbank C, McDonagh E, Flicek P, McEntyre J, Birney E, Apweiler R.
Published in: Nucleic Acids Research, 2021, ISSN 0305-1048
Publisher: Oxford University Press
DOI: 10.1093/nar/gkaa1077

Individual participant data meta‐analysis of intervention studies with time‐to‐event outcomes: A review of the methodology and an applied example

Author(s): Valentijn M.T. de Jong, Karel G.M. Moons, Richard D. Riley, Catrin Tudur Smith, Anthony G. Marson, Marinus J.C. Eijkemans, Thomas P.A. Debray
Published in: Research Synthesis Methods, 11/2, 2020, Page(s) 148-168, ISSN 1759-2879
Publisher: John Wiley & Sons
DOI: 10.1002/jrsm.1384

The European Nucleotide Archive in 2019.

Author(s): Amid, Clara; Alako, Blaise T F; Balavenkataraman Kadhirvelu, Vishnukumar; Burdett, Tony; Burgin, Josephine; Fan, Jun; Harrison, Peter W; Holt, Sam; Hussein, Abdulrahman; Ivanov, Eugene; Jayathilaka, Suran; Kay, Simon; Keane, Thomas; Leinonen, Rasko; Liu, Xin; Martinez-Villacorta, Josue; Milano, Annalisa; Pakseresht, Amir; Rahman, Nadim; Rajan, Jeena; Reddy, Kethi; Richards, Edward; Smirnov, Dmitr
Published in: Nucleic Acids Research, 2, 2020, ISSN 0305-1048
Publisher: Oxford University Press
DOI: 10.1093/nar/gkz1063

Systematic Review Reveals Lack of Causal Methodology Applied to Pooled Longitudinal Observational Infectious Disease Studies.

Author(s): Hufstedler, H., Rahman, S., Danzer, A. M., Goymann, H., de Jong, V., Campbell, H., Gustafson, P., Debray, T., Jaenisch, T., Maxwell, L., Matthay, E. C., & Bärnighausen, T.
Published in: Journal of clinical epidemiology, 2022, ISSN 0895-4356
Publisher: Elsevier BV
DOI: 10.1016/j.jclinepi.2022.01.008

Measurement error in meta-analysis (MEMA)—A Bayesian framework for continuous outcome data subject to non-differential measurement error.

Author(s): Campbell, H, de Jong, VMT, Maxwell, L, Jaenisch, T, Debray, TPA, Gustafson, P.
Published in: Research Synthesis Methods, 2022
Publisher: Wiley Online Library
DOI: 10.1002/jrsm.1515

Bayesian adjustment for preferential testing in estimating the COVID-19 infection fatality rate

Author(s): Campbell H; de Valpine P; Maxwell L; de Jong VM; Debray T; Jänisch T; Gustafson P
Published in: arXiv preprint, 1, 2021
Publisher: arXiv preprint