ARCHITECTURE AND TOOLS FOR THE QUERY OF ANTIBODY AND T-CELL RECEPTOR SEQUENCING DATA REPOSITORIES FOR ENABLING IMPROVED PERSONALIZED MEDICINE AND IMMUNOTHERAPY

Periodic Reporting for period 1 - iReceptor Plus (ARCHITECTURE AND TOOLS FOR THE QUERY OF ANTIBODY AND T-CELL RECEPTOR SEQUENCING DATA REPOSITORIES FOR ENABLING IMPROVED PERSONALIZED MEDICINE AND IMMUNOTHERAPY)

Reporting period: 2019-01-01 to 2020-06-30

The integration of large-scale genomic data with extensive health data is revolutionizing biomedical research and holds great potential for improving patient care. One of the areas that holds the greatest potential for immediate breakthroughs began in 2009, when next generation sequencing was applied for the first time to antibody/B-cell and T-cell sequences of the Adaptive Immune Receptor Repertoire (AIRR-seq data). Researchers and clinicians now determine 1-10M such sequences per sample, allowing the characterization of the immune response in exquisite detail. These data have quickly grown in numbers of sequences per sample and in areas of application such as vaccine research, development of monoclonal antibody therapies against autoimmune disease, and novel cancer immunotherapies.
However, AIRR-seq data are typically stored and curated by individual labs, using a variety of tools and technologies. The objective of the iReceptor Plus project is to build a common scalable platform to integrate distributed repositories of AIRR-seq data for enabling improved personalized medicine and immunotherapy.
iReceptor Plus will be designed as a network of federated repositories (an AIRR Data Commons or ADC) that facilitates data queries and advanced analyses through a centralized web portal (the iReceptor Plus Scientific Gateway). iReceptor Plus will also integrate relevant non-AIRR-seq data, both clinical and biological, to allow the analysis of global interactions within the immune system and in its environment. The platform is free and open software, making it possible for the research community to extend and adapt the tools and technologies utilized within the project, e.g. for research labs to integrate their own data repositories into the platform.
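The federated design described above can be illustrated with a minimal sketch. It is not the project's implementation: the `Repository` class, the `federated_query` function and the field names (`repertoire_id`, `disease`) are hypothetical, and in-memory lists stand in for what would be network calls to each repository's API. The sketch only shows the idea of sending the same query to every repository and merging the results at a central point.

```python
from dataclasses import dataclass


@dataclass
class Repository:
    """Hypothetical in-memory stand-in for one repository in the federation."""
    name: str
    repertoires: list

    def query(self, field_name, value):
        # A real repository would answer this over its web API;
        # here we simply filter the local list of records.
        return [r for r in self.repertoires if r.get(field_name) == value]


def federated_query(repositories, field_name, value):
    """Send the same query to every repository and merge the results.

    Each returned record is tagged with the repository it came from,
    so the central portal can show where the data live.
    """
    merged = []
    for repo in repositories:
        for record in repo.query(field_name, value):
            merged.append({**record, "repository": repo.name})
    return merged


# Illustrative data only; identifiers and values are made up.
repos = [
    Repository("repo_a", [
        {"repertoire_id": "a1", "disease": "COVID-19"},
        {"repertoire_id": "a2", "disease": "none"},
    ]),
    Repository("repo_b", [
        {"repertoire_id": "b1", "disease": "COVID-19"},
    ]),
]

hits = federated_query(repos, "disease", "COVID-19")
for hit in hits:
    print(hit["repository"], hit["repertoire_id"])
```

Because each repository answers the query itself, the data stay with the lab that curates them, and only the matching records are gathered centrally.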
During the first month of the project, a kick-off meeting was held on project planning and implementation.
A user interface and user experience (UI/UX) group is actively implementing the AIRR Data Commons (ADC) API across iR+ components in parallel with the API development. An AIRR-compliant version of the iReceptor Scientific Gateway was released in coordination with the ADC API release.
Based on the provisional single-cell extension (scXT) standard, passed by the AIRR Community in May 2019, development of a data model that implements the scXT and its further refinements started in September 2019. By January 2020, feedback from a public community consultation with stakeholders had been obtained. Using this feedback, a revised data model was presented and merged into the AIRR Community data standards and specification.
A security layer has been integrated into three iR+ APIs in a testing environment: iReceptor Turnkey, SciReptor and ImmuneDB. Access control is applied at the study level: users can access a study’s data only if they hold the rights to it, either by ownership or because the data have been shared with them.
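The study-level rule described above (access by ownership or by sharing) can be sketched as follows. This is an illustration under assumptions, not the security layer integrated into the iR+ APIs: the `Study` class, the `can_access` and `visible_repertoires` functions, and all user and study identifiers are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Study:
    """Hypothetical study record carrying its access information."""
    study_id: str
    owner: str
    shared_with: set = field(default_factory=set)


def can_access(user, study):
    # Access is granted by ownership or by explicit sharing.
    return user == study.owner or user in study.shared_with


def visible_repertoires(user, repertoires, studies):
    """Keep only repertoires whose parent study the user may access.

    `studies` maps a study identifier to its Study record, so the
    decision is made once per study rather than per sequence.
    """
    return [r for r in repertoires if can_access(user, studies[r["study_id"]])]


# Illustrative data only; users and identifiers are made up.
studies = {
    "S1": Study("S1", owner="alice", shared_with={"bob"}),
    "S2": Study("S2", owner="carol"),
}
repertoires = [
    {"repertoire_id": "r1", "study_id": "S1"},
    {"repertoire_id": "r2", "study_id": "S2"},
]

for user in ("alice", "bob", "carol", "dave"):
    ids = [r["repertoire_id"] for r in visible_repertoires(user, repertoires, studies)]
    print(user, ids)
```

Checking rights at the study level keeps the rule simple: every repertoire in a study inherits the same decision.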
Data curation protocols are being designed for integration into the tranSMART platform. A major effort with the APHP hospital partner developed clinical data curation protocols based on the TRANSIMMUNOM data. Data curation protocols for gene expression and immune phenotyping are also being designed.
A communication and dissemination plan has been drafted, and work is proceeding accordingly. So far the project has issued three press releases, and articles have appeared in more than 7 relevant publications in different countries.
iReceptor Plus was presented by partners at 12 events, which took place in Canada, Sweden, the UK, Belgium, the US, Israel, China and Italy. Unfortunately, due to the COVID-19 outbreak, other events were canceled or postponed.
The project is very active on social media. An attractive, user-friendly project website has been developed in order to increase visibility of the project’s latest news, outcomes, results and details: https://www.ireceptor-plus.com/. In addition, a project blog was set up at https://www.ireceptor-plus.com/blog/, in which the project shares news about its progress to increase outreach, promote awareness of the project’s achievements and maintain interest. So far 14 blog posts have been published.
An Ethics Advisory Board was established for the iReceptor Plus project, described in a recent WP11 deliverable. As part of this exercise, compliance with ethics requirements for the project has been ensured by all researchers.
The iR+ General Assembly meeting was held virtually on June 8-10, 2020. All participating institutions were represented. We established a Scientific Advisory Board (SAB) in March 2020, and the SAB fully participated in the General Assembly meeting, providing valuable insight into technical and outreach aspects of the project.
The iR+ environment has been scaled up (handling repositories of up to 500 million sequences) and scaled out (to 5 integrated repositories). The platform’s ability to conduct queries across hundreds of millions of rearrangements in seconds facilitates our researchers’ exploratory data analyses at scale. The immediate effectiveness of the platform is demonstrated by the fact that, during the recent global COVID-19 outbreak, 9 COVID-19 data sets have been uploaded to the iReceptor Gateway, driving major scientific interest and resulting in more than 150 new account requests since this resource was established. The iReceptor platform is already providing critical tools for the scientific and pharmaceutical community in the development of a COVID-19 vaccine treatment.
To some extent this is the iR+ project responding according to the plans laid out in the original project. However, the COVID-19 repository goes beyond the normal vision for the project in several ways. Perhaps most importantly, once the availability of this resource had been publicized through iR+ and the AIRR Community, researchers began to contact the iReceptor Plus team directly to make their data available, in many cases before their papers were accepted. The common path is for researchers to submit a paper and deposit their data in a public repository only when the study is published, or often to release their data only up to a year after publication, after any scientific insights and IP possibilities have been mined. The presence and dissemination power of iR+ has helped to shift this practice toward a more community-driven, open access approach during this time of worldwide crisis.
The example of COVID-19 data integration is a window into our future. We expect to expand the number and type of institutions that are contributing to the iR+ set of data repositories, including non-academic institutions such as biopharma companies and clinics/hospitals. We expect that this expansion will provide the massive data sets necessary to discover biomarkers for diagnostics and therapeutic leads, especially using machine learning and artificial intelligence approaches to mine these biomarkers. As we go from the first iteration of the Gateway, with its focus on curating massive AIRR-seq data sets from multiple institutions, to the next generation Gateway, allowing the user to send federated sets of repertoires to sophisticated analysis tools, we will produce a platform that will greatly accelerate biomedical research and improve patient care.