CORDIS - EU research results

ARCHITECTURE AND TOOLS FOR THE QUERY OF ANTIBODY AND T-CELL RECEPTOR SEQUENCING DATA REPOSITORIES FOR ENABLING IMPROVED PERSONALIZED MEDICINE AND IMMUNOTHERAPY

Periodic Reporting for period 3 - iReceptor Plus (ARCHITECTURE AND TOOLS FOR THE QUERY OF ANTIBODY AND T-CELL RECEPTOR SEQUENCING DATA REPOSITORIES FOR ENABLING IMPROVED PERSONALIZED MEDICINE AND IMMUNOTHERAPY)

Reporting period: 2022-01-01 to 2022-12-31

The integration of large-scale genomic data with extensive health data is revolutionizing biomedical research and holds great potential for improving patient care. One area with particularly strong potential for immediate breakthroughs opened up in 2009, when next-generation sequencing was first applied to the antibody/B-cell and T-cell receptor sequences of the Adaptive Immune Receptor Repertoire (AIRR-seq data). Researchers and clinicians now determine 1-10 million such sequences per sample, allowing the immune response to be characterized in exquisite detail. These data have grown quickly, both in the number of sequences per sample and in areas of application such as vaccine research, development of monoclonal antibody therapies against autoimmune disease, and novel cancer immunotherapies.
AIRR-seq data are typically stored and curated by individual labs, using a variety of tools and technologies, which greatly hinders data sharing and collaboration. Moreover, users who do not possess sophisticated knowledge of bioinformatic tools are unable to access these valuable data. The objective of the iReceptor Plus (iR+) project is to build a common, scalable, accessible platform that integrates distributed repositories of AIRR-seq data to improve personalized medicine.
iR+ will be designed as a network of federated repositories (an AIRR Data Commons or ADC) that facilitates data queries and advanced analyses through a centralized web portal, the iR+ Scientific Gateway. iR+ will also integrate relevant non-AIRR-seq data (both clinical and biological) to allow the analysis of global interactions within the immune system and with its environment. The platform is free and open software, making it possible for the research community to extend and adapt the tools and technologies used within the project, e.g. for research labs to integrate their own data repositories into the platform.
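The federated design described above can be illustrated with a minimal sketch. It is not the iR+ Gateway or ADC API implementation: the `Repository` class, the `federated_query` function, the repository names, and the metadata fields are all hypothetical, and each repository is simulated in memory rather than reached over the network.

```python
from dataclasses import dataclass, field


@dataclass
class Repository:
    """Hypothetical in-memory stand-in for one repository in the federation."""
    name: str
    repertoires: list = field(default_factory=list)

    def query(self, **filters):
        # Return the repertoires whose metadata match every filter exactly.
        return [r for r in self.repertoires
                if all(r.get(key) == value for key, value in filters.items())]


def federated_query(repositories, **filters):
    """Send the same query to every repository and merge the results,
    tagging each hit with the repository it came from."""
    results = []
    for repo in repositories:
        for repertoire in repo.query(**filters):
            results.append({**repertoire, "repository": repo.name})
    return results


# Illustrative data only; identifiers and field names are assumptions.
repos = [
    Repository("repo_a", [{"repertoire_id": "a1", "disease": "COVID-19"},
                          {"repertoire_id": "a2", "disease": "healthy"}]),
    Repository("repo_b", [{"repertoire_id": "b1", "disease": "COVID-19"}]),
]

hits = federated_query(repos, disease="COVID-19")
print([(h["repository"], h["repertoire_id"]) for h in hits])
```

The point of the sketch is that the caller sees a single merged result set, while each repository remains independently stored and curated.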
A user interface and user experience (UI/UX) group is continuously implementing the AIRR Data Commons (ADC) API across iR+ components, in parallel with the API development. An AIRR-compliant version of the iR+ Gateway was released in coordination with the ADC API release.
Development of a data model that implements the provisional single-cell extension (scXT) standard, passed by the AIRR Community in May 2019, and its further refinements started in September 2019. By January 2020, feedback from stakeholders had been obtained through a public community consultation. Using this feedback, a revised data model was presented and merged into the AIRR Community data standards and specification.
A security layer has been integrated into three iR+ APIs in a testing environment: iReceptor Turnkey, SciReptor and ImmuneDB. Access control is applied at the study level: users can access a study's data only if they hold the rights to it, either because they own the data or because the data have been shared with them.
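The study-level rule described above (access by ownership or by sharing) can be sketched as follows. This is an illustration under stated assumptions, not the security layer used in iR+: the `Study` class, the `can_access` and `visible_studies` helpers, and the user and study names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Study:
    """Hypothetical study record carrying its access-control metadata."""
    study_id: str
    owner: str
    shared_with: set = field(default_factory=set)


def can_access(user, study):
    # Access is granted by ownership or by explicit sharing, nothing else.
    return user == study.owner or user in study.shared_with


def visible_studies(user, studies):
    """Return the IDs of the studies the user is allowed to query."""
    return [s.study_id for s in studies if can_access(user, s)]


# Illustrative data only.
studies = [
    Study("study_1", owner="alice"),
    Study("study_2", owner="bob", shared_with={"alice"}),
    Study("study_3", owner="bob"),
]

print(visible_studies("alice", studies))
print(visible_studies("carol", studies))
```

Because the check is made per study rather than per sequence, a query only needs to be filtered once against the list of studies a user may see.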
Data curation protocols are being designed for integration into the tranSMART platform. A major effort with the APHP hospital partner is devoted to developing clinical data curation protocols based on the TRANSIMMUNOM data, such as protocols for gene expression and immune phenotyping.
A communication and dissemination plan has been drafted, and work is proceeding according to it. So far the project has issued 4 press releases, and articles have appeared in nearly 20 relevant publications in different countries, including the Financial Times and the LA Times.
iR+ was presented by partners at more than 40 events, which took place in Canada, Sweden, the UK, Belgium, the US, Israel, China and Italy. Unfortunately, other events were canceled or postponed due to the COVID-19 outbreak.
The project is very active on social media. An attractive, user-friendly project website has been developed to increase the visibility of the project's latest news, outcomes, results and details: https://www.ireceptor-plus.com/. In addition, a project blog was set up at https://www.ireceptor-plus.com/blog/, where the project shares news about its progress to increase outreach, promote awareness of the project's achievements and maintain interest. So far 29 blog posts have been published.
An Ethics Advisory Board (EAB) was established for the iR+ project, as described in a recent WP11 deliverable. As part of this effort, all project partners are working with the EAB on compliance with the project's ethics requirements, to find solutions that address all ethical and legal issues.
Our iR+ General Assembly met virtually on June 8-10, 2020. All participating institutions were represented. We established a Scientific Advisory Board (SAB) in March 2020, and the SAB participated fully in the General Assembly meeting, providing valuable insight into technical and outreach aspects of the project.
The iR+ environment has been scaled up (handling repositories of up to 500 million sequences) and scaled out (to 5 integrated repositories). The platform's ability to query hundreds of millions of rearrangements in seconds facilitates researchers' exploratory data analyses at scale. The immediate effectiveness of the platform is demonstrated by the fact that, during the recent global COVID-19 outbreak, 9 COVID-19 data sets have been uploaded to the iReceptor Gateway, driving major scientific interest and resulting in more than 150 new account requests since this resource was established. The iReceptor platform is already providing critical tools for the scientific and pharmaceutical community in the development of COVID-19 vaccines and treatments.
To some extent this is the iR+ project responding according to the plans laid out in the original project. However, the COVID-19 repository goes beyond the normal vision for the project in several ways. Perhaps most importantly, once the availability of this resource had been publicized through iR+ and the AIRR Community, researchers began to contact the iR+ team directly to make their data available, in many cases before their papers were accepted. The common path is for researchers to submit a paper and deposit their data in a public repository only when the study is published, or often to release their data only up to a year after publication, after any scientific insights and IP possibilities have been mined. The presence and dissemination power of iR+ has helped to shift this practice toward a more community-driven, open-access approach during this time of worldwide crisis.
The example of COVID-19 data integration is a window into our future. We expect to expand the number and type of institutions that contribute to the iR+ set of data repositories, including non-academic institutions such as biopharma companies and clinics/hospitals. We expect that this expansion will provide the massive data sets necessary to discover biomarkers for diagnostics and therapeutic leads, especially using machine learning and artificial intelligence approaches to mine these biomarkers. As we go from the first iteration of the Gateway, with its focus on curating massive AIRR-seq data sets from multiple institutions, to the next-generation Gateway, which allows the user to send federated sets of repertoires to sophisticated analysis tools, we will deliver a platform that greatly accelerates biomedical research and improves patient care.