CORDIS - EU research results

Computational multiplexing to optimise next-generation sequencing

Periodic Reporting for period 1 - MultiSeq (Computational multiplexing to optimise next-generation sequencing)

Reporting period: 2023-12-01 to 2025-05-31

Recent advances in next-generation sequencing (NGS) of nucleic acids (DNA or RNA) have transformed biology and medicine. Today NGS is one of the main pillars of research across biological disciplines, and it is already used in numerous fields of application, from clinics to the biotechnology industry. Given its versatility and high demand, the global market for NGS is expanding rapidly, with the number of sequenced samples doubling every two years. However, while the major advances in NGS have mainly come from a rapid increase in sequencing throughput per machine, the preparation of sequencing libraries, the other integral step of NGS, has remained largely unchanged. Currently, the library preparation (LP) step is the major financial and operational bottleneck for sequencing projects, limiting the widespread adoption and utility of NGS. Current state-of-the-art solutions to these problems require high upfront costs, are laborious, or both. We are developing a bioinformatics solution that minimizes the cost and time of library preparation. Our approach, called MultiSeq, designs a multiplexing strategy that reduces the number of libraries, followed by computational demultiplexing. We plan to extend the experimental proof of concept of our method by applying it to broadly sequenced species. In addition, we will integrate our algorithms into a versatile computing framework and develop a pilot project in an industrially relevant context. In parallel, we will perform market analysis and evaluate the most suitable IP protection and commercialization strategies for our technology. If successful, MultiSeq will be a game-changing approach that will impact sequencing technology and related industries by further democratizing the field of NGS, benefiting both the scientific community and society.
In this project, we had the following scientific and technical goals: to perform experimental validation of MultiSeq on widely sequenced organisms, and to transform our internal bioinformatics scripts into a cohesive, user-friendly data analysis and management platform.
To accomplish the first goal, we analyzed publicly available genomics databases such as the Sequence Read Archive (https://www.ncbi.nlm.nih.gov/sra/) and the European Nucleotide Archive (https://www.ebi.ac.uk/ena/), and identified and obtained the genetic material of the nine most frequently sequenced organisms worldwide, including human, mouse, fruit fly, Salmonella enterica and Escherichia coli. Based on these data, we planned and successfully executed a large-scale experiment in which we performed high-coverage whole-genome sequencing of these organisms in pools, following the multiplexing strategy designed by MultiSeq, and also sequenced the individual samples as controls. We then compared the end results of the pooled and individually sequenced samples. This analysis was designed and performed using MultiSeq's bioinformatics pipelines. From it, we identified the species that can be successfully multiplexed and demultiplexed, determined the sequencing and data analysis parameters that allow these organisms to be sequenced without compromising the quality of the end results, and assessed the cost saving achieved by MultiSeq.
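As a rough illustration of how a pooled-versus-control comparison could be quantified, the sketch below computes simple concordance metrics between two sets of results for the same sample. It is not MultiSeq's pipeline: the report does not specify what the "end results" are, so treating them as variant calls is an assumption, and the `Variant` tuple layout, the `concordance` helper, and the toy data are all hypothetical.

```python
from typing import Dict, Set, Tuple

# Hypothetical representation of one variant call:
# (chromosome, position, reference allele, alternate allele)
Variant = Tuple[str, int, str, str]


def concordance(control: Set[Variant], demultiplexed: Set[Variant]) -> Dict[str, float]:
    """Compare calls from an individually sequenced control with calls
    recovered for the same sample after pooling and demultiplexing."""
    shared = control & demultiplexed
    union = control | demultiplexed
    return {
        # Fraction of control calls that were also recovered from the pool
        "recall": len(shared) / len(control) if control else 1.0,
        # Fraction of pooled calls that are confirmed by the control
        "precision": len(shared) / len(demultiplexed) if demultiplexed else 1.0,
        # Overlap relative to all calls seen in either set
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


# Toy data, invented purely for illustration
control_calls = {
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr2", 40, "G", "A"),
    ("chr2", 900, "T", "C"),
}
pooled_calls = {
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr2", 40, "G", "A"),
    ("chr3", 15, "A", "T"),
}

metrics = concordance(control_calls, pooled_calls)
print(metrics)
```

Metrics of this kind would let a threshold be set per species for what counts as "without compromising the quality of the end results"; the actual criteria used in the project are not stated in the report.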
Since the beginning of the project, we have also focused on turning the independent, research-grade scripts of MultiSeq into a cohesive, robust and user-friendly industrial-scale software platform that is easier to adopt and use. Throughout the project, we applied an Agile software development framework and developed MultiSeq iteratively in close cooperation with internal users from our own group, continuously improving the software based on their feedback. To this end, we unified the bioinformatics scripts of MultiSeq and turned them into an integrated, containerized platform with a web-based graphical user interface, offering various functionalities and databases for efficient data management. Both on-premises and AWS-based cloud versions of the platform were implemented.
Overall, both major scientific and technical objectives of the project were achieved successfully and on time.
The results of this project have far-reaching implications and impact.

The economic impact of MultiSeq can be substantial. It is a direct cost saver for the main sequencing applications, addressing the significant expense of library preparation. For instance, in 2020 the Wellcome Sanger Institute, one of the largest sequencing facilities, sequenced ~228000 samples, potentially spending ~€20 million on LP. By adopting MultiSeq, the institute could save millions; in our group it saves ~70% of LP costs. This cost reduction in turn boosts R&D output, the potential economic impact of which is difficult to overstate. Moreover, applying MultiSeq in a clinical setting (for example in microbiology and epidemiology units) can significantly reduce expenditure for public health systems. Most importantly, since MultiSeq is a software-based technology, it has the potential to scale very quickly, so it can have an immense economic impact in a short time frame.
The potential social impact of MultiSeq is also hard to overestimate. Because it makes sequencing cheaper and faster, numerous clinical sequencing applications can benefit from it: patients will receive their genomic data sooner, allowing medical doctors to diagnose and prescribe medicines faster, and medical doctors and researchers will be able to identify pathogens faster, helping to contain the spread of contagions, which was a crucial factor in the recent pandemic. In the context of infectious disease, a short turnaround time for pathogen detection is paramount, and it has been shown that for acute infections such as mycoses, a one-day delay in precise diagnostics increases mortality by 15%. Moreover, since MultiSeq is a multiplier of sequencing-related research, the potential social impact of the additional research findings can be significant.
The process of LP involves the use of numerous reagents and laboratory plastic (tubes, pipette tips, etc.). Based on public data from the SRA database for 2022 (~7.5 million sequenced samples), LP generated 400000 kg of plastic waste. MultiSeq makes sequencing more sustainable by significantly reducing this kind of waste and the consumables used. Considering that the number and scale of genomic projects are rapidly increasing worldwide, and that in some areas sequencing has already become a routinely used technology, the use of MultiSeq can have a strong environmental impact.
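The figures quoted above imply some per-sample numbers that are easy to work out. The short sketch below does only that arithmetic; the sample counts, LP spend and plastic mass are taken from the text, while applying the ~70% saving observed in the authors' group to another institute is an assumption made here for illustration, not a result reported by the project. All variable and function names are hypothetical.

```python
# Figures quoted in the report
SANGER_SAMPLES_2020 = 228_000        # samples sequenced in 2020 (approximate)
SANGER_LP_SPEND_EUR = 20_000_000     # potential LP spend in 2020 (approximate)
SRA_SAMPLES_2022 = 7_500_000         # samples in SRA for 2022 (approximate)
LP_PLASTIC_WASTE_KG = 400_000        # plastic waste attributed to LP

# ASSUMPTION: the ~70% LP saving seen in the authors' own group
# would transfer unchanged to another facility.
ASSUMED_SAVING_FRACTION = 0.70


def per_sample(total: float, samples: int) -> float:
    """Average amount per sequenced sample."""
    return total / samples


def projected_saving(total: float, fraction: float) -> float:
    """Amount saved if the given fraction of the total is avoided."""
    return total * fraction


lp_cost_per_sample_eur = per_sample(SANGER_LP_SPEND_EUR, SANGER_SAMPLES_2020)
lp_saving_eur = projected_saving(SANGER_LP_SPEND_EUR, ASSUMED_SAVING_FRACTION)
plastic_g_per_sample = per_sample(LP_PLASTIC_WASTE_KG * 1000, SRA_SAMPLES_2022)

print(f"Implied LP cost per sample: ~EUR {lp_cost_per_sample_eur:.0f}")
print(f"Saving at the assumed 70% rate: ~EUR {lp_saving_eur / 1e6:.0f} million")
print(f"Implied plastic waste per sample: ~{plastic_g_per_sample:.0f} g")
```

Under these inputs the implied LP cost is roughly €88 per sample, the assumed 70% rate corresponds to about €14 million of the €20 million spend, and the plastic figure works out to roughly 53 g per sample.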

Despite the huge potential impact of MultiSeq, further valorization and commercialization steps must still be taken, such as active engagement with market players, establishing clear industrial use cases and partnerships, IP protection frameworks, and GDPR and security compliance, among others.