Periodic Reporting for period 2 - MoSS (Molecular Storage System (MoSS): Intelligent DNA Data Storage)
Reporting period: 2023-05-01 to 2024-04-30
This EIC Transition project builds on the results generated from the OligoArchive project to develop the MoSS DNA data storage platform. Helixworks, in collaboration with Imperial College London and EURECOM, will undertake technology validation activities.
Specific project outcomes per beneficiary/associated partner are as follows:
Helixworks:
Development of DNA synthesis kits, containing all the components (enzymes, reagents, buffer solutions) to efficiently perform synthesis of long oligonucleotides. This also includes a synthesis protocol optimised for high accuracy and high oligo yield.
Automation of the various phases of enzymatic DNA synthesis, with workstation modules controlled using on-board embedded systems, and command software used to design and implement a library of synthesis protocols.
Development of a sequencing preparation kit.
Development of a device for automating sequencing library preparation.
Imperial College London:
Imperial College London’s overall outcome is to develop a new encoding model that can be used for the scalable synthesis developed by Helixworks. This includes:
Building blocks: harden motifs and join sections for sequencing, synthesis and storage to ensure that motifs (a) can be synthesised and replicated, (b) are joinable, and (c) can be sequenced with low error rates.
Scalable encoding: analysis of the error characteristics of Helixworks’ new synthesis techniques and of the sequencing technology used, and adaptation of the encoding based on the constrained channel framework.
Two-layer encoding: development of an encoding on two layers: (1) a datatype-agnostic storage layer and (2) a datatype-specific layer adding features specific to the datatype stored.
Scale up: adapt the methods to scale up storage of massive amounts of content of different data types.
EURECOM:
The core outcome for EURECOM will be a scalable, accurate, fully automated, cost-effective read consensus solution customised to the enzymatic synthesis approach adopted by Helixworks:
Scalability: Our solution will use both algorithmic techniques (randomised embedding) and systems techniques (CPU-GPU acceleration) to perform read consensus in just a few minutes.
Accuracy: Our solution will be able to successfully infer original oligos even from highly-error-prone long-read sequencers from Oxford Nanopore.
Automation: It will be possible to plug in and extend the Nanopore basecalling pipeline with our solution to enable fully automated DNA-to-data restoration without any manual involvement.
Cost-effectiveness: Finally, our solution will improve cost efficiency in two ways: (1) by reducing the read coverage (the number of reads that correspond to an oligo), we will reduce the sequencing cost, and (2) by deploying our solution on an embedded system such as NVIDIA Jetson, we will reduce the capital and operational cost.
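To make the notions of read consensus and read coverage concrete, the following is a minimal illustrative sketch, not the EURECOM solution. It assumes that all reads of an oligo have already been grouped together, have equal length, and contain substitution errors only; real Nanopore reads also contain insertions and deletions, which is why a practical pipeline needs clustering and alignment steps that are omitted here. The function name `consensus` and the example reads are hypothetical.

```python
from collections import Counter


def consensus(reads):
    """Infer an oligo by per-position majority vote over its reads.

    Assumption for this sketch: reads are equal-length strings over
    A/C/G/T and differ from the original by substitutions only.
    """
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0])
    if any(len(r) != length for r in reads):
        raise ValueError("this sketch assumes equal-length reads (no indels)")

    bases = []
    for column in zip(*reads):
        # Pick the most frequent base observed at this position.
        bases.append(Counter(column).most_common(1)[0][0])
    return "".join(bases)


# Hypothetical example: coverage of 3 reads for a single 8-base oligo.
reads = [
    "ACGTACGT",
    "ACGTACGA",  # substitution at the last position
    "ACCTACGT",  # substitution at the third position
]
recovered = consensus(reads)
```

In this toy setting, each error is outvoted by the two correct reads at that position. The coverage in the sketch is simply `len(reads)`; a consensus method that tolerates more errors per read needs fewer reads per oligo, which is the link between accuracy and sequencing cost described above.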
This innovation offers a unique platform that provides an end-to-end infrastructure for DNA data storage. It abstracts the "wet" layer for developers, providing the necessary tools and services to facilitate application development in the realm of DNA data storage. It not only democratizes access to this nascent technology but also drastically reduces the entry costs and resource needs, enabling a wide range of potential users to explore and innovate within this space. The initial commercial exploitation of this platform will be focused on researchers and developers who can leverage it to create cutting-edge applications for industries such as molecular tagging, digital preservation, and traditional media. Our unique proposition positions us in a niche space where DNA vendors provide only a partial solution, which creates substantial commercial potential.