CORDIS - EU research results

DNA Data storage

Periodic Reporting for period 1 - DNA DS (DNA Data storage)

Reporting period: 2019-12-01 to 2020-05-31

The world is producing data at an ever-increasing rate: 2.5 exabytes (EB) of data per day were produced in 2017, and this is projected to rise to 463 EB per day in 2025. With the rise of connected devices, the so-called Internet of Things (IoT), these numbers are likely to be even greater. Existing data storage technologies are sufficient for now, but we are soon likely to face challenges due to limited supply and the shortcomings of these technologies: limited lifespan (up to 30 years), physical space requirements and limited reliability. That is why only 50% of all the data generated in the world today is saved. This gap will widen: in 2025 we will be able to save only 20% of all data. Since data is the fuel for artificial intelligence, which is revolutionizing our society, technology, and healthcare, it is all the more important to invest in new data storage technologies that overcome the limitations of the existing ones.

One of the promising new data storage media is DNA, a molecule that can hold 20 million times more data per gram than state-of-the-art technologies. Because traditional DNA synthesis methods are still far too expensive for data storage needs, there is a need for novel ways to encode data as nucleotides (DNA building blocks) and to synthesize the data-bearing DNA molecules efficiently.

At BioSistemika we invented and patented a novel data encoding system that allows for cheap and fast synthesis of DNA molecules. We demonstrated its usability at laboratory scale. In the next steps, we are scaling the reaction volumes down a million-fold to reach a price comparable to that of existing data storage technologies. In this study, we investigated which technologies could be used for scaling down and identified inkjet technology as the most promising one, owing to its low droplet volumes (1.5 picolitres) and high speed (more than 100 000 droplets per second). We also studied the economic feasibility of DNA data storage and determined that the cost of storing 1 gigabyte of data in DNA can be comparable to storing the same data on magnetic tape. Because of these findings, we decided to continue with this project after the SME1 instrument concludes. In the last part of the project we therefore prepared a business plan and investigated possible commercialization strategies.
During this project, we prepared a feasibility study to determine whether DNA data storage is technologically feasible and whether there is a market opportunity for a new technology that cannot yet compete on price with existing data storage technologies. Writing 1 gigabyte of data to DNA at laboratory scale costs more than 16 million EUR, and the archive would consist of more than 13 000 litres of liquid. We would therefore need to scale the reaction down at least a million-fold to achieve a practical and economical solution. Our DNA writing technology is based on polymerase chain reaction assembly (PCA), so we first searched the literature for previous reports of low-volume PCA reactions. According to the literature, nanolitre-scale PCA is possible, and picolitre-scale PCA is feasible. We then investigated the liquid handling technologies that could be used to work with such minute volumes. Inkjet technology showed the best characteristics: it can handle volumes below 2 picolitres and perform more than 100 000 liquid handling operations per second. This would allow for good price-performance and adequate writing speeds. In the next step, we looked at the economic feasibility of DNA data storage. Just by proper sourcing of materials and working in the ~100 picolitre range, the technology can compete on price with magnetic tape. Economies of scale and further improvements in the technology can drive the price well below that of magnetic tape storage. Our results demonstrate that DNA data storage is technologically and economically feasible.
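
The scale-down arithmetic behind these figures can be illustrated with a short sketch. It is only a back-of-the-envelope check, not the study's own cost model: it assumes that cost and liquid volume both shrink linearly with reaction volume, and that the quoted droplet rate applies to a single nozzle. Neither assumption is stated in the report, and the function names are hypothetical.

```python
# Figures quoted in the text (both are lower bounds: "more than").
LAB_COST_EUR_PER_GB = 16_000_000   # EUR to write 1 GB at laboratory scale
LAB_VOLUME_L_PER_GB = 13_000       # litres of liquid per 1 GB archive


def scale_down(factor):
    """Return (cost in EUR, volume in litres) per GB after shrinking the
    reaction volume by `factor`.

    ASSUMPTION (not from the report): cost and volume both scale
    linearly with reaction volume.
    """
    return LAB_COST_EUR_PER_GB / factor, LAB_VOLUME_L_PER_GB / factor


def nozzle_throughput_nl_per_s(droplet_pl, droplets_per_s):
    """Liquid dispensed per second, in nanolitres.

    ASSUMPTION: the droplet rate refers to a single nozzle.
    """
    return droplet_pl * droplets_per_s / 1000  # 1 nL = 1000 pL


cost_eur, volume_l = scale_down(1_000_000)
print(f"{cost_eur:.0f} EUR and {volume_l * 1000:.0f} mL per GB")
print(f"{nozzle_throughput_nl_per_s(1.5, 100_000):.0f} nL/s per nozzle")
```

Under these assumptions, a million-fold scale-down brings the lower-bound figures to roughly 16 EUR and 13 millilitres per gigabyte, and an inkjet nozzle dispensing 1.5 picolitre droplets at 100 000 droplets per second moves about 150 nanolitres per second.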

The final part of the study focused on the commercialization strategy and business plan. We researched the data archiving market as well as the life science liquid handling market and determined the place of our DNA data storage technology in them. Life science applications, such as DNA sequencing and nanoparticle synthesis, could make use of our novel liquid handling technology, which would allow immediate market entry and revenue of ~13.3 million EUR, assuming a 0.01% market share in several relevant markets. We estimate that the data archiving market, on the other hand, will likely be ready for DNA data storage technology in 5-10 years, so concrete revenue and profit projections are difficult to make at this point.
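
As a sanity check on the revenue estimate, the combined size of the addressed markets can be backed out from the two numbers given. This is a minimal sketch that assumes the 0.01% share applies uniformly to the combined total of the relevant markets. The report does not give a per-market breakdown, and the function name is hypothetical.

```python
from fractions import Fraction

REVENUE_EUR = 13_300_000               # ~13.3 million EUR, from the text
MARKET_SHARE = Fraction("0.01") / 100  # 0.01 % expressed as an exact fraction


def implied_market_size(revenue_eur, share):
    """Total market size that yields `revenue_eur` at the given share.

    ASSUMPTION: one uniform share across the combined markets.
    """
    return revenue_eur / share


total_eur = implied_market_size(REVENUE_EUR, MARKET_SHARE)
print(f"{float(total_eur) / 1e9:.0f} billion EUR")
```

Under that assumption, the estimate corresponds to a combined market of about 133 billion EUR.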

Lastly, we conducted a set of interviews with entrepreneurs and investors to determine the best commercialization strategy for a high-tech invention. We concluded that BioSistemika will likely form a spin-off company to transfer the invention into a new business entity. To bring the technology to market, we will look for public funding and private investment.
Our patented DNA data storage technology shows a clear advantage over the state of the art. It allows greater data density and a more efficient data encoding algorithm, which reduces the complexity of liquid handling and makes the technology easier to develop. During the SME1 instrument project, we gained a better understanding of the liquid handling technologies that could be used and started developing our own liquid handling solution. Based on the conclusions of this study, we decided to continue pursuing this project.

Although DNA data storage is an important new technology that could solve many data storage challenges today and in the future (as outlined in the introduction), it could have an equally great impact on the life sciences and on potential medical solutions. The low-volume liquid handling technology would allow a great reduction in the price of diagnostic tests, particularly genetic tests and pathogen detection. This is particularly relevant during the COVID-19 pandemic, when greater accessibility of tests is essential. The low-volume liquid handling solution would also allow the development of new diagnostic approaches, moving from targeted diagnostics to screening diagnostics. This would help diagnose diseases that might not be obvious from the symptoms.

DNA data storage technology is relevant today for digital preservation. Its longevity (a lifespan of 1000+ years) and resistance to electromagnetic fields make it a great medium for storing information important to humanity. Many libraries and archives use non-electronic technologies, based on paper or film reels, to preserve information beyond the typical lifespan of magnetic tape (up to 30 years). One of the more famous attempts is the GitHub Archive Program, which archived all open source software on multiple non-electronic media in case humanity is destroyed. DNA can therefore also be used in similar digital preservation efforts and can serve as an additional layer of data safety.