CORDIS - EU research results

Next Generation Molecular Data Storage

Periodic Reporting for period 1 - NEO (Next Generation Molecular Data Storage)

Reporting period: 2023-10-01 to 2024-09-30

Current long-term archival media (such as tape and disk) have several drawbacks, the most important being that they are short-lived (up to 5-10 years). Data must therefore be copied between storage media every few years, a costly process that also produces considerable electronic waste. Longer-lasting media are urgently needed, and DNA strands have been identified as a major contender for the next archival storage medium. In DNA storage, data is written using DNA synthesis and read using DNA sequencing. DNA is particularly promising as a storage medium because of its durability: it can last for several hundred years. However, storing data in DNA strands is currently too expensive because of the exorbitant cost of DNA synthesis (about 0.12 USD to write one bit), and it is further limited by the speed of writing (synthesis) and reading (sequencing).
For this reason, we investigate storing data in DNA nanostructures. Our approach is to produce DNA nanostructures that act like a breadboard and to attach protein molecules at a given set of locations: a bound protein writes a one, an empty location a zero. The major benefit of this approach is that all possible nanostructures can be built from a small, predefined set of DNA strands that can be produced cheaply and en masse. Furthermore, editing stored information is currently infeasible with strand-based DNA storage but can be realized on DNA nanostructures using strand exchange mechanisms. With our approach, writing, reading (based on atomic force microscopy and automated image analysis), and editing are substantially faster and cheaper than with standard approaches.
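The presence/absence encoding described above can be illustrated with a minimal sketch. It assumes a hypothetical 4 x 8 grid of binding sites per nanostructure and a simple row-major bit layout; the grid size, the layout, and the function names `encode` and `decode` are illustrative assumptions, not the project's actual design.

```python
from typing import List

# Hypothetical layout: 4 x 8 binding sites per nanostructure (assumption).
ROWS, COLS = 4, 8

Grid = List[List[int]]


def encode(data: bytes, rows: int = ROWS, cols: int = COLS) -> List[Grid]:
    """Map bytes to bit grids, one grid per nanostructure.

    A 1 means "protein attached at this site", a 0 means "site left empty".
    The last grid is padded with zeros.
    """
    # Most significant bit first for every byte.
    bits = [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]
    per_grid = rows * cols
    bits += [0] * (-len(bits) % per_grid)  # zero-pad to a whole grid
    grids = []
    for start in range(0, len(bits), per_grid):
        chunk = bits[start:start + per_grid]
        grids.append([chunk[r * cols:(r + 1) * cols] for r in range(rows)])
    return grids


def decode(grids: List[Grid], length: int) -> bytes:
    """Recover `length` bytes from grids produced by `encode`."""
    bits = [bit for grid in grids for row in grid for bit in row]
    out = bytearray()
    for i in range(length):
        byte = 0
        for bit in bits[8 * i:8 * i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)
```

In this sketch the original byte length has to be stored separately, because the zero padding of the last grid is indistinguishable from data.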
The activities performed so far were focused on DNA nanostructure synthesis and stability, bit writing, and data reading.
Synthesis:
Two candidate DNA nanostructure shapes were identified, i.e. a single-layer rectangle that offers a high bit density and a chiral Z shape for the easy identification of individual bit locations. The optimum arrangement of bits in the form of labelled DNA overhangs to be visualized by protein binding was evaluated for the rectangular DNA nanostructure. At maximum bit density, protein binding was hindered, with maximum binding observed only at half the maximum bit density. To optimise DNA nanostructure assembly, a protocol was developed to enable the efficient recovery of unincorporated DNA strands from prior DNA nanostructure assembly reactions. This method allows for the reuse of DNA strands in subsequent folding processes, significantly reducing production costs and improving the overall sustainability of data-carrying DNA nanostructures.
Stability:
To assess the long-term stability of the data-carrying DNA nanostructures, a dedicated accelerated aging test was developed that can be employed not only to estimate lifetimes but also to compare the effects of different stabilizing and destabilizing factors and measures. Electron and neutron irradiations of bare DNA nanostructures have been performed and the radiation stability of three different DNA nanostructure shapes was quantitatively evaluated.
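As an illustration of how lifetimes can be estimated from accelerated aging data in general, the sketch below uses the Arrhenius relation, a common model for such extrapolations. The report does not state which model the project uses, so the model choice, the first-order decay assumption, and all function names here are assumptions.

```python
import math

R = 8.314462618  # molar gas constant in J/(mol*K)


def activation_energy(k1: float, t1: float, k2: float, t2: float) -> float:
    """Activation energy (J/mol) from decay rates k1, k2 at temperatures t1, t2 (K).

    Follows from k = A * exp(-Ea / (R * T)) evaluated at two temperatures.
    """
    return R * math.log(k2 / k1) / (1.0 / t1 - 1.0 / t2)


def rate_at(k_ref: float, t_ref: float, ea: float, t: float) -> float:
    """Extrapolate a decay rate measured at t_ref (K) to temperature t (K)."""
    return k_ref * math.exp(-ea / R * (1.0 / t - 1.0 / t_ref))


def half_life(k: float) -> float:
    """Half-life for first-order decay with rate constant k (same time unit as 1/k)."""
    return math.log(2) / k
```

Given decay rates measured at two elevated temperatures, `activation_energy` yields an estimate that `rate_at` can extrapolate to a storage temperature, and `half_life` converts the result into a lifetime figure. Such an extrapolation is only as reliable as the assumption that a single degradation mechanism dominates over the whole temperature range.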
Writing:
Prototypes for two alternative strand exchange reactions were successfully demonstrated under conditions suitable for maintaining DNA nanostructure stability.
Reading:
High-resolution atomic force microscopy (AFM) imaging was performed on various DNA nanostructure shapes deposited on solid surfaces. This imaging revealed well-defined DNA nanostructures, which is essential for developing future techniques to encode and decode data on DNA nanostructures at the molecular level. We achieved molecular resolution, enabling the visualization of individual DNA strands within the DNA nanostructures. This level of detail is crucial for precise functionalization of the DNA nanostructure with proteins, which can be used to encode and store data. Different approaches for the automated analysis of the AFM images based on modern computer vision and machine learning techniques have been evaluated, and a software processing pipeline is currently in development.
Methodology for Reuse of DNA Strands: Significant progress was made toward reducing the costs associated with DNA nanostructure assembly by developing a method for recovering and recycling unincorporated DNA strands, improving sustainability and reducing total synthesis costs. This advancement supports future large-scale DNA nanostructure applications in data storage and other fields by making them economically more viable.
Accelerated Aging Test for DNA Nanostructures: Since DNA nanostructure stability is a highly important aspect in many technological and medical applications, we envision that the developed accelerated aging test will be widely used to characterize the impact of various environmental and design parameters on structural stability.
Strand Exchange Reactions on DNA Nanostructures: The successful optimization of strand exchange on DNA nanostructures in highly parallel reactions holds promise for scalable, high-precision data storage and molecular computing.
Automated Analysis of AFM Images: With the advent of high-speed AFM, which allows thousands of images to be recorded in a few hours, the analysis of the recorded AFM images has become a serious bottleneck that slows down scientific discovery. The automated image analysis tools developed here hold great promise for removing this bottleneck in the fields of DNA nanotechnology, biomedicine, and molecular biology.