DNA digital data storage

In recent years, digital data has become a very important part of our lives. Personal information, digital keys, digital wallet details, passwords and bank details are some of the crucial data that must be stored very securely.

Computers store our data on physical storage devices with capacities of many gigabytes. One of the earliest data storage devices was the Williams tube, which was later replaced by magnetic drums.

Cassettes, CD-Rs, floppy disks, pen drives, micro SD cards and now cloud storage are some of the storage units we have been using for a long time. Over time, the devices have become smaller while their storage capacity has increased. Most of the world's data is stored on magnetic and optical media.

Traditional data storage units have several limitations: each unit has a limited storage capacity, it can be lost, damaged or corrupted, and it cannot hold data reliably for very long periods of time. So, what will be the future of digital data storage?

DNA may be the next big thing in digital data storage. Yes, data storage in DNA is now a reality.

In this article, we are going to discuss the actual process by which digital data can be stored in DNA. The outline of the article is listed here:

  • How digital data is stored in DNA
  • How DNA digital data storage works
  • How DNA holds information
  • Applications of DNA digital data storage
  • Limitations of DNA digital data storage
  • My ultimate guide for storing data in DNA

A single gram of DNA can, in principle, store about 1 exabyte of data for as long as 2,000 years.

The global data burden is increasing every day, and data storage companies have to invest millions of dollars in new storage facilities each year.

How is digital data stored in DNA?

Comparison of different data storage units:

Storage unit      Storage capacity    Storage density
Magnetic tape     185 TB              10 GB/mm³
Optical disc      1 PB                100 GB/mm³
DNA sequence      >1 EB               1 EB/mm³

DNA is the hereditary material of all living organisms on earth. It passes characters from parents to their offspring, so it is almost like a memory storage device for every living organism. Building on this idea, the Soviet scientist Mikhail Neiman published a paper on the use of DNA as a digital data storage unit in 1964.

One of the first large demonstrations came in 2012, when George Church and colleagues at Harvard University encoded an HTML-formatted text into DNA. This was the first major milestone in DNA data storage technology.

An organic molecule used for storing digital data is called organic data storage. The molecule may be DNA, RNA or protein.

How does DNA digital data storage work?

The digital data is encoded as a DNA sequence, that sequence is synthesized as artificial DNA, and the information is later decoded by sequencing the artificial DNA strand. This is the complete path for storing digital data in DNA and retrieving it.

The image represents the general process of DNA digital data storage.

How does DNA hold information?

Here, each step of storing digital data in DNA is explained in depth.

Encoding data into the DNA sequence:

Computers work on a binary system of 0 and 1. In the very first step, the digital data is mapped onto a DNA sequence. DNA has 4 nitrogenous bases: Adenine (A), Cytosine (C), Guanine (G) and Thymine (T). To store data in DNA, each of the bases A, T, G and C is first assigned a two-digit binary code made of 0s and 1s.

For more detail on the DNA structure read the article: DNA story: The structure and function of DNA

The binary codes used for storing information are 00 for A, 01 for G, 10 for C and 11 for T. Information in binary form is converted, two bits at a time, into a sequence of A, T, G and C. Now we have a long DNA sequence that represents our digital data.
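The mapping above can be sketched in a few lines of Python. This is only an illustration: the two-bit codes come from the text, while the function names and the choice of UTF-8 for turning text into bits are assumptions made for the example.

```python
# Two-bit code from the text above: 00 -> A, 01 -> G, 10 -> C, 11 -> T.
BITS_TO_BASE = {"00": "A", "01": "G", "10": "C", "11": "T"}


def text_to_bits(text: str) -> str:
    """Return the UTF-8 bytes of `text` as a string of 0s and 1s (assumed encoding)."""
    return "".join(f"{byte:08b}" for byte in text.encode("utf-8"))


def bits_to_dna(bits: str) -> str:
    """Translate a bit string into bases, two bits per base."""
    if len(bits) % 2:
        raise ValueError("bit string must have an even length")
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def encode(text: str) -> str:
    """Hypothetical helper: text -> bits -> DNA sequence."""
    return bits_to_dna(text_to_bits(text))


if __name__ == "__main__":
    print(encode("Hi"))  # GACAGCCG
```

Each byte becomes four bases, so a short message such as “welcome to the new world” (24 characters) would need 96 bases under this simple scheme.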

Artificial DNA synthesis:

A single-stranded DNA of arbitrary sequence can be synthesized chemically. Following the digital sequence data, each nucleotide is added to the preceding nucleotide one at a time. The efficiency of artificial DNA synthesis is about 99%, but even an error of 1% can create a major problem in digital data storage.

To overcome this problem, a large number of parallel start sites are provided so that multiple copies of the given sequence are produced. Thus, even if a single copy contains an error, many other exact copies are available.

Taq DNA polymerase is an enzyme that can be used to make further copies of the synthesized DNA by PCR.
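To see why a 1% error matters, here is a rough calculation. It assumes that the 99% figure applies to every single nucleotide addition, which is an interpretation of the text and not a measured value; the function names are made up for the example.

```python
def full_length_fraction(length: int, step_efficiency: float = 0.99) -> float:
    """Fraction of strands in which every one of `length` additions succeeded.

    Assumes each nucleotide addition succeeds independently with
    probability `step_efficiency` (an assumption, not a measured value).
    """
    return step_efficiency ** length


def expected_exact_copies(copies: int, length: int, step_efficiency: float = 0.99) -> float:
    """Expected number of error-free strands among `copies` parallel syntheses."""
    return copies * full_length_fraction(length, step_efficiency)


if __name__ == "__main__":
    for n in (50, 100, 200):
        print(n, round(full_length_fraction(n), 3))
```

Under this assumption only about 37% of 100-base strands are error free, which is why many parallel copies of each sequence are made.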

Storing of sample:

Now we have our data backup in the form of a liquid drop containing several nanograms of DNA. The DNA can be stored in a deep freezer, where it can last for 100 years, or it can be sent to external storage facilities (provided by some companies), which can store the DNA for more than a thousand years.

Under cold, dry and dark conditions, DNA can remain stable for an extremely long time. Nonetheless, some sequences could be lost over time.

Sequencing of DNA:

To convert the stored data back into its original form, we have to sequence the entire DNA. DNA sequencing is a process in which the order of bases in a DNA strand is read out as a digital sequence.

Labelled nucleotides are added complementary to our DNA strand. Each type of nucleotide is labelled with a different fluorescent dye, and the colour emitted by each dye is recorded by the detector.

The process is repeated multiple times with different start sites, which gives multiple parallel reads of our DNA. The sequence that exactly matches our DNA is selected and sent to the decoder.
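One simple way to combine many parallel reads is a majority vote at each position. Note that this is a swapped-in illustration and not the exact selection step described above; it also assumes that all reads have the same length and contain only substitution errors. The function name is hypothetical.

```python
from collections import Counter


def consensus(reads: list[str]) -> str:
    """Return the most common base at each position across equal-length reads."""
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("reads must all have the same length")
    # zip(*reads) walks the reads column by column (position by position).
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))


if __name__ == "__main__":
    reads = ["GACAGCCG", "GACAGCCG", "GATAGCCG", "GACAGCAG", "GACAGCCG"]
    print(consensus(reads))  # GACAGCCG
```

Even though two of the five reads carry an error, the voted sequence is the original one, because the errors fall at different positions.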

Decoding information:

Finally, the decoder converts the DNA sequence back into binary. After decoding, we can retrieve our original data.

Comparison of data storage units with respect to access time and durability:

Data storage unit    Durability             Access time
Flash drive          3 years                Milliseconds
HDD (hard disk)      5 years                About 10 milliseconds
Magnetic tape        Up to 30 years         1 minute
DNA storage          More than 100 years    More than 12 hours

Scientists at Microsoft Research, in collaboration with the University of Washington, are working on DNA digital data storage technology. However, gaps and errors in DNA sequencing are the major obstacle to retrieving data in its original form.

Nonetheless, in 2016, Microsoft announced the successful retrieval of 100% of DNA-stored data. 200 MB of data across 35 different files was retrieved from the DNA without any error, keeping the future of DNA digital data storage alive.

The DNA digital data storage technique has several promising applications:

  • It is suitable for storing miscellaneous archival files such as previous medical records, legal documents and formal records.
  • Under suitable storage conditions, data stored in DNA may last for more than 10,000 years.
  • Because DNA occupies very little space, the entire data set can also be stored in small replicon libraries.
  • By creating a DNA archive in a single room, we could store the entire data of the world.

It is a futuristic, high-capacity data storage unit, albeit a restricted one. The DNA digital data storage system has several limitations:

  • It takes a lot of time to store, process and compute data.
  • On average, the entire process takes 3 to 4 days to complete.
  • The cost is another major limitation. Storing around 15 MB of data costs up to 100,000 dollars.
  • It cannot be used like a pen drive or a magnetic tape.
  • If we want to extract a specific type of file from the DNA archive, we have to sequence and read the entire archive. How can we extract a specific type of file? I have answered this question in my ultimate guide section below.
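To put the cost figure in perspective, the numbers quoted above can be scaled up with simple arithmetic. The sketch assumes that cost grows linearly with data size, which is a simplification, and the function names are made up for the example.

```python
QUOTED_COST_USD = 100_000  # upper figure quoted in the list above
QUOTED_SIZE_MB = 15        # data size that figure refers to


def cost_per_mb() -> float:
    """Cost of one megabyte at the quoted rate."""
    return QUOTED_COST_USD / QUOTED_SIZE_MB


def estimate_cost(size_mb: float) -> float:
    """Linear extrapolation of the quoted rate (an assumption, not a price list)."""
    return size_mb * cost_per_mb()


if __name__ == "__main__":
    print(f"per MB: ${cost_per_mb():,.2f}")
    print(f"1 GB (1,000 MB): ${estimate_cost(1000):,.0f}")
```

At that rate a single gigabyte would cost several million dollars, which is why the technique is currently limited to small, high-value archives.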

Large technology companies are also active in this space. Google has run a cloud service under the name “Google Genomics”, although it stores genomic data on conventional servers rather than digital files in DNA. Microsoft has reportedly bought 10 million strands of DNA to build out its DNA digital data storage technology.

Twist Bioscience is now commercially active in DNA digital data storage technology. So far they have reportedly created around 2000 exabytes of data and shipped (decoded) approx. 800 exabytes of data.

Image credit: https://www.twistbioscience.com

My ultimate guide for storing data into DNA

Storing data in a DNA sequence is not as easy as described above; still, it is not impossible. To store data in DNA, we have to develop a whole new operating system for coding and decoding DNA sequences.

Earlier I asked a question: how can we extract a specific type of file? Without a better method, we would have to sequence the entire archive every time we need one file.

PCR technology can help us extract a specific type of file. PCR amplifies a specific DNA fragment of interest with the help of dNTPs, primers, template DNA, Taq DNA polymerase and PCR buffer.

Suppose we have stored two files, one audio file and one text file, on a particular DNA sequence. By marking the start and end point of each file, we can design a specific primer set for each file.

PCR is performed at an appropriate annealing temperature using the specific primer set for 35 cycles. In each PCR cycle, the number of copies of the DNA fragment of interest doubles, so after n cycles the fragment is amplified up to 2^n times.

At the end of PCR, we have millions of copies of the DNA fragment for the specific file type. Now we can send the PCR product for sequencing, and the data is decoded from the DNA sequence information.
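The doubling can be written as a small formula: starting from N0 copies, n cycles give N0 × (1 + e)^n copies, where e is the per-cycle efficiency (e = 1 means perfect doubling). The sketch below uses hypothetical names, and the efficiency parameter is an added assumption for illustration.

```python
def copies_after(cycles: int, start_copies: int = 1, efficiency: float = 1.0) -> float:
    """Number of copies after `cycles` PCR cycles.

    efficiency = 1.0 is the ideal case in which every copy is duplicated
    in every cycle; real reactions fall short of this.
    """
    return start_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    print(f"{copies_after(35):,.0f}")                  # ideal doubling for 35 cycles
    print(f"{copies_after(35, efficiency=0.9):,.0f}")  # assumed 90% efficiency
```

Ideal doubling for 35 cycles gives 2^35, roughly 34 billion copies per starting molecule. In practice the reaction levels off earlier because the primers and dNTPs are used up.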
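The idea of pulling one file out of a mixed pool can be imitated in software. In this sketch every strand is written as forward primer + payload + reverse primer, and a file is selected by matching both primers. All primer sequences and payloads are made up, and the matching is done on plain strings; a real reverse primer binds the complementary strand, which is ignored here for simplicity.

```python
# Hypothetical primer pairs marking the start and end of each file.
TEXT_FWD, TEXT_REV = "ACGTAC", "TGCATG"
AUDIO_FWD, AUDIO_REV = "GGATCC", "CCTAGG"


def make_strand(fwd: str, payload: str, rev: str) -> str:
    """Flank a data payload with its file-specific primer sites."""
    return fwd + payload + rev


def select_file(pool: list[str], fwd: str, rev: str) -> list[str]:
    """Return the payloads of all strands flanked by the given primer pair."""
    return [
        strand[len(fwd):len(strand) - len(rev)]
        for strand in pool
        if strand.startswith(fwd) and strand.endswith(rev)
    ]


if __name__ == "__main__":
    pool = [
        make_strand(TEXT_FWD, "GACAGCCG", TEXT_REV),    # text file payload
        make_strand(AUDIO_FWD, "TTAACCGG", AUDIO_REV),  # audio file payload
    ]
    print(select_file(pool, TEXT_FWD, TEXT_REV))  # ['GACAGCCG']
```

Only the strands carrying the text-file primers are returned, so only that part of the archive would need to be sequenced and decoded.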

A DNA sample can be lost or damaged in extreme conditions. To overcome this problem, we can store our digital DNA information in a specific plasmid.

A plasmid is a small circular DNA present in bacteria. With the help of restriction endonucleases, we can cut our DNA into small fragments (or into fragments that correspond to the different files stored on the DNA).

Inserting these smaller DNA fragments into plasmids stores our DNA sequence information in bacteria. We can store these bacterial strains for a longer period of time.

We can label each file type as well.

By incorporating a specific marker gene along with our DNA sequence (which contains specific information), we can label each file.

Suppose we have a DNA sequence holding the text file “welcome to the new world”. Along with this DNA sequence, we incorporate one marker gene that codes for a protein which colours the bacteria green.

If the gene is expressed, green coloured colonies are observed, which indicates that the DNA carrying our text file is present in this bacterium.

The rest of the data retrieval process is explained in the figure. (Note: these are my assumptions; scientifically, it may not be possible.)

The schematic representation of storing data in a plasmid.

Although DNA digital data storage technology is costly and time-consuming at present, it will prove to be very useful in the near future.

In conclusion, DNA digital data storage may be our best hope for storing the growing volume of data in the near future, and it could revolutionize digital technology. I hope you like the article.

Written by: Tushar Chauhan 

Reviewed by: Binal Tailor