DNA Goes Digital

The ubiquity of digital and server-stored file formats has brought about increased interest in ensuring that creative works conceived today will be preserved, digitally, forever. Chemical engineers play an important role in preserving this information, and new research suggests that bioengineers may also be able to contribute.

Modern today, forgotten tomorrow

Some of today's best television shows are filmed and edited digitally and then released solely online, through content providers like Hulu, Netflix, or YouTube. Although this provides a convenient viewing experience, it raises questions about the long-term viability of such art, because those companies will certainly not be around forever. I would like to think that future generations will be able to have access to today's video, art, and music to help them decipher what life in 2012 was like, the way that contemporary historians can extract information from cave drawings that depict life 50,000 years ago, or become mesmerized by celluloid film images from 100 years ago. Modern data storage excels at many things: ease of transport, random access, and incredible economies of scale among them. At the same time, it is susceptible to temporary loss in the event of power outages and natural disasters, and to permanent loss due to changing Internet legislation or cyber terrorism -- not to mention the ever-evolving software and technology used to encode and access the data. Digital media created and "saved" using the standard software of only a decade ago may no longer be easily readable using today's devices and standard platforms.

Storage through nature itself

If we were to design a storage format with the goal of having the data last forever, it might help to borrow from a model that has proven itself to be nature's long-lasting data-storage format -- DNA. DNA and RNA are remarkably simple considering their universality (they are found in all living things) and how accurately they transmit encoded data. They can store data for an incredibly long time (under the right conditions, an upper limit is estimated to be one million years) and their position at the very center of biology means that the format is unlikely to ever go out of use.

The viability of using a man-made analog to DNA for long-term data storage was detailed in a paper published in the August 17, 2012, issue of Science. Researchers at Harvard Medical School encoded a book with more than 50,000 words into a DNA microchip -- a sort of virtual DNA. The researchers converted each character of text into binary code, the same way a computer would, and then encoded each 0 as an adenine or cytosine base and each 1 as a guanine or thymine base. The end product is DNA in which each base indicates a 1 or 0 in the binary code that represents the original text. The researchers then used next-generation DNA sequencing to read the book. Although the researchers could have used a base-4 system instead of base-2, with each type of nucleotide base indicating a different component of the encoded message, the base-2 system avoids problems with reading and writing certain sequences of DNA (e.g., long repeats of guanine and cytosine) that are fragile in nature. Because either of two bases can stand for a given bit, the encoder is free to pick whichever one avoids a troublesome sequence. Even with this slightly less-efficient encoding, the researchers were able to store 5.5 petabits (roughly 700,000 gigabytes) per cubic millimeter of DNA.
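To make the one-bit-per-base idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the researchers' actual pipeline: the function names are hypothetical, the assignment of base pairs to bit values is a convention set by two constants (swap them to reverse it), and the rule for choosing between the two candidate bases (take the first unless it would repeat the previous base) is a simplification chosen here so that no base ever appears twice in a row. The last function simply checks the unit conversion behind the storage-density figure.

```python
# Illustrative sketch only; names and the base-selection rule are assumptions.
ZERO_BASES = "AC"  # either base may represent a 0 bit (convention for this sketch)
ONE_BASES = "GT"   # either base may represent a 1 bit


def text_to_bits(text):
    """Convert text to a string of '0'/'1' characters, 8 bits per UTF-8 byte."""
    return "".join(f"{byte:08b}" for byte in text.encode("utf-8"))


def bits_to_dna(bits):
    """Encode one bit per base, never repeating the previous base."""
    bases = []
    for bit in bits:
        first, second = ZERO_BASES if bit == "0" else ONE_BASES
        previous = bases[-1] if bases else None
        # Use the alternate base only when the first choice would create a repeat.
        bases.append(second if previous == first else first)
    return "".join(bases)


def dna_to_bits(dna):
    """Decode: A or C reads as 0, G or T reads as 1."""
    return "".join("0" if base in ZERO_BASES else "1" for base in dna)


def bits_to_text(bits):
    """Regroup bits into bytes and decode them as UTF-8 text."""
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


def petabits_to_gigabytes(petabits):
    """Unit conversion: 1 petabit = 1e15 bits, 8 bits per byte, 1e9 bytes per GB."""
    return petabits * 1e15 / 8 / 1e9


if __name__ == "__main__":
    strand = bits_to_dna(text_to_bits("DNA"))
    print(strand)
    print(bits_to_text(dna_to_bits(strand)))
    print(petabits_to_gigabytes(5.5))
```

With this rule, the single letter "A" (binary 01000001) becomes the strand AGACACAG, and decoding recovers the original text because the reader only needs to know which pair a base belongs to. The conversion gives 687,500 gigabytes for 5.5 petabits, which rounds to the 700,000 figure quoted above.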

Unparalleled efficiency for future storage

At this efficiency, the researchers say, about four grams of DNA could store the digital data humankind creates in a single year. The storage density exceeds any other known method by many orders of magnitude, but there are considerable limitations that will prevent DNA storage from appearing in your next computer. For example, the data cannot be read out of order; there will be limited starting points, just as biological DNA has start and stop codons. It is also semi-permanent -- an entirely new strand of DNA needs to be created to modify any base pair within it. Still, DNA storage may eventually prove useful as a long-term archive of history. Copies can be created cheaply and stored separately, and the data should be readable by any future civilization with our level of technical knowledge. With DNA storage, we will not just be storing human-created works -- we will be able to store huge quantities of data that are currently meaningless to us and are lost as quickly as they are created. For example, imagine an enormous data store of every measurement made by meteorological instruments from all over the world. A thousand years of such data, combined with measurements of other environmental factors such as industrial pollutant concentrations, could provide future researchers with the information they need to study and test climatology theories on a much larger scale than we can currently envision. Chemical engineers, and now bioengineers, will be instrumental in designing the instruments that use this technology and others like it that revolutionize the way we think about data. Our profession will only become more important in the future.

How soon do you think DNA will become widely used as storage?