Automated Design for Synthesis and Assembly | AIChE

Authors 

Oberortner, E. - Presenter, Lawrence Berkeley National Laboratory
Hillson, N. J., DOE Joint BioEnergy Institute
Deutsch, S., DOE Joint Genome Institute
Cheng, J. F., Lawrence Berkeley National Laboratory

Not every DNA sequence that one could conceive can be manufactured effectively with state-of-the-art DNA synthesis technologies. Over recent years, and through the synthesis of many millions of base pairs, the DNA synthesis community has accumulated knowledge about which synthetic DNA sequences are feasible to manufacture. Empirical analyses have identified common sequence features that affect the success and failure rates of DNA synthesis. Such features include repeated sequences, regions of low or high GC content, and restriction sites.
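
To make these features concrete, the following Python sketch scans a sequence for the three feature types named above. It is not SPL code: the function names (`check_sequence`, `reverse_complement`), the 50 nt GC window, the 25 to 65 percent GC bounds, the 12 nt repeat length, and the choice of EcoRI (GAATTC) and BsaI (GGTCTC) sites are all hypothetical values picked for illustration; real vendor constraints differ.

```python
from dataclasses import dataclass

# Hypothetical thresholds for illustration only; real vendor rules differ.
GC_WINDOW = 50                 # sliding-window size in nt
GC_MIN, GC_MAX = 0.25, 0.65    # allowed GC fraction per window
REPEAT_LEN = 12                # report any k-mer of this length seen twice
RESTRICTION_SITES = {"EcoRI": "GAATTC", "BsaI": "GGTCTC"}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


@dataclass
class Violation:
    kind: str       # "gc_content", "repeat" or "restriction_site"
    position: int   # 0-based start of the offending region
    detail: str


def gc_fraction(seq: str) -> float:
    return (seq.count("G") + seq.count("C")) / len(seq)


def check_sequence(seq: str) -> list[Violation]:
    seq = seq.upper()
    if not seq:
        return []
    violations: list[Violation] = []

    # 1. GC content in every window of GC_WINDOW nt (whole sequence if shorter).
    for i in range(max(len(seq) - GC_WINDOW + 1, 1)):
        gc = gc_fraction(seq[i:i + GC_WINDOW])
        if not GC_MIN <= gc <= GC_MAX:
            violations.append(Violation("gc_content", i, f"GC={gc:.2f}"))

    # 2. Direct repeats: a REPEAT_LEN-mer that already occurred earlier.
    first_seen: dict[str, int] = {}
    for i in range(len(seq) - REPEAT_LEN + 1):
        kmer = seq[i:i + REPEAT_LEN]
        if kmer in first_seen:
            violations.append(Violation("repeat", i, f"same as position {first_seen[kmer]}"))
        else:
            first_seen[kmer] = i

    # 3. Restriction sites on either strand (palindromic sites are searched once).
    for enzyme, site in RESTRICTION_SITES.items():
        for motif in dict.fromkeys((site, reverse_complement(site))):
            hit = seq.find(motif)
            while hit != -1:
                violations.append(Violation("restriction_site", hit, enzyme))
                hit = seq.find(motif, hit + 1)

    return violations
```

Each violation records where the problem starts, which is the information a repair step would need in order to propose a targeted change.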

A major bottleneck for a cost- and time-effective DNA synthesis process is that this synthesis knowledge is rarely incorporated into the design process. Non-compliant designs cause severe process inefficiencies: (i) DNA synthesis vendors often reject such sequences, and (ii) sequences that are not rejected are synthesized at higher cost or with longer cycle times.

A major collaborative goal of DOE JGI, JBEI, and FutureBio is to develop scalable solutions for designing and building complex DNA constructs for strain characterization in parallel and with a high success rate. To this end, we have developed the Sequence Polishing Library (SPL), a collection of software tools that automate a "Design for Synthesis and Assembly" workflow. The SPL verifies DNA sequences against specified synthesis constraints, such as repeats or GC content. When violations are found, it can suggest careful modifications that alter the DNA sequence without changing its function, such as swapping synonymous codons in the DNA's coding regions. The SPL partitioning tool decomposes DNA sequences that exceed the maximum synthesis length into synthesizable building blocks. The challenge here is to find appropriate overlapping sequences between properly sized building blocks so that the synthesized blocks can be assembled effectively. The partitioning tool therefore lets the user configure the characteristics of the overlaps, such as length or melting temperature, which are mostly determined by the assembly protocol used.
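
The partitioning problem can be sketched as a greedy search. The Python below is an illustration under stated assumptions, not the SPL algorithm: the hypothetical function `partition` cuts each block at most `max_block_len` nucleotides long and, moving the cut leftward if necessary, picks the shortest overlap whose length lies in `overlap_len` and whose melting temperature lies in `tm_range`. The defaults (1,000 nt blocks, 20 to 40 nt overlaps, 58 to 68 °C) are hypothetical, and `tm_basic` estimates Tm with the basic GC-content formula Tm = 64.9 + 41 * (nGC - 16.4) / N, a rough approximation; assembly-grade tools typically use nearest-neighbor thermodynamics instead.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


def tm_basic(seq: str) -> float:
    """Rough Tm in deg C: 64.9 + 41 * (nGC - 16.4) / N (basic GC formula, ~14+ nt)."""
    seq = seq.upper()
    n_gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (n_gc - 16.4) / len(seq)


@dataclass
class Block:
    start: int                   # 0-based, inclusive
    end: int                     # exclusive
    overlap_tm: Optional[float]  # Tm of overlap with the next block; None for the last


def partition(
    seq: str,
    max_block_len: int = 1000,
    overlap_len: Tuple[int, int] = (20, 40),
    tm_range: Tuple[float, float] = (58.0, 68.0),
) -> List[Block]:
    """Greedily split seq into blocks of at most max_block_len nt so that
    neighbouring blocks share an overlap whose length and Tm are in range."""
    min_ol, max_ol = overlap_len
    if not 0 < min_ol <= max_ol < max_block_len:
        raise ValueError("need 0 < min overlap <= max overlap < max_block_len")
    blocks: List[Block] = []
    start = 0
    while True:
        end = min(start + max_block_len, len(seq))
        if end == len(seq):
            blocks.append(Block(start, end, None))
            return blocks
        chosen: Optional[Tuple[int, int, float]] = None
        # Try the longest block first; shorten it one nt at a time if needed.
        # Stopping above start + max_ol guarantees the next block starts after this one.
        for cut in range(end, start + max_ol, -1):
            for ol in range(min_ol, max_ol + 1):  # prefer the shortest overlap
                tm = tm_basic(seq[cut - ol:cut])
                if tm_range[0] <= tm <= tm_range[1]:
                    chosen = (cut, ol, tm)
                    break
            if chosen is not None:
                break
        if chosen is None:
            raise ValueError(f"no valid overlap for the block starting at {start}")
        cut, ol, tm = chosen
        blocks.append(Block(start, cut, tm))
        start = cut - ol  # the next block begins with the shared overlap region


if __name__ == "__main__":
    demo = "ACGTTGCAAGCT" * 300  # 3,600 nt toy sequence
    for block in partition(demo):
        print(block)
```

Preferring the shortest qualifying overlap keeps the synthesized blocks as short as possible. A production tool would likely also screen overlaps for uniqueness within the construct, since repeated overlap sequences can cause mis-assembly.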

Every SPL tool supports exchanging sequence information in common data exchange formats, such as FASTA, GenBank, and SBOL. The SPL Web Application lets users learn and use each tool interactively. In addition, every tool provides a RESTful API so that its functionality can be invoked programmatically and integrated into automated workflows.
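
Of the formats listed, FASTA is the simplest. As a minimal sketch of what format support involves, the functions below (hypothetical names `parse_fasta` and `write_fasta`, not part of SPL) read and write multi-record FASTA text. GenBank and SBOL carry annotations and design structure that such a toy parser ignores; in practice an established library such as Biopython's `Bio.SeqIO` (which handles FASTA and GenBank) is a safer choice.

```python
from typing import List, Optional, Tuple


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs, preserving record order."""
    records: List[Tuple[str, str]] = []
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between records
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def write_fasta(records: List[Tuple[str, str]], width: int = 60) -> str:
    """Serialize (header, sequence) pairs as FASTA, wrapping lines at `width` characters."""
    lines: List[str] = []
    for header, seq in records:
        lines.append(f">{header}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"
```

Keeping records as an ordered list of pairs, rather than a dictionary keyed by header, preserves both record order and duplicate headers.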