(619b) Automated Design of Non-Repetitive Genetic Parts Using Non-Repetitive Parts Calculator and Its Application in Characterizing 4,350 Highly Non-Repetitive E. coli Promoters | AIChE

Authors 

Hossain, A. - Presenter, Penn State University
Reis, A., Penn State University
Halper, S., Penn State University
Cetnar, D., Penn State University
Salis, H., Pennsylvania State University
Synthetic biologists face new challenges as we endeavor to engineer ever larger genetic systems (circuits, pathways, gene clusters and genomes). Characterized genetic part toolboxes are often too small for large projects, or their part variants share similar DNA sequences, forcing us to reuse the same or similar genetic parts in several locations. The resulting repetitive DNA sequences remain difficult to build via commercial DNA synthesis and can trigger deletion of the genetic system in hosts with homologous recombination activity. New approaches are therefore needed to build large genetic systems without repetitive sequences: the synthetic biology “repeat challenge”.

To solve this challenge, we developed and experimentally validated the Non-Repetitive Parts (NRP) Calculator, an optimization algorithm that designs large, maximally non-repetitive toolboxes of genetic parts according to a user-specified set of design constraints. The design constraints may be a degenerate nucleotide or amino acid sequence, a prescribed RNA structure, a set of context-dependent background sequences and/or a quantitative model of part function. The algorithm then uses advanced path-finding in k-mer sequence space to generate genetic parts that satisfy the design constraints while sharing no repeats longer than a threshold length L, which is dictated by the system construction method and the host specifications. Through extensive benchmarking, we show that the NRP Calculator can generate over 100,000 non-repetitive (L = 14) genetic parts, each 100 bp long, in less than 7 minutes.
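The abstract does not spell out the path-finding algorithm, so the following is only a rough illustration of the idea. This Python sketch builds each part one nucleotide at a time with a depth-first, backtracking search over a degenerate (IUPAC) sequence constraint. It rejects any window that would create a repeat longer than L, either inside the part or against parts already in the toolbox. It assumes that "no repeats longer than L" means no (L + 1)-mer may occur twice. It checks only the forward strand and handles only the degenerate-sequence constraint, not RNA structure, background sequences or functional models. The names `design_part`, `design_toolbox`, `kmers` and `is_non_repetitive` are hypothetical, not the NRP Calculator's API, and this greedy search is not the authors' implementation.

```python
import random

# Standard IUPAC nucleotide codes used to express a degenerate sequence constraint.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def kmers(seq, k):
    """All length-k windows of seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def design_part(constraint, max_repeat, used, rng, budget=200_000):
    """Hypothetical sketch: depth-first search with backtracking, one base per step.

    Every new window of k = max_repeat + 1 bases must be absent from `used`
    (k-mers of parts already accepted) and from the part itself, so the part
    shares no repeat longer than max_repeat. Returns None if the search fails
    or exceeds `budget` search nodes.
    """
    k = max_repeat + 1
    seq, local = [], set()
    nodes = [0]  # search-node counter, boxed so the nested function can update it

    def extend(i):
        if i == len(constraint):
            return True
        nodes[0] += 1
        if nodes[0] > budget:
            return False
        bases = list(IUPAC[constraint[i]])
        rng.shuffle(bases)  # random branch order diversifies the toolbox
        for base in bases:
            seq.append(base)
            window = "".join(seq[-k:]) if len(seq) >= k else None
            if window is None or (window not in used and window not in local):
                if window is not None:
                    local.add(window)
                if extend(i + 1):
                    return True
                if window is not None:
                    local.discard(window)  # undo before trying the next base
            seq.pop()
        return False

    return "".join(seq) if extend(0) else None


def design_toolbox(constraint, max_repeat, n_parts, seed=0):
    """Greedily add parts until n_parts are designed or the search fails."""
    rng = random.Random(seed)
    used, parts = set(), []
    for _ in range(n_parts):
        part = design_part(constraint, max_repeat, used, rng)
        if part is None:
            break
        parts.append(part)
        used.update(kmers(part, max_repeat + 1))
    return parts


def is_non_repetitive(parts, max_repeat):
    """True if no (max_repeat + 1)-mer occurs twice anywhere in the toolbox."""
    seen = set()
    for part in parts:
        for w in kmers(part, max_repeat + 1):
            if w in seen:
                return False
            seen.add(w)
    return True


parts = design_toolbox("N" * 30, max_repeat=5, n_parts=10)
print(len(parts), is_non_repetitive(parts, 5))
```

Because earlier parts claim k-mers that later parts must avoid, a greedy design of this kind can stall as the toolbox fills up. The `budget` argument only caps the search so that failure returns None instead of hanging. Note that with `max_repeat=9` (L = 10), a fixed 6-nt motif can appear in every part, because it is shorter than the allowed repeat length.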

As a first experimental example, we applied the NRP Calculator to build extremely large toolboxes of non-repetitive σ70 E. coli promoter sequences (L = 10), offering tunable control over the expression levels of many genes in large genetic systems. The first toolbox contains 800 non-repetitive strong promoters, each with consensus -35 and -10 hexamers. The second toolbox contains 3,500 non-repetitive promoters with varied strengths, whose -35 and -10 hexamers deviate from the consensus sequences by 0 to 6 mismatches. The third toolbox contains 50 weak non-repetitive promoters with 12 mismatches in the hexamers. We measured the transcription rates of these 4,350 new non-repetitive promoters using chip-based oligo pool synthesis and a massively parallel reporter assay in E. coli NEB 5α, followed by DNA-Seq and RNA-Seq measurements. The entire library spanned a 2,591,738-fold dynamic range of transcription rates, and 2,858 promoters in the library had transcription rates higher than that of the standard J23100 promoter. Overall, the Non-Repetitive Parts Calculator promises to design and deliver a cornucopia of well-characterized genetic parts that can all be combined simultaneously to build extremely large genetic systems that will power the future of synthetic biology.
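The three toolboxes differ mainly in how far the -35 and -10 hexamers stray from the σ70 consensus. This Python sketch shows that bookkeeping. It assumes the canonical consensus hexamers TTGACA (-35) and TATAAT (-10), a 17-nt spacer and 10-nt fully degenerate flanks. The spacer and flank lengths are illustrative assumptions, not values from the abstract. `hexamer_mismatches` counts total mismatches: 0 for the first toolbox, 0 to 6 for the second and 12 for the third. `promoter_constraint` is a hypothetical helper that writes the matching degenerate constraint. At each position that must mismatch, it uses the IUPAC "not X" code (B, D, H or V).

```python
CONSENSUS_35 = "TTGACA"  # sigma70 consensus -35 hexamer
CONSENSUS_10 = "TATAAT"  # sigma70 consensus -10 hexamer
# IUPAC codes meaning "any base except X": B = not A, D = not C, H = not G, V = not T
NOT_BASE = {"A": "B", "C": "D", "G": "H", "T": "V"}


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def hexamer_mismatches(minus35, minus10):
    """Total mismatches of both hexamers against the consensus (0 to 12)."""
    return hamming(minus35, CONSENSUS_35) + hamming(minus10, CONSENSUS_10)


def promoter_constraint(mismatch35=(), mismatch10=(), spacer=17, flank=10):
    """Hypothetical helper: degenerate promoter constraint in which the listed
    hexamer positions (0-5) are forced to differ from the consensus base.
    Spacer and flank lengths are illustrative assumptions."""
    def hexamer(consensus, positions):
        return "".join(NOT_BASE[b] if i in positions else b
                       for i, b in enumerate(consensus))

    return ("N" * flank + hexamer(CONSENSUS_35, set(mismatch35))
            + "N" * spacer + hexamer(CONSENSUS_10, set(mismatch10))
            + "N" * flank)


strong = promoter_constraint()                # consensus hexamers (first toolbox)
weak = promoter_constraint(range(6), range(6))  # all 12 positions mismatched
print(weak[10:16], weak[33:39])  # VVHBDB VBVBBV
```

Any sequence that satisfies the `weak` constraint necessarily has 12 hexamer mismatches. Such constraints could then be passed to a non-repetitive design search, subject to the repeat length L.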