Pooled Assembly, Genotyping and Scoring of Diverse Genomic Libraries

Authors: 
Arkin, A., University of California, Berkeley

Precision engineering of complex behaviors in living systems is complicated by our
limited understanding of multiple underlying contexts of gene expression and our limited
ability to manipulate the genetic code in individual cells at scale. Iterative design, build
and test cycles using indexed strains thus often sample only a small region of the
explorable parameter space for a given gene network. Further, the relative ease of
optimizing gene networks on multi-copy plasmids for bacterial expression often is not
predictive of system performance when transferred to the chromosome for deployment
in complex environments ranging from the bioreactor to the mammalian gut.
We have developed a scheme for the pooled assembly, genotyping and single-variant
fitness scoring of barcoded genomic libraries coupled with CRISPRi-mediated retrieval
of individual genotypes from the diverse pool. To validate the workflow, we generated a
genomic library of over one million barcoded variants of the violacein biosynthetic
pathway vioABEDC in E. coli, sampling from a genotype space of over 260,000
combinations.
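As a rough illustration of why a library of this size can sample the genotype space densely, the sketch below estimates the expected fraction of genotypes represented at least once. It assumes every barcoded variant draws its genotype uniformly and independently, which real assembly and integration biases would violate, and it uses the round figures quoted above rather than exact counts. The function name is hypothetical.

```python
def expected_coverage(n_variants: int, n_genotypes: int) -> float:
    """Expected fraction of genotypes seen at least once.

    Assumption (not from the study): each barcoded variant is assigned
    one of n_genotypes uniformly and independently.
    """
    # Probability that a given genotype is missed by every variant.
    p_missed = (1.0 - 1.0 / n_genotypes) ** n_variants
    return 1.0 - p_missed


# Round figures from the text: "over one million" barcoded variants,
# "over 260,000" possible combinations.
coverage = expected_coverage(1_000_000, 260_000)
mean_barcodes_per_genotype = 1_000_000 / 260_000

print(f"expected coverage under uniform sampling: {coverage:.3f}")
print(f"mean barcodes per genotype: {mean_barcodes_per_genotype:.1f}")
```

Under these idealized assumptions most genotypes would carry several independent barcodes, which is what allows replicate measurements of the same genotype within one pool.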
We employed Lambda Red recombination to serially integrate a diversified DNA
fragment - here, a gene with a degenerate ribosome binding site - fused to a selection-fluorescence
marker with genome homology. Successive “inchworm” integration stages
replaced the selection-fluorescence marker from the previous stage, cycling among
combinations of two fluorescence markers and three antibiotic resistance markers. We
used fluorescence activated cell sorting of pooled integrants to enrich for the expected
fluorescence phenotype, screening out spontaneous resistance mutants and off-target
integration events that manifest as dual-fluorescence phenotypes.
We genotyped the genomic library by associating individual barcodes with ribosome
binding site variants of each vio gene by deep sequencing amplicon fusion libraries
generated by emulsion PCR. Using time-series measurements of barcode abundance
we quantified differential fitness scores for each genotype under pathway induction
conditions and in resource competition with violacein-sensitive Bacillus subtilis to
identify variants that optimally balance the benefit of antimicrobial production against the
cost of gene expression.
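One common way to turn time-series barcode counts into differential fitness scores is to fit the slope of log relative frequency against time for each barcode and express it relative to the population median. The sketch below shows that calculation; it is not necessarily the scoring method used in this work. The function names, the pseudocount, and the toy counts are all assumptions made for illustration, and barcodes would still need to be mapped to genotypes before scores are aggregated.

```python
import math
from statistics import median


def log_frequency_slopes(counts, times, pseudocount=0.5):
    """Least-squares slope of ln(relative frequency) vs time per barcode.

    counts: dict mapping barcode -> list of read counts, one per time point.
    times:  list of sampling times, same length as each count list.
    """
    n_t = len(times)
    # Total (pseudocounted) reads at each time point, used to normalize.
    totals = [sum(c[i] + pseudocount for c in counts.values()) for i in range(n_t)]
    t_mean = sum(times) / n_t
    t_var = sum((t - t_mean) ** 2 for t in times)
    slopes = {}
    for barcode, c in counts.items():
        y = [math.log((c[i] + pseudocount) / totals[i]) for i in range(n_t)]
        y_mean = sum(y) / n_t
        cov = sum((times[i] - t_mean) * (y[i] - y_mean) for i in range(n_t))
        slopes[barcode] = cov / t_var
    return slopes


def relative_fitness(counts, times, pseudocount=0.5):
    """Slope of each barcode minus the median slope across the pool."""
    slopes = log_frequency_slopes(counts, times, pseudocount)
    reference = median(slopes.values())
    return {barcode: s - reference for barcode, s in slopes.items()}


# Toy counts invented for illustration; not data from the study.
toy_counts = {
    "BC_A": [100, 200, 400],  # expanding in the pool
    "BC_B": [100, 100, 100],  # constant read count
    "BC_C": [100, 50, 25],    # being depleted
}
toy_times = [0.0, 1.0, 2.0]
scores = relative_fitness(toy_counts, toy_times)
for barcode, score in scores.items():
    print(barcode, round(score, 3))
```

Comparing such scores between conditions, for example induction alone versus competition with Bacillus subtilis, is one way to separate expression cost from antimicrobial benefit.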
To isolate individual genotypes from the pooled library, the final integration stage
included a degenerate (N)20 barcode embedded in the untranslated region of a gfp
gene. We validated the extraction of single genotypes in an indexed library of eight
clones with diverse GFP expression levels transformed with CRISPRi plasmids that co-express
catalytically dead Cas9 with a guide RNA that targets each barcode. We also
demonstrated the retrieval of low-abundance genotypes from the pooled genomic library
by transforming the library with CRISPRi plasmids encoding guides that target rare
barcodes.
We believe our approach to constructing pooled genomic libraries by comprehensively
sampling the parameter space of a gene network will expand understanding of
sequence-to-function relationships and enable engineered phenotypes that are currently
unreachable. We anticipate this genomic assembly, screening and isolation platform will
advance synthetic biology efforts to optimize cryptic biosynthetic gene clusters for
natural product discovery and other engineered cellular behaviors.