Rapid Identification and Visualization of CRISPR Loci Via Automated High-Throughput Data Processing Pipeline | AIChE


Authors 

Nethery, M. - Presenter, North Carolina State University
Barrangou, R., North Carolina State University
Clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR-associated (cas) genes constitute an adaptive immune system widespread in bacteria and archaea. Between consecutive CRISPR repeats lies a short segment of DNA, called a spacer, acquired from an invading mobile genetic element. Visualizing the iterative acquisition of spacers, which records a unique evolutionary track at each locus, has proven useful for genotyping, especially for comparative analysis of closely related organisms and even clonal lineages. Current spacer visualization methods are tedious and typically require manual data manipulation and curation, including extraction of the spacers at each CRISPR locus of a genome of interest. Once spacers have been isolated, information about their length and content must be laboriously distilled and summarized into a format suited for comparative analysis. Here, we present a high-throughput processing pipeline and web-based visualization tool that facilitate spacer extraction, graphical comparison, alignment, and clustering. The analysis pipeline automates spacer extraction from any number of genomes simultaneously, then feeds the resulting spacer files into a visualization engine that compares spacer length and content. Additional manipulation, including manual or automated multiple alignment, can be performed from the graphical user interface. This efficient, high-throughput solution supports rapid analysis of large data sets and will enable and expedite large-scale genotyping efforts based on CRISPR loci.
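To make the extraction and comparison steps concrete, here is a minimal sketch in Python. It is not the authors' pipeline. It assumes that the consensus repeat of a locus is already known, that repeat copies match it exactly, and that a gap longer than a cutoff marks a break between arrays rather than a spacer. Real CRISPR finders tolerate degenerate repeats, which this sketch does not. The names `Locus`, `extract_spacers`, `shared_spacers`, `presence_matrix`, and the `max_len` cutoff of 80 nt are hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Locus:
    """Hypothetical container: one genome's spacers, in array order."""
    genome_id: str
    spacers: list[str]

    def lengths(self) -> list[int]:
        # Spacer lengths, one of the two properties compared in the abstract.
        return [len(s) for s in self.spacers]


def extract_spacers(sequence: str, repeat: str, max_len: int = 80) -> list[str]:
    """Return the segments between consecutive exact copies of `repeat`.

    Assumption: gaps that are empty or longer than `max_len` (a
    hypothetical cutoff) are skipped instead of being reported as spacers.
    """
    if not repeat:
        raise ValueError("repeat must be non-empty")
    seq, rep = sequence.upper(), repeat.upper()
    # Collect start positions of non-overlapping exact repeat copies.
    positions = []
    start = seq.find(rep)
    while start != -1:
        positions.append(start)
        start = seq.find(rep, start + len(rep))
    spacers = []
    for left, right in zip(positions, positions[1:]):
        gap = seq[left + len(rep):right]
        if 0 < len(gap) <= max_len:
            spacers.append(gap)
    return spacers


def shared_spacers(a: Locus, b: Locus) -> list[str]:
    """Spacers present in both loci, in the order they occur in `a`."""
    in_b = set(b.spacers)
    return [s for s in a.spacers if s in in_b]


def presence_matrix(loci: list[Locus]) -> tuple[dict[str, int], dict[str, list[int]]]:
    """Assign each unique spacer an ID (first-seen order) and build a
    per-genome presence/absence row over those IDs."""
    ids: dict[str, int] = {}
    for locus in loci:
        for s in locus.spacers:
            ids.setdefault(s, len(ids))
    matrix = {}
    for locus in loci:
        present = set(locus.spacers)
        matrix[locus.genome_id] = [1 if s in present else 0 for s in ids]
    return ids, matrix


if __name__ == "__main__":
    repeat = "GTTTTAGAG"  # made-up repeat, for illustration only
    genome = "AAAA" + repeat + "ACGTACGTAC" + repeat + "TTGGCCAATT" + repeat + "CCCC"
    a = Locus("A", extract_spacers(genome, repeat))
    b = Locus("B", ["GGGGAAAACC", "TTGGCCAATT"])
    print(a.spacers, a.lengths())
    print(shared_spacers(a, b))
    print(presence_matrix([a, b]))
```

Keying spacers by their exact sequence keeps the comparison simple. A presence/absence row per genome is one plausible input for the graphical comparison and clustering the abstract describes. Near-identical spacers would need an alignment or similarity step that this sketch omits.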