(710e) Reproducible Computational Workflows with Signac

Authors: 
Adorf, C. S., University of Michigan
Dodd, P. M., University of Michigan
Ramasubramani, V., University of Michigan
Dice, B., University of Michigan
Glotzer, S. C., University of Michigan
Researchers in computational science regularly face the challenge of managing and analyzing large, heterogeneous, and highly dynamic data spaces. We present signac, a non-intrusive, open-source Python framework that enables researchers to efficiently operate on primarily file-based data spaces while keeping track of all relevant metadata. The signac framework provides all components required to create a well-defined, collectively accessible data space and to implement reproducible workflows. The software is highly modular: its data management and workflow management components are decoupled to minimize the effort required to integrate them into existing workflows. Because data management is serverless and the workflow model is lightweight, workflows execute just as easily on laptops as in high-performance computing environments. The signac approach not only increases research efficiency but also improves reproducibility and lowers barriers to data sharing, because it transparently enables robust tracking, selection, and searching of data by its metadata.
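The idea of tracking, selecting, and searching file-based data by its metadata can be illustrated with a small standard-library sketch. This is not signac's API. The functions `state_id`, `init_job_dir`, and `select_jobs`, the `workspace/` layout, and the `statepoint.json` file name are hypothetical and chosen only for illustration. The sketch hashes a dictionary of parameters into a stable directory name, stores the metadata alongside the data, and then selects directories with a metadata query.

```python
from __future__ import annotations

import hashlib
import json
import tempfile
from pathlib import Path

# Hypothetical illustration only: these helpers are NOT part of signac's API.


def state_id(statepoint: dict) -> str:
    """Derive a stable directory name from a parameter dictionary."""
    # Sorting keys makes the ID independent of dict insertion order.
    blob = json.dumps(statepoint, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def init_job_dir(root: Path, statepoint: dict) -> Path:
    """Create (or reopen) the data directory for one parameter combination."""
    job_dir = root / "workspace" / state_id(statepoint)
    job_dir.mkdir(parents=True, exist_ok=True)
    # Store the metadata next to the data so each directory is self-describing.
    (job_dir / "statepoint.json").write_text(json.dumps(statepoint, sort_keys=True))
    return job_dir


def select_jobs(root: Path, query: dict) -> list[Path]:
    """Return data directories whose metadata matches every key in `query`."""
    matches = []
    for sp_file in sorted((root / "workspace").glob("*/statepoint.json")):
        sp = json.loads(sp_file.read_text())
        if all(sp.get(key) == value for key, value in query.items()):
            matches.append(sp_file.parent)
    return matches


# Usage: build a small parameter sweep, then search it by metadata.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for T in (1.0, 2.0):
        for N in (100, 200):
            job_dir = init_job_dir(root, {"T": T, "N": N})
            (job_dir / "result.txt").write_text(f"placeholder for T={T}, N={N}")
    hot = select_jobs(root, {"T": 2.0})
    print(len(hot))  # 2
```

Keying each directory by a hash of its metadata means the same parameters always map to the same location, so reruns do not create duplicates. Because the metadata is stored with the files, no database server is needed to search the data space.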