(729g) Recent Developments in the Signac Data Management Framework

Authors: 
Dice, B., University of Michigan
Adorf, C. S., University of Michigan
Ramasubramani, V., University of Michigan
Glotzer, S. C., University of Michigan
Computational resources for high-throughput data generation offer great potential for accelerating scientific discovery, especially when paired with well-managed computational workflows. The open-source signac data management framework enables researchers to maintain well-formed, reusable data spaces from early exploration through production runs on supercomputers. It achieves this through a transparent data and workflow model and a simple, unobtrusive programmatic interface. The framework is application-agnostic and has been applied to molecular simulations, quantum chemistry, photonics, computational fluid dynamics, machine learning, graph mining, and even the organization of experimental data. The framework has recently been extended significantly. New data management features include HDF5 integration for rapid access to large numerical data arrays; tools to import and export data spaces for long-term archival and publication of data sets; and tighter integration with the scientific Python ecosystem, supporting easy export to pandas data frames and visualization in Jupyter notebooks. Workflow automation has also been expanded to support more supercomputers and more complex operations. We show examples of recent scientific applications that demonstrate the efficacy and versatility of signac across a range of research domains.
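
To make the idea of a parameter-keyed data space concrete, the following is a minimal sketch in plain Python. It is not signac's API or implementation: the helper names (`job_id`, `init_job`, `find_jobs`), the `workspace` directory layout, the `state_point.json` file name, and the MD5-of-JSON hashing scheme are all assumptions chosen for illustration. It only shows how each parameter set can map to its own self-describing directory that can later be searched by parameter value.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def job_id(state_point):
    """Derive a stable identifier from a parameter dictionary (hypothetical helper)."""
    # Sorting keys makes the identifier independent of dictionary ordering.
    blob = json.dumps(state_point, sort_keys=True).encode("utf-8")
    return hashlib.md5(blob).hexdigest()


def init_job(root, state_point):
    """Create (or reuse) the directory that holds data for one parameter set."""
    job_dir = Path(root) / "workspace" / job_id(state_point)
    job_dir.mkdir(parents=True, exist_ok=True)
    # Store the parameters next to the data so the directory is self-describing.
    (job_dir / "state_point.json").write_text(json.dumps(state_point, sort_keys=True))
    return job_dir


def find_jobs(root, **criteria):
    """Return the state points whose parameters match all given key-value pairs."""
    matches = []
    for path in sorted((Path(root) / "workspace").glob("*/state_point.json")):
        state_point = json.loads(path.read_text())
        if all(state_point.get(key) == value for key, value in criteria.items()):
            matches.append(state_point)
    return matches


# Build a small data space over three pressures at a fixed temperature.
root = tempfile.mkdtemp()
for pressure in (1.0, 2.0, 3.0):
    init_job(root, {"pressure": pressure, "temperature": 300})

# Select the single parameter set with pressure 2.0.
selected = find_jobs(root, pressure=2.0)
```

Because the directory name is derived from the parameters alone, initializing the same parameter set twice reuses the same directory, which is what keeps such a data space well-formed as it grows from exploration to production runs.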