(595c) Integrated Upstream and Downstream Data Curation Tools As a Key to Enabling Reproducibility, Usability and Data Sharing

Phelan, F. R. Jr., National Institute of Standards and Technology
Rosch, T., National Institute of Standards and Technology
Jeong, C., National Institute of Standards and Technology
Moroz, B., National Institute of Standards and Technology
In this presentation, we describe the development of a computational "workbench" whose goal is to provide an integrated computational and data environment to support multiscale modeling of soft materials for the Materials Genome Initiative (MGI). The design has three essential elements. The first is a modular program structure that supports adding new functionality through Python scripting and run-time plugins. The second is a hierarchical data structure that gives a unified representation of materials at different levels of granularity. The third is integration of the NIST Materials Data Curation System (MDCS) [1-2] into the environment to support ontology-based materials descriptions. The element of the design we emphasize in this presentation is the database. The XML-schema-based database environment lets us visualize the inter-relationships between data elements and enables automated curation of both upstream and downstream data in the workflow. We show that controlling the data in this manner is essential for ensuring reproducibility, greatly enhances usability, and allows users to progressively build materials reference libraries that can be pushed or shared by various means. We illustrate this with examples that include tools being developed for coarse-grained force-field development and for property calculation.
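The three design elements above (run-time plugins, a hierarchical material description, and XML-based curation of workflow inputs and outputs) can be illustrated with a minimal Python sketch. It is not the workbench's or MDCS's actual API. Every name in it is hypothetical: the `plugin` decorator, the `MaterialNode` class, the `system`/`chain`/`bead` levels, the `total_mass` plugin, and the `record`/`input`/`output` XML layout. It uses only the standard library. Schema validation, which an XML-schema-based system such as MDCS would perform, is omitted. The point is that a single curated record holds both the upstream input (the material hierarchy) and the downstream result (the plugin output), so the calculation can be traced and reproduced.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical plugin registry: functions register themselves by name at run time.
PLUGINS: Dict[str, Callable[["MaterialNode"], float]] = {}


def plugin(name: str):
    """Decorator that registers a property-calculation function under `name`."""
    def register(func):
        PLUGINS[name] = func
        return func
    return register


@dataclass
class MaterialNode:
    """One level of a hypothetical hierarchical material description."""
    level: str                      # e.g. "system", "chain", "bead" (illustrative)
    name: str
    attributes: Dict[str, float] = field(default_factory=dict)
    children: List["MaterialNode"] = field(default_factory=list)

    def walk(self):
        """Yield this node and all descendants, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()

    def to_xml(self) -> ET.Element:
        """Serialize the subtree to XML; each level becomes an element tag."""
        elem = ET.Element(self.level, name=self.name)
        for key, value in sorted(self.attributes.items()):
            ET.SubElement(elem, "attribute", key=key).text = repr(value)
        for child in self.children:
            elem.append(child.to_xml())
        return elem


@plugin("total_mass")
def total_mass(node: MaterialNode) -> float:
    """Example plugin: sum the 'mass' attribute over the whole hierarchy."""
    return sum(n.attributes.get("mass", 0.0) for n in node.walk())


def run_and_curate(node: MaterialNode, plugin_name: str) -> ET.Element:
    """Run a registered plugin and store its input (upstream) and result
    (downstream) together in one XML record."""
    result = PLUGINS[plugin_name](node)
    record = ET.Element("record")
    ET.SubElement(record, "input").append(node.to_xml())
    output = ET.SubElement(record, "output", plugin=plugin_name)
    output.text = repr(result)
    return record


# Usage: a two-bead coarse-grained chain (united-atom CH2 masses, g/mol).
chain = MaterialNode("chain", "PE-dimer", children=[
    MaterialNode("bead", "CH2-a", {"mass": 14.027}),
    MaterialNode("bead", "CH2-b", {"mass": 14.027}),
])
system = MaterialNode("system", "melt", children=[chain])
record = run_and_curate(system, "total_mass")
print(ET.tostring(record, encoding="unicode"))
```

Keeping the input hierarchy and the computed output in the same record is the design choice that matters here. Rerunning the named plugin on the stored input should reproduce the stored output. New plugins only need to register themselves, so the core code does not change.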


  1. Materials Data and Informatics, http://www.nist.gov/itl/ssd/is/materials-data-and-informatics.cfm
  2. Materials Data Curation System, https://github.com/usnistgov/MDCS