(595b) Expanding Molecular Simulation Use By Data/Code Sharing in Scientific Publishing

Authors: 
Kitchin, J. R., Carnegie Mellon University
Molecular simulations hold a unique place among research methods: in the vast majority of cases, the results can be reproduced exactly if the inputs and codes are known. Despite this, it is rare to find fully reproducible reports of molecular simulations in the literature. This makes it difficult for researchers, especially newcomers to the field, to learn from, reproduce, and build on published results. Data sharing and reproducibility of research are increasingly important issues. Funding agencies are mandating data sharing in calls for proposals, journals increasingly require it as a condition of publication, and scientists are showing growing interest in open access to data. A requirement, or even a desire, to share is not sufficient, however, if sharing is difficult or tedious. We believe that new authoring tools are needed that integrate data and analysis into the research and publishing processes. These tools will reduce the difficulty of sharing and reusing data.

We have developed a new approach to writing scientific documents that enables the direct inclusion of human-readable and machine-addressable data and code in the narrative text. In this talk we will illustrate the approach with examples from two papers [1,2] that we recently published using it. We will show how we use the approach to document molecular simulation results, and how other researchers can see what we have done and reuse the code and data from the published manuscripts for new purposes. We show that the combination of an extensible editor (Emacs) with a lightweight markup language (org-mode) provides a remarkable solution to data sharing and research reproducibility issues. This combination enables the documentation of experimental setup, data generation, and analysis in a single document, and the subsequent export of a scientific manuscript that is suitable for submission to most journals. When coupled with external data repositories, the approach enables sharing of large or complex data sets that cannot easily be captured in a manuscript.
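
As a rough illustration of what "machine-addressable" data in a narrative document can mean, the following Python sketch pulls a named table out of org-mode text. It is not the tooling used in [1,2]; the function name read_named_table, the table name eos-data, and the numbers in the sample are all hypothetical placeholders. It assumes only the standard org-mode convention that a table can be labeled with a "#+NAME:" line placed directly above it.

```python
# Minimal sketch: look up a table by name inside org-mode text.
# The sample document and its values are placeholders, not published data.

SAMPLE = """\
* Equation of state

The energies below are placeholder values for illustration.

#+NAME: eos-data
| volume | energy |
|--------+--------|
|   10.0 |   -1.0 |
|   11.0 |   -1.5 |
|   12.0 |   -1.2 |

Further narrative text continues here.
"""


def read_named_table(org_text, name):
    """Return the rows of the org table labeled '#+NAME: <name>'.

    Each row is a list of cell strings. Horizontal rule lines are skipped.
    An empty list is returned if no table with that name is found.
    """
    rows = []
    found_label = False
    in_table = False
    for line in org_text.splitlines():
        stripped = line.strip()
        if not found_label:
            # Org keywords are case-insensitive, so compare in lower case.
            if stripped.lower() == f"#+name: {name}".lower():
                found_label = True
            continue
        if stripped.startswith("|"):
            in_table = True
            if stripped.startswith("|-"):
                continue  # horizontal rule between header and body
            inner = stripped[1:-1] if stripped.endswith("|") else stripped[1:]
            rows.append([cell.strip() for cell in inner.split("|")])
        elif in_table:
            break  # first non-table line ends the table
        elif stripped and not stripped.startswith("#+"):
            break  # label was not followed by a table
    return rows


if __name__ == "__main__":
    header, *body = read_named_table(SAMPLE, "eos-data")
    data = [dict(zip(header, map(float, row))) for row in body]
    lowest = min(data, key=lambda point: point["energy"])
    print("lowest-energy volume:", lowest["volume"])
```

Because the table lives in the same file as the prose, a reader with only the document can recover the numbers by name and reuse them, which is the kind of reuse the approach aims to make routine.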

1. Kitchin, John R., Data Sharing in Surface Science, Surface Science, in press (2015). doi:10.1016/j.susc.2015.05.007.
2. Kitchin, John R., Examples of Effective Data Sharing in Scientific Publishing, ACS Catalysis, 5(6), pp. 3894-3899 (2015). doi:10.1021/acscatal.5b00538.