(420j) MISPR: A High-Throughput Multi-Scale Infrastructure for Automating Molecular Simulations | AIChE


Rajput, N. N. - Presenter, Stony Brook University
Atwi, R., Tufts University
Bliss, M., Stony Brook University
In this talk, I will present MISPR (Materials Informatics for Structure-Property Relationships), a robust, open-source, high-throughput, multi-scale computational infrastructure developed by our lab that seamlessly integrates classical molecular dynamics (MD) simulations with density functional theory (DFT) [1]. By enabling high-performance data analytics and coupling between different methods and scales, MISPR addresses critical challenges in automated workflow management and data provenance recording. Its major features include automated DFT and MD simulations, error handling, derivation of molecular and ensemble properties, and creation of output databases that organize results from individual calculations to enable reproducibility and transparency. I will describe the fully automated DFT and MD workflows implemented in MISPR to compute various electronic properties, such as nuclear magnetic resonance chemical shift, binding energy, bond dissociation energy, and redox potential, with support for multiple methods such as electron transfer and proton-coupled electron transfer reactions [2]. The infrastructure also enables the characterization of large-scale ensemble properties by providing MD workflows that calculate a wide range of structural and dynamical properties in liquid solutions and at solid-liquid interfaces.

At the backend, the infrastructure interfaces with the Gaussian software [3] for electronic structure calculations of chemical systems and with the open-source LAMMPS code [4] for MD simulations. In addition to generating input/data files and parsing output files for these programs, the infrastructure manages the collected data and stores it in MongoDB [5], a NoSQL database program that uses JSON-like documents with a flexible schema.
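To make the idea of JSON-like property documents concrete, the following is a minimal sketch in plain Python. It is not MISPR's actual schema or API: the field names, the `make_property_doc` and `find` helpers, and the numeric values are all hypothetical placeholders chosen for illustration. A real deployment would store such documents in a MongoDB collection and query them through a driver rather than filtering a list in memory.

```python
import json

# Hypothetical document layout: field names are illustrative assumptions,
# not the schema used by MISPR.
def make_property_doc(smiles, formula, prop, value, units, functional, basis_set):
    """Bundle one derived property with molecular metadata and provenance."""
    return {
        "smiles": smiles,
        "formula": formula,
        "property": prop,
        "value": value,
        "units": units,
        "provenance": {"functional": functional, "basis_set": basis_set},
    }

def find(docs, **criteria):
    """Return documents whose top-level fields equal every given criterion."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]

# Placeholder numbers only: these are NOT computed or published results.
docs = [
    make_property_doc("C1COCO1", "C3H6O2", "binding_energy", -0.50, "eV",
                      "B3LYP", "6-31+G*"),
    make_property_doc("COCCOC", "C4H10O2", "binding_energy", -0.75, "eV",
                      "B3LYP", "6-31+G*"),
]

# Documents are JSON-serializable, so they round-trip without loss.
serialized = json.dumps(docs)
restored = json.loads(serialized)

# Query by metadata, in the spirit of a database lookup.
matches = find(restored, formula="C3H6O2")
print(matches[0]["smiles"], matches[0]["value"], matches[0]["units"])
```

Keeping the calculation settings next to the value, as in the `provenance` field here, is one simple way to record how each number was produced.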
Each derived property is saved in its own collection along with auxiliary information such as molecular metadata (SMILES representation, chemical formula, ...), which makes it easy to query and data-mine structure-property relationships. Users can tune the calculations by overriding default workflow parameters (for example, the functional and basis set), bypassing selected steps, or packing many jobs across multiple nodes on supercomputing resources.

LAMMPS workflows support MD simulations in different ensembles and analysis of the dumped trajectories for various dynamical and structural properties using MDPropTools, another open-source codebase developed by our lab [6]. MDPropTools is a standalone suite of Python-based post-processing routines for statistical analysis of thermodynamic, structural, and dynamical properties from MD simulations of liquids and solid-liquid interfaces [6]. Using MISPR and MDPropTools, we published ComBat, the first publicly available database of ~2,000 computed quantum chemistry (QC) and MD properties for reported lithium-sulfur battery (LSB) electrolytes composed of solvents spanning 16 different chemical classes [7, 8]. The data generated from high-throughput physical models are then used to train machine learning (ML) models that allow exploration of a larger chemical and parameter space at significantly higher speed to establish structure-property relationships in multicomponent solutions. MISPR employs the methodologies of materials informatics to facilitate the understanding and prediction of phenomenological structure-property relationships, which are crucial to designing novel, optimal materials for numerous scientific applications and engineering technologies.
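As an illustration of how molecular properties such as binding energy and redox potential can be derived from computed energies, the sketch below applies the standard textbook relations E_bind = E_complex - sum(E_fragments) and E = -ΔG/(nF) - E_ref. It is not MISPR's implementation: the function names, argument names, and input numbers are hypothetical, and the reference shift is left as a parameter because its value depends on the chosen reference electrode.

```python
def binding_energy(e_complex, e_fragments):
    """Binding energy: energy of the complex minus the sum of fragment energies.

    All energies must share the same units (eV in this sketch).
    A negative result indicates a bound complex.
    """
    return e_complex - sum(e_fragments)

def redox_potential(delta_g_ev, n_electrons=1, ref_abs_potential_v=0.0):
    """Reduction potential in volts from a reduction free energy.

    E = -dG / (n F) - E_ref. With dG given in eV per molecule, dividing
    by the number of electrons transferred gives volts directly.
    ref_abs_potential_v is the absolute potential of the reference
    electrode; its value is an assumption supplied by the caller.
    """
    if n_electrons <= 0:
        raise ValueError("n_electrons must be a positive integer")
    return -delta_g_ev / n_electrons - ref_abs_potential_v

# Placeholder inputs for illustration only, not computed results.
e_bind = binding_energy(-10.5, [-6.0, -4.0])
e_red = redox_potential(-2.0, n_electrons=1, ref_abs_potential_v=1.4)
print(f"binding energy: {e_bind:.2f} eV, reduction potential: {e_red:.2f} V")
```

The 1.4 V shift in the usage line is a commonly quoted approximate value for converting an absolute potential to the Li/Li+ scale; treat it as an assumption rather than a setting taken from the abstract.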