(156a) Data Science for Assembly Engineering | AIChE

Authors 

Glotzer, S. C. - Presenter, University of Michigan
Discovery and design of new materials able to self-assemble from nanoscale building blocks are increasingly enabled by large-scale molecular simulation. Aided by fast simulation codes leveraging powerful computer architectures, an unprecedented amount of data can be generated in the blink of an eye, shifting the effort and focus of the computational scientist from the simulation to the data. How do we manage so much data, and what do we do with it once we have it? In this talk, we discuss the applications of data science and data-driven thinking to molecular and materials simulation. Although we do so in the context of assembly engineering of soft matter, the tools and techniques discussed are general and applicable to a wide range of problems. We present applications of machine learning to automated structure identification of complex colloidal crystals, high-throughput mapping of phase diagrams, the study of kinetic pathways between fluid and solid phases, and the discovery of previously elusive design rules and structure-property relationships.

We also discuss new tools we’ve developed to support these studies. We present pythia, a wrapper library for use with data analysis tools that generates descriptors of particle systems for machine learning (ML). We present signac, an open-source data management framework that facilitates the management of large and heterogeneous data spaces by creating a well-defined, indexable storage layout for data and metadata. signac is lightweight and manages computational workflows on platforms ranging from laptops to supercomputers. signac supports HDF5 integration for rapid access to large numerical data arrays, has new tools to import and export data spaces for long-term archival and publication of data sets, and integrates seamlessly with the scientific Python ecosystem for use with Jupyter notebooks and more. signac is application agnostic and has been used for molecular simulations, quantum chemistry, photonics, computational fluid dynamics, machine learning, graph mining, and even organizing experimental data.
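
To make the idea of a "well-defined, indexable storage layout for data and metadata" concrete, the following is a minimal sketch using only the Python standard library. It is not signac's API or implementation; the function names (job_id, init_job, find_jobs), the directory and file names, and the parameters (pressure, shape) are hypothetical and chosen only for illustration. The sketch assumes each parameter set maps to one directory named by a hash of its parameters, with the parameters stored alongside the data as JSON so they can be searched later.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def job_id(state_point):
    """Derive a stable ID from a parameter dict (sorted keys give a canonical form)."""
    canonical = json.dumps(state_point, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:32]


def init_job(root, state_point):
    """Create (or reuse) the directory for one parameter set and record its metadata."""
    job_dir = Path(root) / "workspace" / job_id(state_point)
    job_dir.mkdir(parents=True, exist_ok=True)
    (job_dir / "state_point.json").write_text(json.dumps(state_point, sort_keys=True))
    return job_dir


def find_jobs(root, query):
    """Yield job directories whose metadata matches every key/value pair in the query."""
    for meta in sorted((Path(root) / "workspace").glob("*/state_point.json")):
        state_point = json.loads(meta.read_text())
        if all(state_point.get(key) == value for key, value in query.items()):
            yield meta.parent


def demo():
    """Build a small, hypothetical 2 x 2 parameter space and query it by shape."""
    with tempfile.TemporaryDirectory() as root:
        for pressure in (1.0, 2.0):
            for shape in ("cube", "tetrahedron"):
                job_dir = init_job(root, {"pressure": pressure, "shape": shape})
                # Placeholder for simulation output; real data would go here.
                (job_dir / "result.txt").write_text(f"{shape} at P={pressure}\n")
        return len(list(find_jobs(root, {"shape": "cube"})))


if __name__ == "__main__":
    print(demo())
```

Because the directory name depends only on the parameters, initializing the same parameter set twice lands in the same directory, and a search needs nothing but the metadata files. A production framework would add much more (schema handling, indexing for speed, workflow tracking), none of which is attempted here.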

Contributors include Carl Simon Adorf, Joshua A. Anderson, Chengyu Dai, Bradley Dice, Yina Geng, Vyas Ramasubramani, and Matthew Spellings.