(611a) Open Chemistry and Jupyter: Platform for Data Mining and Machine Learning | AIChE

Open, interactive interfaces built around data-centric workflows have been developed, reusing best-of-breed open source software to deliver an integrated platform for knowledge discovery. The Jupyter project offers an electronic notebook interface, with server-side software kernels executing Python code. The JupyterLab frontend provides a web-based interface with interactive cells where code can be edited, and panels where data can be visualized in various ways. Coupling these interfaces with a data server capable of triggering simulations, analyses, and other workflows makes it possible to seamlessly execute codes from a pre-configured environment, store data, and apply machine learning techniques. The use of software containers improves both modularity and reproducibility.

Extending the Python software kernels and web interface with chemistry-specific capabilities results in a powerful tool for working with chemical data. The open source platform will be described, along with links to a number of codes from the computational chemistry community, the machine learning community, and the emerging community at the intersection of the two. The project is being developed in the open, with robust interfaces that facilitate the addition of new techniques. The core of the platform is a chemically aware data server, coupled with job submission capabilities and data analytics.
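As a rough illustration of how a data server, job submission, and analytics might fit together, the following minimal Python sketch keeps molecule records in memory, queues calculations against them, and stores the results for later queries. Every name here (Molecule, DataServer, submit, run_pending, query, count_heavy_atoms) is hypothetical and chosen for illustration only; the abstract does not describe the platform's actual schema or interfaces, and a real deployment would use a persistent database and a batch scheduler rather than an in-process queue.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Molecule:
    """Hypothetical record; not the platform's real schema."""
    name: str
    smiles: str
    properties: Dict[str, float] = field(default_factory=dict)


class DataServer:
    """Toy in-memory stand-in for a chemically aware data server (assumption)."""

    def __init__(self) -> None:
        self._molecules: Dict[str, Molecule] = {}
        self._pending: List[Tuple[str, str, Callable[[Molecule], float]]] = []

    def add(self, molecule: Molecule) -> None:
        # Records are keyed by their SMILES string in this sketch.
        self._molecules[molecule.smiles] = molecule

    def submit(self, smiles: str, task_name: str,
               task: Callable[[Molecule], float]) -> None:
        """Queue a calculation (a plain Python callable here) for a stored molecule."""
        if smiles not in self._molecules:
            raise KeyError(f"unknown molecule: {smiles}")
        self._pending.append((smiles, task_name, task))

    def run_pending(self) -> int:
        """Run queued jobs in order, store each result, and return the job count."""
        count = 0
        while self._pending:
            smiles, task_name, task = self._pending.pop(0)
            molecule = self._molecules[smiles]
            molecule.properties[task_name] = task(molecule)
            count += 1
        return count

    def query(self, predicate: Callable[[Molecule], bool]) -> List[Molecule]:
        """Return all stored molecules for which the predicate holds."""
        return [m for m in self._molecules.values() if predicate(m)]


def count_heavy_atoms(molecule: Molecule) -> float:
    # Crude placeholder for a real simulation: counts uppercase letters, which
    # only matches the heavy-atom count for simple, non-aromatic SMILES
    # strings without bracket atoms.
    return float(sum(ch.isupper() for ch in molecule.smiles))


server = DataServer()
server.add(Molecule("ethanol", "CCO"))
server.add(Molecule("acetic acid", "CC(=O)O"))
for smiles in ("CCO", "CC(=O)O"):
    server.submit(smiles, "heavy_atoms", count_heavy_atoms)

jobs_run = server.run_pending()
large = server.query(lambda m: m.properties.get("heavy_atoms", 0) > 3)
```

In a notebook cell, the same pattern would let a user submit work, wait for it to complete, and then filter stored results without handling output files directly.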

The use of industry-standard web programming interfaces, data formats, and modern HTML5 web components will be described. The next-generation JupyterLab interface, its extension, and the integration of batch scheduling, search, and analytics will also be covered. All major components, including 3D visualization, analysis, and code execution, will work in any modern web browser. The use of Python kernels offers an ideal environment for analysis, reusing the tools already available in that ecosystem, including significant investments in data mining and machine learning tools. A companion HTML5 application offers a simpler view of results that can be shared more widely, with full access to the data and links to the notebooks that produced it.
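To make the analysis step concrete, the sketch below fits a one-variable least-squares line of the kind that might relate a computed descriptor to a target property inside a Python kernel. It is an assumption-laden illustration and not the platform's method: the function name fit_line is hypothetical, the numbers are synthetic values generated from y = 2x + 1 rather than chemical data, and in practice one of the existing machine learning libraries in the Python ecosystem would normally take the place of this hand-written fit.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations in x, and of cross deviations between x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic illustration only: points lying exactly on y = 2x + 1.
descriptor = [1.0, 2.0, 3.0, 4.0]
target = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(descriptor, target)
predicted = slope * 5.0 + intercept
```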