(161i) Machine Learning for Molecular Property Predictions and the Software Ecosystem That Enables It
2019 AIChE Annual Meeting
Computational Molecular Science and Engineering Forum
Software Engineering in and for the Molecular Sciences
Monday, November 11, 2019 - 2:30pm to 2:45pm
In this presentation, we will show how we employ machine learning to develop data-derived prediction models that serve as alternatives to physics-based models, and how we utilize them in massive-scale hyperscreening studies at a fraction of the cost of the physics-based approach. Aside from conducting such data-driven discovery, we also employ data mining techniques to develop an understanding of the hidden structure-property relationships that define the behavior of molecules, materials, and reactions. These insights form our foundation for the rational design and inverse engineering of novel compounds with tailored properties.
We will also discuss progress on our software ecosystem for data-driven in silico research, which supports both application work and method development. It consists of four loosely connected program suites: ChemLG is a generator for compound and material candidate libraries that allows us to enumerate chemical space (i.e., data definition); ChemHTPS provides an automated platform for the virtual high-throughput screening of these libraries (i.e., data generation); ChemBDDB offers a database and data model template for the massive information volumes created by data-intensive projects (i.e., data storage); and ChemML is a machine learning and informatics toolbox for the validation, analysis, mining, and modeling of such data sets.
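As a rough illustration of how the four stages fit together (data definition, data generation, data storage, and data modeling), the following toy sketch uses only the Python standard library. It does not use or reproduce the APIs of ChemLG, ChemHTPS, ChemBDDB, or ChemML; every name in it (`FRAGMENTS`, `generate_library`, `expensive_property`, `store_results`, `fit_linear_model`) is hypothetical, the fragment descriptors are made up, and a one-variable least-squares fit stands in for a real machine learning model.

```python
from itertools import product
import sqlite3

# Toy fragments, each with a made-up numeric descriptor (illustrative only).
FRAGMENTS = {"A": 1.0, "B": 2.0, "C": 3.0}


def descriptor(candidate):
    """Sum of fragment descriptors; a stand-in for real molecular features."""
    return sum(FRAGMENTS[f] for f in candidate)


def generate_library(length=2):
    """Data definition: enumerate every fragment sequence of a given length."""
    return ["".join(combo) for combo in product(sorted(FRAGMENTS), repeat=length)]


def expensive_property(candidate):
    """Data generation: placeholder for a costly physics-based calculation."""
    return 0.5 * descriptor(candidate) + 1.0


def store_results(conn, rows):
    """Data storage: persist (candidate, value) pairs in a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results (candidate TEXT PRIMARY KEY, value REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO results VALUES (?, ?)", rows)
    conn.commit()


def fit_linear_model(conn):
    """Data modeling: ordinary least-squares fit of value against the descriptor."""
    rows = conn.execute("SELECT candidate, value FROM results").fetchall()
    xs = [descriptor(c) for c, _ in rows]
    ys = [v for _, v in rows]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


library = generate_library()
conn = sqlite3.connect(":memory:")

# Run the "expensive" method on a subset only, then store the results.
training = library[::2]
store_results(conn, [(c, expensive_property(c)) for c in training])

# Fit the surrogate on stored data and use it to screen the full library.
slope, intercept = fit_linear_model(conn)
predictions = {c: slope * descriptor(c) + intercept for c in library}
```

The point of the sketch is the division of labor: the expensive calculation is run on only part of the enumerated library, and the fitted surrogate then covers the rest at negligible cost.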
The notion of utilizing modern data science in chemistry is so recent that much of the basic infrastructure has not yet been developed or is still in its infancy. The existing tools and expertise tend to be in-house, specialized, or otherwise unavailable to the community at large. In practice, data science is thus beyond the reach of most researchers in the field. By contributing this open, general-purpose, comprehensive, and easy-to-use software ecosystem, we aim to chart new paths in this area, fill the prevalent infrastructure gap, and make data-driven research a viable and widely accessible proposition for the community.