(88d) Design of an Automatic Platform for Machine-Learning Model Based Molecular Property Optimization | AIChE


McDonald, M. - Presenter, Georgia Tech
Koscher, B., Massachusetts Institute of Technology
Ha, S. K., Massachusetts Institute of Technology
Greenman, K. P., Massachusetts Institute of Technology
McGill, C. J., North Carolina State University
Gomez-Bombarelli, R., Massachusetts Institute of Technology
Green, W., Massachusetts Institute of Technology
Jensen, K., Massachusetts Institute of Technology
Bilodeau, C., Massachusetts Institute of Technology
Developments in machine learning have led to models capable of predicting a range of molecular properties. Similarly, predictive retrosynthesis tools trained on large reaction databases have enabled automated planning of routes to novel compounds. Combining property and synthesis prediction with generative molecular modeling allows the molecule discovery process to be automated in silico. Physical evaluation of the resulting property-optimized molecules, however, remains a widely recognized bottleneck in molecular discovery. We have attempted to overcome this bottleneck by building a robotic platform guided by machine learning models. The platform uses a machine-learning-based retrosynthetic planner to plan the synthesis of interesting molecules (those with top-performing or uncertain predicted properties), then makes them, measures their properties, and feeds the results back to the models, so that optimization proceeds in an automated, iterative fashion. We target three properties (optical absorbance, water/octanol partitioning, and oxidative stability) and use a scaffold approach to discover new compounds in a well-plate-based molecular design-make-test cycle.

As target molecules deviate further from the training data, the predictive power of the machine learning models decreases. To combat this, we have divided the discovery workflow into two phases. An exploration phase synthesizes a variety of molecules based on each scaffold to learn the particulars of that family. Then, after the models are retrained on the results of the first phase, an exploitation phase creates a selection of high performers. Along the way, troublesome reactions are automatically optimized, and the optimization results are likewise used to improve future synthetic routes. This talk will discuss the entire workflow using an example scaffold whose optical and partitioning properties have been optimized without sacrificing photo-oxidative stability. Automation efforts will be emphasized, particularly reaction execution and optimization in well plates, purification and isolation, product characterization, and the engineering of an interface between machine learning models and robotic hardware.
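
The two-phase selection described above can be illustrated with a minimal Python sketch. It is not the platform's actual code: the `Prediction` record, the function names, the stability threshold, and all numeric values are hypothetical, and the SMILES strings are placeholders rather than compounds from this work. The sketch assumes each candidate comes with a predicted property value, an uncertainty estimate, and a predicted stability score. Exploration then picks the most uncertain candidates, and exploitation picks the best predicted performers that still meet a stability threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """Hypothetical model output for one candidate molecule."""
    smiles: str
    mean: float       # predicted property value (higher is better here)
    std: float        # model uncertainty, e.g. an ensemble standard deviation
    stability: float  # predicted oxidative stability score (assumed scale 0-1)


def select_exploration(preds, k):
    """Exploration: pick the k candidates the model is least certain about."""
    return sorted(preds, key=lambda p: p.std, reverse=True)[:k]


def select_exploitation(preds, k, min_stability):
    """Exploitation: pick the k best predicted performers among candidates
    whose predicted stability meets the threshold."""
    stable = [p for p in preds if p.stability >= min_stability]
    return sorted(stable, key=lambda p: p.mean, reverse=True)[:k]


# Made-up predictions, for illustration only.
candidates = [
    Prediction("C1=CC=CC=C1", mean=0.40, std=0.30, stability=0.9),
    Prediction("CCO", mean=0.75, std=0.05, stability=0.8),
    Prediction("CC(=O)O", mean=0.90, std=0.10, stability=0.2),
    Prediction("CCN", mean=0.60, std=0.25, stability=0.7),
]

explore = select_exploration(candidates, k=2)
exploit = select_exploitation(candidates, k=2, min_stability=0.5)

print([p.smiles for p in explore])
print([p.smiles for p in exploit])
```

In a real loop, the measurements from the exploration batch would be used to retrain the property models before the exploitation batch is chosen. That retraining step is omitted here because it depends on the specific models used.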