(140b) Learning to Design and Validate Small-Molecule Synthetic Routes from Historical Reaction Data

Authors: 
Coley, C. W., Massachusetts Institute of Technology
Plehiers, P., Ghent University
Jin, W., Massachusetts Institute of Technology
Gao, H., Massachusetts Institute of Technology
Barzilay, R., Massachusetts Institute of Technology
Jaakkola, T. S., Massachusetts Institute of Technology
Green, W. H., Massachusetts Institute of Technology
Jensen, K. F., Massachusetts Institute of Technology
Advances in laboratory automation promise to reduce the manual effort of synthesis, but determining how to synthesize a compound still requires an investment of time and effort from expert chemists. Fully autonomous chemical synthesis requires robust synthesis planning software that can propose fully specified synthetic routes to target molecules.

In this talk, we will describe our recent efforts to develop such software. The overarching theme of our work is how to leverage historical reaction data most effectively to inform decision-making in small-molecule pathway design.

The overall synthesis planning workflow contains a number of interconnected modules. We focus on two critical aspects of computer-aided synthesis planning and on how machine learning and other data-driven techniques have enabled new approaches to both. First, we discuss the problem of retrosynthetic planning (i.e., identification of suitable starting materials) and how both the recursive expansion and the search strategy are conducive to machine learning approaches. Second, we discuss the challenge of in silico reaction validation, which can be addressed by solving the inverse problem: forward reaction prediction. We summarize the neural network-based approaches we have taken to develop models that, after being trained on previously published reactions, can anticipate the products of a chemical reaction. Here, we use the model for reaction validation, but its utility extends to the prediction of side products and impurities. Finally, we describe how these techniques for retrosynthesis and forward prediction are integrated into an overall workflow that, for a given molecular target, predicts a rank-ordered list of reaction paths connecting the target to purchasable starting materials via a series of plausible reaction steps. The integrated program offers additional features for excluding specific reactions or chemicals, e.g., because of IP or toxicity concerns.
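To make the recursive expansion and ranking idea concrete, the following is a minimal sketch under stated assumptions, not the software described in this talk. Molecules are plain strings, the `disconnections` table and its plausibility scores are invented toy data standing in for learned retrosynthesis and forward-prediction models, and the names `plan_routes`, `purchasable`, and `excluded` are hypothetical. A route's score is simply the product of its step scores, which is one possible ranking choice among many.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Step = Tuple[str, Tuple[str, ...]]           # (product, precursors)
Route = Tuple[float, List[Step]]             # (score, ordered steps)
Disconnections = Dict[str, List[Tuple[Tuple[str, ...], float]]]


def plan_routes(
    target: str,
    disconnections: Disconnections,
    purchasable: Set[str],
    excluded: FrozenSet[str] = frozenset(),
    max_depth: int = 3,
) -> List[Route]:
    """Recursively expand the target until every leaf is purchasable.

    Returns all routes found within max_depth, best score first.
    """

    def expand(mol: str, depth: int, seen: FrozenSet[str]) -> List[Route]:
        if mol in excluded:
            return []                         # user-excluded chemical
        if mol in purchasable:
            return [(1.0, [])]                # nothing left to make
        if depth == 0 or mol in seen:
            return []                         # depth limit or cycle
        routes: List[Route] = []
        for precursors, score in disconnections.get(mol, []):
            if any(p in excluded for p in precursors):
                continue
            # Start with this single step, then combine it with every
            # way of making each precursor.
            partial: List[Route] = [(score, [(mol, precursors)])]
            for p in precursors:
                sub = expand(p, depth - 1, seen | {mol})
                partial = [
                    (s1 * s2, steps1 + steps2)
                    for s1, steps1 in partial
                    for s2, steps2 in sub
                ]
                if not partial:
                    break                     # this precursor is unreachable
            routes.extend(partial)
        return routes

    found = expand(target, max_depth, frozenset())
    return sorted(found, key=lambda route: route[0], reverse=True)


# Toy data (entirely made up for illustration).
toy_disconnections: Disconnections = {
    "amide": [(("acid", "amine"), 0.9), (("acyl_chloride", "amine"), 0.8)],
    "acyl_chloride": [(("acid",), 0.95)],
}
toy_purchasable = {"acid", "amine"}

routes = plan_routes("amide", toy_disconnections, toy_purchasable)
filtered = plan_routes(
    "amide",
    toy_disconnections,
    toy_purchasable,
    excluded=frozenset({"acyl_chloride"}),
)
```

In this toy setting, `routes` holds both the one-step and the two-step path in rank order, while `filtered` drops any path through the excluded intermediate, mirroring the exclusion feature mentioned above. Exhaustive enumeration is used here only for brevity; it would not scale to realistic reaction networks, where a guided search strategy is needed.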