(140c) Using Machine Learning to Recommend Reaction Conditions and Quantifying Similarity of Catalysts, Solvents, and Reagents

Authors: 
Gao, H., Massachusetts Institute of Technology
Coley, C. W., Massachusetts Institute of Technology
Green, W. H., Massachusetts Institute of Technology
Jensen, K. F., Massachusetts Institute of Technology

Hanyu Gao, Thomas J. Struble, Connor W. Coley, William H. Green, Klavs F. Jensen

Reaction conditions are an essential element of synthesis planning. Properly designed conditions can promote the desired reactivity, and small changes in reaction context can sometimes lead to drastically different outcomes. While other aspects of computer-aided synthesis planning have developed rapidly, including retrosynthesis (1) and the evaluation of reaction outcomes (2,3), the suggestion of reaction conditions is still treated primarily as a human task that relies heavily on chemists' knowledge and experience. Computer-aided condition recommendation remains a challenging and under-explored problem (4–6).

In this work we develop a neural-network-based model to predict suitable reaction conditions for organic transformations. The model is trained on roughly 10 million examples from the Reaxys database (7) to predict the exact chemicals used as catalysts, solvents, and reagents, as well as an appropriate temperature for the reaction. Predictions are evaluated both quantitatively, using a variety of accuracy metrics, and qualitatively, using multiple sets of representative examples. A combination that closely matches the recorded catalyst, solvent, and reagent (under strictly defined criteria) appears among the top-ten predictions in 67.4% of cases, and accuracies for individual elements reach 80% to 90%. Empirical evaluation shows that many of the failed predictions are either potential alternatives to the recorded conditions or stem from data quality issues. The predicted temperature falls within 20 K of the recorded temperature in 60% to 70% of test cases, depending on whether the chemical context is predicted correctly.
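The two headline metrics above (top-k accuracy for the chemical context and the fraction of temperatures within 20 K) can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the helper names `top_k_accuracy` and `fraction_within_tolerance` are hypothetical, a context is simplified to one catalyst, one solvent, and one reagent, and exact tuple equality stands in for the abstract's "closely similar" criteria, which are not specified here. The toy data are invented.

```python
from typing import Optional, Sequence, Tuple

# Simplified reaction context: (catalyst, solvent, reagent); None means "none used".
# This three-slot layout is an assumption for illustration only.
Context = Tuple[Optional[str], Optional[str], Optional[str]]


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[Context]],
    truths: Sequence[Context],
    k: int = 10,
) -> float:
    """Fraction of reactions whose recorded context is among the top-k predictions.

    Exact tuple equality is a stand-in for the (unspecified) similarity criteria.
    """
    if len(ranked_predictions) != len(truths) or not truths:
        raise ValueError("need one ranked prediction list per test reaction")
    hits = sum(truth in preds[:k] for preds, truth in zip(ranked_predictions, truths))
    return hits / len(truths)


def fraction_within_tolerance(
    predicted_k: Sequence[float],
    recorded_k: Sequence[float],
    tol_k: float = 20.0,
) -> float:
    """Fraction of reactions whose predicted temperature is within tol_k kelvin of the record."""
    if len(predicted_k) != len(recorded_k) or not recorded_k:
        raise ValueError("need one predicted temperature per recorded temperature")
    hits = sum(abs(p - r) <= tol_k for p, r in zip(predicted_k, recorded_k))
    return hits / len(recorded_k)


# Toy data (invented for illustration, not from the paper's test set).
ranked = [
    [("Pd(PPh3)4", "THF", "K2CO3"), ("Pd(OAc)2", "DMF", "Cs2CO3")],
    [(None, "DCM", "Et3N"), (None, "THF", "Et3N")],
]
truths = [("Pd(OAc)2", "DMF", "Cs2CO3"), (None, "MeOH", "NaBH4")]

print(top_k_accuracy(ranked, truths, k=1))
print(top_k_accuracy(ranked, truths, k=10))
print(fraction_within_tolerance([298.0, 350.0, 273.0], [293.0, 310.0, 280.0]))
```

Here the first recorded context is only matched at rank 2 and the second is never predicted, so top-1 accuracy is 0 and top-10 accuracy is 0.5; two of the three toy temperatures fall within 20 K.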

Besides making predictions, the model generates information that can be used to quantify the similarity of catalysts, solvents, and reagents. We found that the weight matrix of the last layer for predicting catalysts, solvents, or reagents can be used effectively to calculate a score representing the functional similarity between these chemicals. The nearest neighbors identified for common catalysts, solvents, and reagents agree well with established chemical knowledge. This work demonstrates that the functional similarity of chemicals can be inferred purely from reaction data, without relying on property calculations.
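One way to read the weight-matrix idea: if the last layer computes logits over N candidate chemicals from a d-dimensional hidden vector, each chemical owns one d-dimensional weight vector, and chemicals with similar vectors are scored similarly by the network. The sketch below assumes cosine similarity between those vectors as the score; the abstract does not state which scoring function was used, and the names `cosine` and `nearest_neighbors` as well as the toy weight values are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_neighbors(
    weight_vectors: Dict[str, Sequence[float]], query: str, n: int = 3
) -> List[Tuple[str, float]]:
    """Rank other chemicals by cosine similarity of their last-layer weight vectors.

    Cosine similarity is an assumed choice of score, not necessarily the authors'.
    """
    q = weight_vectors[query]
    scored = [(name, cosine(q, w)) for name, w in weight_vectors.items() if name != query]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:n]


# Toy 3-dimensional weight vectors, one per output class (invented values,
# not taken from any trained model).
W = {
    "THF": [0.9, 0.1, 0.0],
    "2-MeTHF": [0.85, 0.15, 0.05],
    "DCM": [0.1, 0.9, 0.1],
    "water": [0.0, 0.1, 0.95],
}
for name, score in nearest_neighbors(W, "THF"):
    print(f"{name}: {score:.3f}")
```

When extracting these vectors from a real framework, mind the layout: a Keras `Dense` kernel has shape `(input_dim, units)`, so each chemical corresponds to a column, whereas a PyTorch `nn.Linear` `weight` has shape `(out_features, in_features)`, so each chemical corresponds to a row.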

The reaction condition suggestions and similarity information provided by this tool can aid experimental design, improve the accuracy of in silico reactivity evaluation, and support pathway-level assessment and improvement of chemical synthesis processes.

References

(1) Segler, M. H. S.; Preuss, M.; Waller, M. P. Planning Chemical Syntheses with Deep Neural Networks and Symbolic AI. Nature 2018, 555 (7698), 604.

(2) Jin, W.; Coley, C.; Barzilay, R.; Jaakkola, T. Predicting Organic Reaction Outcomes with Weisfeiler-Lehman Network. In Advances in Neural Information Processing Systems; 2017; pp 2604–2613.

(3) Coley, C. W.; Barzilay, R.; Jaakkola, T. S.; Green, W. H.; Jensen, K. F. Prediction of Organic Reaction Outcomes Using Machine Learning. ACS Cent. Sci. 2017, 3 (5), 434–443.

(4) Marcou, G.; Aires de Sousa, J.; Latino, D. A. R. S.; de Luca, A.; Horvath, D.; Rietsch, V.; Varnek, A. Expert System for Predicting Reaction Conditions: The Michael Reaction Case. J. Chem. Inf. Model. 2015, 55 (2), 239–250.

(5) Segler, M. H. S.; Waller, M. P. Modelling Chemical Reasoning to Predict and Invent Reactions. Chem. Eur. J. 2017, 23 (25), 6118–6128.

(6) Lin, A. I.; Madzhidov, T. I.; Klimchuk, O.; Nugmanov, R. I.; Antipin, I. S.; Varnek, A. Automatized Assessment of Protective Group Reactivity: A Step toward Big Reaction Data Analysis. J. Chem. Inf. Model. 2016, 56 (11), 2140–2148.

(7) Reaxys. https://new.reaxys.com/.