(342i) Machine Learning of Retrosynthetic Disconnections and Reaction Outcomes: Influence of Reaction Template Characteristics | AIChE

Authors 

Aude, A. - Presenter, Massachusetts Institute of Technology
Heid, E., Massachusetts Institute of Technology
Green, W., Massachusetts Institute of Technology
Template-based and template-free machine learning models have been shown to accurately predict retrosynthetic disconnections and forward reaction outcomes for organic synthesis. Template-based models usually take a reactant or product as input and propose a ranked list of reaction templates, i.e. chemical transformations, via multi-class classification. However, their performance is heavily influenced by the size and canonicalization of the set of employed templates as well as by the choice of evaluation metrics.

First, extracting larger, more specific templates usually leads to a large number of unique templates for a given dataset, some of which are applicable to only very few reactions. The performance of multi-class classification models usually decreases as the number of classes increases, and such models perform poorly on underrepresented classes. Thus, smaller, more general templates tend to increase model performance but may lead to poor suggestions from a chemical point of view. Finding the optimal level of template specificity is therefore an important objective.

Second, non-exclusive templates add noise to the training data, unnecessarily increase the number of classes, and complicate model evaluation. Non-exclusivity can be caused by incomplete canonicalization and by inconsistencies in the template extraction algorithm.

Third, the inconsistent use of evaluation metrics in the literature hampers the comparability of different models. For example, success may be defined as correctly recovering either the template extracted for a test reaction or the reaction outcome after template application (the outcome being the reactants in retrosynthesis and the products in forward prediction). If the employed template set contains non-exclusive templates, i.e. different templates that lead to the same outcome, the two measures of success diverge. In addition, other metrics are often neglected, such as the fraction of recommended templates that are applicable, i.e. that describe a chemical transformation actually possible for the given molecule.
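
The gap between the two success measures, and the applicability metric, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the templates are toy substring rewrites on SMILES-like strings (real templates would be reaction SMARTS applied with a cheminformatics toolkit), and the names `make_template`, `TEMPLATES`, `top_k_template_hit`, `top_k_outcome_hit` and `applicable_fraction` are hypothetical, not taken from the study.

```python
from typing import Callable, Dict, List, Optional

# Toy stand-in for a reaction template: returns the outcome string,
# or None when the template is not applicable to the molecule.
Template = Callable[[str], Optional[str]]


def make_template(pattern: str, replacement: str) -> Template:
    """Build a toy template that rewrites the first match of `pattern`."""
    def apply(molecule: str) -> Optional[str]:
        if pattern not in molecule:
            return None
        return molecule.replace(pattern, replacement, 1)
    return apply


# Hypothetical retrosynthetic templates (product -> reactants).
# T_general and T_specific are non-exclusive: on an acetamide they
# produce the same reactants.
TEMPLATES: Dict[str, Template] = {
    "T_general": make_template("C(=O)N", "C(=O)O.N"),
    "T_specific": make_template("CC(=O)N", "CC(=O)O.N"),
    "T_ester": make_template("C(=O)OC", "C(=O)O.OC"),
}


def top_k_template_hit(ranked: List[str], true_template: str, k: int) -> bool:
    """Success = the extracted template itself appears in the top k."""
    return true_template in ranked[:k]


def top_k_outcome_hit(ranked: List[str], molecule: str,
                      true_outcome: str, k: int) -> bool:
    """Success = any top-k template reproduces the recorded outcome."""
    return any(TEMPLATES[name](molecule) == true_outcome
               for name in ranked[:k])


def applicable_fraction(ranked: List[str], molecule: str, k: int) -> float:
    """Fraction of the top-k templates that can be applied at all."""
    top = ranked[:k]
    return sum(TEMPLATES[name](molecule) is not None for name in top) / len(top)


# One test reaction: the extracted label is T_specific, but the
# (hypothetical) model ranks the equivalent T_general first.
MOLECULE = "CC(=O)NC"
TRUE_OUTCOME = "CC(=O)O.NC"
RANKED = ["T_general", "T_ester", "T_specific"]
```

With this ranking, the top-1 prediction counts as a miss under template matching but as a hit under outcome matching, and two of the three recommended templates are applicable to the molecule.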

Therefore, we systematically study the influence of template size, exclusivity and canonicalization on the performance of template-ranking algorithms as measured by different metrics on datasets of different sizes. Based on our findings, we composed a set of recommendations on the optimal template size and specificity for retrosynthesis and forward prediction models. We furthermore developed a fast, hierarchical correction scheme to filter out non-exclusive templates, which increased model performance considerably. Beyond our study of single-step reactions, we expect these findings to be useful for multi-step retrosynthesis pathway planning.
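
The abstract does not describe how the correction scheme works, so the following is not the authors' method. It is only a minimal sketch of one way to remove non-exclusive labels from a training set: for each reaction, collect every template that reproduces the recorded outcome, then relabel the reaction with the equivalent template that explains the most reactions. The toy substring templates and the names `make_template` and `relabel_non_exclusive` are assumptions made for illustration.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple

Template = Callable[[str], Optional[str]]


def make_template(pattern: str, replacement: str) -> Template:
    """Toy template: rewrite the first match of `pattern`, else None."""
    def apply(molecule: str) -> Optional[str]:
        if pattern not in molecule:
            return None
        return molecule.replace(pattern, replacement, 1)
    return apply


def relabel_non_exclusive(
    reactions: List[Tuple[str, str, str]],
    templates: Dict[str, Template],
) -> List[str]:
    """Return one template label per reaction after merging equivalents.

    Each reaction is (molecule, recorded outcome, extracted template name).
    """
    # Pass 1: every template that reproduces the recorded outcome.
    equivalents: List[List[str]] = []
    for molecule, outcome, label in reactions:
        names = [n for n, t in templates.items() if t(molecule) == outcome]
        equivalents.append(names or [label])  # fall back to original label

    # Pass 2: count how many reactions each template can explain.
    coverage = Counter(n for names in equivalents for n in names)

    # Pass 3: keep the widest-coverage template; break ties by name.
    return [min(names, key=lambda n: (-coverage[n], n))
            for names in equivalents]


# Hypothetical data: two amide disconnections labeled with two templates.
TEMPLATES: Dict[str, Template] = {
    "T_general": make_template("C(=O)N", "C(=O)O.N"),
    "T_specific": make_template("CC(=O)N", "CC(=O)O.N"),
}
REACTIONS = [
    ("CC(=O)NC", "CC(=O)O.NC", "T_specific"),
    ("OC(=O)NC", "OC(=O)O.NC", "T_general"),
]
NEW_LABELS = relabel_non_exclusive(REACTIONS, TEMPLATES)
```

In this toy set both reactions end up with the label `T_general`, so the number of classes drops from two to one. Always preferring the widest-coverage template favors general templates, which interacts with the specificity trade-off described above; a practical scheme would need a chemically informed rule for choosing among equivalent templates.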