(271d) Cheminformatic Elucidation of Enzymatic Substrate Promiscuity
Enzymatic substrate promiscuity (the ability of an enzyme to catalyze a single chemical reaction on several different substrates) presents an opportunity to expand the scope of existing biosynthetic pathways to include more efficient, direct routes to chemicals in common laboratory and industrial organisms and processes. To date, promiscuous enzymes have been characterized for substrate specificity against only a narrow distribution of chemicals that may not represent the enzyme's full versatility. This limitation exists largely because activity assays are slow and expensive, despite the sheer number of chemicals available for purchase (~10⁷ according to the ZINC database). While testing large numbers of chemicals may be impractical, computational approaches may allow existing data to be used more effectively and new data to be collected more strategically. Machine learning, and the support vector classifier (SVC) in particular, may be a useful tool for predicting whether a given compound will show observable activity with a given enzyme. In this study, we develop SVCs for predicting substrate activity, as well as an SVC-based active learning algorithm that selects batches of optimal compounds to test against enzymes of interest. Cross-validation of the constructed SVCs shows that existing data can predict the activity of untested compounds closely related to the tested compounds, but cannot accurately predict the activity of compounds more distantly related to the existing training set. The active learning algorithm addresses this shortcoming by selecting samples that can improve global accuracy with fewer experiments. Applying these algorithms to a wide array of metabolic enzymes would yield a library of SVCs that can predict high-probability promiscuous enzymatic reactions, which could prove a valuable resource for the design of novel metabolic pathways.
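The abstract does not specify the compound representation, kernel, or batch-selection criterion, so the following is only a minimal sketch of the general idea under stated assumptions, not the study's implementation. It assumes each compound is encoded as a binary fingerprint, uses a linear SVM trained with the Pegasos stochastic subgradient method as a stand-in for a library SVC, and picks the next batch by margin-based uncertainty sampling (query the untested compounds closest to the decision boundary). All function names, parameters, and the toy data are hypothetical.

```python
import random
from typing import List, Sequence

# Assumption: a compound is a binary fingerprint (list of 0/1 bits) computed elsewhere.
Fingerprint = Sequence[int]


def _augment(x: Fingerprint) -> List[float]:
    """Append a constant 1.0 so the last weight acts as a bias term."""
    return [float(v) for v in x] + [1.0]


def _dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(X: Sequence[Fingerprint], y: Sequence[int],
                     lam: float = 0.2, iters: int = 20000, seed: int = 0) -> List[float]:
    """Hypothetical helper: linear SVM trained with Pegasos.

    Labels are +1 (active) or -1 (inactive). Minimizes
    lam/2 * ||w||^2 + mean(max(0, 1 - y * w.x)) by stochastic subgradient steps.
    """
    rng = random.Random(seed)
    data = [_augment(x) for x in X]
    w = [0.0] * len(data[0])
    for t in range(1, iters + 1):
        i = rng.randrange(len(data))
        eta = 1.0 / (lam * t)
        margin = y[i] * _dot(w, data[i])  # evaluated at the current weights
        # Subgradient step for the L2 regularizer: shrink toward zero.
        w = [(1.0 - eta * lam) * wj for wj in w]
        # If the sampled compound violates the margin, add its hinge-loss subgradient step.
        if margin < 1.0:
            w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def decision(w: Sequence[float], x: Fingerprint) -> float:
    """Signed score; its magnitude is proportional to distance from the boundary."""
    return _dot(w, _augment(x))


def predict(w: Sequence[float], x: Fingerprint) -> int:
    return 1 if decision(w, x) >= 0.0 else -1


def select_batch(w: Sequence[float], pool: Sequence[Fingerprint], k: int) -> List[int]:
    """Hypothetical uncertainty sampling: indices of the k untested compounds
    with the smallest |score|, i.e. those the classifier is least sure about."""
    ranked = sorted(range(len(pool)), key=lambda i: abs(decision(w, pool[i])))
    return ranked[:k]


if __name__ == "__main__":
    # Toy data (invented for illustration): activity tracks bit 0.
    X = [[1, 0, 1, 0], [1, 1, 0, 0], [1, 0, 0, 1],
         [0, 1, 1, 0], [0, 0, 1, 1], [0, 1, 0, 1]]
    y = [1, 1, 1, -1, -1, -1]
    w = train_linear_svm(X, y)
    pool = [[1, 0, 0, 0], [1, 1, 1, 0], [0, 1, 1, 1], [0, 1, 0, 0]]
    print("next batch to assay:", select_batch(w, pool, 2))
```

Ranking purely by distance to the boundary tends to choose near-duplicate compounds when many similar ones sit near the margin; a practical batch selector would usually add a diversity term (for example, penalizing high Tanimoto similarity between picks), which this sketch omits.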