"ATLAS of Biochemistry", a Repository of All Possible Biochemical Reactions for Synthetic Biology and Metabolic Engineering Studies
Knowledge gaps in metabolic models and continuously growing biochemical databases indicate that our understanding of metabolism is far from complete. Recent technical and analytical progress in metabolomics has led to the identification of vast numbers of new compounds in living organisms, but how these metabolites integrate into metabolic networks remains largely unknown. To address this incomplete knowledge, we propose a computational approach that identifies novel hypothetical reactions between known metabolites, integrates experimentally measured molecular structures into existing metabolic networks, and predicts chemical compounds that are likely to exist in metabolism.
The computational framework BNICE.ch is used to exploit the known biochemistry contained in the Kyoto Encyclopedia of Genes and Genomes (KEGG). We summarize the vastly diverse functionalities of enzymatic reactions in a few hundred expert-curated reaction rules, each generalizing multiple biochemical reactions. We then apply these rules to all metabolites known to KEGG to create a database of all biochemically plausible reactions between compounds reported to occur in living organisms. This extrapolation of known metabolism results in a network of more than 130,000 known and novel reactions, each connecting two or more KEGG compounds. For KEGG compounds that cannot be connected to any metabolic pathway through single known or novel reactions, we propose multi-step reactions whose intermediates include not only KEGG compounds but also compounds that appear in purely chemical databases such as PubChem. These hypothetical intermediate metabolites are highly likely to play a role in metabolism, even though they are not reported as biological compounds in KEGG.
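The rule-application step described above can be illustrated with a minimal sketch. It is not the BNICE.ch implementation: the compound encoding (bags of functional-group counts), the two rules, and all names such as `apply_rule` and `enumerate_reactions` are hypothetical simplifications chosen only to show how a generalized rule, applied to every compound in a collection, yields reactions whose products are themselves in that collection.

```python
from itertools import product

# Hypothetical toy data: each compound is a bag of functional-group counts.
COMPOUNDS = {
    "ethanol": {"CH3": 1, "CH2": 1, "OH": 1},
    "acetaldehyde": {"CH3": 1, "CHO": 1},
    "acetate": {"CH3": 1, "COOH": 1},
}

# Hypothetical generalized rules: (name, groups consumed, groups produced).
RULES = [
    ("alcohol_oxidation", {"CH2": 1, "OH": 1}, {"CHO": 1}),
    ("aldehyde_oxidation", {"CHO": 1}, {"COOH": 1}),
]


def apply_rule(groups, consumed, produced):
    """Return the product's group counts, or None if the rule does not match."""
    if any(groups.get(g, 0) < n for g, n in consumed.items()):
        return None
    result = dict(groups)
    for g, n in consumed.items():
        result[g] -= n
        if result[g] == 0:
            del result[g]
    for g, n in produced.items():
        result[g] = result.get(g, 0) + n
    return result


def enumerate_reactions(compounds, rules):
    """Apply every rule to every compound; keep reactions whose product is a known compound."""
    by_groups = {frozenset(g.items()): name for name, g in compounds.items()}
    reactions = []
    for (substrate, groups), (rule, consumed, produced) in product(compounds.items(), rules):
        product_groups = apply_rule(groups, consumed, produced)
        if product_groups is None:
            continue
        target = by_groups.get(frozenset(product_groups.items()))
        if target is not None:
            reactions.append((substrate, rule, target))
    return reactions


for substrate, rule, target in enumerate_reactions(COMPOUNDS, RULES):
    print(f"{substrate} --{rule}--> {target}")
```

In this toy setting a product that matches no entry in `COMPOUNDS` is simply discarded. A multi-step search of the kind described above would instead look such products up in a second, purely chemical collection and continue from there.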
For all known and novel reactions generated by our approach, we further estimate the Gibbs free energy using the Group Contribution Method (GCM), and we assess the structural similarity of the hypothetical reactions to known KEGG reactions. To validate the consistency of our results with known biochemistry, we show that several reactions added in the most recent KEGG update can be predicted as novel reactions from compounds in earlier versions of the KEGG database.
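As a reminder of what a group contribution estimate computes, the general form can be written as follows. The notation is an assumption made for illustration and does not come from the original text: $n_{g,c}$ is the number of occurrences of structural group $g$ in compound $c$, $\Delta G_g$ is the fitted energy contribution of that group, and $\nu_c$ is the stoichiometric coefficient of $c$ (negative for substrates, positive for products). Specific GCM variants may include additional correction terms not shown here.

```latex
\Delta_f G'^{\circ}(c) \approx \sum_{g} n_{g,c}\, \Delta G_g
\qquad
\Delta_r G'^{\circ} = \sum_{c} \nu_c\, \Delta_f G'^{\circ}(c)
```

Because groups left unchanged by a reaction appear on both sides, their contributions cancel in $\Delta_r G'^{\circ}$, so the estimate depends mainly on the groups that the reaction transforms.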
The generated information is organized in the “ATLAS of Biochemistry”, an online database that allows the user to search for all possible routes from any substrate compound to any product. These pathways involve known and novel enzymatic steps, and they provide potential targets for protein engineering to alter substrate specificity. Our approach of introducing novel biochemistry into pathway design can be of great interest for metabolic engineering and synthetic biology projects.
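A route search of this kind can be sketched as a bounded breadth-first search over a reaction network. This is an illustrative assumption, not the database's actual search algorithm: the function `find_routes`, the `max_steps` limit, and the toy network with compounds A, B and C are all hypothetical.

```python
from collections import deque

# Hypothetical toy network: (substrate, annotation, product).
REACTIONS = [
    ("A", "known", "B"),
    ("B", "novel", "C"),
    ("A", "novel", "C"),
    ("C", "known", "A"),
]


def find_routes(reactions, source, target, max_steps=3):
    """Enumerate all cycle-free routes from source to target with at most max_steps reactions."""
    graph = {}
    for substrate, label, prod in reactions:
        graph.setdefault(substrate, []).append((label, prod))

    routes = []
    queue = deque([(source, [])])
    while queue:
        compound, path = queue.popleft()
        if compound == target:
            routes.append(path)
            continue
        if len(path) >= max_steps:
            continue
        # Compounds already on this route; skipping them avoids cycles.
        visited = {source} | {step[2] for step in path}
        for label, nxt in graph.get(compound, []):
            if nxt not in visited:
                queue.append((nxt, path + [(compound, label, nxt)]))
    return routes


for route in find_routes(REACTIONS, "A", "C"):
    print(" ; ".join(f"{s} -[{label}]-> {p}" for s, label, p in route))
```

Keeping the "known" or "novel" annotation on each step makes it easy to see which steps of a route would require a new or engineered enzyme.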