(562b) MetRxn: Reaction and Metabolite Standardization and Congruency Across Databases and Genome-Scale Metabolic Models

Suthers, P., The Pennsylvania State University
Maranas, C. D., Department of Chemical Engineering

The ever-accelerating pace of DNA sequencing and annotation is spearheading the global inventorying of metabolic functions across all kingdoms of life. Increasingly, metabolite and reaction information is organized in the form of community-, organism-, or even tissue-specific genome-scale metabolic reconstructions. These reconstructions account for reaction stoichiometry and directionality, gene-protein-reaction associations, organelle localization of reactions, transporter information, transcriptional regulation and biomass composition. Over 40 genome-scale models are already available for eukaryotic, prokaryotic and archaeal species, and they are becoming indispensable for computationally guiding engineering interventions in microbial strains for targeted overproduction, for elucidating the organizing principles of metabolism and even for pinpointing drug targets. A key barrier to extracting metabolic knowledge from data more quickly is our inability to directly use metabolite and reaction information from databases (e.g., BRENDA, KEGG, BioCyc, UM-BBD, PubChem, ChEBI, Reactome.org, Rhea) or from other metabolic models, owing to incompatible representations, duplications and errors. In particular, the inadvertent inclusion of multiple replicates of the same metabolite, or of stoichiometrically inconsistent and/or elementally or charge-unbalanced reactions, can lead to erroneous model predictions and to missed opportunities to reveal (synthetic) lethal gene deletions, repair network gaps and quantify metabolic flows. A number of efforts, such as the Rhea database and Model SEED, have already addressed some of these limitations.
Motivated by this challenge, we recently developed the MetRxn knowledgebase, which uses internally consistent descriptions to integrate metabolite and reaction information from 6 databases and 34 metabolic models. Metabolite and reaction data were first downloaded from BRENDA, KEGG and BioCyc using a variety of methods based on protocols such as SOAP, FTP and HTTP. We subsequently pre-processed the data into flat files that were imported into MetRxn. All original information pertaining to metabolite names, abbreviations, metabolite geometry, related reactions, catalyzing enzymes and organism names, gene-protein-reaction associations, and compartmentalization was retained. For all 34 genome-scale models, ancillary information culled from the corresponding publications was also imported. The “raw data” from both databases and models were unified using standard SQL scripts on a MySQL server. We used Marvin (ChemAxon) to analyze all 231,085 raw metabolite entries containing structural information (out of a total of 322,936 entries). Metabolite atom-bond connectivity was calculated at a fixed pH of 7.2 and converted into standard isomeric SMILES format. Metabolites were also annotated with canonical SMILES using the OpenBabel interface from ChemSpider. Metabolites with missing structural information were revisited during the reaction reconciliation step. After generating the initial metabolite associations, we identified reaction overlaps using reaction synonyms and reaction strings along with the metabolite SMILES representations. During this step, reactions were flagged as single-compartment or two-compartment (i.e., transport reactions). Using the corrected metabolite elemental compositions and protonation states, reactions were then evaluated for charge and elemental balance, and a linear optimization program was used to charge- and elementally balance all reactions. MetRxn contains over 62,000 unique metabolites and 56,000 unique reactions.
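The elemental and charge balance check described above can be illustrated with a minimal sketch, assuming each metabolite already carries a flat molecular formula and a formal charge at the chosen pH. The formula parser, the BiGG-style metabolite ids and the `proton_fix` helper are hypothetical; in particular, `proton_fix` is a simple heuristic that only repairs pure proton imbalances and is not the linear optimization program used in MetRxn.

```python
import re
from collections import Counter

# Hypothetical minimal parser for flat formulas such as "C10H12N5O13P3".
# Parentheses, hydrates and isotope labels are deliberately not supported.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula: str) -> Counter:
    """Return element counts for a flat molecular formula."""
    counts: Counter = Counter()
    pos = 0
    for m in _TOKEN.finditer(formula):
        if m.start() != pos:  # a character the simple grammar cannot read
            raise ValueError(f"unsupported formula: {formula!r}")
        counts[m.group(1)] += int(m.group(2) or 1)
        pos = m.end()
    if pos != len(formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    return counts


def imbalance(reaction: dict, metabolites: dict) -> tuple:
    """Return (nonzero element residuals, charge residual), products minus reactants.

    reaction maps metabolite id -> signed coefficient (negative = consumed);
    metabolites maps metabolite id -> (formula, charge) at a fixed pH.
    """
    elements: Counter = Counter()
    charge = 0
    for met, coeff in reaction.items():
        formula, z = metabolites[met]
        for element, n in parse_formula(formula).items():
            elements[element] += coeff * n
        charge += coeff * z
    return {e: n for e, n in elements.items() if n != 0}, charge


def proton_fix(reaction: dict, metabolites: dict):
    """Heuristic, not an LP: if only H and charge are off by the same amount,
    return the H+ coefficient that closes the gap; otherwise return None."""
    residual, charge = imbalance(reaction, metabolites)
    if set(residual) == {"H"} and residual["H"] == charge:
        return -charge
    return None


# Illustrative major microspecies near pH 7 (ids are hypothetical, BiGG-style).
METS = {
    "glc__D_c": ("C6H12O6", 0),
    "atp_c": ("C10H12N5O13P3", -4),
    "g6p_c": ("C6H11O9P", -2),
    "adp_c": ("C10H12N5O10P2", -3),
    "h_c": ("H", 1),
}
# Hexokinase: D-glucose + ATP -> glucose 6-phosphate + ADP + H+
HEX1 = {"glc__D_c": -1, "atp_c": -1, "g6p_c": 1, "adp_c": 1, "h_c": 1}
HEX1_NO_H = {k: v for k, v in HEX1.items() if k != "h_c"}

print(imbalance(HEX1, METS))        # ({}, 0)
print(imbalance(HEX1_NO_H, METS))   # ({'H': -1}, -1)
print(proton_fix(HEX1_NO_H, METS))  # 1
```

Dropping the released proton from hexokinase leaves a residual of one hydrogen and one unit of charge, which is exactly the kind of protonation-state inconsistency that arises when reactions from different sources are merged at different assumed pH values.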
In this talk, we present web-based access to MetRxn and describe updates that broaden its scope of standardized metabolic models. Specifically, we explore the impact of standardization on genome-scale models and on flux balance analysis of the models. We also examine notable patterns of content overlap among models and contrast these with phylogenetic trees. Finally, we discuss SBML export and a web-based API for integration with external applications.
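SBML export can be sketched in the same hedged spirit. The example below assumes SBML Level 3 Version 1 core and uses only Python's standard `xml.etree.ElementTree`; it writes compartments, species and reactions, splitting signed coefficients into reactants and products. The function name `to_sbml` and all identifiers are hypothetical, this is not MetRxn's exporter, and a genome-scale export would also need flux bounds, gene-protein-reaction associations and annotations (for example via the SBML FBC package), which are omitted here.

```python
import xml.etree.ElementTree as ET

SBML_NS = "http://www.sbml.org/sbml/level3/version1/core"
ET.register_namespace("", SBML_NS)  # serialize SBML elements without a prefix


def _q(tag: str) -> str:
    """Qualify a tag name with the SBML core namespace."""
    return f"{{{SBML_NS}}}{tag}"


def to_sbml(model_id: str, compartments: list, species: dict, reactions: dict) -> str:
    """Serialize a small model as SBML Level 3 Version 1 core.

    species maps species id -> compartment id; reactions maps
    reaction id -> (signed stoichiometry dict, reversible flag).
    """
    root = ET.Element(_q("sbml"), {"level": "3", "version": "1"})
    model = ET.SubElement(root, _q("model"), {"id": model_id})

    comps = ET.SubElement(model, _q("listOfCompartments"))
    for cid in compartments:
        ET.SubElement(comps, _q("compartment"), {"id": cid, "constant": "true"})

    specs = ET.SubElement(model, _q("listOfSpecies"))
    for sid, cid in species.items():
        ET.SubElement(specs, _q("species"), {
            "id": sid, "compartment": cid, "hasOnlySubstanceUnits": "false",
            "boundaryCondition": "false", "constant": "false"})

    rxns = ET.SubElement(model, _q("listOfReactions"))
    for rid, (stoich, reversible) in reactions.items():
        rxn = ET.SubElement(rxns, _q("reaction"), {
            "id": rid, "reversible": "true" if reversible else "false", "fast": "false"})
        # Negative coefficients become reactants, positive ones products (magnitudes written).
        sides = (("listOfReactants", [(s, -c) for s, c in stoich.items() if c < 0]),
                 ("listOfProducts", [(s, c) for s, c in stoich.items() if c > 0]))
        for tag, refs in sides:
            if refs:  # Level 3 Version 1 does not allow empty listOf* elements
                lst = ET.SubElement(rxn, _q(tag))
                for sid, coeff in refs:
                    ET.SubElement(lst, _q("speciesReference"), {
                        "species": sid, "stoichiometry": str(coeff), "constant": "true"})
    return ET.tostring(root, encoding="unicode")


xml_text = to_sbml(
    "example_model", ["c"],
    {s: "c" for s in ("glc__D_c", "atp_c", "g6p_c", "adp_c", "h_c")},
    {"HEX1": ({"glc__D_c": -1, "atp_c": -1, "g6p_c": 1, "adp_c": 1, "h_c": 1}, False)},
)
print(xml_text)
```

Keeping signed coefficients as the internal representation and splitting them into reactant and product lists only at export time lets a balance check and an exporter work from the same reaction data.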