(431c) Systematic extraction of meaningful context-specific models of metabolism using gene expression data
AIChE Annual Meeting
Tuesday, November 16, 2021 - 4:42pm to 5:00pm
Genome-scale models of metabolism are comprehensive encyclopedias of the metabolic pathways in specific organisms. To accurately model the condition-specific physiology of an organism, context-specific models must be extracted from these genome-scale models algorithmically using omics data. However, the choice of extraction method and threshold, together with the existence of alternate optimal solutions, creates ambiguity in both the content and the predictive capabilities of the extracted models. Guidelines for constructing biologically relevant context-specific models are therefore needed. Here we quantify how the choice of gene expression threshold, the protection of required metabolic functionalities, and alternate optima affect models extracted for E. coli and a renal cancer cell line (786O) using GIMME, iMAT, MBA, and mCADRE. Model size is strongly influenced by the choice of extraction method, while the threshold separating high- and low-expression genes affects the distribution of model sizes and the variability in model content. In both E. coli and 786O, GIMME generated the smallest models and MBA the largest. Models extracted using mCADRE were the least variable in size and content for both E. coli and 786O. In contrast, the E. coli models generated using MBA and the 786O models generated using GIMME showed the greatest variability in content, with 28% and 70% of their reactions, respectively, contributing to alternate optimal solutions. Visualizing model performance on a receiver operating characteristic (ROC) plot revealed a trade-off between specificity and sensitivity as the threshold was varied: raising the threshold for high-expression genes increased specificity and reduced sensitivity. The ROC plot is therefore a valuable tool for selecting the combination of threshold and method that yields models accurately reflecting the biology of the organism.
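The ROC-based screening described above can be illustrated with a minimal sketch. It assumes that each extracted model yields a set of genes predicted to be essential (e.g. from in silico knockouts), that essential genes are treated as positives, and that the "best" model is the one whose ROC point lies closest to the ideal corner (false positive rate 0, sensitivity 1). These choices, the function names, and the toy gene sets are hypothetical illustrations, not the authors' exact procedure or data.

```python
import math
from typing import Dict, Set, Tuple


def sensitivity_specificity(predicted_essential: Set[str],
                            observed_essential: Set[str],
                            tested_genes: Set[str]) -> Tuple[float, float]:
    """Return (sensitivity, specificity), treating essential genes as positives.

    Assumption: only genes with experimental knockout data are scored.
    """
    pred = predicted_essential & tested_genes
    obs = observed_essential & tested_genes
    tp = len(pred & obs)                 # essential and predicted essential
    fn = len(obs - pred)                 # essential but predicted non-essential
    fp = len(pred - obs)                 # non-essential but predicted essential
    tn = len(tested_genes - pred - obs)  # non-essential and predicted non-essential
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec


def closest_to_ideal(points: Dict[str, Tuple[float, float]]) -> str:
    """Pick the model whose ROC point (1 - specificity, sensitivity) is nearest (0, 1).

    points maps a model label to its (sensitivity, specificity) pair.
    """
    return min(points, key=lambda k: math.hypot(1 - points[k][1], 1 - points[k][0]))


if __name__ == "__main__":
    # Hypothetical toy data: six tested genes, two of them experimentally essential.
    tested = {"g1", "g2", "g3", "g4", "g5", "g6"}
    observed = {"g1", "g2"}
    # Hypothetical predictions from two models extracted at different thresholds.
    predictions = {
        "model_A": {"g1", "g2", "g3"},  # more permissive: higher sensitivity
        "model_B": {"g1"},              # more stringent: higher specificity
    }
    roc_points = {name: sensitivity_specificity(pred, observed, tested)
                  for name, pred in predictions.items()}
    for name, (sens, spec) in roc_points.items():
        print(f"{name}: sensitivity={sens:.2f}, specificity={spec:.2f}")
    print("closest to ideal:", closest_to_ideal(roc_points))
```

Distance to the (0, 1) corner is only one common way to choose an operating point on an ROC plot; other criteria (e.g. maximizing sensitivity + specificity - 1) can rank models differently.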
Screening the models with the ROC plot identified the GIMME models for E. coli and the mCADRE models for 786O as being in best agreement with gene knockout data. Finally, when flux constraints were not enforced, the growth rate predicted by models generated using MBA and mCADRE was less than 50% of the experimentally measured growth rate. This suggests that merely protecting the reactions associated with required metabolic functionalities during model extraction is insufficient to ensure a biologically relevant flux through them; existing methods must therefore be equipped to protect both the reactions and the fluxes associated with required metabolic functionalities in extracted context-specific models. Based on these findings, we propose a systematic workflow for extracting meaningful context-specific models that accurately represent the cellular metabolic state, along with a set of guidelines to consider when developing improved algorithms for omics data integration.