Improving High-Throughput Genome-Scale Metabolic Model Reconstruction and Validation with TnSeq Data Using ModelSEED 2

Authors: 
Faria, J. P., Argonne National Laboratory
Edirisinghe, J. N., Argonne National Laboratory
Liu, F., University of Minho
Seaver, S. M. D., Argonne National Laboratory
Jeffryes, J. G., Argonne National Laboratory
Zhang, Q., Argonne National Laboratory
Weisenhorn, P., Argonne National Laboratory
Sadkhin, B., Argonne National Laboratory
Gupta, N., Argonne National Laboratory
Gu, T., Argonne National Laboratory
Henry, C. S., Argonne National Laboratory

The Department of Energy Systems Biology Knowledgebase (KBase) is a platform designed to address the grand challenges of systems biology. KBase implements bioinformatics tools that support multiple workflows, including genome annotation, comparative genomics, and metabolic modeling. We selected a phylogenetically diverse set of approximately 1000 genomes and constructed draft genome-scale metabolic models (GSMMs) for them using the ModelSEED pipeline implemented in KBase. We used these 1000 genomes as a test set to improve the quality of the models produced by ModelSEED. First, we curated our mapping of RAST functional roles to biochemistry by reconciling it with data mined from KEGG and from published metabolic models. Second, we corrected errors in our reaction reversibility assertions to improve overall model constraints. Third, we applied a new method to predict auxotrophy across all 1000 genomes and used these predictions to design improved gapfilling media. Fourth, we refined our gapfilling procedure to prevent our draft models from overproducing ATP. Fifth, we processed all models through the Memote pipeline, so that every complete reconstruction is accompanied by a Memote report. We show that these pipeline improvements increase the number of gene associations, decrease the number of gapfilled reactions, improve the accuracy of growth and ATP yield predictions, and decrease the number of blocked reactions across all models. Adding Memote to our pipeline provides a measure of model quality that is consistent across reconstruction platforms. We show how auxotrophy and pathway presence vary along the phylogenetic tree of our 1000 genomes. We also map model quality onto the phylogenetic tree and identify taxa where model quality is lower.
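The abstract reports fewer blocked reactions but does not say how they are counted. A rigorous check usually relies on flux variability analysis (FVA), which needs a linear-programming solver. As a solver-free illustration only, the sketch below applies a simpler dead-end-metabolite heuristic to a toy network. The reaction format, the `find_blocked_reactions` name, and the network are hypothetical and are not part of ModelSEED or Memote. Because a reversible reaction counts as both producing and consuming its metabolites, reversibility assertions directly change which reactions a check like this flags.

```python
from typing import Dict, Set, Tuple

# Hypothetical toy format: reaction id -> (stoichiometry, reversible flag).
# Negative coefficients mark substrates, positive coefficients mark products.
Reaction = Tuple[Dict[str, float], bool]


def find_blocked_reactions(reactions: Dict[str, Reaction], boundary: Set[str]) -> Set[str]:
    """Return reactions that cannot carry steady-state flux for topological reasons.

    A non-boundary metabolite is treated as a dead end if the still-active reactions
    can only produce it, only consume it, or if just one active reaction touches it.
    Reactions touching a dead end are removed, and the check repeats until stable.
    """
    active = set(reactions)
    while True:
        produced: Set[str] = set()
        consumed: Set[str] = set()
        touch_count: Dict[str, int] = {}
        for rid in active:
            stoich, reversible = reactions[rid]
            for met, coef in stoich.items():
                touch_count[met] = touch_count.get(met, 0) + 1
                # A reversible reaction can both produce and consume each metabolite.
                if reversible or coef > 0:
                    produced.add(met)
                if reversible or coef < 0:
                    consumed.add(met)
        dead_ends = {
            met for met in touch_count
            if met not in boundary
            and (met not in produced or met not in consumed or touch_count[met] == 1)
        }
        newly_blocked = {
            rid for rid in active
            if any(met in dead_ends for met in reactions[rid][0])
        }
        if not newly_blocked:
            return set(reactions) - active
        active -= newly_blocked


# Hypothetical toy network; glc_e and co2_e are exchanged with the environment.
toy = {
    "R1": ({"glc_e": -1, "g6p": 1}, False),
    "R2": ({"g6p": -1, "co2_e": 1}, False),
    "R3": ({"g6p": -1, "x": 1}, False),  # feeds a branch that ends in y
    "R4": ({"x": -1, "y": 1}, False),    # y is only produced: dead end
    "R5": ({"g6p": -1, "z": 1}, True),   # z is touched by a single reaction
}
print(sorted(find_blocked_reactions(toy, boundary={"glc_e", "co2_e"})))
# ['R3', 'R4', 'R5']
```

This heuristic only catches topological dead ends; reactions blocked by stoichiometric coupling or by the growth medium still require an LP-based check such as FVA.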
Finally, we selected five genomes for which comprehensive TnSeq data are available and compared model predictions against all of these experimental data, showing a significant improvement in accuracy for models generated by ModelSEED 2 relative to models generated by the original ModelSEED.
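The abstract does not specify how model predictions were scored against TnSeq data. One common approach, assumed here, is to convert TnSeq fitness into binary essential/non-essential calls per gene and tabulate agreement with in silico gene knockout predictions. The sketch below shows only that final tabulation. The function name `essentiality_agreement`, the gene IDs, and the essentiality calls are hypothetical, and the knockout simulations themselves are not shown.

```python
from typing import Dict


def essentiality_agreement(predicted: Dict[str, bool], observed: Dict[str, bool]) -> Dict[str, float]:
    """Score predicted gene essentiality against TnSeq essentiality calls.

    Only genes present in both inputs are compared; True means "essential".
    """
    shared = predicted.keys() & observed.keys()
    tp = sum(1 for g in shared if predicted[g] and observed[g])
    tn = sum(1 for g in shared if not predicted[g] and not observed[g])
    fp = sum(1 for g in shared if predicted[g] and not observed[g])
    fn = sum(1 for g in shared if not predicted[g] and observed[g])
    accuracy = (tp + tn) / len(shared) if shared else float("nan")
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn, "accuracy": accuracy}


# Hypothetical gene IDs and calls, for illustration only.
tnseq = {"g1": True, "g2": True, "g3": False, "g4": False}
old_model = {"g1": True, "g2": False, "g3": True, "g4": False}
new_model = {"g1": True, "g2": True, "g3": False, "g4": True, "g5": True}  # g5 has no TnSeq call

print(essentiality_agreement(old_model, tnseq)["accuracy"])  # 0.5
print(essentiality_agreement(new_model, tnseq)["accuracy"])  # 0.75
```

Plain accuracy can look high when most genes are non-essential, so reporting the confusion counts alongside it (or a balanced measure such as the Matthews correlation coefficient) helps when comparing model versions.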