(598a) A Systems Biology Definition of the Core Proteome of Metabolism and Expression Is Consistent with High-Throughput Data | AIChE

Authors 

Yang, L. - Presenter, University of California, San Diego
Tan, J. - Presenter, University of California, San Diego
O'Brien, E. J. - Presenter, University of California, San Diego
Monk, J. M. - Presenter, University of California, San Diego
Kim, D. - Presenter, University of California, San Diego
Li, H. - Presenter, University of California, San Diego
Charusanti, P. - Presenter, University of California, San Diego
Ebrahim, A. - Presenter, University of California, San Diego
Lloyd, C. J. - Presenter, University of California, San Diego
Yurkovich, J. T. - Presenter, University of California, San Diego
Du, B. - Presenter, University of California, San Diego
Dräger, A. - Presenter, University of Tuebingen
Thomas, A. - Presenter, Technical University of Denmark
Sun, Y. - Presenter, Stanford University
Saunders, M. A. - Presenter, Stanford University
Palsson, B. O. - Presenter, University of California, San Diego

Finding the minimal set of gene functions needed to sustain life is of both fundamental and practical importance. Minimal gene lists have been proposed using comparative genomics-based core proteome definitions. However, a core proteome definition that is supported by empirical data, is understood at the systems level, and provides a basis for computing essential cell functions is still lacking. Here, we use a systems biology-based genome-scale model of metabolism and expression to define a functional core proteome consisting of 356 gene products, accounting for 44% of the Escherichia coli proteome by mass based on proteomics data. This systems biology core proteome includes nearly 200 genes not found in previous comparative genomics-based core proteome definitions, accounts for 65% of known essential genes in E. coli, and overlaps functionally with the minimal genomes of Buchnera aphidicola and Mycoplasma genitalium (AUC = 0.78). Based on transcriptomics data across growth conditions and genetic backgrounds, the systems biology core proteome is significantly enriched in non-differentially expressed genes and depleted in differentially expressed genes. Core gene expression levels are also more similar across genetic backgrounds than those of non-core genes. Furthermore, core genes exhibit significantly more complex transcriptional and post-transcriptional regulatory features (40% more transcription start sites per gene, >20% longer 5′ UTRs). Thus, genome-scale systems biology approaches rigorously identify a functional core proteome needed to support growth. This framework, discerned and validated using high-throughput datasets, facilitates a mechanistic understanding of systems-level core proteome function through in silico models; it de facto defines a paleome.
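
The abstract does not say which statistical test supports the enrichment of the core proteome in non-differentially expressed genes. One common choice for this kind of gene-set question is a one-sided hypergeometric test, sketched below as an illustration only. The function name, its parameters, and the toy counts are hypothetical and are not taken from the study.

```python
from math import comb


def hypergeom_enrichment_p(total, core, non_de, core_non_de):
    """One-sided hypergeometric p-value for enrichment (assumed test, not the authors' stated method).

    total        -- number of genes in the background set
    core         -- number of genes in the core set
    non_de       -- number of background genes that are not differentially expressed
    core_non_de  -- number of core genes that are not differentially expressed

    Returns P(X >= core_non_de) when `core` genes are drawn without
    replacement from `total` genes, of which `non_de` are non-DE.
    """
    upper = min(core, non_de)  # largest possible overlap
    denom = comb(total, core)  # number of ways to pick the core set
    tail = sum(
        comb(non_de, k) * comb(total - non_de, core - k)
        for k in range(core_non_de, upper + 1)
    )
    return tail / denom


# Hypothetical toy counts, chosen only to exercise the function.
p_value = hypergeom_enrichment_p(total=20, core=5, non_de=10, core_non_de=4)
print(f"enrichment p-value: {p_value:.4f}")
```

A small p-value would indicate that the core set contains more non-differentially expressed genes than expected by chance. Depletion in differentially expressed genes can be assessed the same way by summing the lower tail instead.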