Bridging the Gap between Structural Systems Biology and Constraint-Based Modeling: Tools for Defining the Functional Structural Proteome and Extracting Genome-Scale Structural Information for Metabolic and Macromolecular Expression Models | AIChE

Authors 

Catoiu, E. - Presenter, University of California, San Diego
Mih, N., University of California, San Diego
Lachance, J. C., Université de Sherbrooke
Lloyd, C. J., University of California, San Diego
Kavvas, E., University of California, San Diego
Palsson, B. O., University of California, San Diego
Genome-scale models of metabolism and macromolecular expression (ME-models) have used structural information to identify properties of the structural proteome by mapping individual genes to a subunit within a 3D protein structure. While gene mapping is a first step in structural systems biology, enzymes that catalyze biochemical reactions can be composed of multiple protein subunits encoded by multiple genes. Thus, proper gene mapping of multi-subunit protein structures to form a “functional” structural proteome is not only crucial for integrating structural data into genome-scale models, but also provides the basis for discovering and incorporating novel structural properties for constraint development in ME-models. Here, we describe enhancements to the structural systems biology software ssbio that define the functional structural proteome of Escherichia coli and gather novel genome-scale properties from it. Specifically, tools were developed to address the technical challenges of working with structural data, ensuring that structures incorporated into ssbio are in a format that reflects the multi-subunit architecture of the enzyme. Structure quality predictions were improved to identify and remove misfolded protein regions found in homology models. An algorithm was developed to distinguish between multiple conformations of structures available in the Protein Data Bank (PDB), to recreate annotated gene-enzyme stoichiometry, to define the functional structural proteome of E. coli, and to predict incorrect enzyme annotations. Once the functional structural proteome was identified, novel angstrom-level structural properties, including the cross-sectional area of transmembrane enzymes and the soluble volume of enzymes, were calculated. Additionally, computational tools that map residue-specific information, such as enzymatic domains, secondary structure, and strain-specific sequence variation, onto structural data were deployed to analyze in-house mutations of interest.
Together, these structural systems biology tools allow seamless integration of functionally relevant structural information into genome-scale models, potentially leading to a new set of constraints that drive biological discovery.
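
As an illustration of what recreating annotated gene-enzyme stoichiometry can involve, the following minimal Python sketch compares the number of chains mapped to each gene in a candidate structure against an annotated subunit composition. It is not the ssbio API or the algorithm described in the abstract: the function names, the gene identifiers (gene_a, gene_b), and the structure labels (struct_A, struct_B) are hypothetical, and the chain-to-gene mapping is assumed to have been produced by an earlier gene-mapping step.

```python
from collections import Counter


def chain_stoichiometry(chain_to_gene):
    """Count how many chains in one structure map to each gene.

    chain_to_gene: dict of chain ID -> gene ID (assumed to come from a
    prior gene-mapping step; hypothetical input format).
    """
    return dict(Counter(chain_to_gene.values()))


def matches_annotation(chain_to_gene, annotated):
    """Return True if the structure's per-gene chain counts equal the
    annotated gene-enzyme stoichiometry exactly."""
    return chain_stoichiometry(chain_to_gene) == dict(annotated)


def select_functional_structures(candidates, annotated):
    """Return the IDs of candidate structures whose subunit composition
    reproduces the annotated stoichiometry, in input order."""
    return [
        structure_id
        for structure_id, chain_to_gene in candidates.items()
        if matches_annotation(chain_to_gene, annotated)
    ]


# Hypothetical annotation: a heterotetramer with two copies of each gene product.
annotated = {"gene_a": 2, "gene_b": 2}

# Hypothetical candidate structures with their chain-to-gene mappings.
candidates = {
    "struct_A": {"A": "gene_a", "B": "gene_a", "C": "gene_b", "D": "gene_b"},
    "struct_B": {"A": "gene_a"},  # a single subunit only
}

functional = select_functional_structures(candidates, annotated)
```

In this sketch only struct_A is retained, because struct_B contains a single subunit. An exact-match rule is the simplest possible choice; a real pipeline would also need to handle integer multiples of the annotated stoichiometry, alternative conformations of the same complex, and cases where the mismatch points to an incorrect annotation rather than an incomplete structure.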