(598g) An Information Theoretic Approach to the Model Selection Problem in Systems Biology
AIChE Annual Meeting
Wednesday, November 11, 2015 - 5:15pm to 5:35pm
Data from biological systems can be modeled with a variety of mathematical descriptions. For instance, a given set of metabolic flux data can be explained with kinetic, constraint-based, or cybernetic formulations of metabolism. While intuitive arguments may justify selecting one model over another, the field of systems biology has yet to widely adopt a common scale for weighing the utility of competing models. Information theory offers a principled way to address this model selection problem: models can be distinguished by their ability to describe the regularity in a system’s output data. Capturing this regularity is itself an act of compressing the data, and information theoretic methods are therefore applicable. In particular, the Minimum Description Length (MDL) principle can be applied to candidate biological models to compare how well they describe regularity in data. Metrics derived from MDL principles seek to identify the single model, from a set of candidates, that best compresses the data. They do so by considering both the model’s goodness of fit (its likelihood) and its inherent complexity. The model that best compresses data from a process, relative to the other candidates, can be said to be the most efficient and useful description of the system.
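The trade-off between fit and complexity described above can be written compactly. The following is the standard textbook two-part form of MDL together with its common large-sample approximation; it is offered as an illustration and is not necessarily the exact formulation used in the talk. The symbols are assumptions introduced here: $D$ is the data, $M$ a candidate model, $\hat{\theta}_M$ its maximum-likelihood parameters, $k_M$ the number of free parameters, and $n$ the number of data points.

```latex
% Two-part description length: bits to state the model plus bits to state the data given the model
L(D) \;=\; \min_{M} \bigl[\, L(M) + L(D \mid M) \,\bigr]

% Common asymptotic approximation (in nats): a fit term plus a complexity penalty
L(D \mid M) + L(M) \;\approx\; -\ln p\bigl(D \mid \hat{\theta}_M\bigr) \;+\; \frac{k_M}{2}\,\ln n
```

Under this approximation, an extra parameter is worth keeping only if it raises the maximized log-likelihood by more than roughly $\tfrac{1}{2}\ln n$.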
In this talk, MDL is employed to compare metabolic models on the basis of how well they compress metabolic flux data. Both static and dynamic data are considered for S. oneidensis and E. coli growing on lactate and glucose, respectively. Metrics built on MDL principles, including the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC), are applied to a group of dynamic metabolic models to determine which model best compresses the metabolic data by maximizing model fit while minimizing model complexity. For both nested and non-nested metabolic models, this analysis identifies a point of diminishing returns beyond which additional model complexity provides little gain in describing the data accurately. Starting with a comparison of flux predictions made by constraint-based, kinetic, and cybernetic metabolic models, this work develops a framework that is intended to extend to other model comparison applications in systems biology.
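As a rough illustration of how AIC and BIC trade fit against complexity, the sketch below scores a family of nested polynomial models on synthetic data. Everything here is an assumption made for illustration: the data are randomly generated (not metabolic flux measurements), the polynomial models stand in for the metabolic models of the talk, the errors are assumed Gaussian, and the helper names (`gaussian_log_likelihood`, `score_polynomial_models`) are hypothetical.

```python
import numpy as np


def gaussian_log_likelihood(residuals):
    """Maximized Gaussian log-likelihood, with the noise variance at its MLE (RSS / n)."""
    n = residuals.size
    sigma2 = float(np.sum(residuals ** 2)) / n
    return -0.5 * n * (np.log(2.0 * np.pi * sigma2) + 1.0)


def aic(log_lik, k):
    """Akaike Information Criterion: 2k - 2 ln(L_hat)."""
    return 2.0 * k - 2.0 * log_lik


def bic(log_lik, k, n):
    """Bayesian Information Criterion: k ln(n) - 2 ln(L_hat)."""
    return k * np.log(n) - 2.0 * log_lik


def score_polynomial_models(t, y, max_degree):
    """Fit polynomials of degree 0..max_degree and return (degree, AIC, BIC) tuples."""
    scores = []
    for degree in range(max_degree + 1):
        coeffs = np.polyfit(t, y, degree)
        residuals = y - np.polyval(coeffs, t)
        k = degree + 2  # degree + 1 coefficients, plus the noise variance
        log_lik = gaussian_log_likelihood(residuals)
        scores.append((degree, aic(log_lik, k), bic(log_lik, k, t.size)))
    return scores


# Synthetic data (an assumption for illustration): a quadratic trend plus Gaussian noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 40)
y = 1.0 + 2.0 * t - 3.0 * t ** 2 + rng.normal(scale=0.05, size=t.size)

scores = score_polynomial_models(t, y, max_degree=6)
for degree, aic_value, bic_value in scores:
    print(f"degree={degree}  AIC={aic_value:8.2f}  BIC={bic_value:8.2f}")

# Lower is better for both criteria.
best_by_aic = min(scores, key=lambda s: s[1])[0]
best_by_bic = min(scores, key=lambda s: s[2])[0]
print("preferred degree by AIC:", best_by_aic, " by BIC:", best_by_bic)
```

Plotting either criterion against the number of parameters is one simple way to look for a point of diminishing returns: the score drops sharply while added parameters capture real structure, then flattens or rises once they mostly fit noise. Non-nested models can be scored the same way as long as each supplies a maximized likelihood on the same data.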