(315g) Optfill: A Novel Optimization-Based Tool to Automate the Gapfilling of Genome-Scale Metabolic Models | AIChE

Schroeder, W. - Presenter, The Pennsylvania State University
Saha, R., University of Nebraska-Lincoln

W. Schroeder and Rajib Saha

University of Nebraska-Lincoln, Lincoln, NE

Modeling of metabolism is now an indispensable tool for understanding, discovering, and
redesigning biological systems. By defining the metabolic space, genome-scale metabolic (GSM)
models can assess allowable cellular phenotypes and explore metabolic potential and
restrictions under specific environmental and/or genetic conditions. GSM model curation
typically involves gleaning information on gene annotations and reactions from major public
databases such as KEGG, UniProt, MetaCyc, and ModelSEED. However, inconsistencies across these
databases and incomplete gene annotations leave gaps in any GSM model. One of the major
gap-filling tools, GapFill (along with its many offshoots), applies a Mixed Integer Linear
Programming (MILP)-based approach that fills gaps in a GSM model either by adding
functionalities from closely related organisms or by changing the direction of existing
reactions. Although GapFill automates the model-building process, it fixes gaps individually
and does not consider whether the fixes create thermodynamically infeasible cycles (TICs).
Hence, GapFill always makes redundant changes and increases the number of TICs in GSM
models, which ultimately require further manual scrutiny.
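
To make the two terms concrete, the following minimal Python sketch shows a gap (a dead-end
metabolite) and a TIC in a toy network. It is an illustration only, not GapFill or OptFill:
the reaction names, the `EX_` prefix for exchange reactions, and both helper functions are
hypothetical, and every toy reaction is assumed to be irreversible.

```python
# Toy network (hypothetical): reaction -> {metabolite: stoichiometric coefficient}.
# Negative coefficients are consumed, positive coefficients are produced.
REACTIONS = {
    "EX_A": {"A": 1},            # uptake of A (exchange reaction)
    "R1":   {"A": -1, "B": 1},   # A -> B
    "R2":   {"B": -1, "C": 1},   # B -> C, but nothing consumes C
    "R3":   {"D": -1, "E": 1},   # D -> E, but nothing produces D
    "EX_E": {"E": -1},           # secretion of E (exchange reaction)
}


def dead_end_metabolites(reactions):
    """Return metabolites that are only produced or only consumed (gaps)."""
    produced, consumed = set(), set()
    for stoich in reactions.values():
        for met, coeff in stoich.items():
            if coeff > 0:
                produced.add(met)
            elif coeff < 0:
                consumed.add(met)
    return sorted(produced ^ consumed)


def is_internal_cycle(reactions, flux, exchange_prefix="EX_"):
    """True if `flux` is a nonzero, mass-balanced loop that uses no exchange
    reaction, i.e. flux circulating with no net input (a toy TIC)."""
    if any(v < 0 for v in flux.values()):
        return False  # all toy reactions are irreversible
    if any(name.startswith(exchange_prefix) and v != 0 for name, v in flux.items()):
        return False
    if all(v == 0 for v in flux.values()):
        return False
    balance = {}
    for name, v in flux.items():
        for met, coeff in reactions[name].items():
            balance[met] = balance.get(met, 0) + coeff * v
    return all(b == 0 for b in balance.values())


# Fixing the gap at C in isolation by adding C -> A removes that dead end ...
naive_fill = dict(REACTIONS, R4={"C": -1, "A": 1})
print(dead_end_metabolites(REACTIONS))    # ['C', 'D']
print(dead_end_metabolites(naive_fill))   # ['D']
# ... but it closes the loop A -> B -> C -> A, which carries flux with no uptake.
print(is_internal_cycle(naive_fill, {"R1": 1, "R2": 1, "R4": 1}))  # True
```

In this toy case the per-gap fix resolves one dead end, leaves the other untouched, and
introduces a cycle that would have to be found and removed by hand.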

To address these issues and to increase the automation of GSM model reconstruction, an
improved method, OptFill, is introduced here to fill gaps in GSM models. OptFill applies a
multi-level Mixed Integer Non-Linear Programming (MINLP) optimization approach that
determines the needed fixes on a per-model basis (as opposed to the per-metabolite/per-gap
basis of GapFill) and pursues three objectives: maximize the number of gaps fixed, minimize
the number of reactions/functionalities added, and minimize the number of TICs created. As
each embedded optimization level is MINLP in form, Lagrangian duality is used to reduce the
multi-level, multi-objective formulation to a single-level, single-objective formulation
for ease of solution. OptFill is currently being applied to the development of a GSM model
of the poorly annotated black yeast Exophiala dermatitidis. Since approximately 4% of the
open reading frames of Exophiala dermatitidis are annotated with enzyme classifications,
the initial GSM reconstruction has a very large number of metabolic gaps. OptFill thus
provides a distinct advantage over the traditional GapFill approach in the extent of
automation, the speed of model development, and the amount of manual curation needed after
gap filling. This advantage is especially evident for non-model, poorly annotated, and
under-studied organisms such as Exophiala dermatitidis.
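
The per-model idea and the three objectives can be illustrated with a small Python sketch.
This is not the OptFill formulation: it replaces the MINLP and the Lagrangian-duality
reduction with brute-force enumeration over a toy candidate database, it treats the three
objectives as a strict priority order (an assumption made here for illustration), and it
counts TICs as simple directed cycles among internal one-substrate, one-product reactions.
The model, the database, and all function names are hypothetical.

```python
from itertools import combinations

# Hypothetical draft model and candidate database. Each reaction is irreversible
# and written as (substrate, product); None stands for the outside of the cell.
MODEL = [(None, "A"), ("A", "B"), ("B", "C"), ("D", "E"), ("E", None)]
DATABASE = [("C", "A"), ("C", "D"), ("B", "D"), ("C", None)]


def dead_ends(reactions):
    """Metabolites that are only produced or only consumed."""
    produced = {p for _, p in reactions if p is not None}
    consumed = {s for s, _ in reactions if s is not None}
    return produced ^ consumed


def count_cycles(reactions):
    """Count simple directed cycles among internal reactions (toy TIC count)."""
    edges = {}
    for s, p in reactions:
        if s is not None and p is not None:
            edges.setdefault(s, []).append(p)
    count = 0

    def walk(start, node, visited):
        nonlocal count
        for nxt in edges.get(node, []):
            if nxt == start:
                count += 1
            elif nxt not in visited and nxt > start:
                # Only visit nodes larger than `start`, so each cycle is
                # counted once, from its smallest metabolite name.
                walk(start, nxt, visited | {nxt})

    for start in sorted(edges):
        walk(start, start, {start})
    return count


def score(added):
    """Sort key: most gaps fixed, then fewest reactions added, then fewest TICs."""
    network = MODEL + list(added)
    fixed = len(dead_ends(MODEL)) - len(dead_ends(network))
    tics = count_cycles(network) - count_cycles(MODEL)
    return (-fixed, len(added), tics)


def best_fill(database):
    """Evaluate every subset of the database against the whole model at once."""
    candidates = [c for k in range(len(database) + 1)
                  for c in combinations(database, k)]
    return min(candidates, key=score)


print(best_fill(DATABASE))                  # (('C', 'D'),)
print(score((("C", "A"), ("B", "D"))))      # (-2, 2, 1): one fix per gap
print(score((("C", "D"),)))                 # (-2, 1, 0): one fix for the model
```

In this toy case, fixing the two dead ends one at a time can end with two added reactions
and one cycle, whereas scoring candidate sets against the whole model finds a single
reaction that resolves both gaps without creating a cycle. Enumeration grows exponentially
with database size, which is why a real tool would need an optimization formulation.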