(444a) Optfill: A Novel Optimization-Based Tool to Automate the Gapfilling of Genome-Scale Metabolic Models | AIChE



Schroeder, W. - Presenter, The Pennsylvania State University
Saha, R., University of Nebraska-Lincoln

OptFill: A Novel Optimization-Based Tool to Automate the Gapfilling of
Genome-Scale Metabolic Models


W. Schroeder and Rajib Saha

University of Nebraska – Lincoln, Lincoln, NE

Modeling of metabolism is now an indispensable tool for understanding,
discovering, and redesigning biological systems. By defining the metabolic
space, genome-scale metabolic (GSM) models can assess allowable cellular
phenotypes and explore metabolic potential and restrictions under specific
environmental and/or genetic conditions. GSM model curation typically involves
gleaning information on gene annotations and reactions from major public
databases such as KEGG, UniProt, MetaCyc, and ModelSEED. However,
inconsistencies across these databases and incomplete gene annotations leave
gaps in GSM models. One of the major tools, GapFill (along with its many
offshoots), applies a Mixed Integer Linear Programming (MILP)-based approach
that fills gaps in a GSM model either by adding functionalities from closely
related organisms or by changing the direction of existing reactions. Although
GapFill automates the model-building process, it fixes gaps individually and
does not guard against creating thermodynamically infeasible cycles (TICs).
Hence, in the vast majority of cases GapFill makes redundant changes and
increases the number of TICs in the GSM model, which ultimately requires
further manual scrutiny.
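
To make the TIC concern concrete, the following is a minimal sketch, not part of GapFill or OptFill. It assumes a toy network in which every reaction is internal (no exchange with the environment), so any nonzero flux pattern that leaves every metabolite balanced is a cycle with no thermodynamic driving force. The reaction names `R1` and `R2` and the helper names are hypothetical.

```python
def net_production(stoich, fluxes):
    """Net production of each metabolite: sum over reactions of coeff * flux."""
    totals = {}
    for rxn, flux in fluxes.items():
        for met, coeff in stoich[rxn].items():
            totals[met] = totals.get(met, 0.0) + coeff * flux
    return totals


def is_tic(stoich, fluxes, tol=1e-9):
    """True if a nonzero flux pattern over internal reactions is fully balanced."""
    active = any(abs(v) > tol for v in fluxes.values())
    balanced = all(abs(x) <= tol for x in net_production(stoich, fluxes).values())
    return active and balanced


# Toy model with one internal reaction, A -> B (hypothetical).
model = {"R1": {"A": -1, "B": 1}}
# Filling a gap by adding the reverse reaction B -> A (hypothetical).
filled = dict(model, R2={"B": -1, "A": 1})

before = is_tic(model, {"R1": 1.0})               # A is consumed, B accumulates
after = is_tic(filled, {"R1": 1.0, "R2": 1.0})    # A and B both balance
```

Here `before` is `False` and `after` is `True`: the single added reaction closes a loop that can carry flux with no net conversion, which is the kind of side effect a per-gap fix can introduce.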

To address these issues and to increase the automation of GSM model
reconstruction, we introduce an improved method, OptFill, to fill gaps in GSM
models. OptFill applies a multi-level Mixed Integer Non-Linear Programming
(MINLP) optimization approach that addresses the needed fixes on a per-model
basis (as opposed to the per-metabolite/gap basis of GapFill). Subject to a
model and a database of potential filling reactions, it pursues three
objectives: maximize the number of gaps fixed, minimize the number of
reactions/functionalities added, and prevent TIC creation. This is accomplished
through two subtools: TICFind and ModelFill. The TICFind problem is first
applied separately to the model and to the database of potential filling
reactions, and the identified TICs are resolved to ensure high-quality input
datasets. The TICFind problem is then applied to the model and the database
together to identify sets of reactions that should not be added in their
entirety if TICs are to be avoided. The ModelFill tool is a bi-level
optimization problem that requires Lagrangian duality to reduce the multi-level
multi-objective formulation to a single-level single-objective formulation for
ease of solution. OptFill is applied to published models of E. coli to
demonstrate that it can improve already published models by removing tens of
TICs and increasing model connectivity in non-intuitive ways. In addition,
OptFill is currently being applied to the development of a GSM model of a
poorly annotated black yeast, Exophiala dermatitidis. Since only approximately
4% of the open reading frames of Exophiala dermatitidis are annotated with
enzyme classifications, the initial GSM reconstruction has a huge number of
metabolic gaps. Thus, OptFill provides a distinct advantage over the
traditional GapFill approach in the extent of automation, the speed of model
development, and the amount of manual curation needed after filling the gaps.
This advantage is most evident for non-model, poorly annotated, and
under-studied organisms such as Exophiala dermatitidis.
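
The per-model selection described above can be sketched on a toy problem. This is an illustrative stand-in under stated assumptions, not the OptFill formulation: it uses brute-force enumeration instead of MINLP, simple network expansion from seed metabolites instead of flux balance, and it assumes the three objectives are prioritized as "most gaps fixed, then fewest reactions added", with TIC avoidance as a hard constraint. All reaction names, the `forbidden_sets` input (standing in for TICFind output), and the function names are hypothetical.

```python
from itertools import combinations


def producible(reactions, seeds):
    """Network expansion: metabolites reachable from the seed metabolites."""
    found = set(seeds)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions:
            if set(substrates) <= found and not set(products) <= found:
                found |= set(products)
                changed = True
    return found


def fill_gaps(model, database, seeds, targets, forbidden_sets):
    """Pick database reactions that fix the most targets with the fewest additions."""
    names = sorted(database)
    best_names, best_key = (), None
    for k in range(len(names) + 1):
        for subset in combinations(names, k):
            chosen = set(subset)
            # Hard constraint: never add a TIC-forming set in its entirety.
            if any(tic <= chosen for tic in forbidden_sets):
                continue
            rxns = list(model) + [database[n] for n in subset]
            fixed = len(set(targets) & producible(rxns, seeds))
            key = (fixed, -k)  # more gaps fixed first, then fewer reactions
            if best_key is None or key > best_key:
                best_names, best_key = subset, key
    return best_names, best_key[0]


model = [(("A",), ("B",))]            # A -> B
database = {
    "d1": (("B",), ("C",)),           # B -> C
    "d2": (("C",), ("D",)),           # C -> D
    "d3": (("B",), ("D",)),           # B -> D
    "d4": (("D",), ("B",)),           # D -> B
}
forbidden_sets = [frozenset({"d3", "d4"}), frozenset({"d1", "d2", "d4"})]

added, n_fixed = fill_gaps(model, database, {"A"}, {"C", "D"}, forbidden_sets)
```

With these inputs, `added` is `("d1", "d2")` and `n_fixed` is `2`: both gaps are closed with two reactions and no forbidden set is added whole. Exhaustive enumeration is only workable at toy scale, which is why a real tool would pose the selection as an optimization problem.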