# (317h) Redemption: Reduced Dimensional Ensemble Modeling and Parameter Estimation

- Conference: AIChE Annual Meeting
- Year: 2014
- Proceeding: 2014 AIChE Annual Meeting
- Group: Computing and Systems Technology Division
- Session:
- Time: Tuesday, November 18, 2014 - 2:36pm-2:54pm

The creation of kinetic ordinary differential equation (ODE) models of biological networks is often hampered by imprecise knowledge of reaction rate equations and their associated parameters. The model is generally written as:

dX/dt = Sv(x,p) (1)

where X is the vector of species concentrations, S is the stoichiometric matrix, v(X,p) is the vector of rate equations, and p is the vector of kinetic parameters. Here, the estimation of unknown kinetic parameters from time-series concentration data frequently becomes the bottlenecking step in the model building, motivating the development of a large number of methods [1]. Existing methods can generally be divided into two groups: integral and differential approach, based on whether or not the ODE model is integrated during the estimation. In the integral approach, the parameter estimation is formulated as a constrained optimization to minimize model prediction error. Meanwhile, the differential approach involves a pre-processing step, in which the time-series data are smoothen and differentiated to give estimates of dX/dt and Eq (1) is then used to compute dynamic reaction flux estimates v. Afterwards, the parameters are estimated by minimizing flux prediction errors. The main advantage of the differential over the integral approach is computational speed, as the ODEs need not be integrated and the parameter estimation can be done one flux at a time. However, the differential approach is known to give biased parameter estimates, and the parameter accuracy is sensitive to the data-smoothing step.

In biological networks, the number of species often exceeds that of reactions, and thus the matrix S does not have a full column rank. Consequently, the differential approach cannot be directly applied to metabolic networks, as the dynamic flux estimation becomes underdetermined. Despite this restriction, it may be possible to estimate the dynamic fluxes when the data are sufficiently dense or by taking minimum norm solution [2, 3]. In a new class of methods, we have used the relationship between dX/dt and v in Eq. (1) to reduce the dimensionality of the parameter estimation [4]. Specifically, we first divided the fluxes into independent and dependent subsets, vi and vd, such that the dependent fluxes could be uniquely determined from the independent fluxes according to Eq. (1). Given the values of the (independent) parameters of vi(X,p) and data of X, the flux vi could be calculated according to the rate equations vi(X,p). Subsequently, the flux vd were obtained by solving Eq. (1) using the estimates of dX/dt from data and the vi values above. The (dependent) parameters of vd(X,p) could then be estimated by least square optimization of the differences between the flux vd and the flux function vd(X,p), performed one flux at a time. Therefore, the parameter estimation was done only over the independent parameters of vi(X,p). We have shown using case studies that such a strategy, called incremental parameter estimation (IPE), could provide more than two orders of magnitude improvement in the computational speed over the traditional parameter estimation using the same error function [4].

However, the strategy above still depends on time-series data preprocessing and thus suffers from the same issues affecting any differential estimation method. Bhatt and coworkers [5-7] recently proposed differential estimation methods that avoided time-series data smoothing and differentiation, by estimating the extents of reactions, not the reaction fluxes. Nevertheless, the estimation of reaction extents requires the S matrix to have a full column rank. In this work, we developed a new reduced dimensional parameter estimation, motivated by the extents of reactions. The new method involved the calculation of integrated fluxes V, based on rewriting Eq. (1) in the integral form:

X(t) – X(0) = SV(X,p) (2)

We divided the integrated fluxes V into independent and dependent components, and performed the parameter search only for those that appeared in the independent subset. We demonstrated the performance of the integrated flux estimation method using two case studies: an ODE model of branched metabolic network and a lin-log model of *L. lactis *metabolic pathways. The case studies showed that the integrated flux strategy could provide parameter estimates with better accuracy than the IPE method without the need to perform data smoothing. For the lin-log model, the integrated flux could provide a solution while previous methods failed to converge.

The difficulty in the parameter estimation above has been attributed to the parameter identifiability, or the lack thereof. In other words, there can exist multiple parameter combinations that are indistinguishable according to the data. This issue has motivated the application of ensemble modeling through the creation of a family of models. Within the umbrella of ensemble modeling, we have created a method for the generation of an ensemble of parameters that can provide statistically equivalent fit to the experimental data [8]. The method was based on the parameter estimation approach above. Here, instead of finding the global minimum solution of the error function, we used a combination of an out-of-equilibrium Adaptive Monte Carlo method and a multiple ellipsoids-based sampling method [9] to identify a set of parameters whose error functions were within a specified threshold. The threshold error value could be set using statistical score (e.g. F-distribution) or estimated by bootstrapping the original data using Monte Carlo approach. Our ensemble modeling method has the advantage that the ensemble could be compactly defined over only the space of the independent parameters.

The methods mentioned above are available as a MATLAB based toolbox, called REDEMPTION (REduced Dimensional Ensemble Modeling and Parameter estimaTION). Here, we have used eSS (Enhanced Scatter Search) algorithm from SSmGO (Scatter Search for Matlab Global Optimization) Toolbox for the global optimization [10, 11], and CVODE from SUNDIALS (SUite of Nonlinear and DIfferential/ALgebraic equation Solvers) for solving ODEs [12]. For the exploration of parameters for ensemble modeling, we have used the HYPERSPACE toolbox [9].

1. Chou, I.C. and E.O. Voit, *Recent developments in parameter estimation and structure identification of biochemical and genomic systems.* Mathematical Biosciences, 2009. **219**(2): p. 57-83.

2. Chou, I.C. and E.O. Voit, *Estimation of dynamic flux profiles from metabolic time series data.* BMC Systems Biology, 2012. **6**.

3. Voit, E.O., *Characterizability of metabolic pathway systems from time series data.*Mathematical Biosciences, 2013.

4. Jia, G.J., G. Stephanopoulos, and R. Gunawan, *Incremental parameter estimation of kinetic metabolic network models.* BMC Systems Biology, 2012. **6**.

5. Amrhein, M., et al., *Extents of Reaction and Flow for Homogeneous Reaction Systems with Inlet and Outlet Streams.* Aiche Journal, 2010. **56**(11): p. 2873-2886.

6. Bhatt, N., M. Amrhein, and D. Bonvin, *Incremental Identification of Reaction and Mass-Transfer Kinetics Using the Concept of Extents.* Industrial & Engineering Chemistry Research, 2011. **50**(23): p. 12960-12974.

7. Bhatt, N., et al., *Incremental identification of reaction systems-A comparison between rate-based and extent-based approaches.* Chemical Engineering Science, 2012. **83**: p. 24-38.

8. Jia, G., G. Stephanopoulos, and R. Gunawan, *Ensemble Kinetic Modeling of Metabolic Networks from Dynamic Metabolic Profiles.* Metabolites, 2012. **2**(4): p. 891-912.

9. Zamora-Sillero, E., et al., *Efficient characterization of high-dimensional parameter spaces for systems biology.* BMC Systems Biology, 2011. **5**.

10. Egea, J.A., et al., *Scatter search for chemical and bio-process optimization.* Journal of Global Optimization, 2007. **37**(3): p. 481-503.

11. Rodriguez-Fernandez, M., J.A. Egea, and J.R. Banga, *Novel metaheuristic for parameter estimation in nonlinear dynamic biological systems.* BMC Bioinformatics, 2006. **7**.

12. Hindmarsh, A.C., et al., *SUNDIALS: Suite of nonlinear and differential/algebraic equation solvers.* Acm Transactions on Mathematical Software, 2005. **31**(3): p. 363-396.