(711b) Model Parameterization through Data-Mining

Authors: 
Holiday, A., Princeton University
Jiang, Y., Georgia Institute of Technology
Kooshkbaghi, M., Princeton University
Gear, W., Princeton University
Kevrekidis, Y. G., Princeton University
Introduction: The detailed evolution of complex systems often involves a disparity of scales in time and space. A low-dimensional representation of such dynamics is usually achievable through systematic model reduction. In most cases, the low-dimensional model can be parameterized by a small number of (possibly nonlinear combinations of) parameters [1]. One should therefore find the reduced set of identifiable, effective parameters for the low-order model. In the present study, we employ a data-driven approach to identify the parameters that affect the model’s outcome globally. The approach is explained and validated through a sequence of examples, including Michaelis-Menten-type reaction networks.

Methodology: Principal Component Analysis (PCA) is a well-established approach for finding parameters that span a low-dimensional linear subspace of the system, that is, good global reduced coordinates (parameters). However, it is known that in physical and chemical kinetics such a linear subspace is a poor description of the system, and nonlinear embedding approaches should be considered. Diffusion maps (Dmaps), a nonlinear model reduction technique, have been repeatedly applied to find low-dimensional nonlinear manifolds underlying high-dimensional datasets [1] and offer the potential of finding ‘good’ global reduced coordinates [2].
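As a point of reference, the standard diffusion map construction of [1] can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `diffusion_map`, the Gaussian kernel scale `epsilon`, the density normalization with exponent 1, and the half-circle test data are all choices made here for illustration.

```python
import numpy as np

def diffusion_map(X, epsilon, n_coords=2):
    """Diffusion map coordinates for the rows of X (n_samples x n_variables).

    Assumed choices: Gaussian kernel exp(-d^2 / epsilon) and the
    density normalization with exponent alpha = 1.
    """
    X = np.asarray(X, dtype=float)
    # Pairwise squared Euclidean distances (clipped at 0 against round-off).
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    K = np.exp(-d2 / epsilon)
    # Divide out the sampling density so the embedding reflects geometry.
    q = K.sum(axis=1)
    K = K / np.outer(q, q)
    # Symmetric matrix similar to the row-stochastic Markov matrix.
    d = K.sum(axis=1)
    S = K / np.sqrt(np.outer(d, d))
    evals, evecs = np.linalg.eigh(S)
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]
    # Convert to right eigenvectors of the Markov matrix.
    psi = evecs / np.sqrt(d)[:, None]
    # Drop the trivial constant eigenvector (eigenvalue 1).
    return evals[1:n_coords + 1], psi[:, 1:n_coords + 1]

# Illustrative data: points on a half circle, a 1-D curve embedded in 2-D.
t = np.linspace(0.0, np.pi, 200)
X = np.column_stack([np.cos(t), np.sin(t)])
evals, coords = diffusion_map(X, epsilon=0.05, n_coords=2)

# The leading nontrivial coordinate should vary monotonically along the arc.
corr_with_arc = abs(np.corrcoef(coords[:, 0], t)[0, 1])
```

In this toy case the first nontrivial coordinate acts as a one-dimensional parameterization of the curve. The value of `epsilon` has to be tuned to the sampling density; the value used here is only suitable for this particular dataset.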

By using the diffusion distance, Dmaps offer a more meaningful metric than the Euclidean one, particularly on nonlinear manifolds. However, the metric and kernel in Dmaps can be modified further to organize the intrinsic parameters even better than the original formulation does. For example, if the observed data in parameter space lie in the neighborhood of level sets of a function, then two points on the same level set should be considered more similar to each other than either is to a point elsewhere in parameter space at the same distance. This issue is addressed and a solution is proposed in this work, leading to an identifiable, effective global parameterization of the low-order model, while the problem setup can be treated as a black box for which we only have access to the observed output. The solution can be regarded as a combination of parameterization and sensitivity analysis, with a considerably lower numerical cost. The method is applied to datasets organized as an n_samples × n_variables matrix, which can be the output of numerical simulations or experiments.
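One way to picture the level-set idea is to build the kernel from distances between model outputs rather than distances between the parameter vectors themselves. The sketch below is a hedged illustration of that idea only; the abstract does not specify the exact kernel used in this work. The black-box function `toy_model`, the helper `dmap_from_sqdist`, the parameter grid, and the value of `epsilon` are all hypothetical choices made for the example.

```python
import numpy as np

def dmap_from_sqdist(d2, epsilon, n_coords=1):
    """Diffusion map from a precomputed squared-distance matrix (alpha = 1)."""
    K = np.exp(-d2 / epsilon)
    q = K.sum(axis=1)
    K = K / np.outer(q, q)
    d = K.sum(axis=1)
    S = K / np.sqrt(np.outer(d, d))
    evals, evecs = np.linalg.eigh(S)
    # Sort descending and skip the trivial eigenvector with eigenvalue 1.
    order = np.argsort(evals)[::-1][1:n_coords + 1]
    return evals[order], evecs[:, order] / np.sqrt(d)[:, None]

def toy_model(p):
    """Hypothetical black box whose output depends only on p1 * p2."""
    return p[:, 0] * p[:, 1]

# Sample the two-dimensional parameter space on a regular grid.
g = np.linspace(1.0, 2.0, 15)
P = np.array([(a, b) for a in g for b in g])

# Only the observed output enters the metric, not the parameters.
y = toy_model(P)
d2_out = (y[:, None] - y[None, :]) ** 2
evals, phi = dmap_from_sqdist(d2_out, epsilon=0.1)

# The leading coordinate should track the effective parameter p1 * p2
# much more closely than it tracks either raw parameter.
corr_product = abs(np.corrcoef(phi[:, 0], y)[0, 1])
corr_p1 = abs(np.corrcoef(phi[:, 0], P[:, 0])[0, 1])

# (p1, p2) = (1, 2) and (2, 1) lie on the same level set of the output.
same_level_gap = abs(phi[14, 0] - phi[210, 0])
```

Here two parameter settings that are far apart in the Euclidean sense, such as (1, 2) and (2, 1), receive the same diffusion coordinate because they produce the same output, so the embedding organizes the data by the single effective combination rather than by the two raw parameters.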

Results: In the first example, the dataset is generated from Michaelis-Menten enzyme kinetics, for which the parameterization and the low-dimensional model are well established analytically. The method is then applied to filtered, highly resolved Euler-Lagrange gas-particle simulations to identify an optimal set of filtered variables and sub-grid correlations, in search of constitutive models for two-phase flow.
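For readers unfamiliar with the first example, the textbook quasi-steady-state reduction of Michaelis-Menten kinetics shows what an analytically known effective parameterization looks like. The notation below (rate constants k_1, k_{-1}, k_2, total enzyme E_0, substrate concentration S) is the standard one and is assumed here; the abstract does not state which variant or parameter ranges were used.

```latex
% Mechanism: E + S <-> C -> E + P, with rate constants k_1, k_{-1}, k_2.
% Under the quasi-steady-state approximation for the complex C:
\frac{dS}{dt} \approx -\frac{V_{\max}\, S}{K_M + S},
\qquad
V_{\max} = k_2 E_0,
\qquad
K_M = \frac{k_{-1} + k_2}{k_1}.
```

In this reduced model the four quantities k_1, k_{-1}, k_2 and E_0 enter only through the two combinations V_max and K_M, which is the kind of effective parameter combination a data-driven method would be expected to recover.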

Summary: We use a modification of Diffusion Maps, a manifold learning technique, that employs an output-inspired metric in the input (i.e. parameter) space of a dynamic model. This approach allows us to identify the number and type of relevant parameter combinations, and to identify singular as well as regular perturbation regimes without formulas, in a purely data-driven manner.

References:

[1] Coifman, Ronald R., and Stéphane Lafon. "Diffusion maps." Applied and Computational Harmonic Analysis 21, no. 1 (2006): 5-30.

[2] Singer, Amit, Radek Erban, Ioannis G. Kevrekidis, and Ronald R. Coifman. "Detecting intrinsic slow variables in stochastic dynamical systems by anisotropic diffusion maps." Proceedings of the National Academy of Sciences 106, no. 38 (2009): 16090-16095.
