(333d) An Optimization-Based Approach for Learning Simple Parametric Surrogate Models

Authors: 
Wilson, Z. - Presenter, Carnegie Mellon University
Sahinidis, N., Carnegie Mellon University
Data obtained through simulations or experiments are routinely used to inform process decisions and build system models. To describe complex systems with nonlinear behavior, non-parametric methods such as artificial neural networks or support vector machines are commonly used to build accurate system models. However, these models often suffer from overfitting. Moreover, their non-convex functional forms make them difficult to interpret and to incorporate directly into algebraic optimization algorithms. To develop simple yet accurate algebraic models, we recently developed the ALAMO methodology, which learns models from exogenous data [1]. ALAMO applies a number of nonlinear transformations to the input variables to populate a regression basis set.
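
The idea of populating a regression basis set from nonlinear transformations of the inputs can be illustrated with a minimal sketch. The transformation list here is an assumption chosen only for illustration: powers, exp, log and pairwise products. The set ALAMO actually uses is described in [1]. The helper name build_basis and the sample data are likewise hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical transformation library for illustration only; the set of
# transformations ALAMO actually uses is described in [1].
UNARY_TRANSFORMS: Dict[str, Callable[[float], float]] = {
    "{0}": lambda v: v,
    "{0}^2": lambda v: v ** 2,
    "{0}^3": lambda v: v ** 3,
    "exp({0})": math.exp,
    "log({0})": math.log,  # requires strictly positive inputs
}


def build_basis(samples: Sequence[Sequence[float]],
                names: Sequence[str]) -> Tuple[List[str], List[List[float]]]:
    """Return (labels, rows): one candidate basis column per label."""
    labels: List[str] = []
    columns: List[List[float]] = []
    # Unary transformations of every input variable.
    for j, name in enumerate(names):
        for template, f in UNARY_TRANSFORMS.items():
            labels.append(template.format(name))
            columns.append([f(s[j]) for s in samples])
    # Pairwise products of distinct inputs.
    for a in range(len(names)):
        for b in range(a + 1, len(names)):
            labels.append(f"{names[a]}*{names[b]}")
            columns.append([s[a] * s[b] for s in samples])
    # Transpose the column lists into a row-major design matrix.
    rows = [list(r) for r in zip(*columns)]
    return labels, rows


# Three samples of two inputs, T and P (illustrative values).
labels, X = build_basis([[1.0, 2.0], [2.0, 0.5], [3.0, 1.5]], ["T", "P"])
print(len(labels))  # 11: five unary terms per input plus one product
print(labels)
```

Each column of the resulting matrix becomes a candidate regressor. The subset selection step then decides which few columns enter the final model. Note that the number of candidates grows quickly with the number of inputs and transformations.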

In this paper, we present a systematic computational study of several fitness metrics that can be used in an optimization-based subset selection methodology to identify an optimal subset of regression variables. These metrics include Mallows' Cp, the Akaike information criterion (AIC), and the Bayesian information criterion (BIC), among others. The resulting models are linear combinations of nonlinear transformations of the input variables, and their simple algebraic form can provide insight into the system at hand. We complement these exact optimization algorithms with fast heuristics and describe their computational performance in ALAMO. Finally, we present a systematic comparison of ALAMO's optimization-based approach to model fitting with several other parametric model-building methods, including the lasso implementation in MATLAB [2] and R's leaps routine [3].
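
The role of the fitness metrics can be made concrete with a small best-subset example. This is a sketch under stated assumptions. It uses ordinary least squares and the common textbook forms of the metrics:

- AIC = n ln(SSE/n) + 2p
- BIC = n ln(SSE/n) + p ln(n)
- Cp = SSE_p / s^2 - n + 2p

Here n is the number of samples, p is the number of selected regressors, and s^2 is the full-model error variance. The AIC and BIC forms assume Gaussian errors and drop additive constants.

Exhaustive enumeration stands in for the exact optimization formulation studied in the paper. It is only practical for a handful of candidate columns. The function names solve, sse and best_subset are hypothetical, as is the data.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def solve(A: Matrix, b: Sequence[float]) -> List[float]:
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def sse(X: Matrix, y: Sequence[float], cols: Sequence[int]) -> float:
    """Residual sum of squares of the least-squares fit restricted to `cols`."""
    Xs = [[row[j] for j in cols] for row in X]
    m = len(cols)
    # Normal equations (Xs^T Xs) beta = Xs^T y; adequate for a small, well-conditioned demo.
    XtX = [[sum(r[a] * r[b] for r in Xs) for b in range(m)] for a in range(m)]
    Xty = [sum(r[a] * yi for r, yi in zip(Xs, y)) for a in range(m)]
    beta = solve(XtX, Xty)
    return sum((yi - sum(bj * xj for bj, xj in zip(beta, r))) ** 2
               for r, yi in zip(Xs, y))


def best_subset(X: Matrix, y: Sequence[float],
                criterion: str = "bic") -> Tuple[Tuple[int, ...], float]:
    """Enumerate every nonempty column subset; return the one with the lowest score."""
    n, k = len(X), len(X[0])
    # Full-model error variance, used by Mallows' Cp (requires n > k).
    sigma2 = sse(X, y, range(k)) / (n - k)
    best: Tuple[Tuple[int, ...], float] = ((), math.inf)
    for p in range(1, k + 1):
        for cols in itertools.combinations(range(k), p):
            s = sse(X, y, cols)
            if criterion == "aic":
                score = n * math.log(s / n) + 2 * p
            elif criterion == "bic":
                score = n * math.log(s / n) + p * math.log(n)
            elif criterion == "cp":
                score = s / sigma2 - n + 2 * p
            else:
                raise ValueError(f"unknown criterion: {criterion}")
            if score < best[1]:
                best = (cols, score)
    return best


# Columns: constant, an irrelevant alternating regressor, and x = 1..6.
X = [[1.0, 1.0, 1.0], [1.0, -1.0, 2.0], [1.0, 1.0, 3.0],
     [1.0, -1.0, 4.0], [1.0, 1.0, 5.0], [1.0, -1.0, 6.0]]
# y = 2*col0 + 3*col2 plus a small perturbation orthogonal to all three columns.
y = [5.01, 7.99, 10.99, 14.01, 17.0, 20.0]
for crit in ("aic", "bic", "cp"):
    print(crit, best_subset(X, y, crit))
```

With this data, each criterion selects columns (0, 2) and discards the irrelevant alternating column. Adding that column leaves the SSE unchanged but increases the penalty term. Because the number of subsets grows as 2^k, enumeration quickly becomes impractical. This is where optimization-based formulations and fast heuristics become necessary.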

References

[1] Cozad, A., N. V. Sahinidis, and D. C. Miller, Automatic learning of algebraic models for optimization, AIChE Journal, 60, 2211-2227, 2014.

[2] MATLAB lasso documentation, http://www.mathworks.com/help/stats/lasso.html

[3] R package leaps documentation, https://cran.r-project.org/web/packages/leaps/leaps.pdf