(51f) Data Driven Modeling in Alamo: Feature Selection and Non-Parametric Modeling Applications

Authors: 
Wilson, Z. - Presenter, Carnegie Mellon University
Sahinidis, N., Carnegie Mellon University

ALAMO is a computational methodology designed to learn algebraic models from data. The models produced via this approach constitute a linear combination of parametric transformations of the input variables. Recent work comparing variants of the lasso and best subset selection [1] has motivated the inclusion of additional heuristics for identifying optimally predictive models. The linear model selection methodology used by ALAMO is explained and demonstrated across a diverse set of benchmark problems. The problem of feature engineering, i.e., identifying promising transformations of the input variables for inclusion in linear models, is addressed directly by the ALAMO methodology. Diverse sets of algebraic features are applied to optimization benchmark problems in order to identify accurate models that are tailored for optimization with equation-oriented solvers.
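The core idea, building a library of algebraic transformations of the inputs and then selecting a small, optimally predictive subset, can be illustrated with a minimal sketch. This is not ALAMO's implementation (ALAMO solves the subset-selection problem with mixed-integer optimization and its own error criteria); the feature library, the exhaustive search, and the BIC scoring below are illustrative assumptions.

```python
import itertools
import numpy as np

def feature_library(x):
    # Candidate algebraic transformations of a 1-D input (illustrative set).
    return np.column_stack([x, x**2, np.sqrt(np.abs(x)),
                            np.log1p(np.abs(x)), np.exp(-x)])

def best_subset(X, y, max_terms=2):
    """Exhaustively search column subsets; score each linear fit by BIC."""
    n, p = X.shape
    best = (np.inf, None, None)  # (bic, column indices, coefficients)
    for k in range(1, max_terms + 1):
        for cols in itertools.combinations(range(p), k):
            A = X[:, cols]
            beta = np.linalg.lstsq(A, y, rcond=None)[0]
            rss = np.sum((y - A @ beta) ** 2)
            bic = n * np.log(rss / n + 1e-12) + k * np.log(n)
            if bic < best[0]:
                best = (bic, cols, beta)
    return best

# Synthetic data: y = 3*x^2 + 0.5*x plus small noise.
rng = np.random.default_rng(0)
x = rng.uniform(0.1, 2.0, 50)
y = 3.0 * x**2 + 0.5 * x + rng.normal(0.0, 0.01, 50)
bic, cols, beta = best_subset(feature_library(x), y)
```

With this library, the search recovers the two active transformations (x and x^2) and their coefficients; the combinatorial cost of exhaustive enumeration is what motivates the integer-programming and heuristic machinery in ALAMO.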

When physical interpretability of a model is not prioritized, it is common to use non-parametric regression techniques to provide smooth interpolations of available data. ALAMO is capable of using non-parametric transformations of points in the domain of interest as features for linear model selection. Examples utilizing Gaussian radial basis functions are explored in order to identify optimal interpolative models. Constrained regression techniques used by ALAMO can force these non-parametric models to obey constraints imposed by the modeler, improving their accuracy in extrapolative domains.
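A brief sketch of the Gaussian radial basis function construction referred to above: each sampled point serves as a center, the resulting features enter a linear model, and solving the square system yields an exact interpolant of the data. The specific kernel width and test points are assumptions for illustration, not values from the talk.

```python
import numpy as np

def rbf_design(x, centers, eps=2.0):
    # Gaussian RBF features: phi_j(x) = exp(-(eps * (x - c_j))^2)
    return np.exp(-((eps * (x[:, None] - centers[None, :])) ** 2))

# Sample an unknown function at a few points; centers are the data themselves.
x = np.linspace(0.0, np.pi, 8)
y = np.sin(x)
Phi = rbf_design(x, x)
w = np.linalg.solve(Phi, y)   # square system -> interpolation at the nodes

# Evaluate the fitted model at new points in the domain.
xt = np.array([0.4, 1.3])
yt = rbf_design(xt, x) @ w
```

In ALAMO's setting, these RBF columns would simply join the pool of candidate features for model selection, and constrained regression could additionally restrict w so the surrogate respects modeler-imposed bounds or shape information outside the sampled region.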

[1] Hastie, Trevor, Robert Tibshirani, and Ryan J. Tibshirani. "Extended Comparisons of Best Subset Selection, Forward Stepwise Selection, and the Lasso." arXiv preprint arXiv:1707.08692 (2017).