(575h) Evaluation of an MINLP Formulation for Symbolic Regression

Authors: 
Engle, M., Carnegie Mellon University
Sahinidis, N. V., Carnegie Mellon University
Symbolic regression learns both the model structure and parameters to model a data set, unlike traditional regression that limits the scope of the regression to a fixed functional form [1]. The regression only requires the specification of a set of operators and operands (+, -, *, /, exp(.), log(.), (.)2, (.)3, √(.), etc.) to flexibly develop new functional forms that accurately represent the data. Symbolic regression has typically been performed with genetic programming [1], but recent developments in applying global deterministic approach have shown improved fitting metrics, such as sum of squared error and other information criterion [2,3,5].

Recently, Cozad and Sahinidis [2] proposed an MINLP formulation that uses different variations of the base formulation that reduce the search space based on redundancy elimination, symmetry breaking, and node priorities. The purpose of the current paper is to gain insights into the formulation in [2] by evaluating it on a large number of problems. To improve our understanding of the formulation’s capabilities, we evaluated the eight different versions of the model against a benchmark set of sampled known functional forms as well as real world datasets with unknown models at different binary expression tree depths and complexity of the determined function [4]. This evaluation was conducted by running the models in parallel and terminating when any version of the model converged. Further work was also conducted on the bounding of the binary expression tree for the regressed values, as well as first and second derivative behaviors. Insights gained from these computations were crucial and facilitated the constrained regression of the embedded alpha function of cubic equations of state with constraints on the first and second derivatives.

Reference cited:

[1] Koza, J.R.: Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge, MA (1992) 24.

[2] Cozad, A. & Sahinidis, N.V. (2018). A global MINLP approach to symbolic regression. Mathematical Programming, 170, 97-119, 2018.

[3] Cozad, A. Data- and theory-driven techniques for surrogate-based optimization. PhD thesis, Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, PA, May 2014.

[4] J. McDermott, D.R. White, S. Luke, L. Manzoni, M. Castelli, L. Vanneschi, W. Jaśkowski, K. Krawiec, R. Harper, K.D. Jong, U.M. O’Reilly, Genetic programming needs better benchmarks, in Proceedings of the Genetic and Evolutionary Computation Conference (GECCO) (ACM, Philadelphia, 2012)

[5] Tawarmalani, M.;Sahinidis, N. V. Global optimization of mixed-integer nonlinear programs: A theoretical and computational study, Mathematical Programming, 99, 563-591, 2004.