(345g) Learning Models of Unspecified Functional Form through Symbolic Regression

Cozad, A. - Presenter, Carnegie Mellon University
Wilson, Z. - Presenter, Carnegie Mellon University
Sahinidis, N. - Presenter, Carnegie Mellon University

We address the problem of learning simple algebraic models from data obtained from simulations or experiments. Standard regression techniques seek to develop models with a predetermined model structure or set of alternative model structures. In a practical setting, however, data are often acquired from a number of sources without a clear understanding of the system at hand. Algebraic models could make the system more amenable to optimization, prediction, and control, but without insight into which functional forms to use, a conventional regression cannot be set up. Symbolic regression addresses this problem by learning an algebraic model of unspecified functional form from exogenous data [1].
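
One common way to state the contrast is sketched below; it is not necessarily the formulation used in this work. Given data (x_i, y_i), i = 1, ..., n, conventional regression fixes the form f and fits only the parameters, while symbolic regression also searches over a set of candidate expressions. The least-squares loss and the symbols \beta (parameters) and \mathcal{F} (candidate expression set) are assumptions introduced here for illustration.

```latex
\begin{align}
\text{fixed form:} \quad
  & \min_{\beta} \; \sum_{i=1}^{n} \bigl( y_i - f(x_i; \beta) \bigr)^2 \\
\text{symbolic regression:} \quad
  & \min_{f \in \mathcal{F},\, \beta} \; \sum_{i=1}^{n} \bigl( y_i - f(x_i; \beta) \bigr)^2
\end{align}
```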

Symbolic regression is traditionally approached with genetic programming and other heuristic algorithms. These stochastic approaches to model identification offer no guarantee of either local or global optimality, and often perform poorly in practice [2]. We show that symbolic regression can be formulated as a nonlinear nonconvex disjunctive program and solved to global optimality. Our approach includes steps that eliminate redundant solutions so that the optimal solution can be found efficiently within a branch-and-bound framework. We present extensive computational results comparing our symbolic regression approach to approaches in the existing literature.
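
To make the idea of a deterministic search over functional forms concrete, the toy sketch below exhaustively enumerates small expression trees and returns the one with the lowest squared error. It is not the disjunctive formulation or the branch-and-bound algorithm described above: it swaps in brute-force enumeration, which is only globally optimal over its tiny, bounded search space. The leaf set, the operator set, the tie-break on tree size, and all function names are hypothetical choices made for illustration. The skip on commutative operators is one simple example of removing redundant (equivalent) solutions.

```python
import itertools

# Hypothetical toy search space: one input variable and two constants.
LEAVES = ["x", "1", "2"]
BINARY_OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}
COMMUTATIVE = {"+", "*"}


def enumerate_trees(depth):
    """Return all expression trees of depth <= `depth`.

    A tree is either a leaf string or a tuple (op, left, right).
    """
    trees = list(LEAVES)
    if depth == 0:
        return trees
    children = enumerate_trees(depth - 1)
    for op in BINARY_OPS:
        for left, right in itertools.product(children, repeat=2):
            # Redundancy removal: a + b and b + a are the same model,
            # so keep only one ordering for commutative operators.
            if op in COMMUTATIVE and repr(left) > repr(right):
                continue
            trees.append((op, left, right))
    return trees


def evaluate(tree, x):
    """Evaluate a tree at the scalar input x."""
    if tree == "x":
        return x
    if isinstance(tree, str):
        return float(tree)
    op, left, right = tree
    return BINARY_OPS[op](evaluate(left, x), evaluate(right, x))


def size(tree):
    """Number of nodes in the tree, used as a simple complexity measure."""
    if isinstance(tree, str):
        return 1
    return 1 + size(tree[1]) + size(tree[2])


def to_string(tree):
    """Render a tree as a fully parenthesized infix expression."""
    if isinstance(tree, str):
        return tree
    op, left, right = tree
    return f"({to_string(left)} {op} {to_string(right)})"


def best_model(xs, ys, depth=2):
    """Pick the tree with the lowest squared error; break ties by size."""
    def key(tree):
        sse = sum((evaluate(tree, x) - y) ** 2 for x, y in zip(xs, ys))
        return (sse, size(tree))
    return min(enumerate_trees(depth), key=key)


if __name__ == "__main__":
    xs = [0, 1, 2, 3]
    ys = [x * x + 1 for x in xs]  # data generated from y = x^2 + 1
    model = best_model(xs, ys)
    print(to_string(model), "predicts", evaluate(model, 4), "at x = 4")
```

Even in this toy setting the number of candidate trees grows rapidly with depth, which illustrates why exhaustive enumeration does not scale and why a formulation that a branch-and-bound solver can prune is attractive.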


[1]    J.R. Koza. Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, 1992.

[2]    M.F. Korns. Accuracy in Symbolic Regression. In Genetic Programming Theory and Practice IX, pages 129-151. Springer, 2011.