(282e) Mixed-Integer Programming Representations of Linear Model Decision Tree Surrogates

Authors 

Ammari, B. - Presenter, Carnegie Mellon University
Laird, C., Carnegie Mellon University
Stinchfield, G., Carnegie Mellon University
Johnson, E., Sandia National Laboratories
Hart, W. E., Sandia National Laboratories
Bynum, M., Sandia National Laboratories
Pulsipher, J., University of Wisconsin-Madison
Significant advances in mixed-integer linear programming (MILP) and mixed-integer quadratically constrained programming (MIQCP) solvers (e.g., Gurobi) have led to increased interest in piecewise-linear surrogates within optimization settings. Piecewise-linear functions can be formulated as MILPs (Vielma et al., 2010), and they have been used extensively to approximate nonlinear, nonconvex functions that are difficult to optimize. More recently, several piecewise-linear machine learning (ML) models, such as neural networks (NNs) with rectified linear unit (ReLU) activation functions (Fischetti & Jo, 2018), ensembles of decision trees (Mistry et al., 2021; Mišić, 2020), and linear model decision trees, have also been explored within optimization frameworks. However, given the many possible MILP and MIQCP representations of these piecewise-linear models and recent advances in solver capabilities, questions remain concerning the computational performance of these different representations when embedded within a broader optimization setting. In this work, we aim to gain insight into this general question by specifically evaluating the computational performance of linear model decision trees, ReLU NNs, and gradient boosted decision trees (GBDTs) embedded within MILPs and MIQCPs.
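
To make the embedding concrete, the following is a minimal sketch (not taken from this work) of the standard big-M MILP encoding of a single ReLU unit y = max(0, w*x + b), in the spirit of Fischetti & Jo (2018); the weight, bias, and pre-activation bounds are illustrative placeholders, and any MILP solver could be applied to the resulting Pyomo model.

import pyomo.environ as pyo

# Illustrative weight, bias, and valid bounds on the pre-activation z = w*x + b
# over the x domain (here x in [-1, 1] gives z in [-3, 1]).
w, b = 2.0, -1.0
L, U = -3.0, 1.0

m = pyo.ConcreteModel()
m.x = pyo.Var(bounds=(-1.0, 1.0))
m.z = pyo.Var(bounds=(L, U))            # pre-activation
m.y = pyo.Var(bounds=(0.0, U))          # ReLU output
m.sigma = pyo.Var(domain=pyo.Binary)    # 1 if the unit is active (z >= 0)

m.preact = pyo.Constraint(expr=m.z == w * m.x + b)
m.relu_lb = pyo.Constraint(expr=m.y >= m.z)
m.relu_ub1 = pyo.Constraint(expr=m.y <= m.z - L * (1 - m.sigma))
m.relu_ub2 = pyo.Constraint(expr=m.y <= U * m.sigma)

m.obj = pyo.Objective(expr=m.y, sense=pyo.minimize)
# pyo.SolverFactory('gurobi').solve(m)  # any MILP solver works

When sigma = 0 the constraints force y = 0 and z <= 0, and when sigma = 1 they force y = z, so the binary variable selects the correct piece of the ReLU function.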

Linear model decision trees differ from standard decision trees by returning linear regression models, rather than constants, at the leaf nodes. Their advantages include the ability to represent discontinuous functions and the potential to approximate arbitrary functions with smaller trees and lower error. When these smaller linear model decision trees are embedded within optimization problems, the reduced number of leaves translates into fewer constraints. Multiple MILP and MIQCP representations of linear model decision trees have been developed, utilizing Generalized Disjunctive Programming (GDP) formulations, extensions of existing formulations for standard trees, and hybrid big-M methods.
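
As an illustration of the GDP route (a sketch, not the formulation studied here), the example below models a depth-one linear model decision tree with two leaves: the split on x selects which leaf's linear regression model defines the output y. The split threshold and leaf coefficients are invented for the example, and Pyomo's gdp.bigm transformation then yields one possible MILP representation.

import pyomo.environ as pyo
from pyomo.gdp import Disjunct, Disjunction

t = 0.5                                   # split threshold (illustrative)
coefs = {'left': (1.0, 0.0),              # (slope, intercept) of the left-leaf model
         'right': (-2.0, 1.5)}            # (slope, intercept) of the right-leaf model

m = pyo.ConcreteModel()
m.x = pyo.Var(bounds=(0.0, 1.0))
m.y = pyo.Var(bounds=(-10.0, 10.0))

# One disjunct per leaf: the leaf's region and its linear output model.
m.left = Disjunct()
m.left.region = pyo.Constraint(expr=m.x <= t)
m.left.leaf_output = pyo.Constraint(expr=m.y == coefs['left'][0] * m.x + coefs['left'][1])

m.right = Disjunct()
m.right.region = pyo.Constraint(expr=m.x >= t)
m.right.leaf_output = pyo.Constraint(expr=m.y == coefs['right'][0] * m.x + coefs['right'][1])

m.leaf_choice = Disjunction(expr=[m.left, m.right])

m.obj = pyo.Objective(expr=m.y, sense=pyo.minimize)

# Reformulate the disjunction into a MILP via a big-M transformation.
pyo.TransformationFactory('gdp.bigm').apply_to(m)
# pyo.SolverFactory('gurobi').solve(m)

Other reformulations (e.g., hull-based) can be applied to the same GDP model, which is one reason the choice of representation affects computational performance.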

We present several case studies, including process family design (Stinchfield et al., 2022), flexibility analysis (Swaney & Grossmann, 1985), and global optimization, that showcase the benefits of linear model decision trees. We also investigate the computational performance of different representations for linear model decision trees, including both MILP and MIQCP formulations, and discuss the properties of these formulations. Finally, we compare the performance of linear model decision trees against other ML models, including GBDTs, using the Optimization and Machine Learning Toolkit (OMLT) (Ceccon et al., 2022).
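
For context on how such comparisons are typically set up with OMLT, the sketch below shows a representative (purely illustrative) usage pattern, assuming a network definition net_defn has already been built from a trained ReLU surrogate with OMLT's I/O utilities; embedding a GBDT follows an analogous pattern with OMLT's tree formulations.

import pyomo.environ as pyo
from omlt import OmltBlock
from omlt.neuralnet import FullSpaceNNFormulation

def embed_relu_surrogate(net_defn):
    # `net_defn` is assumed to be an OMLT network definition obtained from a
    # trained ReLU network (e.g., via OMLT's I/O loaders); it is not built here.
    m = pyo.ConcreteModel()
    m.surrogate = OmltBlock()
    m.surrogate.build_formulation(FullSpaceNNFormulation(net_defn))
    # Tie the surrogate's inputs and outputs to the wider optimization model.
    m.x = pyo.Var(bounds=(0.0, 1.0))
    m.link_input = pyo.Constraint(expr=m.surrogate.inputs[0] == m.x)
    m.obj = pyo.Objective(expr=m.surrogate.outputs[0], sense=pyo.minimize)
    return m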

Acknowledgements

Sandia National Laboratories is a multimission laboratory managed and operated by National Technology & Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525. This paper describes objective technical results and analysis. Any subjective views or opinions that might be expressed in the paper do not necessarily represent the views of the U.S. Department of Energy or the United States Government.

References

Ceccon, F., Jalving, J., Haddad, J., Thebelt, A., Tsay, C., Laird, C. D., and Misener, R. (2022). “OMLT: Optimization & Machine Learning Toolkit.” Journal of Machine Learning Research, 23(349):1–8.

Fischetti, M. and Jo, J. (2018). “Deep Neural Networks and Mixed Integer Linear Optimization.” Constraints, 23:296–309.

Mistry, M., Letsios, D., Krennrich, G., Lee, R. M., and Misener, R. (2021). “Mixed-integer Convex Nonlinear Optimization with Gradient-boosted Trees Embedded.” INFORMS Journal on Computing, 33:1103–1119.

Mišić, V. V. (2020). “Optimization of Tree Ensembles.” Operations Research, 68:1605–1624.

Stinchfield, G., Biegler, L. T., Eslick, J. C., Jacobson, C., Miller, D. C., Siirola, J. D., Zamarripa, A., Zhang, C., Zhang, Q., and Laird, C. D. (2022). “Optimization-based Approaches for Design of Chemical Process Families Using ReLU Surrogates.” In Proceedings of Foundations of Computer Aided Process Operations, January 2023.

Swaney, R. E. and Grossmann, I. E. (1985). “An Index for Operational Flexibility in Chemical Process Design.” AIChE Journal, 31:621–630.

Vielma, J. P., Ahmed, S., and Nemhauser, G. (2010). “Mixed-integer Models for Nonseparable Piecewise-linear Optimization: Unifying Framework and Extensions.” Operations Research, 58:303–315.