(753f) Randomized Rounding for Best Subset Selection Regression

Authors: 
Sarwar, O., Carnegie Mellon University
Sahinidis, N. V., Carnegie Mellon University
Learning algebraic models from data is becoming an increasingly popular method to model complex chemical engineering systems [1]. This trend is driven by the difficulty of constructing and optimization physics-based models, increases in available system and experimental data, and improved strategies for regression. In this work, we focus on predicting the value of a single output variable using a linear combination of features. These features include the original regression variables as well as nonlinear transformations of the regressors.

Well-known strategies for linear regression include Forward Stepwise Selection, the LASSO, and Best Subset Selection (BSS) [2]. BSS seeks to features to minimize the sum of squared error between the actual and model-predicted values of the output while also penalizing the inclusion of features in the objective. This task can be posed as a mixed-integer quadratic program (MIQP) and leads to models with low complexity and good accuracy [3, 4]. In order to capture the nonlinearities in the data-generating function, it is necessary to consider a wide-variety of transformations of the original variables. However, as the number of considered variables grows, the resulting combinatorial explosion makes the MIQP problem intractable to solve in a reasonable amount of time [5].

In this work, we propose a new strategy to construct a solution to the Best Subset Selection MIQP. Our algorithm cleverly applies a randomized approach to rounding the continuous relaxation of the integer variables. The coefficients of the variables in the active set can then be quickly calculated using basic linear algebra techniques. The resulting algorithm is orders of magnitude faster than solving large problems to global optimality via the MIQP and results in solutions that are near-optimal for a wide-variety of synthetic and real data.

Additionally, we propose a procedure that incorporates our heuristic strategy into a larger framework that uses insights gained from the approximate solution—along with work done in the statistics community to solve convex quadratic programs—to develop a faster branch-and-bound strategy to solving the Best Subset Selection problem to optimality.

  1. Cozad , N. V. Sahinidis , and D. C. Miller. Learning surrogate models for simulation-based optimization. AIChE Journal , 60(6):2211–2227, 2014.
  2. Tibshirani. Regression Selection and Shrinkage via the Lasso. Journal of the Royal Statistical Society, 58(1):267–288, 1996.
  3. Cozad, N. V. Sahinidis, and D. C. Miller. A combined first-principles and data-driven approach to model building. Computers & Chemical Engineering, 73:116–127, 2015.
  4. T. Wilson and N. V. Sahinidis. The ALAMO approach to machine learning. Computers & Chemical Engineering, 106:785–795, 2017.
  5. Bertsimas, A. King, and R. Mazumder. Best Subset Selection via a Modern Optimization Lens. The Annals of Statistics, 44(2):813–852, 2015.