
(193d) Machine Learning through a Parametric Programming Lens

Authors 

Tso, W. W. - Presenter, Texas A&M University
Burnak, B., Texas A&M University
Onel, M., Texas A&M Energy Institute, Texas A&M University
Katz, J., Texas A&M University
Pistikopoulos, E., Texas A&M Energy Institute, Texas A&M University
The bias-variance tradeoff is an important concept in machine learning for selecting an optimal statistical model from a set of candidates. The selected model should have low bias and low variance so that it generalizes well to data not seen during training: too much variance leads to overfitting, while too much bias leads to underfitting [1, 2].
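As a minimal illustration of the tradeoff in Python (the synthetic data, noise level, and polynomial models here are arbitrary choices for demonstration), a low-degree fit tends to underfit while a high-degree fit tends to overfit:

import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(np.pi * x)                      # hypothetical true function
x_tr = rng.uniform(-1, 1, 30); y_tr = f(x_tr) + rng.normal(0, 0.3, 30)
x_te = rng.uniform(-1, 1, 200); y_te = f(x_te) + rng.normal(0, 0.3, 200)

for degree in (1, 5, 15):
    coefs = np.polyfit(x_tr, y_tr, degree)           # least-squares polynomial fit
    tr_mse = np.mean((np.polyval(coefs, x_tr) - y_tr) ** 2)
    te_mse = np.mean((np.polyval(coefs, x_te) - y_te) ** 2)
    # Degree 1 tends to underfit (high bias: both errors large); degree 15
    # tends to overfit (high variance: small training error, large test error).
    print(degree, tr_mse, te_mse)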

Including a regularization term in the loss function is the most widely used approach for balancing this bias-variance tradeoff when fitting a machine learning model. The regularization term is weighted by an exogenous hyperparameter that must be set prior to training. This hyperparameter controls the influence of the regularization term and therefore the solution of the underlying optimization problem. Some algorithms also have additional hyperparameters that must be prespecified.
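For concreteness, a regularized loss of this form can be written in a few lines of Python; the LASSO-style objective below is a generic sketch, with lam standing in for the exogenous hyperparameter:

import numpy as np

def regularized_loss(beta, X, y, lam):
    # Data-fit term: drives bias down by matching the training data.
    fit = 0.5 * np.sum((y - X @ beta) ** 2)
    # Regularization term: lam weights the penalty, trading variance
    # reduction (large lam) against bias reduction (small lam).
    penalty = lam * np.sum(np.abs(beta))
    return fit + penalty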

In this manner, finding the optimal model amounts to correctly tuning these hyperparameters. Typical strategies discretize the hyperparameter space and perform an iterative grid or random search to approximate the optimal hyperparameter, and thereby the optimal model [3]. This requires solving a large, a priori unknown number of optimization problems during the validation or cross-validation steps.
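A typical grid search over a discretized hyperparameter space, sketched here with scikit-learn (the model, grid, and synthetic data are our illustrative choices), makes the cost explicit: one training problem per candidate value per fold, with accuracy limited by the resolution of the grid:

import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = X @ rng.normal(size=10) + rng.normal(0, 0.5, 100)

# 50 candidate hyperparameter values x 5 folds = 250 training problems,
# and the returned optimum is only as fine as the grid itself.
grid = {"alpha": np.logspace(-4, 1, 50)}
search = GridSearchCV(Lasso(max_iter=10000), grid, cv=5,
                      scoring="neg_mean_squared_error")
search.fit(X, y)
print(search.best_params_)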

In this work, we instead view hyperparameter tuning as a parametric programming problem, in which each optimization variable of the machine learning problem is derived as a single piecewise affine function of the hyperparameter. Optimizing the hyperparameter then reduces to determining the global minimum over these piecewise affine expressions. Unlike grid or random search, the optimal hyperparameter obtained from parametric programming is exact.
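For LASSO, this structure can be previewed with scikit-learn's LARS homotopy routine, which recovers the exact piecewise-linear solution path beta(lambda); this sketch is our stand-in for the parametric programming machinery, not the authors' implementation. Between the breakpoints it returns, the coefficients are affine in lambda, so a validation error built from them is piecewise quadratic and its global minimizer can be located exactly rather than approximated on a grid:

import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
beta_true = np.concatenate([rng.normal(size=3), np.zeros(7)])
y = X @ beta_true + rng.normal(0, 0.5, 100)
X_tr, y_tr, X_val, y_val = X[:70], y[:70], X[70:], y[70:]

# Exact solution path: coefs[:, k] solves the LASSO problem at alphas[k],
# and beta(lambda) is affine in lambda between consecutive breakpoints.
alphas, _, coefs = lars_path(X_tr, y_tr, method="lasso")

# The validation error is piecewise quadratic in lambda; evaluating it at
# the breakpoints (and, for full exactness, minimizing each quadratic
# segment in closed form) pinpoints the optimal hyperparameter.
val_mse = np.mean((X_val @ coefs - y_val[:, None]) ** 2, axis=0)
best = np.argmin(val_mse)
print("best breakpoint:", alphas[best], "validation MSE:", val_mse[best])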

We discuss a parametric programming approach to LASSO regression [4] and linear support vector machines [5], popular algorithms for regression and classification, respectively. We demonstrate the effectiveness of this strategy through a set of computational studies.

[1] Abu-Mostafa, Yaser S., Malik Magdon-Ismail, and Hsuan-Tien Lin. Learning From Data. Vol. 4. New York, NY: AMLBook, 2012.

[2] James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. An Introduction to Statistical Learning. Vol. 112. New York, NY: Springer, 2013.

[3] Claesen, Marc, and Bart De Moor. "Hyperparameter Search in Machine Learning." arXiv preprint arXiv:1502.02127 (2015).

[4] Tibshirani, Robert. "Regression Shrinkage and Selection via the Lasso." Journal of the Royal Statistical Society: Series B (Methodological) 58, no. 1 (1996): 267-288.

[5] Cortes, Corinna, and Vladimir Vapnik. "Support-Vector Networks." Machine Learning 20, no. 3 (1995): 273-297.