(193d) Machine Learning through a Parametric Programming Lens
AIChE Annual Meeting
Monday, November 11, 2019 - 4:27pm to 4:46pm
Including a regularization term in the loss function is the most widely used approach for fitting a machine learning model that balances the bias-variance tradeoff. The regularization term includes an exogenous hyperparameter that is set before the model is trained. This hyperparameter controls the weight of the regularization term, and therefore the solution of the optimization problem that the machine learning algorithm solves. Some algorithms also have other hyperparameters that must be specified in advance.
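For concreteness, the LASSO regression problem named later in this abstract has the standard regularized form shown here. The symbols (data matrix X, response vector y, coefficients w, hyperparameter λ) are notation assumed for illustration and do not come from the original text.

```latex
\min_{w \in \mathbb{R}^p} \; \frac{1}{2} \lVert y - X w \rVert_2^2 + \lambda \lVert w \rVert_1, \qquad \lambda \ge 0
```

Here the first term is the data-fitting loss, the second is the regularization term, and λ is the hyperparameter fixed before training.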
Finding the optimal model therefore amounts to tuning these hyperparameters correctly. Typical tuning strategies discretize the hyperparameter space and run an iterative grid or random search to approximate the optimal hyperparameter, and thereby the optimal model. This requires solving an indefinite number of optimization problems during the validation or cross-validation steps.
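The grid search described above can be sketched on a toy problem. This is a minimal illustration, not the authors' code: it assumes a one-feature LASSO without intercept, for which the training problem has a closed-form solution, and the function names and data are hypothetical.

```python
def lasso_1d(lam, x, y):
    """Solve min_w 0.5 * sum((y_i - x_i * w)**2) + lam * |w| in closed form.

    Assumption: a single feature and no intercept, so the solution is a
    soft-thresholded least-squares estimate.
    """
    c = sum(xi * yi for xi, yi in zip(x, y))
    s = sum(xi * xi for xi in x)
    mag = max(abs(c) - lam, 0.0)
    return (mag if c >= 0 else -mag) / s


def validation_loss(w, x, y):
    """Half the sum of squared errors on held-out data."""
    return 0.5 * sum((yi - xi * w) ** 2 for xi, yi in zip(x, y))


def grid_search(lams, x_tr, y_tr, x_val, y_val):
    """One training solve per grid point; keep the lowest validation loss."""
    best_lam, best_loss = None, float("inf")
    for lam in lams:
        w = lasso_1d(lam, x_tr, y_tr)
        loss = validation_loss(w, x_val, y_val)
        if loss < best_loss:
            best_lam, best_loss = lam, loss
    return best_lam, best_loss


# Hypothetical toy data, chosen only for illustration.
x_tr, y_tr = [1.0, 2.0], [3.0, 6.0]
x_val, y_val = [1.0, 1.0], [2.0, 2.0]
best_lam, best_loss = grid_search([0.0, 4.0, 8.0, 12.0, 16.0],
                                  x_tr, y_tr, x_val, y_val)
print(best_lam, best_loss)
```

On this toy data the search selects the grid point 4.0. An optimum lying between grid points is missed unless the grid is refined, and each refinement costs additional training solves.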
In this work, hyperparameter tuning is instead viewed as a parametric programming problem, in which each optimization variable of the machine learning algorithm is derived as a single piecewise affine function of the hyperparameter. Optimizing the hyperparameter then reduces to determining the global minimum over these piecewise affine expressions. The optimal hyperparameter obtained from parametric programming is exact, not an approximation on a grid.
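The idea can be illustrated on the simplest possible case. This is a sketch under strong assumptions, not the method presented in the talk: it uses a one-feature LASSO without intercept, whose coefficient is a known piecewise affine function of the hyperparameter, and a squared-error validation loss. All names and data are hypothetical.

```python
def lasso_weight(lam, c, s):
    """Closed-form 1-D LASSO coefficient as a function of lam.

    c = sum(x_i * y_i) and s = sum(x_i**2) on the training set.
    The coefficient is affine in lam on [0, |c|] and zero for lam >= |c|,
    so it is piecewise affine with a single breakpoint at lam = |c|.
    """
    mag = max(abs(c) - lam, 0.0)
    return (mag if c >= 0 else -mag) / s


def exact_lambda(x_tr, y_tr, x_val, y_val):
    """Return (lam, w) minimizing the validation loss with no grid.

    The validation loss is a convex quadratic in w, and w(lam) sweeps the
    interval between 0 and the least-squares estimate c / s. The best
    attainable w is therefore the unconstrained validation minimizer
    clipped to that interval, and lam follows by inverting the affine piece.
    """
    c = sum(x * y for x, y in zip(x_tr, y_tr))
    s = sum(x * x for x in x_tr)
    cv = sum(x * y for x, y in zip(x_val, y_val))
    sv = sum(x * x for x in x_val)
    w_free = cv / sv                      # validation minimizer, unconstrained
    lo, hi = sorted((0.0, c / s))         # range of w reachable by varying lam
    w_best = min(max(w_free, lo), hi)
    lam_best = abs(c) - s * abs(w_best)   # invert w = sign(c) * (|c| - lam) / s
    return lam_best, w_best


# Hypothetical toy data, chosen only for illustration.
x_tr, y_tr = [1.0, 2.0], [3.0, 6.0]
x_val, y_val = [1.0, 1.0], [2.0, 2.0]
lam_best, w_best = exact_lambda(x_tr, y_tr, x_val, y_val)
print(lam_best, w_best)
```

For this data the training statistics are c = 15 and s = 5, and the result is lam = 5.0 with w = 2.0, obtained without discretizing the hyperparameter. In higher dimensions the coefficient path has many breakpoints and the corresponding computation is considerably more involved. This one-dimensional case shows only the principle.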
We discuss a parametric programming approach to LASSO regression and linear support vector machines, two popular algorithms for regression and classification, respectively. We demonstrate the effectiveness of this strategy through a set of computational studies.
 Abu-Mostafa, Yaser S., Malik Magdon-Ismail, and Hsuan-Tien Lin. Learning From Data. Vol. 4. New York, NY: AMLBook, 2012.
 James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. An Introduction to Statistical Learning. Vol. 112. New York, NY: Springer, 2013.
 Claesen, Marc, and Bart De Moor. "Hyperparameter Search in Machine Learning." arXiv preprint arXiv:1502.02127 (2015).
 Tibshirani, Robert. "Regression Shrinkage and Selection via the Lasso." Journal of the Royal Statistical Society: Series B (Methodological) 58, no. 1 (1996): 267-288.
 Cortes, Corinna, and Vladimir Vapnik. "Support-vector networks." Machine Learning 20, no. 3 (1995): 273-297.