(370s) GPU Parallel Forward and Backward Selection for Model Building
This feature selection problem is solved by known techniques like best subset selection. There are many approaches used to solve the best subset selection problem, such as branch-and-bound , forward and backward selection , the lasso , and mixed-integer optimization [1, 2]. As the number of data points and the number of features we consider grows, we are able to generate more accurate models at a much larger computational cost.
In this work, we address the problem of how to efficiently develop accurate models by solving the best subset selection problem with forward and backward selection algorithms with GPU computing. We begin by parallelizing the forward and backward selection algorithm. In each iteration, a number of linear least squares problems needs to be solved to identify the next best model. Fortunately, each of these linear least squares problems can be solved simultaneously, allowing us to take advantage of a batched problem structure to speed up the algorithms. We demonstrate that our GPU parallel version of the algorithm is five times faster than an equivalent CPU version.
 D. Bertsimas, A. King, and R. Mazumder. Best subset selection via a modern optimization lens. The Annals of Statistics, 44:813â852, 2016.
 A. Cozad, N. V. Sahinidis, and D. C. Miller. Automatic learning of algebraic models for optimization. AIChE Journal, 60:2211â2227, 2014.
 M.A. Efroymson. Multiple regression analysis. Mathematical methods for digital computers, pages 191â203, 1960.
 C. Gatu and EJ Kontoghiorghes. Branch-and-bound algorithms for computing the best-subset regression models. Computational and Graphical Statistics, 15:139â156, 2006.
 R. Tibshirani. Regression shrinkage and selection via the lasso. Royal Statistical Society, Methodological:267â288, 1996.