(575d) Tuning with Hybrid Derivative-Free Optimization Initialization Strategies

Authors: 
Sauk, B., Carnegie Mellon University
Sahinidis, N. V., Carnegie Mellon University
As the state-of-the-art in computing technology evolves, there is a growing need to design software that is optimal on a variety of system architectures. Non-intuitive relationships between software and hardware complicate the optimization of algorithms. To address this need, algorithms are designed with tunable parameters to allow for performance portability. However, determining optimal parameters involves the solution of a large-scale global optimization problem that does not have an explicit algebraic functional form. Previous efforts to identify tuning parameters have used derivative-free optimization (DFO) algorithms such as hill climbing algorithms, Nelder-Mead simplex algorithms [3], genetic algorithms, and the DIviding RECTangles (DIRECT) algorithm [2]. To address the problem of algorithmic parameter tuning, autotuning, other groups have developed software such as Active Harmony [5], or OpenTuner [1] that combine multiple local DFO algorithms.

In this work we investigate the benefits of autotuning with novel hybrid DFO algorithms. While many search techniques implemented in autotuners utilize local DFO solvers or heuristic based methods, by combining global DFO solvers [4] with local pattern search techniques, e are able to quickly find high quality tuning parameters. Given a problem, we initialize our algorithms by using the global DFO solver DIRECT to identify a local solution. That local solution is then passed to a local DFO solver, where a new solution is obtained. Then a new solver is chosen and this procedure is repeated until we have exhausted a computational budget.

We compare our proposed methodology against autotuning techniques such as OpenTuner [1], Active Harmony [5] and other sophisticated DFO techniques. We conduct experiments on the dense matrix-matrix multiplication kernel. We focus on tuning GPU algorithms as parameters have a significant impact on computational performance. Our results demonstrate that our hybrid techniques are able to improve the performance of dense matrix-matrix multiplication by 130% compared to the best results found by state-of-the-art autotuners, while only requiring several hundred function evaluations.

References

[1] J. Ansel, S. Kamil, K. Veeramachaneni, J. Ragan-Kelley, J. Bosboom, U. O’Reilly, and S. Amarasinghe. Opentuner: An extensible framework for program autotuning. In Parallel Architecture and Compilation Techniques (PACT), 2014 23rd International Conference on, pages 303–315, 2014.

[2] D.R. Jones. The DIRECT global optimization algorithm. In C. A. Floudas and P. M. Paralos (eds.), Encyclopedia of Optimization, Kluwer Academic Publishers, Boston, MA, 1:431-440, 2001.

[3] J. A. Nelder and R. Mead. A simplex method for function minimization. Computer Journal, 7:308–313, 1965.

[4] L. M. Rios and N. V. Sahinidis. Derivative-free optimization: A review of algorithms and comparison of software implementations. Journal of Global Optimization, 56:1247-1293, 2013.

[5] C. Ţăpuş, I. Chung, and J. Hollingsworth. Active harmony: Towards automated performance tuning. In Proceedings of the 2002 ACM/IEEE conference on Supercomputing, pages 1–11, 2002.