(78c) Efficient Parallel Optimization On Emerging Computational Architectures

Legg, S. - Presenter, Texas A&M University
Laird, C. D. - Presenter, Texas A&M University
Zhu, Y. - Presenter, Texas A&M University

The size and complexity of modern optimization problems continues to grow, and problems become more difficult to solve. Though the size of these problems continues to grow, the advances in computing clock rates that we once took for granted has slowed dramatically. Computer chip design companies have instead focused on designing more complicated processing units, which can utilize multiple processors in parallel to solve problems more quickly. Emerging computing architectures like graphics processing units (GPUs) are making a large impact in the ?personal supercomputer? sector by making hundreds of parallel processors affordable within a single machine. However, efficiently utilizing parallel architectures like distributed clusters, multi-core machines, and GPUs requires a novel approach to the traditional optimization algorithms that we have utilized in the past.

In this research, we have developed two approaches for parallel optimization on distributed clusters and GPUs. To effectively take advantage of the massively parallel architecture inherent in the GPU, we have developed an interior-point approach that uses an iterative preconditioned-conjugate gradient (PCG) approach on the doubly-augmented form of the linear KKT system. Because of the simplicity of the PCG approach, this technique is widely applicable on fine-grained parallel architectures like the GPU. For coarse-grained architectures like distributed clusters, it is much more important to exploit specific problem structure. In previous work, we presented a Schur-complement decomposition approach for efficient parallel solution of the augmented system required at each iteration of the nonlinear interior-point solver, IPOPT. This technique exploits the block-bordered structure found in multiscenario problems and parameter estimation problems. This technique scales extremely well to large numbers of CPUs and blocks. However, the process of forming the Schur-complement scales linearly with the number of coupling variables and, as the number of coupling variables increases, can become a prohibitive bottleneck in the approach. In this work, we present a quasi-Newton approach for updating an approximate Schur-complement, dramatically increasing the efficiency of the decomposition technique for problems with many coupling variables (hundreds). Both of these approaches are demonstrated on several applicable test problems and case-studies.