(624o) Multi-Threading Parallelism On Multi-Core Processor PC for Interval Analysis

Stadtherr, M., University of Notre Dame
Xu, G., Invensys/SimSci-Esscor

PCs with multi-core CPUs is common in today's market, yet majority scientific and engineering computations are either still based on single-threaded approach, or focusing on parallelism by computer-cluster and supercomputers. This paper addressed the issue of multi-threaded parallelism on single PC.

A general framework of "workload-stealing" by applying thread-safety technique, "mutex" (mutual-exclusion-event), is provided, and fitted into various network configurations (two-dimensional ring, two-dimensional mesh, three-dimensional mesh). The specially tested problem is a binary tree searching problem that the branches are dynamically generated and severely unbalanced. Such problem can be represented by solving Feigenbaum's equation with interval analysis.

Feigenbaum's equation has multiple solutions and is free to scale up. Meanwhile, all the solutions are squashed within a small region no matter how large the initial domain is. Thus, in order to find all the solutions, global solution technique such as interval analysis has to be applied, and the binary tree to be searched is dynamically generated and severely unbalanced. It had been tested that for Feigenbaum's equation, without "workload-stealing", one thread would have to perform almost all the loads, while others would be "idle" after done with their own loads. However, with "workload-stealing", excessive communication overhead or even dead-lock may happen without intelligent thread-safety handling technique.

In our approach, to improve communication efficiency, each thread had been designed to contain one "un-owned mutex" and a few neighbors (two neighbors for 2D-ring; four neighbors for 2D-mesh, and six neighbors for 3D-mesh). During solution solving process, if one thread is done with its loads, it will then target one of its neighbors that has most loads remaining, and steal loads from this neighbor. When stealing happens, both threads would need to modify their memory block for work loads, and this has to be protected by mutex lock (mutex owner ship has to be granted for thread safety). Because each thread has one mutex, multiple stealing events can happen at the same time between different pair of threads, which is much more efficient than having a global mutex.

The idea of mutex handling in this work is generally applicable to any area where parallelism is promising. It is also important to realize that our work is done on a single PC with multiple CPUs, which implies potential bright future for GPU computers in scientific and engineering calculation.