(266c) A Heterogeneous High Performance Computing Implementation of Thin Film Growth Simulation
However, because the kMC algorithm is inherently serial, it is challenging to exploit the high performance computing (HPC) power available today, since most HPC capability comes from parallelization. Furthermore, as HPC architectures move from homogeneous (all-CPU computation) to heterogeneous (CPU + GPU, or CPU + MIC, the Intel Many Integrated Core Architecture), implementing the kMC algorithm efficiently on HPC platforms has become even less straightforward.
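To make the serial dependence concrete, the following is a minimal sketch of one rejection-free kMC step, not the implementation used in this work. The function name `kmc_step` and the flat list of event rates are assumptions made for illustration. Each step selects one event in proportion to its rate and advances the clock by an exponentially distributed waiting time; because the executed event changes the rate list for the next step, consecutive steps cannot simply be run in parallel.

```python
import math
import random


def kmc_step(rates, rng):
    """One rejection-free kMC step (illustrative sketch).

    rates: list of non-negative event rates, at least one of them positive.
    rng:   a random.Random instance.
    Returns (index of the chosen event, time increment).
    """
    total = sum(rates)
    if total <= 0.0:
        raise ValueError("at least one event must have a positive rate")

    # Select event i with probability rates[i] / total.
    target = rng.random() * total
    cumulative = 0.0
    chosen = len(rates) - 1  # fallback in case of floating-point round-off
    for i, r in enumerate(rates):
        cumulative += r
        if target < cumulative:
            chosen = i
            break

    # Exponentially distributed waiting time with mean 1 / total.
    dt = -math.log(1.0 - rng.random()) / total
    return chosen, dt


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical rates for three events (arbitrary units).
    event, dt = kmc_step([0.5, 2.0, 1.5], rng)
    print(event, dt)
```

In a full simulation the caller would apply the chosen event to the lattice, update the affected rates, and call `kmc_step` again.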
In this work, we carefully investigate the implementation of the kMC algorithm on a heterogeneous computing platform consisting of two Xeon CPUs and four Xeon Phi many integrated core coprocessors. Our parallel algorithm follows the synchronous method proposed by Martínez et al., in which time synchronization is perfect but the computation is only semi-rigorous because of boundary conflicts. We simulate a thin film growth process using a solid-on-solid lattice model with surface adsorption, migration, and desorption. We study the performance of the algorithm in terms of speedup factor and boundary error in the different Xeon Phi execution modes (offload, native, and symmetric), as well as its dependence on simulation parameters such as lattice size and the activation energies of the microscopic events. Finally, we provide recommendations on implementation strategy, together with methods to minimize boundary conflicts.
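The synchronous time step can be sketched as follows. This is a hedged illustration of the null-event idea behind the method of Martínez et al., not the code used in this work; the function name `synchronous_step` and the representation of each subdomain as a plain list of event rates are assumptions. Every subdomain is padded with a null event so that all subdomains share the same total rate, which lets them advance by one common time increment.

```python
import math
import random


def synchronous_step(domain_rates, rng):
    """One synchronous parallel kMC step over all subdomains (sketch).

    domain_rates: list with one list of event rates per subdomain.
    rng:          a random.Random instance.
    Returns (events, dt); events[k] is the event index chosen in
    subdomain k, or None if that subdomain drew a null event.
    """
    totals = [sum(rates) for rates in domain_rates]
    r_max = max(totals)
    if r_max <= 0.0:
        raise ValueError("at least one subdomain must have a positive rate")

    events = []
    for rates, total in zip(domain_rates, totals):
        # Pad each subdomain with a null event of rate (r_max - total).
        target = rng.random() * r_max
        if target >= total:
            events.append(None)  # null event: nothing happens here
            continue
        cumulative = 0.0
        chosen = len(rates) - 1
        for i, r in enumerate(rates):
            cumulative += r
            if target < cumulative:
                chosen = i
                break
        events.append(chosen)

    # A single time increment shared by every subdomain.
    dt = -math.log(1.0 - rng.random()) / r_max
    return events, dt


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical rates for three subdomains (arbitrary units).
    events, dt = synchronous_step([[1.0, 0.5], [0.2], [2.0, 1.0, 0.5]], rng)
    print(events, dt)
```

The loop over subdomains is the part that would be distributed across cores or coprocessors. The sketch does not handle boundary conflicts: two subdomains may pick events on neighboring lattice sites in the same step, which is the source of the boundary error studied here.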
R. Rahman, Intel Xeon Phi Coprocessor Architecture and Tools, Apress, 2013.
E. Martínez, J. Marian, M.H. Kalos, J.M. Perlado, Synchronous parallel kinetic Monte Carlo for continuum diffusion-reaction systems, Journal of Computational Physics, 2008, 227, pp. 3804–3823.