(405d) Optimization of a Lennard-Jones Particle Monte Carlo GPU Code

Authors: 
Mick, J. R., Wayne State University
Hailat, E., Wayne State University
Russo, V., Wayne State University
Ibrahem, K., Wayne State University
Schwiebert, L., Wayne State University

Monte Carlo (MC) simulations of atomic particles are a pleasingly parallel problem [1], making them an ideal candidate for graphics processing units (GPUs), powerful single-instruction-multiple-data (SIMD) computing devices.  GPUs offer cheaper parallel processing than CPUs, thanks to their large number of compute cores.  Even so, evaluating every pair interaction in the system is often too costly, even with the parallel processing power of modern GPUs.  Simulating the 100,000+ atom biomolecular systems currently explored with molecular dynamics [2, 3] in the Gibbs or grand canonical ensembles is not feasible unless the number of non-bonded interaction evaluations is reduced significantly.
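To illustrate why evaluating every pair is costly, the following is a minimal serial Python sketch of the naive O(N²) Lennard-Jones energy sum (function names and reduced units ε = σ = 1 are illustrative assumptions, not taken from this work):

```python
def lj_energy(r2, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair energy U(r) = 4*eps*((sig/r)^12 - (sig/r)^6),
    computed from the squared distance r2 to avoid a square root."""
    s2 = sigma * sigma / r2
    s6 = s2 ** 3
    return 4.0 * epsilon * (s6 * s6 - s6)

def total_energy(coords, r_cut=2.5):
    """Naive sum over all N*(N-1)/2 pairs; only pairs inside the
    cutoff contribute, but every pair must still be examined."""
    rc2 = r_cut * r_cut
    u = 0.0
    n = len(coords)
    for i in range(n):
        xi, yi, zi = coords[i]
        for j in range(i + 1, n):
            dx = xi - coords[j][0]
            dy = yi - coords[j][1]
            dz = zi - coords[j][2]
            r2 = dx * dx + dy * dy + dz * dz
            if r2 < rc2:
                u += lj_energy(r2)
    return u
```

For N > 100,000 particles this double loop examines more than 5 billion pairs per full-energy evaluation, which motivates the neighbor-list reductions described below.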

This work details refinements to Monte Carlo simulations performed on the GPU [4] that enable the rapid simulation of 100,000-atom systems on typical desktop workstations.  Neighbor lists [5] are adapted to a form suited to the GPU's memory hierarchy [6], which consists of high-speed shared memory and registers plus low-speed global memory.  By keeping track of nearby molecules, the number of interactions to be considered is reduced significantly, which reduces looping on the CUDA-core-constrained GPU.  Further improvements include moving logic that currently resides on the CPU side (e.g., pseudo-random number generation (PRNG)) onto the GPU device, and optimizing the parallel displacement, volume swap, and particle insertion moves, in an effort to minimize the computationally expensive transfer of information between the device and the CPU over the PCI bus.  Additional speed gains are realized by using idle threads during the tree summation of energies to calculate part of the pair interactions based on the next random draws in the PRNG sequence; each kernel call for a specific move selection thus performs the necessary arithmetic for the current move and part of the next.  The benefits of these improvements are highlighted by the simulation of large (N > 100,000 particles) systems in the canonical and Gibbs ensembles.  The results of very-large-scale simulations near the critical point [7] of a tail-corrected Lennard-Jones fluid are also presented.
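The neighbor-list idea can be sketched in serial Python (the function names, skin width, and reduced units are illustrative assumptions; the actual implementation runs as CUDA kernels against the GPU memory hierarchy):

```python
def build_neighbor_list(coords, r_cut=2.5, skin=0.3):
    """Verlet-style neighbor list: for each particle, record the indices
    of all particles within r_cut + skin.  The skin margin lets the list
    be reused across several MC moves before it must be rebuilt."""
    r_list2 = (r_cut + skin) ** 2
    n = len(coords)
    neighbors = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dx = coords[i][0] - coords[j][0]
            dy = coords[i][1] - coords[j][1]
            dz = coords[i][2] - coords[j][2]
            if dx * dx + dy * dy + dz * dz < r_list2:
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors

def particle_energy(i, coords, neighbors, r_cut=2.5):
    """Energy of particle i summed only over its listed neighbors --
    the quantity needed to accept or reject a displacement move.
    Cost drops from O(N) to O(average neighbor count) per move."""
    rc2 = r_cut * r_cut
    u = 0.0
    for j in neighbors[i]:
        dx = coords[i][0] - coords[j][0]
        dy = coords[i][1] - coords[j][1]
        dz = coords[i][2] - coords[j][2]
        r2 = dx * dx + dy * dy + dz * dz
        if r2 < rc2:
            s6 = (1.0 / r2) ** 3  # reduced units: sigma = epsilon = 1
            u += 4.0 * (s6 * s6 - s6)
    return u
```

On the GPU, each per-particle neighbor loop maps naturally onto a thread block, with the partial pair energies combined by a tree summation (parallel reduction) in shared memory.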

1.            Zara, S.J. and D. Nicholson, Grand Canonical Ensemble Monte Carlo Simulation on a Transputer Array. Molecular Simulation, 1990. 5(3-4): p. 245-261.

2.            Freddolino, P.L., et al., Molecular Dynamics Simulations of the Complete Satellite Tobacco Mosaic Virus. Structure (London, England : 1993), 2006. 14(3): p. 437-449.

3.            Sanbonmatsu, K.Y., S. Joseph, and C.-S. Tung, Simulating movement of tRNA into the ribosome during decoding. Proceedings of the National Academy of Sciences of the United States of America, 2005. 102(44): p. 15854-15859.

4.            Mick, J.R., Potoff, J.J., Hailat, E., Russo, V., Schwiebert, L. GPU Accelerated Monte Carlo Simulations in the Gibbs and Canonical Ensembles. in AIChE Annual Conference. 2011. Minneapolis, MN.

5.            Verlet, L., Computer "Experiments" on Classical Fluids. I. Thermodynamical Properties of Lennard-Jones Molecules. Physical Review, 1967. 159(1): p. 98-103.

6.            Wang, P. Short Range Molecular Dynamics on GPU. in GPU Tech Conf. 2006.

7.            Potoff, J.J. and A.Z. Panagiotopoulos, Critical point and phase behavior of the pure fluid and a Lennard-Jones mixture. Journal of Chemical Physics, 1998. 109(24): p. 10914-10920.