(719a) Applying Advances in Parallel Computing to Pyrolysis Modeling Algorithms
Detailed modeling of complex kinetic networks remains a daunting task. The computational cost of a single simulation rises quickly as the kinetic network grows. This is particularly true for combustion, oxidation and pyrolysis mechanisms, where considering several thousands of species is no exception. Moreover, the computational power of a single core has leveled off in recent years, so the simulation time rises drastically with increasing network size. Simulation codes can nowadays be sped up in several ways. A first option is to rewrite the code (for example with OpenMP [1]) so that it executes in parallel on the central processing unit (CPU), since most CPUs contain more than one core. Another option that has recently become available is parallel programming on graphics processing units (GPUs). Both C programmers (CUDA from Nvidia [2]) and Fortran programmers (CUDA Fortran from PGI [3]) have tools available that simplify the way GPU subroutines are written and called. However, most researchers these days lack the necessary programming knowledge and stick to using and writing slower, sequential codes.
To illustrate the possible speedup for combustion and pyrolysis applications, we have applied these advances in parallel computing to free-radical reaction mechanisms of different sizes and characteristics implemented in COILSIM1D, a recently developed single-event microkinetic (SEMK) model for simulating the steam cracking process. COILSIM1D can simulate the cracking of a very broad range of industrially relevant feedstocks, from ethane and LPG, over naphtha, to gas oils and vacuum gas oils. The duration of a single simulation, however, increases drastically as the feedstock becomes more complex, because the reaction mechanism that must be considered grows (see Figure 1). The highly parallel structure of certain parts of the code makes COILSIM1D an ideal candidate for hybrid CPU/GPU calculations. Since COILSIM1D was originally written in Fortran, the authors preferred CUDA Fortran from PGI over CUDA C from Nvidia. To the authors' knowledge, this is the first time that CUDA Fortran has been applied to combustion or pyrolysis applications. Examples will be given of how researchers should modify their code to take full advantage of the advances made in parallel computing.
The first step in speeding up the code is to identify the calculations that consume most of the time, so-called profiling. Profiling of COILSIM1D has shown that, as expected, the calculation of the rates of production is one of the more costly calculations (see Figure 2). When both the temperature and pressure profiles are given and only the material balances need to be solved, this part of the calculation takes about 50% of the total time. However, when the energy and momentum balances also need to be solved, it uses only between 10 and 30% of the total time (Cases 2 and 3 in Figure 2). In these cases the evaluation of the mixture viscosity becomes the most time-consuming calculation, taking between 75 and 85% of the total time.
The viscosity of the reacting gas mixture in COILSIM1D is calculated according to Sutherland's formula with the coefficients derived from Wilke's formula [4]. Since these Wilke coefficients form a two-dimensional matrix with one entry per pair of species, the algorithm that calculates the viscosity is of order O(n²) in the number of species n. The computational cost of the viscosity therefore rises quadratically with increasing network size (see Figure 3), which explains why this calculation takes up so much of the total simulation time. Since the Wilke coefficients are independent of each other, they can easily be evaluated in parallel, which makes the calculation well suited to the highly parallel computational capabilities of a GPU.
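The quadratic cost and the independence of the coefficients can be illustrated with a minimal sketch of the standard Wilke mixing rule. This is plain Python for readability, not the CUDA Fortran routine used in COILSIM1D, whose exact formulation is not given here. The function names and the input values in the usage note are hypothetical.

```python
import math

def wilke_phi(mu_i, mu_j, m_i, m_j):
    """Wilke coefficient phi_ij for one ordered pair of species.

    mu_*: pure-component viscosities, m_*: molar masses.
    Depends only on the properties of species i and j, so every
    (i, j) pair can be evaluated independently of all others.
    """
    numerator = (1.0 + math.sqrt(mu_i / mu_j) * (m_j / m_i) ** 0.25) ** 2
    denominator = math.sqrt(8.0 * (1.0 + m_i / m_j))
    return numerator / denominator

def mixture_viscosity(x, mu, m):
    """Mixture viscosity from mole fractions x, viscosities mu, molar masses m."""
    n = len(x)
    # n * n independent evaluations: this double loop is the O(n^2) part.
    # On a GPU, each (i, j) entry could be assigned to its own thread.
    phi = [[wilke_phi(mu[i], mu[j], m[i], m[j]) for j in range(n)]
           for i in range(n)]
    total = 0.0
    for i in range(n):
        weighted_sum = sum(x[j] * phi[i][j] for j in range(n))
        total += x[i] * mu[i] / weighted_sum
    return total
```

For example, `mixture_viscosity([0.4, 0.6], [2.5e-5, 2.7e-5], [30.07, 28.05])` evaluates a two-component mixture with illustrative placeholder viscosities. Doubling the number of species quadruples the number of `wilke_phi` calls.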
Figure 4 shows the speedup at different network sizes when COILSIM1D is run on a hybrid CPU/GPU system. The system is equipped with an Intel Xeon E5620 processor and 6 GB of memory, extended with an Nvidia Tesla C2075 card for the hybrid CPU/GPU calculations. The highest speedup is obtained at the larger network sizes. For example, the simulation time for a network with 610 components is reduced from 72.4 s to 20.8 s when the hybrid CPU/GPU version is used, a speedup of about 3.5. For smaller network sizes the speedup is negligible or even lower than one. This is caused by the increased relative importance of the copy operations between CPU memory and GPU memory. Figure 4 also shows the theoretical maximum speedup of the program. At large network sizes our algorithm closely approaches this maximum, so additional speedup from further optimizing this part of the calculation will be difficult, if not impossible, to obtain.
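The reported numbers can be put in perspective with a short sketch. It assumes that the theoretical maximum is an Amdahl-type bound in which the offloaded part takes no time at all. How the maximum in Figure 4 was actually computed is not stated, so this is an assumption. The copy-overhead model and its example inputs are hypothetical as well.

```python
def measured_speedup(t_serial, t_hybrid):
    """Ratio of sequential to hybrid wall-clock time."""
    return t_serial / t_hybrid

def amdahl_limit(offloaded_fraction):
    """Upper bound on speedup if the offloaded fraction cost nothing."""
    return 1.0 / (1.0 - offloaded_fraction)

def hybrid_speedup(offloaded_fraction, kernel_speedup, copy_fraction):
    """Speedup when the offloaded part runs kernel_speedup times faster
    and host/device copies add copy_fraction of the serial runtime
    (hypothetical model, all times relative to the serial run)."""
    remaining = 1.0 - offloaded_fraction
    accelerated = offloaded_fraction / kernel_speedup
    return 1.0 / (remaining + accelerated + copy_fraction)

# Reported run times for the 610-component network.
print(round(measured_speedup(72.4, 20.8), 2))   # 3.48

# Viscosity share of 75 to 85% of the run time (from the profiling results).
print(round(amdahl_limit(0.75), 2))             # 4.0
print(round(amdahl_limit(0.85), 2))             # 6.67

# Hypothetical small network: little work offloaded, large copy overhead.
print(hybrid_speedup(0.3, 10.0, 0.4) < 1.0)     # True
```

Under these assumptions, a viscosity share of 75 to 85% bounds the speedup between roughly 4 and 6.7. The last line shows how a copy cost that is large relative to the offloaded work pushes the speedup below one.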
Moving the viscosity calculation to the GPU also changes the distribution of the major time consumers. Figure 5 shows this distribution after the speedup: the calculation of the rates of production is now the largest contributor. Further speedup of the program (with a theoretical maximum of 2.5) could be obtained if the GPU were also used for the calculation of the rates of production, in line with Shi et al. [5].
Figure 5: Relative time consumption of different subroutines for CPU/GPU hybrid calculations.
(Case 1: only material balances, Case 2 & 3: Material, energy and momentum balances)
The case studies investigated reveal the major advantage of using a GPU in addition to a CPU: the combination can significantly reduce the simulation time, especially for large reaction mechanisms. Significant speedup is obtained by calculating the viscosity on the GPU when momentum and energy balances need to be solved in addition to the material balances. Further speedup is possible by also using the GPU to calculate the rates of production. These GPU routines can nowadays be programmed in an environment familiar to the researcher, since both C and Fortran versions are available.
[1] OpenMP Application program interface, OpenMP, 2011.
[2] CUDA API reference manual, Nvidia, 2012.
[3] CUDA Fortran: Programming guide and reference, The Portland Group, 2012.
[4] P.J.M. Reid R.C., Poling B.R., Properties of gases and liquids, McGraw-Hill, 1979.
[5] Y. Shi, W.H. Green Jr, H.-W. Wong, O.O. Oluwole, Combustion and Flame 158 (2011) 836.