Nishadh KA

Parallella for WRF CHEM

2014-08-21


###Parallella cluster for WRF CHEM###

1. WRF-Chem is a computationally intensive model. For example, “On a typically-sized 40×40 grid with 20 horizontal layers, the meteorological part of the simulation (the WRF weather model itself) is only 160 × 10^6 floating point operations per time step, about 2.5% the cost of the full WRF-Chem with both chemical kinetics and aerosol” [1]. The model is generally run in parallel to reduce execution time. Typically this is done on a computer cluster, using MPICH or OpenMPI to spread the model across the available processing cores. This coarse-grained parallelism [2] has limitations in terms of budget, energy cost, and computational constraints.
2. The alternative is fine-grained parallelism using coprocessors such as GPUs. Selected portions of code or modules of the model are converted to run on the GPU using tools such as CUDA Fortran, OpenMP, etc. In the long run, however, a GPU-based solution is costly for a project such as real-time air pollution modeling for Coimbatore city.
3. The solution in this situation is either an AWS-based cluster service or a custom-built cluster of ARM-based single-board computers. As a service, AWS is only a secondary preference for a funded project. For an ARM cluster, boards such as the Raspberry Pi, BeagleBone Black, and Radxa can be considered in terms of cost and energy budget. These boards have two limitations that can significantly hamper the efficiency of the cluster: their processing power cannot be extended (limited to 3-4 GFLOPS per ARM core), and they lack 1 Gb Ethernet connectivity (they provide 100 Mb Ethernet instead).
4. The Parallella single-board computer has the potential to go beyond these limits, with a 20-25 GFLOPS Epiphany III coprocessor and 1 Gb Ethernet connectivity. Moreover, the board is built for fine-grained parallel computing, and its hardware components are tuned for server operation.
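The figures quoted above allow a rough back-of-envelope comparison. The sketch below derives the full WRF-Chem cost per time step implied by the quotation in [1] (160 × 10^6 FLOP being about 2.5% of the total) and divides it by the peak GFLOPS values mentioned in this section. The function names are illustrative, and the result is an idealized lower bound that assumes peak throughput, which real code does not reach.

```python
# Back-of-envelope estimate only. The FLOP count and the 2.5% share come from
# the quotation in [1]; the GFLOPS values are the peak figures quoted in the
# text. Sustained throughput on real WRF-Chem code will be much lower.

METEOROLOGY_FLOP_PER_STEP = 160e6  # meteorology-only cost per time step [1]
METEOROLOGY_SHARE = 0.025          # "about 2.5%" of the full WRF-Chem cost [1]


def full_wrf_chem_flop(meteorology_flop=METEOROLOGY_FLOP_PER_STEP,
                       share=METEOROLOGY_SHARE):
    """Full WRF-Chem FLOP per time step implied by the quoted share."""
    return meteorology_flop / share


def ideal_seconds_per_step(total_flop, peak_gflops):
    """Time per step if the device sustained its peak rate (it will not)."""
    return total_flop / (peak_gflops * 1e9)


if __name__ == "__main__":
    total = full_wrf_chem_flop()
    print(f"Implied full WRF-Chem cost: {total:.2e} FLOP per time step")
    for label, gflops in (("ARM core, 3 GFLOPS", 3.0),
                          ("ARM core, 4 GFLOPS", 4.0),
                          ("Epiphany, 25 GFLOPS", 25.0)):
        seconds = ideal_seconds_per_step(total, gflops)
        print(f"{label}: {seconds:.3f} s per step (ideal)")
```

The estimate ignores memory bandwidth, network latency, and the precision limits of the coprocessor, so it is useful only for comparing orders of magnitude between boards.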

####About Parallella board and its cluster####

1. Parallella is a Kickstarter project by Adapteva that aims to produce a low-cost, single-board supercomputer, similar in spirit to the Raspberry Pi. In addition to a dual-core ARM-based Zynq processor, it has an Epiphany coprocessor with a claimed 25 GFLOPS. [Figure: Parallella board architecture]
2. It is available in three versions for different uses: headless microserver, desktop, and embedded system [3]. For building a cluster, a cluster kit is available for the board [4], and an implementation of a resource management tool for cluster computing is also available [4a].
3. The Epiphany is a simple MIMD compute core with 32 KB of RAM. Its floating-point capability is limited to single-precision (32-bit) addition, subtraction, and multiplication. It cannot perform operations such as divide and square root [5].
4. The processor is programmable with OpenCL, and it has been used to execute R in parallel [6] and to run quantum simulations [7]. There is active work towards implementing OpenMP for Epiphany, which was one of the Kickstarter deliverables of the project [8]. OpenCL development for the Epiphany III can be carried out using Eclipse [9].
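Because the Epiphany core only provides addition, subtraction, and multiplication, a division has to be built from those operations. The sketch below illustrates one common approach, Newton-Raphson refinement of a reciprocal. It is written in Python for readability and is an assumption about how such a routine could look, not the routine used by the Epiphany toolchain. The names `reciprocal` and `divide` are hypothetical, and Python floats are double precision, whereas the coprocessor works in single precision.

```python
import math


def reciprocal(d, iterations=5):
    """Approximate 1/d using only multiply, subtract, and exponent scaling."""
    if d == 0.0:
        raise ZeroDivisionError("reciprocal of zero")
    sign = -1.0 if d < 0.0 else 1.0
    # Split |d| into mantissa and exponent: |d| = m * 2**e with 0.5 <= m < 1.
    # On hardware this is bit manipulation of the float, not a division.
    m, e = math.frexp(abs(d))
    # Linear initial guess for 1/m on [0.5, 1): 48/17 - 32/17 * m,
    # with both constants precomputed.
    x = 2.8235294 - 1.8823529 * m
    # Each Newton-Raphson step x <- x * (2 - m * x) roughly squares the error.
    for _ in range(iterations):
        x = x * (2.0 - m * x)
    # Undo the scaling: 1/|d| = (1/m) * 2**(-e).
    return sign * math.ldexp(x, -e)


def divide(a, b):
    """Compute a / b as a multiplication by the approximate reciprocal."""
    return a * reciprocal(b)


if __name__ == "__main__":
    print(divide(1.0, 3.0))
    print(divide(10.0, -4.0))
```

Every division in a ported kernel would cost several multiplications this way, which is worth keeping in mind when estimating how much of the coprocessor's peak rate a chemistry solver could actually use.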

####Suitability and limitations of Parallella for WRF CHEM####

1. WRF was designed from the outset for parallel programming on complex computing infrastructures [10]. It has built-in support for OpenMP, but not for OpenCL, which is what Parallella currently supports. Using OpenMP (shared-memory parallelism, in the case of the Epiphany III) or MPICH (distributed-memory parallelism), WRF can be compiled on Parallella.
2. WRF is programmed in single precision [10], whereas WRF-Chem uses double-precision floating-point operations alongside single-precision ones, as reported in this presentation and note [11, 11a]. The Kinetic PreProcessor (KPP) chemistry is reported to be the single most computationally intensive floating-point component of WRF-Chem: it has 110,514 FP operations, compared with the main WRF components microphysics (2,702 FP operations) and advection/diffusion (301 FP operations) [11].
3. There is a project to parallelize this component, KPPA [12], along with a review [13] and a WRF-Chem implementation study on this aspect. The code for KPPA is available at [14], with publication details.
4. WRF has been ported to run on ARM, the main architecture of Parallella [15]. No study directly addresses running WRF with OpenCL instead of OpenMP.
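To put the operation counts from [11] in proportion, the sketch below computes each kernel's share of the three counts quoted above and the ratio of the chemistry count to the others. Only these three kernels are included, so the shares are relative to their sum and not to the whole model. The dictionary keys and function name are illustrative.

```python
# FP operation counts as quoted in the text from [11]. Only these three
# kernels are listed there, so shares are relative to their sum.
FP_OPS = {
    "chemical kinetics (KPP)": 110_514,
    "microphysics": 2_702,
    "advection/diffusion": 301,
}


def shares(counts):
    """Return each kernel's fraction of the summed operation counts."""
    total = sum(counts.values())
    return {name: ops / total for name, ops in counts.items()}


if __name__ == "__main__":
    for name, fraction in shares(FP_OPS).items():
        print(f"{name}: {fraction:.1%} of the listed operations")
    kpp = FP_OPS["chemical kinetics (KPP)"]
    for other in ("microphysics", "advection/diffusion"):
        print(f"KPP / {other}: {kpp / FP_OPS[other]:.1f}x")
```

Among the listed kernels the chemistry dominates heavily, which is why the acceleration efforts cited here concentrate on the kinetics code and not on the meteorological core.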

####Reference####

1. Linford, J. C., Michalakes, J., Vachharajani, M., & Sandu, A. Automatic Generation of Multi-Core Accelerated Chemical Kinetics for Simulation and Prediction.
2. Michalakes, J., & Vachharajani, M. (2008). GPU acceleration of numerical weather prediction. Parallel Processing Letters, 18(04), 531-548.
3. http://www.rs-online.com/designspark/electronics/eng/blog/picking-the-right-parallella-board
4. http://groundelectronics.com/products/parallella-cluster-kit

4a. http://forums.parallella.org/viewtopic.php?f=32&t=1632#p10145

5. http://www.adapteva.com/wp-content/uploads/2011/06/adapteva_mpr.pdf
6. http://forums.parallella.org/viewtopic.php?f=39&t=391
7. http://www.eetimes.com/author.asp?section_id=36&doc_id=1322907
8. http://www.parallella.org/forums/viewtopic.php?f=19&t=142
9. http://nicksparallellaideas.blogspot.com.au/2014/08/opencl-on-parallella-using-eclipse.html
10. Michalakes, J. G., McAtee, M., & Wegiel, J. (2002). Software Infrastructure for the Weather Research and Forecast Model. Proceedings of UGC 2002.
11. http://www.arsc.edu/files/arsc/science/acceleration2010/attendees/presentations/michalakes-wrf-whpcaa.pdf

11a. Linford, J. C., Michalakes, J., Vachharajani, M., & Sandu, A. (2009, November). Multi-core acceleration of chemical kinetics for simulation and prediction. In Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis (p. 7). ACM.

12. Linford, J. C., Michalakes, J., Vachharajani, M., & Sandu, A. “Multi-core acceleration of chemical kinetics for modeling and simulation”, SC’09. Portland, OR. Nov 14-20, 2009.
13. Zhang, H., Linford, J. C., Sandu, A., & Sander, R. (2011). Chemical Mechanism Solvers in Air Quality Models. Atmosphere, 2(3), 510-532.
14. http://www.paratools.com/Kppa
15. http://www.supersmith.com/site/ARM_files/wrf_on_arm.pdf

#####Notes#####

a. Accelerating Kernels from WRF on GPUs, http://vecpar.fe.up.pt/2010/workshops-PE_abs/Michalakes-slides.pdf

b. http://www.academia.edu/1469359/Porting_the_WRF_Model_to_EumedGrid_and_Simulation_of_Air_Quality_in_Urban_Zones

c. Precompiled WRF VirtualBox image, http://ronin.dgeo.udec.cl/LiveWRF/

d. https://wiki.canterbury.ac.nz/display/BlueFern/Using+OpenMP+threads,+with+or+without+MPI