**Improving Atmospheric Model Performance on a Multi-Core Cluster System**

Carla Osthoff<sup>1</sup>, Roberto Pinto Souto<sup>1</sup>, Fabrício Vilasbôas<sup>1</sup>, Pablo Grunmann<sup>1</sup>, Pedro L. Silva Dias<sup>1</sup>, Francieli Boito<sup>2</sup>, Rodrigo Kassick<sup>2</sup>, Laércio Pilla<sup>2</sup>, Philippe Navaux<sup>2</sup>, Claudio Schepke<sup>2</sup>, Nicolas Maillard<sup>2</sup>, Jairo Panetta<sup>3</sup>, Pedro Pais Lopes<sup>3</sup> and Robert Walko<sup>4</sup>

*<sup>1</sup>Laboratório Nacional de Computação Científica (LNCC), Brazil; <sup>2</sup>Universidade Federal do Rio Grande do Sul (UFRGS), Brazil; <sup>3</sup>Instituto Nacional de Pesquisas Espaciais (INPE), Brazil; <sup>4</sup>University of Miami, USA*

#### **1. Introduction**

Numerical models have been used extensively in recent decades to understand and predict weather phenomena and the climate. In general, models are classified according to their domain of operation: global (the entire Earth) or regional (a country, a state, etc.). Global models have a spatial resolution of about 0.2 to 1.5 degrees of latitude and therefore cannot represent regional-scale weather phenomena well. Their main limitation is computing power. Regional models, on the other hand, have higher resolution but are restricted to limited-area domains. Forecasting on a limited domain requires knowledge of future atmospheric conditions at the domain's borders. Therefore, regional models depend on a prior run of a global model.

OLAM (Ocean-Land-Atmosphere Model), initially developed at Duke University (Walko & Avissar, 2008), tries to combine these two approaches to provide a global grid that can be locally refined, forming a single grid. This feature allows simultaneous representation (and forecasting) of both the global and the local scale phenomena, as well as bi-directional interactions between scales.

Because of their large computational demands and execution time constraints, these models rely on parallel processing. They are executed on clusters or grids in order to exploit the architecture's parallelism and divide the simulation load. At the same time, the degree of on-chip parallelism is expected to increase significantly over the next decade, with processors containing tens or even hundreds of cores, which increases the impact of the multiple levels of parallelism on clusters. In this scenario, it is imperative to investigate the scalability of programs in environments with multilevel parallelism.

Operational models worldwide use the highest resolution that allows the model to run within the established time window on the available computer system. New computer systems are selected for their ability to run the model at even higher resolution within the same time window. Given these limitations, research on the impact of multiple levels of parallelism and multi-core architectures on the execution time of operational models is indispensable.

This chapter is based on recent work by the *Atmosfera Massiva Research Group*<sup>1</sup> on evaluating OLAM's performance and scalability in multi-core environments, both single node and cluster.

Large-scale simulations such as OLAM need a high-throughput shared storage system so that the distributed instances can access their input data and store the execution results for later analysis. One characteristic of weather and climate forecast models is that the data generated during execution is stored in a large number of small files. This has a large impact on the scalability of the system, especially on parallel file systems: the many metadata operations for opening and closing files, combined with small read and write operations, can turn the I/O routines into a significant bottleneck.

General-Purpose computation on Graphics Processing Units (GPGPU) is the use of GPUs for general-purpose computing. The highly parallel structure of modern GPUs often makes them more effective than general-purpose CPUs for a range of complex algorithms. GPUs are "many-core" processors, with hundreds of processing elements.

In this chapter, we also present recent studies that evaluate an implementation of OLAM that uses GPUs to accelerate its computations. The chapter thus gives an overview of OLAM's performance and scalability. We aim to exploit all levels of parallelism in the architectures, while also paying attention to important performance factors such as I/O.

The remainder of this chapter is structured as follows. Section 2 presents the Ocean-Land-Atmosphere Model, and Section 3 presents performance experiments and analysis. Related work is discussed in Section 4. The last section closes the chapter with final remarks and future work.

#### **2. Ocean-Land-Atmosphere Model**

This section presents the Ocean-Land-Atmosphere Model (OLAM). Its characteristics and performance issues are discussed. We also discuss the parameters used in the performance evaluation.

Fig. 1. OLAM's subdivided icosahedral mesh and Cartesian coordinate system with origin at Earth center.

OLAM was developed to extend features of the Regional Atmospheric Modeling System (RAMS) to the global domain (Pielke et al., 1992). OLAM uses many functions of RAMS, including physical parameterizations, data assimilation, initialization methods, logic and coding structure, and I/O formats (Walko & Avissar, 2008). OLAM introduces a new dynamic core based on a global geodesic grid with triangular mesh cells. It also uses a finite volume discretization of the fully compressible Navier-Stokes equations. Local refinements can be defined to cover specific geographic areas with higher resolution, and a local refinement may itself be refined recursively. The global grid and its refinements define a single grid, as opposed to the usual nested grids of regional models. Refined cells do not overlap the global grid cells; they replace them.

The model consists essentially of a global triangular-cell grid mesh with local refinement capability, the fully compressible nonhydrostatic Navier-Stokes equations, a finite volume formulation of conservation laws for mass, momentum, and potential temperature, and numerical operators that include time splitting for acoustic terms. The global domain greatly expands the range of atmospheric systems and scale interactions that can be represented in the model, which was the primary motivation for developing OLAM.

OLAM was developed in FORTRAN 90 and parallelized with the Message Passing Interface (MPI) under the Single Program Multiple Data (SPMD) model.
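To make the SPMD model mentioned in this chapter concrete, the following minimal Python sketch shows its core idea: every process runs the same program and uses only its rank to decide which grid cells it owns. The contiguous block partition, the helper names `block_range` and `spmd_step`, and the toy field are illustrative assumptions; OLAM's actual FORTRAN 90/MPI decomposition is not described in this section and is likely more elaborate. The ranks are simulated in a loop here, whereas under MPI each rank would be a separate process.

```python
from typing import Sequence


def block_range(n_cells: int, n_ranks: int, rank: int) -> range:
    """Return the contiguous cell indices owned by `rank` (hypothetical helper)."""
    if n_ranks < 1 or not 0 <= rank < n_ranks:
        raise ValueError("rank must satisfy 0 <= rank < n_ranks")
    base, extra = divmod(n_cells, n_ranks)
    # The first `extra` ranks take one additional cell, so block sizes
    # differ by at most one cell across ranks.
    start = rank * base + min(rank, extra)
    size = base + (1 if rank < extra else 0)
    return range(start, start + size)


def spmd_step(rank: int, n_ranks: int, field: Sequence[float]) -> float:
    """Same code for every rank: each one sums only the cells it owns."""
    owned = block_range(len(field), n_ranks, rank)
    return sum(field[i] for i in owned)


if __name__ == "__main__":
    field = [float(i) for i in range(10)]  # toy per-cell values
    n_ranks = 4
    # Simulate the ranks one after another; a real run would combine the
    # partial results with a collective reduction instead of a Python sum.
    partial = [spmd_step(r, n_ranks, field) for r in range(n_ranks)]
    print(partial, sum(partial))
```

The partition keeps the load difference between ranks to at most one cell, which is the simplest form of the load division that parallel atmospheric models depend on.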

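The subdivided icosahedral mesh of Fig. 1 can be made concrete with a little counting. The sketch below assumes a uniform subdivision in which every edge of the icosahedron is split into `n` segments, and it ignores local refinement. The parameter `n`, the helper names, and the Earth radius value are illustrative assumptions, not taken from OLAM. Under that assumption the mesh has 20n² triangles, 30n² edges, and 10n² + 2 vertices, and the square root of the mean triangle area gives a rough characteristic cell size.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def icosahedral_counts(n: int) -> tuple:
    """Return (triangles, edges, vertices) when each edge is split into n parts."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    triangles = 20 * n * n
    edges = 30 * n * n
    vertices = 10 * n * n + 2
    return triangles, edges, vertices


def mean_cell_length_km(n: int) -> float:
    """Square root of the mean triangle area on a sphere of Earth's radius."""
    triangles, _, _ = icosahedral_counts(n)
    sphere_area = 4.0 * math.pi * EARTH_RADIUS_KM ** 2
    return math.sqrt(sphere_area / triangles)


if __name__ == "__main__":
    for n in (1, 10, 100):
        t, e, v = icosahedral_counts(n)
        # Euler's formula for a closed surface: V - E + F = 2.
        assert v - e + t == 2
        print(n, t, e, v, round(mean_cell_length_km(n), 1))
```

Because the triangle count grows with n², doubling the number of edge divisions halves the characteristic cell length while quadrupling the number of cells, which is why resolution increases are so demanding computationally.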