**5.1.1.1 RegCM on gLite**

The present-day status of the RegCM software and of the gLite GRID infrastructure makes the GRID not really suitable for long production runs, which require between 64 and 256 CPUs. A well-implemented MPI handling mechanism such as MPI-Start, however, makes small to medium-sized RegCM simulations feasible. Data transfer to and from the GRID Storage Elements is still a matter of concern, and its impact on performance should be investigated thoroughly in the future.

The GRID could therefore be used proficiently for physical and technical testing of the code by developers and users, as well as for "parametric" simulations, that is, running many shorter or smaller simulations with different parameterisations at the same time.
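A "parametric" campaign of this kind amounts to enumerating every combination of the parameter values under test and submitting one small job per combination. The following is a minimal sketch of that enumeration only. The parameter names `icup` and `ibltyp` and their values are placeholders chosen for illustration, not settings taken from the experiments described here.

```python
from itertools import product


def parameter_sweep(space):
    """Return one dict per combination of the values listed in `space`."""
    names = sorted(space)
    combos = product(*(space[name] for name in names))
    return [dict(zip(names, values)) for values in combos]


def run_label(params):
    """Build a short, reproducible label for one parameter combination."""
    return "_".join(f"{key}{value}" for key, value in sorted(params.items()))


# Placeholder parameter space (illustrative names and values only).
space = {"icup": [2, 4], "ibltyp": [1, 2]}
runs = parameter_sweep(space)

for params in runs:
    # Each combination would become one independent, short simulation.
    print(run_label(params), params)
```

Because the runs are independent of one another, they can be submitted to whichever resources happen to be free, which is the situation in which the GRID is most useful.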

**5.1.1.2 RegCM across Garuda and gLite by means of GridWay**

On the GARUDA grid infrastructure, RegCM can easily be compiled by the user herself by connecting to the MPI clusters she plans to use. Once the executable is available on the cluster as local software, simulations can be submitted by means of the GridWay metascheduler. At the moment, however, there is still a big concern about data storage and movement, because data management tools and services on GARUDA still have to be provided. Users can currently only move simulation data back and forth between the grid resources and the local resources by means of the globus-url-copy command, a solution that is far from acceptable for large simulations which produce terabytes of data.

For this reason, an approach based on the netCDF OPeNDAP protocol allows us to run the model easily, without any concern about the status of the data management.

As said, the actual submission is then done using the GridWay scheduler, which is able to reach all the MPI resources made available on the GARUDA GRID. Since GridWay is now perfectly able to manage both gLite and Globus resources, RegCM simulations can be submitted easily and transparently on both Indian and European GRID resources. A single job description file can be used, modifying just the hostname of the resource where you want to run.

This working environment is presently made available by MILU, the interoperable user interface, and the GridWay package preconfigured as discussed in the sections above.

**5.1.2 Overall achievements**

The regional climate model, version 4.1.1, was ported to and tested on both Indian and European infrastructures, and the feasibility and performance of executing the code on both HPC and grid infrastructures were tested. The performance of the code on the different platforms was observed to be comparable when the number of CPU cores is taken into account. Performance was measured in terms of execution speed and the time taken to complete one execution.

Data management issues have been solved by means of the OPeNDAP approach, which proved to be efficient and feasible. This is a considerable result, as it allows all the available computational resources to be exploited in a similar manner without the need to move terabytes of data back and forth.

Finally, different simulations have been performed on the South Asia CORDEX domain to find the best-suited configuration of parameters and to tune the model to obtain the best possible results for the Indian sub-continent. The tuning is being done by performing various experiments using a different set of parameters in each simulation, for instance, using

As said, the RegCM preprocessing part requires a large data ensemble (several TBytes) to be locally available every time it is run. This is practically impossible to accomplish on the GRID, so the preprocessing always needs to be performed locally before the job is submitted. A typical run starts by transferring the input data, previously stored on an SE, to the computing resource. This can be accomplished by a pre-run hook script. Afterwards the actual execution is handled transparently by MPI-Start on the requested resources. Once the execution is over, the data produced by the simulation are transferred to an SE and the job terminates. This is accomplished by a post-run hook script. In this way a RegCM simulation can be run on GRID resources.
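The transfers performed by the two hook scripts can be sketched as follows. This is only an illustration, not the scripts used in the project: it assumes that the LCG data management clients `lcg-cp` (copy a catalogued file to local disk) and `lcg-cr` (copy a local file to an SE and register it) are the transfer tools, which the text does not state. The VO name, SE host, catalogue directory and file names are all hypothetical. The function only builds the command lines and executes nothing.

```python
def staging_plan(vo, se_host, lfn_dir, inputs, outputs, workdir):
    """Build (stage, argv) pairs for the pre-run and post-run transfers.

    Nothing is executed here; the caller decides how to run the commands.
    """
    plan = []
    for name in inputs:
        # Pre-run hook: fetch each input file from the grid catalogue.
        plan.append(("pre-run", [
            "lcg-cp", "--vo", vo,
            f"lfn:{lfn_dir}/{name}", f"file:{workdir}/{name}",
        ]))
    for name in outputs:
        # Post-run hook: copy each output file to the SE and register it.
        plan.append(("post-run", [
            "lcg-cr", "--vo", vo, "-d", se_host,
            "-l", f"lfn:{lfn_dir}/{name}", f"file:{workdir}/{name}",
        ]))
    return plan


# Hypothetical names, for illustration only.
plan = staging_plan(
    vo="myvo",
    se_host="se.example.org",
    lfn_dir="/grid/myvo/regcm/run01",
    inputs=["domain.nc", "icbc.nc"],
    outputs=["output.nc"],
    workdir="/tmp/regcm",
)
for stage, argv in plan:
    print(stage, " ".join(argv))
```

The MPI execution itself sits between the two stages and is left to MPI-Start, so it does not appear in the plan.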

To run RegCM properly on an MPI-Start enabled resource, the code must be compiled in advance. This means that the RegCM software should be made available on the Grid resources by the VO software managers, and the grid site will then publish the availability of the software.

Not many CEs support MPI-Start and the new MPI-related attributes in JDL files, so we provide the possibility of running RegCM (or any other MPI parallel application) through a "relocatable package" approach.

With this approach all the software needed, starting from an essential OpenMPI distribution, is moved to the CEs by the job. All the libraries needed by the program have to be precompiled elsewhere and packaged for easy deployment on whatever architecture the job lands on. The main advantage of this solution is that it will run on almost every machine available on the GRID, and the user does not even need to know which resource the GRID has assigned to the job. The code itself must be compiled against the same "relocatable" libraries and shipped to the CE by the job.

This alternative approach allows a user to run a small RegCM simulation on any kind of resource available, even one that is not MPI-Start enabled, although it is actually aimed at SMP resources, which are quite widely available nowadays. The main drawback of this solution is that a precompiled MPI distribution will not take advantage of any high-speed network available and will generally not be able to use more than one computing node. The "relocatable" solution is nevertheless able to use an available SMP resource, which makes it a reasonably good way to run small RegCM simulations on any GRID resource available.
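Once the package has been unpacked on the worker node, the job has to point its environment at the shipped MPI runtime and libraries. The sketch below shows one way this could be done. It assumes a conventional `bin/` and `lib/` layout inside the package and relies on the `OPAL_PREFIX` environment variable, which Open MPI reads to locate an installation that has been moved from its build-time prefix. The unpack directory is hypothetical, and the actual package layout used for RegCM is not described in the text.

```python
import os


def relocatable_env(prefix, base_env=None):
    """Return a copy of the environment pointing at a relocated package.

    `prefix` is the directory where the job unpacked the package.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Tell Open MPI where its installation now lives.
    env["OPAL_PREFIX"] = prefix
    # Put the shipped executables and shared libraries first in the search paths.
    env["PATH"] = os.pathsep.join(
        filter(None, [os.path.join(prefix, "bin"), env.get("PATH", "")])
    )
    env["LD_LIBRARY_PATH"] = os.pathsep.join(
        filter(None, [os.path.join(prefix, "lib"), env.get("LD_LIBRARY_PATH", "")])
    )
    return env


# Hypothetical unpack directory on the worker node.
env = relocatable_env("/tmp/regcm-pkg", {"PATH": "/usr/bin"})
print(env["PATH"])
print(env["LD_LIBRARY_PATH"])
```

The resulting dictionary can be passed as the `env` argument of `subprocess.run` when launching the shipped `mpirun`, so that nothing on the worker node needs to be installed or modified.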

RegCM4.1.1 manages all I/O operations through the netCDF data format, provided by the netCDF library. This allows the use of the OPeNDAP Data Access Protocol (DAP), a protocol for requesting and transporting data across the web. All RegCM input/output operations can therefore be done remotely, with no need to upload or download data through grid data tools. Any data server providing the OPeNDAP protocol (for instance a THREDDS server) can be used to provide the global datasets for creating DOMAIN and ICBC, without downloading the whole global dataset but just the required subset in space and time. This means that even the pre-processing phase can easily be done on any e-infrastructure which provides outbound connectivity. The netCDF library must be compiled with OPeNDAP support for this approach to work. The scheme is then simply to submit the correct input file, in which a URL can be used as the path in the inpter and inpglob variables of the regcm.in file instead of the usual data path on the server itself. A command-line web downloader such as curl must also be installed on the system as a prerequisite, along with its development libraries, to enable the OPeNDAP remote data access capabilities of the netCDF library.
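The change to the input file described above is small: only the values of `inpter` and `inpglob` differ between a local run and an OPeNDAP run. A minimal sketch of that substitution follows. The variable names come from the text, while the namelist group names in the sample, the local paths and the server URLs are assumptions made up for illustration.

```python
import re


def point_inputs_at_url(namelist_text, inpglob_url, inpter_url):
    """Return namelist text with inpglob and inpter set to the given URLs."""

    def set_value(text, key, value):
        # Match "key = '...'" at the start of a line and swap the quoted value.
        pattern = re.compile(rf"^(\s*{key}\s*=\s*)'[^']*'", re.MULTILINE)
        new_text, count = pattern.subn(lambda m: f"{m.group(1)}'{value}'", text)
        if count != 1:
            raise ValueError(f"expected exactly one {key} entry, found {count}")
        return new_text

    text = set_value(namelist_text, "inpglob", inpglob_url)
    return set_value(text, "inpter", inpter_url)


# Illustrative fragment of a regcm.in file (group names and paths are assumed).
sample = """&terrainparam
 inpter = '/data/SURFACE',
/
&globdatparam
 inpglob = '/data/GLOBAL',
/
"""

# Hypothetical OPeNDAP endpoints.
remote = point_inputs_at_url(
    sample,
    inpglob_url="http://example.org/thredds/dodsC/global",
    inpter_url="http://example.org/thredds/dodsC/surface",
)
print(remote)
```

Everything else in the input file stays the same, which is what allows a single configuration to be reused on any resource with outbound connectivity.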

We performed several experiments to estimate how feasible it is to perform all the steps required for a RegCM climate simulation on the grid while hiding all the data management complexity within the netCDF library and the OPeNDAP protocol. A detailed report is in preparation.
