**6.3 Asynchronous schema**

Once the computing elements to which the jobs will be submitted have been selected, the next step is to specify the jobs correctly. To do so, the specification must be written in the gLite Job Description Language (JDL). An example of a JDL file is shown in Fig. 3.
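An experiment involves many jobs, and different covering-array instances use different values of *N*, *k*, *v* and *t*, so writing each JDL file by hand quickly becomes tedious. The following Python sketch generates a description with the same fields as Fig. 3. The helper name `make_jdl` is hypothetical. The file-naming pattern `N{n}k{k}v{v}t{t}.ca` is an assumption extrapolated from the single example in Fig. 3.

```python
def make_jdl(n: int, k: int, v: int, t: int,
             vo: str = "biomed",
             base_dir: str = "/home/CA_experiment") -> str:
    """Build a gLite JDL string shaped like Fig. 3 for one covering-array job.

    Hypothetical helper: the .ca naming pattern N{n}k{k}v{v}t{t}.ca is
    assumed from the single example in Fig. 3.
    """
    ca_file = f"N{n}k{k}v{v}t{t}.ca"
    inputs = [f"{base_dir}/gridCA.c", f"{base_dir}/{ca_file}", f"{base_dir}/test.sh"]
    outputs = ["std.out", "std.err", ca_file]

    def jdl_list(items):
        # JDL lists are written as {"a", "b", ...}
        return "{" + ", ".join(f'"{x}"' for x in items) + "}"

    lines = [
        'Type = "Job";',
        f'VirtualOrganisation = "{vo}";',
        'Executable = "test.sh";',
        f'Arguments = "{n} {k} {v} {t}";',  # N, k, v, t, in that order
        'StdOutput = "std.out";',
        'StdError = "std.err";',
        f"InputSandbox = {jdl_list(inputs)};",
        f"OutputSandbox = {jdl_list(outputs)};",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_jdl(16, 21, 3, 2))
```

Writing the returned string to a file yields a description equivalent to Fig. 3. That file can then be handed to the gLite workload management system for submission.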

```
Type = "Job";
VirtualOrganisation = "biomed";
Executable = "test.sh";
Arguments = "16 21 3 2";
StdOutput = "std.out";
StdError = "std.err";
InputSandbox = {"/home/CA_experiment/gridCA.c", "/home/CA_experiment/N16k21v3t2.ca", "/home/CA_experiment/test.sh"};
OutputSandbox = {"std.out", "std.err", "N16k21v3t2.ca"};
```
Fig. 3. JDL example for the case of *N* = 16, *k* = 21, *v* = 3, *t* = 2.

As can be seen in Fig. 3, the job specification includes the following elements:

- the virtual organisation where the job will be launched (VirtualOrganisation);
- the main file that starts the execution of the job (Executable);
- the arguments used to invoke the executable (Arguments);
- the files to which the standard output and standard error are written (StdOutput and StdError);
- the files transferred along with the job (InputSandbox);
- the result files that will be returned to the user interface (OutputSandbox).

So, the most important part of the execution lies in the program (a shell script) specified in the *Executable* field of the description file. Using a shell script instead of invoking the executable (gridCA) directly is mandatory because of the heterogeneous nature of the Grid. Although conditions vary between resources, as mentioned before, site administrators are recommended to install Unix-like operating systems. This measure ensures that the developed programs can run seamlessly on any machine of the Grid infrastructure, provided that the source code is compiled dynamically on each of the computing resources hosting the jobs. Thus, the shell script basically acts as a wrapper: it looks for a *gcc*-like compiler (the source code is written in *C*), compiles the source code and finally invokes the executable with the proper arguments (the values of *N*, *k*, *v* and *t*, respectively).

One of the most crucial parts of any Grid deployment is the development of an automatic system for controlling and monitoring the evolution of an experiment. This system is in charge of four tasks:

- submitting the different gLite jobs (the number of jobs is equal to the value of the parameter *N*);
- monitoring the status of these jobs;
- resubmitting jobs, either when a job has failed or when it has completed successfully but the Simulated Annealing (SA) algorithm has not yet converged;
- retrieving the results.

This automatic system has been implemented as a master process that periodically (or asynchronously, as the name of the schema suggests) oversees the status of the jobs.

This system must possess the following properties: completeness, correctness, quick performance and efficient use of resources.

- Completeness: an experiment will involve many jobs, and it must be ensured that all of them have completed successfully by the end.
- Correctness: there must be a guarantee that all jobs produce correct results, that these results are presented to the user in a comprehensible way, and that the data used is properly updated and coherent during the whole experiment. For example, the master must correctly update the file with the .ca extension shown in the JDL specification so that the Simulated Annealing algorithm can converge.
- Quick performance: the experiment should finish as quickly as possible. Two aspects are key here. The first is a good selection of the resources that will host the jobs, based on the empirical tests performed in the preprocessing stage. The second is an adequate resubmission policy, which sends new jobs to the resources that prove most productive during the execution of the experiment.
- Efficient use of resources: this is achieved if the most productive computing resources are correctly tracked on the fly.

Due to the asynchronous behavior of this schema, the number of slaves (jobs) that can be submitted (the maximum size of *N*) is limited only by the infrastructure. However, other schemas, such as the one presented in the next subsection, could achieve better performance in certain scenarios.

Using Grid Computing for Constructing Ternary Covering Arrays 237

**6.4 Synchronous schema**

This schema uses a sophisticated mechanism known, in Grid terminology, as the submission of *pilot jobs*. The submission of pilot jobs is based on the master-worker architecture and is supported by the DIANE (DIANE, 2011) + Ganga (Moscicki et al., 2009) tools.

When processing begins, a master process (a server) is started locally. It provides tasks to the worker nodes until all the tasks have been completed, after which the master is dismissed. On the other side, the worker agents are jobs running on the Working Nodes of the Grid that communicate with the master.

The master must keep track of the tasks to ensure that all of them are completed successfully. Each worker provides access to a CPU, previously obtained through scheduling, on which the tasks are processed. If, for any reason, a task fails or a worker loses contact with the master, the master immediately reassigns the task to another worker.

The whole process is illustrated in Fig. 4. So, in contrast to the asynchronous schema, in this case the master is continuously in contact with the slaves.

Fig. 4. Pilot jobs schema offered by DIANE-Ganga.

However, before starting the execution of the master/worker jobs, it is necessary to define their characteristics:

- Master configuration: the specification of a run must include the workers and the heartbeat timeout.
- Master scheduling policies: these include the maximum number of times a lost or failed task is assigned to a worker, the reaction when a task is lost or fails, and the number of resubmissions before a worker is removed.
- Task inputs: the master must know the arguments of the tasks and the files shared by all tasks (the executable and any auxiliary files).

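The master scheduling policies for the synchronous schema can be illustrated with a small, self-contained Python model of the master's bookkeeping. This is a sketch under stated assumptions, not DIANE's implementation or API:

- The class names, the `MasterPolicy` fields and their default values are hypothetical.
- Workers are simulated in-process.
- The heartbeat mechanism is not modelled. A task lost through a heartbeat timeout is simply reported as a failure.
- The plain FIFO queue does not stop a reassigned task from returning to the same worker.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class MasterPolicy:
    # Hypothetical knobs mirroring the policies listed in the text;
    # these are NOT DIANE's real configuration names.
    max_task_attempts: int = 3    # times a failed or lost task may be assigned
    max_worker_failures: int = 2  # failures tolerated before a worker is removed


@dataclass
class Master:
    tasks: list
    policy: MasterPolicy = field(default_factory=MasterPolicy)

    def __post_init__(self):
        self.pending = deque(self.tasks)            # tasks waiting for a worker
        self.attempts = {t: 0 for t in self.tasks}  # assignments per task
        self.worker_failures = {}
        self.removed_workers = set()
        self.done = set()
        self.abandoned = set()

    def next_task(self, worker):
        """Give the next pending task to a worker (None if removed or nothing pending)."""
        if worker in self.removed_workers or not self.pending:
            return None
        task = self.pending.popleft()
        self.attempts[task] += 1
        return task

    def report(self, worker, task, ok):
        """Record an outcome; a lost task (e.g. heartbeat timeout) is reported as ok=False."""
        if ok:
            self.done.add(task)
            return
        failures = self.worker_failures.get(worker, 0) + 1
        self.worker_failures[worker] = failures
        if failures >= self.policy.max_worker_failures:
            self.removed_workers.add(worker)
        if self.attempts[task] < self.policy.max_task_attempts:
            self.pending.append(task)  # requeued for the next worker that asks
        else:
            self.abandoned.add(task)

    def finished(self):
        return len(self.done) + len(self.abandoned) == len(self.tasks)


def simulate():
    """Two hypothetical workers: w1 always fails, w2 always succeeds."""
    master = Master(tasks=["t0", "t1", "t2"])
    flaky = {"w1"}
    while not master.finished():
        for worker in ("w1", "w2"):
            task = master.next_task(worker)
            if task is not None:
                master.report(worker, task, ok=worker not in flaky)
    return master


if __name__ == "__main__":
    m = simulate()
    print(sorted(m.done), m.removed_workers, m.attempts)
```

In the simulated run, the always-failing worker `w1` is removed after its second failure. The two tasks it failed are picked up by `w2`, so every task ends up completed. A task that exhausts `max_task_attempts` is recorded as abandoned instead of being requeued indefinitely.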