As it can be seen in Fig. 3, the specification of the job includes: the virtual organisation where the job will be launched (VirtualOrganisation), the main file that will start the execution of the job (Executable), the arguments that will be used for invoking the executable (Arguments), the files in which the standard outputs will be dumped (StdOutput and StdError), and finally the result files that will be returned to the user interface (OutputSandbox).

*Type = "Job";*
*VirtualOrganisation = "biomed";*
*Executable = "test.sh";*
*Arguments = "16 21 3 2";*
*StdOutput = "std.out";*
*StdError = "std.err";*
*InputSandbox = {"/home/CA\_experiment/gridCA.c", "/home/CA\_experiment/N16k21v3t2.ca", "/home/CA\_experiment/test.sh"};*
*OutputSandbox = {"std.out", "std.err", "N16k21v3t2.ca"};*

Fig. 3. JDL example for the case of *N* = 16, *k* = 21, *v* = 3, *t* = 2.

So, the most important part of the execution lies in the program (a shell-script) specified in the *Executable* field of the description file. The use of a shell-script instead of directly using the executable (gridCA) is mandatory due to the heterogeneous nature of the Grid. Although the conditions vary between different resources, as mentioned before, the administrators of the sites are recommended to install Unix-like operating systems. This measure ensures that all the developed programs will be seamlessly executed on any machine of the Grid infrastructure. The source code must be dynamically compiled in each of the computing resources hosting the jobs. Thus, the shell-script basically works like a wrapper that looks for a *gcc*-like compiler (the source code is written in the *C* language), compiles the source code and finally invokes the executable with the proper arguments (the values of *N*, *k*, *v* and *t*, respectively).

One of the most crucial parts of any Grid deployment is the development of an automatic system for controlling and monitoring the evolution of an experiment. Basically, the system is in charge of submitting the different gLite jobs (the number of jobs is equal to the value of the parameter N), monitoring the status of these jobs, resubmitting them (in case a job has failed, or it has been successfully completed but the SA algorithm has not yet converged) and retrieving the results. This automatic system has been implemented as a master process which periodically (or asynchronously, as the name of the schema suggests) oversees the status of the jobs.

This system must possess the following properties: completeness, correctness, quick performance and efficiency in the usage of the resources. Regarding completeness, we have to take into account that an experiment involves a lot of jobs, and it must be ensured that all of them are successfully completed at the end. Correctness implies that there should be a guarantee that all jobs produce correct results which are comprehensively presented to the user, and that the data used is properly updated and coherent during the whole experiment (the master must correctly update the file with the .ca extension shown in the JDL specification in order for the Simulated Annealing algorithm to converge). The quick performance property implies that the experiment will finish as quickly as possible. In that sense, the key aspects are: a good selection of the resources that will host the jobs (according to the empirical tests performed in the preprocessing stage) and an adequate resubmission policy (sending new jobs to the resources that are being more productive during the execution of the experiment). Finally, if the on-the-fly tracking of the most productive computing resources is correctly done, the efficiency in the usage of the resources will be achieved.

**6.4 Synchronous schema**

This schema uses a sophisticated mechanism known, in Grid terminology, as the submission of *pilot jobs*. The submission of pilot jobs is based on the master-worker architecture and is supported by the DIANE (DIANE, 2011) + Ganga (Moscicki et al., 2009) tools. When the processing begins, a master process (a server) is started locally, which will provide tasks to the worker nodes until all the tasks have been completed, being then dismissed. On the other side, the worker agents are jobs running on the Worker Nodes of the Grid which communicate with the master. The master must keep track of the tasks to ensure that all of them are successfully completed, while the workers provide access to a CPU (previously reached through scheduling) which will process the tasks. If, for any reason, a task fails or a worker loses contact with the master, the master will immediately reassign the task to another worker. The whole process is depicted in Fig. 4. So, in contrast to the asynchronous schema, in this case the master is continuously in contact with the workers.

Fig. 4. Pilot jobs schema offered by DIANE-Ganga.

However, before initiating the process or execution of the master/worker jobs, it is necessary to define their characteristics. Firstly, the specification of a run must include the master configuration (workers and heartbeat timeout). It is also necessary to establish master scheduling policies such as the maximum number of times that a lost or failed task is assigned to a worker; the reaction when a task is lost or fails; and the number of resubmissions before a worker is removed. Finally, the master must know the arguments of the tasks and the files shared by all tasks (executable and any auxiliary files).
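The scheduling policies just described (bounded resubmission of lost or failed tasks, and removal of workers that fail repeatedly) can be illustrated with a minimal sketch. This is a hypothetical pure-Python simulation, not the DIANE/Ganga API; the names `run_master`, `outcome` and the two policy constants are illustrative only:

```python
# Hypothetical sketch (not the DIANE/Ganga API): a pure-Python simulation of
# the master policies described above: reassignment of lost/failed tasks, a
# cap on resubmissions per task, and removal of repeatedly failing workers.
from collections import deque

MAX_TASK_RETRIES = 3      # times a lost or failed task may be reassigned
MAX_WORKER_FAILURES = 2   # failures tolerated before a worker is removed

def run_master(task_ids, workers, outcome):
    """Assign tasks to workers until all are done or retries are exhausted.

    `outcome(task, worker)` models the Grid: True means the task completed,
    False means it failed or the worker lost contact with the master.
    """
    pending = deque(task_ids)
    retries = {t: 0 for t in task_ids}
    failures = {w: 0 for w in workers}
    completed, abandoned = [], []

    while pending:
        task = pending.popleft()
        # Workers that failed too often are removed from scheduling.
        alive = [w for w in workers if failures[w] < MAX_WORKER_FAILURES]
        if not alive:
            abandoned.append(task)
            continue
        worker = alive[task % len(alive)]  # trivial round-robin choice
        if outcome(task, worker):
            completed.append(task)
        else:
            failures[worker] += 1
            retries[task] += 1
            if retries[task] <= MAX_TASK_RETRIES:
                pending.append(task)       # immediate reassignment
            else:
                abandoned.append(task)
    return completed, abandoned

# Worker "w1" never answers (e.g. a heartbeat timeout); its tasks are
# rerouted to "w0" and it is removed after MAX_WORKER_FAILURES failures.
done, lost = run_master(range(6), ["w0", "w1"], lambda t, w: w != "w1")
```

In the sketch a heartbeat timeout or lost connection is modelled simply as `outcome(...)` returning False; DIANE itself detects and handles these events internally.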

At this point, the master can be started using the specification described above. Upon checking that all is right, the master will wait for incoming connections from the workers.

Workers are generic jobs, submitted to the Grid, that can perform any operation requested by the master. In addition, these workers must be submitted to the CEs selected in the pre-processing stage. When a worker registers with the master, the master will automatically assign it a task.

This schema has several advantages derived from the fact that a worker can execute more than one task. Only when a worker has successfully completed a task will the master assign it a new one. In addition, when a worker demands a new task it is not necessary to submit a new job. This way, the queuing time of the tasks is drastically reduced. Moreover, the dynamic behavior of this schema allows achieving better performance results in comparison to the asynchronous schema.

However, there are also some disadvantages that must be mentioned. The first issue refers to the unidirectional connectivity between the master host and the worker hosts (Grid nodes). While the master host needs inbound connectivity, the worker nodes need outbound connectivity. The connectivity problem in the master can be solved easily by opening a port in the local host; however, the connectivity in the worker will rely on the remote system configuration (the CE). So, in this case, this extra detail must be taken into account when selecting the computing resources. Another issue is defining an adequate timeout value. If, for some reason, a task working correctly suffers from temporary connection problems and exceeds the timeout threshold, the worker will be removed by the master. Finally, a key factor is to identify the right number of worker agents and tasks. In addition, if the number of workers is on the order of thousands (i.e. when *N* is about 1000), bottlenecks could appear, resulting in the master being overwhelmed by the excessive number of connections.

**7.1 Fine tuning the probability of execution of the neighborhood functions**

It is well-known that the performance of a SA algorithm is sensitive to parameter tuning. In this sense, we follow a methodology for the fine tuning of the two neighborhood functions used in our SA algorithm. The fine tuning was based on the following linear Diophantine equation

*P*<sub>1</sub>*x*<sub>1</sub> + *P*<sub>2</sub>*x*<sub>2</sub> = *q*

where *x*<sub>*i*</sub> represents a neighborhood function and its value is set to 1, *P*<sub>*i*</sub> is a value in {0.0, 0.1, ..., 1.0} that represents the probability of executing *x*<sub>*i*</sub>, and *q* is set to 1.0, which is the maximum probability of executing any *x*<sub>*i*</sub>. A solution to the given linear Diophantine equation must satisfy

∑<sub>*i*=1</sub><sup>2</sup> *P*<sub>*i*</sub>*x*<sub>*i*</sub> = 1.0

This equation has 11 solutions; each solution is an experiment that tests the degree of participation of each neighborhood function in our SA implementation in accomplishing the construction of a CA. Every combination of the probabilities was applied by SA to construct the set of CAs shown in Table 5(a), and each experiment was run 31 times; with the data obtained for each experiment we calculated the median. A summary of the performance of SA with the probabilities that solved 100% of the runs is shown in Table 5(b).

Finally, given the results shown in Fig. 5, the best configuration of probabilities was *P*<sub>1</sub> = 0.3 and *P*<sub>2</sub> = 0.7, because it found the CAs in the smallest time (median value). The values *P*<sub>1</sub> = 0.3 and *P*<sub>2</sub> = 0.7 were kept fixed in the second experiment.

In the next subsection, we will present more computational results obtained from a performance comparison carried out among our SA algorithm, a well-known greedy algorithm (IPOG\_F) and a tool named TConfig that constructs CAs using recursive functions.

**7.2 Comparing SA with the state-of-the-art algorithms**

For the second of our experiments we obtained the ACTS and TConfig software. We created a new benchmark composed of 60 ternary CA instances where 5 ≤ *k* ≤ 100 and 2 ≤ *t* ≤ 4. Moreover, the characteristics of the Grid infrastructure employed for carrying out the experiments are:

1. Infrastructure's name: EGI (European Grid Infrastructure)
2. Virtual Organisation: biomed
3. Middleware: gLite
4. Total number of WNs available: 281,530
5. Frozen factor *φ* = 11
6. According to the results shown in section 7.1, the neighborhood function N3(*s*, *x*) is applied using a probability *P* = 0.3

The SA implementation reported by (Cohen et al., 2003) for solving the CAC problem was intentionally omitted from this comparison because, as its authors recognize, this algorithm fails to produce competitive results when the strength of the arrays is *t* ≥ 3.
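The experimental design of section 7.1 is easy to reproduce in outline. The sketch below (illustrative only, not the authors' code; `probability_configurations` is a hypothetical helper) enumerates the pairs (*P*<sub>1</sub>, *P*<sub>2</sub>) on a 0.1 grid that satisfy *P*<sub>1</sub> + *P*<sub>2</sub> = 1.0, yielding the 11 solutions mentioned in the text:

```python
# Sketch of the fine-tuning design of section 7.1: with x1 = x2 = 1 the
# Diophantine constraint P1*x1 + P2*x2 = 1.0 reduces to P1 + P2 = 1.0,
# with each Pi drawn from {0.0, 0.1, ..., 1.0}. Each solution is one
# experiment (run 31 times in the chapter, taking the median).
def probability_configurations(step=0.1):
    grid = [round(i * step, 1) for i in range(int(1 / step) + 1)]
    return [(p1, round(1.0 - p1, 1)) for p1 in grid]

configs = probability_configurations()
assert len(configs) == 11        # the 11 solutions mentioned in the text
assert (0.3, 0.7) in configs     # the configuration selected via Fig. 5
```

The rounding keeps the grid values exact at one decimal place, avoiding floating-point drift when checking that each pair sums to 1.0.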
