If there are some changes in either of the two configurations, we move to step 5. If there is no change, we move to step 4. In our example, the possible changes are presented in Figure 9.

Fig. 9. Derived configurations after exchanging

**Step 5:** Do crossover. When there is no change in step 4, we do a normal crossover with the two derived configurations. This procedure is presented in Figure 10.

Fig. 10. Normal crossover operations

**Step 6:** Do mutation. The mutation is done on the derived configurations for which there was no successful change. With each selected configuration, the mutation point is chosen randomly. At the mutation point, *rj* of *si* is replaced by another RMS in the candidate RMS set. As in the normal GA algorithm, the probability of mutating a configuration is small. We choose a random selection because the main purpose of mutation is to maintain genetic diversity from one generation of a population of chromosomes to the next. If we also used mutation to improve the quality of the configuration, the mutation operation would need a lot of time; our initial experiments show that the algorithm then cannot find a good solution within the allowable period.

**Step 7:** Reform the configuration. We return the derived configurations to the original configurations to obtain the new configurations. With our example, assume that step 4 is successful, so we have two new derived configurations as in Figure 9. The new configurations are presented in Figure 11.

Fig. 11. Newly created configurations

**5. w-GA performance and discussion**

The goal of the experiment is to measure the feasibility of the solution, its *makespan*, and the time needed for the computation. The environment used for the experiments is rather standard and simple (Intel Duo 2.8 GHz, 1 GB RAM, Linux FC5).

To do the experiment, we generated 18 different workflows which:

• Have different topologies.


The runtime of each sub-job in each type of RMS is assigned by using Formula 4.

$$rt_j = \frac{rt_i}{\frac{pk_i + (pk_j - pk_i) \cdot k}{pk_i}} \tag{4}$$

Here *pki* and *pkj* are the performances of a CPU in RMS *ri* and *rj* respectively, and *rti* is the estimated runtime of the sub-job with the resource configuration of RMS *ri*; *k* is the speed-up control factor. In the performance experiment, within each workflow, 60% of the sub-jobs have *k* = 0.5, 30% of the sub-jobs have *k* = 0.25, and 10% of the sub-jobs have *k* = 0.
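Formula 4 can be read as a small helper function; the function name and the reading of the left-hand side as the runtime on RMS *rj* are our assumptions from the surrounding definitions, not the chapter's notation:

```python
def runtime_on_rms(rt_i, pk_i, pk_j, k):
    """Formula 4 (as reconstructed): scale rt_i, the runtime estimated
    for RMS r_i with CPU performance pk_i, to an RMS r_j with CPU
    performance pk_j. k is the speed-up control factor."""
    return rt_i / ((pk_i + (pk_j - pk_i) * k) / pk_i)
```

With *k* = 0 the sub-job gains nothing from a faster CPU and keeps its runtime *rti*; with *k* = 1 the runtime scales fully with the CPU performance ratio *pki*/*pkj*.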

The complexity of the workflow depends on the number of sub-jobs in the workflow. In the experiment, we stop at 32 sub-jobs for a workflow because this is already much greater than the size of well-known scientific workflows. As far as we know, with our model of parallel-task sub-jobs, most existing scientific workflows, as described by Ludtke et al. (1999), Berriman et al. (2003) and Lovas et al. (2004), include just 10 to 20 sub-jobs.

As the differences in the static factors of an RMS, such as OS, CPU speed and so on, can easily be filtered by an SQL query, we use 20 RMSs with a resource configuration equal to or even better than the requirements of the sub-jobs. Those RMSs already have some initial workload in their resource reservation and bandwidth reservation profiles. In the experiment, 30% of the RMSs have CPU performance equal to the requirement, 60% have CPU performance 100% more powerful than the requirement, and 10% have CPU performance 200% more powerful than the requirement.
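The RMS pool described above can be sketched as follows; the dictionary layout, the function name, and the baseline requirement of 1.0 are illustrative assumptions, not the chapter's actual data model:

```python
import random

def make_rms_pool(n=20, required_perf=1.0, seed=None):
    """Build n RMSs whose CPU performance follows the experimental mix:
    30% equal to the requirement (factor 1), 60% 100% more powerful
    (factor 2), and 10% 200% more powerful (factor 3)."""
    factors = ([1.0] * round(0.3 * n)
               + [2.0] * round(0.6 * n)
               + [3.0] * round(0.1 * n))
    random.Random(seed).shuffle(factors)  # randomize which RMS gets which factor
    return [{"id": i, "cpu_perf": required_perf * f}
            for i, f in enumerate(factors)]
```

For n = 20 this yields exactly 6, 12, and 2 RMSs in the three performance classes.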

We created 20 RMSs in the experiment because this closely parallels the real situation in Grid Computing. In theory, the number of sites joining a Grid can be very large. However, in reality, this number is not so great, and the number of sites providing commercial service is even smaller. For example, the Distributed European Infrastructure for Supercomputing Applications (DEISA) has only 11 sites. More details about the description of resource configurations and workload configurations can be seen at the address: http://it.i-u.de/schools/altmann/DangMinh/desc_expe2.txt.

w-TG: A Combined Algorithm to Optimize the Runtime of the Grid-Based Workflow Within an SLA Context

**5.1 Time to convergence**

To study the convergence of the w-GA algorithm, we do three levels of experiments according to the size of the workflow. At each level, we use the w-GA to map workflows to the RMSs. The maximum number of generations is 1000. The best found *makespan* is recorded at 0, 100, 200, 400, 600, 800 and 1000 generations. The result is presented in Table 4.

Table 4. w-GA convergence experiment results

From the data in Table 4, we see a trend that the w-GA algorithm needs more generations to converge as the size of the workflow increases.

At the simple level of the experiment, we map workflows having from 7 to 13 sub-jobs to the RMSs. From this data, we can see that the w-GA converges to the same value after fewer than 200 generations in most cases.

At the intermediate level of the experiment, where we map workflows having from 14 to 20 sub-jobs to the RMSs, the situation is slightly different from the simple level. In addition to many cases showing that the w-GA converges to the same value after fewer than 200 generations, there are some cases where the algorithm finds a better solution after 600 or 800 generations.

When the size of the workflow increases to from 21 to 32 sub-jobs, as in the advanced level of the experiment, converging after fewer than 200 generations happens in only one case. In the other cases, the w-GA needs from 400 to more than 800 generations.

**5.2 Performance comparison**

We are not aware of prior work using a resource model or workflow model similar to the one stated in Section 2. To do the performance evaluation, in previous work we implemented the w-DCP, Grasp, minmin, maxmin, and suffer algorithms for our problem Quan (2007). The extensive experiment results are shown in Figure 12.

Fig. 12. Overall performance comparison among w-Tabu and other algorithms

The experiment results in Figure 12 show that the w-Tabu algorithm has the highest performance. For that reason, we only need to consider the w-Tabu algorithm in this work. To compare the performance of the w-GA algorithm with other algorithms, we map 18 workflows to RMSs using the w-GA, the w-Tabu, and the n-GA algorithms. Similar to the experiment studying the convergence of the w-GA algorithm, this experiment is also divided into three levels according to the size of the workflow. We run the n-GA algorithm with 1000 generations. We run the w-GA algorithm with 120 generations and with 1000 generations, giving the w-GA120 and w-GA1000 algorithms respectively. The purpose of running the w-GA for 1000 generations is theoretical: we want to see the limiting performance of the w-GA and the n-GA within a long enough period. Thus, for the theoretical aspect, we compare the performance of the w-GA1000, the w-Tabu and the n-GA1000 algorithms. The purpose of running the w-GA for 120 generations is practical: we want to compare the performance of the w-Tabu algorithm and the w-GA algorithm with the same runtime. For each mapping instance, the *makespan* of the solution and the runtime of the algorithm are recorded. The experiment results are presented in Table 5.

In all three levels of the experiments, we can see the domination of the w-GA1000 algorithm. In the whole experiment, the w-GA1000 found 14 better and 3 worse solutions than did the n-GA1000 algorithm and the w-Tabu algorithm. The overall performance comparison in average relative value is presented in Figure 13. From this figure, we can see that the w-GA1000 is about 21% better than the w-Tabu and the n-GA1000 algorithms. The data in Table 5 and Figure 13 also show an equal performance between the w-Tabu and the n-GA1000 algorithms.
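The average relative value used in the comparison can be computed, for instance, by normalizing each instance's *makespan* against the best *makespan* found for that instance and averaging; this normalization is our assumption about the metric, since the text does not define it explicitly:

```python
def average_relative_makespan(results):
    """results maps an algorithm name to a list of makespans, one per
    workflow instance (lists aligned across algorithms). Returns the
    mean of makespan / best-makespan-per-instance for each algorithm;
    1.0 means the algorithm found the best solution on every instance."""
    # Best makespan achieved by any algorithm, per workflow instance.
    bests = [min(vals) for vals in zip(*results.values())]
    return {
        algo: sum(m / b for m, b in zip(makespans, bests)) / len(makespans)
        for algo, makespans in results.items()
    }
```

Under this metric, an algorithm that is 21% better on average would show a correspondingly lower mean relative makespan.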
