**w-TG: A Combined Algorithm to Optimize the Runtime of the Grid-Based Workflow Within an SLA Context**

**1. Introduction**

In the Grid Computing environment, many users need the results of their calculations within a specific period of time. Examples of such users are weather forecasters running weather forecasting workflows and automobile producers running dynamic fluid simulation workflows Lovas et al. (2004). These users are willing to pay to get their work completed on time. However, this requirement must be agreed on by both the users and the Grid provider before the application is executed. This agreement is contained in the Service Level Agreement (SLA) Sahai et al. (2003). In general, SLAs are defined as an explicit statement of expectations and obligations in a business relationship between service providers and customers. SLAs specify the a priori negotiated resource requirements, the quality of service (QoS), and the costs. The application of such an SLA represents a legally binding contract. This is a mandatory prerequisite for the Next Generation Grids.

However, letting the owners of Grid-based workflows work directly with resource providers has two main disadvantages:


To free users from this kind of work, it is necessary to introduce a broker that handles the workflow execution for the user. We proposed a business model Quan & Altmann (2007) for the system, as depicted in Figure 1, in which the SLA workflow broker represents the user, as specified in the SLA signed with the user, and controls the workflow execution. This includes the mapping of sub-jobs to resources, signing SLAs with the service providers, monitoring, and error recovery. When the workflow execution has finished, the broker settles the accounts, pays the service providers, and charges the end-user. The profit of the broker is the difference. The value added that the broker provides is the handling of all these tasks for the end-user.

Fig. 1. Stakeholders and their business relationship (user, SLA workflow broker, and service providers, connected by SLAs)

We presented a prototype system supporting SLAs for Grid-based workflows in Quan et al. (2005; 2006); Quan (2007); Quan & Altmann (2007). Figure 2 depicts a sample scenario of running a workflow in the Grid environment.

Fig. 2. A sample running Grid-based workflow scenario (eight sub-jobs, Subjob 0–7, mapped onto six RMSs)

In the system handling the SLA-based workflow, the mapping module occupies an important position. Our ideas about Grid-based workflow mapping within the SLA context cover three main scenarios.


• Mapping heavy-communication Grid-based workflows within the SLA context, satisfying the deadline and optimizing the cost Quan et al. (2006).

• Mapping light-communication Grid-based workflows within the SLA context, satisfying the deadline and optimizing the cost Quan & Altmann (2007).

• Mapping Grid-based workflows within the SLA context with execution time optimization.

The requirement of optimizing the execution time emerges in several situations.

• In the case of catastrophic failure, when one or several resource providers are detached from the Grid system at a time, the ability to finish the workflow execution on time as stated in the original SLA is very low, and the probability of being fined for not fulfilling the SLA is nearly 100%. Within the SLA context, which relates to business, the fine is usually very high and increases with the lateness of the workflow's finishing time. Thus, the sub-jobs, which form a workflow, must be mapped to the healthy RMSs in a way which minimizes the workflow finishing time Quan (2007).

• When the Grid is busy, there are few free resources. In this circumstance, finding a feasible solution meeting the user's deadline is a difficult task. This constraint equals to find an optimizing workflow execution time mapping solution. Even when the mapping result does not meet the preferred deadline, the broker can still use it for further negotiation with the user.
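The objective shared by these situations — minimizing the workflow finishing time of a given mapping of sub-jobs to RMSs — can be made concrete with a small evaluation routine. The sketch below is illustrative only: the DAG encoding, the runtime and bandwidth tables, and the names used are assumptions of this sketch, not the chapter's exact model; only the zero-cost transfer between sub-jobs on the same RMS follows the text.

```python
# Hypothetical sketch: evaluate the finishing time (makespan) of one
# mapping of sub-jobs to RMSs. Data structures are illustrative only.

def finishing_time(dag, runtime, mapping, data, bandwidth):
    """dag: {subjob: [predecessor subjobs]}
    runtime[(subjob, rms)]: execution time of the subjob on that RMS
    mapping[subjob]: RMS chosen for the subjob
    data[(u, v)]: amount of data transferred from u to v
    bandwidth[(rms_a, rms_b)]: transfer rate between two RMSs
    """
    finish = {}
    for sj in topo_order(dag):
        start = 0.0
        for pred in dag[sj]:
            if mapping[pred] == mapping[sj]:
                transfer = 0.0            # shared storage: no transfer cost
            else:
                link = (mapping[pred], mapping[sj])
                transfer = data[(pred, sj)] / bandwidth[link]
            start = max(start, finish[pred] + transfer)
        finish[sj] = start + runtime[(sj, mapping[sj])]
    return max(finish.values())

def topo_order(dag):
    # simple DFS topological sort over the predecessor lists
    order, seen = [], set()
    def visit(n):
        if n in seen:
            return
        seen.add(n)
        for p in dag[n]:
            visit(p)
        order.append(n)
    for n in dag:
        visit(n)
    return order
```

A mapping algorithm only needs this routine as a black-box cost function: better mappings are those with a smaller returned makespan.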

The previous work proposed an algorithm, namely w-Tabu Quan (2007), to handle this problem. In the w-Tabu algorithm, a set of referent solutions, which are distributed widely over the search space, is created. From each solution in the set, we use Tabu search to find the locally minimal solution. Tabu search extends the local search method by using memory structures: when a potential solution has been visited, it is marked as "taboo" so that the algorithm does not revisit it frequently. However, this mechanism only searches the area around each referent solution. Thus, many areas containing good solutions may not be examined by the w-Tabu algorithm, and the quality of its solutions is still not as high as it should be.
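The local-search step described above can be illustrated with a generic tabu search over sub-job-to-RMS mappings. This is a sketch, not the w-Tabu implementation: the neighborhood (moving one sub-job to another RMS), the tabu tenure, and the cost function are assumptions for illustration.

```python
# Illustrative tabu search over sub-job -> RMS mappings. The move
# neighborhood, tenure, and cost function are assumed for the sketch.
from collections import deque

def tabu_search(start, rms_list, cost, iters=100, tenure=7):
    current = dict(start)
    best, best_cost = dict(current), cost(current)
    tabu = deque(maxlen=tenure)            # short-term memory of moves
    for _ in range(iters):
        candidates = []
        for sj in current:
            for rms in rms_list:
                if rms == current[sj] or (sj, rms) in tabu:
                    continue               # skip no-ops and taboo moves
                neighbor = dict(current)
                neighbor[sj] = rms
                candidates.append((cost(neighbor), sj, rms, neighbor))
        if not candidates:
            break
        c, sj, rms, neighbor = min(candidates, key=lambda t: t[0])
        tabu.append((sj, current[sj]))     # forbid moving sj straight back
        current = neighbor
        if c < best_cost:
            best, best_cost = dict(neighbor), c
    return best, best_cost
```

In a w-Tabu-style setup, this routine would be run once from each referent solution and the best of the resulting local minima kept.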

In this book chapter, we propose a new algorithm to further improve the quality of the mapping solution. The main contributions of this book chapter are:

• An algorithm based on the Genetic Algorithm, called the w-GA algorithm. According to the characteristics of the workflow, we change the working mechanism of the crossover and mutation operations. Thus, the algorithm can find a better solution than the standard GA algorithm within the same runtime.

• An analysis of the strong and weak points of the w-GA algorithm compared to the w-Tabu algorithm. We perform an extensive experiment in order to assess the quality of the w-GA algorithm in performance and runtime.

• A combined algorithm, namely w-TG, created by combining the w-GA algorithm and the w-Tabu algorithm. The experiment shows that the new algorithm finds solutions about 9% better than the w-Tabu algorithm.


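For reference, the baseline that w-GA departs from — a standard genetic algorithm over mapping vectors with one gene per sub-job — might look as follows. The encoding, truncation selection, one-point crossover, and mutation rate are all illustrative assumptions; the chapter's workflow-aware crossover and mutation operators differ from these standard ones.

```python
# Minimal standard GA over mappings encoded as lists of RMS indices,
# one gene per sub-job. Parameters and operators are assumptions.
import random

def genetic_search(n_subjobs, n_rms, cost, pop_size=20, gens=50,
                   mut_rate=0.1, seed=None):
    rng = random.Random(seed)
    pop = [[rng.randrange(n_rms) for _ in range(n_subjobs)]
           for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=cost)
        survivors = pop[:pop_size // 2]          # truncation selection
        children = []
        while len(survivors) + len(children) < pop_size:
            p1, p2 = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_subjobs)    # one-point crossover
            child = p1[:cut] + p2[cut:]
            for i in range(n_subjobs):           # per-gene mutation
                if rng.random() < mut_rate:
                    child[i] = rng.randrange(n_rms)
            children.append(child)
        pop = survivors + children               # elitist replacement
    return min(pop, key=cost)
```

A workflow-aware variant would replace the blind one-point crossover and uniform mutation with operators that respect the sub-job dependency structure, which is the direction w-GA takes.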
In the current early stage of the business Grid, there are not many users or providers, and the probability of numerous requests arriving at the same time is very low. Moreover, even when the business Grid becomes crowded, there are many periods in which only one SLA workflow request arrives at a time. Thus, in this book chapter, we assume that the broker handles one workflow running request at a time. The extension to mapping many workflows at a time will be future work.

The chapter is organized as follows. Sections 2 and 3 describe the problem and the related works, respectively. Section 4 presents the w-GA algorithm. Section 5 describes the performance experiment, while Section 6 introduces the combined algorithm w-TG and its performance. Section 7 concludes the book chapter with a short summary.

Fig. 3. A sample CPU reservation profile of a local RMS (number of CPUs available and required over time)

If two output-input-dependent sub-jobs are executed on the same RMS, it is assumed that the time required for the data transfer equals zero. This can be assumed since all compute nodes in a cluster usually use a shared storage system such as NFS or DFS. In all other cases, it is assumed that a specific amount of data will be transferred within a specific period of time, requiring the reservation of bandwidth.

The link capacity between two local RMSs is determined as the average available capacity between those two sites in the network. The available capacity is assumed to be different for each RMS couple. Whenever a data transfer task is required on a link, the possible time period on the link is determined. During that specific time period, the task can use the entire capacity, and all other tasks have to wait. Using this principle, the bandwidth reservation profile of a link will look similar to the one depicted in Figure 4. A more realistic model for bandwidth estimation (than the average capacity) can be found in Wolski (2003). Note that the kind of bandwidth estimation model does not have any impact on the working of the overall mechanism.

Fig. 4. A sample bandwidth reservation profile of a link between two local RMSs (reserved bandwidth over time on a 10 MB/s link)

**2.3 Problem specification**

Table 1. Sample workflow specification

Table 2 presents the main resource configuration, including the RMS specification and the bandwidth specification of the 6 RMSs in Figure 2. The RMS specification includes the number of CPUs (cpu), the CPU speed in MHz (speed), the amount of storage in GB (stor), and the number of experts (exp). The bandwidth specification includes the source RMS (s), the destination RMS (d), and the bandwidth in GB/slot (bw). For presentation purposes, we assume that all reservation profiles are empty. It is noted that the CPU speed of each RMS can be different; we set it to the same value for presentation purposes only.

The formal specification of the described problem includes the following elements:
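The link-reservation principle described above — a data transfer occupies the entire link capacity for its time window, and later transfers wait — can be sketched with a simple profile search. The interval-list representation and function names are assumptions for illustration, not the chapter's data structure.

```python
# Illustrative search for the earliest start time of a transfer on a
# link whose reservation profile is a sorted list of (start, end)
# windows, each occupying the full link capacity exclusively.

def earliest_slot(profile, earliest, duration):
    """profile: sorted, non-overlapping (start, end) reservations.
    Returns the earliest start >= `earliest` where `duration` fits."""
    t = earliest
    for start, end in profile:
        if t + duration <= start:      # gap before this reservation fits
            return t
        t = max(t, end)                # otherwise wait until it frees up
    return t                           # free time after the last window

def reserve(profile, earliest, duration):
    # book the found window into the profile and return its start time
    t = earliest_slot(profile, earliest, duration)
    profile.append((t, t + duration))
    profile.sort()
    return t
```

A CPU reservation profile as in Figure 3 can be handled analogously, except that partial capacity (a number of CPUs) remains available to other tasks instead of the all-or-nothing link model.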
