**2. Problem statement**

### **2.1 Grid-based workflow model**

Like many popular systems handling Grid-based workflows (Deelman et al., 2004; Lovas et al., 2004; Spooner et al., 2003), our system models a workflow as a Directed Acyclic Graph (DAG). The user specifies the resources required to run each sub-job, the data transfers between sub-jobs, the estimated runtime of each sub-job, and the expected runtime of the whole workflow. In this book chapter, we assume that time is divided into slots, each equal to a specific period of real time, from 3 to 5 minutes. We use the time slot concept to limit the number of possible start and end times of sub-jobs; moreover, a delay of 3 minutes is insignificant for the customer. Table 1 presents the main parameters, including the sub-job specifications and data transfer specifications, of the sample workflow in Figure 2. The sub-job specification includes the number of CPUs (cpu), the CPU speed (speed), the amount of storage (stor), the number of experts (exp), and the required runtime (rt). The data transfer specification includes the source sub-job (S-sj), the destination sub-job (D-sj), and the amount of data (data). Note that the CPU speed of each sub-job can differ; we set all speeds to the same value for presentation purposes only.
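The workflow model above can be sketched in code. The following Python fragment is an illustrative sketch only (the class and field names are our own, not part of the chapter's system); it encodes a sub-job and a data transfer from Table 1:

```python
from dataclasses import dataclass

@dataclass
class SubJob:
    sj_id: int     # sub-job identifier (Sjs)
    cpus: int      # number of CPUs (cpu)
    speed: int     # CPU speed in MHz (speed)
    storage: int   # amount of storage in GB (stor)
    experts: int   # number of experts (exp)
    runtime: int   # required runtime in slots (rt)

@dataclass
class Transfer:
    src: int       # source sub-job (S-sj)
    dst: int       # destination sub-job (D-sj)
    data: int      # amount of data in GB (data)

# Two sub-jobs and one dependency edge taken from Table 1
sj0 = SubJob(0, 128, 1000, 30, 2, 16)
sj1 = SubJob(1, 64, 1000, 20, 1, 13)
t01 = Transfer(0, 1, 1)

# With a 3-minute slot, sub-job 0's runtime of 16 slots is 48 minutes
minutes = sj0.runtime * 3
```

The DAG itself is then simply the set of sub-jobs plus the set of transfer edges; an edge (S-sj, D-sj) means the destination cannot start before the source finishes and the data has arrived.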

#### **2.2 Grid service model**

The computational Grid includes many High Performance Computing Centers (HPCCs). The resources of each HPCC are managed by software called a local Resource Management System (RMS)<sup>1</sup>. Each RMS has its own unique resource configuration: the number of CPUs, the amount of memory, the storage capacity, the installed software, the number of experts, and the service price. To ensure that a sub-job can be executed within a dedicated time period, the RMS must support advance resource reservation, such as CCS (Hovestadt, 2003). Figure 3 depicts an example of a CPU reservation profile of such an RMS. In our model, we reserve three main types of resources: CPU, storage, and expert. The addition of further resources is straightforward.

<sup>1</sup> In this book chapter, RMS is used to represent the cluster/super computer as well as the Grid service provided by the HPCC.
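An advance reservation profile like the one in Figure 3 can be represented as a slot-indexed record of free capacity. The sketch below is a minimal illustration under our own naming assumptions (the chapter does not prescribe this data structure); one such profile would exist per resource type (CPU, storage, expert) in each RMS:

```python
class ReservationProfile:
    """Slot-indexed profile of free capacity for one resource type
    (CPU, storage, or expert) of a local RMS."""

    def __init__(self, capacity, horizon):
        # free capacity per time slot, initially the full capacity
        self.free = [capacity] * horizon

    def fits(self, amount, start, duration):
        # True if `amount` units are free in every slot of the window
        return all(self.free[t] >= amount
                   for t in range(start, start + duration))

    def reserve(self, amount, start, duration):
        # book an advance reservation for [start, start + duration)
        if not self.fits(amount, start, duration):
            raise ValueError("insufficient capacity in requested window")
        for t in range(start, start + duration):
            self.free[t] -= amount

# An RMS with 256 CPUs; reserve sub-job 0 from Table 1 (128 CPUs, 16 slots)
cpu_profile = ReservationProfile(capacity=256, horizon=100)
cpu_profile.reserve(128, start=10, duration=16)
```

After this reservation, 128 CPUs remain free in slots 10 to 25, so a second request for up to 128 CPUs in that window would still fit, while a larger one would not.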


| Sjs | cpu | speed (MHz) | stor (GB) | exp | rt (slot) | S-sj | D-sj | data (GB) |
|-----|-----|-------------|-----------|-----|-----------|------|------|-----------|
| 0   | 128 | 1000        | 30        | 2   | 16        | 0    | 1    | 1         |
| 1   | 64  | 1000        | 20        | 1   | 13        | 0    | 2    | 4         |
| 2   | 128 | 1000        | 30        | 2   | 14        | 0    | 3    | 6         |
| 3   | 128 | 1000        | 30        | 2   | 5         | 0    | 5    | 10        |
| 4   | 128 | 1000        | 30        | 2   | 8         | 1    | 7    | 3         |
| 5   | 32  | 1000        | 10        | 0   | 13        | 2    | 4    | 2         |
| 6   | 64  | 1000        | 20        | 1   | 16        | 3    | 4    | 8         |
| 7   | 128 | 1000        | 30        | 2   | 7         | 4    | 7    | 4         |

Workflow starting slot: 10


Table 1. Sample workflow specification


Fig. 3. A sample CPU reservation profile of a local RMS

Fig. 4. A sample bandwidth reservation profile of a link between two local RMSs

If two output-input-dependent sub-jobs are executed on the same RMS, the time required for the data transfer is assumed to be zero. This assumption is justified because all compute nodes of a cluster usually share a storage system such as NFS or DFS. In all other cases, a specific amount of data must be transferred within a specific period of time, which requires reserving bandwidth.
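This transfer-time rule can be stated compactly. The helper below is an illustrative sketch (function and parameter names are our own): zero slots within one RMS, otherwise the transfer volume divided by the link bandwidth, rounded up to whole slots:

```python
import math

def transfer_slots(data_gb, src_rms, dst_rms, bw_gb_per_slot):
    """Number of time slots needed to move `data_gb` GB between sub-jobs."""
    if src_rms == dst_rms:
        # Same RMS: shared storage (e.g. NFS/DFS), no transfer needed
        return 0
    # Different RMSs: the transfer uses the link, rounded up to full slots
    return math.ceil(data_gb / bw_gb_per_slot)

# 8 GB over a 3 GB/slot link takes 3 slots; inside one RMS it takes none
inter = transfer_slots(8, "RMS_A", "RMS_B", 3)
intra = transfer_slots(8, "RMS_A", "RMS_A", 3)
```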

The link capacity between two local RMSs is determined as the average available capacity between the two sites in the network, and is assumed to differ for each RMS pair. Whenever a data transfer task is required on a link, a suitable time period on the link is determined. During that period, the task can use the entire capacity, and all other tasks have to wait. Under this principle, the bandwidth reservation profile of a link looks similar to the one depicted in Figure 4. A more realistic model for bandwidth estimation (than the average capacity) can be found in Wolski (2003). Note that the choice of bandwidth estimation model has no impact on the working of the overall mechanism.
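Because each transfer occupies the whole link, the reservations on a link never overlap, and finding a time period reduces to locating the earliest gap that is long enough. The following sketch (our own formulation, not taken from the chapter) scans the reserved intervals of a link profile like the one in Figure 4:

```python
def earliest_window(busy, ready, duration):
    """Earliest start slot >= `ready` such that [start, start + duration)
    overlaps no reserved interval in `busy` (a list of (start, end) slot
    pairs).  Since a transfer takes the whole link capacity, the reserved
    intervals are disjoint, so a single sorted pass suffices."""
    start = ready
    for b_start, b_end in sorted(busy):
        if start + duration <= b_start:
            break                      # the window fits before this booking
        start = max(start, b_end)      # otherwise try right after it
    return start

# Link already busy in slots [5, 8) and [10, 12)
w1 = earliest_window([(5, 8), (10, 12)], ready=0, duration=4)   # fits at 0
w2 = earliest_window([(2, 8), (10, 12)], ready=0, duration=4)   # pushed to 12
```

The second request cannot start before slot 2 or in the 2-slot gap between the bookings, so it is placed after the last reservation, mirroring the "all other tasks have to wait" behavior described above.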

Table 2 presents the main resource configuration, including the RMS specification and the bandwidth specification, of the 6 RMSs in Figure 2. The RMS specification includes the number of CPUs (cpu), the CPU speed in MHz (speed), the amount of storage in GB (stor), and the number of experts (exp). The bandwidth specification includes the source RMS (s), the destination RMS (d), and the bandwidth in GB/slot (bw). For presentation purposes, we assume that all reservation profiles are empty. Note that the CPU speed of each RMS can differ; we set all speeds to the same value for presentation purposes only.
