**4. Overall system architecture**

In this research we focus on the problem of allocating resources for a given bag of PBDT tasks. The bag-of-tasks consists of a set of independent PBDT tasks, all of which must be executed successfully. The Grid system consists of n nodes, collectively represented by a set Δ. Each individual PBDT task in the given bag-of-tasks may be divided into a number of sub-tasks called *jobs*, which can be executed in parallel, independently of each other. As discussed, PBDT tasks are resource-intensive tasks that use a large amount of computing resources and communication bandwidth. Usually, once a node starts processing a PBDT task, pre-empting the task is counter-productive because it wastes the effort of transferring the raw-data file to that node. Also, because of the heavy demand for computing power, a node is assumed to process only one PBDT task at a time. In this research we have made the following two assumptions regarding the running of constituent jobs of a task on Grid nodes.


For the cost analysis of the proposed architecture, we measure *cost* by the time (in seconds) spent in performing a particular communication or computation job. We have chosen one megabyte as the unit of data. When a particular node i accesses data in node j, the communication cost of transporting a data unit from node i to node j is designated by d(i,j). It is assumed that the communication costs are metrics, which implies, among other properties, that they are non-negative. The computing cost of a node m is represented by Cpm, which is the cost of processing a unit of data at that node. The set of all the nodes in the system is represented by **Δ**. To represent the computing costs, a vector of |**Δ**| dimensions denoted by [Cp] is used, which holds the values of the computing costs of all the nodes in the system. A matrix [Cc] of dimensions |**Δ**| x |**Δ**| denotes the values of the communication costs between all the nodes in the system. The objective of this research is to assign resources to tasks in such a manner that the total cost of executing the given bag-of-tasks is minimized, where the total cost is defined as the total time spent by a task at all the resources it has used during its execution. The total cost indicates the total resource usage for executing a task, and hence the minimization of the total cost is a system objective.
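The cost model above can be illustrated with a minimal Python sketch. It is only one plausible reading of the text, not the authors' formulation: the `Job` record, the split into input and output data sizes, and the additive formula (transfer in, process, transfer out) are assumptions introduced here for illustration. The vector `cp` stands for [Cp] and the matrix `cc` for [Cc], both expressed in seconds per megabyte, and `is_metric` checks the standard metric properties of the communication costs.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Job:
    """One job of a task (hypothetical record, not defined in the text)."""
    in_mb: float   # raw data shipped from the source node to the worker
    out_mb: float  # processed data shipped from the worker to the sink
    src: int       # index of the source node
    node: int      # index of the node that processes the job
    sink: int      # index of the sink node


def is_metric(cc: Sequence[Sequence[float]], tol: float = 1e-9) -> bool:
    """Check zero diagonal, non-negativity, symmetry and the triangle inequality."""
    n = len(cc)
    for i in range(n):
        if abs(cc[i][i]) > tol:
            return False
        for j in range(n):
            if cc[i][j] < 0 or abs(cc[i][j] - cc[j][i]) > tol:
                return False
            for k in range(n):
                if cc[i][j] > cc[i][k] + cc[k][j] + tol:
                    return False
    return True


def job_cost(job: Job, cp: Sequence[float], cc: Sequence[Sequence[float]]) -> float:
    """Time in seconds for one job under the assumed additive model."""
    transfer_in = job.in_mb * cc[job.src][job.node]
    compute = job.in_mb * cp[job.node]
    transfer_out = job.out_mb * cc[job.node][job.sink]
    return transfer_in + compute + transfer_out


def total_cost(jobs: List[Job], cp: Sequence[float],
               cc: Sequence[Sequence[float]]) -> float:
    """Total time spent at all resources used by the given jobs."""
    return sum(job_cost(job, cp, cc) for job in jobs)


# Illustrative values only: three nodes, costs in seconds per megabyte.
cp = [0.5, 0.2, 0.4]
cc = [
    [0.0, 1.0, 2.0],
    [1.0, 0.0, 1.5],
    [2.0, 1.5, 0.0],
]
jobs = [
    Job(in_mb=10, out_mb=4, src=0, node=1, sink=2),
    Job(in_mb=5, out_mb=5, src=0, node=2, sink=2),
]

if __name__ == "__main__":
    print("metric:", is_metric(cc))
    print("total cost (s):", total_cost(jobs, cp, cc))
```

Summing per-job times rather than taking the longest job matches the stated objective, which counts the time spent at every resource used and not the elapsed wall-clock time.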

Resource Management for Data Intensive Tasks on Grids 55

The BiLeG resource management system consists of two decision-making modules: a lower-level decision-making module called the Resource Allocator (RA) and a higher-level decision-making module called the Task Resource Pool Selector (TRPS). TRPS selects a task Ti from the given bag of PBDT tasks and allocates it a resource-pool, which is a subset of all the resources available. The resource-pool of a particular task Ti is represented by Γi, where Γi ⊆ Δ. RA allocates resources for a particular task Ti, chosen from its associated resource-pool Γi.

Each PBDT task consists of an unprocessed raw-data file, information about the processing operation that is required to be performed on the raw-data file, and a set of sink nodes where the processed file is to be delivered. The source node groups the submitted tasks into a bag-of-tasks (Fig. 1, Step-1) and initiates the processing by sending an "initiate" signal to TRPS (Fig. 1, Step-2). TRPS determines how many Grid resources are reserved (Fig. 1, Step-3) by interacting with the Grid Computing Environment. This set of reserved nodes is represented by Δ. TRPS determines a resource-pool Γi for each of the tasks Ti. Not all the reserved Grid nodes are available or visible to an individual task Ti in the bag-of-tasks, T. Typically, each task has a different resource-pool, selected by TRPS according to the TRPS policy used. For an individual task, using all the resources of the resource-pool may not be the best option for its most efficient execution. A TRPS resource selection policy is deployed at TRPS and determines the way in which TRPS chooses a resource-pool for each individual task. The policy uses the existing system state and resource availability in making its decision.

Fig. 1. BiLeG Architecture

From the resource-pool Γi allocated by TRPS to Ti, the lower-level decision-making module (RA) chooses a set of resources that are used to perform Ti. This set of resources is denoted by ωi. For different systems, different resource allocation algorithms may be best suited at RA. The remaining resources (Γi - ωi) are returned to TRPS. Based on the resources chosen by the algorithm, RA divides a particular task into various jobs. RA specifies the details of the jobs in a file which is called the *workflow* of a task Ti. The BiLeG architecture includes a software component called the *workflow engine*, which is designed to execute all the constituent jobs of a task. The workflow engine is implemented as a service and is responsible for running all the constituent jobs associated with a particular task.

A combination of a TRPS policy and an RA algorithm is called an *Allocation Plan (AP)* and is represented by AP{<Policy>,<Algorithm>}. This paper explores the factors that determine the choice of the most efficient allocation plan for a given bag-of-tasks.

Note that the visibility of RA for a particular task is limited to its resource-pool. RA is myopic in nature and is not concerned with the overall system performance optimization. The objective of RA is to optimize the performance for a particular task only. TRPS is concerned with global system performance and has the responsibility to choose an appropriate resource-pool for each of the tasks and pass it on to RA. RA assigns a set of resources from the resource-pool passed to it by TRPS.

It can be observed that in the BiLeG architecture, by dividing the overall system into two independent decision-making modules and by assigning the two modules separate responsibilities, we divide the problem of scheduling the tasks in the given bag-of-tasks into three different sub-problems:

1. Determination of the task execution order at TRPS
2. Selection of resource-pool
3. Resource allocation for each constituent job in the given bag-of-tasks at RA.

These three sub-problems may be solved by three independent algorithms. The division into three independent sub-problems makes the architecture customizable. It also provides finer-grained control over the resource allocation for the given bag-of-tasks and helps improve the stated optimization objective.

**5. TRPS resource selection policy**

A TRPS resource selection policy is used at the upper decision-making module to select the resource-pool for each task. It can be either *static* or *dynamic* in nature. A TRPS policy is said to be static if the mapping between tasks and their corresponding resource-pools is established before the system starts executing tasks, and it is dynamic if these mappings are established during runtime according to the current availability of the resources. Two static TRPS policies considered in this paper are presented in this section. Dynamic TRPS policies are discussed in [6].

**5.1 Static Resource-Pool--Single Partition (SRPSP) policy**

In SRPSP, the TRPS algorithm has two phases, a *mapping phase* and an *execution phase*.

The mapping phase (Fig. 2) is performed before the execution of the first task. Each task in T has a given resource-pool Δ. Thus, for each task Ti ∈ T, Γi = Δ. In the mapping phase, a mapping between each task and the most appropriate set of resources it needs is determined. To create this mapping, TRPS iteratively calls the algorithm at RA for each task in T.

In the *execution phase*, the first task in set T is executed first. TRPS iterates through all the tasks in T and chooses the next task for which the complete set of resources needed is
