**1. Introduction**

The ubiquitous Internet, together with the availability of powerful computers and high-speed network technologies as low-cost commodity components, is changing the way computing is carried out. It is becoming more feasible to use widely distributed computers for solving large-scale problems that often cannot be dealt with effectively using a single existing powerful supercomputer. In terms of computation and data requirements, these problems are often resource intensive due to their size and complexity. They may also involve the use of a variety of heterogeneous resources that are not usually available in a single location. This has led to the emergence of what is known as Grid computing. Grid computing enables sharing of heterogeneous distributed resources across different administrative and geographical boundaries [3]. By sharing these distributed resources, many complex distributed tasks can be performed in a cost-effective way. The way the resources are allocated to tasks is pivotal to achieving satisfactory system performance [4]. To perform efficiently, the resource allocation algorithm has to take into account many factors, such as the system and workload conditions, the type of task to be performed, and the requirements of the end user.

To devise more efficient allocation algorithms, it may be useful to classify the given tasks into predefined types based on similarities in their predicted resource needs or workflows. This classification makes it possible to customize the allocation algorithm for a particular group of similar tasks. This chapter presents an effective resource management middleware developed for a type of resource-intensive tasks classified as Processable Bulk Data Transfer (PBDT) tasks. The common trait among PBDT tasks is the transfer of a very large amount of data which has to be processed in some way before it can be delivered from a source node to a set of designated sink nodes (Ahmad, I. & Majumdar, S., 2008). Typically, these tasks can be broken down into parallel sub-tasks, called jobs. Various multimedia and High Energy Physics (HEP) applications can be classified as PBDT tasks. The processing operation involved in these tasks may be as simple as applying a compression algorithm to a raw video file in a multimedia application, or as complex as isolating information about particles pertaining to certain wavelengths in HEP experiments [22][25]. Performing PBDT tasks requires both computing power and large bandwidths for data transmission. To perform such resource-intensive tasks, research has been conducted in recent years on devising effective resource management middleware, which has led to the creation of various efficient technologies. In order to provide satisfactory performance, these systems must optimize the overall execution time (or makespan) of the resource-intensive tasks. This requires efficient allocation of resources to the sub-tasks (called jobs in this chapter) of the PBDT tasks at the individual machines of the network of nodes.

Resource Management for Data Intensive Tasks on Grids 51

1. Traditional Schedulers and Resource Brokers

2. Policy based Resource Allocation

3. Workflow based Resource Allocation

Each of these approaches is discussed in one of the following subsections.

**2.1 Traditional schedulers and resource brokers**

One of the traditional approaches is to use a Grid resource broker which selects suitable resources by interacting with various middleware services. Venugopal describes such a Grid resource broker that discovers computational and data resources running diverse middleware through distributed discovery services [12]. However, it provides no mechanism for breaking a given task into parallel jobs for processing.

Another effort worth mentioning is the Grid Application Development Software (GrADS) Project [2]. At the heart of the GrADS architecture is an enhanced execution environment which continually adapts the application to changes in the Grid resources, with the goal of maintaining overall performance at the highest possible level. A number of resource allocation algorithms can be used in GrADS to schedule a given bag-of-tasks in Grid environments. Due to the NP-complete nature of the resource allocation problem, the majority of proposed solutions are heuristic algorithms [14] [18] [20].

YarKhan and Dongarra [22] have also performed scheduling experiments in a Grid environment using simulated annealing. To evaluate the schedules generated by the simulated annealing algorithm, they use a Performance Model, a function specifically created to predict the execution time of the program. Generating such a Performance Model requires detailed analysis of the program to be scheduled.

**2.2 Policy based resource allocation**

For resource allocation in Grids, some researchers have also proposed policy based resource allocation techniques. Sander et al. [12] propose a policy based architecture for QoS configuration for systems that comprise different administrative domains in a Grid. They focus on making decisions when users attempt to make reservations for network bandwidth across several administrative network domains that are controlled by a bandwidth broker. The bandwidth broker acts as an allocator and establishes an end-to-end signalling process that chooses the most efficient path based on the available bandwidth. The work presented in [13] is concerned with data transmission costs only, whereas the research presented in this chapter needs to consider both the computation and the communication costs associated with PBDT tasks. Verma et al. [19] have also proposed a technique in which resource allocation is performed based on a predefined policy. In their work, however, the resource allocation is not based on any performance measure.

**2.3 Workflow based resource allocation**

Many recent efforts have focused on scheduling of workflows in Grids. The work in [16] presents a QoS-based workflow management system and a scheduling algorithm that match workflow applications with resources using event-condition-action rules. Pandey and Buyya have worked on scheduling scientific workflows using various approaches in the context of their

The problem of optimally scheduling these sub-tasks is a well-known NP-complete problem [12]. To tackle it, various heuristics-based algorithms that can generate near-optimal solutions in polynomial time have been devised. In this chapter, a Bilevel Grid Resource Management System, abbreviated as BiLeG, is presented, in which the decision-making module is divided into two separate sub-modules. The upper level decision-making module is called the Task & Resource Pool Selector (TRPS). It selects a task from the given bag-of-tasks for which resources are to be assigned and chooses a partition of the available resources for this task (called the resource-pool of the task), which is typically a subset of all the resources available. The lower level decision-making module is called the Resource Allocator (RA). It uses an assignment algorithm to decide how the resources from the chosen resource-pool are allocated to the jobs in a given task. Various algorithms can be used at the RA, and various policies can be deployed at the TRPS. A particular combination of a TRPS policy and an RA scheduling algorithm deployed at a time is called an allocation-plan, which determines the resource allocation for each task in the given bag-of-tasks. The following notation is used in this chapter to write an allocation-plan: <TRPS Policy, RA-Algorithm>. Investigating the choice of the most appropriate allocation-plan under a specific set of workload and system conditions is the focus of this chapter.
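As a rough illustration of this two-level structure, the following Python sketch keeps the two decisions in separate functions and combines them into an allocation-plan. It is not the BiLeG implementation: the names `Task`, `Resource`, `fcfs_trps`, `greedy_ra` and `run_allocation_plan` are hypothetical, and the first-come-first-served policy and the greedy longest-job-first assignment are stand-ins chosen only for this example. Job costs and resource speeds are in arbitrary units, and tasks are simply served one after another.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Task:
    name: str
    job_costs: List[float]  # cost of each parallel job (hypothetical units)


@dataclass
class Resource:
    name: str
    speed: float  # cost units processed per second (hypothetical)


Assignment = Dict[int, str]  # job index -> resource name


def fcfs_trps(pending: List[Task], resources: List[Resource]) -> Tuple[Task, List[Resource]]:
    """Upper level (assumed policy): take the first waiting task and offer
    it every resource as its resource-pool."""
    return pending[0], list(resources)


def greedy_ra(task: Task, pool: List[Resource]) -> Tuple[Assignment, float]:
    """Lower level (assumed algorithm): place the longest jobs first, each on
    the resource that would finish it earliest. Returns the job-to-resource
    assignment and the resulting makespan of the task."""
    finish = {r.name: 0.0 for r in pool}
    assignment: Assignment = {}
    jobs = sorted(enumerate(task.job_costs), key=lambda jc: -jc[1])
    for job_index, cost in jobs:
        best = min(pool, key=lambda r: finish[r.name] + cost / r.speed)
        finish[best.name] += cost / best.speed
        assignment[job_index] = best.name
    return assignment, max(finish.values())


def run_allocation_plan(
    trps_policy: Callable[[List[Task], List[Resource]], Tuple[Task, List[Resource]]],
    ra_algorithm: Callable[[Task, List[Resource]], Tuple[Assignment, float]],
    tasks: List[Task],
    resources: List[Resource],
) -> Dict[str, Tuple[Assignment, float]]:
    """Apply one allocation-plan <TRPS policy, RA algorithm> to a bag-of-tasks."""
    pending = list(tasks)
    results: Dict[str, Tuple[Assignment, float]] = {}
    while pending:
        task, pool = trps_policy(pending, resources)   # upper-level decision
        results[task.name] = ra_algorithm(task, pool)  # lower-level decision
        pending.remove(task)
    return results


if __name__ == "__main__":
    bag = [Task("video", [4.0, 2.0, 2.0])]
    nodes = [Resource("a", 1.0), Resource("b", 1.0)]
    print(run_allocation_plan(fcfs_trps, greedy_ra, bag, nodes))
```

Because the policy and the algorithm are passed in as plain functions, a different allocation-plan can be tried by swapping either argument without touching the other level.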

The main contributions of this chapter are summarized below.

1. It proposes the ATSRA algorithm and two extensions based on a constraint relaxation technique.

2. Based on simulation, it analyses the performance of the proposed algorithms for different numbers of available Grid nodes.

3. The experimental results capture the trade-off between accuracy in resource allocation and scheduling overhead, both of which affect the overall system performance. The chapter discusses under which circumstances the proposed original algorithm or its extensions should be used.

The rest of the chapter is organized as follows. In Section 2, different approaches to resource allocation of tasks on Grids are presented. In Section 3, PBDT tasks are described. In Section 4, the problem being solved is defined and an overview of the proposed system is presented. In Section 5, policies are described. In Section 6, the concept of Architectural Templates is described. In Section 7, a Linear Programming (LP) based algorithm and its extensions, which can be used to perform PBDT tasks, are described. In Section 8, experimental results are presented. Finally, in Section 9, the chapter is concluded.
