**6.1 2-Tier Architectural Templates**

In 2-Tier Templates, only the source and the sink nodes are used, for both processing and data transfer. There are two types of 2-Tier Template: 2-Tier-a and 2-Tier-b. In a 2-Tier-a Template, the source node is used for data processing. Fig. 3 (a) shows the process when the 2-Tier-a architecture is used in a system. TRPS coordinates with the Task RA (1) and gives it a PBDT task and a resource pool (the set of all nodes available for this task). The Task RA sends an acknowledgment signal back to TRPS (2). The Task Workflow Engine, deployed at the Lower Level, L1, signals the source node to start processing the data at the source node (3₁). The raw data file is processed at the source node (3₂) and is delivered to each of the sink nodes (3₃₁ to 3₃ₖ). After the transfer of the processed data is completed, each of the k sink nodes sends an acknowledgment to the Task Workflow Engine to indicate that the processed file has reached it (shown by (4₁) to (4ₖ)). Once all k sink nodes have sent completion signals to RA, RA sends a signal to TRPS to indicate that the task has been completed (5).

The 2-Tier-b Architectural Template (shown in Fig. 3 (b)) is similar to 2-Tier-a. The important difference is that the data processing is done at each of the sink nodes (3₃₁ to 3₃ₖ in Fig. 3 (b)) instead of at the source node.

Fig. 3. (a) 2-Tier-a Architecture (b) 2-Tier-b Architecture

**6.2 4-Tier Architectural Templates**

In a 4-tier Architectural Template, the resource pool of the given task (the set of available resources selected for the task by TRPS) is grouped into two domains: a compute-farm and a data-farm. Each has a specific role (see Fig. 4). The role of the compute-farm is to process the data. Once all the data is processed, it is combined at the Egress node. The role of the data-farm is to replicate this processed data at chosen nodes to optimize its transfer to the sink nodes. Initially, TRPS coordinates with the RA and gives it a PBDT task and a corresponding resource pool (1). After running the resource allocation algorithm, RA generates the workflow of the given task Tᵢ, which indicates which nodes from the provided resource pool will be used. RA returns ώᵢ to TRPS, indicating which resources are planned to be used for the execution of task Tᵢ (2). The Task Workflow Engine initiates the process (3). Once processing of the data is completed at the compute-farm nodes, the processed partitions are transferred to a special node called the Egress node, where they are combined to produce the required processed file. The Egress node sends a signal to the Task Workflow Engine (4) to indicate the completion of this stage.

The responsibility of the Egress node is to make sure that all the partitions of the raw data file associated with Tᵢ have been successfully processed. If even a small portion of the data goes missing or is corrupted by an unforeseen error, the processed file formed by combining the constituent processed files may become invalid. In practical environments, catching such errors at an early stage is often desirable, because the Task Workflow Engine can then re-initiate the processing of the faulty data partition only. If the Egress node is not present, the system cannot catch such errors early, and an error in the processing of any one partition invalidates the resultant processed file. In this case, the only way to recover is to restart the whole process from scratch, which would be a considerable waste of both time and resources.

From the Egress node, the processed data is transferred to the data nodes chosen by the algorithm in the workflow. From there it is delivered to each of the k sink nodes. Once the processed data has been delivered to all sink nodes, the Task Workflow Engine is notified (5₁ to 5ₖ), which in turn notifies TRPS (6) to indicate the completion of the task. Note that in the compute-farm, partitions of raw data files are transferred, whereas in the data-farm, complete processed files (not partitions) are transferred and replicated. The 3-tier Architectural Template (having a compute-farm, but no data-farm or Egress node) is not discussed in this paper. If a data-farm is not required, a 3-tier Architectural Template can be used instead of a 4-tier Template.

**7. RA Algorithms**

In this section, the ATSRA algorithms are described. They enable RA to assign resources to Tᵢ from within the resource pool, Ґᵢ, allocated to it by TRPS. The ATSRA algorithms are based on Linear Programming (LP), a popular technique for solving optimization problems [12][13]. LP models an optimization problem as a set of linear expressions composed of input parameters and output variables. The LP solver starts by creating a problem instance of the model by assigning values to the input parameters [16][17]. The problem instance is then subjected to an objective function, which is also required to be a linear expression. The values of the output variables, which collectively represent the optimal solution, are those that give the best value of the objective function. Based on this approach, three algorithms that can be deployed at RA are presented in this section. Each of the ATSRA algorithms has the following two stages.

**Stage-1:** Selection of the most appropriate Architectural Template, ATᵢ, for Tᵢ.

**Stage-2:** Allocation of the resources for ATᵢ (if not done in Stage-1).
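The Linear Programming approach on which the ATSRA algorithms of Section 7 are based (input parameters, linear constraints, a linear objective, and output variables read off at the optimum) can be illustrated with a toy model. The sketch below is not one of the ATSRA algorithms. It assumes a hypothetical case in which a raw data file of size `D` is split between two compute nodes with processing rates `r1` and `r2`, and the completion time `t` is minimized. The function `solve_lp_2d`, the variable names, and the numbers are all illustrative assumptions. The solver is a brute-force vertex enumeration, chosen only to keep the example self-contained; it assumes a non-empty feasible region and a bounded optimum, and a practical deployment would use a dedicated LP solver.

```python
from fractions import Fraction
from itertools import combinations


def solve_lp_2d(c, constraints):
    """Minimize c[0]*x + c[1]*y subject to a*x + b*y <= rhs
    for every (a, b, rhs) in constraints.

    Illustrative brute-force method: an LP with a bounded optimum attains
    it at a vertex, so every intersection of two constraint boundaries is
    tested for feasibility and the cheapest feasible one is returned as
    (objective_value, x, y). Returns None if no feasible vertex exists.
    """
    best = None
    for (a1, b1, r1), (a2, b2, r2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:
            continue  # parallel boundaries never intersect in a single point
        # Cramer's rule, in exact rational arithmetic
        x = Fraction(r1 * b2 - r2 * b1, det)
        y = Fraction(a1 * r2 - a2 * r1, det)
        if all(a * x + b * y <= rhs for a, b, rhs in constraints):
            value = c[0] * x + c[1] * y
            if best is None or value < best[0]:
                best = (value, x, y)
    return best


# Hypothetical input parameters (assumptions, not taken from the chapter)
D = 120          # size of the raw data file
r1, r2 = 2, 4    # processing rates of the two compute nodes

# Output variables: x1 (data assigned to node 1) and t (completion time).
# Node 2 receives the remainder, D - x1.
constraints = [
    (1, -r1, 0),     # x1 / r1 <= t        ->  x1 - r1*t <= 0
    (-1, -r2, -D),   # (D - x1) / r2 <= t  -> -x1 - r2*t <= -D
    (-1, 0, 0),      # x1 >= 0
    (1, 0, D),       # x1 <= D
    (0, -1, 0),      # t >= 0
]

# Objective function: minimize t
result = solve_lp_2d((0, 1), constraints)
value, x1, t = result
print(f"x1={x1}, x2={D - x1}, t={t}")  # prints "x1=40, x2=80, t=20"
```

In this toy instance the optimum balances the two nodes so that both finish at the same time, which is the kind of allocation decision an LP objective can express with linear terms only.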
