**8. Experimental results**

This chapter uses the following performance metrics.

Resource Management for Data Intensive Tasks on Grids 65


**Makespan total (tms-total):** The time in seconds required for completing all the tasks in the given bag-of-tasks, T.

**Makespan non-scheduling (tms-nonSch):** The time in seconds required for completing all the tasks in T, *excluding* the time taken by the scheduling algorithm.

**Scheduling algorithm running time (tsch):** Total time in seconds taken by the scheduling algorithm to schedule all the given resources.

**Total cost (tcost):** Sum of the time in seconds for which all the resources are utilized in performing the bag-of-tasks T.
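As a rough illustration of how the four metrics relate, the sketch below derives them from a finished run. The `Usage` record, the function name and the sample values are hypothetical and are not taken from the simulator used in this chapter. The sketch also assumes that execution starts only after scheduling has finished, so that tms-total is simply tsch plus tms-nonSch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    """Hypothetical record: one resource busy on one task from start to end (seconds)."""
    resource: str
    task: str
    start: float  # measured from the moment scheduling finishes
    end: float


def compute_metrics(tsch: float, usages: list[Usage]) -> dict[str, float]:
    """Derive the four metrics from a finished run (illustrative only)."""
    # tms-nonSch: time until the last task of T completes, scheduling excluded.
    tms_non_sch = max(u.end for u in usages)
    # tcost: total number of seconds for which resources were in use.
    tcost = sum(u.end - u.start for u in usages)
    return {
        "tsch": tsch,
        "tms_nonSch": tms_non_sch,
        # Assumption: execution starts only after scheduling has finished.
        "tms_total": tsch + tms_non_sch,
        "tcost": tcost,
    }


# Made-up run: task-A uses two nodes for 10 s, then task-B uses one node for 15 s.
run = [
    Usage("node-1", "task-A", 0.0, 10.0),
    Usage("node-2", "task-A", 0.0, 10.0),
    Usage("node-1", "task-B", 10.0, 25.0),
]
result = compute_metrics(tsch=2.0, usages=run)
```

In this toy run the makespan is driven by the last task to finish, whereas tcost counts every busy second on every resource, so the two can move in different directions.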

To analyze the performance of the proposed RA algorithms, a simulation-based investigation was performed. The performance metrics described above were captured at the end of each simulation. Each experiment was repeated a sufficient number of times to produce a confidence interval of ±5% for each performance metric at a confidence level of 95%.

The workload chosen for these experiments is a bag-of-tasks consisting of 32 PBDTfixedpar tasks. Each task models the encoding of a raw multimedia file that is to be processed and delivered to a set of sink nodes. The choice of the raw data file is based on a typical animation movie described in [2]. The size of the raw data file of each task in the bag-of-tasks is an important workload parameter. A detailed study of the characteristics of similar real-world tasks was carried out. The probability distribution that best represents the sizes of the raw (unprocessed) data files used in such tasks has been a subject of discussion in the research community over the years, with researchers split between characterizing it with a Pareto distribution and with a log-normal distribution. After careful analysis, the Pareto distribution appears to be the better representative of PBDT multimedia workloads and is therefore used. The mean size of the raw data files is fixed at 650 MB.

Another important workload parameter is p, the number of partitions into which a raw data file can be divided. This value is included in the metadata of the raw data file of each task. The value of p depends on the structure of the raw data file and the type of processing it requires. For example, if a raw data file of a multimedia animation contains 20 sub-sequences, each of which has to be processed as a single partition, then the task has a p of 20. The number of partitions (referred to as sub-sequences in the description of the movie rendering project presented in [1]) for each raw file varies from 1 to 30. A uniform distribution over [1, 30] is therefore used to model the number of partitions in each raw multimedia file.
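A workload of this kind could be generated along the following lines. This is a minimal sketch, not the generator used for the experiments. The text gives the number of tasks, the mean file size and the range of p, but not the Pareto shape parameter, so `ALPHA` below is an assumed value, and the function and field names are hypothetical.

```python
import random

MEAN_SIZE_MB = 650.0   # from the text
NUM_TASKS = 32         # from the text
MAX_PARTITIONS = 30    # from the text
ALPHA = 2.5            # ASSUMPTION: Pareto shape parameter, not given in the text


def pareto_scale(mean: float, alpha: float) -> float:
    """Scale x_m of a Pareto (type I) distribution with the given mean."""
    if alpha <= 1:
        raise ValueError("the Pareto mean is finite only for alpha > 1")
    # mean = alpha * x_m / (alpha - 1), solved for x_m
    return mean * (alpha - 1) / alpha


def make_bag_of_tasks(rng: random.Random) -> list[dict]:
    """Draw one bag-of-tasks: Pareto file sizes, uniformly distributed p."""
    scale = pareto_scale(MEAN_SIZE_MB, ALPHA)
    return [
        {
            "task_id": i,
            # paretovariate() has minimum 1.0, so multiplying by scale sets x_m.
            "size_mb": scale * rng.paretovariate(ALPHA),
            # randint() is inclusive at both ends: p is uniform on 1..30.
            "p": rng.randint(1, MAX_PARTITIONS),
        }
        for i in range(NUM_TASKS)
    ]


bag = make_bag_of_tasks(random.Random(42))
```

Because the Pareto distribution is heavy-tailed, the sample mean of only 32 file sizes can differ noticeably from 650 MB in any single bag.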

For the performance analysis of the proposed algorithms, the total number of nodes in the Grid system is increased. The number of Grid nodes is directly related to the time complexity of the deployed algorithm. All other parameters related to the workload and system conditions are kept constant.
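The experimental procedure (vary only the number of nodes, and repeat each experiment until the 95% confidence interval is within ±5% of the mean) can be sketched as follows. The chapter does not state how the interval was computed, so the normal approximation used here is an assumption. `toy_simulation` is a made-up placeholder for a real simulation run, and the node counts are illustrative.

```python
import math
import random
import statistics
from typing import Callable

Z_95 = 1.96  # ASSUMPTION: normal-approximation quantile for a 95% confidence level


def repeat_until_ci(run_once: Callable[[], float],
                    rel_half_width: float = 0.05,
                    min_runs: int = 5,
                    max_runs: int = 10_000) -> tuple[float, float, int]:
    """Repeat run_once() until the CI half-width is within rel_half_width of the mean.

    Returns (mean, half_width, number_of_runs). Stops at max_runs regardless.
    """
    samples = [run_once() for _ in range(min_runs)]
    while True:
        mean = statistics.mean(samples)
        half_width = Z_95 * statistics.stdev(samples) / math.sqrt(len(samples))
        if half_width <= rel_half_width * abs(mean) or len(samples) >= max_runs:
            return mean, half_width, len(samples)
        samples.append(run_once())


def toy_simulation(n_nodes: int, rng: random.Random) -> float:
    """Placeholder for one simulation run; returns a noisy metric value."""
    return rng.gauss(1000.0 / n_nodes, 100.0 / n_nodes)


rng = random.Random(0)
# Only the node count changes between experiments; these values are illustrative.
sweep = {n: repeat_until_ci(lambda: toy_simulation(n, rng)) for n in (8, 16, 32)}
```

For small sample counts a Student-t quantile would be more conservative than the fixed 1.96 used here.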

Fig. 5 shows the performance of the three RA algorithms (ATSRAorg, ATSRASSR and ATSRABSR) with SRPsp deployed at the TRPS. Fig. 5(a) shows the time taken by each of the algorithms to run. It can be seen that, for a small number of nodes, there is little difference in the scheduling time taken by the three algorithms. The time taken by the ATSRAorg algorithm rises sharply as the number of nodes is increased beyond 32, whereas tsch for ATSRABSR does not rise sharply. Fig. 5(b) shows the value of the makespan non-scheduling time (tms-nonSch) for each of the proposed algorithms as the number of nodes is increased. It is clear that ATSRAorg has the lowest value of tms-nonSch for all values of n.

64 Grid Computing – Technology and Applications, Widespread Coverage and New Horizons

ATSRAorg is expected to have the lowest tms-nonSch: by using all the constraints associated with the LP formulation, it allocates resources with the highest precision, and this allocation is expected to be efficient. As constraint relaxation is applied at stage one of the algorithm in ATSRASSR, tms-nonSch increases. It increases further for ATSRABSR, in which constraint relaxation is applied at both stages of the algorithm. Fig. 5(d) shows the total cost (tcost) for each of the three algorithms. ATSRAorg has the lowest tcost because it allocates resources with the highest precision; it is followed by ATSRASSR and ATSRABSR.

The overall makespan, tms-total, shown in Fig. 5(c), includes both the scheduling time and the execution time for the bag-of-tasks. It captures the trade-off between tms-nonSch and tsch presented in Fig. 5(b) and Fig. 5(a) respectively. For a very small number of nodes, the scheduling overhead of ATSRAorg is small and its tms-nonSch is the lowest; as a result, it achieves the best tms-total. For a large number of nodes, the scheduling overhead of ATSRAorg is very high, the benefit of a better resource allocation is offset by this overhead, and its performance deteriorates. ATSRABSR, which exhibits the smallest scheduling overhead for a large number of nodes (see Fig. 5(a)), demonstrates the best tms-total (see Fig. 5(c)).

Fig. 5. Performance with SRPsp chosen at TRPS (a) Scheduling Algorithm Running time (b) Makespan non-Scheduling time (c) Makespan Total (d) Total Cost
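The trade-off can be written compactly. Assume, as the metric definitions suggest, that the total makespan is the sum of the scheduling time and the non-scheduling makespan. The superscript $A$ for the algorithm and the argument $n$ for the number of Grid nodes are notation introduced here for illustration.

$$
t_{\text{ms-total}}^{A}(n) \;=\; t_{\text{sch}}^{A}(n) \;+\; t_{\text{ms-nonSch}}^{A}(n)
$$

Under this assumption, ATSRABSR achieves a lower total makespan than ATSRAorg at $n$ nodes exactly when the scheduling time it saves exceeds the execution time it loses through the less precise allocation:

$$
t_{\text{sch}}^{\text{org}}(n) - t_{\text{sch}}^{\text{BSR}}(n) \;>\; t_{\text{ms-nonSch}}^{\text{BSR}}(n) - t_{\text{ms-nonSch}}^{\text{org}}(n)
$$

The left-hand side grows with $n$ because the scheduling time of ATSRAorg rises sharply, which is consistent with ATSRABSR overtaking ATSRAorg at larger numbers of nodes.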

It is interesting to see that ATSRASSR produces the best tms-total (Fig. 5(c)) for a range of intermediate values of the number of Grid nodes. The accuracy of resource allocation for ATSRASSR lies between that achieved with ATSRAorg and ATSRABSR. For a small number of nodes, tsch of ATSRASSR is comparable to that of ATSRAorg, whereas the tms-nonSch achieved by ATSRASSR is inferior to that achieved by ATSRAorg. Thus, if the number of nodes is small, the tms-total of ATSRASSR is inferior to that of ATSRAorg.

For a large number of nodes, although ATSRASSR gives rise to a lower execution time than ATSRABSR, the advantage is offset by the much lower scheduling overhead of ATSRABSR. The net effect is that the tms-total achieved by ATSRASSR is inferior to that of ATSRABSR for a large number of nodes.

Fig. 6 shows the performance of the ATSRA algorithms when SRPsp+BF is deployed at TRPS. As in the case of Fig. 5(c), the best tms-total is achieved by ATSRABSR for larger numbers of nodes, whereas ATSRAorg demonstrates the best performance for a lower number of nodes. ATSRASSR demonstrates a slightly higher tms-total than ATSRAorg when the number of Grid nodes is small. Although the total makespan it achieves is better than that of ATSRAorg at higher numbers of nodes, it is higher than that achieved by ATSRABSR. The relative performances of the three algorithms captured in Fig. 6(a), Fig. 6(b) and Fig. 6(d) are the same as those displayed in Fig. 5(a), Fig. 5(b) and Fig. 5(d) respectively. ATSRAorg demonstrates the best tms-nonSch and tcost, followed by ATSRASSR and ATSRABSR, whereas the smallest scheduling overhead is achieved with ATSRABSR and the highest with ATSRAorg. The rationale for this behavior has been provided in the discussion presented earlier for Fig. 5(a), Fig. 5(b) and Fig. 5(d).

Note that although the shapes of the graphs in Fig. 5(a) and Fig. 6(a) are similar, the value of tsch for a given number of nodes in Fig. 6(a) is higher than the value of tsch for the same number of nodes in Fig. 5(a). This is because SRPsp+BF uses backfilling, which increases the scheduling overhead. While the relative performance of ATSRAorg, ATSRASSR and ATSRABSR remains almost the same, this additional scheduling overhead shifts the graphs upwards in Fig. 6(a) as compared to Fig. 5(a).

For the ATSRAorg and ATSRASSR algorithms and any given number of nodes, the tms-nonSch achieved with SRPsp+BF is observed to be smaller than that achieved with SRPsp (see Fig. 5(b) and Fig. 6(b)). This demonstrates the effectiveness of backfilling, which can increase the concurrency of task execution. Except for the case in which the number of Grid nodes is 128, a similar behavior is observed with ATSRABSR.

Comparing the tms-total achieved with SRPsp (Fig. 5(c)) and SRPsp+BF (Fig. 6(c)), we observe that for any given ATSRA algorithm, the total makespan achieved by SRPsp+BF is superior to that achieved by SRPsp when the number of nodes is small. For a higher number of nodes, SRPsp+BF demonstrates inferior performance. This is because, at a smaller number of nodes, concurrent execution of tasks may be severely limited with SRPsp, as many tasks may not be able to get all their resources at the same time. Backfilling alleviates this problem, as RA is run for each waiting task with the set of unused resources as the resource pool. However, this problem with task concurrency is less severe at a higher number of nodes. Thus SRPsp+BF, which re-runs RA multiple times and incurs a higher scheduling overhead, demonstrates inferior performance there, as the potential performance benefit of backfilling is offset by the overhead.

Fig. 6. Performance with SRPsp+BF chosen at TRPS (a) Scheduling Algorithm Running time (b) Makespan non-Scheduling time (c) Makespan Total (d) Total Cost

**9. Summary and conclusion**

In this chapter, BiLeG is used to devise an allocation-plan that reflects the overall resource allocation strategy. An allocation-plan comprises two parts: a policy used at the higher decision-making module, TRPS, which is responsible for selecting a resource-pool for each task; and a resource allocation algorithm used at the lower decision-making module, RA, which assigns resources from the resource-pool selected by TRPS to a particular PBDT task. Three RA algorithms and six TRPS policies have been proposed in this chapter, forming different allocation-plans. The suitability of various allocation-plans under different sets of system and workload parameters has been explored.

A detailed study of the various trade-offs implicit in the use of different allocation-plans is the focal point of this chapter. The most suitable allocation-plan depends not only on various workload and system parameters but also on the user requirements and the hardware available. From the performance perspective, various trade-offs exist among different allocation-plans, and understanding these trade-offs in depth is the focus of the experiments conducted in this chapter.

