**1. Introduction**


Grid Computing – Technology and Applications, Widespread Coverage and New Horizons


Task execution models over networked processors, e.g., cluster, grid and utility computing, have been studied and developed for maximizing system throughput by utilizing computational resources. One major trend in task execution is to divide the required data into several pieces and then distribute them to workers, as in the "master-worker" model. In contrast to such data-intensive jobs, how to divide a computation-intensive job into several execution units for parallel execution is still under discussion from a theoretical point of view. If we take task parallelization into account in a grid environment such as a computational grid, an effective task scheduling strategy must be established. When combining task scheduling concepts with grid computing methodologies, heterogeneity with respect to processing power, communication bandwidth and so on should be incorporated into the task scheduling strategy. If we assume a situation where multiple jobs are submitted to an unknown number of computational resources over the Internet, the objective functions can be stated as follows: (i) minimization of the schedule length (the time duration from the start of the first task to the completion of the last task) of each job, (ii) minimization of the completion time of the last job, and (iii) maximization of the degree of contribution to the total speed-up ratio for each computational resource. As one solution addressing these three objective functions, in the literature (Kanemitsu, 2010) we proposed a method for minimizing the schedule length per job with a small number of computational resources (processors), for a set of identical processors. The objective of the method is effective utilization of computational resources. The method is based on "task clustering" (A. Gerasoulis, 1992), in which tasks are merged into "clusters", each of which is an execution unit for one processor. As a result, several clusters are generated, and each of them becomes one assignment unit.
The method imposes a lower bound on every cluster size to limit the number of processors, and the literature theoretically derived a near-optimal lower bound for minimizing the schedule length.
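To illustrate the idea, the following sketch merges DAG tasks into clusters until each cluster's total task size reaches a given lower bound `delta`. This is a simplified, hypothetical rendering of size-bounded clustering, not the actual algorithm of (Kanemitsu, 2010); the function and data names are our own.

```python
# Simplified sketch of size-bounded task clustering (hypothetical,
# not the exact algorithm of (Kanemitsu, 2010)).
# The task graph is given as {task: size} plus a list of (src, dst) edges.

def cluster_by_lower_bound(sizes, edges, delta):
    """Greedily merge tasks along edges until every cluster's
    total size reaches the lower bound delta."""
    cluster_of = {t: t for t in sizes}      # each task starts in its own cluster
    load = {t: sizes[t] for t in sizes}     # current total size of each cluster

    def find(t):                            # representative (root) of t's cluster
        while cluster_of[t] != t:
            t = cluster_of[t]
        return t

    for src, dst in edges:
        a, b = find(src), find(dst)
        # Merge along an edge only while one side is still below delta;
        # the communication over the merged edge then becomes localized.
        if a != b and (load[a] < delta or load[b] < delta):
            cluster_of[b] = a
            load[a] += load[b]

    clusters = {}                           # group tasks by final representative
    for t in sizes:
        clusters.setdefault(find(t), []).append(t)
    return list(clusters.values())

sizes = {"n1": 2, "n2": 1, "n3": 1, "n4": 3}
edges = [("n1", "n2"), ("n1", "n3"), ("n2", "n4"), ("n3", "n4")]
print(cluster_by_lower_bound(sizes, edges, delta=3))
```

With the sample graph above, the first three tasks are merged into one cluster whose size reaches the bound, while the already large enough `n4` remains alone, so fewer processors are needed than with maximum parallelism.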

However, which processor should be assigned to each cluster is not discussed, because the proposal assumes identical processors. If we use one of the conventional cluster assignment methods such as CHP (C. Boeres, 2004), Triplet (B. Cirou, 2001) or FCS (S. Chingchit, 1999), almost all processors may be assigned clusters, because those methods try to achieve the maximum task parallelism to obtain the minimized schedule length. Thus, the third objective function may not be achieved by those cluster assignment strategies.

In this chapter, we propose a method for deriving the lower bound of the cluster size in heterogeneous distributed systems, and a task clustering algorithm that adopts it. From the results of experimental simulations, we discuss the applicability of the proposal for obtaining better processor utilization.

The remainder of this chapter is organized as follows. Sec. 2 presents other conventional approaches related to task clustering for heterogeneous distributed systems, and sec. 3 presents our assumed model; the lower bound of the cluster size is then derived in sec. 4. Sec. 5 presents a task clustering algorithm which adopts the lower bound shown in sec. 4. Experimental results are shown in sec. 6, and finally we present conclusions and future work in sec. 7.

**2. Conventional approaches**

In Triplet (B. Cirou, 2001), one cluster is assigned to a processor group composed of two or more processors. Thus, such a policy does not match the concept of processor utilization.

The FCS algorithm (S. Chingchit, 1999) defines two parameters, i.e., $\beta$: the ratio of total task size to total data size (where task size means the number of time units required to execute one instruction) for each cluster, and $\tau$: the ratio of processing speed to communication bandwidth for each processor. While the task merging steps are performed, if $\beta$ of a cluster exceeds $\tau$ of a processor, the cluster is assigned to that processor. As a result, the number of clusters depends on each processor's speed and communication bandwidth. Thus, there is a possibility that a "very small cluster" is generated, so that FCS also does not match the concept of processor utilization.

**3. Assumed model**

**3.1 Job model**

We assume that a job to be executed among distributed processor elements (PEs) is expressed as a Directed Acyclic Graph (DAG), which is one kind of task graph. Let $G^s = (V_s, E_s, V_s^{cls})$ be the DAG, where $s$ is the number of task merging steps (described in sec. 3.2), $V_s$ is the set of tasks after $s$ task merging steps, $E_s$ is the set of edges (data communications among tasks) after $s$ task merging steps, and $V_s^{cls}$ is the set of clusters, each consisting of one or more tasks, after $s$ task merging steps. The $i$-th task is denoted as $n_i^s$. Let $w(n_i^s)$ be the size of $n_i^s$, i.e., $w(n_i^s)$ is the number of time units taken for $n_i^s$ to be processed by the reference processor element. We define the data dependency and the direction of the data transfer from $n_i^s$ to $n_j^s$ as $e_{i,j}^s$, and $c(e_{i,j}^s)$ is the number of time units taken for transferring the data from $n_i^s$ to $n_j^s$ over the reference communication link.

One constraint imposed by a DAG is that a task can not start execution until all data from its predecessor tasks have arrived. For instance, $e_{i,j}^s$ means that $n_j^s$ can not be started until the data from $n_i^s$ arrives at the processor which will execute $n_j^s$. Let $pred(n_i^s)$ be the set of immediate predecessors of $n_i^s$, and $suc(n_i^s)$ be the set of immediate successors of $n_i^s$. If $pred(n_i^s) = \emptyset$, $n_i^s$ is called a START task, and if $suc(n_i^s) = \emptyset$, $n_i^s$ is called an END task. If there are one or more paths from $n_i^s$ to $n_j^s$, we denote such a relation as $n_i^s \prec n_j^s$.

**3.2 Task clustering**

Task clustering is a sequence of task merging steps which finishes when certain criteria have been satisfied. We denote the $i$-th cluster in $V_s^{cls}$ as $cls_s(i)$. If $n_k^s$ is included in $cls_s(i)$ by the $(s+1)$-th task merging step, we formulate this merging as $cls_{s+1}(i) \leftarrow cls_s(i) \cup \{n_k^s\}$. If any two tasks $n_i^s$ and $n_j^s$ are included in the same cluster, they are assigned to the same processor. The communication between $n_i^s$ and $n_j^s$ is then localized, so that we define that $c(e_{i,j}^s)$ becomes zero.

Throughout this chapter, we say that $cls_s(i)$ is "linear" if and only if $cls_s(i)$ contains no independent tasks (A. Gerasoulis, 1993). Note that if a cluster is linear, at least one path exists between any two tasks in the cluster, and therefore the task execution order is unique.

**3.3 System model**

We assume that each PE is completely connected to the other PEs, with non-identical processing speeds and communication bandwidths. The set of PEs is expressed as $P = \{P_1, P_2, \ldots, P_m\}$.
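To make the notation of sec. 3 concrete, the following minimal sketch (an illustrative data structure of our own choosing, not part of the chapter's formal model) represents a DAG with task sizes $w(n_i)$, communication costs $c(e_{i,j})$, and the $pred$/$suc$ relations, and performs one task merging step in which the communication between co-located tasks becomes localized (zero):

```python
# Minimal sketch of the job model in sec. 3: tasks with sizes w(n_i),
# edges with communication costs c(e_ij), and clusters that localize
# (zero out) communication between co-located tasks.
# Class and method names are illustrative, not the chapter's notation.

class DAG:
    def __init__(self, w, c):
        self.w = dict(w)                        # task -> size w(n_i)
        self.c = dict(c)                        # (i, j) -> cost c(e_ij)
        self.clusters = {t: {t} for t in w}     # each task starts alone

    def pred(self, t):                          # immediate predecessors
        return {i for (i, j) in self.c if j == t}

    def suc(self, t):                           # immediate successors
        return {j for (i, j) in self.c if i == t}

    def start_tasks(self):                      # pred(n) is the empty set
        return {t for t in self.w if not self.pred(t)}

    def end_tasks(self):                        # suc(n) is the empty set
        return {t for t in self.w if not self.suc(t)}

    def merge(self, i, k):
        """One task merging step: cls(i) <- cls(i) U cls(k).
        Communication inside the merged cluster becomes zero."""
        merged = self.clusters[i] | self.clusters[k]
        for t in merged:
            self.clusters[t] = merged
        for (a, b) in self.c:
            if a in merged and b in merged:
                self.c[(a, b)] = 0

w = {"n1": 2, "n2": 3, "n3": 1}
c = {("n1", "n2"): 5, ("n1", "n3"): 4}
g = DAG(w, c)
g.merge("n1", "n2")                             # co-locate n1 and n2
print(g.c[("n1", "n2")], g.c[("n1", "n3")])     # localized vs. unchanged cost
print(g.start_tasks(), g.end_tasks())
```

After the merge, the cost of the edge between the co-located tasks drops to zero while the cost to the task outside the cluster is unchanged, which is exactly the localization rule of sec. 3.2.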
