methods such as CHP (C. Boeres, 2004), Triplet (B. Cirou, 2001), and FCS (S. Chingchit, 1999), almost all processors may be assigned to clusters, because those methods try to achieve the maximum task parallelism in order to minimize the schedule length. Thus, the third objective function may not be achieved by those cluster assignment strategies.

In this chapter, we propose a method for deriving the lower bound of the cluster size in heterogeneous distributed systems, together with a task clustering algorithm. From the results of experimental simulations, we discuss the applicability of the proposal for obtaining better processor utilization.

The remainder of this chapter is organized as follows. Sec. 2 reviews conventional approaches to task clustering for heterogeneous distributed systems, and sec. 3 presents our assumed model. The lower bound of the cluster size is then derived in sec. 4, and sec. 5 presents a task clustering algorithm which adopts that lower bound. Experimental results are shown in sec. 6, and finally we present conclusions and future work in sec. 7.

**2. Related works**

In a distributed environment where each processor is completely connected, task clustering (A. Gerasoulis, 1992; T. Yang, 1994; J. C. Liou, 1996) is known as one of the task scheduling methods. In task clustering, two or more tasks are merged into one cluster so that communication among them is localized, and each cluster becomes one unit of assignment to a processor. As a result, the number of clusters equals the number of required processors. If, on the other hand, task clustering is performed in a heterogeneous distributed system, the objective is to find an optimal processor assignment, i.e., to decide which processor should be assigned to each cluster generated by the task clustering. Furthermore, since the processing time and the data communication time depend on the performance of each assigned processor, each cluster should be generated with this issue taken into account. As related works on task clustering in heterogeneous distributed systems, CHP (C. Boeres, 2004), Triplet (B. Cirou, 2001), and FCS (S. Chingchit, 1999) are known.

CHP (C. Boeres, 2004) first assumes "virtual identical processors", whose processing speed is the minimum among the given set of processors, and performs task clustering on them to generate a set of clusters. In the processor assignment phase, the cluster which can be scheduled earliest is selected, and the processor which can make that cluster's completion time the earliest among all processors is chosen; the cluster is then assigned to the selected processor. This procedure is iterated until every cluster has been assigned to a processor. In the CHP algorithm, an unassigned processor can always be selected as the next assignment target because it has no waiting time. Thus each cluster is assigned to a different processor, so that many processors are required for execution, and therefore CHP cannot lead to good processor utilization.

In the Triplet algorithm (B. Cirou, 2001), task groups, each consisting of three tasks and called "triplets", are formed according to the data size to be transferred among tasks and the out-degree of each task. A cluster is then generated by merging two triplets according to their execution times and data transfer times on the fastest and the slowest processors. Processors, in turn, are grouped according to their processing speed and communication bandwidth, so that several processor groups are generated. As a final stage, each cluster is assigned to a processor group according to the processor group's load. The processor assignment policy in

**3.1 Job model**

We assume that a job to be executed among distributed processor elements (PEs) is a Directed Acyclic Graph (DAG), which is one type of task graph. Let *G<sup>s</sup><sub>cls</sub>* = (*V<sub>s</sub>*, *E<sub>s</sub>*, *V<sup>s</sup><sub>cls</sub>*) be the DAG, where *s* is the number of task merging steps (described in sec. 3.2), *V<sub>s</sub>* is the set of tasks after *s* task merging steps, *E<sub>s</sub>* is the set of edges (data communications among tasks) after *s* task merging steps, and *V<sup>s</sup><sub>cls</sub>* is the set of clusters, each consisting of one or more tasks, after *s* task merging steps. The *i*-th task is denoted as *n<sup>s</sup><sub>i</sub>*. Let *w*(*n<sup>s</sup><sub>i</sub>*) be the size of *n<sup>s</sup><sub>i</sub>*, i.e., *w*(*n<sup>s</sup><sub>i</sub>*) is the sum of unit times taken for *n<sup>s</sup><sub>i</sub>* to be processed by the reference processor element. We define the data dependency and the direction of data transfer from *n<sup>s</sup><sub>i</sub>* to *n<sup>s</sup><sub>j</sub>* as *e<sup>s</sup><sub>i,j</sub>*, and *c*(*e<sup>s</sup><sub>i,j</sub>*) is the sum of unit times taken for transferring the data from *n<sup>s</sup><sub>i</sub>* to *n<sup>s</sup><sub>j</sub>* over the reference communication link.
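For a fixed merging step *s*, this model can be encoded with two maps and a partition. The following Python sketch is purely illustrative: the variable names and the numeric sizes and costs are assumptions for the example, not values from the chapter.

```python
# A hypothetical encoding of G^s_cls = (V_s, E_s, V^s_cls) for one merging step s.

# w(n_i): task sizes, in unit times on the reference processor element.
w = {1: 4, 2: 2, 3: 3, 4: 1}

# c(e_ij): communication costs, in unit times on the reference communication
# link; the keys (i, j) also define the edge set E_s.
c = {(1, 2): 5, (1, 3): 2, (2, 4): 3, (3, 4): 4}

# V^s_cls: the clusters partition the task set. Each cluster is one assignment
# unit, so these three clusters would occupy three processors.
clusters = [{1, 2}, {3}, {4}]

# Sanity check: the clusters form a partition of the task set V_s.
assert set().union(*clusters) == set(w)
assert sum(len(cl) for cl in clusters) == len(w)
```

Note that merging tasks 1 and 2 into the first cluster localizes the edge (1, 2), so its cost *c*(*e*<sub>1,2</sub>) = 5 no longer crosses a communication link.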

One constraint imposed by a DAG is that a task cannot start execution until all data from its predecessor tasks have arrived. For instance, *e<sup>s</sup><sub>i,j</sub>* means that *n<sup>s</sup><sub>j</sub>* cannot be started until the data from *n<sup>s</sup><sub>i</sub>* arrives at the processor which will execute *n<sup>s</sup><sub>j</sub>*. Let *pred*(*n<sup>s</sup><sub>i</sub>*) be the set of immediate predecessors of *n<sup>s</sup><sub>i</sub>*, and *suc*(*n<sup>s</sup><sub>i</sub>*) be the set of immediate successors of *n<sup>s</sup><sub>i</sub>*. If *pred*(*n<sup>s</sup><sub>i</sub>*) = ∅, *n<sup>s</sup><sub>i</sub>* is called a START task, and if *suc*(*n<sup>s</sup><sub>i</sub>*) = ∅, *n<sup>s</sup><sub>i</sub>* is called an END task. If there are one or more paths from *n<sup>s</sup><sub>i</sub>* to *n<sup>s</sup><sub>j</sub>*, we denote this relation as *n<sup>s</sup><sub>i</sub>* ≺ *n<sup>s</sup><sub>j</sub>*.
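These definitions can be sketched directly in code. The helper below uses an illustrative four-task edge set (an assumption for the example, not from the chapter) and implements *pred*, *suc*, the START/END classification, and the precedence relation ≺ as plain reachability over the edges.

```python
# Illustrative edge set for a 4-task DAG: 1 -> {2, 3} -> 4.
edges = {(1, 2), (1, 3), (2, 4), (3, 4)}
tasks = {1, 2, 3, 4}

def pred(i):
    """Immediate predecessors of task i."""
    return {u for (u, v) in edges if v == i}

def suc(i):
    """Immediate successors of task i."""
    return {v for (u, v) in edges if u == i}

# START tasks have no predecessors; END tasks have no successors.
start_tasks = {i for i in tasks if not pred(i)}
end_tasks = {i for i in tasks if not suc(i)}

def precedes(i, j):
    """True iff there is a path from n_i to n_j, i.e. n_i ≺ n_j."""
    stack, seen = [i], set()
    while stack:
        u = stack.pop()
        for v in suc(u):
            if v == j:
                return True
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return False
```

Here task 1 is the only START task and task 4 the only END task, and `precedes(1, 4)` holds while `precedes(4, 1)` does not, since a DAG has no cycles.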
