computing resources and have to submit their service request at just one point of entry to the grid system. Foster introduced the concept of the Virtual Organization (VO) (Foster et al., 2001) and defines a VO as a *"dynamic collection of multiple organizations providing flexible, secure, coordinated resource sharing"*. Figure 1 shows three actual organizations with both computational and data resources to share across organizational boundaries. The same figure also forms two VOs, A and B, each of which can access a subset of the resources in each of the organizations (Moallem, 2009). Virtualization is a mechanism that improves the usability of grid computing systems by providing environment customization to users.

2. **Heterogeneity**: The organizations that form part of a VO may have different resources, such as hardware, operating systems and network bandwidth. Accordingly, a VO is considered a collection of heterogeneous resources of organizations.

3. **Dynamism**: In a grid system, organizations or their resources can join or leave a VO depending on their requirements or functional status.

Grid systems provide the ability to perform higher throughput computing by using many networked computers to distribute process execution across a parallel infrastructure. Nowadays, organizations around the world use grid computing in areas as diverse as collaborative scientific research, drug discovery, financial risk analysis, product design and 3-D seismic imaging in the oil and gas industry (Dimitri et al., 2005).

Task scheduling in grids has received considerable attention over the past few years. The main goal of task scheduling is to allocate tasks to available resources as efficiently and as quickly as possible in a global, heterogeneous and dynamic environment. Kousalya pointed out that grid scheduling consists of three stages: first, resource discovery and filtering; second, resource selection and scheduling according to a certain objective; and third, task submission, which includes file staging and cleanup (Kousalya & Balasubramanie, 2009; 2008). High performance computing and high throughput computing are the two different goals of grid scheduling algorithms. The main aim of high performance computing is to minimize the execution time of the application. Allocating resources to a large number of tasks in a grid computing environment is more difficult than in conventional computational environments.

The scheduling problem is a well-known NP-complete problem (Garey & Johnson, 1979), and by nature it is a combinatorial optimization problem. Many algorithms have been proposed for task scheduling in grid environments. In general, the existing heuristic mapping approaches can be divided into two categories (Jinquan et al., 2005):

First, online mode, where the scheduler is always ready. Whenever a new task arrives at the scheduler, it is immediately allocated to one of the existing resources required by that task. Each task is considered only once for matching and scheduling.

Second, batch mode, where tasks and resources are collected and mapped at a prescheduled time. This mode makes better decisions because the scheduler knows the full details of the available tasks and resources. This chapter proposes a heuristic algorithm that falls into the batch mode category (Jinquan et al., 2005).

This chapter studies the problem of minimizing makespan, i.e., the total execution time of the schedule, in a grid environment. The proposed Mutation-based Simulated Annealing (MSA) algorithm is shown to be a high performance computing scheduling algorithm. The MSA algorithm will be studied for the random and Expected Time to Compute (ETC) models.

**2. Related works**

One salient issue in grid computing is the design of efficient schedulers, which are used as part of the middleware services to plan the assignment of users' tasks to grid resources efficiently. Various scheduling approaches suggested in the classical parallel systems literature have been adopted for grid systems with appropriate modifications. Although these modifications made them suitable for execution in a grid environment, these approaches failed to deliver on performance. For this reason, the Genetic Algorithm (GA) and the Simulated Annealing (SA) algorithm, among others, have been used to tackle task scheduling in grid environments, and they give reasonable solutions compared with classical scheduling algorithms. GA solutions for grid scheduling are addressed in several works (Abraham et al., 2000; Carretero & Xhafa, 2006; Abraham et al., 2008; Martino & Mililotti, 2002; Martino, 2003; Y. Gao et al., 2005). These studies did not address how to speed up the convergence and shorten the search time of GA.
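The GA formulation can be made concrete with a widely used way of encoding a grid schedule as a chromosome and scoring it by makespan. The sketch below is illustrative only and rests on several assumptions. It assumes an ETC-style matrix `etc[i][j]` that gives the expected time of task `i` on resource `j`. It ignores transfer and queueing times. The names `random_chromosome`, `makespan` and `fitness` are hypothetical and are not taken from the cited works.

```python
import random
from typing import List, Sequence


def random_chromosome(n_tasks: int, n_resources: int, rng: random.Random) -> List[int]:
    """Gene i holds the index of the resource that task i is mapped to."""
    return [rng.randrange(n_resources) for _ in range(n_tasks)]


def makespan(chromosome: Sequence[int], etc: Sequence[Sequence[float]]) -> float:
    """Finishing time of the busiest resource (etc[i][j]: time of task i on resource j)."""
    finish = [0.0] * len(etc[0])
    for task, resource in enumerate(chromosome):
        finish[resource] += etc[task][resource]
    return max(finish)


def fitness(chromosome: Sequence[int], etc: Sequence[Sequence[float]]) -> float:
    """Fitness to maximise: the inverse of makespan (one common choice, assumed here)."""
    return 1.0 / makespan(chromosome, etc)


if __name__ == "__main__":
    etc = [[2, 3], [4, 1], [5, 6]]          # 3 tasks, 2 resources
    print(makespan([0, 1, 0], etc))         # resource 0 runs tasks 0 and 2: 2 + 5
```

With this encoding, crossover and mutation operate directly on the list of resource indices, and every chromosome is a valid schedule.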

Furthermore, the SA algorithm was studied in previous works (Fidanova, 2006; Manal et al., 2011). These works report important results and high-quality solutions, indicating that SA is a powerful technique that can be used to solve the grid scheduling problem. Moreover, Jadaan et al. (2009; 2010; 2011) highlighted the importance of rank in GA.
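The SA approach for the makespan objective can be illustrated with a minimal sketch. It reuses the task-to-resource encoding and an ETC matrix, and it builds on the following assumptions:

- a "move one task to a random resource" neighbourhood;
- the Metropolis acceptance rule;
- geometric cooling.

These choices and the parameter values (`t0`, `alpha`, `iters_per_temp`, `t_min`) are assumptions for illustration. They are not the formulations of Fidanova (2006) or Manal et al. (2011).

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def makespan(assign: Sequence[int], etc: Sequence[Sequence[float]]) -> float:
    loads = [0.0] * len(etc[0])
    for task, res in enumerate(assign):
        loads[res] += etc[task][res]
    return max(loads)


def simulated_annealing(etc: Sequence[Sequence[float]], t0: float = 100.0,
                        alpha: float = 0.95, iters_per_temp: int = 50,
                        t_min: float = 0.01,
                        seed: Optional[int] = None) -> Tuple[List[int], float]:
    rng = random.Random(seed)
    n_tasks, n_res = len(etc), len(etc[0])
    current = [rng.randrange(n_res) for _ in range(n_tasks)]
    cur_cost = makespan(current, etc)
    best, best_cost = current[:], cur_cost
    t = t0
    while t > t_min:
        for _ in range(iters_per_temp):
            # neighbour: reassign one random task to a random resource
            cand = current[:]
            cand[rng.randrange(n_tasks)] = rng.randrange(n_res)
            cand_cost = makespan(cand, etc)
            delta = cand_cost - cur_cost
            # Metropolis rule: accept improvements, accept worse moves with prob exp(-delta/t)
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current, cur_cost = cand, cand_cost
                if cur_cost < best_cost:
                    best, best_cost = current[:], cur_cost
        t *= alpha  # geometric cooling
    return best, best_cost
```

The sketch tracks the best schedule seen so far. Accepting a worse move late in the run therefore cannot lose a good solution.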

Wook & Park (2005) showed that GA and SA have complementary strengths and weaknesses. Accordingly, they proposed a new SA-selection to enhance GA performance on combinatorial optimization problems. However, the population size they use is large, which makes the algorithm time-consuming, especially as the problem size increases. Kazem et al. (2008) tried to solve a static task scheduling problem in grid computing using a modified SA. Prado et al. (2009) proposed a fuzzy scheduler, obtained by evolving a fuzzy system, to improve the overall response time of the entire workflow. The rules of this evolutionary fuzzy system are obtained using a genetic learning process based on the Pittsburgh approach.
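The general idea of SA-style selection inside a GA can be sketched as follows. This is a hypothetical reading and not the operator of Wook & Park (2005). In this sketch, each parent produces one child. A worse child still replaces its parent with the Metropolis probability exp(-Δ/T), so the temperature controls how much the GA explores. The function names and the one-child-per-parent scheme are assumptions.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def sa_selection(parent_cost: float, child_cost: float,
                 temperature: float, rng: random.Random) -> bool:
    """Return True if the child should replace its parent (minimisation)."""
    if child_cost <= parent_cost:
        return True                      # improvements are always accepted
    # worse children survive with the Metropolis probability exp(-delta / T)
    return rng.random() < math.exp(-(child_cost - parent_cost) / temperature)


def next_generation(population: List[Sequence[int]], costs: List[float],
                    make_child: Callable, cost_fn: Callable,
                    temperature: float, rng: random.Random) -> Tuple[list, list]:
    new_pop, new_costs = [], []
    for parent, p_cost in zip(population, costs):
        child = make_child(parent, rng)  # caller-supplied variation operator(s)
        c_cost = cost_fn(child)
        if sa_selection(p_cost, c_cost, temperature, rng):
            new_pop.append(child)
            new_costs.append(c_cost)
        else:
            new_pop.append(parent)
            new_costs.append(p_cost)
    return new_pop, new_costs
```

Every individual is evaluated in every generation, so the cost per generation grows linearly with the population size. This is why a large population becomes expensive as the problem grows.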

Wael & Ramachandram (2011) proposed an algorithm that minimizes makespan, flowtime and time to release while maximizing the reliability of grid resources. It takes transmission time and waiting time in the resource queue into account. It also uses stochastic universal sampling selection and single exchange mutation to outperform other GAs.
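Stochastic universal sampling (SUS) and an exchange-style mutation can be sketched as follows. SUS places `n` equally spaced pointers over the cumulative fitness, using a single random offset. Each individual is therefore selected roughly in proportion to its fitness, with lower variance than `n` independent roulette-wheel spins. Two assumptions apply:

- For makespan minimization, the fitness is taken to be something like `1 / makespan`.
- "Single exchange" is taken to mean swapping the resources of two randomly chosen tasks. The exact operator in Wael & Ramachandram (2011) may differ.

```python
import random
from typing import List, Sequence


def stochastic_universal_sampling(fitness: Sequence[float], n: int,
                                  rng: random.Random) -> List[int]:
    """Select n indices using n equally spaced pointers over the cumulative fitness."""
    step = sum(fitness) / n
    start = rng.uniform(0, step)          # one random offset shared by all pointers
    selected, idx, cumulative = [], 0, fitness[0]
    for k in range(n):
        pointer = start + k * step
        # advance to the individual whose fitness segment contains the pointer
        while cumulative <= pointer and idx < len(fitness) - 1:
            idx += 1
            cumulative += fitness[idx]
        selected.append(idx)
    return selected


def single_exchange_mutation(chromosome: Sequence[int],
                             rng: random.Random) -> List[int]:
    """Assumed operator: swap the resources assigned to two randomly chosen tasks."""
    child = list(chromosome)
    i, j = rng.sample(range(len(child)), 2)
    child[i], child[j] = child[j], child[i]
    return child
```

Only two genes change, so most of the parent's task-to-resource mapping survives into the child.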

Lee et al. (2011) proposed the Hierarchical Load Balanced Algorithm (HLBA) for grid environments. They use the system load as a parameter in determining a balance threshold, and the scheduler adapts this threshold dynamically when the system load changes. The load of a resource is characterized by its CPU utilization, network utilization and memory utilization.
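The following sketch illustrates the idea of a threshold that follows the system load. It is hypothetical throughout, because the formula used by Lee et al. (2011) is not given here:

- Resource load is assumed to be a weighted sum of CPU, network and memory utilization, each in [0, 1].
- The threshold is assumed to be the current average load plus a fixed margin.

The function names, weights and margin are illustrative.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def resource_load(cpu: float, net: float, mem: float,
                  weights: Tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3)) -> float:
    """Hypothetical: combine the three utilisations as a weighted sum."""
    w_cpu, w_net, w_mem = weights
    return w_cpu * cpu + w_net * net + w_mem * mem


def adaptive_threshold(loads: Sequence[float], margin: float = 0.1) -> float:
    """Hypothetical rule: the threshold tracks the current average system load."""
    return mean(loads) + margin


def overloaded(loads: Sequence[float], margin: float = 0.1) -> List[int]:
    """Indices of resources whose load exceeds the current adaptive threshold."""
    thr = adaptive_threshold(loads, margin)
    return [i for i, load in enumerate(loads) if load > thr]
```

When every resource is equally busy, the threshold rises with the load and nothing is flagged. A fixed threshold, by contrast, would flag all resources once the system as a whole becomes busy.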

P.K. Suri & Singh Manpreet (2010) proposed a Dynamic Load Balancing Algorithm (DLBA) that performs both intra-cluster and inter-cluster load balancing.

Intra-cluster load balancing is driven by the Cluster Manager (CM). The CM decides whether to start local balancing based on the current workload of the cluster, which is estimated from the resources below it.

Inter-cluster load balancing is performed when some CMs fail to balance their workload. In this case, the tasks of an overloaded cluster are transferred to an underloaded cluster. To detect cluster overloading, the authors introduced a balanced threshold: if the load of a cluster exceeds it, load balancing is executed. Because the value of this threshold is fixed, it is not well suited to the dynamic characteristics of grid systems.
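The inter-cluster step with a fixed balanced threshold can be sketched as follows. Several details are assumptions for illustration and are not details of DLBA:

- Cluster load is modelled as total queued workload divided by capacity.
- The threshold value is 0.8.
- The transfer policy moves the last queued task to the least-loaded other cluster.

```python
from typing import List, Tuple

BALANCED_THRESHOLD = 0.8  # fixed, as in DLBA; the value 0.8 itself is an assumption


def cluster_load(tasks: List[float], capacity: float) -> float:
    return sum(tasks) / capacity


def inter_cluster_balance(clusters: List[List[float]], capacities: List[float],
                          threshold: float = BALANCED_THRESHOLD
                          ) -> List[Tuple[float, int, int]]:
    """Move tasks out of clusters whose load exceeds a fixed threshold.

    clusters[c] holds the task workloads queued on cluster c.
    Returns the list of (task, src, dst) moves that were made.
    """
    moved: List[Tuple[float, int, int]] = []
    if len(clusters) < 2:
        return moved
    for src in range(len(clusters)):
        while clusters[src] and cluster_load(clusters[src], capacities[src]) > threshold:
            # hypothetical policy: send work to the least-loaded other cluster
            dst = min((c for c in range(len(clusters)) if c != src),
                      key=lambda c: cluster_load(clusters[c], capacities[c]))
            task = clusters[src][-1]
            if cluster_load(clusters[dst] + [task], capacities[dst]) > threshold:
                break  # the least-loaded cluster cannot take the task without overloading
            clusters[src].pop()
            clusters[dst].append(task)
            moved.append((task, src, dst))
    return moved
```

Since `threshold` never changes, the same cut-off applies whether the whole grid is nearly idle or heavily loaded. This is the limitation pointed out above.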

Chang et al. (2009) introduced the Balanced Ant Colony Optimization (BACO) algorithm, which chooses suitable resources to execute tasks according to the resource status. Its pheromone update functions balance the system load. The local pheromone update function updates the status of the selected resource after task assignment. The global pheromone update function updates the status of every resource after all tasks have completed.

In this chapter, MSA maintains two solutions at a time and uses the single exchange mutation operator as well as the Random-MCT heuristic (described in subsection 5.2).

Previous works (Wael et al., 2009a;b;c) considered the minimization of makespan using a GA based on Rank Roulette Wheel Selection (RRWSGA). They use the standard deviation of the fitness function as the termination condition of the algorithm. The aim of using the standard deviation is to shorten the time consumed by the algorithm while maintaining reasonable performance of computing resources (97%).

Yin et al. (2007) introduced a GA that uses a standard deviation below 0.1 as the stopping criterion to limit the number of GA iterations. This algorithm has several drawbacks:

- low-quality solutions, almost the same as those of the standard GA;
- a randomly generated initial population, even though the time consumed by the algorithm is small compared with the standard GA;
- a mutation that exchanges every gene in the chromosome.

Such a mutation destroys the good information that chromosomes would otherwise carry into subsequent generations. To illustrate the usefulness of this work, the next section explains the motivation behind it.

Task Scheduling in Grid Environment Using Simulated Annealing and Genetic Algorithm 93

[Figure 2: Makespan versus generation number (genNo, 0 to 1000), comparing "All genes exchange" with "Two genes exchange". Panels: (a) Makespan results with one point crossover and random initialization of GA; (b) Makespan results with two points crossover and random initialization of GA; (c) Makespan results with Random-MCT and one point crossover; (d) Makespan results with Random-MCT and two points crossover.]

Fig. 2. Makespan results of experiment 8 (mentioned in Table 3), which consists of 1000 tasks and 50 resources, with one/two point(s) crossover, with/without Random-MCT, and two/all genes exchanged.

high. Figure 3 shows the relationships between the RM and ETC models, on the one hand, and the RGSGCS and MSA algorithms, on the other hand.

Fig. 3. The relationship among RM/ETC model and the RGSGCS/MSA

**4. Problem formulation**

Problem formulation is a fundamental step that helps in understanding the problem at hand. This chapter considers a grid with a sufficient number of arriving tasks for the GA to schedule. Let $N$ be the total number of tasks to be scheduled and $W_i$, where $i = 1, 2, \ldots, N$, be the workload of
