**2.3 Workflow-based resource allocation**

Many recent efforts have focused on the scheduling of workflows in Grids. [16] presents a QoS-based workflow management system and a scheduling algorithm that match workflow applications with resources using event-condition-action rules. Pandey and Buyya have worked on scheduling scientific workflows using various approaches in the context of their

Resource Management for Data Intensive Tasks on Grids 53

Multimedia encoding is required for applying a specific codec to a video [27]. Conventional methods use a single system for the conversion. Compressing the raw captured video data into an MPEG-1 or MPEG-2 data stream can take an enormous amount of time, which increases with higher quality conversions. Depending on the quality level of the video capture, a typical one-hour tape can produce over 10 GB of video data, which needs to be compressed to approximately 650 MB to fit on a VideoCD. The compression stage is CPU-intensive, since it matches all parts of adjacent video frames, looking for similar sub-pictures, and then creates an MPEG data stream encoding the frames. At higher quality levels, more data is initially captured and enhanced algorithms, which consume more time, are used. The compression process can take a day or more, depending on the quality level and the speed of the system being used. For commercial DVD quality, conversions are typically done by a service company that has developed higher quality conversion algorithms, which may take a considerable amount of time to execute. Grid technology is ideal for improving the process of video conversion.

In this research we have focused on the problem of allocating resources for a given bag of PBDT tasks. The bag-of-tasks consists of a set of independent PBDT tasks, all of which must be executed successfully. The Grid system consists of n nodes, collectively represented by a set Δ. Each individual PBDT task in the given bag-of-tasks may be divided into a number of sub-tasks called *jobs*, which can be executed in parallel, independently of each other. As discussed, PBDT tasks are resource-intensive tasks that use a large amount of computing resources and communication bandwidth. Once a node starts processing a PBDT task, pre-empting the task is usually counter-productive, as it wastes the effort of transferring the raw-data file to that node. Also, because of the heavy demand for computing power, a node is assumed to handle the processing of only one PBDT task at a time. In this research we have made the following two assumptions regarding the running of constituent jobs of a task on Grid nodes:

1. Once a job starts executing on a Grid node, it cannot be pre-empted.

2. Only one job can be executed on a Grid node at a time.
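The task model and the two running assumptions (no pre-emption, and only one job per node at a time) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the class names `Job`, `Task` and `Node`, the helper `split_task`, and the equal-size split of the raw data are all assumptions introduced for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Job:
    """One independently executable sub-task of a PBDT task."""
    task_id: str
    job_id: int
    size_mb: float  # amount of raw data to process, in megabytes


@dataclass
class Task:
    """A PBDT task, made up of jobs that may run in parallel."""
    task_id: str
    jobs: List[Job] = field(default_factory=list)


@dataclass
class Node:
    """A Grid node that runs at most one job at a time."""
    name: str
    running: Optional[Job] = None

    def start(self, job: Job) -> None:
        # Only one job can be executed on a node at a time.
        if self.running is not None:
            raise RuntimeError(f"node {self.name} is busy")
        self.running = job

    def finish(self) -> Job:
        # A job leaves the node only by finishing; there is deliberately
        # no method that pre-empts a running job.
        if self.running is None:
            raise RuntimeError(f"node {self.name} is idle")
        job, self.running = self.running, None
        return job


def split_task(task_id: str, total_mb: float, n_jobs: int) -> Task:
    """Divide a task into n_jobs equally sized jobs (hypothetical policy)."""
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    share = total_mb / n_jobs
    return Task(task_id, [Job(task_id, k, share) for k in range(n_jobs)])
```

Leaving out any pre-emption method makes the first assumption structural: once `start` succeeds, the only way to free the node is `finish`.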

For the cost analysis of the proposed architecture, we have measured *cost* by the time (in seconds) spent in performing a particular communication or computation job. We have chosen one megabyte as the unit of data. When a particular node i accesses data in node j, the communication cost of transporting a data unit from node i to node j is designated by d(i,j). It is assumed that the communication costs are metrics, meaning that they are non-negative. The computing cost of a node m is represented by Cpm, which is the cost of processing a unit of data at that node. The set of all the nodes in the system is represented by **Δ**. To represent the computing costs, a vector [Cp] of |**Δ**| dimensions is used, which holds the computing costs of all the nodes in the system. A matrix [Cc] of dimensions |**Δ**| × |**Δ**| holds the communication costs between all the nodes in the system. The objective of this research is to assign resources to tasks in such a manner that the total cost of executing the given bag-of-tasks is minimized, where the total cost is defined as the total time spent by a task at all the resources it has used during its execution. Total cost indicates the total resource usage for executing a task, and hence the minimization of the total cost is a system objective.
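To make the cost model concrete, the sketch below computes the total cost of an assignment from a computing-cost vector [Cp] and a communication-cost matrix [Cc], and then searches for the cheapest assignment by brute force. Several things here are assumptions made for illustration and are not stated in the text: the cost of a job is taken to be linear in its data size (transfer cost plus processing cost per megabyte), each job is placed on a distinct node in a single round, the exhaustive search stands in for whatever allocation algorithm is actually used, and the numeric values are invented.

```python
from itertools import permutations


def job_cost(size_mb, src, dst, Cp, Cc):
    """Seconds to move size_mb from node src to node dst and process it there.

    Assumed linear model: (communication cost + computing cost) per MB.
    """
    return size_mb * (Cc[src][dst] + Cp[dst])


def total_cost(jobs, assignment, Cp, Cc):
    """Sum of job costs; jobs is a list of (size_mb, source_node) pairs."""
    return sum(
        job_cost(size, src, dst, Cp, Cc)
        for (size, src), dst in zip(jobs, assignment)
    )


def best_assignment(jobs, Cp, Cc):
    """Exhaustively find the cheapest placement of jobs on distinct nodes."""
    n_nodes = len(Cp)
    if len(jobs) > n_nodes:
        raise ValueError("this sketch places at most one job per node")
    return min(
        permutations(range(n_nodes), len(jobs)),
        key=lambda a: total_cost(jobs, a, Cp, Cc),
    )


# Hypothetical three-node Grid; costs are in seconds per megabyte.
Cp = [0.75, 0.25, 0.5]
Cc = [
    [0.0, 0.25, 0.5],
    [0.25, 0.0, 0.25],
    [0.5, 0.25, 0.0],
]
# Two jobs whose raw data both reside on node 0.
jobs = [(100.0, 0), (50.0, 0)]

best = best_assignment(jobs, Cp, Cc)
print(best, total_cost(jobs, best, Cp, Cc))  # prints "(1, 0) 87.5"
```

In this example the larger job is sent to node 1 even though its data sits on node 0, because node 1's lower computing cost outweighs the transfer cost. Exhaustive search is only practical for very small systems, since the number of candidate assignments grows factorially with the number of nodes.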

**3.2 Multimedia encoding** 

**4. Overall system architecture** 

of constituent jobs of a task on Grid nodes.

GridBus workflow management effort [11] [23]. [23] has developed an architecture to specify and to schedule workflows under resource allocation constraints. Also, many of the data Grid projects that support distributed processing of remote data have proposed workflow scheduling [11] [21].
