**3. Processable Bulk Data Transfer (PBDT) tasks**

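PBDT tasks require bulk transfer of processed data. Such data transfers are typical in multimedia systems and high-energy physics (HEP) experiments. For example, in [1], 650 MB of data was transferred on average from a source to a set of sink nodes. The high communication and computing times of PBDT tasks effectively amortize the overhead of the LP-based algorithm used to optimize system performance. A PBDT task has the following three characteristics.

1. The task involves a large data transfer, and the data has to be processed in some way before it can be used at the sink nodes. The large amount of data involved differentiates a PBDT task from compute-intensive tasks, where the data usually consists only of the parameters of the remote functions invoked. This implies that data communication costs cannot be ignored while scheduling a PBDT task.
2. The cost of data processing is proportional to the length of the raw data file.
3. The unprocessed raw file can either be processed as a whole or be divided into multiple partitions. If it is divided, each partition can be processed independently, and the resulting processed partitions can later be combined to generate the required processed file. Consider a source file $F$ of size $L$. $F$ can be partitioned into $k$ disjoint partitions with data sizes $\{L_1, L_2, \ldots, L_k\}$, such that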

$$L = \sum_{i=1}^{k} L_i \tag{1}$$

Then for a PBDT task, the length of the required processed file is given by

$$\tilde{L} = \sum_{i=1}^{k} \varepsilon_i L_i \tag{2}$$

where $\varepsilon_i$ is the processing factor of partition $i$, defined as the ratio of the size of the processed partition to that of the original partition.
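As a minimal sketch of equations (1) and (2), the following Python snippet computes the raw and processed lengths from a list of partition sizes and processing factors. The function names, the partition sizes and the factor values are hypothetical and chosen only for illustration; they do not come from the experiments cited in this chapter.

```python
from typing import Sequence


def raw_length(partition_sizes: Sequence[float]) -> float:
    """Equation (1): raw file length L as the sum of the partition sizes L_i."""
    return sum(partition_sizes)


def processed_length(
    partition_sizes: Sequence[float],
    processing_factors: Sequence[float],
) -> float:
    """Equation (2): processed file length as the sum of eps_i * L_i."""
    if len(partition_sizes) != len(processing_factors):
        raise ValueError("need exactly one processing factor per partition")
    return sum(
        eps * size for eps, size in zip(processing_factors, partition_sizes)
    )


# Hypothetical example: a 1000 MB raw file split into k = 4 partitions.
sizes_mb = [400.0, 300.0, 200.0, 100.0]
# Illustrative processing factors (processed size / original size).
factors = [0.5, 0.25, 0.5, 1.0]

total_raw = raw_length(sizes_mb)
total_processed = processed_length(sizes_mb, factors)
print(f"L = {total_raw} MB, processed length = {total_processed} MB")
```

A factor below 1 models processing that shrinks a partition (for example compression), while a factor of 1 leaves the partition size unchanged.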

PBDT tasks are becoming increasingly important. They arise in multimedia, high-energy physics and medical applications. The following subsections describe two practical examples of PBDT tasks.

### **3.1 Particle physics data grids**

Particle Physics Data Grids (PPDG) is a collaboratory project concerned with providing next-generation infrastructure for high-energy and nuclear physics experiments. One of the important requirements of PPDG is to deal with the enormous amount of data created during high-energy physics experiments, which must be analyzed by large groups of specialists. The data storage, replication, job scheduling, resource management and security components of the Grid must be integrated for use by the physics collaborators. Processing these tasks requires huge computing capabilities and fast communication capabilities. Grid computing is used to process PPDG tasks, which can be classified as PBDT tasks.

### **3.2 Multimedia encoding**

52 Grid Computing – Technology and Applications, Widespread Coverage and New Horizons

GridBus workflow management effort [11] [23]. [23] has developed an architecture to specify and to schedule workflows under resource allocation constraints. Also, many of the data Grid projects that support distributed processing of remote data have proposed workflow scheduling [11] [21].

Multimedia encoding applies a specific codec to a video [27]. Conventional methods use a single system for the conversion. Compressing raw captured video into an MPEG-1 or MPEG-2 data stream can take an enormous amount of time, which increases with the quality of the conversion. Depending on the quality level of the video capture, a typical one-hour tape can produce over 10 GB of video data, which needs to be compressed to approximately 650 MB to fit on a VideoCD. The compression stage is CPU intensive, since it compares all parts of adjacent video frames to find similar sub-pictures and then creates an MPEG data stream encoding the frames. At higher quality levels, more data is initially captured and enhanced algorithms, which consume more time, are used. The compression process can take a day or more, depending on the quality level and the speed of the system being used. For commercial DVD quality, conversions are typically done by a service company that has developed higher-quality conversion algorithms, which may take a considerable amount of time to execute. Grid technology is well suited to speeding up video conversion.
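The partition, process and combine pattern that makes video conversion a PBDT task can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `zlib.compress` stands in for a real video codec, local threads stand in for Grid nodes, the input is synthetic data, and the helper names `partition` and `process_partitions` are hypothetical. For scale, the figures quoted above (over 10 GB of raw video reduced to about 650 MB) correspond to an overall processing factor of roughly 0.065 or lower.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import List


def partition(data: bytes, k: int) -> List[bytes]:
    """Split data into k contiguous, disjoint partitions of near-equal size."""
    if k < 1:
        raise ValueError("k must be at least 1")
    base, extra = divmod(len(data), k)
    parts, start = [], 0
    for i in range(k):
        # The first `extra` partitions each take one additional byte.
        end = start + base + (1 if i < extra else 0)
        parts.append(data[start:end])
        start = end
    return parts


def process_partitions(parts: List[bytes], workers: int = 4) -> List[bytes]:
    """Process every partition independently (zlib is a stand-in for a codec)."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(zlib.compress, parts))


# Synthetic, highly repetitive stand-in for raw captured video.
raw = b"frame-data-" * 100_000
parts = partition(raw, k=4)
processed = process_partitions(parts)

# Per-partition processing factor: processed size / original size.
factors = [len(p) / len(r) for p, r in zip(processed, parts)]

# Check that the partitions can be recombined: decompressing the processed
# partitions in order yields the original data.
restored = b"".join(zlib.decompress(p) for p in processed)
print(f"processing factors: {[round(f, 4) for f in factors]}")
```

Because each partition is processed without reference to the others, the work can be distributed across as many nodes as there are partitions, and the cost of shipping the partitions to those nodes becomes part of the scheduling problem.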
