Task-level decomposition requires significant communication between the different tasks in order to move data from one processing stage to the next, and this communication may become the bottleneck. The overhead can be reduced by using double buffering and blocking to keep the piece of data that is currently being processed in cache or local memory. Additionally, synchronization is required to activate the different modules at the right time. This synchronization should be performed by a control processor and adds significant overhead.

The main drawbacks of task-level decomposition, however, are load balancing and scalability. Balancing the load is difficult because the time to execute each task is not known a priori and depends on the data being processed. In a task-level pipeline the execution time of each stage is not constant, and a slow stage can block the processing of the others. Scalability is also difficult to achieve. If the application requires higher performance, for example when going from standard to high-definition resolution, the task partitioning must be reimplemented, which is a complex task, and at some point the partitioning may be unable to provide the required performance for high-throughput demands. Finally, from the software optimization perspective, task-level decomposition requires that each task/processor implement a specific software optimization strategy, i.e., the code for each processor is different and requires different optimizations.

Fig. 10. H.264 parallelization techniques. **a** Task-level decomposition. **b** Data-level decomposition.

**1.3.3 Data-level decomposition**

In data-level decomposition the work (data) is divided into smaller parts, each of which is assigned to a different processor, as depicted in Fig. 10b. Each processor runs the same program but on different (multiple) data elements (SPMD). In H.264, data decomposition can be applied at different levels of the data structure (see Fig. 11), which goes down from Group of Pictures (GOP) to frames, slices, macroblocks (MBs), and finally to variable-sized pixel blocks. Data-level parallelism can be exploited at each level of the data structure, each one having different constraints and requiring different parallelization methodologies.

Fig. 11. GOP Structure of H.264 Video

**1.3.4 GOP-level parallelism**

The coarsest-grained parallelism is at the GOP level. H.264 can be parallelized at the GOP level by defining a GOP size of *N* frames and assigning each GOP to a processor. GOP-level parallelism requires a lot of memory for storing all the frames, and therefore this technique maps well to multicomputers in which each processing node has a lot of computational and memory resources. However, parallelization at the GOP level results in a very high latency that cannot be tolerated in some applications. This scheme is also not well suited for multicore architectures, in which the memory is shared by all the processors, because of cache pollution.
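
The GOP-level scheme can be illustrated with a minimal Python sketch. It is not an H.264 decoder: `decode_gop` is a hypothetical placeholder, the function names and parameters are chosen here for illustration, and each GOP is assumed to be decodable without data from any other GOP. A thread pool stands in for the processors or processing nodes; a real system would use separate processes or machines. The helper `frames_in_flight` gives a rough idea of why the memory demand grows with both the GOP size and the number of workers.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_gops(frames, gop_size):
    """Cut the frame sequence into consecutive GOPs of at most gop_size frames."""
    if gop_size <= 0:
        raise ValueError("gop_size must be positive")
    return [frames[i:i + gop_size] for i in range(0, len(frames), gop_size)]


def decode_gop(gop):
    """Hypothetical stand-in for decoding one GOP; a real decoder would go here.

    Assumption: the GOP is self-contained, so no other GOP is referenced.
    """
    return [f"decoded({frame})" for frame in gop]


def decode_gop_parallel(frames, gop_size, workers):
    """Hand each GOP to a worker and return the results in the original order."""
    gops = split_into_gops(frames, gop_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in submission order, so GOP order is preserved.
        decoded_gops = list(pool.map(decode_gop, gops))
    return [frame for gop in decoded_gops for frame in gop]


def frames_in_flight(gop_size, workers):
    """Rough count of frames held at once when every worker owns one GOP."""
    return gop_size * workers
```

For example, `split_into_gops(list(range(10)), 4)` yields three GOPs of 4, 4, and 2 frames, and `frames_in_flight(15, 8)` reports 120 frames held simultaneously, which hints at the memory pressure on a shared-memory multicore.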
