**1.3.5 Frame-level parallelism for independent frames**

At the frame level (Fig. 12), the input video stream is divided into GOPs. Since GOPs are usually made independent of each other, it is possible to develop a parallel architecture in which a controller is in charge of distributing the GOPs among the available cores. In a sequence of I-B-B-P frames inside a GOP, some frames (the I and P frames) are used as references for other frames, but others (the B frames in this case) might not be. Thus, in this case the B frames can be processed in parallel. To do so, a control/central processor assigns independent frames to different processors.
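As a rough illustration of how a central processor might hand out independent frames, the following Python sketch groups the frames of one GOP into stages whose members can run concurrently. The function name `schedule_gop` is hypothetical, and the sketch assumes that B frames are never used as references and depend only on the nearest I/P frames before and after them. The staging is deliberately conservative: a batch of B frames could in principle overlap with the next P frame.

```python
def schedule_gop(frame_types):
    """Group the frames of one GOP (display order) into processing stages.

    Each stage is a list of frame indices that may be processed in parallel.
    Assumption (not from the text): B frames are non-reference frames that
    depend only on the surrounding I/P frames.
    """
    stages = []
    pending_b = []  # B frames waiting for their following reference frame
    for idx, ftype in enumerate(frame_types):
        if ftype in ("I", "P"):
            # Reference frames form a serial chain: one per stage.
            stages.append([idx])
            if pending_b:
                # Both surrounding references are now done, so the
                # B frames between them can be processed together.
                stages.append(pending_b)
                pending_b = []
        elif ftype == "B":
            pending_b.append(idx)
        else:
            raise ValueError(f"unknown frame type: {ftype!r}")
    if pending_b:
        # Trailing B frames with no later reference inside this GOP.
        stages.append(pending_b)
    return stages


if __name__ == "__main__":
    for stage in schedule_gop("IBBPBBP"):
        print(stage)
```

For the pattern `IBBPBBP`, no stage contains more than two frames, because only the two B frames between consecutive reference frames are independent of each other under these assumptions.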

Fig. 12. Frame-level Parallelism

Frame-level parallelism has scalability problems because there are usually no more than two or three B frames between P frames. The advantages of this model are clear: PSNR and bit-rate do not change, and it is easy to implement, since the independence of GOPs is assured with minimal changes to the code. However, memory consumption increases significantly, since each encoder must have its own Decoded Picture Buffer (DPB), where all of the GOP's references are stored. Moreover, real-time encoding is hard to implement using this approach, making it more suitable for video storage purposes. The main disadvantage of frame-level parallelism, however, is that, unlike previous video standards, H.264 allows B frames to be used as references [24]. In such a case, if the decoder wants to exploit frame-level parallelism, the encoder cannot use B frames as references. This might increase the bit-rate but, more importantly, encoding and decoding are usually completely separate, and there is no way for a decoder to impose its preferences on the encoder.

... encoder can decide that the deblocking filter has to be applied across slice boundaries. This greatly reduces the speedup achieved by slice-level parallelism. Another problem is load balancing: slices are created with the same number of MBs, which can result in an imbalance at the decoder because some slices are decoded faster than others, depending on the content of the slice.

**1.3.7 Macroblock-level parallelism**

There are two ways of exploiting MB-level parallelism: in the spatial domain and/or in the temporal domain. In the spatial domain, MB-level parallelism can be exploited if all the intra-frame dependencies are satisfied. In the temporal domain, MB-level parallelism can be exploited if, in addition to the intra-frame dependencies, the inter-frame dependencies are satisfied.

**1.3.8 Macroblock-level parallelism in the spatial domain**

Usually, MBs in a slice are processed in scan order, which means starting from the top-left corner of the frame and moving to the right, row after row. To exploit parallelism between MBs inside a frame, it is necessary to take into account the dependencies between them. In H.264, motion vector prediction, intra prediction, and the deblocking filter use data from neighboring MBs, defining a structured set of dependencies. These dependencies are shown in Fig. 14.

Fig. 14. 2D-Wave approach for exploiting MB parallelism in the spatial domain. The *arrows* indicate dependencies.

MBs can be processed out of scan order provided these dependencies are satisfied. Processing MBs in a diagonal wavefront manner satisfies all the dependencies and at the same time allows parallelism between MBs to be exploited. We refer to this parallelization technique as 2D-Wave.

Fig. 14 depicts an example for an image of 5×5 MBs (80×80 pixels). At time slot T7, three independent MBs can be processed: MB (4,1), MB (2,2) and MB (0,3). The figure also shows
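The 2D-Wave ordering for the 5×5 MB example of Fig. 14 can be reproduced with a small Python sketch. It is a minimal illustration under stated assumptions, not a decoder: the name `wavefront_slots` is hypothetical, each MB is assumed to take exactly one time slot, and each MB is assumed to depend on its left, upper-left, upper, and upper-right neighbors, a choice that is consistent with the time slot T7 example.

```python
from collections import defaultdict

# Assumed neighbor offsets (dx, dy) that an MB depends on:
# left, upper-left, upper, upper-right.
DEPS = [(-1, 0), (-1, -1), (0, -1), (1, -1)]


def wavefront_slots(width_mbs, height_mbs):
    """Map each 1-based time slot to the MBs (x, y) that can be processed in it."""
    slot_of = {}
    # Visiting MBs in scan order guarantees every dependency is already scheduled.
    for y in range(height_mbs):
        for x in range(width_mbs):
            latest_dep = 0
            for dx, dy in DEPS:
                nx, ny = x + dx, y + dy
                if 0 <= nx < width_mbs and 0 <= ny < height_mbs:
                    latest_dep = max(latest_dep, slot_of[(nx, ny)])
            # An MB can start one slot after its latest dependency finishes.
            slot_of[(x, y)] = latest_dep + 1

    slots = defaultdict(list)
    for mb, t in slot_of.items():
        slots[t].append(mb)
    return dict(slots)


if __name__ == "__main__":
    slots = wavefront_slots(5, 5)
    for t in sorted(slots):
        print(f"T{t}: {slots[t]}")
```

Under these assumptions MB (x, y) lands in time slot x + 2y + 1, so slot T7 holds MB (4,1), MB (2,2) and MB (0,3), and the 5×5 frame completes in 13 slots instead of the 25 needed in scan order.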
