**3.1 Parallel Wyner-Ziv**

The DVC framework shifts complexity from the encoder to the decoder. Even so, it is desirable to reduce decoder complexity as much as possible. In traditional feedback-based WZ architectures (Aaron et al., 2002), rate control is performed at the decoder through the feedback channel; this is the main source of decoder complexity, because each time a parity chunk arrives at the decoder, the turbo decoding algorithm, one of the most computationally demanding tasks (Brites et al., 2008), is invoked. For this reason, several approaches try to reduce decoder complexity, usually at the cost of a rate-distortion penalty. However, thanks to technological advances, parallel hardware is beginning to be introduced into practical video coding solutions. This hardware challenges the research community to integrate its algorithms into a parallel framework, and it opens a new door in multimedia research. Several parallel approaches have been proposed for traditional standards since multicore processors appeared on the market, but this chapter focuses on parallel computing applied to the WZ framework.

In 2010, several parallel solutions for WZ were proposed. In particular, Oh et al. proposed a parallel WZ execution carried out by Graphics Processing Units (GPUs) (Oh et al., 2010). In this proposal, the authors design a parallel distribution for a Slepian-Wolf decoder based on rate-adaptive Low-Density Parity-Check (LDPC) codes with an accumulator (LDPCA). LDPC codes are composed of many bit nodes with few dependencies between them, so the authors propose a parallel execution in three kinds of kernels (steps): i) kernels for check node calculations, ii) kernels for bit node calculations, and iii) kernels for termination condition calculations. On a GPU they achieve decoding that is 4 to 5 times faster for QCIF and 15 to 20 times faster for CIF. On the other hand, Momcilovic et al. proposed parallel WZ LDPC decoding based on multicore processors (Momcilovic et al., 2010). In this work, the authors parallelize several LDPC approaches and, on a quad-core machine, achieve a speedup of about 3.5. Both approaches exploit low-level parallelism for a particular LDPC/LDPCA implementation.

This chapter presents a WZ to H.264/AVC transcoder which includes a higher-level parallel WZ video decoding algorithm implemented on a multicore system. The reference WZ decoding algorithm is adapted to a multicore architecture, which divides each frame into several slices and distributes the work among the available cores. In addition, the proposed algorithm is scalable because it does not depend on the hardware architecture, the number of cores or even the implementation of the internal Wyner-Ziv decoder. Therefore, the time reduction can be increased simply by increasing the number of cores as technology advances. Furthermore, the proposed method can be applied to WZ architectures with or without a feedback channel (Sheng et al., 2010).

**3.2 WZ to H.26x transcoding**

Nowadays, mobile-to-mobile video communications are becoming more and more common. Transcoding from a low-cost encoder format to a low-cost decoder format provides a practical solution for this type of communication. Although H.264/AVC has been included in multiple transcoding architectures from other coding formats (such as MPEG-2 to H.264/AVC (Fernandez-Escribano et al., 2007, 2008) or even homogeneous H.264/AVC (De Cock et al., 2010)), proposals for WZ to H.26x transcoding to support mobile communications are rather recent and there are only a few approaches so far.

In 2008, the first WZ transcoder architecture was introduced by Peixoto et al. (Peixoto et al., 2010). In this work, they presented a WZ to H.263 transcoder for mobile video communications. However, H.263 offers lower performance than codecs based on H.264/AVC. Moreover, the authors did not fully exploit the correlation between the WZ motion vectors (MVs) and traditional motion estimation (ME): the MVs were used only to determine the starting centre of the ME process.

In our previous work, we proposed the first transcoding architecture from WZ to H.264/AVC (Martínez et al., 2009). This work introduced an improvement to accelerate the H.264/AVC ME stage using the MVs gathered in the WZ decoding stage. Nevertheless, this transcoder is not flexible, since it only applies the ME improvement when transcoding from WZ frames to P frames. In addition, it only allows transcoding from WZ GOPs of length 2 to IPIP H.264/AVC GOP patterns, so the supported patterns are neither practical, because of the high bit rate they generate, nor flexible. Furthermore, this work used a less realistic WZ implementation. For this reason, the approach presented in this chapter improves this part by introducing a better and more realistic WZ implementation based on the VISNET-II codec (Ascenso et al., 2010), which implements lossy key frame coding and online correlation noise modeling, and uses a more realistic procedure at the decoder for the stopping criterion.

**4. Transcoding for mobile to mobile communications**

**4.1 Introduction**

The main task of a transcoder is to convert one source coding format into another. In the case of mobile video communications, the transcoding process should be performed as fast as possible. In addition, a flexible transcoder should take into account the conversion between the input and the output patterns. To provide flexible and fast transcoding, we propose the architecture displayed in Figure 3.

This architecture is composed of a Wyner-Ziv decoder and an H.264/AVC encoder with several modifications or extra modules. In particular, the WZ decoder is redesigned to parallelize the decoding process, and the black modules in Figure 3 have been added or modified to obtain faster H.264/AVC encoding. Details are given in the following subsections.

**4.2 Parallelization of WZ decoding**

WZ video coding concentrates most of the complexity on the decoder side. A study of each module in the decoder scheme (Figure 1) shows that most of this complexity is concentrated in the Channel Decoder module (Brites et al., 2008). This module receives successive chunks of parity bits. The quantized symbol stream associated with each bitplane is then obtained in an iterative process, which is based on the residual statistics calculated by the CNM. This procedure stops when a probability-based condition is satisfied. Consequently, the complexity of the decoder increases as more bitplanes (in the pixel domain) or coefficient bands (in the transform domain) are decoded. As a first stage of the transcoding process, we therefore propose a WZ decoding architecture which distributes the decoding complexity across several processing units. The proposed architecture is shown in Figure 3. It is a flexible and scalable architecture which distributes the parallel decoding between two parallelism levels: GOPs and frames. First, the input bitstream composed of K frames is stored in a K-frame buffer. Then, at the first parallelism level, the WZ frames inside two K frames delimit a GOP structure, and therefore each GOP
