**5.1 GMC and LMC based video coding**

The aim of this part is to illustrate how video compression performance can be improved by adaptive GMC/LMC mode determination. The GMC/LMC-based motion compensation mode selection approach in MPEG-4 is given in [1], [2]. Global motion estimation and compensation is used in the MPEG-4 Advanced Simple Profile (ASP) to remove the residual information of global motion. Global motion compensation (GMC) is a coding technology for video compression in the MPEG-4 standard: by extracting the camera motion, an MPEG-4 coder can remove the global-motion redundancy from the video. In MPEG-4 ASP, each macroblock (MB) can be selected adaptively during mode determination to be coded using either GMC or local motion compensation (LMC). Intuitively, some types of motion, e.g., panning, zooming or rotation, can be described using one set of motion parameters for the entire VOP (video object plane); for example, during a pan each MB could have exactly the same MV. GMC allows the encoder to pass one set of motion parameters in the VOP header to describe the motion of all MBs. Additionally, MPEG-4 allows each MB to specify its own MV to be used in place of the global MV.

In the MPEG-4 Advanced Simple Profile, the main target of Global Motion Compensation (GMC) is to encode the global motion in a VOP (video object plane) using a small number of parameters. Each MB can be predicted from the previous VOP either by global motion compensation (GMC) using warping parameters or by local motion compensation (LMC) using local motion vectors, as in the classical scheme. The selection is made based on which predictor leads to the lower prediction error. In this section we describe only the GMC/LMC mode selection approach; a more detailed description of the INTER4V/INTER/field-prediction, GMC/LMC and INTRA/INTER decisions can be found in Section 18.8.2, "GMC prediction and MB type selection", of [2]. The pseudo-code of the GMC/LMC mode decision in MPEG-4 ASP is as follows:

if (*SAD*\_GMC − *P* < *SAD*\_LMC) then GMC else LMC

Global Motion Estimation and Its Applications 91


where *SAD*\_GMC is defined as the sum of absolute differences (SAD) of the luminance block when using GMC prediction, and *SAD*\_LMC and *P* are defined as follows. If the previous criterion was INTER4V:

$$SAD\_{LMC} = SAD\_8, \qquad P = N\_B \cdot Qp / 16$$

if the previous criterion was INTER:

$$SAD\_{LMC} = SAD\_{16}, \qquad P = N\_B \cdot Qp / 64$$

if the previous criterion was INTER and the motion vector was (0,0):

$$SAD\_{LMC} = SAD\_{16} + (N\_B/2 + 1), \qquad P = N\_B/2 + 1$$

if the previous criterion was field prediction:

$$SAD\_{LMC} = SAD\_{16\times 8}, \qquad P = N\_B \cdot Qp / 32$$

where *SAD*\_8 (the SAD of the four 8×8 luminance blocks when the INTER4V mode is selected), *SAD*\_16 (the SAD of the 16×16 luminance block when the INTER mode is selected) and *SAD*\_16×8 (the SAD of the two 16×8 interlaced luminance blocks when the field-prediction mode is selected) are computed with half-pixel motion vectors, *N\_B* indicates the number of pixels inside the VOP, and *Qp* is the quantization parameter.
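The mode decision above can be sketched as follows. This is a hedged illustration of the decision rule only, not reference MPEG-4 code; the function name `gmc_lmc_decision` and the `sads` dictionary layout are our assumptions, and the SAD values and `qp` would come from the encoder's motion-estimation stage.

```python
def gmc_lmc_decision(sad_gmc, prev_criterion, sads, n_b, qp, zero_mv=False):
    """Choose 'GMC' or 'LMC' for a macroblock, per the MPEG-4 ASP rule above.

    sads: dict with keys 'sad8', 'sad16', 'sad16x8' (illustrative layout),
    n_b:  number of pixels inside the VOP, qp: quantization parameter.
    """
    if prev_criterion == "INTER4V":
        sad_lmc, p = sads["sad8"], n_b * qp / 16
    elif prev_criterion == "INTER" and zero_mv:
        # zero-MV INTER case: biased SAD and a Qp-independent penalty
        sad_lmc, p = sads["sad16"] + (n_b // 2 + 1), n_b // 2 + 1
    elif prev_criterion == "INTER":
        sad_lmc, p = sads["sad16"], n_b * qp / 64
    elif prev_criterion == "FIELD":
        sad_lmc, p = sads["sad16x8"], n_b * qp / 32
    else:
        raise ValueError(f"unknown criterion: {prev_criterion}")
    # the GMC predictor wins when its penalized SAD is strictly lower
    return "GMC" if sad_gmc - p < sad_lmc else "LMC"
```

Note that *P* acts as a bias toward GMC: at higher Qp (coarser quantization), a larger GMC prediction error is tolerated before falling back to LMC.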

#### **5.2 Global motion based shot classification for sport video**

In [13], Xu et al. classified soccer video shots into global, zoom-in and close-up views. From the view sequences, each soccer video clip is classified into either a play or a break. In [14], Duan et al. classified video shots into eight categories by fusing global motion pattern, color, texture, shape and shot-length information in a supervised learning framework. Ekin and Tekalp utilized shot-category and shot-duration information to carry out play-break detection according to domain-related rules and soccer-video production knowledge [16]. Similarly, Li et al. classified video shots into event and non-event by identifying the canonical scenes and the camera breaks [15]. Tan et al. also segmented a basketball sequence into wide-angle, close-up, fast-break and possible shoot-at-the-basket using motion information [17].

In soccer video, global views give audiences an overall view of the game, while close-up and medium views, complementary to global views, show certain details of the game. Typically, cameramen operate the cameras with fast tracking or zooming-in to provide audiences with clearer views of the game. Based on the view type, camera-motion patterns and domain-related knowledge, high-level semantics can be inferred, and the classified shot-category information is helpful for discriminating highlight events. In [18], global views of soccer video are further refined into three types, stationary, zoom and track, in terms of camera-motion information, using a set of empirical rules based on domain and production knowledge. The key-frames of a shot are classified by means of the average motion intensity and the average global motion intensity. The local motion information is represented by the average motion intensity (*AMV*), which is expressed as follows

$$AMV = \frac{1}{M} \sum\_{j=1}^{M} \sqrt{MVx\_j^2 + MVy\_j^2} \tag{12}$$

where (*MVx\_j*, *MVy\_j*) is the motion vector (MV) of the block at coordinates (*x\_j*, *y\_j*) and *j* is the block index. The average global motion intensity (*AGMV*) is calculated as follows:

$$AGMV = \frac{1}{M} \sum\_{j=1}^{M} \sqrt{GMVx\_j^2 + GMVy\_j^2} \tag{13}$$

where (*GMVx\_j*, *GMVy\_j*) is the global motion vector of the block at (*x\_j*, *y\_j*) and *M* is the total number of blocks. The global motion vector (*GMVx\_t*, *GMVy\_t*) at the coordinates (*x\_t*, *y\_t*) is determined as follows

$$\begin{cases} GMVx\_t = x'\_t - x\_t \\ GMVy\_t = y'\_t - y\_t \end{cases} \tag{14}$$

where (*x′\_t*, *y′\_t*) are the warped coordinates in the reference frame obtained by applying the global motion parameters to the coordinate (*x\_t*, *y\_t*).
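Eqs. (12)–(14) can be sketched together as follows. This is a hedged illustration: a six-parameter affine model `m = (m0..m5)` is assumed for the global-motion warp, and the parameter layout and function names are ours.

```python
import math

def warp(x, y, m):
    """Warp (x, y) with assumed affine global-motion parameters m = (m0..m5)."""
    m0, m1, m2, m3, m4, m5 = m
    return m0 * x + m1 * y + m2, m3 * x + m4 * y + m5

def motion_intensities(local_mvs, coords, m):
    """Return (AMV, AGMV) per Eqs. (12)-(13) for blocks at `coords`.

    local_mvs: list of (MVx_j, MVy_j), coords: list of (x_j, y_j).
    """
    M = len(coords)
    # Eq. (12): mean magnitude of the local motion vectors
    amv = sum(math.hypot(mvx, mvy) for mvx, mvy in local_mvs) / M
    # Eq. (14): GMV of each block is the warp displacement of its coordinates
    gmvs = [(xp - x, yp - y) for (x, y) in coords for (xp, yp) in [warp(x, y, m)]]
    # Eq. (13): mean magnitude of the global motion vectors
    agmv = sum(math.hypot(gx, gy) for gx, gy in gmvs) / M
    return amv, agmv
```

For a pure translation (m0 = m4 = 1, m1 = m3 = 0), every block gets the same GMV (m2, m5), matching the panning intuition of Section 5.1.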

The GM/LM-based global-view refinement is carried out by the following empirical rules [18]: if the motion energy of a frame satisfies *AMV* < 0.5, the frame is stationary; otherwise it is non-stationary. The non-stationary shot is further classified into zoom and track. A frame is a zoom-in if *m*0 = *m*5 > 1, a zoom-out if *m*0 = *m*5 < 1, and otherwise a track (*m*0 = *m*5 = 1). A track is a slow-track if the average global motion intensity satisfies *AGMV* < 2, and a fast-track otherwise.
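The empirical rules above can be sketched as a small classifier. This is a hedged illustration under the stated thresholds (0.5 for *AMV*, 2 for *AGMV*); the function name and the exact handling of the track branch are our assumptions.

```python
def classify_global_view(amv, agmv, m0, m5):
    """Classify a frame per the GM/LM refinement rules of [18] (sketch).

    amv/agmv: average local/global motion intensities, Eqs. (12)-(13);
    m0, m5: scaling parameters of the global-motion model.
    """
    if amv < 0.5:
        return "stationary"          # low motion energy
    if m0 == m5 and m0 > 1:
        return "zoom-in"             # uniform scale-up
    if m0 == m5 and m0 < 1:
        return "zoom-out"            # uniform scale-down
    # otherwise a track; split it by average global motion intensity
    return "slow-track" if agmv < 2 else "fast-track"
```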

#### **5.3 Global motion and local motion (GM/LM) based application in error concealment**

The aim of this sub-chapter is to show how to combine global and local motion to improve the visual quality of corrupted video sequences.

#### **5.3.1 Related work on error concealment (EC)**

Temporal recovery (TR) is often utilized to replace erroneous macroblocks (EMBs) with their spatially corresponding MBs in the reference frames; TR is efficient for stationary video sequences. Temporal average (TA) uses the average or median MV of the correctly received neighboring MBs to substitute for the lost MVs of the corrupted MBs [19]. Boundary pixels of the top- and bottom-, or (and) left- and right-adjacent MBs are used as the references [20], [21]. A recursive block matching (RBM) technique is utilized to recover the erroneous MBs [20]: the correctly received neighboring MBs are utilized, and the recovery results of the corrupted MBs are improved step by step using a full-search technique within a given search range. However, this approach is not effective when the reference blocks are located in texture-similar or smooth regions, since there can be more than one best match for the two 8×16 blocks in the smooth regions of the reference frames.
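The TA recovery step above can be sketched as follows; a minimal illustration assuming a per-component median over the available neighbor MVs (the function name is ours, not from [19]).

```python
from statistics import median

def ta_recover_mv(neighbor_mvs):
    """Estimate a lost MV as the component-wise median of neighbor MVs.

    neighbor_mvs: list of (MVx, MVy) from correctly received neighboring MBs.
    """
    xs = [mv[0] for mv in neighbor_mvs]
    ys = [mv[1] for mv in neighbor_mvs]
    return median(xs), median(ys)
```

The median variant is more robust than the mean when one neighbor MV is an outlier, which is why both are mentioned as options in the TA scheme.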


A global-motion-based error concealment method is proposed by Su *et al.* [3], [4]. In [3], MVs generated from the global motion parameters are utilized to recover the EMBs under the assumption that they are all located in global-motion regions. When the EMBs are in LM or GM/LM-overlapped regions, the MVs generated by the global motion parameters are usually incorrect for recovering the lost MVs.

reference frame using the average MV of the non-corrupted or recovered MBs in its 8 neighbors. The GMV- and LMV-based replacements for the EMB are based on the fact that both global motion and local motion have a certain consistency.

3. Otherwise the EMB is a GLMB. The EMB may contain both global- and local-motion regions, and boundaries between the background and objects usually exist in it. To determine accurate boundaries, complicated boundary-matching algorithms such as RBM and AECOD [21] can be adopted. We use the RBM method to search for the optimal MV to recover the EMB.

#### **5.3.3 Error concealment performance**

Objective error-concealment performances of TR, TA, GM, RBM and GM/LM are given. Fig. 4(a) and (b) show the objective averaged PSNR (peak signal-to-noise ratio) values of the EC methods applied to each P-frame of the test sequences *flower* and *mobile* under a PER (packet error rate) of 15%. From Fig. 4, we find that our GM/LM-based EC method gives comparatively better recovery results.

Fig. 4. EC performance comparison of TR, TA, GM, RBM and GM/LM for the video sequences under PER 15%: (a) *flower*, (b) *mobile*.

To show the subjective recovery results of the TR, TA, GM, RBM and GM/LM based error-concealment approaches, two frames are extracted from the test video sequences with several erroneous slices, as shown in Fig. 5. We find that the recovery results of TR are not very effective, and TA cannot obtain accurate motion information for the MBs in heavy-motion regions. RBM performs well in areas where non-periodical texture appears; however, it is not effective when the reference blocks are in smooth or texture-similar regions, as shown in Fig. 5(b). GM provides better recovery results for the background regions, but large distortions are produced when recovering the EMBs in local-motion regions. Comparatively better performances are achieved by the proposed GM/LM-based EC method.
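The objective comparison in Fig. 4 is based on average PSNR. A minimal sketch of that metric follows, assuming 8-bit luminance samples; the function name and the list-of-rows input format are ours.

```python
import math

def psnr(orig, recon, peak=255.0):
    """PSNR in dB between two equally sized frames (lists of pixel rows)."""
    n = 0
    sse = 0.0
    for row_o, row_r in zip(orig, recon):
        for a, b in zip(row_o, row_r):
            sse += (a - b) ** 2
            n += 1
    if sse == 0:
        return float("inf")  # identical frames: distortion-free recovery
    mse = sse / n
    return 10.0 * math.log10(peak * peak / mse)
```

Per-sequence curves such as those in Fig. 4 would average this value over the luminance of each recovered P-frame against the error-free reference.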
