or JPEG-2000. Only JPEG-2000 supported more extreme intraframe compression, and highly compressed renditions were produced from all of the parent clips. As with the interframe comparisons, there were systematic differences across the clips, as expected, but the effects of the codecs and bitrates were consistent. The analysis of covariance confirms these statistical effects (Table 4). When modeled as a covariate, the effect of bitrate dominates. The effect due to codec is modest, but still significant. As expected, there is a significant main effect due to scene, but no scene-by-codec interaction.

| Source | Deg. of Freedom | F-statistic | Significance |
|---|---|---|---|
| Intercept | 1 | 0.9 | 0.0038 |
| BitRate | 1 | 24.4 | 0.0078 |
| Codec | 3 | 5.6 | 0.001 |
| Scene | 4 | 26.2 | 0.00001 |
| Codec \* Scene | 12 | 0.5 | 0.84 |

Table 4. Analysis of Variance for Interframe Comparisons (significance threshold: p < 0.025)

**4. Computational measures and performance assessment**

In the previous section, analyst assessments of image quality were characterized. This section identifies computational attributes of image quality that can be extracted from video clips. Performance measures evaluate the computational image quality metrics and provide an understanding of how well they track codec, bitrate, and scene parameters.

#### **4.1 Computational image metrics**

We reviewed a variety of image metrics to quantify image quality (Bhat *et al.* 2010; Chikkerur *et al.* 2011; Culibrk *et al.* 2011; Huang 2011; Sohn *et al.* 2010). Based on a review of the literature and an assessment of the properties of these metrics, we selected four measures for this study: two edge-based metrics, the structural similarity image metric (SSIM), and SNR. SSIM and the edge metrics are computed at each pixel location, so the result can be viewed as an image (Fig. 6). SNR metrics deal with overall information content and cannot be visualized as an image. These metrics were computed for the original (uncompressed) clips and for all of the compressed products. We present the computation methods and the results below.

The color information was transposed into panchromatic (intensity) using either an HSI transformation or luminance. Intensity was computed using (2):

$$I = \frac{R+G+B}{3} \tag{2}$$

Luminance was computed using (3):

$$Y = 0.299R + 0.587G + 0.114B \tag{3}$$
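The conversions in (2) and (3) can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' code; the function names are ours:

```python
import numpy as np

def to_intensity(rgb):
    """HSI intensity per Eq. (2): the average of the R, G, B planes."""
    rgb = rgb.astype(np.float64)
    return (rgb[..., 0] + rgb[..., 1] + rgb[..., 2]) / 3.0

def to_luminance(rgb):
    """Luminance per Eq. (3) (the ITU-R BT.601 weights)."""
    rgb = rgb.astype(np.float64)
    return 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
```

Either function maps an H×W×3 color frame to a single H×W panchromatic plane on which the metrics below operate.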

Fig. 6. (a) Original, (b) compressed version, (c, d) original edge images, (e) edge images displayed together, where red, blue, and magenta are from the original, the compressed, and both edge images, respectively, (f) edge intensities, and (g) the SSIM image, where darker areas represent more noticeable differences. *Imagery extracted from the VIVID Public Release Data set provided by Air Force Research Laboratory.*

Quantifying Interpretability Loss due to Image Compression 47

#### **4.1.1 SSIM**

The first metric for image quality is the Structural Similarity Image Metric (Wang *et al.* 2004). SSIM quantifies differences between two images, I1 and I2, by taking three variables into consideration: luminance, contrast, and spatial similarity. For grey level images, those variables are measured as the mean, standard deviation, and Pearson's correlation coefficient between the two images, respectively. For our application, the RGB data was converted to grey level using the standard Matlab function. Let:

$$\begin{aligned} \mu\_1 &= mean(I\_1), \quad \mu\_2 = mean(I\_2) \\ \sigma\_1 &= \text{standard deviation}(I\_1), \quad \sigma\_2 = \text{standard deviation}(I\_2) \\ \sigma\_{12} &= \frac{1}{N-1} \sum\_{i,j} (I\_1(i,j) - \mu\_1)(I\_2(i,j) - \mu\_2) : \text{covariance} \end{aligned}$$

then

$$\text{SSIM}(I\_1, I\_2) = \frac{2\,\mu\_1\mu\_2}{\mu\_1^2 + \mu\_2^2} \ast \frac{2\,\sigma\_1\sigma\_2}{\sigma\_1^2 + \sigma\_2^2} \ast \frac{\sigma\_{12}}{\sigma\_1\sigma\_2} \tag{4}$$

Equation 4 is modified to avoid singularities, e.g., when both means are 0. SSIM is computed locally on each corresponding M×M sub-image of I1 and I2. In practice, the sub-image window size is 11×11, implemented as a convolution filter. The SSIM value is the average across the entire image.
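A simplified, whole-image version of (4) can be written directly from the definitions above. This sketch (ours, not the authors') ignores the stability constants and the 11×11 windowing, computing the three terms once over the full frame:

```python
import numpy as np

def ssim_global(i1, i2, eps=1e-12):
    """Whole-image SSIM per Eq. (4): luminance * contrast * structure.
    The windowed version applies the same product over each 11x11 window
    and averages the results; eps guards the singular cases."""
    i1 = i1.astype(np.float64)
    i2 = i2.astype(np.float64)
    mu1, mu2 = i1.mean(), i2.mean()
    s1, s2 = i1.std(ddof=1), i2.std(ddof=1)
    s12 = ((i1 - mu1) * (i2 - mu2)).sum() / (i1.size - 1)  # covariance
    lum = 2.0 * mu1 * mu2 / (mu1**2 + mu2**2 + eps)
    con = 2.0 * s1 * s2 / (s1**2 + s2**2 + eps)
    struct = s12 / (s1 * s2 + eps)  # Pearson correlation coefficient
    return lum * con * struct
```

Identical images score 1; any mean, contrast, or structural difference pulls the product below 1.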

#### **4.1.2 Edge metrics**

Two edge metrics were examined. The first is denoted CE for Common Edges and the second is denoted SE for Strength of Edges (O'Brien *et al.* 2007). Heuristically, CE measures the ratio of the number of edges in a compressed image to the number of edges in the original, whereas SE measures the ratio of the strength of the edges in a compressed version to the strength of the edges in the original.

Given two images I1 and I2, CE(I1, I2) and SE(I1, I2) are computed as follows. From the grey level images, edge images are constructed using the Canny edge operator. The edge images are designated E1 and E2. Assume that the values in E1 and E2 are 1 for an edge pixel and 0 otherwise. Let "\*" denote the pixel-wise product. Let G1 and G2 denote the gradient images of I1 and I2, respectively. G(m,n) was approximated as the maximum absolute value of the set {I(m,n) - I(m+t1,n + t2) | -6 < t1 < 6 and -6 < t2 < 6}, i.e. the maximum difference between the center value and all values in a 5x5 neighborhood around it. With that notation,

$$\text{CE}(I\_1, I\_2) = \frac{2 \ast \sum (E\_1 \ast E\_2)}{\sum E\_1 + \sum E\_2} \tag{5}$$

where the sum is taken over all the pixels within a given frame.

$$SE(I\_1, I\_2) = \frac{\sum (E\_2 \, \ast \, G\_2)}{\sum (E\_1 \, \ast \, G\_1)} \tag{6}$$

where the sum is taken over all the pixels within a given frame.
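A sketch of CE and SE in NumPy, under stated assumptions: a thresholded local-max-difference gradient stands in for the Canny operator (to stay dependency-free), and the neighborhood radius `r` is a parameter, since the text's inequalities (−6 < t1 < 6) and its "5x5 neighborhood" gloss disagree; `r=2` gives the 5×5 case:

```python
import numpy as np

def grad_local_max_diff(img, r=2):
    """G(m,n): max |I(m,n) - I(m+t1, n+t2)| over a (2r+1)x(2r+1) neighborhood."""
    img = img.astype(np.float64)
    pad = np.pad(img, r, mode="edge")
    g = np.zeros_like(img)
    for t1 in range(-r, r + 1):
        for t2 in range(-r, r + 1):
            shifted = pad[r + t1 : r + t1 + img.shape[0],
                          r + t2 : r + t2 + img.shape[1]]
            g = np.maximum(g, np.abs(img - shifted))
    return g

def edge_map(img, frac=0.5):
    """Binary edge image; a simple stand-in for the Canny operator."""
    g = grad_local_max_diff(img)
    return (g >= frac * g.max()).astype(np.float64)

def ce(i1, i2):
    """Common Edges, Eq. (5): Dice-style overlap of the two edge maps."""
    e1, e2 = edge_map(i1), edge_map(i2)
    return 2.0 * (e1 * e2).sum() / (e1.sum() + e2.sum())

def se(i1, i2):
    """Strength of Edges, Eq. (6): ratio of edge-weighted gradient energy."""
    e1, e2 = edge_map(i1), edge_map(i2)
    g1, g2 = grad_local_max_diff(i1), grad_local_max_diff(i2)
    return (e2 * g2).sum() / (e1 * g1).sum()
```

Both ratios equal 1 for identical frames; compression that erases or weakens edges drives them below 1.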

An additional set of edge operators was also applied. These operators are called edge strength (ES) metrics. Let Y1 be the luminance component of an original frame from a clip and let Y2 be the corresponding frame after compression processing, also in luminance. We apply a Sobel filter, S, to both Y1 and Y2, where for a grayscale frame F:

$$S(F) = \sqrt{(H \ast F)^2 + (V \ast F)^2} \tag{7}$$

The filters H and V used in the Sobel edge detector are:

$$H = \begin{pmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{pmatrix} \tag{7.1}$$

$$V = \boldsymbol{H}^T \tag{7.2}$$

We define two metrics, one for local loss of edge energy (EL) (thus finding blurred edges from Y1 in Y2) and the other for the addition of edge energy (thus finding edges added to Y2 that are weaker in Y1). Each metric examines the strongest edges in one image (either Y1 or Y2) and compares them to the edges at the corresponding pixels in the other (Y2 or Y1).

For the grayscale image F, let I(F,f) be the set of image pixels, p, where F (p) is at least as large as f \* max(F). That is:

$$I(F, f) = \left\{ p \in \text{Pixels}(F) : F(p) \ge f \ast \max(F) \right\} \tag{8}$$

Using the definition of I(F,f), the two edge metrics are:

$$\text{BlurIndex} = \frac{mean(S(Y\_2))}{mean(S(Y\_1))} \tag{9}$$

where the means are taken over the set I(S(Y1), 0.99).

$$\text{AddedEdgeEnergy} = \frac{mean(S(Y\_1))}{mean(S(Y\_2))} \tag{10}$$

where the means are taken over the set I(S(Y2), 0.99).
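Equations (7)–(10) translate almost directly into NumPy. This is our sketch under the definitions above (a hand-rolled 3×3 filter avoids a SciPy dependency; squaring makes the kernel-flip distinction between correlation and convolution irrelevant here):

```python
import numpy as np

H = np.array([[-1, 0, 1],
              [-2, 0, 2],
              [-1, 0, 1]], dtype=np.float64)  # Eq. (7.1)
V = H.T                                       # Eq. (7.2)

def filt3(img, k):
    """3x3 'same'-size filtering with edge padding."""
    pad = np.pad(img.astype(np.float64), 1, mode="edge")
    out = np.zeros(img.shape, dtype=np.float64)
    for i in range(3):
        for j in range(3):
            out += k[i, j] * pad[i : i + img.shape[0], j : j + img.shape[1]]
    return out

def sobel_strength(img):
    """S(F) per Eq. (7)."""
    return np.sqrt(filt3(img, H) ** 2 + filt3(img, V) ** 2)

def strongest_mask(s, f=0.99):
    """I(F, f) per Eq. (8): pixels within f of the maximum strength."""
    return s >= f * s.max()

def blur_index(y1, y2):
    """Eq. (9): means taken over the strongest edges of S(Y1)."""
    s1, s2 = sobel_strength(y1), sobel_strength(y2)
    m = strongest_mask(s1)
    return s2[m].mean() / s1[m].mean()

def added_edge_energy(y1, y2):
    """Eq. (10): means taken over the strongest edges of S(Y2)."""
    s1, s2 = sobel_strength(y1), sobel_strength(y2)
    m = strongest_mask(s2)
    return s1[m].mean() / s2[m].mean()
```

For an uncompressed pair both ratios are 1; blurring the compressed frame lowers BlurIndex, while compression artifacts that introduce new edges lower AddedEdgeEnergy.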

#### **4.1.3 SNR**

Finally, we examined the peak signal to noise ratio (PSNR). The PSNR is defined for a pair of m×n luminance images, Y1 and Y2. Let MSE be defined by,

$$MSE = \frac{1}{mn} \sum\_{i=0}^{m-1} \sum\_{j=0}^{n-1} \left\| Y\_1(i,j) - Y\_2(i,j) \right\|^2 \tag{11}$$

The PSNR is defined as:



$$PSNR = 10\log\_{10}\frac{MAX\_I^2}{MSE} = 20\log\_{10}\frac{MAX\_I}{\sqrt{MSE}}\tag{12}$$

where MAXI is maximum pixel value of the image. In our case, MAXI is taken to be 255.
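Equations (11) and (12) combine into a short function; a sketch (ours), with the identical-frame case returned as infinity since the MSE is then zero:

```python
import numpy as np

def psnr(y1, y2, max_i=255.0):
    """PSNR per Eq. (12), with MSE per Eq. (11).
    Returns inf for identical frames (MSE = 0)."""
    y1 = y1.astype(np.float64)
    y2 = y2.astype(np.float64)
    mse = np.mean((y1 - y2) ** 2)
    if mse == 0.0:
        return float("inf")
    return 10.0 * np.log10(max_i**2 / mse)
```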

#### **4.2 Metrics and performance**

The image metrics were plotted (Fig. 7). The image metrics are all highly correlated across both bitrate and codec, for both intraframe and interframe compression techniques. For the set of clips with a key frame interval of 300 frames, the correlation was greater than 0.9. In each case, lower information content (quality) is indicated by a lower position on the Y axis; the X axis is the target bitrate. Due to the high correlation, a single computational metric was chosen for more detailed analysis to quantify the relationship between image quality and bitrate. SSIM was selected because it generates an image that can be used to diagnose unexpected values, and because its computation is based upon perceptual differences in spatial similarity and contrast.

Fig. 7. Target Bitrate (k bps) versus Image Metric: SSIM, EL, ES, and PSNR

Fig. 7 indicates that SSIM, CE, and SE each separate image quality by bitrate. H.264's asymptotic quality improvement is observed in the rise of the graph over the initial frames (Fig. 7). This corresponds to exactly where the algorithm is increasing the fidelity of


the compressed frames to the original frames. Along this initial portion of the clip the metrics agree with human perception of the image quality increasing.

Fig. 8 plots SSIM versus frame at differing bitrates for the H.264 codec, which is an interframe codec. The saw-tooth nature of the graph is the result of the group of pictures (GOP) sequence. The peak and trough differences are between bilinear interpolation between key frames (B) and predicted (P) encoded frames.

The observations for the metrics listed above for H.264 were also visually evident in the case of MPEG compression. Close inspection of the clips shows the quality to be lower in the case of MPEG than for H.264. The example in Fig. 9 is taken from a clip that was compressed to 2 Mbits/second using both codecs. While discernable in both the original and the H.264 compressed versions, some of the individuals' heads seem to be nearly totally lost in the MPEG version.

Fig. 8. Plot of the SSIM evaluated on each frame for 11 different bit rates. Each clip was compressed using H.264 with a key frame every 300 frames.

#### **5. Discussion**

These experiments demonstrate the existence of several metrics that are monotonic with bitrate. The metrics showed considerable sensitivity to image quality that matched the authors' observations. Specifically, MPEG quality was considerably lower than H.264 quality at the same bitrate. The knee of the quality curves lies between 500k and 1000k bps. In addition, the metrics were sensitive to the encoded structure of the individual frames, as the saw-tooth differences between the B and P frames were readily observable.


Fig. 9. (a) Original frame, and compressed versions using (b) H.264 and (c) MPEG.

A qualitative comparison of the objective metrics to the user assessment of interpretability shows strong consistency. Compression of these video products to bitrates below 1,000k bps yields discernible losses in image interpretability. The objective metrics show a similar knee in the curve. These data suggest that one could estimate the loss in interpretability from compression using the objective metrics and derive a prediction of the loss in Video NIIRS. Development of such a model would require conducting a second user experiment to establish the relationship between the subjective interpretability scale used in this study and the published Video NIIRS. The additional data from such an experiment would also support validation of a model for predicting loss due to compression. Such data is needed to validate a model-based relationship that could predict Video NIIRS loss due to compression using the objective image metrics presented here.

#### **7. References**

Abomhara, M., Khalifa, O.O., Zakaria, O., Zaidan, A.A., Zaidan, B.B., and Rame, A. 2010. Video Compression Techniques: An Overview, *Journal of Applied Sciences,* 10(16): 1834-1840.

Baily, H.H. 1972. Target Acquisition Through Visual Recognition: An Early Model, *Target Acquisition Symposium,* Orlando, 14-16 November.

Bhat, A., Richardson, I., and Kannangara, S. 2010. A New Perceptual Quality Metric for Compressed Video Based on Mean Squared Error, *Signal Processing: Image Communication,* 25(7): 588-596.

Cermak, G., Pinson, M., and Wolf, S. 2011. The Relationship Among Video Quality, Screen Resolution, and Bit Rate, *IEEE Transactions on Broadcasting,* 57(2): 258-262.

Chikkerur, S., Sundaram, V., Reisslein, M., and Karam, L.J. 2011. Objective Video Quality Assessment Methods: A Classification, Review, and Performance Comparison, *IEEE Transactions on Broadcasting,* 57(2): 165-182.

Culibrk, D., Mirkovic, M., Zlokolica, V., Pokric, M., Crnojevic, V., and Kukolj, D. 2011. Salient Motion Features for Video Quality Assessment, *IEEE Transactions on Image Processing,* 20(4): 948-958.

Driggers, R.G., Cox, P.G., and Kelley, M. 1997. National Imagery Interpretation Rating System and the Probabilities of Detection, Recognition, and Identification, *Optical Engineering,* 36(7): 1952-1959.

Driggers, R.G., Cox, P.G., Leachtenauer, J., Vollmerhausen, R., and Scribner, D.A. 1998. Targeting and Intelligence Electro-Optical Recognition and Modeling: A Juxtaposition of the Probabilities of Discrimination and the General Image Quality Equation, *Optical Engineering,* 37(3): 789-797.

Fenimore, C., Irvine, J.M., Cannon, D., Roberts, J., Aviles, I., Israel, S.A., Brennan, M., Simon, L., Miller, J., Haverkamp, D., Tighe, P.F., and Gross, M. 2006. Perceptual Study of the Impact of Varying Frame Rate on Motion Imagery Interpretability, *Human Vision and Electronic Imaging XI,* SPIE, San Jose, 16-19 January, 6057: 248-256.

Gibson, L., Irvine, J.M., O'Brien, G., Schroeder, S., Bozell, A., Israel, S.A., and Jaeger, L. 2006. User Evaluation of Differential Compression for Motion Imagery, *Defense and Security Symposium 2006,* SPIE, Orlando, 17-21 April, 6209: paper number 6209-03.

Gualdi, G., Prati, A., and Cucchiara, R. 2008. Video Streaming for Mobile Video Surveillance, *IEEE Transactions on Multimedia,* 10(6): 1142-1154.

Hands, D.S. 2004. A Basic Multimedia Quality Metric, *IEEE Transactions on Multimedia,* 6(6): 806-816.

He, Z. and Xiong, H. 2006. Transmission Distortion Analysis for Real-Time Video Encoding and Streaming Over Wireless Networks, *IEEE Transactions on Circuits and Systems for Video Technology,* 16(9): 1051-1062.

Hewage, C.T.E.R., Worrall, S.T., Dogan, S., Villette, S., and Kondoz, A.M. 2009. Quality Evaluation of Color Plus Depth Map-Based Stereoscopic Video, *IEEE Journal of Selected Topics in Signal Processing,* 3(2): 304-318.
