configuration of objects. This additional information can support the recognition of basic activities, associations among objects, and analysis of complex behavior (Fig. 1).

[Fig. 1: boxes labeled Detect, Recognize, Characterize Objects, Characterize Activities, Associations, and Behaviors, organized along Observed/Derived and Static/Dynamic dimensions.]

Fig. 1. Image Exploitation and Analysis

Fig. 1 presents a hierarchy of target recognition information complexity. The color of each box indicates the ability of the developer community to assess performance and provide confidence measures. The first two boxes on the left exploit information in the sensor phenomenology domain; the two boxes on the right exploit extracted features derived from the sensor data.

To illustrate the concept, consider a security application with a surveillance camera overlooking a bank parking lot. If the bank is robbed, a camera that collects still images might acquire an image depicting the robbers exiting the building and showing several cars in the parking lot. The perpetrators have been detected, but additional information is limited. A video camera might collect a clip showing these people entering a specific vehicle for their getaway. Now both the perpetrators and the vehicle have been identified, because the activity (a getaway) was observed. If the same vehicle is detected on other security cameras throughout the city, analysis of the multiple videos could reveal the pattern of movement and suggest the location of the robbers' base of operations. In this way, an association is formed between the event and specific locations, namely the bank and the robbers' hideout. If the same perpetrators were observed over several bank robberies, one could discern their pattern of behavior, i.e. their *modus operandi*. This information could enable law enforcement to anticipate future events and respond appropriately (Gualdi *et al.* 2008; Porter *et al.* 2010).

**1.2 Image interpretability**

A fundamental premise of the preceding example is that the imagery, whether a still image or a video clip, is of sufficient quality that the appropriate analysis can be performed (Le Meur *et al.* 2010; Seshadrinathan *et al.* 2010; Xia *et al.* 2010). Military applications have led to the development of a set of standards for assessing and quantifying this aspect of the imagery. The National Imagery Interpretability Rating Scale (NIIRS) is a quantification of image interpretability that has been widely applied for intelligence, surveillance, and reconnaissance (ISR) missions (Irvine 2003; Leachtenauer 1996; Maver *et al.* 1995). Each NIIRS level indicates the types of exploitation tasks an image can support, based on the expert judgments of experienced analysts. Development of a NIIRS for a specific imaging modality rests on a perception-based approach. Additional research has verified the relationship between NIIRS and performance of target detection tasks (Baily 1972; Driggers *et al.* 1997; Driggers *et al.* 1998; Lubin 1995). Accurate methods for predicting NIIRS from the sensor parameters and image acquisition conditions have been developed empirically and substantially increase the utility of NIIRS (Leachtenauer *et al.* 1997; Leachtenauer and Driggers 2001).

The NIIRS provides a common framework for discussing the interpretability, or information potential, of imagery, and it serves as a standardized indicator of image interpretability within the community. An image quality equation (IQE) offers a method for predicting the NIIRS of an image based on sensor characteristics and the image acquisition conditions (Leachtenauer *et al.* 1997; Leachtenauer and Driggers 2001). Together, the NIIRS and IQE make it possible both to measure the interpretability of acquired imagery and to predict the interpretability of a planned collection.
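The text of this chapter does not reproduce a specific IQE, but the most widely published example is the General Image Quality Equation (GIQE) of Leachtenauer *et al.* (1997). The sketch below implements the GIQE 4.0 regression as published; the input values in the usage line are illustrative only and are not drawn from this study.

```python
import math

def giqe4_niirs(gsd_inches, rer, overshoot_h, noise_gain_g, snr):
    """Predicted NIIRS from the published GIQE 4.0 regression
    (Leachtenauer et al. 1997). Inputs are geometric-mean values:
    ground sample distance (inches), relative edge response,
    edge overshoot, noise gain, and signal-to-noise ratio."""
    # The published regression switches coefficients at RER = 0.9.
    a, b = (3.32, 1.559) if rer >= 0.9 else (3.16, 2.817)
    return (10.251
            - a * math.log10(gsd_inches)
            + b * math.log10(rer)
            - 0.656 * overshoot_h
            - 0.344 * noise_gain_g / snr)

# Illustrative values only: a sharp, low-noise system at 12-inch GSD.
print(round(giqe4_niirs(12.0, 0.95, 1.0, 1.0, 50.0), 2))  # ~5.97
```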


The foundation for the NIIRS is that trained analysts have consistent and repeatable perceptions about the interpretability of imagery. If more challenging tasks can be performed with a given image, then the image is deemed to be of higher interpretability. A set of standard image exploitation tasks or "criteria" defines the levels of the scale. To illustrate, consider Fig. 2. Several standard NIIRS tasks for visible imagery appear at the right. Note that the tasks for levels 5, 6, and 7 can be performed, but the level 8 task cannot. The grill detailing and/or license plate on the sedan are not evident. Thus, an analyst would assign a NIIRS level 7 to this image.

Fig. 2. Illustration of NIIRS for a still image
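The assignment logic itself is simple: an image receives the highest level whose criterion task can still be performed. A toy sketch follows; only the level 8 grill-detail/license-plate task is taken from the text above, and the other task outcomes are hypothetical analyst judgments.

```python
# Hypothetical analyst judgments for one image, patterned on the Fig. 2
# narrative: the tasks through level 7 can be performed, the level 8
# task cannot.
task_performed = {
    5: True,
    6: True,
    7: True,
    8: False,  # grill detailing / license plate on the sedan not evident
}

# Assign the highest NIIRS level whose criterion task can be performed.
niirs = max(level for level, ok in task_performed.items() if ok)
print(f"Assigned NIIRS: {niirs}")  # -> 7, as in the Fig. 2 example
```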

Recent studies have extended the NIIRS concept to motion imagery (video). In exploring avenues for the development of a NIIRS-like metric for motion imagery, a clearer understanding was needed of the factors that affect the perceived quality of motion imagery (Irvine *et al.* 2006a; Young *et al.* 2010b). Several studies explored specific aspects of this problem, including target motion, camera motion, frame rate, and the nature of the analysis tasks (Hands 2004; Huynh-Thu *et al.* 2011; Moorthy *et al.* 2010). Factors affecting the perceived interpretability of motion imagery include the ground sample distance (GSD) of the imagery, motion of the targets, motion of the camera, frame rate (temporal resolution), viewing geometry, and scene complexity. These factors have been explored and characterized in a series of evaluations with experienced imagery analysts.

Building on these perception studies, a new Video NIIRS was developed (Petitti *et al.* 2009; Young *et al.* 2009). The work presented in this paper quantifies video interpretability using a 100-point scale described in Section 3 (Irvine *et al.* 2007a; Irvine *et al.* 2007b; Irvine *et al.* 2007c). The scale development methodologies imply that each scale is a linear transform of the other, although this relationship has not been validated (Irvine *et al.* 2006a; Irvine *et al.* 2006b). Other methods for measuring video image quality frequently focus on objective functions of the imagery data, rather than perception of the potential utility of the imagery to support specific types of analysis (Watson *et al.* 2001; Watson and Kreslake 2001; Winkler 2001; Winkler *et al.* 2001).
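Because the implied relationship between the Video NIIRS and the 100-point scale is a linear transform, it could in principle be estimated from clips rated on both scales. A minimal least-squares sketch follows, assuming hypothetical paired rating arrays; the chapter reports no such pairing.

```python
import numpy as np

def fit_linear_map(vniirs_ratings, scale100_ratings):
    """Estimate a, b in scale100 ~= a * vniirs + b by least squares.
    Both arguments are 1-D arrays of ratings for the same clips
    (hypothetical data; the relationship is unvalidated)."""
    A = np.column_stack([vniirs_ratings, np.ones_like(vniirs_ratings)])
    (a, b), *_ = np.linalg.lstsq(A, scale100_ratings, rcond=None)
    return a, b
```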

**2.1 Data compression**

The dataset for the study consisted of the original (uncompressed) motion imagery clips and clips compressed by three compression methods at various compression rates (Abomhara *et al.* 2010). The three compression methods were:

- Motion JPEG 2000 – intraframe only
- MPEG-2 – intraframe and interframe
- H.264 – intraframe and interframe

The study used the Kakadu (KDU) implementation of JPEG 2000, the Vanguard Software Solutions, Inc. (VSS) implementation of H.264, and Adobe Premiere's MPEG-2 codec. In each case, a key frame interval of 300 frames was used for interframe compression unless otherwise noted; intraframe encoding is comparable to interframe encoding with a key frame interval of 1.

All three codecs were exercised in intraframe mode. Each of the parent clips was compressed to three megabits per second, representing a modest level of compression. In addition, each parent clip was severely compressed to examine the limits of the codecs; the actual bitrates for these severe cases depend on the individual clip and codec. The choice of compression methods and levels supports two goals: comparison across codecs and comparison of the same compression method at varying bitrates. Table 1 shows the combinations represented in the study. We recorded the actual bit rate for each product and used it as a covariate in the analysis.

| Bitrate | Uncompressed | H.264 (VSS) Intraframe | H.264 (VSS) Interframe | JPEG 2000 (KDU) Intraframe | MPEG-2 (Premiere) Intraframe | MPEG-2 (Premiere) Interframe |
|---------|--------------|------------------------|------------------------|----------------------------|------------------------------|------------------------------|
| Native  | X            |                        |                        |                            |                              |                              |
| 3 MB/sec |             | X                      | X                      | X                          | X                            | X                            |
| Severe  |              | X                      |                        | X                          | X                            |                              |

Table 1. Codecs and Compression Rates. (Note: the severe bitrate represents the limit of the specific codec on a given clip.)
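The products themselves were generated with the commercial tools named above. For readers who want to build a comparable product set with open tools, the sketch below uses ffmpeg with x264 rather than the study's codecs; the file names and clip list are placeholders. A key frame interval of 1 yields intraframe-only encoding, mirroring the equivalence noted above.

```python
import subprocess

CLIPS = ["clip01.mp4"]   # placeholder parent clips (848x480, 30 f/s)
BITRATE = "3000k"        # the study's modest 3 Mb/s operating point

def encode_h264(src, dst, keyint):
    # keyint=1 gives intraframe-only encoding; keyint=300 mirrors the
    # 300-frame key frame interval used for interframe compression.
    subprocess.run(
        ["ffmpeg", "-y", "-i", src,
         "-c:v", "libx264", "-b:v", BITRATE, "-g", str(keyint), dst],
        check=True,
    )

for clip in CLIPS:
    encode_h264(clip, clip.replace(".mp4", "_intra.mp4"), keyint=1)
    encode_h264(clip, clip.replace(".mp4", "_inter.mp4"), keyint=300)
```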

**2.2 Experimental design**

The study consists of two parts. Both parts used the set of compression products described above. The first part was an evaluation in which trained imagery analysts viewed the compressed products and the original parent clips to assess the effects of compression on interpretability. The second part implemented a set of computational image metrics and examined their behavior with respect to bitrate and codec. Ten video clips were used for this study; the typical duration of each clip is 10 seconds.

The study used progressive scan motion imagery in an 848 x 480 pixel raster at 30 frames per second (f/s). Since most of the desirable source material was available to us as 720p HD video, a conversion process was employed to generate the lower-resolution, lower-frame-rate imagery. We evaluated the conversion process to assure that the goals of the study could be met; the video clips were converted using Adobe Premiere tools.
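The computational metrics themselves are described later in the chapter. As a placeholder illustration of the second part of the study, the sketch below computes peak signal-to-noise ratio (PSNR), a common full-reference metric, per frame and averages it over a clip; the clip-level value could then be examined against the recorded bitrate covariate.

```python
import numpy as np

def frame_psnr(ref, test, max_val=255.0):
    """Full-reference PSNR (dB) between two equally sized frames."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    return float("inf") if mse == 0.0 else 10.0 * np.log10(max_val ** 2 / mse)

def clip_psnr(ref_frames, test_frames):
    """Mean PSNR over corresponding frames of a parent and compressed clip."""
    return float(np.mean([frame_psnr(r, t)
                          for r, t in zip(ref_frames, test_frames)]))
```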
