**2. Related work**

Some subjective recognition metrics, described below, have been proposed over the past decade. They usually combine aspects of Quality of Recognition (QoR) and Quality of Experience (QoE). These metrics have focused not on practitioners as subjects, but rather on naïve participants.

<sup>1</sup> International Telecommunication Union — Telecommunication Standardization Sector


The metrics are not context-specific, and they do not apply standardized, video-surveillance-oriented discrimination levels.

One metric that is definitely worth mentioning is Ghinea's Quality of Perception (QoP) (Ghinea & Chen, 2008; Ghinea & Thomas, 1998). However, the QoP metric does not entirely fit video surveillance needs. It mainly targets video deterioration caused by frame rate (fps), whereas fps does not necessarily affect the quality of CCTV or the required bandwidth (Janowski & Romaniak, 2010). The metric was established for rather low, legacy resolutions and tested on rather small groups of subjects (10 instead of the standardized 24 valid, correlating subjects). Furthermore, a video recognition quality metric aimed squarely at the video surveillance context requires tests in a fully controlled environment (ITU-T, 2000), with standardized discrimination levels (avoiding ambiguous questions) and with minimized impact of subliminal cues (ITU-T, 2008).
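The requirement of "24 valid, correlating subjects" comes from subjective-test practice, where raters whose scores do not correlate with the rest of the panel are discarded before analysis. A simplified sketch of such screening (illustrative only; the actual ITU-T procedure is more involved, and the correlation threshold used here is an assumption):

```python
# Simplified subject screening: keep only raters whose scores correlate
# with the panel average of the remaining raters. This is an illustrative
# sketch, not the full ITU-T screening procedure.
from statistics import mean

def pearson(xs, ys):
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def valid_subjects(scores, threshold=0.75):
    """scores: {subject: ratings per sequence} -> list of kept subjects."""
    kept = []
    for subj, own in scores.items():
        others = [s for name, s in scores.items() if name != subj]
        panel = [mean(col) for col in zip(*others)]
        if pearson(own, panel) >= threshold:
            kept.append(subj)
    return kept

ratings = {
    "s1": [5, 4, 3, 2, 1],
    "s2": [5, 5, 3, 2, 1],
    "s3": [4, 4, 3, 2, 1],
    "s4": [1, 2, 3, 4, 5],  # rates against the panel -> rejected
}
print(valid_subjects(ratings))  # ['s1', 's2', 's3']
```

A rejected subject's data is simply excluded; if too few valid subjects remain, the test session has to be repeated with additional participants.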

Another metric worth mentioning is QoP's offshoot, Strohmeier's Open Profiling of Quality (OPQ) (Strohmeier et al., 2010). This metric puts more stress on video quality than on recognition/discrimination levels. Its application context, focused on 3D, also differs from video surveillance, which mostly requires 2D. Like the previous metric, it does not apply standardized discrimination levels, instead allowing subjects to use their own vocabulary. The approach is qualitative rather than quantitative, whereas the latter is preferred by public safety practitioners, e.g. for public procurement. Moreover, the OPQ model is somewhat content- and subject-oriented, while video surveillance needs a more generalized metric framework.

OPQ partly utilizes free sorting, as used in (Duplaga et al., 2008) and also applied in the method called Interpretation Based Quality (IBQ) (Nyman et al., 2006; Radun et al., 2008), adapted from (Faye et al., 2004; Picard et al., 2003). Unfortunately, these approaches map relational, rather than absolute, quality.

Extensive work has been carried out in recent years in the area of consumer video quality, mainly driven by two working groups: VQiPS (Video Quality in Public Safety) (VQiPS, 2011) and VQEG (Video Quality Experts Group) (VQEG, n.d.).

The VQiPS Working Group, established in 2009 and supported by the U.S. Department of Homeland Security's Office for Interoperability and Compatibility, has been developing a user guide for public safety video applications. The goal of the guide is to provide potential public safety video customers with links to research and specifications that best fit their particular application, as such research and specifications become available. The process of developing the guide will have the desired secondary effect of identifying areas in which adequate research has not yet been conducted, so that such gaps may be filled. A challenge for this particular work is ensuring that it is understandable to customers within public safety, who may have little knowledge of video technology (Leszczuk, Stange & Ford, 2011).

In July 2010, Volume 1.0 of the framework document "Defining Video Quality Requirements: A Guide for Public Safety" was released (VQiPS, 2010). This document provides qualitative guidance, such as explaining the role of various components of a video system and their potential impact on the resultant video quality. The information in this document, as well as quantitative guidance, started to become available on the VQiPS website in June 2011 (VQiPS, 2011).

The approach taken by VQiPS is to remain application-agnostic. Instead of attempting to individually address each of the many public safety video applications, the guide is based on commonalities between them. Most importantly, as mentioned above, each application consists of some type of recognition task. The ability to achieve a recognition task is impacted by many parameters, and five of them have been selected as being of particular importance. As defined in (Ford & Stange, 2010), they are:

1. **Usage time-frame.** Specifies whether the video will need to be analyzed in real-time or will be recorded for later analysis.
2. **Discrimination level.** Specifies how fine a level of detail is sought from the video.
3. **Target size.** Specifies whether the anticipated region of interest in the video occupies a relatively small or large percentage of the frame.
4. **Lighting level.** Specifies the anticipated lighting level of the scene.
5. **Level of motion.** Specifies the anticipated level of motion in the scene.


These parameters form what are referred to as generalized use classes, or GUCs. Figure 1 is a representation of the GUC determination process.

Fig. 1. Classification of video into generalized use classes as proposed by VQiPS (source: Ford & Stange, 2010).

The VQiPS user guide is intended to help end users determine how their application fits within these parameters. The research and specifications provided to users are also to be framed within those parameters. The end user is thus led to define their application within the five parameters and will in turn be led to the specifications and other information most appropriate for their needs (Leszczuk, Stange & Ford, 2011).
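One way to picture a GUC is as a record with five fields, one per parameter. A hypothetical sketch (the attribute names and value sets below are our simplifications, not taken from the VQiPS guide):

```python
# Illustrative encoding of a generalized use class (GUC) as a record of
# the five VQiPS parameters; the value sets here are simplified assumptions.
from dataclasses import dataclass
from enum import Enum

class TimeFrame(Enum):
    REAL_TIME = "real-time"
    RECORDED = "recorded"

class Discrimination(Enum):
    DETECTION = "detection"
    RECOGNITION = "recognition"
    IDENTIFICATION = "identification"

@dataclass(frozen=True)
class GUC:
    usage_time_frame: TimeFrame
    discrimination_level: Discrimination
    small_target: bool   # True if the target occupies a small part of the frame
    low_light: bool      # anticipated lighting level
    high_motion: bool    # anticipated level of motion

# A license-plate-reading application, classified into its GUC:
plate_reading = GUC(TimeFrame.RECORDED, Discrimination.IDENTIFICATION,
                    small_target=True, low_light=False, high_motion=True)
print(plate_reading.discrimination_level.value)  # identification
```

An application classified into such a record can then be matched against the research and specifications catalogued for that class.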

Recently, a new VQEG project, Quality Assessment for Recognition Tasks (QART), was created for task-based video quality research. QART will address the problem of a lack of quality standards for video monitoring. The aims of QART are the same as those of the other VQEG projects — to advance the field of quality assessment for task-based video through collaboration in the development of test methods (including possible enhancements of the ITU-T Recommendations), performance specifications and standards for task-based video, and predictive models based on network and other relevant parameters.

Because of the above reasons, there are additional requirements related to quality metrics. These requirements reflect the specific recognition task but also the viewing scenario. The Real-Time viewing scenario is more similar to traditional quality assessment tests, although even here additional parameters, such as relative target size, have to be taken into account. In the case of the Viewer-Controlled viewing scenario, additional quality parameters are related to the quality of a single shot. This is especially important for monitoring objects moving at significant velocity: the sharpness of a single video frame (reduced by motion blur) may be a crucial parameter determining the ability to perform a recognition task.
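As one illustration of how single-frame sharpness could be quantified, the widely used variance-of-Laplacian measure assigns low values to motion-blurred frames. A minimal sketch (one possible estimator, not one prescribed by any test plan):

```python
# Variance-of-Laplacian sharpness estimate for a grayscale frame
# (a list of rows of pixel intensities). Higher variance ~ sharper frame;
# a heavily blurred frame yields a low value.
from statistics import pvariance

def laplacian_variance(frame):
    h, w = len(frame), len(frame[0])
    responses = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # 4-neighbour discrete Laplacian at (y, x)
            lap = (frame[y - 1][x] + frame[y + 1][x] +
                   frame[y][x - 1] + frame[y][x + 1] -
                   4 * frame[y][x])
            responses.append(lap)
    return pvariance(responses)

sharp = [[0, 0, 255, 255],     # hard vertical edge
         [0, 0, 255, 255],
         [0, 0, 255, 255],
         [0, 0, 255, 255]]
blurred = [[0, 85, 170, 255],  # the same edge, smoothed out
           [0, 85, 170, 255],
           [0, 85, 170, 255],
           [0, 85, 170, 255]]
print(laplacian_variance(sharp) > laplacian_variance(blurred))  # True
```

In practice such a per-frame score can be used to pick the sharpest frame of a shot for a Viewer-Controlled recognition task.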


There is one more quality parameter inherent to both viewing scenarios: the source quality of a target. It reflects the ability to perform a given recognition task under perfect conditions (when no additional quality degradation factors exist). An example of two similar targets with completely different source quality is two pictures containing a car license plate, one taken in a car dealer's showroom and one during an off-road race. The second plate may be not only soiled but also blurred due to the high velocity of the car. In such a case, the license plate source quality is much lower for the second picture, which significantly affects recognition ability.

All these additional factors have to be taken into account while assessing QoR for the task-based scenario. The definition of QoR changes between different recognition tasks and requires implementation of dedicated quality metrics.

In the rest of this chapter, as already mentioned at the end of Section 1, we review the development of techniques for assessing video surveillance quality. In particular, we introduce a typical usage of task-based video: surveillance video for accurate license plate recognition. Furthermore, we present the field of task-based video quality assessment, from subjective psycho-physical experiments to objective quality models. Example test results and models are provided alongside the descriptions.

**5. Case study — License plate recognition test-plan**

This section contains a description of the car license plate recognition experiment. Its purpose is to illustrate an example of a task-based experiment. The experiment design phase reveals differences between traditional QoE assessment and task-based quality assessment tests. In the following sections, issues concerning the validation of testers and the development of objective metrics are presented.

**5.1 Design of the experiment**

The purpose of the tests was to analyze people's ability to recognize car license plates in video material recorded with a CCTV camera and compressed with the H.264/AVC codec. In order to perform the analysis, we carried out a subjective experiment.

The intended outcome of this experiment was to gather results on human recognition capabilities. Non-expert testers rated video sequences influenced by different compression parameters. We recorded the video sequences used in the test at a parking lot using a CCTV camera. We adjusted the video compression parameters in order to cover the recognition ability threshold. We selected ITU's ACR (Absolute Category Rating, described in ITU-T P.910 (ITU-T, 1999)) as the applied subjective test methodology.
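Under ACR, each subject rates each sequence on a five-point scale, and the ratings are averaged into a Mean Opinion Score (MOS), usually reported with a confidence interval. A minimal sketch with invented ratings (the normal approximation below is a simplification):

```python
# Computing MOS and a 95% confidence interval from ACR ratings
# (five-point scale). The ratings below are invented for illustration.
from statistics import mean, stdev

def mos_ci95(ratings):
    m = mean(ratings)
    # Normal approximation to the 95% interval; adequate for illustration.
    half = 1.96 * stdev(ratings) / (len(ratings) ** 0.5)
    return m, half

acr_scores = [5, 4, 4, 3, 4, 5, 4, 3, 4, 4]  # one sequence, ten subjects
m, half = mos_ci95(acr_scores)
print(f"MOS = {m:.2f} +/- {half:.2f}")  # MOS = 4.00 +/- 0.41
```

In a task-based test, the MOS per compression setting is analyzed alongside the recognition rate for the same sequences.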

The recognition task was threefold: 1) type in the license plate number, 2) select the car color, and 3) select the car make. We allowed subjects to control playback and to enter full-screen mode.
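Scoring a typed-in plate against the ground truth can be made tolerant of single-character slips by using an edit distance. A minimal sketch of such scoring (the normalization and the one-edit tolerance are our assumptions, not part of the test plan):

```python
# Levenshtein distance between the typed plate and the ground truth;
# a response within `tolerance` edits counts as a recognition.
def levenshtein(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb))) # substitution
        prev = cur
    return prev[-1]

def plate_recognized(answer, truth, tolerance=1):
    norm = lambda s: "".join(s.upper().split())  # drop spaces, ignore case
    return levenshtein(norm(answer), norm(truth)) <= tolerance

print(plate_recognized("KR 0123A", "KR0123A"))  # exact after normalization
print(plate_recognized("KR0128A", "KR0123A"))   # one substitution: accepted
print(plate_recognized("KR9999X", "KR0123A"))   # too many errors: rejected
```

The per-sequence recognition rate is then the fraction of subjects whose answers pass this check.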

