**5. Case study — License plate recognition test-plan**

This section describes the license plate recognition experiment, presented as an example of a task-based experiment. The design phase reveals the differences between traditional QoE assessment and task-based quality assessment tests. The following sections address the validation of testers and the development of objective metrics.

#### **5.1 Design of the experiment**

The purpose of the tests was to analyze people's ability to recognize car license plates in video material recorded with a CCTV camera and compressed with the H.264/AVC codec. In order to perform the analysis, we carried out a subjective experiment.

The intended outcome of this experiment was to measure human recognition capabilities. Non-expert testers rated video sequences influenced by different compression parameters. We recorded the video sequences used in the test at a parking lot using a CCTV camera, and we adjusted the video compression parameters to cover the recognition ability threshold. As the subjective test methodology we selected ACR (Absolute Category Rating), described in ITU-T P.910 (ITU-T, 1999).

The recognition task was threefold: 1) type in the license plate number, 2) select the car color, and 3) select the car make. We allowed subjects to control playback and to enter full-screen mode.
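As an illustration, the threefold answer can be represented as a small record with a ground-truth check for the typed plate. This is only a sketch: the record fields and the plate normalization rules (ignoring case, spaces and hyphens) are assumptions, not something specified in the text.

```python
from dataclasses import dataclass

@dataclass
class RecognitionAnswer:
    """One subject's answer for a single processed video sequence."""
    plate: str      # typed-in license plate number
    car_color: str  # selected car color
    car_make: str   # selected car make

def plate_correct(answer: RecognitionAnswer, ground_truth: str) -> bool:
    """True if the typed plate matches the ground truth; case, spaces
    and hyphens are ignored (an assumed normalization)."""
    def norm(s: str) -> str:
        return s.upper().replace(" ", "").replace("-", "")
    return norm(answer.plate) == norm(ground_truth)
```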


Quality Assessment in Video Surveillance 63


We performed the experiment using diverse display equipment so that we could later analyze the influence of display resolution on the recognition results.

We decided that each tester would score 32 video sequences. The idea was to show each source (SRC) sequence processed under different conditions (HRC) only once and then add two more sequences in order to find out whether testers would remember the license plates already viewed. We screened the *n*-th tester with two randomly selected sequences and 30 SRCs processed under the following HRCs:

$$\text{HRC} = \text{mod}(n - 2 + \text{SRC}, 30) + 1 \tag{1}$$
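Equation (1) can be turned into a short helper that generates one tester's playlist. The 1-based SRC/HRC indexing follows the text; the function names are illustrative only, and the two extra repeated sequences are drawn at random and therefore omitted here.

```python
def hrc_for(tester: int, src: int, n_hrc: int = 30) -> int:
    """HRC index shown to the given tester for source sequence `src`,
    following Eq. (1): HRC = mod(n - 2 + SRC, 30) + 1."""
    return (tester - 2 + src) % n_hrc + 1

def playlist(tester: int, n_src: int = 30):
    """The 30 (SRC, HRC) pairs scored by one tester; every tester sees
    each SRC once and, across the 30 SRCs, each HRC exactly once."""
    return [(src, hrc_for(tester, src)) for src in range(1, n_src + 1)]
```

Because the HRC index is a modular shift of the SRC index, each tester covers all 30 HRCs without repetition, while different testers see each SRC under different HRCs.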

The tests were conducted using a Web-based interface connected to a database. We gathered both information about the video samples and the answers received from the subjects in the database. The interface is presented in Figure 2 (Leszczuk, Janowski, Romaniak, Glowacz & Mirek, 2011).

Fig. 2. Test interface.

#### **5.2 Source video sequences**


We collected source video sequences at AGH University of Science and Technology in Krakow by filming a car parking lot during high traffic volume. In this scenario, we located the camera 50 meters from the parking lot entrance in order to simulate typical video recordings. Using ten-fold optical zoom, we obtained a 6 m × 3.5 m field of view. We kept the camera static, without changing the zoom throughout the recording, which reduced global movement to a minimum.

We acquired the video sequences using a 2-megapixel camera with a CMOS sensor and stored the recorded material on an SDHC memory card inside the camera.

We analyzed all the video content collected by the camera and cut it into 20-second shots showing cars entering or leaving the car park. The license plate was visible for a minimum of 17 seconds in each sequence. The parameters of each source sequence are as follows:

• resolution: 1280×720 pixels (720p)

• frame rate: 25 frames/s

• average bit-rate: 5.6–10.0 Mbit/s (depending on the amount of local motion)

• video compression: H.264/AVC in a Matroska Multimedia Container (MKV)


We asked the owners of the vehicles filmed for their written consent, which allowed the use of the video content for testing and publication purposes.

#### **5.3 Processed video sequences**

If picture quality is not acceptable, the question naturally arises of where the degradation comes from. As we have already mentioned, the sources of potential problems are located in different parts of the end-to-end video delivery chain. The first group of distortions (1) can be introduced at the time of image acquisition; the most common problems are noise, lack of focus or improper exposure. Other distortions (2) appear as a result of further compression and processing; problems can also arise when scaling video sequences in the quality, temporal and spatial domains, or when, for example, digital watermarks are introduced. Then (3), during transmission over the network, artifacts may be caused by packet loss. At the end of the delivery chain (4), problems may relate to the equipment used to present video sequences.

Considering this, we encoded all source video sequences (SRC) with a fixed quantization parameter (QP) using the H.264/AVC video codec (x264 implementation). Prior to encoding, we applied modifications involving resolution change and cropping in order to obtain diverse aspect ratios between the car plates and the video size (see Figure 3 for details of the processing). We modified each SRC into 6 versions and encoded each version with 5 different quantization parameters. We selected three sets of QPs: 1) {43, 45, 47, 49, 51}, 2) {37, 39, 41, 43, 45}, and 3) {33, 35, 37, 39, 41}, and matched the QP values to the different video processing paths in order to cover the license plate recognition ability threshold. We kept frame rates intact because, due to inter-frame coding, reducing them does not necessarily result in bit-rate savings (Janowski & Romaniak, 2010). Furthermore, we did not consider network streaming artifacts, as in numerous cases they are related to excessive bit-streams, which we had already addressed with the different QPs; a reliable video streaming solution should adjust the video bit-stream to the available network resources and prevent packet loss. As a result, we obtained 30 different HRCs.
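The 6 × 5 processing grid can be enumerated as below. This is a sketch under stated assumptions: the mapping of QP sets to processing paths and the ffmpeg-style command line (libx264 supports constant-QP encoding via `-qp`) are illustrative, since the text does not give the exact tool invocation or the per-path scale/crop settings.

```python
# The three QP sets from the text; which set applies to which of the six
# processing versions is an assumption made here for illustration.
QP_SETS = {
    1: [43, 45, 47, 49, 51],
    2: [37, 39, 41, 43, 45],
    3: [33, 35, 37, 39, 41],
}

def hrc_commands(src_file: str = "SRC.mkv"):
    """Enumerate one illustrative ffmpeg/x264 command per HRC:
    6 scaled/cropped versions x 5 QPs = 30 HRCs."""
    cmds = []
    hrc = 0
    for version in range(1, 7):                   # 6 processing versions
        qps = QP_SETS[(version - 1) % 3 + 1]      # assumed set-to-path mapping
        for qp in qps:
            hrc += 1
            cmds.append(
                f"ffmpeg -i {src_file} -vf <scale/crop for version {version}> "
                f"-c:v libx264 -qp {qp} HRC{hrc:02d}.mkv"
            )
    return cmds
```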

Fig. 3. Generation of HRCs.

Based on the above parameters, it is easy to determine that the whole test set consists of 900 sequences (each SRC 1-30 encoded into each HRC 1-30).

answer for a particular task could be remembered. For this reason a different pool of sequences is shown to different subjects (e.g. each compression level for a given source sequence needs to be shown to a different subject).

A more formal way toward the validation of subjects is the Rasch theory (Boone et al., 2010). It defines the difficulty level of each particular question (e.g. a single video sequence from a test set), and whether a subject is more or less critical in general. Based on this information it is possible to detect answers that deviate not only from the average, but also from an individual subject's behavior. Formally, the probability of giving a correct answer is estimated by the equation (Baker, 1985)

$$P(X_{in} = 1) = \frac{1}{1 + \exp(-(\beta_n - \delta_i))} \tag{2}$$

where *β<sub>n</sub>* is the ability of the *n*-th person to perform a task and *δ<sub>i</sub>* is the difficulty of the *i*-th task.

Estimating both the task difficulty and the subject ability makes it possible to predict the correct answer probability. Such a probability can then be compared with the real task result.

In order to estimate the *β<sub>n</sub>* and *δ<sub>i</sub>* values, the same tasks have to be run by all the subjects, which is a disadvantage of the Rasch theory, similarly to the correlation-based method. Moreover, the more subjects involved in the test, the higher the accuracy of the method. An excellent example of this methodology in use is national high school exams, where the Rasch theory helps in detecting differences between different boards marking the pupils' tests (Boone et al., 2010). In subjective experiments, there is always a limited number of answers per question. This means that the Rasch theory can still be used, although the results need to be checked carefully. Task-based experiments are a worst-case scenario: each subject carries out a task a very limited number of times in order to ensure that the task result (for example license plate recognition) is based purely on the particular distorted video and is not remembered by the subject. This makes the Rasch theory difficult to use.

In order to solve this problem we propose two custom metrics for subject validation. Both work for partially ordered test sets (Insall & Weisstein, 2011), i.e. those for which certain subsets can be ordered by task difficulty. Additionally, we assume that answers can be classified as correct or incorrect (i.e. a ground truth is available, e.g. a license plate to be recognized). Note that, due to the second assumption, these metrics cannot be used for quality assessment tasks, since we cannot say that one answer is better than another (as mentioned before, there is no ground truth regarding quality).

#### **6.1 Logistic metric**

The assumption that the test set is partially ordered can be interpreted in a numeric way: if a subject fails to recognize a license plate, and for *n* sequences with a higher or equal QP the license plate was recognized correctly by other subjects, the subject's inaccuracy level is increased by *n*. Higher *n* values may indicate a better chance that the subject is irrelevant and did not pay attention to the recognition task.

Computing such coefficients for the different sequences results in the total subject quality (*Sq<sub>i</sub>*), given by

$$Sq_i = \sum_{j \in S_i} ssq_{i,j} \tag{3}$$
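A minimal sketch of the subject-validation idea above, together with the Rasch probability of a correct answer. The data layout (a per-sequence map of QP to other subjects' recognition success) and the use of the inaccuracy count as the per-sequence coefficient are assumptions made for illustration; the sign convention in `rasch_p` is the standard one, where the probability increases with ability.

```python
import math

def rasch_p(beta_n: float, delta_i: float) -> float:
    """Rasch probability of a correct answer: increases with the
    subject's ability beta_n, decreases with the task difficulty delta_i."""
    return 1.0 / (1.0 + math.exp(-(beta_n - delta_i)))

def inaccuracy(failed_qp: int, others: dict) -> int:
    """Inaccuracy contribution for one missed recognition: the number n of
    sequences with a QP greater than or equal to the failed one (i.e. at
    least as strongly compressed) that other subjects still recognized.
    `others` maps QP -> recognized-by-other-subjects flag."""
    return sum(1 for qp, ok in others.items() if ok and qp >= failed_qp)

def subject_quality(ssq: list) -> int:
    """Total subject quality: the sum of the per-sequence coefficients
    (here, the assumed inaccuracy counts)."""
    return sum(ssq)
```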
