**3.2 Challenges in appearance-based object retrieval for surveillance videos**

This section points out existing challenges in appearance-based object retrieval for surveillance videos. As object indexing and retrieval take the output of video analysis as their input (cf. Fig. 1), the quality of the video analysis has a strong influence on object indexing and retrieval. Current achievements in surveillance video analysis show that video analysis is far from perfect, since it is hampered by low resolution, pose and lighting variations, and object occlusion. In this section, we point out the challenges in appearance-based object retrieval by analyzing the effect of two modules of video analysis on the object indexing and retrieval quality: the object detection and the object tracking modules.

The object detection module determines the object blobs. An object detection module is good if every blob of a detected object (1) covers this object totally and (2) does not contain other objects. However, these constraints are not always met. Object retrieval has to address three difficult cases, as shown in Fig. 6. In the first case, the object is not present at all in the blob (Fig. 6a). In the second case, the object is only partially present in the blob (Fig. 6b), while in the third case, the blob of the detected object covers this object totally but also contains other objects (Fig. 6c and Fig. 6d).
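The three cases above can be made concrete with a small sketch. This is not from the chapter; the function names and the axis-aligned box representation are illustrative assumptions, and a real system would work on segmentation masks rather than rectangles.

```python
# Illustrative sketch: classify a detected blob against a ground-truth object
# box. Boxes are axis-aligned rectangles (x1, y1, x2, y2); names are made up.

def intersection_area(a, b):
    """Area of overlap between two boxes given as (x1, y1, x2, y2)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0

def box_area(a):
    return (a[2] - a[0]) * (a[3] - a[1])

def classify_blob(blob, obj, other_objects=()):
    """Return which of the difficult cases of Fig. 6 a blob falls into."""
    coverage = intersection_area(blob, obj) / box_area(obj)
    if coverage == 0.0:
        return "object absent from blob"            # case (a)
    if coverage < 1.0:
        return "object partially present"           # case (b)
    if any(intersection_area(blob, o) > 0 for o in other_objects):
        return "object fully present, blob also contains other objects"  # (c), (d)
    return "clean detection"
```

For instance, `classify_blob((0, 0, 5, 5), (2, 2, 8, 8))` reports a partially covered object, while passing a list of overlapping `other_objects` triggers the third case.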

Concerning the object tracking quality, two metrics widely used in the video surveillance community for evaluating object tracking performance are *object ID persistence* and *object ID confusion* (Nghiem, Bremond et al. 2007). The *object ID persistence* metric computes, over time, how many tracked objects (output of the object tracking module) are associated with one ground-truth object. Conversely, the *object ID confusion* metric computes how many ground-truth objects are associated with one tracked object (i.e., with the same ID). A good object tracking algorithm obtains a small value for both metrics (the minimum is 1).
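The two metrics can be sketched directly from their definitions. This is an illustrative reading of the definitions above, not the evaluation code of (Nghiem, Bremond et al. 2007); the input format (per-frame ground-truth/track ID pairs) is an assumption.

```python
# Sketch: ID persistence and ID confusion from per-frame associations
# between ground-truth objects and tracker output (input format assumed).
from collections import defaultdict

def id_persistence_and_confusion(associations):
    """associations: iterable of (gt_id, track_id) pairs over all frames.
    Returns {gt_id: persistence} and {track_id: confusion}."""
    tracks_per_gt = defaultdict(set)   # ID persistence: tracks per GT object
    gts_per_track = defaultdict(set)   # ID confusion: GT objects per track
    for gt_id, track_id in associations:
        tracks_per_gt[gt_id].add(track_id)
        gts_per_track[track_id].add(gt_id)
    persistence = {g: len(t) for g, t in tracks_per_gt.items()}
    confusion = {t: len(g) for t, g in gts_per_track.items()}
    return persistence, confusion

# The Fig. 7 situation: ground-truth object 1 is tracked first as track 10,
# then as a newly created track 11.
p, c = id_persistence_and_confusion([(1, 10), (1, 10), (1, 11)])
# p == {1: 2}, i.e. object ID persistence = 2
```

An ideal tracker yields 1 for every object in both dictionaries.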

44 Recent Developments in Video Surveillance

O = {B<sub>i</sub>, i = 1, …, N} (1)

where O is the object, B<sub>i</sub> is the i<sup>th</sup> object blob, and N is the total number of blobs of object O.

It is worth noting that object blobs can be non-consecutive, since an object may not be detected in certain frames, and the value of N varies depending on the object's lifetime in the scene. Fig. 5 gives an example of an object represented by its blobs. As we can notice, with poor object detection, several object blobs do not cover the object's appearance well.
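This representation is easy to mirror in code. The sketch below is an illustration of Eq. 1 only; the class and field names are made up, and the box format is an assumption.

```python
# Sketch of the Eq. 1 representation: an object as the list of its blobs.
# Frame indices need not be consecutive, since detection can fail in frames.
from dataclasses import dataclass, field

@dataclass
class Blob:
    frame: int
    box: tuple              # (x1, y1, x2, y2) in image coordinates (assumed)

@dataclass
class TrackedObject:
    object_id: int
    blobs: list = field(default_factory=list)   # B_1 .. B_N

obj = TrackedObject(object_id=7)
for frame in (3, 4, 7, 8):      # frames 5-6 missed by the detector
    obj.blobs.append(Blob(frame=frame, box=(0, 0, 10, 20)))
# N = len(obj.blobs) varies with the object's lifetime in the scene
```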


Fig. 5. An object is represented by its blobs.


Fig. 6. Examples of object detection quality: (a) the object is not present in the blob; (b) the object is partially present in the blob; (c) and (d) the object is totally present in the blob.

However, the results obtained on several video surveillance benchmarks show that current object tracking performance is still limited (the object ID persistence and object ID confusion metrics are generally much greater than 1). Fig. 7 shows an example of the object ID persistence problem: two tracked objects are created for one sole ground-truth object, so the object ID persistence is equal to 2. Fig. 8 illustrates an example of object ID confusion: three ground-truth object IDs are associated with one sole detected object (object ID confusion = 3).

Fig. 7. An example of the object ID persistence problem: two tracked objects created for one sole ground-truth object (object ID persistence = 2).

Based on the above analysis, the main challenge in surveillance object indexing and retrieval is the poor quality of object detection and tracking. An object indexing and retrieval algorithm is robust if it can work with varying object detection and tracking quality.

With the object representation as defined in Eq. 1, we believe that object indexing and retrieval methods can address the poor quality of object detection and tracking if they rely on effective object signature building and robust object matching.

Appearance-Based Retrieval for Tracked Objects in Surveillance Videos 47

Fig. 8. An example of object ID confusion: three ground-truth object IDs associated to one sole detected object (object ID confusion = 3).

Instead of calculating only the representative blobs, several authors compute a set of pairs, each consisting of a representative blob and an associated weight, where the weight expresses the importance of that blob. With this, the first approach is defined as follows:

O = {B<sub>i</sub>, i = 1, …, N} →<sup>(1)</sup> {(Br<sub>j</sub>, w<sub>j</sub>), j = 1, …, M} →<sup>(2)</sup> {(F<sub>j</sub>, w<sub>j</sub>), j = 1, …, M} (3)

with M ≤ N and w<sub>1</sub> + … + w<sub>M</sub> = 1, where:

- Br<sub>j</sub>, j = 1, …, M: set of representative blobs detected for the object O;
- F<sub>j</sub>, j = 1, …, M: set of features extracted on the representative blobs. The extracted feature can be a color histogram, dominant color, etc.

The methods presented in (Ma and Cohen 2007) and in (Le, Thonnat et al. 2009) are the most significant ones of the first object signature building approach. These methods are distinguished from each other by the way the representative blobs are defined.

Fig. 9 shows an example of the first object signature building approach. From a large number of blobs (905 blobs), the object signature building method selects only 4 representative blobs. Their associated weights are 0.142, 0.005, 0.016 and 0.835.

Fig. 9. An example of representative blob detection: 4 representative blobs are extracted from 905 blobs.

The representative blob detection method proposed by Ma and Cohen (Ma and Cohen 2007) is based on agglomerative hierarchical clustering and the covariance matrix extracted from the object blobs. This method is composed of the three following steps:

**Step 1.** Do agglomerative clustering on the original set of object blobs based on the covariance matrix.

**Step 2.** Remove clusters having a small number of elements.

**Step 3.** Select representative blobs.

The first step aims at forming clusters of similar blobs. The similarity of two blobs is defined by using the covariance matrix. The covariance matrix is built over a feature vector f, for
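The three-step scheme can be sketched in a simplified form. This is not Ma and Cohen's implementation: a plain Euclidean distance between per-blob feature vectors stands in for their covariance-matrix distance, the thresholds are illustrative, and the first blob of each cluster is taken as its representative.

```python
# Simplified sketch of the three-step representative blob selection.
# Euclidean distance replaces the covariance-matrix distance; thresholds
# and the choice of cluster representative are illustrative assumptions.
import math

def distance(f1, f2):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f1, f2)))

def build_signature(features, merge_thresh=1.0, min_cluster=2):
    # Step 1: agglomerative (single-link) clustering of blob features.
    clusters = [[f] for f in features]
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if any(distance(a, b) < merge_thresh
                       for a in clusters[i] for b in clusters[j]):
                    clusters[i] += clusters.pop(j)
                    merged = True
                    break
            if merged:
                break
    # Step 2: drop clusters with few elements (likely detection noise).
    clusters = [c for c in clusters if len(c) >= min_cluster]
    # Step 3: one representative blob per cluster; its weight w_j is the
    # fraction of blobs it stands for, so the weights sum to 1 (Eq. 3).
    total = sum(len(c) for c in clusters)
    return [(c[0], len(c) / total) for c in clusters]

sig = build_signature([(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (5.0, 5.0), (5.1, 5.0)])
# two clusters -> two representative blobs with weights 0.6 and 0.4
```

The weights thus play the role of the w<sub>j</sub> in Eq. 3: a representative blob standing for many similar blobs, like the 0.835-weight blob of Fig. 9, dominates the signature.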
