**3.1 Definitions**


trajectory represents a relatively stable pattern of object movement. Other work attempts to move to higher levels of object trajectory representation, namely the symbolic level and the semantic level. At the symbolic level, (Chen, Ozsu et al. 2004; Hsieh, Yu et al. 2006; Le, Boucher et al. 2007) aim to convert an object trajectory into a character sequence. The advantage is that this enables the application of well-established text retrieval methods, such as the Edit Distance, to object trajectory matching. The approaches dedicated to object trajectory representation at the semantic level try to learn semantic meanings, such as "turn left" or "low speed", from object movement (Hu, Xie et al. 2007). As a result, the output is close to the human manner of thinking; however, these approaches strongly depend on the application.
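To make the symbolic level concrete, the following sketch converts a 2D trajectory into a character sequence by quantizing the direction of each movement step into a hypothetical eight-symbol alphabet, then matches sequences with the classical Edit (Levenshtein) Distance. The alphabet and quantization scheme are illustrative assumptions, not the exact encodings used in the cited works.

```python
import math

# Hypothetical 8-symbol alphabet: one character per 45-degree direction sector.
DIRECTION_SYMBOLS = "abcdefgh"

def trajectory_to_string(points):
    """Quantize each movement step of a trajectory [(x, y), ...] into a symbol."""
    symbols = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        angle = math.atan2(y1 - y0, x1 - x0) % (2 * math.pi)
        sector = int(angle / (math.pi / 4)) % 8   # sector index 0..7
        symbols.append(DIRECTION_SYMBOLS[sector])
    return "".join(symbols)

def edit_distance(s, t):
    """Classical Levenshtein distance via dynamic programming."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        curr = [i]
        for j, ct in enumerate(t, 1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

# Two similar trajectories yield a small edit distance.
a = trajectory_to_string([(0, 0), (1, 0), (2, 0), (3, 1)])
b = trajectory_to_string([(0, 0), (1, 0), (2, 1), (3, 1)])
print(edit_distance(a, b))
```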

Object representation based on appearance has attracted a lot of research interest. Appearance-based object retrieval methods for surveillance video are distinguished from one another by two criteria. The first criterion is the appearance feature extracted from the image/frame where the object is detected; the second is the way an object signature is created from all features extracted over the object's lifetime and the way objects are matched based on their signatures. In the next section, we describe the object signature building and object matching methods in detail. In this section, we present only the object appearance features.

There is a great variety of object features used for surveillance object representation. In fact, any feature proposed for image retrieval can be applied to surveillance object representation. Object appearance features can be divided into two categories: global and local. Global features include the color histogram, dominant color and covariance matrix, to name a few. Besides global features, local features such as interest points and the SIFT descriptor can be extracted from the object's region.
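As a concrete example of a global feature, the sketch below computes a normalized joint RGB color histogram over an object blob and compares two blobs with histogram intersection. The bin count and the matching measure are illustrative choices, not values prescribed by the cited works.

```python
import numpy as np

def color_histogram(patch, bins_per_channel=8):
    """Global appearance feature: a normalized joint RGB histogram.

    `patch` is an H x W x 3 uint8 array (the object blob); the bin count
    is an illustrative choice.
    """
    # Quantize each channel into `bins_per_channel` levels.
    quantized = (patch.astype(np.int32) * bins_per_channel) // 256
    # Combine the three channel indices into a single joint-bin index.
    idx = (quantized[..., 0] * bins_per_channel + quantized[..., 1]) \
          * bins_per_channel + quantized[..., 2]
    hist = np.bincount(idx.ravel(), minlength=bins_per_channel ** 3)
    # Normalize so that blobs of different sizes are comparable.
    return hist / hist.sum()

# Compare two blobs with histogram intersection (higher = more similar).
blob_a = np.random.randint(0, 256, (64, 32, 3), dtype=np.uint8)
blob_b = np.random.randint(0, 256, (48, 24, 3), dtype=np.uint8)
h_a, h_b = color_histogram(blob_a), color_histogram(blob_b)
print(np.minimum(h_a, h_b).sum())
```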

In (Yuk, Wong et al. 2007), the authors proposed to use MPEG-7 descriptors, such as dominant colors and edge histograms, for surveillance retrieval. In the context of a research project conducted by the IBM research center1, researchers evaluated a large number of color features for surveillance applications: standard color histograms, weighted color histograms, variable-bin-size color histograms and color correlograms. Results show the color correlogram to have the best performance. Ma and Cohen (Ma and Cohen 2007) suggest using the covariance matrix as an object feature. According to the authors, the covariance matrix is appealing because it fuses different types of features and has small dimensionality. This small dimensionality makes the model well suited for surveillance videos because it takes very little storage space. In our research (Le, Boucher et al. 2010), we evaluated the performance of four descriptors, namely dominant color, edge histogram, covariance matrix (CM) and the SIFT descriptor, for surveillance object representation and matching. The obtained results show that when objects are detected without background and context objects present in the object region, these descriptors retrieve objects with relatively good results. In other cases, the covariance matrix is more effective than the other descriptors. In our experiments, it is interesting to note that while the covariance matrix represents information from all pixels in a blob, interest points use only a few pixels, and the dominant color and edge histogram use only approximate information about pixel color and edges. A pair of descriptors, (covariance matrix and dominant color), (covariance matrix and edge histogram) or (covariance matrix and SIFT descriptors), may be chosen as default descriptors for object representation.

1 https://researcher.ibm.com/researcher/view_project.php?id=1393
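The following sketch illustrates the region covariance idea with a hypothetical per-pixel feature vector (position, RGB color, and intensity gradient magnitudes); the exact feature set used in (Ma and Cohen 2007) may differ. Whatever the blob size, the descriptor is a small, fixed-size matrix.

```python
import numpy as np

def covariance_descriptor(patch):
    """Region covariance descriptor for an H x W x 3 blob.

    Each pixel is mapped to an illustrative feature vector
    [x, y, R, G, B, |Ix|, |Iy|]; the blob is then summarized by the
    7 x 7 covariance matrix of these vectors, regardless of blob size.
    """
    h, w, _ = patch.shape
    ys, xs = np.mgrid[0:h, 0:w]                      # pixel coordinates
    gray = patch.astype(np.float64).mean(axis=2)
    iy, ix = np.gradient(gray)                       # intensity gradients
    feats = np.stack([xs, ys,
                      patch[..., 0], patch[..., 1], patch[..., 2],
                      np.abs(ix), np.abs(iy)], axis=-1)
    flat = feats.reshape(-1, 7).astype(np.float64)   # one row per pixel
    return np.cov(flat, rowvar=False)                # 7 x 7, independent of H, W

# Blobs of different sizes yield descriptors of identical (small) dimensionality.
d1 = covariance_descriptor(np.random.randint(0, 256, (60, 30, 3), dtype=np.uint8))
d2 = covariance_descriptor(np.random.randint(0, 256, (90, 45, 3), dtype=np.uint8))
print(d1.shape, d2.shape)   # (7, 7) (7, 7)
```

Note that covariance matrices do not lie in a Euclidean space; the region covariance literature typically compares them with a metric based on generalized eigenvalues rather than a plain Euclidean distance.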



Definition 1: An **object blob** is a region determined by a minimal bounding box in a frame where the object is detected.

The minimal bounding box is computed by the object detection module during video analysis, and an object has a single minimal bounding box in each frame where it is detected. Fig. 4 gives some examples of detected objects and their corresponding blobs.
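A minimal sketch of how a blob can be obtained, assuming a hypothetical detection module that outputs a binary foreground mask per frame:

```python
import numpy as np

def minimal_bounding_box(mask):
    """Minimal axis-aligned bounding box of a binary foreground mask.

    `mask` is an H x W boolean array from a hypothetical detection module;
    returns (x_min, y_min, x_max, y_max), or None if the mask is empty.
    """
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return None
    return xs.min(), ys.min(), xs.max(), ys.max()

def object_blob(frame, mask):
    """Crop the object blob (Definition 1) from a frame using the bounding box."""
    box = minimal_bounding_box(mask)
    if box is None:
        return None
    x0, y0, x1, y1 = box
    return frame[y0:y1 + 1, x0:x1 + 1]

# Example: a small synthetic detection.
frame = np.zeros((10, 10, 3), dtype=np.uint8)
mask = np.zeros((10, 10), dtype=bool)
mask[2:5, 3:8] = True
print(object_blob(frame, mask).shape)   # (3, 5, 3)
```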

Fig. 4. Detected objects and their blobs (Bak, Corvee et al. 2010).
