The performance of any approach is evaluated by measuring the accuracy of motion classification obtained with a given algorithm. Many algorithms can be used; the most widely used ones will be described in the following subsections.

**4.1 Probabilistic latent semantic analysis (pLSA)** 

pLSA is a popular unsupervised method for learning object categories from interest point features; it was implemented following Niebles et al. (Niebles et al., 2008). The histogram features of the training or testing samples are concatenated to form a co-occurrence matrix, which is the input of the pLSA algorithm.

**4.2 Support Vector Machines (SVM)** 

The Support Vector Machine (SVM) is one of the most popular classifiers and has recently gained attention within visual pattern recognition. In spatial recognition, local features have recently been combined with SVMs in robust classification approaches. In a similar manner, Schuldt et al. (Schuldt et al., 2004) explored the combination of local space-time features and SVMs and applied the resulting approach to the recognition of human actions.

**4.3 Proposed algorithm** 

The classification algorithm is based on the unsupervised clustering algorithm K-means. This choice is justified by its low running time and by the a priori knowledge of the number of classes K. The algorithm operates on a parameter vector V built from the criteria described in the previous sections. Table 3 lists the parameters belonging to the vector V, ranked from the most to the least significant.

| Parameter | Feature |
|-----------|---------|
| P1 | Spatiotemporal box area |
| P2 | Spatiotemporal box area / body bounding box area |
| P3 | H-STIP existence (1 or 0) |
| P4 | L-STIP existence (1 or 0) |
| P5 | Distance between the spatiotemporal box centroid and the bounding box centroid |
| P6 | STIPs number / 100 frames |
| P7 | Global maximum value |
| P8 | Local maxima number |
| P9 | Mean value of local maxima |
| P10 | Average value of the activity function variance |
| P11 | Slope of the curve x = f(t) of the centroid |

Table 3. List of parameters belonging to the vector V

The classification of movements is performed in a hierarchical manner: a first algorithm separates the movements into two classes. The first class contains movements made by the whole body, while the second contains movements made only by the hands. This algorithm uses only five parameters, {P2, P3, P4, P5, P6}. A second algorithm then achieves the overall classification and uses the entire set of parameters.
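The two-stage scheme above can be sketched with a minimal K-means pass per stage. This is a plain NumPy sketch under illustrative assumptions, not the authors' code: the feature matrix is random placeholder data, and the column slice for {P2, …, P6} simply assumes P1–P11 are stored in order.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Minimal K-means: returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each sample to its nearest centroid.
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute each centroid (skip clusters that became empty).
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids, labels

# V: one row per video sample, columns P1..P11 (illustrative random data).
rng = np.random.default_rng(1)
V = rng.random((60, 11))

# Stage 1: whole-body vs. hand-only movements, using P2..P6 only (columns 1..5).
_, body_vs_hand = kmeans(V[:, 1:6], k=2)

# Stage 2: overall classification into the K = 6 action classes, full vector V.
_, actions = kmeans(V, k=6)
```

The a priori knowledge of K mentioned above is what makes K-means attractive here: both stages fix the number of clusters in advance (2, then 6) instead of estimating it.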

**5. Classification results**

The KTH human action database is one of the largest available for this task. Each video contains a single action. The database covers six types of human movement (walking, jogging, running, boxing, hand waving and hand clapping). These movements are performed several times by 25 persons in different scenarios, in outdoor or indoor environments. The database contains a total of 600 long sequences, each of which can be divided into more than 10 short sequences of about 4 seconds each.
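These figures can be sanity-checked with simple arithmetic. The four-scenario recording protocol is an assumption here (it is the standard KTH setup, though the text only says "different scenarios"), and it is consistent with the stated total of 600 long sequences:

```python
# Illustrative check of the KTH database figures quoted above.
actions = ["walking", "jogging", "running", "boxing", "hand waving", "hand clapping"]
persons = 25
scenarios = 4  # assumption: the standard KTH protocol uses four scenarios

long_sequences = len(actions) * persons * scenarios
print(long_sequences)  # 600, matching the stated total

# Each long sequence yields more than 10 short 4-second clips,
# so the short-clip pool is at least:
short_clips_lower_bound = long_sequences * 10
print(short_clips_lower_bound)  # 6000
```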

To test our approach on the recognition task, we used 25% of the samples from the video database for learning. The remaining 75% of the video samples are used to validate the performance of the developed method. Figure 10 shows the confusion matrix of the classification results for the KTH database.
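A per-class 25%/75% split like the one described above could be sketched as follows. The clip identifiers and the count of 100 samples per class are illustrative placeholders (not the actual KTH file names), chosen so that 75 validation samples remain per class, as in the text:

```python
import random

classes = ["walking", "jogging", "running", "boxing", "hand waving", "hand clapping"]
samples_per_class = 100  # illustrative count per action class

train, validation = [], []
rng = random.Random(0)
for action in classes:
    # Hypothetical clip identifiers standing in for real video samples.
    clips = [f"{action}_{i:03d}" for i in range(samples_per_class)]
    rng.shuffle(clips)
    cut = samples_per_class // 4   # 25 % of the samples go to learning
    train += clips[:cut]
    validation += clips[cut:]      # the remaining 75 % go to validation

print(len(train), len(validation))  # 150 450
```

Splitting per class rather than over the pooled database keeps the classes balanced in both sets, which matters for a fair confusion matrix.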

The confusion matrix in Figure 10 shows the performance obtained on the KTH human action database. In total, 450 samples were used to obtain these results (75 for each class). Each column of the matrix corresponds to a predicted class, while each row corresponds to an actual class.
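A confusion matrix with this row/column convention, together with the Precision, Recall and True Negative Rate (TNR) metrics the chapter mentions as further evaluation criteria, can be computed as below. The label arrays are illustrative, not the chapter's results:

```python
import numpy as np

def confusion_matrix(actual, predicted, n_classes):
    """Rows hold the actual class, columns the predicted class,
    following the convention described for Figure 10."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for a, p in zip(actual, predicted):
        cm[a, p] += 1
    return cm

# Illustrative labels for six action classes (not the chapter's results).
actual    = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
predicted = [0, 0, 1, 2, 2, 2, 3, 3, 4, 5, 5, 5]
cm = confusion_matrix(actual, predicted, n_classes=6)

total = cm.sum()
for c in range(6):
    tp = cm[c, c]
    fp = cm[:, c].sum() - tp  # predicted as c but actually another class
    fn = cm[c, :].sum() - tp  # actually c but predicted as another class
    tn = total - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp)  # True Negative Rate
    print(c, round(precision, 2), round(recall, 2), round(tnr, 2))

accuracy = np.trace(cm) / total
print("overall accuracy:", round(accuracy, 3))  # 0.833
```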

Fig. 10. Confusion matrix for the KTH human action database

The best accuracy is obtained for the running action, while the boxing action has the lowest accuracy. The overall recognition rate of our approach exceeds 95%.

The developed approach leads to interesting results compared to other algorithms for human action recognition. All of these methods use STIPs to characterize movements, without tracking algorithms or background segmentation. Our approach is also comparable to methods based on tracking or segmentation. Table 4 ranks the different approaches according to their accuracy.

Table 4. Classification of different approaches according to their accuracy

between them. Additionally, other metrics will be used to evaluate the methods' performance, such as Precision, Recall, True Negative Rate (TNR), etc.

**7. References** 

Bobick, A. F. & Davis, J. W. (2001). The recognition of human movement using temporal templates. *IEEE Transactions on Pattern Analysis and Machine Intelligence*, Vol.23, No.3, (March 2001), pp. 257–267, ISSN 0162-8828.

Dollár, P.; Rabaud, V.; Cottrell, G. & Belongie, S. (2005). Behavior recognition via sparse spatio-temporal features. *Proceedings of the 2nd Joint IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance*, pp. 65–72, ISBN 0-7803-9424-0, Beijing, China, October 15-16, 2005.

Efros, A. A.; Berg, A. C.; Mori, G. & Malik, J. (2003). Recognizing action at a distance. *Proceedings of the Ninth IEEE International Conference on Computer Vision*, Vol.2, pp. 726–733, ISBN 0-7695-1950-4, Nice, France, October 13-16, 2003.

Harris, C. & Stephens, M. (1988). A combined corner and edge detector. *Proceedings of the Fourth Alvey Vision Conference*, pp. 147–152, University of Manchester, UK, August 31-September 2, 1988.

Hoey, J. (2001). Hierarchical unsupervised learning of facial expression categories. *Proceedings of the IEEE Workshop on Detection and Recognition of Events in Video*, pp. 99–106, ISBN 0-7695-1293-3, Vancouver, Canada, July 8, 2001.

Ikizler, N. & Duygulu, P. (2009). Histogram of oriented rectangles: A new pose descriptor for human action recognition. *Image and Vision Computing*, Vol.27, No.10, (September 2009), pp. 1515–1526, ISSN 0262-8856.

Kadir, T. & Brady, M. (2003). Scale saliency: a novel approach to salient feature and scale selection. *Proceedings of the International Conference on Visual Information Engineering*, pp. 25–28, ISBN 1-55860-715-3, November, 2000.

Koelstra, S. & Patras, I. (2009). The fast-3D spatio-temporal interest region detector. *Proceedings of the 10th Workshop on Image Analysis for Multimedia Interactive Services*, pp. 242-245, ISBN 978-1-4244-3609-5, London, UK, May 6-8, 2009.

Laganière, R.; Bacco, R.; Hocevar, A.; Lambert, P.; Païs, G. & Ionescu, B.E. (2008). Video summarization from spatio-temporal features. ACM, 2008.

Laptev, I. (2005). On space-time interest points. *International Journal of Computer Vision*, Vol.64, No.2–3, (September 2005), pp. 107–123, ISSN 0920-5691.

Laptev, I. & Lindeberg, T. (2004). Local descriptors for spatiotemporal recognition. *Proceedings of the First International Workshop "Spatial Coherence for Visual Motion Analysis"*, Springer LNCS Vol.3667, pp. 91-103, ISBN 3-540-32533-6, Prague, Czech Republic, May 15, 2004.

Lowe, D. (1999). Object recognition from local scale-invariant features. *Proceedings of the International Conference on Computer Vision*, pp. 1150–1157, ISBN 0-7695-0164-8, Kerkyra, Corfu, Greece, September 20-25, 1999.

MacQueen, J. B. (1967). Some methods for classification and analysis of multivariate observations. *Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability*, pp. 281–297, University of California, USA, June 21-July 18, 1965 and December 27, 1965-January 7, 1966.
