**1. Introduction**

The purpose of visual tracking for outdoor vehicles is to estimate the vehicle state and provide the current traffic state accurately and comprehensively. At present, it has become an important part of intelligent transport systems (ITS). However, robust tracking of outdoor vehicles is still a challenging problem due to the complex and varying outdoor environment.

© 2016 The Author(s). Licensee InTech. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Many researchers have proposed solutions for these challenging environments. Rad [1] proposed a strategy that solves the occlusion problem when tracking moving vehicles on highways, but its tracking accuracy is greatly reduced when lighting conditions change sharply. Zhang et al. [2] proposed a multi-layer occlusion detection and processing framework to handle mutual occlusion between two vehicles. Faro et al. [3] further improved [2] by introducing curvature scale space to segment occluded regions accurately. Xin et al. [4] proposed adaptive integration of multiple cues for robust outdoor vehicle visual tracking within the particle filter framework; this method is strongly robust against color interference and against partial or complete occlusion of vehicles. Although these existing methods have made certain progress in outdoor vehicle visual tracking, they can only handle occlusion between two vehicles or occlusion of a vehicle by other objects. In actual traffic scenes, however, mutual occlusion among multiple vehicles occurs frequently, together with the challenges of complex outdoor environments such as illumination variation (IV), background clutter (BC), and fast motion (FM). Therefore, robust visual tracking of outdoor vehicles remains a thorny issue.

Existing visual object tracking algorithms fall into two major categories: generative models and discriminative models. A generative model learns an appearance representation of the object and searches for the candidate region that best matches the appearance template as the object location in the new frame. A discriminative model treats tracking as a binary classification problem, using learned features to separate the object from the background. The extraction of robust features is therefore key to the success of object tracking. Traditional visual tracking methods rely on handcrafted features, and such low-dimensional features are not robust to large appearance variations of the object. Recently, deep learning has shown promising performance in automatic feature extraction, outperforming methods based on pre-defined handcrafted features: it can map the original feature space to another feature space and learn richer features. Owing to this powerful feature-learning capability, deep learning has been widely applied to image processing, speech recognition, natural language processing, health care, robotics, and other fields. It has also been shown that feature representations learned by deep networks tend to be sparse, and a k-sparse constraint can guarantee a fixed sparsity level for each input. Some scholars have applied deep learning to video object tracking, and its powerful feature representation ability has greatly improved tracking robustness. Wang et al. [5] proposed the fully convolutional network tracker (FCNT), which uses convolutional neural networks to learn object features from large-scale classification datasets and further analyzes the performance of the extracted features for object tracking.
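To make the generative/discriminative distinction concrete, a generative tracker in its simplest form scans candidate windows in the new frame and picks the one closest to the learned appearance template. The following toy sketch illustrates this idea only; the function name, the sum-of-squared-differences distance, and the grid search are our own illustrative choices, not a method from this chapter:

```python
import numpy as np

rng = np.random.default_rng(0)

def generative_track(frame, template, stride=4):
    """Slide the appearance template over the frame and return the
    top-left corner of the best-matching candidate window."""
    th, tw = template.shape
    best, best_pos = np.inf, (0, 0)
    for y in range(0, frame.shape[0] - th + 1, stride):
        for x in range(0, frame.shape[1] - tw + 1, stride):
            cand = frame[y:y + th, x:x + tw]
            d = np.sum((cand - template) ** 2)  # appearance distance
            if d < best:
                best, best_pos = d, (y, x)
    return best_pos

# toy frame with the template pasted at a known position
frame = np.zeros((64, 64))
template = rng.random((8, 8)) + 1.0
frame[12:20, 20:28] = template
pos = generative_track(frame, template)
```

A discriminative tracker would instead train a classifier on object and background patches; the candidate search loop stays the same, but the score comes from the classifier rather than from a distance to the template.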
High-level features are good at distinguishing different kinds of objects and are very robust to appearance variations of the object. Low-level features focus more on local details of the object and can be used to distinguish similar distractors in the background. By making effective use of features from different layers of a convolutional neural network (CNN), FCNT can effectively prevent tracking drift. Nam et al. [6] subsequently proposed the multi-domain convolutional neural network (MDNet). Unlike [5], MDNet directly uses tracking video datasets to train the deep learning model and obtain a universal feature representation of the object, and then fine-tunes the network parameters for each particular video sequence during online tracking to achieve more robust tracking. However, the tracking speed of MDNet is slow and cannot meet real-time requirements. Existing research has demonstrated that sparsity is encouraged when deep learning learns feature representations. Because sparse representation reduces the complexity of the representation, which is crucial for improving the speed of tracking algorithms, sparse constraints can be used to further optimize the deep network [7, 8] and make the representation of the original signal more meaningful, as verified by independent component analysis and sparse coding algorithms [9]. In general, there are two ways to add sparse constraints to a deep network: sparsifying the hidden-layer responses or sparsifying the weights between the hidden layer and the input layer. In this chapter, we adopt the first approach. Specifically, we impose a k-sparse constraint in the neural network, keeping only the k highest activities in the hidden layers, which maintains a fixed sparsity of the representation for each input [10].
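The k-sparse constraint itself is simple: after computing the hidden activations, keep only the k largest values in each input's activation vector and zero out the rest. A minimal NumPy sketch (the function name and array shapes are our own illustrative choices):

```python
import numpy as np

def k_sparse(hidden, k):
    """Keep the k largest activations in each row; zero the rest.

    hidden: (batch, units) array of hidden-layer activations.
    """
    idx = np.argpartition(hidden, -k, axis=1)[:, -k:]  # top-k per row
    out = np.zeros_like(hidden)
    rows = np.arange(hidden.shape[0])[:, None]
    out[rows, idx] = hidden[rows, idx]
    return out

h = np.array([[0.1, 0.9, 0.3, 0.7],
              [0.5, 0.2, 0.8, 0.4]])
print(k_sparse(h, 2))  # each row keeps only its two largest activities
```

Because exactly k units survive per input, the sparsity of the hidden representation is fixed regardless of how the raw activations are distributed.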
Concretely, we add the k-sparse constraint to the hidden-layer units of the original stacked denoising autoencoder (SDAE), forming the k-sparse SDAE (kSSDAE), which serves as a feature extractor in object tracking to better learn invariant features of the object appearance. Applying kSSDAE to object tracking thus addresses the poor-robustness problem and further improves the robustness of visual tracking. The main contributions of this chapter are as follows.
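A single layer of this idea can be sketched as a denoising autoencoder whose encoder output passes through the k-sparse constraint before reconstruction. The sketch below is our own illustration under simplifying assumptions (tied weights, sigmoid units, plain gradient descent, and arbitrary hyper-parameters), not the chapter's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class KSparseDenoisingAE:
    """One layer in the spirit of kSSDAE: corrupt the input, encode,
    keep only the k largest hidden activities, and train the decoder
    to reconstruct the clean input."""

    def __init__(self, n_in, n_hidden, k, noise=0.3, lr=0.1):
        self.W = rng.normal(0.0, 0.1, (n_in, n_hidden))  # tied weights
        self.b = np.zeros(n_hidden)  # encoder bias
        self.c = np.zeros(n_in)      # decoder bias
        self.k, self.noise, self.lr = k, noise, lr

    def encode(self, x):
        h = sigmoid(x @ self.W + self.b)
        # k-sparse constraint: zero all but the k largest activities
        idx = np.argpartition(h, -self.k, axis=1)[:, -self.k:]
        mask = np.zeros_like(h)
        mask[np.arange(h.shape[0])[:, None], idx] = 1.0
        return h * mask

    def train_step(self, x):
        # denoising: randomly drop a fraction of the clean inputs
        x_noisy = x * (rng.random(x.shape) > self.noise)
        h = self.encode(x_noisy)
        x_hat = sigmoid(h @ self.W.T + self.c)  # reconstruction
        # backprop of the squared reconstruction error
        d_out = (x_hat - x) * x_hat * (1.0 - x_hat)
        d_hid = (d_out @ self.W) * h * (1.0 - h)  # masked units give 0
        self.W -= self.lr * (x_noisy.T @ d_hid + d_out.T @ h)
        self.b -= self.lr * d_hid.sum(axis=0)
        self.c -= self.lr * d_out.sum(axis=0)
        return float(np.mean((x_hat - x) ** 2))

# quick sanity run on random binary patches
x = (rng.random((32, 8)) > 0.5).astype(float)
ae = KSparseDenoisingAE(n_in=8, n_hidden=16, k=4)
losses = [ae.train_step(x) for _ in range(300)]
```

Stacking several such layers and training them greedily, as with an ordinary SDAE, would give the deep feature extractor used for tracking.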

