**3.1. Construction of classification neural network**

The main function of this module is to compute the confidence of each particle during the online tracking. Here, confidence is used for evaluating every particle's reliability. In this chapter, the classification neural network can be constructed by connecting the encoder of the well-trained kSSDAE with a classification layer as shown in **Figure 2**.

In the feedforward phase, the hidden activation vector *z* is computed as

$$z = \mathcal{W}^T \mathbf{x} + \mathbf{b} \tag{1}$$

**Figure 2.** Architecture of classification neural network (1024-2560-1024-512-256-1).

where *x* is the input vector, *W* is the weight matrix, and *b* is the bias vector. We keep the *k* largest hidden activations and set the others to zero.

Reconstruction error can be computed using the sparsified *z* as follows:

$$E = \left\| \mathbf{x} - \left( \mathbf{W} \mathbf{z} + \mathbf{b}' \right) \right\|\_{2}^{2} \tag{2}$$

In the backpropagation phase, the weights are adjusted by backpropagating the reconstruction error through only the *k* highest activations. The confidence computed by the classification neural network reflects the credibility of the decision in the classifier's feature-vector space. Ref. [27] has proved that when mean square error or cross-entropy is used as the cost function, the output expectation of a multi-layer neural network equals the posterior probability of each class.
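The feedforward pass of Eq. (1), the k-sparse constraint, and the reconstruction error of Eq. (2) can be sketched as follows. This is a minimal NumPy illustration with tied weights; the function names and shapes are assumptions for the sketch, not the chapter's implementation.

```python
import numpy as np

def ksparse_encode(x, W, b, k):
    """Eq. (1), z = W^T x + b, followed by the k-sparse constraint:
    keep the k largest hidden activations and set the others to zero."""
    z = W.T @ x + b
    small = np.argsort(z)[:-k]   # indices of all but the k largest activations
    z_sparse = z.copy()
    z_sparse[small] = 0.0
    return z_sparse

def reconstruction_error(x, z, W, b_prime):
    """Eq. (2): squared reconstruction error ||x - (W z + b')||_2^2."""
    return float(np.sum((x - (W @ z + b_prime)) ** 2))
```

In training, backpropagating this error only through the surviving k units realizes the sparsity-aware weight update described above.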

Let $o\_i$ be the output of the neural network corresponding to class $k\_i$; the output expectation is then given by the posterior probability

$$E(o\_i) = P(k\_i|\mathbf{x})\tag{3}$$

Generally, the class with the maximum probability is taken as the decision, so the confidence can be obtained from the maximum output of the classification neural network:

$$c(\mathbf{x}) = E\left(\max\_{i} o\_i\right) \tag{4}$$

At the beginning of the visual tracking, we select the object to be tracked and fine-tune the classification neural network using positive and negative samples. During online tracking, in order to adapt to object-specific appearance changes, we fine-tune the classification neural network again whenever the confidence calculated by the network falls below a predefined threshold.
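The confidence-gated update rule above can be sketched as a small helper. The `fine_tune` callback is a hypothetical stand-in for the sample reselection and retraining step; only the gating logic is shown.

```python
def check_and_update(confidences, tau, fine_tune):
    """Confidence gate of the online tracker: if the best particle's
    confidence (Eq. (4)) falls to or below the threshold tau, trigger
    fine-tuning via the caller-supplied `fine_tune` callback."""
    if max(confidences) <= tau:
        fine_tune()      # reselect positive/negative samples and retrain
        return True      # model was updated
    return False         # confident enough: proceed to Eq. (5)
```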

#### **3.2. Estimation of the object state**

Object state can be estimated by the object tracking algorithm, which can be viewed as the problem of estimating the posterior distribution $p(s\_t | y\_{1:t})$ of the state $s\_t$ at time *t* according to the dynamic model $p(s\_t | s\_{t-1})$ of the object state. In this chapter, the object state $s\_t$ is represented by six affine transformation parameters corresponding to horizontal translation, vertical translation, rotation angle, scale, aspect ratio, and skewness, and the state transition distribution $p(s\_t^i | s\_{t-1}^i)$ of each dimension is modeled as a zero-mean normal distribution. The purpose of visual object tracking is to estimate the object state $s\_t$ (location, scale, etc.) from the image sequence given all observations, using an appropriate estimation criterion, for example, maximum a posteriori (MAP) estimation or minimum mean square error (MMSE) estimation. The main online tracking steps under the particle filter framework are as follows.
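The zero-mean Gaussian state transition described above can be sketched as follows. Each particle is a six-dimensional affine parameter vector; the standard deviations in the usage example are illustrative assumptions, not values from the text.

```python
import numpy as np

def propagate_particles(particles, sigma, rng):
    """State transition p(s_t^i | s_{t-1}^i): perturb each of the six
    affine parameters (dx, dy, rotation, scale, aspect ratio, skew)
    with independent zero-mean Gaussian noise of per-dimension std sigma."""
    noise = rng.normal(0.0, sigma, size=particles.shape)
    return particles + noise
```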

#### *3.2.1. Computing observation probability*

Each particle represents a possible instantiation of the state of the object being tracked; the most likely particle represents the object state at time *t*. The confidence $c\_t^i$ of each particle can be calculated by the classification neural network. When the maximum confidence is lower than the predefined threshold *τ*, that is, if $\max\_i c\_t^i \le \tau$, we fine-tune the classification neural network by reselecting positive and negative training samples. If $\max\_i c\_t^i > \tau$, we calculate the observation probability by normalizing the confidence

$$p\left(y\_t|s\_t^i\right) \propto c\_t^i, \quad i = 1, 2, \ldots, n \tag{5}$$
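The proportionality in Eq. (5) is realized by normalizing the confidences so they sum to one; a one-line sketch:

```python
import numpy as np

def observation_probabilities(confidences):
    """Eq. (5): p(y_t | s_t^i) ∝ c_t^i, implemented by normalizing
    the particle confidences into a probability vector."""
    c = np.asarray(confidences, dtype=float)
    return c / c.sum()
```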

#### *3.2.2. Updating weight*

The weights for each particle can be updated according to the observation probability

$$w\_t^i = w\_{t-1}^i \cdot \frac{p\left(y\_t|\mathbf{s}\_t^i\right)p\left(\mathbf{s}\_t^i|\mathbf{s}\_{t-1}^i\right)}{q\left(\mathbf{s}\_t|\mathbf{s}\_{t-1}, y\_{1:t}\right)}\tag{6}$$

where $q(s\_t | s\_{t-1}, y\_{1:t})$ is the importance distribution, which is often chosen as the state transition distribution $p(s\_t | s\_{t-1})$, i.e., the state transition is assumed to be independent of the observations. The weights are then updated as $w\_t^i = w\_{t-1}^i \cdot p(y\_t | s\_t^i)$.

Finally, object state can be estimated by taking the particle with the largest weight at each time step.
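The simplified weight update and the largest-weight state estimate can be sketched together. This assumes the transition prior is used as the importance distribution, as stated above.

```python
import numpy as np

def update_and_estimate(weights_prev, obs_prob, particles):
    """Simplified Eq. (6): w_t^i = w_{t-1}^i * p(y_t | s_t^i), followed
    by renormalization; the estimate is the highest-weight particle."""
    w = weights_prev * obs_prob
    w = w / w.sum()                  # renormalize to a probability vector
    best = int(np.argmax(w))         # particle with the largest weight
    return w, particles[best]
```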

The implementation process of the proposed kSSDAE-based tracker is given as follows:

#### **Algorithm** Outdoor Vehicle Tracking

**Input**: Training samples; Video frame *t*.

Training SDAE offline;

Constructing classification neural network;

Connecting the encoder part of kSSDAE and a classification layer as shown in **Figure 2**;

Adding k sparse constraint into classification neural network;

**For** *t* = 1, 2, …, *N* (frame number) **do**

Sampling particles $S\_t = \{s\_t^i\}\_{i=1}^{n}$;

Calculate confidence for each particle by (4);

**If** *t* = 1

Sampling positive and negative samples;

Fine-tuning classification neural network;

**end**

**If** $\max\_i c\_t^i \le \tau$

Sampling positive and negative samples;

Fine-tuning classification neural network.

**Else**

Calculating observation probability by (5);

Updating weights by (6);

*t* = *t* + 1.

**end**

**end**

**Output**: Object state
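The algorithm above can be condensed into a skeleton loop. Every component (confidence evaluation, fine-tuning, particle propagation) is passed in as a callback, so this is a structural sketch under assumed interfaces, not the full tracker.

```python
import numpy as np

def run_tracker(frames, init_particles, confidence_fn, fine_tune_fn,
                propagate_fn, tau):
    """Skeleton of the kSSDAE-based tracking loop: propagate particles,
    score them with the classifier, fine-tune at t = 1 or when confidence
    drops to tau or below, then update weights and pick the best particle."""
    particles = init_particles
    weights = np.full(len(particles), 1.0 / len(particles))
    states = []
    for t, frame in enumerate(frames):
        particles = propagate_fn(particles)               # sample S_t
        conf = np.array([confidence_fn(frame, p) for p in particles])
        if t == 0 or conf.max() <= tau:
            fine_tune_fn(frame)                           # reselect samples, retrain
        obs = conf / conf.sum()                           # Eq. (5)
        weights = weights * obs
        weights = weights / weights.sum()                 # Eq. (6), simplified
        states.append(particles[int(np.argmax(weights))]) # largest-weight particle
    return states
```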
