4. Active co-tracking

The co-tracking framework provides a means for classifiers to exchange information. This framework utilizes a utility measure (e.g., the classification confidence in [34]) to select the data that one of the collaborators fails to classify with high confidence and then trains the other classifier on those data. This approach has two main shortcomings: (1) the redundant labeling of all samples for both classifiers and (2) training the collaborator with "all" of the uncertain samples. While the former increases the complexity of the system, the latter is not the optimal solution for tracking a target with non-stationary appearance distributions [35].

In this view, a principled ordering of samples for training [70] and selecting a subset of them based on given criteria [37] can reduce the cost of labeling, leading to a faster performance increase as a function of the amount of data available. It has been found that detectors trained with an effective, noise-free, and outlier-free subset of the training data may achieve higher performance than those trained with the full set [71, 72].

Robust learning algorithms provide an alternative way of differentially treating training examples, by assigning different weights to different training examples or by learning to ignore outliers [73]. Learning first from easy examples [74], pruning adversarial examples<sup>1</sup> [75], and sorting the samples based on their training value [37] are some of the approaches explored in the literature. However, the most common setting is active learning, whereby most of the data is unlabeled and an algorithm selects which training examples to label at each step for the highest gain in performance. Some active learning approaches focus on learning the hardest examples first (e.g., those closest to the decision boundary), whereas others gauge the information contained in each sample and select the most informative ones first. For example, Lewis and Gale [76] utilized the uncertainty of the classifier about a sample as an index of its usefulness for training.
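As a concrete illustration, uncertainty sampling of the kind used by Lewis and Gale can be sketched in a few lines. The scoring convention (signed scores where 0 is the decision boundary) is an assumption for illustration, not the authors' exact formulation:

```python
import numpy as np

def select_uncertain(scores, m):
    """Uncertainty sampling: return the indices of the m samples whose
    (signed) classifier scores lie closest to the decision boundary at 0."""
    order = np.argsort(np.abs(np.asarray(scores, dtype=float)))
    return order[:m].tolist()

# Samples with scores near 0 are the most ambiguous, hence most informative.
print(select_uncertain([0.9, -0.05, 0.4, -0.7, 0.1], m=2))  # → [1, 4]
```

Labeling these ambiguous samples first tends to move the decision boundary the most per labeled example.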

Algorithm 1: Active co-tracking (ACT)

Input: Target position in last frame $\mathbf{p}_{t-1}$

Output: Target position in current frame $\mathbf{p}_t$

1. **for** $j = 1$ **to** $n$ **do**
   1. Generate a sample $\mathbf{p}_t^j \sim \mathcal{N}(\mathbf{p}_{t-1}, \Sigma_{search})$
   2. Calculate $s_t^j \leftarrow h\big(\mathbf{x}_t^{\mathbf{p}_t^j} \mid \theta_t^{(1)}\big)$ (Eq. (6))
2. Determine uncertain samples $\mathcal{U}_t$ (Eq. (7))
3. **for** $j = 1$ **to** $n$ **do**
   1. **if** $\mathbf{p}_t^j \in \mathcal{U}_t$ **then** ($\theta_t^{(1)}$ is uncertain) query $\theta_t^{(2)}$: $l_t^j \leftarrow \operatorname{sign}\big(h\big(\mathbf{x}_t^{\mathbf{p}_t^j} \mid \theta_t^{(2)}\big)\big)$
   2. **else** label using $\theta_t^{(1)}$: $l_t^j \leftarrow \operatorname{sign}\big(s_t^j\big)$
   3. $\mathcal{D}_t \leftarrow \mathcal{D}_t \cup \big\{\big(\mathbf{x}_t^{\mathbf{p}_t^j},\, l_t^j\big)\big\}$
4. Update $\theta_t^{(2)}$ with $\mathcal{D}_{t-\Delta,\ldots,t}$ every $\Delta$ frames ($\Delta = 1$ for T4)
5. **if** $\sum_{j=1}^{n} \mathbf{1}\big(l_t^j > 0\big) > \tau_p$ and $\sum_{j=1}^{n} \pi_t^j > \tau_a$ **then**
   1. Approximate target state $\widehat{\mathbf{p}}_t$ (Eq. (9))
   2. Update $\theta_t^{(1)}$ with $\mathcal{U}_t$
6. **else** (target occluded) $\widehat{\mathbf{p}}_t \leftarrow \mathbf{p}_{t-1}$
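The per-frame loop of Algorithm 1 can be sketched in Python. The classifier objects `naive` (standing in for θ(1)) and `detector` (for θ(2)), their `score`/`label`/`update` methods, and the default parameter values are hypothetical stand-ins introduced for illustration; only the control flow follows the algorithm above:

```python
import numpy as np

def act_step(p_prev, naive, detector, sigma_search, n=200, m=10,
             tau_p=5, tau_a=0.5, rng=None):
    """One frame of active co-tracking (ACT), following Algorithm 1."""
    if rng is None:
        rng = np.random.default_rng(0)
    # Sample n candidate positions around the previous target state.
    P = rng.normal(p_prev, sigma_search, size=(n, len(p_prev)))
    s = np.array([naive.score(p) for p in P])              # Eq. (6)
    # Uncertain set: the m samples whose scores are closest to 0.  Eq. (7)
    U = set(np.argsort(np.abs(s))[:m].tolist())
    # Query the detector for uncertain samples; trust the naive scores otherwise.
    l = np.array([detector.label(P[j]) if j in U else np.sign(s[j])
                  for j in range(n)])                      # Eq. (8)
    pi = s * (l > 0)                                       # positive-sample weights
    if (l > 0).sum() > tau_p and pi.sum() > tau_a:
        p_hat = (pi[:, None] * P).sum(axis=0) / pi.sum()   # Eq. (9)
        naive.update([P[j] for j in sorted(U)], [l[j] for j in sorted(U)])
    else:
        p_hat = np.asarray(p_prev, dtype=float)            # target deemed occluded
    return p_hat
```

For T5, the `detector` would additionally be retrained only every Δ frames rather than at each call.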


Active Collaboration of Classifiers for Visual Tracking http://dx.doi.org/10.5772/intechopen.74199 111

## 4.1. The idea

Active learning has been used in visual tracking to consider the uncertainty caused by bags of samples [55], to reduce the number of necessary labeled samples [77], to unify sample learning and feature selection procedure [78], and to reduce the sampling bias by controlling the variance [79].

In this study, we utilize the sampling uncertainty that can bind active learning and co-tracking. As mentioned earlier, the baseline classifier, despite being accurate, has low generalization on new samples, slow classification speed, and computationally expensive retraining. On the other hand, the auxiliary classifier is agile and learns rapidly, with negligible retraining time. To combine the merits of these two classifiers, to cancel out their demerits with one another, and to address the aforementioned issues of co-tracking (redundant labeling and excessive samples), we incorporate an active learning module to select the most informative data, i.e., those for which the naive classifier is most uncertain, and query their labels from the part-based detector. This architecture (Figure 4, here called T4) mainly uses the naive classifier for labeling the data and only asks for the labels of hard samples from the slower detector; it therefore limits the redundancy and unleashes the speed of the agile classifier. In addition, by training the naive classifier only on hard samples, the generalization of this classifier is preserved while increasing its accuracy.

To further increase the accuracy of the tracker and make it more robust against occlusions and drastic temporal changes of the target, it is possible to update the detector less frequently. This asymmetric version of the active co-tracker (T5), by introducing long-term memory to the tracker, benefits from combining the long- and short-term collaboration (as in [62]) and reduces the frequency of the expensive updates of the tracker (Algorithm 1).

<sup>1</sup> Images with tiny, imperceptible perturbations that fool a classifier into predicting the wrong labels with high confidence


Figure 4. Active co-tracker, a collaborative tracker that utilizes an active query mechanism to query the most informative samples from the main detector and feeds them to the lightweight classifier to learn.


110 Human-Robot Interaction - Theory and Application


#### 4.2. Formalization

In the proposed active co-tracking framework, a main classifier attempts to label the sample and queries the label from the other classifier if it emits uncertain results. This is in contrast with using a linear combination of both classifiers based on their classification accuracy, as adopted in T3. At the CLASSIFYING step, the proposed tracker scores each sample based on the classifier confidence, i.e., for sample $\mathbf{p}_t^j$ we calculate the score $s_t^j$:

$$s_t^j = h\left(\mathbf{x}_t^{\mathbf{p}_t^j} \mid \theta_t^{(1)}\right). \tag{6}$$


Based on uncertainty sampling [76], the samples for which the classification score is most uncertain (i.e., $s_t^j \to 0$) contain more information for the classifier if they are labeled by the other classifier. Therefore, the scores of all samples are sorted, and the $m$ samples with values closest to 0 are selected to be queried from $\theta_t^{(2)}$. To handle situations in which the number of highly uncertain samples exceeds $m$, a range of scores is determined by lower and upper thresholds ($\tau_l$ and $\tau_u$), and all samples in this range are considered highly uncertain:

$$\mathcal{U}_t = \left\{ \mathbf{p}_t^i \;\middle|\; \tau_l < s_t^i < \tau_u \quad \text{or} \quad \left|\left\{\, j \neq i \,:\, |s_t^j| \leq |s_t^i| \,\right\}\right| < m \right\} \tag{7}$$

in which $\mathcal{U}_t$ is the list of uncertain samples. The labels of the samples, $l_t^j \in \mathcal{L}_t$, $j = 1, \ldots, n$, are then determined by

$$l_t^j = \begin{cases} \operatorname{sign}\left(h\left(\mathbf{x}_t^{\mathbf{p}_t^j} \mid \theta_t^{(2)}\right)\right), & \mathbf{p}_t^j \in \mathcal{U}_t \\ \operatorname{sign}\left(h\left(\mathbf{x}_t^{\mathbf{p}_t^j} \mid \theta_t^{(1)}\right)\right), & \mathbf{p}_t^j \notin \mathcal{U}_t \end{cases} \tag{8}$$

and all image patches $\mathbf{x}_t^{\mathbf{p}_t^j}$ and labels $l_t^j$ are stored in $\mathcal{D}_t$.
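A sketch of the uncertain-set construction of Eq. (7) under these definitions. The rank condition uses closeness to zero, following the prose description above; the thresholds and scores are illustrative:

```python
import numpy as np

def uncertain_set(scores, tau_l, tau_u, m):
    """Eq. (7) sketch: a sample is uncertain if its score lies in
    (tau_l, tau_u) or if it is among the m scores closest to the
    decision boundary at 0."""
    s = np.asarray(scores, dtype=float)
    in_band = (s > tau_l) & (s < tau_u)
    closest = np.zeros(len(s), dtype=bool)
    closest[np.argsort(np.abs(s))[:m]] = True
    return np.flatnonzero(in_band | closest).tolist()

print(uncertain_set([0.8, -0.02, 0.05, -0.9, 0.3], tau_l=-0.1, tau_u=0.1, m=1))
# → [1, 2]
```

Only these samples are sent to $\theta_t^{(2)}$; the rest keep the cheap labels from $\theta_t^{(1)}$.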

At the ESTIMATION step, we follow the importance sampling mechanism originally employed by particle filter trackers:

$$\widehat{\mathbf{p}}_t = \frac{\sum_{j=1}^{n} \pi_t^j \, \mathbf{p}_t^j}{\sum_{j=1}^{n} \pi_t^j}. \tag{9}$$

where $\pi_t^j = s_t^j \, \mathbf{1}\big(l_t^j > 0\big)$ and $\mathbf{1}(\cdot)$ is the indicator function (1 if true, zero otherwise). This mechanism approximates the state of the target based on the effect of the positive samples, in which samples with higher scores gravitate the final result more toward themselves. Upon events such as massive occlusion or target loss, this sampling mechanism degenerates [13]. In such cases, the number of positive samples and their corresponding weights shrink significantly, and the importance sampling is prone to outliers, distractors, and occluded patches. To address this issue, if the number of positive samples is less than $\tau_p$ or their score average is less than $\tau_a$, the target is deemed occluded to avoid tracker degeneracy.

Figure 5. Quantitative performance comparison of the asymmetric active co-tracker (T5), active co-tracker (T4), the ordinary co-tracker (T3), and their individual trackers (T1 and T2).
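The estimation step and its degeneracy guard can be sketched as follows. Here `P` holds the sample positions, `s` their scores, and `l` their labels; the concrete values and thresholds are illustrative:

```python
import numpy as np

def estimate_state(P, s, l, p_prev, tau_p, tau_a):
    """Eq. (9) sketch: weighted mean over positive samples, with the
    occlusion guard: fall back to the last state when the positive
    samples are too few or their weights too weak."""
    P, s, l = (np.asarray(a, dtype=float) for a in (P, s, l))
    pi = s * (l > 0)                        # pi_t^j = s_t^j * 1(l_t^j > 0)
    if (l > 0).sum() > tau_p and pi.sum() > tau_a:
        return (pi[:, None] * P).sum(axis=0) / pi.sum()
    return np.asarray(p_prev, dtype=float)  # target deemed occluded

# Two equally confident positives at (0,0) and (2,2) average to (1,1).
print(estimate_state([[0, 0], [2, 2], [4, 4]], [0.5, 0.5, 0.3],
                     [1, 1, -1], p_prev=[9, 9], tau_p=1, tau_a=0.5))
```

With too few positives (e.g., `tau_p` above the positive count), the function simply returns `p_prev`.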

#### 4.3. Evaluation


Figure 5 illustrates the effectiveness of the proposed trackers against their baselines. The active query mechanism in T4 improves the efficiency and effectiveness of co-tracking (T3). Especially in the asymmetric co-tracker (T5), the mixture of long-term and short-term memory classifiers is key to automatically balancing the stability-plasticity equilibrium. It is also prudent for the tracker to adapt to the temporal distribution of the target appearance before its redistribution by illumination changes, etc.

In summary, the advantages of the proposed trackers, especially the asymmetric one (T5), compared to conventional co-tracking (T3) are as follows: (1) the classifiers do not exchange all of the data they have trouble labeling; instead, the most informative samples are selected by uncertainty sampling and exchanged; (2) the update rates of the classifiers differ, realizing a mixture of short- and long-term memory; (3) the samples that are labeled for target localization can be reused for training, so the need for an extra round of sampling and labeling is removed; and (4) since in the proposed asymmetric co-tracking one of the classifiers scaffolds the other instead of participating in every labeling decision, a more sophisticated classifier with higher computational complexity can be used.

#### 5. Active ensemble co-tracking

Ensemble discriminative tracking utilizes a committee of classifiers to label data samples, which are in turn used to retrain the tracker to localize the target using the collective knowledge of the committee. In such frameworks, the labeling process is performed by leveraging a group of classifiers with different views [45, 56, 80], subsets of training data [57, 81], or memories [57, 82].

In ensemble tracking [45, 47, 56, 57, 60, 83–85], the self-learning loop is broken, and the labeling process is performed by eliciting the belief of a group of classifiers. However, this framework typically does not address some of the demands of tracking-by-detection approaches, such as a proper model update to avoid model drift or the non-stationarity of the target sample distribution. Besides, ensemble classifiers do not exchange information, whereas collaborative classifiers entirely trust the other classifier to label the challenging samples for them and are thus susceptible to label noise.

Traditionally, ensemble trackers were used to provide a multi-view classification of the target, realized by using different features to construct weak classifiers. In this view, different classifiers represent different hypotheses in the version space to accurately model the target appearance. Such hypotheses are highly overlapping; therefore, an ensemble of them overfits the target. The desired committee, however, consists of competing hypotheses, all consistent with the training data but each specialized in a certain aspect. In this view, the most informative data samples are those about which the hypotheses disagree the most, and by labeling them, the version space is minimized, leading to quick convergence yet accurate classification [86]. Motivated by this, we propose a tracker that employs a randomized ensemble of classifiers and selects the most informative data samples to be labeled.


#### 5.1. The idea

To create ensembles of classifiers, researchers typically make different classifiers by altering the features [45], using a pool of appearance and dynamics models [87], utilizing different memory horizons [82], and employing previous snapshots of a classifier at different times [57]. However, creating a collaborative mechanism in the ensemble, where classifiers exchange information, is hardly addressed in the visual tracking literature. This data exchange can take the form of query passing between ensemble members, in which the queries can be the samples about which a classifier, or even the whole ensemble, is most uncertain.

Selecting such queries is addressed in different machine learning domains such as curriculum learning [74] and active learning. The Query-by-Committee (QBC) algorithm [86, 88] is an active learning approach for ensembles that selects the most informative query to pass within a committee of models that are all trained on the current labeled set but represent competing hypotheses. The label of the queried sample is then decided by the vote of the ensemble members, and the samples on which the ensemble disagrees the most are selected as the next query to ask the teacher (here, the auxiliary classifier). In this case, where the task is binary classification, the most disputed sample (i.e., with close positive and negative votes) is the most informative, since learning its label would maximally train the ensemble. Training with the external label for this sample shrinks the version space (i.e., the space of all hypotheses consistent with the training data) such that it remains consistent with the hypotheses of all classifiers but rejects more potentially incorrect ones.
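For binary labels, the QBC notion of "most disputed" reduces to the smallest absolute vote margin. A minimal sketch, with an illustrative vote matrix:

```python
import numpy as np

def most_disputed(votes):
    """QBC query selection for binary labels: `votes` is an
    (n_samples, n_members) array in {-1, +1}; the sample with the
    smallest |#positive - #negative| margin is the most informative."""
    margins = np.abs(np.asarray(votes).sum(axis=1))
    return int(np.argmin(margins))

votes = [[+1, +1, +1, +1],   # unanimous: carries little information
         [+1, -1, +1, -1],   # split 2-2: maximally disputed
         [+1, +1, -1, +1]]
print(most_disputed(votes))  # → 1
```

The selected sample's label would then be queried from the teacher (here, the auxiliary classifier) rather than decided by the committee vote.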

Figure 6. Active ensemble co-tracker. The bagging-induced ensemble labels the input samples and only queries the most disputed ones from the slow part-based classifier.

QBC was originally designed to work with stochastic learning algorithms, which limits its use with non-probabilistic or deterministic models. To alleviate this problem, Abe and Mamitsuka [89] enabled deterministic classifiers to work with random subsets of the training data to create different variations of the same learning model. By creating a temporary ensemble using this "bagging" procedure [90], they realized Query-by-Bagging (QBag) to enhance the learning speed and generalization of the base learning algorithm.

We propose an adjustment of the QBag algorithm for online training to solve the label noise problem in T6. Similar to T5, the drift problem is handled using a dual-memory strategy: the committee rapidly adapts to target changes, whereas the main classifier possesses a longer memory to promote the stability of the target template (Figure 6).

### 5.2. Formalization


An ensemble discriminative tracker employs a set of classifiers instead of one. These classifiers, hereafter called the committee, are represented by $\mathcal{C} = \big\{\theta_t^{(1)}, \ldots, \theta_t^{(C)}\big\}$ and are typically homogeneous and independent (e.g., [56, 85]). Popular ensemble trackers utilize the majority voting of the committee as their utility function:

$$\mathbf{s}_t^j = \sum_{c=1}^{C} \operatorname{sign}\left(h\left(\mathbf{x}_t^{\mathbf{p}_{t-1} \ast \mathbf{y}_t^j} \,\middle|\, \boldsymbol{\theta}_t^{(c)}\right)\right). \tag{10}$$

Eq. (8) is then used to label the samples. Finally, the model is updated for each classifier independently, meaning that each committee member is trained with a random subset of the uncertain set, $\theta_{t+1}^{(c)} = u\big(\theta_t^{(c)}, \Gamma_t^{(c)} \subset \mathcal{U}_t\big)$, where $u(\theta, X)$ updates the model $\theta$ with the samples $X$ and $\Gamma_t^{(c)}$ denotes a random subset drawn for member $c$. The uncertain set $\mathcal{U}_t$ contains all of the samples on which the ensemble disagrees and which were sent to the auxiliary classifier for labeling. The detector $\theta_t^{(o)}$ is also updated with all recent data $\mathcal{D}_{t-\Delta,\ldots,t}$ every $\Delta$ frames.
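A minimal sketch of this independent update step, assuming `update_fn(theta, X)` stands in for the abstract operator $u(\theta, X)$ and that each member draws its own random half of the uncertain set (the function names and the 50% subset ratio are illustrative assumptions, not from the chapter):

```python
import random

def ensemble_update(committee, update_fn, uncertain_set, subset_ratio=0.5):
    """Each committee member is retrained on its own random subset of the
    uncertain (auxiliary-labeled) samples, keeping the members decorrelated."""
    k = max(1, int(subset_ratio * len(uncertain_set)))
    return [update_fn(theta, random.sample(uncertain_set, k))
            for theta in committee]

# toy example: a "model" is just the list of samples it has seen,
# and updating a model extends that list
committee = [[], [], []]
update = lambda theta, X: theta + X
uncertain = [('x1', +1), ('x2', -1), ('x3', +1), ('x4', -1)]
committee = ensemble_update(committee, update, uncertain)
print([len(theta) for theta in committee])  # each member trained on 2 samples
```

Drawing a different random subset per member keeps the committee diverse, which is what makes the disagreement signal of Eq. (10) informative in the first place.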

6. Discussion

The instances of the proposed framework are evaluated against state-of-the-art trackers on public sequences that have become the de facto standard for benchmarking trackers. The trackers are compared using popular metrics such as the success plot and the precision plot to establish a fair benchmark. In addition, the performance of the proposed trackers is investigated on videos with a distinguished tracking challenge, and the results are compared with the state of the art and discussed. The effect of the exchanged information is also examined thoroughly to illustrate the dynamics of the system. The preliminary results demonstrate superior performance of the proposed trackers when applied to all sequences and to most of the subsets of the test dataset with distinguished challenges. Finally, future research directions are discussed, and the opened research avenues are introduced to the field.

Active Collaboration of Classifiers for Visual Tracking http://dx.doi.org/10.5772/intechopen.74199 117

As Figure 7 and Table 2 demonstrate, T6 has the best overall performance among the investigated trackers on this dataset. While this algorithm has a clear edge in handling many challenges, its performance is comparable to that of T5 in the case of occlusions and in-plane rotations (z-rotations). It is also evident that T6 struggles with fast deformations, since none of the ensemble members is specialized in handling a specific type of deformation, and the collective decision of the ensemble may involve high-confidence mistakes. On the other hand, T5 utilizes a dual-memory scheme, and a single classifier can handle extreme temporal deformations better than the ensemble in

| Tracker | IV | DEF | OCC | SV | IPR | OPR | OV | LR | BC | FM | MB | ALL |
|---------|----|-----|-----|----|-----|-----|----|----|----|----|----|-----|
| T0 | 12 | 12 | 13 | 12 | 13 | 13 | 14 | 5 | 12 | 15 | 18 | 14 |
| T1 | 37 | 29 | 3 | 36 | 42 | 39 | 43 | 30 | 33 | 39 | 36 | 38 |
| T2 | 23 | 19 | 23 | 23 | 28 | 25 | 25 | 22 | 23 | 24 | 20 | 25 |
| T3 | 41 | 32 | 39 | 40 | 44 | 42 | 43 | 30 | 36 | 43 | 39 | 41 |
| T4 | 50 | 39 | 47 | 48 | 53 | 49 | 48 | 37 | 44 | 50 | 45 | 49 |
| T5 | 52 | 47 | 53 | 51 | 59 | 56 | 52 | 38 | 41 | 53 | 46 | 52 |
| T6 | 57 | 40 | 51 | 53 | 61 | 55 | 63 | 46 | 53 | 60 | 58 | 56 |
| TLD | 49 | 32 | 42 | 44 | 50 | 43 | 45 | 37 | 40 | 45 | 42 | 46 |
| STRK | 46 | 41 | 44 | 43 | 51 | 48 | 44 | 39 | 39 | 52 | 48 | 48 |
| CSK | 40 | 36 | 36 | 34 | 43 | 39 | 32 | 29 | 42 | 39 | 32 | 41 |
| MIL | 35 | 35 | 38 | 35 | 41 | 39 | 40 | 32 | 31 | 35 | 28 | 36 |
| BSBT | 23 | 18 | 23 | 21 | 27 | 24 | 32 | 23 | 23 | 26 | 24 | 25 |

Table 2. Quantitative evaluation of the state of the art under different visual tracking challenges using the AUC of the success plot (%). The first, second, and third best methods are shown in color. The challenges are illumination variation (IV), scale variation (SV), occlusion (OCC), deformation (DEF), motion blur (MB), fast motion (FM), in-plane rotation (IPR), out-of-plane rotation (OPR), out-of-view (OV), background clutter (BC), and low resolution (LR).

Figure 7. Quantitative performance comparison of the active ensemble co-tracker (T6) with its predecessors.

5.3. Evaluation

Figure 7 depicts the overall performance of the proposed tracker against the other benchmarked algorithms on all sequences of the dataset. The plots show that T6 outperforms T5 and its predecessors. The steep slope over $1 \geq \tau_{ov} > 0.9$ indicates the high quality of the predictions (i.e., more predictions have a high overlap with the ground truth rather than being only partially correct), and the second slope around $\tau_{ov} \approx 0.4$, along with the high success rate near $\tau_{ov} \to 0$, indicates that the algorithm succeeded in continuing to track despite all the tracking challenges.
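A success plot of the kind shown in Figure 7 can be computed from per-frame overlap (IoU) scores: for each threshold $\tau_{ov}$, count the fraction of frames whose overlap exceeds it, then average over thresholds to obtain the AUC. The sketch below uses made-up overlap values and an illustrative 21-point threshold grid:

```python
import numpy as np

def success_curve(overlaps, thresholds=np.linspace(0, 1, 21)):
    """Success plot: fraction of frames whose predicted box overlaps the
    ground truth (IoU) by more than each threshold tau_ov."""
    overlaps = np.asarray(overlaps)
    return np.array([np.mean(overlaps > t) for t in thresholds])

overlaps = [0.93, 0.88, 0.47, 0.12]  # made-up per-frame IoU scores
curve = success_curve(overlaps)
auc = float(curve.mean())            # AUC of the success plot
print(round(auc, 3))
```

A curve that stays high up to large $\tau_{ov}$ reflects tightly fitting predictions, while a high value near $\tau_{ov} \to 0$ reflects that the target was rarely lost entirely, which is the reading applied to Figure 7 above.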
