5.3. Evaluation

116 Human-Robot Interaction - Theory and Application

Figure 7 depicts the overall performance of the proposed tracker against the other benchmarked algorithms on all sequences of the dataset. The plots show that T6 outperforms T5 and its predecessors. The steep slope for 0.9 < τov ≤ 1 indicates the high quality of the predictions (i.e., more predictions have a large overlap with the ground truth, rather than being only partially correct), and the second slope around τov ≈ 0.4, together with the high success rate as τov → 0, indicates that the algorithm continued tracking successfully despite all the tracking challenges.
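The success plot underlying Figure 7 (and the AUC scores in Table 2) can be computed from per-frame overlaps between predicted and ground-truth boxes. Below is a minimal sketch, assuming boxes in (x, y, w, h) format; the function names are illustrative and not the chapter's implementation:

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection-over-union (overlap) of two boxes given as (x, y, w, h)."""
    xa, ya = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    xb = min(box_a[0] + box_a[2], box_b[0] + box_b[2])
    yb = min(box_a[1] + box_a[3], box_b[1] + box_b[3])
    inter = max(0.0, xb - xa) * max(0.0, yb - ya)
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0

def success_plot(pred_boxes, gt_boxes, thresholds=np.linspace(0, 1, 101)):
    """Fraction of frames whose overlap exceeds each threshold tau_ov."""
    overlaps = np.array([iou(p, g) for p, g in zip(pred_boxes, gt_boxes)])
    return np.array([np.mean(overlaps > t) for t in thresholds])

def auc(success_rates):
    """Area under the success plot in percent, i.e., the average success
    rate over the overlap thresholds (the style of score used in Table 2)."""
    return 100.0 * np.asarray(success_rates).mean()
```

A curve that stays flat near 1.0 and drops only at high τov corresponds to the high-quality localization described above.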

Figure 7. Quantitative performance comparison of the active ensemble co-tracker (T6) with its predecessors.

The instances of the proposed framework are evaluated against state-of-the-art trackers on public sequences that have become the de facto standard for benchmarking trackers. The trackers are compared using popular metrics such as the success plot and the precision plot to establish a fair benchmark. In addition, the performance of the proposed trackers is investigated on videos featuring a particular tracking challenge, and the results are compared with the state of the art and discussed. Additionally, the effect of the exchanged information is examined thoroughly to illustrate the dynamics of the system. The preliminary results demonstrate a superior performance for the proposed trackers when applied to all the sequences and to most of the challenge-specific subsets of the test dataset. Finally, future research directions are discussed, and the research avenues they open are introduced to the field.
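Of the two metrics mentioned above, the precision plot is based on the center location error rather than overlap. A minimal sketch under the same illustrative (x, y, w, h) box convention; the 20-pixel summary threshold is the common convention in the benchmarking literature, not something specified in this chapter:

```python
def center_error(box_a, box_b):
    """Euclidean distance in pixels between box centers; boxes are (x, y, w, h)."""
    ca = (box_a[0] + box_a[2] / 2.0, box_a[1] + box_a[3] / 2.0)
    cb = (box_b[0] + box_b[2] / 2.0, box_b[1] + box_b[3] / 2.0)
    return ((ca[0] - cb[0]) ** 2 + (ca[1] - cb[1]) ** 2) ** 0.5

def precision_plot(pred_boxes, gt_boxes, pixel_thresholds=range(0, 51)):
    """Fraction of frames whose center error is within each pixel threshold.
    The value at 20 px is conventionally reported as the precision score."""
    errors = [center_error(p, g) for p, g in zip(pred_boxes, gt_boxes)]
    return [sum(e <= t for e in errors) / len(errors) for t in pixel_thresholds]
```

Because the center error ignores box size, the precision plot can rate a tracker highly even when its scale adaptation is poor, which is consistent with the behavior reported for T6 later in this section.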

As Figure 7 and Table 2 demonstrate, T6 has the best overall performance among the investigated trackers on this dataset. While this algorithm has a clear edge in handling many challenges, its performance is comparable with T5 in the case of occlusions and z-rotations. It is also evident that T6 struggles with fast deformations, since none of the ensemble members is specialized in handling a specific type of deformation, and the collective decision of the ensemble may involve mistakes made with high confidence. On the other hand, T5 utilizes a dual-memory scheme, and its single classifier can handle extreme temporal deformations better than the ensemble in T6.


The first-, second-, and third-best methods are shown in color. The challenges are illumination variation (IV), scale variation (SV), occlusion (OCC), deformation (DEF), motion blur (MB), fast motion (FM), in-plane rotation (IPR), out-of-plane rotation (OPR), out-of-view (OV), background clutter (BC), and low resolution (LR)

Table 2. Quantitative evaluation of state-of-the-art trackers under different visual tracking challenges using the AUC of the success plot (%).

The algorithm selects the most informative samples to learn from the long-term memory auxiliary detector, which realizes a gradually decreasing dependence on this slow and likely overfit detector while remaining robust against fluctuations in target appearance and occlusions. Furthermore, using an expectation of the bounding boxes compensates for the tracker's overreliance on the classifiers' confidence function. The balance in the stability-plasticity equilibrium is achieved by combining several short-term classifiers with a long-term classifier and managing their interaction with an active learning mechanism.
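The two mechanisms described above, a confidence-weighted expectation over the members' bounding boxes and the selection of the most informative samples, can be sketched roughly as follows. This is an illustrative sketch only: the function names are hypothetical, and the chapter's actual formulation of informativeness and the expectation may differ.

```python
import numpy as np

def expected_box(boxes, confidences):
    """Confidence-weighted expectation of the members' bounding boxes,
    instead of trusting the single most confident classifier.
    boxes: (n, 4) array of (x, y, w, h); confidences: (n,) scores."""
    w = np.asarray(confidences, dtype=float)
    w = w / w.sum()  # normalize confidences into weights
    return np.asarray(boxes, dtype=float).T @ w

def most_informative(confidences, k=1):
    """Uncertainty-style sampling: pick the k samples the committee is least
    certain about (score closest to 0.5), to be resolved by the slower
    long-term auxiliary detector."""
    c = np.asarray(confidences, dtype=float)
    return np.argsort(np.abs(c - 0.5))[:k]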

Active Collaboration of Classifiers for Visual Tracking http://dx.doi.org/10.5772/intechopen.74199 119
6. Discussion

The trail of proposed trackers led to T6, which incorporates ensemble tracking, active learning, and co-learning in a discriminative tracking framework and outperforms state-of-the-art discriminative and generative trackers on a large video dataset with various types of challenges such as appearance changes and occlusions.

The future direction of this study involves other detectors to take care of context, accurate physical models for known object categories, deep features to improve discrimination, and an examination of different methods of building the ensemble, detecting the most informative samples, or exchanging information.

Acknowledgements

This article is based on results obtained from a project commissioned by the Japan NEDO and was supported by the Post-K application development for exploratory challenges from the Japan MEXT.

Author details

Kourosh Meshgi\* and Shigeyuki Oba

\*Address all correspondence to: meshgi-k@sys.i.kyoto-u.ac.jp

Graduate School of Informatics, Kyoto University, Kyoto, Japan

References

[1] Borangiu T. Visual conveyor tracking in high-speed robotics tasks. In: Industrial Robotics: Theory, Modelling and Control. Rijeka, Croatia: InTech; 2006

[2] Cech J, Mittal R, Deleforge A, Horaud R. Active-speaker detection and localization with microphones and cameras embedded into a robotic head. In: Humanoids'13; 2013

[3] Cosgun A, Florencio DA, Christensen HI. Autonomous person following for telepresence robots. In: ICRA'13; IEEE; 2013. pp. 4335-4342

Figure 8. Qualitative results of T6 (in red) against other trackers (T0–T5 in blue and TLD, STRK, CSK, MIL, and BSBT in gray) on challenging video scenarios of OTB-100 [65]. The sequences are (from top to bottom, left to right) FaceOcc2 and Walking2 with severe occlusion, Deer and Skating1 with abrupt motions, Girl and Ironman with drastic rotations, Singer1, CarDark, and Shaking with poor lighting, Jumping and Basketball with nonrigid deformations, Shaking and Soccer with drastic lighting, pose, and noise-level changes, and Board with intensive background clutter. The ground truth is illustrated with a yellow dashed box. The results are available at http://ishiilab.jp/member/meshgi-k/act.html.

Interestingly, it is observed that in most of the subcategories where T6 is clearly better than the other trackers, the success plot of T6 starts with a plateau and later drops sharply around τov = 0.8. This means that T6 provides high-quality localization (i.e., larger overlaps with the ground truth). Similarly, the precision plot shows that T6 degrades gracefully in different scenarios, and although it does not provide good scale adaptation for targets, it is able to localize them better than the competing trackers (Figure 8).
