Preface

Surveillance systems have become an essential part of most establishments. They are used in national security, safety in public areas, flow control in crowded scenes, private safety, and special care for the aged and disabled. At the heart of any surveillance system are video cameras, whose numbers have multiplied over the last decade thanks to advances in digital networks and automated video processing. The resulting abundance of surveillance video has made monitoring by human operators not only outdated but also practically infeasible. Several methods have therefore been developed to automate the detection and reporting of scenes, events and subjects that satisfy application-specific requirements.

The purpose of this book is to collect recent advances in select areas of video surveillance. Research in this field usually combines results from machine learning, artificial intelligence, software engineering, stochastic modeling and signal processing, in addition to pattern recognition and digital image/video processing. Solving problems in video surveillance often requires reconciling several conflicting objectives. This makes it a challenging task and an open research area, in which novel solutions are continually proposed to overcome earlier shortcomings without yet reaching a final solution.

The book is organized into six chapters outlined as follows:

*Chapter 1* addresses the challenges that surveillance applications face as the volume of visual data to be processed grows. The problem of visual tracking is presented as a case study: its classical techniques are described, and the difficulties that these techniques encounter with larger volumes of visual data are illustrated. The emerging theory of compressive sensing is then introduced as a solution to these challenges and applied to the successive stages of object tracking. In contrast to this mathematically oriented approach, *Chapter 2* presents a software engineering approach to the problem of tracking. In particular, mobile agent technology is applied to tracking objects as they move between the fields of view of several cameras. The challenge here is to recover and maintain the identities of targets lost by the system. Neighborhood node determination techniques are introduced, analyzed and compared.


*Chapter 3* surveys the most recent advances in indexing surveillance video. The challenges in retrieving tracked objects from video are presented, and existing and suggested solutions are then evaluated. This brings us to an important aspect of video surveillance: the difficulty of comparing and evaluating results due to the lack of standards. Subjective assessment of the quality of surveillance video is yet another emerging area presented in this book and is the topic of *Chapter 4*. It is compared to earlier qualitative assessment approaches such as "user satisfaction" and to quantitative measures of quality deterioration versus frame rate. In contrast, the suggested assessment method focuses on recognition and discrimination tasks, which fit video surveillance needs more adequately.



*Chapter 5* presents an important application of video surveillance in the area of public safety, namely at railroad crossings. Current safety settings include sensor-triggered devices that detect objects crossing the rail tracks (the danger zone) when a train is approaching. The suggested approach, however, uses stereo color surveillance cameras to accurately detect, in 3D, obstacles that are either moving or stopped in the danger zone. A novel background subtraction technique that uses color information is developed. In addition, the chapter contains a clear presentation of a wealth of classical topics in computer vision, such as stereo matching, segmentation and tracking. The book concludes in *Chapter 6* with an application in event understanding (event modeling), which lies at the boundary between machine learning and computer vision. It highlights a common challenge within the computer vision community: choosing an appropriate data representation. A suitable representation results in more efficient and accurate data processing. This is illustrated in the chapter using a feature space representation for motion trajectories. Clustering of trajectories is then performed more efficiently and is used to detect outliers, which are typically reported as suspicious activity.

The chapters of this book cover multiple areas of video surveillance, ranging from classical computer vision topics such as video segmentation, stereo matching, anomaly detection and video indexing to recently emerging areas such as quality assessment and compressive sensing. Recent developments in those areas are presented along with practical real-life applications. Each chapter nevertheless contains a clear presentation of the area it covers, with references to earlier related work. This makes the book accessible to a wide range of readers. Academic researchers will find a reliable compilation of relevant literature in addition to timely pointers to current advances in the field of video surveillance. Industry practitioners will find useful hints about state‐of‐the‐art applications. The book also provides directions for open problems where further advances can be pursued.

#### **Acknowledgements**

I am indebted to the many people who assisted me at the different stages of producing this book. In particular, I would like to acknowledge the editorial staff for their professionalism and patience. I also extend special thanks to Behjat Siddiquie, PhD (Computer Scientist at SRI International, USA) and Vlad I. Morariu, PhD (Research Associate at the University of Maryland, USA) for reviewing several chapters and providing helpful comments.

April 2012


**Hazem El‐Alfy**

Dept. of Engineering Mathematics and Physics
Faculty of Engineering, Alexandria University
Alexandria, Egypt

**1**

**Compressive Sensing in Visual Tracking**

Garrett Warnell and Rama Chellappa *University of Maryland, College Park*

*USA*

**1. Introduction**

Visual tracking is an important component of many video surveillance systems. Specifically, visual tracking refers to the inference of physical object properties (e.g., spatial position or velocity) from video data. This is a well-established problem that has received a great deal of attention from the research community (see, e.g., the survey (Yilmaz et al., 2006)). Classical techniques often involve performing object segmentation, feature extraction, and sequential estimation for the quantities of interest.

Recently, a new challenge has emerged in this field. Tracking has become increasingly difficult due to the growing availability of cheap, high-quality visual sensors. The issue is data deluge (Baraniuk, 2011), i.e., the quantity of data prohibits its usefulness due to the inability of the system to efficiently process it. For example, a video surveillance system consisting of many high-definition cameras may be able to gather data at a high rate (perhaps gigabytes per second), but may not be able to process, store, or transmit the acquired video data under real-time and bandwidth constraints.

The emerging theory of *compressive sensing (CS)* has the potential to address this problem. Under certain conditions related to sparse representations, it effectively reduces the amount of data collected by the system while retaining the ability to faithfully reconstruct the information of interest. Using novel sensors based on this theory, there is hope to accomplish tracking tasks while collecting significantly less data than traditional systems.

This chapter will first present classical components of and approaches to visual tracking, including background subtraction, the Kalman and particle filters, and the mean shift tracker. This will be followed by an overview of CS, especially as it relates to imaging. The rest of the chapter will focus on several recent works that demonstrate the use and benefit of CS in visual tracking.

**2. Classical visual tracking**

The purpose of this section is to give an overview of classical visual tracking. As a popular component present in many methods, an overview of techniques used for background subtraction will be provided. Next, the focus will shift to the probabilistic tracking frameworks that define the Kalman and particle filters. This will be followed by a presentation of an effective application-specific method: the mean shift tracker.
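
To make the role of background subtraction in a tracking pipeline concrete, the following is a minimal sketch in plain Python. It assumes a running-average background model, which is only one of many possible techniques and is not necessarily the one this chapter goes on to describe. The function names, the parameters `alpha` and `threshold`, and the toy 4x4 frame are all illustrative assumptions.

```python
def update_background(background, frame, alpha=0.05):
    """Blend the new frame into the running-average background model.

    alpha is an assumed learning rate: larger values adapt faster.
    """
    return [
        [(1.0 - alpha) * b + alpha * f for b, f in zip(bg_row, fr_row)]
        for bg_row, fr_row in zip(background, frame)
    ]


def foreground_mask(background, frame, threshold=25.0):
    """Flag pixels whose intensity differs from the background by more than threshold."""
    return [
        [abs(f - b) > threshold for b, f in zip(bg_row, fr_row)]
        for bg_row, fr_row in zip(background, frame)
    ]


def centroid(mask):
    """Return the (row, col) centroid of the foreground pixels, or None if there are none."""
    coords = [(r, c) for r, row in enumerate(mask) for c, fg in enumerate(row) if fg]
    if not coords:
        return None
    n = len(coords)
    return (sum(r for r, _ in coords) / n, sum(c for _, c in coords) / n)


# Toy 4x4 grayscale scene (hypothetical data): a static background of
# intensity 10 and a bright 2x2 object of intensity 200 in the lower-right corner.
background = [[10.0] * 4 for _ in range(4)]
frame = [row[:] for row in background]
for r in (2, 3):
    for c in (2, 3):
        frame[r][c] = 200.0

mask = foreground_mask(background, frame)           # segment the object
position = centroid(mask)                           # crude position measurement
background = update_background(background, frame)   # adapt the model for the next frame
```

In a pipeline of the kind outlined above, a measurement such as `position` would typically be passed to a sequential estimator (for example a Kalman or particle filter) rather than used directly. A practical system would also operate on full-resolution frames with an array library instead of nested lists.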
