**Intelligent Surveillance System Based on Stereo Vision for Level Crossings Safety Applications**

Nizar Fakhfakh, Louahdi Khoudour, Jean-Luc Bruyelle and El-Miloudi El-Koursi

*French Institute of Science and Technology for Transport, Development and Networks (IFSTTAR), France*

## **1. Introduction**

Level crossings (LC) are considered a weak point in road and railway infrastructure, and improving their safety has become an important field of academic research and a growing concern for railway undertakings. Improving the safety of people and of road-rail facilities is essential to the smooth operation of road and railway transport. Statistically, nearly 44% of level crossing users have a poor perception of the environment, which increases the risk of accidents Nelson (2002). Moreover, the behavior of pedestrians, road vehicle drivers and railway operators cannot be predicted beforehand. According to Griffioen (2004), human error causes 99% of accidents at LC, 93% of which are caused by road users. The cost of these accidents is also high: approximately one hundred million euros per year in the EU for all level crossing accidents. For this reason, road and railway safety professionals from several countries have focused on making level crossings as safe as possible. Actions are planned to exchange information and run experiments aimed at improving the management of level crossing safety and performance. This has made it possible to share knowledge gained from research into improving safety at level crossings.

High safety requirements for level crossing systems imply high costs, which hinders the deployment of advanced systems. High-technology systems are introduced to prevent collisions between trains and automobiles in a timely manner and to help reduce the level of risk at railroad crossings. Several conventional object detection systems have been tested on railroad crossings, and they provide information of varying accuracy. No system based on a technological solution is intended to replace the equipment currently installed at each level crossing. The purpose of such an intelligent system is to provide additional information to the human operator; it can be considered as an operations support system. This concerns the detection and localization of any kind of object, such as pedestrians, people on two-wheeled vehicles, wheelchairs and car drivers, in the dangerous zone Yoda et al. (2006). Today, a number of trigger technologies are installed at level crossings, but they all serve the same purpose: they detect a moving object when it passes particular points in the LC. Indeed, those conventional obstacle detection systems have been used to prevent collisions between trains and automobiles. In Fakhfakh et al. (2010), the conventional technologies applied at LC are discussed and both the advantages and drawbacks of each are highlighted.

One of the main operational purposes of introducing CCTV (Closed Circuit Television) at LC is the automatic detection of specific events. Some vision-based object detection systems have been tested at level crossings, and they provide information of varying significance. In video surveillance, one camera, or a set of cameras, supervises zones considered unsafe, in which security must be increased Fakhfakh et al. (2011). According to the literature, little research has focused on passive vision to solve the problems at LC. Among the existing systems, two based on CCTV cameras are to be distinguished. The first, Foresti (1998), uses a single grayscale CCD camera placed on a high pole in a corner of the LC; it classifies objects as cars, bikes, trucks, pedestrians and others, and localizes them according to the camera calibration process, assuming a planar model of the road and railroad. This system is prone to false and missed alarms caused by fast illumination changes or shadows. The second, Ohta (2005), uses two cameras with a basic stereo matching algorithm and 3D background removal. This system detects vehicles and pedestrians to some extent, but it is extremely sensitive to adverse weather conditions. Its 3D localization module is not very accurate because of the simplicity of the proposed stereo matching algorithm.

We propose in this chapter an Automatic Video-Surveillance (AVS) system for the automatic detection of specific events at level crossings. The system automatically and accurately detects and localizes in 3D obstacles that are stopped or in motion at the level crossing. This information can be transmitted in a timely manner to the train driver, in the form of a red light in the cabin and, on the driver's monitor, images of the hazardous situation. The risk can thus be evaluated and the appropriate persons warned. This chapter is organized as follows: after an introduction covering the problem and the area of research, section 2 gives an overview of our proposed system for object localization at LC. Section 3 details the background subtraction algorithm for stationary and moving object detection from real scenes. Section 4 outlines a robust approach for the 3D localization of the objects highlighted in section 3. Results concerning the object extraction and 3D localization steps are detailed in Section 6. The conclusion discusses the obtained results and provides perspectives.

the background and the foreground as statistically independent signals in space and time. Although many relatively effective motion estimation methods exist, ICA is retained for two reasons. First, it is less sensitive to noise caused by continuous environmental changes over time, such as swaying branches, sensor noise, and illumination changes. Second, it provides a clear-cut separation of the objects from the background, and can detect objects that remain motionless for a long period. Foreground extraction is performed separately on both cameras. The motion detection step allows focusing on the areas of interest, in which the 3-D localization module is applied.

– *3-D localization of moving and stationary objects:* this process applies a specific stereo matching algorithm to localize the detected objects. To deal with poor-quality images, a selective stereo matching algorithm is developed and applied to the moving regions. First, a disparity map is computed for all moving pixels according to a dissimilarity function called Weighted Average Color Difference (WACD), detailed in Fakhfakh et al. (2010). An unsupervised classification technique is then applied to the initial set of matching pixels, which automatically selects only the well-matched pixels. A pixel is considered well-matched if the pair of matched pixels has a confidence measure higher than a threshold. The classification is performed by applying a confidence measure technique detailed in Fakhfakh et al. (2009). It consists in evaluating the result of the likelihood function, based on the *winner-take-all* strategy. The disparities of pixels considered badly matched are then estimated by applying a hierarchical belief propagation technique detailed further. This yields, for each obstacle, a highly accurate dense disparity map.

## **3. Background subtraction by spatio-temporal independent component analysis**

Complex scenes acquired in outdoor environments require advanced tools to deal with, for instance, sharp brightness variations, swaying branches, shadows and sensor noise. The use of stationary cameras restricts the choice of techniques to those based on temporal differencing and background subtraction. The latter aims at segmenting foreground regions corresponding to moving objects from the background by evaluating the difference in pixel features between a reference background and the current scene image. This kind of technique requires updating the background model over time by modeling the possible states that a pixel can take. A trade-off must be found between achieving a real-time implementation and handling background changes caused by gradual or sudden illumination fluctuations and moving background objects.

### **3.1 State of the art**

Pixel-based techniques assume statistical independence between the intensities at each pixel throughout the training sequence of images. Their main drawback is that they are not effective at modeling a complex scene. A mixture of Gaussian distributions (GMM) Stauffer & Grimson (2000) has been proposed to model complex and non-static scenes. It consists of modeling the background with a constant or adaptive number of Gaussians. A relatively robust non-parametric method has been proposed in Elgammal et al. (2000). The authors estimate the density function of a distribution given only very recent history information. This method yields a sensitive detection. In Zhen & Zhenjiang (2008), the authors use an improved GMM and Graph Cut to minimize an energy function in order to extract foreground objects. The main disadvantage is that fast variations cannot be accurately modeled.
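To make the background subtraction principle of Section 3 concrete (comparing the current image with a reference background that is updated over time), the following is a minimal sketch of a pixel-based model. It keeps a single adaptive Gaussian per pixel, which is a deliberate simplification of the mixture models cited in the state of the art and is not the ICA-based method retained in this chapter. The class name `RunningGaussianBackground` and the parameters `alpha` (learning rate), `k` (threshold in standard deviations) and `min_var` (variance floor) are hypothetical choices made for illustration only.

```python
import numpy as np


class RunningGaussianBackground:
    """Illustrative per-pixel single-Gaussian background model (hypothetical helper)."""

    def __init__(self, first_frame, alpha=0.05, k=2.5, min_var=4.0):
        # The first frame is taken as the initial reference background.
        self.mean = first_frame.astype(np.float64)
        self.var = np.full(self.mean.shape, min_var, dtype=np.float64)
        self.alpha = alpha      # assumed learning rate of the background update
        self.k = k              # assumed decision threshold, in standard deviations
        self.min_var = min_var  # assumed floor that keeps the variance positive

    def apply(self, frame):
        """Return a boolean foreground mask and update the background model."""
        frame = frame.astype(np.float64)
        diff = frame - self.mean
        # A pixel is foreground if it lies more than k standard deviations
        # away from the background mean.
        foreground = diff ** 2 > (self.k ** 2) * self.var
        # Update only the pixels classified as background, so that an object
        # that stops in the scene is not absorbed into the reference.
        rate = np.where(foreground, 0.0, self.alpha)
        self.mean += rate * diff
        self.var += rate * (diff ** 2 - self.var)
        np.maximum(self.var, self.min_var, out=self.var)
        return foreground
```

In use, the model would be initialized on an empty view of the crossing and `apply` called on each new grayscale frame, once per camera. Freezing the update on foreground pixels is one possible design choice, motivated here by the need to keep detecting objects that remain motionless; its cost is that a wrongly detected region is never corrected, which is one of the limitations of purely pixel-based models.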
