**2.2 Basic movement patterns**

268 Real-Time Systems, Architecture, Scheduling, and Application

When this theorem is not fulfilled, the phenomenon of sub-sampling, or aliasing, appears. In space-time images this phenomenon produces incorrect inclinations, or structures unrelated to each other. As an example of temporal aliasing, we can observe that the propeller of an aeroplane appears to rotate in the direction opposite to its true motion, as shown in Figure 3b. In short, long displacements cannot be estimated from input patterns with small scales. In addition to this problem, we have the aperture problem, discussed previously. These two problems (aliasing and aperture) make up the general correspondence problem, as shown in Figure 3.

Fig. 3. (3a, left) The so-called "aperture problem". (3b, right) The "aliasing problem". The two problems constitute the "correspondence problem" for motion estimation.

Therefore, the movement of the input patterns does not always correspond to features of consecutive frames in an unambiguous manner. The physical correspondence may be undetectable because of the aperture problem, a lack of texture (the example of Figure 2), long displacements between frames, and so on. Similarly, the apparent motion can lead to a false correspondence. For such situations it is possible to use matching algorithms (tracking and correlation), although there is currently much debate about the advantages and disadvantages of these techniques compared with those based on the gradient and energy of motion.

The correlation methods are less sensitive to changes in lighting, and they are able to estimate long displacements that do not satisfy the sampling theorem (Yacoob & Davis, 1999). However, they are extremely sensitive to cyclical structures, which produce multiple local minima, and when the aperture problem arises the responses obtained are unpredictable.

The gradient and energy methods, in turn, offer better efficiency and accuracy; they are able to estimate the perpendicular optical flow (in the presence of the aperture problem).

Typically, machine vision uses CCD cameras with a discrete frame rate; varying this rate modifies the displacement between frames, and if these shifts are too large, gradient methods fail (since the continuity of the space-time volume is broken). Although it is possible to use anti-aliasing spatial smoothing to avoid temporal aliasing (Christmas, 1998; Zhang & Wu, 2001), the counterpart is a degradation of spatial information. Therefore, for a given spatial resolution, one has to sample at a high temporal frequency (Yacoob & Davis, 1999).

On the other hand, it is quite common for real-time optical flow algorithms to be organized as a functional architecture, as shown in Figure 4, via a hierarchical process.
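The temporal-aliasing condition above can be made concrete with a small numerical sketch (illustrative only; the drifting-sinusoid stimulus and the 1 frame/time-unit sampling are assumptions, not from the text). When the temporal frequency of a moving grating exceeds the Nyquist limit of 0.5 cycles/frame, the phase wraps and the perceived direction reverses, exactly like the propeller example:

```python
import numpy as np

def apparent_velocity(f_spatial, v_true):
    """Apparent velocity of a drifting sinusoid sampled at one frame per time unit.

    The temporal frequency is f_spatial * v_true cycles/frame; when it exceeds
    the Nyquist limit of 0.5 cycles/frame, the per-frame phase shift wraps and
    the perceived motion is wrong (the 'propeller' effect).
    """
    phase_per_frame = 2 * np.pi * f_spatial * v_true            # true phase shift
    wrapped = (phase_per_frame + np.pi) % (2 * np.pi) - np.pi   # aliased phase
    return wrapped / (2 * np.pi * f_spatial)

# A grating of 0.25 cycles/pixel moving 1 px/frame: 0.25 cycles/frame -> no aliasing.
print(round(apparent_velocity(0.25, 1.0), 6))   # 1.0
# The same grating moving 3 px/frame: 0.75 cycles/frame -> aliases to -1 px/frame.
print(round(apparent_velocity(0.25, 3.0), 6))   # -1.0
```

The second call shows a pattern truly moving at +3 px/frame that appears to move backwards at −1 px/frame, which is why long displacements cannot be recovered from finely structured patterns.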

Despite the difficulties in recovering flow, biological systems work surprisingly well in real time. Just as these systems have specialized mechanisms to detect color and stereopsis, they also devote mechanisms to visual motion (Albright, 1993). As in other areas of computer vision research, models of such natural systems are built in order to formalize bio-inspired solutions.

Thanks to psychophysical and neurophysiological studies, it has been possible to build models that extract motion from a sequence of images; these biological models, however, are usually complex and poorly suited to operating at high speed in real time.

One of the first models for a real-time bio-inspired visual sensor was proposed by Reichardt (Reichardt, 1961). The detector consists of a pair of receptive fields, as shown in Figure 5, where the first signal is delayed with respect to the second before the two are combined nonlinearly by multiplication.

Receptors 1 and 2 (shown as edge detectors) are spaced a distance ΔS apart, a delay is imposed on each signal, and the signals C1 and C2 are then combined via multiplication. In the final stage, the result of the first half of the detector is subtracted from that of the second, each half contributing to increased directional selectivity. The sensors shown in Figure 5 are Laplacian detectors, although any spatial filter or feature detector could be used.
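The delay-and-multiply mechanism just described can be simulated in a few lines. This is a minimal sketch of an opponent Reichardt correlator; the Gaussian pulse stimulus, the receptor spacing implied by the 3-sample lag, and the delay value are illustrative assumptions, not parameters from the text:

```python
import numpy as np

def reichardt_response(signal_a, signal_b, delay):
    """Opponent Reichardt correlator for two receptor signals sampled over time.

    Each half multiplies one receptor's delayed output with the other's current
    output; subtracting the two halves yields a direction-selective response
    (positive for motion from receptor A toward receptor B).
    """
    a_delayed = np.roll(signal_a, delay)
    b_delayed = np.roll(signal_b, delay)
    half_ab = a_delayed * signal_b   # tuned to A -> B motion
    half_ba = b_delayed * signal_a   # tuned to B -> A motion
    return np.mean(half_ab[delay:] - half_ba[delay:])

# A pulse passes receptor A at t=10 and receptor B at t=13 (A -> B motion).
t = np.arange(100)
a = np.exp(-0.5 * ((t - 10) / 2.0) ** 2)
b = np.exp(-0.5 * ((t - 13) / 2.0) ** 2)
print(reichardt_response(a, b, delay=3) > 0)   # True: motion detected A -> B
print(reichardt_response(b, a, delay=3) < 0)   # True: reversed stimulus, opposite sign
```

Note that, as the text goes on to explain, only the sign (direction) is reliable here; the magnitude of the response depends on stimulus contrast, which is one of the detector's main drawbacks.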

Real-Time Motion Processing Estimation Methods in Embedded Systems 271


Fig. 5. Reichardt real-time correlation model.

One of the main disadvantages of this detector is that the correlation depends on the contrast; in addition, no speed can be retrieved directly. It requires banks of detectors calibrated for various speeds and directions, and its interpretation is, at best, ambiguous. Despite these drawbacks, however, the Reichardt detector can easily be implemented by biological systems and is used successfully to explain the visual system of insects. The detector continues to be used as a starting point for more sophisticated vision models (Beare & Bouzerdoum, 1999; Zanker, 1996), and such detectors can be implemented in real-time CCD sensors using VLSI technology (Arias-Estrada *et al*., 1996).

### **2.2.1 Change detection and correlation methods**

Consider the basic case of segmenting regions with motion from static regions: the result is a binary image showing the regions of motion. The process may seem easy, since one simply looks for changes in image intensity above a threshold, which are assumed to be caused by the movement of an object in the visual field. However, the false positives that stem from sources such as sensor noise, camera movement, shadows, environmental effects (rain, reflections, etc.), occlusions and lighting changes make robust motion detection extraordinarily difficult.

Biological systems, again, despite being highly sensitive to movement, are also robust to noise and to uninteresting visual effects. This technique is used in situations where motion detection is an event that should be recorded for future use. The computational requirements of these algorithms are minimal: a satisfactory result is reached with little more than an input buffer, sign arithmetic and some robust statistics, since surveillance systems have to be particularly sensitive and do not normally have large computing power available (Rosin, 1998; Pajares, 2007).
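A minimal frame-differencing sketch along these lines is shown below. The median/MAD-based robust threshold and the sensitivity parameter `k` are illustrative choices, not prescribed by the text; they stand in for the "robust statistics" mentioned above:

```python
import numpy as np

def detect_motion(prev_frame, curr_frame, k=3.0):
    """Binary change mask from a thresholded frame difference.

    The threshold is derived from a robust noise estimate (median absolute
    deviation of the difference image), so sensor noise alone rarely triggers
    detections; k controls the sensitivity.
    """
    diff = np.abs(curr_frame.astype(np.float64) - prev_frame.astype(np.float64))
    mad = np.median(np.abs(diff - np.median(diff)))
    threshold = np.median(diff) + k * 1.4826 * mad  # 1.4826: MAD -> sigma for Gaussian noise
    return diff > threshold

rng = np.random.default_rng(0)
prev = rng.normal(100, 2, (64, 64))          # static scene plus sensor noise
curr = prev + rng.normal(0, 2, (64, 64))
curr[20:30, 20:30] += 50                     # a moving object brightens one region
mask = detect_motion(prev, curr)
print(mask[20:30, 20:30].mean() > 0.9)       # True: the object region is flagged
```

The per-pixel cost is a subtraction and a comparison, which is consistent with the claim that little more than an input buffer and sign arithmetic is needed.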

When the differential approaches are subject to errors, because the sampling theorem is not satisfied (Nyquist, 2006) or because of inconvenient lighting changes, it is necessary to apply other strategies. The methods of correlation or pattern matching are the most intuitive way to recover the speed and direction of movement: they select characteristic features in one frame of the image sequence and then look for those same features in the next, as shown in Figure 6. Changes in position indicate movement over time, i.e., speed.

These algorithms are characterized by poor performance due to their exhaustive search and iterative operations, usually requiring a prohibitive amount of resources. If the image is of size M², the search template of size N², and the search window of size L², then the estimated computational complexity is around M²N²L². By way of example, with a typical image of 640x480 points, a template window of 50x50 and a search window of 100x100, around 0.8 billion (long scale) operations would have to be computed. The current trend is to try to reduce the search domain (Oh & Lee, 2000; Anandan *et al*., 1993; Accame *et al*., 1998), although the need for resources is still too high. One of the most common applications is real-time video encoding (Accame *et al*., 1998; Defaux & Moscheni, 1995), which is increasing the amount of effort devoted to research on these algorithms.

Fig. 6. Block-Matching technique.
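An illustrative exhaustive block-matching search is sketched below, using the sum of absolute differences (SAD is one common dissimilarity measure; the template and window sizes are toy values, not the 50x50/100x100 figures quoted above):

```python
import numpy as np

def block_match(template, search_window):
    """Exhaustive block matching: slide the template over the search window
    and return the displacement minimising the sum of absolute differences.

    Cost is O(N^2 * L^2) per block for an N x N template in an L x L window,
    which is why practical systems try hard to restrict the search domain.
    """
    n = template.shape[0]
    best, best_cost = (0, 0), np.inf
    for dy in range(search_window.shape[0] - n + 1):
        for dx in range(search_window.shape[1] - n + 1):
            cost = np.abs(search_window[dy:dy+n, dx:dx+n] - template).sum()
            if cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best

rng = np.random.default_rng(1)
frame = rng.random((40, 40))
template = frame[10:18, 12:20]          # an 8x8 block from the previous frame
# Search for the block inside a window of the next frame.
print(block_match(template, frame[5:30, 5:30]))   # (5, 7): offset within the window
```

The doubly nested loop makes the quadratic cost in both template and window size explicit; reduced-search strategies (logarithmic search, pyramids) attack exactly these two loops.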


A key point in video compression is the temporal similarity between adjacent images in a sequence. Transferring only the differences between frames demands less bandwidth than transferring the entire sequence. The amount of data transmitted can be reduced even further if the movement and deformation needed to go from one frame to the next are known a priori.

This family of algorithms can be classified further, and there are two prominent approaches:

• Correlation as a function of 4 variables, depending on the position of the window and the displacement, with output normalized between 0 and 1 and independent of changes in lighting.

• Minimizing the distance, quantifying the dissimilarity between regions.

Many optimizations along these lines reduce the search space (Oh & Lee, 2000) and increase speed (Accame *et al*., 1998).
Adelson and Bergen (Adelson & Bergen, 1985) argue that there is no biological evidence for such models, since they are unable to make predictions about complex stimuli (for example, randomly positioned vertical bars), for which experimental observers perceive different movements at different positions. These techniques are straightforward, have been researched for many years, and dominate in industrial inspection and quality control.

The ability to work in environments where the displacements between frames are larger than a few points is one of their main advantages, though this requires processing extensive search spaces.


#### **2.2.2 Space-time methods: Gradient and energy**

Movement can be considered as an orientation in the space-time diagram. For example, Figure 7a presents a vertical bar moving continuously from left to right, sampled four times over time. Examining the space-time volume, we can observe the progress of the bar along the time axis as a stationary oriented structure whose angle indicates the amount of movement.

The orientation of the space-time structure can be retrieved through low-level filters. There are currently two dominant strategies: the gradient model and the energy model, shown in Figure 7b, where the ellipses represent the negative and positive lobes.

Fig. 7. (7a, left) Motion as orientation in space-time (*x*-*t* plane), where the angle increases with the velocity. (7b, right) Space-time filtering models.

The gradient model applies the ratio of a spatial and a temporal filter as a measure of speed, whereas the energy model uses a set of filter banks oriented in space-time. Both models use closely bio-inspired filters (Adelson & Bergen, 1985; Young & Lesperance, 1993). The debate about which scheme is adopted by the human visual system remains open, and there are even gateways from one model to the other, since it is possible to synthesize the oriented filters of the energy model from space-time separable filters (Fleet & Jepson, 1989; Huang & Chen, 1995). It is also interesting to note that independent component analysis (ICA) shows that these spatial filters are the ones that capture most of the components of the image structure (Hateren & Ruderman, 1998).

In the gradient model, the main working hypothesis is the conservation of intensity over time (Horn & Schunk, 1981). Under this assumption, over short periods of time the intensity variations are due only to translation and not to changes in lighting, reflectance, etc. The total derivative of image intensity with respect to time is zero at every point in space-time; therefore, defining the image intensity as I(x, y, t), we have:

$$\mathrm{d}I(x, y, t)/\mathrm{d}t = 0 \tag{1}$$

Differentiating by parts, we obtain the so-called motion constraint equation:

$$
\frac{\partial I}{\partial x} \frac{dx}{dt} + \frac{\partial I}{\partial y} \frac{dy}{dt} + \frac{\partial I}{\partial t} \frac{dt}{dt} = 0 \tag{2}
$$
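The constraint in equation (2) can be checked numerically: for a smooth pattern translating at a known velocity, the left-hand side vanishes up to finite-difference error. The test pattern, its spatial frequencies, and the central-difference scheme below are arbitrary illustrative choices:

```python
import numpy as np

# Verify the motion constraint equation for a smooth pattern translating at
# (u, v) = (1, 2) pixels/frame. Derivatives are central differences, so the
# identity holds only up to discretisation error.
u_true, v_true = 1.0, 2.0
y, x = np.mgrid[0:64, 0:64].astype(np.float64)

def frame_at(t):
    return np.sin(0.2 * (x - u_true * t)) * np.cos(0.15 * (y - v_true * t))

I0, I1, I2 = frame_at(0), frame_at(1), frame_at(2)
Ix = (np.roll(I1, -1, axis=1) - np.roll(I1, 1, axis=1)) / 2.0  # dI/dx
Iy = (np.roll(I1, -1, axis=0) - np.roll(I1, 1, axis=0)) / 2.0  # dI/dy
It = (I2 - I0) / 2.0                                           # dI/dt

residual = Ix * u_true + Iy * v_true + It
interior = residual[2:-2, 2:-2]                 # ignore wrap-around borders
print(np.abs(interior).max() < 0.05)            # True: MCE holds up to discretisation error
```

Increasing the velocity or the spatial frequency makes the residual grow, which is the numerical face of the temporal-aliasing limits discussed earlier.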


$$\frac{\partial I}{\partial x} u + \frac{\partial I}{\partial y} v + \frac{\partial I}{\partial t} = 0 \tag{3}$$

where u = dx/dt and v = dy/dt. The parameters (x, y, t) are omitted for the sake of clarity. Since there is only one equation with two unknowns (the two velocity components), it is possible to recover only the velocity component $v_n$ that lies in the direction of the luminance gradient:

$$v_n = \frac{-\frac{\partial I}{\partial t}}{\sqrt{\left(\frac{\partial I}{\partial x}\right)^2 + \left(\frac{\partial I}{\partial y}\right)^2}} \tag{4}$$

There are several problems associated with the motion constraint equation. Because it is one equation with two unknowns, it is by itself insufficient for estimating the optical flow: using equation (3) alone, only a linear combination of the velocity components can be obtained, an effect which is fully consistent with the aperture problem mentioned earlier in this section. A second problem arises if the spatial gradients *Ix* or *Iy* become very small or zero, in which case the equation becomes ill-conditioned and the estimated speed tends asymptotically to infinity. Furthermore, the stable realization of the spatial derivatives is problematic in itself, requiring convolution with a differential filter such as the Sobel or Prewitt operators or a difference of Gaussians.
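A sketch of the normal-flow computation of equation (4), with an explicit guard for the ill-conditioned small-gradient case just mentioned (the `eps` cutoff and the NaN convention are implementation choices, not from the text):

```python
import numpy as np

def normal_flow(Ix, Iy, It, eps=1e-3):
    """Normal component of optical flow from equation (4): v_n = -It / |grad I|.

    Where the spatial gradient is close to zero the equation is ill-conditioned
    (the estimate would blow up), so those pixels are returned as NaN instead.
    """
    grad_mag = np.sqrt(Ix**2 + Iy**2)
    return np.where(grad_mag > eps, -It / np.maximum(grad_mag, eps), np.nan)

# A luminance ramp I = x translating at 2 px/frame along x: Ix = 1, It = -2.
Ix = np.ones((4, 4)); Iy = np.zeros((4, 4)); It = np.full((4, 4), -2.0)
print(normal_flow(Ix, Iy, It)[0, 0])   # 2.0
```

On a textureless patch (`Ix = Iy = 0`) the function returns NaN everywhere, which is exactly the aperture/ill-conditioning failure the paragraph describes.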

Since they use numerical derivatives of a sampled function, these methods are best suited to small space-time intervals; the aliasing problem appears whenever the space-time sampling is insufficient, especially in the time domain, as noted before. There are several filtering techniques to address this problem, such as spatiotemporal low-pass filtering, as noted by Zhang (Zhang & Jonathan, 2003).

Ideally, the sampling rate should be high enough to reduce all movements to within one pixel/frame, so that the temporal derivative is well-conditioned (Nyquist, 2006). Moreover, the differential space-time filters used to implement gradient algorithms are reasonably similar to those found in the visual cortex, although there is no consensus on which are optimal from the point of view of functionality (Young & Lesperance, 2001). One advantage of gradient models over energy models is that they provide a single speed from a combination of filters, whereas energy models provide a population of solutions.

#### **2.3 Improving optical flow measures**

We have seen that the motion constraint equation (MCE) has some anomalies that have to be addressed properly in order to estimate optical flow, and there is a wide range of methods to improve it. Many apply restrictions to resolve the two velocity components *u* and *v*, collecting more information (through the acquisition of more images, or of more information for each image) or applying physical restraints to generate additional MCEs:

• Applying multiple filters (Mitiche, 1987; Sobey & Srinivasan, 1991; Arnspang, 1993; Ghosal & Mehrotra, 1997).

• Using a neighborhood integration (Lucas & Kanade, 1981; Uras *et al*., 1988; Simoncelli & Heeger, 1991).

• Using multispectral images (Golland & Bruckstein, 1997).
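The neighborhood-integration idea can be sketched as a least-squares solve in the spirit of Lucas and Kanade (1981): one MCE per pixel of a small patch, solved jointly for (u, v). The synthetic derivatives below are constructed to satisfy the MCE exactly, whereas a real implementation would estimate them from images; the patch size and conditioning threshold are illustrative choices:

```python
import numpy as np

def lucas_kanade_velocity(Ix, Iy, It):
    """Least-squares neighbourhood integration: stack one MCE per pixel of the
    patch, A [u v]^T = -It, and solve the 2x2 normal equations.

    The patch must contain gradient structure in two directions; otherwise
    A^T A is near-singular -- the aperture problem again -- and None is returned.
    """
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)   # one row per pixel
    b = -It.ravel()
    ata = A.T @ A
    if np.linalg.cond(ata) > 1e6:                    # degenerate patch
        return None
    return np.linalg.solve(ata, A.T @ b)

# A 9x9 patch with gradients in both directions, moving at (u, v) = (0.5, -0.25).
y, x = np.mgrid[0:9, 0:9].astype(np.float64)
Ix = np.cos(0.3 * x) + 0.1
Iy = np.sin(0.2 * y) - 0.4
It = -(Ix * 0.5 + Iy * (-0.25))                      # derivatives consistent with the MCE
print(np.allclose(lucas_kanade_velocity(Ix, Iy, It), [0.5, -0.25]))   # True
```

The conditioning check makes the trade-off discussed below explicit: the patch must be small enough for the constant-flow assumption to hold, yet large enough for `A^T A` to be invertible.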
Due to the lack of information and spatial structure in the image, it is not easy to estimate a sufficiently dense velocity field. To correct this problem several restrictions are applied, for example that nearby points move in a similar way. The general philosophy is that the original flow field, once estimated, is iteratively regularized with respect to the smoothing restriction.

The first such constraint was proposed by Horn and Schunk (Horn & Schunk, 1981). The optic flow resulting from global constraints is quite robust, thanks to the combination of results, and is also pleasing to the human eye. Two of the biggest drawbacks are its iterative nature, which requires large amounts of time and computing resources, and the fact that motion discontinuities are not handled properly, so erroneous results are produced in the regions surrounding motion edges. To address these gaps, other techniques have been proposed that use global statistics such as random Markov chains (Heitz & Bouthemy, 1993).

All MCE estimation techniques impose significant restrictions on a neighborhood in which the flow is assumed constant. To meet this requirement when there are multiple movement patterns, the neighborhood has to be as small as possible, but at the same time it must be large enough to gather information and avoid the aperture problem. A compromise is therefore needed.

A variety of models use estimations over this neighborhood, such as least squares. Using a quadratic objective function assumes a Gaussian residual error, but with multiple movements in the neighborhood these errors can no longer be considered Gaussian. Even if the errors were independent (a very usual situation), the error distribution is better modeled as bimodal.

There are approximate models that can be incorporated into the range of flow techniques being discussed; these approaches also model spatial variations of multiple movements. The neighborhood integration techniques, as mentioned, assume that the image motion is purely translational in local regions. More elaborate models (such as the affine model) can thus extend the range of motion and provide additional restrictions. These methods recast the MCE as an error function to be resolved or minimized by least squares (Campani & Verri, 1992; Bergen & Bart, 1992; Gupta & Kanal, 1995, 1997; Giaccone & Jones, 1997, 1998). Large displacements between frames make gradient methods behave inappropriately, since the image sequences are insufficient and the time-derivative measures inaccurate. As a workaround, it is possible to use larger spatial filters than in the early model (Christmas, 1998).

The use of a multi-scale Gaussian pyramid can handle large movements between frames and fill the gaps in large regions of uniform texture, so that coarse-scale motion estimates are used as seeds for a finer scale (Zhang, 2001). The use of temporal multi-scale (Yacoob & Davis, 1999) also allows the accurate estimation of a range of different movements, but this method requires a sampling rate high enough to reduce movements to about one pixel/frame.

**2.3.2 Different general methods for restricting the MCE and improving optical flow measures**

There are different general methods for restricting the MCE and improving optical flow measures: