**Advances in Adaptive Composite Filters for Object Recognition**

Victor H. Diaz-Ramirez<sup>1</sup>, Leonardo Trujillo<sup>2</sup> and Sergio Pinto-Fernandez<sup>1</sup>
<sup>1</sup>*Instituto Politecnico Nacional - CITEDI*, <sup>2</sup>*Instituto Tecnologico de Tijuana*, *México*

#### **1. Introduction**

The problem of object recognition is one of the most common problems addressed by researchers and engineers who develop artificial vision or image analysis systems. In order to recognize an object within an image or video sequence, two different but related tasks must be solved. First, the target object must be detected within the scene image, and second, its exact location within the image must be estimated. While the general concept of object recognition is straightforward, even a brief review of modern literature reveals a wide range of proposals and systems (Goudail & Refregier, 2004; Szeliski, 2010). Among these, one of the most common and successful approaches is that of local feature-based systems, which normally employ two basic steps (Lowe, 2004; Tuytelaars & Mikolajczyk, 2008). First, object features are extracted from the scene image, and afterwards a classification step is used to determine if the observed features belong to the target object; a process known as feature matching. Feature-based systems have achieved very good results and are widely used in many application domains. Nevertheless, feature-based systems suffer from two noteworthy drawbacks. First, they can be computationally expensive<sup>1</sup>, and second, their overall performance depends upon some ad-hoc decisions that might require optimization (Brown et al., 2011; Olague & Trujillo, 2011; Pérez & Olague, 2008; Theodoridis & Koutroumbas, 2008; Trujillo & Olague, 2008).

An attractive alternative to feature-based systems is given by correlation filtering algorithms, an approach that has been intensively investigated over the last decades (Vijaya-Kumar et al., 2005). A correlation filter is basically a linear system whose output is the maximum-likelihood estimator of the target's coordinates in the observed scene (Goudail & Refregier, 2004; Refregier, 1999). In other words, detection is carried out by searching for correlation peaks in the system output, and the coordinates of these peaks provide the position estimates that localize the objects within the scene. An advantage of correlation filtering is that it possesses a strong mathematical foundation. Moreover, the design process of correlation filters usually considers the optimization of various performance criteria (Vijaya-Kumar & Hassebrook, 1990). As a result, correlation filters have been used to develop reliable object recognition systems that exhibit robust performance even when used in highly noisy conditions (Javidi & Horner, 1994; Javidi & Wang, 1997; Javidi et al., 1996). Correlation filters are commonly implemented using hybrid opto-digital correlators, thus exploiting the inherent parallelism of optics and achieving a very high rate of operation. Optical correlators follow two basic types of architectures: the 4F correlator (4FC) (Vanderlugt, 1964; 1992) and the joint transform correlator (JTC) (Javidi & Horner, 1989; Weaver & Goodman, 1966). Both architectures allow fast object recognition; however, they are very sensitive to ambient disturbances and to misalignments in the optical setup (Nicolás et al., 2001). On the other hand, it is also possible to effectively implement correlation filters using a digital computer and efficient algorithms for the fast Fourier transform. In fact, there are currently several very large scale integration (VLSI) devices that can be used to digitally implement correlation filtering algorithms that operate in real-time, such as field programmable gate arrays (FPGA) (Rakvic et al., 2010) and graphics processing units (GPU) (Sanders & Kandrot, 2010).

<sup>1</sup> While some implementations can achieve very high frame rates, they nevertheless are far behind the almost instantaneous results that optical-electronic systems can achieve with correlation filters such as those described in this chapter.
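On a digital computer, the correlation step described above amounts to a pointwise product in the Fourier domain followed by an inverse FFT and a peak search. The following NumPy sketch is our own toy illustration of that pipeline, not code from the chapter; the scene, target, and all names are synthetic stand-ins:

```python
import numpy as np

def correlate_and_detect(scene, h_freq):
    # Pointwise product in the Fourier domain implements correlation;
    # the inverse FFT returns the correlation plane.
    corr = np.real(np.fft.ifft2(np.fft.fft2(scene) * np.conj(h_freq)))
    corr = np.fft.fftshift(corr)              # put zero displacement at center
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    return peak, corr[peak]

# Toy scene: a 4x4 bright square; the "filter" is a matched reference.
scene = np.zeros((64, 64))
scene[40:44, 20:24] = 1.0
target = np.zeros((64, 64))
target[30:34, 30:34] = 1.0
peak, value = correlate_and_detect(scene, np.fft.fft2(target))
# peak indexes the correlation maximum; value is the peak height (16 here,
# the number of overlapping pixels at perfect alignment).
```

The coordinates of the peak recover the displacement of the target relative to the reference, which is precisely the position estimate discussed above.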

In general, correlation filters can be broadly classified into two main classes, analytical filters and composite filters. Analytical filters are typically given by a closed form mathematical expression that is directly derived from the respective signal and noise models while optimizing specific quality metrics (Javidi & Wang, 1997; Kerekes & Vijaya-Kumar, 2006; Vijaya-Kumar et al., 2000; Yaroslavsky, 1993). On the other hand, composite filters are constructed by combining a set of training images, which are explicit representations of the target object and its expected distortions (Bahri & Kumar, 1988; Kerekes & Vijaya-Kumar, 2008; Vijaya-Kumar, 1992). It is assumed that when the training images are properly chosen, we can synthesize composite filters that achieve very good and robust performance in recognizing the target object. The rest of this chapter deals with composite correlation filters, while the interested reader is referred to (Javidi & Horner, 1994; Vijaya-Kumar et al., 2005) for more information regarding analytical filters. Composite filters can be further classified as constrained or unconstrained filters. Constrained filters are designed in such a manner that the filter's output at the origin of the training images must be equal to a prespecified value (Kerekes & Vijaya-Kumar, 2008; Vijaya-Kumar, 1992). These restrictions are known as the equal output correlation-peak (EOC) constraints. Synthetic discriminant functions (SDF) (Hester & Casasent, 1980) and minimum average correlation energy (MACE) (Mahalanobis et al., 1987) are two popular constrained filters. Unconstrained filters avoid the EOC constraints in order to expand the solution space for filter synthesis, thus achieving a higher robustness to scene distortions when compared to constrained filters.
Maximum average correlation height (MACH) filters (Mahalanobis et al., 1994) and optimal trade-off SDF (OTSDF) filters (Goudail & Refregier, 2004; Vijaya-Kumar et al., 1994) are examples of widely used unconstrained filters. The MACH filters maximize the average response at the origin of the training images and also minimize an average dissimilarity measure over the training set. Thus, MACH filters are robust to distorted versions of the target which are not included in the training set (called intraclass distortions). Several versions of MACH filters exist; among these, the generalized MACH (GMACH) filter achieves the lowest variations in correlation peaks among the set of training images (Alkanhal et al., 2000; Nevel & Mahalanobis, 2003). This means that the GMACH filter yields an optimized response to intraclass distortions. The OTSDF filters, on the other hand, provide a compromise between multiple performance criteria by optimizing their weighted sum (Vijaya-Kumar et al., 1994).


As a result, OTSDF filters can yield a balanced performance in recognizing a target corrupted by several types of concurrent noise processes. Recently, a composite filter which realizes a compromise between a constrained and an unconstrained filter using two mutually exclusive training sets was proposed (Diaz-Ramirez, 2010). This constrained filter improves tolerance to intraclass distortions without lowering the signal-to-noise ratio.

A main drawback of both constrained and unconstrained composite filters is that their performance strongly depends upon the proper selection of the training set of images. In fact, the training images are commonly chosen based on the experience of the designer in an ad-hoc manner. Therefore, it is not possible to guarantee optimal performance in the general case, given that it is not possible to a priori determine the optimal set of training patterns.

To overcome these shortcomings, recent works propose an adaptive approach towards filter synthesis (Aguilar-Gonzalez et al., 2008; Diaz-Ramirez & Kober, 2007; Diaz-Ramirez et al., 2006; Gonzalez-Fraga et al., 2006; Martinez-Diaz et al., 2008; Ramos-Michel & Kober, 2008). In such an approach, the goal is to construct a composite filter with optimal performance characteristics for a fixed set of patterns, rather than a filter that achieves average performance over an ensemble of images. One possible way to implement an adaptive approach for filter synthesis is to use an incremental search algorithm. Such an algorithm can use all available information about the objects to be recognized, as well as examples of false objects or background samples that should be rejected. The adaptive process for filter synthesis can also account for additive sensor noise by training with images corrupted by a particular noise model. Therefore, adaptive filters can exhibit a high amount of robustness to noise during the imaging process.
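As a rough schematic of such an incremental search, one can greedily grow the training set with the pattern the current filter handles worst. The concrete algorithms are developed in Section 3; every function, threshold, and stopping criterion below is our own simplification for illustration, not the chapter's method:

```python
import numpy as np

def synthesize_sdf(T, c):
    # Basic SDF solution (reviewed in Section 2): h = T (T^+ T)^{-1} c*.
    return T @ np.linalg.solve(T.conj().T @ T, c.conj())

def adaptive_train(true_imgs, false_imgs, tol=0.05, max_iter=20):
    """Greedy incremental synthesis: start from one true-class pattern and
    keep absorbing the training image the current filter handles worst,
    until every central response is within `tol` of its desired value."""
    train, c = [true_imgs[0]], [1.0]
    h = synthesize_sdf(np.stack(train, axis=1), np.array(c))
    for _ in range(max_iter):
        # Desired central response: 1 for the true class, 0 for rejections.
        pool = [(x, 1.0) for x in true_imgs] + [(x, 0.0) for x in false_imgs]
        errs = [(abs(abs(np.vdot(x, h)) - want), x, want) for x, want in pool]
        err, worst_x, worst_want = max(errs, key=lambda e: e[0])
        if err <= tol:                       # performance criterion met
            break
        train.append(worst_x)
        c.append(worst_want)
        h = synthesize_sdf(np.stack(train, axis=1), np.array(c))
    return h

# Random stand-in "spectra" for three true-class and two false-class patterns.
rng = np.random.default_rng(3)
true_imgs = [rng.standard_normal(64) + 1j * rng.standard_normal(64) for _ in range(3)]
false_imgs = [rng.standard_normal(64) + 1j * rng.standard_normal(64) for _ in range(2)]
h = adaptive_train(true_imgs, false_imgs)
```

The loop stops as soon as all true-class responses are near unity and all rejection responses are near zero, which mirrors the idea of building the filter only from the patterns that actually need it.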

This chapter presents recent advances in the design of adaptive composite correlation filters for robust object recognition. We describe two different design approaches, based on the basic models of constrained and unconstrained filters. We show that the resultant adaptive constrained filters can achieve a high recognition rate with a low computational complexity, by simply using EOC constraints with complex values. Furthermore, unconstrained adaptive filters can be constructed to produce robust recognition in highly noisy conditions. The remainder of the chapter is organized as follows. Section 2 presents a brief review of the most successful composite filters for object recognition. Then, Section 3 describes two proposed algorithms to synthesize adaptive composite filters. Computer simulation results obtained with the proposed adaptive filters are presented in Section 4. These results are discussed and compared in terms of performance metrics with those obtained with existing composite filters in noisy scenes. Finally, Section 5 summarizes our conclusions.

## **2. Composite correlation filters**

In this section, the main strategies for composite correlation filter design are recalled. We consider constrained SDF and MACE filters, as well as unconstrained MACH and OTSDF filters. Basically, composite filters can be used for intraclass distortion-tolerant pattern recognition; i.e., detection of distorted patterns belonging to the same class of objects. Let {*S*} = {*Ti*(*μ*, *ν*)|*i* = 1, ..., *N*} be a set consisting of *N* different training images expressed in the frequency domain, where each one represents a distorted version of the target object *t*(*x*, *y*),

where *T*(*μ*, *ν*) is the Fourier transform of *t*(*x*, *y*). Composite filters must be able to recognize the target and all the distorted versions in {*S*} using a single correlation operation.

#### **2.1 Constrained composite filters**

#### **Synthetic Discriminant Functions (SDF) filter**

An SDF filter can be expressed as a linear combination of the Fourier transformed training images *Ti*(*μ*, *ν*), as follows,

$$H(\mu, \nu) = \sum\_{i=1}^{N} a\_i T\_i(\mu, \nu) \tag{1}$$

where {*ai*|*i* = 1, . . . , *N*} are unknown coefficients that must be chosen to satisfy the inner-product conditions (Vijaya-Kumar, 1992)

$$c\_i = \langle T\_i(\mu, \nu), H(\mu, \nu) \rangle \tag{2}$$

The quantities {*ci*} represent the EOC constraints, that is, prespecified values in the correlation output at the origin of each training image. Let **T** be a matrix with *N* columns and *d* rows (the number of pixels in each training image) where its *i*th column is given by **t***i*, a *d* × 1 vector constructed by placing the elements of *Ti*(*μ*, *ν*) in lexicographical order. Let **a** and **c** respectively represent column vectors of {*ai*} and {*ci*}. In matrix-vector notation, filter *H*(*μ*, *ν*) and constraints {*ci*} can be rewritten as

$$\mathbf{h}\_{\text{SDF}} = \mathbf{T} \mathbf{a} \tag{3}$$

and

$$\mathbf{c}^\* = \mathbf{T}^+ \mathbf{h}\_{\text{SDF}} \tag{4}$$

where superscripts "∗" and "+" represent the complex conjugate and the conjugate transpose, respectively. Combining Eqs. (3) and (4), the solution of the system of equations is **a** = (**T**<sup>+</sup>**T**)<sup>−1</sup>**c**<sup>∗</sup>, and if the matrix **T**<sup>+</sup>**T** is nonsingular the filter solution is

$$\mathbf{h}\_{\rm SDF} = \mathbf{T} \left(\mathbf{T}^+ \mathbf{T}\right)^{-1} \mathbf{c}^\* \tag{5}$$
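Eq. (5) translates directly into a few lines of linear algebra. The following NumPy sketch is our own illustration with random stand-in spectra (no real training images); it synthesizes an SDF filter and checks the constraints of Eq. (4):

```python
import numpy as np

def sdf_filter(T, c):
    # Eq. (5): h_SDF = T (T^+ T)^{-1} c*, with T of shape (d, N) holding
    # the Fourier-transformed training images as columns.
    gram = T.conj().T @ T                   # T^+ T (N x N)
    a = np.linalg.solve(gram, c.conj())     # a = (T^+ T)^{-1} c*
    return T @ a                            # Eq. (3): h_SDF = T a

# Random stand-in spectra for N = 3 training images of d = 256 pixels.
rng = np.random.default_rng(0)
T = rng.standard_normal((256, 3)) + 1j * rng.standard_normal((256, 3))
h = sdf_filter(T, np.ones(3))
# Eq. (4) holds by construction: T^+ h equals c* (here, a vector of ones).
```

Note that only the small *N* × *N* Gram matrix is inverted, never a *d* × *d* matrix.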

#### **Minimum Average Correlation Energy (MACE) filter**

The MACE filter is able to produce sharp correlation peaks by suppressing lateral sidelobes (Mahalanobis et al., 1987). This can be done by minimizing the average correlation energy (ACE) in the filter output, subject to the prespecified EOC constraints. The effect of minimizing the ACE measure is that the resultant correlation function yields values close to zero everywhere except at the central location of the training images, where the EOC constraints occur (Mahalanobis et al., 1987). Let **D** be a *d* × *d* diagonal matrix where the entries along the main diagonal are obtained by computing E{|**t***i*|<sup>2</sup>; *i* = 1, . . . , *N*}, the average power spectra of the training images. In matrix-vector notation, the filter **h**MACE which minimizes

$$\text{ACE} = \mathbf{h}\_{\text{MACE}}^{+} \mathbf{D} \mathbf{h}\_{\text{MACE}} \tag{6}$$

and is subject to meet the EOC constraints

$$\mathbf{c}^\* = \mathbf{T}^+ \mathbf{h}\_{\text{MACE}} \tag{7}$$

is given by (Mahalanobis et al., 1987)

$$\mathbf{h}\_{\rm MACE} = \mathbf{D}^{-1} \mathbf{T} \left(\mathbf{T}^{+} \mathbf{D}^{-1} \mathbf{T}\right)^{-1} \mathbf{c}^{\*} \tag{8}$$
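A minimal sketch of Eq. (8), again with random stand-in spectra of our own making; since **D** is diagonal, it can be stored and inverted as a *d*-vector:

```python
import numpy as np

def mace_filter(T, c):
    # Eq. (8): h_MACE = D^{-1} T (T^+ D^{-1} T)^{-1} c*, where D is the
    # diagonal matrix of average power spectra; we keep only its diagonal.
    D = np.mean(np.abs(T) ** 2, axis=1)     # E{|t_i|^2}, a d-vector
    DinvT = T / D[:, None]                  # D^{-1} T
    gram = T.conj().T @ DinvT               # T^+ D^{-1} T (N x N)
    return DinvT @ np.linalg.solve(gram, c.conj())

rng = np.random.default_rng(1)
T = rng.standard_normal((256, 4)) + 1j * rng.standard_normal((256, 4))
h = mace_filter(T, np.ones(4))
# The EOC constraints of Eq. (7) are still met: T^+ h equals c*.
```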

#### **Multiclass pattern recognition**

#### **Two-class problem**

Assume that there are several distorted versions of a target object {*ti*(*x*, *y*)} and various objects to be discriminated { *fi*(*x*, *y*)}; in other words, a two-class pattern recognition problem. Then, the goal is to design a constrained composite filter to recognize images from the training set of true-class objects (target class), given by

$$\{T\} = \{T\_1(\mu, \nu), T\_2(\mu, \nu), \dots, T\_{N\_T}(\mu, \nu)\}\tag{9}$$

and to reject training images from the false-class (unwanted class), given by

$$\{F\} = \{F\_1(\mu, \nu), F\_2(\mu, \nu), \dots, F\_{N\_F}(\mu, \nu)\} \tag{10}$$

A two-class composite filter can be constructed by combining all of the given training images in a set {*S*} = {*T*} ∪ {*F*}. Afterwards, to solve the two-class pattern recognition problem we can set the filter output as

$$\{c\_i = 1; i = 1, 2, \dots, N\_T\} \tag{11}$$

for the true-class objects, and

$$\{c\_i = 0; i = N\_T + 1, N\_T + 2, \dots, N\_T + N\_F\} \tag{12}$$

for the false-class objects. In this manner, the vector **c** of EOC constraints is given by

$$\mathbf{c} = \begin{bmatrix} 1, 1, \dots, 1, 0, 0, \dots, 0 \end{bmatrix}^T \tag{13}$$

It can be seen that both SDF and MACE filters with equal output correlation peaks can be used for intraclass distortion-tolerant pattern recognition or for interclass pattern recognition. For a two-class constrained composite filter, we can expect that the central correlation peak will be close to unity for the true-class objects and close to zero for objects of the false-class. Moreover, this approach can easily be extended to multi-class problems.
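As an illustration of this two-class behavior, the constraint vector of Eqs. (11)–(13) can be combined with the SDF solution of Eq. (5). The training data below are random stand-ins of our own, used only to show that the central responses come out near unity and zero by construction:

```python
import numpy as np

rng = np.random.default_rng(2)
d, N_T, N_F = 256, 3, 2
T_true = rng.standard_normal((d, N_T)) + 1j * rng.standard_normal((d, N_T))
T_false = rng.standard_normal((d, N_F)) + 1j * rng.standard_normal((d, N_F))

# {S} = {T} U {F}: stack all training spectra as columns of one matrix,
# and build the EOC constraint vector of Eq. (13).
T = np.hstack([T_true, T_false])
c = np.concatenate([np.ones(N_T), np.zeros(N_F)])

# Reuse the SDF solution of Eq. (5): h = T (T^+ T)^{-1} c*.
h = T @ np.linalg.solve(T.conj().T @ T, c.conj())

# Central correlation intensities: close to 1 for the true class and
# close to 0 for the false class.
responses = np.abs(T.conj().T @ h)
```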

#### **Multiclass problem**

Suppose that the true-class subset {*T*} is given by the union of *K* different subsets of training images, as follows

$$\{T\} = \bigcup\_{k=1}^{K} \{T\_k\} \tag{14}$$

**2.2 Unconstrained composite filters**

2

produced by the training images **v***<sup>i</sup>* = **X**<sup>∗</sup>

where the resultant MACH filter is given by

**Generalized MACH (GMACH) filter**

the average training image ¯**v** = **M**∗**h**MACH, that is,

function *J* = |ACH|

and

where

et al., 1987):

**Maximum Average Correlation Height (MACH) filter**

correlation values produced by the training images, as

elements of the training vectors **t***i*, and the average training vector

The MACH filter **h**MACH is designed to maximize the ratio between the intensity of the output average correlation height (ACH) and the average similarity measure (ASM) among training images (Mahalanobis et al., 1994). Hence, the MACH filter is designed to maximize the

Advances in Adaptive Composite Filters for Object Recognition 97

**<sup>m</sup>** <sup>=</sup> <sup>1</sup> *N*

*N*

*dN*

ACH <sup>=</sup> <sup>1</sup>

ASM <sup>=</sup> <sup>1</sup>

In a compact notation we can rewrite the ACH and ASM measures as follows,

ASM = **h**<sup>+</sup>

*N* ∑ *i*=1

*<sup>J</sup>*(**h**MACH) = **<sup>h</sup>**<sup>+</sup>

**<sup>S</sup>** <sup>=</sup> <sup>1</sup> *dN*

Furthermore, the ACH measure can be described as the average of the output central

*N* ∑ *i*=1 **t** +

Additionally, the ASM can be seen as the average error between the full correlation responses

*N* ∑ *i*=1


(**X***<sup>i</sup>* − **M**) (**X***<sup>i</sup>* − **M**)

MACH**mm**+**h**MACH

MACH**Dh**MACH

Thus, filter **h**MACH is obtained by maximizing the following objective function (Mahalanobis

**h**<sup>+</sup>

The GMACH filter **h**GMACH (Alkanhal et al., 2000), can be seen as a trade-off between a filter with EOC constraints and the MACH filter. Note, that the correlation output at the origin for

*N* ∑ *i*=1

/ASM. Let **X***<sup>i</sup>* and **M**, be both *d* × *d* diagonal matrices containing the

**t***<sup>i</sup>* (20)

*<sup>i</sup>* **h**MACH (21)

<sup>2</sup> (22)

∗ (25)

(26)

*<sup>i</sup>* **h**MACH, and the correlation function produced by

ACH = **m**+**h**MACH (23)

**h**MACH = **S**−1**m** (27)

MACH**Sh**MACH (24)

where {*Tk*} is a subset of training images that represents the *k*th true-class of objects to be recognized, which is given by

$$\{T\_k\} = \left\{ T\_i^k(\mu, \nu) \,|\, i = 1, \dots, N\_T \right\} \tag{15}$$

Here, *T<sup>k</sup> <sup>i</sup>* (*μ*, *ν*) is the *i*th Fourier transformed training image, which belongs to the *k*th true-class of objects. For simplicity, we assume that each subset {*Tk*} contains *NT* training images. The set {*S*} of all training images can be constructed as follows

$$\{\mathcal{S}\} = \left\{ \bigcup\_{k=1}^{K} \{T\_k\} \right\} \cup \{F\} \tag{16}$$

According to the SDF approach a constrained filter can be constructed as a linear combination of all training images in {*S*}, subject to satisfy the prespecified EOC constraints {*ci*} (Vijaya-Kumar, 1992). In the basic two-class object recognition problem, we need to set the filter output to yield an intensity value equal to unity for any object that belongs to {*T*}, and to yield an intensity value of zero for any object that belongs to {*F*}; i.e.,

$$\left|c\_{ki}\right|^2 = 1; \text{ for } \left\{T\_i^k(\mu, \nu)\right\} \in \{T\}, \ k = 1, \ldots, K \tag{17}$$

and

$$\left|c\_{(K+1)i}\right|^2 = 0; \text{ for } \left\{T\_i(\mu, \nu)\right\} \in \{F\} \tag{18}$$

Furthermore, to distinguish among objects from different true-classes {*Tk*}, the constraint vales {*ci*} must not only satisfy Eqs. (17) and (18), they must also provide information regarding the specific class of each training image. For this, we propose to use complex values {*ci*} with a magnitude value equal to unity for all, but each with a different prespecified phase value that indicates the class that correspond to each training image. The encoded phase values must be chosen to allow us to associate (in the complex correlation plane of the output) any unknown input patterns to one of the *K* different true-classes. This can be achieved by using the following EOC constraints,

$$\{c\_{i} = \exp\left(i\phi\_{1}\right), \text{ for } i = 1, \dots, N\_{T} \ \}; \forall \left\{T\_{i}^{1}\left(\mu, \nu\right)\right\} \in \{T\_{1}\}$$

$$\{c\_{i} = \exp\left(i\phi\_{2}\right), \text{ for } i = N\_{T} + 1, \dots, 2N\_{T} \ \}; \forall \left\{T\_{i}^{2}\left(\mu, \nu\right)\right\} \in \{T\_{2}\}$$

$$\vdots$$

$$\{c\_{i} = \exp\left(i\phi\_{K}\right), \text{ for } i = (K - 1)\,N\_{T} + 1, \dots, KN\_{T}\ \}; \forall \left\{T\_{i}^{K}\left(\mu, \nu\right)\right\} \in \{T\_{K}\}\tag{19}$$

Here, {*φ<sub>k</sub>* | *k* = 1, . . . , *K*} are prespecified phase values associated with the *k*th true-class of objects {*Tk*}. Observe that by using a constrained composite filter with complex EOC constraints, we satisfy the equal output intensity restrictions imposed by Eqs. (17) and (18), and at the same time we can classify any unknown input pattern from the input scene by comparing the obtained phase values *φ̂<sub>k</sub>* at the coordinates of maximum intensity (correlation peaks) with the prespecified *φ<sub>k</sub>* values previously defined in the filter constraints (Diaz-Ramirez et al., 2012).
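To make the phase-encoding idea concrete, the sketch below (our illustration, not the authors' implementation; the function names and the particular choice *φ<sub>k</sub>* = 2*πk*/*K* are assumptions) builds the constraint vector of Eq. (19) and decodes a correlation peak by its nearest prespecified phase:

```python
import numpy as np

def make_constraints(K, N_T):
    """Constraint vector of Eq. (19): unit magnitude, with the phase
    phi_k = 2*pi*k/K encoding the k-th true-class (an assumed choice)."""
    phis = 2 * np.pi * np.arange(K) / K
    return np.repeat(np.exp(1j * phis), N_T)      # c_i for each training image

def classify_peak(peak_value, K):
    """Assign a complex correlation peak to the class whose prespecified
    phase is closest to the measured peak phase (decoding Eq. (19))."""
    phis = 2 * np.pi * np.arange(K) / K
    # wrap the phase differences into (-pi, pi] before comparing magnitudes
    diffs = np.angle(np.exp(1j * (np.angle(peak_value) - phis)))
    return int(np.argmin(np.abs(diffs)))

c = make_constraints(K=3, N_T=4)                  # 12 constraints, 3 classes
assert np.allclose(np.abs(c) ** 2, 1.0)           # Eqs. (17)-(18): unit intensity
```

Note that equally spaced phases maximize the angular separation between classes, which makes the nearest-phase decision rule as robust as possible to phase noise at the peak.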

#### **2.2 Unconstrained composite filters**


#### **Maximum Average Correlation Height (MACH) filter**

The MACH filter **h**MACH is designed to maximize the ratio between the intensity of the output average correlation height (ACH) and the average similarity measure (ASM) among training images (Mahalanobis et al., 1994). Hence, the MACH filter is designed to maximize the function *J* = |ACH|<sup>2</sup>/ASM. Let **X***<sup>i</sup>* and **M** be *d* × *d* diagonal matrices containing the elements of the training vectors **t***<sup>i</sup>* and of the average training vector

$$\mathbf{m} = \frac{1}{N} \sum\_{i=1}^{N} \mathbf{t}\_i \tag{20}$$

Furthermore, the ACH measure can be described as the average of the output central correlation values produced by the training images, as

$$\text{ACH} = \frac{1}{N} \sum\_{i=1}^{N} \mathbf{t}\_i^+ \mathbf{h}\_{\text{MACH}} \tag{21}$$

Additionally, the ASM can be seen as the average error between the full correlation responses produced by the training images **v***<sup>i</sup>* = **X***<sup>i</sup>*<sup>∗</sup>**h**MACH and the correlation function produced by the average training image **v**¯ = **M**<sup>∗</sup>**h**MACH; that is,

$$\text{ASM} = \frac{1}{dN} \sum\_{i=1}^{N} |\mathbf{v}\_i - \bar{\mathbf{v}}|^2 \tag{22}$$

In a compact notation we can rewrite the ACH and ASM measures as follows,

$$\text{ACH} = \mathbf{m}^{+} \mathbf{h}\_{\text{MACH}} \tag{23}$$

and

$$\text{ASM} = \mathbf{h}\_{\text{MACH}}^{+} \mathbf{S} \mathbf{h}\_{\text{MACH}} \tag{24}$$

where

$$\mathbf{S} = \frac{1}{dN} \sum\_{i=1}^{N} \left( \mathbf{X}\_i - \mathbf{M} \right) \left( \mathbf{X}\_i - \mathbf{M} \right)^\* \tag{25}$$

Thus, filter **h**MACH is obtained by maximizing the following objective function (Mahalanobis et al., 1994):

$$J(\mathbf{h}\_{\rm MACH}) = \frac{\mathbf{h}\_{\rm MACH}^{+} \mathbf{m} \mathbf{m}^{+} \mathbf{h}\_{\rm MACH}}{\mathbf{h}\_{\rm MACH}^{+} \mathbf{S} \mathbf{h}\_{\rm MACH}} \tag{26}$$

where the resultant MACH filter is given by

$$\mathbf{h}\_{\text{MACH}} = \mathbf{S}^{-1} \mathbf{m} \tag{27}$$
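Because **X***<sup>i</sup>*, **M**, and **S** are diagonal, Eq. (27) reduces to an element-wise division in the frequency domain. The sketch below is our illustration (assuming spatial-domain training images, numpy FFT conventions, and a small `eps` to keep the division well defined), not the authors' implementation:

```python
import numpy as np

def mach_filter(train_imgs, eps=1e-8):
    """MACH synthesis, Eq. (27). Because X_i, M, and S are diagonal,
    h_MACH = S^{-1} m is an element-wise division in the frequency domain."""
    T = np.array([np.fft.fft2(t) for t in train_imgs])   # training vectors t_i
    m = T.mean(axis=0)                                   # Eq. (20)
    d = T[0].size
    S_diag = np.mean(np.abs(T - m) ** 2, axis=0) / d     # diagonal of S, Eq. (25)
    return m / (S_diag + eps)                            # eps avoids division by zero

rng = np.random.default_rng(0)
imgs = [rng.random((32, 32)) for _ in range(5)]
h = mach_filter(imgs)
# the central correlation value of a training image is t_i^+ h (cf. Eq. (21))
ach = np.mean([np.vdot(np.fft.fft2(t), h) for t in imgs])
```

In practice the regularization constant `eps` plays the role of an additive noise floor; larger values trade peak sharpness for noise tolerance.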

#### **Generalized MACH (GMACH) filter**

The GMACH filter **h**GMACH (Alkanhal et al., 2000) can be seen as a trade-off between a filter with EOC constraints and the MACH filter. Note that the correlation output at the origin for the *i*th training image is given by

$$\mathbf{c}\_{i} = \mathbf{h}\_{\text{GMACH}}^{+} \mathbf{t}\_{i} \tag{28}$$

Furthermore, the average correlation output at the origin is

$$\bar{c} = \mathbf{h}\_{\text{GMACH}}^{+} \mathbf{m} \tag{29}$$

The output correlation variance can be written as (Alkanhal et al., 2000)

$$\sigma\_{c}^{2} = \frac{1}{N} \sum\_{i=1}^{N} \left| c\_{i} - \bar{c} \right|^{2} = \mathbf{h}\_{\text{GMACH}}^{+} \boldsymbol{\Omega} \mathbf{h}\_{\text{GMACH}} \tag{30}$$

where

$$\boldsymbol{\Omega} = \frac{1}{N} \sum\_{i=1}^{N} \left(\mathbf{t}\_{i} - \mathbf{m}\right) \left(\mathbf{t}\_{i} - \mathbf{m}\right)^{+} \tag{31}$$

is a covariance matrix estimate. The GMACH filter **h**GMACH is designed to maximize the function (Alkanhal et al., 2000)

$$J(\mathbf{h}\_{\text{GMACH}}) = \frac{\left|\bar{c}\right|^2}{\sigma\_c^2} = \frac{\mathbf{h}\_{\text{GMACH}}^+ \mathbf{m} \mathbf{m}^+ \mathbf{h}\_{\text{GMACH}}}{\mathbf{h}\_{\text{GMACH}}^+ \boldsymbol{\Omega} \mathbf{h}\_{\text{GMACH}}} \tag{32}$$

where the resultant filter is

$$\mathbf{h}\_{\text{GMACH}} = \boldsymbol{\Omega}^{-1} \mathbf{m} \tag{33}$$
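A compact numerical sketch of Eqs. (29)–(33) follows (our illustration only; the diagonal load `alpha` is an assumption we add because **Ω** estimated from *N* < *d* training vectors is singular and Eq. (33) then needs regularization):

```python
import numpy as np

def gmach_filter(train_vecs, alpha=1e-3):
    """GMACH-style synthesis, Eq. (33): h = Omega^{-1} m. A diagonal load
    alpha*I is added because Omega built from N < d vectors is singular."""
    T = np.asarray(train_vecs)                 # rows are the vectors t_i
    N, d = T.shape
    m = T.mean(axis=0)                         # Eq. (20)
    D = T - m
    Omega = (D.conj().T @ D) / N               # Eq. (31)
    h = np.linalg.solve(Omega + alpha * np.eye(d), m)
    return h, m, Omega

h, m, Omega = gmach_filter(np.random.default_rng(1).random((6, 16)))
c_bar = np.vdot(h, m)                          # average origin output, Eq. (29)
var = np.vdot(h, Omega @ h).real               # output variance, Eq. (30)
J = abs(c_bar) ** 2 / var                      # objective of Eq. (32)
```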

#### **Optimal trade-off SDF (OTSDF) filter**

Fig. 1. Iterative training procedure to synthesize an adaptive constrained composite filter
In earlier sections, we have seen that most successful composite filters are designed to optimize certain performance criteria, namely ACE, ASM, and ACH. However, some of these metrics are in fact conflicting objectives, for instance ACE and ASM. For example, consider the MACE filter, which produces sharp correlation peaks by optimizing (minimizing) the output ACE. This means that the MACE filter has a great capacity to distinguish between target objects that should be recognized and false patterns that should be rejected. However, it is well known that the MACE filter has a poor tolerance to intraclass distortions, which is characterized by the ASM metric. Therefore, OTSDF filters are designed to strike a compromise among several conflicting measures (Goudail & Refregier, 2004). For instance, an OTSDF filter can be obtained by minimizing the following function (Vijaya-Kumar et al., 1994):

$$\begin{split} \mathcal{J}\left(\mathbf{h}\_{\text{OTSDF}}\right) &= \omega\_1 \text{ACE} + \omega\_2 \text{ASM} - |\text{ACH}| \\ &= \omega\_1 \mathbf{h}\_{\text{OTSDF}}^{+} \mathbf{D} \mathbf{h}\_{\text{OTSDF}} + \omega\_2 \mathbf{h}\_{\text{OTSDF}}^{+} \mathbf{S} \mathbf{h}\_{\text{OTSDF}} - \left| \mathbf{h}\_{\text{OTSDF}}^{+} \mathbf{m} \right| \end{split} \tag{34}$$

where ACE and ASM are functions to be minimized, ACH is a function to be maximized, and *ω*<sub>1</sub><sup>2</sup> + *ω*<sub>2</sub><sup>2</sup> = 1 holds for the trade-off constants. The resultant OTSDF filter is given by (Goudail & Refregier, 2004)

$$\mathbf{h}\_{\text{OTSDF}} = \left(\omega\_1 \mathbf{D} + \omega\_2 \mathbf{S}\right)^{-1} \mathbf{m} \tag{35}$$

We can see that unconstrained filters cannot restrict their correlation responses at the origin of the training images in the same manner that a constrained filter does. Instead, these filters maximize the intensity value produced by the average training image and minimize the intensity response produced by unwanted patterns.
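Since **D** and **S** are both diagonal, Eq. (35) is again an element-wise division in the frequency domain. A minimal sketch (our illustration; `bg_psd`, standing for an assumed background power spectrum used to build **D**, and the value of `w1` are our choices):

```python
import numpy as np

def otsdf_filter(train_imgs, bg_psd, w1=0.7, eps=1e-8):
    """OTSDF synthesis, Eq. (35), with omega_1^2 + omega_2^2 = 1.
    D (background power spectrum, ACE term) and S (intraclass-variation
    spectrum, ASM term) are diagonal, so the inverse is element-wise."""
    w2 = np.sqrt(1.0 - w1 ** 2)
    T = np.array([np.fft.fft2(t) for t in train_imgs])
    m = T.mean(axis=0)                                   # Eq. (20)
    S_diag = np.mean(np.abs(T - m) ** 2, axis=0) / T[0].size
    return m / (w1 * bg_psd + w2 * S_diag + eps)

rng = np.random.default_rng(2)
imgs = [rng.random((32, 32)) for _ in range(5)]
bg_psd = np.abs(np.fft.fft2(rng.random((32, 32)))) ** 2  # assumed |b_g|^2
h = otsdf_filter(imgs, bg_psd)
```

Sliding `w1` toward 1 emphasizes background suppression (sharper, MACE-like peaks), while sliding it toward 0 emphasizes distortion tolerance.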

#### **3. Adaptive composite filter designs**

In Section 2 we described how a basic SDF filter is designed to satisfy the EOC constraints. This means that the filter is only able to control the output correlation points at the central location of the training images within the observed scene. This limited control yields the appearance of high correlation sidelobes over the entire image background. This undesirable property causes a drastic reduction in recognition performance for the SDF filter when it is used in highly cluttered scenes. This problem is addressed by the MACE filter, which yields sharp correlation peaks at the central location of the training images and suppresses correlation sidelobes by minimizing the ACE metric. However, as we saw in Section 2, the MACE filter has a poor tolerance to intraclass distortions. In contrast, the OTSDF filter removes the EOC constraints to gain more control over the output correlation plane. In this manner, the filter can suppress the correlation sidelobes more efficiently and can improve its tolerance to intraclass distortions. This is accomplished because the OTSDF filter optimizes the ACH, ASM, and ACE performance measures. However, note that these metrics are based on the calculation of spatial averages over the complete training set of images. This leads to the synthesis of composite filters which can only yield average performance over several similar applications, assuming stationary conditions.

In this chapter, we are interested in designing composite filters that are optimized in terms of performance metrics for a given set of patterns that are directly related to a particular application problem. First, we analyze the two-class pattern recognition problem, where the training set is given by {*S*} = {*T*} ∪ {*F*}. We assume that the true-class training images {*ti*(*x*, *y*)} ∈ {*T*} are previously chosen by the filter designer and that the false-class images {*fi*(*x*, *y*)} ∈ {*F*} can be given by any known false object to be rejected, or by unknown patterns that have structures similar to those of the target. If information about the background where detection will be carried out is available, the false-class images *fi*(*x*, *y*) can be given by small fragments taken from a synthetic image with statistical properties similar to those of the expected background in the image scene.

Let us define a set {*UF*} that contains all feasible image patterns that can be chosen as false-class images *fi*(*x*, *y*), an extremely vast set given the size and resolution of common digital images. The set {*UF*} can be seen as the universe of feasible training images from which we can obtain subset {*F*}. In this sense, we can see that an optimal subset {*FO*} ⊂ {*UF*} of image patterns must exist, which is the set of false-class images that can be used to synthesize a composite filter that achieves optimal performance; i.e., when {*F*} = {*FO*}. Note that the subset {*FO*} is a priori unknown, and its contents cannot be derived analytically from the problem definition. Therefore, a search and optimization strategy is required to find {*FO*}<sup>2</sup>.

<sup>2</sup> The theoretical goal is to find set {*FO*}. However, in practice the goal is relaxed, instead searching for the best possible approximation to {*FO*}.

In this chapter, the proposal is to use an adaptive iterative algorithm to search for {*FO*}. The first step of the adaptation algorithm is to perform the correlation process between the background scene and a basic composite filter, initially trained with all available versions of the target and known false-class objects. The background function can be either described deterministically as an image or by a stochastic process. Next, we search for the coordinates of a point in the output correlation plane that allows us to improve the performance of the filter. The goal is to incorporate a segment, or region, that is cropped from the synthetic background around a central point as a new false-class image in {*F*}; call this new image taken from the background *fn*(*x*, *y*), which has a support region similar to that of the target image class. The new image *fn*(*x*, *y*) should provide the maximum performance increase, based on a chosen performance criterion, when compared to all other possible background segments that could have been chosen. After including *fn*(*x*, *y*) in {*F*}, a new composite filter is synthesized. This procedure is iteratively repeated until a prespecified performance level for the filter is reached. Note that the suggested training procedure can be used to synthesize adaptive composite filters based on constrained or unconstrained models. The general steps of the training procedure are summarized as follows:

• STEP 1: Add all available training images to the corresponding subset {*T*} or {*F*}, and construct the training set {*S*} = {*T*} ∪ {*F*}.

• STEP 2: Synthesize a composite filter trained for {*S*} using a constrained or unconstrained filter model.

• STEP 3: Carry out the correlation between the current composite filter and a synthetic image of the background.

• STEP 4: Calculate the performance metrics of the composite filter and set them as the current performance level of the filter. If the performance level of the filter is greater than a prespecified value, the procedure is finished. Otherwise, go to the next step.

• STEP 5: Find the maximum intensity value in the output correlation plane, and around this point extract a new training image to be rejected from the background. The region of support of this new training image is similar to that of the reference image of the target.

• STEP 6: Include the new false-class image in set {*F*} and update set {*S*}. Next, go to STEP 2.

Fig. 2. Iterative training procedure to synthesize an adaptive unconstrained composite filter
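The iterative procedure can be sketched as a simple loop (a schematic illustration only: the filter-synthesis routine `synthesize` is left as a user-supplied parameter standing for any constrained or unconstrained design, and the discrimination capability of Eq. (36) is used as the assumed performance metric):

```python
import numpy as np

def correlate(h, scene):
    """Correlation of a frequency-domain filter h with a spatial scene."""
    return np.fft.ifft2(np.fft.fft2(scene) * np.conj(h))

def dc(corr, target_mask):
    """Discrimination capability of Eq. (36)."""
    inten = np.abs(corr) ** 2
    return 1.0 - inten[~target_mask].max() / inten[target_mask].max()

def adaptive_train(true_imgs, background, target_mask, synthesize,
                   dc_goal=0.9, max_iters=20, half=8):
    """STEPS 1-6: grow the false-class set {F} with background patches
    cropped around the highest sidelobe until the DC goal is met."""
    false_imgs = []                                    # subset {F}
    for _ in range(max_iters):
        h = synthesize(true_imgs, false_imgs)          # STEP 2
        corr = correlate(h, background)                # STEP 3
        if dc(corr, target_mask) >= dc_goal:           # STEP 4: done?
            break
        inten = np.where(target_mask, 0.0, np.abs(corr) ** 2)
        y, x = np.unravel_index(inten.argmax(), inten.shape)   # STEP 5
        patch = np.roll(background, (half - y, half - x),
                        axis=(0, 1))[:2 * half, :2 * half]
        false_imgs.append(patch)                       # STEP 6, back to STEP 2
    return h
```

Any other discrimination metric (PCE, PSR) can be substituted for `dc` without changing the loop structure.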


#### **Adaptive constrained filter design**

An adaptive constrained filter can be constructed by training a simple SDF filter with the iterative procedure described above. First, all available views of the target are included in the true-class training set {*T*}. Next, we construct the matrix **T** and the vector of constraints **c**, and a basic SDF filter **h***sdf* is synthesized using Eq. (5). At this point, the **h***sdf* filter is able to recognize all objects in subset {*T*} with a single correlation operation. However, the filter may produce high correlation sidelobes when the target is embedded into a highly cluttered background. Nonetheless, we can train the filter **h***sdf* to distinguish among the different views of the target and the background by optimizing the discrimination capability (DC) of the filter. The DC can be formally defined as follows (Yaroslavsky, 1993):

$$\text{DC} = 1 - \frac{\left|\mathbf{c}\_{\text{max}}^{B}\right|^{2}}{\left|\mathbf{c}\_{\text{max}}^{T}\right|^{2}}\tag{36}$$

where *cB max* <sup>2</sup> is the maximum intensity value in the output correlation plane over the background area, and *cT max* <sup>2</sup> is the maximum intensity value in the output correlation plane over the area occupied by the target. The background area and the target area are complementary. A filter with a DC value close to unity possesses a good capacity to

(a) (b) (c) (d)

Advances in Adaptive Composite Filters for Object Recognition 103

(e) (f) (g) (h)

scene shown in (a), (f) for scene shown in (b), (g) for scene shown in (c), (h) for scene shown

power spectrum vector of the representative image of the background. Note that the objective function increases when both of the ACE and ASM metrics are minimized and when the ACH metric is maximized. If the value of Eq. (37) is greater than a desired value then the training procedure is finished. Otherwise, we search for coordinates in the output correlation plane (between **h***otsd f* and the background image) that achieves the maximum improvement of the objective function. These coordinates are the center of the background region that is extracted and included as a new training image. This new training image is included in the false set {*F*} and the matrix **D** is updated; finally a new filter **h***otsd f* is constructed. This cycle can be

In this section, we analyze and discuss the simulation performance of the proposed adaptive filters for object recognition. These results are compared with those obtained with conventional MACE (Mahalanobis et al., 1987) and MACH (Mahalanobis et al., 1994) composite filters. The performance of the composite filters is evaluated in terms of recognition performance and location accuracy. Recognition performance is given by discrimination capability (see Eq. (36)), whereas location accuracy is characterized by the location errors

*<sup>n</sup>* = 16/256. Output correlation intensity plane obtained with ACF: (e) for

*<sup>n</sup>* = 2/256, (b) *σ*<sup>2</sup>

**b***<sup>g</sup>*  *<sup>n</sup>* = 4/256, (c)

<sup>2</sup> which is the

Fig. 4. Example of input test scene with noise variance of: (a) *σ*<sup>2</sup>

continued until a designed trade-off performance is obtained.

where **<sup>D</sup>***bg* is a diagonal matrix where the main diagonal is given by

*σ*2

in (d).

*<sup>n</sup>* = 8/256, (d) *σ*<sup>2</sup>

**4. Experimental results**

distinguish between targets and unwanted objects. Negative values of DC indicate that the filter is unable to recognize any target. Note that other discrimination metrics can be used in the training procedure; for instance, the peak to correlation energy (PCE) (Vijaya-Kumar & Hassebrook, 1990) and the peak to sidelobe (PSR) ratio (Kerekes & Vijaya-Kumar, 2008). To measure the DC of the filter we carry out the correlation process between **h***sdf* and a synthetic image of the background with similar statistical properties to those of the real background, then we calculate the DC using Eq. (36). If the DC of the **h***sdf* filter is greater than a prespecified value the training procedure is finished. Otherwise, we search for the coordinates of the highest sidelobe in the output correlation plane between **h***sdf* and the background image. These coordinates are set as the origin, and around the origin we construct a training image form the background. This new training image is included in the false-class subset {*F*} and a new **h***sdf* filter is synthesized to recognize the object patterns in {*T*} and reject the object patterns in {*F*}. This cycle can be continued until a desired DC value is reached. The training algorithm to synthesize an adaptive constrained composite filter is presented in Fig. 1.

#### **Adaptive unconstrained filters**

distinguish between targets and unwanted objects. Negative values of DC indicate that the filter is unable to recognize any target. Note that other discrimination metrics can be used in the training procedure; for instance, the peak-to-correlation energy (PCE) (Vijaya-Kumar & Hassebrook, 1990) and the peak-to-sidelobe ratio (PSR) (Kerekes & Vijaya-Kumar, 2008). To measure the DC of the filter, we carry out the correlation between **h***sdf* and a synthetic image of the background with statistical properties similar to those of the real background, and then calculate the DC using Eq. (36). If the DC of the **h***sdf* filter is greater than a prespecified value, the training procedure is finished. Otherwise, we search for the coordinates of the highest sidelobe in the output correlation plane between **h***sdf* and the background image. These coordinates are set as the origin, and around the origin we construct a training image from the background. This new training image is included in the false-class subset {*F*}, and a new **h***sdf* filter is synthesized to recognize the object patterns in {*T*} and reject the object patterns in {*F*}. This cycle is repeated until the desired DC value is reached. The training algorithm used to synthesize an adaptive constrained composite filter is presented in Fig. 1.

Fig. 3. Sample views of the target object.

Fig. 4. Example of input test scene with noise variance of: (a) *σ*<sub>n</sub><sup>2</sup> = 2/256, (b) *σ*<sub>n</sub><sup>2</sup> = 4/256, (c) *σ*<sub>n</sub><sup>2</sup> = 8/256, (d) *σ*<sub>n</sub><sup>2</sup> = 16/256. Output correlation intensity plane obtained with ACF: (e) for scene shown in (a), (f) for scene shown in (b), (g) for scene shown in (c), (h) for scene shown in (d).

**Adaptive unconstrained filters**

An unconstrained adaptive composite filter can be constructed by training a basic OTSDF filter and optimizing several performance criteria. Note that since the OTSDF filter is not restricted to satisfy hard EOC constraints, it has more freedom to concurrently optimize multiple criteria. The flow diagram of the proposed iterative algorithm is presented in Fig. 2. The algorithm begins by constructing the subset {*T*} with all available views of the target objects. Next, we create the mean vector of the training images **m** (see Eq. (20)) and the matrix **S** using Eq. (25); then a basic OTSDF filter is synthesized following Eq. (35). The diagonal matrix **D** required in Eq. (35) can be constructed using all available known patterns that ought to be rejected; otherwise **D** is zero. The next step of the algorithm is to carry out the correlation between the current filter **h***otsdf* and a synthetic image that is representative of the background. Afterwards, we evaluate the performance of the filter using the following objective function:

$$J(\mathbf{h}) = \frac{|\text{ACH}|^2}{\text{ACE}\_{bg} + \text{ASM}} = \frac{\mathbf{h}\_{otsdf}^{+}\mathbf{m}\mathbf{m}^{+}\mathbf{h}\_{otsdf}}{\mathbf{h}\_{otsdf}^{+}\mathbf{D}\_{bg}\mathbf{h}\_{otsdf} + \mathbf{h}\_{otsdf}^{+}\mathbf{S}\mathbf{h}\_{otsdf}} \tag{37}$$

where **D***bg* is a diagonal matrix whose main diagonal is given by |**b***g*|<sup>2</sup>, the power-spectrum vector of the representative background image. Note that the objective function increases when the ACE and ASM metrics are minimized and when the ACH metric is maximized. If the value of Eq. (37) is greater than a desired value, the training procedure is finished. Otherwise, we search for the coordinates in the output correlation plane (between **h***otsdf* and the background image) that achieve the maximum improvement of the objective function. These coordinates become the center of a background region that is extracted and included as a new training image in the false-class subset {*F*}; the matrix **D** is then updated and a new filter **h***otsdf* is constructed. This cycle is repeated until the desired trade-off performance is obtained.
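To make the update loop concrete, the following Python/NumPy sketch mimics the unconstrained training procedure described above. It is an illustrative toy rather than the authors' implementation: `synthesize_otsdf` is a simplified frequency-domain stand-in for the closed form of Eq. (35) (with the diagonal matrices **D** and **S** represented by their spectra), `objective` only mirrors the structure of Eq. (37), and the window size `win` and threshold `j_goal` are hypothetical parameters.

```python
import numpy as np

def correlate(h_freq, scene):
    """Circular correlation of a frequency-domain filter with a scene."""
    return np.real(np.fft.ifft2(np.fft.fft2(scene) * np.conj(h_freq)))

def synthesize_otsdf(targets, rejects, eps=1e-6):
    """Simplified stand-in for Eq. (35): h = (D + S)^-1 m, with the
    diagonal matrices represented by their spectra."""
    T = np.stack([np.fft.fft2(t) for t in targets])
    m = T.mean(axis=0)                           # mean training spectrum (Eq. (20))
    S = np.mean(np.abs(T - m) ** 2, axis=0)      # spectral-variance proxy for S (Eq. (25))
    D = sum(np.abs(np.fft.fft2(r)) ** 2 for r in rejects) if rejects else 0.0
    return m / (D + S + eps)

def objective(h, targets, background):
    """Structure of Eq. (37): |ACH|^2 / (ACE_bg + ASM)."""
    T = np.stack([np.fft.fft2(t) for t in targets])
    m = T.mean(axis=0)
    ach = np.abs(np.vdot(m, h)) ** 2             # |m^+ h|^2
    ace = np.sum(np.abs(np.fft.fft2(background)) ** 2 * np.abs(h) ** 2)
    asm = np.sum(np.mean(np.abs(T - m) ** 2, axis=0) * np.abs(h) ** 2)
    return ach / (ace + asm)

def train_auf(targets, background, j_goal, max_iter=20, win=8):
    """Iterative unconstrained training: reject the strongest background
    response until the objective reaches j_goal."""
    rejects = []
    for _ in range(max_iter):
        h = synthesize_otsdf(targets, rejects)
        if objective(h, targets, background) >= j_goal:
            break
        c = correlate(h, background)
        y, x = np.unravel_index(np.argmax(c), c.shape)  # strongest sidelobe
        shifted = np.roll(background, (-y, -x), axis=(0, 1))
        mask = np.zeros_like(background)
        mask[:win, :win] = 1.0                          # small region around it
        rejects.append(shifted * mask)                  # new false-class image
    return h
```

Each failed iteration cuts out the background region that responds most strongly and appends it to the false-class set, in the spirit of the update rule of Fig. 2.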

#### **4. Experimental results**

| Filter | *σ*<sub>n</sub><sup>2</sup> = 2/256 | *σ*<sub>n</sub><sup>2</sup> = 4/256 | *σ*<sub>n</sub><sup>2</sup> = 6/256 | *σ*<sub>n</sub><sup>2</sup> = 8/256 |
|--------|--------------------------|--------------------------|---------------------------|---------------------------|
| AUF  | DC=0.87±0.02, LE=0.8±0.21 | DC=0.82±0.06, LE=1.3±0.3   | DC=0.77±0.07, LE=2.8±1.3    | DC=0.68±0.11, LE=3.2±1.8   |
| MACH | DC=0.71±0.04, LE=2.1±0.25 | DC=0.66±0.02, LE=2.95±0.67 | DC=0.49±0.04, LE=9.73±7.06  | DC=0.38±0.09, LE=13.37±9.9 |
| MACE | DC=0.57±0.11, LE=6.03±0.88 | DC=0.41±0.18, LE=14.23±0.02 | DC=0.23±0.16, LE=23.1±11.66 | DC=0.18±0.25, LE=37.2±18.1 |

Table 2. DC and LE performance with 95% confidence of AUF, MACE and MACH filters while noise variance *σ*<sub>n</sub><sup>2</sup> is changed.

In this section, we analyze and discuss the simulated performance of the proposed adaptive filters for object recognition. These results are compared with those obtained with the conventional MACE (Mahalanobis et al., 1987) and MACH (Mahalanobis et al., 1994) composite filters. The performance of the composite filters is evaluated in terms of recognition performance and location accuracy. Recognition performance is given by the discrimination capability (see Eq. (36)), whereas location accuracy is characterized by the location errors (LE) defined by (Kober & Campos, 1996):

$$\text{LE} = \left[ (\tau\_{\text{x}} - \hat{\tau}\_{\text{x}})^2 + (\tau\_{\text{y}} - \hat{\tau}\_{\text{y}})^2 \right]^{1/2} \tag{38}$$

where *τx*,*τ<sup>y</sup>* and *τ*ˆ*x*,*τ*ˆ*<sup>y</sup>* are the exact and estimated target coordinates, respectively. *τx*,*τ<sup>y</sup>* are assumed to be known, whereas *τ*ˆ*x*,*τ*ˆ*<sup>y</sup>* are estimated from correlation-peak location. The target is a flying bird whose sample views are shown in Fig. 3, which were extracted from a real video sequence. The input scene is defined with a non-overlapping signal model (Javidi & Wang, 1994; Kober et al., 2000) as follows,

$$f(x, y) = t\_k\left(x - \tau\_{x\_k}, y - \tau\_{y\_k}\right) + \left[1 - w\_k\left(x - \tau\_{x\_k}, y - \tau\_{y\_k}\right)\right] b(x, y) + n(x, y) \tag{39}$$

where *tk*(*x*, *y*) represents the *k*th view of the target, *τxk*, *τyk* are random variables representing the unknown coordinates of the target within the scene, *b*(*x*, *y*) is the background, *n*(*x*, *y*) is zero-mean additive noise with variance *σ*<sub>n</sub><sup>2</sup>, and *wk*(*x*, *y*) is the region of support of *tk*(*x*, *y*). The input scene can thus be interpreted as a view of the target embedded into a background at unknown coordinates and corrupted with additive noise. In our experiments, we use monochrome images of size 400×400 pixels. The signal range is [0, 1] with 256 quantization levels. The size of the target is about 120×95 pixels, with mean value *μt* = 0.354 and standard deviation *σt* = 0.237. The background image has mean value *μb* = 0.73 and standard deviation *σb* = 0.21. Fig. 4 (a)-(d) shows examples of the input test scene for different positions of the target and different amounts of noise.

First, we design an adaptive constrained filter (ACF) trained to recognize the five views of the target shown in Fig. 3, using the iterative algorithm of Fig. 1. In the design process we use a different background image, with statistical properties similar to those of the one used during the recognition experiments. Before the first iteration the DC value of the ACF is negative; however, after 31 iterations of the adaptation process the ACF reaches DC=0.95, which implies that a high level of control over the correlation plane of the input scene can be achieved. Fig. 5 shows the performance of the ACF during the design process in terms of the DC value versus the iteration index. To illustrate the performance of the proposed method, Fig. 4 (a)-(d) show four test scenes and Fig. 4 (e)-(h) present the output intensity planes obtained by the ACF on each scene.

Fig. 5. DC performance of ACF vs iteration index during training procedure.

| Filter | *σ*<sub>n</sub><sup>2</sup> = 2/256 | *σ*<sub>n</sub><sup>2</sup> = 4/256 | *σ*<sub>n</sub><sup>2</sup> = 8/256 | *σ*<sub>n</sub><sup>2</sup> = 16/256 |
|--------|--------------------------|--------------------------|---------------------------|---------------------------|
| ACF  | DC=0.93±0.06, LE=0         | DC=0.91±0.07, LE=0         | DC=0.90±0.06, LE=0          | DC=0.87±0.06, LE=0          |
| MACH | DC=0.92±0.01, LE=0.02±0.02 | DC=0.85±0.01, LE=0.02±0.02 | DC=0.68±0.04, LE=9.1±8.06   | DC=0.41±0.04, LE=25.2±9.2   |
| MACE | DC=0.9±0.01, LE=0.02±0.02  | DC=0.81±0.03, LE=1.95±0.37 | DC=0.68±0.04, LE=14.57±11.06 | DC=0.51±0.04, LE=32.27±14.9 |

Table 1. DC and LE performance with 95% confidence of ACF, MACE and MACH filters while noise variance *σ*<sub>n</sub><sup>2</sup> is changed.

Fig. 6. Normalized performance of AUF in terms of objective function *J*(**h***AUF*) vs iteration index during training procedure.
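For concreteness, the scene model of Eq. (39) and the location-error measure of Eq. (38) can be sketched in a few lines of Python/NumPy. This is a minimal illustration, not the authors' code: a plain matched filter stands in for the composite filters, and all image sizes and coordinates are hypothetical.

```python
import numpy as np

def embed_target(background, target, mask, ty, tx, noise_var, rng):
    """Non-overlapping signal model, Eq. (39):
    f = t_k (shifted) + (1 - w_k (shifted)) * b + n."""
    H, W = background.shape
    t_full = np.zeros((H, W))
    w_full = np.zeros((H, W))
    h, w = target.shape
    t_full[ty:ty + h, tx:tx + w] = target        # shifted target view
    w_full[ty:ty + h, tx:tx + w] = mask          # its region of support w_k
    noise = rng.normal(0.0, np.sqrt(noise_var), (H, W))
    return t_full + (1.0 - w_full) * background + noise

def locate(scene, template):
    """Estimate target coordinates from the correlation-peak location
    (matched-filter stand-in for the composite filters)."""
    S = np.fft.fft2(scene)
    Tc = np.conj(np.fft.fft2(template, s=scene.shape))
    c = np.real(np.fft.ifft2(S * Tc))
    return np.unravel_index(np.argmax(c), c.shape)

def location_error(true_yx, est_yx):
    """Eq. (38): Euclidean distance between exact and estimated coordinates."""
    return float(np.hypot(true_yx[0] - est_yx[0], true_yx[1] - est_yx[1]))
```

In the actual experiments the correlation is of course performed with the trained composite filters; the matched filter here only serves to close the loop from scene synthesis to LE measurement.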


We can see one sharp correlation peak in each output intensity plane, indicating the presence of the target at the correct position. Moreover, observe that the output-correlation intensity values in the background area are very low in all the tests. Next, we compare the recognition performance of all considered composite filters when different views of the target are embedded into the background at unknown coordinates and the variance of the additive noise *σ*<sub>n</sub><sup>2</sup> is changed. To guarantee correct statistical results, 120 statistical trials of each experiment were carried out, for different views of the target and realizations of the random noise processes. The performance results in terms of DC and LE, with 95% confidence, are presented in Table 1. One can observe that the proposed ACF yields the best results in terms of DC, and that no location errors occurred; the proposed ACF is therefore robust both to additive noise and to disjoint background noise.

Now, we design an adaptive unconstrained filter (AUF) trained to recognize the five views of the target, including rotated versions from -10 to 10 degrees in increments of two degrees and scaled versions with scale factors of 0.8 and 1.2. In this case, the true-class training set {*T*} contains 70 training images. The AUF was synthesized using the iterative training algorithm shown in Fig. 2, reaching its maximum value of the objective function *J*(**h***AUF*) (see Eq. (37)) after 16 iterations. The normalized performance of the AUF during the design process, in terms of *J*(**h***AUF*) versus the iteration index, is shown in Fig. 6. To illustrate the performance of the AUF in recognizing geometrically distorted views of the target, Fig. 7 (a)-(d) exhibit several input test scenes containing a distorted version of the target over the background at unknown coordinates. The output intensity planes obtained with the AUF for each input scene are presented in Fig. 7 (e)-(h). It can be seen that the distorted target is accurately located in each scene by the adaptive filter.

Fig. 7. Examples of input test scenes with geometrically distorted versions of the target and additive noise variance *σ*<sub>n</sub><sup>2</sup> = 2/256. (a) Target rotated by -10 degrees, (b) target rotated by 10 degrees, (c) target with size enlarged by 20%, (d) target with size reduced by 20%. Output correlation intensity plane obtained with AUF: (e) for scene shown in (a), (f) for scene shown in (b), (g) for scene shown in (c), (h) for scene shown in (d).

Next, we test the performance of the AUF in recognizing geometrically distorted views of the target embedded within noisy scenes. To guarantee correct statistical results, 120 statistical trials of each experiment were carried out, for different positions, rotations, and scale changes of the target (within the training intervals) and realizations of the random noise processes. In each trial, we randomly choose a geometrically distorted view of the target, given either by a rotated version within the range of [-10, 10] degrees or by a scaled version within the range of [0.8, 1.2] scale factors. The distorted target is embedded into the background at unknown coordinates and the scene is corrupted with additive noise. The constructed scene is then correlated with the composite filters, and the DC and LE metrics are calculated. The results are summarized in Table 2: the proposed AUF yields the best results in terms of DC and LE, whereas the MACE filter yields the worst.

Finally, the simulation results suggest that both the ACF and the AUF possess very good discrimination capability, outperforming the conventional MACE and MACH filters in all our tests. Moreover, the ACF is more robust than the AUF with respect to additive noise and also yields better location accuracy. In contrast, the AUF is more tolerant in recognizing geometrically distorted views embedded into a background.

#### **5. Conclusions**

In summary, this chapter presents an iterative approach to synthesize adaptive composite correlation filters for object recognition. The approach can be used to monotonically improve the quality of a simple composite filter, in terms of quality metrics, using all available information about the target object to be recognized and about the false patterns to be rejected, such as the background. Given a subset of true-class training images, the proposed approach designs the impulse response of an optimized adaptive filter with respect to a particular performance criterion using an incremental search-based strategy. We designed an adaptive constrained filter with the suggested iterative algorithm by optimizing the discrimination capability. According to the simulation results, the proposed adaptive constrained filter proved to be very robust in recognizing different views of a target within an input scene corrupted with additive noise. Moreover, the filter exhibits high levels of discrimination capability and location accuracy when compared with the conventional MACE and MACH formulations. Furthermore, we synthesized an adaptive unconstrained composite filter optimized with respect to a proposed objective function based on the ACH, ACE, and ASM metrics. Here again, the experimental results suggest that the adaptive unconstrained filter provides robust detection of geometrically distorted versions of the target when it is embedded within a highly cluttered background.

Finally, we can envision several lines of future research derived from the algorithms and methods presented here. First, future experimental tests should consider real-world scenarios and applications to validate the usefulness of these filters in applied domains. Second, while the adaptive design process presented here has shown promising
