**2. General HONN filter's design and implementation**

Advances in Object Recognition Systems

input through the NNET in parallel due to the parallel input sources. However, to allow easier implementation, we chose the former design of the NNET.

Let us assume there are three training images of a car, of size [100×100] ([1×10,000] in vector form) and of different angles of view, to pass through the NNET. The chosen first design (see Fig. 1) consists of one input source used for all the training images. The input source consists of 10,000 (i.e. 1×100×100) input neurons, equal to the size of each training image (in vector form). Each layer needs, by definition, to have the same input connections to each of its hidden neurons. The network of Fig. 1 is referred to as a four-layer network, since there are three hidden layers (shown here aligned under each other) and one output layer. The input layer does not contain neurons with activation functions and so is omitted in the numbering of the layers. Each of the hidden layers has only one hidden neuron. Though the network is initially fully connected to the input layer, during the training stage only one hidden layer is connected for each training image presented to the NNET. Fig. 1 is thus not a contiguous three (hidden) layer network during training, which is why the distinction is made.

Fig. 1. Architecture of the selected artificial NNET block of the HONN filter.

The novel design of the NNET architecture of the G-HONN system is implemented as a feedforward multi-layer architecture trained with a backpropagation algorithm. It has a single input source (as explained in the previous section) of input neurons equal to the size of the training image in vector form. In effect, for the training image x_{i=1…N} of size [m×n], there are m×n input neurons in the single input source. The input weights are fully connected from the input layer to the hidden layers. There are N input weights w_i, proportional to the size of the training set. The number of the hidden layers, N_l, is equal to the number of the images of the training set, N:

$$\mathbf{i} = \mathbf{1}, \mathbf{2}, \mathbf{3}, \dots, \mathbf{N} \tag{7}$$

$$\mathbf{N}\_{\mathbf{l}} = \mathbf{N} \tag{8}$$

Each hidden layer consists of a single neuron. The layer weights are fully connected to the output layer. If we set the output layer to have a single output neuron, then the number of the layer weights, N_lw, equals the number of the training images, N:

$$\mathbf{N}\_{\rm lw} = \mathbf{N} \times \mathbf{N}\_{\rm opn} \tag{9}$$

where N_opn = 1 is the number of the output neurons. There are bias connections to each one of the hidden layers:

$$\mathbf{N}\_{\rm b} = \mathbf{N}\_{\rm l} \tag{10}$$

where Nb is the number of the bias connections. But from Eq. (8), Eq. (10) becomes:

$$\mathbf{N}\_{\mathbf{b}} = \mathbf{N} \tag{11}$$

Assuming there is only a single output neuron in the output layer, then there is only one target connection for that output neuron.
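As a minimal numerical sketch of Eqs. (7)-(11), assuming the three [100×100] car images of Fig. 1 and a single output neuron (toy numbers, not from the chapter's experiments), the connection counts work out as:

```python
# Connection counts of the G-HONN NNET for a hypothetical training set:
# N = 3 car images of size 100x100 and a single output neuron.
N = 3                   # number of training images, i = 1, 2, ..., N   (Eq. 7)
m, n = 100, 100         # image size

input_neurons = m * n   # one input neuron per pixel of the vectorised image
N_l = N                 # hidden layers, one per training image         (Eq. 8)
N_opn = 1               # single output neuron
N_lw = N * N_opn        # layer weights to the output layer            (Eq. 9)
N_b = N_l               # one bias connection per hidden layer   (Eqs. 10-11)

print(input_neurons, N_l, N_lw, N_b)  # 10000 3 3 3
```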

Performance Analysis of the Modified-Hybrid Optical Neural Network Object Recognition System Within Cluttered Scenes

We apply the Nguyen-Widrow (Nguyen & Widrow, 1989; Nguyen & Widrow, 1990) initialisation algorithm for setting the initial values of the input weights, the layer weights and the biases. The transfer function of the hidden layers is set as the Log-Sigmoidal function. When a new training image is presented to the NNET we leave connected the input weights of only one of the hidden neurons. In order not to upset any previous learning of the rest of the hidden layer neurons, we do not alter their weights when the new image is input to the NNET. It is emphasised that there is no separate feature extraction stage (The Mathworks, 2008; Talukder & Casasent, 1999; Casasent et al., 1998) applied to the training set images. To achieve faster learning we used a modified steepest descent (Looney, 1997; The Mathworks, 2008) backpropagation algorithm based on heuristic techniques. This adaptive training algorithm updates the weights and bias values according to the gradient descent momentum and an adaptive learning rate:

$$
\Delta\mathbf{w}\left(\mathbf{i},\mathbf{i}+\mathbf{1}\right) = \mu \times \Delta\mathbf{w}\left(\mathbf{i}-\mathbf{1},\mathbf{i}\right) + \alpha \times \mu \times \frac{\Delta\mathbf{P}\_{\mathbf{f}}}{\Delta\mathbf{w}\left(\mathbf{i}+\mathbf{1},\mathbf{i}\right)}\tag{12}
$$

$$
\Delta \mathbf{b} \left( \mathbf{i}, \mathbf{i} + \mathbf{1} \right) = \mu \times \Delta \mathbf{b} \left( \mathbf{i} - \mathbf{1}, \mathbf{i} \right) + \alpha \times \mu \times \frac{\Delta \mathbf{P}\_{\mathbf{f}}}{\Delta \mathbf{b} \left( \mathbf{i} + \mathbf{1}, \mathbf{i} \right)} \tag{13}
$$

$$\alpha = \begin{cases} \alpha + \varepsilon & \text{if } \Delta \mathbf{P}\_{\mathbf{f}} < 0 \\ \text{no change} & \text{if } 0 < \Delta \mathbf{P}\_{\mathbf{f}} \text{ and } \mathbf{P}\_{\mathbf{f}} \le \max\left(\mathbf{P}\_{\mathbf{f}}\right) \\ \alpha - \varepsilon & \text{if } \mathbf{P}\_{\mathbf{f}} > \max\left(\mathbf{P}\_{\mathbf{f}}\right) \end{cases} \tag{14}$$

where now the variable i is the iteration index of the network and is updated every time all the training set images pass through the NNET. Δw is the update function of the input and layer weights, Δb is the update function of the biases of the layers and μ is the momentum constant. The momentum (Looney, 1997; Haykin, 1999; Beale & Jackson, 1990; The Mathworks, 2008) allows the network to respond not only to the local gradient, but also to recent trends in the error surface. Thus, it acts like a low-pass filter by removing the small features in the error surface of the NNET. The employment of momentum in the training algorithm allows the network not to get stuck in a shallow local minimum, but to slide through such a minimum. P_f is the performance function, usually set to be the mean square error (mse) (Looney, 1997; Haykin, 1999), and ΔP_f is the derivative of the performance function. The learning rate is indicated with the letter α. It adapts iteratively based on the derivative of the performance function, ΔP_f. In effect, if there is a decrease in P_f, then the learning rate is increased by the constant ε. If P_f increases but does not take a value higher than the maximum allowed value of the performance function, max(P_f), then the learning rate does not change. If P_f increases more than max(P_f), then the learning rate decreases by the constant ε. The layer weights remain connected to all the hidden layers for the whole training set and throughout the entire training session.
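The adaptive scheme of Eqs. (12)-(14) can be sketched as follows. This is our illustration, not the authors' code: the gradient term is written in the standard momentum form (the chapter's Eqs. (12)-(13) additionally scale it by μ), and the toy objective and the constants μ, α, ε and the allowed-maximum ratio are assumptions:

```python
# Gradient descent with momentum and an adaptive learning rate, in the spirit
# of Eqs. (12)-(14): raise alpha while the performance P_f improves, lower it
# when P_f degrades past the allowed maximum, otherwise leave it unchanged.

def train(w0, grad, perf, mu=0.9, alpha=0.05, eps=0.002, max_ratio=1.04,
          iters=300):
    w, dw, prev_p = w0, 0.0, perf(w0)
    for _ in range(iters):
        dw = mu * dw - alpha * grad(w)      # momentum term plus gradient step
        w = w + dw
        p = perf(w)
        if p < prev_p:                      # P_f decreased: raise the rate
            alpha += eps
        elif p > max_ratio * prev_p:        # P_f grew past the allowed maximum
            alpha = max(alpha - eps, 1e-6)  # lower the rate
        # otherwise the learning rate is left unchanged
        prev_p = p
    return w

# Toy problem: minimise the squared error P_f(w) = (w - 3)^2.
w_final = train(0.0, grad=lambda w: 2.0 * (w - 3.0),
                perf=lambda w: (w - 3.0) ** 2)
```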

Hence, now that we have described the design and implementation of the G-HONN filter (object recognition system) we can proceed with a detailed description of the modified-HONN filter.
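Before moving on, the selective-connection training scheme described above, where only the hidden layer associated with the presented image stays connected, can be sketched as follows (our illustration, with a toy Hebbian-like update standing in for backpropagation):

```python
import numpy as np

# When training image i is presented, only hidden neuron i keeps its input
# weights connected and updated; the remaining neurons' weights are frozen,
# so their previous learning is not disturbed.
rng = np.random.default_rng(0)
N, pixels = 3, 16                    # toy set: 3 images of 4x4 pixels
images = rng.random((N, pixels))
W = rng.random((N, pixels))          # one input-weight vector per hidden neuron

def logsig(x):                       # Log-Sigmoidal transfer function
    return 1.0 / (1.0 + np.exp(-x))

for i, x in enumerate(images):       # one epoch over the training set
    frozen = np.delete(W, i, axis=0).copy()
    W[i] += 0.1 * (x - W[i])         # toy update on neuron i only
    assert np.allclose(np.delete(W, i, axis=0), frozen)  # others untouched

hidden = logsig(W @ images.T)        # hidden-layer activations per image
print(hidden.shape)  # (3, 3)
```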

### **3. Modified-HONN system implementation**

We can make the following qualitative observations for the G-HONN system. Though the combinatorial-type filters (Samos, 2001) contain no information on non-reference objects in the training set used during their synthesis, the NNET includes information for reference and non-reference images of the true-class object. That can be explained by the NNET interpolating non-linearly (Kypraios et al., 2002) between the reference images included in the training set and forcing all the non-reference images to follow the activation graph. Moreover, the NNET generalizes between all the reference and non-reference images. Quantitatively, we can demonstrate the above observations as follows. The average training set image x̄ in the space domain of the combinatorial-type filters is given by:

$$\overline{\mathbf{x}} = \frac{1}{\mathbf{N}} \sum\_{i=1}^{N} \mathbf{x}\_i \tag{15}$$

In the frequency domain Eq. (15) is written as:


$$\overline{\hat{\mathbf{x}}} = \frac{1}{\mathbf{N}} \sum\_{i=1}^{N} \hat{\mathbf{x}}\_i \tag{16}$$
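Eqs. (15) and (16) agree because the Fourier transform is linear: the frequency-domain average equals the transform of the space-domain average. A quick numerical check with random stand-in images:

```python
import numpy as np

# Average a toy training set in the space domain (Eq. 15) and in the
# frequency domain (Eq. 16), and confirm the two routes coincide.
rng = np.random.default_rng(1)
N, m, n = 3, 8, 8
x = rng.random((N, m, n))            # toy training images

x_bar = x.mean(axis=0)               # Eq. (15): space-domain average
X_bar = np.fft.fft2(x).mean(axis=0)  # Eq. (16): average of the transforms

assert np.allclose(X_bar, np.fft.fft2(x_bar))  # linearity of the FFT
```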

The non-linear activation function of each hidden neuron of an artificial neural network, such as the sigmoidal function f_s, can take the form:

$$\mathbf{f}\_s(\mathbf{x}) = \alpha \frac{1 - \exp(\beta \mathbf{x})}{1 + \exp(\beta \mathbf{x})} \tag{17}$$

where α and β shift the graph of the function with respect to the x-axis and y-axis and are called the saturation level and the slope, respectively. It can be shown (Kypraios et al., 2009) that the output y_N of an artificial neural network with a non-linear activation function, corresponding to an input s_i for i = 1, …, N (where N is the number of the training set images), is written as:

$$\mathbf{y}\_{N} = \mathbf{f}\left( \sum\_{i=1}^{N} \mathbf{s}\_{i} \mathbf{g}\_{i} - \boldsymbol{\theta} \right) = \alpha \frac{1 - \mathbf{k} \exp\left( \beta \mathbf{s}\_{1} \mathbf{g}\_{1} \right) \exp\left( \beta \mathbf{s}\_{2} \mathbf{g}\_{2} \right) \exp\left( \beta \mathbf{s}\_{3} \mathbf{g}\_{3} \right) \cdots \exp\left( \beta \mathbf{s}\_{N-1} \mathbf{g}\_{N-1} \right) \exp\left( \beta \mathbf{s}\_{N} \mathbf{g}\_{N} \right)}{1 + \mathbf{k} \exp\left( \beta \mathbf{s}\_{1} \mathbf{g}\_{1} \right) \exp\left( \beta \mathbf{s}\_{2} \mathbf{g}\_{2} \right) \exp\left( \beta \mathbf{s}\_{3} \mathbf{g}\_{3} \right) \cdots \exp\left( \beta \mathbf{s}\_{N-1} \mathbf{g}\_{N-1} \right) \exp\left( \beta \mathbf{s}\_{N} \mathbf{g}\_{N} \right)} \tag{18}$$

where k = exp(−βθ) takes a constant value (and g_i are the neural network node's weights). Therefore, from Eq. (16) and Eq. (18) it is shown that any artificial neural network with a non-linear activation function can non-linearly interpolate through the different training set views of the true-class object. Thus, the average training set image x̄ in the space domain of the NNET is given by:
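The factorisation behind Eq. (18) can be checked numerically: with the bipolar sigmoid of Eq. (17), the exponential of the weighted sum minus the threshold splits into a constant k = exp(−βθ) times a product of per-input exponentials. A small sketch with random stand-in values for s, g, β and θ:

```python
import numpy as np

# Verify: f_s(sum(s*g) - theta) equals the product-of-exponentials form of
# Eq. (18), with k = exp(-beta*theta) a constant.
rng = np.random.default_rng(2)
alpha, beta, theta = 1.0, 0.5, 0.3
s, g = rng.random(5), rng.random(5)

def f_s(x):                       # bipolar sigmoid of Eq. (17)
    return alpha * (1 - np.exp(beta * x)) / (1 + np.exp(beta * x))

k = np.exp(-beta * theta)
prod = np.prod(np.exp(beta * s * g))
rhs = alpha * (1 - k * prod) / (1 + k * prod)

assert np.isclose(f_s(np.sum(s * g) - theta), rhs)
```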

$$\overline{\mathbf{x}} = \frac{1}{\mathbf{N}} \sum\_{i=1}^{N} \mathbf{f}\_{s}\left(\mathbf{x}\_{i}\right) \tag{19}$$

where f_s is the activation function of node s in the space domain. Eq. (19) is written in the frequency domain as:

$$\overline{\hat{\mathbf{x}}} = \frac{1}{\mathbf{N}} \sum\_{i=1}^{N} \hat{\mathbf{f}}\_{s}\left(\hat{\mathbf{x}}\_{i}\right) \tag{20}$$

The activation function f_s of node s against the training set images x_i is plotted in Fig. 2. On the activation function graph the true-class object values (which is similar for the false-class object) are marked with +. Now, if we mark on the plot the activation function values for the training images at the 30º and 40º object poses, then the activation function for the training image at 35º will be located on the graph between the values of the activation function for the 30º and 40º inputs. The actual activation function values for the training set images x_30, x_40 and x_35 are located in the area under the graph for activation function values greater than or equal to the pre-specified true-class object classification level; in the case shown we assume it is set at +40.

Fig. 2. The activation function graph of node s against the training set images x_i.

Motivated by these observations, we apply an optical mask to the filter's input (see Fig. 3). The mask is constructed from the weight connections of the reference images of the true-class object and is applied to all the tested images. The Modified-HONN (M-HONN) system is described as follows:

$$\boldsymbol{\Gamma}\_{c} = \boldsymbol{W}^{\boldsymbol{x}\_{c}} \times \boldsymbol{L}^{\boldsymbol{x}\_{c}} = \begin{bmatrix} w^{x\_{c}}\_{11} & w^{x\_{c}}\_{12} & \cdots & w^{x\_{c}}\_{1\,n-1} & w^{x\_{c}}\_{1n} \\\\ w^{x\_{c}}\_{21} & w^{x\_{c}}\_{22} & \cdots & w^{x\_{c}}\_{2\,n-1} & w^{x\_{c}}\_{2n} \\\\ \vdots & \vdots & & \vdots & \vdots \\\\ w^{x\_{c}}\_{m1} & w^{x\_{c}}\_{m2} & \cdots & w^{x\_{c}}\_{m\,n-1} & w^{x\_{c}}\_{mn} \end{bmatrix} \times \begin{bmatrix} l^{x\_{c}}\_{11} & \cdots & l^{x\_{c}}\_{1q} \\\\ l^{x\_{c}}\_{21} & \cdots & l^{x\_{c}}\_{2q} \\\\ \vdots & & \vdots \\\\ l^{x\_{c}}\_{n1} & \cdots & l^{x\_{c}}\_{nq} \end{bmatrix} \tag{21}$$

Fig. 3. M-HONN system block diagram.

where w^{x_c}_{mn} are the input weights from the input neuron of the input vector element at row m and column n to the associated hidden layer for the training image x_c(m,n), and l^{x_c}_{nq} are the layer weights from the hidden neuron of the layer vector element to the associated output neuron q. This time, instead of multiplying each training image with the corresponding weight connections, as in the G-HONN system's implementation, we keep the weight connection values constant, setting them to be equal to a randomly chosen image x_c(m,n) included in the training set. The matrix Γ_c is used for creating the optical mask for the M-HONN system's implementation. The transformed image S_{i=1…N}(m,n), calculated from the dot product of the matrix elements of Γ_c with the corresponding training image matrix elements of X_{i=1…N}(m,n), is given by:

$$S\_{i=1\dots N}(m,n) = \boldsymbol{\Gamma}\_{c} \cdot X\_{i=1\dots N}(m,n) \tag{22}$$

Thus, the M-HONN system's transfer function is formulated as follows:

$$M\text{-}HONN = \sum\_{i=1}^{N} a\_{i} \cdot S\_{i}\left(m,n\right) \tag{23}$$

In Eq. (23) we have chosen to constrain the correlation peak height values, as we did with the constrained-HONN (C-HONN) system's implementation, but we can also easily re-write the system's transfer equation for the case of unconstrained peak height values, as with the unconstrained-HONN (U-HONN) system's implementation (Mahalanobis, 1994; Kypraios et al., 2004b).

Fig. 4 shows the modified NNET block architecture for accommodating multiple objects for more than one class of recognition. As for all the family of G-HONN filters, the NNET is implemented as a feedforward multi-layer architecture trained with a backpropagation algorithm. It has a single input source of input neurons equal to the size of the training image or video frame in vector form. In effect, for the training still image or video frame x_{i=1…N} of size [m×n], there are [m×n] input neurons in the single input source. The input weights are fully connected from the input layer to the hidden layers. There are N input weights w_i proportional to the size of the training set. As before, only one hidden layer is connected for each presented image during training, so Fig. 4 is not a contiguous three layer network during training, which is why the distinction is made. In effect, neuron 1 is trained with the training still image or video frame x_1, neuron 2 is trained with the training still image or video frame x_2 and so on, ending with neuron N being trained with the training still image or video frame x_N. Thus, the number of the input weights increases proportionally to the size of the training set:

$$\mathbf{N}\_{iw} = \mathbf{N} \times \left[\mathbf{m} \times \mathbf{n}\right] \tag{24}$$

where N_iw is the number of the input weights, N is the size of the training set, equal to the number of the training images, and [m×n] is the size of the image of the training set. The number of the hidden layers, N_l, is equal to the number of the images or video frames of the training set, N:

$$\mathbf{i} = \mathbf{1}, \mathbf{2}, \mathbf{3}, \dots, \mathbf{N} \quad \text{and} \quad \mathbf{N}\_{l} = \mathbf{N} \tag{25}$$

We have set each hidden layer to contain a single neuron. The layer weights are fully connected to the output layer. Now, the number of the layer weights, N_lw, is equal to:

$$\mathbf{N}\_{lw} = \mathbf{N} \times \mathbf{N}\_{opn} \quad \text{and} \quad \mathbf{N}\_{opn} = \mathbf{N}\_{classes} \tag{26}$$

where N_opn is the number of the output neurons and N_classes is the number of the different classes. In effect, we have augmented the output layer by adding more output neurons, one for each different class. In Fig. 4 we assume N_classes = 2. Thus:

$$\mathbf{N}\_{opn} = \mathbf{N}\_{classes} = \mathbf{2}, \quad \text{so there are} \quad \mathbf{N}\_{lw} = \mathbf{2}\mathbf{N} \tag{27}$$

$$\mathbf{N}\_{lw}^{class1} = \mathbf{N}, \quad \mathbf{N}\_{lw}^{class2} = \mathbf{N} \tag{28}$$

where N_lw^{class1} and N_lw^{class2} are the layer weights corresponding to object class 1 and object class 2, respectively. There are bias connections to each one of the hidden layers:

$$\mathbf{N}\_{b} = \mathbf{N} \tag{29}$$

where N_b is the number of the bias connections. There are N_targetw target connections from the N_opn output neurons of the output layer:

$$\mathbf{N}\_{target\,w} = \mathbf{N}\_{opn} \tag{30}$$
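The M-HONN construction of Eqs. (21)-(23) can be sketched end to end. This is our illustration, not the authors' code: the weight maps, the choice of x_c, and the coefficients a_i are random stand-ins, and for simplicity the mask is formed as an element-wise product of the input- and layer-weight maps of x_c (Eq. (21) writes it as a matrix product), so that the dot product of Eq. (22) applies element by element:

```python
import numpy as np

# Build the optical mask Gamma_c from the weights of one randomly chosen
# training image (Eq. 21), transform every training image with it (Eq. 22),
# and combine the transformed images into the composite filter (Eq. 23).
rng = np.random.default_rng(3)
N, m, n = 3, 8, 8
X = rng.random((N, m, n))                 # toy training images

c = rng.integers(N)                       # randomly chosen training image x_c
W_c = rng.random((m, n))                  # input weights for x_c (stand-ins)
L_c = rng.random((m, n))                  # layer weights for x_c (stand-ins)
Gamma_c = W_c * L_c                       # optical mask (element-wise form)

S = Gamma_c * X                           # Eq. (22): transformed images S_i
a = rng.random(N)                         # constrained peak-height coefficients
H = np.sum(a[:, None, None] * S, axis=0)  # Eq. (23): M-HONN transfer function

print(H.shape)  # (8, 8)
```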
