**Performance Analysis of the Modified-Hybrid Optical Neural Network Object Recognition System Within Cluttered Scenes**

**1. Introduction**


In the literature, we can categorise two broad main approaches to pattern recognition systems. The first category consists of linear combinatorial-type filters (LCFs) (Stamos, 2001), where image analysis is commonly done in the frequency domain with the help of the Fourier transform (FT) (Lynn & Fuerst, 1998; Proakis & Manolakis, 1998). The second category consists of pure neural modelling methods. Wood (1996) has given a brief but clear review of invariant pattern recognition methods. His survey has divided the methods into two further sub-categories for solving the invariant pattern recognition problem. The first sub-category has two distinct stages: separately calculating the features of the training set pattern to be invariant to certain distortions, and then classifying the extracted features. The second sub-category, instead of having two separate stages, has a single stage which parameterises the desired invariances and then adapts them. Wood (1996) has also described the integral transforms, which fall under the first sub-category of feature extractors. They are based on Fourier analysis, such as the multidimensional Fourier transform, the Fourier-Mellin transform, triple correlation (Delopoulos et al., 1994) and others. Part of the first sub-category is also the group of algebraic invariants, such as Zernike moments (Khotanzad & Hong, 1990; Perantonis & Lisboa, 1992), generalised moments (Shvedov et al., 1979) and others. Wood has given examples of the second sub-category, the main representative being based on artificial neural network (NNET) architectures. He has presented the weight-sharing neural networks (LeCun, 1989; LeCun et al., 1990), the high-order neural networks (Giles & Maxwell, 1987; Kanaoka et al., 1992; Perantonis & Lisboa, 1992; Spirkovska & Reid, 1992), the time-delay neural networks (TDNN) (Bottou et al., 1990; Simard & LeCun, 1992; Waibel et al., 1989) and others.
Finally, he has included an additional third sub-category with all the methods which cannot be placed under either the feature-extraction/feature-classification approach or the parameterised approach. Such methods are image normalisation pre-processing methods (Yuceer & Oflazer, 1993) for achieving invariance to certain distortions. Dobnikar et al. (1992) have compared the invariant pattern classification (IPC) neural network architecture with the Fourier transform method. They used black-and-white images for their comparison. They have proven the generalisation properties and fault-tolerant abilities, with respect to input patterns, of the artificial neural network architectures.

An alternative approach to a pattern recognition system has been well demonstrated previously with the Generalised Hybrid Optical Neural Network (G-HONN) filter (object recognition system) (Kypraios, 2010; Kypraios et al., 2004a). The G-HONN system combines the digital design of a filter by artificial neural network techniques with an optical correlator-type implementation of the resulting non-linear combinatorial correlator-type filter (Jamal-Aldin et al., 1998). The motivation for the design and implementation of the G-HONN object recognition system was to achieve the performance advantages of both artificial neural networks (Looney, 1997; Haykin, 1999; Beale & Jackson, 1990) and optically implemented correlators (Kumar, 1992). Thus, NNETs exhibit non-linear superposition abilities (Kypraios et al., 2002) over the training set pattern images, and learning and generalisation abilities (Kypraios et al., 2004a; Kypraios et al., 2003) over the whole set of the input images. Also, optical correlators allow high-speed implementation of the algorithms described.

There are two main design blocks in the G-HONN system: the NNET block and a non-linear combinatorial-type correlator (filter) block (Jamal-Aldin et al., 1998; Casasent, 1984; Caulfield, 1980; Caulfield & Maloney, 1969). Briefly, the original input images pass first through the NNET block and, then, the extracted images from the NNET block's output are used to form a non-linear combinatorial-type correlator filter. Thus, the output of the correlator block is a composite image, which is the G-HONN system's output. To test the system, we correlate it with an input image. Before proceeding to analytical descriptions of the general architecture of the G-HONN system, and in an effort to keep consistency between the different mathematical symbolisms of artificial neural networks and optical correlators, we need to unify their representation. We denote variable names and functions by non-italic letters (except the vector elements written within a vector, which are written in italics, too), the names of vectors by italic lower-case letters and matrices by italic upper-case letters. Frequency-domain vectors, matrices, variable names and functions are represented by bold letters, and space-domain vectors, matrices, variables and functions by plain letters.

Next, in section 2 we will give a brief description of the G-HONN system's design and implementation, already described in detail in the literature. Section 3 describes the M-HONN system. Section 4 focuses on multiple-object recognition and the M-HONN system's design; it describes the augmented design of the NNET block for accommodating the recognition of multiple objects of different classes. Section 5 discusses the performance of the M-HONN system with respect to peak sharpness and detectability, distortion range and discrimination ability. We also discuss the M-HONN system and biologically-inspired knowledge learning and representation. Finally, we record the series of tests we conducted with the M-HONN system for the recognition of multiple objects of the same class and of different classes within clutter. Section 6 concludes and suggests future work.

**2. General HONN filter's design and implementation**

Let *h(k, l)* denote the composite image of the correlator block and *x_i(k, l)* denote the training set images, where i = 1, 2, …, N and N is the number of the training images used in the synthesis of a combinatorial-type filter. The basic filter's transfer function, formed from the weighted linear combination of the *x_i*, is given by:

$$h(k,l) = \sum_{i=1}^{N} a_i\, x_i(k,l) \tag{1}$$

where the coefficients a_i, i = 1, 2, …, N, set the constraints on the peak given by *c*. The a_i values are determined from:

$$a = R^{-1}c \tag{2}$$

where *a* is the vector of the coefficients a_i, i = 1, 2, …, N, *R* is the correlation matrix of the training images and *c* is the peak constraint vector. The elements of *c* are usually set to zeros for false-class objects and to ones for true-class objects.
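To make Eqs. (1) and (2) concrete, here is a small numpy sketch (an illustration under stated assumptions, not the chapter's implementation): the training images are flattened to vectors, `R` is built as the matrix of their inner products, which is one common reading of the correlation matrix, and the composite image is verified to reproduce the constrained peak values.

```python
import numpy as np

rng = np.random.default_rng(42)
N, m, n = 4, 8, 8
X = rng.random((N, m * n))            # training images x_i, flattened to 1 x mn
c = np.array([1.0, 1.0, 0.0, 0.0])    # peak constraints: true class -> 1, false -> 0

R = X @ X.T                           # correlation (inner-product) matrix of the x_i
a = np.linalg.solve(R, c)             # Eq. (2): a = R^{-1} c
h = a @ X                             # Eq. (1): h = sum_i a_i x_i

# correlating each training image with h reproduces its constrained peak value
assert np.allclose(X @ h, c)
```

Since X·h = R·a = R·R⁻¹·c = c, the composite image yields exactly the constrained correlation peaks on the training set.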


Now, let an image *s* be the input vector to an artificial neural network's hidden neuron (node), let t_{pκ} represent the target output for pattern *p* on node κ, and let o_{pκ} represent the calculated output at that node. The weight from node ι to node κ is represented by w_{ικ}. The activation of each node κ, for pattern *p*, can be written as:

$$\text{net}_{p\kappa} = \sum_{\iota}\left(w_{\iota\kappa}\, o_{p\iota} + b_{\iota}\right) \tag{3}$$

i.e. it is the weighted sum of the calculated outputs from the nodes ι to node κ, where b_ι represents the bias of unit ι.
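As a quick numerical check of Eq. (3) (a sketch with illustrative values; the per-unit bias term follows the notation above):

```python
import numpy as np

rng = np.random.default_rng(1)
n_units = 5
o_p = rng.random(n_units)      # calculated outputs o_{p,iota} feeding node kappa
w = rng.random(n_units)        # weights w_{iota,kappa} into node kappa
b = rng.random(n_units)        # biases b_iota of the contributing units

net_pk = np.sum(w * o_p + b)   # Eq. (3): weighted sum of outputs plus biases
assert np.isclose(net_pk, w @ o_p + b.sum())
```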

We train a novel-designed NNET with N training set images. The network has N neurons in the hidden layer, i.e. equal to the number of training images. There is a single neuron at the output layer to separate two different object classes. (In a multi-class object recognition problem, the increase of the different classes of objects would require more than one neuron at the output layer to correctly separate all the training images.) From Eq. (3) the net input of each of the neurons in the hidden layer is now given by:

$$\text{net}_{x_i} = \sum_{\iota=1}^{m\times n} w_{\iota}^{x_i}\, s_{\iota}^{x_i} \tag{4}$$

where *net* is the net input of each of the hidden neurons, and w^{x_i} are the input weights from the input layer to the hidden neurons for the training image x_i of size m×n in matrix form, or of size 1×mn in vector form. Similarly, for the training image x_N of size m×n in matrix form (1×mn in vector form), the net input net_{x_N} is given by:

$$\text{net}_{x_N} = \sum_{\iota=1}^{m\times n} w_{\iota}^{x_N}\, s_{\iota}^{x_N} \tag{5}$$

From Eqs. (1), (3) and (5) there is a direct analogy between the combinatorial-type filter synthesis procedure and the combination of all the layers' weighted input vectors.
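Eqs. (4) and (5) say that each hidden neuron's net input is the inner product of its own weight vector with its own training image in vector form. A minimal numpy sketch (the array names are illustrative, not from the original):

```python
import numpy as np

rng = np.random.default_rng(2)
N, m, n = 3, 4, 4
X = rng.random((N, m * n))    # training images x_1 ... x_N, each 1 x mn
W = rng.random((N, m * n))    # input weights w^{x_i}, one row per hidden neuron

# Eqs. (4)-(5): net_{x_i} = sum_iota w_iota^{x_i} s_iota^{x_i}
net = np.einsum('ij,ij->i', W, X)

assert net.shape == (N,)
assert np.isclose(net[-1], W[-1] @ X[-1])   # Eq. (5) for x_N
```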

There are two possible and equivalent custom designs (The MathWorks, 2008) of NNET architectures which could be used to form the basis of the combinatorial-type filter synthesis. In both designs each neuron of the hidden layer is trained with only one of the training set images: in effect, neuron 1 with the training image x_1, neuron 2 with the training image x_2, and so on, ending with neuron N with the training image x_N. In the first design the number of input sources is kept constant, whereas in the second design the number of input sources is equal to the number of the training images. In both designs each hidden neuron learns one of the training images. In effect, the number of input weights increases proportionally to the size of the training set:

$$N_{iw} = N \times (m \times n) \tag{6}$$

where N_iw is the number of the input weights, N is the size of the training set, equal to the number of the training images, and m×n is the size of each image of the training set. The latter design would allow parallel implementation, since all the training images could be input through the NNET in parallel due to the parallel input sources. However, to allow easier implementation, we chose the former design of the NNET.
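Eq. (6) simply counts connections; for example, for the three 100×100 car images used in the example later in this section:

```python
# Eq. (6): the input-weight count grows linearly with the training set size
N, m, n = 3, 100, 100
N_iw = N * (m * n)
assert N_iw == 30000
```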

The novel design of the NNET architecture of the G-HONN system is implemented as a feedforward multi-layer architecture trained with a backpropagation algorithm. It has a single input source (as explained in the previous section) of input neurons equal to the size of the training image in vector form. In effect, for the training image x_i of size m×n, where

$$i = 1, 2, 3, \ldots, N \tag{7}$$

there are m×n input neurons in the single input source. The input weights are fully connected from the input layer to the hidden layers. There are N_iw input weights, proportional to the size of the training set. The number of the hidden layers, N_l, is equal to the number of the images of the training set, N:

$$N_l = N \tag{8}$$

Each hidden layer consists of a single neuron. The layer weights are fully connected to the output layer. If we set the output layer to have a single output neuron, then the number of the layer's weights, N_lw, equals the number of the training images, N:

$$N_{lw} = N_l \times N_{opn} \tag{9}$$

where N_opn = 1 is the number of the output neurons. There are bias connections to each one of the hidden layers:

$$N_b = N_l \tag{10}$$

where N_b is the number of the bias connections. But from Eq. (8), Eq. (10) becomes:

$$N_b = N \tag{11}$$

Assuming there is only a single output neuron in the output layer, then there is only one target connection for that output neuron.

We apply the Nguyen-Widrow (Nguyen & Widrow, 1989; Nguyen & Widrow, 1990) initialisation algorithm for setting the initial values of the input weights, the layer weights and the biases. The transfer function of the hidden layers is set as the Log-Sigmoidal function. When a new training image is presented to the NNET, we leave connected the input weights of only one of the hidden neurons. In order not to upset any previous learning of the rest of the hidden-layer neurons, we do not alter their weights when the new image is input to the NNET. It is emphasised that there is no separate feature extraction stage.

Let us assume there are three training images of a car, of size 100×100 (1×10,000 in vector form), at different angles of view, to pass through the NNET. The chosen first design (see Fig. 1) consists of one input source used for all the training images. The input source consists of 10,000 (i.e. 100×100) input neurons, equal to the size of each training image in vector form. Each layer needs, by definition, to have the same input connections to each of its hidden neurons. However, the network of Fig. 1 is referred to as a four-layer network, since there are three hidden layers (shown here aligned under each other) and one output layer. The input layer does not contain neurons with activation functions and so is omitted in the numbering of the layers. Each of the hidden layers has only one hidden neuron. Though the network is initially fully connected to the input layer, during the training stage only one hidden layer is connected for each training image presented through the NNET. Fig. 1 is thus not a contiguous three-(hidden-)layer network during training, which is why the distinction is made.

Fig. 1. Architecture of the selected artificial NNET block of the HONN filter.
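The connection counts of Eqs. (6)–(11) and the single-input-source design of Fig. 1 can be sketched numerically. This is an illustrative numpy sketch, not the authors' implementation: the shapes and the log-sigmoid transfer follow the text, while the random initialisation merely stands in for the Nguyen-Widrow algorithm.

```python
import numpy as np

def logsig(x):
    """Log-Sigmoidal transfer function used for the hidden layers."""
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
N, m, n = 3, 100, 100        # three 100x100 training images of a car
N_opn = 1                    # single output neuron (two-class problem)

N_l = N                      # Eq. (8): one single-neuron hidden layer per image
N_lw = N_l * N_opn           # Eq. (9): layer weights to the output layer
N_b = N_l                    # Eqs. (10)-(11): one bias per hidden layer
N_iw = N * (m * n)           # Eq. (6): input weights

# Random values stand in for the Nguyen-Widrow initialisation here.
W_in = rng.standard_normal((N, m * n))   # one input-weight row per hidden neuron
b = rng.standard_normal(N)               # bias connections
V = rng.standard_normal(N_lw)            # layer weights to the single output neuron

s = rng.random(m * n)        # an input image, 1 x mn in vector form
hidden = logsig(W_in @ s + b)            # each hidden neuron's activation
out = float(V @ hidden)                  # the single output neuron's net input

assert W_in.size == N_iw and b.size == N_b == N and V.size == N_lw
```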
