**4. Experimental results and discussion**

This section evaluates the performance of ORS HOG-MLP through a set of experiments. In the first experiment, conducted on several object classes, system performance is analyzed when the system is adapted to recognize only one object class. In the second experiment, the system behavior is analyzed when it is configured to recognize multiple classes. Finally, our scheme's performance is compared with other results reported in the literature.

Before presenting and discussing the experimental data, let us introduce the concepts and methods used to measure ORS HOG-MLP performance. Let *O* be an object presented to ORS HOG-MLP, and let *c*<sup>0</sup> be the class the system generates to indicate that an object does not belong to its bank of models. Then:

A True Positive (TP) occurs when the class generated by the system is **c**<sup>*μ*</sup> and *O* belongs to the class **c**<sup>*μ*</sup>, for *μ* = 1, …, *q*; this indicates a successful classification.

A True Negative (TN) occurs when the class generated by the system is **c**<sup>0</sup> and *O* does not belong to the bank of models; this indicates a successful rejection.

A False Positive (FP) occurs when the class generated by the system is **c**<sup>*μ*</sup> and *O* does not belong to this class, for *μ* = 1, …, *q*; this indicates an incorrect classification.

A False Negative (FN) occurs when the class generated by the system is **c**<sup>0</sup> and *O* belongs to the bank of models; this indicates an incorrect rejection.

From these concepts, objective measures that quantify system performance are obtained.
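As a concrete illustration, the four outcomes can be tallied from (predicted class, actual class) pairs. This is a minimal sketch with illustrative data, not code from the system; the function name is ours:

```python
# Tally TP, TN, FP, FN from (predicted class, actual class) pairs.
# Class 0 denotes "not in the bank of models"; classes 1..q are object classes.
from collections import Counter

def tally_outcomes(pairs):
    counts = Counter()
    for predicted, actual in pairs:
        if predicted == 0:
            # System rejected the window.
            counts["TN" if actual == 0 else "FN"] += 1
        else:
            # System assigned an object class.
            counts["TP" if predicted == actual else "FP"] += 1
    return counts

# Illustrative run: 3 correct classifications, 1 wrong rejection,
# 1 misclassification, 1 correct rejection.
c = tally_outcomes([(1, 1), (2, 2), (1, 1), (0, 2), (2, 1), (0, 0)])
print(c["TP"], c["TN"], c["FP"], c["FN"])  # 3 1 1 1
```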

True Positive Rate (TPR). TPR determines the sensitivity of the system, i.e., it measures the proportion of successful classifications obtained by the system; TPR is defined as

$$\text{TPR} = \frac{\text{number of TP}}{\text{number of TP} + \text{number of FN}} \tag{14}$$

False Positive Rate (FPR). FPR indicates the proportion of wrongly classified objects; FPR is defined as

$$\text{FPR} = \frac{\text{number of FP}}{\text{number of FP} + \text{number of TN}} \tag{15}$$

Accuracy (ACC). ACC evaluates the overall proportion of correct decisions, both classifications and rejections, and is defined as

$$\text{ACC} = \frac{\text{number of TP} + \text{number of TN}}{\text{number of TP} + \text{number of TN} + \text{number of FP} + \text{number of FN}} \quad \text{(16)}$$

False negative rate (FNR). FNR indicates the miss rate of the system, defined as

$$\text{FNR} = \frac{\text{number of FN}}{\text{number of TP} + \text{number of FN}} = 1 - \text{TPR} \tag{17}$$

False Positives Per Window (FPPW). FPPW indicates the number of errors per detection window and is defined as

$$\text{FPPW} = \frac{\text{number of FP}}{\text{N}} \tag{18}$$

where N is the total number of windows processed. Finally, the proposed MLP classifier returns a real value for each detection window; this value is thresholded against a fixed value *u* in order to determine whether or not the window contains an object belonging to a class in the bank of models. Thus, FNR and FPPW are functions of *u*, allowing the plotting of ROC (Receiver Operating Characteristic) evaluation curves [47], **Eq. (19)**, which show the tradeoff between the miss rate and the FPPW for each *u*.

$$\mathbf{E}(u) = (\text{FPPW}(u), \text{FNR}(u))\tag{19}$$
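As a sketch, the evaluation measures above can be computed directly from the confusion counts and the per-window scores. All counts, scores, and labels below are illustrative placeholders, not data from our experiments:

```python
# TPR, FPR, ACC, FNR, FPPW from raw counts (Eqs. (14)-(18)).
def metrics(tp, tn, fp, fn, n_windows):
    tpr = tp / (tp + fn)                   # sensitivity, Eq. (14)
    fpr = fp / (fp + tn)                   # false positive rate, Eq. (15)
    acc = (tp + tn) / (tp + tn + fp + fn)  # accuracy, Eq. (16)
    fnr = fn / (tp + fn)                   # miss rate, Eq. (17): equals 1 - TPR
    fppw = fp / n_windows                  # false positives per window, Eq. (18)
    return tpr, fpr, acc, fnr, fppw

# E(u) = (FPPW(u), FNR(u)), Eq. (19), traced by sweeping the threshold u.
def evaluation_curve(scores, labels, thresholds, n_windows):
    positives = sum(labels)
    points = []
    for u in thresholds:
        fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < u)
        fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= u)
        points.append((fp / n_windows, fn / positives))
    return points

tpr, fpr, acc, fnr, fppw = metrics(tp=198, tn=3286, fp=2, fn=2, n_windows=3288)
print(round(tpr, 2), round(fnr, 2))        # 0.99 0.01

scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]    # real-valued MLP output per window
labels = [1, 1, 1, 1, 0, 0]                # 1 = object, 0 = background
print(evaluation_curve(scores, labels, [0.2, 0.5, 0.8], n_windows=6))
```

Raising *u* trades false positives for misses, which is exactly the tradeoff the evaluation curve visualizes.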

The Caltech 101 dataset, collected by Fei-Fei et al. [48], was used for benchmarking our proposal. For both the training and operation phases of the system, positive images (those that contain only objects of interest) and negative images (those that contain no objects of interest, i.e., background) were generated.

From the Caltech 101 dataset, a set of 1179 negative images $I^{train}_{-}$ was taken for the system training phase and a set of 1179 negative images $I^{test}_{-}$ for the system operation phase. The numbers of positive images for the training and operation phases, $W^{train}_{\gamma+}$ and $W^{test}_{\gamma+}$, respectively, vary for each class (see **Table 2**).

In order to demonstrate the performance of our proposal when only one object needs to be identified, in the first experiment ORS HOG-MLP was adapted to individually identify each class of objects in **Table 2**.

**Table 2.** *Positive images per class.*

With the intention of adding robustness to the system, additional negative windows for training and testing, $W^{train}_{-} = 1743$ and $W^{test}_{-} = 1909$, respectively, were generated from $I^{train}_{-}$ and $I^{test}_{-}$. Thus, the final sets of negative images for training and testing were defined as $I'^{train}_{-} = I^{train}_{-} + W^{train}_{-} = 2922$ and $I'^{test}_{-} = I^{test}_{-} + W^{test}_{-} = 3288$.

Then, negative images are associated with class 0, and positive images of the object under study, identified by *γ*, with class 1. The training set is defined as $\{(I'^{train}_{-}, c^{0}), (W^{train}_{\gamma+}, c^{1})\}$; this set is presented to ORS HOG-MLP in order to adapt it to recognize the object *γ*. On the other hand, ORS HOG-MLP performance is evaluated by applying the recognition phase to the sets $I'^{test}_{-}$ and $W^{test}_{\gamma+}$. This process was repeated for all objects in **Table 2**.

The HOG algorithm parameters were adjusted as follows: the number of cells per detection window varies depending on the object shape, 9 bins uniformly spaced over 0–180° are defined, blocks consist of 2 × 2 adjacent cells, and an overlap of one cell along both the *x*-axis and the *y*-axis is used. The MLP is defined with the following features: the hidden layer has 5 neurons, the activation functions of the neurons in the hidden and output layers are sigmoid, the initial weights are random in the range [−0.25, 0.25], the neuron bias input is −1.0, the learning rate is *ε* = 0.01, and, for each object, 20,000 iterations were carried out to train the network. **Figure 6** shows some examples of detections, and **Figure 7** and **Table 3** summarize the results of the first experiment.
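A minimal sketch of the forward pass of an MLP with this configuration: one hidden layer of 5 sigmoid neurons, a sigmoid output, random initial weights in [−0.25, 0.25], and a bias input of −1.0. The descriptor length `dim` is a placeholder (it depends on the window size), and the backpropagation training loop (*ε* = 0.01, 20,000 iterations) is omitted:

```python
import math
import random

random.seed(0)
dim, hidden = 36, 5  # dim: placeholder length of the HOG feature vector

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# One extra weight per neuron multiplies the fixed bias input of -1.0.
w_hidden = [[random.uniform(-0.25, 0.25) for _ in range(dim + 1)]
            for _ in range(hidden)]
w_out = [random.uniform(-0.25, 0.25) for _ in range(hidden + 1)]

def forward(features, bias=-1.0):
    x = features + [bias]
    h = [sigmoid(sum(w * v for w, v in zip(row, x)))
         for row in w_hidden] + [bias]
    return sigmoid(sum(w * v for w, v in zip(w_out, h)))

score = forward([0.1] * dim)  # real value, later thresholded by u
print(0.0 < score < 1.0)      # True
```

The sigmoid output is always in (0, 1), which is why a single fixed threshold *u* suffices for the accept/reject decision.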

Considering the results shown in **Table 3**, the average values of the TPR and FPR parameters, 0.6387 and 0.001228, respectively, show that the probability of correctly classifying an object is approximately 64%, and the probability of misclassifying an object is less than 1%. Meanwhile, the ACC parameter indicates that the system accuracy is over 98% (e.g., for motorbikes, corresponding to 198 out of 200 correct detections with 2 false positives). These results also show that using a detection window size of 8 × 8 or 16 × 8 cells does not significantly affect system performance.

**Figure 6.** *Examples of detection results on the Caltech 101 dataset. Detected objects are enclosed in rectangles.*

**Figure 7.** *Performance of our proposal based on ROC evaluation curves: (a) Airplane class, (b) Butterfly class, (c) Motorbikes class.*

#### *Vision Sensors - Recent Advances*

**Table 3.** *Results of first experiment.*

In the definition of the MLP structure, tests were performed using 5, 10, 15, and 20 neurons in the hidden layer. The results show variations of 0.5% at 10<sup>−3</sup> FPPW, so we decided to work with the smallest number of neurons in order to reduce the computational cost of the system.

*Multi-Object Recognition Using a Feature Descriptor and Neural Classifier DOI: http://dx.doi.org/10.5772/intechopen.106754*

**Table 4.** *Results of second experiment.*

In the second experiment, ORS HOG-MLP is configured as a multi-class recognition system. Therefore, the system is trained to recognize several groups of objects from **Table 2**, where the number of objects per group can be 2, 3, 4, or 6. Thus, the training set is defined as $\{(I'^{train}_{-}, c^{0}), (W^{train}_{\gamma+}, c^{1}), \ldots, (W^{train}_{\gamma+}, c^{\mu})\}$, with $\mu = 2, 3, 4, 6$, where *γ* identifies the different objects belonging to a group; e.g., $\{(I'^{train}_{-}, c^{0}), (W^{train}_{cA+}, c^{1}), (W^{train}_{cC+}, c^{2}), (W^{train}_{cK+}, c^{3})\}$ is a group that includes the airplane, carside, and motorbike objects, with the negative images associated with class 0. For this group, the recognition phase uses $I'^{test}_{-}$, $W^{test}_{cA+}$, $W^{test}_{cC+}$, and $W^{test}_{cK+}$ to evaluate system performance.
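The assembly of the labeled multi-class training set can be sketched as follows; the window lists are stand-in placeholders for the actual image windows, and the function name is ours:

```python
# Build the labeled training set for one group: class 0 for negatives,
# classes 1..mu for the objects in the group, in the order given.
def build_training_set(negatives, positive_sets):
    labeled = [(w, 0) for w in negatives]         # background -> class 0
    for label, windows in enumerate(positive_sets, start=1):
        labeled += [(w, label) for w in windows]  # object gamma -> class label
    return labeled

negatives = ["neg0", "neg1"]                 # stands in for I'train_-
airplanes, carsides, motorbikes = ["a0"], ["c0"], ["m0", "m1"]
train = build_training_set(negatives, [airplanes, carsides, motorbikes])
print(train)
# [('neg0', 0), ('neg1', 0), ('a0', 1), ('c0', 2), ('m0', 3), ('m1', 3)]
```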

The configuration parameters of the HOG algorithm and the MLP are the same as those used in the first experiment. **Table 4** shows the results of this experiment. These results show that system performance is not affected when operating in multi-class mode. This is deduced from the average values of TPR and ACC, which indicate that the probability of correctly classifying an object is approximately 68% and that the system accuracy fluctuates between 97% and 99%. For example, for group2 = {airplane<sup>1</sup>, carside<sup>2</sup>, electricguitar<sup>3</sup>, motorbikes<sup>4</sup>, revolver<sup>5</sup>, watch<sup>6</sup>}, using a detection window of 8 × 8 cells, ORS HOG-MLP presents an ACC = 98% for airplane<sup>1</sup> (corresponding to 196 out of 200 correct detections with 4 false positives) and an ACC = 99% for motorbikes<sup>4</sup> (corresponding to 198 out of 200 correct detections with 2 false positives). These results make evident the robustness of the proposed system when it is used in multi-object recognition applications.

Finally, the performance of the proposed scheme is compared with several object recognition schemes that have been cited frequently in related studies in this area. **Table 5** shows the results of this comparison.

**Table 5** shows a comparison of our scheme's performance with other results reported in the literature. With an ACC of 99%, our method presents a significant improvement over previously reported results. The table also shows that the performance of the proposed scheme is not significantly affected when it is used in multi-object recognition applications. Zhang et al. reported a similar performance of 99% using a scheme composed of the PCA-SIFT method and the shape context method for object representation and a two-layer AdaBoost network as the classification technique [37]. However, due to the methods and techniques used by Zhang et al., that scheme presents a computational cost relatively greater than that of our proposal.

**Table 5.** *Performance comparison on accuracy for object recognition for two of the Caltech categories with other methods from the literature.*
