## **5. Classification: Support Vector Machine**

Once the images were transformed into a set of features, the classification stage produced an answer to the spider identification problem. In this work, the well-known Support Vector Machine (SVM) technique was used.

The SVM is a method of structural risk minimization (SRM) derived from the statistical learning theory developed by Vapnik and Chervonenkis [17]. It belongs to the group of supervised pattern recognition methods, and it is used for classification and regression analysis.

Based on characteristic points called Support Vectors (SVs), the SVM uses a hyperplane or a set of hyperplanes to divide the space into regions, each enclosing a common class. By labeling these regions, the system is able to identify the class membership of a test sample. The interesting aspect of the SVM is that it can do so even when the problem is not linearly separable. This is achieved by projecting the problem into a higher-dimensional space where the classes are linearly separable. The projection is performed by an operator known as a kernel, and this technique is called the kernel trick [18][19]. The use of hyperplanes to divide the space gives rise to margins, as shown in Figure 10.

In this work, Suykens et al.'s LS-SVM [20] was used along with the Radial Basis Function (RBF) kernel. The regularization parameter and the bandwidth of the RBF function were automatically optimized using the validation results obtained from 10 iterations of a Hold-Out cross-validation process. Two samples from each class (taken from the training set) were used for validation and the remaining for training, as we observed that the number of training samples has a strong impact on the optimal LS-SVM parameters. Once the optimal parameters were found, they were used to retrain the LS-SVM on all available training samples.
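An appealing property of the LS-SVM is that training reduces to solving a single linear system rather than a quadratic program. The following NumPy sketch illustrates that formulation on toy data; it is not the Suykens et al. toolbox [20] used in this work, and the function names, toy data and parameter values are our own illustrative choices.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    # Gaussian RBF kernel matrix between the rows of A and the rows of B
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def lssvm_train(X, y, gamma, sigma):
    # LS-SVM training: solve the linear system
    #   [ 0   1^T         ] [ b     ]   [ 0 ]
    #   [ 1   K + I/gamma ] [ alpha ] = [ y ]
    # where K is the kernel matrix and y holds labels in {-1, +1}.
    n = len(y)
    K = rbf_kernel(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], y.astype(float)))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]  # bias b and dual coefficients alpha

def lssvm_predict(Xnew, Xtr, b, alpha, sigma):
    # Decision function: sign(sum_i alpha_i * K(x, x_i) + b)
    return np.sign(rbf_kernel(Xnew, Xtr, sigma) @ alpha + b)

# Toy two-class problem: two well-separated Gaussian blobs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 0.5, (20, 2)), rng.normal(2.0, 0.5, (20, 2))])
y = np.array([-1] * 20 + [+1] * 20)
b, alpha = lssvm_train(X, y, gamma=10.0, sigma=1.0)
acc = (lssvm_predict(X, X, b, alpha, sigma=1.0) == y).mean()
```

Because only the kernel matrix enters the system, swapping `rbf_kernel` for any other positive-definite kernel changes the feature space without changing the solver — the kernel trick described above.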

**Figure 10.** Example of a separating hyperplane with its Support Vectors and margin for a linear problem.

## **6. Experiments and results**

To sum up, the proposed system normalized all images to 10x10 pixels. It used the first *M* features obtained from the DCT projection of the spider web images, and the outcome of the DWT transformation of those images, as inputs to an RBF-kernel LS-SVM. The first parameter (the number of features, *M*) was varied during experimentation, while the latter two (the regularization and kernel parameters) were automatically optimized by iteration using validation results. To obtain more reliable results, the available samples were divided into training and test sets, so that the system was trained and tested on completely different samples.

The well-known K-Fold cross-validation and Hold-Out cross-validation techniques were used to obtain the final results. In particular, experiments with K equal to 3, 5, 7, and 10 were run, and the percentage of training samples in Hold-Out cross-validation was set to 50, 40, 30, 20, and 10. It is worth mentioning that the training and testing sets were computed for each class individually, taking into account that each class has a different number of samples. These experiments were performed for both datasets, i.e. using the whole spider webs and using only the center area.
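Such per-class splitting is commonly called stratification. As a minimal sketch of the idea using scikit-learn — the class sizes below are hypothetical, not the actual dataset's counts:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, train_test_split

# Hypothetical class sizes for four species (the real counts differ)
y = np.array([0] * 12 + [1] * 9 + [2] * 15 + [3] * 8)
X = np.arange(len(y), dtype=float).reshape(-1, 1)  # placeholder features

# K-Fold computed per class: every fold preserves each class's proportion
skf = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
folds = list(skf.split(X, y))

# Hold-Out with a given percentage of training samples, drawn per class
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, train_size=0.5, stratify=y, random_state=0)
```

With stratification, every fold and every hold-out split contains samples of all four classes even though the classes are unbalanced.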


#### **6.1. DCT results**

In order to obtain the optimal number of coefficients, 30 experiments were performed using the Hold-Out cross-validation technique. Since the images were normalized to 10x10 pixels, the total number of features was 100; therefore, in this phase, the number of coefficients was swept from 1 to 100. Figure 11 represents the mean of those 30 experiments. It can be observed that 60 is the optimal number of coefficients.

**Figure 11.** Evolution of the success rate as the number of coefficients is incremented for the DCT-based system.

Table 2 shows the results reached for K-Fold cross-validation and Hold-Out cross-validation using the optimal number of coefficients.

**Table 2.** Results obtained for K-Fold cross-validation (K = 3, 5, 7, 10) and Hold-Out cross-validation using DCT.

| *K*-Fold success rate (%) | Training samples (%) | Hold-Out success rate (%) |
|---|---|---|
| 98.75 ± 0.18 | 50 | 98.69 ± 1.20 |
| 98.56 ± 0.58 | 40 | 98.45 ± 1.38 |
| 98.44 ± 0.76 | 30 | 96.89 ± 2.06 |
| 98.07 ± 0.52 | 20 | 94.29 ± 4.07 |
| – | 10 | 79.75 ± 6.54 |
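As an illustration, a 2-D DCT feature extractor for the normalized 10x10 images might look as follows. The zig-zag (low-frequency-first) scan order and the function names are assumptions — the chapter only speaks of "the first *M* features".

```python
import numpy as np
from scipy.fft import dctn

def dct_features(img, m):
    # 2-D DCT-II (orthonormal) of the normalized grayscale image
    c = dctn(img, norm='ortho')
    h, w = c.shape
    # Zig-zag order: sort indices by anti-diagonal, lowest frequencies first
    # (assumed scan order; the chapter does not specify it)
    order = sorted(((i, j) for i in range(h) for j in range(w)),
                   key=lambda ij: (ij[0] + ij[1], ij[0]))
    return np.array([c[i, j] for i, j in order[:m]])

# With the optimal M = 60 found above, a 10x10 image yields 60 features
feats = dct_features(np.ones((10, 10)), 60)
```

For a constant image, only the DC coefficient is non-zero, so the feature vector carries all its energy in its first entry.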

#### **6.2. DWT results**

In this case, the length of the feature vector depends on the wavelet family. The *db1* and *bior3.7* windows return an image with half the size of the original, while *dmey* returns an image with the same dimensions. In all cases, the reconstructed image was used for classification, ignoring the horizontal, vertical and diagonal components.
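A minimal sketch of this step using the PyWavelets library (the chapter does not name its wavelet toolbox, so this choice is an assumption):

```python
import numpy as np
import pywt  # PyWavelets

def dwt_features(img, family):
    # Single-level 2-D DWT of the normalized image. Only the approximation
    # band cA is kept; the horizontal (cH), vertical (cV) and diagonal (cD)
    # detail bands are discarded, as described above. The exact output size
    # depends on the wavelet's filter length and the signal-extension mode.
    cA, (cH, cV, cD) = pywt.dwt2(img, family)
    return cA.ravel()

# The three families tested in the chapter
features = {fam: dwt_features(np.ones((10, 10)), fam)
            for fam in ('db1', 'bior3.7', 'dmey')}
```

For *db1* (Haar) the 10x10 image yields a 5x5 approximation band, i.e. a 25-element feature vector.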


Tables 3, 4 and 5 show the results reached for K-Fold cross-validation and Hold-Out cross-validation for each DWT family.

**Table 3.** Results obtained for different Ks of a K-Fold cross-validation procedure using wavelet db1.

**Table 4.** Results obtained for different Ks of a K-Fold cross-validation procedure using wavelet bior3.7.

**Table 5.** Results obtained for different Ks of a K-Fold cross-validation procedure using wavelet dmey.

## **7. Discussion and conclusions**


This work has addressed the problem of spider web recognition, improving the results obtained by the previous work [11]. It is important to note that, to the best of the authors' knowledge, these are the only published works using the proposed technique.

Images were preprocessed to isolate the center of the spider web and remove the effects of the background on the system. The resulting images were then transformed using DCT and DWT. For the former, the optimal number of DCT coefficients *M* was found heuristically, while the *db1*, *bior3.7* and *dmey* families were tested for the DWT. Finally, the resulting features were classified using an LS-SVM; in this case, the regularization and kernel parameters were automatically optimized by dividing the training samples into training and validation sets, and the system was then retrained with the optimal configuration using all available training data.





The results confirmed the improvement compared to [11], where only three species (versus the four species used in this work) were classified with a maximum success rate of 95%. Thus, Tables 2, 3, 4 and 5 show that the new system reached a performance of around 99% on K-Fold cross-validation and 98% on Hold-Out cross-validation. Moreover, the obtained standard deviation was significantly low, although, as expected, slightly higher on Hold-Out as the number of training samples was lowered. All in all, the standard deviations achieved in both K-Fold and Hold-Out procedures are smaller than those obtained in [11].

When comparing DCT and DWT, the DWT provided a better behavior for this problem. It is worth emphasizing that the images were normalized to a size of 10x10, that is, quite compressed considering the spatial distribution of the threads in the spider webs.

The results achieved by this work support the conclusions derived from [11], stating that the center of the spider web provides enough discriminative information to recognize different species of spiders. However, it is still necessary to run more experiments with a larger database, and to execute a more detailed study on which parts of the spider web provide the most discriminative information, before drawing stronger conclusions. On the other hand, this will allow testing the system's performance with larger training sets, which will be interesting taking into account that the system clearly improved when the number of training samples increased.

**Acknowledgements**

This work has been supported by the Spanish Government, in particular by the "Agencia Española de Cooperación Internacional para el Desarrollo" under funds from D/027406/09 for 2010 and D/033858/10 for 2011.

**Author details**

Ticay-Rivas Jaime R., del Pozo-Baños Marcos, Gutiérrez-Ramos Miguel A., Travieso Carlos M. and Jesús B. Alonso

*Signals and Communications Department, Institute for Technological Development and Innovation in Communications, University of Las Palmas de Gran Canaria, Campus University of Tafira, 35017, Las Palmas de Gran Canaria, Las Palmas, Spain*

Eberhard William G.

*Smithsonian Tropical Research Institute and Escuela de Biologia, Universidad de Costa Rica, Ciudad Universitaria, Costa Rica*

## **8. References**

[1] Sytnik, K.M. (2010). Preservation of biological diversity: Top-priority tasks of society and state. Ukrainian Journal of Physical Optics, 11 (SUPPL. 1), pp. S2-S10.

[2] Carvalho, J.C., Cardoso, P., Crespo, L.C., Henriques, S., Carvalho, R., Gomes, P. (2011). "Biogeographic patterns of spiders in coastal dunes along a gradient of mediterraneity." Biodiversity and Conservation: 1-22.

[3] Johnston, J.M. (2000). The contribution of microarthropods to aboveground food webs: A review and model of belowground transfer in a coniferous forest. American Midland Naturalist 143: 226-238.

[4] Peterson, A.T., Osborne, D.R., and Taylor, D.H. (1989). Tree trunk arthropod faunas as food resources for birds. Ohio Journal of Science 89(1): 23-25.

[5] Cardoso, P., Arnedo, M.A., Triantis, K.A., Borges, P.A.V. (2010). Drivers of diversity in Macaronesian spiders and the role of species extinctions. J Biogeogr 37: 1034-1046.

[6] Finch, O.-D., Blick, T., Schuldt, A. (2008). Macroecological patterns of spider species richness across Europe. Biodivers Conserv 17: 2849-2868.
