**5.1 Neural network (NN)**

Artificial neural networks (NNs) are computational systems inspired by biological neurons, in which interconnected neurons provide the information-processing ability (Khan et al., 2010). NNs, like people, learn by example. An NN is configured for a specific application, such as pattern recognition or data classification, through a learning process. In the last decade, NNs have gained momentum in the remote sensing field owing to the good results obtained in many applications. NN models have two important properties: the ability to learn from input data, and the ability to generalize and predict unseen patterns based on the data source rather than on any particular a priori model. Although there is a wide range of network types and possible applications in remote sensing, most attention has focused on the use of Multilayer Perceptron (MLP) networks trained with a back-propagation learning algorithm for supervised classification. Fig. 4 shows the basic NN structure.

Fig. 4. Basic neural network architecture

Analysis of Land Cover Classification in Arid Environment: A Comparison Performance of Four Classifiers

Generally, NNs require three or more layers of processing nodes: an input layer that accepts the input variables (e.g., satellite image band values) used in the classification procedure, one or more hidden layers that identify the internal structure of the input data, and an output layer. The number of nodes (also called processing units or neurons) in the input layer is equal to the dimensionality of the input vector. For land cover classification, the number of nodes in the output layer is the same as the number of classes in the classification scheme. Meanwhile, the size of the hidden layer is a crucial question in network design and needs to be determined carefully. Nodes in any two consecutive layers are fully connected, with connection weights controlling the strength of the connections. The relationships between the input and hidden layers and between the hidden and output layers are given by Equations 1 and 2 (Sarkheil et al., 2009):

$$b_j = f\left( \sum_{i} W_{ij}\, a_i \right) \tag{1}$$

$$c_j = f\left( \sum_{i} V_{ji}\, b_i \right) \tag{2}$$

where:

ai is input node i of the input layer, bj is output node j of the hidden layer, cj is output node j of the output layer, Wij is the weight between the input and hidden layers, Vji is the weight between the hidden and output layers, and f is the activation function.
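As a minimal sketch of Equations 1 and 2, the forward pass of a single-hidden-layer MLP can be written as follows; the sigmoid activation, layer sizes, and random weights are illustrative assumptions, not values from this study:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(a, W, V):
    """One forward pass through a single-hidden-layer MLP.

    a : input vector (n_inputs,), e.g. band values of one pixel
    W : input -> hidden weights (n_inputs, n_hidden), the Wij
    V : hidden -> output weights (n_hidden, n_outputs), the Vji
    """
    b = sigmoid(a @ W)   # Equation 1: hidden-layer outputs bj
    c = sigmoid(b @ V)   # Equation 2: output-layer activations cj
    return b, c

# Example: 4 spectral bands, 5 hidden nodes, 3 land cover classes
rng = np.random.default_rng(0)
a = rng.random(4)
W = rng.normal(size=(4, 5))
V = rng.normal(size=(5, 3))
b, c = forward(a, W, V)
print(c.shape)   # (3,): one activation per candidate class
```

In a classification setting the pixel would then be assigned the class of the largest output activation.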

The complexity of an MLP network can be changed by varying the number of layers and the number of units in each layer. Hence, the right network structure has to be found by experiment. Several researchers (Lippmann, 1987; Cybenko, 1989) have reported that a single hidden layer is usually sufficient for most problems, especially classification tasks. The major effort is therefore focused on controlling the complexity of the model in order to avoid an overly complex structure, which may lead to an overfitted ANN model (Niska et al., 2010).

Non-parametric neural network classifiers have numerous advantages over statistical methods: they make no assumptions about the probabilistic models of the data, they can generalize in noisy environments, and they can learn complex patterns. NNs can also classify data with a smaller training set than conventional classifiers and are more tolerant of noise in the training patterns (Mather, 1999).

#### **5.2 Frequency-based contextual (FBC)**

Unlike the three methods discussed previously, the contextual technique considers both spectral and spatial information when performing the classification, instead of depending on the spectral component alone (Mustapha et al., 2011). Classification results from spectral data can be improved by taking additional information from the original image into account. The simplest way is to incorporate the spatial information in neighboring pixels. Contextual information, or context for simplicity, may be defined as how the probability of presence of one object (or objects) is affected by its (their) neighbors (Tso & Olsen, 2005). There are many examples of the contextual classification approach, but in the present article we


are concerned only with the FBC approach. Frequency-based contextual classification of multispectral imagery is performed using a grey-level-reduced image and a set of training-site bitmaps. The input layer must be 8-bit data; any 16-bit and 32-bit data layers should be scaled to 8 bits.
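A simple way to perform the 8-bit scaling mentioned above is a linear (min-max) rescale of each layer to the 0-255 range. This is only a sketch, assuming the layer is held in a NumPy array; a min-max stretch is one common choice, not the only valid one:

```python
import numpy as np

def scale_to_8bit(layer):
    """Linearly rescale a 16- or 32-bit raster layer to 0-255 (uint8)."""
    layer = layer.astype(np.float64)
    lo, hi = layer.min(), layer.max()
    if hi == lo:                       # constant layer: avoid divide-by-zero
        return np.zeros(layer.shape, dtype=np.uint8)
    scaled = (layer - lo) / (hi - lo) * 255.0
    return np.round(scaled).astype(np.uint8)

# Illustrative 16-bit band (made-up values)
band16 = np.array([[0, 1000], [30000, 65535]], dtype=np.uint16)
band8 = scale_to_8bit(band16)   # values stretched to [[0, 4], [117, 255]]
```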

There are a number of factors affecting the land cover classification accuracy of the FBC. For instance, the collection of the training area and the selection of the pixel window size are very important for this approach. The training area must be representative and of a reasonable size to capture the spatial structure of any land cover type in the image. Meanwhile, the pixel window size determines the amount of spatial information that can be included in the classification. Because the optimal pixel window varies with the individual class and the image resolution, it is usually difficult to determine before classification; an appropriate window size is therefore usually determined empirically. The pixel window size must be specified explicitly when performing contextual classification on each pixel, so users may have to run the contextual classifier on the same input data with different window sizes until a desirable output is produced. In general, contextual classification performs better with a larger window size, especially if the original input image contains complicated mixed classes (such as urban areas). If the classes are uniform and spectrally pure, then a smaller window size may be sufficient. A few examples of different window sizes are shown in Fig. 5.

It seems clear that including the spatial arrangement of gray-level values in a pixel neighborhood can considerably improve the performance of the FBC, as expected by Gong and Howarth (1992). However, this classifier also has its drawbacks. Contextual classification cannot classify pixels along the edges of the image: if the output window borders the edge of the image file, the output pixels along that edge are set to zero to indicate unclassified or unknown pixels. The error patterns caused by the contextual classification algorithm are usually located systematically along the class boundaries.
Meanwhile, the classification results demonstrate that a significant increase in overall accuracy can be achieved by combining spatial data with spectral data, compared with the results obtained from the traditional method, although it cannot overtake the performance of the neural network algorithm.

Fig. 5. Examples of window sizes used in the frequency-based contextual method (3x3, 5x5, 7x7). The black pixel indicates the center pixel of each window.
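The frequency-based idea described above can be sketched as follows: each pixel is characterized by the gray-level frequency table of its surrounding window and labeled with the class whose training-site histogram is nearest, with border pixels left as zero (unclassified) as the text notes. The minimum-distance rule and the toy image are illustrative assumptions, not the exact algorithm used in the study:

```python
import numpy as np

def window_histogram(img, row, col, size, levels):
    """Normalised gray-level frequency vector of the size x size window
    centred on (row, col); this is the feature the FBC compares."""
    half = size // 2
    win = img[row - half:row + half + 1, col - half:col + half + 1]
    hist = np.bincount(win.ravel(), minlength=levels).astype(float)
    return hist / hist.sum()

def fbc_classify(img, class_hists, size=3, levels=8):
    """Assign each pixel the class whose training histogram is closest
    (Euclidean distance) to its window histogram. Pixels within size//2
    of the image edge are left as 0 (unclassified), as in the text."""
    half = size // 2
    labels = np.zeros(img.shape, dtype=int)
    for r in range(half, img.shape[0] - half):
        for c in range(half, img.shape[1] - half):
            h = window_histogram(img, r, c, size, levels)
            dists = {k: np.linalg.norm(v - h) for k, v in class_hists.items()}
            labels[r, c] = min(dists, key=dists.get)
    return labels

# Toy 8-level image: left half mostly level 1, right half mostly level 6
img = np.array([[1, 1, 1, 6, 6, 6]] * 6)
class_hists = {
    1: window_histogram(img, 1, 1, 3, 8),   # "class 1" trained on the left
    2: window_histogram(img, 1, 4, 3, 8),   # "class 2" trained on the right
}
labels = fbc_classify(img, class_hists, size=3, levels=8)
```

Running this on the toy image labels interior pixels 1 on the left and 2 on the right, while the one-pixel border stays 0, mirroring the edge behavior described above.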


#### **7. Accuracy assessment**

Accuracy assessment is an important aspect of land cover mapping as a guide to map quality. The accuracy assessment sites were used to provide a statistical assessment of the accuracy produced by each of the classification mapping approaches tested for this project. These sites were set aside until the map was completed and the accuracy assessment was performed. This process ensured that the accuracy data were completely independent of the training data (Thomas et al., 2003).

The error matrix is the standard method used to assess classification accuracy. In the error matrix, the columns represent the reference data, while the rows represent the classified data (Table 3). It is typical to extract several statistics from the error matrix: overall accuracy, Kappa coefficient, producer's accuracy and user's accuracy. To conduct the accuracy assessment, a total of 500 sample plots covering different land cover types were randomly allocated and examined using field data, a SPOT-5 image with 5 m spatial resolution, and high-resolution Google Earth imagery; Luedeling & Buerkert (2008) used Google Earth imagery as one of their validation methods. The sampling pixels used for accuracy assessment were selected using the stratified random sampling method, and the test pixels were uniformly distributed over the entire image.

| Classified \ Reference | 1 | 2 | 3 | 4 | 5 | Total |
|---|---|---|---|---|---|---|
| 1 | p11 | p12 | p13 | p14 | p15 | p1+ |
| 2 | p21 | p22 | p23 | p24 | p25 | p2+ |
| 3 | p31 | p32 | p33 | p34 | p35 | p3+ |
| 4 | p41 | p42 | p43 | p44 | p45 | p4+ |
| 5 | p51 | p52 | p53 | p54 | p55 | p5+ |
| Total | p+1 | p+2 | p+3 | p+4 | p+5 | |

Table 3. Population error matrix, with pij representing the proportion of area in mapped land cover category i and reference land cover category j.

Overall accuracy is the simplest and one of the most popular accuracy measures; it is computed by dividing the total correct (i.e., the sum of the major diagonal) by the total number of pixels in the error matrix (Congalton, 1991). Meanwhile, Rosenfield and Fitzpatrick-Lins (1986) identified the Kappa coefficient as a suitable accuracy measure in thematic classification for representing class accuracy. Its strength lies in the fact that it takes all the elements (diagonal and non-diagonal) of the confusion matrix into consideration, in contrast to the overall accuracy measure, which considers only the diagonal elements. In addition, two types of thematic error can be measured in a confusion matrix; they take into account the accuracy of individual categories. One is given by the producer's
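The overall accuracy and Kappa coefficient described above can be computed directly from an error matrix. A minimal sketch (the 2-class matrix here is made-up illustrative data, not a result from this study):

```python
import numpy as np

def overall_accuracy(m):
    """Sum of the major diagonal divided by the total number of samples
    (Congalton, 1991)."""
    m = np.asarray(m, dtype=float)
    return np.trace(m) / m.sum()

def kappa(m):
    """Kappa coefficient: agreement beyond chance, computed from the whole
    matrix (diagonal and off-diagonal elements) via row/column totals."""
    m = np.asarray(m, dtype=float)
    n = m.sum()
    po = np.trace(m) / n                                  # observed agreement
    pe = (m.sum(axis=0) * m.sum(axis=1)).sum() / n ** 2   # chance agreement
    return (po - pe) / (1.0 - pe)

# Toy 2-class error matrix (rows = classified, columns = reference)
m = [[45, 5],
     [10, 40]]
print(overall_accuracy(m))  # 0.85
```

Note that a matrix with the same diagonal but badly unbalanced marginals would keep the same overall accuracy while its Kappa dropped, which is exactly the contrast drawn above.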

#### **6. Training areas development**

Training is the identification of a sample of pixels of known class membership obtained from reference data. These training pixels are used to derive spectral signatures for classification, and the signature statistics are evaluated to ensure adequate separability. The pixels of the image are then allocated to the class with the greatest similarity to the training data metrics (Alberti et al., 2004). The training stage of a supervised classification is designed to provide this necessary information: the training sites are used to train the supervised classification algorithm. In remote sensing, the aim of the training stage has typically been the production of descriptive statistics for each class, which may then be used in the determination of class membership by the selected classifier (Foody & Mathur, 2006). Obtaining enough training data has been a difficult issue in land cover applications. Two sets of training data were finally prepared: the first set was prepared for the traditional method, while the second set was used for the advanced method. The use of the different datasets for classifying the same area with different classifiers will be discussed in section 8.3.

For the advanced method, knowledge of the statistical distribution is not required; rather, NNs learn it from a representative training set. In our case, the training phase of the NN was based on the back-propagation (BP) learning rule to minimize the mean square error (MSE) between the desired target vectors and the actual output vectors. Training patterns were presented to the network, and the weights of each node were adjusted so that the approximation created by the NN minimized the error between the desired output and the output created by the network. In a network, each connecting line has an associated weight, and NNs are trained by adjusting these connection weights so that the calculated outputs approximate the desired ones. In the learning phase, input patterns from the training data are fed forward through a network initialized with random synapse weights. The root-mean-square error (RMSE) is calculated between the network outputs and the desired outputs, the errors are back-propagated through the network, and the synapse weights are adjusted in order to reduce the total RMSE. This process continues until a convergence criterion is satisfied (Rumelhart et al., 1986). The successful generalization of the NNs used in this application is indicated by the low residual RMS errors. Training is finished when the output value is equal to the ideal output value. The mean square error (MSE) of the network is given by Equation 3 (Moghadassi et al., 2009):

$$\text{MSE} = \frac{1}{N} \sum_{i=1}^{N} (e_i)^2 = \frac{1}{N} \sum_{i=1}^{N} (t_i - a_i)^2 \tag{3}$$

where ti is the target output and ai is the output from neuron i.
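The training cycle described above (forward pass, error computation as in Equation 3, back-propagation of the errors, and weight adjustment until convergence) can be sketched on a toy problem. The logical-OR data, network size, learning rate, and epoch count below are illustrative assumptions, not the settings used in this study:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(42)

# Toy training set standing in for (pixel band values -> class membership):
# two inputs, one output, logical OR
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [1]], dtype=float)

W = rng.normal(size=(2, 4))    # input -> hidden weights (random start)
V = rng.normal(size=(4, 1))    # hidden -> output weights
lr = 2.0                       # learning rate (illustrative)

mse0 = np.mean((T - sigmoid(sigmoid(X @ W) @ V)) ** 2)   # error before training

for epoch in range(5000):
    B = sigmoid(X @ W)                 # forward: hidden-layer outputs
    A = sigmoid(B @ V)                 # forward: network outputs
    E = T - A                          # output errors (ti - ai)
    dA = E * A * (1 - A)               # back-propagated output deltas
    dB = (dA @ V.T) * B * (1 - B)      # back-propagated hidden deltas
    V += lr * B.T @ dA / len(X)        # adjust hidden -> output weights
    W += lr * X.T @ dB / len(X)        # adjust input -> hidden weights

mse = np.mean((T - sigmoid(sigmoid(X @ W) @ V)) ** 2)    # error after training
```

After training, `mse` is far below its starting value `mse0`, which is the convergence behavior the text describes.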

Meanwhile, the selection of the training sets was based on field surveys, reference information from SPOT-5 images, and visual inspection of the image of the particular area. Only the training samples believed to be the most useful and informative were selected for the classification. Training data acquisition can be a very costly process, and training data that are not carefully selected may introduce error. The collection of training data is a crucial step in image classification and directly influences the classification accuracy (Wang et al., 2007). Training set size can greatly affect the classification result; however, size is only one attribute of a training set. Some of the literature suggests the use of a minimum of 10–30p cases per class for training, where p is the number of wavebands used (Piper, 1992; Van Niel et al., 2005). In addition, all training and test sample sites were revisited on the ground to confirm the accuracy of measurement.
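As a small worked example of the 10–30p rule cited above, the suggested minimum per-class sample count scales linearly with the number of wavebands. The helper below is purely illustrative:

```python
def min_training_samples(n_bands, factor=30):
    """Suggested minimum training samples per class under the 10p-30p rule,
    where p (n_bands) is the number of wavebands; factor picks a point in
    the 10-30 range (default: the conservative end)."""
    return factor * n_bands

# e.g. a 4-band image at the conservative end of the rule
print(min_training_samples(4))            # 120
print(min_training_samples(4, factor=10)) # 40, at the lenient end
```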
