**4.2.1 Classifiers**

In this chapter, four different groups of classifiers were evaluated using Landsat TM sub-scenes recorded over the Eastern Mediterranean coastal zone. Methods and performances were assessed in terms of accuracy, capability and applicability. The assessment covered traditional (minimum distance, maximum likelihood, linear discriminant analysis), machine learning (decision tree, artificial neural network, support vector machine), fuzzy (linear mixture modelling, fuzzy c-means, artificial neural network, regression tree) and object based classifiers for LUC mapping. A summary of the techniques and classifiers for various purposes is provided in table 4.

| Criteria | Categories | Characteristics | Examples of classifiers |
|---|---|---|---|
| Whether training samples are used or not | Supervised classification approaches | Land cover classes are defined. Sufficient reference data are available and used as training samples. The signatures generated from the training samples are then used to train the classifier to classify the spectral data into a thematic map. | Maximum likelihood (MLC), minimum distance (MD), artificial neural network (ANN), decision tree (DT) classifier. |
| | Unsupervised classification approaches | Clustering-based algorithms are used to partition the spectral image into a number of spectral classes based on the statistical information inherent in the image. No prior definitions of the classes are used. The analyst is responsible for labeling and merging the spectral classes into meaningful classes. | ISODATA, K-means clustering algorithm. |
| Whether parameters such as mean vector and covariance matrix are used or not | Parametric classifiers | Gaussian distribution is assumed. The parameters (e.g. mean vector and covariance matrix) are often generated from training samples. When the landscape is complex, parametric classifiers often produce 'noisy' results. Another major drawback is that it is difficult to integrate ancillary data, spatial and contextual attributes, and non-statistical information into a classification procedure. | MLC and linear discriminant analysis (LDA). |
| | Non-parametric classifiers | No assumption about the data is required. Non-parametric classifiers do not employ statistical parameters to calculate class separation and are especially suitable for incorporation of non-remote-sensing data into a classification procedure. | ANN, DT, support vector machine (SVM), evidential reasoning, expert system. |
| Which kind of pixel information is used | Per-pixel classifiers | Traditional classifiers typically develop a signature by combining the spectra of all training-set pixels from a given feature. The resulting signature contains the contributions of all materials present in the training-set pixels, ignoring the mixed pixel problem. | MLC, MD, SVM, ANN, DT. |
| | Subpixel classifiers | The spectral value of each pixel is assumed to be a linear or non-linear combination of defined pure materials (or endmembers), providing proportional membership of each pixel to each endmember. | Fuzzy-set classifiers, subpixel classifier, spectral mixture analysis. |
| | Per-field classifiers | GIS plays an important role in per-field classification, integrating raster and vector data in a classification. The vector data are often used to subdivide an image into parcels, and classification is based on the parcels, avoiding the spectral variation inherent in the same class. | GIS-based classification approaches. |
| | Object-oriented classifiers | Image segmentation merges pixels into objects and classification is conducted based on the objects, instead of an individual pixel. No GIS vector data are used. | eCognition. |
| Whether output is a definitive decision about land cover class or not | Hard classification | Making a definitive decision about the land cover class, so that each pixel is allocated to a single class. The area estimation by hard classification may produce large errors, especially from coarse spatial resolution data due to the mixed pixel problem. | MLC, MD, ANN, DT, SVM. |
| | Soft (fuzzy) classification | Providing for each pixel a measure of the degree of similarity for every class. Soft classification provides more information and potentially a more accurate result, especially for coarse spatial resolution data classification. | Fuzzy-set classifiers, subpixel classifier, spectral mixture analysis. |
| Whether spatial information is used or not | Spectral classifiers | Pure spectral information is used in image classification. A 'noisy' classification result is often produced due to the high variation in the spatial distribution of the same class. | Maximum likelihood, minimum distance, artificial neural network. |
| | Contextual classifiers | The spatially neighbouring pixel information is used in image classification. | Iterated conditional modes, point-to-point contextual correction, and frequency-based contextual classifier. |
| | Spectral-contextual classifiers | Spectral and spatial information is used in classification. Parametric or non-parametric classifiers are used to generate initial classification images and then contextual classifiers are implemented in the classified images. | ECHO, combination of parametric or non-parametric and contextual algorithms. |

Table 4. A taxonomy of image classification methods (Lu and Weng 2007).

#### **4.2.1.1 Model based classifiers (traditional)**

Model based classifiers rely on basic statistical measures of the dataset, such as the mean, variance and standard deviation. The most widely used in the literature are the supervised MLC, MD and LDA classifiers and the unsupervised k-means algorithm.


The minimum distance classifier assigns unknown image data to the class that minimizes the distance between the image data and the class in multi-feature space. The distance is defined as an index of similarity, so that the minimum distance is identical to the maximum similarity: a pixel is assigned to the class whose signature mean is nearest. In figure 3, the signature mean nearest to the unclassified pixel is settlement, so the pixel is assigned to the settlement class by the MD classifier.

Fig. 3. MD classifier concept
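The nearest-mean rule illustrated in figure 3 can be sketched as follows; the two-band signature means below are hypothetical values chosen for illustration, not the chapter's actual training statistics:

```python
import numpy as np

def minimum_distance_classify(pixel, signature_means):
    """Assign the pixel to the class whose signature mean is nearest
    in feature space (Euclidean distance = index of similarity)."""
    distances = {cls: np.linalg.norm(np.asarray(pixel, float) - np.asarray(mean, float))
                 for cls, mean in signature_means.items()}
    return min(distances, key=distances.get)

# Hypothetical class means in a (red, near-infrared) feature space
means = {
    "settlement":  [120, 90],
    "agriculture": [60, 140],
    "forest":      [40, 110],
    "water":       [20, 15],
}

print(minimum_distance_classify([110, 85], means))  # settlement
```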

The MLC procedure is based on Bayesian probability theory. Using the information from a set of training sites, MLC uses the mean and variance/covariance data of the signatures to estimate the posterior probability that a pixel belongs to each class. The MLC procedure is similar to MD with the standardized distance option; the difference is that MLC accounts for intercorrelation between bands. By incorporating information about the covariance between bands as well as their inherent variance, MLC produces what can be conceptualized as an elliptical zone of characterization of the signature. It calculates the posterior probability of belonging to each class, where the probability is highest at the mean position of the class and falls off in an elliptical pattern away from the mean.
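As a sketch of this idea (not the chapter's implementation), the mean vector and covariance matrix can be estimated from training pixels and each class scored by its Gaussian log-likelihood; the two-band training pixels below are hypothetical and equal priors are assumed:

```python
import numpy as np

def train_mlc(training):
    """Estimate a mean vector and covariance matrix per class."""
    stats = {}
    for cls, pixels in training.items():
        X = np.asarray(pixels, dtype=float)
        stats[cls] = (X.mean(axis=0), np.cov(X, rowvar=False))
    return stats

def mlc_classify(pixel, stats):
    """Assign the pixel to the class with the highest Gaussian
    log-likelihood (equal priors assumed)."""
    x = np.asarray(pixel, dtype=float)
    best, best_ll = None, -np.inf
    for cls, (mu, cov) in stats.items():
        diff = x - mu
        ll = -0.5 * (np.log(np.linalg.det(cov))
                     + diff @ np.linalg.inv(cov) @ diff)
        if ll > best_ll:
            best, best_ll = cls, ll
    return best

# Hypothetical two-band training pixels per class
training = {
    "water":  [[20, 14], [22, 16], [18, 15], [21, 13]],
    "forest": [[42, 108], [38, 112], [41, 109], [40, 111]],
}
stats = train_mlc(training)
print(mlc_classify([39, 110], stats))  # forest
```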

The LDA classifier conducts linear discriminant analysis of the training site data to form a set of linear combinations that express the degree of support for each class. The class assigned to each pixel is then the class that receives the highest support after evaluation of all functions. These functions have a form similar to that of a multivariate linear regression equation, where the independent variables are the image bands and the dependent variable is the measure of support. The equations are calculated such that they maximize the variance between classes and minimize the variance within classes, so that class separation becomes easier.

The k-means unsupervised technique partitions an n-dimensional image into k exclusive clusters. The method begins by initializing k centroids (means), assigns each pixel to the cluster whose centroid is nearest, updates the cluster centroids, and repeats the process until the k centroids are fixed. This is a heuristic, greedy algorithm for minimizing the SSE (sum of squared errors); hence, it may not converge to a global optimum. Since its performance strongly depends on the initial estimate of the partition, a relatively large number of clusters is generally recommended in order to acquire as complete an initial pattern of centroids as possible (Richards & Jia, 1999).
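The initialize/assign/update loop described above can be sketched as follows (toy two-band data for illustration; a real application would run on the image bands):

```python
import numpy as np

def kmeans(pixels, k, max_iter=100, seed=0):
    """Initialize k centroids, assign each pixel to the nearest centroid,
    update the centroids, and repeat until they stop moving. The result
    is a local (not necessarily global) minimum of the SSE."""
    rng = np.random.default_rng(seed)
    X = np.asarray(pixels, dtype=float)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # distance of every pixel to every centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Two well-separated toy "spectral" clusters
pixels = [[0, 0], [1, 0], [0, 1], [10, 10], [11, 10], [10, 11]]
labels, centroids = kmeans(pixels, k=2)
```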

All of the model based classifiers were compared with each other using the same training dataset in order to ensure the comparability of each technique. A Landsat TM image recorded in August 2010 over the Eastern Mediterranean coastal zone of Turkey was used. The main LUC classes were coniferous tree, deciduous tree, permanent farmlands, temporary irrigated farmlands, temporary non-irrigated farmlands, bulrush, grassland, bareground, water bodies, settlement and sand dunes (figure 4).

Fig. 4. Model based LUC classification results using strong training dataset and unsupervised K-means classification result in sample study area (in yellow).

#### **4.2.1.2 Data dependent (machine learning classifiers)**

Data dependent classifiers are based on non-parametric rules, and the machine learning classifiers in particular use different approaches according to classifier type. In this chapter, widely used non-parametric classifiers such as ANN, DT and SVM were assessed.

The ANN is one of several artificial intelligence techniques that have been used for automated image classification as an alternative to conventional statistical approaches. Introductions to the use of ANNs in remote sensing are provided in Kohonen (1988), Bishop (1995) and Atkinson and Tatnall (1997). The multilayer perceptron described by Rumelhart et al. (1986) is the most commonly encountered ANN model in remote sensing because of its generalization capability. The accuracy of an ANN is affected primarily by five variables: (1) the size of the training set, (2) the network architecture, (3) the learning rate, (4) the learning momentum, and (5) the number of training cycles.

*Size of the training set* is the most important factor in all LUC classifications. If the number of training pixels is sufficient, the accuracy of a LUC map will be better than with fewer training pixels.

*Network architecture* of an ANN resembles a small part of the neural network (NN) of the human brain. Essentially, a NN has three parts: input, hidden and output nodes (figure 5). In LUC mapping with optical images, the input nodes are the image bands (e.g. 6 nodes for Landsat TM, excluding the thermal band). The hidden node count depends on the user or on previous experience. There are two ways to determine the optimal hidden node count: (i) the user may check the literature dealing with a similar or the same study area, or (ii) the user may try several configurations and check the accuracy of each. According to the literature, if a NN uses one hidden layer, it generally contains two or three times more nodes than the input layer (Berberoglu et al. 2009). The output node count equals the class count, and each output node produces a class probability.

Fig. 5. NN architecture

*The learning rate* determines the portion of the calculated weight change that will be used for weight adjustment. It acts like a low-pass filter, allowing the network to ignore small features in the error surface. Its value ranges between 0 and 1: the smaller the learning rate, the smaller the changes in the weights of the network at each cycle. The optimum value of the learning rate depends on the characteristics of the error surface, and lower learning rates require more cycles than larger ones.

*Learning momentum* is added to the learning rate to incorporate the previous changes in weight with the current direction of movement in the weight space. It is an additional correction to the learning rate to adjust the weights and ranges between 0.1 and 0.9.

*Number of training cycles* is defined according to the training error of a NN: when the training error becomes optimal, the number of training cycles is sufficient.

In this chapter, the NN architecture was defined based on previous literature. Berberoglu (1999) designed a NN architecture for Eastern Mediterranean LUC mapping; several NN architectures were tried in that study, and the highest performance was obtained with a four-layer architecture. This NN included 2 hidden layers: the first hidden layer contained twice as many nodes as the input layer, and the second hidden layer contained three times as many nodes as the first. The learning rate and learning momentum were defined according to the training error (table 5).


| Parameters | Values |
|---|---|
| Input layer | 6 nodes |
| 1. Hidden layer | 12 nodes |
| 2. Hidden layer | 36 nodes |
| Output layer | 11 nodes |
| Learning rate | 0.001 |
| Learning momentum | 0.5 |
| Number of cycles | 44864 |

Table 5. ANN parameters and values
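To make the roles of the learning rate and momentum concrete, the following is a minimal backpropagation sketch. It is not the network of table 5: the toy data, layer sizes and parameter values are hypothetical and chosen only so the example trains quickly.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_mlp(X, y, hidden=4, lr=0.5, momentum=0.5, cycles=3000, seed=1):
    """One-hidden-layer perceptron trained by backpropagation.
    `lr` scales each weight update; `momentum` re-applies a fraction of
    the previous update, smoothing movement over the error surface."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, 1));          b2 = np.zeros(1)
    vW1 = np.zeros_like(W1); vb1 = np.zeros_like(b1)
    vW2 = np.zeros_like(W2); vb2 = np.zeros_like(b2)
    t = y.reshape(-1, 1).astype(float)
    for _ in range(cycles):
        # forward pass
        h = sigmoid(X @ W1 + b1)
        out = sigmoid(h @ W2 + b2)
        # backward pass (squared-error loss)
        d_out = (out - t) * out * (1.0 - out)
        d_h = (d_out @ W2.T) * h * (1.0 - h)
        # momentum update: new step = momentum * old step - lr * gradient
        vW2 = momentum * vW2 - lr * (h.T @ d_out); W2 += vW2
        vb2 = momentum * vb2 - lr * d_out.sum(0); b2 += vb2
        vW1 = momentum * vW1 - lr * (X.T @ d_h);  W1 += vW1
        vb1 = momentum * vb1 - lr * d_h.sum(0);   b1 += vb1
    return lambda Xq: sigmoid(sigmoid(Xq @ W1 + b1) @ W2 + b2)

# Hypothetical two-band pixels for two easily separable classes
X = np.array([[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]])
y = np.array([0, 0, 1, 1])
predict = train_mlp(X, y)
print((predict(X).ravel() > 0.5).astype(int))  # [0 0 1 1]
```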

DT is a non-parametric image classification technique. A decision tree is composed of the root (starting point), active nodes or internodes (rule nodes), and leaves (classes). The root is the starting point of the tree, an active node creates leaves, and a leaf is a group of pixels that either belong to the same class or are assigned to a particular class (figure 6).

Fig. 6. Decision tree architecture

A decision tree is built from a training set, which consists of objects, each of which is completely described by a set of attributes and a class label. Attributes are a collection of properties containing all the information about one object. Unlike the class, each attribute may have either ordered values (an integer or a real value) or unordered values (e.g. a Boolean value) (Ghose et al. 2010).

Most DT algorithms use a recursive-partitioning algorithm whose input requires a set of training examples, a splitting rule, and a stopping rule. Splitting rules determine how the tree is partitioned; entropy, gini, twoing and gain ratio are the most used splitting rules in the literature (Quinlan 1993, Zambon et al. 2006, Ghose et al. 2010). The stopping rule determines whether the training samples can be split further. If a split is still possible, the samples in the training set are divided into subsets by performing the set of statistical tests defined by the splitting rule. This procedure is recursively repeated for each subset until no more splitting is possible (Ghose et al. 2010).

In this chapter, the gain ratio, entropy and gini splitting algorithms were used to find the most accurate one. Entropy was almost 3% more accurate than the gain ratio, and gini gave the poorest performance for the study area. Stopping criteria and active nodes were determined according to the following rule:

If a subset is determined to be pure, create a leaf and assign it to the class of interest. If a subset contains more than one class, create active nodes by applying the splitting algorithm, and continue this process until the class leaves become pure.
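The splitting measures named above can be computed directly; a pure subset has zero entropy and zero gini impurity, which is exactly the leaf-creation condition in the rule above:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (bits) of a set of class labels; 0 for a pure subset."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity of a set of class labels; 0 for a pure subset."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent, subsets):
    """Entropy reduction achieved by splitting `parent` into `subsets`."""
    n = len(parent)
    return entropy(parent) - sum(len(s) / n * entropy(s) for s in subsets)

# A 50/50 mixed node is maximally impure; splitting it into two pure
# subsets recovers all of that entropy.
mixed = ["forest"] * 5 + ["water"] * 5
print(entropy(mixed), gini(mixed))                               # 1.0 0.5
print(information_gain(mixed, [["forest"] * 5, ["water"] * 5]))  # 1.0
```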

The SVM represents a group of theoretically superior machine learning algorithms. SVM employs optimization algorithms to locate the optimal boundaries between classes. Statistically, the optimal boundaries should generalize to unseen samples with the least error among all possible boundaries separating the classes, thereby minimizing the confusion between classes. In practice, the SVM has been applied to optical character recognition, handwritten digit recognition and text categorization (Vapnik 1995, Joachims 1998). SVM uses the pairwise classification strategy for multiclass classification, and it can be used in linear and non-linear form by applying different kernel functions. In this chapter, only the sigmoidal non-linear kernel was used, because the model based classifiers already work well when the data distribution is linear; all data dependent models were run non-linearly, and the sigmoidal kernel takes less time than other non-linear kernels. Different kernel functions such as the radial basis function, linear function or polynomial function may also be applied, and the accuracy of the SVM classifier may change even within one kernel: with a polynomial kernel, for example, the accuracy of SVM varies according to the applied polynomial order (Huang et al. 2002).
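As an illustration of the two ideas named above (the sigmoidal kernel and the pairwise strategy), the sketch below computes the kernel value for two feature vectors and counts the binary classifiers a one-vs-one SVM would need; the `gamma` and `coef0` values and the class list are hypothetical:

```python
import numpy as np
from itertools import combinations

def sigmoid_kernel(x, y, gamma=0.1, coef0=0.0):
    """Sigmoidal kernel: k(x, y) = tanh(gamma * <x, y> + coef0)."""
    return np.tanh(gamma * np.dot(x, y) + coef0)

# Pairwise (one-vs-one) strategy: one binary SVM per pair of classes
classes = ["coniferous", "deciduous", "grassland", "bareground", "water"]
pairs = list(combinations(classes, 2))
print(len(pairs))  # 5 * 4 / 2 = 10 binary classifiers

# Kernel value for two (hypothetical) feature vectors
print(float(sigmoid_kernel(np.array([1.0, 2.0]), np.array([2.0, 1.0]))))
```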

All data dependent classifiers which were introduced in this chapter were evaluated in the Eastern Mediterranean environment (figure 7).

### **4.3 Accuracy assessments**

A classification accuracy assessment generally includes three basic components: sampling design, response design, and estimation and analysis procedures (Stehman and Czaplewski 1998). Selection of a suitable sampling strategy is a critical step (Congalton 1991). The major components of a sampling strategy include sampling unit (pixels or polygons), sampling design, and sample size (Muller et al. 1998). Possible sampling designs include random, stratified random, systematic, double, and cluster sampling. A detailed description of sampling techniques can be found in previous literature such as Stehman and Czaplewski (1998) and Congalton and Green (1999).

The error matrix approach is the one most widely used in accuracy assessment (Foody 2002). In order to properly generate an error matrix, one must consider the following factors: (1) reference data collection, (2) classification scheme, (3) sampling scheme, (4) spatial autocorrelation, and (5) sample size and sample unit (Congalton and Plourde 2002). After generation of an error matrix, other important accuracy assessment elements, such as overall accuracy, user accuracy, producer accuracy (table 6), and kappa coefficient can be derived. Kappa is the difference between the observed accuracy and the chance agreement divided by one minus that chance agreement (Lillesand and Kiefer 1994).

Fig. 7. Data dependent LUC classification results using strong training dataset


| Error matrix | Agriculture | Forest | Water | Total classified pixels | User accuracy |
|---|---|---|---|---|---|
| Agriculture | 32 | | | 40 | 32 / 40 = 80% |
| Forest | | 34 | | 40 | 34 / 40 = 85% |
| Water | | | 36 | 40 | 36 / 40 = 90% |
| Total ground truth pixels | 39 | 43 | 38 | 120 | |
| **Producer accuracy** | 32 / 39 = 82% | 34 / 43 = 79% | 36 / 38 = 95% | | |

**Overall accuracy** = correct pixels / total pixels = (32 + 34 + 36) / 120 = **85%**

Table 6. Error matrix and accuracy calculations
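The accuracy measures of table 6 can be reproduced from an error matrix. In the sketch below, the diagonal and the marginal totals match table 6, while the off-diagonal counts are hypothetical (they are not recoverable from the table as printed):

```python
import numpy as np

def accuracy_metrics(matrix):
    """User, producer and overall accuracy plus the kappa coefficient
    from an error matrix (rows = classified, columns = ground truth)."""
    m = np.asarray(matrix, dtype=float)
    total = m.sum()
    user = np.diag(m) / m.sum(axis=1)        # correct / total classified
    producer = np.diag(m) / m.sum(axis=0)    # correct / total ground truth
    overall = np.trace(m) / total
    # chance agreement from the row and column marginals
    chance = (m.sum(axis=1) * m.sum(axis=0)).sum() / total ** 2
    kappa = (overall - chance) / (1.0 - chance)
    return user, producer, overall, kappa

# Diagonal and marginals follow table 6; off-diagonal counts are hypothetical
m = [[32, 7, 1],    # classified agriculture
     [5, 34, 1],    # classified forest
     [2, 2, 36]]    # classified water
user, producer, overall, kappa = accuracy_metrics(m)
print(overall)           # 0.85
print(round(kappa, 3))   # 0.775
```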

Overall classification accuracies and kappa coefficients of each classification using the weak (6962 training pixels) and strong (16300 training pixels) training datasets were evaluated (table 7). In addition, the user, producer and kappa accuracies of each LUC class were compared using the strong training dataset to assess the results in detail (table 8). No ancillary data were integrated into the classifications; this is discussed in section 5.




| | Classifier | Overall (weak) | Kappa (weak) | Overall (strong) | Kappa (strong) |
|---|---|---|---|---|---|
| **MB** | MLC | 69.5 | 65.6 | 85.6 | 83.8 |
| | MD | 67.6 | 63.3 | 72 | 68.5 |
| | LDA | 73.5 | 70 | 80.5 | 78 |
| | K-means | 57.3 | 52.1 | - | - |
| **DD** | ANN | 70.5 | 66.5 | 76.9 | 74.2 |
| | DT | 73 | 69.5 | 82.5 | 80 |
| | SVM | 74 | 70 | 79.2 | 76.7 |

K-means is unsupervised, so its single result (Overall = 57.3, Kappa = 52.1) does not depend on the training dataset.

Table 7. Overall and kappa accuracies (%) of model based (MB) and data dependent (DD) classifiers using weak and strong training datasets

Overall classification accuracies indicated that MLC was the most accurate model based classifier when the strong training dataset was used. However, LDA with the weak training dataset performed accurately because of its distance separation algorithm. On the other hand, the unsupervised k-means classifier was the least accurate one due to the fact that no training pixels were used.

Fig. 8. Visual detail of a small subview; (a) Ground truth, (b) MLC, (c) MD, (d) LDA, (e) ANN, (f) DT and (g) SVM results
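To make the model based idea concrete, here is a minimal sketch of a minimum distance (MD) classifier: each pixel is allocated to the class whose training mean is spectrally nearest. The band values and class means below are invented for illustration, not taken from the study.

```python
import numpy as np

def minimum_distance_classify(pixels, class_means):
    """Assign each pixel to the class with the nearest mean vector
    (Euclidean distance in spectral feature space)."""
    pixels = np.asarray(pixels, dtype=float)      # (n_pixels, n_bands)
    means = np.asarray(class_means, dtype=float)  # (n_classes, n_bands)
    # distance of every pixel to every class mean
    d = np.linalg.norm(pixels[:, None, :] - means[None, :, :], axis=2)
    return d.argmin(axis=1)                       # index of the nearest class

# hypothetical training means for water, vegetation and bare soil (3 bands)
means = [[10, 20, 5], [40, 80, 30], [90, 85, 80]]
pixels = [[12, 22, 6], [85, 80, 75], [42, 78, 28]]
print(minimum_distance_classify(pixels, means))  # → [0 2 1]
```

MLC differs from this sketch only in replacing the Euclidean distance with a class-covariance-weighted likelihood, which is why it needs more training pixels to estimate its parameters reliably.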


SVM performed better than the other data dependent classifiers when using the weak training dataset. However, the highest accuracy was obtained with the DT classifier using the strong dataset.

SVM classified forestlands, grassland and permanent farmlands more accurately than the other classifiers. There was no significant difference in built up areas among the classifiers. The most accurate sand dunes, bulrush and irrigated farmland class accuracies were obtained from the DT classifier. DT, LDA and SVM performed reasonably well with both the weak and strong training datasets (figure 8).

In general, data dependent classifiers performed well with the weak training dataset. SVM in particular was successful in separating vegetated areas. Clearly, if a more detailed classification scheme (e.g. forest tree species) is required using a weak training dataset, SVM might be the first option in terms of classification accuracy. On the other hand, applying SVM is time costly on standard PCs and laptops.

Three accuracy calculation methods were shown in table 8; however, the major question is which one should be used? A large number of studies have utilized the kappa coefficient as an ideal approach for LUC classification.

A number of criteria were selected for the comparison of both model based and data dependent classifiers: (a) overall accuracy, (b) classification speed, (c) input parameter handling, (d) hardness in application, (e) accuracy with different training sizes and (f) accuracy difference between classes, or classification stability (table 9).


| **Criteria** | MLC | MD | LDA | k-means | ANN | DT | SVM |
|---|---|---|---|---|---|---|---|
| Overall accuracy | \*\*\*\* | \*\* | \*\*\*\* | \* | \*\*\* | \*\*\*\*\* | \*\*\*\* |
| Classification speed | \*\*\*\*\* | \*\*\*\*\* | \*\*\*\* | \*\*\*\* | \*\* | \*\*\* | \* |
| Input parameter handling | \*\*\*\*\* | \*\*\*\*\* | \*\*\*\*\* | \*\*\*\*\* | \*\*\* | \*\*\* | \*\* |
| Hardness in application | \*\*\*\*\* | \*\*\*\*\* | \*\*\*\* | \*\*\*\*\* | \*\*\* | \*\* | \* |
| Accuracy with different training sizes | \*\*\*\* | \* | \*\*\*\* | No | \*\*\* | \*\*\*\*\* | \*\*\*\*\* |
| Classification stability | \*\*\* | \*\*\* | \*\*\*\* | \* | \*\*\* | \*\*\*\*\* | \*\*\* |

Table 9. Comparing hard classifiers (\*\*\*\*\* and \* refer to the most accurate and the poorest performances, respectively).

#### **4.3.1 Soft (fuzzy) classifiers**

Defining "what is in a pixel?" numerically is very important for understanding the earth surface in remote sensing science. Increased spectral and attribute information may be valuable in a variety of situations. Satellite spectrometers (e.g. MODIS, MERIS) provide detailed attribute information at relatively coarse spatial resolutions (e.g. 250 m, 500 m, 1 km) (Aplin and Atkinson 2001).

Traditional hard per-pixel classification of remotely sensed images is limited by mixed pixels (Cracknell 1998). Soft classification overcomes this limitation by predicting the proportional membership of each pixel to each class. Mapping is generally achieved through the application of a conventional statistical classification, which allocates each image pixel to a single land cover class. Such approaches are inappropriate for mixed pixels, which contain two or more land cover classes, and a fuzzy classification approach is then required (Foody 1996).


Table 8. (K) kappa, (P) producer, and (U) user accuracies of each LUC using hard classifiers in the study area.



Fuzzy logic models constitute the modeling tools of soft computing. Fuzzy logic is a tool for embedding structured human knowledge into workable algorithms. There are two main types of sets: 'crisp (or classic) sets' and 'fuzzy sets'. For example, a crisp set S can be defined by a membership function:

$$\mu_S(X) = \begin{cases} 1 & \text{if } X \in S \\ 0 & \text{if } X \notin S \end{cases} \tag{1}$$

In crisp sets a function of this type is also called a characteristic function. Fuzzy sets can be used to produce rational and sensible clusterings. For a fuzzy set there exists a degree of membership μ_S(X) that is mapped onto [0, 1]. In the case of a LUC map, every area simultaneously belongs to the LUC clusters of interest with a different degree of membership (Kandel, 1992).
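The contrast between the crisp characteristic function of equation (1) and a graded fuzzy membership can be sketched in a few lines; the ramp-shaped membership function and its thresholds below are invented for illustration, not part of the chapter's models.

```python
def crisp_membership(x, s):
    """Characteristic function of equation (1): 1 if x belongs to set s, else 0."""
    return 1 if x in s else 0

def fuzzy_membership(x, low=20, high=40):
    """A simple ramp membership mapped onto [0, 1]: fully outside below `low`,
    fully inside above `high`, graded in between (hypothetical thresholds)."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)

forest_pixels = {101, 102, 103}
print(crisp_membership(102, forest_pixels))  # → 1
print(fuzzy_membership(30))                  # → 0.5
```

A pixel with membership 0.5 would be counted as half forest in a soft map, whereas the crisp function forces an all-or-nothing decision.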

There are several soft classification techniques, and the appropriate one varies according to the training and testing dataset and the scale of the study. In this frame, linear mixture modeling (LMM), regression tree (RT), multiple linear regression (MLR) and artificial neural network (ANN) soft classification techniques were evaluated in an Eastern Mediterranean area called the Upper Seyhan Plane (USP). Berberoglu et al. (2009) focused on these four soft classification techniques to map percentage tree cover using an ENVISAT MERIS (full spatial resolution 300 m) dataset and vegetation metrics. These metrics and more information about ancillary data integration are discussed in section 5.

For the accuracy assessment of a LUC or fuzzy map, high resolution ground truth data are needed. Crisp data are adequate for hard classifications; however, assessment of a soft classification needs fuzzy ground truth, such as quantitative actual forest cover at the study scale. High spatial resolution Ikonos (4 m) satellite images of three selected plots were used to derive training and testing ground data. The Ikonos images were classified into forest and non-forest classes, and the results were rescaled to the MERIS spatial resolution. 80% of this tree cover dataset was used as training data and 20% was set aside for accuracy assessment. Linear (LMM and MLR) and non-linear (ANN and RT) techniques were compared.

LMM is one of the most widely used fuzzy techniques in the literature (Berberoglu & Satir 2008). It is based on the assumption that class mixing occurs in a linear manner and therefore adopts a least squares procedure to estimate the class proportions within each pixel. The idea is that a continuous scene can be modeled as the sum of the radiometric interactions between individual cover types weighted by their relative proportions (Graetz 1990). The form of the mixture model is:

$$V_i = \sum_{j=1}^{n} f_j \, r_{ij} + e_i \tag{2}$$


where *Vi* is the value of a pixel in waveband *i*, *fj* is the fractional abundance of endmember *j*, *rij* is the value of endmember *j* in waveband *i*, *ei* is the residual error associated with waveband *i* and *n* is the number of endmembers. Equation (2) is constrained by the assumption that the endmember fractions in each grid cell should sum to 1.0, as defined by equation (3):

$$\sum_{j=1}^{n} f_j = 1 \tag{3}$$

LMM needs pure pixels for each class to define the endmembers. Class membership functions are obtained based on endmember spectral characteristics (figure 9).

Fig. 9. Methodology for application of LMM
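Equations (2) and (3) together define a constrained least squares problem. A minimal sketch, assuming invented two-endmember spectra in three bands, is to solve the system with the sum-to-one constraint of equation (3) appended as a heavily weighted extra equation:

```python
import numpy as np

def linear_unmix(pixel, endmembers, weight=1e3):
    """Estimate endmember fractions f minimising ||pixel - E @ f||,
    with sum(f) = 1 enforced (softly) via an augmented least squares row."""
    E = np.asarray(endmembers, dtype=float).T   # (n_bands, n_endmembers)
    v = np.asarray(pixel, dtype=float)
    # append the sum-to-one constraint as a heavily weighted extra equation
    A = np.vstack([E, weight * np.ones(E.shape[1])])
    b = np.append(v, weight * 1.0)
    f, *_ = np.linalg.lstsq(A, b, rcond=None)
    return f

# hypothetical endmember spectra (rows) in three bands
endmembers = [[10.0, 20.0, 5.0],   # water
              [40.0, 80.0, 30.0]]  # vegetation
# a synthetic mixed pixel that is 30% water, 70% vegetation
pixel = 0.3 * np.array(endmembers[0]) + 0.7 * np.array(endmembers[1])
print(np.round(linear_unmix(pixel, endmembers), 2))  # → [0.3 0.7]
```

Real applications usually also clip or constrain the fractions to be non-negative; that refinement is omitted here for brevity.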

MLR refers to relating a response variable Y to a set of predictors *xi* in the form (e.g. Chatterjee and Price, 1991):

$$Y = b_0 + b_1 x_1 + b_2 x_2 + \ldots + b_p x_p \tag{4}$$

where *b0* is the constant term and *b1* is the coefficient of the first variable *x1* (waveband). An advantage of linear regression is that it is easy to implement. MLR models are computationally efficient and can also provide confidence intervals for the estimated coefficients and the predicted data. Some of the variables were eliminated using stepwise regression models.
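Fitting equation (4) amounts to ordinary least squares on a design matrix with a leading column of ones for b0; the synthetic "waveband" data below are invented so that the known coefficients can be recovered and checked.

```python
import numpy as np

rng = np.random.default_rng(0)

# synthetic predictors: two "wavebands" for 50 pixels (invented data)
X = rng.uniform(0, 100, size=(50, 2))
# synthetic response following Y = b0 + b1*x1 + b2*x2 with known coefficients
Y = 5.0 + 0.4 * X[:, 0] - 0.2 * X[:, 1]

# design matrix with a leading column of ones for the constant b0
A = np.column_stack([np.ones(len(X)), X])
b, *_ = np.linalg.lstsq(A, Y, rcond=None)
print(np.round(b, 2))  # recovers [5.0, 0.4, -0.2] for this noise-free data
```

With noisy reflectances the recovered coefficients would only approximate the true ones, which is where the confidence intervals mentioned above become useful.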

The RT method has in recent years become a common alternative to conventional soft classification approaches, particularly with MODIS data (Hansen et al. 2005). The basic concept of a decision tree is to split a complex decision into several simpler decisions that can lead to a solution that is easier to interpret. When the target variable is discrete (e.g. a class attribute in a land cover classification), the procedure is known as decision tree classification. By contrast, when the target variable is continuous, it is known as decision tree regression. In an RT, the target variable is a continuous numeric field such as percentage tree cover. Splitting algorithms were introduced in the data dependent classifiers section. In decision tree classification the splitting rules contain only crisp conditions, whereas in RT each splitting rule additionally contains a regression equation. In this study the following RT rules were applied to derive tree cover percentage (table 10).
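An RT of this kind can be applied as a cascade of crisp splits, each terminating in its own regression equation. The thresholds and coefficients below are invented for illustration and are not the study's fitted rules:

```python
def rt_predict(band3, band12, band10):
    """Piecewise regression in the spirit of an RT: crisp splits on the
    spectral bands select which linear equation predicts percentage tree
    cover (all thresholds and coefficients are hypothetical)."""
    if band3 <= 4000:                 # leaf 1: no tree cover predicted
        return 0.0
    if band12 < 30000:                # leaf 2: its own regression equation
        return 20.0 + 0.001 * band10
    return 65.0 + 0.0005 * band10     # leaf 3: a different regression equation

print(rt_predict(3500, 29000, 10000))  # → 0.0
print(rt_predict(4500, 29000, 10000))  # → 30.0
print(rt_predict(4500, 31000, 10000))  # → 70.0
```

Each leaf therefore behaves like a local MLR model, selected by the crisp conditions of the tree.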



| Rule | Condition | Target variable (percentage tree cover) |
|---|---|---|
| Rule 1 | Band 3 > 4448 | 66.2 – 0.1033 (band02) + 0.0509 (band03) + 0.0388 (band01) + 0.0014 (band07) – 0.0009 (band06) + 0.0012 (band04) – 0.0014 (band08) + 0.00012 (band12) – 0.0007 (band09) + 0.002 (band05) |
| Rule 2 | Band 3 > 4448, Band 12 < 31651 | –96.3 – 0.2479 (band02) + 0.1819 (band01) + 0.0672 (band07) + 0.0883 (band03) – 0.04 (band06) – 0.0459 (band08) – 0.0414 (band09) + 0.00472 (band12) + 0.095 (band05) – 0.00379 (band10) + 0.0101 (band04) |
| Rule 3 | Band 3 > 4448, Band 12 > 31651 | –95.5 + 0.00571 (band10) |

Table 10. Regression tree rules for tree cover percentage from MERIS data.

The correlation coefficients of each result with the testing dataset for LMM, MLR, ANN and RT were 0.68, 0.69, 0.68 and 0.71, respectively. The most accurate result was obtained using the RT technique.

These techniques are not only used to map two classes but can also be applied to more LUC classes. In this frame, LMM and ANN fuzzy classification techniques were compared by Şatır (2006) in almost the same area as the RT classification (figure 10). Only forested areas were selected in Şatır's study. Training and testing datasets were derived from Landsat TM/ETM for each LUC.

Fig. 10. Study area boundary for LMM and ANN fuzzy classifiers comparison.


Turkish pine (*Pinus brutia*), Crimean pine (*Pinus nigra*), Lebanese cedar (*Cedrus libani*), Taurus fir (*Abies cilicica*) and juniper (*Juniperus excelsa*) were classified in the detailed classification scheme. In addition, bareground, farmlands, forestlands and water bodies were classified in the general classification scheme. LMM and ANN fuzzy classifications and an ANN hard classification were compared to see the accuracy difference between hard and soft classification and to identify the best fuzzy classification technique (figure 11).

Fig. 11. Soft classification results (a: Turkish pine, b: Crimean pine, c: Juniper, d: Lebanese cedar, e: Taurus fir and f: General soft classification of forest).

LMM and ANN fuzzy classifications using medium spatial resolution data (300 m) produced reasonable classification outcomes when the training dataset was large enough. Moreover, in general both fuzzy classifications were more accurate than the hard classification results (table 11).

Fuzzy classifications are ideal for LUC mapping using coarse or medium spatial resolution data. However, fuzzy classification is not necessary for LUC mapping using very high spatial resolution data (e.g. 0.5 m or 1 m). High spatial resolution data have the characteristic that groups of pixels show similar spectral characteristics. Object based classification techniques are suggested at this point.


| Detailed classifications | LMM | ANN (soft) | ANN (hard) |
|---|---|---|---|
| Bareground | 78 | 72 | 60 |
| Farmland | 60 | 58 | 12 |
| Water bodies | 99 | 98 | 85 |
| Turkish pine | 85 | 89 | 56 |
| Crimean pine | 60 | 63 | 30 |
| Lebanese cedar | 35 | 40 | 8 |
| Taurus fir | 26 | 31 | 0 |
| Juniper | 34 | 44 | 7 |
| Overall | 60 | 62 | 33 |

| General classifications | LMM | ANN (soft) | ANN (hard) |
|---|---|---|---|
| Bareground | 79 | 72 | 70 |
| Farmland | 59 | 55 | 18 |
| Forest | 84 | 83 | 45 |
| Water bodies | 99 | 100 | 85 |
| Overall | 80 | 78 | 57 |

Table 11. Accuracy comparisons (%) of fuzzy classification methods in different classification schemes.

#### **4.3.2 Object based classification**

Many complex land covers exhibit similar spectral characteristics, making separation in feature space by simple per-pixel classifiers difficult and leading to inaccurate classification. Therefore, object-based classification is a potential solution for the classification of such regions. The specific benefits are an increase in accuracy, a decrease in classification time and help in eliminating within-field spectral mixing (Berberoglu et al., 2000). The object-based classification approach involved the integration of vector data and raster images within a geographical information system (GIS) and enabled the knowledge-free extraction of image object primitives at different spatial resolutions, the so-called multiresolution segmentation. The segmentation operated as a heuristic optimization procedure which minimized the average heterogeneity of image objects at a given spatial resolution for the whole scene (Bian et al. 1992). The objective was to construct a hierarchical net of image objects, in which fine objects were sub-objects of coarser structures. Due to the hierarchical structure, the image data were simultaneously represented at different spatial resolutions. The local object-oriented context information was then used together with other (spectral, form, texture) features of the image objects for classification. At the next stage, supervised per-field classification was performed using the nearest neighbor algorithm, utilizing field boundary data generated as a result of the segmentation procedure. Objects are segmented in the image, and all objects together form an object layer. Two or more object layers are called an object hierarchy (figure 12).




Fig. 12. Image – object hierarchy

Basically, there are three steps in object based classification: segmentation, classification and per-field integration. An image is divided into segments depending on pixel spectral similarities, the structure of the image and surface texture characteristics. This process is governed by variables such as the scaling factor, smoothness vs. compactness, and shape factors (figure 13).

Fig. 13. *(a)* non-segmented image, *(b)* segmented image using scale factor 50, *(c)* segmented image using scale factor 10.

Each segment contains a group of pixels, and the scaling factor defines the minimum number of pixels with similar spectral characteristics in a segment. Compactness and smoothness are important for creating pixel groups. The shape factor deals with the boundary of a segment. The scale factor varies according to the study scale, and the ideal scale can be found by trying different scale factors. When sensitive LUC analysis is necessary, the compactness factor should be high and smoothness should be low (e.g. vegetation classification at CORINE level 3 and beyond). The shape factor is very important if the LUC objects have a dominant shape characteristic (e.g. agricultural lands, roads and buildings).
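The segmentation step can be illustrated with a deliberately simple region-growing sketch: 4-connected pixels are merged into one object while they stay within a tolerance of the segment seed, a crude stand-in for the scale factor. The toy image and tolerance are invented, and real multiresolution segmentation is considerably more sophisticated.

```python
from collections import deque

def segment(image, tol=10):
    """Group 4-connected pixels whose values differ from the segment seed
    by at most `tol`; returns one integer label per pixel (region growing)."""
    rows, cols = len(image), len(image[0])
    labels = [[None] * cols for _ in range(rows)]
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] is not None:
                continue
            seed = image[r][c]
            queue = deque([(r, c)])
            labels[r][c] = next_label
            while queue:                      # breadth-first flood fill
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and labels[ny][nx] is None
                            and abs(image[ny][nx] - seed) <= tol):
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
            next_label += 1
    return labels

# a tiny 3x4 "image": a dark field next to a bright one
img = [[10, 12, 90, 92],
       [11, 13, 91, 93],
       [10, 11, 92, 94]]
print(segment(img))  # two objects: left block label 0, right block label 1
```

Raising `tol` merges more pixels into fewer, larger objects, which is analogous to raising the scale factor in figure 13.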


Each LUC class can be defined using different datasets and rules according to the characteristics of the LUC. In this chapter, object and pixel based classifications were evaluated in a Mediterranean agricultural land called the Lower Seyhan Plane (LSP) (figure 14).

Especially in agricultural land, object based classification is the most suitable technique. Most agricultural fields have a regular shape and contain one dominant crop at a time. In winter the dominant crop in the study area is wheat, while the summer period includes corn, soybean and cotton. Mapping the farmlands may be inappropriate using only one optical image, so a multitemporal object based classification approach was used to map LUC in the LSP. Two Landsat TM images from March and April were classified together, and June and August images and some physical variables, such as distance from the coast line and distance from built up areas, were added to create rules for each LUC. In this chapter only the winter crop pattern is discussed, using the LDA classifier, and rule dependent object based classification results were compared with each other to see the accuracy difference in each LUC (figure 15).
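The multitemporal rule idea can be sketched as crisp rules over per-date vegetation values plus an ancillary distance variable; the dates, thresholds and class labels below are invented for illustration, not the study's actual rule set.

```python
def classify_field(ndvi_march, ndvi_april, ndvi_august, dist_coast_km):
    """Toy rule base combining two winter dates, one summer date and an
    ancillary distance-from-coast variable (all thresholds hypothetical)."""
    # green in winter, bare in late summer → winter crop
    if ndvi_march > 0.5 and ndvi_april > 0.5 and ndvi_august < 0.3:
        return "winter crop (wheat)"
    # green in late summer → irrigated summer crop
    if ndvi_august > 0.5:
        return "summer crop (corn/soybean/cotton)"
    # unvegetated and very close to the shoreline → coastal class
    if dist_coast_km < 1.0:
        return "sand dunes / coastal"
    return "bareground"

print(classify_field(0.6, 0.7, 0.2, 12.0))  # → winter crop (wheat)
print(classify_field(0.2, 0.3, 0.7, 12.0))  # → summer crop (corn/soybean/cotton)
```

Each rule fires per segmented object rather than per pixel in the object based setting, which is what suppresses within-field spectral mixing.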

Fig. 14. Lower Seyhan Plane (in yellow)


discussed in USP using DT and RT classifiers.

**5. Ancillary data integration** 

**5.1 Physical data integration** 


**5.2 Surface texture data** 


Remotely sensed data alone may not be sufficient to map all LUC classes accurately. Ancillary data provide additional information on physical land dynamics, vegetation, climate, social geography and surface variability in LUC classification. When a suitable ancillary dataset is used, classification accuracy improves. In this chapter, only elevation (physical), texture (surface variability) and vegetation data (vegetation indices) were discussed, using DT and RT classifiers.

Land physical dynamics such as elevation are vital physical inputs to LUC mapping. Digital elevation models (DEM) can be derived from stereo image pairs (e.g. ASTER) or radar (e.g. SRTM). Vegetation formation and species, in particular, vary according to elevation, aspect and climate, so using these ancillary data may improve the accuracy of LUC maps (Coops et al. 2006, Şatır 2006). It is also possible to integrate soil characteristics into LUC mapping, because vegetation distribution and plant species are strongly dependent on soil depth, texture and moisture.

A DEM was integrated into the DT and MLC classifications of the Eastern Mediterranean area discussed in section 4. The overall accuracy of the classification increased by approximately 4%, and bulrush, sand dunes and forestlands in particular were classified more accurately using DT. If topography varies across a study area, integrating a DEM may improve LUC mapping accuracy. However, the overall accuracy of the MLC classification was stable with and without DEM information. Most ancillary data increase accuracy when used with non-parametric techniques, because parametric techniques like MLC use a statistical equation to calculate the distance of each LUC signature mean to the unknown pixel, whereas DT creates rules from the ranges of the training data, including both elevation and spectral wavebands.
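The kind of rules a DT learns from training-data ranges, with elevation separating spectrally similar classes, can be sketched as plain threshold rules. All thresholds and band values below are invented for illustration; a real DT would learn them from training data rather than have them hand-coded.

```python
def classify(nir, red, elev):
    """Hand-built decision rules of the kind a DT induces from
    training ranges. nir/red are surface reflectances (0-1), elev is
    elevation in metres. Thresholds are illustrative, not from the
    study."""
    ndvi = (nir - red) / (nir + red + 1e-9)  # vegetation index
    if ndvi < 0.1:
        # Bare surfaces: elevation separates coastal dunes from
        # inland bare soil, which spectra alone may confuse.
        return "sand dune" if elev < 5 else "bare soil"
    if ndvi > 0.5:
        # Dense vegetation: elevation separates upland forest from
        # low-lying wetland bulrush.
        return "forest" if elev > 50 else "bulrush"
    return "grassland"
```

The point of the sketch is the structure: an ancillary variable enters the rule tree exactly like a spectral band, which is why non-parametric classifiers absorb such data so easily.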

Some variables can be produced from the image wavebands themselves, such as surface texture and vegetation metrics. Surface textures are widely used in LUC mapping. Many texture measures have been developed (Haralick et al. 1973, Kashyap et al. 1982, He and Wang 1990, Unser 1995, Emerson et al. 1999) and have been used for image classification (Franklin and Peddle 1989, Narasimha Rao et al. 2002, Berberoglu et al. 2000). Franklin and Peddle (1990) found that textures based on a grey-level co-occurrence matrix (GLCM) together with the spectral features of a SPOT HRV image improved the overall classification accuracy. Gong et al. (1992) compared GLCM, simple statistical transformations (SST), and texture spectrum (TS) approaches with SPOT HRV data, and found that some textures derived from GLCM and SST improved urban classification accuracy. Shaban and Dikshit (2001) investigated GLCM, grey-level difference histogram (GLDH), and sum and difference histogram (SADH) textures from SPOT spectral data in an Indian urban environment, and found that a combination of texture and spectral features improved the classification accuracy. Accuracy based solely on spectral features increased by about 9% to 17% with the addition of one or two texture measures. Furthermore, contrast, entropy, variance, and inverse difference moment were among the most effective texture measures.
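A minimal GLCM with the contrast and entropy measures named above can be sketched in a few lines. This follows the co-occurrence definition of Haralick et al. (1973) for a single pixel offset; the quantisation level and offset are assumed parameters, and real texture processing would average several offsets over a moving window.

```python
import numpy as np

def glcm(img, levels, dx=1, dy=0):
    """Grey-level co-occurrence matrix for one offset (dx, dy).

    img must already be quantised to integers in [0, levels).
    Returns the normalised co-occurrence probabilities P."""
    P = np.zeros((levels, levels))
    h, w = img.shape
    for y in range(h - dy):
        for x in range(w - dx):
            P[img[y, x], img[y + dy, x + dx]] += 1
    return P / P.sum()

def contrast(P):
    """Haralick contrast: sum of (i-j)^2 weighted by P."""
    i, j = np.indices(P.shape)
    return float(((i - j) ** 2 * P).sum())

def entropy(P):
    """Haralick entropy in bits; zero for a perfectly uniform patch."""
    nz = P[P > 0]
    return float(-(nz * np.log2(nz)).sum())
```

A flat image patch has zero contrast and zero entropy, while a checkerboard, whose horizontal neighbours always differ by one grey level, has contrast 1.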

Fig. 15. LDA pixel based and object based classification results for the LSP

The pixel based LDA classifier failed for onion, sour orange, settlement, bulrush and sand dunes using the March and April images. However, when the June and August images, distance from built-up areas and distance from the coast line were integrated in the rule dependent object based classification, the overall kappa coefficient increased by 28%. The accuracies for sour orange, bulrush, sand dunes and settlement rose impressively. One advantage of rule dependent classifiers is that new classes can be added during classification; in this study, saline vegetation and natural grasslands were included to improve classification accuracy (table 12).
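The kappa coefficient reported in table 12 is computed from a confusion matrix as the observed agreement corrected for chance agreement. A minimal sketch (the 2-class example matrix below is illustrative, not the study's data):

```python
import numpy as np

def kappa(cm):
    """Cohen's kappa from a confusion matrix.

    cm: square matrix, rows = reference classes, columns = classified
    classes. kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed
    agreement (diagonal fraction) and p_e the agreement expected by
    chance from the row and column marginals."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    p_o = np.trace(cm) / n                          # observed agreement
    p_e = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n ** 2  # chance agreement
    return (p_o - p_e) / (1.0 - p_e)
```

A perfect classification gives kappa = 1, while a matrix with 80% agreement between two balanced classes gives kappa = 0.6, since half of the raw agreement would be expected by chance.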


Table 12. Kappa accuracy of each LUC and difference between object and pixel based classifications
