#### *3.1.2 Wrapper and embedded FS criteria*

Several classifiers were used to define wrapper scores related to the classification performance achieved using feature subsets. Only fast classifiers that did not require hyper-parameter optimization were used:


These different classifiers were chosen because their underlying principles differ from each other: SAM, SID and ML rely on class models, while the others use inter-class separation models. RF can model even complex class frontiers while remaining quite fast, whereas linear SVM selects features achieving the best possible linear separation between classes.

Wrapper FS scores measuring classification performance were considered:


$$\mathcal{R}(\mathcal{X}) = \sum\_{i=1}^{n} \delta\left(y\_i, c(\mathbf{x}\_i)\right) \cdot m\left(\mathbf{x}\_i, c(\mathbf{x}\_i)\right) \tag{7}$$

*Spectral Optimization of Airborne Multispectral Camera for Land Cover Classification…*

*DOI: http://dx.doi.org/10.5772/intechopen.88507*

with *δ*(*i*, *j*) = −1 if *i* ≠ *j* and 1 otherwise, and *c*(**x**) the label given to **x** by the classifier. Such a score measures both the ability to correctly classify the test samples for a given feature set and the separability between classes: the score increases with the number of well-classified samples and with the confidence of the classifier for these samples, while it decreases the more confident the classifier is about mislabelled samples. This confidence term was used in our experiments only for the RF and linear SVM classifiers.
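As a minimal sketch (the function and argument names are ours), the score of Eq. (7) can be computed from the predicted labels and the confidence values *m*(**x**<sub>*i*</sub>, *c*(**x**<sub>*i*</sub>)), assumed here to be provided by the classifier (e.g. class membership probabilities for RF):

```python
def wrapper_score(y_true, y_pred, confidence):
    """Wrapper score of Eq. (7).

    y_true, y_pred : true and predicted labels of the test samples
    confidence     : confidence m(x_i, c(x_i)) of the classifier in its
                     own prediction for each sample (e.g. a probability)
    """
    # delta(y_i, c(x_i)) = +1 for a well-classified sample, -1 otherwise
    return sum((1.0 if t == p else -1.0) * m
               for t, p, m in zip(y_true, y_pred, confidence))
```

For classifiers without a confidence measure, *m* can be set to 1 for every sample, reducing the score to the number of correct minus incorrect classifications.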

Embedded FS criteria. The two following criteria, measuring the generalization performance of two classifiers, were also tested. They are not purely embedded but can be considered as intermediate between wrapper and embedded criteria. To differentiate them from the previous wrapper scores, they are referred to here as 'embedded', in the sense that they assess classification performance using a measure calculated directly while training the classifier, and not after an evaluation of the model on a test data set. These scores are:


#### **3.2 Assessment approach**

It must be kept in mind that this study is a comparison of FS criteria and not of optimization methods. Thus, all criteria were optimized using the same optimization heuristics on the same classic hyperspectral data sets (3.3). The proposed workflow


(**Figure 1**) includes two steps. First, the suitable number of bands to select is estimated for each data set, thanks to an incremental FS optimization algorithm called sequential forward floating search (SFFS) [44]. Then, the core comparison of FS criteria is performed: each criterion is optimized to select this fixed number of bands using a stochastic FS optimization algorithm, here a genetic algorithm (GA) (3.2.2), which proved to be efficient and generic enough to be used with all tested criteria and which provides valuable intermediate results (3.2.4) to assess FS stability.

The GA was launched several times to select this fixed number of bands for every tested FS criterion, thus providing several possible band subset solutions; performing FS several times is also a way to benefit from the stochastic nature of GA and thus to explore more band subset configurations. These different solutions were then quantitatively evaluated with different classifiers, so that conclusions about their relevance could be drawn quite independently of any given classifier (3.2.3). Besides, to analyse the obtained solutions qualitatively (and especially their stability), band importance measures were derived from the intermediate results provided by this stochastic FS (3.2.2). They make it possible to visually identify the parts of the spectrum considered important by an FS criterion and to assess qualitatively the stability of the proposed band subset solutions for this criterion.

In practice, for each FS criterion, the GA feature selection process was launched five times on five limited data sets (100 training and 500 testing samples, 300 testing for Indian Pines) randomly selected with replacement from the whole data set. In total, 25 'optimal' feature subset solutions were thus obtained for each criterion and had to be evaluated (**Figure 2**).

**Figure 1.** *Assessment process.*

#### *3.2.1 Optimal band subset size using a sequential FS algorithm*

Intermediate results of a sequential FS algorithm were used to identify how many bands must be selected. In our experiments, the sequential forward floating search (SFFS) algorithm was used [44].

This optimization method provides useful intermediate results. Indeed, it selects the 'best' sets of bands for different band subset sizes, starting from 1. Thus, it provides for each size both the selected band subset (which can then be evaluated according to the performance of several classifiers) and the value reached by the FS score. Therefore, it makes it possible to observe the evolution of the FS score and of the classification quality with the number of selected bands, and then to decide how many bands are necessary to obtain suitable results. Other sequential methods such as SVM-RFE [13] or SFS could also provide such information but, contrary to them, SFFS has the advantage of questioning at each step the set of bands selected at the previous step, which enables possible modifications of the already selected band subset.

#### *3.2.2 Band subset solutions using a genetic algorithm*

Genetic algorithms (GA) are a family of stochastic optimization heuristics simulating evolution mechanisms on a population of individuals. A score measuring its adaptation and its aptitude to stay alive is associated with each individual. In the FS context, each individual is a feature subset and the score is the FS score.

#### **Algorithm 1** Genetic algorithm.

It is intended to select fewer than *p* bands among a band set B. *J* is the FS score to optimize.

**Initialization**: (*t* ← 0) Randomly generate a population *G*(0) of *N* individuals, i.e. *N* sets of *p* bands.

**while** *t* < *t*<sub>max</sub> **do** *//generation loop*

  Calculate the score of each band subset of the current population.

  Keep only the *n* (*n* < *N*) best band subsets of the current population. Let *R*(*t*) be this remaining population.

  Generate a new population *G*(*t*) of *N* individuals from *R*(*t*):

  **for all** new individuals **do**

    Randomly select 2 parents among *R*(*t*).

    Obtain a new individual by randomly crossing these 2 parents.

    Random mutations occur (randomly replacing a selected band by another one) in order to avoid staying in a local optimum.

  **end for**

  *t* ← *t* + 1

**end while**

**Figure 2.**
*Evaluation of FS criteria using band subsets obtained using a GA optimization.*
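As a concrete sketch, the generation loop of Algorithm 1 could look as follows in Python. This is an illustrative implementation and not the original code: the crossover and mutation operators are simple plausible choices, and the function name, data layout and default parameter values (`N`, `n`, `t_max`, `mut_rate`) are our own assumptions.

```python
import random

def ga_band_selection(score, num_bands, p, N=30, n=10, t_max=50,
                      mut_rate=0.1, seed=0):
    """Sketch of Algorithm 1: GA selecting p bands among num_bands.

    score : function mapping a band subset (frozenset of indices) to the
            FS score J to maximize.
    Returns the best subset found and the list of elite populations R(t)
    kept at each generation (usable later for band importance measures).
    """
    rng = random.Random(seed)
    bands = range(num_bands)
    # Initialization: N random individuals, each a set of p bands
    population = [frozenset(rng.sample(bands, p)) for _ in range(N)]
    elites = []  # R(t) for each generation
    for t in range(t_max):
        # keep only the n best band subsets of the current population
        ranked = sorted(population, key=score, reverse=True)
        R = ranked[:n]
        elites.append(R)
        # generate a new population of N individuals from R(t)
        population = []
        for _ in range(N):
            pa, pb = rng.sample(R, 2)  # two random parents
            pool = list(pa | pb)       # crossover: draw bands from both
            child = set(rng.sample(pool, min(p, len(pool))))
            # random mutation: replace a selected band by another one
            if rng.random() < mut_rate:
                child.discard(rng.choice(list(child)))
                child.add(rng.choice([b for b in bands if b not in child]))
            population.append(frozenset(child))
    best = max((ind for R in elites for ind in R), key=score)
    return best, elites
```

The returned elite populations *R*(*t*) are precisely the kind of intermediate results that the importance measures of Section 3.2.2.1 exploit.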


#### *3.2.2.1 GA-derived importance measures*

The GA approach has some advantages for our problem. Usually, only the best solution is kept, while the GA has visited many other candidates, many of which have scores quite similar to that of the best solution: they are almost as good as the final solution. Therefore, these intermediate results can be used to determine which bands are often selected in the solutions of these intermediate good band subset populations (see **Figure 3**) [27]. Thus, an individual band importance score *I*(*b*) (defined in Eq. (8)) is calculated for each band *b*, measuring how often it has been selected by the GA among the different *n* best sets of bands obtained over all generations

$$I(b) = \sum\_{t} \sum\_{R \in L(t)} \delta(b, R) \text{ where } \delta(b, R) = 1 \text{ if } b \in R \text{, 0 otherwise.} \tag{8}$$

To increase robustness, the GA can be launched several times (so that different initializations and mutations occur) and over several training/testing sets randomly extracted from the whole data set. The proposed importance score is calculated for each of these results. Finally, the mean of these scores is taken for each band, giving the importance associated with this band.
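Given the elite populations *L*(*t*) kept at each GA generation (represented here, as an assumed data layout, as one list of band subsets per generation, collected once per GA run), Eq. (8) and the averaging over runs can be sketched as:

```python
from collections import Counter

def band_importance(elites, num_bands):
    """Importance I(b) of Eq. (8): how often band b occurs in the n best
    subsets L(t) kept at every GA generation t of one run."""
    counts = Counter(b for L_t in elites for subset in L_t for b in subset)
    return [counts[b] for b in range(num_bands)]

def mean_importance(runs, num_bands):
    """Average the per-run importance over several GA runs (different
    initializations, mutations and training/testing sets)."""
    totals = [0.0] * num_bands
    for elites in runs:
        for b, imp in enumerate(band_importance(elites, num_bands)):
            totals[b] += imp
    return [t / len(runs) for t in totals]
```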

#### *3.2.3 Quantitative evaluation*

In the state of the art, FS is often considered as a first step of a specific classification workflow. In this context, wrappers are considered to achieve the best classification performance for a problem, while sometimes lacking generality and being too classifier dependent. However, in our superspectral sensor design context, the selected band subsets must be as efficient as possible for most classifiers, and not only for those used in the FS criteria. Therefore, the selected band subsets were evaluated here considering the classification quality reached with several classifiers.

The Kappa coefficient was used as the classification quality measure for the following classifiers: ML, RF and 1-vs-1 SVM with a radial basis function (RBF) kernel (with optimized parameters). It can be noted here that the latter was the only one not involved in a tested FS criterion; thus, RBF SVM is the only classifier that is completely independent from all tested FS criteria. In detail,

evaluation was performed and averaged over five training/testing sample sets: for each of them, classifiers were trained using 50 samples per class (in order to be in a difficult case with few training samples), and results were evaluated on all remaining ground truth samples. For each FS criterion, all selected band subsets (obtained over the several launches of the algorithm) were evaluated, and the mean Kappa coefficient was then computed over all of them (see **Figure 2**).

spectral resolution ranging from 460 nm to 860 nm. Noisy bands have been discarded, and only 102 of the original 115 spectral bands have been kept. It covers an urban area (city centre). Its associated land cover ground truth consists of nine urban classes (materials and vegetation).

• **Indian Pines scene**<sup>2</sup>: This hyperspectral scene was collected by the AVIRIS sensor over the Indian Pines test site in North-western Indiana. It is a radiance VNIR-SWIR hyperspectral image consisting of 220 spectral bands ranging from 400 to 2500 nm. Its associated ground truth consists of agricultural classes and other classes concerning perennial vegetation (forest, grass). In our experiments, only nine of the original classes were kept; the discarded classes concerned fewer than 400 samples, which was considered too few for our experiments.

• **Salinas scene**<sup>3</sup>: This hyperspectral scene was collected by the AVIRIS sensor over the Salinas Valley in California at a 3.7 m spatial resolution. It is an at-sensor radiance VNIR-SWIR hyperspectral image consisting of 224 spectral bands ranging from 400 to 2500 nm. Its associated ground truth consists of agricultural classes, that is to say different kinds of culture at different growing steps.

<sup>2</sup> Indian Pines data set is provided by Purdue University and available at https://engineering.purdue.edu/biehl/MultiSpec/hyperspectral.html.

<sup>3</sup> Salinas data set was downloaded from http://www.ehu.eus/ccwintco/index.php?title=Hyperspectral\_Remote\_Sensing\_Scenes.

#### **3.4 Results and discussion**

#### *3.4.1 Optimal number of bands using SFFS*

An optimal number of bands to select was identified using the SFFS incremental FS method, starting from one selected band and incrementing the band subset up to a maximal number of bands. This maximum was fixed to 20, considering the superspectral sensor design application, for which the number of possible spectral bands is limited. In practice, the influence of the number of selected bands on the FS score and on the classification performance (measured by Kappa and the F-score of the worst classified class) for an RBF SVM classifier using the best selected band subset was considered. The optimal number of bands was chosen as the one from which these scores virtually no longer increase. Results obtained using several FS scores were also considered to make this decision, so that in the end the number of bands to select is a trade-off between several FS criteria. For the Pavia data set, this influence can be seen in **Figure 4**. The different quality indices no longer evolve much from five bands onwards, except the minimal F-score, which increases slightly up to seven bands. Similar results were obtained using several FS criteria, even though some differences exist; for instance, the quality indices increased more slowly for *jm* than for *rf.conf* in **Figure 4**. Thus, seven bands were selected for the Pavia data set for further experiments.

**Figure 3.**
*Each line is a band subset selected in the intermediate results of GA, and each black dot represents a selected band. The blue histogram represents the importance associated with each band.*

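For illustration, the SFFS search used here can be sketched as follows. This is a generic, textbook-style SFFS (forward steps with conditional 'floating' backward steps), not the authors' implementation; the set-valued score function and all names are assumptions.

```python
def sffs(score, num_bands, k_max):
    """Sketch of sequential forward floating search (SFFS).

    score : function mapping a set of band indices to the FS score.
    Returns best[k], the best subset found for each size k = 1..k_max,
    so that the evolution of the score with the number of bands can be
    observed to choose a suitable subset size.
    """
    best = {}          # best subset recorded for each size
    current = set()
    while len(current) < k_max:
        # forward step: add the band that most improves the score
        b_add = max((b for b in range(num_bands) if b not in current),
                    key=lambda b: score(current | {b}))
        current = current | {b_add}
        k = len(current)
        if k not in best or score(current) > score(best[k]):
            best[k] = set(current)
        # floating backward steps: drop a band while doing so beats the
        # best subset previously recorded for the smaller size
        while len(current) > 2:
            b_rm = max(current, key=lambda b: score(current - {b}))
            reduced = current - {b_rm}
            if score(reduced) > score(best[len(reduced)]):
                current = reduced
                best[len(current)] = set(current)
            else:
                break
    return best
```

Plotting `score(best[k])` against *k* gives exactly the kind of curve used above to decide from which size the quality indices no longer increase.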