#### *3.2.4 Selected band stability*

Another criterion for assessing the quality of the FS criteria was the stability of the selected features. As explained in Section 3.2.2, band importance profiles (**Figure 3**) can be derived from the intermediate results of a GA feature selection. As contiguous bands in hyperspectral data are correlated, such a band importance profile should be fairly regular and smooth (i.e. not too noisy). The smoothness/regularity of these profiles is thus related to the stability of the solutions obtained using an FS criterion. Furthermore, the final optimal solutions provided by the different GA runs can also be examined. This analysis remains purely qualitative.
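The study assesses smoothness visually. Purely as an illustration of what "regular" versus "noisy" means for such a profile, the sketch below computes a simple roughness index: the mean absolute difference between adjacent bands of a max-normalised profile. The function name `profile_roughness`, the index itself and the synthetic profiles are assumptions made for this example; they are not part of the method described in this chapter.

```python
import numpy as np


def profile_roughness(importance):
    """Mean absolute difference between adjacent bands of a max-normalised profile.

    Hypothetical index for illustration only: lower values mean a smoother profile.
    """
    p = np.asarray(importance, dtype=float)
    p = p / p.max()
    return float(np.mean(np.abs(np.diff(p))))


# Synthetic profiles over 102 bands (made-up values, not results from the study).
bands = np.arange(102)
smooth = np.exp(-0.5 * ((bands - 50) / 10.0) ** 2)  # one broad importance peak
noisy = smooth + 0.2 * (bands % 2)                  # same peak plus band-to-band jitter

print(profile_roughness(smooth), profile_roughness(noisy))
```

With these synthetic inputs, the jittered profile yields a clearly higher index than the smooth one, which matches the visual impression that the criterion is meant to capture.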

#### **3.3 Data sets**

Three state-of-the-art available hyperspectral data sets were used for the experiments:

• **Pavia City Centre scene**<sup>1</sup>: This first data set is a hyperspectral scene acquired by the ROSIS sensor over the city centre of Pavia with a 1.3 m spatial resolution. It is a reflectance VNIR hyperspectral image with a spectral range of 460 nm to 860 nm. Noisy bands have been discarded, and only 102 of the original 115 spectral bands have been kept. It covers an urban area (city centre). Its associated land cover ground truth consists of nine urban classes (materials and vegetation).

<sup>1</sup> The Pavia data set is provided by Pavia University and is available at http://www.ehu.eus/ccwintco/index.php?title=Hyperspectral\_Remote\_Sensing\_Scenes.

*Spectral Optimization of Airborne Multispectral Camera for Land Cover Classification… DOI: http://dx.doi.org/10.5772/intechopen.88507*


#### **3.4 Results and discussion**

#### *3.4.1 Optimal number of bands using SFFS*

An optimal number of bands to select was identified using the incremental SFFS FS method, starting from one selected band and growing the band subset up to a maximum number of bands. This maximum was set to 20, given the superspectral sensor design application, for which the number of possible spectral bands is limited. In practice, the influence of the number of selected bands was examined both on the FS score and on the classification performance (measured by Kappa and by the F-score of the worst classified class) of an RBF SVM classifier using the best selected band subset. The optimal number of bands was chosen as the one beyond which these scores virtually no longer increase. Results obtained using several FS scores were considered when making this decision, so the number of bands finally selected is a trade-off between several FS criteria.
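The "no longer increase" rule can be sketched as a plateau search on each score curve. This is a minimal illustration, not the procedure actually used in the study (which relied on inspecting the curves): the function name `smallest_plateau_size`, the tolerance `tol`, the rule of keeping the largest answer across curves and the score values are all assumptions made for the example.

```python
import numpy as np


def smallest_plateau_size(scores, tol=0.01):
    """Smallest number of bands after which no larger subset gains more than `tol`.

    scores[i] is the score obtained with i + 1 selected bands.
    """
    scores = np.asarray(scores, dtype=float)
    for i, s in enumerate(scores):
        # Best score reachable with at least i + 1 bands, compared with the current one.
        if scores[i:].max() - s <= tol:
            return i + 1
    return len(scores)  # unreachable for non-empty input: the last index always passes


# Made-up curves for 1 to 8 selected bands (illustrative values only).
kappa = [0.70, 0.82, 0.90, 0.94, 0.960, 0.962, 0.965, 0.966]
min_fscore = [0.40, 0.55, 0.68, 0.75, 0.80, 0.83, 0.850, 0.852]

# One possible trade-off between criteria: keep the largest per-curve answer.
n_bands = max(smallest_plateau_size(kappa), smallest_plateau_size(min_fscore))
print(n_bands)
```

The same search can be applied to the curves obtained with each FS criterion, with the final number of bands chosen as a compromise between the individual answers.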

For the Pavia data set, the influence of the number of selected bands on the FS score and on the classification performance (measured by Kappa and by the F-score of the worst classified class) of an RBF SVM classifier using the best selected band subset can be seen in **Figure 4**. The different quality indices change little beyond five bands, except for the minimal F-score, which increases slightly up to seven bands. Similar results were obtained using several FS criteria, even though some differences exist. For instance, the quality indices increase more slowly for *jm* than for *rf.conf* in **Figure 4**. Thus, seven bands were selected for the Pavia data set in further experiments.

<sup>2</sup> The Indian Pines data set is provided by Purdue University and is available at https://engineering.purdue.edu/biehl/MultiSpec/hyperspectral.html.

<sup>3</sup> The Salinas data set was downloaded from http://www.ehu.eus/ccwintco/index.php?title=Hyperspectral\_Remote\_Sensing\_Scenes.

#### **Figure 4.**

*Pavia test site: influence of the number of selected bands on the feature selection score (left) and on classification performance (using the best band subset with a RBF SVM classifier) (right with kappa coefficient for the blue line and F-score of the worst classified class for the red line). Two FS criteria tested: rf.conf (top) and jm (bottom).*


Similar results were obtained for Salinas, and seven bands were also selected for this data set in further experiments.

For Indian Pines, the results are slightly different, as shown in **Figure 5**. The FS score increases rapidly until seven bands are selected. Then, it remains almost constant for *rf.conf* but continues to increase very slightly for *jm*. The same phenomenon can be observed for the classification accuracies reached by an RBF SVM classifier using the selected band subsets. For the *rf.conf* FS criterion, a maximum is reached around 10–11 selected bands, while for *jm*, a plateau is reached at these values, followed by a new slight increase.

However, it must be kept in mind that this data set is more difficult than the other ones. On the one hand, it offers fewer training/testing samples (and thus an increased risk of over-fitting). On the other hand, classes are more difficult to distinguish from each other, and raw classification results (that is to say, without any regularization post-processing step) remain noisy. Thus, 10 bands were selected in further experiments for the Indian Pines data set.

#### *3.4.2 Comparison of FS criteria*

The GA optimization heuristic was then run to select 7 bands for Pavia, 10 bands for Indian Pines and 7 bands for Salinas. For each FS score, several feature subset solutions were proposed by the GA. Their classification quality (Kappa), averaged over all of these solutions and computed for several classifiers, is presented in **Figure 6**. At first glance, the Kappa coefficients reached using features selected according to the different FS scores are, most of the time, correlated across the classifiers (RBF SVM, RF and ML) used for evaluation. Indeed, if an FS score leads to the best classification for one classifier, it will generally also be the best for the other classifiers. Thus, the relevance of an FS score appeared to be fairly independent of the classifier used at the validation step.
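The averaging step can be illustrated as follows: Cohen's Kappa is computed from the confusion matrix obtained with each GA solution, then averaged. This is a minimal sketch with hypothetical two-class confusion matrices; the helper name `cohen_kappa` and all values are assumptions made for the example, not data from the study.

```python
import numpy as np


def cohen_kappa(confusion):
    """Cohen's Kappa from a square confusion matrix (rows: reference, columns: prediction)."""
    c = np.asarray(confusion, dtype=float)
    n = c.sum()
    p_observed = np.trace(c) / n                             # overall accuracy
    p_chance = (c.sum(axis=0) * c.sum(axis=1)).sum() / n**2  # agreement expected by chance
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical confusion matrices, one per band subset proposed by the GA.
confusions = [
    [[45, 5], [5, 45]],
    [[40, 10], [10, 40]],
]
kappas = [cohen_kappa(c) for c in confusions]
mean_kappa = float(np.mean(kappas))
print(kappas, mean_kappa)
```

Repeating this for each FS score and each evaluation classifier gives one mean value per bar of the kind of comparison plotted in **Figure 6**.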


#### **Figure 5.**

*Indian Pines test site: Influence of the number of selected bands on the feature selection score (left) and on classification performance (using the best band subset with a RBF SVM classifier) (right with kappa coefficient for the blue line and F-score of the worst classified class for the red line). Two FS criteria tested: rf.conf (top) and jm (bottom).*

It can also be noticed from **Figure 6** that the best FS scores lead to nearly equivalent classification quality. This is clearly visible for Pavia and, to a lesser extent, for Salinas. In contrast, results vary more for Indian Pines. This might be because Indian Pines is a more difficult data set, with stronger intra-class variability and inter-class similarity, whereas Pavia is a fairly simple data set with few, well-separated classes. These results will now be discussed for each category of FS criteria. The band importance provided by the GA will also be considered.

#### *3.4.2.1 Comparison of wrapper criteria*

It can be seen from **Figure 6** that the FS scores *sam.K* and *sid.K* perform worse than the other wrapper scores. This phenomenon appears strongly for Indian Pines and Salinas and is also a slight trend for Pavia. The fact that it is more striking for the Indian Pines scene can be related to the high intra-class variability of this data set.

The other wrapper scores, which rely on the Kappa coefficient as a measure of classification performance, lead to nearly equivalent quantitative results. However, band importance profiles (**Figures 7** and **8**) provide additional information. For instance, for the Pavia data set (**Figure 7**), the FS score *svm.lin.K* tends to select the first bands of the spectrum (around band 5), even though these bands are quite noisy. The *ml.K* score performs very well in terms of classification performance but tends to be very sensitive to a probable atmospheric artefact, assigning high importance to bands 80 to 85 and especially to band 82. This part of the spectrum corresponds to an atmospheric correction artefact and not to a truly discriminant phenomenon. This tendency to select bands corresponding to this artefact is also observed for other FS scores.

Using classification confidence-based FS scores instead of classic classification accuracy scores tends to improve results. This trend can be observed in **Figure 6** for both RF and SVM: using *rf.conf* instead of *rf.K*, or *svm.lin.conf* instead of *svm.lin.K*, tends to slightly improve classification quality. Considering the band importance profiles obtained for Pavia (**Figure 7**), using *rf.conf* instead of *rf.K* avoids selecting the noisy bands around band 5. Band importance profiles obtained using *rf.conf* also seem to be slightly more regular than those obtained using *rf.K*, both for Pavia (**Figure 7**) and for Indian Pines (**Figure 8**). Thus, using a confidence-based FS score tends to regularize feature importances and thereby to stabilize feature selection.

#### **Figure 6.**

*Mean kappa coefficients obtained by classifiers RBF kernel SVM (red), RF (blue) and ML (yellow) using band subsets selected using the different FS criteria for the three data sets. From (a-c): Pavia, Indian Pines and Salinas.*

#### **Figure 7.**

*Pavia test site: band importance profiles obtained using several FS criteria: (a) ml.K, (b) svm.lin.K, (c) rf.K and (d) rf.conf.*

#### *3.4.2.2 Comparison of wrapper and embedded criteria*

The classification quality reached using the two tested embedded criteria (*svm.lin.marg* and *rf.oob*) appeared to be generally lower than that reached using the wrapper scores associated with these two classifiers. This is especially clear for *svm.lin.marg*, which is the worst FS score for all classifiers used at the evaluation step.

Even though *rf.oob* performs quite well, the feature subsets it selects generally lead to worse classification performance than those selected using the best wrapper scores, especially *rf.K* and *rf.conf*, which are also associated with random forests.

#### *3.4.2.3 Comparison of wrapper and filter criteria*

Considering classification quality (**Figure 6**), mutual information (*mi*) leads to different results for the various data sets: on the Pavia data set, feature subsets selected according to this FS score achieve classification performance as good as that of the best wrapper scores, while on the Indian Pines data set, the results are among the worst. Band importance profiles (**Figures 9** and **10**) obtained using *mi* are also very different from those obtained for the other FS scores: they tend to neglect wide parts of the spectrum. This is especially striking for the Indian Pines data set, where bands 30 to 100 are not considered important, contrary to other FS scores.

The other tested filter FS scores are separability measures. They perform very well in terms of classification quality (**Figure 6**): they lead to classification results as good as or better than those obtained using the best wrapper FS scores. In particular, the Jeffries-Matusita separability distance (*jm*) appears to be one of the best FS scores.
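For reference, a common form of the Jeffries-Matusita distance between two classes can be sketched as below. The sketch assumes Gaussian class distributions and the convention in which the distance saturates at 2, computed from the Bhattacharyya distance. The exact formulation used for the *jm* score, and the way pairwise distances are aggregated over all classes, are not given in this section, so the function names and the example statistics are illustrative assumptions.

```python
import numpy as np


def bhattacharyya_distance(mean1, cov1, mean2, cov2):
    """Bhattacharyya distance between two Gaussian classes."""
    mean1, mean2 = np.asarray(mean1, dtype=float), np.asarray(mean2, dtype=float)
    cov1, cov2 = np.asarray(cov1, dtype=float), np.asarray(cov2, dtype=float)
    cov = 0.5 * (cov1 + cov2)
    diff = mean1 - mean2
    # Mean-separation term: (1/8) * diff^T * cov^-1 * diff
    term_mean = 0.125 * diff @ np.linalg.solve(cov, diff)
    # Covariance term: (1/2) * ln(det(cov) / sqrt(det(cov1) * det(cov2)))
    _, logdet = np.linalg.slogdet(cov)
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    term_cov = 0.5 * (logdet - 0.5 * (logdet1 + logdet2))
    return term_mean + term_cov


def jm_distance(mean1, cov1, mean2, cov2):
    """Jeffries-Matusita distance, from 0 (identical classes) up to 2 (fully separable)."""
    return 2.0 * (1.0 - np.exp(-bhattacharyya_distance(mean1, cov1, mean2, cov2)))


# Made-up class statistics on a two-band subset (illustrative values only).
mean_a, cov_a = [0.20, 0.35], [[0.010, 0.002], [0.002, 0.020]]
mean_b, cov_b = [0.45, 0.30], [[0.015, 0.001], [0.001, 0.010]]
jm_ab = jm_distance(mean_a, cov_a, mean_b, cov_b)
print(jm_ab)
```

Because such a measure only needs class means and covariances on the candidate band subset, no classifier has to be trained to score a subset.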

However, the band importance profile obtained for Pavia using *jm* (**Figure 9**) focuses strongly on a part of the spectrum (bands 80 to 85) affected by artefacts caused by atmospheric corrections. This phenomenon also occurred for *bdist* and *fisher* and, as explained above, was also observed for some wrapper FS scores.

Furthermore, band importance profiles obtained using the *jm* FS score seem slightly noisier or more difficult to interpret than those obtained using the best wrapper FS scores (*rf.K*, *rf.conf*).

#### **Figure 8.**

*Indian Pines test site: band importance profiles obtained using several FS criteria: (a) ml.K, (b) svm.lin.K, (c) rf.K and (d) rf.conf.*

#### **Figure 9.**

*Pavia test site. Band importance profiles obtained using several FS criteria: (a) JM distance and (b) mutual information.*

#### **Figure 10.**

*Indian Pines test site. Band importance profiles obtained using several FS criteria: (a) JM distance and (b) mutual information.*

◦ Confidence-based wrapper scores taking into account classification confidence (*rf.conf* or *svm.lin.conf*) perform better than classic wrapper scores expressed as a simple classification "hard label" error rate. This trend could be observed both in quantitative (classification performance) and qualitative (band importance profiles) analyses. Indeed, taking into account classification confidence tends to regularize feature importances and provide more stable feature subsets.

In the end, the most interesting FS scores are *rf.conf* for wrappers and *jm* for filters, since they lead to the best quantitative results. *rf.conf* seems to provide more stable results than *jm*, considering its more regular band importance profile. It is also more robust to some artefacts (e.g. the atmospheric correction artefact for Pavia). However, even though computing times were not discussed in this study, it should be added that feature selection using filter separability measures (such as *jm*) is faster than using wrapper scores such as *rf.conf*.

Thematic comments. Conclusions about interesting parts of the spectrum can be drawn using the importance profiles provided by the different FS criteria:

• Optimized spectral configurations differ from one FS criterion to another. Some parts of the spectrum are identified as important by most FS criteria, whereas for other parts the criteria clearly disagree.

• Spectrum parts considered important can often be understood by considering the class spectra. They can correspond to almost constant spectrum parts located before or after a strong variation in the spectra of some classes. They can also correspond to intersections between the spectra of several classes.

• For the Indian Pines and Salinas scenes, no precaution was taken to handle noisy bands corresponding to the main atmospheric absorption windows. However, the importance measures associated with these bands were very weak for most FS criteria (except the worst of them). This observation can be considered an additional quality criterion for the tested FS scores.

• Band importance profiles obtained for Indian Pines are often more difficult to analyse than those obtained for Pavia. Nevertheless, some common trends could be observed, especially in the SWIR domain, where some blobs along the spectrum are visible for most FS criteria and might correspond approximately to the locations of some spectral bands of the WorldView-3 satellite.

**4. Exploring bandwidth and extracting optimal spectral bands using hierarchical band merging**

The work in the previous section was dedicated to identifying an FS score. This score was used for band selection, that is to say, to select a subset of the original bands of a hyperspectral data set (without optimizing their weights). This section focuses on band extraction and considers band subsets composed of spectral bands with different spectral widths. Optimizing spectral width is important when designing a spectral sensor: wider bands help limit signal noise, but bands that are too wide can lead to a loss of useful information.
