**3. Which band selection criterion?**

This study compares **FS criteria that can be optimized using generic optimization heuristics**, which excludes several specific embedded or ranking approaches. The following FS criteria (listed in **Table 3**) were evaluated.

#### **3.1 Compared FS criteria**

#### *3.1.1 Filter FS criteria*

Filter criteria are independent of any classifier. Only scores assessing the relevance of feature subsets were considered, thus excluding filter FS methods that rank features independently according to an individual feature score (e.g. ReliefF).

#### *3.1.1.1 Separability*

Separability measures are used to identify the feature subsets achieving the best class distinction. The Fisher, Bhattacharyya and Jeffries-Matusita measures [30, 35, 45, 52] are such scores. They were used assuming Gaussian class models. Let $\overrightarrow{\mu_i}$ and $\Sigma_i$ be the mean vector and covariance matrix of the spectral distribution of class *i*. Fisher separability between classes *i* and *j* is defined in equation (1):

$$F_{ij} = \frac{\left(\overrightarrow{w}^{\,t}\left(\overrightarrow{\mu_i} - \overrightarrow{\mu_j}\right)\right)^2}{\overrightarrow{w}^{\,t}\left(\Sigma_i + \Sigma_j\right)\overrightarrow{w}} \text{ where } \overrightarrow{w} = \left(\Sigma_i + \Sigma_j\right)^{-1}\left(\overrightarrow{\mu_i} - \overrightarrow{\mu_j}\right) \tag{1}$$

Bhattacharyya separability between classes *i* and *j* is defined in equation (2):



$$B_{ij} = \frac{1}{8}\left(\overrightarrow{\mu_i} - \overrightarrow{\mu_j}\right)^{t} \Sigma^{-1} \left(\overrightarrow{\mu_i} - \overrightarrow{\mu_j}\right) + 0.5\,\ln\left(\frac{\det \Sigma}{\sqrt{\det \Sigma_i \det \Sigma_j}}\right) \text{ where } \Sigma = \frac{\Sigma_i + \Sigma_j}{2} \tag{2}$$

As the Bhattacharyya and Fisher separability measures are defined for binary problems, their means over all possible pairs of classes were used here as FS criteria. In summary, the following separability measures were used as FS criteria (a computational sketch of these scores is given after the list):

• Mean Fisher (*fisher*) separability calculated over all pairs of classes (equation 3):

$$\frac{1}{\text{nb\_pairs\_of\_classes}} \sum_{i=1}^{c-1} \sum_{j=i+1}^{c} F_{ij} \tag{3}$$

• Mean Bhattacharyya (*Bdist*) separability calculated over all pairs of classes (equation 4):

$$\frac{1}{\text{nb\_pairs\_of\_classes}} \sum_{i=1}^{c-1} \sum_{j=i+1}^{c} B_{ij} \tag{4}$$

• Jeffries-Matusita measure (*jm*) defined in equation 5:

$$JM = \sum_{i=1}^{c-1} \sum_{j=i+1}^{c} \left(1 - e^{-B_{ij}}\right) \tag{5}$$
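To make the computation of these scores concrete, here is a minimal sketch in Python/NumPy. It assumes per-class sample matrices are available; the function and variable names (`fisher_pair`, `separability_scores`, `samples_per_class`) are illustrative and not taken from the study:

```python
import numpy as np
from itertools import combinations

def fisher_pair(mu_i, mu_j, cov_i, cov_j):
    # Fisher separability between classes i and j (equation 1)
    diff = mu_i - mu_j
    w = np.linalg.solve(cov_i + cov_j, diff)   # w = (Sigma_i + Sigma_j)^-1 (mu_i - mu_j)
    return (w @ diff) ** 2 / (w @ (cov_i + cov_j) @ w)

def bhattacharyya_pair(mu_i, mu_j, cov_i, cov_j):
    # Bhattacharyya separability between classes i and j (equation 2)
    sigma = 0.5 * (cov_i + cov_j)
    diff = mu_i - mu_j
    term_mean = 0.125 * diff @ np.linalg.solve(sigma, diff)
    term_cov = 0.5 * np.log(np.linalg.det(sigma)
                            / np.sqrt(np.linalg.det(cov_i) * np.linalg.det(cov_j)))
    return term_mean + term_cov

def separability_scores(samples_per_class, bands):
    """Mean Fisher (eq. 3), mean Bhattacharyya (eq. 4) and JM (eq. 5) scores
    for a band subset, given a list of (n_samples, n_bands) arrays, one per class."""
    stats = [(x[:, bands].mean(axis=0), np.cov(x[:, bands], rowvar=False))
             for x in samples_per_class]
    F, B = [], []
    for i, j in combinations(range(len(stats)), 2):
        (mu_i, cov_i), (mu_j, cov_j) = stats[i], stats[j]
        F.append(fisher_pair(mu_i, mu_j, cov_i, cov_j))
        B.append(bhattacharyya_pair(mu_i, mu_j, cov_i, cov_j))
    return np.mean(F), np.mean(B), np.sum(1.0 - np.exp(-np.array(B)))
```

A call such as `separability_scores(samples_per_class, bands=[3, 10, 22])` then yields the three filter scores for that candidate band subset.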


#### *3.1.1.2 Mutual information*

Another FS criterion based on high-order statistics from information theory, namely mutual information (MI), was adapted from [14] and tested: it takes into account both feature-class dependencies and between-feature correlations. It is defined in equation (6).

$$J(S) = \sum_{f \in S} I(C, f) - \frac{1}{\#S} \sum_{f \in S} \sum_{s \in S;\, s \neq f} \frac{I(f, s)}{H(f) \cdot H(s)} \tag{6}$$

for a feature subset *S*, where $I(C, f)$ is the MI between feature *f* and the classes *C*, $I(f, s)$ is the MI between features *f* and *s*, and $H(f)$ is the entropy of feature *f*. This criterion is referred to as *mi* in **Table 3**.
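As an illustration only, a minimal sketch of this criterion is given below. It assumes the bands have been discretized to integer levels and uses scikit-learn's `mutual_info_score`; the discretization choice and the names (`mi_criterion`, `X_disc`) are assumptions, not part of the original protocol:

```python
import numpy as np
from itertools import combinations
from scipy.stats import entropy
from sklearn.metrics import mutual_info_score

def mi_criterion(X_disc, y, subset):
    """MI-based FS criterion J(S) of equation (6) for a band subset `subset`,
    given integer-discretized band values X_disc (n_samples, n_bands) and labels y."""
    # relevance: sum of I(C, f) between each selected band and the class labels
    relevance = sum(mutual_info_score(y, X_disc[:, f]) for f in subset)
    # redundancy: entropy-normalized MI between selected bands; each unordered pair
    # appears twice in the double sum of equation (6), hence the factor 2
    redundancy = 0.0
    for f, s in combinations(subset, 2):
        h_f = entropy(np.bincount(X_disc[:, f]))
        h_s = entropy(np.bincount(X_disc[:, s]))
        redundancy += 2.0 * mutual_info_score(X_disc[:, f], X_disc[:, s]) / (h_f * h_s)
    return relevance - redundancy / len(subset)
```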

**Table 3.**
*Selected FS criteria to be compared.*

#### *3.1.2 Wrapper and embedded FS criteria*

Several classifiers were used to define wrapper scores related to the classification performance achieved using feature subsets. Only fast classifiers which did not require an optimization of hyper-parameters were used (a sketch of such a wrapper score is given after the list):

• **Maximum likelihood classification (ML)**: assuming a Gaussian model for the spectral distribution of classes, mean vectors and covariance matrices are estimated for each class during the training step. Each new sample is then labelled by its most probable class according to the model.

• **SAM and SID**: these classifiers are specific to hyperspectral data. The *spectral angle mapper* (SAM) classifies a sample according to the angle between its spectrum and reference spectra. The *spectral information divergence* (SID) [42] comes from dissimilarity measures between statistical distributions, more precisely the Kullback-Leibler measure.

• **Support Vector Machine (SVM)** [67]: SVM has been intensively used to classify remote sensing data and especially hyperspectral data [2, 15, 28]. Training an SVM classifier aims at estimating the best frontiers between classes. Only a one-against-one linear SVM was used here: it is fast and avoids an optimization of hyper-parameters, contrary to other kernels. Besides, using a linear SVM introduces a constraint to select bands achieving a linear separation between classes.

• **Decision trees (DT)** [19].

• **Random forests (RF)** [41]: RF is a modification of bagging applied to decision trees. It can achieve a classification accuracy comparable to boosting [41] or SVM [33]. It does not require assumptions on the distribution of the data, which is interesting when different types or scales of input features are used. It has been successfully applied to remote sensing data such as multispectral, hyperspectral or multisource data. This ensemble classifier is a combination of tree predictors built from multiple bootstrapped training samples. For each node of a tree, a subset of features is randomly selected, and the best feature with regard to the Gini impurity measure is used for node splitting. For classification, each tree gives a unit vote for the most popular class for each input instance, and the final label is determined by a majority vote of all trees.
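To illustrate how such wrapper scores can be computed, here is a minimal sketch using scikit-learn. The classifier settings, the 5-fold cross-validation and all names (`wrapper_score`, `bands`) are assumptions for illustration, not the exact protocol of the study:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

def wrapper_score(X, y, bands, classifier="svm", cv=5):
    """Wrapper FS score for a band subset: mean cross-validated overall
    accuracy of a fast classifier trained on the selected bands only."""
    if classifier == "svm":
        # linear SVM; SVC handles multiclass problems with a one-against-one scheme
        clf = SVC(kernel="linear", C=1.0)
    else:
        # per-class Gaussian model, close to maximum likelihood classification
        # (QDA additionally weights classes by their priors)
        clf = QuadraticDiscriminantAnalysis()
    scores = cross_val_score(clf, X[:, bands], y, cv=cv, scoring="accuracy")
    return scores.mean()

# Example: evaluate a candidate 6-band subset
# score = wrapper_score(X_train, y_train, bands=[3, 10, 22, 35, 48, 60])
```

The returned accuracy can then be fed directly to the optimization heuristic as the value of the wrapper criterion for that band subset.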