SFE2D: A Hybrid Tool for Spatial and Spectral Feature Extraction

*Bahman Abbassi and Li Zhen Cheng*

### **Abstract**

A crucial task for integrated geoscientific image (geo-image) interpretation is the relevant geological representation of multiple geo-images, which demands high-dimensional techniques for extracting latent geological features from high-dimensional geo-images. A standalone mathematical tool called SFE2D (spatiospectral feature extraction in two dimensions) is developed based on independent component analysis (ICA), continuous wavelet transform (CWT), *k*-means clustering segmentation, and RGB color processing, which iteratively separates, extracts, clusters, and visualizes highly correlated and overlapped geological features from multiple sources of geo-images. SFE2D offers spatial feature extraction as well as wavelet-based spectral feature extraction for the further extraction of frequency-dependent features. We show that SFE2D is a robust tool for automated pattern recognition, fast pseudo-geological mapping, and detection of regions of interest, with a wide range of applications at different scales, from regional geophysical surveys to the interpretation of microscopic images.

**Keywords:** feature extraction, independent component analysis, continuous wavelet transform, *k*-means clustering segmentation, pseudo-geological mapping

### **1. Introduction**

Global trends show that the number of major mineral discoveries has declined drastically during the last 25 years because most outcropping mineralization systems have already been mapped and excavated [1]. Therefore, a significant challenge for mineral exploration is the deep targeting of hidden resources. "Deep" targets are usually considered to be resources at depths of hundreds of meters to thousands of meters. However, occurrences of "deep" targets are not limited to the third dimension. They can also be seen as complex, obscured mineralization systems barely detectable within a large amount of hyper-dimensional geoscientific data. Consequently, advanced data mining methods are needed to extract useful information from such large data sets. Responding to these challenges of integrating multidisciplinary information is the motivation of the present research.

Automated interpretation of multiple images is an important topic in large-scale data interpretation and multivariate pattern recognition. No matter how accurate the geoscientific images (geo-images) are, each geo-image is only sensitive to a limited number of geological features. Therefore, different parts of the subsurface geology can be reconstructed from various geo-images. The critical question is how one can put together the interpreted jigsaw-puzzle pieces inside a high-dimensional space to rebuild a relevant geological model. This has led us to the topic of feature extraction in the treatment of large data sets.

The fundamental problem in integrated geoscientific interpretation is the proper geological understanding of multiple geo-images, which demands high-dimensional techniques for the extraction of geological information from hyper-dimensional data sets. Several methods are available for feature extraction in high-dimensional space, which can be seen as a dimensionality increase problem [2]. The problem is that the increased dimensionality due to feature extraction leads to an overload of information, which creates difficulties for human visual interpretation as well as machine learning optimization [2]; that is to say, latent patterns are lost as one moves to higher dimensions. Several recent studies, mainly in seismic attribute analysis, have investigated the problem of high-dimensional pattern recognition to extract relevant geological features from broadband seismic data [3–7]. In mineral exploration, the treatment of multiple non-seismic data sets poses a similar pattern recognition problem, which seeks to extract relevant geological features from potential field and electromagnetic data in the form of multiple images [8].

This study aimed to provide a tool for spatial and spectral feature extraction from multiple geo-images to facilitate the identification of geoscientific features. The study employed a quantitative approach combining different source separation methods for feature extraction and representation. The output is the SFE2D program, a standalone 2D spatiospectral feature extraction tool based on unsupervised source separation methods, namely principal component analysis (PCA), independent component analysis (ICA), continuous wavelet transform (CWT), RGB color processing, and *k*-means clustering segmentation. The program is implemented in a MATLAB environment for feature extraction and dimensionality reduction of hyper-dimensional geoscientific data sets through variance/kurtosis/negentropy maximization, *k*-means clustering segmentation, a color-picking algorithm, and RGB visualizations (SFE2D ver. 2.0).

### **2. Theoretical background of source separation**

### **2.1 Feature extraction**

The theoretical framework underpinning this study is based on blind source separation (BSS) methods. Generally, BSS aims to recover latent source signals (or images) from a set of highly mixed signals (or images). Consider a mixing model, where every geo-image (**g***i*) is a mixture of several latent features (**f***j*) with different contributions in constructing the observed image. A mathematical description for this model can be formulated as a linear mixing system that links the latent features to the observed geo-image by a set of mixing weights (*aij*) that determine the contribution of each feature in the geo-image construction:

$$\mathbf{g}_{i} = \sum_{j=1}^{n} a_{ij}\,\mathbf{f}_{j} \tag{1}$$

where *j* = 1, 2, … , *n* indexes the *n* latent features, and *i* = 1, 2, … , *m* indexes the *m* geo-images. The vector form of Eq. (1) can be rewritten as:

$$\mathbf{g} = A\mathbf{f} \tag{2}$$

where the unknown mixing weights form the mixing matrix $A$, with $[A]_{ij} = a_{ij}$. The observed geo-images $\mathbf{g} = [\mathbf{g}_1(x, y), \mathbf{g}_2(x, y), \ldots, \mathbf{g}_m(x, y)]^T$ are linear mixtures of the latent features $\mathbf{f} = [\mathbf{f}_1(x, y), \mathbf{f}_2(x, y), \ldots, \mathbf{f}_n(x, y)]^T$.


**Figure 1.** *Schematic of the feature extraction model: the latent features* $\mathbf{f}_1(x, y), \ldots, \mathbf{f}_n(x, y)$ *are mixed into the observed geo-images* $\mathbf{g}_1(x, y), \ldots, \mathbf{g}_m(x, y)$*, which are then unmixed into the feature estimates* $\mathbf{f}^*_1(x, y), \ldots, \mathbf{f}^*_n(x, y)$*.*

The *x* and *y* denote the coordinates in a two-dimensional space in the spatial feature extraction scheme. The problem is to find a separation matrix (*W*) that unmixes the geo-images (**g**) to recover the hidden features (**f**):

$$\mathbf{f} = \mathbf{W}\mathbf{g} = \mathbf{A}^{-1}\mathbf{g} \tag{3}$$

where the separation matrix (*W*) is the inverse of the mixing matrix and cannot be determined directly. Nevertheless, it can be estimated adaptively by formulating an optimization problem and setting up a cost function to be minimized iteratively. This is the general formulation of a feature extraction process that iteratively eliminates, or at least reduces, the effect of feature overlaps (the shadow effect) on the observed geo-images (**Figure 1**).
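As a concrete illustration of Eqs. (1)–(3), the following MATLAB sketch (our own toy example; the variable names and the 2 × 2 mixing matrix are not part of SFE2D) mixes two synthetic feature images with a known matrix *A* and recovers them with *W* = *A*⁻¹. In practice *A* is unknown, so *W* must be estimated iteratively, as described in the following sections.

```matlab
% Two synthetic latent feature images (n = 2), flattened into row vectors
rng(1);
nx = 64; ny = 64;                        % image size (pixels)
f1 = repmat(linspace(-1, 1, nx), ny, 1); % smooth regional trend
f2 = double(rand(ny, nx) > 0.9);         % sparse "anomaly" spikes
f  = [f1(:)'; f2(:)'];                   % features as rows of f (n x pixels)

% Known 2 x 2 mixing matrix A: each geo-image is a weighted sum of features
A = [0.8 0.6;
     0.4 0.9];
g = A * f;                               % observed geo-images, Eq. (2): g = A f

% With A known, the separation matrix is simply W = A^(-1), Eq. (3)
W     = inv(A);
f_rec = W * g;                           % recovered features

fprintf('max reconstruction error: %.2e\n', max(abs(f_rec(:) - f(:))));
```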

### **2.2 Spatial feature extraction through PCA and ICA**

Estimating a relevant separation matrix is possible by making assumptions about statistical measures such as correlation. In some instances, two geo-images are statistically correlated. For example, suppose there is a linear relationship between two gamma-ray concentration images (e.g., K vs. eTh). In that case, the amount of information that the first image provides is essentially the same as that of the second one. Therefore, one can transform the bivariate data into a univariate form without losing any valuable information. This transformation is called dimensionality reduction, which is the basis of PCA algorithms. PCA algorithms maximize a second-order statistical measure (variance) for image separation and produce linearly uncorrelated images (**Figure 2a**). However, when there is a nonlinear form of correlation (dependency) between images, PCA will not work. In this case, ICA is useful to separate the geo-images into nonlinearly uncorrelated images by maximizing non-Gaussianity (**Figure 2b**).

PCA is, in fact, the first step in the ICA process. PCA looks for a weight matrix *D* such that the principal components (**y**) of the centered geo-images (**g***C*) capture maximal variance:

$$\mathbf{y} = D\mathbf{g}\_{\mathbf{C}} \tag{4}$$

During this preprocessing step, the mean of the input geo-images (**g***i*) is removed (also known as centering). Whitening is then performed by eigenvalue decomposition, which results in unit variance and an identity covariance matrix. The matrix *D* that linearly transforms the centered geo-images into whitened images is:

$$D = \lambda^{-1/2} \mathbf{E}^T \tag{5}$$

**Figure 2.**

*Schematic description of the feature extraction process: (a) PCA through maximization of variance; the correlated bivariate data sets ($\mathbf{g}$ axes) are transformed into univariate forms ($\hat{\mathbf{f}}$ axes) without losing valuable information. (b) ICA through maximization of non-Gaussianity; the correlated and interdependent (overlapped) bivariate data sets ($\mathbf{g}$ axes) are transformed into two separate univariate independent components ($\hat{\mathbf{f}}$ axes).*

where *λ* is the diagonal matrix of eigenvalues and E is the eigenvector matrix of the covariance matrix of the geo-images [9].
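As a minimal sketch of the centering and whitening steps of Eqs. (4) and (5), the MATLAB lines below build *D* from the eigenvalue decomposition of the covariance matrix of three synthetic geo-images (an illustrative setup, not the SFE2D implementation) and verify that the whitened components have an identity covariance matrix.

```matlab
% g: m x p matrix, one flattened geo-image per row (here: synthetic demo data)
rng(2);
m = 3; p = 5000;
g = randn(m, p) .* [1; 4; 0.5] + [2; -1; 0];   % images with different means/scales

% Centering: remove the mean of each geo-image
gC = g - mean(g, 2);

% Eigenvalue decomposition of the covariance matrix of the centered images
C      = cov(gC');                 % m x m covariance matrix
[E, L] = eig(C);                   % E: eigenvectors, L: diagonal eigenvalue matrix

% Whitening matrix D = lambda^(-1/2) * E', Eq. (5)
D = diag(diag(L).^(-0.5)) * E';
y = D * gC;                        % whitened principal components, Eq. (4)

disp(cov(y'))                      % ~identity covariance: unit variance, decorrelated
```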

Whitening solves half of the ICA problem. The second step in ICA is increasing the non-Gaussianity of the whitened images, or principal components (**y**). The problem is to find a rotation matrix *R* that, when multiplied with the principal components, produces the least Gaussian outputs ($\hat{\mathbf{f}}$), which are approximations of the latent features (**Figure 3**):

$$\mathbf{f} \approx \hat{\mathbf{f}} = \boldsymbol{R}^T \mathbf{y} \tag{6}$$

### **2.3 Fast-ICA algorithms**

This study has employed two alternative methods to obtain the rotation matrix *R*, both based on Fast-ICA: kurtosis maximization and negentropy maximization. Both methods are performed through the Hyvärinen fixed-point algorithm and employ higher-order statistics to recover the independent sources [9]. The fixed-point algorithm has a couple of properties that make it superior to gradient-based methods: (a) it has cubic convergence speed; (b) in contrast to gradient-based algorithms, it does not need any learning-rate adjustment or parameter tuning, making it easy to implement [9].

**Figure 3.** *Generalized steps of independent component analysis for feature extraction: the latent features* $\mathbf{f}_1(x, y), \ldots, \mathbf{f}_n(x, y)$ *mix into the observed geo-images* $\mathbf{g}_1(x, y), \ldots, \mathbf{g}_m(x, y)$*; whitening yields the principal components* $\mathbf{y}_1(x, y), \ldots, \mathbf{y}_n(x, y)$*, which are rotated into the independent component estimates* $\hat{\mathbf{f}}_1(x, y), \ldots, \hat{\mathbf{f}}_n(x, y)$*.*

### *2.3.1 Fast-ICA through kurtosis maximization*

According to the central limit theorem, the distribution of a sum of independent random variables tends toward a Gaussian (normal) distribution [9]. One can use this principle to maximize non-Gaussianity through kurtosis maximization (kICA). Kurtosis, which is the fourth-order cumulant of the whitened images, can be expressed as the normalized version of the fourth moment:

$$\text{kurt}(\mathbf{y}) = E\{\mathbf{y}^4\} - 3 \tag{7}$$

where *E*{.} denotes the expectation over the unknown density of the input geo-images. Kurtosis provides a measure of how Gaussian (*kurt* = 0), super-Gaussian (*kurt* > 0), or sub-Gaussian (*kurt* < 0) the probability density functions of the geo-images are. Maximization of the kurtosis of the principal components estimates the latent independent components ($\hat{\mathbf{f}}$).
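The sign convention of Eq. (7) can be verified with a few MATLAB lines; the Laplacian and uniform samples below are illustrative choices, not data from any case study.

```matlab
% Excess kurtosis, Eq. (7), computed on standardized (zero-mean, unit-variance) samples
rng(3);
N = 1e6;
kurt = @(x) mean(((x - mean(x)) ./ std(x)).^4) - 3;

gauss   = randn(1, N);                        % Gaussian:        kurt ~  0
laplace = -log(rand(1, N)) + log(rand(1, N)); % super-Gaussian:  kurt ~ +3
unif    = rand(1, N);                         % sub-Gaussian:    kurt ~ -1.2

fprintf('Gaussian: %5.2f  Laplacian: %5.2f  Uniform: %5.2f\n', ...
        kurt(gauss), kurt(laplace), kurt(unif));
```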

The algorithm for kurtosis maximization in the present SFE2D program incorporates deflationary orthogonalization that estimates the independent components one by one:

Step 1. Center and whiten the geo-image data $\mathbf{g}$ to give $\mathbf{y} = D\mathbf{g}_C$, where $D = \lambda^{-1/2}E^T$.
Step 2. Set $N$ as the number of independent components to be estimated and a counter $I \leftarrow 1$.
Step 3. Randomly set an initial vector $\hat{W}_I$ of unit norm (a row of the separation matrix).
Step 4. Let $\hat{W}_I \leftarrow E\{\mathbf{y}[\hat{W}_I^T\mathbf{y}]^3\} - 3\hat{W}_I$.
Step 5. Rotate by deflationary orthogonalization: $\hat{W}_I \leftarrow \hat{W}_I - \sum_{i=1}^{I-1}(\hat{W}_I^T\hat{W}_i)\hat{W}_i$.
Step 6. Normalize: $\hat{W}_I \leftarrow \hat{W}_I / \|\hat{W}_I\|$.
Step 7. Go to step 4 if $\hat{W}_I$ has not converged, that is, if $|\langle\hat{W}_I^{k+1}, \hat{W}_I^k\rangle| \neq 1$. Otherwise, $\hat{W}_I$ is row $I$ of the estimated separation matrix.
Step 8. If $I \leq N$, set $I \leftarrow I + 1$ and go to step 3 to estimate the next row of the separation matrix.
Step 9. Obtain the latent features by $\hat{\mathbf{f}} = \hat{W}\mathbf{g}$, where $\hat{W} = [\hat{W}_1, \hat{W}_2, \ldots, \hat{W}_I]^T$.
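A compact MATLAB sketch of steps 1–9 is given below for two synthetic mixtures; it reflects our reading of the deflationary kurtosis-based fixed-point iteration and is not the SFE2D source code. Note that the estimated rows of the separation matrix act on the whitened images **y** (equivalently, $\hat{W}D$ acts on the centered geo-images).

```matlab
% Demo data: two non-Gaussian sources mixed by a matrix that is treated as unknown
rng(4);
p = 1e4;
s = [sign(randn(1, p)); rand(1, p) - 0.5];        % two non-Gaussian sources
g = [0.7 0.5; 0.3 0.9] * s;                       % observed mixtures (Eq. 2)

% Step 1: center and whiten
gC     = g - mean(g, 2);
[E, L] = eig(cov(gC'));
y      = diag(diag(L).^(-0.5)) * E' * gC;

% Steps 2-8: estimate the rows of the separation matrix one by one
N = 2;                                            % number of components
W = zeros(N, N);
for I = 1:N
    w = randn(N, 1);  w = w / norm(w);            % Step 3: random unit-norm start
    for it = 1:200
        wOld = w;
        w = mean(y .* (w' * y).^3, 2) - 3*w;      % Step 4: kurtosis fixed point
        w = w - W(1:I-1, :)' * (W(1:I-1, :) * w); % Step 5: deflation
        w = w / norm(w);                          % Step 6: normalize
        if abs(w' * wOld) > 1 - 1e-9, break, end  % Step 7: convergence test
    end
    W(I, :) = w';
end

f_hat = W * y;                                    % Step 9: recovered features
```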

### *2.3.2 Fast-ICA through fixed-point negentropy maximization*

For large-scale studies, kurtosis maximization is time-consuming because of its computational cost. Kurtosis is also not a robust measure of non-Gaussianity in the presence of unknown noise and image artifacts. It is very sensitive to outliers: a single erroneous value in the tails of the distribution can make the kurtosis extremely large. Therefore, using kurtosis is well justified only if the independent components are sub-Gaussian and there are no outliers.
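This outlier sensitivity is easy to reproduce: in the illustrative MATLAB snippet below, a single corrupted sample is enough to push the estimated kurtosis of an otherwise Gaussian image far from zero.

```matlab
% A single outlier dominates the fourth moment and inflates the kurtosis estimate
rng(5);
x = randn(1, 10000);                               % Gaussian sample, true excess kurtosis = 0
kurt = @(v) mean(((v - mean(v)) ./ std(v)).^4) - 3;

fprintf('kurtosis without outlier: %6.2f\n', kurt(x));
x(1) = 50;                                         % one erroneous value in the tail
fprintf('kurtosis with one outlier: %6.2f\n', kurt(x));
```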

Another principle in information theory enables us to obtain the rotation matrix *R* more efficiently in the presence of outliers. It states that the spatial complexities of the input images are equal to or greater than that of the least complex latent features [9]. This general principle enables us to maximize the non-Gaussianity of multiple geo-images through negentropy maximization (nICA).

Negentropy is based on the information-theoretic quantity of entropy. The entropy (*H*) of a geo-image (**g**) with probability density $p(\mathbf{g})$ is defined as [9]:

$$H(\mathbf{g}) = -\int p(\mathbf{g}) \log p(\mathbf{g}) d\mathbf{g} \tag{8}$$

The more random (unpredictable and unstructured) the variable is, the larger its entropy. Among all random variables of equal variance, the Gaussian variable has the largest entropy. The negentropy of a geo-image **g** is its normalized differential entropy, that is, the difference between the entropy of a Gaussian random vector with the same covariance matrix as **g** (denoted **g**gauss) and the entropy of **g**:

$$\text{neg}(\mathbf{g}) = H\left(\mathbf{g}_{\text{gauss}}\right) - H(\mathbf{g}) \tag{9}$$

Negentropy is always non-negative and is zero only when the signal has a Gaussian distribution; in other words, the less Gaussian (more structured) a variable is, the larger its negentropy [9]. The negentropy approximation used in this study is based on classical higher-order cumulants:

$$\text{neg}\left(\mathbf{y}\right) \approx \frac{1}{12}E\left\{\mathbf{y}^{3}\right\}^{2} + \frac{1}{48}\,\text{kurt}\left(\mathbf{y}\right)^{2} \tag{10}$$

Since entropy estimation is not performed directly on the geo-images, but rather on the centered and whitened images (principal components), **g** is replaced by **y**. Replacing the polynomial functions $\mathbf{y}^3$ and $\mathbf{y}^4$ by another non-quadratic function $G^i$ that does not grow too fast results in a robust negentropy estimation (*i* is an index denoting odd and even functions):

$$\text{neg}\left(\mathbf{y}\right) \approx k_1 \left[ E\left\{ G^1\left(\mathbf{y}\right) \right\} \right]^2 + k_2 \left[ E\left\{ G^2\left(\mathbf{y}\right) \right\} - E\left\{ G^2(\nu) \right\} \right]^2 \tag{11}$$

where *ν* is a standardized Gaussian variable with unit variance and zero mean, and $k_1$ and $k_2$ are positive constants. In the case where only one non-quadratic function *G* is used, the approximation becomes simpler:

$$\text{neg}\left(\mathbf{y}\right) \approx \left[E\left\{G\left(\mathbf{y}\right)\right\} - E\{G(\nu)\}\right]^2\tag{12}$$

In this moment-based approximation, we used a slow-growing exponential function for *G*:

$$G(\mathbf{y}) = -e^{\left(-\mathbf{y}^2/2\right)}\tag{13}$$
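A short MATLAB check of the one-function approximation of Eqs. (12) and (13) is shown below (illustrative samples only, with our own variable names): the approximated negentropy is essentially zero for a Gaussian variable and clearly positive for a super-Gaussian (Laplacian) one.

```matlab
% Robust negentropy approximation, Eqs. (12)-(13), on standardized samples
rng(6);
N    = 1e6;
G    = @(y) -exp(-y.^2 / 2);                       % slow-growing contrast function, Eq. (13)
EGnu = -1/sqrt(2);                                 % exact E{G(nu)} for a standardized Gaussian nu
neg  = @(y) (mean(G((y - mean(y)) ./ std(y))) - EGnu)^2;   % Eq. (12)

ygauss   = randn(1, N);                            % Gaussian:       neg ~ 0
ylaplace = -log(rand(1, N)) + log(rand(1, N));     % super-Gaussian: neg > 0

fprintf('neg(Gaussian)  = %.2e\n', neg(ygauss));
fprintf('neg(Laplacian) = %.2e\n', neg(ylaplace));
```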

The criterion for negentropy maximization is based on bringing an objective function to an approximate maximum value. The algorithm for negentropy maximization in the present SFE2D program is as follows:

Step 1. Center and whiten the geo-image data $\mathbf{g}$ to give $\mathbf{y} = D\mathbf{g}_C$, where $D = \lambda^{-1/2}E^T$.
Step 2. Set $N$ as the number of independent components to be estimated and a counter $I \leftarrow 1$.
Step 3. Randomly set an initial vector $\hat{W}_I$ of unit norm (a row of the separation matrix).
Step 4. Let $\hat{W}_I \leftarrow E\{\mathbf{y}\,g(\hat{W}_I^T\mathbf{y})\} - E\{g'(\hat{W}_I^T\mathbf{y})\}\hat{W}_I$, where $g(\mathbf{y}) = \mathbf{y}\,e^{-\mathbf{y}^2/2}$.
Step 5. Rotate by deflationary orthogonalization: $\hat{W}_I \leftarrow \hat{W}_I - \sum_{i=1}^{I-1}(\hat{W}_I^T\hat{W}_i)\hat{W}_i$.
Step 6. Normalize: $\hat{W}_I \leftarrow \hat{W}_I / \|\hat{W}_I\|$.
Step 7. Go to step 4 if $\hat{W}_I$ has not converged, that is, if $|\langle\hat{W}_I^{k+1}, \hat{W}_I^k\rangle| \neq 1$. Otherwise, $\hat{W}_I$ is row $I$ of the estimated separation matrix.
Step 8. If $I \leq N$, set $I \leftarrow I + 1$ and go to step 3 to estimate the next row of the separation matrix.
Step 9. Obtain the latent features by $\hat{\mathbf{f}} = \hat{W}\mathbf{g}$, where $\hat{W} = [\hat{W}_1, \hat{W}_2, \ldots, \hat{W}_I]^T$.
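For completeness, the negentropy-based fixed-point update of step 4, with $g(u) = u\,e^{-u^2/2}$ and $g'(u) = (1 - u^2)\,e^{-u^2/2}$, can be sketched in MATLAB as follows; as with the kurtosis version, this is our own illustrative implementation for synthetic mixtures, not the SFE2D source code.

```matlab
% Negentropy-based fixed-point ICA with deflation, applied to whitened images y (N x p)
rng(7);
p = 1e4;  N = 2;
s  = [sign(randn(1, p)); rand(1, p) - 0.5];        % two non-Gaussian sources
gC = [0.6 0.8; 0.5 0.4] * s;  gC = gC - mean(gC, 2);
[E, L] = eig(cov(gC'));
y = diag(diag(L).^(-0.5)) * E' * gC;               % Step 1: whitened principal components

gfun  = @(u) u .* exp(-u.^2 / 2);                  % g(u)  = u e^(-u^2/2)
dgfun = @(u) (1 - u.^2) .* exp(-u.^2 / 2);         % g'(u) = (1 - u^2) e^(-u^2/2)

W = zeros(N, N);
for I = 1:N
    w = randn(N, 1);  w = w / norm(w);             % Step 3: random unit-norm start
    for it = 1:200
        wOld = w;
        u = w' * y;                                % projections of the whitened images
        w = mean(y .* gfun(u), 2) - mean(dgfun(u)) * w;   % Step 4: negentropy update
        w = w - W(1:I-1, :)' * (W(1:I-1, :) * w);  % Step 5: deflation
        w = w / norm(w);                           % Step 6: normalize
        if abs(w' * wOld) > 1 - 1e-9, break, end   % Step 7: convergence test
    end
    W(I, :) = w';
end
f_hat = W * y;                                     % Step 9: estimated independent features
```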
