**4. Feature representation**

The SFE2D program provides interpretation tools for feature representation in two dimensions:

### **4.1 RGB image compilation and analysis**

SFE2D users can assemble a color image from three manually chosen features extracted by the PCA or ICA methods. The RGB combination can bring out characteristics that are hidden in the individual features. To control which extracted features are used for image compilation, the user graphically sets the label numbers of the desired features. For example, in nICA feature extraction, the labels 1, 2, 3 mean that features 1, 2, and 3 are used for image compilation. The user can also change the polarity of a feature, since the PCA/ICA methods distort the polarities randomly. For example, one can set the labels −2, 1, 3 for RGB compilation.
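As an illustration of the label convention, the following minimal sketch builds RGB pixels from three signed feature labels. It is not the SFE2D implementation: the function names `rescale` and `compose_rgb`, the mapping of the three labels to the red, green, and blue bands in that order, and the linear stretch to 0–255 are all assumptions made for the example. Plain Python lists are used to keep the sketch self-contained.

```python
def rescale(values):
    """Linearly map a flat list of numbers to integers in 0..255."""
    lo, hi = min(values), max(values)
    if hi == lo:                       # constant feature: no contrast to stretch
        return [0 for _ in values]
    return [round(255 * (v - lo) / (hi - lo)) for v in values]


def compose_rgb(features, labels):
    """Build RGB pixels from three signed, 1-based feature labels.

    features: list of feature images, each a flat list of floats
    labels:   three non-zero ints; a negative label flips the polarity
              (assumed order: red, green, blue)
    """
    if len(labels) != 3 or 0 in labels:
        raise ValueError("need three non-zero labels")
    bands = []
    for label in labels:
        feature = features[abs(label) - 1]      # labels are 1-based
        sign = -1 if label < 0 else 1
        bands.append(rescale([sign * v for v in feature]))
    return list(zip(*bands))                    # one (R, G, B) tuple per pixel


features = [
    [0.0, 1.0, 5.0],     # feature 1
    [0.0, 2.0, 10.0],    # feature 2
    [0.0, 4.0, 5.0],     # feature 3
]
pixels = compose_rgb(features, (-2, 1, 3))      # feature 2 with flipped polarity
```

With the labels −2, 1, 3, the pixels where feature 2 is lowest become the brightest in the first band, which is the effect of a polarity flip.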

A color pick algorithm is also provided to select a region of interest (ROI) based on color intensities. To find objects of a specific color, the SFE2D program assigns low and high thresholds for each RGB color band from several clicks on the regions of interest. The more the user clicks on the specified zones, the more accurate the thresholding becomes. The program then automatically picks the minimum and maximum of each red/green/blue band. To reduce the effect of outliers, only the range between the 5th and 95th percentiles is kept, which results in smoother color picking. **Figure 6** shows an example of the ROI detection method in the SFE2D program, used to detect red-colored objects in a standard image.
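The thresholding idea can be sketched as follows. This is one possible reading of the description above, not the SFE2D code: it assumes that the low and high thresholds of each band are the 5th and 95th percentiles of the clicked pixel values, and that a pixel belongs to the ROI when all three bands fall inside their thresholds. The function names are hypothetical.

```python
from statistics import quantiles


def band_limits(samples):
    """Return the (5th, 95th) percentiles of one color band's clicked values."""
    cuts = quantiles(samples, n=20, method="inclusive")  # 19 cut points, 5 % apart
    return cuts[0], cuts[-1]


def pick_thresholds(clicked_pixels):
    """Derive per-band (low, high) thresholds from clicked (R, G, B) pixels."""
    bands = zip(*clicked_pixels)                 # reds, greens, blues
    return [band_limits(list(band)) for band in bands]


def roi_mask(pixels, thresholds):
    """Flag each pixel whose R, G, and B values all lie inside the thresholds."""
    return [
        all(lo <= value <= hi for value, (lo, hi) in zip(pixel, thresholds))
        for pixel in pixels
    ]


# 21 synthetic "clicks" on a reddish region (illustrative values only)
clicked = [(200 + i, 10 + i, 20) for i in range(21)]
thresholds = pick_thresholds(clicked)
mask = roi_mask([(210, 20, 20), (200, 20, 20), (210, 20, 21)], thresholds)
```

Trimming to the 5th–95th percentile range discards the most extreme clicked values, so a single stray click on the background does not widen the thresholds.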

### **4.2 The color image segmentation algorithm**

SFE2D uses a *k*-means clustering algorithm for the segmentation of color images. The segmentation helps to reduce the color space size to a manageable number of colors. The process produces a pseudo-geological map based on the spatially extracted RGB features. This eventually helps geologists to detect the hidden geological contacts and structures in geo-images. In addition, segmentation significantly reduces memory usage and speeds up image analysis by focusing on relevant information.

The output of segmentation is a set of *k* non-overlapping segments {*S*<sub>1</sub>, *S*<sub>2</sub>, … , *S<sub>k</sub>*} that together comprise the whole segmented representation of a dataset *X* in the form of [15]:

$$X = \bigcup_{i=1}^{k} S_i \tag{15}$$

For the non-overlapping condition, we should have *S<sub>i</sub>* ∩ *S<sub>j</sub>* = ∅ for *i* ≠ *j*, where 1 ≤ *i*, *j* ≤ *k*, which guarantees that each data point belongs to exactly one segment and therefore to a single centroid.

**Figure 6.** *Color pick method for extracting the red color objects. (a) Original RGB image. (b) ROI detection.*

SFE2D incorporates a *k*-means segmentation for grayscale or RGB images. The program aims to separate the extracted features of an image into different clusters iteratively. The algorithm computes a hyperdimensional centroid for each cluster. Each segment *S<sub>i</sub>* is uniquely defined by a center *c<sub>i</sub>*:

$$c_i = \frac{1}{|S_i|} \sum_{s \in S_i} s \tag{16}$$

where |*S<sub>i</sub>*| is the number of elements in the *i*th cluster.

The centroids are updated iteratively by minimizing a cost function based on the squared distance between each data pattern *p<sub>j</sub>* and its centroid *c<sub>i</sub>*:

$$F(c_1, c_2, \dots, c_k) = \sum_{i=1}^{k} \sum_{p_j \in S_i} d^2\left(p_j, c_i\right) \tag{17}$$

If the image is RGB, the program calculates a 3D Euclidean distance in the RGB color space. The SFE2D program reads the data points *p* and returns *k* cluster centroids as *C* = {*c*<sub>1</sub>, *c*<sub>2</sub>, … , *c<sub>k</sub>*}. The *k*-means algorithm for RGB image segmentation in the SFE2D program is as follows:

Step 1: Generate *k* random initial centroids: *C* ← Rand(*p*, *k*).

Step 2: Repeat the following until the segments *S<sub>i</sub>* do not change:

$$S_i \leftarrow \left\{ p : \|p - c_i\|^2 \le \|p - c_j\|^2, \ \forall j, \ 1 \le i, j \le k \right\} \quad \text{such that } S_i \cap S_j = \emptyset$$

$$c_i \leftarrow \frac{1}{|S_i|} \sum_{x_j \in S_i} x_j$$

Step 3: Return all clustered segments as *C* = {*c*<sub>1</sub>, *c*<sub>2</sub>, … , *c<sub>k</sub>*}.
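The three steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the SFE2D implementation: the function names, the `seed` and `max_iter` parameters, the choice of initial centroids by sampling existing pixels, the tie-breaking rule (lowest centroid index), and the handling of empty segments are all choices made for the example. Pixels are plain (R, G, B) tuples to keep the sketch self-contained.

```python
import random


def sq_dist(p, c):
    """Squared Euclidean distance between two points (3D for RGB)."""
    return sum((a - b) ** 2 for a, b in zip(p, c))


def kmeans(points, k, seed=0, max_iter=100):
    """Basic k-means; returns (centroids, labels, cost)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)        # Step 1: random initial centroids
    labels = None
    for _ in range(max_iter):
        # Step 2a: assign each point to its nearest centroid.
        # min() keeps the lowest index on ties, so segments never overlap.
        new_labels = [
            min(range(k), key=lambda i: sq_dist(p, centroids[i])) for p in points
        ]
        if new_labels == labels:             # segments unchanged: converged
            break
        labels = new_labels
        # Step 2b: move each centroid to the mean of its segment.
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:                      # keep the old centroid if a segment is empty
                centroids[i] = tuple(sum(ch) / len(members) for ch in zip(*members))
    # Cost: sum of squared distances of every point to its own centroid.
    cost = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, cost           # Step 3


# Two dark and two bright pixels (illustrative values only)
pixels = [(0, 0, 0), (2, 2, 2), (250, 250, 250), (252, 252, 252)]
centroids, labels, cost = kmeans(pixels, k=2)
```

Replacing each pixel by the centroid of its segment, `centroids[labels[j]]`, yields the segmented image with only *k* colors.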
