#### **2.3 Data processing and encoding**

To pre-process the text data, three operations were applied: first, lower-casing and de-accenting; second, removal of stop words; and third, selection of words belonging to the retained parts of speech (nouns, pronouns, adjectives, verbs, adverbs, prepositions, conjunctions, and interjections).
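As an illustration, the following minimal sketch implements the three steps using Python's `unicodedata` module and the spaCy library; the choice of spaCy, the model name, and the `preprocess` function are assumptions for illustration, not the authors' original pipeline.

```python
# A minimal sketch of the three pre-processing steps (assumption: spaCy's
# small English model is available; names here are illustrative).
import unicodedata
import spacy

nlp = spacy.load("en_core_web_sm")

# Parts of speech to keep, expressed as spaCy's universal POS tags.
KEEP_POS = {"NOUN", "PRON", "ADJ", "VERB", "ADV", "ADP", "CCONJ", "SCONJ", "INTJ"}

def preprocess(text: str) -> list[str]:
    # Step 1: lower-case and de-accent (strip combining diacritics).
    text = unicodedata.normalize("NFKD", text.lower())
    text = "".join(c for c in text if not unicodedata.combining(c))
    doc = nlp(text)
    # Steps 2 and 3: drop stop words, keep only the listed parts of speech.
    return [tok.text for tok in doc if not tok.is_stop and tok.pos_ in KEEP_POS]

print(preprocess("Humanitarian organizations coordinate rapid disaster response."))
```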

**Figure 2.** *Word cloud of 1930 mission statements from humanitarian organizations.*


To encode the text data into numerical vectors, the research uses word-embedding algorithms from Natural Language Processing, a technique that assigns high-dimensional vectors (embeddings) to words in a text corpus, preserving their syntax and semantics. Tshitoyan et al. [8] demonstrated that scientific knowledge could efficiently be encoded as information-dense word embeddings without human labeling or supervision. The algorithm used to transform the text into word embeddings is an Artificial Neural Network called Word2Vec [9, 10], which uses the continuous bag-of-words (CBOW) method. The algorithm learns the embeddings by maximizing the ability of each word to be predicted from its set of context words using vector similarity. The output of Word2Vec is a 50-dimensional numerical vector for each word in the text corpus.
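The following sketch shows how such a 50-dimensional CBOW model can be trained with the gensim library; gensim itself, the tiny corpus, and all hyperparameters other than `vector_size` and `sg` are illustrative assumptions.

```python
# Sketch of training a 50-dimensional CBOW Word2Vec model with gensim.
from gensim.models import Word2Vec

# Each document is a list of pre-processed tokens.
corpus = [
    ["humanitarian", "organization", "deliver", "aid"],
    ["disaster", "response", "require", "coordination"],
]

model = Word2Vec(
    sentences=corpus,
    vector_size=50,   # dimensionality of the word embeddings
    sg=0,             # sg=0 selects the CBOW training method
    window=5,         # size of the context window around each word
    min_count=1,      # keep every word in this tiny example
)

print(model.wv["disaster"].shape)  # -> (50,)
```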

To continue with the experiment, a subsample of 40% of the pre-processed texts (both abstracts of academic writing and mission statements of humanitarian organizations) was fed as training data to the Word2Vec algorithm, reusing the knowledge acquired from the previous training to create a domain-specific model, Word2VecDR (this is called transfer learning). After Word2VecDR was trained, it was able to encode all texts from the dataset into a numerical representation: every word of the text was assigned a 50-dimensional numerical vector. The texts ranged from 15 to 5668 words, with an average of 824 words per text. Therefore, if an abstract contains 100 words, the resulting vector from Word2VecDR is a list of 100 sub-lists with 50 elements each.
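Continuing the gensim sketch above, one possible reading of this transfer-learning step is to update the pre-trained model's vocabulary and then resume training on the domain subsample; names such as `domain_corpus` and `encode` are illustrative, not the authors' code.

```python
# Possible sketch of the transfer-learning step (assumes gensim and the
# `model` trained above): continue training on the 40% domain subsample
# to obtain Word2VecDR, then encode each text as a list of 50-dim vectors.
domain_corpus = [
    ["mission", "provide", "relief", "affected", "communities"],
    ["research", "analyze", "disaster", "risk", "reduction"],
]

model.build_vocab(domain_corpus, update=True)   # add domain vocabulary
model.train(domain_corpus,
            total_examples=len(domain_corpus),
            epochs=model.epochs)                # fine-tune on domain texts

def encode(tokens):
    # One 50-dimensional vector per word: a 100-word abstract becomes
    # a list of 100 sub-lists with 50 elements each.
    return [model.wv[w].tolist() for w in tokens if w in model.wv]

vectors = encode(["disaster", "relief", "mission"])
print(len(vectors), len(vectors[0]))  # e.g., 3 50
```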

Mikolov et al. [9] observed that simple algebraic operations on word embeddings, e.g., vector "King" − vector "Man" + vector "Woman," result in a vector that is closest to the vector "Queen," concluding that the resulting vector is content-related. Furthermore, researchers have also applied statistical operations such as the mean or average value to a list of word embeddings, with successful results that captured the content of the text (examples can be found in [11, 12]). However, when calculating the mean value or adding each word vector, the resulting vector is an abstraction (reduction) of its content and hence loses information. To encapsulate as much information as possible from the list of numerical vectors, the present research proposes to use Higher-Order Statistics (HOS). In HOS, the mean ($\bar{X}$) and the standard deviation ($s$) are related to the first- and second-order moments, and one can calculate up to n-order moments. Skewness ($\text{sk}_i$), which can be calculated from the third-order moment of the data distribution, measures the direction of the tail in comparison to the normal distribution, where $Y$ is the median:

$$\text{sk}_i = \frac{\bar{X} - Y}{s} \tag{1}$$

If the resulting number is positive, the bulk of the data lies to the left, leaving the tail pointing to the right side of the distribution. If the resulting number is negative, the tail is on the left side of the distribution. Kurtosis ($k_i$), based on the fourth-order moment, measures how heavy the tails of a distribution are [13], where $N$ is the sample size:

$$k_i = \frac{\frac{1}{N}\sum_{i=1}^{N} \left(X_i - \bar{X}\right)^4}{s^4} \tag{2}$$

By applying the four moments of HOS to the data, each text is represented by a numerical vector of 200 dimensions, i.e., four sub-lists of 50 dimensions, one for each HOS moment (mean, standard deviation, skewness, and kurtosis). Encoding data with HOS has two advantages. First, compared with the embedding vectors of deep autoencoders, the resulting HOS vectors are meaningful and directly interpretable [14]. Second, by using HOS, the computational time for clustering the text data is reduced substantially, since the numerical vector becomes much shorter. Additionally, by using the four moments of HOS, each resulting numerical vector encapsulates more information than when using only one statistical value (first or second moment).
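As an illustration, a minimal sketch of this HOS encoding with NumPy and SciPy is shown below; the function name `hos_encode` and the random placeholder data are illustrative, not the authors' code.

```python
# Sketch of the HOS encoding: stack a text's word vectors into a matrix
# and concatenate the four column-wise moments into one 200-dim vector.
import numpy as np
from scipy.stats import skew, kurtosis

def hos_encode(word_vectors):
    # word_vectors: one 50-dim vector per word, shape (n_words, 50).
    m = np.asarray(word_vectors)
    return np.concatenate([
        m.mean(axis=0),        # first moment: mean
        m.std(axis=0),         # second moment: standard deviation
        skew(m, axis=0),       # third moment: skewness
        kurtosis(m, axis=0),   # fourth moment: kurtosis
    ])                         # -> shape (200,)

doc_vector = hos_encode(np.random.randn(100, 50))  # e.g., a 100-word abstract
print(doc_vector.shape)  # (200,)
```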

#### **2.4 Data representation and clustering**

Clustering and representation techniques assist with adequately exploring a data collection and identifying clusters of items that share similar properties [15]. For example, the following algorithms have been used to cluster text: Support Vector Machine (SVM) [16], k-means [17], Principal Component Analysis (PCA) [18], and Kohonen Self-Organizing Map (SOM) [19]. A full review of different clustering algorithms can be found in [20]. As shown in the work of [19], the unsupervised ML algorithm SOM [21, 22] has proven to deliver excellent performance when clustering text data and reducing its dimensionality. Additionally, as presented in the work of [23]:

*"SOM acts as a nonlinear data transformation in which data from a high-dimensional space are transformed into a low-dimensional space (usually a space of two or three dimensions), while the topology of the original high-dimensional space is preserved. SOM has the advantage of delivering two-dimensional maps that visualizes data clusters that reflect the topology of the original high-dimensional space."*

In the proposed experiment, the algorithm of choice for clustering is SOM, which takes advantage of both clustering and dimensionality reduction [21, 22]. As the quotation above notes, SOM transforms data from a high-dimensional space into a low-dimensional space (usually two or three dimensions) while preserving the topology of the original space. Topology preservation means that if two data points are similar in the high-dimensional space, they are necessarily close in the new low-dimensional space and, hence, are placed within the same cluster. This low-dimensional space, usually represented by a planar grid with a fixed number of points, is called a map. Each node of this map has specific coordinates ($x_{i,1}$, $x_{i,2}$) and an associated n-dimensional vector or Best Matching Unit (BMU), such that similar data points in the high-dimensional space are given similar map coordinates. Moreover, each node of the map represents the average of the n-dimensional original observations that, after iteration, belong to this node [14]. In short, SOM helps navigate a dataset: it projects the many dimensions of a dataset onto one or two dimensions, allowing a deep understanding of that dataset.
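As an illustration, the following sketch clusters the 200-dimensional HOS vectors with a SOM using the third-party MiniSom library, a stand-in for whatever implementation the study used; the map size, training parameters, and placeholder data are assumptions.

```python
# Sketch of SOM clustering over the 200-dim HOS text vectors with MiniSom.
import numpy as np
from minisom import MiniSom

data = np.random.randn(500, 200)          # placeholder for the encoded texts

som = MiniSom(10, 10, input_len=200,      # 10x10 map, 200-dim inputs
              sigma=1.0, learning_rate=0.5)
som.random_weights_init(data)
som.train_random(data, num_iteration=5000)

# Each text is assigned to its Best Matching Unit (BMU) on the 2-D map;
# texts mapped to the same node fall within the same cluster.
bmu = som.winner(data[0])
print(bmu)  # e.g., (3, 7)
```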

### **3. Results**

At this point, the text of the academic publications and the humanitarian mission statements are comparable since both were encoded with the same method (Word2Vec). Therefore, they can be clustered using the SOM algorithm presented in the previous section. The first attempts to unify the narratives showed that the
