
### **Chapter 5**

## Application of Jump Diffusion Models in Insurance Claim Estimation

*Leonard Mushunje, Chiedza Elvina Mashiri, Edina Chandiwana and Maxwell Mashasha*

### **Abstract**

We investigated whether general insurance claims are normal or rare events, testing for systematic, discontinuous, or sporadic jumps using the Brownian motion approach and Poisson processes. Using firm quarterly data from March 2010 to December 2018, we hypothesized that claims with high positive (negative) slopes are more likely to have large positive (negative) jumps in the future. As such, we expected salient properties of volatile jumps in the written products/contracts. We found that insurance claims for general insurance quoted products cease to be normal: there exist at times some jumps, especially during holidays and weekends. Such jumps are not healthy for the capital structures of firms and, as such, need attention. However, it should be noted that gaps or jumps (unless of specific forms) cannot be hedged by employing internal dynamic adjustments. This means that jump risk is non-diversifiable, and such jumps should be given more attention.

**Keywords:** insurance claims, jumps, diffusion models, general insurance, volatility, reserving

### **1. Introduction**

Insurance claim jumps are irregularities in the claims frequency from the policyholder to the insurer. They are crucial and fundamental to the understanding and tackling of insurable risks. We therefore explore insurance claim jumps in general insurance products. Specifically, we investigate whether claims are associated with systematic, sporadic, or discontinuous jumps, or whether they follow a normal process. Our aim was to explore whether insurance claims are rare or normal events, using the Brownian motion model and Poisson processes to test for diffusion and jump risk. The second aim was to explore how the identified jumps affect the company's solvency status. We put forward that knowledge of claim jumps is useful for proper pricing of products and better claim reserve calculations. We hypothesize that persistent claim jumps lead to the ruin problem. Furthermore, we conjectured that claims with high positive (negative) slopes are more likely to have large positive (negative) jumps in the future. A mismatch between liabilities and assets is central to insurance.

High frequency claims exhibit fat-tailed distributions (excess kurtosis) and skewness, and are in most cases clustered together. The infrequent movements of large magnitude in claim counts are attributed to sudden jumps, which are what we explore in this study.

Literally, diffusion models are tools used to describe the movement, decay, and evolution of products or items in a given environment over a specified period. The variables are normally random in nature. The general application of diffusion processes is to describe the evolution of an asset's or product's behavior over time in terms of prices or returns. In finance, we see the application of these models in explaining the evolution of asset returns [1, 2]. Randomness and persistence are two salient properties of claim jumps and volatility. Jump diffusion models are applied in the financial arena to estimate the volatilities of both stock prices and returns [3–6]. The statistical properties of claim amounts have long been of interest to insurers and actuaries in pricing and risk management. Higher order expectations are less considered, as much of the information about any financial or insurance data is believed to be carried by the standard arithmetic mean and standard deviation. However, special cases like positive kurtosis imply concave, U-shaped implied Black–Scholes volatility (IV) curves. Practitioners rarely take higher order expectations in statistical distributions such as the Gaussian, binomial, Poisson, and so on.

Claimant distributions are largely modeled using the compound Poisson and gamma distributions, where the former captures the frequency and the latter the severity of the claims [7]. No serious attention has been paid to the jumps associated with insurance claims, nor to their effects. In general insurance, where the frequency of claim arrival is high, jump analysis is quite necessary, most of all for economical reserving and capital solvency.

In option pricing, Black–Scholes-type formulae are used to price European options on written underlying assets such as stocks, foreign currencies, commodities, and interest rates. We therefore intend to apply diffusion models to the evolution and behavior of insurance claims for the written non-life products. The jump diffusion model chosen in this study can potentially explain the evolution and behavior of claims (frequency and severity) more accurately, at the expense of making the market incomplete, since jumps in premiums cannot be hedged easily. The existence of claim reserves such as the unearned premium reserve is a key indication of jumps in premium payments. However, underestimation commonly burdens insurers. Underestimation is the tendency to derive and provide values that are excessively low and unfavorable. In our context, underestimation negatively affects the insurer's capital structure and reserving. Thus, jumps are indeed an important aspect that should be taken into consideration at regular times. We do this using the Gaussian and Poisson models and by extending Merton's [1, 8] jump-diffusion model, which we present in our methods section. The remainder of our paper is organized as follows. The next section generalizes our jump diffusion models to a firm level. Empirical tests for the presence of jump components in the claims are contained in Section 3. Section 4 concludes the paper.

### **2. Materials and tools**

The study employed the model contained in [1, 5]. We interpret the model as one containing a finite number of insurance contracts, insurers, and insured parties. The model is based on the following assumptions:


$$\frac{dC_j}{C_j} = \alpha_j\,dt + \sigma_j\,dZ_j + \left(-\lambda_j K_j\,dt + \pi_j\,dY_j\right), \quad j = 1, \dots, m \tag{1}$$

where *Cj*(*t*) is the claim amount of contract *j* at time *t*; *αj*, *σj*, *λj*, and *Kj* are constants, where *αj* and *σj* are the drift and diffusion components, respectively; *dZj* is a Wiener process; *dYj* is a Poisson process with parameter *λj*; *πj* is the jump amplitude with expected value equal to *Kj*; and *dZj*, *dYj*, and *πj* are independent.


We now rewrite assumption 4 in an equivalent, alternative way that separates the systematic and unsystematic risk components.

Consider the diffusion part of assumption 4,

$$dD_j = \alpha_j\,dt + \sigma_j\,dZ_j, \quad j = 1, \dots, m \tag{2}$$

Following the argument of Ross [9], expression (2) implies that there exists a set {*uj*, *fj*, *gj*, *dΦ*, *dWj*}, *j* = 1, …, *m*, such that

$$dD_j = \alpha_j\,dt + f_j\,d\Phi + g_j\,dW_j, \quad j = 1, \dots, m \tag{3}$$

where $f_j^2 + g_j^2 = \sigma_j^2$, and *dΦ*, *dWj* are Wiener processes;

$$E\left[d\Phi\,dW_j\right] = 0, \quad j = 1, \dots, m;$$

and,

$$\sum_{j=1}^{m} u_j = 1, \quad \sum_{j=1}^{m} u_j\left(g_j\,dW_j\right) = 0, \quad \sum_{j=1}^{m} u_j\,\alpha_j > r \tag{4}$$

It is always possible to decompose a finite number of normal random variables into a common factor, *dΦ*, and error terms, *dWj*, which are normally distributed. The key property of normal claims employed here is that a covariance of zero implies statistical independence. The same assumption is confirmed in the analysis of asset prices and returns [10]. Note that *dΦ* and *dWj* will be independent of *dYj* and *πj* by assumption 4. This decomposition gives *dΦ* the interpretation of the common risk factor, with the *dWj* capturing the unsystematic risk.

Substituting expressions (3) and (4) into (1) gives assumption 6: there are *m* risky insurance contracts whose claims satisfy:

$$\frac{dC_j}{C_j} = \alpha_j\,dt + f_j\,d\Phi + g_j\,dW_j + \left(-\lambda_j K_j\,dt + \pi_j\,dY_j\right), \quad j = 1, \dots, m \tag{5}$$

where *Cj*(*t*) is the claim of contract *j* at time *t*; *αj*, *fj*, *gj*, *λj*, and *Kj* are constants; *dΦ*, *dWj* are Wiener processes; *dYj* is a Poisson process with parameter *λj*; *πj* is the jump amplitude with expected value equal to *Kj*; and *dΦ*, *dWj*, *dYj*, *πj* are independent. The jump component in expression (5), $\left(-\lambda_j K_j\,dt + \pi_j\,dY_j\right)$, implies that insurance claims can have discontinuous sample paths. This generalizes existing models.

### **3. Data and model**

This section tests the claims of the written insurance contracts to see if they contain jumps. If no jump component is present, this would be consistent with the proposition of the previous deduction. In addition, it would imply that the claims are normal events and thus satisfy instantaneous claim reserve calculation frameworks such as the chain ladder method and pricing models (the collective risk model). We used the written insurance contracts and the recorded claims for a period spanning March 2010 to December 2018. We performed the following hypothesis tests:

*H*0, jump risk is diversifiable.

*H*1, jump risk is non-diversifiable.

From the above hypotheses, we will see whether jump risk leads to capital insolvency for insurance firms. We will survey the sample path of the claims. To advance the testing procedure, note that under expression (5) the insurance claims dynamics are given by:

$$\frac{dC}{C} = \sum_{j=1}^{m} C_j\,\alpha_j\,dt + \left(\sum_{j=1}^{m} C_j\,f_j\right) d\Phi + \sum_{j=1}^{m} C_j\left(g_j\,dW_j - \lambda_j K_j\,dt + \pi_j\,dY_j\right) + \log V_j \tag{6}$$

Where, *<sup>C</sup>* <sup>¼</sup> <sup>P</sup>*<sup>n</sup> <sup>j</sup>*¼<sup>1</sup>*mjCj*, *log Vj* � *<sup>i</sup>:i:d:<sup>N</sup> <sup>α</sup>*, *<sup>σ</sup>*<sup>2</sup> ð Þ, normally distributed and models jumps in claims. Under the null hypothesis, expression (6) reduces to:

$$\frac{dC}{C} = \alpha\,dt + \sigma\,d\Phi \tag{7}$$

Where, *<sup>α</sup>* <sup>¼</sup> <sup>P</sup>*<sup>n</sup> <sup>j</sup>*¼<sup>1</sup>*mjα<sup>j</sup>* and *<sup>σ</sup>* <sup>¼</sup> <sup>P</sup>*<sup>n</sup> <sup>j</sup>*¼<sup>1</sup>*mjfj:* Under the alternative hypothesis, expression (6) reduces to:

$$\frac{dC}{C} = \alpha'\,dt + \sigma\,d\Phi + dq \tag{8}$$

where $dq = \pi\,dY$ denotes a Poisson-driven jump term with parameter *λ*, *π* is the jump amplitude with expected value equal to *K*, and $\alpha' = \alpha - \lambda K$.

Another assumption is added to (8), namely that (1 + *π*) has a lognormal distribution with parameters (*a*, *b*<sup>2</sup>). We add this assumption to simplify the maximum likelihood estimation of the parameters of Eqs. (7) and (8).


#### **Table 1.**

*Diffusion process parameter estimates.*

Now, we conveniently rewrite the hypotheses to be tested as follows: *H*0, jump risk is diversifiable,

$$\frac{dC}{C} = \alpha\,dt + \sigma\,d\Phi \tag{9}$$

*H*1, jump risk is non-diversifiable

$$\frac{dC}{C} = \alpha'\,dt + \sigma\,d\Phi + dq \tag{10}$$

and ð Þ *<sup>π</sup>* is dispersed lognormal *<sup>a</sup>*, *<sup>b</sup>*<sup>2</sup>

Now, to properly test the stated null hypothesis, a likelihood ratio test can be used: $\Lambda = -2\left(\ln L_r - \ln L_u\right)$, where $L_r$ is the likelihood value for the restricted density function (i.e., the null hypothesis, Eq. (9)) and $L_u$ is the likelihood value for the unrestricted density function (i.e., the alternative hypothesis, Eq. (10)).
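For concreteness, the test can be sketched numerically as below. This is a minimal illustration, assuming claim growth rates observed at a fixed interval and the standard Poisson-mixture-of-normals likelihood for the jump alternative; the function names, starting values, and truncation level `k_max` are our own choices, not the chapter's implementation.

```python
# Minimal sketch: MLE of Eqs. (9) and (10) and the LR statistic.
import math
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm, chi2

def nll_diffusion(params, r, dt):
    """Negative log-likelihood under Eq. (9): r ~ N(alpha*dt, sigma^2*dt)."""
    alpha, sigma = params[0], abs(params[1]) + 1e-12
    return -np.sum(norm.logpdf(r, loc=alpha * dt, scale=sigma * np.sqrt(dt)))

def nll_jump(params, r, dt, k_max=10):
    """NLL under Eq. (10): conditional on k jumps, r is normal with mean
    alpha'*dt + k*a and variance sigma^2*dt + k*b^2 (lognormal jump sizes)."""
    alpha, sigma, lam, a, b = params
    sigma, lam, b = abs(sigma), abs(lam), abs(b)
    dens = np.zeros_like(r, dtype=float)
    for k in range(k_max + 1):
        p_k = math.exp(-lam * dt) * (lam * dt) ** k / math.factorial(k)
        dens += p_k * norm.pdf(r, loc=alpha * dt + k * a,
                               scale=np.sqrt(sigma ** 2 * dt + k * b ** 2) + 1e-12)
    return -np.sum(np.log(dens + 1e-300))

def jump_lr_test(r, dt=0.25):
    """Return (Lambda, p-value); the jump model adds 3 parameters."""
    x0 = [r.mean() / dt, r.std() / np.sqrt(dt)]
    fit_r = minimize(nll_diffusion, x0, args=(r, dt), method="Nelder-Mead")
    fit_u = minimize(nll_jump, x0 + [1.0, 0.0, 0.1], args=(r, dt),
                     method="Nelder-Mead")
    lam_stat = 2.0 * (fit_r.fun - fit_u.fun)   # = -2(ln Lr - ln Lu)
    return lam_stat, chi2.sf(lam_stat, df=3)
```

Note that, because *λ* = 0 lies on the boundary of the parameter space under the null, the χ² reference distribution is only approximate.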

**Table 1** presents estimates of the parameters of the diffusion-only process for diverse observation intervals and time periods. The results suggest that the total claims frequency and severity are not constant over time. The total standard deviation of claims on the firm is measured by the total claims index over the eight-year period.

**Table 1** summarizes the parameter estimates of the diffusion model over the time horizon for a basket of diversified observations. With these parameter values, the jump-diffusion model can safely be used to infer the likely consequences of claim jumps for efficient insurance engineering. The jump probability is our key tool for dealing with ruin issues and proper reserve estimation. Jump deviance is the standard deviation of the jumps, which gives the spread of the claim jumps (positive or negative) over time for the written contracts.

### **4. Methodology**

Throughout this paper, we assume *Ct* to be the claim amount of each insurance contract at time *t*, whose dynamics are given by:

$$\frac{dC_t}{C_t} = (\mu - \lambda\kappa)\,dt + \sigma\,dB_t + \left[e^{J} - 1\right]dN_t, \tag{11}$$

where *μ* is the instantaneous expected claim amount per unit time, and *σ* is the instantaneous volatility per unit time. The stochastic process *Bt* is a standard Wiener process under the market measure *P*. The process *N*(*t*) is a Poisson process, independent of the jump sizes *J* and the Wiener process *Bt*, with arrival intensity *λ* per unit time under the measure *P*, so that its increments satisfy the following:

$$dN_t = \begin{cases} 1, & \text{with probability } \lambda\,dt \\ 0, & \text{with probability } 1 - \lambda\,dt \end{cases} \tag{12}$$

The expected proportional jump size is:

$$\kappa = E\left[e^{J} - 1\right] \tag{13}$$

In this study, jumps are assumed independent of each other as they arrive at different times. We then defined an information set through a filtered probability space (*Ω*, *F*, {*Ft*}, *P*), where the filtration {*Ft*} is the natural filtration generated by the Wiener process *Bt*. In the jump-diffusion model, the insurance claims *Ct* are defined to follow the random process given by:

$$\frac{dC_t}{C_t} = \mu\,dt + \sigma\,dW_t + (J - 1)\,dN_t, \tag{14}$$

The first two terms are familiar from the Black–Scholes [11] model: the drift rate *μ*, the volatility *σ*, and the random walk (Wiener process) *Wt*. The last term represents the jumps: *J* is the jump size and *N*(*t*) is the number of jump events that have occurred up to time *t*. *N*(*t*) is assumed to follow the Poisson process;

$$P\left(N_t = k\right) = \frac{\left(\lambda t\right)^{k}}{k!}\,e^{-\lambda t}, \tag{15}$$

where *λ* is the average number of jumps per unit time, so that *λt* is the expected number of jumps up to time *t*. Note that there is no specific distribution required for the jump sizes. However, a common choice is a log-normal distribution given as:

$$J \sim m\,e^{-\frac{v^{2}}{2} + v\,N(0,1)}, \tag{16}$$

where *N*(0, 1) is the standard normal distribution, *m* is the average jump size, and *v* is the volatility of the jump size. The key parameters that characterize the jump-diffusion model are *λ*, *m*, and *v*.
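A minimal simulation sketch of Eqs. (14)–(16) follows, assuming the E[*J*] = *m* parametrization of Eq. (16); all parameter values are illustrative placeholders, not the chapter's calibrated estimates.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_claims(c0=100.0, mu=0.05, sigma=0.2, lam=4.0,
                    m=1.1, v=0.3, T=1.0, n_steps=252):
    """One path of C_t: geometric diffusion plus multiplicative jumps."""
    dt = T / n_steps
    c = np.empty(n_steps + 1)
    c[0] = c0
    for t in range(n_steps):
        d_n = rng.poisson(lam * dt)     # jump count over dt, Eqs. (12)/(15)
        # product of the d_n jump factors J of Eq. (16); an empty product is 1
        jump = np.prod(m * np.exp(-v ** 2 / 2 + v * rng.standard_normal(d_n)))
        diff = np.exp((mu - sigma ** 2 / 2) * dt
                      + sigma * np.sqrt(dt) * rng.standard_normal())
        c[t + 1] = c[t] * diff * jump
    return c

path = simulate_claims()
```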

### **5. Model**

We use a basic Excel spreadsheet to model the effects of jump-diffusion on claims and the respective reserves. Our equation is as follows:

$$r_t = \alpha + \varepsilon_t + I_t u_t, \tag{17}$$

where *rt* is the log claim amount, *α* is the mean drift, and *εt* is the diffusion term, which follows a normal distribution and is calculated as *σ* ∗ NORMSINV(RAND()), where *σ* is the diffusion standard deviation. *It* is the indicator variable (0 or 1) for the absence or presence of a claim jump; its value is determined by the jump probability. *ut* is the value of the jump, which follows a normal distribution and is determined by

*E*[*u*] + *σu* ∗ NORMSINV(RAND()),

where *E*[*u*] and *σu* are the mean and standard deviation of the jump, respectively.
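The spreadsheet recursion translates directly into code. The following is a hypothetical Python rendering of Eq. (17), mirroring the NORMSINV(RAND()) construction (the inverse normal of a uniform draw); all parameter values are placeholders.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def simulate_log_claims(n=174, alpha=0.01, sigma=0.05,
                        p_jump=0.1, e_u=0.0, sigma_u=0.2):
    """r_t = alpha + eps_t + I_t * u_t, with I_t ~ Bernoulli(p_jump)."""
    eps = sigma * norm.ppf(rng.uniform(size=n))           # sigma * NORMSINV(RAND())
    i_t = rng.uniform(size=n) < p_jump                    # jump indicator (0 or 1)
    u_t = e_u + sigma_u * norm.ppf(rng.uniform(size=n))   # E[u] + sigma_u * NORMSINV(RAND())
    return alpha + eps + i_t * u_t

r = simulate_log_claims()
```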

### **6. Results**

Using Excel Visual Basic for Applications (VBA) and an R script, we perform our analysis using the calibrated parameters. Parameter calibration was done using the maximum likelihood approach. We present the model results in a concise, user-friendly manner: the user simply enters the input values on the designed user form and clicks 'run'. The inputs include the sigma (the volatility value, commonly called the implied volatility), the risk-free interest rate, the time component (T), and the number of simulation paths (we used 174 in our case, but this can be varied). The **Table 2** summary is in the subsequent tables in the Appendix. The number of jumps is then estimated and modeled within the selected number of paths and period. Ones denote jumps; otherwise the movements are normal claim movements.

### **7. Discussions**

We tested whether there are systematic jumps in insurance claims or whether claims are normal events. We found that insurance claims for general insurance quoted products cease to be normal: there exist at times some jumps, especially during holidays and weekends. Such jumps are not healthy for the capital structures of firms and, as such, need attention. However, it should be noted that gaps or jumps (unless of specific forms) cannot be hedged by employing internal dynamic adjustments. This means that, according to our tested hypothesis, jump risk is non-diversifiable. A firm can manage jump-induced risks by buying options. Option derivatives help the firm protect itself against negative jumps and their consequences for its capital status. If, however, it establishes its own reserves, it must ensure and enforce dynamic reserve adjustment: the reserves must increase as position values fall. This is an alternative option. The insurers must bear in mind that the cushion, so to speak, is dynamic. However, by dynamically hedging its own capital account, the insurer cannot wholly protect itself. Gaps or jumps are truly difficult to hedge; we thus need the idea of option hedging.

### **8. Conclusion**

This paper develops and tests sufficient conditions for a model in which insurance claims follow a jump-diffusion process. Based on weekly claims data, our results are that the reported claims contain a jump component of slightly high magnitude. We measure the jump component over both short (monthly) and longer (quarterly) intervals and find that weekends and holidays tend to account for much of the high jump component. The economic intuition is that jump risk is not diversifiable and hence can ruin the firm, leading to capital insolvency.

### **Appendix**








### **Table 2.**

*Claim amounts statistics, paths and jump forecasts.*


### **Author details**

Leonard Mushunje\*, Chiedza Elvina Mashiri, Edina Chandiwana and Maxwell Mashasha Department of Applied Mathematics and Statistics, Midlands State University, Gweru, Zimbabwe

\*Address all correspondence to: leonsmushunje@gmail.com

© 2022 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

### **References**

[1] Merton, R. C. (1975). Optimum consumption and portfolio rules in a continuous-time model. In: Stochastic Optimization Models in Finance (pp. 621–661). Academic Press.

[2] Merton, R. C. (1976). Option pricing when underlying stock returns are discontinuous. Journal of Financial Economics, 3, 125–144. DOI: 10.1016/0304-405X(76)90022-2

[3] Bollerslev, T., Todorov, V., and Li, S. Z. (2013). Jump tails, extreme dependencies, and the distribution of stock returns. Journal of Econometrics, 172(2), 307–324. DOI: 10.1016/j.jeconom.2012.08.014

[4] Cremers, M., Halling, M., and Weinbaum, D. (2015). Aggregate jump and volatility risk in the cross-section of stock returns. The Journal of Finance, 70(2), 577–614. DOI: 10.1111/jofi.12220

[5] Jarrow, R. A., and Rosenfeld, E. R. (1984). Jump risks and the intertemporal capital asset pricing model. Journal of Business, 57(3), 337-351.

[6] Jorion, P. (1989). Asset allocation with hedged and unhedged foreign stocks and bonds. Journal of Portfolio Management, 15(4), 49.

[7] IFE (2019) Institute and Faculty of Actuaries (IFoA) (2019). Actuarial Statistics Notes with revision questions.

[8] Merton, R. C. (1973). An intertemporal capital asset pricing model. Econometrica, 41(5), 867–887. DOI: 10.2307/1913811

[9] Ross, S. A. (1976). Options and efficiency. The Quarterly Journal of Economics, 90(1), 75-89.

[10] Fama, E. F. (1973). A note on the market model and the two-parameter model. Journal of Finance, 48, 1181– 1185. DOI:10.1111/j.1540-6261.1973. tb01449.x

[11] Black, F. and Scholes, M. (1973). The pricing of options and corporate liabilities. Journal of Political Economy, 81(3), 637–654. https://www.jstor.org/stable/1831029

### **Chapter 6**

### Fuzzy Perceptron Learning for Non-Linearly Separable Patterns

*Raja Kishor Duggirala*

### **Abstract**

Perceptron learning has wide applications in identifying interesting patterns in large data repositories. While iterating through their learning process, perceptrons update the weights associated with the input data objects or data vectors. Though perceptrons exhibit robustness in learning about interesting patterns, they perform well in identifying linearly separable patterns only. In the real world, however, we can find overlapping patterns, where objects may associate with multiple patterns. In such situations, a clear-cut identification of patterns is not possible in a linearly separable manner. On the other hand, fuzzy-based learning has wide applications in identifying non-linearly separable patterns. The present work attempts to experiment with algorithms for fuzzy perceptron learning, where perceptron learning and fuzzy-based learning techniques are implemented in an interfusion manner.

**Keywords:** perceptron learning, fuzzy-based learning, fuzzy C-means, interfusion, weighted distances, pattern recognition, sum of squared errors, clustering fitness

### **1. Introduction**

A learning system can be thought of as a collection of methods brought together to create an environment that facilitates different learning processes. Learning systems provide various types of learning resources and descriptions of procedures for obtaining quality results [1]. Learning systems find their applications in areas like image recognition, speech recognition, traffic prediction, e-mail spam and malware filtering, automatic language translation, medical diagnosis, etc. [2].

As data accumulates in large volumes in digital repositories, it has become essential to look for alternative approaches that yield better results in extracting interesting patterns from the repositories. Intelligent learning systems have been gaining attention from a wide range of researchers in recent years for extracting patterns from data repositories. Learning systems follow three kinds of approaches: supervised, unsupervised, and semi-supervised learning [3].

The concept of perceptron learning plays a critical role in pattern recognition, which has become a challenging problem in data science research. In recent years, perceptron learning algorithms have exhibited robust performance in identifying interesting patterns in large data repositories when compared to traditional supervised learning approaches [4]. A perceptron can be thought of as a computational prototype of a neuron. As a supervised learning approach, perceptron learning is used for linear classification of patterns. This learning approach uses already available labelled data to classify future data by predicting class labels.

In the literature, many researchers have experimented with perceptron learning for identifying interesting patterns in data. A novel autonomous perceptron model (APM) was proposed to address the complexity of traditional perceptron architectures [4]. APM is a nonlinear supervised learning model whose architecture uses the computational power of quantum bits (qubits). The researchers in [5], using a biophysical perceptron (BP), tried to simulate the pyramidal cells in the brain with a wide variety of active dendritic channels. The BP explores the ability of real neurons with extended non-linear dendritic trees to effectively perform the classification task of identifying interesting patterns in data. Many researchers have experimented with perceptron learning in a wide variety of ways. However, perceptron learning suffers several limitations: it works well only for linearly separable patterns. Though some researchers experimented with identifying non-linearly separable patterns, perceptron learning produced the best results for binary separation of patterns only [5]. Perceptron learning also performs poorly in the case of overlapping patterns, that is, when patterns do not have sharp boundaries.

Fuzzy-based learning, on the other hand, is found to perform well for overlapping patterns [6]. As a fuzzy-based learning approach, fuzzy C-means (FCM) is widely used by researchers for pattern recognition. A weighted local fuzzy regression model showed better efficiency than least squares regression for nonlinear and high-dimensional pattern recognition of a transport system in China [7]. The new kernelized fuzzy C-means clustering algorithm [8], which uses a kernel-induced distance function as a similarity measure, showed improved performance in identifying patterns when compared to the conventional fuzzy C-means technique. In many research findings, it is observed that the fuzzy-based learning approach was used in a wide variety of ways to achieve better results in extracting non-linear and overlapping patterns.

The present work attempts to experiment with fuzzy perceptron learning, which implements the perceptron learning and fuzzy-based learning techniques in an interfusion manner. In the research literature, we can find a good amount of work related to the combination of fuzzy logic with perceptron learning. The fuzzy neural network (FNN) was proposed for pattern classification, using supervised fuzzy clustering and a pruning algorithm to determine the precise number of clusters with proper centroids representing the patterns to be recognised [9]. In the fuzzy neural integrated networks [10], the researchers attempted to integrate the concepts of fuzzy sets and neural networks to deal with pattern recognition problems. In an enhanced algorithm for the fuzzy lattice reasoning (FLR) classifier, a new nonlinear positive valuation function was defined to produce better results for pattern classification [11]. These, however, along with many other research experiments in fuzzy perceptron learning, are supervised learning approaches only. Therefore, the present work focuses on the effective implementation of some techniques involved in perceptron and fuzzy-based learning systems for unsupervised learning, to identify interesting patterns in large datasets. As part of the present work, five algorithms are developed: two are related to perceptron learning, and one is the standard fuzzy C-means (FCM) algorithm. The remaining two algorithms are proposed by the present work and implement perceptron learning and fuzzy-based learning in an interfusion manner, using weights and weighted distances, respectively. All the algorithms are implemented using three benchmark datasets. The CPU time, clustering fitness (CF), and sum of squared errors (SSE) are taken into consideration for performance evaluation of the algorithms.

### **2. Perceptron learning**

Nowadays, the perceptron learning model can be thought of as a general computational model for identifying interesting patterns in a dataset. It takes an input, aggregates it along with the weights, and produces a result. A perceptron is used to learn patterns and relationships in data. Patterns help us know the interesting features around which objects may be grouped in a given population of data.

A perceptron may be configured for a specific application, such as pattern recognition and data classification through some learning process [12]. Perceptrons are information processing devices, which are built from interconnected elementary processing units. These units are called neurons. The perceptrons are robust in exhibiting their ability in distributed representation and computation, learning, generalisation, adaptivity, inherent contextual information processing, and fault tolerance [13].

The perceptron learning uses an iterative weight adjustment for the enhanced retrieval of patterns from a dataset. The iterative process converges to the weights, which produce the patterns that represent the different groups of data objects in the dataset uniquely. While operating for learning on patterns, the perceptrons use weights in connection to every input vector. A weight represents the information used by the perceptron to solve a problem [14].

The perceptron with multiple neurons is shown in **Figure 1**.

In **Figure 1**, *X*1, *X*2, … , *Xn* are the *n* input vectors and *Y*1, *Y*2, … , *Ym* are the *m* neurons. The input vector *X*1 is connected to neurons *Y*1, *Y*2, … , *Ym* with weights *W*11, *W*12, … , *W*1*m*, respectively; the input vector *X*2 is connected to the neurons with weights *W*21, *W*22, … , *W*2*m*, respectively; and so on, so that the input vector *Xn* is connected to the neurons with the weights *Wn*1, *Wn*2, … , *Wnm*, respectively. The weights of all input vectors for all neurons are formulated as the weight matrix shown below.
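In matrix form, the weight matrix *W* collects these entries as:

$$W = \begin{pmatrix} W_{11} & W_{12} & \cdots & W_{1m} \\ W_{21} & W_{22} & \cdots & W_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ W_{n1} & W_{n2} & \cdots & W_{nm} \end{pmatrix}$$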


**Figure 1.** *A perceptron with multiple neurons.*

Though perceptron learning exhibits robustness in identifying patterns in data repositories, it works well only for linearly separable patterns, that is, patterns with sharp boundaries. However, in the real world, we may find overlapping patterns, that is, non-linearity in pattern associativity, where data objects may associate with multiple patterns. In such situations, the perceptron learning approach may struggle to identify the patterns clearly. On the other hand, fuzzy-based learning has wide applications in identifying patterns in the overlapping scenario. In the present work, two algorithms are implemented for perceptron learning. They are discussed in the following subsections.

### **2.1 Perceptron learning using weights (PLW)**

This algorithm implements perceptron learning using weights [12]. With each input data vector, a weight is associated corresponding to each pattern. To generate the initial weights, one iteration of the K-means algorithm is performed. The results of the K-means iteration are used to compute the weight matrix. This weight matrix is repeatedly updated in the subsequent iterations. For each input data vector, weights are computed corresponding to every pattern. The input data vector is associated with the pattern for which the weight is maximum. This process is repeated in every iteration. The algorithm terminates when there is no change in the association of data vectors to the patterns. The algorithm for perceptron learning using weights is given below.

### *2.1.1 Algorithm PLW*

Step 1: Determine the number of patterns, *k*, to be recognised from the dataset.

Step 2: Select *k* points randomly from the dataset and set them as cluster seeds to correspond to the patterns to be recognised.

Step 3: Perform one iteration of K-means algorithm.

Step 4: Using the results of the K-means iteration, compute cluster-wise initial weights.

Step 5: Repeat steps 6–8 until the stopping condition.

Step 6: Generate weight matrix, where each element *Wij* is computed as:

$$W_{ij}(i+1) = W_{ij}(i) + lr \ast \left(\left\|X_i\right\| - W_{ij}(i)\right) \tag{1}$$

Here, *Wij*(*i* + 1) is the weight of the *i*th data point *Xi* for the *j*th cluster at iteration (*i* + 1), *Wij*(*i*) is the weight of the *i*th data point for the *j*th cluster at iteration *i*, ||*Xi*|| is the norm of data point *Xi*, and *lr* is the learning rate. The *lr* may assume a value between 0 and 1. To avoid possible bias in the computations, *lr* is set to 0.5.

Step 7: Assign points to clusters using weights.

Step 8: Update cluster means, that is, refine patterns.

[End of step 5 loop]

Step 9: [End of algorithm]
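A minimal NumPy sketch of one pass of steps 6–8 follows, assuming *X* is an (*N*, *d*) array, *W* the (*N*, *k*) weight matrix from step 4, and *means* the (*k*, *d*) cluster means; function and variable names are our own. Note that Eq. (1) pulls every cluster's weight for a point toward ||*Xi*||, so the discrimination between clusters comes from the cluster-wise initialization.

```python
import numpy as np

def plw_iteration(W, X, means, lr=0.5):
    """One PLW pass: update weights by Eq. (1), reassign, refine means."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)   # ||X_i|| for each point
    W_new = W + lr * (norms - W)                       # Eq. (1), step 6
    labels = np.argmax(W_new, axis=1)                  # step 7: largest weight wins
    for j in range(means.shape[0]):                    # step 8: refine patterns
        if np.any(labels == j):
            means[j] = X[labels == j].mean(axis=0)
    return W_new, labels, means
```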

### **2.2 Perceptron learning using weighted distances (PLWD)**

This algorithm implements perceptron learning using weighted distances [15]. With each input data vector, a weighted distance is associated corresponding to each pattern. To generate the initial weighted distances, one iteration of the K-means algorithm is performed.


Using the results of K-means, the weight matrix is computed. This weight matrix is used to compute the weighted distances for each input data vector. The data vector is associated with the pattern for which the weighted distance is minimum. This weight matrix is repeatedly updated in the subsequent iterations to compute the new weighted distances. This process repeats in every iteration. The algorithm terminates when there is no change in the association of data vectors to the patterns. The algorithm for perceptron learning using weighted distances is given below.

### *2.2.1 Algorithm PLWD*

Step 1: Determine the number of patterns, *k*, to be recognised from the dataset.

Step 2: Select *k* points randomly from the dataset and set them as cluster seeds *μj* (*j* = 1, 2, … , *k*) to correspond to the patterns to be recognised.

Step 3: Perform one iteration of K-means algorithm.

Step 4: Using the results of K-means iteration, compute cluster wise initial weights.

Step 5: Repeat steps 6–10 until the stopping condition.

Step 6: Generate weight matrix *W* using Eq. (1).

Step 7: For each data point *Xi*, compute the Euclidean distance *d*(*Xi*, *μj*) as follows:

$$d\left(X_i, \mu_j\right) = \sqrt{\sum_{l=1}^{d} \left(x_{il} - \mu_{jl}\right)^2} \tag{2}$$

Here, *Xi* is the *i*th data point and *μj* is the mean vector of cluster *j*.

Step 8: For each data point, compute the weighted distances as follows:

$$Wd_{ij} = W_{ij}(i+1) \cdot d\left(X_i, \mu_j\right) \tag{3}$$

Step 9: Assign points to clusters using weights.

Step 10: Update cluster means, that is, refine patterns.

[End of step 5 loop]

Step 11: [End of algorithm]
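A companion sketch for the PLWD assignment (steps 6–9) under the same assumptions: weights are updated by Eq. (1), Euclidean distances come from Eq. (2), and Eq. (3) combines them; each point joins the cluster with the smallest weighted distance.

```python
import numpy as np

def plwd_assign(W, X, means, lr=0.5):
    """One PLWD pass: assign points by minimum weighted distance."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    W_new = W + lr * (norms - W)                                       # Eq. (1)
    dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)  # Eq. (2)
    labels = np.argmin(W_new * dists, axis=1)                          # Eq. (3), step 9
    return W_new, labels
```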

Though the perceptron learning algorithms are experimented widely by many researchers, they exhibit their robustness in identifying linearly separable patterns only.

### **3. Fuzzy-based learning**

Fuzzy-based learning is used to handle the concept of partial truth, where the truth value may range between completely true and completely false [16]. It is an approach that allows multiple possible truth values to be processed for the same data object. In fuzzy-based learning, data objects are assumed to be associated with multiple patterns. For each data object, the degree of association is measured as a membership. This membership value may range between 0 and 1 (1 being high similarity and 0 being no similarity with the pattern).

Fuzzy-based learning techniques focus on modelling uncertain and vague information that is found in the real world situations. These techniques deal with the

patterns whose boundaries cannot be defined sharply [17, 18]. By fuzzy-based learning, one can know whether data objects fully or partially associate with the patterns under consideration, based on their memberships of association [19]. Among the techniques of fuzzy-based learning, fuzzy C-means (FCM) is the most well-known, as it has the advantage of robustness to obscure information about the patterns [20, 21]. FCM is widely studied and applied in geological shape analysis [22], medical diagnosis [23], automatic target recognition [24], meteorological data [20], pattern recognition, image analysis, image segmentation and image clustering [25–27], agricultural engineering, astronomy, chemistry [28], detection of polluted sites [29], etc. The following section presents a brief discussion of the FCM algorithm.

### **3.1 Fuzzy C-means (FCM)**

The fuzzy C-means (FCM) is a technique that uses degree of membership for natural interpretation of patterns recognised [30]. The FCM associates the data vectors among *k* patterns. Each data vector may associate with each pattern with a membership degree. The membership of a data vector towards a pattern can range between 0 and 1.

The FCM algorithm is given below [31]. Here, *U* is the *k* × *N* membership matrix. While computing the cluster means and updating the membership matrix at each iteration, the FCM uses the fuzzifier factor, *m*. For most cases, *m* ranging between 1.5 and 3.0 gives good results [32]. In the present work, in all the experiments, *m* is set to 1.5.

*3.1.1 Algorithm FCM*

Step 1: Determine the number of patterns, *k*, to be recognised from the dataset.

Step 2: Select *k* points randomly from the dataset and set them as cluster seeds *μj* (*j* = 1, 2, … , *k*) to correspond to the patterns to be recognised.

Step 3: Perform one iteration of K-means algorithm. Set *t* = 0.

Step 4: Using the results of the K-means iteration, compute the membership matrix *U*<sup>(0)</sup> of size *k* × *N*.

Step 5: Repeat steps 6–8 until the stopping condition.

Step 6: [Refine patterns] Update the mean of *j*th cluster *μ<sup>j</sup>* as follows:

$$\mu_j = \frac{\sum_{i=1}^{N} \left(u_{ij}\right)^{m} X_i}{\sum_{i=1}^{N} \left(u_{ij}\right)^{m}} \tag{4}$$

Here, *uij* is the membership degree of the data point *Xi* w.r.t. *j*th pattern and *m* is the fuzzifier factor.

Step 7: Compute the new membership matrix using:

$$u_{ij}^{\,t+1} = \left[\sum_{l=1}^{k} \left(\frac{\left\|X_i - \mu_j^{\,t}\right\|^2}{\left\|X_i - \mu_l^{\,t}\right\|^2}\right)^{\frac{1}{m-1}}\right]^{-1} \tag{5}$$

Step 8: Assign points to clusters using membership degrees. Set *t* = *t* + 1.

[End of step 5 loop]

Step 9: [End of algorithm]
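One FCM iteration (Eqs. (4) and (5)) can be sketched as follows, with the fuzzifier *m* = 1.5 used in this chapter; the variable names and vectorized layout are our own.

```python
import numpy as np

def fcm_step(X, U, m=1.5, eps=1e-12):
    """One FCM iteration: means by Eq. (4), memberships by Eq. (5).
    X is (N, d); U is the (k, N) membership matrix, columns summing to 1."""
    Um = U ** m                                        # fuzzified memberships
    means = (Um @ X) / Um.sum(axis=1, keepdims=True)   # Eq. (4)
    d2 = ((X[None, :, :] - means[:, None, :]) ** 2).sum(axis=2) + eps
    U_new = 1.0 / ((d2[:, None, :] / d2[None, :, :]) ** (1.0 / (m - 1.0))).sum(axis=1)
    return means, U_new
```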

### **4. Fuzzy perceptron learning**

The fuzzy perceptron learning works in an interfusion manner, where fuzzy logic is combined with perceptron learning for identifying non-linear and overlapping patterns. Much research work may be found in the literature where fuzzy perceptron learning is experimented with in different applications [33, 34]. However, those experiments are confined to supervised learning only. The present work attempts to experiment with fuzzy perceptron learning in unsupervised cases. The present work proposes two algorithms: one for fuzzy perceptron learning using weights and the other for fuzzy perceptron learning using weighted distances.

### **4.1 Fuzzy perceptron learning using weights (FPLW)**

This algorithm implements the perceptron learning using weights and FCM techniques in an interfusion manner. These techniques are performed in alternating iterations until the termination condition. Initially, one iteration of the K-means algorithm is performed. Using the results of K-means, initial weights are computed as mentioned in Section 2.1. Using these weights, the weight matrix is generated to perform one iteration of the perceptron learning algorithm, which associates the input data vectors to the patterns. Using the results of the perceptron learning step, the membership matrix is computed to perform one iteration of the FCM algorithm as mentioned in Section 3.1. The results of the FCM step are used to update the weight matrix for the next perceptron learning step. In this way, the perceptron learning and FCM algorithms are repeated in alternating iterations until the termination condition. The algorithm for fuzzy perceptron learning using weights (FPLW) is given below.

### *4.1.1 Algorithm FPLW*

Step 1: Determine the number of patterns, *k*, to be recognised from the dataset.

Step 2: Select *k* points randomly from the dataset and set them as cluster seeds *μj* (*j* = 1, 2, … , *k*) to correspond to the patterns to be recognised.

Step 3: Perform one iteration of K-means algorithm.

Step 4: Using the results of the K-means iteration, compute cluster-wise initial weights.

Step 5: Update cluster means *μj* (*j* = 1, 2, … , *k*).

Step 6: Repeat steps 7–13 until the stopping condition.

Step 7: Compute the weight matrix using Eq. (1).

Step 8: Assign points to clusters using weights.

Step 9: If there is no change in cluster assignment, then go to step 14.

Step 10: Update cluster means using Eq. (4).

Step 11: Generate the membership matrix *U* of size *k* × *N* using Eq. (5).

Step 12: Assign points to clusters using the membership matrix.

Step 13: If there is no change in cluster assignment, then go to step 14.

[End of Step 6 loop]

Step 14: [End of Algorithm]
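A minimal sketch of the FPLW interfusion loop follows, reusing the `plw_iteration` and `fcm_step` sketches from Sections 2.1 and 3.1 above; the initial *W*, *U*, and *means* are assumed to come from the K-means pass of steps 3–5.

```python
import numpy as np

def fplw(X, W, U, means, max_iter=100):
    """Alternate one perceptron step and one FCM step until assignments settle."""
    labels = np.argmax(W, axis=1)
    for _ in range(max_iter):
        W, plw_labels, means = plw_iteration(W, X, means)  # steps 7-8
        if np.array_equal(plw_labels, labels):             # step 9
            return plw_labels, means
        means, U = fcm_step(X, U)                          # steps 10-11
        labels = np.argmax(U, axis=0)                      # step 12
        if np.array_equal(labels, plw_labels):             # step 13
            return labels, means
    return labels, means
```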

### **4.2 Fuzzy perceptron learning using weighted distances (FPLWD)**

This algorithm implements the perceptron learning using weighted distances and FCM techniques in an interfusion manner. These techniques are performed in alternating iterations until the termination condition. Initially, one iteration of the K-means technique is performed.

Using the results of K-means, initial weights are computed as mentioned in Section 2.2. Next, one iteration of the perceptron learning algorithm is performed, in which the weight matrix is generated using the initial weights. Using this weight matrix, weighted distances are computed for every input vector *Xi* with respect to each pattern using Eq. (3). The weighted distances are used to associate the input vectors to the patterns. Using the results of the perceptron learning step, the membership matrix is computed to perform one iteration of the FCM algorithm as mentioned in Section 3.1. The results of the FCM step are used to compute the weight matrix for the next perceptron learning step. In this way, the perceptron learning and FCM steps are repeated in alternating iterations until the termination condition. The algorithm for fuzzy perceptron learning using weighted distances (FPLWD) is given below.

### *4.2.1 Algorithm FPLWD*

Step 1: Determine the number of patterns, *k*, to be recognised from the dataset.

Step 2: Select *k* points randomly from the dataset and set them as cluster seeds *μj* (*j* = 1, 2, … , *k*) to correspond to the patterns to be recognised.

Step 3: Perform one iteration of K-means algorithm.

Step 4: Using the results of the K-means iteration, compute cluster-wise initial weights.

Step 5: Update cluster means *μj* (*j* = 1, 2, … , *k*).

Step 6: Repeat steps 7–15 until the stopping condition.

Step 7: Compute the weight matrix using Eq. (1).

Step 8: Compute the Euclidean distances using Eq. (2).

Step 9: Compute the weighted distances using Eq. (3).

Step 10: Assign points to clusters using the weighted distances.

Step 11: If there is no change in cluster assignment, then go to step 16.

Step 12: Update cluster means using Eq. (4).

Step 13: Generate the membership matrix *U* of size *k* × *N* using Eq. (5).

Step 14: Assign points to clusters using the membership matrix.

Step 15: If there is no change in cluster assignment then go to step 16.

[End of Step 6 loop]

Step 16: [End of Algorithm]
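Under the same assumptions, a sketch of the FPLWD loop, which mirrors FPLW but assigns points with the weighted distances of Eq. (3) via the `plwd_assign` sketch above:

```python
import numpy as np

def fplwd(X, W, U, means, max_iter=100):
    """Alternate a weighted-distance perceptron step with an FCM step."""
    labels = np.argmax(W, axis=1)
    for _ in range(max_iter):
        W, plwd_labels = plwd_assign(W, X, means)   # Eqs. (1)-(3)
        if np.array_equal(plwd_labels, labels):
            return plwd_labels, means
        means, U = fcm_step(X, U)                   # Eqs. (4)-(5)
        labels = np.argmax(U, axis=0)
        if np.array_equal(labels, plwd_labels):
            return labels, means
    return labels, means
```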

### **5. Performance evaluation**

For performance evaluation of algorithms, CPU time in seconds, sum of squared errors [35] and clustering fitness (CF) [36] are taken into consideration and are calculated for all the algorithms.

### **5.1 Sum of squared errors**

The objective of pattern learning is to minimise the intra-cluster sum of squared errors (SSE). The smaller the SSE, the better the goodness of fit. The SSE for the results of each algorithm is computed using Eq. (6).

$$\text{SSE} = \sum_{j=1}^{k} \sum_{X_i \in C_j} \left(X_i - \mu_j\right)^2 \tag{6}$$

Here, *Xi* is the *i*th data point in the dataset, *μ<sup>j</sup>* (*j* = 1, … , *k*) is the mean of the cluster *Cj*, and *k* is the number of patterns to be recognised.
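As a sketch, Eq. (6) amounts to a few lines of NumPy (assuming labels 0, …, *k* − 1 aligned with the rows of *means*):

```python
import numpy as np

def sse(X, labels, means):
    """Eq. (6): squared distances of points to their own cluster mean."""
    return sum(((X[labels == j] - mu) ** 2).sum() for j, mu in enumerate(means))
```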

### **5.2 Cluster fitness**

While achieving high intra-cluster similarity, it is also important to achieve good separation of patterns.

So, it is also important to consider inter-cluster similarity when evaluating the performance of the algorithms. For this, the present work computes the clustering fitness (CF) as a performance criterion, which requires the calculation of both intra-cluster similarity and inter-cluster similarity. The computation of CF also requires the experiential knowledge, *λ*. The CF takes a higher value when the inter-cluster similarity is low and a lower value when the inter-cluster similarity is high. To make the computation of CF unbiased, the value of *λ* is taken as 0.5 [36].

### *5.2.1 Intra-cluster similarity for the cluster* Cj

It can be quantified via a function of the reciprocals of intra-cluster radii within each of the resulting clusters. The intra-cluster similarity of a cluster *Cj* (1 ≤ *j* ≤ *k*), denoted as *Stra*(*Cj*) [36], is defined by:

$$S_{tra}\left(C_j\right) = \frac{1+n}{1 + \sum_{l=1}^{n} \text{dist}\left(I_l, \text{Centroid}\right)} \tag{7}$$

Here, *n* is the number of items in cluster *Cj*, *Il* (1 ≤ *l* ≤ *n*) is the *l*th item in cluster *Cj*, and dist(*Il*, *Centroid*) calculates the distance between *Il* and the centroid of *Cj*, which is the intra-cluster radius of *Cj*. To smooth the value of *Stra*(*Cj*) and allow for possible singleton clusters, 1 is added to the denominator and numerator.

### *5.2.2 Intra-cluster similarity for one clustering result* C

It is denoted as *Stra*(*C*) [36]. It is defined by:

$$S_{tra}(C) = \frac{\sum_{j=1}^{k} S_{tra}\left(C_j\right)}{k} \tag{8}$$

Here, *k* is the number of resulting clusters in *C* and *Stra*(*Cj*) is the intra-cluster similarity for the cluster *Cj*.

### *5.2.3 Inter-cluster similarity*

It can be quantified via a function of the reciprocals of inter-cluster radii of the clustering centroids. The inter-cluster similarity for one of the possible clustering results *C*, denoted as *Ster*(*C*) [36] is defined by:

$$S_{ter}(C) = \frac{1+k}{1 + \sum_{j=1}^{k} \text{dist}\left(\text{Centroid}_j, \text{Centroid}^{\ast}\right)} \tag{9}$$

Here, *k* is the number of resulting clusters in *C* (1 ≤ *j* ≤ *k*), *Centroidj* is the centroid of the *j*th cluster in *C*, and *Centroid*\* is the centroid of all centroids of the clusters in *C*. We compute the inter-cluster radius of *Centroidj* by calculating dist(*Centroidj*, *Centroid*\*), which is the distance between *Centroidj* and *Centroid*\*. To smooth the value of *Ster*(*C*) and allow for a possible all-inclusive clustering result, 1 is added to the denominator and the numerator.

### *5.2.4 Clustering fitness*

The clustering fitness for one of the possible clustering results *C*, denoted as *CF* [36], is defined by:

$$\text{CF} = \lambda \cdot S_{tra}(C) + \frac{1 - \lambda}{S_{ter}(C)} \tag{10}$$

Here, *λ* (0 < *λ* < 1) is an experiential weight, *Stra*(*C*) is the intra-cluster similarity for the clustering result *C* and *Ster*(*C*) is the inter-cluster similarity for the clustering result *C*.
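The full CF computation (Eqs. (7)–(10)) can be sketched as follows, with *λ* = 0.5 as in the chapter; the Euclidean choice for dist(·,·) and all names are our assumptions.

```python
import numpy as np

def clustering_fitness(X, labels, lam=0.5):
    """CF of Eq. (10) from the intra- and inter-cluster similarities."""
    ks = np.unique(labels)
    centroids = np.array([X[labels == j].mean(axis=0) for j in ks])
    # Eqs. (7)-(8): intra-cluster similarity averaged over the clusters
    s_tra = np.mean([(1 + (labels == j).sum())
                     / (1 + np.linalg.norm(X[labels == j] - c, axis=1).sum())
                     for j, c in zip(ks, centroids)])
    # Eq. (9): inter-cluster similarity around the centroid of centroids
    grand = centroids.mean(axis=0)
    s_ter = (1 + len(ks)) / (1 + np.linalg.norm(centroids - grand, axis=1).sum())
    return lam * s_tra + (1 - lam) / s_ter             # Eq. (10)
```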

### **6. Experiments and results**

Experimental work has been carried out on a system with an Intel(R) Core(TM) i3-5005U CPU @ 2.00 GHz, 4 GB RAM, Windows 7 OS (64-bit), using JDK 1.7.0_45. Separate modules are written for each of the above discussed methods to observe the CPU time for clustering any dataset, keeping the cluster seeds the same for all methods. I/O operations are eliminated, and the CPU time observed is strictly for the clustering of the data.

Along with the proposed algorithms FPLW and FPLWD for fuzzy perceptron learning, experiments are also conducted with the algorithms PLW, PLWD and FCM for performance comparison. All the algorithms are executed using the benchmark datasets with varying number of patterns to be recognised. In the present work, Magic Gamma, Letter Recognition and Intrusion datasets are used from UCI ML data repository [37]. All the developed algorithms, PLW, PLWD, FCM, FPLW and FPLWD, are executed using these datasets for varying number of patterns to be recognised (*k* = 10, 11, 12, 13, 14, 15).

All the algorithms operate in an iterative manner and terminate when a stopping condition is met. The stopping condition is when there is no change in the pattern associativity of the data vectors. The termination condition is the same for all the algorithms.


Details of the datasets are available in **Table 1**.

**Table 1.** *Details of datasets.*

### **6.1 Observations with Magic Gamma dataset**

The results of all algorithms, using Magic Gamma dataset, with respect to CPU time in seconds, clustering fitness and sum of squared errors are shown in **Figures 2**–**4**, respectively.

### **6.2 Observations with Letter Recognition dataset**

The results of all algorithms, using Letter Recognition dataset, with respect to CPU time in seconds, clustering fitness and sum of squared errors are shown in **Figures 5**–**7**, respectively.

### **6.3 Observations with Intrusion dataset**

The results of all algorithms, using Intrusion dataset, with respect to CPU time in seconds, clustering fitness and sum of squared errors are shown in **Figures 8**–**10**, respectively.

**Figure 2.** *CPU time of each clustering method (Magic Gamma dataset).*

**Figure 3.** *Clustering fitness of each clustering method (Magic Gamma dataset).*

**Figure 4.** *SSE of each clustering method (Magic Gamma dataset).*

**Figure 5.** *CPU time of each clustering method (Letter Recognition dataset).*

**Figure 6.** *Clustering fitness of each clustering method (Letter Recognition dataset).*

In all the experiments, it is observed that the algorithm FPLW, which implements the perceptron learning using weights and the FCM techniques in an interfusion manner, is showing consistently better performance in terms of clustering fitness (CF) and SSE than the other algorithms.


**Figure 7.** *SSE of each clustering method (Letter Recognition dataset).*

**Figure 8.** *CPU time of each clustering method (Intrusion dataset).*

**Figure 9.** *Clustering fitness of each clustering method (Intrusion dataset).*

**Figure 10.** *SSE of each clustering method (Intrusion dataset).*

### **7. Conclusion**

The present experiment mainly focuses on the study of fuzzy perceptron learning for recognising non-linear patterns in datasets. Many researchers have contributed greatly towards fuzzy perceptron learning. However, their experiments are confined to supervised learning only. So, the present work experimented with fuzzy perceptron learning approaches for unsupervised learning. The work proposes two new algorithms, that is, FPLW and FPLWD. These algorithms are implemented using three benchmark datasets. Along with these algorithms, the algorithms for standard FCM and perceptron learning using weights and weighted distances are also implemented for performance comparison. For all the algorithms, the CPU time in seconds, clustering fitness (CF), and sum of squared errors (SSE) are taken into consideration for performance evaluation. All the developed algorithms are experimented with varying numbers of patterns (*k*) to be recognised.

In all the experiments, it is observed that the proposed algorithm for fuzzy perceptron learning using weights (FPLW) consistently shows better performance with respect to clustering fitness and SSE. The algorithm FPLW does take a little more time for its execution than the other algorithms. However, this could be considered negligible, as the main concern is clearly recognising the non-linear patterns in the datasets.


### **Author details**

Raja Kishor Duggirala Department of Computer Science Engineering, Dr. Lankapalli Bullayya College of Engineering, Visakhapatnam, Andhra Pradesh, India

\*Address all correspondence to: rajakishor@gmail.com

© 2022 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

### **References**

[1] Satapathy SK et al. EEG Brain Signal Classification for Epileptic Seizure Disorder Detection. ScienceDirect; 2019. ISBN 978-0-12-817426-5. DOI: https://doi.org/10.1016/C2018-0-01888-5

[2] Benton WC, Hat R, Raleigh. Machine learning systems and intelligent applications. IEEE Software. 2020;**37**: 43-49. DOI: 10.1109/MS.2020.2985224

[3] Krendzelak M. Machine Learning and Its Applications in e-Learning Systems. Stary Smokovec, Slovakia: IEEE, 2014 IEEE 12th IEEE International Conference on Emerging eLearning Technologies and Applications (ICETA); 2014. pp. 267-269. DOI:10.1109/ICETA.2014.7107596

[4] Sagheer A, Zidan M, Abdelsamea MM. A novel autonomous perceptron model for pattern classification applications. Entropy. 2019;**21**(8):763. DOI: 10.3390/e21080763

[5] Moldwin T, Segev I. Perceptron learning and classification in a modeled cortical pyramidal cell. Frontiers in Computational Neuroscience. 2020. DOI: 10.3389/fncom.2020.00033. https://www.frontiersin.org/articles/10.3389/fncom.2020.00033/full

[6] Leoni Sharmila S, Dharuman C, Venkatesan P. A fuzzy based classification—An experimental analysis, International Journal of Innovative Technology and Exploring Engineering. 2019;**8**(10):4634-4638

[7] Tan Y, Chen S. Pattern recognition based on weighted fuzzy C-means clustering. In: 6th International Congress on Image and Signal Processing (CISP). 2013. pp. 1061-1065. DOI: 10.1109/CISP.2013.6745213

[8] Das S, Baruah H. A new kernelized fuzzy C-means clustering algorithm with enhanced performance. Semantic Scholar; 2014. Corpus ID: 212563388

[9] Kulkarni A, Kulkarni N. Fuzzy neural networks for pattern recognition. Procedia Computer Science. 2020;**167**:2606-2616. DOI: 10.1016/j.procs.2020.03.321

[10] Baraldi A, Blonda P, Petrosino A. Fuzzy neural networks for pattern recognition. In: Marinaro M, Tagliaferri R, editors. Neural Nets WIRN VIETRI-97. Perspectives in Neural Computing. London: Springer; 1998. DOI: 10.1007/978-1-4471-1520-5\_2

[11] Jamshidi Khezeli YJ, Nezamabadipour H. Fuzzy lattice reasoning for pattern classification using a new positive valuation function. Advances in Fuzzy Systems. 2012;**2012**:206121. DOI: 10.1155/2012/206121

[12] Sivanandam SN, Sumathi S, Deepa SN. Introduction to Neural Networks Using Matlab 6.0. India: Tata McGraw Hill; 2008

[13] Nadal J-P, Parga N. Information processing by a perceptron in an unsupervised learning task. Network: Computation in Neural Systems. 1993;**4**(3):295-312. DOI: 10.1088/0954-898X_4_3_004

[14] Haykin S. Neural Networks: A Comprehensive Foundation. 2nd ed. New Delhi, India: Pearson Education; 2007

[15] Nalaie K, Ghiasi-Shirazi K, Akbarzadeh-T M. Efficient implementation of a generalized convolutional neural networks based on weighted Euclidean distance. In: 2017 7th International Conference on Computer and Knowledge Engineering (ICCKE). 2017. pp. 211-216. DOI: 10.1109/ICCKE.2017.8167877


[16] Novák V, Perfilieva I, Močkoř J. Mathematical Principles of Fuzzy Logic. Dordrecht: Kluwer Academic; 1999

[17] Zadeh LA. Fuzzy sets. Information and Control. 1965;**8**(3):338-353

[18] Lemiare J. Fuzzy insurance. ASTIN Bulletin. 1990;**20**(1):33-55

[19] Das S. Pattern recognition using fuzzy C-means technique. International Journal of Energy Information and Communications. 2013;**4**(1):1-14

[20] Lu Y, Ma T, Yin C, Xie X, Tian W, Zhong SM. Implementation of the fuzzy C-means clustering algorithm in meteorological data. International Journal of Database Theory and Application. 2013;**6**(6):1-18

[21] Kaltri K, Mahjoub M. Image segmentation by Gaussian mixture models and modified FCM algorithm. The International Arab Journal of Information Technology. 2014;**11**(1):11-18

[22] Bezdek JC, Trivedi M, Ehrlich R, Full WE. Fuzzy clustering: A new approach for geostatistical analysis. International Journal of System Measurement and Decision. 1981;**1**:13-23

[23] Bezdek JC. Feature selection for binary data-medical diagnosis with fuzzy sets. In: Proc. Nat. Comput. Conf. AFIPS Press; 1972. pp. 1057-1068

[24] Cannon RL, Dave JV, Bezdek JC. Efficient implementation of fuzzy C-means clustering algorithm. IEEE Transactions on Pattern Analysis and Machine Intelligence. 1986;**8**(2):248-255

[25] Cannon RL, Jacobs C. Multispectral pixel classification with fuzzy objective functions. Technical Report CAR-TR-51. College Park: Center for Automation Research, University of Maryland; 1984

[26] Gong M, Liang Y, Ma W, Ma J. Fuzzy C-means clustering with local information and kernel metric for image segmentation. IEEE Transactions on Image Processing. 2013;**22**(2):573-584

[27] Krinidis S, Krinidis M, Chatzis V. Fast and robust fuzzy active contours. IEEE Transactions on Image Processing. 2010;**19**(5):1328-1337

[28] Yong Y, Chongxun Z, Pan L. A novel fuzzy C-means clustering algorithm for image thresholding. Measurement Science Review. 2004;**4**(1):11-19

[29] Hanesch M, Scholger R, Dekkers MJ. The application of fuzzy C-means cluster analysis and non-linear mapping to a soil data set for the detection of polluted sites. Physics and Chemistry of the Earth, Part A. 2001;**26**(11–12):885-891

[30] Ghosh S, Dubey SK. Comparative analysis of K-means and fuzzy C-means algorithms. International Journal of Advanced Computer Science and Applications. 2013;**4**(4):35-39

[31] Klir GJ, Yuan B. Fuzzy Sets and Fuzzy Logic: Theory and Applications. India: Prentice Hall of India Private Limited; 2005

[32] Bezdek JC, Ehrlich R, Full W. FCM: The fuzzy C-means clustering algorithm. Computers and Geosciences. 1984;**10** (2–3):191-203

[33] Auephanwiriyakul S, Dhompongsa S. An investigation of a linguistic perceptron in a nonlinear decision boundary problem. In: 2006 IEEE International Conference on Fuzzy Systems. 2006. pp. 1240-1246

[34] Yang J, Wu W, Shao Z. A new training algorithm for a fuzzy perceptron and its convergence. In: Advances in Neural Networks—ISNN 2005, Second

International Symposium on Neural Networks, Chongqing, China; May 30–June 1 2005

[35] Han J, Kamber M. Data Mining Concepts and Techniques. 2nd ed. San Francisco, CA: Morgan Kaufmann Publishers, An Imprint of Elsevier; 2007

[36] Han X, Zhao T. Auto-K dynamic clustering algorithm. Journal of Animal and Veterinary Advances. 2005;**4**(5): 535-539

[37] Lichman M. UCI Machine Learning Repository. 2013. Available from: http:// archive.ics.uci.edu/ml

### **Chapter 7**

## Semantic Map: Bringing Together Groups and Discourses

*Theodore Chadjipadelis and Georgia Panagiotidou*

### **Abstract**

This chapter presents a multivariate analysis method developed in two steps, using a combination of Hierarchical Cluster Analysis (HCA) and Factorial Correspondence Analysis (AFC). To explain and describe the steps of the method, we use an application example on a survey dataset from young students in Thessaloniki, investigating their behavioral profiles in terms of political characteristics and how these may be affected by their attendance of a civic education course offered by the Political Science Department of the Aristotle University of Thessaloniki. The method is explained step by step through this example, which serves as a manual for researchers applying it. HCA assigns subjects into cluster membership variables and, in the next stage, these new variables are jointly analyzed with AFC. Correspondence analysis extracts the dimensions of the phenomenon under study, explaining the inner antithesis between the categories, and also makes it possible to visualize the information in a two-dimensional space, a semantic map, which renders interpretation more comprehensible. HCA is then applied again to the AFC coordinates of the categories, constructing profiles of subjects and assigning them to the categories of the variables.

**Keywords:** hierarchical cluster analysis, correspondence analysis, political analysis, multivariate methods, data analysis

### **1. Introduction**

This chapter presents a multivariate analysis method that combines Hierarchical Cluster Analysis (HCA) [1] and Factorial Correspondence Analysis (AFC) in two steps [2]. The method provides the advantage of jointly handling multiple variables with many levels. The approach uses HCA to reduce many variables into fewer ones that represent the individuals within them; Correspondence Analysis then reduces the information even further and expresses it along dimensions.

These dimensions not only organize the information within the data so that it can be explained more thoroughly, but also visualize the inner relationships among the categories of the variables. By analyzing the antagonism of the clusters on different sets of dimensions, since we can also have a system of three or more axes [3], we can further understand the behavior of the variables and their categories, as well as the associations among them.

By clustering, in the final step, the coordinates of the categories on the dimensions, we link the initial clusters with the categories, creating a semantic map [4] that visualizes the phenomenon in a Cartesian field or a three-dimensional space [3]. In this chapter, we present the application of the method in a specific case, which serves only as an example.

The sample consists of students in Thessaloniki, Greece; the survey measures their political attitudes, their views on democracy and moral values, and the way they get informed about politics in general. In the example developed throughout the chapter, we describe the application of the method and the interpretation of the results step by step.

### **2. Methodology**

Our data analysis is based on Hierarchical Cluster Analysis (HCA) and Factorial Correspondence Analysis (AFC; Analyse Factorielle des Correspondances) applied in two steps [2, 5]. This mixed-method approach enables the detection of profiles of similar behavior, the association of each profile with the distinct categories that compose it, and the detection of the dimensions that describe the dynamics of the phenomenon, which are visualized in the final output.

In the first step, HCA assigns subjects into distinct groups according to their response patterns [2]. The main output of HCA is a group or cluster membership variable, which reflects the partitioning of the subjects into groups. Furthermore, for each group, the contribution of each question (variable) to the group formation is investigated [2], to reveal a typology of behavioral patterns. To determine the number of clusters, we use the empirical criterion of the change in the ratio of between-cluster inertia to total inertia when moving from a partition with r clusters to a partition with r−1 clusters [6]. The metric used is chi-square. The analysis was conducted with the software M.A.D. (Méthodes de l'Analyse des Données) [7]. In the second step, the group membership variable obtained from the first step is jointly analyzed with the existing variables via Multiple Correspondence Analysis on the so-called Burt table [8]. At this stage, correspondence analysis extracts the dimensions that constitute the overall phenomenon, explaining the inner inertia between all subjects. To determine the number of factors, the empirical criterion of Benzécri was used. According to this criterion [2], two sub-criteria should be fulfilled:

COR > 200 and CTR > 1000/(n + 1),

where n is the total number of categories.
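To make the criterion concrete, the following minimal numpy sketch (an illustration under stated assumptions, not the authors' M.A.D. implementation) checks both sub-criteria. It assumes the correspondence analysis output is available as category coordinates `F` on all axes, category masses `m` and axis eigenvalues `lam`; COR and CTR are expressed in per-mille units, as is conventional in Benzécri-style output.

```python
import numpy as np

def benzecri_criterion(F, m, lam):
    """Boolean matrix: True where a category is both well represented on an
    axis (COR > 200) and contributes enough to it (CTR > 1000/(n + 1))."""
    F, m, lam = np.asarray(F), np.asarray(m), np.asarray(lam)
    d2 = (F ** 2).sum(axis=1)               # squared distance of each category to the centroid
    cor = 1000 * F ** 2 / d2[:, None]       # quality of representation (per-mille)
    ctr = 1000 * m[:, None] * F ** 2 / lam  # contribution to each axis (per-mille)
    n = F.shape[0]                          # n = total number of categories
    return (cor > 200) & (ctr > 1000 / (n + 1))

# Toy demo with 6 categories on 2 axes; lam is derived from F and m so that
# the contributions on each axis sum to 1000.
rng = np.random.default_rng(2)
F = rng.normal(size=(6, 2))
m = np.full(6, 1 / 6)
lam = (m[:, None] * F ** 2).sum(axis=0)
print(benzecri_criterion(F, m, lam))
```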

We proceed by applying HCA again to the coordinates of the categories on the dimensions. Bringing these two analysis steps together, we can construct a semantic map that visualizes the behavioral structure of the variables and the subjects, creating behavioral patterns and abstract discourses [4].
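As a compact end-to-end illustration of the two steps, the sketch below uses generic scientific Python tools (scipy, pandas, numpy) rather than the M.A.D. software used in the chapter; the data, variable names (E14_1, ...) and cluster counts are toy stand-ins, and Ward linkage is used as a stand-in for the chi-square metric described above.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# Toy 0-1 dataset: 200 subjects x 12 binary "picture" indicators.
X = pd.DataFrame(rng.integers(0, 2, size=(200, 12)),
                 columns=[f"E14_{i}" for i in range(1, 13)])

# Step 1: HCA on the subjects; the flat clusters become a membership variable.
Z = linkage(X.to_numpy().astype(float), method="ward")
gr_dem = fcluster(Z, t=8, criterion="maxclust")        # e.g. 8 clusters, as for E14

# Step 2: the membership variable is analyzed jointly with another categorical
# variable (here a toy 4-level "political interest") through correspondence
# analysis, computed from the SVD of the standardized residuals of the table.
pi = rng.integers(1, 5, size=200)
N = pd.crosstab(gr_dem, pi).to_numpy().astype(float)
P = N / N.sum()
r, c = P.sum(axis=1), P.sum(axis=0)                    # row and column masses
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))     # standardized residuals
U, sv, Vt = np.linalg.svd(S, full_matrices=False)
row_coords = U * sv / np.sqrt(r)[:, None]              # principal coordinates
col_coords = Vt.T * sv / np.sqrt(c)[:, None]

# Final step: HCA on the category coordinates of the first two axes; the
# resulting groups are the "discourses" that are placed on the semantic map.
coords = np.vstack([row_coords[:, :2], col_coords[:, :2]])
discourses = fcluster(linkage(coords, method="ward"), t=4, criterion="maxclust")
print(discourses)
```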

### **3. An application example in political analysis**

To demonstrate the method of HCA and MCA in two steps, an example was selected and is described in the following sections. It refers to the analysis of data collected during a survey in Thessaloniki, Greece in the period 2019–2020. The aim of the survey was to collect data on the political characteristics of young students who participated in a civic education course offered by the Department of Political Sciences of the Aristotle University of Thessaloniki. The sample consists of 1618 participants, allocated into four groups:

Group 1: random university students within the campus of the university who were not part of the civic education course.

Group 2: university students who attended the course in-classroom.

Group 3: university students who attended the course through e-learning, due to COVID-19 restrictions and measures.

Group 4: high-school students who attended the course.

The tool of the survey was a questionnaire, structured in three sections: 1) demographics, 2) political behavior, 3) information means, views on democracy and moral context.

The objective of the research is to investigate the students' levels of political knowledge, political interest and preferred ways of political mobilization, and to distinguish the different profiles among the four groups of participants. The variables of the research, each associated with one of the questions, correspond to: a) political interest, b) political knowledge, c) political mobilization, d) self-positioning on the ideological left–right axis, e) sources of information on politics, f) structure of the "political" self and g) of the "moral" self [9, 10].

More specifically, the respondents are asked directly for their level of political interest (ordinal scale) and the way they prefer to mobilize on political issues that may arise (nominal scale). The variable of political knowledge (ordinal scale) is composed from the respondents' answers to basic questions about politics: more correct answers produce a higher political knowledge score. Next, the respondents are asked to position themselves on a scale of 0 to 10 representing the left–right ideological axis.
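As a small illustration of how such a composite score could be built (the item names and cut points below are hypothetical, not those of the questionnaire), the correct answers can be summed and binned into an ordinal scale:

```python
import pandas as pd

# Hypothetical knowledge items, coded 1 for a correct answer, 0 otherwise.
answers = pd.DataFrame({"q1": [1, 0, 1], "q2": [1, 1, 0], "q3": [1, 0, 0]})
score = answers.sum(axis=1)                      # number of correct answers
pk = pd.cut(score, bins=[-1, 0, 1, 2, 3],
            labels=["none", "low", "moderate", "high"])  # ordinal levels
print(pk.tolist())
```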

In the last section of the questionnaire are the questions on information sources and the democratic and moral self. Regarding the preferred source of information, the respondents are asked to choose the two sources they use most often to get informed about politics. Moving on to the variable of the "democratic self" [10], the respondent finds a set of 12 pictures which conceptualize different versions of democracy. They are asked to choose the three of them that best symbolize how they understand democracy. Likewise, in the next question they are asked to choose three pictures from a new set of 12 pictures, representing attitudes and views on life and moral values in general. These two sets of pictures construct symbolic representations of democratic institutions and of their personal moral compass (**Table 1**) [9].

### **3.1 First step of the analysis: clustering subjects into distinct groups**

In this step of the analysis, we select the three variables of the last section: the sources of information (E13), the understanding of democracy (E14) and the moral values (E15). For these variables, we have a dataset comprising 0–1 values, where 0 corresponds to a non-selected picture or source and 1 to a selected one. For each one of these three sets of variables, we apply HCA, aiming to summarize the information. HCA's output is the dendrogram in **Figure 1**, visualizing the clusters created at each step.

Initially, we cluster the variables to see patterns of categories. In the example below, we cluster the pictures for democracy, getting 5 clusters (38, 40, 41, 46 and 44). As seen in **Figure 2**, cluster 38 is created by the selection of pictures 3, 10 and 11, cluster 40 consists of selecting picture 1, and so on.
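This variable-clustering step could be sketched as follows, assuming illustrative binary data; the chi-square distance between the column profiles of the 12 indicators (the metric named in Section 2) is computed explicitly before being passed to scipy's hierarchical clustering:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(200, 12)).astype(float)   # subjects x pictures

N = X / X.sum()                       # correspondence matrix
profiles = N / N.sum(axis=0)          # column profiles (each column sums to 1)
r = N.sum(axis=1)                     # row masses, used as chi-square weights

# Chi-square distance between column profiles j and k:
# d(j, k)^2 = sum_i (p_ij - p_ik)^2 / r_i
diff = profiles[:, :, None] - profiles[:, None, :]
D = np.sqrt((diff ** 2 / r[:, None, None]).sum(axis=0))

Z = linkage(squareform(D, checks=False), method="average")
clusters = fcluster(Z, t=5, criterion="maxclust")      # 5 clusters, as for E14
print(dict(zip([f"E14_{i}" for i in range(1, 13)], clusters)))
```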


### **Table 1.**

*Coding and categories of the variables used in the analysis.*

Applying the same HCA to cluster the variables for each one of the three selected sets, we get 5 clusters for E14, 5 clusters for E15 and 4 clusters for E13, as shown in **Table 2**.

We proceed by clustering now the subjects. Instead of having 12 binary variables to represent the democratic self, we produce clusters of similar choices and assign each respondent to the cluster he or she is closest to, according to this profile of answers. HCA again produces a dendrogram with the steps of the clustering process (**Figure 3**).

In the example shown in **Figure 4** we see how the answers on the 12 pictures of the democratic self are transformed into one clustering variable (gr\_dem), assigning each respondent to one of the clusters of HCA. Following the same method, with a separate application of HCA for the information sources and for the moral self, we get the clustering variables gr\_inf and gr\_val.
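The transformation of **Figure 4** amounts to replacing the 12 binary columns with one categorical column; a minimal sketch, assuming illustrative data and column names:

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(3)
cols = [f"E14_{i}" for i in range(1, 13)]
df = pd.DataFrame(rng.integers(0, 2, size=(100, 12)), columns=cols)

# Cluster the respondents on their 12 binary choices...
Z = linkage(df.to_numpy().astype(float), method="ward")
df["gr_dem"] = fcluster(Z, t=8, criterion="maxclust")  # 8 clusters, as in Table 3
# ...then drop the binary columns, keeping only the membership variable.
df = df.drop(columns=cols)
```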

After we have completed a separate HCA to classify the subjects (respondents) for each one of the selected variables (E14, E15 and E13), we get 8 clusters of respondents for E14 (renamed to gr\_dem), 9 clusters for E15 (renamed to gr\_val) and 8 clusters for E13 (renamed to gr\_inf). **Table 3** shows a summary of the clusters of subjects for each one of the three variables, including the clusters and their relative frequencies.

**Figure 1.** *Dendrogram (HCA) indicating the clusters for the E14 variable.*


**Figure 2.** *Classification process of the 12 pictures-variables of E14 (from E141 to E1412).*


**Table 2.**

*The clusters for each one of the variables (E14, E15 and E13) and the selected pictures they are linked to.*



**Figure 4.** *Transforming the dataset by replacing the binary E141-E1412 with the cluster membership variable gr\_dem.*


### **Table 3.**

*Cluster membership variables and their categories for E14, E15 and E13.*


We investigate further the profile of each cluster for the variable E14. Each cluster is associated with selecting a set of pictures. As shown in **Table 4**, cluster 3201 consists of the respondents who are more likely to select picture number 12, which corresponds to the symbolic representation of religion (**Table 5**). Cluster 3204 relates to


### **Table 4.**

*Weight of selecting each picture in the creation of the clusters for E14.*


### **Table 5.**

*Summarizing the content of each cluster and renaming the clusters for E14.*

selecting pictures 4, 5, 9 and 12 (e-democracy, representative, clientelism and religion). The sets of pictures connected to the clusters depict the different profiles of the respondents according to the way they comprehend democracy.

Similarly, for variable E15, we describe the profiles of the clusters of respondents in terms of the pictures they are more likely to select. In **Table 6** we see that cluster 3187 is connected to pictures 1, 2, 4 and 11, which correspond to riot, anonymous, army and protest, a representation of expressivist moral values (**Table 7**). In contrast, we see


### **Table 6.**

*Weight of selecting each picture in the creation of the clusters for E15.*

cluster 3207 having completely naturalist moral values, as it is connected to pictures 7, 8, 9 and 12 (mountain, family, intimacy and concert).

Once more, we investigate the content of each cluster for the variable E13, regarding sources of information. Cluster 3136 includes those respondents who answer 1 and 3 (**Table 8**), which translates into preferring to get informed about politics by TV-radio and by family (**Table 9**).

### **3.2 Second step: joint analysis of the cluster membership variables**

In the second step of the analysis, we jointly analyze the initial variables together with the new cluster membership variables gr\_dem, gr\_val and gr\_inf. We repeat the steps of the early stages of the analysis, applying HCA; as a result, 8 clusters of respondents are detected (**Table 10**).

These clusters relate to the categories of the variables, creating a behavioral profile for each one of the clusters of respondents to which they have been assigned. In **Table 11** the profiles of the clusters are given in full detail; e.g., cluster 3155 consists of respondents who belong to group 4, are men [sex1], characterize themselves as center-left [lr\_c2], have moderate political knowledge [PK2], choose to mobilize by personally addressing the authorities, taking action through social media and/or letting the authorities do their job [PM1, PM3 and/or PM4], and have little political interest [PI3]. Furthermore, respondents in this cluster also belong to clusters 3136, 3208 and 3216 regarding how they get informed on politics, to clusters 3207, 3209, 3213 and 3214 regarding their views on democracy, and finally to cluster 3192 regarding their set of moral values.

In the same way, we continue to examine each one of the clusters of the respondents to understand their behavioral profile, considering all of the variables used in our analysis.

In the next step, with the application of correspondence analysis, we extract the dimensions of the analysis and a set of coordinates on each dimension for each one of the variable categories (**Table 12**).



### **Table 7.**

*Summarizing the content of each cluster and renaming the clusters for E15.*


### **Table 8.**

*Weight of selecting each source of information in the creation of the clusters for E13.*


### **Table 9.**

*Summarizing the content of each cluster and renaming the clusters for E13.*


### **Table 10.**

*Clustering of the subjects using all the variables together with the new cluster membership variables produced in the first step.*

An extra, final step of HCA is applied, this time on the coordinates of the categories, classifying them into groups (**Figure 5**).

The analysis highlights the existence of 10 distinct discourses of behavior (**Table 13**).




### **Table 11.**

*Association between the clusters produced in the second step and the categories of the analysis.*



### **Table 12.**

*Coordinates for each one of the categories on two main dimensions (x,y).*


**Figure 5.** *Clustering the variables using their coordinates on the dimensions as input.*



### **Table 13.**

*Summarizing the association between the categories and the clusters.*


### **4. Final output: the semantic map**

Utilizing the coordinates of the points on the first two axes obtained from the correspondence analysis, we construct a system of two axes on which we place all these points [3]. The output resembles a simple Cartesian field, where x is the first dimension (horizontal) and y is the second dimension (vertical). A third dimension can be brought into the analysis by using a three-dimensional space, visualizing the objects within a cube, or by presenting the dimensions in pairs of two-dimensional plots.

The output is a semantic map, where all objects can be seen together, and their positioning on the field can be explained in terms of the objects' proximity or opposition on each one of the dimensions.
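A map of this kind can be drawn with a few lines of matplotlib; the labels, coordinates and discourse assignments below are invented stand-ins for the output of **Tables 12** and **13**, meant only to show the construction:

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical category coordinates on the first two AFC axes.
labels = ["group1", "group4", "group2", "group3", "val_1", "val_6", "dem_3", "dem_5"]
coords = np.array([[-1.1, 0.2], [-0.9, -0.1], [1.0, 0.6], [0.8, -0.7],
                   [0.9, 0.9], [-0.7, -0.8], [-0.5, -0.6], [1.1, 0.7]])
discourses = np.array([1, 1, 2, 3, 2, 4, 4, 2])   # cluster of each category

fig, ax = plt.subplots(figsize=(6, 6))
ax.axhline(0, lw=0.5, color="grey")
ax.axvline(0, lw=0.5, color="grey")
ax.scatter(coords[:, 0], coords[:, 1], c=discourses, cmap="tab10")
for (x, y), name in zip(coords, labels):
    ax.annotate(name, (x, y), textcoords="offset points", xytext=(4, 4))
ax.set_xlabel("Dimension 1 (x)")
ax.set_ylabel("Dimension 2 (y)")
plt.show()
```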

In our example (**Figure 6**), we make the following observations:

The first axis is created by the opposition between: 1) group 1 (random students) and group 4 (high-school students), characterized by low political interest, getting informed by TV-radio or by friends and family, center-left/center-right self-positioning, naturalistic values, and choosing not to mobilize or to act on an individual level if needed, and 2) group 2 and group 3 (university students of the civic education course) with high political

**Figure 6.** *The semantic map, visualizing in a Cartesian field (x, y) the categories of all variables positioned according to their coordinates from AFC.*

interest, left self-positioning, getting informed through newspapers and social media, and expressivist values, choosing collective ways of mobilization.

The second axis depicts the antithesis between group 3 (online students of the civic education course), who are connected to online information about politics, and the in-class students of group 2, who are linked to collective ways of mobilization. Additionally, the second axis is described by the antithesis between the sets val\_1 (Riot, Anonymous, Army, Protest) and dem\_5 (Riot, Deliberation, Volunteerism, Clientelism, Rebellion, Protest) on the one hand, and the sets val\_6/val\_9 (Mountain, Family/Mountain, Family, Intimacy) and dem\_3/dem\_4 (Ancient Greece, Representative, Deliberation/e-Democracy) on the other. This polarization is explained as the difference between the democratic and moral discourses detected in the analysis.

### **5. Conclusion**

The method presented in this chapter, as applied in the example of a survey among university and high-school students in Thessaloniki, follows the application of HCA and MCA (or AFC) in two steps.

The added value of the presented methodological approach lies in its capacity to combine an advanced clustering method with the dimension-reduction function of correspondence analysis. Clustering at multiple stages of the analysis produces summarized variables that describe the overall behavior or profile of the subjects. These new cluster membership variables can then be associated with the categories of the variables used in the clustering analysis; therefore, we can associate each cluster not only with its subjects but with the categories as well. In the second step, the joint analysis of the cluster membership variables together with the rest of the variables produces a comprehensive clustering of all items together, associating them again with the categories of the variables. This procedure gives the researcher a full and comprehensive overview of the profile of each cluster.

Moreover, correspondence analysis brings forward the inner competition of the phenomenon, extracting multiple dimensions that explain the dynamics within it. The coordinates of each object give a better understanding of the distances between them, and when analyzed again with HCA we get the final, fully described clusters. The coordinates can visualize the phenomenon in a simple two-dimensional space, or in more dimensions, where the observer can comprehend in more detail the revealed inner relationships or oppositions among the subjects and the objects of the analysis.

### **Author details**

Theodore Chadjipadelis\* and Georgia Panagiotidou Aristotle University of Thessaloniki, Greece

\*Address all correspondence to: chadji@polsci.auth.gr

© 2022 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


### **References**

[1] Galbraith JI, Bartholomew DJ, Moustaki I, Steele F. The Analysis and Interpretation of Multivariate Data for Social Scientists. London: Chapman & Hall/CRC Statistics in the Social and Behavioral Sciences; 2002

[2] Benzécri JP. L'Analyse des Données. Tome 2: L'Analyse des Correspondances. Paris: Dunod; 1973

[3] Greenacre M. Biplots in Practice. Bilbao: Fundación BBVA; 2010

[4] Panagiotidou G, Chadjipadelis T. First-time voters in Greece: Views and attitudes of youth on Europe and democracy. In: Chadjipadelis T, Lausen B, Markos A, Lee TR, Montanari A, Nugent R, editors. Studies in Classification, Data Analysis and Knowledge Organization. Cham: Springer; 2020. pp. 415-429

[5] Chadjipadelis T. Parties, Candidates, Issues: The Effect of Crisis, Correspondence Analysis and Related Methods. Napoli, Italy: CARME; 2015

[6] Papadimitriou G, Florou G. Contribution of the Euclidean and chi-square metrics to determining the most ideal clustering in ascending hierarchy (in Greek). In: Annals in Honor of Professor I. Liakis. Thessaloniki: University of Macedonia; 1996. pp. 546-581

[7] Karapistolis D. Software Method of Data Analysis MAD [Internet]. 2010. Available from: http://www.pylimad.gr/ [Accessed: January 25, 2022]

[8] Greenacre M. Correspondence Analysis in Practice. Boca Raton: Chapman and Hall/CRC Press; 2007

[9] Marangudakis M, Chadjipadelis T. The Greek Crisis and its Cultural Origins. New York: Palgrave-Macmillan; 2019

[10] Taylor C. Sources of the Self. Cambridge, MA: Harvard University Press; 1991

### *Edited by Niansheng Tang*

In view of the considerable applications of data clustering techniques in various fields, such as engineering, artificial intelligence, machine learning, clinical medicine, biology, ecology, disease diagnosis, and business marketing, many data clustering algorithms and methods have been developed to deal with complicated data. These techniques include supervised learning methods and unsupervised learning methods such as density-based clustering, K-means clustering, and K-nearest neighbor clustering. This book reviews recently developed data clustering techniques and algorithms and discusses the development of data clustering, including measures of similarity or dissimilarity for data clustering, data clustering algorithms, assessment of clustering algorithms, and data clustering methods recently developed for insurance, psychology, pattern recognition, and survey data.

### *Andries Engelbrecht, Artificial Intelligence Series Editor*

Published in London, UK © 2022 IntechOpen © your\_photo / iStock

Data Clustering

IntechOpen Series

Artificial Intelligence, Volume 10
