**Graph Mining Based SOM: A Tool to Analyze Economic Stability**

## Marina Resta

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/51240

## **1. Introduction**

Living in times of Global Financial Crisis (GFC) has posed new challenges to researchers and financial policy makers, who are in search of tools to monitor or prevent the occurrence of critical situations. This issue, as usual, can be approached from various perspectives.

From the economic standpoint, two basic strands have emerged: several contributions debated the central role of systemic risk in conditioning countries' financial fragility; a second vein concerned the role (either positive or negative) of the financial sector in economic growth. Given their relevance for our work, we discuss each of them in more detail.

As for the first aspect, there are several definitions of systemic risk (see for instance [1], [2], [3] and [4]), but none of them is widely accepted. Nevertheless, we agree with the position of [5], who claimed that systemic risk can be identified by the presence of two distinct elements: an initial random shock, as the source of systemic impact, and a contagion mechanism (such as the interbank market or the payment system), which spreads the negative shock wave to other members of the system. Along this vein, a growing body of empirical research has already bloomed: [6] suggested a network approach to analyze the impact of liquidity shocks on financial systems; a similar approach was followed by [7] for the United Kingdom, Boss [8] for Austria, and [9] for Switzerland; more recently, Soramaki et al. (2012) developed a software platform<sup>1</sup> that employs graph models for various purposes, including monitoring the spread of financial contagion.

A second related point concerns the evaluation of how the financial sector can condition countries' economic growth. There is general agreement in the financial economics literature about the existence of a link between bankruptcies and the business cycle. However, the same does not apply when one is asked to identify the methods and the variables by which bankruptcies and the business cycle interact. Basic streams of research moved along four directions. A number of papers focused on the application of discriminant analysis to a set of accounting variables (see for instance [10], [11], [12] and [13]). A second group of papers (see among others [14]) employs the methodology initiated by [15], who used logistic regression models (logit) on macroeconomic variables. A third strand focuses on duration models, i.e. models that measure how long the economic system remains in a certain state. This is the line followed, for example, by [16] and [17]. Finally, there are plenty of (more or less) sophisticated econometric techniques aimed at estimating bankruptcies by means of macroeconomic variables. Interested readers may refer to [18] and [19].

<sup>1</sup> Financial Network Analysis (fna): free web version available at: http://www.fna.fi/products/list.


From all the above research streams dealing with crisis and financial (in)stability we extract three discussion issues. As a first remark, our review highlighted that, in general, periods of crisis always have a strong financial component. As a second remark, we may observe that the economic literature addressed the analysis mainly by means of either macroeconomic or accounting data. Finally, we want to focus on a methodological issue: quantitative papers generally studied the problem by means of econometric techniques; only over the past decade have soft computing methods (namely: graph models) become of some interest to economic researchers and policy makers. Starting from this point, we think that there is enough room to add something new in the following directions: (i) studying the emergence of instability by way of financial markets data; (ii) using a hybrid approach combining graph models with non-linear dimension reduction techniques, specifically Self Organizing Maps [20]. To this purpose, it helps to recall that Self Organizing Maps (SOM) are nowadays a landmark among soft computing techniques, with applications spanning virtually all fields of knowledge. However, while the use of SOM in robotics, medical imaging and character recognition, to cite the most important examples, is documented by a substantial body of literature (interested readers may refer to [21], [22] and [23]), economics and financial markets seem relatively less explored, with some notable exceptions (from the pioneering works of [24] and [25] to, more recently, [26] and [27]). Such a lack of financial applications is hard to justify, given the great potential of this kind of technique.

The rationale of this contribution is to offer some insights into the use of SOM to explore how financial markets organize during critical periods, e.g. deflation, recession and so on. Something similar has already been discussed in [28] and [29], which deal with the use of SOM as a support tool for Early Warning Systems (EWS), alerting the decision maker in case of critical economic situations. However, the present contribution goes one step further from various points of view. The first element of innovation lies in the examined data. We studied the situation of markets characterized by different levels of (in)stability, but instead of using either financial or macroeconomic indicators, as is generally done in the literature, we employed historical time series of price levels for every enterprise quoted in the related stock exchanges, and we then trained a SOM for each market. A second innovative item lies in the use of the SOM best matching units so obtained to build the corresponding Minimum Spanning Tree (MST). In this way we were able both to capture the cluster structure of every market and to analyze the impact of emerging patterns on the economic situation of the country. This was done both in a static way, i.e. by observing the situation with data referring to a fixed one-year period (from December 2010 to December 2011), and in a dynamic way, by comparing the MSTs obtained for each country with data extracted by means of a 300-day moving window over a time interval of overall length 3000 days (approximately ten years).

Our major findings may then be summarised as follows: (i) using SOM we obtained an original representation of financial markets; (ii) by building the corresponding MST from the SOM winning nodes, it was possible both to emphasize the relations among the various quoted enterprises and to check for the emergence of critical patterns; (iii) we provided a global representation of countries' financial situation that generates information that can help policy makers carry out more efficient interventions in periods of higher instability.

## **2. Methodology**


2 Developments and Applications of Self-Organizing Maps Applications of Self-Organizing Maps


As stated in Section 1, we examined financial markets data by means of a hybrid technique based on the joint use of SOM and graph formalism. In order to ensure a better understanding of this framework, we will recall some basic definitions and notational conventions for both the aforementioned tools.

#### **2.1. Self Organizing Maps: Basic principles**

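A Self Organizing Map (SOM) is a single-layer neural network whose neurons are arranged on an *n*-dimensional grid: typical applications assume a 2-dimensional rectangular grid, but hexagonal as well as toroidal grids are also possible. Each neuron has as many components as the input patterns: mathematically, this implies that both neurons and inputs are vectors embedded in the same space. Training a SOM requires a number of steps to be performed sequentially. For a generic input pattern x we will have:

**1.** to evaluate the distance between x and each neuron of the SOM;

**2.** to select the neuron (node) with the smallest distance from x. We will call it winner neuron or Best Matching Unit (BMU);

**3.** to correct the position of each node according to the results of Step 2., in order to preserve the network topology.
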
Steps 1 to 3 can be repeated once or more than once for each input pattern: a good stopping criterion generally consists in monitoring the so-called Quantization Error (QE), i.e. a weighted average of the Euclidean norms of the differences between the input vectors and the corresponding BMUs. When QE falls below a proper threshold level, say $10^{-2}$ or lower, it might be suitable to stop the procedure.

In this way, once the learning procedure is concluded, we get an organization of the SOM which takes into account how the input space is structured, and projects it into a lower dimensional space where closer nodes represent neighboring input patterns.
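The three training steps and the QE check can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in the chapter: the function name `train_som`, the Gaussian neighbourhood, the linear decay of learning rate and radius, and the use of a plain (unweighted) mean for QE are all assumptions made for the example.

```python
import math
import random


def train_som(patterns, rows, cols, epochs=50, lr0=0.5, sigma0=None, seed=0):
    """Train a rectangular rows x cols SOM; return (weights, quantization error)."""
    rng = random.Random(seed)
    dim = len(patterns[0])
    sigma0 = sigma0 or max(rows, cols) / 2.0
    # One weight vector per grid node, initialised by sampling input patterns.
    weights = [list(rng.choice(patterns)) for _ in range(rows * cols)]
    grid = [(r, c) for r in range(rows) for c in range(cols)]

    def bmu_index(x):
        # Steps 1-2: distance from x to every neuron, keep the closest one.
        return min(range(len(weights)), key=lambda j: math.dist(x, weights[j]))

    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                  # assumed linear decay
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # assumed decay with a floor
        for x in rng.sample(patterns, len(patterns)):
            b = bmu_index(x)
            for j, w in enumerate(weights):
                # Step 3: nodes close to the BMU on the grid move more,
                # which is what preserves the network topology.
                d2 = (grid[j][0] - grid[b][0]) ** 2 + (grid[j][1] - grid[b][1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma ** 2))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])

    # QE as the plain mean distance between each input and its BMU (assumption).
    qe = sum(math.dist(x, weights[bmu_index(x)]) for x in patterns) / len(patterns)
    return weights, qe


data = [[0.0, 0.0], [0.0, 0.1], [1.0, 1.0], [1.0, 0.9]]
som_weights, som_qe = train_som(data, rows=2, cols=2, epochs=30)
```

In practice one would compare `som_qe` with the chosen threshold and keep training (or enlarge the map) while it stays above that level.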

#### **2.2. Graphs models: A brief review and some notational conventions**

In order to understand how graph theory can be used in cluster analysis, it is worth reviewing some basic terminology.


From the mathematical point of view, a graph (network) G = (V,E) is fully identified by a (finite) set V and a collection E ⊆ V × V of unordered pairs {*u*, *v*} of distinct elements from V. Each element of V is called a vertex (point, node), and each element of E is called an edge (line, link). Edges of the form (*u*,*u*), for some *u* ∈ V, are called self-loops, but in practical applications they are typically not contained in a graph.

A sequence of connected vertices forms a *path*; the number *n* of vertices (i.e. the cardinality of V) defines the *order of the graph* and is denoted by |V| := *n*. In a similar way, the number *m* of edges (the cardinality of E) is called the *size of the graph* and is denoted by |E| := *m*. Finally, the number of neighbors of any vertex *v* ∈ V in the graph identifies its *degree*.

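Moreover, the graph G will be claimed to be:

- *directed*, if the edge set is composed of ordered vertex (node) pairs; *undirected* if the edge set is composed of unordered vertex pairs;
- *simple*, if it has no loops or multiple edges;
- *acyclic* if there is not any possibility to loop back again from every vertex; *cyclic* if the contrary holds;
- *connected*, if there is a path in G between any given pair of vertices, otherwise it is *disconnected*;
- *regular*, if all the vertices of G have the same degree;
- *complete*, if every two distinct vertices are joined by exactly one edge;
- a *path*, if consisting of a single path;
- *bipartite*, if the vertex set can be split into two sets in such a way that each edge of the graph joins a vertex in the first set to a vertex in the second;
- a *tree*, if it is connected and it has no cycles. If G is a connected graph, the spanning tree in G will be a subgraph of G which includes every vertex of G and is also a tree. The minimum length spanning tree is called Minimum Spanning Tree (MST).
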
Our brief explanation highlights that the Minimum Spanning Tree is nothing but a particular graph with no cycles, where all nodes are connected and edges are selected in order to minimize the sum of distances.
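As a small illustration of this definition, the sketch below computes an MST with Kruskal's algorithm, which is one standard way of doing it; the chapter does not state which MST algorithm is used, so both the choice of algorithm and the toy edge list are assumptions.

```python
def minimum_spanning_tree(n_vertices, weighted_edges):
    """Kruskal's algorithm on an undirected graph given as (u, v, weight) triples."""
    parent = list(range(n_vertices))

    def find(u):
        # Follow parent links up to the representative of u's component.
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    tree = []
    for u, v, w in sorted(weighted_edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:          # the edge joins two components: no cycle is created
            parent[ru] = rv
            tree.append((u, v, w))
    return tree


# Toy graph with 4 vertices; weights play the role of distances.
edges = [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 2.5), (2, 3, 0.5), (1, 3, 3.0)]
mst = minimum_spanning_tree(4, edges)
total_length = sum(w for _, _, w in mst)
```

On a connected graph with *n* vertices the result always has *n* - 1 edges, which is a quick sanity check that the output is a spanning tree.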

Graph representation relies on building the *adjacency matrix*, i.e. the matrix that marks adjacent vertices with one and non-adjacent vertices with zero. Figure 1 provides an explanatory example.
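To make the notation concrete, the following sketch builds the adjacency matrix of a small undirected graph and reads the order, the size and the vertex degrees from it. The helper name `adjacency_matrix` and the four-vertex example are hypothetical and are not taken from Figure 1.

```python
def adjacency_matrix(n_vertices, edges):
    """Symmetric 0/1 matrix of an undirected simple graph."""
    a = [[0] * n_vertices for _ in range(n_vertices)]
    for u, v in edges:
        a[u][v] = a[v][u] = 1
    return a


edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
A = adjacency_matrix(4, edges)

order = len(A)                        # |V| = n
size = sum(map(sum, A)) // 2          # |E| = m (each edge is counted twice)
degrees = [sum(row) for row in A]     # number of neighbors of each vertex
```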

In a number of real-world applications it is common to use the graph theory formalism, representing the problem data through an undirected graph. Each node is associated with a sample in the feature space, while each edge is associated with the distance between nodes connected under a suitably defined neighborhood relationship. A cluster is thus defined as a connected subgraph, obtained according to criteria peculiar to each specific algorithm. Algorithms based on this definition are capable of detecting clusters of various shapes and sizes, at least when they are well separated. Moreover, isolated samples should form singleton clusters and can then be easily discarded as noise in cluster detection problems.

**Figure 1.** From left to right: the adjacency matrix (a) for an undirected graph, and the corresponding graph (b). The ones in the matrix indicate the existence of a connection among nodes, while zeroes mean no connection.

With this in mind, one can easily understand that coupling SOM (which satisfies topology preservation) with graphs (which do not require any a priori assumption about the input space distribution) should result in a very powerful tool to analyze data domains.

#### **2.3. A hybrid model combining SOM to MST**


SOM achieves a two-dimensional representation of the input domain, preserving the basic relations among neighboring patterns: points that are close in their initial r-dimensional space (r>>2) are still close to each other on the SOM grid; in addition, they are projected into a space where relations can be easily visualized and understood. However, sometimes this is not enough.

Consider the problem of representing the basic relations among quoted companies in a market (for example, the Italian market). Figure 2 shows the SOM once the relations among Italian quoted companies have been learned.

Here we have a SOM with an overall good performance in terms of quantization error ($QE < 10^{-3}$), but the winner nodes are much closer to each other than desired, which makes it difficult to understand the actual significance of their closeness.

Moving one step forward, we suggested a hybrid procedure that combines SOM and MST (see also [31]). The idea by itself is not totally new: [32], for instance, suggested a variant of SOM where neighborhood relationships during the training stage were defined along the MST; [33] and, more recently, [34] applied an MST to SOMs to connect similar nodes with each other, thus visualizing related nodes on the map. In all cited cases this was done by calculating the squared difference between neighbor units on the trained map, and using this value to color the edge separating the units.


**Figure 2.** SOM representation of Italian quoted companies.

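In a likewise manner, we applied a clustering procedure whose main steps can be summarized as follows:

**1.** Define a SOM *M* (made of a number *n* of neurons *w*) over the input space and run it.

**2.** For each input sample extract the corresponding BMU. We set:

$$B = \{ w : w \in M \wedge w \text{ is a BMU} \} \tag{1}$$

**3.** Build the correlation matrix C among the nodes belonging to B.

**4.** Use C as starting point to compute the MST. In particular, since C is symmetric, one can consider only the lower (*L*) or upper (*U*) triangular part of the matrix, and:

- **i.** Sort the elements of *L* (*U*) in decreasing order, thus moving from *L* to the list *L*<sub>ord</sub> (from *U* to *U*<sub>ord</sub>).
- **ii.** Set the coordinates in *C* of the first element of *L*<sub>ord</sub> (*U*<sub>ord</sub>) as those of the first two nodes of the MST.
- **iii.** For each element in *L*<sub>ord</sub> (*U*<sub>ord</sub>) add the corresponding couple from *C* to the MST; in particular, if the graph is still acyclic (i.e. no loops are added to the MST), then hold the inserted link, otherwise discard it.
- **iv.** Repeat step iii. until all the elements in *L*<sub>ord</sub> (*U*<sub>ord</sub>) have been examined, and then stop the procedure.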

The result is a filtering of the available information that lets only the more significant patterns emerge.
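Steps 3 and 4 of the procedure can be sketched as follows, starting from the weight vectors of the BMUs. This is an illustrative reading of the steps, not the author's code: the names `pearson` and `som_mst`, the use of Pearson correlation for the matrix C, and the union-find test for acyclicity are assumptions. Note that, since the pairs are visited in decreasing order of correlation, the tree retains the strongest links first.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equally long, non-constant vectors."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def som_mst(bmus):
    """Build a spanning tree over the BMU vectors from their correlations."""
    n = len(bmus)
    # Step 3 and start of step 4: lower triangular part of the symmetric matrix C.
    lower = [(pearson(bmus[i], bmus[j]), i, j) for i in range(n) for j in range(i)]
    lower.sort(reverse=True)                  # step i: decreasing order
    parent = list(range(n))

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    tree = []
    for c, i, j in lower:                     # steps ii-iv: scan the ordered list
        ri, rj = find(i), find(j)
        if ri != rj:                          # still acyclic: hold the link
            parent[ri] = rj
            tree.append((i, j, c))
        # otherwise the link would close a loop and is discarded
    return tree


# Four hypothetical BMU weight vectors.
bmu_vectors = [[1, 2, 3, 4], [1, 2, 3, 5], [4, 3, 2, 1], [1, 3, 2, 4]]
tree = som_mst(bmu_vectors)
```

Each entry of `tree` is a triple `(i, j, correlation)`, so the retained links can be inspected or drawn directly on the map.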

## **3. Experimental settings and results discussion**


Our work aims to demonstrate how a fully data-driven approach can help to analyze complex financial situations in quite an intuitive way, thus making SOM-MST a reliable tool for policy makers as well.

We performed both static and dynamic analyses, as we are going to explain. As a starting point for the static analysis, we selected a market and for each quoted enterprise we took all available price levels (*pl*) from December 2010 to December 2011. In this way, for the generic i-th stock (i=1,…,N, where N is the overall number of quoted enterprises) we obtained the time series $S^{(i)} = \{pv_k^{(i)}\}$ with length T-1, where:

$$pv_k^{(i)} = \log \frac{pl_{k+1}^{(i)}}{pl_k^{(i)}}, \quad k = 1, \dots, T-1 \tag{2}$$

The transformation described in (2) turns price levels into price log-returns: this is a common practice in empirical financial studies to avoid any trend effect in the data. The final result was a matrix *Σ* of dimensions *N* × (*T*-1), containing T-1 log-returns for each quoted enterprise (for an overall number of *N*). As a final step, we applied to *Σ* the procedure explained in Sec. 2.3, coupling SOM with MST.
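The transformation in (2) and the construction of the matrix *Σ* amount to a few lines of code. The sketch below assumes that prices are available as one list per stock, all of the same length T; the function names and the sample prices are made up for the example.

```python
import math


def log_returns(prices):
    """Turn T price levels into T-1 log-returns, as in equation (2)."""
    return [math.log(prices[k + 1] / prices[k]) for k in range(len(prices) - 1)]


def returns_matrix(price_table):
    """One row of log-returns per stock: an N x (T-1) matrix."""
    return [log_returns(row) for row in price_table]


# Two hypothetical stocks observed over T = 3 days.
price_table = [[100.0, 110.0, 121.0], [50.0, 50.0, 25.0]]
sigma_matrix = returns_matrix(price_table)
```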

The dynamic procedure is similar to the static one, but instead of considering the last-year sample for each stock, we examined a number of fixed-length samples, going back in time (when possible) up to ten years. In practice, taking December 30, 2011 as the starting point t=3000, we build for each stock the block B1 going from t=2701 to t=3000; the block B2 with data from t=2401 to t=2700; and so on up to the block B10, which goes from t=1 to t=300. In this way, instead of having a single block of data to analyze, in the dynamic procedure we can monitor the situation of the country with different sets of data. Moreover, taking advantage of the network representation, one can look at graph statistics for every year and compare them over the ten-year time horizon.
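The block construction can be sketched as a simple backward slicing of each series. The function name `backward_blocks` is hypothetical, and the sketch assumes that the last element of the series corresponds to t=3000; stocks with a shorter history simply yield fewer blocks.

```python
def backward_blocks(series, block_len=300, n_blocks=10):
    """Split a series into consecutive blocks, starting from the most recent one."""
    blocks = []
    end = len(series)
    for _ in range(n_blocks):
        start = end - block_len
        if start < 0:          # not enough history for another full block
            break
        blocks.append(series[start:end])
        end = start
    return blocks


# Use the time index itself as data, so that the block boundaries are visible.
timeline = list(range(1, 3001))
blocks = backward_blocks(timeline)
short_history = backward_blocks(list(range(1, 701)))
```

Here `blocks[0]` plays the role of B1 (t=2701 to t=3000) and `blocks[9]` that of B10 (t=1 to t=300).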

We applied our methodology to the German and Spanish markets. Our choice has a precise motivation: we examined countries characterized by different levels of (in)stability. At the end of 2011 the Spanish financial equilibrium seemed heavily compromised, while Germany still maintained its leading role in Europe.

## **4. Results and discussion**

Before discussing the individual cases, we spend a few words on some common features of our simulation study.

Starting from the static case, we examined German, Italian and Spanish markets from 30 December 2010 to 30 December 2011. For each market we considered the data of quoted enterprises, transforming them according to formula (2). Table 1 highlights some basic details concerning the markets we considered.


| Country | Idx | NrS | MD |
|---|---|---|---|
| Germany | CDAX | 207 | 207 × 245 |
| Spain | IGBM | 85 | 85 × 245 |

**Table 1.** Markets main features.

The column *Country* reports the name of the countries whose assets have been examined, while *Idx* indicates the name of the national market index employed to pick quoted stocks; *NrS* is the number of stocks we included for every market; finally, *MD* gives the dimensions of the input matrix in our simulation study. In particular, we referred to the CDAX (*Composite Deutscher Aktienindex*) index for the German market, and to the IGBM (*Índice General de la Bolsa de Madrid*) index in the case of Spain. As a straightforward observation, one can argue that the overall number of quoted enterprises in those markets should be higher than the one reported in the third column of Table 1. However, for the sake of comparison among graphs, we needed to eliminate from the markets those stocks for which it was not possible to go back far enough in time (at least 600 days, approximately corresponding to two and a half years of trading).

#### **4.1. The case of Germany**

Applying our procedure led us to obtain the skeleton framework of the German stock exchange shown in Figure 3.

**Figure 3.** German market topology, as resulting in the static case. Natural clusters are highlighted with different colors.

Our procedure found 14 clusters. At first glance the clusters seem to be *natural*, in the sense discussed in [35], i.e.:

**•** each node is member of exactly one group;

**•** each node has many edges to other members of its group;

**•** each node has few or even no edges to nodes of other groups.

This cluster structure is partly due to the filtering procedure operated on SOM BMUs after the learning stage, but the resulting organization also makes sense if we look at the statistical features of the clusters (Table 2) as well as at their composition by industrial reference sector (Table 3).

| CL. ID. | mu | std | sk | ku | SR |
|---|---|---|---|---|---|
| CL01 | 0.0001270 | 0.045235 | 1.651532 | 411.3881 | 0.281% |
| CL02 | 0.0002923 | 0.055717 | 10.48408 | 2521.463 | 0.525% |
| CL03 | 0.0001208 | 0.143701 | 9.605624 | 906.0551 | 0.084% |
| CL04 | 0.0002904 | 0.056795 | 10.3774 | 2448.29 | 0.511% |
| CL05 | 0.0003361 | 0.056718 | 10.14951 | 2386.374 | 0.593% |
| CL06 | 0.0004759 | 0.057566 | 11.4447 | 2608.218 | 0.827% |
| CL07 | 0.0003254 | 0.056369 | 10.4055 | 2471.74 | 0.577% |
| CL08 | 0.0003683 | 0.05673 | 10.54512 | 2485.83 | 0.649% |
| CL09 | 0.0003223 | 0.056334 | 10.14833 | 2402.134 | 0.572% |
| CL10 | 0.0004811 | 0.057746 | 11.11191 | 2515.727 | 0.833% |
| CL11 | 0.0002920 | 0.05616 | 10.06927 | 2395.238 | 0.520% |
| CL12 | 0.0005635 | 0.058396 | 11.85686 | 2655.044 | 0.965% |
| CL13 | 0.0002770 | 0.058349 | 11.88501 | 2663.531 | 0.475% |
| CL14 | 0.0003343 | 0.056866 | 10.46831 | 2462.213 | 0.588% |

**Table 2.** Statistical properties of the network of German stocks in the period December 2010-December 2011.

We examined basic cluster statistics: mean (*mu*), standard deviation (*std*), skewness (*sk*), and kurtosis (*ku*). We also evaluated the Sharpe Ratio (SR) of every cluster:

$$\mathit{SR} = \frac{\mathit{mu} - \mathit{rf}}{\mathit{std}}$$

where *rf* is the risk-free rate, and *mu* and *std* are as described above. According to the financial literature, SR is a profitability index that measures how attractive a risky investment is with respect to a riskless investment with return equal to *rf*. The ratio, in fact, sets the excess return (numerator) against the additional risk the investor takes on when moving money from the riskless asset (whose standard deviation is zero) to the riskier one (denominator), whose standard deviation is greater than zero. The appeal of SR lies in its easy interpretation: it gives an idea of the general attractiveness/profitability of the companies included in each group. At the same time, if we assume *rf* = 0, the index becomes the reciprocal of the coefficient of variation, and thus also has a (quite) simple statistical interpretation.

The analysis of the results showed that all clusters have positive mean, relatively low variability, and good profitability (with the exception of CL01 and CL03, whose Sharpe Ratios are the lowest over all examined cases). Besides, company returns are positively skewed.

Moving to Table 3, we checked whether companies tend to aggregate according to the sector they belong to, and whether cluster composition may have affected the results shown in Table 2.

| Sector | CL01 | CL02 | CL03 | CL04 | CL05 | CL06 | CL07 | CL08 | CL09 | CL10 | CL11 | CL12 | CL13 | CL14 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Mkt | 0.00 | 7.14 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 7.14 | 7.69 | 0.00 | 0.00 |
| B&F | 6.67 | 14.29 | 30.77 | 16.67 | 42.86 | 6.67 | 13.04 | 21.43 | 4.76 | 7.14 | 21.43 | 7.69 | 23.08 | 9.09 |
| HI | 0.00 | 7.14 | 7.69 | 0.00 | 0.00 | 0.00 | 4.35 | 0.00 | 9.52 | 14.29 | 0.00 | 15.38 | 0.00 | 4.35 |
| Serv | 20.00 | 0.00 | 0.00 | 0.00 | 0.00 | 6.67 | 4.35 | 15.00 | 9.52 | 0.00 | 14.29 | 0.00 | 15.38 | 9.09 |
| Fash | 13.33 | 0.00 | 0.00 | 16.67 | 0.00 | 13.33 | 5.88 | 0.00 | 0.00 | 7.14 | 0.00 | 7.69 | 15.38 | 0.00 |
| HT | 0.00 | 0.00 | 7.69 | 25.00 | 14.29 | 0.00 | 9.52 | 0.00 | 9.52 | 28.57 | 7.14 | 0.00 | 15.38 | 27.27 |
| HC | 13.33 | 14.29 | 0.00 | 8.33 | 14.29 | 6.67 | 17.39 | 27.28 | 9.52 | 14.29 | 0.00 | 0.00 | 0.00 | 0.00 |
| Log | 20.00 | 14.29 | 15.38 | 8.33 | 0.00 | 0.00 | 4.35 | 0.00 | 0.00 | 0.00 | 21.43 | 0.00 | 4.35 | 4.52 |
| Lux | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 5.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4.52 |
| Hou | 13.33 | 7.14 | 7.69 | 8.33 | 14.29 | 13.33 | 8.70 | 7.14 | 4.76 | 0.00 | 7.14 | 0.00 | 7.69 | 0.00 |
| Comp | 0.00 | 14.29 | 0.00 | 8.33 | 0.00 | 0.00 | 4.52 | 0.00 | 0.00 | 7.14 | 14.29 | 7.14 | 0.00 | 9.09 |
| Ret Serv | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 6.67 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| F&D | 0.00 | 7.14 | 7.69 | 0.00 | 0.00 | 0.00 | 9.52 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 7.69 | 0.00 |
| En | 0.00 | 7.14 | 7.69 | 0.00 | 0.00 | 20.00 | 0.00 | 7.14 | 14.29 | 7.14 | 0.00 | 0.00 | 0.00 | 9.09 |
| Ent | 0.00 | 0.00 | 0.00 | 0.00 | 7.14 | 6.67 | 0.00 | 0.00 | 2.50 | 7.14 | 0.00 | 30.77 | 0.00 | 0.00 |
| Press | 0.00 | 0.00 | 7.14 | 0.00 | 7.14 | 6.67 | 0.00 | 0.00 | 10.00 | 0.00 | 0.00 | 7.69 | 4.14 | 0.00 |
| Imp/Exp | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4.35 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 9.09 |
| PU | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4.52 | 0.00 | 4.79 |
| TCom | 0.00 | 7.14 | 8.24 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 13.33 | 0.00 | 7.14 | 0.00 | 6.89 | 0.00 |
| Auto | 6.67 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 9.52 | 0.00 | 2.50 | 0.00 | 0.00 | 7.14 | 0.00 | 9.09 |
| Gard | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Man | 6.67 | 0.00 | 0.00 | 8.33 | 0.00 | 13.33 | 4.52 | 7.14 | 4.76 | 7.14 | 0.00 | 4.28 | 0.00 | 0.00 |

**Table 3.** Clusters percentage composition according to the reference industrial sector.
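The cluster statistics and the Sharpe Ratio discussed for Table 2 can be reproduced with a few lines of Python. The sketch below is only indicative: the function names are hypothetical, population (1/n) moments are an assumption because the text does not state which estimator was used, and *rf* = 0 is assumed. Under that assumption, the *mu* and *std* reported for CL01 give a ratio of roughly 0.0028, consistent with the 0.281% shown in Table 2.

```python
import math

def sharpe_ratio(mu, std, rf=0.0):
    """SR = (mu - rf) / std; with rf = 0 it is the reciprocal of the CV."""
    return (mu - rf) / std

def cluster_stats(returns, rf=0.0):
    """Mean, std, skewness, kurtosis and Sharpe Ratio of a return series.

    Population (1/n) moments are used; this is an assumption, since the
    text does not say which estimator was adopted.
    """
    n = len(returns)
    mu = sum(returns) / n
    m2 = sum((r - mu) ** 2 for r in returns) / n  # second central moment
    m3 = sum((r - mu) ** 3 for r in returns) / n  # third central moment
    m4 = sum((r - mu) ** 4 for r in returns) / n  # fourth central moment
    std = math.sqrt(m2)
    return {
        "mu": mu,
        "std": std,
        "sk": m3 / std ** 3,
        "ku": m4 / m2 ** 2,
        "SR": sharpe_ratio(mu, std, rf),
    }

# Reported mu and std of cluster CL01 in Table 2, with rf assumed to be 0.
sr_cl01 = sharpe_ratio(0.0001270, 0.045235)

# Toy series (made-up numbers) to exercise the full set of statistics.
toy = cluster_stats([0.01, -0.02, 0.03, 0.00, -0.01])
```

Keeping `sharpe_ratio` separate from `cluster_stats` makes it possible to recompute SR directly from tabulated values of *mu* and *std*, without access to the underlying return series.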

In general, clusters did not show an exclusive, but rather a dominant composition. Looking at Table 3, in fact, CL01 exhibits a dominant percentage of companies from both the Services (Serv) and Logistics (Log) sectors (20% each), while CL02 is equally divided among firms belonging to the Banking and Finance (B&F), Health Care (HC), Logistics and Components (Comp) sectors, each with a 14.29% share. B&F dominates cluster CL03 (30.77%) as well as CL05 (42.86%), CL11 (21.43%), and CL13 (23.08%). Hi-Tech companies (HT) are mostly grouped into clusters CL04 (25%) and CL10 (28.57%). Companies working in the Health Care sector (HC) are most numerous in clusters CL07 and CL08 (17.39% and 27.28% respectively). Finally, clusters CL06 and CL09 have their most representative elements in companies of the Energy sector (En) (20% and 14.29%), while CL14 is dominated by Heavy Industry (HI) companies.

This seems to suggest that, despite the variety of sectors represented in the German Stock Exchange, only a small number of them (i.e. the clusters' dominant sectors) may be considered the true driving engine of the German economy. This information, together with that retrieved from the Sharpe Ratio scores, strengthens the belief that Hi-Tech and Energy are, at present, the most challenging areas for investors in the German market.

As a counterpart, we observed that there are plenty of sectors<sup>2</sup> whose incidence on cluster composition is lower or, better, that do not seem to cluster at all. While this sounds reasonable for some niche sectors (Luxury and Gardening, for example), it is more surprising for other sectors (mainly Automotive and Telecommunications) that are known worldwide as strengths of the German economy. This evidence, however, is somewhat aligned with the policy strategy that the German government has adopted in recent times.

We can then conclude that Germany did not particularly suffer from the critical situation common to most European countries. The role played by both the Hi-Tech and Energy sectors has probably been a key factor. However, from now on Germany should carefully monitor the state of B&F companies, which are currently the worst performers. Other sectors like F&D, Hou, Press, and Auto need to be constantly checked as well, since they seem to be in a stage whose evolution (towards either better or worse phases) is uncertain.

<sup>2</sup> Fashion (Fash), Luxury Goods (Lux), Housing (Hou), Retail Services (Ret Serv), Food and Drink (F&D), Entertainment (Ent), Press (Press), Import/Export (Imp/Exp), Public Utilities (PU), Telecommunications (TCom), Automotive (Auto), Gardening (Gard) and Manufacturing (Man).

At this point it makes sense to test whether the snapshot we have captured for Germany is the result of a strategic choice or of a kind of natural evolution from previous situations. To do this we performed a dynamic analysis going back in time from December 2011 to December 2001. As explained in Section 3, we scanned the data by means of a moving window, thus obtaining 10 matrices of dimensions 207 × 300, where 207 is the number of companies included in the simulation and 300 is the number of log-returns we took for each of them.

In order to make the discussion as clear as possible, we focused on the analysis of the periods 2004-2005 and 2007-2008. The period 2004-2005, in fact, marks the onset of some symptoms anticipating the world financial crisis, while the period 2007-2008 is generally acknowledged as the one where the deepest effects of the crisis were felt.

Figure 4 shows the market skeleton frame obtained for the German Stock Exchange in the periods 2007-2008 and 2004-2005 respectively. Tables 4-7 detail basic statistics and cluster composition.

**Figure 4.** German market topology in 2008 (a) and 2004 (b).

| CL.ID | mu | std | sk | ku | SR |
|---|---|---|---|---|---|
| CL01 | 0.0003094 | 0.057005 | 10.44936 | 2451.638 | 0.543% |
| CL02 | 0.0003942 | 0.057219 | 10.25497 | 2390.428 | 0.689% |
| CL03 | 0.0002678 | 0.05579 | 9.94812 | 638.8266 | 0.480% |
| CL04 | 0.0002824 | 0.056436 | 10.02534 | 2373.059 | 0.500% |
| CL05 | 0.0003270 | 0.056945 | 10.9984 | 2587.711 | 0.574% |
| CL06 | 0.0002756 | 0.055918 | 10.00477 | 2389.595 | 0.493% |
| CL07 | 0.0002644 | 0.055723 | 10.02986 | 2411.497 | 0.475% |
| CL08 | 0.0004753 | 0.058852 | 11.32831 | 2525.394 | 0.808% |
| CL09 | 0.0003157 | 0.056422 | 10.36997 | 2460.894 | 0.559% |
| CL10 | 0.0005666 | 0.059179 | 11.77725 | 2602.168 | 0.957% |
| CL11 | 0.0005439 | 0.058084 | 11.26676 | 2532.204 | 0.936% |
| CL12 | 0.0003427 | 0.056806 | 10.10251 | 2371.656 | 0.603% |
| CL13 | 0.0004168 | 0.057255 | 10.66783 | 2490.513 | 0.728% |

**Table 4.** Basic cluster statistics for the German market in the period 2007-2008.

| CL.ID | mu | std | sk | ku | SR |
|---|---|---|---|---|---|
| CL01 | 0.0003849 | 0.056505 | 10.48629 | 2485.513 | 0.681% |
| CL02 | 0.0004037 | 0.057215 | 1.653466 | 2378.495 | 0.706% |
| CL03 | 0.0008787 | 0.061519 | 12.6148 | 2713.425 | 1.428% |
| CL04 | 0.0002657 | 0.055689 | 9.951796 | 2394.033 | 0.477% |
| CL05 | 0.0003660 | 0.056965 | 10.57433 | 2485.67 | 0.642% |
| CL06 | 0.0003745 | 0.056659 | 10.46166 | 2471.775 | 0.661% |
| CL07 | 0.0003209 | 0.056475 | 10.12393 | 2390.405 | 0.568% |
| CL08 | 0.0001570 | 0.045558 | 1.733983 | 413.5084 | 0.345% |
| CL09 | 0.0003237 | 0.056749 | 10.47651 | 2469.166 | 0.570% |
| CL10 | 0.0003062 | 0.056178 | 10.10382 | 2404.7 | 0.545% |
| CL11 | 0.0005619 | 0.058529 | 11.85308 | 2648.211 | 0.960% |

**Table 5.** Basic cluster statistics for the German market in the period 2004-2005.

| Sec. | CL01 | CL02 | CL03 | CL04 | CL05 | CL06 | CL07 | CL08 | CL09 | CL10 | CL11 | CL12 | CL13 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Mkt | 0.00 | 0.00 | 0.00 | 8.33 | 0.00 | 0.00 | 4.35 | 0.00 | 11.76 | 0.00 | 0.00 | 8.33 | 0.00 |
| B&F | 23.81 | 8.33 | 15.38 | 0.00 | 22.22 | 0.00 | 17.39 | 37.50 | 11.76 | 7.69 | 14.29 | 8.33 | 36.36 |
| HI | 4.76 | 0.00 | 0.00 | 8.33 | 0.00 | 6.29 | 0.00 | 0.00 | 12.75 | 0.00 | 0.00 | 0.00 | 0.00 |
| Serv | 0.00 | 8.33 | 7.69 | 8.33 | 0.00 | 7.14 | 4.35 | 12.50 | 0.00 | 0.00 | 0.00 | 0.00 | 7.14 |
| Fash | 4.76 | 8.33 | 7.69 | 8.33 | 5.56 | 7.14 | 4.35 | 12.50 | 5.88 | 15.38 | 0.00 | 16.67 | 0.00 |
| HT | 4.76 | 12.00 | 7.69 | 0.00 | 0.00 | 21.43 | 4.35 | 0.00 | 11.76 | 23.08 | 21.43 | 25.00 | 9.09 |
| HC | 0.00 | 16.67 | 8.33 | 0.00 | 16.67 | 0.00 | 0.00 | 0.00 | 0.00 | 15.38 | 0.00 | 0.00 | 0.00 |
| Log | 4.76 | 0.00 | 8.33 | 16.67 | 0.00 | 7.14 | 4.35 | 12.50 | 0.00 | 0.00 | 0.00 | 0.00 | 4.75 |
| Lux | 0.00 | 8.33 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 7.14 | 0.00 |
| Hou | 19.05 | 0.00 | 7.69 | 0.00 | 5.56 | 7.14 | 8.70 | 12.50 | 5.88 | 0.00 | 14.29 | 8.33 | 9.09 |
| Comp | 4.76 | 0.00 | 0.00 | 8.33 | 0.00 | 0.00 | 4.00 | 0.00 | 0.00 | 7.69 | 0.00 | 0.00 | 9.09 |
| Ret Serv | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 5.00 | 0.00 | 0.00 | 11.76 | 0.00 | 4.21 | 0.00 | 0.00 |
| F&D | 4.76 | 0.00 | 7.69 | 8.33 | 5.56 | 0.00 | 16.67 | 0.00 | 0.00 | 0.00 | 0.00 | 8.33 | 4.75 |
| En | 4.76 | 0.00 | 4.76 | 0.00 | 16.67 | 0.00 | 9.20 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 4.76 |
| Ent | 0.00 | 0.00 | 0.00 | 8.33 | 0.00 | 7.14 | 0.00 | 7.14 | 5.88 | 23.08 | 14.29 | 0.00 | 0.00 |
| Press | 4.76 | 0.00 | 8.33 | 8.33 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 7.14 |
| Imp/Exp | 0.00 | 16.67 | 3.94 | 8.33 | 0.00 | 7.14 | 0.00 | 0.00 | 0.00 | 0.00 | 7.14 | 0.00 | 0.00 |
| PU | 4.76 | 1.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| TCom | 0.00 | 0.00 | 7.69 | 0.00 | 0.00 | 7.14 | 12.41 | 0.00 | 16.67 | 7.69 | 7.69 | 0.00 | 0.00 |
| Auto | 9.52 | 12.00 | 0.00 | 8.33 | 11.11 | 7.14 | 5.90 | 5.36 | 5.88 | 0.00 | 16.67 | 16.67 | 9.09 |
| Gard | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 3.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 1.19 | 0.00 |
| Man | 4.76 | 8.33 | 4.76 | 0.00 | 16.67 | 7.14 | 4.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |

**Table 6.** Clusters percentage composition during the period 2007-2008.

| Sec. | CL01 | CL02 | CL03 | CL04 | CL05 | CL06 | CL07 | CL08 | CL09 | CL10 | CL11 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Mkt | 0.00 | 6.25 | 12.50 | 6.67 | 0.00 | 6.25 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| B&F | 20.00 | 12.50 | 0.00 | 0.00 | 18.18 | 25.00 | 41.67 | 35.00 | 0.00 | 40.00 | 14.29 |
| HI | 0.00 | 0.00 | 7.50 | 16.80 | 5.00 | 0.00 | 0.00 | 0.00 | 16.43 | 0.00 | 0.00 |
| Serv | 0.00 | 6.25 | 0.00 | 0.00 | 9.09 | 0.00 | 0.00 | 10.00 | 0.00 | 10.00 | 0.00 |
| Fash | 10.00 | 25.00 | 0.00 | 6.67 | 0.00 | 6.25 | 0.00 | 5.00 | 0.00 | 0.00 | 14.29 |
| HT | 10.00 | 0.00 | 12.50 | 26.67 | 18.18 | 0.00 | 8.33 | 0.00 | 14.29 | 0.00 | 41.43 |
| HC | 5.00 | 0.00 | 0.00 | 0.00 | 0.00 | 12.50 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Log | 0.00 | 18.50 | 0.00 | 0.00 | 10.00 | 0.00 | 8.33 | 20.00 | 0.00 | 0.00 | 12.00 |
| Lux | 0.00 | 0.00 | 12.50 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Hou | 10.00 | 6.25 | 0.00 | 6.67 | 0.00 | 5.50 | 8.33 | 0.00 | 41.43 | 0.00 | 0.00 |
| Comp | 0.00 | 11.50 | 0.00 | 6.67 | 0.00 | 6.25 | 0.00 | 5.00 | 0.00 | 0.00 | 0.00 |
| Ret Serv | 0.00 | 0.00 | 10.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 15.00 | 0.00 |
| F&D | 0.00 | 0.00 | 0.00 | 6.67 | 0.00 | 6.25 | 0.00 | 5.00 | 0.00 | 0.00 | 0.00 |
| En | 10.00 | 0.00 | 7.50 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Ent | 0.00 | 6.25 | 25.00 | 6.67 | 0.00 | 12.50 | 0.00 | 0.00 | 0.00 | 10.00 | 7.14 |
| Press | 0.00 | 0.00 | 0.00 | 0.00 | 10.00 | 0.00 | 0.00 | 10.00 | 0.00 | 10.00 | 0.00 |
| Imp/Exp | 0.00 | 0.00 | 0.00 | 9.86 | 0.00 | 0.00 | 0.00 | 0.00 | 5.71 | 0.00 | 5.00 |
| PU | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 5.00 | 0.00 |
| TCom | 0.00 | 7.50 | 0.00 | 0.00 | 0.00 | 0.00 | 16.67 | 0.00 | 0.00 | 0.00 | 0.00 |
| Auto | 30.00 | 0.00 | 12.50 | 6.67 | 9.09 | 12.50 | 16.67 | 10.00 | 15.00 | 0.00 | 0.00 |
| Gard | 5.00 | 0.00 | 0.00 | 0.00 | 10.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Man | 0.00 | 0.00 | 0.00 | 0.00 | 10.00 | 7.00 | 0.00 | 0.00 | 7.14 | 10.00 | 3.00 |

**Table 7.** Clusters percentage composition during the period 2004-2005.

In both cases cluster statistics show (once again) positive mean and skewness, and low variability. The Sharpe Ratio is generally higher than in the static analysis. Looking at cluster composition, we first observe that it did not remain unchanged from one period to the other. However, it was possible to isolate dominant sectors. In particular, in the period 2007-2008, B&F companies prevail in five out of thirteen clusters (CL01, CL03, CL07, CL08 and CL13); HC and Imp/Exp firms share dominance in CL02; Hi-Tech is the dominant sector in clusters CL06, CL10, CL11 and CL12. Finally, Logistics and TCom companies are concentrated in CL04 and CL09 respectively. Coupling these results with the Sharpe Ratio values, it seems possible to claim that good performance is mainly due to the leading activity of the Hi-Tech sector. Besides, by comparison with the performance discussed in the static analysis, Germany gives the impression of having suffered from the global crisis with some delay.

| | 2004-2005 (NETG1) | 2007-2008 (NETG2) | 2010-2011 (NETG3) |
|---|---|---|---|
| Average Degree | 1.990 | 1.989 | 1.990 |
| Density | 0.010 | 0.011 | 0.010 |
| Modularity | 0.767 | 0.757 | 0.755 |

**Table 8.** Measures of network organization. A comparison among German market topologies during the periods under examination.
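For readers who wish to compute the measures in Table 8 on their own graphs, the sketch below implements average degree, density and Newman modularity in plain Python. Everything here is illustrative: the function names, the path-shaped tree and the two-triangle toy graph are assumptions made for the example and do not come from the study. The example does show one useful fact: any tree (which is what an MST filter produces) with *n* nodes has average degree 2(*n* − 1)/*n* and density 2/*n*, so with a hypothetical *n* = 207 one obtains values close to those reported in Table 8.

```python
def average_degree(n_nodes, edges):
    """Average number of ties per node in an undirected graph: 2m / n."""
    return 2 * len(edges) / n_nodes

def density(n_nodes, edges):
    """Share of possible undirected edges present: 2m / (n (n - 1))."""
    return 2 * len(edges) / (n_nodes * (n_nodes - 1))

def modularity(edges, community):
    """Newman modularity Q of a partition; `community` maps node -> label."""
    m = len(edges)
    inside = {}  # number of edges with both ends in the same community
    degree = {}  # total degree of the nodes in each community
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            inside[cu] = inside.get(cu, 0) + 1
    return sum(inside.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree.items())

# Any spanning tree on n nodes has n - 1 edges; a simple path is used here
# as a stand-in for an MST-filtered map with 207 nodes (hypothetical layout).
n = 207
tree_edges = [(i, i + 1) for i in range(n - 1)]
ad = average_degree(n, tree_edges)
gd = density(n, tree_edges)

# Two triangles joined by one edge, split into their two obvious groups.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
toy_q = modularity(toy_edges, {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"})
```

Because average degree and density of a tree depend only on the number of nodes, the small differences among the columns of Table 8 for these two measures would be explained by slightly different network sizes, whereas modularity depends on how the nodes are partitioned.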

The most interesting results, in our opinion, come from the analysis of the period December 2004-December 2005. The first element to highlight is that in this case we have only 11 clusters (versus 13 in 2007-2008 and 14 in 2010-2011). As for cluster composition, we now have B&F companies dominating clusters CL01, CL05, CL06, CL07, CL08, and CL10; the fashion sector prevails in CL02, entertainment companies in CL03, and housing companies in CL09, while Hi-Tech leads the remaining clusters (CL04, CL05 and CL11). Comparing these results with those previously discussed, it is quite clear that during the observed period companies reacted to the crisis in different ways: while Hi-Tech and financial companies maintained similar behaviors (as confirmed by their tendency to cluster together), companies in other sectors did not group in any way. A possible explanation might lie in some policy action taken by the national government in order to steer the economy and to protect the sectors with higher exposure.

**CL.ID mu std sk ku SR** CL01 0.0004 0.023 -1.803 189.280 1.9% CL02 0.0005 0.022 -1.400 175.611 2.4% CL03 0.0005 0.022 -1.235 162.575 2.4% CL04 0.0005 0.022 -1.353 169.223 2.4% CL05 0.0005 0.022 -1.209 161.611 2.3% CL06 0.0005 0.023 -1.321 164.853 2.3% CL07 0.0006 0.022 -1.345 180.364 2.8% CL08 0.0004 0.023 -1.705 179.979 1.8%

Graph Mining Based SOM: A Tool to Analyze Economic Stability

http://dx.doi.org/10.5772/51240

17

**Table 9.** Basic statistics for clusters in the Spanish stock exchange. The reference period is: Dec. 2010 - Dec. 2011.

**Figure 6.** The behaviour of Spanish market (log returns) in the period December 2010-December 2011.

discussed for Germany a number of sectors is now missing3

Fig 6, in fact, shows the log returns dynamics in the Spanish market in the period December 2010-December 2011. It sticks out immediately the *spiky* nature of the observed time series.

Moving to the analysis of clusters composition (Table 10), by comparison to the situation

. In addition, companies in the

At the first glance cluster statistics are not as dramatic as to justify the present critical situation of the Spanish market: mean is positive and so the Sharpe Ratio is. Obviously it is quite low, and hence it can be explained as a signal of overall reduced market profitability. Nevertheless, a warning comes matching mean to skewness. Skewness, in fact, is negative: under this light the positivity of the mean can be justified only by the presence of bursts (and hence speculative movements), like viewing at the Spanish market behavior (Fig 6) over the past year confirms.

To conclude, the joint use of SOM and MST makes also possible to analyze the results from a network (graphs theory) perspective. To such aim, Table 8 shows some relevant measures of network organization for the German market in the periods under examination.

Before discussing the values, we briefly explain the meaning of the observed variables. The Average Degree (AD) expresses the average number of ties of the networks nodes and measures how much immediate is the risk of nodes for catching whatever is flowing through the network. In the examined cases higher scores should mean an exposure to abrupt changes in the market arrangement. However, the AD values we have obtained are low and very similar one to each other. The Graph Density (GD) measures how close the network is to be complete: since a complete graph has all possible edges, its GD will be 1: the lower this value, the farther the graph is to be complete. The values in our nets are at least the same and lower. Both NetG1, NetG2 and NetG3 are far to be complete. Note that the reason is in the filtering procedure acted by MST on SOM that cleaned the original map from lesser significant ties. The Modularity, on the other hand, is a concept close to that of clustering, since it examines the attitude to community formation in the net, and it is then strictly related to the possibility to disclose clusters in a net. In order to be significant, values need to be higher than 0.4. This threshold has been largely exceeded in all examined nets.

#### **4.2. The case of Spain**

As done for Germany, we begin by the static analysis during the period: December 2010-De‐ cember 2011. Our procedure identified eight clusters, as shown in Figure 5.

**Figure 5.** Skeleton framework of the Spanish stock exchange in the period: 30 December 2010 - 30 December 2011.


**Table 9.** Basic statistics for clusters in the Spanish stock exchange. The reference period is: Dec. 2010 - Dec. 2011.

At first glance, cluster statistics are not dramatic enough to justify the present critical situation of the Spanish market: the mean is positive, and so is the Sharpe Ratio. The latter is admittedly quite low, which can be read as a signal of overall reduced market profitability. Nevertheless, a warning comes from comparing mean and skewness. Skewness is in fact negative: in this light, the positive mean can be explained only by the presence of bursts (and hence speculative movements), as the behavior of the Spanish market over the past year (Fig. 6) confirms.

**Figure 6.** The behaviour of Spanish market (log returns) in the period December 2010-December 2011.

Fig. 6 shows the dynamics of log returns in the Spanish market in the period December 2010-December 2011. The *spiky* nature of the observed time series stands out immediately.
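The statistics discussed here (mean, standard deviation, skewness, kurtosis and Sharpe Ratio of log returns) can be sketched as follows. This is an illustration under stated assumptions rather than the chapter's procedure: the price series is hypothetical, moments are population moments, kurtosis is reported without subtracting 3, and the Sharpe Ratio is taken as mean over standard deviation with a zero risk-free rate, a choice that is roughly consistent with most of the values tabulated in Tables 11 and 12.

```python
import math


def log_returns(prices):
    """Log return between consecutive prices: ln(P_t / P_{t-1})."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def cluster_statistics(returns):
    """Basic statistics of a return series (population moments).

    Assumptions: kurtosis is not excess kurtosis, and the Sharpe Ratio
    uses a zero risk-free rate (SR = mu / std).
    """
    n = len(returns)
    mu = sum(returns) / n
    var = sum((r - mu) ** 2 for r in returns) / n
    std = math.sqrt(var)
    sk = sum((r - mu) ** 3 for r in returns) / n / std ** 3
    ku = sum((r - mu) ** 4 for r in returns) / n / std ** 4
    return {"mu": mu, "std": std, "sk": sk, "ku": ku, "SR": mu / std}


# Hypothetical price levels (illustration only).
prices = [100.0, 101.0, 99.5, 100.5, 97.0, 98.0]
stats = cluster_statistics(log_returns(prices))
```

A negative `sk` together with a positive `mu` is the pattern the text reads as a warning: the average is sustained by a few large positive bursts while the bulk of the distribution leans towards losses.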

Moving to the analysis of cluster composition (Table 10), a number of sectors is now missing<sup>3</sup> in comparison with the situation discussed for Germany. In addition, companies in the B&F sector are widely disseminated and dominate five out of eight clusters. In the remaining three clusters, Housing (Hou) and Paper Factories (Pap, a new entry with respect to what we have already seen for Germany) have a dominant position.

The aforementioned cluster structure suggests a key to understanding the present financial instability in Spain: the high number of financial companies in the market makes it weak and prone to speculation (as the bursts visible in Fig. 6 confirm). On the other hand, since the Housing sector has been the driving engine of the global crisis, it is reasonable that its greater influence in the composition of the Spanish market has negatively conditioned its behaviour.

**Figure 7.** Skeleton framework of the Spanish market in the periods 2007-2008 (a) and 2004-2005 (b).

| CL.ID | mu | std | sk | ku | SR |
|---|---|---|---|---|---|
| CL01 | 0.00142 | 0.018 | 1.156 | 14.442 | 8% |
| CL02 | -0.00023 | 0.016 | 0.702 | 13.033 | -1.5% |
| CL03 | -0.00028 | 0.026 | -10.400 | 354.521 | -1.1% |
| CL04 | 0.0014 | 0.020 | 0.948 | 7.377 | 7.1% |
| CL05 | -0.0005 | 0.025 | -0.267 | 13.392 | -2.1% |
| CL06 | 0.0009 | 0.017 | 0.375 | 4.462 | 5.3% |
| CL07 | 0.0015 | 0.024 | 7.952 | 185.964 | 6.2% |
| CL08 | 0.0002 | 0.022 | -0.122 | 14.941 | 0.7% |

**Table 11.** Clusters statistics for Spain in the period: 2007-2008.

| CL.ID | mu | std | sk | ku | SR |
|---|---|---|---|---|---|
| CL01 | -0.00006 | 0.021 | 1.888 | 19.464 | -0.3% |
| CL02 | 0.00150 | 0.018 | 1.327 | 18.261 | 8.4% |
| CL03 | -0.00019 | 0.021 | 0.209 | 2.673 | -0.09% |
| CL04 | 0.0012 | 0.015 | 0.782 | 6.485 | 7.7% |
| CL05 | 0.0013 | 0.017 | 0.939 | 7.531 | 7.6% |
| CL06 | -0.0006 | 0.026 | -11.489 | 396.951 | -2.1% |
| CL07 | 0.0014 | 0.029 | 6.702 | 132.794 | 5% |
| CL08 | -0.0001 | 0.024 | -0.249 | 12.979 | -0.5% |

**Table 12.** Clusters statistics for Spain in the period: 2004-2005.


Replicating for Spain the analysis already performed for Germany suggests a number of additional issues to be discussed. Figure 7 shows the market organization in the periods 2007-2008 and 2004-2005, while Tables 11-14 report the corresponding basic statistics and cluster composition.

<sup>3</sup> This is the case, for instance, of Imp/Exp, Ret. Serv., and High Tech.


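| Sector | CL01 | CL02 | CL03 | CL04 | CL05 | CL06 | CL07 | CL08 |
|---|---|---|---|---|---|---|---|---|
| Aero | 0% | 0% | 14.29% | 0.00% | 0% | 0% | 0% | 0% |
| Agr | 0% | 0% | 7.14% | 0.00% | 0% | 0% | 0% | 0% |
| Auto | 0% | 0% | 0.00% | 0.00% | 11.11% | 0% | 0% | 0% |
| B&F | 40% | 22.22% | 21.43% | 12.50% | 11.11% | 44.44% | 0.00% | 20% |
| Chem | 0% | 0% | 7.14% | 0.00% | 0.00% | 11.11% | 10% | 0% |
| En | 10% | 11.11% | 14.29% | 12.50% | 0.00% | 11.11% | 10% | 0% |
| F&D | 10% | 11.11% | 0.00% | 0.00% | 22.22% | 22.22% | 10% | 10% |
| Fas | 10% | 0% | 0% | 12.50% | 0% | 0% | 0% | 10% |
| Hi | 10% | 0% | 0% | 0% | 0% | 0% | 0% | 10% |
| Hou | 10% | 22.22% | 7.14% | 0.00% | 22.22% | 0% | 20% | 0% |
| It | 0% | 0% | 0% | 12.50% | 0% | 0% | 0% | 0% |
| Log | 0% | 0% | 0% | 12.50% | 0% | 0% | 0% | 0% |
| Lux | 0% | 11.11% | 0% | 0% | 11.11% | 0% | 0% | 0% |
| Pap | 0% | 0.00% | 0.00% | 25.00% | 0% | 11.11% | 10% | 10% |
| Pharm | 0% | 11.11% | 7.14% | 0% | 0% | 0% | 10% | 10% |
| Pu | 0% | 0% | 7.14% | 0% | 0% | 0% | 0% | 0% |
| Tcom | 10% | 0% | 0% | 12.50% | 0% | 0% | 20% | 0% |
| Ter | 0% | 0% | 7.14% | 0% | 22.22% | 0% | 10% | 30% |
| Transp | 0% | 11.11% | 7.14% | 0% | 0% | 0% | 0% | 0% |

**Table 10.** Cluster percentage composition for Spain in the period December 2010-December 2011.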



| Sector | CL01 | CL02 | CL03 | CL04 | CL05 | CL06 | CL07 | CL08 |
|---|---|---|---|---|---|---|---|---|
| Aero | 0% | 8% | 0.00% | 0.00% | 10% | 0% | 0% | 0% |
| Agr | 0% | 0% | 0.00% | 10.00% | 0% | 0% | 0% | 0% |
| Auto | 0% | 0% | 0.00% | 0.00% | 0.00% | 8% | 0% | 0% |
| B&F | 17% | 33.33% | 11.11% | 20.00% | 0.00% | 41.67% | 0.00% | 26% |
| Chem | 0% | 25% | 0.00% | 10.00% | 0.00% | 0.00% | 0% | 0% |
| En | 0% | 16.67% | 22.22% | 10.00% | 10.00% | 0.00% | 13% | 5% |
| F&D | 0% | 0.00% | 11.11% | 20.00% | 10.00% | 16.67% | 38% | 0% |
| Fas | 17% | 0% | 11% | 0.00% | 10% | 0% | 13% | 0% |
| Hi | 0% | 8% | 0% | 10% | 0% | 8% | 0% | 0% |
| Hou | 0% | 0.00% | 11.11% | 0.00% | 10.00% | 8% | 13% | 26% |
| It | 0% | 0% | 0% | 0.00% | 0% | 0% | 0% | 5% |
| Log | 0% | 0% | 0% | 0.00% | 0% | 8% | 0% | 0% |
| Lux | 17% | 0.00% | 0% | 0% | 0.00% | 0% | 0% | 5% |
| Pap | 17% | 0.00% | 11.11% | 0.00% | 30% | 0.00% | 0% | 0% |
| Pharm | 17% | 0.00% | 0.00% | 0% | 0% | 0% | 0% | 16% |
| Pu | 0% | 0% | 0.00% | 10% | 0% | 0% | 0% | 0% |
| Tcom | 0% | 0% | 0% | 0.00% | 20% | 0% | 13% | 5% |
| Ter | 0% | 8% | 22.22% | 10% | 0.00% | 8% | 13% | 5% |
| Transp | 17% | 0.00% | 0.00% | 0% | 0% | 0% | 0% | 5% |

**Table 14.** Cluster percentage composition for Spain in the period 2004-2005.

| | 2004-2005 (NETS1) | 2007-2008 (NETS2) | 2010-2011 (NETS3) |
|---|---|---|---|
| Average Degree | 1.75 | 1.974 | 1.974 |
| Diameter | 16 | 19 | 22 |
| Density | 0.25 | 0.026 | 0.026 |
| Modularity | 0.132 | 0.697 | 0.701 |

**Table 15.** Measures of network organization. A comparison among market topologies during the periods under examination.


Looking at Table 11, the basic statistics for 2008 highlight a situation that cannot be interpreted in a precise way: clusters CL01, CL04, CL06, and CL07 have positive mean, skewness and Sharpe Ratio (SR); CL08 has positive mean and SR; CL03 and CL05 are negative in all three; and CL02 is a hybrid of the above states, with negative mean and SR and positive skewness. Going back to 2004, Table 12 shows two clusters (CL06, CL08) negative in mean, SR and skewness, two negative only in mean and SR (CL01, CL03), and all remaining clusters with positive statistics.

The key to understanding the crisis of Spain lies in cluster composition. While in 2004 (Table 14) the Spanish market exhibited a strong component in the Energy sector, this disappears in Table 13, which shows the market organization in 2008. The snapshot of this period shows a market dominated by banks (i.e. exposed to speculation), as well as by sectors such as luxury goods and fashion, which did not provide any protective shield in a period of global crisis.




| Sector | CL01 | CL02 | CL03 | CL04 | CL05 | CL06 | CL07 | CL08 |
|---|---|---|---|---|---|---|---|---|
| Aero | 14% | 0% | 0.00% | 12.50% | 0% | 0% | 0% | 0% |
| Agr | 0% | 0% | 0.00% | 0.00% | 0% | 0% | 9% | 0% |
| Auto | 0% | 0% | 8.33% | 0.00% | 0.00% | 0% | 0% | 0% |
| B&F | 0% | 0.00% | 33.33% | 25.00% | 22.22% | 22.22% | 18.18% | 36% |
| Chem | 0% | 0% | 0.00% | 0.00% | 0.00% | 11.11% | 9% | 9% |
| En | 29% | 18.18% | 0.00% | 12.50% | 0.00% | 0.00% | 18% | 0% |
| F&D | 14% | 9.09% | 16.67% | 12.50% | 0.00% | 11.11% | 18% | 0% |
| Fas | 0% | 0% | 8% | 0.00% | 11% | 11% | 0% | 0% |
| Hi | 0% | 0% | 0% | 0% | 0% | 11% | 9% | 0% |
| Hou | 14% | 9.09% | 8.33% | 12.50% | 33.33% | 0% | 0% | 9% |
| It | 0% | 0% | 0% | 0.00% | 11% | 0% | 0% | 0% |
| Log | 0% | 0% | 8% | 0.00% | 0% | 0% | 0% | 0% |
| Lux | 0% | 9.09% | 0% | 0% | 0.00% | 0% | 0% | 9% |
| Pap | 14% | 9.09% | 8.33% | 0.00% | 0% | 0.00% | 9% | 9% |
| Pharm | 0% | 18.18% | 0.00% | 0% | 0% | 0% | 0% | 18% |
| Pu | 0% | 0% | 0.00% | 13% | 0% | 0% | 0% | 0% |
| Tcom | 14% | 9% | 0% | 12.50% | 0% | 11% | 0% | 0% |
| Ter | 0% | 9% | 8.33% | 0% | 22.22% | 22% | 9% | 0% |
| Transp | 0% | 9.09% | 0.00% | 0% | 0% | 0% | 0% | 9% |

**Table 13.** Cluster percentage composition for Spain in the period 2007-2008.


Turning to network statistics (Table 15), we observe that the values of NETS2 and NETS3 are quite similar; conversely, they differ from those of the first period under examination (NETS1). In the attempt to give the data an economic interpretation, we can say that NETS2 and NETS3 mirror a steady situation. Moreover, judging from the Density values, the Spanish market gives the impression of a place where each company goes its own way. Such a *de-clustering* orientation confirms the present exposure of the country to external speculative attacks.
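As a worked check on the Density values, assume that a network is a spanning tree on $n$ nodes, as the MST filtering would imply (an assumption for illustration; the chapter does not report node counts). Such a tree has $n-1$ edges, so that

$$\mathrm{AD} = \frac{2(n-1)}{n}, \qquad \mathrm{GD} = \frac{2(n-1)}{n(n-1)} = \frac{2}{n}.$$

For $n = 77$ this gives $\mathrm{AD} \approx 1.974$ and $\mathrm{GD} \approx 0.026$, the values reported in Table 15 for NETS2 and NETS3. Under this assumption a low Density is largely a structural consequence of the filtering, and it shrinks further as the number of nodes grows.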



## **5. Conclusion**

In this chapter we provided an example of how to use Self Organizing Maps (SOMs) as a tool to analyze financial stability.

[3] European Central Bank. (2004). *Annual Report*.

University.

2358-2374.

*per 2001/10*.

*of Finance*, 8(4), 537-569.

*Methodology*, 61-99.

*nomica*, 76, 108-131.

[210].

281-299.

*Services Research*, 29, 37-60.

[4] Schwarcz, S. L. (2008). Systemic risk. *Duke Law School Legal Studies Paper 163*, Duke

Graph Mining Based SOM: A Tool to Analyze Economic Stability

http://dx.doi.org/10.5772/51240

23

[5] Martinez-Jaramillo, S., Perez, O., Embriz, F., & Dey, F. (2010). Systemic risk, financial contagion and financial fragility. *Journal of Economic Dynamics & Control*, 34(11),

[6] Allen, F., & Gale, D. (2000). Financial contagion. *Journal of Political Economy*, 108, 1-33.

[7] Nier, E., Yang, J., Yorulmazer, T., & Alentorn, A. (2006). Network models and finan‐

[8] Boss, M., Elsinger, H., Summer, M., & Thurner, S. (2004). The network topology of

[9] Muller, J. (2006). Interbank credit lines as a channel of contagion. *Journal of Financial*

[10] Altman, E. I., & Saunders, A. (1998). Credit risk measurement: developments over

[11] Benito, A., Delgado, F., & Pagés, J. (2004). A synthetic indicator of financial pressure

[12] Bernhardsen, E. (2001). A Model of Bankruptcy Prediction. *Norges Bank Working Pa‐*

[13] Bunn, P., & Redwood, V. (2003). Company Accounts-Based Modelling of Business Failures and the Implications for Financial Stability. *Bank of England Working Paper*

[14] Chava, S., & Jarrow, R. A. (2004). Bankruptcy prediction with industry effects. *Review*

[15] Allison, P. (1982). Discrete-time methods for the analysis of event history. *Sociological*

[16] Bonfim, D. (2009). Credit risk drivers: evaluating the contribution of firm level infor‐ mation and of macroeconomic dynamics. *Journal of Banking and Finance*, 33(2),

[17] Bhattacharjee, A., Higson, C., Holly, S., & Kattuman, P. (2009). Macroeconomic insta‐ bility and business exit: determinants of failures and acquisitions of UK Firms. *Eco‐*

[18] Qu, Y. (2008). Macro Economic Factors and Probability of Default. *European Journal of*

*Economics, Finance and Administrative Sciences* [13], 1450-2275.

cial stability. *Journal of Economic Dynamics & Control*, 31, 2033-2060.

the interbank market. *Quantitative Finance*, 4, 677-684.

the last 20 years. *Journal of Banking and Finance*, 21, 1721-1742.

for Spanish firms. *Banco de Espana, Working paper* [411].

Starting from raw data (price levels) of quoted enterprises, we provided a snapshot of countries' financial situation by applying a hybrid procedure that couples SOMs with the Minimum Spanning Tree (MST). We tested our approach on two markets featuring different levels of (in)stability: the German and the Spanish Stock Exchange.

Our study made it possible to highlight the most important relations among quoted companies, as well as the natural clusters that tend to form in those markets.

In particular, in the case of Germany we captured the country's situation in three periods (2004-2005, 2007-2008 and 2010-2011). The study suggested that the German government was able to pay attention to warning signals emerging from the market. In this way Germany applied measures that allowed it to face last year's critical situation. By protecting sectors with a strong tradition and promoting competition in emerging sectors, Germany played a game that seems to keep the country at the margin of the current global crisis.

On the other hand, the case of Spain suggests the existence of a weak market dominated by banks that has been highly exposed to investors' speculation. Local policy makers neither took warning signals properly into account nor applied corrective or protective measures. On the positive side, our procedure highlighted some directions in which policy makers could operate in order to reduce instability.

To conclude, the joint SOM-MST approach seems able to suggest proper recipes that governments might consider in order to direct their policy efforts.

## **Author details**

Marina Resta\*

Address all correspondence to: resta@economia.unige.it

Department of Economics, University of Genova, Italy



## **Chapter 2**

**Provisional chapter**

## **Social Interaction and Self-Organizing Maps**

Ryotaro Kamimura

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/51705

## 1. Introduction

In this chapter, we consider neuron societies where there are many different types of interactions. In one society, a neuron is connected with others only by the distance between two neurons. In another one, a neuron is connected with others by similarity between neurons, and so on. We here choose a special case where the interaction between neurons is weighted by the distance between them. This simplification aims to apply the new method to the creation of self-organizing maps. With this research, we expect new types of self-organizing maps to appear, ones which take into account the interactions between neurons.

The self-organizing map (SOM) [1] is one of the most well-known techniques in neural networks. In particular, the SOM is commonly used for the visualization of complex data. Paradoxically, one of the main problems of the SOM is that it is difficult to represent final SOM knowledge. This is because self-organizing maps are generally only concerned with competition and cooperation between neurons, without due attention being paid to visualization in the course of learning. Thus, there have been many attempts to visually represent SOM knowledge [1], [2], [3], [4], [5], [6], [7], [8], [9]. However, it is still difficult to visualize SOM knowledge clearly; thus, the present study is an additional attempt at clearly visualizing SOM knowledge. We hypothesize that visualization can be improved by enhancing the characteristics common to neurons based upon their interactions. In addition, our method can be used to control the degree of interaction or cooperation, which contributes to the better visualization of SOM knowledge.

We applied our method to the analysis of Japanese automobile production over a period of twenty years. The automobile industry underwent drastic changes during these years due to severe competition in the development of environmentally friendly and fuel-efficient cars, and in reducing production costs. However, for lack of methods to clarify the overall characteristics of the automobile industry, it has been difficult to identify the main characteristics of automobile production. Our method is expected to focus upon the important characteristics of the automobile industry through social interaction, because two neurons with similar outputs interact with each other. Even if the conventional SOM does not create interpretable representations, our method can be used to create interpretable representations by controlling the degree of interaction.

© 2012 Kamimura; licensee InTech. This is an open access chapter distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

**Figure 1.** Social interaction from an initial state (a) to a final state (d).

In Section 2, we explain a concept of social interaction and how to compute social interaction. Then, we apply the method to the self-organizing maps. We define the KL-divergence between neurons in interaction and usual neurons. By minimizing the KL-divergence, we derive the optimal outputs and connection weights. In Section 3, we present the experimental results applied to the extraction of characteristics of automobile production from the period of 1993 to 2011 in Japan. We first determine the optimal representation to maximize mutual information between neurons and input patterns. Then, we try to interpret connection weights. In the discussion section, we try to interpret the final representations based on the events and incidents of this period.

## 2. Theory and computational methods

#### 2.1. Social interaction

In this chapter, we consider societies formed by the interaction of neurons. Suppose that two neurons' outputs are represented by *v<sub>j</sub>* and *v<sub>m</sub>*, respectively, as shown in Figure 1. Then, the interaction is defined by the product of the two neurons' outputs:

$$\text{interact}\_{jm} = v\_{j} v\_{m}. \tag{1}$$


Social Interaction and Self-Organizing Maps http://dx.doi.org/10.5772/51705


In addition, the distance between two neurons should be considered. Now, suppose that the distance is represented by *h<sub>jm</sub>*. Then, the interaction is modified as

$$\text{interact}\_{jm} = v\_{j} h\_{jm} v\_{m}. \tag{2}$$

The output from the *j*th neuron is defined by the sum of all interactions of the *j*th neuron and is computed by

$$\text{interact}\_{j} = \sum\_{m=1}^{M} v\_{j} h\_{jm} v\_{m}. \tag{3}$$

The relative output after the interaction becomes


$$q(j) = \frac{\text{interact}\_{j}}{\sum\_{m=1}^{M} \text{interact}\_{m}} \tag{4}$$

Then, we suppose that neurons gradually transform from an initial state of society without interaction in Figure 1(a) to a final state with interaction in Figure 1(d). Thus, we should develop a method to model this transformation. Now, let *p*(*j*) denote the relative output without the interaction of the *j*th neuron. Then, this neuron must imitate the corresponding neuron with interaction. The difference between two types of neurons can be defined by the KL-divergence:

$$D = \sum\_{j=1}^{M} p(j) \log \frac{p(j)}{q(j)}.\tag{5}$$

A society of neurons is formed by minimizing this KL-divergence. By minimizing this divergence, the relative output *p*(*j*) becomes closer to the output after the interaction.
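As a rough numerical illustration of Eqs. (3) to (5), the following sketch computes the interaction, the relative outputs with and without interaction, and their KL-divergence. The three output values and the matrix `h` are hypothetical toy values, and we assume here that *h<sub>jm</sub>* acts as a weight that decays with the distance between neurons; neither is taken from the experiments.

```python
import math

def interact(v, h):
    """Eq. (3): interact_j = sum_m v_j * h_jm * v_m."""
    M = len(v)
    return [sum(v[j] * h[j][m] * v[m] for m in range(M)) for j in range(M)]

def normalize(values):
    """Scale non-negative values so that they sum to one."""
    total = sum(values)
    return [x / total for x in values]

def kl_divergence(p, q):
    """Eq. (5): D = sum_j p(j) log(p(j) / q(j)); terms with p(j) = 0 add nothing."""
    return sum(pj * math.log(pj / qj) for pj, qj in zip(p, q) if pj > 0)

# Hypothetical toy values: three neurons on a line.
v = [0.2, 0.5, 0.3]
h = [[1.0, 0.5, 0.1],
     [0.5, 1.0, 0.5],
     [0.1, 0.5, 1.0]]

q = normalize(interact(v, h))  # Eq. (4): relative output after the interaction
p = normalize(v)               # relative output without the interaction
D = kl_divergence(p, q)        # zero only when p and q coincide
```

Minimizing `D` with respect to `p` pulls the outputs without interaction toward `q`, which is the sense in which a neuron imitates its counterpart with interaction.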

#### 2.2. Application to SOM

Let us apply the concept of a society of neurons to the self-organizing maps. The *s*th input pattern of a total of *S* patterns is represented by **x**<sup>*s*</sup> = [*x*<sub>1</sub><sup>*s*</sup>, *x*<sub>2</sub><sup>*s*</sup>, ···, *x*<sub>*L*</sub><sup>*s*</sup>]<sup>*T*</sup>, *s* = 1, 2, ···, *S*. Connection weights into the *j*th neuron of a total of *M* neurons are represented by **w**<sub>*j*</sub> = [*w*<sub>*j*1</sub>, *w*<sub>*j*2</sub>, ···, *w*<sub>*jL*</sub>]<sup>*T*</sup>, *j* = 1, 2, ···, *M*. Then, the *j*th neuron's output can be computed by

$$v\_j^s \propto \exp\left\{-\frac{1}{2} (\mathbf{x}^s - \mathbf{w}\_j)^T \boldsymbol{\Lambda} (\mathbf{x}^s - \mathbf{w}\_j) \right\},\tag{6}$$

where **x**<sup>*s*</sup> and **w**<sub>*j*</sub> are the *L*-dimensional input and weight column vectors, and *L* denotes the number of input units. The *L* × *L* matrix **Λ** is called a "scaling matrix," and the *kl*th element of the matrix, denoted by (**Λ**)<sub>*kl*</sub>, is defined by

$$(\boldsymbol{\Lambda})\_{kl} = \delta\_{kl} \frac{1}{\sigma\_{\alpha}^2}, \quad k, l = 1, 2, \dots, L, \tag{7}$$

where *σ<sub>α</sub>* is a spread parameter defined by

$$
\sigma\_{\alpha} = \frac{1}{\alpha}.\tag{8}
$$

Let us consider the following neighborhood function usually used in self-organizing maps:

$$h\_{jc} \propto \exp\left(-\frac{||\mathbf{r}\_j - \mathbf{r}\_c||^2}{2\sigma\_\gamma^2}\right),\tag{9}$$


where **r**<sub>*j*</sub> and **r**<sub>*c*</sub> denote the positions of the *j*th and the *c*th units in the output space and *σ<sub>γ</sub>* is a spread parameter. Using this neighborhood function, we have

$$\text{interact}\_{j}^{s} = \sum\_{m=1}^{M} h\_{jm} \exp\left\{ -\frac{1}{2} (\mathbf{x}^{s} - \mathbf{w}\_{j})^{T} \boldsymbol{\Lambda} (\mathbf{x}^{s} - \mathbf{w}\_{j}) \right\}. \tag{10}$$

The relative output of the *j*th neuron with interaction can be obtained by

$$q(j \mid s) = \frac{\text{interact}^s\_j}{\sum\_{m=1}^M \text{interact}^s\_m}.\tag{11}$$
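The chain from Eq. (6) to Eq. (11) can be sketched for a single input pattern as follows. This is only a minimal illustration under stated assumptions: the scaling matrix is taken as **Λ** = *α*<sup>2</sup>**I**, which follows from Eqs. (7) and (8); Eq. (10) is implemented as printed; and the weights, grid positions, and parameter values are hypothetical toy values.

```python
import math

def gaussian_output(x, w, alpha):
    """Unnormalised output of Eq. (6) with Lambda = alpha**2 * I (Eqs. 7 and 8)."""
    sq_dist = sum((xi - wi) ** 2 for xi, wi in zip(x, w))
    return math.exp(-0.5 * alpha ** 2 * sq_dist)

def neighborhood(r_j, r_c, sigma_gamma):
    """Unnormalised Gaussian neighborhood function of Eq. (9)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(r_j, r_c))
    return math.exp(-sq_dist / (2.0 * sigma_gamma ** 2))

def relative_outputs(x, weights, positions, alpha, sigma_gamma):
    """q(j | s) of Eq. (11) for one input pattern x."""
    M = len(weights)
    interact = []
    for j in range(M):
        h_sum = sum(neighborhood(positions[j], positions[m], sigma_gamma)
                    for m in range(M))
        # Eq. (10) as printed: the Gaussian term uses w_j,
        # so it factors out of the sum over m.
        interact.append(h_sum * gaussian_output(x, weights[j], alpha))
    total = sum(interact)
    return [value / total for value in interact]

# Hypothetical toy setup: three neurons on a 1 x 3 grid, two input units.
weights = [[0.0, 0.0], [1.0, 1.0], [0.5, 0.5]]
positions = [(0, 0), (0, 1), (0, 2)]
q = relative_outputs([0.5, 0.5], weights, positions, alpha=4.0, sigma_gamma=1.0)
```

In this toy setup the third neuron, whose weight vector equals the input, receives the largest relative output. Between the two equally distant neurons, the one in the middle of the grid receives more because it has more close neighbors.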

Let *p*(*j* | *s*) denote the relative output from the *j*th neuron without interaction; then the KL-divergence is defined by

$$D = \sum\_{s=1}^{S} p(s) \sum\_{j=1}^{M} p(j \mid s) \log \frac{p(j \mid s)}{q(j \mid s)}. \tag{12}$$

By minimizing this divergence, we have

$$p^\*(j \mid s) = \frac{q(j \mid s) \exp\left\{-\frac{1}{2} (\mathbf{x}^s - \mathbf{w}\_j)^T \boldsymbol{\Lambda} (\mathbf{x}^s - \mathbf{w}\_j)\right\}}{\sum\_{m=1}^M q(m \mid s) \exp\left\{-\frac{1}{2} (\mathbf{x}^s - \mathbf{w}\_m)^T \boldsymbol{\Lambda} (\mathbf{x}^s - \mathbf{w}\_m)\right\}}. \tag{13}$$

Then, by substituting *p*\*(*j* | *s*) for *p*(*j* | *s*), we have the well-known free energy function [10], [11]

$$F = -2\sigma\_{\alpha}^2 \sum\_{s=1}^{S} p(s) \log \sum\_{j=1}^{M} q(j \mid s) \exp\left\{ -\frac{1}{2} (\mathbf{x}^s - \mathbf{w}\_j)^T \boldsymbol{\Lambda} (\mathbf{x}^s - \mathbf{w}\_j) \right\}. \tag{14}$$

By differentiating the free energy with respect to the connection weights and setting the result to zero, we obtain the connection weights

$$\mathbf{w}\_{j} = \frac{\sum\_{s=1}^{S} p^\*(j \mid s)\mathbf{x}^s}{\sum\_{s=1}^{S} p^\*(j \mid s)}. \tag{15}$$
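Equations (13) and (15) suggest an alternating update: compute *p*\*(*j* | *s*) from the current weights, then re-estimate each weight vector as a weighted mean of the input patterns. The sketch below shows one such step. It assumes **Λ** = *α*<sup>2</sup>**I** and takes the values *q*(*j* | *s*) as given in a hypothetical table `Q`; the data and parameter values are toy values, and the sketch is not the implementation used in the experiments.

```python
import math

def firing_probabilities(x, weights, q_s, alpha):
    """Eq. (13): p*(j|s) proportional to q(j|s) exp(-(alpha**2 / 2) ||x - w_j||**2)."""
    scores = [
        q_j * math.exp(-0.5 * alpha ** 2 * sum((a - b) ** 2 for a, b in zip(x, w)))
        for q_j, w in zip(q_s, weights)
    ]
    total = sum(scores)
    return [score / total for score in scores]

def update_weights(X, weights, Q, alpha):
    """Eq. (15): w_j = sum_s p*(j|s) x^s / sum_s p*(j|s)."""
    P = [firing_probabilities(x, weights, q_s, alpha) for x, q_s in zip(X, Q)]
    n_inputs = len(X[0])
    new_weights = []
    for j in range(len(weights)):
        denom = sum(P[s][j] for s in range(len(X)))
        new_weights.append(
            [sum(P[s][j] * X[s][k] for s in range(len(X))) / denom
             for k in range(n_inputs)]
        )
    return new_weights, P

# Hypothetical toy data: two one-dimensional patterns, two neurons, uniform q(j|s).
X = [[0.0], [1.0]]
weights = [[0.2], [0.8]]
Q = [[0.5, 0.5], [0.5, 0.5]]
new_weights, P = update_weights(X, weights, Q, alpha=2.0)
```

Each updated weight vector is a convex combination of the input patterns, so it stays within the range of the data. For large values of `alpha` the exponentials can underflow to zero, so a practical implementation would subtract the largest exponent before normalizing.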

## 3. Experiments


#### 3.1. Data description and network architecture

The automobile industry has undergone drastic changes in recent years because of growing interest in environmental problems and severe competition among automobile manufacturers around the world. In particular, the Japanese automobile industry has undergone major changes in developing advanced technologies and lowering manufacturing costs. Among advanced technologies, much attention has been paid to more fuel-efficient automobiles, such as electric, hybrid, and fuel cell vehicles. In addition, the strong appreciation of the Japanese yen has made it impossible to produce automobiles at lower cost in Japan. These drastic changes should therefore be reflected in the production and sales of automobiles in Japan. However, it has been difficult to extract overall characteristics from complex automobile production and sales data. Here we focus on the analysis of automobile production and try to show its main characteristics over these twenty years.

The automobile production data covered the years 1993 to 2011. There were eight variables: standard, small, and mini passenger cars; standard, small, and mini trucks; and large and small buses. The data were normalized to range between zero and one. We examined what kinds of characteristics could be obtained by visualizing the data with our method and compared the results with those of the conventional SOM. Figure 2 shows the network architecture for the automobile data. The network had eight input units, corresponding to the eight variables, and 288 (24 × 12) neurons in the output layer. We used a large network to visualize the final results clearly.
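The preprocessing and map layout described above can be sketched as follows. The text states only that the data were normalized to lie between zero and one, so the per-variable min-max scaling used here is an assumption, as is the choice of 24 rows by 12 columns for the grid; the sample rows are made-up numbers, not production figures.

```python
def min_max_normalize(rows):
    """Scale each column (variable) of a table to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0
         for x, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

def grid_positions(n_rows, n_cols):
    """Coordinates r_j of the output neurons on a rectangular map."""
    return [(r, c) for r in range(n_rows) for c in range(n_cols)]

# Made-up example with three years and two variables.
normalized = min_max_normalize([[1.0, 10.0], [3.0, 30.0], [2.0, 20.0]])
positions = grid_positions(24, 12)  # 288 output neurons
```

A constant column would make the denominator zero, so the sketch maps such a column to zero rather than dividing.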

**Figure 2.** Network architecture for the automobile data.

#### 3.2. Optimal representation and mutual information

The social interaction method can produce many different types of networks, depending on the degree of interaction and competition. The degree of interaction can be changed through the parameter *α*. Thus, we must choose an appropriate representation among them. One possibility is to use mutual information between neurons and input patterns. When this mutual information increases, neurons tend to contain more information on input patterns. Mutual information can be defined by

$$I(\alpha) = \sum\_{s=1}^{S} p(s) \sum\_{j=1}^{M} p(j \mid s; \alpha) \log \frac{p(j \mid s; \alpha)}{p(j; \alpha)}. \tag{16}$$
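A small sketch of this quantity in its standard form, *I* = Σ<sub>*s*</sub> *p*(*s*) Σ<sub>*j*</sub> *p*(*j* | *s*) log[*p*(*j* | *s*) / *p*(*j*)], is given below. It assumes that input patterns are equally likely, *p*(*s*) = 1/*S*, and that the marginal is *p*(*j*) = Σ<sub>*s*</sub> *p*(*s*) *p*(*j* | *s*); both are assumptions of this illustration, and the two probability tables are toy values.

```python
import math

def mutual_information(P):
    """Mutual information between neurons and input patterns.

    P[s][j] holds p(j | s); input patterns are assumed equally likely.
    """
    S = len(P)
    M = len(P[0])
    p_s = 1.0 / S
    p_j = [sum(p_s * P[s][j] for s in range(S)) for j in range(M)]
    info = 0.0
    for s in range(S):
        for j in range(M):
            if P[s][j] > 0:  # terms with p(j | s) = 0 contribute nothing
                info += p_s * P[s][j] * math.log(P[s][j] / p_j[j])
    return info

# Each pattern fires its own neuron: maximal information, log 2 for two patterns.
sharp = mutual_information([[1.0, 0.0], [0.0, 1.0]])
# Every neuron responds equally to every pattern: no information.
flat = mutual_information([[0.5, 0.5], [0.5, 0.5]])
```

The two extremes show why the measure is useful for choosing a representation: it is largest when different input patterns activate different neurons and vanishes when the neurons do not discriminate between patterns.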

**Figure 3.** Mutual information as a function of the parameter *α* (a) and with a fixed value (1/10) of the parameter *α* (b).


One problem with this mutual information is that it increases steadily as the Gaussian width decreases, that is, as the parameter *α* increases, as shown in Figure 3(a). Thus, we must assign a constant value to the parameter *α* when computing mutual information. Note that in actual learning the parameter *α* was changed from one to ten; it was fixed only for computing mutual information. Figure 3(b) shows this mutual information when the parameter *α* was set to 1/10. As can be seen in the figure, mutual information increased initially and reached its highest point when the parameter *α* was 4. Then, mutual information gradually decreased. Though mutual information increased with the parameter *α* in Figure 3(a), the mutual information computed with the fixed value did not increase when the parameter *α* was increased beyond 4 in Figure 3(b). Thus, we can say that when the parameter *α* was 4, we could obtain an optimal representation, one that had the maximum amount of information on input patterns.

Figure 4 shows the U-matrices when the parameter *α* was changed from 1 (a) to 10 (i). When the parameter *α* was 1 in Figure 4(a), the central class boundary was too large. When the parameter *α* was 2, this large class boundary became smaller, as shown in Figure 4(b). When the parameter *α* was further increased to 3 in Figure 4(c), a class boundary in warmer colors on the upper side of the matrix became clearer, and other class boundaries began to appear on the lower side of the matrix. When the parameter *α* was 4 in Figure 4(d), the class boundary on the upper side of the matrix became the clearest and the other boundaries on the lower side became much clearer. Then, when the parameter *α* was further increased from 5 in Figure 4(e) to 10 (i), the class boundaries gradually deteriorated. These results corresponded to those of mutual information in Figure 3(b): when the parameter *α* was 4, mutual information reached its maximum, and it then gradually decreased. When mutual information reached its maximum, the clearest representation, in Figure 4(d), could be obtained.
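For readers unfamiliar with the U-matrix, a simplified version can be computed as the mean distance between each neuron's weight vector and those of its neighbors on the map; large values mark class boundaries. The sketch below assumes a rectangular grid with four-connected neighbors and row-major ordering of the weights. These are simplifying assumptions of this illustration, and the U-matrices in the figures may have been computed on a different lattice.

```python
import math

def u_matrix(weights, n_rows, n_cols):
    """Mean Euclidean distance from each neuron's weights to its grid neighbors.

    weights is a list of n_rows * n_cols weight vectors in row-major order.
    """
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    u = [[0.0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            neighbors = [
                (r + dr, c + dc)
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                if 0 <= r + dr < n_rows and 0 <= c + dc < n_cols
            ]
            if neighbors:
                here = weights[r * n_cols + c]
                u[r][c] = sum(
                    dist(here, weights[rr * n_cols + cc]) for rr, cc in neighbors
                ) / len(neighbors)
    return u

# Toy 1 x 4 map: two similar neurons on the left, two on the right.
toy = u_matrix([[0.0], [0.1], [0.9], [1.0]], 1, 4)
```

In the toy map the two middle neurons get the largest values, marking the boundary between the left and right groups.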

#### 3.3. Interpretation of optimal representation

We interpret the optimal representation with maximum mutual information, obtained when the parameter *α* was 4. Figure 5 shows the U-matrix and labels with class boundaries when the parameter *α* was 4. As shown in Figure 5(1), a clear class boundary in warmer colors could be detected on the upper side of the matrix. Additionally, several minor class boundaries were located on the lower side of the matrix. From these boundaries and the labels in Figure 5(2), the data were classified into three classes (periods). The first period (a) represented production from 1993 to 1998. The second period ranged between 1999 and 2006, and the third period between 2007 and 2011. Within the third period, the years 2007 and 2008 and the year 2011 were separated from the years in the middle. In addition, we can see that in the first and the third periods, the data were arranged from right to left. On the other hand, in the second period, the data were arranged from left to right.


Figure 6 shows the connection weights from the eight variables. As shown in Figure 6(a3), the production of mini-cars was very large in the second and third periods, as indicated by the warmer colors. On the other hand, standard, small, and mini trucks were produced more heavily in the first period, as shown in Figure 6(b1), (b2), and (b3). In the third period, standard passenger cars and small buses were produced in large numbers, as represented by the warmer colors in Figure 6(a1) and (c2). In addition, for all variables, the lower left-hand part of the map was very low, shown in dark blue. This means that automobile production was lowest around 2011.

Figure 7 shows the connection weights into nine typical neurons, whose locations are shown on the map in Figure 5(2). In the first period, the production of small passenger cars and trucks was large, and the levels of production decreased gradually from (a3) to (a1). In the second period, production gradually increased. In particular, the production of mini-cars increased from left (b1) to right (b3). At the beginning of the third period, in Figure 7(c3), the production of standard passenger cars and small buses was much higher than that of any other type of vehicle. However, production decreased gradually in Figure 7(c2). Finally, in 2011, shown in Figure 7(c1), overall production was very low, but the production of mini-cars remained relatively high.


**Figure 4.** U-matrices when the parameter *α* was changed from 1 (a) to 10 (i).


**Figure 5.** U-matrices (1) and labels (2) when the parameter *α* was 4.

#### 3.4. Comparison with SOM and PCA

Here we compare the results of our method with those obtained by the standard SOM and PCA. Figure 8 shows the U-matrix and labels obtained by the conventional SOM. We used the SOM toolbox for the experiments [4]. As can be seen in Figure 8(a), two class boundaries in warmer colors appeared on the upper side and the lower left-hand side of the matrix, but they were rather ambiguous. The labels in Figure 8(b) show that these class boundaries corresponded to those in Figure 5(2).

Figure 9 shows the information that the *j*th neuron contains on the input neurons. Let *p*(*k* | *j*) denote the relative output of the *k*th input neuron for the *j*th neuron; then, the information of the *j*th neuron on the input neurons is defined by

$$I\_j = \log L + \sum\_{k=1}^{L} p(k \mid j) \log p(k \mid j), \tag{17}$$

where


$$p(k \mid j) = \frac{w\_{jk}}{\sum\_{l=1}^{L} w\_{jl}}.\tag{18}$$
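Equations (17) and (18) amount to log *L* minus the entropy of a neuron's normalized weights, which the following sketch computes. It assumes non-negative weights with a positive sum, which is plausible for data normalized to lie between zero and one; the two weight vectors are toy values.

```python
import math

def neuron_information(w_j):
    """Eq. (17): I_j = log L + sum_k p(k|j) log p(k|j), with p(k|j) from Eq. (18)."""
    L = len(w_j)
    total = sum(w_j)
    p = [w / total for w in w_j]  # Eq. (18)
    # Terms with p(k|j) = 0 are skipped, using the convention 0 log 0 = 0.
    return math.log(L) + sum(pk * math.log(pk) for pk in p if pk > 0)

# Toy weight vectors over eight input variables.
uniform = neuron_information([0.5] * 8)           # all variables equally strong
focused = neuron_information([0.0] * 7 + [1.0])   # a single dominant variable
```

The information is zero when all variables contribute equally and reaches log *L* when a single variable dominates, which is why a neuron that responds almost only to one variable shows the highest information.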

**Figure 6.** Connection weights from all variables when the parameter *α* was 4.

Figure 9 shows this information computed by the social interaction (a) and SOM (b). As shown in Figure 9(a), three classes could be seen on the map obtained by the social interaction. On the other hand, with the SOM, as in Figure 9(b), the boundaries between the three classes were not always clear. On the lower left-hand side of the maps by both the social interaction and SOM, neurons with the highest information on input neurons appeared. This part corresponded to the year 2011, when only mini-cars were produced in large numbers. This suggests that the year 2011 showed the most explicit characteristic of all periods: the number of mini-cars produced was much larger than that of any other type of car.

Figure 10 shows the results of PCA applied to the data itself (a), to the connection weights by the conventional SOM (b), and to those by the social interaction (c). With PCA applied to the data itself, seen in Figure 10(a), three classes were observed but they overlapped extensively. Figure 10(b) shows the results of PCA applied to the connection weights by the conventional SOM. Though three classes could be observed, many weights were scattered between boundaries. Finally, when the social interaction was used, in Figure 10(c), the classes were clearly separated.

**Figure 7.** Connection weights into six typical neurons in Figure 5(2).



**Figure 9.** Information content of neurons on input neurons by social interaction (a) and SOM (b).

#### 3.5. Discussion

#### *3.5.1. Summary of Results*

Let us summarize the main results for automobile production. In the first period (the 1990s), automobile production gradually decreased, as shown in Figure 7(a3) to (a1). In the second period (the beginning of the 2000s), production instead increased, and in particular the production of mini-cars increased, as shown in Figure 7(b3) to (b1). Then, at the beginning of the third period (2007 and 2008), the production of standard passenger cars and small buses increased significantly, as shown in Figure 7(c3). Production then decreased again, as shown in Figure 7(c2). Finally, in 2011, only mini-cars maintained relatively high production rates, while all the other types of car showed rather low production rates, as shown in Figure 7(c1).

#### *3.5.2. Explaining by Actual Events*


**Figure 8.** U-matrix (a) and labels (b) by the conventional SOM.


These characteristics can be explained by two important factors occurring in these periods: the revised regulation law for mini-cars and the economic crisis called the "Lehman shock."

First, the class boundary between the first and second periods can be explained by the revised regulation law for mini-cars in 1998. In the first period, all types of cars were being produced equally, except standard cars, mini-cars and small buses. In the 2000s, only the production of mini-cars increased, albeit gradually. We examined the events and incidents around this boundary period, and found that the automobile regulation by the Japanese government was revised in 1998. In this revision of the safety regulation for mini-cars, the permitted size of mini-cars became larger and the required safety levels became higher, giving them performance comparable to that of larger cars. Because of this revision, the Japanese automobile market changed drastically around 1998.


Second, the third period can be explained by the economic crisis of 2008. High production of standard passenger cars and small buses was observed at the beginning of the third period, around 2007 and 2008, as shown in Figure 7(c3). In this period, the well-known "Lehman Shock" occurred, and the resulting economic crisis damaged the Japanese automobile industry. In particular, the increase in the production of standard passenger cars in this period was one of the main causes of trouble in the automobile industry.

#### *3.5.3. Implication for Automobile Industry*

Considering these results and facts, we can point out two factors concerning the automobile industry, namely, policy and planning.

First, one important factor in the development of the automobile industry is industrial policy. The industry needs to be guided by effective industrial policy, conceptualized and implemented by the government. In our experimental results, the revised regulation law for mini-cars drastically changed the market, leading to a sharp increase in the production of mini-cars.

Second, production should be planned more carefully. The increase in production at the beginning of the 2000s had long-lasting negative effects on the automobile industry. We observed that production at the beginning of the 2000s was focused on mini-cars, meaning that smaller cars were generally preferred. Despite this, standard passenger cars were produced in large numbers at the beginning of the period. Even if the majority were for export, more restrained production would have been advisable and would have lessened the damage from the economic crisis.

#### *3.5.4. Problems of the method*

Though our method has shown better performance in visualization, we should point out two problems, namely, optimality and topological preservation.

First, we used mutual information to obtain optimal representations. In other words, mutual information was used to choose the optimal value of the parameter *α*. When mutual information increases, neurons tend to respond very specifically to input patterns, so that representations become simpler. However, one problem is that training did not maximize this mutual information directly, but rather minimized the KL-divergence. Thus, we need to examine the relation between KL-divergence and mutual information more carefully.

Second, we should examine the relation between visualization and topological preservation. We have shown that the method worked better to clarify class boundaries. When visualization is improved, it may happen that topological relations cannot be maintained, because better visualization enhances some parts of input patterns at the expense of topological preservation. We have not yet finished examining the relation between the improved performance and topological preservation. Even if the performance in visualization is improved, if topological relations are not preserved, then the reliability of the final maps decreases. Thus, we should examine the relationship between visual performance and topological preservation more precisely.
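The mutual information criterion mentioned in the first problem can be made concrete with a short sketch. It assumes that the firing probabilities p(j | s) of neuron j for input pattern s are available as a matrix with one row per pattern, and that patterns are equiprobable unless stated otherwise; the function name `mutual_information` and the toy matrices are hypothetical.

```python
import numpy as np

def mutual_information(p_j_given_s, p_s=None):
    """I = sum_s p(s) sum_j p(j|s) log(p(j|s) / p(j)), measured in nats."""
    p_j_given_s = np.asarray(p_j_given_s, dtype=float)
    n_patterns = p_j_given_s.shape[0]
    if p_s is None:
        # Assumption: all input patterns are equally likely.
        p_s = np.full(n_patterns, 1.0 / n_patterns)
    p_j = p_s @ p_j_given_s  # marginal firing probability of each neuron
    # Terms with p(j|s) = 0 contribute nothing, by the usual convention.
    with np.errstate(divide="ignore", invalid="ignore"):
        log_ratio = np.where(p_j_given_s > 0, np.log(p_j_given_s / p_j), 0.0)
    return float(np.sum(p_s[:, None] * p_j_given_s * log_ratio))

# Neurons that respond identically to every pattern carry no information.
uniform = np.full((3, 4), 0.25)
# Each pattern fires exactly one distinct neuron: maximally specific responses.
specific = np.eye(3)

mi_uniform = mutual_information(uniform)
mi_specific = mutual_information(specific)
print(mi_uniform, mi_specific)
```

For the uniform case the value is zero, and for the fully specific case it equals log 3, the entropy of three equiprobable patterns. This matches the statement that higher mutual information corresponds to neurons responding more specifically to input patterns.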

#### *3.5.5. Possibilities of the Method*


The main possibilities of the method are summarized by two points, namely, flexibility and new self-organizing maps.

First, one of the main beneficial characteristics of our method is its flexibility. Fundamentally, we aim to create a general theory of social interaction. For that, we must take into account many types of interaction. For simplification, social interaction is assumed to be the product of two neurons; thus, in this study, only the distance between two neurons was taken into account. However, it is easy to include any kind of interaction simply by substituting new probabilities for the present probabilities *q*(*j* | *s*). For example, we can imagine a case where, even if the distance between neurons is very large, they may still be strongly connected with each other. This kind of interaction can also be taken into account.

Second, we can create new types of self-organizing maps based upon the social interaction. As mentioned above, our method can create a variety of interactions between neurons. Based upon these different types of interaction, it is possible for networks to self-organize, leading to characteristics different from those by the conventional SOM. If we take into account the different types of cooperation between neurons, new types of self-organizing maps can be created.
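The idea of substituting a different *q*(*j* | *s*) can be illustrated with a hedged sketch. It is not the formulation of this chapter, whose exact equations are given in the earlier sections: here the collective output is assumed to be a normalized, weighted product of individual outputs, the KL-divergence is taken from individual to collective outputs and averaged over patterns, and the names `product_interaction`, `kl_divergence`, `phi_distance` and `phi_long_range` are all hypothetical.

```python
import numpy as np

def product_interaction(p, phi):
    """Assumed form: q(j|s) proportional to prod_m p(m|s) ** phi[j, m]."""
    log_q = np.log(p) @ phi.T
    log_q -= log_q.max(axis=1, keepdims=True)  # stabilize before exponentiating
    q = np.exp(log_q)
    return q / q.sum(axis=1, keepdims=True)

def kl_divergence(p, q):
    """Mean over patterns of KL(p(.|s) || q(.|s)); the direction is an assumption."""
    return float(np.mean(np.sum(p * np.log(p / q), axis=1)))

# Five neurons on a line; interaction weights decay with squared distance.
positions = np.arange(5.0)
dist2 = (positions[:, None] - positions[None, :]) ** 2
phi_distance = np.exp(-dist2 / 2.0)

# Alternative interaction: the two most distant neurons are strongly linked.
phi_long_range = phi_distance.copy()
phi_long_range[0, 4] = phi_long_range[4, 0] = 1.0

# Hypothetical individual outputs p(j|s) for three input patterns.
rng = np.random.default_rng(0)
logits = rng.normal(size=(3, 5))
p = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

for name, phi in [("distance", phi_distance), ("long-range", phi_long_range)]:
    q = product_interaction(p, phi)
    print(name, kl_divergence(p, q))
```

Only the weight matrix changes between the two cases, so any other kind of cooperation between neurons could be tried by supplying a different `phi` or a different interaction function.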

#### 4. Conclusion

In this chapter, we proposed a new type of information-theoretic method in which neurons are supposed to form a society. In this society, the interaction of neurons is the product of all neighboring neurons' outputs weighted by their distance. The individual neuron tries to imitate this interaction as much as possible. The difference between neurons with and without interaction is computed by the KL-divergence. By minimizing the KL-divergence, we can obtain the optimal outputs of the neuron and the free energy. By differentiating the free energy, we can obtain the re-estimation rules for connection weights.

We applied our method to data on the production of Japanese automobiles from 1993 to 2011. We can summarize the final results from two points of view. First, from a technical point of view, the new method showed better performance in clarifying class boundaries than the conventional SOM. The explicit class boundaries were due to the interaction of neurons, with similar neurons interacting strongly with each other in terms of distance and firing rates. Second, the strong class boundaries were traced back to important events or incidents which occurred in the period. For example, the class boundary between the first and the second period was due to the revision of the regulation law for mini-cars. Thanks to this revision, the number of mini-cars in production increased gradually. In the third period, a significant production increase at the beginning of the period was followed by a decrease in the production of other models, with only mini-cars being produced in large numbers in the end. This period was well explained by the economic crisis in 2008.

Though there are some problems such as optimality and topological preservation, as explained in the discussion section, we have shown that it is possible to create different types of neuron societies, where different kinds of interaction can be implemented.

#### Author details

Ryotaro Kamimura

IT Education Center, Tokai University, Japan

#### References

[1] J. W. Sammon. A nonlinear mapping for data structure analysis. *IEEE Transactions on Computers*, C-18(5):401–409, 1969.

**Chapter 3**

**Using Wavelets for Feature Extraction and Self-Organizing Maps for Fault Diagnosis of Nonlinear Dynamic Systems**

Héctor Benítez-Pérez, Jorge L. Ortega-Arjona and Alma Benítez-Pérez

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/50235

**1. Introduction**

Fault diagnosis has been established in two main approaches: model-based fault diagnosis and model-free fault diagnosis. The present chapter focuses on the latter, mainly as an extension of the approach proposed in [17]. The challenge here is to classify faults at early stages, with an accurate response. However, as the term model-free implies, a model of the plant is available neither for fault-free nor for fault-present scenarios. The objective, thus, is to classify faults based on the system's response and the related signal analysis, in terms of dilation and shift decomposition, as obtained by a wavelet approach. Self-organizing maps (SOM) are therefore proposed as a powerful nonlinear neural network to achieve such fault classification.

Several strategies have been proposed for feature extraction using wavelets. For instance, [1] presents a wavelet packet feature extraction, based on the analysis and measure of a "distance" between the energy distribution of some signal classes and the proper classification by the use of fuzzy sets. Alternatively, [2] proposes the use of wavelets as a strategy of parametric system identification, giving prime emphasis to wavelet properties and parameter relations. The idea of using wavelets for fault classification is a powerful procedure for feature extraction in several scenarios, even in the case of frequency and power shifts. [3] and [4] have explored this approach for process systems, in which practical results are satisfactory, regardless of the classification. Moreover, several other strategies using wavelets have been proposed for abnormal signal detection, like that presented in [5], in which a parasitic wavelet transform is proposed. Further research in the same direction is followed in [6], in which a cubic spline methodology is proposed for the boundary

> © 2012 Benítez-Pérez et al.; licensee InTech. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


