Where:

(*t*) is a learning rate, at time *t*;

$h_{ib}(t)$ is a neighbourhood function, at time *t*;

$\mathbf{y}_i(t)$ is the recursive difference of the neuron *i*, at time *t*.

A RSOM algorithm flow diagram is exhibited in Figure 2.

Fig. 2. RSOM algorithm flow diagram

The basic differences between RSOM and SOM networks are:

- To determine the winner neuron, RSOM must calculate and record the recursive difference $\mathbf{y}_i(t)$, whereas in SOM the selection criterion is the quantization error;
- The winner neuron in RSOM is the one with the smallest recursive difference $\mathbf{y}_i(t)$, whereas in SOM it is the one with the smallest quantization error.

Thus, the RSOM takes past inputs into account and explicitly remembers space-time patterns.

Note that if the leaking coefficient equals 1, the RSOM network becomes identical to a SOM network (Salhi et al., 2009). Angelovič (2005) describes several advantages of using RSOM in prediction systems. First, its computational complexity is small compared with global models. Second, learning is unsupervised, which allows models to be built from the data with little a priori knowledge.

## **3. Materials and methods**

This section discusses the study area, data pre-processing and model training.

### **3.1 Study area and data pre-processing**

The study used sounding data from the weather station designated SBBE, number 82193 (Belem airport), for the period 2003 to 2010. The data were collected from the University of Wyoming website. Figure 3 shows the station location in the eastern Amazon Region, with its geographic coordinates (latitude 1.38 S, longitude 48.48 W).

Fig. 3. SBBE station localization (Belem airport).

#### **3.1.1 Data characteristics**

The sounding data are obtained by radiosondes carried by meteorological balloons. A radiosonde can measure various atmospheric parameters, such as atmospheric pressure, temperature, dewpoint temperature and relative humidity, at various atmospheric levels. These parameters are used to calculate sounding indices that describe the state of the atmosphere at a given time. Figure 4 shows an example of sounding indices collected from a radiosonde launched on January 1, 2010 at 12 h UTC.

Several indicators have been developed to evaluate atmospheric static stability for thunderstorm forecasting (Peppler, 1988). Some treat the temperature and humidity differences between two pressure levels as instability factors; others also include the wind characteristics (speed and direction) at the same pressure levels. There are also indices based on the energy required for convective phenomena to occur. Indices and parameters used for thunderstorm forecasting include the Showalter Index, K Index, Lifted Index, Cross Totals Index, Vertical Totals Index, Total Totals Index, SWEAT Index, Convective Inhibition, Convective Available Potential Energy, Level of Free Convection and Precipitable Water, among others.

Recurrent Self-Organizing Map for Severe Weather Patterns Recognition 159


Fig. 4. SBBE station information and sounding indices

#### **3.1.2 Data selection**

A selection algorithm was used to identify all 24 atmospheric indices available from radiosoundings performed at 12 h UTC (9 h local time) in the period analyzed. The indices calculated with virtual temperature were then eliminated, leaving 18 indices.

After these 18 indices were normalized by their standard deviation, principal component analysis was used to reduce the number of variables. Of the 18 principal components, the first three accounted for about 70% of the total variance. Four variables related to severe weather conditions had large coefficients in the linear combinations that define these principal components: SWEAT index (SWET), Convective Available Potential Energy (CAPE), Level of Free Convection (LFCT) and Precipitable Water (PWAT). These four variables were therefore defined as the input vector variables of the SOM, TKM and RSOM networks. Figure 5 shows the variance explained by the principal components, up to the ninth principal component.

Fig. 5. Variance explained for principal components
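The variable-reduction step described above can be illustrated with a minimal sketch. It assumes NumPy, uses synthetic data in place of the 18 sounding indices, and reads "considerable coefficients" as the largest absolute loadings on the first three components; the function name `explained_variance_ratio` and the ranking rule are illustrative assumptions, not the procedure used in the study.

```python
import numpy as np

def explained_variance_ratio(X):
    """Return (variance ratios, loadings) from a PCA of the columns of X."""
    # Centre each index and scale it by its standard deviation.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Covariance of the standardized data, i.e. the correlation matrix.
    corr = np.cov(Z, rowvar=False)
    # eigh returns eigenvalues in ascending order, so reverse them.
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return eigvals / eigvals.sum(), eigvecs

# Synthetic stand-in for the indices (hypothetical data, 18 columns).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 18)) @ rng.normal(size=(18, 18))

ratios, loadings = explained_variance_ratio(X_demo)
cumulative = np.cumsum(ratios)  # cumulative share of the total variance

# Assumed ranking rule: the four variables with the largest absolute
# coefficient on any of the first three principal components.
top = np.argsort(np.abs(loadings[:, :3]).max(axis=1))[::-1][:4]
```

Here `cumulative[2]` gives the share of variance captured by the first three components, the quantity reported as about 70% for the sounding indices.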

The SWEAT index (Severe Weather Threat Index) combines several variables (dewpoint, wind speed and direction, among others) to estimate the likelihood of severe weather. The Convective Available Potential Energy (CAPE) is the integral of the positive area on a Skew-T sounding diagram. It exists when the difference between the equivalent potential temperature of the air parcel and the saturated equivalent potential temperature of the environment is positive, which means that the displaced air parcel, following its pseudo-adiabat, is warmer than the environment (unstable condition). The Level of Free Convection (LFCT) is the lower boundary of the CAPE region. At this level the temperature of a lifted air parcel equals that of the environment. Once an air parcel is lifted to the LFCT, it rises all the way to the top of the CAPE region. The Precipitable Water (or Precipitable Water Vapor) gives the amount of moisture in the troposphere.

#### **3.1.3 Data cleansing**

158 Recurrent Neural Networks and Soft Computing


The input vectors contained four variables: SWEAT index (SWET), Convective Available Potential Energy (CAPE), Level of Free Convection (LFCT) and Precipitable Water (PWAT). Vectors with missing data in one or more variables were discarded. At the end of the cleansing process, 1774 examples remained.
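As a minimal sketch of this cleansing rule, the snippet below drops any record in which one of the four variables is missing. The record layout (dictionaries keyed by variable name), the use of `None` or NaN to mark missing values, and the sample numbers are all assumptions made for illustration.

```python
import math

FIELDS = ("SWET", "CAPE", "LFCT", "PWAT")

def is_complete(record):
    """True when every input variable is present and not NaN."""
    for name in FIELDS:
        value = record.get(name)
        if value is None:
            return False
        if isinstance(value, float) and math.isnan(value):
            return False
    return True

def cleanse(records):
    """Keep only the records with no missing variable."""
    return [r for r in records if is_complete(r)]

# Hypothetical soundings; the values are illustrative only.
soundings = [
    {"SWET": 210.0, "CAPE": 1500.0, "LFCT": 850.0, "PWAT": 52.0},
    {"SWET": None, "CAPE": 900.0, "LFCT": 800.0, "PWAT": 48.0},
    {"SWET": 180.0, "CAPE": float("nan"), "LFCT": 820.0, "PWAT": 50.0},
]
clean = cleanse(soundings)  # only the first record survives
```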

#### **3.1.4 Data normalization**

To reduce the differences in magnitude among the input vector values, min-max normalization was applied according to Equation 11. This transformed the original values of the input variables into normalized values within the range [0, 1].

$$value_{normalized} = \frac{value_{original} - \min_A}{\max_A - \min_A} \tag{11}$$
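Equation 11 can be applied to one variable at a time, as in the short sketch below. The function name `min_max_normalize`, the sample CAPE values and the handling of a constant column (mapped to 0.0) are assumptions added for illustration; the source does not say how a constant variable would be treated.

```python
def min_max_normalize(values):
    """Rescale one variable to [0, 1] following Equation 11."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant variable carries no information,
        # so map it to 0.0 rather than divide by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical CAPE values.
cape = [0.0, 500.0, 1500.0, 2000.0]
cape_norm = min_max_normalize(cape)  # [0.0, 0.25, 0.75, 1.0]
```

Each of the four input variables would be normalized separately, using its own minimum and maximum.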


The parameters used in the training of the SOM, TKM and RSOM networks were:

- Random weight initialization;
- Initial learning rate equal to 0.8;
- Final learning rate equal to 0.001;
- Number of epochs equal to 50.

Specifically, the TKM network used a time constant *d* equal to 0.65, and the RSOM network used a leaking coefficient equal to 0.35.

The performance of the TKM and RSOM classifiers was subsequently evaluated for different values of the leaking coefficient and of the time constant *d*.

The results were presented in confusion matrices. In a confusion matrix, each column represents the expected results, while each row corresponds to the actual results. The simulation used the 1174 remaining examples of the data set.

A ROC analysis was then performed. The ROC graph is a technique for visualizing and evaluating classifiers based on their performance (Fawcett, 2006). It allows the relative tradeoffs of a discrete classifier (one whose output is only a class label) to be identified. In a ROC graph, the true positive rate (tp rate) of a classifier is plotted on the Y axis and the false positive rate (fp rate) on the X axis. Fig. 7 shows a ROC graph with three classifiers labeled A through C.

Some points in a ROC graph are particularly important. The lower left point (0, 0) represents a classifier that commits no false positive errors but also gains no true positives. The opposite situation is represented by the upper right point (1, 1). The upper left point (0, 1) represents perfect classification (tp rate = 1 and fp rate = 0). In Fig. 7, point A is the ideal classifier, point B represents a conservative classifier, and point C represents a liberal classifier. Conservative classifiers make positive classifications only with strong evidence, i.e., their

Fig. 7. ROC graph showing three discrete classifiers
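The position of a discrete classifier in a ROC graph can be computed from a two-class confusion matrix, as in the sketch below. It uses the standard definitions tp rate = TP / (TP + FN) and fp rate = FP / (FP + TN), and assumes the layout described in the text, with columns for the expected class and rows for the classifier output. The function name `roc_point` and the counts are hypothetical.

```python
def roc_point(confusion):
    """Return (fp_rate, tp_rate) for a 2x2 confusion matrix.

    Assumed layout: rows are the classifier output (positive, negative),
    columns are the expected class (positive, negative).
    """
    (tp, fp), (fn, tn) = confusion
    tp_rate = tp / (tp + fn)  # share of expected positives found
    fp_rate = fp / (fp + tn)  # share of expected negatives misclassified
    return fp_rate, tp_rate

# Hypothetical counts for a single classifier.
matrix = [[40, 10],
          [10, 40]]
fp_rate, tp_rate = roc_point(matrix)  # (0.2, 0.8)
```

A point closer to the upper left corner (0, 1) of the graph indicates a better classifier.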

Where:

