**3.1 The Bayes strategy for pattern classification**

One of the traditionally accepted strategies, or decision rules, for pattern classification is to minimize the "expected risk." Such strategies are called Bayes strategies and can be applied to problems containing any number of categories (Specht, 1988).

To illustrate the Bayes decision rule formalism, consider the two-category situation in which the state of nature $\theta$ can be $\theta_A$ or $\theta_B$. It is desired to decide whether $\theta = \theta_A$ or $\theta = \theta_B$ based on a set of measurements represented by the $n$-dimensional vector $\mathbf{x}$. The Bayes decision rule is then given by:

$$\begin{aligned} d(\mathbf{x}) &= \theta\_A \text{ if } h\_A l\_A f\_A(\mathbf{x}) > h\_B l\_B f\_B(\mathbf{x})\\ d(\mathbf{x}) &= \theta\_B \text{ if } h\_A l\_A f\_A(\mathbf{x}) < h\_B l\_B f\_B(\mathbf{x}) \end{aligned} \tag{7}$$

where $f_A(\mathbf{x})$ and $f_B(\mathbf{x})$ are the probability density functions for categories $A$ and $B$, respectively; $l_A$ is the loss associated with the decision $d(\mathbf{x}) = \theta_B$ when in fact $\theta = \theta_A$; $l_B$ is the loss associated with the decision $d(\mathbf{x}) = \theta_A$ when in fact $\theta = \theta_B$; $h_A$ is the a priori probability of occurrence of patterns from category $A$; and $h_B = 1 - h_A$ is the a priori probability that $\theta = \theta_B$. The boundary between the regions in which the Bayes decision is $d(\mathbf{x}) = \theta_A$ and $d(\mathbf{x}) = \theta_B$ is then given by:

$$f\_A(\mathbf{x}) = \mathbf{K} f\_B(\mathbf{x}) \tag{8}$$

where:

$$K = \frac{h\_B l\_B}{h\_A l\_A} \tag{9}$$

It should be noted that, in general, the two-category decision surface defined by Eq. (8) can be arbitrarily complex, since no restriction is placed on the densities other than the conditions that all probability density functions must satisfy, namely that they are everywhere non-negative and integrable, and that their integrals over all space equal unity.
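As a concrete illustration, the decision rule of Eqs. (7)–(9) can be sketched in a few lines of Python. The Gaussian densities, priors, and losses below are hypothetical choices introduced only for the example, not values taken from the text:

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Univariate Gaussian density."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def bayes_decide(x, f_a, f_b, h_a=0.5, h_b=0.5, l_a=1.0, l_b=1.0):
    """Two-category Bayes decision rule of Eq. (7):
    decide A if h_A * l_A * f_A(x) > h_B * l_B * f_B(x), otherwise decide B."""
    return "A" if h_a * l_a * f_a(x) > h_b * l_b * f_b(x) else "B"

# Hypothetical class-conditional densities: category A centered at 0, B at 4.
f_a = lambda x: gaussian_pdf(x, 0.0, 1.0)
f_b = lambda x: gaussian_pdf(x, 4.0, 1.0)

print(bayes_decide(1.0, f_a, f_b))  # closer to A's mean -> "A"
print(bayes_decide(3.5, f_a, f_b))  # closer to B's mean -> "B"
```

With equal priors and losses, $K = 1$ in Eq. (9) and the decision boundary of Eq. (8) falls where the two densities are equal (here, midway between the two means).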

The ability to estimate the probability density functions from training patterns is fundamental to the use of Eq. (8). Frequently, the a priori probabilities are known or can be estimated, and the loss functions require subjective evaluation. However, if the probability densities of the categories of patterns to be separated are unknown, and all that is available is a set of training patterns, then these patterns provide the only clue to estimating the unknown densities. A particular estimator that can be used is (Specht, 1990):

$$f\_A(\mathbf{x}) = \frac{1}{\left(2\pi\right)^{\frac{n}{2}}\sigma^n} \frac{1}{m} \sum\_{i=1}^m \exp\left(-\frac{\left(\mathbf{x} - \mathbf{x}\_{ai}\right)^T \left(\mathbf{x} - \mathbf{x}\_{ai}\right)}{2\sigma^2}\right) \tag{10}$$
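Eq. (10), in which $\mathbf{x}_{ai}$ denotes the $i$-th training pattern of category $A$, $m$ the number of training patterns, and $\sigma$ the smoothing factor, is a Parzen-window estimator with Gaussian kernels and can be sketched as follows. The sample patterns and the value of $\sigma$ are illustrative only:

```python
import math

def parzen_gaussian_density(x, training_patterns, sigma):
    """Parzen estimate of f_A(x) as in Eq. (10): the normalized average of
    Gaussian kernels of width sigma centered at each training pattern."""
    n = len(x)                      # dimensionality of the pattern vectors
    m = len(training_patterns)      # number of training patterns
    norm = 1.0 / ((2.0 * math.pi) ** (n / 2.0) * sigma ** n)
    total = 0.0
    for xa in training_patterns:
        sq_dist = sum((xi - ai) ** 2 for xi, ai in zip(x, xa))
        total += math.exp(-sq_dist / (2.0 * sigma ** 2))
    return norm * total / m

# Illustrative category-A training patterns in two dimensions.
patterns_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
density_near = parzen_gaussian_density((0.1, 0.1), patterns_a, sigma=0.5)
density_far = parzen_gaussian_density((5.0, 5.0), patterns_a, sigma=0.5)
print(density_near > density_far)  # True: the estimate peaks near the samples
```

This estimator is exactly the computation performed by the radial basis (pattern) layer of the PNN described in the next section.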

In Eq. (10), $i$ is the pattern number, $m$ is the total number of training patterns, $\mathbf{x}_{ai}$ is the $i$-th training pattern of category $A$, and $\sigma$ is the smoothing factor. It should be noted that $f_A(\mathbf{x})$ is simply the sum of small Gaussian distributions centered at each training sample.

#### **3.2 Structure of the Probabilistic Neural Network**

The probabilistic neural network is basically a Bayesian classifier implemented in parallel. The PNN, as described by Specht (Specht, 1988), is based on the estimation of probability density functions for the various classes established by the training patterns. A schematic diagram of a PNN is shown in Figure 5. The input layer $X$ is responsible for connecting the input pattern to the radial basis layer, where $X = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_M]$ is a matrix containing the vectors to be classified.

Fig. 5. Schematic diagram of a Probabilistic Neural Network

In the radial basis layer, the training vectors are stored in a weight matrix $w^1$. When a new pattern is presented to the input, the block *dist* calculates the Euclidean distance between each input pattern vector and each of the stored weight vectors. The output vector of the *dist* block is multiplied, element by element, by the bias factor $b$. The result of this multiplication, $n^1$, is applied to a radial basis function, providing as output $a^1$, obtained from:

$$a^1 = e^{-\left(n^1\right)^2} \tag{11}$$

In this way, a vector in the input pattern that is close to a training vector is represented by a value close to 1 in the output vector $a^1$. The competitive layer weight matrix $w^2$ contains the target vectors representing the class corresponding to each vector in the training pattern. Each vector of $w^2$ has a 1 only in the row associated with a particular class and 0 in the other positions. The multiplication $w^2 a^1$ sums the elements of $a^1$ corresponding to each class, providing the output $n^2$. Finally, the block $C$ outputs 1 in the position of $a^2$ corresponding to the largest element of $n^2$ and 0 for the other values. Thus, the neural network assigns each vector of the input pattern to a specific class, namely the class with the highest probability of being correct. The main advantages of the PNN are its easy and straightforward design and the fact that it does not require iterative training.

**4. Proposed procedure**

The proposed procedure is shown schematically in Figure 6. The real data file contains the phase A-B-C voltage and current waveforms, as well as digital signals that indicate the statuses of protective devices, such as relays and circuit breakers, acquired by the DDRs and IEDs installed in the electric power system substations. These raw data are coded in the COMTRADE format for power systems (IEEE Standard Common Format for Transient Data Exchange) (IEEE Std C37.111, 1999). Therefore, to obtain the voltage and current signals, it is first necessary to decode the COMTRADE data and select the desired waveforms to be analyzed.

Fig. 6. Schematic diagram representing the proposed processing procedure.

Before inputting the voltage waveforms to the processing stage, a pre-processing routine is accomplished to standardize the raw data due to the different voltage levels that are
