**Data Mining Applied to Cognitive Radio Systems**

Lilian Freitas, Yomara Pires, Jefferson Morais, João Costa and Aldebaro Klautau

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/51824

## **1. Introduction**

Cognitive radio (CR) is a novel technology that improves spectrum utilization by enabling opportunistic access to licensed spectrum bands by unlicensed users [2]. This is accomplished through heterogeneous architectures and dynamic spectrum access techniques. A CR is defined as an intelligent wireless communication system that is aware of its environment and is capable of learning from it and adapting its transmission parameters, such as frequency, modulation, transmission power and communication protocols [14].

An important aspect of a cognitive radio is spectrum sensing [10], which involves two main tasks: signal detection and modulation classification. Signal detection refers to the detection of unused spectrum (spectrum holes). It is the simpler task and can be done, for example, by comparing the energy in the frequency band of interest with a predetermined threshold. This task is important so that unlicensed users do not cause interference to licensed users. Modulation classification consists of automatically identifying the modulation scheme (PSK, FM, QAM, etc.) of a given communication system with a high probability of success and in a short period of time. Identifying the modulation scheme allows the cognitive radio to demodulate the received signal. To accomplish modulation classification, several data mining techniques can be applied, such as artificial neural networks, support vector machines and Bayesian classifiers.
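The energy-based signal detection mentioned above can be illustrated with a minimal sketch. It is not the detector used in this chapter: the function names, the test tone, the noise level and the threshold value are all assumptions chosen only to make the idea concrete.

```python
import math
import random

def band_energy(samples):
    """Average energy (mean squared magnitude) of a block of samples."""
    return sum(abs(s) ** 2 for s in samples) / len(samples)

def band_occupied(samples, threshold):
    """Declare the band occupied when its energy exceeds the threshold."""
    return band_energy(samples) > threshold

random.seed(0)
n_samples = 4096
# Noise-only band: unit-variance Gaussian noise, so the energy is close to 1.
noise_only = [random.gauss(0.0, 1.0) for _ in range(n_samples)]
# Hypothetical licensed user: a tone of amplitude 2 (power 2) in the same noise.
with_signal = [2.0 * math.cos(0.2 * math.pi * n) + random.gauss(0.0, 1.0)
               for n in range(n_samples)]

# Assumed threshold, placed between the noise power and the signal-plus-noise power.
threshold = 1.5
print(band_occupied(noise_only, threshold), band_occupied(with_signal, threshold))
```

In practice the threshold would be set from the estimated noise power and a target false-alarm rate, which is why energy detection degrades when the noise level is uncertain.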

This chapter evaluates different algorithms for the classification of modulated signals in spectrum sensing. The features used for classification are based on a well-established technique called cyclostationarity [7, 10]. Based on these features, the performance of five data mining techniques is evaluated: naïve Bayes, decision tree, k-nearest neighbor (KNN), support vector machine (SVM) and artificial neural networks (ANN). These techniques were chosen because they are the most popular representatives of different learning paradigms.

©2012 Freitas et al., licensee InTech. This is an open access chapter distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

## **2. The problem of modulation classification**

A modulation classification system consists of a *front end* and a *back end* or classifier. The *front end* converts the received signal *r*(*t*) to a vector **x**[*k*], *k* = 1, . . . , *N* composed of *N* elements. Having **x**[*k*] as input, the classifier decides the class *y* ∈ {1, . . . , *C*} among *C* pre-determined modulation schemes. The process is depicted in the diagram below:


$$r(t) \text{ (signal)} \rightarrow \boxed{\text{front end}} \rightarrow \text{x}[k] \text{ (features)} \rightarrow \boxed{\text{classifier}} \rightarrow \text{y} \text{ (class)}$$

Feature selection is a key step for the performance of the classifier. It depends on factors such as the modulation types to be classified, the signal-to-noise ratio, the presence of fading and the frequency offset. This chapter uses cyclostationarity to extract modulation features because of its reduced sensitivity to noise and interfering signals, and its ability to extract signal parameters such as the carrier frequency and the symbol rate [7].

In the literature there are numerous works that combine different feature extraction techniques and classifiers to perform modulation classification, as shown in Table 1.


| Ref. | Front end | Classifier | Noise/Interference | SNR (dB) (min:Δ:max) | Modulations | Training/Testing |
|---|---|---|---|---|---|---|
| [15] | CSS | SVM | AWGN/freq. offset | -5:2:10 | BPSK, 4PAM, 16QAM, 8PSK | 500/500 |
| [16] | features derived from amplitude, frequency and phase | neural networks | AWGN | -5:5:20 | 2ASK, 4ASK, 2FSK, BPSK, QPSK, AM, DSB, SSB, FM, OFDM, 16QAM, 64QAM | 1200/1200 |
| [18] | cyclic spectrum | neural network and HMM | AWGN | -10:5:10 | BPSK, QPSK/QAM, FSK, MSK | several |
| [13] | spectrogram time-frequency | rule based | AWGN | 0:2:12 | ASK, FSK, PSK | NA/400 |
| [4] | linear transform of amplitude and phase | joint moments | AWGN/freq. offset | 0:1:5 | BPSK, QPSK | NA/100 |
| [20] | wavelet transform | decision threshold | AWGN | 0:5:15 | FSK, PSK, QAM | NA/1000 |
| [19] | fourth-order cumulant | decision threshold | AWGN/freq. offset | 0:5:20 | BPSK, QPSK, 8PSK, π/4 DQPSK | NA/100 |
| [24] | fourth-order and sixth-order cumulants | ARBF | not informed | 0:5:20 | 4ASK, 2ASK, 2PSK, 4PSK, 16QAM | 50/50 |
| [1] | cyclostationarity | neural networks | AWGN/freq. offset | -10:5:15 | BPSK, QPSK, FSK, MSK, AM | NA/1000 |
| [11] | Renyi entropy, high order statistics | SVM | AWGN | 10 dB | AM, FM, AM-FM, QPSK | 200/200 |

**Table 1.** Examples of different *front end* and *classifier* used in the literature. NA: not available.

These works show good results in the experimental setup in which they were assessed. However, these works are evaluated on different operating conditions (signal-to-noise ratio, type of noise and distortion) and use different modulations. Thus, it is difficult to directly compare the results, or even to reproduce the results presented.

In this chapter, the comparison of main classifiers available in the literature is performed, considering the same operating conditions: five modulation schemes (AM, BPSK, QPSK, BFSK and 16QAM); channel with additive white Gaussian noise (AWGN) and Rayleigh multipath fading; all the modulated signals adopt the same symbol rate, sampling frequency and carrier frequency. For every modulation scheme 1500 samples are generated under each SNR (from -10 dB to 10 dB at intervals of 5 dB), in which 750 samples are used for training, and the other 750 samples for testing.

## **3. Front end: Cyclostationarity**

Cyclostationarity is a technique that extracts features of the signals. A signal *x*(*t*) is said to be cyclostationary when its mean and autocorrelation are periodic with some period *T*, that is,

$$M_x(t+T) = M_x(t)$$

$$R_x(t+T, u+T) = R_x(t, u)$$

for all *t* and *u*.

Modulated signals are cyclostationary because they are coupled with several sources of periodicity, such as sine wave carriers, pulse trains, repeating spreading or hopping sequences, and cyclic prefixes. These periodicities cause spectral redundancy, which can be measured by the correlation between spectral components of the cyclostationary signal. The cognitive radio can use the periodicity present in the users' transmitted signals to detect and identify them.

Since the autocorrelation function *R<sub>x</sub>*(*t* + *τ*/2, *t* − *τ*/2) of the received signal *x*(*t*) is periodic in *t*, it can be represented by a Fourier series, shown in Equation 1,

$$R_x(t+\tau/2, t-\tau/2) = \sum_{\alpha} R_x^{\alpha}(\tau)e^{i2\pi\alpha t} \tag{1}$$

where *R<sub>x</sub><sup>α</sup>*(*τ*) are the Fourier series coefficients, called the *cyclic autocorrelation function*, with spectral components at the cyclic frequencies *α*. These coefficients may be obtained by Equation 2,

$$R_x^{\alpha}(\tau) \stackrel{\Delta}{=} \frac{1}{T} \int_{-T/2}^{T/2} R_x\left(t + \frac{\tau}{2}, t - \frac{\tau}{2}\right) e^{-i2\pi\alpha t}\, dt. \tag{2}$$
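As an illustration of Equation 2, the sketch below estimates the cyclic autocorrelation of a sampled signal by a time average. It is a simplified estimator, not the one used in this chapter: it uses the one-sided lag product x[n + lag]·x*[n] instead of the symmetric form, and the tone frequency and signal length are assumed values. A real cosine at frequency f0 is expected to show a cyclic feature at α = 2·f0 with magnitude 1/4 and essentially nothing at an unrelated α.

```python
import cmath
import math

def cyclic_autocorrelation(x, alpha, lag, fs):
    """Time-average estimate of the cyclic autocorrelation at an integer lag.

    Uses the one-sided lag product x[n + lag] * conj(x[n]); compared with the
    symmetric form of Equation 2 the result differs only by a phase factor.
    """
    n_terms = len(x) - lag
    acc = 0j
    for n in range(n_terms):
        product = x[n + lag] * complex(x[n]).conjugate()
        acc += product * cmath.exp(-2j * math.pi * alpha * n / fs)
    return acc / n_terms

fs = 8192.0   # sampling frequency in Hz
f0 = 1024.0   # assumed tone frequency in Hz
x = [math.cos(2 * math.pi * f0 * n / fs) for n in range(4096)]

at_feature = abs(cyclic_autocorrelation(x, 2 * f0, 0, fs))   # close to 1/4
off_feature = abs(cyclic_autocorrelation(x, 500.0, 0, fs))   # close to 0
```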

The *spectral correlation function* (SCF), *S<sub>x</sub><sup>α</sup>*(*f*), is the Fourier transform of the cyclic autocorrelation function *R<sub>x</sub><sup>α</sup>*(*τ*). It is the density of correlation between the spectral components at (*f* + *α*/2) and (*f* − *α*/2), and is given by Equation 3,

$$S_x^{\alpha}(f) = \lim_{T \to \infty} \lim_{\Delta t \to \infty} \frac{1}{T\Delta t} \int_{-\Delta t/2}^{\Delta t/2} X_T\left(t, f + \frac{\alpha}{2}\right) X_T^{*}\left(t, f - \frac{\alpha}{2}\right) dt \tag{3}$$

where *X<sub>T</sub>*(*t*, *f*) is the spectral component of the received signal *x*(*t*) at frequency *f* with bandwidth 1/*T*, as defined in Equation 4.

$$X_T(t,f) = \int_{t-T/2}^{t+T/2} x(u)e^{-i2\pi fu}\, du \tag{4}$$

The SCF is a three-dimensional function; therefore, to reduce the calculations for the classifier, it is possible to use the peak values of the normalized SCF as features to distinguish each modulation, that is, the *cyclic domain profile* (CDP), obtained by Equation 5.

$$I(\alpha) = \max_{f} |S_x^{\alpha}(f)|. \tag{5}$$
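A rough sketch of how a cyclic domain profile could be computed from samples is given below. It is only an illustration under simplifying assumptions, not the estimator used to produce the figures of this chapter: the SCF is approximated by averaging X[k + m]·X*[k − m] over non-overlapping blocks (a discrete counterpart of Equation 3), the profile is scaled by its largest value rather than by a spectral coherence normalization, cyclic frequencies are restricted to the grid α = 2·m·fs/L up to fs/2, and a noiseless cosine stands in for a modulated carrier. The function names and all parameter values are hypothetical.

```python
import cmath
import math

def dft(block):
    """Naive O(L^2) discrete Fourier transform, adequate for short blocks."""
    size = len(block)
    return [sum(block[n] * cmath.exp(-2j * math.pi * k * n / size)
                for n in range(size))
            for k in range(size)]

def cyclic_domain_profile(x, block_len, fs):
    """Return (alphas, profile), where profile[m] approximates max_f |S(alpha, f)|.

    alpha takes the values 2 * m * fs / block_len for m = 0 .. block_len // 4,
    that is, cyclic frequencies from 0 up to fs / 2.
    """
    starts = range(0, len(x) - block_len + 1, block_len)
    spectra = [dft(x[s:s + block_len]) for s in starts]
    alphas, profile = [], []
    for m in range(block_len // 4 + 1):
        peak = 0.0
        for k in range(block_len):
            acc = 0j
            for spectrum in spectra:
                upper = spectrum[(k + m) % block_len]   # component at f + alpha/2
                lower = spectrum[(k - m) % block_len]   # component at f - alpha/2
                acc += upper * lower.conjugate()
            peak = max(peak, abs(acc) / (len(spectra) * block_len))
        alphas.append(2 * m * fs / block_len)
        profile.append(peak)
    top = max(profile)
    return alphas, [p / top for p in profile]   # scaled so the largest peak is 1

fs = 8192.0   # sampling frequency in Hz
f0 = 1024.0   # assumed carrier frequency in Hz
tone = [math.cos(2 * math.pi * f0 * n / fs) for n in range(128)]
alphas, cdp = cyclic_domain_profile(tone, 32, fs)
strongest = max(range(1, len(cdp)), key=lambda m: cdp[m])
print(alphas[strongest])   # cyclic frequency of the strongest feature with alpha > 0
```

For this real tone the only non-trivial feature lies at α = 2·f0, which is the kind of peak that differs from one modulation to another and makes the CDP usable as a feature vector.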


To illustrate the use of the cyclostationarity technique, Figure 1 shows the estimated normalized SCF for the BPSK and QPSK modulations, and Figure 2 shows the corresponding cyclic domain profiles. These examples adopted a sampling frequency *f<sub>s</sub>* = 8192 Hz, carrier frequency *f<sub>c</sub>* = 2048 Hz, cyclic frequency resolution Δ*α* = 20 Hz and frequency resolution Δ*f* = 80 Hz.

**Figure 1.** Spectral correlation function. (a) BPSK. (b) QPSK

**Figure 2.** Cyclic domain profile. (a) BPSK. (b) QPSK

Note that different modulations have different CDPs; thus, these features were used as input to the classifier block. The cyclostationary features of modulated signals have been increasingly considered for use in a wide range of applications, including signal detection, classification, synchronization and equalization. Their main advantages are the reduced sensitivity to noise and interfering signals, and the ability to extract signal parameters such as the carrier frequency and the symbol rate [7].

## **4. Classifiers**


#### **4.1. Naïve Bayes**

The naïve Bayes classifier is based on Bayes' theorem and is particularly useful when the dimensionality of the input data is high. To describe the classifier in the cognitive radio system, we adopt the nomenclature used in [6], where *P*(*y*|**x**), *P*(**x**|*y*), *P*(*y*) and *P*(**x**) are called posterior, likelihood, prior and evidence, respectively, and are related through Bayes' rule,

$$P(y|\mathbf{x}) = \frac{P(\mathbf{x}|y)P(y)}{P(\mathbf{x})}.\tag{6}$$

This classifier attempts to select the label

$$\mathcal{F}(\mathbf{x}) = \arg\max_{y=1,\ldots,Y} P(\mathbf{x}|y)P(y), \tag{7}$$

which maximizes the posterior probability. However, neither *P*(*y*) nor *P*(**x**|*y*) is known. Hence, the classifiers use estimates *P̂*(*y*) and *P̂*(**x**|*y*) and maximize

$$\mathcal{F}(\mathbf{x}) = \arg\max_{y=1,\ldots,Y} \hat{P}(\mathbf{x}|y)\hat{P}(y). \tag{8}$$

In most cases, the prior *P*(*y*) can be reliably estimated by counting the labels in the training set, i.e., we assume that *P̂*(*y*) = *P*(*y*). Estimating *P̂*(**x**|*y*) is often the most difficult task. Hence, Bayes classifiers typically assume a parametric distribution *P̂*(**x**|*y*) = *P̂*<sub>*θ<sub>y</sub>*</sub>(**x**|*y*), where *θ<sub>y</sub>* describes the distribution's parameters to be determined (e.g., the mean and covariance matrix if the likelihood model is a Gaussian distribution).

The naïve Bayes algorithm assumes that the attributes (*x*<sub>1</sub>, ..., *x<sub>K</sub>*) of **x** are conditionally independent of each other, given *y*. This assumption simplifies both the representation of *P*(**x**|*y*) and its estimation from the training set. For example, in the case where **x** = (*x*<sub>1</sub>, *x*<sub>2</sub>), we have:

$$P(\mathbf{x}|y) = P(x_1, x_2|y) = P(x_1|x_2, y)P(x_2|y) = P(x_1|y)P(x_2|y) \tag{9}$$

where *P*(*x*<sub>1</sub>, *x*<sub>2</sub>|*y*) = *P*(*x*<sub>1</sub>|*x*<sub>2</sub>, *y*)*P*(*x*<sub>2</sub>|*y*) is a general property that follows from the definition of conditional probability, while *P*(*x*<sub>1</sub>, *x*<sub>2</sub>|*y*) = *P*(*x*<sub>1</sub>|*y*)*P*(*x*<sub>2</sub>|*y*) is only valid under conditional independence. Generalizing Equation 9, we have:

$$P(\mathbf{x}|y) = P(x_1, \dots, x_K|y) = \prod_{i=1}^{K} P(x_i|y). \tag{10}$$

Training a naïve Bayes classifier produces the probability distributions *P*(*x<sub>i</sub>*|*y*) and *P*(*y*) for all values of *y*, i.e., *y<sub>k</sub>*, *k* = 1, . . . , *Y*. To calculate the posterior probability of each class *y*, we use Bayes' theorem:

$$P(y_k|\mathbf{x}) = \frac{P(y_k)P(x_1, \dots, x_K|y_k)}{\sum_{j} P(y_j)P(x_1, \dots, x_K|y_j)}\tag{11}$$


Assuming the attributes *x<sub>i</sub>* are conditionally independent given *y*, Equation 10 allows us to rewrite Equation 11 as:

$$P(y_k|\mathbf{x}) = \frac{P(y_k)\prod_{i}P(x_i|y_k)}{\sum_{j}\left(P(y_j)\prod_{i}P(x_i|y_j)\right)}.\tag{12}$$

Equation 12 is the fundamental equation of a naïve Bayes classifier. Given a new sample **x**, this equation shows how to calculate the probability of each *y*. The calculation depends only on the observed attribute values and on the distributions *P*(*y*) and *P*(*x<sub>i</sub>*|*y*) estimated from the training data. If only the most likely value of *y* is desired, the denominator, which does not depend on *y<sub>k</sub>*, can be dropped:

$$\mathcal{F}(\mathbf{x}) = \arg\max_{y_k} P(y_k) \prod_i P(x_i | y_k) \tag{13}$$

or, using the fact that the logarithm is a monotonic function:

$$\mathcal{F}(\mathbf{x}) = \arg\max_{y_k} \left[ \log P(y_k) + \sum_{i} \log P(x_i | y_k) \right]. \tag{14}$$
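The decision rule of Equation 14 can be sketched in a few lines. The example below is an illustration only, not the implementation used in the experiments: it assumes a Gaussian model for each *P*(*x<sub>i</sub>*|*y*), and the function names and the two-attribute toy data are hypothetical.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(samples, labels):
    """Estimate P(y) from label counts and a Gaussian per class and attribute."""
    by_class = defaultdict(list)
    for features, label in zip(samples, labels):
        by_class[label].append(features)
    model = {}
    for label, rows in by_class.items():
        prior = len(rows) / len(samples)
        columns = list(zip(*rows))
        means = [sum(col) / len(col) for col in columns]
        # A small floor keeps the variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / len(col) + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (prior, means, variances)
    return model

def predict(model, x):
    """Return the label maximizing log P(y) + sum_i log P(x_i | y) (Equation 14)."""
    def log_score(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        for value, mean, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        return score
    return max(model, key=log_score)

# Hypothetical two-attribute feature vectors for two modulation classes.
train_x = [[0.9, 0.1], [1.0, 0.2], [0.8, 0.15], [0.2, 0.9], [0.1, 1.0], [0.15, 0.8]]
train_y = ["BPSK", "BPSK", "BPSK", "QPSK", "QPSK", "QPSK"]
model = fit_gaussian_nb(train_x, train_y)
print(predict(model, [0.85, 0.12]), predict(model, [0.12, 0.95]))
```

Working with sums of logarithms instead of products of probabilities avoids numerical underflow when the number of attributes *K* is large.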

#### **4.2. Decision tree**

A decision tree is a predictive machine learning model that decides the class of a new instance based on the values of its attributes [23]. It consists of a structure in which internal nodes represent tests of one or more attributes and the branches leaving these nodes are the possible values of those attributes. The terminal (leaf) nodes are the result of the classification. To classify new instances, a decision tree is first built from the attribute values of the training set. This chapter uses the decision tree implemented in the Weka software, called J4.8, which is an implementation of the C4.5 algorithm, developed by J. Quinlan [17] and probably the best-known algorithm for the design of decision trees.

A decision tree is formed by a set of classification rules. Each path from the root to a leaf represents one of these rules. The decision tree should be set so that for each observation in the database, there is only one path from root to leaf. Classification rules are composed of an antecedent (precondition) and a consequent (conclusion). An antecedent should be formed by one or more predictive attributes, while the consequent defines the class or classes.

A key issue in building a decision tree is the strategy for choosing the features that determine the class to which a *sample* belongs. Entropy-based measures are commonly used to address this problem: they quantify the randomness associated with the value of a feature before deciding which feature to use to predict the class.
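As a small illustration of an entropy-based criterion, the sketch below computes the information gain obtained by splitting a numeric feature at a threshold. It shows plain information gain under assumed toy data; C4.5 itself relies on a related normalized criterion (the gain ratio), and the function names and values here are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of the class distribution in a list of labels."""
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())

def information_gain(values, labels, threshold):
    """Entropy reduction obtained by splitting the samples at value <= threshold."""
    left = [lab for val, lab in zip(values, labels) if val <= threshold]
    right = [lab for val, lab in zip(values, labels) if val > threshold]
    if not left or not right:
        return 0.0   # a split that separates nothing brings no gain
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted

# Hypothetical feature values for four training samples of two classes.
feature = [0.1, 0.2, 0.8, 0.9]
classes = ["AM", "AM", "BPSK", "BPSK"]
good_split = information_gain(feature, classes, 0.5)    # separates the classes perfectly
poor_split = information_gain(feature, classes, 0.15)   # leaves one side mixed
```

The tree-growing algorithm would prefer the split with the larger gain, here the threshold at 0.5.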

Decision trees are built by a recursive algorithm that successively divides the training set. The main problem is then the reliability of the error estimates used to select the divisions. Although the error estimated on the training data used to grow the tree, known as the "resubstitution error", continues to decrease, the divisions chosen at deeper levels of the tree generally do not rest on very reliable statistics. The quality of the sample therefore directly influences the accuracy of the error estimates. Since each iteration of the algorithm divides the training data, the internal nodes make decisions from ever smaller samples, which means that the error estimates become less reliable as the tree grows. Thus, *pruning methods* have been used to minimize this problem and avoid *overfitting* [6, 23].

Basically, there are two classes of methods for pruning a decision tree: post-pruning and pre-pruning. In this chapter the *post-pruning* method is used, which consists of allowing the tree to grow to its maximum size, i.e., until the leaf nodes have minimal impurity, and then applying the pruning.

#### **4.3. SVM classifier**

6 Will-be-set-by-IN-TECH

*<sup>p</sup>*(*yk*|**x**) = *<sup>P</sup>*(*yk*)*P*(*x*1,..., *xK*|*yk*)

*<sup>p</sup>*(*yk*|**x**) = *<sup>P</sup>*(*yk*) <sup>∏</sup>*<sup>i</sup> <sup>P</sup>*(*xi*|*yk*) ∑*j* 

Equation 12 is the fundamental equation of a naïve Bayes classifier. Given a new sample **x**, this equation shows how to calculate the probability for each *y*. Such calculation depends only on observed attribute values and distributions *P*(*y*) and *P*(*xi*|*y*) estimated from the data

*P*(*yj*) ∏*<sup>i</sup> P*(*xi*|*yj*)

*P*(*yk*)∏ *i*

log *P*(*yk*) + ∑

A decision tree is a model of predictive machine learning which performs the decision of a new instance based on the value of its various attributes [23]. It consists of a structure where leaf nodes represent tests of one or more attributes. The branches of these nodes are the possible values of these attributes. The terminal nodes are the result of classification. In order to perform the classification of a new instance, a decision tree is created based on the values of the attributes of the training set. This chapter uses the decision tree implemented in the Weka software, called J4.8, which is an implementation of the C4.5 algorithm, which was developed by J. Quinlan [17] and probably the most famous algorithm for the design of decision trees. A decision tree is formed by a set of classification rules. Each path from the root to a leaf represents one of these rules. The decision tree should be set so that for each observation in the database, there is only one path from root to leaf. Classification rules are composed of an antecedent (precondition) and a consequent (conclusion). An antecedent should be formed by one or more predictive attributes, while the consequent defines the class or classes.

A key issue for building a decision tree is the strategy for the choice of features that can determine the class to which a *sample* belongs. Measures based on entropy are commonly used to address this problem, which measures the randomness of the value of a feature before

Decision trees are methods that use a recursive algorithm for successive divisions in a training set. The main problem is then the reliability of estimates of the error used to select the divisions. Despite the fact that estimate obtained with the training data used during the

*i*

log *P*(*xi*|*yk*)

Assuming *xi* is conditionally independent given *y*, we can rewrite Equation 10 as:

training. If it is desired only to the most likely value of *y*, then we can simplify to

<sup>F</sup>(**x**) = arg max *yk*

or, using the fact that the logarithm is a monotonic function:

<sup>F</sup>(**x**) = arg max *yk*

deciding which feature to use to predict the class.

<sup>∑</sup>*<sup>j</sup> <sup>P</sup>*(*yj*)*P*(*z*1,..., *xK*|*yj*) (11)

204 Advances in Data Mining Knowledge Discovery and Applications Data Mining Applied to

. (12)

*P*(*xi*|*yk*) (13)

. (14)

*y*, we use Bayes' theorem:

#### **4.3. Support vector machine**

A support vector machine (SVM) is a learning algorithm based on statistical learning theory that implements the principle of structural risk minimization [21]. The goal of an SVM classifier is to find a maximum-margin hyperplane in a feature space, that is, a decision surface for which the margin of separation between the examples of one class and those of the other is at a maximum [5].

More specifically, an SVM is a binary classifier given by

$$f(\mathbf{x}) = \sum_{m=1}^{M} \alpha_m \mathcal{K}(\mathbf{x}, \mathbf{x}_m) + c,$$

where K(**x**, **x**<sub>*m*</sub>) is the kernel function between the test vector **x** and the *m*-th training example **x**<sub>*m*</sub>, with *c*, *α<sub>m</sub>* ∈ ℝ. The training examples that are effectively used have *α<sub>m</sub>* ≠ 0 and are called *support vectors*. Several kernels have been proposed in the pattern recognition literature, such as linear, Gaussian (radial basis function), polynomial and sigmoid kernels.

An SVM with a linear kernel K(**x**, **x**<sub>*m*</sub>) = ⟨**x**, **x**<sub>*m*</sub>⟩, given by the inner product between **x** and **x**<sub>*m*</sub>, can be converted to a perceptron *f*(**x**) = ⟨**a**, **x**⟩ + *c*, where **a** = ∑<sub>*m*=1</sub><sup>*M*</sup> *α<sub>m</sub>* **x**<sub>*m*</sub> is pre-computed.

Therefore, linear SVMs were adopted in this chapter due to their lower computational cost when compared to non-linear SVMs with kernels such as the Gaussian [5]. To combine the binary SVMs *f<sub>b</sub>*(**x**), *b* = 1, . . . , *B*, into the multiclass classifier *F*(**x**), this work adopted the *all-pairs* error-correcting output code (ECOC) matrix with Hamming decoding [3], where the winning class is the one with the majority of "votes". Note that all-pairs uses *B* = 0.5*C*(*C* − 1) SVMs, where *C* is the number of classes; an alternative is the one-vs-all ECOC, which uses *B* = *C* SVMs [12].
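The two ideas in this section can be sketched in a few lines of plain Python: collapsing a linear-kernel SVM into a single weight vector **a**, and combining pairwise binary classifiers by majority vote. This is only an illustrative sketch, not the WEKA implementation used in the chapter. The coefficients, support vectors and the rigged pairwise classifiers are hypothetical values chosen for the example, and ties are broken by class order, which is an assumption.

```python
from itertools import combinations


def dot(u, v):
    """Inner product <u, v>."""
    return sum(a * b for a, b in zip(u, v))


def kernel_form(x, alphas, support_vectors, c):
    """f(x) = sum_m alpha_m <x, x_m> + c, evaluated term by term."""
    return sum(alpha * dot(x, x_m) for alpha, x_m in zip(alphas, support_vectors)) + c


def collapse(alphas, support_vectors):
    """Pre-compute a = sum_m alpha_m x_m so that f(x) = <a, x> + c."""
    dim = len(support_vectors[0])
    return [sum(alpha * x_m[i] for alpha, x_m in zip(alphas, support_vectors))
            for i in range(dim)]


def all_pairs_vote(x, classes, pairwise):
    """Each pairwise classifier votes for class i if f(x) > 0, otherwise for class j."""
    votes = {label: 0 for label in classes}
    for i, j in combinations(classes, 2):
        winner = i if pairwise[(i, j)](x) > 0 else j
        votes[winner] += 1
    # max() keeps the first class in case of a tie (an assumption of this sketch)
    return max(classes, key=lambda label: votes[label]), votes


# Hypothetical coefficients and support vectors.
alphas = [0.5, -1.0, 0.25]
support_vectors = [[1.0, 2.0], [0.0, 1.0], [4.0, 0.0]]
c = -0.5
x = [2.0, 3.0]

a = collapse(alphas, support_vectors)
print("kernel form:    ", kernel_form(x, alphas, support_vectors, c))
print("perceptron form:", dot(a, x) + c)

classes = ["AM", "BPSK", "QPSK", "BFSK", "16-QAM"]


def make_pair(i, j, favoured="QPSK"):
    """Hypothetical pairwise classifier in which the favoured class always wins."""
    if i == favoured:
        return lambda x: 1.0
    if j == favoured:
        return lambda x: -1.0
    return lambda x: 1.0


pairwise = {(i, j): make_pair(i, j) for i, j in combinations(classes, 2)}
winner, votes = all_pairs_vote(x, classes, pairwise)
print(len(pairwise), "pairwise classifiers, winner:", winner)
```

With five classes the dictionary holds 0.5 · 5 · 4 = 10 pairwise classifiers, whereas one-vs-all would need only five.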

#### **4.4. Artificial neural networks**

Artificial neural networks (ANN) are parallel distributed systems composed of simple processing units called *neurons* that compute some mathematical function, usually nonlinear. Such units are arranged in one or more layers and interconnected by so-called synaptic weights. The intelligent behavior of ANN comes from the interactions between the processing units of the network.
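To make the idea of units arranged in layers and connected by weights more concrete, the following sketch runs a forward pass through a tiny network in plain Python. It assumes sigmoid units and uses arbitrary, hypothetical weights; it is not the network configuration evaluated in this chapter.

```python
import math


def sigmoid(z):
    """Nonlinear function computed by each unit in this sketch."""
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(inputs, weights, biases):
    """One layer: every neuron applies the sigmoid to its weighted sum plus bias."""
    return [sigmoid(sum(w * x for w, x in zip(w_row, inputs)) + b)
            for w_row, b in zip(weights, biases)]


def mlp_forward(inputs, layers):
    """Feed the input through each (weights, biases) layer in turn."""
    activation = inputs
    for weights, biases in layers:
        activation = layer_forward(activation, weights, biases)
    return activation


# Hypothetical network: 2 inputs -> 2 hidden neurons -> 1 output neuron.
hidden = ([[0.5, -0.4], [0.3, 0.8]], [0.1, -0.2])
output = ([[1.2, -0.7]], [0.05])

print(mlp_forward([1.0, 2.0], [hidden, output]))
```

Training, for example with backpropagation, amounts to adjusting the numbers stored in `hidden` and `output` so that the final activation approaches the desired value.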

A neuron consists of a weighted sum of its inputs followed by an activation function. The weights of the connections are set by a training rule, according to the patterns presented. In this chapter, the neural network used was a multilayer perceptron trained with the backpropagation algorithm, which has shown good results in classification problems.

The backpropagation algorithm for multilayer perceptrons (also called the generalized delta rule) is a supervised learning process that uses a predetermined set of input-output pairs to adjust the weights of the network through an error correction scheme carried out in propagation cycles [6, 9]. Backpropagation is divided into two phases. The *first phase* propagates the input vector forward from the first to the last layer and compares the output value with the desired value. The *second phase* propagates the error backward from the last layer to the input layer, adjusting the weights of the neurons of the hidden layers. After all the weights of the network have been adjusted, one more set of examples is presented, which ends an epoch. This process is repeated until the error on the training set is acceptable; the time this takes is referred to as the convergence time of the network.

The performance of a multilayer perceptron neural network during training depends on the following parameters [9]:

• *Initialization of weights*. The weights of the connections between neurons can be initialized randomly or uniformly.

• *Learning rate*. The learning rate controls the speed of learning by increasing or decreasing the adjustment applied to the set of weights at each iteration during training. Intuitively, its value must be greater than 0 and less than 1. If the learning rate is too small, learning will take place very slowly. If the rate is very large (greater than 1), the correction would be greater than the observed error, causing the neural network to overshoot its optimum point and making the training process unstable.

• *Transfer function parametrization*. Also known as threshold logic, this function defines and sends out the value passed by the neuron activation function. The activation function can take many forms; the best known are the linear function, the sigmoid function and the exponential function.

#### **4.5. KNN**

Classifiers that simply store the training data are called "lazy" classifiers, also known as instance-based learning (IBL) methods [23]. The k-nearest neighbor (KNN) method belongs to this family and stores examples in memory as points in a *K*-dimensional space defined by the *K* attributes that describe the examples [6]. Thus, for each new example to classify, KNN uses the training data to determine the examples in the database that are "nearest" to the example under analysis. Each new example to be classified requires a sweep of the training data, which causes a large computational effort.

Suppose a training set with *N* examples. Let **x** = (*x*<sub>1</sub>, ..., *x<sub>K</sub>*) be a new example, not yet classified. In order to classify it, calculate the distances between **x** and all examples in the training set and consider the *k* closest examples (those with the lowest distance) to **x**. The example **x** is assigned the most frequent class *y* among the *k* examples found.

The distance between two examples is calculated by a measure of similarity. A popular measure is the Euclidean distance [6], which is the square root of the sum of the squares of the differences between the vectors **x** and **x̂**:

$$d(\mathbf{x}, \hat{\mathbf{x}}) = \sqrt{\sum_{i=1}^{K} (x_i - \hat{x}_i)^2} \tag{15}$$

This chapter uses the Euclidean distance. Based on this metric, KNN searches for the "nearest neighbors" to classify new examples.

### **5. Results**

The simulations aim to evaluate the reliability of the cyclostationarity technique for feature extraction and to compare the performance of data mining techniques such as naïve Bayes, decision tree, KNN, SVM and ANN under various conditions. The signals were modulated using AM, BPSK, QPSK, BFSK and 16-QAM. The signals were propagated through two types of channel models: an AWGN channel, and a multipath fading channel with AWGN. The signal-to-noise ratio (SNR) was varied randomly from -10 to 10 dB as part of the simulation. Both channel models used carrier frequency *f<sub>c</sub>* = 2.4 GHz, sampling period *T<sub>s</sub>* = 0.167 ns, square-root raised cosine with roll-off factor *r* = 0.1, number of symbols *N<sub>symbol</sub>* = 100 and *N<sub>fft</sub>* = 512 FFT points.

To implement the classifiers we used the WEKA software [22], which is a collection of machine learning algorithms for data mining tasks. Weka is open source software issued under the General Public License. We adopted the following settings for the classifiers:

• **Naïve Bayes**: naïve Bayes used a normal distribution for numeric attributes, with the parameter *K* (*UseKernelEstimator*) set to False, which corresponds to the standard naïve Bayes.

• **KNN**: the KNN neighbor search algorithm was based on the Euclidean distance.

• **J4.8**: J4.8 was configured with automatic selection of the confidence factor parameter C. Thus, the results presented are the best achieved for confidence factor values of 0.1, 0.25 and 0.5.

• **SVM**: the SVM was configured with a linear kernel (K = 0), with the cost parameter C taking the values 2, 1, 0.5 and 0.25, and kernel degree D = 3.

• **ANN**: we adopted a multilayer perceptron neural network trained with the backpropagation learning algorithm, with the number of neurons in the hidden layer taking the values 60, 110, 130 and 160, the learning rate taking the values 0.1, 0.5 and 0.9, and time taking the values 0.1, 0.2 and 0.4.
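The KNN setting in the list above relies on a Euclidean nearest-neighbor search followed by a majority vote. The sketch below shows that rule in plain Python. It is not WEKA's implementation: the function names, the choice of three neighbors, and the toy two-dimensional features and labels are assumptions made for illustration.

```python
import math
from collections import Counter


def euclidean(x, x_hat):
    """Euclidean distance between two feature vectors, as in Equation 15."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_hat)))


def knn_classify(x, training_set, k=3):
    """Return the most frequent label among the k training examples nearest to x."""
    # training_set is a list of (feature_vector, label) pairs; every call sweeps all of it
    neighbors = sorted(training_set, key=lambda example: euclidean(x, example[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical two-dimensional features for two modulations.
training_set = [
    ([0.0, 0.0], "BPSK"), ([0.1, 0.2], "BPSK"), ([0.2, 0.1], "BPSK"),
    ([1.0, 1.0], "QPSK"), ([0.9, 1.1], "QPSK"), ([1.1, 0.9], "QPSK"),
]

print(knn_classify([0.15, 0.10], training_set))
print(knn_classify([0.95, 1.00], training_set))
```

The full sort in `knn_classify` makes the cost of each query grow with the size of the training set, which is the computational effort attributed to lazy classifiers.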


#### **5.1. Sample complexity**

The first experiment analyzed how the accuracy of the classifiers varies with the number of samples used in the training phase. This analysis was performed using sample complexity curves [8].

A sample complexity curve shows how many samples are required for the classifier to achieve a certain level of performance. The abscissa represents the number of samples in the training set and the ordinate represents the percentage of correct classifications obtained in the test phase. Only the number of samples in the training set varies, while the number of samples in the test phase remains fixed.

The classifiers were trained with 50, 150, 300, 450, 750 and 1000 samples of each modulation. In the test phase, 1000 samples were used for each modulation. Figure 3 through Figure 6 show the results obtained for a multipath fading channel with AWGN, configured with Doppler frequency FD = 50 Hz.

**Figure 3.** Sample complexity for SNR = -10 dB.

Based on the analysis of the sample complexity curves, it was decided to work with 750 training samples of each modulation, since this number gave good performance for all the classifiers evaluated.
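The procedure behind a sample complexity curve can be sketched as a loop over training-set sizes with a fixed test set. The example below is only a toy stand-in written in plain Python: it uses synthetic two-dimensional Gaussian data and a nearest-centroid classifier, both of which are assumptions made for illustration, rather than the cyclostationary features and WEKA classifiers used in the chapter.

```python
import random


def make_samples(n_per_class, rng):
    """Draw n_per_class points for each of two synthetic classes (toy features)."""
    data = []
    for label, centre in (("A", (0.0, 0.0)), ("B", (2.0, 2.0))):
        for _ in range(n_per_class):
            point = (rng.gauss(centre[0], 1.0), rng.gauss(centre[1], 1.0))
            data.append((point, label))
    return data


def train_centroids(train):
    """Nearest-centroid 'training': store the mean of each class."""
    sums, counts = {}, {}
    for (x0, x1), label in train:
        acc = sums.setdefault(label, [0.0, 0.0])
        acc[0] += x0
        acc[1] += x1
        counts[label] = counts.get(label, 0) + 1
    return {lab: (s[0] / counts[lab], s[1] / counts[lab]) for lab, s in sums.items()}


def predict(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids,
               key=lambda lab: (x[0] - centroids[lab][0]) ** 2 + (x[1] - centroids[lab][1]) ** 2)


def sample_complexity_curve(train_sizes, n_test, seed=0):
    """Percentage of correct classifications on a fixed test set for each training size."""
    rng = random.Random(seed)
    test = make_samples(n_test, rng)  # the test set stays fixed
    curve = []
    for n in train_sizes:
        model = train_centroids(make_samples(n, rng))
        correct = sum(predict(model, x) == y for x, y in test)
        curve.append((n, 100.0 * correct / len(test)))
    return curve


curve = sample_complexity_curve([50, 150, 300, 450, 750, 1000], n_test=1000)
for n, accuracy in curve:
    print(n, round(accuracy, 1))
```

Plotting the second column against the first gives one curve per classifier, and the training-set size is then chosen where the curves flatten out.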

**Figure 4.** Sample complexity for SNR = -5 dB.


**Figure 5.** Sample complexity for SNR = 5 dB.

#### **5.2. Simulation result of AWGN channel**

In this scenario, the SNR was varied from -15 to 15 dB. The training, testing and validation sets were composed of 750 different samples of each modulation. The classifiers were trained and tested with the same values of SNR. Figure 7 shows the results for the classification of the AM, BPSK, QPSK and BFSK modulations.
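For readers who want to reproduce an SNR sweep of this kind, the sketch below shows one common way to add white Gaussian noise at a target SNR in plain Python. It is a simplified assumption, not the simulation chain of this chapter: it works on real-valued baseband BPSK symbols and leaves out the carrier, pulse shaping and fading.

```python
import math
import random


def add_awgn(signal, snr_db, rng):
    """Add white Gaussian noise so that signal power / noise power equals snr_db."""
    signal_power = sum(s * s for s in signal) / len(signal)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in signal]


def measured_snr_db(clean, noisy):
    """Estimate the SNR from the clean signal and its noisy version."""
    p_signal = sum(s * s for s in clean) / len(clean)
    p_noise = sum((n - s) ** 2 for s, n in zip(clean, noisy)) / len(clean)
    return 10.0 * math.log10(p_signal / p_noise)


rng = random.Random(1)
bpsk = [rng.choice((-1.0, 1.0)) for _ in range(20000)]  # hypothetical symbol stream

for snr_db in (-15, 0, 15):
    noisy = add_awgn(bpsk, snr_db, rng)
    print(snr_db, round(measured_snr_db(bpsk, noisy), 2))
```

The measured value differs slightly from the target because the noise power is estimated from a finite number of samples.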

**Figure 6.** Sample complexity for SNR = 10 dB.

**Figure 7.** Performance of classifiers in an AWGN channel.

| Modulation | Naïve Bayes | J4.8 | KNN | SVM | ANN |
|---|---|---|---|---|---|
| AM | 100 | 100 | 100 | 100 | 100 |
| BPSK | 90.0 | 67.6 | 85.2 | 92.8 | 79.2 |
| QPSK | 91.2 | 67.2 | 58.0 | 93.2 | 47.6 |
| BFSK | 100 | 99.2 | 99.6 | 100 | 100 |

**Table 2.** Performance of classifiers in SNR = -15 dB (correct classification, %).

| Classified as → | AM | BPSK | QPSK | BFSK |
|---|---|---|---|---|
| AM | 750 | 0 | 0 | 0 |
| BPSK | 3 | 507 | 237 | 3 |
| QPSK | 3 | 243 | 504 | 0 |
| BFSK | 6 | 0 | 0 | 744 |

**Table 3.** Confusion matrix of the J4.8 classifier, SNR = -15 dB.

| Classifier | Correct classification (%) |
|---|---|
| Naïve Bayes | 89.1 |
| KNN | 91.9 |
| J4.8 | 92.7 |
| SVM | 97.5 |
| ANN | 100 |

**Table 4.** Performance of classifiers when trained and tested with different SNR values.

The results show that, for SNR values greater than -5 dB, all classifiers presented excellent performance, with nearly 100% correct classification. Among the evaluated classifiers, the SVM had a higher percentage of correct classification than the others, even for SNR values lower than -5 dB.

Table 2 allows the performance of the classifiers to be analyzed in the worst case, i.e., SNR = -15 dB. The results show that most errors occur in the classification of the QPSK and BPSK modulations, mainly by the KNN (14.3% errors) and decision tree J4.8 (16.5% errors) classifiers.



Table 3 shows the confusion matrix of the J4.8 classifier for SNR = -15 dB. At low SNR, the classification errors arise mainly from confusion between the QPSK and BPSK modulations, due to distortions in their features. Figure 8 shows the profiles of the BPSK and QPSK modulations at SNR = -15 dB and 15 dB.
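The counts reported in Table 3 are enough to recover the per-class and overall rates of correct classification. The short Python sketch below does this arithmetic; it assumes that each row of the matrix corresponds to the true modulation and each column to the class assigned by J4.8, and the function names are chosen for this example.

```python
LABELS = ["AM", "BPSK", "QPSK", "BFSK"]

# Counts from Table 3 (J4.8, SNR = -15 dB); rows are assumed to be the true class.
CONFUSION = [
    [750, 0, 0, 0],
    [3, 507, 237, 3],
    [3, 243, 504, 0],
    [6, 0, 0, 744],
]


def per_class_accuracy(matrix):
    """Percentage of each row that falls on the diagonal."""
    return [100.0 * row[i] / sum(row) for i, row in enumerate(matrix)]


def overall_accuracy(matrix):
    """Percentage of all samples that fall on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return 100.0 * correct / total


for label, accuracy in zip(LABELS, per_class_accuracy(CONFUSION)):
    print(f"{label}: {accuracy:.1f}%")
print(f"overall: {overall_accuracy(CONFUSION):.1f}%")
```

Under that reading of the rows, the per-class values agree with the J4.8 column of Table 2 (100, 67.6, 67.2 and 99.2), and the overall rate of 83.5% corresponds to the 16.5% error quoted for J4.8.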



In order to analyze the degree of generalization of the classifiers, two experiments were carried out. In the first, the training set was fixed at 750 samples for each modulation with SNR = 5 dB, while the test set was varied over SNR values from -15 to 15 dB. The goal was to evaluate the performance of the classifiers when tested with SNR values for which they were not trained. The results are shown in Figure 9. SVM and ANN presented the best performance, which can be seen, for example, by comparing the performance of the classifiers at SNR = -5 dB.

In the second experiment, the classifiers were trained with SNR values from -15 to 15 dB and then tested with specific values of SNR, which are indicated on the abscissa in Figure 9.

In this experiment, the performance of the classifiers increased considerably. The ANN and SVM classifiers presented the best performance; on the other hand, the naïve Bayes classifier had the worst performance.



**Figure 8.** Profiles of the BPSK and QPSK modulations: (a) BPSK profile, SNR = 15 dB; (b) QPSK profile, SNR = 15 dB; (c) BPSK profile, SNR = -15 dB; (d) QPSK profile, SNR = -15 dB.

Studies in the literature indicate that the classification of QAM using cyclostationarity is difficult because high-order QAM modulations do not exhibit second-order periodicity or, in some cases, exhibit characteristics similar to those of QPSK modulation [1]. The results that follow show the performance of the classifiers when classifying the 16-QAM modulation. Figure 10 allows a comparison of the performance of the classifiers when the 16-QAM modulation is included.

The results show that, at low SNR, the performance of the classifiers decreases when the 16-QAM modulation is included. However, as the SNR increases, the performance of the classifiers approaches 100% correct classification.

In general, the SVM classifier obtained the best results, which can be attributed to its robustness, a consequence of a mathematical formulation based on the search for an optimal solution. Naïve Bayes, despite being a simple method, also performed well, better than some well-established classifiers such as ANN and KNN.

## **6. Simulation result of a multipath Rayleigh fading and AWGN channel**

Figure 11 through Figure 13 show the results for SNR values from -15 to 15 dB and Doppler frequencies FD = 50, 150 and 300 Hz. The modulations used were AM, BPSK, QPSK, BFSK and 16-QAM.


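**Figure 9.** Performance of classifiers when trained and tested with different SNR values. Abscissa indicates the SNR adopted for the test set. Panels: (a) naïve Bayes, (b) KNN, (c) decision tree J4.8, (d) SVM, (e) ANN; each panel compares training with SNR = 5 dB against training with different SNR values.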

**Figure 10.** Performance of the classifiers for the classification of 16-QAM modulation: (a) without the 16-QAM modulation; (b) with the 16-QAM modulation.

**Figure 11.** Performance of classifiers. Rayleigh fading channel, FD=50 Hz.

**Figure 12.** Performance of classifiers. Rayleigh fading channel, FD=150 Hz.

The results show that, in general, all the classifiers performed well. However, KNN and the J4.8 decision tree proved very susceptible to noise, especially for SNR values between -15 dB and 5 dB.

Furthermore, comparison between the Rayleigh and AWGN channels shows that there was a decrease in the performance of the classifiers. In the experiments, it was used a uniform procedure was used for selection of models of classifiers (i.e., not invested much in the tune

Cognitive Radio Systems 19

Data Mining Applied to Cognitive Radio Systems 217

[1] A. Fehske, J. Gaeddert, J. R. [2005]. A new approach to signal classification using spectral

[2] Akyildiz, I., Lee, W., Vuran, M. C. & Mohanty, S. [2006]. Next generation/dynamic spectrum access/cognitive radio wireless networks: A survey, *Computer Networks: The International Journal of Computer and Telecommunications Networking* 50: 2127–2159. [3] Allwein, E., Schapire, R. & Singer, Y. [2000]. Reducing multiclass to binary: A unifying approach for margin classifiers, *Journal of Machine Learning Research* pp. 113–141. [4] D. Shimbo; I. Oka; S. Ata; [2007]. An improved algorithm of modulation classification for digital communication signals based on wavelet transform, *Radio and Wireless Symposium,*

[5] Cristianini & Shawe-Taylor, J. [2000]. *An introduction to support vector machines and other*

[7] Gardner, W. A. & Spooner, C. M. [1992]. Signal interception: Performance advantages of

[8] Hastie, T., Tibshirani, R. & Friedman, J. [2001]. *The elements of statistical learning*, Springer

cyclic-feature detectors, *IEEE Transactions on Communications* 40: 149–159.

correlation and neural networks, *DySPAN* pp. 144–150.

*kernel-based learning methods*, Cambridge University Press.

Verlag.

*2007 IEEE* 03: 567–570.

[6] Duda, R., Hart, P. & Stork, D. [2001]. *Pattern classification*, Wiley.

[9] Haykin, S. [2001]. *Redes Neurais: Princípios e Prática. 2. Ed.*, Porto Alegre: Bookman.

[10] Haykin, S., Thomson, D. & Reed, J. [2009]. Spectrum sensing for cognitive radio, *Proceedings of the IEEE* 97: 849–877.

[11] Kadambe, S. & Jiang, Q. [2004]. Classification of modulation of signals of interest, *Digital Signal Processing Workshop, 2004 and the 3rd IEEE Signal Processing Education Workshop. 2004 IEEE 11th* pp. 226–230.

[12] Klautau, A., Jevtić, N. & Orlitsky, A. [2003]. On nearest-neighbor ECOC with application to all-pairs multiclass SVM, *J. Machine Learning Research* 4: 1–15.

[13] Lynn, T. J. & Sha'amerr, A. [2007]. Automatic analysis and classification of digital modulation signals using spectogram time frequency analysis, *International Symposium on Communications and Information Technologies, 2007. ISCIT '07.* pp. 916–920.

[14] Mitola, J. & Maguire, G. Q. [1999]. Cognitive radio: making software radios more personal, *Personal Communications, IEEE* 6(4): 13–18. URL: *http://dx.doi.org/10.1109/98.788210*

[15] Muller, F., Cardoso, C. & Klautau, A. [2011]. A front end for discriminative learning in automatic modulation classification, *Communications Letters, IEEE* 15(4): 443–445.

[16] Popoola, J. & van Olst, R. [2011]. A novel modulation-sensing method, *Vehicular Technology Magazine, IEEE* 6(3): 60–69.

[17] Quinlan, J. [1993]. *C4.5: Programs for Machine Learning*, Morgan Kaufmann.

[18] Ramkumar, B. [2009]. Automatic modulation classification for cognitive radios using cyclic feature detection, *Circuits and Systems Magazine, IEEE* 9(2): 27–45.

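**Figure 13.** Performance of classifiers: probability of correct classification (%) versus SNR (dB), from −15 to 15 dB, for naïve Bayes, KNN, J4.8, SVM and ANN. Rayleigh fading channel, FD=300 Hz.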

of a specific classifier). This may explain the variation in results for AWGN and multipath fading. A more detailed investigation about the parameters of classifiers such as SVM and ANN would probably improve the results.

## **7. Conclusions**

This chapter discussed the task of modulation classification in cognitive radio. Modulation classification is fundamental because it allows the CR to adapt its transmission parameters so that the spectrum is shared efficiently, without causing interference to other users. A modulation classifier was implemented based on the cyclostationary characteristics of modulated signals. The performance of five data mining techniques was evaluated: naïve Bayes, decision tree (J4.8), KNN, SVM, and ANN. In this evaluation, the classifiers were used to distinguish AM, BPSK, BFSK, QPSK and 16-QAM modulations. An environment with multipath Rayleigh fading and AWGN was adopted.

Simulation results show that it is possible to classify the incoming signals, even at very low SNR. Cyclostationarity proved an effective technique for feature extraction, even in environments with low SNR. The SVM classifier with a linear kernel presented the best results, even in a multipath fading configuration.
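As a rough illustration of why cyclostationary features are useful, the following Python sketch estimates the cyclic autocorrelation of a rectangular-pulse BPSK baseband signal. It is only a sketch under stated assumptions: the function names, the 8 samples per symbol, the lag of half a symbol and the noise-free signal are illustrative choices, and this is not the feature extractor used in this chapter.

```python
import cmath
import random


def cyclic_autocorrelation(x, alpha, tau):
    """Estimate |R_x^alpha(tau)| = |mean_n x[n + tau] * conj(x[n]) * exp(-j 2 pi alpha n)|.

    alpha is the cyclic frequency in cycles per sample; tau is a lag in samples.
    """
    n_terms = len(x) - tau
    acc = 0j
    for n in range(n_terms):
        lag_product = x[n + tau] * complex(x[n]).conjugate()
        acc += lag_product * cmath.exp(-2j * cmath.pi * alpha * n)
    return abs(acc / n_terms)


def bpsk_baseband(num_symbols, samples_per_symbol, rng):
    """Rectangular-pulse BPSK at baseband: each random +/-1 symbol is held for one symbol period."""
    signal = []
    for _ in range(num_symbols):
        symbol = rng.choice((-1.0, 1.0))
        signal.extend([symbol] * samples_per_symbol)
    return signal


rng = random.Random(0)  # fixed seed so the sketch is repeatable
sps = 8                 # hypothetical number of samples per symbol
x = bpsk_baseband(1024, sps, rng)

# Cyclic frequency equal to the symbol rate (a cycle frequency of this signal).
on_cycle = cyclic_autocorrelation(x, alpha=1.0 / sps, tau=sps // 2)
# Cyclic frequency that is not a multiple of the symbol rate.
off_cycle = cyclic_autocorrelation(x, alpha=1.5 / sps, tau=sps // 2)

print(f"on-cycle feature:  {on_cycle:.3f}")
print(f"off-cycle feature: {off_cycle:.3f}")
```

For this signal the estimate at the symbol-rate cyclic frequency should be clearly larger than the estimate at the off-cycle frequency, which is the kind of contrast a cyclostationarity-based classifier can exploit.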

The proposed evaluation of modulation classification algorithms may serve as a starting point for researchers who want to compare results systematically.
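One possible way to organize such a systematic comparison is sketched below in Python: every classifier is trained and tested on the same data at each SNR, and the probability of correct classification (%) is recorded per SNR. Everything in the sketch is an assumption made for illustration: the 2-D synthetic features, the class centres, the noise model and the two simple classifiers (nearest centroid and 1-NN) are stand-ins, not the cyclostationary features or the five classifiers evaluated in this chapter.

```python
import math
import random


def make_dataset(snr_db, samples_per_class, rng):
    """Synthetic stand-in for extracted features: one 2-D cluster per class.

    The class centres and the noise model (std = 10^(-SNR/20)) are assumptions.
    """
    centres = {"BPSK": (0.0, 0.0), "QPSK": (3.0, 3.0), "AM": (0.0, 3.0)}
    noise_std = 10.0 ** (-snr_db / 20.0)
    data = []
    for label, (cx, cy) in centres.items():
        for _ in range(samples_per_class):
            point = (rng.gauss(cx, noise_std), rng.gauss(cy, noise_std))
            data.append((point, label))
    return data


def train_nearest_centroid(train):
    """Return a predictor that assigns the label of the closest class mean."""
    sums = {}
    for (fx, fy), label in train:
        sx, sy, count = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + fx, sy + fy, count + 1)
    centroids = {lab: (sx / c, sy / c) for lab, (sx, sy, c) in sums.items()}

    def predict(point):
        return min(centroids, key=lambda lab: math.dist(point, centroids[lab]))

    return predict


def train_one_nn(train):
    """Return a predictor that assigns the label of the closest training sample."""
    def predict(point):
        nearest = min(train, key=lambda item: math.dist(point, item[0]))
        return nearest[1]

    return predict


def correct_classification_rate(predict, test):
    """Percentage of test samples whose predicted label matches the true label."""
    hits = sum(1 for features, label in test if predict(features) == label)
    return 100.0 * hits / len(test)


def compare(trainers, snr_values_db, seed=0):
    """Evaluate every classifier on the same train/test split at each SNR."""
    rng = random.Random(seed)
    results = {name: {} for name in trainers}
    for snr_db in snr_values_db:
        train = make_dataset(snr_db, 50, rng)
        test = make_dataset(snr_db, 50, rng)
        for name, trainer in trainers.items():
            predictor = trainer(train)
            results[name][snr_db] = correct_classification_rate(predictor, test)
    return results


trainers = {"nearest centroid": train_nearest_centroid, "1-NN": train_one_nn}
results = compare(trainers, snr_values_db=[-15, -5, 5, 15])
for name, by_snr in results.items():
    print(name, {snr: round(rate, 1) for snr, rate in by_snr.items()})
```

Using the same split for all classifiers at a given SNR keeps the comparison fair; a real study would replace `make_dataset` with the actual feature extraction and the two stand-in classifiers with the algorithms under test.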

## **Author details**

Lilian C. Freitas and João Costa

*Applied Electromagnetism Laboratory (LEA), Federal University of Pará (UFPA), Belém – PA – Brazil*

Yomara Pires, Jefferson Morais and Aldebaro Klautau

*Signal Processing Laboratory (LaPS), Federal University of Pará (UFPA), Belém – PA – Brazil*

## **8. References**


[19] Shen, L., Li, S., Song, C. & Chen, F. [2006]. Automatic modulation classification of MPSK signals using high order cumulants, *8th International Conference on Signal Processing, 2006* 01.

[20] Si, L.-L. M. X.-J. [2007]. An improved algorithm of modulation classification for digital communication signals based on wavelet transform, *IEEE Transactions on Aerospace and Electronic Systems* 03: 1226–1231.

[21] Vapnik, V. [1995]. *The nature of statistical learning theory*, Springer Verlag.

[22] Weka [n.d.]. http://www.cs.waikato.ac.nz/ml/weka.

[23] Witten, I. & Frank, E. [2005]. *Data mining: practical machine learning tools and techniques with Java implementations*, Morgan Kaufmann.

[24] Xiaorong, H. T. J. [2004]. Modulation classification using arbf networks, *7th International Conference on Signal Processing, ICSP 04.* 03: 1809–1812.

© 2012 Filho et al., licensee InTech. This is an open access chapter distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


**Short-Term Energy Price Prediction Multi-Step-Ahead in the Brazilian Market Using Data Mining**

José C. Reston Filho, Carolina de M. Affonso and Roberto Célio L. de Oliveira

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/48472

## **1. Introduction**

Electricity prices tend to be very volatile due to weather conditions, fuel prices, economic growth and many other factors [1]. As a consequence, electricity market participants face high risks in bilateral contracts and in the short-term market. In the short-term market, generators sell energy at variable pool prices while their fuel costs are fixed. Distributors, in turn, supply energy to most of their customers at an annual fixed tariff, but they have to purchase electricity at a variable pool price. A reliable tool for forecasting the electricity price is therefore crucial for risk management in energy markets.

Many papers have proposed hybrid models for energy price prediction. The benefit of a hybrid model is that it combines the strengths of its component techniques, providing a robust model capable of capturing the nonlinear nature of complex time series and producing more accurate forecasts. Reference [2] provides a hybrid methodology that combines ARIMA and Artificial Neural Network (ANN) models for predicting short-term electricity prices. In [3], a novel technique to forecast day-ahead electricity prices is presented, based on Self-Organizing Map neural network (SOM) and Support Vector Machine (SVM) models. Reference [4] proposes a novel price forecasting method based on the wavelet transform combined with ARIMA and GARCH models.

The major data mining functions developed in research communities include summarization, association, prediction and clustering. This work deals with the problem of multi-step-ahead energy price prediction in the Brazilian market. The ARIMA model is used to predict the variables that affect the short-term energy price (exogenous inputs), instead of predicting the energy price directly as in [2]. The results obtained with the
