Machine Learning Approaches for Spectrum Management in Cognitive Radio Networks (in: Machine Learning - Advanced Techniques and Emerging Applications), http://dx.doi.org/10.5772/intechopen.74599

In the first part of this chapter, we train a machine-learning (ML) classifier with frame energy test statistics and their corresponding decisions about the presence or absence of PU transmission in the channel. Then, we use the trained classifier to predict the decisions for newly unseen PU channel frames [3]. The second part focuses on estimating the near-future PU channel state. In the literature, many proposals have studied the problem of estimating the PU channel state in cognitive radio (CR) [4–6]. However, most of these studies predict the PU channel state in the frequency domain by converting the received digital signals using the fast Fourier transform (FFT), which increases system complexity due to the FFT computations. In the second part of the chapter, we introduce a new time-domain approach for PU channel state prediction based on time-series prediction with a machine-learning prediction model. In particular, a time series captures the PU channel state detection sequence (PU channel "idle" or "occupied") in the time domain; then, prediction models such as the hidden Markov model (HMM) and the Markov switching model (MSM) are used to predict the behavior of the time series that captures the PU channel state [7].

2. Machine-learning fusion-based cooperative spectrum sensing

In this part, we first define the system model for energy detection-based spectrum sensing and present the method of calculating the thresholds of the energy detector under different fusion rules. Second, we formulate a machine-learning classification problem and present four machine-learning classifiers to solve it. Finally, we evaluate the performance of these classifiers with simulation experiments.

2.1. Energy detection-based cooperative spectrum sensing

Figure 1 shows a block diagram of the system model used for energy detection cooperative spectrum sensing based on a machine-learning fusion rule. In this model, we consider a cooperative CR network with K cooperative nodes. Each node uses N samples for energy detection, while M frames are used for training the machine-learning (ML) classifier.

Figure 1. Block diagram of machine-learning-based fusion rule spectrum sensing.

The received signal of the ith frame at the jth cooperative node, y_ij(n), 1 ≤ n ≤ N, 1 ≤ i ≤ M, 1 ≤ j ≤ K, is given by

$$y\_{ij}(n) = \begin{cases} w\_{ij}(n) & H0 \\ \sqrt{\gamma\_{ij}}\; s\_{ij}(n) + w\_{ij}(n) & H1 \end{cases} \tag{1}$$

where s_ij(n) is the PU signal, which is assumed to follow a Gaussian i.i.d. random process (zero mean and variance σ_s²), and w_ij(n) is the noise, which is also assumed to follow a Gaussian i.i.d. random process (zero mean and variance σ_u²); s_ij(n) and w_ij(n) are independent. Because all K nodes sense the same frame at a given time, the global decision about PU channel availability is made at the fusion center only. Thus, the energy statistic for the ith frame at the jth cooperative node, Y_ij, can be represented by the energy test statistic of the ith frame at the fusion center, which is given by

$$Y\_i = \frac{1}{N} \sum\_{n=1}^{N} \left| y\_{ij}(n) \right|^2, \qquad 1 \le i \le M \tag{2}$$

Y_i is a random variable whose probability density function is chi-square distributed (with 2N degrees of freedom for the complex-valued case and N degrees of freedom for the real-valued case). If we assume that the channel remains unchanged during the observation interval and that enough samples are observed (N ≥ 200) [8], then we can approximate Y_i using a Gaussian distribution as follows:

$$Y\_i \sim \begin{cases} \mathcal{N}\left(\sigma\_{ij}^2,\; 2\sigma\_{ij}^4/N\right) & H0 \\ \mathcal{N}\left(\sigma\_{ij}^2\left(1+\gamma\_{ij}\right),\; 2\sigma\_{ij}^4\left(1+\gamma\_{ij}\right)^2/N\right) & H1 \end{cases} \tag{3}$$

where σ_ij² is the variance of the noise samples w_ij(n), and γ_ij is the observed signal-to-noise ratio (SNR) of the ith frame sensed at the jth cooperative node. Assuming that the noise variance and the SNR at each node remain unchanged over all M frames, γ_ij = γ_j and σ_ij² = σ_j². For a chosen threshold λ_j, the probability of false alarm P_f for each frame, as given in [9], can be written as

$$P\_f\left(\lambda\_j\right) = \Pr\{Y\_i > \lambda\_j \mid H0\}$$

$$= \frac{1}{\sqrt{2\pi \cdot 2\sigma\_j^4/N}} \int\_{\lambda\_j}^{\infty} e^{-\left(y - \sigma\_j^2\right)^2 N / \left(4\sigma\_j^4\right)}\, dy$$

$$= Q\left(\left(\frac{\lambda\_j}{\sigma\_j^2} - 1\right)\sqrt{\frac{N}{2}}\right) \tag{4}$$

and the probability of detection Pd is given by

$$P\_d\left(\lambda\_j\right) = \Pr\{Y\_i > \lambda\_j | H1\}$$

$$= Q\left(\left(\frac{\lambda\_j}{\sigma\_j^2 \left(1 + \gamma\_j\right)} - 1\right) \sqrt{\frac{N}{2}}\right) \tag{5}$$


where Q(·) is the complementary distribution function of the Gaussian distribution with zero mean and unit variance. To obtain the optimal threshold λ for K cooperative sensing nodes, data fusion rules are used. The calculation of the thresholds for single-user and other fusion rules is presented in subsections 2.1.1 and 2.1.2.

#### 2.1.1. The detection threshold for single-user-based sensing

For single-user sensing, the number of cooperative nodes is one (i.e., K = 1, σ_j² = σ_u², γ_j = γ_u). From Eq. (4), for a given probability of false alarm P_f, the single-user threshold can be written as

$$\lambda\_{single} = \left(\sqrt{\frac{2}{N}}\, Q^{-1}\left(P\_f\right) + 1\right) \sigma\_u^2 \tag{6}$$

where Q⁻¹(·) is the inverse of the Q(·) function, and the probability of detection P_dsingle can be written as

$$P\_{dsingle} = Q\left(\left(\frac{\lambda\_{single}}{\sigma\_u^2 \left(1 + \gamma\_u\right)} - 1\right) \sqrt{\frac{N}{2}}\right) \tag{7}$$
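As a quick numerical sketch of Eqs. (6) and (7), the single-user threshold and detection probability can be computed with the standard normal distribution. This is an illustration only, not part of the chapter; the parameter values (N = 1000, P_f = 0.1, γ_u = -22 dB) are assumptions chosen to match the setting used later for Figure 2.

```python
from statistics import NormalDist

def Q(x):
    """Gaussian tail probability Q(x) = Pr{Z > x} for Z ~ N(0, 1)."""
    return 1.0 - NormalDist().cdf(x)

def Q_inv(p):
    """Inverse of the Q(.) function."""
    return NormalDist().inv_cdf(1.0 - p)

def single_user_threshold(pf, N, sigma_u2=1.0):
    """Eq. (6): lambda_single = (sqrt(2/N) * Q^-1(Pf) + 1) * sigma_u^2."""
    return ((2.0 / N) ** 0.5 * Q_inv(pf) + 1.0) * sigma_u2

def single_user_pd(lam, N, gamma_u, sigma_u2=1.0):
    """Eq. (7): detection probability for the chosen threshold."""
    return Q((lam / (sigma_u2 * (1.0 + gamma_u)) - 1.0) * (N / 2.0) ** 0.5)

N, pf = 1000, 0.1
gamma_u = 10 ** (-22 / 10)          # -22 dB expressed as a linear SNR
lam = single_user_threshold(pf, N)
pd = single_user_pd(lam, N, gamma_u)
```

At such a low SNR the single user detects only slightly more often than it false-alarms, which is the motivation for the cooperative fusion rules that follow.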

#### 2.1.2. The detection threshold for data fusion-based sensing

In a data fusion spectrum sensing scheme, K nodes cooperate in calculating the threshold that is used to make the global sensing decision. There are many fusion rules used to calculate the global sensing decision threshold, which are divided into: hard fusion rules including AND, OR, and majority rule and soft fusion rules including maximum ratio combining (MRC), equal gain combining (EGC), and square law selection (SLS).

#### 2.1.2.1. AND fusion rule

The AND rule decides that the signal is present if all users have detected the signal. For a system of K cooperative nodes with the same false alarm probability P_f cooperating using the AND rule, the fusion center threshold can be expressed as

$$\lambda\_{AND} = \left(\sqrt{\frac{2}{N}}\, Q^{-1}\left(P\_f^{\frac{1}{K}}\right) + 1\right) \sigma\_u^2 \tag{8}$$

And the detection probability PdAND can be written as


$$P\_{dAND} = \left( Q\left( \left( \frac{\lambda\_{AND}}{\sigma\_u^2 \left(1 + \gamma\_u\right)} - 1 \right) \sqrt{\frac{N}{2}} \right) \right)^K \tag{9}$$

#### 2.1.2.2. OR fusion rule


The OR rule decides that a signal is present if any of the users detects a signal. For K cooperative nodes cooperating using the OR fusion rule, the fusion center threshold can be expressed as

$$\lambda\_{OR} = \left(\sqrt{\frac{2}{N}}\, Q^{-1}\left(1 - \left(1 - P\_f\right)^{\frac{1}{K}}\right) + 1\right) \sigma\_u^2 \tag{10}$$

And the detection probability PdOR is

$$P\_{dOR} = 1 - \left(1 - Q\left(\left(\frac{\lambda\_{OR}}{\sigma\_u^2 \left(1 + \gamma\_u\right)} - 1\right) \sqrt{\frac{N}{2}}\right)\right)^K \tag{11}$$
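The hard-fusion thresholds of Eqs. (8)-(11) can be sketched in a few lines; a minimal illustration under assumed parameter values (K = 7, N = 1000, γ_u = -22 dB, as in the performance discussion later):

```python
from statistics import NormalDist

def Q(x):
    """Gaussian tail probability."""
    return 1.0 - NormalDist().cdf(x)

def Q_inv(p):
    """Inverse of the Q(.) function."""
    return NormalDist().inv_cdf(1.0 - p)

def and_rule(pf, N, K, gamma_u, sigma_u2=1.0):
    """Eqs. (8)-(9): each node targets a per-node false alarm of Pf^(1/K)."""
    lam = ((2.0 / N) ** 0.5 * Q_inv(pf ** (1.0 / K)) + 1.0) * sigma_u2
    pd = Q((lam / (sigma_u2 * (1.0 + gamma_u)) - 1.0) * (N / 2.0) ** 0.5) ** K
    return lam, pd

def or_rule(pf, N, K, gamma_u, sigma_u2=1.0):
    """Eqs. (10)-(11): each node targets 1 - (1 - Pf)^(1/K)."""
    lam = ((2.0 / N) ** 0.5 * Q_inv(1.0 - (1.0 - pf) ** (1.0 / K)) + 1.0) * sigma_u2
    pd = 1.0 - (1.0 - Q((lam / (sigma_u2 * (1.0 + gamma_u)) - 1.0)
                        * (N / 2.0) ** 0.5)) ** K
    return lam, pd

lam_and, pd_and = and_rule(0.1, 1000, 7, 10 ** (-22 / 10))
lam_or, pd_or = or_rule(0.1, 1000, 7, 10 ** (-22 / 10))
```

Note that the OR rule demands a much smaller per-node false alarm probability than the AND rule, so its fusion threshold comes out higher.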

#### 2.1.2.3. Maximum ratio combination (optimal MRC) fusion rule

In soft combination fusion, K cooperative nodes with noise variances {σ_11², σ_22², …, σ_MK²} and instantaneous SNRs {γ_11, γ_22, …, γ_MK} send their ith frame energy test statistics Y_ij = (1/N) Σ_{n=1}^{N} |y_ij(n)|², 1 ≤ j ≤ K, to the fusion center. After receiving these energy statistics, the fusion center weighs them and adds them together as follows:

$$Ys\_i = \sum\_{j=1}^{K} w\_j\, Y\_{ij}, \qquad 1 \le i \le M \tag{12}$$

An assumption is made that the SNRs and noise variances at the sensing nodes remain unchanged for all frames during the training process (i.e., γ_ij = γ_j, σ_ij² = σ_j²). For soft optimal linear combination, we need to find the optimum weight vector w_j that maximizes the detection probability. For additive white Gaussian noise (AWGN) channels, the fusion threshold for the MRC fusion rule is written as

$$\lambda\_{MRC} = \left(\sqrt{\frac{2}{N}}\, Q^{-1}\left(P\_f\right) + 1\right) \sum\_{j=1}^{K} w\_j \sigma\_j^2 \tag{13}$$

And the detection probability PdMRC is given by

$$P\_{dMRC} = Q\left(\left(\frac{\lambda\_{MRC}}{\sum\_{j=1}^{K} \left(1 + \gamma\_j\right) w\_j \sigma\_j^2} - 1\right) \sqrt{\frac{N}{2}}\right) \tag{14}$$

where the weighting coefficient vector {w_j} = [w_1, w_2, …, w_K] can be obtained by

$$\mathbf{w}\_j = \mathrm{sign}\left(\mathbf{g}^{T} \mathbf{w}\_0\right)\, \mathbf{w}\_0$$


where

$$\mathbf{w}\_0 = \frac{L\_{H1}^{-1/2} \left[L\_{H1}^{-1/2}\right]^{T} \mathbf{g}}{\left\| L\_{H1}^{-1/2} \left[L\_{H1}^{-1/2}\right]^{T} \mathbf{g} \right\|}$$

where

$$L\_{H1} = 2\, \mathrm{diag}\left(\sigma\_1^4 \left(1 + \gamma\_1\right)^2, \ldots, \sigma\_K^4 \left(1 + \gamma\_K\right)^2\right) / N$$

$$\mathbf{g} = \left[\sigma\_1^2 \gamma\_1, \sigma\_2^2 \gamma\_2, \ldots, \sigma\_K^2 \gamma\_K\right]^{T}$$
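Because L_H1 is diagonal, L_H1^{-1/2}[L_H1^{-1/2}]^T g reduces to dividing each component of g by the corresponding diagonal entry, so w_0 can be sketched without matrix algebra. The node parameters below are hypothetical, for illustration only:

```python
def mrc_weights(sigma2, gamma, N):
    """Optimal MRC weight vector w_j = sign(g^T w0) * w0 for diagonal L_H1.

    sigma2[j] is the noise variance sigma_j^2; gamma[j] is the SNR gamma_j.
    """
    # Diagonal entries of L_H1 = 2 * sigma_j^4 * (1 + gamma_j)^2 / N
    L = [2.0 * s ** 2 * (1.0 + g) ** 2 / N for s, g in zip(sigma2, gamma)]
    # g_j = sigma_j^2 * gamma_j
    g_vec = [s * g for s, g in zip(sigma2, gamma)]
    # L_H1^{-1/2} [L_H1^{-1/2}]^T g == elementwise g_j / L_jj for diagonal L_H1
    w = [gj / lj for gj, lj in zip(g_vec, L)]
    norm = sum(x * x for x in w) ** 0.5
    w0 = [x / norm for x in w]
    # w_j = sign(g^T w0) * w0
    sign = 1.0 if sum(a * b for a, b in zip(g_vec, w0)) >= 0 else -1.0
    return [sign * x for x in w0]

# Three hypothetical nodes with unit noise variance and increasing SNRs.
w = mrc_weights([1.0, 1.0, 1.0], [0.01, 0.02, 0.04], N=1000)
```

As expected, nodes with higher SNR receive larger weights, and the weight vector is normalized to unit length.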

#### 2.1.2.4. Equal gain combination (EGC) fusion rule

Equal-weight linear combining employs straightforward averaging of the received soft decision statistics: in the equal gain combination, the received energies are equally weighted and then added together. The calculation of the threshold λ_EGC and the detection probability P_dEGC follows Eqs. (13) and (14), respectively, with the weighting vector w_j = [w_1, …, w_K], where w_1 = w_2 = ⋯ = w_K = 1/√K [10].

#### 2.1.2.5. Square law selection (SLS) fusion rule

Here, the fusion center selects the node with the highest SNR, γ_SLS = max{γ_1, γ_2, …, γ_K}, and uses the noise variance σ_SLS² associated with that node. The fusion center threshold is then calculated as follows:

$$\lambda\_{SLS} = \left(\sqrt{\frac{2}{N}}\, Q^{-1}\left(1 - \left(1 - P\_f\right)^{\frac{1}{K}}\right) + 1\right) \sigma\_{SLS}^2 \tag{15}$$

And the detection probability PdSLS is

$$P\_{dSLS} = 1 - \left(1 - Q\left(\left(\frac{\lambda\_{SLS}}{\sigma\_{SLS}^2 \left(1 + \gamma\_{SLS}\right)} - 1\right) \sqrt{\frac{N}{2}}\right)\right)^K \tag{16}$$

#### 2.2. Machine-learning classification problem formulation

The ith frame energy test statistic (Y_i for a hard fusion rule or Ys_i for a soft fusion rule) given in Eq. (2) or (12) is compared to the sensing threshold to calculate the decision d_i associated with the ith frame in the training data set as follows:


$$d\_i = \begin{cases} 1 & Y\_F \ge \lambda \\ -1 & Y\_F < \lambda \end{cases} \qquad 1 \le i \le M \tag{17}$$

where λ ∈ {λ_single, λ_AND, λ_OR, λ_MRC, λ_EGC, λ_SLS}, Y_F ∈ {Y_i, Ys_i}, and M is the number of frames in the training set; "-1" represents the absence and "1" the presence of primary-user transmission on the channel. The output of Eq. (17) is a set of pairs (Y_i, d_i), i = 1, 2, …, M, d_i ∈ {-1, 1}, that represent frame energy test statistics and their corresponding decisions. To detect the decision (i.e., the class label) d_x associated with a new frame energy test statistic Y_x, we can use one of the following machine-learning classifiers to solve this classification problem.
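The labeling step of Eq. (17) can be sketched by simulating M frame energies via Eq. (2) and thresholding them. The frame count, SNR, occupancy probability, and threshold below are illustrative assumptions, not values from the chapter:

```python
import random

def frame_energy(N, gamma, sigma_u2=1.0, occupied=False):
    """Eq. (2): average energy of N Gaussian samples for one frame."""
    var = sigma_u2 * (1.0 + gamma) if occupied else sigma_u2
    return sum(random.gauss(0.0, var ** 0.5) ** 2 for _ in range(N)) / N

def make_training_set(M, N, gamma, lam):
    """Return pairs (Y_i, d_i) with d_i = 1 if Y_i >= lambda, else -1 (Eq. 17)."""
    pairs = []
    for _ in range(M):
        occupied = random.random() < 0.5   # PU active half the time (assumption)
        Y = frame_energy(N, gamma, occupied=occupied)
        pairs.append((Y, 1 if Y >= lam else -1))
    return pairs

random.seed(1)
# gamma = 0.5 (about -3 dB) makes the two classes clearly separable here.
training = make_training_set(M=200, N=1000, gamma=0.5, lam=1.25)
```

The resulting (Y_i, d_i) pairs are exactly the training set consumed by the classifiers in the following subsections.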

#### 2.2.1. K-nearest neighbors (KNN) classifier


For the K-nearest neighbors classifier, the K nearest points to Y_x are used to predict the class label d_x that corresponds to Y_x [11]. For K = 1, the Euclidian distance d_st between Y_x and the training data points can be computed as

$$d\_{st}(i) = \sqrt{\left(Y\_x - Y\_i\right)^2} = \left| Y\_x - Y\_i \right|, \qquad i = 1, 2, \ldots, M \tag{18}$$

and the new Y_x is classified with the label d_x = d_i*, where i* is the index of the training point that achieves the minimum Euclidian distance d_st(i) to Y_x.
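The 1-NN rule of Eq. (18) is a one-liner; a minimal sketch using hypothetical energy/label pairs (idle frames near 1.0, occupied frames near 1.5):

```python
def knn1_classify(pairs, Yx):
    """Return the label of the training point nearest to Yx (Eq. 18, K = 1)."""
    return min(pairs, key=lambda p: abs(Yx - p[0]))[1]

# Hypothetical training pairs (Y_i, d_i).
pairs = [(0.98, -1), (1.02, -1), (1.47, 1), (1.53, 1)]
```

For example, an energy of 1.45 falls nearest an occupied frame and is labeled 1, while 1.05 falls nearest an idle frame and is labeled -1.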

#### 2.2.2. Naïve Bayes classifier

Under the assumption that d_i = -1 and d_i = 1 are independent, the prior probabilities for d_i = -1 and d_i = 1 given the training examples (Y_i, d_i), i = 1, 2, …, M, can be calculated, and the class-conditional densities (likelihood probabilities) can be estimated from the set [Y_1, Y_2, …, Y_k] in which the new Y_x is expected to fall. The probability that the new Y_x belongs to either the d_i = -1 or the d_i = 1 class is then calculated using the Naïve Bayes assumption and Bayes' rule [12] as follows:

$$\mathrm{class}\left(Y\_x\right) = \operatorname\*{argmax}\_{d\_i} \Pr\left(d\_i\right) \prod\_{j=1}^{k} \Pr\left(Y\_j \mid d\_i\right) \tag{19}$$

where the prior probabilities are given by

$$\Pr\left(d\_i = -1\right) = \frac{\text{number of } Y\_i \text{ with class label } -1}{\text{total number of class labels}}$$

$$\Pr\left(d\_i = 1\right) = \frac{\text{number of } Y\_i \text{ with class label } 1}{\text{total number of class labels}}$$

whereas the class-conditional densities (likelihood probabilities) can be estimated using a Gaussian density function:

$$\Pr\left(Y\_j \mid d\_i\right) = \frac{1}{\sigma\_j \sqrt{2\pi}}\, e^{-\left(Y - \mu\_j\right)^2 / \left(2\sigma\_j^2\right)}, \qquad Y\_1 < Y < Y\_k, \; \sigma\_j > 0,$$

where μ_j and σ_j are the mean and standard deviation of the set [Y_1, Y_2, …, Y_k]. Eq. (19) means that the Naïve Bayes classifier will label the new Y_x with the class label d_i that achieves the highest posterior probability.
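A minimal sketch of Eq. (19) with Gaussian likelihoods, assuming the single-feature case (so the product in Eq. (19) has one term) and hypothetical training pairs:

```python
from statistics import NormalDist, mean, pstdev

def nb_classify(pairs, Yx):
    """Pick the label maximizing prior * Gaussian likelihood (Eq. 19)."""
    best_label, best_post = None, -1.0
    for label in (-1, 1):
        ys = [y for y, d in pairs if d == label]
        prior = len(ys) / len(pairs)                   # class prior
        mu, sigma = mean(ys), pstdev(ys) or 1e-9       # Gaussian parameters
        post = prior * NormalDist(mu, sigma).pdf(Yx)   # unnormalized posterior
        if post > best_post:
            best_label, best_post = label, post
    return best_label

# Hypothetical training pairs: idle energies near 1.0, occupied near 1.5.
pairs = [(0.95, -1), (1.00, -1), (1.05, -1), (1.45, 1), (1.50, 1), (1.55, 1)]
```

The classifier simply returns whichever class gives the new energy the higher posterior score.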

#### 2.2.3. Support vector machine (SVM) classifier

For a given training set of pairs (Y_i, d_i), i = 1, 2, …, M, where Y_i ∈ R and d_i ∈ {+1, -1}, the minimum-norm weight w and a constant b that maximize the margin between the positive and negative classes (i.e., w Y_i + b = ±1) with respect to the hyperplane equation w Y_i + b = 0 can be estimated with the support vector machine classifier by performing the following optimization [13]:

$$\min\_{w, b} \left( \frac{||w||^2}{2} \right) \quad , \qquad \text{where } ||w||^2 = w^T w \tag{20}$$


subject to d_i(w Y_i + b) ≥ 1, i = 1, 2, …, M.

The solution of this quadratic optimization problem can be expressed using the Lagrangian function as

$$L(w, b, \alpha) = \frac{||w||^2}{2} - \sum\_{i=1}^{M} \alpha\_i \left(d\_i \left(w\, Y\_i + b\right) - 1\right), \qquad \alpha\_i \ge 0 \tag{21}$$

where α = (α_1, α_2, …, α_M) are the Lagrange multipliers. If we set the gradient of L(w, b, α) to zero, we get w = Σ_{i=1}^{M} α_i d_i Y_i and Σ_{i=1}^{M} α_i d_i = 0, and by substituting these into Eq. (21), the dual optimization problem that describes the hyperplane can be written as

$$\min\_{\alpha} \left(\frac{1}{2} \sum\_{i=1}^{M} \sum\_{j=1}^{M} d\_i d\_j \left(Y\_i\, Y\_j\right) \alpha\_i \alpha\_j - \sum\_{i=1}^{M} \alpha\_i\right), \qquad \alpha\_i \ge 0 \tag{22}$$

From expression (22), we can find α and compute w using w = Σ_{i=1}^{M} α_i d_i Y_i. Then, by choosing some α_j > 0 from the vector α = (α_1, α_2, …, α_M) and calculating b from b = d_j - Σ_{i=1}^{M} α_i d_i (Y_i Y_j), we classify the new instance Y_x using the following classification function:

$$\mathrm{class}\left(Y\_x\right) = \mathrm{sign}\left(\sum\_{i=1}^{M} \alpha\_i\, d\_i \left(Y\_i\, Y\_x\right) + b\right) \tag{23}$$

which means that the classification of a new Y_x can be expressed as a dot product of Y_x and the support vectors.

#### 2.2.4. Decision tree (DT) classifier

For a training set of pairs of sensing decisions (Y_i, d_i), i = 1, 2, …, M, d_i ∈ {-1, 1}, the decision tree classifier creates a binary tree based on either an impurity or a node-error splitting rule in order to split the training set into separate subsets. It then applies the splitting rule recursively to each subset until the leaf of the subset becomes pure. After that, it minimizes the error in each leaf by taking the majority vote of the training set in that leaf [14]. To classify a new example Y_x, the DT classifier selects the leaf where the new Y_x falls and classifies the new Y_x with the class label that occurs most frequently in that leaf.

#### 2.3. Performance discussion

Figure 2 shows the receiver operating characteristic (ROC) curves for the single-user, soft, and hard fusion rules under an additive white Gaussian noise (AWGN) channel. To generate this figure, we assume a cognitive radio system with 7 cooperative nodes (i.e., K = 7) operating at SNR γ_u = -22 dB. The local node decisions are made after observing 1000 samples (i.e., energy detection over N = 1000 samples).

Figure 2. ROC curves for the soft and hard fusion rules under the case of AWGN receiver noise, σ_u² = 1, γ_u = -22 dB, K = 7 users, and energy detection over N = 1000 samples.
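To close the classifier discussion of subsection 2.2, note that in the one-dimensional energy feature space a hard-margin SVM admits a closed form: the support vectors are simply the lowest-energy occupied frame and the highest-energy idle frame. The sketch below relies on that simplification plus two assumptions (the training data are separable and the +1 class lies above the -1 class); the training pairs are hypothetical:

```python
def svm1d_train(pairs):
    """Hard-margin 1-D SVM: solve w*y_pos + b = +1 and w*y_neg + b = -1
    at the two support vectors (assumes separable data, +1 class above -1)."""
    y_pos = min(y for y, d in pairs if d == 1)    # support vector of class +1
    y_neg = max(y for y, d in pairs if d == -1)   # support vector of class -1
    w = 2.0 / (y_pos - y_neg)
    b = -w * (y_pos + y_neg) / 2.0
    return w, b

def svm_classify(w, b, Yx):
    """Sign rule of Eq. (23), written in the equivalent primal form."""
    return 1 if w * Yx + b >= 0 else -1

# Hypothetical training pairs (Y_i, d_i).
pairs = [(0.95, -1), (1.05, -1), (1.45, 1), (1.55, 1)]
w, b = svm1d_train(pairs)
```

By construction the two support vectors land exactly on the margins w Y + b = ±1, and the decision boundary sits midway between them.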
