**5.5 Hierarchical hidden Markov classification**

Like the HHMM [17], the PHHMM model uses the Baum–Welch algorithm to compute the likelihood-maximizing parameters of the model given the observed data. It comprises four phases: an initial phase, where *λ* is assumed randomly if no prior knowledge is available; a forward phase, where the forward variable is computed recursively; a backward phase, where the backward variable is computed; and an update phase, where the parameters are re-estimated. The model then uses the Viterbi algorithm to find the most likely sequence of hidden states given the observed data and the parameters. Finally, it uses the DDoS detection algorithm to detect multistage attacks from the observed alert sequence. The standard HHMM, by contrast, focuses on a single category with a limited number of features, so it struggles to detect attack traffic that appears to be normal traffic.

The traditional HHMM is composed of multiple single states that are treated as self-contained probabilistic models [18]. However, due to the heterogeneity of IoT traffic, we design each state to have multiple separate lower HMM layers and one upper HMM layer, where each lower state comprises three levels:


The model has one upper HMM state for predicting DDoS attacks; it uses the attack sequences from the lower states to learn new DDoS attack patterns via the DDoS detection algorithm. Thus, unlike the standard HHMM, this extended model can detect multistage DDoS attacks.

**Model Training (Baum–Welch algorithm).** The Baum–Welch algorithm [19] is a recursive Expectation–Maximization method for estimating the unobserved hidden parameters of an HHMM. It sidesteps the complexity of applying maximum-likelihood estimation analytically. It trains the HHMMs to find the optimal *λ*: starting from initialized values, the algorithm iteratively adjusts the parameters based on a set of observed feature vectors.

With this algorithm, for HHMM models $\lambda_1^{q^l}, \lambda_2^{q^l}, \ldots, \lambda_n^{q^l}$ and a given observation sequence $O^{q^l} = O_1^{q^l}, O_2^{q^l}, \ldots, O_t^{q^l}$, we choose $\lambda^{q^l} = \left(A^{q^l}, B^{q^l}, \pi^{q^l}\right)$ such that $P\left(O \mid \lambda_i^{q^l}\right)$, $i \in [1, n]$, is locally maximized.
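For concreteness, the parameter triple $\lambda^{q^l} = \left(A^{q^l}, B^{q^l}, \pi^{q^l}\right)$ can be represented as row-stochastic arrays. The sketch below shows one way to draw a random initial *λ*, as in the initial phase described above; the names `random_lambda`, `n_states`, and `n_symbols` are illustrative, not from the chapter.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_lambda(n_states, n_symbols):
    """Draw a random lambda = (A, B, pi), as in the 'initial phase'
    where lambda is randomly assumed when no prior knowledge exists."""
    A = rng.random((n_states, n_states))
    A /= A.sum(axis=1, keepdims=True)   # row-stochastic transition matrix
    B = rng.random((n_states, n_symbols))
    B /= B.sum(axis=1, keepdims=True)   # row-stochastic emission matrix
    pi = rng.random(n_states)
    pi /= pi.sum()                      # initial state distribution
    return A, B, pi

A, B, pi = random_lambda(3, 4)
```

Each lower-layer model in the hierarchy would hold its own such triple.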

**Algorithm 2**: The Baum–Welch algorithm.

**Input** Observation sequence $O^{q^l}$. **Output** Re-estimated model parameters.

5: Calculate the forward variable $\alpha\left(q_j^l\right)$:

$$\alpha_{t+1}^l\left(q_j^l\right) = b_j^{q^{l-1}}(O_{t+1}) \sum_i \alpha_t\left(q_i^{l+1}\right) a_{ij}^{q^l}$$

6: Calculate the backward variable $\beta\left(q_i^l\right)$:

$$\beta_t^l\left(q_i^l\right) = \sum_j b_j^{q^{l-1}}(O_{t+1})\, a_{ij}^{q^{l-1}}\, \beta_{t+1}^l\left(q_j^l\right)$$

7: Calculate the downward and upward variable $\varepsilon\left(q_i^l, q_j^l\right)$:

$$\varepsilon\left(q_i^l, q_j^l\right) = \frac{\alpha_t\left(q_i^l\right)\, a_{ij}^{q^{l-1}}\, \beta_{t+1}\left(q_j^l\right)}{P(O \mid \lambda)}$$

8: Estimate the state probabilities $\gamma_h\left(q_i^l\right)$ and $\gamma_f\left(q_i^l\right)$:

$$\gamma_{h_t}\left(q_i^l\right) = \frac{\alpha_t\left(q_i^l\right)\beta_t\left(q_j^l\right)}{P(O \mid \lambda)} \qquad \gamma_{f_t}\left(q_i^l\right) = \frac{\alpha_t\left(q_i^l\right)\beta_t\left(q_i^l\right)}{P(O \mid \lambda)}$$

9: Compute the optimal state sequence $X(i, j)$:

$$X(i,j) = \frac{\alpha_t^l(i)\, a_{ij}^l\, \beta_{t+1}^l(j)\, b_j^l(O_{t+1})}{\sum_i \sum_j \alpha_t^l(i)\, a_{ij}^l\, \beta_{t+1}^l(j)\, b_j^l(O_{t+1})}$$

10: Estimate $a_{ij}'^{\,q^l}$ and $b_{jh}'^{\,q^l}$:

$$a_{ij}'^{\,q^l} = \frac{\sum \varepsilon\left(q_i^{l+1}, q_j^{l+1}\right)}{\sum \gamma_{h_t}\left(q_i^{l+1}\right)} \qquad b_{jh}'^{\,q^l} = \frac{\sum_{t:\,O_t = v_h} \gamma_f\left(q_i^l\right) + \sum_{t:\,O_t = v_h} \gamma_h\left(q_i^l\right)}{\sum_t \gamma_f\left(q_i^l\right) + \sum_t \gamma_h\left(q_i^l\right)}$$

11: Calculate $P(O \mid \lambda)$ with the estimated parameters:

$$\varepsilon = \frac{\left|P\left(O \mid \lambda'^{\,q^l}\right) - P\left(O \mid \lambda^{q^l}\right)\right|}{P\left(O \mid \lambda'^{\,q^l}\right)} \qquad a_{ij}^{q^l} = a_{ij}'^{\,q^l}, \quad b_{jh}^{q^l} = b_{jh}'^{\,q^l}$$

12: **until** $\varepsilon < Th$.

13: **return** $a_{ij}'^{\,q^l}$, $b_{jh}'^{\,q^l}$.
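As a concrete (and much simplified) illustration, the sketch below implements one Baum–Welch pass for a flat, single-level HMM. The hierarchy indices $q^l$ of Algorithm 2 are dropped, so this is not the chapter's PHHMM update itself, only the forward–backward re-estimation it builds on; all names are illustrative.

```python
import numpy as np

def baum_welch_step(A, B, pi, O):
    """One E+M pass of Baum-Welch for a flat HMM: forward/backward
    variables (steps 5-6), posteriors (steps 7-8), re-estimation
    (step 10). O is an integer array of observed symbols."""
    T, N = len(O), A.shape[0]
    # forward variable alpha (step 5)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, O[0]]
    for t in range(1, T):
        alpha[t] = B[:, O[t]] * (alpha[t - 1] @ A)
    # backward variable beta (step 6)
    beta = np.zeros((T, N))
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, O[t + 1]] * beta[t + 1])
    evidence = alpha[-1].sum()                  # P(O | lambda)
    # pairwise state posteriors (step 7) and state posteriors (step 8)
    xi = np.zeros((T - 1, N, N))
    for t in range(T - 1):
        xi[t] = alpha[t][:, None] * A * (B[:, O[t + 1]] * beta[t + 1])[None, :]
    xi /= evidence
    gamma = alpha * beta / evidence
    # re-estimate transition and emission matrices (step 10)
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    B_new = np.zeros_like(B)
    for k in range(B.shape[1]):
        B_new[:, k] = gamma[O == k].sum(axis=0)
    B_new /= gamma.sum(axis=0)[:, None]
    pi_new = gamma[0]
    return A_new, B_new, pi_new, evidence
```

In practice the pass would be repeated until the relative change $\varepsilon$ in $P(O \mid \lambda)$ falls below $Th$, mirroring steps 11–12.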

**Model Decoding (Viterbi algorithm).** The Viterbi decoding algorithm [26] predicts the hidden traffic states. It uses only the state-optimized joint likelihood of the observation data and the underlying Markovian state sequence as its objective function. Unlike the Baum–Welch algorithm, it does not update all likely paths for all states in the HHMM.

**Algorithm 3**: Viterbi algorithm [26].

**Input** Re-estimated model parameters $a_{ij}'^{\,q^l}$, $b_{jh}'^{\,q^l}$ from the Baum–Welch algorithm.

**Output** Re-estimated state transition matrix $a_{ij}'^{\,q^l}$ and alert observation likelihood sequence $b_{jh}'^{\,q^l}$.

1: **Initialize** Observations ($M$), States ($q_i$), Threshold ($Th$), $a_{ij}$, $b_j(v_k)$.

2: Obtain the model ($\lambda$) through the expected probability $P(O \mid \lambda)$.

*An Effective Method for Secure Data Delivery in IoT DOI: http://dx.doi.org/10.5772/intechopen.104663*

3: **repeat**.

4: Split $O^{q^l}$ into $N$ states through Viterbi decoding.

5: $a_{ij}'^{\,q^l} = \dfrac{\text{no. of state transitions from } i \text{ to } j}{\text{total no. of state transitions from } i}$

6: $b_j'^{\,q^l}(v_k) = \dfrac{\text{no. of occurrences of observation } k \text{ in state } j}{\text{total no. of observations in state } j}$, where $b_j^l(v_k) = P\left(O_t^{q^l} = v_k \mid q_t^{q^l} = S_j\right)$.

7: Normalize the row sums of $a_{ij}'^{\,q^l}$ and $b_j'^{\,q^l}(v_k)$ to unity so that all elements $\in [0, 1]$.

8: Estimate $\lambda'^{\,l}$ from $O^{q^l}$, $a_{ij}'^{\,q^l}$, and $b_j'^{\,q^l}(v_k)$.

9: $\lambda_d = \lambda'^{\,q^l}$, $a_{ij}^{q^l} = a_{ij}'^{\,q^l}$, $b_{jv_k}^{q^l} = b_j'^{\,q^l}(v_k)$.

10: **until** $\left|\lambda'^{\,l} - \lambda^l\right| < Th$.

11: **return** $a_{ij}'^{\,l}$, $b_{jv_k}'^{\,l}$.
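Steps 4–5 above can be sketched for a flat (non-hierarchical) HMM as follows: Viterbi decoding in the log domain to recover the state path, then the transition-count re-estimate of step 5. The flat state space and the function names are simplifying assumptions, not the chapter's notation.

```python
import numpy as np

def viterbi(A, B, pi, O):
    """Most likely hidden-state path for a flat HMM (log domain),
    as used for the split in step 4 of Algorithm 3."""
    T, N = len(O), A.shape[0]
    logA, logB = np.log(A), np.log(B)
    delta = np.log(pi) + logB[:, O[0]]
    psi = np.zeros((T, N), dtype=int)       # back-pointers
    for t in range(1, T):
        scores = delta[:, None] + logA      # scores[i, j]: come from i, go to j
        psi[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + logB[:, O[t]]
    # trace back the optimal state sequence
    path = np.zeros(T, dtype=int)
    path[-1] = int(delta.argmax())
    for t in range(T - 2, -1, -1):
        path[t] = psi[t + 1, path[t + 1]]
    return path

def count_transitions(path, N):
    """Step 5: a'_ij = (# transitions i -> j) / (# transitions from i)."""
    C = np.zeros((N, N))
    for i, j in zip(path[:-1], path[1:]):
        C[i, j] += 1
    rows = C.sum(axis=1, keepdims=True)
    return np.divide(C, rows, out=np.zeros_like(C), where=rows > 0)
```

The emission re-estimate of step 6 follows the same counting pattern over (state, observation) pairs along the decoded path.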

**Model Detection algorithm.** This algorithm uses prior knowledge to learn the previous attack behavior and track the attack alerts. It obtains the likelihood of the observation sequence $O_i^{q^d}$ and then predicts the DDoS attack behavior based on the occurrence of the attack observation sequence from the previous algorithm.

**Algorithm 4**: Detection algorithm.

**Input** Alert observation sequence $O_i^{q^l}$, number of iterations $G$, $\lambda_i^l = \left(A^{q^l}, B^{q^l}, \pi^{q^l}\right)$.

**Output** Attack alert observation sequence $O_i'^{\,q^l}$.

1: **Initialize** $\alpha_0^{q^l}(j) = \pi_j^{q^l} b_j^{q^l}(O_0)$.

2: Calculate the probability of $O_1^{q^l}$:

$$\alpha_j^{q^l}(1) = \pi_j^{q^l} b_j^{q^l}(O_1)$$

3: Calculate the probability of the observation sequence ($O^{q^l}$):

$$\alpha_{t+1}^{q^l}(j) = \left[\sum_i \alpha_t^{q^l}(i)\, a_{ij}^{q^l}\right] b_j^{q^l}(O_{t+1})$$

4: Calculate the likelihood of the observation sequence ($O$) under the model:

$$P\left(O \mid \lambda^{q^l}\right) = \sum_j \alpha_t^{q^l}(j)$$

5: If $\log P\left(O \mid \lambda^{q^l}\right) < -Th$: $alert_i = alert_i + 1$.

6: **until** $G$ is reached.

7: **return** $alert$.
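A minimal flat-HMM sketch of this detection loop: the forward recursion gives $\log P(O \mid \lambda)$, and an alert counter is incremented whenever the log-likelihood drops below $-Th$ (step 5). The names `detect`, `models`, and `windows` are illustrative, not from the chapter.

```python
import numpy as np

def forward_loglik(A, B, pi, O):
    """log P(O | lambda) via the forward recursion of steps 1-4,
    for a flat HMM, with per-step scaling to avoid underflow."""
    alpha = pi * B[:, O[0]]
    loglik = 0.0
    for t in range(1, len(O) + 1):
        s = alpha.sum()
        loglik += np.log(s)
        alpha = alpha / s
        if t < len(O):
            alpha = (alpha @ A) * B[:, O[t]]
    return loglik

def detect(models, windows, Th):
    """Steps 5-7: count an alert for each model whose observation
    windows look unlikely, i.e. log P(O | lambda) < -Th."""
    alert = [0] * len(models)
    for i, (A, B, pi) in enumerate(models):
        for O in windows:               # plays the role of the G iterations
            if forward_loglik(A, B, pi, O) < -Th:
                alert[i] += 1
    return alert
```

A window that the trained model assigns very low likelihood is exactly the "attack traffic that appears to be normal" case the detection step is meant to flag.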
