**2. Methodology**

Binary classification is the task of classifying the elements of a given set into two groups on the basis of a classification rule [18]. The objective of the proposed model is to achieve anomaly classification based on the provided observations. The first phase performs feature selection and data categorization to provide the proposed algorithm with input data. The DCA then performs context assessment, and finally a classifier is used to produce a concrete assessment. Each observation is classified as normal or anomalous, and performance metrics are generated. The objective of this section is to introduce the mathematical and algorithmic background. The proposed methodology contains four phases, namely dataset preprocessing, algorithm initialization, detection, and classification.

The Danger Theory model [24] was proposed by French immunologist Polly Matzinger and is mainly centered on the interactions of signals emitted by cells and antigens. These signals denote when a cell or a tissue is experiencing regular or abnormal behavior, such as programmed or unexpected cell death (known as apoptosis and necrosis respectively) or stress caused by antigens (pathogen or harmful organism signatures). The signals are categorized into three groups, namely Pathogen Associated Molecular Patterns (PAMP), Safe Signals (SS) and Danger Signals (DS). Biological Dendritic Cells are Human Immune System cells, constantly sensing the environment for such signals. These are collected (ingested) in order to assess whether the present alterations are due to an attacking organism or as a result of a normal process, for which an immune response is not necessary (known as a regulatory or tolerance process).

### **2.1 Feature selection**

The DCA requires input data to be represented as three input signals, namely PAMP, SS, and DS, as well as an antigen representation (such as data IDs or attack type). Each input signal used by the algorithm denotes part of the context for the observations analyzed. As antigens in the immune system are organisms associated with disease, this signal category is related to the presence of attacks. Safe Signals are associated with the normal behavior of a biological cell's life cycle; this signal category is related to normal behavior in the observed network communications. Danger Signals are emitted by cells and tissues that are stressed or damaged; this signal category indicates suspicious behavior in the network.

The preprocessing phase assigns a set of features from the original dataset to each of the signal categories (PAMP, SS, DS). This is commonly done by using expert knowledge or feature reduction methods such as PCA, Fuzzy Set Theory [18], or K-Nearest Neighbors [25]. In order to determine the features with the most influence [21, 26], the proposed approach relies on the information gain method, along with maximizing feature-class mutual information for signal categorization, followed by an average feature aggregation and normalization for each category. The information gain of an attribute *F* and a given dataset *S* is evaluated as shown in Eq. (1),

*Technology, Science and Culture - A Global Vision, Volume III*

$$G(S, F) = H(S) - \sum_{v \in values(F)} \frac{|S_v|}{|S|} H(S_v) \tag{1}$$

where *values*(*F*) represents all possible values of a given feature *F* in the set *S*, *Sv* ⊂ *S* is the subset of observations for which attribute *F* takes the value *v*, *G* is the information gain function, and *H* represents the entropy of the system, as shown in Eq. (2),

$$H(S) = -\sum_{i=1}^{2} p_i \log_2 p_i \tag{2}$$

where *pi* represents the probability of class *i* in the dataset *S*, based on the values of attribute *F*. A high information gain implies that the attribute provides a large amount of information about the class; high-ranking attributes are preserved so that at least one feature is available per signal category. Each selected feature is then assigned to one of the three signal categories, namely PAMP, DS, and SS, by maximizing feature-class mutual information. Given a feature *F* and a class variable *C*, the mutual information *I*(*F*;*C*) is the amount of information that knowing *C* provides about *F*, as shown in Eq. (3),

$$I(F; C) = \sum_{f \in values(F)} \sum_{c \in values(C)} p(f, c) \log\left(\frac{p(f, c)}{p(f)\,p(c)}\right) \tag{3}$$

where *p*(*f*,*c*) represents the joint probability of attribute values *f* and *c*, and *p*(*f*) and *p*(*c*) are their marginal probabilities. In order to categorize the selected features, the feature-class mutual information between each attribute and class is calculated. If a given attribute has higher mutual information with the normal class than with the anomalous class, it is categorized as SS. Conversely, if the attribute has higher mutual information with the anomalous class than with the normal class, it is categorized as PAMP. The remaining features are classified as DS.
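To make the selection and categorization steps concrete, the following is a minimal Python sketch of Eqs. (1)-(3). All function and label names are illustrative, not taken from the original implementation. Since the full mutual information is identical for complementary binary class indicators, `categorize` uses each class's *contribution* to the mutual-information sum, which is one plausible reading of the per-class comparison described above.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label sequence (Eq. 2)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature, labels):
    """Information gain G(S, F) of a discrete feature (Eq. 1)."""
    n = len(labels)
    remainder = 0.0
    for v in set(feature):
        subset = [lab for f, lab in zip(feature, labels) if f == v]
        remainder += (len(subset) / n) * entropy(subset)
    return entropy(labels) - remainder

def mutual_information(xs, ys):
    """Mutual information I(X; Y) of two discrete sequences (Eq. 3)."""
    n = len(xs)
    pxy, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum((c / n) * math.log2((c * n) / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def class_contribution(feature, labels, cls):
    """Contribution of one class to the mutual-information sum (Eq. 3),
    used here as a proxy for per-class feature-class mutual information."""
    n = len(labels)
    pf, pc = Counter(feature), labels.count(cls) / n
    total = 0.0
    for v in pf:
        pvc = sum(f == v and lab == cls for f, lab in zip(feature, labels)) / n
        if pvc > 0:
            total += pvc * math.log2(pvc / ((pf[v] / n) * pc))
    return total

def categorize(feature, labels):
    """SS if the feature is more informative about the normal class,
    PAMP if more informative about the anomalous class, DS otherwise."""
    mi_norm = class_contribution(feature, labels, "normal")
    mi_anom = class_contribution(feature, labels, "anomalous")
    if mi_norm > mi_anom:
        return "SS"
    if mi_anom > mi_norm:
        return "PAMP"
    return "DS"
```

A feature that singles out the rare anomalous observations thus lands in PAMP, while one aligned with the bulk of normal traffic lands in SS.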

The DCA maintains a population of artificial Dendritic Cells that simulates the context assessment capabilities of biological cells in the human body. Each cell in the population has a predefined migration threshold (or lifespan), after which the cell no longer senses signals or antigens; its state is aggregated into the antigen repository used for classification once all observations have been processed. Algorithm initialization provides the detection phase with the required parameters, namely the migration threshold and the DC population size. The preprocessing phase is summarized in **Figure 1**. Dataset features are defined as *Dataset* = {*F*1, *F*2, … , *Ft*}, where *t* is the total number of dataset features. The information-gain-selected features *Ranked* = {*F*1, *F*2, … , *Fr*} ⊆ *Dataset*, where *r* is the total number of

**Figure 1.** *Dataset preprocessing.*

*Network Intrusion Detection Using Dendritic Cells and Danger Theory DOI: http://dx.doi.org/10.5772/intechopen.99973*

ranked features, are then compared against normal and anomalous data in order to generate three subsets of categorized features, namely danger signals {*F*1, *F*2, … , *Fd*} ⊂ *Ranked*, safe signals {*F*1, *F*2, … , *Fs*} ⊂ *Ranked*, and PAMP signals {*F*1, *F*2, … , *Fp*} ⊂ *Ranked*, where *d*, *s*, and *p* are the total number of features in each signal category (DS, SS, and PAMP respectively). Categorized features are averaged and normalized into the closed range [0, 1] in order to generate the processed dataset, in which only four predictors are present, namely DS, SS, PAMP, and the antigen representation.
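The averaging and normalization step can be sketched as follows; the example rows are hypothetical and stand in for the features assigned to one signal category.

```python
def minmax(values):
    """Normalize a list of numbers into the closed range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:                     # constant column: map everything to 0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def aggregate_category(rows):
    """Average the features of one signal category (DS, SS, or PAMP)
    row-wise, then normalize the averages across all observations."""
    means = [sum(row) / len(row) for row in rows]
    return minmax(means)

# Hypothetical: two DS-categorized features over four observations.
ds_signal = aggregate_category([[0.2, 0.4], [0.6, 0.8], [1.0, 1.0], [0.0, 0.0]])
```

Applying the same aggregation to the SS and PAMP subsets yields the three signal columns of the processed dataset.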

### **2.2 Detection phase**

The detection phase aims to generate an antigen repository. This process begins once a population of artificial DCs (or agents) is created. The agent population performs signal (*PAMPi*, *DSi*, *SSi*, *i* = 1, 2, … , *n*, where *n* is the dataset size) and antigen (*α*) collection until a threshold is met. The antigen types collected by each cell are counted and stored as cell state signals *αg*, where *g* represents the antigen category. For each observation fed into the algorithm, the entirety of the DC population samples signals and antigens. The proposed approach incorporates cumulative signals known as the Costimulatory Molecule Signal (CSM), Semi-mature Signal (smDC), and Mature Signal (mDC) [4]. These are defined in Eq. (4),

$$C_{\{CSM, smDC, mDC\}} = (W_P \ast C_P) + (W_S \ast C_S) + (W_D \ast C_D) \tag{4}$$

where *C{CSM,smDC,mDC}* represents the signal concentration for CSM, smDC, and mDC respectively, *W{P,S,D}* are the weights used for PAMP, SS, and DS [5, 27], and *C{P,S,D}* are the signal concentration values for each antigen sampled by the artificial DC. The role of CSM is to limit the time an artificial DC spends on antigen sampling by imitating the cell's lifespan (or signal collection limit). The smDC and mDC signals determine the cell context for the antigens collected in the DC population and are the basis used to generate the *k̂* anomaly context. When a DC has exceeded the DC maturation threshold (set during algorithm initialization), it migrates to a separate DC pool where it no longer samples antigens. A new DC is created in the original DC population pool so that the initial number of DCs is always preserved. The deterministic DCA employs *k̂* ∈ ℝ to reflect the anomaly characteristic (or signature) of a migrated cell, as shown in Eq. (5), where *s* represents the number of signal instances received by each artificial DC, and *CmDC* and *CsmDC* are the intermediary mature and semi-mature signals respectively.

$$\hat{k} = \sum_{i=1}^{s} \left( C_{mDC_i} - C_{smDC_i} \right) \tag{5}$$

After all data instances in the dataset have been processed, each migrated cell's anomaly context and observed antigen counts are summarized using *Kα*, defined as the sum of all *k̂* values presented by each DC for antigen category *α*, in proportion to the number of such antigens presented across all migrated DCs, as defined in Eq. (6), where *m* represents the index of a DC in the migrated population.

$$K_{\alpha} = \frac{\sum_{m} \hat{k}_{m}}{\sum_{m} \alpha_{m}} \tag{6}$$
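The detection phase described by Eqs. (4)-(6) can be sketched as follows. The weight matrix is illustrative only (published DCA variants derive their weights empirically, per [5, 27]), and all class and function names are assumptions of this sketch rather than the original implementation.

```python
from collections import Counter

# Illustrative (W_P, W_S, W_D) weights per cumulative signal, Eq. (4).
WEIGHTS = {
    "csm":  (2.0, 1.0, 2.0),
    "smdc": (0.0, 1.0, 0.0),
    "mdc":  (1.0, -1.0, 1.0),
}

class DendriticCell:
    def __init__(self, migration_threshold):
        self.migration_threshold = migration_threshold
        self.signals = {"csm": 0.0, "smdc": 0.0, "mdc": 0.0}
        self.antigens = Counter()          # antigen category -> times sampled

    def sample(self, pamp, ss, ds, antigen):
        """Accumulate the three cumulative signals (Eq. 4) and the antigen."""
        for name, (wp, ws, wd) in WEIGHTS.items():
            self.signals[name] += wp * pamp + ws * ss + wd * ds
        self.antigens[antigen] += 1

    @property
    def migrated(self):
        return self.signals["csm"] > self.migration_threshold

    @property
    def k_hat(self):
        """Anomaly signature of the cell (Eq. 5)."""
        return self.signals["mdc"] - self.signals["smdc"]

def detect(observations, population_size=10, migration_threshold=5.0):
    """Run the detection phase; return K_alpha per antigen category (Eq. 6)."""
    pool = [DendriticCell(migration_threshold) for _ in range(population_size)]
    migrated = []
    for pamp, ss, ds, antigen in observations:
        for dc in pool:                    # every DC samples every observation
            dc.sample(pamp, ss, ds, antigen)
        for i, dc in enumerate(pool):
            if dc.migrated:                # replace migrated DCs: pool size constant
                migrated.append(dc)
                pool[i] = DendriticCell(migration_threshold)
    categories = {a for dc in migrated for a in dc.antigens}
    return {a: sum(dc.k_hat for dc in migrated if a in dc.antigens)
               / sum(dc.antigens[a] for dc in migrated)
            for a in categories}
```

With these weights, PAMP- and DS-dominated traffic drives *K̂α* positive (mature context), while SS-dominated traffic drives it negative (semi-mature context).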

### **2.3 Classification**

The classification phase generates a distinction criterion for all *Kα* anomaly signatures obtained in the antigen repository. DCA classification has traditionally been based on a constant classification threshold [5, 6, 8], commonly set as a user-defined parameter or derived from observations obtained in the detection phase. This approach is known to have issues [28], as the assigned threshold may not properly separate normal from anomalous *Kα* values. The proposed model removes such an anomaly threshold in favor of a Decision Tree classifier.

A Decision Tree (DT) is a supervised learning model commonly used for classification and regression tasks. The main objective of a DT is to build a model based on (simple) decision rules derived from the data predictors. Decision Trees are generally easy to understand, as they can be visualized. Favorable characteristics include low computational complexity for prediction, not requiring large amounts of observations to generate a model, and transparency (the generated rules can be visualized and understood). Decision Trees are, however, also known to overfit. To mitigate this, several constraints and optimization features have been developed, such as pruning, a minimum number of samples per leaf node, and a maximum tree depth [29].
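As a concrete illustration of these constraints, the following is a minimal pure-Python sketch of greedy, entropy-based tree construction with `max_depth` and `min_samples_leaf` limits. It is a didactic sketch, not the implementation used by the proposed model; in practice a library implementation (e.g. scikit-learn's `DecisionTreeClassifier`, which exposes the same constraints as parameters) would be used.

```python
import math
from collections import Counter

def _entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def build_tree(X, y, depth=0, max_depth=3, min_samples_leaf=1):
    """Greedy entropy-based splitting with two common anti-overfitting
    constraints: a maximum depth and a minimum sample count per leaf."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return {"leaf": majority}
    best = None
    for j in range(len(X[0])):                    # every feature
        for t in sorted({row[j] for row in X}):   # every candidate threshold
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            if len(left) < min_samples_leaf or len(right) < min_samples_leaf:
                continue                          # split would violate the leaf constraint
            score = (len(left) * _entropy([y[i] for i in left])
                     + len(right) * _entropy([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t, left, right)
    if best is None:                              # no admissible split: stop growing
        return {"leaf": majority}
    _, j, t, left, right = best
    grow = lambda idx: build_tree([X[i] for i in idx], [y[i] for i in idx],
                                  depth + 1, max_depth, min_samples_leaf)
    return {"feature": j, "threshold": t, "left": grow(left), "right": grow(right)}

def predict(node, x):
    """Follow the threshold tests down to a leaf and return its class."""
    while "leaf" not in node:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["leaf"]
```

Each internal node is exactly the kind of simple numeric test described below, and lowering `max_depth` or raising `min_samples_leaf` trades training accuracy for generalization.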

A Decision Tree is built in a sequential manner, in which a set of simple tests is combined logically, for example comparing a numeric value against a threshold or a specific range, or comparing a categorical value against a set of possible categorical values. As an observation traverses the set of rules generated by a DT, it is assigned to the most frequent class present in the corresponding "region". A Decision Tree can be expressed as a graph, as shown in Eq. (7),

$$\mathbf{G} = (V, E) \tag{7}$$

where *E* ⊆ *V*<sup>2</sup>, *V* is a set of nodes, and *E* is a set of edges. The set of nodes *V* can be further described as the union of three sets, namely *D*, *C*, and *T*, where *D* are decision nodes, *C* are chance nodes, and *T* are terminal nodes; this is expressed in Eq. (8). Decision nodes execute decision making, in which an action is selected. A chance node randomly selects one of its associated edges. Terminal nodes end sequences of decision and chance nodes. Each edge associates a parent node with a child node. Decision Trees have further functions and conditions [30].

$$V = D \cup C \cup T \tag{8}$$

### **2.4 Proposed model**

The proposed model is summarized in **Figure 2**. As in the deterministic DCA approach, feature ranking is obtained using Information Gain. Selected features are sorted into one of the three signal categories, namely SS, DS, and PAMP. The feature set selected for each category is aggregated and normalized. Segment size, migration threshold, and DC population size are set as the algorithm initializes. Data from the processed dataset is fed to the algorithm sequentially as a set of *DSi*, *SSi*, *PAMPi*, *αgi*, *i* = 1, 2, … , *n*, where *n* is the dataset size and *g* is the antigen category for observation *i*. Each cell *DC*1, … , *DCp* in the DC population, where *p* is the number of DCs in the population, receives the same set of signals and antigen. An update is then performed on *CSMp*, *smDCp*, *mDCp*, *kαp*, and *αgp*. After signal collection in the current iteration, the *CSM* status signal of every *DCp* is compared against the migration threshold. If this threshold is surpassed, the *DCp* migrates and no longer performs signal and antigen collection. Its accumulated status signals *kαm* and *αgm*, where *m* is the migrated population size, are added to the antigen repository.


**Figure 2.** *DCA with Decision Trees.*

Finally, all DCs migrated in the current iteration are reset. Classification is performed with a Decision Tree (DT) after all data elements have been processed. Stage (1) denotes Decision Tree model building. After the model has been built, testing can be performed by providing the testing dataset and running the algorithm again. Stage (2) achieves classification by using the previously trained DT model after all data elements have been processed. Classification metrics are finally obtained to analyze the model's performance.
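The final metrics step amounts to a standard confusion-matrix computation over the true and predicted labels; a minimal sketch follows (the label names and function name are illustrative).

```python
def classification_metrics(y_true, y_pred, positive="anomalous"):
    """Binary classification metrics from true and predicted label sequences."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # true positives
    tn = sum(t != positive and p != positive for t, p in pairs)  # true negatives
    fp = sum(t != positive and p == positive for t, p in pairs)  # false positives
    fn = sum(t == positive and p != positive for t, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall)
              if precision + recall else 0.0,
    }
```

Treating the anomalous class as the positive class, precision measures how many raised alerts were genuine attacks, while recall measures how many attacks were caught.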
