**3. Experimental work**

The proposed model was tested using the NSL-KDD and the University of New South Wales (UNSW-NB15) datasets. The dataset preprocessing and algorithms were developed in the MATLAB R2020 environment and executed on a computer running the Linux operating system with an Intel Core i7-8700 CPU and 16.0 GB of RAM. A confusion matrix is used to describe performance. For a binary classifier, the confusion matrix consists of a positive and a negative class: the positive class refers to any anomaly (attack) present in the dataset, and the negative class refers to normal behavior. To generate a confusion matrix, the classified records are compared against the actual classes of the dataset (i.e., the ground truth). Anomalous records that are correctly classified are called True Positives (TP), while anomalous records wrongly classified as normal are False Negatives (FN). In the case of normal behavior, correctly classified records are known as True Negatives (TN), and normal records wrongly classified as anomalous are known as False Positives (FP).

These counts are then used to compute statistical measures for further analysis and comparison, namely precision, sensitivity, specificity, and accuracy. Precision reflects the proportion of records classified as anomalous that truly are anomalous, and is given by Eq. (9). Sensitivity (also known as the TP rate) is the proportion of anomalies that are correctly classified, given by Eq. (10). In contrast, specificity (or TN rate) is the proportion of normal behavior that is correctly classified, given by Eq. (11). Finally, accuracy reflects the proportion of correct results, whether anomalous or normal, and is given by Eq. (12).
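To make the TP/FN/TN/FP mapping concrete, the four counts can be tallied by pairing each predicted label with its ground-truth label. The sketch below is in Python rather than the MATLAB environment used for the experiments, with 1 denoting the positive (anomaly) class and 0 the negative (normal) class:

```python
# Tally confusion-matrix counts by comparing predicted labels against
# ground-truth labels (1 = anomaly/positive, 0 = normal/negative).
def confusion_counts(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp, fn, tn, fp
```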

$$\text{Precision} = \frac{TP}{TP + FP} \tag{9}$$

$$\text{Sensitivity} = \frac{TP}{TP + FN} \tag{10}$$

$$\text{Specificity} = \frac{TN}{TN + FP} \tag{11}$$

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{12}$$
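Eqs. (9)–(12) can be computed directly from the four confusion-matrix counts. The following is a minimal sketch in Python rather than the MATLAB environment used for the experiments; the example counts are illustrative and not taken from the reported results.

```python
# Eqs. (9)-(12): performance metrics from confusion-matrix counts.
def precision(tp, fp):
    return tp / (tp + fp)                      # Eq. (9)

def sensitivity(tp, fn):
    return tp / (tp + fn)                      # Eq. (10), TP rate

def specificity(tn, fp):
    return tn / (tn + fp)                      # Eq. (11), TN rate

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)     # Eq. (12)

# Illustrative counts (not from the reported experiments):
tp, fn, tn, fp = 90, 10, 80, 20
print(precision(tp, fp), sensitivity(tp, fn),
      specificity(tn, fp), accuracy(tp, tn, fp, fn))
```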

### **3.1 Dataset description**

The UNSW-NB15 is a publicly available dataset [31]. It contains nine different attack types, namely Fuzzers, Analysis, Backdoors, DoS, Exploits, Generic, Reconnaissance, Shellcode, and Worms, as well as normal traffic. The dataset is divided into train and test sets. The training set contains 175,341 records (119,341 anomalous and 56,000 normal). The testing set, in turn, contains 82,332 records (45,332 anomalous and 37,000 normal). Two tools (Argus and Bro-IDS), along with 12 developed algorithms, were used to generate 49 different features, which are categorized into flow features, content features, time features, basic features, and additionally generated features. Evaluations of statistical properties, feature correlation, and complexity showed the train and test sets to have similar distributions [13].

The KDD-99 dataset was developed for the Third International Knowledge Discovery and Data Mining Tools Competition and is publicly available. It was generated to support NIDS development by simulating several intrusions in a military network environment. This dataset contains four attack types, namely Denial of Service (DoS), Probe, User to Root (U2R), and Remote to Local (R2L), as well as normal traffic. The dataset is divided into two subsets, namely train and test. The train set contains 494,021 records (97,278 normal and 396,743 anomalous). The test set consists of 311,029 records (60,593 normal and 250,436 anomalous). In total, 41 features were generated for each connection. This dataset has been widely used in IDS research. However, it has been the subject of wide criticism due to the probability distribution of the records in the testing set, as well as inconsistencies between the values of the training and testing sets. This has led to an imbalance between normal and anomalous observations, as well as numerous duplicate data instances [31, 32].

The NSL-KDD [32] is a publicly available dataset developed by the Canadian Institute for Cybersecurity. It was created to solve two main problems of the KDD-99 dataset, namely the distribution of the attacks between the train and test sets, and the over-inclusion of Denial of Service (DoS) attack types (*neptune* and *smurf*) in the test set. This dataset also provides the following improvements: the removal of redundant or duplicate records from the train and test sets, and the balancing of records in the train and test sets, which avoids dataset sub-sampling and reduces computational time in model testing. Given that this dataset is an improved version of the KDD-99, it has the same features and attack types. The complete training set contains 125,973 records (58,630 anomalous and 67,343 normal). A reduced version of the train set (KDDTrain+ 20%) contains a 20% subset of the training records. The full testing set contains 22,544 records (12,833 anomalous and 9,711 normal). Additionally, there exists a reduced testing set that excludes the records correctly classified by all 21 classifiers used to validate the KDD-99 ground-truth labels during dataset creation [32]. The attack types for the presented datasets are detailed in **Table 1**.
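The split sizes quoted above can be cross-checked, and the class balance of each split made explicit, with a short script. This sketch is only a consistency check on the counts stated in the text, not part of the proposed model, and is written in Python rather than the MATLAB environment used for the experiments.

```python
# (anomalous, normal) record counts as quoted in the text for each split.
splits = {
    "UNSW-NB15 train": (119341, 56000),
    "UNSW-NB15 test":  (45332, 37000),
    "KDD-99 train":    (396743, 97278),
    "KDD-99 test":     (250436, 60593),
    "NSL-KDD train":   (58630, 67343),
    "NSL-KDD test":    (12833, 9711),
}

for name, (anom, norm) in splits.items():
    total = anom + norm
    print(f"{name}: {total} records, {100 * anom / total:.1f}% anomalous")
```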

### **3.2 Dataset preprocessing**

As part of the proposed model phases, dataset preprocessing was performed by ranking the most relevant features to be used for each signal category (PAMP, Safe, and Danger) required by the DCA. Feature ranking, selection, and categorization were based on information gain and feature-class mutual information maximization [21]. As a result, 10 and 17 features were selected for the NSL-KDD and UNSW-NB15 datasets, respectively, as shown in **Tables 2** and **3**. To fit the binary classification constraints, anomalous records of any category are labeled as one, whereas normal records are labeled as zero. The selected features were combined by first normalizing them to the range from zero to one; each signal category is then computed as the average of its corresponding features, similar to the approach in [21]. Antigen representation was achieved by using several categorical features of the dataset to generate antigen categories. Attack categories can be compared to biological antigens invading the body, as they tend to have similar patterns and can also attack recurrently [33].

### **Table 1.**

*Attack types and descriptions for NSL-KDD and UNSW-NB15 datasets.*

### **Table 2.**

*Feature descriptions for signal categorization, NSL-KDD dataset.*

### **Table 3.**

*Feature descriptions for signal categorization, UNSW-NB15 dataset.*
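The normalize-then-average signal construction described above can be sketched as follows. This is a minimal illustration in Python rather than the MATLAB environment used for the experiments; the helper name `signal_values` and the toy feature names are illustrative, and the actual per-category feature assignments are those listed in **Tables 2** and **3**.

```python
# Min-max normalize each selected feature to [0, 1], then compute a
# DCA input signal (PAMP, Safe, or Danger) as the mean of the features
# assigned to that signal category.
def minmax(col):
    lo, hi = min(col), max(col)
    # Guard against constant columns to avoid division by zero.
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in col]

def signal_values(rows, category_cols):
    # rows: list of dicts mapping feature name -> raw value.
    # category_cols: feature names assigned to one signal category.
    norm = {c: minmax([r[c] for r in rows]) for c in category_cols}
    return [sum(norm[c][i] for c in category_cols) / len(category_cols)
            for i in range(len(rows))]
```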
