**3.5 Discussion**

The deterministic DCA performs context assessment using a population of artificial DCs. Each element in the dataset is processed sequentially, and all cells in the population receive the same signals and antigens in the current iteration. When a cell's migration threshold is met, the cell stops receiving new signals and antigens and outputs its antigen context values: the accumulated antigen signature for each antigen type $\alpha$ ($\hat{k}_{\alpha}$), taken over all cells that migrated in the current iteration, and the sum of antigens of type $\alpha$ received by the cell in its lifetime ($\hat{s}_{\alpha}$). These outputs are accumulated in an antigen repository. Every cell in the population can determine a spatial correlation between signals and antigen types $\alpha$ through the coefficient $\hat{k}_{\alpha}$, computed as the accumulated difference of two linear functions, namely *smDC* and *mDC*. The antigen type $\alpha$ is determined in the dataset preprocessing phase and can be a distinctive categorical feature that represents similar observations (e.g., attack type, protocol, or source port). Once all signals and antigens in the dataset have been processed, the anomaly metric coefficient $k_{\alpha}$ is obtained as the ratio between the sum of all $\hat{k}_{\alpha}$ for antigen type $\alpha$ and the number of times antigen category $\alpha$ was sensed by any migrated DC.

**Table 7.**

*Proposed model comparison with machine learning models.*

For classification, the deterministic DCA proposes a constant classification threshold based on the collected data [6]. Any antigen category $\alpha$ whose coefficient lies above this threshold is considered an anomaly. Because the threshold calculated with the equation proposed for the deterministic DCA is a constant, misclassifying a single antigen category can incur a large penalty, since every instance in the dataset that presents this category is affected. This issue may worsen when the number of antigen categories is low, or when a large dataset is processed and $k_{\alpha}$ tends to have low variance. When the number of signal instances is large enough, the classification threshold tends to zero, and even though the normal antigen category (or categories) may be linearly separable, the threshold may not be adequate. The problem is aggravated if the mean of the safe signals is greater than or equal to the mean of the danger signals, since $k_{\alpha} \in \mathbb{R}$ can then take negative values. To address this, the proposed model builds a DT classifier after the detection phase. The decision rules derived from this tree are used to classify the antigen repository generated from all migrated DCs. The proposed model aims to avoid depending on a linear classification threshold, as a DT can perform classification in a non-linear manner.
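The anomaly coefficient and the constant-threshold classification discussed above can be illustrated with a minimal sketch. The repository layout (one `(k, antigen counts)` pair per migrated cell), the function names, the sample values, and the threshold of `0.0` are all assumptions made for illustration; they are not taken from the proposed model.

```python
from collections import defaultdict


def anomaly_coefficients(migrated_cells):
    """Compute one coefficient per antigen type.

    migrated_cells: list of (k, {antigen_type: count}) pairs, one per
    migrated DC (hypothetical layout of the antigen repository).
    """
    k_sum = defaultdict(float)   # sum of cell signatures per antigen type
    sensed = defaultdict(int)    # times each antigen type was sensed
    for k, antigens in migrated_cells:
        for antigen, count in antigens.items():
            k_sum[antigen] += k
            sensed[antigen] += count
    return {a: k_sum[a] / sensed[a] for a in sensed}


def classify(coefficients, threshold):
    """Flag every antigen type whose coefficient exceeds a constant threshold."""
    return {a: k > threshold for a, k in coefficients.items()}


# Hypothetical repository: the "probe" antigen shared a cell with normal traffic.
cells = [
    (-20.0, {"normal": 2}),
    (10.0, {"dos": 2}),
    (-4.0, {"normal": 1, "probe": 1}),
]
K = anomaly_coefficients(cells)
labels = classify(K, threshold=0.0)  # assumed constant threshold

# Every record inherits the label of its antigen category, so one
# misjudged category ("probe") mislabels all of its records.
records = ["normal", "probe", "dos", "probe"]
predictions = [labels[r] for r in records]
```

In this toy repository the `probe` category falls below the assumed threshold, so both `probe` records are labelled as normal, which is the kind of category-wide penalty the paragraph describes.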

The reported computation times reflect the computational complexity of the model: the deterministic DCA has a worst-case complexity of $O(n^2)$, and incorporating a DT classifier in the classification phase increases the computational cost. Changing $N$ (the DC population size) neither increases nor reduces the DT construction time, since all antigen signatures are summarized in the antigen repository of size $m$. Conversely, increasing the number of antigen types $m$ increases computation time. The main drawback of the model lies in its dependence on the DC migration threshold, the dataset size, and the antigen categories. The migration threshold must not cause cells to migrate prematurely or late, as under- or over-sampling of signals in a migrated cell tends to cause classification errors or reduce antigen signature separability. This affects the DT classifier, which may be unable to distinguish several signatures of similar magnitude, so that all observations presenting the affected antigen category are incorrectly classified. To decrease this likelihood, dataset selection and signal categorization should produce a relatively low average migration rate. The classification threshold proposed in the deterministic DCA is also highly dependent on the number of observations and on the attack distribution in the training data. Finally, Decision Trees are known to over-fit when trained on a large number of observations or when dealing with high-dimensional problems. Further DT optimization procedures in relation to dataset features may need to be implemented to address these issues.
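The sensitivity to the migration threshold can be illustrated with a single-cell sketch. The update rules `csm = safe + danger` and `k = danger - 2 * safe` are assumed weights in the style of the deterministic DCA and are not given in this section; the stream, the function name, and the threshold values are likewise hypothetical.

```python
from collections import defaultdict


def signature_gap(stream, migration_threshold):
    """Return |K_attack - K_normal| for a single simulated cell.

    stream: iterable of (antigen_type, safe, danger) tuples.
    Signals still held when the stream ends are discarded.
    """
    csm, k, held = 0.0, 0.0, defaultdict(int)
    k_sum, sensed = defaultdict(float), defaultdict(int)
    for antigen, safe, danger in stream:
        held[antigen] += 1
        csm += safe + danger          # assumed costimulation update
        k += danger - 2.0 * safe      # assumed context update
        if csm >= migration_threshold:
            # The cell migrates: log its signature for each antigen it holds.
            for a, count in held.items():
                k_sum[a] += k
                sensed[a] += count
            csm, k, held = 0.0, 0.0, defaultdict(int)
    coeff = {a: k_sum[a] / sensed[a] for a in sensed}
    return abs(coeff["attack"] - coeff["normal"])


# Two normal observations followed by two attack observations.
stream = [("normal", 5.0, 0.0)] * 2 + [("attack", 0.0, 5.0)] * 2

well_timed = signature_gap(stream, migration_threshold=10.0)
too_late = signature_gap(stream, migration_threshold=20.0)
```

With the lower threshold the cell migrates once per traffic type and the two signatures stay apart; with the higher one a single cell mixes both types, the signatures coincide, and no classifier working on the repository can separate them.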

## **4. Conclusions**

Anomaly detection in computer networks is a complex task that requires distinguishing normality from anomaly. Artificial Immune Systems are biologically inspired computational models designed for the development of Intrusion Detection Systems. The Dendritic Cell Algorithm (DCA) is a population-based binary classifier, initially designed for network anomaly detection. The proposed model was inspired by the behavior of Dendritic Cells and immune Danger Theory. This research proposed solutions to two relevant anomaly detection challenges, namely feature selection and generalization capabilities, in order to improve classification performance. The proposed model was based on the DCA and incorporated Decision Trees in the classification phase. Two publicly available datasets, namely UNSW-NB15 and NSL-KDD, were used, and the model was trained using the training set provided with each. The accuracy of the proposed model was compared against other DCA models and against state-of-the-art approaches for network anomaly detection. The proposed approach achieved 97.25% accuracy on the contemporary UNSW-NB15 dataset and provided competitive results when compared to other state-of-the-art machine learning approaches. On the NSL-KDD dataset, it achieved 93.28% accuracy and surpassed machine learning methods such as Artificial Neural Network and Random Forest. The proposed model was also able to surpass other contemporary proposals using the DCA.

The results point to several remaining challenges: the potential for large misclassification when the number of antigen categories is low; the model's dependence on the migration threshold and its relationship with dataset features; the lack of online detection; the dependence on a large number of observations to perform classification; and the lack of multi-class classification. Several proposals address some of these issues, such as a variable functional migration threshold [23] and signal categorization optimizations [22]. These approaches need to be analyzed to further improve the proposed model. Multi-resolution analysis may provide insight into some of the mentioned challenges, such as reducing the dependence on feature selection and enabling multi-class classification. The proposal of a segmented version of the DCA [7] may provide a framework to implement online classification, reduce computational complexity, and further increase the model's learning capabilities. Although other proposals have included machine learning techniques to perform classification in the DCA [34], the proposed method provides a starting point for incorporating a robust feature selection and classification mechanism into the ongoing research and development challenges of the DCA.

*Network Intrusion Detection Using Dendritic Cells and Danger Theory DOI: http://dx.doi.org/10.5772/intechopen.99973*

## **Acknowledgements**

The authors of this paper would like to thank the Mexican National Council of Science and Technology (CONACYT), as well as the Universidad de las Americas Puebla, Mexico, for providing funding for this research.
