**3.1 Classification using association rules**

*Security and Privacy From a Legal, Ethical, and Technical Perspective*

**3. Machine learning in misuse or signature detection**

**Figure 2** depicts the working mechanism of misuse or signature detection, which consists of five major steps: (i) data collection, (ii) data preprocessing, (iii) misuse or signature identification using a matching algorithm, (iv) rules regeneration and (v) denial of service (DoS) or other security response strategy. In most cases, the data sources are network and host audit logs, packets transmitted over the network, and the Windows registry. Data preprocessing is a critical step that prepares the raw data for learning patterns. This step involves reducing noise by eliminating outliers, normalizing or standardizing the data, and finally selecting and extracting features. Once preprocessing is complete, an automatic intelligent learning system is deployed to build a learning model and extract rules using prior knowledge of the execution of malicious programs, network traffic data, and vulnerabilities in the network infrastructure. The model is then ready for signature and misuse detection: the learned classification model is applied to the incoming network traffic. If any part of the traffic is found to be similar to the attack patterns learned by the model, an alarm is raised and the traffic is analyzed further to determine whether it is a real attack or a false alarm. Consequently, misuse or signature detection can be simply understood as an "if-then" sequence, as shown in **Figure 1**.

*Sequence of execution of misuse or signature detection modules.*

The efficacy of a misuse or signature detection system largely depends on the completeness and sufficiency of the knowledge of attack patterns and signatures captured in the system's attack signature database. Capturing and representing the knowledge of attacks and system vulnerabilities in a cyberinfrastructure or in a network of computing machines is a nontrivial task, and the job depends heavily on domain experts. Since the knowledge and skills of domain experts may vary significantly from person to person, the design of signature detection systems can often be incomplete and inaccurate. Moreover, a slight variation, evolution, blending, or combination of already known attacks can make signature detection an impossible task. This is a typical problem with any similarity-based learning system like a signature-based intrusion detection system.
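The five steps of the detection mechanism described in this section can be illustrated with a minimal Python sketch. It is not the chapter's system: the class `SignatureDetector`, its method names, and the example signature are all hypothetical, and a plain substring test stands in for the learned matching model. The sketch also shows why a slight variation of a known attack can slip past an exact signature.

```python
from dataclasses import dataclass, field


@dataclass
class SignatureDetector:
    """Toy "if-then" misuse detector (hypothetical, for illustration only)."""

    # Assumed signature database: attack name -> normalized substring pattern.
    signatures: dict = field(default_factory=dict)

    def preprocess(self, raw: str) -> str:
        # Step (ii): normalize case and whitespace to reduce noise.
        return " ".join(raw.lower().split())

    def match(self, record: str):
        # Step (iii): IF a known pattern occurs in the record THEN report it.
        for name, pattern in self.signatures.items():
            if pattern in record:
                return name
        return None

    def add_rule(self, name: str, pattern: str) -> None:
        # Step (iv): rule regeneration, here simply adding a new signature.
        self.signatures[name] = self.preprocess(pattern)

    def process(self, raw_records):
        alerts = []
        for raw in raw_records:              # Step (i): collected data
            record = self.preprocess(raw)
            hit = self.match(record)
            if hit is not None:
                alerts.append((hit, raw))    # Step (v): raise an alarm
        return alerts


detector = SignatureDetector()
detector.add_rule("sql_injection", "' OR 1=1")

traffic = [
    "GET /login?user=admin'   OR 1=1 --",  # known pattern
    "GET /index.html",                      # benign request
    "GET /login?user=admin' OR 2=2 --",    # slight variation of the attack
]
for name, raw in detector.process(traffic):
    print(f"ALERT {name}: {raw}")
```

Only the first request raises an alarm. The third one is the same attack with a trivially changed constant, yet it goes undetected until a new rule is added, which is the limitation of similarity-based matching discussed above.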


**Figure 2.**

Agrawal et al. proposed an elegant approach for discovering underlying association rules that identify and then establish causal relationships among the attributes of a multidimensional database [1]. Association rule mining identifies frequently occurring patterns in a dataset. This may help, for example, in designing algorithms for computer antivirus software, which attempts to identify viruses that exhibit frequently occurring patterns in a transaction dataset. The mining of association rules and frequently occurring episodes from computer audit data, and the use of those rules in feature selection, has also been described in the literature [2]. Fuzzy association rules were designed for misuse and signature detection on the 1998 DARPA intrusion detection dataset [3]. For feature selection, 41 features were extracted for each connection record, and the records covered 24 different attack types. The attack traffic in the network was essentially of four types: (i) denial of service (DoS), (ii) remote to user (R2L), (iii) user to root (U2R) and (iv) probes. Including the normal traffic in the network, the association rule mining algorithms extracted the essential features of five types of network data: four categories of attack traffic and one type of normal traffic.
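To make the idea of class association rules concrete, the following Python sketch mines rules of the form "attribute values → traffic class" using the standard support and confidence measures. It is a brute-force illustration under stated assumptions, not the algorithm of [1] or the fuzzy method of [3]: the function names, the thresholds, and the five toy connection records are all hypothetical and far smaller than the 41-feature records discussed above.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def mine_rules(transactions, labels, min_support=0.3,
               min_confidence=0.8, max_len=2):
    """Enumerate rules `antecedent -> label` that meet both thresholds."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t} - set(labels))
    rules = []
    for k in range(1, max_len + 1):
        for antecedent in combinations(items, k):
            ant_sup = support(antecedent, transactions)
            if ant_sup == 0:
                continue  # antecedent never occurs; confidence is undefined
            for label in labels:
                rule_sup = support(set(antecedent) | {label}, transactions)
                confidence = rule_sup / ant_sup
                if rule_sup >= min_support and confidence >= min_confidence:
                    rules.append((antecedent, label, rule_sup, confidence))
    return rules


# Hypothetical connection records: attribute=value items plus a class item.
records = [
    {"proto=tcp", "flag=S0", "service=http", "class=dos"},
    {"proto=tcp", "flag=S0", "service=http", "class=dos"},
    {"proto=tcp", "flag=SF", "service=http", "class=normal"},
    {"proto=udp", "flag=SF", "service=dns", "class=normal"},
    {"proto=tcp", "flag=S0", "service=smtp", "class=dos"},
]

for antecedent, label, sup, conf in mine_rules(
        records, ["class=dos", "class=normal"]):
    print(f"{set(antecedent)} -> {label} "
          f"(support={sup:.2f}, confidence={conf:.2f})")
```

In this toy data the attribute `flag=S0` always co-occurs with the DoS class, so it yields a high-confidence rule, while `proto=tcp` appears in both attack and normal traffic and is rejected. Attributes that survive such filtering are natural candidates for the feature selection step mentioned above. Enumerating every combination is exponential in the number of items, so practical miners prune the search space by discarding itemsets whose subsets are already infrequent.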
