**3.3 Anomaly detection in log data using graph databases and machine learning to defend advanced persistent threats**

Schindler et al. proposed an APT detection engine built on the phases of the APT kill chain [16]. The work uses SIEM logs as its data source and correlates the recorded events with the phases of the kill chain. An adapted kill chain model is constructed to identify possible attack vectors in the SIEM event logs, and this model is implemented at two levels.

*DMAPT: Study of Data Mining and Machine Learning Techniques in Advanced Persistent Threat… DOI: http://dx.doi.org/10.5772/intechopen.99291*

Level-1 performs graph-based forensic analysis: logs from different programs are aggregated by timestamp to reconstruct the events within the network. A directed graph is then built from the multiple layers of event sequences, and each event sequence is checked to determine whether its flow matches some (partial) or all (full) of the phases of the APT kill chain.
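The Level-1 idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation from [16]: the event schema (`ts`, `host`, `type`, optional `dst`), the event-to-phase mapping and the phase list (loosely following the Lockheed Martin Cyber Kill Chain rather than the adapted model of the original work) are all hypothetical, and an in-memory adjacency list stands in for a graph database. Logs are merged by timestamp, each event is linked to the next event on the same host (and on the destination host for network events), and every root-to-leaf path is scored by the longest run of kill chain phases it contains in order.

```python
from collections import defaultdict

# Hypothetical, ordered kill chain phases (loosely after the Lockheed Martin
# Cyber Kill Chain); NOT the adapted model from the surveyed work.
PHASES = ["reconnaissance", "delivery", "exploitation",
          "installation", "command_and_control", "actions_on_objectives"]

# Hypothetical mapping from log event types to kill chain phases.
EVENT_PHASE = {
    "port_scan": "reconnaissance",
    "phishing_mail": "delivery",
    "macro_exec": "exploitation",
    "service_created": "installation",
    "beacon": "command_and_control",
    "bulk_file_read": "actions_on_objectives",
}


def build_event_graph(*logs):
    """Merge per-program logs by timestamp and link each event to the next
    event on the same host and, if it has a 'dst' field, on that host too."""
    events = sorted((e for log in logs for e in log), key=lambda e: e["ts"])
    graph = defaultdict(list)
    for i, ev in enumerate(events):
        for host in {ev["host"], ev.get("dst", ev["host"])}:
            # The first later event on this host becomes a successor node.
            nxt = next((j for j in range(i + 1, len(events))
                        if events[j]["host"] == host), None)
            if nxt is not None:
                graph[i].append(nxt)
    return events, graph


def longest_phase_chain(events, graph):
    """Return the longest in-order run of kill chain phases along any path
    of the event graph (a partial match if shorter than PHASES)."""
    children = {j for succ in graph.values() for j in succ}
    roots = [i for i in range(len(events)) if i not in children]
    best = []

    def ordered_phases(path):
        # Longest strictly increasing subsequence of phase indices (O(n^2) DP).
        idx = [PHASES.index(EVENT_PHASE[events[k]["type"]])
               for k in path if events[k]["type"] in EVENT_PHASE]
        seqs = []
        for a, p in enumerate(idx):
            prev = [seqs[b] for b in range(a) if idx[b] < p]
            seqs.append(max(prev, key=len, default=[]) + [p])
        return max(seqs, key=len, default=[])

    def dfs(node, path):
        nonlocal best
        path = path + [node]
        if not graph.get(node):  # leaf: score the complete path
            chain = ordered_phases(path)
            if len(chain) > len(best):
                best = chain
        for nxt in graph.get(node, []):
            dfs(nxt, path)

    for root in roots:
        dfs(root, [])
    return [PHASES[p] for p in best]


# Hypothetical logs from three programs on two hosts.
firewall = [{"ts": 1, "host": "ws1", "type": "port_scan"},
            {"ts": 5, "host": "ws1", "type": "beacon", "dst": "srv1"}]
windows = [{"ts": 2, "host": "ws1", "type": "phishing_mail"},
           {"ts": 3, "host": "ws1", "type": "macro_exec"},
           {"ts": 4, "host": "ws1", "type": "service_created"}]
file_audit = [{"ts": 6, "host": "srv1", "type": "bulk_file_read"}]

chain = longest_phase_chain(*build_event_graph(firewall, windows, file_audit))
print(chain, "full match" if chain == PHASES else "partial match")
```

Enumerating every path is exponential in the worst case and only suits toy inputs; a graph database query or a dynamic-programming pass over the DAG would scale better.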

Level-2 identifies anomalous activities using machine learning. An ML classifier is added alongside the graph model to make the detection of APT events more robust. The authors chose a one-class SVM as the classifier and trained it on Windows logs, firewall logs and file audit logs produced by benign system programs. The model is therefore expected to flag any event that deviates from the behaviour of these benign programs.
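The Level-2 classifier can be illustrated with a one-class SVM trained only on benign data. The sketch below is a minimal, dependency-free version of the Schölkopf et al. dual problem with an RBF kernel, solved by a simple maximal-violating-pair (SMO-style) loop. The two-dimensional feature vectors, assumed to stand for per-window statistics extracted from Windows, firewall and file audit logs, and the `nu`/`gamma` values are hypothetical; nothing here reflects the features or settings used in [16].

```python
import math


def rbf(x, y, gamma):
    """Gaussian (RBF) kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


class OneClassSVM:
    """Minimal one-class SVM: minimise 1/2 * a^T K a subject to
    sum(a) == 1 and 0 <= a_i <= 1/(nu * n). Illustrative only."""

    def __init__(self, nu=0.5, gamma=0.5, tol=1e-6, max_iter=10_000):
        self.nu, self.gamma, self.tol, self.max_iter = nu, gamma, tol, max_iter

    def fit(self, X):
        n = len(X)
        C = 1.0 / (self.nu * n)                 # upper bound on each alpha
        K = [[rbf(a, b, self.gamma) for b in X] for a in X]
        alpha = [1.0 / n] * n                   # feasible start: sum == 1
        G = [sum(K[k][l] * alpha[l] for l in range(n)) for k in range(n)]
        for _ in range(self.max_iter):
            # Move weight from the point with the largest gradient (alpha > 0)
            # to the point with the smallest gradient (alpha < C).
            i = min((k for k in range(n) if alpha[k] < C - 1e-12),
                    key=G.__getitem__, default=None)
            j = max((k for k in range(n) if alpha[k] > 1e-12),
                    key=G.__getitem__, default=None)
            if i is None or j is None or G[j] - G[i] < self.tol:
                break
            eta = max(K[i][i] + K[j][j] - 2 * K[i][j], 1e-12)
            delta = min((G[j] - G[i]) / eta, C - alpha[i], alpha[j])
            alpha[i] += delta
            alpha[j] -= delta
            for k in range(n):                  # keep G == K @ alpha
                G[k] += delta * (K[k][i] - K[k][j])
        # rho: mean gradient over free support vectors (0 < alpha < C),
        # falling back to all support vectors if none are free.
        free = [k for k in range(n) if 1e-12 < alpha[k] < C - 1e-12]
        sv = free or [k for k in range(n) if alpha[k] > 1e-12]
        self.rho = sum(G[k] for k in sv) / len(sv)
        self.X, self.alpha = X, alpha
        return self

    def decision_function(self, x):
        """f(x) = sum_i alpha_i k(x_i, x) - rho; f >= 0 means benign-like."""
        return sum(a * rbf(xi, x, self.gamma)
                   for a, xi in zip(self.alpha, self.X)) - self.rho

    def predict(self, x):
        """+1 for inliers (benign-like), -1 for outliers (candidate APT events)."""
        return 1 if self.decision_function(x) >= 0 else -1


benign = [(1, 1), (1, 2), (2, 1), (2, 2), (1.5, 1.5)]  # hypothetical features
model = OneClassSVM(nu=0.5, gamma=0.5).fit(benign)
print(model.predict((1.5, 1.5)), model.predict((10, 10)))  # → 1 -1
```

In practice an existing implementation such as scikit-learn's `sklearn.svm.OneClassSVM`, which takes the same `nu` and `gamma` parameters, would be the more realistic choice; the hand-written solver only makes the mechanics visible. `nu` is an upper bound on the fraction of training points treated as outliers, so it directly trades detection sensitivity against false alarms.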

The proposed model achieved an accuracy of 95.33% in detecting APT events. However, against smart malware, whose malicious programs mimic normal user behaviour, the model tends to produce a relatively high rate of false positives.
