**2.5 ATOMIC: FireEye's framework for large scale clustering and associating APT threat actors**

Security firms like FireEye investigate many victim networks, collect IOCs, and group them together as uncategorised ("UNC") intrusion sets. Over time, the number of these UNC sets grows rapidly, and security firms need to either merge them into existing APT groups or assign a new group name based on manual analysis. FireEye security

**Figure 6.** *APT malware ontology model [12].*

**Figure 7.** *High-level overview of APTMalInsight framework [12].*

researchers proposed an automated framework that uses ML models to perform the investigation and analysis, and to provide the rationale, for the whole APT attribution process [13]. In this framework, the researchers suggest a document clustering approach based on the term frequency-inverse document frequency (TF-IDF) method. The TF-IDF algorithm assigns more importance to a term if it appears often in a document. Conversely, if the term is common across all the documents, the algorithm decreases its importance. This method favours unique terms like custom malware families, which may appear in just a few classes, and downplays popular terms like 'phishing', which appear more often. After the TF-IDF scores are calculated, each UNC group is converted into a vector representation, and the researchers calculate the cosine similarity between these groups, as shown in **Figure 8**. As the angle between two vectors decreases, they tend to become parallel and their cosine similarity approaches 1. The size of the angle therefore helps the researchers to determine the extent of similarity

**Figure 8.** *Cosine similarity between different un-attributed APT groups [13].*

*DMAPT: Study of Data Mining and Machine Learning Techniques in Advanced Persistent Threat… DOI: http://dx.doi.org/10.5772/intechopen.99291*

between two different APT groups. Based on this idea, FireEye automated the whole process of attributing APT activity and merging different uncategorised groups.
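The TF-IDF weighting and cosine-similarity comparison described above can be illustrated with a minimal Python sketch. It is not FireEye's implementation, which the source does not detail. The group names, the terms, and the unsmoothed weighting `tf * log(N / df)` are all assumptions chosen for illustration.

```python
import math
from collections import Counter

# Hypothetical toy data: each uncategorised (UNC) group is a bag of observed
# terms such as malware families or techniques. All names are invented.
unc_groups = {
    "UNC_A": ["phishing", "malware_x", "malware_x", "dropper_y"],
    "UNC_B": ["phishing", "malware_x", "dropper_y", "dropper_y"],
    "UNC_C": ["phishing", "webshell_z", "webshell_z"],
}


def tfidf_vectors(docs):
    """Return {group: {term: weight}} with weight = tf * log(N / df).

    This unsmoothed variant is an assumption; libraries often smooth the idf.
    """
    n_docs = len(docs)
    # Document frequency: in how many groups does each term occur?
    df = Counter()
    for terms in docs.values():
        df.update(set(terms))
    vectors = {}
    for name, terms in docs.items():
        counts = Counter(terms)
        total = len(terms)
        vectors[name] = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a vector with no weighted terms is similar to nothing
    return dot / (norm_u * norm_v)


vectors = tfidf_vectors(unc_groups)
sim_ab = cosine_similarity(vectors["UNC_A"], vectors["UNC_B"])
sim_ac = cosine_similarity(vectors["UNC_A"], vectors["UNC_C"])
```

In this toy data, 'phishing' occurs in every group, so its weight is zero and it does not contribute to any similarity score. `UNC_A` and `UNC_B` share the rarer terms and score about 0.8, while `UNC_A` and `UNC_C` share only 'phishing' and score 0. An automated pipeline could then flag pairs above some chosen threshold as merge candidates; the threshold would be an analyst's choice and is not specified in the source.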
