**4.3 Big data investigation**

Investigating the Hadoop context requires a structured and planned entry point. The investigation is controlled by the central digital investigation phases and by the management constraints. At any step the acquired evidence may include different types of forensic data, so the strategy is to organize the data into category and class nodes as well as data nodes. This organization, together with the technical capabilities of the platform, structures the data fields so that they can be accessed efficiently at each phase of the central investigation plan.

Both live and dead nodes are discovered in the Hadoop architecture, and both contribute the information needed to complete a digital forensic investigation over big data volumes. Node information is identified at the levels described in **Table 1**, such as the node name with its port number and IP address, the last contact time, the admin state, and additional attributes describing the data management, storage time and structure features; a minimal sketch of how these attributes can be enumerated is given at the end of this section. The scope also covers all logs created and stored on the cluster, namely the log files of the data nodes, the name node, the secondary name node, the history server, the user logs, the node manager and the resource manager for all nodes. These files are vital for examining the investigative hypotheses.

To examine the Hadoop cluster, multimedia data acquisition techniques are used for the search and data collection. Acquisition produces a bit-by-bit copy of content such as journal status, storage, log files, images, directories and logical database objects; a sketch of integrity-preserving log acquisition also follows below. The forensic examination then extracts system and node information using a range of proprietary and open-source tools, each selected and customized for the media type and the required performance. In this way the investigation phases can be executed in the big data context.
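
The node-level attributes listed in **Table 1** (node name with port, IP address, last contact, admin state) can be gathered programmatically. The sketch below is one possible approach rather than the prescribed procedure: it queries the NameNode JMX endpoint, and the host name, web UI port and metadata field names are assumptions that must be adapted to the cluster under investigation.

```python
"""Enumerate live and dead DataNodes from the NameNode JMX endpoint.

A minimal sketch: the NameNode address, port and per-node field names
(lastContact, adminState, xferaddr) are assumptions to be verified
against the Hadoop version deployed on the cluster under investigation.
"""
import json
import urllib.request

# Assumed NameNode web UI address; 9870 is the default port in Hadoop 3.x.
NAMENODE_JMX = ("http://namenode.example.org:9870/jmx"
                "?qry=Hadoop:service=NameNode,name=NameNodeInfo")

with urllib.request.urlopen(NAMENODE_JMX) as resp:
    bean = json.load(resp)["beans"][0]

# LiveNodes / DeadNodes are JSON-encoded strings keyed by "host:port".
for state in ("LiveNodes", "DeadNodes"):
    nodes = json.loads(bean.get(state, "{}"))
    for name, info in nodes.items():
        print(state, name,
              "addr=", info.get("xferaddr", "?"),
              "lastContact=", info.get("lastContact", "?"),
              "adminState=", info.get("adminState", "?"))
```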

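As one illustration of how the cluster logs named above might be acquired while preserving their integrity, the following sketch copies them to an evidence store and records SHA-256 digests in a manifest. The source directories, evidence path and manifest format are assumptions for illustration and are not part of the described procedure.

```python
"""Collect Hadoop daemon and user logs and record SHA-256 digests.

A minimal sketch, assuming logs reside under common default locations;
paths and the manifest layout are illustrative and must be adjusted
per node and per distribution.
"""
import hashlib
import shutil
from pathlib import Path

# Assumed source directories for daemon logs and YARN user logs.
LOG_DIRS = [Path("/var/log/hadoop"), Path("/var/log/hadoop-yarn/userlogs")]
EVIDENCE_DIR = Path("/mnt/evidence/hadoop-logs")   # write-once evidence target
MANIFEST = EVIDENCE_DIR / "manifest.sha256"

EVIDENCE_DIR.mkdir(parents=True, exist_ok=True)

with MANIFEST.open("w") as manifest:
    for log_dir in LOG_DIRS:
        for src in sorted(log_dir.rglob("*")):
            if not src.is_file():
                continue
            # Hash the original file before copying it to the evidence store.
            digest = hashlib.sha256(src.read_bytes()).hexdigest()
            dst = EVIDENCE_DIR / src.relative_to(log_dir.anchor)
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)                  # preserve timestamps
            manifest.write(f"{digest}  {src}\n")
```

The manifest can later be re-verified against the copied files to demonstrate that the acquired logs were not altered between collection and examination.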