**4. Methodology**

This chapter sets out the research methodology designed to meet the objectives of this paper. Selecting an appropriate method is important to ensure the accuracy, validity, and quality of the data and findings. The chapter describes the method chosen, the tools used to extract the data, and the data analysis. The phenomenological concept adopted here focuses on preparing data [23]. A research method refers to how data is collected and analyzed, including the tools used, such as data analysis software.

This paper used ethnography because the researcher was directly involved in preparing the messy data in the dataset. Ethnography is usually described as participant observation; here, the researcher became an active participant by demonstrating the data preparation.

A single-case approach was chosen as the suitable method for carrying out data preparation within a single organization. It was not intended to represent other similar organizations that use data mining analysis. Both quantitative and qualitative methods were used to explore data preparation, beginning with data collection and proceeding to the analysis of data preparation. It may nevertheless be possible to generalize the findings of this paper.

The company set the ethical principles, which the researcher honored. The company was informed that participating in this demonstration was voluntary and would not affect the company's brand. To ensure anonymity, the paper removed information that could be manipulated to favor competitors [24]. The organization is therefore referred to as company A. Public information that could have damaged the company's authenticity and led to negative outcomes was also removed.

#### **4.1 Company description**

Company A is one of the leading steel producers. It is situated in Alberton, where most of the production industries are located. It has a history of manufacturing steel sheets at a high rate. This high output increases the volume of data in the dataset, including not only proper data but also messy or dirty data. The company was selected because of the high number of products it makes, which made it suitable for this research, which deals with data.

#### **4.2 Data collection**

Data collection is the method of gathering observations or information using standard, validated techniques. Collecting data is important for understanding what can be done with it. Data can come from two sources: primary and secondary. Primary data refers to raw data collected first-hand. Secondary data is data that has already been collected. This paper used secondary data, because company A had already collected its data into databases, repositories, and data warehouses using sensors embedded in its machines.

The researcher extracted the dataset from the repository of company A, drawing on experience gained through training in data extraction. This skill helped the researcher to use a data mining tool for preparing the data. The work was done during February and March 2021. Because of the coronavirus pandemic, company A sent the datasets to the researcher, who was the active participant in preparing them. The datasets contained machine, alarm, and sensor data.

#### **4.3 Data analysis**

Data analysis is the process of systematically applying statistical techniques to evaluate data. According to [25], this is a type of research in which the gathered data is categorized into themes and sub-themes. Analysis reduces and simplifies the collected data while producing results that can then be measured using quantitative techniques. Moreover, analysis enables the researcher to structure the qualitative data so as to accomplish the objectives of the paper. The researcher installed a data mining tool as an add-on to Microsoft Excel. Microsoft Excel is a powerful tool for handling large data [26]. A spreadsheet consists of a grid of columns and rows that stores data from the data sources. Data mining was employed to arrange the datasets and remove the inconsistencies in them. It was performed in a Microsoft Excel spreadsheet to prepare the data so that it is ready to be used for specific purposes. The resources used in this paper were a computer, a Microsoft Excel spreadsheet, and a data mining tool.
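To illustrate what arranging the data and removing inconsistencies can involve, the following is a minimal Python sketch. It is not the Excel add-on used in this paper. The column names (`machine`, `alarm`, `sensor_value`), the list of missing-value markers, and the cleaning rules are assumptions made for illustration only, since the schema of company A's datasets is not given here.

```python
from typing import Dict, List

# Hypothetical markers for a missing reading; the real datasets may differ.
MISSING_MARKERS = {"", "na", "n/a", "null", "nan", "-"}


def clean_rows(rows: List[Dict[str, str]]) -> List[dict]:
    """Return rows with trimmed text, numeric readings, and no duplicates."""
    cleaned = []
    seen = set()
    for row in rows:
        # Normalize whitespace and letter case (assumed column names).
        machine = row.get("machine", "").strip().upper()
        alarm = row.get("alarm", "").strip().lower()
        raw_value = row.get("sensor_value", "").strip()

        # Drop rows with no machine identifier or no sensor reading.
        if not machine or raw_value.lower() in MISSING_MARKERS:
            continue

        # Accept a decimal comma ("71,5") as well as a decimal point.
        try:
            value = float(raw_value.replace(",", "."))
        except ValueError:
            continue  # non-numeric reading, treated as dirty data

        # Skip rows that are exact duplicates after normalization.
        key = (machine, alarm, value)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(
            {"machine": machine, "alarm": alarm, "sensor_value": value}
        )
    return cleaned
```

Under these assumed rules, rows that differ only in spacing, letter case, or decimal separator collapse into one record, and rows with missing or non-numeric readings are discarded rather than repaired. Whether dirty rows should be dropped or corrected is a design choice that depends on the purpose of the prepared data.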
