**3. Literature review**

Data preparation corrects inconsistencies in a dataset in order to produce quality data [5]. Research indicates that data preparation in data mining is organized as a workflow of steps for preparing data [6]. However, some research suggests that data preparation begins with data collection, where data quality is first checked [7]. This paper aims to show how data collection evolved into preparation steps that influence data quality. It examines data preparation in data mining processes from the standpoint of data collection.

## **3.1 Data collection**

Data mining is often described as add-on software that checks the quality of a dataset by searching through the large amounts of data stored in databases, repositories, and data warehouses. The stored data is often considered too messy, inconsistent, and error-prone; it is unclear to analysts and users, which makes it difficult to use for its specific purposes [8]. Data overload limits analysts and users; thus, software such as data mining tools has been developed to address this challenge through automation.

Data mining software uses recognition technologies and statistical techniques to clean messy data and to discover possible rules that govern the data in databases, repositories, and data warehouses. Data mining is considered a process that requires goals and objectives to be specified [9]. Once the intended goals are defined, it is necessary to determine what data has been collected or is available. However, before the data is used, data preparation is performed to make it ready for its purposes.

The concept that strategic or effective decisions are based on appropriate data is not new. Finding the correct data for strategic decisions began 30 years ago [10]. During the late 1960s, organizations created reports from production sensor data and stored them in databases, repositories, and data warehouses. These resources stored data that could be retrieved and manipulated to produce constructive reports containing the information needed for specific strategic decisions.

In the 1980s, analysts and users began to need data more frequently and in more individualized forms. Thus, organizations started to request data from these resources. Later, in the 1990s, analysts and users required immediate access to more detailed information that could be correlated with production and strategic decision processes. This helped analysts and users extract their data from databases, repositories, and data warehouses.

Analysts and users began to realize the need for more tools to prepare data for future use. Additionally, organizations recognized how much data they had accumulated; thus, new tools were needed to prepare data before it could meet their needs. Such tools enabled systems to search for possible errors and inconsistencies in a dataset. Data mining software was the first to be developed to help analysts and users find quality data within voluminous amounts of data. Because the volume of data keeps growing rapidly, preparation methods are urgently needed. Therefore, data mining has become an increasingly important research field [11].
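
To make the idea of searching a dataset for errors and inconsistencies concrete, the following minimal Python sketch flags a few common problems in a small set of records. It is an illustration only and is not drawn from the cited works: the sample records, the field names (`id`, `sensor`, `reading`), the helper `find_issues`, and the four checks are all assumptions chosen for this example.

```python
from collections import Counter

# Hypothetical sample records; field names and values are illustrative only.
RECORDS = [
    {"id": 1, "sensor": "A", "reading": "20.5"},
    {"id": 2, "sensor": "a ", "reading": "21.0"},
    {"id": 2, "sensor": "A", "reading": "21.0"},
    {"id": 3, "sensor": "B", "reading": ""},
    {"id": 4, "sensor": "B", "reading": "n/a"},
]


def find_issues(records):
    """Return a list of (record index, issue description) pairs."""
    issues = []
    # Count how often each id occurs so repeated ids can be flagged.
    id_counts = Counter(r["id"] for r in records)
    for i, r in enumerate(records):
        if id_counts[r["id"]] > 1:
            issues.append((i, "duplicate id"))
        # Assumed convention: sensor labels are upper case with no padding.
        if r["sensor"] != r["sensor"].strip().upper():
            issues.append((i, "inconsistent sensor label"))
        reading = r["reading"].strip()
        if not reading:
            issues.append((i, "missing reading"))
        else:
            try:
                float(reading)
            except ValueError:
                issues.append((i, "non-numeric reading"))
    return issues


if __name__ == "__main__":
    for index, problem in find_issues(RECORDS):
        print(f"record {index}: {problem}")
```

In this sketch the checks only report problems and do not modify the records, so an analyst can decide how each flagged record should be corrected before the data is used.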
