**3. Air pollution data mining**

The air pollution concentration clustering algorithm clusters observatories by their location and measurement values. We use the air pollution data in AirKorea, provided by the Korea Environment Corporation. The values of NO2, SO2, CO, O3, PM10, and PM25 are uploaded every hour [13].

In this chapter, we use air pollution data measured in April 2020. First, we download the national data from the website to create a feature dataset. Then, we convert the address of each observatory to latitude and longitude coordinates in the WGS84 coordinate system using the Kakao Map API [14]. **Figure 1** shows the dataset used. For example, the first row has station code 111121, the date April 1, 2020, and a location near Seoul City Hall.
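The address-to-coordinate step can be sketched as follows. This is a minimal illustration, not the code used in this chapter: it assumes the address search endpoint of Kakao's Local REST API, a `KakaoAK` authorization header, and a response whose `documents` entries carry `x` (longitude) and `y` (latitude) as strings. The function names and the `rest_api_key` parameter are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint: address search of the Kakao Local REST API.
KAKAO_ADDRESS_URL = "https://dapi.kakao.com/v2/local/search/address.json"


def build_request(address, rest_api_key):
    """Build the HTTP request for one address lookup (no network call yet)."""
    query = urllib.parse.urlencode({"query": address})
    return urllib.request.Request(
        f"{KAKAO_ADDRESS_URL}?{query}",
        headers={"Authorization": f"KakaoAK {rest_api_key}"},
    )


def parse_coordinates(payload):
    """Return (latitude, longitude) from a decoded response, or None if no match."""
    documents = payload.get("documents", [])
    if not documents:
        return None
    first = documents[0]
    # Assumption: x is the longitude and y is the latitude, both as strings.
    return float(first["y"]), float(first["x"])


def geocode(address, rest_api_key):
    """Look up one observatory address (this performs a network call)."""
    request = build_request(address, rest_api_key)
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_coordinates(json.load(response))
```

Keeping the response parsing separate from the network call makes it possible to check the coordinate extraction offline, and addresses that return no match come back as `None` so they can be reviewed by hand.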

The feature dataset used for air pollution clustering consists of latitude, longitude, NO2, SO2, CO, O3, PM10, and PM25. We average each observatory's measurements to convert the 1-hour data into one-day data. We also fill in the missing values of each station with the values of the closest other station. Pseudocode 1 outlines this process.
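The missing-value step described above could look like the sketch below. It assumes that "closest" means the smallest great-circle (haversine) distance between WGS84 coordinates and that a missing measurement is stored as `None`; the chapter does not state which distance measure was actually used, and the function names are hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def fill_from_nearest(stations, pollutant):
    """Fill missing (None) values of one pollutant from the closest station with a value.

    `stations` is a list of dicts with 'latitude', 'longitude' and pollutant keys,
    all for the same day. Returns a new list; the input is left unchanged.
    """
    filled = [dict(s) for s in stations]
    # Only stations that actually measured the pollutant can donate a value.
    donors = [s for s in stations if s.get(pollutant) is not None]
    for row in filled:
        if row.get(pollutant) is None and donors:
            nearest = min(
                donors,
                key=lambda d: haversine_km(row["latitude"], row["longitude"],
                                           d["latitude"], d["longitude"]),
            )
            row[pollutant] = nearest[pollutant]
    return filled
```

Calling `fill_from_nearest` once per pollutant and per day fills each gap independently, so a station that lacks only PM10 keeps its own values for the other pollutants.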

```
SET myData to READ(fileName)
SET feature to ['latitude', 'longitude', 'PM10', 'SO2', 'CO', 'O3', 'NO2', 'PM25']
IF type of 'date' in myData is string THEN
  convert 'date' from string type to datetime type
ELSE
  PASS
ENDIF
SET group to 'feature' columns of myData grouped by 'station code' and by day of 'date'
CALCULATE AVERAGE of 'feature' columns for each group
```
**Pseudocode 1.** *The process of making the dataset by the day.*
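One way to express Pseudocode 1 in runnable form is sketched below, using only the Python standard library. It assumes each hourly record is a dict with `'station code'`, `'date'`, and the feature keys, that missing measurements are `None`, and that string dates follow the hypothetical format `"%Y-%m-%d %H:%M"`; the actual layout of the AirKorea files may differ.

```python
from collections import defaultdict
from datetime import datetime

FEATURES = ["latitude", "longitude", "PM10", "SO2", "CO", "O3", "NO2", "PM25"]


def daily_average(rows, date_format="%Y-%m-%d %H:%M"):
    """Average hourly rows into one row per (station code, day).

    Missing measurements (None) are skipped; a feature with no valid
    value on a given day stays None in the result.
    """
    groups = defaultdict(lambda: {name: [] for name in FEATURES})
    for row in rows:
        stamp = row["date"]
        if isinstance(stamp, str):
            # Parse string dates into datetime objects before grouping.
            stamp = datetime.strptime(stamp, date_format)
        values = groups[(row["station code"], stamp.date())]
        for name in FEATURES:
            if row.get(name) is not None:
                values[name].append(row[name])

    result = []
    for (code, day), values in sorted(groups.items(), key=lambda item: item[0]):
        averaged = {"station code": code, "date": day}
        for name in FEATURES:
            samples = values[name]
            averaged[name] = sum(samples) / len(samples) if samples else None
        result.append(averaged)
    return result
```

Averaging latitude and longitude along with the pollutants follows the feature list in Pseudocode 1; since a station's coordinates do not change within a day, their daily mean is simply the station's position.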
