**1. Introduction**

This chapter introduces data mining and clustering, an unsupervised learning technique within machine learning. We analyze an application of clustering to air pollution concentration data, a problem of growing concern in recent years. The most popular clustering algorithm is K-means, which represents each data cluster by its center point. Finding a K value appropriate for the distribution of the training dataset is essential. K is commonly determined experimentally, and the elbow technique can be used to set its value.
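To make the role of K concrete, the following is a minimal sketch of K-means with an elbow-style choice of K, not the implementation used in this chapter. It rests on several assumptions: the points are synthetic 2D values standing in for per-station concentration features, the deterministic farthest-first initialization replaces the usual random initialization so that the result is reproducible, and the elbow is taken as the K with the largest second difference of the inertia curve, which is only one of several possible heuristics. All function names are hypothetical.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first_centers(points, k):
    """Deterministic initialization (an assumption of this sketch): start from
    the first point, then repeatedly add the point farthest from all centers
    chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centers))
        )
    return centers


def kmeans(points, k, max_iterations=100):
    """Lloyd's algorithm. Returns (centers, inertia), where inertia is the sum
    of squared distances from each point to its nearest center."""
    centers = farthest_first_centers(points, k)
    for _ in range(max_iterations):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members.
        new_centers = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        if new_centers == centers:  # converged
            break
        centers = new_centers
    inertia = sum(min(squared_distance(p, c) for c in centers) for p in points)
    return centers, inertia


def elbow_k(inertias):
    """Pick the K with the sharpest bend, measured by the second difference
    of the inertia curve. inertias[i] is the inertia for K = i + 1."""
    best_k, best_bend = None, float("-inf")
    for i in range(1, len(inertias) - 1):
        bend = inertias[i - 1] - 2 * inertias[i] + inertias[i + 1]
        if bend > best_bend:
            best_k, best_bend = i + 1, bend
    return best_k


# Synthetic data (hypothetical): three well-separated groups of five points.
offsets = [(-0.5, -0.5), (-0.5, 0.5), (0.5, -0.5), (0.5, 0.5), (0.0, 0.0)]
group_centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
points = [(cx + dx, cy + dy) for cx, cy in group_centers for dx, dy in offsets]

inertias = [kmeans(points, k)[1] for k in range(1, 6)]  # K = 1..5
best_k = elbow_k(inertias)
print("inertia per K:", inertias)
print("elbow at K =", best_k)
```

On real measurements the inertia curve is rarely this clean, so the bend is usually inspected on a plot rather than selected automatically.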

One example of a clustering application is the clustering of air pollution concentrations. Air pollutants cause respiratory diseases and cancer, and the WHO has reported on the severity of particulate matter [1–3]. The Korean government has provided particulate matter and air pollution information since 2004. The AirKorea website provides real-time air pollution information measured at 353 observatories [4].

Currently, air pollution observatories in Korea are located mainly in Seoul and Gyeonggi-do, so it is difficult to obtain accurate air pollution values for small towns that have no observatory. Therefore, in this chapter, we study a method for clustering air pollution observatories according to air pollution concentration. We first use K-means clustering to split the country into air-pollution-centered regions from which the distribution of air pollution can be predicted. Then, we find the optimal station locations according to the distribution of air pollution concentrations. Based on these optimal locations, we divide the territory of Korea.

For this clustering study, we collect air pollution data from April 2020 and label the air pollution monitoring stations using clustering algorithms. Using the cluster center points, we apply the Voronoi algorithm to divide the territory of Korea. Unlike traditional administrative districts, this method classifies air pollution areas by considering the concentration distribution of air pollution. Furthermore, it can help estimate the air pollution distribution in shaded areas that have no air pollution observatory [5, 6].
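The Voronoi division can be illustrated through its defining property: a location belongs to the Voronoi cell of whichever center point is nearest to it. The sketch below labels locations by nearest center instead of constructing cell polygons, so it is a simplified stand-in for the Voronoi algorithm, not the procedure used in this study. The center coordinates, the town coordinates, and the function names are hypothetical, and plain planar distance on (longitude, latitude) pairs is an approximation assumed here for brevity.

```python
def nearest_center(location, centers):
    """Return the index of the center nearest to `location`, which is the
    index of the Voronoi cell that contains it."""
    return min(
        range(len(centers)),
        key=lambda i: (location[0] - centers[i][0]) ** 2
        + (location[1] - centers[i][1]) ** 2,
    )


def label_grid(centers, x_min, x_max, y_min, y_max, steps):
    """Label a regular steps-by-steps grid of locations with their Voronoi
    cell index. Each inner list is one row of constant y."""
    rows = []
    for j in range(steps):
        y = y_min + (y_max - y_min) * j / (steps - 1)
        row = []
        for i in range(steps):
            x = x_min + (x_max - x_min) * i / (steps - 1)
            row.append(nearest_center((x, y), centers))
        rows.append(row)
    return rows


# Hypothetical cluster center points as (longitude, latitude) pairs.
centers = [(126.9, 37.5), (129.0, 35.2), (127.4, 36.3)]

# A hypothetical town without an observatory inherits the label of its cell.
town = (128.6, 35.9)
town_label = nearest_center(town, centers)

# Coarse labeled map of a bounding box around the centers.
grid = label_grid(centers, 126.0, 130.0, 34.0, 38.0, steps=5)
for row in grid:
    print(row)
```

When explicit cell boundaries are needed, for example to draw them on a map, a computational geometry library such as `scipy.spatial.Voronoi` can compute them from the same center points.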
