**Abstract**

Coal and its host rock, including gangue dumps, are important sources of toxic elements with a high potential to contaminate surface water and groundwater. Surface water in coal mining areas and groundwater in active or abandoned coal mines have been observed to be polluted by trace elements such as arsenic, mercury, lead, selenium, and cadmium. Understanding the leaching behavior and mechanisms of these trace elements helps to control the pollution they cause. The leaching and migration of trace elements are controlled mainly by two factors: the occurrence of the elements and the surrounding environment. Traditionally, the occurrence and leaching mechanisms of elements are investigated with geochemical methods. In this research, data mining was applied to find the relationships and patterns concealed in the data matrix. From a geochemical point of view, these patterns reflect the occurrence and leaching mechanisms of trace elements from coal and host rock. Principal component analysis, an unsupervised machine learning method, was applied to reduce the dimensionality of the data matrices of solid and liquid samples, and the transformed data were then clustered with a Gaussian mixture model to find co-existing patterns.

**Keywords:** coal, host rock, occurrence, principal component analysis, Gaussian mixture model
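
As a rough illustration of the workflow summarized in the abstract, the following minimal sketch standardizes a sample-by-element matrix, projects it onto two principal components, and clusters the scores with a Gaussian mixture fitted by expectation-maximization. Everything specific here is an assumption for illustration, not the study's data or implementation: the element columns, the synthetic concentrations, the choice of two components and two clusters, the diagonal covariance, the quantile-based initialization, and the helper names `pca_scores` and `fit_gmm`.

```python
import numpy as np


def pca_scores(X, n_components):
    """Standardize each column, then project onto the leading principal axes."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return Z @ Vt[:n_components].T, explained[:n_components]


def fit_gmm(Y, k, n_iter=100):
    """Fit a diagonal-covariance Gaussian mixture with plain EM; return labels."""
    n, _ = Y.shape
    # Assumed initialization: evenly spaced samples along the first score axis.
    order = np.argsort(Y[:, 0])
    means = Y[order[np.linspace(0, n - 1, k).astype(int)]]
    variances = np.tile(Y.var(axis=0), (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities from the weighted log densities.
        diff = Y[:, None, :] - means[None, :, :]
        log_p = -0.5 * np.sum(
            diff**2 / variances + np.log(2.0 * np.pi * variances), axis=2
        ) + np.log(weights)
        log_p -= log_p.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update weights, means, and per-axis variances.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = (resp.T @ Y) / nk[:, None]
        diff = Y[:, None, :] - means[None, :, :]
        variances = np.einsum("nk,nkd->kd", resp, diff**2) / nk[:, None] + 1e-6
    # Labels come from the responsibilities of the final E-step.
    return resp.argmax(axis=1)


# Synthetic placeholder data (NOT measurements): 40 samples x 5 hypothetical
# element columns, built so that two groups have different element associations.
rng = np.random.default_rng(42)
group_a = rng.normal(loc=[10, 9, 1, 1, 5], scale=0.5, size=(20, 5))
group_b = rng.normal(loc=[1, 1, 10, 9, 5], scale=0.5, size=(20, 5))
X = np.vstack([group_a, group_b])

scores, explained = pca_scores(X, n_components=2)
labels = fit_gmm(scores, k=2)

print("explained variance ratio:", np.round(explained, 3))
print("cluster labels:", labels)
```

In practice, the number of retained components and the number of mixture components would be chosen from the data, for example from the explained variance and an information criterion such as BIC, rather than fixed in advance as in this sketch.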
