The impact of coal mining on air pollution, including suspended particles, was investigated in India. The PCA and CA results suggested that PC1 represents PM 10, SO2, PM 2.5, PM 1.0, Ni, and Cu, which originate from coal burning and active mine fires. PC2 was highly loaded with NO2, Pb, Cd, and Cr, which originated from crude oil combustion and vehicular emissions. PC3, including Fe and Mn, was mainly contributed by the earth's crust, wind-blown soil, and coal fly ash [60].

In the USA, brake wear, tire wear, fertilized soil, and resuspended soil were found to be important sources of copper, zinc, phosphorus, and silicon, respectively, using positive matrix factorization. Zn was strongly related to tire wear but also contributed to the Pb-rich feature and soil. At the same time, the Pb-rich contributions were highly correlated with tire wear, and P contributions were elevated within the fertilized soil as well as the Pb-rich feature [68].

Brinkman et al. compared the performance of PCA and PMF for source apportionment of particulate matter. Most of the PCA factors were easily distinguished from one another by sharp differences in the factor loadings, and for many individual compounds the variance was explained primarily by a single factor. In contrast, the factors obtained with PMF were more difficult to distinguish because anticipated tracer compounds for certain sources appeared in multiple PMF factors [65].

**3.5 Summary of methods used to identify sources of contaminants**

Applications of multivariate analysis and data mining, combined with geochemical methods, to the source apportionment of trace elements as contaminants in environmental media are increasing with the development of big data techniques, machine learning, and computer software. Four environmental media are discussed: water, sediment, soil, and particles.

Four types of application can be identified for water contamination: tracing the sources of TEs, evaluating the quality of surface water and groundwater, identifying water intrusion in coal mines and other scenarios, and finding and quantifying relationships between water bodies, such as the surface water-groundwater relationship. Sediment and water constitute a reaction system: sediment can be a source or a sink of trace elements in water, or first a sink and later a source again. Therefore, the system should be analyzed as a whole. Research on sediment is less extensive than research on water, and most articles on this topic are from China.

The most widely used method for source apportionment of TEs in water and sediment is principal component analysis (PCA), probably because it is easy to use and interpret. With the development of data mining algorithms and computational software, applying PCA has become easier and more efficient. A similar method, factor analysis (FA), is also used. PCA and FA are both unsupervised ML methods. Although generally less accurate than supervised methods, they are suitable for this topic. Supervised ML methods are also used in this area, though much less often than unsupervised methods, and their scope of application is different. For example, decision trees are used to classify sample types [38]. Discriminant analysis is also a supervised method; it has been applied especially to identifying the source of water inrush in coal mines, where labeled data can be obtained [32, 33]. In this sense, other supervised machine learning methods, such as artificial neural networks (ANN), support vector machines (SVM), and decision trees, can also be used to identify water inrush sources. ANN usually needs more data than SVM and decision trees to improve prediction quality. To combine the advantages of unsupervised and supervised machine learning methods, semi-supervised methods have been introduced and applied to this topic [52]. At present, related research is rare, but promising reports are expected.

**4. Conclusions**

Data mining techniques are widely used to trace the sources of TEs in water and solid matrices.

In the water environment, groundwater and surface water are connected through the flow network. Human activities, especially mining, change the natural reaction environment and release trace elements into groundwater and surface water. River and lake sediments may then be contaminated and become a source that releases trace elements back into the water. Soil, dust, and air particles may be influenced by various human activities, especially in urban and industrial areas. The TE composition differs depending on the environmental medium, human activities, land use type, etc. However, some elements of environmental concern, such as As, Pb, Cd, Hg, and Cr, are frequently found in water, sediment, soil, and particles, showing high mobility and contamination potential in the environment.

Unsupervised machine learning algorithms, including principal component analysis, factor analysis, and positive matrix factorization, are used most often. In water, PCA is used to find the contamination sources of trace elements and sometimes the source of water inrush in coal mines. In air particle research, PCA and PMF are frequently used to trace the sources of PM 2.5 and PM 10 and of the TEs in the particles. Some supervised algorithms, including discriminant analysis, Bayesian networks, artificial neural networks, and decision trees, are used when the data are labeled.
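As an illustration of how PCA is typically applied to such data, the following is a minimal sketch, not taken from any of the cited studies. It assumes a synthetic data set in which two hypothetical sources (labeled "mining" and "crustal" here purely for illustration) drive the concentrations of five elements; the element list, source profiles, and noise level are all assumptions. PCA is computed on standardized concentrations, and the loadings indicate which elements group together on each component.

```python
import numpy as np

ELEMENTS = ["As", "Pb", "Cd", "Fe", "Mn"]  # illustrative element list (assumption)


def make_synthetic_samples(n_samples=200, seed=0):
    """Simulate concentrations driven by two hypothetical sources."""
    rng = np.random.default_rng(seed)
    # Strength of each source in each sample (arbitrary units).
    strengths = rng.lognormal(mean=0.0, sigma=0.5, size=(n_samples, 2))
    # Assumed source profiles: rows are sources, columns follow ELEMENTS.
    profiles = np.array([
        [1.0, 0.8, 0.6, 0.0, 0.0],  # hypothetical "mining" source
        [0.0, 0.0, 0.0, 1.0, 0.7],  # hypothetical "crustal" source
    ])
    noise = rng.normal(scale=0.05, size=(n_samples, len(ELEMENTS)))
    return strengths @ profiles + noise


def pca(X, n_components=2):
    """PCA of the correlation matrix via SVD of the standardized data."""
    n = len(X)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = s**2 / np.sum(s**2)  # fraction of variance per component
    # Loadings: correlation between each element and each component.
    loadings = Vt[:n_components].T * (s[:n_components] / np.sqrt(n - 1))
    scores = U[:, :n_components] * s[:n_components]
    return loadings, scores, explained[:n_components]


if __name__ == "__main__":
    X = make_synthetic_samples()
    loadings, scores, explained = pca(X)
    print("explained variance ratio:", np.round(explained, 3))
    for element, row in zip(ELEMENTS, loadings):
        print(element, np.round(row, 2))
```

In this sketch, elements that share a source load strongly on the same component, which is the pattern that source apportionment studies interpret. The sign of a component is arbitrary, so only the magnitude and grouping of the loadings are meaningful.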

Generally speaking, the most popular methods for apportioning the sources of trace elements as contaminants are unsupervised ML techniques, especially principal component analysis. More broadly, supervised ML is a large toolbox that is frequently applied across science and society. Supervised ML usually gives more accurate and robust results than unsupervised ML. In trace element apportionment, however, the use of supervised ML techniques is constrained because the sources are usually not known in advance. Nevertheless, several approaches are promising. First, supervised ML methods could be applied more often. Unsupervised ML methods are used in the first step; as research intensifies and some sources are identified, supervised ML methods can then be used. For example, water inrush is a threat in some Chinese coal mines. Because the potential sources of inrush can be identified, a supervised ML method, discriminant analysis, is used to determine the water type of the inrush, so that corresponding technologies to deal with the threat or accident can be implemented. At this stage, other supervised ML methods could also be used, although discriminant analysis has been used most often. Second, semi-supervised ML may be applied more. It is a family of relatively novel techniques that can be used once more data are obtained in an investigation, and in a sense it combines unsupervised and supervised techniques in one implementation. Third, machine learning methods could be combined with geochemical methods. Each of the two systems has its own advantages and disadvantages, and combining them may maximize effectiveness and efficiency.
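The supervised step described above, assigning an inrush water sample to one of several known source aquifers, can be sketched with a linear discriminant analysis. This is a minimal illustration under stated assumptions: the aquifer names, the hydrochemical indicators, and the mean compositions are hypothetical, the training data are synthetic, and the classifier is a plain NumPy implementation of linear discriminant analysis with a pooled covariance matrix, not the procedure of any cited study.

```python
import numpy as np

FEATURES = ["Ca", "Na", "SO4", "HCO3"]  # hypothetical hydrochemical indicators

# Hypothetical mean compositions of three candidate source aquifers
# (arbitrary units, chosen only for illustration).
AQUIFER_CENTERS = {
    "aquifer_A": [4.0, 1.0, 1.0, 3.0],
    "aquifer_B": [1.0, 4.0, 1.0, 2.0],
    "aquifer_C": [2.0, 1.0, 4.0, 1.0],
}


def make_labeled_samples(n_per_class=40, seed=0):
    """Simulate labeled water samples scattered around each aquifer center."""
    rng = np.random.default_rng(seed)
    X, y = [], []
    for name, center in AQUIFER_CENTERS.items():
        X.append(rng.normal(loc=center, scale=0.3,
                            size=(n_per_class, len(FEATURES))))
        y.extend([name] * n_per_class)
    return np.vstack(X), np.array(y)


def fit_lda(X, y):
    """Estimate class means, pooled within-class covariance, and priors."""
    classes, idx = np.unique(y, return_inverse=True)
    means = np.array([X[idx == k].mean(axis=0) for k in range(len(classes))])
    centered = X - means[idx]
    pooled = centered.T @ centered / (len(X) - len(classes))
    priors = np.bincount(idx) / len(X)
    return classes, means, np.linalg.inv(pooled), priors


def predict_lda(model, X):
    """Assign each sample to the class with the highest discriminant score."""
    classes, means, precision, priors = model
    # Score: x^T S^-1 mu_k - 0.5 * mu_k^T S^-1 mu_k + log(prior_k)
    quad = 0.5 * np.sum((means @ precision) * means, axis=1)
    scores = X @ precision @ means.T - quad + np.log(priors)
    return classes[np.argmax(scores, axis=1)]


if __name__ == "__main__":
    X, y = make_labeled_samples()
    model = fit_lda(X, y)
    inrush_sample = np.array([[1.2, 3.7, 1.1, 2.1]])  # made-up measurement
    print("predicted source:", predict_lda(model, inrush_sample)[0])
```

The classifier can only choose among the sources it was trained on, which is why this step presupposes that the candidate sources have already been identified, for example by an earlier unsupervised analysis.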
