**2. Deep neural networks for coagulation control and monitoring**

#### **2.1. Overview of artificial neural networks and deep learning**

Since their origin in 1943 [27], artificial neural networks (ANNs) have been used to solve a wide variety of nonlinear problems. ANNs have strongly motivated machine learning research and industry, thanks to notable progress in robotic processing [28], object recognition [29], speech and handwriting recognition [30], and even real-time sign-language translation [31]. Overall, an artificial neural network can be described by three main parts: (1) Input nodes, which feed information from an external source (signals, features, images, and measurements) into the network. Each input has an associated weight, which reflects its relative importance compared with the other inputs. The weighted inputs received by a neuron are combined and passed through an activation function, which performs a fixed mathematical operation. (2) Hidden nodes, which are responsible for extracting patterns associated with the process or system being analyzed. These layers perform most of the internal processing of the network. (3) Output nodes, which combine the computations performed by the neurons in the previous layers and produce the final network outputs. Depending on the arrangement of the neurons and their interconnection via the processing layers, the main ANN architectures can be divided as follows: single-layer feedforward networks (for example, the perceptron and ADALINE), multilayer feedforward networks (the multilayer perceptron (MLP) and the radial basis function (RBF) network), recurrent networks, and mesh networks (the Self-Organizing Map being the main representative of mesh architectures).
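The three-part structure described above (weighted inputs, hidden processing, output) can be illustrated with a minimal forward pass. The sketch below is written in plain Python and assumes a sigmoid activation in the hidden layer and a linear output neuron (a common choice for regression targets such as a dose); the function names and weight values are illustrative and are not taken from the cited works.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    """Logistic activation applied to a neuron's weighted sum."""
    return 1.0 / (1.0 + math.exp(-z))


def dense_layer(
    inputs: Sequence[float],
    weights: Matrix,
    biases: Sequence[float],
    activation: Callable[[float], float],
) -> Vector:
    """One fully connected layer: row j of `weights` holds the incoming weights of neuron j."""
    outputs: Vector = []
    for w_row, b in zip(weights, biases):
        z = sum(w * x for w, x in zip(w_row, inputs)) + b  # weighted sum of the inputs
        outputs.append(activation(z))
    return outputs


def mlp_forward(
    x: Sequence[float],
    hidden_w: Matrix,
    hidden_b: Sequence[float],
    out_w: Matrix,
    out_b: Sequence[float],
) -> Vector:
    """Input nodes -> one sigmoid hidden layer -> linear output layer (assumed design)."""
    hidden = dense_layer(x, hidden_w, hidden_b, sigmoid)
    return dense_layer(hidden, out_w, out_b, lambda z: z)


# Two inputs, two hidden neurons, one output (illustrative weights only).
y = mlp_forward(
    x=[0.5, -1.0],
    hidden_w=[[0.2, -0.4], [0.7, 0.1]],
    hidden_b=[0.0, 0.1],
    out_w=[[1.5, -0.8]],
    out_b=[0.05],
)
print(y)
```

In practice the weights would be learned from data (for example by backpropagation); here they are fixed by hand only to show how information flows from the input nodes to the output.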


A Survey of Deep Learning Methods for WTP Control and Monitoring

http://dx.doi.org/10.5772/intechopen.77196

299

Despite the expectation that deeper architectures would outperform the shallower ones already in use, empirical tests with deep networks found similar or even worse results than networks with only one or two layers [32, 33]. Training was also found to be difficult and often inefficient [33]. This situation finally started to change with the proposal of greedy layer-wise unsupervised learning [34], which allowed deep belief networks to be trained quickly and mitigated the vanishing gradient problem. Since 2006, deep learning's revolutionary advances in speech recognition, image analysis, and natural language processing have gained significant attention. With the ever-growing volume, complexity, and dynamicity of online information, deep learning has become an effective solution for overcoming information overload. Recent studies also demonstrate its effectiveness in anomaly detection and prediction tasks. Deep learning is a subfield of machine learning that learns multiple levels of representation and abstraction from data, and it can address both supervised and unsupervised learning tasks [35]. In this subsection, we briefly review the key deep learning concepts, drawing on some of the sources included in the annotated section of the bibliography [36–38]:

• Multilayer perceptron (MLP) is a feedforward artificial neural network with one or more hidden layers between the input layer and the output layer, which makes a decision or prediction about the input. An MLP can be viewed as a logistic regression classifier whose input is first transformed by a learned nonlinear transformation that projects the input data into a space where they become linearly separable. A single hidden layer (with enough neurons) is sufficient for an MLP to be a universal approximator. Since 2006, researchers have shown that there are considerable benefits to using many such hidden layers, which is the very premise of deep learning.

• Recurrent neural network (RNN) performs the same task for every element of a sequence, with the output depending on the previous computations. In a traditional (feedforward) neural network, all inputs and outputs are independent of each other. The idea behind the RNN is to exploit sequential information in order to capture information about what has been calculated so far; in this sense, RNNs have a "memory". Long short-term memory (LSTM) and gated recurrent unit (GRU) networks are two variants generally chosen to address the vanishing gradient problem.

• Convolutional neural network (ConvNet) is a special kind of feedforward neural network with convolution layers and pooling operations. Each neuron receives some inputs, performs a dot product, and optionally follows it with a nonlinearity. ConvNet architectures make the explicit assumption that the inputs are images, which allows certain properties to be encoded into the architecture. This makes the forward function efficient to implement and vastly reduces the number of parameters in the network.

• Restricted Boltzmann machine (RBM) is a parameterized generative model representing a probability distribution. A Boltzmann machine consists of two types of layers, the so-called visible and hidden neurons. The visible layer corresponds to the components of an observation (for a digital input image, one visible unit for each pixel). The hidden layer models dependencies between the components of observations. "Restricted" means that there are no intra-layer connections within the visible layer or within the hidden layer.

• Auto-encoder (AE) is an unsupervised model, often used for pretraining, that has three layers: an input layer, an encoding (hidden) layer, and a decoding layer. The AE model is trained to reconstruct its inputs, which forces the hidden layer to learn good representations of the inputs. The learned representation can be used for dimensionality reduction or as features for another task. There are many variants of auto-encoders, such as the denoising auto-encoder, marginalized denoising auto-encoder, sparse auto-encoder, contractive auto-encoder, and variational auto-encoder (VAE).

• Deep semantic similarity model (DSSM) was developed to represent text strings (sentences, queries, predicates, entity mentions, etc.) in a common low-dimensional semantic space and to measure their semantic similarities. DSSM is frequently used in various applications, including information retrieval and Web search ranking, contextual entity search and interestingness, and image captioning.

• Neural autoregressive distribution estimation (NADE) is an unsupervised neural network inspired by the RBM, but it uses a feedforward neural network and the autoregressive framework to model the probability distribution of binary variables in high-dimensional vectors.

• Generative adversarial network (GAN) is a generative neural network composed of two nets, a discriminator and a generator, pitted against each other: the two networks are trained simultaneously by competing in a minimax game framework.
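The "memory" of the RNN described in the list above comes from feeding the previous hidden state back into each step. A minimal sketch of a vanilla (Elman) recurrence, assuming a tanh activation and plain Python lists; it does not implement the LSTM or GRU gates, and all names and weights are illustrative.

```python
import math
from typing import List, Optional, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Matrix-vector product with nested lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rnn_forward(
    xs: Sequence[Sequence[float]],
    w_xh: Matrix,
    w_hh: Matrix,
    b_h: Sequence[float],
    h0: Optional[Sequence[float]] = None,
) -> List[Vector]:
    """Vanilla RNN: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h).

    The same weights are reused at every time step, and h_{t-1} carries
    information about the earlier elements of the sequence.
    """
    h: Vector = list(h0) if h0 is not None else [0.0] * len(b_h)
    states: List[Vector] = []
    for x in xs:
        pre = [a + b + c for a, b, c in zip(matvec(w_xh, x), matvec(w_hh, h), b_h)]
        h = [math.tanh(p) for p in pre]
        states.append(h)
    return states


# A single pulse at t = 0 followed by zeros: the hidden state decays but stays non-zero.
states = rnn_forward([[1.0], [0.0], [0.0]], w_xh=[[1.0]], w_hh=[[0.5]], b_h=[0.0])
print([round(s[0], 4) for s in states])
```

Setting `w_hh` to zero removes the recurrence, and the network then forgets the pulse immediately; the repeated multiplication by `w_hh` through tanh is also what makes gradients vanish over long sequences in this plain form.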

298 Desalination and Water Treatment



In their article, Zhang et al. [38] provide a comprehensive review of recent research efforts on deep learning–based recommender systems, with the aim of fostering innovation in recommender system research. A taxonomy of deep learning–based recommendation models is presented and used to categorize the surveyed articles. Wide and deep learning (WDL) is one of the models presented in this review; it can improve both the accuracy and the diversity of recommendations. The WDL (shown in **Figure 2**) can address both regression and classification problems by combining two learning techniques: a wide learning component (a single-layer perceptron) and a deep learning component (a multilayer perceptron). Combining these two techniques enables the recommender system to capture both (1) memorization, which is the capability of capturing direct features from historical data, and (2) generalization, by producing more general and abstract representations.

**Figure 2.** Illustration of wide and deep learning.
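The combination of a wide (single-layer) component and a deep (MLP) component described above can be sketched as two logits that are summed before a single output. This is a minimal sketch under stated assumptions: ReLU hidden layers, a sigmoid output for classification, and hand-set weights; the function names are hypothetical and the code is not the implementation reviewed in [38].

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Layer = Tuple[List[List[float]], List[float]]  # (weights, biases) of one dense layer


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def deep_tower(x: Sequence[float], layers: Sequence[Layer]) -> Vector:
    """Deep component: a stack of fully connected ReLU layers (an MLP)."""
    a: Vector = list(x)
    for weights, biases in layers:
        a = [max(0.0, dot(w_row, a) + b) for w_row, b in zip(weights, biases)]
    return a


def wide_and_deep(
    x_wide: Sequence[float],
    x_deep: Sequence[float],
    wide_w: Sequence[float],
    deep_layers: Sequence[Layer],
    head_w: Sequence[float],
    bias: float,
) -> float:
    """Sum the wide and deep logits, then squash to a probability (classification)."""
    wide_logit = dot(wide_w, x_wide)  # memorization: single-layer linear part
    deep_logit = dot(head_w, deep_tower(x_deep, deep_layers))  # generalization: MLP part
    logit = wide_logit + deep_logit + bias
    return 1.0 / (1.0 + math.exp(-logit))  # for regression, return `logit` instead


p = wide_and_deep(
    x_wide=[1.0, 0.0],
    x_deep=[0.3, 0.7],
    wide_w=[0.8, -0.2],
    deep_layers=[([[0.5, -0.1], [0.2, 0.4]], [0.0, 0.1])],
    head_w=[1.0, -0.5],
    bias=-0.3,
)
print(p)
```

Because both parts feed the same output, they would be trained jointly, so the wide part can memorize specific feature patterns while the deep part generalizes.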

Suitable applications vary across fields depending on the nature, type, and purpose of the data. While scientific researchers may be interested in searching for anomalies in the sleep patterns of a patient, economists and industrialists may be more interested in forecasting the future prices of stocks of interest. These kinds of problems are addressed in the literature by a range of approaches that perform tasks such as classification, segmentation, anomaly detection, and prediction. The application of deep neural network techniques to water treatment processes [25, 39–46] has also been gaining momentum due to their state-of-the-art performance and high-quality recommendations. In contrast to traditional recommendation models, deep learning provides a better understanding of users' demands, items' characteristics, and the historical interactions between them.

#### **2.2. Neural software sensors for coagulation automatic control**

Several works [25, 39–42] have already shown the potential of these techniques for modeling the coagulation process. All these studies propose to relate the coagulant dose to different descriptor parameters of raw water quality, such as turbidity, pH, and conductivity, using a neural network. The learning base is constructed from a history of jar tests in order to model the optimal coagulant dose. Agdar et al. [39] propose to use the color, conductivity, and turbidity of raw water to predict the coagulant dose. The results obtained on a pilot site [39] seem encouraging. However, the small number of input descriptors does not allow all variations in raw water quality to be taken into account. Another study [40] proposes to use many more input parameters in the ANN model. Mirsepassi et al. [41] also propose to use a history of these parameters, that is, to consider their values at times {t − 1, t − 2, …, t − 6}, where t represents the current day. Gagnon et al. [25] show the interest of building seasonal models. They use four descriptor parameters of raw water quality: pH, turbidity, conductivity, and temperature. This study compares the accuracy of a single year-round model with that of four seasonal models. Nevertheless, determining the application period of each of the four models seems difficult. Valentin et al. [42] have developed an alternative to the jar-test and SCD methods that automatically determines the optimal coagulant dose from raw water characteristics, using self-organizing map and MLP approaches to validate the sensor measurements before estimating the coagulant dose.
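The lagged-input idea attributed to Mirsepassi et al. [41] amounts to building, for each day t, a feature vector from the descriptor values at t − 1, …, t − 6. A minimal sketch, assuming daily series stored in a dictionary keyed by hypothetical descriptor names and a target series of jar-test doses of the same length; the function name is illustrative.

```python
from typing import Dict, List, Sequence, Tuple


def build_lagged_dataset(
    params: Dict[str, Sequence[float]],
    dose: Sequence[float],
    max_lag: int = 6,
) -> Tuple[List[List[float]], List[float]]:
    """Pair the dose of day t with each parameter's values at t-1, ..., t-max_lag.

    `params` maps a (hypothetical) descriptor name such as "turbidity" to a
    daily series; all series must have the same length as `dose`.
    """
    if max_lag < 1:
        raise ValueError("max_lag must be at least 1")
    n = len(dose)
    for name, series in params.items():
        if len(series) != n:
            raise ValueError(f"series {name!r} has length {len(series)}, expected {n}")
    X: List[List[float]] = []
    y: List[float] = []
    for t in range(max_lag, n):  # the first max_lag days lack a full history
        row: List[float] = []
        for name in sorted(params):  # fixed column order: name, then lag 1..max_lag
            series = params[name]
            row.extend(series[t - lag] for lag in range(1, max_lag + 1))
        X.append(row)
        y.append(dose[t])
    return X, y
```

With four descriptors and six lags, each row has 24 inputs, which illustrates why lagged models need more training data than the three-descriptor model of [39].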

Given the strong variability of raw water characteristics, an important property of such a system is robustness to sensor failures and to unexpected raw water characteristics caused, for example, by accidental pollution. Coagulation is one of the critical processes in drinking water treatment, involving many biological, physical, and chemical phenomena. As already mentioned, good control of coagulation is essential for maintaining satisfactory treated water quality and for economic plant operation. Overdosing can lead both to increased operating costs and to public health concerns, while underdosing can cause failure to meet water quality targets; coagulation also has a strong impact on the subsequent clarification step. In addition to these developments in automatic coagulation control, we have developed a software sensor based on a hybrid system [44–47], including a Self-Organizing Map (SOM) for measurement validation and missing data reconstruction [45], a multilayer perceptron (MLP) for coagulant dose prediction [47], and a neuro-fuzzy method to identify the functional states of the treatment plant [44, 45]. The main objective of this work was to validate and reconstruct the measurements of raw water characteristics so as to provide reliable inputs to the automatic coagulation control system.


In many anomaly detection applications, abnormal (negative) samples are not available at the training stage. For instance, in a computer security application, it is difficult to have information about all possible attacks. In machine learning approaches, the lack of samples from the abnormal class makes supervised techniques difficult to apply. A natural machine-learning solution is therefore to use an unsupervised algorithm. For this purpose, we adopted an unsupervised learning approach based on the self-organizing map algorithm introduced by Kohonen [48]. The self-organizing map is one of the most popular neural network models and belongs to the category of competitive learning networks. Because the SOM is trained without supervision, no human intervention is needed during learning, and little needs to be known about the characteristics of the input data. We could, for example, use the SOM to cluster data without knowing the class memberships of the input data. The SOM can be used to detect features inherent to the problem and has therefore also been called the SOFM (self-organizing feature map).
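The competitive learning described above, together with the validation and reconstruction roles the SOM plays in the hybrid system, can be sketched in a few dozen lines. This is a minimal sketch, not the authors' implementation: it assumes a rectangular map, a Gaussian neighbourhood, linearly decaying learning rate and radius, quantization error (distance to the best-matching unit) as the anomaly score, and reconstruction of missing components from the best-matching unit found on the observed components only.

```python
import math
import random
from typing import List, Optional, Sequence

Vector = List[float]


class SOM:
    """Minimal rectangular Kohonen self-organizing map (illustrative sketch)."""

    def __init__(self, rows: int, cols: int, dim: int, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        self.rows, self.cols = rows, cols
        self.grid = [(r, c) for r in range(rows) for c in range(cols)]
        # One prototype (codebook) vector per map unit, initialised in [0, 1).
        self.codebook: List[Vector] = [
            [self.rng.random() for _ in range(dim)] for _ in self.grid
        ]

    @staticmethod
    def _dist2(a: Sequence[float], b: Sequence[float],
               mask: Optional[Sequence[bool]] = None) -> float:
        """Squared Euclidean distance, optionally restricted to components where mask is True."""
        return sum((x - y) ** 2 for i, (x, y) in enumerate(zip(a, b))
                   if mask is None or mask[i])

    def bmu(self, x: Sequence[float], mask: Optional[Sequence[bool]] = None) -> int:
        """Index of the best-matching unit (the competitive 'winner')."""
        return min(range(len(self.codebook)),
                   key=lambda k: self._dist2(x, self.codebook[k], mask))

    def train(self, data: Sequence[Sequence[float]], epochs: int = 50,
              lr0: float = 0.5, sigma0: Optional[float] = None) -> None:
        """Online training with linearly decaying learning rate and Gaussian neighbourhood."""
        if sigma0 is None:
            sigma0 = max(self.rows, self.cols) / 2.0
        total, step = epochs * len(data), 0
        for _ in range(epochs):
            order = list(range(len(data)))
            self.rng.shuffle(order)
            for idx in order:
                frac = step / total
                lr = lr0 * (1.0 - frac)
                sigma = max(sigma0 * (1.0 - frac), 0.5)
                x = data[idx]
                br, bc = self.grid[self.bmu(x)]
                for k, (r, c) in enumerate(self.grid):
                    h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2.0 * sigma ** 2))
                    w = self.codebook[k]
                    for i in range(len(w)):
                        w[i] += lr * h * (x[i] - w[i])  # pull unit k toward x
                step += 1

    def quantization_error(self, x: Sequence[float]) -> float:
        """Distance to the BMU; large values flag samples unlike the training data."""
        return math.sqrt(self._dist2(x, self.codebook[self.bmu(x)]))

    def reconstruct(self, x: Sequence[Optional[float]]) -> Vector:
        """Fill missing components (None) from the BMU found on the observed ones."""
        mask = [v is not None for v in x]
        filled = [v if v is not None else 0.0 for v in x]
        proto = self.codebook[self.bmu(filled, mask)]
        return [v if v is not None else p for v, p in zip(x, proto)]
```

Only normal operating data are needed for training; a measurement whose quantization error exceeds a threshold (for example, a high percentile of the training errors, an assumption here) can be flagged as suspect, and a missing sensor value can be replaced by the corresponding component of the winning prototype.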

For coagulant dose prediction, the MLP architecture (inputs, number of hidden layers, and number of neurons) was fixed a priori. To define the relevant raw water quality descriptors affecting the coagulant dose, a principal component analysis (PCA) is used within this framework. The number of neurons in the hidden layer was optimized with a "weight-decay" pruning method [49, 50] in combination with the Levenberg–Marquardt algorithm [51], which penalizes the weights so that connections with weak weights can be eliminated. In this framework, the weights and biases of the network are assumed to be random variables with specified distributions, and the regularization parameters are related to the unknown variances associated with these distributions. To take into account the uncertainty due to the limited size of the learning set, bootstrap sampling [52] was used to generate confidence intervals for the model outputs. Comparison of the results with test data from a treatment plant located in Morocco [45, 46] shows that the optimal coagulant dose can be determined online, in a very satisfactory way, across various operating phases.
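The bootstrap confidence intervals mentioned above can be illustrated with a small stand-in model. This is a minimal sketch under stated assumptions: instead of the MLP, it fits a one-input linear model with an L2 penalty on the slope (the linear counterpart of weight decay), and it uses the percentile bootstrap; the function names and toy data are hypothetical, and the PCA input selection, Levenberg–Marquardt training, and Bayesian estimation of the regularization parameters are not reproduced.

```python
import math
import random
from typing import List, Sequence, Tuple


def fit_ridge_line(xs: Sequence[float], ys: Sequence[float], lam: float) -> Tuple[float, float]:
    """Fit y = a + b*x with a weight-decay (L2) penalty lam*b**2 on the slope only."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    denom = sxx + lam
    b = sxy / denom if denom > 0.0 else 0.0  # degenerate resample: all x identical and lam == 0
    a = y_mean - b * x_mean
    return a, b


def bootstrap_ci(
    xs: Sequence[float],
    ys: Sequence[float],
    x0: float,
    lam: float = 0.1,
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap interval for the model output at x0."""
    rng = random.Random(seed)
    n = len(xs)
    preds: List[float] = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample the learning set with replacement
        a, b = fit_ridge_line([xs[i] for i in idx], [ys[i] for i in idx], lam)
        preds.append(a + b * x0)  # refit, then predict at the query point
    preds.sort()
    lo = preds[int(math.floor(alpha / 2.0 * n_boot))]
    hi = preds[int(math.ceil((1.0 - alpha / 2.0) * n_boot)) - 1]
    return lo, hi
```

The same loop applies to a neural network: each bootstrap replicate retrains the model on a resampled learning set, and the spread of the replicate predictions gives the interval, which widens where training data are scarce.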

To ensure good monitoring of this process and contribute to its proper operation, it is necessary to exploit all available process information, such as measurements of raw water characteristics and their evolution (resulting, for example, from unforeseen abnormalities), as well as expert knowledge. For these reasons, we chose to monitor the behavior of this process using a neuro-fuzzy classification technique called LAMDA (Learning Algorithm for Multivariate Data Analysis) [53, 54], which aggregates this information to inform the operator of specific situations. The idea of the classification is to evaluate the significant system signals (raw water quality measurements plus the neural coagulant dose) in order to recognize the factors associated with each situation and to help the operator make decisions when a failure appears. This approach was a first application showing the usefulness of classification techniques for the monitoring and surveillance of this type of process. The final objective was to extend this monitoring to other treatment processes in order to detect a drift in operation as early as possible or to identify a failure in an upstream unit (**Figure 3**).

**Figure 3.** Hybrid system proposed for coagulation control and monitoring.

**3. Other propositions for water purification process control**

An expert system for a water purification plant that performs supervisory control of water quantity and automatic filter basin control is developed in [55]. The sand bed filters can be in four possible states: waiting for filtering, filtering, waiting for scouring, and scouring. The filter basins in a water purification system are usually divided into groups connected in parallel. Online data are gathered from distributed control systems throughout the water purification system. In [56], filter basin control is based on the control of filter scouring and of the number of filter basins in operation. Filter scouring occurs when the water flow falls below a preset minimum value. The number of filters in operation is controlled to match the plant processing flow to the total filtering flow. A different approach is presented in [56], where the proposed chlorination control system for water treatment is a double cascade PI loop that controls the hypochlorite dosed into the system by means of free chlorine measurements taken at two sampling points of the disinfection system. Denitrification of drinking water has also been proposed in several studies. In [57], SISO and MIMO robust variable structure controls for fixed bed bioreactors are developed. A SISO variable structure control is used to control the total concentration of nitrates and nitrites by changing either the inlet flow rate or the ethanol concentration. A MIMO variable structure control is needed to optimally regulate the ethanol concentration of drinking water. In [58], drinking water is also treated by a fixed bed bioreactor. A multi-input multi-output sliding control law for a distributed-parameter biofilter is designed to improve water quality by controlling the concentration of the harmful component at the bioreactor outlet and by optimizing the addition of the carbon source. However, to our knowledge, it is regrettable that no specific model based on deep neural networks has been applied to this type of process.

**4. Conclusions**

Water resources management practice around the world, including drinking water treatment, is challenged by serious problems. Climate change and land use change are increasingly recognized as having a major impact on hydrologic variables and therefore on the management of water resources. Admittedly, the profession has been slow to acknowledge these changes and that fundamentally new approaches will be required to address them. Evolutionary algorithms are becoming more prominent in the field of water treatment processes. Significant advantages of evolutionary algorithms include: (1) no need for an initial solution; (2) ease of application to nonlinear problems and complex systems; (3) production of acceptable results over longer time horizons; and (4) generation of several solutions that are very close to the optimum (giving added flexibility to a water manager). Special attention is given to evolutionary optimization by deep neural networks to predict and capture anomalies in the coagulation process, regarded as a complex and critical process. The use of deep neural networks for process modeling and control in drinking water treatment is currently on the rise and is considered a key area of research. With regard
