Reactive Distillation Modeling Using Artificial Neural Networks

*Francisco J. Sanchez-Ruiz*

#### **Abstract**

The use of artificial intelligence techniques in process design has generated a line of research of growing interest in chemical engineering, especially in the so-called separation processes. This chapter presents the combination of artificial neural networks (ANN) and dynamic fuzzy artificial neural networks (DFANN), applied to the calculation of thermodynamic properties and the design of reactive distillation columns. ANN and DFANN are mathematical models that resemble the behavior of the human brain. The proposed models do not require linearization of the thermodynamic equations or of the mass- and energy-transfer models, which provides an approximate yet tight solution compared with robust reactive distillation column design models. Although such models must generally be trained against a dimensionless model, none is required for the design of a reactive column, and the results obtained compare favorably with those calculated by a commercial simulator such as Aspen Plus®. It is worth mentioning that this chapter shows only the application of the neural network models; the full simulation and implementation are not presented, mainly because this is a specialized area whose explanation requires more than a single chapter. It is shown that a neural network with 16 inputs, 2 hidden layers, and 16 outputs generates a calculation system as robust as the thermodynamic models contained in the same commercial simulator, and a characteristic of the network presented is that overlearning is minimized, being low by the very nature of the network. In addition, it is shown to be a dynamic model whose time-dependent predictions agree to within 96–98% with commercial simulator models such as Aspen Plus®. The DFANN is therefore a viable alternative for implementation in separation processes, although one disadvantage of these techniques is the experience required of the programmer, both in artificial intelligence and in separation processes.

**Keywords:** reactive distillation, neural networks, dynamic fuzzy neural network, thermodynamic properties, column design, azeotropic mixture

#### **1. Introduction**

Reactive distillation is a separation process implemented for the separation of complex mixtures because it combines a chemical reaction and a separation in a single piece of equipment; that is, one or more stages of the separation column function as a chemical reactor in which a catalyzed or uncatalyzed reaction is carried out. This type of process is implemented for mixtures that present azeotropes, or components with boiling points so close that their separation is complex or requires an excess of energy; on some occasions the process is implemented for the purification of substances through a thermally integrated process. The reactive distillation process proceeds by mass transfer both in the liquid phase and in the vapor phase, or on the surface of the catalyst [1].

The calculation and design of a reactive distillation system introduces a reaction term into the stage mass balances, which turns them into reactive stages; the balance $M\_{i,j}$ of Eq. (1) thus becomes:

$$M\_{i,j} = n\_{L,j-1}x\_{i,j-1} + n\_{V,j+1}y\_{i,j+1} + n\_{F,j}x\_{F,j} - \left(n\_{L,j} + n\_{SL,j}\right)x\_{i,j} - \left(n\_{V,j} + n\_{SV,j}\right)y\_{i,j} - \left(V\_{L,H}\right)\_j \sum\_{n=1}^{n\_{Rx}} \nu\_{i,n} r\_{j,n} = 0 \tag{1}$$

where $(V\_{L,H})\_j$ is the volumetric liquid holdup at stage $j$, $\nu\_{i,n}$ is the stoichiometric coefficient of component $i$ in reaction $n$, $r\_{j,n}$ is the rate of reaction $n$ on stage $j$, and $n\_{Rx}$ is the number of chemical reactions.

The modification of the stage energy balance lies in the definition of $Q\_j$ in Eq. (2), where the heat of reaction is included.

$$H\_j = n\_{L,j-1}h\_{L,j-1} + n\_{V,j+1}h\_{V,j+1} + n\_{F,j}h\_{F,j} - \left(n\_{L,j} + n\_{SL,j}\right)h\_{L,j} - \left(n\_{V,j} + n\_{SV,j}\right)h\_{V,j} - Q\_j = 0 \tag{2}$$

In these equations, $n$ represents mole flow, $x$ mole fraction in the liquid phase, $y$ mole fraction in the vapor phase, $K$ the equilibrium constant, $h$ molar enthalpy, and $Q$ heat flow. The subscripts $i$, $j$, $L$, $V$, $SV$, $SL$, $F$, and $N$ represent a component, a stage, liquid, vapor, side vapor, side liquid, feed, and the last stage, respectively [1].
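For illustration, the residuals of Eqs. (1) and (2) can be evaluated numerically for a single reactive stage. The following is a minimal sketch in Python, assuming the nomenclature above; the function and variable names are illustrative and not taken from the chapter's implementation.

```python
import numpy as np

def stage_residuals(nL_prev, x_prev, nV_next, y_next, nF, xF,
                    nL, nSL, x, nV, nSV, y, VLH, nu, r,
                    hL_prev, hV_next, hF, hL, hV, Q):
    """Residuals of Eqs. (1)-(2) for one reactive stage j.

    Component quantities (x, y, ...) are vectors indexed by component i,
    nu is the (n_components x n_Rx) stoichiometric matrix, and r holds
    the n_Rx reaction rates on the stage.
    """
    # Eq. (1): component mass balances with the reaction (holdup) term
    M = (nL_prev * x_prev + nV_next * y_next + nF * xF
         - (nL + nSL) * x - (nV + nSV) * y
         - VLH * (nu @ r))
    # Eq. (2): stage energy balance including the heat duty Q_j
    H = (nL_prev * hL_prev + nV_next * hV_next + nF * hF
         - (nL + nSL) * hL - (nV + nSV) * hV - Q)
    return M, H
```

At convergence of the column model, both residuals vanish for every stage.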

A first approximation is carried out using a mathematical model based on steady-state equations; these equations are then taken as the basis for dynamic-state modeling, which is necessary for the implementation of dynamic fuzzy artificial neural networks. The models presented in this chapter are therefore those implemented in artificial intelligence systems.

Dynamic fuzzy neural networks have been implemented to solve nonlinear mathematical models; in process engineering, they have been implemented in temperature control systems. This chapter shows the use of artificial intelligence techniques to calculate a reactive column in which azeotropes are present in a ternary mixture.

#### **2. Artificial neural networks**

Artificial neural networks arise from the analogy made between the human brain and computer processing, starting from the first analyses of the human brain carried out by Ramón y Cajal [2]. This analogy extends from the neural structure to the processing capacity.

Artificial neural networks are mathematical models that attempt to mimic the capabilities and characteristics of their biological counterparts. Neural networks are made up of simple calculation elements, all interconnected with a certain topology or structure; the simplest such element is the neuron, called a perceptron. The basic model of a neuron is formed by the following elements (**Figure 1**) [3, 4]:


**Figure 1.** *Elementary neuron.*


$$y\_i = \phi\left(\sum\_{j=1}^{n} w\_{ij} s\_j + w\_{i0}\right) \tag{3}$$

where $n$ indicates the number of inputs to neuron $i$ and $\phi$ denotes the excitation function [5, 6]. The argument of the excitation function is a linear combination of the inputs of the neuron. If we consider the set of inputs and the weights of neuron $i$ as vectors of dimension $(n+1)$, the expression can be written as follows:

$$y\_i = \phi\left(\mathbf{w}\_i^T \mathbf{s}\right) \tag{4}$$

where

$$\mathbf{s} = [-1, s\_1, s\_2, \ldots, s\_n]^T \tag{5}$$

$$\mathbf{w}\_i = [w\_{i0}, w\_{i1}, \ldots, w\_{in}]^T \tag{6}$$
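As a minimal sketch of Eqs. (3)–(6), the output of a single neuron can be computed as follows, here assuming a logistic excitation function; the names are illustrative.

```python
import numpy as np

def neuron_output(w, s, phi=lambda v: 1.0 / (1.0 + np.exp(-v))):
    """Eq. (4): y_i = phi(w_i^T s), with w = [w_i0, w_i1, ..., w_in]
    as in Eq. (6) and the bias absorbed by prepending -1 to the
    inputs, as in Eq. (5)."""
    s_aug = np.concatenate(([-1.0], np.asarray(s, dtype=float)))
    return phi(w @ s_aug)
```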

Neural networks are classified into static and dynamic networks. The former have a broader field of application, mainly because they do not change with time; dynamic networks are applied more specifically to problems that do change with time [7, 8].

Static and dynamic neural networks share similar mathematical structures, training methods, and architectural principles. The most commonly used networks are the so-called multilayer neural networks, mainly because they resemble structures of the human brain; they can be networks with forward propagation but also networks with backward propagation, and the selection between them depends on the type of system studied and the application of the network [9, 10]. For predicting breakthrough curves in adsorption processes, multilayer neural networks with forward propagation are generally used, because a backward propagation of information as a means of comparison is not necessary; the latter are most commonly applied in control processes [11–13].

#### **2.1 Multilayer networks**

A multilayer network has a defined structure: it consists of an input layer, hidden layers, and an output layer (**Figure 2**). Defining the structure of a multilayer neural network carefully avoids problems with the training of the network, which generally result in prediction problems such as those of the breakthrough curve in adsorption processes. Establishing the architecture of a neural network is mainly a matter of trial and error, although an expert programmer can reduce this significantly because mechanisms to determine the architecture are already established. Hecht-Nielsen (1989) [14], based on Kolmogorov's theorem [15–17], stated that "the number of neurons in the hidden layer does not need to be greater than twice the number of inputs"; using this theorem, the approximation for the number of neurons in the hidden layer is established in Eq. (7) [18, 19].

$$h = \left(\frac{2}{3}\right)(n+m) \tag{7}$$

where $h$ represents the number of neurons in the hidden layer, $n$ the number of inputs, and $m$ the number of hidden layers. Using this rule, a stopping criterion is established: the number of neurons in the hidden layer will never need to be more than twice the number of inputs, $h < 2n$. For a multilayer network with a single hidden layer, it is recommended that the number of neurons be 2/3 of the number of inputs [20, 21].
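A small helper implementing the sizing heuristic of Eq. (7), together with the stopping bound $h < 2n$, might look as follows; it is purely illustrative.

```python
def hidden_neurons(n_inputs: int, m_hidden_layers: int) -> int:
    """Eq. (7) capped by the Kolmogorov-based bound h < 2n."""
    h = round((2.0 / 3.0) * (n_inputs + m_hidden_layers))
    return min(h, 2 * n_inputs - 1)
```

For the 16-input, two-hidden-layer network described in the abstract, this heuristic gives $h = 12$ neurons in the hidden layer.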

The next step in structuring a neural network is establishing the excitation functions; these functions propagate the information and are used in training the network. The information introduced into the network is weighted by synaptic weights, alluding to the synapses of biological neurons [22, 23]. The excitation functions are of different types, and their choice depends on the type of process to be modeled; an excitation function is found in each of the neurons, in the hidden layers as well as in the inputs and outputs. The most commonly used functions are the tangential sigmoid, Eq. (8), the logarithmic sigmoid, Eq. (9), and the radial basis functions, Eq. (10). This last class is among the more complex and is generally used for dynamic systems; it can be used in non-dynamic processes, but this increases the computing time and information processing, mainly because it becomes more specific in its application, Eqs. (11)–(16) [24–27].

**Figure 2.** *Multilayer network.*


$$\varphi = \frac{e^{w\_i} - e^{-w\_i}}{e^{w\_i} + e^{-w\_i}} \tag{8}$$

$$\varphi = \frac{1}{1 + e^{-w\_i}} \tag{9}$$

$$\varphi = \sum\_{i=1}^{N} w\_i \Phi\left(\left\|w - w\_{ci}\right\|\right) \tag{10}$$

Gaussian function

$$\Phi(w) = e^{-w\_i^2} \tag{11}$$

Multi-quadratic function

$$\Phi(w) = \sqrt{1 + w\_i^2} \tag{12}$$

Reciprocal multi-quadratic function

$$\Phi(w) = \frac{1}{\sqrt{1 + w\_i^2}}\tag{13}$$

Polyharmonic function

$$\Phi(w) = w\_i^k \quad k = 1, 3, 5, \dots \tag{14}$$

$$\Phi(w) = w\_i^k \ln \left( w\_i \right) \quad k = 2, 4, 6, \dots \tag{15}$$

Slim quadratic function

$$\Phi(w) = w\_i^2 \tag{16}$$
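The excitation functions of Eqs. (10)–(16) can be collected into a short module; the following minimal sketch uses the Gaussian of Eq. (11) with its negative exponent, and all names are illustrative.

```python
import numpy as np

def gaussian(w):
    return np.exp(-w ** 2)                       # Eq. (11)

def multiquadratic(w):
    return np.sqrt(1.0 + w ** 2)                 # Eq. (12)

def reciprocal_multiquadratic(w):
    return 1.0 / np.sqrt(1.0 + w ** 2)           # Eq. (13)

def polyharmonic(w, k):
    """Eq. (14) for odd k; Eq. (15) for even k."""
    return w ** k if k % 2 == 1 else w ** k * np.log(w)

def slim_quadratic(w):
    return w ** 2                                # Eq. (16)

def rbf_output(x, centers, weights, phi=gaussian):
    """Eq. (10): weighted sum of radial functions of the distances
    between the input x and the centers w_ci."""
    return sum(wi * phi(np.linalg.norm(x - ci))
               for wi, ci in zip(weights, centers))
```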

Once the excitation function, also called the transfer function, has been selected, the neural network is trained. There are different types of training and, as with the selection of the architecture, training is also selected by trial and error, although an experienced programmer can initiate the selection with a training suited to a certain type of neural structure. The most commonly used training is backpropagation (BP) [28, 29]; other widely used types of training are Levenberg-Marquardt (LM) and Broyden-Fletcher-Goldfarb-Shanno (BFGS). Backpropagation training is the basis for all the others, and for that reason only this type of training is discussed [30, 31].

The error signal at the output of neuron j in iteration k is defined by:

$$e\_j(k) = d\_j(k) - y\_j(k) \tag{17}$$

The instantaneous value of the error is defined for neuron $j$; the sum of the squared instantaneous errors over the output neurons is formulated as:

$$\varepsilon(k) = \frac{1}{2} \sum\_{j \in h\_{\text{out}}} e\_j^2(k) \tag{18}$$

where $h\_{\text{out}}$ is the set of output neurons, $h\_{\text{out}} = \{1, 2, \ldots, l\}$. The average error $\varepsilon\_{av}$ is obtained by averaging the instantaneous errors over the $N$ training pairs.


$$\varepsilon\_{av} = \frac{1}{N} \sum\_{k=1}^{N} \varepsilon(k) \tag{19}$$

The objective is to minimize $\varepsilon\_{av}$ with respect to the weights, for which the weight correction $\Delta w\_{ji}(k)$ must be calculated from the gradient:

$$\frac{\partial \varepsilon(k)}{\partial w\_{ji}(k)}\tag{20}$$

$$\frac{\partial \varepsilon(k)}{\partial w\_{ji}(k)} = \frac{\partial \varepsilon(k)}{\partial e\_j(k)} \frac{\partial e\_j(k)}{\partial y\_j(k)} \frac{\partial y\_j(k)}{\partial v\_j(k)} \frac{\partial v\_j(k)}{\partial w\_{ji}(k)} \tag{21}$$

$$v\_j(k) = \sum\_{i=0}^{p} w\_{ji}(k) y\_i(k) \tag{22}$$

$$y\_j(k) = \varphi\left(v\_j(k)\right) \tag{23}$$

The components needed to evaluate this gradient are defined as follows.

$$\frac{\partial \varepsilon(k)}{\partial e\_j(k)} = e\_j(k) \tag{24}$$

$$\frac{\partial e\_j(k)}{\partial y\_j(k)} = -1 \tag{25}$$

$$\frac{\partial y\_j(k)}{\partial v\_j(k)} = \varphi'\_j\left(v\_j(k)\right) \tag{26}$$

$$\frac{\partial v\_j(k)}{\partial w\_{ji}(k)} = y\_i(k) \tag{27}$$

The gradient of the error is then determined with Eq. (28).

$$\frac{\partial \varepsilon(k)}{\partial w\_{ji}(k)} = -e\_j(k)\,\varphi'\_j\left(v\_j(k)\right)y\_i(k) \tag{28}$$
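To make the chain of Eqs. (17)–(28) concrete, the following minimal sketch performs one gradient-descent update for a single neuron with a logistic excitation function; the learning rate and all names are illustrative assumptions.

```python
import numpy as np

def logistic(v):
    return 1.0 / (1.0 + np.exp(-v))

def backprop_step(w, s, d, eta=0.1):
    """One steepest-descent update for neuron j on the pair (s, d)."""
    v = w @ s                      # Eq. (22): linear combination of inputs
    y = logistic(v)                # Eq. (23): neuron output
    e = d - y                      # Eq. (17): error signal
    eps = 0.5 * e ** 2             # Eq. (18) for a single output neuron
    dphi = y * (1.0 - y)           # phi'(v) of the logistic function
    grad = -e * dphi * s           # Eq. (28): gradient of eps w.r.t. w
    return w - eta * grad, eps     # weight correction Delta w_ji
```

Iterating this update over the $N$ training pairs and averaging the instantaneous errors reproduces the minimization of $\varepsilon\_{av}$ in Eq. (19).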

#### **2.2 Dynamic fuzzy artificial neural network (DFANN)**

The DFANNs use an excitation function based on an asymmetric radial-type function, which implies that the system behaves like a Takagi-Sugeno (T-S) model with a characteristic radial-function bias. For the inputs of the fuzzy neural network it is necessary to establish limits within a known interval; when this type of network is applied to the determination of properties, the inputs must be defined within known ranges to avoid overlearning of the artificial neural network. The structure of the DFANN is shown in **Figure 3**; it is similar to the traditional models of artificial neural networks, with the difference that the synaptic weights are propagated through the radial basis excitation function, which can be biased or unbiased. The structure is defined below [32]:

Layer 1: Each node represents an input linguistic variable.

Layer 2: Each node represents a membership function (MF), which takes the form of a Gaussian function, Eq. (29).


$$MF\_{ij} = \exp\left[-\frac{\left(x\_i - c\_{ij}\right)^2}{\sigma\_j^2}\right] \quad i = 1, \ldots, r \quad \text{and} \quad j = 1, \ldots, u \tag{29}$$

where $MF\_{ij}$ is the $j$th membership function of $x\_i$, $c\_{ij}$ is the center of the $j$th Gaussian membership function of $x\_i$, $\sigma\_j$ is the width of the $j$th Gaussian membership function of $x\_i$, $r$ is the number of input variables, and $u$ is the number of membership functions [32].

Layer 3: Each node represents a possible IF-part of a fuzzy rule. For the $j$th rule $R\_j$, its output is:

$$OR\_j = \exp\left[-\frac{\sum\_{i=1}^{r} \left(x\_i - c\_{ij}\right)^2}{\sigma\_j^2}\right] \quad j = 1, \ldots, u \tag{30}$$

$$OR\_j = \exp\left[-\frac{\left\|X - \mathbf{C}\_j\right\|^2}{\sigma\_j^2}\right] \tag{31}$$

where $X = (x\_1, \ldots, x\_r)$ and $C\_j$ is the center of the $j$th radial basis function (RBF) unit.

Layer 4: The nodes in this layer are N (normalized) nodes. The number of N nodes is equal to that of layer 3, and the output of $N\_j$ is:

$$ORN\_j = \frac{OR\_j}{\sum\_{k=1}^{u} OR\_k} = \frac{\exp\left[-\frac{||X - C\_j||^2}{\sigma\_j^2}\right]}{\sum\_{k=1}^{u} \exp\left[-\frac{||X - C\_k||^2}{\sigma\_k^2}\right]}\tag{32}$$

Layer 5: Each node in this layer represents an output variable, computed as the weighted sum of the incoming signals:

$$y(x) = \sum\_{k=1}^{u} ORN\_k\, w\_{2k} = \frac{\sum\_{k=1}^{u} w\_{2k} \exp\left[-\frac{\|X - C\_k\|^2}{\sigma\_k^2}\right]}{\sum\_{k=1}^{u} \exp\left[-\frac{\|X - C\_k\|^2}{\sigma\_k^2}\right]} \tag{33}$$

**Figure 3.** *Dynamic fuzzy artificial neural network (DFANN).*


where $y$ is the value of an output variable and $w\_{2k}$ is the connection weight of each rule.

For the TSK (Takagi, Sugeno, and Kang) model:

$$w\_{2k} = k\_{j0} + k\_{j1}x\_1 + \cdots + k\_{jr}x\_r \quad j = 1, 2, \ldots, u \tag{34}$$
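A minimal sketch of the five-layer DFANN forward pass, Eqs. (29)–(34), with first-order TSK consequents, is shown below; the array shapes and names are illustrative assumptions.

```python
import numpy as np

def dfann_output(x, centers, widths, K):
    """x: inputs, shape (r,); centers: RBF centers C_j, shape (u, r);
    widths: Gaussian widths sigma_j, shape (u,);
    K: TSK coefficients [k_j0, k_j1, ..., k_jr] per rule, shape (u, r+1)."""
    # Layers 1-3: rule firing strengths, Eq. (31)
    OR = np.exp(-np.sum((x - centers) ** 2, axis=1) / widths ** 2)
    # Layer 4: normalization, Eq. (32)
    ORN = OR / OR.sum()
    # TSK consequents, Eq. (34)
    w2 = K @ np.concatenate(([1.0], x))
    # Layer 5: weighted sum of incoming signals, Eq. (33)
    return ORN @ w2
```

Consistent with the note at the start of this section, the inputs $x$ would first be restricted to their known ranges before evaluation, to avoid overlearning of the network.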
