Machine Learning and Biometrics

In terms of hierarchy, the neural network structure of the VAE is composed of three main parts. The first part is the encoder, which encodes the signals from the input layer. The second part is the decoder, located on the right side of **Figure 2**. The third part is the sampling unit, located between the other two. The encoder and the decoder are similar to those of the traditional autoencoder, while the additional sampling unit is responsible for sampling from the latent variable space.

Another issue in training the structure is the loss function shown in formula (9), which is essentially the negative of L(Q) in formula (7). From the training point of view, the loss of a VAE comes from two sources. The first part comes from the neural network and measures the difference between the reconstructed data and the original input. This part encourages the decoder to learn to reconstruct the input; otherwise, its value grows and increases the total loss. The second part comes from the KL divergence, which indicates how close the encoder's distribution $Q(Z \mid X^{i})$ is to the distribution of the latent variables. This part acts as a regularizer, like that of the traditional autoencoder. It forces the encoder's distribution $Q(Z \mid X^{i})$ to be as close as possible to the latent variable distribution $P(Z)$ by minimizing the KL divergence between them. In other words, if the representations output by the encoder differ from the specified distribution, the regularizer term penalizes the loss function; otherwise, the penalty vanishes:

$$L_{VAE} = -\sum_{i=1}^{N}\Big(E_{Z \sim Q(Z \mid X^{i})}\big[\log(P(X^{i} \mid Z))\big] - KL\big(Q(Z \mid X^{i}) \,\|\, P(Z)\big)\Big). \tag{9}$$

The last point for the VAE is how to minimize the loss function of Eq. (9) on the neural networks, where algorithms based on gradient descent are commonly adopted. The first term in Eq. (9) is comparatively easy to compute: the expectation indicates the reconstruction difference, and we can calculate it as the mean squared error between the original input and the output of the decoder, as in traditional autoencoders. However, the second term, the KL divergence, is more difficult to compute directly, as $P(Z)$ and $P(X^{i}$ […] et al. [36], on the assumption that $Q(Z \mid X^{i})$ follows a normal distribution, $Q(Z \mid X^{i}) \sim \mathcal{N}(Z; \theta)$, where $\theta = \{\mu_{1}, \Sigma_{1}\}$ and $\mu_{1}$ and $\Sigma_{1}$ are the parameters of the mean and the variance, respectively. For simplicity, here we assume $P(Z) = \mathcal{N}(Z; 0, I)$, where $I$ is the identity matrix. This choice makes the computation of the KL divergence manageable, and we can compute it in closed form as:

$$KL\big(Q(Z \mid X^{i}) \,\|\, P(Z)\big) = \frac{1}{2}\Big(\log\frac{1}{|\Sigma_{1}|} - D + tr(\Sigma_{1}) + (\mu_{1})^{T}(\mu_{1})\Big). \tag{10}$$

$D$ is a constant that depends only on the dimensionality of the distribution.

Additionally, to train a VAE, attention must be paid to gradient descent when the error back-propagates through the sampling layer. However, we cannot differentiate the loss function with respect to the distribution $Q(Z \mid X^{i})$ directly, as sampling from the distribution is a non-continuous operation and has no gradient. To clarify the problem, suppose we could take the derivative of $L_{VAE}$ with respect to $Q(Z \mid X^{i})$; then we would get the following gradient expression:

$$\frac{\partial L_{VAE}}{\partial Q} = \log(P(X^{i} \ldots$$

**4. ECG preprocessing and enhancement**

In this section, we introduce our method for ECG preprocessing and enhancement. The task in this procedure is to split the ECG waves into segments according to the cardiac cycle [28] and then take the segments as data points for training our models. As described in Section 1, the QRS complex is responsible for the activities of ventricular depolarization and repolarization, and it has a morphologically higher amplitude and a sharper peak than other components such as the P-wave and T-wave. Therefore, it is much more convenient to detect and locate Q peaks (or R and S peaks) than any other component in these ECG segments. Algorithm 1 describes in detail how to split the ECG waveforms. The templates selected in Algorithm 1 are derived from the contours of most ECG R-wave peaks.

The critical step in Algorithm 1 is evaluating the similarity between the selected area on the ECG waveform and the given template. The mean squared error (MSE) is commonly adopted in ECG recognition applications. However, its main disadvantage is that aligning the selected area with the given template is time-consuming. For example, given two pictures containing the same curve, the MSE may be tiny if the template aligns extremely well, or very large if the curves do not overlap at all. Another reasonable approach, the correlation coefficient, is currently in use [21, 26]. Instead of directly computing the difference between the ECG waveform and the template as the MSE method does, it solves an optimization problem that minimizes the sum of the squares of the offsets of the selected ECG data points to the corresponding points on the template.
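The contrast between the two similarity measures can be illustrated with a minimal sketch. It assumes the Pearson correlation coefficient as the similarity score, which may differ in detail from the method of [21, 26]; the function names (`mse`, `correlation`, `best_match`) and the toy data are hypothetical and are not taken from Algorithm 1.

```python
import math

def mse(window, template):
    """Mean squared error between two equal-length sequences."""
    return sum((w - t) ** 2 for w, t in zip(window, template)) / len(template)

def correlation(window, template):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(template)
    mean_w = sum(window) / n
    mean_t = sum(template) / n
    cov = sum((w - mean_w) * (t - mean_t) for w, t in zip(window, template))
    var_w = sum((w - mean_w) ** 2 for w in window)
    var_t = sum((t - mean_t) ** 2 for t in template)
    if var_w == 0 or var_t == 0:
        return 0.0  # a flat window or template has no defined correlation
    return cov / math.sqrt(var_w * var_t)

def best_match(signal, template):
    """Slide the template over the signal and return (offset, score) of the best correlation."""
    m = len(template)
    scores = [correlation(signal[k:k + m], template)
              for k in range(len(signal) - m + 1)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]

# Toy R-peak-like template, and a signal holding a scaled and shifted copy of it.
template = [0.0, 1.0, 4.0, 1.0, 0.0]
signal = [0.5, 0.5, 0.5, 0.5, 2.5, 8.5, 2.5, 0.5, 0.5, 0.5]
offset, score = best_match(signal, template)
```

In this toy case the matching window has the same shape as the template but a different amplitude and baseline, so its MSE against the template stays large while its correlation is close to 1.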

We introduce a parameter hstep for the length of a segment of the ECG waveform. It is important to keep hstep within a proper range; otherwise, a segment may contain more than one R peak or none at all. To avoid this, hstep is chosen proportional to, and somewhat less than, the distance between two adjacent peaks, that is, $hstep \le \frac{\text{sampling rate}}{\text{heart rate}}$, with the heart rate expressed in beats per second. For instance, with a sampling rate of 250 Hz and a heart rate of 75 beats per minute (1.25 beats per second), hstep ≤ 200. Because the heart rate is not constant during sampling, the inequality gives only an estimate of this distance. For this reason, in all of our experiments, the distance is set empirically to the average of the previous three cardiac periods. The vertical search step can be initialized to a constant, as there is no variation in the vertical direction; we keep vstep equal to 1 in this chapter.
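The bound on hstep and the three-period average can be worked through in a short sketch. The helper names (`max_hstep`, `adaptive_distance`) and the R-peak positions are hypothetical and serve only to illustrate the arithmetic above.

```python
def max_hstep(sampling_rate_hz, heart_rate_bpm):
    """Upper bound on hstep in samples: sampling rate over heart rate in beats per second."""
    beats_per_second = heart_rate_bpm / 60.0
    return sampling_rate_hz / beats_per_second

def adaptive_distance(r_peaks, n_periods=3):
    """Average of the most recent n_periods peak-to-peak intervals, in samples."""
    intervals = [b - a for a, b in zip(r_peaks, r_peaks[1:])]
    recent = intervals[-n_periods:]
    if not recent:
        raise ValueError("need at least two R peaks")
    return sum(recent) / len(recent)

bound = max_hstep(250, 75)            # 250 / 1.25 = 200 samples

# Hypothetical R-peak sample indices; the last three intervals are 204, 198, 205.
peaks = [100, 298, 502, 700, 905]
distance = adaptive_distance(peaks)
```

Averaging over the last few periods lets the segment length follow slow changes in heart rate without reacting to a single irregular beat.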


**Figure 3** shows the ECG waveform (top picture) and the R-wave peak detection and location (bottom picture). The ECG data are taken from the American Heart Association (AHA) database on the PhysioNet website [24], which consists of 80 two-channel ECG recordings digitized at 250 Hz with 12-bit resolution over a 10-mV range. The recordings in the database are divided into eight classes according to the highest level of ventricular ectopy present.


Electrocardiogram Recognization Based on Variational AutoEncoder

http://dx.doi.org/10.5772/intechopen.76434


**5. Experimental results and discussion**

In this section, we evaluate the performance of the VAE and the other autoencoder variants described in Section 2.

**5.1. ECG signals for multi-classification**

To demonstrate the performance of our models on ECG signals, it is necessary to extract an intact ECG signal covering one cardiac period, which consists of features such as the P-wave, QRS complex, and T-wave as described in Section 4. Detecting and locating the P-wave then becomes the more critical step, as every cardiac period of an ECG signal starts at the P-wave. However, the amplitude of the P-wave is smaller than that of the QRS complex, and ECG signals contain many kinds of noise. These factors increase the difficulty of extracting ECG signals in a cardiac period.

Our solution to this problem relies on the fact that it is more feasible to locate R-peaks than to locate the start position of a P-wave. Instead of focusing on the cardiac period, we split each cardiac period into two semi-cardiac periods at the R-peak and then join two adjacent parts to form a new one-period ECG signal, which consists of the second part of the previous cardiac period and the first part of the next one. **Figure 4(a)** shows an example of an ECG signal composed of two adjacent semi-periods. Additionally, from the information point of view, no feature is lost in this separation.

The original ECG recordings from the ECG databases contain several hours of ECG data, and it is infeasible to train our models on these original data directly. To train our models well, 30,000 ECG signals are extracted in full from three different ECG databases.
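The R-peak-based splitting described in Section 5.1 can be sketched as follows. This is a minimal illustration under the assumption that R-peak sample indices are already available; the function name `rr_segments` and the toy data are hypothetical.

```python
def rr_segments(signal, r_peaks):
    """Cut the signal at consecutive R peaks.

    Each segment runs from one R peak up to (not including) the next, so it
    holds the second half of one cardiac period and the first half of the next.
    """
    return [signal[a:b] for a, b in zip(r_peaks, r_peaks[1:])]

# Toy data: sample values equal their own index so the cuts are easy to read.
signal = list(range(20))
r_peaks = [3, 9, 16]          # hypothetical R-peak sample indices

segments = rr_segments(signal, r_peaks)

# Joining the segments gives back every sample between the first and last peak,
# which illustrates that the split itself discards nothing in that range.
rebuilt = [x for seg in segments for x in seg]
```

Only the samples before the first R peak and from the last R peak onward fall outside any segment in this sketch.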

18: **return** ECG data array *v*, R wave peak array *ecg_pos*;

**Figure 2.** Neural network structure of the VAE. It consists of three parts: the encoder, the decoder, and the sampling unit. The encoder (indicated by number 2) and the decoder (indicated by number 6) are both fully connected multilayer neural networks. The sampling unit consists of the mean generator (indicated by number 3), the standard deviation generator (indicated by number 4), and the latent vector generator (indicated by number 5). The structure of the sampling unit rests on the assumption that Z ∼ Ν(μ, σ<sup>2</sup>).
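The sampling unit in Figure 2 can be sketched in a few lines. This is an illustration under stated assumptions, not the chapter's implementation: it assumes a diagonal covariance, so that the latent vector generator can draw each component as z = μ + σ·ε with ε ∼ Ν(0, 1), and it evaluates the closed-form KL divergence of Eq. (10) for that diagonal case. The function names and the example values of μ and σ are hypothetical.

```python
import math
import random

def sample_latent(mu, sigma, rng):
    """Latent vector generator: z = mu + sigma * eps, with eps drawn from N(0, 1)."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]

def kl_to_standard_normal(mu, sigma):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ) in closed form, as in Eq. (10)."""
    d = len(mu)                                     # dimensionality D
    log_det = sum(math.log(s * s) for s in sigma)   # log|Sigma| for a diagonal Sigma
    trace = sum(s * s for s in sigma)               # tr(Sigma)
    mu_sq = sum(m * m for m in mu)                  # mu^T mu
    return 0.5 * (-log_det - d + trace + mu_sq)

rng = random.Random(0)                  # fixed seed so the sketch is repeatable
mu, sigma = [0.5, -1.0], [1.0, 0.5]     # stand-ins for the two generators' outputs

z = sample_latent(mu, sigma, rng)
kl = kl_to_standard_normal(mu, sigma)
```

When μ is zero and σ is one in every dimension, the KL term is zero, which matches the statement that the penalty vanishes once the encoder's distribution agrees with P(Z).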

**Figure 3.** ECG waveform and R-wave peak location adopted from AHA database (top). The bottom picture shows the result of R peaks detection and location for the ECG waveform in the top picture.

