
*Biometric Authentication Based on Electrocardiogram DOI: http://dx.doi.org/10.5772/intechopen.91172*


*Biometric Systems*

Previously, we investigated factors that influence the accuracy of biometric identification using an ECG. We have shown that the quality of the electrocardiograph affects identification accuracy: when ECGs were classified using Gaussian mixture models, the recognition accuracy for subjects from the ECG-ID database was 0.66, while for the PTB database it was 0.8 [14].

### **7. Signal preprocessing**

Signal preprocessing is carried out to reduce noise, reduce the data dimension, find R-peaks, and cut the ECG into cardiocycles; the cardiocycles are then usually synchronized by their R-peaks. Noise is usually removed using a low-pass filter whose cutoff frequency is selected experimentally. There is still no consensus on how best to find R-peaks, because cardiocycles vary considerably from person to person. **Figure 4** shows a digitized electrocardiogram (the sample rate is 1000 Hz).

**Figure 4.** *The original digitized electrocardiogram has a high frequency.*

To reduce noise, we can use a low-pass filter (**Figure 5**).

**Figure 5.** *Using the low-pass filter, the envelope of the cardiac signal is extracted.*

After noise removal, R-peaks are found, the ECG is sliced into cardiocycles, and the cardiocycles are synchronized by their R-peaks (**Figure 6**).

**Figure 6.** *R-peaks are then detected in the ECG and used to cut the signal into cardiocycles, after which the cardiocycles are synchronized by R-peak.*

There are several popular software libraries for ECG preprocessing. Among them, the Python libraries wfdb [15] and biosppy [16] are very popular (**Figures 7** and **8**).

**Figure 7.** *Wfdb library example.*

**Figure 8.** *Example of using the biosppy library.*
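The preprocessing steps described in this section can be sketched in plain Python without any ECG library. The sketch below is illustrative only and is not the authors' implementation: it assumes a centred moving average as the low-pass filter, a fixed amplitude threshold with a refractory period for R-peak detection, and a fixed window around each R-peak for cutting cardiocycles. The function names, the synthetic test signal, and all numeric parameters (filter width, threshold, window sizes) are assumptions chosen for the example.

```python
import math


def moving_average(signal, width):
    """Low-pass filter: centred moving average over `width` samples."""
    half = width // 2
    smoothed = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))
    return smoothed


def find_r_peaks(signal, threshold, refractory):
    """Indices of local maxima above `threshold`, at least `refractory` samples apart."""
    peaks = []
    for i in range(1, len(signal) - 1):
        is_local_max = signal[i - 1] < signal[i] >= signal[i + 1]
        if signal[i] < threshold or not is_local_max:
            continue
        if peaks and i - peaks[-1] < refractory:
            # Two candidates inside one refractory period: keep the taller one.
            if signal[i] > signal[peaks[-1]]:
                peaks[-1] = i
        else:
            peaks.append(i)
    return peaks


def cut_cardiocycles(signal, peaks, before, after):
    """Cut fixed-length windows so that every R-peak sits at index `before`."""
    return [signal[p - before:p + after]
            for p in peaks
            if p - before >= 0 and p + after <= len(signal)]


# Synthetic stand-in for an ECG sampled at 1000 Hz (hypothetical data):
# three narrow "R-waves" plus 200 Hz interference.
FS = 1000
r_positions = [500, 1300, 2100]
ecg = [sum(math.exp(-0.5 * ((i - c) / 10.0) ** 2) for c in r_positions)
       + 0.05 * math.sin(2 * math.pi * 200 * i / FS)
       for i in range(3000)]

filtered = moving_average(ecg, width=5)  # 5 samples = one period of 200 Hz
peaks = find_r_peaks(filtered, threshold=0.5, refractory=200)
cycles = cut_cardiocycles(filtered, peaks, before=200, after=400)
```

With these assumed window sizes, every cardiocycle has the same length and its R-peak always falls at index `before`, so the cycles are aligned by construction. On real recordings a fixed threshold is fragile, and a library detector is likely to be more robust.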

### **8. Biometric features extraction**

There are several opinions as to which ECG features are best used for biometric identification. Some authors propose using the geometric characteristics of the cardiocycle, such as the amplitude and time characteristics of the cardiocycle peaks.

**Figure 9.**

*Amplitude and time characteristics of P, Q, S, T peaks. Cardiocycles are synchronized in amplitude and time of onset of R-peak.*

**Figure 9** shows a feature cloud formed by the amplitude and time characteristics of the P, Q, S, and T peaks of the cardiac cycles.
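One possible way to obtain such geometric features from a single R-aligned cardiocycle is sketched below. It is an assumption-laden illustration, not the procedure used in the cited work: Q and S are taken as the lowest points just before and after the R-peak, P and T as the highest points outside them, and the search window widths (`q_win`, `s_win`, `p_win`, `t_win`, in samples) as well as the synthetic cardiocycle are hypothetical.

```python
import math


def pqst_features(cycle, r_index, fs, q_win=60, s_win=80, p_win=200, t_win=400):
    """Return times (s, relative to R) and amplitudes of the P, Q, S, T extrema."""
    def arg_extreme(lo, hi, pick):
        lo, hi = max(lo, 0), min(hi, len(cycle))
        return pick(range(lo, hi), key=lambda i: cycle[i])

    q = arg_extreme(r_index - q_win, r_index, min)          # lowest point before R
    s = arg_extreme(r_index + 1, r_index + s_win + 1, min)  # lowest point after R
    p = arg_extreme(r_index - p_win, q, max)                # highest point before Q
    t = arg_extreme(s + 1, r_index + t_win + 1, max)        # highest point after S

    features = {}
    for name, idx in (("P", p), ("Q", q), ("S", s), ("T", t)):
        features[name + "_time"] = (idx - r_index) / fs
        features[name + "_amp"] = cycle[idx]
    return features


# Hypothetical cardiocycle at 1000 Hz built from Gaussian bumps:
# (offset from R in samples, amplitude, width in samples).
FS = 1000
R_INDEX = 200
waves = [(-160, 0.15, 20.0), (-30, -0.10, 5.0), (0, 1.00, 8.0),
         (35, -0.20, 6.0), (250, 0.30, 40.0)]
cycle = [sum(a * math.exp(-0.5 * ((i - R_INDEX - c) / w) ** 2) for c, a, w in waves)
         for i in range(600)]

features = pqst_features(cycle, R_INDEX, FS)
```

The result is a dictionary of eight values (four times and four amplitudes), which corresponds to the eight analytical features discussed in this chapter.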

Other authors suggest working with the frequency characteristics of the signal. For example, biometric features can be obtained using a discrete wavelet transform. Previously, we explored Haar wavelets, Daubechies wavelets (db1 to db38), Symlets (sym2 to sym20), Coiflets (coif1 to coif17), biorthogonal wavelets (bior1.1 to bior6.8), reverse biorthogonal wavelets (rbio1.1 to rbio6.8), and the discrete Meyer wavelet (FIR approximation) [17], and found that Haar, Daubechies, and Symlet wavelets are best suited for biometric identification. We have also shown that good results can be obtained if the entire cardiocycle is used as the biometric feature vector [18]. The number of features in this case depends on the sampling frequency of the signal: cardiocycles consist of 600 points for the PTB database (sampling rate 1000 Hz), 150 points for the European ST-T Database (250 Hz), and 153 points for the St. Petersburg Institute of Cardiological Technics 12-lead Arrhythmia Database (257 Hz), that is, roughly 0.6 s of signal in each case.
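To make the wavelet idea concrete, the following minimal sketch implements the Haar discrete wavelet transform in plain Python and flattens a multi-level decomposition into a feature vector. It is an illustration under stated assumptions rather than the pipeline of the cited studies: the function names, the number of levels, and the sine wave standing in for a 600-point cardiocycle are all hypothetical.

```python
import math


def haar_dwt(signal):
    """One level of the Haar DWT: returns (approximation, detail) coefficients."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    root2 = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / root2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / root2 for i in range(0, len(signal), 2)]
    return approx, detail


def haar_features(signal, levels):
    """Multi-level Haar decomposition, flattened to [cA_n, cD_n, ..., cD_1]."""
    details = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        details.append(detail)
    features = list(approx)
    for detail in reversed(details):
        features.extend(detail)
    return features


# Hypothetical 600-point cardiocycle (one period of a sine wave as a stand-in).
cycle = [math.sin(2 * math.pi * i / 600) for i in range(600)]
features = haar_features(cycle, levels=3)
```

Because the Haar transform is orthonormal, the coefficient vector has the same length and the same energy as the input cycle; in practice one would typically keep only a subset of the coefficients as features.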

### **9. Assessment of the informative value of biometric features and the selection of the most informative features**

Experience shows that not all biometric features are equally informative. Removing uninformative features can significantly increase the speed of data processing. We investigated the informativeness of analytical features (amplitude and time characteristics of the P, Q, S, and T peaks) obtained from 51 subjects from the PTB database [19]. To do this, we determined the significance of differences between the point clouds of the P, Q, S, and T regions of the subjects' electrocardiograms using Student's t-test at a significance level of 0.05 (95% confidence). Matrices of significance of differences are given below (**Figure 10**).

It can be seen from the figure that the point clouds overlap much less in the S and T regions. When all eight features are used together, no overlap is observed (**Figure 11**).

Conclusion: the most informative analytical features are the amplitude values in the S and T regions.
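The pairwise comparison behind such significance matrices can be sketched as follows. This is a simplified illustration and not the authors' code: it assumes a single feature per subject, the classical two-sample Student's t statistic with pooled variance, and a critical value looked up in a t-table (2.101 for a two-sided test at the 0.05 level with 18 degrees of freedom). The subject data are invented for the example.

```python
import math


def student_t(a, b):
    """Two-sample Student's t statistic with pooled variance; returns (t, df)."""
    na, nb = len(a), len(b)
    mean_a, mean_b = sum(a) / na, sum(b) / nb
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((x - mean_b) ** 2 for x in b)
    df = na + nb - 2
    pooled_var = (ss_a + ss_b) / df
    t = (mean_a - mean_b) / math.sqrt(pooled_var * (1.0 / na + 1.0 / nb))
    return t, df


def significance_matrix(samples, t_critical):
    """Symmetric matrix: True where two subjects differ significantly."""
    n = len(samples)
    matrix = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            t, _ = student_t(samples[i], samples[j])
            matrix[i][j] = matrix[j][i] = abs(t) > t_critical
    return matrix


# Hypothetical S-peak amplitudes (mV) for three subjects, ten cardiocycles each.
subject_a = [-0.20, -0.22, -0.19, -0.21, -0.20, -0.23, -0.18, -0.21, -0.20, -0.22]
subject_b = [-0.21, -0.20, -0.22, -0.19, -0.21, -0.20, -0.22, -0.19, -0.23, -0.20]
subject_c = [x - 0.15 for x in subject_a]

T_CRITICAL = 2.101  # two-sided, alpha = 0.05, df = 10 + 10 - 2 = 18
matrix = significance_matrix([subject_a, subject_b, subject_c], T_CRITICAL)
```

In this toy example subjects A and B are indistinguishable by the feature, while subject C differs significantly from both, which is the kind of pattern the bright and dark areas of the matrices encode.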


**Figure 10.**


*Matrices of significance of differences for 8 features across 51 subjects (P value < 0.05). Note: in the bright areas of the figures the differences are significant; in the dark areas they are not significant. The figures are symmetrical with respect to the diagonals passing through the upper left and lower right corners. The abscissa and ordinate axes show the numbers of subjects (1–51).*

**Figure 11.**

*Matrix of significance of differences when all eight features are used together. Note: the color shows the number of features with significant differences, from eight (the lightest area) to zero (the darkest area). The pattern is symmetrical with respect to the diagonal passing through the upper left and lower right corners. The abscissa and ordinate axes show the numbers of subjects (1–51).*

### **10. Feature classification**

The biometric identification of a person is a classification problem. To solve it, we consider algorithms from some finite set and choose the one that gives the smallest prediction error. Let us introduce some notation. Let *X* be a space of objects and *Y* a set of answers.

$$X^l = (x_i, y_i)_{i=1}^{l} \tag{1}$$

is a training set, where *l* is the sample size and

$$y_i = y^*(x_i), \tag{2}$$

where *y*<sup>∗</sup> denotes the unknown target dependence that maps objects to answers.

$$A_t = \{ a : X \to Y \} \tag{3}$$

are models of algorithms, *t* ∈ *T*, where *T* is the set of indices of the algorithm models under consideration.

$$\mu_t \colon (X \times Y)^l \to A_t \tag{4}$$

are learning methods. It is required to find the method *μ<sub>t</sub>* with the best generalization ability.

When finding a method *μ<sub>t</sub>*, we often have to solve the following subtasks:

• Choice of the best model *A<sub>t</sub>* (model selection).

• Choice of the learning method *μ<sub>t</sub>* for a given model *A<sub>t</sub>* (in particular, optimization of hyperparameters).

• Feature selection:

$$F = \{ f_j : X \to D_j,\ j = 1, \ldots, n \} \tag{5}$$

is a set of features, and the learning method *μ<sub>J</sub>* uses only the features *J* ⊆ *F*.

To assess the quality of learning by precedents, we use a loss function *L(a, x)* of algorithm *a* on the object *x*.

$$Q(a, X^l) = \frac{1}{l} \sum_{i=1}^{l} L(a, x_i) \tag{6}$$

is the quality functional (the average loss) of algorithm *a* on the sample *X<sup>l</sup>*. We consider an internal quality criterion, which is measured on the training set *X<sup>l</sup>*:

$$Q_{\mu}(X^l) = Q(\mu(X^l), X^l) \tag{7}$$

and an external criterion, which evaluates the quality of learning on a hold-out set *X<sup>k</sup>* [2]:

$$Q_{\mu}(X^l, X^k) = Q(\mu(X^l), X^k) \tag{8}$$
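The notation above maps directly onto code. The sketch below is a toy illustration, not the authors' implementation: it uses a 0-1 loss (with the true answer passed explicitly rather than through *y*<sup>∗</sup>), one-dimensional objects, and a hypothetical family of learning methods `make_knn(k)` (a k-nearest-neighbours vote) whose hyperparameter *k* is chosen by the external criterion. All names and data are assumptions made for the example.

```python
def loss(a, x, y):
    """0-1 loss L(a, x): 1 if algorithm a errs on object x, else 0."""
    return 0.0 if a(x) == y else 1.0


def Q(a, sample):
    """Eq. (6): average loss of algorithm a over a sample of (x, y) pairs."""
    return sum(loss(a, x, y) for x, y in sample) / len(sample)


def make_knn(k):
    """Hypothetical learning method mu_k: k-nearest-neighbours vote on 1-D objects."""
    def mu(train):
        def a(x):
            nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
            votes = [y for _, y in nearest]
            return max(set(votes), key=votes.count)
        return a
    return mu


def internal_criterion(mu, train):
    """Eq. (7): quality measured on the training set itself."""
    return Q(mu(train), train)


def external_criterion(mu, train, holdout):
    """Eq. (8): quality measured on a hold-out set."""
    return Q(mu(train), holdout)


def select_method(methods, train, holdout):
    """Pick the learning method with the smallest hold-out error."""
    scores = {name: external_criterion(mu, train, holdout)
              for name, mu in methods.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Toy data: the training point (0.3, 1) is deliberately mislabelled noise.
train = [(0.0, 0), (0.2, 0), (0.4, 0), (0.3, 1), (0.6, 1), (0.8, 1), (1.0, 1)]
holdout = [(0.1, 0), (0.31, 0), (0.7, 1), (0.9, 1)]
methods = {"k=1": make_knn(1), "k=3": make_knn(3)}

best, scores = select_method(methods, train, holdout)
```

On this toy data the internal criterion favours k = 1 (zero training error), while the external criterion selects k = 3. This is the usual reason for tuning hyperparameters on held-out data; library tools such as scikit-learn's `GridSearchCV` automate the same idea with cross-validation.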

Recognition accuracy is affected by both the choice of the classification method and its implementation, in particular the selection of hyperparameters. We tested 14 machine learning methods for classification (naive Bayes classifier for multivariate Bernoulli models, decision tree classifier, extremely randomized tree classifier, classifier implementing the k-nearest neighbors vote, Label Propagation classifier, Linear Discriminant Analysis, Linear Support Vector Classification, Logistic Regression (aka logit, MaxEnt) classifier, nearest centroid classifier, random forest classifier, classifier using Ridge regression, Ridge classifier with built-in cross-validation, Gaussian Mixture Models, and SVM) [20]. We found that the most accurate classification methods are the Label Propagation classifier (recognition accuracy 0.94), the extremely randomized tree classifier (accuracy 0.92), and the classifier implementing the k-nearest neighbors vote (accuracy 0.90) [21].

By selecting model hyperparameters, it is possible to significantly increase recognition accuracy. In our previous study, the Support Vector Machine classifier for ECG classification used the following default hyperparameters: C = 1.0, kernel = "rbf", gamma = "auto". With these default parameters, the accuracy score for the classification of electrocardiograms was 0.93. We tuned the hyperparameters with a grid search procedure, varying C over [1, 10, 100, 1000], kernel over ["linear", "rbf"], and gamma over [1e-3, 1e-4]. After tuning, the best parameter set was "kernel": "rbf", "C": 10, "gamma": 0.001. Using these parameters, we obtained an accuracy score of 0.99.

### **11. Conclusion and perspectives**

Traditional password-based authentication methods have a number of disadvantages related primarily to the human factor. Biometric methods of identification and authentication are much more reliable, although they have some disadvantages of their own. Some of them (fingerprints, retina, and voice) have been compromised. It is not clear what to do if hackers gain access to a biometric database, because a person cannot change fingerprints as easily as a forgotten password. The development of wireless technologies and of the Internet of Medical Things makes new biometric identification scenarios possible. Here, first of all, we would like to note the biometric authentication of the patient in the body area network. In this case, ECGs are not used to generate biometric keys confirming the patient's authenticity [22]. The second important area is contactless ECG recording using ultra-wideband radars.

### **Acknowledgements**

The reported study was funded by RFBR according to the research project no. 19-07-00780.

### **Author details**

M.R. Bogdanov<sup>1,2</sup>\*, A.S. Filippova<sup>2</sup>, G.R. Shakhmametova<sup>1</sup> and Nikolai N. Oskin<sup>3</sup>

1 Ufa State Aviation Technical University, Ufa City, Russia

2 M.Akmullah Bashkir State Pedagogical University, Ufa City, Russia

3 Siberian Telecommunication Company, Moscow, Russia

\*Address all correspondence to: bogdanov\_marat@mail.ru

© 2020 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
