## **2. Materials and methods**

#### **2.1 Spirometric measurements and dataset**

The performance of a DL architecture is closely tied to the dataset used as its input. DL architectures trained on a large volume of data generally perform better than those trained on a smaller volume. Architectures of this type are known to give good results when sufficient examples are available, even if the architecture is not very deep.

In this book chapter, we collected a biomedical dataset comprising the VC, age, height, and weight parameters of healthy subjects. A total of 491 healthy subjects (363 males and 128 females, aged between 21 and 61) were selected for data collection. To measure the VC parameter, 50 s of breathing performance was recorded from each subject, giving a total record of 6.82 h in the database. The measurements were performed using a biomedical spirometer (Fukuda Sangyo brand spiroanalyz ST-75) at our Biomedical Device Technology Laboratory (BCT Lab). Information about the subjects in the study group is summarized in **Table 1**.

The spirometric measurement system and the device used are shown in **Figure 2**. At the end of the measurements, a new biomedical dataset of 1,964 values was obtained, organized as a data frame with four columns (age, height, weight, and VC) of 491 values each. **Figure 3** shows some sample breathing performance signals from this dataset.
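The dataset figures quoted above can be checked with a few lines of arithmetic. The snippet below is only a sanity check on the numbers stated in the text; the variable names are illustrative and are not taken from the study's code.

```python
# Figures quoted in the text; variable names are illustrative only.
n_males, n_females = 363, 128
n_subjects = n_males + n_females      # 491 healthy subjects
record_seconds = 50                   # breathing record length per subject
n_columns = 4                         # age, height, weight, VC

total_hours = n_subjects * record_seconds / 3600
total_values = n_subjects * n_columns

print(f"{total_hours:.2f} h of recordings")        # 6.82 h of recordings
print(f"{total_values} values in the data frame")  # 1964 values in the data frame
```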


**Table 1.**

*General features of chosen patients.*

**Figure 2.** *The spirometric measurement system.*

#### **2.2 Proposed deep network model**

Neural networks that have more than one hidden layer are called deep neural networks [20]; that is, a deep network has multiple hidden layers within the artificial neural network (ANN) architecture. ANN algorithms are adaptable and widely used in many different fields [21]. Structurally, an ANN consists of three types of layers (input, hidden, and output). Neurons in adjacent layers are linked to each other by connections, each of which has a certain weight value [22]. In this chapter, a multilayer perceptron feed-forward neural network (MLPFFNN) was chosen as the ANN algorithm [23]. In the MLPFFNN model, data flow in one direction, from the input layer to the output layer [24]. The chapter aims to create a reliable deep model for predicting VC. The weight, height, and age parameters (independent variables) were given as inputs to the model, and the VC value (dependent variable) was taken as the output.

The designed MLPFFNN is multilayer, with many neurons in every layer except the output layer, which has a single neuron. The number of neurons in the hidden layers was varied stepwise between 1 and 70, and the best-performing network structure was selected by trial and error. The best result was achieved with 1000 repetitions (the number of repetitions was increased in steps of 50). The mini-batch size of the model is 40; the effect of the mini-batch size on model performance was examined, and after various trials this value was found to be the most suitable. The learning rate was set to 0.0012 and the gradient reduction to 0.85. The Adaptive Moment Estimation (Adam) algorithm is used as the optimizer. The Rectified Linear Unit (ReLU), one of the most commonly used activation functions, is used as the activation function [25] and is defined as follows:

$$\mathbf{f}(\mathbf{x}) = \max\left\{\mathbf{0}, \mathbf{x}\right\} \tag{1}$$

where x is the weighted sum of the inputs and f(x) is the activation function. The function output is 0 for negative inputs and equal to x otherwise. The designed MLPFFNN structure is shown in **Figure 4**, where i, j, and k are the layers within the model; b1 and b2 are the biases; w1 and w2 are the weights; K, L, and M are the numbers of neurons in the layers; and VC is the vital capacity.

*Deep Network Model and Regression Analysis Using OLS Method for Predicting Lung… DOI: http://dx.doi.org/10.5772/intechopen.104737*

**Figure 3.** *Sample VC signals from the subject dataset used in this study.*

**Figure 4.** *MLPFFNN architecture.*
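To make the ReLU activation of Eq. (1) and the one-directional data flow concrete, the following is a minimal pure-Python sketch of a feed-forward pass. It is not the authors' implementation: the hidden-layer sizes, the random initialisation, and the input values are assumptions chosen only for illustration, and the network is untrained, so the printed number has no physiological meaning.

```python
import random

def relu(x):
    """Eq. (1): f(x) = max{0, x}."""
    return max(0.0, x)

def dense(inputs, weights, biases, activation=None):
    """One fully connected layer: activation(W . inputs + b)."""
    outputs = []
    for row, b in zip(weights, biases):
        z = sum(w * v for w, v in zip(row, inputs)) + b
        outputs.append(activation(z) if activation else z)
    return outputs

def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (an illustrative initialisation)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def forward(features, layers):
    """Feed-forward pass: ReLU on hidden layers, linear single-neuron output."""
    activations = features
    for weights, biases in layers[:-1]:
        activations = dense(activations, weights, biases, relu)
    weights, biases = layers[-1]
    return dense(activations, weights, biases)[0]

rng = random.Random(0)
sizes = [3, 8, 8, 1]  # 3 inputs, two hidden layers (sizes assumed), 1 output
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]

# Made-up inputs in the order age, height, weight
vc_estimate = forward([35.0, 172.0, 70.0], layers)
print(vc_estimate)
```

In practice the inputs would be scaled and the weights learned by backpropagation; the sketch only shows how data move from the input layer through the hidden layers to the single output neuron.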

#### **2.3 Training of the proposed MLPFFNN model**

In this section, the hyperparameters, including the learning rate, verbosity, batch size, number of iterations, and number of epochs, are determined. The Adam optimization algorithm was selected for backpropagation and for updating the model weights. To avoid overfitting, the number of epochs was adjusted and a dropout layer was added to the model. In the dropout layer, some nodes of the network are randomly removed during training so that the network does not become dependent on any particular neuron. Thanks to dropout, the network is forced to learn correctly even in the absence of certain information [26]. The learning rate was reduced gradually. The model was trained and tested using a 66–33% partition of the data, with 329 samples for training and 162 for testing. Performance metrics of the proposed model are shown in **Figure 5**, which plots the variation of the validation loss (val\_loss) against the training loss (loss). The loss shows how close the neural network is to the optimum. One difference between val\_loss and loss is that, when dropout is used, the validation loss can be lower than the training loss (which is usually not expected when dropout is not used). The values of the two losses are similar. The training loss decreases after almost every epoch and approaches 0, whereas val\_loss stagnates.
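The data partition and the dropout mechanism described above can be sketched as follows. This is an illustration under stated assumptions, not the study's code: the shuffling seed, the dropout rate of 0.2, and the "inverted dropout" rescaling (the variant commonly used in deep learning libraries) are all assumed, since the chapter does not report them.

```python
import random

def train_test_split(n_samples, n_train, seed=0):
    """Shuffle sample indices and split them into train and test index lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return indices[:n_train], indices[n_train:]

def dropout(activations, rate, rng, training=True):
    """Inverted dropout: zero each unit with probability `rate`, rescale the rest."""
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

# 491 subjects split into 329 training and 162 test samples, as in the text
train_idx, test_idx = train_test_split(491, 329)
print(len(train_idx), len(test_idx))  # 329 162

rng = random.Random(1)
hidden = [0.5, 1.2, 0.0, 2.0, 0.7]                   # made-up hidden activations
print(dropout(hidden, rate=0.2, rng=rng))            # units may be zeroed at random
print(dropout(hidden, rate=0.2, rng=rng, training=False))  # unchanged at test time
```

Because dropout is active only during training, the training loss is computed on a handicapped network while the validation loss is not, which is one reason the validation loss can sit below the training loss.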

#### **2.4 Regression algorithms**

Regression algorithms estimate the output parameter based on the input parameters. OLS is a least-squares method used to estimate the unknown parameters of a regression model. Following the least-squares principle, the OLS method minimizes the sum of the squared differences between the observed values of the dependent variable and the values predicted by the model for the given dataset.

In the OLS model used in the study, the relationship between the dependent variable (VC) and the independent variables (age, height, and weight) was investigated using Eq. (2).

**Figure 5.** *Training and validation loss of the model as a function of the number of epochs.*

$$\mathbf{A} = \mathbf{a}\_0 + \mathbf{a}\_1 \mathbf{X}\_1 + \mathbf{a}\_2 \mathbf{X}\_2 + \mathbf{a}\_3 \mathbf{X}\_3 + \mathbf{e} \tag{2}$$

where A is VC; X1 … X3 denote the age, height, and weight variables, respectively; a0 is the bias; a1 … a3 are the coefficients of the variables; and e is the error term [27]. In this study, the relationship between the measured (real) values and the values predicted by the model was examined using several popular regression methods based on the OLS algorithm.
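As an illustration of how the coefficients of Eq. (2) can be estimated, the sketch below fits a three-variable linear model by solving the normal equations in pure Python. The data are synthetic and noise-free, generated from made-up coefficients purely to show that the fit recovers them; they are not the study's measurements, and the function names are hypothetical.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit_ols(X, y):
    """Return [a0, a1, ..., ap] minimising the sum of squared residuals."""
    design = [[1.0] + list(row) for row in X]  # prepend the intercept column
    p = len(design[0])
    XtX = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * t for r, t in zip(design, y)) for i in range(p)]
    return solve_linear_system(XtX, Xty)

def predict(coeffs, row):
    """Evaluate a0 + a1*X1 + a2*X2 + a3*X3 for one sample."""
    return coeffs[0] + sum(a * x for a, x in zip(coeffs[1:], row))

# Synthetic (age, height, weight) rows and made-up "true" coefficients
X = [(25, 180, 80), (40, 165, 60), (33, 172, 75),
     (51, 158, 68), (60, 175, 90), (29, 190, 85)]
true_coeffs = [1.0, -0.02, 0.03, 0.005]
y = [predict(true_coeffs, row) for row in X]

coeffs = fit_ols(X, y)
print([round(c, 4) for c in coeffs])
```

With real, noisy measurements the recovered coefficients would only approximate the underlying relationship, and the residual term e of Eq. (2) would be non-zero.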

#### *2.4.1 Multi-linear regression (MLR)*

MLR is a statistical analysis technique that is widely used to estimate an output variable from several input variables. The purpose of MLR is to model the linear relationship between the independent variables and the dependent variable.

#### *2.4.2 Polynomial regression (PR)*

The PR model parameters (X, Y, b, and e) can be represented in matrix form as the design input matrix, output response vector, parameter vector, and random error, respectively, as given in Eq. (3) [28].

$$\mathbf{Y} = \mathbf{b}\_0 + \mathbf{b}\_1 \mathbf{X} + \mathbf{b}\_2 \mathbf{Y} + \mathbf{b}\_3 \mathbf{X}^2 + \mathbf{b}\_4 \mathbf{X} \mathbf{Y} + \mathbf{b}\_5 \mathbf{Y}^2 + \mathbf{e} \tag{3}$$
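A second-order polynomial model of the kind shown in Eq. (3) is still linear in its coefficients, so it can be fitted with OLS once the inputs are expanded into polynomial terms. The sketch below shows that expansion for two input variables; the function names and coefficient values are hypothetical and serve only to illustrate the structure.

```python
def quadratic_features(x, y):
    """Return the terms multiplied by b0..b5: [1, x, y, x^2, x*y, y^2]."""
    return [1.0, x, y, x * x, x * y, y * y]

def polynomial_predict(b, x, y):
    """Evaluate the second-order polynomial for coefficients b0..b5."""
    return sum(coef * term for coef, term in zip(b, quadratic_features(x, y)))

b = [1.0, 0.5, -0.25, 0.1, 0.0, 0.02]  # hypothetical coefficients
print(quadratic_features(2.0, 3.0))     # [1.0, 2.0, 3.0, 4.0, 6.0, 9.0]
print(polynomial_predict(b, 2.0, 3.0))
```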

#### *2.4.3 Support vector regression (SVR)*

The SVR algorithm is known to be a very powerful tool for real-value estimation [29]. The general SVR estimation function is given in Eq. (4).

$$\mathbf{f}(\mathbf{x}) = \mathbf{w} \cdot \Phi(\mathbf{x}) + \mathbf{b} \tag{4}$$

where w and b are the weight coefficient and the bias coefficient, respectively [30].
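Eq. (4) describes only the prediction step of SVR: the input is passed through a feature mapping Φ, and the result is combined linearly with the weights w and the bias b. The sketch below evaluates that expression for a given w and b. It does not train an SVR model, and the identity feature map and numeric values are assumptions made for illustration.

```python
def identity(x):
    """Trivial feature map: Phi(x) = x (an assumption for this example)."""
    return x

def svr_predict(w, b, x, feature_map=identity):
    """Evaluate f(x) = w . Phi(x) + b for a given feature map Phi."""
    phi = feature_map(x)
    return sum(wi * pi for wi, pi in zip(w, phi)) + b

print(svr_predict(w=[0.5, -1.0], b=2.0, x=[4.0, 1.0]))  # 3.0
```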

#### *2.4.4 Decision tree regression (DTR)*

Decision trees are a supervised learning method used for both classification and regression; DTR is their regression form. The decision tree-based regression algorithm can provide close to optimum distribution decisions [31].

#### *2.4.5 Random forest regression (RFR)*

RFR is an ensemble learning algorithm based on decision trees. Random forests for regression are formed by growing trees depending on a random vector [32]. The output values are numerical, and the training set is assumed to be drawn independently from the distribution of the random vector (Y, X). Here, h(x) denotes the tree predictor and E the mean-squared generalization error of any numerical predictor. The mean-squared generalization error of h(x) is given in Eq. (5).

$$\mathbf{E}\_{\mathbf{X},Y} \left( Y - h(\mathbf{X}) \right)^2 \tag{5}$$

In regression tasks, the random forest prediction is obtained as the mean prediction of the K regression trees hk(x), as given in Eq. (6).

$$\text{RFR prediction} = \frac{1}{K} \sum\_{k=1}^{K} (\mathbf{h}\_k(\mathbf{x})) \tag{6}$$
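The averaging in Eq. (6) and a sample estimate of the error in Eq. (5) can be sketched in a few lines. The three "trees" below are hand-written threshold rules with made-up outputs, standing in for trained regression trees; they are hypothetical and are not derived from the study's data.

```python
def random_forest_predict(trees, x):
    """Eq. (6): mean of the K individual tree predictions h_k(x)."""
    return sum(tree(x) for tree in trees) / len(trees)

def mean_squared_error(y_true, y_pred):
    """Sample estimate of the mean-squared error in Eq. (5)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Three hypothetical "trees", each a simple threshold rule
trees = [
    lambda x: 4.0 if x["height"] > 170 else 3.0,
    lambda x: 4.5 if x["height"] > 175 else 3.5,
    lambda x: 4.2 if x["age"] < 40 else 3.4,
]

subject = {"age": 30, "height": 180}  # made-up subject
print(random_forest_predict(trees, subject))
print(mean_squared_error([1.0, 2.0], [1.0, 4.0]))  # 2.0
```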

#### **2.5 Model performance evaluation**

To evaluate our proposed method, accuracy is calculated using Eq. (7).

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{7}$$

TP, TN, FP, and FN in Eq. (7) are the true positives, true negatives, false positives, and false negatives, respectively [33]. A confusion matrix was formed to calculate the performance of the model used in the study.
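Eq. (7) translates directly into code. The counts in the example call are invented for illustration and are not the entries of the chapter's confusion matrix.

```python
def accuracy(tp, tn, fp, fn):
    """Eq. (7): share of correct decisions among all decisions."""
    return (tp + tn) / (tp + tn + fp + fn)

# Made-up counts, not taken from the chapter's results
print(accuracy(tp=40, tn=45, fp=5, fn=10))  # 0.85
```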

## **3. Results**

In this book chapter, the multilayer perceptron feed-forward neural network (MLPFFNN) was selected for the ANN implementation. In the selected MLPFFNN design, the best result was achieved with 1000 repetitions (the number of repetitions was increased in steps of 50), a mini-batch size of 40, a learning rate of 0.0012, and a gradient reduction of 0.85. The simulation environment is Python 3.8.5 (64 bit). **Figure 6** shows the VC values predicted by the model versus the actual VC values.

Statistically, the actual value is the value that is obtained by observation or by measuring the available data. The predicted value is the value of the variable predicted based on the regression analysis.

The graphical comparison shows that the estimated VC values of all study participants track the VC values actually measured with the spirometer very closely.

**Figure 6.** *The graphical comparison of results.*


#### **Table 2.**

*Three-parameter OLS models result in terms of R-squared.*

When the three-parameter OLS models are examined in terms of R-squared, the best OLS result is obtained with Multi-Linear Regression (0.946). The R-squared results of the three-parameter OLS models are given in **Table 2**.
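For reference, R-squared compares the residual sum of squares of a model with the total sum of squares around the mean of the measured values. The following minimal sketch uses made-up values, not the study's VC measurements.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

measured = [3.0, 4.0, 5.0]    # made-up values
predicted = [3.5, 4.0, 4.5]   # made-up predictions
print(r_squared(measured, predicted))  # 0.75
```

A value of 1 means the predictions match the measurements exactly, while a value of 0 means the model does no better than always predicting the mean.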

The proximity between the results predicted by the model and the actual values measured with the spirometer is shown in **Figure 7**, which plots the predicted and actual values against each other. The results are quite close to each other. The blue dots in the figure show the data series in relation to the linear regression.

The confusion matrix obtained from the results is given in **Table 3**. The overall accuracy was found to be 93.3%. As a result, an efficient deep model is provided for estimating the VC parameter.

## **4. Discussion**

The chapter aims to build an accurate and reliable deep neural network model to predict VC using weight, height, and age parameters (independent variables) as input parameters to the model. The VC value (dependent variable) was taken as the output of the proposed deep neural network.

**Figure 7.** *Evaluation of model performance.*

#### **Table 3.**

*The confusion matrix of the model.*

In this book chapter, a multilayer perceptron feed-forward neural network (MLPFFNN) was selected as the preferred ANN algorithm. When the three-parameter OLS models were examined in terms of R-squared, the best OLS result was obtained with Multi-Linear Regression (R-squared = 0.948). The results showed that height and age have a more significant effect on VC than weight and played a significant role in its prediction. Although studies on estimating lung volume were encountered in the literature search, no deep neural network application that estimates the VC value from specified independent parameters was found.

Therefore, it is believed that the results presented in this chapter will fill an important gap in the literature in light of both the database specificity and the presented ML-AI method.

## **5. Conclusions**

COPD has become a challenging problem with the effect of COVID-19. The fact that respiratory function tests, the most effective method for diagnosing COPD, cannot be performed at home has forced researchers to find new, cheap, technological, and practical methods that address this challenge.

In this book chapter, it is suggested that the deep neural network-based VC prediction algorithm can be used in clinical tests to reduce the workload of doctors and nurses. As shown in this chapter, a fast and reliable diagnostic tool using an ML-AI algorithm was obtained. The proposed ML-AI model provided 93.3% accuracy. The simulation results showed that the VC parameter can be determined with a high success rate using the proposed deep learning model with real data. With the proposed model, the rate of misdiagnosis can be reduced and spirometric measurements can be made quickly, without waiting for hours to have a pulmonary function test (PFT) performed in a hospital.

The simulation results indicate that a smart tool using ML-AI technology can be a reliable alternative to medical spirometers. The developed model is planned to be tested clinically, and the results will be reported in future studies. The goal is to make this smart tool available for use in hospitals after approval from field experts and governmental health agencies.

## **Acknowledgements**

This study is supported by the Coordinatorship of Ondokuz Mayıs University's Scientific Research Projects (Project Number: PYO.YMY.1901.20.001), Samsun, Turkey. We would like to extend our heartfelt thanks to Dr. Kazım Sekeroğlu of Southeastern Louisiana University, Hammond, LA, USA.
