category of generalized linear classifiers (GLCs). In SVRs, a maximal-margin hyperplane is constructed in the high-dimensional feature space into which the input vectors are mapped. The method was initially designed as a classifier and was later modified by Vapnik [35] into a support vector regressor (SVR) for regression problems. Its robustness under single-model estimation conditions has been demonstrated [36]. Hence, it is invaluable for the estimation of both real-valued and indicator functions, as is common in pattern recognition and regression problems, respectively.

When used as a regressor, an SVR attempts to choose the "best" model from a set of candidate models (i.e., approximating functions) f(x; ω), where ω denotes a set of generalized parameters. Generally, "good" models are those that generalize their predictive performance to an out-of-sample test set; this is often determined by how well the model minimizes the cost function on the training data. The core feature of SVR, and the source of its attractive properties, is the notion of an ε-insensitive loss function. SVR is suitable for estimating the dominant model under a multiple-model formulation, where the objective function can be viewed as a primal problem whose dual form is obtained by constructing a Lagrange function and introducing a set of (dual) variables.

The generalization characteristics of SVRs are ensured by the special properties of the optimal hyperplane, which maximizes the distance to the training examples in a high-dimensional feature space, and they have been shown to exhibit excellent performance [32]. The merits and limitations of SVRs can be summarized thus. Merits: SVRs can deal with very high dimensional data, can learn very elaborate concepts, and usually work very well. Limitations: they require both positive and negative examples, a good kernel function must be selected, they consume a lot of memory and CPU time, and there are some numerical stability problems in solving the constrained optimization [30, 37, 38]. Analysis of (linear) SVR indicates that the regression model depends mainly on the support vectors on the border of the ε-insensitive zone, and that the SVR solution is very robust to "outliers" (i.e., data samples outside the ε-insensitive zone). These properties make SVR very attractive for use in an iterative procedure for multiple model estimation.

#### 3.1.4. Least square support vector regressions (LS-SVR)

LS-SVRs are reformulated versions of the original SVR algorithm for classification and function estimation that maintain the advantages and attributes of the original SVR theory. LS-SVRs are closely related to regularization networks and Gaussian processes but additionally emphasize and exploit primal-dual interpretations [39]. LS-SVR possesses excellent generalization performance and is associated with low computational cost. It requires less effort in model training than the original SVR, owing to its simplified algorithm: it minimizes a quadratic penalty on the slack variables, which allows the quadratic programming problem to be reduced to a set of matrix inversion operations in the dual space, taking less time than solving the SVR quadratic problem [40]. Robustness, sparseness, and weightings can be incorporated into LS-SVRs where needed, and a Bayesian framework with three levels of inference has also been developed [41]. Its limitations include being ineffective at handling non-Gaussian noise and being sensitive to outliers [42].

## 4. Case study

116 Drilling

A case study is presented below to illustrate one of the advantages inherent in combining AI techniques with domain expert knowledge for improved prediction and optimization of drilling rate of penetration.

### 4.1. Data description

In this study, data from two development wells in the onshore Niger Delta hydrocarbon province were used for the development and testing of the models in each of the AI algorithms compared. The field is about 95 square kilometers in extent, with a northwest-southeast trending dual-culmination rollover anticline. The wells chosen represent the best in terms of drilling performance, as measured by best ROP and bit runs, for all three hole sections considered. The formations encountered are mainly consolidated intercalations of shale and shallow-marine shoreface sands with a normal compaction trend, a typical clastic depositional environment of the Niger Delta. The field is mainly a gas field, with some of the reservoirs having significant oil rims.

The wells used for the study were selected for ROP prediction because they were best in class in terms of drilling performance, a result of carefully optimized drilling parameters and practices. The repeatability of such a feat is highly desirable, hence the choice of the wells. The formations encountered are well correlated across the field, with lateral continuity. These two wells fairly represent the field: Well-A is located on the eastern flank of the field, while Well-B is located 8 km to the west of Well-A and just about 3 km from the field's western boundary. While Well-A is highly deviated and deeper in reach, with a maximum inclination of 74° at a total depth of 11,701 ft TVD, Well-B is slightly deviated, with a maximum inclination of 23° at a total depth of 9000 ft TVD. The wells are also similar in terms of drilling equipment: the same rig was used for their construction, and the bit type and bottom hole assembly (BHA) used were the same; hence, they were both drilled with the same bottom hole hydraulics. Details of the bits used in the three hole sections included in this research are presented in Table 3.


Table 3. Bit details.

As explained in Section 2.4, the specific energy concept in the drillability of a formation is explored in this study, with particular focus on the hydromechanical specific energy, HMSE. The HMSE concept states that "the energy required to remove a unit volume of rock comes primarily from the torque applied on the bit, the weight on bit (WOB), and the hydraulic force exerted by the drilling fluid on the formation" [14]. Drilling data from surface data logging (SDL) tools were used in this study. These are real-time data collected at surface that can be transmitted via satellite to a central location while drilling. Among the numerous data usually collected are: measured depth (MD), hookload (HKLD), weight on bit (WOB), pipe rotation per minute (RPM), rotary torque (TORQ), mud flow-in rate (GPM), total gas (TG), pump strokes per minute (SPM), pit volume change, mud flow-out rate percentage (FFOP%), mud weight in (MW), etc. Since ROP prediction using the hydromechanical specific energy ROP model is the focus of the research, a conscious effort was made to use as many of the data channels that affect ROP as possible. Given the HMSE Eqs. (6) and (7) in Section 2.4 [14], it is therefore necessary to reorganize the collected data and focus on those with a physical relationship to ROP based on the HMSE-ROP model.

| Well-Code | No of data | Utilized drilling parameters (Predictors) |
| --- | --- | --- |
| Well-A (Dataset 1) | 3641 | WOB, RPM, TORQ, SPP, GPM, Depth, MW, Bit Size |
| Well-B (Dataset 2) | 5228 | WOB, RPM, TORQ, SPP, GPM, Depth, MW, Bit Size |

Key: weight on bit (WOB), bit rotation per minute (RPM), rotary torque (TORQ), stand pipe pressure (SPP), flow rate in gallons per minute (GPM), mud weight (MW).

Table 4. Streamlined datasets for each of the wells (predictors) used in the models.

Rate of Penetration Prediction Utilizing Hydromechanical Specific Energy
http://dx.doi.org/10.5772/intechopen.76903

|  | Depth (ft) | Flowrate (gpm) | WOB (klb) | RPM (rpm) | TORQ (kf-p) | SPP (psig) | MW (ppg) | Bit Size (inch) | ROP (fph) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Min | 302.4 | 375 | 2 | 10 | 1 | 317 | 8.9 | 8.5 | 2.7 |
| Max | 9264 | 2449 | 47 | 152 | 24.28 | 3522 | 10.5 | 16 | 281 |
| SD | — | 135.15 | 8.43 | 51.77 | 4.47 | 629.79 | 0.51 | — | 117.10 |
| Median | — | 888 | 16 | 41 | 7.06 | 2272 | 9.26 | — | 158 |
| Mean | — | 887.93 | 16.05 | 79.21 | 7.94 | 2372.71 | 9.68 | — | 177.22 |

Table 5. Statistical analysis of Well-A (Dataset A).

|  | Depth (ft) | Flowrate (gpm) | WOB (klb) | RPM (rpm) | TORQ (kf-p) | SPP (psig) | MW (ppg) | Bit Size (inch) | ROP (fph) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Min | 2681.3 | 450 | 1 | 2 | 1.33 | 1232 | 8.6 | 12.25 | 9 |
| Max | 12982.5 | 1108 | 68 | 142 | 20.32 | 4216 | 11.5 | 16 | 170 |
| SD | — | 83.94 | 6.72 | 28.75 | 3.47 | 557.65 | 0.78 | — | 40.26 |
| Median | — | 916 | 14 | 129 | 19.51 | 2878 | 10.4 | — | 82.6 |
| Mean | — | 899.59 | 14.70 | 117.93 | 17.73 | 2878.82 | 10.29 | — | 84.43 |

Table 6. Statistical analysis of Well-B (Dataset B).

It is important to mention that surface drilling mechanics data are inexpensive to collect during drilling operations; the sensors can be calibrated without disturbing drilling operations and are a must-have for drilling operations. Hence, continuous drilling data such as MD, WOB, RPM, flow rate, mud weight, bit size, TORQ, and SPP from the two wells were used in this study. Data quality checks were performed on the individual wells, and simple activity logic was applied to ensure that only on-bottom drilling data were used. Noise resulting from sensor issues, as well as spurious data points within the dataset, was filtered out: the activity code was first used to sort the data, and out-of-range data points were then removed manually using an Excel spreadsheet.
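The activity-logic and range filtering described above can be sketched as follows. This is an illustrative Python sketch, not the workflow actually used in the study (which relied on rig activity codes and an Excel spreadsheet); the field names, activity-code value, and ranges are assumptions made for the example.

```python
# Illustrative quality-control sketch: keep only on-bottom drilling records
# and drop out-of-range sensor readings. The activity code value and the
# physical ranges below are hypothetical, chosen only for illustration.

ON_BOTTOM_DRILLING = 2  # hypothetical rig activity code for on-bottom drilling

# plausible physical ranges used to flag spurious sensor readings
VALID_RANGES = {
    "wob_klb": (0.0, 100.0),
    "rpm": (0.0, 300.0),
    "rop_fph": (0.0, 500.0),
}

def clean_records(records):
    """Filter a list of dict records down to on-bottom, in-range samples."""
    cleaned = []
    for rec in records:
        if rec.get("activity_code") != ON_BOTTOM_DRILLING:
            continue  # discard off-bottom data (tripping, circulating, etc.)
        if all(lo <= rec[k] <= hi for k, (lo, hi) in VALID_RANGES.items()):
            cleaned.append(rec)
    return cleaned

records = [
    {"activity_code": 2, "wob_klb": 15.0, "rpm": 120.0, "rop_fph": 80.0},
    {"activity_code": 1, "wob_klb": 0.0, "rpm": 0.0, "rop_fph": 0.0},      # off-bottom
    {"activity_code": 2, "wob_klb": 999.0, "rpm": 120.0, "rop_fph": 80.0}, # spurious WOB
]
print(len(clean_records(records)))  # 1 — only the first record survives
```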

#### 4.2. Details of the experiment/methodology

The following approach was used in the preparation of the models using data from the selected wells:




1. Collect and explore the datasets: raw data from the two wells, which included several drilling equipment parameters, were explored to analyze the properties of the attributes relevant to the objective of the study. Eight measured drilling parameters of interest were eventually selected for this study.

2. Data integrity check: verify the data quality and establish the plausibility of values from an operational point of view.

3. Sorting of data: use the drilling activity code to separate the on-bottom values of the identified predictors (the drilling parameters to be used for ROP prediction in the AI models) from the HMSE-ROP model. Clean the datasets by removing noise, arising either from sensor calibration issues or from equipment malfunction, using operational background knowledge. The total number of drilling variables used as predictors of ROP is presented in Table 4.

Statistical properties of the data, such as the standard deviation, mean, and median, were computed before training the learning models. Statistical analysis helped to reveal certain characteristics of the datasets; one such important characteristic is the standard deviation, as can be seen in Tables 5 and 6.

The analysis reveals that the datasets vary widely as a result of the different lithological units penetrated, and as such, data normalization was carried out as part of preprocessing. This brought the various variables within the same range, aligned their distributions, and prevented biasing of the model toward the large values present in the dataset [6].
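The normalization step can be sketched as a simple min-max rescaling. The chapter does not state which normalization scheme was used, so this is one plausible, commonly used choice rather than the study's actual preprocessing code:

```python
# Min-max normalization sketch: rescale each drilling variable to [0, 1]
# so that large-magnitude channels (e.g., SPP in psi) do not dominate
# small ones (e.g., MW in ppg) during model training.

def min_max_normalize(column):
    """Rescale a list of numbers to [0, 1]; constant columns map to 0."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]

spp_psig = [317.0, 2272.0, 3522.0]  # min, median, max SPP for Well-A (Table 5)
print(min_max_normalize(spp_psig))  # [0.0, ~0.61, 1.0]
```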

Data splitting and model development: to ensure a uniform distribution of the data points and to remove the effect of biased sampling, the normalized data were randomized before use in model development. Data from the two wells were randomly split into 70% for training, 15% for testing, and 15% for validation, with which the algorithms were trained and modified to arrive at an acceptable model for testing in each of the artificial intelligence techniques.
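The shuffle-then-split procedure described above can be sketched as follows; the fixed seed is an assumption added for reproducibility of the example:

```python
import random

def split_70_15_15(samples, seed=42):
    """Shuffle and split samples into 70% train / 15% test / 15% validation."""
    rng = random.Random(seed)
    shuffled = samples[:]          # copy so the original order is preserved
    rng.shuffle(shuffled)          # randomize to avoid biased sampling
    n = len(shuffled)
    n_train = int(0.70 * n)
    n_test = int(0.15 * n)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    val = shuffled[n_train + n_test:]  # remainder (~15%)
    return train, test, val

train, test, val = split_70_15_15(list(range(1000)))
print(len(train), len(test), len(val))  # 700 150 150
```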



Data integrity and similarity were also preserved in all methods to avoid bias in evaluating different algorithms across the four AI techniques.

Model development: the implementation of the ANN was carried out using the MatLab® ANN toolbox. The implementation was based on the backpropagation algorithm with momentum, an adaptive learning rate, and sigmoidal activation functions. The implementation of the ELM was based on the MatLab® regularized ELM codes found in [43]. The SVR and LS-SVR models were implemented using the least-squares SVM (LS-SVM) proposed by Valyon and Horvath [44], combined with other functions found in the LS-SVMlab1.8 code [45]. The code was slightly modified to include the heavy-tailed RBF (htrbf) kernel proposed in Chapelle et al. [46].
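The LS-SVR reduction to a linear system mentioned in Section 3.1.4 can be illustrated in a few lines. This is a hedged, pure-Python sketch, not the study's MATLAB/LS-SVMlab implementation: a standard Gaussian RBF kernel stands in for the heavy-tailed 'htrbf' kernel, and the toy data, `gam`, and `sigma2` values are illustrative assumptions.

```python
import math

def rbf(x, z, sigma2=1.0):
    """Gaussian RBF kernel (stand-in for the 'htrbf' kernel used in the study)."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, z)) / (2 * sigma2))

def solve(A, b):
    """Gaussian elimination with partial pivoting for a dense linear system."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def lssvr_fit(X, y, gam=100.0, sigma2=1.0):
    """Solve the LS-SVR dual system [[0, 1^T], [1, K + I/gam]] [b; a] = [0; y]."""
    n = len(X)
    A = [[0.0] * (n + 1) for _ in range(n + 1)]
    rhs = [0.0] + list(y)
    for i in range(n):
        A[0][i + 1] = A[i + 1][0] = 1.0
        for j in range(n):
            A[i + 1][j + 1] = rbf(X[i], X[j], sigma2) + (1.0 / gam if i == j else 0.0)
    sol = solve(A, rhs)
    b, alpha = sol[0], sol[1:]
    return lambda x: b + sum(a * rbf(x, xi, sigma2) for a, xi in zip(alpha, X))

X = [[0.0], [1.0], [2.0], [3.0]]
y = [0.0, 1.0, 2.0, 3.0]      # toy linear target
f = lssvr_fit(X, y)
print(round(f([1.5]), 3))     # 1.5 by symmetry of the toy data
```

The key point is that no quadratic program is solved: fitting reduces to one dense linear solve, which is the source of LS-SVR's lower training cost noted above.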

Train models and cross-validate to select the best model: in the training of the ANN model, the weights and biases of the networks were updated by the Levenberg-Marquardt (LM) algorithm, while the number of hidden layers and neurons was randomly investigated from 1 to 5 and from 10 to 100, respectively, in a loop. The algorithm was run 500 times, and the best models, those giving the least RMSE values in the cross-validation results, were selected. A similar procedure was used in the training of the ELM models, except that the number of neurons ranged from 50 to 5000. In the training of the SVR and LS-SVR models, the algorithms' hyper-parameters (the ε-tube (epsilon), tuning parameter (C), lambda, and kernel for SVR; the tuning parameter (gam) and kernel for LS-SVR) were optimized using the cross-validation technique. For each run, a kernel function was chosen and investigated over different ranges of values of the other parameters in a loop. The kernel function and corresponding hyper-parameters with the least RMSE values during cross-validation of each run were identified as the best model. Table 7 shows the final selected model hyper-parameters.
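The loop-based hyper-parameter search with cross-validation described above can be illustrated with a deliberately simple stand-in model. Here k-nearest-neighbour regression replaces the actual ANN/ELM/SVR models, and the fold scheme, candidate grid, and synthetic data are assumptions made only to show the selection-by-least-RMSE pattern:

```python
import math, random

def rmse(pred, actual):
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(pred))

def knn_predict(train, x, k):
    """k-nearest-neighbour regression: average the y of the k closest x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def cv_rmse(data, k, folds=5):
    """Mean cross-validation RMSE for a given hyper-parameter k."""
    scores = []
    for f in range(folds):
        val = data[f::folds]                              # every folds-th sample
        train = [p for i, p in enumerate(data) if i % folds != f]
        preds = [knn_predict(train, x, k) for x, _ in val]
        scores.append(rmse(preds, [y for _, y in val]))
    return sum(scores) / folds

random.seed(0)
data = [(x, 2.0 * x + random.gauss(0, 0.1)) for x in [i / 50 for i in range(50)]]

# loop over the candidate grid and keep the model with the least CV RMSE
best_k = min([1, 3, 5, 9, 15], key=lambda k: cv_rmse(data, k))
print(best_k)
```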

Testing and evaluation of models: the models were tested using the testing data, and the three set evaluation criteria (cc, RMSE, and testing time) were recorded to evaluate the models.

The flowchart presented in Figure 3 summarizes the processes.

Figure 3. Methodology flowchart.

| AI techniques | Well-A | Well-B |
| --- | --- | --- |
| ANN | Activation functions = [logsig, tansig, purelin], Hidden nodes = [51, 19] | Activation functions = [tansig, logsig, purelin], Hidden nodes = [61, 71] |
| ELM | Activation function = tribas, Node = 1241, Regularization = 15.7419 | Activation function = tribas, Node = 2731, Regularization = 81.9853 |
| SVR | C = 9.4422e+08, lambda = 0.00310, epsilon = 0.0464, kernel = 'htrbf', kernel option = [0.0391, 1.04267] | C = 1.4035e+08, lambda = 5.38274e-04, epsilon = 0.0880, kernel = 'htrbf', kernel option = [0.0733, 1.01050] |
| LS-SVR | gam = 5.750319e+02, kernel = 'htrbf', kernel option = [8.3120e-04, 0.4753] | gam = 990,000, kernel = 'htrbf', kernel option = [3.6048e-06, 0.9228] |

Table 7. Summary of optimized parameters used in the implementation of models.
Data from each well were randomly split into 70% for training, 15% for testing, and 15% for validation, with which the algorithms were trained and modified to come up with an acceptable model for testing in each of the artificial intelligence techniques.

To ensure a uniform distribution of the data points and to remove the effect of biased sampling, the normalized data were randomized before use in model development. To avoid bias in evaluating the different algorithms across the four AI techniques being compared, data integrity and similarity were preserved in all methods. Three performance measures were used to assess the performance of the algorithms: root mean square error (RMSE), correlation coefficient (cc), and testing time.

#### 4.3. Performance assessment criteria

To establish a valid evaluation of the performance of the different AI techniques being compared, the assessment criteria used in petroleum journals were adopted as the criteria for measuring performance [27, 32]. The criteria are as follows.

#### 4.3.1. Correlation coefficient (CC)

This is a measure of the strength of the relationship between the predicted value and the actual value being predicted. It indicates how closely the model predictions track the real values, with high values indicating good performance and vice versa.

$$cc = \frac{\sum \left( y_a - \bar{y}_a \right) \left( y_p - \bar{y}_p \right)}{\sqrt{\sum \left( y_a - \bar{y}_a \right)^2 \sum \left( y_p - \bar{y}_p \right)^2}} \tag{8}$$

#### 4.3.2. Root mean-squared error (RMSE)

This can be interpreted as the standard deviation of the deviations of the predicted values from the corresponding observed values. It is a measure of absolute fit and indicates how close the predicted values are to the actual observed values.

$$\text{rmse} = \sqrt{\frac{\left( x_1 - y_1 \right)^2 + \left( x_2 - y_2 \right)^2 + \dots + \left( x_n - y_n \right)^2}{n}} \tag{9}$$
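Eqs. (8) and (9) translate directly into code. The sketch below assumes plain Python lists of actual and predicted ROP values; the sample numbers are illustrative only:

```python
import math

def cc(actual, predicted):
    """Correlation coefficient, Eq. (8): covariance over the product of spreads."""
    n = len(actual)
    ma = sum(actual) / n
    mp = sum(predicted) / n
    num = sum((a - ma) * (p - mp) for a, p in zip(actual, predicted))
    den = math.sqrt(sum((a - ma) ** 2 for a in actual) *
                    sum((p - mp) ** 2 for p in predicted))
    return num / den

def rmse(actual, predicted):
    """Root mean-squared error, Eq. (9)."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

actual = [10.0, 20.0, 30.0, 40.0]       # illustrative ROP values (fph)
predicted = [12.0, 18.0, 33.0, 39.0]
print(round(cc(actual, predicted), 3), round(rmse(actual, predicted), 3))  # 0.983 2.121
```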

| AI technique | Training RMSE | Testing RMSE | Training CC | Testing CC | Testing Time |
| --- | --- | --- | --- | --- | --- |
| SVR | 14.39394 | 23.29097 | 0.937030 | 0.808604 | 2.839218 |
| ANN | 27.26942 | 27.58479 | 0.737530 | 0.715336 | 0.031200 |
| LS-SVR | 10.82009 | 21.57755 | 0.966169 | 0.837852 | 2.730018 |
| ELM | 23.17740 | 27.08876 | 0.819712 | 0.731162 | 0.078000 |

Table 9. Dataset A results.

| AI technique | Training RMSE | Testing RMSE | Training CC | Testing CC | Testing Time |
| --- | --- | --- | --- | --- | --- |
| SVR | 10.73935 | 21.71836 | 0.980072 | 0.910637 | 5.725237 |
| ANN | 26.41347 | 28.04187 | 0.866958 | 0.845982 | 0.031200 |
| LS-SVR | 3.69279 | 18.83404 | 0.997702 | 0.933733 | 5.460035 |
| ELM | 25.01964 | 27.98157 | 0.881806 | 0.846528 | 0.140401 |

Table 10. Dataset B results.

The strategy followed was to implement the four techniques under the same data and processing conditions, as described above, to avoid bias in evaluating the different algorithms [29, 30, 47]. The design of the individual models also utilized the cross-validation technique to select the optimal tuning hyper-parameters with the validation dataset, using the RMSE criterion to measure their performance. Runs for each of the techniques were repeated several times in a loop in order to optimize the hyper-parameters of the models, while using cross-validation to select the best model for each algorithm. The testing data were then run on the model, and the cc, RMSE, and testing time were recorded to evaluate the model for comparison.

#### 4.4. Experimental results and discussion

In the implementation of each of the techniques tested for ROP prediction, the training, validation, and testing data described above were used.







• Dataset A, which comprises eight HMSE-ROP-related drilling parameters from Well-A.
• Dataset B, which comprises eight HMSE-ROP-related drilling parameters from Well-B.

The datasets are presented in Table 8.

| Dataset | Drilling parameters (Predictors) |
| --- | --- |
| A | Depth, WOB, RPM, TORQ, Flowrate, SPP, MW and Bit Size for Well-A |
| B | Depth, WOB, RPM, TORQ, Flowrate, SPP, MW and Bit Size for Well-B |

Table 8. Drilling parameters used in each of the two datasets.

Tables 9 and 10 show the results of the four AI algorithms used for ROP prediction in the study. After several runs, the best model for each algorithm was tested and evaluated. The algorithms were independently tested with the eight drilling parameters presented in Table 8.

#### 4.5. Discussion of results

Each of the four AI techniques tested exhibited competitive performance, as shown in the results. Figures 4–6 show the performance of the four techniques on each of the datasets during both training and testing, revealing their respective comparative strengths and weaknesses. The comparative results of the four AI techniques applied to the two datasets using the same drilling parameters are shown in Figure 4.

RMSE and CC, as defined earlier, are measures of performance in terms of accuracy, with the algorithm exhibiting the lowest RMSE and highest CC being the most accurate. In Figure 4, a cross-plot of the testing correlation coefficient (cc) against the testing root mean square error (RMSE) shows that in Well-A the best performance in terms of accuracy is produced by LS-SVR, followed closely by SVR, while the least accurate performance is seen in ELM and ANN. The same pattern is repeated in Well-B, with LS-SVR exhibiting the best performance and the ANN and ELM performances not remarkably far from each other. The overall best performance is that of LS-SVR in Well-B, a result of the data density in Well-B, as seen in Table 4. LS-SVR therefore provides an excellent function estimation capability.
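Using the Dataset A testing figures from Table 9, the accuracy ordering described here can be reproduced with a simple sort (lowest testing RMSE first, highest testing CC as tie-breaker):

```python
# Rank the Well-A (Dataset A) testing results from Table 9 by accuracy:
# lower testing RMSE and higher testing CC are better.
results = {
    "SVR":    {"test_rmse": 23.29097, "test_cc": 0.808604},
    "ANN":    {"test_rmse": 27.58479, "test_cc": 0.715336},
    "LS-SVR": {"test_rmse": 21.57755, "test_cc": 0.837852},
    "ELM":    {"test_rmse": 27.08876, "test_cc": 0.731162},
}
ranked = sorted(results, key=lambda m: (results[m]["test_rmse"], -results[m]["test_cc"]))
print(ranked)  # ['LS-SVR', 'SVR', 'ELM', 'ANN']
```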

By comparing the testing times, as seen in Tables 9 and 10 and plotted in Figure 5, it is evident that among the four algorithms tested, LS-SVR and SVR in both wells require a considerable amount of time for model testing, while ANN and ELM require the minimum time for the same process. The density and amount of data used for Well-B, as seen in Table 4, is evidently responsible for the extra time taken to test its model.

Figure 4. CC-RMSE plot showing testing results for datasets 1 and 2 for wells A and B, respectively.

Figure 5. Testing time for each of the algorithms tested with the two datasets.

The application of domain knowledge, and in particular the utilization of specific energy as a concept for selecting the controllable drilling parameters used in the prediction of ROP, has proven valuable, with all the AI models showing accuracy within an acceptable range. A depth plot of actual ROP against the predicted ROP from all the AI models is presented in Figure 6. As can be observed, the qualitative difference is quite elusive, showing that the four AI models are good predictors with reasonable accuracy.

Figure 6. AI predicted ROPs plotted against actual ROP for Well-A and -B.

In summary, LS-SVR produces the best ROP model for the two datasets in terms of accuracy, while requiring a considerable amount of testing time relative to the other AI techniques compared. It is therefore more suitable for situations where accuracy is most desirable. ELM and ANN, in contrast, require the shortest testing execution time but are less accurate; they are more suitable for scenarios where execution time is critical. It must, however, be stated that the use of drilling domain knowledge in the choice of drilling parameters has enhanced the accuracy of all the AI-predicted ROPs to within an acceptable range, while using variables from the HMSE-ROP model as input.

## 5. Conclusion

AI techniques have increasingly proved to be of immense value in the oil and gas industry, where they have been employed by its different segments. Traditional methods have not been able to deliver such impact in so short a time as AI methods, owing to the latter's ability to decipher hidden patterns and complex relationships within the enormous data collected daily during drilling operations. Moreover, the application of the right domain expert knowledge has shown improved performance in the deployment of AI techniques. These techniques and their application lead to time and cost savings, minimized risk, improved efficiency, and solutions to many optimization problems. The ability of the techniques to retrain themselves with live data within a shorter time has made them a major building block for drilling automation.

This chapter presents an improved methodology for predicting ROP with real-time drilling optimization in mind. Recent studies on the use of AI in the prediction of ROP show some inconsistency in the selection of input variables. The parameters used in this study are must-have, easily accessible parameters that can mostly be adjusted while drilling and are therefore controllable. The utilization of the HMSE-ROP model has also enhanced the performance of the models, as a result of selecting a few variables with an established, albeit nonlinear, relationship to ROP. All the methods used provided a good degree of accuracy and therefore present engineers with options to use whichever algorithm is suitable for their needs.
