Applications of Response Surface Methodology

## **Chapter 2**

## Application of Central Composite Design with Design Expert v13 in Process Optimization

*Chigoziri N. Njoku and Samuel K. Otisi*

## **Abstract**

This chapter focuses on the application of central composite design within response surface methodology. We review the concept and apply it to optimize biodiesel yield from the transesterification of methanol and vegetable oil with a catalyst derived from eggshell, using Design Expert 13. The optimization considered the reaction conditions of reaction time, methanol-to-oil ratio, catalyst loading, and reaction temperature. Example data were collected and analyzed from the work of Tshizanga et al., and the results of a randomized experiment showed, at a 95% confidence level, that all the factors affected the product output. A yield of about 91% was obtained, with the operating parameters optimized at a temperature of around 61°C, a methanol-to-oil ratio of 22.13, and a catalyst loading of around 3.7 wt%. This chapter provides a step-by-step guide on how to carry out this experiment using Design Expert 13. A reduced quadratic model with an F-value of 3.57 and a p-value of 0.0325 shows the model is significant; an F-value this large could occur due to noise alone in only 3.25% of cases. The number of runs was reduced to 18, compared to the 20 runs originally used by Tshizanga et al.

**Keywords:** response surface methodology (RSM), central composite design (CCD), design of experiment (DOE), design expert

## **1. Introduction**

It has always proven difficult to quickly select an appropriate experimental design that can simply explicate many response factors, and this often leads to a quadratic surface model; CCD can be a choice for this kind of model. The central composite design (CCD) concept has emerged as an experimental design that is very handy in optimization and in the search for the ideal product from ongoing batches. In statistics, a central composite design is an experimental design, useful in response surface methodology, for building a second-order (quadratic) model for the response variable without having to use a complete three-level factorial experiment [1]. After the designed experiment is performed, linear regression is deployed, sometimes iteratively, to obtain results. Coded variables are frequently utilized when creating this design. Most optimizations are done by screening all the potential variables [2]. Here, all the possible independent factors are first identified, and these factors are further refined before response surface methodology is finally used to establish relationships between one or more process variables and their responses. The central composite design is sometimes referred to as the Box-Wilson central composite design, and it has been favored among researchers due to its accuracy.

## **2. Key terms in central composite design**

Some important keywords will be mentioned throughout this chapter. This is to equip the readers with the terminology to understand fully the concept of Response surface methodology.

**Response surface:** These are the related variables, typically shown as a two- or three-dimensional plot of the results of experimental data. Response surface methodology (RSM) describes the use of experimental designs that give response surfaces from which information about the experimental system is deduced [3].

**Factor:** This can also be called the parameter or predictor. It is an entity that controls an outcome: a change in the output is brought about by manipulating or tweaking the input factor(s). Factors can be set and reset at different levels depending on the needs and conditions that affect the experiment.

**Levels of the factors**: A design of experiments is named by the number of levels chosen for a factor; it could be a two- or three-level design. A level signifies the value of a factor prescribed in an experimental design. Levels could be high, mid, or low (three-level design), or only high and low (two-level design), and are often coded as +1 (high), 0 (mid), and −1 (low). Selecting levels for an experiment often requires field experience. For example, for a three-level experiment in a reactor, deciding that 30°C (−1), 40°C (0), and 50°C (+1) are suitable for the low, mid, and high levels respectively would require some previous experience.
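The coded-level convention above can be captured in a short helper. This is an illustrative sketch (not part of Design Expert), using the 30/40/50°C example from the text:

```python
# Sketch: converting between actual and coded factor levels for a
# three-level design. The 30/40/50 degC levels are the example from the text.
def to_coded(actual, low, high):
    """Map an actual setting onto the -1..+1 coded scale."""
    center = (low + high) / 2
    half_range = (high - low) / 2
    return (actual - center) / half_range

def to_actual(coded, low, high):
    """Map a coded setting back to engineering units."""
    center = (low + high) / 2
    half_range = (high - low) / 2
    return center + coded * half_range

assert to_coded(30, 30, 50) == -1.0   # low level
assert to_coded(40, 30, 50) == 0.0    # mid level
assert to_coded(50, 30, 50) == +1.0   # high level
assert to_actual(0, 30, 50) == 40.0   # coded 0 is the center point
```

The same mapping is what the software applies internally when it reports results in "coded factors".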

**Blocking:** This tool is used to eradicate the effects of external disturbances and, in the process, improve the efficiency of the experimental design. External disturbances cause different forms of variation. The main goal is to arrange similar experimental runs into one group so that the whole group becomes a homogeneous unit. For example, in the transesterification reaction, suppose a researcher attempting to increase the yield of biodiesel has identified several factors that might have some impact on the yield for the initial experimental trials. It is decided to study each factor in a two-level setting (i.e. a low value and a high value). Six experimental trials are chosen by the experimenter, but only four trials can be run per day. Here, each day can be handled as a separate block [4].

**Response:** This is the result of the effect of an experiment, which is observed on account of changing the values of the predictors. For example the Yield, Selectivity, or Conversion of a reactant in a reactor.

**Design of experiment (DOE):** This is a statistical approach that involves planning, analyzing, conducting, and interpreting data obtained from experiments [3].

**Randomization:** While designing and running an experiment, there are several factors in the form of external disturbances often known as noise factors, which may influence how the experiment turns out. For example, variations in the quality of the raw material due to seasonal change, variations in the temperature, and their effects on the overall reaction yield may affect the result and such factors are difficult to

*Application of Central Composite Design with Design Expert v13 in Process Optimization DOI: http://dx.doi.org/10.5772/intechopen.109704*

control. Randomization is one of the methods to remove or reduce such errors occurring due to uncontrollable factors. Randomization helps in calculating the cumulative impact of the external disturbances if present in the process [3].

**Model:** This is an equation expressing the relationship between the responses and the factors under study or investigation. Here the outcome is denoted as a function of the experimental factors. For example, a model with only one parameter *x* could be expressed as:

$$y = f(x) + \varepsilon \tag{1}$$

For a two-parameter model, it could be represented as:

$$y = f(x_1, x_2) + \varepsilon \tag{2}$$

For an n-parameter model, consider the following equation:

$$y = f(x_1, x_2, \dots, x_n) + \varepsilon \tag{3}$$

The function *f*(*x*) denotes the relationship between the parameters and the response (y), with residuals (ε), and is depicted through a polynomial equation. Three different models are described:

*Linear model:* This is the simplest polynomial model. It contains only linear terms and describes only the linear relationships between the variables and the responses. A linear model with two factors *x*<sub>1</sub>, *x*<sub>2</sub> is expressed as:

$$y = b_0 + b_1 x_1 + b_2 x_2 + \varepsilon \tag{4}$$

Or can generally be represented as;

$$y = b_0 + \sum_{i=1}^{k} b_i x_i + \varepsilon \tag{5}$$

Here, y is the outcome, *b*<sub>i</sub> are the model coefficients, *b*<sub>0</sub> is the model intercept, i is the factor number running from 1 to k, and *x*<sub>i</sub> are the independent variables.

*Interaction model:* The interaction model holds some extra terms that depict interactions between variables, if any. For two factors, it is denoted as:

$$y = b_0 + b_1 x_1 + b_2 x_2 + b_{12} x_1 x_2 + \varepsilon \tag{6}$$

Or generally as;

$$y = b_0 + \sum_{i=1}^{k} b_i x_i + \sum_{i=1}^{k-1} \sum_{j=i+1}^{k} b_{ij} x_i x_j + \varepsilon \tag{7}$$

*b*<sub>0</sub>, *b*<sub>i</sub>, and *b*<sub>ij</sub> are the regression or model coefficients for the intercept, linear, and interaction terms, respectively, and *x*<sub>i</sub> and *x*<sub>j</sub> are reaction factors.

*Quadratic model:* Quadratic terms are introduced in the model to help ascertain the optimal value; they help to identify curvature that exists in the model. This model for two factors with interaction can be represented as:

*Response Surface Methodology - Research Advances and Applications*

$$y = b_0 + b_1 x_1 + b_2 x_2 + b_{12} x_1 x_2 + b_{11} x_1^2 + b_{22} x_2^2 + \varepsilon \tag{8}$$

or generally as:

$$y = b_0 + \sum_{i=1}^{k} b_i x_i + \sum_{i=1}^{k} b_{ii} x_i^2 + \sum_{i=1}^{k-1} \sum_{j=i+1}^{k} b_{ij} x_i x_j + \varepsilon \tag{9}$$

*b*<sub>0</sub>, *b*<sub>i</sub>, *b*<sub>ii</sub>, and *b*<sub>ij</sub> are the model coefficients for the intercept, linear, quadratic, and interaction terms, respectively, and *x*<sub>i</sub> and *x*<sub>j</sub> are the variables.

*Note:* The symbol ε in eqs. (1) to (9) represents the residuals; the linear and interaction models are used during the screening stage.
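As a concrete illustration of eqs. (4)–(9), the sketch below fits the two-factor quadratic model of eq. (8) by ordinary least squares with NumPy. The data are synthetic, generated only for this example; this is not the chapter's biodiesel data set.

```python
import numpy as np

# Sketch: fitting the two-factor quadratic model of eq. (8),
# y = b0 + b1*x1 + b2*x2 + b12*x1*x2 + b11*x1^2 + b22*x2^2 + eps,
# by ordinary least squares. The data below are made up for illustration.
rng = np.random.default_rng(0)
x1 = rng.uniform(-1, 1, 30)
x2 = rng.uniform(-1, 1, 30)
true_response = 5 + 2*x1 - 3*x2 + 1.5*x1*x2 + 0.8*x1**2 - 0.5*x2**2
y = true_response + rng.normal(0, 0.01, 30)  # add a little noise (eps)

# Design matrix: intercept, linear, interaction, and quadratic columns.
X = np.column_stack([np.ones_like(x1), x1, x2, x1*x2, x1**2, x2**2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(b, 2))  # recovers coefficients close to [5, 2, -3, 1.5, 0.8, -0.5]
```

The same least-squares machinery underlies the regression step Design Expert performs after the designed experiment.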

**Effects:** This is often regarded as the coefficient of a variable. The *main effects* are the factors' coefficients in the first-order model; an *interaction effect* is the coefficient of a product of linear terms; and a *quadratic effect* is the coefficient of the square of a linear term.

**Replication:** Replication means repeating the entire experiment, or a part of it, under different operating conditions. It helps to obtain an estimate of the experimental error and to estimate more precisely the factors and their interactions.

## **3. Response surface methodology for optimization design**

The primary goal of optimization design is to minimize unfavorable or undesired outputs or to maximize the desired outputs. Sometimes, simple linear and interaction models are not enough to provide a clear picture of the process. For this study, our goal is to increase the biodiesel yield from the transesterification of methanol and vegetable oil using a catalyst derived from eggshell. The experiment has already been done, and the data are provided in reference [4]. We will use the information from this work to provide a thorough examination of central composite design in process optimization. The variables are reaction temperature, methanol-to-oil ratio, and catalyst weight. If the optimum lies inside the region in which the experiment is conducted, we need a mathematical model that can represent curvature so that it has a local optimum. The best such model is the quadratic model shown in eq. (9), which contains linear terms for all factors, squared terms for all factors, and products of all pairs of factors. Response surface designs are generally used for fitting quadratic models. A full factorial design with three levels for each input variable is one such design, but because it requires far more runs than are necessary to fit the model, it is typically not a good design. The CCD and Box–Behnken designs are the two most common designs used in response surface modeling, although only central composite designs are explored in detail here. In these designs, the variables take on three or five distinct levels, but not all combinations of these values appear in the design. The steps in CCD for optimization are outlined below:

**Preliminary stage**: Here, the following steps are done:




**Analysis stage**: At this stage, the following are done:


**Decision-making stage**: Here, the predicted and actual values are compared to determine the residuals, and useful parameters such as the adjusted R-squared, Mean Absolute Error (MAE), or Mean Square Error (MSE) are employed to assess the model performance. If the result is satisfactory, we proceed to the final stage; if not, we go back to the preliminary stage to see how the model can be adjusted.

**Optimization stage**: At this stage, the model is ready to be deployed for the optimization process. Design Expert version 13 is very handy for this entire process; all we need is to specify the required values. The details of how to determine the CCD components are given later in this chapter.

## **4. Box–Behnken design (BBD)**

The Box–Behnken design can fit the full quadratic response surface model [5]. Unlike the CCD, the BBD has no embedded factorial or fractional factorial design. In this design, the treatment combinations are at the midpoints of the edges of the cube and at the center, as shown in **Figure 1**. The BBD is a rotatable design and needs three levels for each factor. The BBD should be considered for experiments with more than two factors, and when the optimum is expected to lie in the middle of the factor ranges. In **Figure 1**, A, B, and C represent factors A, B, and C respectively.

## **5. Central composite design**

Central composite design (CCD): This is a unique kind of response surface design that can fit a full quadratic model. It comprises a factorial (or fractional factorial) design with center points, augmented with a group of star or axial points. Using the included axial points is an effective method for estimating the coefficients of a second-degree polynomial in the factors [6]. A CCD can be depicted as a square (for a two-factor design) or a cube (for a three-factor design) whose corners represent the levels (high and low, coded +1 and −1 respectively), with star or axial

**Figure 1.** *A representation of box–Behnken design.*

points along the axes, at or outside the square, that help to account for the curvature, and a center point at the origin. The general model for a two-factor full factorial CCD is represented graphically in **Figure 2** below.

**Figure 3** displays a three-factor layout for a CCD made up of a full factorial that forms the cube, where each side is coded −1 and +1 just as in **Figure 2** above. The stars stand for axial points, and alpha is the distance from the edge of the cube to the stars.

## **6. Types of central composite design**

There are three types of CCD, namely:

• Central composite circumscribed (CCC)

• Central composite inscribed (CCI)

• Central composite face-centered (CCF)

The CCC is a type of CCD in which the axial points form new extremes beyond the levels already set for the factorial factors. The new extremes are determined by a value called alpha (the distance between the new extreme and the edge of the factorial points), bringing each factor up to 5 levels. Alpha is often chosen to achieve a *rotatable* design [7].

The CCI type is a modified form of the CCC in which the axial points are scaled to lie within the limits of the factorial factors [8]. The CCI is also rotatable and has 5 levels, just like the CCC.


#### **Figure 2.**

*A visual depiction of the CCD model for determination of total runs for all experiments for two factors full factorial design. K in the model is the number of factors, C is the replicated central points that help to eliminate pure error and N is the experiment runs required for the design.*

**Figure 3.** *A graphical representation of three factors in a full factorial design.*

For the CCF, the axial points correspond to the center points of each face of the cube in **Figure 3** above (for three-factor designs), and the design is non-rotatable [9]. It has only 3 levels. **Figure 4** below provides more insight into this type of CCD.

## **7. Determining the components of central composite design**

Before starting the CCD optimization process, we need to provide a walk-through on how to calculate all the required parameters to build the model.

**Figure 4.** *Three types of central composite design [6].*

## **8. Calculating the number of experiment runs**

To design a CCD experiment, the full factorial part at two levels (+1 and −1) contributes 2<sup>k</sup> runs, the axial points represented in **Figure 2** contribute 2k runs, and, letting C represent the center points and n the number of times the center point is replicated to eliminate error, the total number of experiment runs is given as:

$$N = 2^{k} + 2k + nC \tag{10}$$

Where k is the number of factors selected for the experiment. In our case, we have three (3) factors, i.e. temperature, methanol-to-oil ratio, and catalyst weight, with 4 repetitions of the center point. Substituting k = 3, C = 1, and n = 4 (i.e. 4 repetitions) gives N = 18 runs. Luckily, Design Expert 13 will automatically generate this value once the number of factors and repetitions are provided. Keep in mind that the number of center points can also be adjusted by clicking the Options button in the software; in this case, we will just use the default.
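The run-count arithmetic of eq. (10) can be checked with a one-line function; the values below mirror the chapter's case (k = 3, n = 4, C = 1):

```python
# Sketch: total CCD runs per eq. (10), N = 2^k + 2k + n*C,
# with k factors, C center points, and n replications of the center point.
def ccd_runs(k, n, C):
    return 2**k + 2*k + n*C

# Three factors, one center point replicated four times -> 18 runs:
# 2^3 (factorial) + 2*3 (axial) + 4*1 (center) = 8 + 6 + 4 = 18.
assert ccd_runs(k=3, n=4, C=1) == 18
```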

## **9. Calculating alpha (α)**

As can be seen, immediately after the factors, n, and C are provided, alpha is automatically calculated, because the minimum parameters needed to compute it have been specified; we now show how the program generates this value. As discussed earlier, alpha is the distance between the new extreme axial points and the edge formed by the factorial levels. The following equation calculates this **α** value for any number of factors:

$$\alpha = \left(2^{k}\right)^{1/4} \tag{11}$$

For our case, k is 3, and therefore **α = 1.68179**, which agrees with the value produced by the software. Consider **Table 1** below for k from 2 to 5 factors and the corresponding α values.
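Eq. (11) is easy to verify in code. The sketch below prints α for k = 2 to 5 and checks the k = 3 value of 1.68179 quoted in the text (an illustrative sketch, not Design Expert output):

```python
# Sketch: rotatable alpha per eq. (11), alpha = (2^k)^(1/4).
def rotatable_alpha(k):
    return (2**k) ** 0.25

for k in range(2, 6):
    print(k, round(rotatable_alpha(k), 5))

# k = 3 gives 1.68179, matching the value Design Expert reports.
assert abs(rotatable_alpha(3) - 1.68179) < 1e-5
```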



**Table 1.**

*Factors and corresponding α values.*

## **10. Calculating axial values**

Before determining the axial points, **Table 2** shows the factor levels and center points that will be used to compute them. The center points are coded as 0, while the low and high levels are designated −1 and +1 respectively. Also keep in mind that the experiment has already been performed, with the data provided from the work of Tshizanga et al. [4].

To compute the axial values, the first thing to do is to find the amount α that can be added to or subtracted from the mean of the factor levels. Adding gives the value coded +α (higher axial value), while subtracting gives the value coded −α (lower axial value). These two additional coded values (+α and −α) are the axial points, and they bring each factor up to a total of 5 levels. The two equations are given below:

$$+\alpha \text{ value} = X + \left(\alpha \times \frac{\text{High level} - \text{Low level}}{2}\right) \tag{12}$$

$$-\alpha \text{ value} = X - \left(\alpha \times \frac{\text{High level} - \text{Low level}}{2}\right) \tag{13}$$

Where α can be found using eq. (11), already calculated as **1.68179**, and X is given by:

$$X = (\text{Low level} + \text{Centre point} + \text{High level})/3 \tag{14}$$

Here X is simply the mean of the three levels; the divisor 3 is the number of levels averaged, which happens to equal the number of factors in this case. At this point, let us get our hands dirty calculating these values for the 3 factors.

**For Temperature:**

$$X_1 = \frac{60 + 65 + 70}{3} = 65$$


**Table 2.**

*Experimental ranges of the independent variable.*

$$+\alpha = 65 + \left(1.68179 \times \frac{70 - 60}{2}\right) = 73.4090\,^{\circ}\text{C}$$

$$-\alpha = 65 - \left(1.68179 \times \frac{70 - 60}{2}\right) = 56.5911\,^{\circ}\text{C}$$

(values approximated to 4 d.p.)

**For Methanol-Oil ratio:**

$$X_2 = \frac{15 + 22.5 + 30}{3} = 22.5 \ (22.5{:}1)$$

$$+\alpha = 22.5 + \left(1.68179 \times \frac{30 - 15}{2}\right) = 35.1134 \ (35.1134{:}1)$$

$$-\alpha = 22.5 - \left(1.68179 \times \frac{30 - 15}{2}\right) = 9.8866 \ (9.8866{:}1)$$

**For Catalyst weight:**

$$X_3 = \frac{2 + 3.5 + 5}{3} = 3.5$$

$$+\alpha = 3.5 + \left(1.68179 \times \frac{5 - 2}{2}\right) = 6.0227 \text{ wt\%}$$

$$-\alpha = 3.5 - \left(1.68179 \times \frac{5 - 2}{2}\right) = 0.9773 \text{ wt\%}$$

We have now discussed, step by step, how the software generates the alpha (α) and axial values as the components of the CCD; **Table 3** below includes these axial points.
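The hand calculations above can be reproduced with a small helper. This sketch assumes α = 1.68179 from eq. (11) and the factor levels of **Table 2**:

```python
# Sketch: reproducing the axial (star) values computed above.
# X is the mean of the low, centre, and high levels (eq. 14);
# alpha = 1.68179 per eq. (11) for k = 3.
ALPHA = 1.68179

def axial_points(low, centre, high):
    X = (low + centre + high) / 3
    half_range = (high - low) / 2
    return X - ALPHA * half_range, X + ALPHA * half_range

lo, hi = axial_points(60, 65, 70)          # temperature, deg C
assert abs(lo - 56.5911) < 1e-3 and abs(hi - 73.4090) < 1e-3

lo, hi = axial_points(15, 22.5, 30)        # methanol-to-oil ratio
assert abs(lo - 9.8866) < 1e-3 and abs(hi - 35.1134) < 1e-3

lo, hi = axial_points(2, 3.5, 5)           # catalyst weight, wt%
assert abs(lo - 0.9773) < 1e-3 and abs(hi - 6.0227) < 1e-3
```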

Upon specifying the required parameters for the CCD model, the software will generate a table from which the experiment is conducted to determine the response for each run. For this case study, our response is the biodiesel yield, which can be determined from the methyl ester and waste vegetable oil weights using the following equation:

$$\text{Yield} \ (\%) = \frac{\text{Weight (Biodiesel)}}{\text{Weight (Oil)}} \times 100 \tag{15}$$
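Eq. (15) in code, with hypothetical weights chosen only to illustrate the arithmetic:

```python
# Sketch: biodiesel yield per eq. (15). The weights here are hypothetical.
def biodiesel_yield(weight_biodiesel, weight_oil):
    return weight_biodiesel / weight_oil * 100

# e.g. 45.5 g of methyl ester recovered from 50 g of oil -> 91% yield
assert abs(biodiesel_yield(45.5, 50.0) - 91.0) < 1e-9
```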

**Table 4** shows the factors' coded values organized in the standard order.


**Table 3.**

*Experimental ranges of independent variables including calculated axial (star) values.*



#### **Table 4.**

*Factors' coded values organized in the standard order.*

Immediately after we fill in the required CCD components, Design Expert will provide a table of coded factor levels. This is used as a guide for specifying the actual values and their corresponding responses. The experiment will be repeated four (4) times instead of 6 times (as done by the original researchers) to reduce the number of runs; the two results will be compared after the optimization stage. **Table 4** shows the coded factors, and **Table 5** shows the actual values and their responses after experimenting in the laboratory.

At this point, we can replace the coded values with the actual values from the previous calculations. The factor columns are generated following a particular pattern, but that is beyond the scope of this chapter; to learn more, we recommend reading "RSM Simplified" by Anderson and Whitcomb [10].

## **11. Results and analysis**

*Note:* When entering the values for the methanol-to-oil ratio in Design Expert, you can ignore all the 1's, since the oil part of the ratio is always unity for all the experiment runs.

We can now delve into understanding the data collected to build the model, perform analysis, and finally carry out the optimization. All these steps will be done in design expert software.


**Table 5.**

*Actual factors' values arranged in the standard order after the experiment.*

## **12. Understanding the data**

The reason for this step is basically to understand the relationships that exist in the data. In a more statistical sense, we need to know whether there is a strong correlation between the variables and the response. If, to some extent, there exists a strong correlation among the factors themselves, then one of them has to be removed, because it will eventually harm the model. Design Expert provides a wonderful dashboard where we can carefully learn more about the data we have collected and make some sense of it. At the left of the software we find the *Information* section; the Summary, Graph Columns, and Evaluation subsections are the places to dig the nuggets from the data.

In the *Summary* section we see the summary statistics of the data, i.e. the number of experiment runs, the type of design and model, and the minimum, maximum, mean, standard deviation, and ratio of maximum to minimum of the response values (**Table 6**).

We can see that the mean response is quite far from the minimum and maximum responses; this is the primary reason for building the model to test the statistical significance of the result. If we are satisfied with the significance, we can go ahead with the model built in the Evaluation tab (**Table 7**).



**Table 6.**

*Summary statistics of the factors.*


#### **Table 7.**

*Summary statistics of the response.*

#### **Figure 5.** *A scatterplot of temperature vs. biodiesel yield.*

Moving over to the *Graph Columns* section, there are scatter plots, histograms, and box plots. To make the most sense of the data, the scatter plot is most handy, since it shows how the factors are correlated with each other; the drop-down at the top left selects the factors to show in the scatter plots, and the correlation plots at the bottom display correlations as values between −1 and +1 (blue to red). Values close to −1 show a strong negative correlation, and values close to +1 show a strong positive correlation. We now display the scatter plots of each factor against the response, for the factors that most affect the biodiesel yield, in **Figures 5**–**7** respectively.

The plots show that temperature most strongly, and negatively, affects the yield. The correlation plot in **Figure 8**, produced by Design Expert, confirms this claim, since the box with the most bluish color lies at the intersection of the temperature and biodiesel columns.
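The correlation values behind such a plot can be computed directly. The sketch below uses made-up temperature and yield columns (not the Tshizanga et al. data) purely to show the mechanics of a negative factor-response correlation:

```python
import numpy as np

# Sketch: the kind of correlation coefficient behind Design Expert's
# correlation plot, computed with numpy on made-up columns.
rng = np.random.default_rng(1)
temperature = rng.uniform(56.6, 73.4, 18)                   # 18 runs, axial range
yield_pct = 95 - 0.5 * temperature + rng.normal(0, 1, 18)   # yield falls with T

r = np.corrcoef(temperature, yield_pct)[0, 1]
print(round(r, 2))  # strongly negative, the pattern Figures 5 and 8 describe
assert r < -0.5
```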

Finally, this section provides a unique tab called *Evaluation*, where the model is selected and all its parameters are shown. In this case, a quadratic model has been selected by the software, which is best for a CCD. There are two tabs, i.e. Results and Graphs, where the model parameters are evaluated.

In the Model tab, the model terms are listed along with their significance, Variance Inflation Factor (VIF), R-squared, and the power of the model, as shown in the table below.

#### **12.1 Model terms**

Power calculations are performed using response type "Continuous" and parameters:


**Figure 6.** *A scatterplots of methanol/oil ratio against biodiesel yield.*

#### **Figure 7.**

*A scatterplots of catalyst weight against biodiesel yield.*

Delta = 2, Sigma = 1.

Power is evaluated over the −1 to +1 coded factor space. Standard errors should be similar to each other in a balanced design; lower standard errors are better.

The ideal VIF value is 1.0. VIFs above 10 should cause concern. VIFs above 100 should cause alarm, indicating coefficients are poorly estimated due to multicollinearity.
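A VIF is defined as 1/(1 − R<sub>i</sub>²), where R<sub>i</sub>² comes from regressing one model-matrix column on all the others. The sketch below implements that definition with NumPy on a toy orthogonal design; it illustrates the statistic, not Design Expert's own routine:

```python
import numpy as np

# Sketch: VIF for each model term, computed as 1 / (1 - Ri^2), where Ri^2
# comes from regressing column i of the model matrix on the other columns.
def vifs(X):
    out = []
    for i in range(X.shape[1]):
        others = np.delete(X, i, axis=1)
        A = np.column_stack([np.ones(len(X)), others])   # add an intercept
        coef, *_ = np.linalg.lstsq(A, X[:, i], rcond=None)
        resid = X[:, i] - A @ coef
        r2 = 1 - resid.var() / X[:, i].var()
        out.append(1 / (1 - r2))
    return out

# Orthogonal (perfectly balanced) factorial columns give the ideal VIF of 1.0.
X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
print([round(v, 2) for v in vifs(X)])  # [1.0, 1.0]
```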

#### **Figure 8.**

#### *Correlation plots of all the relationships that exist in the data.*

The ideal R<sub>i</sub><sup>2</sup> is 0.0. A high R<sub>i</sub><sup>2</sup> means terms are correlated with each other, possibly leading to poor models. If the design has multilinear constraints, then multicollinearity will exist to a higher extent; this inflates the VIFs and the R<sub>i</sub><sup>2</sup> values, rendering these statistics useless. Use FDS instead.

The power calculation estimates the chance of finding a significant effect with the current evaluation model. Power depends on the size and structure of the design, the signal-to-noise ratio (number of standard deviations) for the effect, and the model evaluated. The Options button on the Model tab allows the user to define three signal-to-noise ratios, which set the numbers of standard deviations to use. If the power is not large enough (80% or more) for a reasonably sized effect, then the design is underpowered. As can be seen in **Table 8**, we may consider removing the interaction terms, since they have lower power; this will be done after analyzing the model, if their p-values are higher than 0.05, as that would mean they have hurt the performance of the model. Note, however, that power is an inappropriate tool for evaluating response surface designs; use the prediction-based metrics provided in this program via Fraction of Design Space (FDS) statistics. Click on the Graphs tab to find the FDS graph; more information about FDS is available in the Help. Be sure that the model you select contains only terms you expect to be significant (**Table 9**).


#### **Table 8.** *Model parameters.*



**Table 9.** *Leverage.*

## **12.2 Leverage**

The leverage data shown in the table above measure the potential for a design point to influence the fit of the model coefficients, based on its position in the design space. Leverages approaching or at 1 indicate that the point will strongly influence the model; a leverage of 1 means the model must exactly fit the observed value. A good design avoids leverages approaching 1. A design for the same model but with more runs will tend to have lower leverage at each point.

Watch for leverages close to 1.0. Consider replicating these points or make sure they are run very carefully.
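Leverages are the diagonal entries of the hat matrix H = X(XᵀX)⁻¹Xᵀ. A minimal sketch on a toy model matrix (hypothetical, for illustration only):

```python
import numpy as np

# Sketch: leverages as the diagonal of the hat matrix H = X (X'X)^-1 X'.
# X here is a toy model matrix: an intercept plus one factor at four settings.
X = np.column_stack([np.ones(4), np.array([-1.0, -0.5, 0.5, 1.0])])
H = X @ np.linalg.inv(X.T @ X) @ X.T
leverage = np.diag(H)
print(np.round(leverage, 3))

# Leverages sum to the number of model coefficients (here 2), and the
# points at the extremes of the factor range carry the highest leverage.
assert abs(leverage.sum() - 2.0) < 1e-9
```

This is why adding runs lowers the leverage of each point: the same coefficient count is spread over more rows.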

The Graphs tab contains the FDS, Perturbation, interactions, Contour, Cube, and 3D Surface Plots to help understand the data and the model parameters.

## **12.3 FDS graph**

The FDS graph is used to compute the volume of the design space that has predicted variance less than or equal to the specified value. The fraction of the design space is calculated as this volume divided by the entire volume of the design space.

The goal is to make a single plot that shows the cumulative fraction of the design space on the x-axis (from zero to one) versus the prediction variance on the y-axis.
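The idea can be sketched numerically: sample random points in the coded design space, evaluate the scaled prediction variance xᵀ(XᵀX)⁻¹x at each, and report the fraction falling below a chosen threshold. The design, model, and threshold below are made up for illustration and are not Design Expert's exact algorithm:

```python
import numpy as np

# Sketch: a Monte Carlo estimate of the FDS idea. For a model matrix X, the
# scaled prediction variance at a point x is x' (X'X)^-1 x. We sample random
# points in the coded -1..+1 space and report the fraction whose variance
# falls at or below a chosen threshold.
rng = np.random.default_rng(2)

# Toy design: 2^2 factorial plus a center point; model = intercept + 2 linear.
pts = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1], [0, 0]], dtype=float)
X = np.column_stack([np.ones(len(pts)), pts])
XtX_inv = np.linalg.inv(X.T @ X)

sample = rng.uniform(-1, 1, size=(10_000, 2))
Xs = np.column_stack([np.ones(len(sample)), sample])
var = np.einsum("ij,jk,ik->i", Xs, XtX_inv, Xs)  # x' (X'X)^-1 x per point

threshold = 0.5
fds = np.mean(var <= threshold)
print(round(fds, 2))  # fraction of the design space meeting the threshold
assert 0.0 <= fds <= 1.0
```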

For exploration and optimization, we advise an FDS score of at least 0.8, or 80%; for stability and robustness testing, such as showcasing the design space for quality-by-design (QbD) work, 100%. The FDS Graph tool provides options for assessing the FDS in relation to four different error types, i.e. Mean, Prediction intervals, Difference between pairs of observations, and Tolerance. We are using the Mean error type, since the aim of this experiment is to find the optimized factor settings for specific response goals. **Figure 9** below is the visualization of the FDS graph.

There are three parameters, **delta**, **sigma**, and **alpha**, for each type of error, and a fourth parameter, **Proportion**, for the Tolerance type of error.

**delta** specifies the maximum acceptable half-width (margin of error) of the respective interval for the Mean, Pred, and Tolerance error types. One good way to find the delta is to answer the question, "plus or minus how much is an acceptable estimate?"

**sigma** is an estimate for the standard deviation that will appear on the ANOVA. It can be obtained from previous work with this system, work from a similar system, or outright guessing. A smaller sigma can be entered to enhance the FDS if the unexplained nuisance fluctuation can be reduced during the experiment.

**Alpha** is the significance level used throughout the statistical analysis; our default is 0.05, or 5%. It is the acceptable risk of a type I error. The FDS rises as alpha increases. The critical value is calculated using alpha/2 for two-sided intervals and alpha for one-sided intervals.

**Proportion** is used only for the Tolerance type of error. It is the percentage of the individual outcomes that must fall within the tolerance range. Building a larger design, raising the delta, reducing the sigma, increasing the alpha, and/or decreasing the Proportion will boost the FDS score [11, 12].

**Figure 9.** *FDS graph.*

## **12.4 Interaction**

An interaction occurs when the response to one factor depends on the setting of a second factor. An interaction plot will display two non-parallel lines, showing that the effect of one factor depends on the level of the other. **Figure 10** displays the standard error of the design with interactions of the model parameters.

## **13. Analysis**

In the Analysis section in Design Expert, select no transform in the Configure tab and start the analysis using the button at the bottom. The interface should appear like **Figure 11** below.

You can use the advanced options button to customize the model, such as changing the factor coding from coded to actual factors (though this is not recommended).

## **13.1 Fit summary**

Clicking the Fit Summary button starts the regression calculations that fit all of the polynomial models to the chosen response. The program calculates the effects of all model terms and produces statistics such as p-values, lack of fit, and R-squared values for comparing the models. The fit summary output is shown on screen in a report which can also be printed and/or copied to another application. The "Suggested" model will be highlighted and noted by the program and set as the default model on the Model panel. We look for the following (**Table 10**):

• A higher-order model that explains significantly more of the variation in the response (small p-value).

**Figure 10.** *Std error of the design with interactions of the model parameters.*


**Figure 11.** *Starting the analysis.*


*Note: Aliased models should be avoided entirely.*

### **13.2 Sequential model sum of squares**

**Table 11** shows the sum of squares, degrees of freedom, mean square, F-value, and p-value of the design model. The Sequential Model Sum of Squares is the sum of the squared deviations from the mean for each model. The SS for the Mean is calculated first, followed by the Blocks (if applicable), Linear model, Quadratic model, Special Cubic, Cubic, Residuals, and Total.

For each source, the sum of squares divided by the degrees of freedom yields the mean square. This is used to compute the F-value for the models.
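As a worked illustration of these formulas (with made-up sums of squares, not the values from this chapter's tables), the mean square, F-value, and p-value for a set of candidate terms can be computed as follows:

```python
from scipy.stats import f

# Hypothetical sequential sums of squares and degrees of freedom (illustrative only)
ss_quadratic, df_quadratic = 120.5, 4   # extra SS explained by the quadratic terms
ss_residual, df_residual = 95.0, 9      # residual SS after fitting those terms

ms_quadratic = ss_quadratic / df_quadratic   # mean square = SS / df
ms_residual = ss_residual / df_residual

f_value = ms_quadratic / ms_residual                  # F = MS(terms) / MS(residual)
p_value = f.sf(f_value, df_quadratic, df_residual)    # upper-tail probability
print(f"F = {f_value:.2f}, p = {p_value:.4f}")
```

A small p-value here would justify keeping the quadratic terms in addition to the linear ones.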


**Table 10.** *Fit summary.*

*Application of Central Composite Design with Design Expert v13 in Process Optimization DOI: http://dx.doi.org/10.5772/intechopen.109704*


#### **Table 11.**

*Sequential model sum of squares.*

The F-value is used to test the significance of adding new model terms to those already in the model. For instance, the significance of the linear terms is tested after removing the effect of the average and the blocks. Then, the significance of the quadratic terms is tested after removing the average, block, and linear effects, and so on. Select the highest-order polynomial where the additional terms are significant and the model is not aliased.

#### **13.3 Model summary statistics**

R-squared is the coefficient of determination for the model. It should be close to one. We recommend using the Adjusted R-squared for DOE evaluation.

The amount of variation that can be explained by the model is shown by the adjusted R-squared. This is the R-squared value after adjusting for how many terms are in the model relative to the number of design points. The Model summary statistics is shown in **Table 12**.

Predicted R-squared is calculated from the PRESS statistic; it represents the amount of variation in new data explained by the model. A negative Predicted R-squared means that the overall mean is a better predictor than this model.

Focus on the model maximizing the **Adjusted R<sup>2</sup>** and the **Predicted R<sup>2</sup>**.
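These statistics can be reproduced from any least-squares fit. The sketch below uses an assumed toy dataset and computes R-squared, Adjusted R-squared, and Predicted R-squared, obtaining PRESS through the leverage shortcut for leave-one-out residuals:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 12, 3                      # runs and model terms (including the intercept)
X = np.column_stack([np.ones(n), rng.uniform(-1, 1, (n, 2))])
y = X @ np.array([5.0, 2.0, -1.0]) + rng.normal(0, 0.5, n)   # toy response

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
ss_res = resid @ resid
ss_tot = np.sum((y - y.mean()) ** 2)

H = X @ np.linalg.inv(X.T @ X) @ X.T        # hat matrix
h = np.diag(H)                              # leverage of each run
press = np.sum((resid / (1 - h)) ** 2)      # PRESS via the leave-one-out shortcut

r2 = 1 - ss_res / ss_tot
r2_adj = 1 - (ss_res / (n - p)) / (ss_tot / (n - 1))
r2_pred = 1 - press / ss_tot
print(f"R2 = {r2:.3f}, adjusted = {r2_adj:.3f}, predicted = {r2_pred:.3f}")
```

Because PRESS is always at least as large as the residual sum of squares, Predicted R-squared can never exceed R-squared, and it goes negative exactly when the model predicts new points worse than the overall mean would.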

#### **13.4 Lack of fit tests**

The data for the Lack of Fit test is shown in **Table 13**. This is the p-value associated with the Lack of Fit calculation for this model. The best model should have an insignificant p-value; a typical cutoff is a p-value > 0.10 to conclude an insignificant lack of fit.

The selected model should have an insignificant lack of fit.

#### **13.5 ANOVA for quadratic model**

**Table 14** shows the ANOVA data, which is used to test the significance of the result obtained. The model probability (a.k.a. p-value) is the probability that the model F statistic is at least the computed value when in truth there are no factor effects (the data produced false effects). Probabilities less than the acceptable risk (alpha, by default 0.05) are deemed significant and indicate that there is a model effect. Values greater than the alpha risk suggest no significant effect.


#### **Table 12.**

*Model summary statistics.*


### **Table 13.**

*Lack of fit tests.*

The degree to which the model fits the data is measured by lack of fit. A strong lack of fit (p < 0.05) is an undesirable property because it shows that the model does not fit the data well. It is desirable to have little lack of fit (p > 0.10).

The **model's F-value** of 2.51 implies the model is not significant relative to the noise. There is a 10.49% chance that an F-value this large could occur due to noise.


#### **Table 14.** *ANOVA for quadratic model.*



### **Table 15.**

*Fit statistics.*

**Model terms** are considered significant when the **p-value** is less than 0.0500. B<sup>2</sup> is a significant model term in this case. Model terms are not significant if the value is greater than 0.1000. Model reduction may improve the model if it contains many insignificant terms (except those needed to maintain hierarchy).

The **Lack of Fit F-value** of 20.69 indicates that the lack of fit is significant. A Lack of Fit F-value this large could be caused by noise in only 1.57% of cases. A significant lack of fit is undesirable; we want the model to fit.

A negative Predicted R<sup>2</sup>, as shown in the fit statistics in **Table 15**, implies that the overall mean may be a better predictor of your response than the current model. In some cases, a higher-order model might be more accurate.

**Adeq Precision:** Adeq Precision measures the signal-to-noise ratio. A ratio of at least 4 is desirable. Our ratio of 4.822 indicates an adequate signal, so this model can be used to navigate the design space.
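One commonly published form of this statistic is Adeq Precision = (max ŷ − min ŷ) / √(p·σ̂²/n), where p is the number of model terms, n the number of runs, and σ̂² the residual mean square. The sketch below uses illustrative numbers, not the values from this chapter's tables:

```python
import numpy as np

# Illustrative values only: predicted responses at the design points,
# residual mean square (sigma^2 hat), and number of model terms p
y_hat = np.array([72.1, 88.4, 90.6, 65.3, 84.0, 79.9, 86.7, 91.2])
sigma2_hat, p = 12.5, 6
n = len(y_hat)

# Adeq Precision = (max prediction - min prediction) / sqrt(p * sigma^2 / n)
adeq_precision = (y_hat.max() - y_hat.min()) / np.sqrt(p * sigma2_hat / n)
print(f"Adeq Precision = {adeq_precision:.3f}")   # > 4 suggests an adequate signal
```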

## **14. Decision**

From the ANOVA result, it is obvious the model cannot be deployed as is; we need to adjust it before using it for optimization, or else the solutions it provides will be misleading. We will remove the interaction terms from the model since they have lower power (see **Table 8**). We will only repeat the ANOVA section after this change (**Table 16**).

### **14.1 ANOVA for reduced quadratic model**

The model is significant, as indicated by the model's F-value of 3.57. The likelihood of noise producing an F-value this large is only 3.25%.

Model terms are considered significant when the p-value is less than 0.0500. In this case, B<sup>2</sup> is a significant model term. Model terms are not significant if the value is higher than 0.1000. Model reduction may improve your model if it contains many insignificant terms (excluding those necessary to maintain hierarchy).

The Lack of Fit F-value of 16.87 implies the lack of fit is significant. There is only a 2.02% chance that a Lack of Fit F-value this large could occur due to noise. A significant lack of fit is undesirable; we want the model to fit. However, this is an improvement over the full quadratic model, so we can work with this model (**Table 17**).

### **14.2 Fit statistics for RQM**

A negative **Predicted R<sup>2</sup>** implies that the overall mean may be a better predictor of your response than the current model. In some cases, a higher-order model may also predict better.


#### **Table 16.**

*ANOVA for reduced quadratic model.*

**Adeq Precision** measures the signal-to-noise ratio. A ratio greater than 4 is desirable. Our ratio of 5.459 indicates an adequate signal, so this model can be used to navigate the design space. There is also an improvement in the Adjusted R<sup>2</sup> using this reduced quadratic model.

### **14.3 Coefficients in terms of coded factors**

The coefficient estimates in **Table 18** show the anticipated change in response per unit change in the factor value. In an orthogonal design, the intercept is the overall average response of all the runs, and the coefficients are adjustments around that average based on the factor settings. The VIFs are 1 when the factors are orthogonal; VIFs greater than 1 indicate multi-collinearity, and the higher the VIF, the more severe the correlation among factors. VIFs of less than 10 are generally acceptable.
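VIFs can be computed directly from the model matrix: the VIF of term j is 1/(1 − R<sub>j</sub>²), where R<sub>j</sub>² comes from regressing column j on the other columns. A compact equivalent takes the diagonal of the inverse correlation matrix of the non-intercept columns. This sketch assumes a small hypothetical two-level design:

```python
import numpy as np

# Hypothetical model-matrix columns: coded factors A, B and their interaction AB
A = np.array([-1, 1, -1, 1, -1, 1, -1, 1], float)
B = np.array([-1, -1, 1, 1, -1, -1, 1, 1], float)
Z = np.column_stack([A, B, A * B])        # non-intercept terms only

# VIFs are the diagonal of the inverse correlation matrix of the term columns
R = np.corrcoef(Z, rowvar=False)
vifs = np.diag(np.linalg.inv(R))
print(vifs)   # an orthogonal design gives VIF = 1 for every term
```

Replacing any column with one correlated to another (for example, dropping runs so A and AB are no longer balanced) would push the corresponding VIFs above 1.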

## **14.4 Final equation in terms of coded factors**

You can apply the equation in terms of coded factors in **Table 19** to make predictions about the response for given levels of each factor. By default, the factors' high levels are coded as +1 and their low levels as −1. By comparing the factor coefficients, the coded equation can be used to determine the relative importance of the factors.
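The coding is a simple linear rescaling, coded = (actual − center) / half-range, and its inverse recovers the original units. The range below is a hypothetical example, not the levels from this study:

```python
def to_coded(actual, low, high):
    """Map an actual factor value to coded units: low -> -1, high -> +1."""
    return (actual - (high + low) / 2) / ((high - low) / 2)

def to_actual(coded, low, high):
    """Inverse mapping from coded units back to the original units."""
    return (high + low) / 2 + coded * (high - low) / 2

# Hypothetical temperature range of 50-70 degrees C (illustrative only)
print(to_coded(70, 50, 70))    # high level   -> 1.0
print(to_coded(50, 50, 70))    # low level    -> -1.0
print(to_actual(0.0, 50, 70))  # center point -> 60.0
```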


**Table 17.** *Fit statistics for RQM.*


#### **Table 18.**

*Coefficients in terms of coded factors.*

#### **14.5 Final equation in terms of actual factors**

The equation in terms of actual factors in **Table 20** can be used to make predictions about the response for given levels of each factor. Here, the levels should be specified in the original units of each factor. This equation should not be used to determine the relative importance of each factor because the coefficients are scaled to accommodate the units of each factor and the intercept is not at the center of the design space.

#### **14.6 Diagnostics plots**

Externally studentized residuals are the default; raw residuals and internally studentized residuals are also available. Unless all the runs in a design have the same leverage, the standard errors of the residuals differ, so each raw residual comes from a different population (one for each different standard error). As a result, raw residuals should not be used to validate the regression assumptions. Studentizing the residuals maps all of the individual normal distributions onto a single standard normal distribution. Externally studentized residuals, which are based on a deletion procedure, are the default because they are more sensitive to detecting problems with the analysis; internally studentized residuals are less sensitive to such problems. The diagnostics plots from the Design Expert software are shown in **Figure 12**.


#### **Table 19.** *Final equation in terms of coded factors.*


#### **Table 20.**

*Final equation using actual factors.*

**Normal Probability**: The normal probability plot indicates whether the residuals follow a normal distribution, in which case the points will follow a straight line. Expect some scatter even with normal data. Focus only on distinct patterns, such as an "S-shaped" curve, which suggests that a transformation of the response might lead to a more accurate analysis.

**Figure 12.** *Diagnostics plots.*


**Residuals vs. Predicted**: This is a plot of the residuals versus the ascending predicted response values. It tests the assumption of constant variance. The plot should show random scatter (residuals should have a constant range across the graph). Expanding variance in this plot (a "megaphone pattern") suggests that a transformation is required.

**Predicted vs. Actual**: A plot of the predicted versus actual response values. Its purpose is to detect a value, or group of values, that is not easily predicted by the model.

**Leverage**: A measure of each point's influence on the model's fit. When a point's leverage is 1, the model exactly fits the observation at that point, so that point strongly influences the model. A run with more than twice the average leverage is generally regarded as having high leverage; such runs lie in regions of the factor space with few other runs. The average leverage is calculated by dividing the number of terms in the model by the number of design runs.
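Leverage and the studentized residuals described in Section 14.6 both come from the hat matrix H = X(X'X)⁻¹X'. The sketch below uses an assumed toy fit; the deletion-variance shortcut gives the externally studentized residuals without refitting the model n times:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 10, 3                     # runs and model terms (toy fit, assumed data)
X = np.column_stack([np.ones(n), rng.uniform(-1, 1, (n, 2))])
y = X @ np.array([1.0, 0.5, -0.5]) + rng.normal(0, 0.2, n)

H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)                   # leverage of each run
avg_leverage = p / n             # number of model terms / number of runs

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta                 # raw residuals
mse = e @ e / (n - p)

r_int = e / np.sqrt(mse * (1 - h))                       # internally studentized
s2_del = ((n - p) * mse - e**2 / (1 - h)) / (n - p - 1)  # deletion variances
r_ext = e / np.sqrt(s2_del * (1 - h))                    # externally studentized

print("high-leverage runs:", np.where(h > 2 * avg_leverage)[0])
```

Note that the leverages always sum to the number of model terms, which is why the average leverage is p/n.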

#### **14.7 Model graphs**

All the model graphs which can be used to drive insights on the responses for all input data are shown in **Figures 13**–**18** respectively.

## **15. Optimization**

Here, our goal is to maximize the biodiesel yield using the given factors in the ranges (lower and upper levels) summarized in **Table 21** below.

**Figure 13.** *All factors response.*

#### *Response Surface Methodology - Research Advances and Applications*

**Figure 14.** *Interactions.*

**Figure 15.** *Contour plot.*


**Figure 16.** *Predicted vs. actual.*

**Figure 17.** *3D surface plot.*

#### **Figure 18.** *Cube plot.*


**Table 21.**

*Constraints.*

## **15.1 Solutions**

The Design Expert software iterated over the full ranges of the factors and found the maximum yield. There are 100 possible solutions; we will select the one suggested by the software, shown in **Table 22** below.

**Table 22.** *Optimization solutions.*

## **16. Conclusion**

In this chapter, we have extensively applied central composite design to optimize biodiesel synthesis using a catalyst, and Design Expert 13 has been used to provide deep statistical analysis. A reduced quadratic model with a significant p-value of 0.0325 was accepted since the full quadratic model had an insignificant p-value. The model is significant, as indicated by the model's F-value of 3.57; an F-value this large might be caused by noise in only 3.25% of cases. The number of experimental runs was reduced to 18 compared with the 20 runs used by the original experimenters, and we also obtained a higher yield of 91% compared with the 89% obtained in the original study.

## **Acknowledgements**

I acknowledge my co-author, Dr. C.N. Njoku, for his help and support in gathering this information and for inspiring the success of this work. I also want to extend my deep gratitude to my mother for always providing her special support in the little ways she could.

## **Author details**

Chigoziri N. Njoku<sup>1</sup> and Samuel K. Otisi<sup>2</sup> \*

1 Africa Centre of Excellence in Future Energies and Electrochemical Systems (ACE-FUELS), Federal University of Technology, Owerri, Nigeria

2 Department of Chemical Engineering, Federal University of Technology, Owerri, Nigeria

\*Address all correspondence to: samuelotisikalu@gmail.com

© 2023 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


## **References**

[1] Bhattacharya S. Central composite Design for Response Surface Methodology and its Application in pharmacy. In: Response Surface Methodology in Engineering Science. London, UK: IntechOpen; 2021. DOI: 10.5772/INTECHOPEN.95835

[2] Wikipedia contributors. Central composite design. In: Wikipedia, The Free Encyclopedia. 2020. Available from: https://en.wikipedia.org/w/index.php?title=Central_composite_design&oldid=954106283 [Accessed: May 14, 2022]

[3] Skartland LK, Mjos SA, Grung B. Experimental designs for modeling retention patterns and separation efficiency in the analysis of fatty acid methyl esters by gas chromatography-mass spectrometry. Journal of Chromatography A. 2011;**1218**:6823-6831

[4] Tshizanga N, Aransiola EF, Oyekola O. Optimization of biodiesel production from waste vegetable oil and eggshell ash. South African Journal of Chemical Engineering. 2017;**23**:145-156. DOI: 10.1016/j.sajce.2017.05.003

[5] Manohar M, Joseph J, Selvaraj T, Sivakumar D. Application of box Behnken design to optimize the parameters for turning Inconel 718 using coated carbide tools. International Journal of Scientific and Engineering Research. 2013;**4**(620):642

[6] Breyfogle FW. Chapter 17. In: Statistical Methods for Testing, Development, and Manufacturing. New York: John Wiley & Sons; 1992. 252 p

[7] Singh B, Kumar R, Ahuja N. Optimizing drug delivery systems using systematic "design of experiments." Part I: Fundamental aspects. Critical Reviews in Therapeutic Drug Carrier Systems. 2005;**22**(1):27-105

[8] Cavazzuti M. Design of experiments. In: Optimization Methods. Berlin, Heidelberg: Springer; 2013. pp. 13-42

[9] Hassanein HM, Abd-Rabou AS, Sakr SM. Design optimization of transverse flux linear motor for weight reduction and performance improvement using response surface methodology and genetic algorithms. IEEE Transactions on Energy Conservation. 2010;**25**(3): 598-605

[10] Anderson MJ, Whitcomb PJ. RSM Simplified. New York: Productivity, Inc.; 2016

[11] De Gryze S, Langhans I, Vandebroek M. Using the correct intervals for prediction: A tutorial on tolerance intervals for ordinary least-squares regression. Chemometrics and Intelligent Laboratory Systems. 2007;**87**(2):147-154

[12] Zahran A, Anderson-Cook CM, Myers RH. Fraction of design space to assess prediction capability of response surface designs. Journal of Quality Technology. 2003;**35**(4):377-386
