**3. Experimental results**

The following results correspond to the parameter estimates for the probability distributions fitted for the two classes of tasters, as well as the *p*-values referring to the validation of the probabilistic model fitted for the sensory scores.

With these specifications, at a significance level of 1%, the fit is confirmed for the sensory scores of each coffee; there is therefore no statistical evidence against the GEV distribution as a model for the maximum sensory grades of the evaluated coffees (**Table 3**). Note that *p*-values greater than 1% in the KS test mean that the test's null hypothesis is not rejected, as described in Section 2. According to [30], however, this test should only be used for completely specified distributions, that is, when no unknown parameters need to be estimated from the sample. Otherwise, the test is very conservative. One solution would be to obtain, via simulation, the quantiles of the Kolmogorov–Smirnov statistic under the null hypothesis and to compare them with the value obtained from the sample. A similar procedure for the Gumbel distribution was carried out by [31].
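The simulation idea mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of [31] or of this chapter: it uses the Gumbel distribution with method-of-moments estimates instead of a GEV maximum likelihood fit, so that it needs only the Python standard library. The function names, the number of simulations and the seed are hypothetical. Because the estimators are location-scale equivariant, the null distribution of the statistic can be simulated from the standard Gumbel distribution.

```python
import math
import random
import statistics


def gumbel_moment_fit(xs):
    """Method-of-moments estimates (mu, sigma) for a Gumbel distribution."""
    sigma = statistics.stdev(xs) * math.sqrt(6.0) / math.pi
    mu = statistics.fmean(xs) - 0.5772156649015329 * sigma  # Euler-Mascheroni constant
    return mu, sigma


def ks_statistic(xs, cdf):
    """Two-sided Kolmogorov-Smirnov distance between a sample and a CDF."""
    xs = sorted(xs)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def ks_fitted_gumbel(xs):
    """KS distance when the Gumbel parameters are estimated from the same sample."""
    mu, sigma = gumbel_moment_fit(xs)
    return ks_statistic(xs, lambda x: math.exp(-math.exp(-(x - mu) / sigma)))


def simulated_critical_value(n, alpha=0.01, n_sim=2000, seed=1):
    """Upper-alpha quantile of the KS distance under H0, refitting on every simulated sample."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_sim):
        sample = []
        while len(sample) < n:
            u = rng.random()
            if u > 0.0:  # avoid log(0)
                sample.append(-math.log(-math.log(u)))  # standard Gumbel draw
        stats.append(ks_fitted_gumbel(sample))
    stats.sort()
    return stats[min(n_sim - 1, math.ceil((1.0 - alpha) * n_sim) - 1)]
```

Under this sketch, the null hypothesis would be rejected at level `alpha` when `ks_fitted_gumbel(observed_scores)` exceeds `simulated_critical_value(len(observed_scores), alpha)`. Applying the same idea to the GEV fit would require replacing the moment estimator with the estimation method actually used.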


#### **Table 3.**

*Parameter estimates and results of the Kolmogorov–Smirnov and Ljung–Box tests for the maximum scores given by consumers in the sensory evaluation of the specialty coffees A, B, C and D.*

| Coffee | Group | $\hat{\mu}$ | $\hat{\sigma}$ | $\hat{\xi}$ | KS (*p*-value) | LB (*p*-value) |
|---|---|---|---|---|---|---|
| A | Untrained | 5.9471 | 2.4105 | -0.6569 | 0.9077 | 0.0803 |
| A | Trained | 6.9345 | 1.6259 | -0.6156 | 0.8908 | 0.2359 |
| B | Untrained | 5.9326 | 2.4624 | -0.5721 | 0.9466 | 0.3306 |
| B | Trained | 6.6031 | 1.9455 | -0.5736 | 0.8255 | 0.0110 |
| C | Untrained | 6.4290 | 2.2108 | -0.6348 | 0.9485 | 0.6084 |
| C | Trained | 7.0595 | 1.3382 | -0.5485 | 0.9998 | 0.9823 |
| D | Untrained | 7.8676 | 2.0437 | -0.9582 | 0.9962 | 0.9625 |
| D | Trained | 7.8113 | 1.8183 | -0.8221 | 0.6543 | 0.6924 |

Alternatively, the quality of fit can be inspected via Q-Q plots, which are shown in **Figure 1**.

#### **Figure 1.**

*Q-Q plots of the fitted GEV distributions for the maximum sensory scores obtained in the evaluation of each specialty coffee, for the groups of untrained (left) and trained (right) tasters.*

The adequacy of the GEV distribution is corroborated by the Q-Q plots in **Figure 1**: for all the specialty coffees evaluated, the theoretical quantiles follow a linear pattern close to the identity line when plotted against the observed quantiles, and most points fall within the 95% confidence band. The quantiles also tend to converge toward a region in the upper tail. All *p*-values of the Ljung–Box test are higher than 1%, so the null hypothesis of the test is not rejected, as described in Section 2. It can be concluded, therefore, that the maximum scores given by trained and untrained tasters are independent. We used these tests to verify the assumptions of the Extreme Value Theory models, but they can serve other purposes, such as the trend analysis of hydro-climatic series [32–34]. When these assumptions are not met, the parameter estimates of the fitted models, as well as the return level estimates, may be biased (under- or overestimated). In such situations, Bayesian methods, regression, or time series models based on the Box–Jenkins methodology could be considered [35, 36].
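For reference, the Ljung–Box statistic for $h$ lags is $Q = n(n+2)\sum_{k=1}^{h} \hat{\rho}_k^2/(n-k)$, which is approximately chi-square distributed with $h$ degrees of freedom under independence. The sketch below is an illustration only: the function name is hypothetical, the number of lags used in the chapter is not stated in this section, and the *p*-value relies on a closed form of the chi-square survival function that holds only for an even number of lags.

```python
import math


def ljung_box(xs, lags):
    """Return the Ljung-Box statistic Q and its chi-square p-value.

    The p-value uses the closed-form chi-square survival function that
    is valid only for an even number of degrees of freedom (lags).
    """
    if lags % 2 != 0 or lags < 2:
        raise ValueError("this sketch supports an even number of lags only")
    n = len(xs)
    mean = sum(xs) / n
    dev = [x - mean for x in xs]
    denom = sum(d * d for d in dev)
    q = 0.0
    for k in range(1, lags + 1):
        # Sample autocorrelation at lag k.
        rho_k = sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        q += rho_k * rho_k / (n - k)
    q *= n * (n + 2)
    # Survival function for df = 2m: exp(-q/2) * sum_{j<m} (q/2)^j / j!
    half = q / 2.0
    term, total = 1.0, 1.0
    for j in range(1, lags // 2):
        term *= half / j
        total += term
    return q, math.exp(-half) * total
```

A *p*-value above the chosen significance level (1% in this chapter) means that independence of the scores is not rejected.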


In view of the results confirming the goodness of fit of the GEV distribution, and given the parameter estimates of this distribution fitted to the maximum sensory scores given by consumers for each coffee, we calculated the probability that an individual assigns a grade higher than a given grade. The results are described in **Table 4**.

Before that, the modes of the distributions were calculated, as shown in **Table 4**, in order to verify the similarity between the grades provided by trained and untrained tasters. In some cases the modes are very close. For specialty coffees A and B, the most frequent grade was higher for trained tasters than for untrained tasters, and for specialty coffees C and D the opposite occurred.
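The modes and exceedance probabilities discussed above follow from the standard GEV formulas: for shape $\xi \neq 0$, the distribution function is $F(x) = \exp\{-[1 + \xi (x - \mu)/\sigma]^{-1/\xi}\}$ and, for $\xi > -1$, the mode is $\mu + \sigma[(1 + \xi)^{-\xi} - 1]/\xi$. The sketch below is a hedged illustration with hypothetical function names, applied to the coffee A (untrained) estimates reported in Table 3. It should give a mode near the tabulated 7.8 points and exceedance probabilities close to, though not identical to, the tabulated values, presumably because the interval limits in Table 4 are rounded.

```python
import math


def gev_cdf(x, mu, sigma, xi):
    """CDF of the GEV distribution for a nonzero shape parameter xi."""
    t = 1.0 + xi * (x - mu) / sigma
    if t <= 0.0:
        # Outside the support: above the upper endpoint when xi < 0,
        # below the lower endpoint when xi > 0.
        return 1.0 if xi < 0 else 0.0
    return math.exp(-t ** (-1.0 / xi))


def gev_mode(mu, sigma, xi):
    """Mode of the GEV density (valid for nonzero xi greater than -1)."""
    return mu + sigma * ((1.0 + xi) ** (-xi) - 1.0) / xi


def exceedance(q, mu, sigma, xi):
    """P[X > q] under the fitted GEV distribution."""
    return 1.0 - gev_cdf(q, mu, sigma, xi)


# Coffee A, untrained tasters: estimates as reported in Table 3.
mu, sigma, xi = 5.9471, 2.4105, -0.6569
mode_a = gev_mode(mu, sigma, xi)
p_low = exceedance(6.9, mu, sigma, xi)   # q0.025 from Table 4
p_high = exceedance(8.9, mu, sigma, xi)  # q0.975 from Table 4
print(round(mode_a, 1), round(100 * p_low, 1), round(100 * p_high, 1))
```

Note that some libraries parameterize the GEV shape with the opposite sign, so the sign of $\xi$ should be checked before reusing these estimates elsewhere.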

Although the modes of the grades attributed by the two groups of tasters appear similar, this similarity is not associated with any level of confidence, since it rests on point estimates only. To address this, confidence intervals were constructed using the non-parametric Bootstrap method, as shown in Section 2. Thus, it can be stated with 95% confidence that the grades most frequently attributed to coffee A by trained and untrained tasters do not differ statistically, since the point estimate of the mode of the scores is contained in the respective confidence intervals and the intervals overlap.
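As a reminder of the percentile Bootstrap interval described in Section 2: a sample is drawn with replacement from the data, the statistic is recomputed, these two steps are repeated $B$ times, and the interval limits are the order statistics $\hat{\theta}^{*}_{(k_1)}$ and $\hat{\theta}^{*}_{(k_2)}$ with $k_1 = \lfloor (B+1)\alpha/2 \rfloor$ and $k_2 = \lfloor (B+1)(1-\alpha/2) \rfloor$. The sketch below is a minimal illustration, not the authors' code: the chapter reports using the R packages *boot* and *evd* with the mode $Mo(X)$ as the statistic, whereas here the function name, the data and the stand-in statistic (the sample median) are hypothetical.

```python
import math
import random
import statistics


def percentile_bootstrap_ci(data, statistic, n_boot=999, alpha=0.05, seed=0):
    """Percentile bootstrap interval [theta*_(k1), theta*_(k2)] for a statistic."""
    rng = random.Random(seed)
    n = len(data)
    # Steps 1-3: resample with replacement and recompute the statistic B times.
    reps = sorted(statistic(rng.choices(data, k=n)) for _ in range(n_boot))
    # Step 4: order statistics at k1 = floor((B+1)*alpha/2), k2 = floor((B+1)*(1-alpha/2)).
    eps = 1e-9  # guards against floating-point results such as 24.999999
    k1 = max(1, math.floor((n_boot + 1) * alpha / 2 + eps))
    k2 = min(n_boot, math.floor((n_boot + 1) * (1 - alpha / 2) + eps))
    return reps[k1 - 1], reps[k2 - 1]


# Hypothetical scores, for illustration only (not the chapter's data).
scores = [6.5, 7.0, 7.2, 7.8, 7.8, 8.0, 8.1, 8.3, 8.5, 9.0, 9.2, 9.5]
low, high = percentile_bootstrap_ci(scores, statistics.median)
```

To mimic the chapter's analysis, `statistic` would have to refit the GEV distribution to each resample and return its mode, which is considerably more expensive than the median used here.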


#### **Table 4.**

*Modes of the maximum scores given by consumers in the sensory panel of specialty coffees A, B, C and D, and their respective 95% confidence intervals (q0.025 and q0.975).*

| Coffee | Group | Mode | q0.025 | q0.975 | P[X > q0.025] (%) | P[X > q0.975] (%) | Difference (%) |
|---|---|---|---|---|---|---|---|
| A | Untrained | 7.8 | 6.9 | 8.9 | 47.2 | 7.7 | 39.5 |
| A | Trained | 8.1 | 7.3 | 9.0 | 54.1 | 8.2 | 45.8 |
| B | Untrained | 7.6 | 6.2 | 9.2 | 59.2 | 8.2 | 51.0 |
| B | Trained | 7.9 | 6.6 | 9.2 | 62.4 | 7.1 | 55.4 |
| C | Untrained | 8.1 | 7.2 | 9.0 | 47.6 | 9.5 | 38.1 |
| C | Trained | 7.9 | 7.1 | 8.9 | 62.2 | 8.1 | 54.1 |
| D | Untrained | 9.9 | 9.1 | 10.0 | 32.8 | 0.2 | 32.6 |
| D | Trained | 9.5 | 8.6 | 10.0 | 44.5 | 0.7 | 43.9 |

*The probabilities P[X > q0.025], P[X > q0.975] and Difference are given in percentages.*

More specifically, for coffee A the initial mode estimate for untrained tasters is $\hat{\theta}_0 = Mo(X) = 7.8$ points, which is contained in the 95% Bootstrap confidence interval for the mode ($\hat{\theta}^{*}_{(2.5\%)} = 6.9$ points and $\hat{\theta}^{*}_{(97.5\%)} = 8.9$ points). Likewise, for trained tasters the initial mode estimate is $\hat{\theta}_0 = Mo(X) = 8.1$ points, which is contained in the corresponding 95% Bootstrap confidence interval ($\hat{\theta}^{*}_{(2.5\%)} = 7.3$ points and $\hat{\theta}^{*}_{(97.5\%)} = 9.0$ points). This indicates that the scores attributed to coffee A by trained and untrained tasters are similar with 95% confidence.

According to the results in **Table 4**, for a sensory panel made up of untrained consumers there is a high probability that a consumer assigns a sensory score higher than 6.0. Whatever the type of taster, none of the specialty coffees studied is therefore likely to be classified below the Specialty Grade, since all the coffees analyzed showed a high probability that the most frequent grade is higher than 6. On a 9-point verbal hedonic scale, it can be concluded that, in general, consumers tend to be indifferent to the pleasantness of specialty coffees.

**Figure 2** presents the histogram and the Q-Q plot for the mode of the distribution fitted to the grades given by the untrained tasters for coffee A in **Table 1**. The histogram suggests that the empirical distribution of $\theta = Mo(X)$ is approximately normal, and this is corroborated by the Q-Q plot, in which the observed quantiles keep a one-to-one relationship with the quantiles of the standard normal distribution. Similar results were observed for all other specialty coffees, but they are not shown.

#### **Figure 2.**

*Histogram of the 5000 values of the bootstrap modes for the scores of untrained tasters and the respective normal Q-Q plot.*

When considering a score worthy of international competitions, with a reference higher than 8, the probability that a consumer assigns a grade higher than 8, that is, that the coffee is classified as excellent, is relatively low for all evaluated coffees (**Table 4**). For coffee D, the probability that a consumer assigns a grade above 9.1, up to the scale maximum of 10.0, is 32.8%; in other words, the probability that coffee D is considered exceptional by a consumer is 32.8%. In addition, coffee D has the smallest range in probability, corresponding to the column "Difference" in **Table 4**, which indicates low variability among the grades attributed by the tasters. Coffee B, on the other hand, showed the greatest variability among the grades attributed by the tasters, since its difference *P[X > q0.025] - P[X > q0.975]* is the largest among the analyzed coffees.

Therefore, given the low probabilities found for the four specialty coffees, a sensory experiment intended to discriminate among specialty coffees is better carried out with consumers who have more advanced training.

**Figure 3** shows graphically the agreement between the scores given by trained (blue hatched) and untrained (black hatched) tasters, according to the results shown in **Table 3**.

Bootstrap procedures are relevant for statistically validating scores obtained in international competitions, because subjective and/or unknown factors related to the different sensory perceptions of the tasters may violate the assumed sampling distribution and, as a consequence, distort the estimates of the probabilistic model. Through successive resampling, an empirical distribution is generated for each parameter in connection with the assumed probabilistic model, so that inferences can be made with better precision and accuracy. The width of the confidence interval in **Figure 3** reflects the precision of the estimates of the mode of the maximum scores, given the GEV distribution. Other bootstrap confidence intervals could be considered, such as bootstrap-*t* and BCa [37, 38]. We emphasize that the strategy adopted is innovative in the context of sensory scores, and the comparison of confidence intervals is left as future work.

#### **Figure 3.**

*Graphical representation of the probability distributions fitted to the scores attributed by untrained (black) and trained (blue) tasters and the respective 95% confidence intervals for the modes of the maximum scores attributed to each specialty coffee.*

**4. Conclusions and final remarks**

The GEV distribution can be applied to the sensory analysis of specialty coffees whose sensory panel is heterogeneous among consumers.

The probabilities obtained from this distribution indicate that untrained consumers are able to differentiate specialty coffees and provide scores similar to those given by consumers with prior training.

The proposed inference made it possible to attach a degree of uncertainty to the occurrence of sensory scores for the different types of specialty coffees studied and to indicate, with high probability, which group each coffee belongs to according to the Specialty Grade.

More intensive training of tasters, or the application of the proposed methodology with internationally certified tasters, is recommended for assessing specialty coffees against a reference score of 9 points, since in the present study only coffee D has a high probability of reaching this score. According to the analysis protocol provided by the Specialty Coffee Association of America, the results of the sensory evaluation follow a scale on which grades above 9 correspond to exceptional coffee.

The study has some limitations that provide directions for future research. Although the GEV distribution is specific for analyzing maximum values, the data-generating mechanism truncates the maximum score at 10. This characteristic could be taken into account by fitting the model to truncated data. Some proposals have appeared in the literature to consider truncation in maximum likelihood estimation, but there is no consolidated methodology yet. It therefore remains a possibility for future research.

**Acknowledgements**

The authors are grateful to the National Council for Scientific and Technological Development (CNPq, Conselho Nacional de Desenvolvimento Científico e Tecnológico), the Minas Gerais State Research Support Foundation (FAPEMIG, Fundação de Amparo para Pesquisa do Estado de Minas Gerais), the Coordination for the Improvement of Higher Education Personnel (CAPES, Coordenação de Aperfeiçoamento de Pessoal de Nível Superior), and the National Coffee Science and Technology Institute (INCT/Café, Instituto Nacional de Ciência e Tecnologia do Café).

**Conflict of interest**

The authors declare no conflict of interest.

*Recent Advances in Numerical Simulations*

*Intensive Computational Method Applied for Assessing Specialty Coffees by Trained… DOI: http://dx.doi.org/10.5772/intechopen.95234*
