Uncertainty is estimated using statistical tools. The peculiarity of the statistical instruments applicable in metrology is the essential role of a priori information in their work. The best way to obtain a priori information is a specially performed calibration experiment. In an ideal metrological experiment, the values of all model parameters are known and controlled except for one single parameter, whose value is estimated.

Statistics without a priori information cannot be used as a metrological tool. But the origin of the a priori information can vary. For example, a certain object may not depend in any way on the will of the observer, and, consequently, a calibration experiment is impossible. But it is possible to collect a lot of different data about this object and similar ones. Such data can only be used to classify the objects and to monitor their evolution. On the other hand, if we reformulate the accumulated database as a priori information for identifying an object class from new data, then this is already a metrological formulation of the problem. Estimating the absolute value characterizing the object is difficult because there is no direct traceability to the standard. But recognizing an object and estimating the magnitude of relative changes from a small amount of data can be formulated as a metrological task.

**2. Models**

Usually, the data of the working experiment on the subject of observation are not numerous, but there is a priori information obtained in the calibration experiment. It is assumed that by the time of the working experiment this information is still relevant. Comparing the data and the model, we can estimate the observed state of the object.

Habitual models of the measurement experiment are constructed from the principal *f* and stochastic *η* components, formally *M*(*f*, *η*). The principal component is a mathematical description

An effective method to compare the model used with the available data is to estimate the probability that the data are generated by a source corresponding to the model. This probability is interpreted, in particular, as an estimate of the reliability of a particular value of the investigated quantity, described in the a priori model as an adjustable parameter. In other words, it serves as an argument for the criterion used to choose, among many variants, the measurement model that provides a description of the object under study.

In this text, an analysis of the features of traditional statistical tools [1] and some new tools proposed to replace them is given. The merit of the new tools (in particular, the rank measure) is significantly better universality; their disadvantage is larger computing expense.

The rank measure was first proposed and intuitively grounded in [2]. In paper [3], it was formally justified. Some aspects of its application were discussed in Ref. [4]. Paper [5] describes the main tools and their applications for the method of converting the densities (MCD). In paper [6], the application of the rank measure to a type of experiment rarely used in metrology but widespread in technical disciplines is discussed: a simple interpretation of a dynamic experiment. Its main features are as follows: enough data are collected, and a minimum number of observable factors is required to evaluate the values of many parameters of the model.

94 Metrology

**3.1. Sectoral formula**

A trivial model with an unknown scattering parameter is, in accordance with mathematical statistics and the normative documents of metrology, identified according to the formula (we call it the sectoral formula) *x̄* = *s*(*D*) ± *k*·*S*(*D*), where *x̄* is the estimate of the value of the measured quantity in the form of a confidence interval, *D* is the experimental data, *s*(*D*) is the statistic used to estimate the value of the shift parameter of the distribution given by the model, *S*(*D*) is the statistic applied to estimate the scattering parameter and *k* is the coverage coefficient, which in general depends on the distribution law (both model and real) of the source of randomness, the number of repetitions of the experiment, both statistics, the confidence level and correction factors.
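As a minimal sketch, the sectoral formula with the normative statistics (arithmetic mean and standard deviation) and Student's coverage coefficient can be written as follows. The data values and the hard-coded *k* = 2.776 (Student's two-sided 0.95 quantile for *n* = 5, i.e. 4 degrees of freedom) are illustrative assumptions, as is taking *S*(*D*) as the standard deviation of the mean:

```python
import statistics


def sectoral_interval(data, k):
    """Confidence interval by the sectoral formula: s(D) +/- k*S(D).

    Here s(D) is the arithmetic mean and S(D) is taken as the standard
    deviation of the mean, stdev/sqrt(n) (an assumption of this sketch).
    """
    n = len(data)
    s = statistics.mean(data)              # shift statistic s(D)
    S = statistics.stdev(data) / n ** 0.5  # scattering statistic S(D)
    return s - k * S, s + k * S


# Five repetitions; k = t_0.975(4) ~ 2.776 depends only on the number of
# repetitions and the confidence level, as stated above.
D = [1.02, 0.97, 1.10, 0.95, 1.01]
low, high = sectoral_interval(D, k=2.776)
print(f"x = [{low:.3f}, {high:.3f}]")
```

For other choices of *s*, *S* or source distribution, only *k* changes; that dependence is the subject of the illustrations below.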

The property of the formula is illustrated in **Figure 1**. In this figure, the cloud of possible results of a multiple experiment is calculated by MCM and is delineated by means of the formula. The formula is linear; therefore, it divides the cloud of estimates into two regions with oblique boundaries.


A New Statistical Tool Focused on Metrological Tasks http://dx.doi.org/10.5772/intechopen.74872


The change in the coefficient of coverage will lead to a shift in the boundaries of the blue and red sectors, and a corresponding change in the confidence probability is due to a change in the ratio of the shares of estimates within and outside the confidence interval.

The advantage of the formula is that, whatever the dispersion of the source of randomness, it still yields its 95% of correct estimates. This is illustrated by the superposition of clouds with different dispersions.

The disadvantage is the strong dependence of the error probability on the standard deviation. If by the will of chance the data lie close together, then the probability of error is large, greater than its nominal level. If the data are very scattered, then the confidence interval is too wide, and the actual probability of making a mistake is negligible. The confidence interval is located, at the level of the value of the statistics, from the blue/red border to the red/blue border. But in the statistical limit, the confidence probability is met. Intuitively, it is believed that it is precisely the extreme values of the cloud of estimates that are discarded, but in reality this is not so. The paradox is that the probability of error is larger exactly where the data seem better, and vice versa.
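The statistical-limit property can be checked with a small Monte Carlo sketch. The trial count is reduced relative to the 10<sup>6</sup> tests of the figures, and the seed, counts and *k* = 2.776 (Student's value for *n* = 5, 0.95) are assumptions of the sketch:

```python
import random
import statistics


def coverage(mu, sigma, n=5, k=2.776, trials=20_000, seed=1):
    """Fraction of intervals mean +/- k*stdev/sqrt(n) that contain mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        D = [rng.gauss(mu, sigma) for _ in range(n)]
        s = statistics.mean(D)
        S = statistics.stdev(D) / n ** 0.5
        hits += s - k * S <= mu <= s + k * S
    return hits / trials


cov_narrow = coverage(mu=1.0, sigma=1.0)
cov_wide = coverage(mu=1.0, sigma=2.0)
# Both fractions come out near 0.95: the coverage does not depend on sigma.
```

This reproduces the superposition argument: the fraction of correct estimates stays at the confidence level for either dispersion, even though the individual interval widths differ greatly.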

**Figure 1.** Clouds of scattering of the results of estimates. The number of tests is 10<sup>6</sup>, the multiplicity of the experiment is 5, the source of chance has the normal distribution with *μ* = 1 and *σ* = 1 and 2 (denoted in the figure by different transparency), and the color markings for the confidence probability of 0.95 are blue (erroneous estimates) and red (correct, *μ* ∈ *r̄*). The statistics used are the arithmetic mean and the standard deviation, and the coverage coefficient is Student's coefficient, which depends only on the number of repetitions and the confidence level.

The illustration is given for the normal distribution and the normative statistics. For other distributions and other statistics, the scattering clouds of the results are different, sometimes quite bizarre. The coefficient of coverage should also take its own value, different from Student's; however, it is quite simple to calculate. Here are just a few simple illustrations. Let us replace the normal distribution with the very important uniform distribution. First, we apply to it the normative statistics [**Figure 2** (left and central)] and then the more suitable statistics of extrema *s* = (max(*D*) + min(*D*))/2 and *S* = (max(*D*) − min(*D*))/2 [**Figure 2** (right)].
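The extrema statistics can be compared with the normative ones in a short numerical sketch (the trial count and seed are assumptions). For a uniform source, the midrange *s* scatters less than the arithmetic mean:

```python
import random
import statistics

rng = random.Random(0)
mids, means = [], []
for _ in range(5_000):
    D = [rng.uniform(-1.0, 3.0) for _ in range(5)]  # U(min=-1, max=3)
    mids.append((max(D) + min(D)) / 2)              # extrema shift statistic s
    means.append(statistics.mean(D))                # normative shift statistic

var_mid = statistics.pvariance(mids)
var_mean = statistics.pvariance(means)
# var_mid < var_mean: for a uniform source the midrange is more efficient.
```

The theoretical values here are (b − a)²/(2(n + 1)(n + 2)) ≈ 0.19 for the midrange against (b − a)²/(12n) ≈ 0.27 for the mean, which the simulation approximately reproduces.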


Without going into numerical details, we give a few qualitative remarks on the illustrations. Although the scales of both the distributions and the clouds of estimates are comparable, the coverage coefficients are distinctly different. This can be judged from the tilt of the colored borders.

Clouds differ not only in form but also in size. The most compact cloud is given by the combination of a normal distribution with the normative statistics [**Figure 2** (left)], because this combination is optimal. The combination of a uniform distribution and normative statistics (central) is not optimal; hence, the cloud is scattered more. This loss of efficiency is not catastrophic, so this combination is used in practice. Normative statistics provide acceptable estimates for many finite distributions and many distributions with light tails, but there are distributions for which the efficiency is too small, for example, distributions with heavy tails. The combination of the uniform distribution and the statistics of extrema (right figure), although not optimal, is somewhat more efficient than in the previous example. But in practice this combination is not used, because the sectoral formula's cross section of the cloud leads to an unacceptable overestimation of the confidence interval. The reason is that the maximum cloud density in this example is at the vertex, whereas in the previous examples the maximum density is closer to the centres of the clouds. An effective algorithm for estimating the distribution of the scattering parameter could help, but because of the variability of the distribution form, mathematical statistics could not offer such an algorithm.
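A coverage coefficient different from Student's is indeed simple to calculate numerically: simulate the normalized deviation |*s*(*D*) − *μ*|/*S*(*D*) for the chosen source and take its quantile at the confidence level. A sketch with assumed trial counts and toy sources (both with known true shift *μ* = 1):

```python
import random
import statistics


def coverage_coefficient(sample, n=5, p=0.95, trials=20_000, seed=2):
    """p-quantile of |mean - mu| * sqrt(n) / stdev for a simulated source.

    `sample(rng)` draws one reading; mu = 1.0 is the known true shift of
    both toy sources below (an assumption of this sketch).
    """
    rng = random.Random(seed)
    mu = 1.0
    t = []
    for _ in range(trials):
        D = [sample(rng) for _ in range(n)]
        t.append(abs(statistics.mean(D) - mu) * n ** 0.5 / statistics.stdev(D))
    t.sort()
    return t[int(p * trials)]


k_normal = coverage_coefficient(lambda r: r.gauss(1.0, 1.0))      # ~ Student's 2.776
k_uniform = coverage_coefficient(lambda r: r.uniform(-1.0, 3.0))  # differs from it
```

For the normal source the numerical coefficient reproduces Student's value; for the uniform source it takes its own value, which is the point made above.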

De facto, the distribution form and both statistics are used as a single set. The situation can be interpreted in two ways. On the one hand, having the form of the distribution, we can choose or synthesize statistics more or less effectively. On the other hand, by selecting statistics from a certain set of tools, we actually choose a class of distribution forms for which the statistics are still effective. However, neither the value of the efficiency nor the form of the distribution can be precisely determined.

**Figure 2.** Clouds of scattering of the results of estimates, for the normal *N*(*x*, *μ* = 1, *σ* = 1) (left) and uniform *U*(*x*, min = −1, max = 3) (central and right) distributions. The number of tests is 10<sup>6</sup>, the multiplicity of the experiment is 5 and the color markings are for the confidence probability of 0.95.

## **3.2. Correction coefficients**

The normative toolset has yet another problem, which we call the mysterious amendment to the deviation. The deviation is recommended to be used not in its pure form but with a correction coefficient (yielding the so-called standard deviation). The explanation given is that this amendment eliminates the bias of the deviation relative to the dispersion of the normal random source. But few have noticed that this is not quite true.
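The residual bias is easy to exhibit numerically. In this minimal sketch (trial count, seed, *μ* and *σ* are arbitrary assumptions), even with the √(*n*/(*n* − 1)) amendment the mean of the corrected deviation for *n* = 5 normal readings falls noticeably below *σ*, because the amendment unbiases the squared deviation rather than the deviation itself:

```python
import random
import statistics

rng = random.Random(3)
n, sigma, trials = 5, 0.5, 50_000
total = 0.0
for _ in range(trials):
    D = [rng.gauss(2.0, sigma) for _ in range(n)]
    total += statistics.stdev(D)  # stdev already carries the sqrt(n/(n-1)) factor
avg_S = total / trials
# avg_S comes out near 0.94 * sigma = 0.47, not sigma = 0.5.
```

The factor 0.94 here is the known constant c₄(5) for the normal distribution; for other distribution forms it would be different again.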

Firstly, the distribution of the deviation is asymmetric, and its form changes, especially strongly for small numbers of repeated experiments. Only for an infinite number of experiments does it approach normality and, accordingly, symmetry.

Secondly, because of the nonsymmetric form of the deviation distribution, it is not entirely clear which of its characteristics should be adjusted. It is customary to correct the mode, but with the same success one could correct the centre of gravity or some composite criterion built from the moments of this distribution.

Thirdly, even for the mode, the recommended corrections only partially eliminate the problem. The reason lies in the desire to describe the correction factor by a simple formula. While its magnitude is simply calculated, the result does not fit into any of the proposed theoretical constructions (**Figure 3**). The reason is the complex and contradictory changes in the form and position of the cloud of estimates as the number of repeated experiments changes.

**Figure 3.** Estimates of the scattering parameter and the effect of corrections as a function of the number of repetitions of the experiment. The source of randomness is the normal distribution with *μ* = 2 and *σ* = 0.5. The statistics for estimating the scattering parameter is the deviation. MCM is used for obtaining the data, by two runs of 10<sup>7</sup> tests. Each point is the result of an independent experiment. Legend on the figure field: estimate without correction; with correction factor √(*n*/(*n* − 1)) (the standard deviation); with correction √((*n* − 2)/(*n* − 3)); and with correction √((*n* − 1)/(*n* − 3)).

The idea of the correction is that, knowing its magnitude a priori, we adjust the estimate made by the statistics that measures the scattering parameter so that in the statistical limit the estimate coincides with the value of the dispersion. The question arises: what for? The quality of the estimate of the measured quantity is determined by the sectoral formula, whose coefficient of coverage is calculated even more easily than the correction. A reasonable way is to abandon the amendments and compute the coefficients of coverage numerically, but these will no longer be Student's coefficients.

The sectoral formula is useful, but the rank measure copes with similar metrological tasks better.

**4. The principle of measuring the probabilities of origin**

The principle says that an important instrument of metrological research should make it possible to estimate the probability of obtaining a certain sample of data from the selected model.

According to the principle, using the model and the experimental data, the joint probability distribution for all values of each of the estimated variables is calculated. Each point of this distribution is interpreted as the probability that the data are obtained in accordance with the model and, moreover, with specific values of its parameters. The evaluation of the result of the experiment is given as *X̂* ≔ {(*x*, *p*)} (the value of each of the estimated variables with the corresponding probability of this value). Of course, differences in the parameters of the model lead to different probabilities for a particular value of the evaluated quantity; the same can be said if the model is the same and the experimental data are different.

The task of constructing the estimation algorithm is solved in the general form for both MCM and the method of converting the densities (MCD). The results are comparable, although the algorithms are different. To solve it, we need numerical consistency of the model and also a metric for the data structures that model the results of the experiment.

The range of possible values of the estimated parameter *x̄* must be broken one way or another into a set of possible values {*x*}. Using the model, for each possible value a prediction of the possible data values {*Pr*} (also a set) should be obtained. Each prediction is compared with the experimental data by means of the metric *μ*. The results of the comparison are collected in the uncertainty function *u*(*x*). Only after this, based on the uncertainty function, are simplified formal estimates performed.

Formally, this sequence of operations must be performed: *x̄* →(dis) {*x*} → {*M*(*x*)} → {*Pr* | *x*} → {*μ*(*Pr*, *D*) | *x*} → *u*(*x*).

The numerical consistency of the model is understood as the ability of the model (if all the adjustable variables are given) to generate, in a numerical experiment, model data indistinguishable (quite similar) from the data obtained in the experiment.

The metric should evaluate the magnitude of the difference between data of the same type, whether of experimental or simulation origin. The metric is constructed based on the modeling method and also on features of the application where it is used.

When using MCM, the 'natural' metric consists of counting the (approximate) matches between the data set to be checked and the extensive database generated for the given parameter values. In order to assign a probability to the value of the parameter being evaluated, the model is launched many times (say, *N*) at this value of the parameter *x*, and the fraction of coincidences
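The sequence of operations and the 'natural' MCM metric can be sketched as follows. The toy model *M*(*x*) (three noisy readings of *x*), the match tolerance and all counts are assumptions of the sketch, not the chapter's own experiment:

```python
import random


def model(x, rng, n=3):
    """Toy measurement model M(x): n readings of x with gaussian noise."""
    return sorted(rng.gauss(x, 0.2) for _ in range(n))


def coincides(pred, data, eps=0.15):
    """Natural metric: approximate match of two same-shaped data sets."""
    return all(abs(a - b) <= eps for a, b in zip(pred, data))


def uncertainty_function(data, xs, runs=2_000, seed=4):
    """x-bar ->(dis) {x} -> {M(x)} -> {Pr|x} -> {mu(Pr, D)|x} -> u(x)."""
    rng = random.Random(seed)
    data = sorted(data)
    return {x: sum(coincides(model(x, rng), data) for _ in range(runs)) / runs
            for x in xs}


D = [0.9, 1.1, 1.0]               # working-experiment data (toy)
xs = [i / 10 for i in range(21)]  # discretized range {x}: 0.0 ... 2.0
u = uncertainty_function(D, xs)
best = max(u, key=u.get)          # most probable origin of D
```

The dictionary `u` plays the role of the uncertainty function *u*(*x*): the fraction of coincidences at each candidate *x*, from which simplified formal estimates can then be derived.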
