98 Metrology

**3.2. Correction coefficients**

The normative tool has yet another problem, which we call the mysterious amendment to deviation. Deviation is recommended to be used not in its pure form but with a correction coefficient (the so-called standard deviation). The explanation given is that this amendment allegedly eliminates the difference between the deviation and the dispersion of the normal random source. But very few have noticed that this is not quite true.

Firstly, the distribution of deviation is asymmetric, and its form changes, especially strongly at small numbers of repeated experiments. Only for an infinite number of experiments does it approximate normality and, accordingly, symmetry.

Secondly, because of the nonsymmetric form of the deviation distribution, it is not entirely clear which of its characteristics should be adjusted. It is customary to correct the mode, but with the same success one could correct the centre of gravity or some composite criterion composed of the moments of this distribution.

Thirdly, even for the mode, the recommended corrections only partially eliminate the problem. The reason lies in the desire to describe the correction factor by a simple formula. While its magnitude is easily calculated, the result does not fit into any of the proposed theoretical constructions (**Figure 3**). The reason is the complex and contradictory change in the form and position of the cloud of estimates as the number of repeated experiments changes.

The idea of the correction is that, knowing its magnitude a priori, we adjust the estimate made by the statistic that measures the scattering parameter so that in the statistical limit the estimate coincides with the value of the dispersion. The question arises: what for? The quality of the estimate of the measured quantity is determined by the sectoral formula, and the coefficient

**Figure 3.** Estimates of the scattering parameter and the effect of corrections as a function of the number of repetitions of the experiment. The source of randomness is the normal distribution with *μ* = 2 and *σ* = 0.5. The statistics for estimating the scattering parameter is the deviation. MCM is used for obtaining data by two series of 10<sup>7</sup> tests. Each point is the result of an independent experiment. Legend on the figure field: estimate without correction, with correction factor √(*n*/(*n* − 1)) (standard deviation), with correction √((*n* − 2)/(*n* − 3)) and with correction √((*n* − 1)/(*n* − 3)).
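
The bias described above is easy to reproduce numerically. Below is a minimal MCM-style sketch in the spirit of Figure 3 (not the chapter's own code); it assumes that 'deviation' means the root-mean-square deviation from the sample mean, and the parameter values and correction factors are illustrative.

```python
import numpy as np

# Reproduce the bias of the deviation statistic and the partial effect of
# corrections (illustrative parameters, in the spirit of Figure 3).
rng = np.random.default_rng(2)
mu_true, sigma_true, n, runs = 2.0, 0.5, 5, 20000

samples = rng.normal(mu_true, sigma_true, size=(runs, n))
# 'deviation': root-mean-square deviation from the sample mean (1/n inside)
dev = np.sqrt(np.mean((samples - samples.mean(axis=1, keepdims=True)) ** 2, axis=1))

mean_raw = dev.mean()                                  # biased low vs sigma_true
mean_std = (dev * np.sqrt(n / (n - 1))).mean()         # 'standard deviation' correction
mean_alt = (dev * np.sqrt((n - 1) / (n - 3))).mean()   # an alternative correction
print(mean_raw, mean_std, mean_alt)
```

Neither corrected mean lands exactly on the true *σ*, which is the point of the discussion above: a single simple multiplicative factor cannot repair a statistic whose distribution changes shape with *n*.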

**4. The principle of measuring probabilities of origin**

The principle states that this important instrument of metrological research should make it possible to estimate the probability of obtaining a certain sample of data from the selected model.

According to the principle, using the model and experimental data, the joint probability distribution over all values of each of the estimated variables is calculated. Each point of this distribution is interpreted as the probability that the data were obtained in accordance with the model and, moreover, with specific values of its parameters. The evaluation of the result of the experiment is given as *X*̂ ≔ {(*x*, *p*)} (the value of each of the estimated variables paired with the probability of this value). Of course, differences in the parameters of the model lead to different probabilities for a particular value of the estimated variable; the same can be said when the model is the same but the experimental data are different.

The task of constructing the estimation algorithm is solved in general form for both MCM and CDM. The results are comparable, although the algorithms are different. To solve it, we need numerical consistency of the model and also a metric on the data structures that model the results of the experiment.

Formally, the following sequence of operations must be performed: *x̄* ⎯dis→ {*x*} → {*M*(*x*)} → {*Pr*|*x*} → {*μ*(*Pr*, *D*)|*x*} → *u*(*x*). The range of possible values of the estimated parameter *x̄* must be broken, one way or another, into a set of possible values {*x*}. Using the model, for each possible value a prediction of the possible data values {*Pr*} (also a set) is obtained. Each prediction is compared with the experimental data by means of the metric *μ*. The results of the comparison are collected in the uncertainty function *u*(*x*). Only after this, based on the uncertainty function, are simplified formal estimates performed.
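
This sequence of operations can be rendered schematically as follows. This is a sketch under simplifying assumptions, not the chapter's algorithm: a trivial normal model, a hypothetical tolerance-based metric, and illustrative constants.

```python
import numpy as np

rng = np.random.default_rng(0)

def model(x, n, sigma=0.5, n_runs=2000):
    """M(x): generate n_runs predicted data sets {Pr} of size n for parameter value x."""
    return rng.normal(loc=x, scale=sigma, size=(n_runs, n))

def metric(pred, data, tol=0.2):
    """mu(Pr, D): fraction of predictions matching the data element-wise within tol."""
    d = np.sort(data)                  # compare rank statistics
    p = np.sort(pred, axis=1)
    return np.mean(np.all(np.abs(p - d) < tol, axis=1))

data = rng.normal(2.0, 0.5, size=5)    # experimental data D
xs = np.linspace(1.0, 3.0, 81)         # discretized range {x}
u = np.array([metric(model(x, data.size), data) for x in xs])  # uncertainty function u(x)
x_best = xs[np.argmax(u)]              # a simplified formal estimate
```

The array `u` plays the role of the uncertainty function; only at the last line is it collapsed into a single formal estimate.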

The numerical consistency of the model is understood as the ability of the model, given all its adjustable variables, to generate in a numerical experiment model data indistinguishable from (quite similar to) the data obtained in the real experiment.

The metric should evaluate the magnitude of the difference between data of the same type, whether of experimental or simulated origin. The metric is constructed based on the modelling method and also on the features of the application where it is used.
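
For instance, one conceivable metric of this kind (purely illustrative, not the chapter's definition) compares the rank statistics of two same-type samples element by element:

```python
def metric(sample_a, sample_b):
    """Magnitude of the difference between two same-type data samples:
    the largest element-wise gap between their rank statistics."""
    a, b = sorted(sample_a), sorted(sample_b)
    assert len(a) == len(b), "defined for samples of equal size"
    return max(abs(x - y) for x, y in zip(a, b))

print(metric([2.1, 1.7, 2.4], [1.8, 2.2, 2.3]))  # a small value: similar samples
```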

When using MCM, the 'natural' metric consists of counting the (approximate) matches between the data set to be checked and the extensive database generated for the given parameter values. In order to assign a probability to the value of the parameter being evaluated, the model is launched many times (say, *N* times) at this value of the parameter *x*, and the fraction of coincidences with the experimental data is counted over this series of numerical experiments; formally, {*M*(*x*) → *Pr* == *D*}<sub>*N*</sub> ⎯count→ *C*, and the metric of the sequence is *μ* ≝ *C*/*N*|*x*. Repeating the series of experiments for other values of the same parameter, we obtain the required estimate, which looks like a probability distribution density. If more factors than considered here must be evaluated, they are estimated by the same algorithm by performing similar operations.
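
A sketch of this coincidence-counting metric follows. It assumes the data are quantized to the instrument's resolution, so that '*Pr* == *D*' means an exact match after rounding; the names and constants are illustrative, not from the chapter.

```python
import numpy as np

rng = np.random.default_rng(1)

def coincides(sample, data, decimals=1):
    # with finite-resolution records, 'Pr == D' is an exact match after rounding
    return np.array_equal(np.round(np.sort(sample), decimals),
                          np.round(np.sort(data), decimals))

def mu(x, data, sigma=0.5, N=5000):
    """mu = C/N at parameter value x: fraction of model runs reproducing the data."""
    C = sum(coincides(rng.normal(x, sigma, data.size), data) for _ in range(N))
    return C / N

data = np.round(rng.normal(2.0, 0.5, size=3), 1)                 # small quantized sample
estimate = {x: mu(x, data) for x in np.linspace(1.5, 2.5, 11)}   # looks like a density
```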

When using CDM, the estimation algorithm solves the deconvolution problem in the general formulation *M*(*u*(*x*)) → *Pr* == *D* → *u*(*x*). The model parameters are specified as densities, both for the stochastic component and for the parameters to be evaluated (as the objective function *u*(*x*)). The prediction of the model is obtained as a certain *n*-dimensional density describing the possible values of the data. It is required to choose both the dimensions and the form of the density of the evaluated parameters so that the metric points out the maximum similarity between the experimental data and the prediction of the model. The natural metric in this approach is the magnitude of the overlap between the prediction density and the actual experimental data, namely, *μ* ≝ ∫<sub>−∞</sub><sup>+∞</sup> *Pr*(*x*, *D*(*x*)) *dx* in general and, in the case of point data, *μ* ≝ *Pr*(*D*).

Obviously, the solution in general form, without taking into account the structure of the model and data, is very labour-intensive for both methods. But for simple models and data, the situation is so simplified that it leads to simple algorithms.

This creates an equivalence class for data samples that are formally different as records of the data acquisition process but indistinguishable by the metric within the class. A data sample after simple sorting in ascending order (rank statistics) is a natural representative of each of these classes and can be used instead.

Each of the data sample elements *d<sub>k</sub>*|*k* = 1…*n* in its ordered sample has its own order density of distribution *p*<sub>*k*⁄*n*</sub>(*x*) = (*n*!⁄((*k* − 1)!(*n* − *k*)!)) *P*(*x*)<sup>*k*−1</sup> (1 − *P*(*x*))<sup>*n*−*k*</sup> *p*(*x*), different from the distributions of the elements in other positions, where *p*(*x*) and *P*(*x*) are the probability distribution density and the cumulative distribution function of a model random source, respectively.

The probability of the origin of the value of each data element *d<sub>k</sub>* is calculated from the corresponding order distribution density as *p*<sub>*k*⁄*n*</sub>(*d<sub>k</sub>*). The probability associated with the entire sample of data {*d<sub>k</sub>*}<sub>*n*</sub> is naturally calculated as the product of the origin probabilities of each of the elements of data, *m*({*d<sub>k</sub>*}<sub>*n*</sub>) = ∏<sub>*k*=1</sub><sup>*n*</sup> *p*<sub>*k*⁄*n*</sub>(*d<sub>k</sub>*), because the event of obtaining a sample of data is considered single. We call this result the 'rank measure'. *End of proof.*

A New Statistical Tool Focused on Metrological Tasks http://dx.doi.org/10.5772/intechopen.74872

An important feature of the algorithm for identifying a trivial statistical model under the assumptions made is that there is no need to explicitly define the metric. One can immediately go to the estimation of the demanded probability of origin by comparing the prediction of the model, in the form of the densities of the distributions of each of the data elements, with the sorted experimental data. The formula of a rank measure can be dissected into three factors:

*m*({*d*}<sub>*n*</sub>, *μ*, *σ*) = (∏<sub>*k*=1</sub><sup>*n*</sup> *n*!⁄((*k* − 1)!(*n* − *k*)!)) × (∏<sub>*k*=1</sub><sup>*n*</sup> *P*(*d<sub>k</sub>*, *μ*, *σ*)<sup>*k*−1</sup> (1 − *P*(*d<sub>k</sub>*, *μ*, *σ*))<sup>*n*−*k*</sup>) (∏<sub>*k*=1</sub><sup>*n*</sup> *p*(*d<sub>k</sub>*, *μ*, *σ*)).

Their interpretation is obvious: the last is the formula of the likelihood method, the second is the correction to the likelihood method, and the first is the normalizing factor. For this reason, the rank measure can be considered as a corrected likelihood method.

The rank measure is the simplest solution of the identification problem, for the simplest model, that can be obtained within the framework of calculating the probability of origin. The reason is the availability of an analytical formula for calculating the model's prediction. For more complex models there is no such formula; at the least, the prediction of the model must be computed numerically. Studies were conducted, and it was revealed that for two important particular models an explicit formulation of a metric is not required either: the multifactor expansion of the trivial model and the model where the parameters of a dynamic deterministic function are identified against the background of noise.

**6. Using rank measure in metrology**

In this section we give examples of the application of a rank measure in some basic types of experiments. Let us compare the results obtained by algorithms using a rank measure with the results of normative algorithms. Several varieties of direct measurement experiment and one generalization are considered.
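
Before turning to the examples, here is a minimal numerical sketch (not the chapter's own code) of the rank measure *m*({*d*}<sub>*n*</sub>, *μ*, *σ*) for the trivial normal model, using only the standard library; the sample values are illustrative.

```python
import math

def norm_pdf(x, mu, sigma):
    """p(x): density of the normal model source."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def norm_cdf(x, mu, sigma):
    """P(x): cumulative distribution function of the normal model source."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def rank_measure(data, mu, sigma):
    """m({d}_n, mu, sigma): product of the order densities p_{k/n}(d_k)
    over the sorted sample (rank statistics)."""
    d, n = sorted(data), len(data)
    m = 1.0
    for k, x in enumerate(d, start=1):
        P, p = norm_cdf(x, mu, sigma), norm_pdf(x, mu, sigma)
        c = math.factorial(n) // (math.factorial(k - 1) * math.factorial(n - k))
        m *= c * P ** (k - 1) * (1.0 - P) ** (n - k) * p
    return m

sample = [1.7, 2.4, 2.1]
# the measure is larger for parameters close to those of the sample's source
assert rank_measure(sample, 2.0, 0.5) > rank_measure(sample, 0.0, 0.5)
```

Scanning `rank_measure` over a grid of (*μ*, *σ*) values yields the uncertainty function from which the formal estimates of Section 4 are derived.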
