Introductory Chapter: Ramifications of Incomplete Knowledge

Jan Peter Hessling

"Facts do not cease to exist because they are ignored."

#### 1. Background

Mathematical statistics has long been practiced in many fields of science [1]. Nevertheless, statistical methods have remained remarkably unchanged since the pioneering work [2] of R.A. Fisher and his contemporaries early in the twentieth century. Recently, however, it has been claimed that most scientific results are wrong [3] due to malpractice of statistical methods. Errors of that kind are not caused by imperfect methodology but rather reflect a lack of understanding and proper interpretation.

In this introductory chapter, a different cause of errors is addressed: the ubiquitous practice of willful ignorance (WI) [4]. It is usually applied with the intent to remedy lack of knowledge and to simplify, or merely enable, the application of established statistical methods. Virtually all statistical approaches require complete statistical knowledge at some stage. In practice, though, that can hardly ever be established. For instance, Bayes estimation relies upon prior knowledge. An assumption of equal a priori probabilities (an "uninformed prior") hardly disguises that some facts are not known, and it may be grossly deceiving. A uniform distribution is a specific assumption like any other. Willful ignorance of that kind must not be confused with knowledge to which we associate some degree of confidence. It may be better to explore rather than ignore the consequences of what is not known at all. That will require novel perspectives on how mathematical statistics is practiced, which is the scope of this book.

#### 2. Ambiguity

Incomplete knowledge implies that obtained results may not be unique. That is, results may be ambiguous. Ambiguity de facto means the uncertainty associated with any estimated quantity itself is uncertain. We may adopt a probabilistic view and classify ambiguity as epistemic uncertainty. Ambiguity will here refer to lack of knowledge typically substituted with willful ignorance. Alternatives propelled by different types of willful ignorance can thus be explored to assess ambiguity.

A most powerful source of ambiguity is dependencies. Independence is perhaps the most claimed but often the least discussed presumption. For throwing dice or growing crops, as typically studied by the founders of statistics, independence indeed seems plausible. In all the complexity of modern technology, however, it is anything but evident that observations are independent. For instance, meteorological radar observations may share sources of errors, meaning recorded data will be statistically dependent. A problem may then arise if our analysis makes use of, e.g., the maximum likelihood method, which utilizes the entire covariance matrix. Most of its entries, all covariances between pairs of observations, are usually not known but bluntly set to zero to enable evaluation. This willful ignorance has the drastic consequence of extinguishing ambiguity and, as will be shown, minimizing the resulting uncertainty. Elementary considerations should provide the valuable insight that even exceedingly small covariances may substantially influence the result: for $n$ observations, the number of distinct covariance elements is $n(n-1)/2 \approx n^2/2$, while there are only $n$ variances. The covariance elements hence outnumber the variances by a factor of about $n/2$. Each element being small is thus not a good enough argument to ignore the collection of all covariance elements.
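The counting argument can be made concrete with a small worked case. As a simplifying assumption introduced only for this illustration, let all $n$ observations $H_i$ share the same variance $\sigma^2$ and the same pairwise covariance $c$. The variance of their sum is then

$$\mathrm{var}\Big(\sum_{i=1}^{n} H_i\Big) = \sum_{i=1}^{n} \mathrm{var}(H_i) + 2\sum_{i<j} \mathrm{cov}(H_i, H_j) = n\sigma^2 + n(n-1)\,c.$$

The covariance contribution equals the variance contribution already for $c = \sigma^2/(n-1)$. For $n = 1000$ observations, a correlation of about 0.1% is thus enough to double the variance of the sum.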


Various attempts have been made to avoid willful ignorance. The method of maximum entropy [5] focuses on the consequences of improper assignments of unknown statistical information. Covariance intersection [6] fuses observations conservatively to a pair of uncorrelated observations with variance $\max[\mathrm{var}(\theta)]$. This approach explores ambiguity along the general principles suggested here, considering all possible values of covariance. Complementing the obtained maximum variance with the least possible variance $\min[\mathrm{var}(\theta)]$ would render an ambiguity interval, $A = \max[\mathrm{var}(\theta)] - \min[\mathrm{var}(\theta)]$, different from but similar to confidence intervals.

Repeating any statistical analysis with various kinds of willful ignorance [on its input], the ambiguity (A) [of its output] can be assessed. Some WI will give large and others small resulting uncertainty, not necessarily the maximum and minimum, as it is difficult to imagine all possible kinds of WI. Any specific WI will more or less reduce, or quench, the uncertainty from its maximum. Identifying a model from calibration data $H_{\mathrm{CAL}}$ and then letting the so-obtained model predict the same data, $H_{\mathrm{PRD}}$, any chosen willful ignorance of $\mathrm{cov}(H_{\mathrm{CAL}})$ will quench the calibration uncertainty from the maximum over all choices, $\mathrm{var}(H_{\mathrm{PRD}}) \le \max[\mathrm{var}(H_{\mathrm{PRD}})] \le \mathrm{var}(H_{\mathrm{CAL}})$. Studying uncertainty quenching through $\mathrm{var}(H_{\mathrm{CAL}}) - \mathrm{var}(H_{\mathrm{PRD}})$ will indicate possible ramifications of our lack of knowledge, $\mathrm{var}(H_{\mathrm{PRD}}) \le \max[\mathrm{var}(H_{\mathrm{PRD}})]$, but also the implicit knowledge, $\max[\mathrm{var}(H_{\mathrm{PRD}})] \le \mathrm{var}(H_{\mathrm{CAL}})$, contained in the structure of the model. Most importantly, such studies will guide us to the least harmful choice of willful ignorance. The analysis is similar in style to, but different from, the method of maximum entropy and covariance intersection. An example is given in Section 3.
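The idea of repeating an analysis under several kinds of WI can be sketched for the simplest possible case. The sketch below is only an illustration under stated assumptions: the "analysis" is the plain average of two observations, the unknown quantity is their correlation, and the function names and the candidate list are hypothetical choices made here, not part of the original text.

```python
import numpy as np

def mean_variance(s1, s2, r):
    """Variance of the plain average of two observations with
    standard deviations s1, s2 and assumed correlation r."""
    return (s1**2 + s2**2 + 2.0 * r * s1 * s2) / 4.0

def ambiguity_interval(s1, s2, correlations):
    """Repeat the analysis once per assumed correlation (one WI each) and
    return (min variance, max variance, ambiguity A = max - min)."""
    variances = np.array([mean_variance(s1, s2, r) for r in correlations])
    return variances.min(), variances.max(), variances.max() - variances.min()

# Hypothetical WI alternatives: every correlation the analyst can imagine.
candidates = np.linspace(-1.0, 1.0, 21)
v_min, v_max, A = ambiguity_interval(1.0, 2.0, candidates)
v_indep = mean_variance(1.0, 2.0, 0.0)   # the usual independence assumption
print(v_min, v_indep, v_max, A)
```

In this toy case the independence assumption lands in the middle of the interval. It is neither conservative nor extreme, it merely hides that the interval exists.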

#### 3. Illustration of uncertainty quenching

Assume we would like to study the evolution of a field over two spatial coordinates, using a model composed of a set of differential equations. The field could refer to meteorology and describe current observations of air pressure or humidity. The initial state may be expanded in the set of basis functions of the appropriate operator, similar to forecasting in numerical weather prediction (NWP) [7]. The basis functions could be thought of as the eigensolutions of a linear operator, which propagates one meteorological state from one day to another. Neither the interpretation of the field nor the field itself matters for the discussion here. What matters is how the uncertainty of the initial state is represented as uncertainty of the distributed eigensolutions of the NWP propagator. This representation will determine the uncertainty of any subsequent forecast, reflecting past experience in the future confidence of predicting the weather. If the forecast uncertainty is lower than our current knowledge reflects, we may falsely reject, e.g., the possibility of experiencing major thunderstorms. In the eyes of sailors planning their journey, the forecast uncertainty is the indisputable decision-maker. Studying the uncertainty quenching $\mathrm{var}(H_{\mathrm{CAL}}) - \mathrm{var}(H_{\mathrm{NWP}})$, the ambiguity regarding the usually unknown but nevertheless required covariances $\mathrm{cov}[H_{\mathrm{CAL}}(x_i, y_i), H_{\mathrm{CAL}}(x_j, y_j)]$ can be assessed. Then, by expanding the uncertainty conservatively, serious events like major thunderstorms may be properly recovered.

To enable illustrations, let the eigenstates of the NWP operator of order $n$ [for reasons not known] be multiplicatively separable in time $t$ as well as in the spatial coordinates $x$, $y$, with eigenstates described by orthogonal Legendre polynomials up to order $n$:

$$H_{\mathrm{NWP}}^{(n)}(x, y, t) \equiv \sum_{j,k=0}^{n} \theta_{\underbrace{j + k(n+1) + 1}_{r}}(t) \cdot P_k(y) P_j(x), \tag{1}$$

where the NWP operator propagates the coefficients $\theta_{j+k(n+1)+1}(t)$ in time. Only the representation of $\mathrm{cov}(H_{\mathrm{CAL}}(x, y, t = 0))$, the covariance of the measured initial state at $t = 0$, is of interest here. Discretizing over $m$ domains in both directions $x$, $y$, followed by sequential scanning over $x_p$ for each $y_q$, $p, q = 1, 2, \ldots, m + 1$, the model is written in standard affine vector $(\overline{\theta}, \overline{K})$ form:

$$H_{\mathrm{NWP}}^{(n)}(x_p, y_q, 0) \equiv \sum_{r=1}^{(n+1)^2} \theta_r(t = 0)\, K_r\big(x_p + (m+1) y_q\big) = \overline{\theta}^T \overline{K}. \tag{2}$$
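How the double sum of Eq. (1) turns into the matrix form of Eq. (2) may be easier to see in code. The following is a minimal sketch, assuming a uniform grid on the domain $[-1, 1]$ in both directions and zero-based indices; the function name `build_K` and the grid choice are hypothetical and not taken from the original text.

```python
import numpy as np
from numpy.polynomial import legendre

def build_K(n, m):
    """Matrix K of Eq. (2): rows index the (n+1)^2 coefficients,
    columns index the (m+1)^2 sequentially scanned grid points."""
    grid = np.linspace(-1.0, 1.0, m + 1)      # assumed domain [-1, 1]
    P = legendre.legvander(grid, n)           # P[p, j] = P_j(grid[p])
    # Zero-based layout: K[j + k(n+1), p + (m+1)q] = P_k(y_q) * P_j(x_p)
    K = np.einsum("qk,pj->kjqp", P, P).reshape((n + 1) ** 2, (m + 1) ** 2)
    return K, grid

n, m = 3, 4
K, grid = build_K(n, m)
theta = np.zeros((n + 1) ** 2)
theta[1 + 2 * (n + 1)] = 1.0                  # switch on the single term P_2(y) P_1(x)
H = (theta @ K).reshape(m + 1, m + 1)         # H[q, p] = field at (x_p, y_q)
print(K.shape, H[0, 3])
```

The row index $j + k(n+1)$ is the zero-based counterpart of $r = j + k(n+1) + 1$ in Eq. (1), and the column index $p + (m+1)q$ reproduces the scanning over $x_p$ for each $y_q$.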

Without any supplementary information, the variance of the initial measurement should be completely represented by the variance of the initial model state, i.e., $\mathrm{var}(H_{\mathrm{NWP}}(x, y, 0)) = \mathrm{var}(H_{\mathrm{CAL}}(x, y, 0))$. The question is whether this holds and, if it does not, to what extent we can minimize the discrepancy with WI.

Assuming normally distributed measurement noise, the maximum likelihood method [8] yields the parameter covariance given by Eq. (3), which is propagated to uncertainty of the best predictions according to Eq. (4):

$$\mathrm{cov}(\theta^*) = \left[K\, \mathrm{cov}(H_{\mathrm{CAL}})\, K^T\right]^{-1}, \tag{3}$$

$$\mathrm{cov}(H_{\mathrm{NWP}}) = K^T\, \mathrm{cov}(\theta^*)\, K. \tag{4}$$

Combining these relations, the degree of completeness of the representation of uncertainty by the model can be studied:

$$\mathrm{var}(H_{\mathrm{NWP}}) = \mathrm{diag}\left\{ K^T \left[ K\, \mathrm{cov}(H_{\mathrm{CAL}})\, K^T \right]^{-1} K \right\} \cong \mathrm{var}(H_{\mathrm{CAL}}), \tag{5}$$

where $\cong$ indicates the addressed equality in the absence of uncertainty quenching, i.e., maximal propagation of uncertainty from observations to model. Equality can never be achieved, though, since the number of degrees of freedom (NDOF) of the prediction is drastically lower than that of the calibration data. For typical models and data, the two NDOFs usually differ by an order of magnitude or more. A large ratio is actually required to provide sufficient redundancy. As seen in Figure 1, the uncertainty is normally reduced to a small fraction, with substantial uncertainty quenching.
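The role of the two NDOFs can be checked with a small numerical sketch. It is not the computation behind Figure 1: it assumes a one-dimensional Legendre basis, independent observations of equal variance, and the standard least-squares form of the parameter covariance, in which the observation covariance enters through its inverse. The function name `quenching_ratio` is hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre

def quenching_ratio(n, m, sigma=1.0):
    """Mean of var(H_NWP)/var(H_CAL) for independent observations of equal
    variance sigma^2 on m + 1 points, with a Legendre basis of order n."""
    x = np.linspace(-1.0, 1.0, m + 1)
    K = legendre.legvander(x, n).T                       # shape (n+1, m+1)
    cov_theta = sigma**2 * np.linalg.inv(K @ K.T)        # least-squares parameter covariance
    var_nwp = np.einsum("ri,rs,si->i", K, cov_theta, K)  # diag(K^T cov_theta K)
    return var_nwp.mean() / sigma**2

for m in (10, 40, 160):
    print(m, quenching_ratio(n=3, m=m))
```

Under these assumptions the mean ratio equals $(n+1)/(m+1)$, the ratio of the number of model parameters to the number of observations, because $K^T (K K^T)^{-1} K$ is a projection of rank $n+1$. Refining the resolution therefore quenches the predicted variance further.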

It should be emphasized that stating independence is fundamentally different from stating that the degree of dependence is unknown. These statements in fact oppose each other, since independence maximizes the available amount of information. Indeed, the Fisher information matrix [9]

$$\frac{1}{\mathrm{CRLB}} = F(H_{\mathrm{CAL},i}, H_{\mathrm{CAL},j}) = -\left\langle \frac{\partial^2 \ln(p)}{\partial H_{\mathrm{CAL},i}\, \partial H_{\mathrm{CAL},j}} \right\rangle = \begin{cases} F_1, & m \text{ dependent}, \\ m F_1, & m \text{ independent}, \end{cases} \tag{6}$$

is additive as $m$ independent observations are collected, since the joint probability distribution $p$ in that case is multiplicatively separable. For dependent observations, though, no information is added at all, since $F(H_{\mathrm{CAL},i}, H_{\mathrm{CAL},j})$ then remains the same as for only one observation ($F_1$). Aggregating $m$ observations, the minimum variance of any unbiased estimator set by the Cramer-Rao lower bound (CRLB) [9] thus decreases by a factor $1/m$ in the case where errors are independent (Eq. (6), second row), while it remains the same if they are completely dependent (Eq. (6), first row). Independence is thus the worst possible choice of WI, as it builds confidence without knowledge. WI should minimize rather than maximize the information. That is indeed the principle utilized in the method of maximum entropy [5].

#### Figure 1.

Uncertainty quenching, or excessive reduction of uncertainty due to willful ignorance. Dependence on resolution $m$ (top) and correlation length $\xi$ (Eq. (7)) (bottom) of the calibration data. The legend includes the Mahalanobis canonical distance $\Omega$ (Eq. (11)) and the ratio $\gamma$ between the largest and the smallest eigenvalues of residual variance (Eq. (9)).
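The two limiting cases of Eq. (6) can be reproduced numerically for one simple setting. The sketch below assumes Gaussian observations of a common mean with equal variance and one common pairwise correlation; this equicorrelated model and the function name are assumptions made for illustration only.

```python
import numpy as np

def fisher_information_of_mean(m, rho, sigma=1.0):
    """Fisher information about a common mean carried by m Gaussian
    observations with variance sigma^2 and pairwise correlation rho."""
    cov = sigma**2 * ((1.0 - rho) * np.eye(m) + rho * np.ones((m, m)))
    ones = np.ones(m)
    return ones @ np.linalg.solve(cov, ones)      # 1^T cov^{-1} 1

m = 50
F1 = fisher_information_of_mean(1, 0.0)           # a single observation
F_indep = fisher_information_of_mean(m, 0.0)      # independent observations
F_dep = fisher_information_of_mean(m, 0.999)      # almost completely dependent
print(F_indep / F1, F_dep / F1)
```

For this model the information is $m F_1 / [1 + (m-1)\rho]$, so it grows as $m F_1$ for independent observations and stays near $F_1$ when the observations are almost completely dependent. Any intermediate correlation gives a value in between, which is the ambiguity hidden by assuming independence.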

Uncertainty is lost for obvious reasons. The question is how much and for what reason. Since the model cannot represent an arbitrary response, neither can it represent an arbitrary variability. This restriction constitutes the very meaning of a "model." It is therefore important to describe the covariance of observations accurately: inappropriate WI may quench uncertainty dramatically.

The additional information represented by the structure of the model could be denoted by the model innovation. It is strongly affected by WI attributed to observations. With increasing resolution m, the model innovation grows as the information contained in observations is maximized with an assumption of independence. Indeed, the prediction variance is quenched in agreement with the CRLB, as seen in Figure 1 (top).

If the WI of observation covariance instead resembles what the model is able to represent, the model innovation will be the least. Instead of assuming independent observations, introduce a finite correlation length $\xi$:

$$\mathrm{corr}\left(H_{\mathrm{CAL},i}, H_{\mathrm{CAL},j}\right) = \frac{\mathrm{cov}\left(H_{\mathrm{CAL},i}, H_{\mathrm{CAL},j}\right)}{\sqrt{\mathrm{var}\left(H_{\mathrm{CAL},i}\right) \mathrm{var}\left(H_{\mathrm{CAL},j}\right)}} = \exp\left[-\left|r_i - r_j\right| / \xi\right]. \tag{7}$$

Increasing the correlation length $\xi$ from zero, as in Figure 1 (bottom), the model innovation decreases, and the variance of the prediction $\mathrm{var}(H_{\mathrm{NWP}})$ is almost fully restored to the original variance of the observation $\mathrm{var}(H_{\mathrm{CAL}})$. The model will then not improve our knowledge of the current weather situation but will enable prediction to a later time with comparable trust. Summarizing Figure 1, our WI of observation covariance and the resolution $m$ strongly influence our claimed precision $\mathrm{var}(H_{\mathrm{NWP}})$ of predictions.
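The restoring effect of a finite correlation length can be explored with a sketch along the same lines. It is a one-dimensional illustration, not the computation behind Figure 1: it assumes unit observation variance, the correlation model of Eq. (7), and the standard generalized least-squares form in which the assumed observation covariance enters through its inverse. The function name `restored_fraction` and all parameter values are hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre

def restored_fraction(xi, n=3, m=40):
    """Mean of var(H_NWP)/var(H_CAL) when the assumed observation
    correlation is exp(-|x_i - x_j| / xi), with unit variance."""
    x = np.linspace(-1.0, 1.0, m + 1)
    K = legendre.legvander(x, n).T                        # shape (n+1, m+1)
    cov_cal = np.exp(-np.abs(x[:, None] - x[None, :]) / xi)
    info = K @ np.linalg.solve(cov_cal, K.T)              # K cov^{-1} K^T
    cov_theta = np.linalg.inv(info)                       # parameter covariance
    var_nwp = np.einsum("ri,rs,si->i", K, cov_theta, K)   # diag(K^T cov_theta K)
    return var_nwp.mean()

for xi in (0.01, 0.1, 1.0, 10.0):
    print(xi, restored_fraction(xi))
```

In this sketch the fraction stays near the independent value $(n+1)/(m+1)$ while $\xi$ is well below the grid spacing, and it approaches one once $\xi$ exceeds the size of the domain.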

It is a different matter whether the model is consistent with the observations it was identified from. Model consistency is usually assessed with a statistical residual analysis. In conventional system identification (CSI) [10], the hypothesis is that the [deterministic] model fully explains the observations. Due to the sampling variance of the finite uncertain calibration data, though, the best estimate of its parameters will be uncertain. The residual analysis explores whether the residual is consistent with the sampling uncertainty of the calibration data but without uncertainty associated with the model.

This conjecture of a model without any error whatsoever in CSI is questionable. In practice, no model is completely without error. Rather, a finite uncertainty of the model could be regarded as inherited from mismatch to calibration data. If so, the model merely provides a convenient but, to a quantifiable degree, imperfect basis for expressing uncertain calibration data. The model is utilized to "passively transform" rather than "actively explain" observations to another unknown situation of interest. That intent is typical in, e.g., weather forecasting and product development. Furthermore, the uncertainty of calibration data can often be assessed from the setup of the calibration experiment. In CSI, correlation functions are evaluated from a single residual vector, enforcing homoscedasticity and independence of observations. WI of this kind enables the statistical analysis of the residual but often finds little support.

The alternative view on model calibration proposed here is that the identified model, composed of its form or structure, parameters, and uncertainty, represents the uncertain calibration data. Model results can thus substitute our observations, to the degree various aspects of the model and observations are consistent. Any given residual is one realization and should relate to its expected variability, with respect to the uncertainties of both the model and the observations it was identified from.

The Mahalanobis distance [6] can be utilized to measure the relative distance between observations and model output, which constitutes the residual ρ:

$$\mathcal{M} = \rho^T \mathrm{cov}^{-1}(\rho)\, \rho, \quad \rho \equiv H_{\mathrm{CAL}} - H_{\mathrm{PRD}}. \tag{8}$$

to correlations of the residual. The model may very well be consistent with respect to the variance but rarely with respect to covariance of its output. However, ignoring correlations and only focusing on the magnitude of variations of calibration data, i.e., $\mathrm{var}(H_{\mathrm{CAL}})$, which is the standard practice [10], is completely different. Then, the belief in the model is perfect and the only limitation of also making perfect forecasts is the finiteness of a random sample of observations. In case of homoscedasticity, $\gamma = 1$.

A potential conflict is inevitable for exceedingly high ratios $\gamma$. Indeed, as seen in Figure 1 (bottom) for increasing correlation lengths $\xi$, the Mahalanobis distance $\Omega$ decreases, while the ratio $\gamma$ rapidly increases. Thus, as observation variance is recovered, the requirement to ignore prediction covariance rapidly grows.

#### 4. A quest for better practice of willful ignorance

"The first principle is that you must not fool yourself and you are the easiest person to fool" [11].

Current practice of willful ignorance sometimes makes statistics an art of self-delusion [3]. Consequences of applied WI are rarely explored, as only one proposition normally is made without further ado.

Distinguishing what is not known from what is assumed is of paramount importance. Not known to any degree should mean that all possibilities that can be imagined also ought to be considered. Otherwise, obtained results only exemplify what the most appropriate answer may be, without any indication of the largest possible deviation.

Our knowledge is almost never complete. Virtually all existing statistical methods nevertheless require precisely that. Until alternative methodologies exist, WI must fill the gap between what is actually known and what must be known. As illustrated, the consequences of different WI may vary dramatically. Therefore we should select and tweak WI carefully. WI should not relate to our unconfirmed belief, but rather address its consequences.

The quantifiable ambiguity proposed here suggests how ramifications of incomplete knowledge might be mitigated with carefully chosen WI: explore all kinds of ignorance that can be imagined. Analyze and collect obtained results in ambiguity intervals, similar to confidence intervals. Another option is to focus on the worst case in a conservative manner. The method of covariance intersection is one example of how that can be exercised. The principle of maximum entropy provides means to maximize the residual uncertainty, to add the least possible amount of information. Minimizing the Fisher information for observations and the Mahalanobis distance for model identification, as proposed here, is still another kind of conservatism. These methods tackle unknown information with WI and explore its consequences. Finding the most proper WI is indeed nontrivial and calls for genuinely novel approaches.

Current practice of statistics utilizes WI in many ways, but the specific choice is rarely discussed in depth. One reason could be that statistics was developed in an entirely different context than that in which it is practiced today, which is rarely acknowledged and probably not fully comprehended. To exemplify, recall that Fisher's [2] original interpretation of "never" as a finite probability of 5% was just a humble proposal. He urged his readers to adjust "never" to the current context, a piece of advice almost never followed today.

Perhaps the reported breakdown of statistical methodologies [3, 4] is due to neglect of ambiguity, driven by a strong tradition of uncritical application of WI.

The residual covariance matrix defines its principal variations with typical magnitudes $\lambda\_j$:

$$\begin{aligned} \text{cov}(\rho) &= \text{cov}(H\_{\text{CAL}}) - 2\,\text{cov}(H\_{\text{CAL}}, H\_{\text{PRD}}) + \text{cov}(H\_{\text{PRD}}) = U\Lambda^2 U^T, \\ \Lambda\_{ij} &\equiv \delta\_{ij} \lambda\_{j}, \quad UU^T = I, \quad \delta\_{jj} = 1, \quad \delta\_{i \neq j} = 0. \end{aligned} \tag{9}$$

The evaluation of $\text{cov}(H\_{\text{CAL}}, H\_{\text{PRD}})$ in Eq. (9) is challenging, since $H\_{\text{PRD}}$ has a complicated relation to its "role model" $H\_{\text{CAL}}$ set by the identification. To simplify, it is set to zero below.

Extracting matrices U from Eq. (9) in Eq. (8), squared deviations are compared to variances. The Mahalanobis distance then transforms into a relative Euclidean norm of the residual in its own space of uncorrelated variations:

$$\mathcal{M} \equiv \tilde{\rho}^T \Lambda^{-2} \tilde{\rho} = \sum\_{j} \frac{\left| \tilde{\rho}\_{j} \right|^{2}}{\text{var} \left( \tilde{\rho}\_{j} \right)}, \quad \tilde{\rho} \equiv U \rho,\tag{10}$$

where $\tilde{\rho}\_j$ is the projection of the residual on its principal vector $U\_{:,j}$ of variation, while the eigenvalue $\lambda\_j \equiv \{\Lambda\}\_{jj}$ expresses its typical magnitude of variation. For a small eigenvalue $\lambda\_j$, observing even a moderate projection $U\_{j,:}\rho$ is statistically unlikely and thus strongly violates any model.
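The evaluation of Eq. (10) can be sketched numerically as follows. This is a minimal illustration and not the author's implementation: the function name `mahalanobis_principal` and the numerical values are hypothetical, and the code follows NumPy's convention in which the eigenvectors are the columns of the returned matrix, so the projections are formed with its transpose.

```python
import numpy as np


def mahalanobis_principal(rho, cov_rho):
    """Mahalanobis distance of a residual, evaluated in principal coordinates.

    Returns the distance M, the projections rho_tilde and the typical
    magnitudes lam (square roots of the eigenvalues of cov_rho).
    """
    rho = np.asarray(rho, dtype=float)
    cov_rho = np.asarray(cov_rho, dtype=float)
    # Symmetric eigendecomposition: cov_rho = V @ diag(eigvals) @ V.T
    eigvals, V = np.linalg.eigh(cov_rho)
    lam = np.sqrt(eigvals)           # typical magnitudes of principal variations
    rho_tilde = V.T @ rho            # residual projected on principal vectors
    M = float(np.sum((rho_tilde / lam) ** 2))  # squared deviations over variances
    return M, rho_tilde, lam


# Illustrative (assumed) residual covariance and residual
cov_rho = np.array([[2.0, 0.5], [0.5, 1.0]])
rho = np.array([1.0, -1.0])
M, rho_tilde, lam = mahalanobis_principal(rho, cov_rho)
```

The result agrees with the direct quadratic form of the residual with the inverse covariance, but the principal form makes visible which direction dominates: a projection along a direction with small `lam` contributes disproportionately.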

To maximize the consistency, in the sense of minimizing the Mahalanobis distance, the variance $\text{var}(\tilde{\rho}\_j)$ of principal residual variations should be maximized. Without addressing any specific residual, maximize what could be defined as the Mahalanobis canonical distance:

$$\Omega \equiv \sqrt{\sum\_{j} \text{var}\left(\tilde{\rho}\_{j}\right)} = \sqrt{\sum\_{j} \lambda\_{j}^{2}}.\tag{11}$$
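As a side remark that is not part of the original derivation, Eq. (11) can be evaluated without any eigendecomposition. Assuming $U$ is a square orthogonal matrix as stated in Eq. (9), the sum of the squared magnitudes equals the trace of the residual covariance:

$$\Omega^2 = \sum\_{j} \lambda\_{j}^{2} = \text{tr}\left(\Lambda^2\right) = \text{tr}\left(U\Lambda^2 U^T\right) = \text{tr}\left(\text{cov}(\rho)\right) = \sum\_{i} \text{var}\left(\rho\_{i}\right).$$

With the cross covariance set to zero as above, each term reduces to $\text{var}(\rho\_i) = \text{var}(H\_{\text{CAL},i}) + \text{var}(H\_{\text{PRD},i})$, so $\Omega$ by itself depends only on the variances. The correlations enter through how the total is distributed over the individual $\lambda\_j$.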

Minimizing the Fisher information matrix under assumption of normality addresses the covariance $\text{cov}(H\_{\text{CAL},i}, H\_{\text{CAL},j})$ of observations. Minimizing the Mahalanobis canonical distance $\Omega$ also considers the covariance $\text{cov}(H\_{\text{PRD},i}, H\_{\text{PRD},j})$ of the model residual as well as the cross covariance $\text{cov}(H\_{\text{CAL},i}, H\_{\text{PRD},j})$, which reflects the model innovation. Hence, willful ignorance for model identification should minimize the Mahalanobis canonical distance rather than the Fisher information matrix, as the former but not the latter also accounts for the innovation of the model structure. The intent is to educate the model to produce the most conservative results.

In practice, usually no residual projection $\tilde{\rho} = U\rho$ is negligible. Thus, the likelihood of rejecting any model, considering correlations, increases dramatically if exceedingly small eigenvalues are obtained. For that reason it is wise to check the ratio $\gamma \equiv \max\_j(\lambda\_j) / \min\_j(\lambda\_j)$ between the largest and smallest eigenvalues of the residual covariance (Eq. (9)). If $\gamma$ is large, the model is expected to fail with respect to correlations of the residual. The model may very well be consistent with respect to the variance but rarely with respect to the covariance of its output. However, ignoring correlations and only focusing on the magnitude of variations of calibration data, i.e., $\text{var}(H\_{\text{CAL}})$, which is the standard practice [10], is completely different. Then, the belief in the model is perfect and the only limitation of also making perfect forecasts is the finiteness of a random sample of observations. In case of homoscedasticity, $\gamma = 1$.
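The check of the ratio $\gamma$ can be sketched as below. The exponential correlation model with correlation length `xi` and the helper names are assumptions made purely for illustration; the sketch does not attempt to reproduce Figure 1 or the covariance model behind it.

```python
import numpy as np


def eigenvalue_ratio(cov_rho):
    """Ratio gamma between largest and smallest typical magnitudes lambda_j."""
    eigvals = np.linalg.eigvalsh(cov_rho)  # eigenvalues in ascending order
    lam = np.sqrt(eigvals)                 # lambda_j, since cov = U Lambda^2 U^T
    return float(lam[-1] / lam[0])


def exponential_covariance(n, xi, sigma=1.0):
    """Assumed covariance model: equal variances, exponentially decaying correlation."""
    idx = np.arange(n)
    lag = np.abs(idx[:, None] - idx[None, :])
    return sigma**2 * np.exp(-lag / xi)


# Uncorrelated, homoscedastic residual: all lambda_j are equal
gamma_homoscedastic = eigenvalue_ratio(3.0 * np.eye(5))

# Increasing (hypothetical) correlation lengths spread the eigenvalues apart
gammas = {xi: eigenvalue_ratio(exponential_covariance(10, xi))
          for xi in (0.5, 2.0, 8.0)}
```

In this assumed model the variances are held fixed, so the growth of `gammas` with `xi` is caused by correlations alone, which is the situation in which a model can be consistent in variance yet fail in covariance.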

A potential conflict is inevitable for exceedingly high ratios $\gamma$. Indeed, as seen in Figure 1 (bottom) for increasing correlation lengths $\xi$, the Mahalanobis canonical distance $\Omega$ decreases, while the ratio $\gamma$ rapidly increases. Thus, as observation variance is recovered, the requirement to ignore prediction covariance rapidly grows.

#### 4. A quest for better practice of willful ignorance

"The first principle is that you must not fool yourself and you are the easiest person to fool" [11].

Current practice of willful ignorance sometimes makes statistics an art of self-delusion [3]. Consequences of applied WI are rarely explored, as normally only one proposition is made without further ado.

Distinguishing what is not known from what is assumed is of paramount importance. Not known to any degree should mean that all possibilities that can be imagined also ought to be considered. Otherwise obtained results only exemplify what the most appropriate answer may be, without any indication of the largest possible deviation.

Our knowledge is almost never complete. Virtually all existing statistical methods nevertheless require precisely that. Until alternative methodologies exist, WI must fill the gap between what is actually known and what must be known. As illustrated, the consequences of different WI may vary dramatically. Therefore we should select and tweak WI carefully. WI should not relate to our unconfirmed belief, but rather address its consequences.

The quantifiable ambiguity proposed here suggests how ramifications of incomplete knowledge might be mitigated with carefully chosen WI: explore all kinds of ignorance that can be imagined. Analyze and collect the obtained results in ambiguity intervals, similar to confidence intervals. Another option is to focus on the worst case in a conservative manner. The method of covariance intersection is one example of how that can be exercised. The principle of maximum entropy provides means to maximize the residual uncertainty, that is, to add the least possible amount of information. Minimizing the Fisher information for observations and the Mahalanobis distance for model identification, as proposed here, is still another kind of conservatism. These methods tackle unknown information with WI and explore its consequences. Finding the most proper WI is indeed nontrivial and calls for genuinely novel approaches.
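To make the worst-case option concrete, the standard covariance intersection rule for fusing two estimates with unknown cross-correlation can be sketched as follows. The function name, the grid search over the weight `omega`, the trace criterion and the numerical values are illustrative assumptions; the chapter itself does not prescribe an implementation.

```python
import numpy as np


def covariance_intersection(a, A, b, B, n_grid=101):
    """Fuse estimates (a, A) and (b, B) whose cross-correlation is unknown.

    Uses the convex combination of inverse covariances,
        C^-1 = omega * A^-1 + (1 - omega) * B^-1,
    and picks omega on a grid by minimizing trace(C).
    """
    A_inv = np.linalg.inv(A)
    B_inv = np.linalg.inv(B)
    best = None
    for omega in np.linspace(0.0, 1.0, n_grid):
        C = np.linalg.inv(omega * A_inv + (1.0 - omega) * B_inv)
        score = np.trace(C)
        if best is None or score < best[0]:
            c = C @ (omega * A_inv @ a + (1.0 - omega) * B_inv @ b)
            best = (score, c, C, float(omega))
    return best[1], best[2], best[3]


# Two (assumed) estimates that are each precise in a different direction
a, A = np.array([1.0, 0.0]), np.diag([1.0, 4.0])
b, B = np.array([0.0, 1.0]), np.diag([4.0, 1.0])
c, C, omega = covariance_intersection(a, A, b, B)
```

The fused covariance is never smaller than what either source can justify on its own for any actual cross-correlation, which is the sense in which the method is conservative: the unknown correlation is neither assumed to be zero nor guessed.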

Current practice of statistics utilizes WI in many ways, but the specific choice is rarely discussed in depth. One reason could be that statistics was developed in an entirely different context than practiced today, which is rarely acknowledged and probably not fully comprehended. To exemplify, recall that Fisher's [2] original interpretation of "never" as a finite probability of 5% was just a humble proposal. He urged his readers to adjust "never" to the current context, a piece of advice almost never followed today.

Perhaps the reported breakdown of statistical methodologies [3, 4] is due to neglect of ambiguity, driven by a strong tradition of uncritical application of WI. Could this be caused by a lack of awareness of its potentially dramatic consequences? Ignorance of the limitations of contemporary state-of-the-art methods is hardly new [12]. Ambiguity indeed sets a meta-perspective on statistical analysis that cannot be avoided and thus needs further exploration.

## Author details

Jan Peter Hessling Kapernicus AB, Hallingsjo, Sweden

\*Address all correspondence to: peter@kapernicus.com

© 2020 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Introductory Chapter: Ramifications of Incomplete Knowledge DOI: http://dx.doi.org/10.5772/intechopen.86265

### References

[1] Huxley A. Proper Studies. London: Chatto and Windus; 1927

[2] Bennett JH, Fisher RA. Statistical methods, experimental design, and scientific inference. Fisher. New York: Oxford University Press; 1995

[3] Ioannidis JPA. Why most published research findings are false. PLoS Medicine. 2005;2(8):e124. DOI: 10.1371/journal.pmed.0020124

[4] Weisberg HI. Willful Ignorance: The Mismeasure of Uncertainty. Hoboken, New Jersey: John Wiley & Sons; 2014. ISBN: 978-0470890448. ISBN: 0470890444

[5] Jaynes ET. Information theory and statistical mechanics. Physical Review. 1957;106(4):620

[6] Uhlmann JK. Covariance consistency methods for fault-tolerant distributed data fusion. Information Fusion. 2003;4(3):201-215

[7] Kalnay E. Atmospheric Modeling, Data Assimilation and Predictability. New York: Cambridge University Press; 2003. ISBN: 978-0-521-79179-3. ISBN: 978-0-521-79629-3

[8] Hessling JP. Identification of complex models. SIAM/ASA Journal on Uncertainty Quantification. 2014;2(1):717-744

[9] Kay SM. Fundamentals of Statistical Signal Processing, Estimation Theory. Vol. 1. Upper Saddle River, New Jersey: Prentice Hall PTR; 1993

[10] Ljung L. System Identification: Theory for the User. Englewood Cliffs, NJ: Prentice-Hall; 1987. 519 p. ISBN 0-13-881640

[11] Feynman RP. Cargo Cult Science: Some remarks on science, pseudoscience, and learning how to not fool yourself. Caltech's 1974 commencement address. Engineering and Science. 1974;10-13

[12] Fisher RA. On the mathematical foundations of theoretical statistics. Philosophical Transactions of the Royal Society A. 1922;222:309-368. Reproduction with Author's note: https://web.archive.org/web/20051213222222; http:/www.library.adelaide.edu.au/digitised/fisher/18pt1.pdf

#### Chapter 2
