**2.2 Statistical power**

In any statistical analysis, the strength of our conviction in our conclusions depends largely on how much data we can observe (sample size) and how consistent our outcomes are within those observed data (variance). In the frequentist context this is reflected in the concept of *statistical power*: the probability of correctly rejecting the null hypothesis when it is indeed false (and false by at least a pre-set margin considered to be of clinical or practical importance). A commonly desired and accepted level of statistical power is 80%. However, even with this level of power there is a 20% chance of making a Type II error (i.e., incorrectly failing to reject the null hypothesis). Unless the ratio of the standard deviation to the mean (the coefficient of variation) is small, the statistical power of small studies is considerably lower than 80%, effectively crippling the ability to confidently draw inferences from the data under this framework.
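To see how quickly power erodes in small samples, here is a minimal normal-approximation sketch; the effect size and group sizes are hypothetical, chosen only for illustration:

```python
from math import sqrt
from statistics import NormalDist

def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test for a
    standardized effect size d (mean difference / SD), using the normal
    approximation and ignoring the negligible opposite tail."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)      # two-sided critical value (~1.96)
    shift = d * sqrt(n_per_group / 2)      # noncentrality of the test statistic
    return z.cdf(shift - z_crit)

# A moderate effect (d = 0.5) needs ~64 subjects per group for 80% power;
# a small study of 15 per group falls below 30%.
print(round(approx_power(0.5, 64), 2))   # -> 0.81
print(round(approx_power(0.5, 15), 2))   # -> 0.28
```

The same calculation run in reverse (solving for n at a fixed power) is how sample-size requirements are usually set, which is exactly what small-n space studies cannot satisfy.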

**2.3 Interpretation**

Both violation of assumptions and low statistical power can frustrate the drawing of inferences under traditional statistical approaches. If we manage to obtain a statistically significant effect, how should we interpret it given the potential for confounding by indication? If we fail to see any significant effects where we believe we ought to *a priori*, how do we interpret that? Does our assessment of the meaning of such results change with larger or smaller variance in our sample? Under traditional approaches we surrender to the probabilities of committing Type I or Type II errors, and resign ourselves to having learned nothing.

**2.4 Preference for errors**

The ultimate motive for use of the NHST framework is to reach reasonable conclusions about a population or process from a (large) subsample of it. However, a real yet unintended consequence of the framework is the focus on avoidance of error. The framework itself is centered on the concept of errors in inference: when we can, we design our studies to avoid Type I error while simultaneously trying to limit Type II errors. In so doing we may make these errors—rather than what we might learn from our data—the primary consideration of our scientific activity. It should come as no surprise that when we make avoidance of error our top priority, we fail to learn all we can from our data.

Modern science's focus on Type I error has proven to be particularly troublesome. In our quest to never actively assert a false truth we have no doubt passively allowed many truths to go unspoken. It is obvious that Type I errors can cause harm in medicine if new treatments are adopted that are actually harmful to patients. Less obvious is the harm that may result if research into a truly efficacious treatment is abandoned simply because a p-value was too high. Such harm is every bit as real (and every bit as irreversible) as that done by introducing an ineffective treatment. It is especially troubling in initial exploratory studies and those where data are acquired only with great difficulty or expense.

*Introductory Chapter: Research Methods for the Next 60 Years of Space Exploration*
*DOI: http://dx.doi.org/10.5772/intechopen.92331*

**3. Methodological solutions for research in space medicine**

Having seen the problems that small-n settings create in general, how do we solve them? Through a combination of realigning our epistemology, using our current tools differently, and utilizing modern analytic tools developed outside the field of statistics, we can do better research and advance the field of space medicine to meet the challenges of the next 60 years.

**3.1 Realigning our epistemology**

Cognitive dissonance is the feeling of discomfort one feels when actions fail to conform to beliefs [2]. To most scientists, making claims about truth without a statistically significant result to point to elicits substantial cognitive dissonance. This, perhaps more than anything, demonstrates our over-reliance on NHST as a substitute for a more robust epistemology. There are several things we can do to learn from data without suffering from cognitive dissonance, even without significance tests. Altogether they amount to a different epistemological approach to epidemiology for space exploration.

*3.1.1 Guidelines for causation*

In 1965 Sir Austin Bradford Hill described nine guidelines for determining causation from scientific evidence [3]. It is worth noting that while one of the guidelines deals with *strength of association*, or what we might recognize as *effect size*, none of the criteria deal with significance testing or p-values. Explicitly, Hill called for examining the *quality* of the relationship between exposure and outcome: the logical features of how the evidence suggests they interact, and how that fits with prior knowledge of the same or similar subject matter. This sort of prescription is well-suited to the small-n environment of space medicine.

*3.1.2 Modern causal inference theory: assumptions*

Similar to Hill's work, modern causal inference methods may also be of great use in space-health research. These methods have sought to mathematically formalize causation in order to make valid use of observational data for causal estimation and to avoid introducing biases in analyzing such data [4]. Perhaps more important than the methods of analysis that this framework has promoted is the understanding of the assumptions necessary to make causal statements from non-randomized data. Merely understanding the assumptions of positivity, consistency, and conditional exchangeability—and what happens when one violates them—can be of tremendous help when trying to draw inferences based on limited data.

*3.1.3 Directed acyclic graphs*

A common tool used in modern causal inference is a special type of network graph known as the directed acyclic graph (DAG). These are network maps that reflect causal relationships. DAGs are drawn according to some simple rules, and making and using these diagrams can be quite useful for clarifying thinking and formulating testable hypotheses. If we factorize a joint probability distribution over a DAG, we create a Bayesian network, a powerful tool of probabilistic inference. If we decompose a correlation or covariance matrix over a DAG, we can do path analysis or structural equation modeling, forms of latent-variable analysis. Even without any data collected at all, the structure of a DAG implies variable dependencies and independencies.
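The factorization of a joint distribution over a DAG, and the causal assumptions discussed above, can be made concrete with a minimal sketch; all probabilities here are invented for illustration. For the classic confounding-by-indication DAG Z → X, Z → Y, X → Y (Z = illness severity, X = treatment, Y = recovery), the joint factorizes as P(z)P(x|z)P(y|x,z). Under positivity and conditional exchangeability given Z, the adjustment formula P(Y=1|do(X=x)) = Σ_z P(Y=1|x,z)P(z) recovers the causal effect even when the naive comparison of P(Y=1|X=1) against P(Y=1|X=0) is misleading:

```python
# Hypothetical DAG: Z -> X, Z -> Y, X -> Y (Z confounds X and Y).
# The joint distribution factorizes over the DAG as P(z) * P(x|z) * P(y|x,z).
# All numbers below are invented for illustration.

p_z = {1: 0.3, 0: 0.7}            # P(Z=z): 30% of patients severely ill
p_x_given_z = {1: 0.8, 0: 0.2}    # P(X=1|Z=z): sicker patients treated more often
p_y_given_xz = {                  # P(Y=1|X=x, Z=z): recovery probability
    (1, 1): 0.5, (1, 0): 0.9,     # treatment helps within each Z stratum
    (0, 1): 0.3, (0, 0): 0.7,
}

def p_y1_given_x(x):
    """Naive conditional P(Y=1|X=x), marginalizing Z via Bayes' rule."""
    num = sum(p_z[z] * (p_x_given_z[z] if x == 1 else 1 - p_x_given_z[z])
              * p_y_given_xz[(x, z)] for z in (0, 1))
    den = sum(p_z[z] * (p_x_given_z[z] if x == 1 else 1 - p_x_given_z[z])
              for z in (0, 1))
    return num / den

def p_y1_do_x(x):
    """Adjustment formula: P(Y=1|do(X=x)) = sum_z P(Y=1|x,z) * P(z)."""
    return sum(p_y_given_xz[(x, z)] * p_z[z] for z in (0, 1))

naive = p_y1_given_x(1) - p_y1_given_x(0)   # slightly negative: looks harmful
causal = p_y1_do_x(1) - p_y1_do_x(0)        # +0.20: genuinely beneficial
print(f"naive: {naive:+.3f}, causal: {causal:+.3f}")
```

Here the unadjusted comparison makes an effective treatment appear harmful, purely because the sicker patients were the ones treated; adjusting for Z, as licensed by the DAG, reverses the conclusion. No significance test is involved at any step.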
