*Beyond LEO - Human Health Issues for Deep Space Exploration*

**2.2 Statistical power**

In any statistical analysis our strength of conviction for our conclusions is largely dependent on how much data we can observe (sample size) and how consistent our outcomes are within those observed data (variance). In the frequentist statistical context this is reflected in the concept of *statistical power*. Statistical power is defined as the (hypothetical) probability of correctly rejecting the null hypothesis when the null hypothesis is indeed false (and false by a pre-set threshold considered to be of clinical or practical importance). A commonly desired and accepted level of statistical power is 80%; even at this level, however, there remains a 20% chance of making a Type II error (i.e., incorrectly failing to reject the null hypothesis). Unless the ratio of the standard deviation to the mean (the coefficient of variation) is small, the statistical power of small studies falls considerably below 80%, effectively crippling the ability to confidently draw inferences from the data under this framework.

**2.3 Interpretation**

Traditional statistical approaches further assume that exposures are, in effect, randomly distributed. When samples are strictly observational (i.e., not from a randomized trial) this assumption is often unwarranted. The implication of violating this assumption can be profound: differential probability of exposure and inequitable distributions of potential confounders can lead to what is known as *confounding by indication*, a subtle form of bias that can lead to misleading or even wholly wrong conclusions.

Both violation of assumptions and low statistical power can frustrate the drawing of inferences under traditional statistical approaches. If we manage to obtain a statistically significant effect, how should we interpret it given the potential for confounding by indication? If we fail to see any significant effects where we believe *a priori* that we ought to, how do we interpret that? Does our assessment of the meaning of such results change with larger or smaller variance in our sample? Under traditional approaches we surrender to the probabilities of committing Type I or Type II errors, and resign ourselves to having learned nothing.

**2.4 Preference for errors**

The ultimate motive for use of the NHST framework is to reach reasonable conclusions about a population or process from a (large) subsample of it. However, a real yet unintended consequence of the framework is the focus on avoidance of error. The framework itself is centered on the concept of errors in inference: when we can, we design our studies to avoid Type I errors while simultaneously trying to limit Type II errors. In so doing we may make these errors, rather than what we might learn from our data, the primary consideration of our scientific activity. It should come as no surprise that when we make avoidance of error our top priority, we fail to learn all we can from our data.

Modern science's focus on Type I error has proven particularly troublesome. In our quest never to actively assert a false truth we have no doubt passively allowed many truths to go unspoken. It is obvious that Type I errors can cause harm in medicine if new treatments are adopted that are actually harmful to patients. Less obvious is the harm that may result if research into a truly efficacious treatment is abandoned simply because a p-value was too high. Such harm is every bit as real (and every bit as irreversible) as that done by introducing an ineffective treatment. It is especially troubling in initial exploratory studies and in settings where data are acquired only with great difficulty or expense.

**3. Methodological solutions for research in space medicine**

Having seen the problems that small-n settings create in general, how do we solve them? Through a combination of realigning our epistemology, using our current tools differently, and utilizing modern analytic tools developed outside the field of statistics, we can do better research and advance the field of space medicine to meet the challenges of the next 60 years.

#### **3.1 Realigning our epistemology**

Cognitive dissonance is the discomfort one feels when one's actions fail to conform to one's beliefs. [2] To most scientists, making claims about truth without a statistically significant result to point to elicits substantial cognitive dissonance. This, perhaps more than anything, demonstrates our over-reliance on NHST as a substitute for a more robust epistemology. There are several things we can do to learn from data, even without significance tests, that spare us this cognitive dissonance. Together they amount to a different epistemological approach to epidemiology for space exploration.

### *3.1.1 Guidelines for causation*

In 1965 Sir Austin Bradford Hill described nine guidelines for determining causation from scientific evidence. [3] It is worth noting that while one of the guidelines deals with *strength of association*, or what we might recognize as *effect size*, none of the criteria deal with significance testing or p-values. Explicitly, Hill called for examining the *quality* of the relationship between exposure and outcome: the logical features of how the evidence suggests they interact, and how that fits with prior knowledge of the same or similar subject matter. This sort of prescription is well-suited to the small-n environment of space medicine.
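Hill's emphasis on strength of association can be made concrete by reporting an effect estimate with a confidence interval rather than a p-value. The sketch below uses hypothetical counts (illustrative only, not real data) and the standard Katz log-scale approximation for a risk ratio's interval:

```python
import math

# Hypothetical small-n 2x2 table (illustrative counts, not real data):
a, b = 4, 6   # exposed:   outcome yes / outcome no
c, d = 1, 9   # unexposed: outcome yes / outcome no

# Strength of association: the risk ratio, not a p-value.
rr = (a / (a + b)) / (c / (c + d))

# Approximate 95% CI on the log scale (Katz method).
se_log = math.sqrt(1/a - 1/(a + b) + 1/c - 1/(c + d))
lo = math.exp(math.log(rr) - 1.96 * se_log)
hi = math.exp(math.log(rr) + 1.96 * se_log)

print(f"risk ratio = {rr:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

In a sample this small the interval is wide, and a significance test would likely be "negative"; yet the direction and magnitude of the estimate remain informative in exactly the way Hill's guidelines intend.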

### *3.1.2 Modern causal inference theory: assumptions*

Similar to Hill's work, modern causal inference methods may also be of great use in space-health research. These methods have sought to mathematically formalize causation in order to make valid use of observational data for causal estimation and to avoid introducing biases in analyzing such data [4]. Perhaps more important than the methods of analysis that this framework has promoted is the understanding of the assumptions necessary to make causal statements from non-randomized data. Merely understanding the assumptions of positivity, consistency, and conditional exchangeability—and what happens when one violates them—can be of tremendous help when trying to draw inferences based on limited data.
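As one illustration, the positivity assumption requires that every stratum of a measured confounder contain both exposed and unexposed subjects; where it fails, causal contrasts are undefined in that stratum. A minimal sketch of such a check, using hypothetical strata and exposure labels:

```python
from collections import defaultdict

# Hypothetical records: (confounder stratum, exposed?). Illustrative only.
records = [
    ("short_mission", True), ("short_mission", False),
    ("short_mission", False), ("long_mission", True),
    ("long_mission", True),
]

# Count exposed and unexposed subjects within each stratum.
counts = defaultdict(lambda: {True: 0, False: 0})
for stratum, exposed in records:
    counts[stratum][exposed] += 1

# Flag strata lacking either exposed or unexposed subjects.
violations = [s for s, c in counts.items() if c[True] == 0 or c[False] == 0]
print("positivity violations in strata:", violations)  # long_mission has no unexposed
```

In a crew of a handful of astronauts such violations are the rule rather than the exception, which is precisely why understanding the assumptions matters more than any particular estimator.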

### *3.1.3 Directed acyclic graphs*

A common tool used in modern causal inference is a special type of network graph known as the directed acyclic graph (DAG). These are network maps that reflect causal relationships. DAGs are drawn according to some simple rules, and making and using these diagrams can be quite useful for clarifying thinking and formulating testable hypotheses. If we factorize a joint probability distribution over a DAG, we create a Bayesian network, a powerful tool of probabilistic inference. If we decompose a correlation or covariance matrix over a DAG, we can do path analysis or structural equation modeling, forms of latent-variable analysis. Even without any data collected at all, the structure of a DAG implies variable dependencies and independencies, which in turn have implications for what is and is not possible in the system from which the data were acquired, and thus can help guide critical thinking about problems.
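This point can be demonstrated by simulating the simplest interesting DAG, a collider X → C ← Y: the graph implies that X and Y are marginally independent but become dependent once we condition on C. A minimal sketch with simulated data (variable names and cutoff are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# A three-node DAG: X -> C <- Y (C is a "collider").
x = rng.normal(size=n)
y = rng.normal(size=n)            # independent of x by construction
c = x + y + 0.1 * rng.normal(size=n)

# Marginally, the DAG implies X and Y are independent...
r_marginal = np.corrcoef(x, y)[0, 1]

# ...but conditioning on (here, selecting on) the collider C
# induces a spurious association between X and Y.
sel = c > 1.0
r_conditional = np.corrcoef(x[sel], y[sel])[0, 1]

print(f"marginal r = {r_marginal:+.3f}, conditional on C r = {r_conditional:+.3f}")
```

The induced negative correlation appears without any causal link between X and Y, which is exactly the kind of structural insight a DAG makes visible before a single data point is collected.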
