**4. Bias and diversity in meta-analytically pooled estimates of treatment effects**

Meta-analyses have the potential to provide a precise and valid estimate of a treatment effect on the outcome of interest, as they statistically combine the available evidence relevant to a particular research question. Accordingly, meta-analyses have been established as the top level of the hierarchy of evidence [36]. In a meta-analysis, a pooled estimate of the treatment effect is calculated from the treatment effects obtained in each of the included studies. A meta-analysis therefore relies on including all relevant studies, or at least a random sample of them. This becomes more important when the total number of relevant studies is small, because each individual study then has a larger impact on the pooled effect size estimate. Therefore, a meta-analysis should be preceded by a systematic review, which aims to identify all studies that addressed a particular research question. A systematic review should be conducted using a documented and systematic approach [37].
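The pooling step described here is, in its simplest fixed-effect form, an inverse-variance weighted average of the study-level effects. A minimal sketch in Python; the effect sizes and variances below are hypothetical, purely for illustration:

```python
import math

def pooled_effect(effects, variances):
    """Fixed-effect (inverse-variance) pooled estimate and its standard error.

    Each study's effect (e.g. a standardized mean difference) is weighted
    by the inverse of its variance, so more precise studies count more.
    """
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return pooled, se

# Hypothetical effects (e.g. Hedges' g) and variances from five trials
effects = [0.40, 0.55, 0.30, 0.70, 0.45]
variances = [0.04, 0.09, 0.02, 0.16, 0.06]
est, se = pooled_effect(effects, variances)
```

Note how the standard error of the pooled estimate is smaller than that of any single study, which is the precision gain discussed below.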

Conducting a meta-analysis typically increases the precision of the estimated treatment effect, because the number of patients who contribute to the pooled estimate is larger in the meta-analysis than in each individual study. However, estimates from individual studies may vary considerably. The presence of between-study heterogeneity in effect sizes from individual psychotherapy outcome studies suggests that the total pool of studies may be divided into subgroups of studies that show either larger or smaller treatment effects. Such heterogeneity may hint at potential sources of bias or at genuine diversity [38]. Heterogeneity, which is commonly present in meta-analyses of psychotherapy outcome studies, does not necessarily prohibit the conduct of a meta-analysis, but rather demands exploration of potential sources of variation [35].
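Between-study heterogeneity of this kind is commonly quantified with Cochran's Q and the derived I² statistic. A small sketch, again with made-up numbers chosen so that two apparent subgroups emerge:

```python
def heterogeneity(effects, variances):
    """Cochran's Q and I^2 for between-study heterogeneity.

    Q sums the weighted squared deviations of each study's effect from
    the fixed-effect pooled estimate; I^2 expresses the share of total
    variation attributable to heterogeneity rather than chance.
    """
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, i2

# Two apparent subgroups of hypothetical effect sizes (equal variances)
q, i2 = heterogeneity([0.1, 0.2, 0.8, 0.9], [0.01] * 4)
# A high I^2 signals that exploring sources of variation is warranted
```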

Reducing unsystematic error in the data will thus result in more precise estimates of treatment effects, while avoiding systematic error, that is, bias and genuine diversity, will reduce heterogeneity and increase validity. Bias is therefore different from unsystematic random error and can be regarded as the opposite of validity [39]. Bias has been defined as 'any process at any stage of inference, which tends to produce results or conclusions that differ systematically from the truth' ([40], p. 60). This means that bias may lead to an overestimation or an underestimation of the true effect. It is important to note that a particular type of bias may lead to opposite deviations from the true effect in different studies [41]. Theoretically, genuine diversity may be differentiated from bias. Nevertheless, the presence of genuine diversity in the studies that contribute to a pooled effect estimate in a meta-analysis may likewise reduce the validity of the pooled estimate, because genuine diversity may distort an overall pooled estimate in the same way as bias does.

The issue of bias and diversity in meta-analyses has previously been related to three typical problems. First, a meta-analysis may not reduce or eliminate bias that was present in the included studies. For example, if effect estimates from a large number of methodologically flawed studies are combined with only a few methodologically sound studies, the pooled effect estimate will be biased as well (the so-called *garbage-in, garbage-out problem* [42, 43]). Second, with respect to potentially present genuine diversity, meta-analysis may even introduce bias in estimating a treatment effect: if the included studies differ on study characteristics that may affect the treatment effect (i.e. the studies are genuinely diverse), the pooled effect estimate may be invalid (the so-called *apples-and-oranges problem* [42, 43]). If, for example, studies with patients who fulfill diagnostic criteria of two or more mental disorders had systematically larger treatment effects than studies that included patients who fulfill diagnostic criteria of only one mental disorder, combining treatment effects from both subsets of studies would result in an invalid pooled effect estimate. Thus, genuine diversity in the studied samples, treatments, treatment providers or study methodology may reduce the validity of the pooled estimate of the treatment effect. Finally, the validity of meta-analyses may be reduced if the sample of studies considered for meta-analytic pooling is not representative of all relevant studies (the so-called *file-drawer problem* [43, 44]). This problem stems from the difficulty of publishing studies with negative or non-significant results, especially if the study samples are small. In published articles of small-sized studies, treatment effects thus tend to be large and significant. If only published articles are considered for meta-analysis, the obtained effect estimates may only poorly reflect the true treatment effect. The *file-drawer problem* has consequently also been described as publication bias. The problems that may arise from including mainly small and underpowered studies in a meta-analysis have been summarized nicely by Cuijpers ([9], p. 2): 'If a therapy is found to be superior to an existing therapy in an underpowered trial that would rather raise doubts about the validity of the trial than trust that this new therapy is indeed more effective.'
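The *apples-and-oranges problem* can be made concrete with a toy calculation: if two subgroups of studies (say, single-diagnosis versus comorbid samples, with purely hypothetical numbers) have clearly different true effects, the overall pooled estimate lands between them and describes neither subgroup well. A sketch:

```python
def pool(effects, variances):
    # Fixed-effect inverse-variance pooled estimate
    weights = [1.0 / v for v in variances]
    return sum(w * e for w, e in zip(weights, effects)) / sum(weights)

# Hypothetical subgroups: single-diagnosis vs. comorbid patient samples
single = [0.30, 0.35, 0.25]    # systematically smaller effects
comorbid = [0.80, 0.85, 0.75]  # systematically larger effects
var = [0.04] * 3               # equal precision, for simplicity

overall = pool(single + comorbid, var + var)
# overall lies between the subgroup estimates (about 0.30 and 0.80)
# and is a valid summary of neither subgroup
```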

234 A Multidimensional Approach to Post-Traumatic Stress Disorder - from Theory to Practice

**Figure 1.** Interpreting pooled effect estimates in the presence of between-study heterogeneity. The validity of the pooled effect estimates depends on the type of problem responsible for the observed between-study heterogeneity.

Thus, all three briefly introduced problems threaten the validity of the pooled effect estimate. They differ, however, with respect to the interpretation of the estimated treatment effects (**Figure 1**): first, the *garbage-in, garbage-out problem* reflects bias that was already present in a subgroup of poor-quality studies. The pooled effect estimate across all studies, as well as the pooled effect estimate of the poor-quality subgroup, will be biased. In this case, only the effect estimate of the high-quality subgroup of studies may be regarded as valid. Second, the *apples-and-oranges problem* reflects meaningful variation between effect estimates due to dissimilarity between studies on relevant study characteristics (i.e. genuine diversity). That is, the pooled effect estimate across all included studies may be invalid, whereas the pooled effect estimates within each subgroup of studies may be valid. Third, if the *file-drawer problem* is present, the pooled effect estimate of published studies is expected to differ from that of unpublished studies, as study results are more likely to be published if they are statistically significant. Many of the unpublished studies may, therefore, have non-significant results. Thus, a meta-analysis restricted to the published studies would probably yield a larger effect estimate than a meta-analysis restricted to the unpublished studies [45]. Only including both published and unpublished studies would ensure the validity of the effect estimate. The difference between published and unpublished studies should be particularly pronounced if the study samples are small [46], which further complicates the issue. If a meta-analysis considers only published studies, and non-significant results are most likely to be missing when a study was small in scale, publication bias should be most potent in the small-scale studies and less pronounced or even absent in the large-scale studies. In this case (i.e. 
when including only published studies), the pooled effect estimate including all studies, as well as the pooled effect estimate restricted to small-scale studies, might be biased, whereas the effect estimate in the large-scale studies will be the most valid with respect to publication bias. It is important to note, however, that the presence of any of the three problems indicates only an increased probability of bias in a meta-analysis; it is not necessarily associated with bias in the meta-analytically pooled effect estimates.
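One widely used check for such small-study effects is Egger's regression test: the standardized effect (effect divided by its standard error) is regressed on precision (the inverse standard error), and an intercept far from zero suggests funnel-plot asymmetry consistent with publication bias. A sketch with hypothetical numbers in which the smaller (higher-SE) studies report inflated effects:

```python
def egger_regression(effects, std_errors):
    """Egger's regression of standardized effect on precision.

    Returns (intercept, slope). The slope estimates the underlying
    effect; an intercept clearly different from zero indicates
    small-study effects such as publication bias.
    """
    y = [e / s for e, s in zip(effects, std_errors)]
    x = [1.0 / s for s in std_errors]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx             # ordinary least-squares slope
    intercept = my - slope * mx   # asymmetry: near 0 if no small-study effects
    return intercept, slope

# Hypothetical trials: the high-SE (small) studies show larger effects
effects = [0.30, 0.35, 0.60, 0.90]
ses = [0.05, 0.10, 0.20, 0.40]
intercept, slope = egger_regression(effects, ses)
# A markedly positive intercept flags funnel-plot asymmetry
```

In practice the intercept would be accompanied by a significance test; the sketch keeps only the point estimates to show the mechanics.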
