understand that all LSEs are important and have their own intrinsic value that corresponds to their level of clinical relevance and overall impact on patient care [26].

In this chapter, we outline different LSEs and associated study designs, followed by a detailed discussion of implementing clinical research findings in the context of GORs. Finally, we consider the adaptation of evidence-based practice to improve both quality of care and patient safety across our health systems.

#### 2. Levels of evidence: the importance of study design

Therapeutically relevant clinical research evidence can be broadly categorized into studies of an observational nature and those that have a structured experimental study design [4, 27]. Experimental studies, which include randomized controlled trials (RCTs) and methodologically sound meta-analyses of RCTs, are positioned at the top of the LSE hierarchy (Figure 1 and Table 1) [3]. Although nomenclature may change across different categories of research (e.g., experimental, qualitative, outcome, or descriptive), the fundamental premise of LSE stratification remains the same: an organized progression from "low to high" along the spectrum of internal/external scientific validity (and repeatability) [28–32].

Figure 1. Levels of scientific evidence according to different types of study. For each category of research (e.g., experimental, qualitative, outcome, or descriptive), the red arrow indicates the increasing level of scientific evidence, manifested through greater internal, external, and quantitative result validity. Modified from Tomlin and Borgetto [22].

Bias in a study design can confound the results of an investigation and misrepresent the true implications of the intervention or treatment being studied [33]. An RCT is a clinical trial design intended to minimize bias by randomly allocating study participants to two or more interventions or treatment "arms" [14, 34], often while "blinding" patients and investigators to which intervention an individual is receiving. Within this paradigm, each treatment arm may represent a different drug, device, or procedure. It may also represent different ways of applying a process, device, or procedure, or a placebo. Patients, clinicians, and investigators have no opportunity to choose the arm to which a participant is assigned. Randomization therefore tends to balance both known and unknown prognostic variables across the arms, which minimizes bias [4, 18, 35, 36]. Blinding adds to this protection, and together they yield a "less biased" estimate of the treatment effect. These features have enabled RCTs to revolutionize medical research, earning them the status of "gold standard" for therapeutic research and the top position in the EBM hierarchy of LSEs (Table 1 and Figure 1) [37, 38].

Results from RCTs, although considered the most robust and reliable form of evidence, are not always easily translatable or applicable across diverse clinical settings. Moreover, not every medical decision requires data from an RCT [39]. Implementing RCT findings may be challenging at a single-institution level, primarily because of procedural, work-flow, and other institution-specific factors [2, 40].

Well-designed observational studies are recognized as level IIa, IIb, or III evidence (Table 1). They are generally easier to conduct than RCTs but still provide meaningful clinical evidence [37, 41]. Observational studies may also lay the foundation for a definitive RCT. Cohort and case-control studies are the two primary types of observational studies that can demonstrate important associations between exposure and disease [37]. Cohort studies sit slightly above case-control studies on the LSE hierarchy and can be either prospective or retrospective [37, 42]. Prospective cohort studies observe two populations: one group with the risk or prognostic factor of interest and a second group without it [9]. These groups are followed over a variable period of time to observe the development of a disease or a specific outcome among those with and without the risk factor. Prospective cohort studies can be tailored to collect data on specific or rare exposures and can be designed to observe multiple outcomes for any given exposure or intervention [37, 43]. Retrospective cohort studies, on the other hand, are historical in nature. They look into the past to analyze disease development within a specific group of subjects based on their known (or declared) exposure status. Retrospective cohort studies are less expensive and quicker to complete than prospective studies, although their results may be incomplete or inaccurate [37, 44, 45]. They can also draw on large national data sets to analyze and derive relationships that may answer existing clinical questions or pose new ones.
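A cohort study's comparison between exposed and unexposed groups is usually summarized as a relative risk (RR). The RR is the incidence of the outcome among the exposed divided by the incidence among the unexposed. The sketch below computes an RR from a 2×2 table, with an approximate 95% confidence interval on the log scale. It is a minimal illustration under stated assumptions: the counts are hypothetical, the class and function names are our own, and the interval uses the common large-sample (Katz log) approximation rather than any method prescribed in this chapter.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CohortTable:
    """Hypothetical 2x2 counts from a cohort study (exposure x outcome)."""
    exposed_cases: int       # exposed subjects who developed the outcome
    exposed_noncases: int    # exposed subjects who did not
    unexposed_cases: int     # unexposed subjects who developed the outcome
    unexposed_noncases: int  # unexposed subjects who did not


def relative_risk(t: CohortTable, z: float = 1.96) -> Tuple[float, float, float]:
    """Return (RR, lower, upper) using a large-sample log-scale interval."""
    n_exposed = t.exposed_cases + t.exposed_noncases
    n_unexposed = t.unexposed_cases + t.unexposed_noncases
    risk_exposed = t.exposed_cases / n_exposed        # incidence among exposed
    risk_unexposed = t.unexposed_cases / n_unexposed  # incidence among unexposed
    rr = risk_exposed / risk_unexposed
    # Standard error of ln(RR); requires nonzero case counts in both groups.
    se = math.sqrt(1 / t.exposed_cases - 1 / n_exposed
                   + 1 / t.unexposed_cases - 1 / n_unexposed)
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper


# Hypothetical cohort: 30 of 1,000 exposed vs 10 of 1,000 unexposed develop the outcome.
table = CohortTable(exposed_cases=30, exposed_noncases=970,
                    unexposed_cases=10, unexposed_noncases=990)
rr, lower, upper = relative_risk(table)
print(f"RR = {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

With these made-up counts the sketch reports an RR of 3.00 with an interval of roughly 1.47 to 6.10. Because that interval excludes 1, the data would be read as an association between exposure and outcome. The exposure is not thereby proven to cause the outcome.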

56 Vignettes in Patient Safety - Volume 3

In contrast to cohort studies, case-control studies recruit subjects based on the outcome of interest at the outset of the study [46, 47]. Subjects with the specific outcome are categorized as "cases," and subjects without it are categorized as "controls" [47]. Retrospective data on exposure to one or more risk factors are then collected from both groups, typically through interviews, surveys, or chart review. From these data, the strength of association between disease and exposure can be estimated, usually as an odds ratio. The odds ratio approximates the relative risk when the outcome is rare [4]. Case-control studies can provide valuable information about rare diseases or ailments with a prolonged latency period [4, 37, 44, 45].
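Subjects in a case-control study are sampled by outcome, so the study cannot estimate incidence directly. Its usual measure of association is therefore the odds ratio (OR): the odds of exposure among cases divided by the odds of exposure among controls. The sketch below computes an OR from a 2×2 table. It assumes hypothetical counts and a function name of our own choosing, and it uses the standard large-sample (Woolf) interval on the log scale.

```python
import math
from typing import Tuple


def odds_ratio(exposed_cases: int, unexposed_cases: int,
               exposed_controls: int, unexposed_controls: int,
               z: float = 1.96) -> Tuple[float, float, float]:
    """Return (OR, lower, upper) for a case-control 2x2 table.

    OR = (odds of exposure among cases) / (odds of exposure among controls),
    with a large-sample (Woolf) confidence interval on the log scale.
    All four counts must be nonzero.
    """
    odds_cases = exposed_cases / unexposed_cases
    odds_controls = exposed_controls / unexposed_controls
    or_ = odds_cases / odds_controls
    se = math.sqrt(1 / exposed_cases + 1 / unexposed_cases
                   + 1 / exposed_controls + 1 / unexposed_controls)
    lower = math.exp(math.log(or_) - z * se)
    upper = math.exp(math.log(or_) + z * se)
    return or_, lower, upper


# Hypothetical study: 100 cases (40 exposed) and 100 controls (20 exposed).
or_, lower, upper = odds_ratio(exposed_cases=40, unexposed_cases=60,
                               exposed_controls=20, unexposed_controls=80)
print(f"OR = {or_:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Here the OR is 8/3 (about 2.67), with an interval of roughly 1.42 to 5.02. This OR should be read as an approximation of relative risk only if the outcome is rare in the source population.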

Case series, case reports, and expert opinion constitute the lowest-quality evidence in the overall LSE hierarchy. They are inherently retrospective and most often feature no control or comparison groups (or cases) [48]. These reports are usually narrow in scope, describe a single population subgroup, and are often based on the experiences of an individual researcher or a single institution. These factors make data at the lower LSEs less reliable, potentially difficult to reproduce, and often non-generalizable to a larger (or different) population. Such studies can nevertheless provide useful information on rare diseases, unique presentations, and complications associated with particular interventions or procedures [4, 49–51].

The practice of EBM requires deep and critical analysis of the entire body of available evidence in a specific area; more fragmentary assessments are considered improper and inadequate [15, 52]. Systematic reviews are a key component of evidence-based health care. They are loosely defined as "secondary analyses" of a large collection of reported results from individual studies, undertaken to integrate the overall findings [53, 54]. Systematic reviews essentially take data from individual studies (most often RCTs) and "pool" them to draw a more robust conclusion about the effect of the intervention under study on specific clinical outcome(s) [4, 19, 55]. Their primary aim is to determine whether an effect exists and whether it is positive or negative for a specific clinical approach or intervention with respect to a pre-defined outcome [54]. By pooling data and results from multiple studies, well-designed systematic reviews can answer questions that no individual study can sufficiently answer [56]. This approach can also bring to light discrepancies between apparently conflicting studies. Finally, systematic reviews can be used to generate new hypotheses [54, 57].
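When a systematic review includes a quantitative synthesis (a meta-analysis), the pooling step can be sketched as follows. This example assumes an inverse-variance fixed-effect model, which is one common choice among several; random-effects models are another. Each study's estimate is weighted by its precision. Cochran's Q and the derived I² statistic quantify how much the studies disagree, which relates to the discrepancies between conflicting studies mentioned above. The study values and the function name are hypothetical.

```python
import math
from typing import List, Tuple


def fixed_effect_pool(estimates: List[Tuple[float, float]]) -> Tuple[float, float, float]:
    """Inverse-variance fixed-effect pooling.

    estimates: (effect, standard_error) pairs on an additive scale,
               e.g. log odds ratios from individual studies.
    Returns (pooled_effect, pooled_se, i_squared).
    """
    weights = [1 / se ** 2 for _, se in estimates]  # more precise studies weigh more
    total_w = sum(weights)
    pooled = sum(w * y for w, (y, _) in zip(weights, estimates)) / total_w
    pooled_se = math.sqrt(1 / total_w)
    # Cochran's Q and I^2 quantify between-study inconsistency.
    q = sum(w * (y - pooled) ** 2 for w, (y, _) in zip(weights, estimates))
    df = len(estimates) - 1
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, pooled_se, i_squared


# Hypothetical log odds ratios and standard errors from three small studies.
studies = [(math.log(1.8), 0.45), (math.log(1.4), 0.35), (math.log(2.2), 0.50)]
pooled, pooled_se, i2 = fixed_effect_pool(studies)
lower = math.exp(pooled - 1.96 * pooled_se)
upper = math.exp(pooled + 1.96 * pooled_se)
print(f"Pooled OR = {math.exp(pooled):.2f} (95% CI {lower:.2f} to {upper:.2f}), I^2 = {i2:.0%}")
```

The pooled standard error is always smaller than that of any single contributing study. This is the formal sense in which pooling can answer questions that no individual study can sufficiently answer.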


Fact versus Conjecture: Exploring Levels of Evidence in the Context of Patient Safety and Care Quality

http://dx.doi.org/10.5772/intechopen.76778

59


Having described the different levels of evidence, it is important to note that the LSE hierarchy is not "set in stone." A number of factors determine the validity and strength of any particular research study and, consequently, of its evidence. Key elements of study methodology, such as patient inclusion and exclusion criteria, play a critical role in two ways. They help determine the level of evidence attributable to a particular finding, and they shape the applicability and translatability of study results to a particular patient or institutional setting. When judging the validity of results, it is important to recognize inherent biases related to the study setting, the financing source(s), and the appropriateness of the statistical analysis plan. Subsequent sections of this chapter discuss the practical application of LSEs in the clinical arena, focusing specifically on patient safety and quality of care. They also address the role of different grades of recommendation (GORs) in implementing evidence in a particular setting or situation.

#### 3. Levels of scientific evidence: clinical applications and examples

To better illustrate how LSEs relate to GORs and EBM, some practical clinical examples are provided below. A further discussion of GORs and of implementation paradigms for clinical scientific evidence (e.g., the 5A's and P-D-C-A cycles, Figures 3 and 2, respectively) will then follow, with a focus on fostering organizational excellence and a culture of safety [58–60].



Figure 2. Schematic representation of the PDCA (Plan-Do-Check-Act) cycle. Each iteration of the cycle involves a number of procedural checkpoints, with specific sets of associated tasks and critical questions.

Figure 3. The evidence-based medicine cycle begins with Assessment (e.g., determination of the need for a new cycle/process). This is followed by Asking pertinent questions (e.g., a searchable issue that can reasonably be answered) and Acquisition of data (e.g., existing literature and targeted de novo gathering of information). The next step is Appraisal (e.g., critical evaluation of all available data in the context of the primary question and the quality/levels of evidence). Finally comes Application of the newly synthesized evidence to the existing institutional/patient care matrix. Based on the overall outcome of the completed cycle, as well as institutional needs and areas of focus, a determination is made of "if/when" to begin the next cycle [143, 148].

Our discussion begins with a relatively recent account of clinical investigations into a hypothesized association between silicone breast implants and lymphoma [18, 61–64]. Anecdotal case reports of lymphoma following silicone breast implantation grew in number. In response, several retrospective cohort studies with large numbers of subjects and many years of follow-up data were conducted [18, 65–67]. Some studies reported an association, but no statistically significant conclusion could be drawn. This suggested that a higher LSE would be required to demonstrate any link between silicone breast implants and lymphoma. When a high-quality systematic review combined the data from all of the retrospective cohorts, no significant association was found between silicone breast implants and the development of lymphoma [63]. This story highlights the importance of LSEs and the potential for patient harm (economic, physical, and psychological) when the available data are insufficient to support specific clinical management recommendation(s) [68, 69]. At the same time, one could argue that further research is required to increase certainty about the relationship between the variables under scrutiny. This approach, however, may not be feasible for very rare conditions or occurrences because of ethical, patient safety, and statistical considerations [18].
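The statistical side of the rare-event problem can be made concrete with a standard sample-size calculation for comparing two proportions. The sketch below assumes a two-sided test at α = 0.05, 80% power, and the usual normal-approximation formula. The event rates are hypothetical and are not taken from the studies cited in this chapter. It contrasts halving a common outcome with halving a very rare one.

```python
import math
from statistics import NormalDist


def n_per_arm(p_control: float, p_treatment: float,
              alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate participants per arm to detect p_control vs p_treatment.

    Normal-approximation formula for comparing two independent proportions
    with a two-sided test at significance level alpha.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    p_bar = (p_control + p_treatment) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p_control * (1 - p_control)
                                      + p_treatment * (1 - p_treatment))) ** 2
    return math.ceil(numerator / (p_control - p_treatment) ** 2)


# Hypothetical scenarios: halving a common outcome vs halving a very rare one.
print(n_per_arm(0.20, 0.10))       # -> 199 per arm
print(n_per_arm(0.0001, 0.00005))  # 1 in 10,000 halved: roughly 470,000 per arm
```

Under these assumptions, a trial powered to detect halving a 1-in-10,000 event needs thousands of times more participants per arm than one targeting a 20% outcome. This is why prospective studies of very rare occurrences quickly become impractical.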


Another example where ethical, financial, and patient safety considerations preclude prospective, randomized research is the area of retained surgical items (RSI) [56, 70]. The retention of surgical instruments is an extremely rare complication. Any study of methods to prevent it would therefore need to be prohibitively large to have the statistical power to show a significant advantage of one approach over another. At the same time, prospectively comparing specific interventions, or applying RSI-related protocols/procedures differentially, is ethically questionable at best. Consequently, a meta-analysis of all existing case-control reports on RSI was performed. It demonstrated that pooled data from three source studies identified potential risk factors for RSI that were not apparent in any individual study [56]. The source reports individually suggested that between 3 and 6 variables may be associated with a greater incidence of RSI [70–72]. The combined analysis, by contrast, showed that 7 of 11 potential risk factors were significantly associated with elevated odds of RSI [56]. This exercise in knowledge synthesis shows that carefully implemented meta-analytic approaches can improve understanding of an important area of patient safety.

Moving to a different patient safety topic, case-based experiences from the 1950s led physicians to avoid epinephrine injections during hand/finger procedures because of concerns about ischemic complications [18, 73]. Despite the absence of higher-level evidence, avoidance of digital epinephrine injections was widely practiced and taught at the time. Eventually, a comprehensive review of the literature from 1880 to 2000 was performed, identifying 48 cases of digital infarction, 21 of which involved epinephrine injections [73]. Subsequently, a number of cohort studies were published, reporting no significant association between digital ischemia and local epinephrine injections [74–76]. Based on the conclusions of these studies with a higher LSE, the original hypothesis was rejected [18]. This example demonstrates how case-based observational reports may be inherently biased. It also shows that higher levels of scientific evidence must be available before drawing definitive conclusions, accepting evidence as fact, and implementing evidence-based recommendations [18].

In contrast, even well-conducted RCTs sometimes fail to sway medical practice. The University Group Diabetes Program trial was a methodologically sound RCT conducted in the late 1960s. It found that the anti-diabetic drug tolbutamide was no more effective than diet alone in prolonging life. Furthermore, the study suggested that tolbutamide was less effective than diet alone or diet with insulin with respect to cardiovascular mortality [77, 78]. Despite the relatively high LSE of the study, tolbutamide prescriptions increased as debate over the trial's interpretation continued for more than a decade [78–80]. Similarly, the Antihypertensive and Lipid-Lowering Treatment to Prevent Heart Attack Trial (ALLHAT) showed that thiazide diuretics were as effective as newer (and much more expensive) calcium-channel blockers and angiotensin-converting-enzyme inhibitors in treating hypertension [81]. These findings were questioned by pharmaceutical companies. After an initial resurgence of thiazide prescriptions following the trial's publication [82], sales of the newer antihypertensive agents increased [38, 83–85].

All of the above examples show that no single study can provide definitive answers about therapeutic response, diagnostic test efficacy, or disease-specific risk factors. The struggle continues between the forces of clinical habit, third-party interests, and objective evidence. Policy-makers, opinion leaders, and providers must embrace both open-mindedness and the value of unbiased research in guiding EBM and evidence-based recommendations [86–88]. Likewise, all healthcare providers must be well versed in both the definitions and the application of the concepts of LSE, GOR, and EBM. They must also recognize that multiple factors are at play when deciding which evidence is best and how to apply it [87–89]. It has been proposed that misapplication of clinical scientific evidence may be one of the key barriers to sustainable improvement in healthcare quality and safety in a highly complex system with increasingly constrained resources [87, 89, 90].

#### 4. Important limitations

Recommendations from various expert groups are based on different LSEs, ranging from randomized controlled trials to so-called expert opinion. Each comes with its own set of limitations that should be considered when translating research findings into clinical practice. Having defined and discussed important aspects of the different LSEs, we now touch upon some of the pitfalls associated with implementing and following EBM in everyday practice.

Introduced as an effort to reduce bias and improve the accuracy of evidence, RCTs have expanded medical knowledge and transformed clinical practice [3]. RCTs are considered to provide the most internally valid evidence, yet not all RCTs are methodologically sound, and many offer only partial answers. In their "Evidence Based Medicine Manifesto for Better Healthcare," Heneghan et al. [91] state that "too many research studies are poorly designed or executed. Too much of the resulting research evidence is withheld or disseminated piecemeal. As the volume of clinical research activity has grown the quality of evidence has often worsened, which has compromised the ability of all health professionals to provide affordable, effective, high value care for patients" [91]. In addition, RCTs are very challenging to execute,
