**4. The triumph and controversies surrounding reliability and validity of frailty scales/indicators in elderly population**

It is important to underscore the association between what is characterized as *frailty* and increased susceptibility to ill health among humans of advanced age. The earlier sections of this chapter gave an account of the term *frailty*. However, it is important to analyze those numbers so as to express what they suggest without overstating their usefulness in science. One caution deserves special attention here: as a clinical researcher, predisposed by an affection for making judgments from numbers and experimental findings, I may be at risk of committing a *self-fulfilling prophecy*, a typical form of the well-known *Pygmalion effect*, quite common among scientists dealing with quantitative research methods and applications. Therefore, this section deals not only with the interpretation of the reliability and validity of the different frailty scales/indicators shown before but also with the dangers of treating score results as a *sine qua non* in frailty assessment, for prospective readers, be they practicing geriatricians, bio-gerontologists, policy makers, or other decision makers in the aging field.

First, regarding reliability, it is important for geriatricians, other clinicians caring for senior citizens, clinician-scientists, policy makers, and other readers alike to be aware that the frailty scale/indicator scores derived from the studies cited above do not, in the strict sense, measure reliability. Arguably, no other statistic in the published literature has been the subject of wider confusion than the *coefficient* α for the reliability of test scores. Specifically, *Cronbach's* α, which at best reflects the homogeneity of test scores, has been incorrectly treated as a quality indicator of internal stability, and hence as a direct reliability estimate. This rather subtle cognitive error has existed in science for at least 60 years, tracing back to Cronbach's seminal paper published in 1951 [42]. To convey the extent of the spread of this flaw, and of the confusion therein: by the time this line was first typed by the author, at the peak of the COVID-19 pandemic, Cronbach's 1951 paper in Psychometrika had been cited more than 45,000 times in the published literature worldwide. Details about the flaws (and the resulting confusion) of *Cronbach's* α as an index of the reliability of test scores are beyond the scope of this book. However, to give readers a glimpse, I provide below a narrative account of the fallacy behind the use of Cronbach's α coefficient as a measure of the reliability of test results.

*Reliability and Validity of Clinicopathological Features Associated with Frailty Syndrome… DOI: http://dx.doi.org/10.5772/intechopen.93499*

Cronbach's α coefficient is consistently, and incorrectly, taken as a measure of the internal stability of test scores, and therefore as an estimate of internal consistency. It has been shown that Cronbach's α cannot provide investigators with that sort of information [43]. Cronbach's α is at best a lower bound to the reliability estimate, and therefore almost always an underestimate of the reliability coefficient for the internal consistency of test scores [43]. At this juncture, it is worth reminding readers what exactly the internal consistency of test score results is. Simply put, the internal consistency of test score results refers to the interrelatedness of a set of items, be they test score results or any other non-singular matrix of scores [44]. Much of the confusion surrounding Cronbach's α dates back to Cronbach's 1951 paper [42], in which Cronbach used internal consistency and homogeneity synonymously [44]. It is clear nowadays, therefore, that Cronbach's α may attain values outside the range of possible reliability scores for a single test result. As a partial remedy to this challenge, I would mention the standard error of measurement, in the form of the following equation:

$$
\sigma_{y} = \sigma_{x} \left( 1 - \rho_{xx'} \right)^{1/2} \tag{1}
$$

where ρxx′ is the test score reliability in a population, σx is the standard deviation of the population of interest, and σy is the standard error of measurement for a sample of interest.

It is important to remind readers that the application of the standard error of measurement as a measure of the internal consistency of test scores assumes that each individual score originated from a test with the same accuracy [43]. Details of this method of assessing internal consistency, and therefore the inherent reliability of any given test scores, are given in other published findings [43–47].
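For readers who wish to see these quantities concretely, the following is a minimal Python sketch of how Cronbach's α is computed from an item-score matrix and how the standard error of measurement in Eq. (1) follows from a reliability estimate. The score matrix and function names are hypothetical illustrations, not data from any cited study.

```python
import math

import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, k_items) score matrix.

    As discussed above, alpha reflects item interrelatedness and is a
    lower bound to reliability, not reliability itself.
    """
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                          # number of items
    item_vars = items.var(axis=0, ddof=1)       # per-item sample variances
    total_var = items.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Equation (1): sigma_y = sigma_x * (1 - rho_xx')**(1/2)."""
    return sd * math.sqrt(1.0 - reliability)

# Hypothetical toy data: 4 respondents scored on 3 items
scores = np.array([[3, 3, 4],
                   [2, 2, 2],
                   [4, 5, 4],
                   [1, 2, 1]])
print(round(cronbach_alpha(scores), 3))                    # 0.954
print(round(standard_error_of_measurement(10.0, 0.84), 2)) # 4.0
```

Note that a high α here says only that the hypothetical items are interrelated; per the argument above, it should not be read as the reliability of the scale.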

In this chapter, I have refrained from committing a rather common statistical crime. It is well known that meta-analysis of findings from individual studies, customarily summarized with forest plots, is an efficacious way of deriving an overall effect size and of detecting effects too small to reach statistical significance in individual studies. I must admit there was a strong temptation to pool the reliability and validity estimates from the different studies here. The decision, in the end, not to include forest plots from a meta-analysis in this chapter is based on the same philosophy behind the chapter, namely, the *reliability and validity* of test scores as sample estimates. Frankly, there is profound heterogeneity across the study samples used for assessing the reliability and validity of frailty scales and indicators in the publication databases. This made all attempts toward "*forest plotting*" a futile exercise on philosophical grounds. Admittedly, given the wealth of statistical tests available to date, there were remedial measures that could account for the heterogeneity of the referred studies. However, the fact that the data differed in how they were conceptualized, and not only in how they were analyzed, made all those statistical tests for estimating heterogeneity a *non-starter* in this endeavor.

At large, these studies differ significantly in their designs. For instance, whereas the findings in **Table 1** reflected assessment of Cronbach's α coefficient, for what was referred to as internal consistency, among studies targeting the Clinical Frailty Scale, the study by Rockwood and colleagues in Canada was conceived as a prospective observational study [20], while the study by Chong and colleagues in Singapore was designed retrospectively [23]. It follows automatically that the *total population at risk* was a distinguishing feature between these two studies. Clearly, with prospective data one can quantify the *population at risk*, whereas with retrospective data such a count is not possible. The difference in risk counts between those two studies does not end with risk estimates; it propagates into any calculation involving probabilities, including the early stages of obtaining Cronbach's α coefficient. Pooling estimates from these two study designs (prospective vs. retrospective) is no different from mixing oranges and mangoes together. Whereas the idea may seem useful in gastronomy, it is *a statistical crime*, equivalent to third-degree murder in jurisprudence [48]. Details about the flaws in pooling estimates from retrospective and prospective designs together are described at length in the mathematical statistics literature [49–52].

Apart from the design differences between the studies whose estimates were pooled to assess reliability and validity in this chapter, heterogeneity is also suspected to arise from publication bias. Quite commonly in biomedical research and databases, studies are published only if they attain positive outcomes with respect to the research questions designed by investigators. While the message here is not to endorse this practice, as I personally believe in learning from findings that answer their hypothesized questions negatively, I find it an important reminder for readers. It is quite possible that other studies were left behind, either because they failed to appear in press for whatever reason or because they were overlooked by the author during retrieval of the information used to pool these data. At this point, it should be clear that there are quantitative mechanisms for assessing heterogeneity in statistical data [39, 40, 52–61]. However, those techniques cannot correct what went wrong at the design stage. It was therefore futile to justify applying those techniques to data compromised either by retrospective design or by publication bias.
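For completeness, one such quantitative mechanism, Cochran's Q together with Higgins' I² statistic, can be sketched as follows. The effect sizes and sampling variances below are purely hypothetical; and, as argued above, a large I² here would merely flag heterogeneity, not repair a flaw introduced at the design stage.

```python
import numpy as np

def cochran_q_and_i2(effects, variances):
    """Cochran's Q and Higgins' I^2 for a set of study-level effects.

    effects: per-study effect estimates; variances: their sampling variances.
    I^2 is the percentage of total variation across studies attributable
    to heterogeneity rather than chance.
    """
    effects = np.asarray(effects, dtype=float)
    weights = 1.0 / np.asarray(variances, dtype=float)    # inverse-variance weights
    pooled = np.sum(weights * effects) / np.sum(weights)  # fixed-effect pooled estimate
    q = np.sum(weights * (effects - pooled) ** 2)         # Cochran's Q
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, i2

# Hypothetical effect sizes from three studies with equal variances
q, i2 = cochran_q_and_i2([0.2, 0.5, 0.8], [0.04, 0.04, 0.04])
print(round(q, 2), round(i2, 1))   # 4.5 55.6
```

An I² above roughly 50% is conventionally read as substantial heterogeneity, which in the present context would argue against pooling, exactly the conclusion reached on philosophical grounds above.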

On a positive note, however, the findings from these studies probably do highlight an important construct related to the diminished capacity of various body systems, currently coined *frailty*. This is because, in most of these studies, all of the domains (physical, psychological, or social) reflect deficiencies commonly associated with persons of advanced age, who we may safely assume reflect the true concept of frailty. For now, readers must bear in mind that the lack of a gold-standard decision rule for assessing frailty forces scholars to make comparisons against available tools. It is therefore a call to action for future researchers in aging research to consider the design and development of more innovative concepts and tools for the assessment of frailty [62–64].

Lastly, and as a matter of urgent priority, geriatricians, aging research scientists, and other practitioners and decision makers in health need to consider different population bases in their future research on frailty. At present, there is palpable evidence that the demographic transition has started, and is likely to mature soon, in parts of sub-Saharan Africa [65]. For instance, it is quite evident that Tanzania, like other sub-Saharan African countries, has a population undergoing *demographic transition* [65], perhaps at a faster rate than was seen in Europe in the nineteenth century and the early parts of the twentieth century. It is plausible that part of what may be termed *residual effects*, in ascertaining factors associated with frailty and other aging-related concepts, may be better explained by the environmental milieu found in sub-Saharan Africa than by that of the developed North. It is therefore a matter of intellectual maturity that future studies on quality indicators in frailty assessment also be tested and validated in populations found in the South.

Likewise, on a pioneering scale, global efforts at understanding the interplay of the *mind* and *organosystemic degenerations* in later life need a critical eye among aging researchers. At present, there is considerable confusion not only among geropsychologists but also among clinician-scientists caring for senior citizens across nations. For instance, there is a clear gap in the research evidence, namely an inability to characterize the cognitive domain within the concept of frailty, in addition to clear discrepancies in how best to handle cognitive abnormalities in the oldest old. It follows that current psychosocial interventions and strategies the world over are *porous* at best and segregate senior citizens at worst. To this end, I propose that psychosocial challenges arising directly or indirectly from the aging process be handled using data-based findings. Moreover, there is a pressing call to move from inductive research toward deductive thinking in the science of geropsychology the world over. Short of that, most frailty scales/indicators will exhibit a *lack-of-fit* on the basis of their missing domain of the psyche.

## **Acknowledgements**

I wish to convey my sincere thanks to my fellow trustees of **Ultimate Family Health Care**, namely, *Drs Godfrey Swai* and *Mathew Mwanjali*, whose inspiration and encouragement during the preparation of this chapter, not to mention their patience and expert opinion in reviewing the manuscript, are highly appreciated. Special thanks go to these two personalities for introducing the concept of "*family health*" in Tanzania, and for the time-to-time "*free mentorship*," in both clinical practice and clinical research, from which my current practice in Tanzania follows and grows. It is from this group interaction that the Dar es Salaam Longitudinal Ageing Study was conceptualized.

I also wholeheartedly thank all patients and other clients of the geriatric/endocrinology clinics at Moyo Safi Hospital, Alshifa Medical and Dialysis Centre, and AB hospitals in Dar es Salaam, Tanzania, for their openness and for providing very welcoming ground in which to work and grow my geriatric practice. A special vote of thanks goes to Dr Chuor Garang de Alier for his tireless efforts in tracing some of the full-text documents used as references in this chapter. His helping hand, offered out of his tight schedule as a student at St. Hugh's College, Oxford, at times via midnight calls, to ensure that all references cited had been read in context, is highly appreciated.

## **Conflict of interest**

The author declares no conflict of interest in the preparation of this manuscript.

*Frailty in the Elderly - Understanding and Managing Complexity*
