"known" groups.

tions of measurement.

**4.2 Reliability**

of variation, that is:

any sources of error.

• **Interpretability of the items:** Completing the questionnaire should not require reading skills beyond those of a 12-year-old, in order to avoid missing values and unreliable answers. To meet this recommendation, items should be as short as possible and written in plain vocabulary that a layperson outside the health field can understand. Two further recommendations are to ask direct questions that address one attribute at a time and to state explicitly the time frame to which the questionnaire refers.
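
A quick automated check of this recommendation (our suggestion; the chapter prescribes no particular tool) is a readability formula such as the Flesch-Kincaid grade level. The sketch below uses a deliberately naive vowel-group syllable count:

```python
import re

def fk_grade(text):
    """Flesch-Kincaid grade level with a naive vowel-group syllable count."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(max(1, len(re.findall(r"[aeiouy]+", w.lower()))) for w in words)
    return 0.39 * len(words) / sentences + 11.8 * syllables / len(words) - 15.59

item = "How much pain did your ankle cause you during the past week?"
print(round(fk_grade(item), 1))  # a short, plain item scores well under grade 6
```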

Evidence for construct validity includes how the scores on the instrument relate to other measures of the construct, in a manner that is consistent with theoretically derived hypotheses concerning the concepts being measured. Construct validity should be assessed by testing predefined hypotheses. When testing expected correlations between measures, this is called convergent validity; when dealing with expected differences in scores between "known" groups, it is called divergent validity.
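
As a minimal sketch of hypothesis-driven testing, the snippet below checks a predefined convergent-validity hypothesis; the 0.6 threshold and both score vectors are hypothetical, not values from this chapter:

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical scores from the new ankle questionnaire and from an
# established comparator instrument, one value per patient.
new_scale = np.array([55, 62, 70, 48, 80, 66, 59, 73])
comparator = np.array([50, 60, 75, 45, 85, 70, 55, 78])

# Predefined hypothesis: the two measures should correlate at rho >= 0.6.
rho, p_value = spearmanr(new_scale, comparator)
print(f"rho = {rho:.2f}, p = {p_value:.3f}")
print("hypothesis supported" if rho >= 0.6 else "hypothesis rejected")
```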

Similar to construct validity is criterion validity, which refers to the extent to which scores on a particular instrument relate to a gold standard. In a situation where there is no gold standard test, or at least a well-established measurement tool for a given clinical condition, the analysis of criterion validity can become quite challenging. In these cases, face validity can be achieved through the process of item selection and item reduction. Face validity indicates whether a measure appears to have been designed to measure what it is supposed to measure, in this case ankle-related functionality. While it contributes to the validity of the data obtained with a measure, face validity is represented not by the outcome of a statistical test but by the judgment of the tester, who must make sure the measure has been used under similar conditions of measurement.

Evidence of validity is the first step when choosing an instrument to assess and interpret the effect of pathology and subsequent impairment on physical function, as well as to compare the effectiveness of clinical interventions.

**4.2 Reliability**

Reliability relates to score stability: it concerns the degree to which patients can be distinguished from each other despite measurement error. Reliability coefficients such as the intra-class correlation coefficient (ICC) take into account three sources of variation, that is:

• The variation among individuals, also known as interindividual variation

• The personal variation, which is the same as intraindividual variation

• Finally, a variation that combines the two previously mentioned, which is the error attributed to the measurement itself (measurement error)

An index like the ICC is used for continuous measures and is expressed as a ratio between 0 (low reliability) and 1 (high reliability). High reliability is particularly important for discriminative purposes, because the difference observed in a measure should be a perfect reflection of a real change, not overlapped or shadowed by any sources of error.
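
In variance-component form (a standard sketch using the three sources listed above, not a formula spelled out in the chapter):

$$\mathrm{ICC} = \frac{\sigma^2_{\text{inter}}}{\sigma^2_{\text{inter}} + \sigma^2_{\text{intra}} + \sigma^2_{\text{error}}}$$

The coefficient approaches 1 when the variation among patients dominates the intraindividual variation and the measurement error.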

Authors and test developers should provide clear information about which reliability measure they have used; if the ICC is chosen, the two-way random-effects model is the best option for the vast majority of cases. The Pearson correlation coefficient is inadequate, because systematic differences are not taken into account. The counterpart of the ICC for ordinal measures is the weighted Cohen's kappa coefficient.
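
The sketch below illustrates why Pearson's coefficient is inadequate, using an invented two-rater data set: a constant 5-point offset between raters leaves Pearson's r at a perfect 1.0, while the two-way random-effects ICC for absolute agreement, ICC(2,1) in the Shrout-Fleiss notation, is penalized for the systematic difference:

```python
import numpy as np

def icc_2_1(x):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    x is an (n_subjects, k_raters) array of scores.
    """
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()    # subjects
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()    # raters
    ss_err = ((x - grand) ** 2).sum() - ss_rows - ss_cols  # residual
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Rater B scores a constant 5 points above rater A on every patient.
rater_a = np.array([10.0, 12.0, 14.0, 16.0, 18.0, 20.0])
scores = np.column_stack([rater_a, rater_a + 5.0])
print(np.corrcoef(scores[:, 0], scores[:, 1])[0, 1])  # 1.0, offset invisible
print(icc_2_1(scores))                                # ~0.53, offset penalized
```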

**4.3 Internal consistency**

When adding up items with the purpose of measuring a construct, it is very important to know whether those items correlate well with each other and with the total score they generate. In other words, it is highly desirable to know whether the instrument is homogeneous, or unidimensional, so that the questionnaire as a whole measures a single concept or construct. This measure of unidimensionality is the internal consistency, and presumably it should be as high, and as good, as possible.
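
For instance, the corrected item-total correlation, each item correlated with the sum of the remaining items, gives a first look at this; the data below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
trait = rng.normal(size=150)
# Hypothetical 6-item questionnaire driven by one latent trait.
items = trait[:, None] + rng.normal(scale=0.8, size=(150, 6))

total = items.sum(axis=1)
for i in range(items.shape[1]):
    rest = total - items[:, i]  # total score excluding the item itself
    r = np.corrcoef(items[:, i], rest)[0, 1]
    print(f"item {i + 1}: corrected item-total r = {r:.2f}")
```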

There are many ways to measure internal consistency, and they usually complement each other in the process of establishing unidimensionality. Principal component analysis and factor analysis are both very good ways to determine whether the items form a single overall dimension. Confirmatory or exploratory factor analysis, when applicable, is useful to determine whether a given group of items measures one and the same construct, and should therefore be grouped into one scale, or whether it would be better to split the items into two or more subscales. The Rasch model also offers a way to assess internal consistency through its fit statistics, which are used to evaluate unidimensionality and can be explained simply as a ratio between the observed responses and the responses predicted by the model. This analysis is also important as evidence of construct validity, which is discussed in the corresponding section of this chapter.
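
A minimal sketch of the principal-component check (simulated responses, not data from the chapter): if the first eigenvalue of the item correlation matrix dominates the rest, the items plausibly form a single dimension.

```python
import numpy as np

# Hypothetical item responses: rows are patients, columns are questionnaire
# items scored 0-4, all driven by one latent trait (invented data).
rng = np.random.default_rng(0)
ability = rng.normal(size=200)
items = np.clip(np.round(2 + ability[:, None]
                         + rng.normal(scale=0.7, size=(200, 8))), 0, 4)

# Eigenvalues of the item correlation matrix, largest first.
eigvals = np.linalg.eigvalsh(np.corrcoef(items, rowvar=False))[::-1]
print("eigenvalues:", np.round(eigvals, 2))
print("share of first component:", round(eigvals[0] / eigvals.sum(), 2))
```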

Once the scale(s) is(are) defined, Cronbach's alpha is the appropriate measure of choice. Here we have two possible situations:

• If the questionnaire forms a single scale, Cronbach's alpha is calculated for the questionnaire as a whole.

• If the items are grouped into two or more subscales, Cronbach's alpha should be calculated separately for each subscale.

Cronbach's alpha should be interpreted with caution when applied to questionnaires with many items, approximately more than 20. In these cases the index is usually very high, because Cronbach's alpha depends on the number of items in a scale. The reference range for adequate internal consistency when using Cronbach's alpha is between 0.70 and 0.95.
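
To see both points, the alpha formula and its dependence on the number of items, the sketch below computes Cronbach's alpha for simulated scales of identical item quality but growing length; alpha climbs toward the top of the 0.70-0.95 range purely because k grows:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, k_items) score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

rng = np.random.default_rng(1)
trait = rng.normal(size=300)

def simulate(k):
    # k items, each the latent trait plus independent noise (invented data).
    return trait[:, None] + rng.normal(scale=1.0, size=(300, k))

# Identical item quality, longer scales: alpha rises with k alone.
for k in (5, 10, 20, 40):
    print(k, round(cronbach_alpha(simulate(k)), 2))
```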
