**4.2 Reliability**

Reliability relates to score stability and concerns the degree to which patients can be distinguished from each other despite measurement error. Reliability coefficients such as the intraclass correlation coefficient (ICC) take into account three sources of variation, that is:

• the variability between patients;
• systematic differences between repeated measurements (e.g., between occasions or raters);
• random measurement error.
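To make these components concrete (the expression below is a standard textbook formulation, not taken from this chapter), the agreement form of the ICC can be written in terms of the three variance components, where $\sigma^2_p$ is the variance between patients, $\sigma^2_o$ the variance due to systematic differences between measurements, and $\sigma^2_e$ the random error variance:

$$\mathrm{ICC}_{agreement} = \frac{\sigma^2_p}{\sigma^2_p + \sigma^2_o + \sigma^2_e}$$

Written this way, the ICC approaches 1 only when the variability between patients dominates both sources of error.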


Indexes like the ICC are used for continuous measures and are expressed on a scale from 0 (low reliability) to 1 (high reliability). High reliability is particularly important for discriminative purposes, because an observed difference in a measure should reflect a real difference and not be overlapped or shadowed by any source of error.

Authors and test developers should provide clear information about which reliability measure they have used; when the ICC is reported, the two-way random-effects model is the best option for the vast majority of cases. The Pearson correlation coefficient is inadequate, because it does not take systematic differences into account. The counterpart of the ICC for ordinal measures is the weighted Cohen's kappa coefficient, which is the preferred option for such variables. In groups or samples with 50 subjects or more, the value of 0.70 is the minimum recommendation for both indexes.
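For the ordinal case, a common formulation of the weighted kappa is sketched below (the notation is ours, not the chapter's); $O_{ij}$ and $E_{ij}$ are the observed and chance-expected proportions in cell $(i,j)$ of the $k \times k$ agreement table, and the quadratic disagreement weights $w_{ij}$ are one standard choice:

$$\kappa_w = 1 - \frac{\sum_{i,j} w_{ij}\, O_{ij}}{\sum_{i,j} w_{ij}\, E_{ij}}, \qquad w_{ij} = \frac{(i-j)^2}{(k-1)^2}$$

With quadratic weights, the weighted kappa closely approximates the ICC, which is one reason it is regarded as the ordinal counterpart of that index.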


**4.3 Internal consistency**

When adding up items with the purpose of measuring a construct, it is very important to know whether those items correlate well with each other and with the total score they generate. In other words, it is highly desirable to know whether the instrument is homogeneous or unidimensional, that is, whether the questionnaire as a whole measures the same concept or construct. This property of unidimensionality is the internal consistency, and it is quite reasonable to expect that it should be as high as possible.

There are many ways to measure internal consistency, and they usually complement each other in the process of assessing unidimensionality. Principal component analysis and factor analysis are both very good ways to determine whether the items form a single overall dimension. Confirmatory or exploratory factor analysis, when applicable, is also useful to determine whether a given group of items measures one and the same construct, and therefore belongs in one scale, or whether it would be better to split the items into two or more subscales. The Rasch model also offers a way to examine internal consistency through its fit statistics, which are used to assess unidimensionality and can be simply explained as a ratio between the observed responses and the responses predicted by the model (a sketch of one such fit statistic appears after the list below). This analysis is also important as evidence of construct validity, which will be explained in the corresponding section later in this chapter. Once the scale(s) is(are) defined, Cronbach's alpha is the appropriate measure of choice. Here we have two possible situations:

• A very low Cronbach's alpha indicates that there is no reason to group the items together in the same scale or questionnaire, because they are not well correlated with each other.

• On the other hand, a very high Cronbach's alpha suggests that there may be redundant items, which means that they measure almost the same attribute of functionality. When this happens, it is valuable to judge whether one or more items could be removed from the questionnaire.
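As a sketch of the Rasch fit statistics mentioned above (again, the notation is ours): for persons $n = 1, \dots, N$ responding to item $i$, let $x_{ni}$ be the observed response, $E_{ni}$ the response expected under the Rasch model, and $W_{ni}$ the model variance of $x_{ni}$. One common fit statistic, the outfit mean square for item $i$, is then

$$\mathrm{MNSQ}_i = \frac{1}{N} \sum_{n=1}^{N} \frac{(x_{ni} - E_{ni})^2}{W_{ni}}$$

Values near 1 indicate that the observed responses behave as the model predicts; marked departures flag items that may not belong to the same dimension.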

Cronbach's alpha should be interpreted with caution when applied to questionnaires with many items (approximately more than 20). In these cases, the index is usually very high, because Cronbach's alpha depends on the number of items in a scale. The reference value for adequate internal consistency when using Cronbach's alpha ranges between 0.70 and 0.95.
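For completeness, the standard formula for Cronbach's alpha makes this dependence on the number of items $k$ explicit; $\sigma^2_i$ is the variance of item $i$ and $\sigma^2_t$ the variance of the total score:

$$\alpha = \frac{k}{k-1} \left( 1 - \frac{\sum_{i=1}^{k} \sigma^2_i}{\sigma^2_t} \right)$$

Because the total-score variance $\sigma^2_t$ grows with the number of item covariances (roughly with $k^2$) while $\sum \sigma^2_i$ grows only with $k$, long questionnaires can reach high alphas even when the correlations between individual items are modest.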

**4.4 Responsiveness**

A large number of definitions and methods have been proposed for assessing responsiveness. A good, comprehensive definition of responsiveness is the ability of an instrument to detect clinically important changes in an individual's status over time, even when these changes are small. This ability reflects the accuracy of the instrument, which must be able to differentiate clinically observed changes from measurement error. Even though an instrument may capture very small changes, what really matters is to know whether a change is clinically relevant. Guyatt's responsiveness ratio (RR) makes precisely this comparison by relating the variability found within subjects to that found between subjects. The reference value for RR is 1.96, because this is the point at which the minimal important change equals the smallest detectable change.
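A sketch of the arithmetic behind the 1.96 reference value, assuming the usual definitions (MIC: minimal important change; $\mathrm{SD}_\Delta$: standard deviation of change scores in stable subjects; SEM: standard error of measurement):

$$\mathrm{RR} = \frac{\mathrm{MIC}}{\mathrm{SD}_\Delta}, \qquad \mathrm{SDC} = 1.96 \times \mathrm{SD}_\Delta = 1.96 \times \sqrt{2} \times \mathrm{SEM}$$

so that $\mathrm{RR} = 1.96$ exactly when $\mathrm{MIC} = \mathrm{SDC}$, that is, when the minimal important change equals the smallest detectable change.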



