**5. "Traditional statistics" and item response theory (Rasch and factor analysis)**

The Rasch model and the factor analysis constitute two ways of assessing psychometric properties of an instrument and can be, and frequently are, used in functional scales development. These two statistical procedures have the same theoretical model, which is the item response theory. The basic concept behind IRT is that the probability of choosing a response for each item is a function of both the subject's or patient's ability and the difficulty level of each item.

When applying the concepts of IRT to psychometric properties analysis, it is possible to obtain more detailed information about validity, accuracy, and targeting that helps understanding the clinical meaning of a self-report instrument. It goes beyond just looking at the final score of a questionnaire or at cutoff scores. This closer look at outcome measures like functional scales adds information to those obtained by traditional statistical tests, e.g., Cronbach alpha or ICC. IRT not only improves the methodological quality when elaborating new instruments but gives clearly insights into effects of intervention as well, whether comparing groups or the subject longitudinally.

Rasch analysis can be applied to examine instruments or assessment scales applicable in wide spectrum of disciplines, including studies in health area, education, marketing, economy, and social sciences. In the majority of evaluations, a well-defined group is selected to answer a series of predefined items. The Rasch model offers a mathematical theoretical reference by which researches that elaborate instruments are able to create comparable measures. The main point behind this model is the concept of unidimensionality, which can be summarized by the idea that useful clinical measures involve the analysis of only one human attribute at the time. In other words, it implies that the instrument measures a single latent ability. Taking a self-report questionnaire as an example, this would mean the items are organized according to their difficulty level and are placed in a single linear hierarchic scale.

The Rasch model transforms ordinal scales into interval measures. This process allows us to calibrate item difficulty and subject's ability in a same linear *continuum,* which is divided into equal intervals or *logits*. The *logits* is defined by items and works similarly as a ruler on which individuals are organized accordingly to their level of ability. The probabilistic model of Rasch analysis can be defined by the following formula:

$$Pni 
\left(\infty = 1\right) = f(Bn - Di)$$

**73**

*Patient-Report Outcome Measures for Ankle-Related Functionality*

where *P* is the probability of an "*n*" individual to succeed on a given event "*i*" in any trial. This probability equals to the mathematical function *f* of the subtraction from the "*n*" individual's ability "*B*" in relation to the "*i*" item's difficulty level "*D*." This probability can be extrapolated for multilevel items, i.e., for non-dichotomous responses. As a result of this procedure under IRT concepts, an item characteristic curve can be drawn, which represents the probability of choosing a response for each item based on the subject's or patient's ability. A typical item characteristic curve is defined by two properties: item's difficulty and item's discrimination power. Taking back the ruler analogy cited above, the difficulty of an item functions as a location index that is where in the *continuum* of ability the item works better. Hard items function with high-ability individuals as well as easy items do the same with low-ability subjects. By discrimination power, it means how well an item can separate individuals whose abilities are below or above the item location. Graphically, this property appears as the steepness of the item characteristic curve and can be interpreted as the steeper the greater the discrimination power. A flat curve means that the probability of a right answer is nearly the same with low or high levels of ability. It is worthwhile to stress out that this two properties only describe the form of item characteristic curve, and consequently how well an item function, but it cannot be used as a proof of item validity. Applying these concepts for multilevel item, Likertlike scale, for example, each answer possibility would have its own curve with distinct peaks. All the curves together should measure the spectrum of ability

If all items meet this probabilistic expectation, it is possible to state that the questionnaire, as a whole, assesses an unidimensional construct. This probabilistic framework constitutes the basis of Rasch model and thereby makes it possible to organize items by their difficulty level as well as by the patients' ability level, both

Questionnaires should be responsive to changes in the status of the patient across the spectrum of ability. Another benefit of IRT is that it provides the amount of information that each item contributes at varying levels of ability. Easy items should provide information among low-ability levels examinees, and conversely, hard items that describe difficult tasks give information among high-ability examinees. The questionnaire final score is, therefore, the sum of all these information collected by each item, and the accuracy of it is directly proportional to the amount of information provided. The target of an evaluative instrument is to provide information across all ability ranges. Therefore, such an appropriate evaluative questionnaire should contain items that assess an individual's ability to perform activities

The results of an item characteristic curve are valuable only when the following

*DOI: http://dx.doi.org/10.5772/intechopen.89509*

measured by the item.

requirements are met:

1.Unidimensionality

2.Local independence

3.No time constraints

based on the observed answering pattern.

that span from easy to more challenging ones.

a.The questionnaire measures a single latent trait.

a.The answer for each item is independent from another item.

a.Should be no time limit or restriction when answering the test.

#### *Patient-Report Outcome Measures for Ankle-Related Functionality DOI: http://dx.doi.org/10.5772/intechopen.89509*

*Essentials in Hip and Ankle*

**analysis)**

the subject longitudinally.

hierarchic scale.

following formula:

for the area under the curve is at least 0.70.

distinguish one from another, which compromises reliability.

subject's or patient's ability and the difficulty level of each item.

Another common and adequate measure of responsiveness is the area under the receiver operating characteristic (ROC) curve. It is very useful for defining cutoff scores for discriminative purposes and for grading injury severity. The reference value for the area under the curve is at least 0.70.
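To illustrate, here is a minimal sketch of how the AUC and a candidate cutoff might be computed. The change scores, the improved/not-improved labels, and the use of the Youden index to pick the cutoff are assumptions made for the example, not values or methods from any published scale.

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

# Hypothetical data: did the patient improve (1) or not (0),
# and the change score each patient obtained on the questionnaire.
improved = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1])
change = np.array([2, 5, 1, 4, 8, 3, 9, 7, 6, 10, 8, 2, 7, 9, 6])

# ROC curve: sensitivity (tpr) against 1 - specificity (fpr)
fpr, tpr, thresholds = roc_curve(improved, change)
area = auc(fpr, tpr)

# One common way to choose a cutoff is the Youden index (tpr - fpr).
best = np.argmax(tpr - fpr)
print(f"AUC = {area:.2f} (reference value: at least 0.70)")
print(f"Candidate cutoff: change score >= {thresholds[best]}")
```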

One point that impacts negatively on responsiveness is the presence of floor or ceiling effects. These are considered present when more than 15% of respondents achieve the lowest or highest possible score. Responsiveness is then limited because changes cannot be measured in these patients, nor is it possible to distinguish one from another, which compromises reliability.
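The 15% criterion itself is simple to verify. A minimal sketch, assuming total scores on a hypothetical 0–100 functional scale:

```python
import numpy as np

def floor_ceiling(scores, minimum, maximum, threshold=0.15):
    """Flag floor/ceiling effects: more than `threshold` of respondents
    at the lowest or highest possible score."""
    scores = np.asarray(scores)
    floor = np.mean(scores == minimum)
    ceiling = np.mean(scores == maximum)
    return {"floor": floor, "floor_effect": floor > threshold,
            "ceiling": ceiling, "ceiling_effect": ceiling > threshold}

# Hypothetical total scores on a 0-100 functional scale
sample = [100, 98, 100, 100, 87, 100, 92, 100, 76, 100, 95, 100]
print(floor_ceiling(sample, minimum=0, maximum=100))
```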

Limitations in measurement, such as ceiling or floor effects, can usually be avoided by selecting measures that have been demonstrated to provide meaningful information about people similar to those being assessed. In other words, the target population of each measurement tool must be considered, matching the sample (e.g., patients) with the appropriate questionnaire or functional scale.

**5. "Traditional statistics" and item response theory (Rasch and factor** 

The Rasch model and factor analysis constitute two ways of assessing the psychometric properties of an instrument and can be, and frequently are, used in the development of functional scales. These two statistical procedures share the same theoretical framework, item response theory (IRT). The basic concept behind IRT is that the probability of choosing a response to each item is a function of both the subject's or patient's ability and the difficulty level of the item.

When the concepts of IRT are applied to the analysis of psychometric properties, it is possible to obtain more detailed information about validity, accuracy, and targeting, which helps in understanding the clinical meaning of a self-report instrument. It goes beyond looking only at the final score of a questionnaire or at cutoff scores. This closer look at outcome measures such as functional scales adds to the information obtained from traditional statistical tests, e.g., Cronbach's alpha or the intraclass correlation coefficient (ICC). IRT not only improves the methodological quality of new instruments but also gives clearer insight into the effects of an intervention, whether comparing groups or following the same subject longitudinally.
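For reference, one of these traditional statistics, Cronbach's alpha, can be computed directly from the item-score matrix. The responses below are invented for illustration:

```python
import numpy as np

def cronbach_alpha(item_scores):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total-score variance)."""
    X = np.asarray(item_scores, dtype=float)  # respondents x items
    k = X.shape[1]
    item_var = X.var(axis=0, ddof=1).sum()
    total_var = X.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

# Hypothetical: 6 respondents answering 4 Likert-type items scored 0-4
responses = [[4, 3, 4, 4],
             [2, 2, 3, 2],
             [1, 0, 1, 1],
             [3, 3, 2, 3],
             [4, 4, 4, 3],
             [0, 1, 0, 1]]
print(f"Cronbach's alpha = {cronbach_alpha(responses):.2f}")
```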

Rasch analysis can be applied to examine instruments or assessment scales in a wide spectrum of disciplines, including health studies, education, marketing, economics, and the social sciences. In the majority of evaluations, a well-defined group is selected to answer a series of predefined items. The Rasch model offers a mathematical and theoretical framework by which researchers who develop instruments are able to create comparable measures. The main point behind this model is the concept of unidimensionality, which can be summarized by the idea that useful clinical measures involve the analysis of only one human attribute at a time. In other words, it implies that the instrument measures a single latent ability. Taking a self-report questionnaire as an example, this would mean the items are organized according to their difficulty level and placed on a single linear hierarchical scale.

The Rasch model transforms ordinal scales into interval measures. This process allows us to calibrate item difficulty and subject ability on the same linear *continuum*, which is divided into equal intervals, or *logits*. The logit scale is defined by the items and works like a ruler on which individuals are positioned according to their level of ability. The probabilistic model of Rasch analysis can be defined by the following formula:

$$P_{ni}(x = 1) = f(B_n - D_i)$$


where *P* is the probability that individual *n* succeeds on a given item *i* in any trial. This probability equals the mathematical function *f* of the difference between the ability *B* of individual *n* and the difficulty level *D* of item *i*. The model can be extended to multilevel items, i.e., to non-dichotomous responses. From this procedure under IRT concepts, an item characteristic curve can be drawn, representing the probability of choosing a response to each item as a function of the subject's or patient's ability. A typical item characteristic curve is defined by two properties: the item's difficulty and the item's discrimination power. Returning to the ruler analogy above, the difficulty of an item functions as a location index, i.e., where along the *continuum* of ability the item works best. Hard items function well with high-ability individuals, just as easy items do with low-ability subjects. Discrimination power refers to how well an item separates individuals whose abilities are below or above the item location. Graphically, this property appears as the steepness of the item characteristic curve: the steeper the curve, the greater the discrimination power. A flat curve means that the probability of a correct answer is nearly the same at low and high levels of ability. It is worth stressing that these two properties only describe the shape of the item characteristic curve, and consequently how well an item functions; they cannot be used as proof of item validity. Applying these concepts to a multilevel item, a Likert-like scale for example, each response option would have its own curve with a distinct peak. Together, the curves should cover the spectrum of ability measured by the item.
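In the dichotomous Rasch model, *f* is the logistic function, so the probability depends only on the difference B<sub>n</sub> − D<sub>i</sub> expressed in logits. A minimal sketch (the difficulty value and the ability grid are invented) that traces one item characteristic curve:

```python
import numpy as np

def rasch_probability(ability, difficulty):
    """Dichotomous Rasch model: P(x=1) = exp(B-D) / (1 + exp(B-D)),
    i.e., the logistic function of the ability-difficulty gap in logits."""
    return 1.0 / (1.0 + np.exp(-(ability - difficulty)))

# Invented example: one item with difficulty 0.5 logits, evaluated
# across a range of abilities (this traces the item characteristic curve;
# P = 0.5 exactly where ability equals difficulty).
for b in np.linspace(-4, 4, 9):
    p = rasch_probability(b, difficulty=0.5)
    print(f"ability {b:+.1f} logits -> P(success) = {p:.2f}")
```

Note that in the Rasch model proper every item shares the same slope; the two-parameter logistic (2PL) model adds an item-specific discrimination parameter to capture the varying steepness described above.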

If all items meet this probabilistic expectation, it is possible to state that the questionnaire, as a whole, assesses a unidimensional construct. This probabilistic framework constitutes the basis of the Rasch model and makes it possible to order items by their difficulty level, and patients by their ability level, both based on the observed response pattern.

Questionnaires should be responsive to changes in the status of the patient across the spectrum of ability. Another benefit of IRT is that it quantifies how much information each item contributes at varying levels of ability. Easy items should provide information among low-ability examinees and, conversely, hard items that describe difficult tasks provide information among high-ability examinees. The information in the questionnaire's final score is therefore the sum of the information contributed by each item, and its accuracy is directly proportional to the amount of information provided. The goal of an evaluative instrument is to provide information across the whole range of ability. An appropriate evaluative questionnaire should therefore contain items that assess an individual's ability to perform activities spanning from easy to more challenging ones.
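Under the Rasch model the item information function has the simple closed form I(B) = P(B)[1 − P(B)], which peaks where ability matches item difficulty, and the test information is the sum over items. A short sketch with invented difficulties spanning easy to hard tasks:

```python
import numpy as np

def rasch_probability(ability, difficulty):
    # Same logistic form as above
    return 1.0 / (1.0 + np.exp(-(ability - difficulty)))

def item_information(ability, difficulty):
    """Rasch item information: I(B) = P(1 - P), maximal when B equals D."""
    p = rasch_probability(ability, difficulty)
    return p * (1.0 - p)

# Invented item difficulties spanning easy (-2 logits) to hard (+2 logits)
difficulties = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
for b in (-2.0, 0.0, 2.0):
    info = item_information(b, difficulties)  # per-item information
    print(f"ability {b:+.1f}: test information = {info.sum():.2f}")
```

With difficulties spread across the range, the test information stays reasonably high at every ability level, which is exactly the targeting property described above.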

The results of an item characteristic curve are valuable only when the following requirements are met:

1. Unidimensionality

   a. The questionnaire measures a single latent trait.

2. Local independence

   a. The answer to each item is independent of the answers to the other items.

3. No time constraints

   a. There should be no time limit or restriction when answering the test.

4. No guessing

   a. A correct answer should not be due to guessing but should reflect the person's ability.

This implies that a single latent ability accounts for the individual's response to each of the items contained in the instrument, which is exactly the unidimensionality mentioned throughout this section. Both factor analysis and the Rasch model can ascertain this aspect of construct validity. Items that do not fit the model should be revised or eliminated in accordance with the scale's goals.
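As a rough illustration of how unidimensionality can be screened, the sketch below simulates five items driven by one latent trait and inspects the eigenvalues of the inter-item correlation matrix; a first eigenvalue that dominates the rest is commonly read as support for a single factor. The simulated data and the "dominant first factor" heuristic are illustrative assumptions, not a full factor or Rasch fit analysis.

```python
import numpy as np

# Simulated item-response matrix (respondents x items): five items
# that all load on one hypothetical latent trait, plus noise.
rng = np.random.default_rng(0)
trait = rng.normal(size=200)
X = np.column_stack([trait + rng.normal(scale=0.8, size=200)
                     for _ in range(5)])

# Eigenvalues of the inter-item correlation matrix, largest first
corr = np.corrcoef(X, rowvar=False)
eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]
print("Eigenvalues:", np.round(eigenvalues, 2))
# A first eigenvalue much larger than the rest is often read as
# evidence of a dominant single factor, i.e., unidimensionality.
```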
