#### **5. Screening in RTI context**

The goal of universal screening is to promote the early identification of actual or potential reading difficulties. To prevent further difficulties, screening measures should detect a large proportion of at-risk students so that appropriate remedial support can be provided to them.

Screening and identifying students with or at risk for reading difficulties is an important first step in RTI models, both for grades K-2 and for the upper elementary grades, where there is a particularly large percentage of struggling readers [12].

As Ref. [37] noted, during the last decade, responsiveness to intervention (RTI) has become popular among many practitioners. Specifically, it has been used as a means of transforming schooling into a prevention system with multiple levels. To be implemented successfully, RTI requires ambitious intent, a comprehensive structure, and coordinated service delivery. Its effectiveness also relies on building-based personnel who have specialized expertise at all levels of the prevention system.

In that context, schools typically employ a direct route approach to screening. Under this approach, students identified as at risk by a screening process are placed directly in intervention. Direct route approaches require screening decisions to be highly accurate. However, few studies that have examined the predictive validity of reading measures report meeting the recommended levels of classification accuracy.

Ref. [5] compared two approaches that aimed at improving the classification accuracy of predictors of third-grade reading performance. Findings indicated that relying on single screening measures does not result in high levels of classification accuracy. Classification accuracy improved by 2% when a combination of measures was employed and by 6% when a predicted probability risk index was used.

On the other hand, from an RTI perspective, Ref. [24] investigated whether measures of language ability and/or response to language intervention in kindergarten uniquely predicted reading comprehension difficulties in third grade. A total of 366 participants were administered a battery of screening measures at the beginning of kindergarten and progress monitoring probes across the school year. A subset of participants also received a 26-week Tier 2 language intervention. Participants' achievement in word reading was assessed at the end of second grade, and their performance in reading comprehension was measured at the end of third grade. Results showed that measures of language ability in kindergarten significantly added to the prediction of reading comprehension difficulties over and above kindergarten word reading predictors and direct measures of word reading in second grade.

#### **6. Discriminative accuracy-sensitivity-specificity-ROC analysis**

A screening test can be considered effective if it is norm-referenced, has appropriate content, validity, and reliability, and is easy to administer and interpret. It also needs to be quick and cost-effective. An additional criterion is its discriminative accuracy, with emphasis on false negative and false positive rates [7, 11]. The accuracy of screening measures is important given the concern of either mislabeling a child or failing to detect a delay.

Continuous efforts to improve the accuracy of screening instruments have been reported in the relevant literature. These include using a combination of assessments and assessing risk on a continuum rather than with "fixed" cut scores. In addition, the use of probabilities based on multiple assessments has the potential to enhance the accuracy of the screening process, because screening decisions are then based on multiple indicators as well as on what is known about the prevalence of the condition in question.
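The role of prevalence in such probability-based decisions can be illustrated with Bayes' rule. The following is a minimal sketch, not taken from the reviewed studies: the function name and the example values (a screen with 0.90 sensitivity and 0.80 specificity, and three prevalence levels) are hypothetical and chosen only for illustration.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability that a child flagged by the screen is truly at risk (Bayes' rule)."""
    true_positive_share = sensitivity * prevalence
    false_positive_share = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive_share / (true_positive_share + false_positive_share)


# Hypothetical screen: the same sensitivity and specificity at different prevalence levels.
for prevalence in (0.05, 0.10, 0.20):
    ppv = positive_predictive_value(0.90, 0.80, prevalence)
    print(f"prevalence={prevalence:.2f}  positive predictive value={ppv:.2f}")
```

Under these assumed values, the same screen yields a lower predictive value when the condition is rarer, which is why knowledge of prevalence matters when interpreting a positive screening result.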

*Screening Young Children at Risk for Reading Failure DOI: http://dx.doi.org/10.5772/intechopen.82081*


However, according to Ref. [38], the concept of validity has expanded beyond the traditional correlation coefficient between a criterion and the new measure. It was defined as not only the degree to which the measure assesses the construct but also "the adequacy and appropriateness of the inferences and actions taken on the basis of the scores" (p. 13). Validity thus includes social consequences and relevance/utility in addition to more traditional concepts. Furthermore, Ref. [38] included reliability, content, and criterion validity as part of construct validity. So, even though only a few of the reviewed studies were interested in the reliability of testing measures, in accordance with Ref. [38], a larger number of these studies were interested in the other aspects (e.g., [19]).

If a test is not valid, reliability is moot. In other words, there is no point in discussing the reliability of a test that is not valid, because validity is required before reliability can be considered in any meaningful way. The studies that emphasized reliability after establishing validity were Refs. [31, 32].

The validity of any predictive instrument depends in part on two key factors: sensitivity and specificity. To compute sensitivity and specificity using the formula mentioned above, the performance of each child on the assessments was first classified as above or below the cutoff score. A cutoff score is a value below which poor school performance may be suspected [14].
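The classification step described above can be sketched in a few lines. This is an illustrative example under stated assumptions, not the procedure of any reviewed study: the function names, the screening scores, the outcomes, and the cutoff of 20 are all made up, and sensitivity and specificity follow their standard definitions (true positives over all children with a poor outcome, and true negatives over all children without one).

```python
def confusion_counts(scores, poor_outcome, cutoff):
    """Count screening outcomes; a child is flagged as at risk when score < cutoff."""
    tp = fp = tn = fn = 0
    for score, poor in zip(scores, poor_outcome):
        flagged = score < cutoff
        if flagged and poor:
            tp += 1  # flagged, and later a poor reader
        elif flagged and not poor:
            fp += 1  # flagged, but not a poor reader later
        elif not flagged and poor:
            fn += 1  # missed by the screen
        else:
            tn += 1  # correctly not flagged
    return tp, fp, tn, fn


def sensitivity_specificity(tp, fp, tn, fn):
    """Standard definitions; assumes both outcome groups are non-empty."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Made-up screening scores and later reading outcomes for ten children.
scores = [12, 15, 18, 22, 25, 27, 30, 33, 35, 40]
poor_outcome = [True, True, False, True, False, False, False, False, False, False]

tp, fp, tn, fn = confusion_counts(scores, poor_outcome, cutoff=20)
sens, spec = sensitivity_specificity(tp, fp, tn, fn)
print(f"TP={tp} FP={fp} TN={tn} FN={fn}  sensitivity={sens:.2f}  specificity={spec:.2f}")
```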

Ideally, the determination of an appropriate cutoff score should be based upon locally developed norms. Ref. [39] supported the use of local cutoff points as well: "in order to differentiate those 'at-risk' children a cutoff may use local norms for the best predictability for future achievement in that school system" (p. 15). Nevertheless, Ref. [40] argued "the cut-off point(s) between normal reading and disabled reading is always arbitrary" (p. 30). In addition, Ref. [7] agreed that often the cutoff point is an arbitrary value that has been adjusted to achieve the best results in predictive accuracy. Once outcome data have been collected, the cutoff score may be altered to achieve the best results.

Emphasis is placed on interpretation of sensitivity and predictive value, both of which reflect a screen's ability to accurately identify or predict subjects who will have a poor outcome. Reported values above 0.80 are considered acceptable for these indicators [7, 14].

From RTI's perspective, researchers have argued that high levels of sensitivity are necessary for universal screening measures [12, 37]. Although consensus has not been reached regarding optimal levels of sensitivity, acceptable sensitivity values noted in the literature range from 0.70 to 0.90 [12]. Relatedly, specificity levels of at least 0.70 are generally considered adequate for screening measures.
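To show how such benchmarks interact with the choice of cutoff, the sketch below evaluates several candidate cutoffs and keeps those that meet a minimum sensitivity and specificity. It is an illustration only: the data, the candidate cutoffs, and the function names are hypothetical, and the default thresholds (0.80 and 0.70) are simply picked from the ranges cited above rather than prescribed by any source.

```python
def rates_at_cutoff(scores, poor_outcome, cutoff):
    """Return (sensitivity, specificity) when children scoring below the cutoff are flagged."""
    pairs = list(zip(scores, poor_outcome))
    tp = sum(1 for s, poor in pairs if s < cutoff and poor)
    fn = sum(1 for s, poor in pairs if s >= cutoff and poor)
    tn = sum(1 for s, poor in pairs if s >= cutoff and not poor)
    fp = sum(1 for s, poor in pairs if s < cutoff and not poor)
    # Assumes the sample contains both poor and non-poor outcomes.
    return tp / (tp + fn), tn / (tn + fp)


def acceptable_cutoffs(scores, poor_outcome, candidates,
                       min_sensitivity=0.80, min_specificity=0.70):
    """Keep the candidate cutoffs that meet both illustrative benchmarks."""
    kept = []
    for cutoff in candidates:
        sens, spec = rates_at_cutoff(scores, poor_outcome, cutoff)
        if sens >= min_sensitivity and spec >= min_specificity:
            kept.append((cutoff, sens, spec))
    return kept


# Made-up screening scores and later reading outcomes for ten children.
scores = [12, 15, 18, 22, 25, 27, 30, 33, 35, 40]
poor_outcome = [True, True, False, True, False, False, False, False, False, False]

result = acceptable_cutoffs(scores, poor_outcome, candidates=[16, 20, 24, 28, 32])
for cutoff, sens, spec in result:
    print(f"cutoff={cutoff}  sensitivity={sens:.2f}  specificity={spec:.2f}")
```

Raising the cutoff flags more children, so sensitivity rises while specificity falls. In this made-up sample only one candidate satisfies both thresholds.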

Related to the labeling issue is the false positive rate, which reflects the number of children identified in kindergarten who were not poor readers in first grade. This means that children who do not need intervention may be identified as in need of it. Administrators may be more concerned with false negative rates, as in [9], but another negative consequence of false positive cases is the additional cost of the intervention.

However, Ref. [1] supported a different point of view and noted that schools should provide this intervention to as many children as possible if they wish to maximize their chances of early intervention with the most impaired children. This may seem a waste of resources at first glance. On the other hand, many of the falsely identified children receiving intervention are likely to be below-average readers, even if they are not among the most seriously disabled readers.

In any case, Ref. [40] proposed a possible solution to over-identification: using a two-stage screening process or providing small-group diagnostic interventions in the first grade. Consistent with this, Ref. [1] reported a significant reduction in the percentage of false negative errors within the same sample of children when the number of children identified as at risk was doubled. Identifying the 10% of children who scored lowest on the predictive tests resulted in a 42% false negative rate, whereas identifying the 20% who scored lowest reduced the false negative rate to 8%.
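The mechanism behind this trade-off can be sketched with a toy example. The data below are invented and do not reproduce the figures reported by Ref. [1]; the function name is hypothetical, and the false negative rate is assumed here to mean the share of eventual poor readers who were not flagged by the screen.

```python
def false_negative_rate(scores, poor_outcome, flag_fraction):
    """Flag the lowest-scoring fraction of children and return the share of
    eventual poor readers who were not flagged (assumed definition)."""
    n_flag = round(len(scores) * flag_fraction)
    # Indices of children ordered from lowest to highest screening score.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    flagged = set(order[:n_flag])
    poor = [i for i, is_poor in enumerate(poor_outcome) if is_poor]
    missed = [i for i in poor if i not in flagged]
    return len(missed) / len(poor)


# Invented sample of 20 children; the eventual poor readers scored 1, 2, and 4.
scores = list(range(1, 21))
poor_outcome = [s in (1, 2, 4) for s in scores]

fnr_10 = false_negative_rate(scores, poor_outcome, 0.10)
fnr_20 = false_negative_rate(scores, poor_outcome, 0.20)
print(f"flag lowest 10%: false negative rate = {fnr_10:.2f}")
print(f"flag lowest 20%: false negative rate = {fnr_20:.2f}")
```

Widening the flagged group captures poor readers whose screening scores were low but not among the very lowest, at the cost of flagging more children who will not turn out to need intervention.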

Almost all of the studies used a battery of tests or multiple screening measures as predictors, as Refs. [1, 9] proposed. However, some of the studies (e.g., see Ref. [18]) used so many variables that the general characteristics required of effective screening could be compromised [7, 11]. There must therefore be a balance between the demands of speed, ease, cost-effectiveness, and other characteristics on the one hand and the accuracy rate on the other, if a screening procedure is to be developed and accepted by the reading research community, educators, parents, and children.

A major issue concerning discriminative accuracy is that often only a correlation coefficient between a group's scores on a preschool screening instrument and a later achievement measure is provided in the literature as evidence of the test's effectiveness. Such data, although important, provide information only on the similarity of the group's performance on both tests. A correlation coefficient provides no information about the specific identification of the at-risk and not-at-risk children or about the relationship between such status and the projected outcome of a good or poor reader [13].

The lack of discriminative accuracy data in some studies [17, 21, 22, 30] makes it difficult to interpret their findings in terms of screening effectiveness. Other studies focused on these aspects and reported a range of values for accuracy, false positives, false negatives, sensitivity, and specificity. Better results (predictive accuracy over 80%) regarding these aspects were reported by Refs. [18, 32]. Furthermore, Refs. [19, 33] reported a large number of cases, so it was unclear which was the best.

In terms of intervention programs designed to remediate deficiencies in at-risk students, false positives, although undesirable, are not critical. These children will receive a training program that they do not actually require. In some cases, the instruction could actually benefit the child's performance. Nevertheless, a concern with false positives is that they place an increased demand on scarce resources [25].

On the other hand, a false negative error is more serious because these children do not receive the additional assistance they require at the earliest possible time, which makes their problems more difficult to remediate later [25]. A false negative classification will most likely deprive children of the benefits of early intervention because their test results incorrectly suggest that they are not at risk for learning difficulties. In such cases, the cost to the children may be devastating because they are likely to experience repeated failures and frustrations with academic tasks before they are actually identified and placed appropriately.

Is it possible for a screening measure to have a false negative rate of zero? Ref. [18] answered "no." Their explanation concerns the different levels of readiness of children on their entry into school. In any case, scientific efforts will continue in order to decrease the error rates of screening.

#### **7. Conclusions**

This chapter addressed the early identification and prediction of future low reading achievement and discussed the important aspects regarding effective predictors, the discrimination rate, and the sensitivity and specificity of the screening measures. However, because screening studies have usually measured risk factors inconsistently, included heterogeneous patient populations, and adjusted inconsistently for confounders in multivariate models [34], their findings were not comparable.


*Early Childhood Education*


Regarding single versus multiple predictors, there is evidence that batteries containing multiple tests generally provide better prediction than single instruments, but the increase in efficiency of multi-test batteries is generally not large enough to warrant the extra time and resources required to administer them [1, 5, 9]. Additionally, vocabulary measures proved to be among the best unique predictors [23]. In particular, Ref. [23] found that a measure of expressive vocabulary was a good predictor of reading comprehension status.

The measures most often found to be effective predictors were letter name and letter sound knowledge, phonological awareness, verbal short-term memory, and rapid automatized naming [2, 4, 6, 23]. Very often, screeners were based on reading comprehension, word recognition/decoding, and word fluency [24, 28]. Additionally, some studies found familial risk, the child's specific characteristics, and his/her developmental and school history to be significant predictors [32].

On the other hand, although Refs. [33, 36] found that teacher rating was a significant predictor, a finding consistent with a number of other studies, these ratings cannot substitute for early identification tests. Therefore, they proposed that combining test and teacher data would improve identification of kindergarten children at risk for reading failure. More recently, the findings of Ref. [28] were consistent with the above-mentioned studies.

A method used for validation of an early screening instrument should incorporate: (a) a longitudinal design [6, 27], (b) independent assessments of kindergarten performance and learning ability separated by a specified time interval [2, 21, 23, 24], (c) random sampling of children in a validation/cross-validation design, and (d) systematic assessment of predictive utility and validity [12]. There is clear evidence that early screening is a viable process, but this effort will only reach fruition if research is conducted with appropriate rigor. However, there is a low incidence of educational handicaps, especially in the early grades. This means that a large sample size should be included for screening, and the formative evaluations should be age- and/or grade-specific and valid across grade levels for outcome comparisons.

Many of the screening studies had longitudinal designs, but the vast majority of the included studies did not adopt the recommended random sampling of participants. Therefore, a number of limitations emerged regarding the generalizability of the findings to other populations. The samples were mainly self-selected or volunteer samples [8]. As Ref. [17] noted, the number of participants was modest and the sample was not selected randomly. Although the samples seemed representative of the school district from which they were selected, results may not generalize to the larger population of young children or to specific subgroups. Much of the research was affected by these methodological problems.

In summary, effective screening tools demonstrate high levels of sensitivity in correctly identifying those students who will actually encounter difficulties, as well as high levels of specificity in the accurate identification of those who are not likely to demonstrate reading difficulties. Ultimately, the goal is to maximize classification accuracy, a summative measure of the overall proportion of students who were correctly identified as at-risk or not at-risk on a screening measure.

#### **8. Future research suggestions**

The importance of early intervention has been demonstrated by a large body of research. In this context, the need for carefully designed and accurate screening measures emerges as crucial. Despite the recent interest in and research on screening for reading disabilities, the body of research on the effectiveness of these measures remains methodologically problematic, and the findings seem to be scant. Therefore, the development of a cost-effective and equitable screening, diagnostic, and supportive method that is acceptable to governments, educational authorities, schools, children, and parents remains a scientific challenge.

Therefore, it would be useful to design a large longitudinal study with a 3-year interval. Existing research has often used small and non-representative samples; thus, there remains a need for further research emphasizing appropriate sampling, so that findings can be extrapolated to other samples and, more generally, to other situations.

The development of screening tools that are valid, reliable, and easy for educators to administer and interpret, and that offer the highest accuracy, sensitivity, and specificity, remains an extremely important need.
