**3.2 Validation of STEM literacy items**

The item and person measurement results are shown in **Table 5**. Person separation was 0.96, which is less than two. This means that the test instrument may not be sensitive enough to distinguish students with high abilities from students with low abilities. Based on these results, additional test items may be needed to increase the sensitivity of the test instrument. Based on item separation, the test items fall into three difficulty levels: easy, medium, and difficult. Person reliability is 0.48, while item reliability is 0.89. These results indicate that the consistency of student answers is weak, whereas the overall quality of the items in terms of reliability is good.
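The reported person separation and person reliability are two views of the same quantity. As a minimal sketch, assuming the standard Rasch relation between a separation index G and a reliability R, namely R = G² / (1 + G²), the following Python snippet converts one into the other. The function names are hypothetical and are not taken from the software used in this study.

```python
import math


def reliability_from_separation(g: float) -> float:
    """Reliability R implied by a separation index G: R = G^2 / (1 + G^2)."""
    return g * g / (1.0 + g * g)


def separation_from_reliability(r: float) -> float:
    """Separation index G implied by a reliability R: G = sqrt(R / (1 - R))."""
    if not 0.0 <= r < 1.0:
        raise ValueError("reliability must be in [0, 1)")
    return math.sqrt(r / (1.0 - r))


person_separation = 0.96  # value reported in the text
print(round(reliability_from_separation(person_separation), 2))  # 0.48

# A separation of two corresponds to a reliability of 0.8 under this relation.
print(round(reliability_from_separation(2.0), 2))  # 0.8
```

Under this relation, a separation of 0.96 gives a reliability of about 0.48, which agrees with the person reliability reported above.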

Several factors affect person reliability. The first is the variance of students' ability levels: a wider range of abilities gives higher person reliability. The measurement results in **Table 5** show low person reliability, caused by the low variance of students' ability levels. Consistent with this, person separation is still less than two. The second factor is the length of the test or of the rating scale: a longer test yields higher person reliability. The third factor is the number of categories per item: person reliability is higher when each item has more categories. The last factor is sample-item targeting: better targeting results in higher person reliability.
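The effect of test length on reliability can be illustrated with the Spearman-Brown prophecy formula from classical test theory. This is only a rough sketch and not part of the analysis reported here: it assumes that any added items behave like the existing ones, and the target reliability of 0.80 is an illustrative choice rather than a value from this study. The function names are hypothetical.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability when the test is made `length_factor` times longer."""
    return length_factor * reliability / (1.0 + (length_factor - 1.0) * reliability)


def length_factor_for_target(reliability: float, target: float) -> float:
    """How many times longer the test must be to reach `target` reliability."""
    return target * (1.0 - reliability) / (reliability * (1.0 - target))


current = 0.48  # person reliability reported in the text
target = 0.80   # illustrative target, an assumption

print(round(spearman_brown(current, 2.0), 2))             # 0.65 if the test is doubled
print(round(length_factor_for_target(current, target), 2))  # 4.33
```

Under these assumptions, doubling the number of comparable items would raise the reliability to roughly 0.65, and reaching 0.80 would require a test more than four times as long.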


#### **Table 5.**

*Items analysis results.*

Item reliability depends on the variance of item difficulty and on sample size. A wider item difficulty range gives higher item reliability, and the larger the sample, the higher the item reliability obtained. Unlike person reliability, item reliability does not depend on the length of the test. It is also not influenced by model fit.

#### **Table 6.**

*Cronbach alpha analysis results.*


#### **Table 7.**

*Mean square fit statistic results.*

The Cronbach's alpha (KR-20) value in **Table 6** shows the reliability of the interaction between items and persons. The KR-20 value of 0.48 (weak) is a reliability value from classical test theory, which in the Rasch model is equivalent to person reliability. Therefore, the low KR-20 value is caused by the low variance of students' ability levels. To improve this reliability value, the instrument can be tested on samples with a much wider variance in ability levels. Items fit the model when more difficult items are harder for students to answer and, conversely, easier items are easier for students to answer. Both statements hold regardless of the students' ability level.
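The KR-20 coefficient discussed above can be computed directly from a matrix of dichotomous (0/1) responses. The following is a minimal sketch using the textbook formula with the population variance of total scores. The response matrix is made-up toy data, not the data from this study, and the function name is hypothetical.

```python
def kr20(responses: list[list[int]]) -> float:
    """KR-20 for a persons-by-items matrix of 0/1 scores."""
    n_persons = len(responses)
    n_items = len(responses[0])

    # Proportion of correct answers for each item.
    p = [sum(row[j] for row in responses) / n_persons for j in range(n_items)]
    sum_pq = sum(pj * (1.0 - pj) for pj in p)

    # Population variance of the persons' total scores.
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n_persons
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_persons
    if var_total == 0:
        raise ValueError("total scores have zero variance; KR-20 is undefined")

    return n_items / (n_items - 1) * (1.0 - sum_pq / var_total)


# Toy data (an assumption): four persons, three items.
toy = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(kr20(toy))  # 0.75
```

Because the variance of total scores sits in the denominator, a sample whose total scores are tightly clustered tends to produce a lower coefficient, which matches the explanation given for the low KR-20 value.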

The infit and outfit MNSQ (mean square fit statistic) values in **Table 7** indicate how well the data fit the model. For the test as a whole, the infit and outfit values are in the excellent range, which indicates that the overall test instrument fits the model. Meanwhile, the infit and outfit ZSTD (z-score standardized fit statistic) values show the results of the t-test for the hypothesis that the data fit the model. The infit and outfit ZSTD values are in the range of −1.9 to +1.9, which indicates that the overall data can be estimated logically.
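To make the mean square statistics concrete, the sketch below computes infit and outfit MNSQ for a single item under the dichotomous Rasch model. It assumes that person abilities and the item difficulty (in logits) have already been estimated by Rasch software. The function names and the example values are hypothetical and do not come from this study.

```python
import math


def rasch_probability(theta: float, b: float) -> float:
    """Probability of a correct answer for ability theta and difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def item_fit(responses: list[int], abilities: list[float], difficulty: float):
    """Return (infit_mnsq, outfit_mnsq) for one item scored 0/1."""
    sq_resid_sum = 0.0  # sum of squared residuals
    var_sum = 0.0       # sum of model variances
    z2_sum = 0.0        # sum of squared standardized residuals
    for x, theta in zip(responses, abilities):
        p = rasch_probability(theta, difficulty)
        w = p * (1.0 - p)          # model variance of the response
        sq = (x - p) ** 2
        sq_resid_sum += sq
        var_sum += w
        z2_sum += sq / w
    infit = sq_resid_sum / var_sum    # information-weighted mean square
    outfit = z2_sum / len(responses)  # unweighted mean square
    return infit, outfit


# Example values (assumptions): four persons answering one item.
infit, outfit = item_fit([0, 0, 1, 1], [-1.0, 0.0, 1.0, 2.0], 0.5)
print(infit, outfit)
```

Values near 1 indicate that the responses vary about as much as the model expects. The very orderly response pattern in this example gives values below 1, meaning the responses are more predictable than the model expects.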

## **4. Conclusion**

Teachers took part in workshop activities on developing STEM literacy assessment. They learned about STEM education best practice in Japan from the expert. Most of the teachers had experience in developing and implementing STEM-based lessons, but none had experience with STEM literacy assessment. Teachers found it difficult to create items for technology, engineering, and mathematics literacy. This is indicated by the validation results, which showed low reliability. Person separation was 0.96, which is less than two. This means that the test instrument may not be sensitive enough to distinguish students with high abilities from students with low abilities. Based on these results, additional test items may be needed to increase the sensitivity of the test instrument. Therefore, teachers need more practice in creating items that are sensitive enough to distinguish students' abilities.

*Education Annual Volume 2023*
