### **10. Difficulty index**

The item difficulty (easiness, facility index, *P*-value) is the percentage of students who answered an item correctly [6, 40]. The difficulty index ranges from 0 to 100: higher values indicate easier questions, while lower values indicate harder items. The ideal (optimal) difficulty level for type A MCQs varies with the number of options (**Table 3**) [49, 50].


**Table 3.**

*The ideal (optimal) difficulty level (for tests with 100 items).*

Item difficulty can be categorized into difficult, moderate, and easy ranges. Easy and difficult items were reported to have very little discrimination power [48]. Item difficulty depends on both the item and the examinees who took the test at the given time [24]; thus, reuse of an item should be controlled according to its difficulty index. Some authors found that the difficulty indices of items assessing high cognitive levels in Bloom's taxonomy, such as evaluation, explanation, analysis, and synthesis, are lower than those of items assessing remembering, understanding, and applying [51, 52].
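Because the difficulty index is simply the percentage of correct responses, it can be computed directly from scored answers. The sketch below assumes one boolean per examinee for a single item; the 30/80 band cut-offs used for categorization are illustrative assumptions only, since reference values differ across the literature (see **Table 4**).

```python
from typing import List


def difficulty_index(responses: List[bool]) -> float:
    """P-value: percentage of examinees who answered the item correctly."""
    if not responses:
        raise ValueError("no responses recorded for this item")
    return 100.0 * sum(responses) / len(responses)


def categorize(p: float, hard_cut: float = 30.0, easy_cut: float = 80.0) -> str:
    """Band an item as difficult, moderate, or easy.

    The 30/80 cut-offs are illustrative, not values from the text.
    """
    if p <= hard_cut:
        return "difficult"
    if p >= easy_cut:
        return "easy"
    return "moderate"
```

For example, an item answered correctly by 2 of 4 examinees has a *P*-value of 50 and falls in the moderate band under these cut-offs.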

During item or exam construction, the constructor should aim for an acceptable level of difficulty [6]. Sugianto reported that items within an exam could be distributed by difficulty as follows: moderate items (40%), easy and challenging items (20% each), and very easy and very challenging items (10% each) [6]. Other authors reported that most items should be of moderate difficulty, with about 5% in the difficult range [50, 53]. Regarding the general arrangement of a test or examination, easy items come first, followed by difficult ones. In the case of diagnostic assessment, however, the sequence of the learning material is more important [6, 7].
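A constructor who wants to audit an existing exam against such a target distribution can tally the share of items per difficulty band. The helper below is a minimal sketch; the 30/80 band cut-offs are assumed for illustration and should be replaced by whatever reference range an institution adopts.

```python
from collections import Counter
from typing import Dict, Iterable, List


def difficulty_distribution(p_values: Iterable[float]) -> Dict[str, float]:
    """Percentage of exam items falling in each difficulty band.

    The 30/80 cut-offs are illustrative assumptions.
    """
    items: List[float] = list(p_values)

    def band(p: float) -> str:
        if p <= 30.0:
            return "difficult"
        if p >= 80.0:
            return "easy"
        return "moderate"

    counts = Counter(band(p) for p in items)
    return {b: 100.0 * c / len(items) for b, c in counts.items()}
```

Comparing the returned percentages with the intended targets (e.g., 40% moderate) shows at a glance where an item bank is skewed.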

Both easy and difficult items show low power to discriminate between students. Some reports described a negative correlation between exam reliability and the presence of very difficult and very easy items [38]. Oermann et al. reported that educationalists must be careful in deleting items with a poor difficulty index, because the number of items has a greater effect on test validity [54]. It is recommended that difficult items be reviewed for possible technical and content causes [50]. Possible causes of a low difficulty index include content material that was not covered (taught), genuinely challenging items, a miskeyed answer, or no correct answer among the item options [55]. Easy items (high *P*-value) can be due to technical causes, or because the concerned learning objective(s) were achieved or were revisited in more superficial coverage [55].
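Since items at either extreme should be reviewed for technical or content causes, a simple filter can shortlist them. The default thresholds below are hypothetical placeholders, not reference values from the text.

```python
from typing import Iterable, List


def items_for_review(p_values: Iterable[float],
                     low: float = 30.0, high: float = 80.0) -> List[int]:
    """Indices of items whose P-value lies outside [low, high].

    The default 30/80 thresholds are illustrative assumptions.
    """
    return [i for i, p in enumerate(p_values) if p < low or p > high]
```

The flagged items can then be checked for a miskeyed answer, uncovered content, or other causes before any decision to revise or delete them.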

### **11. Interpretation of difficulty index**

In the literature, including medical education, many ranges of difficulty indices have been reported (**Table 4**).



**Table 4.**

*Reference values and interpretation of difficulty index (p-value).*
