Table 7 shows some statistical information collected during experiments. The "Total compound words" is the total number of Noun-Adjective cases found in the corpus transcription. The "unique compound words" indicates the total number of Noun-Adjective cases after removing duplicates. The last column, "compound words replaced", is the total number of compound words that were replaced back to their original two disjoint words.

**Table 7.** Statistical information for compound words (columns: Experiment, Total compound words, Unique compound words, Compound words replaced)

should be VBD instead of NN.

| Sentence to be tagged | Stanford Tagger output (read from left to right) |
|---|---|
| هذا وقال رئيس لجنة الطاقة بمجلس النواب ورئيس الرابطة الروسية للغاز إن الاحتكارات الأوروبية (hadha waqala ra'ysu lajnati 'lTaqa bimajlisi 'lnuwab wa ra'ysu 'lrabiTa 'lrwsiya llghaz 'ina 'l'iHtikarati 'liwrobiya) | DT/هذا NN/وقال NN/رئيس NN/لجنة DTNN/الطاقة NN/بمجلس DTNN/النواب NN/ورئيس DTNN/الرابطة DTJJ/الروسية NNP/للغاز NNP/إن DTNNS/الاحتكارات DTJJ/الأوروبية |

**Table 8.** Example of Stanford Arabic Tagger Errors

The OOV was also measured for the performed experiments. Our ASR system is based on a closed vocabulary, so we assume that there are no unknown words. The OOV was calculated as the percentage of recognized words that do not belong to the testing set, but to the training set. Hence,

$$\text{OOV (baseline system)} = \frac{\text{non-testing-set words}}{\text{total words in the testing set}} \times 100$$

which is equal to 328/9288 × 100 = 3.53%. For the enhanced cases, Table 6 shows the resulting OOVs. Clearly, the lower the OOV the better the performance is, which was achieved in all three cases.
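As a quick check of the baseline OOV arithmetic above, the percentage can be reproduced in a few lines. This is only a minimal sketch: the function name `oov_percentage` and its parameter names are illustrative and do not come from the authors' system. Only the two counts (328 and 9288) are taken from the text.

```python
def oov_percentage(non_testing_set_words: int, total_testing_words: int) -> float:
    """Return the OOV as a percentage, following the ratio given in the text.

    Both parameter names are hypothetical labels for the two counts
    that appear in the baseline computation.
    """
    if total_testing_words <= 0:
        raise ValueError("total_testing_words must be positive")
    return non_testing_set_words / total_testing_words * 100


# Counts reported for the baseline system in the text.
baseline_oov = oov_percentage(328, 9288)
print(f"Baseline OOV: {baseline_oov:.2f}%")  # prints "Baseline OOV: 3.53%"
```

The same function could be applied to the counts of each enhanced experiment to compare them against the baseline value.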

Table 10 compares the suggested methods for cross-word modeling. It shows that the PoS tagging approach outperforms the other methods (i.e., the phonological rules and small-word merging), which were investigated on the same pronunciation corpus. The use of phonological rules was demonstrated in (AbuZeina et al. 2011a), while the small-word merging method was presented in (AbuZeina et al. 2011b). Even though PoS tagging seems to be better than the other methods, more research should be carried out for more confidence. The comparison in Table 10 is therefore subject to change, as more cases need to be investigated for both techniques: cross-word variation was modeled using only two Arabic phonological rules, while only two compounding schemes were applied in the PoS tagging approach.

The recognition time was also compared with that of the baseline system. The comparison covers the testing set, which contains 1144 speech files. The experiments were conducted on a desktop computer with a single 3.2 GHz processing chip and 2.0 GB of RAM. We found that the recognition time of the enhanced method is almost the same as that of the baseline system. This means that the proposed method is almost equal to the baseline system in terms of time complexity.


4 Combined system (1, 2, and 3)

**Table 10.** Comparison between cross-word modeling techniques

## **9. Further research**

As future work, we propose investigating more word-combination cases. In particular, we expect that the construct phrases *Idafa* (الإضافة) make a good candidate. Examples include: (سلسلة جبال, silsilt jibal), (مطار بيروت, maTaru bayrwt), (مدينة القدس, madynatu 'lquds). Another suggested candidate is the Arabic "and" connective (واو العطف), such as: (مواد أدبية ولغوية, mawad 'dabiyah wa lughawiyah), (يتعلق بقضايا العراق والسودان, yata'allaqu biqaDaya 'l'iraqi wa 'lsudan).

A hybrid system could also be investigated: it is possible to use the different cross-word modeling approaches in one ASR system. It is also worthwhile to investigate how to model the compound words in the language model. In our method, we create a new sentence for each compound word. We suggest investigating a representation in which the compound word appears exclusively with its neighbors. For example, instead of having two complete sentences to represent the compound words (بَرنَامِجضَخم, barnamijDakhm) and (فِيالأُردُن, fy'l'urdun), as we proposed in our method:

أَمَّا فِي الأُردُن فَقَد تَمَّ وَضعُ بَرنَامِجضَخم لِتَطوِيرِ مَدِينَةِ العَقَبَةِ

'mma fy 'l'urdun faqad tamma waD'u barnamijDakhm litaTwyru madynati 'l'aqabati

أَمَّا فِيالأُردُن فَقَد تَمَّ وَضعُ بَرنَامِج ضَخم لِتَطوِيرِ مَدِينَةِ العَقَبَةِ

'mma fy'l'urdun faqad tamma waD'u barnamijDakhm litaTwyru madynati 'l'aqabati

We propose to add the compound words only with their adjacent words, like:

وَضعُ بَرنَامِجضَخم لِتَطوِير

waD'u barnamijDakhm litaTwyr

أَمَّا فِيالأُردُن فَقَد

'mma fy'l'urdun faqad

A comprehensive research work should be made to find how to effectively represent the compound words in the language model. In addition, we highly recommend further research in PoS tagging for Arabic.

## **10. Conclusion**

The proposed knowledge-based approach to modeling the cross-word pronunciation variation problem achieved a feasible improvement. Mainly, the PoS tagging approach was used to form compound words. The experimental results clearly showed that forming compound words from a noun and an adjective achieved better accuracy than merging a preposition with its next word. The significant enhancement we achieved came not only from the cross-word pronunciation modeling in the dictionary, but also indirectly from the recalculated n-gram probabilities in the language model. We also conclude that the Viterbi algorithm works better with long words. Speech recognition research should consider this fact when designing dictionaries. We found that merging words based on their types (tags) leads to significant improvement in Arabic ASRs. We also found that the proposed method outperforms the other cross-word methods such as phonological rules and small-word merging.

## **Author details**

Dia AbuZeina, Husni Al-Muhtaseb and Moustafa Elshafei

*King Fahd University of Petroleum and Minerals, Dhahran, Saudi Arabia*

## **Acknowledgement**

The authors would like to thank King Fahd University of Petroleum and Minerals for providing the facilities to write this chapter. We also thank King Abdulaziz City for Science and Technology (KACST) for partially supporting this research work under Saudi Arabia Government research grant NSTP # (08-INF100-4).

## **11. References**

Abushariah, M. A.-A. M.; Ainon, R. N.; Zainuddin, R.; Elshafei, M. & Khalifa, O. O. Arabic speaker-independent continuous automatic speech recognition based on a phonetically rich and balanced speech corpus. Int. Arab J. Inf. Technol., 2012, 9, 84-93.

AbuZeina D., Al-Khatib W., Elshafei M., "Small-Word Pronunciation Modeling for Arabic Speech Recognition: A Data-Driven Approach", Seventh Asian Information Retrieval Societies Conference, Dubai, 2011b.

AbuZeina D., Al-Khatib W., Elshafei M., Al-Muhtaseb H., "Cross-word Arabic pronunciation variation modeling for speech recognition", International Journal of Speech Technology, 2011a.

