| # | System | Accuracy (%) | Execution Time (minutes) |
|---|--------|--------------|--------------------------|
| | Baseline system | 87.79 | 34.14 |
| | | 88.48 | 30.31 |
| 1 | Phonological rules | 90.09 | 33.49 |
| 2 | PoS tagging | 90.18 | 33.05 |
| 3 | Small word merging | 89.95 | 34.31 |

As future work, we propose investigating more word-combination cases. In particular, we expect the construct phrases known as *Idafa* (الإضافة) to be good candidates. Examples include (مدينة القدس, madynatu 'lquds), (مطار بيروت, maTaru bayrwt) and (سلسلة جبال, silsilt jibal). Another suggested candidate is the Arabic "and" connective (واو العطف), as in (مواد أدبية ولغوية, mawad 'dabiyah wa lughawiyah) and (يتعلق بقضايا العراق والسودان, yata'allaqu biqaDaya 'l'iraqi wa 'lsudan). A hybrid system that uses the different cross-word modeling approaches in a single ASR system could also be investigated. It is also worth investigating how compound words should be modeled in the language model. In our method, we create a new sentence for each compound word; we suggest investigating whether each compound word can instead be represented exclusively with its neighbors. For example, instead of having two … (برنامج ضخم, barnamij Dakhm) and … Comprehensive research is needed to determine how to represent compound words effectively in the language model. In addition, we highly recommend further research on PoS tagging for Arabic.
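The compound-word handling discussed above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the tag names (`NOUN`, `ADJ`), the `_` joiner, the function name `augment_corpus` and the transliterated example sentence are all hypothetical. For every adjacent word pair whose tag pair matches a pattern, it emits one extra language-model sentence: either a full copy of the sentence with only that pair merged (a simplified form of the one-new-sentence-per-compound scheme), or, with `neighbors_only=True`, just the compound and its immediate left and right neighbors (the alternative suggested above). The `IDAFA` pattern shows how a noun-noun construct phrase could be added as a new candidate.

```python
from typing import List, Set, Tuple

# Hypothetical tag names; the chapter does not specify its PoS tag set.
NOUN_ADJ: Set[Tuple[str, str]] = {("NOUN", "ADJ")}  # noun followed by adjective
IDAFA: Set[Tuple[str, str]] = {("NOUN", "NOUN")}    # construct phrase (proposed candidate)


def augment_corpus(tagged: List[Tuple[str, str]],
                   patterns: Set[Tuple[str, str]],
                   neighbors_only: bool = False) -> List[str]:
    """Return extra LM training sentences, one per matching word pair.

    tagged:         the sentence as (word, tag) pairs.
    patterns:       set of (left_tag, right_tag) pairs that form a compound.
    neighbors_only: if False, copy the whole sentence with only that pair merged;
                    if True, keep just the compound and its immediate neighbors.
    """
    words = [w for w, _ in tagged]
    extra: List[str] = []
    for i in range(len(tagged) - 1):
        if (tagged[i][1], tagged[i + 1][1]) not in patterns:
            continue
        compound = words[i] + "_" + words[i + 1]  # becomes one vocabulary entry
        if neighbors_only:
            left = words[max(i - 1, 0):i]  # empty at sentence start
            right = words[i + 2:i + 3]     # empty at sentence end
            extra.append(" ".join(left + [compound] + right))
        else:
            extra.append(" ".join(words[:i] + [compound] + words[i + 2:]))
    return extra


if __name__ == "__main__":
    # Illustrative transliterated sentence with hypothetical tags.
    sent = [("qala", "VERB"), ("inna", "PART"), ("hadha", "DEM"),
            ("barnamij", "NOUN"), ("Dakhm", "ADJ"), ("jiddan", "ADV")]
    print(augment_corpus(sent, NOUN_ADJ))
    # ['qala inna hadha barnamij_Dakhm jiddan']
    print(augment_corpus(sent, NOUN_ADJ, neighbors_only=True))
    # ['hadha barnamij_Dakhm jiddan']
```

The neighbors-only variant adds far fewer n-grams per compound, which is the trade-off the suggested research would need to evaluate.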

## **10. Conclusion**

The proposed knowledge-based approach to modeling cross-word pronunciation variation raised recognition accuracy from 87.79% for the baseline system to 90.18%. PoS tagging was used to form compound words. The experimental results showed that forming compound words from a noun and an adjective achieved better accuracy than merging a preposition with its next word. The enhancement came not only from cross-word pronunciation modeling in the dictionary but also, indirectly, from the recalculated n-gram probabilities in the language model. We also conclude that the Viterbi algorithm works better with long words, which speech recognition researchers should take into account when designing pronunciation dictionaries. Merging words based on their types (PoS tags) leads to a significant improvement in Arabic ASR, and the proposed method outperforms the other cross-word methods, such as phonological rules and small-word merging.
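To put the reported figures in perspective, the following Python sketch computes, for each cross-word method, the absolute accuracy gain over the baseline and the corresponding relative reduction in errors. The accuracies are those reported for the baseline (87.79%), phonological rules (90.09%), PoS tagging (90.18%) and small word merging (89.95%) systems; the function name `improvement` is hypothetical, and treating the error rate as 100 minus the reported accuracy is an assumption, not something the chapter states.

```python
from typing import Dict, Tuple

# Accuracies (%) as reported in the comparison of cross-word methods.
ACCURACY: Dict[str, float] = {
    "Baseline system": 87.79,
    "Phonological rules": 90.09,
    "PoS tagging": 90.18,
    "Small word merging": 89.95,
}


def improvement(results: Dict[str, float],
                baseline: str = "Baseline system") -> Dict[str, Tuple[float, float]]:
    """Return {system: (absolute gain in points, relative error reduction in %)}.

    Assumption: error rate = 100 - accuracy (not stated in the chapter).
    """
    base_err = 100.0 - results[baseline]
    out: Dict[str, Tuple[float, float]] = {}
    for name, acc in results.items():
        if name == baseline:
            continue
        gain = acc - results[baseline]
        out[name] = (round(gain, 2), round(100.0 * gain / base_err, 1))
    return out


if __name__ == "__main__":
    for name, (gain, rel) in improvement(ACCURACY).items():
        print(f"{name}: +{gain:.2f} points, {rel:.1f}% fewer errors")
```

Under that assumption, the PoS tagging system removes roughly a fifth of the baseline errors (2.39 points out of 12.21), while the three cross-word methods differ from one another by at most 0.23 points.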

## **Author details**

Dia AbuZeina, Husni Al-Muhtaseb and Moustafa Elshafei *King Fahd University of Petroleum and Minerals, Dhahran, Saudi Arabia* 

## **Acknowledgement**

The authors would like to thank King Fahd University of Petroleum and Minerals for providing the facilities to write this chapter. We also thank King Abdulaziz City for Science and Technology (KACST) for partially supporting this research work under Saudi Arabia Government research grant NSTP # (08-INF100-4).



**Chapter 13**

**VOICECONET: A Collaborative Framework for Speech-Based Computer Accessibility with a Case Study for Brazilian Portuguese**

Nelson Neto, Pedro Batista and Aldebaro Klautau

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/47835

> ©2012 Neto et al., licensee InTech. This is an open access chapter distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

**1. Introduction**

In recent years, the performance of personal computers has improved with the production of ever faster processors, which enables the adoption of speech processing in computer-assisted education. Several speech technologies are effective in education, among which text-to-speech (TTS) and automatic speech recognition (ASR) are the most prominent. TTS systems [45] are software modules that convert natural language text into synthesized speech. ASR [18] can be seen as the inverse of TTS: the digitized speech signal, captured for example via a microphone, is converted into text.

There is a large body of work on using ASR and TTS in educational tasks [14, 37]. All these speech-enabled applications rely on *engines*, the software modules that execute ASR or TTS. This work proposes a collaborative framework and associated techniques for constructing speech engines, and adopts accessibility as the major application. The network has an important social impact in reducing the digital divide between speakers of commercially attractive languages and speakers of underrepresented languages.

The incorporation of computer technology into learning has produced multimedia systems that provide powerful "training tools", explored in computer-assisted learning [34]. Web-based learning has also become an important teaching and learning medium [52]. However, the cost of both computers and software is one of the main obstacles to computer-based learning, especially in developing countries like Brazil [16].

The situation is further complicated for people with special needs, including visual, auditory, physical, speech, cognitive, and neurological disabilities. They encounter serious difficulties in accessing this technology and, hence, knowledge. For example, according to the Brazilian Institute of Geography and Statistics (IBGE), 14.5% of the Brazilian

