**2. State of the art**

The most widely cited experiment on vowel perception and acoustics is a simple one conducted at Bell Telephone Laboratories [2]. In that study, the authors recorded repetitions of ten vowels in /h V d/ context uttered by 33 men, 28 women, and 15 children. From these recordings, the first three formant frequencies (*F*1 − *F*3) and the fundamental frequency (*F*0) were extracted. The measurements, however, showed considerable formant frequency variability among participants, and formant frequency patterns overlapped substantially.

Formant frequencies have already been well studied for both American and British English vowels [2–7]. In a separate line of work, Jan Awrejcewicz carried out notable numerical investigations of vocal cord oscillations and primary resonances [8, 9], and of other effects such as stability and bifurcation phenomena [10].

© 2014 Munoz-Luna et al.; licensee InTech. ©2012 Munoz-Luna, Jurado-Navas and Taillefer de Haya, licensee InTech. This is an open access chapter distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

As far as phonology teaching is concerned, Pavón implemented a software programme [1] as a learning tool for his university students of English. One feature of Pavón's software is that users can record a specific phoneme and compare it with an existing phoneme stored in the programme. This comparison yields a graphical degree of similarity, expressed as a percentage, that shows the resemblance between the user's sound wave and the programme's.
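How such a percentage is computed is not detailed here. The following is only a minimal sketch, assuming that similarity is measured as the peak normalised cross-correlation of the two waveforms; this is a stand-in measure chosen for illustration, not Pavón's documented method. The names `similarity_percent` and `tone`, the sampling rate and the test frequencies are all hypothetical.

```python
import math


def similarity_percent(x, y):
    """Peak normalised cross-correlation of two signals, as a percentage.

    Assumed measure for illustration only; both signals are expected to
    share the same sampling rate.
    """
    # Remove the mean so a constant offset does not inflate the score.
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    x = [v - mean_x for v in x]
    y = [v - mean_y for v in y]

    norm = math.sqrt(sum(v * v for v in x) * sum(v * v for v in y))
    if norm == 0.0:
        return 0.0

    best = 0.0
    # Slide y over x and keep the largest absolute correlation.
    for lag in range(-(len(y) - 1), len(x)):
        acc = 0.0
        for i, xv in enumerate(x):
            j = i - lag
            if 0 <= j < len(y):
                acc += xv * y[j]
        best = max(best, abs(acc))
    return 100.0 * best / norm


def tone(freq_hz, n=400, rate_hz=8000):
    """Synthetic sine wave standing in for a recording (hypothetical data)."""
    return [math.sin(2 * math.pi * freq_hz * k / rate_hz) for k in range(n)]


same = similarity_percent(tone(200), tone(200))
other = similarity_percent(tone(200), tone(310))
print(f"identical signals: {same:.1f}%")
print(f"different frequency: {other:.1f}%")
```

Under this assumed measure, a signal compared with itself scores 100%, while a tone at a different frequency scores far lower. Real recordings would also differ in duration, loudness and background noise, none of which this sketch models.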

Nevertheless, as Pavón himself states, this is an approximate value that depends on recording conditions (e.g. room noise and other external variables), so the result is only indicative. Although the idea is conceptually sound, a frequency domain analysis is required to establish the degree of resemblance between users' waveforms and those included in the system. On the one hand, the software does not distinguish between male and female voice recordings, even though fundamental frequencies and formants differ between the two. Women present peak energy at higher frequencies when talking, and Pavón's software only includes female recordings. On the other hand, time domain comparisons are not significant: results are very often meaningless.

For this reason, this paper attempts to improve the aforementioned software by adding a frequency domain analysis based on the identification of the fundamental frequency and of *F*1 and *F*2. This would allow a more meaningful comparison between users' recordings and the programme's audio database. At the same time, depending on formant position, learners will receive information on mouth opening and tongue positioning for each vowel sound. To this end, we draw on the authors' previous research on audio signal processing [11], communication channels [12], numerical methods [13], analytical modelling [14] and English applied linguistics [15]. This theoretical framework underpins a useful tool for students of English who want to improve their pronunciation of English vowels autonomously.

Finally, we focus only on vocalic sounds, since not all human speech sounds have well-defined formants. Vowels do have distinct formants, and their study complements oral language teaching, in this case of the English language.

Spectral Study with Automatic Formant Extraction to Improve Non-native Pronunciation of English Vowels

http://dx.doi.org/10.5772/57221

**Figure 1.** Organs of speech: A. Lips, B. Teeth, C. Teeth ridge, D. Hard palate, E. Soft palate, F. Uvula, G. Pharynx, H. Tongue body, I. Tongue tip, J. Blade, K. Tongue front, L. Back of the tongue, M. Tongue root, N. Jaw, O. Epiglottis, P. Thyroid cartilage, Q. Cricothyroid cartilage, R. Trachea, S. Oral cavity, T. Nasal cavity. Figure taken from [1].

The vocal tract filter selectively passes energy in the harmonics of the source. The size and shape of the vocal tract determine the amount of energy that is used in oral speech. For each vocalic sound, the so-called formants describe its characteristic resonances. In fact, the vocal tract transfer function for a particular vowel is defined by formant frequencies and bandwidths. We can model the acoustic properties of the vocal tract as a tube that is open at one end, the mouth, and closed at the other, the glottis. Assuming that this tube is uniform, its resonant frequencies can be calculated with the following formula:

$$F_n = \frac{(2n - 1)\,c}{4L}, \tag{1}$$

where *n* is the number of the formant, *c* is the speed of sound, and *L* is the length of the tube. For instance, assuming illustrative values of *L* = 17.5 cm and *c* = 350 m/s (not taken from this study), equation (1) gives *F*1 = 500 Hz, *F*2 = 1500 Hz and *F*3 = 2500 Hz. However, we also need to consider acoustic constrictions in the vocal tract. One way of modelling the acoustic properties of vowels is to represent the vocal tract as a concatenation of tubes [16]. An alternative approach is known as perturbation theory, which deals with vocalic acoustics in terms of the relationship between air pressure and speed [17].

**3.1. Formant frequencies of the vowels**

First formant frequency (*F*1) is traditionally influenced by the shape of the vocal tract. *F*1 is inversely related to tongue height: low vowels have high *F*1 and high vowels have low
