**Wavelet Based Speech Strategy in Cochlear Implant**

Gulden Kokturk *Dokuz Eylul University Turkey* 

#### **1. Introduction**

A significant percentage of the population in developed countries experiences hearing impairment. Cochlear implants were developed to increase the hearing capacity of these people. In recent years, both adults and children have benefited from cochlear implants, and users have been positively affected by improvements in implant techniques. Although these devices allow for increased performance, a significant gap in speech recognition still remains between cochlear implant users and people with normal hearing.

The cochlear implant prosthesis works by directly stimulating the auditory nerve cells of deaf people whose receptor cells in the cochlea have been destroyed (House&Berliner, 1982; House&Urban, 1979; Christiansen&Leigh, 2002; Loeb, 1990; Parkins&Anderson, 1983; Wilson, 1993). The system basically consists of the following parts: microphone, speech processor, transmitter, receiver and electrode array. Improvements to this prosthesis have mainly concerned implant stimulation systems, signal processing strategies and techniques for increasing the suitability of the instrument for various patients. New developments in this area also address the configuration of the hearing function, especially signal processing strategies.

Speech processing techniques are very important in increasing the users' hearing potential (Eddington, 1980; Wilson et al., 1991; Dorman et al., 1997; Moore&Teagle, 2002). A variety of strategies have been developed in recent years with the aim of improving the hearing abilities of deaf people and bringing these abilities closer to those of people with natural hearing. Among these, various speech processing strategies were developed for multi-channel cochlear implants (Derbel et al., 1994; Cheikhrouhou et al., 2004; Gopalakrishna et al., 2010; Millar et al., 1984). These strategies can be classified into three main groups: waveform strategies, feature extraction strategies and the hybrid strategy (N-of-M strategy).

The wavelet method is a basic tool for noise filtering, compression and the analysis of nonstationary signals and images. The wavelet transform is well suited to semistationary signals and provides good resolution in both time and frequency. Several studies have applied the wavelet transform to speech processing and report better results than traditional methods in speech enhancement. The wavelet packet transform is a form of the discrete wavelet transform that allows unconstrained subband analysis from the second decomposition level onward. Basically, the wavelet packet transform adapts to the frequency axis and allows decomposition. This particular decomposition is implemented with an optimization criterion. Studies show that the filter banks used in the sound strategies of cochlear implants are similar to the filter banks used in the wavelet transform. This indicates that the wavelet transform can be used in cochlear implants. However, the author could not find any distinctive study on the analysis of speech using the wavelet packet transform for speech processing.

In this study, a new strategy is proposed that uses the wavelet packet transform to obtain the sequence of electrodes to be stimulated. The stimulated electrode array is therefore based upon, but different from, that of the N-of-M strategy. An experimental study was performed and analyzed using Turkish words. The words were selected and grouped based on knowledge of Turkish speech sounds, and were recorded in the silent rooms of Ege University, Department of Otology. The experimental subjects listened to the words processed with both the N-of-M strategy and the proposed strategy, and comparative results were obtained. The proposed strategy gave better intelligibility results.

#### **2. Strategies for cochlear implants**

Multi-channel implants provide electrical stimulation in the cochlea using an array of electrodes. An electrode array is used so that different auditory nerve fibers can be stimulated at different places in the cochlea. Each electrode corresponds to a frequency range of the signal, following the arrangement of the hair cells: electrodes near the base of the cochlea are stimulated with high frequency signals, while electrodes near the apex are stimulated with low frequency signals.

The various signal processing strategies developed for multi-channel cochlear implants can be examined under two main categories: waveform strategies and feature-extraction strategies. These strategies extract the speech information from the speech signal and deliver it to the electrodes. The waveform strategies use some type of waveform (in analog or pulsatile form) derived by filtering the speech signal into different frequency bands. The feature extraction strategies use some type of spectral features, such as formants, derived by feature extraction algorithms.

Several parameters govern how the acoustic signal information is presented to the electrodes in these signal processing strategies. The first parameter is the number of electrodes used for stimulation, which determines the frequency resolution. Typically, 16-22 electrodes are used. The achievable resolution also depends on the distribution of the neuron population in the individual implant recipient. The second parameter is the electrode configuration. Different configurations are used to control the current spread, since the electric current spreads symmetrically from the electrodes. The most important parameter is the amplitude of the electric current, which is derived from the filtered waveform using an envelope detection algorithm. It controls the perceived loudness level of the stimulation. The electric current amplitude carries spectral information through the time varying current levels on each electrode and across the different electrodes stimulated in the same cycle.

The main strategies are discussed below.

#### **2.1 Compressed analog approach**

The compressed analog approach was developed mainly by the Symbion company (Eddington, 1980). In this system, the audio signal is first compressed using an automatic gain control unit. The signal is then filtered into four adjacent frequency bands with center frequencies of 0.5, 1, 2 and 3.4 kHz. Finally, the filtered waveforms are converted into stimulation format and sent to four electrodes that were surgically placed in the cochlea (Dorman et al., 1989).

The compressed analog approach delivers useful spectral information to the electrodes, but it suffers from channel interaction. Since the stimulation is analog, the stimulus is transmitted continuously to all four electrodes at the same time. This simultaneous stimulation causes channel interaction and can negatively affect the performance of the device.

#### **2.2 Continuous interleaved sampling**

The continuous interleaved sampling approach was developed by researchers at the Research Triangle Institute (Wilson et al., 1991). In this strategy, the signal is delivered to the electrodes not by simultaneous stimulation but by interleaved pulses. The amplitude of each pulse is derived from the envelope of the corresponding bandpass filter output. This envelope is then compressed and used to modulate biphasic pulses, which electrically excite the patient's auditory nerve. A non-linear compression function (e.g. a logarithmic function) is used to ensure that the envelope output fits the patient's dynamic range. A block diagram of the continuous interleaved sampling strategy is shown in Figure 1.

Fig. 1. Detailed block diagram of continuous interleaved sampling speech strategy in cochlear implant (Loizou, 1998)
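The compression and interleaving steps of this strategy can be sketched as follows. This is only a minimal illustration under stated assumptions, not the processor of Wilson et al. (1991): the particular logarithmic map, the electrical threshold and comfort levels `t_level` and `c_level`, the acoustic range `env_min` to `env_max`, and all function names are hypothetical and introduced here for the example.

```python
import math

def compress(envelope, env_min, env_max, t_level, c_level):
    """Map an envelope value logarithmically into [t_level, c_level].

    env_min/env_max bound the acoustic range; t_level/c_level are assumed
    per-patient electrical threshold and comfort levels.
    """
    # Clip to the acoustic range covered by the map.
    e = min(max(envelope, env_min), env_max)
    # Position of e on a logarithmic scale between env_min and env_max (0..1).
    frac = math.log(e / env_min) / math.log(env_max / env_min)
    return t_level + frac * (c_level - t_level)

def interleave(frames):
    """Turn per-cycle channel amplitudes into (time_slot, channel, amplitude).

    Each channel gets its own time slot, so no two electrodes are
    stimulated at the same time.
    """
    pulses = []
    for cycle, amplitudes in enumerate(frames):
        n_channels = len(amplitudes)
        for channel, amplitude in enumerate(amplitudes):
            pulses.append((cycle * n_channels + channel, channel, amplitude))
    return pulses

# Two stimulation cycles of a hypothetical 4-channel processor.
envelopes = [[0.01, 0.2, 0.05, 0.5], [0.02, 0.1, 0.3, 0.001]]
frames = [[compress(e, 0.001, 1.0, 100.0, 200.0) for e in cycle] for cycle in envelopes]
pulses = interleave(frames)
```

Because every pulse occupies a distinct time slot, the simultaneous stimulation that causes channel interaction in the compressed analog approach does not occur in this scheme.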


#### **2.3 N-of-M (N-M) speech processing**

In the N-M strategy, the audio signal is divided into m frequency bands, and the processor selects the n envelope outputs with the highest energy (see Figure 2). Only the electrodes corresponding to the selected n outputs are stimulated in each cycle (Nogueira et al., 2006). For example, in a 22-6 strategy, only 6 of the 22 channel outputs are selected, so only these 6 channels are stimulated. The N-M strategy is a hybrid strategy since it also includes feature representation. A general block diagram showing the feature inference for the N-M system is given in Figure 3.

Fig. 2. Detailed block diagram of N-M speech strategy in Cochlear implant (Loizou, 1998)

Fig. 3. General block diagram showing the feature inference of N-M speech strategy in Cochlear implant (Loizou, 1998)
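The selection step of the N-M strategy can be illustrated with a short sketch. It assumes that the envelope of every channel has already been computed for the current cycle; the function names and the example envelope values are hypothetical.

```python
def select_n_of_m(envelopes, n):
    """Return the indices of the n channels with the largest envelopes."""
    ranked = sorted(range(len(envelopes)), key=lambda ch: envelopes[ch], reverse=True)
    # Report the winners in channel order rather than in order of energy.
    return sorted(ranked[:n])

def stimulation_frame(envelopes, n):
    """Keep the selected envelopes and set every other channel to zero."""
    selected = set(select_n_of_m(envelopes, n))
    return [e if ch in selected else 0.0 for ch, e in enumerate(envelopes)]

# One cycle of a 22-6 strategy with made-up envelope values.
envelopes = [float((ch * 7) % 22) for ch in range(22)]
active = select_n_of_m(envelopes, 6)
frame = stimulation_frame(envelopes, 6)
```

Only the six channels listed in `active` carry a non-zero amplitude in `frame`; the remaining sixteen electrodes are not stimulated in that cycle.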

#### **3. Wavelet transform**

42 Cochlear Implant Research Updates


A time signal can be represented by a series of coefficients, based on an analysis function. For example, a signal can be transformed from the time domain to the frequency domain. The oldest and best known method for this is the Fourier transform. In 1807, Joseph Fourier developed his method of representing signal content using basis functions. Building on this work, Alfred Haar laid the foundation of wavelet theory in 1909. In the 1930s, Paul Levy improved the Haar basis function using scale variation. In 1981, Jean Morlet and Alex Grossmann found a transformation method for decomposing a signal into wavelet coefficients and reconstructing the original signal from these coefficients. Stephane Mallat and Yves Meyer then derived multiresolution decomposition using wavelets. Later, Ingrid Daubechies developed a new wavelet analysis method to construct her own family of wavelets using the multiresolution theory. The set of orthonormal wavelet basis functions based on Daubechies' work is a milestone of wavelet applications today. With these developments, theoretical investigations of wavelet analysis began to accumulate (Merry, 2005).

Generally, the Fourier transform is an efficient transform for stationary and pseudo stationary signals. However, this technique is not suitable for nonstationary signals such as noisy and aperiodic signals. These signals can be analyzed using local transformation methods: the short time Fourier transform, time-frequency distributions and wavelets. All these techniques analyze the signal using the correlation between the original signal and an analysis function.

The wavelet transform can be classified as continuous wavelet transform, discrete wavelet transform and fast wavelet transform (Holschneider, 1989; Mallat, 1998; Meyer, 1992). The wavelet transform has been applied to a wide range of subjects, including signal and image processing and biomedical signal processing. The most important advantage of the wavelet transform is that it allows for the local analysis of the signal. Wavelet analysis also reveals features such as discontinuities and corruptions in a signal.

A wavelet function ψ(t) is a small wave. The wavelet must decay to zero as quickly as possible while still being oscillatory. Therefore, it includes different frequencies to

Fig. 4. An example wavelet, Coiflet wavelet

$$E = \int_{-\infty}^{\infty} |\psi(t)|^2 \, dt < \infty \tag{1}$$

$$C_{\psi} = \int_{0}^{\infty} \frac{\left| \hat{\psi}(\omega) \right|^{2}}{\omega} \, d\omega < \infty \tag{2}$$

$$
\psi_{a,b}(t) = \frac{1}{\sqrt{|a|}} \, \psi\left(\frac{t-b}{a}\right), \qquad a, b \in \mathbb{R}, \quad a \neq 0 \tag{3}
$$

$$X_{WT}(a,b) = \int_{-\infty}^{\infty} f(t) \, \psi_{a,b}^{*}(t) \, dt \tag{4}$$

$$f(t) = \frac{1}{C_{\psi}^{2}} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} X_{WT}(a, b) \, \frac{1}{a^{2}} \, \psi\left(\frac{t - b}{a}\right) db \, da \tag{5}$$
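Equations (1), (3) and (4) can be checked numerically with a small sketch. The Mexican hat function is used here only as a convenient real-valued example wavelet; it is an assumption of this illustration and not the wavelet used in this study. The integration limits and step count are likewise arbitrary choices.

```python
import math

def mexican_hat(t):
    """Example mother wavelet psi(t) = (1 - t^2) exp(-t^2 / 2)."""
    return (1.0 - t * t) * math.exp(-0.5 * t * t)

def psi_ab(t, a, b):
    """Scaled and shifted wavelet of Eq. (3)."""
    return mexican_hat((t - b) / a) / math.sqrt(abs(a))

def integrate(func, lo=-40.0, hi=40.0, steps=80000):
    """Midpoint-rule approximation of the integral of func over [lo, hi]."""
    dt = (hi - lo) / steps
    return sum(func(lo + (i + 0.5) * dt) for i in range(steps)) * dt

def energy(a, b=0.0):
    """Energy of psi_{a,b}, Eq. (1)."""
    return integrate(lambda t: psi_ab(t, a, b) ** 2)

def cwt(f, a, b):
    """Wavelet coefficient of Eq. (4); the conjugate is omitted because
    the example wavelet is real."""
    return integrate(lambda t: f(t) * psi_ab(t, a, b))

energy_a1 = energy(1.0)                     # finite, as Eq. (1) requires
energy_a3 = energy(3.0)                     # unchanged by the 1/sqrt(|a|) factor
mean_value = integrate(mexican_hat)         # a wavelet has zero mean
flat = cwt(lambda t: 1.0, 2.0, 1.0)         # a constant signal gives ~0
bump = cwt(lambda t: math.exp(-0.5 * t * t), 1.0, 0.0)  # a local feature does not
```

The factor $1/\sqrt{|a|}$ in Equation (3) keeps the energy the same at every scale, which is why `energy_a1` and `energy_a3` agree.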

$$
\psi_{m,n}(t) = a_0^{-m/2} \, \psi\left(a_0^{-m} t - n b_0\right), \qquad m, n \in \mathbb{Z} \tag{6}
$$

$$\phi_{j,k}(t) = 2^{-j/2} \, \phi\left(2^{-j} t - k\right), \qquad j, k \in \mathbb{Z} \tag{7}$$

$$\psi_{j,k}(t) = 2^{-j/2} \, \psi\left(2^{-j} t - k\right), \qquad j, k \in \mathbb{Z} \tag{8}$$

$$\psi_{j+1}^{2k}(t) = \sum_{n=-\infty}^{+\infty} h[n] \, \psi_{j}^{k}\left(t - 2^{j} n\right) \tag{9}$$

$$\psi_{j+1}^{2k+1}(t) = \sum_{n=-\infty}^{+\infty} g[n] \, \psi_{j}^{k}\left(t - 2^{j} n\right) \tag{10}$$

$$h[n] = \left\langle \psi_{j+1}^{2k}(t), \, \psi_{j}^{k}\left(t - 2^{j} n\right) \right\rangle \tag{11}$$

$$g[n] = \left\langle \psi_{j+1}^{2k+1}(t), \, \psi_{j}^{k}\left(t - 2^{j} n\right) \right\rangle \tag{12}$$


Fig. 6. General structure of the wavelet packet transform

Fig. 7. The phase plane of the wavelet packet transform

From the first rule of the multiresolution analysis, the recursive splitting determines a binary tree of the wavelet packet spaces, which are defined as

$$W_{j+1}^{2k} \oplus W_{j+1}^{2k+1} = W_j^k \tag{13}$$


Consequently, coefficients of decomposition and reconstruction are

$$d_{j+1}^{2k}[t] = d_{j}^{k} \ast \tilde{h}[2t] \tag{14}$$

$$d_{j+1}^{2k+1}[t] = d_{j}^{k} \ast \tilde{g}[2t] \tag{15}$$

$$d_{j}^{k}[t] = \tilde{d}_{j+1}^{2k} \ast h[2t] + \tilde{d}_{j+1}^{2k+1} \ast g[2t] \tag{16}$$

The coefficients are obtained by subsampling the convolution of $d_{j}^{k}$ with $\tilde{h}$ and $\tilde{g}$. By iterating these equations, the wavelet packet coefficients are computed along all the branches of the tree. This is shown in Figure 8 and Figure 9.

Fig. 8. Two level wavelet packet decomposition with down sampling

Fig. 9. Two level wavelet packet reconstruction with up sampling
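The two level decomposition and reconstruction of Figures 8 and 9 can be sketched with the shortest orthogonal filter pair. Haar filters are used here purely to keep the example small; they are an assumption of this sketch and not the wavelet used in this study, and the function names are hypothetical.

```python
import math

S = 1.0 / math.sqrt(2.0)  # Haar filter coefficient

def split(d):
    """One step of Eqs. (14)-(15): filter, then keep every second sample."""
    half = len(d) // 2
    low = [(d[2 * i] + d[2 * i + 1]) * S for i in range(half)]
    high = [(d[2 * i] - d[2 * i + 1]) * S for i in range(half)]
    return low, high

def merge(low, high):
    """One step of Eq. (16): upsample both children, filter and add."""
    d = []
    for l, h in zip(low, high):
        d.extend([(l + h) * S, (l - h) * S])
    return d

def wp_decompose(signal, levels):
    """Return tree[j], the list of the 2**j nodes at depth j.

    Both the low and the high branch are split at every level. Nodes are
    stored in filter-bank order, which is not strictly frequency order.
    The signal length must be divisible by 2**levels.
    """
    tree = [[list(signal)]]
    for _ in range(levels):
        children = []
        for node in tree[-1]:
            children.extend(split(node))
        tree.append(children)
    return tree

def wp_reconstruct(leaves):
    """Rebuild the signal from all nodes of one complete level."""
    nodes = [list(n) for n in leaves]
    while len(nodes) > 1:
        nodes = [merge(nodes[i], nodes[i + 1]) for i in range(0, len(nodes), 2)]
    return nodes[0]

signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
tree = wp_decompose(signal, 2)
restored = wp_reconstruct(tree[2])
```

With orthogonal filters the decomposition preserves the signal energy, and the reconstruction returns the original samples up to rounding error.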

The best tree function is a one- or two-dimensional wavelet packet analysis function that computes the optimal sub tree of an initial tree with respect to an entropy type criterion (Coifman&Wickerhauser, 1992). The resulting tree may be much smaller than the initial one. Following the organization of the wavelet packets library, it is natural to count the decompositions issued from a given orthogonal wavelet. A signal of length $N = 2^L$ can be expanded in α different ways, where α is the number of binary sub trees of a complete binary tree of depth L, so that $\alpha \geq 2^{N/2}$. This number may be very large, and since explicit enumeration is generally intractable, it is interesting to find an optimal decomposition with respect to a convenient criterion, computable by an efficient algorithm (Donoho, 1995; Guo et al., 2000; Johnstone, 1997).
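A bottom-up best tree search with an entropy type cost can be sketched as follows. The sketch makes several assumptions that are not taken from this chapter: Haar filters stand in for the actual wavelet, the cost is the unnormalised Shannon entropy $-\sum c^2 \log c^2$, and a node is split only when its children are strictly cheaper. The function names are hypothetical.

```python
import math

S = 1.0 / math.sqrt(2.0)  # Haar filter coefficient (assumed wavelet)

def split(d):
    """Split a node into its low-pass and high-pass children."""
    half = len(d) // 2
    low = [(d[2 * i] + d[2 * i + 1]) * S for i in range(half)]
    high = [(d[2 * i] - d[2 * i + 1]) * S for i in range(half)]
    return low, high

def entropy_cost(coeffs):
    """Additive entropy type cost; zero coefficients contribute nothing."""
    return -sum(c * c * math.log(c * c) for c in coeffs if c != 0.0)

def best_tree(node, max_depth):
    """Return (cost, leaves) of the cheapest sub tree rooted at node."""
    own = entropy_cost(node)
    if max_depth == 0 or len(node) < 2:
        return own, [node]
    low, high = split(node)
    cost_low, leaves_low = best_tree(low, max_depth - 1)
    cost_high, leaves_high = best_tree(high, max_depth - 1)
    if cost_low + cost_high < own:
        return cost_low + cost_high, leaves_low + leaves_high
    return own, [node]  # keeping the parent is at least as cheap

def count_trees(depth):
    """Number of sub trees of a complete binary tree of the given depth,
    from the recurrence a(0) = 1, a(j) = a(j-1)**2 + 1."""
    a = 1
    for _ in range(depth):
        a = a * a + 1
    return a

cost, leaves = best_tree([1.0] * 8, 3)
leaf_lengths = [len(leaf) for leaf in leaves]
```

For the constant test signal, only the low-pass branch is worth splitting, so the selected tree is much smaller than the full tree. The recurrence in `count_trees` shows why exhaustive enumeration is avoided: the count roughly squares with every additional level.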

The difference between the discrete wavelet transform and the wavelet packet transform lies in the decomposition of the detail space. The wavelet packet transform decomposes not only the approximation space but also the detail space. This means that it can divide the frequency axis into uniform bands. Figure 10 and Figure 11 show the 2-level analysis and synthesis parts of the discrete wavelet transform for comparison.

Fig. 10. Two level analysis part of the discrete wavelet transform

Fig. 11. Two level synthesis part of the discrete wavelet transform
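The difference in frequency partition can be made concrete with a small sketch that lists the nominal band edges of both transforms. It assumes ideal half-band filters, and the sampling rate of 16 kHz is a hypothetical value chosen only for the example.

```python
def dwt_bands(fs, levels):
    """Nominal bands (Hz) of the discrete wavelet transform.

    Only the approximation (low) band is split again at each level.
    """
    bands = []
    upper = fs / 2.0
    for _ in range(levels):
        bands.append((upper / 2.0, upper))  # detail band of this level
        upper /= 2.0
    bands.append((0.0, upper))              # final approximation band
    return sorted(bands)

def wpt_bands(fs, levels):
    """Nominal bands (Hz) of the wavelet packet transform.

    Approximation and detail bands are both split, giving 2**levels
    bands of equal width.
    """
    n_bands = 2 ** levels
    width = fs / 2.0 / n_bands
    return [(k * width, (k + 1) * width) for k in range(n_bands)]

fs = 16000.0  # assumed sampling rate in Hz
dwt2 = dwt_bands(fs, 2)
wpt2 = wpt_bands(fs, 2)
wpt8 = wpt_bands(fs, 8)
```

At level 2 the discrete wavelet transform leaves the upper half of the spectrum as one wide band, whereas the wavelet packet transform resolves it into two bands of the same width as the low-frequency ones.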

#### **5. Wavelet based speech strategy**

The proposed speech processing strategy is based on the wavelet packet transform. The strategy consists of five basic parts. Since electrode selection is based on frequency selection, it is improved by the use of the wavelet packet transform. The mother wavelet was experimentally selected as Daubechies 10, and the signal is decomposed to level 8. A Hanning window is used to prevent abrupt short-term changes of the signal during windowing. A block diagram of the proposed strategy is given in Figure 12. The channel outputs of the matching function are determined by finding the best tree in the matching block. The matching function describes the relationship between the electrodes and the output nodes of the wavelet packet transform. In this study, 22 electrodes are used to stimulate the cochlea. The frequency position function is used to calculate the channel frequency bands for the cochlea. The selection of electrodes is the same as in the N-of-M model: 6 of the 22 channel electrodes are selected. Only these 6 electrodes, with the highest amplitude, are analyzed using the wavelet packet transform.


Fig. 12. Block diagram of the new speech processing method
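The windowing stage of the strategy can be sketched as below. The frame length and hop size are assumptions of this illustration, since they are not stated here; the function names are hypothetical.

```python
import math

def hann(n_samples):
    """Symmetric Hanning window w[n] = 0.5 - 0.5 cos(2 pi n / (N - 1))."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]

def windowed_frames(signal, frame_len, hop):
    """Cut the signal into overlapping frames and taper each one."""
    window = hann(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        segment = signal[start:start + frame_len]
        frames.append([s * w for s, w in zip(segment, window)])
    return frames

# Hypothetical settings: 256-sample frames with 50 % overlap.
signal = [math.sin(2.0 * math.pi * 0.01 * n) for n in range(1024)]
frames = windowed_frames(signal, 256, 128)
```

A frame length of 256 samples is chosen in the example because it is divisible by $2^8$, which lets a level 8 dyadic decomposition run without padding; other choices are equally possible.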

In the new speech processing method, the best tree is determined in a block that is structured to eliminate noise and unwanted elements of the sound signal. Thus the output electrodes are determined with less error than in the N-of-M strategy. Moreover, better transmission is obtained by minimizing the interference between neighboring channels.

The channel outputs of the matching function are determined by finding the best tree in the matching block. The matching function is defined as

$$E_k = M \cdot C_{l,j} \tag{17}$$

Here, $E_k$ is the electrode output, $M$ is the matching function and $C_{l,j}$ are the wavelet coefficients. The matching function describes the relationship between the electrodes and the output nodes of the wavelet packet transform.
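One possible reading of Equation (17) is a matrix product in which each row of the matching function collects the wavelet packet nodes that fall into the band of one electrode. The sketch below follows that reading; the binary form of the matrix, the assignment by node centre frequency and all names are assumptions made for illustration and are not taken from this chapter.

```python
def matching_matrix(node_centres, band_edges):
    """Build M with M[k][i] = 1 when node i lies in the band of electrode k.

    node_centres: centre frequency (Hz) of each wavelet packet output node.
    band_edges:   ascending band limits (Hz); electrode k covers
                  [band_edges[k], band_edges[k + 1]).
    """
    n_electrodes = len(band_edges) - 1
    matrix = [[0] * len(node_centres) for _ in range(n_electrodes)]
    for i, centre in enumerate(node_centres):
        for k in range(n_electrodes):
            if band_edges[k] <= centre < band_edges[k + 1]:
                matrix[k][i] = 1
                break
    return matrix

def electrode_outputs(matrix, node_values):
    """Evaluate E = M . C as a plain matrix-vector product."""
    return [sum(m * c for m, c in zip(row, node_values)) for row in matrix]

# Toy example: four nodes mapped onto two electrodes.
M = matching_matrix([125.0, 375.0, 625.0, 875.0], [0.0, 500.0, 1000.0])
E = electrode_outputs(M, [1.0, 2.0, 3.0, 4.0])
```

In this toy case the first electrode sums the two lowest nodes and the second electrode sums the two highest ones.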

In this study, 22 electrodes are used for stimulation and the frequency position function

$$f = A \cdot 10^{ax} - k \tag{18}$$

is used to calculate the channel frequency bands for the cochlea (Greenwood, 1990). Here, f is the frequency in Hz and x is the length ratio along the cochlea, from 0 to 1. 'A' and 'a' are constants with values 156.4 and 2.1, respectively. The selection of electrodes is the same as in the N-M model. In this study, 6 electrodes are selected from the 22 channel electrodes, and only these 6 electrodes, which have the highest amplitude, are analyzed using the wavelet packet transform.
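Channel band edges for 22 electrodes can be derived from Equation (18) as sketched below. The formula and the constants A and a are used as printed above. The value of k is not given here, so it is exposed as a parameter with a placeholder default of 0; the equal spacing of the electrodes in x and the function names are also assumptions of this sketch.

```python
def frequency_position(x, A=156.4, a=2.1, k=0.0):
    """Eq. (18) as printed: f = A * 10**(a * x) - k, with x in [0, 1].

    k defaults to 0.0 only as a placeholder; its value is an assumption.
    """
    return A * 10.0 ** (a * x) - k

def channel_edges(n_channels, x_low=0.0, x_high=1.0, **constants):
    """Band edges (Hz) for channels equally spaced in x."""
    step = (x_high - x_low) / n_channels
    return [frequency_position(x_low + i * step, **constants)
            for i in range(n_channels + 1)]

edges = channel_edges(22)
bands = list(zip(edges[:-1], edges[1:]))  # (lower, upper) limits per electrode
```

Because the position enters through an exponential, the bands become wider towards the high-frequency end even though the electrodes are equally spaced in x.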

The proposed strategy is essentially based on the N-of-M strategy. The input waveform is given in Figure 13, and the output waveforms for the proposed strategy and the N-of-M strategy are given in Figure 14 and Figure 15, respectively. The input and output waveforms are very similar.

As the graphs show, the traditional N-of-M strategy removes some high-frequency components between 25 ms and 75 ms in the wide-band spectrogram. High-frequency components are very important for intelligibility and for the recognition of consonants such as 's', 'ş' and 'f'. The new method keeps these components because the wavelet packet transform analyses high-frequency components as well as low-frequency components. The choice of mother wavelet also has an effect: Daubechies 10 was found experimentally to be more effective in high-frequency analysis.


Fig. 13. Wide-band and narrow-band spectrum of the input signal

Fig. 14. Wide-band and narrow-band spectrum of the output signal for the proposed speech processing method

The 'Determine optimum tree' block eliminates noise and unnecessary components in the speech signal. It therefore gives better results than the N-of-M strategy for electrode selection. The output channels of the new strategy are more accurate than those of the N-of-M strategy, which helps to reduce interaction between neighbouring channels.
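The idea of choosing an optimum tree can be illustrated with the entropy-based best-basis search of Coifman and Wickerhauser (1992). The sketch below is not the chapter's block: it substitutes the two-tap Haar filter for Daubechies 10 to stay short, assumes the additive Shannon entropy cost, and uses hypothetical function names. A node is split only when its two children together have a lower cost than the node itself.

```python
import math


def haar_split(signal):
    """One Haar analysis step: (approximation, detail), each half the length."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def entropy_cost(coeffs):
    """Additive Shannon entropy cost: -sum(c^2 * log(c^2)), zeros skipped."""
    total = 0.0
    for c in coeffs:
        e = c * c
        if e > 0.0:
            total -= e * math.log(e)
    return total


def best_tree(signal, max_level, path=""):
    """Return (cost, leaves); leaves is a list of (path, coefficients).

    'a' and 'd' in a path mark approximation and detail branches.
    """
    own_cost = entropy_cost(signal)
    if max_level == 0 or len(signal) < 2 or len(signal) % 2:
        return own_cost, [(path, signal)]
    approx, detail = haar_split(signal)
    cost_a, leaves_a = best_tree(approx, max_level - 1, path + "a")
    cost_d, leaves_d = best_tree(detail, max_level - 1, path + "d")
    # Split only if the children represent the signal at lower cost.
    if cost_a + cost_d < own_cost:
        return cost_a + cost_d, leaves_a + leaves_d
    return own_cost, [(path, signal)]


low = [1.0] * 8          # slowly varying test signal
high = [1.0, -1.0] * 4   # rapidly alternating test signal
low_paths = [p for p, _ in best_tree(low, 3)[1]]
high_paths = [p for p, _ in best_tree(high, 3)[1]]
```

For the two test signals the search refines different branches of the tree, which is the behaviour that lets a wavelet packet analysis resolve high-frequency content as finely as low-frequency content.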

Fig. 15. Wide-band and narrow-band spectrum of the output signal for the N-M strategy

#### **6. Experimental study**

The experimental study was carried out on 20 healthy subjects between the ages of 23 and 30. All subjects are native speakers of Turkish and have air conduction thresholds better than 20 dB bilaterally at octave frequencies from 250 to 6000 Hz. Because the subjects listened to words processed by both algorithms and could have memorized the words and their order, separate word lists were arranged for the new strategy and the N-of-M strategy. The lists were prepared in consultation with a Turkish language specialist and with reference to the Turkish Language Dictionary of the Turkish Language Association. The words in the two lists were balanced in terms of speech knowledge and degree of difficulty. The usage frequencies of vowels and consonants in the lists were determined according to Turkish grammar. The lists were recorded in the silent rooms of the Department of Otology at Ege University, and the doctors of the Department approved the reader's sound levels.

The test subjects listened to the lists from a microphone that was directly connected to their heads. While listening, they were asked to write the words they heard in a table, in listening order, and to leave a blank for any word they could not comprehend. The intelligibility percentage is calculated as follows.


$$\text{Intelligibility} = \frac{\text{Number of correct words}}{\text{Number of all the words on the list}} \times 100\tag{19}$$
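A small sketch of how Eq. (19) could be applied to one subject's answer sheet is given below. The scoring rule (an exact match after trimming whitespace, with blanks counted as incorrect), the function name and the example words are assumptions for illustration; the chapter does not describe how written answers were marked.

```python
def intelligibility(written, reference):
    """Eq. (19): percentage of list words that were written down correctly.

    `written` holds the subject's answers in listening order, with None
    or an empty string where the word was not understood.
    """
    if not reference or len(written) != len(reference):
        raise ValueError("answer sheet and word list must have equal, non-zero length")
    correct = sum(
        1 for w, r in zip(written, reference)
        if w is not None and w.strip() == r.strip()
    )
    return 100.0 * correct / len(reference)


# Hypothetical four-word list and answer sheet.
word_list = ["kalem", "masa", "deniz", "kitap"]
answers = ["kalem", "", "deniz", "kitap"]
score = intelligibility(answers, word_list)
```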

The results of the intelligibility test are given in Table 1 for both the N-of-M strategy and the proposed speech processing strategy. The intelligibility percentage of the proposed wavelet packet transform based strategy is higher than that of the N-of-M strategy. The average intelligibility percentages by gender are also given in Figure 16.

Fig. 16. Graphical representation of the intelligibility test results


| Strategy | Gender | Age interval | Number of test subjects | Intelligibility (%) |
|---|---|---|---|---|
| New/Proposed | Male | 23-29 | 8 | 81.93 |
| New/Proposed | Female | 23-33 | 12 | 79.17 |
| New/Proposed | Total | 23-33 | 20 | 80.55 |
| N-M | Male | 23-29 | 8 | 77.60 |
| N-M | Female | 23-33 | 12 | 75.21 |
| N-M | Total | 23-33 | 20 | 76.40 |

Table 1. Intelligibility test results for both the N-M strategy and the proposed strategy

To test the noise resistance of the proposed strategy, an SNR improvement test was applied to the system. The test was performed by adding 5 dB of pink noise, F-16 noise, Volvo noise and factory noise, in turn, to the recorded data. All of the added noise is real recorded noise, as indicated by the names. Both the proposed method and the N-of-M method were then applied separately to these data, and the results were evaluated using the SNR values given in Figure 17. The proposed method gives better results for each type of added noise.

Fig. 17. SNR for different types of noise
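The chapter does not give the SNR definition used for Figure 17 or say exactly how the 5 dB noise level was set. The sketch below assumes the common definition, the ratio of clean-signal power to the power of the difference between a signal and the clean reference, and reads "5 dB" as the target SNR of the mixture. The function names and the synthetic test signals are hypothetical.

```python
import math
import random


def power(x):
    """Mean squared value of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def snr_db(clean, observed):
    """SNR in dB, treating (observed - clean) as the noise."""
    noise = [o - c for o, c in zip(observed, clean)]
    noise_power = power(noise)
    if noise_power == 0.0:
        return math.inf
    return 10.0 * math.log10(power(clean) / noise_power)


def mix_at_snr(clean, noise, target_db):
    """Scale the noise so that clean + noise has the requested SNR."""
    gain = math.sqrt(power(clean) / (power(noise) * 10.0 ** (target_db / 10.0)))
    return [c + gain * n for c, n in zip(clean, noise)]


# Synthetic stand-ins for a speech recording and a noise recording.
rng = random.Random(0)
clean = [math.sin(2.0 * math.pi * 5.0 * t / 200.0) for t in range(200)]
noise = [rng.uniform(-1.0, 1.0) for _ in range(200)]
noisy = mix_at_snr(clean, noise, 5.0)
input_snr = snr_db(clean, noisy)
```

Under these assumptions, an SNR improvement would be the difference between `snr_db` evaluated on a strategy's output and `input_snr`.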

#### **7. Conclusion**

In this study, a new speech processing strategy based on the wavelet packet transform is proposed for cochlear implant applications. The system works by first obtaining the highest-energy coefficients and then stimulating the linked electrodes, thereby improving the deaf patient's hearing ability.

The core of the system is the wavelet packet transform, and it also uses the energy of the wavelet coefficients. The intelligibility and noise resistance of the suggested speech processing method were investigated through various tests. A new electrode selection algorithm, which depends on the wavelet entropy distribution, was then presented. The proposed electrode selection increases noise performance and intelligibility. Additionally, the proposed method performs better than the traditional and recently published methods.

In this study, the system was tested in terms of intelligibility, and SNR results were compared with those of the N-of-M strategy. The proposed method was observed to increase performance in terms of intelligibility.

Within the wavelet packet transform, determining the optimum tree using the best tree functions is a significant part of this study, because this step eliminates noise and unnecessary components from the speech signal. It also helps to improve the intelligibility of speech in noisy environments.

The live experiments were conducted with normal-hearing people, and all results are based on normal-hearing people only. A further study on patients who use cochlear implants would give more accurate results for intelligibility.

For future work, a hybrid mother wavelet in the wavelet decomposition process is recommended. In this hybrid model, the Daubechies family can be used for low-pass filter decomposition and the Symlet family for high-pass filter decomposition. Moreover, the mother wavelet can be selected in real time according to the characteristics of the speech signal. This might give better results for speech intelligibility. Additionally, the bionic wavelet can be suggested in place of the wavelet packet transform for the entire speech processing (Yuan, 2003).

#### **8. Acknowledgment**

We would like to specially thank the Department of Otology in Ege University and Yahya Ozturk for their assistance in this work.

#### **9. References**

Cheikhrouhou I., Atitallah R. B., Ouni K., Hamida A. B., Mamoudi N., Ellouze N. (2004). Speech analysis using wavelet transforms dedicated to cochlear prosthesis stimulation strategy, *Int. Sym. On Control, Communication and Signal Processing*, pp. 639-642

Christiansen J. B., Leigh I. W. (2002). *Cochlear implants in children: Ethics and choices*, Washington, Gallaudet University Press

Coifman R. R., Meyer Y., Wickerhauser M. V. (1992). Wavelet Analysis and Signal Processing

Coifman R. R., Wickerhauser M. V. (1990). *Best adapted wave packet bases, Numerical Algorithm*, Research Group, Dept. of Mathematics, Yale University, New Haven, Connecticut

Coifman R. R., Wickerhauser M. V. (1992). Entropy based algorithms for best basis selection, *IEEE Trans. on Information Theory*, vol. 38, part 2

Daubechies I. (1992). *Ten Lectures on Wavelets*, Society for Industrial and Applied Mathematics, ISBN 0-89871-274-2

Debnath L. (2002). *Wavelet Transforms and Their Applications*, Birkhäuser

Deller J. R., Proakis J. G., Hansen J. H. L. (1994). *Discrete-time processing of speech signals*, New York, Macmillan Publishing Company

Derbel A., Ghorbel M., Samet M., Ben Hamida A. (2009). Implementation of strategy based on auditory model based wavelet transform speech processing on DSP dedicated to cochlear prosthesis, *Int. Sym. On Comp. Intelligence and Intelligent Informatics*, pp. 143-148

Dorman M., Hannley M., Dankowski K., Smith L., McCandless G. (1989). Word recognition by 50 patients fitted the Symbion multichannel cochlear implant, *Ear and Hearing*, vol. 10, no. 1

Dorman M., Loizou P., Rainey D. (1997). Speech intelligibility as a function of the number of channels of stimulation for signal processors using sine-wave and noise-band outputs, *Journal of the Acoustical Society of America*, vol. 102, pp. 2403-2411

Donoho L. (1995). Denoising by soft thresholding, *IEEE Trans. on Information Theory*, vol. 41, no. 3, pp. 613-627

Eddington D. (1980). Speech discrimination in deaf subjects with cochlear implants, *J. Acoust. Soc. Am.*, vol. 68, no. 3, pp. 885-891

Greenwood D. D. (1990). A cochlear frequency-position function for several species-29 years later, *J. Acoust. Soc. Am.*, vol. 87, no. 6, pp. 2593-2605

Gopalakrishna V., Kehtarnavaz N., Loizou P. C. (2010). A recursive wavelet-based strategy for real time cochlear implant speech processing on PDA platforms, *IEEE Trans. On Biomedical Eng.*, vol. 57, no. 8, pp. 2053-2063

Guo D., Zhu W. H., Gao Z. M., Zhang J. Q. (2000). A study of wavelet thresholding denoising, *IEEE International conference on signal processing*, vol. 1, pp. 329-332

Herley C., Vetterli M. (1994). Orthonormal time varying filterbanks and wavelet packets, *IEEE Trans. on Signal Processing*, vol. 41, no. 10

Holschneider M. (1989). Wavelet, time frequency methods and phase space, *First International conference on wavelet*, Springer-Verlag

House W., Urban J. (1973). Long term results of electrode implantation and electronic stimulation of the cochlea in man, *Annals of Otology, Rhinology and Laryngology*, vol. 82, pp. 504-517

House W., Berliner K. (1982). Cochlear implants: Progress and perspectives, *Annals of Otology, Rhinology and Laryngology*, pp. 1-124

Johnstone I. M. (1997). Wavelet threshold estimators for data with correlated noise, *J. Royal Statistical Society*, vol. 59, no. 2, pp. 319-351

Loeb G. (1990). Cochlear prosthetics, *Annual Review in Neuroscience*, vol. 13, pp. 357-371

Loizou P. (1998). Mimicking the human ear, *IEEE Signal Processing Magazine*, vol. 15, no. 5, pp. 101-130

Mallat S. (1989). A theory for multiresolution signal decomposition, *IEEE Trans. on Pattern Anal. Machine Intell.*, vol. 11, pp. 674-693

Mallat S. (1998). *A wavelet tour of signal processing*, New York, Academic Press

Merry R. J. E. (2005). Wavelet Theory and Applications: A literature study, Eindhoven Un. of Tech., Dept. of Mechanical Engineering, Control Systems Technology Group, Eindhoven, June 7, 02.05.2008, Available from http://www.narcis.nl/publication/RecordID/oai:library.tue.nl:612762

Meyer Y. (1992). *Wavelets and Operators*, Cambridge University

**4**

**Strategies to Improve Music Perception in Cochlear Implantees**

Joshua Kuang-Chao Chen1,2, Catherine McMahon3 and Lieber Po-Hung Li1,2,4,\*

*1Department of Otolaryngology, Cheng Hsin General Hospital*

*2Faculty of Medicine, School of Medicine, National Yang-Ming University*

*3Center for Language Sciences, Macquarie University*

*4Integrated Brain Research Laboratory, Taipei Veterans General Hospital*

*1,2,4Taiwan 3Australia*

\* Corresponding Author

#### **1. Introduction**

Cochlear implants have been an effective device for the management of patients with total or profound hearing loss over the past few decades. Significant improvements in speech and language can be observed in implantees following rehabilitation. In spite of remarkable linguistic perception, however, it is difficult for these patients to enjoy music, although we have seen some "superstars" in music performance among our patients. This article aims to clarify current opinions on strategies to improve music perception ability in this population. In Part I, we include one of our previous works (Chen et al., 2010), on the effect of music training on pitch perception in prelingually deafened children with a cochlear implant. In Part II, other factors related to the improvement of music perception in cochlear implantees are discussed, including residual hearing, bimodal hearing, and coding strategies. Evidence from our own research and from a review of the literature will both be presented.

#### **2. Part I: Music training improves pitch perception in prelingually deafened children with cochlear implants**

**2.1 Introduction**

Cochlear implants have been an effective device for the management of deaf children over the past few decades. Significant improvements in speech and language can be observed in implanted children following rehabilitation. In spite of remarkable linguistic perception, however, it is difficult for these children to enjoy music (Galvin et al., 2007; McDermott, 2004). Essential attributes of music include rhythm, timbre, and pitch. Previous studies have shown that perception of rhythm is easier than timbre and pitch for cochlear implant users (Gfeller & Lansing, 1991). Recognition of timbre depends, at least partly, on the