**4. Results and discussions**

For the evaluation of the proposed technique, we have applied it to twenty Arabic speech signals pronounced by male and female speakers. These signals are artificially corrupted by two types of additive noise (white Gaussian noise and car noise) at different values of *SNRi* before denoising. The Arabic speech signals used (**Table 1**) are phonetically balanced material sampled at 16 *kHz*.
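The corruption step described above can be sketched as follows. `add_noise` is a hypothetical helper (not from the chapter) that scales a noise record so that the mixture reaches a requested input SNR (*SNRi*):

```python
import numpy as np

def add_noise(clean, noise, snr_db):
    """Mix `noise` into `clean` so the result has the requested SNR in dB."""
    noise = noise[:len(clean)]
    p_clean = np.sum(clean ** 2)
    p_noise = np.sum(noise ** 2)
    # Scale factor so that p_clean / (alpha^2 * p_noise) = 10^(snr_db / 10)
    alpha = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + alpha * noise
```

The same helper works for both noise types, since only the noise record changes.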

*Speech Enhancement Based on LWT and Artificial Neural Network and Using MMSE Estimate… DOI: http://dx.doi.org/10.5772/intechopen.96365*

Also, for the evaluation of the proposed speech enhancement approach, we have applied the denoising technique based on the *MMSE* Estimate of Spectral Amplitude [40]. This evaluation is performed in terms of the Signal to Noise Ratio (*SNR*), the Segmental *SNR* (*SSNR*) and the Perceptual Evaluation of Speech Quality (*PESQ*). **Tables 2**–**7** list the results obtained from the computations of *SNRf* (after denoising), *SSNR* and *PESQ*, for both the proposed technique and the denoising technique based on the MMSE Estimate of Spectral Amplitude [40].
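As a rough sketch of how the first two metrics can be computed (the frame length and clamping range are common choices, not values taken from the chapter; PESQ follows the ITU-T P.862 algorithm and is not reproduced here):

```python
import numpy as np

def snr(clean, enhanced):
    """Global SNR in dB between a clean reference and an enhanced signal."""
    noise = clean - enhanced
    return 10 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2))

def ssnr(clean, enhanced, frame_len=256, floor=(-10.0, 35.0)):
    """Segmental SNR: mean of per-frame SNRs, clamped to a typical range."""
    vals = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        c = clean[start:start + frame_len]
        n = c - enhanced[start:start + frame_len]
        e_noise = np.sum(n ** 2)
        if e_noise == 0:
            continue  # skip frames with no residual noise
        vals.append(10 * np.log10(np.sum(c ** 2) / e_noise))
    return float(np.mean(np.clip(vals, *floor)))
```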


#### **Table 1.**

*The list of the employed Arabic speech sentences.*

*tansig*(*n*) = 1/(1 + *exp*(−*n*)) (1)

Generally, neural networks consist of at least two layers (one hidden layer and one output layer). The input information is connected to the hidden layers through weighted connections, from which the output data is calculated. The number of hidden layers and the number of neurons in each layer control the performance of the network. According to [41], there are no guidelines for selecting the number of neurons and the number of hidden layers that give the best performance for a given problem, and it remains a trial-and-error design method [41].

For training each *ANN* used in this work, we have employed 50 speech signals, and 10 others were used for testing those networks. Therefore, for training each ANN we used 50 input–target pairs (*P*, *T*). Evidently, the noisy speech signals used for testing the *ANNs* do not belong to the training database. The parameters used for training the ANNs are: a number of epochs equal to 5000, a momentum (μ or Mu) equal to 0.1, and a minimum gradient equal to 1e−7. The employed training algorithm is Levenberg-Marquardt.
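MATLAB's Levenberg-Marquardt trainer (`trainlm`) is not reproduced here; the sketch below only illustrates the two stopping criteria named above (epoch limit and minimum gradient) in a generic momentum-based loop. The learning rate and helper names are assumptions, not the chapter's implementation:

```python
import numpy as np

def train(loss_grad, w, epochs=5000, mu=0.1, grad_min=1e-7, lr=0.01):
    """Generic training loop: stop at `epochs` iterations or when the
    gradient norm falls below `grad_min` (the 1e-7 threshold)."""
    v = np.zeros_like(w)
    for _ in range(epochs):
        g = loss_grad(w)
        if np.linalg.norm(g) < grad_min:
            break                 # minimum-gradient stopping criterion
        v = mu * v - lr * g       # momentum update (mu = 0.1)
        w = w + v
    return w
```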

In summary, the novelty of the proposed technique consists in applying the denoising technique based on the MMSE Estimate of Spectral Amplitude [40]. We also apply the ANN for computing the ideal thresholds used for thresholding the noisy detail coefficients obtained from the application of the *LWT* to the noisy speech signal.


#### **Figure 1.**

*The architecture of the ANN used in this work.*

*Deep Learning Applications*


*purelin*(*n*) = *n* (2)
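Following the formulas as printed in Eqs. (1) and (2), the two activations and a hidden-layer forward pass can be sketched as below (note that MATLAB's `tansig` is conventionally defined as 2/(1 + *exp*(−2*n*)) − 1; the first function follows Eq. (1) as written):

```python
import numpy as np

def tansig(n):
    # Activation of Eq. (1) as printed: 1 / (1 + exp(-n))
    return 1.0 / (1.0 + np.exp(-n))

def purelin(n):
    # Linear activation of Eq. (2): identity
    return n

def forward(P, W1, b1, W2, b2):
    # One hidden layer (tansig) followed by a purelin output layer
    return purelin(W2 @ tansig(W1 @ P + b1) + b2)
```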



#### **Table 2.**

*Results in terms of SNR (signal 4 (female voice) corrupted by Gaussian white noise).*


#### **Table 3.**

*Results in terms of SSNR (signal 4 (female voice) corrupted by Gaussian white noise).*


According to the results listed in **Tables 2**–**7**, the best results are highlighted. In terms of *SNRf* (after denoising) and *SSNR*, the best results are those obtained from the application of the proposed speech enhancement technique. However, in terms of PESQ, the denoising technique based on the *MMSE* Estimate of Spectral Amplitude [40] is slightly better than the proposed technique.


**Figures 2**–**5** illustrate some examples of speech enhancement using the proposed technique.

These figures show the efficiency of the proposed speech enhancement technique. In fact, it considerably reduces the noise while preserving the original signal, especially when *SNRi* is higher (5, 10 and 15 dB).

In our future work, in order to improve the proposed speech enhancement technique, we will use a Deep Neural Network (DNN) instead of a simple ANN, as well as other transforms such as the Empirical Mode Decomposition (EMD).






#### **Table 4.**

| *SNRi* (dB) | The proposed speech enhancement technique | The denoising technique based on MMSE Estimate of Spectral Amplitude [40] |
|---|---|---|
| −5 | 1.3225 | 1.3755 |
| 0 | 1.5935 | 1.6320 |
| 5 | 1.8812 | 1.8977 |
| 10 | 2.2016 | 2.2311 |
| 15 | 2.5147 | 2.6079 |

*Results in terms of PESQ (signal 4 (female voice) corrupted by Gaussian white noise).*


#### **Table 5.**

| *SNRi* (dB) | The proposed speech enhancement technique | The denoising technique based on MMSE Estimate of Spectral Amplitude [40] |
|---|---|---|
| −5 | **5.8737** | 4.2192 |
| 0 | **9.8414** | 8.3451 |
| 5 | **14.1647** | 12.6024 |
| 10 | **18.5308** | 17.4120 |
| 15 | **22.5102** | 21.4578 |

*Results in terms of SNR (signal 2 (male voice) corrupted by car noise).*


#### **Table 6.**

| *SNRi* (dB) | The proposed speech enhancement technique | The denoising technique based on MMSE Estimate of Spectral Amplitude [40] |
|---|---|---|
| −5 | **0.2145** | −1.1347 |
| 0 | **2.7478** | 1.7861 |
| 5 | **5.6644** | 4.7166 |
| 10 | **8.8942** | 7.8228 |
| 15 | **11.9663** | 10.9850 |

*Results in terms of SSNR (signal 8 (male voice) corrupted by car noise).*




#### **Table 7.**

| *SNRi* (dB) | The proposed speech enhancement technique | The denoising technique based on MMSE Estimate of Spectral Amplitude [40] |
|---|---|---|
| −5 | 2.2837 | 2.4021 |
| 0 | 2.5999 | 2.7163 |
| 5 | 2.8709 | 3.0184 |
| 10 | 3.1190 | 3.2461 |
| 15 | 3.3590 | 3.4789 |

*Results in terms of PESQ (signal 8 (male voice) corrupted by car noise).*


#### **Figure 2.**

*An example of speech enhancement applying the proposed technique: Signal 4 (pronounced by a female voice (Table 1)) corrupted by Gaussian white noise with SNRi = 10 dB (before enhancement). After enhancement we have: SNRf = 19.8933, SSNR = 6.8038 and PESQ = 2.2016.*

#### **Figure 3.**

*An example of speech enhancement applying the proposed technique: Signal 1 (pronounced by a male voice (Table 1)) corrupted by Gaussian white noise with SNRi = 5 dB (before enhancement). After enhancement we have: SNRf = 13.7710, SSNR = 0.7135 and PESQ = 2.2350.*

**5. Conclusion**


In this chapter, we have detailed a new speech enhancement technique based on the Lifting Wavelet Transform (*LWT*) and an Artificial Neural Network (*ANN*). This technique also uses the *MMSE* Estimate of Spectral Amplitude. In a first step, it applies the *LWT* to the noisy speech signal in order to obtain two noisy detail coefficients, *cD1* and *cD2*, and one approximation coefficient, *cA2*. After that, *cD1* and *cD2* are denoised by soft thresholding, which requires suitable thresholds, *thrj*, 1 ≤ *j* ≤ 2. Those thresholds are determined by using an Artificial Neural Network (*ANN*). The soft thresholding of *cD1* and *cD2* yields two denoised coefficients, *cDd1* and *cDd2*. Then the denoising technique based on the *MMSE* Estimate of Spectral Amplitude is applied to the noisy approximation *cA2* in order to obtain a denoised coefficient, *cAd2*. Finally, the enhanced speech signal is obtained from the application of the inverse *LWT*, *LWT−1*, to *cDd1*, *cDd2* and *cAd2*. The performance of the proposed speech enhancement technique is justified by the computation of the Signal to Noise Ratio (*SNR*), Segmental *SNR* (*SSNR*) and Perceptual Evaluation of Speech Quality (*PESQ*).
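A minimal sketch of this pipeline, assuming the Haar lifting scheme as the *LWT* (the chapter does not specify the lifting filters); the ANN-predicted thresholds *thr1*, *thr2* and the MMSE spectral-amplitude denoiser are supplied externally, the latter stubbed as an identity here:

```python
import numpy as np

def lwt_step(x):
    """One level of the Haar lifting scheme: split, predict, update."""
    s, d = x[0::2].astype(float), x[1::2].astype(float)
    d -= s            # predict: detail = odd - even
    s += d / 2        # update: approximation = even + detail / 2
    return s, d

def ilwt_step(s, d):
    """Inverse of one Haar lifting level."""
    s = s - d / 2     # undo update
    d = d + s         # undo predict
    x = np.empty(2 * len(s))
    x[0::2], x[1::2] = s, d
    return x

def soft(c, thr):
    """Soft thresholding of a coefficient vector."""
    return np.sign(c) * np.maximum(np.abs(c) - thr, 0.0)

def enhance(noisy, thr1, thr2, mmse_denoise=lambda c: c):
    cA1, cD1 = lwt_step(noisy)   # level 1: cD1
    cA2, cD2 = lwt_step(cA1)     # level 2: cD2 and cA2
    cDd1, cDd2 = soft(cD1, thr1), soft(cD2, thr2)
    cAd2 = mmse_denoise(cA2)     # MMSE spectral-amplitude step, stubbed
    return ilwt_step(ilwt_step(cAd2, cDd2), cDd1)
```

With zero thresholds and the identity stub, the forward/inverse lifting steps reconstruct the input exactly, which is a useful sanity check before plugging in the thresholding and MMSE stages.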


#### **Figure 4.**

*An example of speech enhancement applying the proposed technique: Signal 7 (pronounced by a male voice (Table 1)) corrupted by car noise with SNRi = 5 dB (before enhancement). After enhancement we have: SNRf = 15.1244, SSNR = 8.7594 and PESQ = 3.3304.*


#### **Figure 5.**


*An example of speech enhancement applying the proposed technique: Signal 5 (pronounced by a male voice (Table 1)) corrupted by car noise with SNRi = 10 dB (before enhancement). After enhancement we have: SNRf = 18.8848, SSNR = 6.4497 and PESQ = 3.5469.*
