**1. Introduction**

A well-known technique in speech enhancement applications is spectral subtraction (SS), which requires only a one-channel signal [7]. It has been embedded in some high-quality mobile phones for this application. However, the SS approach is only appropriate for stationary noise environments [2]. Furthermore, it introduces the musical noise problem: in fact, the more the noise is reduced, the greater the alteration brought to the speech signal, and accordingly the poorer the intelligibility of the enhanced speech [8, 9]. As a result, ideal enhancement can hardly be attained when the Signal to Noise Ratio (*SNR*) of the noisy speech is relatively low, below **5** *dB*; however, quite good results are obtained when the noisy speech *SNR* is relatively high, above **15** *dB* [2]. Decision-directed (*DD*) methods have improved on SS and other speech enhancement methods based on the SS principle by reducing the musical noise components [10–13]. Numerous algorithms that improve the *DD* methods were suggested in [14]. In [15], a speech enhancement technique based on high-order cumulant parameter estimation was proposed. In [16, 17], a subspace technique based on Singular Value Decomposition (*SVD*) approaches was proposed: the signal is enhanced when the noise subspace is eliminated, and accordingly, the clean speech signal is estimated from the subspace of the noisy speech [2]. Another technique widely studied in speech enhancement applications is the adaptive noise cancellation (ANC) approach, first suggested in [18, 19]. Moreover, the most important speech enhancement methods employ adaptive approaches in order to track non-stationary noise properties [20, 21].
Numerous adaptive techniques have been proposed for speech enhancement: time domain algorithms, frequency domain adaptive algorithms [22–26] and adaptive spatial filtering methods [27, 28], which frequently employ adaptive SVD methods in order to separate the speech signal space from the noisy one. Another direction of research combines Blind Source Separation (BSS) methods with adaptive filtering algorithms in order to enhance the speech signal and effectively cancel the acoustic echo components [29–32]. This approach employs a configuration of at least two microphones in order to update the adaptive filtering algorithms. Also, a multi-microphone speech enhancement technique was proposed for the same aim and improved on the existing one-channel and two-channel speech enhancement and noise reduction adaptive algorithms [33, 34]. Numerous research works have cast the problem of speech enhancement as a simple problem of mixing and unmixing signals with convolutive and instantaneous noisy observations [35–37]. In the last decade, a novel research direction has proven the efficacy of the wavelet domain as an effective means of improving speech enhancement approaches, and numerous algorithms and methods have been proposed for the same aim [38, 39]. In this chapter, we propose a novel speech enhancement technique based on the Lifting Wavelet Transform (*LWT*) and an Artificial Neural Network (*ANN*) that also uses the *MMSE* Estimate of Spectral Amplitude [40]. The presentation of this chapter is as follows: after the introduction, we deal with the Lifting Wavelet Transform (*LWT*) in section 2. Section 3 describes the proposed speech enhancement technique. Section 4 presents the obtained results, and finally we conclude our work in section 5.

**2. Lifting wavelet transform (*LWT*)**

The Lifting Wavelet Transform (*LWT*) has become a powerful tool for signal analysis thanks to its efficient and faster implementation compared with the Discrete Wavelet Transform (*DWT*). In the domains of signal denoising, signal compression and watermarking, the *LWT* yields better results than the *DWT*. The *LWT* saves computation time and has better frequency localization.
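To make the lifting scheme concrete, the sketch below implements one Haar-like lifting step in Python (split, predict, update) together with its exact inverse. This is only an illustration of the principle, not the implementation used in this chapter, which works in MATLAB.

```python
import numpy as np

def lwt_haar(x):
    """One level of a Haar-like lifting transform: split, predict, update.

    Returns (cA, cD). Assumes len(x) is even.
    """
    x = np.asarray(x, dtype=float)
    even, odd = x[0::2], x[1::2]   # split into even- and odd-indexed samples
    cD = odd - even                # predict: each odd sample from its even neighbour
    cA = even + cD / 2             # update: keep the local average in cA
    return cA, cD

def ilwt_haar(cA, cD):
    """Inverse lifting: undo the update and predict steps, then merge."""
    even = cA - cD / 2
    odd = cD + even
    x = np.empty(even.size + odd.size)
    x[0::2], x[1::2] = even, odd
    return x
```

Running `lwt_haar` on `[1, 2, 3, 4, 5, 6]` gives the pair averages `[1.5, 3.5, 5.5]` as `cA` and the pair differences `[1, 1, 1]` as `cD`, and `ilwt_haar` recovers the input exactly, which is the perfect-reconstruction property of the lifting scheme.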

**3. The proposed speech enhancement technique**

In this chapter, we propose a new speech enhancement technique based on the Lifting Wavelet Transform (*LWT*) and an Artificial Neural Network (*ANN*); it also uses the *MMSE* Estimate of Spectral Amplitude [40]. In the first step, the *LWT* is applied to the noisy speech signal in order to obtain two noisy details coefficients, *cD*<sub>1</sub> and *cD*<sub>2</sub>, and one approximation coefficient, *cA*<sub>2</sub>. After that, *cD*<sub>1</sub> and *cD*<sub>2</sub> are denoised by soft thresholding, which requires suitable thresholds, *thr<sub>j</sub>*, 1 ≤ *j* ≤ 2. Those thresholds are determined by using an Artificial Neural Network (*ANN*). The soft thresholding of *cD*<sub>1</sub> and *cD*<sub>2</sub> yields two denoised coefficients, *cDd*<sub>1</sub> and *cDd*<sub>2</sub>. Then the denoising technique based on the MMSE Estimate of Spectral Amplitude [40] is applied to the noisy approximation *cA*<sub>2</sub> in order to obtain a denoised coefficient, *cAd*<sub>2</sub>. Finally, the enhanced speech signal is obtained by applying the inverse *LWT*, *LWT*<sup>−1</sup>, to *cDd*<sub>1</sub>, *cDd*<sub>2</sub> and *cAd*<sub>2</sub>. For each coefficient *cD<sub>j</sub>*, 1 ≤ *j* ≤ 2, the corresponding ideal threshold *thr<sub>j</sub>* is computed using the following MATLAB function:

```matlab
function [thr] = Compute_Threshold(cc, cb)
% cc: clean details coefficients, cb: corresponding noisy coefficients.
% Each |cb(i)| is tried as a candidate threshold; the one whose soft
% thresholding of cb is closest (least squares) to cc is returned.
R = [];
for i = 1:length(cb)
    r = 0;
    for j = 1:length(cc)
        r = r + (wthresh(cb(j), 's', abs(cb(i))) - cc(j)).^2;
    end
    R = [R r];
end
[~, ibest] = min(R);
thr = abs(cb(ibest));
end
```

The inputs of this function are the clean details coefficient, *cc*, and the corresponding noisy coefficient, *cb*. The output of this function is the ideal threshold, *thr*. Note that the couple (*cc*, *cb*) can be (*ccD*<sub>1</sub>, *cD*<sub>1</sub>) or (*ccD*<sub>2</sub>, *cD*<sub>2</sub>), where *ccD<sub>j</sub>* and *cD<sub>j</sub>*, 1 ≤ *j* ≤ 2, are respectively the clean details coefficient and the corresponding noisy details coefficient at level *j*.
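For readers who do not use MATLAB, the same ideal-threshold search can be rendered as follows in Python, with the soft-thresholding operator written out explicitly in place of `wthresh(·, 's', ·)`. This is an illustrative transcription, not part of the original chapter.

```python
import numpy as np

def soft_threshold(x, thr):
    # Equivalent of MATLAB wthresh(x, 's', thr): shrink toward zero by thr.
    return np.sign(x) * np.maximum(np.abs(x) - thr, 0.0)

def compute_threshold(cc, cb):
    """Ideal threshold: the candidate |cb[i]| whose soft thresholding of the
    noisy coefficients cb best matches the clean coefficients cc (least squares)."""
    cc = np.asarray(cc, dtype=float)
    cb = np.asarray(cb, dtype=float)
    costs = [np.sum((soft_threshold(cb, abs(t)) - cc) ** 2) for t in cb]
    return abs(cb[int(np.argmin(costs))])
```

For example, with clean coefficients `[1.0, -1.0, 0.0]` and noisy coefficients `[1.5, -1.5, 0.5]`, the candidate threshold `0.5` shrinks the noisy coefficients exactly onto the clean ones, so it is returned as the ideal threshold.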

As previously mentioned, the ideal threshold at level *j* is *thr<sub>j</sub>*; it is used for soft thresholding of the noisy details coefficient *cD<sub>j</sub>* at that level. In this work, each *ANN* is trained on a set of couples (*P*, *T*), where *P*, the input of the neural network, is chosen to be *cD<sub>j</sub>*, and *T*, the Target, is chosen to be the corresponding ideal threshold, *thr<sub>j</sub>*, at level *j*. Consequently, one *ANN* is used to compute a suitable threshold for the soft thresholding of each *cD<sub>j</sub>*, 1 ≤ *j* ≤ 2, so we have two *ANNs* and two different suitable thresholds. Each of those *ANNs* consists of two layers: the first is a hidden layer containing ten neurons with a tansigmoid activation function, and the second is the output layer containing one neuron with a purelin activation function (**Figure 1**).

As shown in **Figure 1**, the input of this ANN is the noisy details coefficient at level *j*, *P* = *cD<sub>j</sub>*, 1 ≤ *j* ≤ 2, and the desired output or Target is *T* = *thr<sub>j</sub>*, 1 ≤ *j* ≤ 2. The activation functions tansig and purelin are respectively expressed by Eqs. 1 and 2.

**Figure 1.** *The architecture of the ANN used in this work.*

$$\text{tansig}(n) = \frac{2}{1 + \exp(-2n)} - 1 \tag{1}$$

$$\text{purelin}(n) = n \tag{2}$$
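As a sketch, the forward pass of the two-layer network of **Figure 1** can be written directly from Eqs. 1 and 2. The weights below are random placeholders and the input length `n_in` is an arbitrary illustrative choice, since the chapter does not fix these values here.

```python
import numpy as np

def tansig(n):
    # Eq. (1): tansig(n) = 2 / (1 + exp(-2n)) - 1, identical to tanh(n).
    return 2.0 / (1.0 + np.exp(-2.0 * n)) - 1.0

def purelin(n):
    # Eq. (2): linear activation of the output layer.
    return n

def forward(P, W1, b1, W2, b2):
    """Forward pass: 10 tansig hidden neurons, 1 purelin output neuron."""
    hidden = tansig(W1 @ P + b1)        # hidden layer (10 neurons)
    return purelin(W2 @ hidden + b2)    # output layer: the predicted threshold

rng = np.random.default_rng(0)
n_in = 16                               # illustrative input length (assumption)
W1, b1 = rng.normal(size=(10, n_in)), rng.normal(size=10)
W2, b2 = rng.normal(size=(1, 10)), rng.normal(size=1)
thr = forward(rng.normal(size=n_in), W1, b1, W2, b2)
```

Note that the tansig of Eq. 1 is mathematically the hyperbolic tangent, so `np.tanh` could be used directly; it is spelled out here to match the equation.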

Also, for the evaluation of the proposed speech enhancement approach, we have applied the denoising technique based on the *MMSE* Estimate of Spectral Amplitude [40]. This evaluation is performed in terms of the Signal to Noise Ratio (*SNR*), the Segmental *SNR* (*SSNR*) and the Perceptual Evaluation of Speech Quality (*PESQ*). **Tables 2**–**7** list the results obtained from the computation of *SNRf* (the *SNR* after enhancement) for different values of *SNRi* (the *SNR* before enhancement).

*Speech Enhancement Based on LWT and Artificial Neural Network and Using MMSE Estimate…*
*DOI: http://dx.doi.org/10.5772/intechopen.96365*

| Signal | Female speaker | Male speaker |
|:---:|:---:|:---:|
| 1 | لا لن يذيع الخبر | أحفظ من الأرض |
| 2 | أكمل بالإسلام رسالتك | أين المسافرين |
| 3 | سقطت إبرة | لا لم يستمتع بثمرها |
| 4 | من لم ينتفع | سيؤذيهم زماننا |
| 5 | غفل عن ضحكاتها | كنت قدوة لهم |
| 6 | و لماذا نشف مالهم | ازار صائما |
| 7 | أين زوايانا و قانوننا | كال و غبط الكبش |
| 8 | صاد الموروث مدلعا | هل لذعته بقول |
| 9 | نبه آبائكم | عرف واليا و قائدا |
| 10 | أظهره و قم | خالا بالنا منكما |

**Table 1.**
*The list of the employed Arabic speech sentences.*

| *SNRi* (dB) | *SNRf* (dB): the proposed speech enhancement technique | *SNRf* (dB): the denoising technique based on MMSE Estimate of Spectral Amplitude [40] |
|:---:|:---:|:---:|
| −5 | **8.3650** | 7.1431 |
| 0 | **13.0857** | 11.6110 |
| 5 | **16.9010** | 15.5721 |
| 10 | **19.8933** | 18.8719 |
| 15 | **22.3135** | 21.7972 |

**Table 2.**
*Results in terms of SNR (signal 4 (female voice) corrupted by Gaussian white noise).*

| *SNRi* (dB) | *SSNR* (dB): the proposed speech enhancement technique | *SSNR* (dB): the denoising technique based on MMSE Estimate of Spectral Amplitude [40] |
|:---:|:---:|:---:|
| −5 | **−0.0954** | −0.7089 |
| 0 | **2.5997** | 1.7725 |
| 5 | **4.7373** | 3.9719 |
| 10 | **6.8038** | 5.9329 |
| 15 | **9.5324** | 8.7158 |

**Table 3.**
*Results in terms of SSNR (signal 4 (female voice) corrupted by Gaussian white noise).*

Generally, neural networks consist of a minimum of two layers (one hidden layer and one output layer). The input information is connected to the hidden layers through weighted connections, where the output data is calculated. The number of hidden layers and the number of neurons in each layer control the performance of the network. According to [41], there are no general guidelines for selecting the number of neurons and the number of hidden layers that give the best performance for a given problem; it remains a trial and error design method [41].

For training each *ANN* used in this work, we employed 50 speech signals, and 10 others were used for testing those networks. Therefore, for training each ANN, we used 50 couples of Input and Target, (*P*, *T*). Evidently, the noisy speech signals used for testing the *ANNs* do not belong to the training database. The parameters used for training the ANNs are the number of epochs, equal to 5000, the parameter μ (Mu), equal to 0.1, and the minimum gradient, equal to 10<sup>−7</sup>. The employed training algorithm is Levenberg-Marquardt.
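A minimal illustration of such training is sketched below. It substitutes plain gradient descent for the Levenberg-Marquardt algorithm and uses a synthetic target rule in place of the ideal thresholds, so it only demonstrates the mechanics of fitting the 10-tansig / 1-purelin network, not the chapter's actual setup.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy training set standing in for the 50 (P, T) couples: inputs are coefficient
# vectors, targets are scalar "ideal thresholds" (synthetic rule, illustrative).
P = rng.normal(size=(50, 8))
T = np.abs(P).max(axis=1) * 0.5

W1, b1 = rng.normal(size=(10, 8)) * 0.1, np.zeros(10)   # hidden layer (tansig)
W2, b2 = rng.normal(size=(1, 10)) * 0.1, np.zeros(1)    # output layer (purelin)

def mse():
    H = np.tanh(P @ W1.T + b1)          # tansig hidden activations
    Y = (H @ W2.T + b2).ravel()         # purelin outputs
    return np.mean((Y - T) ** 2)

lr = 0.05
loss0 = mse()
for _ in range(500):                    # the chapter uses 5000 LM epochs
    H = np.tanh(P @ W1.T + b1)
    Y = (H @ W2.T + b2).ravel()
    err = (Y - T)[:, None] / len(T)     # dMSE/dY up to the factor 2 below
    gW2 = 2 * err.T @ H
    gb2 = 2 * err.sum(axis=0)
    dH = 2 * err @ W2 * (1 - H ** 2)    # back-propagate through tanh
    gW1 = dH.T @ P
    gb1 = dH.sum(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2
loss1 = mse()
```

After training, `loss1` is well below the initial `loss0`, which is all this sketch is meant to show; in the chapter, training quality is instead judged on the held-out test signals.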

In summary, the novelty of the proposed technique consists in applying the denoising technique based on the MMSE Estimate of Spectral Amplitude [40] to the approximation coefficient, and in applying an ANN to compute the ideal thresholds used for soft thresholding of the noisy details coefficients obtained from the application of the *LWT* to the noisy speech signal.
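The complete chain summarized above can be sketched end to end as follows, again in Python for illustration only. A simple Haar-like lifting step stands in for the chapter's *LWT*, the two thresholds are passed in as plain numbers (in the chapter they come from the trained ANNs), and the approximation denoiser defaults to the identity where the chapter applies the MMSE Estimate of Spectral Amplitude [40].

```python
import numpy as np

def lifting_step(x):
    # Haar-like lifting: split, predict, update.
    even, odd = x[0::2], x[1::2]
    cD = odd - even
    cA = even + cD / 2
    return cA, cD

def inverse_step(cA, cD):
    # Exact inverse of lifting_step.
    even = cA - cD / 2
    odd = cD + even
    x = np.empty(even.size + odd.size)
    x[0::2], x[1::2] = even, odd
    return x

def soft(x, thr):
    # Soft thresholding of the details coefficients.
    return np.sign(x) * np.maximum(np.abs(x) - thr, 0.0)

def enhance(noisy, thr1, thr2, denoise_cA=lambda c: c):
    """Two-level decomposition -> threshold details -> denoise cA2 -> inverse."""
    cA1, cD1 = lifting_step(np.asarray(noisy, dtype=float))
    cA2, cD2 = lifting_step(cA1)
    cDd1, cDd2 = soft(cD1, thr1), soft(cD2, thr2)  # ANN thresholds in the chapter
    cAd2 = denoise_cA(cA2)                          # MMSE estimator in the chapter
    return inverse_step(inverse_step(cAd2, cDd2), cDd1)
```

With both thresholds set to zero and the identity approximation denoiser, `enhance` reduces to the identity (perfect reconstruction), which is a convenient sanity check on the decomposition/reconstruction pair.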
