**1. Introduction**

Numerous speech enhancement techniques have been developed in recent years, as speech enhancement is a core requirement in many challenging domains such as speech and speaker recognition, telecommunications, teleconferencing and hands-free telephony [1]. In such applications, the goal is to recover a speech signal from observations degraded by diverse noise components [2]. These noise components can be of various classes that are frequently present in the environment [3]. Many algorithms and approaches have been proposed to address the problem of degraded speech signals [4–6]. Furthermore, single-microphone and multi-microphone methods have been proposed in order to improve the behaviour of speech enhancement approaches and to reduce the acoustic noise components even in very noisy conditions [2].

The most widely known single-channel approach in speech enhancement is spectral subtraction (SS), which requires only one channel signal [7]. It has been embedded in some high-quality mobile phones for this purpose. However, the SS approach is only appropriate for stationary noise environments [2], and it inevitably introduces the musical noise problem. In fact, the more the noise is reduced, the greater the alteration brought to the speech signal, and accordingly the poorer the intelligibility of the enhanced speech [8, 9]. As a result, ideal enhancement can hardly be attained when the Signal-to-Noise Ratio (*SNR*) of the noisy speech is relatively low (below **5** *dB*), whereas quite good results are obtained when the noisy speech *SNR* is relatively high (above **15** *dB*) [2]. The SS and other speech enhancement methods based on the SS principle have been improved by decision-directed (*DD*) methods in reducing the musical noise components [10–13]. Numerous algorithms that improve the *DD* method were suggested in [14]. In [15], a speech enhancement technique based on high-order cumulant parameter estimation was proposed. In [16, 17], a subspace technique based on Singular Value Decomposition (*SVD*) was proposed: the signal is enhanced by eliminating the noise subspace, so that the clean speech signal is estimated from the subspace of the noisy speech [2].

Another technique widely studied in speech enhancement is the adaptive noise cancellation (ANC) approach, first suggested in [18, 19]. Moreover, the most important speech enhancement methods employ adaptive approaches to obtain the ability to track non-stationary noise properties [20, 21]. Numerous adaptive techniques have been proposed for speech enhancement, including time-domain algorithms, frequency-domain adaptive algorithms [22–26] and adaptive spatial filtering methods [27, 28], which frequently employ adaptive SVD methods in order to separate the speech signal space from the noisy one. Another research direction combines Blind Source Separation (BSS) methods with adaptive filtering algorithms to enhance the speech signal and effectively cancel the acoustic echo components [29–32]; this approach employs a configuration of at least two microphones in order to update the adaptive filtering algorithms. A multi-microphone speech enhancement technique was also proposed for the same aim and improved on the existing one-channel and two-channel speech enhancement and noise reduction adaptive algorithms [33, 34]. Numerous research works have cast the speech enhancement problem as one of mixing and unmixing signals with convolutive and instantaneous noisy observations [35–37]. In the last decade, a new research direction has proven the efficacy of the wavelet domain as an effective means of improving speech enhancement approaches, and numerous algorithms and methods have been proposed for this aim [38, 39].

In this chapter, we propose a novel speech enhancement technique based on the Lifting Wavelet Transform (*LWT*) and an Artificial Neural Network (*ANN*), which also uses the *MMSE* Estimate of Spectral Amplitude [40]. The rest of this chapter is organised as follows: after the introduction, section 2 deals with the Lifting Wavelet Transform (*LWT*). Section 3 describes the proposed speech enhancement technique. Section 4 presents the obtained results, and finally we conclude our work in section 5.

**2. Lifting wavelet transform (LWT)**

The *LWT* offers a feature that overcomes the shortcomings of the *DWT*. Signal decomposition using the *LWT* needs three steps: splitting, prediction and update.
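These three steps can be illustrated with a minimal, self-contained sketch of a single lifting stage built on a simple Haar-like predictor and updater. The lifting filters actually used in this work are not detailed here, so the operators below are illustrative only:

    % One lifting stage with Haar-like predict/update operators (illustrative).
    x  = randn(1, 256);          % example signal of even length
    xe = x(1:2:end);             % split: even-indexed samples
    xo = x(2:2:end);             % split: odd-indexed samples
    d  = xo - xe;                % predict: detail = odd samples minus their prediction
    a  = xe + d/2;               % update: approximation keeps the local average

    % Inverse lifting: undo the update, undo the prediction, then merge
    xe_rec = a - d/2;
    xo_rec = d + xe_rec;
    x_rec  = zeros(size(x));
    x_rec(1:2:end) = xe_rec;
    x_rec(2:2:end) = xo_rec;
    max(abs(x - x_rec))          % close to zero: perfect reconstruction up to round-off

Applying such a stage recursively to the approximation branch yields, after two levels, the detail coefficients *cD*1 and *cD*2 and the approximation *cA*2 used in the next section.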

**3. The proposed technique**

In this chapter, we propose a new speech enhancement technique based on the Lifting Wavelet Transform (*LWT*) and an Artificial Neural Network (*ANN*). This technique also uses the *MMSE* Estimate of Spectral Amplitude. In the first step, the *LWT* is applied to the noisy speech signal in order to obtain two noisy detail coefficients, *cD*1 and *cD*2, and one approximation coefficient, *cA*2. After that, *cD*1 and *cD*2 are denoised by soft thresholding; this thresholding requires suitable thresholds, *thrj*, 1 ≤ *j* ≤ 2, which are determined using an Artificial Neural Network (*ANN*). The soft thresholding of *cD*1 and *cD*2 yields two denoised coefficients, *cDd*1 and *cDd*2. Then the denoising technique based on the MMSE Estimate of Spectral Amplitude [40] is applied to the noisy approximation *cA*2 in order to obtain a denoised coefficient, *cAd*2. Finally, the enhanced speech signal is obtained by applying the inverse *LWT*, *LWT*<sup>-1</sup>, to *cDd*1, *cDd*2 and *cAd*2.
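To make the step applied to *cA*2 more concrete, the sketch below applies an MMSE short-time spectral amplitude gain to a single frame of the noisy approximation. This is only a rough illustration: the frame length, the per-bin noise power estimate, the decision-directed smoothing factor and the dummy data are assumptions, and the exact estimator of [40] may differ:

    % Rough sketch of an MMSE spectral-amplitude gain on one frame of cA2.
    % The dummy data below stands in for a windowed frame of cA2, a noise power
    % estimate and the a priori SNR carried over from the previous frame.
    cA2_frame = randn(256, 1);                   % one windowed frame of the noisy approximation
    noise_psd = ones(256, 1);                    % per-bin noise power estimate (assumed known)
    xi_prev   = ones(256, 1);                    % a priori SNR from the previous frame (assumed)
    alpha     = 0.98;                            % decision-directed smoothing factor (assumed)

    X      = fft(cA2_frame);                     % frame spectrum
    gammaK = max(abs(X).^2 ./ noise_psd, eps);   % a posteriori SNR
    xi     = alpha .* xi_prev + (1 - alpha) .* max(gammaK - 1, 0);   % a priori SNR
    v      = xi ./ (1 + xi) .* gammaK;
    % MMSE-STSA gain; the scaled Bessel functions besseli(.,.,1) absorb exp(-v/2)
    G      = (sqrt(pi)/2) .* (sqrt(v) ./ gammaK) ...
             .* ((1 + v) .* besseli(0, v/2, 1) + v .* besseli(1, v/2, 1));
    Xhat   = G .* X;                             % denoised spectrum of the frame
    cAd2_frame = real(ifft(Xhat));               % denoised frame, to be overlap-added into cAd2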

For each coefficient *cDj*, 1 ≤ *j* ≤ 2, the corresponding ideal threshold *thrj* is computed using the following MATLAB function:

    function [thr] = Compute_Threshold(cc, cb)
    % cc: clean detail coefficients, cb: corresponding noisy detail coefficients
    R = [];
    for i = 1:length(cb)                 % each |cb(i)| is a candidate threshold
        r = 0;
        for j = 1:length(cc)             % squared error after soft thresholding cb with this candidate
            r = r + (wthresh(cb(j), 's', abs(cb(i))) - cc(j)).^2;
        end;
        R = [R r];
    end;
    [guess, ibest] = min(R);             % keep the candidate giving the minimum error
    thr = abs(cb(ibest));

The inputs of this function are the clean details coefficient, *cc*, and the corresponding noisy coefficient, *cb*. The output of this function is the ideal threshold, *thr*. Note that the couple (*cc*, *cb*) can be (*ccD*1, *cD*1) or (*ccD*2, *cD*2), where *ccDj* and *cDj*, 1 ≤ *j* ≤ 2, are respectively the clean details coefficient and the corresponding noisy details coefficient at level *j*.
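As a quick illustration of how this function can be combined with the soft thresholding described above, the snippet below uses synthetic placeholder vectors for the clean and noisy level-1 detail coefficients (the variable names are assumptions):

    ccD1 = randn(1, 128);                    % clean detail coefficients at level 1 (placeholder)
    cD1  = ccD1 + 0.1*randn(1, 128);         % corresponding noisy detail coefficients (placeholder)
    thr1 = Compute_Threshold(ccD1, cD1);     % ideal threshold for level 1
    cDd1 = wthresh(cD1, 's', thr1);          % soft thresholding of the noisy details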

In this chapter, as previously mentioned, the ideal threshold at level *j* is *thrj*, and it is used for the soft thresholding of the noisy details coefficient *cDj* at that level. In this work, the *ANN* is trained on a set of couples (*P*, *T*), where *P* is the input of the neural network and is chosen to be *cDj*, and *T* is the target and is chosen to be the corresponding ideal threshold *thrj* at level *j*. Consequently, for computing a suitable threshold to be used in the soft thresholding of *cDj*, 1 ≤ *j* ≤ 2, we use one *ANN* per level, so we have two *ANNs* and two different suitable thresholds. Each of those *ANNs* consists of two layers: the first is a hidden layer containing ten neurons with a tan-sigmoid activation function, and the second is the output layer containing one neuron with a purelin activation function (**Figure 1**). As shown in **Figure 1**, the input of this *ANN* is the noisy details coefficient at level *j*, *P* = *cDj*, 1 ≤ *j* ≤ 2, and the desired output or target is *T* = *thrj*, 1 ≤ *j* ≤ 2. The tan-sigmoid and purelin activation functions are respectively expressed by Eqs. 1 and 2.
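As an illustration of this training setup, a two-layer network of this kind can be built and trained in MATLAB roughly as follows. This is a sketch assuming the Neural Network/Deep Learning Toolbox function `feedforwardnet`; the data sizes and variable names are placeholders, and the chapter's exact training parameters are not reproduced here:

    % Sketch: one ANN per decomposition level (here level 1).
    P = randn(128, 50);                    % placeholder: 50 noisy detail vectors cD1 of length 128
    T = rand(1, 50);                       % placeholder: the 50 corresponding ideal thresholds thr1

    net = feedforwardnet(10);              % one hidden layer with 10 neurons
    net.layers{1}.transferFcn = 'tansig';  % tan-sigmoid hidden activation
    net.layers{2}.transferFcn = 'purelin'; % linear output activation
    net = train(net, P, T);                % supervised training on the (P, T) couples

    thr1_est = net(randn(128, 1));         % estimated threshold for a new noisy detail vector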
