3. Linear prediction coefficients (LPC)

after translating the power spectrum to log domain in order to calculate MFCC coefficients [5].

The formula used to calculate the mels for any frequency is [19, 22]:

$$\mathrm{mel}(f) = 2595 \times \log\_{10}\left(1 + \frac{f}{700}\right) \tag{1}$$

where mel(f) is the frequency (mels) and f is the frequency (Hz).
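As a rough illustration, Equation (1) and its algebraic inverse can be sketched in Python as below. The function names and the helper that spaces points evenly on the mel scale are assumptions made for this example, not part of the cited works.

```python
import math


def hz_to_mel(f_hz):
    """Equation (1): mel(f) = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of Equation (1), obtained by solving it for f."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_spaced_frequencies(f_low, f_high, n_points):
    """Frequencies in Hz that are evenly spaced on the mel scale.

    Hypothetical helper (n_points must be at least 2); filterbank
    centre frequencies are typically chosen in this manner.
    """
    m_low, m_high = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (m_high - m_low) / (n_points - 1)
    return [mel_to_hz(m_low + i * step) for i in range(n_points)]


# Example: six points between 0 Hz and 4 kHz. The spacing in Hz widens
# with frequency, which is why high frequencies are resolved coarsely.
points = mel_spaced_frequencies(0.0, 4000.0, 6)
print([round(f) for f in points])
```

Because the mapping is logarithmic, equal steps in mels correspond to increasingly large steps in Hz.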

The MFCCs are calculated using this equation [9, 19]:

$$\hat{C}\_n = \sum\_{k=1}^{K} \left( \log \hat{S}\_k \right) \cos \left[ n \left( k - \frac{1}{2} \right) \frac{\pi}{K} \right] \tag{2}$$

where $K$ is the number of mel filterbank channels, $\hat{S}\_k$ is the output of the $k$-th filterbank channel and $\hat{C}\_n$ is the $n$-th final MFCC coefficient.

The block diagram of the MFCC processor can be seen in Figure 1. It summarizes all the processes and steps taken to obtain the needed coefficients. MFCC represents the low-frequency region more effectively than the high-frequency region; hence, it can capture formants that lie in the low-frequency range and describe the vocal tract resonances. It is generally recognized as a front-end procedure for typical speaker identification applications, as it has reduced vulnerability to noise disturbance, little session inconsistency and is easy to extract [19]. It is also a perfect representation for sounds when the source characteristics are stable and consistent (music and speech) [23]. Furthermore, it can capture information from sampled signals with frequencies up to a maximum of 5 kHz, which encapsulates most of the energy of sounds that are generated by humans [9].

Cepstral coefficients are said to be accurate in certain pattern recognition problems relating to the human voice. They are used extensively in speaker identification and speech recognition [21]. However, other formants can lie above 1 kHz and are not efficiently taken into consideration because of the large filter spacing in the high-frequency range [19]. MFCC features are not very accurate in the presence of background noise [14, 24] and might not be well suited for generalization [23].

Figure 1. Block diagram of MFCC processor.
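Equation (2) amounts to a cosine transform of the log filterbank outputs. A minimal sketch follows, assuming the sum runs over the K filterbank outputs (the usual reading of Equation (2)) and that all filterbank energies are strictly positive. The function name and the example energies are hypothetical.

```python
import math


def mfcc_from_filterbank(filterbank_energies, n_coeffs):
    """Apply Equation (2) to one frame of mel filterbank outputs.

    filterbank_energies: the K positive filterbank outputs S_k.
    n_coeffs: how many coefficients C_1 .. C_n to return.
    """
    K = len(filterbank_energies)
    # math.log raises ValueError for non-positive input, so energies
    # are assumed to be strictly positive here.
    log_energies = [math.log(s) for s in filterbank_energies]
    coeffs = []
    for n in range(1, n_coeffs + 1):
        total = 0.0
        for k in range(1, K + 1):
            total += log_energies[k - 1] * math.cos(n * (k - 0.5) * math.pi / K)
        coeffs.append(total)
    return coeffs


# Made-up filterbank outputs for a single frame (not measured data).
example_energies = [4.1, 3.7, 2.9, 2.2, 1.8, 1.1, 0.9, 0.6]
print(mfcc_from_filterbank(example_energies, 4))
```

A completely flat set of filterbank outputs yields coefficients that are all zero, which reflects the fact that these coefficients describe the shape of the log spectrum rather than its overall level.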

Linear prediction coefficients (LPC) imitate the human vocal tract [16] and give a robust speech feature. The method evaluates the speech signal by approximating the formants, removing their effects from the speech signal and estimating the intensity and frequency of the remaining residue. The result expresses each sample of the signal as a linear combination of previous samples. The coefficients of the difference equation characterize the formants; thus, LPC needs to approximate these coefficients [25]. LPC is a powerful speech analysis method and has become well known as a formant estimation method [17].

The frequencies at which the resonant peaks occur are called the formant frequencies. Thus, with this technique, the positions of the formants in a speech signal can be estimated by calculating the linear predictive coefficients over a sliding window and finding the peaks in the spectrum of the resulting linear prediction filter [17]. LPC is helpful in the encoding of high-quality speech at a low bit rate [13, 26, 27].
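The peak-picking idea can be sketched for a single analysis window as follows. The predictor coefficients here are hypothetical: they are constructed by hand to place one resonance near 700 Hz at an assumed 8 kHz sampling rate, rather than estimated from real speech, and the function names are illustrative only.

```python
import cmath
import math


def lpc_spectrum(a, n_points, fs):
    """Magnitude of the all-pole filter 1 / A(z) on the unit circle.

    a: predictor coefficients a_1..a_p, with A(z) = 1 - sum_k a_k z^-k.
    Returns (freqs, mags) for n_points frequencies from 0 to fs / 2.
    """
    freqs, mags = [], []
    for i in range(n_points):
        f = i * (fs / 2.0) / (n_points - 1)
        w = 2.0 * math.pi * f / fs
        A = 1.0 - sum(a_k * cmath.exp(-1j * w * (k + 1)) for k, a_k in enumerate(a))
        freqs.append(f)
        mags.append(1.0 / abs(A))
    return freqs, mags


def spectral_peaks(freqs, mags):
    """Frequencies of the local maxima of the spectrum (formant candidates)."""
    return [freqs[i] for i in range(1, len(mags) - 1)
            if mags[i - 1] < mags[i] >= mags[i + 1]]


# Hypothetical coefficients: a pole pair of radius 0.97 at 700 Hz.
FS = 8000.0
RADIUS, F0 = 0.97, 700.0
theta = 2.0 * math.pi * F0 / FS
coeffs = [2.0 * RADIUS * math.cos(theta), -RADIUS * RADIUS]

freqs, mags = lpc_spectrum(coeffs, 2001, FS)
print(spectral_peaks(freqs, mags))
```

In practice the coefficients would be re-estimated for each position of the sliding window, and the peak search repeated on each resulting spectrum.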

Other features that can be deduced from LPC are linear prediction cepstral coefficients (LPCC), log area ratio (LAR), reflection coefficients (RC), line spectral frequencies (LSF) and Arcus Sine Coefficients (ARCSIN) [13]. LPC is generally used for speech reconstruction. The LPC method is generally applied in music and electronics companies for creating mobile robots, in telephone companies, and in the tonal analysis of violins and other stringed musical instruments [4].

### 3.1. Algorithm description, strength and weaknesses

The linear prediction method is applied to obtain the filter coefficients equivalent to the vocal tract by minimizing the mean square error between the input speech and the estimated speech [28]. Linear prediction analysis of a speech signal predicts any given speech sample at a specific time as a linear weighted sum of preceding samples. The linear predictive model of speech production is given as [13, 25]:

$$\hat{\mathbf{s}}(n) = \sum\_{k=1}^{p} a\_k \mathbf{s}(n-k) \tag{3}$$

where $\hat{s}(n)$ is the predicted sample, $s(n)$ is the speech sample, $a\_k$ are the predictor coefficients and $p$ is the prediction order.

The prediction error is given as [16, 25]:

$$\mathbf{e}(n) = \mathbf{s}(n) - \hat{\mathbf{s}}(n) \tag{4}$$
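Equations (3) and (4) can be sketched as below. This example assumes the autocorrelation method with the Levinson-Durbin recursion, which is one common way of minimizing the mean square prediction error. Windowing is omitted for brevity, and the test signal and all function names are hypothetical.

```python
import math


def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one frame (zero outside the frame)."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for the a_k of Equation (3).

    Returns (a, k): a[j] is a_j for j = 1..order (a[0] is unused) and k
    lists the reflection coefficients found along the way. Sign
    conventions for k differ between texts. Assumes r[0] > 0.
    """
    a = [0.0] * (order + 1)
    k = []
    err = r[0]  # prediction error energy, reduced at every step
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k_i = acc / err
        new_a = a[:]
        new_a[i] = k_i
        for j in range(1, i):
            new_a[j] = a[j] - k_i * a[i - j]
        a = new_a
        err *= 1.0 - k_i * k_i
        k.append(k_i)
    return a, k


def predict(signal, a, n):
    """Equation (3): predicted sample n from the p previous samples."""
    p = len(a) - 1
    return sum(a[j] * signal[n - j] for j in range(1, p + 1) if n - j >= 0)


def prediction_error(signal, a):
    """Equation (4): e(n) = s(n) - s_hat(n) for every sample."""
    return [signal[n] - predict(signal, a, n) for n in range(len(signal))]


# Hypothetical test signal: impulse response of a two-pole resonator,
# so the predictor coefficients to be recovered are known in advance.
TRUE_A1 = 2.0 * 0.9 * math.cos(0.3 * math.pi)
TRUE_A2 = -0.81
signal = [1.0, TRUE_A1]
for _ in range(398):
    signal.append(TRUE_A1 * signal[-1] + TRUE_A2 * signal[-2])

r = autocorrelation(signal, 2)
a, k = levinson_durbin(r, 2)
residual = prediction_error(signal, a)
print(a[1:], k)
```

For this synthetic signal the recovered coefficients match the ones used to generate it, and the prediction error is essentially zero after the first sample, which is the initial impulse.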

Figure 2. Block diagram of LPC processor.

Subsequently, each frame of the windowed signal is autocorrelated, with the highest autocorrelation lag giving the order of the linear prediction analysis. This is followed by the LPC analysis, in which the autocorrelations of each frame are converted into an LPC parameter set consisting of the LPC coefficients [26]. A summary of the procedure for obtaining the LPC is shown in Figure 2. LPC can be derived by [7]:

$$\mathbf{a}\_{\mathbf{m}} = \log \left[ \frac{1 - k\_m}{1 + k\_m} \right] \tag{5}$$

where $a\_m$ is the linear prediction coefficient and $k\_m$ is the reflection coefficient.

Linear predictive analysis efficiently extracts the vocal tract information from a given speech signal [16]. It is known for its speed of computation and accuracy [18]. LPC represents source behaviors that are steady and consistent very well [23]. Furthermore, it is also used in speaker recognition systems, where the main purpose is to extract the vocal tract properties [25]. It gives very accurate estimates of speech parameters and is comparatively efficient to compute [14, 26]. However, traditional linear prediction suffers from aliased autocorrelation coefficients [29]. LPC estimates are highly sensitive to quantization noise [30] and might not be well suited for generalization [23].

The LPCC processor is shown in Figure 3, which illustrates the process of obtaining LPCC. LPCC can be calculated using [7, 15, 33]:

$$C\_m = a\_m + \sum\_{k=1}^{m-1} \left[ \frac{k}{m} \right] C\_k a\_{m-k} \tag{6}$$

where $a\_m$ is the linear prediction coefficient and $C\_m$ is the cepstral coefficient.

LPCC have low vulnerability to noise [30]. LPCC features yield a lower error rate compared to LPC features [31]. Cepstral coefficients of higher order are mathematically limited, resulting in an extremely wide range of variances when moving from the cepstral coefficients of lower order to those of higher order [34]. Similarly, LPCC estimates are notorious for having great sensitivity to quantization noise [35]. Cepstral analysis of high-pitched speech signals gives small source-filter separability in the quefrency domain [29]. Cepstral coefficients of lower order are sensitive to the spectral slope, while the cepstral coefficients of higher order are sensitive to noise [15].

Figure 3. Block diagram of LPCC processor.

5. Line spectral frequencies (LSF)

Individual lines of the line spectral pairs (LSP) are known as line spectral frequencies (LSF). LSF describe the two resonance conditions occurring in the interconnected tube model of the human vocal tract. The model takes into consideration the nasal cavity and the mouth shape, which gives the basis for the fundamental physiological relevance of the linear prediction representation. The two resonance conditions describe the vocal tract as being either completely open or completely closed at the glottis [36]. The two conditions give rise to two groups of resonant frequencies, with the number of resonances in each group determined by the number of linked tubes. The resonances of each condition are the odd and even line spectra, respectively, and are interleaved into a single monotonically increasing set of LSF [36].

The LSF representation was proposed by Itakura [37, 38] as an alternative to the linear prediction parametric representation. In the area of speech coding, it has been realized that this representation has better quantization properties than the other linear prediction parametric representations
