7. Perceptual linear prediction (PLP)

The perceptual linear prediction (PLP) technique combines critical-band analysis, intensity-to-loudness compression, and equal-loudness pre-emphasis to extract the relevant information from speech. It is rooted in the nonlinear bark scale and was initially intended for speech recognition tasks, where it eliminates speaker-dependent features [11]. PLP produces a representation of a smoothed short-term spectrum that has been equalized and compressed in a manner similar to human hearing, which makes it comparable to MFCC. In the PLP approach, several prominent features of hearing are replicated, and the resulting auditory-like spectrum of speech is approximated by an autoregressive all-pole model [52]. PLP gives reduced resolution at high frequencies, reflecting its auditory filter-bank basis, yet produces orthogonal outputs similar to those of cepstral analysis. Because it uses linear prediction for spectral smoothing, it is named perceptual linear prediction [28]. PLP is thus a combination of spectral analysis and linear prediction analysis.

#### 7.1. Algorithm description, strengths and weaknesses

To compute the PLP features, the speech is first windowed (Hamming window), then the Fast Fourier Transform (FFT) is applied and the squared magnitude is computed, which gives the power spectral estimates. A trapezoidal filter is then applied at 1-bark intervals to integrate the overlapping critical-band filter responses in the power spectrum; this effectively compresses the higher frequencies into a narrow band. The symmetric frequency-domain convolution on the bark-warped frequency scale then permits low frequencies to mask high frequencies while simultaneously smoothing the spectrum. The spectrum is subsequently pre-emphasized to approximate the unequal sensitivity of human hearing across frequencies. The spectral amplitude is then compressed, which reduces the amplitude variation of the spectral resonances. An inverse discrete Fourier transform (IDFT) is performed to obtain the autocorrelation coefficients, and spectral smoothing is carried out by solving the autoregressive equations. Finally, the autoregressive coefficients are converted to cepstral variables [28]. The equation for computing the bark scale frequency is:

$$
\text{bark}(f) = \frac{26.81\,f}{1960 + f} - 0.53\tag{17}
$$

where bark(f) is the frequency in barks and f is the frequency in Hz.

Figure 6. Block diagram of PLP processor.

| Technique | Type of filter | Shape of filter | What is modeled | Speed of computation | Type of coefficient | Noise resistance | Sensitivity to quantization/additional noise | Reliability | Frequency captured |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Mel frequency cepstral coefficient (MFCC) | Mel | Triangular | Human auditory system | High | Cepstral | Medium | Medium | High | Low |
| Linear prediction coefficient (LPC) | Linear prediction | Linear | Human vocal tract | High | Autocorrelation coefficient | High | High | High | Low |
| Linear prediction cepstral coefficient (LPCC) | Linear prediction | Linear | Human vocal tract | Medium | Cepstral | High | High | Medium | Low & Medium |
| Line spectral frequencies (LSF) | Linear prediction | Linear | Human vocal tract | Medium | Spectral | High | High | Medium | Low & Medium |
| Discrete wavelet transform (DWT) | Lowpass & highpass | — | — | High | Wavelets | Medium | Medium | Medium | Low & High |
| Perceptual linear prediction (PLP) | Bark | Trapezoidal | Human auditory system | Medium | Cepstral & autocorrelation | Medium | Medium | Medium | Low & Medium |

Table 1. Comparison between the feature extraction techniques.

Author details

Sabur Ajibola Alim1\* and Nahrul Khair Alang Rashid2

1 Ahmadu Bello University, Zaria, Nigeria

2 Universiti Teknologi Malaysia, Skudai, Johor, Malaysia

\*Address all correspondence to: moaj1st@yahoo.com

Some Commonly Used Speech Feature Extraction Algorithms. In: From Natural to Artificial Intelligence - Algorithms and Applications. http://dx.doi.org/10.5772/intechopen.80419
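Eq. (17) and its algebraic inverse can be coded directly; a minimal sketch follows, in which the function names `hz_to_bark` and `bark_to_hz` are illustrative rather than from the original text:

```python
import numpy as np

def hz_to_bark(f_hz):
    """Map frequency in Hz to the bark scale, following Eq. (17)."""
    f_hz = np.asarray(f_hz, dtype=float)
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

def bark_to_hz(z_bark):
    """Inverse mapping, obtained by solving Eq. (17) for f."""
    z_bark = np.asarray(z_bark, dtype=float)
    return 1960.0 * (z_bark + 0.53) / (26.81 - (z_bark + 0.53))
```

The inverse is useful when placing the critical-band filters at 1-bark intervals: band centres chosen on the bark axis can be mapped back to Hz to index the FFT bins.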

Recognition with PLP is better than with conventional LPC [28], because PLP effectively suppresses speaker-dependent information [52]. It also gives enhanced speaker-independent recognition performance and is robust to noise and to variations in the channel and microphones [53]. PLP reconstructs the autoregressive noise component accurately [54]. However, a PLP-based front end is sensitive to any change in formant frequency.

Figure 6 shows the PLP processor and all the steps taken to obtain the PLP coefficients. PLP has low sensitivity to spectral tilt, consistent with findings that phonetic judgments are relatively insensitive to spectral tilt. On the other hand, PLP analysis depends on the overall spectral balance (formant amplitudes), and formant amplitudes are easily affected by factors such as the recording equipment, the communication channel, and additive noise [52]. Furthermore, the time-frequency resolution and efficient sampling of the short-term representation are addressed in an ad hoc way [54].
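The processing chain of Figure 6 can be sketched end to end as below. This is a simplified illustration, not the authors' implementation: rectangular 1-bark bands stand in for the trapezoidal masking curves, the equal-loudness weighting uses a common published approximation, and all names (`plp_features`, `levinson_durbin`, `lpc_to_cepstrum`) and defaults (21 filters, model order 12) are assumptions.

```python
import numpy as np

def levinson_durbin(r, order):
    """Solve the all-pole (autoregressive) normal equations from
    autocorrelations r[0..order] by the Levinson-Durbin recursion."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)
    return a, err

def lpc_to_cepstrum(a, n_cep):
    """Convert AR coefficients (a[0] = 1) to cepstral coefficients."""
    p = len(a) - 1
    c = np.zeros(n_cep)
    for n in range(1, n_cep + 1):
        acc = -a[n] if n <= p else 0.0
        for k in range(1, n):
            if n - k <= p:
                acc -= (k / n) * c[k - 1] * a[n - k]
        c[n - 1] = acc
    return c

def plp_features(frame, fs, n_filters=21, order=12, n_cep=12, n_fft=512):
    # 1. Hamming window, FFT, squared magnitude -> power spectrum
    power = np.abs(np.fft.rfft(frame * np.hamming(len(frame)), n_fft)) ** 2
    freqs = np.arange(n_fft // 2 + 1) * fs / n_fft
    bark = 26.81 * freqs / (1960.0 + freqs) - 0.53          # Eq. (17)

    # 2. Critical-band integration at ~1-bark intervals
    #    (rectangular 1-bark bands here instead of trapezoidal masks)
    centres = np.linspace(bark[1], bark[-1] - 0.5, n_filters)
    bands = np.array([power[np.abs(bark - c) <= 0.5].sum() + 1e-10
                      for c in centres])

    # 3. Equal-loudness pre-emphasis at the band centre frequencies
    centre_hz = 1960.0 * (centres + 0.53) / (26.81 - centres - 0.53)
    w2 = (2 * np.pi * centre_hz) ** 2
    bands *= ((w2 + 56.8e6) * w2 ** 2) / ((w2 + 6.3e6) ** 2 * (w2 + 0.38e9))

    # 4. Intensity-to-loudness (cube-root) compression
    bands = bands ** (1.0 / 3.0)

    # 5. IDFT of the auditory spectrum -> autocorrelation coefficients
    spectrum = np.concatenate([bands, bands[-2:0:-1]])      # even symmetry
    r = np.fft.ifft(spectrum).real[: order + 1]

    # 6. All-pole model via Levinson-Durbin, then cepstral conversion
    a, _ = levinson_durbin(r, order)
    return lpc_to_cepstrum(a, n_cep)
```

For a 25 ms frame at 16 kHz, `plp_features(frame, fs=16000)` would return 12 cepstral coefficients per frame; step 2 is where a faithful implementation would substitute the trapezoidal critical-band masking curves.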

Table 1 compares the six feature extraction techniques explicitly described above. Although the choice of a feature extraction algorithm ultimately depends on the researcher, the table characterizes these techniques according to the main considerations in selecting any feature extraction algorithm, including speed of computation, noise resistance, and sensitivity to additional noise. It can therefore serve as a guide when choosing between two or more of the discussed algorithms.
