5. Line spectral frequencies (LSF)

Individual lines of the Line Spectral Pairs (LSP) are known as line spectral frequencies (LSF). The LSF describe the two resonance conditions arising in the interconnected-tube model of the human vocal tract. The model accounts for the nasal cavity and the shape of the mouth, which gives the linear prediction representation its underlying physiological significance. The two resonance conditions describe the vocal tract as either completely open or completely closed at the glottis [36]. These two conditions give rise to two sets of resonant frequencies, with the number of resonances in each set determined by the number of connected tubes. The resonances of the two conditions are the odd and even line spectra, respectively, and these are interleaved into a single monotonically increasing set of LSF [36].

The LSF representation was proposed by Itakura [37, 38] as an alternative to the direct linear prediction parameterization. In the area of speech coding, it has been found that this representation has better quantization properties than the other linear prediction parameterizations (LAR and RC). The LSF representation can reduce the bit-rate for transmitting the linear prediction information by 25–30% without degrading the quality of the synthesized speech [38–40]. Apart from quantization, the LSF representation of the predictor is also well suited to interpolation. Theoretically, this is supported by the fact that the sensitivity matrix linking the LSF-domain squared quantization error to the perceptually relevant log spectrum is diagonal [41, 42].
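To make the LPC-to-LSF relationship concrete, here is a minimal Python sketch (assuming `numpy` is available) that converts LPC coefficients to LSFs via the sum and difference polynomials $P(z)$ and $Q(z)$ defined in Eqs. (8) and (9) below, and recovers $A(z)$ through Eq. (10). The order-1 coefficient used in the example is illustrative only, not taken from the text.

```python
import numpy as np

def lpc_to_lsf(a):
    """Convert LPC coefficients [a_1..a_p] of A(z) = 1 - sum_i a_i z^-i
    to line spectral frequencies (radians in the open interval (0, pi))."""
    A = np.concatenate(([1.0], -np.asarray(a, dtype=float)))  # coeffs of z^0 .. z^-p
    # P(z) = A(z) + z^-(p+1) A(z^-1),  Q(z) = A(z) - z^-(p+1) A(z^-1)   (Eqs. (8)-(9))
    P = np.concatenate((A, [0.0])) + np.concatenate(([0.0], A[::-1]))
    Q = np.concatenate((A, [0.0])) - np.concatenate(([0.0], A[::-1]))
    lsf = []
    for poly in (P, Q):
        angles = np.angle(np.roots(poly))                 # roots lie on the unit circle
        lsf.extend(w for w in angles if 0.0 < w < np.pi)  # keep one of each conjugate pair
    return np.sort(np.array(lsf)), P, Q

lsf, P, Q = lpc_to_lsf([0.5])        # hypothetical order-1 predictor, a_1 = 0.5
A_rec = 0.5 * (P + Q)                # Eq. (10): recovers [1.0, -0.5, 0.0]
```

The ascending order of the returned angles reflects the interleaving of the odd and even line spectra mentioned above; production coders typically use more specialized root-finding in the cosine domain rather than a general polynomial root solver.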

#### 5.1. Algorithm description, strengths and weaknesses

Linear prediction (LP) is based on the premise that a speech signal can be described by Eq. (3). Recall

$$\hat{s}(n) = \sum_{k=1}^{p} a_k\, s(n-k)$$

where $k$ is the time index, $p$ is the order of the linear prediction, $\hat{s}(n)$ is the predicted signal, and $a_k$ are the LPC coefficients.
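As a concrete illustration of Eq. (3), the following self-contained Python sketch predicts one sample from the previous $p$ samples; the signal and coefficient values are hypothetical:

```python
def predict_sample(s, n, a):
    """Return s_hat(n) = sum_{k=1}^{p} a_k * s(n - k), as in Eq. (3)."""
    p = len(a)
    return sum(a[k - 1] * s[n - k] for k in range(1, p + 1))

# Example: order p = 2 predictor applied to a short signal.
signal = [1.0, 0.9, 0.7, 0.4, 0.1]
coeffs = [1.2, -0.4]                        # hypothetical a_1, a_2
s_hat = predict_sample(signal, 3, coeffs)   # predicts signal[3] from signal[2], signal[1]
error = signal[3] - s_hat                   # the quantity minimized when fitting a_k
```

In practice the coefficients are not chosen by hand but fitted to minimize this prediction error over an analysis frame, as described next.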

The $a_k$ coefficients are determined so as to minimize the prediction error, using either the autocorrelation or the covariance method. Eq. (3) can be recast in the frequency domain through the z-transform. In this view, a short segment of the speech signal is modelled as the output of the all-pole filter $H(z)$. The new equation is

$$H(z) = \frac{1}{A(z)} = \frac{1}{1 - \sum_{i=1}^{p} a_i z^{-i}}\tag{7}$$

where $H(z)$ is the all-pole filter and $A(z)$ is the LPC analysis filter.

In order to compute the LSF coefficients, the inverse polynomial filter $A(z)$ is split into two polynomials $P(z)$ and $Q(z)$ [36, 38, 40, 41]:

$$P(z) = A(z) + z^{-(p+1)}A(z^{-1})\tag{8}$$

$$Q(z) = A(z) - z^{-(p+1)}A(z^{-1})\tag{9}$$

where $P(z)$ corresponds to the vocal tract with the glottis completely closed, $Q(z)$ to the glottis completely open, and $p$ is the order of the LPC analysis filter $A(z)$.

In order to convert LSF back to LPC, the following relation is used [36, 41, 43, 44]:

$$A(z) = 0.5\,[P(z) + Q(z)]\tag{10}$$

The block diagram of the LSF processor is shown in Figure 4. The most prominent application of LSF is in speech compression, with extensions into speaker recognition and speech recognition. The technique has also found limited use in other fields: LSF have been investigated for musical instrument recognition and coding, and have been applied to animal noise identification and financial market analysis. The advantages of LSF include their ability to localize spectral sensitivities, the fact that they characterize bandwidths and resonance locations, and their emphasis on the important aspect of spectral peak location. In most instances, the LSF representation provides a near-minimal data set for subsequent classification [36].

Figure 4. Block diagram of LSF processor.

Since LSF represent spectral shape information at a lower data rate than raw input samples, careful use of processing and analysis methods in the LSP domain can reduce complexity relative to techniques that operate on the raw input data itself. LSF play an important role in the transmission of vocal tract information from speech coder to decoder, with their widespread use being a result of their excellent quantization properties. LSP parameters can be generated by several methods of varying complexity. The central problem is finding the roots of the $P$ and $Q$ polynomials defined in Eqs. (8) and (9). This can be done with standard root-finding methods, or with more specialized methods, and is often performed in the cosine domain [36].

6. Discrete wavelet transform (DWT)

Wavelet transform (WT) theory is centered on signal analysis using varying scales in the time and frequency domains [45]. With the support of theoretical physicist Alex Grossmann, Jean Morlet introduced the wavelet transform, which permits the identification of high-frequency events with enhanced temporal resolution [45–47]. A wavelet is a waveform of effectively limited duration with an average value of zero. Many wavelets also display orthogonality, an ideal feature for compact signal representation [46]. WT is a signal processing technique that can represent real-life non-stationary signals with high efficiency [33, 46]. It has the ability to extract information from transient signals concurrently in both the time and frequency domains [33, 45, 48].

The continuous wavelet transform (CWT) splits a continuous-time function into wavelets. However, the information it produces is redundant, and huge computational effort is required to calculate all possible scales and translations, which restricts its use [45]. The discrete wavelet transform (DWT) is an extension of the WT that adds flexibility to the decomposition process [48]. It was introduced as a highly flexible and efficient method for sub-band decomposition of signals [46, 49]. In earlier applications, linear

