**Part 1**

**DFT and FFT Applications** 


## **Interpolation Algorithms of DFT for Parameters Estimation of Sinusoidal and Damped Sinusoidal Signals**

Krzysztof Duda

*AGH University of Science and Technology, Department of Measurement and Instrumentation, Krakow, Poland* 

### **1. Introduction**

The Discrete Fourier Transform (DFT) is probably the most popular signal processing tool. Its wide use is partly due to fast Fourier transform (FFT) algorithms (Cooley & Tukey, 1965, Oppenheim et al., 1999, Lyons, 2004). The DFT may also be computed efficiently by recursive algorithms in a window sliding by one sample (Jacobsen & Lyons, 2003, Duda, 2010). Unfortunately, the DFT has two main drawbacks that degrade signal analysis (Harris, 1978, Oppenheim et al., 1999): 1) spectral leakage, and 2) sampling of the continuous spectrum of the discrete signal. Spectral leakage is reduced by proper time windows, and the spectrum between DFT bins is recovered by interpolated DFT (IpDFT) algorithms, which are thoroughly presented in this chapter.

### **2. Basic theory**

### **2.1 Signal model**

IpDFT algorithms may be derived for discrete sinusoidal or damped sinusoidal signals. The discrete sinusoidal signal is defined as

$$v_n = A \cos(\omega_0 n + \varphi), \quad n = 0, 1, 2, \dots, N - 1, \tag{1}$$

and the discrete damped sinusoidal signal is defined as

$$v_n = A \cos(\omega_0 n + \varphi)\, e^{-dn}, \quad d \ge 0, \quad n = 0, 1, 2, \dots, N - 1, \tag{2}$$

where *A*>0 is the signal's amplitude, 0<*ω*0<*π* is the signal's frequency in radians (or radians per sample), also referred to as angular frequency or pulsation, and *ω*0=*π* rad corresponds to half of the sampling rate *Fs* in hertz, -*π*<*φ*≤*π* is the phase angle in radians, *n* is the sample index, *N* is the number of samples, and *d* is the damping factor. If the discrete signals (1) and (2) result from sampling their analog counterparts, then

$$\omega_0 = 2\pi \frac{F_0}{F_s}, \tag{3}$$


where *F*0 is the frequency, in hertz, of the analog signal *v*(*t*)=*A*cos(2*πF*0*t*+*φ*) or *v*(*t*)=*A*cos(2*πF*0*t*+*φ*)*e*<sup>-*dFst*</sup>, *Fs* is the sampling frequency in hertz, and *t* is continuous time in seconds. Sampling these signals at *t*=*n*/*Fs* gives (1) and (2), respectively.

Section 3 shows how to estimate the parameters of (1) and (2), i.e. *A*, *ω*0, *φ* and *d*, with the use of the DFT. If the investigation refers to the analog counterpart signal, then the parameters of the discrete signal should be rescaled accordingly, for example *F*0=*Fs·ω*0/(2*π*).
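The signal models (1) and (2) and the frequency rescaling (3) can be sketched in a few lines of Python. This is only an illustration: the function names and the example values (a 50 Hz tone sampled at 1 kHz, damping 0.1) are assumptions and do not come from the chapter.

```python
import math

def sinusoid(A, w0, phi, N, d=0.0):
    """Samples of signal (1) for d = 0, or of the damped signal (2) for d > 0."""
    return [A * math.cos(w0 * n + phi) * math.exp(-d * n) for n in range(N)]

def to_radians_per_sample(F0, Fs):
    """Equation (3): analog frequency in hertz -> discrete frequency in radians."""
    return 2.0 * math.pi * F0 / Fs

def to_hertz(w0, Fs):
    """Inverse rescaling: F0 = Fs * w0 / (2 * pi)."""
    return Fs * w0 / (2.0 * math.pi)

# Hypothetical example values, chosen only for illustration.
Fs, F0 = 1000.0, 50.0
w0 = to_radians_per_sample(F0, Fs)
v = sinusoid(A=1.0, w0=w0, phi=0.0, N=8)
v_damped = sinusoid(A=1.0, w0=w0, phi=0.0, N=8, d=0.1)
print("w0 [rad]:", w0, " back to hertz:", to_hertz(w0, Fs))
```

Any estimate of *ω*0 obtained from the discrete signal can be converted back to hertz with the second helper.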

### **2.2 DFT analysis**

Equations in this section are taken from the textbook (Oppenheim et al., 1999). The Fourier transform (FT) of an infinite-length discrete-time signal *xn* is defined as

$$X(e^{j\omega}) = \sum_{n=-\infty}^{\infty} x_n e^{-j\omega n}, \tag{4}$$

where *n* is the integer sample index that goes from minus to plus infinity and *ω* is continuous frequency in radians (angular frequency, pulsation). The continuous spectrum *X*(*e*<sup>*jω*</sup>) defined by (4) is periodic with the period 2*π*. The notation *X*(*e*<sup>*jω*</sup>), instead of *X*(*ω*), emphasizes the connection between the FT and the *Z* transform.

For a finite-length discrete-time signal *vn* containing *N* samples, the DFT is defined as

$$V_k = \sum_{n=0}^{N-1} v_n e^{-j(2\pi/N)kn}, \quad k = 0, 1, 2, \dots, N-1. \tag{5}$$

From (5) it is seen that the DFT computes the FT spectrum only for the frequencies *ωk*=(2*π*/*N*)*k*, that is, the DFT samples the continuous spectrum of the discrete signal.
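The statement that the DFT samples the continuous spectrum can be checked directly from definitions (4) and (5). The sketch below uses a direct O(*N*²) summation instead of an FFT, and the test signal (*A*=1, *ω*0=1 rad, *φ*=1.3 rad, *N*=8) is just one convenient choice.

```python
import cmath
import math

def ft(v, w):
    """FT (4) of a finite-length signal v, evaluated at frequency w in radians."""
    return sum(vn * cmath.exp(-1j * w * n) for n, vn in enumerate(v))

def dft(v):
    """DFT (5) computed directly from the definition (for illustration only)."""
    N = len(v)
    return [sum(vn * cmath.exp(-2j * math.pi * k * n / N)
                for n, vn in enumerate(v))
            for k in range(N)]

N = 8
v = [math.cos(1.0 * n + 1.3) for n in range(N)]
V = dft(v)

# Largest difference between DFT bin k and the FT evaluated at w_k = (2*pi/N)*k.
max_diff = max(abs(V[k] - ft(v, 2.0 * math.pi * k / N)) for k in range(N))
print("max |V_k - V(e^{j w_k})| =", max_diff)
```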

The finite-length signal *vn*, *n*=0,1,2,...,*N*-1 is obtained from the infinite-length signal *xn*, *n*=...,-2,-1,0,1,2,... by windowing, that is, by multiplication with a discrete signal *wn*, called the window, which has nonzero values only at positions *n*=0,1,2,...,*N*-1

$$v_n = w_n x_n. \tag{6}$$

Signal models (1) and (2) were given with the rectangular window defined as

$$w_n^R = \begin{cases} 1, & 0 \le n < N \\ 0, & n < 0 \text{ or } n \ge N \end{cases} \tag{7}$$

A discrete signal is always analyzed with a time window, as only a finite number of samples may be read into a computer. If no other window is purposely used, then the signal is analyzed with the rectangular window (7).

By the FT convolution property, signal windowing (6) in the time domain results in the following convolution in the frequency domain

$$V(e^{j\omega}) = \frac{1}{2\pi} \int_{-\pi}^{\pi} X(e^{j\Theta})\, W(e^{j(\omega-\Theta)})\, d\Theta. \tag{8}$$


where *X*(*e*<sup>*jΘ*</sup>) and *W*(*e*<sup>*jΘ*</sup>) are the FT spectra of the infinite-length signal *xn* and the time window *wn*. Thus, according to (8), we observe the convolution of the signal spectrum with the window spectrum, and not the signal spectrum alone.

The FT of the infinite-length signal *xn*=*A*cos(*ω*0*n*+*φ*) is a pair of impulses at frequencies ±*ω*0+2*πk*, thus the spectrum of the windowed sinusoidal signal (1) is

$$V(e^{j\omega}) = \frac{A}{2} e^{j\varphi}\, W(e^{j(\omega - \omega_0)}) + \frac{A}{2} e^{-j\varphi}\, W(e^{j(\omega + \omega_0)}). \tag{9}$$

Equation (9) is used as the starting point in the derivation of IpDFT algorithms. It is also used in leakage correction algorithms, e.g. in (Radil et al., 2009).

According to (9), the spectrum *V*(*e*<sup>*jω*</sup>) of the discrete, windowed, sinusoidal signal is the sum of two periodic replicas of the window spectrum *W*(*e*<sup>*jω*</sup>) shifted to the frequencies ±*ω*0 and scaled by the complex amplitudes (*A*/2)*e*<sup>±*jφ*</sup>.
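Equation (9) can be verified numerically by comparing the FT of the windowed sinusoid with the sum of the two shifted and scaled window spectra. The following sketch assumes the rectangular window and the parameter values *A*=1, *ω*0=1 rad, *φ*=1.3 rad, *N*=8; the function names are illustrative only.

```python
import cmath
import math

A, w0, phi, N = 1.0, 1.0, 1.3, 8

def window_spectrum(w):
    """FT of the rectangular window (7), computed by direct summation."""
    return sum(cmath.exp(-1j * w * n) for n in range(N))

def signal_spectrum(w):
    """FT (4) of the windowed sinusoid (1)."""
    return sum(A * math.cos(w0 * n + phi) * cmath.exp(-1j * w * n)
               for n in range(N))

def two_replica_model(w):
    """Right-hand side of (9): two shifted, scaled copies of the window spectrum."""
    return (A / 2.0) * cmath.exp(1j * phi) * window_spectrum(w - w0) \
         + (A / 2.0) * cmath.exp(-1j * phi) * window_spectrum(w + w0)

# Compare both sides of (9) on a frequency grid covering one period.
grid = [i * 0.01 for i in range(629)]
max_diff = max(abs(signal_spectrum(w) - two_replica_model(w)) for w in grid)
print("max difference between both sides of (9):", max_diff)
```

For the rectangular window the two sides agree to within rounding error, because (9) is an exact identity for a windowed sinusoid.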

Fig. 1. Sinusoidal signal (1) with rectangular window and the modulus of its spectrum: a) signal (1), b) spectrum components for positive and negative frequencies (9), c) continuous FT spectrum (4), (9), d) sampling the continuous spectrum by DFT bins (5); *ωE* denotes estimated value


Fig. 1 illustrates the equations given above for the sinusoidal signal (1) analyzed with the rectangular window for *A*=1, *ω*0=1 rad, *φ*=1.3 rad, and *N*=8. Fig. 1b depicts the spectrum components for positive and negative frequencies (9). The sum of those components gives the continuous spectrum shown in Fig. 1c, which may also be computed from the FT definition (4). As seen from Fig. 1c, the energy is not concentrated at the single frequency *ω*0 (as it would be for an infinite-length observation), but spills over to all neighboring frequencies. This phenomenon is called spectral leakage and may cause significant estimation errors. In the given example, the spectral components for positive and negative frequencies (9) influence each other, and the maximum shown in Fig. 1c is moved from *ω*0=1 rad to *ωE*=1.04 rad (*E* stands for estimated value), that is, the frequency estimation error equals 4%. The amplitude estimation error equals approx. -15%. Estimation errors also depend on the signal's phase *φ*, as the sum (9) is complex. Because of the spectral leakage, the signal in the example disturbs its own spectrum. The impact of the spectral leakage would be stronger if the signal were a sum of sinusoidal signals, as every sinusoidal component would disturb its own spectrum and the spectra of all other sinusoidal components.

Fig. 1d depicts the DFT spectrum of the signal. DFT bins are only computed for frequencies *ωk*=(2*π*/*N*)*k*. The frequency estimation error in that case equals approx. -21%.
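The numbers quoted for the example of Fig. 1 can be reproduced with a short script. The sketch below makes two assumptions that the text does not state: the peak of the continuous spectrum is located by a plain grid search with a step of 0.001 rad, and the amplitude is estimated as 2|*V*|/*N*, which is the usual scaling for the rectangular window.

```python
import cmath
import math

A, w0, phi, N = 1.0, 1.0, 1.3, 8
v = [A * math.cos(w0 * n + phi) for n in range(N)]

def spectrum_mag(w):
    """Modulus of the FT (4) of the finite-length signal v at frequency w."""
    return abs(sum(vn * cmath.exp(-1j * w * n) for n, vn in enumerate(v)))

# Peak of the continuous spectrum on (0, pi), found by a simple grid search.
step = 0.001
grid = [i * step for i in range(1, int(math.pi / step))]
w_peak = max(grid, key=spectrum_mag)
amp_peak = 2.0 * spectrum_mag(w_peak) / N  # assumed amplitude scaling

# DFT bin with the highest magnitude among k = 0..N/2.
k_max = max(range(N // 2 + 1), key=lambda k: spectrum_mag(2.0 * math.pi * k / N))
w_bin = 2.0 * math.pi * k_max / N

print("peak of continuous spectrum [rad]:", w_peak)
print("frequency error of the peak [%]:", 100.0 * (w_peak - w0) / w0)
print("amplitude error of the peak [%]:", 100.0 * (amp_peak - A) / A)
print("highest DFT bin:", k_max, " frequency error [%]:", 100.0 * (w_bin - w0) / w0)
```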

In DFT analysis, spectral leakage is reduced by applying time windows other than the rectangular one, and the error caused by sampling the continuous spectrum only at frequencies *ωk*=(2*π*/*N*)*k* is practically eliminated by IpDFT algorithms.

The DFT (5) is derived for periodic signals; as a consequence, frequency analysis is correct only for signals containing an integer number of periods. A signal with an integer number of periods is called coherently or synchronously sampled. In field measurements of sinusoidal signals (1), close to coherent sampling is obtained with a PLL (Phase Locked Loop) that keeps an integer ratio of the signal's frequency *F*0 to the sampling frequency *Fs*, see (3). The frequency of a coherently sampled signal is *ω*0=(2*π*/*N*)*k* rad and equals the frequency of the DFT bin with index *k*. Damped sinusoidal signals (2) have a transient, not periodic, nature and thus cannot be synchronously acquired with a PLL.

Fig. 2. Coherently sampled sinusoidal signal (2 periods) a), and its continuous Fourier spectrum (denoted by FT) and DFT spectrum b); *ωE* - frequency of DFT bin with the highest magnitude (estimated value)


Fig. 3. Non-coherently sampled sinusoidal signal a), and its continuous Fourier spectrum (denoted by FT) and DFT spectrum b); *ωE* - frequency of DFT bin with the highest magnitude (estimated value)

Fig. 2a depicts a coherently sampled sinusoidal signal containing exactly 2 periods. The frequency of this signal is *ω*0=2(2*π*/*N*)≈0.79 rad and it equals the frequency of the DFT bin with index *k*=2. Fig. 2b shows the continuous Fourier spectrum (4) and the DFT bins of this signal. The upper horizontal axis is scaled in DFT index *k*, and the lower horizontal axis is scaled in frequency in radians.

The range of frequencies intentionally exceeds 2*π* (one period) to emphasize the periodic nature of the spectrum of discrete signals and the fact that spectral leakage for small frequencies originates from the neighboring period (i.e. the part of the spectrum for negative frequencies). For coherent sampling of a sinusoidal signal, only one DFT bin is nonzero and the analysis is practically not affected by spectral leakage or by sampling of the continuous spectrum.

Fig. 3a depicts a sinusoidal signal that does not contain an integer number of periods. The frequency of this signal is *ω*0=2.2(2*π*/*N*)≈0.86 rad and lies between the DFT bins with index *k*=2 and *k*=3. Fig. 3b shows the FT spectrum and the DFT spectrum of this signal. Estimation of the signal's frequency based on the highest magnitude DFT bin is biased by an error of 0.785-0.864≈-0.08 rad, or about -9%. High estimation errors would also be obtained for amplitude and phase estimation based on the highest magnitude DFT bin. Those errors, caused by the sampling of the continuous spectrum, can be significantly reduced by IpDFT algorithms.
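The difference between coherent and non-coherent sampling can be reproduced numerically. The sketch below assumes *N*=16 (the value consistent with 2(2*π*/*N*)≈0.79 rad), unit amplitude and zero phase; the phase used in Figs. 2 and 3 is not stated in the text, so *φ*=0 is an arbitrary choice.

```python
import cmath
import math

N = 16  # assumed number of samples

def dft_magnitudes(v):
    """Magnitudes |V_k| of the DFT (5), computed directly from the definition."""
    n_samples = len(v)
    return [abs(sum(vn * cmath.exp(-2j * math.pi * k * n / n_samples)
                    for n, vn in enumerate(v)))
            for k in range(n_samples)]

def analyze(cycles, phi=0.0):
    """Return (w0, magnitudes, index of the largest bin among k = 0..N/2)."""
    w0 = cycles * 2.0 * math.pi / N
    v = [math.cos(w0 * n + phi) for n in range(N)]
    mags = dft_magnitudes(v)
    k_max = max(range(N // 2 + 1), key=lambda k: mags[k])
    return w0, mags, k_max

# Coherent sampling: exactly 2 periods in the window.
w0_c, mags_c, k_c = analyze(2.0)
nonzero_c = [k for k in range(N // 2 + 1) if mags_c[k] > 1e-9]

# Non-coherent sampling: 2.2 periods in the window.
w0_n, mags_n, k_n = analyze(2.2)
w_est = k_n * 2.0 * math.pi / N
rel_err = (w_est - w0_n) / w0_n

print("coherent: nonzero bins in 0..N/2:", nonzero_c)
print("non-coherent: highest bin:", k_n, " relative frequency error [%]:", 100.0 * rel_err)
```

In the coherent case only the bin at the signal frequency is nonzero in the range 0..*N*/2, while in the non-coherent case the highest bin underestimates the frequency.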

### **2.3 Time windows**

The application of time windows in DFT analysis is thoroughly reviewed in (Harris, 1978), and also described in signal processing textbooks, e.g. (Oppenheim et al., 1999, Lyons, 2004). Other time windows with interesting properties are described in (Nuttall, 1981).

Window properties are determined by the window spectrum *W*(*e*<sup>*jω*</sup>), which consists of the main lobe (the highest peak in the spectrum) and side lobes. The shape of the spectrum of the rectangular window (7) may be observed in Fig. 1b. The main lobe of the window should be as narrow as possible, and the side lobes should be as low as possible. A narrow main lobe improves the frequency resolution of DFT analysis, while low side lobes reduce spectral leakage. The rectangular window has the narrowest main lobe, which is an advantage, and the highest side lobes, which is a disadvantage. All other time windows reduce side lobes, and thus spectral leakage, at the cost of widening the main lobe, i.e. reducing frequency resolution. It is also known that the rectangular window has the best noise immunity, although systematic errors caused by leakage may be dominant for signals containing a small number of cycles.

Time windows are classified as cosine windows or non-cosine windows. The cosine windows may be written in the form

$$w_n = \begin{cases} \sum_{m=0}^{M} (-1)^m A_m^w \cos\left(\frac{2\pi}{N} mn\right), & 0 \le n < N \\ 0, & n < 0 \text{ or } n \ge N \end{cases} \tag{10}$$

where the superscript *w* in *A*<sub>*m*</sub><sup>*w*</sup> is introduced to distinguish the window coefficients from the signal's amplitude in (1)-(2).

IpDFT algorithms may only be analytically derived for Rife-Vincent class I (RVCI) windows. Coefficients *A*<sub>*m*</sub><sup>*w*</sup> for RVCI windows are given in Tab. 1. For *M*=0 the RVCI window is the rectangular window, and for *M*=1 the RVCI window is the Hanning (Hann) window. RVCI windows have the advantage of the fastest decay of the side lobes, but they also have a wide main lobe, which may be observed in Fig. 4. RVCI windows are also referred to as cos<sup>*α*</sup>(*X*) windows, *α*=0,2,4,6,..., defined in (Harris, 1978).

| | *m*=0 | *m*=1 | *m*=2 | *m*=3 | *m*=4 | *m*=5 | *m*=6 |
|---|---|---|---|---|---|---|---|
| *A*<sub>*m*</sub><sup>*w*</sup>, *M*=0 | 1 | | | | | | |
| *A*<sub>*m*</sub><sup>*w*</sup>, *M*=1 | 1 | 1 | | | | | |
| *A*<sub>*m*</sub><sup>*w*</sup>, *M*=2 | 1 | 4/3 | 1/3 | | | | |
| *A*<sub>*m*</sub><sup>*w*</sup>, *M*=3 | 1 | 3/2 | 3/5 | 1/10 | | | |
| *A*<sub>*m*</sub><sup>*w*</sup>, *M*=4 | 1 | 8/5 | 4/5 | 8/35 | 1/35 | | |
| *A*<sub>*m*</sub><sup>*w*</sup>, *M*=5 | 1 | 105/63 | 60/63 | 45/126 | 5/63 | 1/126 | |
| *A*<sub>*m*</sub><sup>*w*</sup>, *M*=6 | 1 | 396/231 | 495/462 | 110/231 | 33/231 | 6/231 | 1/462 |

Table 1. Coefficients *A*<sub>*m*</sub><sup>*w*</sup> for Rife-Vincent class I windows (10)

Other examples of popular cosine windows are the Hamming window (*M*=1, *A*<sub>0</sub><sup>*w*</sup>=0.54, *A*<sub>1</sub><sup>*w*</sup>=0.46) and the Blackman window (*M*=2, *A*<sub>0</sub><sup>*w*</sup>=0.42, *A*<sub>1</sub><sup>*w*</sup>=0.5, *A*<sub>2</sub><sup>*w*</sup>=0.08).

Cosine windows (10) are sums of frequency-modulated rectangular windows, thus the spectrum of the order-*M* cosine window is

$$W_M(e^{j\omega}) = \sum_{m=0}^{M} (-1)^m \frac{A_m^w}{2} \left[ W^R(e^{j(\omega - \omega_m)}) + W^R(e^{j(\omega + \omega_m)}) \right], \tag{11}$$

where *ωm*=(2*π*/*N*)*m* and *W*<sup>*R*</sup>(*e*<sup>*jω*</sup>) is the spectrum of the rectangular window

$$W^R(e^{j\omega}) = \sum_{n=0}^{N-1} e^{-j\omega n} = e^{-j\omega(N-1)/2}\, \frac{\sin(\omega N/2)}{\sin(\omega/2)}. \tag{12}$$
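The closed form of the rectangular window spectrum (12) can be checked against the direct summation. In the sketch below, the handling of the points where sin(*ω*/2)=0 (the limit value *N*) is an added detail that the formula leaves implicit; the function names and the test frequencies are arbitrary.

```python
import cmath
import math

def rect_spectrum_sum(w, N):
    """Left-hand side of (12): direct summation over the N window samples."""
    return sum(cmath.exp(-1j * w * n) for n in range(N))

def rect_spectrum_closed(w, N):
    """Right-hand side of (12), with the limit value N where sin(w/2) = 0."""
    s = math.sin(w / 2.0)
    if abs(s) < 1e-12:
        return complex(N)
    return cmath.exp(-1j * w * (N - 1) / 2.0) * math.sin(w * N / 2.0) / s

N = 8
test_freqs = [-2.0, -0.5, 0.0, 0.3, 1.0, 2.5]
max_diff = max(abs(rect_spectrum_sum(w, N) - rect_spectrum_closed(w, N))
               for w in test_freqs)

# The spectrum is zero at the DFT bin frequencies w_k = (2*pi/N)*k, k = 1..N-1.
max_at_bins = max(abs(rect_spectrum_closed(2.0 * math.pi * k / N, N))
                  for k in range(1, N))
print("max difference:", max_diff, " max |W^R| at nonzero bins:", max_at_bins)
```

The zeros at the nonzero bin frequencies explain why a coherently sampled sinusoid analyzed with the rectangular window shows no leakage in its DFT.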

The non-cosine Kaiser-Bessel and Dolph-Chebyshev windows are optimal with respect to main lobe width and side lobe level (Harris, 1978). Those windows generally perform well, as highlighted in (Harris, 1978, Fig. 12).

Fig. 4 compares the spectra of RVCI, Kaiser-Bessel and Dolph-Chebyshev windows with similar attenuation of the first side lobe: approx. -31.5 dB (Fig. 4a) and approx. -101 dB (Fig. 4b). The RVCI window has the fastest decay of side lobes and the widest main lobe. The Kaiser-Bessel window has a narrow main lobe and slowly decaying side lobes, and the Dolph-Chebyshev window has the narrowest main lobe and side lobes all on the same level. The markers in Fig. 4 denote DFT bins. The cosine RVCI window has *M*+1 nonzero DFT coefficients, as stated by (11). For the non-cosine Kaiser-Bessel and Dolph-Chebyshev windows all DFT coefficients are nonzero.

Fig. 4. Amplitude spectra of RVCI, Kaiser-Bessel (K) and Dolph-Chebyshev (Ch) windows with similar attenuation of the first side lobe, markers denote DFT bins; a) attenuation approx -31.5 dB, b) attenuation approx -101 dB
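The cosine window definition (10) and the count of nonzero DFT coefficients can be illustrated as follows. The sketch uses the RVCI coefficients of Table 1 for *M*=0..3 and assumes *N*=16; nonzero bins are counted for *k*=0..*N*/2 only (the remaining bins mirror them), and the threshold 1e-9 is an arbitrary tolerance for rounding error.

```python
import cmath
import math

# RVCI coefficients A_m^w for M = 0..3, as listed in Table 1.
RVCI = {
    0: [1.0],
    1: [1.0, 1.0],
    2: [1.0, 4.0 / 3.0, 1.0 / 3.0],
    3: [1.0, 3.0 / 2.0, 3.0 / 5.0, 1.0 / 10.0],
}

def cosine_window(coeffs, N):
    """Cosine window (10) for 0 <= n < N."""
    return [sum((-1) ** m * a * math.cos(2.0 * math.pi * m * n / N)
                for m, a in enumerate(coeffs))
            for n in range(N)]

def dft(v):
    """DFT (5) computed directly from the definition."""
    N = len(v)
    return [sum(vn * cmath.exp(-2j * math.pi * k * n / N)
                for n, vn in enumerate(v))
            for k in range(N)]

N = 16  # assumed window length
nonzero_bins = {}
for M, coeffs in RVCI.items():
    W = dft(cosine_window(coeffs, N))
    nonzero_bins[M] = [k for k in range(N // 2 + 1) if abs(W[k]) > 1e-9]
    print("M =", M, " nonzero DFT bins in 0..N/2:", nonzero_bins[M])
```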

### **3. Interpolated DFT algorithms**

The IpDFT algorithm for a sinusoidal signal analyzed with the rectangular window was introduced in (Jain et al., 1979). In (Grandke, 1983) a similar derivation was presented for a sinusoidal signal analyzed with the Hanning window. IpDFT algorithms for higher order RVCI windows are given in (Andria et al., 1989). In (Offelli & Petri, 1990) an IpDFT algorithm for an arbitrary cosine window was proposed, based on polynomial approximation. In (Agrež, 2002) multipoint IpDFT was introduced, which reduces long range leakage and thus systematic estimation errors. An IpDFT algorithm for a signal analyzed with an arbitrary, even non-cosine, window was given in (Duda, 2011a).

IpDFT algorithms for a damped sinusoidal signal analyzed with the rectangular window are described in (Yoshida et al., 1981, Bertocco et al., 1994). In (Duda et al., 2011b) those algorithms were put in the same framework, and new algorithms for a damped signal with the rectangular window were proposed. The application of RVCI windows to damped signals in IpDFT algorithms was discussed in (Agrež, 2009, Duda et al., 2011b).


The IpDFT problem for sinusoidal and damped sinusoidal signals is depicted in Fig. 5 and may be formulated as follows. Based on the DFT spectrum *Vk* (5) of the signal *xn* analyzed with the known window *wn* (6), find the frequency correction *δ* so as to satisfy the equation

$$\omega_0 = (k \pm \delta) \frac{2\pi}{N}, \quad 0 < \delta \le 0.5, \tag{13}$$

where *ω*0 is signal's frequency, *N* is the number of samples and *k* is the index of DFT bin with the highest magnitude. If |*Vk*+1|>|*Vk*-1|, as in Fig. 5, then there is '+' in (13).

For coherent sampling depicted in Fig. 2 frequency correction *δ*=0, and there is no need to use IpDFT algorithm. For the example of non coherent sampling shown in Fig. 3 frequency correction is *δ*=0.2 and it would be *δ*=-0.2 if the signal's frequency was *ω*0=1.8(2*π*/*N*)≈0.71 rad.

Fig. 5. Illustration of IpDFT problem; *ωk*–1, *ωk*, *ωk*+1 frequencies of DFT bins, *ω*0 - signal's frequency, *δ* - frequency correction

Let us define the following ratio of modulus of DFT bins with the highest amplitude

$$\frac{|V_{k+1}|}{|V_k|} = \frac{|V(\omega_{k+1})|}{|V(\omega_k)|} = \frac{|V(\omega_0 - \delta 2\pi/N + 2\pi/N)|}{|V(\omega_0 - \delta 2\pi/N)|}, \tag{14}$$

where we used *ωk*=*ω*0-*δ*2*π*/*N* and *ωk*+1=*ω*0-*δ*2*π*/*N*+2*π*/*N*, which follow from Fig. 5 and (13). As stated by (9), the spectrum of the sinusoidal signal consists of the spectrum of the window shifted from the position *ω*=0 to *ω*=*ω*0; thus the ratio of signal DFT bins (14) may be substituted by the ratio of window DFT

$$R(\delta) = \frac{|V_{k+1}|}{|V_k|} \approx \frac{|W(-\delta 2\pi/N + 2\pi/N)|}{|W(-\delta 2\pi/N)|}. \tag{15}$$

The approximation sign '≈' is used in (15) instead of equality, because the ratios may slightly differ due to spectral leakage. Solving (15) for frequency correction *δ* we obtain two-point (2p) IpDFT formulas. Analytic solution of (15) is only possible for RVCI windows.

The ratio of DFT bins may also be defined with three bins as

$$\frac{|V_k| + |V_{k+1}|}{|V_k| + |V_{k-1}|} = \frac{|V(\omega_0 - \delta 2\pi/N)| + |V(\omega_0 - \delta 2\pi/N + 2\pi/N)|}{|V(\omega_0 - \delta 2\pi/N)| + |V(\omega_0 - \delta 2\pi/N - 2\pi/N)|}, \tag{16}$$

which may be substituted by

$$R(\delta) = \frac{|V_k| + |V_{k+1}|}{|V_k| + |V_{k-1}|} \approx \frac{|W(-\delta 2\pi/N)| + |W(-\delta 2\pi/N + 2\pi/N)|}{|W(-\delta 2\pi/N)| + |W(-\delta 2\pi/N - 2\pi/N)|}. \tag{17}$$

Solving (17) for *δ* we obtain three-point (3p) IpDFT algorithm.

Let us define the following summation

$$S_{k+n} = \left| \sum_{m=-M}^{M} \frac{(-1)^{m+n} A'_m}{\delta - (m+n)} \right|, \quad A'_m = \begin{cases} A_0^w & \text{for } m = 0, \\ A_{|m|}^w / 2 & \text{for } m \neq 0, \end{cases} \tag{18}$$

where *M* is the order of cosine window (10) and *A*<sub>*m*</sub><sup>*w*</sup> is the vector of window coefficients. Then (15) and (17) may be rewritten in the form

$$\frac{|V_{k+1}|}{|V_k|} \approx \frac{S_{k+1}}{S_k}, \tag{19}$$

$$\frac{|V_k| + |V_{k+1}|}{|V_k| + |V_{k-1}|} \approx \frac{S_k + S_{k+1}}{S_k + S_{k-1}}. \tag{20}$$

In the next subsections IpDFT algorithms will be derived and explained. In all derivations the spectrum of the sinusoidal signal (9) is approximated by

$$V(e^{j\omega}) \approx \frac{A}{2} e^{j\varphi} W(e^{j(\omega - \omega_0)}), \quad 0 \le \omega < \pi. \tag{21}$$

#### **3.1 Sinusoidal signal – RVCI windows**

#### **3.1.1 Rectangular window (RVCI** *M***=0)**

Rectangular window *wnR* is defined by (7) and the spectrum of this window is given by (12). From (15) we get

$$\frac{|V_{k+1}|}{|V_k|} \approx \left| \frac{\sin(-\delta\pi + \pi)}{\sin(-\delta\pi/N + \pi/N)} \cdot \frac{\sin(-\delta\pi/N)}{\sin(-\delta\pi)} \right|. \tag{22}$$

By approximating sine functions by their arguments in (22) we have

$$\frac{|V_{k+1}|}{|V_k|} \approx \left| \frac{\delta\pi/N}{-\delta\pi/N + \pi/N} \right| = \left| \frac{\delta}{\delta - 1} \right|. \tag{23}$$

From (23) frequency correction is

$$\delta = \frac{|V_{k+1}|}{|V_k| + |V_{k+1}|}. \tag{24}$$
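The step from (23) to (24) may be written out explicitly. The shorthand *R* for the measured bin ratio is introduced here only for this worked step; for 0<*δ*≤0.5 the modulus in (23) can be dropped because *δ*/(1-*δ*) is positive:

$$R = \frac{|V_{k+1}|}{|V_k|} \approx \frac{\delta}{1 - \delta} \;\;\Rightarrow\;\; R(1 - \delta) \approx \delta \;\;\Rightarrow\;\; \delta \approx \frac{R}{1 + R} = \frac{|V_{k+1}|}{|V_k| + |V_{k+1}|}.$$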

Signal's frequency is next computed from (13).

The amplitude of the frequency bin *ω*0 may be found from the following proportion

$$\frac{|V(\omega_0)|}{|V(\omega_k)|} = \frac{|W^R(0)|}{|W^R(\delta 2\pi/N)|} = N \left| \frac{\sin(\delta\pi/N)}{\sin(\delta\pi)} \right| \approx \left| \frac{\delta\pi}{\sin(\delta\pi)} \right|. \tag{25}$$

From (25) signal's amplitude is

$$|V(\omega_0)| = |V_k| \left| \frac{\delta\pi}{\sin(\delta\pi)} \right| = |V_k| \frac{\pi}{\sin(\delta\pi)} \delta. \tag{26}$$

The phase of the frequency bin *ω*0 may be found from the following equation

$$\arg\{V(\omega_0)\} - \arg\{V(\omega_k)\} = \arg\{W^R(0)\} - \arg\{W^R(\delta 2\pi/N)\} = -\arg\{e^{-j\delta(\pi/N)(N-1)}\}. \tag{27}$$

From (27) signal's phase is

$$\arg\{V(\omega_0)\} = \arg\{V(\omega_k)\} \pm \arg\{e^{-j\delta(\pi/N)(N-1)}\}. \tag{28}$$

The sign '+' or '-' in (28) is selected the same way as in (13).
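The two-point rectangular-window procedure of (13), (24), (26) and (28) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the author's implementation: the function name `ipdft_2p_rect` is hypothetical, the '-' case is handled by mirroring (the larger neighbour *Vk*-1 takes the place of *Vk*+1), and the amplitude *A* is recovered as 2|*V*(*ω*0)|/*N*, which follows from (21) with |*W<sup>R</sup>*(0)|=*N*.

```python
import numpy as np

def ipdft_2p_rect(v):
    """Two-point IpDFT for a rectangular window: (omega0, amplitude, phase)."""
    N = len(v)
    V = np.fft.fft(v)
    k = int(np.argmax(np.abs(V[1:N // 2]))) + 1       # peak bin, DC skipped
    # Sign in (13): '+' when the right neighbour is larger than the left one.
    sign = 1 if abs(V[k + 1]) > abs(V[k - 1]) else -1
    Vk, Vn = abs(V[k]), abs(V[k + sign])              # peak and larger neighbour
    delta = Vn / (Vk + Vn)                            # (24)
    omega0 = (k + sign * delta) * 2 * np.pi / N       # (13)
    # (26): delta*pi/sin(delta*pi) equals 1/np.sinc(delta), finite at delta = 0.
    V_w0 = Vk / np.sinc(delta)
    amplitude = 2 * V_w0 / N                          # assumption based on (21)
    corr = -delta * (np.pi / N) * (N - 1)             # arg{exp(-j*delta*(pi/N)*(N-1))}
    phase = np.angle(np.exp(1j * (np.angle(V[k]) + sign * corr)))  # (28), wrapped
    return omega0, amplitude, phase

# Illustrative test signal (all values are arbitrary choices).
N = 1024
n = np.arange(N)
omega_true = 200.3 * 2 * np.pi / N
v = 1.5 * np.cos(omega_true * n + 0.7)
print(ipdft_2p_rect(v), omega_true)
```

The estimates are not exact even without noise, because the spectral image at negative frequency leaks into the bins used for interpolation; this is the systematic error the approximation sign in (15) refers to.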

In a similar way, for the three-point interpolation with rectangular window we get from (12), again approximating sine functions by their arguments, |*Vk*-1| : |*Vk*| : |*Vk*+1| ≈ 1/(1+*δ*) : 1/*δ* : 1/(1-*δ*), and therefore

$$\frac{|V_k| + |V_{k+1}|}{|V_k| - |V_{k-1}|} \approx \frac{|\delta + 1|}{|\delta - 1|} \tag{29}$$

and finally

$$\delta = \frac{|V_{k+1}| + |V_{k-1}|}{2|V_k| - |V_{k-1}| + |V_{k+1}|}. \tag{30}$$
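A short numerical check of the three-point formula (30) is sketched below. The function name `ipdft_3p_rect` is hypothetical, and the handling of the '-' case (swapping the roles of *Vk*+1 and *Vk*-1 so that the larger neighbour always plays the part of *Vk*+1) is an assumption made for this example.

```python
import numpy as np

def ipdft_3p_rect(v):
    """Three-point IpDFT, rectangular window: returns (k, sign, delta)."""
    N = len(v)
    V = np.abs(np.fft.fft(v))
    k = int(np.argmax(V[1:N // 2])) + 1      # peak bin, DC skipped
    sign = 1 if V[k + 1] > V[k - 1] else -1  # sign selection as in (13)
    V_big, V_small = V[k + sign], V[k - sign]
    # (30) written for the '+' orientation; mirrored for '-' by the swap above.
    delta = (V_big + V_small) / (2 * V[k] - V_small + V_big)
    return k, sign, delta

# Example: true frequency 0.3 bin above bin 200 (arbitrary choice).
N = 1024
n = np.arange(N)
v = np.cos((200.3 * 2 * np.pi / N) * n + 0.2)
k, sign, delta = ipdft_3p_rect(v)
omega0 = (k + sign * delta) * 2 * np.pi / N  # (13)
print(k, sign, delta, omega0)
```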

### **3.1.2 Hanning (Hann, RVCI** *M***=1) window**

Periodic Hanning window is defined as

$$w_n^H = \begin{cases} 0.5 - 0.5 \cos \left( \dfrac{2\pi}{N} n \right), & 0 \le n < N, \\ 0, & n < 0 \ \text{or}\ n \ge N. \end{cases} \tag{31}$$

Hanning window (31) may be interpreted as the sum of rectangular window and frequency modulated rectangular window, thus based on FT properties, the spectrum of the Hanning window is the following sum of the spectra of rectangular windows

$$W^{H}(e^{j\omega}) = -0.25\, W^{R}(e^{j(\omega - \omega_1)}) + 0.5\, W^{R}(e^{j\omega}) - 0.25\, W^{R}(e^{j(\omega + \omega_1)}), \quad \omega_1 = 2\pi/N. \tag{32}$$

Inserting (12) into (32), taking the approximation *e*<sup>*j*(*π*/*N*)(*N*-1)</sup>≈-1+*jπ*/*N*, and assuming *π*/*N*<<1, the modulus of Hanning window spectrum is

$$|W^H(e^{j\omega})| \approx |\sin(\omega N/2)| \left| -\frac{0.25}{\sin(\omega/2 - \pi/N)} + \frac{0.5}{\sin(\omega/2)} - \frac{0.25}{\sin(\omega/2 + \pi/N)} \right|. \tag{33}$$

By calculating modulus of DFT bins in (15) we get

$$\frac{|V_{k+1}|}{|V_k|} = \left| \frac{0.25}{\delta} - \frac{0.5}{\delta - 1} + \frac{0.25}{\delta - 2} \right| \Big/ \left| -\frac{0.25}{\delta + 1} + \frac{0.5}{\delta} - \frac{0.25}{\delta - 1} \right|, \tag{34}$$

which may be rewritten as

$$\frac{|V_{k+1}|}{|V_k|} = \left| \frac{\delta^{2}(2 \cdot 0.25 - 0.5) + \delta(2 \cdot 0.5 - 4 \cdot 0.25) + 2 \cdot 0.25}{\delta(\delta - 1)(\delta - 2)} \cdot \frac{\delta(\delta - 1)(\delta + 1)}{\delta^{2}(0.5 - 2 \cdot 0.25) - 0.5} \right|. \tag{35}$$

As seen in (35), the coefficients of the powers of *δ* equal zero, and we get the simple equation

$$\frac{|V_{k+1}|}{|V_k|} = \left| \frac{\delta + 1}{-\delta + 2} \right|, \tag{36}$$

with the solution for frequency correction

$$\delta = \frac{2|V_{k+1}| - |V_k|}{|V_k| + |V_{k+1}|}. \tag{37}$$
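For completeness, the algebra between (36) and (37) is shown below. The shorthand *R* for the measured ratio of bin moduli is used only in this worked step; for 0<*δ*≤0.5 both *δ*+1 and 2-*δ* are positive, so the modulus in (36) can be dropped:

$$R = \frac{|V_{k+1}|}{|V_k|} = \frac{\delta + 1}{2 - \delta} \;\;\Rightarrow\;\; R(2 - \delta) = \delta + 1 \;\;\Rightarrow\;\; \delta = \frac{2R - 1}{R + 1} = \frac{2|V_{k+1}| - |V_k|}{|V_k| + |V_{k+1}|}.$$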

Signal's frequency is next computed from (13).

Similarly to (26), signal's amplitude is

$$|V(\omega_0)| = |V_k| \frac{|W^{H}(0)|}{|W^{H}(\delta 2\pi/N)|} = |V_k| \frac{0.5\pi}{\sin(\delta\pi)} \Big/ \left| -\frac{0.25}{\delta - 1} + \frac{0.5}{\delta} - \frac{0.25}{\delta + 1} \right|. \tag{38}$$

Signal's phase, computed as in (28), is

$$\arg\{V(\omega_0)\} = \arg\{V(\omega_k)\} \pm \arg\{e^{-j\delta(\pi/N)(N-1)}\} \tag{39}$$

with the difference that in (28) frequency correction (24) is used and in (39) frequency correction (37) is used.
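The Hanning-window estimates (37), (38) and (39) may be sketched as follows. The function name `ipdft_2p_hann` is hypothetical. Two assumptions are made for this example: the '-' case mirrors the '+' case by using the larger neighbour in place of *Vk*+1, and the amplitude *A* is obtained as 2|*V*(*ω*0)|/(0.5*N*), which follows from (21) with |*W<sup>H</sup>*(0)|=0.5*N*.

```python
import numpy as np

def ipdft_2p_hann(v):
    """Two-point IpDFT for the periodic Hanning window: (omega0, amplitude, phase)."""
    N = len(v)
    n = np.arange(N)
    w = 0.5 - 0.5 * np.cos(2 * np.pi * n / N)         # periodic Hanning window (31)
    V = np.fft.fft(v * w)
    k = int(np.argmax(np.abs(V[1:N // 2]))) + 1       # peak bin, DC skipped
    sign = 1 if abs(V[k + 1]) > abs(V[k - 1]) else -1
    Vk, Vn = abs(V[k]), abs(V[k + sign])
    delta = (2 * Vn - Vk) / (Vk + Vn)                 # (37)
    delta = max(delta, 1e-12)                         # avoid 0/0 for coherent sampling
    omega0 = (k + sign * delta) * 2 * np.pi / N       # (13)
    S_k = abs(-0.25 / (delta - 1) + 0.5 / delta - 0.25 / (delta + 1))
    V_w0 = Vk * 0.5 * np.pi / np.sin(delta * np.pi) / S_k   # (38)
    amplitude = 2 * V_w0 / (0.5 * N)                  # assumption based on (21)
    corr = -delta * (np.pi / N) * (N - 1)             # arg{exp(-j*delta*(pi/N)*(N-1))}
    phase = np.angle(np.exp(1j * (np.angle(V[k]) + sign * corr)))  # (39), wrapped
    return omega0, amplitude, phase

# Illustrative test signal (arbitrary values).
N = 1024
n = np.arange(N)
omega_true = 200.3 * 2 * np.pi / N
v = 1.5 * np.cos(omega_true * n + 0.7)
print(ipdft_2p_hann(v), omega_true)
```

Compared with the rectangular window, the faster side-lobe decay of the Hanning window makes the leakage from the negative-frequency image much smaller, so the systematic error of the noise-free estimate is expected to be lower.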

### **3.1.3 Higher order RVCI windows**

Rectangular window and Hanning window are RVCI windows of order *M*=0 and *M*=1. For rectangular window *M*=0, *A*<sub>0</sub><sup>*w*</sup>=1 and from (18)

$$S_{k-1} = \frac{1}{|\delta + 1|}, \quad S_{k+0} = S_k = \frac{1}{|\delta|}, \quad S_{k+1} = \frac{1}{|\delta - 1|}.$$

IpDFT formulas for two-point and three-point interpolation may next be calculated from (19) and (20). For Hanning window *M*=1, *A*<sub>0</sub><sup>*w*</sup>=0.5, *A*<sub>1</sub><sup>*w*</sup>=0.5 and from (18)

$$S_{k+0} = S_k = \left| \frac{-0.25}{\delta + 1} + \frac{0.5}{\delta} + \frac{-0.25}{\delta - 1} \right| \quad \text{and} \quad S_{k+1} = \left| \frac{0.25}{\delta} + \frac{-0.5}{\delta - 1} + \frac{0.25}{\delta - 2} \right|.$$

For higher order cosine windows (*M*>1) we may write down (19) and demand that the coefficients of the powers of *δ* equal zero, as in (35) for Hanning window, and next solve the resulting equation for *δ*. The described procedure allows us to find the cosine window coefficients *A*<sub>*m*</sub><sup>*w*</sup> that give analytic IpDFT solutions, and it turns out that this procedure yields RVCI windows. For example, for *M*=2, grouping the coefficients by the powers of *δ* gives

$$\begin{cases} (-A_0^w + 2A_1^w - 2A_2^w)\,\delta^4 = 0 \\ (-4A_0^w + 8A_1^w - 8A_2^w)\,\delta^3 = 0 \\ (-A_0^w + 4A_1^w - 10A_2^w)\,\delta^2 = 0 \end{cases}, \qquad \begin{cases} A_0^w = 6A_2^w \\ A_1^w = 4A_2^w \end{cases}, \tag{40}$$

which are RVCI *M*=2 coefficients listed in Tab. I.

For the signal analyzed with RVCI window for two-point interpolation we get

$$\delta = \frac{(M+1)|V_{k+1}| - M|V_k|}{|V_k| + |V_{k+1}|}, \quad M = 0, 1, 2, \dots \tag{41}$$

For *M*=0 and *M*=1 (41) agrees with previously derived formulas (24) and (37) for rectangular and Hanning window. Signal's frequency is next computed from (13). Signal's amplitude is

$$|V(\omega_0)| = |V_k| \frac{|W(0)|}{|W(\delta 2\pi/N)|} = \frac{2\pi}{\sin(\delta\pi)} \frac{|V_k|}{S_k}. \tag{42}$$

For three-point interpolation from (20) we get

$$\delta = (M+1) \frac{|V_{k+1}| - |V_{k-1}|}{2|V_k| + |V_{k-1}| + |V_{k+1}|}, \quad M = 1, 2, 3, \dots \tag{43}$$

For three-point interpolation and rectangular window frequency correction *δ* is computed from (30) and not (43). Equation (43) does not hold for rectangular window, i.e. *M*=0, because the spectrum of rectangular window contains only one nonzero DFT bin.
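The general RVCI formulas (41) and (43) can be tried numerically with the sketch below. It assumes that the order-*M* RVCI window can be generated as sin<sup>2*M*</sup>(*πn*/*N*), which reduces to the rectangular window for *M*=0 and to the periodic Hanning window (31) for *M*=1; the coefficients of Tab. I are not used. The function names are hypothetical, and the '-' case is again handled by mirroring the neighbours.

```python
import numpy as np

def rvci_window(N, M):
    """sin^(2M) window, assumed here to coincide with the order-M RVCI window."""
    n = np.arange(N)
    return np.sin(np.pi * n / N) ** (2 * M)

def ipdft_rvci_delta(v, M, points=2):
    """Frequency correction from (41) (points=2) or (43) (points=3)."""
    if points == 3 and M == 0:
        raise ValueError("(43) does not hold for M=0; use (30) instead")
    N = len(v)
    V = np.abs(np.fft.fft(v * rvci_window(N, M)))
    k = int(np.argmax(V[M + 1:N // 2])) + M + 1      # peak bin away from DC
    sign = 1 if V[k + 1] > V[k - 1] else -1          # sign selection as in (13)
    V_big, V_small = V[k + sign], V[k - sign]
    if points == 2:
        delta = ((M + 1) * V_big - M * V[k]) / (V[k] + V_big)           # (41)
    else:
        delta = (M + 1) * (V_big - V_small) / (2 * V[k] + V_small + V_big)  # (43)
    return k, sign, delta

# Example with an arbitrary test frequency 0.3 bin above bin 200.
N = 1024
n = np.arange(N)
v = np.cos((200.3 * 2 * np.pi / N) * n + 0.4)
for M in (1, 2):
    print(M, ipdft_rvci_delta(v, M, 2), ipdft_rvci_delta(v, M, 3))
```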

Signal's amplitude for three-point interpolation may be computed from the proportion

$$\begin{aligned} &\frac{|V(\omega_0 - 2\pi/N)| + 2|V(\omega_0)| + |V(\omega_0 + 2\pi/N)|}{|V_{k-1}| + 2|V_k| + |V_{k+1}|} = \\ &\qquad \frac{|W(-2\pi/N)| + 2|W(0)| + |W(2\pi/N)|}{|W(-\delta 2\pi/N - 2\pi/N)| + 2|W(-\delta 2\pi/N)| + |W(-\delta 2\pi/N + 2\pi/N)|} \end{aligned} \tag{44}$$

as

$$|V(\omega_0)| = \frac{2\pi}{\sin(\delta\pi)} \frac{|V_{k-1}| + 2|V_k| + |V_{k+1}|}{S_{k-1} + 2S_k + S_{k+1}}. \tag{45}$$

The phase of the frequency bin *ω*0 for two-point and three-point interpolation is estimated the same way as for rectangular window (28) and Hanning window (39)

$$\arg\{V(\omega_0)\} = \arg\{V(\omega_k)\} \pm \arg\{e^{-j\delta(\pi/N)(N-1)}\}. \tag{46}$$

For higher order RVCI windows better accuracy in phase estimation may be obtained by replacing the angle correction, that is the second term in (46), by the angle of the following frequency bin

$$W(e^{j\omega}) \approx e^{-j\omega(N-1)/2} \left( \sum_{m=-M}^{M} (-1)^{m} A'_m \,(1 + jm\pi/N)\, \frac{\sin(\omega N/2)}{\sin(\omega/2 + m\pi/N)} \right), \quad A'_m = \begin{cases} A_0^w & \text{for } m = 0, \\ A_{|m|}^w / 2 & \text{for } m \neq 0. \end{cases} \tag{47}$$

For known frequency *ω*0 amplitude and phase of the signal may also be computed from the definition of Fourier transform (4) or in the least squares (LS) sense.

### **3.2 Sinusoidal signal – arbitrary windows**

IpDFT formulas (41), (43) are only valid for RVCI windows, which have a wide main lobe that deteriorates frequency resolution and noise performance. In practice it is often desirable to analyze the signal with a window having better properties. It is known from the literature, e.g. (Harris, 1978), that the optimal parametric non-cosine Kaiser-Bessel and Dolph-Chebyshev windows often outperform other windows, including RVCI windows. In the following part an IpDFT algorithm for arbitrary, even non-cosine, windows is described for two- and three-point interpolation. The algorithm is based on polynomial approximation of the ratio *R*(*δ*) (15), (17) for the window of interest.

First, the ratio *R*(*δ*) (15), (17) is computed numerically for the selected window based on window spectrum, and then the dependence *δ*=*fδ*(*R*) is approximated by polynomial. During analysis the ratio *R*(*δ*) (15), (17) is evaluated from DFT bins, and frequency correction is estimated from previously computed *δ*=*fδ*(*R*).

From the LS polynomial approximation we get

$$\delta \approx P_d(R), \quad P_d(R) = \sum_{l=0}^{L} a_l R^l, \tag{48}$$

where *Pd* denotes a polynomial of degree *L*, and *R* is computed from the window's spectrum. Signal's frequency is given by

$$\omega_0 = [k \pm P_d(R)](2\pi/N), \tag{49}$$

but this time *R* is computed from DFT bins of the analyzed signal.
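The polynomial-approximation procedure of (48) and (49) may be sketched for two-point interpolation as follows. This is a minimal illustration, not the author's code: the helper names are hypothetical, the Kaiser window (`np.kaiser` with shape parameter 8) and the polynomial degree 5 are arbitrary choices, and the 64-point grid of *δ* values between 0 and 0.5 is an assumption about how the window spectrum is sampled.

```python
import numpy as np

def window_dtft_abs(w, omega):
    """|W(e^{j omega})| by direct summation of the Fourier transform."""
    n = np.arange(len(w))
    return np.abs(np.exp(-1j * np.outer(omega, n)) @ w)

def fit_delta_polynomial(w, degree=5, points=64):
    """LS polynomial delta ~ P_d(R) of (48), built from the window spectrum."""
    N = len(w)
    delta = np.linspace(0.0, 0.5, points)
    num = window_dtft_abs(w, (1.0 - delta) * 2 * np.pi / N)
    den = window_dtft_abs(w, -delta * 2 * np.pi / N)
    R = num / den                          # ratio (15) evaluated on the window
    return np.polyfit(R, delta, degree)    # coefficients, highest power first

def ipdft_2p_poly(v, w, coeffs):
    """Frequency estimate (49) with R taken from DFT bins of the signal."""
    N = len(v)
    V = np.abs(np.fft.fft(v * w))
    k = int(np.argmax(V[1:N // 2])) + 1
    sign = 1 if V[k + 1] > V[k - 1] else -1
    R = V[k + sign] / V[k]
    return (k + sign * np.polyval(coeffs, R)) * 2 * np.pi / N

# Example: Kaiser window, arbitrary test frequency 0.3 bin above bin 60.
N = 512
w = np.kaiser(N, 8.0)
coeffs = fit_delta_polynomial(w)
n = np.arange(N)
omega_true = 60.3 * 2 * np.pi / N
print(ipdft_2p_poly(np.cos(omega_true * n + 0.3), w, coeffs), omega_true)
```

The polynomial is fitted once per window and can then be reused for every analyzed frame, so the run-time cost is one polynomial evaluation on top of the DFT.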

Signal's amplitude is determined from the dependence *XN*=*fX*(*δ*), where for two-point interpolation

$$X_N = \frac{|V(\omega_0)|}{|V_k|} \approx \frac{|W(0)|}{|W(\delta 2\pi/N)|}, \tag{50}$$

and for three-point interpolation

$$\begin{aligned} X_N &= \frac{|V(\omega_0)|}{|V_{k-1}| + 2|V_k| + |V_{k+1}|} \approx \\ &\approx \left( \frac{|W(-2\pi/N)| + 2|W(0)| + |W(2\pi/N)|}{|W(-\delta 2\pi/N - 2\pi/N)| + 2|W(-\delta 2\pi/N)| + |W(-\delta 2\pi/N + 2\pi/N)|} \right) \Big/ \left( 2 + 2|W(2\pi/N)| / |W(0)| \right). \end{aligned} \tag{51}$$

Ratios (50), (51) are next approximated by polynomial *Px*

$$X_N \approx P_x(\delta). \tag{52}$$

Signal's amplitude for two-point interpolation is

$$|V(\omega_0)| = P_x(\delta)\,|V_k|, \tag{53}$$

and signal's amplitude for three-point interpolation is

$$|V(\omega_0)| = P_x(\delta) \left( |V_{k-1}| + 2|V_k| + |V_{k+1}| \right). \tag{54}$$

Signal's phase is computed the same way for two-point and three-point interpolation based on the dependence *PN*=*fP*(*δ*)

$$P_N = \arg\{W(\delta 2\pi/N)\}. \tag{55}$$

Signal's phase is

$$\arg\{V(\omega_0)\} = \arg\{V_k\} \pm P_p(\delta), \tag{56}$$

where *Pp* is polynomial approximating *PN*=*fP*(*δ*), i.e. *PN*≈*Pp*(*δ*).

Fig. 6 shows the dependence *δ*=*fδ*(*R*) for RVCI windows for two-point and three-point interpolation. Fig. 7 presents systematic errors of frequency estimation in dependence of the order of approximation polynomial for selected RVCI, Kaiser-Bessel and Dolph-Chebyshev windows for signal's frequencies *ω*0=0.05 rad and *ω*0=1 rad and signal's length *N*=512.

Fig. 6. Dependence *δ*=*fδ*(*R*) for RVCI windows for: a) two-point and b) three-point interpolation

The approximation polynomial was fitted to 64 points of the window spectrum computed numerically via the Fourier transform (4) in the frequency range from 0 to 2*π*/*N* rad, i.e. *W*(*e*<sup>-*jω*</sup>) was computed by (4) for the set of frequencies from *ω*=0 to *ω*=2*π*/*N* rad with the increment (2*π*/*N*)/63 rad. Systematic errors were defined similarly to (Schoukens et al., 1992): for each frequency *ω*0, test signals were generated with the phase from the interval <–*π*/2, *π*/2> changed with the step *π*/20, and the maximum absolute value of the differences between the estimated and true frequency was selected. It is seen from Fig. 7 that an approximation polynomial of order 5 may give acceptably small systematic errors; nevertheless, in the results shown in section 4 an approximation polynomial of order 10 is used. For that order the systematic errors for RVCI windows, in Matlab 64-bit precision, are practically the same for the analytic IpDFT formulas (41), (43) and for the described approximation-based IpDFT (49).

Fig. 7. Systematic errors of frequency estimation for 3p IpDFT for selected RVCI, Kaiser-Bessel and Dolph-Chebyshev windows for: a) *ω*0=0.05 rad, and b) *ω*0=1 rad
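The systematic-error definition used above (phase swept over <–*π*/2, *π*/2> with the step *π*/20, maximum absolute error retained) can be sketched as follows. This is only an illustration under stated assumptions: the estimator `estimate_frequency_2p` is a hypothetical stand-in (the classical rectangular-window two-point interpolation), not the polynomial-based IpDFT (49), and *N*=128 is chosen only to keep the pure-Python DFT fast.

```python
import cmath
import math


def dft_bin(x, k):
    """Single DFT bin V_k = sum_n x[n] * exp(-j*(2*pi/N)*k*n)."""
    n_samples = len(x)
    return sum(v * cmath.exp(-2j * math.pi * k * n / n_samples)
               for n, v in enumerate(x))


def estimate_frequency_2p(x):
    """Hypothetical stand-in estimator: rectangular-window two-point IpDFT."""
    n_samples = len(x)
    mags = [abs(dft_bin(x, k)) for k in range(n_samples // 2)]
    # Peak bin, excluding the edges so that both neighbours exist.
    k = max(range(1, n_samples // 2 - 1), key=lambda i: mags[i])
    if mags[k + 1] >= mags[k - 1]:
        delta = mags[k + 1] / (mags[k] + mags[k + 1])
    else:
        delta = -mags[k - 1] / (mags[k] + mags[k - 1])
    return (k + delta) * 2 * math.pi / n_samples


def max_systematic_error(omega0, n_samples, estimator):
    """Maximum |estimate - omega0| over phases -pi/2..pi/2 in steps of pi/20."""
    worst = 0.0
    for i in range(21):
        phase = -math.pi / 2 + i * math.pi / 20
        x = [math.cos(omega0 * n + phase) for n in range(n_samples)]
        worst = max(worst, abs(estimator(x) - omega0))
    return worst


N = 128                              # assumption: shorter than N=512 for speed
omega0 = 30.3 * 2 * math.pi / N      # test frequency between bins 30 and 31
err = max_systematic_error(omega0, N, estimate_frequency_2p)
print(err / (2 * math.pi / N))       # systematic error expressed in DFT bins
```

Any other frequency estimator with the same call signature can be passed to `max_systematic_error` in place of the stand-in.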

### **3.3 Damped sinusoidal signal – Bertocco-Yoshida algorithms**

We start this section with derivation of DFT for damped sinusoidal signal (2). Let us rewrite signal (2) in the complex form

$$v_n = A \cos(\omega_0 n + \varphi) e^{-dn} = \frac{A}{2} \left( e^{-dn} e^{j(\omega_0 n + \varphi)} + e^{-dn} e^{-j(\omega_0 n + \varphi)} \right). \tag{57}$$

From the definition (5) the DFT of the first term in the sum (57) is

$$\text{DFT}\{e^{-dn}e^{j(\omega_0 n + \varphi)}\} = \sum_{n=0}^{N-1} e^{-dn}e^{j(\omega_0 n + \varphi)}e^{-j(2\pi/N)kn} = e^{j\varphi} \sum_{n=0}^{N-1} e^{(j\omega_0 - j\omega_k - d)n}, \tag{58}$$

where *ωk*=(2*π*/*N*)*k*. Using the formula for the sum of the geometric series, (58) becomes

$$\text{DFT}\{e^{-dn}e^{j(\omega_0 n + \varphi)}\} = e^{j\varphi} \frac{1 - e^{(j\omega_0 - j\omega_k - d)N}}{1 - e^{j\omega_0 - j\omega_k - d}}. \tag{59}$$


Going the same way for the second term in (57) we finally obtain the DFT of the damped sinusoidal signal (2)

$$V_k = \frac{A}{2} \left( e^{j\varphi} \frac{1 - e^{(j\omega_0 - j\omega_k - d)N}}{1 - e^{j\omega_0 - j\omega_k - d}} + e^{-j\varphi} \frac{1 - e^{(-j\omega_0 - j\omega_k - d)N}}{1 - e^{-j\omega_0 - j\omega_k - d}} \right), \tag{60}$$

which may be rewritten, considering that *e*<sup>−*jωkN*</sup>=1 and *λ*=*e*<sup>−*d*+*jω*0</sup>, in the form

$$V_k = \frac{A}{2} \left( e^{j\varphi} \frac{1 - \lambda^N}{1 - \lambda e^{-j\omega_k}} + e^{-j\varphi} \frac{1 - \lambda^{*N}}{1 - \lambda^{*} e^{-j\omega_k}} \right). \tag{61}$$

In the following derivation of Bertocco-Yoshida IpDFT algorithms the second term of (61) is neglected, i.e. it is assumed that

$$V_k \approx \frac{A}{2} \left( e^{j\varphi} \frac{1 - \lambda^N}{1 - \lambda e^{-j\omega_k}} \right), \quad 0 \le \omega_k < \pi. \tag{62}$$
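A short numerical check may help here. The sketch below compares the closed form (61) with a directly summed DFT bin and prints the relative size of the term dropped in (62). The amplitude, phase and the function names are illustrative assumptions; the frequency and damping follow the test signal *ω*0=10.2·(2*π*/*N*), *d*=0.01 used later in the chapter.

```python
import cmath
import math


def dft_bin(x, k):
    """DFT bin V_k = sum_n x[n] * exp(-j*(2*pi/N)*k*n), computed directly."""
    n_samples = len(x)
    return sum(v * cmath.exp(-2j * math.pi * k * n / n_samples)
               for n, v in enumerate(x))


def closed_form_terms(amp, omega0, phase, d, n_samples, k):
    """Two terms of (61): positive- and negative-frequency contributions."""
    lam = cmath.exp(complex(-d, omega0))          # lambda = exp(-d + j*omega0)
    z = cmath.exp(-2j * math.pi * k / n_samples)  # exp(-j*omega_k)
    lam_c = lam.conjugate()
    pos = (amp / 2 * cmath.exp(1j * phase)
           * (1 - lam ** n_samples) / (1 - lam * z))
    neg = (amp / 2 * cmath.exp(-1j * phase)
           * (1 - lam_c ** n_samples) / (1 - lam_c * z))
    return pos, neg


# Illustrative values (assumptions for this sketch).
N, A, phi, d = 512, 1.0, 0.3, 0.01
omega0 = 10.2 * 2 * math.pi / N
k = 10  # DFT bin with the highest magnitude for this signal

x = [A * math.cos(omega0 * n + phi) * math.exp(-d * n) for n in range(N)]
direct = dft_bin(x, k)
pos, neg = closed_form_terms(A, omega0, phi, d, N, k)

print(abs(direct - (pos + neg)))  # closed form (61) vs direct summation
print(abs(neg) / abs(pos))        # relative size of the term neglected in (62)
```

For a signal this close to zero frequency the neglected term is small but not negligible, which is one source of the systematic errors discussed in section 4.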

#### **3.3.1 Bertocco (BY-0) algorithm**

Let us define the following ratio of complex DFT bins

$$R = \frac{V_{k+1}}{V_k} \approx \frac{A}{2} \left( e^{j\varphi} \frac{1 - \lambda^N}{1 - \lambda e^{-j\omega_{k+1}}} \right) / \frac{A}{2} \left( e^{j\varphi} \frac{1 - \lambda^N}{1 - \lambda e^{-j\omega_k}} \right) = \frac{1 - \lambda e^{-j\omega_k}}{1 - \lambda e^{-j\omega_{k+1}}}, \tag{63}$$

where *Vk* is the DFT bin with the highest magnitude. From (63) we get

$$\lambda = e^{j\omega_k} \frac{1 - R}{1 - R e^{-j2\pi/N}}. \tag{64}$$

Considering *λ*=*e*<sup>−*d*+*jω*0</sup>, damping and frequency are given by

$$d = -\text{Re}\{\ln(\lambda)\}, \quad \omega_0 = \text{Im}\{\ln(\lambda)\}. \tag{65}$$
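Equations (63)-(65) translate almost directly into code. The following is a minimal sketch, not a reference implementation: the function names and test values are assumptions. It is run once on a complex damped exponential, for which model (62) holds exactly, and once on the real damped cosine (2), for which the neglected term of (61) introduces a small bias.

```python
import cmath
import math


def dft_bin(x, k):
    """DFT bin V_k = sum_n x[n] * exp(-j*(2*pi/N)*k*n)."""
    n_samples = len(x)
    return sum(v * cmath.exp(-2j * math.pi * k * n / n_samples)
               for n, v in enumerate(x))


def by0(v_k, v_k1, k, n_samples):
    """Bertocco (BY-0): damping and frequency from bins V_k and V_{k+1}."""
    step = 2 * math.pi / n_samples
    ratio = v_k1 / v_k                                   # R in (63)
    lam = (cmath.exp(1j * step * k) * (1 - ratio)
           / (1 - ratio * cmath.exp(-1j * step)))        # (64)
    log_lam = cmath.log(lam)
    return -log_lam.real, log_lam.imag                   # (65): d, omega0


# Illustrative values (assumptions for this sketch).
N, d, phi = 512, 0.01, 0.3
k = 100                              # bin with the highest magnitude
omega0 = 100.2 * 2 * math.pi / N

# Complex damped exponential: model (62) is exact, so BY-0 is exact too.
xc = [cmath.exp(1j * phi) * cmath.exp(complex(-d, omega0) * n)
      for n in range(N)]
d_c, w_c = by0(dft_bin(xc, k), dft_bin(xc, k + 1), k, N)

# Real damped cosine (2): the second term of (61) biases the estimates.
xr = [math.cos(omega0 * n + phi) * math.exp(-d * n) for n in range(N)]
d_r, w_r = by0(dft_bin(xr, k), dft_bin(xr, k + 1), k, N)

print(d_c - d, w_c - omega0)   # errors for the complex signal
print(d_r - d, w_r - omega0)   # errors for the real signal
```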

#### **3.3.2 BY-1 algorithm**

Let us define the ratio of the first order differences of the complex DFT bins in the form

$$R = \frac{V_{k-1} - V_k}{V_k - V_{k+1}}, \tag{66}$$

where *Vk* is the DFT bin with the highest magnitude. Substituting (62) into (66) we get

$$R = \frac{V_{k-1} - V_k}{V_k - V_{k+1}} = \frac{1 - \lambda e^{-j\omega_{k+1}}}{1 - \lambda e^{-j\omega_{k-1}}} r, \quad r = \frac{-e^{-j\omega_k} + e^{-j\omega_{k-1}}}{-e^{-j\omega_{k+1}} + e^{-j\omega_k}}, \tag{67}$$

which, solved for *λ*, gives

$$\lambda = e^{j\omega_k} \frac{r - R}{r e^{-j2\pi/N} - R e^{j2\pi/N}}. \tag{68}$$

Damping and frequency are given by (65).
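The BY-1 steps (66)-(68) followed by (65) can be sketched as below. The function names and test values are assumptions made for illustration; the test signal is a complex damped exponential so that (62) holds exactly and the recovered parameters can be compared with the true ones.

```python
import cmath
import math


def dft_bin(x, k):
    """DFT bin V_k = sum_n x[n] * exp(-j*(2*pi/N)*k*n)."""
    n_samples = len(x)
    return sum(v * cmath.exp(-2j * math.pi * k * n / n_samples)
               for n, v in enumerate(x))


def by1(v_km1, v_k, v_kp1, k, n_samples):
    """BY-1: damping and frequency from first-order bin differences."""
    step = 2 * math.pi / n_samples

    def w(m):
        return cmath.exp(-1j * step * m)                 # exp(-j*omega_m)

    ratio = (v_km1 - v_k) / (v_k - v_kp1)                # R in (66)
    r = (-w(k) + w(k - 1)) / (-w(k + 1) + w(k))          # r in (67)
    lam = (cmath.exp(1j * step * k) * (r - ratio)
           / (r * cmath.exp(-1j * step)
              - ratio * cmath.exp(1j * step)))           # (68)
    log_lam = cmath.log(lam)
    return -log_lam.real, log_lam.imag                   # (65): d, omega0


# Illustrative values (assumptions for this sketch).
N, d, phi = 512, 0.01, 0.3
k = 100
omega0 = 100.2 * 2 * math.pi / N

x = [cmath.exp(1j * phi) * cmath.exp(complex(-d, omega0) * n)
     for n in range(N)]
d_est, w_est = by1(dft_bin(x, k - 1), dft_bin(x, k), dft_bin(x, k + 1), k, N)
print(d_est - d, w_est - omega0)
```

Note that *r* in (67) does not depend on the signal: the expression simplifies to *e*<sup>*j*2*π*/*N*</sup>, so it could be precomputed.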

#### **3.3.3 Yoshida (BY-2) algorithm**

Let us define the ratio of the second order differences of the complex DFT bins in the form

$$R = \frac{V_{k-2} - 2V_{k-1} + V_k}{V_{k-1} - 2V_k + V_{k+1}} = \frac{1 - \lambda e^{-j\omega_{k+1}}}{1 - \lambda e^{-j\omega_{k-2}}} r, \tag{69}$$

where


$$\begin{split} r &= r_1 / r_2 \\ r_1 &= (1 - \lambda e^{-j\omega_{k-1}})(1 - \lambda e^{-j\omega_k}) - 2(1 - \lambda e^{-j\omega_{k-2}})(1 - \lambda e^{-j\omega_k}) + (1 - \lambda e^{-j\omega_{k-2}})(1 - \lambda e^{-j\omega_{k-1}}) \\ r_2 &= (1 - \lambda e^{-j\omega_k})(1 - \lambda e^{-j\omega_{k+1}}) - 2(1 - \lambda e^{-j\omega_{k-1}})(1 - \lambda e^{-j\omega_{k+1}}) + (1 - \lambda e^{-j\omega_{k-1}})(1 - \lambda e^{-j\omega_k}) \end{split}$$

and the value of *λ* needed to compute *r* is first evaluated from (68). Solving (69) for *λ*, with the factor *r* retained as in (68), we get

$$\lambda = e^{j\omega_k} \frac{r - R}{r e^{-j2\pi/N} - R e^{j2(2\pi/N)}}. \tag{70}$$

Damping and frequency are given by (65).
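The two-stage procedure (a first estimate of *λ* from (68), then the second-order ratio (69)) can be sketched as follows. This is an illustration under assumptions: the function names and test values are hypothetical, and the final step uses (69) solved for *λ* with the factor *r* kept, i.e. *λ* = *e*<sup>*jωk*</sup>(*r*−*R*)/(*re*<sup>−*j*2*π*/*N*</sup> − *Re*<sup>*j*4*π*/*N*</sup>). A complex damped exponential is used so that (62) holds exactly.

```python
import cmath
import math


def dft_bin(x, k):
    """DFT bin V_k = sum_n x[n] * exp(-j*(2*pi/N)*k*n)."""
    n_samples = len(x)
    return sum(v * cmath.exp(-2j * math.pi * k * n / n_samples)
               for n, v in enumerate(x))


def lambda_by1(v, k, n_samples):
    """First estimate of lambda from (66)-(68); v maps bin index -> V."""
    step = 2 * math.pi / n_samples
    ratio = (v[k - 1] - v[k]) / (v[k] - v[k + 1])
    r = cmath.exp(1j * step)          # r of (67) simplifies to exp(j*2*pi/N)
    return (cmath.exp(1j * step * k) * (r - ratio)
            / (r * cmath.exp(-1j * step) - ratio * cmath.exp(1j * step)))


def lambda_by2(v, k, n_samples):
    """BY-2 with ratio (69); r is computed from the BY-1 estimate of lambda."""
    step = 2 * math.pi / n_samples
    lam0 = lambda_by1(v, k, n_samples)
    # a[m] = 1 - lambda * exp(-j*omega_m) for the four bins k-2..k+1
    a = {m: 1 - lam0 * cmath.exp(-1j * step * m) for m in range(k - 2, k + 2)}
    r1 = a[k - 1] * a[k] - 2 * a[k - 2] * a[k] + a[k - 2] * a[k - 1]
    r2 = a[k] * a[k + 1] - 2 * a[k - 1] * a[k + 1] + a[k - 1] * a[k]
    r = r1 / r2
    ratio = ((v[k - 2] - 2 * v[k - 1] + v[k])
             / (v[k - 1] - 2 * v[k] + v[k + 1]))         # R in (69)
    return (cmath.exp(1j * step * k) * (r - ratio)
            / (r * cmath.exp(-1j * step) - ratio * cmath.exp(2j * step)))


# Illustrative values (assumptions for this sketch).
N, d, phi = 512, 0.01, 0.3
k = 100
omega0 = 100.2 * 2 * math.pi / N

x = [cmath.exp(1j * phi) * cmath.exp(complex(-d, omega0) * n)
     for n in range(N)]
bins = {m: dft_bin(x, m) for m in range(k - 2, k + 2)}
log_lam = cmath.log(lambda_by2(bins, k, N))
d_est, w_est = -log_lam.real, log_lam.imag               # (65)
print(d_est - d, w_est - omega0)
```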

Second order differences may also be defined as

$$R = \frac{V_{k-1} - 2V_k + V_{k+1}}{V_k - 2V_{k+1} + V_{k+2}} = \frac{1 - \lambda e^{-j\omega_{k+2}}}{1 - \lambda e^{-j\omega_{k-1}}} r, \tag{71}$$

where

$$\begin{split} r &= r_1 / r_2 \\ r_1 &= (1 - \lambda e^{-j\omega_k})(1 - \lambda e^{-j\omega_{k+1}}) - 2(1 - \lambda e^{-j\omega_{k-1}})(1 - \lambda e^{-j\omega_{k+1}}) + (1 - \lambda e^{-j\omega_{k-1}})(1 - \lambda e^{-j\omega_k}) \\ r_2 &= (1 - \lambda e^{-j\omega_{k+1}})(1 - \lambda e^{-j\omega_{k+2}}) - 2(1 - \lambda e^{-j\omega_k})(1 - \lambda e^{-j\omega_{k+2}}) + (1 - \lambda e^{-j\omega_k})(1 - \lambda e^{-j\omega_{k+1}}) \end{split}$$

and the value of *λ* needed to compute *r* is first evaluated from (68). Solving (71) for *λ*, with the factor *r* retained as in (68), we get

$$\lambda = e^{j\omega_k} \frac{r - R}{r e^{-j2(2\pi/N)} - R e^{j2\pi/N}}. \tag{72}$$

Damping and frequency are given by (65).

Fig. 8 illustrates cases for definitions of the ratio (69) and (71). Four successive DFT bins are always taken for interpolation and the DFT bin *Vk* has the highest magnitude. For (69) DFT bins with the highest magnitudes are *k*-2, *k*-1, *k*, *k*+1 (Fig. 8a) and for (71) *k*-1, *k*, *k*+1, *k*+2 (Fig. 8b).

In the original derivation of Yoshida algorithm the ratio (69) is used and damping and frequency are given by

$$d = \frac{2\pi}{N} \text{Im}\{-3/(R-1)\}, \quad \omega = \frac{2\pi}{N} \text{Re}\{k - 3/(R-1)\}. \tag{73}$$


Fig. 8. DFT of the damped sinusoidal signal. Solid circles denote DFT bins taken for Yoshida (BY-2) algorithm: a) ratio defined by (69), b) ratio defined by (71)

#### **3.3.4 BY-3 algorithm**

Let us define the ratio of the third order differences of the complex DFT bins in the form

$$R = \frac{V_{k-2} - 3V_{k-1} + 3V_k - V_{k+1}}{V_{k-1} - 3V_k + 3V_{k+1} - V_{k+2}} = \frac{1 - \lambda e^{-j\omega_{k+2}}}{1 - \lambda e^{-j\omega_{k-2}}} r, \tag{74}$$

where

$$\begin{split} r &= r_1 / r_2 \\ r_1 &= (1 - \lambda e^{-j\omega_{k-1}})(1 - \lambda e^{-j\omega_k})(1 - \lambda e^{-j\omega_{k+1}}) - 3(1 - \lambda e^{-j\omega_{k-2}})(1 - \lambda e^{-j\omega_k})(1 - \lambda e^{-j\omega_{k+1}}) \\ &+ 3(1 - \lambda e^{-j\omega_{k-2}})(1 - \lambda e^{-j\omega_{k-1}})(1 - \lambda e^{-j\omega_{k+1}}) - (1 - \lambda e^{-j\omega_{k-2}})(1 - \lambda e^{-j\omega_{k-1}})(1 - \lambda e^{-j\omega_k}) \\ r_2 &= (1 - \lambda e^{-j\omega_k})(1 - \lambda e^{-j\omega_{k+1}})(1 - \lambda e^{-j\omega_{k+2}}) - 3(1 - \lambda e^{-j\omega_{k-1}})(1 - \lambda e^{-j\omega_{k+1}})(1 - \lambda e^{-j\omega_{k+2}}) \\ &+ 3(1 - \lambda e^{-j\omega_{k-1}})(1 - \lambda e^{-j\omega_k})(1 - \lambda e^{-j\omega_{k+2}}) - (1 - \lambda e^{-j\omega_{k-1}})(1 - \lambda e^{-j\omega_k})(1 - \lambda e^{-j\omega_{k+1}}) \end{split}$$

and *λ* is evaluated from (68). From (74) we get

$$\lambda = e^{j\omega_k} \frac{r - R}{r e^{-j2(2\pi/N)} - R e^{j2(2\pi/N)}}. \tag{75}$$

Damping and frequency are given by (65).

### **3.4 Damped sinusoidal signal – RVCI windows**

In the derivation of IpDFT algorithms for RVCI windows we treat the damped signal with an RVCI window as a sinusoidal signal with a damped window, i.e.

$$v_n = w_n x_n = w_n A \cos(\omega_0 n + \varphi) e^{-dn} = \overline{w}_n A \cos(\omega_0 n + \varphi), \tag{76}$$

where *w̄n*=*wne*<sup>−*dn*</sup> is the damped time window.

#### **3.4.1 Rectangular window (RVCI** *M***=0)**


Based on the spectrum of rectangular window (12) the spectrum of the damped rectangular window is

$$\overline{W}^{R}(e^{j\omega}, d) = \sum_{n=0}^{N-1} e^{-dn} e^{-j\omega n} = \sum_{n=0}^{N-1} e^{-j(\omega - jd)n} = e^{-j\overline{\omega}(N-1)/2} \frac{\sin(\overline{\omega}N/2)}{\sin(\overline{\omega}/2)} = \overline{W}^{R}(e^{j\overline{\omega}}), \tag{77}$$

where *ω̄*=*ω*−*jd*. Let us define the following ratios of the squares of frequency bins of the damped rectangular window spectrum

$$R_1 = \frac{|\overline{W}^R(\overline{\omega}_{k+1})|^2}{|\overline{W}^R(\overline{\omega}_k)|^2}, \quad R_2 = \frac{|\overline{W}^R(\overline{\omega}_{k-1})|^2}{|\overline{W}^R(\overline{\omega}_k)|^2}, \tag{78}$$

where |*W̄*<sup>*R*</sup>(*ω̄k*)| is the frequency bin with the highest modulus, see Fig. 5. Using (77) the ratios (78) are

$$R_1 = \frac{|\overline{W}^R(\overline{\omega}_{k+1})|^2}{|\overline{W}^R(\overline{\omega}_k)|^2} \approx \frac{|V_{k+1}|^2}{|V_k|^2} \approx \frac{|\delta + jD|^2}{|\delta - 1 + jD|^2} = \frac{\delta^2 + D^2}{(\delta - 1)^2 + D^2}, \tag{79}$$

$$R_2 = \frac{|\overline{W}^R(\overline{\omega}_{k-1})|^2}{|\overline{W}^R(\overline{\omega}_k)|^2} \approx \frac{|V_{k-1}|^2}{|V_k|^2} \approx \frac{|\delta + jD|^2}{|\delta + 1 + jD|^2} = \frac{\delta^2 + D^2}{(\delta + 1)^2 + D^2}, \tag{80}$$

where *D*=*dN*/(2*π*). In the derivation of (79-80) sine functions were approximated by their arguments and it was assumed that (*N*-1)/*N*≈1.

By comparing *D*<sup>2</sup> in (79) and (80) we get the desired frequency correction defined by (13)

$$\delta = -\frac{1}{2} \frac{R_1 - R_2}{2R_1 R_2 - R_1 - R_2}. \tag{81}$$

The damping computed from (79) is

$$d = \frac{2\pi}{N} \sqrt{\frac{\delta^2 - R_1(\delta - 1)^2}{R_1 - 1}}, \quad \delta \neq 0.5, \tag{82}$$

and damping computed from (80) is

$$d = \frac{2\pi}{N} \sqrt{\frac{\delta^2 - R_2(\delta + 1)^2}{R_2 - 1}}, \quad \delta \neq -0.5. \tag{83}$$

For *δ*=0.5 (82) must not be used. In an implementation, if *δ*=0.5 then a zero sample should be appended at the end of the signal to change *δ*. Equation (83) may always be used, as from definition (13) the frequency correction is never equal to −0.5.
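Formulas (81) and (83) can be tried out with a few lines of code. The sketch below is illustrative only: the function names and test values are assumptions, the frequency is taken as (*k*+*δ*)·2*π*/*N* with a signed *δ* (an assumption about definition (13)), and a complex damped exponential is used as the test signal so that only the approximations made in (79-80) affect the result.

```python
import cmath
import math


def dft_bin(x, k):
    """DFT bin V_k = sum_n x[n] * exp(-j*(2*pi/N)*k*n)."""
    n_samples = len(x)
    return sum(v * cmath.exp(-2j * math.pi * k * n / n_samples)
               for n, v in enumerate(x))


def rect_damped_ipdft(v_km1, v_k, v_kp1, k, n_samples):
    """Frequency correction (81) and damping (83) for the rectangular window."""
    r1 = abs(v_kp1) ** 2 / abs(v_k) ** 2                  # R1 of (79)
    r2 = abs(v_km1) ** 2 / abs(v_k) ** 2                  # R2 of (80)
    delta = -0.5 * (r1 - r2) / (2 * r1 * r2 - r1 - r2)    # (81)
    # D^2 obtained from (80); (83) is preferred because it is usable
    # for every delta in the admissible range.
    big_d_sq = (delta ** 2 - r2 * (delta + 1) ** 2) / (r2 - 1)
    d = 2 * math.pi / n_samples * math.sqrt(max(big_d_sq, 0.0))
    omega0 = (k + delta) * 2 * math.pi / n_samples        # assumed form of (13)
    return delta, d, omega0


# Illustrative values (assumptions for this sketch).
N, d_true = 512, 0.01
k = 100                                   # bin with the highest modulus
omega0_true = 100.2 * 2 * math.pi / N     # i.e. delta = 0.2

x = [cmath.exp(complex(-d_true, omega0_true) * n) for n in range(N)]
delta, d_est, omega0_est = rect_damped_ipdft(
    dft_bin(x, k - 1), dft_bin(x, k), dft_bin(x, k + 1), k, N)
print(delta, d_est, omega0_est)
```

For a real damped cosine the spectral leakage from the negative-frequency component adds a further systematic error, which is what the higher-order RVCI windows are meant to reduce.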

Modulus of the frequency bin *ω*0 may be computed from the proportion

$$\frac{|V(\omega_0)|}{|V_k|} = \frac{|\overline{W}^R(0)|}{|\overline{W}^R(\overline{\omega}_k)|} = \frac{|\overline{W}^R(-jd)|}{|\overline{W}^R(-\delta 2\pi/N - jd)|}. \tag{84}$$


From (84) we get

$$|V(\omega_0)| = |V_k| \frac{\sqrt{\delta^2 + D^2}}{D} \frac{|\sin(jdN/2)|}{|\sin(\delta\pi + jdN/2)|}, \quad d \neq 0, \tag{85}$$

and signal's amplitude is

$$A = |V(\omega_0)| / e^{-d(N-1)/2}. \tag{86}$$

Signal's phase may be computed as in the case of sinusoidal signals (27)

$$\arg\{V(\omega_0)\} = \arg\{V(\omega_k)\} \pm \arg\{\overline{W}^R(\delta 2\pi/N - jd)\}. \tag{87}$$

Sign '+' or '-' in (87) is taken the same way as in (13).

#### **3.4.2 Hanning (Hann, RVCI** *M***=1) window**

The spectrum of the damped Hanning window *w̄n*<sup>*H*</sup>=*wn*<sup>*H*</sup>*e*<sup>−*dn*</sup> is given by

$$\overline{W}^{H}(e^{j\overline{\omega}}) = -0.25\,\overline{W}^{R}(e^{j(\overline{\omega} - \omega_1)}) + 0.5\,\overline{W}^{R}(e^{j\overline{\omega}}) - 0.25\,\overline{W}^{R}(e^{j(\overline{\omega} + \omega_1)}), \quad \omega_1 = 2\pi/N, \tag{88}$$

where *W̄*<sup>*R*</sup>(*e*<sup>*jω̄*</sup>) is the spectrum of the damped rectangular window (77) and *ω̄*=*ω*−*jd*. Inserting (77) into (88) and assuming *e*<sup>*j*(*π*/*N*)(*N*−1)</sup>≈−1+*jπ*/*N* and *π*/*N*<<1, the spectrum of the damped Hanning window is further approximated by

$$\overline{W}^{H}(e^{j\overline{\omega}}) \approx e^{-j\overline{\omega}(N-1)/2} \sin(\overline{\omega}N/2) \left( -\frac{0.25}{\sin(\overline{\omega}/2 - \pi/N)} + \frac{0.5}{\sin(\overline{\omega}/2)} - \frac{0.25}{\sin(\overline{\omega}/2 + \pi/N)} \right). \tag{89}$$

The ratios of the squares of frequency bins of the damped Hanning window spectrum are

$$R_1 = \frac{|\overline{W}^H(\overline{\omega}_{k+1})|^2}{|\overline{W}^H(\overline{\omega}_k)|^2} \approx \frac{|V_{k+1}|^2}{|V_k|^2} \approx \frac{(\delta + 1)^2 + D^2}{(\delta - 2)^2 + D^2}, \tag{90}$$

$$R_2 = \frac{|\overline{W}^H(\overline{\omega}_{k-1})|^2}{|\overline{W}^H(\overline{\omega}_k)|^2} \approx \frac{|V_{k-1}|^2}{|V_k|^2} \approx \frac{(\delta - 1)^2 + D^2}{(\delta + 2)^2 + D^2}, \tag{91}$$

where *D*=*dN*/(2*π*). From (90) and (91) we get the desired frequency correction defined by (13)

$$\delta = -\frac{3}{2} \frac{R_1 - R_2}{4R_1 R_2 - R_1 - R_2 - 2}. \tag{92}$$

The damping computed from (90) and (91) is

$$d = \frac{2\pi}{N} \sqrt{\frac{(\delta + 1)^2 - R_1(\delta - 2)^2}{R_1 - 1}}, \quad \delta \neq 0.5, \qquad d = \frac{2\pi}{N} \sqrt{\frac{(\delta - 1)^2 - R_2(\delta + 2)^2}{R_2 - 1}}, \quad \delta \neq -0.5. \tag{93}$$

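Signal's amplitude and phase are next computed as

$$|V(\omega_0)| = |V_k| \, |\overline{W}^H(0)| / |\overline{W}^H(\overline{\omega}_k)|, \quad A = |V(\omega_0)| / e^{-d(N-1)/2}, \tag{94}$$

$$\arg\{V(\omega_0)\} = \arg\{V(\omega_k)\} \pm \arg\{\overline{W}^H(\delta 2\pi/N - jd)\}. \tag{95}$$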

#### **3.4.3 Higher order RVCI windows**


In general, the spectrum of the damped *w̄n*=*wne*<sup>−*dn*</sup> RVCI window of order *M* is a sum of rescaled and frequency-shifted spectra of the damped rectangular window

$$\overline{W}_M(e^{j\overline{\omega}}) = \sum_{m=0}^{M} \left[ (-1)^m \frac{A_m^w}{2} \overline{W}^R(e^{j(\overline{\omega} - \omega_m)}) + (-1)^m \frac{A_m^w}{2} \overline{W}^R(e^{j(\overline{\omega} + \omega_m)}) \right]. \tag{96}$$

For the damped RVCI window the ratios of the squares of frequency bins are

$$R_1 = \frac{|\overline{W}_M(\overline{\omega}_{k+1})|^2}{|\overline{W}_M(\overline{\omega}_k)|^2} \approx \frac{|V_{k+1}|^2}{|V_k|^2} \approx \frac{(\delta + M)^2 + D^2}{(\delta - M - 1)^2 + D^2}, \tag{97}$$

$$R_2 = \frac{|\overline{W}_M(\overline{\omega}_{k-1})|^2}{|\overline{W}_M(\overline{\omega}_k)|^2} \approx \frac{|V_{k-1}|^2}{|V_k|^2} \approx \frac{(\delta - M)^2 + D^2}{(\delta + M + 1)^2 + D^2}, \tag{98}$$

where *D*=*dN*/(2*π*) and *M* is the order of the RVCI window. For *M*=0 (rectangular window) (97-98) becomes (79-80), and for *M*=1 (Hanning window) (97-98) is (90-91).

By comparing *D*<sup>2</sup> in (97) and (98) we get the general formula for the frequency correction defined by (13) for the damped RVCI window of order *M*

$$\delta = -\frac{2M+1}{2} \frac{R_1 - R_2}{2(M+1)R_1R_2 - R_1 - R_2 - 2M}. \tag{99}$$

The damping computed from (97) and (98) is

$$d = \frac{2\pi}{N} \sqrt{\frac{(\delta + M)^2 - R_1(\delta - M - 1)^2}{R_1 - 1}}, \quad \delta \neq 0.5, \qquad d = \frac{2\pi}{N} \sqrt{\frac{(\delta - M)^2 - R_2(\delta + M + 1)^2}{R_2 - 1}}, \quad \delta \neq -0.5. \tag{100}$$
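The general formulas (99) and (100) can be exercised for several window orders with the sketch below. Several things here are assumptions made for illustration: the order-*M* window is generated as sin<sup>2*M*</sup>(*πn*/*N*), which gives the rectangular window for *M*=0 and the periodic Hann window for *M*=1 (up to scaling, which cancels in the ratios); the frequency correction is treated as signed; and a complex damped exponential is used as the test signal. Function names are hypothetical.

```python
import cmath
import math


def dft_bin(x, k):
    """DFT bin V_k = sum_n x[n] * exp(-j*(2*pi/N)*k*n)."""
    n_samples = len(x)
    return sum(v * cmath.exp(-2j * math.pi * k * n / n_samples)
               for n, v in enumerate(x))


def cosine_window(order, n_samples):
    """Assumed order-M window: sin^(2M)(pi*n/N) (scaling is irrelevant here)."""
    return [math.sin(math.pi * n / n_samples) ** (2 * order)
            for n in range(n_samples)]


def rvci_damped_ipdft(v_km1, v_k, v_kp1, n_samples, order):
    """Frequency correction (99) and damping (second form of (100))."""
    m = order
    r1 = abs(v_kp1) ** 2 / abs(v_k) ** 2                  # R1 of (97)
    r2 = abs(v_km1) ** 2 / abs(v_k) ** 2                  # R2 of (98)
    delta = (-(2 * m + 1) / 2 * (r1 - r2)
             / (2 * (m + 1) * r1 * r2 - r1 - r2 - 2 * m)) # (99)
    big_d_sq = ((delta - m) ** 2 - r2 * (delta + m + 1) ** 2) / (r2 - 1)
    d = 2 * math.pi / n_samples * math.sqrt(max(big_d_sq, 0.0))
    return delta, d


# Illustrative values (assumptions for this sketch).
N, d_true = 256, 0.02
k = 50                                    # bin with the highest modulus
omega0 = 50.2 * 2 * math.pi / N           # i.e. delta = 0.2

results = {}
for M in (0, 1, 2):
    w = cosine_window(M, N)
    x = [w[n] * cmath.exp(complex(-d_true, omega0) * n) for n in range(N)]
    results[M] = rvci_damped_ipdft(
        dft_bin(x, k - 1), dft_bin(x, k), dft_bin(x, k + 1), N, M)
    print(M, results[M])
```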

Signal's amplitude and phase are next computed as

$$|V(\omega_0)| = |V_k| \, |\overline{W}_M(0)| / |\overline{W}_M(\overline{\omega}_k)|, \quad A = |V(\omega_0)| / e^{-d(N-1)/2}, \tag{101}$$

$$\arg\{V(\omega_0)\} = \arg\{V(\omega_k)\} \pm \arg\{\overline{W}_M(\delta 2\pi/N - jd)\}. \tag{102}$$

## **4. Some properties of IpDFT algorithms**

In this section we present results of simulations that describe systematic errors and noise immunity of IpDFT methods. Because of space constrains, only the results of frequency or frequency and damping estimation are presented. Including results for amplitude and phase estimation would multiply the number of figures by three. Furthermore, in practice

Interpolation Algorithms of DFT for Parameters

considered.

exactly 2, 3, 4, 5, 6 periods.

algorithms.

Estimation of Sinusoidal and Damped Sinusoidal Signals 25

Fig. 10 depicts damped sinusoidal test signals with frequency *ω*0=10.2(2*π*/*N*) rad, i.e. *δ*=0.2. In simulations the range of damping from *d*=0.0001 (Fig.10a) to *d*=0.01 (Fig.10b) is

(a) (b)

Fig. 10. Damped sinusoidal test signals *N*=512, *ω*0=10.2·(2*π*/*N*) rad, a) *d*=0.0001, b) *d*=0.01

Figs. 11-12 present systematic errors of sinusoidal signal frequency estimation for selected RVCI, Kaiser-Bessel and Dolph-Chebyshev windows for two-point and three-point interpolation. The frequency of the test signals in Fig. 11 was changed from *ω*0=1.5·(2*π*/*N*) to *ω*0=249.5·(2*π*/*N*) with the step 8·(2*π*/*N*), and the frequency of the test signals in Fig. 12 was changed from *ω*0=2·(2*π*/*N*) to *ω*0=6·(2*π*/*N*) with the step (2*π*/*N*)/5. In the first case test signal was newer coherently sampled (i.e. signal never contained integer number of periods), whereas in the second case coherent sampling occurred for test signals containing

It is seen from Figs. 11-12 that 3p interpolation gives smaller systematic errors than 2p interpolation. Increasing the order of RVCI window results in significant reduction of systematic errors. High order RVCI *M*=6 window may give negligible small estimation errors as visible in Fig. 11 which is the effect of the fastest decay of the sidelobes. Still, due to wide main lobe of RVCI *M*=6 window, the signal has to contain sufficient number of periods. Systematic errors for Kaiser-Bessel and Dolph-Chebyshev windows are significantly higher then for RVCI *M*=6 window because sidelobes of those windows does not decay so fast. However, it is seen from Fig. 12 that for analysis of the short signal containing 2-4 periods it is advantageous to use, narrow main lobe, Dolph-Chebyshev 120 dB and Kaiser-Bessel *β*=15.8 windows over RVCI windows. Local minima for integer values of *k* in Fig. 12 occur for coherent sampling and cosine windows. For the case of coherent sampling frequency correction is *δ*=0, DFT analysis is correct and there is no need to use IpDFT

Figs. 13-14 present systematic errors of damped sinusoidal signal frequency and damping estimation for BY algorithms and IpDFT with RVCI windows. The frequency of the test signals in Fig. 13 was changed from *ω*0=1.5·(2*π*/*N*) to *ω*0=249.5·(2*π*/*N*) with the step 8·(2*π*/*N*), and damping was set to *d*=0.01 (compare Fig. 10b). In Fig. 14 the damping was

estimation of frequency and damping is of primary importance, and once having frequency and damping the amplitude and phase may be estimated by LS or FT. It is also true that amplitude and phase estimation errors, not shown in this section, behave similarly to frequency estimation errors.

First, systematic errors of IpDFT algorithms for sinusoidal and damped sinusoidal signals are presented, and then robustness against additive, zero-mean, Gaussian noise is shown. In all simulations number of samples *N*=512 was chosen.

Simulations were conducted in Matlab using 64-bit floating-point precision. The accuracy of this precision, as determined by the function *eps* (Matlab), is on the level of 10<sup>−15</sup> to 10<sup>−16</sup>, and estimation errors cannot be lower than this accuracy.
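The quoted accuracy level can be checked outside Matlab as well. The following minimal sketch uses Python purely for illustration (the simulations in this chapter were run in Matlab); it reads the double-precision machine epsilon and confirms it with a simple halving loop.

```python
import sys

# Machine epsilon for IEEE 754 double precision: the gap between 1.0 and the
# next larger representable number (the quantity Matlab reports as eps).
eps = sys.float_info.epsilon

# Independent check: halve a candidate until adding half of it to 1.0
# no longer changes the result.
candidate = 1.0
while 1.0 + candidate / 2.0 > 1.0:
    candidate /= 2.0

print(eps)               # about 2.22e-16
print(candidate == eps)  # True
```

Estimation errors reported near this level should therefore be read as "zero within numerical precision".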

### **4.1 Systematic errors**

In this section systematic errors of frequency estimation for sinusoidal signals, and of frequency and damping estimation for damped sinusoidal signals, are presented. For each frequency *ω*0 or damping *d*, test signals were generated with the phase swept over the interval [–*π*/2, *π*/2] with the step *π*/20, and the maximum absolute difference between the estimated and true values was selected.

For obtaining general conclusions the frequency of the test signals was swept in the whole range from 0 to *π* rad. For easier interpretation the frequency of the test signal is also given in DFT index *k*.
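The phase-sweep procedure described above can be sketched as follows. This is only an illustration under stated assumptions: the estimator is the classic two-point interpolation for the rectangular window, *δ* = |*V*[*k*+1]| / (|*V*[*k*]| + |*V*[*k*+1]|), used here as a stand-in for the IpDFT formulas derived earlier in the chapter; the function names are hypothetical; and *N*=64 is used instead of *N*=512 to keep the pure-Python DFT fast.

```python
import cmath
import math


def dft_bin(v, k):
    """Single DFT bin V[k] of the real sequence v (direct summation)."""
    n_samples = len(v)
    w = -2j * math.pi * k / n_samples
    return sum(x * cmath.exp(w * n) for n, x in enumerate(v))


def estimate_k_2p_rect(v):
    """Two-point interpolation, rectangular window (illustrative stand-in)."""
    n_samples = len(v)
    mags = [abs(dft_bin(v, k)) for k in range(n_samples // 2 + 1)]
    # Peak bin, excluding DC and Nyquist so that both neighbours exist.
    k_max = max(range(1, n_samples // 2), key=lambda k: mags[k])
    # Interpolate between the peak and its larger neighbour.
    if mags[k_max + 1] >= mags[k_max - 1]:
        k_low, k_high = k_max, k_max + 1
    else:
        k_low, k_high = k_max - 1, k_max
    delta = mags[k_high] / (mags[k_low] + mags[k_high])
    return k_low + delta


def max_systematic_error(k_true, n_samples=64, phase_steps=20):
    """Worst-case |k_est - k_true| over phases in [-pi/2, pi/2], step pi/20."""
    omega0 = 2.0 * math.pi * k_true / n_samples
    worst = 0.0
    for i in range(phase_steps + 1):
        phi = -math.pi / 2 + i * math.pi / phase_steps
        v = [math.cos(omega0 * n + phi) for n in range(n_samples)]
        worst = max(worst, abs(estimate_k_2p_rect(v) - k_true))
    return worst


print(max_systematic_error(10.3))  # small but nonzero: leakage-induced bias
print(max_systematic_error(10.0))  # coherent sampling: error at rounding level
```

The noise-free error for the non-integer index comes from spectral leakage of the negative-frequency component, which is the systematic error studied in this section.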

Fig. 9 shows two sinusoidal test signals with *N*=512 samples. The signal in Fig. 9a, with frequency *ω*0=1.5·(2*π*/*N*) rad, contains 1.5 periods, whereas the signal in Fig. 9b, with frequency *ω*0=249.5·(2*π*/*N*) rad, contains 249.5 periods. The frequencies of those signals expressed in DFT index *k* are 1.5 and 249.5, which means that in the frequency spectrum those signals lie halfway between DFT bins *k*=1 and *k*=2, and between bins *k*=249 and *k*=250, respectively; in both cases the frequency correction *δ* (13) equals 0.5. The first signal is sampled approximately 341 times per period and the second only approximately 2.05 times per period.
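The sampling densities quoted above follow from the fact that a sinusoid with DFT index *k* completes *k* periods within *N* samples, so the number of samples per period is *N*/*k*:

$$\frac{N}{k} = \frac{512}{1.5} \approx 341.3, \qquad \frac{N}{k} = \frac{512}{249.5} \approx 2.05 .$$

The second value is only slightly above the limit of 2 samples per period set by the sampling theorem.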

Fig. 9. Sinusoidal test signals *N*=512, a) signal with frequency *ω*0=1.5·(2*π*/*N*) rad containing 1.5 periods, b) signal with frequency *ω*0=249.5·(2*π*/*N*) rad containing 249.5 periods


Fig. 10 depicts damped sinusoidal test signals with frequency *ω*0=10.2·(2*π*/*N*) rad, i.e. *δ*=0.2. In the simulations, damping in the range from *d*=0.0001 (Fig. 10a) to *d*=0.01 (Fig. 10b) is considered.

Fig. 10. Damped sinusoidal test signals *N*=512, *ω*0=10.2·(2*π*/*N*) rad, a) *d*=0.0001, b) *d*=0.01
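To give a feeling for the two damping extremes, the test signal of model (2) can be generated as in the minimal sketch below. The function name `damped_sinusoid` and the choice of unit amplitude and zero phase are assumptions made only for this illustration.

```python
import math


def damped_sinusoid(n_samples, k0, d, amplitude=1.0, phase=0.0):
    """Samples of A*cos(omega0*n + phase)*exp(-d*n), n = 0..n_samples-1."""
    omega0 = 2.0 * math.pi * k0 / n_samples
    return [
        amplitude * math.cos(omega0 * n + phase) * math.exp(-d * n)
        for n in range(n_samples)
    ]


N = 512
for d in (0.0001, 0.01):
    v = damped_sinusoid(N, k0=10.2, d=d)
    # Value of the exponential envelope at the last sample.
    envelope_end = math.exp(-d * (N - 1))
    print(f"d={d}: envelope at n={N - 1} is {envelope_end:.4f}")
```

For *d*=0.0001 the envelope drops only to about 0.95 at the end of the record, so the signal is almost undamped, whereas for *d*=0.01 it decays to about 0.006, i.e. the oscillation has practically vanished within the observation window.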

Figs. 11-12 present systematic errors of sinusoidal signal frequency estimation for selected RVCI, Kaiser-Bessel and Dolph-Chebyshev windows for two-point and three-point interpolation. The frequency of the test signals in Fig. 11 was changed from *ω*0=1.5·(2*π*/*N*) to *ω*0=249.5·(2*π*/*N*) with the step 8·(2*π*/*N*), and the frequency of the test signals in Fig. 12 was changed from *ω*0=2·(2*π*/*N*) to *ω*0=6·(2*π*/*N*) with the step (2*π*/*N*)/5. In the first case the test signal was never coherently sampled (i.e. the signal never contained an integer number of periods), whereas in the second case coherent sampling occurred for test signals containing exactly 2, 3, 4, 5, 6 periods.
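The two frequency grids can be enumerated in DFT index *k* to confirm which test points are coherently sampled. The sketch below is only an illustration; the variable names are hypothetical, and exact rational arithmetic is used so that integer values of *k* are detected without rounding issues.

```python
from fractions import Fraction

# Fig. 11 grid: k = 1.5, 9.5, ..., 249.5 (step 8).
grid_fig11 = [Fraction(3, 2) + 8 * i for i in range(32)]
# Fig. 12 grid: k = 2, 2.2, ..., 6 (step 1/5).
grid_fig12 = [Fraction(2) + Fraction(i, 5) for i in range(21)]

# Coherent sampling corresponds to an integer number of periods, i.e. integer k.
coherent_fig11 = [int(k) for k in grid_fig11 if k.denominator == 1]
coherent_fig12 = [int(k) for k in grid_fig12 if k.denominator == 1]

print(float(grid_fig11[-1]), coherent_fig11)  # 249.5 []
print(float(grid_fig12[-1]), coherent_fig12)  # 6.0 [2, 3, 4, 5, 6]
```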

It is seen from Figs. 11-12 that 3p interpolation gives smaller systematic errors than 2p interpolation. Increasing the order of the RVCI window results in a significant reduction of systematic errors. The high-order RVCI *M*=6 window may give negligibly small estimation errors, as visible in Fig. 11, which is the effect of the fastest decay of the sidelobes. Still, due to the wide main lobe of the RVCI *M*=6 window, the signal has to contain a sufficient number of periods. Systematic errors for Kaiser-Bessel and Dolph-Chebyshev windows are significantly higher than for the RVCI *M*=6 window because the sidelobes of those windows do not decay as fast. However, it is seen from Fig. 12 that for the analysis of a short signal containing 2-4 periods it is advantageous to use the narrow-main-lobe Dolph-Chebyshev 120 dB and Kaiser-Bessel *β*=15.8 windows rather than RVCI windows. Local minima for integer values of *k* in Fig. 12 occur for coherent sampling and cosine windows. In the case of coherent sampling the frequency correction is *δ*=0, DFT analysis is correct, and there is no need to use IpDFT algorithms.

Figs. 13-14 present systematic errors of damped sinusoidal signal frequency and damping estimation for BY algorithms and IpDFT with RVCI windows. The frequency of the test signals in Fig. 13 was changed from *ω*0=1.5·(2*π*/*N*) to *ω*0=249.5·(2*π*/*N*) with the step 8·(2*π*/*N*), and damping was set to *d*=0.01 (compare Fig. 10b). In Fig. 14 the damping was swept from *d*=10<sup>−4</sup> to 10<sup>−2</sup> with equidistant steps in logarithmic scale, and the frequency was set to *ω*0=10.2·(2*π*/*N*) (i.e. *δ*=0.2). It is seen from Figs. 13-14 that by choosing high-order RVCI windows a significant reduction of systematic errors may be obtained; however, the price for this gain is the need for a longer signal (in the sense of the number of cycles), as explained previously for sinusoidal signals, and higher noise sensitivity, as shown in the next section.

Fig. 11. Systematic errors of sinusoidal signal frequency estimation for selected RVCI, Kaiser-Bessel and Dolph-Chebyshev windows: a) 2p interpolation, b) 3p interpolation

Fig. 12. Systematic errors of sinusoidal signal frequency estimation for selected RVCI, Kaiser-Bessel and Dolph-Chebyshev windows: a) 2p interpolation, b) 3p interpolation


Fig. 13. Systematic errors of damped sinusoidal signal frequency and damping estimation for BY algorithms and RVCI windows

Fig. 14. Systematic errors of damped sinusoidal signal frequency and damping estimation for BY algorithms and RVCI windows

### **4.2 Noise**

Noise performance is typically illustrated by comparison with the Cramér-Rao Lower Bound (CRLB). An unbiased estimator that reaches the CRLB is the optimal Minimum Variance Unbiased (MVU) estimator. The CRLB for the sinusoidal signal (1) disturbed by zero-mean Gaussian noise with variance *σ*<sup>2</sup> is given by (Kay, 1993)


$$\text{var}(\omega_{0E}) \ge \frac{12}{\eta N(N^2 - 1)}\tag{103}$$

and for damped sinusoidal signal (2) by (Yao & Pandit, 1995)

$$\text{var}(\omega_{0E}) = \text{var}(d_E) \approx \frac{(1 - z^2)^3 (1 - z^{2N})}{\eta [-N^2 z^{2N} (1 - z^2)^2 + z^2 (1 - z^{2N})^2]}, \quad z = \left| e^{-d + j\omega_0} \right|, \tag{104}$$

where *E* in subscripts stands for estimated value, and *η* is signal to noise ratio defined as

$$
\eta = \frac{A^2}{2\sigma^2}, \quad S/N = 10 \log_{10}(\eta) \quad \text{(dB)} . \tag{105}
$$

It is seen from (103-104) that the variance of the frequency estimator is inversely proportional to the third power of the signal length *N*. This strong dependence suggests using a high number of samples for frequency estimation, i.e. a high sampling frequency and/or a long observation time.
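The bounds (103)-(105) are straightforward to evaluate numerically. The following sketch is a direct transcription of those formulas under the assumption that *z* reduces to exp(−*d*) (the modulus of the complex exponential); the function names are hypothetical and chosen only for this illustration.

```python
import math


def snr_db_to_eta(snr_db):
    """Linear signal-to-noise ratio eta from S/N in dB, per (105)."""
    return 10.0 ** (snr_db / 10.0)


def crlb_sinusoid(n_samples, eta):
    """Lower bound (103) on the variance of the frequency estimate, rad^2."""
    return 12.0 / (eta * n_samples * (n_samples ** 2 - 1))


def crlb_damped(n_samples, eta, d):
    """Approximate bound (104); |exp(-d + j*omega0)| reduces to exp(-d)."""
    z2 = math.exp(-2.0 * d)                # z^2
    z2n = math.exp(-2.0 * d * n_samples)   # z^(2N)
    numerator = (1.0 - z2) ** 3 * (1.0 - z2n)
    denominator = eta * (
        -(n_samples ** 2) * z2n * (1.0 - z2) ** 2 + z2 * (1.0 - z2n) ** 2
    )
    return numerator / denominator


eta = snr_db_to_eta(10.0)          # S/N = 10 dB  ->  eta = 10
bound_512 = crlb_sinusoid(512, eta)
bound_1024 = crlb_sinusoid(1024, eta)
print(bound_512)                   # about 8.94e-09 rad^2
print(math.sqrt(bound_512))        # about 9.46e-05 rad
print(bound_512 / bound_1024)      # close to 8: doubling N cuts the bound ~8x
print(crlb_damped(512, eta, 0.01))  # damped case, d = 0.01
```

The ratio close to 8 for a doubled record length illustrates the third-power dependence on *N* discussed above.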

Fig. 15 shows an example realization of the sinusoidal signal disturbed by zero-mean Gaussian noise with *S*/*N*=10 dB (105), together with its DFT spectrum. The mean value and standard deviation of the estimation errors, shown in Figs. 16-18, were computed from 1000 realizations of the test signal. The signal's phase was generated as a random variable with uniform distribution on the interval [-*π*/2, *π*/2].
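One way to generate such realizations is sketched below. It is a minimal illustration, not the code used for the figures: the noise standard deviation is obtained by inverting (105), *σ* = *A*/√(2*η*), and the helper names (`noise_sigma`, `noisy_sinusoid`, `mean_and_std`) as well as the fixed random seed are assumptions made for this example.

```python
import math
import random


def noise_sigma(amplitude, snr_db):
    """Noise standard deviation from eta = A^2 / (2*sigma^2), see (105)."""
    eta = 10.0 ** (snr_db / 10.0)
    return amplitude / math.sqrt(2.0 * eta)


def noisy_sinusoid(n_samples, k0, snr_db, rng, amplitude=1.0):
    """One realization: random phase in [-pi/2, pi/2] plus Gaussian noise."""
    omega0 = 2.0 * math.pi * k0 / n_samples
    phase = rng.uniform(-math.pi / 2, math.pi / 2)
    sigma = noise_sigma(amplitude, snr_db)
    return [
        amplitude * math.cos(omega0 * n + phase) + rng.gauss(0.0, sigma)
        for n in range(n_samples)
    ]


def mean_and_std(values):
    """Sample mean and (population) standard deviation of a list."""
    mean = sum(values) / len(values)
    var = sum((x - mean) ** 2 for x in values) / len(values)
    return mean, math.sqrt(var)


rng = random.Random(0)  # fixed seed so that the sketch is repeatable
v = noisy_sinusoid(512, k0=10.2, snr_db=10.0, rng=rng)
print(noise_sigma(1.0, 10.0))  # about 0.2236 for A = 1 and S/N = 10 dB

# Sanity check of the noise level on 1000 independent noise samples.
noise = [rng.gauss(0.0, noise_sigma(1.0, 10.0)) for _ in range(1000)]
print(mean_and_std(noise))
```

In a Monte Carlo study each realization would be passed to the estimator under test, and `mean_and_std` would then be applied to the list of resulting estimation errors.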

Fig. 16 presents results of frequency estimation of the sinusoidal signal analyzed with different windows for three-point interpolation. It is seen from Fig. 16 that for *S*/*N* from approximately 15 dB to 40 dB the rectangular window (RVCI *M*=0) has the best noise immunity. For lower disturbance (higher *S*/*N*), systematic errors become more significant than noise for the rectangular window, and better results are obtained with the RVCI *M*=1 (Hanning) window, the Kaiser-Bessel *β*=4.86 window, and the Dolph-Chebyshev 50 dB window. RVCI *M*=6 has the highest noise sensitivity due to the widest main lobe.

Fig. 15. a) Exemplary realization of sinusoidal test signal disturbed by 10 dB noise, *N*=512, *ω*0=10.2·(2*π*/*N*) rad; b) DFT spectrum


Fig. 16. a) Mean value and b) standard deviation of frequency estimation error for sinusoidal signal for different time windows

Figs. 17-18 present results of frequency and damping estimation for the damped sinusoidal signal. It is seen that for high *S*/*N* the systematic error is dominant for BY-0 and for the RVCI *M*=0 window. The BY-1 algorithm gives the best results over a wide range of *S*/*N*. It is seen from Fig. 18b that BY methods are better estimators of damping than IpDFT with RVCI windows.

Fig. 17. a) Mean value and b) standard deviation of frequency estimation error for damped sinusoidal signal for different IpDFT algorithms

Fig. 18. a) Mean value and b) standard deviation of damping estimation error for damped sinusoidal signal for different IpDFT algorithms

### **5. Conclusion**

This chapter describes DFT interpolation algorithms for parameter estimation of sinusoidal and damped sinusoidal signals. IpDFT algorithms have two main advantages:

1. Low computational complexity attributed to fast algorithms of DFT computation.
2. No need for the signal model (as opposed to parametric methods).

IpDFT methods may be used as fully functional estimators, especially when the noise disturbance is not very strong. If the signal model is known, IpDFT may be used to provide the starting point for LS optimization, which is optimal for zero-mean Gaussian noise disturbance.

For signals with disturbances that cannot be included in the signal model, e.g. unknown drift, IpDFT with an adequate time window may offer better performance than optimization.

### **6. References**

Agrež, D. (2002) Weighted Multipoint Interpolated DFT to Improve Amplitude Estimation of Multifrequency Signal, *IEEE Trans. Instrum. Meas.*, vol. 51, pp. 287–292

Agrež, D. (2009) A frequency domain procedure for estimation of the exponentially damped sinusoids, *International Instrumentation and Measurement Technology Conference*, May 2009

Andria, G., Savino, M. & Trotta, A. (1989) Windows and interpolation algorithms to improve electrical measurement accuracy, *IEEE Trans. Instrum. Meas.*, vol. 38, pp. 856–863

Bertocco, M., Offelli, C. & Petri, D. (1994) Analysis of damped sinusoidal signals via a frequency-domain interpolation algorithm, *IEEE Trans. Instrum. Meas.*, vol. 43, no. 2, pp. 245–250

Cooley, J. W. & Tukey, J. W. (1965) An Algorithm for the Machine Computation of Complex Fourier Series, *Mathematics of Computation*, vol. 19, pp. 297–301

Duda, K. (2010) Accurate, Guaranteed-Stable, Sliding DFT, *IEEE Signal Processing Mag.*, November, pp. 124–127

Duda, K. (2011a) DFT Interpolation Algorithm for Kaiser-Bessel and Dolph-Chebyshev Windows, *IEEE Trans. Instrum. Meas.*, vol. 60, no. 3, pp. 784–790

Duda, K., Magalas, L. B., Majewski, M. & Zieliński, T. P. (2011b) DFT based Estimation of Damped Oscillation's Parameters in Low-frequency Mechanical Spectroscopy, *IEEE Trans. Instrum. Meas.*, vol. 60, no. 11, pp. 3608–3618

Grandke, T. (1983) Interpolation Algorithms for Discrete Fourier Transforms of Weighted Signals, *IEEE Trans. Instrum. Meas.*, vol. IM-32, no. 2, pp. 350–355

Harris, F. J. (1978) On the use of windows for harmonic analysis with the discrete Fourier transform, *Proc. IEEE*, vol. 66, pp. 51–83

Jacobsen, E. & Lyons, R. (2003) The sliding DFT, *IEEE Signal Processing Mag.*, vol. 20, no. 2, pp. 74–80

Jain, V. K., Collins, W. L. & Davis, D. C. (1979) High-Accuracy Analog Measurements via Interpolated FFT, *IEEE Trans. Instrum. Meas.*, vol. IM-28, no. 2, pp. 113–122

Kay, S. M. (1993) *Fundamentals of Statistical Signal Processing: Estimation Theory*, Englewood Cliffs, NJ: Prentice-Hall

Lyons, R. G. (2004) *Understanding Digital Signal Processing*, Second Edition, Prentice-Hall

Matlab, The Language of Technical Computing. Function Reference Volume 1–3, available from www.mathworks.com

Moon, T. K. & Stirling, W. C. (1999) *Mathematical Methods and Algorithms for Signal Processing*, Prentice Hall

Nuttall, A. H. (1981) Some Windows with Very Good Sidelobe Behavior, *IEEE Trans. on Acoustics, Speech, and Signal Processing*, vol. ASSP-29, no. 1, pp. 84–91

Offelli, C. & Petri, D. (1990) Interpolation Techniques for Real-Time Multifrequency Waveform Analysis, *IEEE Trans. Instrum. Meas.*, vol. 39, no. 1, pp. 106–111

Oppenheim, A. V., Schafer, R. W. & Buck, J. R. (1999) *Discrete-Time Signal Processing*, 2nd Edition, Prentice-Hall

Radil, T., Ramos, P. M. & Serra, A. C. (2009) New Spectrum Leakage Correction Algorithm for Frequency Estimation of Power System Signals, *IEEE Trans. Instrum. Meas.*, vol. 58, no. 5, pp. 1670–1679

Schoukens, J., Pintelon, R. & Van hamme, H. (1992) The Interpolated Fast Fourier Transform: A Comparative Study, *IEEE Trans. Instrum. Meas.*, vol. 41, no. 2, pp. 226–232

Yao, Y. & Pandit, S. M. (1995) Cramér-Rao lower bounds for a damped sinusoidal process, *IEEE Trans. Signal Process.*, vol. 43, no. 4, pp. 878–885

Yoshida, I., Sugai, T., Tani, S., Motegi, M., Minamida, K. & Hayakawa, H. (1981) Automation of internal friction measurement apparatus of inverted torsion pendulum type, *J. Phys. E: Sci. Instrum.*, vol. 14, pp. 1201–1206

## **A Proposed Model-Based Adaptive System for DFT Coefficients Estimation Using SIMULINK**

Omar Mustaf *Gulf University Kingdom of Bahrain* 

### **1. Introduction**


Spectral Analysis is of great importance in signal processing applications in general, and in the analysis and the performance evaluation of communications systems specifically. On the other hand the procedure of calculating/estimating the spectrum itself is also important, i.e. whether it's simple or complicated, especially when taking into account the limitations of onboard space in the parallel computation and VLSI implementation. Therefore, in response to the above, Widrow et al. proposed an adaptive method for estimating the frequency content of a signal through demonstrating a relationship between the Discrete Fourier Transform (DFT) and the Least Mean Square (LMS) algorithm, where the DFT coefficients are estimated by a new means using the LMS algorithm (Widrow et al., 1987). That was the original attempt of relating the DFT to the LMS adaptation rule. The main features of such a spectrum analysis are simplicity, adaptability, and suitability with parallel computations and VLSI implementation. This is owing to the nature of the LMS algorithm, which lends itself to this type of implementation.

Later on, Mccgee showed that Widrow's spectrum analyzer could be used as a recursive estimator for the sake of solving the exponentially-weighted least squares estimation and a filter bank model was deduced. The fundamental outcome of that work was that the LMS algorithm could act as a bank of filters with two modes of operation, which means when the LMS learning rate is chosen to be ½, the filter poles are located at the origin. This means equivalently that the LMS effective transfer functions are FIR filters; otherwise, the equivalent filter is IIR (Mccgee, 1989).
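The relation between the LMS rule and the DFT, and the special role of the learning rate ½, can be illustrated with a short sketch. It rests on assumptions that are not stated in this introduction: the phasor vector is normalized by 1/√*N*, the weight update has the form *w* ← *w* + 2*μ*·*ε*·*x*\* (so that *μ*=½ corresponds to `two_mu=1.0` below), and the weights start from zero. Under these assumptions the weights after *N* samples coincide with the DFT of those samples divided by √*N*. The function names are hypothetical.

```python
import cmath
import math


def lms_spectrum_analyzer(d, n_bins, two_mu=1.0):
    """Run complex LMS over the samples d and return the final weights."""
    scale = 1.0 / math.sqrt(n_bins)
    weights = [0j] * n_bins
    for j, sample in enumerate(d):
        # Phasor vector at time j: a unit-norm column of the DFT basis.
        x = [scale * cmath.exp(2j * math.pi * m * j / n_bins)
             for m in range(n_bins)]
        y = sum(w * xm for w, xm in zip(weights, x))  # analyzer output
        err = sample - y                               # LMS error signal
        weights = [w + two_mu * err * xm.conjugate()
                   for w, xm in zip(weights, x)]
    return weights


def dft(d):
    """Direct DFT, used only as a reference for comparison."""
    n = len(d)
    return [sum(d[j] * cmath.exp(-2j * math.pi * m * j / n) for j in range(n))
            for m in range(n)]


N = 8
signal = [math.cos(2 * math.pi * j / N) + 0.5 * math.sin(2 * math.pi * 3 * j / N)
          for j in range(N)]
w = lms_spectrum_analyzer(signal, N)
reference = [c / math.sqrt(N) for c in dft(signal)]
max_diff = max(abs(a - b) for a, b in zip(w, reference))
print(max_diff)  # difference at rounding level
```

With any other step size the weights after *N* samples no longer match the scaled DFT exactly, which is consistent with the FIR (dead-beat) versus IIR distinction described above.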

Widrow's principal relation between the LMS and the DFT was then extended to the 2-Dimensional (2-D) case by Liu and Bruton, which directly resulted in the 2-D LMS spectrum analyzer. Here it was shown that the 2-D LMS algorithm has the advantage of allowing concurrent computation of what was called an updating matrix, and therefore the potential for very fast parallel computation of the 2-D DFT; however, there was no statement about how fast it was (Liu and Bruton, 1993).

Two years later, a generalization of the above-mentioned 2-D LMS spectrum analyzer was demonstrated by (Ogunfunmi and Au, 1995). It was achieved by successfully extending the relation to other 2-D discrete orthogonal transforms. The simulations of that work showed the same results when applying a frame of a test signal of size 32 by 32 to both a 2-D DFT LMS-based algorithm and a 2-D spectrum analyzer for the 2-D discrete cosine transform. The same agreement of results was demonstrated when comparing the estimated spectrum of the 2-D DFT LMS-based analyzer with that of the 2-D discrete Hartley transform (DHT).

Beaufays and Widrow came back in 1995 to compare the LMS spectrum analyzer with the straightforward, non-adaptive implementation of the recursive DFT, and the robustness of the LMS spectrum analyzer to the propagation of round-off errors was demonstrated. Also, they showed that this property is not shared by other recursive DFT algorithms (Beaufays and Widrow, 1995).

In 1999 Alvarez et al. proposed the analog version of the LMS spectrum analyzer, where the coefficients are adapted independently by analog LMS structure which can be implemented in the VLSI technology. The main advantage gained from the work is that since the adaptation is carried out in the continuous time domain, real time computing speed was achieved (Alvarez et al., 1999).

This chapter aims mainly to propose a model-based simulation design procedure for realizing the relationship between the DFT and the LMS adaptation algorithm, using a very powerful simulation tool, namely SIMULINK, which runs under the MATLAB package. The proposed design is intended to perform spectral estimation using Widrow's adaptive LMS spectrum analyzer (Widrow et al., 1987). Therefore, this work attempts to provide a design procedure for that theoretical work. Specifically, this chapter proposes a model-based design procedure with simulation results to show the importance of this relation, which can be considered as an adaptive spectrum analyzer with a wide range of applications in the design of modern digital communications transceivers, and specifically transform domain equalizers.

The simulation methodology is based on the SIMULINK environment and it adopts a block by block SIMULINK design procedure by which each *system unit* (system unit means, for example, phasor generator unit, LMS adaptation process unit) of the designed model is created at the block level and then assembled together properly to form the intended system model of Widrow's adaptive spectrum analyzer model.

After finalizing the design, test signals will be applied to the model to verify the validity of the DFT coefficients that are estimated adaptively using the LMS algorithm. The simulation results will be shown and discussed in the simulation results discussion section. In addition, new frequency content will be added to the test input signal while the model is running to test its adaptability to sudden changes in the input signal, thus showing how the system follows these changes.

As a future work, a generalized SIMULINK model design procedure for any specified *n-point* spectral estimation is suggested as a further development of this work. Also, there will be a suggestion of how to use the proposed model in the design and simulation of frequency domain equalizers as an application.

Finally, the chapter includes a brief introduction to using SIMULINK, which is important, especially for those who do not have access to or have limited information about SIMULINK. It acts as a *clear, easy, short and concentrated guide* to understand and/or develop the work presented in this chapter.




### **2. The mathematical background of the Adaptive Spectrum Analyzer**

In this section the mathematical model of the adaptive spectrum analyzer, first proposed by Widrow et al. (Widrow et al., 1987), is presented. The main theory and the useful result of that work are given here without the detailed mathematical derivations and proofs, which are outside the scope of this chapter. The objective of the chapter is to use this result to propose a SIMULINK model that corresponds to the adaptive LMS spectrum analyzer. The proposed simulation model might also be adopted later as a basic building block in the design of adaptive two-layer structures for fast filtering, as illustrated in the future work section at the end of the chapter. The Widrow et al. model is abbreviated as WASA, which stands for Widrow Adaptive Spectrum Analyzer, throughout the rest of the chapter.

The key idea of WASA is that, for an arbitrary signal to be analyzed in the frequency domain, it is possible to choose a number of phasors according to a certain criterion and then weight these phasors with adaptable weights that are adapted by iterating the well-known LMS adaptation rule. With a suitable value of the adaptation rate, these weighted phasors are equal to the DFT coefficients of the signal under consideration. Fig. 1 shows the complete schematic diagram of the WASA model.

Fig. 1. Adaptive LMS spectrum Analyzer as proposed by (Widrow, et al., 1987)

The signal *dj* is the sampled version of the signal to be Fourier analyzed, where *j* is the time index. The sampling period is *T* sec and the corresponding sampling frequency is Ω = 2π/*T* rad/sec. The objective of WASA is to resolve *dj* into its corresponding DFT coefficients. The left-hand side of Fig. 1 shows a set of complex exponentials, which represent the time-domain phasors that are supposed to equal the DFT coefficients of *dj* after weighting them with an adaptable weight vector. There are *N* phasors, where *N* is the desired number of DFT coefficients. The frequencies of this set of phasors span the frequency range from DC up to the sampling frequency Ω, including the fundamental frequency Ω/*N*. The fundamental phasor can be expressed in terms of the time index *j* as:

$$e^{i\frac{\Omega}{N}jT} \tag{1}$$

where *i* = √(−1). Now, since Ω = 2π/*T*, this phasor becomes:

$$e^{i\frac{2\pi}{N}j} \tag{2}$$

The remaining phasors are integer powers of the phasor in equation (2).

In vector form these phasors may be written as:

$$X_{j} \triangleq \frac{1}{\sqrt{N}} \begin{bmatrix} 1 \\ e^{i\frac{2\pi}{N}j} \\ e^{i\frac{4\pi}{N}j} \\ \vdots \\ e^{i\frac{2\pi(N-1)}{N}j} \end{bmatrix} \tag{3}$$

These phasors are weighted at each sampling time index *j* by the adaptable weight vector *Wj*, where

$$W_{j} \triangleq \begin{bmatrix} w_{j0} \\ w_{j1} \\ w_{j2} \\ \vdots \\ w_{jN-1} \end{bmatrix} \tag{4}$$

The factor 1/√*N* is a normalization factor. It simplifies the analysis of the system of Fig. 1 because it brings the power of the phasor vector *Xj* to unity, as shown in (Widrow et al., 1987). The weight vector components *wj0, wj1, …, wjN-1* are updated at each sampling time index *j* in accordance with the well-known LMS adaptation rule,

$$W_{j+1} = W_{j} + 2\mu \varepsilon_{j} \overline{X}_{j} \tag{5}$$

where

$$\varepsilon_{j} = d_{j} - y_{j} \tag{6}$$

is the instantaneous error between the input signal *dj* and the spectrum analyzer output *yj*, which is given by:

$$y_{j} = X_{j}^{T} W_{j} \tag{7}$$

Here μ represents the LMS adaptation rate (speed) and the overbar in equation (5) denotes the complex conjugate of *Xj*. Widrow et al. (1987) concluded that when the adaptation rate is set to 0.5, the output of the spectrum analyzer of Fig. 1 is equal to the DFT of the input signal *dj-1*. Referring to Fig. 1, the spectrum analyzer output is:

$$\begin{pmatrix} \text{LMS spectrum analyzer} \\ \text{output vector} \end{pmatrix} = \frac{1}{\sqrt{N}} \begin{bmatrix} 1 & & & \\ & e^{i\frac{2\pi}{N}j} & & \\ & & \ddots & \\ & & & e^{i\frac{2\pi(N-1)}{N}j} \end{bmatrix} W_{j} \tag{8}$$

$$\therefore \ \mathrm{DFT}[d_{j-1}] = \sqrt{N} \begin{bmatrix} 1 & & & \\ & e^{i\frac{2\pi}{N}j} & & \\ & & \ddots & \\ & & & e^{i\frac{2\pi(N-1)}{N}j} \end{bmatrix} W_{j} \tag{9}$$

This conclusion is demonstrated by the proposed SIMULINK model of this chapter using real-time signals for the input *dj*, and the result is compared with that of the standard sliding DFT system in the simulation results section.
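The relations above can be checked numerically outside SIMULINK. The following is a minimal Python sketch, not the chapter's model: it iterates equations (5) to (7) with μ = 0.5 and compares the weighted phasors with a directly computed DFT of the *N* most recent samples. The function names, the zero initial weights, the oldest-first ordering of the reference window, and the test samples are assumptions made for illustration only.

```python
import cmath
import math


def phasor_vector(j, n_points):
    """Phasor vector X_j of equation (3), including the 1/sqrt(N) scaling."""
    scale = 1.0 / math.sqrt(n_points)
    return [scale * cmath.exp(1j * 2.0 * math.pi * k * j / n_points)
            for k in range(n_points)]


def wasa_dft_estimate(d, n_points, mu=0.5):
    """Iterate equations (5)-(7) over d, then weight the phasors as in (9)."""
    weights = [0j] * n_points  # assumption: all weights start at zero
    for j, d_j in enumerate(d):
        x = phasor_vector(j, n_points)
        y_j = sum(xk * wk for xk, wk in zip(x, weights))   # equation (7)
        err = d_j - y_j                                    # equation (6)
        weights = [wk + 2.0 * mu * err * xk.conjugate()    # equation (5)
                   for xk, wk in zip(x, weights)]
    # The weights now correspond to time index j = len(d).
    # n_points * x[k] equals sqrt(N) times the unit phasor, as in (9).
    x = phasor_vector(len(d), n_points)
    return [n_points * xk * wk for xk, wk in zip(x, weights)]


def direct_dft(block):
    """Plain (unnormalized) DFT of a block of samples, used as the reference."""
    n_points = len(block)
    return [sum(block[n] * cmath.exp(-1j * 2.0 * math.pi * k * n / n_points)
                for n in range(n_points))
            for k in range(n_points)]


# Hypothetical test samples; any real-valued sequence can be used.
samples = [0.3, -1.2, 0.7, 2.0, -0.5, 1.1, 0.0, -0.9, 0.4, 1.6, -2.2]
estimate = wasa_dft_estimate(samples, n_points=4)
reference = direct_dft(samples[-4:])  # the 4 most recent samples, oldest first
max_deviation = max(abs(a - b) for a, b in zip(estimate, reference))
print("largest deviation from the direct DFT:", max_deviation)
```

Under these assumptions the two results should agree to within floating-point rounding, which is the behaviour that equation (9) describes for μ = 0.5.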

### **3. Guide for using SIMULINK**


This is a brief guide to using SIMULINK for the modelling and simulation of dynamic systems. It gives a general introduction to SIMULINK, its importance, and some useful tips for modelling with SIMULINK. The guide provides sufficient knowledge to understand the proposed model of this chapter, especially for those who have limited knowledge of SIMULINK. A simple DSP example, namely a 'tapped delay line' of an arbitrary signal, is used at the end of the guide as simple but good practice in modelling with SIMULINK.

Finally, it is important to note that the entire guide, and the design of the proposed system of this chapter, assume that MATLAB is installed on a platform with a Microsoft Windows operating system and not a Macintosh environment. Consequently, users of a Macintosh operating system may notice some differences when dealing with the graphical user interfaces.

### **3.1 SIMULINK at a glance**

SIMULINK, which stands for SIMUlation and LINK, is a simulation tool that runs under the MATLAB package (MATrix LABoratory), the main software product of MathWorks. SIMULINK combines simplicity with a high capability for the simulation and modelling of dynamic and embedded systems. It can be used efficiently in the design, implementation, simulation and testing of a wide range of applications such as communications, controls, signal processing, video processing and image processing. Although SIMULINK requires MATLAB to be installed on the simulation platform, in general it does not require prior knowledge of MATLAB programming, which is the case for the model-based system design proposed here. Other cases do require such knowledge, but explaining them is outside the scope of this chapter.

SIMULINK, as stated earlier, works under the MATLAB package, so MATLAB must be installed on the simulation platform, normally a personal computer (PC), and the SIMULINK option should be selected during the installation process. Thus, the first step is running MATLAB. MATLAB can be run by double-clicking on the MATLAB icon on the desktop, and SIMULINK is then opened either by clicking on the SIMULINK icon in the MATLAB toolbar or by typing SIMULINK in what is called the *command window*.

Fig. 2. Opening a MATLAB session

SIMULINK can be opened in different ways. The simplest way is by clicking on the SIMULINK icon in the MATLAB toolbar as illustrated in Fig. 3.

Fig. 3. Opening SIMULINK

Fig. 4 shows the SIMULINK Library Browser window that appears after clicking on the SIMULINK icon. This is where you can find all the components that you may add to your model. Components are known as blocks in the SIMULINK context. Once the SIMULINK library browser is opened, a modeling environment has to be opened. This is called a model file, which is opened from the SIMULINK library browser toolbar by clicking on the new model icon as illustrated in Fig. 4. Fig. 5 shows a blank new model file.

Fig. 4. The SIMULINK Library Browser


In this model file you can create, connect all of your system components, simulate, change simulation parameters and finally view and/or print the results. The model file has to be saved with a suitable name and it will be saved in a default folder, namely MATLAB, which is created automatically during the installation period. Fig. 6 clarifies how to save a SIMULINK model.

Fig. 5. New SIMULINK model file



Fig. 6. A standard Windows operation to save SIMULINK model files

Fig. 7. Inserting blocks in a SIMULINK model

All of the components (blocks) that you may wish to add to your model are available in the SIMULINK Library Browser. The next step is inserting and connecting the blocks of the model. A block is inserted from the library browser by left-clicking on the block, holding down the button, dragging it, and then releasing the mouse button in the model file. The procedure of adding a block to a model is illustrated in Fig. 7. Note that a brief explanation of the mathematical operation of the selected block appears at the bottom of the library browser, as illustrated in Fig. 7.

Similarly, the entire model's blocks might be inserted in the same way. At this point, after inserting all of the required model components, you may rearrange the locations of the components within the same model file in such a way that best matches the schematic diagram of the simulated system. The next important step is linking the blocks of the model. Fig. 8 shows how to link two blocks together automatically, while Fig. 9 shows how to make a branch line.

Fig. 8. Linking blocks in the model file


Fig. 9. Making a branch line

Each SIMULINK block has its own simulation parameters. These parameters are reached by double-clicking on the block and can then be changed according to the simulation requirements. As an example, Fig. 10 shows how to change the gain value of a gain block.



Fig. 10. SIMULINK block parameter viewing/changing

Finally, the ability to view the simulation results in various ways adds flexibility to simulations performed with SIMULINK. Several devices are available for this purpose, namely *scope*, *x-y plotter*, *display*, *spectrum analyzer*, and *to workspace*. The illustrative example in the next subsection uses a *scope* device to display the simulation results. The user may insert the scope device anywhere in the model to view the desired signal in the time domain.

### **3.2 An illustrative example of running a SIMULINK model: A tapped delay line model**

This example shows a very common process in DSP applications, the creation of a tapped delay line, which is simple practice for learning how to create and run SIMULINK models. The idea is to create lines that carry delayed versions of an arbitrary signal, say x(n). Suppose that delayed versions of x(n) up to x(n-4) are needed. The model is built in the last saved model file, i.e. the one shown in Fig. 6, which has the model name 'example\_1'.

The model creation and simulation may be summarized in the following steps.

Step 1. Insert four unit delays from the discrete blocks category.

Step 2. Insert an arbitrary input signal from the source blocks category, and let it be a step block.

Step 3. Insert a scope block from the sinks category of blocks. Change the number of scope inputs to five by double-clicking on the scope block and then changing the number of the axes parameter as shown in Fig. 11.

Step 4. Rearrange all of the blocks inside the model and connect them. Follow the procedure given in Fig. 9 to connect a line with a block.

Step 5. Click on the run button, as indicated in Fig. 12, to run the model for a default simulation time of 10 sec.

Fig. 11. Changing the number of inputs of the scope block

Fig. 12. Tapped delay line model
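For readers without access to SIMULINK, the behaviour of the tapped delay line built in the steps above can be sketched in a few lines of Python. This is only an illustration of the signal flow: the function name, the zero initial state of the delays, and a step input that is already 1 at n = 0 are assumptions, not settings taken from the SIMULINK model.

```python
def tapped_delay_line(x, n_delays=4, initial=0.0):
    """Return, for each n, the list [x(n), x(n-1), ..., x(n-n_delays)]."""
    state = [initial] * n_delays       # outputs of the chained unit delays
    rows = []
    for sample in x:
        rows.append([sample] + state)  # the five lines seen by the scope
        state = [sample] + state[:-1]  # every delay passes its value onward
    return rows


# Assumed input: a unit step sampled at n = 0, 1, ..., 5.
step_input = [1.0] * 6
taps = tapped_delay_line(step_input)
for n, row in enumerate(taps):
    print(n, row)
```

Each row corresponds to one time index, and each column to one scope input, so the step appears one sample later on every successive line.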

SIMULINK provides the facility of creating subsystems. This means that you have the ability to create your own blocks by combining a number of blocks together to perform a certain function. It is useful here to explain how to create a subsystem using the following example, as this facility will be adopted in this chapter in the creation of the LMS spectrum analyzer model.


Fig. 14. Useful operations on a subsystem

Suppose you need to use a tapped delay line with certain delays many times in your model, or perhaps you want to create your own library of blocks within your SIMULINK package. SIMULINK makes this possible by grouping the blocks within one block with a default name of *subsystem 1*. This is done by selecting all the corresponding components, right-clicking on one of them, and choosing the create subsystem option from the menu, as illustrated in Fig. 13.

Fig. 13. Subsystem creation; one approach

Now your subsystem has been created, as shown in Fig. 13, and it is ready to use. You can resize your subsystem to make it fit its location in the model and rename the input and/or output ports to make your subsystem more organized and more expressive. Some of the useful operations on a subsystem are shown in Fig. 14.

### **4. The proposed WASA SIMULINK based design**

In this section, a SIMULINK model that corresponds to the theoretical model of WASA is created; it is later tested for the 4-point DFT case. The complete model is shown in Fig. 15. This model consists of two main subsystems, namely the phasor creation subsystem and the LMS adaptation subsystem. The other blocks are the summer, the subtractor, and the mixer bank. The phasor creation subsystem is responsible for generating the frequency vector *Xj*, which is defined by equation (3). Each phasor is realized by combining a cosine and a sine source signal according to Euler's formula, where *i* = √(−1) as in Section 2:

$$e^{\pm i\theta} = \cos(\theta) \pm i\sin(\theta) \tag{10}$$


Fig. 15. The proposed WASA SIMULINK model

Fig. 16. The block connection details inside the phasor creation subsystem

Fig. 17. The detailed blocks inside the LMS adaptation subsystem

Fig. 16 shows the proposed phasor creation subsystem for the 4-point DFT. At the block level, each phasor is composed of sine and cosine signals of the corresponding frequency, and the frequencies start from the DC component and end with the 2π(*N*−1)/*N* component.
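The construction of the phasors from cosine and sine sources can be illustrated with a short Python sketch for the 4-point case. It is not the SIMULINK subsystem itself; the function names are hypothetical, and the 1/√N scaling follows equation (3).

```python
import math


def phasor_from_sources(k, j, n_points):
    """Build the k-th unit phasor at time index j from a cosine and a sine source."""
    theta = 2.0 * math.pi * k * j / n_points
    return complex(math.cos(theta), math.sin(theta))   # Euler's formula


def phasor_bank(j, n_points=4):
    """Scaled phasor vector X_j of equation (3), from DC up to the (N-1)-th phasor."""
    scale = 1.0 / math.sqrt(n_points)
    return [scale * phasor_from_sources(k, j, n_points) for k in range(n_points)]


bank = phasor_bank(j=1)
power = sum(abs(p) ** 2 for p in bank)   # power of the phasor vector
print(bank)
print("power of the phasor vector:", power)
```

For *N* = 4 and *j* = 1 the four phasors are, up to rounding, 0.5, 0.5*i*, −0.5 and −0.5*i*, and the power of the vector is 1, which is the purpose of the normalization factor.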

The outputs of this subsystem are fed to two places: first, to the mixer bank, in order to calculate the spectrum analyzer output *yj*; second, to the other main subsystem, the LMS adaptation subsystem, because these phasors take part in the adaptation of the weight vector *Wj*. Fig. 17 shows the contents of the LMS adaptation subsystem. The blocks inside the red rectangle, which simulate equation (5), perform the adaptation rule for each weight of the weight vector. The figure also shows how the instantaneous error and the adaptation rate constant are fed to the adaptation rules of all the weights *wj*.



Fig. 18. 4 points DFT WASA model results as compared with sliding DFT


The operation of this model may be summarized as follows. The phasors are created in the phasor creation subsystem at each time sample index *j*. These phasors are weighted by the pre-loaded values of the weight vector in the mixer bank, and the weighted phasors are then summed to generate a sample of the LMS spectrum analyzer output *yj*. The latter is subtracted from the input signal sample *dj* to generate the instantaneous error signal *ε<sub>j</sub>*. This error is fed to the LMS adaptation subsystem, which updates each weight of the weight vector using the adaptation speed μ and the scaled phasor vector *Xj*. The model is now complete and ready for setting the simulation parameters and exploring the results.
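A short worked step, added here as an illustration and not taken from the derivation in (Widrow et al., 1987), shows why the normalization of *Xj* and the value μ = 0.5 fit together. Because every phasor has unit magnitude, the scaling in equation (3) gives

$$X_{j}^{T}\overline{X}_{j} = \frac{1}{N}\sum_{k=0}^{N-1} e^{i\frac{2\pi k}{N}j}\, e^{-i\frac{2\pi k}{N}j} = 1$$

Substituting the update rule (5) into the output equation (7) for the same input sample then yields the error that remains after one update:

$$d_{j} - X_{j}^{T}W_{j+1} = d_{j} - X_{j}^{T}W_{j} - 2\mu\varepsilon_{j}X_{j}^{T}\overline{X}_{j} = (1 - 2\mu)\,\varepsilon_{j}$$

With μ = 0.5 the updated weights reproduce the current sample *dj* exactly, and the remaining error shrinks in magnitude after an update only when 0 < μ < 1.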

### **5. Simulation and results discussions**

In this section, the performance of the proposed WASA SIMULINK model is tested using the following two scenarios.

1. A test signal is imposed and the results, i.e. the estimated DFT coefficients of the input test signal (*dj*), are compared with those obtained from the steady flow DFT operation.

2. A test input signal is imposed and, while the model is running, new frequency contents are introduced in this test signal to test the adaptability feature of the proposed model. Snapshots are taken to show how the power spectrum is updated to include the newly-added frequency contents.

The simulation parameters, results, and discussion for the two scenarios are given below.

### **Scenario (1)**

### **Simulation parameters**

Simulation time (run time): 10 s.

WASA output size: 4 DFT coefficients.

Input signal, *dj*: pulse signal with a period of 4 μs, a 50% duty cycle, an amplitude of 1 V, and a sampling rate of 1 MSps.

### **Simulation results and discussion**

The simulation results for this scenario are compared with those obtained from the conventional steady flow DFT. The steady flow DFT operation is built from the tapped delay line example realized previously in Fig. 12 and a DFT SIMULINK block. The results for the proposed WASA model and the steady flow DFT are shown in Fig. 18.

The *display* block is used to display the calculated and the estimated DFT coefficients from the sliding DFT and the proposed WASA models, respectively. After running the model, the values shown in the two display blocks coincide.

Another interesting result is the instantaneous error signal, *εj*. Fig. 19 shows the sampled version of this signal after running the proposed model three times, once for each of three values of the learning speed μ. The results support the mathematical proof in (Widrow, et al, 1987), which is given by equation (9) and states that if μ is chosen to be 0.5, the output of the adaptive LMS spectrum analyzer will be equal to the DFT of the input sampled signal at the previous time instance. Otherwise, *εj* never reaches zero, as shown in Fig. 19 (a). Fig. 19 (c) shows that when μ is chosen to be more than 1, *εj* grows without bound as the LMS algorithm becomes unstable.
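The dependence of the error on μ can be reproduced with a small, self-contained sketch. It is not the SIMULINK model: it assumes the update *W* ← *W* + 2μ *εj* conj(*Xj*) with phasors scaled by 1/√N and zero initial weights, and it uses a periodic pulse input whose period equals the number of DFT points. The helper name `lms_errors` is hypothetical.

```python
import cmath

def lms_errors(d, n_points, mu):
    """Return |eps_j| for each sample of d under the assumed LMS update rule."""
    scale = 1.0 / n_points ** 0.5
    weights = [0j] * n_points
    magnitudes = []
    for j, d_j in enumerate(d):
        # Scaled phasor vector at time index j.
        x_j = [scale * cmath.exp(2j * cmath.pi * m * j / n_points)
               for m in range(n_points)]
        # Instantaneous error between the input sample and the analyzer output.
        eps_j = d_j - sum(w * x for w, x in zip(weights, x_j))
        magnitudes.append(abs(eps_j))
        # Assumed update: W += 2*mu*eps*conj(X).
        weights = [w + 2 * mu * eps_j * x.conjugate()
                   for w, x in zip(weights, x_j)]
    return magnitudes

pulse = [1.0, 1.0, 0.0, 0.0] * 10       # 40 samples, period of 4 samples
# Largest error magnitude over the last period, for the three learning speeds.
final_error = {mu: max(lms_errors(pulse, 4, mu)[-4:]) for mu in (0.05, 0.5, 1.5)}
```

For this periodic input and this update rule, the error at a given position in the period is multiplied by (1 − 2μ) each time that position recurs. It therefore decays slowly for μ = 0.05, vanishes after the first period for μ = 0.5, and doubles in magnitude every period for μ = 1.5.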

Fig. 18. 4-point DFT WASA model results compared with the sliding DFT

Fig. 19. The effect of the learning speed, μ, on the instantaneous error signal, *εj*: (a) μ = 0.05, (b) μ = 0.5, (c) μ = 1.5

### **Scenario (2)**

The second test is the adaptability test. It is carried out by applying a composite signal of two sinusoids to the model as *dj*.

### **Simulation parameters**

Simulation time: 0.1 s.

WASA size: 4 DFT points.

Input test signal:

$$d(t) = \sin(2\pi f\_1 t) + \sin(2\pi f\_2 t)\,u(t - t\_o)$$

*f1* = 100 kHz, *f2* = 200 kHz, *to* = 0.01 s.

Amplitude of 1 V and a sampling rate of 1 MHz.

### **Simulation results and discussion**

This signal contains two fundamental frequencies, *f1* and *f2*, but the *f2* component does not appear in the spectrum of the signal until the time instant *to*, because the second term of *d(t)* is multiplied by the shifted unit step function. Therefore, the proposed WASA SIMULINK model operates on the term sin(2π*f1t*) to estimate the *f1* component until the time instant *to*; the weight vector is then adapted again by the LMS rule to estimate the frequency content of the second term, which is *f2*. The results are viewed using the Spectrum scope block, which is connected at the output node of the WASA model.

Two snapshots are taken from the Spectrum scope block, one before *to* and the other after *to*, as shown in Fig. 20.

Fig. 20. Results for the second test/adaptability test. (a) Frequency spectrum before adding the 10 kHz component (b) Spectrum after adding the 10 kHz component

Finally, the proposed model simulates the WASA system correctly; however, the procedure used to create the model has a drawback. When the size of the system, i.e. the size of the DFT operation, is small or moderate (2 to 16 points), building the model is quite simple and requires little effort. For 32 points and more (128, for example), the model requires a large number of blocks, and routing the lines across the model becomes tedious work. A different strategy is therefore needed in the model creation phase. This is addressed in the future work section.

### **6. Future work**

There are two tracks to follow in using the proposed SIMULINK-based WASA design and in developing the proposed adaptive spectrum analyzer. The first is to generalize the model by generalizing its two main subsystems, the phasor creation block and the LMS adaptation block, to work with any specified number of DFT points. This could be realized by writing two MATLAB script files that represent the mathematical relations of these two subsystems, then turning these scripts into SIMULINK blocks and embedding them in the main model. SIMULINK provides this facility, which makes the model cleaner and avoids extensive and tedious wiring work during model creation. The input parameters of the phasor creation block, for example, would be the desired number of DFT coefficients and the sampling rate.

The second is to extend the work of this chapter to the design of a class of digital modulation receivers. One example is the design of frequency-domain adaptive equalizers, which operate on the frequency-domain representation of the signals to remove the transmission channel noise from the received contaminated signals. One can therefore adopt the proposed WASA model in the design of the two-layer linear structure for fast adaptive filtering presented by (Beaufays, Widrow, 1995). Fig. 21 shows the location of the WASA in the system, inside the thick box.

Fig. 21. WASA suggestion for adaptive frequency equalization (Beaufays, Widrow, 1995)

### **7. Conclusion**


A model-based design is proposed to simulate the LMS adaptive spectrum analyzer. The SIMULINK simulation environment is used for its simplicity and capabilities. The proposed model successfully simulates Widrow's model for a 4-point DFT. The results show agreement between the DFT coefficients estimated by the proposed model and those calculated by a standard steady flow DFT model. They also confirm that when the adaptation rate μ of the model is set to 0.5, the set of weighted phasors corresponds exactly to the DFT coefficients of the input applied to the model. In addition, new frequency contents are added to the test input signal during the run (online) as a separate test to assess the adaptability of the spectrum analyzer model to sudden changes in the frequency content of the input signal. As future work, a generalized SIMULINK model is suggested for any specified n-point DFT.


### **8. References**

Alvarez, A. O., Rivera, L. N., and Meana, H. P. (1999). Real time high frequency spectrum analyzer. *Proceedings of 42nd symposium on circuits and systems*, Vol. 2, (August 1999), pp. (753-756)

Beaufays, F., and Widrow, B. (1994). Two-Layer Linear Structure for Fast Adaptive Filtering. *Proceedings of WCNN*, San Diego, USA, Vol. 3, June 1994

Beaufays, F., and Widrow, B. (1995). On the advantages of the LMS spectrum analyzer over nonadaptive implementations of the sliding-DFT. *IEEE Transactions on Circuits and Systems-I: Fundamental Theory and Applications*, Vol. 42, No. 4, (April 1995), pp. (218-220)

Liu, B., and Bruton, L. T. (1993). The two-dimensional complex LMS algorithm applied to the 2-D DFT. *IEEE Transactions on Circuits and Systems-II: Analog and digital signal processing*, Vol. 40, No. 5, (May 1993), pp. (337-341)

Macgge, W. F. (1989). Fundamental relationship between LMS spectrum analyzer and recursive least squares estimation. *IEEE Transactions on Circuits and Systems*, Vol. 36, No. 1, (January 1989), pp. (151-153)

Ogunfunmi, T., and Au, M. (1994). 2-D discrete orthogonal transforms by means of 2-D LMS adaptive algorithms, *Proceedings of signals, systems and computers*, 0-8186-6405-3, Pacific Grove, CA, USA, November 1994

Widrow, B., Baudrenghien, P., Vetterli, M., and Titchener, P. (1987). Fundamental Relations Between the LMS Algorithm and the DFT. *IEEE Transactions on Circuits and Systems*, Vol. 34, No. 7, (July 1987), pp. (814-820)

## **FFT-Based Efficient Algorithms for Time Delay Estimation**

Renbiao Wu, Wenyi Wang and Qiongqiong Jia *Tianjin Key Lab for Advanced Signal Processing Civil Aviation University of China, Tianjin P.R.China*

### **1. Introduction**

Suppose that we have a single sensor receiving a superimposition of attenuated and delayed replicas of a known signal plus noise. From the received data we want to estimate the arrival times of the various replicas and their (complex or real) attenuation coefficients (gains). This is the well-known time delay estimation problem which occurs in many fields including radar, active sonar, wireless communications, Global Navigation Satellite System (GNSS), nondestructive testing, geophysical/seismic exploration, and medical imaging. In this chapter, we will focus on time delay estimation based on one sensor with known probing signal shapes.

The most well-known time delay estimator is the matched filter approach. If there is only one signal, or the overlapped signals are separated in time by an interval that is much greater than the width of the signal autocorrelation function, then the matched filter is the optimal estimator when the noise is white Gaussian (Ehrenberg et al., 1978). The resolution capability of the matched filter approach depends on the signal bandwidth: the larger the signal bandwidth, the better the resolution. However, in many situations there are practical limitations on increasing the bandwidth of the transmitted signals. How to resolve closely spaced overlapping noisy echoes has attracted the attention of researchers from many fields for several decades.

Several approaches have been suggested for this problem and many of them benefit from the recent development of high resolution sinusoidal frequency estimation and Direction of Arrival (DOA) estimation techniques. Sinusoidal frequency estimation techniques such as MUSIC (Schmidt, 1986), Linear Prediction (Tufts et al., 1982), and MODE (Stoica & Nehorai, 1990) are applied to the time delay estimation problem in (Bian & Last, 1997; Kirsteins, 1987; Kirsteins & Kot, 1990). However, these approaches are only applicable to signals with flat (rectangular) band-limited spectra. Several Maximum Likelihood (ML) approaches have also been suggested for this problem. Multidimensional global optimization algorithms are presented in (Bell & Ewart, 1986; Blackowiak & Rajan, 1995; Manickam et al., 1994) to analyze a special class of ocean acoustic data that has very oscillatory autocorrelation functions. An efficient approach based on the Expectation Maximization (EM) algorithm (Moon, 1996) is proposed in (Feder & Weinstein, 1988) that decouples the complicated multidimensional optimization problem into a sequence of multiple separate one-dimensional optimization problems. However, its convergence depends highly on the initialization method used and no systematic initialization method is given in (Feder & Weinstein, 1988). More recently, time delay estimation lower bounds (Liu et al., 2010) and time delay estimation for small samples (Gedalyahu & Eldar, 2010) have also been studied.

In this chapter, a family of relaxation- and FFT-based algorithms is presented for different scenarios. The remainder of this chapter is organized as follows. In Section 2, we formulate the problem of interest. Section 3 describes the new algorithms. In Section 4, some applications utilizing the new algorithms are provided. Section 5 concludes the chapter.

#### **2. Data model and problem formulation**

Time delay estimation is a well-known traditional problem occurring frequently in radar, active sonar, and many other fields. In this problem, the waveform received at a single sensor consists of delayed replicas of the transmitted signal with different gains. The gains reflect the scattering property of the targets or multipath channel transmission features. The received signal waveform *y*(*t*) can be described as

$$y(t) = \sum\_{l=1}^{L} \alpha\_l s(t - \tau\_l) + e(t) \,, \quad 0 \le t \le T \,\tag{1}$$

where *s*(*t*), $0 \le t \le T\_0$, represents the known transmitted signal (complex or real valued), *y*(*t*) denotes the received signal, which is composed of *L* replicas of *s*(*t*) with different (complex or real valued) gains $\{\alpha\_l\}\_{l=1}^{L}$ and real valued delays $\{\tau\_l\}\_{l=1}^{L}$, and *e*(*t*) is the receiver noise, which is modeled as a zero-mean Gaussian random process.

Usually, the above received analog signal is sampled for digital signal processing. To avoid aliasing, we must sample *y*(*t*) according to the bandwidth of *s*(*t*). Let *Bs* denote the double-sided bandwidth of *s*(*t*). Then *y*(*t*) must be sampled with the sampling frequency *fs* satisfying

$$f\_s \ge B\_s \,. \tag{2}$$

After A/D conversion, the sampled received signal has the form

$$y(nT\_s) = \sum\_{l=1}^{L} \alpha\_l s(nT\_s - \tau\_l) + e(nT\_s), \quad n = 0, 1, \dots, N - 1,\tag{3}$$

where *Ts* is the sampling period and is equal to the reciprocal of the sampling frequency *fs*.

Our problem of interest herein is to estimate $\{\alpha\_l, \tau\_l\}\_{l=1}^{L}$ from $\{y(nT\_s)\}\_{n=0}^{N-1}$ with known *s*(*t*), $0 \le t \le T\_0$, or $\{s(nT\_s)\}\_{n=0}^{N-1}$.

Although we could solve the estimation problem in the time-domain (Bell & Ewart, 1986; Blackowiak & Rajan, 1995; Bruckstein et al., 1985; Feder & Weinstein, 1988), we shall consider below solving the problem in the frequency domain and propose a family of relaxation-based algorithms for different scenarios.

Let *Y*(*k*), *S*(*k*), and *E*(*k*), *k* = −*N*/2, −*N*/2 + 1, ..., *N*/2 − 1, denote the discrete Fourier transforms (DFTs) of *y*(*nTs*), *s*(*nTs*), and *e*(*nTs*), respectively. Provided that aliasing is negligible, *Y*(*k*) can be written as

$$Y(k) = S(k) \sum\_{l=1}^{L} \alpha\_l e^{j\omega\_l k} + E(k),\tag{4}$$

where


$$
\omega\_l = -\frac{2\pi\tau\_l}{NT\_s}.\tag{5}
$$
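The frequency-domain model in (4) and (5) can be checked numerically in the noise-free case. The sketch below is only an illustration under stated assumptions: the signal `s`, the gains `alphas`, and the delays `taus` are hypothetical values, the delays are integer multiples of the sampling period, and the delayed replicas stay inside the observation window so that a delay acts as a circular shift. The helper `dft_centered` is a hypothetical name.

```python
import cmath

def dft_centered(x):
    """DFT of x evaluated at k = -N/2, ..., N/2 - 1 (N assumed even)."""
    n = len(x)
    return {k: sum(x[i] * cmath.exp(-2j * cmath.pi * k * i / n) for i in range(n))
            for k in range(-n // 2, n // 2)}

N, TS = 16, 1.0                                   # number of samples, sampling period
s = [1.0, 2.0, 3.0, 2.0, 1.0] + [0.0] * (N - 5)   # hypothetical known signal
alphas = [1.0, 0.5 - 0.25j]                       # hypothetical gains
taus = [2.0 * TS, 5.0 * TS]                       # hypothetical delays

# Noise-free received samples y(nTs) = sum_l alpha_l * s(nTs - tau_l), as in (3).
y = [0j] * N
for alpha, tau in zip(alphas, taus):
    shift = round(tau / TS)
    for i, value in enumerate(s):
        y[(i + shift) % N] += alpha * value

S, Y = dft_centered(s), dft_centered(y)
omegas = [-2 * cmath.pi * tau / (N * TS) for tau in taus]           # equation (5)
model = {k: S[k] * sum(a * cmath.exp(1j * w * k) for a, w in zip(alphas, omegas))
         for k in S}                                                # equation (4), E(k) = 0
max_mismatch = max(abs(Y[k] - model[k]) for k in S)
```

In this setting `max_mismatch` is at the level of floating-point rounding, which shows that each delay appears in the spectrum as a complex exponential in *k* with the frequency given by (5).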

Note that the time delay estimation problem is similar to the sinusoidal parameter estimation problem except that the exponential signals are weighted by the known signal spectrum. If we divided both sides of (4) by *S*(*k*), the problem would become identical to the sinusoidal parameter estimation problem. Yet we should not do so for the following reasons: first, *S*(*k*) could be zero for some *k*; second, the noise *E*(*k*)/*S*(*k*) will no longer be white even when *E*(*k*) is white; third, when *E*(*k*) is white, the larger the *S*(*k*) at sample *k*, the higher the signal-to-noise ratio (SNR) of the corresponding *Y*(*k*), and hence dividing *Y*(*k*) by *S*(*k*) will de-emphasize those *Y*(*k*)'s that have high SNRs. Because of this, many well-known sinusoidal parameter estimation algorithms, such as MUSIC (Schmidt, 1986), ESPRIT (Roy et al., 1986), and PRONY (Kay, 1988), are not directly applicable to our problem of interest. Using MODE (Stoica & Nehorai, 1990) would require a multidimensional search over a parameter space because we can no longer reparameterize the MODE cost function via the coefficients of a polynomial.

#### **3. FFT-based new algorithms**

#### **3.1 Weighted Fourier transform and RELAXation (WRELAX) algorithm**

We consider below estimating the unknown parameters by minimizing the following nonlinear least squares (NLS) criterion:

$$C\_{1}\left(\{\alpha\_{l},\omega\_{l}\}\_{l=1}^{L}\right) = \sum\_{k=-N/2}^{N/2-1} \left| Y(k) - S(k) \sum\_{l=1}^{L} \alpha\_{l} e^{j\omega\_{l}k} \right|^{2} . \tag{6}$$
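As a minimal sketch, criterion (6) can be evaluated directly from its definition. The function name `nls_cost` and the list-based storage (index `i` of each list holds sample *k* = `i - N/2`) are assumptions made for this illustration only.

```python
import cmath

def nls_cost(Y, S, alphas, omegas):
    """Evaluate the NLS criterion (6) for samples k = -N/2, ..., N/2 - 1.

    Y and S are sequences of length N (N even); entry i holds sample k = i - N/2.
    """
    N = len(Y)
    cost = 0.0
    for i in range(N):
        k = i - N // 2
        # Model spectrum: S(k) * sum_l alpha_l * exp(j * omega_l * k)
        model = S[i] * sum(a * cmath.exp(1j * w * k) for a, w in zip(alphas, omegas))
        cost += abs(Y[i] - model) ** 2
    return cost
```

With noise-free data generated from the model, the cost is zero at the true parameters and grows as any gain or frequency is perturbed, which is the property the minimization relies on.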

When *e*(*nTs*) is a zero-mean white Gaussian random process, *E*(*k*) is also white since DFT is a unitary transformation. For this white noise case, the NLS approach is the same as the ML method. When *E*(*k*) is not white, however, the NLS approach is no longer the ML method. However, it has been shown in (Li & Stoica, 1996) that the NLS approach can still have excellent statistical accuracy.

Minimizing *C*<sub>1</sub>({*α<sub>l</sub>*, *ω<sub>l</sub>*}<sub>*l*=1</sub><sup>*L*</sup>) with respect to the unknown parameters is a highly nonlinear optimization problem. The cost function has a complicated multimodal shape with a very small attraction domain, which makes it very difficult to find the global minimum. Below, we present a relaxation based optimization algorithm to obtain the NLS parameter estimates. Before we present our approach, let us consider the following preparations. Let

$$\mathbf{Y} = \begin{bmatrix} Y(-N/2) \ Y(-N/2+1) \ \cdots \ Y(N/2-1) \end{bmatrix}^T \tag{7}$$

$$\mathbf{S} = \text{diag}\left\{ S(-N/2) \ S(-N/2+1) \ \cdots \ S(N/2-1) \right\},\tag{8}$$

$$\mathbf{E} = \begin{bmatrix} E(-N/2) \ E(-N/2 + 1) \ \cdots \ E(N/2 - 1) \end{bmatrix}^T,\tag{9}$$


and

$$\mathbf{a}(\omega\_{l}) = \begin{bmatrix} e^{j\omega\_{l}(-N/2)} \ e^{j\omega\_{l}(-N/2+1)} \ \cdots \ e^{j\omega\_{l}(N/2-1)} \end{bmatrix}^{T} \,\,\,\,\tag{10}$$

where (·)*<sup>T</sup>* denotes the transpose. Denote

$$\mathbf{Y}\_{l} = \mathbf{Y} - \sum\_{i=1, i \neq l}^{L} \hat{\alpha}\_{i} [\mathbf{S} \mathbf{a}(\hat{\omega}\_{i})], \tag{11}$$

where {*α̂<sub>i</sub>*, *ω̂<sub>i</sub>*}<sub>*i*=1, *i*≠*l*</sub><sup>*L*</sup> are assumed to be given. Consider first the case where {*α<sub>l</sub>*}<sub>*l*=1</sub><sup>*L*</sup> are complex valued. Let

$$\mathbf{b}(\omega\_l) = \mathbf{S}\mathbf{a}(\omega\_l), \quad l = 1, 2, \cdots, L. \tag{12}$$

Then (6) becomes

$$C\_{2}(\alpha\_{l},\omega\_{l}) = \left\| \mathbf{Y}\_{l} - \alpha\_{l}\mathbf{b}(\omega\_{l}) \right\|^{2}, \tag{13}$$

where ‖·‖ denotes the Euclidean norm. Minimizing *C*<sub>2</sub>(*α<sub>l</sub>*, *ω<sub>l</sub>*) with respect to *α<sub>l</sub>* yields the estimate *α̂<sub>l</sub>* of *α<sub>l</sub>*:

$$
\hat{\alpha}\_{l} = \frac{\mathbf{b}^{H}(\omega\_{l})\mathbf{Y}\_{l}}{\mathbf{b}^{H}(\omega\_{l})\mathbf{b}(\omega\_{l})} = \frac{\mathbf{a}^{H}(\omega\_{l})(\mathbf{S}^{\*}\mathbf{Y}\_{l})}{\|\mathbf{S}\|\_{F}^{2}},\tag{14}
$$

where (·)<sup>*H*</sup> represents the conjugate transpose and ‖·‖<sub>*F*</sub> denotes the *Frobenius* norm (Stewart, 1973). Then the estimate *ω̂<sub>l</sub>* of *ω<sub>l</sub>* is obtained as follows:

$$\begin{split} \hat{\omega}\_{l} &= \arg\min\_{\omega\_{l}} \left\lVert \mathbf{Y}\_{l} - \frac{\mathbf{b}(\omega\_{l})\mathbf{b}^{H}(\omega\_{l})}{\mathbf{b}^{H}(\omega\_{l})\mathbf{b}(\omega\_{l})} \mathbf{Y}\_{l} \right\rVert^{2} = \arg\max\_{\omega\_{l}} \frac{\mathbf{Y}\_{l}^{H}\mathbf{b}(\omega\_{l})\mathbf{b}^{H}(\omega\_{l})\mathbf{Y}\_{l}}{\mathbf{b}^{H}(\omega\_{l})\mathbf{b}(\omega\_{l})} \\ &= \arg\max\_{\omega\_{l}} \left| \mathbf{a}^{H}(\omega\_{l})(\mathbf{S}^{\*}\mathbf{Y}\_{l}) \right|^{2} \,, \end{split} \tag{15}$$

where we have used the fact that **b**<sup>*H*</sup>(*ω<sub>l</sub>*)**b**(*ω<sub>l</sub>*) = ‖**S**‖<sub>*F*</sub><sup>2</sup> is independent of *ω<sub>l</sub>*. Hence *ω̂<sub>l</sub>* is obtained as the location of the dominant peak of the magnitude squared of the Fourier transform, |**a**<sup>*H*</sup>(*ω<sub>l</sub>*)(**S**<sup>∗</sup>**Y**<sub>*l*</sub>)|<sup>2</sup>, which can be efficiently computed by using the fast Fourier transform (FFT) with the weighted data vector **S**<sup>∗</sup>**Y**<sub>*l*</sub> padded with zeros. An alternative to heavy zero-padding is to find an approximate peak location first by using an FFT without much zero-padding and then to perform a fine search near the approximate peak location by, for example, the *fmin* function in MATLAB, which uses the golden section search algorithm. With the estimate of *ω<sub>l</sub>* at hand, *α̂<sub>l</sub>* can be easily computed from the corresponding complex height:

$$\hat{\alpha}\_{l} = \left.\frac{\mathbf{a}^{H}(\omega\_{l})(\mathbf{S}^{\*}\mathbf{Y}\_{l})}{\|\mathbf{S}\|\_{F}^{2}}\right|\_{\omega\_{l} = \hat{\omega}\_{l}}.\tag{16}$$
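The zero-padded FFT evaluation of (15) and (16) can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the names `fft` and `peak_estimate` and the padded length `P` are hypothetical, the FFT is a plain recursive radix-2 routine so that the block needs only the standard library, and the frequency estimate is restricted to the FFT grid 2π*p*/*P* (no fine search is performed).

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for p in range(n // 2):
        t = cmath.exp(-2j * math.pi * p / n) * odd[p]
        out[p], out[p + n // 2] = even[p] + t, even[p] - t
    return out

def peak_estimate(Yl, S, P=1024):
    """Grid versions of (15) and (16) from one zero-padded FFT of S* Y_l."""
    N = len(Yl)                                   # assumed even, with N <= P
    z = [complex(s).conjugate() * y for s, y in zip(S, Yl)]
    Z = fft(z + [0j] * (P - N))                   # Z[p] = sum_i z[i] exp(-j 2 pi p i / P)
    p = max(range(P), key=lambda q: abs(Z[q]))    # dominant peak of |a^H (S* Y_l)|^2
    omega = 2 * math.pi * p / P
    if omega >= math.pi:                          # map the bin frequency to [-pi, pi)
        omega -= 2 * math.pi
    # Undo the index offset k = i - N/2 to recover the complex height a^H (S* Y_l).
    height = cmath.exp(1j * omega * N / 2) * Z[p]
    alpha = height / sum(abs(s) ** 2 for s in S)  # divide by ||S||_F^2 as in (16)
    return omega, alpha
```

The phase factor is needed because the FFT sums over indices 0, ..., *N* − 1 while **a**(*ω*) in (10) runs over *k* = −*N*/2, ..., *N*/2 − 1; it does not affect the peak location, only the phase of the gain estimate.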

With the above simple preparations, we now present the WRELAX algorithm.

**Step (1):** Assume *L* = 1. Obtain {*ω̂<sub>l</sub>*, *α̂<sub>l</sub>*}<sub>*l*=1</sub> from **Y** by using (15) and (16).

**Step (2):** Assume *L* = 2. Compute **Y**<sub>2</sub> with (11) by using {*ω̂<sub>l</sub>*, *α̂<sub>l</sub>*}<sub>*l*=1</sub> obtained in Step (1). Obtain {*ω̂<sub>l</sub>*, *α̂<sub>l</sub>*}<sub>*l*=2</sub> from **Y**<sub>2</sub>. Next, compute **Y**<sub>1</sub> by using {*ω̂<sub>l</sub>*, *α̂<sub>l</sub>*}<sub>*l*=2</sub> and then redetermine {*ω̂<sub>l</sub>*, *α̂<sub>l</sub>*}<sub>*l*=1</sub> from **Y**<sub>1</sub>.


Iterate the previous two substeps until "practical convergence" is achieved (to be discussed later on).

**Step (3):** Assume *L* = 3. Compute **Y**<sub>3</sub> by using {*ω̂<sub>l</sub>*, *α̂<sub>l</sub>*}<sub>*l*=1</sub><sup>2</sup> obtained in Step (2). Obtain {*ω̂<sub>l</sub>*, *α̂<sub>l</sub>*}<sub>*l*=3</sub> from **Y**<sub>3</sub>. Next, compute **Y**<sub>1</sub> by using {*ω̂<sub>l</sub>*, *α̂<sub>l</sub>*}<sub>*l*=2</sub><sup>3</sup> and redetermine {*ω̂<sub>l</sub>*, *α̂<sub>l</sub>*}<sub>*l*=1</sub> from **Y**<sub>1</sub>. Then compute **Y**<sub>2</sub> by using {*ω̂<sub>l</sub>*, *α̂<sub>l</sub>*}<sub>*l*=1,3</sub> and redetermine {*ω̂<sub>l</sub>*, *α̂<sub>l</sub>*}<sub>*l*=2</sub> from **Y**<sub>2</sub>.

Iterate the previous three substeps until "practical convergence".

**Remaining Steps:** Continue similarly until *L* is equal to the desired or estimated number of signals. (Whenever *L* is unknown, it can be estimated from the available data, for instance, by using the generalized Akaike information criterion (AIC) rules which are particularly tailored to the WRELAX method of parameter estimation. See, for example, (Li & Stoica, 1996).)

The "practical convergence" in the iterations of the above WRELAX method may be determined by checking the relative change of the cost function *C*<sub>1</sub>({*ω̂<sub>l</sub>*, *α̂<sub>l</sub>*}<sub>*l*=1</sub><sup>*L*</sup>) in (6) between two consecutive iterations. The algorithm is bound to converge to at least some local minimum point (Karmanov, 1977). The convergence speed depends on the time delay spacing of the signals. If the spacing between any two signals is larger than the reciprocal of the signal bandwidth, the algorithm converges in a few steps. As the signals become more closely spaced, convergence becomes slower.
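The steps and the convergence check described above can be put together in a compact sketch. Everything named here (`steer`, `estimate_one`, `cost`, `wrelax`, the grid size, the tolerances) is an assumption for illustration; the peak search uses a direct grid evaluation followed by a golden-section refinement in place of the zero-padded FFT and MATLAB search mentioned earlier, which changes the computational cost but not the quantity being maximized.

```python
import cmath
import math

def steer(omega, N):
    """a(omega) in (10) for k = -N/2, ..., N/2 - 1."""
    return [cmath.exp(1j * omega * (i - N // 2)) for i in range(N)]

def estimate_one(Yl, S, grid_size=256):
    """One-component estimate via (15)-(16): grid search, then golden-section refinement."""
    N = len(Yl)
    z = [complex(s).conjugate() * y for s, y in zip(S, Yl)]          # S* Y_l
    spec = lambda w: sum(zi * ai.conjugate() for zi, ai in zip(z, steer(w, N)))
    power = lambda w: abs(spec(w)) ** 2
    step = 2 * math.pi / grid_size
    w0 = max((-math.pi + i * step for i in range(grid_size)), key=power)
    lo, hi, g = w0 - step, w0 + step, (math.sqrt(5) - 1) / 2
    while hi - lo > 1e-9:
        c, d = hi - g * (hi - lo), lo + g * (hi - lo)
        if power(c) > power(d):
            hi = d                                                   # maximum lies in [lo, d]
        else:
            lo = c                                                   # maximum lies in [c, hi]
    w = (lo + hi) / 2
    return w, spec(w) / sum(abs(s) ** 2 for s in S)

def cost(Y, S, est):
    """C1 in (6) for a list of (omega, alpha) pairs."""
    N = len(Y)
    total = 0.0
    for i in range(N):
        k = i - N // 2
        total += abs(Y[i] - S[i] * sum(a * cmath.exp(1j * w * k) for w, a in est)) ** 2
    return total

def wrelax(Y, S, L, tol=1e-6, max_iter=50):
    """Return L (omega, alpha) pairs, adding one component per step as in Steps (1), (2), ..."""
    N = len(Y)
    est = []
    for n_sig in range(1, L + 1):
        est.append((0.0, 0j))                        # slot for the newly added component
        order = [n_sig - 1] + list(range(n_sig - 1)) # new component first, then 1, 2, ...
        prev = None
        for _ in range(max_iter):
            for l in order:
                Yl = list(Y)                         # Y_l in (11): remove the other components
                for i, (w, a) in enumerate(est):
                    if i != l:
                        for idx, ak in enumerate(steer(w, N)):
                            Yl[idx] -= a * S[idx] * ak
                est[l] = estimate_one(Yl, S)
            cur = cost(Y, S, est)
            # "Practical convergence": relative change of (6) between two iterations.
            if n_sig == 1 or (prev is not None and abs(prev - cur) <= tol * prev):
                break
            prev = cur
    return est
```

Each inner update is a one-dimensional search on a residual spectrum, so the cost per iteration grows only linearly with the number of components. The relative tolerance `tol` and the iteration cap `max_iter` are arbitrary choices in this sketch and would need tuning for real data.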

Once we have obtained the estimates {*ω̂<sub>l</sub>*}<sub>*l*=1</sub><sup>*L*</sup>, the estimates {*τ̂<sub>l</sub>*}<sub>*l*=1</sub><sup>*L*</sup> of {*τ<sub>l</sub>*}<sub>*l*=1</sub><sup>*L*</sup> can be determined by using (5).
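Inverting (5) is a one-line conversion; the helper name `delay_from_omega` below is hypothetical and is shown only to make the sign and scaling explicit.

```python
import math

def delay_from_omega(omega_hat, N, Ts):
    """Invert (5), omega = -2*pi*tau / (N*Ts), to obtain the delay estimate."""
    return -omega_hat * N * Ts / (2 * math.pi)
```

Because the exponentials in (6) are sampled at integer *k*, a frequency estimate is determined only modulo 2π, so a delay recovered this way is unambiguous only within a window of length *N* *T<sub>s</sub>*.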

At this point, we would like to point out the relationship between WRELAX and the conventional matched filter approach. The matched filter approach can also be formulated in the frequency domain. Let

$$F(\omega) = \left| \mathbf{a}^H(\omega)(\mathbf{S}^\* \mathbf{Y}) \right|^2 \,. \tag{17}$$

The matched filter method searches for the *L* largest peak positions of *F*(*ω*) as the estimates of {*ω<sub>l</sub>*}<sub>*l*=1</sub><sup>*L*</sup>, and then the gains are determined as follows:

$$\hat{\alpha}\_{l} = \left.\frac{\mathbf{a}^{H}(\omega\_{l})(\mathbf{S}^{\*}\mathbf{Y})}{\|\mathbf{S}\|\_{F}^{2}}\right|\_{\omega\_{l} = \hat{\omega}\_{l}}, \quad l = 1, 2, \cdots, L. \tag{18}$$

Hence when there is only one signal, this one-dimensional matched filter approach is equivalent to the WRELAX algorithm. However, when there are multiple signals that are not well separated, this conventional matched filter approach will perform poorly. In this case, a multidimensional matched filter method (Bell & Ewart, 1986) could be used and the method is equivalent to the NLS fitting approach (Bell & Ewart, 1986). The WRELAX algorithm decouples the multidimensional matched filters into a sequence of one-dimensional matched filters. Thus the excellent parameter estimation performance of the NLS fitting approach can be achieved at a much lower implementation cost.

Similar to the WRELAX algorithm, the EM algorithm proposed in (Feder & Weinstein, 1988) also transforms the multidimensional optimization problem into a series of one-dimensional optimization problems. The detailed implementations of the algorithms, however, are quite different. The EM algorithm consists of two steps, the E (Estimate) step and the M (Maximize) step. The idea is to decompose the observed data into their signal components (the E step) and then to estimate the parameters of each signal component separately (the M step). The algorithm is iterative, using the current parameter estimates to decompose the observed data. At each E step, the residual error corresponding to the current estimates is also decomposed among the different signal components. Although initial conditions are needed by EM, no systematic initialization method is given in (Feder & Weinstein, 1988). We have also found that the performance of EM is very sensitive to the initial conditions used. Even with the same initial conditions, our numerical examples show that the convergence speed of EM can be much slower than the last step of WRELAX. Further, WRELAX does not require any initial conditions before its iterations, and the first *L* − 1 steps of WRELAX can provide an excellent initial condition for Step *L*.

Consider next the case where {*α<sub>l</sub>*}<sub>*l*=1</sub><sup>*L*</sup> are real-valued. Minimizing *C*<sub>2</sub>(*α<sub>l</sub>*, *ω<sub>l</sub>*) with respect to *α<sub>l</sub>* and *ω<sub>l</sub>* yields

$$\hat{\alpha}\_{l} = \frac{\text{Re}\left[\mathbf{a}^{H}(\omega\_{l})(\mathbf{S}^{\*}\mathbf{Y}\_{l})\right]}{\left\|\mathbf{S}\right\|\_{F}^{2}}\Bigg|\_{\omega\_{l}=\hat{\omega}\_{l}}\tag{19}$$

where Re(**X**) denotes the real part of **X**, and

$$
\hat{\omega}\_l = \arg\max\_{\omega\_l} \text{Re}^2 \left[ \mathbf{a}^H(\omega\_l) (\mathbf{S}^\* \mathbf{Y}\_l) \right]. \tag{20}
$$
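The step from (13) to (19) and (20) can be spelled out as a short derivation, using only quantities already defined and the identity **b**<sup>*H*</sup>(*ω<sub>l</sub>*)**Y**<sub>*l*</sub> = **a**<sup>*H*</sup>(*ω<sub>l</sub>*)(**S**<sup>∗</sup>**Y**<sub>*l*</sub>). For real-valued *α<sub>l</sub>*, expanding the norm gives

$$\left\| \mathbf{Y}\_{l} - \alpha\_{l}\mathbf{b}(\omega\_{l}) \right\|^{2} = \|\mathbf{Y}\_{l}\|^{2} - 2\alpha\_{l}\,\text{Re}\left[\mathbf{b}^{H}(\omega\_{l})\mathbf{Y}\_{l}\right] + \alpha\_{l}^{2}\,\|\mathbf{S}\|\_{F}^{2},$$

which is a quadratic in *α<sub>l</sub>* with minimizer Re[**b**<sup>*H*</sup>(*ω<sub>l</sub>*)**Y**<sub>*l*</sub>]/‖**S**‖<sub>*F*</sub><sup>2</sup>, in agreement with (19). Substituting this value back leaves

$$\min\_{\alpha\_{l}} C\_{2}(\alpha\_{l},\omega\_{l}) = \|\mathbf{Y}\_{l}\|^{2} - \frac{\text{Re}^{2}\left[\mathbf{a}^{H}(\omega\_{l})(\mathbf{S}^{\*}\mathbf{Y}\_{l})\right]}{\|\mathbf{S}\|\_{F}^{2}},$$

so minimizing over *ω<sub>l</sub>* is the same as maximizing the squared real part, as stated in (20). Compared with the complex-valued case, only the real part of the weighted Fourier transform is retained.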

The WRELAX algorithm could also be implemented in the time domain, based on correlations. However, we prefer to use the frequency domain version of WRELAX. For the time domain version, we could be restricted to discrete values of {*τ<sub>l</sub>*}<sub>*l*=1</sub><sup>*L*</sup> if we only know the sampled version of *s*(*t*). In this case, if a more accurate delay estimate is required, one has to resort to interpolation (Bell & Ewart, 1986). This inconvenience can be avoided by transforming the problem to the frequency domain, where {*τ<sub>l</sub>*}<sub>*l*=1</sub><sup>*L*</sup> can take on a continuum of values. Even without considering the additional interpolation cost, the computational load of the time domain correlation-based WRELAX is heavier than that of the frequency domain WRELAX.

We present several numerical examples to demonstrate the performance of the proposed algorithms. In the following examples, we use a windowed chirp signal, *s*(*t*) = *w*(*t*)*e*<sup>*jβ*(*t* − *T*<sub>0</sub>/2)<sup>2</sup></sup>, 0 ≤ *t* ≤ *T*<sub>0</sub>, where *β* is the chirp rate and *w*(*t*) is a bell-shaped window function. We use *N* = 64, *β* = *π* × 10<sup>12</sup>, the signal bandwidth *B<sub>s</sub>* = *βT*<sub>0</sub>/*π*, and the sampling frequency *f<sub>s</sub>* = 2*B<sub>s</sub>*. *T*<sub>0</sub> is chosen in such a way that *T*<sub>0</sub> = (*N*/2 − 1)*T<sub>s</sub>*. The resolution limit of the conventional matched filter method is around *τ<sub>e</sub>* = 1/*B<sub>s</sub>*. *L* = 2 signals are assumed to be superimposed together with *α*<sub>1</sub> = *e*<sup>*jπ*/8</sup>, *α*<sub>2</sub> = *e*<sup>*jπ*/4</sup>.
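The stated relations fix the sampling interval once *N* and *β* are given: combining *f<sub>s</sub>* = 2*B<sub>s</sub>*, *B<sub>s</sub>* = *βT*<sub>0</sub>/*π* and *T*<sub>0</sub> = (*N*/2 − 1)*T<sub>s</sub>* gives *T<sub>s</sub>*<sup>2</sup> = *π*/((*N* − 2)*β*). The sketch below computes these values and samples the chirp. The exact bell-shaped window is not specified in the text, so a Gaussian window with standard deviation `sigma_frac * T0` is used purely as an assumption; `chirp_samples` and `sigma_frac` are hypothetical names.

```python
import cmath
import math

N = 64
beta = math.pi * 1e12                          # chirp rate
Ts = math.sqrt(math.pi / ((N - 2) * beta))     # from fs = 2*Bs, Bs = beta*T0/pi, T0 = (N/2 - 1)*Ts
T0 = (N / 2 - 1) * Ts                          # signal duration
Bs = beta * T0 / math.pi                       # signal bandwidth
fs = 1 / Ts                                    # sampling frequency
tau_e = 1 / Bs                                 # matched filter resolution limit

def chirp_samples(sigma_frac=0.25):
    """Samples s(n*Ts), n = 0..N-1, of the windowed chirp; zero for t > T0.

    The Gaussian window is an assumed stand-in for the unspecified bell-shaped w(t).
    """
    sigma = sigma_frac * T0
    out = []
    for n in range(N):
        t = n * Ts
        if t <= T0 * (1 + 1e-12):
            w = math.exp(-0.5 * ((t - T0 / 2) / sigma) ** 2)
            out.append(w * cmath.exp(1j * beta * (t - T0 / 2) ** 2))
        else:
            out.append(0j)
    return out
```

With these numbers the chirp occupies the first *N*/2 samples of the record and the remaining samples are zero, which leaves room for delayed copies of the signal within the *N*-sample observation.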

In the first example, the additive noise is zero-mean white Gaussian and SNR = SNR<sub>1</sub> = SNR<sub>2</sub> = 10 dB. The time delay spacing between the two signals is *τ*<sub>2</sub> − *τ*<sub>1</sub> = *τ<sub>e</sub>*. Even in this case, the conventional matched filter method fails to resolve the two signals, as can be seen from Figure 1(a), where the horizontal axis denotes the normalized time delay *τ*/*T* and the two vertical lines indicate the true time delays of the two signals. However, using WRELAX we can resolve them very well. As pointed out before, the WRELAX algorithm can be viewed as transforming the multi-dimensional matched filters into a sequence of one-dimensional matched filters. The outputs of the two matched filters for all iterations are plotted in Figure 1(b), which illustrates the convergence process of WRELAX. At the beginning of the iteration, the peak positions and the corresponding gain estimates obtained from the filter outputs differ from their true values (one gain estimate is larger and the other is smaller than the corresponding true gains). After several steps, they converge to the true time delays and gains.

Fig. 1. Illustrative comparison of WRELAX with the matched filter method. The vertical lines denote the true time delay. (a) The output of the matched filter and (b) the outputs of the two decoupled matched filters with WRELAX for all iterations.

Fig. 2. MSEs ("×") of WRELAX and Cramér-Rao bound (CRB) (solid line) for (a) *τ*<sub>1</sub> and (b) *α*<sub>1</sub>.

In this example, the time delay spacing between the two signals is *τ*<sub>2</sub> − *τ*<sub>1</sub> = 0.5*τ<sub>e</sub>*. The MSEs for the first signal using WRELAX are compared with the corresponding CRBs in Figure 2, and the MSE and CRB curves for the other signal are similar. From Figure 2, it can be noted that the MSEs obtained by using WRELAX approach the corresponding CRBs as the SNR increases.

#### **3.2 Toeplitz property based Weighted Fourier transform and RELAXation (TWRELAX) algorithm**

Here, we extend the above WRELAX algorithm to the case of multiple looks. Two scenarios will be considered: 1) fixed delays but arbitrary gains and 2) fixed delays and fixed gains. In radar applications, the two cases correspond to two different target fluctuation models, e.g., Swerling I and Swerling II (Barton, 1988).

#### **3.2.1 Fixed delays but arbitrary gains**

Consider the case where multiple pulses are transmitted and the ranges of target scatterers remain the same but their gains change randomly during the observation interval.

Let **Y**(*m*) be the DFT of the received vector due to the *m*th pulse. Then

$$\mathbf{Y}^{(m)} = \sum\_{l=1}^{L} \alpha\_{l}^{(m)} [\mathbf{S} \mathbf{a}(\omega\_{l})] + \mathbf{E}^{(m)}, \quad m = 1, 2, \dots, M,\tag{21}$$

where *α*<sub>*l*</sub><sup>(*m*)</sup> denotes the gain of the *l*th scatterer due to the *m*th pulse and the noise vectors {**E**<sup>(*m*)</sup>}<sub>*m*=1</sub><sup>*M*</sup> are assumed independent of each other. Our problem of interest is to estimate {*α*<sub>*l*</sub><sup>(*m*)</sup>, *ω<sub>l</sub>*}<sub>*l*=1,...,*L*; *m*=1,...,*M*</sub> from {**Y**<sup>(*m*)</sup>}<sub>*m*=1</sub><sup>*M*</sup>.

We now extend the WRELAX algorithm to this multiple look case. The extended WRELAX algorithm minimizes the following NLS criterion:

$$C\_{3}\left(\left\{\alpha\_{l}^{(m)},\omega\_{l}\right\}\_{l=1,\ldots,L;\,m=1,\ldots,M}\right) = \sum\_{m=1}^{M} \left\| \mathbf{Y}^{(m)} - \sum\_{l=1}^{L} \alpha\_{l}^{(m)} \mathbf{b}(\omega\_{l}) \right\|^{2}, \tag{22}$$

where **b**(*ωl*) is defined in (12).

Before we present the extended WRELAX algorithm, let us consider the following preparations. Let

$$\mathbf{Y}\_{l}^{(m)} = \mathbf{Y}^{(m)} - \sum\_{i=1, i \neq l}^{L} \hat{\alpha}\_{i}^{(m)} \mathbf{b}(\hat{\omega}\_{i}), \quad m = 1, \ldots, M,\tag{23}$$

where $\{\hat{\alpha}\_i^{(m)}, \hat{\omega}\_i\}\_{i=1,\ldots,L,\,i\neq l;\,m=1,\ldots,M}$ are assumed given. Then the cost function $C\_3(\{\alpha\_l^{(m)}, \omega\_l\}\_{l=1,\ldots,L;\,m=1,\ldots,M})$ becomes

$$C\_{4}(\{\alpha\_{l}^{(m)}\}\_{m=1}^{M},\omega\_{l}) = \sum\_{m=1}^{M} \left\| \mathbf{Y}\_{l}^{(m)} - \alpha\_{l}^{(m)} \mathbf{b}(\omega\_{l}) \right\|^{2}.\tag{24}$$

Minimizing $C\_4(\{\alpha\_l^{(m)}\}\_{m=1}^{M}, \omega\_l)$ with respect to the complex-valued $\{\alpha\_l^{(m)}\}\_{m=1}^{M}$ and $\omega\_l$ yields

$$\hat{\alpha}\_{l}^{(m)} = \left. \frac{\mathbf{a}^{H}(\omega\_{l})(\mathbf{S}^{\*}\mathbf{Y}\_{l}^{(m)})}{\|\mathbf{S}\|\_{F}^{2}} \right|\_{\omega\_{l} = \hat{\omega}\_{l}}, \tag{25}$$

and

$$\hat{\omega}\_{l} = \arg\max\_{\omega\_{l}} \left[ \sum\_{m=1}^{M} \left| \mathbf{a}^{H}(\omega\_{l}) (\mathbf{S}^{\*} \mathbf{Y}\_{l}^{(m)}) \right|^{2} \right]. \tag{26}$$

Minimizing $C\_4(\{\alpha\_l^{(m)}\}\_{m=1}^{M}, \omega\_l)$ with respect to the real-valued $\{\alpha\_l^{(m)}\}\_{m=1}^{M}$ and $\omega\_l$ yields

$$\hat{\alpha}\_l^{(m)} = \left.\frac{\text{Re}\left[\mathbf{a}^H(\omega\_l)(\mathbf{S}^\*\mathbf{Y}\_l^{(m)})\right]}{\|\mathbf{S}\|\_F^2}\right|\_{\omega\_l = \hat{\omega}\_l}, \tag{27}$$

and


$$\hat{\omega}\_{l} = \arg\max\_{\omega\_{l}} \left\{ \sum\_{m=1}^{M} \text{Re}^{2} \left[ \mathbf{a}^{H}(\omega\_{l}) (\mathbf{S}^{\*} \mathbf{Y}\_{l}^{(m)}) \right] \right\}. \tag{28}$$
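The multiple-look estimates (25) and (26) can be illustrated with a small numerical sketch. It rests on several assumptions that are not taken from the chapter: a single scatterer ($L = 1$, so that $\mathbf{Y}\_l^{(m)} = \mathbf{Y}^{(m)}$), noise-free data, a diagonal $\mathbf{S}$ stored as a list, the steering-vector convention of (51), and a plain grid search that stands in for the FFT-based search purely for readability. All function names and test values are hypothetical.

```python
import cmath
import math

def steering(omega, N):
    """a(omega) with entries exp(j*omega*k), k = -N/2, ..., N/2 - 1 (N even)."""
    return [cmath.exp(1j * omega * k) for k in range(-N // 2, N // 2)]

def correlate(omega, S, Y):
    """a^H(omega) (S^* Y) for a diagonal S stored as a list of its diagonal entries."""
    N = len(S)
    a = steering(omega, N)
    return sum(a[k].conjugate() * S[k].conjugate() * Y[k] for k in range(N))

def multilook_estimate(S, looks, grid):
    """Grid-search version of (26), followed by the per-look gains of (25)."""
    def cost(omega):
        # Sum over the M looks of |a^H(omega) S^* Y^(m)|^2.
        return sum(abs(correlate(omega, S, Y)) ** 2 for Y in looks)
    omega_hat = max(grid, key=cost)
    energy = sum(abs(s) ** 2 for s in S)            # ||S||_F^2 for diagonal S
    alphas = [correlate(omega_hat, S, Y) / energy for Y in looks]
    return omega_hat, alphas

# Hypothetical noise-free data: one scatterer observed over three looks.
N = 16
S = [complex(1.0 + 0.1 * k, 0.0) for k in range(N)]   # made-up signal spectrum
grid = [-math.pi + 2.0 * math.pi * i / 512 for i in range(512)]
omega_true = grid[300]
true_alphas = [1.0 + 0.5j, -0.3 + 0.8j, 0.7 - 0.2j]   # gain changes per look
a_true = steering(omega_true, N)
looks = [[alpha * S[k] * a_true[k] for k in range(N)] for alpha in true_alphas]

omega_hat, alphas = multilook_estimate(S, looks, grid)
```

In this noise-free setting the frequency is shared by all looks while each look recovers its own gain, which is the structure that (25) and (26) express. For real-valued gains, (27) and (28) would replace the magnitude by the real part.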

#### **3.2.2 Fixed delays and fixed gains**

When both the delays and gains of the target scatterers remain the same during the multiple look interval, we can derive an ML estimator under the assumption that the noise is zero-mean colored Gaussian with an unknown covariance matrix **Q**. Although we could continue to use the NLS approach for the current problem, we prefer to take the noise statistics into account, since we will show below that doing so introduces little difficulty in this case for sufficiently large *M*. For the former problems, modeling the noise with an unknown covariance matrix **Q** makes the ML approach ill-defined due to too many unknowns (Li et al. b, 2002).

Let $\mathbf{Y}^{(m)}$ be the DFT of the received data vector due to the $m$th pulse, which can be written as

$$\mathbf{Y}^{(m)} = \sum\_{l=1}^{L} \alpha\_l \mathbf{b}(\omega\_l) + \mathbf{E}^{(m)}, \quad m = 1, \dots, M,\tag{29}$$

where the noise vectors $\{\mathbf{E}^{(m)}\}\_{m=1}^{M}$ are assumed to be mutually independent zero-mean colored Gaussian random vectors with an unknown covariance matrix **Q**. Let

$$\mathbf{b} = [\mathbf{b}(\omega\_1)\ \mathbf{b}(\omega\_2)\cdots\mathbf{b}(\omega\_L)],\tag{30}$$

and

$$\boldsymbol{\alpha} = \begin{bmatrix} \boldsymbol{\alpha}\_1 \ \boldsymbol{\alpha}\_2 \ \cdots \ \boldsymbol{\alpha}\_L \end{bmatrix}^T. \tag{31}$$

Then

$$\mathbf{Y}^{(m)} = \mathbf{b}\boldsymbol{\alpha} + \mathbf{E}^{(m)}, \quad m = 1, 2, \dots, M. \tag{32}$$

The log-likelihood function of $\{\mathbf{Y}^{(m)}\}\_{m=1}^{M}$ is proportional to (within an additive constant):

$$-\ln[\det(\mathbf{Q})] - \text{tr}\left\{\mathbf{Q}^{-1}\frac{1}{M}\sum\_{m=1}^{M}\left[\mathbf{Y}^{(m)} - \mathbf{b}\boldsymbol{\alpha}\right]\left[\mathbf{Y}^{(m)} - \mathbf{b}\boldsymbol{\alpha}\right]^{H}\right\},\tag{33}$$

where det(·) denotes the determinant of a matrix and tr(·) denotes the trace of a matrix. Consider first the estimate of **Q** and the unstructured estimate of $\mathbf{C} = \mathbf{b}\boldsymbol{\alpha}$. It is easy to show that the estimate $\hat{\mathbf{Q}}$ of **Q** is

$$\hat{\mathbf{Q}} = \frac{1}{M} \sum\_{m=1}^{M} \left[ \mathbf{Y}^{(m)} - \mathbf{C} \right] \left[ \mathbf{Y}^{(m)} - \mathbf{C} \right]^{H} \tag{34}$$

where $\hat{\mathbf{C}}$ may be obtained by minimizing the following cost function:

$$C\_{5} = \det\left[\frac{1}{M} \sum\_{m=1}^{M} \left(\mathbf{Y}^{(m)} - \mathbf{C}\right) \left(\mathbf{Y}^{(m)} - \mathbf{C}\right)^{H}\right].\tag{35}$$

Let

$$
\hat{\mathbf{R}}\_{Y1} = \frac{1}{M} \sum\_{m=1}^{M} \mathbf{Y}^{(m)},\tag{36}
$$

and

$$\hat{\mathbf{R}}\_{YY} = \frac{1}{M} \sum\_{m=1}^{M} \mathbf{Y}^{(m)} (\mathbf{Y}^{(m)})^H. \tag{37}$$

Then

$$\begin{split} \mathbf{G} &= \frac{1}{M} \sum\_{m=1}^{M} \left[ \mathbf{Y}^{(m)} - \mathbf{C} \right] \left[ \mathbf{Y}^{(m)} - \mathbf{C} \right]^{H} \\ &= \hat{\mathbf{R}}\_{YY} - \mathbf{C} \hat{\mathbf{R}}\_{Y1}^{H} - \hat{\mathbf{R}}\_{Y1} \mathbf{C}^{H} + \mathbf{C} \mathbf{C}^{H} \\ &= \left[ \mathbf{C} - \hat{\mathbf{R}}\_{Y1} \right] \left[ \mathbf{C} - \hat{\mathbf{R}}\_{Y1} \right]^{H} + \hat{\mathbf{R}}\_{YY} - \hat{\mathbf{R}}\_{Y1} \hat{\mathbf{R}}\_{Y1}^{H}. \end{split} \tag{38}$$

To minimize det(**G**), we have

$$
\hat{\mathbf{C}} = \hat{\mathbf{R}}\_{Y1}.\tag{39}
$$

Then using the $\hat{\mathbf{C}}$ in (39) to replace the **C** in (34) yields

$$
\hat{\mathbf{Q}} = \hat{\mathbf{R}}\_{YY} - \hat{\mathbf{R}}\_{Y1} \hat{\mathbf{R}}\_{Y1}^H. \tag{40}
$$
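As a numerical check of the step from (34) to (40), the following sketch computes the covariance estimate both directly, with $\mathbf{C}$ replaced by the sample mean of (39), and through the compact form $\hat{\mathbf{R}}\_{YY} - \hat{\mathbf{R}}\_{Y1}\hat{\mathbf{R}}\_{Y1}^{H}$. The helper names and the four data vectors are made up for illustration, and vectors are stored as plain Python lists.

```python
def outer(u, v):
    """u v^H for two complex vectors stored as lists."""
    return [[ui * vj.conjugate() for vj in v] for ui in u]

def mat_add(A, B, sign=1):
    """Entrywise A + sign * B."""
    return [[a + sign * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mat_scale(A, c):
    return [[c * a for a in row] for row in A]

def sample_mean(looks):
    """R_Y1 of (36), which is also the unstructured estimate of C in (39)."""
    M, N = len(looks), len(looks[0])
    return [sum(Y[k] for Y in looks) / M for k in range(N)]

def covariance_direct(looks, C):
    """(34): (1/M) * sum over m of (Y - C)(Y - C)^H."""
    N = len(C)
    Q = [[0j] * N for _ in range(N)]
    for Y in looks:
        d = [y - c for y, c in zip(Y, C)]
        Q = mat_add(Q, outer(d, d))
    return mat_scale(Q, 1.0 / len(looks))

def covariance_compact(looks):
    """(40): R_YY - R_Y1 R_Y1^H, with R_YY from (37)."""
    N = len(looks[0])
    R_YY = [[0j] * N for _ in range(N)]
    for Y in looks:
        R_YY = mat_add(R_YY, outer(Y, Y))
    R_YY = mat_scale(R_YY, 1.0 / len(looks))
    R_Y1 = sample_mean(looks)
    return mat_add(R_YY, outer(R_Y1, R_Y1), sign=-1)

# Hypothetical data: M = 4 looks of length N = 3.
looks = [
    [1 + 2j, 0.5 - 1j, -1j],
    [0.2 + 0j, 1 + 1j, 2 - 0.5j],
    [-1 + 0.3j, 0.4j, 1 + 0j],
    [0.7 - 0.7j, -2 + 1j, 0.1 + 0.9j],
]
Q_direct = covariance_direct(looks, sample_mean(looks))
Q_compact = covariance_compact(looks)
max_diff = max(abs(Q_direct[i][j] - Q_compact[i][j])
               for i in range(3) for j in range(3))
```

The two forms agree to rounding error, so only the two running sums of (36) and (37) need to be accumulated over the looks.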

With these notations, the above $C\_5$ can be rewritten as

$$C\_{5} = \det\left[\hat{\mathbf{R}}\_{YY} - \mathbf{C}\hat{\mathbf{R}}\_{Y1}^{H} - \hat{\mathbf{R}}\_{Y1}\mathbf{C}^{H} + \mathbf{C}\mathbf{C}^{H}\right] = \det\left[\hat{\mathbf{R}}\_{YY} - \hat{\mathbf{R}}\_{Y1}\hat{\mathbf{R}}\_{Y1}^{H} + (\mathbf{C} - \hat{\mathbf{C}})(\mathbf{C} - \hat{\mathbf{C}})^{H}\right]$$

$$= \det(\hat{\mathbf{Q}})\det\left[\mathbf{I} + \hat{\mathbf{Q}}^{-1}(\mathbf{C} - \hat{\mathbf{C}})(\mathbf{C} - \hat{\mathbf{C}})^{H}\right] = \det(\hat{\mathbf{Q}})\left[1 + (\mathbf{C} - \hat{\mathbf{C}})^{H}\hat{\mathbf{Q}}^{-1}(\mathbf{C} - \hat{\mathbf{C}})\right],\tag{41}$$

where we have used the fact that det(**I** + **ab**) = det(**I** + **ba**) if the dimensions of **a** and **b** permit. Hence minimizing $C\_5$ is equivalent to minimizing

$$C\_{6}(\{\alpha\_{l},\omega\_{l}\}\_{l=1}^{L}) = \left[\mathbf{C} - \hat{\mathbf{C}}\right]^{H} \hat{\mathbf{Q}}^{-1} \left[\mathbf{C} - \hat{\mathbf{C}}\right] = \left[\mathbf{b}\boldsymbol{\alpha} - \hat{\mathbf{C}}\right]^{H} \hat{\mathbf{Q}}^{-1} \left[\mathbf{b}\boldsymbol{\alpha} - \hat{\mathbf{C}}\right],\tag{42}$$

which is again a highly nonlinear optimization problem.
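The determinant identity det(**I** + **ab**) = det(**I** + **ba**) used in (41) reduces an $N \times N$ determinant to a scalar when **a** is a column and **b** is a row. In (41) the column plays the role of $\hat{\mathbf{Q}}^{-1}(\mathbf{C} - \hat{\mathbf{C}})$ and the row that of $(\mathbf{C} - \hat{\mathbf{C}})^{H}$. The small check below uses arbitrary made-up vectors of length 3 and a hand-written $3 \times 3$ determinant; it is only a sanity check, not part of the algorithm.

```python
def det3(A):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    return (A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
            - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
            + A[0][2] * (A[1][0] * A[2][1] - A[1][1] * A[2][0]))

a = [1 + 1j, -0.5j, 2.0]       # column vector (3 x 1), arbitrary values
b = [0.5, 1 - 2j, -1j]         # row vector (1 x 3), arbitrary values

# Left-hand side: det(I_3 + a b), a full 3 x 3 determinant.
big = [[(1 if i == j else 0) + a[i] * b[j] for j in range(3)] for i in range(3)]
lhs = det3(big)

# Right-hand side: det(I_1 + b a) = 1 + b a, a scalar.
rhs = 1 + sum(b[i] * a[i] for i in range(3))
```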

We consider below using the relaxation based approach to minimize $C\_6(\{\alpha\_l, \omega\_l\}\_{l=1}^{L})$. Let

$$\hat{\mathbf{C}}\_l = \hat{\mathbf{C}} - \sum\_{i=1, i \neq l}^{L} \hat{\alpha}\_i \mathbf{b}(\hat{\omega}\_i), \tag{43}$$

where $\{\hat{\alpha}\_i, \hat{\omega}\_i\}\_{i=1, i\neq l}^{L}$ are assumed given. Then minimizing $C\_6$ becomes minimizing

$$C\_{7}(\alpha\_{l},\omega\_{l}) = \left[\hat{\mathbf{C}}\_{l} - \alpha\_{l}\mathbf{b}(\omega\_{l})\right]^{H}\hat{\mathbf{Q}}^{-1}\left[\hat{\mathbf{C}}\_{l} - \alpha\_{l}\mathbf{b}(\omega\_{l})\right].\tag{44}$$

Consider first the case of complex-valued $\{\alpha\_l\}\_{l=1}^{L}$. Minimizing $C\_7(\alpha\_l, \omega\_l)$ with respect to $\alpha\_l$ and $\omega\_l$ yields

$$\hat{\alpha}\_{l} = \left.\frac{\mathbf{b}^{H}(\omega\_{l})\hat{\mathbf{Q}}^{-1}\hat{\mathbf{C}}\_{l}}{\mathbf{b}^{H}(\omega\_{l})\hat{\mathbf{Q}}^{-1}\mathbf{b}(\omega\_{l})}\right|\_{\omega\_{l}=\hat{\omega}\_{l}} = \left.\frac{\mathbf{a}^{H}(\omega\_{l})\left(\mathbf{S}^{\*}\hat{\mathbf{Q}}^{-1}\hat{\mathbf{C}}\_{l}\right)}{\left\|\hat{\mathbf{Q}}^{-\frac{1}{2}}\mathbf{S}\mathbf{a}(\omega\_{l})\right\|^{2}}\right|\_{\omega\_{l}=\hat{\omega}\_{l}},\tag{45}$$

and


$$\hat{\omega}\_{l} = \arg\max\_{\omega\_{l}} \frac{\left| \mathbf{a}^{H}(\omega\_{l}) \mathbf{S}^{\*} \hat{\mathbf{Q}}^{-1} \hat{\mathbf{C}}\_{l} \right|^{2}}{\left\| \hat{\mathbf{Q}}^{-\frac{1}{2}} \mathbf{S} \mathbf{a}(\omega\_{l}) \right\|^{2}}. \tag{46}$$

Consider next the case of real-valued $\{\alpha\_l\}\_{l=1}^{L}$. Minimizing $C\_7(\alpha\_l, \omega\_l)$ with respect to $\alpha\_l$ and $\omega\_l$ yields

$$\hat{\alpha}\_{l} = \left.\frac{\text{Re}\left[\mathbf{a}^{H}(\omega\_{l})\mathbf{S}^{\*}\hat{\mathbf{Q}}^{-1}\hat{\mathbf{C}}\_{l}\right]}{\left\|\hat{\mathbf{Q}}^{-\frac{1}{2}}\mathbf{S}\mathbf{a}(\omega\_{l})\right\|^{2}}\right|\_{\omega\_{l} = \hat{\omega}\_{l}},\tag{47}$$

and

$$\hat{\omega}\_{l} = \arg\max\_{\omega\_{l}} \frac{\text{Re}^{2} \left[ \mathbf{a}^{H}(\omega\_{l}) \mathbf{S}^{\*} \hat{\mathbf{Q}}^{-1} \hat{\mathbf{C}}\_{l} \right]}{\left\| \hat{\mathbf{Q}}^{-\frac{1}{2}} \mathbf{S} \mathbf{a}(\omega\_{l}) \right\|^{2}}. \tag{48}$$
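For readers who want the intermediate step, the estimates (45)-(48) can be recovered by completing the square in (44). The sketch below introduces the shorthand $z$ and $D$, which does not appear in the chapter, and assumes that $\hat{\mathbf{Q}}$ is Hermitian positive definite and that $\mathbf{S}$ is diagonal, as the notation $\mathbf{S}^{\*}$ suggests. Write $z = \mathbf{b}^{H}(\omega\_l)\hat{\mathbf{Q}}^{-1}\hat{\mathbf{C}}\_l = \mathbf{a}^{H}(\omega\_l)\mathbf{S}^{\*}\hat{\mathbf{Q}}^{-1}\hat{\mathbf{C}}\_l$ and $D = \mathbf{b}^{H}(\omega\_l)\hat{\mathbf{Q}}^{-1}\mathbf{b}(\omega\_l) = \|\hat{\mathbf{Q}}^{-\frac{1}{2}}\mathbf{S}\mathbf{a}(\omega\_l)\|^{2}$. Then, for complex-valued $\alpha\_l$,

$$\begin{split} C\_{7}(\alpha\_{l},\omega\_{l}) &= \hat{\mathbf{C}}\_{l}^{H}\hat{\mathbf{Q}}^{-1}\hat{\mathbf{C}}\_{l} - \alpha\_{l}^{\*} z - \alpha\_{l} z^{\*} + |\alpha\_{l}|^{2} D \\ &= D\left|\alpha\_{l} - \frac{z}{D}\right|^{2} + \hat{\mathbf{C}}\_{l}^{H}\hat{\mathbf{Q}}^{-1}\hat{\mathbf{C}}\_{l} - \frac{|z|^{2}}{D}. \end{split}$$

The first term vanishes for $\alpha\_l = z/D$, which is (45), and the remaining terms are minimized over $\omega\_l$ by maximizing $|z|^{2}/D$, which is (46). For real-valued $\alpha\_l$ the same expansion gives

$$C\_{7}(\alpha\_{l},\omega\_{l}) = D\left(\alpha\_{l} - \frac{\text{Re}(z)}{D}\right)^{2} + \hat{\mathbf{C}}\_{l}^{H}\hat{\mathbf{Q}}^{-1}\hat{\mathbf{C}}\_{l} - \frac{\text{Re}^{2}(z)}{D},$$

so that $\hat{\alpha}\_l = \text{Re}(z)/D$ and $\hat{\omega}\_l$ maximizes $\text{Re}^{2}(z)/D$, in agreement with (47) and (48).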

In the above derivations, $\hat{\mathbf{Q}}^{-1}$ plays the role of whitening the noise. A good estimate of **Q** requires a large number of independent data vectors (i.e., *M* should be large compared with *N*). When *M* is small, the noise covariance matrix estimated from (40) is singular or nearly singular. At least *N* data vectors are needed to guarantee that the matrix $\hat{\mathbf{Q}}$ is non-singular with probability one.

Usually, the receiver noise *e*(*t*) in (1) can be modeled as a zero-mean stationary and ergodic Gaussian stochastic process. Let the covariance matrix corresponding to the sampled noise vector $[e(0),\ e(T\_s)\ \cdots\ e((N-1)T\_s)]^{T}$ be $\mathbf{Q}\_t$, where the subscript "*t*" indicates that this is the covariance matrix of the noise in the time domain. Then $\mathbf{Q}\_t$ is a Hermitian Toeplitz matrix. The frequency domain noise covariance matrix **Q** is related to $\mathbf{Q}\_t$ as follows:

$$\mathbf{Q} = \gamma \mathbf{Q}\_t \gamma^H, \tag{49}$$


where γ is the DFT matrix,

$$\gamma = \frac{1}{\sqrt{N}} \left[ \mathbf{a}(-\pi) \,\,\mathbf{a}(-\pi + 2\pi/N) \,\cdots \,\mathbf{a}(\pi - 2\pi/N) \right]^H,\tag{50}$$

and

$$\mathbf{a}(\omega) = \begin{bmatrix} e^{j\omega(-N/2)} \ e^{j\omega(-N/2+1)} \cdots \ e^{j\omega(N/2-1)} \end{bmatrix}^T. \tag{51}$$

It can be shown that, in general, **Q** is no longer a Toeplitz matrix. However, we can use the Toeplitz property of $\mathbf{Q}\_t$ to improve the estimation performance. First, we obtain $\hat{\mathbf{Q}}$ by using (40). Then the estimate $\hat{\mathbf{Q}}\_t$ of $\mathbf{Q}\_t$ follows from (49) as

$$\hat{\mathbf{Q}}\_{t} = \gamma^{H} \hat{\mathbf{Q}} \,\gamma.\tag{52}$$
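The inversion of (49) into (52) relies on the DFT matrix of (50) being unitary. The sketch below builds $\gamma$ from the steering vector (51) for a small even $N$, checks $\gamma\gamma^{H} = \mathbf{I}$, and verifies that mapping a Toeplitz matrix to the frequency domain and back returns the original. The test covariance, with entries $(-0.85)^{|i-j|}$, and all helper names are hypothetical choices made only for this illustration.

```python
import cmath
import math

def steering(omega, N):
    """a(omega) of (51): entries exp(j*omega*k), k = -N/2, ..., N/2 - 1."""
    return [cmath.exp(1j * omega * k) for k in range(-N // 2, N // 2)]

def dft_matrix(N):
    """gamma of (50): row i is a^H(omega_i) / sqrt(N), omega_i = -pi + 2*pi*i/N."""
    rows = []
    for i in range(N):
        omega = -math.pi + 2.0 * math.pi * i / N
        rows.append([v.conjugate() / math.sqrt(N) for v in steering(omega, N)])
    return rows

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def hermitian(A):
    """Conjugate transpose of a complex matrix."""
    return [[A[j][i].conjugate() for j in range(len(A))] for i in range(len(A[0]))]

def max_abs_diff(A, B):
    return max(abs(A[i][j] - B[i][j])
               for i in range(len(A)) for j in range(len(A[0])))

N = 4
gamma = dft_matrix(N)
identity = [[1.0 if i == j else 0.0 for j in range(N)] for i in range(N)]

# Hypothetical time-domain Toeplitz covariance used only as test input.
Q_t = [[(-0.85) ** abs(i - j) for j in range(N)] for i in range(N)]

Q = matmul(matmul(gamma, Q_t), hermitian(gamma))           # (49)
Q_t_back = matmul(matmul(hermitian(gamma), Q), gamma)      # (52)

unitarity_error = max_abs_diff(matmul(gamma, hermitian(gamma)), identity)
round_trip_error = max_abs_diff(Q_t_back, Q_t)
```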

Due to a finite number of data vectors, $\hat{\mathbf{Q}}\_t$ is no longer a Toeplitz matrix. Although there are many ways to modify $\hat{\mathbf{Q}}\_t$ to obtain a Toeplitz matrix $\hat{\mathbf{Q}}\_t^{(T)}$, in this paper we use the following simple approach. Let $\hat{q}\_t(i, j)$ be the $(i, j)$th element of $\hat{\mathbf{Q}}\_t$. Define

$$\hat{r}(k) = \frac{1}{N-k} \sum\_{i=1}^{N-k} \hat{q}\_t(i, i+k), \quad k = 0, 1, \cdots, N-1. \tag{53}$$

Then

$$
\hat{\mathbf{Q}}\_t^{(T)} = \begin{bmatrix}
\hat{r}(0) & \hat{r}(1) & \cdots & \hat{r}(N-1) \\
\hat{r}^\*(1) & \ddots & \ddots & \vdots \\
\vdots & \ddots & \ddots & \hat{r}(1) \\
\hat{r}^\*(N-1) & \cdots & \hat{r}^\*(1) & \hat{r}(0)
\end{bmatrix}.\tag{54}$$

Using $\hat{\mathbf{Q}}^{(T)}$ instead of $\hat{\mathbf{Q}}$ in (45)-(48), where

$$
\hat{\mathbf{Q}}^{(T)} = \gamma \hat{\mathbf{Q}}\_t^{(T)} \gamma^H,\tag{55}
$$

we obtain a new algorithm referred to as TWRELAX. The TWRELAX algorithm can greatly improve the estimation performance of WRELAX, especially when *M* is small as compared with *N*, as can be seen from the numerical examples below.
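The Toeplitz step of TWRELAX, (53) and (54), amounts to averaging each upper diagonal of the time-domain estimate and rebuilding a Hermitian Toeplitz matrix from the averages. A minimal sketch follows, using 0-based indices instead of the 1-based indices of (53); the function name and the small test matrix are hypothetical.

```python
def toeplitz_average(Q_t):
    """Average the k-th upper diagonal of Q_t as in (53), then build (54)."""
    N = len(Q_t)
    # r[k] is the mean of the entries q_t(i, i + k), i = 0, ..., N - k - 1.
    r = [sum(Q_t[i][i + k] for i in range(N - k)) / (N - k) for k in range(N)]
    # Upper triangle uses r(j - i); lower triangle uses its complex conjugate.
    return [[r[j - i] if j >= i else r[i - j].conjugate() for j in range(N)]
            for i in range(N)]

# Hypothetical Hermitian estimate that is close to, but not exactly, Toeplitz.
noisy = [
    [2.2, 1 + 1j, 0.5],
    [1 - 1j, 1.8, 0.8 + 1.2j],
    [0.5, 0.8 - 1.2j, 2.0],
]
T = toeplitz_average(noisy)
```

A matrix that is already Hermitian Toeplitz passes through this function unchanged, so the step only removes the deviations caused by the finite number of looks. The frequency-domain matrix of (55) is then obtained by applying $\gamma$ and $\gamma^{H}$ to the result.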

For notational convenience, we denote the new algorithms with and without exploiting the Toeplitz property of the noise covariance matrix by TWRELAX and WRELAX, respectively. Here, the time delay spacing between the two signals is $\tau\_2 - \tau\_1 = 0.5\tau\_e$. We have used $\epsilon = 0.001$ to test the convergence of TWRELAX and WRELAX. The colored noise is modeled as a first-order autoregressive (AR) process with coefficient $a\_1 = -0.85$.

The MSEs of TWRELAX ("◦") and WRELAX ("×") are compared with the corresponding CRBs (solid line) in Figures 3 and 4 as a function of the SNR and the normalized look number $\log\_2(M/N)$, respectively. We fix *M* = 4*N* in Figure 3 and SNR = −5 dB in Figure 4. From Figure 3, it can be noted that, when *M* is sufficiently large, both TWRELAX and WRELAX can approach the CRBs for a wide range of SNRs and the former performs slightly better than the latter. However, when *M* is small as compared to *N*, TWRELAX outperforms WRELAX significantly, as can be seen from Figure 4. For this example, when *M* = *N*/2, the MSEs of TWRELAX are very close to the CRBs, while *M* = 4*N* is required before the MSEs of WRELAX approach the CRBs. Note also that in Figure 4, due to the inversion of poorly estimated noise covariance matrices, some points of the MSE curves of WRELAX are beyond the scope of the axis limits.

Fig. 3. MSEs of WRELAX ("×"), TWRELAX ("◦"), and CRBs (solid line) for (a) $\tau\_1$, (b) $\alpha\_1$ as a function of SNR with *M* = 4*N*.

Fig. 4. MSEs of WRELAX ("×"), TWRELAX ("◦"), and CRBs (solid line) for (a) $\tau\_1$, (b) $\alpha\_1$ as a function of the normalized look number $\log\_2(M/N)$ with SNR = −5 dB.

#### **3.3 Hybrid-WRELAX algorithm and EXtended Invariance Principle Based WRELAX (EXIP-WRELAX) algorithm**

When bandpass real-valued probe signals are used, the correlation function between the received and the known transmitted signals oscillates near the carrier frequency of the transmitted signal. In this case, many existing time delay estimation algorithms perform poorly because they converge to local optima. Here, two efficient algorithms are proposed to deal with this problem. First we assume that the signal amplitudes are complex-valued and use WRELAX to obtain the initial estimates of the delays and the amplitudes of the superimposed signals by minimizing a much smoother NLS cost function. Then the initial estimates are refined with two approaches. One approach (referred to as Hybrid-WRELAX) uses the last step of the WRELAX algorithm to minimize the true NLS cost function corresponding to the real-valued signal amplitudes. The other approach (referred to as EXIP-WRELAX) uses the extended invariance principle (EXIP). For Hybrid-WRELAX, the refinement step is iterative, while it is not for EXIP-WRELAX.

Real-valued signals are often bandpass signals that occur, for example, in underwater sonar and ultra wideband ground penetrating radar applications. Bandpass signals have highly oscillatory correlation functions, which makes the super resolution time delay estimation problem more difficult. The larger the center frequency of the pass band, the sharper the oscillation of the correlation function.

The same data model as in Section 2 is adopted. However, the transmitted signal $s(t)$ and the received signal $y(t)$ are real-valued, and the gains $\{\alpha\_l\}\_{l=1}^{L}$ and the noise $e(t)$ are also real-valued.

#### **3.3.1 Hybrid-WRELAX**

Since both the transmitted signal *s*(*t*) and the received signal *y*(*t*) are real-valued, their Fourier transforms are conjugate symmetric, i.e., *Y*(−*k*) = *Y*∗(*k*) and *S*(−*k*) = *S*∗(*k*), *k* = 1, 2, ··· , *N*/2 − 1, where (·)<sup>∗</sup> denotes the complex conjugate, and *Y*(−*N*/2), *Y*(0), *S*(−*N*/2), and *S*(0) are real-valued. It can be readily shown that the cost function (6) is equivalent to

$$C\_{8}(\{\alpha\_{l},\omega\_{l}\}\_{l=1}^{L}) = \sum\_{k=-N/2}^{0} W^{2}(k) \left| Y(k) - S(k) \sum\_{l=1}^{L} \alpha\_{l} e^{j\omega\_{l}k} \right|^{2},\tag{56}$$

where $W(k) = 1$ for $k = -N/2+1, \cdots, -1$ and $W(-N/2) = W(0) = 1/\sqrt{2}$. We assume that $e(nT\_s)$ is a real-valued zero-mean white Gaussian random process with variance $\sigma^2$. Then $E(k)$ is not a circularly symmetric complex-valued zero-mean white Gaussian random process, since $E(-k) = E^{\*}(k)$, $k = 1, 2, \cdots, N/2 - 1$. (The circular symmetry assumption on the noise is widely used in the literature (Stoica & Moses, 1997).)
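The role of the end weights $1/\sqrt{2}$ can be checked numerically. The sketch below assumes that the full-band cost (6) is the unweighted sum over $k = -N/2, \cdots, N/2-1$ and that $Y(k)$ and $S(k)$ are plain DFTs of the real-valued samples (both are assumptions, since (6) is defined elsewhere in the chapter). Under these assumptions the weighted half-band cost (56) should equal the full-band cost up to the constant factor 2 whenever the gains are real-valued. The function names are illustrative only.

```python
import numpy as np

def centered_dft(x):
    """DFT of x with bins reordered to k = -N/2, ..., N/2 - 1 (N even)."""
    return np.fft.fftshift(np.fft.fft(x))

def model(S, alphas, omegas, k):
    """S(k) * sum_l alpha_l * exp(j * omega_l * k) on the bin grid k."""
    return S * sum(a * np.exp(1j * w * k) for a, w in zip(alphas, omegas))

def full_band_cost(Y, S, alphas, omegas):
    """Assumed full-band NLS cost: unweighted sum over k = -N/2, ..., N/2 - 1."""
    N = len(Y)
    k = np.arange(-(N // 2), N // 2)
    return np.sum(np.abs(Y - model(S, alphas, omegas, k)) ** 2)

def half_band_cost(Y, S, alphas, omegas):
    """Weighted half-band cost (56): k = -N/2, ..., 0, end weights 1/sqrt(2)."""
    N = len(Y)
    k = np.arange(-(N // 2), 1)
    W = np.ones(k.size)
    W[0] = W[-1] = 1.0 / np.sqrt(2.0)
    Yh, Sh = Y[: k.size], S[: k.size]   # bins -N/2, ..., 0 of the centered DFT
    return np.sum(W ** 2 * np.abs(Yh - model(Sh, alphas, omegas, k)) ** 2)

rng = np.random.default_rng(0)
N = 64
Y = centered_dft(rng.standard_normal(N))   # real-valued "received" samples
S = centered_dft(rng.standard_normal(N))   # real-valued "transmitted" samples
alphas, omegas = [1.0, -0.4], [0.3, 1.1]   # real gains, arbitrary omegas
ratio = full_band_cost(Y, S, alphas, omegas) / half_band_cost(Y, S, alphas, omegas)
```

With complex-valued gains the residual is no longer conjugate symmetric, so the two costs are in general no longer proportional.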

The cost function $C\_8(\{\alpha\_l, \omega\_l\}\_{l=1}^{L})$ in (56) with $\{\alpha\_l\}\_{l=1}^{L}$ being real-valued is referred to as the true cost function. Minimizing $C\_8(\{\alpha\_l, \omega\_l\}\_{l=1}^{L})$ with respect to the unknown parameters is a highly nonlinear optimization problem. For narrowband transmitted signals, the cost function is highly oscillatory and has numerous closely spaced local minima, which makes it very difficult to find the global minimum. By assuming the real-valued amplitudes $\{\alpha\_l\}\_{l=1}^{L}$ to be complex-valued, a much smoother cost function can be obtained. This is equivalent to formulating the original time delay estimation problem in its complex analytic signal form. Since the analytic signal of the transmitted signal is lowpass, its autocorrelation function is no longer oscillatory. This is the conventional complex demodulation process and is widely used in practice. Although it is much easier to find the global minimum of the cost function corresponding to complex-valued amplitudes, the estimates so obtained can be much less accurate than those obtained by minimizing the true cost function. The two cost functions share the same global minimum only when there is no noise. However, as suggested in (Manickam et al., 1994; Vaccaro et al., 1992), we can minimize the cost function associated with complex-valued amplitudes to obtain the initial conditions needed to minimize the true cost function. Below, we present a relaxation-based global minimizer of the NLS criterion based on this idea. The algorithm is referred to as the Hybrid-WRELAX algorithm. It requires only a sequence of weighted Fourier transforms.

Before we present our approach, let us consider the following preparations. Let

$$\begin{split} \bar{\mathbf{W}} &= \text{diag}\left\{ W(-N/2), W(-N/2+1), \dots, W(-1), W(0) \right\} \\ &= \text{diag}\left\{ \frac{1}{\sqrt{2}}, 1, \dots, 1, \frac{1}{\sqrt{2}} \right\}, \end{split} \tag{57}$$

$$\bar{\mathbf{Y}} = \bar{\mathbf{W}} \left[ Y(-N/2) \ Y(-N/2+1) \ \cdots \ Y(0) \right]^T,\tag{58}$$

$$\bar{\mathbf{S}} = \bar{\mathbf{W}} \, \text{diag}\left\{ S(-N/2), S(-N/2+1), \dots, S(0) \right\},\tag{59}$$

and


$$\bar{\mathbf{a}}(\omega\_l) = \begin{bmatrix} e^{j\omega\_l(-N/2)} \ e^{j\omega\_l(-N/2+1)} \ \cdots \ 1 \end{bmatrix}^T. \tag{60}$$

Denote

$$\bar{\mathbf{Y}}\_{l} = \bar{\mathbf{Y}} - \sum\_{i=1, i \neq l}^{L} \hat{\alpha}\_{i} [\bar{\mathbf{S}} \bar{\mathbf{a}}(\hat{\omega}\_{i})],\tag{61}$$

where $\{\hat{\alpha}\_i, \hat{\omega}\_i\}\_{i=1, i \neq l}^{L}$ are assumed to be given. Let

$$\bar{\mathbf{b}}(\omega\_l) = \bar{\mathbf{S}}\bar{\mathbf{a}}(\omega\_l), \quad l = 1, 2, \cdots, L. \tag{62}$$

Then (56) becomes

$$C\_{9}(\alpha\_{l}, \omega\_{l}) = \left\| \bar{\mathbf{Y}}\_{l} - \alpha\_{l} \bar{\mathbf{b}}(\omega\_{l}) \right\|^{2},\tag{63}$$

where $\|\cdot\|$ denotes the Euclidean norm. Minimizing $C\_9(\alpha\_l, \omega\_l)$ with respect to the real-valued $\alpha\_l$ yields the estimate $\hat{\alpha}\_l$ of $\alpha\_l$:

$$\begin{split} \hat{\alpha}\_{l} &= \frac{\text{Re}\left[\bar{\mathbf{b}}^{H}(\omega\_{l})\bar{\mathbf{Y}}\_{l}\right]}{\bar{\mathbf{b}}^{H}(\omega\_{l})\bar{\mathbf{b}}(\omega\_{l})} \\ &= \frac{\text{Re}\left[\bar{\mathbf{a}}^{H}(\omega\_{l})(\bar{\mathbf{S}}^{\*}\bar{\mathbf{Y}}\_{l})\right]}{\|\bar{\mathbf{S}}\|\_{F}^{2}}, \end{split} \tag{64}$$

where $(\cdot)^{H}$ denotes the conjugate transpose, $\text{Re}(\mathbf{Z})$ represents the real part of $\mathbf{Z}$, and $\|\cdot\|\_{F}$ denotes the *Frobenius* norm (Stewart, 1973). (More specifically, $\|\bar{\mathbf{S}}\|\_{F}^{2} = \sum\_{n=-N/2}^{0} |W(n)S(n)|^{2}$.) Then the estimate $\hat{\omega}\_l$ of $\omega\_l$ is obtained as follows:

$$\begin{split} \hat{\omega}\_{l} &= \arg\min\_{\omega\_{l}} \left\lVert \bar{\mathbf{Y}}\_{l} - \frac{\mathrm{Re}\left[\bar{\mathbf{b}}^{H}(\omega\_{l})\bar{\mathbf{Y}}\_{l}\right]}{\bar{\mathbf{b}}^{H}(\omega\_{l})\bar{\mathbf{b}}(\omega\_{l})}\bar{\mathbf{b}}(\omega\_{l}) \right\rVert^{2} \\ &= \arg\max\_{\omega\_{l}} \mathrm{Re}^{2}\left[\bar{\mathbf{a}}^{H}(\omega\_{l})(\bar{\mathbf{S}}^{\*}\bar{\mathbf{Y}}\_{l})\right], \end{split} \tag{65}$$


where we have used the fact that $\bar{\mathbf{b}}^{H}(\omega\_l)\bar{\mathbf{b}}(\omega\_l) = \sum\_{n=-N/2}^{0} |W(n)S(n)|^{2}$ is independent of $\omega\_l$. Hence $\hat{\omega}\_l$ is obtained as the location of the dominant peak of $\text{Re}^{2}\left[\bar{\mathbf{a}}^{H}(\omega\_l)(\bar{\mathbf{S}}^{\*}\bar{\mathbf{Y}}\_l)\right]$. With the estimate of $\omega\_l$ at hand, $\hat{\alpha}\_l$ is easily computed from the corresponding complex height by replacing $\omega\_l$ with $\hat{\omega}\_l$ in (64).
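As an illustration of (64) and (65), the following minimal sketch estimates the parameters of a single signal with a real-valued gain by evaluating the peak criterion on a user-supplied grid of candidate values of $\omega\_l$. The grid search and the function and argument names are assumptions of this sketch; the chapter itself locates the peak with FFTs.

```python
import numpy as np

def estimate_real_amplitude(Y_l, S_diag, omega_grid):
    """Grid-search version of (64)-(65) for one signal with a real-valued gain.

    Y_l        : weighted residual vector for the bins k = -N/2, ..., 0
    S_diag     : diagonal of the weighted spectrum matrix (same bins)
    omega_grid : candidate values of omega_l (hypothetical search grid)
    """
    k = np.arange(-(len(Y_l) - 1), 1)                  # bin indices -N/2, ..., 0
    z = np.conj(S_diag) * Y_l                          # S^* Y_l
    corr = np.exp(-1j * np.outer(omega_grid, k)) @ z   # a^H(omega) (S^* Y_l)
    i = np.argmax(corr.real ** 2)                      # peak of Re^2[.], eq. (65)
    energy = np.sum(np.abs(S_diag) ** 2)               # squared Frobenius norm
    return omega_grid[i], corr[i].real / energy        # eq. (64) at the peak
```

Because only the real part of the correlation is used, the estimated gain keeps its sign, so negative reflection coefficients are handled without extra logic.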

Similarly, minimizing $C\_9(\alpha\_l, \omega\_l)$ with respect to $\omega\_l$ and the complex-valued $\alpha\_l$ yields the estimates $\hat{\omega}\_l$ of $\omega\_l$ and $\hat{\alpha}\_l$ of $\alpha\_l$, respectively:

$$
\hat{\omega}\_l = \arg\max\_{\omega\_l} \left| \bar{\mathbf{a}}^H(\omega\_l) (\bar{\mathbf{S}}^\* \bar{\mathbf{Y}}\_l) \right|^2 \tag{66}
$$

and

$$\hat{\alpha}\_{l} = \frac{\bar{\mathbf{a}}^{H}(\omega\_{l})(\bar{\mathbf{S}}^{\*}\bar{\mathbf{Y}}\_{l})}{\|\bar{\mathbf{S}}\|\_{F}^{2}}\bigg|\_{\omega\_{l} = \hat{\omega}\_{l}},\tag{67}$$

where $\hat{\omega}\_l$ can also be found via FFT with the weighted data vector.
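One possible way to carry out the FFT search of (66) and (67) is sketched below. It uses the fact that, with bin indices $k = -N/2, \cdots, 0$, the correlation $\bar{\mathbf{a}}^{H}(\omega)(\bar{\mathbf{S}}^{\*}\bar{\mathbf{Y}}\_l)$ evaluated on the grid $\omega\_p = 2\pi p / M$ is an inverse DFT of the reversed, zero-padded product vector. The padding length `n_fft` (the $M$ above) and the function name are assumptions of this sketch, not quantities fixed by the chapter.

```python
import numpy as np

def estimate_complex_amplitude_fft(Y_l, S_diag, n_fft=4096):
    """FFT-based search for (66) followed by the gain estimate (67).

    Y_l, S_diag : weighted residual and weighted spectrum on k = -N/2, ..., 0
    n_fft       : zero-padded transform length, must be >= len(Y_l)
    """
    z = np.conj(S_diag) * Y_l                    # S^* Y_l
    # a^H(omega) z = sum_{m>=0} z_rev[m] * exp(+j*omega*m) with m = -k, so the
    # values at omega_p = 2*pi*p/n_fft equal n_fft * ifft(z reversed, n_fft).
    corr = n_fft * np.fft.ifft(z[::-1], n=n_fft)
    p = np.argmax(np.abs(corr) ** 2)             # dominant peak, eq. (66)
    omega = 2.0 * np.pi * p / n_fft
    if omega > np.pi:                            # map to the interval (-pi, pi]
        omega -= 2.0 * np.pi
    alpha = corr[p] / np.sum(np.abs(S_diag) ** 2)   # complex height, eq. (67)
    return omega, alpha
```

The resolution of the search grid is set by `n_fft`; a longer transform gives a finer grid at a cost that grows roughly as `n_fft * log(n_fft)`.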

With the above simple preparations, we now present the Hybrid-WRELAX algorithm.

**Step 1:** Obtain the initial conditions for Step 2 by assuming that $\{\alpha\_l\}\_{l=1}^{L}$ are complex-valued and using the WRELAX algorithm as follows:

**Substep (1):** Assume $L = 1$. Obtain $\{\hat{\omega}\_l, \hat{\alpha}\_l\}\_{l=1}$ from $\bar{\mathbf{Y}}$ by using (66) and (67).

**Substep (2):** Assume $L = 2$. Compute $\bar{\mathbf{Y}}\_2$ with (61) by using $\{\hat{\omega}\_l, \hat{\alpha}\_l\}\_{l=1}$ obtained in Substep (1). Obtain $\{\hat{\omega}\_l, \hat{\alpha}\_l\}\_{l=2}$ from $\bar{\mathbf{Y}}\_2$. Next, compute $\bar{\mathbf{Y}}\_1$ by using $\{\hat{\omega}\_l, \hat{\alpha}\_l\}\_{l=2}$ and then redetermine $\{\hat{\omega}\_l, \hat{\alpha}\_l\}\_{l=1}$ from $\bar{\mathbf{Y}}\_1$.

Iterate the update of $\{\hat{\omega}\_2, \hat{\alpha}\_2\}$ and $\{\hat{\omega}\_1, \hat{\alpha}\_1\}$ until "practical convergence" is achieved (to be discussed later on).

**Substep (3):** Assume $L = 3$. Compute $\bar{\mathbf{Y}}\_3$ by using $\{\hat{\omega}\_l, \hat{\alpha}\_l\}\_{l=1}^{2}$ obtained in Substep (2). Obtain $\{\hat{\omega}\_l, \hat{\alpha}\_l\}\_{l=3}$ from $\bar{\mathbf{Y}}\_3$. Next, compute $\bar{\mathbf{Y}}\_1$ by using $\{\hat{\omega}\_l, \hat{\alpha}\_l\}\_{l=2}^{3}$ and redetermine $\{\hat{\omega}\_l, \hat{\alpha}\_l\}\_{l=1}$ from $\bar{\mathbf{Y}}\_1$. Then compute $\bar{\mathbf{Y}}\_2$ by using $\{\hat{\omega}\_l, \hat{\alpha}\_l\}\_{l=1,3}$ and redetermine $\{\hat{\omega}\_l, \hat{\alpha}\_l\}\_{l=2}$ from $\bar{\mathbf{Y}}\_2$.

Iterate the update of $\{\hat{\omega}\_3, \hat{\alpha}\_3\}$, $\{\hat{\omega}\_1, \hat{\alpha}\_1\}$, and $\{\hat{\omega}\_2, \hat{\alpha}\_2\}$ until "practical convergence".

**Remaining Substeps:** Continue similarly until $L$ is equal to the desired or estimated number of signals.

**Step 2:** Refine the estimates obtained in Step 1 with the last step of the WRELAX algorithm (i.e., the last substep of Step 1 above) by using Equations (64) and (65) derived for the real-valued $\{\alpha\_l\}\_{l=1}^{L}$ and using $\{\hat{\omega}\_l\}\_{l=1}^{L}$ and the real parts of $\{\hat{\alpha}\_l\}\_{l=1}^{L}$ obtained in Step 1 as initial conditions. Iteratively update $\{\hat{\omega}\_l, \hat{\alpha}\_l\}$, $l = 1, 2, \cdots, L$, until "practical convergence".

Note that WRELAX can be used directly for signals with real-valued amplitudes. In this case, the approach consists of the substeps of Step 1 above, except that (65) and (64) are used instead of (66) and (67), respectively. We will use a numerical example in Section 5 to show the problem encountered by the direct use of WRELAX when the cost function is highly oscillatory.

#### **3.3.2 EXIP-WRELAX**

The Invariance Principle (IP) of ML estimators is well known in estimation theory (Zehna, 1966). The invariance principle gives a simple answer to the relationship between the minimizers of a given cost function parameterized in two different ways in some special cases. By appropriately reparameterizing the original cost function and enlarging the supporting domain of the parameter space, less accurate estimates can be obtained from this simple data model. These estimates may be refined to asymptotically achieve the performance available using the original data model. This is the basic idea behind the Extended Invariance Principle (EXIP) proposed in (Söderström & Stoica a, 1989) and (Söderström & Stoica b, 1989) for the purpose of achieving some computational advantages. In this section, we present an EXIP-based algorithm, referred to as the EXIP-WRELAX algorithm, that avoids dealing with the highly oscillatory true cost function entirely.

By using (58) and (62), the cost function (56) with $\{\alpha\_l\}\_{l=1}^{L}$ being real-valued can be written in the following vector form

$$C\_{10}(\boldsymbol{\eta}) = \left\| \bar{\mathbf{Y}} - \sum\_{l=1}^{L} \alpha\_l \bar{\mathbf{b}}(\omega\_l) \right\|\_{2}^{2},\tag{68}$$

where

$$\boldsymbol{\eta} = \begin{bmatrix} \boldsymbol{\alpha}^T \ \boldsymbol{\omega}^T \end{bmatrix}^T,\tag{69}$$

with

$$\boldsymbol{\alpha} = \begin{bmatrix} \alpha\_1 \ \alpha\_2 \ \cdots \ \alpha\_L \end{bmatrix}^T,\tag{70}$$

$$
\boldsymbol{\omega} = \begin{bmatrix} \omega\_1 \ \omega\_2 \ \cdots \ \omega\_L \end{bmatrix}^T. \tag{71}
$$

By replacing the real-valued amplitudes $\{\alpha\_l\}\_{l=1}^{L}$ with the complex-valued amplitudes $\{\tilde{\alpha}\_l\}\_{l=1}^{L}$ (notations introduced for the sake of clarity) in (68), we obtain the following cost function:

$$C\_{11}(\tilde{\boldsymbol{\eta}}) = \left\| \bar{\mathbf{Y}} - \sum\_{l=1}^{L} \tilde{\alpha}\_{l} \bar{\mathbf{b}}(\omega\_{l}) \right\|\_{2}^{2},\tag{72}$$

where

$$\tilde{\boldsymbol{\eta}} = \left[ \text{Re}^T(\tilde{\boldsymbol{\alpha}}) \ \text{Im}^T(\tilde{\boldsymbol{\alpha}}) \ \boldsymbol{\omega}^T \right]^T,\tag{73}$$

where $\text{Im}(\mathbf{Z})$ denotes the imaginary part of $\mathbf{Z}$, and

$$\tilde{\boldsymbol{\alpha}} = \begin{bmatrix} \tilde{\alpha}\_1 \ \tilde{\alpha}\_2 \ \cdots \ \tilde{\alpha}\_L \end{bmatrix}^T. \tag{74}$$

Denote

$$
\hat{\boldsymbol{\eta}} = \arg\min\_{\boldsymbol{\eta}} C\_{10}(\boldsymbol{\eta}),
\tag{75}
$$

and

$$
\hat{\tilde{\boldsymbol{\eta}}} = \arg\min\_{\tilde{\boldsymbol{\eta}}} \ C\_{11}(\tilde{\boldsymbol{\eta}}).\tag{76}
$$

Let

$$f(\boldsymbol{\eta}) = \mathbf{F}\boldsymbol{\eta},\tag{77}$$

where

$$\mathbf{F} = \begin{bmatrix} \mathbf{I} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{I} \end{bmatrix},\tag{78}$$

where $\mathbf{I}$ and $\mathbf{0}$ denote the $L \times L$ identity matrix and the $L \times L$ matrix with zero elements, respectively. Using the EXIP principle (Söderström & Stoica a, 1989; Söderström & Stoica b, 1989), we can obtain a new estimate $\hat{\hat{\boldsymbol{\eta}}}$ from $\hat{\tilde{\boldsymbol{\eta}}}$ by solving the following weighted least squares problem

$$\hat{\hat{\boldsymbol{\eta}}} = \arg\min\_{\boldsymbol{\eta}} \left[ \hat{\tilde{\boldsymbol{\eta}}} - f(\boldsymbol{\eta}) \right]^T \mathbf{W}\_{\text{EXIP}} \left[ \hat{\tilde{\boldsymbol{\eta}}} - f(\boldsymbol{\eta}) \right],\tag{79}$$

where

$$\mathbf{W}\_{\text{EXIP}} = E\left\{ \frac{\partial^2 \left[ C\_{\tilde{\boldsymbol{\eta}}}(\tilde{\boldsymbol{\eta}}) \right]}{\partial \tilde{\boldsymbol{\eta}} \partial \tilde{\boldsymbol{\eta}}^T} \right\} \bigg|\_{\tilde{\boldsymbol{\eta}} = \hat{\tilde{\boldsymbol{\eta}}}}.\tag{80}$$

It has been shown in (Söderström & Stoica a, 1989; Söderström & Stoica b, 1989) that $\hat{\hat{\boldsymbol{\eta}}}$ is asymptotically (for large $N$ or high SNR) statistically equivalent to $\hat{\boldsymbol{\eta}}$. The weighting matrix $\mathbf{W}\_{\text{EXIP}}$ is simply the *Fisher information matrix* (possibly scaled by a constant) for the complex-valued $\{\tilde{\alpha}\_l\}\_{l=1}^{L}$ with $\tilde{\boldsymbol{\eta}}$ replaced by its estimate $\hat{\tilde{\boldsymbol{\eta}}}$. It can be easily shown that

$$\hat{\hat{\boldsymbol{\eta}}} = \left(\mathbf{F}^T \mathbf{W}\_{\text{EXIP}} \mathbf{F}\right)^{-1} \left(\mathbf{F}^T \mathbf{W}\_{\text{EXIP}}\right) \hat{\tilde{\boldsymbol{\eta}}}. \tag{81}$$
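The closed form (81) is an ordinary weighted least squares solution and can be sketched in a few lines. The sketch below takes the weighting matrix as a given input (computing the Fisher information matrix itself is outside its scope) and assumes that the entries of the complex-amplitude estimate are ordered as in (73): real parts, imaginary parts, then the $\omega$ values. The function name is illustrative.

```python
import numpy as np

def exip_refine(eta_tilde_hat, W_exip, L):
    """Weighted least squares mapping of eq. (81).

    eta_tilde_hat : length-3L vector [Re(alpha~); Im(alpha~); omega], cf. (73)
    W_exip        : 3L x 3L weighting matrix, assumed to be supplied by the caller
    L             : number of signals
    Returns the length-2L refined vector [alpha; omega], cf. (69).
    """
    I, Z = np.eye(L), np.zeros((L, L))
    F = np.block([[I, Z], [Z, Z], [Z, I]])        # 3L x 2L matrix of eq. (78)
    FtW = F.T @ W_exip
    # Solve (F^T W F) eta = F^T W eta_tilde_hat instead of forming the inverse.
    return np.linalg.solve(FtW @ F, FtW @ eta_tilde_hat)
```

With an identity weighting matrix the mapping simply discards the imaginary parts of the complex amplitudes; a non-diagonal weighting matrix additionally corrects the real parts and the $\omega$ values through their correlation with the discarded imaginary parts.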

The EXIP-WRELAX algorithm is composed of two steps. The first step is the same as Step 1 of the Hybrid-WRELAX algorithm and the second step is to refine the initial conditions obtained in Step 1 by using (81). Compared to the Hybrid-WRELAX algorithm, the second step of the EXIP-WRELAX algorithm is non-iterative and avoids dealing with the highly oscillatory true NLS cost function entirely. Our numerical examples show that at low SNR, the former tends to outperform the latter.
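The first step, which both algorithms share, is the relaxation loop with complex-valued amplitudes. A minimal sketch of that loop is given below. Several choices are assumptions of the sketch rather than statements of the chapter: the single-signal fit uses a grid search instead of FFTs, "practical convergence" is taken to mean that the relative decrease of the cost between two sweeps falls below `tol`, and all function and parameter names are hypothetical.

```python
import numpy as np

def fit_one(Y_l, S_diag, omega_grid):
    """Single-signal fit with a complex gain, eqs. (66)-(67), by grid search."""
    k = np.arange(-(len(Y_l) - 1), 1)
    corr = np.exp(-1j * np.outer(omega_grid, k)) @ (np.conj(S_diag) * Y_l)
    i = np.argmax(np.abs(corr) ** 2)
    return omega_grid[i], corr[i] / np.sum(np.abs(S_diag) ** 2)

def wrelax_step1(Y, S_diag, L, omega_grid, tol=1e-6, max_iter=100):
    """Relaxation loop of Step 1: add one signal at a time, then re-fit cyclically."""
    k = np.arange(-(len(Y) - 1), 1)

    def component(w, a):
        return a * S_diag * np.exp(1j * w * k)      # a * S * a_vec(w)

    omegas, alphas = [], []
    for n_sig in range(1, L + 1):
        omegas.append(0.0)
        alphas.append(0.0)                          # new signal starts at zero gain
        # Update order within a sweep: newest signal first, then signals 1, 2, ...
        order = [n_sig - 1] + list(range(n_sig - 1))
        prev = None
        for _ in range(max_iter):
            for l in order:
                others = sum(component(omegas[i], alphas[i])
                             for i in range(n_sig) if i != l)
                omegas[l], alphas[l] = fit_one(Y - others, S_diag, omega_grid)
            fit = sum(component(w, a) for w, a in zip(omegas, alphas))
            cost = np.sum(np.abs(Y - fit) ** 2)
            # Assumed stopping rule: relative cost decrease below tol.
            if prev is not None and prev - cost <= tol * prev:
                break
            prev = cost
    return omegas, alphas
```

The returned frequencies and the real parts of the returned amplitudes would then serve as the initial conditions for the refinement step of either algorithm.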

Fig. 5. Comparison of the MSEs of the WRELAX for assuming complex-valued signal amplitudes ("+"), Hybrid-WRELAX ("◦"), and EXIP-WRELAX ("×") with the CRBs corresponding to complex-valued (dashed line) and real-valued (solid line) signal amplitudes for (a) *τ*1, (b) *α*1.

The MSEs of WRELAX assuming complex-valued $\{\alpha\_l\}\_{l=1}^{L}$ ("+"), Hybrid-WRELAX ("◦"), and EXIP-WRELAX ("×") are compared in Figure 5 with the CRBs obtained by assuming $\{\alpha\_l\}\_{l=1}^{L}$ to be complex-valued (dashed line) and real-valued (solid line). Note that both Hybrid-WRELAX and EXIP-WRELAX achieve the corresponding CRB. Note also that the threshold effect is obvious in Figure 5, where the MSEs deviate from the CRBs at low SNR. Although WRELAX assuming complex-valued $\{\alpha\_l\}\_{l=1}^{L}$ also attains its corresponding CRB (dashed line) at high SNR, this wrong CRB can be larger than the true CRB by approximately 30 dB. (Note that the former CRB is expected to be worse than the latter CRB due to the parsimony principle.) In this example, Hybrid-WRELAX outperforms EXIP-WRELAX at low SNR.

#### **3.4 Method Of Direction Estimation based WRELAX (MODE-WRELAX) algorithm**

In Section 3.3, WRELAX was extended to deal with real-valued signals with highly oscillatory correlation functions (Hybrid-WRELAX and EXIP-WRELAX). The resolution of WRELAX is much higher than that of the conventional matched filter approach. However, when the signals are very closely spaced in arrival times, the convergence speed of WRELAX decreases rapidly. Here, we study how MODE can be used with our efficient WRELAX algorithm for super resolution time delay estimation. The new algorithm is referred to as MODE-WRELAX. Although MODE can provide very poor amplitude estimates and WRELAX suffers from slow convergence, MODE-WRELAX outperforms both MODE and WRELAX. MODE-WRELAX can be used for both complex- and real-valued signals (including those with highly oscillatory correlation functions).

The same data model as in Section 2 is adopted. Here, $s(t)$ represents an arbitrary known transmitted signal. We assume that $s(t)$, $y(t)$, $e(t)$, and $\{\alpha\_l\}\_{l=1}^{L}$ are either all complex-valued or all real-valued; the two cases are dealt with in turn in the following.

#### **3.4.1 MODE-WRELAX for complex-valued signals**

Let $(\cdot)^{T}$ denote the transpose and let

$$\mathbf{A} = \begin{bmatrix} \mathbf{a}(\omega\_1) \ \mathbf{a}(\omega\_2) \cdots \mathbf{a}(\omega\_L) \end{bmatrix}^T,\tag{82}$$

where $\mathbf{a}(\omega\_l)$ is the same as in (10).

Then the data model (4) can be written in the following vector form:

$$\mathbf{Y} = \mathbf{S} \mathbf{A} \boldsymbol{\alpha} + \mathbf{E},\tag{83}$$

where $\mathbf{Y}$, $\mathbf{S}$, and $\mathbf{E}$ are the same as in (7)-(9), and $\boldsymbol{\alpha}$ is the same as in (70).

When $\mathbf{S}$ is an identity matrix, the above time delay estimation problem becomes a sinusoidal parameter estimation problem, and MODE is an asymptotically statistically efficient estimator of $\{\omega\_l\}\_{l=1}^{L}$ for complex-valued signals (Stoica & Sharman a, 1990; Stoica & Sharman b, 1990). The MODE algorithm (Stoica & Sharman a, 1990; Stoica & Sharman b, 1990) can be easily extended to the data model in (83), where $\mathbf{S}$ is an arbitrary diagonal matrix, as follows. The MODE estimates $\{\hat{\omega}\_l\}\_{l=1}^{L}$ of $\{\omega\_l\}\_{l=1}^{L}$ can be obtained by minimizing the following cost function

$$C\_{12}(\{\omega\_l\}\_{l=1}^L) = \mathbf{Y}^H \mathbf{P}\_{\tilde{\mathbf{A}}}^\perp \mathbf{Y},\tag{84}$$

and


$$\mathbf{P}\_{\tilde{\mathbf{A}}}^{\perp} = \mathbf{I} - \tilde{\mathbf{A}} \left(\tilde{\mathbf{A}}^{H} \tilde{\mathbf{A}}\right)^{-1} \tilde{\mathbf{A}}^{H} \tag{85}$$

and

$$
\tilde{\mathbf{A}} = \mathbf{S} \mathbf{A}.\tag{86}
$$

To avoid the search over the parameter space, *C*<sub>12</sub>({*ω<sub>l</sub>*}<sub>*l*=1</sub><sup>*L*</sup>) can also be reparametrized in terms of another parameter vector **b̃** = [*b̃*<sub>0</sub> *b̃*<sub>1</sub> ··· *b̃<sub>L</sub>*]<sup>*T*</sup>, where {*b̃<sub>l</sub>*}<sub>*l*=0</sub><sup>*L*</sup> are the coefficients of the following polynomial:

$$\tilde{b}(z) \stackrel{\Delta}{=} \sum\_{l=0}^{L} \tilde{b}\_l z^{L-l} \stackrel{\Delta}{=} \tilde{b}\_0 \prod\_{l=1}^{L} (z - e^{j\omega\_l}); \ \tilde{b}\_0 \neq 0. \tag{87}$$

Since the polynomial *b̃*(*z*) in (87) has all of its zeros on the unit circle, its coefficients {*b̃<sub>l</sub>*} satisfy the conjugate symmetry constraint (Stoica & Sharman a, 1990):

$$
\tilde{b}\_l = \tilde{b}\_{L-l}^\* \quad l = 0, 1, \cdots, L,\tag{88}
$$

where (·)<sup>∗</sup> denotes the complex conjugate. Let

$$\mathbf{B} = \begin{bmatrix} \tilde{b}\_0 & & 0 \\ \vdots & \ddots & \\ \tilde{b}\_L & & \tilde{b}\_0 \\ & \ddots & \vdots \\ 0 & & \tilde{b}\_L \end{bmatrix} \in \mathbb{C}^{N \times (N-L)}.\tag{89}$$

Assume that the diagonal elements of **S** are nonzero (see **Remark 1** for more discussions). Let

$$
\tilde{\mathbf{B}} = \mathbf{S}^{-H} \mathbf{B}.\tag{90}
$$

It can be readily verified that **B**<sup>*H*</sup>**A** = **0** and hence **B̃**<sup>*H*</sup>**Ã** = **0**. Then **P**<sup>⊥</sup><sub>**Ã**</sub> = **B̃**(**B̃**<sup>*H*</sup>**B̃**)<sup>−1</sup>**B̃**<sup>*H*</sup>, and minimizing *C*<sub>12</sub>({*ω<sub>l</sub>*}<sub>*l*=1</sub><sup>*L*</sup>) in (84) is equivalent to minimizing

$$\mathbf{C}\_{13}(\{b\_l\}\_{l=0}^L) = \mathbf{Y}^H \mathbf{\tilde{B}} \left(\mathbf{\tilde{B}}^H \mathbf{\tilde{B}}\right)^{-1} \mathbf{\tilde{B}}^H \mathbf{Y}. \tag{91}$$
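The identities behind (91) can be checked numerically. The following is a minimal sketch, not the authors' implementation. It assumes that **a**(*ω*) has entries e<sup>*jωk*</sup> for *k* = 0, ..., *N* − 1 (the definition in (10) may use a different index origin, which does not affect the annihilation property), that **A** is arranged with the vectors **a**(*ω<sub>l</sub>*) as its columns so that **SA** is *N* × *L*, and it uses hypothetical frequencies and a random nonzero diagonal **S**.

```python
import numpy as np

def steering(omega, N):
    """a(omega) with entries exp(j*omega*k), k = 0..N-1 (assumed index origin)."""
    return np.exp(1j * omega * np.arange(N))

def banded_B(b, N):
    """N x (N-L) matrix of (89): column i holds b_0..b_L in rows i..i+L."""
    L = len(b) - 1
    B = np.zeros((N, N - L), dtype=complex)
    for i in range(N - L):
        B[i:i + L + 1, i] = b
    return B

N = 16
omegas = np.array([0.4, 0.9, 2.1])            # hypothetical frequencies
L = len(omegas)

# Coefficients of prod_l (z - exp(j*omega_l)), highest power first (b_0 = 1).
b = np.poly(np.exp(1j * omegas))
# Rotate b_0 so that the conjugate symmetry constraint (88) holds.
b = b * np.exp(1j * (L * np.pi - omegas.sum()) / 2)

A = np.column_stack([steering(w, N) for w in omegas])
rng = np.random.default_rng(0)
s = rng.uniform(0.5, 1.5, N) * np.exp(1j * rng.uniform(0, 2 * np.pi, N))
S = np.diag(s)                                 # arbitrary nonzero diagonal S

B = banded_B(b, N)
A_t = S @ A                                    # (86)
B_t = np.linalg.inv(S).conj().T @ B            # (90)

# Orthogonal projector onto the complement of range(A_t), as in (85).
P_A = np.eye(N) - A_t @ np.linalg.solve(A_t.conj().T @ A_t, A_t.conj().T)
# Projector onto range(B_t), the matrix appearing in (91).
P_B = B_t @ np.linalg.solve(B_t.conj().T @ B_t, B_t.conj().T)

sym_err = np.max(np.abs(b - np.conj(b[::-1])))   # deviation from (88)
ann_err = np.max(np.abs(B.conj().T @ A))         # deviation from B^H A = 0
proj_err = np.max(np.abs(P_A - P_B))             # deviation between (85) and (91)
print(sym_err, ann_err, proj_err)
```

All three printed deviations should be at the level of floating-point round-off, which is the numerical counterpart of the statement that minimizing (84) and minimizing (91) are equivalent.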

Note that **B̃**<sup>*H*</sup>**B̃** in (91) can be replaced by a consistent estimate without affecting the asymptotic statistical efficiency of the minimizer of (91). Hence the estimate of **b̃** can be obtained in a computationally efficient way as follows:

$$\hat{\tilde{\mathbf{b}}} = \arg\min\_{\tilde{\mathbf{b}}} \left[ \mathbf{Y}^{H} \mathbf{S}^{-H} \mathbf{B} \left( \hat{\mathbf{B}}\_{0}^{H} \mathbf{S}^{-1} \mathbf{S}^{-H} \hat{\mathbf{B}}\_{0} \right)^{-1} \mathbf{B}^{H} \mathbf{S}^{-1} \mathbf{Y} \right], \tag{92}$$

where **B̂**<sub>0</sub> is the initial estimate of **B**, obtained by replacing **b̃** in (89) with its initial estimate (denoted by the superscript (0)). This initial value is obtained by setting **B̃**<sup>*H*</sup>**B̃** in (91) to **I**:

$$\hat{\tilde{\mathbf{b}}}^{(0)} = \arg\min\_{\tilde{\mathbf{b}}} \left[ \mathbf{Y}^{H} \mathbf{S}^{-H} \mathbf{B} \mathbf{B}^{H} \mathbf{S}^{-1} \mathbf{Y} \right]. \tag{93}$$

To avoid the trivial solution **b̃** = **0**, we should impose ‖**b̃**‖ = 1 in (92) and (93) or some other similar constraint. (For detailed implementation steps, see Section 4.) The estimates {*ω̂<sub>l</sub>*}<sub>*l*=1</sub><sup>*L*</sup> of {*ω<sub>l</sub>*}<sub>*l*=1</sub><sup>*L*</sup> are the phases of the roots of the polynomial ∑<sub>*l*=0</sub><sup>*L*</sup> *b̂<sub>l</sub>* *z*<sup>*L*−*l*</sup>, where *b̂<sub>l</sub>* denotes the estimate of *b̃<sub>l</sub>*. Once {*ω̂<sub>l</sub>*}<sub>*l*=1</sub><sup>*L*</sup> are obtained, the amplitudes **α** are estimated by applying the linear least-squares approach to

$$
\mathbf{Y} \approx \mathbf{S} \hat{\mathbf{A}} \boldsymbol{\alpha},\tag{94}
$$

where **Â** is formed by replacing {*ω<sub>l</sub>*}<sub>*l*=1</sub><sup>*L*</sup> with {*ω̂<sub>l</sub>*}<sub>*l*=1</sub><sup>*L*</sup> in (82).

**Remark 1:** MODE cannot be implemented efficiently to avoid the search over the parameter space when *S*(*k*) = 0 for some *k*. The most commonly used complex analytic signal *s*(*t*) is low-pass. For this case, we can select a contiguous segment of **Y** satisfying |*S*(*k*)| > 0, *K*<sub>1</sub> ≤ *k* ≤ *K*<sub>2</sub>, preferably with |*S*(*k*)| above a certain threshold to avoid numerical problems. We can then apply MODE to the segment {*Y*(*k*)}<sub>*k*=*K*<sub>1</sub></sub><sup>*K*<sub>2</sub></sup> to estimate {*ω<sub>l</sub>*}<sub>*l*=1</sub><sup>*L*</sup>.

**Remark 2:** The amplitude estimates given above can be very poor when the SNR is not sufficiently high. This is because some of the MODE estimates {*ω̂<sub>l</sub>*}<sub>*l*=1</sub><sup>*L*</sup> can be so closely spaced that **Â** in (94) is seriously ill-conditioned. We use a simple spacing adjustment scheme to avoid this problem. After obtaining the MODE estimates {*ω̂<sub>l</sub>*}<sub>*l*=1</sub><sup>*L*</sup> of {*ω<sub>l</sub>*}<sub>*l*=1</sub><sup>*L*</sup>, we first sort them in ascending order and then check the spacing between adjacent estimates. If the distance between any two adjacent estimates, say *ω̂*<sub>1</sub> and *ω̂*<sub>2</sub> (*ω̂*<sub>1</sub> ≤ *ω̂*<sub>2</sub>), is smaller than a predefined threshold, say Δ*ω<sub>t</sub>*, we adjust the estimates by replacing *ω̂*<sub>1</sub> with *ω̂*<sub>1</sub> − 0.5Δ*ω<sub>t</sub>* and *ω̂*<sub>2</sub> with *ω̂*<sub>2</sub> + 0.5Δ*ω<sub>t</sub>*. The amplitudes are then estimated using the adjusted estimates of {*ω<sub>l</sub>*}<sub>*l*=1</sub><sup>*L*</sup>. This spacing adjustment step is *ad hoc* but can be used to provide good initial delay and amplitude estimates to replace the first *L* − 1 steps of WRELAX.
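The post-processing of a coefficient estimate described above (polynomial rooting, spacing adjustment, and the least-squares fit of (94)) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the minimization in (92) is not implemented, and the exact polynomial coefficients stand in for its result; **a**(*ω*) is assumed to have entries e<sup>*jωk*</sup>, *k* = 0, ..., *N* − 1; the function names and all numerical values are hypothetical.

```python
import numpy as np

def freqs_from_poly(b):
    """Sorted phases of the roots of sum_l b[l] * z**(L - l)."""
    return np.sort(np.angle(np.roots(b)))

def adjust_spacing(omegas, min_gap):
    """Single pass over the sorted estimates: any adjacent pair closer than
    min_gap is pushed apart by 0.5*min_gap on each side (Remark 2)."""
    w = np.sort(np.asarray(omegas, dtype=float))
    for i in range(len(w) - 1):
        if w[i + 1] - w[i] < min_gap:
            w[i] -= 0.5 * min_gap
            w[i + 1] += 0.5 * min_gap
    return w

def ls_amplitudes(Y, s, omegas):
    """Least-squares solution of Y ≈ S A alpha in (94), with S = diag(s)."""
    k = np.arange(len(Y))
    A = np.exp(1j * np.outer(k, omegas))      # columns a(omega_l), assumed form
    alpha, *_ = np.linalg.lstsq(s[:, None] * A, Y, rcond=None)
    return alpha

# Noise-free toy data with hypothetical parameters.
N = 32
true_w = np.array([0.5, 1.2])
true_alpha = np.array([1.0 + 0.5j, -0.7j])
rng = np.random.default_rng(1)
s = rng.uniform(0.5, 1.5, N) * np.exp(1j * rng.uniform(0, 2 * np.pi, N))
Y = s * (np.exp(1j * np.outer(np.arange(N), true_w)) @ true_alpha)

b_hat = np.poly(np.exp(1j * true_w))          # stand-in for the minimizer of (92)
w_hat = freqs_from_poly(b_hat)
w_adj = adjust_spacing(w_hat, min_gap=0.05)   # already well separated: unchanged
alpha_hat = ls_amplitudes(Y, s, w_adj)

# A closely spaced pair is pushed apart by half the threshold on each side.
close_pair = adjust_spacing([2.0, 1.01, 1.0], min_gap=0.1)
print(w_hat, alpha_hat, close_pair)
```

Note that a single pass can narrow the gap to a previously checked neighbour; if a guaranteed minimum spacing is required, as in Step (1) of the algorithm, the pass may have to be repeated.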

The MODE estimates {*ω̂<sub>l</sub>*}<sub>*l*=1</sub><sup>*L*</sup> of {*ω<sub>l</sub>*}<sub>*l*=1</sub><sup>*L*</sup> and {*α̂<sub>l</sub>*}<sub>*l*=1</sub><sup>*L*</sup> of {*α<sub>l</sub>*}<sub>*l*=1</sub><sup>*L*</sup>, which may not be optimal, especially for real-valued signals, can be refined by using the last step of the WRELAX algorithm.

WRELAX is a relaxation-based minimizer of the following nonlinear least-squares (NLS) criterion:

$$\mathbf{C}\_{14}(\{\alpha\_l, \omega\_l\}\_{l=1}^L) = \left\| \mathbf{Y} - \sum\_{l=1}^{L} \alpha\_l \mathbf{S} \mathbf{a}(\omega\_l) \right\|^2. \tag{95}$$

When the signals are not spaced very closely, WRELAX usually converges in a few steps. However, when the signals are very closely spaced, the convergence speed of WRELAX is very slow. Yet by using the above MODE algorithm to obtain the initial conditions and then using the last step of WRELAX to refine them, super resolution time delay estimation can be achieved with a fast convergence speed.

Before we present the MODE-WRELAX algorithm, let us consider the following preparations. Let

$$\mathbf{Y}\_{l} = \mathbf{Y} - \sum\_{i=1, i \neq l}^{L} \hat{\alpha}\_{i} [\mathbf{S} \mathbf{a}(\hat{\omega}\_{i})], \tag{96}$$

where {*α̂<sub>i</sub>*, *ω̂<sub>i</sub>*}<sub>*i*=1, *i*≠*l*</sub><sup>*L*</sup> are assumed to be given. Then (95) becomes

$$\mathbf{C}\_{15}(\alpha\_{l}, \omega\_{l}) = \left\| \mathbf{Y}\_{l} - \alpha\_{l}\mathbf{S}\mathbf{a}(\omega\_{l}) \right\|^{2}. \tag{97}$$

Minimizing *C*<sub>15</sub>(*α<sub>l</sub>*, *ω<sub>l</sub>*) with respect to *ω<sub>l</sub>* and the complex-valued *α<sub>l</sub>* yields

$$\hat{\omega}\_{l} = \arg\max\_{\omega\_{l}} \left| \mathbf{a}^{H}(\omega\_{l}) (\mathbf{S}^{\*} \mathbf{Y}\_{l}) \right|^{2},\tag{98}$$

and

$$\hat{\alpha}\_{l} = \left. \frac{\mathbf{a}^{H}(\omega\_{l})(\mathbf{S}^{\*}\mathbf{Y}\_{l})}{\|\mathbf{S}\|\_{F}^{2}} \right|\_{\omega\_{l} = \hat{\omega}\_{l}}.\tag{99}$$
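Because **a**<sup>*H*</sup>(*ω*)(**S**<sup>∗</sup>**Y**<sub>*l*</sub>) is a discrete-time Fourier transform of the element-wise product of **S**<sup>∗</sup> and **Y**<sub>*l*</sub>, the maximization in (98) can be evaluated on a dense frequency grid with a zero-padded FFT. The sketch below illustrates this under the assumption that **a**(*ω*) has entries e<sup>*jωk*</sup>, *k* = 0, ..., *N* − 1; with a different index origin in (10), an extra linear phase term would be needed. The function name, the grid size and the test signal are hypothetical.

```python
import numpy as np

def wrelax_update(Y_l, s, n_fft=4096):
    """One-component update following (98)-(99), evaluated on an FFT grid.

    Y_l : data vector with the other components removed, as in (96)
    s   : diagonal of S
    Returns (omega_hat, alpha_hat), with omega_hat in [0, 2*pi).
    """
    x = np.conj(s) * Y_l                        # S^* Y_l (S is diagonal)
    X = np.fft.fft(x, n_fft)                    # a^H(omega) x at omega = 2*pi*m/n_fft
    m = int(np.argmax(np.abs(X) ** 2))          # grid version of (98)
    omega_hat = 2 * np.pi * m / n_fft
    alpha_hat = X[m] / np.sum(np.abs(s) ** 2)   # (99); ||S||_F^2 = sum_k |s_k|^2
    return omega_hat, alpha_hat

# Noise-free check with a single component placed exactly on the FFT grid.
N, n_fft = 64, 4096
k = np.arange(N)
s = np.exp(1j * 0.3 * k ** 2 / N)               # hypothetical unit-modulus spectrum
omega_true = 2 * np.pi * 300 / n_fft
alpha_true = 0.8 - 0.6j
Y_l = alpha_true * s * np.exp(1j * omega_true * k)

omega_hat, alpha_hat = wrelax_update(Y_l, s, n_fft)
print(omega_hat, alpha_hat)
```

The accuracy of the frequency estimate is limited by the grid spacing 2π/`n_fft`; a finer local search around the grid maximum could follow if higher accuracy is required.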


With the above preparations, we now present the steps of the MODE-WRELAX algorithm for complex-valued signals.

**Step (1):** Select a contiguous segment of the data vector **Y** (for MODE use only) so that |*S*(*k*)| > 0, *K*<sub>1</sub> ≤ *k* ≤ *K*<sub>2</sub>. Apply MODE to the segment to obtain {*ω̂<sub>l</sub>*}<sub>*l*=1</sub><sup>*L*</sup>. Adjust {*ω̂<sub>l</sub>*}<sub>*l*=1</sub><sup>*L*</sup> so that the minimum spacing of {*ω̂<sub>l</sub>*}<sub>*l*=1</sub><sup>*L*</sup> is at least Δ*ω<sub>t</sub>*. Obtain the estimates {*α̂<sub>l</sub>*}<sub>*l*=1</sub><sup>*L*</sup> of {*α<sub>l</sub>*}<sub>*l*=1</sub><sup>*L*</sup> by using (94).

**Step (2):** Refine the estimates obtained in Step (1) by using the last step of WRELAX. That is, compute **Y**<sub>1</sub> by using {*ω̂<sub>l</sub>*, *α̂<sub>l</sub>*}<sub>*l*=2</sub><sup>*L*</sup> obtained in Step (1). Obtain {*ω̂<sub>l</sub>*, *α̂<sub>l</sub>*}<sub>*l*=1</sub> from **Y**<sub>1</sub> by using (98) and (99). Next, compute **Y**<sub>2</sub> by using the updated {*ω̂<sub>l</sub>*, *α̂<sub>l</sub>*}<sub>*l*=1,3,···,*L*</sub> and determine {*ω̂<sub>l</sub>*, *α̂<sub>l</sub>*}<sub>*l*=2</sub> from **Y**<sub>2</sub>. Then compute **Y**<sub>3</sub> by using the updated {*ω̂<sub>l</sub>*, *α̂<sub>l</sub>*}<sub>*l*=1,2,4,···,*L*</sub> and determine {*ω̂<sub>l</sub>*, *α̂<sub>l</sub>*}<sub>*l*=3</sub> from **Y**<sub>3</sub>. Continue this procedure and similarly determine {*ω̂<sub>l</sub>*, *α̂<sub>l</sub>*}<sub>*l*=*L*</sub> from **Y**<sub>*L*</sub>. Repeat the above process until "practical convergence".
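The cyclic refinement of Step (2) can be sketched as a loop over the components. This is a minimal illustration rather than the authors' implementation. It assumes **a**(*ω*) with entries e<sup>*jωk*</sup>, *k* = 0, ..., *N* − 1, restricts the frequency search to an FFT grid, and interprets "practical convergence" as a small relative decrease of the cost (95) between two cycles; that stopping rule, the function names and all numerical values are assumptions.

```python
import numpy as np

def component(alpha, omega, s):
    """alpha * S a(omega), with S = diag(s) and a(omega)_k = exp(j*omega*k)."""
    return alpha * s * np.exp(1j * omega * np.arange(len(s)))

def update(Y_l, s, n_fft):
    """Grid version of (98)-(99) using a zero-padded FFT."""
    X = np.fft.fft(np.conj(s) * Y_l, n_fft)
    m = int(np.argmax(np.abs(X)))
    return 2 * np.pi * m / n_fft, X[m] / np.sum(np.abs(s) ** 2)

def nls_cost(Y, s, omegas, alphas):
    """NLS cost (95) for the current parameter set."""
    model = sum(component(a, w, s) for a, w in zip(alphas, omegas))
    return float(np.linalg.norm(Y - model) ** 2)

def refine(Y, s, omegas, alphas, n_fft=4096, tol=1e-9, max_cycles=100):
    """Cycle through the components, re-estimating one at a time."""
    omegas = np.array(omegas, dtype=float)
    alphas = np.array(alphas, dtype=complex)
    prev = nls_cost(Y, s, omegas, alphas)
    for _ in range(max_cycles):
        for l in range(len(omegas)):
            # Remove all components except the l-th one, as in (96).
            others = sum(component(alphas[i], omegas[i], s)
                         for i in range(len(omegas)) if i != l)
            omegas[l], alphas[l] = update(Y - others, s, n_fft)
        cur = nls_cost(Y, s, omegas, alphas)
        if prev - cur <= tol * prev:     # assumed "practical convergence" test
            break
        prev = cur
    return omegas, alphas

# Noise-free toy problem: two components on the FFT grid, perturbed start.
N, n_fft = 64, 4096
k = np.arange(N)
s = np.exp(1j * 0.3 * k ** 2 / N)        # hypothetical unit-modulus spectrum
grid = 2 * np.pi / n_fft
w_true = np.array([200, 450]) * grid
a_true = np.array([1.0 + 0.2j, 0.6 - 0.4j])
Y = sum(component(a, w, s) for a, w in zip(a_true, w_true))

w0 = np.array([210, 442]) * grid         # stands in for the Step (1) output
a0 = np.array([0.8 + 0.0j, 0.5 + 0.0j])
cost0 = nls_cost(Y, s, w0, a0)
w_hat, a_hat = refine(Y, s, w0, a0, n_fft)
cost1 = nls_cost(Y, s, w_hat, a_hat)
print(cost0, cost1, w_hat / grid)
```

In this toy setting the two components are well separated, so a few cycles suffice; the slow convergence for closely spaced signals mentioned above is the reason a good initialization from Step (1) matters.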

Similarly, we can use MODE as an initialization method for the EM time delay estimation algorithm (Moon, 1996), which is referred to as MODE-EM. However, we have found through numerical simulations that the convergence speed of MODE-EM is slower than that of MODE-WRELAX.

#### **3.4.2 MODE-WRELAX for real-valued signals**

For bandpass real-valued signals, the cost function in (95) is highly oscillatory, and its global minimum is very difficult to find. Although MODE is derived for complex-valued signals, we can apply it by assuming the real-valued amplitudes {*α<sub>l</sub>*}<sub>*l*=1</sub><sup>*L*</sup> to be complex-valued. These initial estimates are then refined by the WRELAX algorithm. Since the attraction domain of the cost function (95) is extremely small, a very good initial condition is required to achieve the global convergence of any minimizer of (95). The MODE estimates are first refined by WRELAX by assuming {*α<sub>l</sub>*}<sub>*l*=1</sub><sup>*L*</sup> to be complex-valued, since the attraction domain of (95) becomes much larger when the real-valued {*α<sub>l</sub>*}<sub>*l*=1</sub><sup>*L*</sup> are assumed to be complex-valued (Manickam et al., 1994). The so-obtained estimates are then refined again by WRELAX by using the fact that {*α<sub>l</sub>*}<sub>*l*=1</sub><sup>*L*</sup> are real-valued.

With the above preparations, we now present the steps of the MODE-WRELAX algorithm for real-valued signals.

**Step (1):** Select a contiguous segment of the data vector **Y** so that |*S*(*k*)| > 0, *K*<sub>1</sub> ≤ *k* ≤ *K*<sub>2</sub>. By assuming the real-valued {*α<sub>l</sub>*}<sub>*l*=1</sub><sup>*L*</sup> to be complex-valued, obtain the estimates {*ω̂<sub>l</sub>*}<sub>*l*=1</sub><sup>*L*</sup> and {*α̂<sub>l</sub>*}<sub>*l*=1</sub><sup>*L*</sup> in the same way as in Step (1) of the MODE-WRELAX algorithm for complex-valued signals.

**Step (2):** Refine the estimates obtained in Step (1) above by using the last step of WRELAX by assuming complex-valued signals. Take the real parts of the so-obtained amplitude estimates as the amplitude estimates {*α̂<sub>l</sub>*}<sub>*l*=1</sub><sup>*L*</sup> of {*α<sub>l</sub>*}<sub>*l*=1</sub><sup>*L*</sup>.

**Step (3):** Refine the estimates obtained in Step (2) above by using the last step of WRELAX and the fact that the signals are real-valued.
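For Step (3), the one-component update changes when *α<sub>l</sub>* is constrained to be real. The expressions used below are not quoted from the text above; they follow from minimizing (97) over a real *α<sub>l</sub>*, which gives *α̂<sub>l</sub>* = Re[**a**<sup>*H*</sup>(*ω<sub>l</sub>*)(**S**<sup>∗</sup>**Y**<sub>*l*</sub>)] / ‖**S**‖<sub>*F*</sub><sup>2</sup> with *ω̂<sub>l</sub>* maximizing the squared real part. The sketch again assumes **a**(*ω*) with entries e<sup>*jωk*</sup>, *k* = 0, ..., *N* − 1 (for real-valued amplitudes the index origin in (10) does matter, so this is only an illustration), and the function name and test values are hypothetical.

```python
import numpy as np

def wrelax_update_real(Y_l, s, n_fft=4096):
    """One-component update for a real-valued amplitude, on an FFT grid.

    Minimizing ||Y_l - alpha * S a(omega)||^2 over real alpha gives
    alpha = Re{a^H(omega) S^* Y_l} / ||S||_F^2, and the remaining cost is
    smallest where the squared real part is largest.
    """
    X = np.fft.fft(np.conj(s) * Y_l, n_fft)     # a^H(omega)(S^* Y_l) on the grid
    m = int(np.argmax(X.real ** 2))             # maximize the squared real part
    omega_hat = 2 * np.pi * m / n_fft
    alpha_hat = X[m].real / np.sum(np.abs(s) ** 2)
    return omega_hat, alpha_hat

# Noise-free check with a real amplitude and a frequency on the grid.
N, n_fft = 64, 4096
k = np.arange(N)
s = np.exp(1j * 0.3 * k ** 2 / N)               # hypothetical unit-modulus spectrum
omega_true = 2 * np.pi * 300 / n_fft
alpha_true = -0.9
Y_l = alpha_true * s * np.exp(1j * omega_true * k)

omega_hat, alpha_hat = wrelax_update_real(Y_l, s, n_fft)
print(omega_hat, alpha_hat)
```

The squared real part oscillates much faster in *ω* than the squared magnitude used in (98), which is consistent with the small attraction domain noted for the real-valued cost function.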

Now we consider an example where *τ*<sub>2</sub> − *τ*<sub>1</sub> = 0.2*τ<sub>e</sub>*. The MSEs of MODE ("◦"), WRELAX ("×"), and MODE-WRELAX ("∗") are compared with the corresponding CRBs (solid line) in Figure 6. Note that due to the highly oscillatory cost functions and very closely spaced signals, WRELAX converges to some local minimum instead of the global one, which yields very poor estimates. Since the MODE amplitude estimates are obtained without spacing adjustment, they are so poor at low SNR that some of their MSEs are above the axis limit due to the inversion of ill-conditioned matrices corresponding to very closely spaced delay estimates. Although the MSEs of the MODE estimates are close to the CRBs corresponding to the complex-valued amplitudes when the SNR is high, the wrong CRBs (not shown to avoid too many lines in the figure) can be larger than the true CRBs, which correspond to the real-valued amplitudes, by approximately 30 dB.

Fig. 6. Comparison of the MSEs of WRELAX ("×"), MODE ("◦"), and MODE-WRELAX ("∗") with the CRBs (solid line) corresponding to real-valued signals for (a) *τ*<sub>1</sub>, (b) *α*<sub>1</sub>.

### **4. Applications**


#### **4.1 Pavement profiling via Ground Penetrating Radar (GPR)**

The detection and classification of roadway subsurface anomalies are very important for the design and quality evaluation of highways. Ultra wideband ground penetrating radar emits nonsinusoidal impulses with extremely large bandwidth (several GHz) and is very suitable for this application because of its high range resolution (on the order of several centimeters). The returned echoes of the ultra wideband ground penetrating radar are superimposed real-valued signals reflected from the boundaries of different media (layers, voids, etc.), which can be described by (1). Both the delays and gains are very useful for the detection and classification of roadway subsurface anomalies. The delays can be used to determine the layer thickness or anomaly location and the gains can be used to classify the type of media because the gains are related to the reflection coefficient at the boundary between two media with different dielectric constants. Once we get the estimates of the media dielectric constants, we can judge the type of the media.

WRELAX has been applied to layer stripping on a single-scan basis using experimental data and has proven to be very useful. However, the straightforward extension of WRELAX from one scan to multiple scans is not feasible since, under moving conditions, the antenna can change its height relative to the road surface. As the height above the road decreases (or increases), the amplitude and the time delay of the received signal change as well, which can lead to incorrect estimates of layer thickness and permittivity. A motion compensation and parameter estimation algorithm for GPR is presented for simultaneous


Blackowiak, A. D. & Rajan, S. D. (1995). Multipath arrival estimates using simulated

Bruckstein, A. M., Shan, T. J. & Kailath, T. (1985). The resolution of overlapping echoes, IEEE Transactions on Acoustics, Speech, and Signal Processing 33(6): 1357-1367.

Ehrenberg, J. E., Ewatt, T. E. & Morris, R. D. (1978). Signal processing techniques for resolving

Feder, M. & Weinstein, E. (1988). Parameter estimation of superimposed signals using the

Gedalyahu, K. & Eldar, Y. (2010). Time-delay estimation from low-rate samples: A union of subspaces approach, IEEE Transactions on Signal Processing 58(6): 3017-3031.

He, W., Wu, R. & Liu, J. (2011). Identification Method of EM Property Inversion for Multilayer

Kirsteins, I. P. & Kot, A. C. (1990). Performance analysis of a high resolution time delay

Li, J. & Stoica, P. (1996). Efficient mixed-spectrum estimation with applications to target feature

Li, J. & Wu, R. (1998). An efficient algorithm for time delay estimation, IEEE Transactions on

Li, X., Wu, R., Sheplak, M. & Li, J. (2002). Multifrequency CW-based Time-Delay Estimation for Proximity Ultrasonic Sensors, IEE Proc.-Radar Sonar Navig 149(2): 53-59.

Li, J., Halder, B., Stoica, P. & Viberg, M. (1995). Decoupled Maximum Likelihood Angle Estimation for Signals with Known Waveforms, IEEE Transactions on Signal

Li, X., Wu, R., Rasmi, S., Li, J., Cattafesta III, L. N. & Sheplak, M. (2003). Acoustic Proximity Ranging in the Presence of Secondary Echoes, IEEE Transactions on Instrumentation

Li, X., Wu, R., Rasmi, S., Li, J., Cattafesta III, L. N. & Sheplak, M. (2003). An Acoustic Proximity Ranging System for Monitoring the Cavity Thickness, IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control 50(7): 898-910.

Liu, N., Xu, Z. & Sadler, B. (2010). Ziv-zakai time-delay estimation bounds for

Manickam, T. G., Vaccaro, R. J. & Tufts, D. W. (1994). A least-squares algorithm for multipath time-delay estimation, IEEE Transactions on Signal Processing 42(11): 3229-3233.

Moon, T. K. (1996). The expectation-maximization algorithm, IEEE Signal Processing

Roy, R., Paulraj, A. & Kailath, T. (1986). ESPRIT - A Subspace Rotation Approach to Estimation

frequency-hopping waveforms under frequency-selective fading, IEEE Transactions

of Parameters of Cisoids in Noise. IEEE Transactions on Acoustics, Speech, and

Karmanov, V. G.(1977). Programmation Mathematique, Editions Mir, Moscow. Kay, S. M.(1988). Modern Spectral Estimation: Theory and Application. Prentice-Hall. Kirsteins, I. P. (1987). High resolution time delay estimation, Proceedings of ICASSP 87 pp.

estimation algorithm, Proceedings of ICASSP'90 pp. 2767-2770.

extraction, IEEE Transactions on Signal Processing 44(2), 281-295.

Oceanic Engineering 20(3): 157-165.

Media, IEEE Radar Conference, 2011.

Signal Processing 46(8): 2231-2235.

and Measurement 52(5): 1593-1605.

on Signal Processing 58(12): 6400-6406.

Signal Processing 34(5), 1340-1342.

Processing 43(9): 2154-2163.

Magazine pp. 47-60.

63(6): 1861-1865.

477-489.

451-454.

annealing: Application to crosshole tomography experiment, IEEE Journal of

individual pulses in multipath signal, Journal of the Acoustics Society of America

EM algorithm, IEEE Transactions on Acoustic, Speech and Signal Processing 36(4):

motion compensation and time delay estimation. More details can be found in our papers (Li & Wu, 1998; Su & Wu, 2000; Wu & Li, 1998; Wu et al., 1999; 2002).

### **4.2 Acoustic proximity ranging system**

Proximity ranging is very important in a wide range of remote sensing applications, including level detection, robot manipulation, process control, nondestructive testing, and cavity thickness monitoring. Various types of sensors based on different physical principles such as capacitive or inductive proximity sensors, laser displacement sensors, and ultrasonic sensors, can be used to perform proximity ranging. Among these sensors, ultrasonic sensors have many important advantages over the others. However, due to the presence of secondary echoes, using ultrasonic sensors for very close proximity ranging is very difficult. When ultrasonic sensors face a sound-hard reflective surface, the reflected sound wave can bounce back and forth several times between the sensors and the reflection surface before decaying to zero, which results in unwanted strong and overlapping secondary echoes in the received signal. The time delays for the secondary echoes are approximately integer multiples of the time delay of the first echo. The matched filter based methods cannot resolve two echoes with a time spacing less than the reciprocal of the signal bandwidth. Hence, for most of the very short distance measurement scenarios, the matched filter based algorithms tend to fail or suffer from severe performance degradations due to their poor resolutions. WRELAX and its extended version as a super-resolution time delay estimation approaches are developed for general purposes. They do not exploit the a *priori* information of the integer multiple time delays and the nonnegative amplitude due to the acoustically hard reflections. More details can be found in our papers (Li et al. a, 2002; 2003; Li et al. b, 2003).

### **5. Conclusion**

A family of relaxation- and FFT-based efficient time-delay estimation algorithms were presented for different scenarios in this chapter. By avoiding the computationally demanding multidimensional search over the parameter space, the proposed algorithms minimize the NLS criterion at a much lower implementation cost. They are more efficient and systematic than existing algorithms. Some practical applications utilizing the proposed algorithms are also presented. Theoretical analysis and simulations demonstrate the efficiency of the proposed new algorithms.

### **6. Acknowledgement**

This work was supported in part by the National Natural Science Foundation of China (No. 60879019, No. 61172112, No. 61179064) and in part by the Foundation of Civil Aviation Administration of China (MHRD0606).

#### **7. References**

Barton, D. K.(1988). Modern Radar System Analysis Artech House Inc..


24 Will-be-set-by-IN-TECH

motion compensation and time delay estimation. More details can be found in our papers

Proximity ranging is very important in a wide range of remote sensing applications, including level detection, robot manipulation, process control, nondestructive testing, and cavity thickness monitoring. Various types of sensors based on different physical principles such as capacitive or inductive proximity sensors, laser displacement sensors, and ultrasonic sensors, can be used to perform proximity ranging. Among these sensors, ultrasonic sensors have many important advantages over the others. However, due to the presence of secondary echoes, using ultrasonic sensors for very close proximity ranging is very difficult. When ultrasonic sensors face a sound-hard reflective surface, the reflected sound wave can bounce back and forth several times between the sensors and the reflection surface before decaying to zero, which results in unwanted strong and overlapping secondary echoes in the received signal. The time delays for the secondary echoes are approximately integer multiples of the time delay of the first echo. The matched filter based methods cannot resolve two echoes with a time spacing less than the reciprocal of the signal bandwidth. Hence, for most of the very short distance measurement scenarios, the matched filter based algorithms tend to fail or suffer from severe performance degradations due to their poor resolutions. WRELAX and its extended version as a super-resolution time delay estimation approaches are developed for general purposes. They do not exploit the a *priori* information of the integer multiple time delays and the nonnegative amplitude due to the acoustically hard reflections. More details

A family of relaxation- and FFT-based efficient time-delay estimation algorithms were presented for different scenarios in this chapter. By avoiding the computationally demanding multidimensional search over the parameter space, the proposed algorithms minimize the NLS criterion at a much lower implementation cost. They are more efficient and systematic than existing algorithms. Some practical applications utilizing the proposed algorithms are also presented. Theoretical analysis and simulations demonstrate the efficiency of the

This work was supported in part by the National Natural Science Foundation of China (No. 60879019, No. 61172112, No. 61179064) and in part by the Foundation of Civil Aviation

Bell, B. M. & Ewart, T. E. (1986). Separating multipaths by global optimization of

Bian, Y. & Last, D. (1997). Eigen-decomposition techniques for Loran-C skywave estimation, IEEE Transactions on Aerospace and Electronic Systems 33(1): 117-124.

multidimensional matched filter, IEEE Transactions on Acoustic, Speech and Signal

(Li & Wu, 1998; Su & Wu, 2000; Wu & Li, 1998; Wu et al., 1999; 2002).

can be found in our papers (Li et al. a, 2002; 2003; Li et al. b, 2003).

Barton, D. K.(1988). Modern Radar System Analysis Artech House Inc..

**4.2 Acoustic proximity ranging system**

**5. Conclusion**

proposed new algorithms.

Administration of China (MHRD0606).

Processing 34(5): 1029-1037.

**6. Acknowledgement**

**7. References**


Schmidt, R. O. (1986). Multiple emitter location and signal parameter estimation, IEEE Transactions on Antennas and Propagation 34(3): 276-280.

Söderström, T. & Stoica, P. (1989). System Identification, Prentice-Hall International.

Stoica, P. & Moses, R. L. (1997). Introduction to Spectral Analysis, Prentice-Hall.

Stewart, G. W. (1973). Introduction to Matrix Computations, Academic Press, Inc.

Stoica, P. & Nehorai, A. (1990). Performance study of conditional and unconditional direction-of-arrival estimation, IEEE Transactions on Acoustics, Speech, and Signal Processing 38(10): 1783-1795.

Stoica, P. & Söderström, T. (1989). On Reparametrization of Loss Functions Used in Estimation and the Invariance Principle, Signal Processing 17(4): 383-387.

Stoica, P. & Sharman, K. C. (1990). Novel Eigenanalysis Method for Direction Estimation, IEE Proceedings, Radar and Signal Processing F 137(1): 19-26.

Stoica, P. & Sharman, K. C. (1990). Maximum Likelihood Methods for Direction-of-Arrival Estimation, IEEE Transactions on Acoustics, Speech, and Signal Processing 38(7): 1132-1143.

Su, Z. & Wu, R. (2000). Delay and doppler scale estimation of multiple moving targets via ds-wrelax, Electronics Letters 36(9): 827-828.

Tufts, D. W., Kumaresan, R. & Kirsteins, I. (1982). Data adaptive signal estimation by singular value decomposition of a data matrix, Proceedings of the IEEE 70: 684-685.

Vaccaro, R. J., Ramalingam, C. S. & Tufts, D. W. (1992). Least-squares time-delay estimation for transient signals in a multipath environment, Journal of the Acoustical Society of America 92: 210-218.

Wang, Z., Li, J. & Wu, R. (2005). Time-delay-and time-reversal-based robust Capon beamformers for ultrasound imaging, IEEE Transactions on Medical Imaging 24(10): 1308-1322.

Wu, R. & Li, J. (1998). Time-delay estimation via optimizing highly oscillatory cost functions, IEEE Journal of Oceanic Engineering 23(3): 235-244.

Wu, R. & Li, J. (1999). Time delay estimation with multiple looks in colored gaussian noise, IEEE Transactions on Aerospace and Electronic Systems 35(4): 1354-1361.

Wu, R., Li, J. & Liu, Z. (1999). Super resolution time delay estimation via mode-wrelax, IEEE Transactions on Aerospace and Electronic Systems 35(1): 294-307.

Wu, R., Li, X. & Li, J. (2002). Continuous pavement profiling with ground-penetrating radar, IEE Proceedings-Radar, Sonar and Navigation 149(4): 183-193.

Wu, R., Cao, Y. & Liu, J. (2010). Multilayered Diffraction Tomography Algorithm for Ground Penetrating Radar, International Conference on Signal Processing, pp. 2129-2132.

Zangwill, W. I. (1967). Nonlinear Programming: A Unified Approach, Prentice-Hall, Inc., Englewood Cliffs, N.J.

Zehna, P. W. (1966). Invariance of maximum likelihood estimation, Annals of Mathematical Statistics 37: 755.

## **Channel Identification for OFDM Communication System in Frequency Domain**

Lianming Sun *The University of Kitakyushu, Japan*

#### **1. Introduction**

Orthogonal frequency division multiplexing (OFDM) modulation offers excellent performance, for example strong tolerance against multipath interference, high spectral efficiency, high information capacity and simple equalization. Consequently, it has been widely used in digital terrestrial broadcasting, asymmetric digital subscriber line (ADSL), wireless LAN and optical fiber communications. In the transmitter, relay station and receiver, signal processing techniques are used to mitigate the effects of various interferences, carrier frequency offset and noise, and thereby to improve the equalization precision of the information data. These techniques are most effective when reliable knowledge of the communication channel is available. Nevertheless, prior information on the OFDM channel dynamics is typically unavailable, and the practical channel is often time-varying due to differing propagation paths, scattering and reflection of electric waves. Hence it is necessary to identify the channel model from the observation data and from some distinctive structural information inserted in the OFDM signals. In this chapter some channel identification problems as well as the fundamental mathematical tools are discussed, and several frequency domain algorithms are investigated.

Channel information is essential in practical communication systems. It is often obtained by channel identification, which may be performed either in the time domain or in the frequency domain (Giannakis et al., 2000). The identification algorithms in the time domain are commonly executed through the least mean square (LMS) method, the recursive least squares (RLS) method, or the maximum likelihood (ML) method when a known sequence of training symbols is transmitted in some specified training style (Haykin, 2001; Ljung, 1999). When no training sequence can be used for channel identification, blind (Chi et al., 2006; Ding & Li, 2001) or semi-blind algorithms may use some statistical or structural properties of the OFDM signals, for example the cyclic prefix or the symbol pattern of the constellation (Koiveunen et al., 2004). If spatial information is available, the subspace method is a possible choice (Muquet et al., 2002). These algorithms have been utilized in channel estimation and equalization, and have helped to improve communication performance in applications such as equalization (Giannakis et al., 2000), compensation of frequency offset (Yu & Su, 2004), compensation of nonlinear distortion (Ding et al., 2004), and interference compensation in relay stations (Shibuya, 2006; Sun & Sano, 2005).

Nevertheless, in the presence of multipath interferences with long delay taps, or under the severe restriction that the carriers outside the signal bandwidth do not convey any information symbols, the time domain algorithms may suffer from either a low convergence rate or high computational complexity. Furthermore, if few training sequences are available for channel estimation, the blind algorithms in the time domain commonly have to employ nonlinear optimization, which may converge slowly.

On the other hand, the OFDM signals in base band are handled by the Fourier transform and the inverse Fourier transform, which implies that channel identification can also be performed in the frequency domain with the aid of the Fourier transform. The advantages of OFDM channel identification in the frequency domain are as follows. Both the transmitted and received signals in base band, as well as the dynamic channel model, can be treated conveniently through the Fourier transform, while the fast Fourier transform (FFT) and inverse fast Fourier transform (IFFT) significantly reduce the computational complexity of channel identification. Additionally, the dynamics of the channel model are easily handled in the frequency domain without extra computation even for long delay taps, and only simple computation is required for convolution and deconvolution. Furthermore, the scattered pilot symbols assigned at some specified carriers can be exploited more readily than in the time domain, and the identification algorithm can easily be combined with equalization and interference cancellation. Hence there is a strong motivation to develop effective channel identification algorithms in the frequency domain.
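The remark that convolution and deconvolution need only simple computation in the frequency domain can be illustrated with a small sketch. It assumes a circular (block-periodic) convolution, uses a naive DFT for readability rather than an FFT, and all signal values and function names are illustrative.

```python
import cmath


def dft(x):
    """Naive N-point DFT, adequate for a small illustration."""
    n_pts = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * n * k / n_pts) for k in range(n_pts))
            for n in range(n_pts)]


def idft(spectrum):
    """Naive inverse DFT with 1/N normalisation."""
    n_pts = len(spectrum)
    return [sum(spectrum[n] * cmath.exp(2j * cmath.pi * n * k / n_pts) for n in range(n_pts)) / n_pts
            for k in range(n_pts)]


def circular_convolution(h, d):
    """Time-domain circular convolution, costing N multiplications per output sample."""
    n_pts = len(d)
    return [sum(h[m] * d[(k - m) % n_pts] for m in range(n_pts)) for k in range(n_pts)]


D = [1, -1, 1j, -1j, 1, 1, -1, 1j]                 # nonzero symbols on every carrier
d = idft(D)                                        # transmitted block in the time domain
h = [1.0, 0.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0]       # direct path plus one echo

Y = dft(circular_convolution(h, d))                # received block seen in the frequency domain
H_est = [Y[n] / D[n] for n in range(len(D))]       # deconvolution is one division per carrier
h_est = idft(H_est)                                # recovered impulse response
```

In the frequency domain the channel acts as one complex multiplication per carrier, so its identification reduces to per-carrier division whenever the transmitted symbol is known and nonzero.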

In this chapter channel identification is studied in the frequency domain, and several identification algorithms are presented for OFDM channels working under severe communication environments or restricted identification conditions. Firstly, the frequency properties of both the OFDM signals in base band and the propagation channel used in identification are briefly illustrated, and some structural features of the cyclic prefix, the constellation of information symbols and the scattered pilot symbols are also shown in the frequency domain. Secondly, the fundamentals of the identification algorithms are discussed, including the frequency properties of the inter-symbol interference (ISI) and inter-carrier interference (ICI), the correlation function and spectral properties of various signals in the OFDM system, and the leakage error of the Fourier transform. Then, several identification algorithms are presented, including the batch processing algorithm, the recursive algorithm, the usage of pilot symbols, and a method to mitigate the effect of equalization errors when the pilot rate is low. Next, the applications of the identification algorithms are considered for the cases where the multipath interferences have long delay taps, the OFDM signal has a severe bandwidth restriction, or the propagation channel has fast fading. Furthermore, their convergence performance and computational complexity are analyzed and compared with those of the time domain methods. It is seen that the Fourier transform is a powerful mathematical tool in the identification problems of OFDM channels, and the Fourier transform based algorithms demonstrate attractive performance even under some severe communication conditions.

### **2. Fundamentals in channel identification**

#### **2.1 Guard interval**

Let the normalized period of the OFDM information symbol be denoted as *N*. As shown in Fig. 1, the OFDM guard interval (GI) attaches a copy of the tail of the effective symbol to its head as a cyclic prefix when the signal is transmitted. Let the GI length be *N*<sub>gi</sub>; then the practical transmission period, denoted as *N*<sub>tx</sub>, becomes *N*<sub>tx</sub> = *N* + *N*<sub>gi</sub>.

Fig. 1. Guard interval in OFDM signal
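The guard interval construction can be sketched in a few lines. The function name `add_guard_interval` and the sample values are illustrative only.

```python
def add_guard_interval(effective_symbol, n_gi):
    """Prepend the last n_gi samples of the effective symbol as a cyclic prefix."""
    n = len(effective_symbol)
    if not 0 <= n_gi <= n:
        raise ValueError("guard interval length must lie between 0 and N")
    return effective_symbol[n - n_gi:] + effective_symbol


effective = [0, 1, 2, 3, 4, 5, 6, 7]            # N = 8 samples (placeholders for d(k))
transmitted = add_guard_interval(effective, 2)  # N_tx = N + N_gi = 10 samples
```

Here the first two transmitted samples repeat the last two samples of the effective symbol, so the transmission period has length *N* + *N*<sub>gi</sub>.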

#### **2.2 OFDM signals in base band**


In the transmission symbol period *N*tx, the transmitted signal in base band is generated by *N*-point IFFT as follows

$$d(k) = \sum\_{n=-N/2+1}^{N/2} D(n,l)e^{j n \omega\_0 (k - lN\_{\rm tx})} \quad \text{for } lN\_{\rm tx} - N\_{\rm gi} \le k < lN\_{\rm tx} + N, \tag{1}$$

where the FFT size *N* is a power of 2, and *k* and *ω*<sub>0</sub> = 2*π*/*N* are the normalized sampling instant and the angular frequency factor, respectively. *D*(*n*, *l*) is the symbol conveyed at the *n*th carrier in the *l*th transmission symbol period, and it belongs to a modulation constellation with a finite number of elements.
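Equation (1) can be evaluated directly for a single transmission period. The sketch below takes *l* = 0, uses small illustrative sizes and arbitrary symbol values, and evaluates the sum naively instead of calling an IFFT routine; the helper names are hypothetical. The 1/*N* factor in `demodulate` is chosen so that the symbols of (1) are recovered exactly.

```python
import cmath

N_FFT, N_GI = 8, 2                                   # illustrative sizes; N_FFT is a power of 2
W0 = 2 * cmath.pi / N_FFT                            # angular frequency factor, omega_0 = 2*pi/N
CARRIERS = range(-N_FFT // 2 + 1, N_FFT // 2 + 1)    # n = -N/2+1, ..., N/2


def ofdm_period(symbols):
    """Samples d(k) of eq. (1) for l = 0 and k = -N_gi, ..., N-1 (guard interval first)."""
    return [sum(symbols[n] * cmath.exp(1j * n * W0 * k) for n in CARRIERS)
            for k in range(-N_GI, N_FFT)]


def demodulate(samples):
    """Discard the guard interval and take the 1/N-normalised DFT at every carrier."""
    effective = samples[N_GI:]
    return {n: sum(effective[k] * cmath.exp(-1j * n * W0 * k) for k in range(N_FFT)) / N_FFT
            for n in CARRIERS}


# Arbitrary QPSK-like symbols, one per carrier.
symbols = {n: complex(1 if n % 2 else -1, 1 if n % 3 else -1) for n in CARRIERS}
samples = ofdm_period(symbols)
recovered = demodulate(samples)
```

Because every carrier completes a whole number of cycles in *N* samples, extending the index range of (1) down to *k* = −*N*<sub>gi</sub> produces the cyclic prefix automatically: the first `N_GI` samples equal the last `N_GI` samples.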

#### **2.3 Pilot and information symbols**

For the purpose of synchronization and equalization, scattered pilot symbols are assigned at the specified carriers, i.e., at these pilot carriers, the transmitted symbols *D*(*n*, *l*) are the known ones at both the transmitter and the receiver, and can be employed in channel identification as well as symbol equalization.
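One common way to use the known pilots, shown here only as a hedged sketch and not as the specific algorithm of this chapter, is a per-carrier division of the received frequency component by the known pilot symbol. It assumes that the received component at a pilot carrier is the product of the channel response and the pilot symbol, with noise neglected; the names and values are illustrative.

```python
def estimate_at_pilots(received, pilots):
    """Per-carrier estimate H(n) = Y(n) / D(n) at each pilot carrier n."""
    return {n: received[n] / pilots[n] for n in pilots}


def equalize(received, channel):
    """One-tap equalisation: divide each received component by the channel estimate."""
    return {n: received[n] / channel[n] for n in channel}


pilots = {-2: 1 + 0j, 2: -1 + 0j}                    # known pilot symbols D(n, l)
true_channel = {-2: 0.8 - 0.3j, 2: 0.5 + 0.4j}       # hypothetical channel response
received = {n: true_channel[n] * pilots[n] for n in pilots}   # noise-free Y(n, l)

estimates = estimate_at_pilots(received, pilots)
equalized = equalize(received, estimates)
```

Estimates at the information carriers would then have to be obtained by other means, for example interpolation between pilot carriers, which is outside the scope of this sketch.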

On the other hand, the symbol *D*(*n*, *l*) at information carrier can generally be treated as a random sequence with respect to the carrier number *n* and symbol period *l*, i.e.,

$$\lim\_{L \to \infty} \frac{1}{L} \sum\_{l=1}^{L} D^\*(n\_1, l) D(n\_2, l - l\_1) = \bar{D}^2 \delta(n\_1 - n\_2) \delta(l\_1) \tag{2}$$

holds true, where *δ* is the delta function, \* denotes the complex conjugate, *D̄*<sup>2</sup> is the mean square of the constellation, *n*<sub>1</sub> and *n*<sub>2</sub> are the carrier numbers, and *l*<sub>1</sub> is an arbitrary integer.
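Property (2) can be checked empirically with a finite sample average. The sketch below assumes independent, uniformly drawn QPSK symbols (mean square 2) and a fixed random seed; the carrier indices are simply list positions, and the function name is illustrative.

```python
import random

QPSK = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]   # every point has |D|^2 = 2


def symbol_correlation(D, n1, n2, lag):
    """Sample average of D*(n1, l) D(n2, l - lag) over all usable symbol periods l."""
    n_periods = len(D)
    periods = range(max(lag, 0), min(n_periods, n_periods + lag))
    terms = [D[l][n1].conjugate() * D[l - lag][n2] for l in periods]
    return sum(terms) / len(terms)


rng = random.Random(0)
D = [[rng.choice(QPSK) for _ in range(4)] for _ in range(20000)]   # D[l][n]

same_carrier = symbol_correlation(D, 1, 1, 0)      # equals the mean square of the constellation
other_carrier = symbol_correlation(D, 1, 2, 0)     # close to zero
other_period = symbol_correlation(D, 1, 1, 1)      # close to zero
```

With a finite number of periods the off-diagonal averages are small but not exactly zero; they shrink roughly as the inverse square root of the number of symbol periods.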

#### **2.4 Multipath channel model**

Assume that the received signal in base band under multipath environment can be approximated by

$$y(k) = \sum\_{m=0}^{M} r\_m(k) + e(k) = \sum\_{m=0}^{M} h\_m d(k - k\_m) + e(k),\tag{3}$$

where *r<sub>m</sub>*(*k*) is the *m*th multipath wave to the receiver, *h<sub>m</sub>* is its coefficient, *k<sub>m</sub>* is the delay tap, and *e*(*k*) is the additive noise. Correspondingly, the channel model can be expressed by the *z*-transform as

$$H(z) = h\_0 + h\_1 z^{-k\_1} + h\_2 z^{-k\_2} + \dots + h\_M z^{-k\_M},\tag{4}$$


where *z*<sup>−1</sup> is the backward shift operator and *kM* is the longest effective delay tap of the interference. Substituting *z* = *e*<sup>*jω*0</sup> into (4) also yields the frequency response function of the channel model.

#### **3. Identification of multipath channel with long delay taps**

When the delay taps of multipath waves are within GI, both equalization and channel identification are easily implemented in OFDM communication (Koiveunen et al., 2004; Wang & Poor, 2003). However, the situation is quite different when the delay taps of some multipath waves exceed GI due to the induced inter-symbol interference (ISI) and inter-carrier interference (ICI) (Suzuki et al., 2002).

Fig. 2. Multipath interference in OFDM system

#### **3.1 Signal properties used in identification**

#### **3.1.1 Interference exceeding GI**

Consider the interference *rm*(*k*) with long delay tap *km* exceeding GI. In the *l*th effective symbol period for *lN*tx ≤ *k < lN*tx + *N*, the component of interference with delay tap *km* in Fig.2 is given by

$$r\_m(k) = h\_m d(k - k\_m)$$

$$= \begin{cases} h\_m \sum\_{n=-N/2+1}^{N/2} D(n, l - 1)e^{jn\omega\_0(k - k\_m + N\_{\rm gi} - lN\_{\rm tx})}, & \text{for } k - k\_m - lN\_{\rm tx} < -N\_{\rm gi}, \\ h\_m \sum\_{n=-N/2+1}^{N/2} D(n, l)e^{jn\omega\_0(k - k\_m - lN\_{\rm tx})}, & \text{for } -N\_{\rm gi} \le k - k\_m - lN\_{\rm tx} < N. \end{cases} \tag{5}$$

Performing *N*-point FFT of *rm*(*k*) within the FFT window for *lN*tx ≤ *k < lN*tx +*N* yields the frequency components of *rm*(*k*). For example, the component corresponding to the *n*th carrier is expressed by

$$\begin{split} &\frac{1}{N} \sum\_{k=lN\_{\rm tx}}^{lN\_{\rm tx}+N-1} r\_m(k) e^{-jn\omega\_0(k-lN\_{\rm tx})} = \frac{1}{N} \sum\_{k=0}^{N-1} r\_m(k+lN\_{\rm tx}) e^{-jn\omega\_0 k} \\ &= \frac{1}{N} \left( \sum\_{k=0}^{k\_m-N\_{\rm gi}-1} r\_m(k+lN\_{\rm tx}) e^{-jn\omega\_0 k} + \sum\_{k=k\_m-N\_{\rm gi}}^{N-1} r\_m(k+lN\_{\rm tx}) e^{-jn\omega\_0 k} \right) \end{split}$$

$$\begin{split} &= \frac{h\_m}{N} \sum\_{k=0}^{k\_m - N\_{\rm gi} - 1} \left( \sum\_{n\_1 = -N/2 + 1}^{N/2} D(n\_1, l - 1) e^{jn\_1 \omega\_0 (k - k\_m + N\_{\rm gi})} \right) e^{-jn\omega\_0 k} \\ &+ \frac{h\_m}{N} \sum\_{k = k\_m - N\_{\rm gi}}^{N-1} \left( \sum\_{n\_1 = -N/2 + 1}^{N/2} D(n\_1, l) e^{jn\_1 \omega\_0 (k - k\_m)} \right) e^{-jn\omega\_0 k} \\ &= \frac{h\_m}{N} \sum\_{k=0}^{k\_m - N\_{\rm gi} - 1} \left( \sum\_{n\_1 = -N/2 + 1}^{N/2} D(n\_1, l - 1) e^{jn\_1 \omega\_0 (k - k\_m + N\_{\rm gi})} \right) e^{-jn\omega\_0 k} \\ &- \frac{h\_m}{N} \sum\_{k=0}^{k\_m - N\_{\rm gi} - 1} \left( \sum\_{n\_1 = -N/2 + 1}^{N/2} D(n\_1, l) e^{jn\_1 \omega\_0 (k - k\_m)} \right) e^{-jn\omega\_0 k} \\ &+ \frac{h\_m}{N} \sum\_{k=0}^{N-1} \left( \sum\_{n\_1 = -N/2 + 1}^{N/2} D(n\_1, l) e^{jn\_1 \omega\_0 (k - k\_m)} \right) e^{-jn\omega\_0 k}. \tag{6} \end{split}$$

By the orthogonality of the basis functions *e*<sup>−*jnω*0*k*</sup> over the full FFT window, the last term in (6) can be written as

$$
h\_m e^{-j n \omega\_0 k\_m} D(n, l). \tag{7}
$$

Since the term in (7) contains only the frequency component at the *n*th carrier, it preserves the carrier orthogonality. The first and second terms in (6), however, are sums over the interval 0 ≤ *k* ≤ *km* − *N*gi − 1, which covers only part of the FFT window, and therefore yield a leakage error whose frequency components contaminate all the carriers.

Now consider the frequency components of all the multipaths. The orthogonal term at the *n*th carrier becomes

$$\sum\_{m=0}^{M} h\_m e^{-j n \omega\_0 k\_m} D(n, l) = H(e^{j n \omega\_0}) D(n, l). \tag{8}$$

On the other hand, the first term in (6) for *km > N*gi yields ISI, which is the interference from the (*l* − 1)th symbol period to the *l*th period. Let the ISI in the frequency domain be denoted as *E*s(*n*, *l*). It can then be expressed by

$$E\_{\rm s}(n,l) = \sum\_{m=m\_1}^{M} \frac{h\_m}{N} \left( \sum\_{k=0}^{k\_m - N\_{\rm gi} - 1} \left( \sum\_{n\_1 = -N/2 + 1}^{N/2} D(n\_1, l - 1) e^{jn\_1 \omega\_0 \left( k - k\_m + N\_{\rm gi} \right)} \right) e^{-jn\omega\_0 k} \right), \tag{9}$$

where *m*<sub>1</sub> is the smallest integer such that *k*<sub>*m*1</sub> > *N*gi. Moreover, the second term in (6) leads to *E*c(*n*, *l*), which is the ICI term given by

$$E\_{\rm c}(n,l) = -\sum\_{m=m\_1}^{M} \frac{h\_m}{N} \left( \sum\_{k=0}^{k\_m - N\_{\rm gi} - 1} \left( \sum\_{n\_1 = -N/2 + 1}^{N/2} D(n\_1, l) e^{j n\_1 \omega\_0 (k - k\_m)} \right) e^{-j n \omega\_0 k} \right). \tag{10}$$
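The decomposition in (6) to (10) can be checked numerically for a single path. The following is a minimal sketch, not the authors' implementation: the FFT size, guard-interval length, path gain, delay and the QPSK symbols are all illustrative assumptions, and the symbol offset *lN*tx is dropped so that the FFT window starts at *k* = 0.

```python
import numpy as np

# Illustrative parameters (assumptions, not taken from the chapter).
N, N_gi = 64, 16                          # FFT size and guard-interval length
w0 = 2 * np.pi / N                        # carrier spacing omega_0
n = np.arange(-N // 2 + 1, N // 2 + 1)    # carrier indices -N/2+1 .. N/2
h_m, k_m = 0.5, 24                        # one path whose delay exceeds GI

rng = np.random.default_rng(0)

def qpsk(size):
    """Random unit-power QPSK symbols (hypothetical test data)."""
    return (rng.choice([-1, 1], size) + 1j * rng.choice([-1, 1], size)) / np.sqrt(2)

D_prev, D_cur = qpsk(N), qpsk(N)          # D(n, l-1) and D(n, l)

# Delayed path inside the FFT window, following the two cases of eq. (5).
k = np.arange(N)
from_prev = np.exp(1j * w0 * np.outer(k - k_m + N_gi, n)) @ D_prev
from_cur = np.exp(1j * w0 * np.outer(k - k_m, n)) @ D_cur
r = h_m * np.where(k < k_m - N_gi, from_prev, from_cur)

# N-point DFT evaluated at every carrier n (left-hand side of eq. (6)).
Y = np.exp(-1j * w0 * np.outer(n, k)) @ r / N

# Right-hand side: orthogonal term (7), ISI (9) and ICI (10) for this path.
kk = np.arange(k_m - N_gi)                # incomplete part of the FFT window
F = np.exp(-1j * w0 * np.outer(n, kk)) / N
orth = h_m * np.exp(-1j * n * w0 * k_m) * D_cur
E_s = h_m * F @ (np.exp(1j * w0 * np.outer(kk - k_m + N_gi, n)) @ D_prev)
E_c = -h_m * F @ (np.exp(1j * w0 * np.outer(kk - k_m, n)) @ D_cur)

# Compare both sides and report the leakage relative to the orthogonal term.
print(np.allclose(Y, orth + E_s + E_c))
print(np.linalg.norm(E_s + E_c) / np.linalg.norm(orth))
```

Setting `k_m` to a value no larger than `N_gi` makes `kk` empty, so both leakage terms vanish and only the orthogonal term remains.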

Let the sum of ISI and ICI be denoted as a leakage error *E*(*n*, *l*). Therefore, the frequency domain expression of the received signal in the *l*th symbol period is given by

$$Y(n,l) = H(e^{j n \omega\_0})D(n,l) + \underbrace{E\_{\rm c}(n,l) + E\_{\rm s}(n,l)}\_{E(n,l)} + V(n,l),\tag{11}$$

where *Y*(*n*, *l*) and *V*(*n*, *l*) are the frequency components of the received signal and the noise at the *n*th carrier. The leakage error *E*(*n*, *l*) degrades the orthogonality of the OFDM carriers and causes a large equalization error. A 16-QAM example with a high signal-to-noise ratio (SNR = 30 dB) is shown in Fig.3. Besides the direct wave, there are two multipath interference waves. Fig.3(a) shows the result of MMSE equalization when the delay taps are within GI: the equalization error is very low and can be removed by conventional error correction techniques. However, if one of the multipath interferences has a delay tap exceeding GI, the equalization error increases significantly even at high SNR. For example, just one interference with a delay tap 1.25 times longer than GI increases the bit error rate (BER) up to 15% in Fig.3(b). Therefore, it is important to reduce the influence of *E*(*n*, *l*) to guarantee high communication performance.

(a) Equalization result when delay taps are within GI (b) Equalization result when one of the delay taps exceeds GI.

Fig. 3. Examples of equalization with multipath interferences

#### **3.1.2 Expressions of ICI and ISI**

It is seen that only the coefficients *hm* for *km > N*gi remain in (9) and (10). Correspondingly, the sub-model of the multipaths exceeding GI can be expressed by *z* transform as

$$\Gamma(z) = h\_{m\_1} z^{N\_{\rm gi} + 1 - k\_{m\_1}} + h\_{m\_1 + 1} z^{N\_{\rm gi} + 1 - k\_{m\_1 + 1}} + \dots + h\_M z^{N\_{\rm gi} + 1 - k\_M}, \tag{12}$$

where *h*<sub>*m*1</sub>, *h*<sub>*m*1+1</sub>, ··· are the coefficients of the multipaths exceeding GI. Performing the inverse Fourier transform of *E*s(*n*, *l*) in (9) and *E*c(*n*, *l*) in (10) then yields the ISI and ICI signals in the time domain as

$$
\varepsilon\_{\rm s}(k, l) = \Gamma(z) d\_{\rm s}(k, l), \tag{13}
$$

$$
\varepsilon\_{\rm c}(k, l) = \Gamma(z) d\_{\rm c}(k, l) \tag{14}
$$

for *k* = 0, 1, ··· , *N* − 1, where *d*s(*k*, *l*) and *d*c(*k*, *l*) are the corresponding transmitted signals included in (9) and (10). They can be given by

$$d\_{\rm s}(k,l) = \begin{cases} d(k-1+(l-1)N\_{\rm tx}), & \text{for } -N+N\_{\rm gi}+1 \le k \le 0, \\ 0, & \text{for } k > 0, \end{cases} \tag{15}$$

$$d\_{\rm c}(k,l) = \begin{cases} -d(k - N\_{\rm gi} - 1 + lN\_{\rm tx}), & \text{for } -N + N\_{\rm gi} + 1 \le k \le 0, \\ 0, & \text{for } k > 0. \end{cases} \tag{16}$$

On the other hand, *E*s(*n*, *l*) in (9) can be rewritten as

$$E\_{\sf s}(n,l) = \sum\_{n\_1=-N/2+1}^{N/2} H\_{\sf s}(n,n\_1)D(n\_1,l-1) \tag{17}$$

in the frequency domain, where *H*s(*n*, *n*1) is

$$H\_{\rm s}(n,n\_{1}) = \begin{cases} \sum\_{m=m\_{1}}^{M} \frac{h\_{m}}{N} \frac{e^{-jn\_{1}\omega\_{0}\left(k\_{m}-N\_{\rm gi}\right)} - e^{-jn\omega\_{0}\left(k\_{m}-N\_{\rm gi}\right)}}{1 - e^{-j\left(n-n\_{1}\right)\omega\_{0}}}, & \text{for } n \neq n\_{1}, \\ \sum\_{m=m\_{1}}^{M} \frac{k\_{m}-N\_{\rm gi}}{N} h\_{m} e^{-jn\omega\_{0}\left(k\_{m}-N\_{\rm gi}\right)}, & \text{for } n = n\_{1}. \end{cases}$$

Similarly, *E*c(*n*, *l*) in (10) is approximated by

$$E\_{\rm c}(n,l) = -\sum\_{n\_1=-N/2+1}^{N/2} H\_{\rm c}(n,n\_1)D(n\_1,l),\tag{18}$$

where *H*c(*n*, *n*1) is given by

$$H\_{\rm c}(n,n\_{1}) = \begin{cases} \sum\_{m=m\_{1}}^{M} \frac{h\_{m}}{N} \frac{e^{-jn\_{1}\omega\_{0}k\_{m}} - e^{-jn\omega\_{0}k\_{m}}e^{-j(n\_{1}-n)\omega\_{0}N\_{\rm gi}}}{1 - e^{-j(n-n\_{1})\omega\_{0}}}, & \text{for } n \neq n\_{1}, \\ \sum\_{m=m\_{1}}^{M} \frac{k\_{m} - N\_{\rm gi}}{N} h\_{m} e^{-jn\omega\_{0}k\_{m}}, & \text{for } n = n\_{1}. \end{cases}$$
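The closed forms of *H*s(*n*, *n*1) and *H*c(*n*, *n*1) follow from summing the geometric series over *k* in (9) and (10). As a rough sketch of how they could be evaluated, the hypothetical helper `leakage_matrices` below (its name and argument layout are assumptions) builds both matrices for integer delays given in samples, with rows indexed by *n* and columns by *n*1.

```python
import numpy as np

def leakage_matrices(h, delays, N, N_gi):
    """Closed-form H_s and H_c (N x N) for path gains h and integer delays.

    Only paths with delay > N_gi contribute. Row index follows n and column
    index follows n1, both running over -N/2+1 .. N/2.
    """
    w0 = 2 * np.pi / N
    n = np.arange(-N // 2 + 1, N // 2 + 1)
    nn, n1 = np.meshgrid(n, n, indexing="ij")        # nn[i, j] = n_i, n1[i, j] = n_j
    off = nn != n1
    # Denominator of the geometric sum; set to 1 on the diagonal to avoid 0/0.
    denom = np.where(off, 1 - np.exp(-1j * (nn - n1) * w0), 1.0)

    Hs = np.zeros((N, N), dtype=complex)
    Hc = np.zeros((N, N), dtype=complex)
    for hm, km in zip(h, delays):
        if km <= N_gi:
            continue                                  # path inside GI: no leakage
        K = km - N_gi                                 # number of contaminated samples
        num_s = np.exp(-1j * n1 * w0 * K) - np.exp(-1j * nn * w0 * K)
        num_c = (np.exp(-1j * n1 * w0 * km)
                 - np.exp(-1j * nn * w0 * km) * np.exp(-1j * (n1 - nn) * w0 * N_gi))
        Hs += hm / N * np.where(off, num_s / denom, K * np.exp(-1j * nn * w0 * K))
        Hc += hm / N * np.where(off, num_c / denom, K * np.exp(-1j * nn * w0 * km))
    return Hs, Hc

# Example with assumed values: three paths, two of them exceeding GI.
Hs, Hc = leakage_matrices(h=[1.0, 0.4, 0.3], delays=[0, 6, 9], N=16, N_gi=4)
```

With these matrices, the ISI of (17) is `Hs @ D_prev` and the ICI of (18) is `-(Hc @ D_cur)` for symbol vectors ordered from carrier −*N*/2+1 to *N*/2.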

From (17) and (18), the data in the frequency domain fulfil the following expression

$$\begin{array}{l} \mathbf{HD}(l) = \mathbf{Y}(l) - \mathbf{E}(l) - \mathbf{V}(l) \\ \quad = \mathbf{Y}(l) - \mathbf{E}\_{\rm s}(l) - \mathbf{H}\_{\rm c}\mathbf{D}(l) - \mathbf{V}(l), \end{array} \tag{19}$$

where **Y**(*l*), **D**(*l*), **E**(*l*), **E**s(*l*) and **V**(*l*) are the vectors of FFT coefficients of the received signal, the information symbols, leakage error, ISI and noise in the *l*th period, respectively, and

$$\begin{aligned} \mathbf{H} &= \begin{bmatrix} H\left(e^{j(-N/2+1)\omega\_0}\right) & & 0 \\ & \ddots & \\ 0 & & H\left(e^{j(N/2)\omega\_0}\right) \end{bmatrix}, \\ \mathbf{H}\_{\rm c} &= \begin{bmatrix} H\_{\rm c}\left(-N/2+1, -N/2+1\right) & \cdots & H\_{\rm c}\left(-N/2+1, N/2\right) \\ \vdots & \ddots & \vdots \\ H\_{\rm c}\left(N/2, -N/2+1\right) & \cdots & H\_{\rm c}\left(N/2, N/2\right) \end{bmatrix}. \end{aligned}$$

#### **3.1.3 Statistical properties of ICI and ISI**

Since *E*s(*n*, *l*) in (17) depends only on the information symbols in the (*l* − 1)th symbol period, it follows from (2) that *D*(*n*, *l*) and *E*s(*n*, *l*) are uncorrelated, i.e.,

$$\begin{split} \lim\_{L \to \infty} &\frac{1}{L} \sum\_{l=1}^{L} D^\*(n,l) E\_{\sf s}(n,l) \\ &= \sum\_{n\_1=-N/2+1}^{N/2} \left( \lim\_{L \to \infty} \frac{1}{L} \sum\_{l=1}^{L} D^\*(n,l) D(n\_1,l-1) \right) H\_{\sf s}(n,n\_1) = 0. \end{split} \tag{20} $$

Moreover, multiplying *E*c(*n*, *l*) in (18) by the conjugate information symbol *D*∗(*n*, *l*) and using the results in (2) lead to the following result

$$\begin{split} \lim\_{L \to \infty} &\frac{1}{L} \sum\_{l=1}^{L} D^\*(n, l) E\_{\rm c}(n, l) \\ &= -\sum\_{n\_1 = -N/2 + 1}^{N/2} \left( \lim\_{L \to \infty} \frac{1}{L} \sum\_{l=1}^{L} D^\*(n, l) D(n\_1, l) \right) H\_{\rm c}(n, n\_1) \\ &= -\bar{D}^2 H\_{\rm c}(n, n) = -\bar{D}^2 \sum\_{m = m\_1}^{M} \frac{k\_m - N\_{\rm gi}}{N} h\_m e^{-j n k\_m \omega\_0}. \end{split} \tag{21}$$

Following (21), it is clear that the longer the delay tap *km*, the greater the leakage error is. Therefore, the symbol equalization or interference compensation becomes more difficult.
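The two limits in (20) and (21) can be illustrated with a small Monte Carlo experiment. The sketch below is only an illustration under stated assumptions: the symbols are independent unit-power QPSK (so the mean symbol power is 1), there is a single path exceeding GI, and all numerical values are arbitrary. The leakage matrices are built by direct summation of (9) and (10).

```python
import numpy as np

rng = np.random.default_rng(1)
N, N_gi, L = 16, 4, 20000            # FFT size, GI length, number of symbol periods
w0 = 2 * np.pi / N
n = np.arange(-N // 2 + 1, N // 2 + 1)
h_m, k_m = 0.5, 7                    # single path with k_m > N_gi (assumed values)

# Leakage matrices by direct summation over the incomplete window 0 .. k_m-N_gi-1.
k = np.arange(k_m - N_gi)
A = np.exp(-1j * w0 * np.outer(n, k))                          # e^{-j n w0 k}
Hs = h_m / N * A @ np.exp(1j * w0 * np.outer(k - k_m + N_gi, n))
Hc = h_m / N * A @ np.exp(1j * w0 * np.outer(k - k_m, n))

# Independent QPSK symbols D(n, l) for l = 0 .. L; row l holds symbol period l.
D = (rng.choice([-1, 1], (L + 1, N)) + 1j * rng.choice([-1, 1], (L + 1, N))) / np.sqrt(2)

E_s = D[:-1] @ Hs.T                  # eq. (17): uses the previous symbol period
E_c = -(D[1:] @ Hc.T)                # eq. (18): uses the current symbol period

corr_s = np.mean(np.conj(D[1:]) * E_s, axis=0)   # sample version of eq. (20)
corr_c = np.mean(np.conj(D[1:]) * E_c, axis=0)   # sample version of eq. (21)

# Largest deviation from the limits predicted by (20) and (21).
print(np.max(np.abs(corr_s)))
print(np.max(np.abs(corr_c + np.diag(Hc))))
```

Both printed deviations shrink roughly as 1/√*L*, which reflects that only the diagonal entries *H*c(*n*, *n*) survive the averaging.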

Furthermore, from (11), the following equation

$$\begin{split} \frac{1}{L} \sum\_{l=1}^{L} D^\*(n,l)Y(n,l) &= H(e^{j n \omega\_0}) \frac{1}{L} \sum\_{l=1}^{L} D^\*(n,l)D(n,l) \\ &+ \frac{1}{L} \sum\_{l=1}^{L} D^\*(n,l) \left( E\_{\rm s}(n,l) + E\_{\rm c}(n,l) \right) + \frac{1}{L} \sum\_{l=1}^{L} D^\*(n,l)V(n,l) \end{split} \tag{22}$$

holds true. Then, by using the results of (2), (20) and (21), *H̄*(*e*<sup>*jnω*0</sup>) defined in (23) can be obtained from (22) as follows.

$$\begin{split} \bar{H}(e^{jn\omega\_{0}}) &= \lim\_{L\to\infty} \frac{\frac{1}{L} \sum\_{l=1}^{L} \left( D^{\*}(n,l)Y(n,l) \right)}{\frac{1}{L} \sum\_{l=1}^{L} \left( D^{\*}(n,l)D(n,l) \right)} \\ &= H(e^{jn\omega\_{0}}) + \lim\_{L\to\infty} \frac{\frac{1}{L} \sum\_{l=1}^{L} \left( D^{\*}(n,l)E\_{\rm c}(n,l) \right)}{\frac{1}{L} \sum\_{l=1}^{L} \left( D^{\*}(n,l)D(n,l) \right)} \\ &= \sum\_{m=0}^{m\_{1}-1} h\_{m}e^{-jnk\_{m}\omega\_{0}} + \sum\_{m=m\_{1}}^{M} \left( 1 - \frac{k\_{m} - N\_{\rm gi}}{N} \right) h\_{m}e^{-jnk\_{m}\omega\_{0}}. \end{split} \tag{23}$$

Denote the IFFT coefficients of *H̄*(*e*<sup>*jnω*0</sup>) as *h̄*0, *h̄*1, *h̄*2, ···. Then the coefficients *hm* can be obtained by

$$h\_{m} = \begin{cases} \bar{h}\_{m}, & \text{for } 0 \le k\_{m} \le N\_{\rm gi}, \\ N\bar{h}\_{m}/(N + N\_{\rm gi} - k\_{m}), & \text{for } k\_{m} > N\_{\rm gi}. \end{cases} \tag{24}$$

From (23) and (24), it can be seen that *H*(*e*<sup>*jnω*0</sup>) can be estimated by using the properties of the leakage error even when the channel has long multipath interferences.
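The recovery of the path coefficients can be sketched as a round trip: form *H̄* from known taps using the weight 1 − (*km* − *N*gi)/*N* that appears in (23), take the IFFT, and undo that weight for the taps beyond GI. This is a consistency check under simplifying assumptions (integer delays equal to the tap index, delays shorter than *N*, noise-free *H̄*, illustrative tap values), not a full estimator.

```python
import numpy as np

N, N_gi = 64, 16                       # illustrative sizes (assumptions)
k = np.arange(N)                       # tap index, taken equal to the delay in samples

# Hypothetical channel: two taps within GI and two beyond it.
h = np.zeros(N, dtype=complex)
h[0], h[5], h[20], h[30] = 1.0, 0.5, 0.3 - 0.2j, 0.2

# Weight applied to each tap in eq. (23): 1 inside GI, 1 - (k - N_gi)/N beyond it.
weight = np.where(k > N_gi, 1 - (k - N_gi) / N, 1.0)

# H_bar at the N carriers; np.fft.fft evaluates sum_k x[k] e^{-j n w0 k}.
H_bar = np.fft.fft(weight * h)

# IFFT gives the weighted taps h_bar; dividing by the weight restores h.
h_bar = np.fft.ifft(H_bar)
h_rec = np.where(k > N_gi, N * h_bar / (N + N_gi - k), h_bar)

print(np.allclose(h_rec, h))
```

In practice *H̄* would come from averaged measurements as in (23), so the recovered taps inherit the estimation noise, amplified slightly for long delays where the weight is smaller.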

#### **3.2 Channel identification algorithm**

When several preamble or training symbols are available, (23) and (24) give a batch channel identification with a computational complexity of only O(*N*). When no successive training symbols are available for channel identification, the pilot symbols can be used in conventional interpolation-based channel estimation methods (Coleri et al., 2002; Nguyen et al., 2003). For example, provided that the scattered pilot symbols are assigned to the *Pn*th carrier, the symbol *D*(*Pn*, *l*) at the pilot carrier *Pn* is known at the receiver. Consequently, the estimate of *H̄*(*e*<sup>*jnω*0</sup>)

$$\hat{\bar{H}}(e^{jP\_{n}\omega\_{0}}) = \frac{\frac{1}{L}\sum\_{l=1}^{L} D^{\*}(P\_{n}, l)Y(P\_{n}, l)}{\frac{1}{L}\sum\_{l=1}^{L} D^{\*}(P\_{n}, l)D(P\_{n}, l)}\tag{25}$$

is obtained at the pilot carrier *Pn*. As for non-pilot carriers, if the pilot rate is high, a simple linear interpolation yields that

$$\hat{\bar{H}}(e^{j n \omega\_0}) = \hat{\bar{H}}(e^{j P\_{n,1} \omega\_0}) + \frac{n - P\_{n,1}}{P\_{n,2} - P\_{n,1}} \left( \hat{\bar{H}}(e^{j P\_{n,2} \omega\_0}) - \hat{\bar{H}}(e^{j P\_{n,1} \omega\_0}) \right), \tag{26}$$

where *Pn*,1 and *Pn*,2 are the indices of two adjacent pilot carriers with *Pn*,1 ≤ *n* ≤ *Pn*,2. Compared with linear interpolation, second-order or higher-order interpolation methods can give a smoother interpolation. Furthermore, *Ĥ*(*e*<sup>*jnω*0</sup>) can be determined from the estimate of *H̄*(*e*<sup>*jnω*0</sup>).
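The pilot-based estimate (25) and the linear interpolation (26) can be sketched as follows. The function names, the array layout (symbol periods along the first axis, pilot carriers along the second) and the toy channel are assumptions made for illustration; real and imaginary parts are interpolated separately with `np.interp`.

```python
import numpy as np

def pilot_channel_estimate(D_pilot, Y_pilot):
    """Eq. (25): averaged estimate at the pilot carriers.

    D_pilot, Y_pilot: complex arrays of shape (L, P), i.e. L symbol periods
    and P pilot carriers.
    """
    num = np.mean(np.conj(D_pilot) * Y_pilot, axis=0)
    den = np.mean(np.abs(D_pilot) ** 2, axis=0)
    return num / den

def interpolate_linear(pilot_idx, H_pilot, carrier_idx):
    """Eq. (26): piecewise-linear interpolation between adjacent pilots.

    pilot_idx must be increasing.
    """
    re = np.interp(carrier_idx, pilot_idx, H_pilot.real)
    im = np.interp(carrier_idx, pilot_idx, H_pilot.imag)
    return re + 1j * im

# Toy usage: a response that is linear in n by construction, so that the
# interpolation is exact in the absence of noise and leakage.
rng = np.random.default_rng(2)
carriers = np.arange(0, 9)
pilots = np.array([0, 4, 8])
H_true = (1 + 0.1 * carriers) + 1j * (0.5 - 0.05 * carriers)
L = 4
D_p = (rng.choice([-1, 1], (L, 3)) + 1j * rng.choice([-1, 1], (L, 3))) / np.sqrt(2)
Y_p = H_true[pilots] * D_p               # idealized pilot observations
H_hat = interpolate_linear(pilots, pilot_channel_estimate(D_p, Y_p), carriers)
```

For a response that varies rapidly between pilots, the linear interpolant no longer follows the true curve, which is the limitation discussed next.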

Nevertheless, as mentioned in Section 3.1.1, the components of ICI and ISI contaminate all the carriers, and the frequency response function *H*(*e*<sup>*jnω*0</sup>) varies remarkably when the channel has long multipath interferences, as shown in Fig.4. As a result, neither the interpolation method nor equalization using the frequency selective diversity can yield a satisfactory result if the pilot rate is not high enough.

We therefore consider new information estimation and channel identification algorithms that make use of multiple receiver antennas and spectral periodograms whose ISI and ICI are compensated by the replica of the leakage error.

#### **3.2.1 Diversity of multiple antennas**

Commonly, except for the symbols at pilot carriers, the information symbols have to be estimated from the received signals for channel identification. Nevertheless, many existing symbol estimation methods do not work well in long multipath situations. At the *n*th carrier where *H*(*e*<sup>*jnω*0</sup>) is small, the orthogonal component in (6) attenuates to such a small value that symbol equalization becomes sensitive to the noise and leakage error, and even error correction techniques might fail to correct the equalization errors at the carriers with a small magnitude of frequency response. To overcome these difficulties, the diversity of multiple receiver antennas is used in the proposed algorithm. Let the total number of antenna elements be *Q*, and let the received signal at the *q*th antenna be denoted as *yq*(*k*), where 1 ≤ *q* ≤ *Q*. Correspondingly, the frequency response function from the transmitter to the *q*th antenna is *Hq*(*e*<sup>*jnω*0</sup>), and its sub-model for the part exceeding GI is Γ*q*(*z*).

Fig. 4. Example of $|H_q(e^{-jn\omega_0})|$, $q = 1, \cdots, 4$

The relative magnitude of $H_q(e^{jn\omega_0})$ for $1 \le q \le 4$ and $0 \le n \le 20$ is illustrated in Fig. 4. Consider $H_1(e^{-jn\omega_0})$ of the first antenna: at the carriers $n = 3, 13, 19$, marked by circles, the low $|H_1(e^{-jn\omega_0})|$ implies that the symbols are difficult to estimate by the frequency selective diversity of a single antenna, since a low magnitude of $H_1(e^{jn\omega_0})$ leads to a weak orthogonal component in the received signal $y_1(k)$. On the other hand, $H_2(e^{jn\omega_0})$ has a larger magnitude at the carriers marked with circles and can help the estimation of the information symbols.

An approach was discussed in (Sun et al., 2009) that performs symbol estimation per information carrier by selecting the largest magnitude $|H_q(e^{-jn\omega_0})|$ among the $Q$ receiver antennas; however, it had to perform ICI reduction and symbol estimation iteratively. Next, more effective estimation approaches without iterative computation are considered.

#### **3.2.2 Estimation of information symbols**

In the $l$th symbol period, let the frequency component of $y_q(k)$ at the $n$th carrier be denoted by $Y_q(n, l)$. It is easily calculated from $y_q(k)$ by applying the FFT algorithm within the FFT window $lN_{\mathrm{tx}} \le k < lN_{\mathrm{tx}} + N$. From (19), the relation between the symbol vector and the received signals can be expressed by


$$\left(\mathbf{H}_q + \mathbf{H}_{\mathrm{c},q}\right)\mathbf{D}(l) = \mathbf{Y}_q(l) - \mathbf{E}_{\mathrm{s},q}(l) - \mathbf{V}_q(l), \tag{27}$$

where the subscript *q* indicates the antenna number. Two algorithms are developed to estimate the symbol vector **D**(*l*) in the *l*th period from the received signals and the estimated channel models.
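The frequency components $Y_q(n, l)$ that enter (27) come from an FFT over the window $lN_{\mathrm{tx}} \le k < lN_{\mathrm{tx}} + N$. The following is a minimal sketch of that windowing for one antenna, assuming NumPy's unnormalized forward FFT; the function name `carrier_components` and the test tone are illustrative and do not appear in the original text.

```python
import numpy as np

def carrier_components(y_q, l, N, N_tx):
    """FFT of the window l*N_tx <= k < l*N_tx + N of one antenna's samples."""
    start = l * N_tx
    window = y_q[start:start + N]
    if window.size != N:
        raise ValueError("received signal too short for symbol period l")
    return np.fft.fft(window)  # unnormalized DFT (an assumption of this sketch)

# Illustrative input: a pure complex tone located on carrier n = 5.
N, N_tx = 256, 320               # FFT size and N + N_gi from the simulation setup
k = np.arange(3 * N_tx)
y = np.exp(2j * np.pi * 5 * k / N)
Y = carrier_components(y, l=1, N=N, N_tx=N_tx)
```

Because the tone lies exactly on a carrier, all of its energy falls into bin 5 of `Y`; a component between carriers would leak into neighbouring bins.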

In the $(l-1)$th symbol period, denote the estimate of the information symbol as $\hat{D}(n, l-1)$, the estimate of the frequency response as $\hat{H}_q(e^{jn\omega_0}, l-1)$, and the estimate of its sub-model for the part exceeding the GI as $\hat{\Gamma}_q(z, l-1)$. These estimates are used in the estimation of $\hat{D}(n, l)$ and $\hat{H}_q(e^{jn\omega_0}, l)$, and the effect of ISI is mitigated by estimating $\mathbf{E}_{\mathrm{s},q}(l)$ from $\hat{D}(n, l-1)$ and $\hat{\Gamma}_q(z, l-1)$ in (13). The received signal with ISI compensation in the frequency domain is denoted by $\bar{\mathbf{Y}}_q(l) = \mathbf{Y}_q(l) - \mathbf{E}_{\mathrm{s},q}(l)$.

#### **A. Key magnitude selection based approach**

Let $q_{\max}(n)$ be the antenna number such that

$$q_{\max}(n) = \arg\max_{q} \left| \hat{H}_q(e^{jn\omega_0}, l-1) + \hat{H}_{\mathrm{c},q}(n, n) \right|. \tag{28}$$
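The selection rule (28) is a per-carrier argmax over the antennas. A minimal NumPy sketch follows, assuming the estimates are stored as arrays of shape $(Q, N)$; the names `kms_select`, `H_hat`, `Hc_diag` and `Y_bar` are hypothetical, the numbers are made up for illustration, and antenna indices are 0-based here while the text counts from 1.

```python
import numpy as np

def kms_select(H_hat, Hc_diag):
    """Return q_max(n) of (28) as 0-based antenna indices, one per carrier.

    H_hat   : (Q, N) complex estimates of H_q(e^{j n w0}, l-1)
    Hc_diag : (Q, N) complex estimates of H_{c,q}(n, n)
    """
    return np.argmax(np.abs(H_hat + Hc_diag), axis=0)

# Two antennas, three carriers (illustrative values only).
H_hat = np.array([[1.0 + 0j, 0.1, 2.0],
                  [0.5, 3.0j, 0.2]])
Hc_diag = np.zeros_like(H_hat)
q_max = kms_select(H_hat, Hc_diag)

# Pick, for every carrier, the ISI-compensated component of the selected antenna.
Y_bar = np.array([[10, 11, 12],
                  [20, 21, 22]], dtype=complex)
Y_kms = Y_bar[q_max, np.arange(Y_bar.shape[1])]
```

When two antennas tie at a carrier, `np.argmax` returns the lower index; the text does not specify a tie-breaking rule.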

By definition, $\left|\hat{H}_{q_{\max}(n)}(e^{jn\omega_0}, l-1) + \hat{H}_{\mathrm{c},q_{\max}(n)}(n, n)\right|$ is the maximum over the $Q$ antennas at the $n$th carrier, so that the strongest orthogonal component is used to estimate the information symbol $D(n, l)$. The influence of ICI and ISI is thereby decreased through this technique of key magnitude selection (KMS). Define a matrix $\mathbf{H}_{\mathrm{kms}}(l)$ such that the $n$th diagonal entry is $\hat{H}_{q_{\max}(n)}(e^{jn\omega_0}, l-1) + \hat{H}_{\mathrm{c},q_{\max}(n)}(n, n)$, while the other entries in the $n$th row are the values of $\hat{H}_{\mathrm{c},q_{\max}(n)}(n, n_1)$ for $n_1 \neq n$. Furthermore, separate $\mathbf{H}_{\mathrm{kms}}(l)$ into the diagonal part $\mathbf{H}_{\mathrm{kms,dg}}(l)$ and the off-diagonal part $\mathbf{H}_{\mathrm{kms,nodg}}(l)$. Then the estimate of $\mathbf{D}(l)$ can be given by

$$\hat{\mathbf{D}}(l) = \mathbf{H}_{\mathrm{kms}}^{-1}(l)\bar{\mathbf{Y}}_{\mathrm{kms}}(l) = \left(\mathbf{H}_{\mathrm{kms,dg}}(l) + \mathbf{H}_{\mathrm{kms,nodg}}(l)\right)^{-1}\bar{\mathbf{Y}}_{\mathrm{kms}}(l), \tag{29}$$

where $\bar{\mathbf{Y}}_{\mathrm{kms}}(l)$ is the corresponding vector of $Y_{q_{\max}(n)}(n, l)$. However, direct computation of the inverse $\mathbf{H}_{\mathrm{kms}}^{-1}(l)$ is time-consuming, so the proposed algorithm uses an approximation of the matrix inverse.

Under key magnitude selection, the magnitude of the diagonal entries in $\mathbf{H}_{\mathrm{kms,dg}}(l)$ is larger than that of the entries of $\mathbf{H}_{\mathrm{kms,nodg}}(l)$. Writing $\mathbf{X} = \mathbf{H}_{\mathrm{kms,dg}}^{-1}(l)\mathbf{H}_{\mathrm{kms,nodg}}(l)$ and truncating the series $(\mathbf{I} + \mathbf{X})^{-1} \approx \mathbf{I} - \mathbf{X} + \mathbf{X}^2 = \mathbf{I} - (\mathbf{I} - \mathbf{X})\mathbf{X}$ after the second-order term, the inverse $\mathbf{H}_{\mathrm{kms}}^{-1}(l)$ can be approximated by

$$\begin{split} \mathbf{H}_{\mathrm{kms}}^{-1}(l) &= \left(\mathbf{H}_{\mathrm{kms,dg}}(l) + \mathbf{H}_{\mathrm{kms,nodg}}(l)\right)^{-1} \\ &= \left(\mathbf{H}_{\mathrm{kms,dg}}(l)\left(\mathbf{I} + \mathbf{H}_{\mathrm{kms,dg}}^{-1}(l)\mathbf{H}_{\mathrm{kms,nodg}}(l)\right)\right)^{-1} \\ &\approx \left(\mathbf{I} - \left(\mathbf{I} - \mathbf{H}_{\mathrm{kms,dg}}^{-1}(l)\mathbf{H}_{\mathrm{kms,nodg}}(l)\right)\mathbf{H}_{\mathrm{kms,dg}}^{-1}(l)\mathbf{H}_{\mathrm{kms,nodg}}(l)\right)\mathbf{H}_{\mathrm{kms,dg}}^{-1}(l), \end{split} \tag{30}$$

where $\mathbf{H}_{\mathrm{kms,dg}}^{-1}(l)$ is just the element-wise reciprocal of the diagonal matrix $\mathbf{H}_{\mathrm{kms,dg}}(l)$. Therefore, the estimate of $\mathbf{D}(l)$ is obtained by multiplications of matrices and vectors only, and the estimate of the information symbol $D(n, l)$ at the $n$th carrier can be determined by hard decision or by other error correction techniques (Glover & Grant, 1998).

#### **B. Maximal ratio combination based approach**

Define a matrix $\mathbf{H}_{\mathrm{mrc}}(l)$ and a vector $\bar{\mathbf{Y}}_{\mathrm{mrc}}(l)$ whose $n$th rows are given by the sums

$$\sum_{q=1}^{Q} \left( H_q(e^{-jn\omega_0}) + H_{\mathrm{c},q}(n, n) \right)^{*} \left( \mathbf{H}_q(n) + \mathbf{H}_{\mathrm{c},q}(n) \right), \tag{31}$$

$$\sum_{q=1}^{Q} \left( H_q(e^{-jn\omega_0}) + H_{\mathrm{c},q}(n, n) \right)^{*} \bar{Y}_q(n, l), \tag{32}$$

where $\mathbf{H}_q(n)$ and $\mathbf{H}_{\mathrm{c},q}(n)$ are the $n$th row vectors of $\mathbf{H}_q$ and $\mathbf{H}_{\mathrm{c},q}$, respectively. If the phase of $H_q(e^{-jn\omega_0}) + H_{\mathrm{c},q}(n, n)$ is close to the true one, then the diagonal part $\mathbf{H}_{\mathrm{mrc,dg}}(l)$ of $\mathbf{H}_{\mathrm{mrc}}(l)$ yields the dominant component of $\bar{\mathbf{Y}}_{\mathrm{mrc}}(l)$ and leads to an effect of maximal ratio combination (MRC) (Burke et al., 2005). Therefore, similarly to (29), $\mathbf{D}(l)$ can be estimated by

$$\hat{\mathbf{D}}(l) = \mathbf{H}_{\mathrm{mrc}}^{-1}(l)\bar{\mathbf{Y}}_{\mathrm{mrc}}(l) = \left(\mathbf{H}_{\mathrm{mrc,dg}}(l) + \mathbf{H}_{\mathrm{mrc,nodg}}(l)\right)^{-1}\bar{\mathbf{Y}}_{\mathrm{mrc}}(l), \tag{33}$$

where the inverse of $\mathbf{H}_{\mathrm{mrc}}(l)$ is calculated by an approximation similar to (30).

Compared with the KMS based approach, the MRC based approach uses the information of all received signals to reduce the influence of the additive noise, whereas its performance depends on the phase accuracy of $\hat{H}_q(e^{-jn\omega_0}) + \hat{H}_{\mathrm{c},q}(n, n)$. A feasible choice is to employ the KMS based approach, which depends less on the phase information, in the first several symbol periods, and then to switch to the MRC based approach after the estimation error has decreased to a low level.

#### **3.2.3 Estimation of leakage error**

The time domain sequence $\hat{d}(k)$ is calculated through the IFFT of $\hat{D}(n, l)$, and then $\hat{d}_{\mathrm{c}}(k, l)$ can be obtained by (16). Using the estimates $\hat{\Gamma}_q(z, l-1)$ and $\hat{d}_{\mathrm{c}}(k, l)$, the values of $\varepsilon_{\mathrm{c},q}(k, l)$ are first estimated by (14) in the time domain. Consequently, $\hat{E}_{\mathrm{c},q}(n, l)$ can be calculated from the FFT of $\varepsilon_{\mathrm{c},q}(k, l)$, and the leakage error $\hat{E}_q(n, l)$ is obtained as $\hat{E}_{\mathrm{s},q}(n, l) + \hat{E}_{\mathrm{c},q}(n, l)$.

#### **3.2.4 Estimation of frequency response function**

The channel model is estimated from the frequency component $Y_q(n, l)$, the symbol estimate $\hat{D}(n, l)$ and the replica of the leakage error $\hat{E}(n, l)$. Consequently, it is important to remove the influence of the noise term in $Y_q(n, l)$ and of the estimation errors of $\hat{D}(n, l)$ and $\hat{E}(n, l)$. Since the phases of the noise and of the estimation errors are usually random, their influence can be mitigated through the smoothing effect of spectral periodograms (Pintelon & Schoukens, 2001). In the iteration of the $l$th symbol period, the spectral periodograms $S_{DD}(e^{jn\omega_0}, l)$ and $S_{DY,q}(e^{jn\omega_0}, l)$ are defined as follows,

$$S_{DD}(e^{jn\omega_0}, l) = \lambda_l S_{DD}(e^{jn\omega_0}, l-1) + \hat{D}^{*}(n, l)\hat{D}(n, l), \tag{34}$$

$$S_{DY,q}(e^{jn\omega_0}, l) = \lambda_l S_{DY,q}(e^{jn\omega_0}, l-1) + \hat{D}^{*}(n, l)\left(Y_q(n, l) - \hat{E}_q(n, l)\right), \tag{35}$$

where $\lambda_l$ is a forgetting factor in the range $0 < \lambda_l < 1$. The effects of noise and estimation errors are reduced as $l$ becomes large. Using the estimates $S_{DD}(e^{jn\omega_0}, l-1)$ and $S_{DY,q}(e^{jn\omega_0}, l-1)$ from the iteration of the $(l-1)$th symbol period, as well as the estimates $\hat{D}(n, l)$, $Y_q(n, l)$ and $\hat{E}_q(n, l)$ in the $l$th iteration, the estimates of (34) and (35) are obtained. Thus the estimate of the frequency response function $\hat{H}_q(e^{jn\omega_0}, l)$ can be given by

$$\hat{H}_q(e^{jn\omega_0}, l) = \frac{S_{DY,q}(e^{jn\omega_0}, l)}{S_{DD}(e^{jn\omega_0}, l)}. \tag{36}$$

Moreover, $\hat{\Gamma}_q(z, l)$ can be updated by (12) using the IFFT of $\hat{H}_q(e^{jn\omega_0}, l)$. Then let $l = l + 1$ for the next iteration.

As mentioned previously, the frequency response function varies remarkably when the interferences have long delay taps. As a result, side lobes often occur in the impulse response $\hat{h}_m$ of the channel model due to the noise, the estimation errors of the information symbols, the leakage error, etc. By setting a threshold between the main lobe and the side lobes, the effect of the side lobes can be reduced to improve the convergence performance of channel identification and the BER performance of symbol estimation (Hamazumi & Imamura, 2000).

#### **3.2.5 Procedure of channel identification**

In the identification algorithm, the estimate $\hat{D}(n, l)$ is first calculated from the received signal compensated by $\hat{E}_{\mathrm{s}}(n, l)$, next the leakage error $\hat{E}(n, l)$ is estimated, and then the channel frequency response is estimated from the spectral periodograms of the transmitted and received signals. The procedure of the proposed algorithm can be summarized as follows.

Step 1. Let the initial values of $\hat{D}(n, 0)$, $S_{DY,q}(e^{jn\omega_0}, 0)$, $S_{DD}(e^{jn\omega_0}, 0)$ be 0. Choose the initial values of $\hat{H}_q(e^{jn\omega_0}, 0)$, $\hat{\Gamma}_q(z, 0)$, and let the iteration number be $l = 1$.

Step 2. Calculate $Y_q(n, l)$ from the received signal $y_q(k)$ within the FFT window $lN_{\mathrm{tx}} \le k < lN_{\mathrm{tx}} + N$ through FFT.

Step 3. Calculate $d_{\mathrm{s}}(k, l)$ by (15), and $E_{\mathrm{s},q}(n, l)$ by FFT of $\varepsilon_{\mathrm{s},q}(k, l)$ in (13).

Step 4. Estimate $\hat{\mathbf{D}}(l)$ by (29) or (33), and determine $\hat{D}(n, l)$ by hard decision or some error correction techniques, respectively.

Step 5. Calculate $d_{\mathrm{c}}(k, l)$ in (16), and furthermore estimate the replica of ICI through FFT of $\hat{\varepsilon}_{\mathrm{c},q}(k, l)$ in (14).

Step 6. Calculate $S_{DD}(e^{jn\omega_0}, l)$ and $S_{DY,q}(e^{jn\omega_0}, l)$ by (34) and (35), respectively.

Step 7. Estimate $\hat{H}_q(e^{jn\omega_0}, l)$ from (36) and update the sub-model $\hat{\Gamma}_q(z, l)$ by (12). Let $l = l + 1$, then return to Step 2 to repeat the iterations.
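The core numerical operations of the procedure, namely the approximate inverse (30) used for symbol estimation and the periodogram recursions (34) to (36), can be sketched in NumPy as follows. This is a minimal illustration for a single antenna, not the authors' implementation: all function and variable names are hypothetical, the leakage replica is taken as given, and the demo assumes a noiseless channel without ICI or ISI.

```python
import numpy as np

def approx_solve(H, y):
    """Approximate H^{-1} y using the second-order expansion (30)."""
    d_inv = 1.0 / np.diag(H)                        # H_dg^{-1}: reciprocal of the diagonal
    X = d_inv[:, None] * (H - np.diag(np.diag(H)))  # X = H_dg^{-1} H_nodg
    z = d_inv * y                                   # H_dg^{-1} y
    t = X @ z
    return z - t + X @ t                            # (I - (I - X) X) z, matrix-vector products only

def update_periodograms(S_dd, S_dy, D_hat, Y, E_hat, lam):
    """Recursions (34) and (35) for one antenna; all arrays have length N."""
    S_dd = lam * S_dd + np.conj(D_hat) * D_hat
    S_dy = lam * S_dy + np.conj(D_hat) * (Y - E_hat)
    return S_dd, S_dy

def estimate_response(S_dd, S_dy):
    """Per-carrier frequency response estimate (36)."""
    return S_dy / S_dd

# Demo: three symbol periods over a fixed, noiseless channel.
rng = np.random.default_rng(0)
N = 8
H_true = rng.normal(size=N) + 1j * rng.normal(size=N)
S_dd = np.zeros(N, dtype=complex)
S_dy = np.zeros(N, dtype=complex)
for l in range(1, 4):
    # Random QPSK-like symbols stand in for the symbol estimates.
    D_hat = rng.choice([-1.0, 1.0], size=N) + 1j * rng.choice([-1.0, 1.0], size=N)
    Y = H_true * D_hat                              # no noise, no leakage
    S_dd, S_dy = update_periodograms(S_dd, S_dy, D_hat, Y, np.zeros(N), lam=0.9)
H_est = estimate_response(S_dd, S_dy)
```

In this noiseless setting `H_est` reproduces `H_true` up to rounding. The error of `approx_solve` relative to the exact solution is third order in the size of $\mathbf{H}_{\mathrm{dg}}^{-1}\mathbf{H}_{\mathrm{nodg}}$, so it is only useful when the matrix is strongly diagonally dominant.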


#### **3.3 Algorithm features**

The features of the proposed algorithm are summarized as follows.

(1) Periodograms smooth the power spectra $S_{DY,q}(e^{jn\omega_0}, l)$ and $S_{DD}(e^{jn\omega_0}, l)$, so that the periodogram based identification algorithm can reduce the estimation error of $\hat{H}_q(e^{jn\omega_0}, l)$, which is caused by the estimation errors of $\hat{D}(n, l)$ and $\hat{E}_q(n, l)$ or by the noise term in $Y_q(n, l)$ (Pintelon & Schoukens, 2001).

(2) Unlike some time-domain algorithms whose performance depends on the total number of parameters to be estimated (Ljung, 1999), the proposed algorithm estimates the frequency response function per carrier from the spectral periodograms and shows good convergence performance even for a long channel impulse response.

(3) By virtue of the diversity of the multiple antennas, the stronger orthogonal component is used in KMS or MRC to estimate the information symbols, which improves the performance of channel identification and symbol estimation significantly compared with the single antenna case.

Note that the purpose of using multiple antennas is only to utilize the stronger orthogonal component at each carrier, so the performance of the proposed algorithm does not depend on the total number of antenna elements as much as that of the conventional spatial equalizers based on antenna diversity (Higuchi & Sasaoka, 2004; Hori et al., 2003). Moreover, the switching between KMS and MRC improves the algorithm performance in low pilot rate or low SNR environments, and the proposed algorithm can easily be combined with error correction techniques to obtain better BER performance.

(4) The forgetting factor is used in the periodogram estimation so that the algorithm can also deal with slowly time-varying channels. A small forgetting factor adapts to quick channel variation and large BER, whereas a large one is used in a low SNR environment for spectral smoothing.

(5) When the interference that exceeds the GI is strong, the convergence of channel identification and the BER performance can be improved by appropriately choosing the side lobe threshold to reduce the influence of the side lobes.

(6) The main computation only requires FFT, matrix multiplication and division of periodograms; therefore, the algorithm has low computational complexity and can easily be utilized in practical applications.

In each iteration, besides the computation of the FFT, the proposed algorithm needs the following calculations: $2(k_M - N_{\mathrm{gi}})^2 q$ multiplications to estimate ICI and ISI, about $N^2$ multiplications and divisions for updating $\mathbf{H}_{\mathrm{kms}}(l)$ or $\mathbf{H}_{\mathrm{mrc}}(l)$, about $2N^2$ multiplications to estimate the information symbols, $(q + 1)N$ multiplications to estimate the periodograms, and $qN$ divisions to estimate the frequency response functions. The main computation thus concentrates on the estimation of the information symbols, whereas the channel identification itself is very simple in the frequency domain. By contrast, in the RLS algorithm, besides the estimation of the information symbols, the recursive identification requires about $O(k_M^2 N_{\mathrm{tx}})$ multiplications for one symbol period. It is clear that the proposed identification algorithm has less complexity than RLS, especially for large $k_M$. Though LMS only requires $O(2k_M N_{\mathrm{tx}})$ multiplications for channel identification, its convergence rate is much slower than that of RLS (Balakrishnan et al., 2003).

#### **3.4 Numerical simulation examples**

3GPP 2.5 MHz OFDM transmission with 16QAM modulation is used in the examples, where the FFT size is $N = 256$ and the GI length is $N_{\mathrm{gi}} = N/4 = 64$ (3GPP, 2006). Moreover, the number of carriers is 256, the total number of receiver antennas is $Q = 2$, and their distance is 1/2 of the wavelength. The noise is assumed to be additive white Gaussian noise.

#### **3.4.1 Example of a simple channel model**

| Wave | Power (dB) | Phase | DOA | Delay tap |
|---|---|---|---|---|
| Direct | 0 | 0 | $\pi/8$ | 0 |
| Interference 1 | -3 | $-\pi/6$ | $-\pi/4$ | $3N_{\mathrm{gi}}$ |
| Interference 2 | -5 | $\pi/4$ | $\pi/3$ | $5N_{\mathrm{gi}}$ |
| Interference 3 | -5 | $-\pi/3$ | $-\pi/6$ | $7N_{\mathrm{gi}}$ |

Table 1. Simulation conditions

Besides the direct wave, there are three multipath interferences in the transmission channel (Higuchi & Sasaoka, 2004; Hori et al., 2003). The coefficients of the waves are listed in Table 1. Let the SNR be 20 dB. The simulation is performed under 6 conditions of pilot rates:

The successive training symbols are available for identification; the pilot rates are 1/2, 1/4, 1/8 and 1/16, respectively; and, as a severe case, only one pilot carrier is known, which removes the ambiguity of channel identification and symbol estimation. At a pilot carrier, the value of $\hat{D}(n, l)$ is given by the corresponding true value of the pilot symbol, while at the other carriers the value of $D(n, l)$ has to be estimated from the received signals. Channel identification is started from the initial values $\hat{H}(e^{jn\omega_0}, 0) = 1$, $\hat{\Gamma}(z, 0) = 0$. For the comparison of estimation errors, the square error of channel identification $Er_H$ defined by

$$Er_H = \frac{\sum_{q} \sum_{n=0}^{N-1} \left| \hat{H}_q(e^{jn\omega_0}) - H_q(e^{jn\omega_0}) \right|^2}{\sum_{q} \sum_{n=0}^{N-1} \left| H_q(e^{jn\omega_0}) \right|^2}, \tag{37}$$

is illustrated in Fig.5(a), and the BER curves of the estimated symbols are illustrated in Fig.5(b). They show that the algorithm works well even when only a few pilot carriers are available. In Fig.5(b), the BER of the estimated symbols decreases to 0 after several iterations, whereas at low pilot rates it is larger than 0.5 in the first iteration because the initial value of channel identification is quite different from the true one. This implies that the algorithm converges even under severe initial conditions. The BER curves in Fig.5(b) are obtained when the channel estimate is used for symbol estimation. It is seen that good BER performance can be guaranteed if the influence of ISI and ICI caused by the long multipath interferences is compensated by the replica of the leakage error.
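As a small illustration of the error measure in (37), the following sketch computes the normalised squared error between an estimated and a true frequency response. It assumes that the sums run over all antennas *q* and all *N* carrier frequencies; the function name `channel_error` and the nested-list data layout are choices made for this example only.

```python
def channel_error(h_est, h_true):
    """Normalised squared error in the spirit of Eq. (37).

    h_est, h_true: one list per antenna q, each holding the complex
    frequency response sampled at the carrier frequencies.
    Assumption: the sums run over all antennas and all carriers.
    """
    num = 0.0
    den = 0.0
    for est_q, true_q in zip(h_est, h_true):
        for e, t in zip(est_q, true_q):
            num += abs(e - t) ** 2   # squared estimation error
            den += abs(t) ** 2       # energy of the true response
    return num / den


if __name__ == "__main__":
    true_resp = [[1 + 0j, 2j]]
    est_resp = [[1 + 0j, 0j]]
    print(channel_error(est_resp, true_resp))
```

A perfect estimate gives an error of zero, and the measure is independent of the overall scale of the true channel because of the normalisation.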

Since the RLS algorithm is often used for channel identification in previous methods, its results are also shown in Fig.5(a) and Fig.5(b) for comparison with the proposed algorithm. They are obtained under the same simulation conditions. In the RLS algorithm, the recursion is performed at every sampling instant to estimate the parameters *hm* in (3) by using the latest samples of *y*(*k*) and *d*(*k*); thus RLS updates the estimates *N*tx = 320 times during one iteration of the proposed algorithm. In Fig.5(a), if several successive training periods are available, i.e., the true values of *d*(*k*) can be used for channel identification directly, RLS yields a small error by using the true *d*(*k*), although its computational load is heavier than that of the proposed algorithm. However, when the training symbols are unavailable, *d̂*(*k*) has to be estimated for channel identification, and the error of *d̂*(*k*) deteriorates the identification performance of RLS. For example, when the pilot rate is 1/2, the convergence of channel identification and symbol estimation using RLS is much slower than that of the algorithm in the frequency domain, and its performance becomes very poor when the pilot rate is 1/4, whereas the proposed algorithm works well since it uses spectral periodograms.

### **3.4.2 Channel identification versus noise**

Fig. 5. Estimation result (average of 30 simulation runs): (a) channel identification error; (b) BER of estimated symbols.

Let the pilot rate be 1/8 and let the SNR vary from 10dB to 40dB, with the other conditions the same as those in Section 3.4.1. *ErH* plotted in Fig.6 shows that the proposed algorithm converges even under low SNR conditions.

Fig. 6. Channel estimation error versus SNR

### **3.4.3 Channel identification versus interference power**

Let the pilot rate be 1/8 and let the power of interference 3 vary from 0dB to −20dB, with the other conditions the same as those in Section 3.4.1. *ErH* and BER versus interference power are illustrated in Fig.7(a) and Fig.7(b), respectively. It is seen that interferences exceeding the GI with high path gain cause severe ICI and ISI, while their information is less susceptible to the side lobes caused by noise, ICI and ISI. Therefore, the channel estimation converges a little faster for an interference path with high gain than for one with low gain, as shown in Fig.7(a), and the convergence of channel estimation helps to compensate for the influence of ICI and ISI. Though there are two strong multipath interferences exceeding the GI, both *ErH* and BER decrease to a considerably low level after only several iterations.

### **3.4.4 Channel identification versus total number of antennas**

Let the pilot rate be 1/8. In order to investigate the influence of *Q* on the algorithm performance, the total number *Q* of antennas is chosen as *Q* = 1, 2, 3 and 4, respectively. The other simulation conditions are the same as those in Section 3.4.1. The curves of *ErH* and BER are plotted in Fig.8. If only antenna 1 is used for symbol estimation, as illustrated in Fig.4, the estimation error of *H*1(*ejnω*<sup>0</sup> ) decreases to a low level after 10 iterations, but the low magnitude of *H*1(*ejnω*<sup>0</sup> ) leads to a weak orthogonal component in the received signal; as a result, the BER remains high for *Q* = 1. On the other hand, both *ErH* and BER for *Q* = 2, 3 and 4 are very low since the strong orthogonal components can be used in symbol estimation, and just 2 antennas yield good performance in this example. It is seen that though the BER is large in the first several iterations, its influence is mitigated in the periodograms, so that channel identification can provide an effective channel model for equalization.

Fig. 7. Estimation error versus interference power: (a) channel estimation error; (b) bit error rate.

Fig. 8. Channel estimation error and BER versus antenna number *Q*

### **3.4.5 Channel identification versus FFT size** *N*

The effect of the FFT size *N* on channel identification is considered in the simulation. Here *N* is chosen as 64, 128, 256 and 512, respectively, and the GI length is *N*gi = *N*/4. Besides the direct path, the delay taps of the 3 interference paths are 3*N*gi/4, 5*N*gi/4 and 7*N*gi/4, respectively. The coefficients of interference power, phase and DOA are given in Table 1, and the other simulation conditions are the same as those in Section 3.4.1. The channel estimation error *ErH* for the 4 FFT sizes is illustrated in Fig. 9.

Following the expressions in (9) and (10), or in (17) and (18), it is seen that the effects of ICI and ISI in the spectral periodograms *SDY*,*q*(*ejnω*<sup>0</sup> , *l*) decrease a little with increasing FFT size *N*. Consequently, the channel estimation error becomes lower for large *N*, since the side lobe effects caused by ICI and ISI are reduced, and the proposed algorithm yields good BER performance under the given simulation conditions.

#### **3.4.6 Identification of time-varying channel**

Let the pilot rate be 1/8, and let the power, phase and DOA of Interference 3 change every 10 symbol periods so that the channel is time-varying. The power profile is shown in Fig.10(a).


Fig. 9. Channel estimation error versus FFT size *N*

Fig. 10. Estimation errors for time-varying channel: (a) power variation at Interference 3 (x: jump points); (b) channel estimation error and BER.

To track the variation of the channel, the forgetting factor *λ<sub>l</sub>* = min{0.075 × 1.05<sup>*l*</sup>, 0.75} is used. In the first several iterations, a small *λ<sub>l</sub>* is selected to mitigate the influence of the high BER of the estimated symbols and the effect of side lobes. As the BER decreases, *λ<sub>l</sub>* is increased gradually to smooth the periodograms. The results of *ErH* and BER are illustrated in Fig.10(b). Though the channel varies quickly, the prompt reduction of errors shows that the proposed algorithm also works well for time-varying channels.

#### **3.4.7 Channel identification of COST 207 model**

Let the pilot rate be 1/8. Assume that the delay profile of the multipath in a hilly area follows the COST 207 model (European Communities, 1989). There are eleven waves with delay time 0 ≤ *km* ≤ 10 and power *e*<sup>−*km*/2.5</sup>, twenty-one waves with delay time 40 ≤ *km* ≤ 60 and power 0.7079*e*<sup>−(*km*−40)/4</sup>, and fifteen waves exceeding the GI with delay time 72 ≤ *km* ≤ 96 and power 0.5623*e*<sup>−(*km*−72)/3.6</sup>, where *m* = 0 denotes the direct wave. The total power of the interferences exceeding the GI is −1.29dB. The DOAs of the multipath waves are generated randomly, and the other conditions are the same as in Section 3.4.1.

As illustrated in Fig.11(a), the received signal suffers from strong multipath interferences; as a result, the BER is about 0.4 without interference compensation. In the simulation, the side lobe threshold is chosen as max{0.1 × 0.98<sup>*l*</sup>, 0.005} to reduce the influence of side lobes. In the first several iterations a large side lobe threshold is selected, and the threshold is then decreased gradually to deal with weak multipath interferences. The corresponding *ErH* and BER of the estimated symbols are shown in Fig.11(b). Though the channel has strong multipath interferences, *ErH* decreases from 1.0 to 0.0025 and the BER decreases from 0.4 to 0 in about 30 iterations.
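The two iteration-dependent schedules quoted in Sections 3.4.6 and 3.4.7 are simple clipped geometric sequences, and a short sketch makes their behaviour easy to inspect. The function names below are hypothetical, and the starting value of the iteration index *l* is not stated in the text, so the index is left to the caller.

```python
def forgetting_factor(l):
    """Forgetting factor min{0.075 * 1.05**l, 0.75} from Section 3.4.6."""
    return min(0.075 * 1.05 ** l, 0.75)


def side_lobe_threshold(l):
    """Side lobe threshold max{0.1 * 0.98**l, 0.005} from Section 3.4.7."""
    return max(0.1 * 0.98 ** l, 0.005)


def first_index_at_bound(schedule, bound, search_limit=10000):
    """Smallest l in [0, search_limit) at which the schedule equals its bound."""
    for l in range(search_limit):
        if schedule(l) == bound:
            return l
    return None


if __name__ == "__main__":
    for l in (0, 10, 20, 50, 100, 200):
        print(l, forgetting_factor(l), side_lobe_threshold(l))
```

With these definitions the forgetting factor grows by 5% per iteration until it is clipped at 0.75, while the threshold shrinks by 2% per iteration until it reaches its floor of 0.005, so the threshold keeps adapting for considerably longer than the forgetting factor.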


#### **4. Identification of channel with limited bandwidth**

In some large-capacity OFDM systems, such as digital terrestrial television broadcasting, the carriers far from the centre of the band do not convey any information data, in order to simplify the design of filters with sharp cut-off performance and to avoid interfering with the adjacent communication channels. The transmitted signal is thus restricted to a specified frequency band. Because the dynamic modes of the transmission channel cannot be excited beyond the signal band, channel identification becomes a very difficult problem (Ljung, 1999).

On the other hand, if the adaptive algorithm for the OFDM system has a feedback element, not only the channel information inside the signal band but also that outside the band is important for system stability and convergence rate. When the delay taps are very short and the width of the outside band is much narrower than the signal band, extrapolation may recover a little information outside the signal band from the information inside the band (Hamazumi et al., 2000). This problem has been discussed in the time domain (Sun & Sano, 2005; Ysebaert et al., 2004); however, those approaches suffer from considerable computational complexity due to nonlinear optimization or operations on high-dimensional data matrices. A frequency domain approach is therefore investigated to decrease the computational complexity (Sun & Sano, 2007).

#### **4.1 Signal bandwidth**

Fig. 11. Estimation errors (average of 30 simulation runs): (a) relative power of interferences; (b) channel estimation error and BER of estimated symbols.

In the *l*th transmission symbol period, the symbol *D*(*n*, *l*) of OFDM signal with limited bandwidth is as follows.

$$D(n,l) = \begin{cases} \text{Symbol } \text{ data}(\neq 0), \text{ for } |n| \le N\_1\\ 0, & \text{for } |n| > N\_1 \end{cases} \tag{38}$$

where *N*<sup>1</sup> *< N*/2. It can be seen that the carriers whose distance from the central carrier is more than *N*<sup>1</sup> do not carry any information data; hence the spectrum of the transmitted signal *d*(*k*) in an effective symbol period, i.e. *lN*tx ≤ *k < lN*tx + *N*, is limited to |*n*| ≤ *N*1, whereas the spectral density outside the signal band, i.e. for |*n*| *> N*1, becomes 0. This implies that *d*(*k*) and its corresponding received signal *y*(*k*) for *lN*tx ≤ *k < lN*tx + *N* hold little information about the channel dynamics outside the signal band.

#### **4.2 Fourier analysis of specified signals**

Two signals are constructed from the original transmitted and received signals as follows:

$$u(k) = d(k - N\_{\rm gi}) - d(k - N\_{\rm gi} - N), \tag{39}$$

$$x(k) = y(k - N\_{\rm gi}) - y(k - N\_{\rm gi} - N). \tag{40}$$

As an example, the signal *u*(*k*) is illustrated in Fig.12, where *K* is an integer satisfying *N*gi *< K* + *N*gi ≤ *N*, e.g., *K* = *N*/2. Moreover, the expressions of *u*(*k*), *x*(*k*) and their relation can also be constructed similarly as *d*(*k*) and *y*(*k*). From the feature of guard interval, *u*(*k*) = 0 holds for *lN*tx − *N*gi ≤ *k < lN*tx. Then substituting (1) into the expression of *u*(*k*) yields that

$$u(k) = \sum\_{n=-N\_1}^{N\_1} D(n,l)e^{jn\omega\_0(k - lN\_{\rm tx} - N\_{\rm gi})} - \sum\_{n=-N\_1}^{N\_1} D(n,l-1)e^{jn\omega\_0(k - lN\_{\rm tx})}$$


$$\phantom{u(k)} = \sum\_{n=-N\_1}^{N\_1} \left( D(n, l)e^{-jn\omega\_0 N\_{\rm gi}} - D(n, l-1) \right) e^{jn\omega\_0 (k - lN\_{\rm tx})}, \quad \text{for } lN\_{\rm tx} \le k < K + lN\_{\rm tx}. \tag{41}$$
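The construction of *u*(*k*) in (39) and its closed form (41) can be checked numerically. The sketch below is only an illustration under stated assumptions: the transmitted signal is taken as an unnormalised sum of carriers, the guard interval is a cyclic prefix placed before each effective symbol period, the symbols are random QPSK values, and the sizes `N`, `NGI`, `N1` and `K` are small illustrative numbers rather than the values used in the simulations.

```python
import cmath
import math
import random

N, NGI, N1, K = 16, 4, 5, 8        # illustrative sizes with NGI < K + NGI <= N
NTX = N + NGI                      # samples per transmitted symbol period
W0 = 2 * math.pi / N
NUM_SYMBOLS = 3

random.seed(0)
QPSK = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
# D[l][n] holds the symbol on carrier n in period l; carriers with |n| > N1 are zero.
D = [{n: random.choice(QPSK) for n in range(-N1, N1 + 1)}
     for _ in range(NUM_SYMBOLS)]


def d(k):
    """Transmitted sample; period l covers l*NTX - NGI <= k < l*NTX + N."""
    l = (k + NGI) // NTX
    if not 0 <= l < NUM_SYMBOLS:
        return 0j
    # Periodicity in k with period N makes the guard interval a cyclic prefix.
    return sum(D[l][n] * cmath.exp(1j * n * W0 * (k - l * NTX)) for n in D[l])


def u(k):
    """Difference signal of Eq. (39)."""
    return d(k - NGI) - d(k - NGI - N)


def u_closed_form(k, l):
    """Right-hand side of Eq. (41), valid for l*NTX <= k < l*NTX + K."""
    return sum((D[l][n] * cmath.exp(-1j * n * W0 * NGI) - D[l - 1][n])
               * cmath.exp(1j * n * W0 * (k - l * NTX)) for n in D[l])


if __name__ == "__main__":
    l = 1
    gap = max(abs(u(k)) for k in range(l * NTX - NGI, l * NTX))
    err = max(abs(u(k) - u_closed_form(k, l)) for k in range(l * NTX, l * NTX + K))
    print("max |u(k)| in the zero interval:", gap)
    print("max deviation from (41):", err)
```

Both printed quantities should be at the level of floating-point rounding, which reflects the two properties used in the text: *u*(*k*) vanishes for *lN*tx − *N*gi ≤ *k < lN*tx, and it mixes the symbols of two consecutive periods inside the analysis window.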

Fig. 12. Illustration of signal *u*(*k*)

Next consider the signal *x*(*k*). Let its component corresponding to interference *m* be denoted by *xm*(*k*). Omitting the noise term for simplicity of notation, *xm*(*k*) and *x*(*k*) can be expressed by

$$x\_m(k) = h\_m u(k - k\_m), \quad x(k) = \sum\_{m=0}^{M} x\_m(k). \tag{42}$$

Moreover, in the *l*th symbol period, *xm*(*k*) becomes

$$x\_m(k) = \begin{cases} h\_m \sum\_{n=-N\_1}^{N\_1} \left( D(n,l)e^{-j n \omega\_0 N\_{\rm gi}} - D(n,l-1) \right) e^{j n \omega\_0 (k - l N\_{\rm tx} - k\_m)}, & \text{for } k\_m \le k - l N\_{\rm tx} < K \\ 0, & \text{for } 0 \le k - l N\_{\rm tx} < k\_m \end{cases} \tag{43}$$

On the other hand, let the Fourier transform of *u*(*k*) in *lN*tx ≤ *k < K* + *lN*tx be given by

$$U(n,l) = \sum\_{k=lN\_{\rm tx}}^{K+lN\_{\rm tx}-1} u(k) e^{-jn\omega\_0(k-lN\_{\rm tx})} \tag{44}$$

for *n* = −*N*/2 + 1, ··· , *N*/2, and let the Fourier transform *X*(*n*, *l*) of *x*(*k*) be defined similarly. Then the frequency properties of the signals *u*(*k*) and *x*(*k*) satisfy the following equation:

$$H(e^{j n \omega\_0}) U(n, l) = X(n, l) + \sum\_{m=0}^{M} h\_m \left( E\_{m, 1}(n, l) + E\_{m, 2}(n, l) \right) \tag{45}$$

where *Em*,1(*n*, *l*) and *Em*,2(*n*, *l*) are the leakage terms at the *n*th frequency point from the other frequency components and from the *n*th frequency component itself, respectively. Their theoretical representations are expressed by


$$E\_{m,1}(n,l) = e^{-j n \omega\_0 k\_m} \sum\_{\substack{\bar{n} = -N\_1 \\ \bar{n} \neq n}}^{N\_1} \left( D(\bar{n}, l) e^{-j \bar{n} \omega\_0 N\_{\rm gi}} - D(\bar{n}, l - 1) \right) \cdot \frac{e^{-j(n - \bar{n}) \omega\_0 (K - k\_m)} - e^{-j(n - \bar{n}) \omega\_0 K}}{1 - e^{-j(n - \bar{n}) \omega\_0}},\tag{46}$$

$$E\_{m,2}(n,l) = k\_m e^{-j n \omega\_0 k\_m} \left( D(n,l)e^{-j n \omega\_0 N\_{\rm gi}} - D(n,l-1) \right). \tag{47}$$
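The relation (45) with the leakage terms (46) and (47) can be verified numerically on a single analysis window. The following sketch is an illustration under several assumptions that are not taken from the text: the channel response is written as the sum of *hm* exp(−*jnω*0*km*) over the paths, every delay is no longer than the guard interval so that *u*(*k*) is zero just before the window, the coefficients `A[n]` are random stand-ins for *D*(*n*, *l*) exp(−*jnω*0*N*gi) − *D*(*n*, *l* − 1), and the self-leakage term carries the phase exp(−*jnω*0*km*), consistent with (51). Time is measured relative to the start of the window.

```python
import cmath
import math
import random

N, N1, K = 16, 5, 8                 # illustrative sizes
W0 = 2 * math.pi / N
DELAYS = [0, 2, 3]                  # k_m, assumed not to exceed the guard interval
GAINS = [1.0, 0.5 - 0.2j, 0.3j]     # h_m, arbitrary example values

random.seed(1)
# A[n] stands for D(n,l)*exp(-j*n*W0*Ngi) - D(n,l-1) on the active carriers.
A = {n: complex(random.gauss(0, 1), random.gauss(0, 1)) for n in range(-N1, N1 + 1)}


def u(k):
    """u(k) relative to the window start; zero before the window (guard interval)."""
    if k < 0:
        return 0j
    return sum(a * cmath.exp(1j * n * W0 * k) for n, a in A.items())


def x(k):
    """Noise-free received counterpart, Eq. (42)."""
    return sum(h * u(k - km) for h, km in zip(GAINS, DELAYS))


def window_dft(signal, n):
    """K-sample Fourier transform as in Eq. (44)."""
    return sum(signal(k) * cmath.exp(-1j * n * W0 * k) for k in range(K))


def channel(n):
    return sum(h * cmath.exp(-1j * n * W0 * km) for h, km in zip(GAINS, DELAYS))


def leak_other(n, km):
    """Leakage from the other carriers, Eq. (46)."""
    total = 0j
    for nb, a in A.items():
        if nb == n:
            continue
        delta = (n - nb) * W0
        total += a * (cmath.exp(-1j * delta * (K - km)) - cmath.exp(-1j * delta * K)) \
            / (1 - cmath.exp(-1j * delta))
    return cmath.exp(-1j * n * W0 * km) * total


def leak_self(n, km):
    """Leakage from carrier n itself, Eq. (47); zero outside the signal band."""
    return km * cmath.exp(-1j * n * W0 * km) * A.get(n, 0j)


def residual(n):
    """Left-hand side minus right-hand side of Eq. (45)."""
    leak = sum(h * (leak_other(n, km) + leak_self(n, km))
               for h, km in zip(GAINS, DELAYS))
    return channel(n) * window_dft(u, n) - window_dft(x, n) - leak


if __name__ == "__main__":
    print(max(abs(residual(n)) for n in range(-N // 2 + 1, N // 2 + 1)))
```

The printed residual should be at rounding level for every frequency index, including those outside the signal band, which is the property that the identification algorithm later relies on.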

#### **4.3 Spectra estimation**

Let the power spectrum of *u*(*k*) be estimated by

$$S\_{UU}(n,l) = \frac{1}{l} \sum\_{l\_1=1}^{l} U^\*(n,l\_1) U(n,l\_1),\tag{48}$$

and *SUEm*,1 (*n*, *l*) and *SUEm*,2 (*n*, *l*) are defined in the similar formula. Then *SUU*(*n*, *l*) can be approximated as

$$S\_{UU}(n,l) \approx 2\overline{D}^2 \sum\_{\substack{\bar{n}=-N\_1\\ \bar{n}\neq n}}^{N\_1} \frac{1-\cos(n-\bar{n})\omega\_0 K}{1-\cos(n-\bar{n})\omega\_0} + 2K^2\overline{D}^2$$

for |*n*| ≤ *N*<sup>1</sup> when *l* is large enough, and

$$S\_{UU}(n,l) \approx 2\overline{D}^2 \sum\_{\bar{n}=-N\_1}^{N\_1} \frac{1-\cos(n-\bar{n})\omega\_0 K}{1-\cos(n-\bar{n})\omega\_0} \tag{49}$$

for *N*<sup>1</sup> *<* |*n*| *< N*/2. Meanwhile, *SUEm*,1 (*n*, *l*) satisfies

$$S\_{UE\_{m,1}}(n,l) \approx \overline{D}^2 \sum\_{\substack{\bar{n} = -N\_1 \\ \bar{n} \neq n}}^{N\_1} \left( \frac{1 - e^{-j(n-\bar{n})\omega\_0 K}}{1 - \cos(n-\bar{n})\omega\_0} \left( e^{-jn\omega\_0 k\_m} - e^{-j\bar{n}\omega\_0 k\_m} \right) \right). \tag{50}$$

Furthermore, the spectral leakage error *SUEm*,2 (*n*, *l*) for |*n*| ≤ *N*<sup>1</sup> is

$$S\_{UE\_{m,2}}(n,l) \approx 2e^{-jn\omega\_0 k\_m} \overline{D}^2 K k\_m, \tag{51}$$

while for *N*<sup>1</sup> *<* |*n*| *< N*/2 it turns to

$$S\_{UE\_{m,2}}(n,l) = 0.\tag{52}$$

Following (45), the relation between spectra of *u*(*k*), *x*(*k*) and the frequency property *H*(*ejnω*<sup>0</sup> ) is summarized in (53):

$$H(e^{j n \omega\_0}) S\_{UU}(n, l) \approx S\_{UX}(n, l) + \sum\_{m=0}^{M} h\_m \left( S\_{UE\_{m,1}}(n, l) + S\_{UE\_{m,2}}(n, l) \right). \tag{53}$$
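The approximation of *SUU*(*n*, *l*) inside and outside the signal band can be compared with a simple Monte Carlo average in the sense of (48). The sketch below is a rough check under assumptions made only for this example: the symbols are independent unit-power QPSK values, so that the mean symbol power is taken as 1, and the sizes are small illustrative numbers. The helper names are hypothetical.

```python
import cmath
import math
import random

N, NGI, N1, K = 16, 4, 5, 8         # illustrative sizes
W0 = 2 * math.pi / N
CARRIERS = range(-N1, N1 + 1)
# Unit-power QPSK, so the mean symbol power is 1 (assumption of this sketch).
QPSK = [complex(a, b) / math.sqrt(2) for a in (1, -1) for b in (1, -1)]


def kernel(diff):
    """Sum of exp(-j*diff*W0*k) over the K-sample analysis window."""
    return sum(cmath.exp(-1j * diff * W0 * k) for k in range(K))


def s_uu_theory(n, mean_power=1.0):
    """Approximation of S_UU: in-band form when |n| <= N1, Eq. (49) otherwise."""
    total = 0.0
    for nb in CARRIERS:
        if nb == n:
            total += K ** 2          # self term, present only inside the band
        else:
            delta = (n - nb) * W0
            total += (1 - math.cos(delta * K)) / (1 - math.cos(delta))
    return 2 * mean_power * total


def s_uu_simulated(n, num_symbols=4000, seed=0):
    """Average of |U(n,l)|^2 over many symbol periods, as in Eq. (48)."""
    rng = random.Random(seed)
    weights = {nb: kernel(n - nb) for nb in CARRIERS}
    prev = {nb: rng.choice(QPSK) for nb in CARRIERS}
    acc = 0.0
    for _ in range(num_symbols):
        cur = {nb: rng.choice(QPSK) for nb in CARRIERS}
        big_u = sum((cur[nb] * cmath.exp(-1j * nb * W0 * NGI) - prev[nb]) * weights[nb]
                    for nb in CARRIERS)
        acc += abs(big_u) ** 2
        prev = cur
    return acc / num_symbols


if __name__ == "__main__":
    for n in (0, 7):                 # one in-band and one out-of-band index
        print(n, s_uu_theory(n), s_uu_simulated(n))
```

The out-of-band value is much smaller than the in-band one but not zero, which illustrates why *u*(*k*) still carries usable information outside the signal band even though *d*(*k*) does not.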


Using the symbol estimates of *D*(*n*, *l*), the signals *d*(*k*) and *u*(*k*) are estimated. Then *S*<sub>*UU*</sub>(*n*, *l*) and *S*<sub>*UX*</sub>(*n*, *l*) in (53) can be estimated directly from *u*(*k*) and *x*(*k*). Moreover, from (50)-(52), the spectral leakage error terms *S*<sub>*UE*<sub>*m*,1</sub></sub>(*n*, *l*) and *S*<sub>*UE*<sub>*m*,2</sub></sub>(*n*, *l*) can be calculated beforehand, without using the observation data or any information about the channel dynamics. Consequently, the channel property outside the signal band can be estimated from *u*(*k*) and *x*(*k*) if *S*<sub>*UX*</sub>(*n*, *l*) is compensated by these two leakage terms.

When the channel is time-varying, a forgetting factor *λ* can be used to estimate *S*<sub>*UU*</sub>(*n*, *l*) and *S*<sub>*UX*</sub>(*n*, *l*) by

$$S\_{UU}(n,l) = \lambda S\_{UU}(n,l-1) + U^\*(n,l)U(n,l),\tag{54}$$

$$S\_{UX}(n,l) = \lambda S\_{UX}(n,l-1) + U^\*(n,l)X(n,l), \tag{55}$$

respectively, where 0 *< λ <* 1.
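The recursions (54) and (55) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function name `update_spectra`, the toy channel `H`, the FFT length and the value *λ* = 0.9 are all assumptions made for the example, and the leakage terms of (53) are ignored, so that in this noise-free toy case the ratio of the two spectra reproduces the channel exactly.

```python
import numpy as np

def update_spectra(S_uu, S_ux, U, X, lam):
    """One symbol-period update of the recursions (54)-(55)."""
    S_uu = lam * S_uu + np.conj(U) * U   # (54): auto-spectrum of u
    S_ux = lam * S_ux + np.conj(U) * X   # (55): cross-spectrum of u and x
    return S_uu, S_ux

rng = np.random.default_rng(0)
N, lam = 8, 0.9                          # hypothetical FFT length and forgetting factor
H = rng.standard_normal(N) + 1j * rng.standard_normal(N)   # toy channel response

S_uu = np.zeros(N, dtype=complex)
S_ux = np.zeros(N, dtype=complex)
for l in range(20):                      # 20 symbol periods
    U = rng.standard_normal(N) + 1j * rng.standard_normal(N)
    X = H * U                            # leakage-free, noise-free observation
    S_uu, S_ux = update_spectra(S_uu, S_ux, U, X, lam)

H_ratio = S_ux / S_uu                    # equals H in this idealized case
```

A smaller *λ* discounts old symbol periods faster, which tracks a changing channel more quickly at the cost of noisier spectra.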

#### **4.4 Channel identification algorithm**

Following (53), the estimate of the channel model can be derived as

$$\hat{H}(e^{j n \omega\_0}) = \frac{S\_{UX}(n, l)}{S\_{UU}(n, l)} + \sum\_{m=0}^{M} \hat{h}\_m \frac{S\_{UE\_{m,1}}(n, l) + S\_{UE\_{m,2}}(n, l)}{S\_{UU}(n, l)}.\tag{56}$$

Noticing that the estimates *ĥ*<sub>*m*</sub> are the coefficients of the IFFT of *Ĥ*(*e*<sup>*jnω*<sub>0</sub></sup>), the estimation can be performed in the following iterative form:

$$\hat{H}^{(i+1)}(e^{j n \omega\_0}) = \frac{S\_{UX}(n, l)}{S\_{UU}(n, l)} + \sum\_{m=0}^{M} \hat{h}\_m^{(i)} \frac{S\_{UE\_{m,1}}(n, l) + S\_{UE\_{m,2}}(n, l)}{S\_{UU}(n, l)},\tag{57}$$

where *i* is the iteration number, and *ĥ*<sub>*m*</sub><sup>(0)</sup> can be chosen as the IFFT coefficients of *S*<sub>*UX*</sub>(*n*, *l*)/*S*<sub>*UU*</sub>(*n*, *l*). When the channel dynamics do not vary too fast, the estimation can also be performed recursively using the estimates *ĥ*<sub>*m*</sub><sup>(*l*−1)</sup> obtained in the previous symbol period:

$$\hat{H}^{(l)}(e^{j n \omega\_0}) = \frac{S\_{UX}(n, l)}{S\_{UU}(n, l)} + \sum\_{m=0}^{M} \hat{h}\_m^{(l-1)} \frac{S\_{UE\_{m,1}}(n, l) + S\_{UE\_{m,2}}(n, l)}{S\_{UU}(n, l)}.\tag{58}$$

The main numerical computations in the identification algorithm are the FFTs used to estimate the signal *u*(*k*) and the power spectra *S*<sub>*UU*</sub>(*n*, *l*) and *S*<sub>*UX*</sub>(*n*, *l*); the division of *S*<sub>*UE*<sub>*m*,1</sub></sub>(*n*, *l*) + *S*<sub>*UE*<sub>*m*,2</sub></sub>(*n*, *l*) by *S*<sub>*UU*</sub>(*n*, *l*), where the two leakage error terms can be pre-calculated; the division of *S*<sub>*UX*</sub>(*n*, *l*) by *S*<sub>*UU*</sub>(*n*, *l*); and the IFFT of *Ĥ*(*e*<sup>*jnω*<sub>0</sub></sup>) to calculate *ĥ*<sub>*m*</sub>. Furthermore, the computational complexity does not increase much even when the interference delay taps get longer. The identification algorithm can therefore be implemented easily and combined with other adaptive processing techniques.
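The iteration (57) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the chapter's code: the leakage terms of (50)-(52) are not reproduced here, so the array `leak` (holding the sum of the two leakage terms for each tap *m*) is filled with small random placeholder values, and `S_ux` is constructed so that (53) holds exactly for a known toy channel. The function name `iterate_channel` and all numerical values are hypothetical.

```python
import numpy as np

def iterate_channel(S_ux, S_uu, leak, n_iter=50):
    """Iterate (57). leak has shape (M+1, N): row m is S_UE_{m,1} + S_UE_{m,2}."""
    n_taps = leak.shape[0]
    H = S_ux / S_uu                          # initial estimate, gives h^(0)
    for _ in range(n_iter):
        h = np.fft.ifft(H)[:n_taps]          # taps h_m^(i): IFFT coefficients of H^(i)
        H = (S_ux + (h[:, None] * leak).sum(axis=0)) / S_uu
    return H

rng = np.random.default_rng(1)
N, M = 16, 3                                 # toy sizes, far smaller than in the chapter
h_true = rng.standard_normal(M + 1) + 1j * rng.standard_normal(M + 1)
H_true = np.fft.fft(h_true, N)               # frequency response of the toy channel

S_uu = 1.0 + rng.random(N)                   # placeholder auto-spectrum, strictly positive
leak = 0.01 * (rng.standard_normal((M + 1, N))
               + 1j * rng.standard_normal((M + 1, N)))   # placeholder leakage terms
# Build S_ux so that (53) holds with equality for the true channel.
S_ux = H_true * S_uu - (h_true[:, None] * leak).sum(axis=0)

H_est = iterate_channel(S_ux, S_uu, leak)
```

In this toy setting the leakage terms are small relative to *S*<sub>*UU*</sub>, so each pass shrinks the error and the fixed point of the iteration is the true response; the sketch says nothing about convergence for the actual leakage terms.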

On the other hand, *Ĥ*(*e*<sup>*jnω*<sub>0</sub></sup>) inside the signal band can also be obtained from the spectra *S*<sub>*DY*</sub>(*n*, *l*) and *S*<sub>*DD*</sub>(*n*, *l*):

$$\hat{H}(e^{j n \omega\_0}) = \frac{S\_{DY}(n, l)}{S\_{DD}(n, l)}, \text{ for } \left| n \right| \le N\_1, \tag{59}$$

where *S*<sub>*DY*</sub>(*n*, *l*) and *S*<sub>*DD*</sub>(*n*, *l*) are calculated from the received signal *y*(*k*) and the symbol estimates *D̂*(*n*, *l*), without using *S*<sub>*UE*<sub>*m*,1</sub></sub>(*n*, *l*) + *S*<sub>*UE*<sub>*m*,2</sub></sub>(*n*, *l*). Furthermore, if the multipath interference is not too long, channel information can be interpolated from the pilot carriers to their adjacent carriers for channel identification inside the signal band, which may reduce the influence of estimation errors in the information symbols *D*(*n*, *l*).
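A minimal sketch of the in-band estimate (59) combined with pilot interpolation is given below. The text does not specify the interpolation method, so linear interpolation of the real and imaginary parts is an assumption here, as are the toy channel (chosen linear in *n* so that the interpolation is exact), the QPSK-like symbols used in place of 64QAM, and the pilot spacing of 6 carriers.

```python
import numpy as np

N1 = 12                                      # toy band edge (the chapter uses N1 = 600)
n = np.arange(-N1, N1 + 1)                   # in-band carrier indices |n| <= N1
H_true = (1.0 + 0.01 * n) + 1j * (0.5 - 0.02 * n)   # toy channel, linear in n

rng = np.random.default_rng(2)
D = rng.choice([-1.0, 1.0], n.size) + 1j * rng.choice([-1.0, 1.0], n.size)
Y = H_true * D                               # noise-free received symbols

pilots = np.arange(0, n.size, 6)             # hypothetical pilot positions
S_dd = np.conj(D[pilots]) * D[pilots]        # auto-spectrum at the pilot carriers
S_dy = np.conj(D[pilots]) * Y[pilots]        # cross-spectrum at the pilot carriers
H_pilot = S_dy / S_dd                        # estimate (59), pilots only

# Interpolate real and imaginary parts separately to all in-band carriers.
H_band = (np.interp(n, n[pilots], H_pilot.real)
          + 1j * np.interp(n, n[pilots], H_pilot.imag))
```

Because only the known pilot symbols enter the estimate, errors in the detected information symbols do not affect it; the price is that the channel must vary slowly between pilots.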

#### **4.5 Numerical simulation examples**


In the simulation, the OFDM information symbols *D*(*n*, *l*) are 64QAM, the FFT/IFFT length is *N* = 2048, the guard interval is *N*<sub>gi</sub> = *N*/4, and *N*<sub>1</sub> = 600. The number of active carriers is therefore 1201, and the signal band is only about 3/5 of the full bandwidth, so identification of such an OFDM channel is a very difficult problem. There are 6 symbol transmission periods per signal frame, and 200 scattered pilot carriers are distributed uniformly in the first and fourth symbol periods (3GPP, 2006). *K* is chosen as *K* = *N*/2, and the SNR is 15 dB.
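As a quick check of these figures, the following worked calculation uses only the stated parameters, assuming the active carriers are those with indices |*n*| ≤ *N*<sub>1</sub> as in (59):

$$2N\_1 + 1 = 2 \cdot 600 + 1 = 1201, \qquad \frac{1201}{2048} \approx 0.586 \approx \frac{3}{5}, \qquad N\_{\mathrm{gi}} = \frac{2048}{4} = 512.$$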

As shown in Fig. 13, the signal *u*(*k*) has a spectral power density of 10<sup>−3</sup> outside the signal band. In contrast to the original spectrum of *d*(*k*), whose magnitude is 0 at the carriers with |*n*| > *N*<sub>1</sub>, information over the entire frequency band can be extracted from *u*(*k*), although the spectral power density outside the band is lower than inside it. This implies that it is possible to identify the channel dynamics even outside the signal band.

Fig. 13. Spectrum of *u*(*k*)

The estimate is compared with the true frequency property *H*(*e*<sup>*jnω*<sub>0</sub></sup>), where the longest effective delay tap *k*<sub>*M*</sub> = 100 is used in channel estimation. The estimates after 50 iterations are plotted in Fig. 14, which illustrates that the estimate is very close to the true one even outside the signal band, although the transmitted signal *d*(*k*) is severely band-limited. For comparison, the channel is also identified by conventional methods using RLS and LMS; the results show that although both RLS and LMS estimate the channel property inside the signal band, they cannot provide satisfactory identification outside it.

Fig. 14. Frequency property of communication channel

The estimation errors outside the signal band under various noise environments are illustrated in Fig. 15, where the error is evaluated by

$$E\_{H,\mathrm{out}} = \frac{\sum\_{N\_1 < |n| < N/2} \left| \hat{H}(e^{j n \omega\_0}) - H(e^{j n \omega\_0}) \right|^2}{\sum\_{N\_1 < |n| < N/2} \left| H(e^{j n \omega\_0}) \right|^2}. \tag{60}$$

Fig. 15. Estimation error versus different SNR

It can be seen that even for low SNR, the estimation error is reduced to a low level within a few tens of iterations, and the estimated channel model can be applied to design adaptive filters for the OFDM system.

### **5. Conclusions**

Channel identification using the Fourier transform has been studied in this chapter. Since OFDM transmitted signals in baseband are generated through the discrete Fourier transform, both the transmitted and received signals are easily handled in the frequency domain. Consequently, the identification problem of OFDM channels can be solved in the frequency domain. Two cases in which conventional time-domain methods cannot offer effective estimation have been investigated: a channel with long multipath interference, and transmitted signals with severe band limitation. First, the properties of OFDM signals and the structural information have been analyzed in the frequency domain; then the relations between the channel model and the available information extracted from the observation data and the structural information have been derived. Based on these relations, frequency-domain algorithms for OFDM channel identification have been developed, and techniques have been investigated to improve the identification accuracy, to deal with time-varying channels and to reduce the computational complexity. It has been illustrated that the proposed frequency-domain algorithms perform better than the conventional time-domain methods under the severe identification conditions considered in these problems, and the numerical results have demonstrated the effectiveness of the Fourier transform in channel identification applications. Algorithms for MIMO OFDM systems and estimation for OFDM channels with frequency offset are the subject of further research.

### **6. References**

3GPP (2006). 3GPP TR 25.814 (release 7), *Technical report*, 3rd Generation Partnership Project. URL: *www.3gpp.org/ftp/Specs/html-info/TSG-WG–r1.htm*

Balakrishnan, J., Martin, R. & Johson, J. C. (2003). Blind, adaptive channel shortening by sum-squared auto-correlation minimization (SAM), *IEEE Trans. Signal Processing* 51(12): 3086–3093.

Burke, J., Zeidler, J. & Rao, B. (2005). CINR difference analysis of optimal combining versus maximal ratio combining, *IEEE Trans. Wireless Communications* 4(1): 1–5.

Chi, C., Feng, C., Chen, C. & Chen, C. (2006). *Blind Equalization and System Identification*, Springer.

Coleri, S., Ergen, M. & Bahai, A. (2002). Channel estimation techniques based on pilot arrangement in OFDM systems, *IEEE Trans. Broadcasting* 48(3): 223–229.

Ding, L., Zhou, T., Morgan, D., Ma, Z., Kenney, J., Kim, J. & Giardina, D. (2004). A robust digital baseband predistortion constructed using memory polynomials, *IEEE Trans. Commun.* pp. 159–165.

Ding, Z. & Li, Y. (2001). *Blind Equalization and Identification*, Marcel Dekker, Inc.

European Communities (1989). COST 207, digital land mobile radio communications, final report, *Technical report*, Commission of the European Communities.

Giannakis, G., Hua, Y., Stoica, P. & Tong, L. (2000). *Signal Processing Advances in Wireless and Mobile Communications,* Vol.1*: Trends in Channel Estimation and Equalization*, Prentice-Hall, Englewood Cliffs, NJ.


Glover, I. & Grant, P. (1998). *Digital Communications*, Prentice Hall.

Hamazumi, H., Imamura, K., Iai, N., Shibuya, K. & Sasaki, M. (2000). A study of a loop interference canceller for the relay stations in an SFN for digital terrestrial broadcasting, *GLOBECOM'00: IEEE Global Telecommunications Conference*, San Francisco, pp. 167–171.

Hamazumi, T. & Imamura, K. (2000). Coupling canceller, *Technical report*, NHK Science & Technology Research Laboratories.

Haykin, S. (2001). *Adaptive Filter Theory*, 4th edn, Prentice Hall.

Higuchi, K. & Sasaoka, H. (2004). Adaptive array suppressing inter symbol interference based on frequency spectrum in OFDM systems, *IEICE Trans. Commun.* J87-B: 1222–1229.

Hori, S., Kikuma, N. & Inagaki, N. (2003). MMSE adaptive array suppressing only multipath waves with delay times beyond the guard interval for fixed reception in the OFDM systems, *IEICE Trans. Commun.* J86-B: 1934–1940.

Koiveunen, V., Enescu, M. & Sirbu, M. (2004). Blind and semiblind channel estimation, *in* K. E. Barner & G. R. Arce (eds), *Nonlinear Signal Processing and Image Processing*, CRC Press, pp. 257–332.

Ljung, L. (1999). *System Identification – Theory for the User*, Prentice Hall, Upper Saddle River, NJ.

Muquet, M., Courville, M. & Duhamel, P. (2002). Subspace-based blind and semiblind channel estimation for OFDM systems, *IEEE Trans. Signal Processing* 50(7): 1699–1712.

Nguyen, V., Winkler, M., Hansen, C. & Kuchenbecker, H. (2003). Channel estimation for OFDM systems in case of insufficient guard interval length, *Proc. 15th International Conference on Wireless Communications*, Alberta, Canada.

Pintelon, R. & Schoukens, J. (2001). *System Identification–A Frequency Domain Approach*, IEEE Press.

Shibuya, K. (2006). Broadcast-wave relay technology for digital terrestrial television broadcasting, *Proceedings of the IEEE* 94(1): 269–273.

Sun, L. & Sano, A. (2005). Channel identification for SFN relay station with coupling wave in OFDM systems, *IEICE Trans. Fundamental* J88-A(9): 1045–1054.

Sun, L. & Sano, A. (2007). Channel identification and applications to OFDM communication systems with limited bandwidth, *Proc. 15th European Digital Signal Processing Conference*, Poznan, Poland.

Sun, L., Sano, A., Sun, W. & Kajiwara, H. (2009). Channel identification and interference compensation for OFDM system in long multipath environment, *Signal Processing* 89: 1589–1601.

Suzuki, N., Uehara, H. & Yokoyama, M. (2002). A new OFDM demodulation method with variable-length effective symbol and ICI canceller, *IEICE Trans. Fundamental* E85-A(12): 2859–2867.

Wang, X. & Poor, H. (2003). *Wireless Communication Systems- Advanced Techniques for Signal Reception*, Prentice Hall.

Ysebaert, G., Pisoni, F., Bonavetura, M., Hug, R. & Moonen, M. (2004). Echo cancellation in DMT-receivers: Circulant decomposition canceller, *IEEE Trans. Signal Processing* 52(9): 2612–2624.

Yu, J. & Su, Y. (2004). Pilot assisted ML frequency-offset estimation for OFDM systems, *IEEE Trans. on Communication* 52(11): 1997–2008.

## **Fast Fourier Transform Processors: Implementing FFT and IFFT Cores for OFDM Communication Systems**

A. Cortés, I. Vélez, M. Turrillas and J. F. Sevillano *TECNUN (Universidad de Navarra) and CEIT Spain*

### **1. Introduction**

The terms Fast Fourier Transform (FFT) and Inverse Fast Fourier Transform (IFFT) are used to denote efficient and fast algorithms to compute the Discrete Fourier Transform (DFT) and the Inverse Discrete Fourier Transform (IDFT), respectively. The FFT/IFFT is widely used in many digital signal processing applications, and its efficient implementation is a topic of continuous research.

In recent years, communication systems based on Orthogonal Frequency Division Multiplexing (OFDM) have been an important driver for research in FFT/IFFT algorithms and their implementation. OFDM is a bandwidth-efficient multiple access scheme for digital communications (Engels, 2002; Nee & Prasad, 2000). Many of today's most important wireless communication systems use OFDM: Digital Audio Broadcasting (DAB) (*World DAB Forum*, n.d.), Digital Video Broadcasting (DVB) (ETS, 2004), Wireless Local Area Network (WLAN) (IEE, 1999), Wireless Metropolitan Area Network (WMAN) (IEE, 2003) and Multi-Band OFDM Ultra Wide Band (MB–OFDM UWB) (ECM, 2005). Moreover, this technique is also employed in important wired applications such as Asymmetric Digital Subscriber Line (ADSL) or Power Line Communication (PLC).

OFDM systems rely on the IFFT for an efficient implementation of the signal modulation on the transmitter side, whereas the FFT is used for efficient demodulation of the received signal. The FFT/IFFT is thus one of the most critical modules in OFDM transceivers. In fact, the most computationally intensive parts of an OFDM system are the IFFT in the transmitter and the Viterbi decoder in the receiver (Maharatna et al., 2004). The FFT is the second most computationally intensive part in the receiver. Therefore, the implementation of the FFT and IFFT must be optimized to achieve the required throughput with the minimum penalty in area and power consumption. The demanding requirements of modern OFDM transceivers lead, in many cases, to the implementation of special-purpose hardware for the most critical parts of the transceiver. Thus, it is common to find the FFT/IFFT implemented as a Very Large Scale Integrated (VLSI) circuit. The techniques applied to the FFT can be applied to the IFFT as well. Moreover, the IFFT can be easily obtained by manipulating the output of an FFT processor. Therefore, the discussion in this chapter concentrates on the FFT without loss of generality.
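One common way to obtain the IFFT from an FFT core is the conjugation identity IDFT(*X*) = conj(DFT(conj(*X*)))/*N*. The chapter does not say which manipulation the authors have in mind, so the following NumPy sketch shows this identity as one possibility; the function name `ifft_via_fft` is chosen here for illustration.

```python
import numpy as np

def ifft_via_fft(X):
    """Inverse DFT computed with a forward FFT only (conjugate, FFT, conjugate, scale)."""
    X = np.asarray(X, dtype=complex)
    return np.conj(np.fft.fft(np.conj(X))) / X.size

rng = np.random.default_rng(3)
X = rng.standard_normal(64) + 1j * rng.standard_normal(64)   # 64-point test spectrum
x = ifft_via_fft(X)
```

In hardware, the two conjugations amount to negating the imaginary parts at the input and output, and the division by *N* is a shift when *N* is a power of two.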

**Reference FFT points Architecture Algorithm Application** (Jung et al., 2005) 64 Pipeline–MDC *r*–2 DIT WLAN (Maharatna et al., 2004) 64 Pipeline *r*–2 DIT WLAN (Serrá et al., 2004) 64 Monoprocessor *r*–2 DIT WLAN (Lin et al., 2005) 128 Pipeline–MRMDF MR 2/23 DIF UWB (Saberinia, 2006) 128 Pipeline–BRMDC *r*–2 DIF UWB (Lee et al., 2006) 128 Pipeline–SDF *r*–24 DIF UWB (Liu et al., 2007) 64/128 Pipeline 8-path DF MR DIF UWB (Cortés et al., 2007) 128 Pipeline *r*–24 DIF UWB (Bidet et al., 1995) 8192 Pipeline–SDC *r*–4/2 DIF DVB–T (Lin, Liu & Lee, 2004) 8192 Parallel *r*–23 DIT DVB–T (Wang et al., 2005) 2/8 K Pipeline–SDF *r*–4/2 DIF DVB–T (Lenart & Owal, 2006) 2/4/8 K Pipeline–SDF *r*–22 DIF DVB–T/H (Lee & Park, 2007) 8 K Pipeline–SDF BD DVB–T (He & Torkelson, 1998) 1024 Pipeline–SDF *r*–22 DIF OFDM (Kuo et al., 2003) 64–2048 Cached memory *r*–2 DIT OFDM (Chang & Park, 2004) 1024 Monoprocessor *r*–4 DIF OFDM (Jiang et al., 2004) 64 Parallel *r*–2 DIT OFDM (Lin, Lin, Chen & Chang, 2004) 64 Pipeline *r*–2 DIF MIMO (Rudagi et al., 2010) 64 Pipeline *r*–2 DIT OFDM (Yu et al., 2011) 64 Pipeline *r*–2 DIF OFDM (Tsai et al., 2011) 64 Pipeline MR OFDM (Turrillas et al., 2010) 32 K Pipeline *r*–2*<sup>k</sup>* DIF DVB–T2

113

• For the parallel architectures, FFT algorithms such as radix 2 (Jiang et al., 2004) and radix

• For the pipeline architectures, different FFT algorithms such as radix 2 (Jung et al., 2005; Rudagi et al., 2010; Saberinia, 2006; Yu et al., 2011), mixed radix (MR) (Bidet et al., 1995; Lin, Lin, Chen & Chang, 2004; Lin et al., 2005; Liu et al., 2007; Tsai et al., 2011; Wang et al., 2005), radix 22 (Cortés et al., 2007; He & Torkelson, 1998; Lenart & Owal, 2006; Turrillas et al., 2010), radix 23 (Cortés et al., 2007; He & Torkelson, 1998; Turrillas et al., 2010), radix 24 (Cortés et al., 2007; Lee et al., 2006; Turrillas et al., 2010) and balanced decomposition (BD) (Lee & Park, 2007) which reduces the number of twiddle factors have

When the length of the FFT is not very large, the fixed-point format is the most widely used number representation format due to the area-saving with respect to the floating-point representation. Thus, (Chang & Park, 2004; Cortés et al., 2007; He & Torkelson, 1998; Jiang et al., 2004; Jung et al., 2005; Kuo et al., 2003; Lee et al., 2006; Lin, Lin, Chen & Chang, 2004; Lin et al., 2005; Liu et al., 2007; Maharatna et al., 2004; Saberinia, 2006; Serrá et al., 2004) proposed fixed–point FFTs for *N <* 8192. Nevertheless, (Wang et al., 2005) also used the fixed–point format for large FFTs. However, when the length of the FFT increases, different types of non fixed-point formats have been used. (Bidet et al., 1995) proposed a 8K processor that uses Convergent block floating–point to improve the quantization error of the FFT design. The block floating–point implementation or the semi–floating point (SFlP) format achieve a more efficient FFT design in (Lee & Park, 2007; Lenart & Owal, 2006). In a different approach, (Turrillas et al., 2010) employed a variable datapath technique to process a 32K points FFT.

Table 1. FFT proposals for OFDM systems

Fast Fourier Transform Processors:

Implementing FFT and IFFT Cores for OFDM Communication Systems

been proposed.

23 (Lin, Liu & Lee, 2004) have been used.

Different kinds of FFT algorithms can be found in the literature; e.g.: (Good, 1958; Thomas, 1963), (Cooley & Tukey, 1965), (Rader, 1968b), (Rader, 1968a), (Bruun, 1978) and (Winograd, 1978). Among the different kinds of FFT algorithms, the algorithms based on the approach proposed by James W. Cooley and John W. Tukey in (Cooley & Tukey, 1965) are very popular in OFDM systems. These Cooley–Tukey (CT) algorithms present a very regular structure, which facilitates an efficient implementation. The computations of the FFT is divided into *logr*(*N*) stages, where *N* is the number of points of the FFT and *r* is called the radix of the algorithm. Within each stage, data shuffling and the so-called butterfly computation and twiddle factor multiplications are performed. Usually, the butterfly operations are all identical and the twiddle factor multiplications and data shuffling follow some kind of pattern. This regular structure makes them very attractive for VLSI circuit implementation.

Different hardware architectures have been used in the literature for the implementation of the CT algorithms. The FFT hardware architectures can be classified into three groups:


It is common in the literature to further classify the pipeline architectures according to the structure used for the shuffling into two basic types: Delay Commutator (DC) and Delay Feedback (DF). Also, according to the number of lines of data used in these pipeline architectures, they can be classified into Single–path (S) or Multiple–path (M) architectures.

Many different variations of CT algorithms have been proposed in the literature to improve different aspects of the implementation (memory resources, number of arithmetic operations, etc.) and their mapping to a specific hardware architecture. Table 1 summarizes the features of some FFT/IFFT processors for OFDM systems proposed in the literature. Proposals such as (Chang & Park, 2004; Serrá et al., 2004) employed a monoprocessor architecture to process the FFT. (Jiang et al., 2004; Lin, Liu & Lee, 2004) used parallel architectures and (Kuo et al., 2003) chose a cached memory monoprocessor architecture for the FFT processing in an OFDM system. However, the most widely used architectures for the FFT/IFFT processor in an OFDM system are pipeline architectures. (Cortés et al., 2007; He & Torkelson, 1998; Lee & Park, 2007; Lee et al., 2006; Turrillas et al., 2010; Wang et al., 2005) propose Single-Path Delay Feedback (SDF) architectures. (Lin et al., 2005; Liu et al., 2007) employ an Multi-Path Delay Feedback (MDF) architecture. (Bidet et al., 1995) proposes a Single-Path Delay Commutator (SDC) architecture and (Jung et al., 2005; Saberinia, 2006) chose an Multi-Path Delay Commutator (MDC) architecture. (Saberinia, 2006) called the MDC architecture as Buffered Multi-Path Delay Commutator (BRMDC) due to the buffers used in the input data. Analyzing the radix, *r*, of the algorithms employed in the literature, it can be observed that:

• For the monoprocessor architectures, radix 2 (Serrá et al., 2004) and radix 4 (Chang & Park, 2004) algorithms have been used.

2 Will-be-set-by-IN-TECH

Different kinds of FFT algorithms can be found in the literature; e.g.: (Good, 1958; Thomas, 1963), (Cooley & Tukey, 1965), (Rader, 1968b), (Rader, 1968a), (Bruun, 1978) and (Winograd, 1978). Among the different kinds of FFT algorithms, the algorithms based on the approach proposed by James W. Cooley and John W. Tukey in (Cooley & Tukey, 1965) are very popular in OFDM systems. These Cooley–Tukey (CT) algorithms present a very regular structure, which facilitates an efficient implementation. The computations of the FFT is divided into *logr*(*N*) stages, where *N* is the number of points of the FFT and *r* is called the radix of the algorithm. Within each stage, data shuffling and the so-called butterfly computation and twiddle factor multiplications are performed. Usually, the butterfly operations are all identical and the twiddle factor multiplications and data shuffling follow some kind of pattern. This

Different hardware architectures have been used in the literature for the implementation of the CT algorithms. The FFT hardware architectures can be classified into three groups:

• Monoprocessor: A single hardware element is used to perform all the butterflies, twiddle factor multiplications and data shuffling of each stage. The same hardware is reused for

regular structure makes them very attractive for VLSI circuit implementation.

• Parallel: The computation of the butterflies, twiddle factor multiplications and data shuffling within one stage is accelerated by using several processing elements. The same hardware elements are again reused for all the stages.

• Pipeline: A single hardware element is used to perform all the butterflies, twiddle factor multiplications and data shuffling of each stage. However, in contrast to the former categories, a different hardware element is used to process each stage.

It is common in the literature to further classify the pipeline architectures according to the structure used for the shuffling into two basic types: Delay Commutator (DC) and Delay Feedback (DF). Also, according to the number of lines of data used in these pipeline architectures, they can be classified into Single–path (S) or Multiple–path (M) architectures. Many different variations of CT algorithms have been proposed in the literature to improve different aspects of the implementation (memory resources, number of arithmetic operations, etc.) and their mapping to a specific hardware architecture. Table 1 summarizes the features of some FFT/IFFT processors for OFDM systems proposed in the literature. Proposals such as (Chang & Park, 2004; Serrá et al., 2004) employed a monoprocessor architecture to process the FFT. (Jiang et al., 2004; Lin, Liu & Lee, 2004) used parallel architectures and (Kuo et al., 2003) chose a cached memory monoprocessor architecture for the FFT processing in an OFDM system. However, the most widely used architectures for the FFT/IFFT processor in an OFDM system are pipeline architectures. (Cortés et al., 2007; He & Torkelson, 1998; Lee & Park, 2007; Lee et al., 2006; Turrillas et al., 2010; Wang et al., 2005) propose Single-Path Delay Feedback (SDF) architectures. (Lin et al., 2005; Liu et al., 2007) employ a Multi-Path Delay Feedback (MDF) architecture. (Bidet et al., 1995) proposes a Single-Path Delay Commutator (SDC) architecture and (Jung et al., 2005; Saberinia, 2006) chose a Multi-Path Delay Commutator (MDC) architecture. (Saberinia, 2006) refers to this MDC architecture as Buffered Multi-Path Delay Commutator (BRMDC) because of the buffers used on the input data. Analyzing the radix, *r*, of the algorithms employed in the literature, it can be observed that:

• For the monoprocessor architectures, radix 2 (Serrá et al., 2004) and radix 4 (Chang & Park, 2004) algorithms have been used.

all the stages.


Table 1. FFT proposals for OFDM systems


When the length of the FFT is not very large, the fixed-point format is the most widely used number representation because it saves area with respect to the floating-point representation. Thus, (Chang & Park, 2004; Cortés et al., 2007; He & Torkelson, 1998; Jiang et al., 2004; Jung et al., 2005; Kuo et al., 2003; Lee et al., 2006; Lin, Lin, Chen & Chang, 2004; Lin et al., 2005; Liu et al., 2007; Maharatna et al., 2004; Saberinia, 2006; Serrá et al., 2004) proposed fixed–point FFTs for *N* < 8192. (Wang et al., 2005) also used the fixed–point format for large FFTs. However, as the length of the FFT increases, different types of non-fixed-point formats have been used. (Bidet et al., 1995) proposed an 8K processor that uses convergent block floating–point to reduce the quantization error of the FFT design. The block floating–point implementation or the semi–floating point (SFlP) format achieves a more efficient FFT design in (Lee & Park, 2007; Lenart & Owal, 2006). In a different approach, (Turrillas et al., 2010) employed a variable datapath technique to process a 32K-point FFT.


From Table 1, it can be seen that many different algorithms and architectures have been proposed for OFDM systems. The designer must select the most appropriate algorithm and the most efficient architecture for that algorithm, given the specifications of a certain OFDM system. This selection is a difficult task. There is not a clear algorithm/architecture winner. Therefore, the designer should explore different algorithms and architectures in the literature to find the optimal one for the specific OFDM application under development.

The typical way of expressing the FFT algorithms in the literature is by means of summations or flow graph notation. Examples of these representations can be found in (He & Torkelson, 1998; Lee & Park, 2007; Lee et al., 2006; Lin, Lin, Chen & Chang, 2004; Lin, Liu & Lee, 2004; Lin et al., 2005; Liu et al., 2007; Maharatna et al., 2004; Tsai et al., 2006). These representations do not help the designer to understand the algorithm quickly, which makes it difficult to relate the algorithm to its hardware (HW) resources. Additionally, a general expression for the algorithm/architecture is sometimes missing, which makes it harder to adapt and evaluate for a different OFDM system. Therefore, these representations are not practical for design space exploration. What the designer needs for efficient design space exploration is a general expression in terms of the design parameters of the algorithms that is easy to understand and fast to map to hardware resources.

In (Pease, 1968), a matrix notation to express the FFT is proposed. Different FFT algorithms are obtained combining a reduced set of operators to simplify the implementation of parallel processing in a special–purpose machine. In (Sloate, 1974), H. Sloate used the same approach as (Pease, 1968) to demonstrate how several FFT algorithms previously defined could be derived using the matricial expressions. Additionally, he analyzed some new algorithms and worked out how to relate the matricial expressions to their implementation. However, that notation is not generalized for pipeline architectures. Recently, (Cortés et al., 2009) generalized the above approach presenting a unified approach for radix *r<sup>k</sup>* pipeline SDC/SDF FFT architectures. Radix *r<sup>k</sup>* pipeline FFT architectures are very efficient architectures that are well suited for OFDM systems.

This chapter reviews the matricial representation of radix *r<sup>k</sup>* pipeline SDF FFT architectures. Thus, a general expression in terms of the FFT design parameters that can be linked easily to hardware implementation resources is presented. This way, the designer of FFT/IFFT processors for OFDM systems is provided with the tools for efficient design space exploration. The design space exploration and the optimal architecture selection procedure are illustrated by means of a case study. The case study analyzes the FFT/IFFT processor in a WLAN IEEE 802.11a transceiver. The high level analysis proposed in (Cortés et al., 2009) is extended to implementation level to select the most efficient FFT/IFFT core in terms of area and power consumption.

### **2. A unified matricial approach for radix** *r<sup>k</sup>* **FFT SDF pipeline architectures**

This section presents the matricial representation of radix *r<sup>k</sup>* Decimation In Frequency (DIF) SDF pipeline architectures. First, the DFT matrix factorization procedure that leads to the FFT algorithms is reviewed. This review is used to define the basic types of matrices needed for the FFT. Next, in order to simplify the notation and the later mapping of the matricial representation to hardware resources, some operators are defined. Then, the general expression for radix *r<sup>k</sup>* FFT SDF pipeline architectures is presented and the mapping to hardware resources is illustrated.

#### **2.1 Review of the DFT matrix factorization**

The DFT can be expressed matricially as,

$$\text{DFT}(\mathbf{x}) = \mathbf{T}\_N \cdot \mathbf{x} \tag{1}$$

where


$$\mathbf{x}^T = \left\{ x\_0 \ x\_1 \ x\_2 \ \cdots \ x\_{N-1} \right\}$$

and **T**<sub>*N*</sub> is the size *N* square matrix of coefficients given by

$$(\mathbf{T}\_N)\_{mn} = e^{-j\frac{mn2\pi}{N}},\tag{2}$$

that is,

$$\mathbf{T}\_N = \begin{bmatrix} 1 & 1 & 1 & \cdots & 1\\ 1 & e^{-j\frac{2\pi}{N}} & e^{-j\frac{2\cdot 2\pi}{N}} & \cdots & e^{-j\frac{(N-1)2\pi}{N}}\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ 1 & e^{-j\frac{(N-1)2\pi}{N}} & e^{-j\frac{2(N-1)2\pi}{N}} & \cdots & e^{-j\frac{(N-1)^2 2\pi}{N}} \end{bmatrix}.$$
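As a quick sanity check of equations (1) and (2), the following minimal sketch builds **T**<sub>*N*</sub> entry by entry and applies it to a complex exponential that sits exactly on one DFT bin. The helper names `dft_matrix` and `mat_vec`, and the choice *N* = 8 with a tone at bin 3, are illustrative assumptions and do not come from the chapter.

```python
import cmath

def dft_matrix(n):
    """Return T_N as a list of rows, with (T_N)[m][k] = exp(-j*2*pi*m*k/N)."""
    return [[cmath.exp(-2j * cmath.pi * m * k / n) for k in range(n)]
            for m in range(n)]

def mat_vec(matrix, vector):
    """Plain matrix-vector product, i.e. DFT(x) = T_N . x in equation (1)."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]

N = 8          # illustrative transform length (assumption)
tone_bin = 3   # illustrative bin index (assumption)

# Complex exponential with exactly tone_bin cycles over N samples.
x = [cmath.exp(2j * cmath.pi * tone_bin * n / N) for n in range(N)]

spectrum = mat_vec(dft_matrix(N), x)
magnitudes = [round(abs(v), 9) for v in spectrum]
print(magnitudes)  # all energy should land in bin 3 with magnitude N
```

Because the tone is aligned with a bin, every other output cancels up to rounding error, which makes this a convenient test vector for any later factorization of **T**<sub>*N*</sub>.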

In order to factorize **T***N*, (Sloate, 1974) defines

$$\mathbf{T}\_N = \mathbf{P}\_N^{(r)} \cdot \mathbf{T}\_N^{(p)} \, \tag{3}$$

where *r* is the radix of the algorithm, *p* = *N*/*r* is a positive integer (*p* ∈ **N**<sup>∗</sup>) and **P**<sub>*N*</sub><sup>(*r*)</sup> is the stride permutation matrix. This matrix, **P**<sub>*N*</sub><sup>(*r*)</sup>, is defined by its effect on a vector:

$$\mathbf{P}\_N^{(r)} \cdot \{ x\_0 \ x\_1 \ \cdots \ x\_{N-1} \}^T = \{ \mathbf{y}\_0 \ \mathbf{y}\_1 \ \cdots \ \mathbf{y}\_{p-1} \}^T$$

with **y**<sub>*i*</sub> = { *x*<sub>*i*</sub> *x*<sub>*i*+*p*</sub> *x*<sub>*i*+2*p*</sub> ··· *x*<sub>*i*+(*r*−1)*p*</sub> }. The matrix **T**<sub>*N*</sub><sup>(*p*)</sup> can be expressed as

$$\mathbf{T}\_N^{(p)} = [\mathbf{I}\_r \otimes \mathbf{T}\_{N/r}] \cdot \mathbf{D}\_N^{(r)} \cdot [\mathbf{T}\_r \otimes \mathbf{I}\_{N/r}] \tag{4}$$

where **I***<sup>z</sup>* is the identity matrix of size *z*. The symbol ⊗ represents the Kronecker product. Given an *m* × *n* matrix **A** and a matrix **B**, the Kronecker product is defined as

$$\mathbf{A} \otimes \mathbf{B} = \begin{bmatrix} a\_{0,0}\mathbf{B} & a\_{0,1}\mathbf{B} & \cdots & a\_{0,n-1}\mathbf{B} \\ a\_{1,0}\mathbf{B} & a\_{1,1}\mathbf{B} & \cdots & a\_{1,n-1}\mathbf{B} \\ \vdots & \vdots & & \vdots \\ a\_{m-1,0}\mathbf{B} & a\_{m-1,1}\mathbf{B} & \cdots & a\_{m-1,n-1}\mathbf{B} \end{bmatrix} \tag{5}$$
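The order of the operands in a Kronecker product matters for everything that follows, so a small numerical illustration may help. The sketch below implements equation (5) for matrices stored as lists of rows and compares **I**<sub>2</sub> ⊗ **T**<sub>2</sub> with **T**<sub>2</sub> ⊗ **I**<sub>2</sub>. The function name `kron` and the integer form of **T**<sub>2</sub> are assumptions made for the example.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows (equation (5)).

    Row index runs over (row of a, row of b); column index runs over
    (column of a, column of b), with the index of `a` varying slowest.
    """
    return [[a_ij * b_kl for a_ij in row_a for b_kl in row_b]
            for row_a in a for row_b in b]

I2 = [[1, 0],
      [0, 1]]
T2 = [[1, 1],
      [1, -1]]   # 2-point DFT matrix, written with exact integers

# I2 (x) T2: two independent butterflies on adjacent pairs (0,1) and (2,3).
for row in kron(I2, T2):
    print(row)

print()

# T2 (x) I2: one butterfly acting on pairs (0,2) and (1,3), i.e. stride 2.
for row in kron(T2, I2):
    print(row)
```

The first product is block diagonal, while the second interleaves the entries of **T**<sub>2</sub> with a stride equal to the size of the identity; this is why [**T**<sub>*r*</sub> ⊗ **I**<sub>*N*/*r*</sub>] in equation (4) combines samples that are *N*/*r* positions apart.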

**D**<sub>*N*</sub><sup>(*r*)</sup> is a diagonal matrix given by

$$\mathbf{D}\_{N}^{(r)} = \text{quasidiag}(\{\mathbf{I}\_p \ \mathbf{K}\_p^1 \ \mathbf{K}\_p^2 \ \cdots \ \mathbf{K}\_p^{r-1}\}),\tag{6}$$

with **K**<sub>*p*</sub> = *diag*({ 1 *e*<sup>−*j*2*π*/*N*</sup> *e*<sup>−*j*2·2*π*/*N*</sup> ··· *e*<sup>−*j*(*p*−1)2*π*/*N*</sup> }). Replacing (4) in (3),

$$\mathbf{T}\_N = \mathbf{P}\_N^{(r)} \cdot [\mathbf{I}\_r \otimes \mathbf{T}\_{N/r}] \cdot \mathbf{D}\_N^{(r)} \cdot [\mathbf{T}\_r \otimes \mathbf{I}\_{N/r}].\tag{7}$$

Equation (7) shows how to write a DFT matrix in terms of smaller DFT matrices. This process can be repeated recursively until an expression in terms of **T**<sub>*r*</sub> is reached. Equation (7) is a matricial representation of the decomposition technique used in the well-known Cooley-Tukey FFT (Cooley & Tukey, 1965).
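Equation (7) can be checked numerically for small sizes. The sketch below is one possible reading of the definitions in this section, written with plain Python lists: it builds the stride permutation, the two Kronecker factors and the twiddle matrix, multiplies them in the order of equation (7), and compares the result with **T**<sub>*N*</sub> from equation (2). All function names are hypothetical helpers, and the test sizes are arbitrary choices.

```python
import cmath

def dft_matrix(n):
    """T_n with entries exp(-j*2*pi*m*k/n), equation (2)."""
    return [[cmath.exp(-2j * cmath.pi * m * k / n) for k in range(n)]
            for m in range(n)]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def kron(a, b):
    """Kronecker product, equation (5)."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]

def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def stride_permutation(n, r):
    """P_N^(r): output position i*r + s takes input element i + s*p, p = N/r."""
    p = n // r
    perm = [[0] * n for _ in range(n)]
    for i in range(p):
        for s in range(r):
            perm[i * r + s][i + s * p] = 1
    return perm

def twiddle_matrix(n, r):
    """D_N^(r) = quasidiag(I_p, K_p, ..., K_p^(r-1)), equation (6)."""
    p = n // r
    d = [[0] * n for _ in range(n)]
    for q in range(r):          # block index, selects K_p^q
        for m in range(p):      # position inside the block
            d[q * p + m][q * p + m] = cmath.exp(-2j * cmath.pi * q * m / n)
    return d

def factorized_dft(n, r):
    """Right-hand side of equation (7)."""
    p = n // r
    result = stride_permutation(n, r)
    for factor in (kron(identity(r), dft_matrix(p)),
                   twiddle_matrix(n, r),
                   kron(dft_matrix(r), identity(p))):
        result = matmul(result, factor)
    return result

def max_abs_difference(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

for n, r in ((8, 2), (16, 4), (12, 3)):
    error = max_abs_difference(factorized_dft(n, r), dft_matrix(n))
    print(n, r, error < 1e-9)
```

The comparison uses a tolerance because the twiddle factors are floating-point values; the check only confirms the single-level factorization, not the recursive application that leads to a full FFT.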


#### **2.2 Definition of operators**

In order to simplify the notation, nine operators are defined. Two different types of operators can be distinguished: the reordering operators and the arithmetic operators.

• Reordering operators

**–** *Shuffling Operator* **S**<sup>(*a*)</sup>: Let *N* be divisible by *r<sup>a</sup>*,

$$\mathbf{S}^{(a)} = \mathbf{I}\_{r^a} \otimes \mathbf{P}^{(r)}\_{N/r^a}.\tag{8}$$

Note that,

$$(\mathbf{S}^{(a)})^{-1} = \mathbf{I}\_{r^a} \otimes (\mathbf{P}\_{N/r^a}^{(r)})^{-1}.\tag{9}$$

• Arithmetic operators

**–** *Butterfly Operator* **B**: Let *N* be divisible by *r*,

$$\mathbf{B} = \mathbf{I}\_{N/r} \otimes \mathbf{T}\_r. \tag{10}$$

**–** *First Twiddle Factor Multiplier Operator* **M1**<sup>(*a*,*b*)</sup>: Let *N* be divisible by *r*<sup>*a*+*b*</sup>,

$$\mathbf{M1}^{(a,b)} = \begin{cases} \mathbf{I}\_{r^a} \otimes \left( \prod\_{l=1}^b [\mathbf{P}\_{r^l}^{(r)} \otimes \mathbf{I}\_{N/r^{a+l}}] \cdot \mathbf{D}\_{N/r^a}^{(r^b)} \cdot \prod\_{l=1}^b [\mathbf{P}\_{r^l}^{(r)} \otimes \mathbf{I}\_{N/r^{a+l}}] \right), & \text{if } r^b < \frac{N}{r^a} \\ \mathbf{I}\_N, & \text{if } r^b \ge \frac{N}{r^a}. \end{cases} \tag{11}$$

**–** *Second Twiddle Factor Multiplier Operator* **M2**<sup>(*a*)</sup>: Let *N* be divisible by *r*<sup>*a*+2</sup>,

$$\mathbf{M2}^{(a)} = \mathbf{I}\_{r^a} \otimes \mathbf{D}\_{r^2}^{(r)} \otimes \mathbf{I}\_{N/r^{a+2}}.\tag{12}$$

**–** *Third Twiddle Factor Multiplier Operator* **M3**<sup>(*a*,*b*)</sup>: Let *N* be divisible by *r*<sup>*a*+*b*</sup> and *b* ≥ 2,

$$\mathbf{M}\mathbf{3}^{(a,b)} = \mathbf{I}\_{r^a} \otimes \left( \left[ \mathbf{P}\_{r^2}^{(r)} \otimes \mathbf{I}\_{r^{(b-2)}} \right] \cdot \mathbf{D}\_{r^b}^{(r^2)} \cdot \left[ \mathbf{P}\_{r^2}^{(r)} \otimes \mathbf{I}\_{r^{(b-2)}} \right] \right) \otimes \mathbf{I}\_{N/r^{a+b}}.\tag{13}$$
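The reordering and butterfly operators are easier to picture as actions on a data vector than as matrices. The following sketch applies **S**<sup>(*a*)</sup>, **B** and (**S**<sup>(*a*)</sup>)<sup>−1</sup> directly to a list, under the stride permutation convention of Section 2.1. The function names and the test vector 0, 1, ..., 7 are assumptions chosen for illustration.

```python
import cmath

def stride_permute(block, r):
    """P_n^(r) on one block: output[i*r + s] = block[i + s*p], with p = n/r."""
    p = len(block) // r
    return [block[i + s * p] for i in range(p) for s in range(r)]

def stride_unpermute(block, r):
    """Inverse of stride_permute: puts output[i*r + s] back at index i + s*p."""
    p = len(block) // r
    out = [None] * len(block)
    for i in range(p):
        for s in range(r):
            out[i + s * p] = block[i * r + s]
    return out

def apply_blockwise(vector, block_len, func):
    """Apply func to consecutive blocks, which is what I (x) M does to a vector."""
    out = []
    for start in range(0, len(vector), block_len):
        out.extend(func(vector[start:start + block_len]))
    return out

def shuffle(vector, r, a):
    """S^(a) = I_{r^a} (x) P_{N/r^a}^(r), equation (8)."""
    return apply_blockwise(vector, len(vector) // r ** a,
                           lambda blk: stride_permute(blk, r))

def unshuffle(vector, r, a):
    """(S^(a))^-1, equation (9)."""
    return apply_blockwise(vector, len(vector) // r ** a,
                           lambda blk: stride_unpermute(blk, r))

def small_dft(group):
    """T_r applied to r consecutive samples."""
    r = len(group)
    return [sum(x * cmath.exp(-2j * cmath.pi * m * k / r)
                for k, x in enumerate(group)) for m in range(r)]

def butterfly(vector, r):
    """B = I_{N/r} (x) T_r, equation (10)."""
    return apply_blockwise(vector, r, small_dft)

x = list(range(8))
for a in (0, 1):
    y = unshuffle(butterfly(shuffle(x, 2, a), 2), 2, a)
    print(a, [round(v.real) for v in y])
```

For *r* = 2 and *a* = 0 the combination (**S**<sup>(0)</sup>)<sup>−1</sup> · **B** · **S**<sup>(0)</sup> adds and subtracts samples that are *N*/2 apart; for *a* = 1 the same happens inside each half with a distance of *N*/4. This is the pattern that the delay lines of an SDF stage realize in hardware.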

#### **2.3 Matricial representation of radix** *r<sup>k</sup>* **pipeline FFT architecture**

The decomposition procedure given in equation (7) can be applied recursively to devise the *r<sup>k</sup>* SDF pipeline architectures. Let *N* = *r*<sup>(*k*·*n<sub>k</sub>*+*l*)</sup> with {*k*, *n<sub>k</sub>*} ∈ **N**<sup>∗</sup> and *l* ∈ {0, 1, . . . , *k* − 1}; the matricial representation of *r<sup>k</sup>* DIF SDF pipeline architectures is given by

$$\mathbf{T}\_N = \begin{cases} \mathbf{Q}\_N \cdot \mathbf{T}\_N^{(p\_k)}, & \text{for } l = 0 \\\\ \mathbf{Q}\_N \cdot \mathbf{H}^{(l,k,n\_k)} \cdot \mathbf{T}\_N^{(p\_k)}, & \text{for } l > 0 \end{cases} \tag{14}$$

$$\mathbf{T}\_N^{(p\_k)} = \prod\_{m=0}^{n\_k - 1} \mathbf{H}^{(k, k, n\_k - m - 1)} \tag{15}$$

$$\mathbf{Q}\_N = \prod\_{i=0}^{n\_1 - 1} \mathbf{S}^{(i)}.\tag{16}$$

where *p<sub>k</sub>* = *r*<sup>*k*(*n<sub>k</sub>*−1)</sup>. The term **H**<sup>(*b*,*k*,*i*)</sup> represents the *i*th stage of an *r<sup>k</sup>* FFT algorithm. When *b* = 1, **H**<sup>(1,*k*,*i*)</sup> is given by,

$$\mathbf{H}^{(1,k,i)} = \mathbf{M} \mathbf{1}^{(k \cdot i, k)} \cdot (\mathbf{S}^{(k \cdot i)})^{-1} \cdot \mathbf{B} \cdot \mathbf{S}^{(k \cdot i)}.\tag{17}$$

When *b* = 2, **H**<sup>(2,*k*,*i*)</sup> reduces to,

$$\mathbf{H}^{(2,k,i)} = \mathbf{M}\mathbf{1}^{(k\cdot i,k)} \cdot (\mathbf{S}^{(k\cdot i+1)})^{-1} \cdot \mathbf{B} \cdot \mathbf{S}^{(k\cdot i+1)} \cdot \mathbf{V}^{(2,k\cdot i)},\tag{18}$$

where **V**<sup>(*b*,*k*·*i*)</sup> is given by


$$\mathbf{V}^{(b,k\cdot i)} = \mathbf{M2}^{(k\cdot i+b-2)} \cdot (\mathbf{S}^{(k\cdot i+b-2)})^{-1} \cdot \mathbf{B} \cdot \mathbf{S}^{(k\cdot i+b-2)}.\tag{19}$$

When *b* is even and *b* > 2, **H**<sup>(*b*,*k*,*i*)</sup> is given by,

$$\mathbf{H}^{(b,k,i)} = \mathbf{M1}^{(k \cdot i, k)} \cdot (\mathbf{S}^{(k \cdot i + b - 1)})^{-1} \cdot \mathbf{B} \cdot \mathbf{S}^{(k \cdot i + b - 1)} \cdot \mathbf{V}^{(b,k \cdot i)} \cdot \left[ \prod\_{m=0}^{M = \frac{b}{2} - 2} \mathbf{G}^{(b,k \cdot i, m)} \right], \tag{20}$$

where **V**<sup>(*b*,*k*·*i*)</sup> is given by (19) and **G**<sup>(*b*,*k*·*i*,*m*)</sup> by (22).

When *b* is odd and *b* > 1, **H**<sup>(*b*,*k*,*i*)</sup> is given by,

$$\mathbf{H}^{(b,k,i)} = \mathbf{M1}^{(k \cdot i, k)} \cdot (\mathbf{S}^{(k \cdot i + b - 1)})^{-1} \cdot \mathbf{B} \cdot \mathbf{S}^{(k \cdot i + b - 1)} \cdot \left[ \prod\_{m = 0}^{M = \frac{b - 1}{2} - 1} \mathbf{G}^{(b, k \cdot i, m)} \right],\tag{21}$$

where **G**<sup>(*b*,*k*·*i*,*m*)</sup> is given by (22).

$$\mathbf{G}^{(b,k\cdot i,m)} = \begin{cases} \begin{aligned} &\mathbf{M3}^{(k \cdot i + 2(M-m),\, b - 2(M-m))} \\ &\cdot (\mathbf{S}^{(k \cdot i + b - 2m - 3)})^{-1} \cdot \mathbf{B} \cdot \mathbf{S}^{(k \cdot i + b - 2m - 3)} \\ &\cdot \mathbf{M2}^{(k \cdot i + b - 2m - 4)} \\ &\cdot (\mathbf{S}^{(k \cdot i + b - 2m - 4)})^{-1} \cdot \mathbf{B} \cdot \mathbf{S}^{(k \cdot i + b - 2m - 4)}, \end{aligned} & \text{if } b \text{ even and } b > 2 \\[2ex] \begin{aligned} &\mathbf{M3}^{(k \cdot i + 2(M - m),\, b - 2(M - m))} \\ &\cdot (\mathbf{S}^{(k \cdot i + b - 2m - 2)})^{-1} \cdot \mathbf{B} \cdot \mathbf{S}^{(k \cdot i + b - 2m - 2)} \\ &\cdot \mathbf{M2}^{(k \cdot i + b - 2m - 3)} \\ &\cdot (\mathbf{S}^{(k \cdot i + b - 2m - 3)})^{-1} \cdot \mathbf{B} \cdot \mathbf{S}^{(k \cdot i + b - 2m - 3)}, \end{aligned} & \text{if } b \text{ odd and } b > 1 \end{cases} \tag{22}$$
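Equations (17) to (22) are compact but hard to read at a glance. The sketch below expands **H**<sup>(*b*,*k*,*i*)</sup> symbolically into its chain of operators, written left to right as in the equations, so that the structure of a stage can be inspected for any *b*. It assumes that the products over *m* are written with *m* increasing from left to right, which is the ordering that makes the shuffle indices decrease monotonically; the function names and the string format are illustrative only.

```python
def v_terms(b, a):
    """Operator chain of V^(b, a), equation (19); `a` stands for k*i."""
    idx = a + b - 2
    return [f"M2({idx})", f"S({idx})^-1", "B", f"S({idx})"]

def g_terms(b, a, m, big_m):
    """Operator chain of G^(b, a, m), equation (22)."""
    shift = 2 * (big_m - m)
    # Shuffle index of the first butterfly: b-2m-3 (b even) or b-2m-2 (b odd).
    first = b - 2 * m - (3 if b % 2 == 0 else 2)
    return [f"M3({a + shift},{b - shift})",
            f"S({a + first})^-1", "B", f"S({a + first})",
            f"M2({a + first - 1})",
            f"S({a + first - 1})^-1", "B", f"S({a + first - 1})"]

def h_terms(b, k, i):
    """Operator chain of H^(b, k, i), equations (17), (18), (20) and (21)."""
    a = k * i
    terms = [f"M1({a},{k})", f"S({a + b - 1})^-1", "B", f"S({a + b - 1})"]
    if b % 2 == 0:
        terms += v_terms(b, a)
        big_m = b // 2 - 2          # upper limit M in equation (20)
    else:
        big_m = (b - 1) // 2 - 1    # upper limit M in equation (21)
    for m in range(big_m + 1):      # empty for b = 1 and b = 2
        terms += g_terms(b, a, m, big_m)
    return terms

# Example: one stage of a radix r^4 algorithm (b = k = 4, first stage i = 0).
print(" . ".join(h_terms(4, 4, 0)))
```

Since a matrix product acts on the data from right to left, the rightmost butterfly (with the smallest shuffle index) is the first one the samples meet, and **M1** is the last operation of the stage.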

When *N* = *r*<sup>*k*·*n<sub>k</sub>*+*l*</sup> with *l* > 0, (14) means that, after *n<sub>k</sub>* stages with *r<sup>k</sup>* structure, an additional processing stage with the structure of an *r<sup>l</sup>* algorithm is required.

The above matricial representation is general in terms of *N*, *r* and *k*. The Decimation In Time (DIT) version of the architecture can be easily obtained by transposing the expressions.
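To make the role of *n<sub>k</sub>* and *l* concrete, the following sketch splits log<sub>*r*</sub> *N* as *k*·*n<sub>k</sub>* + *l* and lists the stages **H**<sup>(*b*,*k*,*i*)</sup> in the order in which the data traverse them, following equations (14) and (15). The function names and the example sizes are assumptions for illustration; *N* is assumed to be an exact power of *r*.

```python
def integer_log(n, r):
    """Return L with r**L == n; raise ValueError if n is not a power of r."""
    exponent = 0
    while n > 1:
        if n % r:
            raise ValueError("N must be a power of r")
        n //= r
        exponent += 1
    return exponent

def stage_plan(n, r, k):
    """Split log_r(N) = k*n_k + l and list the stages as (b, k, i) triples.

    The list is in processing order: H^(k,k,0) first, then H^(k,k,1), ...,
    and finally H^(l,k,n_k) when l > 0.
    """
    total = integer_log(n, r)
    n_k, l = divmod(total, k)
    if n_k < 1:
        raise ValueError("k must not exceed log_r(N)")
    stages = [(k, k, i) for i in range(n_k)]
    if l > 0:
        stages.append((l, k, n_k))   # extra stage with r^l structure
    return n_k, l, stages

for n, r, k in ((64, 2, 3), (128, 2, 3), (64, 4, 2)):
    print(n, r, k, stage_plan(n, r, k))
```

For example, *N* = 128 with *r* = 2 and *k* = 3 gives *n<sub>k</sub>* = 2 and *l* = 1, that is, two radix 2<sup>3</sup> stages followed by one stage with radix 2 structure.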

Fig. 1. Structure of typical stages of *r<sup>k</sup>* pipeline SDF architectures

Therefore, the overall amount of memory words is fixed by the number of points of the FFT. *N* − 1 is the minimum amount of memory reported in the literature for a pipeline FFT architecture. Modifying the values of *r* or *k* does not increase the amount of memory words used for the reordering operators.

Increasing the value of *r* for fixed values of *N* and *k* will reduce the number of butterflies and the overall number of complex multipliers. However, the butterflies become more complex. Regarding the number of twiddle factors, the number of twiddle factors due to **M1** is reduced, while the number of twiddle factors due to **M2** and **M3** increases. For small values of *k*, there is a reduction of the overall number of twiddle factors; while for large values of *k*, there is an increase in the overall number of twiddle factors. Thus, the benefit of increasing the value of *r* depends on the value of *k*.

If the values of *N* and *r* are fixed, the benefits of increasing *k* can be studied. The overall number of butterflies is log<sub>*r*</sub> *N*, and thus, increasing *k* does not increase either the number of butterflies or their area. Increasing the value of *k* reduces the number of complex multipliers due to **M1**. Once **M2** and **M3** appear in an architecture, the number of complex multipliers due to each one remains approximately constant. Thus, an important benefit of increasing the value of *k* is that it reduces the overall number of complex multipliers. When *k* increases, the number of twiddle factors due to **M1** is reduced. For ⌈(log<sub>*r*</sub> *N*)/2⌉ ≤ *k* < log<sub>*r*</sub> *N*, one single **M1** with *N* twiddle factors appears. Finally, when *k* = log<sub>*r*</sub> *N*, **M1** = **I**<sub>*N*</sub> and no hardware is needed to implement any **M1**. The number of twiddle factors due to **M2** remains constant with *k*. The number of twiddle factors due to **M3** increases with *k*. Adding the contributions of **M1**, **M2** and **M3**, the overall number of twiddle factors first reduces with *k* and then it starts to increase. It can be concluded that an optimum value of *k* exists for each value of *N* and *r*. Usually this optimum value is near *k* = ⌈(log<sub>*r*</sub> *N*)/2⌉.

From the general expressions, the designer has to look for those values of *r* and *k* that result in an optimum single-path FFT pipeline architecture for a given application. This search can be easily performed thanks to the close link between the proposed matricial representation and implementation. No other notation allows such an exploration within the pipeline architectures.

#### **2.4 Mapping to pipeline architectures**

The structure of typical stages of *r<sup>k</sup>* pipeline SDF architectures is depicted in Figure 1. It can be observed in the first row of Figure 1 that the output of one stage is connected to the input of the next stage. Within each stage, there is a sequence of hardware processing elements which perform the computations defined by the operators presented in Section 2.2, as illustrated in the second and third rows of Figure 1. The hardware required to implement the computations demanded by each operator is discussed in (Cortés et al., 2009). It is important to note that the shuffling operators surrounding the butterfly operators can be merged into the single device with feedback typical of the SDF architecture. To illustrate this, the implementation of radix 2<sup>*k*</sup> algorithms is presented in the fourth row of Figure 1.

Each stage **H**<sup>(*k*,*k*,*i*)</sup> consists of *k* butterfly operators **B**, with their corresponding shuffling and unshuffling terms **S** and (**S**)<sup>−1</sup>. The hardware that implements the arithmetic of the butterfly operators **B** is the same in all the stages of the FFT processor. The implementation of the butterfly depends on the value of *r*. The length of each of the delay lines used for the shuffling and unshuffling corresponding to the first butterfly of the first stage is *N*/*r*. The length of each of these delay lines is reduced by a factor of *r* from one butterfly to the next one, as shown in the fourth row of Figure 1. The number of delay commutator structures used for the shuffling and unshuffling around a butterfly unit is the same as the number of butterfly units.
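The delay line lengths described above can be tabulated directly. The sketch below lists the length attached to each butterfly along the pipeline and, under the additional assumption that every radix-*r* butterfly uses *r* − 1 delay lines of that length (an assumption of this example, not a statement taken from the text), sums the memory words. The function names are hypothetical and *N* is assumed to be a power of *r*.

```python
def delay_line_lengths(n, r):
    """Delay line length for each butterfly: N/r, N/r^2, ..., 1."""
    lengths = []
    length = n // r
    while length >= 1:
        lengths.append(length)
        length //= r
    return lengths

def total_memory_words(n, r, lines_per_butterfly=None):
    """Total delay memory, assuming r - 1 lines per butterfly by default."""
    if lines_per_butterfly is None:
        lines_per_butterfly = r - 1   # assumption, see the lead-in paragraph
    return lines_per_butterfly * sum(delay_line_lengths(n, r))

for n, r in ((64, 2), (64, 4), (4096, 4)):
    print(n, r, delay_line_lengths(n, r), total_memory_words(n, r))
```

With that assumption the total is *N* − 1 words for any *r*, which agrees with the *N* − 1 minimum memory quoted in this chapter for pipeline FFT architectures and does not depend on *k*.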

Operators **M1**, **M2** and **M3** mainly translate to complex multipliers (C.M.) and Look-Up Tables (LUTs) to store the twiddle factors (T.F.). A stage of the form **H**(*k*,*k*,*i*) has one twiddle factor multiplier operator **M1**. The number of twiddle factors stored in the LUT used to implement **M1** is *N* for the first stage and it reduces by a factor of *r<sup>k</sup>* from one stage to the next. When *N* = *r*<sup>*kn<sub>k</sub>*</sup>, no complex multiplier is needed to implement the twiddle factor multiplications given by **M1**(*k*(*n<sub>k</sub>*−1),*k*) of the last stage **H**(*k*,*k*,*n<sub>k</sub>*−1), because **M1**(*k*(*n<sub>k</sub>*−1),*k*) = **I**<sub>*N*</sub>. The final processing given by terms of the form **H**(*l*,*k*,*n<sub>k</sub>*) when *N* = *r*<sup>*kn<sub>k</sub>*+*l*</sup>, with *l* ≠ 0, does not actually have an operator **M1**; i.e., **M1**(*kn<sub>k</sub>*,*k*) = **I**<sub>*N*</sub>.

For *k* > 1, twiddle factor multiplier operators **M2** appear. A stage **H**(*b*,*k*,*i*) has *b*/2 operators **M2** when *b* is even and (*b* − 1)/2 when *b* is odd. The number of twiddle factors stored in the LUT used to implement **M2** is always *r*<sup>2</sup>.

For *k* > 2, twiddle factor multiplier operators **M3** appear. A stage **H**(*b*,*k*,*i*) has *b*/2 − 1 operators **M3** when *b* is even and (*b* − 1)/2 when *b* is odd. The number of twiddle factors stored in the LUT used for the implementation of the first **M3** within a stage **H**(*b*,*k*,*i*) is *r<sup>b</sup>* and it reduces by a factor of *r*<sup>2</sup> from one **M3** to the next one within the same stage.
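The counting rules of this section can be tabulated stage by stage. The following Python sketch is only an illustration: the names `ilog` and `stage_operators` and the dictionary layout are hypothetical, and the code does nothing more than encode the rules stated above (one **M1** per stage except where it reduces to the identity, *b*/2 or (*b* − 1)/2 operators **M2**, and *b*/2 − 1 or (*b* − 1)/2 operators **M3**).

```python
def ilog(n, r):
    """Return m such that r**m == n, or raise if n is not a power of r."""
    m, p = 0, 1
    while p < n:
        p *= r
        m += 1
    if p != n:
        raise ValueError("N must be a power of r")
    return m


def stage_operators(N, r, k):
    """List, for every stage H(b,k,i), the butterfly and twiddle operator counts."""
    m = ilog(N, r)                 # total number of radix-r butterflies
    n_k, l = divmod(m, k)          # N = r**(k*n_k + l)
    widths = [k] * n_k + ([l] if l else [])
    stages = []
    for i, b in enumerate(widths):
        last = (i == len(widths) - 1)
        stages.append({
            "stage": i,
            "butterflies": b,
            # M1 is the identity in the last stage, so no multiplier is needed
            "M1": 0 if last else 1,
            "M1_lut": 0 if last else N // r ** (k * i),
            "M2": b // 2,          # b/2 (b even) or (b-1)/2 (b odd)
            "M3": (b - 1) // 2,    # b/2 - 1 (b even) or (b-1)/2 (b odd)
        })
    return stages


for stage in stage_operators(64, 2, 3):
    print(stage)
```

For *N* = 64, *r* = 2 and *k* = 3 the sketch lists two stages of three butterflies each, with a single **M1** multiplier between them.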

### **2.5 Hardware resources of** *r<sup>k</sup>* **pipeline architectures**

In (Cortés et al., 2009), the complexity of the *r<sup>k</sup>* algorithms in terms of area was analyzed. In this section, some of their conclusions are summarized.

For a given *N*, the designer can vary the parameters *r* and *k* to achieve different implementations. These parameters influence the reordering operators and the arithmetic operators.

The overall amount of memory words used for the delay lines due to the reordering operators depends neither on *r* nor *k*. In an SDF architecture, the amount of memory is *N* − 1 words.


Fig. 1. Structure of typical stages of *r<sup>k</sup>* pipeline SDF architectures

Therefore, the overall amount of memory words is fixed by the number of points of the FFT. *N* − 1 is the minimum amount of memory reported in the literature for a pipeline FFT architecture. Modifying the values of *r* or *k* does not increase the amount of memory words used for the reordering operators.
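As a quick check of the *N* − 1 figure, the delay-line lengths can be summed over all butterflies. This sketch assumes that each radix-*r* SDF butterfly uses *r* − 1 delay lines of equal length, which the text does not state explicitly, and that *N* is a power of *r*; the function name `sdf_memory_words` is illustrative.

```python
def sdf_memory_words(N, r):
    """Sum the delay-line words of a radix-r SDF pipeline for an N-point FFT.

    Assumption: every butterfly has r - 1 delay lines; their length is N/r
    for the first butterfly and shrinks by a factor of r at each butterfly.
    """
    total = 0
    length = N // r
    while length >= 1:
        total += (r - 1) * length
        length //= r
    return total


for N, r in [(64, 2), (64, 4), (4096, 2), (4096, 4)]:
    print(N, r, sdf_memory_words(N, r))   # each total equals N - 1
```

Under this assumption the total is a geometric sum that collapses to *N* − 1 for any radix, which matches the statement that the memory does not depend on *r* or *k*.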

Increasing the value of *r* for fixed values of *N* and *k* reduces the number of butterflies and the overall number of complex multipliers. However, the butterflies become more complex. Regarding the twiddle factors, the number due to **M1** is reduced, while the number due to **M2** and **M3** increases. For small values of *k*, the overall number of twiddle factors decreases, while for large values of *k* it increases. Thus, the benefit of increasing the value of *r* depends on the value of *k*.

If the values of *N* and *r* are fixed, the benefits of increasing *k* can be studied. The overall number of butterflies is log<sub>*r*</sub> *N*, and thus, increasing *k* does not increase either the number of butterflies or their area. Increasing the value of *k* reduces the number of complex multipliers due to **M1**. Once **M2** and **M3** appear in an architecture, the number of complex multipliers due to each one remains approximately constant. Thus, an important benefit of increasing the value of *k* is that it reduces the overall number of complex multipliers. When *k* increases, the number of twiddle factors due to **M1** is reduced. For ⌈(log<sub>*r*</sub> *N*)/2⌉ ≤ *k* < log<sub>*r*</sub> *N*, one single **M1** with *N* twiddle factors appears. Finally, when *k* = log<sub>*r*</sub> *N*, **M1** = **I**<sub>*N*</sub> and no hardware is needed to implement any **M1**. The number of twiddle factors due to **M2** remains constant with *k*. The number of twiddle factors due to **M3** increases with *k*. Adding the contributions of **M1**, **M2** and **M3**, the overall number of twiddle factors first decreases with *k* and then starts to increase. It can be concluded that an optimum value of *k* exists for each value of *N* and *r*. Usually this optimum value is near *k* = ⌈(log<sub>*r*</sub> *N*)/2⌉.
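The trade-off described above can be explored numerically. The sketch below is a simplified tally, not the area model of (Cortés et al., 2009): it counts raw LUT entries using only the rules of Section 2.4 and ignores trivial multipliers and any symmetry-based LUT reduction. The function name `twiddle_factor_count` is hypothetical.

```python
def twiddle_factor_count(N, r, k):
    """Total LUT entries for M1, M2 and M3 of an r^k SDF pipeline."""
    m = 0
    while r ** m < N:
        m += 1
    assert r ** m == N, "N must be a power of r"
    n_k, l = divmod(m, k)
    widths = [k] * n_k + ([l] if l else [])

    total = 0
    for i, b in enumerate(widths):
        if i < len(widths) - 1:          # M1 is the identity in the last stage
            total += N // r ** (k * i)
        total += (b // 2) * r ** 2       # every M2 stores r^2 twiddle factors
        size = r ** b                    # the first M3 stores r^b twiddle factors
        for _ in range((b - 1) // 2):
            total += size
            size //= r ** 2              # each further M3 is r^2 times smaller
    return total


N, r = 64, 2
counts = {k: twiddle_factor_count(N, r, k) for k in range(1, 7)}
best_k = min(counts, key=counts.get)
print(counts, best_k)
```

With this simplified tally and *N* = 64, *r* = 2, the total first falls and then rises with *k*, and the smallest value is obtained for *k* = 3.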

From the general expressions, the designer has to look for those values of *r* and *k* that result in an optimum single-path FFT pipeline architecture for a given application. This search can be easily performed thanks to the close link between the proposed matricial representation and implementation. No other notation allows such an exploration within the pipeline architectures.


### **3. Case study: WLAN IEEE 802.11a**

In this section, a case study is analyzed by applying the proposed design space exploration in order to achieve the most efficient algorithm/architecture for an OFDM system. This design space exploration goes beyond the hardware complexity analysis of (Cortés et al., 2009): an implementation level analysis is also carried out that takes into account the power consumption, which is very important for wireless devices.

The main objective of this design space exploration is to select the most efficient radix *r<sup>k</sup>* pipeline SDF FFT architecture for the OFDM application. An FFT for the WLAN IEEE 802.11a standard has been selected as the case study. In this case, the length of the FFT/IFFT is 64 points. According to (Cortés et al., 2009), the optimum value of *k* is near 3. Therefore, the following analysis concentrates on the radix 2<sup>2</sup>, radix 2<sup>3</sup> and radix 2<sup>4</sup> algorithms in order to perform the implementation level analysis and to study the silicon area and power consumption results.

The proposed design space exploration can be divided into four steps. First, the OFDM system described by the standard is analyzed to extract the specifications. Once the OFDM specifications are known, the designer focuses on the system and on the FFT/IFFT specifications to determine the parameters needed for the FFT/IFFT design. The next step is the implementation level analysis of different FFT/IFFT algorithms and architectures in order to select the most efficient core for the WLAN system according to a given criterion. This step is composed of two main analyses: the Error Vector Magnitude (EVM) analysis in the transmitter and the Carrier-To-Noise Ratio (CNR) analysis in the receiver. After these analyses, the data bitwidth (*dbw*) and the twiddle factors bitwidth (*tbw*) are chosen so as not to degrade the system performance. Finally, the layout of the most efficient FFT/IFFT core for WLAN 802.11a in terms of area and power consumption is shown.

### **3.1 Study of the IEEE 802.11a standard**

Table 2 shows the parameters specified in the IEEE 802.11a standard for a WLAN system (IEE, 1999). *Nc* = 64 sub-carriers are used. *Np* = 4 sub-carriers are used as pilot tones to make the coherent detection more robust against frequency offsets and phase noise. These pilot tones are always placed on sub-carriers -21, -7, 7 and 21 and are modulated in Binary Phase-Shift Keying (BPSK). *Nd* = 48 tones are employed as data sub-carriers and the remaining sub-carriers are set to zero. The length of the guard interval must be longer than the delay spread of the channel.
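The sub-carrier allocation can be written down as index sets. In this sketch the pilot positions come from the text, while the occupied range of -26 to 26 (excluding 0) is taken from the IEEE 802.11a standard and is not stated in this chapter; the names `PILOTS`, `USED`, `DATA` and `map_to_fft_bins` are illustrative.

```python
PILOTS = (-21, -7, 7, 21)                      # pilot positions from the text
USED = [k for k in range(-26, 27) if k != 0]   # assumed occupied range
DATA = [k for k in USED if k not in PILOTS]    # remaining data sub-carriers


def map_to_fft_bins(values_by_subcarrier, n_fft=64):
    """Place sub-carrier values (index -26..26) into an n_fft-long IFFT input.

    Negative sub-carrier indices wrap to the upper bins; unused bins stay zero.
    """
    bins = [0j] * n_fft
    for k, value in values_by_subcarrier.items():
        bins[k % n_fft] = value
    return bins


print(len(DATA), len(PILOTS), 64 - len(USED))   # data, pilot and null sub-carriers
```

The wrap-around of negative indices follows the usual FFT bin ordering, so sub-carrier -21, for example, lands in bin 43 of the 64-point IFFT input.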


| *Nc* | *Nd* | *Np* | *Ngi* | *R* (Mbps) | *BW* (MHz) | *tsym* (*μ*s) | EVM (dB) | *Nf* | *Lp* | *H*/*No* (dB) |
|---|---|---|---|---|---|---|---|---|---|---|
| 64 | 48 | 4 | 16 | 54 | 20 | 4 | -25 | 20 | 16 | 40 |

Table 2. OFDM Parameters for a WLAN IEEE 802.11a system

Considering an indoor environment, the necessary guard interval in a WLAN system is 0.8 *μ*s. Therefore, the last *Ngi* = 16 data must be copied at the beginning of the OFDM symbol. The maximum data rate is *R* = 54 Mbps. Additionally, the bandwidth of the system is *BW* = 20 MHz. The standard also determines that the OFDM symbol period is *tsym* = 4 *μ*s. In IEEE 802.11a, data can be modulated in BPSK, Quadrature Phase-Shift Keying (QPSK), 16-Quadrature Amplitude Modulation (QAM) or 64-QAM.

The EVM is defined as:

$$\text{EVM} = \sqrt{(I - I\_o)^2 + (Q - Q\_o)^2},\tag{23}$$

where *I* + *jQ* is the measured symbol and *Io* + *jQo* is the transmitted symbol. Figure 2 illustrates the computation of the EVM for a 16-QAM constellation.

Fig. 2. EVM in the 16-QAM constellation

The IEEE 802.11a standard specifies a mean value *E*<sub>rms</sub> which has to be measured after transmitting *Nf* = 20 frames of *Lp* = 16 OFDM symbols with an average power *Po*,

$$E\_{\rm rms} = \frac{1}{20} \sum\_{i=1}^{20} \sqrt{\frac{\sum\_{j=1}^{16} \left[ \sum\_{k=1}^{(48+4)} \text{EVM}(i, j, k)^2 \right]}{(48+4) \cdot 16 \cdot P\_o}},\tag{24}$$

where EVM(*i*, *j*, *k*) is the magnitude of the error vector for the *k*-th sub-carrier of the *j*-th OFDM symbol of the *i*-th frame. The EVM is the quality constraint that must not be degraded in the transmitter. The EVM value in Table 2 is the most stringent EVM requirement in a WLAN system, which applies when the system transmits data modulated in 64-QAM with the maximum data rate of 54 Mbps.
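Equations (23) and (24) can be evaluated directly on simulated symbols. The sketch below is a minimal illustration with hypothetical names (`error_vector_magnitude`, `rms_error`, `p_o`); it assumes that the error magnitudes are squared before they are summed, as in the IEEE 802.11a definition of the constellation error, and that the result is converted to dB with 20·log10.

```python
import math


def error_vector_magnitude(measured, ideal):
    """EVM of one symbol, Eq. (23): distance between measured and ideal points."""
    return abs(measured - ideal)


def rms_error(frames, ideal_frames, p_o=1.0):
    """Mean RMS error over frames, following Eq. (24).

    frames[i][j][k] is the measured complex symbol on sub-carrier k of OFDM
    symbol j in frame i; ideal_frames has the same shape.
    Assumption: the error magnitudes are squared before they are summed.
    """
    acc = 0.0
    for frame, ideal_frame in zip(frames, ideal_frames):
        err = 0.0
        count = 0
        for sym, ideal_sym in zip(frame, ideal_frame):
            for z, z0 in zip(sym, ideal_sym):
                err += error_vector_magnitude(z, z0) ** 2
                count += 1
        acc += math.sqrt(err / (count * p_o))
    return acc / len(frames)


# Toy data: 2 frames x 3 OFDM symbols x 4 sub-carriers, constant error of 0.1
ideal = [[[1 + 0j] * 4 for _ in range(3)] for _ in range(2)]
measured = [[[z + 0.1 for z in sym] for sym in frame] for frame in ideal]
e_rms = rms_error(measured, ideal)
print(20 * math.log10(e_rms))   # about -20 dB for a constant 0.1 error
```

In a full measurement, `frames` would hold 20 frames of 16 OFDM symbols with 52 occupied sub-carriers each, so that `count` equals (48 + 4) · 16 per frame.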

The IEEE 802.11a standard specifies the spectral mask of the output signal in the transmitter. Figure 3 presents the spectral mask of the IEEE 802.11a transmitter output, whose height is *H*/*No* = 40 dB. The quality constraint selected for the receiver is the non-degradation of the CNR when the input signal to the receiver has the spectral mask specified by the standard. This is a first approach that produces an overconstrained core. The EVM and *H*/*No* specifications are used to determine the allowed quantization error.

### **3.2 System analysis and FFT/IFFT specifications**

The goal of this analysis is to determine the parameters of Table 3. Table 3 presents five important system specifications which significantly influence the design of the FFT/IFFT: the length of the FFT/IFFT *N*, the system clock frequency *fclk*, the value of *Ktx* in the transmitter, and the value of *KAGC* and the *CNR* in the receiver. As the OFDM system supports 64 sub-carriers, the FFT core must be designed to perform a 64-point FFT; i.e., *N* = 64. As shown in Table 2, the OFDM symbol period (*tsym*) in a WLAN system is equal to 4 *μ*s. Following the analysis of (Velez, 2005), *fclk* = 60 MHz is selected, which is higher than the maximum data rate *R* and a multiple of the *BW*.


Fig. 3. Spectral mask of a transmitter output


| Parameter | Description |
|---|---|
| *N* | Length of the FFT/IFFT |
| *fclk* | System clock frequency |
| *Ktx* | Maximum gain to improve the DAC performance |
| *KAGC* | Maximum gain to improve the FFT performance |
| *CNRmax* | Maximum Carrier-to-Noise Ratio |

Table 3. System specifications

#### **3.2.1** *Ktx* **in the transmitter**

In order to select *Ktx*, an analysis of the IFFT output data must be carried out for each OFDM system. For this analysis, the modulation where the data reach the highest values in the I and Q data must be used.

In this analysis, *Ktx* is estimated by means of simulations. The transmission of the number of frames determined by the standard is simulated for different values of *Ktx* to calculate the EVM. Then, the largest *Ktx* that does not degrade the EVM can be selected. This way, the whole dynamic range of the Digital-to-Analog Converter (DAC) is exploited.

Figure 4 shows the system model used to analyze the EVM degradation with the increase of the value of *Ktx*. The system model is composed of a floating-point IFFT in the transmitter and a floating-point FFT in the receiver. The effect of the clipping in the DAC is emulated by a *limiter* located after the multiplication by *Ktx*. This *limiter* is responsible for saturating the amplified IFFT output according to the integer bits of the data representation. Two integer bits are considered for both the input and output data in the IFFT, since the maximum value of the input data is 1.08 when the modulation is 64-QAM. In order to improve the dynamic range of the IFFT output, the data are amplified by a factor *Ktx*. A factor *Ktx*, which is a power of two, is preferred to simplify the hardware implementation. In (Velez, 2005), the overflow probability of the different types of modulations in WLAN is analyzed carrying out Monte Carlo simulations with 3 ≤ *Ktx* ≤ 10. It is concluded that BPSK modulation is the most sensitive to the clipping at the DAC. (Velez, 2005) also concluded that the most suitable value for *Ktx* is *Ktx* = 4 so as not to degrade the system performance. Additionally, *Ktx* = 4 is implemented as a simple shift in a real implementation.

Fig. 4. System model to study the effect of *Ktx* in EVM
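The amplify-and-limit step of the system model in Figure 4 can be sketched as follows. The saturation bounds are an assumption: the two integer bits are taken to include the sign, so each of the I and Q components is clipped to [-2, 2]. The names `limiter` and `amplify_and_limit` are hypothetical.

```python
def _clip(v, bound):
    """Clamp a real value to the interval [-bound, bound]."""
    return max(-bound, min(bound, v))


def limiter(x, int_bits=2):
    """Saturate the real and imaginary parts of the complex sample x.

    Assumption: int_bits includes the sign bit, so the representable range
    is taken as [-2**(int_bits - 1), 2**(int_bits - 1)].
    """
    bound = float(2 ** (int_bits - 1))
    return complex(_clip(x.real, bound), _clip(x.imag, bound))


def amplify_and_limit(samples, k_tx=4, int_bits=2):
    """Scale IFFT output samples by K_tx and saturate them."""
    return [limiter(k_tx * s, int_bits) for s in samples]


print(amplify_and_limit([0.1 + 0.6j, -0.7 - 0.2j]))
```

Samples whose amplified I or Q component exceeds the bound are clipped, which is the mechanism that degrades the EVM for large values of *Ktx*.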

Figure 5 shows the effect of the factor *Ktx* on the EVM in the transmitter, that is, how the EVM is degraded by the clipping produced in the IFFT. If a factor *Ktx* = 8 were selected, the EVM value would not meet the standard specification. It can be observed that *Ktx* = 4 is a conservative decision, since the EVM specified by the standard for a BPSK modulation with a data rate of 9 Mbps is -8 dB, and the EVM obtained with *Ktx* = 4 is -61.95 dB.

Fig. 5. Analysis of the effect of the *Ktx* in EVM

#### **3.2.2** *KAGC* **and** *CNRmax* **in the receiver**

An Automatic Gain Control (AGC) module is a block found in many electronic devices. The AGC is commonly used to dynamically adjust the gain of the receiver amplifier to keep the received signal at the desired power level.

In this analysis, the effect of the AGC is modeled as a gain *KAGC*. For the analysis, the value of *KAGC* that does not degrade the CNR is selected. This value is determined by means of simulations.

Figure 6 presents the system model used to determine this factor *KAGC*. This model is composed of a floating-point IFFT at the transmitter and a floating-point FFT at the receiver. The channel is modeled as an Additive White Gaussian Noise (AWGN) channel.






Fig. 6. System model to estimate the CNR for different values of *KAGC*

Before applying the AWGN channel, the power of the output signal of the IFFT, *Xi*, is normalized to 1 by dividing the signal by the factor *λ*, which is defined as

$$
\lambda = \sqrt{\frac{N\_d + N\_p}{N\_c^2}}.\tag{25}
$$

In Figure 6, *Y* is the signal after applying the AWGN channel. At the receiver, this signal is multiplied by the gain *KAGC* and a *limiter* is applied. This *limiter* is responsible for limiting the amplified signal according to the number of integer bits of the data representation. Once the *limiter* is applied, *Yq* is the input of the floating-point FFT and *yq* is the output of the floating-point FFT. *yi* is the output of the floating-point FFT without noise and without applying the *limiter*. For the simulations, it is necessary to determine the variance of the noise to be applied to the transmitted signal so that the signal meets the spectral mask with the height *H*/*No* given by the standard. The Signal-To-Noise Ratio (*SNR*) of the transmitter output can be calculated from the height of the spectral mask *H*/*No* given by the standard. The power of the output signal of the transmitter can be written as,

$$P\_o = \frac{(N\_d + N\_p) \cdot BW \cdot (H - N\_o)}{N\_c},\tag{26}$$

where it is assumed that the power of the data sub-carriers is equal to the power of the pilot sub-carriers. The noise power is given by:

$$P\_n = N\_o \cdot BW.\tag{27}$$

Therefore,

$$\text{SNR} = \frac{P\_o}{P\_n} = \left(\frac{H}{N\_o} - 1\right) \cdot \frac{N\_d + N\_p}{N\_c}.\tag{28}$$

Then, the variance *σ*<sup>2</sup> of the noise in the channel is found to be,

$$
\sigma^2 = \frac{1}{\left(\frac{H}{N\_o} - 1\right) \cdot \frac{N\_d + N\_p}{N\_c}}.\tag{29}
$$
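Equations (25), (28) and (29) can be evaluated with the values of Table 2. This short sketch assumes that *H*/*No* = 40 dB is a power ratio, so it is converted to linear scale with 10^(dB/10); the variable names are illustrative.

```python
import math

N_c, N_d, N_p = 64, 48, 4          # values from Table 2
h_over_no_db = 40.0                # spectral mask height H/No in dB

lam = math.sqrt((N_d + N_p) / N_c ** 2)          # Eq. (25)
h_over_no = 10 ** (h_over_no_db / 10)            # dB -> linear power ratio
snr = (h_over_no - 1) * (N_d + N_p) / N_c        # Eq. (28)
sigma2 = 1 / snr                                 # Eq. (29), unit signal power

print(f"lambda = {lam:.4f}")
print(f"SNR    = {10 * math.log10(snr):.2f} dB")
print(f"sigma2 = {sigma2:.3e}")
```

With these inputs the SNR evaluates to about 39.1 dB, and the noise variance is simply its reciprocal because the transmitted signal has been normalized to unit power.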

In order to select the value of the factor *KAGC*, the CNR is used as a figure of merit. CNR is defined as,

$$\text{CNR} = \frac{\hat{P}\_i}{\hat{P}\_n},\tag{30}$$


where *P̂i* is the power of *yi* and *P̂n* is the power of the difference between *yi* and *yq*.
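The definition above can be sketched in a few lines. This is only an illustration of the ratio itself, not the simulation chain of Figure 6: the helper names and the toy sample values are hypothetical, and "power" is taken to be the mean squared magnitude of the samples.

```python
import math


def mean_power(samples):
    """Average power (mean squared magnitude) of a sequence of complex samples."""
    return sum(abs(s) ** 2 for s in samples) / len(samples)


def cnr_db(y_i, y_q):
    """CNR in dB: power of y_i over the power of the difference y_i - y_q."""
    p_i = mean_power(y_i)
    p_n = mean_power([a - b for a, b in zip(y_i, y_q)])
    return 10.0 * math.log10(p_i / p_n)


# Toy data (hypothetical): a unit-power reference and a copy with a small fixed error.
y_i = [1 + 0j, 0 + 1j, -1 + 0j, 0 - 1j]
y_q = [s + 0.01 for s in y_i]
print(f"CNR = {cnr_db(y_i, y_q):.1f} dB")
```

In the system model, `y_i` would be the noiseless, unlimited FFT output and `y_q` the output obtained after the AWGN channel, the gain and the limiter.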

The gain *KAGC* is selected so that the *CNR* is not degraded. The factor *KAGC* fixes the power of the signal *Yq*, denoted *PYq*. For this reason, the figures that analyze the effect of *KAGC* plot the CNR versus *PYq*, which is a more representative value.

The CNR is obtained by means of Monte Carlo simulations. This method allows the designer to estimate a measure of the performance of a communication system and the quality of the estimation itself (Sevillano, 2004). For a confidence level of 0.95, the looseness of the estimation is defined as,

$$\gamma = \frac{2 \cdot \sqrt{\widehat{\text{var}}(\widehat{\text{CNR}})}}{\widehat{\text{CNR}}} \tag{31}$$

Therefore, it can be said that the probability of *CNR* belonging to the interval [(1 − *γ*)*ĈNR*, (1 + *γ*)*ĈNR*], where *ĈNR* is the estimated CNR, is 0.95. The Monte Carlo simulations are stopped when *γ* < 10<sup>−2</sup>.
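The stopping rule can be sketched as follows. This is a minimal illustration rather than the simulator used for the results: `run_trial` is a hypothetical stand-in that returns one noisy CNR measurement (in linear units), and the variance in (31) is assumed to be the variance of the sample mean, that is, the sample variance divided by the number of trials.

```python
import math
import random


def looseness(samples):
    """Return (mean, gamma) with gamma = 2*sqrt(var(mean)) / mean, as in (31)."""
    n = len(samples)
    mean = sum(samples) / n
    # Unbiased sample variance of the individual measurements.
    var = sum((s - mean) ** 2 for s in samples) / (n - 1)
    # Assumption: the variance of the estimator is the variance of the sample mean.
    var_of_mean = var / n
    return mean, 2.0 * math.sqrt(var_of_mean) / mean


def monte_carlo(run_trial, gamma_max=1e-2, min_trials=10, max_trials=100_000):
    """Accumulate trials until the looseness drops below gamma_max."""
    samples = [run_trial() for _ in range(min_trials)]
    mean, gamma = looseness(samples)
    while gamma >= gamma_max and len(samples) < max_trials:
        samples.append(run_trial())
        mean, gamma = looseness(samples)
    return mean, gamma, len(samples)


# Hypothetical trial: a linear CNR of about 8000 with 10 % relative scatter.
rng = random.Random(0)
cnr_hat, gamma, trials = monte_carlo(lambda: rng.gauss(8000.0, 800.0))
print(f"CNR estimate {cnr_hat:.0f}, gamma {gamma:.4f}, trials {trials}")
```

The `max_trials` cap is a safeguard added for the sketch; the text only specifies the *γ* threshold.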

It is assumed that the signal arrives with the maximum quality *H*/*No* = 40 dB. Therefore, applying (28), the SNR of the transmitted signal is 39.09 dB.
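As a quick numerical check of this figure, (28) and (29) can be evaluated directly. The sketch assumes the IEEE 802.11a sub-carrier allocation of 48 data and 4 pilot sub-carriers out of 64; these values are not stated in this section, and the function names are illustrative.

```python
import math

# Assumed IEEE 802.11a sub-carrier allocation (not stated in this section).
N_D, N_P, N_C = 48, 4, 64


def snr_from_mask(h_over_no_db, n_d=N_D, n_p=N_P, n_c=N_C):
    """Linear SNR of the transmitter output, equation (28)."""
    h_over_no = 10.0 ** (h_over_no_db / 10.0)
    return (h_over_no - 1.0) * (n_d + n_p) / n_c


def noise_variance(h_over_no_db, **kwargs):
    """Channel noise variance for a unit-power signal, equation (29)."""
    return 1.0 / snr_from_mask(h_over_no_db, **kwargs)


snr = snr_from_mask(40.0)
print(f"SNR = {10.0 * math.log10(snr):.2f} dB")
print(f"sigma^2 = {noise_variance(40.0):.3e}")
```

Under these assumptions the result agrees with the value quoted above to within rounding of the last digit.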

Figure 7 shows (*CNR*)*dB* versus the power of the input signal of the FFT, *PYq*. The CNR is not degraded for *PYq* < −4 dB, which corresponds to *KAGC* = 6 in the simulations. For this value of *KAGC*, (*CNRmax*)*dB* is 39.1 dB.

Fig. 7. (*CNR*)*dB* vs the power of the input signal in the FFT (*PYq*)

#### **3.2.3 Summary of results**

In this step, the length of the FFT, *N*, the system clock frequency, *fclk*, the gain factor in the transmitter, *Ktx*, the gain factor in the AGC, *KAGC*, and *CNRmax* have been estimated. Table 4 presents the necessary OFDM parameters for the next step.

| *Ngi* | *BW* (MHz) | *tsym* (*μ*s) | EVM (dB) | *Nf* | *Lp* | *fclk* (MHz) | *N* | *Ktx* | *KAGC* | *CNRmax* (dB) |
|---|---|---|---|---|---|---|---|---|---|---|
| 16 | 20 | 4 | -25 | 20 | 16 | 60 | 64 | 4 | 6 | 39.1 |

Table 4. Specifications for the architecture selection for WLAN systems

#### **3.3 Selection of algorithm/architectures**

A high throughput FFT core is needed to fulfil the required specifications. Pipeline architectures are well suited to achieve small silicon area, high throughput, short processing time and reduced power consumption. Table 5 presents the hardware complexity of the candidate pipeline-SDF architectures. The pipeline-SDF radix 2<sup>2</sup> DIF architecture is composed of six butterflies and two complex multipliers. The pipeline-SDF radix 2<sup>3</sup> DIF architecture is also composed of six butterflies, but one complex multiplier and two constant multipliers by one constant are used. The pipeline-SDF radix 2<sup>4</sup> DIF architecture is formed by six butterflies, one complex multiplier and one constant multiplier by two constants. Therefore, the three algorithms have the same number of multipliers taking into account normal and constant multipliers. The radix 2<sup>3</sup> and 2<sup>4</sup> DIF algorithms reduce ROM with respect to the radix 2<sup>2</sup> DIF one, but they add control logic.

| Architec. | r | Normal mults. | Constant mults. | ROM (regs.) | RAM (regs.) | adds/subs |
|---|---|---|---|---|---|---|
| pipeline–SDF | 2<sup>2</sup> | 8 | 0 | 80 | 192 | 28 |
| pipeline–SDF | 2<sup>3</sup> | 4 | 4 | 64 | 192 | 40 |
| pipeline–SDF | 2<sup>4</sup> | 4 | 4 | 64 | 192 | 35 |

Table 5. Hardware complexity of candidate FFT architectures working at 60 MHz for WLAN systems

To sum up, at this point it is not clear which is the most efficient architecture for WLAN. Therefore, it is necessary to make an implementation level analysis. The hardware complexity of the radix 2<sup>2</sup>, 2<sup>3</sup> and 2<sup>4</sup> DIF algorithms is compared to search for the optimum design that meets the system specifications. The EVM constrains the word-length of the IFFT in the transmitter. The word-length in the receiver is constrained by the CNR. If the transmitter and the receiver are implemented in the same chip, the highest word-lengths must be chosen. The figures of merit to select the algorithm and architecture are the area and the power consumption estimated for an Application-Specific Integrated Circuit (ASIC) technology. Thus, this selection process can be stated as the problem of finding the FFT/IFFT processor which minimizes the AP cost function subject to the constraints given by the specifications. The AP criterion trades off area and power consumption and can be used as a measure of the efficiency of the core. The area and power results presented in the following sections have been calculated for a TSMC 90 nm 6 ML technology with a clock frequency of 60 MHz working at 1.0 V and a temperature of 25 °C. The area results have been estimated by multiplying the cell area by a factor of 2.

### **3.3.1 EVM analysis**

In order to analyze the effect of the IFFT quantization error on the EVM during transmission, an ideal reception is considered. The system model is composed of a fixed-point IFFT at the transmitter and a floating-point FFT at the receiver as can be seen in Figure 8(a). The EVM value is employed to select the values of *dbw* and *tbw* needed in the transmitter. An EVM margin of -10 dB is used in order to select a conservative word–length. Thus, a margin is left for other sources of error, such as the error produced by the analog processing. In order to obtain the EVM results in Figures 9 and 10, the transmission of *Nf* = 20 frames of *Lp* = 16 OFDM symbols modulated in 64-QAM has been simulated. The pipeline-SDF architecture with the radix 2<sup>2</sup>, 2<sup>3</sup> and 2<sup>4</sup> DIF algorithms has been studied to find the most efficient implementation. For the radix 2<sup>3</sup> and 2<sup>4</sup> algorithms, the bitwidth of the constant multiplier is assumed to be equal to the bitwidth of the normal multiplier.

Fig. 8. System models to study the effect of quantization error on EVM and CNR: (a) EVM model, (b) CNR model

Figure 9 shows the EVM and the area results of the pipeline–SDF DIF 64–point FFT/IFFT core for different algorithms. In order to guarantee an EVM of -35 dB or better, the radix 2<sup>2</sup> algorithm requires a (*dbw*,*tbw*) of (12,8), whereas the radix 2<sup>4</sup> and 2<sup>3</sup> algorithms need (12,7). For the given EVM, the radix 2<sup>4</sup> algorithm achieves the smallest core with (*dbw*,*tbw*) = (12,7).

The EVM and the power results of the pipeline–SDF DIF 64–point FFT/IFFT core are presented in Figure 10. It can be observed that the radix 2<sup>2</sup> algorithm consumes less power for the same (*dbw*,*tbw*) than the other algorithms, whereas the radix 2<sup>3</sup> algorithm consumes more than the others.

Fig. 9. EVM and area of the pipeline–SDF DIF 64–point FFT/IFFT core

Fig. 10. EVM and power of the pipeline–SDF DIF 64–point FFT/IFFT core

Table 6 compares the area and the power consumption for the different algorithms to obtain an EVM of -35 dB or better. The power consumption in Table 6 has been normalized.

| Algorithm | dbw | tbw | Area (mm<sup>2</sup>) | *Pnorm* | AP |
|---|---|---|---|---|---|
| SDF r2<sup>2</sup> | 12 | 8 | 0.0993 | 0.0211 | 0.00210 |
| SDF r2<sup>3</sup> | 12 | 7 | 0.0998 | 0.0233 | 0.00232 |
| SDF r2<sup>4</sup> | 12 | 7 | 0.0960 | 0.0222 | 0.00214 |

Table 6. AP comparison for WLAN system with *EVM* ≤ −35 dB

(Kuo et al., 2003) proposes a normalization of the power consumption as,

$$P\_{norm} = \frac{P}{V\_{DD}^2 \cdot N \cdot f\_{clk}} \cdot 1000, \tag{32}$$

where *P* is the power consumption at a supply voltage of *VDD* and a clock frequency of *fclk*. In this chapter, the expression given by (Kuo et al., 2003) is slightly modified as follows,

$$P\_{norm} = \frac{P}{V\_{DD}^2 \cdot f\_{clk}} \cdot 10^9. \tag{33}$$

The *r*2<sup>2</sup> algorithm needs a larger bitwidth to achieve the target EVM. This extra bit in *tbw* increases the area needed. The *r*2<sup>3</sup> and *r*2<sup>4</sup> algorithms can achieve the required EVM with a smaller *tbw*. The *r*2<sup>4</sup> algorithm with *dbw* = 12 and *tbw* = 7 achieves the most area–efficient implementation that fulfills the EVM specification. Nevertheless, the power consumption of the *r*2<sup>4</sup> algorithm is higher than that of the *r*2<sup>2</sup> algorithm. In fact, the *r*2<sup>2</sup> algorithm is the most power–efficient design. In order to achieve a trade–off, the parameter *AP* = *Area* · *Pnorm* is employed, since it takes into account both the area and the power consumption and thus measures the efficiency of the design. For the WLAN transmitter, it can be observed in Table 6 that the most efficient cores are the ones using the *r*2<sup>2</sup> and *r*2<sup>4</sup> algorithms.
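The trade-off can be made concrete with the area and normalized power values listed in Table 6. The short sketch below recomputes *AP* = *Area* · *Pnorm* for the three candidates and ranks them; the dictionary layout and the function name are illustrative only. Because the table entries are rounded, a recomputed product may differ from the tabulated AP in the last digit.

```python
# (area in mm^2, normalized power) for EVM <= -35 dB, taken from Table 6.
CANDIDATES = {
    "SDF r2^2": (0.0993, 0.0211),
    "SDF r2^3": (0.0998, 0.0233),
    "SDF r2^4": (0.0960, 0.0222),
}


def ap_cost(area_mm2, p_norm):
    """AP figure of merit: product of area and normalized power."""
    return area_mm2 * p_norm


# Rank the candidates from most to least efficient (smallest AP first).
ranking = sorted(CANDIDATES, key=lambda name: ap_cost(*CANDIDATES[name]))
for name in ranking:
    area, p_norm = CANDIDATES[name]
    print(f"{name}: AP = {ap_cost(area, p_norm):.5f}")
```

The ranking places the radix 2<sup>2</sup> and radix 2<sup>4</sup> cores ahead of the radix 2<sup>3</sup> core, in line with the conclusion drawn from Table 6.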

Figures 9 and 10 plot the area of the FFT (mm<sup>2</sup>) and the power of the FFT (mW), respectively, versus the EVM of the 64–point FFT (dB) for the radix 2<sup>2</sup>, 2<sup>3</sup> and 2<sup>4</sup> algorithms; each point is labelled with its (*dbw*,*tbw*) pair.

### **3.3.2 CNR analysis**

After the EVM analysis, the *r*2<sup>3</sup> algorithm is discarded. Therefore, the CNR analysis focuses on the *r*2<sup>2</sup> and *r*2<sup>4</sup> algorithms. At this point, the area and power results of the FFT/IFFT core for different bitwidth configurations are already known. Then, the CNR analysis is used to select the *dbw* and *tbw* that fulfill the CNR specification.

In order to analyze the effect of the FFT quantization, the simulation model is formed by a floating–point IFFT at the transmitter and a fixed–point FFT at the receiver as is shown in Figure 8(b). First, only data are quantized. In this case, a figure of the CNR versus *dbw* can be used in order to select the *dbw* which does not degrade the CNR. Once *dbw* is selected, the twiddle factors are also quantized and a figure of the CNR versus *tbw* is shown in order to choose the *tbw* which does not degrade the CNR. This analysis is done for the candidates in order to determine the necessary *dbw*,*tbw* to comply with the CNR specification.

As an example, figures of (*CNR*)*dB* versus *dbw* and *tbw* are given for two algorithms. Figure 11(a) shows the (*CNR*)*dB* obtained versus the *dbw* parameter. In this case, the twiddle factors are not quantized. From the figure, a data bitwidth of *dbw* = 15 is selected to avoid degrading the (*CNR*)*dB* for both the radix 2<sup>2</sup> and radix 2<sup>4</sup> algorithms. Once the data bitwidth is chosen, the twiddle factors are quantized. Figure 11(b) presents the (*CNR*)*dB* versus *tbw* where *dbw* = 15. In this case, a twiddle factor bitwidth of *tbw* = 10 is selected for both the radix 2<sup>2</sup> and radix 2<sup>4</sup> algorithms. It can be observed that increasing *tbw* further does not improve the performance of the core.
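The effect of *tbw* on the twiddle factors alone can be sketched as follows. This is an illustration only: the number format (one sign bit plus *tbw* − 1 fractional bits, round-to-nearest with saturation) is an assumption, the function names are hypothetical, and the printed figure is the quantization error of the coefficients themselves, not the CNR of the simulation model in Figure 8(b).

```python
import cmath
import math


def quantize(x, bits):
    """Round x to a signed fixed-point value with 1 sign bit and bits-1 fractional bits."""
    scale = 1 << (bits - 1)
    q = round(x * scale)
    # Saturate to the representable two's-complement range [-1, 1 - 2^-(bits-1)].
    q = max(-scale, min(scale - 1, q))
    return q / scale


def twiddle_sqnr_db(n, tbw):
    """Signal-to-quantization-noise ratio (dB) of the n-point twiddle factors."""
    signal = noise = 0.0
    for k in range(n):
        w = cmath.exp(-2j * math.pi * k / n)
        wq = complex(quantize(w.real, tbw), quantize(w.imag, tbw))
        signal += abs(w) ** 2
        noise += abs(w - wq) ** 2
    return 10.0 * math.log10(signal / noise)


for tbw in (7, 8, 9, 10, 12):
    print(f"tbw = {tbw:2d} bits: twiddle SQNR = {twiddle_sqnr_db(64, tbw):.1f} dB")
```

Each additional bit lowers the coefficient error, but once that error is well below the channel noise, the overall CNR no longer benefits, which is the behaviour described for Figure 11(b).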

Table 7 summarizes the (*dbw*,*tbw*) needed by the FFT to comply with the CNR requirement. Comparing the bitwidths needed by the IFFT in the transmitter to comply with the EVM with the ones needed by the FFT in the receiver to comply with the CNR shows that the CNR is a much more restrictive specification. Therefore, the (*dbw*,*tbw*) selected for the FFT are the bitwidths used in the FFT/IFFT core. Table 7 presents the AP results of the FFT algorithms with the necessary bitwidths (*dbw*,*tbw*) to comply with the specifications. Taking into account the *AP*, the most efficient core for a WLAN system is the pipeline–SDF radix 2<sup>2</sup> DIF architecture.

Fig. 11. Selection of *dbw* and *tbw* of the pipeline–SDF DIF 64–point FFT/IFFT core for radix 2<sup>2</sup> and radix 2<sup>4</sup> algorithms: (a) *dbw*, (b) *tbw* (*dbw* = 15). Both panels plot the CNR of the 64–point FFT (dB) versus the bitwidth (bits).

| Algorithm | dbw | tbw | Area (mm<sup>2</sup>) | *Pnorm* | AP |
|---|---|---|---|---|---|
| SDF r2<sup>2</sup> | 15 | 10 | 0.1284 | 0.0277 | 0.00356 |
| SDF r2<sup>4</sup> | 15 | 10 | 0.1256 | 0.0287 | 0.00360 |

Table 7. AP comparison for WLAN system with (*CNR*)*dB* ≥ 38.32 dB

To sum up, the parameters of the final implementation of the pipeline–SDF radix 2<sup>2</sup> DIF architecture are shown in Table 8. As mentioned before, the system clock frequency is 60 MHz. Working at this frequency, the processing time of the chosen core is 0.67 *μ*s. The data format is 2.13, whereas the format of the twiddle factors is 1.9. The EVM is -51.92 dB and the (*CNR*)*dB* is 38.57 dB. The estimated silicon area of the pipeline–SDF radix 2<sup>2</sup> DIF FFT/IFFT core is 0.1284 mm<sup>2</sup> and the power consumption estimation *Pnorm* is 0.0277. It can be observed that the EVM complies with the specification in Table 4, and the CNR is close to the *CNRmax*.

| N | dbw | tbw | *fclk* | *tproc* | CNR | EVM |
|---|---|---|---|---|---|---|
| 64 | 15 | 10 | 60 MHz | 0.67 *μ*s | 38.57 dB | -51.92 dB |

Table 8. Core Parameters for the pipeline–SDF radix 2<sup>2</sup> DIF architecture

#### **3.4 Layout of the FFT/IFFT core in an ASIC**

Figure 12 shows the layout of the 64 complex–point FFT/IFFT fabricated in a 90 nm TSMC technology, 6–ML CMOS process. The core size is 0.1362 mm<sup>2</sup>. In the previous section, the core area was estimated to be 0.1284 mm<sup>2</sup>. It can be said that the area estimation is accurate enough. In order to present a comparison with the proposals found in the literature, the area of the cores is normalized as in (B.M.Baas, 1999) using the equation:

$$A\_{norm} = \frac{A}{(T\_a/T\_b)^2}, \tag{34}$$

where *Ta* is the transistor width of the technology actually used, *A* is the occupied area for *Ta*, and *Tb* is the technology for which the area is normalized.

Fig. 12. Layout of the 64 complex–point FFT/IFFT fabricated in a 90 nm technology, 6–ML CMOS process. The core size is 0.370 × 0.368 mm<sup>2</sup>.

In order to assess the quality of the FFT/IFFT core, Table 9 makes a comparison with other 64 complex–point FFT/IFFT cores found in the literature. In Table 9, the area (A*norm*) and the power (P*norm*) have been normalized to a 90 nm technology using (34) and (33). The *AP* parameter indicates that our core is the most efficient one for a WLAN application. In fact, the presented core requires the smallest normalized power (P*norm*).

| Reference | dbw | tech. (*μ*m) | *fclk* (MHz) | A*norm* (mm<sup>2</sup>) | *tproc* (*μ*s) | P*norm* | AP |
|---|---|---|---|---|---|---|---|
| (Yu et al., 2011) | 16 | 0.18 | 80 | 0.2209 | – | 0.0378 | 0.00835 |
| (Tsai et al., 2011) | 14.8 | 0.09 | 394 | 0.102 | – | 0.091 | 0.00928 |
| **Proposal** | 15 | 0.09 | 60 | 0.1362 | 0.67 | 0.0277 | 0.00377 |

Table 9. Comparison of fixed–point FFT processors for WLAN systems

### **4. Conclusions**

Many different FFT/IFFT algorithms and architectures have been proposed in the literature for OFDM systems, as presented in Section 1. Additionally, the usual FFT notations do not facilitate a general analysis for the selection of the FFT/IFFT algorithm and architecture.

In Section 3, a design space exploration among different algorithms has been carried out. This search is hard to perform if general expressions are not available for the different algorithms in a unified way and if a mapping to the implementation can not be easily established. In this chapter, the matricial notation summarized in Section 2 is used as a tool to help the designer

Kuo, J.-C., Wen, C.-H., Lin, C.-H. & Wu, A.-Y. (2003). VLSI Design of a Variable-Length

Lee, H.-Y. & Park, I.-C. (2007). Balanced Binary-Tree Decomposition for Area-Efficient Pipelined FFT Processing, *IEEE Transactions on Circuits and Systems-I* 54(4): 889–900. Lee, J., Lee, H., Cho, S.-I. & Choi, S.-S. (2006). A High-speed, Low-complexity Radix-2 FFT

Lenart, T. & Owal, V. (2006). Architectures for Dynamic Data Scaling in 2/4/8k

Lin, H.-L., Lin, H., Chen, Y.-C. & Chang, R. C. (2004). A Novel Pipelined Fast Fourier

Lin, Y.-W., Liu, H.-Y. & Lee, C.-Y. (2004). A Dynamic Scaling FFT Processor for DVB-T

Lin, Y.-W., Liu, H.-Y. & Lee, C.-Y. (2005). A 1-GS/s FFT/IFFT Processor for UWB Applications,

Liu, L., Ren, J., Wang, X. & Ye, F. (2007). Design of Low-Power, 1GS/s Throughput

Maharatna, K., Grass, E. & Jagdhold, U. (2004). A 64-Point Fourier Transform Chip for

Nee, R. V. & Prasad, R. (2000). *OFDM for Wireless Multimedia Communications*, Artech House. Pease, M. (1968). An Adaptation of the Fast Fourier Transform for Parallel Processing, *Journal*

Rader, C. M. (1968a). A Linear Filtering Approach to the Computation of the Discrete Fourier

Rader, C. M. (1968b). Discrete Fourier transforms when the Number of Data Samples is Prime,

Rudagi, J. M., Lobo, R., Patil, P. & Biraj, N. (2010). An Efficient 64-point Pipelined FFT Engine,

Saberinia, E. (2006). Implementation of a Multi-band Pulsed-OFDM Transceiver, *Journal of*

Serrá, M., Martí, P. & Carrabina, J. (2004). IFFT/FFT core architecture with an Identical Stage

Sevillano, J. F. (2004). *Diseno de un Demodulador QPSK para Sistemas de Radiodifusión Digital por* ˜ *Satélite*, PhD thesis, Campus tecnológico de la Universidad de Navarra. Sloate, H. (1974). Matrix Representation for Sorting and the FFT, *IEEE Transactions on Circuits*

Thomas, L. H. (1963). Using a Computer to Solve Problems in Physics, *Applications of Digital*

Applications, *IEEE Journal of Solid-State Circuits* 39(1): 2005–2013.

*Processing Systems Design and Implementation* pp. 7–11.

*IEEE Journal of Solid-State Circuits* 40(8): 1726–1735.

*Symposium on Circuits and Systems* pp. 2594–2597.

*of the Association of Computing Machines* 15(2): 252–264.

*Advances in Wireless Communications* pp. 606–610.

*Proceedings of the IEEE* 56: 1107U–1108. ˝

*Applied Signal Processing* pp. 1306–1316.

Implementing FFT and IFFT Cores for OFDM Communication Systems

*Systems* pp. 4719–4722.

*Circuits* 39(3): 484–492.

*Computing* pp. 204–208.

*VLSI Signal Processing* 43: 73–88.

*and Systems* CAS-21(1): 109–116.

*Computers* .

14(11): 1286–1290.

Fast Fourier Transform Processors:

FFT/IFFT Processor for OFDM-Based Communication Systems, *EURASIP Journal on*

133

Processor for MB-OFDM UWB Systems, *IEEE International Symposium on Circuits and*

Pipeline FFT Cores, *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*

Transform Architecture for Double Rate OFDM Systems, *IEEE Workshop on Signal*

FFT Processor for MIMO-OFDM UWB Communication System, *IEEE International*

High-Speed Wireless LAN Application Using OFDM, *IEEE Journal of Solid-State*

Transform, *Northeast Electronics Research and Engineering Meeting Record* 10: 218–219.

*2010 International Conference on Advances in Recent Technologies in Communication and*

Structure for Wireless LAN Communications, *IEEE 5th Workshop on Signal Processing*

in this search. The OFDM parameters obtained from the IEEE 802.11a standard analysis have been employed as constraints for the optimization problem. The AP parameter, which trades off area and power consumption, has been used as a measure of the efficiency of the core. Finally, a pipeline–SDF radix 2<sup>2</sup> DIF FFT/IFFT processor has been proposed, since it achieves the minimum of the AP cost function.

To sum up, there is no unique FFT/IFFT algorithm, architecture and implementation that is optimal for all OFDM systems. It is therefore recommended to search across the algorithm, architecture and implementation dimensions for each OFDM system. The matricial notation presented in this chapter is a unified and compact representation that can help the designer in this search. The search is feasible, and the FFT/IFFT cores implemented using this approach are highly efficient.


## **FPGA Implementation of Inverse Fast Fourier Transform in Orthogonal Frequency Division Multiplexing Systems**

 Somayeh Mohammady, Nasri Sulaiman, Roslina M. Sidek, Pooria Varahram, M. Nizar Hamidon *Universiti Putra Malaysia (UPM) Malaysia* 

### **1. Introduction**


In modern communication systems, Orthogonal Frequency Division Multiplexing (OFDM) is used to transmit at higher data rates and to avoid Inter Symbol Interference (ISI). The OFDM transmitter and receiver contain an Inverse Fast Fourier Transform (IFFT) and a Fast Fourier Transform (FFT), respectively. The IFFT block provides orthogonality between adjacent subcarriers. This orthogonality makes the signal frame relatively robust to the fading caused by natural multipath environments. As a result, OFDM has become very popular in modern telecommunication systems. Besides all its advantages, OFDM has one main drawback: a high Peak to Average Power Ratio (PAPR). Many approaches to reducing the PAPR have been proposed. Some of them work in the time domain, such as Partial Transmit Sequence Insertion (PTS), and others work in the frequency domain, such as Dummy Sequence Insertion (DSI) and Selected Mapping (SLM) (Bauml et al., 1996; Muller et al., 1997). According to Baxley et al. (2007), the SLM method reduces the PAPR with the least computational complexity and the fewest additional modifications to current technology; therefore most recent research has considered modifications of SLM-based methods. Most of these methods modify the OFDM transmitter in such a way that multiple IFFT processors are required, which increases the number of additions and multiplications needed for implementation.

In this chapter, the OFDM system and its main block, the IFFT, are introduced. The IFFT block is implemented on an FPGA and the verification results are discussed. The Optimum Phase Sequence Insertion with Dummy Insertion (OPS-DSI) method, one of the recent PAPR reduction techniques and a good example of an application of the IFFT processor, is also studied in this chapter, and the FPGA implementation result is verified against simulation results.

### **2. OFDM system**

Fig. 1 shows how an OFDM signal is processed. The high-rate input data stream is split into narrow-band channels with lower data rates, which are modulated using a general signal modulation scheme (PSK, QAM) and then passed through the Inverse Fast Fourier Transform (IFFT), which provides orthogonality between adjacent sub-channels. After the IFFT, the last portion of the signal is copied to the head to provide immunity to Inter Symbol Interference (ISI); this is shown as the Cyclic Prefix (CP) in Fig. 1.

Fig. 2. Comparison between FDM and OFDM signals

Although the OFDM process is similar to Frequency Division Multiplexing (FDM), there are some differences. FDM is a single carrier signal in which the signal is divided into frequency bands with guard intervals between them to avoid interference. Guard intervals are not needed in OFDM because neighboring subcarriers are orthogonal to each other: overlapping does not create interference, and the bandwidth is used more efficiently, as shown in Fig. 2.

OFDM is a form of both Frequency Division Multiplexing (FDM) and Multicarrier Modulation (MCM). In FDM, each signal is divided into smaller signals, named subchannels, with different frequency widths. Each subcarrier carries a signal at the same time in parallel. Every subcarrier is modulated by a constellation symbol.

Fig. 1. The OFDM signal structure
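The processing chain of Fig. 1 can be illustrated with a minimal sketch. It is not the chapter's implementation: the QPSK mapping table, the function names `idft` and `ofdm_symbol`, and the parameter values (8 subcarriers, a 2-sample cyclic prefix) are assumptions chosen only for illustration, and a direct inverse DFT stands in for the IFFT block.

```python
import cmath

def idft(X):
    """Inverse DFT with 1/N scaling (direct evaluation, stands in for the IFFT)."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

# Hypothetical QPSK mapping: one pair of bits selects one constellation point.
QPSK = {(0, 0): 1 + 1j, (0, 1): -1 + 1j, (1, 1): -1 - 1j, (1, 0): 1 - 1j}

def ofdm_symbol(bits, n_subcarriers, cp_len):
    """Serial-to-parallel conversion, QPSK mapping, IFFT, cyclic prefix."""
    assert len(bits) == 2 * n_subcarriers
    # Serial-to-parallel conversion: one bit pair per subcarrier.
    symbols = [QPSK[(bits[2 * i], bits[2 * i + 1])] for i in range(n_subcarriers)]
    time_samples = idft(symbols)
    # Cyclic prefix: the last cp_len samples are copied to the head of the frame.
    return time_samples[-cp_len:] + time_samples

bits = [0, 0, 0, 1, 1, 1, 1, 0] * 2        # 16 bits for 8 subcarriers
frame = ofdm_symbol(bits, n_subcarriers=8, cp_len=2)
```

The resulting frame has 10 samples, and its first two samples repeat its last two, which is the cyclic prefix property described above.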

## **3. IFFT implementation on Field Programmable Gate Array (FPGA)**

Field Programmable Gate Arrays (FPGAs) are configurable and re-programmable digital logic devices, and programming code is usually written in Hardware Description Languages (HDL).

Fig. 3. Virtex-5 Pro development board

The Virtex-5 FXT Evaluation Kit, which carries the Xilinx Virtex-5 XC5VFX30T-FF665 FPGA chip, is used for implementation. The datasheet of this FPGA is provided in Appendix D. The board also has 64 MB of DDR2 SDRAM and 16 MB of FLASH memory, with a variety of I/O gates, which makes it suitable for a typical implementation.

As shown in Fig. 3, the JTAG-USB cable is connected to the JTAG connector of the FPGA board for programming the FPGA; the other end of this cable is connected to a USB port of the PC. The other connector is the serial port, which is used to send commands while the program is running.

This section introduces fundamentals of IFFT block prototype at the transmitter and the FFT block at the receiver. The basic equations of the FFT and the Inverse FFT (IFFT) are given by:

$$X(k) = \sum_{n=0}^{N-1} x(n)e^{-j2\pi kn/N}, \quad k = 0, \ldots, N-1 \tag{1}$$


$$x(n) = \frac{1}{N} \sum_{k=0}^{N-1} X(k) e^{j2\pi kn/N}, \quad n = 0, \ldots, N-1 \tag{2}$$

where *N* is the transform size, or the number of sample points in the data frame, and *j* = √(−1). *X(k)* is the frequency output of the FFT at the *k*th point, where *k*=0, 1, …, *N*-1, and *x(n)* is the time sample at the *n*th point, with *n*=0, 1, …, *N*-1.
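As a quick check of the transform pair, the sketch below evaluates the forward and inverse transforms directly from their definitions, using the usual convention of a negative exponent in the forward transform and a positive exponent with 1/*N* scaling in the inverse. The function names `dft` and `idft` and the four-sample test vector are assumptions made for this example; the direct evaluation costs on the order of *N*² operations and is meant only as a reference, not as an FFT.

```python
import cmath

def dft(x):
    """Forward transform: X(k) = sum over n of x(n) * exp(-j*2*pi*k*n/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    """Inverse transform: x(n) = (1/N) * sum over k of X(k) * exp(+j*2*pi*k*n/N)."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

x = [1.0, 2.0, 0.0, -1.0]      # arbitrary test samples
X = dft(x)                     # X[0] is the sum of the samples
x_back = idft(X)               # recovers x up to rounding error
```

Applying `idft` to the output of `dft` returns the original samples to within floating-point rounding, which confirms that the two definitions are mutually inverse.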

Because of the symmetry of the exponential term *e*<sup>−*j*2π*kn*/*N*</sup>, it can be represented by a twiddle factor, denoted *W<sub>N</sub><sup>nk</sup>*. Twiddle factors speed up the computation: they depend only on the number of points, so they need not be recalculated and can be read from a precomputed table of twiddle factors. Because the transform time is crucial in the FFT process, there is always a trade-off between the core size and the transform time. Xilinx offers four architectures: Pipelined-Streaming I/O, Radix4-Burst I/O, Radix2-Burst I/O and Radix2-Lite-Burst I/O. They have different features to cover different time and size requirements.
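The idea of reading twiddle factors from a table instead of recomputing them can be sketched as follows. The function names `twiddle_table` and `dft_with_table` are hypothetical, and the table here is a plain Python list rather than a hardware ROM; the point is only that *N* stored values suffice, because the exponent *nk* can be reduced modulo *N*.

```python
import cmath

def twiddle_table(N):
    """Precompute W_N^m = exp(-j*2*pi*m/N) for m = 0 .. N-1 (the table contents)."""
    return [cmath.exp(-2j * cmath.pi * m / N) for m in range(N)]

def dft_with_table(x, W):
    """Direct DFT that looks up every twiddle factor instead of recomputing it."""
    N = len(x)
    # W_N^(n*k) is periodic in n*k with period N, so the index wraps modulo N.
    return [sum(x[n] * W[(n * k) % N] for n in range(N)) for k in range(N)]

W8 = twiddle_table(8)
X = dft_with_table([1, 0, 0, 0, 0, 0, 0, 0], W8)   # unit impulse gives a flat spectrum
```

The symmetry mentioned above is visible in the table: the entry at index *N*/2 equals −1, so the second half of the table is the negative of the first half, and a hardware table can in principle be shortened accordingly.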

In the Pipelined-Streaming I/O architecture, the data is processed continuously. The Radix4-Burst I/O architecture uses an iterative approach to process the data: the data is loaded and processed separately. It is smaller in size than the pipelined solution, but has a longer transform time. The third architecture, Radix2-Burst I/O, uses the same iterative approach as Radix4, although it has a longer transform time. Radix2 is based on Decimation In Frequency (DIF) and separates the input data into two halves:

$$x(0), x(1), \dots, x\left(\frac{N}{2}-1\right) \quad \text{and} \quad x\left(\frac{N}{2}\right), x\left(\frac{N}{2}+1\right), \dots, x(N-1) \tag{3}$$

The FFT formula for the even-indexed and odd-indexed outputs can be written as two summations, with *k* = 0, 1, …, *N*/2-1:

$$\begin{aligned} X(2k) &= \sum_{n=0}^{\frac{N}{2}-1} a(n) W_{N/2}^{nk}, \quad \text{where } a(n) = x(n) + x\left(n + \frac{N}{2}\right) \\ X(2k+1) &= \sum_{n=0}^{\frac{N}{2}-1} b(n) W_{N/2}^{nk}, \quad \text{where } b(n) = \left[x(n) - x\left(n + \frac{N}{2}\right)\right] W_N^{n} \end{aligned} \tag{4}$$
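The split into even-indexed and odd-indexed outputs follows from equation (1) in two short steps. The derivation below is a sketch that uses the notation *W<sub>N</sub>* = *e*<sup>−*j*2π/*N*</sup>; the intermediate expressions are added here for clarity. First the sum is divided into the two halves of the input:

$$\begin{aligned}
X(k) &= \sum_{n=0}^{N/2-1} x(n) W_N^{nk} + \sum_{n=0}^{N/2-1} x\left(n+\tfrac{N}{2}\right) W_N^{(n+N/2)k} \\
&= \sum_{n=0}^{N/2-1} \left[ x(n) + (-1)^k x\left(n+\tfrac{N}{2}\right) \right] W_N^{nk}, \qquad \text{since } W_N^{Nk/2} = e^{-j\pi k} = (-1)^k .
\end{aligned}$$

Then the even and odd output indices are taken separately, using *W<sub>N</sub>*<sup>2*nk*</sup> = *W*<sub>*N*/2</sub><sup>*nk*</sup>:

$$\begin{aligned}
X(2k) &= \sum_{n=0}^{N/2-1} \left[ x(n) + x\left(n+\tfrac{N}{2}\right) \right] W_{N/2}^{nk}, \\
X(2k+1) &= \sum_{n=0}^{N/2-1} \left[ x(n) - x\left(n+\tfrac{N}{2}\right) \right] W_N^{n} \, W_{N/2}^{nk} .
\end{aligned}$$

Each of the two sums is itself an *N*/2-point DFT, so the same split can be applied repeatedly until only 2-point transforms remain. The odd-indexed outputs carry the extra twiddle factor *W<sub>N</sub><sup>n</sup>*, which is applied after the subtraction.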

This operation for 2 points can be graphically presented in Fig. 4.

Fig. 4. Two point butterfly graph

where *Y(0)=X(0)+X(1)* and *Y(1)=X(0)-X(1)*, respectively. This FFT flow graph is called a butterfly graph. When the number of points is increased, the butterfly is expanded. The Radix2 scheme in the third architecture separates the input data into two halves, and therefore the butterfly is smaller. This means it is smaller in size than the Radix4 solution. The fourth scheme is also based on the Radix2 architecture: Radix2-Lite-Burst I/O uses a time-multiplexed approach to the butterfly, so the butterfly is even smaller; however, the transform time is longer. In this project the Radix2-Burst I/O architecture, shown in Fig. 5, is used to prototype the FFT because of its lower hardware resource requirement compared with the other algorithms (Yiqun et al., 2006).
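The butterfly structure described above can be sketched in software. The code below is a minimal in-place radix-2 decimation-in-frequency FFT; it is a generic textbook form, not the Xilinx core, and the function names `fft_dif_radix2` and `bit_reverse_order` are assumptions made for this example. The input is taken in natural order and the raw output appears in bit-reversed order, so a separate reordering step is included.

```python
import cmath

def fft_dif_radix2(x):
    """In-place radix-2 decimation-in-frequency FFT.

    Input is in natural order; the returned list is in bit-reversed order.
    """
    N = len(x)
    assert N > 0 and N & (N - 1) == 0, "length must be a power of two"
    data = [complex(v) for v in x]
    span = N // 2
    while span >= 1:
        # Each block of 2*span samples is split into two halves of length span.
        for start in range(0, N, 2 * span):
            for n in range(span):
                w = cmath.exp(-2j * cmath.pi * n / (2 * span))   # twiddle factor
                top, bottom = data[start + n], data[start + n + span]
                data[start + n] = top + bottom                   # sum branch
                data[start + n + span] = (top - bottom) * w      # difference branch times twiddle
        span //= 2
    return data

def bit_reverse_order(values):
    """Reorder a bit-reversed sequence into natural order."""
    N = len(values)
    bits = N.bit_length() - 1
    out = [0j] * N
    for i in range(N):
        rev = int(format(i, "0{}b".format(bits))[::-1], 2) if bits else 0
        out[rev] = values[i]
    return out

raw = fft_dif_radix2([1, 2, 0, -1])     # order: X(0), X(2), X(1), X(3)
spectrum = bit_reverse_order(raw)       # order: X(0), X(1), X(2), X(3)
```

For *N* = 2 the loop reduces to a single butterfly that forms the sum and the difference of the two inputs, which is the two-point graph of Fig. 4.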

Fig. 5. Block diagram of the Radix2-Burst I/O architecture (Hemphill et al., 2007)

As in the Radix4-Burst I/O architecture, the signal cannot be loaded and unloaded simultaneously, and the loading must be stopped during the calculation of the transform.

The point size can range from 8 to 65536, and this algorithm uses a minimum of block memories. When the point size is less than or equal to 1024, both block memory and distributed memory can be used for the data memories and phase memories.

To obtain an accurate IFFT block, the model of the targeted FPGA must be specified in the AccelDSP tool window. AccelDSP is a synthesis tool that transforms a MATLAB design into a hardware module, which can be VHDL or Verilog code. The tool controls an integrated environment that includes other design tools such as MATLAB and the Xilinx ISE tools.

A browser in the GUI shows the design hierarchy, the M-files, and the generated HDL source files. In this project, AccelDSP is used to generate the IFFT and FFT blocks. The design objects in the project explorer window are used to guide the synthesis process, and several parameters must be defined there.

One of the important parameters in the design of the IFFT block is the algorithm used to implement it, which was discussed above; here Radix2 is selected. Another parameter is the IFFT length, which denotes the number of points in the IFFT. There is also an option for the I/O format: with the data I/O format option in the AccelDSP GUI, the input and output data can be initialized. Single buffering does not parallelize any operations, whereas double buffering parallelizes the loading and unloading of frames of data. Natural Order I/O only applies to Single Fly architectures. A decimation algorithm naturally has its inputs or outputs in digit/bit-reversed order: DIF has natural-order input and digit/bit-reversed output, while DIT has digit/bit-reversed input and natural-order output. Selecting 'Yes' forces both input and output to be in natural order regardless of the decimation type.

The input data can be set to complex or real. The decimation algorithm parameter is set to either Decimation In Time (DIT) or Decimation In Frequency (DIF). Scaling is the 1/(IFFT length) ratio that can be set. Complex multiplier is another option, which chooses between different complex multiplier architectures. Round Mode sets the quantizer round mode property for all data path quantizers. If Floor is selected for the round mode, the numbers between 0 and -1 are rounded to 0 and all other numbers, greater than zero or lower than -1, are rounded to the closest number; for example, -1.8 becomes -2.

There is also a setting for the input data width, which is the number of bits used to represent the input. Input Data Fract Width is the number of bits used to represent the fractional part of the input word. Twiddle width is the number of bits used to represent the twiddle factors, and twiddle fractional width is the number of bits used to represent the fractional part of the phase factors. The range of the phase factors is (-1, 1), so 2 bits are always needed for their integer part, and therefore twiddle fractional width = twiddle width - 2. The data width can also be modified for the output of the IFFT. The output data width depends on the scaling option. If Scaling is set to 'Yes', output data width = input data width. If Scaling is set to 'No', output data width = input data width + *log*<sub>2</sub>(IFFT length) + 1. The output data fractional width is the greater of the input data fractional width and the twiddle fractional width.
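The word-length rules stated above can be collected into a small calculation. The helper below is a hypothetical illustration, not part of AccelDSP or any Xilinx API; its name, its argument names and the example values (16-bit input with 15 fractional bits, 16-bit twiddles, IFFT length 256) are assumptions, and the IFFT length is assumed to be a power of two.

```python
def ifft_word_lengths(input_width, input_fract_width, twiddle_width,
                      ifft_length, scaling):
    """Apply the word-length rules from the text (hypothetical helper)."""
    # Two bits are always reserved for the integer part of the twiddle factors.
    twiddle_fract_width = twiddle_width - 2
    if scaling:
        # With scaling enabled the output keeps the input width.
        output_width = input_width
    else:
        # Without scaling: input width + log2(IFFT length) + 1.
        log2_length = ifft_length.bit_length() - 1   # exact for powers of two
        output_width = input_width + log2_length + 1
    return {
        "output_width": output_width,
        "twiddle_fract_width": twiddle_fract_width,
        # Output fractional width: the greater of the two fractional widths.
        "output_fract_width": max(input_fract_width, twiddle_fract_width),
    }

unscaled = ifft_word_lengths(16, 15, 16, 256, scaling=False)
scaled = ifft_word_lengths(16, 15, 16, 256, scaling=True)
```

With these example values the unscaled output width is 16 + 8 + 1 = 25 bits, while the scaled output stays at 16 bits; the twiddle fractional width is 14 bits in both cases.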

Another important setting in the AccelDSP tool is the design flow. For this particular application, the flow should be set to System Generator. At the end of the design, a library containing the IFFT block with the desired name is created. The library and the IFFT block are accessible from the MATLAB Simulink environment.

When the IFFT block is inserted into the Simulink window, some other components are required to complete the model, as shown in Fig. 6. These components are the input data, the signal synchronization line, the output gate, and the complex to real-imaginary converter. At this point the model should run successfully. In Fig. 6, the centre block with the Xilinx sign in the background is the IFFT model generated by the AccelDSP tool.

Then, with the System Generator block, the NGC netlist file can be generated. This file contains information that the ISE software can analyze to estimate the hardware resource consumption, which is presented in a table in ISE.

## **4. Results and discussions**


Fig. 6. System Generator block diagram of IFFT block and input output blocks

[Fig. 6 block labels: System Generator; Re and Im outputs (OutA\_Re, OutA\_Im); Real-Imag to Complex; To Workspace 2]

When the IFFT block is designed in the AccelDSP tool, a fixed-point model of the design is generated and AccelDSP automatically runs a MATLAB fixed-point simulation. The design can then be verified visually by comparing the fixed-point plot with the floating-point plot and checking that they match.

Fig. 7 presents an amplitude comparison between (a) the generated fixed-point model of the IFFT design with *N*=256 and Radix-2 and (b) the floating-point model of the same design. The x-axis is the sample number (*N*). The two results are the same, and therefore the IFFT processor is verified in terms of amplitude.

Fig. 7. Magnitude comparison between (a) fixed point model and (b) floating point model

The angle comparison between the fixed-point model and the floating-point model is presented in Fig. 8. As shown by Fig. 8 (a) and (b), the angle of the signal in the fixed-point model of the IFFT and the angle of the signal in the floating-point model of the design overlap each other.

Fig. 8. Angle comparison between (a) fixed point model and (b) floating point model

Fig. 9. Real and imaginary signal verification of IFFT: (a) AW Inverse FFT real output, (b) AW Inverse FFT imaginary output, (c) Reference Inverse FFT real output, (d) Reference Inverse FFT imaginary output. Vertical axes: Re(N) and Im(N); horizontal axes: N.

Another form of verification uses the real and imaginary constellation graphs. To verify the designed IFFT, the AccelDSP tool generates these graphs and the verification can be done visually. Fig. 9 presents the constellation based on the IFFT with *N*=256 and Radix-2. In Fig. 9, the real part of the fixed-point model, shown in (a), agrees with (c), which is the real part of the floating-point model of the IFFT design. The imaginary part of the fixed-point model, shown in (b), agrees with (d), which is the imaginary part of the floating-point model of the IFFT design.

As a result of the verifications discussed above, the error between the fixed-point model and the floating-point model is negligible, as presented in Fig. 10. The other important parameters in designing hardware modules are hardware resource consumption and power consumption. These parameters can be estimated using the Xilinx ISE tool. First, the NGC netlist file is generated using the System Generator block in MATLAB Simulink, and then the hardware consumption is measured through ISE. In the ISE GUI, the saved project is opened from the File folder, and the Implement Top Module button generates the hardware resource consumption table presented in Table 1.

The main consideration is the percentage of DSP48 and IO utilization. As shown in Table 1, the IFFT uses 6% of the DSP48 units and 16% of the IO.

The power consumed by the implemented DSI-SLM scheme is estimated by the ISE XPower Analyzer, a Xilinx tool, after the place and route process. The processor consumes a total power of about 630 milliwatts, of which about 10 milliwatts is dynamic power. Table 2 presents the details of the power report generated by the ISE tool.


Fig. 10. Error study between fixed point model and floating point model (panels: Error in Real Output (Reference - AW) and Error in Imaginary Output (Reference - AW), plotted against N)

| xc5vfx30t-1ff665 Resources | Used | Percentages of Consumption |
|---|---|---|
| Slices | 252 | 4% |
| DSP48 Slices | 4 | 6% |
| IO Utilization | 58 | 16% |
| Number of fully used LUT-FF pairs | 455 | 56% |

Table 1. Hardware resource consumption of IFFT block with Radix-2 and *N*=256

| Name | Power [W] | Resources Used | Resources Total Available | Resources Utilization [%] |
|---|---|---|---|---|
| Logic | 0.00006 | 538 | 20480 | 2.6 |
| Signals | 0.00057 | 1103 | --- | --- |
| DSP | 0.00026 | 4 | 64 | 6.3 |
| Total Quiescent Power | 0.61977 | N/A | N/A | N/A |
| Total Dynamic Power | 0.01030 | N/A | N/A | N/A |
| Total Power | 0.63008 | N/A | N/A | N/A |

Table 2. Power consumption report for IFFT with Radix-2 and *N*=256

| xc5vfx30t-1ff665 Resources | Used | Percentages of Consumption |
|---|---|---|
| Slices | 303 | 5% |
| DSP48 Slices | 3 | 4% |
| IO Utilization | 58 | 16% |
| Number of fully used LUT-FF pairs | 554 | 64% |

Table 3. Hardware resource consumption of IFFT block with Radix-2 and *N*=512
As mentioned before, one of the main considerations in designing OFDM systems is the computational complexity, which can be defined by the number of real additions and multiplications required for the hardware implementation of a system.

According to [Baxely et al., 2007], each IFFT requires *N/2logN+N/2* complex multiplications and *NlogN* complex additions. A complex multiplication takes four real multiplications and two real additions, and a complex addition takes two real additions. The total number of real additions required for an IFFT, *AIFFT*, is given in Eq. (5), and the total number of real multiplications for an IFFT, *MIFFT*, is given in Eq. (6). Hence, the total number of multiplications and additions of one IFFT is given by Eq. (7):

$$A_{IFFT} = 2\left(\frac{N}{2}\log N + \frac{N}{2}\right) + 2N\log N \tag{5}$$

$$M_{IFFT} = 4\left(\frac{N}{2}\log N + \frac{N}{2}\right) \tag{6}$$

$$T_{IFFT} = A_{IFFT} + M_{IFFT} = 5N\log N + 3N \tag{7}$$

where *N* is the number of subcarriers. When *N*=256, *TIFFT*=3850. For *N*=512 and 1024, the value of *TIFFT* is 8471 and 18484, respectively. Clearly, increasing the length of the IFFT increases the number of additions and multiplications required to implement it.
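As a quick numerical check of Eqs. (5) to (7), the sketch below evaluates the operation counts for the three IFFT lengths. The function name `ifft_real_operations` is hypothetical. The totals quoted above (3850, 8471 and 18484) are reproduced when the logarithm is taken in base 10; with the base-2 logarithm that is customary for radix-2 FFT operation counts, the same formulas give 11008, 24576 and 54272.

```python
import math


def ifft_real_operations(n, log=math.log2):
    """Return (real additions, real multiplications, total) for one N-point IFFT."""
    complex_mults = n / 2 * log(n) + n / 2
    complex_adds = n * log(n)
    # Eq. (5): each complex multiplication and each complex addition
    # contributes two real additions.
    real_adds = 2 * complex_mults + 2 * complex_adds
    # Eq. (6): each complex multiplication takes four real multiplications.
    real_mults = 4 * complex_mults
    # Eq. (7): total = 5 N log N + 3 N.
    return real_adds, real_mults, real_adds + real_mults


if __name__ == "__main__":
    for n in (256, 512, 1024):
        total_log2 = ifft_real_operations(n)[2]
        total_log10 = ifft_real_operations(n, math.log10)[2]
        print(n, int(total_log2), int(total_log10))
```

Whichever base is used, the count grows as *N* log *N*, which is the trend the text relies on.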

The design process of the IFFT for a higher number of subcarriers (*N*=512 and 1024) and for Radix-4 is very similar to that of the IFFT with Radix-2 and *N*=256. The hardware resource consumption and the power consumption for *N*=512 are presented in Table 3 and Table 4, respectively.



As shown in Table 3, the IFFT uses 4% of the DSP48 units and 16% of the IO.

According to the ISE estimation, this IFFT processor consumes a total power of about 631 milliwatts and a dynamic power of about 11 milliwatts. Table 4 presents the details of this power report.

The hardware resource consumption of the IFFT processor with Radix-2 and *N*=1024 is presented in Table 5. This IFFT block uses 6% of the DSP48 units and 16% of the IO.

This IFFT processor consumes a total power of about 630 milliwatts and a dynamic power of about 10 milliwatts, as estimated by the ISE tools. Table 6 presents the details of the power report.

| Name | Power [W] | Resources Used | Resources Total Available | Resources Utilization [%] |
|---|---|---|---|---|
| Logic | 0.00017 | 613 | 20480 | 3 |
| Signals | 0.00074 | 1198 | --- | --- |
| DSP | 0.00019 | 3 | 64 | 4.7 |
| Total Quiescent Power | 0.61993 | N/A | N/A | N/A |
| Total Dynamic Power | 0.01197 | N/A | N/A | N/A |
| Total Power | 0.63190 | N/A | N/A | N/A |

Table 4. Power consumption report for IFFT with Radix-2 and *N*=512

| xc5vfx30t-1ff665 Resources | Used | Percentages of Consumption |
|---|---|---|
| Slices | 330 | 6% |
| DSP48 Slices | 4 | 6% |
| IO Utilization | 58 | 16% |
| Number of fully used LUT-FF pairs | 455 | 56% |

Table 5. Hardware resource consumption of IFFT block with Radix-2 and *N*=1024

| Name | Power [W] | Resources Used | Resources Total Available | Resources Utilization [%] |
|---|---|---|---|---|
| Logic | 0.00006 | 538 | 20480 | 2.6 |
| Signals | 0.00057 | 1103 | --- | --- |
| DSP | 0.00026 | 4 | 64 | 6.3 |
| Total Quiescent Power | 0.61977 | N/A | N/A | N/A |
| Total Dynamic Power | 0.01030 | N/A | N/A | N/A |
| Total Power | 0.63008 | N/A | N/A | N/A |

Table 6. Power consumption report for IFFT with Radix-2 and *N*=1024

The hardware resource consumption of the IFFT processor with Radix-4 and *N*=256 is presented in Table 7. This IFFT block uses 18% of the DSP48 units and 16% of the IO.

| xc5vfx30t-1ff665 Resources | Used | Percentages of Consumption |
|---|---|---|
| Slices | 696 | 13% |
| DSP48 Slices | 12 | 18% |
| IO Utilization | 58 | 16% |
| Number of fully used LUT-FF pairs | 1064 | 49% |

Table 7. Hardware resource consumption of IFFT block with Radix-4 and *N*=256

Comparing Table 7 with Table 1 shows that the IO utilization is unchanged. However, the consumption of DSP48 slices increases by about 12 percentage points (from 6% to 18%).

The IFFT processor with Radix-4 and *N*=256 consumes a total power of 0.65099 W and a dynamic power of 0.02939 W, as estimated by the ISE tools. Table 8 presents the details of the power report. Comparing Table 8 with Table 2 shows that the power consumption of Radix-4 is about 0.02 W higher than that of Radix-2.

| Name | Power [W] | Resources Used | Resources Total Available | Resources Utilization [%] |
|---|---|---|---|---|
| Logic | 0.00098 | 1364 | 20480 | 6.7 |
| Signals | 0.00322 | 2775 | --- | --- |
| DSP | 0.00078 | 12 | 64 | 18.8 |
| Total Quiescent Power | 0.62160 | N/A | N/A | N/A |
| Total Dynamic Power | 0.02939 | N/A | N/A | N/A |
| Total Power | 0.65099 | N/A | N/A | N/A |

Table 8. Power consumption report for IFFT with Radix-4 and *N*=256

| xc5vfx30t-1ff665 Resources | Used | Percentages of Consumption |
|---|---|---|
| Slices | 629 | 12% |
| DSP48 Slices | 12 | 18% |
| IO Utilization | 58 | 16% |
| Number of fully used LUT-FF pairs | 1001 | 51% |

Table 9. Hardware resource consumption of IFFT block with Radix-4 and *N*=1024

The hardware resource consumption of the IFFT processor with Radix-4 and *N*=1024 is presented in Table 9. This IFFT block uses 18% of the DSP48 units and 16% of the IO.

| Name | Power [W] | Resources Used | Resources Total Available | Resources Utilization [%] |
|---|---|---|---|---|
| Logic | 0.00080 | 1364 | 20480 | 6.7 |
| Signals | 0.00317 | 2775 | --- | --- |
| DSP | 0.00078 | 12 | 64 | 18.8 |
| Total Quiescent Power | 0.62125 | N/A | N/A | N/A |
| Total Dynamic Power | 0.02573 | N/A | N/A | N/A |
| Total Power | 0.64698 | N/A | N/A | N/A |

Table 10. Power consumption report for IFFT with Radix-4 and *N*=1024


The IFFT processor with Radix-4 and *N*=1024 consumes a total power of 0.64698 W and a dynamic power of 0.02573 W, as estimated by the ISE tools. Table 10 presents the details of the power report.

### **5. Recent application of IFFT processor**

The simple structure of an OFDM symbol consisting of four sinusoids is shown in Fig. 11. The OFDM signal is created by summing multiple sinusoidal signals. Due to constructive interference, as shown in Fig. 3, high peaks are formed, and as a result of destructive interference the average power might be as low as zero. Hence, the ratio between the peak and the average will be high (Higashinaka et al., 2009).

Fig. 11. Sample of OFDM signal behavior

High Peak to Average Power Ratio (PAPR) is a major design challenge in OFDM systems (Krongold et al., 2003; Bauml et al., 1996; Wei et al., 2006).

The reason is that when an OFDM signal with high PAPR is fed to the amplification stage, the Power Amplifier (PA), which is usually peak-power limited, is forced to operate in its nonlinear region (Nieto, 2005). This has two effects: out-of-band distortion, or spreading of the spectrum, which can be measured by the Adjacent Channel Power Ratio (ACPR) metric, and in-band distortion, which can be measured by the Error Vector Magnitude (EVM) metric.

There are some PAs with a wide dynamic linear region (Class AB); however, they are generally expensive, consume more power, and are less efficient (Cooper, 2008; Varahram et al., 2009; Sharma et al., 2010). Hence, in order to have a high-efficiency OFDM signal and extended battery life, the PAPR must be reduced and the linearity of the PA should be maximized.

The PAPR is calculated as the ratio of the maximum power and the average power of signal and can be defined by:

$$PAPR(A) = \frac{\max\left[\left|\mathbf{s}(t)\right|^2\right]}{E\left[\left|\mathbf{s}(t)\right|^2\right]}\tag{8}$$

where *s(t)* is the *N* carriers OFDM envelope presented as below (Ochiai, 2003):

$$s(t) = \sum_{n=0}^{N-1} A_n e^{j\omega_0 nt} \tag{9}$$

where *A = (A0, A1, ..., A(N-1))* is a modulated data sequence of length *N* in the time interval (*0, T*), *Ai* is a symbol from a signal constellation, *T* is the OFDM symbol duration, *ω0 = 2π/T*, and *j* = √−1.
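Eqs. (8) and (9) can be illustrated with a short pure-Python sketch that samples the envelope on a uniform grid and takes the ratio of peak to mean power. The helper names `ofdm_samples` and `papr_db`, the normalization *T* = 1 and the oversampling factor are assumptions made for this illustration only; a sampled maximum only approximates the continuous-time peak.

```python
import cmath
import math


def ofdm_samples(symbols, oversample=4):
    """Sample s(t) = sum_n A_n exp(j*w0*n*t) over one symbol, with T = 1."""
    n_points = len(symbols) * oversample
    samples = []
    for k in range(n_points):
        t = k / n_points  # t in [0, T)
        samples.append(
            sum(a * cmath.exp(2j * math.pi * n * t) for n, a in enumerate(symbols))
        )
    return samples


def papr_db(samples):
    """Peak-to-average power ratio of Eq. (8), expressed in dB."""
    powers = [abs(s) ** 2 for s in samples]
    return 10 * math.log10(max(powers) / (sum(powers) / len(powers)))


if __name__ == "__main__":
    # Four identical symbols add constructively at t = 0: the peak power is
    # N^2 = 16 while the mean power is N = 4, so the PAPR is 10*log10(4) dB.
    print(papr_db(ofdm_samples([1, 1, 1, 1])))
    # A single carrier has a constant envelope, so its PAPR is 0 dB.
    print(papr_db(ofdm_samples([1])))
```

The all-ones case reaches the worst-case PAPR of 10 log10(*N*) dB, which shows why the peak problem grows with the number of subcarriers.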


The performance of a PAPR reduction technique is usually measured using the Complementary Cumulative Distribution Function (CCDF) graph. It denotes the probability that the PAPR of a data symbol exceeds a predefined threshold, as expressed by (Han et al., 2005; Heo et al., 2009):

$$\begin{aligned} \text{probability}(PAPR > z) &= 1 - \text{probability}(PAPR \le z) \\ &= 1 - F(z)^N = 1 - (1 - \exp(-z))^N, \\ F(z) &= 1 - \exp(-z) \end{aligned} \tag{10}$$

where *N* is the number of subcarriers and *z* is the threshold. This probability function is plotted to assess how well an algorithm reduces the PAPR of the OFDM signal, and the PAPR is usually compared with that of the unmodified OFDM signal at 0.01% CCDF, which is marked as 10<sup>-4</sup> on the CCDF axis of the graphs. A typical OFDM signal without any PAPR reduction technique has a PAPR of about 8 dB to 13 dB at 10<sup>-4</sup> CCDF (Raab et al., 2011). Therefore, when a PAPR reduction technique is applied to the OFDM system, it is expected to reduce the 13 dB PAPR to some lower value. According to the IEEE standard (IEEE STD 802.16e™-2005), the reduction should be at least 3 dB.
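The closed form in Eq. (10) is easy to evaluate directly. The sketch below does so for a threshold given in dB; the function name `papr_ccdf` and the dB-to-linear conversion of the threshold are assumptions for illustration, since Eq. (10) itself takes *z* as a linear power ratio.

```python
import math


def papr_ccdf(z_db, n_subcarriers):
    """Probability that the PAPR exceeds the threshold, following Eq. (10)."""
    z = 10 ** (z_db / 10)  # assumed conversion from dB to a linear power ratio
    return 1 - (1 - math.exp(-z)) ** n_subcarriers


if __name__ == "__main__":
    # Theoretical CCDF of an unmodified OFDM signal for a few thresholds.
    for z_db in (6, 8, 10, 12):
        print(z_db, papr_ccdf(z_db, 256))
```

The probability falls steeply as the threshold rises and grows with the number of subcarriers, which is why CCDF curves are read at a fixed small probability such as 10<sup>-4</sup>.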

Several techniques have been developed to reduce the PAPR of the OFDM signal. These techniques fall into two main categories: distortion-based methods (applying them results in out-of-band distortion) and distortionless methods (there is no out-of-band distortion). The first category includes Clipping (May et al., 1998), Windowing (Van et al., 1998), Envelope Scaling (Foomooljareon et al., 2002), Random Phase Updating (Nikookar et al., 2002), Peak Reduction Carrier (Tan et al., 2003), Companding (Hao et al., 2006; Hao et al., 2010; Cao et al., 2007; Chang et al., 2010; Hao et al., 2008; Kim et al., 2008) and other modified versions of these methods.

Clipping is a simple technique for PAPR reduction in which the signal is clipped to a desired level in the transmitter while the phase information remains unchanged. Clipping introduces distortion into the system; it is therefore normally combined with filtering, at the expense of additional IFFT and FFT blocks, which increase the complexity of the system. In the windowing technique, a large signal peak is multiplied with a certain frame. Envelope scaling is an algorithm that reduces PAPR by scaling the input envelope of some subcarriers before they are sent to the IFFT. In the random phase updating algorithm, random phases are generated and assigned to each carrier, and the updating continues until the peak value of the OFDM signal is below the threshold. The peak reduction carrier technique involves the use of a higher order modulation scheme to represent a lower order modulation symbol (Vijayarangan et al., 2009). The companding technique compresses and expands the OFDM signal in order to reduce PAPR. Speech processing is the main application of the companding method, as it has a less frequent peaks problem.

The second category of PAPR reduction methods comprises the distortionless techniques. These methods achieve significant PAPR performance without causing nonlinear distortion. However, they typically incur large computational complexity and sometimes require the transmission of side information. Moreover, these methods usually require receiver-side modifications that may be incompatible with existing communication systems. Such approaches include Coding (Jones et al., 1994; Kwon et al., 2009), Partial Transmit Sequence (PTS) (Muller et al., 1997; Gao et al., 2009; Chen et al., 2010; Kang et al., 1999), Selected Mapping (SLM) (Bauml et al., 1996), Dummy Signal Insertion (DSI) (Ryu et al., 2004; Qian et al., 2005), Tone Injection and Tone Reservation (Tellado, 2000), Interleaving (Jayalath et al., 2000), and Active Constellation Extension (ACE) (Krongold et al., 2003).

Most recent research concentrates on modified SLM and PTS methods (Wang et al. 2011; Ghassemi et al. 2010; Naeiny et al., 2011; Kim et al., 2006; Jeon et al., 2011; Hong et al., 2010). According to the review, most of the modified methods reduce PAPR at the expense of complexity in the transmitter or of degraded spectrum efficiency. It should be noted that improving the performance of SLM-based techniques requires a high number of IFFT processors, which leads to high complexity. The PTS-based methods also suffer from complexity, from another aspect: improving them requires extra additions and multiplications to find the optimum value, which again leads to high complexity. Hence, there is good scope to design a new method that overcomes these drawbacks and enhances the PAPR performance.

Here, one of the recently proposed methods for reducing the PAPR of the OFDM signal is presented (Mohammady et al., 2011). This method is named Optimum Phase Sequence insertion with Dummy Sequence Insertion (OPS-DSI). As shown in the block diagram of the OPS-DSI scheme in Fig. 12, there are two loops in the OPS-DSI algorithm. If the PAPR is not less than the threshold, *Loop\_a* is performed with a specific number of iterations and the PAPR is compared again. If the PAPR is less than the threshold, the signal is transmitted regardless of the second loop; otherwise the second loop, *Loop\_b*, is executed with a predefined number of iterations and the PAPR is calculated similarly. When *Loop\_a* is performed, a new random dummy is generated and inserted into the signal; however, the phase sequence is the same as in the last iteration. It should be noted that the number of iterations is specified based on the PAPR reduction requirement and the data rate. The value of the PAPR threshold is also based on each standard in wireless broadband.

It should be noted that when *Loop\_b* is running, *Loop\_a* is repeated. This means that in *Loop\_b* a new random phase sequence is selected and multiplied with the signal, and then a new random dummy is inserted into the signal. When the threshold condition is passed, the signal is transmitted; however, if the iterations of both loops have been performed and the PAPR is still not less than the threshold, the signal with the minimum PAPR among them is transmitted.

Fig. 12. Block diagram of the OPS-DSI scheme, transmitter (blocks shown: serial to parallel converter, dummy sequence insertion plus side information, random selection of the phase sequence matrix, IFFT, PAPR < PAPR<sub>TH</sub> comparison with *Loop\_a* and *Loop\_b*, transmitted signal)

The CCDF result of a typical OFDM system with the OPS-DSI scheme is compared with the C-SLM and DSI methods. As shown in Fig. 13 (a), the PAPR of the original OFDM signal is about 11.8 dB at 10<sup>-4</sup> CCDF, or 0.01% CCDF. When the DSI method is applied to this system, the PAPR is reduced to about 9.9 dB, as shown by Fig. 13 (b), which means that the PAPR performance is enhanced by about 1.9 dB compared to the original OFDM signal.

When the C-SLM method with 8 IFFTs (number of candidate signals, *M*=8) is applied to the OFDM signal, a PAPR of 8.5 dB is achieved, as shown by Fig. 13 (c). In this case, the PAPR is enhanced by about 3.4 dB.

Fig. 13 (d) shows that when the OPS-DSI scheme is applied to the OFDM signal, a PAPR of about 7.7 dB is achieved. In other words, the PAPR is enhanced by about 4.2 dB compared to the original signal. The PAPR performance of the implemented OPS-DSI scheme is shown by Fig. 13 (e). The implemented system shows slightly degraded PAPR performance, which is due to the hardware input bit resolution. The ISE tool is able to generate the total data path delay for the OPS-DSI design, which is 10.937 ns. This delay is within the accepted range according to the Shannon-Hartley theorem (Hartley, 1928).

Compared with recent works (Jeon et al., 2011; Naeiny et al., 2011; Hong et al., 2010; Wang et al., 2011; Kim et al., 2006), the fact that a 4.2 dB reduction is achieved with only one IFFT and the lowest complexity makes the OPS-DSI method very attractive and suitable for FPGA implementation.

In some papers, the PAPR performance is studied using time domain symbols. Fig. 14 presents 1024 samples of the output signal with and without PAPR reduction. The blue samples are the output signal without PAPR reduction and the red samples are the output signal when PAPR reduction is applied. It can be observed that the OFDM signal peaks are suppressed. However, the reduction seems to be insignificant. The reason is that the OPS-DSI scheme is a probabilistic method and the reduction is based on signal modification; therefore, the time domain graph is not an accurate study tool for this case.
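The two-loop search described for OPS-DSI can be sketched in a few lines of Python. This is an illustrative reading of the prose, not the authors' implementation: the names `idft`, `papr_db` and `ops_dsi`, the use of ±1 phase and dummy symbols, the all-ones initial phase sequence, and the placement of the dummy symbols after the data are all assumptions, and a naive inverse DFT stands in for the hardware IFFT block.

```python
import cmath
import math
import random


def idft(freq):
    """Naive inverse DFT; stands in for the IFFT block."""
    n = len(freq)
    return [
        sum(x * cmath.exp(2j * math.pi * k * m / n) for k, x in enumerate(freq)) / n
        for m in range(n)
    ]


def papr_db(samples):
    """Peak-to-average power ratio in dB."""
    powers = [abs(s) ** 2 for s in samples]
    return 10 * math.log10(max(powers) / (sum(powers) / len(powers)))


def ops_dsi(data, n_dummy, threshold_db, loop_a_iters, loop_b_iters, rng):
    """Return (PAPR in dB, time-domain signal) of the candidate to transmit."""
    phase = [1] * len(data)  # assumed initial phase sequence
    best_papr, best_signal = math.inf, None
    for b in range(loop_b_iters + 1):
        if b > 0:
            # Loop_b: select a new random phase sequence, then repeat Loop_a.
            phase = [rng.choice((1, -1)) for _ in data]
        for _ in range(loop_a_iters + 1):
            # Loop_a: insert a new random dummy, keeping the phase sequence.
            dummy = [rng.choice((1, -1)) for _ in range(n_dummy)]
            freq = [d * p for d, p in zip(data, phase)] + dummy
            signal = idft(freq)
            papr = papr_db(signal)
            if papr < best_papr:
                best_papr, best_signal = papr, signal
            if papr < threshold_db:
                return best_papr, best_signal  # threshold passed: transmit now
    # Both loops exhausted: fall back to the minimum-PAPR candidate.
    return best_papr, best_signal


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.choice((1, -1)) for _ in range(12)]
    papr, signal = ops_dsi(data, n_dummy=4, threshold_db=5.0,
                           loop_a_iters=3, loop_b_iters=2, rng=rng)
    print(round(papr, 2), len(signal))
```

Only one inverse transform is evaluated per candidate, which reflects the single-IFFT structure that the text contrasts with SLM-based schemes.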


It should be noted that when *Loop\_b* is running, *Loop\_a* is repeated. It means that in *Loop\_b,* new random phase sequence will be selected and multiplied to the signal and then a new random dummy is inserted to the signal. When the threshold condition is passed, the signal

al., 2000), Active Constellation Extension (ACE) (Krongold et al., 2003).

method to overcome previous drawbacks and enhance the PAPR performance.

standard in wireless broadband.

problem.

will be transmitted, however if the iterations for both loops are performed and still the PAPR is not less than the threshold, the signal with minimum PAPR among them will be transmitted.
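The two-loop control flow described above can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the implementation of Mohammady et al.: the QPSK mapping, the phase sequence with entries of ±1, the placement of the dummy symbols after the data subcarriers, the absence of oversampling, and all function and parameter names (`ops_dsi`, `n_dummy`, `iters_a`, `iters_b`) are hypothetical choices made for the example. The initial threshold check is folded into the first pass of the inner loop.

```python
import cmath
import math
import random

# QPSK alphabet used for data and dummy symbols (an assumption of this sketch).
QPSK = [complex(a, b) for a in (-1, 1) for b in (-1, 1)]

def idft(freq):
    """Naive O(N^2) inverse DFT; adequate for a small illustrative N."""
    n = len(freq)
    return [sum(freq[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def papr_db(x):
    """Peak-to-average power ratio of a time-domain block, in dB."""
    powers = [abs(v) ** 2 for v in x]
    return 10.0 * math.log10(max(powers) / (sum(powers) / len(powers)))

def ops_dsi(data, n_dummy, threshold_db, iters_a, iters_b, rng):
    """Return (papr_db, signal, phase, dummy) of the transmitted candidate.

    Loop_b (outer) draws a new random phase sequence; Loop_a (inner) keeps
    the phase sequence and draws a new random dummy each iteration.
    Assumes iters_a >= 1.
    """
    phase = [1] * len(data)              # initial phase sequence (assumption)
    best = None
    for b in range(iters_b + 1):         # pass 0 uses the initial phase sequence
        if b > 0:
            phase = [rng.choice((1, -1)) for _ in data]
        rotated = [d * p for d, p in zip(data, phase)]
        for _ in range(iters_a):         # Loop_a: new dummy, same phase sequence
            dummy = [rng.choice(QPSK) for _ in range(n_dummy)]
            signal = idft(rotated + dummy)
            papr = papr_db(signal)
            if best is None or papr < best[0]:
                best = (papr, signal, phase, dummy)
            if papr < threshold_db:      # threshold passed: transmit at once
                return best
    return best                          # fallback: minimum-PAPR candidate

rng = random.Random(0)
data = [rng.choice(QPSK) for _ in range(32)]
papr, signal, phase, dummy = ops_dsi(data, 4, 6.0, 3, 2, rng)
print(f"PAPR of transmitted block: {papr:.2f} dB")
```

Only one IDFT is evaluated per candidate, which reflects the single-IFFT structure the text attributes to OPS-DSI; the cost of the search is paid in iterations rather than in parallel IFFT blocks.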

Fig. 12. Block diagram of the OPS-DSI scheme, transmitter

The CCDF result of a typical OFDM system with the OPS-DSI scheme is compared with the C-SLM and DSI methods. As shown in Fig. 13 (a), the PAPR of the original OFDM signal is about 11.8 dB at a CCDF of $10^{-4}$ (0.01%). When the DSI method is applied to this system, the PAPR is reduced to about 9.9 dB, as shown in Fig. 13 (b), which means that the PAPR is reduced by about 1.9 dB compared to the original OFDM signal.

When the C-SLM method with 8 IFFTs (number of candidate signals, *M*=8) is applied to the OFDM signal, a PAPR of 8.5 dB is achieved, as shown in Fig. 13 (c). In this case, the PAPR is reduced by about 3.4 dB.

Fig. 13 (d) shows that when the OPS-DSI scheme is applied to the OFDM signal, a PAPR of about 7.7 dB is achieved. In other words, the PAPR is reduced by about 4.2 dB compared to the original signal. The PAPR performance of the implemented OPS-DSI scheme is shown in Fig. 13 (e). The implemented system shows slightly degraded PAPR performance, which is due to the hardware input bit resolution. The ISE tool reports a total data path delay of 10.937 ns for the OPS-DSI design. This delay is within the accepted range according to the Shannon-Hartley theorem (Hartley, 1928).

Compared with recent works (Jeon et al., 2011; Naeiny et al., 2011; Hong et al., 2010; Wang et al., 2011; Kim et al., 2006), the fact that a 4.2 dB reduction is achieved with only one IFFT and the lowest complexity makes OPS-DSI a very attractive method, suitable for FPGA implementation.
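CCDF curves of the kind compared above are usually estimated empirically: many random OFDM symbols are generated, the PAPR of each is computed, and Pr[PAPR > PAPR0] is taken as the fraction of symbols whose PAPR exceeds PAPR0. The following is a minimal sketch of that estimate, assuming QPSK subcarriers, 32 subcarriers, no oversampling and only 200 symbols; these values and the helper names are illustrative assumptions and do not reproduce the simulation setup behind Fig. 13.

```python
import cmath
import math
import random

def idft(freq):
    """Naive O(N^2) inverse DFT; adequate for a small illustrative N."""
    n = len(freq)
    return [sum(freq[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def papr_db(x):
    """Peak-to-average power ratio of a time-domain block, in dB."""
    powers = [abs(v) ** 2 for v in x]
    return 10.0 * math.log10(max(powers) / (sum(powers) / len(powers)))

def ccdf(paprs, papr0_db):
    """Empirical Pr[PAPR > PAPR0]: fraction of symbols exceeding papr0_db."""
    return sum(1 for p in paprs if p > papr0_db) / len(paprs)

rng = random.Random(1)
QPSK = [complex(a, b) for a in (-1, 1) for b in (-1, 1)]

# PAPR of 200 random OFDM symbols with 32 QPSK subcarriers each.
paprs = [papr_db(idft([rng.choice(QPSK) for _ in range(32)])) for _ in range(200)]

for papr0 in (4, 6, 8, 10):
    print(f"PAPR0 = {papr0:2d} dB  ->  Pr[PAPR > PAPR0] = {ccdf(paprs, papr0):.3f}")
```

Resolving a CCDF level of $10^{-4}$, as quoted in the text, needs far more than $10^{4}$ symbols, so a practical run would use a much larger symbol count and an FFT-based transform instead of the naive IDFT used here.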

In some papers, the PAPR performance is studied using time domain symbols. Fig. 14 presents 1024 samples of the output signal with and without PAPR reduction. The blue samples are the output signal without PAPR reduction and the red samples are the output signal when PAPR reduction is applied. It can be observed that the OFDM signal peaks are suppressed. However, the reduction seems to be insignificant. The reason is that the OPS-DSI scheme is a probabilistic method and the reduction is based on signal modification; therefore, the time domain graph is not an accurate study tool for this case.

Fig. 13. Comparison of PAPR performance (CCDF, Pr[PAPR>PAPR0], versus PAPR0 [dB]): (a) Original OFDM signal, (b) DSI method, L=55, (c) C-SLM method, M=8, (d) OPS-DSI method in simulation, M=1, (e) OPS-DSI in implementation, M=1

Fig. 14. OFDM signal in time domain (signal amplitude versus number of samples): (Blue) without PAPR reduction technique, (Red) with PAPR reduction technique

**6. Conclusion**

In this chapter the OFDM transmission system is studied. The main component, the IFFT processor, is introduced. Hardware implementation of this block is performed and the results are compared with the simulation. A very important application of the IFFT, the OPS-DSI PAPR reduction scheme, is reviewed. Different types of IFFT algorithms are tested, and hardware resource consumption and power consumption are estimated using the ISE tools. The complexity of implementing one IFFT block in the FPGA is computed mathematically.

**7. References**

Bauml R. W., Fischer R. F. H., Huber J. B. (1996). Reducing the peak-to-average power ratio of multicarrier modulation by selected mapping. *Electronics Letters, 32*(22): 2056-2057.

Baxley R. J., Zhou G. T. (2007). Comparing selected mapping and partial transmit sequence for PAR reduction. *IEEE Transactions on Broadcasting, 53*(4): 797-803.

Cao R., Jiang T., Qin J. (2007). Study on companding transforms for reduction in PAPR of OFDM signals. *Tien Tzu Hsueh Pao/Acta Electronica Sinica, 35*(6): 1099-1101.

Chang P., Jeng S., Chen J. (2010). Utilizing a novel root companding transform technique to reduce PAPR in OFDM systems. *International Journal of Communication Systems,* 23: 447-461.

Chen J. (2010). Application of quantum-inspired evolutionary algorithm to reduce PAPR of an OFDM signal using partial transmit sequences technique. *IEEE Transactions on Broadcasting,* 56(1): 110-113.

Cooper S. (2008). Digital Radio Techniques for Energy Efficient OFDM Base stations, Axis Network Technology [White Paper], Retrieved March 2009 from http://www.axisnt.com/downloads/DigitalRadioWP.pdf

Foomooljareon P., Fernando W.A.C. *Input sequence envelope scaling in PAPR reduction of OFDM*. Proceedings of the 5th International Symposium on Wireless Personal Multimedia Communications, Honolulu, 27-30 Oct. Hawaii, 2002.

Gao J., Wang J., Wang B. (2009). Peak-to-average power ratio reduction based on cyclic iteration partial transmit sequence. *3rd International Symposium on Intelligent Information Technology Application, IITA 2009,* 2: 161-164.

Ghassemi A., Gulliver T. (2010). PAPR reduction of OFDM using PTS and error-correcting code subblocking. *IEEE Transactions on Wireless Communications,* 9(3): 980-989.

Han S. H., Lee J. H. (2005). An overview of peak-to-average power ratio reduction techniques for multicarrier transmission. *IEEE Wireless Communications, 12*(2): 56-65.

Hao M., Lai C. (2010). Precoding for PAPR reduction of OFDM signals with minimum error probability. *IEEE Transactions on Broadcasting,* 56(1): 120-128.

Hao M., Liaw C. (2008). A companding technique for PAPR reduction of OFDM systems. *IEICE Transactions on Communications*, E91-B(3): 935-938.

Hao M., Liaw C. *A companding technique for PAPR reduction of OFDM systems*. Proceedings of International Symposium on Intelligent Signal Processing and Communications (ISPACS'06), Yonago, 12-15 Dec. Japan, 2006.

Hartley R.V.L. (1928). Transmission of Information. *Bell System Technical Journal*.

Heo S., Joo H., No J., Lim D., Shin D. *Analysis of PAPR reduction performance of SLM schemes with correlated phase vectors*. Proceedings of the IEEE International Symposium on Information Theory (ISIT 2009), Seoul, June 28-July 3 2009, Korea, 2009.

Higashinaka M., Fukui N., Kubo H. (2009). On peak to average power ratio of generalized frequency division multiple access. *IEICE Electronics Express*. 6(13): 943-948.

Hong E., Har D. (2010). Peak-to-average power ratio reduction in OFDM systems using all-pass filters. *IEEE Transactions on Broadcasting,* 56(1): 114-119.

Jayalath A., Tellambura C. (2000). Reducing the peak-to-average power ratio of orthogonal frequency division multiplexing signal through bit or symbol interleaving. *Electronics Letters. 36*(13): 1161-1163.

Jeon H. B., No J. S., Shin D. J. (2011). A Low-Complexity SLM Scheme Using Additive Mapping Sequences for PAPR Reduction of OFDM Signals. *IEEE Transactions on Broadcasting*, PP(99): 0-1.

Jones A., Wilkinson T., Barton S. (1994). Block coding scheme for reduction of peak to mean envelope power ratio of multicarrier transmission schemes. *Electronics Letters, 30*(25): 2098-2099.

Kang S. G., Kim J. G., Joo E. K. (1999). A novel subblock partition scheme for partial transmit sequence OFDM. *IEEE Transactions on Broadcasting.* 45(3): 333-338.

Kim S. W., Chung J. K., Ryu H. G., PAPR Reduction of the OFDM Signal by the SLM-based WHT and DSI Method, IEEE Region 10 Conference (TENCON-2006), Hong Kong, 14-17 Nov. China, 2006.

Krongold B. S., Jones D. L. (2003). PAR reduction in OFDM via active constellation extension. *IEEE Transactions on Broadcasting. 49*(3): 258-268.

Kwon J. W., Park S. K., Kim Y. *Peak-to-average power ratio reduction by the partial shift sequence method for space-frequency block coded OFDM systems*, Proceedings of 2009 IEEE International Conference on Network Infrastructure and Digital Content (IEEE IC-NIDC2009), Beijing 6-8 Nov. China, 2009.

Kwon U., Kim D., Im G. (2009). Amplitude clipping and iterative reconstruction of MIMO-OFDM signals with optimum equalization. *IEEE Transactions on Wireless Communications, 8*(1): 268-277.

May T., Rohling H., *Reducing the peak-to-average power ratio in OFDM radio transmission systems*, Proceedings of 48th IEEE Vehicular Technology Conference, VTC 1998, Ottawa, Canada, May 18-21 1998.

Muller S. H., Huber J. B. (1997). OFDM with reduced peak-to-average power ratio by optimum combination of partial transmit sequences. *Electronics Letters, 33*(5): 368-369.

Naeiny M. F., Marvasti F. (2011). Selected mapping algorithm for PAPR reduction of space-frequency coded OFDM systems without side information. *IEEE Transactions on Vehicular Technology,* 60(3): 1211-1216.

Nieto J.W. *An investigation of coded OFDM and CEOFDM waveforms utilizing different modulation schemes on HF channels*, Proceedings of the 6th International Symposium on Communication Systems, Networks and Digital Signal Processing (CNSDSP), Graz, July 23-25 Austria, 2008.

Nikookar H., Lidsheim K. S. (2002). Random phase updating algorithm for OFDM transmission with low PAPR. *IEEE Transactions on Broadcasting, 48*(2): 123-128.

Ochiai H. (2003). Performance analysis of peak power and band-limited OFDM system with linear scaling. *IEEE Transactions on Wireless Communications, 2*(5): 1055-1065.

Pratt T. G., Jones N., Smee L., Torrey M. (2006). OFDM link performance with companding for PAPR reduction in the presence of non-linear amplification. *IEEE Transactions on Broadcasting, 52*(2): 261-267.

Qian H., Xiao Ch., Chen N., Zhou G.T., *Dynamic selected mapping for OFDM*. Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05). Philadelphia, PA, 18-23 March USA. 2005.

Qian H. (2005). *Power Efficiency Improvements for Wireless Transmissions,* Doctoral dissertation, School of Electrical and Computer Engineering, Georgia Institute of Technology, USA.

Ryu H. G., Lee J. E., Park J. S. (2004). Dummy sequence insertion (DSI) for PAPR reduction in the OFDM communication system. *IEEE Transactions on Consumer Electronics. 50*(1): 89-94.

Sharma P.K., Basu A. *Performance Analysis of Peak-to-Average Power Ratio Reduction Techniques for Wireless Communication Using OFDM Signals*, Proceedings of the International Conference on Advances in Recent Technologies in Communication and Computing (ARTCom), 16-17 Oct. India, 2010.

Tan C.E., Wassell I.J. *Data bearing peak reduction carriers for OFDM systems*. Proceedings of the 2003 Joint Conference of the Fourth International Conference on Information, Communications and Signal Processing and the Fourth Pacific Rim Conference on Multimedia, December 15-18, Singapore, 2003.

Tellado J. (2000). *Multicarrier modulation with low PAR. Applications to DSL and wireless*. Kluwer Academic Publication. Dordrecht, Netherlands, 2000.

Tellado J. (2000). *Peak to average power reduction for multicarrier modulation.* Doctoral dissertation, Stanford University, USA.

Van N. R., De W. A. *Reducing the peak-to-average power ratio of OFDM*. Proceedings of 48th IEEE Vehicular Technology Conference (VTC 1998), Ottawa, May 18-21 Canada, 1998.

Varahram P., Mohammady S., Hamidon M.N., Sidek R.M., Khatun S. (2009). Digital predistortion technique for compensating memory effects of power amplifiers in wideband applications. *Journal of Electrical Engineering.* 60(3): 129-135.

Vijayarangan V., Sukanesh D. (2009). An overview of techniques for reducing Peak to Average Power Ratio and its selection criteria for orthogonal frequency division multiplexing radio systems. *Journal of theoretical and applied information technology.* 5(1): 25-36.

Wang L., Liu J. (2011). PAPR Reduction of OFDM Signals by PTS with Grouping and Recursive Phase Weighting Methods, *IEEE Transactions on Broadcasting.* 57(2): 299-306.


**7** 

## **Fault Diagnosis of Induction Motors Based on FFT**

Castelli Marcelo, Juan Pablo Fossatti and José Ignacio Terra

*Universidad de Montevideo* 

*Uruguay* 

**1. Introduction** 

The FFT (Fast Fourier Transform) can be used for on-line failure detection of asynchronous motors. In this work a methodology is described for the most likely faults in induction motors: broken rotor bars, bearing damage, short circuits and eccentricity.

In industrialized countries, induction motors are responsible for 40% to 50% of energy consumption. In most cases the electric motors are responsible for the proper functioning of the productive system. Corrective maintenance of equipment is therefore very expensive, since it involves unscheduled downtime and damage to the production process caused by equipment failures.

Nowadays, there are many published techniques and available commercial tools for induction motor failure detection. Despite this, most industries still do not use detection and monitoring techniques for electrical machines.

Here a methodology for monitoring and diagnosis of induction motors is presented, which monitors the motor without removing it from the production line. This methodology is reliable, easy to apply and low cost.

First, a brief study is made of the most likely faults to occur in induction motors, as well as of the existing methods for detecting them.

Then, a methodology based on the measurement of the stator current signal is developed. It can show the frequency and magnitude of each failure that may occur in this kind of motor.

After that, the methodology is validated in the laboratory.

**2. Failures in induction motors** 

Most failures in induction motors can be classified in two main groups: insulation failures and mechanical failures (Botha, 1997).

Insulation failures are commonly characterized by short circuits in the stator coils, while mechanical faults are commonly associated with rotor or rotor related damage. The most important mechanical failures are: broken rotor bars and rings, bearing damage, irregular gaps (static and dynamic eccentricities), unbalances, refrigeration troubles, etc.

