172 Applications of Digital Signal Processing

…grid of the time series, OMP can be improved by evaluating the cost function on an over-complete dictionary (Candes et al., 2011), but as in the ell-1 estimates discussed above, this step becomes computationally intractable long before machine precision can be obtained for arbitrary frequencies.

**2.3 Nonlinear Least Squares method**

Here we follow the development of nonlinear least squares (NLS) given by Stoica and Moses (1997). Their eq. (4.3.1) defines a cost function to be minimized as a function of the vectors ω, α, and φ,

$$f(\omega, \alpha, \varphi) = \sum_{t=0}^{N-1} \Big| y(t) - \sum_{k=1}^{K} \alpha_k \exp[i(\omega_k t + \varphi_k)] \Big|^2, \tag{3}$$

where the sums are over the number of sinusoids present in the signal, *k* = 1 to *K*, and the time points run from *t* = 0 to *N*-1. While eq. (3) appears to require simultaneous solution for three real vectors, each of length *K*, Stoica and Moses (eqs. 4.3.2-4.3.8) show that the frequency vector ω is the critical unknown and that the amplitude and phase (or complex amplitude) are simply "nuisance" parameters: the problem can be reduced to solving for the frequency vector ω alone, and the complex amplitude vector can then be calculated directly from the frequency vector. We use a version of these equations below in eqs. (8) and (13).

In principle, solution of the CS analog of eq. (3) could be performed to directly obtain the parameters of a sparse signal, but in practice, direct solution of eq. (3) is not computationally practical (Stoica and Moses, 1997). The difficulty is that even for a small *K*, eq. (3) is highly multimodal (see, for example, Fig. 1 in Li et al., 2000), and the solution requires extremely good first guesses for the vector ω. Even with good initial values for ω, performance guarantees are difficult to find and continue to be the subject of intense investigation (Salzo and Villa, 2011, and references therein).

Similar two-step model-based approaches to estimating the frequency and amplitude of real and complex sinusoids have been discussed previously in the literature (Stoica et al., 2000; Li et al., 2000; Chan and So, 2004; Christensen and Jensen, 2006). Stoica et al. discuss the use of NLS to obtain the amplitude for complex sinusoidal signals given the frequency; Li et al. and Chan and So discuss a combined matching pursuit/NLS approach similar to ours for obtaining the frequencies of complex and real harmonic sinusoidal signals, respectively; and Christensen and Jensen use matching pursuit plus NLS to estimate frequencies in a signal that is the sum of real sinusoids of arbitrary frequency. To the best of our knowledge, this is the first application of an OMP/NLS algorithm to estimate the frequency and amplitude from CS measurements.

**3. Formulation of OMP with an NLS step for CS**

#### **3.1 Mathematical development**

Consider a continuous time signal *X*(*t*) consisting of *K* complex sinusoids of the form

$$X(t) = \sum_{k=1}^{K} a_k \exp(2\pi i \, f_k \, t), \tag{4}$$

where *a*<sub>k</sub> is the complex valued amplitude and *f*<sub>k</sub> is the real valued frequency of the *k*th sinusoid. This signal model is broadly applicable [see Duarte and Baraniuk (2010) and references therein]. We take *f*<sub>min</sub> = 0 and *f*<sub>max</sub> = 1 to set up our test problem; we sample *X*(*t*) at the Nyquist rate for complex signals, Δ*t* = 1/*f*<sub>max</sub> = 1, to obtain the sampled time series *X*<sub>S</sub> of length *N* from *t* = 0 to *t* = *N*-1, where *N* is the number of time samples. As in all applications of compressive sensing, we make a sparsity assumption, *K* << *N*, and mix the input signal vector *X*<sub>S</sub> plus sampled noise *n* down in dimension to the measured vector *y* of dimension *M*:

$$y = \Phi(X_S + n), \tag{5}$$

where Φ is an *M* x *N* mixing matrix and *K* < *M* << *N*. Note that in eq. (5) *n* is added to *X*<sub>S</sub> prior to mixing; in other models, noise is added to the mixed version of *X*<sub>S</sub>, Φ*X*<sub>S</sub>, or even to Φ itself. We generate the elements of Φ using the pseudo-random number functions in our software (*Mathematica* and *Python*) such that they are taken uniformly from the set of nine complex numbers {-1 - i, -1, -1 + i, -i, 0, i, 1 - i, 1, 1 + i}; equivalently, each element is the sum of a random integer drawn from {-1, 0, 1} plus i times an independent random integer drawn from {-1, 0, 1}. We use a complex mixing matrix because our signal model is complex. The noise is assumed to be independent and identically distributed (i.i.d.) Gaussian noise with standard deviation σ/2<sup>1/2</sup>, added to the real and the imaginary part of each element of *X*<sub>S</sub>, so that the covariance of *n* is σ<sup>2</sup>*I*, where *I* is the *N* x *N* identity matrix. If the frequencies lie on the DFT frequency grid associated with the time series *t* = 0 to *t* = *N*-1, eq. (5) can be solved for the frequencies by writing *s* = **DFT** *X*<sub>S</sub>, substituting *X*<sub>S</sub> = **IDFT** *s* (**IDFT** = inverse DFT) in eq. (5), and solving *y* = Φ(**IDFT** *s* + *n*) by minimizing the ell-1 norm of *s* subject to the measurement constraint eq. (5) if *n* = 0, or, in the presence of noise, by minimizing the ell-1 norm penalized by a fraction of the constraint (LASSO) (Candes & Wakin, 2008; Loris, 2008).
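The measurement model of eqs. (4) and (5) can be sketched in a few lines of numpy. This is a minimal illustration, not the chapter's code: the sizes, the random seed, and all variable names are our own choices.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 1024, 8            # signal length and number of measurements (illustrative)
sigma = 0.0               # noise standard deviation; 0 gives the noiseless case

# Sparse signal of eq. (4): here K = 1 complex sinusoid sampled at t = 0..N-1
t = np.arange(N)
f1, a1 = 0.44681715350529533, 0.8018794857541801
X_S = a1 * np.exp(2j * np.pi * f1 * t)

# M x N mixing matrix with entries drawn from {-1, 0, 1} + i*{-1, 0, 1}
Phi = rng.integers(-1, 2, size=(M, N)) + 1j * rng.integers(-1, 2, size=(M, N))

# i.i.d. Gaussian noise with std sigma/sqrt(2) in each of the real and imaginary parts
n = (sigma / np.sqrt(2)) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))

y = Phi @ (X_S + n)       # eq. (5): the noise is added before mixing
```

Note that `rng.integers(-1, 2, ...)` draws from {-1, 0, 1} (the upper bound is exclusive), which reproduces the nine-element complex alphabet described above.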

Although the noise is assumed to be i.i.d., the mixing matrix Φ colors the noise in the observation vector *y*. The covariance of *y* is given by

$$\mathrm{Cov}[y] = \sigma^2\, \Phi\, \Phi^{\mathrm{H}}, \tag{6}$$

and the standard maximum likelihood estimator requires definition of a weighting matrix *W*,

$$W = (\Phi\, \Phi^{\mathrm{H}})^{-1}, \tag{7}$$

where the superscript H indicates the Hermitian conjugate (see Stoica et al., 2000, and Chan and So, 2004, for a discussion of weighted estimators in NLS). If the inverse in eq. (7) is ill-conditioned or does not exist, this indicates a poor choice of mixing matrix Φ, and another one should be chosen. The maximum likelihood estimate (MLE) of *X*<sub>S</sub> is obtained by finding the vector *Z* that minimizes the weighted square of the residual, *C*(*Z*), given by

$$C(Z) \equiv (\Phi Z - y)^{\mathrm{H}}\, W\, (\Phi Z - y), \tag{8}$$

where *Z* is a vector taken from the linear subspace spanned by at most *K* complex sinusoids sampled over *t* = 0 to *N*-1 (see the corresponding equation for determining the amplitude and frequency of a sum of complex sinusoids in a system that does not have compressive sensing, Stoica and Moses, 1997, eq. 4.3.6). CS recovery is equivalent to determining the spectral support (that is, the *K* unknown frequencies) of the input signal *X*<sub>S</sub>, or equivalently determining the vector *Z* that minimizes eq. (8) (Duarte & Baraniuk, 2010). In the absence of noise, weighting with *W* is unnecessary because the solution is exact and both the weighted and un-weighted residuals are zero. Finding the *K* sinusoids that solve eq. (8) is the standard NLS problem, and if this were computationally tractable, the problem would be solved. But as pointed out by Li et al. (2000) [see also the discussion in Stoica & Moses (1997)], "the NLS cost function in (3) is usually multimodal with many local minima," and "the minimization of the NLS cost function requires the use of a very fine searching algorithm and may be computationally prohibitive."

Applications of the Orthogonal Matching Pursuit/Nonlinear Least Squares Algorithm to Compressive Sensing Recovery

One way out of this dilemma is to use NLS in place of least squares within OMP. This has two advantages over using NLS by itself. First, the frequency band over which one has to search in NLS is reduced from the entire frequency band to the frequency grid spacing in the over-complete dictionary used in OMP. Second, the estimates of the frequencies at any given iteration are improved from the values on the grid by using NLS in the previous iteration (see the discussion of a similar continuous version of matching pursuit by Jacques & De Vleeschouwer, 2008).

The first step in our formulation is to define the vector function of frequency, *x*(*f*), as the time series for a unity amplitude complex sinusoid at frequency *f* evaluated at integral sampling times *t* = 0 to *t* = *N -*1,

$$x(f) = \{1,\; e^{i 2\pi f},\; e^{i 4\pi f},\; \ldots,\; e^{i 2\pi (N-1) f}\}. \tag{9}$$
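In code, the dictionary atom of eq. (9) and the over-complete frequency grid are straightforward. A minimal sketch (the function and variable names are ours; `Nf` stands for the oversampling factor *N*<sub>f</sub> introduced below):

```python
import numpy as np

def x(f, N):
    """Unit-amplitude complex sinusoid at frequency f, sampled at t = 0..N-1 (eq. 9)."""
    return np.exp(2j * np.pi * f * np.arange(N))

# Over-complete dictionary grid: Nf times finer than the DFT frequency grid
N, Nf = 1024, 32
fmin, fmax = 0.0, 1.0
grid = fmin + (fmax - fmin) * np.arange(N * Nf) / (N * Nf)
```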

Note that the solution for *X***S** in eq. (5) is a linear combination of *K* vectors *x*(*f***i**), (*i* = 1,*K*). To use OMP, we need an over-complete dictionary (Candes et al., 2010) which means that *x***(***f***)** is evaluated on a fine grid oversampled by the factor *N***f** from the DFT grid. The second step is to define a function that can be evaluated on the fine grid to find a grid frequency close to one of the true frequencies in the input signal *Xs*. Here we use the function *G(f*, *r*) given by

$$G(f,r) = 1\Big/\left|\, r^{\mathrm{H}} W r \;-\; \frac{\bigl|[\Phi\, x(f)]^{\mathrm{H}}\, W\, r\bigr|^{2}}{[\Phi\, x(f)]^{\mathrm{H}}\, W\, [\Phi\, x(f)]} \,\right|, \tag{10}$$

where initially *r* = *y* and subsequently *r* equals the residual of *y* with 1 to *K* components removed, as discussed below. We calculate the argmax of *G*(*f*, *y*) over *f* in the dictionary frequencies to make a first estimate of one of the frequencies present in the input signal *X*(*t*). If there is no noise and there is only one sinusoid, this procedure provides the dictionary vector whose frequency is nearest that of the input sinusoid. If multiple sinusoids are present, the maximum of *G*(*f*, *y*) occurs at one of the dictionary vectors whose frequency is near one of the input sinusoids, provided that the dictionary is sufficiently over-complete and that Φ possesses the restricted isometry property (Duarte and Baraniuk, 2010). Note that *G*(*f*, *r*) is the inverse square of the distance between *r* and the linear span of *x*(*f*) in the *W*-normed inner product space (defined by <*a*, *b*> = *a*<sup>H</sup>*W b*). Thus finding the argmax of *G*(*f*, *r*) is equivalent to finding the argmax of the inner product of the residual with the product of Φ times the dictionary vectors *x*(*f*<sub>j</sub>) for all *f*<sub>j</sub> on the over-complete frequency grid (see Tropp and Gilbert, 2007, Algorithm 3, Step 2).
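A direct implementation of eq. (10) and of the argmax search can be sketched as follows. This is an illustrative sketch under our own small problem sizes and seed; `G`, `Phi`, `W`, and the other names are assumptions, not the chapter's code.

```python
import numpy as np

def G(f, r, Phi, W):
    """Eq. (10): inverse squared W-distance between r and the span of Phi @ x(f)."""
    N = Phi.shape[1]
    u = Phi @ np.exp(2j * np.pi * f * np.arange(N))    # mixed dictionary atom
    d2 = (r.conj() @ W @ r).real - abs(u.conj() @ W @ r) ** 2 / (u.conj() @ W @ u).real
    return 1.0 / abs(d2)

# Example: noiseless single sinusoid; the argmax lands on a grid point near f_true
rng = np.random.default_rng(1)
N, M, Nf = 64, 16, 4
f_true = 0.3
Phi = rng.integers(-1, 2, (M, N)) + 1j * rng.integers(-1, 2, (M, N))
W = np.linalg.inv(Phi @ Phi.conj().T)                  # eq. (7)
y = Phi @ np.exp(2j * np.pi * f_true * np.arange(N))
grid = np.arange(N * Nf) / (N * Nf)
f_hat = grid[np.argmax([G(f, y, Phi, W) for f in grid])]
```

The search over the grid is the expensive step; evaluating all `N * Nf` correlations can be vectorized, but the loop form above mirrors eq. (10) most directly.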

Given estimates of the frequencies **{***f***1,***f***2,…***f***K}** present in the input signal, we can find estimates of the amplitudes of each sinusoid by using the least squares estimator *A***(***U***)** for the amplitude vector **{***a***1,***a***2,…***a***K}** (see Stoica and Moses, 1997, eq. 4.3.8 and Stoica et al., 2000)

$$A(U) = (U^{\mathrm{H}}\, W\, U)^{-1}\, U^{\mathrm{H}}\, W\, y, \tag{11}$$

where *U* is the spectral support matrix, which depends on {*f*<sub>1</sub>, *f*<sub>2</sub>, …, *f*<sub>K</sub>} through

$$U = \Phi\,\{x(f_1),\; x(f_2),\; \ldots,\; x(f_K)\}. \tag{12}$$

Note that if there is no noise and if all frequencies are known exactly, eq. (11) can be verified by substituting *y* **=** *U A***(***U***),** which is equivalent to eq. (5), on the right hand side of eq. (11).
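The estimator of eq. (11) is a small weighted least squares solve. A sketch of it and of the noiseless consistency check just described (all names and problem sizes are illustrative assumptions):

```python
import numpy as np

def A(U, W, y):
    """Weighted least squares amplitude estimator of eq. (11)."""
    return np.linalg.solve(U.conj().T @ W @ U, U.conj().T @ W @ y)

# Noiseless check: with the true frequencies, A(U, W, y) recovers the amplitudes
rng = np.random.default_rng(2)
N, M = 64, 16
t = np.arange(N)
freqs = [0.12, 0.47]
amps = np.array([1.0 + 0.5j, -0.3 + 0j])
Phi = rng.integers(-1, 2, (M, N)) + 1j * rng.integers(-1, 2, (M, N))
W = np.linalg.inv(Phi @ Phi.conj().T)
U = np.column_stack([Phi @ np.exp(2j * np.pi * f * t) for f in freqs])  # eq. (12)
y = U @ amps                                                            # eq. (5), n = 0
a_hat = A(U, W, y)
```

Using `np.linalg.solve` rather than forming the inverse of *U*<sup>H</sup>*WU* explicitly is the usual numerically preferable form of eq. (11).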

Finally, starting from the estimates of the frequencies and amplitudes from OMP as described above, we apply weighted NLS to obtain better values. This is done by finding the frequency or set of frequencies *f* = {*f*<sub>1</sub>, *f*<sub>2</sub>, …, *f*<sub>K</sub>} that minimizes the functional *R*(*f*) given by

$$R(f) = \bigl|\{A[U(f)]\, U(f) - y\}^{\mathrm{H}}\; W\; \{A[U(f)]\, U(f) - y\}\bigr|, \tag{13}$$

which is the same as the weighted least squares estimator given by eq. (8) with the substitution of *A***[***U(f)***]** defined by eq. (11) for the amplitude vector and *U***(***f***)** defined by eq. (12) for the mixed sinusoids (see the analogous equations in Stoica and Moses, 1997, eqs. 4.3.7 and 4.3.8). The product *A***[***U(f)***]** *U***(***f***)** in eq. (13) is the same as **Ʒ***Z* in eq. (8).

#### **3.2 Algorithm description**


As described in Table 1, the first step in the first iteration of the Do loop is estimation of the first frequency in the spectral support of the signal *X*<sub>S</sub>. This is given by the frequency of the sinusoid whose image after multiplication by Φ has the maximum correlation with the observation vector *y* (see, for example, Tropp and Gilbert, 2007, Algorithm 3, step 2). Here we use the equivalent form, the argmax of *G*(*f*, *y*) with respect to *f*, to obtain the first estimate of the frequency of the first sinusoid, *f*<sub>1</sub> in eq. (4). At this point, previous implementations of discrete OMP use the amplitude estimator eq. (11) to estimate the amplitude of the first sinusoid, *a*<sub>1</sub> = *A*[Φ*x*(*f*<sub>1</sub>)], multiply this amplitude estimate by *x*(*f*<sub>1</sub>), given by eq. (9), and by the measurement or mixing matrix Φ, and subtract this vector from the measurement vector *y* to form the first residual *r*<sub>1</sub>.

In our algorithm, we proceed differently by improving the precision of the frequency estimates using NLS before finding the amplitude estimate. We take the frequency *f*<sub>1</sub> from the argmax of *G*(*f*, *y*) evaluated on a discrete set of frequencies and use it as the starting value for the NLS problem given by eq. (13). We have used several methods and several different software packages to solve the NLS problem. A simple decimation routine [i.e., tabulating *R*(*f*) from *f*<sub>1</sub>-Δ*f* to *f*<sub>1</sub>+Δ*f* (Δ*f* is the over-complete grid spacing) in 10 steps, finding the argmin, decreasing Δ*f* by a factor of 10, and tabulating and finding the argmin of *R*(*f*) again until the specified precision is reached] works well but is not very efficient. Powell's method in *Python* ("scipy.optimize.fmin\_powell"), one of the Newton methods, the PrincipalAxis method, and the Conjugate Gradient method in *Mathematica*'s minimizer "FindMinimum" all work and take less time than the decimation routine. A detailed investigation of minimizers for the NLS step in our version of OMP is beyond the scope of this chapter. The oversampling *N*<sub>f</sub> required for our method and that required for conventional OMP are nearly identical, as discussed below in Section 6.
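The decimation routine described above is easy to sketch as a stand-alone helper. Here `R` stands for any scalar function of frequency, and the grid of 21 points per pass is our own choice, mirroring the described 10-step tabulation only loosely:

```python
import numpy as np

def decimate_argmin(R, f1, df, tol=1e-12):
    """Repeatedly tabulate R on [f1 - df, f1 + df], keep the argmin,
    and shrink df by a factor of 10 until the requested precision is reached."""
    while df > tol:
        grid = np.linspace(f1 - df, f1 + df, 21)
        f1 = grid[np.argmin([R(f) for f in grid])]
        df /= 10.0
    return f1
```

Each pass narrows the bracket by a factor of 10, so reaching near machine precision from an over-complete grid spacing of order 10<sup>-5</sup> takes roughly a dozen passes of ~21 function evaluations each; the library minimizers named above typically need far fewer evaluations, which is why they are faster.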

Given the better value of *f*<sub>1</sub>, we compute *a*<sub>1</sub> from eq. (11) and a new value of the residual *r*, with the NLS estimate of the first signal removed from *y* as in OMP:

$$r_1 = y - A(U_1)\, U_1 = y - a_1\, \Phi\, x(f_1), \tag{14}$$

where *U*<sub>1</sub> = Φ*x*(*f*<sub>1</sub>). The argmax of *G*(*f*, *r*<sub>1</sub>) now yields a first estimate of the frequency of the second sinusoid, *f*<sub>2</sub>. Next we improve the estimates of both *f*<sub>1</sub> and *f*<sub>2</sub> by again solving the NLS problem, minimizing the functional *R*(*f*) over *f* = {*f*<sub>1</sub>, *f*<sub>2</sub>}. Note that this overwrites the previous estimate of the first frequency *f*<sub>1</sub>. The amplitudes *a*<sub>1</sub> and *a*<sub>2</sub> are recalculated using eq. (11) with *U*<sub>2</sub> given by

$$U_2 = [\,\Phi x(f_1),\; \Phi x(f_2)\,] \tag{15}$$

for the latest values of *f*<sub>1</sub> and *f*<sub>2</sub>. Finally, in this iteration the estimates of the first two sinusoids are removed from *y*:

$$r_2 = y - A(U_2)\, U_2. \tag{16}$$

If *K*, the total number of sinusoids present in the signal, is known, this process is repeated *K* times until *f*<sub>K</sub> and *a*<sub>K</sub> are obtained. In the absence of noise, the sum of these sinusoids solves (5) exactly and *r*<sub>K</sub> = 0.

Inputs:
- *f*<sub>min</sub>, *f*<sub>max</sub>
- Maximum number of sinusoids *K* or threshold *T*
- Oversampling ratio for dictionary *N*<sub>f</sub>
- CS mixing matrix Φ
- Measured data *y*

Initialize: *U* = [ ], *r*<sub>0</sub> = *y*, *K*<sub>T</sub> = *K*, *W* = (Φ Φ<sup>H</sup>)<sup>-1</sup>, Δ*f* = (*f*<sub>max</sub> - *f*<sub>min</sub>)/(*N N*<sub>f</sub>)

Do *i* = 1 to *K*
- *f*<sub>i</sub> = Argmax *G*(*f*, *r*<sub>i-1</sub>) over {*f*<sub>min</sub>, *f*<sub>min</sub>+Δ*f*, …, *f*<sub>max</sub>-Δ*f*, *f*<sub>max</sub>}
- {*f*<sub>1</sub>, *f*<sub>2</sub>, …, *f*<sub>i</sub>} = Argmin[*R*(*f*)] with initial value *f* = {*f*<sub>1</sub>, *f*<sub>2</sub>, …, *f*<sub>i</sub>}
- *U* = {Φ*x*(*f*<sub>1</sub>), Φ*x*(*f*<sub>2</sub>), …, Φ*x*(*f*<sub>i</sub>)}
- *r*<sub>i</sub> = *y* - *A*(*U*) *U*
- If *r*<sub>i</sub><sup>H</sup>*W r*<sub>i</sub> < *T*: *K*<sub>T</sub> = *i*; Break; End If

End Do

Outputs: *K*<sub>T</sub>; {*f*<sub>1</sub>, *f*<sub>2</sub>, …, *f*<sub>KT</sub>}; {*a*<sub>1</sub>, *a*<sub>2</sub>, …, *a*<sub>KT</sub>} = *A*[{Φ*x*(*f*<sub>1</sub>), Φ*x*(*f*<sub>2</sub>), …, Φ*x*(*f*<sub>KT</sub>)}]

Table 1. OMP/NLS Algorithm.

There are two methods to handle the case where the actual number of sinusoids present is unknown but still smaller than *K*. The simpler method, applicable at high SNR (σ small compared to the smallest signal amplitude), is to perform *K* iterations of the OMP/NLS algorithm, which incurs an additional noise folding penalty by projecting the additional noise dimensions onto the solution. The second method is to stop when the residual can be explained by noise alone through hypothesis testing. At the solution, the weighted squared residual *r*<sub>k</sub><sup>H</sup>*W r*<sub>k</sub> will display a χ-squared statistic with 2*k* degrees of freedom, where *k* is the actual number of sinusoids present in the signal. The hypothesis that the residual is caused by noise alone is accepted when *r*<sub>k</sub><sup>H</sup>*W r*<sub>k</sub> < σ<sup>2</sup>*T* for some threshold *T* and rejected otherwise. The value of *T* depends on a user selectable significance level, the probability of incorrectly rejecting the given hypothesis. For a significance level of α, *T* = CDF<sup>-1</sup>(1-α), where CDF is the cumulative distribution function of the chi-squared distribution with 2*k* degrees of freedom. We used α = 0.05 in our simulations, but α is an application-specific parameter.

**4. Results for sparse sinusoids without noise**

**4.1 Signal composed of a single sinusoid**

Consider first a signal of the form given by eq. (4) with *K* = 1, *f*<sub>1</sub> = 0.44681715350529533 and *a*<sub>1</sub> = 0.8018794857541801. Fig. 1 shows a plot of *G*(*f*, *y*) for *N* = 1024, *M* = 4. Finding the argmax of *G*(*f*, *y*) evaluated for 32,768 frequencies between *f*<sub>min</sub> and *f*<sub>max</sub> (*N*<sub>f</sub> = 32) yields an initial value for the frequency and amplitude of *f*<sub>1</sub> = 0.446808 and *a*<sub>1</sub> = 0.801070+0.026421i. Minimization of *R*(*f*) starting at *f*<sub>1</sub> = 0.446808 yields a final value of *f*<sub>1</sub> equal to the input frequency with error less than 1x10<sup>-16</sup> (machine precision), and the amplitude *a*<sub>1</sub>, obtained through *A*[{Φ*x*(*f*<sub>1</sub>)}], equal to the input amplitude with error less than 4x10<sup>-16</sup>.

Fig. 1. *G*(*f*, *y*) as a function of frequency for a signal composed of a single sinusoid mixed with a 4 x 1024 mixing matrix. Note the appearance of a single strong peak in the estimator that serves as an excellent starting value for minimizing the functional *R*(*f*) given in eq. (13).

**4.2 Signal composed of 20 sinusoids**

The algorithm also works for multiple frequencies. More than 20 independent tests were performed for an input signal composed of 20 independent frequencies randomly chosen between 0 and 1; all frequency components have amplitude of 1. In all tests our algorithm recovered the 20 frequencies to machine precision with a 128x1024 mixing matrix. For test 1,
