### **2.1 Compressive sensing**

In compressive sensing (Donoho, 2006; Candes & Tao, 2006; Baraniuk, 2007), a sparse vector *s* of dimension *N* can be recovered from a measured vector *y* of dimension *M* (*M* << *N*) after transformation by a sensing matrix Θ as shown in eq. (1)

$$y = \Theta \, s + n \tag{1}$$

where *n* is a noise vector. Often, Θ is factored into two matrices, Θ = ΦΨ, where Φ is a "random" mixing matrix and Ψ is a Hermitian matrix with columns that form a basis in which the input vector is sparse. A canonical example is the case in which the input is a time series with samples taken from a single sinusoid with an integer number of periods in the time window. These data are not sparse but are transformed into a sparse vector by the discrete Fourier transform (DFT). Note that although Θ is not square and hence not invertible, Ψ is both square and invertible. Work in compressive sensing has shown (Donoho, 2006; Candes & Tao, 2006; Baraniuk, 2007) that under quite general conditions, all *N* components of *s* may be recovered from the much smaller number of measurements in *y*. With no noise (*n* = 0), recovery proceeds by minimizing the ell-1 norm of a test vector *s*′ (the ell-1 norm of *s*′ is given by the sum of the absolute values of the elements of *s*′) subject to the constraint *y* = Θ*s*′. In the presence of noise, recovery proceeds by minimizing a linear combination of the ell-1 norm of the target vector and the ell-2 norm of the residual vector given by *y* − Θ*s*′

$$s'(\lambda) = \operatorname{argmin}_{s} \left( \lambda \, \| s \|_1 + \| y - \Theta s \|_2 \right) \tag{2}$$

where the parameter λ is chosen such that the signal is optimally recovered (Baraniuk, 2007; Loris, 2008).
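
As a concrete illustration, the following minimal Python sketch recovers a sparse vector by iterative soft-thresholding (ISTA), a standard solver for penalized problems of the form of eq. (2) (here in the commonly used variant with a squared ell-2 residual and a real-valued toy problem); the sizes, the Gaussian sensing matrix, and the value of λ are illustrative assumptions, not values from this chapter:

```python
import numpy as np

def ista(Theta, y, lam, n_iter=500):
    """Minimize lam*||s||_1 + 0.5*||y - Theta s||_2^2 by
    iterative soft-thresholding (gradient step + shrinkage)."""
    L = np.linalg.norm(Theta, 2) ** 2          # Lipschitz constant of the gradient
    s = np.zeros(Theta.shape[1])
    for _ in range(n_iter):
        g = s + Theta.T @ (y - Theta @ s) / L  # gradient step on the residual term
        s = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)  # soft threshold
    return s

rng = np.random.default_rng(0)
N, M, K = 256, 64, 4                           # K-sparse vector, M << N measurements
Theta = rng.standard_normal((M, N)) / np.sqrt(M)
s_true = np.zeros(N)
s_true[rng.choice(N, K, replace=False)] = rng.standard_normal(K)
y = Theta @ s_true + 0.01 * rng.standard_normal(M)
s_hat = ista(Theta, y, lam=0.01)
print(np.flatnonzero(np.abs(s_hat) > 0.05))    # should match the true support
```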

### **2.2 Orthogonal Matching Pursuit method**

Orthogonal matching pursuit (OMP) is an alternative method for finding the target vector *s* from the measurement vector *y*. Matching pursuit has a rich history in signal processing that long predates CS and has appeared under many names (Mallat & Zhang, 1993; Pati et al., 1993; Davis et al., 1997). With the advent of CS, many variants of OMP have been applied to recovery, including methods called MOMP, ROMP, CoSaMP, etc. (Needell and Tropp, 2008; Needell and Vershynin, 2009; Huang and Zhu, 2011), but with one exception (Jacques and De Vleeschouwer, 2008), discussed below, all of these methods recover frequencies (or other parameters) from discrete grids.

The basic idea of all matching pursuit algorithms is to minimize a cost function to obtain the frequencies of the sinusoids present in the signal. First, take the frequency corresponding to the smallest value of the cost function and calculate the linear least-squares estimate of the complex amplitude at this frequency. Second, mix this sinusoid with the known CS mixing matrix Φ and remove this mixed approximation to the first sinusoid from the CS measurement vector [*y* in eq. (1)]. This process is repeated until a known number of sinusoids is found or a system-defined threshold is reached. For frequencies not on the DFT grid of the time series, OMP can be improved by evaluating the cost function on an overcomplete dictionary (Candes et al., 2011), but as in the ell-1 estimates discussed above, this step becomes computationally intractable long before machine precision can be obtained for arbitrary frequencies.
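
To make the greedy loop concrete, here is a minimal on-grid OMP sketch in Python (one of the software environments used here); the random complex mixing matrix, the DFT-grid dictionary, and the noiseless measurement are illustrative assumptions, and this is a sketch of the generic algorithm rather than the exact implementation of this chapter:

```python
import numpy as np

def omp(A, y, K):
    """Greedy OMP: pick the dictionary column most correlated with the
    residual, re-fit all selected amplitudes by least squares, repeat."""
    support, amps, residual = [], None, y.copy()
    for _ in range(K):
        j = int(np.argmax(np.abs(A.conj().T @ residual)))  # best-matching atom
        support.append(j)
        amps, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ amps                # remove the fitted part
    s = np.zeros(A.shape[1], dtype=complex)
    s[support] = amps
    return s

rng = np.random.default_rng(1)
N, M, K = 128, 32, 3
Phi = rng.integers(-1, 2, (M, N)) + 1j * rng.integers(-1, 2, (M, N))
t = np.arange(N)
F = np.exp(2j * np.pi * np.outer(t, t) / N)   # columns: DFT-grid sinusoids
A = Phi @ F                                   # dictionary mixed into measurement space
s_true = np.zeros(N, dtype=complex)
s_true[[5, 40, 90]] = [1.0, 0.7 - 0.2j, 0.5j]
y = A @ s_true                                # noiseless CS measurements
print(np.flatnonzero(np.abs(omp(A, y, K)) > 1e-8))  # expected: [ 5 40 90]
```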


### **2.3 Nonlinear Least Squares method**

Here we follow the development of nonlinear least squares (NLS) given by Stoica and Moses (1997). Their eq. (4.3.1) defines a cost function to be minimized as a function of the frequency, amplitude, and phase vectors ω, α, and φ

$$f(\omega, \alpha, \varphi) = \sum_{t} \Big| \, y(t) - \sum_{k} \alpha_k \exp\left[ i \left( \omega_k t + \varphi_k \right) \right] \Big|^2 \tag{3}$$

where the sums run over the *K* sinusoids present in the signal, *k* = 1 to *K*, and over the time points *t* = 0 to *N*−1. Stoica and Moses also show (see their eqs. 4.3.2–4.3.8) that the frequency vector ω is the critical unknown, while the amplitude and phase (or complex amplitude) are simply "nuisance" parameters obtained from ω. Thus, although eq. (3) appears to require simultaneous solution for three real vectors, each of length *K*, the problem reduces to solving for the frequency vector ω alone, and the complex amplitude vector can be calculated directly from it. We use a version of these equations below in eqs. (8) and (13).
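
As a concrete illustration of this reduction, the following Python sketch solves the linear least-squares problem for the complex amplitudes given a trial frequency vector ω and returns the corresponding value of the eq. (3) cost; the signal, frequencies, and series length are made-up values for demonstration only:

```python
import numpy as np

def nls_cost(y, omegas):
    """Given trial frequencies, solve the linear LS problem for the
    complex amplitudes and return the eq. (3) cost; only the frequency
    vector remains a free (nonlinear) parameter."""
    t = np.arange(len(y))
    S = np.exp(1j * np.outer(t, omegas))          # columns: trial sinusoids
    amps, *_ = np.linalg.lstsq(S, y, rcond=None)  # nuisance parameters
    return np.sum(np.abs(y - S @ amps) ** 2), amps

t = np.arange(64)
y = 1.0 * np.exp(1j * 0.9 * t) + 0.5 * np.exp(1j * 2.3 * t)  # K = 2 signal
cost_true, _ = nls_cost(y, np.array([0.9, 2.3]))  # ~0 at the true frequencies
cost_off, _ = nls_cost(y, np.array([1.0, 2.2]))   # large even slightly off
print(cost_true, cost_off)
```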

In principle, solution of the CS analog of eq. (3) could be performed to obtain the parameters of a sparse signal directly, but in practice, direct solution of eq. (3) is not computationally practical (Stoica and Moses, 1997). The difficulty is that even for small *K*, eq. (3) is highly multimodal (see, for example, Fig. 1 in Li et al., 2000), and the solution requires extremely good first guesses for the vector ω. Even with good initial values for ω, performance guarantees are difficult to find and continue to be the subject of intense investigation (Salzo and Villa, 2011, and references therein).

Similar two-step model-based approaches to estimating the frequency and amplitude of real and complex sinusoids have been discussed previously in the literature (Stoica et al., 2000; Li et al., 2000; Chan and So, 2004; Christensen and Jensen, 2006). Stoica et al. discuss the use of NLS to obtain the amplitude of complex sinusoidal signals given the frequency; Li et al. and Chan and So discuss a combined matching pursuit/NLS approach, similar to ours, for obtaining the frequencies of complex and real harmonic sinusoidal signals, respectively; and Christensen and Jensen use matching pursuit plus NLS to estimate frequencies in a signal that is a sum of real sinusoids of arbitrary frequency. To the best of our knowledge, our paper is the first application of an OMP/NLS algorithm to estimate the frequency and amplitude from CS measurements.

references therein]. We take *f*<sub>min</sub> = 0 and *f*<sub>max</sub> = 1 to set up our test problem; we sample *X*(*t*) at the Nyquist rate for complex signals, Δ*t* = 1/*f*<sub>max</sub> = 1, to obtain the sampled time series *X*<sup>S</sup> of length *N* from *t* = 0 to *t* = *N*−1, where *N* is the number of time samples. As in all applications of compressive sensing, we make a sparsity assumption, *K* << *N*, and mix the input signal vector *X*<sup>S</sup> plus sampled noise *n* down in dimension to the measured vector *y* of dimension *M*:

$$y = \Phi \left( X^{S} + n \right) \tag{5}$$

where Φ is an *M* × *N* mixing matrix and *K* < *M* << *N*. Note that in eq. (5) *n* is added to *X*<sup>S</sup> prior to mixing; in other models, noise is added to the mixed version of *X*<sup>S</sup>, Φ*X*<sup>S</sup>, or even to Φ itself. We generate the elements of Φ using the pseudo-random number functions in our software (*Mathematica* and *Python*) such that they are taken uniformly from the set of nine complex numbers {−1 − i, −1, −1 + i, −i, 0, i, 1 − i, 1, 1 + i}; equivalently, each element is the sum of a random integer drawn from {−1, 0, 1} plus i times an independent random integer drawn from {−1, 0, 1}. We use a complex mixing matrix because our signal model is complex. The noise is assumed to be independent and identically distributed (i.i.d.) Gaussian noise with standard deviation σ/2<sup>1/2</sup> added to the real and the imaginary part of each element of *X*<sup>S</sup>, so that the covariance of *n* is σ<sup>2</sup>*I*, where *I* is the *N* × *N* identity matrix. If the frequencies lie on the DFT frequency grid associated with the time series *t* = 0 to *t* = *N*−1, eq. (5) can be solved for the frequencies by writing *s* = **DFT** *X*<sup>S</sup>, substituting *X*<sup>S</sup> = **IDFT** *s* (**IDFT** = inverse DFT) in eq. (5), and solving *y* = Φ(**IDFT** *s* + *n*) by minimizing the ell-1 norm of *s* subject to the measurement constraint eq. (5) if *n* = 0, or by minimizing the ell-1 norm penalized by an arbitrary fraction of the constraint (LASSO) in the presence of noise (Candes & Wakin, 2008; Loris, 2008).
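
As a concrete sketch of this measurement model, the following Python fragment builds Φ from the nine-point complex alphabet above and forms *y* as in eq. (5); the sizes *N*, *M*, *K*, the noise level, and the use of NumPy's default generator are illustrative assumptions, not the settings of this chapter:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, K = 128, 32, 3                 # illustrative sizes with K < M << N
sigma = 0.05                         # noise standard deviation

# Mixing matrix: entries drawn uniformly from {-1,0,1} + i*{-1,0,1},
# the nine-point complex alphabet described above
Phi = rng.integers(-1, 2, (M, N)) + 1j * rng.integers(-1, 2, (M, N))

# K-sparse signal model: complex sinusoids with random frequencies in [0, 1)
t = np.arange(N)
freqs = rng.uniform(0.0, 1.0, K)
amps = rng.standard_normal(K) + 1j * rng.standard_normal(K)
X_S = (amps * np.exp(2j * np.pi * np.outer(t, freqs))).sum(axis=1)

# i.i.d. Gaussian noise, std sigma/sqrt(2) per quadrature, added before
# mixing as in eq. (5), so that Cov[n] = sigma^2 * I
n = (sigma / np.sqrt(2)) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
y = Phi @ (X_S + n)                  # the length-M measurement vector
```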

Although the noise is assumed to be i.i.d., the mixing matrix Φ colors the noise in the observation vector *y*. The covariance of *y* is given by

$$\operatorname{Cov}[y] = \sigma^{2} \, \Phi \Phi^{H} \tag{6}$$

and the standard maximum likelihood estimator requires definition of a weighting matrix *W*,

$$W = \left( \Phi \Phi^{H} \right)^{-1} \tag{7}$$

where the superscript H indicates the Hermitian conjugate (see Stoica et al., 2000, and Chan and So, 2004, for a discussion of weighted estimators in NLS). If the inverse in eq. (7) is ill-conditioned or does not exist, this indicates a poor choice of mixing matrix Φ, and another one should be chosen. The maximum likelihood estimator (MLE) for *X*<sup>S</sup> is obtained by finding the vector *Z* that minimizes the weighted square of the residual, *C*(*Z*), given by

$$C(Z) = \left( \Phi Z - y \right)^{H} W \left( \Phi Z - y \right) \tag{8}$$

where *Z* is a vector taken from the linear subspace spanned by at most *K* complex sinusoids sampled over *t* = 0 to *N*−1 (see the corresponding equation for determining the amplitude and frequency of a sum of complex sinusoids in a system without compressive sensing, Stoica and Moses, 1997, eq. 4.3.6). CS recovery is equivalent to determining the spectral support (that is, the *K* unknown frequencies) of the input signal *X*<sup>S</sup>, or equivalently to determining the vector *Z* that minimizes eq. (8) (Duarte & Baraniuk, 2010). In the absence of noise, weighting with *W* is unnecessary because the solution is exact and both the weighted and unweighted estimators yield the same result.
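
Eqs. (6)–(8) can be sketched in Python as follows: build *W* from Φ, then evaluate the weighted residual *C*(*Z*) with *Z* constrained to the subspace spanned by trial sinusoids. The helper `weighted_cost`, the trial-frequency interface, and all sizes are illustrative assumptions rather than this chapter's implementation:

```python
import numpy as np

def weighted_cost(Phi, W, y, freqs):
    """Minimum of C(Z) = (Phi Z - y)^H W (Phi Z - y) over Z in the
    subspace spanned by sinusoids at the given trial frequencies."""
    N = Phi.shape[1]
    S = np.exp(2j * np.pi * np.outer(np.arange(N), freqs))  # sinusoid basis
    B = Phi @ S                       # basis mapped to measurement space
    BW = B.conj().T @ W
    a = np.linalg.solve(BW @ B, BW @ y)   # weighted LS complex amplitudes
    r = B @ a - y
    return float((r.conj() @ W @ r).real), a

rng = np.random.default_rng(0)
M, N = 32, 128
Phi = rng.integers(-1, 2, (M, N)) + 1j * rng.integers(-1, 2, (M, N))
W = np.linalg.inv(Phi @ Phi.conj().T)     # eq. (7); re-draw Phi if ill-conditioned
f_true = np.array([0.11, 0.37])
X_S = np.exp(2j * np.pi * np.outer(np.arange(N), f_true)).sum(axis=1)
y = Phi @ X_S                             # noiseless measurements
print(weighted_cost(Phi, W, y, f_true)[0])         # ~0 at the true frequencies
print(weighted_cost(Phi, W, y, f_true + 0.01)[0])  # large off the true support
```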
