**Adaptive Blind Channel Equalization**

Shafayat Abrar 1, Azzedine Zerguine 2 and Asoke Kumar Nandi 3

<sup>1</sup>*Department of Electrical Engineering, COMSATS Institute of Information Technology, Islamabad 44000, Pakistan*  <sup>2</sup>*Department of Electrical Engineering, King Fahd University of Petroleum and Minerals, Dhahran 31261, Saudi Arabia*  <sup>3</sup>*Department of Electrical Engineering and Electronics, The University of Liverpool, Liverpool L69 3BX, United Kingdom*

#### **1. Introduction**

For bandwidth-efficient communication systems operating in high inter-symbol interference (ISI) environments, adaptive equalizers have become a necessary component of the receiver architecture. An accurate estimate of the amplitude and phase distortion introduced by the channel is essential to achieve high data rates with low error probabilities. An adaptive equalizer provides a simple practical device capable of both learning and inverting the distorting effects of the channel. In conventional equalizers, the filter tap weights are initially set using a training sequence of data symbols known to both the transmitter and receiver. These trained equalizers are effective and widely used. Conventional least mean square (LMS) adaptive filters are usually employed in such supervised receivers, see Haykin (1996).

However, there are several drawbacks to the use of training sequences. Implementing a training sequence can involve significant transceiver complexity. In point-to-multipoint network transmissions, for example, sending training sequences is either impractical or very costly in terms of data throughput. For slowly varying channels, an initial training phase may be tolerable; however, there are scenarios where training is not feasible, for example, in equalizer implementations of digital cellular handsets. When the communications environment is highly non-stationary, it may even become grossly impractical to use training sequences. A blind equalizer, on the other hand, does not require a training sequence to be sent for start-up or restart. Rather, blind equalization algorithms use a priori knowledge of the statistics of the transmitted data sequence, as opposed to an exact set of symbols known to both the transmitter and receiver. In addition, the specifications of training sequences are often left ambiguous by standards bodies, leading to vendor-specific training sequences and inter-operability problems. Blind equalization solves this problem as well, see Ding & Li (2001); Garth et al. (1998); Haykin (1994).

In this Chapter, we provide an introduction to the basics of adaptive blind equalization. We describe popular methodologies and criteria for designing adaptive algorithms for blind equalization. Most importantly, we discuss how to use the probability density function (PDF) of the transmitted signal to design ISI-sensitive cost functions. We also discuss the admissibility of a proposed cost function and the stability of the derived adaptive algorithm.

#### **2. Trained and blind adaptive equalizers: Historical perspectives**

Adaptive trained channel equalization was first developed by Lucky for telephone channels, see Lucky (1965; 1966). Lucky proposed the so-called zero-forcing (ZF) method for FIR equalization. It was an adaptive procedure, and in a noiseless situation the optimal ZF equalizer tends to the inverse of the channel. In the meantime, Widrow and Hoff introduced the least mean square (LMS) adaptive algorithm, which begins adaptation with the aid of a training sequence known to both transmitter and receiver, see Widrow & Hoff (1960); Widrow et al. (1975). The LMS algorithm is capable of reducing the mean square error (MSE) between the equalizer output and the training sequence. Once the signal eye is open, the equalizer is switched to a tracking mode, commonly known as the decision-directed mode. The decision-directed method is unsupervised, and its effectiveness depends on the initial condition of the equalizer coefficients; if the initial eye is closed, it is likely to diverge.

In blind equalization, the desired signal is unknown to the receiver, except for its probabilistic or statistical properties over some known alphabet. As both the channel and its input are unknown, the objective of blind equalization is to recover the unknown input sequence based solely on these properties, see C.R. Johnson, Jr. et al. (1998); Ding & Li (2001); Haykin (1994). Historically, the possibility of blind equalization was first discussed in Allen & Mazo (1974), where the authors proved analytically that an adjusting equalizer, optimizing the mean-squared sample values at its output while keeping a particular tap anchored at unit value, is capable of inverting the channel without needing a training sequence. In the subsequent year, Sato was the first to come up with a robust realization of an adaptive blind equalizer for PAM signals, see Sato (1975). It was followed by a number of successful attempts at blind magnitude equalization (i.e., equalization without carrier-phase recovery): in Godard (1980) for complex-valued signals (V29/QPSK/QAM), in Treichler & Agee (1983) for AM/FM signals, and in Serra & Esteves (1984) and Bellini (1986) for PAM signals. However, many of these algorithms originated from intuitive starting points.

The earliest works on joint blind equalization and carrier-phase recovery were reported in Benveniste & Goursat (1984); Kennedy & Ding (1992); Picchi & Prati (1987); Wesolowski (1987). Recent references include Abrar & Nandi (2010a;b;c;d); Abrar & Shah (2006a); Abrar & Qureshi (2006b); Abrar et al. (2005); Goupil & Palicot (2007); Im et al. (2001); Yang et al. (2002); Yuan & Lin (2010); Yuan & Tsai (2005). All of these blind equalizers are capable of recovering the true power of the transmitted data upon convergence and are classified as *Bussgang-type*, see Bellini (1986). The Bussgang blind equalization algorithms make use of a nonlinear estimate of the channel input. The memoryless nonlinearity, which is a function of the equalizer output, is designed to minimize an ISI-sensitive cost function that implicitly exploits higher-order statistics. The performance of such blind equalizers depends strongly on the choice of nonlinearity.

The first comprehensive analytical study of the blind equalization problem was presented by Benveniste, Goursat, and Ruget in Benveniste et al. (1980a;b). They established that if the transmitted signal is composed of non-Gaussian, independent and identically distributed samples, both channel and equalizer are linear time-invariant filters, noise is negligible, and the probability density functions of the transmitted and equalized signals are equal, then the channel has been perfectly equalized. This mathematical result is very important, since it establishes the possibility of obtaining an equalizer with the sole aid of the signal's statistical properties, without requiring any knowledge of the channel impulse response or a training data sequence. Note that the very term ''blind equalization" can be attributed to Benveniste and Goursat from the title of their paper Benveniste & Goursat (1984). This seminal paper established the connection between the task of blind equalization and the use of higher-order statistics of the channel output. Through rigorous analysis, they generalized the original Sato algorithm into a class of algorithms based on non-MSE cost functions. More importantly, the convergence properties of the proposed algorithms were carefully investigated.

The second analytical landmark occurred in 1990, when Shalvi and Weinstein significantly simplified the conditions for blind equalization, see Shalvi & Weinstein (1990). Before this work, it was usually believed that one needs to exploit infinitely many statistics to ensure zero-forcing equalization. Shalvi and Weinstein showed that zero-forcing equalization can be achieved if only two statistics of the involved signals are restored. Specifically, they proved that if the fourth-order cumulant (kurtosis) is maximized while the second-order cumulant (energy) remains the same, then the equalized signal is a scaled and rotated version of the transmitted signal. Interesting accounts of the Shalvi-Weinstein criterion can be found in Tugnait et al. (1992) and Romano et al. (2011).

#### **3. System model and "Bussgang" blind equalizer**

The baseband model for a typical complex-valued data communication system consists of an unknown linear time-invariant channel $\{h\}$, which represents the physical inter-connection between the transmitter and the receiver. The transmitter generates a sequence of complex-valued random input data $\{a\_n\}$, each element of which belongs to a complex alphabet $\mathcal{A}$. The data sequence $\{a\_n\}$ is sent through the channel, whose output $x\_n$ is observed by the receiver. The input/output relationship of the channel can be written as:

$$x\_n = \sum\_k a\_{n-k} h\_k + \nu\_n \tag{1}$$

where the additive noise $\nu\_n$ is assumed to be stationary, Gaussian, and independent of the channel input $a\_n$. We also assume that the channel is stationary, moving-average, and has finite length. The function of the equalizer at the receiver is to estimate the delayed version of the original data, $a\_{n-\delta}$, from the received signal $x\_n$. Let $\mathbf{w}\_n = [w\_{n,0}, w\_{n,1}, \cdots, w\_{n,N-1}]^T$ be the vector of equalizer coefficients with $N$ elements (superscript $T$ denotes transpose), and let $\mathbf{x}\_n = [x\_n, x\_{n-1}, \cdots, x\_{n-N+1}]^T$ be the vector of channel observations. The output of the equalizer is

$$y\_n = \mathbf{w}\_n^H \mathbf{x}\_n \tag{2}$$

where superscript $H$ denotes conjugate transpose. If $\{t\} = \{h\} * \{w^\*\}$ represents the overall channel-equalizer impulse response (where $*$ denotes convolution), then (2) can be expressed as:

$$y\_n = \sum\_{l} w\_l^\* x\_{n-l} = \sum\_{l} a\_{n-l} t\_l + \nu\_n^{\prime} = \underbrace{t\_\delta\, a\_{n-\delta} + \sum\_{l \neq \delta} t\_l\, a\_{n-l} + \nu\_n^{\prime}}\_{\text{signal} + \text{ ISI} + \text{noise}} \tag{3}$$

Equation (3) distinctly exposes the effects of multi-path inter-symbol interference and additive noise. Even in the absence of additive noise, the second term can be significant enough to cause an erroneous detection.
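As a concrete illustration of (3), the sketch below forms the combined channel-equalizer response $t = \{h\} * \{w^\*\}$ for a hypothetical real-valued channel and equalizer (both made up for illustration) and measures the residual ISI power relative to the dominant tap:

```python
import numpy as np

# Hypothetical 3-tap channel and 2-tap equalizer (illustrative values only).
h = np.array([1.0, 0.4, -0.2])      # channel impulse response {h}
w = np.array([1.0, -0.35])          # equalizer coefficients {w}

# Combined channel-equalizer response t = {h} * {w*}, as in (3).
t = np.convolve(h, np.conj(w))

# Dominant tap index delta, and residual ISI power relative to it:
# sum over l != delta of |t_l|^2, divided by |t_delta|^2.
delta = int(np.argmax(np.abs(t)))
isi = (np.sum(np.abs(t) ** 2) - np.abs(t[delta]) ** 2) / np.abs(t[delta]) ** 2
print(delta, isi)
```

Here the dominant tap stays at delay 0 and the residual ISI power evaluates to 0.123; a perfect equalizer would drive every off-delta tap, and hence this ratio, to zero.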

The idea behind a Bussgang blind equalizer is to minimize (or maximize), through the choice of equalizer filter coefficients $\mathbf{w}$, a certain cost function $J$, depending on the equalizer output $y\_n$, such that $y\_n$ provides an estimate of the source signal $a\_n$ up to some inherent indeterminacies, giving $y\_n = \alpha\, a\_{n-\delta}$, where $\alpha = |\alpha| e^{\jmath\gamma} \in \mathbb{C}$ represents an arbitrary gain. The phase $\gamma$ represents an isomorphic rotation of the symbol constellation and hence depends on the rotational symmetry of the signal alphabet; for example, $\gamma = m\pi/2$ radians, with $m \in \{0, 1, 2, 3\}$, for a quadrature amplitude modulation (QAM) system. Hence, a Bussgang blind equalizer tries to solve the following optimization problem:

$$\mathbf{w}^\dagger = \arg \underset{\mathbf{w}}{\text{optimize}}\; J, \quad \text{with } J = \mathsf{E}[\mathcal{J}(y\_n)] \tag{4}$$

The cost $J$ implicitly embeds higher-order statistics of $y\_n$, and $\mathcal{J}(y\_n)$ is a real-valued function. Ideally, the cost $J$ makes use of statistics which are significantly modified as the signal propagates through the channel, and the optimization of the cost modifies the statistics of the signal at the equalizer output, aligning them with those at the channel input. The equalization is accomplished when the equalized sequence $y\_n$ acquires a distribution identical to that of the channel input $a\_n$, see Benveniste et al. (1980a). If the implementation is realized by a stochastic gradient-based adaptive approach, then the updating algorithm is

$$\mathbf{w}\_{n+1} = \mathbf{w}\_n \pm \mu \left( \frac{\partial \mathcal{J}}{\partial \mathbf{w}\_n} \right)^{\*} \tag{5a}$$

$$\phantom{\mathbf{w}\_{n+1}} = \mathbf{w}\_n + \mu\, \Phi(y\_n)^\*\, \mathbf{x}\_n \quad \text{with } \Phi(y\_n) = \pm \frac{\partial \mathcal{J}}{\partial y\_n^\*} \tag{5b}$$

where $\mu$ is the step-size, governing the speed of convergence and the level of steady-state performance, see Haykin (1996). The positive and negative signs in (5a) are respectively for maximization and minimization. The complex-valued error-function $\Phi(y\_n)$ can be understood as an estimate of the difference between the desired and the actual equalizer outputs. That is, $\Phi(y\_n) = \Psi(y\_n) - y\_n$, where $\Psi(y\_n)$ is an estimate of the transmitted data $a\_n$. The nonlinear memoryless estimate $\Psi(y\_n)$ is usually referred to as the Bussgang nonlinearity and is selected such that, at steady state, when $y\_n$ is close to $a\_{n-\delta}$, the autocorrelation of $y\_n$ becomes equal to the cross-correlation between $y\_n$ and $\Psi(y\_n)$, i.e.,

$$\mathbb{E}\left[y\_n \Phi(y\_{n-i})^\*\right] = 0 \Rightarrow \mathbb{E}\left[y\_n y\_{n-i}^\*\right] = \mathbb{E}\left[y\_n \Psi(y\_{n-i})^\*\right]$$

An admissible estimate of $\Psi(y\_n)$, however, is the conditional expectation $\mathsf{E}[a\_n \,|\, y\_n]$, see Nikias & Petropulu (1993). Using Bayesian estimation techniques, $\mathsf{E}[a\_n \,|\, y\_n]$ was derived in Bellini (1986); Fiori (2001); Haykin (1996); Pinchas & Bobrovsky (2006; 2007). These methods, however, rely on explicit computation of higher-order statistics and are not discussed here.

#### **4. Trained and blind equalization design methodologies**

Generally, a blind equalization algorithm attempts to invert the channel using both the received data samples and certain known statistical properties of the input data. For example, it is easy to show that for a minimum-phase channel, the spectra of the input and output signals of the channel can be used to determine the channel impulse response. However, most communication channels are not minimum-phase. To identify a non-minimum-phase channel, a non-Gaussian signal is required along with nonlinear processing at the receiver using higher-order moments of the signal, see Benveniste et al. (1980a;b). Based upon available analyses, simulations, and experiments in the literature, it can be said that an admissible blind cost function has two main attributes: **1)** it makes use of statistics which are significantly modified as the signal propagates through the channel, and **2)** optimization of the cost function modifies the statistics of the signal at the channel output, aligning them with the statistics of the signal at the channel input.

Designing a blind equalization cost function has strangely lain more in the realm of art than science; the majority of cost functions tend to be proposed on intuitive grounds and then validated. For this reason, a plethora of blind cost functions is available in the literature. In fact, however, there exist established methods which facilitate the design of blind cost functions from the statistical properties of the transmitted and received signals. One of the earliest methods originated in the late 70's in the geophysics community, which sought to determine the inverse of the channel in seismic data analysis; it was named minimum entropy deconvolution (MED), see Gray (1979b); Wiggins (1977). Later, in the early 90's, Satorius and Mulligan employed the MED principle and came up with several proposals to blindly equalize communication channels, see Satorius & Mulligan (1993). However, those signal-specific proposals regrettably failed to receive serious attention.

In the sequel, we discuss MED along with other popular methods for designing blind cost functions and corresponding adaptive equalizers.

#### **4.1 Lucky criterion**

In 1965, Lucky suggested that the propagation channel may be inverted by an equalizer if the equalizer minimizes the following peak distortion criterion, see Lucky (1965):

$$J\_{\text{peak}} = \frac{1}{|t\_{\delta}|} \sum\_{l \neq \delta} |t\_{l}| \tag{6}$$

This criterion is equivalent to requiring that the equalizer maximizes the eye opening. The intuitive explanation of (6) is as follows. From (3), ignoring the noise, obtain the error $\varepsilon$ due to ISI, given as $\varepsilon = y\_n - a\_{n-\delta} = a\_{n-\delta}(t\_\delta - 1) + \sum\_{l \neq \delta} t\_l\, a\_{n-l}$. Assuming the maximum and minimum values of $a\_n$ are $\bar{a}$ and $-\bar{a}$, respectively, the maximum error is easily written as

$$\mathcal{E}\_{\text{max}} = |y\_n - a\_{n-\delta}|\_{\text{max}} = |a\_{n-\delta}| \left| t\_\delta - 1 \right| + \overline{a} \sum\_{l \neq \delta} |t\_l| \tag{7}$$

If $t\_\delta$ is close to unity, then the cost $J\_{\text{peak}}$ is a scaled version of $\varepsilon\_{\max}$, as given by $J\_{\text{peak}} \approx \varepsilon\_{\max}/(|t\_\delta|\,\bar{a})$. It is important to note that the cost $J\_{\text{peak}}$ is a convex function of the equalizer weights and has a well-defined minimum. Thus any minimum of $J\_{\text{peak}}$ found by gradient search or other systematic programming methods must be the absolute (or global) minimum of distortion, see Lucky (1966). To prove the convexity of $J\_{\text{peak}}$, it is necessary to show that, for two equalizer settings $\mathbf{w}^{(a)}$ and $\mathbf{w}^{(b)}$ and for all $\eta$, $0 \le \eta \le 1$,

$$J\_{\text{peak}}(\eta \,\mathbf{w}^{(a)} + (1 - \eta)\mathbf{w}^{(b)}) \le \eta\, J\_{\text{peak}}(\mathbf{w}^{(a)}) + (1 - \eta)\, J\_{\text{peak}}(\mathbf{w}^{(b)}) \tag{8}$$

The above equation shows that the distortion always lies on or beneath the chord joining values of distortion in $N$-space. Below is the proof of (8):

$$J\_{\text{peak}}(\eta \,\mathrm{w}^{(a)} + (1 - \eta)\mathrm{w}^{(b)}) = \sum\_{l \neq \delta} \left| \sum\_{k} h\_{k}(\eta \,(\boldsymbol{w}\_{l-k}^{(a)})^{\*} + (1 - \eta)(\boldsymbol{w}\_{l-k}^{(b)})^{\*}) \right|$$

$$\leq \eta \sum\_{l \neq \delta} \left| \sum\_{k} h\_{k}(\boldsymbol{w}\_{l-k}^{(a)})^{\*} \right| + (1 - \eta) \sum\_{l \neq \delta} \left| \sum\_{k} h\_{k}(\boldsymbol{w}\_{l-k}^{(b)})^{\*} \right| \tag{9}$$

$$= \eta \, J\_{\text{peak}}(\mathbf{w}^{(a)}) + (1 - \eta) J\_{\text{peak}}(\mathbf{w}^{(b)}).$$
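The inequality (8) can be spot-checked numerically. The sketch below evaluates the un-normalized peak distortion $\sum\_{l \neq \delta} |t\_l|$ used in the proof (9), with the reference tap anchored at a fixed position, over random pairs of real-valued equalizer settings; the channel taps are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
h = np.array([1.0, 0.5, -0.3])   # illustrative channel
delta = 0                        # fixed (anchored) reference tap

def j_peak(w):
    # Un-normalized peak distortion from the proof (9):
    # sum over l != delta of |t_l|, with t = h * conj(w).
    t = np.convolve(h, np.conj(w))
    return np.sum(np.abs(t)) - np.abs(t[delta])

# Check J(eta*wa + (1-eta)*wb) <= eta*J(wa) + (1-eta)*J(wb) on random pairs.
convex = True
for _ in range(1000):
    wa, wb = rng.normal(size=3), rng.normal(size=3)
    eta = rng.uniform()
    lhs = j_peak(eta * wa + (1 - eta) * wb)
    rhs = eta * j_peak(wa) + (1 - eta) * j_peak(wb)
    convex = convex and (lhs <= rhs + 1e-12)
print(convex)
```

Since each $|t\_l|$ is the absolute value of a linear function of the weights, the sum is convex and the check passes for every random pair.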

However, in practice, it is not easy to exploit this convexity in a gradient search (adaptive) procedure, as it is necessary to obtain the projection of the gradient onto the constraint hyperplane, see Allen & Mazo (1973). Alternatively, one can seek to minimize the mean-square distortion criterion:

$$J\_{\rm ms} = \frac{1}{|t\_{\delta}|^2} \sum\_{l \neq \delta} |t\_l|^2 \tag{10}$$

The cost $J\_{\rm ms}$ is not convex but unimodal, mathematically tractable, and capable of yielding an admissible solution, see Allen & Mazo (1973). Under the assumption $|t\_\delta| = \max\_l\{|t\_l|\}$, Shalvi & Weinstein (1990) used the expression (10) to quantify ISI, i.e., $\text{ISI} = J\_{\rm ms}$. Using criterion (10), we can formulate the following tractable problem for ISI mitigation:

$$\mathbf{w}^{\dagger} = \arg\min\_{\mathbf{w}} \sum\_{l \neq \delta} |t\_l|^2 \quad \text{s.t.} \quad |t\_\delta| = 1. \tag{11}$$

where we have assumed that the equalizer coefficients $\mathbf{w}^\dagger$ have been selected such that the condition $|t\_\delta| = 1$ is always satisfied. Introducing the channel autocorrelation matrix $\mathcal{H}$, whose $(i,j)$ element is given by $\mathcal{H}\_{ij} = \sum\_k h\_{k-i}\, h\_{k-j}^\*$, we can show that $\sum\_l |t\_l|^2 = \mathbf{w}\_n^H \mathcal{H} \mathbf{w}\_n$. The equalizer has to make one of the coefficients of $\{t\_l\}$, say $t\_\delta = t\_0 = \mathbf{w}\_n^H \mathbf{h}$, equal to unity and the others zero, where $\mathbf{h} = [h\_{K-1}, h\_{K-2}, \cdots, h\_1, h\_0]^T$; this gives the residual ISI at time index $n$ as follows:

$$\text{ISI} = \frac{\mathbf{w}\_n^H \mathcal{H}\, \mathbf{w}\_n}{|\mathbf{w}\_n^H \mathbf{h}|^2} - 1. \tag{12}$$

Now consider the optimization problem (11). Using a Lagrange multiplier $\lambda$, we obtain

$$\sum\_{l \neq \delta} |t\_l|^2 + \lambda \left(t\_\delta - 1\right) = w\_n^H \mathcal{H} w\_n - 1 + \lambda \left(w\_n^H h - 1\right) \tag{13}$$

Next, differentiating with respect to $\mathbf{w}\_n^\*$ and equating to zero, we get $\mathcal{H}\mathbf{w}\_n + \lambda \mathbf{h} = 0 \Rightarrow \mathbf{w}\_n = -\lambda\, \mathcal{H}^{-1}\mathbf{h}$. Substituting this value of $\mathbf{w}\_n$ in (12), we obtain

$$\text{ISI} = \frac{\mathbf{h}^H \mathcal{H}^{-1} \mathbf{h}}{|\mathbf{h}^H \mathcal{H}^{-1} \mathbf{h}|^2} - 1. \tag{14}$$

To appreciate the possible benefit of solution (14), consider a channel $h\_{-1} = 1 - \varepsilon$, $h\_0 = \varepsilon$ and $h\_1 = 0$, where $0 \le \varepsilon \le 1$. Without an equalizer, we have

$$\text{ISI} = \frac{(1 - \varepsilon)^2 + \varepsilon^2}{\max(1 - \varepsilon, \varepsilon)^2} - 1. \tag{15}$$

The ISI approaches zero when $\varepsilon$ is either zero or unity. Assuming a 2-tap equalizer, we obtain

$$\mathcal{H} = \begin{bmatrix} (1 - \varepsilon)^2 + \varepsilon^2 & (1 - \varepsilon)\varepsilon \\ (1 - \varepsilon)\varepsilon & (1 - \varepsilon)^2 + \varepsilon^2 \end{bmatrix} \tag{16}$$

Using (14) and (16), we obtain

$$\text{ISI} = \frac{\varepsilon^2 (1 - \varepsilon)^2}{1 - 4\varepsilon + 6\varepsilon^2 - 4\varepsilon^3 + 2\varepsilon^4}. \tag{17}$$

As shown in Fig. 1, the ISI (17) of the equalized system is lower than that of the uncompensated system. The adaptive implementation of $J\_{\rm ms}$ can be realized in a supervised scenario.

Fig. 1. ISI of unequalized and equalized systems.
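The worked example can be reproduced numerically. The sketch below evaluates the unequalized ISI (15), the equalized ISI obtained from (14) and (16), and the closed form (17); the test point $\varepsilon = 0.3$ is an arbitrary choice:

```python
import numpy as np

def isi_unequalized(eps):
    # Eq. (15): two-tap channel {1 - eps, eps} without equalization.
    return ((1 - eps) ** 2 + eps ** 2) / max(1 - eps, eps) ** 2 - 1

def isi_equalized(eps):
    # Eq. (14) with the 2x2 channel autocorrelation matrix (16).
    d = (1 - eps) ** 2 + eps ** 2
    o = (1 - eps) * eps
    H = np.array([[d, o], [o, d]])
    hv = np.array([1 - eps, eps])      # channel vector h
    q = hv @ np.linalg.inv(H) @ hv     # h^H H^{-1} h (real-valued here)
    return q / q ** 2 - 1

eps = 0.3
closed = eps**2 * (1 - eps)**2 / (1 - 4*eps + 6*eps**2 - 4*eps**3 + 2*eps**4)
print(isi_unequalized(eps), isi_equalized(eps), closed)
```

For $\varepsilon = 0.3$ the unequalized ISI is about 0.184, while the 2-tap equalizer reduces it to about 0.178, in agreement with (17).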

Combining the two expressions (7) and (10), the following cost is obtained, which is usually termed the mean square error (MSE) criterion, see Widrow et al. (1975):

$$J\_{\rm mse} = \mathbb{E}\left[ \left| a\_{n-\delta} - y\_n \right|^2 \right] \tag{18}$$

Minimizing (18), we obtain the following update:

$$\mathbf{w}\_{n+1} = \mathbf{w}\_n + \mu \left(a\_{n-\delta} - y\_n\right)^\* \mathbf{x}\_n. \tag{19}$$

which is known as the LMS or Widrow-Hoff algorithm. Note that the parameter $\mu$ is a positive step-size, and the following value $\mu = \mu\_{\rm LMS}$ ensures the stability of the algorithm, see Farhang-Boroujeny (1998):

$$0 < \mu\_{\rm LMS} < \frac{1}{2NP\_a} \tag{20}$$

where $P\_a = \mathsf{E}[|a\_n|^2]$ is the average energy of the signal $a\_n$ and $N$ is the length of the equalizer.
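A minimal sketch of the trained (supervised) LMS equalizer (19), with the step-size chosen safely inside the bound (20). The two-tap channel, equalizer length, delay, and BPSK training symbols are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
h = np.array([1.0, 0.5])                 # illustrative 2-tap channel
N = 8                                    # equalizer length
Pa = 1.0                                 # BPSK energy P_a = E[|a_n|^2]
mu = 0.5 / (2 * N * Pa)                  # half the stability bound (20)

a = rng.choice([-1.0, 1.0], size=5000)   # known training sequence
x = np.convolve(h, a)[: len(a)]          # noiseless channel output

w = np.zeros(N)
delta = 1                                # equalization delay
errs = []
for n in range(N, len(a)):
    xn = x[n - N + 1 : n + 1][::-1]      # regressor [x_n, ..., x_{n-N+1}]
    y = w @ xn                           # equalizer output (real-valued case)
    e = a[n - delta] - y                 # training error a_{n-delta} - y_n
    w = w + mu * e * xn                  # LMS update (19)
    errs.append(e ** 2)

early, late = np.mean(errs[:200]), np.mean(errs[-200:])
print(early, late)
```

The squared training error decays by several orders of magnitude, as expected for a noiseless minimum-phase channel whose truncated inverse fits within 8 taps.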

#### **4.2 Minimum entropy deconvolution criterion**

The minimum entropy deconvolution (MED) principle is probably the earliest for designing blind equalization cost functions. It was introduced by Wiggins in seismic data analysis in 1977; he sought to determine the inverse channel $\mathbf{w}^\dagger$ that maximizes the *kurtosis* of the deconvolved data $y\_n$, see Wiggins (1977; 1978). For seismic data, which are super-Gaussian in nature, he suggested maximizing the following cost:

$$\frac{\frac{1}{B}\sum\_{b=1}^{B}|y\_{n-b+1}|^4}{\left(\frac{1}{B}\sum\_{b=1}^{B}|y\_{n-b+1}|^2\right)^2} \tag{21}$$

This deconvolution scheme seeks the smallest number of large spikes consistent with the data, thus maximizing the order or, equivalently, minimizing the entropy or disorder in the data, see Walden (1985). Note that equation (21) has the statistical form of a sample kurtosis, and the expression is scale-invariant. Later, in 1979, Gray generalized Wiggins' proposal with two degrees of freedom as follows, see Gray (1979b):

$$\mathbf{J}\_{\text{med}}^{(p,q)} \equiv \frac{\frac{1}{\mathcal{B}} \sum\_{b=1}^{\mathcal{B}} |y\_{n-b+1}|^p}{\left(\frac{1}{\mathcal{B}} \sum\_{b=1}^{\mathcal{B}} |y\_{n-b+1}|^q\right)^{\frac{p}{q}}} \tag{22}$$

The criterion was rigorously investigated in Donoho (1980), where Donoho developed general rules for designing MED-type estimators. Several special cases of MED, in the context of blind deconvolution of seismic data, have appeared in the literature, e.g., in Ooe & Ulrych (1979), Wiggins (1977), Claerbout (1977), Gray (1978), and Gray (1979a).

In the derivation of criterion (22), it is assumed that the original signal $a\_n$, which represents primary reflection coefficients in geophysical systems or transmitted data in communication systems, can be modeled as a realization of an independent non-Gaussian process with distribution

$$p\_{\mathcal{A}}(a;\alpha) = \frac{\alpha}{2\beta\,\Gamma\left(\frac{1}{\alpha}\right)} \exp\left(-\frac{|a|^{\alpha}}{\beta^{\alpha}}\right) \tag{23}$$

where the signal $a\_n$ is real-valued, $\alpha$ is the shape parameter, $\beta$ is the scale parameter, and $\Gamma(\cdot)$ is the Gamma function. This family covers a wide range of distributions: the certain event ($\alpha \to 0$), double exponential ($\alpha = 1$), Gaussian ($\alpha = 2$), and uniform ($\alpha \to \infty$) distributions are all members. For the geophysical deconvolution problem, we have the range $0.6 \le \alpha \le 1.5$, and for communication systems, where the signals are uniformly distributed, we have $\alpha \to \infty$. Although signals in communication are discrete, equation (23) is still a good approximation for densely and uniformly distributed signals.

In the context of geophysics, where the primary coefficients $a\_n$ are super-Gaussian, maximizing the criterion (22) drives the distribution of the deconvolved sequence $y\_n$ away from $p\_Y(y\_n; p)$ towards $p\_Y(y\_n; q)$, where $p > q$. However, for the communication blind equalization problem, the underlying distribution of the transmitted (possibly pulse amplitude modulated) data symbols is closer to a uniform density (sub-Gaussian), and thus we would minimize the cost (22) with $p > q$. We therefore have the following criterion for blind equalization of a communication channel:

$$\mathbf{w}^{\dagger} = \begin{cases} \arg\min\_{\mathbf{w}} J\_{\text{med}}^{(p,q)}, & \text{if } p > q, \\ \arg\max\_{\mathbf{w}} J\_{\text{med}}^{(p,q)}, & \text{if } p < q. \end{cases} \tag{24}$$

The feasibility of (24) for blind equalization of digital signals has been studied in Satorius & Mulligan (1992; 1993) and Benedetto et al. (2008). In Satorius & Mulligan (1992), implementing (24) with p > q, the following adaptive algorithm was obtained:

$$\mathbf{w}\_{n+1} = \mathbf{w}\_n + \mu \sum\_{k=1}^{B} \left( \frac{\sum\_{b=1}^{B} |y\_{n-b+1}|^p}{\sum\_{b=1}^{B} |y\_{n-b+1}|^q}\, |y\_{n-k+1}|^{q-2} - |y\_{n-k+1}|^{p-2} \right) y\_{n-k+1}^\* \mathbf{x}\_{n-k+1}, \tag{25}$$

In the sequel, we will refer to (25) as Satorius-Mulligan algorithm (SMA). Also, for a detailed discussion on the stochastic approximate realization of MED, refer to Walden (1988).
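A sketch of a single SMA update (25), written as a function of the last $B$ equalizer outputs and their regressor vectors (the function name and the tiny inputs below are illustrative). One sanity check is visible in the bracketed term: for $p = 4$, $q = 2$, a block of outputs that already has constant modulus makes the bracket vanish, so the update leaves $\mathbf{w}$ unchanged:

```python
import numpy as np

def sma_update(w, xbuf, ybuf, mu=1e-3, p=4, q=2):
    """One Satorius-Mulligan update (25).

    xbuf: length-B list of regressor vectors [x_n, x_{n-1}, ...]
    ybuf: corresponding equalizer outputs   [y_n, y_{n-1}, ...]
    """
    ratio = np.sum(np.abs(ybuf) ** p) / np.sum(np.abs(ybuf) ** q)
    grad = np.zeros_like(w)
    for k in range(len(ybuf)):
        y = ybuf[k]
        # Bracketed term of (25) times conj(y_{n-k+1}):
        g = (ratio * np.abs(y) ** (q - 2) - np.abs(y) ** (p - 2)) * np.conj(y)
        grad = grad + g * xbuf[k]
    return w + mu * grad

# Constant-modulus block of outputs: the update is exactly zero.
w0 = np.array([1.0, 0.0])
w1 = sma_update(w0, [np.array([1.0, 0.2]), np.array([-1.0, 0.1])],
                np.array([1.0, -1.0]))
print(w1)
```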

#### **4.3 Constant modulus criterion**

The most popular and widely studied blind equalization criterion is the constant modulus criterion, Godard (1980); Treichler & Agee (1983); Treichler & Larimore (1985); it is given by

$$J\_{\rm cm} = \mathbb{E}\left[\left(|y\_n|^2 - R\_{\rm cm}\right)^2\right],\tag{26}$$

where $R\_{\text{cm}} = \mathsf{E}[|a\_n|^4]/\mathsf{E}[|a\_n|^2]$ is a statistical constant usually termed the dispersion constant. For an input signal that has a constant modulus $|a\_n| = \sqrt{R\_{\text{cm}}}$, the criterion penalizes output samples $y\_n$ that do not have the desired constant modulus characteristic. This modulus restoral concept has a particular advantage in that it allows the equalizer to be adapted independently of carrier recovery. Because the cost is insensitive to the phase of $y\_n$, the equalizer adaptation can occur independently and simultaneously with the operation of the carrier recovery system. This property also makes it applicable to analog modulation signals with constant amplitude, such as those using frequency or phase modulation, see Treichler & Larimore (1985). The stochastic gradient-descent minimization of (26) yields the following algorithm:

$$\mathbf{w}\_{n+1} = \mathbf{w}\_n + \mu \left( R\_{\text{cm}} - |y\_n|^2 \right) y\_n^\*\, \mathbf{x}\_n, \tag{27}$$

which is usually termed the constant modulus algorithm (CMA). Note that, with $p = 4$, $q = 2$ and large $B$, the SMA update (25) approximates CMA with an adaptively estimated dispersion constant.
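A minimal blind-equalization sketch using the CMA update (27) on BPSK data, for which $R\_{\text{cm}} = 1$; the channel, equalizer length, and step-size are illustrative assumptions, and the residual ISI measure of Section 4.1 is used to gauge convergence:

```python
import numpy as np

rng = np.random.default_rng(3)
h = np.array([1.0, 0.3])                  # illustrative channel
a = rng.choice([-1.0, 1.0], size=20000)   # BPSK: R_cm = E[a^4]/E[a^2] = 1
x = np.convolve(h, a)[: len(a)]

N, mu, Rcm = 7, 2e-3, 1.0
w = np.zeros(N)
w[0] = 1.0                                # single-spike initialization

def residual_isi(w):
    # J_ms-style ISI of the combined response t = h * w (real-valued case).
    t = np.convolve(h, w)
    d = np.argmax(np.abs(t))
    return np.sum(np.abs(t) ** 2) / np.abs(t[d]) ** 2 - 1

isi0 = residual_isi(w)
for n in range(N, len(a)):
    xn = x[n - N + 1 : n + 1][::-1]
    y = w @ xn
    w = w + mu * (Rcm - y ** 2) * y * xn  # CMA update (27), real case
isi1 = residual_isi(w)
print(isi0, isi1)
```

No training symbols appear anywhere in the loop; the residual ISI drops from 0.09 at initialization to a much smaller value upon convergence.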

If the data symbols are independent and identically distributed, the noise is negligible and the length of the equalizer is infinite, then, after some calculations, the CM cost may be expressed as a function of the joint channel-equalizer impulse response coefficients as follows:

$$J\_{\rm cm} = \left(\mathsf{E}[|a\_n|^4] - 2\mathsf{E}[|a\_n|^2]^2\right) \sum\_{l} |t\_l|^4 + 2\mathsf{E}[|a\_n|^2]^2 \left(\sum\_{l} |t\_l|^2\right)^2 - 2\mathsf{E}[|a\_n|^4] \sum\_{l} |t\_l|^2 + \text{const.} \tag{28}$$

As in Godard (1980), the partial derivative of $J\_{\rm cm}$ with respect to $t\_k$ can be written as

$$\frac{\partial J\_{\rm cm}}{\partial t\_k} = 4t\_k \left( \mathsf{E}[|a\_n|^4](|t\_k|^2 - 1) + 2\mathsf{E}[|a\_n|^2]^2 \sum\_{l \neq k} |t\_l|^2 \right) \tag{29}$$

The minima can be found by solving $\frac{\partial J\_{\rm cm}}{\partial t\_k} = 0$, i.e.,

$$t\_k\left(\mathsf{E}[|a\_n|^4](|t\_k|^2 - 1) + 2\mathsf{E}[|a\_n|^2]^2 \sum\_{l \neq k} |t\_l|^2\right) = 0, \;\forall\; k \tag{30}$$

Unfortunately, this set of equations has an infinite number of solutions; the cost $J\_{\rm cm}$ is thus non-convex. The solutions $T\_M$, $M = 1, 2, \cdots$, can be represented as follows: all elements of the set $\{t\_l\}$ are equal to zero, except $M$ of them, and those non-zero elements have equal magnitude $\sigma\_M$ defined by

$$\sigma\_M^2 = \frac{\mathsf{E}[|a\_n|^4]}{\mathsf{E}[|a\_n|^4] + 2(M - 1)\,\mathsf{E}[|a\_n|^2]^2} \tag{31}$$

Among these solutions, under the condition $\mathsf{E}[|a\_n|^4] < 2\mathsf{E}[|a\_n|^2]^2$, the solution $T\_1$ is the one for which the energy at the equalizer output is the largest and the ISI is zero. The absolute minimum of $J\_{\rm cm}$ is therefore reached in the case of zero ISI.
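The magnitudes (31) can be checked against the stationarity condition (30) numerically. The sketch below uses the second- and fourth-order moments of a unit-power 16-QAM alphabet (an illustrative choice) and confirms that $|t\_k|^2 = \sigma\_M^2$ nulls the bracketed factor in (30) for several values of $M$:

```python
import numpy as np

# Unit-power 16-QAM alphabet: {+-1, +-3} x {+-1, +-3}, scaled by 1/sqrt(10).
pts = np.array([-3.0, -1.0, 1.0, 3.0]) / np.sqrt(10.0)
sym = np.array([complex(i, q) for i in pts for q in pts])
m2 = np.mean(np.abs(sym) ** 2)   # E[|a_n|^2] = 1
m4 = np.mean(np.abs(sym) ** 4)   # E[|a_n|^4] = 1.32

resids = []
for M in (1, 2, 3):
    s2 = m4 / (m4 + 2 * (M - 1) * m2 ** 2)          # |t_k|^2 from (31)
    # Bracketed factor of (30) evaluated at M equal-magnitude taps:
    resids.append(m4 * (s2 - 1) + 2 * m2 ** 2 * (M - 1) * s2)
print(m2, m4, resids)
```

Each residual is zero to machine precision, confirming that every $T\_M$ is a stationary point of the CM cost.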

#### **4.4 Shtrom-Fan criterion**

In 1998, Shtrom and Fan presented a class of cost functions for achieving blind equalization which were solely functions of the $\{t\}$ parameters, see Shtrom & Fan (1998). They suggested minimizing the difference between any two norms of the joint channel-equalizer impulse response, each raised to the same power, i.e.,

$$J\_{\rm sf} = \left(\sum\_{l} |t\_l|^p\right)^{r/p} - \left(\sum\_{l} |t\_l|^q\right)^{r/q}, \ p < q \text{ and } r > 0 \tag{32}$$

where $p, q, r \in \mathbb{R}$. This proposal was based on the following property of vector norms:

$$\lim\_{s\to 0} \sqrt[s]{\sum\_{l} |t\_l|^s} \ge \dots \ge \sqrt[p]{\sum\_{l} |t\_l|^p} \ge \sqrt[q]{\sum\_{l} |t\_l|^q} \ge \dots \ge \lim\_{m \to \infty} \sqrt[m]{\sum\_{l} |t\_l|^m} \tag{33}$$

where $p < q$, and equality occurs if and only if $t\_l = \pm\delta\_{l-k}$, $k \in \mathbb{Z}$, which is precisely the zero-forcing condition. From the above, there is a multitude of cost functions to choose from. From (32), we have the following possibilities to minimize:

$$
\Sigma\_l |t\_l| - \max\_l \{ |t\_l| \}, \ (p = 1, q \to \infty, r = 1) \tag{34a}
$$

$$(\sum\_{l} |t\_l|)^2 - \sum\_{l} |t\_l|^2, \quad (p=1, q=r=2) \tag{34b}$$

$$(\sum\_{l}|t\_{l}|^{2})^{2} - \sum\_{l}|t\_{l}|^{4}, \quad (p=2, q=r=4) \tag{34c}$$

$$(\sum\_{l}|t\_{l}|^{2})^{3} - \sum\_{l}|t\_{l}|^{6}, \quad (p=2, q=r=6) \tag{34d}$$

$$(\sum\_{l}|t\_{l}|^{4})^{2} - \sum\_{l}|t\_{l}|^{8}, \quad (p=4, q=r=8). \tag{34e}$$

Some of these cost functions are easily implementable, whereas others are not.
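The norm-ordering property (33) implies that each difference in (34) is non-negative and vanishes exactly when $\{t\}$ is a (signed, shifted) delta. A small sketch with two illustrative response vectors:

```python
import numpy as np

def j_sf(t, p, q, r):
    # Norm-difference cost (32): ||t||_p^r - ||t||_q^r, with p < q, r > 0.
    return np.sum(np.abs(t) ** p) ** (r / p) - np.sum(np.abs(t) ** q) ** (r / q)

delta_t = np.array([0.0, 1.0, 0.0])      # zero-forcing response
bad_t = np.array([0.8, 0.5, -0.2])       # response with residual ISI

# (34b), (34c), (34d) expressed as (p, q, r) triples:
vals = [(j_sf(delta_t, p, q, r), j_sf(bad_t, p, q, r))
        for (p, q, r) in [(1, 2, 2), (2, 4, 4), (2, 6, 6)]]
print(vals)
```

Each cost evaluates to zero at the delta response and to a strictly positive value for the distorted one, as (33) predicts.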

Consider p = 2 and q = *r* = *2m* in (32) to obtain a subclass:

$$J\_{\rm sf}^{\rm sub} = \left(\sum\_{l} |t\_{l}|^{2}\right)^{m} - \sum\_{l} |t\_{l}|^{2m} \tag{35}$$

This subclass is not convex, although it is potentially unimodal in the $t$ domain and easily implementable. As in Shtrom & Fan (1998), the partial derivative of $J\_{\rm sf}^{\rm sub}$ with respect to $t\_k$ can be written as

$$\frac{\partial J\_{\rm sf}^{\rm sub}}{\partial t\_k} = 2m\, t\_k \left( \left( \sum\_l |t\_l|^2 \right)^{m-1} - |t\_k|^{2(m-1)} \right) \tag{36}$$

Equation (36) has two solutions, one of which corresponds to $t\_l = 0, \forall l$. This solution will not occur if a constraint is imposed. The other solution is the minimum corresponding to the zero-forcing condition. This is seen from (36) as $\sum\_l |t\_l|^2 = |t\_k|^2$, which can only hold when $t$ has at most one nonzero element, i.e., the desired delta function. Now compare this result with that of the constant modulus criterion in equation (31), which contains multiple non-zero-forcing solutions. It means that, in contrast to CMA, $J\_{\rm sf}^{\rm sub}$ is less likely to have local minima in the equalizer domain.

The cost functions (34), in their current form, are not directly applicable in a real scenario, as we have no information about the $\{t\}$'s. These costs need to be converted from functions of the $\{t\}$'s to functions of the $y\_n$'s. As in Shtrom & Fan (1998), we can show that

$$\sum\_l |t\_l|^2 = \frac{\mathsf{C}\_{1,1}^{y\_n}}{\mathsf{C}\_{1,1}^{a}} = \frac{\mathrm{E}[|y\_n|^2]}{\mathrm{E}[|a|^2]}\tag{37a}$$

$$\sum\_l |t\_l|^4 = \frac{\mathsf{C}\_{2,2}^{y\_n}}{\mathsf{C}\_{2,2}^{a}} = \frac{\mathrm{E}[|y\_n|^4] - 2\,\mathrm{E}[|y\_n|^2]^2}{\mathrm{E}[|a|^4] - 2\,\mathrm{E}[|a|^2]^2} \tag{37b}$$

$$\sum\_{l}|t\_{l}|^{6} = \frac{\mathsf{C}\_{3,3}^{y\_{n}}}{\mathsf{C}\_{3,3}^{a}} = \frac{\mathrm{E}[|y\_{n}|^{6}] - 9\,\mathrm{E}[|y\_{n}|^{4}]\mathrm{E}[|y\_{n}|^{2}] + 12\,\mathrm{E}[|y\_{n}|^{2}]^{3}}{\mathrm{E}[|a|^{6}] - 9\,\mathrm{E}[|a|^{4}]\mathrm{E}[|a|^{2}] + 12\,\mathrm{E}[|a|^{2}]^{3}} \tag{37c}$$

where $\mathsf{C}\_{p,q}^{z}$ is the $(p+q)$th-order cumulant of a complex random variable $z$, defined as follows:

$$\mathsf{C}\_{p,q}^{z} = \operatorname{cum}\Big(\underbrace{z, \cdots, z}\_{p \text{ terms}},\ \underbrace{z^{\*}, \cdots, z^{\*}}\_{q \text{ terms}}\Big) \tag{38}$$

Using (37a) and (37b), and assuming $m = 2$, we obtain the following expression for $J\_{\rm sf}^{\rm sub}$:

$$J\_{\rm sf}^{\rm sub} = \left(\frac{\mathbb{E}[|y\_n|^2]}{\mathbb{E}[|a|^2]}\right)^2 - \frac{\mathbb{E}[|y\_n|^4] - 2\operatorname{E}[|y\_n|^2]^2}{\mathbb{E}[|a|^4] - 2\operatorname{E}[|a|^2]^2} \tag{39}$$

Minimizing (39) with respect to the coefficients $\mathbf{w}^\*$, we obtain the following adaptive algorithm:

$$\mathbf{w}\_{n+1} = \mathbf{w}\_{n} + \mu \left( \frac{R\_{\rm cm}}{P\_{a}}\, \hat{\mathrm{E}}[|y\_{n}|^{2}] - |y\_{n}|^{2} \right) y\_{n}^{\*}\, \mathbf{x}\_{n} \tag{40a}$$

$$\widehat{\mathbb{E}}[|y\_{n+1}|^2] = \widehat{\mathbb{E}}[|y\_n|^2] + \frac{1}{n} \left( |y\_n|^2 - \widehat{\mathbb{E}}[|y\_n|^2] \right) \tag{40b}$$

where $P\_a$ is the average energy of the signal $a\_n$ and $R\_{\rm cm}$ is the same statistical constant as defined for CMA. Note that the algorithm requires an iterative estimate of the equalizer output energy. We will refer to (40) as the Shtrom-Fan algorithm (SFA).
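A minimal adaptive sketch of the SFA recursion (40), assuming a linear equalizer output $y\_n = \mathbf{w}\_n^H \mathbf{x}\_n$; the values of `R_cm`, `P_a`, the step-size and the toy data are illustrative, not from the chapter:

```python
import numpy as np

def sfa_step(w, x, y2_hat, n, mu, R_cm, P_a):
    """One SFA iteration: the coefficient update (40a) followed by the
    running estimate of the output energy (40b)."""
    y = np.vdot(w, x)                                   # y_n = w_n^H x_n
    w = w + mu * (R_cm / P_a * y2_hat - np.abs(y) ** 2) * np.conj(y) * x
    y2_hat = y2_hat + (np.abs(y) ** 2 - y2_hat) / n     # (40b)
    return w, y2_hat

# toy run on random data with centre-spike initialization
rng = np.random.default_rng(1)
w = np.zeros(7, complex); w[3] = 1.0
y2_hat = 1.0
for n in range(1, 501):
    x = (rng.standard_normal(7) + 1j * rng.standard_normal(7)) / np.sqrt(2)
    w, y2_hat = sfa_step(w, x, y2_hat, n, mu=1e-3, R_cm=1.32, P_a=1.0)
```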

#### **4.5 Shalvi-Weinstein criterion**

Before Shtrom and Fan, in the year 1990, Shalvi and Weinstein suggested a criterion that laid the theoretical foundation of the problem of blind equalization, see Shalvi & Weinstein (1990). They demonstrated that the condition of equality between the PDFs of the transmitted and equalized signals, due to the BGR theorem, Benveniste et al. (1980a;b), was excessively tight. Under similar assumptions to those laid down by Benveniste *et al.*, they demonstrated that it is possible to perform blind equalization by satisfying the condition $\mathrm{E}[|y\_n|^2] = \mathrm{E}[|a\_n|^2]$ and ensuring that a nonzero cumulant of order higher than 2 of $a\_n$ and $y\_n$ are equal.

For a two-dimensional signal $a\_n$ with four-quadrant symmetry (i.e., $\mathrm{E}[a\_n^2] = 0$), they suggested maximizing the following unconstrained cost function (which involves second- and fourth-order cumulants):

$$J\_{\rm SW} = \operatorname{sgn}\left[\mathsf{C}\_{2,2}^{a\_n}\right] \left(\mathsf{C}\_{2,2}^{y\_n} + (\gamma\_1 + 2)\left(\mathsf{C}\_{1,1}^{y\_n}\right)^2 + 2\gamma\_2\, \mathsf{C}\_{1,1}^{y\_n}\right) \tag{41a}$$

$$= \operatorname{sgn}\left[\mathsf{C}\_{2,2}^{a\_n}\right] \left( \mathrm{E}\left[|y\_n|^4\right] + \gamma\_1 \mathrm{E}\left[|y\_n|^2\right]^2 + 2\gamma\_2\, \mathrm{E}\left[|y\_n|^2\right] \right) \tag{41b}$$

where $\gamma\_1$ and $\gamma\_2$ are some statistical constants. The corresponding stochastic gradient algorithm is given by

$$\mathbf{w}\_{n+1} = \mathbf{w}\_n + \mu \operatorname{sgn}\left[\mathsf{C}\_{2,2}^{a\_n}\right] \left( \gamma\_1 \hat{\mathrm{E}}[|y\_n|^2] + |y\_n|^2 + \gamma\_2 \right) y\_n^{\*}\, \mathbf{x}\_{n}, \tag{42a}$$

$$\widehat{\mathsf{E}}[|y\_{n+1}|^2] = \widehat{\mathsf{E}}[|y\_n|^2] + \frac{1}{n} \left( |y\_n|^2 - \widehat{\mathsf{E}}[|y\_n|^2] \right) \tag{42b}$$

where $\operatorname{sgn}[\mathsf{C}\_{2,2}^{a\_n}] = -1$ due to the sub-Gaussian nature of digital signals. The above algorithm is usually termed the Shalvi-Weinstein algorithm (SWA). Note that SWA unifies CMA and SFA (i.e., the specific case in Equation (40)). Substituting $\gamma\_1 = 0$ and $\gamma\_2 = -R\_{\rm cm}$ in SWA, we obtain CMA (27). Similarly, substituting $\gamma\_2 = 0$ and $\gamma\_1 = -R\_{\rm cm}/P\_a$ in SWA, we obtain SFA (40). Note that the Shtrom-Fan criterion appears to be the generalization of the Shalvi-Weinstein criterion to cumulants of generic orders.
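This unification can be checked numerically; a small sketch (with arbitrary illustrative values of $y\_n$, $\hat{\mathrm{E}}[|y\_n|^2]$, $R\_{\rm cm}$ and $P\_a$) confirming that the SWA error term of (42a) collapses to the CMA and SFA error terms under the stated substitutions:

```python
def swa_error(y, y2_hat, gamma1, gamma2, sgn=-1):
    # SWA stochastic error term from (42a): sgn[C_{2,2}^a](g1*E[|y|^2] + |y|^2 + g2) y*
    return sgn * (gamma1 * y2_hat + abs(y) ** 2 + gamma2) * y.conjugate()

y, y2_hat, R_cm, P_a = 0.8 + 0.3j, 1.1, 1.32, 1.0
cma = (R_cm - abs(y) ** 2) * y.conjugate()                  # CMA error term, (27)
sfa = (R_cm / P_a * y2_hat - abs(y) ** 2) * y.conjugate()   # SFA error term, (40a)

assert abs(swa_error(y, y2_hat, 0.0, -R_cm) - cma) < 1e-12        # gamma1 = 0, gamma2 = -R_cm
assert abs(swa_error(y, y2_hat, -R_cm / P_a, 0.0) - sfa) < 1e-12  # gamma2 = 0, gamma1 = -R_cm/P_a
```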

#### **5. Blind equalization of APSK signal: Employing MED principle**

#### **5.1 Designing a blind cost function**

We employ the MED principle and use the PDFs of the transmitted amplitude-phase shift-keying (APSK) signal and the ISI-affected received signal to design a cost function for blind equalization. Consider a continuous APSK signal, where the signal alphabets $\{a\_R + ja\_I\} \in \mathcal{A}$ are assumed to be uniformly distributed over a circular region of radius $R\_a$ centered at the origin. The joint PDF of $a\_R$ and $a\_I$ is given by (refer to Fig. 3(a))

$$p\_{\mathcal{A}}(a\_R + ja\_I) = \begin{cases} \dfrac{1}{\pi R\_a^2}, & \sqrt{a\_R^2 + a\_I^2} \le R\_a \\ 0, & \text{otherwise.} \end{cases} \tag{43}$$

Now consider the transformation $Y = \sqrt{a\_R^2 + a\_I^2}$ and $\Theta = \angle(a\_R, a\_I)$, where $Y$ is the modulus and $\angle(\cdot)$ denotes the angle in the range $[0, 2\pi)$ defined by the point $(a\_R, a\_I)$. The joint distribution of the modulus $Y$ and the angle $\Theta$ can be obtained as $p\_{Y,\Theta}(\tilde{y}, \theta) = \tilde{y}/(\pi R\_a^2)$, $\tilde{y} \ge 0$, $0 \le \theta < 2\pi$. Since $Y$ and $\Theta$ are independent, we obtain a triangular distribution for $Y$ given by $p\_Y(\tilde{y} : H\_0) = 2\tilde{y}/R\_a^2$, $\tilde{y} \ge 0$, where $H\_0$ denotes the hypothesis that the signal is distortion-free. Let $\tilde{y}\_n, \tilde{y}\_{n-1}, \cdots, \tilde{y}\_{n-N+1}$ be a sequence, of size $N$, obtained by taking the modulus of randomly

Fig. 2. a) A continuous APSK, and b) a discrete practical 16APSK.

generated distortion-free signal alphabets $\mathcal{A}$, where the subscript $n$ indicates the discrete time index. Let $\tilde{z}\_1, \tilde{z}\_2, \cdots, \tilde{z}\_N$ be the order statistics of the sequence $\{\tilde{y}\}$. Let $p\_{\mathcal{Y}}(\tilde{y}\_n, \tilde{y}\_{n-1}, \cdots, \tilde{y}\_{n-N+1} : H\_0)$ be an $N$-variate density of the continuous type; then, under the hypothesis $H\_0$, we obtain

$$p\_{\mathcal{Y}}(\tilde{y}\_n, \tilde{y}\_{n-1}, \dots, \tilde{y}\_{n-N+1} : H\_0) = \frac{2^N}{R\_a^{2N}} \prod\_{k=1}^N \tilde{y}\_{n-k+1}.\tag{44}$$

Next we find $p\_{\mathcal{Y}}^{\*}(\tilde{y}\_n, \tilde{y}\_{n-1}, \cdots, \tilde{y}\_{n-N+1} : H\_0)$ as follows:

$$\begin{aligned} p\_{\mathcal{Y}}^{\*}(\tilde{y}\_{n}, \tilde{y}\_{n-1}, \dots, \tilde{y}\_{n-N+1} : H\_{0}) &= \int\_{0}^{\infty} p\_{\mathcal{Y}}(\lambda \tilde{y}\_{n}, \lambda \tilde{y}\_{n-1}, \dots, \lambda \tilde{y}\_{n-N+1} : H\_{0})\, \lambda^{N-1} \,\mathrm{d}\lambda \\ &= \frac{2^{N}}{R\_{a}^{2N}} \prod\_{k=1}^{N} \tilde{y}\_{n-k+1} \int\_{0}^{R\_a/\tilde{z}\_{N}} \lambda^{2N-1} \,\mathrm{d}\lambda = \frac{2^{N-1}}{N \left(\tilde{z}\_{N}\right)^{2N}} \prod\_{k=1}^{N} \tilde{y}\_{n-k+1}. \end{aligned} \tag{45}$$

where $\tilde{z}\_1, \tilde{z}\_2, \cdots, \tilde{z}\_N$ are the order statistics of the elements $\tilde{y}\_n, \tilde{y}\_{n-1}, \cdots, \tilde{y}\_{n-N+1}$, so that $\tilde{z}\_1 = \min\{\tilde{y}\}$ and $\tilde{z}\_N = \max\{\tilde{y}\}$. Now consider the next hypothesis ($H\_1$), that the signal suffers from multi-path interference as well as additive Gaussian noise (refer to Fig. 3(b)). Owing to the central limit theorem, the in-phase and quadrature components of the received signal may then be modeled as normally distributed. It means that the modulus of the received signal follows a Rayleigh distribution,

$$p\_{\mathcal{Y}}(\tilde{y} : H\_1) = \frac{\tilde{y}}{\sigma\_{\tilde{y}}^2} \exp\left(-\frac{\tilde{y}^2}{2\sigma\_{\tilde{y}}^2}\right),\ \tilde{y} \ge 0,\ \sigma\_{\tilde{y}} > 0. \tag{46}$$

Fig. 3. PDFs (not to scale) of a) continuous APSK and b) Gaussian distributed received signal.

The $N$-variate densities $p\_{\mathcal{Y}}(\tilde{y}\_n, \tilde{y}\_{n-1}, \cdots, \tilde{y}\_{n-N+1} : H\_1)$ and $p\_{\mathcal{Y}}^{\*}(\tilde{y}\_n, \tilde{y}\_{n-1}, \cdots, \tilde{y}\_{n-N+1} : H\_1)$ are obtained as

$$p\_{\mathcal{Y}}(\tilde{y}\_{n}, \tilde{y}\_{n-1}, \dots, \tilde{y}\_{n-N+1} : H\_{1}) = \frac{1}{\sigma\_{\tilde{y}}^{2N}} \prod\_{k=1}^{N} \tilde{y}\_{n-k+1} \exp\left(-\frac{\tilde{y}\_{n-k+1}^{2}}{2\sigma\_{\tilde{y}}^{2}}\right) \tag{47}$$

$$p\_{\mathcal{Y}}^{\*}(\tilde{y}\_{n}, \tilde{y}\_{n-1}, \dots, \tilde{y}\_{n-N+1} : H\_{1}) = \frac{\prod\_{k=1}^{N} \tilde{y}\_{n-k+1}}{\sigma\_{\tilde{y}}^{2N}} \int\_{0}^{\infty} \exp\left(-\frac{\lambda^{2} \sum\_{k'=1}^{N} \tilde{y}\_{n-k'+1}^{2}}{2\sigma\_{\tilde{y}}^{2}}\right) \lambda^{2N-1} \,\mathrm{d}\lambda \tag{48}$$

Substituting $u = \frac{1}{2\sigma\_{\tilde{y}}^2}\, \lambda^2 \sum\_{k'=1}^{N} \tilde{y}\_{n-k'+1}^2$, we obtain

$$p\_{\mathcal{Y}}^{\*}(\tilde{y}\_{n}, \tilde{y}\_{n-1}, \dots, \tilde{y}\_{n-N+1} : H\_{1}) = \frac{2^{N-1}\, \Gamma\left(N\right)}{\left(\sum\_{k=1}^{N} \tilde{y}\_{n-k+1}^{2}\right)^{N}} \prod\_{k=1}^{N} \tilde{y}\_{n-k+1} \tag{49}$$

The scale-invariant uniformly most powerful test of $p\_{\mathcal{Y}}^{\*}(\tilde{y}\_n, \tilde{y}\_{n-1}, \cdots, \tilde{y}\_{n-N+1} : H\_0)$ against $p\_{\mathcal{Y}}^{\*}(\tilde{y}\_n, \tilde{y}\_{n-1}, \cdots, \tilde{y}\_{n-N+1} : H\_1)$ provides us, see Sidak et al. (1999):

$$O(\tilde{y}\_n) = \frac{p\_{\mathcal{Y}}^\*(\tilde{y}\_n, \tilde{y}\_{n-1}, \dots, \tilde{y}\_{n-N+1} : H\_0)}{p\_{\mathcal{Y}}^\*(\tilde{y}\_n, \tilde{y}\_{n-1}, \dots, \tilde{y}\_{n-N+1} : H\_1)} = \frac{1}{N!} \left[ \frac{\sum\_{k=1}^N \tilde{y}\_{n-k+1}^2}{\tilde{z}\_N^2} \right]^N \underset{H\_1}{\gtrless} C \tag{50}$$

where $C$ is a threshold. Assuming large $N$, we can approximate $\frac{1}{N}\sum\_{k=1}^{N} \tilde{y}\_{n-k+1}^2 \approx \mathrm{E}[|y\_n|^2]$. This helps in obtaining a statistical cost for the blind equalization of APSK signals as follows:

$$w^\dagger = \arg\max\_{\overline{w}} \frac{\mathbb{E}\left[ \left| y\_n \right|^2 \right]}{\left( \max \left\{ \left| y\_n \right| \right\} \right)^2} \tag{51}$$

Based on the previous discussion, maximizing cost (51) can be interpreted as determining the equalizer coefficients, $\mathbf{w}$, which drive the distribution of its output, $y\_n$, away from the Gaussian distribution toward the uniform one, thus successfully removing the interference from the received APSK signal. Note that the above result (51) may be obtained directly from (24) by substituting $p = 2$ and $q \to \infty$, see Abrar & Nandi (2010b).
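This interpretation can be illustrated with a quick Monte-Carlo sketch (the sample size and seed are arbitrary): the normalized cost in (51) is markedly larger for a uniform-disk output than for a complex Gaussian one, so maximizing it pushes the output away from Gaussianity:

```python
import numpy as np

def cost_51(y):
    # Cost (51): output energy normalized by the squared largest modulus
    return np.mean(np.abs(y) ** 2) / np.max(np.abs(y)) ** 2

rng = np.random.default_rng(0)
N = 200_000
# H0 model: uniform over a disk of radius 1 (distortion-free continuous APSK)
disk = np.sqrt(rng.uniform(0, 1, N)) * np.exp(2j * np.pi * rng.uniform(0, 1, N))
# H1 model: complex Gaussian (heavily ISI-distorted output)
gauss = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)

print(cost_51(disk))    # close to 0.5
print(cost_51(gauss))   # much smaller
```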

#### **5.2 Admissibility of the proposed cost**

The cost (51) demands maximizing the equalizer output energy while minimizing the largest modulus. Since the largest modulus of the transmitted signal $a\_n$ is $R\_a$, incorporating this *a priori* knowledge, the unconstrained cost (51) can be written in a constrained form as follows:

$$\mathbf{w}^{\dagger} = \arg \max\_{\mathbf{w}} \mathrm{E}\left[ |y\_{n}|^2 \right] \text{ s.t. } \max\left\{ |y\_{n}| \right\} \le R\_{a}.\tag{52}$$

By incorporating $R\_a$, it becomes possible to recover the true energy of the signal $a\_n$ upon successful convergence. Also note that $\max\{|y\_n|\} = R\_a \sum\_l |t\_l|$ and $\mathrm{E}[|y\_n|^2] = P\_a \sum\_l |t\_l|^2$. Based on this, we note that the cost (52) is quadratic, and the feasible region (constraint) is a convex set (the proof of which is similar to that in Equation (9)). The problem, however, is non-convex and may have multiple local maxima. Nevertheless, we have the following theorem:

**Theorem:** Assume $\mathbf{w}^{\dagger}$ is a local optimum in (52), $\mathbf{t}^{\dagger}$ is the corresponding total channel-equalizer impulse response, and the channel noise is negligible. Then it holds that $|t\_l^{\dagger}| = \delta\_{l-k}$ for some $k \in \mathbb{Z}$, i.e., $\mathbf{t}^{\dagger}$ satisfies the zero-forcing condition.

**Proof:** Without loss of generality we assume that the channel and equalizer are real-valued. We re-write (52) as follows:

$$\boldsymbol{w}^{\dagger} = \arg\max\_{\mathbf{w}} \sum\_{l} t\_{l}^{2} \text{ s.t. } \sum\_{l} |t\_{l}| \le 1. \tag{53}$$

Now consider the following quadratic problem in *t* domain

$$\mathbf{t}^{\dagger} = \arg\max\_{\mathbf{t}} \sum\_{l} t\_{l}^{2} \text{ s.t. } \sum\_{l} |t\_{l}| \le 1. \tag{54}$$

Assume $\mathbf{t}^{(j)}$ is a feasible solution to (54). We have

$$\sum\_{l} t\_{l}^{2} \le \left(\sum\_{l} |t\_{l}|\right)^{2} \le 1\tag{55}$$

and

$$\left(\sum\_{l} |t\_{l}|\right)^{2} = \sum\_{l} t\_{l}^{2} + \sum\_{l\_{1}} \sum\_{l\_{2},\, l\_{2} \neq l\_{1}} |t\_{l\_{1}} t\_{l\_{2}}|\tag{56}$$

The first inequality in (55) holds with equality if and only if all cross terms in (56) are zero. Now assume that $\mathbf{t}^{(k)}$ is a local optimum of (54), i.e., the following proposition holds

$$\exists \varepsilon > 0, \; \forall \mathbf{t}^{(j)}, \; \|\mathbf{t}^{(j)} - \mathbf{t}^{(k)}\|\_{2} \le \varepsilon \tag{57}$$

$\Rightarrow \sum\_l (t\_l^{(k)})^2 \ge \sum\_l (t\_l^{(j)})^2$. Suppose $\mathbf{t}^{(k)}$ does not satisfy the **Theorem**, i.e., it has at least two nonzero entries, $t\_{l\_1}^{(k)}$ and $t\_{l\_2}^{(k)}$. Consider $\mathbf{t}^{(c)}$ defined by

$$t\_{l\_1}^{(c)} = t\_{l\_1}^{(k)} + \frac{\varepsilon}{\sqrt{2}},$$

$$t\_{l\_2}^{(c)} = t\_{l\_2}^{(k)} - \frac{\varepsilon}{\sqrt{2}},$$

and $t\_l^{(c)} = t\_l^{(k)}$, $l \neq l\_1, l\_2$. We also assume that $t\_{l\_2}^{(k)} \le t\_{l\_1}^{(k)}$. Next, we have $\|\mathbf{t}^{(c)} - \mathbf{t}^{(k)}\|\_2 = \varepsilon$ and $\sum\_l |t\_l^{(c)}| = \sum\_l |t\_l^{(k)}| \le 1$. However, one can observe that

$$\sum\_{l} (t\_l^{(k)})^2 - \sum\_{l} (t\_l^{(c)})^2 = \sqrt{2}\varepsilon \left( t\_{l\_2}^{(k)} - t\_{l\_1}^{(k)} \right) - \varepsilon^2 < 0,\tag{58}$$

which means $\mathbf{t}^{(k)}$ is not a local optimum of (54). Therefore, we have shown by contradiction that all local maxima of (54) must satisfy the **Theorem**.

#### **5.3 Adaptive optimization of the proposed cost**

For a stochastic gradient-based adaptive implementation of (52), we need to modify it to involve a *differentiable* constraint; one of the possibilities is

$$\mathbf{w}^{\dagger} = \arg\max\_{\mathbf{w}} \mathrm{E}\left[|y\_{n}|^2\right] \text{ s.t. } \mathrm{fmax}(R\_{a}, |y\_{n}|) = R\_{a}, \tag{59}$$

where we have used the following identity (below $a, b \in \mathbb{C}$):

$$\text{fmax}(|a|, |b|) \equiv \frac{|a| + |b| + \big||a| - |b|\big|}{2} = \begin{cases} |a|, & \text{if } |a| \ge |b| \\ |b|, & \text{otherwise.} \end{cases} \tag{60}$$
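A trivial sketch verifying the identity (60) on a few sample pairs of nonnegative moduli:

```python
def fmax(a_abs, b_abs):
    # Identity (60): the maximum of two nonnegative reals in closed form
    return (a_abs + b_abs + abs(a_abs - b_abs)) / 2

for a, b in [(0.3, 1.7), (2.5, 2.5), (4.0, 1.0)]:
    assert fmax(a, b) == max(a, b)
```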

The function fmax is differentiable, viz

$$\frac{\partial \operatorname{fmax}(|a|, |b|)}{\partial a^\*} = \frac{a \left(1 + \operatorname{sgn}(|a| - |b|) \right)}{4|a|} = \begin{cases} a/(2|a|), & \text{if } |a| > |b|\\ 0, & \text{if } |a| < |b| \end{cases} \tag{61}$$

If $|y\_n| < R\_a$, then the cost (59) simply maximizes the output energy. However, if $|y\_n| > R\_a$, then the constraint is violated and the new update $\mathbf{w}\_{n+1}$ is required to be computed such that the magnitude of the *a posteriori* output $\mathbf{w}\_{n+1}^H \mathbf{x}\_n$ becomes smaller than or equal to $R\_a$. Next, employing a Lagrange multiplier, we get

$$\mathbf{w}^{\dagger} = \arg\max\_{\mathbf{w}} \left\{ \mathrm{E}[|y\_n|^2] + \lambda \left(\text{fmax} \left( R\_{a}, |y\_n| \right) - R\_a\right) \right\}. \tag{62}$$

The stochastic approximate gradient-based optimization of $\mathbf{w}^{\dagger} = \arg\max\_{\mathbf{w}} \mathrm{E}[J]$ is realized as $\mathbf{w}\_{n+1} = \mathbf{w}\_n + \mu\, \partial J/\partial \mathbf{w}\_n^{\*}$, where $\mu > 0$ is a small step-size. Differentiating (62) with respect to $\mathbf{w}\_n^{\*}$ gives

$$\frac{\partial |y\_n|^2}{\partial \mathbf{w}\_n^{\*}} = \frac{\partial |y\_n|^2}{\partial y\_n} \frac{\partial y\_n}{\partial \mathbf{w}\_n^{\*}} = y\_n^{\*}\, \mathbf{x}\_n$$

and

$$\frac{\partial\, \text{fmax}(R\_{a}, |y\_{n}|)}{\partial \mathbf{w}\_{n}^{\*}} = \frac{\partial\, \text{fmax}(R\_{a}, |y\_{n}|)}{\partial y\_{n}} \frac{\partial y\_{n}}{\partial \mathbf{w}\_{n}^{\*}} = \frac{g\_{n}\, y\_{n}^{\*}}{4 |y\_{n}|}\, \mathbf{x}\_{n}$$

where $g\_n = 1 + \operatorname{sgn}(|y\_n| - R\_a)$; we obtain $\mathbf{w}\_{n+1} = \mathbf{w}\_n + \mu\left(1 + \lambda g\_n/(4|y\_n|)\right) y\_n^{\*} \mathbf{x}\_n$. If $|y\_n| < R\_a$, then $g\_n = 0$ and $\mathbf{w}\_{n+1} = \mathbf{w}\_n + \mu y\_n^{\*} \mathbf{x}\_n$. Otherwise, if $|y\_n| > R\_a$, then $g\_n = 2$ and

$$
\mathfrak{w}\_{n+1} = \mathfrak{w}\_n + \mu \left( 1 + \lambda / (2 \, |y\_n|) \right) y\_n^\* \mathfrak{x}\_n.
$$

As mentioned earlier, in this case, we have to compute $\lambda$ such that $\mathbf{w}\_{n+1}^H \mathbf{x}\_n$ lies inside the circular region without sacrificing the output energy. Such an update can be realized by minimizing $|y\_n|^2$ while satisfying the Bussgang condition, see Bellini (1994). Note that the satisfaction of the Bussgang condition ensures recovery of the true signal energy upon successful convergence. One possibility is $\lambda = -2(1 + \beta)|y\_n|$, $\beta > 0$, which leads to

$$
\mathfrak{w}\_{n+1} = \mathfrak{w}\_n + \mu(-\beta)\mathfrak{y}\_n^\*\mathfrak{x}\_n.
$$

The Bussgang condition requires

$$\underbrace{\mathrm{E}\left[y\_{n}y\_{n-i}^{\*}\right]}\_{|y\_{n}| \le R\_{a}} - \beta \underbrace{\mathrm{E}\left[y\_{n}y\_{n-i}^{\*}\right]}\_{|y\_{n}| > R\_{a}} = 0, \ \forall i \in \mathbb{Z} \tag{63}$$

In steady-state, we assume $y\_n = a\_{n-\delta} + u\_n$, where $u\_n$ is convolutional noise. For $i \neq 0$, (63) is satisfied due to uncorrelated $a\_n$ and independent and identically distributed samples of $u\_n$. Let $a\_n$ comprise $M$ distinct symbols on $L$ moduli $\{R\_1, \cdots, R\_L\}$, where $R\_L = R\_a$ is the largest modulus. Let $M\_l$ denote the number of unique (distortion-free) symbols on the $l$th modulus, i.e., $\sum\_{l=1}^{L} M\_l = M$. With negligible $u\_n$, we solve (63) for $i = 0$ to get

$$M\_1 R\_1^2 + \dots + M\_{L-1} R\_{L-1}^2 + \frac{1}{2} M\_L R\_L^2 - \frac{\beta}{2} M\_L R\_L^2 = 0 \tag{64}$$

The last two terms indicate that, when $|y\_n|$ is close to $R\_L$, it is equally likely to update in either direction. Noting that $\sum\_{l=1}^{L} M\_l R\_l^2 = M P\_a$, the simplification of (64) gives

$$\beta = 2 \frac{M}{M\_L} \frac{P\_a}{R\_a^2} - 1. \tag{65}$$

The use of (65) ensures recovery of the true signal energy upon successful convergence. Finally, the proposed algorithm is expressed as

$$\begin{array}{l} \mathbf{w}\_{n+1} = \mathbf{w}\_{n} + \mu\, \mathrm{f}(y\_{n})\, y\_{n}^{\*}\, \mathbf{x}\_{n} \\ \mathrm{f}(y\_{n}) = \begin{cases} 1, & \text{if } |y\_{n}| \le R\_{a} \\ -\beta, & \text{if } |y\_{n}| > R\_{a}. \end{cases} \end{array} \tag{66}$$

Note that the error-function $\mathrm{f}(y\_n)\,y\_n^{\*}$ 1) has a finite derivative at the origin, 2) is increasing for $|y\_n| < R\_a$, 3) is decreasing for $|y\_n| > R\_a$, and 4) is insensitive to phase/frequency offset errors. In Baykal et al. (1999), the properties 1)-4) have been regarded as essential features of a constant modulus algorithm; this motivates us to denote (66) as $\beta$CMA.
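As a consistency check, $\beta$ in (65) reproduces the values quoted later in (88) when evaluated with the APSK geometries used in Section 6; the per-ring symbol counts below (4+4 for 8-APSK, 8+8 for 16-APSK) are inferred from the quoted signal energies and are assumptions of this sketch, which also includes a one-step $\beta$CMA update per (66):

```python
import numpy as np

def beta_apsk(moduli, counts):
    """beta from (65), given ring radii R_1..R_L and per-ring symbol counts M_1..M_L
    (counts are assumed, inferred from the energies quoted in Section 6)."""
    moduli = np.asarray(moduli, float)
    counts = np.asarray(counts, float)
    M = counts.sum()
    P_a = (counts * moduli ** 2).sum() / M        # average symbol energy
    return 2 * (M / counts[-1]) * (P_a / moduli[-1] ** 2) - 1

def beta_cma_step(w, x, mu, R_a, beta):
    # One beta-CMA iteration, (66)
    y = np.vdot(w, x)                             # y_n = w_n^H x_n
    f = 1.0 if abs(y) <= R_a else -beta
    return w + mu * f * np.conj(y) * x

# 8-APSK with moduli 1.000 / 1.932 and 16-APSK with moduli 1.586 / 3.000
print(round(beta_apsk([1.000, 1.932], [4, 4]), 3))   # ~ 1.536
print(round(beta_apsk([1.586, 3.000], [8, 8]), 3))   # ~ 1.559
```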

#### **5.4 Stability of the derived adaptive algorithm**

In this Section, we carry out a deterministic stability analysis of $\beta$CMA for any bounded-magnitude received signal. The analysis relies on the analytical framework of Rupp & Sayed (2000). We shall assume that the successive regression vectors $\{\mathbf{x}\_i\}$ are nonzero and uniformly bounded from above and from below. We update the equalizer only when its output amplitude is higher than a certain threshold; we stop the update otherwise.

In our analysis, we assume that the threshold is $R\_a$. So we only consider those updates for which $|y\_n| > R\_a$; we extract and denote the active update steps with time index $k$. We study the following form:

$$\mathbf{w}\_{k+1} = \mathbf{w}\_k + \mu\_k \Phi\_k^{\*}\, \mathbf{x}\_k, \ \Phi\_k \neq 0, \ k = 0, 1, 2, \cdots \tag{67}$$

where $\Phi\_k = \Phi(y\_k) = \mathrm{f}(y\_k)\, y\_k$. Let $\mathbf{w}^{\star}$ denote the vector of the optimal equalizer and let $z\_k = \mathbf{w}^{\star H} \mathbf{x}\_k = a\_{k-\delta}$ be the optimal output, so that $|z\_k| = R\_a$. Define the a priori and a posteriori estimation errors

$$\begin{aligned} e\_k^{\rm a} &:= z\_k - y\_k = \tilde{\mathbf{w}}\_k^H \mathbf{x}\_k \\ e\_k^{\rm p} &:= z\_k - s\_k = \tilde{\mathbf{w}}\_{k+1}^H \mathbf{x}\_k \end{aligned} \tag{68}$$

where $\tilde{\mathbf{w}}\_k = \mathbf{w}^{\star} - \mathbf{w}\_k$ and $s\_k = \mathbf{w}\_{k+1}^H \mathbf{x}\_k$ is the a posteriori output. We assume that $|e\_k^{\rm a}|$ is small and the equalizer is operating in the vicinity of $\mathbf{w}^{\star}$. We introduce a function $\xi(x, y)$:

$$\xi(x, y) := \frac{\Phi(y) - \Phi(x)}{x - y} = \frac{\mathrm{f}(y)\,y - \mathrm{f}(x)\,x}{x - y}, \ (x \neq y) \tag{69}$$

Using $\xi(x, y)$ and simple algebra, we obtain

$$\Phi\_k = \mathrm{f}(y\_k)\, y\_k = \xi(z\_k, y\_k)\, e\_k^{\rm a} \tag{70}$$

$$e\_k^{\rm p} = \left(1 - \frac{\mu\_k}{\bar{\mu}\_k}\, \xi(z\_k, y\_k)\right) e\_k^{\rm a} \tag{71}$$

where $\bar{\mu}\_k = 1/\|\mathbf{x}\_k\|^2$. For the stability of the adaptation, we require $|e\_k^{\rm p}| < |e\_k^{\rm a}|$. To ensure this, we need to guarantee the following for all possible combinations of $z\_k$ and $y\_k$:

$$\left|1 - \frac{\mu\_k}{\overline{\mu}\_k} \xi(z\_k, y\_k)\right| < 1, \ \forall k \tag{72}$$

Now we need to prove that the real part of the function $\xi(z\_k, y\_k)$ defined by (69) is positive and bounded from below. Recall that $|z\_k| = R\_a$ and $|y\_k| > R\_a$. We start by writing $z\_k/y\_k = re^{j\varphi}$ for some $r < 1$ and some $\varphi \in [0, 2\pi)$. Then expression (69) leads to

$$\xi(z\_k, y\_k) = \frac{\overbrace{\mathrm{f}(y\_k)}^{(=-\beta)} y\_k - \overbrace{\mathrm{f}(z\_k)}^{(=0)} z\_k}{z\_k - y\_k} = \frac{\beta y\_k}{y\_k - z\_k} = \frac{\beta}{1 - re^{j\varphi}}.\tag{73}$$

It is important for our purpose to verify whether the real part of $\beta/(1 - re^{j\varphi})$ is positive. For any fixed value of $r$, if we allow the angle $\varphi$ to vary from zero to $2\pi$, the term $\beta/(1 - re^{j\varphi})$ describes a circle in the complex plane whose least positive real value is $\beta/(1+r)$, obtained for $\varphi = \pi$, and whose most positive real value is $\beta/(1-r)$, obtained for $\varphi = 0$. This shows that for $r \in (0, 1)$, the real part of the function $\xi(z\_k, y\_k)$ lies in the interval

$$\frac{\beta}{1+r} \le \xi\_R(z\_k, y\_k) \le \frac{\beta}{1-r} \tag{74}$$

Referring to Fig. 4, note that the function $\xi(z\_k, y\_k)$ assumes values that lie inside a circle in the

Fig. 4. Plot of $\xi(z\_k, y\_k)$ for arbitrary $\beta$, $r$ and $\varphi \in [0, 2\pi)$.

right-half plane. From this figure, we can obtain the following bound for the imaginary part of $\xi(z\_k, y\_k)$ (that is, $\xi\_I(z\_k, y\_k)$):

$$-\frac{\beta\, r}{1 - r^2} \le \xi\_I(z\_k, y\_k) \le \frac{\beta\, r}{1 - r^2}.\tag{75}$$

Let A and B be any two positive numbers satisfying

$$A^2 + B^2 < 1.\tag{76}$$

We need to find a $\mu\_k$ that satisfies

$$\left|\frac{\mu\_k}{\bar{\mu}\_k}\, \xi\_I(z\_k, y\_k)\right| < A \implies \mu\_k < \frac{A\, \bar{\mu}\_k}{|\xi\_I(z\_k, y\_k)|}\tag{77}$$

$$\left|1 - \frac{\mu\_k}{\bar{\mu}\_k}\, \xi\_R(z\_k, y\_k)\right| < B \implies \mu\_k > \frac{(1 - B)\,\bar{\mu}\_k}{\xi\_R(z\_k, y\_k)}\tag{78}$$

$$0 < \frac{(1 - B)\,\bar{\mu}\_k}{\xi\_R(z\_k, y\_k)} < \mu\_k < \frac{A\, \bar{\mu}\_k}{|\xi\_I(z\_k, y\_k)|}\tag{79}$$

Using the extremum values of $\xi\_R(z\_k, y\_k)$ and $\xi\_I(z\_k, y\_k)$, we obtain

$$\frac{(1+r)(1-B)}{\beta \|\mathbf{x}\_k\|^2} < \mu\_k < \frac{(1-r^2)A}{\beta r\, \|\mathbf{x}\_k\|^2} \tag{80}$$

We need to guarantee that the upper bound in the above expression is larger than the lower bound. This can be achieved by choosing { A, B} properly such that

$$0 < (1 - \mathcal{B}) < \frac{1 - r}{r} A < 1\tag{81}$$

From our initial assumptions that the equalizer is in the vicinity of the open-eye solution and $|y\_k| > R\_a$, we know that $r < 1$. It implies that we need to determine the *smallest* value of $r$ which satisfies (81), or in other words, we have to determine the bound on the step-size for the *largest* equalizer output amplitude. In Fig. 5(a), we plot the function $f\_r := (1 - r)/r$ versus $r$; note that for $0.5 \le r < 1$ we have $f\_r \le 1$.

Fig. 5. Plots: a) $f\_r$ and b) $f\_B$.

Let $\{A\_0, B\_0\}$ be such that $(1 - B\_0) = \rho A\_0$, where $0 < \rho < 1$. To satisfy (81), we need $0.5 \le r < 1$ and $\rho < (1 - r)/r$. From (76), $B\_0$ must be such that

$$(1 - B\_0)^2 \rho^{-2} + B\_0^2 < 1\tag{82}$$

which reduces to the following quadratic inequality in $B\_0$:

$$\left(1+\rho^{-2}\right)B\_{0}^{2} - 2\rho^{-2}B\_{0} + \left(\rho^{-2}-1\right) < 0.\tag{83}$$

If we find a $B\_0$ that satisfies this inequality, then a pair $\{A\_0, B\_0\}$ satisfying (76) and (81) exists. So consider the quadratic function $f\_B := (1 + \rho^{-2})B^2 - 2\rho^{-2}B + (\rho^{-2} - 1)$. It has a negative minimum and it crosses the real axis at the positive roots $B^{(1)} = (1 - \rho^2)/(1 + \rho^2)$ and $B^{(2)} = 1$. This means that there exist many values of $B$, between the roots, at which the quadratic function in $B$ evaluates to negative values (refer to Fig. 5(b)).

Hence, $B\_0$ falls in the interval $(1 - \rho^2)/(1 + \rho^2) < B\_0 < 1$; it further gives $A\_0 = 2\rho/(1 + \rho^2)$. Using $\{A\_0, B\_0\}$, we obtain

$$\frac{3\rho^2}{\beta\left(1+\rho^2\right)\|\mathbf{x}\_k\|^2} < \mu\_k < \frac{3\rho}{\beta\left(1+\rho^2\right)\|\mathbf{x}\_k\|^2} \tag{84}$$

Note that $\rho^2/(1 + \rho^2)$ is minimized as $\rho \to 0$, and $\arg\max\_{\rho}\, \rho/(1 + \rho^2) = 1$. So taking $\rho \to 0$ and $\rho = 1$ in the lower and upper bounds, respectively, and replacing $\|\mathbf{x}\_k\|^2$ with $\mathrm{E}\left[\|\mathbf{x}\_k\|^2\right]$, we find the widest stochastic stability bound on $\mu\_k$ as follows:

$$0 < \mu < \frac{3}{2\beta \mathbb{E}\left[||\mathbf{x}\_k||^2\right]}.\tag{85}$$

The significance of (85) is that it can easily be evaluated from the equalizer input samples. In adaptive filter theory, it is convenient to replace $\mathrm{E}\left[\|\mathbf{x}\_k\|^2\right]$ with $\operatorname{tr}(\mathbf{R})$, where $\mathbf{R} = \mathrm{E}\left[\mathbf{x}\_k \mathbf{x}\_k^H\right]$ is the autocorrelation matrix of the channel observation. Also note that, when the noise is negligible and the channel coefficients are normalized, the quantity $\operatorname{tr}(\mathbf{R})$ can be expressed as the product of the equalizer length ($N$) and the transmitted signal average energy ($P\_a$); it gives

$$0 < \mu\_{\beta \text{CMA}} < \frac{3}{2\beta N P\_a} \tag{86}$$

Note that the bound (86) is remarkably similar to the stability bound of the complex-valued LMS algorithm (refer to expression (20)). Comparing (86) and (20), we obtain a simple and elegant relation between the step-sizes of $\beta$CMA and complex-valued LMS:

$$\frac{\mu\_{\beta \text{CMA}}}{\mu\_{\text{LMS}}} < \frac{3}{\beta}. \tag{87}$$
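The bound (86) is easy to evaluate; a small sketch using the illustrative Section 6 setup (seven-tap equalizer, with the 16-APSK values of $\beta$ and $P\_a$ quoted there):

```python
def mu_bound_beta_cma(beta, N, P_a):
    # Stability bound (86): 0 < mu < 3 / (2 beta N P_a)
    return 3.0 / (2.0 * beta * N * P_a)

# seven-tap equalizer, 16-APSK: beta = 1.559, P_a = 5.757
bound = mu_bound_beta_cma(beta=1.559, N=7, P_a=5.757)
print(bound)   # roughly 0.024
```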

#### **6. Simulation results**

We compare $\beta$CMA with CMA (Equation (27)) and SFA (Equation (40)). We consider the transmission of amplitude-phase shift-keying (APSK) signals over a complex voice-band channel (channel-1), taken from Picchi & Prati (1987), and evaluate the average ISI traces at SNR = 30 dB. We employ a seven-tap equalizer with central spike initialization and use 8- and 16-APSK signalling. Step-sizes have been selected such that all algorithms reach steady-state in the same number of iterations. The parameter $\beta$ is obtained as:

$$\beta = \begin{cases} 1.535, & \text{for 8-APSK} \\ 1.559, & \text{for 16-APSK} \end{cases} \tag{88}$$

Results are summarized in Fig. 6; note that $\beta$CMA performed better than CMA and SFA by yielding a much lower ISI floor. Also note that SFA performed slightly better than CMA.

Next we validate the stability bound (86). Here we consider a second complex channel (channel-2), taken from Kennedy & Ding (1992). In all cases, the simulations were performed with $N\_{\rm it} = 10^4$ iterations, $N\_{\rm run} = 100$ runs, and no noise. In Fig. 7, we plot the probability of divergence ($P\_{\rm div}$) for three different equalizer lengths against the normalized step-size, $\mu\_{\rm norm} = \mu/\mu\_{\rm bound}$. The $P\_{\rm div}$ is estimated as $P\_{\rm div} = N\_{\rm div}/N\_{\rm run}$, where $N\_{\rm div}$ indicates the number of times the equalizer diverged. Equalizers were initialized close to the zero-forcing solution. It can be seen that the bound does guarantee stable performance when $\mu < \mu\_{\rm bound}$.

Fig. 6. Residual ISI: a) 8-APSK and b) 16-APSK. The inner and outer moduli of 8-APSK are 1.000 and 1.932, respectively, and the inner and outer moduli of 16-APSK are 1.586 and 3.000, respectively. The energies of 8-APSK and 16-APSK are 2.366 and 5.757, respectively.

Fig. 7. Probability of divergence on channel-1 and channel-2 with three equalizer lengths, no noise, $N\_{\rm it} = 10^4$ iterations and $N\_{\rm run} = 100$ runs for a) 8-APSK and b) 16-APSK.

#### **7. Concluding remarks**

In this Chapter, we have introduced the basic concept of adaptive blind equalization in the context of single-input single-output communication systems. The key challenge of adaptive blind equalization lies in the design of special cost functions whose minimization or maximization results in the removal of inter-symbol interference. We have briefly discussed popular criteria of equalization, namely those of Lucky, the mean square error, the minimum entropy, the constant modulus, Shalvi-Weinstein and Shtrom-Fan. Most importantly, based on the minimum entropy deconvolution principle, the idea of designing a specific cost function for the blind equalization of a given transmitted signal is described in detail. We have presented a case study of an amplitude-phase shift-keying signal for which a cost function is derived and the corresponding adaptive algorithm is obtained. We have also addressed the admissibility of the proposed cost function and the stability of the corresponding algorithm. The blind adaptation of the derived algorithm is shown to possess better convergence behavior than two existing algorithms. Finally, hints are provided for obtaining blind equalization cost functions for square and cross quadrature amplitude modulation signals.

#### **8. Exercises**

1. Refer to Fig. 8 for geometrical details of square- and cross-QAM signals. Now following the ideas presented in Section 5, show that the blind equalization cost functions for square- and cross-QAM signals are respectively as follows:

$$\max\_{\mathbf{w}} \mathbb{E}\left[ |y\_n|^2 \right], \text{ s.t. } \max\left\{ |y\_{R,n}|, |y\_{I,n}| \right\} \le R. \tag{89}$$

and

$$\max\_{\mathbf{w}} \mathrm{E}\left[|y\_n|^2\right], \text{ s.t. } \max\left\{|P y\_{R,n}|, |y\_{I,n}|\right\} + \max\left\{|y\_{R,n}|, |P y\_{I,n}|\right\} - \max\left\{|y\_{R,n}|, |y\_{I,n}|\right\} \le P R. \tag{90}$$

Fig. 8. Geometry of a) square- and b) cross-QAM ($P$ as in (90)).

**2.** By exploiting the independence between the in-phase and quadrature components of square QAM signal, show that the following blind equalization cost function may be obtained:

$$\max\_{\mathbf{w}} \mathrm{E}\left[|y\_{R,n}|^2\right], \text{ s.t. } \max\left\{|y\_{R,n}|\right\} = \max\left\{|y\_{I,n}|\right\} \le R. \tag{91}$$

The cost (91) originally appeared in Satorius & Mulligan (1993). Refer to Meng et al. (2009) and Abrar & Nandi (2010a), respectively, for its block-iterative and adaptive optimization.

#### **9. Acknowledgment**

The authors acknowledge the support of COMSATS Institute of Information Technology, Islamabad, Pakistan, King Fahd University of Petroleum and Minerals, Dhahran, Saudi Arabia, and the University of Liverpool, UK towards the accomplishment of this work.

#### **10. References**


Ding, Z. & Li, Y. (2001). *Blind Equalization and Identification,* Marcel Dekker Inc., New York.

