**2. Noise estimation**

The output acquired by sensing devices, for example transformers and sensors, is a measurement of an analogue input signal $f(x)$. The output can be modelled as in (1): $X[n]$ is composed of $f(x)$ filtered by the sensor response $\phi(x)$ plus an added noise $W[n]$. The noise *W* contains various types of interference, for instance radio frequency interference from communication systems. It also includes the noise induced by measurement devices, such as electronic noise from oscilloscopes and transmission errors. In most cases, the noise *W* is modelled by a random vector that often follows a Gaussian distribution.

$$X[n] = \langle f, \phi_n \rangle + \mathcal{W}[n] \tag{1}$$



If the analogue-to-digital data acquisition is stable, the discrete signal can be denoted by $f[n] = \langle f, \phi_n \rangle$. The analogue approximation of $f(x)$ can be recovered from $f[n]$. The noisy output in (1) is rewritten as

$$X[n] = f[n] + \mathcal{W}[n] \tag{2}$$

The estimation of $f[n]$ calculated from (2) is denoted by $\tilde{F}[n] = DX[n]$, where *D* is the decision operator. It is designed to minimize the error $f - \tilde{F}$. For one-dimensional signals, the mean-square distance is often employed to measure the error $f - \tilde{F}$. The mean-square distance is not a perfect model, but it is simple and sufficiently precise in most cases (Mallat, 2009d). The risk of the estimation is calculated by (3):

$$r(D, f) = E\{\|f - \tilde{F}\|^2\} \tag{3}$$
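As a concrete illustration, the risk in (3) can be approximated by Monte Carlo simulation over noise realizations. The sketch below is a minimal example, assuming a hypothetical test signal, a known noise level, and an arbitrary trial count, none of which are prescribed by the text:

```python
import numpy as np

rng = np.random.default_rng(0)
N, sigma = 1024, 0.5
n = np.arange(N)
f = np.sin(2 * np.pi * n / 128) + (n > N // 2)   # example signal (assumption)

def risk(D, f, sigma, trials=200):
    """Monte Carlo estimate of r(D, f) = E{||f - D(X)||^2} over noise draws."""
    errs = []
    for _ in range(trials):
        X = f + sigma * rng.standard_normal(f.size)   # noisy output, eq. (2)
        errs.append(np.sum((f - D(X)) ** 2))
    return np.mean(errs)

# The identity estimator D(X) = X has risk E{||W||^2} = N * sigma^2.
print(risk(lambda X: X, f, sigma), N * sigma ** 2)
```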

The decision operator *D* is optimized with the prior information available on the signal (Mallat, 2009d). Two estimation methods, Bayes estimation and minimax estimation, are the most commonly used. The Bayes estimator minimizes the risk to get an optimal estimation, but it is difficult to obtain enough information to model the prior probability distribution. The minimax estimator uses a simpler model, but the precise risk cannot be calculated. Sections 2.1 and 2.2 introduce the fundamentals of the Bayes estimator and the minimax estimator.

#### **2.1 Bayes estimation**

In Bayes estimation, the unknown signal *f*, denoted by a random vector *F*, is supposed to have a probability distribution, also called the prior distribution. The noisy output in (2) can be rewritten as

$$X[n] = F[n] + \mathcal{W}[n] \text{ for } 0 \le n < N \tag{4}$$

The noise *W* is supposed to be independent of *F* for all *n*. The distribution of the measurement *X* is determined by the joint distribution of *F* and *W*; the conditional distribution of *F* given *X* is called the posterior distribution. Thus, $\tilde{F} = DX$ is an estimator of *F* from the measurement *X*, and the risk is the same as in (3). The Bayes risk of $\tilde{F}$ with respect to the prior probability distribution $\pi$ of the signal is:

$$r(D, \pi) = E_{\pi}\{r(D, F)\} = E\{\|F - \tilde{F}\|^2\} = \sum_{n=0}^{N-1} E\{|F[n] - \tilde{F}[n]|^2\}. \tag{5}$$

The estimator $\tilde{F}$ is said to be a Bayes estimator if it minimizes the Bayes risk among all estimators. Equivalently, the estimator which minimizes the posterior expected loss $E\{r(D, F)\,|\,X\}$ for each *X* also minimizes the Bayes risk and therefore is a Bayes estimator (Lehmann & Casella, 1998).
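To make this concrete, here is a minimal sketch for one special case that can be stated with confidence: when the prior is Gaussian, $F[n] \sim N(0, \tau^2)$ i.i.d., and the noise is Gaussian, the posterior mean is a linear shrinkage of *X*. The numbers below are illustrative assumptions:

```python
import numpy as np

# If F[n] ~ N(0, tau^2) and W[n] ~ N(0, sigma^2) independently, the
# posterior mean E{F | X} = tau^2 / (tau^2 + sigma^2) * X minimizes the
# Bayes risk (5). The shrinkage factor is specific to this Gaussian prior.
rng = np.random.default_rng(1)
N, tau, sigma = 10_000, 2.0, 1.0
F = tau * rng.standard_normal(N)
X = F + sigma * rng.standard_normal(N)          # eq. (4)

F_bayes = tau**2 / (tau**2 + sigma**2) * X      # posterior mean
print(np.mean((F - F_bayes) ** 2))              # ~ tau^2*sigma^2/(tau^2+sigma^2) = 0.8
print(np.mean((F - X) ** 2))                    # ~ sigma^2 = 1.0 (no estimation)
```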

The risk function is determined by choosing the way to measure the distance between the estimator $\tilde{F}$ and the unknown signal *F*. In most applications, the mean square error is adopted because of its simplicity, but some alternative estimators are also used, such as


linear estimation. In this chapter, most estimations use mean square error to measure estimation risk.

#### **2.2 Minimax estimation**

It is possible that we have some prior information about a signal, but it is rare to know the probability distribution of complex signals. For example, there is no appropriate model for the stochastic transient signals in power systems or the sound signals from the natural environment. In this case, we have to find a "good" estimator whose maximal risk is minimal among all estimators. The prior information forms a signal set $\Theta$, but this set does not specify the probability distribution of signals in $\Theta$. The more prior information, the smaller the set $\Theta$ (Mallat, 2009d).

For the signal *f*, the noisy output is as shown in (2). The risk of the estimation $\tilde{F} = DX$ is $r(D, f) = E\{\|DX - f\|^2\}$. Since the probability distribution of signals in the set $\Theta$ is unknown, the precise risk cannot be calculated; only a possible range can be given. The maximum risk over this range is (Donoho & Johnstone, 1998):

$$r(D, \Theta) = \sup_{f \in \Theta} E\{\|DX - f\|^2\} \tag{6}$$

In minimax estimation, the minimax risk is the lower bound of the risk in (6) over all possible operators *D*, linear or nonlinear:

$$r_n(\Theta) = \inf_{D \in O_n} r(D, \Theta) \tag{7}$$

Here, $O_n$ denotes the set of all operators.

#### **3. Threshold estimation in bases**

The threshold is the estimated noise level: values larger than the threshold are regarded as signal, and smaller ones are regarded as noise. When the noisy output is decomposed in a chosen basis, the estimator of the noise can still be applied, because white noise remains white noise in all bases, as proved in section 3.1. Two thresholding functions, hard thresholding and soft thresholding, are introduced in section 3.2.

#### **3.1 Estimation in orthogonal basis**

When the noisy output is decomposed in an orthogonal basis $B = \{g_m\}_{0 \le m < N}$, the components in (2) are rewritten as $X_B[m] = \langle X, g_m \rangle$, $f_B[m] = \langle f, g_m \rangle$, and $\mathcal{W}_B[m] = \langle \mathcal{W}, g_m \rangle$. The sum of them gives

$$X_B[m] = f_B[m] + \mathcal{W}_B[m]. \tag{8}$$

If the noise *W* is a zero-mean white noise with variance $\sigma^2$, then $E\{\mathcal{W}[n]\mathcal{W}[k]\} = \sigma^2 \delta[n-k]$. Thus the noise coefficients $\mathcal{W}_B[m] = \sum_{n=0}^{N-1} \mathcal{W}[n]\, g_m^*[n]$ also represent a white noise of variance $\sigma^2$. This is because

$$E\{\mathcal{W}_B[m]\mathcal{W}_B[p]\} = \sum_{n=0}^{N-1} \sum_{k=0}^{N-1} g_m[n]\, g_p[k]\, E\{\mathcal{W}[n]\mathcal{W}[k]\} = \sigma^2 \langle g_p, g_m \rangle = \sigma^2 \delta[p-m]. \tag{9}$$

From the analysis above, one can see that the noise remains as white noise in all bases. It is not influenced by the choice of basis (Mallat, 2009d).
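A quick numerical check of (9), using a randomly generated orthonormal basis as a stand-in for an arbitrary basis *B* (the basis construction and sample sizes are assumptions made for this demo):

```python
import numpy as np

# Expand white noise in an orthonormal basis and verify that the
# coefficient covariance is still sigma^2 * I, as stated in (9).
rng = np.random.default_rng(2)
N, sigma, trials = 32, 1.5, 20_000
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))   # columns form an orthonormal basis

W = sigma * rng.standard_normal((trials, N))       # realizations of W[n]
WB = W @ Q                                         # coefficients <W, g_m>
cov = WB.T @ WB / trials                           # empirical E{W_B[m] W_B[p]}
print(np.allclose(cov, sigma**2 * np.eye(N), atol=0.1))   # ~ sigma^2 * delta[p-m]
```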

#### **3.2 Thresholding estimation**

In the orthogonal basis $B = \{g_m\}_{0 \le m < N}$, the estimator of *f* in $X = f + \mathcal{W}$ can be written as:

$$\tilde{F} = DX = \sum\_{m=0}^{N-1} a\_m (X\_B[m]) X\_B[m] \mathbf{g}\_m \,. \tag{10}$$



Here, $a_m$ is the thresholding function; it could be hard thresholding or soft thresholding.

#### **3.2.1 Hard thresholding**

A hard thresholding function is shown as follows (Mallat, 2009d):

$$a\_m(\mathbf{x}) = \begin{cases} 1 & \text{if } |\mathbf{x}| \ge T \\ 0 & \text{if } |\mathbf{x}| < T \end{cases}. \tag{11}$$

By substituting $a_m(x)$ into (10), we can obtain the estimator with the hard thresholding function

$$\tilde{F} = \sum_{m=0}^{N-1} \rho_T(X_B[m])\, g_m \quad \text{with} \quad \rho_T(x) = a_m(x)\, x = \begin{cases} x & \text{if } |x| \ge T \\ 0 & \text{if } |x| < T \end{cases}. \tag{12}$$

Then the risk of this thresholding is

$$r\_{\rm th}(f) = r(D, f) = \sum\_{m=0}^{N-1} E\{ |\, f\_B[m] - \rho\_T(X\_B[m])|^2 \}\,. \tag{13}$$
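A direct implementation of the hard thresholding function $\rho_T$ in (12) might look as follows (NumPy is an assumed choice; the sample values are arbitrary):

```python
import numpy as np

def hard_threshold(x, T):
    """Hard thresholding rho_T of (12): keep |x| >= T, zero the rest."""
    return np.where(np.abs(x) >= T, x, 0.0)

print(hard_threshold(np.array([-3.0, -0.5, 0.2, 1.0, 2.5]), T=1.0))
# [-3.   0.   0.   1.   2.5]
```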

#### **3.2.2 Soft thresholding**

A soft thresholding function is implemented by (Mallat, 2009d)

$$0 \le a_m(x) = \max\left(1 - \frac{T}{|x|},\, 0\right) \le 1 \tag{14}$$

The resulting estimator $\tilde{F}$ for this case can be written as in (12), with the thresholding function $\rho_T$ replaced by the soft thresholding function shown in (15):

$$\rho_T(x) = \begin{cases} x - T & \text{if } x \ge T \\ x + T & \text{if } x \le -T \\ 0 & \text{if } |x| < T \end{cases} \tag{15}$$
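The soft thresholding function in (15) can equivalently be written as $\operatorname{sign}(x)\max(|x| - T,\, 0)$, which gives a one-line implementation (a sketch in NumPy with arbitrary sample values):

```python
import numpy as np

def soft_threshold(x, T):
    """Soft thresholding rho_T of (15): shrink magnitudes toward zero by T."""
    return np.sign(x) * np.maximum(np.abs(x) - T, 0.0)

print(soft_threshold(np.array([-3.0, -0.5, 0.2, 1.0, 2.5]), T=1.0))
# [-2.  -0.   0.   0.   1.5]
```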

Reducing the magnitude of coefficients $X_B$ that are greater than the threshold usually makes the amplitude of the estimated signal $\tilde{F}$ smaller than that of the original *F*. This is intolerable


for some applications where precise recovery is required, such as noise reduction of partial discharge signals, since the pulse magnitude and shape in such applications are needed for further analysis (Zhang et al., 2007). For other cases where precise recovery of the signal magnitude is not required, for example image noise reduction, soft thresholding is widely used since it can retain the regularity of the signal (Donoho, 1995).

The $\rho_T(x)$ of hard thresholding and soft thresholding are portrayed in Fig. 1.

Fig. 1. Thresholding function $\rho_T(x)$: (a) original signal, (b) hard thresholding, (c) soft thresholding
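The shape of Fig. 1 can be reproduced with a few lines of plotting code; the threshold value $T = 1$ and the plotting range are arbitrary assumptions:

```python
import numpy as np
import matplotlib.pyplot as plt

# Identity, hard and soft thresholding curves, mirroring the panels of Fig. 1.
x = np.linspace(-3, 3, 601)
T = 1.0
hard = np.where(np.abs(x) >= T, x, 0.0)
soft = np.sign(x) * np.maximum(np.abs(x) - T, 0.0)

fig, axes = plt.subplots(1, 3, figsize=(9, 3), sharey=True)
for ax, y, title in zip(axes, (x, hard, soft),
                        ("(a) original", "(b) hard", "(c) soft")):
    ax.plot(x, y)
    ax.set_title(title)
plt.show()
```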

#### **3.3 Threshold estimation and its improvement**

As depicted in (13), the risk of thresholding is closely related to the thresholding function $\rho_T$. The appropriate choice of the threshold *T* is an important factor in reducing the risk of estimation. Several well-known thresholds have been proposed and proved by different estimation methods.

#### **3.3.1 Universal threshold**

Donoho and Johnstone (Donoho & Johnstone, 1994) proposed a universal threshold $T = \sigma\sqrt{2\log_e N}$. They proved that the risk of thresholding, no matter hard or soft, is small enough to satisfy the requirements of most applications.

If the thresholding function $\rho_\lambda(x) = \max(|x| - \lambda,\, 0)\,\operatorname{sign}(x)$ is a soft thresholding, then for a Gaussian random variable *X* of mean $\mu$ and variance 1, the estimation risk is

$$r(\lambda, \mu) = E\{|\rho_\lambda(X) - \mu|^2\}. \tag{16}$$

If *X* has a variance $\sigma^2$ and a mean $\mu$, then the following formula can be verified by considering $X/\sigma$:

$$E\{\|\rho\_{\lambda}(X) - \mu\|^2\} = \sigma^2 r(\frac{\lambda}{\sigma}, \frac{\mu}{\sigma})\,. \tag{17}$$

Since the projection $f_B[m]$ in the basis $B = \{g_m\}_{0 \le m < N}$ is a constant, $X_B[m]$ is a Gaussian random variable with mean $f_B[m]$ and variance $\sigma^2$. The risk of estimation by soft thresholding with a threshold *T* is

$$\begin{split} r\_{th}(f) &= \sigma^2 \sum\_{m=0}^{N-1} r(\frac{T}{\sigma}, \frac{f\_B[m]}{\sigma}) \\ &\leq N\sigma^2 r(\frac{T}{\sigma}, 0) + \sigma^2 \sum\_{m=0}^{N-1} \min(\frac{T^2 + \sigma^2}{\sigma^2}, \frac{|f\_B[m]|^2}{\sigma^2}) \end{split} \tag{18}$$

Donoho and Johnstone proved that for $T = \sigma\sqrt{2\log_e N}$ and $N \ge 4$, the upper bounds of the two parts of the risk in (18) are (Donoho & Johnstone, 1994)

$$N\sigma^2\, r(\frac{T}{\sigma},\, 0) \le (2\log_e N + 1)\,\sigma^2, \tag{19}$$

and



$$\sigma^2 \sum_{m=0}^{N-1} \min(\frac{T^2 + \sigma^2}{\sigma^2}, \frac{|f_B[m]|^2}{\sigma^2}) \le (2\log_e N + 1) \sum_{m=0}^{N-1} \min(\sigma^2, |f_B[m]|^2). \tag{20}$$

Then, the risk of the estimator with threshold $T = \sigma\sqrt{2\log_e N}$ and all $N \ge 4$ is

$$r_{th}(f) \le (2\log_e N + 1)(\sigma^2 + \sum_{m=0}^{N-1} \min(\sigma^2, |f_B[m]|^2)) \tag{21}$$

Donoho and Johnstone also mentioned that the universal threshold is optimal in certain cases defined in (Donoho & Johnstone, 1994).
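A minimal sketch of universal-threshold denoising, assuming a known $\sigma$ and using an orthonormal DCT as the basis *B* (the choice of basis and the test signal are illustrative assumptions, not prescribed by the text):

```python
import numpy as np
from scipy.fft import dct, idct

rng = np.random.default_rng(3)
N, sigma = 1024, 0.3
n = np.arange(N)
f = np.cos(2 * np.pi * n / 64)                      # sparse in the DCT basis
X = f + sigma * rng.standard_normal(N)              # eq. (2)

T = sigma * np.sqrt(2 * np.log(N))                  # universal threshold
XB = dct(X, norm="ortho")                           # coefficients X_B[m]
FB = np.sign(XB) * np.maximum(np.abs(XB) - T, 0.0)  # soft thresholding (15)
F_est = idct(FB, norm="ortho")

print(np.sum((f - X) ** 2), np.sum((f - F_est) ** 2))  # error before / after
```

Because the test signal concentrates its energy on a few DCT coefficients, almost all noise coefficients fall below *T* and are removed, which is exactly the sparsity assumption behind the risk bound (21).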

#### **3.3.2 SURE threshold**

The thresholding risk is often reduced by decreasing the value of the threshold, for instance choosing a threshold smaller than the universal threshold in section 3.3.1. The SURE threshold was proposed by Stein (Stein, 1981) to suit this purpose.

As depicted in (Mallat, 2009d), when $|X_B[m]| < T$, the coefficient is set to zero by soft thresholding, and the risk of estimation equals $|f_B[m]|^2$. Since $E\{|X_B[m]|^2\} = |f_B[m]|^2 + \sigma^2$, $|f_B[m]|^2$ can be estimated by $|X_B[m]|^2 - \sigma^2$. But if $|X_B[m]| \ge T$, the soft thresholding subtracts *T* from $|X_B[m]|$; the risk is then the sum of the noise energy and the bias introduced by reducing the amplitude of $X_B[m]$ by *T*. The resulting estimator of $r_{th}(f)$ is

$$\text{Sure}(X, T) = \sum\_{m=0}^{N-1} \text{C}(X\_B[m])\,\,\,\,\tag{22}$$

with

$$C(u) = \begin{cases} u^2 - \sigma^2 & \text{if } |u| \le T \\ \sigma^2 + T^2 & \text{if } |u| > T \end{cases} \tag{23}$$


The $Sure(X, T)$ is called the Stein unbiased risk estimator (SURE) and was proved unbiased in (Donoho & Johnstone, 1995). Consider using this estimator of the risk to select a threshold:

$$\tilde{T} = \underset{T}{\arg\min}\; Sure(X, T) \tag{24}$$

Arguing heuristically, one expects that, for a large dimension *N*, a sort of statistical regularity will set in: the Law of Large Numbers will ensure that SURE is close to the true risk and that $\tilde{T}$ will be almost the optimal threshold for the case at hand (Donoho & Johnstone, 1995).

Although the SURE threshold is unbiased, its variance will induce errors when the signal energy is smaller than the noise energy. In this case, the universal threshold must be imposed to remove all the noise. Since $E\{\|X\|^2\} = \|f\|^2 + N\sigma^2$, $\|f\|^2$ can be estimated by $\|X\|^2 - N\sigma^2$ and compared with a minimum energy level $\varepsilon_N = \sigma^2 N^{1/2} (\log_e N)^{3/2}$. Then the SURE threshold is (Mallat, 2009d)

$$T = \begin{cases} \sigma\sqrt{2\log_e N} & \text{if } \|X\|^2 - N\sigma^2 \le \varepsilon_N \\ \tilde{T} & \text{if } \|X\|^2 - N\sigma^2 > \varepsilon_N \end{cases} \tag{25}$$
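A sketch of SURE threshold selection per (22)–(25): it evaluates $Sure(X, T)$ at the sorted coefficient magnitudes, takes the minimizer (24), and falls back to the universal threshold per (25). The cumulative-sum formulation is an implementation convenience, and $\sigma$ is assumed known:

```python
import numpy as np

def sure_threshold(XB, sigma):
    """Select a threshold by minimizing Sure(X, T), with the fallback (25)."""
    N = XB.size
    a = np.sort(np.abs(XB))                     # candidate thresholds T = a[k-1]
    cum = np.cumsum(a ** 2)
    ks = np.arange(1, N + 1)
    # For T = a[k-1]: the k smallest coefficients satisfy |X_B| <= T and
    # contribute |X_B|^2 - sigma^2; the rest contribute sigma^2 + T^2, per (23).
    sure = cum - ks * sigma ** 2 + (N - ks) * (sigma ** 2 + a ** 2)
    T_sure = a[np.argmin(sure)]                 # eq. (24)
    # Hybrid rule (25): use the universal threshold when the estimated
    # signal energy is below eps_N = sigma^2 N^(1/2) (ln N)^(3/2).
    eps_N = sigma ** 2 * np.sqrt(N) * np.log(N) ** 1.5
    if np.sum(XB ** 2) - N * sigma ** 2 <= eps_N:
        return sigma * np.sqrt(2 * np.log(N))
    return T_sure
```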

#### **3.3.3 Minimax threshold**

The inequality in (21) shows that the risk can be represented as the product of a constant $2\log_e N + 1$ and the loss for estimation. However, it is natural and more revealing to look for 'optimal' thresholds which yield the smallest possible constant $\Lambda$ in place of $2\log_e N + 1$. Thus, the inequality in (21) can be rewritten as

$$r_{th}(f) \le \Lambda(\sigma^2 + \sum_{m=0}^{N-1} \min(\sigma^2, |f_B[m]|^2)). \tag{26}$$

The minimax estimation introduced in section 2.2 is a possible method to find the appropriate constant $\Lambda$ that satisfies $\Lambda \le 2\log_e N + 1$, with a threshold $T \le \sigma\sqrt{2\log_e N}$.

Donoho and Johnstone (Donoho & Johnstone, 1994) defined the minimax quantities

$$\Lambda \equiv \inf_{\lambda} \sup_{\mu} \frac{r(\lambda, \mu)}{N^{-1} + \min(\mu^2, 1)}, \quad \text{and } T \text{ the largest } \lambda \text{ attaining } \Lambda \text{ above.} \tag{27}$$

They also proved that the ratio in (27) attains its maximum $\Lambda_0$ at $\mu = 0$ or as $\mu \to \infty$; *T* is then the largest $\lambda$ attaining $\Lambda_0$. Since $r(\lambda, \mu)$ is strictly increasing in $\mu$ and $r(\lambda, 0)$ is strictly decreasing in $\lambda$, the solution of (27) satisfies

$$(N + 1)\, r(T, 0) = r(T, \infty). \tag{28}$$

Then with this solution, the minimax threshold *T* is

$$T \le \sqrt{2\log_e N}, \quad \text{with} \quad T^2 = 2\log_e(N+1) - 4\log_e(\log_e(N+1)) - \log_e 2\pi + o(1). \tag{29}$$
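The asymptotic formula in (29) is easy to evaluate; the comparison against the universal value $\sqrt{2\log_e N}$ below uses arbitrary sample sizes (and takes $\sigma = 1$):

```python
import numpy as np

def minimax_threshold(N):
    """Asymptotic approximation (29) of the minimax threshold, sigma = 1."""
    T2 = (2 * np.log(N + 1)
          - 4 * np.log(np.log(N + 1))
          - np.log(2 * np.pi))
    return np.sqrt(T2)

for N in (256, 1024, 4096):
    print(N, minimax_threshold(N), np.sqrt(2 * np.log(N)))  # minimax < universal
```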

Usually, for the same *N*, the risk of the universal threshold is larger than that of the SURE threshold and the minimax threshold. All three thresholds mentioned in sections 3.3.1 to 3.3.3 are applied to denoise the same noisy data and are evaluated by the signal-to-noise ratio (SNR), which is measured in decibels:

$$SNR_{dB} = 10 \log_{10}\left(\frac{E\{\|F\|^2\}}{E\{\|F - \tilde{F}\|^2\}}\right), \tag{30}$$




where *F* is the original data without noise and $\tilde{F}$ is the estimation of *F*.
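In practice the expectations in (30) are replaced by the empirical energies of a single realization, e.g.:

```python
import numpy as np

def snr_db(F, F_est):
    """SNR in decibels as in (30), with empirical energies."""
    return 10 * np.log10(np.sum(F ** 2) / np.sum((F - F_est) ** 2))
```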

Fig.2 shows the estimation of a synthesized signal with different thresholds. The noisy data is decomposed in a biorthogonal basis. Since hard thresholding is adopted, setting a wavelet coefficient to zero will induce oscillations near discontinuities in estimation. The estimation with universal threshold in Fig.2(c) shows small oscillations at the smooth parts.

Fig. 2. Estimation with different thresholds. (a) original data, (b) noisy data (SNR=23.59dB), (c) estimation with universal threshold (SNR=31.98dB), (d) estimation with SURE threshold (SNR=34.82dB), (e) estimation with minimax threshold (SNR=33.63dB)


The oscillations result in a smaller SNR (31.98dB). The oscillations are less obvious in the estimations in Fig.2(d) and Fig.2(e), but noise with very small magnitude is still found. As mentioned before, the universal threshold is usually larger than the other two thresholds. Its risk of estimation $r = E\{\|F - \tilde{F}\|^2\}$ is therefore greater than that of the other two. This can be deduced from the values of SNR.

#### **4. Wavelet thresholding**

The signals carry a large amount of useful information which is difficult to find. The discovery of orthogonal bases and local time-frequency analysis opens the door to the world of sparse representation of signals. An orthogonal basis is a dictionary of minimum size that can yield a sparse representation if designed to concentrate the signal energy over a set of few vectors (Mallat, 2009a). The smaller number of wavelet coefficients reveals the information of the signal we are looking for. The generation of those vectors is an approximation of the original signal by a linear combination of wavelets. For all *f* in $L^2(\mathbb{R})$,

$$P_{j-1}f = P_j f + \sum_k \langle f, \psi_{j,k} \rangle\, \psi_{j,k}, \tag{31}$$

where $\langle f, \psi_{j,k} \rangle$ stands for the inner product of *f* and $\psi_{j,k}$, and $P_j$ is the orthogonal projection onto $V_j$. In orthogonal decomposition, $V_j$ is the subspace which satisfies

$$\cdots \subset V_2 \subset V_1 \subset V_0 \subset V_{-1} \subset V_{-2} \subset \cdots, \qquad \overline{\bigcup_j V_j} = L^2(\mathbb{R}), \qquad \bigcap_j V_j = \{0\}$$

(Daubechies, 1992).

Thresholding the wavelet coefficients keeps the local regularity of the signal. Usually, wavelet thresholding includes three steps, with a code sketch after this list (Shim et al., 2001; Zhang et al., 2007):

1. Decomposition. A filter bank of conjugate mirror filters decomposes the discrete signal in a discrete orthogonal basis. The wavelet functions $\psi_{j,k}[n]$ and scale functions $\phi_{J,k}[n]$ both belong to the orthogonal basis $B = [\{\psi_{j,k}[n]\}_{L < j \le J,\, 0 \le k < 2^{-j}},\ \{\phi_{J,k}[n]\}_{0 \le k < 2^{-J}}]$. The scale parameter $2^j$ varies from $2^{L+1} = 2N^{-1}$ up to $2^J < 1$, where *N* is the sampling rate of the signal *X*.
2. Thresholding. After decomposition, the threshold is selected. A thresholding estimator in the basis *B* is written as

$$\tilde{F} = \sum_{j=L+1}^{J} \sum_{k=0}^{2^{-j}-1} \rho_T(\langle X, \psi_{j,k} \rangle)\, \psi_{j,k} + \sum_{k=0}^{2^{-J}-1} \rho_T(\langle X, \phi_{J,k} \rangle)\, \phi_{J,k}, \tag{32}$$

where $\rho_T$ is either a hard threshold or a soft threshold. Normally, the selected threshold is applied to all coefficients except those containing the lowest-frequency energy $\langle X, \phi_{J,k} \rangle$. This aims to keep the regularity of the reconstructed signal.

The difference between keeping and not keeping the lowest-frequency approximate coefficients is illustrated by Fig.3. Universal threshold with hard thresholding is used in the estimation. The original data in Fig.3(a) has a wide frequency range: it contains both a low frequency regular component and high frequency irregular components. Fig.3(c) shows when the lowest-frequency part is kept,
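Following the steps above, here is a minimal sketch using NumPy and PyWavelets. The library choice, the db4 wavelet, the decomposition depth, the synthetic test signal, and the median-based noise estimate $\hat{\sigma} = \operatorname{median}(|d_1|)/0.6745$ are illustrative assumptions not stated in the text, and reconstruction is taken as the final step:

```python
import numpy as np
import pywt

rng = np.random.default_rng(4)
N, sigma = 1024, 0.2
n = np.arange(N)
f = np.sin(2 * np.pi * n / 128) + (n > N // 2)      # test signal (assumption)
X = f + sigma * rng.standard_normal(N)              # noisy output, eq. (2)

# 1. Decomposition with a conjugate-mirror filter bank.
coeffs = pywt.wavedec(X, "db4", level=5)

# 2. Thresholding every detail band; the approximation coefficients
#    coeffs[0] (lowest-frequency energy <X, phi_{J,k}>) are kept untouched.
sigma_hat = np.median(np.abs(coeffs[-1])) / 0.6745  # assumed noise estimate
T = sigma_hat * np.sqrt(2 * np.log(N))              # universal threshold
coeffs[1:] = [pywt.threshold(c, T, mode="hard") for c in coeffs[1:]]

# 3. Reconstruction from the thresholded coefficients.
F_est = pywt.waverec(coeffs, "db4")[:N]
print(10 * np.log10(np.sum(f**2) / np.sum((f - F_est)**2)))  # SNR in dB
```

Leaving `coeffs[0]` untouched corresponds to not thresholding the $\langle X, \phi_{J,k} \rangle$ coefficients, which, as noted above, helps preserve the regularity of the reconstructed signal.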
