Image Restoration and Noise Reduction with Context-Dependent Wavelet Graph and ADMM Optimization

*Jinn Ho, Shih-Shuo Tung and Wen-Liang Hwang*

## **Abstract**

We represent the image noise reduction and restoration problems as context-dependent graphs and propose algorithms to derive the optimal graphs by the alternating direction method of multipliers (ADMM). An image is spatially decomposed into smooth regions and singular regions, the latter consisting of edges and textures. The graph representing a smooth region is defined in the image domain, while that representing a singular region is defined in the wavelet domain. The optimal graphs are formulated as the solutions of constrained optimization problems over sparse graphs, where the sparseness is imposed on the edges. The graphs in the wavelet domain are solved in a hierarchical layer structure. The convergence and complexity of the algorithms are studied. Simulation experiments demonstrate that the results of our algorithms are superior to those of state-of-the-art algorithms for image noise reduction and restoration.

**Keywords:** image restoration, image denoising, graph, ADMM, wavelet

## **1. Introduction**

We consider the inverse problem of deriving the original image *x*∈ R*<sup>N</sup>* from an observation *y*∈ R*<sup>N</sup>*, expressed as

$$
\mathbf{y} = H\mathbf{x} + \mathbf{n},
\tag{1}
$$

where *H* is an *N* × *N* matrix and **n** ∼ N(**0**<sub>*N*×1</sub>, *σ*<sup>2</sup>*I*<sub>*N*×*N*</sub>) is a vector of independent and identically distributed (i.i.d.) Gaussian random variables with standard deviation *σ*. We further assume that the point spread function *H* is known. Prior information is imposed on the desired image, giving the estimate

$$
\hat{\mathfrak{x}} = \arg\min\_{\mathfrak{x}} \frac{1}{\sigma^2} \|\mathfrak{y} - H\mathfrak{x}\|^2 + \text{penalty}(\mathfrak{x}),
\tag{2}
$$

where the first term is the data fidelity term for the Gaussian observation model and the second term is the regularization term, penalizing solutions that deviate from the prior knowledge of the desired image. The modeling of the desired image is at the core of the approach [1–4]. The primary challenge of solving the problem is to recover the local high-frequency information of edges and texture in the original image that is not present in the observation.
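When the penalty in Eq. (2) is quadratic, the estimate has a closed form, which is useful for building intuition. The sketch below is a minimal illustration assuming a Tikhonov penalty τ∥x∥² in place of the graph-based penalty developed in this chapter; the function name `map_estimate`, `tau`, and all parameter values are illustrative.

```python
import numpy as np

def map_estimate(y, H, sigma, tau):
    # Closed-form minimizer of (1/sigma^2)||y - Hx||^2 + tau*||x||^2:
    #   (H^T H / sigma^2 + tau*I) x = H^T y / sigma^2
    N = H.shape[1]
    A = H.T @ H / sigma**2 + tau * np.eye(N)
    b = H.T @ y / sigma**2
    return np.linalg.solve(A, b)

rng = np.random.default_rng(0)
N, sigma, tau = 16, 0.1, 1.0
H = np.eye(N)                               # H = I: the noise reduction setting
x = rng.standard_normal(N)
y = H @ x + sigma * rng.standard_normal(N)  # observation model of Eq. (1)
x_hat = map_estimate(y, H, sigma, tau)      # shrinks y toward zero
```

With *H* = *I* this reduces to a simple shrinkage of the observation; the graph-based penalties introduced later replace the quadratic term to preserve edges and texture.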

If *H* is the identity matrix, problem (1) is called the noise reduction problem. The solutions vary with the type of noise contaminating the observation [5]. If the noise is white Gaussian noise, the state-of-the-art algorithms for the problem are BM3D [6], WBN [7], and DRUNet [8]. BM3D utilizes a tight frame representation of an image, where the atoms of the frame are derived from image patches. WBN is a graphical probabilistic model over a weighted directed acyclic graph (DAG) in the wavelet domain. Different from BM3D and WBN, DRUNet is a deep learning method. It is a flexible and powerful deep CNN denoiser whose architecture combines U-Net [9] and ResNet [10]. It not only outperforms the state-of-the-art deep Gaussian denoising models but is also well suited to plug-and-play image restoration.

If *H* is a singular blurring matrix, problem (1) is called the image restoration problem. Among optimization-based methods, the best image restoration performance, both subjectively and objectively, was achieved by the algorithm IDD-BM3D [11]. It utilizes sparse synthesis and analysis models and decouples the problem into blur inversion and noise reduction sub-problems, each of which is solved by a variational optimization approach. In deep learning, DPIR [8] replaces the denoising sub-problem of model-based optimization with a learning-based CNN denoiser prior, namely DRUNet; the image is restored by iteratively solving a data sub-problem and a prior sub-problem.

In this chapter, we also present a restoration algorithm that combines the noise reduction algorithm with the proximal point method [12]. The primary technical contributions of our methods are the context-dependent graphical representations and the algorithms to derive the optimal graphs of each representation. Finding the optimal graph in a combinatorial way is extremely difficult and likely an NP-hard problem [13, 14]. Unlike the combinatorial approach, we impose constraints on edges and include edges in the optimal graph only when the constraints on the edges are active. This renders a computationally solvable optimization problem and the solution is a graph with only a small number of active edges.

Based on local content in an image, the context-dependent representation divides the image into singular and smooth areas. Singular areas, consisting of edges or texture, are represented and processed differently from the smooth areas. The graphs of singular areas are constructed based on the persistence and sparsity of wavelet coefficients of the image. The persistence is imposed on the inter-scale edges so that the solution at one scale can be used to confine that in adjacent scales. Meanwhile, the sparsity is imposed on the intra-scale edges that preserve the edges in which end nodes have similar intensity. In contrast, a graph of a smooth area is in the image domain and has only sparse intra-scale edges.

The algorithm to derive the optimal graphs, called graphical ADMM, is based on the alternating direction method of multipliers (ADMM) method [15, 16]. It is an efficient and robust algorithm since it breaks a complicated problem into smaller pieces, each of which is easier to handle. In our case, the node update is separated from the edge update in the optimization. In addition, for wavelet graphs, graphical ADMM approximates the multi-scale optimization problem into a sequence of sub-problems; each can be efficiently solved by convex optimization methods.

*Image Restoration and Noise Reduction with Context-Dependent Wavelet Graph and ADMM… DOI: http://dx.doi.org/10.5772/intechopen.102681*

The chapter is organized as follows. In Section 2, we present the models and the construction of the context-dependent graphs. In Section 3, we formulate the noise reduction problem as a graph optimization model and present the graphical ADMM method to derive optimal graphs. In Section 4, the image restoration problem is formulated via the proximal point method, which reduces the problem to a sequence of noise reduction problems, each solved by the method in Section 3. Section 5 presents experimental results and discusses the principal differences between our methods and the compared methods. Section 6 contains concluding remarks.

### **2. Context-dependent graphical models**

An image is comprised of edges, texture, and smooth areas. A common approach to obtain a good image processing result is to treat different features with different methods [17, 18]. Following this approach, an image is partitioned into two types of blocks. A block containing an edge point or texture is a singular block, while the others are smooth blocks. To keep the flow, we defer the description of the partitioning method to part A of Section 5; accurate partitioning is not necessary to achieve the performance demonstrated in this chapter. The singular and smooth blocks are handled with different graph optimization approaches: a singular block is processed in the wavelet domain, while a smooth block is processed in the image domain. In the wavelet domain, a singular block is represented by several weighted graphs, one for each orientation. If the wavelet transform has three orientations, LH, HL, and HH, then one graph is for the LH sub-bands, another for the HL sub-bands, and the third for the HH sub-bands. The graph for one orientation is constructed as follows.

Each wavelet coefficient is associated with a node. Edges are comprised of interscale and intra-scale edges. An inter-scale edge connecting nodes in adjacent scales can direct either from a coarse scale to a finer scale or vice versa. The inter-scale edges are built-in and data-independent; they are constructed based on the wavelet persistence. In contrast, an intra-scale edge connecting nodes of the same scale is un-directed, data-dependent, and determined based on the sparseness from the graph optimization algorithm. Regularizations have been imposed on inter-scale edges to preserve the persistence of wavelet coefficients across scales and on intra-scale edges to preserve the similarity of wavelet coefficients on nodes at the two ends of an edge.

#### **2.1 Inter-scale edges**

Since wavelets can characterize singularities, representing singularities with wavelets can facilitate the restoration of edges and texture in an image. The persistence property of wavelets means that wavelet coefficients exhibit dependency and correlation across scales. Thus, inter-scale edges are constructed to link the wavelet coefficients of the same orientation and location at adjacent scales. Moreover, the correlation of wavelet coefficients from a coarser scale to a finer scale differs from that from a finer scale to a coarser scale. There are thus two types of inter-scale edges: coarse-to-fine and fine-to-coarse. The coarse-to-fine inter-scale correlation is derived from the statistical result of Simoncelli [19], who analyzed the correlation between the dyadic wavelet coefficients at a coarse scale and those at the same location and orientation at the immediately finer scale in a natural image. The coarse-to-fine inter-scale correlation between the wavelet coefficient *w<sub>pi</sub>* at a coarse scale and the wavelet coefficient *w<sub>i</sub>* at the immediately finer scale can be represented in terms of minus log-probability as

$$\frac{k\_1 w\_i^2}{w\_{pi}^2},\tag{3}$$

where *k*<sub>1</sub> is a parameter. Thus, given the wavelet coefficient *w<sub>pi</sub>* at the coarse scale, Eq. (3) gives the minus log-probability of the wavelet coefficient *w<sub>i</sub>* at the same location and orientation at the immediately finer scale.

On the other hand, the fine-to-coarse inter-scale correlation is derived from the theoretical result of wavelet singularity, analyzed by Mallat and Hwang [20]. Let *wi* and *wci* be wavelet coefficients corresponding to the same singularity at different scales. Then, *wi* and *wci* have the same sign and the correlation, in terms of the ratio of the modulus of wavelet coefficients, from *wci* at a fine scale to *wi* at the immediate coarser scale can be expressed as

$$\frac{|w\_i|}{|w\_{ci}|} = \frac{w\_i}{w\_{ci}} = 2^{\alpha + \frac{1}{2}},\tag{4}$$

where *α* is the Lipschitz exponent of the singularity. If the singularity is a step edge, then *α* is 0. The exponent *α* + 1/2 in Eq. (4) depends on how the wavelet is normalized. Here, the wavelet is normalized to have unit 2-norm.<sup>1</sup> Eq. (4) can also be expressed in terms of minus log-probability as

$$k\_2 \left( w\_i - 2^{\alpha + \frac{1}{2}} w\_{ci} \right)^2,\tag{5}$$

where *k*<sub>2</sub> is a parameter. If the type *α* of the singularity is known, then given the wavelet coefficient *w<sub>ci</sub>* at the finer scale, Eq. (5) gives the minus log-probability of the wavelet coefficient *w<sub>i</sub>* at the coarse scale. Since step edges are the most salient features to be recovered from an image, in this chapter we set *α* to 0.
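The two inter-scale penalties can be sketched directly from Eqs. (3) and (5). The minimal Python sketch below assumes *α* = 0 (step edges) and illustrative values for *k*<sub>1</sub>, *k*<sub>2</sub>, and the coefficients; the function names are ours, not the chapter's.

```python
import numpy as np

def coarse_to_fine_penalty(w_i, w_pi, k1=1.0):
    # Eq. (3): minus log-probability of fine-scale w_i given its parent w_pi.
    return k1 * w_i**2 / w_pi**2

def fine_to_coarse_penalty(w_i, w_ci, k2=1.0, alpha=0.0):
    # Eq. (5): minus log-probability of coarse-scale w_i given its child w_ci.
    return k2 * (w_i - 2.0**(alpha + 0.5) * w_ci)**2

# For a step edge (alpha = 0), Eq. (4) predicts w_i = sqrt(2) * w_ci,
# at which the fine-to-coarse penalty of Eq. (5) vanishes.
w_ci = 0.8
w_i = np.sqrt(2.0) * w_ci
```

When the coefficient pair follows the decay law of Eq. (4) exactly, the fine-to-coarse penalty is zero, so the penalties measure deviation from wavelet persistence.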

#### **2.2 Intra-scale edges**

A coherent or similar structure can be used to leverage the quality of the restoration [2, 21]. This is the principle behind the success of BM3D and the example-based approach in image processing [22]. Many similarity metrics have been proposed to derive the coherent structure, such as mutual information, kernel functions, and the Pearson correlation coefficient. In this chapter, the Pearson correlation coefficient is modified, for a technical concern, to derive the intra-scale correlation of random variables *X* and *Y*:

$$d(X,Y) = \max\left\{0, \frac{\mathbf{E}\{(X-\mu\_X)(Y-\mu\_Y)\}}{\sigma\_X \sigma\_Y}\right\} + q,\tag{6}$$

where (*μ<sub>X</sub>*, *σ<sub>X</sub>*) and (*μ<sub>Y</sub>*, *σ<sub>Y</sub>*) are the mean and the standard deviation of *X* and *Y*, respectively, and *q* > 0 is an offset, introduced to avoid *d*(*X*, *Y*) = 0 in the inequality constraints in Eq. (8). The value of *d*(*X*, *Y*) lies in [*q*, 1 + *q*], measuring the similarity of

<sup>1</sup> If the wavelet is normalized to have unit 1-norm, then the exponent of Eq. (4) should be *α*.


#### **Figure 1.**

*An illustrative example of how the metric d<sub>h,l</sub> is measured. There are 12 blocks in a sub-band. All the intra-scale edges between the h-th and the l-th nodes are collected. The coefficients on the h-th nodes form the random variable X and those on the l-th nodes form the random variable Y. d<sub>h,l</sub> is then measured based on Eq. (6).*

*X* and *Y*. The smaller *d*(*X*, *Y*) is, the more independent *X* and *Y* are and the less likely they have a coherent structure. As shown in **Figure 1**, the coherent structure in an image is measured on all the coefficients of intra-scale edges whose endpoints take the same locations within a block in a sub-band.

The intra-scale edges are determined based on a sparsity constraint aiming to preserve the edges whose end nodes have similar values. The number of edges is controlled by the parameter *r* in the constraint:

$$d\left(y\_h, y\_l\right) |\mathbf{x}\_h - \mathbf{x}\_l| \le r,\tag{7}$$

where *d*(*y<sub>h</sub>*, *y<sub>l</sub>*) is defined in Eq. (6) and obtained from the observation image, and *x<sub>h</sub>* and *x<sub>l</sub>* are the coefficients at the *h*-th and *l*-th nodes, respectively. If the observed values *y<sub>h</sub>* and *y<sub>l</sub>* are similar, the value of *d*(*y<sub>h</sub>*, *y<sub>l</sub>*) is large, and ∣*x<sub>h</sub>* − *x<sub>l</sub>*∣ must then be small to satisfy the constraint. This preserves the similarity of intensities between *x<sub>h</sub>* and *x<sub>l</sub>*. Only the edges satisfying Eq. (7) are retained in the optimal graph. In the following, *d<sub>h,l</sub>* is used to abbreviate *d*(*y<sub>h</sub>*, *y<sub>l</sub>*) in Eq. (7).
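The metric of Eq. (6) and the retention test of Eq. (7) can be sketched as follows. The function names `d_metric` and `edge_satisfied`, the offset *q* = 0.05, and the threshold *r* are illustrative choices, not values from the chapter.

```python
import numpy as np

def d_metric(X, Y, q=0.05):
    # Eq. (6): Pearson correlation clipped at 0 plus offset q; range [q, 1 + q].
    rho = np.mean((X - X.mean()) * (Y - Y.mean())) / (X.std() * Y.std())
    return max(0.0, rho) + q

def edge_satisfied(d_hl, x_h, x_l, r):
    # Eq. (7): the intra-scale edge is kept only if d_{h,l} * |x_h - x_l| <= r.
    return d_hl * abs(x_h - x_l) <= r

X = np.array([1.0, 2.0, 3.0, 4.0])
d_same = d_metric(X, X)    # perfectly coherent: 1 + q
d_anti = d_metric(X, -X)   # anti-correlated: clipped to the offset q
```

A coherent pair (large *d*) forces the coefficient difference to be small, while an incoherent pair (small *d*, near *q*) leaves the difference almost unconstrained.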

#### **2.3 Graph construction**

The aforementioned components are integrated and summarized into our context-dependent representation of an image. An image is divided into blocks. Each block is classified as either a singular block or a smooth block. A singular block is then represented with the dyadic wavelet transform, where the scales follow a geometric sequence of ratio 2 and the spatial domain is not down-sampled. The dyadic wavelet transform of an image is comprised of four sub-bands, LL, LH, HL, and HH, with the last three being the orientation sub-bands. A singular block is associated with

#### **Figure 2.**

*A block of four pixels can be a smooth block (top) or a singular block (bottom). A smooth block is processed in the image domain. A singular block is in the wavelet domain, where a multi-scale graph is associated with each orientation. The blue and green are built-in directed inter-scale edges and the red are un-directed intra-scale edges. The inter-scale edge connects nodes at the same locations and orientation but at scales next to each other. The green edge is coarse-to-fine, linking a node to its parent, while the blue edge is fine-to-coarse, linking a node to its child. The intra-scale edges are determined by graphical ADMM, which decomposes the node update and edge update in the optimization.*

three graphs, one for each orientation sub-band. Since smooth blocks can be well restored in the image domain, the wavelet transform is not applied to them and a smooth block is associated with a graph in the image domain. Each graph, associated with a singular or a smooth block, is constructed independently of the other graphs. **Figure 2** illustrates an example of the graph representation for a block of four pixels.

### **3. Optimal graphs for noise reduction**

The noise reduction problem corresponds to problem (1), where *H* is the identity matrix. Since each graph is solved independently of the other graphs, the following discussion is focused on one graph. A graph can be associated with a singular block or a smooth block.

#### **3.1 Singular blocks**

Let *y<sub>i</sub>* and *x<sub>i</sub>* be the wavelet coefficients associated with the *i*-th node in a sub-band of the observation image and the original image, respectively. Its parent node is denoted

as *p*(*i*) and its child node as *c*(*i*). Let *z<sub>h,l</sub>* be the variable defined on the intra-scale edge that connects the *h*-th and *l*-th nodes, which are at the same scale and orientation. The optimal wavelet graph can be derived by solving

$$\begin{cases} \min\_{x\_i, z\_{h,l}} \sum\_i \left[ \frac{\left(y\_i - x\_i\right)^2}{2\sigma^2} + \frac{k\_1 x\_i^2}{x\_{p(i)}^2} + k\_2 \left(x\_i - \sqrt{2}x\_{c(i)}\right)^2 \right] \\\ d\_{h,l} |z\_{h,l}| \le r, \\\ z\_{h,l} = x\_h - x\_l, \end{cases} \tag{8}$$

where *r*, *σ*, *k*<sub>1</sub>, and *k*<sub>2</sub> are non-negative parameters and the constraints are designed as explained under Eq. (7), where *r* controls the number of edges in the optimal graph. If a node is at the coarsest scale 2<sup>*J*</sup>, where *J* is the number of decomposition levels of the wavelet transform, it does not have a parent node and the second term in the objective of Eq. (8) is zero; the third term is set to zero for a node at the finest scale 2<sup>1</sup> as it does not have a child node.

Problem (8) has a convenient matrix representation. Let *x* = [*x<sub>i</sub>*] and *z* = [*z<sub>h,l</sub>*] be the vectors of variables on nodes and intra-scale edges, respectively. Then, the linear constraints between *z<sub>h,l</sub>* and *x<sub>h</sub>* and *x<sub>l</sub>* in Eq. (8) can be expressed as

$$\mathbf{A} \begin{bmatrix} \mathbf{x}^T \mathbf{z}^T \end{bmatrix}^T = \mathbf{0},\tag{9}$$

where *A* is a matrix with elements −1, 1, or 0. Let *A* = [*A<sub>x</sub>* *A<sub>z</sub>*]. It follows that

$$A\left[\mathbf{x}^T\mathbf{z}^T\right]^T = A\_{\mathbf{x}}\mathbf{x} + A\_{\mathbf{z}}\mathbf{z}.\tag{10}$$

Each row of *A<sub>x</sub>* has one element of value 1, another element of value −1, and the rest of value 0. Meanwhile, each row of *A<sub>z</sub>* has one element of value −1 and the rest of value 0. Let *λ* = [*λ<sub>h,l</sub>*] ≥ **0**<sup>2</sup> and *μ* be the vectors of Lagrangian variables associated with the inequality constraints and the equality constraints in Eq. (8), respectively. The augmented Lagrangian of Eq. (8) is

$$\begin{split} L\_{\rho}(\mathbf{x}, \mathbf{z}, \boldsymbol{\lambda}, \boldsymbol{\mu}) &= \sum\_{i} \left[ \frac{\left(\mathbf{y}\_{i} - \mathbf{x}\_{i}\right)^{2}}{2\sigma^{2}} + \frac{k\_{1}\mathbf{x}\_{i}^{2}}{\mathbf{x}\_{p(i)}^{2}} + k\_{2} \left(\mathbf{x}\_{i} - \sqrt{2}\mathbf{x}\_{c(i)}\right)^{2} \right] \\ + \boldsymbol{\mu}^{T}(\mathbf{A}\_{\mathbf{x}}\mathbf{x} + \mathbf{A}\_{\mathbf{z}}\mathbf{z}) &+ \frac{\rho}{2} \left\| \mathbf{A}\_{\mathbf{x}}\mathbf{x} + \mathbf{A}\_{\mathbf{z}}\mathbf{z} \right\|^{2} + \sum\_{h,l} \lambda\_{h,l} d\_{h,l} \left| \mathbf{z}\_{h,l} \right| - \mathbf{r}\mathbf{1}^{T}\boldsymbol{\lambda}, \end{split} \tag{11}$$

where **1** is a vector with all elements equal to 1, and *ρ* > 0 is a fixed parameter.
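The structure of *A<sub>x</sub>* and *A<sub>z</sub>* can be illustrated on a toy graph. The sketch below assumes *A<sub>z</sub>* = −*I*, one choice consistent with the stated ±1/0 pattern, so that *A<sub>x</sub>x* + *A<sub>z</sub>z* = **0** encodes *z<sub>h,l</sub>* = *x<sub>h</sub>* − *x<sub>l</sub>*; the graph and values are illustrative.

```python
import numpy as np

# Toy graph: 3 nodes, intra-scale edges (0,1) and (1,2).
edges = [(0, 1), (1, 2)]
n_nodes = 3

# Each row of A_x has one +1 (node h) and one -1 (node l); the rest are 0.
A_x = np.zeros((len(edges), n_nodes))
for row, (h, l) in enumerate(edges):
    A_x[row, h], A_x[row, l] = 1.0, -1.0
A_z = -np.eye(len(edges))      # assumed choice: one -1 per row

x = np.array([3.0, 1.0, 4.0])
z = A_x @ x                    # z_{h,l} = x_h - x_l on every edge
residual = A_x @ x + A_z @ z   # Eq. (10); zero when the constraints hold
```

In an actual image graph these matrices are very sparse, which is what makes the node updates of Section 3.4 cheap to solve.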

The ADMM algorithm intends to blend the decomposability of dual ascent with the superior convergence properties of the method of multipliers. Here, ADMM is used to derive the optimal graph by separating the node and edge update. Graphical ADMM derives the saddle points of

$$\max\_{\lambda \ge 0, \mu} \min\_{x, z} L\_{\rho}(x, z, \lambda, \mu) \tag{12}$$

<sup>2</sup> Let *λ* = [*λ<sub>h,l</sub>*]. Then, *λ* ≥ **0** if and only if *λ<sub>h,l</sub>* ≥ 0 for all *h* and *l*.

by iterating the following updates:

$$\begin{cases} \boldsymbol{x}^{k+1} = \arg\min\_{\boldsymbol{x}} L\_{\rho}(\boldsymbol{x}, \boldsymbol{z}^{k}, \boldsymbol{\lambda}^{k}, \boldsymbol{\mu}^{k}); \\\\ \boldsymbol{z}^{k+1} = \mathbb{Q}\Big(\arg\min\_{\boldsymbol{z}} L\_{\rho}\big(\boldsymbol{x}^{k+1}, \boldsymbol{z}, \boldsymbol{\lambda}^{k}, \boldsymbol{\mu}^{k}\big)\Big); \\\\ \lambda^{k+1}\_{h,l} = \mathbb{P}\_{\geq 0}\Big(\lambda^{k}\_{h,l} - \varepsilon\Big[r - d\_{h,l}|z^{k+1}\_{h,l}|\Big]\Big); \\\\ \boldsymbol{\mu}^{k+1} = \boldsymbol{\mu}^{k} + \rho\big(A\_{x}\boldsymbol{x}^{k+1} + A\_{z}\boldsymbol{z}^{k+1}\big); \end{cases} \tag{13}$$

where ℚ and ℙ<sub>≥0</sub> are the orthogonal projections enforcing the constraints *d<sub>h,l</sub>*∣*z<sub>h,l</sub>*∣ ≤ *r* and *λ<sub>h,l</sub>* ≥ 0, respectively; and *ε* > 0 is the step size for the dual ascent of the Lagrangian variables *λ*. The first and second updates in Eq. (13) update the node variables and edge variables, respectively. The update of the dual variables *λ* is derived from the necessary conditions at an optimum of Eq. (8):

$$\begin{cases} \lambda\_{h,l} \ge \mathbf{0};\\ \lambda\_{h,l} \left( r - d\_{h,l} |z\_{h,l}^{k+1}| \right) = \mathbf{0}. \end{cases} \tag{14}$$

The third update in Eq. (13) has the following interpretation. If (*r* − *d<sub>h,l</sub>*∣*z<sub>h,l</sub><sup>k+1</sup>*∣) > 0, then *λ<sub>h,l</sub><sup>k</sup>* decreases, with its value kept non-negative by ℙ<sub>≥0</sub>. The value of *λ<sub>h,l</sub>* can be repeatedly decreased by increasing the iteration number *k* until either *λ<sub>h,l</sub>* = 0 or (*r* − *d<sub>h,l</sub>*∣*z<sub>h,l</sub><sup>k+1</sup>*∣) = 0, in which case the edge is active and the optimality conditions (14) are satisfied. At an optimum, either the Lagrangian variable associated with an edge is zero or the constraint on the edge is active. Only the active edges are retained in the graph. Since the number of active edges is small, the edges in the optimal graph are sparse. The update rules for the primal variables are derived in Sections 3.3 and 3.4, respectively.
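The third update of Eq. (13) can be sketched in isolation. The loop below, with illustrative *ε*, *r*, *d<sub>h,l</sub>*, and a fixed edge value, shows the dual variable of a slack (inactive) edge being driven to zero, consistent with the complementary slackness conditions (14).

```python
import numpy as np

def update_lambda(lam, z_hl, d_hl, r, eps):
    # Third update of Eq. (13): dual ascent followed by projection onto >= 0.
    return max(0.0, lam - eps * (r - d_hl * abs(z_hl)))

r, d_hl, eps = 1.0, 0.5, 0.1   # illustrative parameters
lam = 1.0
z_slack = 0.2                  # d_hl * |z| = 0.1 < r: the constraint is slack

# A slack edge has positive slack r - d|z|, so lambda is repeatedly decreased
# until the projection pins it at 0 and the edge is dropped from the graph.
for _ in range(200):
    lam = update_lambda(lam, z_slack, d_hl, r, eps)
```

Conversely, an edge whose constraint is tight (*d<sub>h,l</sub>*∣*z<sub>h,l</sub>*∣ = *r*) leaves *λ<sub>h,l</sub>* unchanged, so only active edges keep a nonzero multiplier.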

#### **3.2 Smooth blocks**

A smooth block is processed in the image domain, where each pixel is associated with a node in a graph. Problem (8) becomes finding the optimal graph by solving

$$\begin{cases} \min\_{x\_{i}, z\_{h,l}} \sum\_{i} \frac{\left(y\_{i} - x\_{i}\right)^{2}}{2\sigma^{2}} \\\ d\_{h,l}|z\_{h,l}| \leq r, \\\ z\_{h,l} = x\_{h} - x\_{l}. \end{cases} \tag{15}$$

The optimal graph can be derived by a method similar to that for a singular block.

#### **3.3 Update edges**

The update rule for edge variables *z* is to solve, with fixed *x*, *λ*, and *μ*,

$$\begin{cases} \min\_{\boldsymbol{z}} L\_{\rho}(\boldsymbol{x}, \boldsymbol{z}, \boldsymbol{\lambda}, \boldsymbol{\mu}) \\ \quad d\_{h,l} |\boldsymbol{z}\_{h,l}| \leq r. \end{cases} \tag{16}$$


Retaining only the terms of Eq. (11) relevant to the optimization variables {*z<sub>h,l</sub>*}, Eq. (16) becomes

$$\begin{cases} \min \sum\_{h,l} \left( -\mu\_{h,l} z\_{h,l} + \frac{\rho}{2} (x\_h - x\_l - z\_{h,l})^2 + \lambda\_{h,l} d\_{h,l} |z\_{h,l}| \right) \\\ |z\_{h,l}| \le \frac{r}{d\_{h,l}}, \end{cases} \tag{17}$$

because *d<sub>h,l</sub>* is non-zero by the definition in Eq. (6). Eq. (17) indicates that each edge variable can be updated independently of the others.

We first solve Eq. (17) without the constraint and obtain the solution *u<sub>h,l</sub>* of

$$\arg\min\_{z\_{h,l}}\left(-\mu\_{h,l} z\_{h,l} + \frac{\rho}{2} \left(x\_h - x\_l - z\_{h,l}\right)^2 + \lambda\_{h,l} d\_{h,l} |z\_{h,l}|\right),\tag{18}$$

by soft-thresholding with

$$u\_{h,l} = \begin{cases} x\_h - x\_l + \frac{\mu\_{h,l}}{\rho} - \frac{\lambda\_{h,l}}{\rho} d\_{h,l}, & \text{if } x\_h - x\_l + \frac{\mu\_{h,l}}{\rho} \ge \frac{\lambda\_{h,l}}{\rho} d\_{h,l}; \\\ x\_h - x\_l + \frac{\mu\_{h,l}}{\rho} + \frac{\lambda\_{h,l}}{\rho} d\_{h,l}, & \text{if } x\_h - x\_l + \frac{\mu\_{h,l}}{\rho} \le -\frac{\lambda\_{h,l}}{\rho} d\_{h,l}; \\\ 0, & \text{otherwise.} \end{cases} \tag{19}$$

It is then followed by orthogonally projecting *u<sub>h,l</sub>* to satisfy the constraint by solving

$$\begin{cases} \min\_{z\_{h,l}} \frac{1}{2} \left( z\_{h,l} - u\_{h,l} \right)^2 \\ \quad |z\_{h,l}| \le \frac{r}{d\_{h,l}} . \end{cases} \tag{20}$$

Eq. (20) can be solved by a sequence of soft-thresholding operations. The algorithm is sketched as follows. First, we check whether ∣*u<sub>h,l</sub>*∣ ≤ *r*/*d<sub>h,l</sub>*. If it is, *u<sub>h,l</sub>* is the solution. Otherwise, we begin with a small *γ* and solve

$$z\_{h,l}^{+} = \arg\min\_{z\_{h,l}} \frac{1}{2} \left( z\_{h,l} - u\_{h,l} \right)^2 + \gamma |z\_{h,l}|.\tag{21}$$

The solution is given by soft-thresholding:

$$z\_{h,l}^{+} = \begin{cases} 0, & \text{if } |u\_{h,l}| \le \gamma; \\\ \left(1 - \frac{\gamma}{|u\_{h,l}|}\right) u\_{h,l}, & \text{otherwise.} \end{cases} \tag{22}$$

If ∣*z<sub>h,l</sub><sup>+</sup>*∣ ≤ *r*/*d<sub>h,l</sub>*, then *z<sub>h,l</sub>* is updated to *z<sub>h,l</sub><sup>+</sup>* and the algorithm stops. Otherwise, *γ* is increased and Eq. (21) is solved again. Since increasing *γ* decreases ∣*z<sub>h,l</sub><sup>+</sup>*∣, the algorithm always terminates and updates the edge variable *z<sub>h,l</sub>* to meet the constraint.
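The complete edge update of this subsection, Eq. (19) followed by the projection of Eqs. (20)-(22), can be sketched as below. The starting *γ*, its growth factor, and all input values are illustrative assumptions; the exact schedule for increasing *γ* is not specified in the chapter.

```python
import numpy as np

def soft_threshold(v, t):
    # Minimizer of 0.5*(z - v)^2 + t*|z| (the operation behind Eqs. (19), (22)).
    return np.sign(v) * max(abs(v) - t, 0.0)

def update_edge(x_h, x_l, mu, lam, d, r, rho, gamma0=1e-3, factor=2.0):
    # Eq. (19): unconstrained minimizer of Eq. (18) by soft-thresholding.
    u = soft_threshold(x_h - x_l + mu / rho, lam * d / rho)
    bound = r / d
    if abs(u) <= bound:          # constraint already satisfied
        return u
    # Eqs. (20)-(22): grow gamma and re-threshold until |z| <= r/d holds.
    gamma, z = gamma0, u
    while abs(z) > bound:
        gamma *= factor
        z = soft_threshold(u, gamma)
    return z

z_new = update_edge(x_h=2.0, x_l=0.5, mu=0.1, lam=0.5, d=0.8, r=0.4, rho=1.0)
```

Because each edge variable appears in its own term of Eq. (17), this scalar routine can be applied to every intra-scale edge independently, and in parallel.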

The complexity of updating the edge variables is analyzed as follows. If a block has *n* pixels and the dyadic wavelet transform takes *J* scales, then the number of edge constraints on a singular block is *O*(*Jn*<sup>2</sup>) and the number of edge constraints on a smooth block is *O*(*n*<sup>2</sup>). Let *K*<sub>1</sub> be the maximum number of iterations to derive the solution of Eq. (20) over all graphs. The complexity of one edge update is *O*(*K*<sub>1</sub>*n*<sup>2</sup>(∣*J<sub>F</sub>*∣ + 3∣*J<sub>W</sub>*∣*J*)), where 3 is the number of orientations and ∣*J<sub>F</sub>*∣ and ∣*J<sub>W</sub>*∣ are the numbers of smooth blocks and singular blocks, respectively.

#### **3.4 Update nodes**

The node update for a singular block is more complicated than that for a smooth block because a graph for a singular block has a multi-scale structure, where adjacent scales are linked by inter-scale edges.

#### **3.5 Singular blocks**

To update the nodes *x* in a singular block is to minimize the augmented Lagrangian function (11) via

$$\arg\min\_{\mathbf{x}} L\_{\rho}(\mathbf{x}, \mathbf{z}, \boldsymbol{\lambda}, \boldsymbol{\mu}),\tag{23}$$

for given *z*, *λ*, and *μ*. This is not a convex problem because the second term in Eq. (11) is non-convex.

Our approach is to decompose the problem, based on the scale parameter, into a sequence of sub-problems. Each scale is associated with two convex sub-problems: a coarse-to-fine sub-problem and a fine-to-coarse sub-problem. The coarse-to-fine sub-problem assumes the parent nodes at scale 2<sup>*s*+1</sup> were updated earlier, while the fine-to-coarse sub-problem assumes the child nodes at scale 2<sup>*s*−1</sup> were updated earlier. Let *k* be the current iteration number. The coarse-to-fine sub-problem updates the nodes at scale 2<sup>*s*</sup> by minimizing<sup>3</sup>

$$\sum\_{i} \left[ \frac{\left(y\_{i} - x\_{i}\right)^{2}}{2\sigma^{2}} + \frac{k\_{1}x\_{i}^{2}}{\left[x\_{p(i)}^{k}\right]^{2}} \right] + \sum\_{h,l} \left[\mu\_{h,l}^{k-1} \left(z\_{h,l}^{k-1} - (x\_{h} - x\_{l})\right) + \frac{\rho}{2} \left(z\_{h,l}^{k-1} - (x\_{h} - x\_{l})\right)^{2} \right]. \tag{24}$$

On the other hand, the fine-to-coarse sub-problem updates the nodes at scale 2<sup>*s*</sup> by minimizing<sup>4</sup>

$$\sum\_{i} \left[ \frac{(y\_i - x\_i)^2}{2\sigma^2} + k\_2 \left(x\_i - \sqrt{2}x\_{c(i)}^k\right)^2 \right] + \sum\_{h,l} \left[ \mu\_{h,l}^{k-1} \left(z\_{h,l}^{k-1} - (x\_h - x\_l)\right) + \frac{\rho}{2} \left(z\_{h,l}^{k-1} - (x\_h - x\_l)\right)^2 \right]. \tag{25}$$

The node update problem (23) can then be approximated by repeatedly solving the coarse-to-fine iteration followed by the fine-to-coarse iteration. The coarse-to-fine iteration solves a sequence of the coarse-to-fine sub-problems beginning at the

<sup>3</sup> The second term in Eq. (24) is zero when 2<sup>*s*</sup> is the coarsest scale.

<sup>4</sup> The second term in Eq. (25) is zero when the nodes are at the finest scale.


coarsest scale. In contrast, the fine-to-coarse iteration solves a sequence of the fine-to-coarse sub-problems beginning at the finest scale.

Problems (24) and (25) can be efficiently solved. The objectives in the sub-problems are strictly convex because their Hessian matrices are positive definite (as can be observed from the matrices inverted in Eqs. (26) and (27)); thus, the optimal solution of each is unique. The closed-form solutions of the sub-problems can be derived as follows.

For convenience, we omit all the superscript indices in Eqs. (24) and (25) and let *A<sub>x</sub><sup>s</sup>* and *A<sub>z</sub><sup>s</sup>* denote the sub-matrices of *A<sub>x</sub>* and *A<sub>z</sub>*, respectively, retaining only the rows and columns corresponding to the nodes and edges at scale 2<sup>*s*</sup>. We also let *x<sub>s</sub>*, *z<sub>s</sub>*, and *μ<sub>s</sub>* denote the vectors of nodes, edges, and Lagrangian variables at scale 2<sup>*s*</sup>. The closed-form solution of *x<sub>s</sub>* for the coarse-to-fine sub-problem is

$$\begin{cases} \left[\frac{1}{\sigma^2}I + C\_{s+1}^T C\_{s+1} + \rho \left(A\_x^s\right)^T A\_x^s \right]^{-1} \left(\frac{1}{\sigma^2} y\_s - \left(A\_x^s\right)^T \left(\mu\_s - \rho A\_z^s z\_s\right) \right) \text{for } s \neq J;\\\ \left[\frac{1}{\sigma^2}I + \rho \left(A\_x^s\right)^T A\_x^s \right]^{-1} \left(\frac{1}{\sigma^2} y\_s - \left(A\_x^s\right)^T \left(\mu\_s - \rho A\_z^s z\_s\right) \right) \text{for } s = J; \end{cases} \tag{26}$$

where *C<sub>s+1</sub>* is a diagonal matrix whose (*i*, *i*)-th diagonal element is √(2*k*<sub>1</sub>)/∥*x<sub>p(i)</sub>*∥ and 2<sup>*J*</sup> is the coarsest scale. On the other hand, the closed-form solution of *x<sub>s</sub>* for the fine-to-coarse sub-problem is

$$\begin{cases} \left[ \left( \frac{1}{\sigma^2} + 2k\_2 \right) I + \rho \left( A\_x^s \right)^T A\_x^s \right]^{-1} \left( \frac{1}{\sigma^2} y\_s + 2\sqrt{2}k\_2 x\_{s-1} - \left( A\_x^s \right)^T \left( \mu\_s - \rho A\_z^s z\_s \right) \right) \text{for } s \ge 2;\\\ \left[ \frac{1}{\sigma^2} I + \rho \left( A\_x^s \right)^T A\_x^s \right]^{-1} \left( \frac{1}{\sigma^2} y\_s - \left( A\_x^s \right)^T \left( \mu\_s - \rho A\_z^s z\_s \right) \right) \text{for } s = 1. \end{cases} \tag{27}$$

The complexity of the matrix inversions in Eqs. (26) and (27) is low since *A<sub>x</sub><sup>s</sup>* is a sparse matrix: each row of *A<sub>x</sub><sup>s</sup>* has at most one 1 and one −1 and the rest are zero. The complexity of sparse matrix inversion in Matlab is proportional to the number of non-zero elements in the matrix. Thus, one iteration of either the coarse-to-fine or the fine-to-coarse pass over a graph has complexity *O*(*Jn*), where *J* is the number of decomposition levels and *n* is the number of pixels at a scale.

#### **3.6 Smooth blocks**

The node update for a smooth block can be analytically derived from problem (15) under the condition that $\{z_{h,l}\}$ is given. Omitting the superscript index, the closed-form solution is

$$\left[\frac{1}{\sigma^2}I + \rho A_x^T A_x\right]^{-1} \left(\frac{1}{\sigma^2}y - A_x^T (\mu - \rho A_z z)\right),\tag{28}$$

where $A_x$ and $A_z$ are defined in Eq. (10). The complexity of inverting the sparse matrix $\frac{1}{\sigma^2}I + \rho A_x^T A_x$ is proportional to the number of non-zero elements in the matrix, which is $O(n)$, where $n$ is the number of pixels in a block.
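As a sketch of this smooth-block node update, the following hypothetical Python/`scipy` function solves the linear system of Eq. (28) directly; the matrices, sizes, and parameter values are illustrative assumptions, with $A_z$ taken as the identity for simplicity.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def node_update(y, A_x, A_z, z, mu, sigma, rho):
    """Eq. (28): x = [I/sigma^2 + rho A_x^T A_x]^{-1} (y/sigma^2 - A_x^T (mu - rho A_z z))."""
    n = y.size
    M = sp.identity(n) / sigma**2 + rho * (A_x.T @ A_x)
    rhs = y / sigma**2 - A_x.T @ (mu - rho * (A_z @ z))
    return spla.spsolve(M.tocsc(), rhs)

# toy 3-pixel block with two edges (hypothetical data)
A_x = sp.csr_matrix(np.array([[1.0, -1.0, 0.0],
                              [0.0, 1.0, -1.0]]))
A_z = sp.identity(2)
y = np.array([1.0, 2.0, 3.0])
x = node_update(y, A_x, A_z, z=np.zeros(2), mu=np.zeros(2), sigma=10.0, rho=0.5)
```

Since the system matrix is sparse and positive definite, the solve costs $O(n)$ per block, consistent with the complexity claim above.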

If an image has $|J_W|$ singular blocks and $|J_F|$ smooth blocks, and if $J$ is the number of wavelet decompositions, the total complexity of the node updates for the image is $O(n(3K_2 J |J_W| + |J_F|))$, where $K_2$ is the maximum number of coarse-to-fine and fine-to-coarse node-update iterations for a singular block.

Graphical ADMM consists of a sequence of updates of the primal and dual variables, and the complexity of the algorithm is dominated by the primal variable updates. Our analysis of one iteration of Eq. (13) indicates that the costs of the node updates and the edge updates are $O(n(3K_2 J |J_W| + |J_F|))$ and $O(K_1 n^2 (3|J_W| J + |J_F|))$, respectively, where $n$ is the number of pixels in a block.

#### **4. Optimal graphs for image restoration**

There are various image restoration methods [23]. Here, we use the proximal approach proposed in [24] and [12]. The method reduces the image restoration problem to a sequence of noise reduction problems. Since graphical ADMM for noise reduction is efficient, it can be adopted to derive the optimal graphs for image restoration. As in noise reduction, each graph is handled independently of the others. The following discussion focuses on deriving the optimal graph for a single block.

Let $h(x)$ be the objective function

$$\frac{1}{2} \| \mathbf{y} - H\mathbf{x} \|^2,\tag{29}$$

with a known blur kernel $H$; $x_0$ is the vector of the current restored image. The proximity function is defined as

$$d\_H(\mathbf{x}, \mathbf{x}\_0) = \frac{\beta}{2} \|\mathbf{x} - \mathbf{x}\_0\|^2 - \frac{1}{2} \|H\mathbf{x} - H\mathbf{x}\_0\|^2. \tag{30}$$

The parameter $\beta$ is chosen so that $d_H(x, x_0)$ is strictly convex with respect to $x$. This requires that its Hessian $\beta I - H^T H$ be a positive definite matrix, which can be achieved by choosing $\beta > \lambda_{\max}(H^T H)$, the maximal eigenvalue of $H^T H$. The proximal objective is defined as
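The condition $\beta > \lambda_{\max}(H^T H)$ can be checked numerically. The sketch below, with a hypothetical $3 \times 3$ blur matrix, picks $\beta$ with a small safety margin and verifies that $\beta I - H^T H$ is positive definite.

```python
import numpy as np

def beta_for_strict_convexity(H, margin=1.01):
    """Pick beta > lambda_max(H^T H) so that beta*I - H^T H is positive definite."""
    lam_max = np.linalg.eigvalsh(H.T @ H).max()
    return margin * lam_max

# hypothetical small blur matrix (rows sum to 1, as for a normalized PSF)
H = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])
beta = beta_for_strict_convexity(H)

# beta*I - H^T H must have strictly positive eigenvalues
eigs = np.linalg.eigvalsh(beta * np.eye(3) - H.T @ H)
assert eigs.min() > 0
```

For large images, $\lambda_{\max}$ would instead be estimated by power iteration or, for a convolutional $H$, from the maximum of its frequency response.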

$$\tilde{h}(x, x_0) = h(x) + d_H(x, x_0). \tag{31}$$

Expanding the above objective, the $\|Hx\|^2$ terms cancel, and we obtain the simpler form

$$\tilde{h}(x, x_0) = \frac{\beta}{2} \left\| x - \left( x_0 + \frac{1}{\beta} H^T (y - H x_0) \right) \right\|^2 + K, \tag{32}$$

where $K$ contains terms unrelated to $x$. Since $x_0$, $\beta$, $H$, and $y$ are given, the proximal objective can be regarded as a noise reduction problem with the observation vector $x_0 + \frac{1}{\beta} H^T (y - H x_0)$. Thus, $\tilde{h}(x, x_0)$ can serve as the first term in the noise reduction problem (8), and the noise reduction algorithm can be used to derive the optimal graph by separating the node and edge updates following Eq. (13) and the procedures in Section 3.
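The reduction above suggests the following outer loop, sketched under simplifying assumptions (the `denoise` callback stands in for the graphical ADMM denoiser of Section 3, which is not reproduced here):

```python
import numpy as np

def proximal_deblur(y, H, beta, denoise, x0, n_iters=50):
    """Solve the deblurring problem as a sequence of denoising problems:
    at each step, denoise the surrogate observation of Eq. (32)."""
    x = x0
    for _ in range(n_iters):
        surrogate = x + H.T @ (y - H @ x) / beta  # observation vector of Eq. (32)
        x = denoise(surrogate)                    # any Gaussian denoiser
    return x
```

As a sanity check, with the identity "denoiser" this iteration reduces to gradient descent (Landweber iteration) on $\frac{1}{2}\|y - Hx\|^2$ with step size $1/\beta$.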

#### **5. Experiments and comparisons**

We consider several image denoising and deblurring scenarios used as benchmarks in state-of-the-art algorithms for performance evaluation and comparison. The experimental setting is as follows. The experiments were conducted on the images in Sets I and II in **Figure 3**. Set I contains six gray-scale natural images, Einstein, Boat, Barbara, Lena, Cameraman, and House; the size of each image is 512 × 512 or 256 × 256, downloaded from the USC-SIPI image database [25]. Set II contains six gray-scale textures, some of them taken from the Brodatz texture set. Throughout all experiments, each image is divided into 16 equal-sized blocks. A singular block is decomposed into four scales by the dyadic wavelet transform with the CDF 9/7 wavelet filters. Since the CDF 9/7 filters are close to orthogonal wavelet filters, the noise variance at any sub-band can be set to $\sigma^2$, the variance of the noise in the image domain [7].
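For concreteness, the 16-block spatial partition described above can be written as follows (a minimal `numpy` sketch; the classification of blocks into smooth and singular, and the CDF 9/7 wavelet decomposition, are not shown):

```python
import numpy as np

def split_into_blocks(img, grid=4):
    """Divide an image into grid x grid equal-sized blocks (16 blocks for grid=4)."""
    h, w = img.shape
    bh, bw = h // grid, w // grid
    return [img[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw]
            for r in range(grid) for c in range(grid)]

# a 512 x 512 image yields sixteen 128 x 128 blocks
blocks = split_into_blocks(np.zeros((512, 512)))
```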

#### **5.1 Noise reduction performance**

Our noise reduction performance was compared against that of BM3D, WBN, and DRUNet. The perceptual quality of the methods is shown in **Figure 4**. BM3D over-smooths the highlighted area of the hat in the Lena image, which is rich in edges and textures. Similarly, the textures in the highlighted hat area are smoothed by DRUNet. The image of WBN, on the contrary, under-smooths the highlighted smooth areas around the chin and shoulder of Lena. These artifacts are amended by graphical ADMM, as shown in **Figure 4f**.

The quantitative comparisons, measured by the peak signal-to-noise ratio (PSNR), on Set I and Set II are shown in **Tables 1** and **2**, respectively. The testing environments were images contaminated with noise of variance $\sigma^2$. As shown, the deep learning-based method (DRUNet) achieves the highest score in almost all environments. Among the optimization-based methods (BM3D, WBN, and the proposed method), however, graphical ADMM achieves the highest score in all environments.

#### **Figure 3.**

*The images in set I (first row) and set II (second row). Set I contains six gray-scaled natural images and set II contains six gray-scaled textures.*

#### **Figure 4.**

*Comparisons of the denoised Lena images derived by BM3D, WBN, DRUNet, and graphical ADMM. The noise standard deviation is set at σ = 25: (a) the original 512 × 512 Lena image; (b) the noisy image; (c) the result of BM3D; (d) the result of WBN; (e) the result of DRUNet; and (f) the result of graphical ADMM. Graphical ADMM preserves both the smooth and edged areas in the original image, as shown in the highlighted areas.*

#### **5.2 Image restoration performance**

**Table 3** presents five point spread functions (PSFs) used for image restoration in the literature [11]. Each PSF was normalized to have unit 1-norm before it was used to


*Image Restoration and Noise Reduction with Context-Dependent Wavelet Graph and ADMM… DOI: http://dx.doi.org/10.5772/intechopen.102681*

#### **Table 1.**

*Comparisons of the PSNRs of the noise reduction methods on noisy images with the noise of standard deviation σ in set I.*



#### **Table 2.**

*Comparisons of the PSNRs of the noise reduction methods on set II texture images.*


#### **Table 3.**

*Blur kernels for experiments. The dx and dy in h*<sup>1</sup> *are, respectively, the horizontal and vertical distances of a pixel to the center of the blur kernel.*

blur an image. The performance was compared with the state-of-the-art methods, IDD-BM3D and DPIR. For fair comparisons, both methods used the same initial images in each experiment. The visual quality of the restored images is shown in **Figure 5**; the blue and red boxes are magnifications of the highlighted areas in the

**Figure 5.**

*Comparisons of the deblurred images. The blue and red boxes are the magnified areas in the image. (a) the original 512 × 512 boat image; (b) the blurred image with blur kernel h*4*; (c) the image of IDD-BM3D; (d) the image of DPIR; and (e) the image of graphical ADMM. The overall perceptual quality of our image is better since those of IDD-BM3D and DPIR are over-smoothed.*

image. Compared with the original images, the overall perceptual quality of the images of IDD-BM3D and DPIR appears over-smoothed, whereas graphical ADMM preserves more image details, leading to better perceptual quality. Graphical ADMM preserves more details because it uses a multi-scale approach in treating the texture and edge regions. The wavelet persistence property allows information at coarse scales to pass to fine scales and vice versa. As a result, graphical ADMM yields sharper results in recovering singular points in images.

The quantitative comparison is shown in **Figure 6**, where the performance improvement of graphical ADMM over IDD-BM3D was measured by the ISNR (increased signal-to-noise ratio) [26]. The ISNR quantitatively assesses restored images with known ground truths. Let $y$, $x$, and $x_0$ be the vector representations of the observation, the restored image, and the ground truth, respectively; the ISNR is defined as

#### **Figure 6.**

*Average and standard deviation of the ISNR gain of DPIR over IDD-BM3D and graphical ADMM over IDD-BM3D. Each image is blurred and then added to white noise of standard deviation indicated by the noise level. The image was then deblurred. An ISNR gain was calculated from the de-blurred images. The circled point and bar of a measurement at a noise level are the average and standard deviation, respectively, of thirty ISNR gains of natural images from set I ((a) DPIR over IDD-BM3D and (b) graphical ADMM over IDD-BM3D) and texture images from set II ((c) DPIR over IDD-BM3D and (d) graphical ADMM over IDD-BM3D). As shown, the curves of ISNR gain increase steadily and progressively when the noise level increases.*

$$10\,\log_{10}\frac{\left\|y - x_0\right\|^2}{\left\|x - x_0\right\|^2}. \tag{33}$$

The higher the ISNR value of a restored image, the better the restoration quality of the image. The ISNR gain of graphical ADMM over that of IDD-BM3D is defined as

$$\text{ISNR}(\text{graphical ADMM}) - \text{ISNR}(\text{IDD-BM3D}), \tag{34}$$

and the ISNR gain of DPIR over that of IDD-BM3D is defined as

$$\text{ISNR}(\text{DPIR}) - \text{ISNR}(\text{IDD-BM3D}). \tag{35}$$
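Eqs. (33)–(35) translate directly into code; the following is a small sketch (function names are hypothetical):

```python
import numpy as np

def isnr(y, x, x0):
    """Increased signal-to-noise ratio of Eq. (33): observation y,
    restored image x, and ground truth x0, all flattened vectors."""
    return 10 * np.log10(np.sum((y - x0) ** 2) / np.sum((x - x0) ** 2))

def isnr_gain(y, x_a, x_b, x0):
    """ISNR gain of method a over method b, as in Eqs. (34) and (35)."""
    return isnr(y, x_a, x0) - isnr(y, x_b, x0)

# sanity check: a restoration 10x closer to the truth than the observation
x0 = np.zeros(4)
y = 10.0 * np.ones(4)
x = np.ones(4)
```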

**Figure 6a** and **c** show the ISNR gain of DPIR over IDD-BM3D on Set I and Set II, respectively. **Figure 6b** and **d** show the ISNR gain of graphical ADMM over IDD-BM3D on Set I and Set II, respectively. Let us take **Figure 6b** as an example. At each noise level, each image in Set I was first blurred by a kernel in **Table 3**, and white noise was added to obtain a noisy blurred image. This procedure generated thirty noisy blurred images, since Set I contains six images and **Table 3** has five blur kernels. Each noisy blurred image was deblurred, and the ISNR gain of graphical ADMM over IDD-BM3D was calculated. The thirty ISNR gains were then used to calculate the mean and standard deviation shown in **Figure 6**. The mean ISNR


gain of graphical ADMM increases steadily and progressively over IDD-BM3D, as the noise level increases.

#### **5.3 Discussions**

DRUNet and DPIR are deep learning methods trained on data with noise levels $\sigma$ ranging from 0 to 50. In the experiments, they achieve the best quantitative performance, but graphical ADMM is the best in visual quality. For learning-based methods, performance depends heavily on the training data. The inferred results are data-driven and not interpretable. If the training data is scarce, or the distribution of the testing data differs from that of the training data, performance degrades. If artifacts occur in the results, we cannot tell how they arose, because the network is merely a collection of coefficients fitted to the training data. Moreover, the cost of training is very high. These are the drawbacks of learning-based methods. Optimization-based methods, in contrast, are not limited by training data. Their results are derived from an objective function and are interpretable. In addition, they are more stable in practical applications. There is therefore a trade-off between learning-based and optimization-based methods.

For the optimization-based methods, the experiments have demonstrated the advantages of graphical ADMM over the compared methods in both the noise reduction and image restoration tasks. Recall that BM3D and IDD-BM3D adopt image-dependent tight frame representations. IDD-BM3D also combines the analysis and synthesis optimization methods by de-coupling the noise reduction problem from the image restoration problem. This yields a game-theoretic approach in which two formulations are used to minimize a single objective function; the solution adopted by IDD-BM3D is a Nash equilibrium point. WBN represents an image as a multi-scale probabilistic DAG and adopts belief propagation to derive the MAP solution.

The advantages of graphical ADMM lie in the context-dependent decompositions of an image, horizontally in space and vertically along the scales, in handling image details. The spatial decomposition allows our method to overcome WBN's drawback of under-smoothing the smooth areas while retaining its strength of preserving sharp edges. Meanwhile, graphical ADMM is much more efficient than the time-consuming belief propagation adopted in WBN.

The mixture of data-dependent and data-independent edges in the wavelet graph construction is a significant feature of our method. The intra-scale edges are determined by a data-dependent adaptive process, which imposes sparseness by keeping in the optimal graph only the edges whose end nodes have similar coefficients. The inter-scale edges are data-independent, built in to leverage the wavelet persistence property. The inter-scale edges, passing information about singularities from finer scales to coarser scales and vice versa, preserve more texture and edges of the original images. This distinguishes our algorithm from BM3D and IDD-BM3D, which encode structure in the atoms of a dictionary and select a few atoms for image representation.

#### **6. Conclusions**

We present a novel approach by combining spatial decomposition, vertical (multiscale) decomposition, and ADMM optimization in a graphical framework for image

noise reduction and restoration tasks. The graphical ADMM method has demonstrated that its results are superior to those of state-of-the-art algorithms. We also demonstrated that mixing data-dependent and data-independent structures in a graph representation can leverage the sparseness and persistence of a wavelet representation. Rather than adopting a combinatorial approach to derive an optimal graph, we showed that the graph can be derived by a numerically tractable optimization approach. In addition, we showed that the optimization problem is well coupled with our graph representation, and can be decomposed into a sequence of convex subproblems, with each having an efficient closed-form solution. This opens a new perspective of combining a mixture of data-adaptive and data-independent structures, hierarchical decomposition, and optimization algorithms in modeling, representing, and solving more image processing tasks.

## **Author details**

Jinn Ho, Shih-Shuo Tung and Wen-Liang Hwang\* Institute of Information Science, Academia Sinica, Taiwan

\*Address all correspondence to: whwang@iis.sinica.edu.tw

© 2022 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


## **References**

[1] Rudin L, Osher S, Fatemi E. Nonlinear total variation based noise removal algorithms. Physica D: Nonlinear Phenomena. 1992;**60**(1–4):259-268. DOI: 10.1016/0167-2789(92)90242-F

[2] Buades A, Coll B, Morel J. A review of image denoising algorithms, with a new one. Multiscale Modeling & Simulation. 2005;**4**(2):490-530. DOI: 10.1137/ 040616024

[3] Elad M, Aharon M. Image denoising via sparse and redundant representations over learned dictionaries. IEEE Transactions on Image Processing. 2006; **15**(12):3736-3745. DOI: 10.1109/TIP. 2006.881969

[4] Chatterjee P, Milanfar P. Is denoising dead? IEEE Transactions on Image Processing. 2010;**19**(4):895-911. DOI: 10.1109/TIP.2009.2037087

[5] Li J, Shen Z, Yin R, Zhang X. A reweighted l2 method for image restoration with poisson and mixed poisson-gaussian noise. Inverse Problem and Imaging. 2015;**9**(3):875-894. DOI: 10.3934/ipi.2015.9.875

[6] Dabov K, Foi A, Katkovnik V, Egiazarian K. Image denoising by sparse 3-D transform-domain collaborative filtering. IEEE Transactions on Image Processing. 2007;**16**(8):2080-2095. DOI: 10.1109/TIP.2007.901238

[7] Ho J, Hwang W. Wavelet Bayesian network image denoising. IEEE Transactions on Image Processing. 2013; **22**(4):1277-1290. DOI: 10.1109/ TIP.2012.2220150

[8] Zhang K, Li Y, Zuo W, Zhang L, Gool L, Timofte R. Plug-and-play image restoration with deep denoiser prior. IEEE Transactions on Pattern Analysis

and Machine Intelligence. 2021. DOI: 10.1109/TPAMI.2021.3088914

[9] Ronneberger O, Fischer P, Brox T. U-net: Convolutional networks for biomedical image segmentation. International Conference on Medical Image Computing and Computer-Assisted Intervention. 2015:234-241. DOI: 10.1007/978-3-319-24574-4\_28

[10] He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. IEEE Conference on Computer Vision and Pattern Recognition. 2016:770-778. DOI: 10.1109/CVPR.2016.90

[11] Danielyan A, Katkovnik V, Egiazarian K. BM3D frames and variational image deblurring. IEEE Transactions on Image Processing. 2012; **21**(4):1715-1728. DOI: 10.1109/TIP.2011. 2176954

[12] Zibulevsky M, Elad M. L1-L2 optimization in signal and image processing. IEEE Signal Processing Magazine. 2010;**27**(3):76-88. DOI: 10.1109/MSP.2010.936023

[13] Heckerman D. A Tutorial on learning bayesian networks. In: Innovations in Bayesian Networks. Berlin/Heidelberg: Springer; 2008. pp. 33-82. DOI: 10.1007/ 978-3-540-85066-3\_3

[14] Chickering D, Heckerman D, Meek C. A Bayesian approach to learning Bayesian networks with local structure. In Proceedings of Thirteenth Conference on Uncertainty in Artificial Intelligence. 1997:80-89. Available from: https://dl.ac m.org/doi/pdf/10.5555/2074226.2074236

[15] Cai J, Osher S, Shen Z. Split Bregman methods and frame based image restoration. Multiscale Modeling &

Simulation. 2009;**8**(2):337-369. DOI: 10.1137/090753504

[16] Boyd S, Parikh N, Chu E, Peleato B, Eckstein J. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning. 2011;**3**(1):1-122. DOI: 10.1561/2200000016

[17] Hoiem D, Efros A, Hebert M. Geometric context from a single image. IEEE International Conference on Computer Vision. 2005;**1**:654-661. DOI: 10.1109/ICCV.2005.107

[18] Ji H, Luo Y, Shen Z. Image recovery via geometrically structured approximation. Applied and Computational Harmonic Analysis. 2016;**41**(1):75-93. DOI: 10.1016/j.acha. 2015.08.012

[19] Simoncelli E. Bayesian denoising of visual images in the wavelet domain. Bayesian Inference in Wavelet Based Models. 1999;**141**:291-308. DOI: 10.1007/ 978-1-4612-0567-8\_18

[20] Mallat S, Hwang W. Singularity detection and processing with wavelets. IEEE Transactions on Information Theory. 1992;**38**(2):617-643. DOI: 10.1109/18.119727

[21] Milanfar P. A tour of modern image filtering: New insights and methods, both practical and theoretical. IEEE Signal Processing Magazine. 2013;**30**(1): 106-128. DOI: 10.1109/MSP.2011. 2179329

[22] Sreedevi P, Hwang W, Lei S. An examplar-based approach for texture compaction synthesis and retrieval. IEEE Transactions on Image Processing. 2010; **19**(5):1307-1318. DOI: 10.1109/TIP. 2009.2039665

[23] Mairal J, Elad M, Sapiro G. Sparse representation for color image restoration. IEEE Transactions on Image Processing. 2008;**17**(1):53-69. DOI: 10.1109/TIP.2007.911828

[24] Daubechies I, Defrise M, De-Mol C. An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Communications on Pure and Applied Mathematics. 2004;**57**(11): 1413-1457. DOI: 10.1002/cpa.20042

[25] The USC-SIPI Image Database, Signal and Image Processing Institute, University of Southern California. Available from: https://sipi.usc.edu/ database/

[26] Hanif M, Seghouane A. Blind image deblurring using non-negative sparse approximation. IEEE International Conference on Image Processing. 2014. DOI: 10.1109/ICIP.2014.7025821

