**8.1 Sparsity enforcing hierarchical models**

An easy way to obtain hierarchical sparsity enforcing priors is to introduce a hidden variable *z* and to consider the following forward and prior models:

$$\begin{cases} \mathbf{g} = \mathbf{H}\mathbf{f} + \boldsymbol{\varepsilon}, \\ \mathbf{f} = \mathbf{D}\mathbf{z} + \boldsymbol{\xi}, \quad \mathbf{z} \text{ sparse, modeled by a Double Exponential (DE) prior} \end{cases} \tag{53}$$

with

$$\begin{cases} p(\mathbf{g}|\mathbf{f}) = \mathcal{N}(\mathbf{g}|\mathbf{H}\mathbf{f}, v\_{\varepsilon}\mathbf{I}) \\ p(\mathbf{f}|\mathbf{z}) = \mathcal{N}(\mathbf{f}|\mathbf{D}\mathbf{z}, v\_{\xi}\mathbf{I}) \\ p(\mathbf{z}) = \mathcal{D}\mathcal{E}(\mathbf{z}|\gamma) \propto \exp\left[-\gamma\|\mathbf{z}\|\_{1}\right] \end{cases} \tag{54}$$
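As an illustration of this generative model, the following minimal simulation sketch draws a sparse *z* from a DE (Laplace) law, then synthesizes *f* = *Dz* + *ξ* and *g* = *Hf* + *ε*. The particular choices of **H** (random projection), **D** (random dictionary), the problem sizes and the variance values are hypothetical placeholders, not taken from the source.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical problem sizes: m observations, n unknowns / dictionary coefficients
m, n = 64, 128
v_eps, v_xi, gamma = 0.01, 0.01, 10.0   # assumed values of the hyperparameters

# Hypothetical operators: H a random projection, D a random dictionary
H = rng.standard_normal((m, n)) / np.sqrt(n)
D = rng.standard_normal((n, n)) / np.sqrt(n)

# Sparse hidden variable z: Double Exponential (Laplace) draws,
# which the DE prior concentrates around zero
z = rng.laplace(scale=1.0 / gamma, size=n)

# Hierarchical forward model of Eq. (53)
f = D @ z + np.sqrt(v_xi) * rng.standard_normal(n)    # f = D z + xi
g = H @ f + np.sqrt(v_eps) * rng.standard_normal(m)   # g = H f + eps
```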

Then, we have to find the expression of the joint posterior law *p*(*f*, *z*|*g*):

$$\begin{cases} p(\mathbf{f}, \mathbf{z}|\mathbf{g}) \propto \exp\left[-J(\mathbf{f}, \mathbf{z})\right] \quad \text{with} \\ J(\mathbf{f}, \mathbf{z}) = \frac{1}{2v\_{\varepsilon}} \|\mathbf{g} - \mathbf{H}\mathbf{f}\|\_2^2 + \frac{1}{2v\_{\xi}} \|\mathbf{f} - \mathbf{D}\mathbf{z}\|\_2^2 + \gamma \|\mathbf{z}\|\_1 \end{cases} \tag{55}$$

from which we can infer *f* and *z* [10, 16, 18–22]. For the unsupervised case, where the hyperparameters *γ*, *v*ε and *v*ξ also have to be estimated, we can assign them appropriate priors:

$$\begin{cases} p(\gamma) = \mathcal{I}\mathcal{G}\big(\gamma|a\_{\gamma\_{0}}, \beta\_{\gamma\_{0}}\big) \\ p(v\_{\varepsilon}) = \mathcal{I}\mathcal{G}\big(v\_{\varepsilon}|a\_{\varepsilon\_{0}}, \beta\_{\varepsilon\_{0}}\big) \\ p(v\_{\xi}) = \mathcal{I}\mathcal{G}\big(v\_{\xi}|a\_{\xi\_{0}}, \beta\_{\xi\_{0}}\big) \end{cases} \tag{56}$$

and thus obtain:

$$\begin{cases} p(\mathbf{f}, \mathbf{z}, \gamma, v\_{\varepsilon}, v\_{\xi}|\mathbf{g}) \propto \exp\left[-J(\mathbf{f}, \mathbf{z}, \gamma, v\_{\varepsilon}, v\_{\xi})\right] \quad \text{with} \\ \begin{aligned} J(\mathbf{f}, \mathbf{z}, \gamma, v\_{\varepsilon}, v\_{\xi}) = {} & \frac{1}{2v\_{\varepsilon}} \|\mathbf{g} - \mathbf{H}\mathbf{f}\|\_2^2 + \frac{1}{2v\_{\xi}} \|\mathbf{f} - \mathbf{D}\mathbf{z}\|\_2^2 + \gamma \|\mathbf{z}\|\_1 \\ & + \left(a\_{\gamma\_{0}} + n/2\right) \ln \gamma + \beta\_{\gamma\_{0}}/\gamma \\ & + \left(a\_{\varepsilon\_{0}} + m/2\right) \ln v\_{\varepsilon} + \beta\_{\varepsilon\_{0}}/v\_{\varepsilon} \\ & + \left(a\_{\xi\_{0}} + n/2\right) \ln v\_{\xi} + \beta\_{\xi\_{0}}/v\_{\xi} \end{aligned} \end{cases} \tag{57}$$
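For instance, with *f* and *z* held fixed, setting the partial derivatives of *J* with respect to *v*ε and *v*ξ to zero yields closed-form expressions. The updates below are a sketch derived directly from (57); they are not stated explicitly in the original text:

$$\hat{v}\_{\varepsilon} = \frac{\beta\_{\varepsilon\_{0}} + \frac{1}{2}\|\mathbf{g} - \mathbf{H}\mathbf{f}\|\_2^2}{a\_{\varepsilon\_{0}} + m/2}, \qquad \hat{v}\_{\xi} = \frac{\beta\_{\xi\_{0}} + \frac{1}{2}\|\mathbf{f} - \mathbf{D}\mathbf{z}\|\_2^2}{a\_{\xi\_{0}} + n/2}$$

A similar stationarity condition on *γ* also gives a closed-form (root of a quadratic) update.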

It is interesting to note that the alternate optimization of this criterion gives ADMM-like algorithms [23–25], with the main advantage that here we have direct updates of the hyperparameters.
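A minimal sketch of such an alternating scheme is given below. The *f*-step is the exact minimizer of the quadratic part of *J*, the *z*-step is approximated here by a few ISTA (soft-thresholding) iterations rather than an exact ℓ1 minimization, *γ* is kept fixed, and the variance steps use the closed-form updates sketched above. Function names, default values and iteration counts are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def soft_threshold(x, t):
    """Elementwise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def alternating_map(g, H, D, a_eps0=1.0, b_eps0=1.0, a_xi0=1.0, b_xi0=1.0,
                    gamma=10.0, n_outer=50, n_ista=20):
    """Alternating minimization of the joint MAP criterion J of Eq. (57) (sketch)."""
    m, n = H.shape
    f = np.zeros(n)
    z = np.zeros(D.shape[1])
    v_eps, v_xi = 1.0, 1.0
    for _ in range(n_outer):
        # f-step: exact minimizer of the quadratic terms of J given z
        A = H.T @ H / v_eps + np.eye(n) / v_xi
        b = H.T @ g / v_eps + D @ z / v_xi
        f = np.linalg.solve(A, b)
        # z-step: ISTA iterations for (1/(2 v_xi))||f - D z||^2 + gamma ||z||_1
        L = np.linalg.norm(D, 2) ** 2 / v_xi        # Lipschitz constant of the gradient
        for _ in range(n_ista):
            grad = D.T @ (D @ z - f) / v_xi
            z = soft_threshold(z - grad / L, gamma / L)
        # Hyperparameter steps: direct closed-form updates derived from Eq. (57)
        v_eps = (b_eps0 + 0.5 * np.sum((g - H @ f) ** 2)) / (a_eps0 + m / 2)
        v_xi = (b_xi0 + 0.5 * np.sum((f - D @ z) ** 2)) / (a_xi0 + n / 2)
    return f, z, v_eps, v_xi

# Usage (with g, H, D as in the simulation sketch above):
# f_hat, z_hat, v_eps_hat, v_xi_hat = alternating_map(g, H, D)
```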
