**5.4 Variational Bayesian approximation**

VBA is a powerful approach to approximate Bayesian computation. It starts by obtaining the expression of the joint posterior $p(\boldsymbol{f}, \boldsymbol{\theta}|\mathbf{g})$ and then approximates it with a simpler probability law $q(\boldsymbol{f}, \boldsymbol{\theta}|\mathbf{g})$ that can be handled much more easily in the computations. VBA can be summarized in the following steps:

• Choose a separable approximation $q(\boldsymbol{f}, \boldsymbol{\theta}) = q_1(\boldsymbol{f})\, q_2(\boldsymbol{\theta})$ and measure its quality by the Kullback–Leibler divergence $\mathrm{KL}(q:p)$:


$$\begin{split} \mathrm{KL}(q:p) &= \iint q \ln \frac{q}{p} = \iint q_1 q_2 \ln \frac{q_1 q_2}{p} \\ &= \int q_1 \ln q_1 + \int q_2 \ln q_2 - \iint q \ln p \\ &= -H(q_1) - H(q_2) - \langle \ln p \rangle_q \end{split} \tag{33}$$
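As a quick sanity check of this decomposition, the following minimal sketch evaluates both sides of Eq. (33) numerically on a small discrete grid; the distributions `p`, `q1`, and `q2` are arbitrary illustrative choices, not quantities from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary joint p(f, theta) on a 4 x 3 grid, normalized to sum to 1.
p = rng.random((4, 3))
p /= p.sum()

# Separable approximation q(f, theta) = q1(f) q2(theta).
q1 = rng.random(4)
q1 /= q1.sum()
q2 = rng.random(3)
q2 /= q2.sum()
q = np.outer(q1, q2)

# Left-hand side of Eq. (33): KL(q:p) = sum q ln(q/p).
kl_direct = np.sum(q * np.log(q / p))

# Right-hand side: -H(q1) - H(q2) - <ln p>_q.
H1 = -np.sum(q1 * np.log(q1))  # entropy of q1
H2 = -np.sum(q2 * np.log(q2))  # entropy of q2
kl_decomposed = -H1 - H2 - np.sum(q * np.log(p))

assert np.isclose(kl_direct, kl_decomposed)
print(f"KL(q:p) = {kl_direct:.6f} (both forms agree)")
```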

• Alternating optimization of $\mathrm{KL}(q_1 q_2 : p)$ with respect to $q_1$ and $q_2$ results in:

$$\begin{cases} q_1(\boldsymbol{f}) \propto \exp\left[ \langle \ln p(\mathbf{g}, \boldsymbol{f}, \boldsymbol{\theta}; \mathcal{M}) \rangle_{q_2(\boldsymbol{\theta})} \right] \\ q_2(\boldsymbol{\theta}) \propto \exp\left[ \langle \ln p(\mathbf{g}, \boldsymbol{f}, \boldsymbol{\theta}; \mathcal{M}) \rangle_{q_1(\boldsymbol{f})} \right] \end{cases} \tag{34}$$

As $\mathrm{KL}(q_1 q_2 : p)$ is convex in each of $q_1$ and $q_2$ separately, the alternating algorithm converges to a (local) optimum. At the end, we have the expressions of $q_1(\boldsymbol{f})$ and $q_2(\boldsymbol{\theta})$, which can then be used to infer $\boldsymbol{f}$ and $\boldsymbol{\theta}$. VBA is summarized in the following scheme:

$$
\boxed{p(\boldsymbol{f},\boldsymbol{\theta}|\mathbf{g})} \rightarrow \boxed{\begin{array}{c}
\text{Variational} \\
\text{Bayesian} \\
\text{Approximation}
\end{array}} \rightarrow \begin{array}{l}
q_1(\boldsymbol{f}) \rightarrow \hat{\boldsymbol{f}} \\
q_2(\boldsymbol{\theta}) \rightarrow \hat{\boldsymbol{\theta}}
\end{array}
$$

In real applications, we choose parametric probability laws for $q_1(\boldsymbol{f})$ and $q_2(\boldsymbol{\theta})$, so the iterations are carried out on their parameters. What is interesting is that, by choosing appropriate parametric models for $q_1(\boldsymbol{f})$ and $q_2(\boldsymbol{\theta})$, we obtain JMAP and EM (or GEM) as special cases.

• Case 1: Deterministic or degenerate expressions → Joint MAP (JMAP):

$$\begin{cases} \tilde{q}_1(\boldsymbol{f}|\bar{\boldsymbol{f}}) = \delta(\boldsymbol{f} - \bar{\boldsymbol{f}}) \\ \tilde{q}_2(\boldsymbol{\theta}|\bar{\boldsymbol{\theta}}) = \delta(\boldsymbol{\theta} - \bar{\boldsymbol{\theta}}) \end{cases} \rightarrow \begin{cases} \bar{\boldsymbol{f}} = \arg\max_{\boldsymbol{f}} \left\{ p(\boldsymbol{f}, \bar{\boldsymbol{\theta}}|\mathbf{g}; \mathcal{M}) \right\} \\ \bar{\boldsymbol{\theta}} = \arg\max_{\boldsymbol{\theta}} \left\{ p\left(\bar{\boldsymbol{f}}, \boldsymbol{\theta}|\mathbf{g}; \mathcal{M}\right) \right\} \end{cases} \tag{35}$$

• Case 2: Degenerate expression for $\boldsymbol{\theta}$ and marginal expression for $\boldsymbol{f}$ → EM:

$$\begin{cases} \tilde{q}_1(\boldsymbol{f}) \propto p(\boldsymbol{f}|\bar{\boldsymbol{\theta}}, \mathbf{g}) \\ \tilde{q}_2(\boldsymbol{\theta}|\bar{\boldsymbol{\theta}}) = \delta(\boldsymbol{\theta} - \bar{\boldsymbol{\theta}}) \end{cases} \rightarrow \begin{cases} Q(\boldsymbol{\theta}, \bar{\boldsymbol{\theta}}) = \langle \ln p(\boldsymbol{f}, \boldsymbol{\theta}|\mathbf{g}; \mathcal{M}) \rangle_{\tilde{q}_1(\boldsymbol{f}|\bar{\boldsymbol{\theta}})} \\ \bar{\boldsymbol{\theta}} = \arg\max_{\boldsymbol{\theta}} \{ Q(\boldsymbol{\theta}, \bar{\boldsymbol{\theta}}) \} \end{cases} \tag{36}$$

• Case 3: $q_1$ and $q_2$ are chosen proportional to the conditionals $p(\boldsymbol{f}|\tilde{\boldsymbol{\theta}}, \mathbf{g}; \mathcal{M})$ and $p(\boldsymbol{\theta}|\tilde{\boldsymbol{f}}, \mathbf{g}; \mathcal{M})$. This is a very appropriate choice for inverse problems, in particular in cases where we use exponential families and conjugate priors.

$$\begin{cases} \tilde{q}_1(\boldsymbol{f}) \propto p(\boldsymbol{f} | \tilde{\boldsymbol{\theta}}, \mathbf{g}; \mathcal{M}) \\ \tilde{q}_2(\boldsymbol{\theta}) \propto p(\boldsymbol{\theta} | \tilde{\boldsymbol{f}}, \mathbf{g}; \mathcal{M}) \end{cases} \rightarrow \begin{cases} \text{Accounts for the uncertainties of } \hat{\boldsymbol{\theta}} \\ \text{when estimating } \hat{\boldsymbol{f}}, \text{ and vice versa.} \end{cases} \tag{37}$$
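Why this choice coincides with the fixed-point updates (34) under conjugacy can be seen in one step (a sketch of the argument, assuming the log-joint is linear in $\boldsymbol{\theta}$ for fixed $\boldsymbol{f}$, as in Gaussian models with precision parameters): up to terms that do not depend on $\boldsymbol{f}$,

$$\langle \ln p(\mathbf{g}, \boldsymbol{f}, \boldsymbol{\theta}; \mathcal{M}) \rangle_{q_2(\boldsymbol{\theta})} = \ln p\left(\mathbf{g}, \boldsymbol{f}, \tilde{\boldsymbol{\theta}}; \mathcal{M}\right) + \mathrm{const}, \qquad \tilde{\boldsymbol{\theta}} = \langle \boldsymbol{\theta} \rangle_{q_2(\boldsymbol{\theta})},$$

so exponentiating the first line of (34) gives $q_1(\boldsymbol{f}) \propto p(\boldsymbol{f}|\tilde{\boldsymbol{\theta}}, \mathbf{g}; \mathcal{M})$. The symmetric update preserves the functional form of $p(\boldsymbol{\theta}|\boldsymbol{f}, \mathbf{g}; \mathcal{M})$ as well, but with the sufficient statistics of $\boldsymbol{f}$ replaced by their expectations under $q_1$ (e.g., for a linear forward model, $\langle \|\mathbf{g} - \boldsymbol{H}\boldsymbol{f}\|^2 \rangle_{q_1}$ rather than $\|\mathbf{g} - \boldsymbol{H}\langle\boldsymbol{f}\rangle\|^2$), which is exactly how the uncertainty on $\boldsymbol{f}$ propagates into $q_2$.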

In the following schemes, these three cases are illustrated for comparison, each followed by a minimal code sketch on a common toy model.

• JMAP alternating optimization algorithm:

$$\begin{aligned} \boldsymbol{\theta}^{(0)} \to \tilde{\boldsymbol{\theta}} &\to \boxed{\tilde{\boldsymbol{f}} = \arg\max_{\boldsymbol{f}} \left\{ p\left(\boldsymbol{f}, \tilde{\boldsymbol{\theta}}\middle|\mathbf{g}\right) \right\}} \to \tilde{\boldsymbol{f}} \to \hat{\boldsymbol{f}} \\ \hat{\boldsymbol{\theta}} \leftarrow \tilde{\boldsymbol{\theta}} &\leftarrow \boxed{\tilde{\boldsymbol{\theta}} = \arg\max_{\boldsymbol{\theta}} \left\{ p\left(\tilde{\boldsymbol{f}}, \boldsymbol{\theta}\middle|\mathbf{g}\right) \right\}} \leftarrow \tilde{\boldsymbol{f}} \end{aligned}$$
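A minimal sketch of this alternating maximization, assuming a hypothetical scalar model $g_i = f + \epsilon_i$ with unknown noise precision $\theta$, a Gaussian prior $f \sim \mathcal{N}(0, 1/\lambda_0)$, and a conjugate prior $\theta \sim \mathcal{G}(a_0, b_0)$; the model and all hyperparameter values are illustrative assumptions, not from the text:

```python
import numpy as np

# Hypothetical toy model: g_i = f + eps_i, eps_i ~ N(0, 1/theta),
# with priors f ~ N(0, 1/lam0) and theta ~ Gamma(a0, b0).
rng = np.random.default_rng(1)
n, f_true, theta_true = 50, 2.0, 4.0
g = f_true + rng.normal(0.0, 1.0 / np.sqrt(theta_true), size=n)

lam0, a0, b0 = 1e-2, 2.0, 1.0  # hypothetical hyperparameter values
theta = 1.0                     # theta^(0)

for _ in range(30):
    # f_tilde = argmax_f p(f, theta_tilde | g): mode of a Gaussian in f
    f = theta * g.sum() / (n * theta + lam0)
    # theta_tilde = argmax_theta p(f_tilde, theta | g): mode of a Gamma density
    theta = (a0 + n / 2 - 1) / (b0 + 0.5 * np.sum((g - f) ** 2))

print(f"JMAP: f = {f:.3f}, theta = {theta:.3f}")
```

Each update has a closed form here: a ridge-type estimate for $f$ and the mode of a Gamma density for $\theta$; only point estimates are passed between the two boxes.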

**Figure 8.** *Illustration of advanced Bayesian approach with hierarchical prior modeling with hidden variables.*

• EM:

$$\begin{aligned} \boldsymbol{\theta}^{(0)} \to \tilde{\boldsymbol{\theta}} &\to \boxed{q_1(\boldsymbol{f}) = p\left(\boldsymbol{f}\middle|\tilde{\boldsymbol{\theta}}, \mathbf{g}\right)} \to q_1(\boldsymbol{f}) \to \hat{\boldsymbol{f}} \\ \hat{\boldsymbol{\theta}} \leftarrow \tilde{\boldsymbol{\theta}} &\leftarrow \boxed{\begin{array}{l} Q\left(\boldsymbol{\theta}, \tilde{\boldsymbol{\theta}}\right) = \langle \ln p(\boldsymbol{f}, \boldsymbol{\theta}|\mathbf{g})\rangle_{q_1(\boldsymbol{f})} \\ \tilde{\boldsymbol{\theta}} = \arg\max_{\boldsymbol{\theta}} \{Q(\boldsymbol{\theta}, \tilde{\boldsymbol{\theta}})\} \end{array}} \leftarrow q_1(\boldsymbol{f}) \end{aligned}$$
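The same hypothetical toy model under the EM scheme (again a sketch under the assumptions above): the E-step keeps the full Gaussian $q_1(f) = p(f|\tilde{\theta}, \mathbf{g})$, so its variance enters the M-step:

```python
import numpy as np

# Same hypothetical toy model as in the JMAP sketch.
rng = np.random.default_rng(1)
n, f_true, theta_true = 50, 2.0, 4.0
g = f_true + rng.normal(0.0, 1.0 / np.sqrt(theta_true), size=n)

lam0, a0, b0 = 1e-2, 2.0, 1.0  # hypothetical hyperparameter values
theta = 1.0                     # theta^(0)

for _ in range(30):
    # E-step: q1(f) = p(f | theta_tilde, g) = N(mu, var), kept as a full Gaussian
    var = 1.0 / (n * theta + lam0)
    mu = theta * g.sum() * var
    # M-step: maximize Q(theta, theta_tilde) = <ln p(f, theta | g)>_{q1(f)};
    # <sum_i (g_i - f)^2>_{q1} = sum_i (g_i - mu)^2 + n * var, so the
    # posterior variance of f enters the update of theta.
    theta = (a0 + n / 2 - 1) / (b0 + 0.5 * (np.sum((g - mu) ** 2) + n * var))

print(f"EM: f = {mu:.3f}, theta = {theta:.3f}")
```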

• VBA:

$$\begin{aligned} \boldsymbol{\theta}^{(0)} \to q_2(\boldsymbol{\theta}) &\to \boxed{q_1(\boldsymbol{f}) \propto \exp\left[\langle \ln p(\boldsymbol{f}, \boldsymbol{\theta} | \mathbf{g}) \rangle_{q_2(\boldsymbol{\theta})} \right]} \to q_1(\boldsymbol{f}) \to \hat{\boldsymbol{f}} \\ \hat{\boldsymbol{\theta}} \leftarrow q_2(\boldsymbol{\theta}) &\leftarrow \boxed{q_2(\boldsymbol{\theta}) \propto \exp\left[\langle \ln p(\boldsymbol{f}, \boldsymbol{\theta} | \mathbf{g}) \rangle_{q_1(\boldsymbol{f})} \right]} \leftarrow q_1(\boldsymbol{f}) \end{aligned}$$
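Finally, the VBA fixed-point updates (34) on the same hypothetical toy model; conjugacy keeps $q_1$ Gaussian and $q_2$ Gamma, so the iteration reduces to updating their parameters, as noted earlier:

```python
import numpy as np

# Same hypothetical toy model as in the JMAP and EM sketches.
rng = np.random.default_rng(1)
n, f_true, theta_true = 50, 2.0, 4.0
g = f_true + rng.normal(0.0, 1.0 / np.sqrt(theta_true), size=n)

lam0, a0, b0 = 1e-2, 2.0, 1.0  # hypothetical hyperparameter values
a, b = a0, b0                   # initial q2(theta) = Gamma(a, b)

for _ in range(30):
    # q1(f) ∝ exp(<ln p(f, theta | g)>_{q2}): Gaussian, with theta -> <theta> = a/b
    theta_mean = a / b
    var = 1.0 / (n * theta_mean + lam0)
    mu = theta_mean * g.sum() * var
    # q2(theta) ∝ exp(<ln p(f, theta | g)>_{q1}): Gamma, using
    # <sum_i (g_i - f)^2>_{q1} = sum_i (g_i - mu)^2 + n * var
    a = a0 + n / 2
    b = b0 + 0.5 * (np.sum((g - mu) ** 2) + n * var)

print(f"VBA: q1(f) = N({mu:.3f}, {var:.4f}), E[theta] = {a / b:.3f}")
```

Comparing the three sketches: JMAP passes only point estimates between the two blocks; EM keeps a full $q_1(\boldsymbol{f})$ but a point estimate of $\boldsymbol{\theta}$; VBA keeps full distributions for both. The `n * var` correction terms in the EM and VBA sketches are precisely where the uncertainty on $\boldsymbol{f}$ enters the update of $\boldsymbol{\theta}$, as stated in Eq. (37).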
