**9.1 Large scale linear and Gaussian models**

As we saw in the previous chapter, the linear forward model *g* = *Hf* + *ϵ* with Gaussian noise and a Gaussian prior is the simplest case where all the computations can be done analytically.

$$\begin{cases} p(\mathbf{g}|\mathbf{f}) = \mathcal{N}(\mathbf{g}|\mathbf{H}\mathbf{f}, v\_{\epsilon}\mathbf{I}) \\ p(\mathbf{f}) = \mathcal{N}(\mathbf{f}|\mathbf{f}\_{0}, v\_{f}\mathbf{I}) \end{cases} \rightarrow \begin{cases} p(\mathbf{g}) = \mathcal{N}\left(\mathbf{g}|\mathbf{H}\mathbf{f}\_{0},\, v\_{f}\mathbf{H}\mathbf{H}' + v\_{\epsilon}\mathbf{I}\right) \\ p(\mathbf{f}|\mathbf{g}) = \mathcal{N}\left(\mathbf{f}|\hat{\mathbf{f}}, \hat{\boldsymbol{\Sigma}}\right) \\ \hat{\mathbf{f}} = \mathbf{f}\_{0} + \left[\mathbf{H}'\mathbf{H} + \lambda\mathbf{I}\right]^{-1}\mathbf{H}'\left(\mathbf{g} - \mathbf{H}\mathbf{f}\_{0}\right) \\ \hat{\boldsymbol{\Sigma}} = v\_{\epsilon}\left[\mathbf{H}'\mathbf{H} + \lambda\mathbf{I}\right]^{-1}, \quad \lambda = \frac{v\_{\epsilon}}{v\_{f}} \end{cases} \tag{80}$$
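
As a sanity check, the following minimal NumPy sketch (with hypothetical sizes and variances, and taking $\mathbf{f}_0 = \mathbf{0}$) computes the posterior mean and covariance of Eq. (80) directly. Note that forming and inverting the matrix explicitly is only feasible at small scale, which is precisely the limitation this section addresses.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 50, 100                    # hypothetical, small illustrative sizes
H = rng.standard_normal((m, n))
v_eps, v_f = 0.1, 1.0             # assumed noise and prior variances
f_true = rng.standard_normal(n)
g = H @ f_true + np.sqrt(v_eps) * rng.standard_normal(m)

lam = v_eps / v_f
# Posterior mean and covariance of Eq. (80), with f0 = 0
A = H.T @ H + lam * np.eye(n)
f_hat = np.linalg.solve(A, H.T @ g)
Sigma_hat = v_eps * np.linalg.inv(A)
```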

The trick here, for example for computing $\hat{\mathbf{f}}$ (taking $\mathbf{f}_0 = \mathbf{0}$ for simplicity), is to use the fact that

$$\begin{cases} p(\mathbf{g}|\mathbf{f}) \propto \exp\left[-\frac{1}{2v\_{\epsilon}} \|\mathbf{g} - \mathbf{H}\mathbf{f}\|\_2^2\right] \\ p(\mathbf{f}) \propto \exp\left[-\frac{1}{2v\_{f}} \|\mathbf{f}\|\_2^2\right] \\ p(\mathbf{f}|\mathbf{g}) \propto \exp\left[-\frac{1}{2v\_{\epsilon}} J(\mathbf{f})\right] \text{ with } J(\mathbf{f}) = \|\mathbf{g} - \mathbf{H}\mathbf{f}\|\_2^2 + \lambda \|\mathbf{f}\|\_2^2, \quad \lambda = \frac{v\_{\epsilon}}{v\_{f}} \end{cases} \tag{81}$$

and, as the mean and the mode of a Gaussian probability law are the same, we can use:

$$\hat{\mathbf{f}} = \underset{\mathbf{f}}{\operatorname{argmax}} \{ p(\mathbf{f}|\mathbf{g}) \} = \underset{\mathbf{f}}{\operatorname{argmin}} \{ J(\mathbf{f}) \} \text{ with } J(\mathbf{f}) = \| \mathbf{g} - \mathbf{H}\mathbf{f} \|\_2^2 + \lambda \| \mathbf{f} \|\_2^2 \tag{82}$$

and so the problem can be cast as the *optimization* of a quadratic criterion, for which a great number of algorithms exist. Let us show here the simplest one, which is gradient based and so needs the expression of the gradient:

$$\nabla J(\mathbf{f}) = -2\mathbf{H}'(\mathbf{g} - \mathbf{H}\mathbf{f}) + 2\lambda \mathbf{f} \tag{83}$$

which leads to the following gradient descent algorithm, with step size $\alpha$:

$$\begin{cases} \mathbf{f}^{(0)} = \mathbf{0} \\ \mathbf{f}^{(k+1)} = \mathbf{f}^{(k)} + \alpha \left[ \mathbf{H}' \left( \mathbf{g} - \mathbf{H} \mathbf{f}^{(k)} \right) - \lambda \mathbf{f}^{(k)} \right] \end{cases} \tag{84}$$

As we can see, at each iteration we need to be able to compute the *forward operation* *Hf* and the *backward operation* *H*′*δg*, where *δg* = *g* − *Hf*. Implementing this optimization algorithm thus requires writing two programs: one that computes the forward operation *Hf* and one that computes the backward operation *H*′*δg* (a matrix-free sketch is given below).


These two operations can be implemented on high-performance parallel processors such as Graphics Processing Units (GPUs).
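
The following minimal sketch (not from the original text) illustrates a matrix-free implementation of the iteration (84): the forward and backward operations are passed in as functions, so the same loop runs unchanged whether they are NumPy calls or GPU kernels. The convolution operator and the step size `alpha` are hypothetical choices; for convergence, one needs 0 < α < 2/(σ²ₘₐₓ + λ), where σₘₐₓ is the largest singular value of *H*.

```python
import numpy as np

def gradient_descent(forward, backward, g, lam, alpha, n_iter=500):
    """Iterate Eq. (84): f <- f + alpha * [H'(g - H f) - lam * f]."""
    f = np.zeros_like(backward(g))                 # f^(0) = 0
    for _ in range(n_iter):
        dg = g - forward(f)                        # forward operation: H f
        f = f + alpha * (backward(dg) - lam * f)   # backward operation: H' dg
    return f

# Hypothetical example: H is convolution with kernel h, H' is correlation
h = np.array([0.25, 0.5, 0.25])
forward = lambda f: np.convolve(f, h, mode="same")
backward = lambda dg: np.convolve(dg, h[::-1], mode="same")

rng = np.random.default_rng(0)
f_true = rng.standard_normal(200)
g = forward(f_true) + 0.05 * rng.standard_normal(200)
f_hat = gradient_descent(forward, backward, g, lam=0.05, alpha=0.5)
```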

The computation of the posterior covariance is much more difficult. There are a few categories of methods. The first category contains methods which exploit the particular structure of the matrices *H*, *H*′*H*, or *HH*′; for example, we can use the matrix inversion lemma and see that

$$
\hat{\boldsymbol{\Sigma}} = v\_{\epsilon} \left[ \mathbf{H}'\mathbf{H} + \lambda\mathbf{I} \right]^{-1} = v\_{f} \left[ \mathbf{I} - \mathbf{H}' \left( \mathbf{H}\mathbf{H}' + \lambda\mathbf{I} \right)^{-1} \mathbf{H} \right] \tag{85}
$$
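
A quick numerical check of this identity (a small-scale sketch with assumed dimensions) also shows its practical interest: the inner system to solve is of the size of the data, *m* × *m*, rather than of the size of the unknowns, *n* × *n*.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 30, 80                      # hypothetical sizes, m < n
H = rng.standard_normal((m, n))
v_eps, v_f = 0.1, 1.0
lam = v_eps / v_f

# Left side of Eq. (85): an n x n inverse
left = v_eps * np.linalg.inv(H.T @ H + lam * np.eye(n))
# Right side of Eq. (85): only an m x m system to solve
right = v_f * (np.eye(n) - H.T @ np.linalg.solve(H @ H.T + lam * np.eye(m), H))
assert np.allclose(left, right)
```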

For example, in a signal deconvolution problem, the matrix *H* has a Toeplitz structure, and so do the matrices *H*′*H* and *HH*′; these can be approximated by circulant matrices and diagonalized using the Fourier transform.
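
A minimal sketch of this idea for 1D deconvolution (hypothetical blur kernel; assuming periodic boundary conditions, under which *H* is exactly circulant): in the Fourier domain, both the posterior mean of Eq. (80) and the eigenvalues of $\hat{\boldsymbol{\Sigma}}$ reduce to elementwise operations.

```python
import numpy as np

n = 256
h = np.zeros(n)
h[:5] = [0.1, 0.2, 0.4, 0.2, 0.1]       # hypothetical blur kernel
H_hat = np.fft.fft(h)                   # eigenvalues of the circulant H
v_eps, v_f = 0.01, 1.0
lam = v_eps / v_f

rng = np.random.default_rng(2)
f_true = rng.standard_normal(n)
g = np.fft.ifft(H_hat * np.fft.fft(f_true)).real \
    + np.sqrt(v_eps) * rng.standard_normal(n)

denom = np.abs(H_hat) ** 2 + lam        # eigenvalues of H'H + lam I
f_hat = np.fft.ifft(np.conj(H_hat) * np.fft.fft(g) / denom).real  # posterior mean
post_var = v_eps / denom                # eigenvalues of Sigma_hat (Fourier basis)
```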

The second, more general, approach is to approximate $\hat{\boldsymbol{\Sigma}}$ by a diagonal matrix, which can also be interpreted as approximating the posterior law $p(\mathbf{f}|\mathbf{g})$ by a separable law $q(\mathbf{f}) = \prod_j q(f_j)$. This brings us naturally to Approximate Bayesian Computation (ABC). But before going into the details of ABC methods, let us consider the case where the hyperparameters of the problem (the parameters of the prior laws) are also unknown.

To be able to do these computations, we mainly need to compute the determinant of the matrix $v_f \mathbf{H}\mathbf{H}' + v_{\epsilon}\mathbf{I}$ for $p(\mathbf{g})$, and the inverse of the matrices $[\mathbf{H}'\mathbf{H} + \lambda\mathbf{I}]$ or $[\mathbf{H}\mathbf{H}' + \lambda\mathbf{I}]$.
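
A minimal sketch of a numerically stable evaluation of $\log p(\mathbf{g})$ through a Cholesky factorization of the *m* × *m* data-space covariance (assuming it is small enough to factor; the function name is ours, and $\mathbf{f}_0 = \mathbf{0}$ is assumed):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def log_evidence(H, g, v_f, v_eps):
    """log p(g) for g ~ N(0, v_f H H' + v_eps I), via Cholesky."""
    m = len(g)
    C = v_f * (H @ H.T) + v_eps * np.eye(m)
    L, low = cho_factor(C, lower=True)
    logdet = 2.0 * np.sum(np.log(np.diag(L)))     # log det from Cholesky diagonal
    quad = g @ cho_solve((L, low), g)             # g' C^{-1} g without explicit inverse
    return -0.5 * (m * np.log(2.0 * np.pi) + logdet + quad)
```

One standard route is then to estimate the hyperparameters $(v_f, v_{\epsilon})$ by maximizing this log-evidence.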
