Preface

*Inverse Problems - Recent Advances and Applications* examines new aspects of the mathematical modeling of inverse problems, their applications in physical systems, and the computational methods used. The book's five chapters are divided into two sections. Section 1, "Modeling and Formulations of Inverse Problems", discusses new approaches in the modeling and formulation of inverse problems in some physical systems. Chapter 1 formulates the initial statement of the inverse problem. Chapter 2 introduces recent applications of the Ensemble Kalman Filter to inverse problems, known as ensemble Kalman inversion. The subject of Chapter 3 is the evaluation of gradients in inverse problems where spatial field parameters and geometry parameters are treated separately.

Section 2, "Some Computational Aspects", concerns mathematical methods of solving some inverse problems. Chapter 4 reviews individual solutions for the tomographic problem, including strategies for removing deficiencies of the ill-posed problem by using truncated singular value decomposition and the L-curve technique. Chapter 5 discusses the least norm of the solution to some system of quaternion matrix equations and its expression by determinantal representations (analogs of Cramer's rule) using row-column noncommutative determinants.

> **Ivan I. Kyrchei** Pidstryhach Institute for Applied Problems of Mechanics and Mathematics, National Academy of Sciences of Ukraine, Lviv, Ukraine

Section 1

## Modeling and Formulations of Inverse Problems

#### **Chapter 1**

## Introductory Chapter: Some Preliminary Aspects of Inverse Problem

*Ivan I. Kyrchei*

#### **1. Introduction**

Physical research in science can be divided into two groups. In the first, a complete description of a physical system is used to predict the outcome of some measurements; this is called the modelization problem or the forward problem. The second group of research consists of using the actual result of some observations to infer the values of the parameters that characterize the system. This is the inverse problem, which starts with the effects (the observed data) and then calculates the causes (the model parameters). The importance of inverse problems is that they tell us about physical parameters that we cannot directly observe.

#### **2. Primary equations of inverse problem**

In the inverse problem, one wants to determine the model parameters *p* that produce the observed data or measurements *d*. Here **F** stands for a measurement operator that maps parameters in a functional space P, typically a Banach or Hilbert space, to the space of data D, typically another Banach or Hilbert space.

$$d = \mathbf{F}p \text{ for } p \in \mathfrak{P} \text{ and } d \in \mathfrak{D}. \tag{1}$$

Solving the inverse problem amounts to finding point(s) *p* ∈ P from knowledge of the data *d* ∈ D such that Eq. (1) (or its approximation) holds. In the case where the measurement operator is linear and there is a finite number of parameters, Eq. (1) can be written as a linear system, where **F** is the matrix that characterizes the measurement operator and *p* ∈ P and *d* ∈ D belong to the corresponding vector spaces. Such an inverse problem is called linear.

Inverse problems may be difficult to solve for at least two different reasons: the measurement operator may fail to be injective, so that different parameters produce the same data, and even when it is injective, the inversion may be unstable with respect to noise in the data.

If enough data are acquired to uniquely reconstruct the parameters, then the measurement operator is injective, which means

$$\mathbf{F}(p\_1) = \mathbf{F}(p\_2) \quad \Rightarrow \quad p\_1 = p\_2 \text{ for all } p\_1, p\_2 \in \mathfrak{P}. \tag{2}$$

When **F** is injective, one can construct an inversion operator **F**<sup>-1</sup> mapping the range of **F** to a uniquely defined element of P. In the case of a linear inverse problem, **F**<sup>-1</sup> is an inverse matrix. Further, the main features of the inverse operator are characterized by stability estimates, which quantify how errors in the available measurements translate into errors in the reconstructions. This can be expressed as follows:

$$\|p\_1 - p\_2\|\_{\mathfrak{P}} \le \alpha\left(\|\mathbf{F}(p\_1) - \mathbf{F}(p\_2)\|\_{\mathfrak{D}}\right).\tag{3}$$

Here *α* : ℝ<sup>+</sup> → ℝ<sup>+</sup> stands for an increasing function such that *α*(0) = 0. This function gives an estimate of the reconstruction error ∥*p*<sub>1</sub> − *p*<sub>2</sub>∥<sub>P</sub> based on the error in the data ∥**F**(*p*<sub>1</sub>) − **F**(*p*<sub>2</sub>)∥<sub>D</sub>. When the reconstruction error is acceptable, for instance when *α*(*x*) = *Cx* for some constant *C*, the inverse problem is called well-posed. When the reconstruction is contaminated by too large a noisy component, the inverse problem is ill-posed.

Injectivity of **F** means satisfying two of the conditions for a well-posed problem suggested by Jacques Hadamard [1], *Existence* and *Uniqueness* of solutions. Eq. (3) expresses Hadamard's third condition, *Stability* of the solution.

Typically, inverse problems are ill-posed. Even a linear inverse problem whose matrix **F** is not invertible is ill-posed; such a problem can be treated by using the Moore-Penrose inverse matrix [2, 3] and the least squares solutions induced by it.
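As a small illustration of this approach, the following sketch (the matrix and data are arbitrary choices for the example, not from the text) computes the least squares solution of an overdetermined linear system via the Moore-Penrose pseudoinverse:

```python
import numpy as np

# Hypothetical 3x2 measurement matrix F: more measurements than
# parameters, so no classical inverse exists.
F = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
p_true = np.array([2.0, -1.0])
d = F @ p_true  # noise-free data, for illustration only

# The Moore-Penrose pseudoinverse F^+ gives the least squares
# solution p = F^+ d, minimizing ||F p - d||_2.
p_hat = np.linalg.pinv(F) @ d
```

Because the data here are noise-free and **F** has full column rank, the pseudoinverse recovers the parameters exactly; with noisy data it returns the best fit in the least squares sense.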

The goal of many experiments is to infer a property or attribute from data that is indirectly related to the unknown quantity. Parameter estimation problems usually satisfy the first criterion of well-posed problems, since something is responsible for the observed system response. Instead, they violate the third criterion and "almost" violate the second criterion, because many different candidate solutions exist that, when substituted into the measurement model, produce very similar data. The condition of stability is often violated because the inverse problem is represented by a mapping between metric spaces, and these spaces are often infinite dimensional. The limitation to a finite number of measurements, and the practical consideration of recovering only a finite number of unknown parameters, may lead to the problem being recast in discrete form. In this case, the inverse problem is typically ill-conditioned and a regularization can be used. One of the most famous regularizations is the Tikhonov regularization [4]. The idea of Tikhonov regularization may be introduced as follows. In its simplest form, it consists in replacing Eq. (1) with the equation of the second kind

$$\mathbf{F}^\* \mathbf{F}p + \alpha p = \mathbf{F}^\* d\tag{4}$$

where *α* is a positive parameter. As a result, the problem of solving Eq. (4) is well-posed.
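A minimal numerical sketch of Eq. (4), with an illustrative nearly singular matrix and a fixed noise vector chosen for the example, shows the stabilizing effect of the parameter *α*:

```python
import numpy as np

# Nearly singular forward matrix (illustrative): its columns are
# almost linearly dependent, so direct inversion amplifies noise.
F = np.array([[1.0, 1.0],
              [1.0, 1.0001]])
p_true = np.array([1.0, 1.0])
noise = np.array([8e-5, -7e-5])  # fixed, hypothetical noise level
d = F @ p_true + noise

p_naive = np.linalg.solve(F, d)  # unregularized inversion

# Eq. (4): (F^T F + alpha I) p = F^T d, the Tikhonov-regularized system.
alpha = 1e-3
p_tik = np.linalg.solve(F.T @ F + alpha * np.eye(2), F.T @ d)
```

For this configuration the naive inversion magnifies the tiny data error by roughly the condition number of **F**, while the regularized solution stays close to the true parameters at the cost of a small bias controlled by *α*.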

Unlike parameter estimation, inverse design problems often violate Hadamard's first criterion, since an optimal design outcome may be specified that cannot possibly be produced by the system. On the other hand, the existence of multiple designs (solutions) that produce an acceptable outcome violates the second criterion. It follows that such inverse problems are mathematically ill-posed due to an information deficit. In the parameter estimation case, the measurements barely provide sufficient information to specify a unique solution, and in some cases the data could be explained by an infinite set of candidate solutions. Information from measurement data and prior information can be combined through Bayes' equation to produce estimates for the Quantities-of-Interest (QoI). In this approach, the measurements, *d*, and the QoI, *x* ∈ P, are interpreted as random variables that obey probability density functions (PDFs). The PDFs are related by Bayes' equation

$$P(x|d) = \frac{P(d|x)}{P(d)} P\_{pr}(x) \tag{5}$$

where *P*(*d*|*x*) is the likelihood of the observed data occurring for a hypothetical parameter *x*, accounting for measurement noise and model error (the "likelihood PDF"); *P*<sub>*pr*</sub>(*x*) defines what is known about a hypothetical parameter *x* before the measurement takes place (the "prior PDF"); *P*(*x*|*d*) is the posterior PDF, which defines what is known about *x* from both the measurements and prior information; and *P*(*d*) is the evidence, which scales the posterior so that it satisfies the law of total probability.
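Eq. (5) can be made concrete with a small numerical sketch. The scalar measurement model *d* = 2*x* + noise, the Gaussian prior, and all numbers below are purely illustrative assumptions:

```python
import numpy as np

# Grid over the hypothetical scalar parameter x.
x = np.linspace(-5.0, 5.0, 2001)
dx = x[1] - x[0]

prior = np.exp(-0.5 * x**2)          # P_pr(x): N(0, 1), unnormalized
d_obs, sigma = 3.0, 0.5              # one noisy measurement of d = 2x
likelihood = np.exp(-0.5 * ((d_obs - 2.0 * x) / sigma) ** 2)  # P(d|x)

# Bayes' equation (5): posterior = likelihood * prior / evidence.
unnorm = likelihood * prior
evidence = unnorm.sum() * dx         # P(d), the normalizing constant
posterior = unnorm / evidence

post_mean = (x * posterior).sum() * dx
```

For this conjugate Gaussian case the posterior is again Gaussian with mean (2·*d*/σ²)/(1 + 4/σ²) = 24/17 ≈ 1.412, which the grid computation reproduces; the evidence plays no role in locating the maximum but makes the posterior integrate to one.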

Therefore, "*the most general theory is obtained by using a probabilistic point of view, where the a priori information on the model parameters is represented by a probability distribution over the 'model space'. A priori probability distribution is transformed into the a posteriori probability distribution, by incorporating a physical theory (relating the model parameters to some observable parameters) and the actual result of the observations (with their uncertainties)*" [5].

#### **Author details**

Ivan I. Kyrchei Pidstrygach Institute for Applied Problems of Mechanics and Mathematics, NASU, Lviv, Ukraine

\*Address all correspondence to: ivankyrchei26@gmail.com

© 2023 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

### **References**

[1] Hadamard J. Lectures on Cauchy's Problems in Linear Partial Differential Equations. New Haven: Yale University Press; 1923 (Reprinted by Dover, New York, 1952)

[2] Moore EH. On the reciprocal of the general algebraic matrix. Bulletin of the American Mathematical Society. 1920; **26**(9):394-395. DOI: 10.1090/ S0002-9904-1920-03322-7

[3] Penrose R. On best approximate solution of linear matrix equations. Proceedings of the Cambridge Philosophical Society. 1956;**52**(1):17-19. DOI: 10.1017/S0305004100030929

[4] Tikhonov AN, Arsenin VY. Solution of Ill-Posed Problems. Washington: Winston & Sons; 1977. ISBN 0-470- 99124-0

[5] Tarantola A. Inverse Problem Theory and Methods for Model Parameter Estimation. Philadelphia: SIAM; 2005. ISBN 0-89871-572-5

#### **Chapter 2**

### A Review of the EnKF for Parameter Estimation

*Neil K. Chada*

#### **Abstract**

The ensemble Kalman filter (EnKF) is a well-known and celebrated data assimilation algorithm. It is of particular relevance as it is used for high-dimensional problems, where it updates an ensemble of particles through sample mean and covariance matrices. In this chapter we present a relatively recent topic, the application of the EnKF to inverse problems, known as ensemble Kalman inversion (EKI). EKI is used for parameter estimation and can be viewed as a black-box optimizer for PDE-constrained inverse problems. We present a review of this methodology while discussing emerging and new areas of research, and we provide numerical experiments on a number of interesting models arising in the geosciences and numerical weather prediction.

**Keywords:** ensemble Kalman filter, Kalman filter, inverse problems, parameter estimation, data assimilation, optimization

#### **1. Introduction**

Inverse problems [1–3] are a class of mathematical problems which have recently gained significant attention. Simply put, inverse problems are concerned with the recovery of some parameter of interest from noisy, unstructured data. Mathematically we can express an inverse problem as the recovery of *u* ∈ X from noisy measurements of data *y* ∈ Y, expressed as

$$y = \mathcal{G}(u) + \eta,\tag{1}$$

where G : X → Y is the forward operator and *η* ∼ N(0, Γ) is some form of additive Gaussian noise; specifically, N(0, Γ) denotes a normal distribution with mean 0 and covariance Γ. Commonly the covariance is taken to be some form of the identity, i.e. Γ = *γ*<sup>2</sup>*I*, where *γ* ∈ ℝ is some constant and *I* is the identity. Inverse problems are of high interest due to the number of relevant problems that arise in a wide variety of applications, most notably the geophysical sciences, medical imaging and numerical weather prediction [4–6]. The classical approach to solving inverse problems, which is the theme of this chapter, is to construct a least-squares functional, whose minimizer represents the solution:

$$u^\* \coloneqq \arg\min\_{u \in \mathcal{X}} \frac{1}{2} \|y - \mathcal{G}(u)\|\_{\Gamma}^2 + \lambda R(u),\tag{2}$$

where *λ* > 0 is a regularization parameter and *R*(*u*) is some regularization term, usually required to prevent overfitting of the data. A common example is Tikhonov regularization, i.e. *R*(*u*) = ½∥*u*∥<sup>2</sup>. Traditional methods for solving (2) include optimization schemes such as the Gauss–Newton method or the Levenberg–Marquardt method, which require derivative information of G; this can prove costly and cumbersome. Therefore a motivation for solving inverse problems is to provide gradient-free optimizers which reduce this computational burden while attaining a good level of accuracy. The methodology that we motivate, which alleviates these issues, is that of ensemble Kalman inversion (EKI). EKI can be viewed as the application of the ensemble Kalman filter (EnKF) to inverse problems, which is a natural way to solve them given the connections between data assimilation and inverse problems. The EnKF is a Monte Carlo version of the celebrated Kalman filter which is more favorable in high dimensions. It operates by updating an ensemble of particles through sample means and covariances. In particular we will take the viewpoint of EKI as a PDE-constrained derivative-free optimizer. Therefore EKI can be viewed as a black-box solver where no derivative information is required. Since this method was proposed for inverse problems, it has seen wide use in various engineering-based applications, as well as developments related to both theory and methodology. In this chapter we discuss some of these key concepts and insights, while briefly mentioning particular directions for EKI.

The general outline of this chapter is as follows. In Section 2 we provide the necessary background material, which covers the basics of EKI with some intuition and motivation. We will discuss the algorithm in both the usual discrete-time setting and the continuous-time setting. This leads onto Section 3, where we discuss one recent direction, regularization theory, and its application to EKI. Furthermore, we will discuss how EKI can be extended to the notion of sampling in statistics in Section 4. Other, less-developed, directions are provided in Section 5. Numerical experiments in basic settings are provided in Section 6 on a number of simple differential equations, before we give some future remarks and a conclusion in Section 7.

#### **2. EKI: background material**

In this section we provide the background material related to the understanding and intuition of EKI. This begins with a discussion of the ensemble Kalman filter and how it connects with EKI. We then present EKI in its vanilla form, a discrete-time optimizer, before discussing its connections with various existing methods. Finally, we extend the original formulation to the continuous-time setting, where we aim to provide a gradient flow structure for the resulting equations.

#### **2.1 Kalman filtering**

The ensemble Kalman filter (EnKF) is a popular methodology based on the celebrated Kalman filter (KF), which was originally developed by Rudolph Kalman in the 1960s [7, 8]. The Kalman filter's initial aim was to solve a recursive estimation problem for dynamical processes and systems. Specifically, the KF aims to merge data with model, or signal, dynamics, where the two equations have the form

$$
u\_{n+1} = \Psi(u\_n) + \xi\_n,\qquad \qquad \{\xi\_n\}\_{n \in \mathbb{Z}^+} \sim \mathcal{N}(\mathbf{0}, \Sigma),\tag{3}
$$

$$y\_{n+1} = H(u\_{n+1}) + \eta\_{n+1}, \qquad \left\{ \eta\_{n+1} \right\}\_{n \in \mathbb{Z}^+} \sim \mathcal{N}(\mathbf{0}, \Gamma). \tag{4}$$

Here {*u*<sub>*n*</sub>}<sub>*n* ∈ ℤ<sup>+</sup></sub> is our signal, which is updated through a forward operator Ψ : ℝ<sup>*m*</sup> → ℝ<sup>*m*</sup>; combined with noise, this provides the update *u*<sub>*n*+1</sub>. Our data *y*<sub>*n*+1</sub> is produced by sending the updated signal through the observational operator *H* : ℝ<sup>*m*</sup> → ℝ<sup>*m*′</sup>, typically with *m*′ ≤ *m*. The initial condition for the system is *u*<sub>0</sub> ∼ N(*m*<sub>0</sub>, *C*<sub>0</sub>). This area of recursive estimation, in this setup, came to be known as data assimilation [9, 10].

In the linear Gaussian setting, where the dynamics and noise are Gaussian, the KF updates the state using the first two moments, the mean and covariance. If the state-space dimension is *d*, then the KF has complexity O(*d*<sup>2</sup>). For high-dimensional examples this can be an issue; an algorithm developed to alleviate it is the EnKF, a Monte Carlo version proposed by Evensen [11, 12].

The EnKF operates by replacing the true covariance with a sample covariance and mean, and updates an ensemble of particles *u*<sub>*n*</sub><sup>(*j*)</sup>, with 1 ≤ *j* ≤ *J*, using these moments combined with information from the data. The EnKF can be split into a two-step procedure: the prediction step

$$\begin{aligned} \hat{u}\_{n+1}^{(j)} &= \Psi \left( u\_n^{(j)} \right) + \xi\_n^{(j)}, \quad \hat{m}\_{n+1} = \frac{1}{J} \sum\_{j=1}^{J} \hat{u}\_{n+1}^{(j)}, \\ \hat{C}\_{n+1} &= \frac{1}{J-1} \sum\_{j=1}^{J} \left( \hat{u}\_{n+1}^{(j)} - \hat{m}\_{n+1} \right) \left( \hat{u}\_{n+1}^{(j)} - \hat{m}\_{n+1} \right)^{T}, \end{aligned} \tag{5}$$

and update step

$$\begin{aligned} K\_{n+1} &= \hat{C}\_{n+1} H^T \left( H \hat{C}\_{n+1} H^T + \Gamma \right)^{-1}, \\ u\_{n+1}^{(j)} &= \left( I - K\_{n+1} H \right) \hat{u}\_{n+1}^{(j)} + K\_{n+1} y\_{n+1}^{(j)}, \\ y\_{n+1}^{(j)} &= y\_{n+1} + \eta\_{n+1}^{(j)}, \end{aligned} \tag{6}$$

where *K*<sub>*n*+1</sub> is the Kalman gain matrix and *ξ*<sub>*n*</sub><sup>(*j*)</sup> and *η*<sub>*n*+1</sub><sup>(*j*)</sup> are i.i.d. Gaussian noise. In the EnKF context, the prediction step defines a sample mean and covariance from the signal. In the analysis step we then define the Kalman gain through the sample covariance, which updates the signal to give *u*<sub>*n*+1</sub><sup>(*j*)</sup>. This is aided by aiming to minimize the discrepancy between the data *y*<sub>*n*+1</sub><sup>(*j*)</sup> and the quantity *H*(*u*). To better understand this discrepancy, an alternative way of looking at the EnKF is through a variational approach, where we consider the following cost function

$$I\_n(u) \coloneqq \frac{1}{2} \left\| y\_{n+1}^{(j)} - H(u) \right\|\_{\Gamma}^2 + \frac{1}{2} \left\| u - \hat{u}\_{n+1}^{(j)} \right\|\_{\hat{C}\_{n+1}}^2,\tag{7}$$

which we aim to minimize; the minimizer defines the updated mean

$$
\hat{m}\_{n+1} = \arg\min\_{u} I\_n(u).\tag{8}
$$

This minimization procedure relies on the updated covariance *Ĉ*<sub>*n*+1</sub>, which depends entirely on the predicted particles *û*<sub>*n*+1</sub><sup>(*j*)</sup>. As described for the prediction and update steps of filtering, a mapping is presented between distributions. Just as the distributions are related step by step in the filtering setting, we can do the same for the EnKF, i.e.

$$\left\{ u\_n^{(j)} \right\}\_{j=1}^J \mapsto \left\{ \hat{u}\_{n+1}^{(j)} \right\}\_{j=1}^J, \qquad \left\{ \hat{u}\_{n+1}^{(j)} \right\}\_{j=1}^J \mapsto \left\{ u\_{n+1}^{(j)} \right\}\_{j=1}^J. \tag{9}$$

With the EnKF, compared to the KF, the associated computational complexity is O(*Jd*), where one usually assumes *J* < *d*, implying the reduction in cost.
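The two-step procedure of Eqs. (5) and (6) can be sketched in a few lines. The linear dynamics, observation operator, noise levels, and dimensions below are all hypothetical choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative linear setup: state dimension m = 2, J particles.
m, J, steps = 2, 50, 40
Psi = np.array([[0.95, 0.05], [0.0, 0.95]])   # signal dynamics
H = np.eye(m)                                  # observe the full state
Sigma = 0.01 * np.eye(m)                       # signal noise covariance
Gamma = 0.1 * np.eye(m)                        # observation noise covariance

u_truth = np.array([1.0, -1.0])                # hidden true signal
ensemble = rng.standard_normal((J, m))         # initial ensemble ~ N(0, I)

for _ in range(steps):
    u_truth = Psi @ u_truth + rng.multivariate_normal(np.zeros(m), Sigma)
    y = H @ u_truth + rng.multivariate_normal(np.zeros(m), Gamma)

    # Prediction step, Eq. (5): push particles through the dynamics
    # and form the sample mean and covariance.
    pred = ensemble @ Psi.T + rng.multivariate_normal(np.zeros(m), Sigma, size=J)
    m_hat = pred.mean(axis=0)
    C_hat = (pred - m_hat).T @ (pred - m_hat) / (J - 1)

    # Update step, Eq. (6): Kalman gain from the sample covariance,
    # applied with perturbed observations y^{(j)} = y + eta^{(j)}.
    K = C_hat @ H.T @ np.linalg.inv(H @ C_hat @ H.T + Gamma)
    y_pert = y + rng.multivariate_normal(np.zeros(m), Gamma, size=J)
    ensemble = pred @ (np.eye(m) - K @ H).T + y_pert @ K.T

est = ensemble.mean(axis=0)
```

After a few assimilation cycles the ensemble mean tracks the hidden signal to within roughly the observation noise level; note that only matrix products of size O(*Jm*) appear, in line with the cost discussion above.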

#### **2.2 EnKF applied to inverse problems**

Since the formulation of the EnKF, there has been huge interest from practitioners in various disciplines, most notably within numerical weather prediction, the geophysical sciences and signal processing related to state estimation. In this chapter our focus is on the application of the EnKF to inverse problems, namely to solve (1). We now introduce this application, known as ensemble Kalman inversion (EKI), which was introduced by Iglesias et al. [13], motivated by the work of Li et al., as a derivative-free optimizer for PDE-constrained inverse problems.

As with the EnKF, we are concerned with updating an ensemble of particles, where *n* now denotes the iteration count. Given an initial ensemble {*u*<sub>0</sub><sup>(*j*)</sup>}<sub>*j*=1</sub><sup>*J*</sup>, our aim is to learn a true underlying unknown *u*<sup>†</sup>. To do so, as with the EnKF, we first define our sample means and covariance matrices

$$\overline{u}\_{n} = \frac{1}{J} \sum\_{j=1}^{J} u\_{n}^{(j)}, \quad \overline{\mathcal{G}}\_{n} = \frac{1}{J} \sum\_{j=1}^{J} \mathcal{G}\left(u\_{n}^{(j)}\right),$$

$$C\_{n}^{pp} = \frac{1}{J-1} \sum\_{j=1}^{J} \left(\mathcal{G}\left(u\_{n}^{(j)}\right) - \overline{\mathcal{G}}\_{n}\right) \left(\mathcal{G}\left(u\_{n}^{(j)}\right) - \overline{\mathcal{G}}\_{n}\right)^{T}, \quad C\_{n}^{up} = \frac{1}{J-1} \sum\_{j=1}^{J} \left(u\_{n}^{(j)} - \overline{u}\_{n}\right) \left(\mathcal{G}\left(u\_{n}^{(j)}\right) - \overline{\mathcal{G}}\_{n}\right)^{T}, \tag{10}$$

which we then use in the update equations

$$u\_{n+1}^{(j)} = u\_n^{(j)} + hC\_n^{up} \left(hC\_n^{pp} + \Gamma\right)^{-1} \left( y\_n^{(j)} - \mathcal{G} \left( u\_n^{(j)} \right) \right), \tag{11}$$

$$
y\_n^{(j)} = y + \eta\_n^{(j)},\tag{12}
$$

where *y* represents our true data and *h*> 0 denotes a step size related to the level of discretization. **Figure 1** provides a pictorial description of the EnKF, which has been described above.

The update equation of EKI (11) is of interest as it coincides with the update formula of Tikhonov regularization for linear statistical inverse problems. Namely, if we consider *R*(*u*) = ½∥*u*∥<sup>2</sup><sub>*C*<sub>0</sub></sub>, then in the linear setting G(·) = *G*· with Gaussian noise the update formula is given as

**Figure 1.** *Dynamics of the ensemble Kalman filter, split into the prediction and update steps.*

$$
u\_{TP} = \overline{u} + CG^\* \left( GCG^\* + \Gamma \right)^{-1} (y - G\overline{u}), \tag{13}
$$

where *G*<sup>∗</sup> denotes the adjoint of the operator *G*. This connection is of relevance and was discussed in [14], where it was shown that in the limit *J* → ∞ one has *u* → *u*<sub>*TP*</sub>. This is of interest because minimizing the corresponding regularized functional is equivalent to the following maximization procedure in statistics

$$u\_{MAP} := \arg\max\_{u \in \mathcal{X}} \mathbb{P}(u|y), \tag{14}$$

known as the MAP formulation, where ℙ(*u*|*y*) ∝ ℙ(*y*|*u*) ℙ(*u*) denotes the posterior distribution. This connection is discussed in [15]. It provides some insight into EKI and its connection with other known methodologies in inverse problems. An important property that EKI inherits is the *subspace property*, given by the following lemma.

Lemma 1.1. *Let* A *be the linear span of the initial ensemble* {*u*<sub>0</sub><sup>(*j*)</sup>}<sub>*j*=1</sub><sup>*J*</sup>. *Then u*<sub>*n*</sub><sup>(*j*)</sup> ∈ A *for all n* ∈ ℕ *and* 1 ≤ *j* ≤ *J*.

The essence of the subspace property is that the updated ensemble of particles remains within the span of the initial ensemble. This is important because it ties the performance of the method to whether the initial ensemble is a good choice; it can therefore act as an advantage or a disadvantage.
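The subspace property can be checked numerically. In the sketch below (a hypothetical linear forward map with only J = 3 particles in d = 10 dimensions, step size h = 1, no observation perturbations), every iterate remains, up to floating-point error, in the span of the initial ensemble:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical linear setup: d = 10 unknowns, only J = 3 particles.
d, J = 10, 3
A = rng.standard_normal((d, d))
y = rng.standard_normal(d)
Gamma = 0.01 * np.eye(d)

U0 = rng.standard_normal((J, d))               # initial ensemble
Q, _ = np.linalg.qr(U0.T)                      # orthonormal basis of span{u_0^{(j)}}

U = U0.copy()
for _ in range(20):
    GU = U @ A.T
    u_bar, G_bar = U.mean(axis=0), GU.mean(axis=0)
    C_up = (U - u_bar).T @ (GU - G_bar) / (J - 1)
    C_pp = (GU - G_bar).T @ (GU - G_bar) / (J - 1)
    S = np.linalg.inv(C_pp + Gamma)
    U = U + (y - GU) @ S.T @ C_up.T            # EKI step, h = 1

# Distance of each particle from the span of the initial ensemble.
residual = U - (U @ Q) @ Q.T
max_leak = np.abs(residual).max()
```

Every increment is a linear combination of the vectors *u*<sub>*n*</sub><sup>(*k*)</sup> − *ū*<sub>*n*</sub>, which is why `max_leak` stays at round-off level even though the ensemble spans only a 3-dimensional subspace of a 10-dimensional parameter space.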

#### **2.3 Continuous-time formulation**

The original representation of EKI, as shown in (11), is a discrete-time iterative scheme similar to other optimization methods. However, it is of interest to understand EKI in a continuous-time setting, which was considered by Schillings et al. [16, 17]. This is primarily for two reasons: (i) one can more easily understand how the dynamics of (11) and (12) behave, and (ii) it provides new numerical schemes for EKI specific to the continuous-time setting. To derive such equations we, as usual, take the step size to zero, i.e. *h* → 0. Once we do this, we have the following set of stochastic differential equations

$$\frac{du^{(j)}}{dt} = C^{up}(u)\Gamma^{-1}\left(y - \mathcal{G}\left(u^{(j)}\right)\right) + C^{up}(u)\sqrt{\Gamma^{-1}}\frac{dW^{(j)}}{dt},\tag{15}$$

with *W*ð Þ*<sup>j</sup>* denoting independent cylindrical Brownian motions. By substituting the form of the covariance operator, we see

$$\frac{du^{(j)}}{dt} = \frac{1}{J} \sum\_{k=1}^{J} \left\langle \mathcal{G}\left(u^{(k)}\right) - \overline{\mathcal{G}}, y - \mathcal{G}\left(u^{(j)}\right) + \sqrt{\Gamma} \frac{dW^{(j)}}{dt} \right\rangle\_{\Gamma} \left(u^{(k)} - \overline{u}\right). \tag{16}$$

For this we take our forward operator to be bounded and linear, G(·) = *A*·. Substituting the linear operator *A* into (16), we have the following diffusion limit

$$\frac{du^{(j)}}{dt} = \frac{1}{J} \sum\_{k=1}^{J} \left< A \left( u^{(k)} - \overline{u} \right), y - Au^{(j)} \right>\_{\Gamma} \left( u^{(k)} - \overline{u} \right). \tag{17}$$

By defining the empirical covariance operator

$$C(u) = \frac{1}{J-1} \sum\_{k=1}^{J} \left( u^{(k)} - \overline{u} \right) \otimes \left( u^{(k)} - \overline{u} \right),\tag{18}$$

and neglecting the noise term (setting *dW*<sup>(*j*)</sup>/*dt* = 0), we can express (17) as

$$\begin{split} \frac{du^{(j)}}{dt} &= -C(u)D\_u\Phi\Big(u^{(j)}; \mathcal{Y}\Big), \\ \Phi(u;\mathcal{Y}) &= \frac{1}{2} \left\|\Gamma^{-1/2}(\mathcal{Y} - Au)\right\|^2. \end{split} \tag{19}$$

Thus we note that each particle performs a preconditioned gradient descent for Φ(·; *y*), where all the gradient descents are preconditioned through the covariance *C*(*u*). Since the covariance operator *C*(*u*) is positive semi-definite, we have that

$$\frac{d}{dt}\Phi(u(t);y) = \frac{d}{dt}\frac{1}{2}\left\|\Gamma^{-1/2}(y-Au)\right\|^2 \le 0. \tag{20}$$

In the context of EKI this is of interest as it is a first result providing some indication of the dynamics, which was not achievable through the discrete-time update formula (11). Indeed, given the gradient flow structure, we are able to see that EKI behaves like a usual optimization method, with the dynamics following the direction of the negative gradient, in other words moving towards the minimizer of Φ. Since the continuous-time formulation was derived, various works have developed further analysis, most notably with recent success in the nonlinear setting, and other well-known results. This can be found in [18].
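The monotone decay (20) can be observed numerically. The following sketch integrates the noise-free particle dynamics (17)/(19) with forward Euler for a hypothetical linear map *A*; all dimensions and step sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical linear model: G(u) = A u, with J particles in d dimensions.
d, J = 5, 10
A = rng.standard_normal((d, d))
y = rng.standard_normal(d)
Gamma_inv = np.eye(d)                      # Gamma = I for simplicity

def Phi(u):
    # Least-squares potential of Eq. (19).
    r = y - A @ u
    return 0.5 * r @ Gamma_inv @ r

U = rng.standard_normal((J, d))            # initial ensemble
dt = 1e-3
phis = []
for _ in range(2000):
    u_bar = U.mean(axis=0)
    # Empirical covariance C(u) of Eq. (18).
    C = (U - u_bar).T @ (U - u_bar) / (J - 1)
    # Preconditioned gradient descent, Eq. (19): du/dt = -C(u) grad Phi(u).
    grad = (U @ A.T - y) @ Gamma_inv @ A   # row j holds grad Phi(u^{(j)})
    U = U - dt * grad @ C.T
    phis.append(Phi(U.mean(axis=0)))
```

With a stable step size the recorded values of Φ along the ensemble mean decrease over time, mirroring inequality (20); as the ensemble collapses, *C*(*u*) shrinks and the descent slows down.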

*A Review of the EnKF for Parameter Estimation DOI: http://dx.doi.org/10.5772/intechopen.108218*

**Figure 2.** *The figure presents two simulations of EKI as the iterations increase. The black curve represents what we aim to achieve; however, in certain situations the data is commonly overfitted. This can cause a divergence in the relative error, as shown by the dashed red curve.*

#### **3. Regularization**

In this section we discuss the role of regularization in EKI. We begin with an introduction to the iterative regularization schemes that have been used, before discussing Tikhonov regularization, *L<sub>p</sub>* regularization and particular adaptive choices.

As briefly discussed, regularization is an important tool in optimization and inverse problems, aimed at preventing the over-fitting of, or undue influence of, the data. We refer the reader to various pieces of literature that give a concise overview of this [19, 20]. The over-fitting of data can cause issues in inverse problems, such as divergence of the error, so careful consideration is needed to prevent it. A cartoon representation of this is given in **Figure 2**.

To initiate this section: there are two main forms of regularization one can apply to inverse problems. The first is *iterative regularization*, where the regularization is included within the iterative scheme. This can be included directly, as in

$$u\_{n+1}^{(j)} = u\_n^{(j)} + hC\_n^{up} \left( hC\_n^{pp} + \alpha\_n \Gamma \right)^{-1} \left( y\_n^{(j)} - \mathcal{G} \left( u\_n^{(j)} \right) \right), \tag{21}$$

or in the presence of a discrepancy principle of the form

$$\left\|\Gamma^{-1/2}\left(y - \mathcal{G}\left(\overline{u}\_n\right)\right)\right\|^2 \le \theta\eta, \qquad \theta \in (0, 1), \tag{22}$$

which controls the error between the updated ensemble and the data. The discrepancy principle acts as a stopping rule if the error becomes too big, and the modified update formula contains a sequence of numbers {*α*<sub>*n*</sub>}<sub>*n* ∈ ℕ</sub> also aimed at preventing the overfitting of the data. This sequence is chosen in a way related to a discrepancy principle. For EKI this has been considered in numerous works by Iglesias et al. [21, 22].

However, more recent work has considered regularization through the least-squares functional (LSF) (2). For EKI the first known form considered is Tikhonov regularization, which has the penalty form *R*(*u*) = ½∥*u*∥<sup>2</sup><sub>*C*<sub>0</sub></sub>. This form of regularization is a natural choice, as it is very well known and understood, and can be viewed as a Gaussian form of regularization which smooths the problem. In the context of EKI this makes sense, as one commonly assumes Gaussian dynamics. The work of Chada et al. [23] first developed this extension, which was done by modifying (1) to the following

$$\begin{aligned} \mathcal{Y} &= \mathcal{G}(\mathfrak{u}) + \eta\_1, \\ \mathfrak{u} &= \eta\_2, \end{aligned} \tag{23}$$

where *η*<sub>1</sub> ∼ N(0, Γ), *η*<sub>2</sub> ∼ N(0, *λ*<sup>−1</sup>*C*<sub>0</sub>). Now we introduce *z*, *η* and the mapping F : X → Y × X as follows:

$$\mathbf{z} = \begin{bmatrix} \mathcal{Y} \\ \mathbf{0} \end{bmatrix}, \quad F(u) = \begin{bmatrix} G(u) \\ u \end{bmatrix}, \quad \eta = \begin{bmatrix} \eta\_1 \\ \eta\_2 \end{bmatrix}, \tag{24}$$

and

$$\eta \sim \mathcal{N}(\mathbf{0}, \Sigma), \quad \Sigma = \begin{bmatrix} \Gamma & \mathbf{0} \\ \mathbf{0} & \lambda^{-1} \mathbf{C}\_0 \end{bmatrix}. \tag{25}$$

Therefore our inverse problem is now reformulated as

$$z = \mathcal{F}(u) + \eta. \tag{26}$$

From this we can modify EKI to include the above setup, which we refer to as Tikhonov ensemble Kalman inversion (TEKI); it takes the following form

$$u\_{n+1}^{(j)} = u\_n^{(j)} + hB^{up} \left(hB^{pp} + \Sigma\right)^{-1} \left( z\_n^{(j)} - \mathcal{F} \left( u\_n^{(j)} \right) \right), \tag{27}$$

where *B*<sup>*up*</sup>, *B*<sup>*pp*</sup> are the correspondingly modified covariance matrices. With this inclusion, the authors of [23] were able to show analytically that the subspace property still holds, along with other results such as observability and controllability and the ensemble collapse. More importantly, the numerical simulations showed that one can prevent the over-fitting phenomenon.
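A sketch of the TEKI iteration (27) on a hypothetical underdetermined linear problem; the collapsed ensemble mean can be compared against the directly computed Tikhonov solution (all numbers are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical underdetermined linear problem: 2 observations, 3 unknowns.
d_u = 3
A = rng.standard_normal((2, d_u))
u_dagger = np.array([1.0, -1.0, 0.5])
Gamma = 1e-2 * np.eye(2)
y = A @ u_dagger + rng.multivariate_normal(np.zeros(2), Gamma)

lam = 1.0
C0 = np.eye(d_u)

# Augmented system (23)-(26): z = (y, 0), F(u) = (A u, u), noise cov Sigma.
z = np.concatenate([y, np.zeros(d_u)])
Sigma = np.block([[Gamma, np.zeros((2, d_u))],
                  [np.zeros((d_u, 2)), C0 / lam]])

def F(u):
    return np.concatenate([A @ u, u])

J, h = 50, 0.5
U = rng.standard_normal((J, d_u))
for _ in range(40):
    FU = np.array([F(u) for u in U])
    u_bar, F_bar = U.mean(axis=0), FU.mean(axis=0)
    B_up = (U - u_bar).T @ (FU - F_bar) / (J - 1)    # cross-covariance
    B_pp = (FU - F_bar).T @ (FU - F_bar) / (J - 1)   # output covariance
    Z = z + rng.multivariate_normal(np.zeros(2 + d_u), Sigma, size=J)
    S = np.linalg.inv(h * B_pp + Sigma)
    U = U + h * (Z - FU) @ S.T @ B_up.T              # TEKI update (27)

u_teki = U.mean(axis=0)
# Direct Tikhonov solution for comparison.
Gi = np.linalg.inv(Gamma)
u_tik = np.linalg.solve(A.T @ Gi @ A + lam * np.linalg.inv(C0), A.T @ Gi @ y)
```

Running plain EKI on this problem would leave the component of *u* unconstrained by the two observations free to overfit; the augmented variable *u* = *η*<sub>2</sub> supplies the prior pull that keeps the ensemble mean near the Tikhonov solution.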

Since this work, a number of useful extensions have been considered, such as its understanding in the continuous-time case, as well as new variants in the discrete-time setting [24]. Two recent developments have been, firstly, the extension to *L<sub>p</sub>* regularization [25, 26], motivated by reconstructing edges or lines, where the LSF is modified to

$$\Phi(u;\boldsymbol{y}) \coloneqq \frac{1}{2} \|\boldsymbol{y} - \mathcal{G}(\boldsymbol{u})\|\_{\Gamma}^{2} + \lambda \|\boldsymbol{u}\|\_{p}, \quad p \ge 1. \tag{28}$$

Finally, another direction is related to producing adaptive strategies for TEKI. Adaptive regularization schemes are important, as a correct choice of the regularization parameter $\lambda > 0$ can have a large impact on the reconstruction. Thinking adaptively allows one to evolve the parameter over the iteration count, now denoted $\lambda_n$. The work of Weissmann et al. [27] provides these developments in an adaptive fashion.

#### **4. Ensemble Kalman sampling**

*A Review of the EnKF for Parameter Estimation DOI: http://dx.doi.org/10.5772/intechopen.108218*

Although EKI has been introduced through the application of the EnKF to inverse problems, and is hence a sequential sampling method, the trending viewpoint of EKI lies in optimization. So far, we have seen its motivation from the gradient-flow structure of the continuous-time formulation in Section 2.3 and its representation as an SDE. To apply EKI as a consistent sampling method, we would, instead of taking the limit $t \to \infty$, rather consider the limit $t \to 1$. For linear forward models EKI is consistent with the posterior distribution; however, it is known to be inconsistent with the Bayesian perspective in the nonlinear setting [28].

Building up on this fact, the motivation behind the ensemble Kalman sampler [29] is to modify the time-dynamical system of EKI in a way such that the limiting distribution for *t* ! ∞ corresponds to the posterior distribution. We will start the discussion with an introductory example.

**Example 1.1** Let $\pi_*$ be a pdf of the form $\pi_*(u) \propto \exp(-\Phi(u))$ with $\Phi(u) = \frac{1}{2}\|y - \mathcal{G}(u)\|_{\Gamma}^2 + \|u\|_C^2$, i.e. $\pi_*$ corresponds to the posterior pdf under the Gaussian prior assumption $\pi_0 = \mathcal{N}(0, C)$. We consider the Langevin diffusion given by

$$du\_t = \nabla\_u \log \pi\_\*\left(u\_t\right) dt + \sqrt{2} dW\_t, \quad u\_0 \sim \pi\_0,\tag{29}$$

where $(W_t)_{t \ge 0}$ denotes a Brownian motion in $\mathcal{X} = \mathbb{R}^{n_u}$. The evolution of the distribution $\rho_t$ of the state $u_t$ can then be described through the Fokker–Planck equation

$$\partial\_t \rho\_t = -\nabla \cdot (\rho\_t \nabla \log \pi\_\*) + \Delta \rho\_t, \quad \rho\_0 = \pi\_0,\tag{30}$$

where, under certain assumptions on $\Phi$, the underlying Markov process $(u_t)_{t \ge 0}$ is ergodic and its unique invariant distribution is given by $\pi_*$ [30]. Taking the Fokker–Planck eq. (30) into account, the convergence to equilibrium can be described through the Kullback–Leibler (KL) divergence $\mathrm{KL}(q_1 | q_2) = \int_{\mathcal{X}} q_1(x) \log\left(\frac{q_1(x)}{q_2(x)}\right) dx$ [31]. Assuming a log-Sobolev inequality (e.g. satisfied for log-concave $\pi_*$), it follows that

$$\text{KL}(\rho\_t|\pi\_\*) \le \exp\left(-\lambda t\right) \text{KL}(\rho\_0|\pi\_\*)\tag{31}$$

for some *λ*>0 [32].
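As a minimal numerical illustration of the Langevin diffusion (29), assume a one-dimensional Gaussian target so that $\nabla_u \log \pi_*$ is available in closed form; an Euler–Maruyama discretization might then look as follows (the step size and path count are arbitrary choices for the sketch):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy target: pi_* = N(mu, sigma2) in 1D, so grad log pi_*(u) = -(u - mu)/sigma2
mu, sigma2 = 2.0, 0.5

def grad_log_pi(u):
    return -(u - mu) / sigma2

# Euler-Maruyama discretization of the Langevin SDE (29), many independent paths
dt, n_steps, n_paths = 1e-2, 5000, 2000
u = rng.standard_normal(n_paths)          # u_0 ~ pi_0 = N(0, 1)
for _ in range(n_steps):
    u += grad_log_pi(u) * dt + np.sqrt(2 * dt) * rng.standard_normal(n_paths)
```

After a long integration time, the empirical mean and variance of the paths approach those of the target, consistent with the ergodicity discussed above (up to a small discretization bias in the step size).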

#### **4.1 Interacting Langevin sampler**

The interacting Langevin sampler has been introduced, motivated by the preconditioned gradient descent method, as an interacting particle system represented by the coupled system of SDEs

$$du\_t^{(j)} = \mathcal{C}(u\_t)\nabla\_u \log \pi\_\*\left(u\_t^{(j)}\right)dt + \sqrt{2\mathcal{C}(u\_t)}\,dW\_t^{(j)}, \quad j = 1, \ldots, J,\tag{32}$$

initialized through an i.i.d. sample $u_0^{(j)} \sim \pi_0$. The idea of preconditioning with $\mathcal{C}(u_t)$ instead of a fixed preconditioning matrix $C \in \mathbb{R}^{n_u \times n_u}$ is motivated through the corresponding mean-field limit. In the large-particle limit, the corresponding SDE is given as

$$du\_t = \mathcal{C}(\rho\_t) \nabla\_u \log \pi\_\* \left( u\_t \right) dt + \sqrt{2 \mathcal{C}(\rho\_t)} dW\_t, \quad u\_0 \sim \pi\_0,\tag{33}$$

where the macroscopic mean and covariance operator are defined as

$$m(\rho) = \int\_{\mathcal{X}} x\, \rho(x)\, dx, \quad \mathcal{C}(\rho) = \int\_{\mathcal{X}} (x - m(\rho)) \otimes (x - m(\rho))\, \rho(x)\, dx. \tag{34}$$

This connects the interacting Langevin system to its origin, the Langevin diffusion (29). Hence, in the long-time limit the preconditioning matrix will formally be given by the covariance operator corresponding to the stationary distribution (assuming it exists).

The resulting modified Fokker–Planck equation is given by

$$\partial\_t \rho\_t = -\nabla \cdot (\rho\_t \mathcal{C}(\rho\_t) \nabla \log \pi\_\*) + \mathrm{Tr}\left(\mathcal{C}(\rho\_t) D^2 \rho\_t\right), \quad \rho\_0 = \pi\_0. \tag{35}$$

Assuming that $\mathcal{C}(\rho_t) \ge \alpha \mathrm{Id}$ and the target distribution of the form $\pi_*(u) \propto \exp(-\Phi(u))$, $\Phi(u) = \frac{1}{2}\|y - \mathcal{G}(u)\|_{\Gamma}^2 + \lambda\|u\|_{C_0}^2$, to be log-concave, the solution $\rho_t$ of (35) converges exponentially fast to equilibrium

$$\text{KL}(\rho\_t|\pi\_\*) \le \exp\left(-\lambda t\right) \text{KL}(\rho\_0|\pi\_\*),\tag{36}$$

for some $\lambda > 0$ ([29], Proposition 3.1). Furthermore, through the preconditioning with the sample covariance, the resulting scheme remains invariant under affine transformations [33].
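A rough sketch of an Euler–Maruyama discretization of the interacting system (32) is given below for a toy two-dimensional Gaussian target where $\nabla_u \log \pi_*$ is known in closed form; the target covariance, step size, and particle number are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

d, J = 2, 200
target_cov = np.array([[1.0, 0.6], [0.6, 1.0]])    # toy target pi_* = N(0, target_cov)
P = np.linalg.inv(target_cov)

def grad_log_pi(U):
    """Gradient of log pi_* at each particle (rows of U)."""
    return -U @ P

dt, n_steps = 1e-2, 4000
U = rng.standard_normal((J, d))                    # u_0^(j) ~ N(0, I)
for _ in range(n_steps):
    C = np.cov(U, rowvar=False)                    # ensemble covariance C(u_t)
    L = np.linalg.cholesky(2 * C)                  # square root of 2 C(u_t)
    # preconditioned drift plus independent noise per particle
    U = U + (grad_log_pi(U) @ C) * dt + np.sqrt(dt) * rng.standard_normal((J, d)) @ L.T
```

At equilibrium the ensemble covariance approximates the target covariance, so the preconditioner formally matches the stationary covariance operator, as described above.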

#### **4.2 Ensemble Kalman sampler**

One of the attractive features of the EnKF, as well as of EKI, is its derivative-free implementation. The basis of the ensemble Kalman sampler (EKS) is to build a modified interacting Langevin sampler that avoids computing derivatives. Let $\pi_*(u) \propto \exp\left(-\frac{1}{2}\|y - \mathcal{G}(u)\|_{\Gamma}^2 - \|u\|_{C_0}^2\right)$; then the interacting Langevin system is given by

$$du\_t^{(j)} = \left[-\mathcal{C}(u\_t)\, D\mathcal{G}\left(u\_t^{(j)}\right)^{\top} \Gamma^{-1}\left(\mathcal{G}\left(u\_t^{(j)}\right) - y\right) - \mathcal{C}(u\_t)\, C\_0^{-1} u\_t^{(j)}\right] dt + \sqrt{2\mathcal{C}(u\_t)}\, dW\_t^{(j)}, \quad j = 1, \ldots, J. \tag{37}$$

Motivated by the approximation $\mathcal{C}^{uw}(u_t) \approx \mathcal{C}(u_t)\, D\mathcal{G}\left(u_t^{(j)}\right)^{\top}$, the EKS is then formulated as the solution of the system of coupled SDEs

$$du\_t^{(j)} = \left[-\mathcal{C}^{uw}(u\_t)\Gamma^{-1}\left(\mathcal{G}\left(u\_t^{(j)}\right) - y\right) - \mathcal{C}(u\_t)\mathcal{C}\_0^{-1}u\_t^{(j)}\right] dt + \sqrt{2\mathcal{C}(u\_t)}\,dW\_t^{(j)}, \quad j = 1, \ldots, J. \tag{38}$$

We note that the approximation $\mathcal{C}^{uw}(u_t) \approx \mathcal{C}(u_t)\, D\mathcal{G}\left(u_t^{(j)}\right)^{\top}$ is exact for linear forward models, and hence the EKS coincides with the interacting Langevin sampler in the linear setting. However, for nonlinear forward models the approximation of derivatives is only accurate when the particles are close to each other. Since in the application of EKS the particles aim to represent a distribution, they are not expected to be close to each other. This suggests formulating a localized version of the preconditioning sample covariance matrix, placing more weight on particles close to each other and reducing the weight between particles far apart. Therefore, we define the distance-dependent weights between particles $u_t^{(j)}$ and $u_t^{(i)}$


$$w\_{t}^{ji} = \frac{\exp\left(-\frac{1}{2\gamma} \|u\_{t}^{(j)} - u\_{t}^{(i)}\|\_{D}^{2}\right)}{\sum\_{l=1}^{J} \exp\left(-\frac{1}{2\gamma} \|u\_{t}^{(j)} - u\_{t}^{(l)}\|\_{D}^{2}\right)},\tag{39}$$

for a scaling parameter $\gamma > 0$ and a symmetric positive-definite matrix $D \in \mathbb{R}^{n_u \times n_u}$. The localized (mixed) sample covariance matrices around particle $u_t^{(j)}$ are then defined as

$$\begin{split} \mathcal{C}\left(u\_{t}^{(j)}\right) &= \sum\_{i=1}^{J} w\_{t}^{ji} \left(u\_{t}^{(i)} - \overline{u}\_{t}^{(j)}\right) \otimes \left(u\_{t}^{(i)} - \overline{u}\_{t}^{(j)}\right), \\ \mathcal{C}^{uw}\left(u\_{t}^{(j)}\right) &= \sum\_{i=1}^{J} w\_{t}^{ji} \left(u\_{t}^{(i)} - \overline{u}\_{t}^{(j)}\right) \otimes \left(\mathcal{G}\left(u\_{t}^{(i)}\right) - \overline{\mathcal{G}}\_{t}^{(j)}\right), \end{split} \tag{40}$$

with localized mean

$$
\overline{u}\_t^{(j)} = \sum\_{i=1}^J w\_t^{ji} u\_t^{(i)}, \quad \overline{\mathcal{G}}\_t^{(j)} = \sum\_{i=1}^J w\_t^{ji} \mathcal{G} \left( u\_t^{(i)} \right). \tag{41}
$$

The localized EKS then reads as

$$du\_{t}^{(j)} = \left[-\mathcal{C}^{uw}\left(u\_{t}^{(j)}\right)\Gamma^{-1}\left(\mathcal{G}\left(u\_{t}^{(j)}\right) - y\right) - \mathcal{C}\left(u\_{t}^{(j)}\right)\mathcal{C}\_{0}^{-1}u\_{t}^{(j)}\right]\mathrm{d}t + \sqrt{2\mathcal{C}\left(u\_{t}^{(j)}\right)}\,\mathrm{d}W\_{t}^{(j)}, \quad j = 1, \ldots, J. \tag{42}$$

While the original EKS shows promising results for nearly Gaussian target distributions, the localized variant considered here helps to extend the scope to multimodal target distributions [34]. Other related work has aimed to provide further understanding of the EKS, including the derivation of mean-field limits as well as various generalizations [33, 35].
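To make the localization concrete, here is a small sketch computing the weights (39) and the localized moments (40)–(41) with NumPy; the particle values and the toy forward map are placeholders for illustration only.

```python
import numpy as np

rng = np.random.default_rng(3)
J, d = 6, 2
U = rng.standard_normal((J, d))        # particles u^(1), ..., u^(J) as rows
GU = U**2                              # toy forward map G(u) = u squared elementwise
gamma = 1.0
D = np.eye(d)                          # SPD weighting matrix for the ||.||_D norm

# Distance-dependent weights (39): a softmax over squared D-weighted distances
diff = U[:, None, :] - U[None, :, :]                 # shape (J, J, d)
sq = np.einsum('jid,de,jie->ji', diff, D, diff)      # ||u^(j) - u^(i)||_D^2
w = np.exp(-sq / (2 * gamma))
w /= w.sum(axis=1, keepdims=True)                    # each row sums to one

# Localized means (41)
u_bar = w @ U                                        # shape (J, d)
G_bar = w @ GU

# Localized covariances (40) around each particle u^(j)
r_u = U[None, :, :] - u_bar[:, None, :]              # residuals u^(i) - u_bar^(j)
r_g = GU[None, :, :] - G_bar[:, None, :]
C_loc = np.einsum('ji,jid,jie->jde', w, r_u, r_u)    # localized C(u^(j))
C_uw = np.einsum('ji,jid,jie->jde', w, r_u, r_g)     # localized C^{uw}(u^(j))
```

Each `C_loc[j]` is a symmetric positive semi-definite matrix built from a convex combination of outer products, which is what makes it usable as a per-particle preconditioner in (42).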

#### **5. Other directions**

Having discussed some of the more recent developments in EKI, we now focus on other, smaller extensions. In this section we discuss each of these in turn, including machine learning, understanding EKI in the context of nonlinear inverse problems, and finally applications related to engineering, such as the geophysical sciences.

#### **5.1 Applications in machine learning**

The development of machine learning methodologies has seen a significant increase in the last decade, aimed at solving problems related to health care, imaging, and decision processes. In particular, much of this development has been due to advancements in optimization theory. As a result, ensemble Kalman methods can be viewed as a natural class of algorithms to be directly applied, as they are derivative-free optimizers.

The first work aimed at characterizing this connection was [36]. The authors motivated EKI as a replacement for SGD, initially applying it to supervised learning problems. Given a dataset $\{x_j, y_j\}_{j=1}^N$, assumed to be i.i.d. samples from a particular distribution, and given the Monte Carlo approximation, one has the minimization procedure

$$\arg\min\_{u} \Phi\_{\mathcal{L}}(u; x, y),$$

$$\Phi\_{\mathcal{L}}(u; x, y) = \frac{1}{N} \sum\_{i=1}^{N} \mathcal{L}\left(\mathcal{G}\left(u|x\_{i}\right), y\_{i}\right) + \frac{\lambda}{2} \|u\|\_{C\_{0}}^{2},\tag{43}$$

where $\mathcal{L} : \mathcal{Y} \times \mathcal{Y} \to \mathbb{R}_+$ is some positive-definite loss function. In other words, one is trying to learn $x_j$ from the labeled data $y_j$. Supervised learning is used for common ML applications such as image classification and natural language processing. Another related application is semi-supervised learning, which aims to learn $x_j$ from some of the data $y_j$, where we do not have access to all of it. This modifies the least-squares functional given in (43).

Another interesting direction has been the inclusion of EKI for training and learning neural networks [37]. This builds upon the previously discussed work, but with a number of modifications. In particular, the authors show that they are able to prove convergence of EKI to the minimizer of a strongly convex function. They apply their modified methodology to a nonlinear regression problem of the form

$$F(\theta) = A\theta + e\sin\left(B\theta\right),\tag{44}$$

where $\theta$ is the parameter of interest and $F(\theta)$ is the objective functional of interest. This was also extended to image classification problems, specifically the well-known MNIST handwritten digits data set.

A final and more recent direction combining EKI and ML was the work of Guth et al. [38], which provided a way of solving the forward problem within EKI.

#### **5.2 Extensions to nonlinear convergence analysis**

A major challenge with EKI, and the EnKF in general, is establishing convergence analysis and properties in the nonlinear setting. As is well known, in the linear and Gaussian setting, as the number of particles $N \to \infty$, the EnKF coincides with the KBF. However, in the nonlinear setting it has been challenging to derive any such results rigorously. Some ongoing and recent work has aimed to bridge the connections between EKI and nonlinear dynamics. The first paper that provided some form of analysis was the work of Chada et al. [24], which considered a specific form of EKI in the discrete-time setting.

Namely the update formula is modified to

$$\begin{aligned} m\_{n+1} &= m\_n + \mathbf{C}\_n^{up} \left( \mathbf{C}\_n^{pp} + h\_n^{-1} \Gamma \right)^{-1} (z - H(m\_n)), \\ \mathbf{C}\_{n+1} &= \mathbf{C}\_n^{uu} - \mathbf{C}\_n^{up} \left( \mathbf{C}\_n^{pp} + h\_n^{-1} \Gamma \right)^{-1} \mathbf{C}\_n^{pu} + \alpha\_n^2 \Sigma, \end{aligned} \tag{45}$$

where we adopt an ensemble square-root filter formulation, which is known to perform better. We also include covariance inflation (i.e. an inflation factor $\alpha_n$) and an adaptive step-size $h_n$, motivated by stochastic optimization, to accelerate convergence. The other underlying contribution, as alluded to, is that given this update form one is able to prove convergence towards both local and global minimizers. In other words, for the latter, we have the following result

$$\lambda\_c \| m\_N - u^\* \|^2 \le \ell(m\_N) - \ell(u^\*) \le \frac{D}{N^{\alpha}},\tag{46}$$

which establishes polynomial convergence. We note from the above equation that $\lambda_c$ is a convexity constant, $\ell$ is the associated loss function, $D$ is some constant, $u^*$ is the global minimizer, and $\alpha$ is some exponent; we refer to [24] for further details.

As one can notice, this convergence analysis was considered in the discrete-time setting, so a natural extension is to the continuous-time framework. The work of Blomker et al. [18] provides a first convergence analysis in this direction. However, given both of these works, a full understanding of the nonlinear setting has not yet been achieved, and considerable work is still required; these papers provide a first step in both settings.

#### **5.3 Engineering applications**

As a final direction to discuss in detail, and one very much related to the theme of this book, we consider engineering applications. The advantage of these ensemble Kalman methods is that they can be viewed as a black-box solver, and are therefore highly applicable. One particular application has been the geophysical sciences, related to recovering quantities of interest below the surface, or subsurface. Examples include the inverse problem of electrical resistivity tomography (ERT), shown below (**Figure 3**).

ERT is concerned with recovering, or characterizing, sub-surface materials in terms of their electrical properties, which are recorded through electrodes. It operates very similarly to electrical impedance tomography (EIT), except that it is applied to the subsurface. EKI has also been considered for learning the permeability of subsurface flow in a range of different settings, which can be found in the following papers [39, 40].

#### **Figure 3.**

*Image depicting electrical resistivity tomography, where the electric currents are recorded at the electrodes of the subsurface material.*

Another interesting direction is quantifying uncertainty in the thermo-physical properties of walls. This work was conducted by Iglesias et al. [41, 42]; specifically, the application is the inverse problem of recovering a thermodynamic property or temperature. Similar methodology has been used in resin transfer modelling [43], based on problems of moving boundaries; this is a difficult problem to model, but it provides a first step in doing so. Aside from these, other particular applications include mineral exploration, scattering problems, numerical climate models, and others [44–46]. It is worth mentioning that, as of now, there is no official online software package for EKI in general. This is currently being developed, but we emphasize to the reader that the methodology presented, together with the examples later, is not based on well-known software available in Matlab or Python.

As a side remark, there are more directions beyond what is discussed above. Some others, without going into details, include developing hierarchical approaches, incorporating constrained optimization, and connections with data assimilation strategies [47–51].

#### **6. Numerical experiments**

In this section we provide some numerical experiments highlighting the performance of ensemble Kalman methods for inverse problems. Specifically, we consider EKI as discussed in Section 2 and compare it with its regularized version, TEKI. Both methodologies are tested on two motivating inverse problems arising in the geophysical and atmospheric sciences: a Darcy flow partial differential equation and the Navier–Stokes equation.

In order to make a comparison, we will present three different figures: (i) the reconstruction at the end of the iterative scheme; (ii) the error between the approximate solution and the ground truth; and (iii) the data misfit.
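For concreteness, the latter two diagnostics can be computed as in the following sketch; the function names and the choice of the $\Gamma$-weighted norm for the misfit are our own assumptions:

```python
import numpy as np

def relative_error(u, u_true):
    """Relative L2 error between a reconstruction and the ground truth."""
    return float(np.linalg.norm(u - u_true) / np.linalg.norm(u_true))

def data_misfit(y, Gu, Gamma):
    """Data misfit ||y - G(u)||_Gamma, i.e. the Gamma^{-1}-weighted residual norm."""
    r = y - Gu
    return float(np.sqrt(r @ np.linalg.solve(Gamma, r)))
```

The iterative schemes are then monitored by evaluating both quantities at every iteration, comparing the misfit against the noise level.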


#### **6.1 Darcy flow**

Our first model problem is an elliptic partial differential equation (PDE), which has numerous applications; one of them is subsurface flow in a porous medium. The forward problem is concerned with solving for the pressure $p \in H_0^1(\Omega)$, given the permeability $\kappa \in L^{\infty}(\Omega)$ and source function $f \in L^{\infty}(\Omega)$, where the PDE is given as

$$-\nabla \cdot (\kappa \nabla p) = f, \quad \text{in } \Omega,\tag{47}$$

$$p = 0, \quad \text{on } \partial\Omega, \tag{48}$$

such that we have prescribed Dirichlet boundary conditions, and $\Omega = [0, 1]^2 \subset \mathbb{R}^d$, for $d = 2$, is a Lipschitz domain. The inverse problem associated with (47) is the recovery of the permeability $\kappa \in L^{\infty}(\Omega)$ from noisy measurements of $p$, i.e.


$$\mathcal{Y} = \mathcal{G}(\kappa) + \eta, \quad \eta \sim \mathcal{N}(\mathbf{0}, \Gamma), \tag{49}$$

where we recall that $\mathcal{G}(\kappa) = p$. We consider 64 equidistant observations within the domain and on the boundary. To numerically solve (47) we employ a centered finite-difference method with a mesh size of $h = 1/100$. For our noisy observations we take $\Gamma = \gamma I$, where $\gamma = 0.01$. We compare EKI and TEKI with an ensemble size of $J = 50$ for both methods, running both iterative schemes for $n = 24$ iterations. We model our initial ensemble $\{u_0^{(j)}\}_{j=1}^J$ as a Gaussian random field, i.e. $u \sim \mathcal{N}(0, C)$, which can be done via the Karhunen–Loève expansion

$$u = \sum\_{k \in \mathbb{Z}^+} \sqrt{\lambda\_k} \phi\_k \xi\_k, \quad \xi\_k \sim N(0, \mathbf{1}), \tag{50}$$

where $(\lambda_k, \phi_k)$ are the associated eigenvalues and eigenfunctions of the covariance operator $C$. There are numerous choices of covariance function one can take; a popular choice is the Matérn covariance function, which provides much flexibility for modeling. For full details on various covariance functions and operators, we refer the reader to [52]. The true unknown of interest is also taken to be a Gaussian random field, but one that is smoother than that of the initial ensemble.
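As a hedged sketch, a truncated version of the expansion (50) in one dimension can be implemented as follows; the choice of sine eigenfunctions with polynomially decaying eigenvalues is an illustrative stand-in for the eigenpairs of a particular covariance operator, not the exact setup used in the experiments:

```python
import numpy as np

rng = np.random.default_rng(4)

# Truncated KL expansion (50) on [0, 1]: as an illustrative choice we take
# phi_k(x) = sqrt(2) sin(k pi x) and lambda_k = (k pi)^(-2 alpha),
# where the decay rate alpha controls the smoothness of the draws.
K, alpha = 128, 2.0
x = np.linspace(0, 1, 200)
k = np.arange(1, K + 1)
lam = (np.pi * k) ** (-2 * alpha)                    # eigenvalues lambda_k
phi = np.sqrt(2) * np.sin(np.pi * np.outer(k, x))    # eigenfunctions, shape (K, len(x))
xi = rng.standard_normal(K)                          # xi_k ~ N(0, 1)
u = (np.sqrt(lam) * xi) @ phi                        # sample u(x) = sum_k sqrt(lam_k) xi_k phi_k(x)
```

A larger decay rate `alpha` yields faster eigenvalue decay and hence smoother draws; the truth being "smoother than the initial ensemble" corresponds to choosing a larger decay rate for the truth than for the prior.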

Our first set of experiments is provided in **Figure 4**, which shows the truth, the reconstruction using EKI, and that using TEKI. As we can observe, both methodologies work well at learning the true unknown function, although TEKI induces a smoother reconstruction, which arises from the regularization. Interestingly, if we analyze **Figure 5**, we notice

**Figure 4.**

*Reconstruction plots for the Darcy flow PDE example. Left: Truth. Middle: EKI reconstruction. Right: TEKI reconstruction.*

**Figure 5.** *Relative errors and data misfits for the Darcy flow PDE example. We compare EKI with TEKI.*

that the relative error tends to diverge at the end with EKI, and this is due to the overfitting of data. A motivation behind TEKI is to alleviate this: its relative error tends to decrease, and its data misfit remains within the noise level, which is given as

$$\text{noise level} = \| y - \mathcal{G}(u^{\dagger}) \| = \| \eta^{\dagger} \|. \tag{51}$$

#### **6.2 Navier–Stokes equation**

Our final test problem is a well-known PDE model arising in numerical weather prediction: the Navier–Stokes equation (NSE). We consider the 2D NSE defined on a torus $\mathbb{T}^2 = [0, 1]^2$ with periodic boundary conditions. The aim is to estimate the velocity $v : [0, \infty) \times \mathbb{T}^2 \to \mathbb{R}^2$, defined as a vector field, together with the scalar pressure field $p : [0, \infty) \times \mathbb{T}^2 \to \mathbb{R}$. The NSE is given as

$$\partial\_t v + (v \cdot \nabla) v + \nabla p - \nu \Delta v = f, \quad \text{in } [0, \infty) \times \mathbb{T}^2,\tag{52}$$

$$\nabla \cdot v = 0, \quad \text{in } [0, \infty) \times \mathbb{T}^2,\tag{53}$$

$$v = u, \quad \text{on } \{0\} \times \mathbb{T}^2,\tag{54}$$

with initial condition (54) and the incompressibility constraint (53). In (52), $f$ corresponds to a volume forcing and $\nu$ is the associated viscosity of the fluid. We consider a spectral Fourier solver for (52). This PDE is more challenging

#### **Figure 6.**

*Reconstruction plots for the NSE PDE example. Left: Truth. Middle: EKI reconstruction. Right: TEKI reconstruction.*

**Figure 7.** *Relative errors and data misfits for the NSE PDE example. We compare EKI with TEKI.*


to invert than the previous example; therefore we take 100 point-wise observations. The setup is largely the same as in the previous example, where we take an initial condition based on a Gaussian random field through the KL expansion (50). We aim to recover the true underlying function $u^{\dagger}$ using both EKI and TEKI. The results obtained from the experiments are presented in **Figures 6** and **7**. A similar phenomenon appears: the reconstructions work well, but an additional smoothness is induced through the regularization in TEKI. Similarly, the relative errors and data misfit show the overfitting of the data at the end for EKI. We note that this can be avoided depending on the prior form, its hyperparameters, the observations, and the noise; however, we specify particular choices to demonstrate that it can occur.

#### **7. Conclusion**

The ensemble Kalman filter (EnKF) is a simple, easy-to-implement, and powerful algorithm. This has been demonstrated in numerous data assimilation applications for state estimation, including numerical weather prediction, the geosciences, and, more recently, machine learning. A major advantage of the method is that, unlike other filters such as the particle filter, it scales better to high dimensions and can be significantly cheaper. In this chapter we considered the EnKF and its application to parameter estimation, which has similar applications to state estimation, and where one can exploit such techniques for inverse problems. We provided a review and overview of some of the major contributions in this direction, where the resulting methodology is known as ensemble Kalman inversion (EKI), based largely on the work of Iglesias et al. [13]. We presented various avenues the field of EKI has taken, such as regularization, extensions to sampling, and other areas, and demonstrated how EKI can perform on two numerical PDE examples.

The EKI methodology is one which builds very naturally on many different fields, which acts as a strong motivation. For example, being an optimizer, one can naturally apply optimization procedures, but also techniques from data assimilation and uncertainty quantification. As a result, this methodology naturally brings together researchers from different fields working on parameter estimation and inverse problems. This synergy of areas will hopefully ensure new emerging directions within EKI, from methodological, theoretical, and application perspectives.

#### **Acknowledgements**

This work was funded by KAUST baseline funding. The author thanks Simon Weissmann for helpful discussions, and for the use of some of the earlier figures, and information on EKS.

#### **Abbreviations**



### **Author details**

Neil K. Chada King Abdullah University of Science and Technology, Thuwal, Saudi Arabia

\*Address all correspondence to: neilchada123@gmail.com

© 2022 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


#### **References**

[1] Kaipio J, Somersalo E. Statistical and Computational Inverse Problems. New York: Springer Verlag; 2004

[2] Stuart AM. Inverse problems: A Bayesian perspective. Acta Numer. 2010; **19**:451-559

[3] Tarantola A. Inverse Problem Theory and Methods for Model Parameter Estimation. Philadelphia: SIAM; 1987

[4] Lorenc AC. Analysis methods for numerical weather prediction. Quarterly Journal of the Royal Meteorological Society. 1986;**112**(474):1177-1194

[5] Majda A, Wang X. Non-linear Dynamics and Statistical Theories for Basic Geophysical Flows. Cambridge University Press; 2006

[6] Oliver D, Reynolds AC, Liu N. Inverse Theory for Petroleum Reservoir Characterization and History Matching. 1st ed. Cambridge University Press; 2008

[7] Bucy RS. Nonlinear filtering theory. IEEE Transactions on Automatic Control. 1965;**10**(198):198

[8] Kalman RE. A new approach to linear filtering and prediction problems. Transactions ASME (Journal of Basic Engineering). 1960;**82**:35-45

[9] Bain A, Crisan D. Fundamentals of Stochastic Filtering. New York: Springer; 2009

[10] Law KJH, Stuart AM, Zygalakis K. Data assimilation: A mathematical introduction. In: Texts in Applied Mathematics. Springer; 2015

[11] Evensen G. Data Assimilation: The Ensemble Kalman Filter. Springer; 2009

[12] Evensen G. The ensemble Kalman filter: Theoretical formulation and practical implementation. Ocean Dynamics. 2003;**53**(4):343-367

[13] Iglesias MA, Law KJH, Stuart AM. Ensemble Kalman methods for inverse problems. Inverse Problems. 2013;**29**

[14] Li G, Reynolds AC. Iterative ensemble Kalman filters for data assimilation. SPE Journal. 2009;**14**:496-505

[15] Lehtinen MS, Paivarinta L, Somersalo E. Linear inverse problems for generalised random variables. Inverse Problems. 1989;**5**(4):599-612

[16] Schillings C, Stuart AM. Analysis of the ensemble Kalman filter for inverse problems. SIAM Journal on Numerical Analysis. 2017;**55**(3):1264-1290

[17] Blomker D, Schillings C, Wacker P, Weissmann S. Well posedness and convergence analysis of the ensemble Kalman inversion. Inverse Problems. 2019;**35**(8):085007

[18] Blomker D, Schillings C, Wacker P, Weissmann S. Continuous time limit of the stochastic ensemble Kalman inversion: Strong convergence analysis. Preprint arXiv:2107.14508. 2021

[19] Benning M, Burger M. Modern regularization methods for inverse problems. Acta Numer. 2018;**27**:1-111

[20] Engl HW, Hanke K, Neubauer A. Regularization of Inverse Problems, Mathematics and its Applications. Vol. 375. Dordrecht: Kluwer Academic Publishers Group; 1996

[21] Iglesias MA. A regularising iterative ensemble Kalman method for

PDE-constrained inverse problems. Inverse Problems. 2016;**32**

[22] Iglesias MA, Yang Y. Adaptive regularisation for ensemble Kalman inversion with applications to nondestructive testing and imaging. arXiv preprint arXiv:2006.14980. 2020

[23] Chada NK, Tong XT, Stuart AM. Tikhonov regularization for ensemble Kalman inversion. SIAM Journal on Numerical Analysis. 2020;**58**(2): 1263-1294

[24] Chada NK, Tong XT. Convergence acceleration of ensemble Kalman inversion in nonlinear settings. Mathematics of Computation. 2022; **91**(335):1247-1280

[25] Lee Y. *lp* regularization for ensemble Kalman inversion. SIAM Journal on Scientific Computing. 2021;**43**(5): 3417-3437

[26] Schneider T, Stuart AM, Wu J-L. Imposing sparsity within ensemble Kalman inversion. arXiv preprint, arXiv: 2007.06175. 2020

[27] Weissmann S, Chada NK, Schillings C, Tong XT. Adaptive Tikhonov strategies for stochastic ensemble Kalman inversion. Inverse Problems. 2022;**38**(4):045009

[28] Ernst OG, Sprungk B, Starkloff H-J. Analysis of the ensemble and polynomial chaos Kalman filters in Bayesian inverse problems. SIAM/ASA Journal on Uncertainty Quantification. 2015;**3**: 823-851

[29] Garbuno-Inigo A, Hoffmann F, Li W, Stuart AM. Interacting Langevin diffusions: Gradient structure and ensemble Kalman sampler. SIAM Journal on Applied Dynamical Systems. 2020; **19**(1):412-441

[30] Pavliotis G. Stochastic processes and applications: Diffusion processes, the Fokker–Planck and Langevin equations. In: Texts in Applied Mathematics. New York: Springer; 2014

[31] Kullback R, Leibler S. On information and sufficiency. Annals of Mathematical Statistics. 1951;**22**:79-86

[32] Markowich PA, Villani C. On the trend to equilibrium for the Fokker–Planck equation: An interplay between physics and functional analysis. Matematica Contemporanea (SBM). 1999;**19**

[33] Garbuno-Inigo A, Nüsken N, Reich S. Affine invariant interacting Langevin dynamics for Bayesian inference. SIAM Journal on Applied Dynamical Systems. 2020;**19**(3): 1633-1658

[34] Reich S, Weissmann S. Fokker–Planck particle systems for Bayesian inference: Computational approaches. SIAM/ASA Journal on Uncertainty Quantification. 2021;**9**(2):446-482

[35] Ding Z, Li Q. Ensemble Kalman sampler: Mean-field limit and convergence analysis. SIAM Journal on Mathematical Analysis. 2021;**53**(2): 1546-1578

[36] Kovachki NB, Stuart AM. Ensemble Kalman inversion: A derivative-free technique for machine learning tasks. Inverse Problems. 2019;**35**(9):095005

[37] Haber E, Lucka F, Ruthotto L. Never look back - A modified EnKF method and its application to the training of neural networks without back propagation. arXiv preprint arXiv:1805.08034. 2018

*A Review of the EnKF for Parameter Estimation DOI: http://dx.doi.org/10.5772/intechopen.108218*

[38] Guth PA, Schillings C, Weissmann S. Ensemble Kalman filter for neural network based one-shot inversion. arXiv preprint. 2020

[39] Tso CM, Iglesias M, Wilkinson P, Kuras O, Chambers J, Binley A. Efficient multiscale imaging of subsurface resistivity with uncertainty quantification using ensemble Kalman inversion. Geophysical Journal International. 2021;**25**(2)

[40] Muir JB, Tsai VC. Geometric and level set tomography using ensemble Kalman inversion. Geophysical Journal International. 2020;**220**:967-980

[41] Iglesias MA, Sawlan Z, Scavino M, Tempone R, Wood C. Bayesian inferences of the thermal properties of a wall using temperature and heat flux measurements. International Journal of Heat and Mass Transfer. 2018;**116**:417-431

[42] De Simon L, Iglesias MA, Jones B, Wood C. Quantifying uncertainty in thermal properties of walls by means of Bayesian inversion. Energy and Buildings. 2018;**177**(2):177

[43] Iglesias M, Park M, Tretyakov MV. Bayesian inversion in resin transfer modelling. Inverse Problems. 2018;**34**(10)

[44] Sungkono S, Apriliani E, Saifuddin N, Fajriani F, Srigutomo W. Ensemble Kalman inversion for determining model parameters of self-potential data in the mineral exploration. In: Biswas A, editor. Self-Potential Method: Theoretical Modeling and Applications in Geosciences. Cham: Springer Geophysics. Springer; 2021

[45] Dunbar ORA, Garbuno-Inigo A, Schneider T, Stuart AM. Calibration and uncertainty quantification of convective parameters in an idealized GCM. Journal of Advances in Modeling Earth Systems. 2021

[46] Huang J, Li Z, Wang B. A Bayesian level set method for the shape reconstruction of inverse scattering problems in elasticity. Computers & Mathematics with Applications. 2021; **97**(1):18-27

[47] Albers DJ, Blancquart PA, Levine ME, Seylabi EE, Stuart AM. Ensemble Kalman methods with constraints. Inverse Problems. 2019; **35**(9):095007

[48] Chada NK, Chen Y, Sanz-Alonso D. Iterative ensemble Kalman methods: A unified perspective with some new variants. Foundations of Data Science. 2021;**3**(3):331-369

[49] Chada NK, Iglesias MA, Roininen L, Stuart AM. Parameterizations for ensemble Kalman inversion. Inverse Problems. 2018;**34**

[50] Chada NK, Schillings C, Weissmann S. On the incorporation of box-constraints for ensemble Kalman inversion. Foundations of Data Science. 2019;**1**(4):433-456

[51] Tong XT, Morzfeld M. Localized ensemble Kalman inversion. arXiv preprint arXiv:2201.10821. 2022

[52] Lord G, Powell CE, Shardlow T. An Introduction to Computational Stochastic PDEs. Cambridge Texts in Applied Mathematics; 2014

### **Chapter 3**

## Nanosatellites: The Next Big Chapter in Atmospheric Tomography

*Gregor Moeller*

#### **Abstract**

Nanosatellite technology opens up new possibilities for earth observation. In the next decade, large satellite constellations will arise with hundreds up to thousands of satellites in low earth orbit. A number of satellites will be equipped with rather low-cost sensors, such as GNSS receivers, suited for atmospheric monitoring. However, the future evolution in atmospheric science leans not only on densified observing systems but also on new, more complex analysis methods. In this regard, tomographic principles provide a unique opportunity for sensor fusion. The difficulty in converting integral measurements into 3D images is that the signal ray path is not a straight line and the number of radio sources and detectors is limited with respect to the size of the object of interest. Therefore, the inverse problem is solved either linearly or iteratively non-linearly. In this chapter, an overview of the individual solving techniques for the tomographic problem is presented, including strategies for removing deficiencies of the ill-posed problem by using truncated singular value decomposition and the L-curve technique. Applied to dense nanosatellite formations, a new quality in the reconstruction of the 3D water vapor distribution is obtained, which has the potential to lead to further advances in atmospheric science.

**Keywords:** GNSS, radio occultation, nanosatellites, singular value decomposition, wet refractivity

#### **1. Introduction**

For the reconstruction of two- or three-dimensional structures from integral measurements of atmospheric excess phase, e.g. as obtained from signals of the Global Navigation Satellite Systems (GNSS), a technique called atmospheric tomography has been invented. The basic mathematics behind it was introduced by Johann Radon in 1917 and is therefore also known as the Radon transform [1]. Its first realization in the form of an axial scanning computer tomograph for cross-sectional imaging of the human body was awarded the Nobel prize for medicine in 1979 [2, 3]. Around the same time, the tomography concept was utilized for applications in geosciences. One of the very first results was communicated by [4] in 1977, who describe a three-dimensional inversion method for simultaneous reconstruction of seismic body wave velocities and epicenter coordinates.

According to [5], the basic mathematical principle of tomography is described as follows:

$$f\_s = \int\_{\mathcal{S}} \mathbf{g}(\mathbf{s}) \cdot d\mathbf{s} \tag{1}$$

where *fs* is the integral function, *g*(*s*) is the object property function, and *ds* is a small element of the ray path *S* along which the integral is determined.

In atmospheric tomography, *g*(*s*) is typically replaced by the refractive index *n*, which is connected to the signal velocity *vp* by the constant speed of light *c*.

$$v\_p = \frac{c}{n} \tag{2}$$

In vacuum *n* = 1; in matter *n* ≠ 1, i.e. the signal propagates slower or faster than the speed of light, depending on the electric and magnetic properties of the medium. The integral measure *fs* is usually replaced by excess phase<sup>1</sup> or signal travel time, which can be converted to excess phase by multiplying by the speed of light.

One difficulty in performing the integral is that the signal path through the atmosphere depends on the object properties along the signal path and is, therefore, not a straight line. A change in atmospheric conditions leads to a change in *S* and in the integral function *fs*. Another challenge is related to the distribution of the radio sources and the number of detectors with respect to the size of the object of interest. From single satellite missions, the distribution of integral measurements is not optimal for the reconstruction of three-dimensional structures in an analytical way using the Radon transform. To overcome this limitation, in atmospheric sounding the Abel transform [6], a further simplification of the Radon transform, is generally applied. It allows for the determination of one-dimensional profiles of refractivity from measurements of excess phase, assuming spherical symmetry. In 1965, this technique was applied to measurements of the Mariner 4 spacecraft to study important properties of the Martian atmosphere and is nowadays commonly applied to GNSS phase measurements obtained from dedicated radio occultation missions [7–9]. However, standard processing strategies based on the Abel transform do not allow for resolving horizontal features in the atmosphere. With the advent of nanosatellite technology, the number and distribution of signals have significantly increased - leading to the situation that the assumptions made to derive the Abel transform (spherical symmetry and parallel observation paths) become a limiting factor in the analysis of space-based radio occultation observations. To overcome this limitation, the existing observations can be stacked together to solve Eq. (1) either linearly or iteratively non-linearly [5]. A complete non-linear solution is difficult to achieve but also not necessary for most applications in geoscience, since it can be demonstrated that the signal path is not significantly perturbed by linearization assumptions.
In Section 2, common solving techniques (linear and non-linear) are presented, and in Section 3 and Section 4, it is analyzed whether they can be utilized to reconstruct refractivity fields in the neutral atmosphere from GNSS measurements of atmospheric excess phase on-board dense nanosatellite formations.

<sup>1</sup> In literature this quantity has been given many different names, such as *atmospheric excess phase*, *atmospheric phase delay* or derivations thereof.

#### **2. Solving techniques**

If *fs* in Eq. (1) is replaced by the atmospheric excess phase (*aep*) and *g*(*s*) by the relation *n* − 1, the basic function of atmospheric tomography is obtained as follows:

$$aep = \int\_{S} n \cdot ds - \int\_{S\_0} ds \tag{3}$$

where *S* is the "true" signal path and *S*<sub>0</sub> is the theoretical straight-line signal path in a vacuum. The second term on the right-hand side of Eq. (3) stems from the definition of *aep*, which describes only the atmospheric contribution to the excess phase. Strictly speaking, the limits of the integrals also have to be adapted to the relevant parts of the atmosphere, e.g. up to about 80 km altitude for the neutral atmosphere.

Fermat's principle tells us that first-order changes in the signal path cause second-order changes in signal travel time, i.e. for small variations in the object properties, the travel time is stationary. This principle is very beneficial since it allows us to define two simplified versions of atmospheric tomography, the so-called linear and non-linear approaches. In linear tomography, the bent signal path *S* is replaced by a straight line *S*<sub>0</sub> and corrections to *n* are made by ignoring atmospheric bending. In contrast, the iterative non-linear approach takes the signal bending into account in the definition of the ray paths but not in the inversion of *n* along *ds*. This means that after each processing step the signal paths are re-computed, e.g. by solving the so-called Eikonal equation using ray-tracing shooting techniques [10].

A numerical solution for Eq. (1) is obtained by discretizing the object of interest, e.g. the neutral atmosphere, in area elements (in two dimensions) or volume elements (in three dimensions). Further, it is assumed that in each volume element the index of refraction is constant.<sup>2</sup> In the atmosphere, the index of refraction *n* is close to 1, thus it can be replaced by refractivity *N* = (*n* − 1) · 10<sup>6</sup>. With these adaptions Eq. (1) reads:

$$aep = \sum\_{k=1}^{m} N\_k \cdot d\_k \tag{4}$$

where *Nk* is the constant refractivity and *dk* is the ray length in the volume element *k*. Assuming *l* observations, indexed by *j* = 1, 2, …, *l*, and *m* volume elements (short: voxels), indexed by *k* = 1, 2, …, *m*, the individual observation equations can be combined into a linear equation system. In matrix notation the resulting tomographic equation reads:

$$\mathbf{A}\mathbf{E}\mathbf{P} = \mathbf{A} \cdot \mathbf{N} \tag{5}$$

where *AEP* is the observation vector of size (*l*, 1) and *N* is the unknown vector of size (*m*, 1) describing the properties in each volume element *k*. The (*l*, *m*) matrix *A* contains the spatial derivatives of the observations *aepj* with respect to the unknowns *Nk*.

<sup>2</sup> In recent works by [11] or [12], alternative parameterizations, such as trilinear, spline, or adaptive node parameterizations, are suggested for a more accurate description of the refractivity distribution without considerably increasing the number of unknowns.

$$\mathbf{A} = \begin{bmatrix} \frac{\partial aep\_1}{\partial N\_1} & \frac{\partial aep\_1}{\partial N\_2} & \cdots & \frac{\partial aep\_1}{\partial N\_m} \\ \frac{\partial aep\_2}{\partial N\_1} & \frac{\partial aep\_2}{\partial N\_2} & \cdots & \frac{\partial aep\_2}{\partial N\_m} \\ \vdots & \vdots & \vdots & \vdots \\ \frac{\partial aep\_l}{\partial N\_1} & \frac{\partial aep\_l}{\partial N\_2} & \cdots & \frac{\partial aep\_l}{\partial N\_m} \end{bmatrix} \tag{6}$$

Since Eq. (4) is linear, the partial derivatives of *aep* are the ray lengths (*dk*) in each voxel. For linear tomography, *dk* is computed from the line-of-sight vector between transmitter and receiver. In the non-linear approach, the bending of the signal is taken into account, e.g. by using ray-tracing shooting techniques [13]. Therefore, a priori information about the atmospheric state (e.g. in the form of numerical weather forecasts) is needed. Depending on the quality of the a priori model, additional iterations might be necessary. After each iteration, the estimated refractivity field is considered as input for the ray tracer. The processing is repeated until the determination of the ray path converges. This usually happens after 1–2 iterations.
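As a minimal illustration of the discretization behind Eqs. (4)–(6), the sketch below (Python with numpy; the function name and grid setup are illustrative, not from the chapter) approximates the ray lengths *dk* of a straight ray through a 2D voxel grid by fine sampling along the ray, i.e. the linear approach in which each ray contributes one row of the design matrix *A*:

```python
import numpy as np

def ray_lengths(grid_x, grid_z, p0, p1, n_samples=10_000):
    """Approximate the path lengths d_k of a straight ray (linear approach)
    inside each cell of a 2D voxel grid by fine sampling along the ray.
    grid_x, grid_z are cell edges; p0, p1 are the (x, z) ray endpoints."""
    t = np.linspace(0.0, 1.0, n_samples)
    pts = p0[None, :] + t[:, None] * (p1 - p0)[None, :]
    ds = np.linalg.norm(p1 - p0) / (n_samples - 1)   # length per sample step
    ix = np.clip(np.searchsorted(grid_x, pts[:, 0]) - 1, 0, len(grid_x) - 2)
    iz = np.clip(np.searchsorted(grid_z, pts[:, 1]) - 1, 0, len(grid_z) - 2)
    d = np.zeros((len(grid_x) - 1) * (len(grid_z) - 1))
    np.add.at(d, ix * (len(grid_z) - 1) + iz, ds)    # accumulate ds per voxel
    return d                                          # one row of the matrix A

# two voxels side by side, horizontal ray crossing both
gx, gz = np.array([0.0, 1.0, 2.0]), np.array([0.0, 1.0])
row = ray_lengths(gx, gz, np.array([0.0, 0.5]), np.array([2.0, 0.5]))
print(row)  # → approximately [1. 1.]: one ray length d_k per voxel
```

Stacking one such row per observation yields the matrix *A* of Eq. (6); for the non-linear approach, the sampled straight line would be replaced by the ray-traced bent path.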

#### **2.1 The inverse problem**

An analytical solution for the tomographic equation (Eq. (5)) can be found by the inversion of matrix *A*.

$$\mathbf{N} = \mathbf{A}^{-1} \cdot \mathbf{A} \mathbf{E} \mathbf{P} \tag{7}$$

The inverse *A*<sup>−1</sup> exists if the determinant of *A* is non-zero. This requires that *A* is a square matrix (*l* = *m*). Otherwise, the matrix *A* is called singular, i.e. it does not have a matrix inverse. Unfortunately, the latter is the case in most atmospheric tomography applications since the observation data, e.g. the GNSS measurements, are considered "incomplete". Therefore, the matrix *A* has zero singular values and Eq. (7) becomes ill-posed. In the literature, several strategies are described which allow removing the deficiencies of the ill-posed problem. They either try to solve the inverse problem or to avoid it. The most prominent ones are:


* algebraic reconstruction techniques (ART)
* truncated singular value decomposition (TSVD)
* Tikhonov regularization

These three techniques have been selected since they have proven in practice to be the most reliable, as described briefly in the following subsections.

#### *2.1.1 Algebraic reconstruction techniques*

The iterative algebraic reconstruction technique (ART) was suggested in 1937 by [14] for solving linear equation systems. This technique avoids the inversion problem and processes the matrix *A* row by row, which is very beneficial for large equation systems. Applied to Eq. (7), the ART algorithm reads:

*Nanosatellites: The Next Big Chapter in Atmospheric Tomography DOI: http://dx.doi.org/10.5772/intechopen.108522*

$$N^{i+1} = N^i + \frac{\omega}{\left\langle A\_j, A\_j \right\rangle} \cdot \left( aep\_j - \left\langle A\_j^T, \ N^i \right\rangle \right) \cdot A\_j \tag{8}$$

where *Aj* indicates the *j*th row of the matrix *A*, ⟨*Aj*, *Aj*⟩ is the resulting inner product, and the difference *aepj* − ⟨*Aj*<sup>T</sup>, *N*<sup>i</sup>⟩ is the prefit residual from the last iteration. Based on the number of traversed volume elements and the relaxation factor *ω*, the residual is split into multiple components and applied to *N*<sup>i</sup> in order to obtain the improved refractivity field *N*<sup>i+1</sup>, which is again input for the next iteration. The processing is stopped once Eq. (8) converges to the solution of Eq. (7) with minimal norm, which is guaranteed for 0 < *ω* < 2. For ground-based GNSS networks, the best results have been obtained with a relaxation parameter of about 0.175, see [15]. Studies by [16] or [17] show that the algorithms of the ART family are also very successful in reconstructing the total electron content (TEC) in the ionosphere. Depending on how the discretization is done, different realizations of ART exist. For tomographic reconstruction, especially the multiplicative algebraic reconstruction techniques (MART) and the simultaneous iterative reconstruction techniques (SIRT) are worth mentioning, see [18] or [19]. In contrast to the original ART algorithm, MART leads in general to faster convergence, and SIRT has the benefit of being impervious to the order of the measurements (*aepj*).
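The row-wise update of Eq. (8) can be sketched as follows (a Kaczmarz-type sweep in numpy; the function name and toy system are illustrative):

```python
import numpy as np

def art_solve(A, aep, omega=1.0, n_sweeps=200):
    """Kaczmarz-type ART sweep in the spirit of Eq. (8): the matrix A is
    processed row by row, and each prefit residual is distributed over the
    voxels traversed by ray j. Convergence requires 0 < omega < 2."""
    N = np.zeros(A.shape[1])
    for _ in range(n_sweeps):
        for j in range(A.shape[0]):
            Aj = A[j]
            norm2 = Aj @ Aj              # <A_j, A_j>
            if norm2 == 0.0:
                continue                 # ray j touches no voxel
            residual = aep[j] - Aj @ N   # prefit residual of observation j
            N += (omega / norm2) * residual * Aj
    return N

# consistent 2x2 toy system with solution N = [1, 2]
A = np.array([[1.0, 1.0], [1.0, 2.0]])
aep = np.array([3.0, 5.0])
N = art_solve(A, aep)
print(N)  # → approximately [1. 2.]
```

For consistent systems, the sweep converges to the minimum-norm solution; MART and SIRT differ only in how the residual is redistributed across the voxels.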

#### *2.1.2 Truncated singular value decomposition*

For ill-conditioned least squares problems, [20, 21] invented a general solution, widely known as the pseudo inverse or Moore-Penrose inverse *A*<sup>+</sup>. A numerical solution for the pseudo inverse can be obtained by singular value decomposition [22]. This requires a split of the matrix *A* into three components as follows:

$$\mathbf{A} = \mathbf{U} \cdot \mathbf{S} \cdot \mathbf{V}^T \tag{9}$$

with *U* (*l*, *l*) and *V*<sup>T</sup> (*m*, *m*) as orthogonal matrices, containing the normalized left and right singular vectors of *A*, respectively. Matrix *S* (*l*, *m*) is a diagonal matrix with singular values *s*<sub>k,k</sub> arranged in descending order. By using only the non-zero diagonal elements of *S*, the pseudo inverse is obtained as follows:

$$\mathbf{A}^+ = \mathbf{V} \cdot \mathbf{S}^{-1} \cdot \mathbf{U}^T \tag{10}$$

The singular values also define the condition number *κ*(**A**), which can be interpreted as the ratio between the largest and the smallest singular values:

$$\kappa(\mathbf{A}) = \frac{|s\_{\max}|}{|s\_{\min}|} \tag{11}$$

A well-conditioned matrix has a condition number *κ*(*A*) near 1; the resulting tomography solution is rather insensitive to measurement errors. A large condition number indicates an ill-conditioned problem. According to Eq. (11), the condition number of *A* improves by neglecting tiny singular values. This technique is known as truncated singular value decomposition (TSVD), see [23]. It allows approximating the ill-conditioned matrix *A* by a matrix *Ã* of lower rank.

For atmospheric tomography, [24] suggested *s*<sub>lim</sub> = 2.8 km as the threshold for *s*<sub>k,k</sub>, i.e. all singular values smaller than *s*<sub>lim</sub> are set to zero. However, in practice an optimal threshold *s*<sub>lim</sub> can be determined using the L-curve technique [23]. Therefore, a set of solutions is determined with varying *s*<sub>lim</sub> values. For each solution, the 2-norm of the estimated parameters is plotted against the 2-norm of the residuals. By connecting all points in a log-log plot, a concave L-shaped curve is obtained, whereby the corner of the curve, i.e. the point of maximum curvature, defines the optimal threshold (*s*<sub>opt</sub>) for singular value decomposition. **Figure 1** shows an L-curve for a typical GNSS tomography setting. In this example, the optimal solution is obtained by setting *s*<sub>opt</sub> to 0.032. This point was found by testing various values for *s*<sub>lim</sub> between 3 · 10<sup>−1</sup> and 3 · 10<sup>−5</sup>. After each processing step, the solution norm log‖*N*‖<sub>2</sub> is plotted against the residual norm log‖*A* · *N* − *AEP*‖<sub>2</sub>, and the corner point of the resulting curve is marked as the optimal solution.
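The TSVD solution and the L-curve corner search described above can be sketched in numpy as follows; the discrete turning-angle test for the corner is one simple choice among several, and all names are illustrative:

```python
import numpy as np

def tsvd_solve(A, aep, s_lim):
    """TSVD solution: singular values below s_lim are discarded, which
    improves the condition number kappa(A) = s_max / s_min (Eq. (11))."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    keep = s >= s_lim
    return Vt[keep].T @ ((U[:, keep].T @ aep) / s[keep])

def l_curve_corner(A, aep, s_lims):
    """Pick s_lim at the point of maximum curvature of the log-log L-curve
    (residual norm vs. solution norm), here via a discrete turning-angle test."""
    pts = []
    for s_lim in s_lims:
        N = tsvd_solve(A, aep, s_lim)
        pts.append((np.log10(np.linalg.norm(A @ N - aep) + 1e-300),
                    np.log10(np.linalg.norm(N) + 1e-300)))
    v = np.diff(np.array(pts), axis=0)            # segments of the L-curve
    ang = np.arctan2(v[1:, 1], v[1:, 0]) - np.arctan2(v[:-1, 1], v[:-1, 0])
    return s_lims[1 + int(np.argmax(np.abs(ang)))]

A = np.diag([1.0, 1e-8])         # severely ill-conditioned toy matrix
aep = np.array([1.0, 1e-2])      # the second observation is mostly noise
print(tsvd_solve(A, aep, 1e-4))  # → [1. 0.]: the noisy component is truncated
```

In the toy example, keeping the tiny singular value would amplify the noise by a factor of 10<sup>8</sup>; the truncation trades a small bias for a large gain in stability, which is exactly the compromise the L-curve corner locates.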

#### *2.1.3 Tikhonov regularization*

A more generalized solution to the regularization problem can be found in [25], who describes the minimization problem as follows:

$$\mathbf{N}\_{\eta} = \arg\min \left\{ \left\| \mathbf{A} \cdot \mathbf{N} - \mathbf{A} \mathbf{E} \mathbf{P} \right\|\_{2}^{2} + \eta^{2} \left\| \mathbf{L} (\mathbf{N}\_{0} - \mathbf{N}) \right\|\_{2}^{2} \right\} \tag{12}$$

where *η* is called the regularization parameter or Tikhonov factor and *N*<sub>0</sub> is an approximation of *N*. The "size" of the solution is defined by the norm ‖*L*(*N*<sub>0</sub> − *N*)‖<sub>2</sub> and the "fit" by the norm of the residual vector ‖*A* · *N* − *AEP*‖<sub>2</sub>. One possibility to solve Eq. (12) is to treat it as a least squares problem. In [26] it is shown that the matrix *L* can be replaced by an identity matrix *I*, i.e. the condition number of *A* is improved by adding a small multiple of the identity to the matrix *A*.

$$
\tilde{\mathbf{A}} = \mathbf{A} + \eta \cdot \mathbf{I} \tag{13}
$$

A possible solution for *η* can be obtained by means of singular value decomposition (see Section 2.1.2). Thereby the elements of the diagonal matrix *S* are replaced by the coefficients *r*<sub>k,k</sub>:

**Figure 1.**

*Representative L-curve for a typical GNSS tomography inversion problem. The red dot indicates the corner point of the L-curve and thus the optimal tomography solution.*


$$r\_{k,k} = \frac{s\_{k,k}^2}{s\_{k,k}^2 + \eta} \tag{14}$$

If the Tikhonov factor is defined as a sharp filter

$$\eta = \begin{cases} 1 & \text{for } s\_{k,k} \ge s\_{lim} \\ 0 & \text{for } s\_{k,k} < s\_{lim} \end{cases} \tag{15}$$

the resulting solution can be interpreted as a smoothed TSVD solution.
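A small numpy sketch of the filter-factor form of Eq. (14), which damps each singular component instead of truncating it (illustrative function name; the document's *η* is used directly, without squaring):

```python
import numpy as np

def tikhonov_solve(A, aep, eta):
    """Tikhonov-regularized solution via the SVD filter factors of Eq. (14):
    each singular component is damped by r_kk = s^2 / (s^2 + eta) rather than
    truncated, yielding the smoothed TSVD behavior described above."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    r = s**2 / (s**2 + eta)              # filter factors r_kk
    return Vt.T @ (r * (U.T @ aep) / s)  # damped pseudo inverse applied to aep

print(tikhonov_solve(np.diag([2.0, 1.0]), np.array([2.0, 1.0]), 1e-12))
# for eta → 0 the unregularized solution [1, 1] is recovered
```

Components with *s*<sub>k,k</sub><sup>2</sup> ≫ *η* pass almost unchanged (*r* ≈ 1), while components with *s*<sub>k,k</sub><sup>2</sup> ≪ *η* are suppressed (*r* ≈ 0), which is the smooth counterpart of the sharp filter in Eq. (15).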

#### **2.2 The partial least squares solution**

By treating Eq. (7) as a least squares problem, the basic equation of the weighted least squares estimator reads:

$$
\hat{\mathbf{N}} = \left(\mathbf{A}^T \cdot \mathbf{P} \cdot \mathbf{A}\right)^{-1} \cdot \mathbf{A}^T \cdot \mathbf{P} \cdot \mathbf{A} \mathbf{EP} \tag{16}
$$

where *P* is a weighting matrix, which allows taking the relative accuracy of and possible constraints between the observations into account. The least squares solution *N̂* is obtained by minimizing the 2-norm of the observation residuals. Thereby we assume that the observations are normally distributed, i.e. free of gross errors or systematic effects.

By combining Eq. (10) with Eq. (16) the tomography solution reads:

$$
\hat{\mathbf{N}} = \mathbf{V} \cdot \mathbf{S}^{-1} \cdot \mathbf{U}^T \cdot \mathbf{A}^T \cdot \mathbf{P} \cdot \mathbf{A} \mathbf{EP} \tag{17}
$$

where the columns of *U* and *V*<sup>T</sup> are the normalized left and right singular vectors of *A*<sup>T</sup> · *P* · *A*, respectively, and matrix *S* is the diagonal singular value matrix as defined in Section 2.1.2.
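Eq. (17) can be sketched in numpy as follows (illustrative function name; no truncation is applied here, so the sketch reduces to the plain weighted least squares solution of Eq. (16)):

```python
import numpy as np

def weighted_ls_svd(A, AEP, P):
    """Weighted least squares solution of Eq. (16), evaluated through the
    singular value decomposition of A^T P A as in Eq. (17)."""
    M = A.T @ P @ A
    U, s, Vt = np.linalg.svd(M)
    return Vt.T @ ((U.T @ (A.T @ P @ AEP)) / s)

A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])   # three rays, two voxels
P = np.eye(3)                                        # equal observation weights
AEP = np.array([1.0, 2.0, 3.0])
print(weighted_ls_svd(A, AEP, P))  # → approximately [1. 2.]
```

Replacing the division by `s` with the truncated or damped variants of Sections 2.1.2 and 2.1.3 turns the same routine into a regularized estimator.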

#### *2.2.1 The a priori field*

In Section 2, the linear and non-linear approaches have been defined to reconstruct the GNSS signal paths through the atmosphere. While the linear approach does not depend on any external data, the non-linear approach requires an a priori refractivity field, e.g. derived from the standard atmosphere or from numerical weather forecasts, to reconstruct the bent signal path. Besides, the a priori field can also be utilized to stabilize the tomographic equation system. One possibility is to treat the additional information (*N*<sub>0</sub>) as absolute constraints:

$$
\hat{\mathbf{N}} = \mathbf{N}\_0 + \mathbf{V} \cdot \mathbf{S}^{-1} \cdot \mathbf{U}^T \cdot \mathbf{A}^T \cdot \mathbf{P} \cdot (\mathbf{A} \mathbf{E} \mathbf{P} - \mathbf{A} \cdot \mathbf{N}\_0) \tag{18}
$$

In the following, this solution is called the **constrained solution.**
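A sketch of the constrained solution in Eq. (18), assuming a (truncated) SVD of *A*<sup>T</sup> · *P* · *A* as in Section 2.1.2 (function name and toy data are illustrative):

```python
import numpy as np

def constrained_solution(A, AEP, P, N0, s_lim=0.0):
    """Constrained tomography solution in the sense of Eq. (18): the a priori
    field N0 enters as absolute constraint and only the residual AEP - A @ N0
    is inverted (TSVD of A^T P A; s_lim = 0 keeps all singular values)."""
    U, s, Vt = np.linalg.svd(A.T @ P @ A)
    keep = s > s_lim
    rhs = A.T @ P @ (AEP - A @ N0)
    return N0 + Vt[keep].T @ ((U[:, keep].T @ rhs) / s[keep])

A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
AEP = np.array([1.0, 2.0, 3.0])
N0 = np.array([0.9, 1.8])          # a priori field close to the truth
print(constrained_solution(A, AEP, np.eye(3), N0))  # → approximately [1. 2.]
```

Because only the residual with respect to *N*<sub>0</sub> is inverted, truncated singular components fall back on the a priori field instead of on zero, which is what stabilizes the ill-posed system.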

Another possibility to handle the extended equation system is to treat it as a system of subsets with

$$\mathbf{A}\_{\rm ext} = \begin{bmatrix} \mathbf{A} \\ \mathbf{A}\_{\rm c} \end{bmatrix} \tag{19}$$

$$\mathbf{AEP}\_{\text{ext}} = \begin{bmatrix} \mathbf{AEP} \\ \mathbf{N}\_0 \end{bmatrix} \tag{20}$$

$$\mathbf{P}\_{\text{ext}} = \begin{bmatrix} \mathbf{P} \\ \mathbf{P}\_c \end{bmatrix} \tag{21}$$

where *A*<sub>c</sub> is the design matrix and *P*<sub>c</sub> the weighting matrix for *N*<sub>0</sub>. The extended equation system can be solved using Eq. (17), whereby *A*, *AEP*, and *P* are replaced by their extended counterparts *A*<sub>ext</sub>, *AEP*<sub>ext</sub>, and *P*<sub>ext</sub>. In principle, it provides the same results as the constrained solution.

A third possibility would be to solve Eq. (18) separately for each observation type, using the estimates (*N̂*) and the variance-covariance matrix of the estimates (*Cov*<sub>N̂N̂</sub>) from the first step as a priori information for the next step. In the case of two subsets, the corresponding tomography solution reads:

$$
\hat{\mathbf{N}}\_1 = \mathbf{V} \cdot \mathbf{S}^{-1} \cdot \mathbf{U}^T \cdot \mathbf{A}^T \cdot \mathbf{P} \cdot \mathbf{A} \mathbf{EP} \tag{22}
$$

$$\mathbf{Cov}\_{\hat{N}\hat{N}} = \mathbf{V} \cdot \mathbf{S}^{-1} \cdot \mathbf{U}^{T} \tag{23}$$

where *U*, *V*, and *S* are obtained by singular value decomposition of the matrix *A*<sup>T</sup> · *P* · *A*. For the second (final) solution, both *N̂*<sub>1</sub> and *Cov*<sub>N̂N̂</sub> are introduced into the equation system as follows:

$$
\hat{\mathbf{N}} = \hat{\mathbf{N}}\_1 + \mathbf{V} \cdot \mathbf{S}^{-1} \cdot \mathbf{U}^T \cdot \mathbf{A}\_0^T \cdot \mathbf{P}\_0 \cdot \left(\mathbf{N}\_0 - \mathbf{A}\_0 \cdot \hat{\mathbf{N}}\_1\right) \tag{24}
$$

with *S*, *U*, and *V* obtained by truncated singular value decomposition (see Section 2.1.2) of the matrix *A*<sub>0</sub><sup>T</sup> · *P*<sub>0</sub> · *A*<sub>0</sub> + *Cov*<sub>N̂N̂</sub><sup>−1</sup>. In the following, this solution is called the **partial solution**. In case the matrix *A* is of full rank or if only one set of observations is available, the constrained solution and the partial solution provide identical results. In the case of an ill-conditioned matrix, the partial solution has the advantage that the singular values can be computed for each subset of observations. In large equation systems, this allows reducing the computational load since the matrix *A* is divided into several parts.

#### *2.2.2 Observation weights*

Up to now, the individual observations have been considered uncorrelated and equally accurate. However, for varying input data it might be beneficial to set up a weighting matrix. In case the relative accuracy between observations is known, it can be directly introduced into the equation system by defining the weighting matrix

$$P = \sigma\_0^2 \cdot \mathsf{Cov}\_{ll}^{-1} \tag{25}$$

where the variance-covariance matrix *Cov*<sub>ll</sub> reflects the precision of the observations on its diagonal elements (*σ*<sub>n</sub><sup>2</sup>), with *σ*<sub>0</sub><sup>2</sup> as the a priori variance of unit weight. In the case of uncorrelated observations and unit variances, the matrix *Cov*<sub>ll</sub> simplifies to an identity matrix of size (*n*, *n*).

**Table 1.**

*Standard deviation of refractivity and its components, assuming a typical meteo sensor error of 0.3 hPa for pressure, 0.2 K for temperature, and 3% for relative humidity - computed for standard atmospheric conditions at sea level (p = 1013 hPa, T = 15 °C, rh = 60%).*

In case no accurate information is available, a weighting model can be utilized. For ground-based GNSS observations, an elevation-dependent weighting is common; for satellite-to-satellite observations, a weighting based on the carrier-to-noise density *C/N*<sub>0</sub> seems to be more useful since the observations are usually gathered around 0° elevation angle or below. For the a priori refractivity field, a height-dependent weighting model is recommended. By focusing on the neutral atmosphere and assuming that Eq. (26) is exact

$$N = K\_1 \cdot \frac{p\_d}{T} + K\_2 \cdot \frac{e}{T} + K\_3 \cdot \frac{e}{T^2} \tag{26}$$

the theoretical standard deviation for refractivity reads<sup>3</sup>:

$$
\sigma\_N = \left[ \left( \frac{\partial N}{\partial T} \cdot \sigma\_T \right)^2 + \left( \frac{\partial N}{\partial p} \cdot \sigma\_p \right)^2 + \left( \frac{\partial N}{\partial e} \cdot \sigma\_e \right)^2 \right]^{\frac{1}{2}} \tag{27}
$$

For in-situ measurements, [27] provides theoretical standard deviations for pressure *p*, temperature *T*, and water vapor pressure<sup>4</sup> *e* for a wide range of meteorological sensors. In addition, height-dependent error curves for the three meteorological parameters can be obtained from [28]. The resulting uncertainties, assuming standard atmospheric conditions at sea level, are listed in **Table 1**.

The standard deviation of refractivity is 2.39 ppm, which equates to a relative uncertainty of 0.75%. By far the largest impact (2.37 ppm) is related to the uncertainty of water vapor. Consequently, the utmost care has to be taken when measuring humidity and temperature.
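The error propagation of Eqs. (26) and (27) can be reproduced numerically. The sketch below assumes one common set of refractivity constants (after Bevis et al., 1994) and a Magnus formula for the saturation vapor pressure, neither of which is specified in the chapter, so the result differs slightly from the 2.39 ppm quoted in Table 1:

```python
import math

# One common set of refractivity constants (here after Bevis et al., 1994);
# the chapter's K1-K3 in Eq. (26). Units: K/hPa, K/hPa, K^2/hPa.
K1, K2, K3 = 77.60, 70.4, 3.739e5

# standard conditions at sea level as in Table 1
T, p, rh = 288.15, 1013.0, 0.60                          # K, hPa, fraction
e_s = 6.112 * math.exp(17.62 * 15.0 / (243.12 + 15.0))   # Magnus saturation pressure, hPa
e = rh * e_s                                             # water vapor pressure
p_d = p - e                                              # dry pressure

# sensor errors from Table 1; sigma_e follows from the 3% humidity error
sig_p, sig_T, sig_e = 0.3, 0.2, 0.03 * e_s

# analytic partial derivatives for Eq. (27), treating p, T, e as independent
dN_dp = K1 / T
dN_dT = -K1 * p_d / T**2 - K2 * e / T**2 - 2 * K3 * e / T**3
dN_de = K2 / T + K3 / T**2

sig_N = math.sqrt((dN_dp * sig_p)**2 + (dN_dT * sig_T)**2 + (dN_de * sig_e)**2)
print(round(sig_N, 2), "ppm")  # → 2.44 ppm with these constants, close to Table 1
```

The water vapor term clearly dominates the budget, in line with the 2.37 ppm contribution stated above.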

#### **3. Observations of atmospheric excess phase**

Satellite refractometric sounding of the atmosphere and the underlying inverse problems have been under investigation since the 1960s, see [29–31]. However, it was not until 1976 that the first radio occultation (RO) experiment to survey the earth's atmosphere was carried out within the Apollo-Soyuz mission [32]. Until then, the major problem noted was the lack of accuracy of refractometric measurements of phase or Doppler shift [33]. This limitation has widely been overcome with the emergence of the Global Positioning System (GPS) in the 1980s [7]. Since the proof of concept during the GPS-MET satellite mission in 1995, various satellites have been equipped with precise GNSS radio occultation receivers, leading to approximately 500–600 globally distributed radio occultation profiles per day and satellite, assuming a 32-satellite GPS constellation.

<sup>3</sup> Assuming that the uncertainty of the refractivity constants is negligible.

<sup>4</sup> Since water vapor pressure is usually not measured directly, it can be computed from relative humidity and temperature.

#### **3.1 The observation equation**

The phase that a receiver obtains from a GNSS satellite can be modeled as

$$L\_{r,v}^{s} = \varrho\_r^s + c \cdot \delta t\_r - c \cdot \delta t^s + \Delta \varrho\_{r,v}^s + \lambda\_v \cdot n\_{r,v}^s \tag{28}$$

with

* *L*<sub>r,v</sub><sup>s</sup> … the carrier phase observation (in units of length) between receiver *r* and satellite *s* on frequency *v*,
* *ϱ*<sub>r</sub><sup>s</sup> … the geometric distance between satellite and receiver,
* *c* · *δt*<sub>r</sub> and *c* · *δt*<sup>s</sup> … the receiver and satellite clock errors, scaled by the speed of light *c*,
* Δ*ϱ*<sub>r,v</sub><sup>s</sup> … the propagation effects along the signal path, and
* *λ*<sub>v</sub> · *n*<sub>r,v</sub><sup>s</sup> … the carrier phase ambiguity term.
The objective of refractometric sounding is to extract the atmospheric propagation effects from the phase observations. Assuming that relativistic effects, satellite-specific multipath effects, and antenna-specific phase center corrections are known and removed, the remaining effects in Δ*ϱ*<sub>r,v</sub><sup>s</sup> can be divided into two terms:

$$
\Delta \varrho\_{r,v}^{s} = \Delta \varrho\_{r,trp}^{s} + K \cdot \frac{TEC\_{r,v}^{s}}{f\_v^2} \tag{29}
$$

where the first term (Δ*ϱ*<sub>r,trp</sub><sup>s</sup>) describes the delay of the carrier phase in the neutral atmosphere and the second term *K* · *TEC*<sub>r,v</sub><sup>s</sup>/*f*<sub>v</sub><sup>2</sup> the advancement of the carrier phase in the ionosphere. The integral term *TEC*<sub>r,v</sub><sup>s</sup> is the integrated electron density along the ray path between transmitter *s* and receiver *r*, scaled by a constant term *K*.

A detailed description of the individual systematic effects can be found in [34]. In the following, special attention is given to the modeling and estimation of neutral atmospheric effects (Δ*ϱ*<sub>r,trp</sub><sup>s</sup>), assuming that the first-order ionospheric effect (up to 99.9%) can be removed by forming an ionospheric-free linear combination *L*<sub>IF</sub>. The precondition is that the receiver tracks the GNSS carrier phase simultaneously on two frequencies

$$L\_{IF} = \frac{f\_1^2 \cdot L\_{r,1}^{s} - f\_2^2 \cdot L\_{r,2}^{s}}{f\_1^2 - f\_2^2} \tag{30}$$

where the nominal frequencies *f*<sub>1</sub> and *f*<sub>2</sub> are defined by the satellite system frequency plan (e.g. 1575.42 MHz for GPS *L*1 and 1227.60 MHz for GPS *L*2).
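The cancelation achieved by Eq. (30) can be verified numerically: if the same dispersive term proportional to 1/*f*<sup>2</sup> is added to both phases, the combination returns the common range. The sketch below uses the GPS frequencies quoted above; the range and ionospheric scaling are arbitrary illustrative values:

```python
# GPS L1/L2 nominal frequencies as quoted above
f1, f2 = 1575.42e6, 1227.60e6   # Hz

def iono_free(L1, L2):
    """First-order ionosphere-free linear combination of Eq. (30);
    any dispersive term proportional to 1/f^2 cancels exactly."""
    return (f1**2 * L1 - f2**2 * L2) / (f1**2 - f2**2)

# sanity check: add the same dispersive term k / f^2 to a common range rho
rho, k = 20_000_000.0, 1.0e17   # m, an arbitrary ionospheric scaling
L1 = rho + k / f1**2
L2 = rho + k / f2**2
print(iono_free(L1, L2))        # → 20000000.0 up to floating-point rounding
```

Only the first-order (1/*f*<sup>2</sup>) ionospheric term cancels; higher-order terms remain, which is why the text speaks of removing up to 99.9% of the effect.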

#### **3.2 Calibration of the phase signal**

For the extraction of the atmospheric excess phase from the phase observations, first, the phase signal has to be calibrated, i.e. the clock effects in Eq. (28) have to be eliminated. This can be achieved if the receiving satellite can simultaneously see an occulting GNSS and a non-occulting GNSS satellite. Elimination of the clock effects is also possible if the same GNSS satellites are visible from a ground-based GNSS receiver. In addition, one needs to precisely know the position and velocity of the transmitting and receiving satellites. The orbits for the GNSS satellites can be obtained from services such as the International GNSS Service (see www.igs.org). The position and velocity of the receiving satellite have to be computed using the phase measurements recorded by the GNSS radio occultation receiver or by a dedicated POD (precise orbit determination) receiver on board the receiving satellite. For further details about the procedure of POD, the reader is referred to [35]. A more detailed description of the calibration of the phase measurements is given in, e.g., [36]. Once the signal is calibrated, the atmospheric excess phase can be used to set up the observation equations for the tomographic processing. The precondition for a stable tomography solution is that enough overlapping observations are available. Therefore, in the following, the concept of a dense nanosatellite formation is introduced.

#### **4. Concept of a dense nanosatellite formation**

#### **4.1 Introduction**

In recent years, nanosatellite technology has become increasingly important for a wide variety of applications, such as communication, technology demonstration, heliophysics, astrophysics, earth science, or planetary science [37–39]. Most of the existing mission concepts are based on the CubeSat form factor established by Professor Bob Twiggs at the Department of Aeronautics and Astronautics at Stanford University in late 1999. Although small satellites have existed since the very beginning of spaceflight, the definition of the CubeSat standard "made it possible to bring production to a level of flexibility and innovation never seen before" [40]. As of September 2022, over 2000 nanosatellites had been launched into orbit, with a record number of 143 satellites launched on a single rocket on board the Transporter-1 mission in January 2021. In the next decade, we expect the number of nanosatellite launches per year to continue rising by a factor of 4–5, leading to dense observation networks in low earth orbit. Innovations in satellite technology, such as miniaturized GNSS receivers [41] and intelligent processing strategies, will further boost the realization of new observation concepts based on nanosatellite technology and the establishment of dense satellite formations in highly interesting yet scarcely explored regions in the earth's atmosphere and beyond.

#### **4.2 The observation geometry**

The multi-frequency signals from over a hundred active GNSS satellites gathered on board each nanosatellite allow for measuring the atmospheric state with unprecedented spatiotemporal resolution. For the proof of concept, we assume a formation of four nanosatellites injected into a polar orbit. The advantage of such a configuration is that we obtain simultaneous radio occultation observations that are closely located [42]. **Figure 2** shows the observation geometry together with the ray paths through the lower 8 km of the atmosphere.

The spacing between the nanosatellites is set to $d_M = 1.9$ deg (approx. 230 km). At an altitude of 550 km, this corresponds to a temporal spacing of about 30 s. Due to the limb-sounding geometry and the high inclination of most nanosatellites, a global distribution can be obtained, with 500–600 radio occultation events per nanosatellite and day assuming a 32-satellite GPS constellation.
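The quoted spacing figures can be checked with a back-of-the-envelope calculation for a circular orbit; the Earth radius and gravitational parameter below are rounded standard values, not taken from the text.

```python
import math

# Back-of-envelope check of the formation spacing: 1.9 deg angular
# separation at 550 km altitude, and the resulting temporal spacing.
R_E = 6371.0   # mean Earth radius [km], rounded standard value
GM = 398600.4  # Earth's gravitational parameter [km^3/s^2]

h = 550.0               # orbital altitude [km]
dM = math.radians(1.9)  # angular spacing between nanosatellites

r = R_E + h        # orbital radius [km]
arc = r * dM       # along-track separation [km]
v = math.sqrt(GM / r)  # circular orbital velocity [km/s]
dt = arc / v       # temporal spacing [s]

print(f"separation: {arc:.0f} km, velocity: {v:.2f} km/s, spacing: {dt:.0f} s")
```

The result reproduces the approx. 230 km separation and ~30 s temporal spacing stated above.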

The angle under which the GNSS satellites are observed is constantly changing. To characterize the observation geometry, we distinguish between two scenarios. In the first scenario, the observation angle is close to 90 deg, i.e. the RO measurements are obtained in the cross-track direction. Consequently, the four nanosatellites yield ray paths that are largely parallel to each other. In the second scenario, the angle is close to 0 deg or 180 deg, which leads to RO measurements in the flight or anti-flight direction. In both cases, a unique observation geometry is obtained in which consecutive observations overlap, as shown in **Figure 2**.

#### **4.3 Tomography case study**

At the time of writing, real GNSS measurements from a dense nanosatellite formation were not available. Thus, for technique demonstration, a closed-loop simulation was carried out using the Weather Research and Forecasting (WRF) model to simulate the atmospheric state along the GNSS radio occultation signals shown in **Figure 2**.

In the first step, the signal paths through the atmosphere were reconstructed every 500 ms using ray-tracing shooting techniques [13] with a step size of 1 km. For each ray point, the wet refractivity ($N_w$) was computed from the WRF temperature ($T$) and water vapor pressure ($e$) fields using Eq. (31)

$$N_w = K_2' \cdot \frac{e}{T} + K_3 \cdot \frac{e}{T^2} \tag{31}$$

where the constant $K_2'$ is given by

$$K_2' = K_2 - K_1 \cdot \frac{M_w}{M_d} \tag{32}$$

**Figure 2.**

*Left: The observation geometry for one GNSS satellite simultaneously observed by four nanosatellites in a string-of-pearls formation. Right: The resulting radio occultation signal paths.*

*Nanosatellites: The Next Big Chapter in Atmospheric Tomography DOI: http://dx.doi.org/10.5772/intechopen.108522*

with $K_1 = 77.689$ K/hPa, $K_2 = 71.2952$ K/hPa, and $K_3 = 375463$ K²/hPa as the refractivity constants, and $M_w$ and $M_d$ as the molar masses of water vapor and dry air, respectively.
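As a numerical check of Eqs. (31) and (32), the sketch below evaluates the wet refractivity for an illustrative temperature and water vapor pressure. The refractivity constants are those quoted above; the molar masses are standard values not given in the text, and the sample $(T, e)$ pair is an arbitrary illustration.

```python
# Wet refractivity from Eqs. (31)-(32).
K1 = 77.689     # [K/hPa]
K2 = 71.2952    # [K/hPa]
K3 = 375463.0   # [K^2/hPa]
M_W = 18.01528  # molar mass of water vapor [g/mol], standard value
M_D = 28.9647   # molar mass of dry air [g/mol], standard value

K2_PRIME = K2 - K1 * M_W / M_D  # Eq. (32), approx. 22.97 K/hPa

def wet_refractivity(T: float, e: float) -> float:
    """Wet refractivity N_w [ppm] from temperature T [K] and water
    vapor pressure e [hPa], Eq. (31)."""
    return K2_PRIME * e / T + K3 * e / T**2

# Illustrative values: T = 283.15 K, e = 10 hPa
print(f"N_w = {wet_refractivity(283.15, 10.0):.1f} ppm")
```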

**Figure 3** (left) shows the resulting wet refractivity distribution in the lowest 8 km of the atmosphere, with a water vapor inversion layer at a height of approximately 2–4 km. By integration along the signal paths, the wet refractivity can be converted into the atmospheric excess phase using Eq. (4). **Figure 3** (right) shows that the inversion layer in the WRF model also propagates into the simulated observations of atmospheric phase excess.

To reconstruct the 2D refractivity fields from the atmospheric excess phase, the area covered by the observations was discretized into area elements with a grid size of 22 km (horizontally) and 0.2 km (vertically). The tomographic processing itself was carried out with the ATom software package [13]. **Table 2** summarizes the major settings.

#### **Figure 3.**

*Left: Weather Research and Forecasting (WRF) model-derived wet refractivity fields [ppm], with the overlaying white lines showing the tangent points of the four RO ray paths through the lower 8 km of the atmosphere. Right: The resulting atmospheric excess phase observations [m], obtained by integration through the wet refractivity field.*


#### **Table 2.**

*Tomography settings applied for the reconstruction of refractivity fields from (simulated) RO observations.*

#### **Figure 4.**

*Top left: Smooth WRF refractivity field used to initialize the tomography solution. Top right: Estimated refractivity field (tomography solution). Bottom left: WRF refractivity field (reference). Bottom right: Closed-loop validation (tomography minus WRF) to assess the performance of the tomography approach.*

The resulting refractivity fields are visualized in **Figure 4**. The upper left plot shows the a priori field, a smooth WRF refractivity field used to initialize the tomography solution. For the computation of the smoothed field, a sliding window filter was applied to the WRF data to remove the inversion layer and thereby reduce the information contained in the initial field. In the upper right plot, the actual tomography solution is shown. By comparison with the WRF reference field (lower left plot), the reconstruction capabilities of the tomography approach can be assessed. The differences between the two models are shown in the lower right plot. Over all voxels, a Root Mean Square Error (RMSE) of 0.9 ppm (21.8%) and a bias of 0.03 ppm were obtained.

Overall, the best solution is obtained within the horizontal range (250 km, 250 km), in which multiple observations overlap and therefore help to stabilize the tomography solution. In this core domain of the tomography model, an RMSE of 0.5 ppm (9.6%) was obtained, a factor of two better than in the outer regions.

#### **5. Conclusions and outlook**

In this chapter, the basic aspects of the remote sensing of the lower earth's atmosphere using tomographic radio occultation methods are addressed. My motivation was to provide an overview of the current achievements in tomographic processing and its potential for the processing of radio occultation measurements collected from very lightweight and power-efficient GNSS sensors on board dense nanosatellite formations. In a number of closed-loop validations, the expected observations have been analyzed and possible processing strategies have been evaluated. Due to the unique observation geometry, combined processing of overlapping radio occultation measurements using tomographic principles is possible and allows the generation of high-resolution cross-sections of the lower atmosphere. Thus, I believe that tomography products have great potential to advance current knowledge, e.g. as a weather analysis tool or as a complementary observation technique for the water vapor distribution, which can be assimilated into operational weather forecast systems. Once the required sensor technology is available, not only the communication industry but also the earth observation community will benefit from new observation concepts based on nanosatellite technology. Whether the proposed observation concept is also suited for the monitoring of the ionosphere has to be evaluated in future studies.

#### **Conflict of interest**

The author declares no conflict of interest.


#### **Author details**

Gregor Moeller ETH Zurich, Institute of Geodesy and Photogrammetry, Zurich, Switzerland

\*Address all correspondence to: gmoeller@ethz.ch

© 2022 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


#### **References**

[1] Radon J. Über die Bestimmung von Funktionen durch Ihre Integralwerte längs gewisser Mannigfaltigkeiten. Berichte über die Verhandlungen der Sächsischen Gesellschaft der Wissenschaften zu Leipzig. 1917;**69**:262-277

[2] Cormack AM. Representation of a function by its line integrals, with some radiological applications. Journal of Applied Physics. 1963;**34**(9):2722-2727. DOI: 10.1063/1.1729798

[3] Hounsfield GN. Computerized transverse axial scanning tomography: Part I. Description of the system. The British Journal of Radiology. 1973;**46**(552):1016-1022. DOI: 10.1259/0007-1285-46-552-1016

[4] Aki K, Christoffersson A, Husebye ES. Determination of the threedimensional seismic structure of the lithosphere. Journal of Geophysical Research. 1977;**82**(2):277-296. DOI: 10.1029/JB082i002p00277

[5] Iyer HM, Hirahara K. Seismic Tomography: Theory and Practice. 1st ed. Dordrecht, Netherlands: Springer; 1993. p. 864. ISBN: 978-0412371905

[6] Abel NH. Auflösung einer mechanischen Aufgabe. Journal für die reine und angewandte Mathematik. 1826; **1**:153-157. DOI: 10.1515/crll.1826.1.153

[7] Ware R, Rocken C, Solheim F, Exner M, Schreiner W, Anthes R, et al. GPS sounding of the atmosphere from low earth orbit: Preliminary results. Bulletin of the American Meteorological Society. 1996;**77**:19-40. DOI: 10.1175/1520-0477(1996)077<0019:GSOTAF>2.0.CO;2

[8] Schmidt T, Wickert J, Marquardt C, Beyerle G, Reigber C, Galas R, et al. GPS radio occultation with CHAMP: An innovative remote sensing method of the atmosphere. Advances in Space Research. 2004;**33**(7):1036-1040. DOI: 10.1016/S0273-1177(03)00591-X

[9] Schreiner WS, Weiss JP, Anthes RA, Braun J, Chu V, Fong J, et al. COSMIC-2 radio occultation constellation: First results. Geophysical Research Letters. 2020;**47**:7. DOI: 10.1029/2019GL086841

[10] Moeller G, Landskron D. Atmospheric bending effects in GNSS tomography. Atmospheric Measurement Techniques. 2019;**12**:23-34. DOI: 10.5194/amt-12-23-2019

[11] Perler D. Water vapor tomography using global navigation satellite systems [doctoral thesis]. ETH Zurich: Institute of Geodesy and Photogrammetry. 2011

[12] Ding N, Zhang SB, Wu SQ, Wang X. Adaptive node parameterization for dynamic determination of boundaries and nodes of GNSS tomographic models. Journal of Geophysical Research Atmospheres. 2018;**123**(4):1990-2003. DOI: 10.1002/2017JD027748

[13] Moeller G. Reconstruction of 3D wet refractivity fields in the lower atmosphere along bended GNSS signal paths [doctoral thesis]. TU Wien: Department of Geodesy and Photogrammetry. 2017. DOI: 10.34726/hss.2017.21443

[14] Kaczmarz S. Angenäherte Auflösung von Systemen linearer Gleichungen. Bulletin International de l' Académie Polonaise des Sciences et des Lettres. 1937;**35**:355-357

[15] Bender M, Dick G, Ge M, Deng Z, Wickert J, Kahle H-G, et al. Development of a GNSS water vapour tomography system using algebraic reconstruction techniques. Advances in Space Research. 2011;**47**(10):1704-1720. DOI: 10.1016/j.asr.2010.05.034

[16] Stolle C. Three-dimensional imaging of ionospheric electron density fields using GPS observations at the ground and onboard the CHAMP satellite. [doctoral thesis]. Universität Leipzig: Institut für Meteorologie. 2014

[17] Jin S, Park JU. GPS ionospheric tomography: A comparison with the IRI-2001 model over South Korea. Earth, Planets and Space. 2007;**59**(4):287-292. DOI: 10.1186/BF03353106

[18] Gordon R, Bender R, Herman GT. Algebraic reconstruction technique (ART) for three-dimensional electron microscopy and X-ray photography. Journal of Theoretical Biology. 1970;**29**(3):471-481. DOI: 10.1016/0022-5193(70)90109-8

[19] Gilbert PFC. Iterative methods for three-dimensional reconstruction of an object from its projections. Journal of Theoretical Biology. 1972;**36**(1):105-117. DOI: 10.1016/0022-5193(72)90180-4

[20] Moore EH. On the reciprocal of the general algebraic matrix. Bulletin of American Mathematical Society. 1920; **26**:394-395

[21] Penrose R. A generalized inverse for matrices. Proceedings of the Cambridge Philosophical Society. 1955;**51**(3):406-413. DOI: 10.1017/S0305004100030401

[22] Strang G, Borre K. Linear Algebra, Geodesy, and GPS. 1st ed. Wellesley, Massachusetts, USA: Wellesley-Cambridge Press; 1997. p. 624. ISBN: 978-0961408862

[23] Hansen PC. The L-curve and its use in the numerical treatment of inverse problems. In: Computational Inverse Problems in Electrocardiology. Vol. 4. Ashurst, UK: WIT Press; 2000. pp. 119-142

[24] Flores A. Atmospheric tomography using satellite radio signals [doctoral thesis]. Universitat Politecnica de Catalunya: Departament de Teoria del Senyal i Comunicacions. 1999

[25] Tikhonov AN. Solution of incorrectly formulated problems and the regularization method. Soviet Mathematics Doklady. 1963;**4**:1035-1038

[26] Elden L. Algorithms for the regularization of ill-conditioned least squares problems. BIT. 1977;**17**(2): 134-145. DOI: 10.1007/BF01932285

[27] WMO. Guide to Meteorological Instruments and Methods of Observation. 7th ed. Geneva, Switzerland: Secretariat of the World Meteorological Organization; 2008. ISBN: 978-9263100085

[28] Steiner AK, Löscher A, Kirchengast G. Error characteristics of refractivity profiles retrieved from CHAMP radio occultation data. In: Atmosphere and Climate. Berlin Heidelberg: Springer; 2006. pp. 27-36. DOI: 10.1007/3-540-34121-8

[29] Moeller G, Ao C, Mannucci T. Tomographic radio occultation methods applied to a dense cubesat formation in low Mars orbit. Radio Science. 2019; **56**(7):1-10. DOI: 10.1029/2020RS007199

[30] Fishbach FF. A satellite method for pressure and temperature below 24 km. Bulletin of the American Meteorological Society. 1965;**46**(9):528-532. DOI: 10.1175/1520-0477-46.9.528

[31] Phinney RA, Anderson DL. On the radio occultation method for studying planetary atmospheres. Journal of Geophysical Research. 1968;**73**(5):1819-1827. DOI: 10.1029/JA073i005p01819

[32] Rangaswamy S. Recovery of atmospheric parameters from the Apollo/Soyuz-ATS-F radio occultation data. Geophysical Research Letters. 1976;**3**(8):483-486. DOI: 10.1029/GL003i008p00483

[33] Gorbunov ME. Three-dimensional satellite refractive tomography of the atmosphere: Numerical simulation. Radio Science. 1996;**31**(1):95-104. DOI: 10.1029/95RS01353

[34] Xu G. GPS Theory, Algorithms and Applications. 2nd ed. Berlin Heidelberg: Springer-Verlag; 2007. DOI: 10.1007/978-3-540-72715-6

[35] Svehla D. Geometrical Theory of Satellite Orbits and Gravity Field. 1st ed. Cham, Switzerland: Springer International Publishing; 2018. DOI: 10.1007/978-3-319-76873-1

[36] Kursinski ER, Hajj GA, Schofield JT, Linfield RP, Hardy KR. Observing earth's atmosphere with radio occultation measurements using the global positioning system. Journal of Geophysical Research. 1997;**102**(D19):23429-23465. DOI: 10.1029/97JD01569

[37] Aragon B, Houborg R, Tu K, Fisher JB, McCabe M. CubeSats enable high spatiotemporal retrievals of crop-water use for precision agriculture. Remote Sensing. 2018;**10**(12):1867. DOI: 10.3390/ rs10121867

[38] Douglas E, Cahoy KL, Morgan RE, Knapp M. CubeSats for astronomy and astrophysics. Bulletin of the AAS. 2019; **51**(7):1-6

[39] Curzi G, Modenini D, Tortora P. Large constellations of small satellites: A survey of near future challenges and missions. Aerospace. 2020;**7**(9):133. DOI: 10.3390/aerospace7090133

[40] de Carvalho RA, Estela J, Langer M. Nanosatellites: Space and Ground Technologies, Operations and Economics. Toronto, Canada: John Wiley & Sons; 2020. p. xxxv. ISBN: 978-1119042051

[41] Moeller G, Rothacher M, Sonnenberg F, Wolf A. A high-precision commercial off-the-shelf GNSS payload board for nanosatellite orbit determination and timing. Proceedings of the 44th COSPAR Scientific Assembly 16-24 July 2022, online

[42] Turk FJ, Padulles R, Ao CO, de la Torre JM, Wang KN, Franklin GW. Benefits of a closely-spaced satellite constellation of atmospheric polarimetric radio occultation measurements. Remote Sensing. 2019; **11**(20):1-19. DOI: 10.3390/rs11202399

Section 2

## Some Computational Aspects

#### **Chapter 4**

## Numerical Gradient Computation for Simultaneous Detection of Geometry and Spatial Random Fields in a Statistical Framework

*Michael Conrad Koch, Kazunori Fujisawa and Akira Murakami*

#### **Abstract**

The target of this chapter is the evaluation of gradients in inverse problems where spatial field parameters and geometry parameters are treated separately. Such an approach can be beneficial especially when the geometry needs to be detected accurately using *L*2-norm-based regularization. Emphasis is laid upon the computation of the gradients directly from the governing equations. Working in a statistical framework, the Karhunen-Loève (K-L) expansion is used for discretization of the spatial random field and inversion is done using the gradient-based Hamiltonian Monte Carlo (HMC) algorithm. The HMC gradients involve sensitivities w.r.t the random spatial field and geometry parameters. Building on a method developed by the authors, a procedure is developed which considers the gradients of the associated integral eigenvalue problem (IEVP) as well as the interaction between the gradients w.r.t random spatial field parameters and the gradients w.r.t the geometry parameters. The same mesh and linear shape functions are used in the finite element method employed to solve the forward problem, the artificial elastic deformation problem and the IEVP. Analysis of the rate of convergence using seven different meshes of increasing density indicates a linear rate of convergence of the gradients of the log posterior.

**Keywords:** sensitivity analysis, geometry detection, random fields, Hamiltonian Monte Carlo, inverse problems

#### **1. Introduction**

Accurate computation of gradients w.r.t the parameters of interest is a key aspect of deterministic algorithms like Gauss-Newton, Levenberg-Marquardt, and Occam's inversion [1], as well as statistical algorithms like Hamiltonian Monte Carlo (HMC) [2]. Common nonintrusive methods like finite differences compute the gradient by taking differences between the response at the current model and at a perturbed model; however, such methods suffer from certain drawbacks. Two types of errors stand out in particular:

numerical error involved with truncation of Taylor's series and round-off error involved with finite precision arithmetic of computers [3]. For more robust analysis, this chapter focuses on methods where the gradient is computed directly from the analytical/numerical model by enlisting the sensitivity equations.
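The trade-off between these two error sources can be seen in a few lines: with a central difference, the truncation error shrinks as the step $h$ decreases, until round-off error in the function evaluations starts to dominate. The function and step sizes below are illustrative choices, not from the chapter.

```python
import math

# Truncation vs. round-off error of a finite-difference gradient,
# using a central difference of f(x) = exp(x) at x = 1, where the
# exact derivative is also exp(1).
def central_diff(f, x, h):
    """Second-order central difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

exact = math.exp(1.0)
errs = {h: abs(central_diff(math.exp, 1.0, h) - exact) for h in (1e-1, 1e-5, 1e-13)}
for h, err in sorted(errs.items(), reverse=True):
    print(f"h = {h:.0e}: absolute error = {err:.2e}")
```

The moderate step is far more accurate than both the large step (truncation-dominated) and the tiny step (round-off-dominated), illustrating why a naive choice of perturbation size can spoil a finite-difference gradient.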

The type of solutions that can be obtained from inverse problems is guided by the regularization term. When the accurate detection of the geometry of embedded objects (or detection of discontinuities) is of interest, *L*1-norm-based difference priors have proven to be useful [4, 5]. However, special techniques need to be used to accommodate the nondifferentiability of the *L*1-norm [1]. Hence, to make use of a large library of adjoint-based inversion solvers (that use gradients w.r.t the parameters), *L*2-norm-based priors which readily allow for differentiation are more popular in practice. However, *L*2-norm regularization or Gaussian priors in a stochastic sense only admit smooth solutions. Hence, in order to still be able to capture discontinuities, this chapter explicitly parameterizes the shape of the boundary (or the geometry of the domain). This approach thereby considers two sets of parameters, one related to the spatial field and the second to the geometry. This approach is of course only applicable when the unknown geometry can be parameterized explicitly.

Gradients have to be computed w.r.t both spatial as well as geometry parameters. While sensitivity analysis of spatial parameters is usually straightforward, computation of the gradients w.r.t geometry parameters [6, 7] needs to be done more carefully and consider aspects like mesh distortion. This chapter builds on the method presented in [8] and briefly details the simultaneous spatial field and geometry update within the HMC statistical framework. Similar to [8], the Karhunen-Loève (K-L) expansion is used for discretization of the random field, with the difference that the complete theoretical basis is considered. The complete integral eigenvalue problem or IEVP is solved and the associated gradients are computed. The interaction between the geometry and spatial parameters due to the domain of definition of the IEVP is also detailed. It should be noted that the procedure detailed above is applicable to both the direct differentiation and adjoint methods [9] of sensitivity analysis.

The entire numerical study is done with a focus on aspects related to gradients and not with the aim of solving the inverse problem. Nevertheless, a forward model is still required for the computation of gradients. The chapter begins with a description of the forward model, observation equation, and the discretization of the spatial random field in Section 2. This is followed in Section 3 by a brief description of the HMC-based methods developed in [8, 10] for simultaneous spatial field and geometry detection. Section 3.3 introduces the new gradients obtained when the complete IEVP is considered. It is shown how the gradients of the eigenvectors involve the computation of a Moore-Penrose pseudoinverse. Finally, the gradient computation procedure is validated and a convergence study is carried out in Section 4.

#### **2. Inversion preliminaries**

#### **2.1 Governing and observation equations**

Consider a linear steady seepage flow problem defined on a domain $\mathbf{z} \in \Omega \subset \mathbb{R}^d$, $d \in \{2, 3\}$, where $\mathbf{k}(\mathbf{z})$ is a symmetric, spatially varying hydraulic conductivity matrix, $h(\mathbf{z})$ is the hydraulic head, and $Q(\mathbf{z})$ is a source term, as shown in Eq. (1) below

*Numerical Gradient Computation for Simultaneous Detection of Geometry and Spatial Random… DOI: http://dx.doi.org/10.5772/intechopen.108363*

$$\nabla \cdot \left( -\mathbf{k}(\mathbf{z}) \nabla h(\mathbf{z}) \right) = Q(\mathbf{z}). \tag{1}$$

Standard Dirichlet ($h = \overline{h}$ on $\Gamma_D$) and Neumann ($f_n = \mathbf{f} \cdot \mathbf{n} = -\mathbf{k} \nabla h \cdot \mathbf{n}$ on $\Gamma_N$) boundary conditions are applied. Following a spatial discretization of the weak form of the PDE using the finite element method, the governing equation can be written as:

$$\mathbf{K}(\boldsymbol{\theta})\mathbf{h}=\mathbf{q},\tag{2}$$

where $\mathbf{K}$ is the global hydraulic conductivity matrix and $\mathbf{h}$ and $\mathbf{q}$ are the nodal hydraulic head and flux vectors, respectively. The parameter vector $\boldsymbol{\theta} = \left({}^1\boldsymbol{\theta}, {}^2\boldsymbol{\theta}\right) \in \mathbb{R}^K$ in Eq. (2) includes all the unknowns related to the inverse problem. In this study, the unknowns are divided into two sets [8]: ${}^1\boldsymbol{\theta} \in \mathbb{R}^{K_1}$, related to the spatial discretization of a random field, and ${}^2\boldsymbol{\theta} \in \mathbb{R}^{K_2}$, related to the definition of the geometry of the domain $\Omega$. Also, consider an observation model relating the state vector $\mathbf{m} = (\mathbf{h}, \mathbf{q})$ to the discrete observations $\mathbf{y}$ through a map $\mathbf{H}$ that is independent of $\boldsymbol{\theta}$, i.e.

$$\mathbf{y} = \mathbf{H}\mathbf{m}(\boldsymbol{\theta}) + \mathbf{r}.\tag{3}$$

The error in Eq. (3) is modeled as Gaussian, $\mathbf{r} \sim \mathcal{N}(\mathbf{0}, \mathbf{R})$, with a known covariance matrix $\mathbf{R}$. Parameter estimation is done in a probabilistic sense using Bayesian inference. Starting with a Gaussian prior distribution $p(\boldsymbol{\theta}) = \mathcal{N}(\boldsymbol{\theta} \mid \mathbf{0}, \boldsymbol{\Sigma}_{\boldsymbol{\theta}})$ and a likelihood distribution $p(\mathbf{y} \mid \boldsymbol{\theta}) = \mathcal{N}(\mathbf{y} \mid \mathbf{H}\mathbf{m}(\boldsymbol{\theta}), \mathbf{R})$, the posterior distribution is written as:

$$p(\boldsymbol{\theta}|\mathbf{y}) \propto p(\mathbf{y}|\boldsymbol{\theta})p(\boldsymbol{\theta}).\tag{4}$$

Except for linear Gaussian observation models, the posterior cannot be computed analytically and is usually evaluated using MCMC sampling algorithms.
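Since the posterior in Eq. (4) is known only up to a normalizing constant, MCMC methods sample it directly from the unnormalized density. The following minimal random-walk Metropolis sketch illustrates the idea on a toy 1D Gaussian target; the target, step size, and sample counts are illustrative assumptions, not the chapter's method (which uses the gradient-based HMC introduced in Section 3).

```python
import math
import random

# Toy target: a 1D Gaussian N(2, 0.5^2) standing in for the unnormalized
# posterior p(theta|y) of Eq. (4); all numbers here are illustrative.
def log_post(theta: float) -> float:
    return -0.5 * ((theta - 2.0) / 0.5) ** 2  # log-density up to a constant

def metropolis(n_samples: int, step: float = 0.8, seed: int = 0) -> list:
    """Random-walk Metropolis: propose theta' ~ N(theta, step^2) and accept
    with probability min(1, p(theta'|y) / p(theta|y))."""
    rng = random.Random(seed)
    theta, lp = 0.0, log_post(0.0)
    samples = []
    for _ in range(n_samples):
        prop = theta + rng.gauss(0.0, step)
        lp_prop = log_post(prop)
        if math.log(rng.random()) < lp_prop - lp:  # accept/reject step
            theta, lp = prop, lp_prop
        samples.append(theta)
    return samples

draws = metropolis(20000)[2000:]  # discard burn-in
mean = sum(draws) / len(draws)
print(f"posterior mean estimate: {mean:.2f} (true value: 2.00)")
```

Only ratios of the target density enter the accept/reject step, which is why the unknown normalizing constant in Eq. (4) never needs to be computed.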

#### **2.2 Karhunen-Loève (K-L) expansion**

The parameter vector ${}^1\boldsymbol{\theta}$ defined in Section 2.1 is associated with a continuous hydraulic conductivity spatial random field $k(\mathbf{z}, \omega)$, where $\mathbf{z}$ is defined on the domain $\Omega$ and $\omega$ belongs to the space of random events $\Theta$. Let the expected value of the random field be denoted as $\overline{k} : \Omega \to \mathbb{R}$ and the autocovariance function $C : \Omega \times \Omega \to \mathbb{R}$ be defined as $C(\mathbf{z}, \mathbf{z}') = \sigma(\mathbf{z})\,\sigma(\mathbf{z}')\,\rho(\mathbf{z}, \mathbf{z}')$. Here $\sigma : \Omega \to \mathbb{R}$ is the standard deviation function and $\rho : \Omega \times \Omega \to [-1, 1]$ is the autocorrelation coefficient function. The study in this chapter is confined to Gaussian random fields, which are defined completely by their mean and autocovariance functions.

The Karhunen-Loève (K-L) expansion is a series expansion method for the discretization of random fields, based on the spectral decomposition of the autocovariance function. It can be shown that a random field can be written as an infinite sum [11]:

$$k(\mathbf{z}, \omega) = \overline{k}(\mathbf{z}) + \sum_{k=1}^{\infty} \sqrt{\lambda_k}\, {}^1\theta_k\, \phi_k(\mathbf{z}), \tag{5}$$

where ${}^1\theta_k : \Theta \to \mathbb{R}$ are standard uncorrelated random variables, $\lambda_k$ are the eigenvalues (always non-negative), and $\phi_k(\mathbf{z})$ are the eigenfunctions of the linear operator related to the covariance kernel $C$. They can be obtained by solving the homogeneous Fredholm integral eigenvalue problem (IEVP) on the domain $\Omega$:

$$\int_{\Omega} C(\mathbf{z}, \mathbf{z}')\, \phi_k(\mathbf{z}')\, \mathrm{d}\mathbf{z}' = \lambda_k \phi_k(\mathbf{z}). \tag{6}$$

The autocovariance function is symmetric, bounded, and positive semi-definite and has the spectral decomposition $C(\mathbf{z}, \mathbf{z}') = \sum_{k=1}^{\infty} \lambda_k \phi_k(\mathbf{z})\, \phi_k(\mathbf{z}')$. The eigenfunctions are orthogonal and, in normalized form, satisfy the condition $\int_{\Omega} \phi_k(\mathbf{z})\, \phi_l(\mathbf{z})\, \mathrm{d}\mathbf{z} = \delta_{kl}$, where $\delta_{kl}$ is the Kronecker delta. In the case of Gaussian random fields, the random variables ${}^1\theta_k$ are also independent and follow the standard normal distribution.

In practice, the eigenvalues decay exponentially fast for smooth autocovariance kernels and algebraically fast for non-smooth kernels, and the K-L expansion is usually truncated after $K_1$ terms. If the eigenvalues are arranged in descending order such that $\lambda_1 > \lambda_2 > \ldots > \lambda_{K_1}$, then, together with the associated eigenfunctions, the truncated K-L expansion approximation of the random field can be written as

$$
\hat{k}(\mathbf{z}, \omega) = \overline{k}(\mathbf{z}) + \sum_{k=1}^{K_1} \sqrt{\lambda_k}\, {}^1\theta_k\, \phi_k(\mathbf{z}). \tag{7}
$$

The truncated K-L expansion approximation is optimal in the sense that, for a fixed number of terms $K_1$, the mean square error over the domain is minimized [12]. A global error measure related to random field discretization is the *mean error variance* $\overline{\varepsilon}_{\sigma}$, defined as [13]:

$$\overline{\varepsilon}_{\sigma} = \frac{1}{|\Omega|} \int_{\Omega} \frac{\mathrm{Var}\left[k(\mathbf{z}, \omega) - \hat{k}(\mathbf{z}, \omega)\right]}{\mathrm{Var}\left[k(\mathbf{z}, \omega)\right]}\, \mathrm{d}\mathbf{z}. \tag{8}$$

It can be shown that the variance of the truncated K-L expansion $\hat{k}(\mathbf{z}, \omega)$ is:

$$\mathrm{Var}\left[\hat{k}(\mathbf{z}, \omega)\right] = \sum_{k=1}^{K_1} \lambda_k \phi_k^2(\mathbf{z}). \tag{9}$$

Using the property $\mathrm{E}\left[{}^1\theta_k\, {}^1\theta_l\right] = \delta_{kl}$, the mean error variance can be calculated as [14]:

$$\overline{\varepsilon}_{\sigma,\mathrm{KL}} = 1 - \frac{1}{|\Omega|\,\sigma^2} \sum_{k=1}^{K_1} \lambda_k. \tag{10}$$

The derivation of Eq. (10) assumes the random field to be homogeneous, i.e., $\sigma(\mathbf{z}) = \sigma$. We only consider the case where the prior random field is homogeneous and Gaussian. This assumption is made for numerical convenience and does not limit the posterior, which can be non-Gaussian and nonhomogeneous [15].
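As a numerical illustration of Eq. (10), the sketch below discretizes a hypothetical 1D exponential autocovariance kernel on $[0, 1]$ and evaluates the mean error variance for increasing truncation order $K_1$. The domain, $\sigma$, and correlation length are assumed values, and a simple Nyström-type grid approximation of the IEVP stands in for the Galerkin solution described next.

```python
import numpy as np

# Truncated K-L expansion on a 1D domain [0, 1] with an exponential
# autocovariance C(z, z') = sigma^2 * exp(-|z - z'| / l); illustrative
# parameters, not taken from the chapter.
sigma, corr_len = 1.0, 0.3
n = 400
z = np.linspace(0.0, 1.0, n)
dz = z[1] - z[0]

C = sigma**2 * np.exp(-np.abs(z[:, None] - z[None, :]) / corr_len)

# Nystrom-type approximation of the IEVP (Eq. 6): eigenvalues of C scaled
# by the quadrature weight dz approximate the K-L eigenvalues lambda_k.
eigvals = np.linalg.eigvalsh(C)[::-1] * dz  # descending order

domain = 1.0  # |Omega|
def mean_error_variance(K1: int) -> float:
    """Eq. (10): fraction of variance lost by truncating after K1 terms."""
    return 1.0 - eigvals[:K1].sum() / (domain * sigma**2)

for K1 in (1, 5, 20):
    print(f"K1 = {K1:2d}: mean error variance = {mean_error_variance(K1):.3f}")
```

The error measure decreases monotonically with $K_1$, reflecting the eigenvalue decay that justifies truncating the expansion after a small number of terms.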


#### **2.3 Galerkin finite element method to solve the integral eigenvalue problem**

The Galerkin Finite Element Method (FEM) is used to solve the IEVP on $\Omega$. The eigenfunctions are approximated with the help of the shape functions $N_j : \Omega \to \mathbb{R}$ of the FE mesh and are represented as:

$$\phi_k(\mathbf{z}) \approx \sum_{j=1}^{n} d_{kj} N_j(\mathbf{z}), \tag{11}$$

where the coefficients $d_{kj} \in \mathbb{R}$ are unknown and $n$ is the number of nodes in the FE mesh. Substitution of Eq. (11) into Eq. (6) yields the residual:

$$r(\mathbf{z}) = \sum\_{j=1}^{n} d\_{kj} \left( \int\_{\Omega} C(\mathbf{z}, \ \mathbf{z}') N\_j(\mathbf{z}')\, \mathrm{d}\mathbf{z}' - \lambda\_k N\_j(\mathbf{z}) \right). \tag{12}$$

In the Galerkin method, the unknown coefficients are determined by making the residual $r(\mathbf{z})$ orthogonal to the space spanned by the shape functions, i.e.,

$$\int\_{\Omega} r(\mathbf{z}) N\_i(\mathbf{z}) \, \mathrm{d}\mathbf{z} = 0 \quad \forall i = 1, \ldots, n. \tag{13}$$

This results in a generalized eigenvalue problem

$$\mathbf{B}\mathbf{d}\_{k} = \lambda\_{k}\mathbf{M}\mathbf{d}\_{k},\tag{14}$$

where

$$B\_{ij} = \int\_{\Omega} N\_i(\mathbf{z}) \int\_{\Omega} C(\mathbf{z}, \mathbf{z}') N\_j(\mathbf{z}')\, \mathrm{d}\mathbf{z}'\, \mathrm{d}\mathbf{z} \quad \text{and} \quad M\_{ij} = \int\_{\Omega} N\_i(\mathbf{z}) N\_j(\mathbf{z})\, \mathrm{d}\mathbf{z}. \tag{15}$$

Both **B** and **M** are $n \times n$ matrices that involve integrals over the domain Ω. Hence the actual geometry of the domain has to be considered for integration. The maximum number of available eigenpairs is *n*, but in practice, the K-L expansion can usually be truncated at $K_1$ terms such that $K_1 \ll n$. As such, for computational efficiency, it is sufficient to compute the first $K_1$ eigenpairs only, which can be done through the Lanczos algorithm.

#### **3. Simultaneous geometry and spatial field detection**

#### **3.1 Hamiltonian Monte Carlo**

Consider a parameter space $\boldsymbol{\theta} \in \mathbb{R}^K$ augmented with equidimensional momentum variables $\mathbf{p} \in \mathbb{R}^K$ and a joint probability distribution with density $p(\boldsymbol{\theta}, \mathbf{p})$ defined over this augmented space. If the underlying distribution over the momentum variables is chosen to be a Gaussian, $p(\mathbf{p}) = \mathcal{N}(\mathbf{p} \mid \mathbf{0}, \mathbf{M})$, where **M** is user-specified and independent of $\boldsymbol{\theta}$, then a joint probability distribution can be defined as $p(\boldsymbol{\theta}, \mathbf{p}) = p(\mathbf{p})\, p(\boldsymbol{\theta} \mid \mathbf{y})$. The Hamiltonian $H : \mathbb{R}^K \times \mathbb{R}^K \to \mathbb{R}$ is then defined as:

$$H(\mathbf{\theta}, \mathbf{p}) = -\log p(\mathbf{p}) - \log p\left(\mathbf{\theta}|\mathbf{y}\right) \tag{16}$$

The second term on the right of Eq. (16) is generally called the potential energy $\varphi : \mathbb{R}^K \to \mathbb{R}$ and is given as $\varphi(\boldsymbol{\theta}) = -\log p(\boldsymbol{\theta} \mid \mathbf{y})$.

The introduction of the momentum variables allows for the generation of trajectories through conservative Hamiltonian dynamics [16], which are given as:

$$\begin{pmatrix} \dfrac{\mathrm{d}\boldsymbol{\theta}}{\mathrm{d}t} \\ \dfrac{\mathrm{d}\mathbf{p}}{\mathrm{d}t} \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \begin{pmatrix} \dfrac{\partial H}{\partial \boldsymbol{\theta}} \\ \dfrac{\partial H}{\partial \mathbf{p}} \end{pmatrix} = \begin{pmatrix} \mathbf{M}^{-1}\mathbf{p} \\ -\dfrac{\partial \varphi(\boldsymbol{\theta})}{\partial \boldsymbol{\theta}} \end{pmatrix}.\tag{17}$$

These dynamics are exactly reversible (provided the gradient $\partial \varphi(\boldsymbol{\theta})/\partial \boldsymbol{\theta}$ is one-to-one) and preserve volume, as Eq. (17) is just a rotation transformation in $(\boldsymbol{\theta}, \mathbf{p})$ space. Except for simple problems, Eq. (17) cannot be solved analytically and is usually integrated with the leapfrog method, a second-order accurate numerical integrator given as:

$$\mathbf{p}\left(t+\frac{\varepsilon}{2}\right) = \mathbf{p}(t) - \frac{\varepsilon}{2}\frac{\partial\varphi(\boldsymbol{\theta}(t))}{\partial\boldsymbol{\theta}},\tag{18}$$

$$\boldsymbol{\theta}(t+\varepsilon) = \boldsymbol{\theta}(t) + \varepsilon \mathbf{M}^{-1} \mathbf{p}\left(t + \frac{\varepsilon}{2}\right) \text{ and} \tag{19}$$

$$\mathbf{p}(t+\varepsilon) = \mathbf{p}\left(t+\frac{\varepsilon}{2}\right) - \frac{\varepsilon}{2} \frac{\partial \varphi(\boldsymbol{\theta}(t+\varepsilon))}{\partial \boldsymbol{\theta}}.\tag{20}$$

Starting from a point $(\boldsymbol{\theta}_j, \mathbf{p}_j)$**,** these equations are applied repeatedly for *L* steps, each with a step-size *ϵ*, to determine a transition to a new point $(\boldsymbol{\theta}_{j+1}, \mathbf{p}_{j+1})$, which lies on the same Hamiltonian level-set as $(\boldsymbol{\theta}_j, \mathbf{p}_j)$. The deterministic part of Hamiltonian Monte Carlo (HMC) [2] is defined by Eqs. (18)–(20). The stochastic part of HMC comes from resampling $\mathbf{p} \sim \mathcal{N}(\mathbf{0}, \mathbf{M})$. The statistical efficiency of HMC stems from the fact that the gradient-guided transitions can propose new points that are "far away" from the starting point, thereby enabling efficient sampling of the posterior. This is in contrast to the random-walk transitions of conventional MCMC algorithms, which suffer from the curse of dimensionality [17]. As Eqs. (18)–(20) show, the computation of the gradient $\partial \varphi(\boldsymbol{\theta})/\partial \boldsymbol{\theta}$ is critical to the success of HMC. Special attention must be paid to maintaining the reversibility of the transitions to satisfy the detailed balance condition [18] for MCMC algorithms. This is detailed along with the gradient computation procedure in the following sections.
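The leapfrog-plus-resampling scheme above can be sketched in a few lines. The target below is an assumed standard-normal posterior, so $\varphi(\boldsymbol{\theta}) = \tfrac{1}{2}\boldsymbol{\theta}^T\boldsymbol{\theta}$ and $\mathbf{M} = \mathbf{I}$; the Metropolis correction accounts for the leapfrog integration error:

```python
import numpy as np

# Minimal HMC sketch of Eqs. (16)-(20) for an assumed standard-normal
# target, phi(theta) = 0.5 * theta.theta, with mass matrix M = I.
rng = np.random.default_rng(0)

def grad_phi(theta):                       # d(phi)/d(theta) for the toy target
    return theta

def leapfrog(theta, p, eps, L):
    """Eqs. (18)-(20): half step in p, L full steps, closing half step."""
    p = p - 0.5 * eps * grad_phi(theta)
    for i in range(L):
        theta = theta + eps * p            # M = I, so M^{-1} p = p
        if i < L - 1:
            p = p - eps * grad_phi(theta)
    p = p - 0.5 * eps * grad_phi(theta)
    return theta, p

def hmc_step(theta, eps=0.2, L=10):
    p = rng.standard_normal(theta.shape)   # stochastic part: p ~ N(0, I)
    H0 = 0.5 * p @ p + 0.5 * theta @ theta # Eq. (16) for the toy target
    theta_new, p_new = leapfrog(theta, p, eps, L)
    H1 = 0.5 * p_new @ p_new + 0.5 * theta_new @ theta_new
    # Metropolis correction for the leapfrog integration error.
    return theta_new if np.log(rng.uniform()) < H0 - H1 else theta

theta, samples = np.zeros(2), []
for _ in range(2000):
    theta = hmc_step(theta)
    samples.append(theta)
samples = np.array(samples)
assert abs(samples.mean()) < 0.2 and abs(samples.var() - 1.0) < 0.3
```

In the chapter's application, `grad_phi` is replaced by the nontrivial gradient $\partial \varphi(\boldsymbol{\theta})/\partial \boldsymbol{\theta}$ assembled in Section 3.3.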

#### **3.2 Parameter update using the mesh moving method**

The leapfrog equations determine an update in $(\boldsymbol{\theta}, \mathbf{p})$ space. In particular, the update $\boldsymbol{\theta}(t) \to \boldsymbol{\theta}(t+\epsilon)$ in Eq. (19) defines not only a new realization of the random field, but also a new domain, i.e., $\Omega({}^2\boldsymbol{\theta})$. Without loss of generality, consider a domain in 2D discretized such that $\mathbf{Z}({}^2\boldsymbol{\theta}) \in \mathbb{R}^{n \times 2}$ is the nodal coordinate vector of all the nodes of the finite element mesh. Let $\mathbf{Z}_v({}^2\boldsymbol{\theta})$ represent a subset of this vector that includes only the node coordinates at the piping zone boundary. Koch et al. [10] show that the computation of the gradient $\partial \varphi(\boldsymbol{\theta})/\partial \boldsymbol{\theta}$ by analytical methods [6] ultimately involves the computation of the gradient of the nodal coordinate vector $\partial \mathbf{Z}(\boldsymbol{\theta})/\partial \boldsymbol{\theta}$.

Computation of the nodal coordinate vector gradient requires the definition of a differentiable map, which is additionally reversible and one-to-one, to satisfy the detailed balance condition of MCMC. One such map, proposed in [10], is to update from an arbitrarily fixed reference domain $\Omega^{ref}({}^2\boldsymbol{\theta}^{ref})$, defined by an arbitrary parameter ${}^2\boldsymbol{\theta}^{ref}$. Let $\mathbf{Z}^{ref}({}^2\boldsymbol{\theta}^{ref}) \in \mathbb{R}^{n \times 2}$ be the nodal coordinate vector on the discretized domain $\Omega^{ref}$ and $\mathbf{Z}_v^{ref}({}^2\boldsymbol{\theta}^{ref}) \in \mathbb{R}^{n_v \times 2}$ represent a subset of this vector that includes only the coordinates of the $n_v$ nodes at the piping zone boundary. The nodal coordinates $\mathbf{Z}_v({}^2\boldsymbol{\theta})$ and $\mathbf{Z}_v^{ref}({}^2\boldsymbol{\theta}^{ref})$ can be determined explicitly if ${}^2\boldsymbol{\theta}$ and ${}^2\boldsymbol{\theta}^{ref}$, respectively, are known. Following the update ${}^2\boldsymbol{\theta}(t) \to {}^2\boldsymbol{\theta}(t+\varepsilon)$, $\mathbf{Z}_v({}^2\boldsymbol{\theta}(t+\varepsilon))$ is available, and an artificial elastic deformation problem can be set up from the arbitrary known reference domain $\Omega^{ref}({}^2\boldsymbol{\theta}^{ref})$ to the current domain $\Omega({}^2\boldsymbol{\theta}(t+\varepsilon))$. The prescribed displacements $\mathbf{u}_v^{ref} \in \mathbb{R}^{n_v \times 2}$ for the elastic deformation problem are given as:

$$\mathbf{u}\_{v}^{ref} = \mathbf{Z}\_{v}\left({}^{2}\boldsymbol{\theta}(t+\varepsilon)\right) - \mathbf{Z}\_{v}^{ref}\left({}^{2}\boldsymbol{\theta}^{ref}\right). \tag{21}$$

The entire mesh is moved and the new nodal coordinates can be determined as:

$$\mathbf{Z}\left({}^{2}\boldsymbol{\theta}(t+\varepsilon)\right) = \mathbf{Z}^{ref}\left({}^{2}\boldsymbol{\theta}^{ref}\right) + \mathbf{u}^{ref},\tag{22}$$

where $\mathbf{u}^{ref} \in \mathbb{R}^{n \times 2}$ represents the displacement of all the nodes from the reference domain to the current domain.

The displacements in the elastic deformation step can cause distortions in the mesh, especially in regions where large deformation is expected, i.e., near the piping zone boundary. To maintain a good mesh quality for computation purposes, a mesh moving method [19] is used. The simple idea is to scale the elastic modulus $E_e^{ref}$ of each element (in the reference domain) with the determinant of the Jacobian $\mathbf{J}_e^{ref}$ in the elastic deformation step:

$$\bar{E}\_{e}^{ref} = E\_{e}^{ref} \left( 1 / \left| \mathbf{J}\_{e}^{ref} \right| \right)^{\chi^{ref}}, \tag{23}$$

where $\chi^{ref}$ is an arbitrary positive scaling parameter. The Poisson's ratio of the reference domain $\nu^{ref}$ is also chosen arbitrarily. The performance of the algorithm has been shown [20] to be invariant to the choice of these reference parameters. The net effect of such scaling is that small elements become rigid and larger elements become more flexible. Hence, if the reference domain mesh is constructed carefully such that small elements are placed in regions where large distortion is expected, i.e., near the piping zone, and larger elements are placed in regions of less expected distortion, the method is expected to yield a good mesh quality in the elastic deformation stage. The map in Eq. (22) is differentiable and helps determine $\partial \mathbf{Z}(\boldsymbol{\theta})/\partial \boldsymbol{\theta}$.
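The stiffening rule in Eq. (23) can be illustrated with a toy computation; the Jacobian determinants below are hypothetical, while $E^{ref}$ and $\chi^{ref}$ follow the values reported later in Section 4.2:

```python
import numpy as np

# Toy illustration of Eq. (23). The per-element |J_e^ref| values are
# hypothetical; E_ref and chi_ref follow Section 4.2.
E_ref, chi_ref = 25.0e6, 1.0                 # Pa, positive scaling exponent
det_J = np.array([0.5, 1.0, 2.0])            # assumed Jacobian determinants
E_bar = E_ref * (1.0 / det_J) ** chi_ref     # scaled elastic moduli

# Small elements (|J| < 1) become stiffer, large ones more flexible.
assert E_bar[0] > E_ref and E_bar[2] < E_ref
```

With $\chi^{ref} = 1$ the scaling is simply inverse-proportional to element size, which is what concentrates rigidity near the piping zone boundary.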

#### **3.3 Gradient computation**

Analytical methods for the computation of the gradient are intrusive and involve gradients of the steady seepage flow forward solver. From the definition $\varphi(\boldsymbol{\theta}) = -\log p(\boldsymbol{\theta} \mid \mathbf{y})$, it is apparent that the computation of $\partial \varphi(\boldsymbol{\theta})/\partial \boldsymbol{\theta}$ requires the computation of $\partial \mathbf{m}/\partial \boldsymbol{\theta} = (\partial \mathbf{h}/\partial \boldsymbol{\theta},\, \partial \mathbf{q}/\partial \boldsymbol{\theta})$. These terms can be computed by taking the derivative of Eq. (2), which gives:

$$\frac{\partial \mathbf{K}}{\partial \mathbf{\theta}} \mathbf{h} + \mathbf{K} \frac{\partial \mathbf{h}}{\partial \mathbf{\theta}} = \frac{\partial \mathbf{q}}{\partial \mathbf{\theta}}.\tag{24}$$

Given standard boundary conditions, this equation can be solved to obtain $\partial \mathbf{h}/\partial \boldsymbol{\theta}$ and $\partial \mathbf{q}/\partial \boldsymbol{\theta}$, provided $\partial \mathbf{K}/\partial \boldsymbol{\theta}$ is known.
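The sensitivity solve implied by Eq. (24) reuses the already-factorized conductivity matrix. In the toy sketch below, **K**, **h**, and $\partial \mathbf{K}/\partial \theta$ are hypothetical stand-ins for the assembled FE quantities, with a source **q** assumed independent of $\theta$:

```python
import numpy as np

# Toy sketch of Eq. (24): K dh/dtheta = dq/dtheta - (dK/dtheta) h.
# All matrices are hypothetical stand-ins for assembled FE quantities.
rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((n, n))
K = A @ A.T + n * np.eye(n)              # assumed SPD conductivity matrix
q = np.ones(n)
h = np.linalg.solve(K, q)                # forward solve K h = q

dK = rng.standard_normal((n, n))
dK = 0.5 * (dK + dK.T)                   # assumed symmetric dK/dtheta
dq = np.zeros(n)                         # source independent of theta

dh = np.linalg.solve(K, dq - dK @ h)     # rearranged Eq. (24)
assert np.allclose(dK @ h + K @ dh, dq)  # differentiated system is satisfied
```

Because the left-hand-side operator is the same **K** as in the forward solve, one factorization serves both the forward problem and all parameter sensitivities.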

Following the standard procedure in FEM, the hydraulic conductivity matrix at the element level Ω*<sup>e</sup>* in the current domain is given as:

$$\mathbf{K}\_{e} = \int\_{\hat{\Omega}\_{e}} \mathbf{G}^{T} \hat{k}(\boldsymbol{\theta}, \mathbf{z})\, \mathbf{G}\, |\mathbf{J}\_{e}|\, \mathrm{d}\boldsymbol{\xi},\tag{25}$$

where $\hat{k}$ represents the hydraulic conductivity spatial field obtained from the truncated K-L expansion, **G** contains the derivatives (w.r.t. **z**) of the shape functions $N_j$ described earlier in Section 2.3, $|\mathbf{J}_e|$ is the determinant of the Jacobian matrix associated with the isoparametric transformation $(\xi_1, \xi_2) \to (z_1, z_2)$, and $\hat{\Omega}_e$ is the region occupied by the parent element related to the isoparametric transformation. The gradient of $\mathbf{K}_e$ with respect to the spatial parameters ${}^1\boldsymbol{\theta}$ can be computed as:

$$\frac{\partial \mathbf{K}\_{e}}{\partial\, {}^{1}\theta\_{j}} = \int\_{\hat{\Omega}\_{e}} \mathbf{G}^{T} \frac{\partial \hat{k}}{\partial\, {}^{1}\theta\_{j}}\, \mathbf{G}\, |\mathbf{J}\_{e}|\, \mathrm{d}\boldsymbol{\xi},\tag{26}$$

where the gradient of the hydraulic conductivity field w.r.t. ${}^1\boldsymbol{\theta}$ is easily obtained by differentiating Eq. (7) and is given as:

$$\frac{\partial \hat{k}}{\partial\, {}^{1}\theta\_{j}} = \sqrt{\lambda\_{j}}\, \phi\_{j}.\tag{27}$$

The gradient of $\mathbf{K}_e$ w.r.t. the geometry parameters ${}^2\boldsymbol{\theta}$ can be written as [6]:

$$\frac{\partial \mathbf{K}\_{e}}{\partial\, {}^{2}\boldsymbol{\theta}} = \int\_{\hat{\Omega}\_{e}} \left( \frac{\partial \mathbf{G}^{T}}{\partial\, {}^{2}\boldsymbol{\theta}}\, \hat{k}\, \mathbf{G}\, |\mathbf{J}\_{e}| + \mathbf{G}^{T} \hat{k}\, \frac{\partial \mathbf{G}}{\partial\, {}^{2}\boldsymbol{\theta}}\, |\mathbf{J}\_{e}| + \mathbf{G}^{T} \hat{k}\, \mathbf{G}\, \frac{\partial |\mathbf{J}\_{e}|}{\partial\, {}^{2}\boldsymbol{\theta}} + \mathbf{G}^{T} \frac{\partial \hat{k}}{\partial\, {}^{2}\boldsymbol{\theta}}\, \mathbf{G}\, |\mathbf{J}\_{e}| \right) \mathrm{d}\boldsymbol{\xi}. \tag{28}$$

Formulas for the calculation of the gradients $\partial \mathbf{G}/\partial\,{}^2\boldsymbol{\theta}$ and $\partial |\mathbf{J}_e|/\partial\,{}^2\boldsymbol{\theta}$ can readily be found in the literature [6]. It is clear from Eq. (6) that the computation of the eigenvalues and eigenfunctions depends on the definition of the domain $\Omega({}^2\boldsymbol{\theta})$. Hence, the gradient $\partial \hat{k}(\boldsymbol{\theta}, \mathbf{z})/\partial\,{}^2\boldsymbol{\theta}$ will involve the gradients of the eigenvalues and eigenvectors and is written as:


$$\frac{\partial \hat{k}}{\partial\, {}^{2}\boldsymbol{\theta}} = \sum\_{j=1}^{K\_1} \left( \frac{1}{2\sqrt{\lambda\_j}} \frac{\partial \lambda\_j}{\partial\, {}^{2}\boldsymbol{\theta}}\, \phi\_j + \sqrt{\lambda\_j}\, \frac{\partial \phi\_j}{\partial\, {}^{2}\boldsymbol{\theta}} \right) {}^{1}\theta\_j. \tag{29}$$

The gradients of the eigenvalues and eigenvectors of the generalized eigenvalue problem in Eq. (14) w.r.t. ${}^2\boldsymbol{\theta}$ are written as [21]:

$$\frac{\partial \lambda\_{j}}{\partial\, {}^{2}\boldsymbol{\theta}} = \mathbf{d}\_{j}^{T} \left( \frac{\partial \mathbf{B}}{\partial\, {}^{2}\boldsymbol{\theta}} - \lambda\_{j} \frac{\partial \mathbf{M}}{\partial\, {}^{2}\boldsymbol{\theta}} \right) \mathbf{d}\_{j},\tag{30}$$

$$\frac{\partial \mathbf{d}\_j}{\partial\, {}^{2}\boldsymbol{\theta}} = \left(\lambda\_j \mathbf{M} - \mathbf{B}\right)^{\dagger} \left(\frac{\partial \mathbf{B}}{\partial\, {}^{2}\boldsymbol{\theta}} - \lambda\_j \frac{\partial \mathbf{M}}{\partial\, {}^{2}\boldsymbol{\theta}}\right) \mathbf{d}\_j - \frac{1}{2} \left(\mathbf{d}\_j^{T} \frac{\partial \mathbf{M}}{\partial\, {}^{2}\boldsymbol{\theta}} \mathbf{d}\_j\right) \mathbf{d}\_j. \tag{31}$$

As $\mathbf{d}_j$ lies in the null space of $\lambda_j \mathbf{M} - \mathbf{B}$, the inverse $(\lambda_j \mathbf{M} - \mathbf{B})^{-1}$ cannot be computed, and a generalized inverse, the Moore-Penrose pseudoinverse $(\cdot)^{\dagger}$, is employed. In this study, the Moore-Penrose pseudoinverse is calculated by first carrying out an SVD of the matrix $\lambda_j \mathbf{M} - \mathbf{B}$, eliminating the singular values below a tolerance level, and then inverting the retained factors of the SVD.
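A minimal numpy sketch of this truncated-SVD pseudoinverse follows; the relative tolerance `rtol` is an assumed choice, and the singular matrix used for the check is a toy stand-in for $\lambda_j \mathbf{M} - \mathbf{B}$:

```python
import numpy as np

# Pseudoinverse via truncated SVD, as used in Eq. (31): drop singular
# values below a tolerance, invert the rest. rtol is an assumed choice.
def pinv_truncated(A, rtol=1e-10):
    U, s, Vt = np.linalg.svd(A)
    keep = s > rtol * s[0]                 # discard the (near-)null space
    return Vt[keep].T @ np.diag(1.0 / s[keep]) @ U[:, keep].T

# Toy singular matrix (stand-in): maps the (1, 1) direction to zero.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
Ad = pinv_truncated(A)

# The Moore-Penrose identities A A+ A = A and A+ A A+ = A+ hold.
assert np.allclose(A @ Ad @ A, A) and np.allclose(Ad @ A @ Ad, Ad)
```

For well-scaled problems this agrees with `numpy.linalg.pinv`; the explicit tolerance makes the null-space cutoff, which matters in Eq. (31), easy to control.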

When the same mesh that is used for the solution of the steady seepage flow problem is also used for the discretization of the random field, the gradients of the matrices **B** and **M** at an elemental level in isoparametric space are given as:

$$\frac{\partial B\_{ij}}{\partial\, {}^{2}\boldsymbol{\theta}} = \int\_{\hat{\Omega}\_{e}} N\_{i} \int\_{\hat{\Omega}\_{e}} C(\mathbf{z}, \mathbf{z}') N\_{j} \left( \frac{\partial |\mathbf{J}\_{e}(\mathbf{z})|}{\partial\, {}^{2}\boldsymbol{\theta}}\, |\mathbf{J}\_{e}(\mathbf{z}')| + |\mathbf{J}\_{e}(\mathbf{z})|\, \frac{\partial |\mathbf{J}\_{e}(\mathbf{z}')|}{\partial\, {}^{2}\boldsymbol{\theta}} \right) \mathrm{d}\boldsymbol{\xi}'\, \mathrm{d}\boldsymbol{\xi} + \int\_{\hat{\Omega}\_{e}} N\_{i} \int\_{\hat{\Omega}\_{e}} \frac{\partial C(\mathbf{z}, \mathbf{z}')}{\partial\, {}^{2}\boldsymbol{\theta}}\, N\_{j}\, |\mathbf{J}\_{e}(\mathbf{z})|\, |\mathbf{J}\_{e}(\mathbf{z}')|\, \mathrm{d}\boldsymbol{\xi}'\, \mathrm{d}\boldsymbol{\xi}, \tag{32}$$

$$\frac{\partial M\_{ij}}{\partial\, {}^{2}\boldsymbol{\theta}} = \int\_{\hat{\Omega}\_{e}} N\_{i} N\_{j}\, \frac{\partial |\mathbf{J}\_{e}(\mathbf{z})|}{\partial\, {}^{2}\boldsymbol{\theta}}\, \mathrm{d}\boldsymbol{\xi}. \tag{33}$$

In this study, the squared exponential autocorrelation coefficient function has been used, i.e.,

$$\rho(\mathbf{z}, \mathbf{z}') = \exp\left(\frac{-|\mathbf{z} - \mathbf{z}'|^2}{l\_c^2}\right),\tag{34}$$

where $l_c$ is the correlation length of the random field. Assuming $\sigma(\mathbf{z}) = \sigma(\mathbf{z}') = \sigma$, the gradient of the autocovariance function $C(\mathbf{z}, \mathbf{z}')$ w.r.t. ${}^2\boldsymbol{\theta}$ is given as:

$$\frac{\partial C(\mathbf{z}, \mathbf{z}')}{\partial\, {}^{2}\boldsymbol{\theta}} = -\frac{2}{l\_c^2} \left( \left( \frac{\partial \mathbf{z}}{\partial\, {}^{2}\boldsymbol{\theta}} - \frac{\partial \mathbf{z}'}{\partial\, {}^{2}\boldsymbol{\theta}} \right)^{T} (\mathbf{z} - \mathbf{z}') \right) C(\mathbf{z}, \mathbf{z}'). \tag{35}$$

This completes the definition of all the terms required to compute the HMC gradient $\partial \varphi(\boldsymbol{\theta})/\partial \boldsymbol{\theta}$.
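Analytic gradients of this kind are easy to verify against finite differences. The sketch below checks Eq. (35) for an assumed (hypothetical) linear dependence of the two coordinates on a single geometry parameter `t`:

```python
import numpy as np

# Finite-difference check of the covariance gradient in Eq. (35).
# The linear maps z(t), z'(t) below are hypothetical.
lc, sigma = 0.4, 1.0
C = lambda z, zp: sigma**2 * np.exp(-np.sum((z - zp) ** 2) / lc**2)

z_of  = lambda t: np.array([0.10 + 0.5 * t, 0.05])   # dz/dt  = (0.5, 0)
zp_of = lambda t: np.array([0.30 - 0.2 * t, 0.02])   # dz'/dt = (-0.2, 0)

def dC_dt(t):
    """Eq. (35) specialized to a scalar geometry parameter t."""
    z, zp = z_of(t), zp_of(t)
    dz, dzp = np.array([0.5, 0.0]), np.array([-0.2, 0.0])
    return -2.0 / lc**2 * ((dz - dzp) @ (z - zp)) * C(z, zp)

# Central finite difference agrees with the analytic expression.
t, eps = 0.1, 1e-6
fd = (C(z_of(t + eps), zp_of(t + eps))
      - C(z_of(t - eps), zp_of(t - eps))) / (2 * eps)
assert abs(fd - dC_dt(t)) < 1e-7
```

The same pattern (analytic term vs. central difference) is the standard way to validate each contribution to Eqs. (28)–(33) in isolation.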

#### **4. Numerical implementation and results**

#### **4.1 Observation data**

A seepage zone containing a piping region of length *l* and width *w*, shown in **Figure 1**, is chosen to generate synthetic observations for inversion. A steady seepage flow problem is solved on the domain defined by $l = 0.15$ m and $w = 0.05$ m. The left and right boundaries marked in blue are Dirichlet boundaries, and the top and bottom boundaries are no-flow Neumann boundaries. Linear shape functions are used in the FE solution of the forward problem. Ten sets of observations (hydraulic head and total outward normal flux) are made by increasing the hydraulic head at the left boundary, as shown by point A in **Figure 2**. The hydraulic head at the right boundary is fixed at 0. The corresponding hydraulic heads recorded at observation points B, C, D, and E are shown in **Figure 2**. The total outward normal flux from the right boundary is summed and also shown on the right in **Figure 2**. Observation data is generated assuming a constant hydraulic conductivity field, i.e., $k_{\mathrm{true}}(\mathbf{z}) = 0.001$ m/s. The standard deviations of the observation noise for the hydraulic head data and the outward normal flux data are taken to be 1% and 5%, respectively.

#### **4.2 Inversion setup**

Considering a log-normal hydraulic conductivity field $k(\mathbf{z})$, inversion is carried out on the Gaussian field $\tilde{k}(\mathbf{z}) = \log[k(\mathbf{z}) - \breve{k}]$, where $\breve{k}$ is a lower bound taken as $10^{-5}$ m/s. The log-normal construction enables a convenient choice of the standard deviation of the random field, which is chosen as $\sigma(\mathbf{z}) = 1$. The correlation length of the random field is assumed to be known and is taken to be equal to the length of the largest dimension of the domain, i.e., $l_c = 0.4$ m.

#### **Figure 1.**

*Discretized seepage domain containing the piping zone, used to obtain observation data, i.e., hydraulic head h data at points B, C, D and E, and total outward normal flux data q from the boundary marked in blue on the right. The hydraulic head is constant on the left boundary. The number of nodes in the discretization is 3943.*

**Figure 2.**

*Observation data generated from the numerical seepage flow experiment. Data at each time t is obtained by solving a new seepage flow experiment with an input hydraulic head at the left boundary represented by A.*

A preliminary assessment of the eigenvalues obtained by solving the generalized eigenvalue problem in Eq. (14) reveals that the K-L expansion can be truncated at $K_1 = 12$ terms; over these terms, the eigenvalues decay by a factor of approximately $3 \times 10^4$. This fixes the number of terms in the spatial parameter vector ${}^1\boldsymbol{\theta} = (\theta_1, \ldots, \theta_{12})$. As mentioned in Section 3.2, once the geometry parameterization ${}^2\boldsymbol{\theta} = ({}^2\theta_1, {}^2\theta_2) = (\theta_{13}, \theta_{14})$ is known (see **Figure 3**(**a**)), an explicit function $\mathbf{Z}_v({}^2\boldsymbol{\theta})$ can be constructed as:

$$\mathbf{z}\_{v\_i}^{1} = \left( L\_1 - {}^{2}\theta\_1 + (i-1)\, \frac{{}^{2}\theta\_1}{n\_1 - 1},\; L\_2 - {}^{2}\theta\_2 \right),\tag{36}$$

$$\mathbf{z}\_{v\_j}^{2} = \left( L\_1 - {}^{2}\theta\_1,\; L\_2 - {}^{2}\theta\_2 + (j-1)\, \frac{{}^{2}\theta\_2}{n\_2 - 1} \right). \tag{37}$$

Eqs. (36) and (37) can be readily differentiated to obtain $\partial \mathbf{Z}_v/\partial\,{}^2\boldsymbol{\theta}$, which can be used to compute the other gradients $\partial \mathbf{Z}/\partial\,{}^2\boldsymbol{\theta}$, $\partial \varphi/\partial \boldsymbol{\theta}$, etc. This completes the definition of the parameter vector.

As all updates are designed to take place from an arbitrary reference domain, as mentioned in Eq. (22), the study of the convergence properties requires a focus on the reference mesh. The reference domain is arbitrarily chosen as the true domain used to generate the observations. Seven different reference configuration meshes with 128, 306, 497, 1038, 1476, 1860, and 3018 nodes are considered for the study of the convergence properties. Three of these meshes are shown in **Figure 3**. The artificial elastic properties selected for the mesh moving method are $E^{ref} = 25$ MPa, $\nu^{ref} = 0.25$, and $\chi^{ref} = 1$. To reduce mesh distortion during the mesh moving stage, all the reference meshes have a unique construction, i.e., smaller elements (which behave rigidly) are placed close to the piping zone boundary and larger elements (which are more flexible) are placed further away.

**Figure 3.**

*(a) Parameterization of the piping zone boundary. Discretization of the reference domain defined by $({}^2\theta_1^{ref}, {}^2\theta_2^{ref}) = (0.15\ \mathrm{m},\ 0.05\ \mathrm{m})$ with (b) 307, (c) 1476, and (d) 3018 nodes. Red dots indicate the position of observation points.*

#### **4.3 Inversion and convergence analysis**

There are no analytical solutions available to verify the correctness of the gradient computation procedure described in Section 3. Hence, for verification purposes, a short inversion analysis is done using HMC, where 150 samples are drawn from the posterior. The number of leapfrog steps is variable and drawn from a Gaussian $L \sim \mathcal{N}(5, 2)$. The priors for the parameters are chosen as ${}^1\boldsymbol{\theta} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}_{12})$ and ${}^2\boldsymbol{\theta} \sim \mathcal{N}(\mathbf{0}, 0.1\,\mathbf{I}_2)$. The results presented in **Figure 4** correspond to inversion done considering the reference mesh shown in **Figure 3**(**b**). The potential energy $\varphi$ decreases rapidly in the first 20 steps, and thereafter HMC begins the exploration of the region of high posterior probability. Prior experience leads the authors to consider the performance of HMC to be appropriate, which is an indirect indicator that the gradient computation procedure is correct.

In order to study the convergence properties, suitable error measures are required. The rate of convergence of the truncated K-L expansion of the prior random field, for a fixed number of parameters $K_1 = 12$, is determined through the relative mean error variance [14]:

$$\varepsilon\_{\text{Var,rel}} = \frac{\left| \overline{\varepsilon}\_{\sigma, \text{KL}} - \overline{\varepsilon}\_{\sigma, \text{analytic}} \right|}{\overline{\varepsilon}\_{\sigma, \text{analytic}}}. \tag{38}$$


#### **Figure 4.**

*Results from HMC showing (a) Decrease in potential energy as HMC progresses towards the high posterior probability region and samples of (b) θ*<sup>13</sup> *and (c) θ*14*. The dashed lines represent the true values of the parameters used to generate the observations.*

The analytical mean error variance $\overline{\varepsilon}_{\sigma,\mathrm{analytic}}$ cannot be computed in closed form for squared exponential type autocorrelation coefficient functions (Eq. (34)) and is therefore calculated numerically using the fine mesh containing 3943 nodes, shown in **Figure 1**. The relative error is computed for the seven different meshes mentioned in Section 4.2 and plotted on a log–log plot in **Figure 5**. A closer look at Eq. (10) indicates that the mean error variance depends only on the cumulative sum of the $K_1$ eigenvalues. As the eigenvalues are arranged in descending order, the relative error of the largest eigenvalue $\lambda_1$ is also shown in **Figure 5**. The corresponding relative error measure can be written in a generalized manner as:

$$\varepsilon\_{\lambda\_1} = \frac{\left| \lambda\_1 - \lambda\_{1,\mathrm{analytic}} \right|}{\lambda\_{1,\mathrm{analytic}}}.\tag{39}$$

Other error measures can be computed in a similar manner simply by replacing $\lambda_1$ with the relevant variable. Finally, the relative error of the gradient of the largest eigenvalue w.r.t. the geometry parameters, $\partial \lambda_1/\partial\,{}^2\boldsymbol{\theta}$, is also shown in **Figure 5**. To compute the error related to the gradient, one mesh moving step is carried out, where the mesh is moved from the reference domain $({}^2\theta_1^{ref}, {}^2\theta_2^{ref}) = (0.15\ \mathrm{m},\ 0.05\ \mathrm{m})$ to an arbitrary domain defined by $({}^2\theta_1, {}^2\theta_2) = (0.16\ \mathrm{m},\ 0.04\ \mathrm{m})$. The same linear shape functions are used for the discretization of the random field, the mesh moving method, and the solution of the forward problem. The same realization of the parameter vector ${}^1\boldsymbol{\theta}$ is used to generate the random field in all the cases. A least squares fit of the relative error plots with a first-order polynomial reveals linear convergence on the log–log plot, with slopes between $-2.5$ and $-5$. Linear rates of convergence for $\varepsilon_{\mathrm{Var,rel}}$, using linear FEM, are also observed by Betz et al. [14].
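The slope-extraction step just described can be sketched directly. The node counts are those of Section 4.2, while the error values below are synthetic, generated from an assumed algebraic decay so that the fit recovers the assumed rate:

```python
import numpy as np

# Convergence rate as the slope of a first-order least-squares fit on a
# log-log plot. Node counts from Section 4.2; errors are synthetic data
# with an assumed decay err ~ n^(-2.7).
nodes = np.array([128, 306, 497, 1038, 1476, 1860, 3018], dtype=float)
errs = 5.0 * nodes ** -2.7

slope, intercept = np.polyfit(np.log(nodes), np.log(errs), 1)
assert abs(slope - (-2.7)) < 1e-8   # fit recovers the assumed rate
```

With real (noisy) error data the fitted slope is only approximate, which is why the chapter reports a range of slopes rather than a single value.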

Finally, the rate of convergence of the gradient of the potential energy function w.r.t. the spatial parameter related to the largest eigenvalue, $\theta_1$, and the two geometry parameters, $\theta_{13}$ and $\theta_{14}$, is plotted in **Figure 6**. The convergence rates of the different gradients are almost parallel to each other. Once again, the slope of each plot is determined by a least squares fit with a first-order polynomial. Each plot shows a linear convergence rate with a slope of approximately $-2.7$.

**Figure 6.**

*Convergence of relative errors of potential energy gradients for a fixed number of terms $K_1 = 12$ in the K-L expansion.*

#### **5. Conclusions**

This paper details the numerical procedure for the computation of gradients in probabilistic inverse problems involving the simultaneous estimation of spatial fields and geometry. The method is analytical (in the sense of [6]) and intrusive, and involves the computation of gradients of the forward problem. Emphasis is laid upon the calculation of the gradients of the eigenvalues and eigenvectors involved in the truncated K-L expansion method for discretization of the random field. The eigenvalues and eigenvectors are obtained by solving a generalized eigenvalue problem on a defined domain. This implies that as the geometry parameters are updated, the domain is updated and the generalized eigenvalues and eigenvectors change. Computation of the gradient of the eigenvectors w.r.t. the geometry parameters involves the computation of a generalized inverse.

The gradients are validated through an inverse analysis using HMC. The potential energy decreases rapidly as the Markov chain related to the geometry parameters approaches the region of high posterior probability, indicating the correctness of the computed gradients. Overall, the rate of convergence of various quantities is observed to be linear on a log–log plot, which means that the computational cost grows rapidly when higher accuracy is sought. While the same mesh that is used to solve the forward problem can also be used for the Galerkin FE method-based discretization of the IEVP, the repeated need to solve the forward problem, the mesh moving method, and the generalized eigenvalue problem at every step can be computationally prohibitive. Eliminating any one, two, or even all three of these numerical problems can significantly improve the computational feasibility of simultaneous spatial field and geometry detection using gradient-based stochastic samplers like HMC.

#### **Acknowledgements**

This work was supported by JSPS KAKENHI Grant Number JP21H02304.

#### **Conflict of interest**

The authors declare no conflict of interest.


### **Author details**

Michael Conrad Koch\*, Kazunori Fujisawa and Akira Murakami Graduate School of Agriculture, Kyoto University, Kyoto, Japan

\*Address all correspondence to: koch.michaelconrad.5w@kyoto-u.ac.jp

© 2022 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

*Numerical Gradient Computation for Simultaneous Detection of Geometry and Spatial Random… DOI: http://dx.doi.org/10.5772/intechopen.108363*

#### **References**

[1] Aster RC, Borchers B, Thurber CH. Parameter Estimation and Inverse Problems. Oxford: Academic Press; 2013. DOI: 10.1016/C2009-0-61134-X

[2] Neal R. MCMC using Hamiltonian dynamics. In: Handbook of Markov Chain Monte Carlo. New York: CRC Press; 2011. DOI: 10.1201/b10905-6

[3] Voorhees A, Millwater H, Bagley R. Complex variable methods for shape sensitivity of finite element models. Finite Elements in Analysis and Design. 2011;**47**:1146-1156. DOI: 10.1016/j.finel.2011.05.003

[4] Lee J, Kitanidis PK. Bayesian inversion with total variation prior for discrete geologic structure identification. Water Resources Research. 2013;**49**:7658-7669. DOI: 10.1002/2012WR013431

[5] Rudin LI, Osher S, Fatemi E. Nonlinear total variation based noise removal algorithms. Physica D: Nonlinear Phenomena. 1992;**60**:259-268. DOI: 10.1016/0167-2789(92)90242-F

[6] Christensen PW, Klarbring A. Two-dimensional shape optimization. In: An Introduction to Structural Optimization. Netherlands: Springer; 2008. DOI: 10.1007/978-1-4020-8666-3\_7

[7] Haslinger J, Mäkinen RA. Introduction to Shape Optimization: Theory, Approximation, and Computation. Philadelphia: SIAM; 2003

[8] Koch MC, Osugi M, Fujisawa K, Murakami A. Hamiltonian Monte Carlo for simultaneous interface and spatial field detection (HMCSISFD) and its application to a piping zone interface detection problem. International Journal for Numerical and Analytical Methods in Geomechanics. 2021;**45**:2602-2626. DOI: 10.1002/NAG.3279

[9] Koch MC, Fujisawa K, Murakami A. Adjoint Hamiltonian Monte Carlo algorithm for the estimation of elastic modulus through the inversion of elastic wave propagation data. International Journal for Numerical Methods in Engineering. 2020;**121**:1037-1067. DOI: 10.1002/nme.6256

[10] Koch MC, Fujisawa K, Murakami A. Novel parameter update for a gradient based MCMC method for solid-void interface detection through elastodynamic inversion. Probabilistic Engineering Mechanics. 2020;**62**:103097. DOI: 10.1016/j.probengmech.2020.103097

[11] Loève M. Probability Theory II. New York: Springer-Verlag; 1978

[12] Ghanem RG, Spanos PD. Stochastic Finite Elements: A Spectral Approach. New York, NY: Springer New York; 1991. DOI: 10.1007/978-1-4612-3094-6

[13] Sudret B, Der Kiureghian A. Stochastic Finite Element Methods and Reliability: A State-of-the-Art Report. UCB/SEMM-2000/08. Berkeley: University of California; 2000

[14] Betz W, Papaioannou I, Straub D. Numerical methods for the discretization of random fields by means of the Karhunen–Loève expansion. Computer Methods in Applied Mechanics and Engineering. 2014;**271**:109-129. DOI: 10.1016/J.CMA.2013.12.010

[15] Marzouk YM, Najm HN. Dimensionality reduction and polynomial chaos acceleration of Bayesian inference in inverse problems. Journal of Computational Physics. 2009;**228**:1862-1902. DOI: 10.1016/j.jcp.2008.11.024

[16] Leimkuhler B, Reich S. Simulating Hamiltonian Dynamics. Cambridge: Cambridge University Press; 2005. DOI: 10.1017/CBO9780511614118

[17] Betancourt M. A conceptual introduction to Hamiltonian Monte Carlo. arXiv preprint. 2017

[18] Andrieu C, De Freitas N, Doucet A, Jordan MI. An introduction to MCMC for machine learning. Machine Learning. 2003:5-43. DOI: 10.1023/A:1020281327116

[19] Stein K, Tezduyar T, Benney R. Mesh moving techniques for fluid-structure interactions with large displacements. Journal of Applied Mechanics. 2003;**70**:58-63. DOI: 10.1115/1.1530635

[20] Koch MC, Osugi M, Fujisawa K, Murakami A. Investigation of reference parameter invariance and application of HMCSISFD for identification of an eroded seepage zone. Proc. 13th Int. Conf. Struct. Saf. Reliab. 2021–2022, Shanghai. 2022

[21] Magnus JR. On Differentiating Eigenvalues and Eigenvectors. Econometric Theory. 1985;**1**:179-191. DOI: 10.1017/S0266466600011129

#### **Chapter 5**

## Solving and Algorithm for Least-Norm General Solution to Constrained Sylvester Matrix Equation

*Abdur Rehman and Ivan I. Kyrchei*

#### **Abstract**

Keeping in view that many physical systems with inverse problems can be written as matrix equations, the least-norm of the solution to a general Sylvester matrix equation with the restrictions $A_1X_1 = C_1$, $X_1B_1 = C_2$, $A_2X_2 = C_3$, $X_2B_2 = C_4$, $A_3X_1B_3 + A_4X_2B_4 = C_c$ is researched in this chapter. A novel expression of the general solution to this system is established, and necessary and sufficient conditions for its existence are constituted. The novelty of the proposed results lies not only in obtaining a formal representation of the solution in terms of generalized inverses, but also in constructing an algorithm to find its explicit expression. The algorithm and the numerical example use the determinantal representations of the Moore–Penrose inverse previously obtained by one of the authors.

**Keywords:** linear matrix equation, generalized Sylvester matrix equation, Moore-Penrose inverse

#### **1. Introduction**

Throughout, we write $\mathbb{C}$ and $\mathbb{R}$, respectively, for the fields of complex and real numbers. Let $\mathbb{C}^{m \times n}$ denote the set of all $m \times n$ matrices over $\mathbb{C}$, and let $\mathbb{C}^{m \times n}_{r}$ stand for the subset of $m \times n$ complex matrices with rank $r$. The rank of $A$ is denoted by either $r(A)$ or $\operatorname{rank} A$. The (complex) conjugate transpose of $A \in \mathbb{C}^{m \times n}$ is written $A^{*}$, and a matrix $A \in \mathbb{C}^{n \times n}$ is said to be Hermitian if $A^{*} = A$. An identity matrix of feasible size is denoted by $I$.

**Definition 1.1.** The Moore–Penrose (MP-) inverse of $A \in \mathbb{C}^{m \times n}$, denoted by $A^{\dagger}$, is defined to be the unique solution $X$ of the following four Penrose equations:

$$AXA = A,\tag{1}$$

$$XAX = X,\tag{2}$$

$$(AX)^{*} = AX,\tag{3}$$

$$(XA)^{*} = XA.\tag{4}$$

Matrices satisfying Eqs. (1) and (2) are known as reflexive inverses, denoted by $A^{+}$.

In addition, $L_A = I - A^{\dagger}A$ and $R_A = I - AA^{\dagger}$ represent a pair of orthogonal projectors onto the kernels of $A$ and $A^{*}$, respectively.
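As a numerical illustration (not part of the original text), the four Penrose equations and the projectors $L_A$ and $R_A$ can be checked directly; the sketch below assumes NumPy, whose `numpy.linalg.pinv` computes the MP-inverse:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3)) @ rng.standard_normal((3, 5))  # a rank-3, 4x5 matrix

A_pinv = np.linalg.pinv(A)  # Moore-Penrose inverse A†

# The four Penrose equations (1)-(4)
assert np.allclose(A @ A_pinv @ A, A)                  # (1) AXA = A
assert np.allclose(A_pinv @ A @ A_pinv, A_pinv)        # (2) XAX = X
assert np.allclose((A @ A_pinv).conj().T, A @ A_pinv)  # (3) (AX)* = AX
assert np.allclose((A_pinv @ A).conj().T, A_pinv @ A)  # (4) (XA)* = XA

# The projectors L_A = I - A†A and R_A = I - AA†
L_A = np.eye(A.shape[1]) - A_pinv @ A
R_A = np.eye(A.shape[0]) - A @ A_pinv
assert np.allclose(A @ L_A, 0)           # L_A maps into the kernel of A
assert np.allclose(A.conj().T @ R_A, 0)  # R_A maps into the kernel of A*
assert np.allclose(L_A @ L_A, L_A) and np.allclose(L_A.conj().T, L_A)  # Hermitian idempotent
```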

Mathematical models of physical systems with inverse problems, especially those having a finite number of model parameters, can be written as matrix equations. In particular, Sylvester-type matrix equations have far-reaching applications in singular system control [1], system design [2], robust control [3], feedback [4], perturbation theory [5], linear descriptor systems [6], neural networks [7], theory of orbits [8], etc.

Some recent work on generalized Sylvester matrix equations and their systems can be found in [9–21]. In 2014, Bao [22] examined the least-norm and extremal ranks of the least squares solution to the quaternion matrix equations

$$A_1 X = C_1, \quad X B_1 = C_2, \quad A_3 X B_3 = C_c. \tag{5}$$

Wang et al. [23] examined the expression of the general solution to the system

$$\begin{aligned} A_1 X_1 &= C_1, \quad A_2 X_2 = C_3,\\ A_3 X_1 B_3 &+ A_4 X_2 B_4 = C_c, \end{aligned}\tag{6}$$

and as an application, the *P*-symmetric and *P*-skew-symmetric solution to

$$A_a X = C_a, \quad A_b X B_b = C_b$$

has been established. Li et al. [24] established a novel expression of the general solution to the system (6) and computed the least-norm of the general solution to (6). In 2009, Wang et al. [25] constituted the expression of the general solution to

$$\begin{aligned} A_1 X_1 &= C_1, \quad X_1 B_1 = C_2, \\ A_2 X_2 &= C_3, \quad X_2 B_2 = C_4, \\ A_3 X_1 B_3 &+ A_4 X_2 B_4 = C_c, \end{aligned} \tag{7}$$

and as an application, they explored the $(P, Q)$-symmetric solution to the system

$$A_a X = C_a, \quad X B_b = C_b, \quad A_c X B_c = C_c.$$

Some of the latest findings on the least-norm of matrix equations and $(P, Q)$-symmetric matrices can be consulted in [26–30]. Furthermore, our main system (7) is a special case of the following system

$$\begin{aligned} A_1 X_1 &= C_1, \quad X_2 B_1 = D_1, \\ A_2 X_3 &= C_2, \quad X_3 B_2 = D_2, \\ A_3 X_4 &= C_3, \quad X_4 B_3 = D_3, \\ A_4 X_1 &+ X_2 B_4 + C_4 X_3 D_4 + C_5 X_4 D_5 = C_c, \end{aligned} \tag{8}$$

which has been investigated by Zhang in 2014.

Motivated by the latest interest in the least-norm of matrix equations, in this chapter we construct a novel expression of the general solution to the system (7) and apply it to investigate the least-norm of that general solution. Since systems (5) and (6) are particular cases of our system (7), solving system (7) extends the least-norm results to a wide class of problems.

We commence with the following lemmas, which play a crucial role in establishing the main results of the following sections.

**Lemma 1.2.** *[31]. Let $A$, $B$, and $C$ be given matrices over $\mathbb{C}$ with compatible dimensions. Then*

$$\mathbf{1}.\; r(A) + r(R_A B) = r(B) + r(R_B A) = r\begin{bmatrix} A & B \end{bmatrix}.$$

$$\mathbf{2}.\; r(A) + r(C L_A) = r(C) + r(A L_C) = r\begin{bmatrix} A\\ C \end{bmatrix}.$$

$$\mathbf{3}.\; r(B) + r(C) + r(R_B A L_C) = r\begin{bmatrix} A & B\\ C & 0 \end{bmatrix}.$$

**Lemma 1.3.** *[32]. Let $A$, $B$, and $C$ be known matrices over $\mathbb{C}$ of appropriate sizes. Then*

$$\begin{array}{l} \mathbf{1}.A^\dagger = (A^\*A)^\dagger A^\* = A^\* \left(A A^\* \right)^\dagger. \\\\ \mathbf{2}.L\_A = L\_A^2 = L\_A^\*, R\_A = R\_A^2 = R\_A^\*. \\\\ \mathbf{3}.L\_A \left(B L\_A \right)^\dagger = \left(B L\_A \right)^\dagger, \left(R\_A C \right)^\dagger R\_A = \left(R\_A C \right)^\dagger. \end{array}$$

**Lemma 1.4.** *[33]. Let $\Phi$, $\Omega$ be matrices over $\mathbb{C}$ and*

$$\Phi = \begin{bmatrix} \Phi_1 \\ \Phi_2 \end{bmatrix}, \quad \Omega = \begin{bmatrix} \Omega_1 & \Omega_2 \end{bmatrix}, \quad F = \Phi_2 L_{\Phi_1}, \quad T = R_{\Omega_1} \Omega_2.$$

*Then*

$$\begin{aligned} L\_{\Phi} &= L\_{\Phi\_1} L\_F, & L\_{\Omega} &= \begin{bmatrix} L\_{\Omega\_1} & -\Omega\_1^\dagger \Omega\_2 L\_T \\ \mathbf{0} & L\_T \end{bmatrix}, \\ R\_{\Omega} &= R\_T R\_{\Omega\_1}, & R\_{\Phi} &= \begin{bmatrix} R\_{\Phi\_1} & \mathbf{0} \\ -R\_F \Phi\_2 \Phi\_1^\dagger & R\_F \end{bmatrix}, \end{aligned}$$

*where $\Phi_1^{+}$ and $\Omega_1^{+}$ are any fixed reflexive inverses, and $L_{\Phi_1} = I - \Phi_1^{+}\Phi_1$ and $R_{\Omega_1} = I - \Omega_1\Omega_1^{+}$ stand for the projectors induced by $\Phi_1$ and $\Omega_1$, respectively.*

*Remark* 1.5. Since the Moore-Penrose inverse is a reflexive inverse, this lemma can be used for the MP-inverse without any changes; this was done in ([32], Lemma 2.4).

**Lemma 1.6.** *[34]. Suppose that*

$$B\_1 \mathbf{X} \mathbf{C}\_1 + B\_2 \mathbf{Y} \mathbf{C}\_2 = A \tag{9}$$

*is a consistent linear matrix equation. Then*

1.*The general solution of the homogeneous equation*

$$B\_1 X C\_1 + B\_2 Y C\_2 = 0,$$

*can be expressed by*

$$X = X\_1 X\_2 + X\_3, \quad Y = Y\_1 Y\_2 + Y\_3,$$

*where $X_1$, $X_2$, $X_3$ and $Y_1$, $Y_2$, $Y_3$ constitute a general solution to the system*

$$B_1 X_1 = -B_2 Y_1, \quad X_2 C_1 = Y_2 C_2, \quad B_1 X_3 C_1 = 0, \quad B_2 Y_3 C_2 = 0.$$

*By computing the values of the unknowns above and substituting them into $X$ and $Y$, we have*

$$\begin{aligned} X &= \mathbb{S}\_1 L\_G U R\_H T\_1 + L\_{B\_1} V\_1 + V\_2 R\_{C\_1}, \\ Y &= \mathbb{S}\_2 L\_G U R\_H T\_2 + L\_{B\_2} W\_1 + W\_2 R\_{C\_2}, \end{aligned}$$

*where $S_1 = \begin{bmatrix} I_p & 0 \end{bmatrix}$, $S_2 = \begin{bmatrix} 0 & I_s \end{bmatrix}$, $T_1 = \begin{bmatrix} I_q \\ 0 \end{bmatrix}$, $T_2 = \begin{bmatrix} 0 \\ I_t \end{bmatrix}$, $G = \begin{bmatrix} B_1 & B_2 \end{bmatrix}$, and $H = \begin{bmatrix} C_1 \\ -C_2 \end{bmatrix}$; the matrices $U$, $V_1$, $V_2$, $W_1$, and $W_2$ are free to vary over $\mathbb{C}$.*

2.*Assume that* Eq. (9) *is solvable; then its general solution can be expressed as*

$$X = X\_0 + X\_1 X\_2 + X\_3, \quad Y = Y\_0 + Y\_1 Y\_2 + Y\_3,$$

*where X*<sup>0</sup> *and Y*<sup>0</sup> *are any pair of particular solutions to (9)*.

*It can also be written as*

$$\begin{aligned} X &= X\_0 + \mathbb{S}\_1 L\_G U \mathcal{R}\_H T\_1 + L\_{B\_1} V\_1 + V\_2 \mathcal{R}\_{C\_1}, \\ Y &= Y\_0 + \mathbb{S}\_2 L\_G U \mathcal{R}\_H T\_2 + L\_{B\_2} W\_1 + W\_2 \mathcal{R}\_{C\_2}. \end{aligned}$$

**Lemma 1.7.** *[35]. Let $A_1$, $B_1$, $C_1$, $C_2$ be given matrices over $\mathbb{C}$ with compatible sizes, and let $X_1$ be unknown. Then the system*

$$A_1 X_1 = C_1, \quad X_1 B_1 = C_2, \tag{10}$$

*is consistent if and only if*

$$R\_{A\_1} C\_1 = 0, \ \ C\_2 L\_{B\_1} = 0, \ \ A\_1 C\_2 = C\_1 B\_1. \tag{11}$$

*Under these conditions, the general solution to* (10) *can be established as*

$$X_1 = A_1^{\dagger} C_1 + L_{A_1} C_2 B_1^{\dagger} + L_{A_1} U_1 R_{B_1},$$

*where $U_1$ is a free matrix over $\mathbb{C}$ with accordant dimensions*.
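A hedged numerical sketch of Lemma 1.7 (assuming NumPy, not part of the chapter): we manufacture a consistent pair of equations from a known matrix, check the conditions (11), and confirm that the stated general solution satisfies (10) for an arbitrary free matrix $U_1$:

```python
import numpy as np

rng = np.random.default_rng(2)
pinv = np.linalg.pinv

# Build a consistent pair A1 X1 = C1, X1 B1 = C2 from a known X
A1 = rng.standard_normal((3, 4))
B1 = rng.standard_normal((5, 2))
X_true = rng.standard_normal((4, 5))
C1, C2 = A1 @ X_true, X_true @ B1

L_A1 = np.eye(4) - pinv(A1) @ A1
R_B1 = np.eye(5) - B1 @ pinv(B1)
R_A1 = np.eye(3) - A1 @ pinv(A1)
L_B1 = np.eye(2) - pinv(B1) @ B1

# Consistency conditions (11)
assert np.allclose(R_A1 @ C1, 0)
assert np.allclose(C2 @ L_B1, 0)
assert np.allclose(A1 @ C2, C1 @ B1)

# General solution: X1 = A1† C1 + L_A1 C2 B1† + L_A1 U1 R_B1
U1 = rng.standard_normal((4, 5))  # arbitrary free matrix
X1 = pinv(A1) @ C1 + L_A1 @ C2 @ pinv(B1) + L_A1 @ U1 @ R_B1
assert np.allclose(A1 @ X1, C1)
assert np.allclose(X1 @ B1, C2)
```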

**Lemma 1.8.** *[36]. Let $A$, $B$, and $C$ be known matrices over $\mathbb{C}$ with compatible dimensions, and let $X$ be unknown. Then the matrix equation*

$$AXB = C \tag{12}$$

*is consistent if and only if $AA^{\dagger} C B^{\dagger} B = C$. In this case, its general solution can be expressed as*

*Solving and Algorithm for Least-Norm General Solution to Constrained Sylvester Matrix… DOI: http://dx.doi.org/10.5772/intechopen.109749*

$$X = A^{\dagger} C B^{\dagger} + L_A V + W R_B,\tag{13}$$

*where $V$, $W$ are arbitrary matrices over $\mathbb{C}$ with appropriate dimensions*.

In [37], it is proved that (13) is the least squares solution to (12), and its minimum norm least squares solution is $X_{LS} = A^{\dagger} C B^{\dagger}$.
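The solvability criterion and the minimum-norm property of Lemma 1.8 can also be illustrated numerically (a sketch assuming NumPy; the rank-deficient factors are chosen so that the free terms $L_A V$ and $W R_B$ are nonzero):

```python
import numpy as np

rng = np.random.default_rng(3)
pinv = np.linalg.pinv

A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 3))  # rank-2, 4x3
B = rng.standard_normal((2, 1)) @ rng.standard_normal((1, 5))  # rank-1, 2x5
X_true = rng.standard_normal((3, 2))
C = A @ X_true @ B  # guarantees consistency of AXB = C

# Solvability test: AA† C B†B = C
assert np.allclose(A @ pinv(A) @ C @ pinv(B) @ B, C)

# Minimum-norm least squares solution X_LS = A† C B†
X_ls = pinv(A) @ C @ pinv(B)
assert np.allclose(A @ X_ls @ B, C)

# Any other solution (13), with nonzero V and W, has norm >= ||X_LS||
L_A = np.eye(3) - pinv(A) @ A
R_B = np.eye(2) - B @ pinv(B)
V, W = rng.standard_normal((3, 2)), rng.standard_normal((3, 2))
X_other = X_ls + L_A @ V + W @ R_B
assert np.allclose(A @ X_other @ B, C)
assert np.linalg.norm(X_other) >= np.linalg.norm(X_ls) - 1e-12
```

The cross terms between $X_{LS}$ and the free parts vanish (since $L_A A^{\dagger} = 0$ and $B^{\dagger} R_B = 0$), which is exactly why $X_{LS}$ attains the least norm.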

**Lemma 1.9.** *[25]. Let $A_i$, $B_i$, $C_i$ $(i = 1, \ldots, 4)$, and $C_c$ be given matrices over $\mathbb{C}$ with compatible dimensions, and let $X_1$, $X_2$ be unknown. Denote*

$$\begin{aligned} A &= A\_3 L\_{A\_1}, B = R\_{B\_1} B\_3, C = A\_4 L\_{A\_2}, D = R\_{B\_2} B\_4, \\ N &= D L\_B, M = R\_A C, S = C L\_M, \\ E &= C\_c - A\_3 A\_1^\dagger C\_1 B\_3 - A C\_2 B\_1^\dagger B\_3 - A\_4 A\_2^\dagger C\_3 B\_4 - C C\_4 B\_2^\dagger B\_4. \end{aligned}$$

*Then the following conditions are equivalent:*

1.*System* (7) *is solvable.*

2.*The conditions in* (11) *are met and*

$$\begin{aligned} R\_{A\_2} C\_3 &= 0, & C\_4 L\_{B\_2} &= 0, & A\_2 C\_4 &= C\_3 B\_2, \\ R\_M R\_A E &= 0, & R\_A E L\_D &= 0, & E L\_B L\_N &= 0, & R\_C E L\_B &= 0. \end{aligned} \tag{14}$$

3.*The equalities in* (11) and (14) *are satisfied and*

$$M M^{\dagger} R_A E D^{\dagger} D = R_A E, \quad C C^{\dagger} E L_B N^{\dagger} N = E L_B.$$

*In these conditions, the general solution to the system* (7) *can be written as*

$$\begin{aligned} X_1 &= A_1^\dagger C_1 + L_{A_1} C_2 B_1^\dagger + L_{A_1} A^\dagger E B^\dagger R_{B_1} - L_{A_1} A^\dagger C M^\dagger E B^\dagger R_{B_1} -\\ &- L_{A_1} A^\dagger S C^\dagger E N^\dagger D B^\dagger R_{B_1} - L_{A_1} A^\dagger S V_1 R_N D B^\dagger R_{B_1} +\\ &+ L_{A_1} (L_A U_1 + Z_1 R_B) R_{B_1}, \end{aligned} \tag{15}$$

$$\begin{aligned} X_2 &= A_2^\dagger C_3 + L_{A_2} C_4 B_2^\dagger + L_{A_2} M^\dagger R_A E D^\dagger R_{B_2} + L_{A_2} L_M S^\dagger S C^\dagger E N^\dagger R_{B_2} +\\ &+ L_{A_2} (V_1 - S^\dagger S V_1 N N^\dagger) R_{B_2} + L_{A_2} W_1 R_D R_{B_2}, \end{aligned} \tag{16}$$

*where $U_1$, $V_1$, $W_1$, and $Z_1$ are free matrices over $\mathbb{C}$ with compatible dimensions.*

Since the general solutions of considered systems are expressed in terms of generalized inverses, another goal of the paper is to give determinantal representations of the least-norm of the general solution to the system (7) based on determinantal representations of generalized inverses.

Due to the important role of generalized inverses in many application fields, considerable effort has been devoted to numerical algorithms for the fast and accurate calculation of generalized inverses. Most existing methods are iterative algorithms for approximating generalized inverses of complex matrices (for some recent papers, see, e.g., [38–40]). There are only a few direct methods for finding the MP-inverse of an arbitrary complex matrix $A \in \mathbb{C}^{m \times n}$. The most famous is the method based on the singular value decomposition (SVD): if $A = U \Sigma V^{*}$, then $A^{\dagger} = V \Sigma^{\dagger} U^{*}$. The computational cost of this method is dominated by the cost of computing the SVD, which is several times higher than that of matrix–matrix multiplication.

Another approach is to construct determinantal representations of the MP-inverse $A^{\dagger}$. A well-known determinantal representation of an ordinary inverse is the adjugate matrix with cofactors as entries. It has important theoretical significance and brings forth Cramer's rule for systems of linear equations. The same is desirable for generalized inverses. In the search for more applicable explicit expressions, various determinantal representations of generalized inverses have been obtained (for the MP-inverse, see, e.g., [41, 42]). Because of the complexity of the previously obtained expressions, they have had little applicability.
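The SVD-based direct method mentioned above can be sketched as follows (assuming NumPy; `numpy.linalg.svd` computes the decomposition, and the result is compared against the library MP-inverse):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))  # rank-2, 5x4

# SVD-based Moore-Penrose inverse: A = U Σ V*  =>  A† = V Σ† U*
U, s, Vh = np.linalg.svd(A, full_matrices=False)
tol = max(A.shape) * np.finfo(float).eps * s.max()
s_dag = np.array([1.0 / x if x > tol else 0.0 for x in s])  # Σ†: invert only nonzero singular values
A_dag = Vh.conj().T @ np.diag(s_dag) @ U.conj().T

assert np.allclose(A_dag, np.linalg.pinv(A))
```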

In this chapter, we will use the determinantal representations of the MP-inverse recently obtained in [43].

**Lemma 1.10.** *[43, Theorem 2.2]. If $A \in \mathbb{C}^{m \times n}$ with $\operatorname{rank} A = r$, then the Moore-Penrose inverse $A^{\dagger} = \left(a^{\dagger}_{ij}\right) \in \mathbb{C}^{n \times m}$ possesses the following determinantal representations:*

$$a_{ij}^{\dagger} = \frac{\sum_{\beta \in I_{r,n}\{i\}} \left| (A^{*}A)_{.i}\left(a_{.j}^{*}\right) \right|_{\beta}^{\beta}}{\sum_{\beta \in I_{r,n}} \left| A^{*}A \right|_{\beta}^{\beta}} = \frac{\sum_{\alpha \in I_{r,m}\{j\}} \left| (AA^{*})_{j.}\left(a_{i.}^{*}\right) \right|_{\alpha}^{\alpha}}{\sum_{\alpha \in I_{r,m}} \left| AA^{*} \right|_{\alpha}^{\alpha}}. \tag{17}$$

Here $\left| A \right|_{\alpha}^{\alpha}$ denotes a principal minor of $A$ whose rows and columns are indexed by $\alpha := \{\alpha_1, \ldots, \alpha_k\} \subseteq \{1, \ldots, m\}$,

$$L_{k,m} := \{\alpha : 1 \le \alpha_1 < \cdots < \alpha_k \le m\}, \text{ and } I_{r,m}\{i\} := \{\alpha : \alpha \in L_{r,m},\, i \in \alpha\}.$$

Also, $a^{*}_{.j}$ and $a^{*}_{i.}$ denote the $j$th column and the $i$th row of $A^{*}$, and $A_{i.}(b)$ and $A_{.j}(c)$ stand for the matrices obtained from $A$ by replacing its $i$th row with the row vector $b \in \mathbb{C}^{1 \times n}$ and its $j$th column with the column vector $c \in \mathbb{C}^{m}$, respectively.

The formulas (17) give very simple and elegant determinantal representations of the MP-inverse. For any $A \in \mathbb{C}^{m \times n}_{r}$, the denominators in (17) are the sums of all principal minors of order $r$ of the matrices $A^{*}A$ or $AA^{*}$, while the numerators are the sums of the principal minors of order $r$ of the matrices $(A^{*}A)_{.i}\left(a^{*}_{.j}\right)$ or $(AA^{*})_{j.}\left(a^{*}_{i.}\right)$ that contain the $i$th column or the $j$th row, respectively.

Note that for an arbitrary full-rank matrix *A*, Lemma 1.10 gives a new way of finding an inverse matrix.
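For small matrices, the representation (17) can be evaluated by brute force and compared with a library MP-inverse. The sketch below (assuming NumPy; `pinv_determinantal` is an illustrative helper, not from the chapter) implements the first formula in (17), summing $r$th-order principal minors over the index sets $I_{r,n}\{i\}$:

```python
import numpy as np
from itertools import combinations

def pinv_determinantal(A, r):
    """Brute-force evaluation of the first (column) formula in (17)."""
    m, n = A.shape
    AsA = A.conj().T @ A  # A*A, an n x n matrix
    index_sets = list(combinations(range(n), r))
    # Denominator: sum of all r-th order principal minors of A*A
    denom = sum(np.linalg.det(AsA[np.ix_(b, b)]) for b in index_sets)
    Adag = np.zeros((n, m), dtype=AsA.dtype)
    for i in range(n):
        for j in range(m):
            M = AsA.copy()
            M[:, i] = A.conj().T[:, j]  # replace i-th column of A*A with a*_{.j}
            # Numerator: principal minors over index sets containing i
            Adag[i, j] = sum(np.linalg.det(M[np.ix_(b, b)])
                             for b in index_sets if i in b) / denom
    return Adag

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 3))  # rank-2, 4x3
assert np.allclose(pinv_determinantal(A, 2), np.linalg.pinv(A))
```

The cost grows combinatorially with $\binom{n}{r}$, which reflects why these formulas are valued for their explicit, Cramer-type form rather than as a fast numerical method.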

**Corollary 1.11.** *If $A \in \mathbb{C}^{m \times n}$ with $\operatorname{rank} A = \min\{m, n\}$, then the inverse $A^{-1} = \left(a^{-1}_{ij}\right) \in \mathbb{C}^{n \times m}$ possesses the following determinantal representations:*

$$a_{ij}^{-1} = \begin{cases} \dfrac{\left| (A^{*}A)_{.i}\left(a_{.j}^{*}\right) \right|}{\left| A^{*}A \right|} & \text{if } \operatorname{rank} A = n, \\[2ex] \dfrac{\left| (AA^{*})_{j.}\left(a_{i.}^{*}\right) \right|}{\left| AA^{*} \right|} & \text{if } \operatorname{rank} A = m. \end{cases}$$

These new determinantal representations of the Moore-Penrose inverse have also been obtained, by the novel limit-rank method, in the case of quaternion matrices [44]. This method was successfully applied to construct determinantal representations of other generalized inverses in both the complex and quaternion cases (see, e.g., [45–47]). It also yields Cramer's rules for various matrix equations [48–54].

The remainder of our chapter is organized as follows. In Section 2, we provide a new expression of the general solution to our system (7) and discuss its least-norm. An algorithm and a numerical example of finding the anti-Hermitian solution to (7) are presented in Section 3. Finally, in Section 4, the conclusions are drawn.

#### **2. A new expression of the general solution to the system (7)**

Now we present the principal theorem of this section.

**Theorem 2.1.** *Assume that $S_1 = \begin{bmatrix} I_{p_1} & 0 \end{bmatrix}$, $S_2 = \begin{bmatrix} 0 & I_{p_2} \end{bmatrix}$, $T_1 = \begin{bmatrix} I_{q_1} \\ 0 \end{bmatrix}$, $T_2 = \begin{bmatrix} 0 \\ I_{q_2} \end{bmatrix}$, $G = \begin{bmatrix} A & C \end{bmatrix}$, $H = \begin{bmatrix} B \\ -D \end{bmatrix}$, $H_1 = L_{A_1} L_A$, $H_2 = L_{A_1} S_1 L_G$, $H_3 = R_H T_1 R_{B_1}$, $H_4 = L_{A_2} L_C$, $H_5 = L_{A_2} S_2 L_G$, $H_6 = R_H T_2 R_{B_2}$, and that the system (7) is solvable. Then the general solution to our system can be formed as*

$$\begin{aligned} X\_1 &= A\_1^\dagger C\_1 + L\_{A\_1} C\_2 B\_1^\dagger + L\_{A\_1} A^\dagger E B^\dagger R\_{B\_1} - L\_{A\_1} A^\dagger C M^\dagger E B^\dagger R\_{B\_1} - \\ &- L\_{A\_1} A^\dagger S C^\dagger E N^\dagger D B^\dagger R\_{B\_1} + H\_1 V\_1 R\_{B\_1} + H\_2 U H\_3 + L\_{A\_1} V\_2 R\_B R\_{B\_1}, \end{aligned} \tag{18}$$

$$\begin{split} X\_2 &= A\_2^\dagger C\_3 + L\_{A\_2} C\_4 B\_2^\dagger + L\_{A\_2} M^\dagger R\_A E D^\dagger R\_{B\_2} + L\_{A\_2} L\_M S^\dagger S C^\dagger E N^\dagger R\_{B\_2} + \\ &+ H\_4 W\_1 R\_{B\_2} + H\_5 U H\_6 + L\_{A\_2} W\_2 R\_D R\_{B\_2}, \end{split} \tag{19}$$

*where $U$, $V_1$, $V_2$, $W_1$, and $W_2$ are free matrices over $\mathbb{C}$ with allowable dimensions.*

*Proof.* Our proof consists of three parts. In the first step, we show that the matrices $X_1$ and $X_2$ having the forms

$$X\_1 = \phi\_0 + H\_1 V\_1 R\_{B\_1} + L\_{A\_1} V\_2 R\_B R\_{B\_1} + H\_2 U H\_3,\tag{20}$$

$$X\_2 = \varphi\_0 + H\_4 W\_1 R\_{B\_1} + L\_{A\_2} W\_2 R\_D R\_{B\_2} + H\_5 U H\_6,\tag{21}$$

where $\phi_0$ and $\varphi_0$ are any pair of particular solutions to the system (7), and $V_1$, $V_2$, $W_1$, $W_2$, and $U$ are free matrices of suitable shapes over $\mathbb{C}$, are solutions to the system (7). In the second step, we show that any couple of solutions $\mu_0$ and $\nu_0$ to the system (7) can be expressed as (20) and (21), respectively. In the end, we confirm that

$$\begin{aligned} \mu &= A\_1^\dagger \mathbf{C}\_1 + L\_{A\_1} \mathbf{C}\_2 B\_1^\dagger + A^\dagger E B^\dagger - A^\dagger \mathbf{C} M^\dagger E B^\dagger - A^\dagger S C^\dagger E N^\dagger D B^\dagger, \\ \nu &= A\_2^\dagger \mathbf{C}\_3 + L\_{A\_2} \mathbf{C}\_4 B\_2^\dagger + L\_{A\_2} M^\dagger R\_A E D^\dagger + L\_{A\_2} L\_M \mathbf{S}^\dagger \mathbf{S} C^\dagger E N^\dagger R\_{B\_2} \end{aligned}$$

are a couple of particular solutions to the system (7).

Now we prove that a couple of matrices *X*<sup>1</sup> and *X*<sup>2</sup> having the shape of (20) and (21), respectively, are solutions to the system (7). Observe that

$$\begin{aligned} A\_1^\dagger C\_1 B\_1 + L\_{A\_1} C\_2 B\_1^\dagger B\_1 &= A\_1^\dagger A\_1 C\_2 + L\_{A\_1} C\_2 = C\_2, \\ A\_2^\dagger C\_3 B\_2 + L\_{A\_2} C\_4 B\_2^\dagger B\_2 &= A\_2^\dagger A\_2 C\_4 + L\_{A\_2} C\_4 = C\_4. \end{aligned}$$

It is evident that *X*<sup>1</sup> having the form (20) is a solution of *A*1*X*<sup>1</sup> ¼ *C*1, and *X*1*B*<sup>1</sup> ¼ *C*<sup>2</sup> and *X*<sup>2</sup> having the form (21) is a solution to *A*2*X*<sup>2</sup> ¼ *C*3, *X*2*B*<sup>2</sup> ¼ *C*4. Now we are left to show that *A*3*X*1*B*<sup>3</sup> þ *A*4*X*2*B*<sup>4</sup> ¼ *Cc* is satisfied by *X*<sup>1</sup> and *X*<sup>2</sup> given in (20) and (21). By Lemma 1.4, we have

$$\begin{split} A \mathbf{S\_1} L\_G &= A \begin{bmatrix} I\_{p\_1} & \mathbf{0} \end{bmatrix} \begin{bmatrix} L\_A & -A^\dagger \mathbf{C} L\_M \\ \mathbf{0} & L\_M \end{bmatrix} = A \begin{bmatrix} L\_A & -A^\dagger \mathbf{C} L\_M \end{bmatrix} \\ &= \begin{bmatrix} \mathbf{0} & -A A^\dagger \mathbf{C} L\_M \end{bmatrix} = \begin{bmatrix} \mathbf{0} & -(\mathbf{C} - M) L\_M \end{bmatrix} = \begin{bmatrix} \mathbf{0} & -\mathbf{C} L\_M \end{bmatrix} \\ &= -\begin{bmatrix} \mathbf{0} & \mathbf{S} \end{bmatrix} = -\mathbf{C} \mathbf{S}\_2 L\_G, \end{split} \tag{22}$$

and

$$\begin{aligned} \boldsymbol{R}\_{H}\boldsymbol{T}\_{1}\boldsymbol{B} &= \begin{bmatrix} \boldsymbol{R}\_{B} & \mathbf{0} \\ \boldsymbol{R}\_{N}\boldsymbol{D}\boldsymbol{B}^{\dagger} & \boldsymbol{R}\_{N} \end{bmatrix} \begin{bmatrix} \boldsymbol{I}\_{q\_{1}} \\ \mathbf{0} \end{bmatrix} \boldsymbol{B} = \begin{bmatrix} \boldsymbol{R}\_{B} \\ \boldsymbol{R}\_{N}\boldsymbol{D}\boldsymbol{B}^{\dagger} \end{bmatrix} \boldsymbol{B} \\ &= \begin{bmatrix} \mathbf{0} \\ \boldsymbol{R}\_{N}\boldsymbol{D}\boldsymbol{B}^{\dagger}\boldsymbol{B} \end{bmatrix} = \begin{bmatrix} \mathbf{0} \\ \boldsymbol{R}\_{N}\boldsymbol{D}(\boldsymbol{I}-\boldsymbol{L}\_{B}) \end{bmatrix} = \begin{bmatrix} \mathbf{0} \\ \boldsymbol{R}\_{N}\boldsymbol{D} \end{bmatrix} \\ &= \boldsymbol{R}\_{H}\boldsymbol{T}\_{2}\boldsymbol{D}. \end{aligned} \tag{23}$$

Observe that $A L_A = 0$; by using (22) and (23), we arrive at

$$A\_3 X\_1 B\_3 + A\_4 X\_2 B\_4 = C\_c.$$

Conversely, assume that *μ*<sup>0</sup> and *ν*<sup>0</sup> are any couple of solutions to our system (7). By Lemma 1.7, we have

$$\begin{aligned} A\_1 A\_1^\dagger C\_1 &= C\_1, C\_2 B\_1^\dagger B\_1 = C\_2, A\_2 A\_2^\dagger C\_3 = C\_3, \\ C\_4 B\_2^\dagger B\_2 &= C\_4, A\_1 C\_2 = C\_1 B\_1, A\_2 C\_4 = C\_3 B\_2. \end{aligned}$$

Observe that

$$\begin{aligned} L\_{A\_1}\mu\_0 R\_{B\_1} &= \left(I - A\_1^\dagger A\_1\right)\mu\_0 \left(I - B\_1 B\_1^\dagger\right) \\ &= \mu\_0 - \mu\_0 B\_1 B\_1^\dagger - A\_1^\dagger A\_1 \mu\_0 + A\_1^\dagger A\_1 \mu\_0 B\_1 B\_1^\dagger \\ &= \mu\_0 - C\_2 B\_1^\dagger - A\_1^\dagger C\_1 + A\_1^\dagger A\_1 C\_2 B\_1^\dagger \\ &= \mu\_0 - L\_{A\_1} C\_2 B\_1^\dagger - A\_1^\dagger C\_1 \end{aligned}$$

produces

$$
\mu\_0 = L\_{A\_1} \mathbf{C}\_2 \mathbf{B}\_1^\dagger + A\_1^\dagger \mathbf{C}\_1 + L\_{A\_1} \mu\_0 \mathbf{R}\_{B\_1}.\tag{24}
$$

On the same lines, we can get

$$
\nu\_0 = L\_{A\_2} \mathbf{C}\_4 \mathbf{B}\_2^\dagger + A\_2^\dagger \mathbf{C}\_3 + L\_{A\_2} \nu\_0 \mathbf{R}\_{B\_2}.\tag{25}
$$

It is manifest that *μ*<sup>0</sup> and *ν*<sup>0</sup> defined in (24)–(25) are also solution pair of

$$AX\_1B + CX\_2D = E.\tag{26}$$


Since

$$\begin{aligned} A X_1 B + C X_2 D &= A_3 L_{A_1} \mu_0 R_{B_1} B_3 + A_4 L_{A_2} \nu_0 R_{B_2} B_4\\ &= A_3 \left(\mu_0 - L_{A_1} C_2 B_1^\dagger - A_1^\dagger C_1\right) B_3 + A_4 \left(\nu_0 - L_{A_2} C_4 B_2^\dagger - A_2^\dagger C_3\right) B_4\\ &= A_3 \mu_0 B_3 + A_4 \nu_0 B_4 - A C_2 B_1^\dagger B_3 - A_3 A_1^\dagger C_1 B_3 - C C_4 B_2^\dagger B_4 - A_4 A_2^\dagger C_3 B_4\\ &= C_c - A C_2 B_1^\dagger B_3 - A_3 A_1^\dagger C_1 B_3 - C C_4 B_2^\dagger B_4 - A_4 A_2^\dagger C_3 B_4\\ &= E. \end{aligned}$$

Hence by Lemma 1.6, *μ*<sup>0</sup> and *ν*<sup>0</sup> can be written as

$$
\mu\_0 = \mathbf{X}\_{01} + \mathbf{S}\_1 L\_G U \mathbf{R}\_H T\_1 + L\_A V\_1 + V\_2 \mathbf{R}\_B,\tag{27}
$$

$$
\nu\_0 = X\_{02} + \mathcal{S}\_2 L\_G U \mathcal{R}\_H T\_2 + L\_C W\_1 + W\_2 \mathcal{R}\_D,\tag{28}
$$

where *X*<sup>01</sup> and *X*<sup>02</sup> are a couple of special solutions to (26) and *U*,*V*1,*V*2, *W*<sup>1</sup> and *W*<sup>2</sup> are free matrices with agreeable dimensions. Using (27) and (28) in (24) and (25), respectively, we get

$$\begin{aligned} \mu\_0 &= X\_{10} + H\_2 U H\_3 + H\_1 V\_1 R\_{B\_1} + L\_{A\_1} V\_2 R\_B R\_{B\_1}, \\ \nu\_0 &= X\_{20} + H\_5 U H\_6 + H\_4 W\_1 R\_{B\_2} + L\_{A\_2} W\_2 R\_D R\_{B\_2}, \end{aligned}$$

where $X_{10} = A_1^{\dagger} C_1 + L_{A_1} C_2 B_1^{\dagger} + L_{A_1} X_{01} R_{B_1}$ and $X_{20} = A_2^{\dagger} C_3 + L_{A_2} C_4 B_2^{\dagger} + L_{A_2} X_{02} R_{B_2}$. It is evident that $X_{10}$ and $X_{20}$ are a couple of solutions to the system (7), and it is clear that $\mu_0$ and $\nu_0$ can be represented by (20) and (21), respectively. Lastly, by putting $U_1$, $V_1$, $W_1$, and $Z_1$ equal to zero in (15) and (16), we conclude that $\mu$ and $\nu$ are particular solutions to the system (7). Hence the expressions (18) and (19) represent the general solution to the system (7), and the proof of the theorem is complete.

*Remark* 2.2. Due to Lemma 1.3 and taking into account $L_{A_2} L_M = L_M L_{A_2}$, we have the following simplification of the solution pair to the system (7), which is identical for (15)–(16) and (18)–(19) when $U$, $U_1$, $V_1$, $V_2$, $Z_1$, $W_1$, and $W_2$ vanish:

$$\begin{aligned} X\_1 &= A\_1^\dagger C\_1 + L\_{A\_1} C\_2 B\_1^\dagger + A^\dagger E B^\dagger - A^\dagger A\_4 M^\dagger E B^\dagger - A^\dagger S C^\dagger E N^\dagger B\_4 B^\dagger, \\ X\_2 &= A\_2^\dagger C\_3 + L\_{A\_2} C\_4 B\_2^\dagger + M^\dagger E D^\dagger + S^\dagger S C^\dagger E N^\dagger. \end{aligned}$$

*Comment* 2.3. In Theorem 2.1 we have established a novel expression of the general solution to the system (7), which is different from the one created in [25]. With the help of this novel expression, we can explore the least-norm of the general solution, which cannot be studied with the help of the expression given in [25]; this is one of the advantages of our new expression.

Now we discuss some special cases of our system.

If $B_1$, $B_2$, $C_2$, and $C_4$ vanish in Theorem 2.1, then we obtain the following conclusion.

**Corollary 2.4.** *Denote $S_1 = \begin{bmatrix} I_{p_1} & 0 \end{bmatrix}$, $S_2 = \begin{bmatrix} 0 & I_{p_2} \end{bmatrix}$, $T_1 = \begin{bmatrix} I_{q_1} \\ 0 \end{bmatrix}$, $T_2 = \begin{bmatrix} 0 \\ I_{q_2} \end{bmatrix}$, $G = \begin{bmatrix} A & C \end{bmatrix}$, $H = \begin{bmatrix} B_3 \\ -B_4 \end{bmatrix}$, $H_1 = L_{A_1} L_A$, $H_2 = L_{A_1} S_1 L_G$, $H_3 = R_H T_1$, $H_4 = L_{A_2} L_C$, $H_5 = L_{A_2} S_2 L_G$, $H_6 = R_H T_2$. If the system (6) is solvable, then the general solution to (6) can be formed as*

$$\begin{aligned} X\_1 &= A\_1^\dagger C\_1 + A^\dagger E B\_3^\dagger - A^\dagger A\_4 M^\dagger E B\_3^\dagger - A^\dagger S C^\dagger E N^\dagger B\_4 B\_3^\dagger - H\_1 Y\_1 + \\ &+ H\_2 V H\_3 + L\_{A\_1} Y\_2 R\_{B\_3}, \\ X\_2 &= A\_2^\dagger C\_3 + M^\dagger E B\_4^\dagger + S^\dagger S C^\dagger E N^\dagger + H\_4 Z\_1 + H\_5 V H\_6 + L\_{A\_2} Z\_2 R\_{B\_4}, \end{aligned}$$

*where A*, *C*, *N*, *M*, *S are the same as in Lemma 1.6*,

$$E = C\_c - A\_3A\_1^\dagger C\_1B\_3 - A\_4A\_2^\dagger C\_3B\_4,$$

*and V*, *Y*1, *Y*2, *Z*1, *Z*2 *are arbitrary matrices over* ℍ *of suitable dimensions*.

*Comment* 2.5. The above consequence is the principal result of [32].

If *A*2, *B*2, *C*3, *A*4, *B*4, and *C*4 vanish in our system (7), then we get the following outcome.

**Corollary 2.6.** *Suppose that A*1, *B*1, *C*1, *C*2, *A*3, *B*3, *and Cc are given. Then the general solution to system* (5) *is given by*

$$\begin{aligned} X\_1 &= A\_1^\dagger C\_1 + L\_{A\_1} C\_2 B\_1^\dagger + (A\_3 L\_{A\_1})^\dagger \left[ C\_c - A\_3 A\_1^\dagger C\_1 B\_3 - A\_3 L\_{A\_1} C\_2 B\_1^\dagger B\_3 \right] (R\_{B\_1} B\_3)^\dagger + \\ &+ L\_{A\_1} L\_{A\_3 L\_{A\_1}} W\_1 R\_{B\_1} + L\_{A\_1} W\_2 R\_{R\_{B\_1} B\_3} R\_{B\_1}, \end{aligned}$$

*where W*1 *and W*2 *are arbitrary matrices over* ℍ *with appropriate sizes*.

We investigate the least norm of the solution to the system (7) in this section. By the definition of the norm and [55], we can easily get the following result.

**Lemma 2.7.** *The following hold:*

(1) *For any A*, *B* ∈ ℍ*<sup>m×n</sup>*:

$$\|A + B\|^2 = \|A\|^2 + \|B\|^2 + 2\operatorname{Re}\bigl[\operatorname{tr}\bigl(B^\*A\bigr)\bigr].$$

(2) *For any A* ∈ ℍ*<sup>m×n</sup>*, *B* ∈ ℍ*<sup>n×m</sup>*:

$$\operatorname{Re}\bigl[\operatorname{tr}(AB)\bigr] = \operatorname{Re}\bigl[\operatorname{tr}(BA)\bigr].$$
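Both parts of Lemma 2.7 are easy to check numerically. A minimal sketch over the complex field (NumPy has no quaternion dtype, so ℂ stands in for ℍ here; all variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 4
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
B = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))

# Lemma 2.7(1): ||A + B||^2 = ||A||^2 + ||B||^2 + 2 Re tr(B* A)
lhs = np.linalg.norm(A + B, 'fro') ** 2
rhs = (np.linalg.norm(A, 'fro') ** 2 + np.linalg.norm(B, 'fro') ** 2
       + 2 * np.real(np.trace(B.conj().T @ A)))
assert np.isclose(lhs, rhs)

# Lemma 2.7(2): Re tr(AC) = Re tr(CA) for A (m x n), C (n x m)
C = rng.standard_normal((n, m)) + 1j * rng.standard_normal((n, m))
assert np.isclose(np.real(np.trace(A @ C)), np.real(np.trace(C @ A)))
```

Part (1) is what lets the squared norm of a solution split into a fixed part plus a free part plus a cross term; part (2) supplies the cyclic trace permutations used below.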

**Theorem 2.8.** *Assume that system* (7) *is solvable. Then the least-norm solution pair X*1 *and X*2 *to system* (7) *can be extracted as follows:*

$$\begin{aligned} \|X\_1\|\_{\min} &= A\_1^\dagger C\_1 + L\_{A\_1} C\_2 B\_1^\dagger + A^\dagger E B^\dagger - A^\dagger A\_4 M^\dagger E B^\dagger - \\ &- A^\dagger S C^\dagger E N^\dagger B\_4 B^\dagger, \end{aligned} \tag{29}$$

$$\|X\_2\|\_{\min} = A\_2^\dagger C\_3 + L\_{A\_2} C\_4 B\_2^\dagger + M^\dagger E D^\dagger + S^\dagger S C^\dagger E N^\dagger. \tag{30}$$

*Proof.* By Theorem 2.1 and Remark 2.2, the general solution to (7) can be formed as

$$X\_1 = A\_1^\dagger C\_1 + L\_{A\_1} C\_2 B\_1^\dagger + A^\dagger E B^\dagger - A^\dagger A\_4 M^\dagger E B^\dagger - A^\dagger S C^\dagger E N^\dagger B\_4 B^\dagger$$

$$-H\_1 V\_1 R\_{B\_1} + H\_2 U H\_3 + L\_{A\_1} V\_2 R\_B R\_{B\_1},$$

$$X\_2 = A\_2^\dagger C\_3 + L\_{A\_2} C\_4 B\_2^\dagger + M^\dagger E D^\dagger + S^\dagger S C^\dagger E N^\dagger$$

$$+ H\_4 W\_1 R\_{B\_2} + H\_5 U H\_6 + L\_{A\_2} W\_2 R\_D R\_{B\_2},$$

where *U*, *V*1, *V*2, *W*1, and *W*2 are arbitrary matrices over ℍ of suitable dimensions. By Lemma 2.7, the norm of *X*1 can be computed as

$$\begin{aligned} \|X\_1\|^2 &= \|A\_1^\dagger C\_1 + L\_{A\_1}C\_2B\_1^\dagger + A^\dagger EB^\dagger - A^\dagger A\_4M^\dagger EB^\dagger - A^\dagger SC^\dagger EN^\dagger B\_4B^\dagger - \\ &\quad - H\_1V\_1R\_{B\_1} + H\_2UH\_3 + L\_{A\_1}V\_2R\_BR\_{B\_1}\|^2 \\ &= \|A\_1^\dagger C\_1 + L\_{A\_1}C\_2B\_1^\dagger + A^\dagger EB^\dagger - A^\dagger A\_4M^\dagger EB^\dagger - A^\dagger SC^\dagger EN^\dagger B\_4B^\dagger\|^2 \\ &\quad + \|H\_1V\_1R\_{B\_1} + H\_2UH\_3 + L\_{A\_1}V\_2R\_BR\_{B\_1}\|^2 + J, \end{aligned} \tag{31}$$

*Solving and Algorithm for Least-Norm General Solution to Constrained Sylvester Matrix… DOI: http://dx.doi.org/10.5772/intechopen.109749*

where

$$\begin{aligned} J = 2\operatorname{Re}\Bigl[\operatorname{tr}\Bigl(&\bigl(H\_1V\_1R\_{B\_1} + H\_2UH\_3 + L\_{A\_1}V\_2R\_BR\_{B\_1}\bigr)^\* \\ &\bigl(A\_1^\dagger C\_1 + L\_{A\_1}C\_2B\_1^\dagger + A^\dagger EB^\dagger - A^\dagger A\_4M^\dagger EB^\dagger - A^\dagger SC^\dagger EN^\dagger B\_4B^\dagger\bigr)\Bigr)\Bigr]. \end{aligned} \tag{32}$$

Now we show that *J* = 0. Applying Lemmas 1.3, 1.4, and 2.7, we have

$$\begin{aligned} &\operatorname{Re}\bigl[\operatorname{tr}\bigl((H\_1V\_1R\_{B\_1})^\*\bigl(A\_1^\dagger C\_1 + L\_{A\_1}C\_2B\_1^\dagger + A^\dagger EB^\dagger - A^\dagger A\_4M^\dagger EB^\dagger - A^\dagger SC^\dagger EN^\dagger B\_4B^\dagger\bigr)\bigr)\bigr] \\ &= \operatorname{Re}\bigl[\operatorname{tr}\bigl(R\_{B\_1}V\_1^\*L\_AL\_{A\_1}\bigl(A\_1^\dagger C\_1 + L\_{A\_1}C\_2B\_1^\dagger + A^\dagger EB^\dagger - A^\dagger A\_4M^\dagger EB^\dagger - A^\dagger SC^\dagger EN^\dagger B\_4B^\dagger\bigr)\bigr)\bigr] \\ &= \operatorname{Re}\bigl[\operatorname{tr}\bigl(R\_{B\_1}V\_1^\*L\_AL\_{A\_1}C\_2B\_1^\dagger\bigr)\bigr] = \operatorname{Re}\bigl[\operatorname{tr}\bigl(V\_1^\*L\_AL\_{A\_1}C\_2B\_1^\dagger R\_{B\_1}\bigr)\bigr] = 0, \end{aligned} \tag{33}$$

and, similarly,

$$\operatorname{Re}\bigl[\operatorname{tr}\bigl((L\_{A\_1}V\_2R\_BR\_{B\_1})^\*\bigl(A\_1^\dagger C\_1 + L\_{A\_1}C\_2B\_1^\dagger + A^\dagger EB^\dagger - A^\dagger A\_4M^\dagger EB^\dagger - A^\dagger SC^\dagger EN^\dagger B\_4B^\dagger\bigr)\bigr)\bigr] = 0. \tag{34}$$

$$\begin{aligned} &\operatorname{Re}\bigl[\operatorname{tr}\bigl((H\_2UH\_3)^\*\bigl(A\_1^\dagger C\_1 + L\_{A\_1}C\_2B\_1^\dagger + A^\dagger EB^\dagger - A^\dagger A\_4M^\dagger EB^\dagger - A^\dagger SC^\dagger EN^\dagger B\_4B^\dagger\bigr)\bigr)\bigr] \\ &= \operatorname{Re}\bigl[\operatorname{tr}\bigl(H\_3^\*U^\*H\_2^\*\bigl(A\_1^\dagger C\_1 + L\_{A\_1}C\_2B\_1^\dagger + A^\dagger EB^\dagger - A^\dagger A\_4M^\dagger EB^\dagger - A^\dagger SC^\dagger EN^\dagger B\_4B^\dagger\bigr)\bigr)\bigr] \\ &= \operatorname{Re}\bigl[\operatorname{tr}\bigl(H\_3^\*U^\*L\_AL\_{A\_1}C\_2B\_1^\dagger\bigr)\bigr] = \operatorname{Re}\bigl[\operatorname{tr}\bigl(R\_{B\_1}T\_1^\*R\_HU^\*L\_AL\_{A\_1}C\_2B\_1^\dagger\bigr)\bigr] = 0. \end{aligned} \tag{35}$$
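The vanishing of these traces comes down to two elementary identities that follow from the Penrose condition *X*†*XX*† = *X*†: with *L<sub>X</sub>* = *I* − *X*†*X* and *R<sub>X</sub>* = *I* − *XX*†,

$$L\_{A\_1}A\_1^\dagger = A\_1^\dagger - A\_1^\dagger A\_1 A\_1^\dagger = 0, \qquad B\_1^\dagger R\_{B\_1} = B\_1^\dagger - B\_1^\dagger B\_1 B\_1^\dagger = 0,$$

and likewise *L<sub>A</sub>A*† = 0. The first identity annihilates every term of the particular solution that begins with a Moore-Penrose inverse on the left; the second, after a cyclic permutation of the trace (Lemma 2.7), annihilates the remaining term ending in *B*1†.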

Substituting (33)–(35) into (32) yields *J* = 0. Since the free matrices are arbitrary, (29) follows from (31); in the same way, we can prove that (30) holds. □

A special case of our system (7) is given below.

If *B*1, *B*2, *C*2, and *C*4 become zero matrices in Theorem 2.8, then again we get the principal result of [20].

**Corollary 2.9.** *Assume that system* (6) *is solvable. Then the least-norm solution pair X*1 *and X*2 *to system* (6) *can be furnished as*

$$\begin{aligned} \|X\_1\|\_{\min} &= A\_1^\dagger C\_1 + A^\dagger E B\_3^\dagger - A^\dagger A\_4 M^\dagger E B\_3^\dagger - A^\dagger S C^\dagger E N^\dagger B\_4 B\_3^\dagger, \\ \|X\_2\|\_{\min} &= A\_2^\dagger C\_3 + M^\dagger E B\_4^\dagger + S^\dagger S C^\dagger E N^\dagger. \end{aligned}$$

If *A*2, *B*2, *C*3, *A*4, *B*4, and *C*4 vanish in our system, then we get the next consequence.

**Corollary 2.10.** *Suppose that A*1, *B*1, *C*1, *C*2, *A*3, *B*3, *and Cc are given. Then the least-norm least squares solution to system* (5) *is given by*

$$\begin{aligned} \|X\_1\|\_{\min} &= A\_1^\dagger C\_1 + L\_{A\_1} C\_2 B\_1^\dagger \\ &+ (A\_3 L\_{A\_1})^\dagger \left[ C\_c - A\_3 A\_1^\dagger C\_1 B\_3 - A\_3 L\_{A\_1} C\_2 B\_1^\dagger B\_3 \right] (R\_{B\_1} B\_3)^\dagger. \end{aligned}$$

*Comment* 2.11. Corollary 2.10 is the key result of [22].
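The least-norm mechanism behind Theorem 2.8 and its corollaries, namely that the cross term *J* between the particular solution and the free part vanishes, can be seen numerically on the single equation *AXB* = *C*, whose solution set contains *A*†*CB*† + *L<sub>A</sub>W* + *VR<sub>B</sub>* for arbitrary *W*, *V*. A sketch over ℂ as a stand-in for ℍ (variable names and sizes are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, p, q = 3, 5, 4, 3
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
B = rng.standard_normal((p, q)) + 1j * rng.standard_normal((p, q))
# Build a consistent right-hand side C = A @ X_true @ B
C = A @ (rng.standard_normal((n, p)) + 1j * rng.standard_normal((n, p))) @ B

Ad, Bd = np.linalg.pinv(A), np.linalg.pinv(B)
X0 = Ad @ C @ Bd              # candidate least-norm solution A† C B†
L_A = np.eye(n) - Ad @ A      # projector with A @ L_A = 0
R_B = np.eye(p) - B @ Bd      # projector with R_B @ B = 0
assert np.allclose(A @ X0 @ B, C)

# Every perturbation inside the solution set has norm >= ||X0||,
# because the cross term (the analog of J) vanishes exactly.
for _ in range(5):
    W = rng.standard_normal((n, p)) + 1j * rng.standard_normal((n, p))
    V = rng.standard_normal((n, p)) + 1j * rng.standard_normal((n, p))
    X = X0 + L_A @ W + V @ R_B
    assert np.allclose(A @ X @ B, C)
    assert np.linalg.norm(X0) <= np.linalg.norm(X) + 1e-9
```

The inequality holds because tr((*L<sub>A</sub>W*)\**X*0) and tr((*VR<sub>B</sub>*)\**X*0) both vanish by the projector identities *L<sub>A</sub>A*† = 0 and *B*†*R<sub>B</sub>* = 0, exactly as in the proof of Theorem 2.8.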

#### **3. Algorithm with example**

In this section, we construct the algorithm, induced by Theorem 2.8, for finding the least-norm solution to (7).

*Algorithm 1.*


The following example is computed using Algorithm 1. Note that our goal is both to confirm the correctness of the main results of Theorems 2.1 and 2.8 and to demonstrate the technique of applying the determinantal representations of the MP inverse from Lemma 1.10 on a reasonably simple and transparent example.
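The overall structure of the algorithm (compute the Moore-Penrose inverses, form *E*, assemble the least-norm solution) can be sketched for the simplest special case, system (5) with the closed formula of Corollary 2.10. This is only an illustrative sketch over ℂ instead of ℍ, with SVD-based pseudoinverses standing in for the determinantal representations of Lemma 1.10; all variable names and sizes are ours:

```python
import numpy as np

rng = np.random.default_rng(2)
# System (5): A1 @ X = C1, X @ B1 = C2, A3 @ X @ B3 = Cc,
# with consistent right-hand sides built from a hidden solution Xt.
A1 = rng.standard_normal((2, 4)) + 1j * rng.standard_normal((2, 4))
B1 = rng.standard_normal((3, 2)) + 1j * rng.standard_normal((3, 2))
A3 = rng.standard_normal((3, 4)) + 1j * rng.standard_normal((3, 4))
B3 = rng.standard_normal((3, 4)) + 1j * rng.standard_normal((3, 4))
Xt = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))
C1, C2, Cc = A1 @ Xt, Xt @ B1, A3 @ Xt @ B3

# Step 1: Moore-Penrose inverses and the associated projectors
A1d, B1d = np.linalg.pinv(A1), np.linalg.pinv(B1)
LA1 = np.eye(4) - A1d @ A1    # L_{A_1} = I - A_1† A_1
RB1 = np.eye(3) - B1 @ B1d    # R_{B_1} = I - B_1 B_1†

# Step 2: assemble the least-norm solution of Corollary 2.10
E = Cc - A3 @ A1d @ C1 @ B3 - A3 @ LA1 @ C2 @ B1d @ B3
X1 = (A1d @ C1 + LA1 @ C2 @ B1d
      + np.linalg.pinv(A3 @ LA1) @ E @ np.linalg.pinv(RB1 @ B3))

# Step 3: verify that all three equations of system (5) hold
assert np.allclose(A1 @ X1, C1)
assert np.allclose(X1 @ B1, C2)
assert np.allclose(A3 @ X1 @ B3, Cc)
```

The full Algorithm 1 for system (7) follows the same three stages, with the larger family of auxiliary matrices *A*, *B*, *C*, *D*, *M*, *N*, *S*, *E* of Theorem 2.8.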

*Example* 1. Given the matrices:

$$A\_{1} = \begin{bmatrix} 1+i & 1-i & -1+i & -1-i \\ -1+i & 1+i & -1-i & 1-i \\ 2i & 2 & -2 & -2i \end{bmatrix}, B\_{1} = \begin{bmatrix} 2i & -1 & i+3 \\ -i & 1 & -3-i \\ -1 & i & 1-3i \\ 1 & -i & -1+3i \end{bmatrix}, A\_{2} = \begin{bmatrix} i & 1 & -1 \\ 1 & -i & i \\ -1 & i & -i \\ -i & -1 & 1 \end{bmatrix},$$


$$\begin{aligned} &B\_{2}=\begin{bmatrix}2-i&2i&-1&i+1\\2i&1+1&-2&i-1\\2i+1&i&2-2&-i-1\\-i+2&-2i&-1&-i\\i+2&-2i&-1&-i\\2&2i&2+88&88i\\-2i&2i&-2i&-88i\\-2i&-2i&-88&-88i\end{bmatrix},A\_{3}=\begin{bmatrix}8i&2&5i-2&-25&2i+5\\24&4i&-42&2i-54&2i+5\\24&4i&-42&-25&-2+5i\\44&4&-4i&-4\\44&4&-4i&-4\end{bmatrix},\\ &B\_{3}=\begin{bmatrix}-2&-i&2&-1\\-i&-2&-2i&2i\\-2&-2i&-2i&2\\-2&-4&2i&-2\\-2&-4&-2i\end{bmatrix},A\_{4}=\begin{bmatrix}-2i-3&-3i+2&2i+3\\-i&1&i\\-i&1&i\\-3&-3i&3\end{bmatrix},\\ &C\_{5}=\begin{bmatrix}3&3&-3&3i\\3&3&-3&3i\\-3&3&3&3\\-3&3&3&3\end{bmatrix},\\ &B\_{4}=\begin{bmatrix}7i&-i&2\\-i&-3&3&3\\-7&i&2&2\end{bmatrix},C\_{6}=\begin{bmatrix}4-2i&-2+4i&2+2i\\2+4i&-4-2i&-2+2i\\-2-4i&4+2i&2-2i\end{bmatrix},\\ &C\_{7}=\begin{bmatrix}7i&-i&2\\-7&i&2\\7&i&2\end{bmatrix},C\_{8}=\begin{bmatrix}4-2i&-2+4i&2+2i\\2+4i&-4-2i&-2+2i\\-2-4i&4+2i&2-2i\\-2-4i&4+2i&2-1\\-15-9&-81$$

Let us find a solution to the system (7) with the matrices given above by Algorithm 1.

1. Using Lemma 1.10, we calculate the Moore-Penrose inverses:

$$\begin{split} A\_{1}^{\dagger} &= \frac{1}{32} \begin{bmatrix} 1-i & -1-i & -2i \\ 1+i & 1-i & 2 \\ -1-i & -1+i & -2 \\ -1+i & 1+i & 2i \end{bmatrix}, B\_{1}^{\dagger} = \frac{1}{44} \begin{bmatrix} -11i & 11i & -11 & 11 \\ 39 & 41 & 20-i & 20+i \\ 9 & 11+i & 5+3i & 3-3i \end{bmatrix}, \\\ A\_{2}^{\dagger} &= \frac{1}{12} \begin{bmatrix} -i & 1 & -1 & i \\ 1 & i & -i & -1 \\ 1 & i & -i & -1 \\ -1 & -i & i & 1 \end{bmatrix}, B\_{2}^{\dagger} = \frac{1}{12} \begin{bmatrix} 1 & -i & i & 1 \\ -i & -1 & i & 1 \\ -i & -1 & -1 & i \\ 1-i & -1-i & -1+i & 1+i \end{bmatrix}, \\\ A\_{3}^{\dagger} &= \frac{1}{80} \begin{bmatrix} -2i & -2 & 2-5i \\ 2 & -2i & 5+2i \\ -2i & -2 & 2+5i \\ 2 & -2i & -5+2i \end{bmatrix}, B\_{3}^{\dagger} = \frac{1}{70} \begin{bmatrix} i & -2 & 2i & 1 \\ 2+i & -2+4i & 4+2i & 1-2i \\ -1 & -2i & -2 & i \end{bmatrix}, \end{split}$$

$$A\_4^\dagger = \frac{1}{69} \begin{bmatrix} -3+2i & i & -3 \\ 2+3i & 1 & 3i \\ 3-2i & -i & 1 \end{bmatrix}, B\_4^\dagger = \frac{1}{792} \begin{bmatrix} -35i & -21 & 35i & 21 \\ 47i & -51 & -47i & 51 \\ -52 & -48i & 52 & 48i \end{bmatrix}.$$

Then,

*<sup>A</sup>* <sup>¼</sup> <sup>1</sup> þ 5*i* 5 � 2*i* 1 þ 8*i* 12 þ 9*i* �5 þ 2*i* 2 þ 5*i* �8 þ *i* �9 þ 12*i i* 4 4 � 8*i* �8 þ 4*i* , *<sup>B</sup>* <sup>¼</sup> <sup>1</sup> �52 � 31*i* 10 � 135*i* �31 þ 52*i* þ 9*i* �10 þ 25*i* 9 � 8*i* �9 þ 8*i* �25 � 10*i* 8 þ 9*i* � 52*i* 135 þ 10*i* �52 � 31*i* , *<sup>C</sup>* <sup>¼</sup> <sup>1</sup> �11 � 3*i* 9 � 7*i* 6 þ 4*i* �1 � 3*i* 3 þ *i* 2*i* �9 þ 3 3 � 9*i* 6 , *<sup>D</sup>* <sup>¼</sup> �2*i* �2 �2 2*i* 0 2*i* 2 0 2 �2*i* , *<sup>N</sup>* <sup>¼</sup> <sup>1</sup> þ 4*i* �4 � 2*i* �10 � 4*i* � 4*i* �2 þ 4*i* �4 þ 10*i* �4 � 4*i* 4 þ 2*i* 10 þ 4*i* �4 þ 4*i* 2 � 4*i* 4 � 10*i* , *<sup>M</sup>* <sup>¼</sup> <sup>1</sup> �4 � 2*i* 4 � 2*i* 2 þ 2*i* �2 þ 4*i* �2 � 4*i* 2 � 2*i* 0 00 , *<sup>S</sup>* <sup>¼</sup> <sup>0</sup> *<sup>E</sup>* <sup>¼</sup> <sup>1</sup> � 108289*i* 236509 � 68427*i* �108289 � 19931*i* þ 16211*i* 77995 þ 79015*i* 16211 � 110417*i* þ 106424*i* �138224 þ 255672*i* 106424 � 74624*i :*


$$\begin{split} X\_{1} &= \frac{1}{365\%60} \begin{bmatrix} -11103239 + 18670945i & -9851419 + 14002307i & -5154373 + 3862099i & -4697953 + 10234959i \\ 26688873 + 4258681i & 29888893 + 5519501i & 12048461 + 4721147i & 17746081 + \ $1779697i \\ 6556168 + 965606i & \$ 321848 + 219634i & 4452796 + 10360112i & -6737414 + 7846532i \\ -17049264 - 2930378i & -26304464 - 11113378i & -10244698 - 3367816i & -7362609 - 1372029ii \\ \end{bmatrix}, \\\ X\_{2} &= \frac{1}{1344} \begin{bmatrix} 2052 - 963i & 233 - 1985i & -2159 + 3481i & -1465 - 367i \\ -792 + 2656i & 1901 - 205i & 317 + 445i & -221 + 317i \\ 171 + 585i & -146 + 28i & 868 - 1714i & 146 + 2884i \end{bmatrix} \end{split}$$

Note that Maple 2021 was used to perform the numerical experiment.
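The quaternion computations above cannot be reproduced directly in a real/complex linear algebra package, but any computed Moore-Penrose inverse, whether obtained by the determinantal representations of Lemma 1.10 or otherwise, can be validated against the four Penrose equations, which characterize it uniquely. A sketch over ℂ:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 4)) + 1j * rng.standard_normal((3, 4))
Ad = np.linalg.pinv(A)  # SVD-based candidate for A†

# The four Penrose conditions determine A† uniquely
assert np.allclose(A @ Ad @ A, A)              # (1) A A† A = A
assert np.allclose(Ad @ A @ Ad, Ad)            # (2) A† A A† = A†
assert np.allclose((A @ Ad).conj().T, A @ Ad)  # (3) (A A†)* = A A†
assert np.allclose((Ad @ A).conj().T, Ad @ A)  # (4) (A† A)* = A† A
```

The same four checks, carried out symbolically, confirm the exact quaternion inverses produced in step 1.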


#### **4. Conclusion**

We have constructed a novel expression of the general solution to system (7) over ℍ and used this result to explore the least norm of the general solution to this system when it is solvable. Some particular cases of our system are also discussed; our results generalize the principal results of [22, 32]. To give an algorithm for finding the explicit numerical expression of the least-norm general solution, we use the determinantal representations of the MP inverse recently obtained by one of the authors. The novelty of the conducted research lies in obtaining necessary and sufficient conditions for the existence of a solution, its formal representation by a closed formula in terms of generalized inverses, and the construction of an algorithm for finding its explicit expression. A numerical example is also given to illustrate the results established in this paper.

### **Conflict of interest**

The authors declare that they have no conflicts of interest.

#### **Data Availability**

The data used to support the findings of this study are included within the article titled "Solving and Algorithm for Least-Norm General Solution to Constrained Sylvester Matrix Equation". The prior studies (and datasets) are cited at relevant places within the text.

### **Classification**

**2000 AMS subject classifications:** 15A09, 15A15, 15A24.

#### **Author details**

Abdur Rehman<sup>1</sup> and Ivan I. Kyrchei<sup>2</sup> \*

1 University of Engineering and Technology, Lahore, Pakistan

2 Pidstrygach Institute for Applied Problems of Mechanics and Mathematics, NAS of Ukraine, Lviv, Ukraine

\*Address all correspondence to: ivankyrchei26@gmail.com

© 2023 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

### **References**

[1] Shahzad A, Jones BL, Kerrigan EC, Constantinides GA. An efficient algorithm for the solution of a coupled Sylvester equation appearing in descriptor systems. Automatica. 2011;**47**: 244-248. DOI: 10.1016/j.automatica. 2010.10.038

[2] Syrmos VL, Lewis FL. Coupled and constrained Sylvester equations in system design. Circuits, Systems, and Signal Processing. 1994;**13**(6):663-694. DOI: 10.1007/BF02523122

[3] Varga A. Robust pole assignment via Sylvester equation based state feedback parametrization: Computer-aided control system design (CACSD). IEEE International Symposium. 2000;**57**: 13-18. DOI: 10.1109/CACSD.2000. 900179

[4] Syrmos VL, Lewis FL. Output feedback eigenstructure assignment using two Sylvester equations. IEEE Transaction on Automatic Control. 1993;**38**:495-499. DOI: 10.1109/ 9.210155

[5] Li RC. A bound on the solution to a structured Sylvester equation with an application to relative perturbation theory. SIAM Journal on Matrix Analysis and Application. 1999;**21**(2): 440-445

[6] Darouach M. Solution to Sylvester equation associated to linear descriptor systems. Systems and Control Letters. 2006;**55**:835-838. DOI: 10.1016/j. sysconle.2006.04.004

[7] Zhang YN, Jiang DC, Wang J. A recurrent neural network for solving Sylvester equation with time-varying coefficients. IEEE Transcation on Neural Networks. 2002;**13**(5):1053-1063. DOI: 10.1109/TNN.2002.1031938

[8] Terán FD, Dopico FM. The solution of the equation *XA* + *AX<sup>T</sup>* = 0 and its application to the theory of orbits. Linear Algebra and its Application. 2011;**434**:44-67. DOI: 10.1016/j.laa.2010.08.005

[9] Dehghan M, Hajarian M. An efficient iterative method for solving the second-order Sylvester matrix equation *EVF*<sup>2</sup> − *AVF* − *CV* = *BW*. IET Control Theory and Applications. 2009;**3**:1401-1408. DOI: 10.1049/iet-cta.2008.0450

[10] Ding F, Chen T. Gradient based iterative algorithms for solving a class of matrix equations. IEEE Transaction on Automatic Control. 2005;**50**(8): 1216-1221. DOI: 10.1109/TAC.2005. 852558

[11] Dmytryshyn A, Futorny V, Klymchuk T, Sergeichuk VV. Generalization of Roth's solvability criteria to systems of matrix equations. Linear Algebra and its Application. 2017; **527**:294-302. DOI: 10.1016/j.laa.2017. 04.011

[12] He ZH, Wang QW. A real quaternion matrix equation with applications. Linear and Multilinear Algebra. 2013;**61**(6):725-740. DOI: 10.1080/03081087.2012.703192

[13] He ZH, Wang QW. The *η*-bihermitian solution to a system of real quaternion matrix equations. Linear and Multilinear Algebra. 2014;**62**(11):1509-1528. DOI: 10.1080/03081087.2013.839667

[14] Kyrchei II. Explicit representation formulas for the minimum norm least squares solutions of some quaternion


matrix equations. Linear Algebra and its Applications. 2013;**438**(1):136-152. DOI: 10.1016/j.laa.2012.07.049

[15] Kyrchei II. Explicit determinantal representation formulas for the solution of the two-sided restricted quaternionic matrix equation. Journal of Applied Mathematics and Computing. 2018;**58** (1–2):335-365. DOI: 10.1007/s12190-017- 1148-6

[16] Rehman A, Wang QW. A system of matrix equations with five variables. Applied Mathematics and Computation. 2015;**271**:805-819. DOI: 10.1016/j.amc. 2015.09.066

[17] Rehman A, Wang QW, Ali I, Akram M, Ahmad MO. A constraint system of generalized Sylvester quaternion matrix equations. Adv. Appl. Clifford Algebr. 2017;**27**(4): 3183-3196. DOI: 10.1007/s00006-017- 0803-1

[18] Rehman A, Wang QW, He ZH. Solution to a system of real quaternion matrix equations encompassing *η*-Hermicity. Applied Mathematics and Computation. 2015;**265**:945-957. DOI: 10.1016/j.amc.2015.05.104

[19] Rehman A, Akram M. Optimization of a nonlinear hermitian matrix expression with application. Univerzitet u Nišu. 2017;**31**(9):2805-2819. DOI: 10.2298/FIL1709805R

[20] Wang QW, Qin F, Lin CY. The common solution to matrix equations over a regular ring with applications. Indian Journal of Pure and Applied Mathematics. 2005;**36**(12):655-672

[21] Wang QW, Rehman A, He ZH, Zhang Y. Constraint generalized Sylvester matrix equations. Automatica. 2016;**69**:60-64. DOI: 10.1016/j.auto matica.2016.02.024

[22] Bao Y. Least-norm and extremal ranks of the least square solution to the quaternion matrix equation *AXB* = *C* subject to two equations. Algebra Colloquium. 2014;**21**(3):449-460. DOI: 10.1142/S100538671400039X

[23] Wang QW, Chang HX, Lin CY. P- (skew)symmetric common solutions to a pair of quaternion matrix equations. Applied Mathematics and Computation. 2008;**195**:721-732. DOI: 10.1016/j. amc.2007.05.021

[24] Li H, Gao Z, Zhao D. Least squares solutions of the matrix equation *AXB* + *CYD* = *E* with the least norm for symmetric arrowhead matrices. Applied Mathematics and Computation. 2014;**226**:719-724. DOI: 10.1016/j.amc.2013.10.065

[25] Wang QW, van der Woude JW, Chang HX. A system of real quaternion matrix equations with applications. Linear Algebra and its Application. 2009; **431**(12):2291-2303. DOI: 10.1016/j. laa.2009.02.010

[26] Peng YG, Wang X. A finite iterative algorithm for solving the least-norm generalized (*P*, *Q*)-reflexive solution of the matrix equations *AiXBi* = *Ci*. Journal of Computational Analysis and Applications. 2014;**17**(3):547-561

[27] Yuan S, Liao A. Least squares Hermitian solution of the complex matrix equation *AXB* + *CXD* = *E* with the least norm. Journal of Franklin Institute. 2014;**351**(11):4978-4997. DOI: 10.1016/j.jfranklin.2014.08.003

[28] Trench WF. Minimization problems for (*R*, *S*)-symmetric and (*R*, *S*)-skew symmetric matrices. Linear Algebra and its Applications. 2004;**389**:23-31. DOI: 10.1016/j.laa.2004.03.035

[29] Trench WF. Characterization and properties of matrices with generalized symmetry or skew symmetry. Linear Algebra and its Applications. 2004;**377**: 207-218. DOI: 10.1016/j.laa.2003.07.013

[30] Trench WF. Characterization and properties of (*R*,*S*)-symmetric, (*R*,*S*) skew symmetric and (*R*,*S*)-conjugate matrices. SIAM Journal on Matrix Analysis and Application. 2005;**26**: 748-757. DOI: 10.1137/ S089547980343134X

[31] Marsaglia G, Styan GPH. Equalities and inequalities for ranks of matrices. Linear and Multilinear Algebra. 1974;**2**: 269-292. DOI: 10.1080/ 03081087408817070

[32] Wang QW, Li CK. Ranks and the least-norm of the general solution to a system of quaternion matrix equations. Linear Algebra and its Application. 2009; **430**:1626-1640. DOI: 10.1016/j. laa.2008.05.031

[33] Wang QW, Chang HX, Ning Q. The common solution to six quaternion matrix equations with applications. Applied Mathematics and Computation. 2008;**198**:209-226. DOI: 10.1016/j.amc. 2007.08.091

[34] Tian Y. Solvability of two linear matrix equations. Linear and Multilinear Algebra. 2000;**48**:123-147. DOI: 10.1080/03081080008818664

[35] Wang QW, Wu ZC, Lin CY. Extremal ranks of a quaternion matrix expression subject to consistent systems of quaternion matrix equations with applications. Applied Mathematics and Computation. 2006;**182**:1755-1764. DOI: 10.1016/j.amc.2006.06.012

[36] Wang QW. A system of matrix equations and a linear matrix equation over arbitrary regular rings with

identity. Linear Algebra and its Application. 2004;**384**:43-54. DOI: 10.1016/j.laa.2003.12.039

[37] Wensheng C. Solvability of a quaternion matrix equation. Applied Mathematics, Journal of Chinese Universities, Series B. 2002;**17**(4):490-498. DOI: 10.1007/s11766-996-0015-2

[38] Artidiello S, Cordero A, Torregrosa JR, Vassileva MP. Generalized inverses estimations by means of iterative methods with memory. Mathematics. 2019;**8**:2. DOI: 10.3390/math8010002

[39] Guo W, Huang T. Method of elementary transformation to compute Moore–Penrose inverse. Applied Mathematics and Computation. 2010; **216**:1614-1617. DOI: 10.1016/j. amc.2010.03.016

[40] Sayevand K, Pourdarvish A, Machado JAT, Erfanifar R. On the calculation of the Moore-Penrose and Drazin inverses: Application to fractional calculus. Mathematics. 2021;**9**:2501. DOI: 10.3390/math9192501

[41] Bapat RB, Bhaskara KPS, Prasad KM. Generalized inverses over integral domains. Linear Algebra and its Applications. 1990;**140**:181-196. DOI: 10.1016/0024-3795(90)90229-6

[42] Stanimirovic PS. General determinantal representation of pseudoinverses of matrices. Matematichki Vesnik. 1996;**48**:1-9

[43] Kyrchei II. Analogs of the adjoint matrix for generalized inverses and corresponding Cramer rules. Linear and Multilinear Algebra. 2008;**56**(4): 453-469. DOI: 10.1080/0308108 0701352856


[44] Kyrchei II. Determinantal representation of the Moore-Penrose inverse matrix over the quaternion skew field. Journal of Mathematical Sciences. 2012;**108**(1):23-33. DOI: 10.1007/ s10958-011-0626-x

[45] Kyrchei II. Determinantal representations of the Drazin and W-weighted Drazin inverses over the quaternion skew field with applications. In: Griffin S, editor. Quaternions: Theory and Applications. New York: Nova Sci Publ; 2017. pp. 201-275

[46] Kyrchei II. Determinantal representations of the quaternion weighted Moore-Penrose inverse and its applications. In: Baswell AR, editor. Advances in Mathematics Research 23. New York: Nova Sci Publ; 2017. pp. 35-96

[47] Kyrchei II. Cramer's rule for generalized inverse solutions. In: Kyrchei II, editor. Advances in Linear Algebra Research. New York: Nova Sci Publ; 2015. pp. 79-132

[48] Kyrchei II. Analogs of Cramer's rule for the minimum norm least squares solutions of some matrix equations. Applied Mathematics and Computation. 2012;**218**(11):6375-6384. DOI: 10.1016/ j.amc.2011.12.004

[49] Kyrchei II. Determinantal representations of solutions and hermitian solutions to some system of two-sided quaternion matrix equations. Journal of Mathematics. 2018; **2018**:6294672. DOI: 10.1155/2018/ 6294672

[50] Kyrchei II. Cramer's rules of *η*-(skew-)Hermitian solutions to the quaternion Sylvester-type matrix equations. Adv. Appl. Clifford Algebr. 2019;**29**(3):56. DOI: 10.1007/ s00006-019-0972-1

[51] Kyrchei II. Determinantal representations of solutions to systems of two-sided quaternion matrix equations. Linear and Multilinear Algebra. 2021;**69**(4):648-672. DOI: 10.1080/03081087.2019.1614517

[52] Rehman A, Kyrchei II, Ali I, Akram M, Shakoor A. The general solution of quaternion matrix equation having *η*-skew-Hermicity and its Cramer's rule. Mathematical Problems in Engineering. 2018;**2018**:7939238. DOI: 10.1155/2019/7939238

[53] Rehman A, Kyrchei II, Ali I, Akram M, Shakoor A. Explicit formulas and determinantal representation for *η*-skew-Hermitian solution to a system of quaternion matrix equations. Univerzitet u Nišu. 2020;**34**(8):2601-2627. DOI: 10.2298/FIL2008601R

[54] Rehman A, Kyrchei II, Ali I, Akram M, Shakoor A. Constraint solution of a classical system of quaternion matrix equations and its Cramer's rule. Iranian Journal of Science and Technology, Transactions A: Science. 2021;**45**(3):1015-1024. DOI: 10.1007/s40995-021-01083-7

[55] Tian Y. Equalities and inequalities for traces of quaternionic matrices. Algebras Groups Geometry. 2002;**19**(2): 181-193

### *Edited by Ivan I. Kyrchei*

*Inverse Problems - Recent Advances and Applications* examines some recent advances in inverse problems: new aspects of their mathematical modeling, their applications in physical systems, and the computational methods used. It consists of five chapters divided into two sections. Section 1, "Modeling and Formulations of Inverse Problems", discusses new approaches to the modeling and formulation of inverse problems in some physical systems. Section 2, "Some Computational Aspects", contains research related to mathematical methods of solving some inverse problems.

Published in London, UK © 2023 IntechOpen © kh\_art / iStock
