## 2. Linear algebra and objective functions

Linear algebra has two basic problems: solving a system of linear equations is one of them; the other is the eigendecomposition. In this chapter, we will use both of them on a linear equation system in a combined form (Eqs. (8)–(11)), in which we will solve the linear equation system by means of the singular value decomposition, which is related to the eigendecomposition (or matrix diagonalization) [8, 9, 13, 14].

Suppose an estimated unknown vector $\widehat{\mathbf{x}}_u = \mathbf{x} + \widehat{\boldsymbol{\delta}}$ (in the model of interest) and an experimental data vector $\mathbf{y}_n = \widehat{\mathbf{y}} - \widehat{\boldsymbol{\varepsilon}}$ (observations, which are stochastic variables), in which the estimated data and error (residual) vectors are $\widehat{\mathbf{y}}$ and $\widehat{\boldsymbol{\varepsilon}}$, respectively, under an objective function. Their covariance matrices are $\boldsymbol{\Sigma}_{\widehat{x}} = \boldsymbol{\Sigma}_x = \widehat{\sigma}_0^2\,\mathbf{Q}_x$ (for the unknowns) and $\boldsymbol{\Sigma}_y = \sigma_0^2\,\mathbf{P}^{-1}$ (for the data), with a priori variance $\sigma_0^2$ and a posteriori variance $\widehat{\sigma}_0^2$. Note that $\widehat{\mathbf{x}}$ is a nonstochastic vector before estimation, and $\mathbf{x}$ is a vector of approximate values for $\widehat{\mathbf{x}}$ (the hat sign "^" denotes a value estimated according to an objective function). In addition, $n$, $m$ and $u$ are the number of observations, the number of equations and the number of unknowns, respectively.

Starting with a linear or nonlinear function vector $\mathbf{f}_m(\widehat{\mathbf{y}}, \widehat{\mathbf{x}}) = \mathbf{0}$, we can form a linear mathematical model with a weight matrix $\mathbf{P} = \sigma_0^2\,\boldsymbol{\Sigma}_y^{-1}$ of the observations for $m = n$:

$$\boldsymbol{\varepsilon}_n = \mathbf{A}_{n,u}\,\boldsymbol{\delta}_u - \mathbf{l}_n, \qquad \mathbf{P}_{n,n} \tag{1}$$

$$\mathbf{A}_{n,u} = \frac{\partial \mathbf{f}(\widehat{\mathbf{y}}, \widehat{\mathbf{x}})}{\partial \widehat{\mathbf{x}}}\Big|_{\widehat{\mathbf{y}},\,\widehat{\mathbf{x}} = \mathbf{y},\,\mathbf{x}} \quad \text{and} \quad \mathbf{l}_n = \mathbf{f}(\mathbf{y}, \mathbf{x}).$$
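Where a closed-form Jacobian is inconvenient, $\mathbf{A}_{n,u}$ and $\mathbf{l}_n$ can be formed numerically. A minimal NumPy sketch with forward differences; the model `f` below is a hypothetical illustration, not one from the chapter:

```python
import numpy as np

def jacobian(f, x, h=1e-7):
    """Numerical Jacobian A = d f / d x at the approximate values x (forward differences)."""
    f0 = f(x)
    A = np.zeros((f0.size, x.size))
    for j in range(x.size):
        xh = x.copy()
        xh[j] += h
        A[:, j] = (f(xh) - f0) / h
    return A

# Hypothetical model for illustration: f(x) = [x0 + x1, x0*x1, x0**2]
f = lambda x: np.array([x[0] + x[1], x[0] * x[1], x[0] ** 2])
x = np.array([2.0, 3.0])       # approximate values
A = jacobian(f, x)             # plays the role of A_{n,u}
l = f(x)                       # plays the role of l_n = f(y, x)
```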

A mathematical model between data and unknowns can be established by Taylor expansion for any model. However, if the $m$-function vector $\mathbf{f}_m(\widehat{\mathbf{y}}, \widehat{\mathbf{x}}) = \mathbf{0}$ cannot be transformed into $\widehat{\mathbf{y}}_n - \mathbf{f}_n(\widehat{\mathbf{x}}) = \mathbf{0}$ (for $m = n$), the errors-in-variables solution, as in the total least squares (TLS) method, can be preferred. Therefore, $\mathbf{f}_m(\widehat{\mathbf{y}}, \widehat{\mathbf{x}}) = \mathbf{0}$ (for $m \ne n$) should be differentiated as follows:

$$\mathbf{B}_{m,n}\,\boldsymbol{\varepsilon}_n - \mathbf{A}_{m,u}\,\boldsymbol{\delta}_u + \mathbf{l}_m = \mathbf{0}, \qquad \mathbf{P}_{n,n} \tag{2}$$

where

$$\mathbf{B}_{m,n} = \frac{\partial \mathbf{f}(\widehat{\mathbf{y}}, \widehat{\mathbf{x}})}{\partial \widehat{\mathbf{y}}}\Big|_{\widehat{\mathbf{y}},\,\widehat{\mathbf{x}} = \mathbf{y},\,\mathbf{x}}.$$

If the relations are nonlinear, they should be linearized via Taylor expansion [1–7]. Therefore, the linear models can be solved by linear algebra [8–15].

To overcome complicated real-life problems whose mathematical models are not known, soft computing techniques have been developed in the last decades. Among the well-known techniques are artificial neural networks (ANN), artificial intelligence (AI), machine learning (ML), deep learning (DL), fuzzy logic (FL) and genetic algorithms (GA) [16–18]. These techniques, inspired by human intelligence and learning processes, can be very time-consuming for the data given in a run, since their processing is based on the trial-and-error method. Roughly described, the data (experimental outcomes and observations) are separated into two parts: learning (or training) data and test data. Mathematical (functional and/or stochastic) relations between data and model parameters are learned from the learning data. The handled model is then tested by means of the test data. After that, the trained and developed model, if it meets expectations, is used to estimate unobserved data for scientific (or engineering) problems [16–18].

In the soft computing techniques, linear algebra is also a very effective tool for solving the problem, as in the hard computing ones. For this reason, we should take a short overview of the linear algebra used in science and engineering [16–18].
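The learning/test split described above can be sketched in a few lines of NumPy; the synthetic data and the plain LS fit standing in for the learning step are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))                       # observation "inputs"
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

# Separate the data into learning (training) and test parts
idx = rng.permutation(100)
train, test = idx[:80], idx[80:]

# "Learn" the relation from the training part (a plain LS fit stands in here)
coef, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)

# Test the handled model on the held-out part
rmse = np.sqrt(np.mean((X[test] @ coef - y[test]) ** 2))
```

If the test error meets expectations, the learned `coef` would then be used to estimate unobserved data.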

Most science and engineering problems can be modeled as $\widehat{\mathbf{y}} - \mathbf{f}(\widehat{\mathbf{x}}) = \mathbf{0}$ ($m = n$). Therefore, this functional model, named the indirect adjustment method in the adjustment literature of geomatics engineering [3–7], has been preferred in this chapter. The weight matrix $\mathbf{P}_{n,n}$ of the observations (stochastic variables) is accepted as the unit matrix, $\mathbf{P}_{n,n} = \mathbf{I}_{n,n}$, here for simplicity.

#### 2.1. Objective functions

A generalization of objective functions is the $L_p$-norm ($p = 1, 2, 3, 4, \dots, \infty$) [9, 10]. The first-degree objective function is the $L_1$-norm estimation, which is accepted as a robust estimation method in linear models only [9–11].


The second-degree objective function is the $L_2$-norm estimation, known as the least squares (LS) method and widely used in both hard and soft computations.

The last objective function is the $L_\infty$-norm estimation, known as the minimax method. In fact, the soft computing techniques use this objective while applying the trial-and-error method in their learning stages. Eq. (1) under the $L_1$-norm and $L_\infty$-norm is also solved by means of linear programming methods; for this reason, these methods may give several solutions (as in the trial-and-error method) to any given problem [10, 11].
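The practical difference between the objectives can be seen on a toy line fit with one gross outlier. Here the $L_1$-norm estimate is obtained by iteratively reweighted least squares, one common way to approximate it; the data are illustrative:

```python
import numpy as np

# Toy line y = 2 t + 1 with one gross outlier (illustrative data)
t = np.arange(6.0)
y = 2.0 * t + 1.0
y[5] += 20.0                                  # outlier
A = np.column_stack([t, np.ones_like(t)])

# L2-norm (least squares): minimizes sum(eps^2) and is pulled toward the outlier
x_l2, *_ = np.linalg.lstsq(A, y, rcond=None)

# L1-norm via iteratively reweighted LS (weights ~ 1/|eps|, clamped for stability)
x_l1 = x_l2.copy()
for _ in range(100):
    w = 1.0 / np.maximum(np.abs(A @ x_l1 - y), 1e-8)
    Aw = A * w[:, None]
    x_l1 = np.linalg.solve(A.T @ Aw, A.T @ (w * y))
```

The $L_1$ estimate recovers the slope 2 and intercept 1 almost exactly, while the $L_2$ estimate is visibly biased by the single outlier, which illustrates the robustness claimed for $L_1$-norm estimation.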

#### 2.2. Rank deficiencies in linear models

While the rank is the number of linearly independent columns of the coefficient matrix of unknowns in a linear equation system, a rank deficiency is the number of linearly dependent columns (when the rank is smaller than the row number) of the coefficient matrix. Inconsistency in the solution stage of a linear equation system results from these (rank) deficiencies. Defining the rank of $\mathbf{A}_{n,u}$ by $\mathrm{rank}(\mathbf{A}) = r$, the condition $r \le u \le \min(m, n)$ is always satisfied. In general, $n = m$ in the well-known (or indirect) LS used in many scientific problems.

Denoting the rank defect by the letter $d$, we can define two types of defects [12].

$$d_s = n - r, \qquad \text{Surjectivity ("onto" mapping)} \tag{6a}$$


On Non-Linearity and Convergence in Non-Linear Least Squares

http://dx.doi.org/10.5772/intechopen.76313


$$d_i = u - r. \qquad \text{Injectivity ("one-to-one" mapping)} \tag{6b}$$

Objective functions are used to remove the surjectivity defect $d_s$ caused by the redundant observations. The injectivity defect $d_i$ can arise from three causes in the estimation problem [12].

Datum defects (d-defects) are closely related to the origin of the spatial system. The defect arises if the data do not carry any information covering the absolute spatial position of the given problem.

Configuration (design) defects (c-defects) arise from weak geometric relations among data and unknowns. To avoid this defect, we can be careful and deliberate when picking data (their interval and/or place) and when choosing a consistent mathematical model (e.g., using auxiliary variables instead of the original ones).

Ill-conditioning defects (i-defects) arise from large ranges among the elements of the coefficient matrix of unknowns. Norming the matrix can reduce ill-conditioning defects but cannot remove them fully. I-defects and c-defects cannot be separated from each other easily [12].
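For Eqs. (6a) and (6b), the rank and both defect numbers can be checked numerically; a small NumPy sketch with a deliberately dependent column (the matrix is illustrative, not an example from the chapter):

```python
import numpy as np

# Illustrative 4x3 coefficient matrix whose third column is the sum of the first two
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [2.0, 1.0, 3.0]])
n, u = A.shape
r = np.linalg.matrix_rank(A)

d_s = n - r    # surjectivity defect, Eq. (6a): removed by an objective function
d_i = u - r    # injectivity defect, Eq. (6b): calls for a pseudoinverse
```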

The defects lead to the failure of any given problem to be solved properly. Since the coefficient matrix of unknowns cannot be inverted by regular (ordinary) inverse methods, we should use the pseudoinverse to overcome the effects of the defects [8, 9, 13–15]. Eigenvalue and singular value decompositions can be used effectively for the pseudoinverse. Denoting by $\mathbf{N}$ a symmetric positive (semi)definite matrix (which is always satisfied for $\mathbf{N} = \mathbf{A}^T\mathbf{A}$ or $\mathbf{N} = \mathbf{A}\,\mathbf{A}^T$), its pseudoinverse is:


$$\mathbf{Q}_{u,u} = \mathbf{N}^{+} = \mathbf{S}\,\boldsymbol{\Lambda}^{+}\,\mathbf{S}^{T} = \mathbf{V}\,\boldsymbol{\Sigma}^{+}\,\mathbf{U}^{T}, \qquad \text{Pseudoinverse of } \mathbf{N} \tag{7a}$$

$$\mathbf{N}_{u,u} = \mathbf{S}\,\boldsymbol{\Lambda}\,\mathbf{S}^{T} = \mathbf{U}\,\boldsymbol{\Sigma}\,\mathbf{V}^{T}, \qquad \text{For a positive definite symmetric matrix} \tag{7b}$$

$$
\boldsymbol{\Lambda}^{+} = \boldsymbol{\Sigma}^{+} = \begin{bmatrix} \boldsymbol{\Lambda}_r^{-1} & \mathbf{0}_{r,d} \\ \mathbf{0}_{d,r} & \mathbf{0}_{d,d} \end{bmatrix}.
$$

Since $\mathbf{N}_{u,u}$ is a positive definite symmetric matrix in the LS, $\mathbf{S} = \mathbf{U} = \mathbf{V}$. If there is no defect in a matrix $\mathbf{N}$, then $\mathbf{N}^{-1} = \mathbf{N}^{+}$. Therefore, we can use the pseudoinverse safely in any given problem [8, 9, 13–15].
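Eq. (7) can be reproduced directly with an eigendecomposition: build $\mathbf{N}^{+}$ by inverting only the nonzero eigenvalues and keeping the zeros, then compare against NumPy's SVD-based `pinv`. The matrix `A` below is an arbitrary rank-deficient illustration:

```python
import numpy as np

# A has a dependent third column (col3 = col1 + col2), so N = A^T A is singular
A = np.array([[1.0, 2.0, 3.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0],
              [2.0, 1.0, 3.0]])
N = A.T @ A

# N = S Lambda S^T; pseudoinverse per Eq. (7): invert nonzero eigenvalues only
lam, S = np.linalg.eigh(N)
tol = 1e-10 * lam.max()
lam_plus = np.array([1.0 / v if v > tol else 0.0 for v in lam])
N_plus = S @ np.diag(lam_plus) @ S.T
```

When $\mathbf{N}$ has no defect, the same construction returns the ordinary inverse, consistent with $\mathbf{N}^{-1} = \mathbf{N}^{+}$.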

#### 2.3. Hard computing

Linearizing nonlinear functions into their linear form by means of Taylor expansion, a linear equation system is obtained as in Eq. (1). To avoid complicated proofs in the solution of an equation system, the simplified mathematical model can be written in the following (statically rotation invariant [1]) numerical computation form.

$$\mathbf{A}_{n,u}\,\boldsymbol{\delta}_u = \mathbf{l}_n, \qquad \mathbf{P} = \mathbf{I}. \tag{8}$$

Two states are met when solving Eq. (8): $n \le u$ and $n > u$. The solution for the former state, $n \le u$, is achieved by means of an auxiliary variables vector $\boldsymbol{\lambda}_n$, which can be defined through $\boldsymbol{\delta}_u = \mathbf{A}^T_{u,n}\,\boldsymbol{\lambda}_n$. In fact, the auxiliary vector $\boldsymbol{\lambda}_n$ is named a Lagrange multipliers vector, or an eigenvalues vector in a homogeneous equation system in which $\mathbf{l} = \mathbf{0}$ in Eq. (8) [9]. Substituting $\boldsymbol{\delta} = \mathbf{A}^T\boldsymbol{\lambda}$ back into Eq. (8), we compute $\boldsymbol{\lambda}$ first:


$$
\widehat{\boldsymbol{\lambda}}_n = \mathbf{Q}_{n,n}\,\mathbf{l}_n, \qquad \mathbf{Q}_{n,n} = \left(\mathbf{A}\,\mathbf{A}^T\right)^{+}. \tag{9}
$$

Then $\widehat{\boldsymbol{\delta}}$ and its variance–covariance matrix (if we know the statistical uncertainty of the observations, $\boldsymbol{\Sigma}_l = \boldsymbol{\Sigma}_y$) are calculated by Eq. (10) and the law of error propagation, respectively. We can only calculate the variance–covariance matrix of the estimates as in Eq. (10), due to $\widehat{\sigma}_0^2 = 0$ and taking $\boldsymbol{\Sigma}_y = \mathbf{I}$ in this chapter.

$$
\widehat{\boldsymbol{\delta}} = \mathbf{A}^T \, \widehat{\boldsymbol{\lambda}} = \mathbf{A}^T \mathbf{Q} \, \mathbf{l}, \qquad \boldsymbol{\Sigma}\_{\widehat{\boldsymbol{\delta}}} = \sigma\_0^2 \, \mathbf{A}^T \mathbf{Q} \, \mathbf{Q} \, \mathbf{A}, \tag{10a}
$$

$$
\widehat{\mathbf{x}} = \mathbf{x} + \widehat{\boldsymbol{\delta}}, \qquad \boldsymbol{\Sigma}_{\widehat{\mathbf{x}}} = \boldsymbol{\Sigma}_{\widehat{\boldsymbol{\delta}}}, \tag{10b}
$$

$$
\widehat{\mathbf{x}}^T\,\widehat{\mathbf{x}} \;\mapsto\; \min. \tag{10c}
$$

In the state $n \le u$, $\mathbf{A}\,\widehat{\boldsymbol{\delta}} - \mathbf{l} = \widehat{\boldsymbol{\varepsilon}} = \mathbf{0}$ should be satisfied. If not, continue the solution until $\max(|\widehat{\boldsymbol{\delta}}|) \le \text{thres} = 5\mathrm{e}{-12}$ (or $\max(|\widehat{\boldsymbol{\varepsilon}}|) \le \text{thres} = 5\mathrm{e}{-12}$) by taking $\mathbf{x} = \widehat{\mathbf{x}}$ in every iteration step. $\widehat{\mathbf{x}}^T\widehat{\mathbf{x}}$ will be the smallest at the end of the solution.
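For the $n \le u$ state, Eqs. (9)–(10) amount to the minimum-norm solution of an underdetermined system; a small check with illustrative numbers:

```python
import numpy as np

# n = 2 equations, u = 3 unknowns (illustrative)
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0]])
l = np.array([3.0, 2.0])

Q = np.linalg.pinv(A @ A.T)      # Eq. (9): Q = (A A^T)^+
lam = Q @ l                      # lambda_hat = Q l
delta = A.T @ lam                # Eq. (10a): delta_hat = A^T lambda_hat
```

Here $\mathbf{A}\,\widehat{\boldsymbol{\delta}} - \mathbf{l} = \widehat{\boldsymbol{\varepsilon}} = \mathbf{0}$ holds exactly, and $\widehat{\boldsymbol{\delta}}$ has the smallest norm among all solutions, matching the minimization objective of Eq. (10c).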

The solution for the second state, $n > u$, is a situation encountered in many scientific and engineering problems. Multiplying both sides of Eq. (8) by $\mathbf{A}^T_{u,n}$, the normal equation system is established and solved with Eq. (11):

$$
\widehat{\boldsymbol{\delta}}_u = \mathbf{Q}\,\mathbf{A}^T\,\mathbf{l}, \qquad \mathbf{Q}_{u,u} = \left(\mathbf{A}^T\mathbf{A}\right)^{+}, \tag{11a}
$$

$$
\hat{\mathbf{x}} = \mathbf{x} + \hat{\mathbf{\delta}}, \qquad \qquad \qquad \Sigma\_{\hat{\mathbf{x}}} = \hat{\sigma}\_0^2 \mathbf{Q}. \tag{11b}
$$

$$
\widehat{\sigma}_0^2 = \frac{\widehat{\boldsymbol{\varepsilon}}^T\,\widehat{\boldsymbol{\varepsilon}}}{n - r}, \qquad \text{A posteriori variance } (r = \mathrm{rank}(\mathbf{A})) \tag{11c}
$$

$$
\widehat{\boldsymbol{\varepsilon}} = \mathbf{A}\,\widehat{\boldsymbol{\delta}} - \mathbf{l}, \tag{11d}
$$

$$
\hat{\mathfrak{E}}^{\top}\hat{\mathfrak{E}} \quad \mapsto \quad \min. \qquad L\_2-norm \,\,\text{estimation (Least Square)}\tag{11e}
$$

End the solution if the condition $\max(|\widehat{\boldsymbol{\delta}}|) \le \text{thres} = 5\mathrm{e}{-12}$ is ensured; otherwise, continue the iteration with $\mathbf{x} = \widehat{\mathbf{x}}$.
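The full $n > u$ iteration of Eqs. (8)–(11) can be sketched end to end. The exponential model, the starting values, and the $\mathbf{l} = \mathbf{y} - \mathbf{f}(\mathbf{x})$ sign convention below are assumptions chosen for illustration:

```python
import numpy as np

# Hypothetical model y_i = exp(a t_i) + b with unknowns x = (a, b); noise-free data
t = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
y = np.exp(0.7 * t) + 1.2

x = np.array([0.5, 1.0])                            # approximate values
for _ in range(100):
    f = np.exp(x[0] * t) + x[1]
    lvec = y - f                                    # reduced observations l
    A = np.column_stack([t * np.exp(x[0] * t),      # df/da
                         np.ones_like(t)])          # df/db
    delta = np.linalg.pinv(A.T @ A) @ (A.T @ lvec)  # Eq. (11a)
    x = x + delta                                   # Eq. (11b); iterate with x = x_hat
    if np.max(np.abs(delta)) <= 5e-12:              # stopping threshold from the text
        break

eps = (np.exp(x[0] * t) + x[1]) - y                 # residuals, cf. Eq. (11d)
r = np.linalg.matrix_rank(A)
sigma2 = (eps @ eps) / (len(t) - r)                 # a posteriori variance, Eq. (11c)
```

With noise-free data the loop converges to the true parameters and the a posteriori variance is numerically zero; with noisy data, `sigma2` would estimate the observation variance.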

Relationships between nonlinearity and LS on a multidimensional surface have been shown by Teunissen et al. [1, 2]. The authors discussed the relation on some simple examples and gave analytical solutions for them. However, they highlighted that such analytical solutions are not available for every problem, and emphasized that suitable Taylor expansions are useful when the solution cannot be transformed into an analytical one.
