**2. The integral equation context**

In an integral equation context, the standard EM scattering problem may be formulated in variational form as follows:

Find the surface current $\vec{j}$ such that, for all tangential test functions $\vec{j}^{\,t}$, we have

$$\int_{\Gamma}\int_{\Gamma} G(|y-x|) \left( \vec{j}(x)\cdot\vec{j}^{\,t}(y) - \frac{1}{k^2}\,\mathrm{div}_{\Gamma}\vec{j}(x)\,\mathrm{div}_{\Gamma}\vec{j}^{\,t}(y) \right) dx\,dy = \frac{i}{kZ_0}\int_{\Gamma} \vec{E}_{\mathrm{inc}}(x)\cdot\vec{j}^{\,t}(x)\, dx. \tag{1}$$

Eqn. (1) is called the Electric Field Integral Equation (EFIE) (see Bilotti & Vegni (2003); Li et al. (2005)). We denote by $G(|y-x|) = \frac{e^{ik|y-x|}}{4\pi|y-x|}$ the Green's function of the Helmholtz equation; Γ is the boundary of the object, *k* the wavenumber, and $Z_0 = \sqrt{\mu_0/\varepsilon_0}$ the characteristic impedance of vacuum ($\varepsilon_0$ is the electric permittivity and $\mu_0$ the magnetic permeability of vacuum). Given a continuously differentiable vector field $\vec{j}(x)$ represented in Cartesian coordinates on a 3D Euclidean space as $\vec{j}(x_1, x_2, x_3) = j_{x_1}(x_1, x_2, x_3)\,\vec{e}_{x_1} + j_{x_2}(x_1, x_2, x_3)\,\vec{e}_{x_2} + j_{x_3}(x_1, x_2, x_3)\,\vec{e}_{x_3}$, where $\vec{e}_{x_1}, \vec{e}_{x_2}, \vec{e}_{x_3}$ are the unit basis vectors of the Euclidean space, we denote by $\mathrm{div}\,\vec{j}(x)$ the divergence operator defined as

$$\operatorname{div} \vec{j}(x) = \frac{\partial j_{x_1}}{\partial x_1} + \frac{\partial j_{x_2}}{\partial x_2} + \frac{\partial j_{x_3}}{\partial x_3}.$$
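As a quick numerical illustration (not part of the original formulation), the Green's function defined above can be evaluated directly; the helper and parameter values below are only a sketch:

```python
import numpy as np

def green_helmholtz(y, x, k):
    """Free-space Green's function of the 3D Helmholtz equation,
    G(|y - x|) = exp(ik|y - x|) / (4*pi*|y - x|)."""
    r = np.linalg.norm(np.asarray(y, dtype=float) - np.asarray(x, dtype=float))
    return np.exp(1j * k * r) / (4.0 * np.pi * r)

# Example: two points one wavelength apart (k = 2*pi / wavelength)
k = 2.0 * np.pi                      # wavelength = 1 (assumed)
g = green_helmholtz([1.0, 0.0, 0.0], [0.0, 0.0, 0.0], k)
# |G| decays as 1/(4*pi*r) and the phase advances by k*r
assert abs(abs(g) - 1.0 / (4.0 * np.pi)) < 1e-12
```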

The EFIE formulation can be applied to arbitrary geometries, such as those with cavities, disconnected parts, or breaks on the surface; hence it is very popular in industry.

For closed targets, the Magnetic Field Integral Equation (MFIE) can be used, which reads

$$\int_{\Gamma} (\vec{R}_{\mathrm{ext}}\,\vec{j} \wedge \vec{\nu})\cdot\vec{j}^{\,t} + \frac{1}{2}\int_{\Gamma} \vec{j}\cdot\vec{j}^{\,t} = -\int_{\Gamma} (\vec{H}_{\mathrm{inc}} \wedge \vec{\nu})\cdot\vec{j}^{\,t}.$$

The operator $\vec{R}_{\mathrm{ext}}\,\vec{j}$ is defined as

$$\vec{R}_{\mathrm{ext}}\,\vec{j}(y) = \int_{\Gamma} \overrightarrow{\mathrm{grad}}_y\, G(|y-x|) \wedge \vec{j}(x)\, dx$$

and is evaluated in the domain exterior to the object.

Fig. 1. Example of surface discretization in an integral equation context. Each unknown of the problem is associated to an edge in the mesh. Courtesy of the EMC-CERFACS Group in Toulouse.

Errors in the wave velocity on the grid compared to the exact solution tend to accumulate in space and may introduce spurious solutions over large 3D simulation regions (Bayliss et al. (1985); Jr. (1994); Lee & Cangellaris (1992)). For second-order accurate differential schemes, to alleviate this problem the grid density may have to grow up to $O((kd)^3)$ unknowns in 2D and $O((kd)^{4.5})$ in 3D, where *k* is the wavenumber and *d* is the approximate diameter of the simulation region. Therefore, the overall solution cost may increase considerably even for practical (*i.e.* finite) values of the wavenumber (Chew et al. (1997)).

Boundary element discretizations are applied in many scientific and engineering areas besides electromagnetics and acoustics, e.g. in biomagnetic and bioelectric inverse modeling, magnetostatic and biomolecular problems, and many other applications (Forsman, Gropp, Kettunen & Levine (1995); Yokota, Bardhan, Knepley, Barba & Hamada (2011)). The potential drawback is that, upon discretization, they lead to large and dense linear systems to solve. Hence, fast numerical linear algebra methods and efficient parallelization techniques are needed for solving large-scale boundary element equations efficiently on modern computers. In this chapter we overview some relevant techniques. In Section 2 we introduce the boundary integral formulation for EM scattering from perfectly conducting objects. In Section 3 we discuss fast iterative solution strategies based on preconditioned Krylov methods for solving the dense linear systems arising from the discretization. In Section 4 we focus our attention on the design of the preconditioner, which is a critical component of Krylov methods in this context. We conclude our study with some final remarks.

Both formulations suffer from interior resonances, which make the numerical solution more problematic at some frequencies, known as resonant frequencies, especially for large objects. The problem can be solved by linearly combining the EFIE and the MFIE. The resulting integral equation, known as the Combined Field Integral Equation (CFIE), is the formulation of choice for closed targets. We point the reader to Gibson (2008) for a thorough presentation of integral equations in electromagnetism.

On discretizing Eqn. (1) in space by the Method of Moments (MoM) over a mesh containing *n* edges, the surface current $\vec{j}$ is expanded into a set of basis functions $\{\vec{\varphi}_i\}_{1\le i\le n}$ with compact support (the Rao-Wilton-Glisson basis, Rao et al. (1982), is a popular choice), and the integral equation is applied to a set of tangential test functions $\vec{j}^{\,t}$. Selecting $\vec{j}^{\,t} = \vec{\varphi}_j$, we are led to compute the set of coefficients $\{\lambda_i\}_{1\le i\le n}$ such that

$$\sum_{1\le i\le n} \lambda_i \left[ \int_{\Gamma}\int_{\Gamma} G(|y-x|) \left( \vec{\varphi}_i(x)\cdot\vec{\varphi}_j(y) - \frac{1}{k^2}\,\mathrm{div}_{\Gamma}\vec{\varphi}_i(x)\,\mathrm{div}_{\Gamma}\vec{\varphi}_j(y) \right) dx\,dy \right] = \frac{i}{kZ_0}\int_{\Gamma} \vec{E}_{\mathrm{inc}}(x)\cdot\vec{\varphi}_j(x)\,dx, \tag{2}$$

for each $1 \le j \le n$. The set of equations (2) can be recast in matrix form as

$$A\lambda = b,\tag{3}$$

| Solver | Iterations | CPU time (s) |
|---|---|---|
| CORS | 601 | 253∗ |
| BiCOR | 785 | 334 |
| GMRES(50) | 2191 | 469 |
| QMR | 878 | 548 |
| BiCGSTAB | 1065 | 444 |

Table 1. Number of iterations and CPU time (in seconds) required by Krylov methods to reduce the initial residual to $O(10^{-8})$. An asterisk "∗" indicates the fastest run. The problem is shown in Figure 2.


where $A = [A_{ij}]$ and $b = [b_j]$ have elements, respectively,

$$\begin{split} A_{ij} &= \int_{\Gamma}\int_{\Gamma} G(|y-x|) \left( \vec{\varphi}_i(x)\cdot\vec{\varphi}_j(y) - \frac{1}{k^2}\,\mathrm{div}_{\Gamma}\vec{\varphi}_i(x)\,\mathrm{div}_{\Gamma}\vec{\varphi}_j(y) \right) dx\,dy, \\ b_j &= \frac{i}{kZ_0}\int_{\Gamma} \vec{E}_{\mathrm{inc}}(x)\cdot\vec{\varphi}_j(x)\,dx. \end{split}$$

The unknowns are associated with the vectorial flux across the edges of the mesh. The right-hand side varies with the frequency and the direction of the illuminating wave.
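The assembly pattern behind Eqns. (2)-(3) — a dense complex system built by sampling a Green's-function kernel over all pairs of unknowns — can be sketched on a toy scalar analogue. The kernel, the collocation rule, and the regularized self-term below are illustrative assumptions, not the vector RWG/EFIE discretization:

```python
import numpy as np

# Toy scalar analogue of MoM assembly (NOT the vector RWG/EFIE kernel):
# point collocation for a single-layer-type operator on a circle, using
# the kernel exp(ik r)/(4 pi r). All numerical choices are illustrative.
k = 2 * np.pi                 # assumed wavenumber (wavelength = 1)
n = 200                       # number of unknowns
theta = 2 * np.pi * (np.arange(n) + 0.5) / n
pts = np.column_stack([np.cos(theta), np.sin(theta)])  # panel midpoints
h = 2 * np.pi / n                                      # panel length

# Dense complex "impedance" matrix: kernel sampled over all pairs,
# with a crudely regularized self-interaction on the diagonal.
R = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
np.fill_diagonal(R, h / 4)
A = h * np.exp(1j * k * R) / (4 * np.pi * R)

b = np.exp(1j * k * pts[:, 0])       # incident plane wave samples
lam = np.linalg.solve(A, b)          # direct dense solve, O(n^3) cost

assert np.linalg.norm(A @ lam - b) < 1e-6 * np.linalg.norm(b)
```

Even at this toy scale the matrix is fully dense, which is what motivates the fast solvers discussed next.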

#### **3. Fast matrix solvers for boundary element equations**

Linear systems issued from boundary element discretizations may be very large in applications, although their size is typically much smaller than for FE or FV formulations of the same problem. The number of unknowns grows linearly with the size of the scatterer and quadratically with the frequency of the incoming radiation (Bendali (1984)). A target a few tens of wavelengths in size, illuminated at O(1) GHz frequency, may lead to meshes with several million points (Sylvand (2002)). Some efficient out-of-core dense direct solvers based on variants of Gaussian elimination have been proposed for solving blocks of right-hand sides, see e.g. Alléon, Amram, Durante, Homsi, Pogarieloff & Farhat (1997); Chew & Wang (1993). However, the memory requirements of direct methods are not affordable for such systems in realistic applications, even on modern parallel computers. Iterative methods can overcome the storage limitations of direct methods because they only require matrix-vector (M-V) multiplications. In general terms, a modern integral equation solver is the combination of a robust iterative method, a fast algorithm for computing cheap approximate M-V products, and an efficient preconditioner to speed up the convergence.

#### **3.1 The choice of the iterative method**

Krylov methods are among the most popular accelerators because of their ability to deliver good rates of convergence and to handle very large problems efficiently. They look for the solution of the system $Ax = b$ in the Krylov space $\mathcal{K}_k(A, b) = \mathrm{span}\{b, Ab, A^2b, \ldots, A^{k-1}b\}$. This is a good space from which to construct approximate solutions for a nonsingular linear system because it is intimately related to $A^{-1}$. In fact, the inverse of any nonsingular matrix *A* can be written in terms of powers of *A* with the help of the minimal polynomial $q(t)$ of *A*, which is the unique monic polynomial of minimal degree such that $q(A) = 0$. If $\lambda_1, \ldots, \lambda_d$ are the distinct eigenvalues of *A*, $m_j$ is the index of $\lambda_j$, and we define *m* as

$$m = \sum_{j=1}^{d} m_j,$$

then

$$q(t) = \prod\_{j=1}^{d} (t - \lambda\_j)^{m\_j}.\tag{4}$$

Writing *q*(*t*) in the form

$$q(t) = \sum_{j=0}^{m} \alpha_j t^j,$$



we have


$$A^{-1} = -\frac{1}{\alpha_0} \sum_{j=0}^{m-1} \alpha_{j+1} A^{j}, \qquad \alpha_0 = \prod_{j=1}^{d} (-\lambda_j)^{m_j} \neq 0.$$

This shows that, if the minimal polynomial of *A* has degree *m*, then the solution of *Ax* = *b* lies in the space *Km*(*A*, *b*). The smaller the degree of the minimal polynomial, the faster the expected rate of convergence of a Krylov method (see Ipsen & Meyer (1998)).
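This construction can be checked numerically: for a matrix with a small number of distinct eigenvalues, a low-degree polynomial in $A$ reproduces $A^{-1}$. The matrix below is synthetic and purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# A diagonalizable matrix with only 3 distinct eigenvalues: the degree of
# its minimal polynomial is at most 3, however large the matrix is.
n, eigs = 30, np.array([1.0, 2.0, 4.0])
D = np.diag(rng.choice(eigs, size=n))
Q = rng.standard_normal((n, n))
A = Q @ D @ np.linalg.inv(Q)

# q(t) = (t-1)(t-2)(t-4); np.poly returns coefficients from highest
# to lowest degree, so reverse them: alpha[j] multiplies t^j.
alpha = np.poly(eigs)[::-1]

# A^{-1} = -(1/alpha_0) * sum_{j=0}^{m-1} alpha_{j+1} A^j
m = len(eigs)
Ainv = np.zeros_like(A)
Apow = np.eye(n)
for j in range(m):
    Ainv += alpha[j + 1] * Apow
    Apow = Apow @ A
Ainv *= -1.0 / alpha[0]

assert np.allclose(Ainv @ A, np.eye(n), atol=1e-6)  # polynomial in A inverts A
```

In particular, $A^{-1}b$ lies in $\mathrm{span}\{b, Ab, A^2b\}$ here, so a Krylov method needs a space of dimension at most three.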

One issue is the choice of a suitable Krylov algorithm. Most integral formulations for surface and hybrid surface/volume scattering give rise to indefinite linear systems that cannot be solved using the Conjugate Gradient method (see the discussion in Section 2). The GMRES method by Saad & Schultz (1986) is virtually always used for solving dense non-Hermitian linear systems, as it is an optimal iterative solver in the sense that it minimizes the 2-norm of the residual over the corresponding Krylov space; it generally requires the least number of iterations to converge. However, the optimality of GMRES comes at a price: the cost of applying the method increases with the iteration count, and it may become prohibitively expensive in practical applications. As an attempt to limit these costs, the algorithm is often restarted: after a given number of steps *k*, the approximate solution is computed from the generated Krylov subspace; then the Krylov subspace is destroyed, and a new space is reconstructed using the latest residual.

On the other hand, non-optimal methods attempt to limit the costs of GMRES while preserving its favourable convergence properties. In Table 1, we show the number of iterations required by Krylov methods to reduce the initial residual to $O(10^{-8})$, starting from the zero vector, on the problem shown in Figure 2. For simplicity, the right-hand side of the linear system is set up so that the exact solution is the vector of all ones. We do not use preconditioning. In addition to restarted GMRES, we consider complex versions of iterative algorithms based on Lanczos biorthogonalization, such as BiCGSTAB (van der Vorst (1992)) and QMR (Freund & Nachtigal (1994)), and on the recently developed Lanczos biconjugate *A*-orthonormalization, such as BiCOR and CORS (Carpentieri et al. (2011); Jing, Huang, Zhang, Li, Cheng, Ren, Duan, Sogabe & Carpentieri (2009)). We clearly observe the importance of the choice of the iterative method. In our experiments, the CORS method is the fastest non-Hermitian solver with respect to CPU time on most selected examples, except for GMRES with a large restart. Indeed, unrestarted GMRES may outperform all other Krylov methods and should be used when memory is not a concern. However, reorthogonalization costs may penalize GMRES convergence in large-scale applications, so using high values of the restart parameter may not be convenient (or even affordable in memory), as shown in Carpentieri et al. (2005). In Table 1 we select a value of 50 for the restart parameter.
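The effect of the restart parameter can be explored with an off-the-shelf restarted GMRES, e.g. SciPy's; the random complex test matrix below is a stand-in for a non-Hermitian system, not a MoM matrix:

```python
import numpy as np
from scipy.sparse.linalg import gmres

rng = np.random.default_rng(1)
n = 500
# Dense random complex matrix with eigenvalues clustered around one
# (illustrative only; well within GMRES's comfort zone).
A = np.eye(n) + 0.3 * (rng.standard_normal((n, n))
                       + 1j * rng.standard_normal((n, n))) / np.sqrt(n)
b = np.ones(n, dtype=complex)

def total_inner_iterations(restart):
    """Run restarted GMRES and count all inner iterations."""
    its = []
    _, info = gmres(A, b, restart=restart, maxiter=5000,
                    callback=lambda pr_norm: its.append(pr_norm),
                    callback_type='pr_norm')
    assert info == 0              # converged to the default tolerance
    return len(its)

# A larger restart typically needs no more total iterations, at the
# price of storing and orthogonalizing a longer Krylov basis.
counts = {r: total_inner_iterations(r) for r in (10, 50)}
print(counts)
```

On harder (e.g. indefinite, strongly nonnormal) systems the gap between small and large restart values becomes much more pronounced, which is what Table 1 illustrates on a real scattering problem.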

The BiCOR and CORS methods are introduced in Carpentieri et al. (2011); Jing, Huang, Zhang, Li, Cheng, Ren, Duan, Sogabe & Carpentieri (2009). They search for the approximate solution in the Krylov subspace $\mathcal{K}_m(A, r_0)$ by applying a Petrov-Galerkin approach, imposing that the residual be orthogonal to the constraints subspace $A^H\mathcal{K}_m(A^H, r_0^*)$; the shadow residual $r_0^*$ is chosen to be equal to $r_0^* = Ar_0$. The basis vector representations for the subspaces $\mathcal{K}_m(A, r_0)$ and $A^H\mathcal{K}_m(A^H, r_0^*)$ are computed by means of the biconjugate *A*-orthonormalization procedure. Starting from two vectors $v_1$ and $w_1$ chosen to satisfy the condition $w_1^H A v_1 = 1$, the method ideally builds up a pair of biconjugate *A*-orthonormal bases $v_j$, $j = 1, 2, \ldots, m$ and $w_i$, $i = 1, 2, \ldots, m$, respectively for the dual Krylov subspaces $\mathcal{K}_m(A; v_1)$ and $\mathcal{K}_m(A^H; w_1)$, satisfying the condition $w_i^H A v_j = \delta_{i,j}$, $1 \le i, j \le m$. We point the reader to Jing, Carpentieri & Huang (2009) for further experiments with iterative Krylov methods for surface integral equations.

Fig. 2. Test problem: an open cylindric surface. Characteristics of the associated linear system: size = 6268, frequency = 362 MHz, $\kappa_1(A) = O(10^5)$. Courtesy of the EMC-CERFACS Group in Toulouse.

A significant amount of work has been devoted in recent years to designing fast algorithms that can reduce the $O(n^2)$ computational complexity of the M-V product operation required at each step of a Krylov method, such as the Fast Multipole Method (FMM) (Greengard & Rokhlin (1987); Rokhlin (1990)), the panel clustering method (Hackbush & Nowak (1989)), the H-matrix approach (Hackbush (1999)), wavelet techniques (Alpert et al. (1993); Bond & Vavasis (1994)), the adaptive cross approximation method (Bebendorf (2000)), the impedance matrix localization method (Canning (1990)), the multilevel matrix decomposition algorithm (Michielssen & Boag (1996)), and others. In particular, the combination of iterative Krylov subspace solvers and the FMM is a popular approach for solving integral equations. For Helmholtz and Maxwell problems, FMM algorithms can speed up M-V multiplications with boundary element matrices down to $O(n \log n)$ algorithmic and memory complexity, depending on the problem and on the specific implementation, see e.g. Cheng et al. (2006); Darrigrand (2002); Darve & Havé (2004); Dembart & Epton (1994); Engheta et al. (1992); Song & Chew (1995); Tausch (2004). Two-level implementations of the FMM reduce the cost of the M-V product operation from $O(n^2)$ to $O(n^{3/2})$, a three-level algorithm brings it down to $O(n^{4/3})$, and the Multilevel Fast Multipole Algorithm (MLFMA) to $O(n \log n)$.

Multipole techniques exploit the rapid decay of the Green's function and compute interactions among degrees of freedom in the mesh at different levels of accuracy, depending on their physical distance. The 3D mesh of the object is partitioned recursively into boxes of roughly equal size until the box size becomes small compared with the wavelength. The hierarchical partitioning of the object is typically represented using a tree-structured data format called an *oct-tree* (see Figure 3).
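The recursive box partitioning can be sketched as follows; the `leaf_size` stopping rule below is a simplification of the size-versus-wavelength criterion used in actual FMM codes:

```python
import numpy as np

def build_octree(pts, idx, lo, size, leaf_size=16):
    """Recursively split a cubic box into up to eight children until each
    box holds at most `leaf_size` points. Real FMM codes stop instead when
    the box size becomes small compared with the wavelength."""
    node = {"lo": lo, "size": size, "idx": idx, "children": []}
    if len(idx) <= leaf_size:
        return node                                   # a leaf box
    ctr = lo + size / 2
    for child in range(8):                            # one bit per axis
        bits = np.array([(child >> d) & 1 for d in range(3)])
        mask = np.ones(len(idx), dtype=bool)
        for d in range(3):
            upper = pts[idx, d] >= ctr[d]
            mask &= upper if bits[d] else ~upper
        if mask.any():
            node["children"].append(
                build_octree(pts, idx[mask], lo + bits * size / 2,
                             size / 2, leaf_size))
    return node

def leaves(node):
    if not node["children"]:
        return [node]
    return [leaf for c in node["children"] for leaf in leaves(c)]

rng = np.random.default_rng(2)
pts = rng.random((2000, 3))                 # points inside the unit cube
tree = build_octree(pts, np.arange(len(pts)), np.zeros(3), 1.0)

# The leaf boxes partition the point set: each point lies in exactly one.
assert sum(len(leaf["idx"]) for leaf in leaves(tree)) == len(pts)
```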

Fig. 3. The *oct-tree* data structure representation in the FMM algorithm. Each cube has up to eight children and one parent box, except for the largest cube, which encloses the whole domain.

Multipole coefficients are computed for all boxes starting from the smallest ones, that is, the leaves, and recursively for each parent cube in the tree by summing together the multipole coefficients of its children. Interactions of degrees of freedom within one observation box and its close neighboring boxes are computed exactly using the MoM; depending on the frequency, they generate between 1% and 2% of the entries of *A*. Interactions with boxes that are not neighbors of the observation box, but whose parent in the oct-tree is a neighbor of the box's parent, are computed using the FMM (see Figure 4).

Fig. 4. Interactions in the multilevel FMM algorithm. Interactions for the gray boxes are computed directly. We denote by dashed lines cubes that are not neighbors of the cube itself but whose parent is a neighbor of the cube's parent. These interactions are computed using the FMM. All other interactions are computed hierarchically on a coarser level.

Fast Preconditioned Krylov Methods for Boundary Integral Equations

may be *S* = *Adiag* + *Anear*.

(a) Topological neighbours of a DOF in the discretization mesh.

equation context.

We may expect that the preconditioned system

for Boundary Integral Equations in Electromagnetic Scattering

in Electromagnetic Scattering 9

<sup>163</sup> Fast Preconditioned Krylov Methods

The simplest approach to compute the local matrix *S* is to drop the small entries of *A* below a threshold (Cosnau (1996); Kolotilina (1988); Vavasis (1992)). When all the entries of *A* are not explicitly available, it may be necessary to use information extracted from the physical mesh of the problem. In an integral equation context, the surface of the object is discretized using a triangular mesh; each degree of freedom (DOF), or equivalently each unknown of the linear system, is associated to an edge of the mesh. Therefore, the sparsity pattern of *S* can be defined according to the concept of level *k* neighbours (see Figure 5(a)). Level 1 neighbours of a DOF are the DOF plus the four DOFs belonging to the two triangles that share the edge corresponding to the DOF itself. Level 2 neighbours are all the level 1 neighbours plus the DOFs in the triangles that are neighbours of the two triangles considered at level 1, and so forth. Due to the very localized nature of the Green's function, by retaining a few (two or three) levels of neighbours for each DOF an effective approximation may be constructed.

Comparative experiments show that there is little to choose. Both matrix- and mesh-based approaches can provide very good approximations *S* to the dense coefficient matrix for low sparsity ratio between 1% and 2% (Carpentieri et al. (2000)). The mesh-based approach is straightforward to implement in FMM codes as the object is typically partitioned using geometric information (see Figure 5(b)). Multipole algorithms yield a matrix decomposition

where *Adiag* is the block diagonal part of *A* associated with interactions of basis functions belonging to the same box, *Anear* is the block near-diagonal part of *A* associated with interactions within one level of neighboring boxes (they are 8 in 2D and 26 in 3D), and *Af ar* is the far-field part of *A*. Therefore, in a multipole setting a suitable choice for the local matrix

Fig. 5. Mesh-based pattern selection strategies to compute local interactions in an integral

of eigenvalues close to one, see e.g. Chen (1994) and (Chen, 2005, pp. 182-185).

*I* + *S*−1*B*

*A* = *Adiag* + *Anear* + *Af ar*, (6)

Toulouse.

*x* = *S*−1*b* has a good clusterization

(b) Box-wise partitioning in the FMM context. Courtesy of EADS-CCR

hierarchically on a coarser level by traversing the oct-tree. Multipole techniques have been efficiently implemented on distributed-memory parallel computers, proving to be scalable to several million discretization points; see for instance the FISC code developed at the University of Illinois by Song & Chew (1998); Song et al. (1997; 1998), the INRIA/EADS integral equation code AS\_ELFIP by Sylvand (2002; 2003), the Bilkent University code by Ergül & Gürel (2007; 2008), and others.

### **4. Algebraic preconditioning for boundary integral equations**

Krylov methods may converge very slowly in practice, mainly due to the bad spectral conditioning of the linear system. Relation (4) implies that the dimension of the solution space, and therefore the convergence properties, are mostly dictated by the eigenvalue distribution of *A*. The spectral properties may vary noticeably depending on the integral operator as well as on the object's shape and material. Problems with cavities or open surfaces typically require more iterations to converge than closed objects of the same physical size, and nonuniform meshes often produce ill-conditioned MoM matrices. On EFIE, the iteration count of Krylov solvers may increase as $O(n^{0.5})$ when the number of unknowns *n* is related to the wavenumber, see for instance the experiments reported in Song & Chew (1998), whereas on CFIE the number of iterations typically increases as $O(n^{0.25})$.

On the other hand, if, after preconditioning *A* by a nonsingular matrix *M*, the eigenvalues of $M^{-1}A$ fall into a few clusters, say *t* of them, whose diameters are small enough, then $M^{-1}A$ behaves numerically like a matrix with *t* distinct eigenvalues. As a result, we may expect a Krylov method to produce reasonably accurate approximations within about *t* iterations. The matrix *M* is called the *preconditioner*; preconditioning can be applied from the left, as $M^{-1}Ax = M^{-1}b$, as well as from the right, as $AM^{-1}y = b$ with $x = M^{-1}y$.
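Left and right preconditioning lead to similar operators and hence to the same spectrum; only the residual that the solver monitors differs. A small numerical check of this fact (a generic dense test matrix and a diagonal preconditioner, purely illustrative; not an EFIE matrix):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))  # generic well-conditioned test matrix
M = np.diag(np.diag(A))                            # simple nonsingular preconditioner

left = np.linalg.solve(M, A)        # M^{-1} A   (left preconditioning)
right = A @ np.linalg.inv(M)        # A M^{-1}   (right preconditioning)

# M^{-1}A = M^{-1} (A M^{-1}) M, so the two operators are similar and
# share the same eigenvalues up to roundoff.
ev_left = np.linalg.eigvals(left)
ev_right = np.linalg.eigvals(right)
```

The choice between left and right variants therefore affects the stopping criterion (true vs. preconditioned residual) rather than the attainable convergence rate.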

Optimal analytic preconditioners have been proposed for surface integral equations, see e.g. Antoine et al. (2004); Christiansen & Nédélec (2002); Steinbach & Wendland (1998), but they are problem-dependent. In this study, we consider purely algebraic techniques, which compute the preconditioner using only information contained in the coefficient matrix of the linear system. Although far from optimal for any specific problem, algebraic methods can be applied to different operators and to changes in the geometry by tuning only a few parameters, and may often be developed from existing public-domain software implementations.

We are interested in developing techniques that, like FMM, have $O(n \log n)$ algorithmic and memory complexity in both the construction and the application phase, and that may be implemented efficiently within multipole codes. To limit memory requirements, we compute the preconditioner by initially decomposing the linear system in the form

$$(S + B)\,x = b \tag{5}$$

where *S* is a sparse matrix retaining the most relevant contributions to the singular integrals, and easy to invert, while *B* can be dense. If the continuous operator $\mathcal{S}$ underlying *S* is bounded and the operator $\mathcal{B}$ underlying *B* is compact, then $\mathcal{S}^{-1}\mathcal{B}$ is compact and

$$\mathcal{S}^{-1}\left(\mathcal{S} + \mathcal{B}\right) = I + \mathcal{S}^{-1}\mathcal{B}.$$


162 Trends in Electromagnetism – From Fundamentals to Applications


We may expect that the preconditioned system $(I + S^{-1}B)\,x = S^{-1}b$ has a good clustering of eigenvalues close to one, see e.g. Chen (1994) and (Chen, 2005, pp. 182-185).
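This clustering can be illustrated with a toy computation in which the compact term is mimicked by a low-rank matrix (an assumption made purely for illustration; *B* here is not an actual EFIE block). Then $S^{-1}(S + B) = I + S^{-1}B$ is a rank-*r* perturbation of the identity, so all but *r* eigenvalues equal one:

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 120, 5
S = np.diag(1.0 + rng.random(n))    # stand-in for the sparse, easy-to-invert local part
B = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))  # low-rank "compact" part

# S^{-1}(S + B) = I + S^{-1}B is a rank-r perturbation of the identity:
# at most r eigenvalues can move away from 1.
ev = np.linalg.eigvals(np.linalg.solve(S, S + B))
n_outliers = int(np.sum(np.abs(ev - 1.0) > 1e-6))
```

For a genuinely compact $\mathcal{S}^{-1}\mathcal{B}$ the eigenvalues are not exactly one but accumulate at one, which is what drives the fast Krylov convergence.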

The simplest approach to computing the local matrix *S* is to drop the entries of *A* that fall below a threshold (Cosnau (1996); Kolotilina (1988); Vavasis (1992)). When the entries of *A* are not all explicitly available, it may be necessary to use information extracted from the physical mesh of the problem. In an integral equation context, the surface of the object is discretized using a triangular mesh; each degree of freedom (DOF), or equivalently each unknown of the linear system, is associated with an edge of the mesh. The sparsity pattern of *S* can therefore be defined according to the concept of level-*k* neighbours (see Figure 5(a)). Level-1 neighbours of a DOF are the DOF itself plus the four DOFs belonging to the two triangles that share the corresponding edge. Level-2 neighbours are all the level-1 neighbours plus the DOFs in the triangles that are neighbours of the two triangles considered at level 1, and so forth. Due to the rapid decay of the Green's function, retaining a few (two or three) levels of neighbours for each DOF may yield an effective approximation.
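The level-*k* construction can be sketched as follows (a simplified model, not production code: triangles are given as hypothetical vertex triples, two triangles are taken as neighbours when they share an edge, and boundary edges simply get smaller stencils):

```python
from collections import defaultdict

def neighbour_patterns(triangles, levels=2):
    """Sparsity pattern from level-k topological neighbours on a triangular mesh.
    Each DOF is an edge; level-1 neighbours are the DOFs of the triangles sharing
    that edge, and every further level grows the triangle set by edge adjacency."""
    edge_id, edge_tris = {}, defaultdict(set)
    for t, (a, b, c) in enumerate(triangles):
        for u, v in ((a, b), (b, c), (a, c)):
            e = (u, v) if u < v else (v, u)       # undirected edge -> DOF index
            edge_id.setdefault(e, len(edge_id))
            edge_tris[e].add(t)
    tri_edges = defaultdict(set)                  # triangle -> DOFs of its three edges
    for e, ts in edge_tris.items():
        for t in ts:
            tri_edges[t].add(edge_id[e])
    tri_nbrs = defaultdict(set)                   # triangle adjacency via shared edges
    for ts in edge_tris.values():
        for t in ts:
            tri_nbrs[t] |= ts - {t}
    pattern = {}
    for e, dof in edge_id.items():
        tris = set(edge_tris[e])                  # triangles sharing the DOF's edge
        for _ in range(levels - 1):               # grow to the requested level
            tris |= {nb for t in tris for nb in tri_nbrs[t]}
        pattern[dof] = {d for t in tris for d in tri_edges[t]}
    return edge_id, pattern

# two triangles glued along edge (1, 2)
edge_id, pat = neighbour_patterns([(0, 1, 2), (1, 2, 3)], levels=1)
```

For the shared edge, the level-1 pattern contains the DOF plus the four DOFs of the two adjacent triangles, matching the description above.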

Comparative experiments show that there is little to choose between the two approaches: both matrix-based and mesh-based strategies can provide very good approximations *S* to the dense coefficient matrix at low sparsity ratios, between 1% and 2% (Carpentieri et al. (2000)). The mesh-based approach is straightforward to implement in FMM codes, as the object is typically partitioned using geometric information (see Figure 5(b)). Multipole algorithms yield a matrix decomposition

$$A = A\_{diag} + A\_{near} + A\_{far} \tag{6}$$

where $A_{diag}$ is the block diagonal part of *A* associated with interactions of basis functions belonging to the same box, $A_{near}$ is the block near-diagonal part of *A* associated with interactions within one level of neighbouring boxes (there are 8 of them in 2D and 26 in 3D), and $A_{far}$ is the far-field part of *A*. Therefore, in a multipole setting a suitable choice for the local matrix may be $S = A_{diag} + A_{near}$.
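In code, the pattern of $S = A_{diag} + A_{near}$ reduces to a test on box indices (a sketch under the assumption that each DOF has already been assigned an integer 3D box index by the oct-tree):

```python
import numpy as np

def near_field_mask(boxes):
    """Boolean pattern of A_diag + A_near: two DOFs interact locally when their
    FMM boxes coincide or are adjacent, i.e. the integer box indices differ by
    at most 1 in every coordinate (the box itself plus its 26 neighbours in 3D)."""
    b = np.asarray(boxes)                        # shape (n, 3), integer box indices
    d = np.abs(b[:, None, :] - b[None, :, :])    # pairwise per-coordinate distances
    return (d <= 1).all(axis=2)

mask = near_field_mask([[0, 0, 0], [0, 0, 1], [1, 1, 1], [0, 0, 3]])
```

The $O(n^2)$ broadcasting above is for clarity only; an FMM code would enumerate neighbouring boxes directly and touch only the $O(n)$ near-field pairs.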

(a) Topological neighbours of a DOF in the discretization mesh.

(b) Box-wise partitioning in the FMM context. Courtesy of EADS-CCR Toulouse.

Fig. 5. Mesh-based pattern selection strategies to compute local interactions in an integral equation context.

Fast Preconditioned Krylov Methods for Boundary Integral Equations in Electromagnetic Scattering


#### **4.1 Comparison of standard preconditioners**

To illustrate the difficulty of finding a good preconditioner for this problem class, in Table 2 we report experiments with the GMRES solver and various algebraic preconditioners applied to a scattering problem from an open cylindrical surface illuminated at a frequency of 200 MHz and modeled using EFIE. The system has *n* = 1299 unknowns and is a lower-resolution test case than the problem in Figure 2. In connection with GMRES, we consider preconditioners *M* of either implicit type (which approximately factorize *S*) or explicit type (which approximately invert *S*), at roughly the same number of nonzero entries in *M*. We adopt the following acronyms:



• *None*, means that no preconditioner is used;

• *Diag*, a simple diagonal scaling, *i.e.* *M* is the diagonal of *S*;

• *SSOR*, the symmetric successive overrelaxation method $M = \frac{1}{\omega(2-\omega)}(D+\omega E)D^{-1}(D+\omega E^{T})$, where we denote by *D* the diagonal of *S* and by *E* the strict lower triangular part of *S*;

• *ILU*(0) by Saad (1996), the lower/upper incomplete LU factorization $M = \bar{L}\bar{U}$, $\bar{L} \approx L$, $\bar{U} \approx U$, $S = LU$, where the sparsity pattern of $\bar{L}$ (resp. $\bar{U}$) is equal to that of the lower (resp. upper) triangular part of *S*;

• *SPAI* by Grote & Huckle (1997), an approximate inverse preconditioner $M \approx S^{-1}$ computed by minimizing $\|I - SM\|_F$; the same pattern of *S* is imposed on *M*;

• *AINV* by Benzi et al. (1996), a sparse approximate inverse computed in factorized form by applying an incomplete biconjugation process to *S*, and dropping small entries below a threshold in the inverse factors.

| Precond. | GMRES(30) | GMRES(80) | GMRES(∞) |
|----------|-----------|-----------|----------|
| *None*   | -         | -         | 302      |
| *Diag*   | -         | -         | 272      |
| *SSOR*   | -         | 717       | 184      |
| *ILU*(0) | -         | 454       | 135      |
| *SPAI*   | 308       | 70        | 70       |
| *AINV*   | -         | -         | -        |

Density of *S* = 3.18% - Density of *M* = 1.99%.

Table 2. Number of iterations using GMRES and various preconditioners on a test problem, a cylinder with an open surface, discretized with *n* = 1299 edges. The tolerance is set to $10^{-8}$. The symbol '-' means that no convergence was achieved after 1000 iterations. The results are for right preconditioning.

We see that many standard methods fail. Simple preconditioners, like the diagonal of *A*, diagonal blocks, or a band, may be effective when the coefficient matrix has some degree of diagonal dominance (Song et al. (1997)). For ill-conditioned and indefinite matrices, more robust methods are needed. Techniques that are successful for solving partial differential equations may be adopted for integral equations; in the next sections, we analyse some of these methods.

#### **4.2 Sparse approximate inverses preconditioner**

Approximate inverse methods are very attractive for parallelism. They explicitly compute and store an approximation $M \approx S^{-1}$ of the inverse of the coefficient matrix, which may be used as a preconditioner by performing one or more sparse matrix-vector products at each step of an iterative solver. As shown in Figure 6, due to the rapid decay of the Green's function the entries of $A^{-1}$ may have a structure very similar to those of *A*, so that a very sparse preconditioner *M* may effectively capture the large contributions to the inverse.
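A dense toy sketch of this Frobenius-norm construction (illustration only: it works column by column over a prescribed pattern, solves each small least-squares problem with `numpy.linalg.lstsq`, and omits the row compression used in practical implementations):

```python
import numpy as np

def frob_spai(S, pattern):
    """Right approximate inverse M ~ S^{-1}: for column j, allow nonzeros only in
    positions pattern[j] and solve min ||e_j - S[:, pattern[j]] m_hat||_2."""
    n = S.shape[0]
    M = np.zeros_like(S)
    for j in range(n):
        cols = np.asarray(pattern[j])            # C_j: candidate nonzero positions
        e = np.zeros(n)
        e[j] = 1.0
        m_hat, *_ = np.linalg.lstsq(S[:, cols], e, rcond=None)
        M[cols, j] = m_hat
    return M

rng = np.random.default_rng(0)
n = 25
S = np.eye(n) + 0.05 * rng.standard_normal((n, n))
# with the full pattern the least-squares solves are exact and M = S^{-1}
M = frob_spai(S, [np.arange(n)] * n)
```

With a genuinely sparse pattern, each least-squares problem involves only a handful of columns of *S*, which is what makes the construction embarrassingly parallel.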

(a) Pattern of large entries of *A* (b) Pattern of large entries of *A*−<sup>1</sup>

Fig. 6. Structure of the large entries of *A* (on the left) and of *A*−<sup>1</sup> (on the right). Large to small entries are depicted in different colors, from red to green, yellow and blue. The test problem is a small sphere.

The actual entries of *M* may be computed by minimizing the error matrix $\|I - SM\|_F$ for right preconditioning ($\|I - MS\|_F$ for left preconditioning, resp.). The Frobenius norm allows us to decouple the constrained minimization problem into *n* independent linear least-squares problems, one for each column (resp. row) of *M* when preconditioning from the right (resp. from the left). The independence of the least-squares problems can be seen immediately from the identity

$$\|I - SM\|_F^2 = \sum_{j=1}^n \|e_j - Sm_{\bullet j}\|_2^2, \tag{7}$$

where $e_j$ is the *j*th canonical unit vector and $m_{\bullet j}$ is the column vector representing the *j*th column of *M*. In the case of left preconditioning, the analogous relation

$$\|I - MS\|_F^2 = \|I - S^T M^T\|_F^2 = \sum_{j=1}^n \|e_j - S^T m_{j\bullet}\|_2^2 \tag{8}$$

holds, where $m_{j\bullet}$ is the column vector representing the *j*th row of *M*. The preconditioner is not guaranteed to be nonsingular in general, and additionally it does not preserve any symmetry of *A*. A condition ensuring the non-singularity of *M* may be derived from the following estimates of the accuracy of the approximate inverse (Grote & Huckle (1997)):

THEOREM **1.** *Let $r_j = Sm_j - e_j$ be the residual associated with column $m_j$ for $j = 1, 2, \ldots, n$, and let $q = \max_{1 \le j \le n} \mathrm{nnz}(r_j) \ll n$. Suppose that $\|r_j\|_2 < t$ for $j = 1, 2, \ldots, n$; then we have*

$$\|SM - I\|_F \le \sqrt{n}\,t, \qquad \|M - S^{-1}\|_F \le \|S^{-1}\|_2 \sqrt{n}\,t,$$
$$\|SM - I\|_2 \le \sqrt{n}\,t, \qquad \|M - S^{-1}\|_2 \le \|S^{-1}\|_2 \sqrt{n}\,t,$$
$$\|SM - I\|_1 \le \sqrt{q}\,t, \qquad \|M - S^{-1}\|_1 \le \|S^{-1}\|_1 \sqrt{q}\,t.$$

Owing to this result, all the eigenvalues of *SM* lie in the disk centered at 1 with radius $\sqrt{q}\,t$; the value of *q* is not known a priori, though, so one might enforce the condition $\sqrt{n}\,t < 1$ to prevent singularity or near-singularity of the preconditioned matrix. In practice it may be too costly to compute *M* with such a small *t*. For some problems, a lack of robustness of the approximate inverse may be observed, due to the clustering of small eigenvalues in the spectrum of the preconditioned matrix. Stabilization techniques based on eigenvalue deflation may be used to enhance the robustness of *M*, see e.g. Carpentieri et al. (2003).

The most critical component is the computation of the nonzero structure of *M*. From Figure 6, we see that the sparsity pattern of *S* may be a suitable pattern for *M*. Denoting by

$$\mathcal{P} = \{\, (i,j) \in [1,n]^2 \ \mathrm{s.t.} \ m_{ij} \neq 0 \,\}$$

the nonzero structure of the approximate inverse, we may automatically determine the pattern of the nonzero entries of the *j*th column of *M* as

$$\mathcal{C}_j = \{\, i \in [1,n] \ \mathrm{s.t.} \ (i,j) \in \mathcal{P} \,\}$$

and compute the associated entries by solving a small dense least-squares problem. The least-squares solution involves only those columns of *S* indexed by $\mathcal{C}_j$; we indicate this subset by $S(:, \mathcal{C}_j)$. Because *S* is sparse, many rows in $S(:, \mathcal{C}_j)$ are usually null and do not affect the solution of the least-squares problems (7). Thus, denoting by $\mathcal{R}_j$ the set of indices corresponding to the nonzero rows in $S(:, \mathcal{C}_j)$, by $\hat{S} = S(\mathcal{R}_j, \mathcal{C}_j)$, by $\hat{m}_j = m_j(\mathcal{C}_j)$ and by $\hat{e}_j = e_j(\mathcal{R}_j)$, the actual "reduced" least-squares problems to solve are

$$\min \|\hat{e}_j - \hat{S}\hat{m}_j\|_2, \quad j = 1, \ldots, n. \tag{9}$$

Problems (9) usually have a much smaller size than problems (7) and can be solved efficiently by dense QR factorization. The parallel implementation of the approximate inverse is highly scalable, as shown in Table 3, while its numerical performance typically tends to deteriorate for increasing matrix size, as can be seen in Table 4.

| *n* (procs) | Construction time (sec) | Elapsed time precond (sec) |
|---|---|---|
| 112908 (8)  | 513 | 0.39 |
| 221952 (16) | 497 | 0.43 |
| 451632 (32) | 509 | 0.48 |
| 900912 (64) | 514 | 0.60 |

Table 3. Parallel scalability of the approximate inverse for solving large-scale boundary integral equations on a model problem.

| *dof* / *freq* | FROB density | FROB time | GMRES(∞) iter | GMRES(∞) time | GMRES(120) iter | GMRES(120) time |
|---|---|---|---|---|---|---|
| 23676 / 1.3 GHz  | 0.94 | 2m  | 438 | 20m     | +2000 | 55m    |
| 104793 / 2.6 GHz | 0.19 | 6m  | 234 | 20m     | 253   | 17m    |
| 419172 / 5.2 GHz | 0.05 | 21m | 413 | 2h 44m  | 571   | 2h 26m |
| 943137 / 7.8 GHz | 0.02 | 49m | 454 | 3h 35m• | 589   | 5h 55m |

Table 4. Numerical scalability of the approximate inverse for solving large-scale boundary integral equations. The symbol • means run on 32 processors. Notation: *m* means minutes, *h* hours.

Approximate inverses may also be computed in factorized form as $M = GZ$, where $G \approx U^{-1}$ and $Z \approx L^{-1}$ are approximations of the inverse triangular factors of *S*, see for instance Alléon, Benzi & Giraud (1997); Chen (1998); Rahola (1998); Samant et al. (1996). One example of such a preconditioner is the *AINV* method by Benzi et al. (1996), a sparse approximate inverse computed in factorized form by applying an incomplete biconjugation process to *S* and dropping small entries below a threshold in the inverse factors. However, disappointing results with factorized approximate inverses have been reported on this problem class, see e.g. Carpentieri et al. (2004). The reason for the failure is that for many integral formulations, like EFIE and CFIE, the inverse factors may be totally unstructured. In this case, selecting a priori the sparse pattern for the factors can be extremely hard, and dynamic pattern selection strategies, which drop small entries below a user-defined threshold during the computation, may be very difficult to tune, as they can easily discard relevant information and lead to a very poor preconditioner. For those problems, finding the appropriate threshold that enables a good trade-off between sparsity and numerical efficiency is challenging and very problem-dependent.

#### **4.3 Incomplete LU factorization preconditioner**

*ILU*-type methods compute an approximate triangular decomposition of *S* by means of an incomplete Gaussian elimination process. The *ILU* preconditioner writes as $M = \bar{L}\bar{U}$, $\bar{L} \approx L$, $\bar{U} \approx U$, where *L* and *U* denote respectively the lower and upper triangular factors of the standard *LU* factorization of *S*. This class of methods is virtually always used for solving sparse linear systems; however, mixed success is reported on dense matrix problems, due to the indefiniteness of the systems arising from the discretization. The root of the problem is that small pivots often appear during the factorization, leading to highly ill-conditioned triangular factors and unstable triangular solves (Carpentieri et al. (2004)).

In Table 5 we show an experiment with an *ILU* preconditioner computed from the sparse approximation *S* to *A*, using different values of density for *S*. The test case is a sphere of 1 meter length illuminated at 300 MHz; the problem is modeled using EFIE and the mesh is discretized with 2430 edges. The set $\mathcal{F}$ of fill-in entries to be kept for the approximate lower triangular factor $\bar{L}$ is defined by

$$\mathcal{F} = \{\, (k,i) \ | \ \mathrm{lev}(l_{k,i}) \le \ell \,\},$$

where the integer $\ell$ denotes a user-specified maximal fill-in level. The level $\mathrm{lev}(l_{k,i})$ of the coefficient $l_{k,i}$ of $\bar{L}$ is computed as follows:

