**4.3 Incomplete LU factorization preconditioner**

*ILU*-type methods compute an approximate triangular decomposition of *S* by means of an incomplete Gaussian elimination process. The *ILU* preconditioner writes as $M = \bar{L}\bar{U}$, with $\bar{L} \approx L$ and $\bar{U} \approx U$, where $L$ and $U$ denote respectively the lower and upper triangular factors of the standard *LU* factorization of *S*. This class of methods is widely used for solving sparse linear systems. However, mixed success is reported on dense matrix problems, due to the indefiniteness of the systems arising from the discretization. The root of the problem is that small pivots often appear during the factorization, leading to highly ill-conditioned triangular factors and unstable triangular solves (Carpentieri et al. (2004)).

In Table 5 we show an experiment with an *ILU* preconditioner computed from the sparse approximation *S* to *A*, using different values of density for *S*. The test case is a sphere of 1 meter length illuminated at 300 MHz; the problem is modeled using EFIE and the mesh is discretized with 2430 edges. The set F of fill-in entries to be kept for the approximate lower triangular factor *L* is defined by

$$\mathcal{F} = \left\{ (k, i) \mid lev(l_{k,i}) \le \ell \right\},$$

where the integer $\ell$ denotes a user-specified maximal fill-in level. The level $lev(l_{k,i})$ of the coefficient $l_{k,i}$ of the approximate factor is computed as follows:

Fast Preconditioned Krylov Methods for Boundary Integral Equations


**Initialization**

$$lev(l_{k,i}) = \begin{cases} 0 & \text{if } l_{k,i} \neq 0 \text{ or } k = i \\ \infty & \text{otherwise} \end{cases}$$

**Factorization**

$$lev(l_{k,i}) = \min\left\{ lev(l_{k,i}),\; lev(l_{i,j}) + lev(l_{k,j}) + 1 \right\}.$$

Observe that the larger $\ell$, the higher the density of the preconditioner. We denote the resulting preconditioner by *ILU*($\ell$) (Saad (1996)).
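To make these rules concrete, here is a small hypothetical sketch that applies the initialization and factorization updates symbolically, i.e. on the sparsity pattern only; the dense triple loop favors clarity over efficiency, and the function name is an illustrative assumption, not taken from the chapter.

```python
import numpy as np

def ilu_level_pattern(pattern, ell):
    """Symbolic ILU(ell): propagate fill levels through Gaussian elimination
    on a boolean sparsity pattern and return F = {(k, i) : lev(l_ki) <= ell}."""
    n = pattern.shape[0]
    lev = np.full((n, n), np.inf)
    lev[(pattern != 0) | np.eye(n, dtype=bool)] = 0.0     # initialization rule
    for j in range(n - 1):                                # elimination step j
        for k in range(j + 1, n):
            for i in range(j + 1, n):
                # factorization rule: fill created through column j
                lev[k, i] = min(lev[k, i], lev[i, j] + lev[k, j] + 1)
    return lev <= ell                                     # boolean pattern F
```

With $\ell = 0$ the routine returns exactly the pattern of $S$ plus the diagonal, which is the defining property of *ILU*(0); any entry created by elimination has level at least one and is dropped.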

In our results, increasing the fill-in parameter may produce much more robust preconditioners than *ILU*(0) applied to a denser sparse approximation of the original matrix; *ILU*(1) may deliver a good rate of convergence provided the coefficient matrix is not too sparse. The factorization of a very sparse approximation (up to 2%) of the coefficient matrix can be stable and can significantly accelerate the convergence, especially if at least one level of fill-in is retained. For higher values of the density of *S*, however, the factors may become progressively ill-conditioned and the triangular solves unstable; consequently, the preconditioner is useless. The table also shows that ill-conditioning of the factors is not related to ill-conditioning of *A*.
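As a concrete sketch of this experimental pipeline, the hypothetical code below sparsifies a dense matrix by magnitude thresholding, computes an incomplete factorization of the sparse approximation, and uses it to precondition restarted GMRES. Two substitutions with respect to the text should be noted: SciPy's `spilu` is a threshold-based ILUTP (SuperLU), not the level-of-fill *ILU*($\ell$) described above, and the magnitude threshold stands in for the chapter's prescribed density of *S*; the function name and parameters are illustrative.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import LinearOperator, gmres, spilu

def ilu_preconditioned_gmres(A_dense, threshold, fill_factor=10.0):
    """Sparsify A by keeping entries above threshold * max|A| (plus the
    diagonal), factor the sparse approximation S incompletely, and use the
    factors as a preconditioner M ~ S^{-1} for GMRES on the dense system."""
    n = A_dense.shape[0]
    mask = np.abs(A_dense) >= threshold * np.abs(A_dense).max()
    np.fill_diagonal(mask, True)                    # always keep the diagonal
    S = sp.csc_matrix(np.where(mask, A_dense, 0))   # sparse approximation of A
    ilu = spilu(S, fill_factor=fill_factor)         # incomplete LU of S (ILUTP)
    M = LinearOperator((n, n), matvec=ilu.solve, dtype=A_dense.dtype)
    b = np.ones(n, dtype=A_dense.dtype)
    x, info = gmres(sp.csr_matrix(A_dense), b, M=M, restart=30)
    return x, info                                  # info == 0 on convergence
```

Monitoring $\kappa_\infty$ of the factors, as in Table 5, would additionally require extracting the triangular factors from the `spilu` result.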


| Density of *S* | *IC*(level) | Density of *L* | *κ*∞(*L*) | GMRES(30) | GMRES(50) |
|---|---|---|---|---|---|
| 2% | *IC*(0) | 2.0% | 2 · 10³ | 378 | 245 |
| 2% | *IC*(1) | 5.1% | 1 · 10³ | 79 | 68 |
| 2% | *IC*(2) | 9.1% | 9 · 10² | 58 | 48 |
| 4% | *IC*(0) | 4.0% | 6 · 10⁹ | – | – |
| 4% | *IC*(1) | 11.7% | 2 · 10⁵ | – | – |
| 4% | *IC*(2) | 19.0% | 7 · 10³ | 40 | 38 |
| 6% | *IC*(0) | 6.0% | 8 · 10¹¹ | – | – |
| 6% | *IC*(1) | 18.8% | 5 · 10¹¹ | – | – |
| 6% | *IC*(2) | 29.6% | 7 · 10⁴ | – | – |

Table 5. Number of iterations of GMRES varying the sparsity level of *S* and the level of fill-in of the approximate factor *L* on a spherical model problem ($n = 2430$, $\kappa_\infty(A) = \|A\|_\infty \|A^{-1}\|_\infty \approx O(10^2)$). The symbol '–' means that convergence was not obtained after 500 iterations.

A complex diagonal compensation can help to compute a more stable preconditioner, by shifting along the imaginary axis the eigenvalues close to zero in the spectrum of the coefficient matrix. However, the value of the shift is not easy to tune *a priori* and its effect


on the convergence is difficult to predict (Carpentieri et al. (2004)). Pivoting may be a more robust approach to overcome the problem, according to experiments reported by Malas & Gürel (2007); in this case, the *i*th row of the factor is computed as soon as $permtol \times |s_{ij}| > |s_{ii}|$, where *permtol* is the permutation tolerance and $s_{ij}$ are the entries of *S*.
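One plausible reading of this criterion is the threshold-pivoting rule used in ILUTP-type codes, sketched below; the helper name and the tie-breaking rule are illustrative assumptions, not taken from the chapter.

```python
import numpy as np

def choose_pivot(col, diag_index, permtol):
    """ILUTP-style test on one column: abandon the diagonal pivot s_ii as
    soon as permtol * |largest off-diagonal s_ij| exceeds |s_ii|."""
    off = np.abs(np.delete(col, diag_index))
    if off.size and permtol * off.max() > abs(col[diag_index]):
        return int(np.argmax(np.abs(col)))   # pivot on the largest entry
    return diag_index                        # keep the diagonal pivot
```

Setting `permtol = 0` disables pivoting entirely, while `permtol = 1` pivots whenever any off-diagonal entry dominates the diagonal.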

We follow a different approach. We report on experiments with multilevel inverse-based *ILU* factorization methods to possibly remedy numerical instabilities. Following Bollhöfer & Saad (2006), we initially rescale and reorder the initial matrix *A* as

$$P^T D_l A D_r Q = \hat{A}, \tag{10}$$

which yields $\hat{A}\hat{x} = \hat{b}$ for appropriate $\hat{x}$, $\hat{b}$. The initial step may consist of an optional maximum weight matching (Duff & Koster (1999)). By rescaling and a one-sided permutation, it attempts to improve the diagonal dominance. After that, a symmetric reordering is applied to reduce the fill-in/bandwidth. The latter can also be used without an a priori matching step, only rescaling the entries and symmetrically permuting the rows and the columns. This is of particular interest for (almost) symmetrically structured problems. Next, an inverse-based ILU with static diagonal pivoting is computed. That is, during the approximate incomplete factorization $\hat{A} \approx LDU$, where $L$, $U^H$ are unit lower triangular factors and $D$ is block diagonal, the norms $\|L^{-1}\|$, $\|U^{-1}\|$ are estimated. If at factorization step $l$ a prescribed bound $\kappa$ is exceeded, the current row $l$ and column $l$ are permuted to the lower right end of the matrix; otherwise the approximate factorization is continued. One single pass leads to an approximate partial factorization

$$\Pi^T \hat{A} \Pi = \begin{pmatrix} B & F \\ E & C \end{pmatrix} \approx \begin{pmatrix} L_B & 0 \\ L_E & I \end{pmatrix} \begin{pmatrix} D_B & 0 \\ 0 & S_C \end{pmatrix} \begin{pmatrix} U_B & U_F \\ 0 & I \end{pmatrix} \equiv L_1 D_1 U_1, \tag{11}$$

with a suitable leading block $B$ and a suitable permutation matrix $\Pi$, where $\|L_1^{-1}\| \le \kappa$ and $\|U_1^{-1}\| \le \kappa$. The remaining system $S_C$ approximates $C - EB^{-1}F$. From the relations

$$\begin{cases} B\hat{x}_1 + F\hat{x}_2 = \hat{b}_1 \\ E\hat{x}_1 + C\hat{x}_2 = \hat{b}_2 \end{cases} \;\Rightarrow\; \begin{cases} \hat{x}_1 = B^{-1}(\hat{b}_1 - F\hat{x}_2) \\ (C - EB^{-1}F)\hat{x}_2 = \hat{b}_2 - EB^{-1}\hat{b}_1 \end{cases}$$

at each step of an iterative solver we need to store and invert only the blocks $B$ and $S_C \approx C - EB^{-1}F$, while for reasons of memory efficiency $L_E$, $U_F$ are discarded and implicitly represented via $L_E \approx E U_B^{-1} D_B^{-1}$ (resp. $U_F \approx D_B^{-1} L_B^{-1} F$). When the scaling, preordering and factorization are successively applied to $S_C$, a multilevel variant of (10) is computed. E.g., after one additional level we obtain

$$\tilde{P}\tilde{D}_l A \tilde{D}_r \tilde{Q} = \begin{bmatrix} B & F_1 & F_2 \\ E_1 & C_{11} & C_{12} \\ E_2 & C_{21} & C_{22} \end{bmatrix} \approx \begin{bmatrix} L_B & 0 & 0 \\ L_{E_1} & I & 0 \\ L_{E_2} & L_{C_{21}} & I \end{bmatrix} \begin{bmatrix} D_B & 0 & 0 \\ 0 & D_{C_{11}} & 0 \\ 0 & 0 & S_{22} \end{bmatrix} \begin{bmatrix} U_B & U_{F_1} & U_{F_2} \\ 0 & I & U_{C_{12}} \\ 0 & 0 & I \end{bmatrix}.$$
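The single-pass deferral described above can be mimicked with a much-simplified dense sketch. Here the size of the multipliers serves as a crude stand-in for the estimates of $\|L^{-1}\|$, $\|U^{-1}\|$ used by inverse-based codes; all names are illustrative assumptions, and real implementations work on sparse data structures with proper incremental condition estimators.

```python
import numpy as np

def partial_ldu_with_deferral(A, kappa):
    """Eliminate pivots in natural order; defer a row/column to the trailing
    block C whenever its multipliers (proxy for factor conditioning) would
    exceed kappa. Returns (accepted indices = block B, deferred indices)."""
    S = np.array(A, dtype=float)          # running Schur complement
    remaining = list(range(S.shape[0]))
    accepted, deferred = [], []
    while remaining:
        p, others = remaining[0], remaining[1:]
        piv = S[p, p]
        if others:
            worst = max(np.abs(S[others, p]).max(), np.abs(S[p, others]).max())
            growth = worst / abs(piv) if piv != 0 else np.inf
        else:
            growth = 0.0 if piv != 0 else np.inf
        if growth > kappa:
            deferred.append(p)            # pushed to the lower-right block C
        else:
            accepted.append(p)            # joins the leading block B
            if others:                    # Schur complement update
                S[np.ix_(others, others)] -= np.outer(S[others, p], S[p, others]) / piv
        remaining = others
    return accepted, deferred
```

Applying the same procedure recursively to the deferred block mirrors the multilevel recursion of the text.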

The multilevel algorithm ends at some step $m$ when either $S_C$ is factored completely or it becomes considerably dense and switches to a dense LAPACK solver. After computing an $m$-step ILU decomposition, for preconditioning we have to apply $L_m^{-1} A U_m^{-1}$. From the error equation $E_m = A - L_m D_m U_m$, we see that $\|L_m^{-1}\|$ and $\|U_m^{-1}\|$ contribute to the inverse error $L_m^{-1} E_m U_m^{-1}$. Monitoring the growth of these two quantities during the partial factorization

is essential to preserve the numerical stability of the solver, as can be observed by comparing the results in Table 5 and Table 6.

Table 6. Number of iterations of GMRES using a multilevel inverse-based ILU factorization as preconditioner. The model problem is the same as in Table 5.

## **5. Concluding remarks**

We have discussed some fast iterative solution techniques for solving surface boundary integral equations. High-frequency simulations of large structures are extremely demanding for scalable solvers and large computing resources. We have reviewed recent advances for the class of Krylov subspace methods, sparse approximate inverses, and incomplete *LU* factorizations.

Other approaches have been applied in this area of research. Multigrid methods are provably optimal algorithms for solving various classes of partial differential equations. Attempts to apply these techniques to dense linear systems have met with mixed success. Early experiments on boundary element equations are reported with geometric versions on simple model problems, typically the hypersingular and single-layer potential integral operators arising from the Laplace equation (Bramble et al. (1994); Petersdorff & Stephan (1992); Rjasanow (1987)). Multigrid methods require a hierarchy of nested meshes to set up the principal components of the algorithm, *i.e.* a coarsening strategy to decrease the number of unknowns, grid transfer operators to move from one grid to another, coarse grid operators and a smoothing procedure, see e.g. Hackbusch (1985). Thus they are difficult to implement. On the other hand, algebraic multigrid algorithms use only single-grid information extracted from either the graph or the entries of the coefficient matrix, and are nearly as effective as geometric algorithms in reducing the number of iterations, see e.g. Braess (1995); Brandt (1999); Ruge & Stüben (1987); Vanek et al. (1996). Langer et al. propose to apply an auxiliary sparse matrix reflecting the local topology of the mesh on the fine grid to set up all the components of the multigrid algorithm in a purely algebraic setting (Langer et al. (2003)). This *gray-box* approach is fairly robust on model problems and maintains the algorithmic and memory complexity of the M-V product operation (Langer & Pusch (2005)), thus it is well suited to be combined with MLFMA. See also Carpentieri et al. (2007) for another multigrid-type solver.

Preconditioners based on wavelet techniques are also receiving interest. The wavelet compression of integral operators with smooth kernels yields nearly sparse matrices with at most $O(n \log^a n)$ nonzero entries, where $a$ is a small constant that depends on the operator and the wavelet used, see e.g. earlier work by Beylkin et al. (1991); Dahmen et al. (1993); Harbrecht & Schneider (2004); Hawkins et al. (2007); Lage & Schwab (1999). The compressed matrix is spectrally equivalent to the original matrix and preconditioning is often needed (Chan & Chen (2000; 2002); Chan et al. (1997); Chen (1999); Ford & Chen (2001); Hawkins & Chen (2005); Hawkins et al. (2005)). Some efficient wavelet preconditioning algorithms have been proposed, based on bordered block structure (Ford & Chen (2001); Hawkins et al. (2005)), multi-level preconditioners (Chan & Chen (2002)), and sparse approximate inverses. However, most experiments with wavelet preconditioners are reported for model problems, e.g. Calderon-Zygmund type matrices, single and double layer potentials, and the hyper-singular operator. For oscillatory kernels the compressed matrix may be fairly dense and wavelet techniques are less useful. For Helmholtz problems, wavelet Galerkin schemes yield matrices with approximately $O(kn)$ nonzero entries ($k$ is the wavenumber), which becomes $O(n^2)$ when the number of unknowns is related to $k$.

Further investigations are necessary to identify the best class of methods for the given problem and the selected computer hardware. The use of more powerful (but also more complex) computing facilities should help in the search for additional speed, but it will also mean that there will be even more factors that need to be considered when attempting to identify the optimal approach in the future.

## **6. References**

Alléon, G., Amram, S., Durante, N., Homsi, P., Pogarieloff, D. & Farhat, C. (1997). Massively parallel processing boosts the solution of industrial electromagnetic problems: High performance out-of-core solution of complex dense systems, *in* M. Heath, V. Torczon, G. Astfalk, P. E. Bjørstad, A. H. Karp, C. H. Koebel, V. Kumar, R. F. Lucas, L. T. Watson & E. D. E. Womble (eds), *Proceedings of the Eighth SIAM Conference on Parallel Computing*, SIAM Book, Philadelphia. Conference held in Minneapolis, Minnesota, USA.

Alléon, G., Benzi, M. & Giraud, L. (1997). Sparse approximate inverse preconditioning for dense linear systems arising in computational electromagnetics, *Numerical Algorithms* 16: 1–15.

Alpert, B., Beylkin, G., Coifman, R. & Rokhlin, V. (1993). Wavelet-like bases for the fast solution of second-kind integral equations, *SIAM J. Scientific and Statistical Computing* 14: 159–184.

Antoine, X., Bendali, A. & Darbas, M. (2004). Analytic preconditioners for the electric field integral equation, *International Journal for Numerical Methods in Engineering* 61(8): 1310–1331.

Bayliss, A., Goldstein, C. I. & Turkel, E. (1985). On accuracy conditions for the numerical computation of waves, *J. Comp. Phys.* 59: 396–404.

Bebendorf, M. (2000). Approximation of boundary element matrices, *Numerische Mathematik* 86(4): 565–589.

Bendali, A. (1984). *Approximation par elements finis de surface de problemes de diffraction des ondes electro-magnetiques*, PhD thesis, Université Paris VI.

Benzi, M., Meyer, C. & Tuma, M. (1996). A sparse approximate inverse preconditioner for the conjugate gradient method, *SIAM J. Scientific Computing* 17: 1135–1149.

Beylkin, G., Coifman, R. & Rokhlin, V. (1991). Fast wavelet transforms and numerical algorithms, *Comm. Pure Appl. Math.* 44: 141–183.

Bilotti, F. & Vegni, C. (2003). MoM entire domain basis functions for convex polygonal patches, *Journal of Electromagnetic Waves and Applications* 17(11): 1519–1538.

Bollhöfer, M. & Saad, Y. (2006). Multilevel preconditioners constructed from inverse-based ILUs, *SIAM J. Scientific Computing* 27(5): 1627–1650.
