Section 1 Theory and Design

## **Chapter 1**

## Optimization Using Genetic Algorithms – Methodology with Examples from Seismic Waveform Inversion

*Subhashis Mallick*

## **Abstract**

Genetic algorithms use the *survival of the fittest* analogy from evolution theory to make random walks in the multiparameter model space and find the model, or the suite of models, that best fits the observations. Due to their nonlinear nature, the runtimes of genetic algorithms increase exponentially with increasing model-space size. A diversity-preserved genetic algorithm, in which each member of the population is assigned a measure of diversity, models are selected in preference to both their objective and diversity values, and the objectives are scaled using a suitably chosen scaling function, can expedite computation and reduce runtimes. Starting from an initial model, with the model space defined as search intervals around it, and using a new sampling strategy that generates a smoothly varying initial set of random models within the specified search intervals, the proposed diversity-preserved method converges rapidly and estimates reliable models. The methodology and implementation of this new genetic-algorithm optimization are described using examples from prestack seismic waveform inversion problems. In geophysics, this new method can be useful for subsurface characterization where well control is sparse.

**Keywords:** global optimization, genetic algorithm, inversion, parameter estimation, diversity preservation, sampling strategy

## **1. Introduction**

The genetic algorithm (GA) is a global optimization method based on a natural analogy from genetics and evolution theory [1, 2]. In geophysics, GA has been used to solve a variety of single- and multi-objective inverse problems [3–17].

GA belongs to the class of *model-based optimizations*, which have three distinct components: (1) model, (2) data, and (3) objective. It is assumed that the model and the objective are related to one another via the data and the underlying physics of the problem. The model or decision space is usually denoted as **X**. In seismic inverse problems under an isotropic elastic assumption, for example, the model consists of the vectors of the P-wave velocity ($V_P$), S-wave velocity ($V_S$), and density ($\rho$) for each subsurface depth (or time) sample, and the model space is the entire feasible range of their variability. For an anisotropic elastic inversion, the model would include other anisotropic parameters in addition to $V_P$, $V_S$, and $\rho$. For electromagnetic (EM) and gravity inverse problems, the model would consist of electrical resistivities or densities, respectively. Mathematically, the model or decision space is thus defined as

$$X = \begin{bmatrix} \mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_N \end{bmatrix}^T. \tag{1}$$

In Eq. (1), the superscript $T$ represents a transpose, each $\mathbf{x}_i$, $i = 1, 2, \dots, N$, represents one model vector, and $N$ is the total number of model vectors. Thus, for an isotropic elastic seismic inverse problem at a single location, $N = 3$ with $\mathbf{x}_1 = [V_{P1}, V_{P2}, \dots, V_{PM}]^T$, $\mathbf{x}_2 = [V_{S1}, V_{S2}, \dots, V_{SM}]^T$, and $\mathbf{x}_3 = [\rho_1, \rho_2, \dots, \rho_M]^T$, where for any depth or time sample $j$, $j = 1, 2, \dots, M$, $V_{Pj}$, $V_{Sj}$, and $\rho_j$ respectively denote the P-wave velocity, S-wave velocity, and density at that sample, and $M$ is the total number of samples. The data space is defined as

$$D = \begin{bmatrix} \mathbf{d}_1, \mathbf{d}_2, \dots, \mathbf{d}_J \end{bmatrix}^T, \tag{2}$$

where each $\mathbf{d}_i$, $i = 1, 2, \dots, J$, is the vector representing the $i^{th}$ data being optimized. Finally, the objective space is defined as

$$Y = \begin{bmatrix} y_1, y_2, \dots, y_J \end{bmatrix}^T, \tag{3}$$

where each $y_i$, $i = 1, 2, \dots, J$, is a scalar-valued quantity representing the objective of the $i^{th}$ data. To compute the objective, it is assumed that there exists a unique mapping of the model space onto the objective space, i.e., $X \to Y$, via the underlying physics of the problem and the data space $D$. The aim of any model-based optimization is to find the model (or suite of models) in the model (decision) space that either minimizes or maximizes the objective.

The problem defined above is a multi-parameter and multi-objective optimization in which multiple parameters (Eq. 1) are simultaneously estimated from multiple datasets (Eq. 2) via optimizing multiple objectives (Eq. 3). Such multi-parameter and multi-objective optimizations have previously been used in geophysics to solve a variety of problems, such as estimating anisotropic properties of the mantle lithosphere from the splitting parameters of teleseismic S-waves and P-wave residual spheres [18], wave-equation migration velocity inversion [19], estimating earthquake focal mechanisms [20], and inverting multicomponent seismic and electromagnetic (EM) data [10–12, 14, 17].

This chapter restricts the discussion to the multi-parameter, single-objective optimization problem, in which multiple parameters in the model space (Eq. 1) are estimated using a single dataset and a single objective, i.e., $J = 1$ in Eqs. (2) and (3). Although the examples provided are from the seismic inversion problem, the extension to other optimization problems is straightforward.

### **2. Multi-parameter and single-objective optimization problem**

Restricting to seismic inversion under the isotropic assumption, the model consists of three parameters: (1) the longitudinal or P-wave velocity ($V_P$), (2) the transverse or S-wave velocity ($V_S$), and (3) the density ($\rho$). Thus, for single-component isotropic elastic seismic inverse problems, the model and data can be redefined as

$$m = \begin{bmatrix} V_P, V_S, \rho \end{bmatrix}^T, \tag{4}$$

and

$$d = \begin{bmatrix} d_1, d_2, \dots, d_N \end{bmatrix}^T. \tag{5}$$

In Eq. (4), the vectors $V_P = [V_{P1}, V_{P2}, \dots, V_{PM}]^T$, $V_S = [V_{S1}, V_{S2}, \dots, V_{SM}]^T$, and $\rho = [\rho_1, \rho_2, \dots, \rho_M]^T$ respectively denote the P-wave velocity, S-wave velocity, and density for each depth (or time) sample $i$, $i = 1, 2, \dots, M$, and the vector $d$ in Eq. (5) is the input seismic data with $N$ samples.

Having defined the model and data, a unique forward modeling operator $s = g(m)$ is then defined, which allows computing the synthetic or predicted data vector $s = [s_1, s_2, \dots, s_N]^T$ at exactly the same points as those of the data vector $d$ of Eq. (5). This forward modeling operator varies with the flavor of the inversion method. For post-stack or prestack amplitude-variation-with-angle (AVA)/elastic-impedance inversion [9, 20–24], $g(m)$ is the convolutional modeling at normal or non-normal incidence angles [25]. For wave-equation-based inversion such as full waveform inversion (FWI) [26–39], $g(m)$ is the numerical solution of the elastic or acoustic wave equation using finite-difference or finite-element modeling. Finally, for prestack waveform inversion (PWI), which is a subset of FWI under the assumption of a locally horizontal (1D) stratification at each location [3–8, 13, 16, 40, 41], $g(m)$ is the analytical solution to the 1D wave equation [42, 43].

After defining the synthetic or predicted data $s = g(m)$, the misfit or error between the observed and synthetic data is defined as $e = |d - s|$, following which the objective is defined as the sum of the squared errors

$$y = e^T e,\tag{6}$$

and the optimization can be cast as the minimization of the objective $y$. Alternatively, the objective can be defined either as $-e^T e$ [17] or as the normalized cross-correlation [3]

$$y = \frac{d^\dagger s + s^\dagger d}{d^\dagger d + s^\dagger s},\tag{7}$$

and the optimization can be cast as the maximization of $y$. The superscript $\dagger$ in Eq. (7) denotes a complex conjugate transpose. It is assumed here that the real and synthetic data vectors $d$ and $s$ can be complex valued; when they are real-valued, $d^\dagger = d^T$ and $s^\dagger = s^T$. Also, the objectives defined by Eqs. (6) and (7) represent a pure least-squares optimization, which is unstable for most practical problems. Therefore, additional regularization or damping terms are used in defining the objective. These damping terms stabilize the optimization by addressing the noise inherently present in the data.
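As a concrete illustration, the two objectives of Eqs. (6) and (7) can be sketched in a few lines of Python. NumPy's `vdot` applies the conjugate transpose, so the same code covers real- and complex-valued data; the function names here are illustrative, not from the chapter:

```python
import numpy as np

def objective_lsq(d, s):
    """Sum-of-squared-errors objective (Eq. 6); to be minimized."""
    e = d - s
    return float(np.real(np.vdot(e, e)))

def objective_xcorr(d, s):
    """Normalized cross-correlation objective (Eq. 7); to be maximized.
    np.vdot conjugates its first argument, i.e., it computes d^dagger s."""
    num = np.vdot(d, s) + np.vdot(s, d)
    den = np.vdot(d, d) + np.vdot(s, s)
    return float(np.real(num / den))
```

A perfect match gives `objective_lsq = 0` and `objective_xcorr = 1`, consistent with minimization and maximization, respectively.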

**Figure 1** illustrates single-objective, multi-parameter global optimization with reference to PWI. As shown in **Figure 1**, an initial model of $V_P$, $V_S$, and $\rho$, each as a function of depth, is first estimated, and search windows around it are provided to define the model space. In the presence of well-logs with borehole measurements of $V_P$, $V_S$, and $\rho$, they can be used for generating this initial model. In the absence of well-logs, the initial $V_P$ can be estimated from seismic data using velocity analysis or other advanced velocity estimation procedures such as traveltime tomography [44], and then established $V_P$-$V_S$ [23, 24, 45, 46] and $V_P$-$\rho$ relations [47] can be used to estimate the initial $V_S$ and $\rho$. Based on the local geology and the knowledge of the variability of $V_P$, $V_S$, and $\rho$, the model space or search windows can then be defined around these initial models, and a suite of models is generated within them. This suite of models can be generated in different ways by providing an appropriate probability distribution around the initial model. For the most unbiased case, the distribution could be uniform (boxcar) between the minimum and maximum values, and the models are generated such that the values of $V_{Pi}$, $V_{Si}$, and $\rho_i$ at any depth $i$ can take any random value within the minimum and maximum search bounds specified for that depth with equal probability. Alternatively, a Gaussian distribution with the mean and standard deviation dictated by the initial model and the search window can be defined for each depth, and depth-dependent models of $V_P$, $V_S$, and $\rho$ can be randomly drawn from this distribution. After generating this suite of models, synthetic seismograms are computed for each model and matched with the observed seismic data.
When there are adequate offset samples, prestack seismic data in time and offset domain can be decomposed into plane-waves in the intercept-time and ray-parameter (*τ*-*p*) domain [5]; synthetic seismograms can also be computed in *τ*-*p* domain and matched with the real seismic data using Eq. (7) to compute the objective for each model. In many practical situations however, the offset samples may not be adequate to perform an accurate plane-wave decomposition in the *τ*-*p* domain and it must be approximately done in the angle-domain instead [8]. Thus, using the P-wave velocity field of each of the models, the offset-domain seismic data are converted into the angle domain using an offset-to-angle transformation method

#### **Figure 1.**

*Illustration of the single-objective and multi-parameter global optimization for prestack waveform inversion of seismic data.*

#### *Optimization Using Genetic Algorithms – Methodology with Examples from Seismic… DOI: http://dx.doi.org/10.5772/intechopen.113897*

[48–50]. Synthetic seismograms are also computed in the angle domain and matched with the observed angle gathers using Eq. (7) to compute the objective. Following the objective evaluation, the objectives are scaled using a suitably chosen scaling function. Objective scaling is, in principle, equivalent to objective regularization and will be discussed in detail later in relation to genetic algorithms. Following scaling, the models generated so far are checked to verify whether convergence has been achieved. If yes, the method reports the solutions and exits. If not, the models are iteratively modified until one or more of the convergence criteria are satisfied (see **Figure 1**).

The way the models are modified in global optimization (the box shown with a different color in **Figure 1**) depends upon the flavor of global optimization being used. In simulated annealing (SA), for example [4, 6], the analogy of the fundamental physics of crystallization from a melt via slow and steady cooling is used to generate new models at each iteration. In the genetic algorithm (GA), on the other hand [1–3, 7, 8], the analogy of natural selection and survival of the fittest from evolution theory is used. In particle swarm optimization (PSO), the models are modified using the analogy of the social behavior of a flock of birds or a school of fish [51, 52].

In theory, global optimizations do not depend upon the choice of the initial model the way local, gradient-based optimizations do. The only requirement for global methods is that the model space, provided as search bounds or prior probability distributions, is wide enough to contain the true model. In seismology, these global methods could be useful to characterize the subsurface in new exploration areas where well control is sparse or unavailable. Even in mature areas with adequate well control, if production data are to be integrated with time-lapse seismic data through history matching via iterative reservoir simulations, geomechanical modeling, and seismic inversion, wells drilled before production may not represent the true subsurface model, and a global optimization would be useful to predict dynamic reservoir properties. The superiority of GA over a linearized local inversion in predicting ocean salinity and temperature from water-column reflections with no prior information has been clearly demonstrated by Padhi et al. [41]. Yet, the use of global methods in seismic inversion is still limited. The primary reason is that all global methods are nonlinear optimizations and computationally challenging for moderate to large seismic inverse problems. The advantage of a global method is that it can find a reasonable subsurface model even when the initial starting model is far from the true model. However, finding the true (global) model starting from a faraway initial guess requires (1) defining sufficiently large search bounds within the model space, (2) generating a very large number of models within these bounds, and (3) iterating these models many times such that the model space is adequately sampled (**Figure 1**). The runtime of all global optimizations increases exponentially with increasing model-space size (i.e., with increasing search bounds) and with the increasing number of models to be iterated within this model space [12].

Restricting specifically to GA, we now elaborate the steps shown in **Figure 1** with particular emphasis on seismic inversion. As shown in **Figure 1**, the first step for GA is to obtain an estimate of the initial model and define the model space. This initial model can be generated from well-logs. But when well-logs are present, the initial model is close to the true model; under such conditions, there are many compute-efficient local gradient-based methods to handle the inverse problem, and the use of a global optimization is unnecessary. The power of any global optimization lies in the situation where well-logs are unavailable, and the initial model must be estimated differently.

**Figure 2** shows a simple way to estimate the initial $V_P$ via velocity analysis and normal moveout (NMO) correction. In this method, the input seismic data (**Figure 2a**) are interactively NMO corrected using different $V_P$ fields until a time-varying $V_P$ field (**Figure 2b**) is obtained that optimally flattens (NMO-corrects) the input gather (**Figure 2c**). Once the $V_P$ field is estimated directly from data using the procedure shown in **Figure 2** or using another sophisticated method such as tomography, the initial $V_S$ and $\rho$ can be estimated using established $V_P$-versus-$V_S$ and $V_P$-versus-$\rho$ relations. To keep things simple, the $V_P$-$V_S$ relation $V_S = V_P/2$ and Gardner's relation [47], $\rho = \alpha V_P^{\beta}$, can be used to estimate the initial $V_S$ and $\rho$. The relation $V_S = V_P/2$ is commonly used in AVA/elastic-impedance inversion [23, 24, 46]. In Gardner's relation, $\beta = 0.25$, and $\alpha$ is 0.23 or 0.31, respectively, for $V_P$ in ft/s and m/s for most Gulf of Mexico sedimentary rocks.
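A minimal sketch of this initial-model recipe, assuming $V_P$ in m/s (so Gardner's $\alpha = 0.31$) and the simple $V_S = V_P/2$ relation; the function name is illustrative:

```python
def initial_vs_rho(vp_ms):
    """Initial VS and density from an initial VP (m/s), using VS = VP / 2
    and Gardner's relation rho = alpha * VP**beta with alpha = 0.31 and
    beta = 0.25 (rho in g/cm^3 for VP in m/s)."""
    vs = vp_ms / 2.0
    rho = 0.31 * vp_ms ** 0.25
    return vs, rho
```

Applied sample by sample to the estimated $V_P$ field, this yields the cyan initial $V_S$ and $\rho$ curves of the kind shown in **Figure 3**.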

Cyan curves in **Figure 3** represent the initial model for inverting the prestack seismic data shown in **Figure 2a**. **Figure 3a** is the initial $V_P$, which is the same as the $V_P$ shown in **Figure 2b** after time-to-depth conversion. The initial $V_S$ and $\rho$, shown in **Figures 3b** and **c**, were computed from the initial $V_P$ of **Figure 3a** using the $V_P$-$V_S$ and $V_P$-$\rho$ relations discussed above. The cyan curve of **Figure 3d** is the Poisson's ratio, computed from the initial $V_P$ and $V_S$ of **Figures 3a** and **b** using the formula for the Poisson's ratio, given as [53]

$$\nu = \frac{1 - 2\left(\frac{V_S}{V_P}\right)^2}{2\left\{1 - \left(\frac{V_S}{V_P}\right)^2\right\}}. \tag{8}$$

Note that because the initial $V_S$ is computed from the initial $V_P$ by setting $V_S = V_P/2$, per Eq. (8), the initial Poisson's ratio shown in **Figure 3d** is a constant $\frac{1}{3}$. The location

#### **Figure 2.**

*Estimation of initial VP from prestack seismic data. (a) Input seismic gather. (b) Estimated VP. (3) Seismic gather after NMO correction using the estimated VP.*


#### **Figure 3.**

*The well-log model (black), initial model (cyan), and the model space search limits (dashed red) for (a) VP, (b) VS, (c) ρ, and (d) Poisson's ratio.*

of the seismic data shown in **Figure 2** coincides with a well location and the well-log *VP*, *VS*, *ρ*, and the computed Poisson's ratio are shown as black curves in **Figures 3a**–**d**.

Besides the initial model, GA needs the model space to be defined. The upper and lower bounds shown as dashed red curves on each panel of **Figure 3** are computed by varying (1) the initial $V_P$ by ±10% (**Figure 3a**), (2) the initial $\rho$ by ±15% (**Figure 3c**), (3) the initial Poisson's ratio from −70% to +25% (**Figure 3d**), and (4) by computing the upper and lower bounds of $V_S$ (**Figure 3b**) from the initial $V_P$ (**Figure 3a**) and the lower and upper bounds of the Poisson's ratio (**Figure 3d**). Note that the model space defined by these upper and lower bounds (dashed red curves) is wide enough to span the range of variation in the true (well-log) model.

After estimating the initial model and the model space (**Figure 3**), the next step in GA is generating the initial population of models. As mentioned earlier, these initial models can be generated by letting the initial $V_P$, $V_S$ (or Poisson's ratio), and $\rho$ vary randomly between their respective lower and upper bounds such that each of them can take any value between these bounds with equal probability. These models could be generated from the lower and upper bounds of $V_P$ and $\rho$ and the lower and upper bounds of either $V_S$ or Poisson's ratio. Note that for the single-component (i.e., vertical or pressure response) seismic data considered here, the variations of reflection amplitudes are controlled by the variations of the P-P reflection coefficients ($R_{PP}$) at subsurface interface boundaries as functions of the angle of incidence ($\theta$). While the exact mathematical expression of $R_{PP}$ is complex, linearized approximations suggest that $R_{PP}(\theta)$ is primarily controlled by (1) the P-wave velocity contrast, (2) the density contrast, and (3) the $V_S$-$V_P$ ratio ($V_S/V_P$) or Poisson's ratio contrast [53–59]. For both AVA waveform inversion and PWI, Mallick [7, 8] found that parameterizing the random models with $V_P$, Poisson's ratio, and $\rho$, and then computing $V_S$ from $V_P$ and Poisson's ratio, provides a more stable inversion than parameterizing them directly in $V_P$, $V_S$, and $\rho$. From Eq. (8), it is straightforward to show that $V_S$ can be obtained from $V_P$ and Poisson's ratio $(\nu)$ as

$$V_S = V_P \sqrt{\frac{1 - 2\nu}{2(1 - \nu)}}. \tag{9}$$

Thus, in the applications shown here, $V_P$, $\nu$, and $\rho$ were randomly generated, and Eq. (9) was then used to calculate $V_S$. Also note that for 1D seismic inverse problems, layer thickness is also necessary. However, instead of treating layer thickness as a model parameter to be estimated, it is preferable to calculate it using a user-defined wavelength fraction at the dominant seismic frequency. Therefore, from the $V_P$ value $V_{Pi}$ for any layer $i$, the thickness of that layer can be computed as $t_i = g\lambda_i$, where $g$ is the user-defined wavelength fraction and $\lambda_i$ is the wavelength at the dominant seismic frequency $f_{dom}$, i.e., $\lambda_i = V_{Pi}/f_{dom}$. Thus, the layer thickness also varies with the variation of $V_P$ but is controlled by the dominant seismic frequency and the user-defined wavelength fraction. For seismic inverse problems, it has been shown that allowing layer thickness to vary randomly along with $V_P$, $V_S$, and $\rho$ is unstable, and that stable results are obtained by fixing the layers at the observed reflection boundaries in the time domain [3, 4, 60]. Thus, in some older GA and SA implementations [3, 4, 60], the layers were of fixed time-thickness. Later, Mallick [8] obtained stable results by resolving models at a given fraction of the dominant wavelength. More recently, trans-dimensional seismic inversion methods [61–65] have been developed in which the number of layers, besides $V_P$, $V_S$ or $\nu$, and $\rho$, is treated as an additional model parameter and estimated. However, in the applications shown here, a fixed number of layers with thickness set as the user-defined wavelength fraction is used.
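Eq. (9) and the wavelength-fraction thickness rule can be sketched as below; the default fraction $g = 0.25$ is a hypothetical choice, since the chapter leaves $g$ user-defined:

```python
import math

def vs_from_poisson(vp, nu):
    """Eq. (9): S-wave velocity from P-wave velocity and Poisson's ratio."""
    return vp * math.sqrt((1.0 - 2.0 * nu) / (2.0 * (1.0 - nu)))

def layer_thickness(vp, f_dom, g=0.25):
    """Layer thickness t = g * lambda, where lambda = VP / f_dom is the
    wavelength at the dominant seismic frequency and g is a user-defined
    wavelength fraction (g = 0.25 is an assumed default)."""
    return g * vp / f_dom
```

Note that $\nu = 1/3$ recovers $V_S = V_P/2$, consistent with the constant initial Poisson's ratio of **Figure 3d**.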

**Figure 4** shows randomly generated $V_P$ models along with the search window, initial model, and true (well-log) model. For any layer $i$, the P-wave velocity $V_{Pi}$ is assumed to take any random value between the minimum and maximum search limits $(V_{Pi})_{min}$ and $(V_{Pi})_{max}$ for that layer. Generating such values for all layers and for all other model parameters (i.e., for $V_S/\nu$ and $\rho$) provides one random model, and repeating the same for many models gives the model population. In **Figure 4**, ten random $V_P$ models are shown, with one of the models in orange and the other nine in grey. Note that when a sufficiently large population of such random models is generated, they span the entire model space.

#### **Figure 4.**

*Ten random VP models generated within the specified search window. One of the models is shown in a different color than the others to show the variability of the model within the specified search limits. The well-log and initial VP are also shown.*


After generating the random population of models, synthetic data are computed for each model and matched with the observation to compute the objective (**Figure 1**). However, before discussing objective computation, one GA-specific step known as *parameter coding/decoding* deserves attention. One popular version of GA works with coded parameters instead of the parameters themselves, and one good way of parameter coding is *binary coding*, illustrated in **Figure 5**. In binary coding, the model parameters ($V_P$, $V_S$ or $\nu$, and $\rho$) for each layer are coded as binary digits (bits). In **Figure 5**, for example, 16 bits are assigned to $V_P$ and 8 bits each are assigned to $V_S/\nu$ and $\rho$. **Figure 5** shows the binary-coded parameters for layer 1 only, but those for all layers are concatenated to form a single binary string representing one model or one member of the population. In GA, this entire binary string representing one member is called a *chromosome*, and each bit within the string is called a *gene*. The number of bits (genes) $N_{bits}$ in a single model (chromosome) is thus given as $N_{bits} = N_L\left(N_{V_P} + N_{V_S} + N_\rho\right)$, in which $N_L$ is the number of layers and $N_{V_P}$, $N_{V_S}$, and $N_\rho$ are the numbers of bits used to code $V_P$, $V_S$ or $\nu$, and $\rho$ for each layer. Thus, in binary coding, a coin is tossed $N_{bits}$ times with 50% probability for heads (1) and 50% for tails (0), and the results (i.e., the zeros and ones) of the coin tosses are placed next to one another to represent one model (chromosome). Repeating this $2^N$ times, where $N$ is an integer, produces a population of $2^N$ models (**Figure 5**). For a single model parameter ($V_P$, $V_S/\nu$, $\rho$) in each layer $i$, having all bits set to 0 (zero) corresponds to the minimum parameter value, and having them all set to 1 (one) corresponds to the maximum parameter value specified by the search window for that layer (**Figure 4**).
Thus, the bits with random zeros and ones from the coin-toss can be decoded for their actual value by linearly interpolating between these minimum and maximum values. For example, assume the following parameters for binary coding:


**Figure 5.** *Illustration of binary coding.*

Under these conditions, $V_P$ for the layer is represented by the first eight bits, i.e., 10010111; $\nu$ for the layer is the next eight bits, 00111001; and $\rho$ is represented by the last eight bits, 11001001. The equivalent decimal integer $d_{val}$ of the binary string 10010111 representing $V_P$ (read least-significant bit first) is $d_{val} = 1 \times 2^0 + 0 \times 2^1 + 0 \times 2^2 + 1 \times 2^3 + 0 \times 2^4 + 1 \times 2^5 + 1 \times 2^6 + 1 \times 2^7 = 1 + 8 + 32 + 64 + 128 = 233$. Assuming that all eight bits being zero (00000000) corresponds to the minimum value $(V_P)_{min}$ of the search limit (2500 m/s) and all of them being one (11111111) corresponds to the maximum value $(V_P)_{max}$ of the search limit (5500 m/s), $V_P$ for the layer is decoded as $V_P = (V_P)_{min} + d_{val}\,\frac{(V_P)_{max} - (V_P)_{min}}{2^8 - 1} = 2500 + 233 \times \frac{5500 - 2500}{255} \approx 5241$ m/s. Similarly, it can be shown that $\nu$ and $\rho$ for the layer are respectively 0.295 and 2.29 g/cm³. While such binary coding and decoding allow an easy way to search the model space via crossover and mutation (see below), such a coding discretely samples the model space. For any model parameter $P$ with minimum and maximum search limits $P_{min}$ and $P_{max}$, the resolution of this discrete sampling is given as $\frac{P_{max} - P_{min}}{2^{N_B} - 1}$, where $N_B$ is the number of bits used to code the parameter. Considering that the model space is, in fact, continuous, modern GA implementations are *real-coded*; they do not use any coding and work directly with real numbers.
Thus, in real-coded GA, real values of the model parameters ($V_P$, $V_S/\nu$, $\rho$) within the specified search window limits are randomly generated for each layer to represent one model or member of the population, and no decoding is required. Details of real parameter coding for GA can be found elsewhere [10–12, 66] and are not repeated here. Irrespective of whether the model parameters are coded as binary strings (**Figure 5**) or as real numbers, the next step in GA is to compute synthetic data using the underlying physics, match them with the observation, and compute the objective.
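For the binary case, the decoding step of the worked example above can be sketched as follows, assuming (as in the example) that the bit string is read least-significant bit first; the function name is illustrative:

```python
def decode(bits, pmin, pmax):
    """Decode a binary gene string (least-significant bit first) to a real
    parameter value by linear interpolation between the search limits:
    all zeros -> pmin, all ones -> pmax."""
    dval = sum(int(b) << i for i, b in enumerate(bits))
    return pmin + dval * (pmax - pmin) / (2 ** len(bits) - 1)
```

For the example string, `decode("10010111", 2500, 5500)` gives $d_{val} = 233$ and a decoded $V_P$ of about 5241 m/s.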

Objective computation is followed by objective scaling or fitness scaling. In GA, the objective is the primary driving mechanism for the model-space search. However, if these objectives are used directly to drive the search, the good models tend to dominate the poor ones. This is undesirable at early generations, when none of the models is expected to be anywhere close to the true (global) optimum, and it drives the GA optimization to a local optimum [2]. In fitness scaling, the good models are scaled down and the poor models are scaled up, and the degree of scaling varies with generation. In GA, the original objectives computed using the normalized cross-correlation (Eq. 7 or equivalent) are called the raw *fitness*, and those after scaling are called the *scaled fitness*. Although many methods for fitness scaling have been proposed, one simple and yet stable method is the linear scaling proposed by Goldberg (1989) [2]. In linear fitness scaling, the scaled fitness ($f'$) is assumed to be linearly related to the raw fitness ($f$) as

$$f' = a + bf,\tag{10}$$

where *a* and *b* are constants, and are computed from the constraints

$$f'_{max} = S_C\, f_{avg}, \tag{11}$$

and

$$f'_{avg} = f_{avg}. \tag{12}$$

In Eq. (11), $f'_{max}$ is the maximum value of the scaled fitness, $f_{avg}$ is the average value of the raw fitness, and $S_C$ is the scaling constant; $f'_{avg}$ in Eq. (12) is the average scaled fitness, set equal to the average raw fitness $f_{avg}$. The scaling constant $S_C$ is a user-defined parameter that controls model selection; mathematically, it represents the expected number of copies of the best member to be selected for crossover and mutation (discussed below). In seismic inversion, setting $S_C = 0.8$ at the beginning and slowly increasing it to 1.8 at the final generation provides good results [7, 8]. As mentioned earlier, fitness scaling is like adding a regularization or damping term to the pure least-squares inverse problem.
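Solving the two constraints of Eqs. (11) and (12) for the constants $a$ and $b$ of Eq. (10) gives $b = (S_C - 1)\,f_{avg}/(f_{max} - f_{avg})$ and $a = f_{avg}(1 - b)$, which can be sketched as:

```python
import numpy as np

def linear_scale(f, sc):
    """Linear fitness scaling f' = a + b*f (Eq. 10), with a and b chosen
    so that f'_max = sc * f_avg (Eq. 11) and f'_avg = f_avg (Eq. 12)."""
    f = np.asarray(f, dtype=float)
    favg, fmax = f.mean(), f.max()
    if np.isclose(fmax, favg):       # no spread in the population: nothing to scale
        return f.copy()
    b = (sc - 1.0) * favg / (fmax - favg)
    a = favg * (1.0 - b)
    return a + b * f
```

One caveat of linear scaling: for populations with a large spread, the worst members' scaled fitness can go negative, and Goldberg [2] discusses adjusting the scaling in that case.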

Fitness scaling adds stability to GA optimization. In seismic inverse problems, however, Sen and Stoffa (1992) [4] found that even after fitness scaling, GA may still fail to converge. This is because of *genetic drift*: a tendency for GA to cluster toward a single region in the model-space when a finite population size is used [2]. One way to avoid genetic drift is to use a very large population size and propagate it to many generations. However, GA being a nonlinear method, its runtime becomes prohibitively expensive with increasing population size, so this approach must be avoided. Instead of using a large population, running several independent GA optimizations with small population sizes and later combining them is one good way to handle genetic drift [4, 7, 17]. However, making several such GA runs is still compute-intensive, and it is advisable to find ways to avoid genetic drift in a single rather than several GA runs. For multi-objective optimizations, one way to avoid genetic drift in a single run is to compute the *population diversity*, given as the normalized Euclidean distance of a member in the population from its nearest neighbors measured along all objective axes [67]. For a single-objective optimization problem, diversity can be calculated using the pseudo-code shown in **Figure 6**, in which the population size is given as *NP* and the measured diversity is stored as a floating-point array *dist* of size *NP*. Note that the diversity value calculated by the pseudo-code of **Figure 6** is normalized and varies between 0 and 1; the higher the diversity value, the more diverse (i.e., more isolated from its neighbors) the model is. Thus, multiplying the raw fitness value of each model by its diversity value prior to fitness scaling prefers more diverse models over less diverse ones and adds an additional control against premature convergence of GA. In *diversity-preserved* GA, diversity therefore acts as an additional regularization parameter for the objective, besides fitness scaling.
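Since the pseudo-code of **Figure 6** is not reproduced in the text, the following is one plausible Python rendering of the single-objective diversity measure described above: the distance of each member to its nearest neighbor along the objective axis, normalized so that the most isolated member gets a value of 1. The function name and the normalization by the maximum distance are my own assumptions:

```python
import numpy as np

def diversity(f_raw):
    """Normalized nearest-neighbour distance of each member along the
    (single) objective axis. Values lie in [0, 1]; a larger value means
    the model is more isolated from its neighbours, i.e., more diverse."""
    f = np.asarray(f_raw, dtype=float)
    n = f.size
    dist = np.empty(n)
    for i in range(n):
        others = np.delete(f, i)            # all members except member i
        dist[i] = np.min(np.abs(f[i] - others))
    dmax = dist.max()
    return dist / dmax if dmax > 0 else np.ones(n)
```

Multiplying raw fitness by this array before fitness scaling implements the diversity preservation step described in the text.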

Having discussed genetic drift and how to avoid clustering and premature convergence via objective-regularization, the next step in GA is *reproduction* or *tournament selection*. In this step, the models are selected in preference to their respective regularized (diversity-preserved and scaled) fitness values. Although many methods are

**Figure 6.**

*The pseudo-code for diversity computation.*

proposed for reproduction, stochastic remainder selection without replacement [2] has been reported to work very well for seismic inversion problems [3, 7]. In this method, the mathematically expected value *Ei* for each member *i* of the population is first calculated as

$$E\_i = N\_P \frac{f\_i'}{\sum\_{j=1}^{N\_P} f\_j'}.\tag{13}$$

In Eq. (13), *NP* is the population size (number of models in the population) and *f′i* is the diversity-preserved scaled fitness of the *i*th member in the population. After calculating *Ei* for all members, the integer part of *Ei* is used as the number of copies of member *i* to be selected into a new population pool of the same size *NP*. For example, if for a given member *j*, *Ej* = 3.67, then three copies of the *j*th member would be selected into the new population pool. Repeating this process for all members *j*, *j* = 1, 2, …, *NP*, partially fills the new population pool, in which some of the original members are present once, more than once, or not at all, depending upon whether the integer parts of their mathematically expected values are 1, greater than 1, or less than 1. Next, the rest of the new population is filled up as follows:

	- a. A member *j* is randomly selected. Using the same example again, if *Ej* = 3.67, then a coin is tossed with the probability of heads set to 0.67, the fractional part of *Ej*.
		- i. If the outcome of the coin-toss is heads, then one copy of member *j* is selected, and its fractional part is set to zero so that it receives no further copies (selection *without replacement*).
		- ii. If the outcome is tails, no copy is selected. The random selection is repeated until the new population pool is filled to size *NP*.
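The stochastic remainder selection just described can be sketched as follows. This is a hedged Python illustration of Eq. (13) plus the coin-toss fill-up; the function name and the exact handling of the random fill-up are my own assumptions rather than the chapter's implementation:

```python
import numpy as np

def stochastic_remainder_selection(f_scaled, rng=None):
    """Stochastic remainder selection without replacement. f_scaled holds
    the diversity-preserved scaled fitness values; returns the indices of
    the N_P members selected into the new population pool."""
    rng = np.random.default_rng() if rng is None else rng
    f = np.asarray(f_scaled, dtype=float)
    n = f.size
    expected = n * f / f.sum()              # E_i of Eq. (13)
    frac = expected - np.floor(expected)    # fractional parts
    pool = []
    for i in range(n):                      # deterministic integer part
        pool.extend([i] * int(expected[i]))
    while len(pool) < n:                    # coin-toss fill-up
        i = int(rng.integers(n))            # pick a member at random
        if rng.random() < frac[i]:          # heads with probability frac[i]
            pool.append(i)
            frac[i] = 0.0                   # without replacement
    return pool[:n]
```

Because the sum of the expected values equals *NP*, the deterministic copies plus the Bernoulli fill-up always produce a pool of exactly *NP* members.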

Reproduction generates a new population from the original (old) population, but the new population is simply a fitness- and diversity-preferred copy of the old population. To explore the model-space and create a new generation of models from the old generation, GA uses crossover, mutation, and elitism (update). Typically, crossover and mutation are combined into a single process, which is then followed by elitism to advance to the new generation. In the following, crossover, mutation, and elitism are explained using binary coding. Their extension for real-coded GA can be found in [66].

**Figure 7** illustrates the binary-coded crossover. First, two members from the reproduced population are randomly selected and treated as parents. In **Figure 7**, they are called *Parent-*1 and *Parent*-2. Next, a crossover site within each parameter space (*VP*, *VS*/*ν*, *ρ*) for each layer is randomly selected (dashed purple lines in **Figure 7**). Finally, the bits (genes) on the left-hand side of the crossover sites of each parameter are swapped between the parents to produce two children, denoted as *Child*-1 and *Child*-2 in **Figure 7**. Crossover is performed with a crossover probability *PC*. For each layer and each parameter, a coin is tossed with chances of heads set to *PC*. If the

*Optimization Using Genetic Algorithms – Methodology with Examples from Seismic… DOI: http://dx.doi.org/10.5772/intechopen.113897*

**Figure 7.** *Illustration of crossover.*

outcome of the coin-toss is heads, then the crossover for the given layer and model parameter is performed; else the bits of the parents for that parameter are simply copied into the children (i.e., *Parent*-1 is copied into *Child*-1, and *Parent*-2 is copied into *Child*-2). Thus, crossover produces two new (child) members from the two original (parent) members. For seismic inversions, setting *PC* = 0.8 at early generations and slowly reducing it to about 0.65 at the end produces good results [7, 8]. In mutation, each bit (gene) in both child members (*Child*-1 and *Child*-2) is sequentially visited and a coin is tossed with the chances of heads set to the probability of mutation *PM*. If the outcome of the coin-toss is heads, then the bit is changed: if it is 0 it is changed to 1, and if it is 1 it is changed to 0. A starting value of 0.15, reduced to about 0.01 at the end, is a good choice for *PM* [7, 8].
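The binary-coded crossover and mutation steps can be sketched as follows, assuming each model parameter of each layer is held as a bit-string (a minimal sketch; the function name, data layout, and left-side swap convention follow my reading of **Figure 7**, not the author's code):

```python
import random

def crossover_mutate(parent1, parent2, pc=0.8, pm=0.05, rng=None):
    """Single-point binary crossover per parameter, followed by bit-flip
    mutation. parent1/parent2 are lists of bit-strings, one per model
    parameter of a layer (e.g., VP, nu, rho)."""
    rng = rng or random.Random()
    child1, child2 = [], []
    for g1, g2 in zip(parent1, parent2):
        if rng.random() < pc:                 # crossover this parameter?
            site = rng.randrange(1, len(g1))  # random crossover site
            # swap the bits to the LEFT of the crossover site
            g1, g2 = g2[:site] + g1[site:], g1[:site] + g2[site:]
        child1.append(g1)
        child2.append(g2)
    # bit-flip mutation: each bit flips with probability pm
    flip = lambda g: "".join(b if rng.random() >= pm else "10"[int(b)]
                             for b in g)
    return [flip(g) for g in child1], [flip(g) for g in child2]
```

Setting `pc=0` and `pm=0` returns the parents unchanged, while `pm=1` flips every bit, which makes the two probability controls easy to verify.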

After producing each pair of children from their parents via crossover and mutation, the objectives for the children are computed. Next, in elitism, the objectives of the children and their corresponding parents are compared, and the two members with the highest objective values are allowed to advance to the next generation. Like crossover and mutation, elitism is performed with a probability of elitism *PE*; a value of 0.5 at early generations, increasing to about 0.9 at the end, is a good choice of *PE* for seismic inversion problems [7, 8]. Crossover, mutation, and elitism produce two members of the next generation from two members of the old generation, and performing *N* such operations for a population size *NP* = 2*N* thus produces *NP* members of the new (next) generation from the *NP* members of the old (previous) generation. Once the new generation of models is produced, it is advanced to another generation via reproduction, crossover, mutation, and elitism, and the process continues until a prespecified maximum number of generations *Gmax* is reached or some other stopping criterion (see below for details) is satisfied.
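The elitism step just described can be sketched as follows (a hedged illustration; the function name and the interpretation that "with probability *PE* the best two of the parent-child quartet advance, otherwise the children advance" are my own reading of the text):

```python
import random

def elitism(parents, children, obj, pe=0.9, rng=None):
    """Elitism step: with probability pe, the two highest-objective members
    of the parent+child quartet advance to the next generation; otherwise
    the children advance unconditionally. obj maps a member to its raw
    fitness (higher is better)."""
    rng = rng or random.Random()
    if rng.random() < pe:
        quartet = sorted(parents + children, key=obj, reverse=True)
        return quartet[:2]                  # best two of the four members
    return children                         # pure crossover/mutation output
```

With `pe` near 1 at late generations, good parents are rarely lost; with `pe` near 0.5 early on, the population keeps exploring.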

### **3. Practical implementation of GA**

GA optimization is computationally challenging. However, the entire methodology can be parallelized in high-performance parallel computing environments, which, in turn, can lead to a compute-efficient nonlinear global seismic inversion method. Considering that a global method in place of a local, gradient-based method for seismic inversion can, in principle, allow estimating a reliable subsurface depth model of elastic properties even when there is no prior (well) information, implementing an efficient global method like GA to solve seismic inverse problems would be a big leap forward, not only for oil and gas exploration but also in solid-earth geophysics, global seismology, and emerging fields like Carbon Capture, Utilization, and Storage (CCUS), Underground Hydrogen Storage in Porous media (UHSP), and characterization of Enhanced Geothermal Systems (EGS). To implement GA in practice, however, a few important aspects must be addressed. Above, GA is described as a process where an initial population of *NP* models, randomly generated within a user-specified model-space (**Figure 4**), is propagated via reproduction, crossover, mutation, and elitism up to a specified number of generations *Gmax*. While these are the fundamental GA steps, implementing just them, irrespective of how large *NP* and *Gmax* are, may possibly work on synthetic data but is most likely to fail when applied to real seismic data. To develop a GA that works on a wide variety of real data, the fundamental issue that must be addressed is that real data are always noisy, and this noise must be efficiently handled. Any model-based optimization, whether local or global, is an iterative process that must have ways to stop iterating and exit so that the method does not suffer unnecessary computational burden. Thus, after each iteration, these algorithms check whether the method has converged to a reasonable solution. If not, the iteration continues; else the method reports solutions and exits (**Figure 1**).
In dealing with noisy real data, however, deciding whether to continue iterating or to stop is not one simple step as shown in **Figure 1**. Noise levels in real seismic data vary widely, controlled by geological factors, environmental factors, and many other factors directly related to how the data were acquired in the field and later processed in a computational facility. Even within a single area, there are often different noise levels in different parts. To handle noisy data, the stopping criteria for GA optimization must be implemented such that there are convergence checks at different points within the algorithm, so that data that are less noisy may converge early and exit while more iterations are allowed for noisier data. Such a multipoint convergence check, if correctly implemented, not only allows additional computations when needed but also avoids unnecessary computations on data that are relatively less noisy.

Considering the above issues, **Figure 8** shows the workflow for prestack waveform inversion (PWI) using GA optimization. The workflow is complex; its different parts or modules are shown in different colors and outlined by dashed boxes, also of different colors. The first module (Module-1) is color-coded in light green and outlined by a red dashed box; it comprises the basic GA optimization steps. The second module (Module-2), color-coded in peach and outlined by a gray dashed box, is the first convergence checkpoint. The third module (Module-3), in light blue within a purple dashed box, is the second convergence checkpoint. Finally, the fourth module (Module-4), coded in yellow within a green box, is the output module where the method reports solutions and exits. The main user-defined controlling parameters are (1) *NP*, the population size; (2) *Gmax*, the number of generations; (3) *Rmax*, the maximum number of repeats; (4) *Niter*, the maximum number of iterations; and (5) *Cmin*, the minimum correlation, i.e., the minimum objective (raw fitness) value to be achieved in the optimization process.


**Figure 8.**

*Methodology for GA optimization with parallel implementation and multiple convergence checkpoints to solve seismic inverse problems.*

At the beginning of Module-1 (the module within the red dashed box), an initial random population of *NP* models is generated and distributed to different processors. Each processor computes the synthetic data and objectives (raw fitness values) for the subset of the model population it receives. The master node then collects results from all processors, computes diversity, and performs fitness scaling and reproduction (tournament selection). Next, the selected models are again distributed to different processors, each of which performs crossover, mutation, and elitism and produces new members of the next generation from a subset of the entire population. The size of the population subset each processor receives and uses to produce new members is decided by the number of processors being used. For example, if *NP* = 100 and 50 processors are used, then each processor receives two members from the selected population as parents, produces two child members via crossover and mutation, and then two new members of the new generation via elitism. When the number of processors is less than 50, each receives more than two parents and thus produces more than two new next-generation members. If more than 50 processors are assigned, then only the first 50 would do the operation and the rest would sit idle; thus, for a population size of *NP*, the maximum number of processors to be used by the method is *NP*/2, else the additional processors would not be used. In addition, *NP* should be an even number so that *NP*/2 is an integer. After generating new members within each processor, the master node receives them all and checks whether the method has progressed to *Gmax* generations. If not, the master node performs tournament selection using the new models and distributes the selected models to different processors for creating the models of the next generation.
However, if *Gmax* is reached, the methodology goes into Module-2. In this module, whether the best objective value achieved so far is at least 0.9 *Cmin* is first checked. If yes, the method goes straight into Module-4 (yellow) to report solutions and exit. Otherwise, the generation number is reset to 1 and the current model population is sent back to the green module (Module-1) to continue with another set of GA optimization for *Gmax* generations; this process is repeated for a maximum of *Rmax* times, with the chance of reporting solutions and exiting if 0.9 *Cmin* is reached at any stage. After such *Rmax* repeats, the method goes to Module-3. In this module, the convergence criterion of reaching an objective of at least 0.9 *Cmin* is first checked; if it is not met, a new initial model is set as the maximum-likelihood model achieved at this point, a fresh set of random models is generated within the model-space, and the method is sent to the top of Module-1 to start a fresh set of GA optimizations with *Rmax* repeats. Module-3 is repeated for a maximum of *Niter* times, with chances of being sent to Module-4 to report solutions and exit at multiple points (see **Figure 8**).
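The multipoint convergence logic of Modules 1 through 4 can be summarized in a short control-flow sketch. The callables `run_ga` (one Module-1 pass of *Gmax* generations) and `restart` (a fresh random population around the current best model) are hypothetical stand-ins for the parallel modules; this is a schematic reading of **Figure 8**, not the author's code:

```python
def ga_with_checkpoints(run_ga, restart, c_min, r_max, n_iter):
    """Schematic of the multipoint convergence checks of Figure 8.
    run_ga(pop) performs one Module-1 pass and returns
    (best_objective, new_population); restart(pop) builds a fresh random
    population around the current maximum-likelihood model."""
    pop, best = restart(None), float("-inf")
    for _ in range(n_iter):                  # Module-3: fresh-start loop
        for _ in range(r_max):               # Module-2: repeat loop
            best, pop = run_ga(pop)
            if best >= 0.9 * c_min:          # convergence checkpoint
                return best, pop             # Module-4: report and exit
        pop = restart(pop)                   # new initial model, new randoms
    return best, pop                         # iteration budget exhausted
```

The early-return at each checkpoint is what lets less noisy data exit after few passes while noisier data use the full *Rmax* × *Niter* budget.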

### **4. Examples**

Here, different PWI runs using GA optimization on a single real prestack seismic dataset are shown. The input seismic data, the initial model (generated from velocity analysis and NMO correction and then established *VP*-*VS* and *VP*-*ρ* relations), and the search windows used to define the model-space are shown in **Figures 2** and **3**. In all examples, GA with diversity preservation and linear fitness scaling was used. The fitness scaling constant (*SC*) and the probabilities of crossover, mutation, and elitism (*PC*, *PM*, *PE*) were all set to their recommended values as discussed earlier. Also, the models were parameterized as P-wave velocity (*VP*), Poisson's ratio (*ν*), and density (*ρ*).

Using the workflow of **Figure 8**, the first two examples were run using *NP* = 80, *Gmax* = 400, *Rmax* = 5, *Niter* = 7, and *Cmin* = 2. Note that the value of *Cmin* cannot exceed 1; in these two runs, it was deliberately set to 2 to ensure that the methodology goes through all seven iterations (*Niter*), each with five repeats (*Rmax*) of GA optimization using a population size (*NP*) of 80 with 400 generations (*Gmax*). In both examples, the search windows used to define the model-space were ±10% for *VP*, −70% to +25% for *ν*, and ±15% for *ρ* around their initial values, which are shown as dashed red curves in **Figure 4**. Using 40 Intel Sandybridge/Ivybridge processors, runtimes for each were approximately 45 minutes.

In the first example, the initial random models were generated using the method shown in **Figure 4**, where *VP*, *ν*, and *ρ* for each layer were allowed to vary randomly within their respective model-space search limits; the inversion result, along with the initial model and the true (well-log) model, is shown in **Figure 9**. The inverted model shown in red in this figure is not the raw inverted model but a smoothed version of it, computed by taking a moving average of five samples (layers) across the entire depth range. The reason for showing a smoothed model instead of the actual model is that for unconstrained GA inversion, where well-logs are not used at all, it is necessary to run multiple inversion passes. For the first pass, the initial model, generated from velocity analysis and the *VP*-*VS* and *VP*-*ρ* relations, is discretized at 0.5*λ* resolution, where *λ* is the dominant wavelength as discussed earlier. In successive passes, a smoothed model from the previous pass is discretized at finer resolutions (0.25*λ*, 0.15*λ*, etc.) and used as the initial model. If the search window required for any given inversion pass is narrower than the one needed for the previous pass, then it can be inferred that the multi-pass inversion is moving toward convergence. Now, comparing the (smoothed) inverted model with the true (well-log) model shown in **Figure 9**, it can be readily verified that *VP* is estimated reasonably well. However, the

#### **Figure 9.**

*PWI results using the diversity preserved GA optimization and random sampling method for initial model generation shown in Figure 4. (a) P-wave velocity, (b) S-wave velocity, (c) density, and (d) Poisson's ratio.*

estimates of *VS* (or *ν*) and *ρ* are not as good, especially for depths greater than 3 km. Thus, when the initial random models are created using the sampling method of **Figure 4**, further refinement of the estimated inverted model is possible by making successive GA runs, with the smoothed inverted model from the previous run set as the initial model for the next run with new search windows. Comparing the true (well-log) model with the initial model and the smoothed inverted model shown in **Figure 9**, it can be readily seen that if the smoothed inverted model (the red curve in **Figure 9**) were used as the initial model for another inversion pass, a much narrower search window than in the first pass would be needed for GA to encompass all variations of the true (well-log) model, and running a few such inversion passes would eventually find a model close to the true model. However, running such multiple inversion passes is not only compute-intensive and cumbersome but also impractical. Although well-logs were not used here to define the initial model, the results could still be compared with the well data, and the inversion could be stopped when the estimated model is sufficiently close to the true well-log model. In the absence of well data, however, it is difficult to decide when the estimated model is sufficiently close to the true model so that the multiple inversion passes can be stopped. Thus, although the inversion result of **Figure 9** indicates that, starting from a faraway initial model and using a wide model-space, GA optimization would in theory find the true model via successive inversion runs, the method is still difficult to apply in practice.

In contrast with the method for generating initial models shown in **Figure 4**, **Figure 10** shows a new way to generate them. Like **Figure 4**, **Figure 10** also shows ten random *VP* models generated between the minimum and maximum search bounds, with model-1 in a different color from models 2–10. Note that the random *VP* models of **Figure 10** span the entire search limit like the ones shown in **Figure 4**. These newly generated random models, however, vary vertically but are much smoother laterally than the ones in **Figure 4**. Comparing the random models of **Figures 4** and **10**, the former could be regarded as *laterally sampled* and the latter as *vertically sampled* random models.

**Figure 11** is the PWI result with GA optimization using the same parameters as the ones used in **Figure 9**, except the initial set of random models were generated via

#### **Figure 10.**

*A new strategy of generating initial random models, demonstrated using VP. Vertically varying and laterally smooth models are generated randomly between the minimum and maximum search limits. Like Figure 4, the search limits, well-log and initial VP models, and model-1 in a different color than models 2–10 are shown.*

vertical sampling method of **Figure 10**. Note that unlike **Figure 9**, the smoothed inverted model of **Figure 11** is sufficiently close to the true model, and just a single inversion pass using this smoothed inverted model as the initial model should find the true model.

To verify whether the smoothed inverted model shown in **Figure 11** would find the true model, another pass of PWI with GA optimization was run using the model shown in red in **Figure 11** as the initial model and discretizing it at 0.25*λ* resolution. Search windows for *VP* and *ν* were set to ±5% and that for *ρ* to ±2%, and random models of population size *NP* = 40 were generated using the vertical sampling procedure of **Figure 10**. Other parameters used for this inversion were *Gmax* = 200, *Rmax* = 3, *Niter* = 3, and *Cmin* = 1.0. Like the last two inversions, a diversity-preserved GA optimization with linear

#### **Figure 11.**

*Same as Figure 9 except when the initial set of random models were generated using the vertical sampling method of Figure 10.*


#### **Figure 12.**

*Pass-2 inversion, using the smoothed inverted model of Figure 11 as the initial model. The original initial model is also shown in blue.*

fitness scaling was used with the same parameters as before. Using 20 Intel Sandybridge/Ivybridge processors, the runtime for this inversion was 10 minutes, and the results are shown in **Figure 12**.

Unlike in **Figures 9** and **11**, the inverted model in **Figure 12** is the actual model estimated from inversion, not a smoothed version of it. This model is almost identical to the true (well-log) model, indicating that the vertical sampling strategy for generating the initial set of random models is superior to lateral sampling and is the practical way to use GA optimization to solve seismic inverse problems when the initial model is far away and the model-space is large.

### **5. Discussion**

Vertical sampling (**Figure 10**) instead of lateral sampling (**Figure 4**) is a new concept in GA optimization for PWI. By starting from an initial model obtained directly from the data and well-known *VP*-*VS* and *VP*-*ρ* relations, using a large model-space defined as wide search windows for each model parameter, and generating a vertically sampled initial set of random models (**Figure 10**), this newly proposed two-pass GA optimization can find the true (well-log) model with very good accuracy. Comparing lateral sampling (**Figure 4**) with the new vertical sampling (**Figure 10**) may suggest that the former encompasses the model-space more uniformly than the latter. This is, however, untrue. In **Figures 4** and **10** only ten random models are shown; when sufficiently large numbers of models are generated, both methods sample the model-space equally well.

**Figure 13** compares a single random model of *VP*, *ρ*, and *ν* generated from both the standard (lateral) and newly proposed (vertical) sampling methods, along with the initial model and the search limits. There is a fundamental difference between how the models are generated in these two methods and how they behave within the model-space. In lateral sampling, values for each parameter and for each layer are independently generated between their respective search bounds. Thus, *VP*, *ρ*, and *ν* for each layer vary widely between their specified search limits (light green curves in

#### **Figure 13.**

*Comparison between the lateral (light green) and vertical (magenta) sampling using a single random model. The search limits (dashed red) and the initial model (black) are also shown. In addition, the dashed line in cyan represents the top of the inversion window. (a)* **VP***, (b)* **ρ***, and (c)* **ν***.*

**Figure 13**). Vertical sampling, shown as magenta curves in **Figure 13**, on the other hand, uses a single random number for each model. A random number between 0 and 1 is generated and combined with the minimum and maximum search limits for each model parameter at each depth point or layer to define the model. For example, consider two specific layers in **Figure 13** at 1 and 2 km depths. The minimum and maximum limits for *VP* at 1 km are approximately 3.3 and 4.2 km/s, and those at 2 km are about 3.5 and 4.5 km/s. So, if the random number generated for a given model is 0.35, then *VP* at 1 km for that model would be set to 3.3 + 0.35 × (4.2 − 3.3) = 3.615 km/s. Similarly, *VP* at 2 km for the same model would be calculated as 3.5 + 0.35 × (4.5 − 3.5) = 3.85 km/s. Values for *ρ* and *ν* are computed in a similar fashion. Note that because the initial Poisson's ratio is constant, the random Poisson's ratio models generated by vertical sampling are also constant. Because a single random number defines one model, vertically sampled random models are much smoother than laterally sampled ones. In addition, these vertically sampled random models follow the initial *VP*, *VS* (or *ν*), and *ρ* model trends. In the examples shown here, the initial *VP* is derived from the velocity analysis and NMO correction of prestack seismic data, and the initial *VS* (*ν*) and *ρ* were generated from this initial *VP* and established *VP*-*VS* and *VP*-*ρ* relations, which, in a broad sense, are representative of the local geology following the regional compaction trend. Even when well-logs are used as the initial model and the initial random models are then generated by vertical sampling, they would still be representative of the geology. Thus, the vertically sampled random models can also be regarded as *geologically constrained* random models. Although random, their smooth and geologically constrained behavior is the reason for their ability to better sample the model-space. Starting from combinations of *VP*, *VS* (or *ν*), and *ρ* models consistent with the local geology, they tend to converge faster than those generated by lateral sampling. Finally, the dashed cyan line across all panels of **Figure 13** marks the top of the inversion window. All model parameters above this line were regarded as overburden and kept unchanged.
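The vertical sampling rule just described, one random number per model applied at every depth point between the depth-dependent search limits, can be sketched as follows (a minimal NumPy illustration; the function name is hypothetical, and the random numbers are passed in explicitly so the worked example from the text can be reproduced):

```python
import numpy as np

def vertical_sample(v_min, v_max, r):
    """'Vertical' sampling: r holds one random number in [0, 1] per model;
    each number is applied at every depth point between the depth-dependent
    search limits, so the model follows the initial-model trend and varies
    smoothly. Returns an array of shape (n_models, n_depth)."""
    v_min = np.asarray(v_min, dtype=float)
    v_max = np.asarray(v_max, dtype=float)
    r = np.asarray(r, dtype=float)
    return v_min + r[:, None] * (v_max - v_min)

# Worked example from the text: VP limits of 3.3-4.2 km/s at 1 km depth
# and 3.5-4.5 km/s at 2 km depth, with random number 0.35 for one model:
vp = vertical_sample([3.3, 3.5], [4.2, 4.5], [0.35])
# vp -> [[3.615, 3.85]]  (km/s)
```

Lateral sampling would instead draw an independent random number for every depth point, which is what produces the jagged light-green curves of **Figure 13**.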


The discussion here is restricted to the application of GA to solve the seismic inverse problem at a single location using the parallel implementation outlined in **Figure 8**. This single-location procedure is extendable to multiple locations using multi-level parallelization, the concept of which is to use multiple sets of the **Figure 8** workflow to simultaneously invert different regions of the seismic data. Details of such multi-level parallelization are described elsewhere [13, 16] and are therefore not repeated.

### **6. Conclusions**

By introducing a new sampling strategy for generating initial random models, a GA optimization methodology is presented here. By avoiding genetic drift via diversity preservation and fitness scaling, the proposed method has been shown to work very well for large-sized model-spaces in solving seismic waveform inversion problems. The proposed method shown for a single location can be easily extended to multiple locations using a multi-level parallelization approach.

## **Author details**

Subhashis Mallick University of Wyoming, Laramie, WY, USA

\*Address all correspondence to: smallick@uwyo.edu

© 2023 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

## **References**

[1] Holland JH. Adaptation in Natural and Artificial System. Ann Arbor, MI, USA: University of Michigan Press; 1975

[2] Goldberg DE. Genetic Algorithms in Search, Optimization and Machine Learning. Boston, MA, USA: Addison Wesley Publishing Company; 1989

[3] Stoffa PL, Sen MK. Nonlinear multiparameter optimization using genetic algorithms: Inversion of planewave seismograms. Geophysics. 1991;**56**: 1794-1810

[4] Sen MK, Stoffa PL. Rapid sampling of model space using genetic algorithms: Examples from seismic waveform inversion. Geophysical Journal International. 1992;**108**:281-292

[5] Sen MK, Stoffa PL. Global Optimization Methods in Geophysical Inversion. Amsterdam, Netherlands: Elsevier Science Publications; 1995

[6] Sen MK, Stoffa PL. Bayesian inference, Gibbs's sampler and uncertainty estimation in geophysical inversion. Geophysical Prospecting. 1996;**44**:313-350

[7] Mallick S. Model-based inversion of amplitude-variations-with-offset data using a genetic algorithm. Geophysics. 1995;**60**:939-954

[8] Mallick S. Case History: Some practical aspects of prestack waveform inversion using a genetic algorithm: An example from the east Texas Woodbine gas sand. Geophysics. 1999;**64**:326-336

[9] Du Z, MacGregor LM. Reservoir characterization from joint inversion of marine CSEM and seismic AVA data using Genetic Algorithms: a case study based on the Luva gas field. SEG

Technical Program Expanded Abstracts. 2010;**80**:737-741

[10] Padhi A, Mallick S. Accurate estimation of density from the inversion of multicomponent prestack seismic waveform data using a non-dominated sorting genetic algorithm. The Leading Edge. 2013;**32**:94-98

[11] Padhi A, Mallick S. Multicomponent prestack seismic waveform inversion in transversely isotropic media using a nondominated sorting genetic algorithm. Geophysical Journal International. 2014; **196**:1600-1618

[12] Li T, Mallick S. Multicomponent, multi-azimuth pre-stack seismic waveform inversion for azimuthally anisotropic media using a parallel and computationally efficient non-dominated sorting genetic algorithm. Geophysical Journal International. 2015;**200**:1136-1154

[13] Mallick S, Adhikari S. Amplitudevariation-with-offset and prestack waveform inversion: A direct comparison using a real data example from the Rock Springs Uplift, Wyoming, USA. Geophysics. 2015;**80**(2):B45-B59

[14] Li T, Mallick S, Tamimi N, Davis T. Inversion of wide-azimuth multicomponent vertical seismic profile data for anisotropic subsurface properties. SEG Technical Program Expanded Abstracts. 2016;**2016**:1252-1257

[15] Mazzotti A, Bienati N, Stucchi E, Tognarelli A, Aleardi M, Sajeva A. Twogrid genetic algorithm full-waveform inversion. The Leading Edge. 2016;**35**: 1068-1075

[16] Pafeng J, Mallick S, Sharma H. Prestack waveform inversion of three-dimensional seismic data – An example from the Rock Springs Uplift, Wyoming, USA. Geophysics. 2017;**82**(1):B1-B12

[17] Ayani M, MacGregor L, Mallick S. Inversion of marine controlled source electromagnetic data using a parallel non-dominated sorting genetic algorithm. Geophysical Journal International. 2020;**220**:1066-1077

[18] Kozlovskaya E, Vecsey L, Plomerova J, Raita T. Joint inversion of multiple data types with the use of multiobjective optimization: problem formulation and application to the seismic anisotropy investigations. Geophysical Journal International. 2007; **171**:761-779

[19] Singh VP, Duquet B, Leger M, Schoenauer M. Automatic wave equation migration velocity inversion using multiobjective evolutionary algorithms. Geophysics. 2008;**73**:VE61-VE73

[20] Heyburn R, Fox B. Multi-objective analysis of body and surface waves from the Market Rasen (UK) earthquake. Geophysical Journal International. 2010; **181**:532-544

[21] Oldenburg DW, Scheuer T, Levy S. Recovery of acoustic impedance from reflection seismograms. Geophysics. 1983;**48**:1318-1337

[22] Oldenburg DW, Levy S, Stinson K. Root-means-square velocities and recovery of the acoustic impedance. Geophysics. 1984;**49**:1653-1663

[23] Connolly P. Elastic impedance. The Leading Edge. 1999;**18**:438-452

[24] Hampson DP, Russell BH, Bankhead B. Simultaneous inversion of pre-stack seismic data. SEG Technical Program Expanded Abstracts. 2005;**75**: 1633-1637

[25] Mallick S. Amplitude-variation-withoffset, elastic impedance, and waveequation synthetics – A modeling study. Geophysics. 2007;**72**:C1-C7

[26] Pratt RG. Seismic waveform inversion in the frequency domain. Part 1: Theory and verification in a physical scale model. Geophysics. 1999;**64**: 888-901

[27] Vigh D, Starr EW. 3D prestack plane-wave, full-waveform inversion. Geophysics. 2008;**73**:VE135-VE144

[28] Plessix R-É. Three-dimensional frequency-domain full-waveform inversion with an iterative solver. Geophysics. 2009;**74**:WCC149-WCC157

[29] Lee KH, Kim HJ. Source-independent full-waveform inversion of seismic data. Geophysics. 2003;**68**:2010-2015

[30] Liu F, Guasch L, Morton SA, Warner M, Umpleby A, Meng Z, et al. 3-D time-domain full waveform inversion of a Valhall OBC dataset. SEG Technical Program Expanded Abstracts. 2012;**82**:1-5. DOI: 10.1190/segam2012-1105.1

[31] Guasch L, Warner M, Nangoo T, Morgan J, Umpleby A, Stekl I, et al. Elastic 3D full-waveform inversion. SEG Technical Program Expanded Abstracts. 2012;**82**:1-7. DOI: 10.1190/segam2012-1239.1

[32] Guitton A, Ayeni G, Diaz E. Constrained full-waveform inversion by model reparameterization. Geophysics. 2012;**77**:R117-R217

[33] Warner M, Ratcliffe A, Nangoo T, Morgan J, Umpleby A, Shah N, et al. Anisotropic 3D full-waveform inversion. Geophysics. 2013;**78**:R59-R80

[34] Bansal R, Krebs J, Routh P, Lee S, Anderson J, Baumstein A, et al. Simultaneous-source full-wavefield inversion. The Leading Edge. 2013;**32**:1100-1108

[35] Xue Z, Zhu H, Fomel S. Full-waveform inversion using seislet regularization. Geophysics. 2017;**82**:A43-A49

[36] Huang G, Nammour R, Symes W. Full-waveform inversion via source-receiver extension. Geophysics. 2017;**82**:R153-R171

[37] Biswas R, Sen MK. 2D full waveform inversion and uncertainty estimation using the reversible jump Hamiltonian Monte Carlo. SEG Technical Program Expanded Abstracts. 2017;**87**:1280-1285

[38] da Silva NV, Yao G, Warner M. Semiglobal viscoacoustic full-waveform inversion. Geophysics. 2019;**84**:R271-R293

[39] Zhang Z-d, Alkhalifah T. Local-crosscorrelation elastic full-waveform inversion. Geophysics. 2019;**84**:R897-R908

[40] Sen MK, Roy IG. Computation of differential seismograms and iteration adaptive regularization in prestack waveform inversion. Geophysics. 2003;**68**:2026-2039

[41] Padhi A, Mallick S, Fortin W, Holbrook WS, Blacic T. 2-D ocean temperature and salinity images from pre-stack seismic waveform inversion methods: An example from the South China Sea. Geophysical Journal International. 2015;**202**:800-810

[42] Kennett BLN. Seismic Wave Propagation in Stratified Media. Cambridge, United Kingdom: Cambridge University Press; 1983

[43] Kennett BLN, Kerry NJ. Seismic waves in a stratified half space. Geophysical Journal of the Royal Astronomical Society. 1979;**57**:557-583

[44] Yilmaz O. Seismic Data Processing. Tulsa, OK, USA: Society of Exploration Geophysicists; 1987

[45] Castagna JP, Batzle ML, Eastwood RL. Relationships between compressional-wave and shear-wave velocities in clastic silicate rocks. Geophysics. 1985;**50**:571-581

[46] Hilterman F. Is AVO the seismic signature of lithology? A case study of Ship Shoal-South Addition. The Leading Edge. 1990;**10**:39-42

[47] Gardner GHF, Gardner LW, Gregory AR. Formation velocity and density – The diagnostic basics for stratigraphic traps. Geophysics. 1974;**39**: 770-780

[48] Todd CP, Backus MM. Offset-dependent reflectivity in a structural context. SEG Technical Program Expanded Abstracts. 1985;**55**:586-588

[49] Resnick JR. Seismic data processing for AVO and AVA analysis. In: Castagna JP, Backus MM, editors. Offset Dependent Reflectivity—Theory and Practice for AVO and AVA Analysis. Tulsa, OK, USA: Society of Exploration Geophysicists; 1993. pp. 175-189

[50] Mukhopadhyay PK, Mallick S. An accurate ray-based offset-to-angle transform from normal-moveout uncorrected multicomponent data in a transversely isotropic medium with vertical symmetry axis. Geophysics. 2011;**76**:C41-C51

[51] Shaw R, Srivastava S. Particle swarm optimization: A new tool to invert geophysical data. Geophysics. 2007;**72**:F75-F83

[52] Tronicke J, Paasche H, Böniger U. Crosshole traveltime tomography using particle swarm optimization: A near-surface field example. Geophysics. 2012;**77**:R19-R32

[53] Mallick S. A simple approximation to the P-wave reflection coefficient and its implication in the inversion of amplitude variation with offset data. Geophysics. 1993;**58**:544-552

[54] Koefoed O. On the effect of Poisson's ratios of rock strata on the reflection coefficients of plane waves. Geophysical Prospecting. 1955;**3**:381-387

[55] Koefoed O. Reflection and transmission coefficients for plane longitudinal incident waves. Geophysical Prospecting. 1962;**10**:304-351

[56] Bortfeld R. Approximations to the reflection and transmission coefficients of plane longitudinal and transverse waves. Geophysical Prospecting. 1961;**9**:485-502

[57] Richards PG, Frasier CW. Scattering of elastic waves from depth-dependent inhomogeneities. Geophysics. 1976;**41**: 441-458

[58] Shuey RT. A simplification of the Zoeppritz equations. Geophysics. 1985;**50**:609-614

[59] Aki K, Richards PG. Quantitative Seismology. Herndon, VA, USA: University Science Books; 2002

[60] Sen MK, Stoffa PL. Nonlinear one-dimensional seismic waveform inversion using simulated annealing. Geophysics. 1991;**56**:1624-1638

[61] Sen MK, Biswas R. Transdimensional seismic inversion using the reversible jump Hamiltonian Monte Carlo algorithm. Geophysics. 2017;**82**:R119-R134

[62] Zhu D, Gibson R. Seismic inversion and uncertainty quantification using transdimensional Markov chain Monte Carlo method. Geophysics. 2018;**83**: R321-R334

[63] Jian W, Lei Z, Hao C, Xiu-ming W. Trans-dimensional Bayesian inversion for directional resistivity logging while drilling data. SEG Technical Program Expanded Abstracts. 2018;**88**:849-852

[64] Visser G, Guo P, Saygin E. Bayesian transdimensional seismic full-waveform inversion with a dipping layer parameterization. Geophysics. 2019;**84**: R845-R858

[65] Dhara A, Sen MK, Yang D, Schmedes J, Routh P, Sain R. Facies and reservoir properties estimation by a transdimensional seismic inversion. SEG Technical Program Expanded Abstracts. 2020;**90**:255-259

[66] Deb K, Agrawal S. A niched-penalty approach for constraint handling in genetic algorithms. Vienna: Springer; 1999. DOI: 10.1007/978-3-7091-6384-9_40

[67] Deb K, Pratap A, Agarwal S, Meyarivan T. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation. 2002;**6**(2):182-197
