**3. Applications on astrophysics**

Astrophysics is a field of research very rich in NP-complete problems. Many of actual astrophysicists deal with non-linear systems and unstable conditions. In some cases, the comparative data, or the environment in GA jargon, is an image originated in telescopes or instruments placed in deep space. It is common the need for fit models with multi-spectral 162 Bio-Inspired Computational Algorithms and Their Applications

Deep inside any GA code there is a model of the inverted problem to be solved. This routine works like I don't know what the correct answer is, but I kwon if a candidate to an answer is good or bad. So, the problem to be solved by a GA must have the property that any proposed solution to an instance must be quickly checked for correctness. For one thing, the

To formalize the notion of quick checking, we will say that there is a polynomial-time algorithm that takes as input instance and the solution and decides whether or not it is a solution. If a problem demands a nondeterministic polynomial time to be solved, it is said a NP-problem, as defined by complexity theory researchers. It means that a solution to any search problem can be found and verified in polynomial time by nondeterministic

The most remarkable characteristic of a NP-complete problem is the lack known algorithms to find its solution. In a P-Problem, any given candidate to solution can be verified quickly for its accuracy or validity. On the other hand, the time required to solve a NP-problem using any currently known search algorithm increases exponentially with the size of the problem grows. As a consequence, one of the principal unsolved problems in computer science today is determining whether or not it is possible to solve these problems quickly,

Then, suppose one has a problem *M* to be solved and asks if a GA based program could

1. To write down formally the set of parameters to be found, something like *S*={*p*1, *p*2, *p*3, …, *pn*}, where the *p*i set is a representation of the input parameters. Each *p*<sup>i</sup> must be a single number (float or integer), so the *S* set could be interpreted as a

2. To express the problem as a function of the set of parameters: *M*=*f*(*S*), with *M*={*q*1, *q*2, *q*3, …, *qm*}, where the *q*i set is the representation of the output (desired)

If the *g*(*M*) function can be translated to a writable algorithm, and this algorithm is computable in a finite time, then the *g*(*M*) is a P-problem. If the *f*(*S*) function cannot be translated to a writable algorithm, or this algorithm is computable only with by verifying all

With both answers: the *f*(*S*) function is a NP-problem, and its inverse, *g*(*M*) is a P-problem,

Astrophysics is a field of research very rich in NP-complete problems. Many of actual astrophysicists deal with non-linear systems and unstable conditions. In some cases, the comparative data, or the environment in GA jargon, is an image originated in telescopes or instruments placed in deep space. It is common the need for fit models with multi-spectral

3. Obtain the inverse problem, or the formalities need to compute *S*= *g*(*M*) =*f*-1(*M*).

solution must be concise, with length polynomially bounded by that of the instance.

algorithm.

**2.1 Inverting the problem** 

called the P versus NP problem.

parameters.

solve it. The steps to be followed are:

chromosome and each *p*i as a gene.

then the problem can be solved by a GA.

**3. Applications on astrophysics** 

possibilities in the *S* space, then the *f*(*S*) is a NP-problem.

data, like radio, infrared, visible and gamma-rays. All these solution constraints lead to an incredible variety of possibilities for using GA tools.

In this section, it will be presented how GAs were used to model protoplanetary discs, an application that involves non-linear radiative-density profile relations. The model combines spectral energy distribution, observed in a wide range of the electromagnetic spectrum, and emissivity behaviour of different dust grain species.

Another interesting application is the use of GAs together with and spectral synthesis in the calculation of abundances and metallicities of T Tauri stars. In this problem, the model is outside the GA code, as one of the conditions imposed is to use a standard, well tested, spectral generator. It is presented how to deal with the challenge of changing a ready to use tool into a NP-complete problem and invert it.

#### **3.1 Using GA to model protoplanetary discs**

This subsection is based on the published work *The use of genetic algorithms to model protoplanetary discs* (Hetem & Gregorio-Hetem 2007).

During its formation process, a young star object (YSO) can be surrounded by gas, dust grains and debris, that shall be gravitationally (and also electrostatically) agglomerate in the future solar system bodies. This material receives the energy brought from the star surface and re-irradiates it in other wavelengths. The contribution of this circumstellar matter to the spectral energy distribution (SED) slope is often used to recognize different categories of young YSOs by following an observational classification based on the near-infrared spectral index (Lada & Wilking 1984; Wilking, Lada & Young 1989; André, Ward-Thompson & Barsony 1993). Actually, this classification suggests a scenario for the evolution of YSOs, from Class 0 to Class III, which is well established for TTs.

Here, the adopted model is a flared configuration, according to Dullemond et al. (2001) modelling of a passively irradiated circumstellar disc with an inner hole. We used this model as the P-problem core of a GA based optimization method to estimate the circumstellar parameters.

### **3.1.1 Presenting the problem**

In this subsection we describe the implementation of the GA method for the flared-disc model.

The SED for a given set of parameters is evaluated according to Dullemond et al. (2001) model equations. The disc is composed by three components: the inner rim, the shadowed region, and the flared region with two layers: an illuminated hot layer and an inner cold layer. The disc parameters are: radius, *RD*; mass, *MD*; inclination, *θ*; density power law index, *p*; and inner rim temperature, *T*rim. The stellar parameters are: distance, *d*; mass, *M*; luminosity, *L*; and temperature, *T*.

The model starts by establishing a vertical boundary irradiated directly by the star, which considers the effect arising from shadowing from the rim, and the variations in scale height as a function of the radius. Figure 1 presents the obtained SED for the star AB Aurigae, as presented in Hetem & Gregorio-Hetem (2007).

The Search for Parameters and Solutions:

of the genetic operators to the field

solutions with the smallest *χ*<sup>2</sup>

the schematic view in figure. 2.

by

(1994), Bentley & Corne (2002) and references therein.

Applying Genetic Algorithms on Astronomy and Engineering 165

attributed to a fraction of the number of individuals following the values suggested by Koza

With the genetic operators chosen, the next generation is evaluated by applying specific rules according to the genetic operators. The copy operator uses an elitist selection, as the

a random mix of two distinct individuals' genes is built. The mutation operator copies the original individual, except for one of the genes, which is randomly changed. The process loop continues to build new generations until the end condition is reached, as illustrated by

Fig. 2. Main steps of a generic GA (adapted from Hetem & Gregorio-Hetem 2007).

1

=

*ij*

α

one can evaluate the inverse of the Hessian matrix [*C*] ≡ [

We also can estimate the error bars in the final results by analysing the *χ*2 behaviour as a function of the parameter variation. Then one can determine the confidence levels of a given parameter, as suggested by Press et al. (1995). Once the GA end condition has been reached,

> () () *<sup>N</sup> k k*

 λ

*k i j y y a a* λ

where ∂*y*(*λk*)/∂*ai* is the partial derivative of the SED with respect to parameter *ai* at *λ* = *λk* , and *N* is the number of observed data points. The main diagonal of *C* can be used to estimate the error bars on each parameter by σ*<sup>i</sup>* ≅ *C*1/2/*N*. We estimated the error bars for the 1σ confidence level and the respective disc parameters for AB Aurigae, resulting in *MD* = 0.1 ± 0.004M, *RD* =400±44 AU, θ =65±3o, and *Trim* =1500±26 K, and these results are in agreement with the error-bar estimation provided by the surface contour levels described below. Fig. 3 presents the contour levels of the *gof*(*MD*, *RD*) surface calculated for a set of 400

α

∂ ∂ <sup>=</sup> ∂ ∂ (3)

]-1 whose components are given

*i*: copy, crossover, mutation or termination. Each

*<sup>i</sup>* are copied to the next generation. For the crossover operator,

Φis

Φ

Fig. 1. Results from Dullemond et al. (2001) model applied to the star AB Aurigae. The Synthetic SED is the sum of its components: star emission (continuous thin line); rim emission (dashed line); disc cold layer emission (dot–dashed line); and the disc hot layer emission (dotted line). The observational data in various wavelengths is represented by squares (Hetem & Gregorio-Hetem 2007).

#### **3.1.2 Implementation**

The GA code was designed and built to find the best disk parameters, namely *S*= {*RD*; *MD*; *θ*; *p*; *T*rim, *d*; *M*; *L*; *T*}, as discussed in subsection 2.1. However, some of these parameters are already known: the stellar parameters *d*, *L*; and *T* are adopted from observations and easily found in literature. Essentially, the GA method used implements a *χ*2 minimization of the SED fitting provided by the Dullemond et al. (2001) model. The main structures used to manipulate the data are linked lists containing the solutions (parameter set, adaptation level, *χ*2 *i*, and the genetic operator, Φ*<sup>i</sup>*), expressed by

$$M\_i = \left\{ \left( R\_{\text{Di}}, \theta\_{i\prime}, M\_{\text{Di}\prime}, p\_{i\prime} T\_i \right), \left( \chi\_{i\prime}^2, \Phi\_i \right) \right\} \tag{1}$$

where *Si* denotes the *i*th solution, and *Ti* is the *i*th Trim. Following Goldberg (1989), the code starts with the construction of the first generation, where all parameters are randomly chosen within an allowed range (for example, 50 ≤ RD ≤ 1000 AU). We chose as the number of individuals (parameter sets) in all the generations to be 100. In the following interactions loops, the evaluation function runs the Dullemond et al. (2001) model for each individual, and compares the synthetic SED with the observed data through a *χ*2 measure, using the modified expression (Press et al. 1995):

$$\mathcal{Z}\_i^2 = \frac{1}{N} \sum\_{j}^{N} \left( F\_j - \varphi\_{ij} \right)^2 \tag{2}$$

where *Fj* is the observed flux at wavelength *λj*, *N* is the number of observed data points, and *φi j* is the calculated flux for the solution *Si*. The smallest *χ*2 is assumed to be the *gof*, the goodness-of-fit measure for that generation. The evaluation function is applied to all individuals, and then the judgement procedure sorts the list by increasing *χ*2. It also sets one 164 Bio-Inspired Computational Algorithms and Their Applications

Fig. 1. Results from Dullemond et al. (2001) model applied to the star AB Aurigae. The Synthetic SED is the sum of its components: star emission (continuous thin line); rim emission (dashed line); disc cold layer emission (dot–dashed line); and the disc hot layer emission (dotted line). The observational data in various wavelengths is represented by

The GA code was designed and built to find the best disk parameters, namely *S*= {*RD*; *MD*; *θ*; *p*; *T*rim, *d*; *M*; *L*; *T*}, as discussed in subsection 2.1. However, some of these parameters are already known: the stellar parameters *d*, *L*; and *T* are adopted from observations and easily found in literature. Essentially, the GA method used implements a *χ*2 minimization of the SED fitting provided by the Dullemond et al. (2001) model. The main structures used to manipulate the data are linked lists containing the solutions (parameter set, adaptation level,

where *Si* denotes the *i*th solution, and *Ti* is the *i*th Trim. Following Goldberg (1989), the code starts with the construction of the first generation, where all parameters are randomly chosen within an allowed range (for example, 50 ≤ RD ≤ 1000 AU). We chose as the number of individuals (parameter sets) in all the generations to be 100. In the following interactions loops, the evaluation function runs the Dullemond et al. (2001) model for each individual, and compares the synthetic SED with the observed data through a *χ*2 measure, using the

> ( )<sup>2</sup> <sup>2</sup> <sup>1</sup> *<sup>N</sup> i j ij j F*

where *Fj* is the observed flux at wavelength *λj*, *N* is the number of observed data points, and *φi j* is the calculated flux for the solution *Si*. The smallest *χ*2 is assumed to be the *gof*, the goodness-of-fit measure for that generation. The evaluation function is applied to all individuals, and then the judgement procedure sorts the list by increasing *χ*2. It also sets one

 ϕ

*N*

χ

{( ) ( )} <sup>2</sup> *M R, i Di i Di i i i i* = *θ ,M ,p ,T , χ ,Φ* (1)

= − (2)

squares (Hetem & Gregorio-Hetem 2007).

Φ

*<sup>i</sup>*), expressed by

**3.1.2 Implementation** 

*i*, and the genetic operator,

modified expression (Press et al. 1995):

*χ*2

of the genetic operators to the field Φ*i*: copy, crossover, mutation or termination. Each Φ is attributed to a fraction of the number of individuals following the values suggested by Koza (1994), Bentley & Corne (2002) and references therein.

With the genetic operators chosen, the next generation is evaluated by applying specific rules according to the genetic operators. The copy operator uses an elitist selection, as the solutions with the smallest *χ*<sup>2</sup> *<sup>i</sup>* are copied to the next generation. For the crossover operator, a random mix of two distinct individuals' genes is built. The mutation operator copies the original individual, except for one of the genes, which is randomly changed. The process loop continues to build new generations until the end condition is reached, as illustrated by the schematic view in figure. 2.

Fig. 2. Main steps of a generic GA (adapted from Hetem & Gregorio-Hetem 2007).

We also can estimate the error bars in the final results by analysing the *χ*2 behaviour as a function of the parameter variation. Then one can determine the confidence levels of a given parameter, as suggested by Press et al. (1995). Once the GA end condition has been reached, one can evaluate the inverse of the Hessian matrix [*C*] ≡ [α]-1 whose components are given by

$$\alpha\_{ij} = \sum\_{k=1}^{N} \left( \frac{\partial y(\mathcal{A}\_k)}{\partial a\_i} \frac{\partial y(\mathcal{A}\_k)}{\partial a\_j} \right) \tag{3}$$

where ∂*y*(*λk*)/∂*ai* is the partial derivative of the SED with respect to parameter *ai* at *λ* = *λk* , and *N* is the number of observed data points. The main diagonal of *C* can be used to estimate the error bars on each parameter by σ*<sup>i</sup>* ≅ *C*1/2/*N*. We estimated the error bars for the 1σ confidence level and the respective disc parameters for AB Aurigae, resulting in *MD* = 0.1 ± 0.004M, *RD* =400±44 AU, θ =65±3o, and *Trim* =1500±26 K, and these results are in agreement with the error-bar estimation provided by the surface contour levels described below. Fig. 3 presents the contour levels of the *gof*(*MD*, *RD*) surface calculated for a set of 400

The Search for Parameters and Solutions:

Gregorio-Hetem 2009).

**3.2.1 Artificial spectra as a measurement tool** 

the elements in study should be;

a general comparison index;

stellar atmospheres.

example of a high-resolution spectrum is presented in figure 5.

elements and the known physics of absorption line production;

abundances.

experiments.

Applying Genetic Algorithms on Astronomy and Engineering 167

PDS Name *M* (M) *RD* (AU) *MD* (M) *Trim* (K) *θ* (o) *p gof*  398 HD 141569 2.4 13 0.06 1085 0.6 -2.0 0.006 022 BD−14 1319 2.8 690 0.003 380 40 -10. 0.006 130 IRAS 06475−0735 2.0 309 0.20 1705 53 -1.5 0.016 257 IRAS 07394−1953 2.0 859 0.64 1838 47 -2.0 0.098

This subsection is based on the published work The use of Genetic Algorithms and Spectral Synthesis in the Calculation of Abundances and Metallicities of T Tauri stars (Hetem &

In the previous subsection, we presented a method that uses a calculation technique based on GA aiming to optimize the parameters estimation of protoplanetary disks of T Tauri stars. Inspired by the success of that application, which gives accurate and efficient calculations, we decided to develop a similar method to determine atomic stellar

In astrophysics, the absorption spectra are obtained and employed as an analytical chemistry tool to determine the presence of atoms and ions in stellar atmospheres and, if possible, to quantify the amount of the atoms present. In stellar atmospheres, each element produces a number of spectrum absorption lines, at wavelengths which can be measured with extreme accuracy when compared to spectra emission tables provided by laboratory

The presence of a given element in the star atmosphere can be verified (and measured) by looking for its absorption lines at the correct wavelength. The hydrogen is present in all stars by its Balmer absorption lines, and is often used to calibrate the measurements. An

The way astrophysics use to calculate the abundances of atoms in stars follows the steps:

1. Obtain the star spectrum in a given range (or ranges) of wavelength, where the lines of

2. Generate an artificial spectrum, considering the lines whose origin are the desired

3. Compare the artificial and observed spectra. Here a simple *χ*2 test is enough to compute

4. Use a GA methodology to optimize the artificial spectrum in order to minimize the differences with the observed spectrum (the inverted problem, subsection 2.1); 5. Once the optimization methodology reaches its goals, consider the elemental parameters (density, temperature, ionization, etc) as the measures of the elements in the

Table 1. Obtained parameters for the chosen stars (Hetem & Gregorio-Hetem 2007).

**3.2 Abundances and Metallicities of young stars via Spectral Synthesis** 

random pairs of disc mass and radius around the parameters for the AB Aurigae model taken from Dominik et al. (2003). The result at the minimum is *gof* ~ 0.046, what means that the error bar estimation converged to a narrow range around the parameter set.

Fig. 3. Contour levels *gof*(*MD*, *RD*) estimated for AB Aurigae presenting the confidence levels *χ*2(68%)= 0.082 (continuous line), *χ*2(90%)= 0.15 (dashed) and *χ*2(99%)= 0.21 (dot–dashed) (Hetem & Gregorio-Hetem 2007).

We also applied the described GA method to a four other stars, in order to verify the quality of the fitting for objects showing different SED shapes and different levels of infrared excess. Our set was chosen by the slope of their near-infrared SED. The infrared excess in Herbig Be stars is the result of a spherical dusty envelope (van den Ancker et al. 2001), whereas a thickedge flared disc are characteristic of Herbig Ae. With this in mind, we selected A-type or late-B-type stars from the Pico dos Dias Survey sample (Gregorio-Hetem et al. 1992; Torres et al. 1995; Torres 1998) to apply the GA SED fitting. The results are presented in table 1 together with their corresponding *gofs* (see figure 4).

Fig. 4. GA SEDs obtained the stars BD-14 1319, IRAS 07394-1953, IRAS 06475-0735 and HD 141569. The plots are given as *log*[λFλ(Wm-2)] versus *log*[λ(μm)] (Hetem & Gregorio-Hetem 2007).

166 Bio-Inspired Computational Algorithms and Their Applications

random pairs of disc mass and radius around the parameters for the AB Aurigae model taken from Dominik et al. (2003). The result at the minimum is *gof* ~ 0.046, what means that

Fig. 3. Contour levels *gof*(*MD*, *RD*) estimated for AB Aurigae presenting the confidence levels *χ*2(68%)= 0.082 (continuous line), *χ*2(90%)= 0.15 (dashed) and *χ*2(99%)= 0.21 (dot–dashed)

We also applied the described GA method to a four other stars, in order to verify the quality of the fitting for objects showing different SED shapes and different levels of infrared excess. Our set was chosen by the slope of their near-infrared SED. The infrared excess in Herbig Be stars is the result of a spherical dusty envelope (van den Ancker et al. 2001), whereas a thickedge flared disc are characteristic of Herbig Ae. With this in mind, we selected A-type or late-B-type stars from the Pico dos Dias Survey sample (Gregorio-Hetem et al. 1992; Torres et al. 1995; Torres 1998) to apply the GA SED fitting. The results are presented in table 1

Fig. 4. GA SEDs obtained the stars BD-14 1319, IRAS 07394-1953, IRAS 06475-0735 and HD 141569. The plots are given as *log*[λFλ(Wm-2)] versus *log*[λ(μm)] (Hetem & Gregorio-Hetem

(Hetem & Gregorio-Hetem 2007).

2007).

together with their corresponding *gofs* (see figure 4).

the error bar estimation converged to a narrow range around the parameter set.


Table 1. Obtained parameters for the chosen stars (Hetem & Gregorio-Hetem 2007).
