**Bayesian Recovery of Sinusoids with Simulated Annealing**

Dursun Üstündag and Mehmet Cevri

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/50449

## **1. Introduction**

The ultimate goal of collecting data is to gain meaningful information about a physical system. However, in many situations, the quantities that we would like to determine are different from the ones we are able to measure. If the measured data depend on the quantities we want, they contain at least some information about them. Therefore, our general interest is to extract this information from the data.

Let the vector **θ** contain the parameters to be estimated from the measurement vector **D**, which is the output of the physical system that one wants to model. The physical system is described by a vector function *f* in the form:

$$y(t) = f(t; \Theta),\tag{1}$$

where *t* represents time. In many experiments, the recorded data $\mathbf{D} = \{d_1, d_2, \ldots, d_N\}$ are sampled from an unknown function $y(t)$ together with errors $e(t)$ at discrete times $(t_1, t_2, \ldots, t_N)^T$:

$$d\_i = y(t\_i) + e(t\_i), \quad (i = 1, \ldots, N). \tag{2}$$

The measurement errors $e(t)$ are generally assumed to be drawn independently from a zero-mean Gaussian probability distribution with a standard deviation of $\sigma$. On the other hand, different signal models correspond to different choices of the signal model function $f(t; \boldsymbol{\theta})$. In this chapter, we restrict our attention to the static<sup>1</sup> sinusoidal model given by

<sup>1</sup> Static refers to that the amplitudes of the sinusoids do not change with time.

<sup>© 2012</sup> Üstündag and Cevri, licensee InTech. This is an open access chapter distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

$$f(t; \boldsymbol{\theta}) = \sum_{j=1}^{\rho} \left[ B_j \cos(\omega_j t) + B_{j+\rho} \sin(\omega_j t) \right] = \sum_{j=1}^{2\rho} B_j\, G_j(t; \boldsymbol{\omega}), \tag{3}$$

where the $B_j$'s represent the amplitudes of the signal model and

$$\mathbf{G}(t; \boldsymbol{\omega}) = \begin{bmatrix} \cos(\omega_1 t_1) & \cdots & \cos(\omega_\rho t_1) & \sin(\omega_1 t_1) & \cdots & \sin(\omega_\rho t_1) \\ \cos(\omega_1 t_2) & \cdots & \cos(\omega_\rho t_2) & \sin(\omega_1 t_2) & \cdots & \sin(\omega_\rho t_2) \\ \vdots & & \vdots & \vdots & & \vdots \\ \cos(\omega_1 t_N) & \cdots & \cos(\omega_\rho t_N) & \sin(\omega_1 t_N) & \cdots & \sin(\omega_\rho t_N) \end{bmatrix}. \tag{4}$$
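To make the sampling model concrete, Equations (2)-(4) can be simulated numerically. The following Python sketch (the chapter's own implementation is in Mathematica; the frequencies, amplitudes, noise level and sample size here are arbitrary illustrative choices) generates a noisy record of two sinusoids:

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary example: two sinusoids (rho = 2) in Gaussian noise.
omega = np.array([0.3, 1.1])          # frequencies omega_j (radians per sample)
B = np.array([1.0, 0.5, 0.7, -0.2])   # amplitudes: B_1..B_rho (cos), B_rho+1..B_2rho (sin)
sigma = 0.1                           # noise standard deviation
N = 256
t = np.arange(N)

def f(t, omega, B):
    """Static sinusoidal model of Equation (3)."""
    rho = len(omega)
    cos_part = sum(B[j] * np.cos(omega[j] * t) for j in range(rho))
    sin_part = sum(B[j + rho] * np.sin(omega[j] * t) for j in range(rho))
    return cos_part + sin_part

# Equation (2): observed data = signal model + Gaussian errors.
d = f(t, omega, B) + sigma * rng.standard_normal(N)
```

The columns of the matrix in Equation (4) are exactly the basis functions summed inside `f`.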

The goal of data analysis is usually to use the observed data **D** to infer the values of the parameters $\boldsymbol{\theta} = \{\boldsymbol{\omega}, \mathbf{B}\}$. Besides estimating the values of the parameters, there are two additional important problems. One is to obtain an indication of the uncertainties associated with each parameter, i.e. some measure of how far they may be from the true parameters. The other, which we will not consider here, is to assess whether or not the model is appropriate for explaining the data.

Sinusoidal parameter estimation in additive white noise within a Bayesian framework has been an important problem in signal processing and is still an active research area because of its wide variety of applications in disciplines such as sonar, radar, digital communications and biomedical engineering. The purpose of this research is therefore to develop accurate and computationally efficient estimators for the sinusoidal parameters, namely, amplitudes and frequencies. In the above problem, one may or may not know the number of sinusoids. When it is unknown, the problem is called model selection (Andrieu & Doucet, 1999; Üstündag, 2011) and is not the subject of this chapter. Under the assumption of a known number of sinusoids, several algorithms have already been used in the parameter estimation literature, such as least-squares fitting (Press et al., 1995), the discrete Fourier transform (Cooley & Tukey, 1964), and the periodogram (Schuster, 1905). With least-squares fitting, the model is completely defined and the remaining question is to find the values of the parameters by minimizing the sum of squared residuals. The discrete Fourier transform has been a very powerful tool in Bayesian spectral analysis since Cooley and Tukey introduced the fast Fourier transform (FFT) technique in 1965, followed by the rapid development of computers. In 1987, Jaynes derived the periodogram directly from the principles of Bayesian inference. After his work, researchers in different branches of science have given much attention to the relationship between Bayesian inference and parameter estimation, and excellent work has been done in this area over the last fifteen years (Bretthorst, 1990; Üstündag et al., 1989, 1991; Harney, 2003; Gregory, 2005; Üstündag & Cevri, 2008, 2011; Üstündag, 2011).

In this chapter, we study Bayesian recovery of sinusoids using the estimation approach proposed by Bretthorst for a general signal model equation and combine it with a simulated annealing (SA) algorithm to obtain a global maximum of the posterior probability density function (PDF) $P(\boldsymbol{\omega} \mid \mathbf{D}, I)$ for the frequencies $\boldsymbol{\omega}$. Unfortunately, conventional algorithms (Press et al., 1995) based on the gradient direction fail to converge for this problem. Even when they converge, there is no assurance that they have found the global, rather than a local, maximum. This is because the logarithm of the PDF $P(\boldsymbol{\omega} \mid \mathbf{D}, I)$ is a sharply peaked and highly nonlinear function of the frequencies. In this respect, the pattern search algorithm described by Hooke and Jeeves (Hooke & Jeeves, 1962) has already been used by some researchers in the estimation literature to overcome this problem. However, we have found that this approach does not converge unless the starting point is very close to the optimum, so we have developed an algorithm in which this Bayesian approach is combined with a simulated annealing (SA) algorithm (Kirkpatrick et al., 1983; Corana et al., 1987; Goffe et al., 1994; Ingber, 1996), a function optimization strategy based on an analogy with the formation of crystals from their melts. It explores the entire surface of the posterior PDF for the frequencies and tries to maximize it while taking both uphill and downhill steps, whose sizes are controlled by a parameter that plays the role of the temperature in a physical system. By slowly lowering the temperature towards zero according to a properly chosen schedule, one can show that the globally optimal solutions are approached asymptotically. Thus, the method is largely independent of the starting values, often a critical input in conventional algorithms, and also offers a different approach to finding the parameter values of sinusoids through a directed, but random, search of the parameter space.
In this context, an algorithm implementing this Bayesian approach is developed and coded in the Mathematica programming language (Wellin et al., 2005) and tested for recovering noisy sinusoids with multiple frequencies. Furthermore, simulation studies on synthetic data sets of a single sinusoid under a variety of signal-to-noise ratios (SNR) are carried out to compare its performance with the Cramér-Rao lower bound (CRLB), known as a lower limit on the variance of any unbiased estimator. The simulation results support its effectiveness.
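The sharply peaked posterior surface mentioned above can be seen directly by evaluating the single-frequency posterior (Equation (20) of Section 2) on a frequency grid for synthetic data; away from the true frequency the surface is nearly flat and riddled with small local maxima, which is what defeats gradient-based methods. A Python sketch with arbitrary settings (the chapter's implementation is in Mathematica):

```python
import numpy as np

rng = np.random.default_rng(1)
N, t = 200, np.arange(200)
true_omega = 0.7
d = np.cos(true_omega * t) + 0.5 * np.sin(true_omega * t) \
    + 0.2 * rng.standard_normal(N)

def log_posterior(w):
    """Log of the single-frequency (rho = 1) posterior, Equation (20) of Section 2."""
    G = np.column_stack([np.cos(w * t), np.sin(w * t)])
    lam, V = np.linalg.eigh(G.T @ G)          # diagonalize Omega
    h = d @ ((G @ V) / np.sqrt(lam))          # projections onto the model
    return (2 - N) / 2 * np.log1p(-2 * np.mean(h ** 2) / (N * np.mean(d ** 2)))

grid = np.linspace(0.05, 3.1, 3000)
lp = np.array([log_posterior(w) for w in grid])
print(round(grid[lp.argmax()], 2))            # 0.7: the peak sits at the true frequency
```

Plotting `lp` against `grid` shows a single extremely narrow spike, several hundred log units above the background.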

#### **2. Bayesian parameter estimation**


Let us now reconsider the above problem within a Bayesian context (Bernardo & Smith, 2000; Bretthorst, 1988; Gregory, 2005; Harney, 2003; Jaynes, 2003; Ruanaidh & Fitzgerald, 1996; Üstündag & Cevri, 2008). As with all Bayesian calculations, the first step is to write down Bayes' theorem for the joint PDF of all the unknown parameters $\{\boldsymbol{\omega}, \mathbf{B}, \sigma^2\}$:

$$P\left(\boldsymbol{\omega}, \mathbf{B}, \sigma^2 \mid \mathbf{D}, \mathbf{I}\right) = \frac{1}{P\left(\mathbf{D} \mid \mathbf{I}\right)}\, P\left(\boldsymbol{\omega}, \mathbf{B}, \sigma^2 \mid \mathbf{I}\right) P\left(\mathbf{D} \mid \boldsymbol{\omega}, \mathbf{B}, \sigma^2, \mathbf{I}\right). \tag{5}$$

The quantity $P(\boldsymbol{\omega}, \mathbf{B}, \sigma^2 \mid \mathbf{I})$ is called the prior PDF; it represents prior knowledge of the parameters $\{\boldsymbol{\omega}, \mathbf{B}, \sigma^2\}$ given the information $\mathbf{I}$. The sampling distribution $P(\mathbf{D} \mid \boldsymbol{\omega}, \mathbf{B}, \sigma^2, \mathbf{I})$ is the likelihood (Edwards, 1972) of the data $\mathbf{D}$ given the model parameters. The probability function $P(\mathbf{D} \mid \mathbf{I})$ is the evidence, which is a constant for parameter estimation and is used here only to normalize the posterior PDF $P(\boldsymbol{\omega}, \mathbf{B}, \sigma^2 \mid \mathbf{D}, \mathbf{I})$. Therefore, it can be dropped in Equation (5), so that the joint PDF for the unknown parameters takes the following form:

$$P\left(\omega, \mathbf{B}, \sigma^2 \mid \mathbf{D}, \mathbf{I}\right) \propto P\left(\omega, \mathbf{B}, \sigma^2 \mid \mathbf{I}\right) P\left(\mathbf{D} \mid \omega, \mathbf{B}, \sigma^2, \mathbf{I}\right). \tag{6}$$


A key component in Bayes' theorem is the likelihood function $P(\mathbf{D} \mid \boldsymbol{\omega}, \mathbf{B}, \sigma^2, \mathbf{I})$, which is proportional to the probability density of the noise. If the noise standard deviation $\sigma$ is assumed to be known, the likelihood function takes the following form:

$$P\left(\mathbf{D}\mid\omega,\mathbf{B},\sigma^2,\mathbf{I}\right) = \left(2\pi\sigma^2\right)^{\frac{-N}{2}} \exp\left(-\frac{NQ}{2\sigma^2}\right),\tag{7}$$

where the exponent *Q* is defined as follows

$$Q = \frac{1}{N} \sum_{i=1}^{N} \left( d_i - \sum_{j=1}^{2\rho} B_j G_j(t_i; \boldsymbol{\omega}) \right)^2. \tag{8}$$

This is equivalent to

$$Q = \overline{\mathbf{D}^2} - \frac{2}{N} \sum_{j=1}^{2\rho} \sum_{i=1}^{N} d_i B_j G_j(t_i; \boldsymbol{\omega}) + \frac{1}{N} \sum_{j=1}^{2\rho} \sum_{k=1}^{2\rho} \Omega_{jk} B_j B_k, \tag{9}$$

where

$$\overline{\mathbf{D}^2} = \frac{1}{N} \sum\_{i=1}^{N} d\_i^{\;2} \tag{10}$$

and

$$\Omega_{jk} = \sum_{i=1}^{N} G_j(t_i; \boldsymbol{\omega})\, G_k(t_i; \boldsymbol{\omega}), \quad (j, k = 1, \ldots, 2\rho). \tag{11}$$
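The equivalence of the direct form of $Q$ and its expansion through Equations (9)-(11) can be verified numerically; note that $Q$ is read here as the *mean* of squared residuals, consistent with the $NQ$ in Equation (7) and the $1/N$ factors in Equations (9) and (10). A Python sketch with arbitrary values (not the chapter's Mathematica code):

```python
import numpy as np

# Arbitrary illustrative values: rho = 2 frequencies, N = 128 samples.
rng = np.random.default_rng(5)
N, t = 128, np.arange(128)
omega = np.array([0.4, 0.9])
B = np.array([1.0, -0.3, 0.6, 0.2])
d = rng.standard_normal(N)            # any data vector will do for this check

# Columns of G are the model functions G_j of Equation (4).
G = np.column_stack([np.cos(w * t) for w in omega] +
                    [np.sin(w * t) for w in omega])

# Q computed directly as the mean-square residual of Equation (8) ...
Q_direct = np.mean((d - G @ B) ** 2)

# ... and from the expansion of Equations (9)-(11).
d2bar = np.mean(d ** 2)               # Equation (10)
Omega = G.T @ G                       # Equation (11): Omega_jk = sum_i G_j G_k
Q_expanded = d2bar - (2 / N) * d @ G @ B + (1 / N) * B @ Omega @ B

print(np.isclose(Q_direct, Q_expanded))   # True
```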

In order to obtain the posterior PDF for $\boldsymbol{\omega}$, Equation (6) can be integrated with respect to the nuisance parameters $\mathbf{B}$, assuming $\sigma^2$ is known:

$$P\left(\boldsymbol{\omega}, \boldsymbol{\sigma} \mid \mathbf{D}, \mathbf{I}\right) = \int P\left(\boldsymbol{\omega}, \mathbf{B}, \boldsymbol{\sigma} \mid \mathbf{D}, \mathbf{I}\right) d\mathbf{B}.\tag{12}$$

With the choice of a uniform prior PDF, or of independent Gaussian distributions with mean zero and known standard deviation, for the amplitudes, the integrals in (12) turn into Gaussian integrals that can be evaluated analytically (Bretthorst, 1988). To do this, one simply converts the square matrix $\boldsymbol{\Omega}$ into a diagonal matrix that shares the fundamental properties of the underlying matrix. In other words, this is equivalent to transforming the underlying system of equations into a special set of coordinate axes. This diagonalization process (Bretthorst, 1988) effectively introduces new orthonormal model functions,


$$H_j(t; \boldsymbol{\omega}) = \frac{1}{\sqrt{\lambda_j}} \sum_{k=1}^{2\rho} v_{jk}\, G_k(t; \boldsymbol{\omega}), \quad (j = 1, \ldots, 2\rho), \tag{13}$$

and also gives a new expression for the signal model function in Equation (3):

$$f(t; \boldsymbol{\theta}) = \sum_{k=1}^{2\rho} A_k H_k(t; \boldsymbol{\omega}). \tag{14}$$

The new amplitudes $A_k$ are related to the old amplitudes $B_j$ by

$$A_k = \sqrt{\lambda_k} \sum_{j=1}^{2\rho} B_j v_{kj}, \quad (k = 1, \ldots, 2\rho), \tag{15}$$

where $v_{kj}$ represents the $j$th component of the $k$th normalized eigenvector of $\Omega_{jk}$, with $\lambda_k$ as the corresponding eigenvalue. Substituting these expressions into Equation (9) and defining

$$h_j = \sum_{i=1}^{N} d_i H_j(t_i; \boldsymbol{\omega}), \quad (j = 1, 2, \ldots, 2\rho), \tag{16}$$

to be the projections of the data onto the orthonormal model functions $H_j(t; \boldsymbol{\omega})$, we can then proceed to perform the $2\rho$ integrations over the $A_j$ in Equation (12) to obtain

$$P\left(\mathbf{D} \mid \boldsymbol{\omega}, \sigma, I\right) \propto \sigma^{-N+2\rho} \exp\left(-\frac{N\overline{\mathbf{D}^2} - 2\rho\, \overline{\mathbf{h}^2}}{2\sigma^2}\right) \tag{17}$$

with


$$\overline{\mathbf{h}^2} = \frac{1}{2\rho} \sum\_{j=1}^{2\rho} \mathbf{h}\_j^2. \tag{18}$$

This represents the mean-square of the observed projections. If $\sigma$ is known, the joint posterior probability density of the parameters $\boldsymbol{\omega}$, conditional on the data and our knowledge of $\sigma$, is given by

$$P\left(\boldsymbol{\omega} \middle| \mathbf{D}, \sigma, I\right) \propto \exp\left(\frac{\rho \overline{\mathbf{h}^2}}{\sigma^2}\right). \tag{19}$$

If there is no prior information about the noise, then $\sigma$ is a nuisance parameter and must also be eliminated by integrating it out. Using the Jeffreys prior (Jeffreys, 1961) $1/\sigma$ and integrating Equation (12) over $\sigma$, we obtain

$$P\left(\omega \middle| \mathbf{D}, I\right) \propto \left(1 - \frac{2\rho \overline{\mathbf{h}^2}}{N \overline{\mathbf{D}^2}}\right)^{\frac{2\rho - N}{2}}.\tag{20}$$


This has the form of the "Student *t*-distribution". As well as determining the values of the $\boldsymbol{\omega}$ parameters for which the posterior PDF for $\boldsymbol{\omega}$ is a maximum, it is also desirable to compute the uncertainties associated with them. To do this, let us consider the case where $\sigma$ is known and let $\hat{\boldsymbol{\omega}}$ represent the estimated values of $\boldsymbol{\omega}$. Following Bretthorst's work (Bretthorst, 1988), we can expand the function $\overline{\mathbf{h}^2}$ in a Taylor series at the point $\hat{\boldsymbol{\omega}}$, such that

$$P(\boldsymbol{\omega} \mid \mathbf{D}, I) \propto \exp\left(-\sum\_{j=1}^{\rho} \sum\_{k=1}^{\rho} \frac{b\_{jk}}{2\sigma^2} \Delta\_j \Delta\_k\right) \tag{21}$$

with $b_{jk}$ defined as $b_{jk} = -\rho\, \partial^2 \overline{\mathbf{h}^2} / \partial\omega_j \partial\omega_k$ evaluated at $\boldsymbol{\omega} = \hat{\boldsymbol{\omega}}$, and $\Delta_j = (\omega_j - \hat{\omega}_j)$; these can be calculated analytically for the single-frequency case. For an arbitrary model the matrix $b_{jk}$ cannot be calculated analytically; however, it can be evaluated numerically. The calculation of the means and standard deviations of the $\boldsymbol{\omega}$ parameters requires evaluating Gaussian integrals, by first changing to orthogonal variables as was done above with the amplitudes. Let $\tau_j$ and $u_{kj}$ represent the $j$th eigenvalue and eigenvector of the matrix $b_{jk}$, respectively. Then the new orthogonal variables are given by

$$s_j = \sqrt{\tau_j} \sum_{k=1}^{\rho} \Delta_k u_{kj}, \quad \Delta_j = \sum_{k=1}^{\rho} \frac{s_k u_{jk}}{\sqrt{\tau_k}}. \tag{22}$$

By using these orthogonal variables to perform the *ρ* Gaussian integrals, the estimate variance for *<sup>k</sup>* can be calculated:

$$\sigma_{\omega_k}^2 = \left\langle \sigma^2 \right\rangle \sum_{j=1}^{\rho} \frac{u_{jk}^2}{\tau_j}. \tag{23}$$

Therefore, the approximations for *ω* can be implemented in the form:

$$\left\langle \omega_j \right\rangle = \hat{\omega}_j \pm \sigma_{\omega_j}, \quad (j = 1, 2, 3, \ldots, \rho). \tag{24}$$
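For a single frequency the matrix $b_{jk}$ reduces to a scalar $b = -\rho\, \partial^2 \overline{\mathbf{h}^2}/\partial\omega^2$ at $\hat{\omega}$, and Equation (23) gives $\sigma_{\omega} = \sigma/\sqrt{b}$. The numerical evaluation mentioned above can be sketched in Python with a central finite difference (synthetic data and all settings are arbitrary; not the chapter's Mathematica code):

```python
import numpy as np

rng = np.random.default_rng(2)
N, t = 200, np.arange(200)
sigma, true_omega = 0.2, 0.7
d = np.cos(true_omega * t) + 0.5 * np.sin(true_omega * t) \
    + sigma * rng.standard_normal(N)

def h2bar(w):
    """Mean-square projection of Equation (18) for rho = 1."""
    G = np.column_stack([np.cos(w * t), np.sin(w * t)])
    lam, V = np.linalg.eigh(G.T @ G)
    h = d @ ((G @ V) / np.sqrt(lam))
    return np.mean(h ** 2)

# Locate omega_hat by a fine grid search near the peak.
grid = np.linspace(0.6, 0.8, 4001)
w_hat = grid[np.argmax([h2bar(w) for w in grid])]

# b = -rho * d^2 h2bar / d omega^2 at omega_hat, via central differences.
eps = 1e-4
b = -(h2bar(w_hat + eps) - 2 * h2bar(w_hat) + h2bar(w_hat - eps)) / eps ** 2

sigma_omega = sigma / np.sqrt(b)     # Equation (23) for a single frequency
```

For these settings `sigma_omega` is of order $10^{-4}$: the frequency is pinned down far more tightly than the raw noise level, which is the familiar $N^{-3/2}$ behaviour of frequency estimates.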

In agreement with Bretthorst, the expectation values of the new amplitudes are given by $\langle A_j \rangle = h_j$. From Equation (15), the expected values of the old amplitudes $\mathbf{B}$ become

$$\left\langle B_k \right\rangle = \sum_{j=1}^{2\rho} \frac{1}{\sqrt{\lambda_j}}\, h_j v_{jk}. \tag{25}$$

The uncertainty in each $A_j$ is $\sigma$, so that the corresponding uncertainty in $\mathbf{B}$ is


$$\sigma_{B_k} = \sigma \left( \sum_{j=1}^{2\rho} \frac{v_{jk}^2}{\lambda_j} \right)^{1/2}, \quad (k = 1, 2, 3, \ldots, 2\rho). \tag{26}$$
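The amplitude estimates of Equation (25) can be checked on synthetic data. The following Python sketch (arbitrary settings, known frequencies; the amplitude uncertainty is taken as $\sigma_{B_k} = \sigma (\sum_j v_{jk}^2 / \lambda_j)^{1/2}$, and the orthonormality of the functions in Equation (13) is verified along the way) recovers the amplitudes from the projections $h_j$:

```python
import numpy as np

rng = np.random.default_rng(3)
N, t = 256, np.arange(256)
omega = np.array([0.3, 1.1])            # frequencies assumed known here
B_true = np.array([1.0, 0.5, 0.7, -0.2])
sigma = 0.1
G = np.column_stack([np.cos(w * t) for w in omega] +
                    [np.sin(w * t) for w in omega])
d = G @ B_true + sigma * rng.standard_normal(N)

lam, V = np.linalg.eigh(G.T @ G)        # diagonalize Omega of Equation (11)
H = (G @ V) / np.sqrt(lam)              # orthonormal functions, Equation (13)
assert np.allclose(H.T @ H, np.eye(4))  # sum_i H_j(t_i) H_k(t_i) = delta_jk

h = d @ H                               # projections h_j, Equation (16)
B_est = V @ (h / np.sqrt(lam))          # Equation (25)
sigma_B = sigma * np.sqrt((V ** 2) @ (1.0 / lam))   # Equation (26)

print(np.round(B_est, 2))               # close to B_true
```

Here `B_est` agrees with the ordinary least-squares solution $(\mathbf{G}^T\mathbf{G})^{-1}\mathbf{G}^T\mathbf{D}$, as the diagonalization argument implies.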

#### **3. Implementation of Simulated Annealing (SA) algorithm**

The Bayesian approach introduced by Bretthorst was briefly summarized in Section 2; the reader is referred to Bretthorst's work (Bretthorst, 1988) for more detailed information. Consequently, Bayesian parameter estimation turns into a global optimization problem: the task of finding the best possible solution $\boldsymbol{\omega}$ for Equation (19) or (20) satisfying

$$\max_{\boldsymbol{\omega} \in \Xi_\rho} P\left(\boldsymbol{\omega} \mid \mathbf{D}, I\right), \tag{27}$$

in the solution space $\Xi_\rho = (0, \pi)^\rho$. Because there is no variation at negative frequencies, and the highest possible frequency corresponds to a wave that undergoes a complete cycle in two unit intervals, the lower limit of the range is 0 and all the variation is accounted for by frequencies less than $\pi$.

Over the last few decades, researchers have developed many computational algorithms to address this type of global optimization problem (Metropolis et al., 1953; Jeffreys, 1961; Kirkpatrick et al., 1983; Laarhoven & Aarts, 1987; Stefankovic et al., 2009). Although numerous algorithms have been suggested to achieve this goal, few are capable of locating the global optimum effectively. We therefore follow the SA algorithm suggested by Corana (Corana et al., 1987) and modified by Goffe (Goffe et al., 1994), a probabilistic algorithm for finding the global optimum of Equation (27); various alternative versions of it have already been used in statistical applications. A brief review of much of the work on SA-based algorithms, together with their areas of application, is provided by Ingber and Binder (Binder, 1986; Ingber, 1994).

The algorithm begins with an initial guess of the frequencies $\boldsymbol{\omega}^0$, a trial step-length vector $\mathbf{v}^0$ and a global parameter $\Gamma_0$ (called the initial temperature). Each step of the SA algorithm replaces the current frequency vector with a randomly generated new one. In other words, the next candidate point $\boldsymbol{\omega}^{k+1}$ is generated by varying one component $j = 1, \ldots, \rho$ of the current point $\boldsymbol{\omega}^k$ at a time:

$$
\omega\_j^{k+1} = \omega\_j^k + \xi\, v\_j^k,\tag{28}
$$

where $\xi$ is a uniformly distributed random number from the interval $[-1, 1]$ and $\mathbf{v}^k = (v_j^k)$ is a step vector. The function value $P(\boldsymbol{\omega}^{k+1}|\mathbf{D}, I)$ is then computed. If

$$P\left(\boldsymbol{\omega}^{k+1}\middle|\mathbf{D},I\right) \geq P\left(\boldsymbol{\omega}^{k}\middle|\mathbf{D},I\right) \tag{29}$$

then the point $\boldsymbol{\omega}^{k+1}$ is accepted as the $(k+1)$th iteration point: it replaces $\boldsymbol{\omega}^k$ and the algorithm moves uphill. If $P(\boldsymbol{\omega}^{k+1}|\mathbf{D}, I)$ is the largest posterior PDF found so far, denoted $P_{opt}(\boldsymbol{\omega}|\mathbf{D}, I)$, its value and $\boldsymbol{\omega}^{k+1}$ are recorded, since this is the current best estimate of the optimum. This forces the system toward a state corresponding to a local maximum or possibly a global maximum. However, most large optimization problems, like the one given in Equation (27), have many local maxima, and an optimization algorithm is therefore often trapped in a local maximum. To get out of a local maximum, a decrease of the function value $P(\boldsymbol{\omega}^{k+1}|\mathbf{D}, I)$ is accepted with a certain probability. This is accomplished by the Metropolis criterion (Metropolis et al., 1953), which bases the chance of accepting the new state on the posterior PDF values of the frequencies, defined as

$$p = \frac{\exp\left(-\frac{P\left(\boldsymbol{\omega}^{k+1}\,\middle|\,\mathbf{D},I\right)}{\Gamma^{k}}\right)}{\exp\left(-\frac{P\left(\boldsymbol{\omega}^{k+1}\,\middle|\,\mathbf{D},I\right)}{\Gamma^{k}}\right) + \exp\left(-\frac{P\left(\boldsymbol{\omega}^{k}\,\middle|\,\mathbf{D},I\right)}{\Gamma^{k}}\right)} \approx \exp\left(-\frac{\Delta P}{\Gamma^{k}}\right),\tag{30}$$


where $\Delta P$ represents the difference between the present and previous posterior PDF values of the frequencies, i.e., $\Delta P = P(\boldsymbol{\omega}^{k+1}|\mathbf{D}, I) - P(\boldsymbol{\omega}^{k}|\mathbf{D}, I)$. Whenever

$$P\left(\boldsymbol{\omega}^{k+1}\,\middle|\,\mathbf{D},I\right) < P\left(\boldsymbol{\omega}^{k}\,\middle|\,\mathbf{D},I\right) \tag{31}$$

$p$ is computed and compared to $p'$, a uniformly distributed random number from the interval $[0, 1]$. If $p \ge p'$, the new point $\boldsymbol{\omega}^{k+1}$ is accepted and replaces $\boldsymbol{\omega}^{k}$, and the algorithm moves downhill; lower temperatures and larger differences in posterior PDF values make such downhill moves less likely. This continues until all components have been altered, each new point having been accepted or rejected according to the Metropolis criterion. After this process is repeated $n_s$ times, the step vector $\mathbf{v}^k$ is adjusted by the following rule:

$$v\_{j}^{k+1} = \begin{cases} v\_{j}^{k}\left(1 + c\_{j} \frac{n\_{j} - 0.6 n\_{s}}{0.4 n\_{s}}\right) & \text{if } \frac{n\_{j}}{n\_{s}} > 0.6 \\ v\_{j}^{k} \Big/ \left(1 + c\_{j} \frac{0.4 n\_{s} - n\_{j}}{0.4 n\_{s}}\right) & \text{if } \frac{n\_{j}}{n\_{s}} < 0.4 \\ v\_{j}^{k} & \text{otherwise} \end{cases}\tag{32}$$

where $n_j$ is the number of accepted moves along direction $j$ and the parameter $c_j$, initially defined by the user, controls the step variation along the $j$th direction. The aim of these step-length variations is to maintain the average percentage of accepted moves at about one-half of the total number of moves.
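The acceptance test of Equations (29)–(31) and the step-length rule of Equation (32) can be sketched as follows. This is an illustrative Python fragment, not the authors' *Mathematica* code, and it uses the standard Metropolis convention $p = \exp(\Delta P / \Gamma)$ so that the acceptance probability of a downhill move lies in $(0, 1)$:

```python
import math
import random

def metropolis_accept(p_new, p_old, temperature, rng=random):
    """Eqs. (29)-(31): always accept uphill moves; accept a downhill move
    with probability exp(dP / T), dP = p_new - p_old < 0, compared against
    a uniform random number p' in [0, 1)."""
    if p_new >= p_old:                      # Eq. (29): uphill, always accepted
        return True
    p = math.exp((p_new - p_old) / temperature)
    return p >= rng.random()                # Eq. (31): compare with p'

def adjust_step(v_j, n_j, n_s, c_j):
    """Eq. (32): widen the step along direction j when more than 60% of the
    last n_s moves were accepted, shrink it when fewer than 40% were."""
    ratio = n_j / n_s
    if ratio > 0.6:
        return v_j * (1.0 + c_j * (n_j - 0.6 * n_s) / (0.4 * n_s))
    if ratio < 0.4:
        return v_j / (1.0 + c_j * (0.4 * n_s - n_j) / (0.4 * n_s))
    return v_j
```

With $c_j = 2$ and $n_s = 20$, full acceptance triples the step length while full rejection divides it by three, steering the acceptance rate toward one half.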

An alternative step-size rule for the above SA algorithm is obtained by replacing Equation (28) with

$$
\omega\_j^{k+1} = \omega\_j^k + \Gamma\_k \sqrt{\gamma\_j^k}\, N(0, 1), \quad (j = 1, 2, \dots, \rho),\tag{33}
$$

where $N(0,1)$ is a standard normal distribution and $\Gamma_k$, the temperature at the $k$th iteration, rescales the step size. Here $\gamma_j^k$ is the CRLB for the $j$th component of the angular frequencies at the $(k+1)$th iteration (Ireland, 2007); it provides a theoretical lower limit on how precisely the parameter $\omega_j$ can be extracted from noisy measurements. In this respect, it is defined in the form:

$$\gamma\_{j} = \mathbf{J}^{-1}\left(\omega\_{j}\right),\tag{34}$$

where the Fisher information matrix $\mathbf{J}(\boldsymbol{\omega})$ (Kay, 1993), defined by


$$\mathbf{J}(\boldsymbol{\omega}) = -E\left[\frac{\partial^2 \ln P(\mathbf{D} \,|\, \boldsymbol{\omega}, I)}{\partial \boldsymbol{\omega}^2}\right] = \frac{1}{\sigma^2} \sum\_{j=1}^N \frac{\partial f\_j(\boldsymbol{\omega})}{\partial \boldsymbol{\omega}} \left(\frac{\partial f\_j(\boldsymbol{\omega})}{\partial \boldsymbol{\omega}}\right)^T,\tag{35}$$

is the expectation of the second derivatives of the log-likelihood with respect to $\boldsymbol{\omega}$. For large $N$ the matrix $\mathbf{J}(\boldsymbol{\omega})$ is approximately diagonal, so its inversion is straightforward. In this case, the diagonal elements yield the asymptotic lower bound (Stoica et al., 1989; Lagrange, 2005) for the variance of $\hat{\boldsymbol{\omega}}$, and we can write

$$\gamma\_{j} = \operatorname{var}\_{\mathrm{CRLB}}\left(\hat{\omega}\_{j}\right) \cong \frac{24 \hat{\sigma}^2}{N^3 \left(\hat{B}\_{j}^2 + \hat{B}\_{j+\rho}^2\right)},\tag{36}$$

where $\hat{\sigma}^2$ represents the estimated variance of the noise and is given in Equation (41). This whole cycle is then repeated $n_\Gamma$ times, after which the temperature is decreased by a factor $a_\Gamma \in (0, 1)$. This process is generally called the annealing (or cooling) schedule; it is the heart of the algorithm and affects the number of times the temperature is decreased. If cooling takes place too fast, the search will be trapped in a local maximum. Various annealing schedules have therefore been suggested by different researchers (Ingber, 1994; Stefankovic et al., 2009) for lowering the temperature, but we choose the following:

$$\Gamma\_{k+1} = a\_{\Gamma} \frac{\Gamma\_k}{\exp\{(1 - a\_{\Gamma})k^{\rho}\}}\tag{37}$$

Because it is exponential rather than logarithmic, this schedule is sometimes known as simulated quenching (SQ) (Vasan et al., 2009; Aguiare et al., 2012). For a well-conditioned estimation problem such as the frequency estimation problem in signal processing, the convenience of the SA algorithm, together with the need for some global search over local optima, makes a strong practical case for the use of SQ. Moreover, different parameters have different finite ranges and different annealing-time-dependent sensitivities. Classical annealing algorithms use distributions that sample infinite ranges and make no allowance for differences between parameter dimensions; for example, different sensitivities might make it necessary to use different annealing schedules. This requires the development of a new PDF that embeds the desired features (Ingber, 1994), leading to variants of SA that justify exponential temperature annealing schedules.
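The CRLB-scaled proposal of Equation (33), with the asymptotic bound of Equation (36), can be sketched as follows; this is an illustrative Python fragment in which the amplitude estimates $\hat{B}_j$, $\hat{B}_{j+\rho}$ and the noise-variance estimate are assumed to come from Equations (25) and (41):

```python
import math
import random

def crlb_gamma(sigma_hat2, n, b_cos, b_sin):
    """Eq. (36): asymptotic Cramer-Rao lower bound on the variance of one
    angular-frequency estimate, given the matching amplitude estimates."""
    return 24.0 * sigma_hat2 / (n ** 3 * (b_cos ** 2 + b_sin ** 2))

def crlb_step(omega_j, temperature, gamma_j, rng=random):
    """Eq. (33): Gaussian candidate step whose scale is the current
    temperature times the CRLB standard deviation."""
    return omega_j + temperature * math.sqrt(gamma_j) * rng.gauss(0.0, 1.0)
```

For the single-sinusoid example of Equation (43) ($N = 512$, $\hat{\sigma}^2 \approx 1$) the bound is of order $10^{-7}$, so the implied standard deviation of about $4\times10^{-4}$ is consistent with the quoted uncertainty of $\pm 0.0005$ on $\hat{\omega}_1$ in Table 1.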

Termination of the algorithm occurs when the average function value of the sequence of points after each $n_s n_\Gamma$-step cycle reaches a stable state:

$$\begin{aligned} \left| P\_k \left( \boldsymbol{\omega} \middle| \mathbf{D}, I \right) - P\_{k-l} \left( \boldsymbol{\omega} \middle| \mathbf{D}, I \right) \right| &\leq \varepsilon \quad (l = 1, \ldots, 4), \\ \left| P\_k \left( \boldsymbol{\omega} \middle| \mathbf{D}, I \right) - P\_{opt} \left( \boldsymbol{\omega} \middle| \mathbf{D}, I \right) \right| &\leq \varepsilon, \end{aligned}\tag{38}$$


where $\varepsilon$ is a small positive number defined by the user and $l$ indexes the last four successive iteration values of the posterior PDF of the frequencies, which are stored. Further details of the algorithm initialization are problem-dependent and are given in Section 5.
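The stability test of Equation (38) amounts to comparing the current posterior value with the four stored values and with the best value found so far; a minimal illustrative Python sketch:

```python
def converged(history, p_opt, eps=1e-6, window=4):
    """Eq. (38): stop when the current posterior value P_k differs from each
    of the last `window` stored values and from the best value P_opt by at
    most eps."""
    if len(history) < window + 1:
        return False
    p_k = history[-1]
    return (all(abs(p_k - p) <= eps for p in history[-1 - window:-1])
            and abs(p_k - p_opt) <= eps)
```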

#### **4. Power spectral density**

Before we discuss the computer-simulated examples, something must be said about how the results are displayed. The usual way to display the result of a spectral analysis is as a power spectral density, which shows the strength of the variation (energy) as a function of frequency. In Fourier transform spectroscopy this is typically taken as the squared magnitude of the discrete Fourier transform of the data. In order to display our results in the form of a power spectral density (Bretthorst, 1988; Gregory, 2005), it is necessary to pay attention to its definition: it shows how much power is contained in a unit frequency interval. Following Bretthorst (Bretthorst, 1988), the Bayesian power spectral density is defined as the expected value of the power of the signals over the joint posterior PDF:

$$S(\boldsymbol{\omega}) = \frac{N}{2} \int \left[ \sum\_{j=1}^{\rho} \left( B\_j^2 + B\_{j+\rho}^2 \right) \right] P\left( \boldsymbol{\omega}, \mathbf{B} \middle|\, \mathbf{D}, \sigma^2, I \right) dB\_1 \ldots dB\_{2\rho}.\tag{39}$$

Performing the integrals analytically over $B_1, \ldots, B_{2\rho}$ by using the orthonormal model functions defined in Section 2, the power spectral density can therefore be approximated as

$$S(\boldsymbol{\omega}) = 2\left(\sigma^2 + \sum\_{j=1}^{2\rho} h\_j^2(\hat{\boldsymbol{\omega}})\right) \sum\_{k=1}^{\rho} \sqrt{\frac{b\_{kk}}{2\pi\sigma^2}} \exp\left(-\frac{b\_{kk}\left(\hat{\omega}\_k - \omega\right)^2}{2\sigma^2}\right).\tag{40}$$

This function conveys information about the total energy carried by the signal and about the accuracy of each line. In the next section, we present some numerical examples showing how well this technique works.
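Equation (40) represents the Bayesian spectral density as a sum of Gaussian lines centred on the estimated frequencies; a direct transcription (illustrative Python, with the projections $h_j(\hat{\boldsymbol{\omega}})$ and the curvatures $b_{kk}$ supplied by the caller):

```python
import math

def bayes_psd(omega, omega_hat, h, b_diag, sigma2):
    """Eq. (40): S(omega) = 2 (sigma^2 + sum_j h_j^2) *
    sum_k sqrt(b_kk / (2 pi sigma^2)) exp(-b_kk (w_hat_k - omega)^2 / (2 sigma^2))."""
    energy = sigma2 + sum(hj * hj for hj in h)
    lines = sum(
        math.sqrt(bkk / (2.0 * math.pi * sigma2))
        * math.exp(-bkk * (w_hat - omega) ** 2 / (2.0 * sigma2))
        for w_hat, bkk in zip(omega_hat, b_diag)
    )
    return 2.0 * energy * lines
```

Each Gaussian line integrates over $\omega$ to the common energy prefactor, and its width $\sigma/\sqrt{b_{kk}}$ reflects the accuracy of the corresponding frequency estimate.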

#### **5. Computer simulated examples**


To verify the performance of the algorithm, we generated simulated data vectors containing one, two and five sinusoids. Here $t_i$ runs over the symmetric time interval $-T$ to $T$ in $(2T + 1) = 512$ integer steps, and the components of the noise $e_i$ are generated from a zero-mean Gaussian distribution whose deviation $\sigma$ is initially known, and are added to the simulated data. However, one of the interests in an experiment is also to estimate the noise power $\sigma^2$, so it is then treated as unknown. In agreement with Bretthorst, this is given in the following form:

$$\left\langle \sigma^2 \right\rangle = \frac{N \overline{\mathbf{D}^2} - 2\rho\, \overline{\mathbf{h}^2}}{N - 2\rho - 2} \left( 1 \pm \sqrt{\frac{2}{N - 2\rho - 4}} \right).\tag{41}$$

Clearly, the accuracy of the estimate depends on how long we sample and on the signal-to-noise ratio (SNR), defined as the ratio of the root mean square of the signal amplitude to the noise power $\sigma^2$. In addition, one may also obtain the following expression for the SNR:

$$\text{SNR} = \sqrt{\frac{2\rho}{N} \left( 1 + \frac{\overline{\mathbf{h}^2}}{\sigma^2} \right)}. \tag{42}$$

When the standard deviation of the noise is unknown, an empirical SNR is obtained by replacing $\sigma^2$ in Equation (42) with the estimated noise variance in (41).
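Under the reading that $\overline{\mathbf{D}^2}$ and $\overline{\mathbf{h}^2}$ are mean-square values (an assumption of this sketch), the central value of Equation (41) and the SNR of Equation (42) can be transcribed as illustrative Python:

```python
import math

def noise_power(d, h, rho):
    """Central value of Eq. (41):
    <sigma^2> = (N D2bar - 2 rho h2bar) / (N - 2 rho - 2),
    where D2bar and h2bar are the mean squares of the data and projections."""
    n = len(d)
    d2bar = sum(x * x for x in d) / n
    h2bar = sum(x * x for x in h) / len(h)
    return (n * d2bar - 2 * rho * h2bar) / (n - 2 * rho - 2)

def snr(h2bar, sigma2, n, rho):
    """Eq. (42): SNR = sqrt((2 rho / N) (1 + h2bar / sigma^2))."""
    return math.sqrt((2.0 * rho / n) * (1.0 + h2bar / sigma2))
```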

In our first example, we generate the data set from the following equation:

$$d\_i = 0.001 + 0.5403 \cos\left(0.3t\_i\right) - 0.8415 \sin\left(0.3t\_i\right) + e\_i \quad \text{(i = 1,...,512)}.\tag{43}$$

We then carried out the Bayesian analysis of the simulated data, assuming that we know the mathematical form of the model but not the values of the parameters. We first gave starting values to the list of frequencies to begin a multidimensional search for a global maximum of the posterior PDF of the frequencies $\boldsymbol{\omega}$ given in Equations (19) or (20). As an initial estimate of the frequencies $\boldsymbol{\omega}^0$ for the maximization procedure, it is possible to take random choices from the interval $[0, \pi]$. However, it is better to start with the locations of the peaks chosen automatically from the Fourier power spectral density graph by using a computer code written in *Mathematica*.
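Choosing the starting frequencies from the peaks of the Fourier power spectrum can be sketched as follows (illustrative Python with a plain $O(N^2)$ DFT, standing in for the *Mathematica* code mentioned above):

```python
import cmath
import math

def periodogram_peaks(d, n_peaks):
    """Return the n_peaks angular frequencies in (0, pi) with the largest
    discrete Fourier power, as initial guesses omega^0 for the SA search."""
    n = len(d)
    spectrum = []
    for k in range(1, n // 2):                 # omega = 2 pi k / n
        coef = sum(d[t] * cmath.exp(-2j * math.pi * k * t / n)
                   for t in range(n))
        spectrum.append((abs(coef) ** 2, 2.0 * math.pi * k / n))
    spectrum.sort(reverse=True)
    return sorted(w for _, w in spectrum[:n_peaks])
```

A practical version would take local maxima of the spectrum rather than the `n_peaks` largest bins, since the sidelobes of a strong line can outweigh a weak one.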

In agreement with Corana (Corana et al., 1987), reasonable values of the parameters that control the SA algorithm are chosen as $n_s = 20$, $n_\Gamma = \max(100, 5\rho)$ and $c_j = 2$, $(j = 1, \ldots, \rho)$. The global optimization algorithm then starts at some high temperature $\Gamma_0 = 100$. A sequence of points is generated until a sort of equilibrium is approached: that is, a sequence of points $\boldsymbol{\omega}^0, \boldsymbol{\omega}^1, \boldsymbol{\omega}^2, \ldots$ whose average value of $P(\boldsymbol{\omega}|\mathbf{D}, I)$ reaches a stable value as the iteration proceeds. During this phase the step vector $\mathbf{v}$ is periodically adjusted by the rule defined in Equation (32). The best point $\boldsymbol{\omega}$ reached so far is recorded. After thermal equilibration, the temperature is reduced by using the annealing schedule in Equation (37) with a factor $a_\Gamma = 0.85$, and a new sequence is started from this best point, until thermal equilibrium is reached again, and so on. The SA algorithm first builds up a rough view of the surface by taking large steps. As the temperature falls and the steps shrink, it slowly focuses on the most promising area. It therefore proceeds toward a better maximum even in the presence of many local maxima, the adjustment of the step vector allowing the algorithm to shape the space within which it may move to that within which it ought to move, as defined by the PDF $P(\boldsymbol{\omega}|\mathbf{D}, I)$. Consequently, the process is stopped at a temperature low enough that no further useful improvement can be expected, according to the stopping criterion in Equation (38).
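Putting the pieces together, the annealing cycle just described can be sketched in compact form. This illustrative Python fragment follows the structure of the text (Equations (28)–(32) and the stopping rule (38)) but substitutes plain geometric cooling, $\Gamma \leftarrow a_\Gamma\,\Gamma$, for the schedule of Equation (37), and takes an arbitrary objective `logp` in place of the posterior $P(\boldsymbol{\omega}|\mathbf{D}, I)$:

```python
import math
import random

def anneal(logp, omega0, v0, t0=100.0, a=0.85, n_s=20, n_t=20,
           c=2.0, eps=1e-8, max_cycles=200, seed=0):
    """Corana-style SA: coordinate-wise uniform steps (Eq. 28), Metropolis
    acceptance, step adjustment every n_s sweeps (Eq. 32), cooling after
    n_t adjustments, and the stability test of Eq. (38)."""
    rng = random.Random(seed)
    omega, v, t = list(omega0), list(v0), t0
    p_cur = logp(omega)
    best, p_best, history = list(omega), p_cur, []
    for _ in range(max_cycles):                # temperature reductions
        for _ in range(n_t):
            accepted = [0] * len(omega)
            for _ in range(n_s):
                for j in range(len(omega)):
                    cand = list(omega)
                    cand[j] += rng.uniform(-1.0, 1.0) * v[j]   # Eq. (28)
                    p_new = logp(cand)
                    dp = p_new - p_cur
                    if dp >= 0 or math.exp(dp / t) >= rng.random():
                        omega, p_cur = cand, p_new
                        accepted[j] += 1
                        if p_cur > p_best:
                            best, p_best = list(omega), p_cur
            for j in range(len(v)):                            # Eq. (32)
                r = accepted[j] / n_s
                if r > 0.6:
                    v[j] *= 1.0 + c * (r - 0.6) / 0.4
                elif r < 0.4:
                    v[j] /= 1.0 + c * (0.4 - r) / 0.4
        history.append(p_cur)
        if (len(history) > 4
                and all(abs(p_cur - p) <= eps for p in history[-5:-1])
                and abs(p_cur - p_best) <= eps):               # Eq. (38)
            break
        omega, p_cur = list(best), p_best    # restart from the best point
        t *= a                               # geometric cooling
    return best, p_best
```

For instance, `anneal(lambda w: -(w[0] - 0.3) ** 2, [0.0], [0.5], t0=1.0)` homes in on the maximum near 0.3.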

| Parameters | Estimated values |
|------------|------------------|
| $\omega_1$ | 0.2998 ± 0.0005 |
| $B_1$ | 0.6035 ± 0.0589 |
| $B_2$ | -0.8174 ± 0.0626 |

$\sigma$: Known, $\hat{\sigma}$: 1, $N$: 512, SNR: 0.725464

**Table 1.** Computer simulations for a single harmonic frequency model

Once the frequencies are estimated, we then carried on calculating the amplitudes and parameter errors approximately using Equations (25), (23) and (26), respectively. However, an evaluation of the posterior PDF at a given point **ω** cannot be made analytically. It requires a numerical calculation of projections onto orthonormal model functions, related to Eigen-decomposition of the 2 2 dimensional matrix **ω** . Therefore, the proposed algorithm was coded in *Mathematica* programming language (Wellin, P., et al., 2005), that provides a much flexible and efficient computer programming environment. Furthermore, it also contains a large collection of built-in functions so that it results much shorter computer codes than those written in C or FORTRAN programming languages.

The computer program was run on the workstation with four processors, which of each has got Intel Core 2 Quad Central Processing Unit (CPU), in two cases where the standard deviation of noise is known or not. The output of the computer simulations when 1 is illustrated in Table 1. The results when is unknown are almost similar with that of Table 1. Parameter values are quoted as (*value*) ± (*standard deviation*). It can be seen that a single frequency and amplitudes are recovered very well. The estimated value of SNR and the standard deviation of the noise are also shown in Table 1.

In our second example, we consider a signal model with two close harmonic frequencies:

Bayesian Recovery of Sinusoids with Simulated Annealing 79

$$\begin{aligned} d\_i &= 0.5403 \cos\left(0.3t\_i\right) - 0.8415 \sin\left(0.3t\_i\right) \\ &- 0.4161 \cos\left(0.31t\_i\right) - 0.9093 \sin\left(0.31t\_i\right), \quad (i = 1, \ldots, 512) . \end{aligned} \tag{44}$$

In a similar way, we produced the same size data corrupted by the zero mean Gaussian noise with 1 . We run our *Mathematica* code again in the case where the deviation of noise is unknown. The results, shown in Table 2, indicate that all values of the parameters within the calculated accuracy are clearly recovered.


| Parameters | Estimated values |
|------------|------------------|
| $\omega_1$ | 0.3000 ± 0.0009 |
| $\omega_2$ | 0.3105 ± 0.0010 |
| $B_1$ | 0.5206 ± 0.0632 |
| $B_2$ | -0.8698 ± 0.0631 |
| $B_3$ | -0.3819 ± 0.0634 |
| $B_4$ | -0.8556 ± 0.0636 |

$\sigma$: Unknown, $\hat{\sigma}$: 0.99639, $N$: 512, SNR: 1.02936

**Table 2.** Computer simulations for the two close harmonic frequency model

78 Simulated Annealing – Advances, Applications and Hybridizations

3 : 3 *SNR* : 0.725464

ˆ : 1

 : Known *N* : 512

Eigen-decomposition of the 2 2

illustrated in Table 1. The results when

sequence of points is generated until a sort of equilibrium is approached; that is a sequence of points <sup>012</sup> **ωωω** , , ,... whose average value of *P I* **ω D**, reaches a stable value as iteration proceeds. During this phase the step vector **v** is periodically adjusted by the rule defined in Equation (32). The best point **ω** reached so far is recorded. After thermal equilibration, the temperature is reduced by using the annealing scheduled in Equation (37) with a factor

 0.85 and a new sequence is made starting from this best point **ω** , until thermal equilibrium is reached again and so on. The SA algorithm first builds up a rough view of the surface by moving large step lengths. As the temperature falls and the step decreases, it is slowly focuses on the most promising area. Therefore it proceeds toward better maximum even in the presence of many local maxima by adjusting the step vector that can allow the algorithm to shape the space within it may move, to that within which ought to be as defined by the PDF *P I* **ω D**, . Consequently, the process is stop at a temperature low enough that no more useful improvement can be expected, according to a stopping criterion in Equation (38).

Once the frequencies are estimated, we then carry on to calculate the amplitudes and the parameter errors approximately, using Equations (25), (23) and (26), respectively. However, an evaluation of the posterior PDF at a given point **ω** cannot be made analytically: it requires a numerical calculation of the projections onto the orthonormal model functions, related to the multidimensional matrix **ω**. Therefore, the proposed algorithm was coded in the *Mathematica* programming language (Wellin et al., 2005), which provides a flexible and efficient computer programming environment. Furthermore, it contains a large collection of built-in functions, so that it results in much shorter computer codes than those written in the C or FORTRAN programming languages.

The computer program was run on a workstation with four processors, each of which has an Intel Core 2 Quad Central Processing Unit (CPU), in two cases: one where the standard deviation of the noise is known and one where it is not. The output of the computer simulations for a single harmonic frequency model is given in Table 1, where parameter values are quoted as (*value*) ± (*standard deviation*). It can be seen that the single frequency and the amplitudes are recovered very well. The estimated values of the SNR and the standard deviation of the noise are also shown in Table 1. The output of the computer simulations when the standard deviation of the noise is unknown is almost similar to that of Table 1.

| Parameters | Estimated values |
|---|---|
| *ω*<sub>1</sub> | 0.2998 ± 0.0005 |
| *B*<sub>1</sub> | 0.6035 ± 0.0589 |
| *B*<sub>2</sub> | −0.8174 ± 0.0626 |

**Table 1.** Computer simulations for a single harmonic frequency model

In our second example, we consider a signal model with two close harmonic frequencies; the recovered signal is compared with the noisy data in Figure 2.
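Given the estimated frequencies, the amplitude step can be illustrated with an ordinary least-squares fit of the cosine and sine coefficients — a sketch of what Equations (23) and (25) accomplish through orthonormal projections, not the chapter's code. The model, noise level, and the Table 1 amplitude values reused below are illustrative assumptions:

```python
import math
import random

def design_row(t, omegas):
    # model functions G: a cosine and a sine column for each frequency
    return [f(w * t) for w in omegas for f in (math.cos, math.sin)]

def lstsq_amplitudes(times, data, omegas):
    """Solve the normal equations G^T G B = G^T d for the amplitudes B,
    with the frequency estimates held fixed."""
    G = [design_row(t, omegas) for t in times]
    m = len(G[0])
    # normal matrix and right-hand side
    A = [[sum(G[i][j] * G[i][k] for i in range(len(times))) for k in range(m)]
         for j in range(m)]
    b = [sum(G[i][j] * data[i] for i in range(len(times))) for j in range(m)]
    # Gaussian elimination with partial pivoting
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, m):
            factor = A[r][col] / A[col][col]
            for k in range(col, m):
                A[r][k] -= factor * A[col][k]
            b[r] -= factor * b[col]
    B = [0.0] * m
    for r in range(m - 1, -1, -1):
        B[r] = (b[r] - sum(A[r][k] * B[k] for k in range(r + 1, m))) / A[r][r]
    return B

rng = random.Random(0)
times = list(range(64))
true_B = [0.6035, -0.8174]          # cosine and sine amplitudes, as in Table 1
data = [true_B[0] * math.cos(0.3 * t) + true_B[1] * math.sin(0.3 * t)
        + rng.gauss(0.0, 0.1) for t in times]
B_hat = lstsq_amplitudes(times, data, [0.3])
```

The recovered `B_hat` lands within a few hundredths of the true amplitudes, which is the expected scale of error for this noise level and data length.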

In general, we consider a multiple harmonic frequency model signal:

$$\begin{aligned} d\_i &= \cos\left(0.1\text{ t}\_i + 1\right) + 2\cos\left(0.15t\_i + 2\right) + 5\cos\left(0.3t\_i + 3\right) \\ &+ 2\cos\left(0.31t\_i + 4\right) + 3\cos\left(t\_i + 5\right) + e\_i \qquad (i = 1, \dots, 512). \end{aligned} \tag{45}$$

The best estimates of the parameters are tabulated in Table 3. Once again, all the frequencies have been well resolved, even the third and the fourth, which are too close to be separated by the Fourier power spectral density shown in Figure 3. In fact, with the Fourier spectral density, when the separation of two frequencies is less than the Nyquist step, defined as 2π/*N*, the two frequencies are indistinguishable. This is simply because there are no sample points between the two frequencies in the frequency domain. If |*ω*<sub>3</sub> − *ω*<sub>4</sub>| ≥ 2π/*N*, the two frequencies can theoretically be distinguished; if |*ω*<sub>3</sub> − *ω*<sub>4</sub>| is not large enough, the resolution will still be very poor, and it is hard to tell where the two frequencies are located. This is an inherent problem of the discrete Fourier power spectral density. In this example the two frequencies are separated by 0.01, which is less than the Nyquist step size, so there is no way to resolve them with the Fourier power spectral density. However, the Bayesian power spectral density shown in Figure 3 gives very good results with high accuracy.

Finally, we constructed the signal model in Equation (3), whose parameters, amplitudes and frequencies, were randomly chosen from uniform distributions in the intervals [−5, 5] and (0, π), respectively, and used it to generate *N* = 512 data samples by adding zero mean Gaussian random noise with σ = 1. The proposed algorithm was rerun to recover the sinusoids, and the results are tabulated in Table 4. It can be seen that the frequencies, especially, are recovered with high accuracy. The spectral analysis of the ten-frequency signal model is shown in Figure 5.
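The resolution argument can be checked numerically: generate data from Equation (45) and scan the Schuster periodogram on the DFT grid. The pair at 0.30 and 0.31 is separated by less than one grid step of 2π/*N* ≈ 0.0123, so the Fourier spectrum shows only a single peak in that region (Python sketch; the noise seed is arbitrary):

```python
import math
import random

# data drawn from the model of Equation (45)
N = 512
rng = random.Random(2)
freqs = [0.10, 0.15, 0.30, 0.31, 1.00]
amps = [1.0, 2.0, 5.0, 2.0, 3.0]
phases = [1.0, 2.0, 3.0, 4.0, 5.0]
data = [sum(a * math.cos(w * t + p) for w, a, p in zip(freqs, amps, phases))
        + rng.gauss(0.0, 1.0) for t in range(N)]

def periodogram(d, omega):
    # Schuster periodogram |sum_t d_t exp(-i omega t)|^2 / N
    re = sum(x * math.cos(omega * t) for t, x in enumerate(d))
    im = sum(x * math.sin(omega * t) for t, x in enumerate(d))
    return (re * re + im * im) / len(d)

nyquist_step = 2 * math.pi / N                  # ~0.0123: one DFT grid step
unresolvable = abs(0.31 - 0.30) < nyquist_step  # the close pair sits inside one step
grid = [2 * math.pi * k / N for k in range(N // 2)]
peak_omega = max(grid, key=lambda w: periodogram(data, w))
```

The strongest grid peak lands between 0.29 and 0.31, a single merged response for the two close tones, exactly the failure mode the Bayesian spectral density avoids.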

**Figure 1.** Recovering signal from noisy data produced from a single harmonic frequency signal model.


Bayesian Recovery of Sinusoids with Simulated Annealing 81


| Parameters | True values | Estimated values | Bretthorst's results |
|---|---|---|---|
| *ω*<sub>1</sub> | 0.10 | 0.1006 ± 0.0004 | 0.0998 ± 0.0001 |
| *ω*<sub>2</sub> | 0.15 | 0.1495 ± 0.0002 | 0.1498 ± 0.0002 |
| *ω*<sub>3</sub> | 0.30 | 0.3001 ± 0.0001 | 0.3001 ± 0.0002 |
| *ω*<sub>4</sub> | 0.31 | 0.3099 ± 0.0004 | 0.3120 ± 0.0001 |
| *ω*<sub>5</sub> | 1.00 | 1.0000 ± 0.0001 | 0.9999 ± 0.0001 |
| *a*<sub>1</sub> | 1.00 | 0.9905 ± 0.08 | 0.99 ± 0.08 |
| *a*<sub>2</sub> | 2.00 | 1.9600 ± 0.08 | 2.08 ± 0.08 |
| *a*<sub>3</sub> | 5.00 | 5.1058 ± 0.09 | 4.97 ± 0.08 |
| *a*<sub>4</sub> | 2.00 | 1.8199 ± 0.09 | 1.95 ± 0.08 |
| *a*<sub>5</sub> | 3.00 | 2.9556 ± 0.08 | 2.92 ± 0.08 |

**Table 3.** Computer simulations for a multiple harmonic frequency model<sup>2</sup>

<sup>2</sup> In order to compare the results with Bretthorst's, in this example we converted the cosine and sine amplitudes into single amplitudes *a*<sub>*i*</sub>, (*i* = 1, …, 5).


**Figure 2.** Recovering signals from noisy data produced from a two close harmonic frequency signal model.


80 Simulated Annealing – Advances, Applications and Hybridizations


On the other hand, modifications of this algorithm (Üstündag & Cevri, 2008; 2011) have already been made by generating the next candidate in Equation (33) from a normal distribution whose mean is the current estimate and whose standard deviation is the square root of the CRLB (Ireland, 2007) given in Equation (36). Since the CRLB is a lower limit to the variance of the frequency measurements, this generates a natural scale for the search space around the estimated values. Better solutions are expected to lie close to ones that are already good, and so a normally distributed step size is used. Consequently, the results we obtained are comparable with, or better than, those obtained previously. In addition, all the results discussed so far are consistent with those of Bretthorst (Bretthorst, 1988) and demonstrate the advantages of the Bayesian approach combined with the SA algorithm. Moreover, the algorithm appears to be very reliable, in the sense that it always converged to a neighborhood of the global maximum. The size of this neighborhood can be reduced by altering the control parameters of the SA algorithm, but this can be expensive in terms of CPU consumption.

Moreover, we initially assumed that the values of the random noise in the data were drawn from a Gaussian density with mean 0 and standard deviation σ. Figure 4 shows the exact and estimated probability densities of the random noise in the data. It is seen that the estimated (dotted) probability density is close to the true (solid) probability density, and the histogram of the data is also close to the true probability density. The histogram is a nonparametric estimator of the probability density because it does not depend on specified parameters.
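The modified proposal can be sketched as follows. The single-tone CRLB formula used for the scale is a standard closed form quoted here as an assumption, standing in for the chapter's multi-frequency Equation (36):

```python
import math
import random

def crlb_freq_variance(sigma, amplitude, n):
    """Assumed asymptotic CRLB for the frequency of a single real sinusoid
    in white Gaussian noise:
        var(omega) >= 24 * sigma^2 / (amplitude^2 * n * (n^2 - 1))."""
    return 24.0 * sigma ** 2 / (amplitude ** 2 * n * (n ** 2 - 1))

def propose(current, sigma, amplitude, n, rng):
    # next candidate ~ Normal(current estimate, sqrt(CRLB)), mirroring the
    # modification of Equation (33) described above
    return rng.gauss(current, math.sqrt(crlb_freq_variance(sigma, amplitude, n)))

rng = random.Random(3)
scale = math.sqrt(crlb_freq_variance(1.0, 1.0, 64))
candidates = [propose(0.3, 1.0, 1.0, 64, rng) for _ in range(2000)]
mean = sum(candidates) / len(candidates)
spread = math.sqrt(sum((c - mean) ** 2 for c in candidates) / len(candidates))
```

The candidates cluster around the current estimate with a spread matching the bound, so the search concentrates exactly where better solutions are expected to lie.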

Computer simulations were also carried out to compare the performance of the method with the CRLB. To do this, we generated 64 data samples from a single real tone frequency signal model (*ω* = 0.3, *B*<sub>1</sub> = *B*<sub>2</sub> = 1) and added noise at a variety of levels. After 50 independent trials under different SNRs, the mean square errors (MSE) of the estimated frequencies were obtained, and their logarithmic values were plotted against SNR values between 0 and 20 dB (decibels). It can be seen from Figure 6 that the proposed estimator has a threshold at about 3 dB of SNR and follows the CRLB nicely above this value. As expected, a larger SNR gives a smaller MSE. However, many existing methods in the signal processing literature have an MSE close to the CRLB only when the SNR is more than 20 dB, and they usually perform poorly as the SNR is decreased.
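One point of the Figure 6 experiment can be reproduced in miniature: estimate the frequency by periodogram maximization over repeated noisy realizations and compare the resulting MSE with the single-tone CRLB. This is a sketch under assumed formulas, not the chapter's code, and it uses 20 trials instead of 50 to keep it fast:

```python
import math
import random

def periodogram_power(d, omega):
    re = sum(x * math.cos(omega * t) for t, x in enumerate(d))
    im = sum(x * math.sin(omega * t) for t, x in enumerate(d))
    return re * re + im * im

def estimate_frequency(d):
    # coarse periodogram search over (0, pi), then a local fine search
    coarse = max((k * 0.01 for k in range(5, 310)),
                 key=lambda w: periodogram_power(d, w))
    return max((coarse + (k - 50) * 2e-4 for k in range(101)),
               key=lambda w: periodogram_power(d, w))

# one high-SNR point of the experiment: N = 64, omega = 0.3, sigma = 0.1
N, omega0, sigma = 64, 0.3, 0.1
rng = random.Random(4)
errors = []
for _ in range(20):
    phase = rng.uniform(0.0, 2.0 * math.pi)
    d = [math.cos(omega0 * t + phase) + rng.gauss(0.0, sigma) for t in range(N)]
    errors.append(estimate_frequency(d) - omega0)
mse = sum(e * e for e in errors) / len(errors)
crlb = 24.0 * sigma ** 2 / (N * (N ** 2 - 1))  # assumed single-tone bound
```

At this SNR, well above the threshold region, the empirical MSE comes out within a small factor of the bound, consistent with the behavior plotted in Figure 6.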

**Figure 3.** Spectral analysis of multiple frequency signal model

**Table 4.** Computer simulations for ten harmonic frequency signal model


**Figure 4.** Comparison of the exact and estimated probability densities of the noise in the data (σ = 1).

**Figure 5.** Spectral analysis of ten frequency signal model.

The computational complexity of the algorithm depends on a few factors, such as the annealing schedule, the number of data samples and the number of parameters. Using the same annealing schedule, Fig. 7 shows the CPU time of different simulations for a variety of numbers of data samples. It can clearly be seen that an increase in these numbers causes a larger consumption of CPU time. With a fixed set of components and a specific annealing schedule for the SA algorithm, the overall execution time of the cooling and decision stages is almost constant, but the runtime of the first two stages (move and evaluate) depends mostly on the complexity of the design constraints and objective functions. Because the move and evaluation processes of the SA algorithm play an important role in CPU resource usage, improving the calculation ability of these stages is the most feasible approach to optimizing SA, and parallel computing is one of the best approaches to this goal.
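Because the chains of a multi-start SA are independent, the move and evaluate stages parallelize trivially across restarts. A sketch with Python threads follows (illustrative only: for a CPU-bound objective, processes rather than threads would be needed for a real speedup, and the objective is again a stand-in surface, not the chapter's posterior):

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

def objective(w):
    # stand-in for the posterior surface (assumption)
    return math.exp(-(w - 1.2) ** 2 / 0.005)

def anneal_once(seed, t0=1.0, factor=0.85, moves=300, temps=40):
    """One independent annealing chain; each chain gets its own RNG seed."""
    rng = random.Random(seed)
    w = rng.uniform(0.0, math.pi)
    f = objective(w)
    best_w, best_f = w, f
    t, v = t0, 1.0
    for _ in range(temps):
        for _ in range(moves):
            c = min(math.pi, max(0.0, w + rng.uniform(-1.0, 1.0) * v))
            fc = objective(c)
            if fc >= f or rng.random() < math.exp((fc - f) / t):
                w, f = c, fc
                if f > best_f:
                    best_w, best_f = w, f
        t *= factor                  # cooling stage: cheap and fixed-cost
        v = max(v * 0.95, 0.05)      # slowly contract the step length
    return best_w, best_f

# the expensive move/evaluate work runs concurrently across restarts
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(anneal_once, range(4)))
best_w, best_f = max(results, key=lambda r: r[1])
```

Taking the best result over the pool of chains gives the multiple-runs robustness mentioned in the conclusion at roughly the wall-clock cost of a single chain when true parallel workers are available.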


**Figure 6.** The calculated MSE of the proposed method compared with the CRLB versus different SNR values, with white noise.

**Figure 7.** CPU times versus the number of parameters and data samples.

## **6. Conclusion**

In this work we have developed a Bayesian approach combined with the SA algorithm and applied it to spectral analysis, for estimating the parameters of sinusoids from noisy data. Overall, the results presented here show that it provides a rational approach by which the parameters, namely the frequencies and the amplitudes, can be recovered from the experimental data and the prior information with high accuracy, especially the frequencies, which are the most important parameters in spectral analysis. A significant advantage of this approach comes from the very large posterior probabilities, which are sharply peaked in the neighborhood of the best fit. This simplifies the problem of choosing starting values for the iteration, and it provides a rational way of estimating, in an optimal manner, the values of the parameters by performing a random search. On the other hand, for sufficiently high SNR, the MSEs of the frequencies attain the CRLB, which justifies the accuracy of the frequency estimation. Although the SA algorithm consumes a large amount of CPU time, it is competitive when compared to the multiple runs often needed with conventional algorithms to test different starting values. As expected, a parallel implementation of the SA algorithm reduces CPU resource usage.

The data analysis given in this chapter has also been applied to more complicated models and conditions, such as decaying signals, periodic but non-harmonic signals, non-stationary signals, and so on. In general, we have not addressed the problem of model selection, which is part of spectral analysis. In this case, one has enough prior information in a given experiment to select the best model among a finite set of model functions, so that Bayesian inference helps us to accomplish it. Therefore, it will deserve further investigation.

## **Nomenclature**

**B** : Nuisance parameters
*B*<sub>*j*</sub> : Amplitude of the *j*th sinusoid
*b*<sub>*jk*</sub> : A matrix defined in the text
*c*<sub>*j*</sub> : A parameter which controls the step variation along the *j*th direction
**D** : A set of observed data
*E*(.) : Expectation of a function or random variable
*e*(*t*) : Measurement errors
*G*(.;.) : Model functions that contain sine and cosine terms
**h** : Projections of data onto orthonormal model functions
**h**<sup>2</sup> : Mean of the square of the projections
*H*(.;.) : Orthonormal model functions
**I** : Prior information
**J**(.) : Fisher information matrix
**J**<sup>−1</sup> : Inverse of the Fisher information matrix
*k* : Number of iterations
*N* : Number of data samples
*N*(0,1) : A standard Normal distribution
*n*<sub>*j*</sub> : The number of accepted moves along the direction *j*
*P*(.) : Marginal probability density function
*P*(.|.) : Conditional probability density function
*Q* : Least-squares function
*S*(.) : Bayesian power spectral density
*s* : Sampling variance
*T*<sub>0</sub> : The initial temperature
(*t*<sub>1</sub>, *t*<sub>2</sub>, …, *t*<sub>*N*</sub>)<sup>*T*</sup> : Discrete time set
*u*<sub>*kj*</sub> : *j*th eigenvector of the matrix *b*<sub>*jk*</sub>
**v**<sub>*k*</sub> : Step-length vector
**v**<sup>0</sup> : Trial step-length vector
**θ** : Parameters vector
*λ*<sub>*j*</sub> : *j*th eigenvalue of the matrix *b*<sub>*jk*</sub>
**ω** : Vector of angular frequencies
**ω**<sup>0</sup> : An initial guess vector for the frequencies
**ω̂** : Estimated angular frequency vector
**Ω** : Diagonal matrix
|.| : Absolute value
 : Number of frequencies
 : A small positive number defined by the user
 : Standard deviation of an angular frequency
 : Standard deviation of an amplitude
 : Variance of the noise
 : Expected value of the noise variance
 : Solution space of the angular frequencies
 : CRLB for the *j*th component of **ω**<sub>*k*</sub>
 : *j*th component of the normalized eigenvalues of *b*<sub>*jk*</sub>
 : *j*th component of the *k*th normalized eigenvector of *b*<sub>*jk*</sub>
 : Uniformly distributed random number from the interval [−1, 1]
 : A factor between 0 and 1

CRLB : Cramér–Rao lower bound
dB : Decibel
MSE : Mean squared error
PDF : Probability density function
SA : Simulated annealing
SNR : Signal-to-noise ratio
SQ : Simulated quenching





## **Author details**

Dursun Üstündag *Marmara University, Faculty of Science and Letters, Department of Mathematics, Turkey* 

Mehmet Cevri *Istanbul University, Faculty of Science, Department of Mathematics, Turkey* 

## **Acknowledgement**

This work is a part of the projects "Bayesian Model Selection and Parameter Estimation" (number FEN-DKR-200407-0082) and "Bayesian Spectrum Estimation of Harmonic Signals" (number FEN-BGS-060907-0191), supported by Marmara University, Istanbul, Turkey.

## **7. References**

Aguiare, H., Junior, O., Ingber, L., Petraglia, A., Petraglia, M.R. & Machado, M.A.S. (2012). *Stochastic Global Optimization and Its Applications with Fuzzy Adaptive Simulated Annealing*, Springer, Heidelberg New York Dordrecht London, pp. 36-57.

Barbedo, J.G.A. & Lopes, A. (2009). Estimating frequency, amplitude and phase of two sinusoids with very close frequencies, *International Journal of Signal Processing*, Vol. 5, No. 2, pp. 138-145.

Bernardo, J.M. & Smith, A.F.M. (2000). *Bayesian Theory*, Wiley, New York.

Binder, K. (1986). *Monte Carlo Methods in Statistical Physics*, 2nd Edition, Springer-Verlag.

Bretthorst, G.L. (1988). *Bayesian Spectrum Analysis and Parameter Estimation*, Lecture Notes in Statistics, Springer-Verlag, Berlin Heidelberg, New York.

Bretthorst, G.L. (1990). An introduction to parameter estimation using Bayesian probability theory, *Maximum Entropy and Bayesian Methods*, Kluwer Academic Publishers, pp. 53-79.

Brooks, S.P. & Morgan, B.J.T. (1995). Optimization using simulated annealing, *Journal of the Royal Statistical Society* (The Statistician) 44, pp. 241-257.

Chan, K.W. & So, H.C. (2004). Accurate frequency estimation for real harmonic sinusoids, *IEEE Signal Processing Letters* 11, pp. 609-612.

Cooley, J.W. & Tukey, J.W. (1964). An algorithm for the machine calculation of complex Fourier series, *Mathematics of Computation* 19, pp. 297-301.

Corana, A., Marchesi, M., Martini, C. & Ridella, S. (1987). Minimizing multimodal functions of continuous variables with the simulated annealing algorithm, *ACM Transactions on Mathematical Software* 13, pp. 262-280.

Edwards, A.W.F. (1972). *Likelihood*, Cambridge University Press.

Goffe, W.L., Ferier, G.D. & Rogers, J. (1994). Global optimization of statistical functions with simulated annealing, *Journal of Econometrics* 60, pp. 65-100.

Gregory, P.C. (2005). *Bayesian Logical Data Analysis for the Physical Sciences*, Cambridge University Press.

Harney, H.L. (2003). *Bayesian Inference: Parameter Estimation and Decisions*, Springer-Verlag, Berlin Heidelberg.

Hooke, R. & Jeeves, T.A. (1962). Direct search solution of numerical and statistical problems, *Journal of the Association for Computing Machinery* 5, pp. 212-229.

Ingber, L. (1994). Simulated annealing: Practice versus theory, *Mathematical and Computer Modelling* 18, pp. 29-57.

Ingber, L. (1996). Adaptive simulated annealing (ASA): Lessons learned, *Control and Cybernetics* 25, pp. 33-54.

Ireland, J. (2007). Simulated annealing and Bayesian posterior distribution analysis applied to spectral emission line fitting, *Journal of Solar Physics* 243, pp. 237-252.

Jaynes, E.T. (1987). Bayesian spectrum and chirp analysis, in *Maximum Entropy and Bayesian Spectral Analysis and Estimation Problems*, C.R. Smith & G.J. Erickson (eds.), D. Reidel, Dordrecht, pp. 1-37.

Jaynes, E.T. (2003). *Probability Theory: The Logic of Science*, Cambridge University Press.

Jeffreys, H. (1961). *Theory of Probability*, Oxford University Press, London.

Kay, S.M. (1993). *Fundamentals of Statistical Signal Processing: Estimation Theory*, Prentice Hall, Vol. 1.

Kirkpatrick, S., Gelatt, C.D. & Vecchi, M.P. (1983). Optimization by simulated annealing, *Science* 220, pp. 671-680.

Laarhoven, P.V. & Aarts, E. (1987). *Simulated Annealing: Theory and Applications*, D. Reidel Publishing Company.

Lagrange, M., Marchand, S. & Rault, J.-R. (2005). Improving sinusoidal frequency estimation using a trigonometric approach, *Proceedings of the Digital Audio Effects Conference*, Spain, pp. 110-115.

Locatelli, M. (2000). Simulated annealing algorithms for continuous global optimization: Convergence conditions, *Journal of Optimization Theory and Applications* 106, pp. 121-133.

McWhorter, T. & Scharf, L.L. (1993). Cramer-Rao bound for deterministic modal analysis, *IEEE Transactions on Signal Processing* 41, pp. 1847-1866.

Metropolis, N., Rosenbluth, A., Rosenbluth, M., Teller, A.M. & Teller, E. (1953). Equation of state calculations by fast computing machines, *Journal of Chemical Physics* 21, pp. 1087-1092.

Nishiyama, K. (1997). A nonlinear filter for estimating a sinusoidal signal and its parameters in white noise: On the case of a single sinusoid, *IEEE Transactions on Signal Processing* 45, pp. 970-981.

Norton, J.P. (1986). *An Introduction to Identification*, Academic Press.

Press, W.H., Flannery, B.P., Teukolsky, S.A. & Vetterling, W.T. (1995). *Numerical Recipes in C: The Art of Scientific Computing*, Cambridge University Press.

Ruanaidh, J.J.K.Ó. & Fitzgerald, W.J. (1996). *Numerical Bayesian Methods Applied to Signal Processing*, Springer-Verlag, New York.

Schuster, A. (1905). The periodogram and its optical analogy, *Proceedings of the Royal Society of London, Series A* 77, pp. 136-140.

Stefankovic, D., Vempala, S. & Vigoda, E. (2009). Adaptive simulated annealing: A near-optimal connection between sampling and counting, *Journal of the ACM* 56, doi:10.1145/1516512.1516520.

Stoica, P. & Nehorai, A. (1989). MUSIC, maximum likelihood, and Cramer-Rao bound, *IEEE Transactions on Acoustics, Speech, and Signal Processing* 37, pp. 720-741.

Tsoulos, G. & Lagaris, I.E. (2006). GenAnneal: Genetically modified simulated annealing, *Computer Physics Communications* 174, pp. 846-85.

Üstündag, D., Queen, N.M. & Bowcock, J.E. (1989). A method for retrieving images from noisy, incomplete data, *Proceedings of the Fifth Alvey Vision Conference*, University of Reading, pp. 239-245.

Üstündag, D., Queen, N.M., Skinner, G.K. & Bowcock, J.E. (1991). Two new methods for retrieving an image from noisy, incomplete data and comparison with the Cambridge MaxEnt package, *10th International Workshop on Maximum Entropy and Bayesian Methods*, MaxEnt 90, pp. 295-301.

Üstündağ, D. & Cevri, M. (2008). Estimating parameters of sinusoids from noisy data using Bayesian inference with simulated annealing, *WSEAS Transactions on Signal Processing* 7, pp. 432-441.

Üstündağ, D. & Cevri, M. (2011). Recovering sinusoids from noisy data using Bayesian inference with simulated annealing, *Mathematical and Computational Applications* 16, pp. 382-391.

Üstündağ, D. (2011). Recovering sinusoids from data using Bayesian inference with RJMCMC, *Proceedings of the Seventh International Conference on Natural Computation (ICNC)*, Shanghai, pp. 1850-1854.

Vasan, A. & Raju, K.S. (2009). Comparative analysis of simulated annealing, simulated quenching and genetic algorithms for optimal reservoir operation, *Applied Soft Computing* 9, pp. 274-281.

Wellin, P., Gaylord, R. & Kamin, S. (2005). *An Introduction to Programming with Mathematica*, Cambridge University Press.



**Chapter 5**

**Simulated Annealing: A Novel Application of Image Processing in the Wood Area**

Cristhian A. Aguilera, Mario A. Ramos and Angel D. Sappa

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/50635

© 2012 Aguilera et al., licensee InTech. This is an open access chapter distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

## **1. Introduction**


Material's internal structure knowledge is highly relevant to improve quality indexes [1]. For example, the exact information of the internal structure of a wood log or lumber such as density and internal defects is an important economic advantage (see [2]). Internal characteristics of materials have been studied with different non-destructive techniques, including ultrasound [3], microwaves [4-7,8], gamma rays, X-rays, nuclear magnetic resonance and, lately, artificial vision techniques [9]. X-rays have been used to examine the internal characteristics of many materials [10-13].

In the wood industry, some applications have been found that allow the recognition of different types of defects of the raw material [14,15]. Methods for measuring moisture [16- 18] have also been developed. Some works have been presented related to knot detection [19,20]. Recently, several approaches have been proposed in the literature along these lines of research; for example in the detection of wood defects [21,22], and more specifically to detect tree-ring knots in wood [23]. However, automatic defect recognition in the manufacture process is ever more necessary and fundamental.

An X-ray computerized tomography reflects variations in the density of an object or body. X-ray sensors reveal the shape of a log below the bark, allowing the detection of macroscopic and microscopic aspects. The latter favors its application in the wood industry; for example, Figure 1 shows X-ray tomography images of a wood log, in which internal features such as knots and the knotty cylinder can be clearly seen: (a) wood log, (b) X-ray computed tomography, (c) and (d) images of two cuts.

The current work is focused on the visual inspection of wood in order to automatically detect classical defects such as knots and knotty cylinders. The proposed approach is based on the use of simulated annealing in deformable contours and X-ray tomography. The remainder of the manuscript is organized as follows. Firstly, the deformable contour technique, which is used to represent a detected defect, is presented. Then, Section 3 introduces the simulated annealing-based optimization that is used to obtain an accurate description of the wood's defects. Experimental results and discussion are given in Section 4. Finally, conclusions and future work are provided in Section 5.


**Figure 1.** X-ray tomography process of wood log.

## **2. The deformable contour**

A deformable contour or snake [24] is a parametric curve of the type:

$$u(s) = (x(s), y(s)) \tag{1}$$

where *x(s)* and *y(s)* are coordinates along the contour and $s \in [0,1]$. These contours are influenced by internal and external forces, and by forces typically related to the gradient of the intensity of the image; mathematically:

$$E\_{snake} = \int\_0^1 \left[ E\_{int}(u(s)) + E\_{ext}(u(s)) + E\_{img}(u(s)) \right] ds \tag{2}$$

where:

$$E\_{int}\left(u\right) = \underbrace{\alpha\left(s\right)\left|u\_{s}\left(s\right)\right|^{2}}\_{\text{Tension}} + \underbrace{\beta\left(s\right)\left|u\_{ss}\left(s\right)\right|^{2}}\_{\text{Stiffness}}$$

$E\_{ext}$ = shape energy measuring external constraints, either from higher-level shape information or user-applied energy.

$$E\_{img} = -a \|\nabla I\|^2$$

The internal energy, $E_{int}$, with the alpha and beta parameters, controls the tension and rigidity of the contour; $E_{ext}$ represents measures or external considerations with respect to the shape of the contour; and $E_{img}$ is the force related to the gradient of the image, given a Gaussian-type convolution filter. The contour is initially located near the object to be segmented in the image. Then, the attraction and repulsion forces generated by the internal, external and image forces deform the contour until it surrounds the object, thereby isolating the object with respect to the overall image. Minimizing the energy equation yields the final solution of the contour (see Fig. 2).
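To make the roles of these terms concrete, the energy of a discretized (polygonal) contour can be sketched as below. This is a minimal illustration, not the chapter's implementation; the uniform `alpha` and `beta`, and the precomputed gradient-magnitude image `grad_mag`, are our own assumptions:

```python
import numpy as np

def snake_energy(pts, grad_mag, alpha=1.0, beta=0.1):
    """Discrete analogue of E_snake for a closed contour.

    pts      : (n, 2) array of (x, y) vertex coordinates
    grad_mag : 2D array holding ||grad I|| at each pixel
    """
    # Tension term alpha*|u_s|^2, approximated by first differences.
    d1 = np.roll(pts, -1, axis=0) - pts
    e_tension = alpha * np.sum(d1 ** 2)
    # Stiffness term beta*|u_ss|^2, approximated by second differences.
    d2 = np.roll(pts, -1, axis=0) - 2 * pts + np.roll(pts, 1, axis=0)
    e_stiffness = beta * np.sum(d2 ** 2)
    # Image term -||grad I||^2, sampled at the rounded vertex positions.
    ix = np.clip(pts[:, 0].round().astype(int), 0, grad_mag.shape[1] - 1)
    iy = np.clip(pts[:, 1].round().astype(int), 0, grad_mag.shape[0] - 1)
    e_image = -np.sum(grad_mag[iy, ix] ** 2)
    return e_tension + e_stiffness + e_image
```

A contour lying on strong image gradients lowers the image term, while the tension and stiffness terms penalize stretched or sharply bent configurations.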

**Figure 2.** Evolution of a deformable contour.


Figure 3 presents a tomographic image of a log; Fig. 3(a) shows the tomographic image itself, in which two knots, and the appearance of a third knot, can clearly be seen at the bottom of the image; Fig. 3(b) shows the enlarged sector of the image corresponding to a particular knot. Figure 3(c) represents the gradient of the image; the most intense points represent the greatest values of the gradient. These forces guide the contour around the knot in order to completely isolate it, as shown in Fig. 3(d). One of the problems to confront is to identify, given the energy function and initial contour, the location of each new point of the contour. This can be done with different search algorithms.

The deformable contour presented in Fig. 3(d) captures the knot's shape present in the image, but also some inaccuracies due to noisy data. These inaccuracies can be due to noise or to distracter points; both distort the contour by attracting or repulsing it (Fig. 4). This is the case in X-ray tomography images, in which other objects or variations in object density (e.g., other knots or water saturation in wood pieces) artificially distort the shape of the objects to detect, provoking a large error in the segmentation.


**Figure 3.** (a) Tomographic image; (b) knot (enlargement); (c) gradient of the image; (d) segmentation through a deformable contour.

For example, a tomographic image with distinct singularities or objects can be appreciated in Fig. 5. In (a) and (b), a tomography with low humidity is presented, yielding very clear and easy-to-segment images; in (c) and (d), however, an image with very high humidity can be appreciated. Since the humidity is not homogeneously distributed, the object to be segmented can be deformed.

Several techniques have been employed to improve these aspects, such as characteristic-extraction techniques and improvements in the energy function. However, when there is prior knowledge about the objects to detect, as in the case of well-defined objects with known forms and characteristics, this information can be incorporated into the energy function. Some approaches, like Deformable Templates, use templates defined a priori and transform the deformation problem into a template-adjustment problem. However, even though this produces good results, it suffers from rigidity due to the templates.

Besides this, incorporating more information into the energy function can, in the case of images of knots, significantly improve the quality of the final contours. Hence, the final segmentation can be improved by using methods that allow the contour to freely explore neighborhood areas during the evolution stage. The latter makes simulated annealing an ideal candidate for solving this type of problem.

**Figure 4.** Deformable contour and distracter points; (a) original contour; (b) final contour; (c) original contour with distracter point; and (d) final contour with distracter point.

## **3. Simulated Annealing**


Simulated annealing (SA) is a stochastic optimization technique introduced by Kirkpatrick [25]. The algorithm begins by selecting an initial solution and then generates a new state by randomly producing a solution in the neighbourhood of the current one; this is called a neighbour solution. This new state is evaluated and compared with the previous solution. If the new state is better than the previous one, it is accepted; if it is not, it is accepted or rejected with some probability. The probability of accepting a new state is given by:

$$e^{-\Delta E/T} > \mathbf{R} \tag{3}$$

with:
$\Delta E$: difference between the energies of the present and candidate solutions
$T$: temperature
$R$: uniform random number in [0, 1]

$\Delta E$ reflects the change in the objective function and $T$ is the current temperature. The way the temperature decreases during the run of the algorithm is commonly known as the cooling schedule; several types of cooling, as well as stopping criteria for the algorithm, can be found in the literature.
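As a sketch, the rule in (3) can be coded directly (a minimal illustration; the function name `accept` and the injectable random source are our own choices, not the chapter's code):

```python
import math
import random

def accept(delta_e, temperature, rng=random.random):
    """Acceptance test of Eq. (3): a candidate that lowers the energy
    is always taken; a worse one is taken with probability exp(-dE/T)."""
    if delta_e <= 0:
        return True  # improving moves are always accepted
    return math.exp(-delta_e / temperature) > rng()
```

At a high temperature almost any candidate passes the test; as $T$ decreases, the rule approaches pure greedy descent.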


**Figure 5.** X-ray tomography of the wood log.

Several implementations and variations of SA can be found in the literature: Threshold Accepting (TA) [26]; Boltzmann Annealing (BA) [27]; Simulated Quenching (SQ) [27]; and Fast Annealing (FA) [27], among others. The latter approaches, SQ and FA, mainly focus on speeding up the search for the right solution, which is one of the main drawbacks of BA.

In the current work a classical simulated annealing algorithm is used. A Boltzmann distribution (BA) is assumed and, consequently, a logarithmic cooling schedule is used: *T(k) = T0 / ln k*. It should be noted that, since the search space of our application is not too big, the slowness of the method is not a relevant problem. Figure 6 presents the pseudo-code of the algorithm used in the current work.
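For reference, the logarithmic schedule can be sketched as follows (the guard for *k* < 2, where ln *k* ≤ 0, is our own choice and is not specified in the chapter):

```python
import math

def boltzmann_temperature(k, t0=1000.0):
    """Logarithmic cooling schedule T(k) = T0 / ln k (Boltzmann annealing)."""
    if k < 2:
        return t0  # ln k is zero or negative here; hold the initial temperature
    return t0 / math.log(k)
```

This schedule decreases very slowly, which is precisely the slowness mentioned above; it is tolerable here because the search space is small.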

The ability to accept poor solutions early in the evolution of a deformable contour is a powerful tool for exploring complex areas of the image and obtaining better final solutions. Thus, SA provides a method of local evolution for each of the points of a contour, allowing the objects in a target to be represented more accurately.


**Figure 6.** Simulated annealing algorithm used in the current work.


For the case of the images obtained by X-ray computed tomography applied to wood, unlike the steepest descent method, which operates on the contour as a whole, there are other methods that use a local view to find a global minimum. In the current work we propose to use a method based on SA, which relies on a local view, in this case local portions of the contour, to find a global minimum of the energy function. SA explores the possibilities in a local environment of the contour and evolves in the direction of minimum energy. The method initially considers the generation of a contour in the 2D image space using a given number of vertices. A neighborhood patch of *n×n* pixels is created for every pixel in the contour. Then, in that neighborhood, a *new candidate vertex* is selected by evaluating it according to the energy function; the acceptance of this new vertex depends on criteria such as the number of iterations, the number of acceptances, and the probability of accepting a "bad solution" according to the temperature. Additionally, a criterion for the elimination and creation of vertices is used. This criterion is based on the distance between the vertices in the contour and allows a homogeneous distribution along the whole contour. Once all the vertices in the contour have been checked for a given temperature *Ti* (Fig. 6), the criterion for elimination and creation is applied. A vertex is eliminated if the distance to its neighbor is smaller than a minimum distance (*dmin*). Similarly, a vertex is created if the distance between two vertices is greater than a maximum allowed distance (*dmax*). Figure 7 shows an example where a neighborhood of *3×3* pixels is considered.
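The elimination/creation criterion described above can be sketched as a single pass over a closed contour (an illustrative sketch; the function name, the midpoint rule for new vertices, and the default thresholds are our assumptions, not the chapter's implementation):

```python
import math

def redistribute(vertices, d_min=2.0, d_max=10.0):
    """One pass of the dmin/dmax criterion over a closed contour.

    A vertex closer than d_min to its successor is eliminated; when two
    consecutive vertices are farther apart than d_max, their midpoint is
    created as a new vertex between them.
    """
    out = []
    n = len(vertices)
    for i, (x0, y0) in enumerate(vertices):
        x1, y1 = vertices[(i + 1) % n]  # successor on the closed contour
        d = math.hypot(x1 - x0, y1 - y0)
        if d < d_min:
            continue  # eliminate: too close to its neighbour
        out.append((x0, y0))
        if d > d_max:
            out.append(((x0 + x1) / 2, (y0 + y1) / 2))  # create a midpoint vertex
    return out
```

Applied after every temperature step, such a pass keeps the vertices homogeneously distributed along the contour.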

Figure 8 shows a deformable contour applied to a knot in an X-ray tomography. The contour, initially near the object to be segmented, is attracted and deformed by the force of the image until it completely surrounds the object. The forces coming from the gradient of the images guide the contour until it is located around the object's borders. In Fig. 8, we can see the evolution of the contour for the segmentation of knots in *Pinus radiata* wood. We can also see that this method can be strongly influenced by other objects in the image, or by points of intensity not related to the object, producing a poor segmentation of the defect.


**Figure 7.** Illustration of a contour evolution for a neighborhood patch of *3x3* pixels. For every contour vertex a new candidate vertex is created and accepted according to a set of user defined parameters.

**Figure 8.** Evolution of a deformable contour: (a) initial image; (b) and (c) initial contour; (d) final contour.

In general, the tomographies of logs show distortions in the shape of the knots, forces of attraction in other regions of the contour not associated with the knot (in this case, growth rings), and zones or points that influence the contour, thereby causing a poor final segmentation. Figures 9(a) and (b) present a tomographic image of a wood log; an enlargement of a knot is presented in Fig. 9(c); the corresponding gradient image is depicted in Fig. 9(d). Note that in Fig. 9(d), the most intense points represent the greatest values of the gradient. These forces guide the contour around the knot in order to completely isolate it, but not all of them are related to the knot. The latter is one of the problems to be properly solved by the segmentation methods.

**Figure 9.** (a) and (b) Tomographic image of a knot, (c) knot (enlargement), (d) gradient of the image.

## **4. Results and discussion**


According to the above description, and using the energy functions described in (2), we proceeded to analyse the images from X-ray computerized tomographies of logs with different characteristics. To carry out these experiments, we used discrete versions of the energy functions and two local optimization algorithms: one based on the greedy algorithm and another based on classic Boltzmann Annealing with T0 = 1000. The tests were carried out with tomographies of various logs obtained using a medical scanner. In total, 90 images were analysed. The most relevant results are shown in the following figures.

Figure 10 shows a sequence of images revealing the evolution of the knots inside a log. On the left, we can see the images of the complete log and, on the right, an enlargement of the image corresponding to a single knot. Figure 11 shows the situation of a knot that appears as an isolated object in the image: Fig. 11(a) shows the result of the segmentation using the greedy algorithm, and Fig. 11(b) presents the result using SA. In both cases, the segmentation gives good results; the contour becomes deformed around the knot, isolating it completely. It should be noted that, in this case, the knot in the image appears totally isolated and there are no objects that distort the contour deformation. Figure 12 presents the result of the segmentation in a case in which the knots are confused with the growth rings: Fig. 12(a) shows the result when the greedy algorithm is used, and Fig. 12(b) depicts the result with SA. In this case, a considerable difference can be appreciated between the two methods. The energy gradients generated by the image of the growth rings tend to move the contour away from the knot. However, with SA, due to its random exploration characteristic, the contour in general tends to segment the knot better. In Fig. 13, the situation is more


critical. In these images the knots are merged with the growth rings; in fact, the knot is only partially visible in the tomographic image, and it appears as an object with geometric characteristics different from those visible in the previous images. Figure 13(a) shows the result of the evolution of the contour with the greedy algorithm; in this case the algorithm clearly does not surround the knot correctly and the result of the segmentation is extremely poor. Nonetheless, in Fig. 13(b), with SA and thanks to a wider exploration, the general result is perceptibly better.


**Figure 10.** Tomography of a log; (left) transversal cuts; (right) enlarged image.


As we can see in the figures presented above, in some situations the images obtained by X-ray computerized tomography facilitate the segmentation. Nevertheless, in other situations the segmentation is seriously complicated by confusion with other objects in the image; in particular, growth rings and other objects distract the contours. In such cases, a classical optimization algorithm such as the greedy algorithm does not provide good results, principally due to its tendency to become trapped in local minima. However, SA provides a more robust method, presenting better results in these situations.

**Figure 11.** Evolution of an initial contour around an isolated knot; (a) greedy algorithm; (b) simulated annealing algorithm.

**Figure 12.** Evolution of an initial contour around a knot with interference from growth rings; (a) greedy algorithm; (b) simulated annealing algorithm.

**Acknowledgement**

This work was partially supported by the MECESUP2 Postdoctoral program and Regular Project 100710 3/R of the University of Bio-Bio, Chile, and by the Spanish Government under Research Program Consolider Ingenio 2010: MIPRCV (CSD2007-00018) and Project TIN2011-25606.

**6. References**

[1] Benson-Cooper D, Knowles R, Thompsom F, Cown D (1982) Detection of defaults in logs, Forest Research Institute of New Zealand 8: 9-12.

[2] Harless T, Wagner F, Steele P, Taylor F, Yadama V, McMillin C (1991) Methodology for locating defect within hardwood logs and determining their impact on lumber value yield, Forest Product Journal 41: 25-30.

[3] Sandoz J (1993) Moisture content and temperature effect on ultrasound timber grading, Wood Science and Technology 27: 373-380.

[4] Martin P, Collet P, Barthelemy P, Roussy G (1987) Evaluation of wood characteristics: internal scanning of the material by microwaves, Wood Science and Technology 21: 361-371.

[5] King R (1991) Measurement of basic weight and moisture content of composite board using microwave, VIII Int. Symp. on Non-destructive Testing of Wood 1: 21-32.

[6] Gapp V, Mallick G (1995) Capteur intelligent pour la mesure de l'humidité dans le matériau par technique micro-ondes, Journées Nationales Micro-ondes 73: 53-57.

[7] Baradit E, Aedo R, Correa J (2006) Knot detection in wood using microwaves, Wood Science and Technology 40: 118-123.

[8] Lundgren N, Brännström M, Hagman O, Oja J (2007) Predicting the strength of Norway spruce by microwave scanning: A comparison with other scanning techniques, Wood and Fiber Science 39: 167-172.

[9] Thomas L, Mili L (2006) Defect detection on hardwood logs using laser scanning, Wood and Fiber Science 38: 243-246.

[10] Funt B, Bryant E (1987) Detection of internal log defects by automatic interpretation of computer tomography images, Forest Product Journal 37(1): 56-62.

[11] Zhu D, Conners R, Schmoldt D, Araman P (1996) A prototype vision system for analyzing CT imagery of hardwood logs, IEEE Transactions on Systems, Man and Cybernetics 26: 522-532.

[12] Rojas G, Hernández R, Condal A, Verret D, Beauregard R (2005) Exploration of the physical properties of internal characteristics of sugar maple logs and relationships with CT images, Wood and Fiber Science 37: 591-604.

[13] Eriksson J, Johansson H, Danvind J (2006) Numerical determination of diffusion coefficients in wood using data from CT-scanning, Wood and Fiber Science 38: 334-344.

**Figure 13.** Evolution of an initial contour around a knot with interference from growth rings; (a) greedy algorithm; (b) simulated annealing algorithm

## **5. Conclusions**

From the experimental results presented above, we can conclude that the proposed method is a very good alternative for knot segmentation. The use of SA allows much more precise segmentation of knots with greater irregularities in the images. Nonetheless, the initial position of the deformable contour is of vital importance for a good identification. We can also use the results obtained to infer that both excessive humidity content in the logs and growth rings present inconveniences. The latter presents a great challenge since the knot shows up in the images obtained with X-rays both in isolation and confused with other objects. In the second case, additional information, for example morphological information on the knots, can be incorporated into the contour's energy function that describes the force of the image, offering a very good alternative. The cases examined correspond to segmentation by X-ray computerized tomography for logs having distortions mainly due to their storage; this causes the density of the material to increase, which then has an effect on the images. This case is of great interest for the wood industry. However, this technique can also be used in other situations such as the study of other defects in logs. The authors are currently developing 3D techniques and a way to reduce the method's computerization costs that was not considered in this first study.

## **Author details**

Cristhian A. Aguilera *Dept. Electrical and Electronic Engineering, University of Bio-Bio, Concepción, Chile* 

Mario A. Ramos *Dept. Wood Engineering, University of Bio-Bio, Concepción, Chile* 

Angel D. Sappa *Computer Vision Center, Autonomous University of Barcelona, Barcelona, Spain* 

## **Acknowledgement**

102 Simulated Annealing – Advances, Applications and Hybridizations

algorithm; (b) simulated annealing algorithm

costs that was not considered in this first study.

*Dept. Electrical and Electronic Engineering, University of Bio-Bio, Concepción, Chile* 

*Computer Vision Center, Autonomous University of Barcelona, Barcelona, Spain* 

*Dept. Wood Engineering, University of Bio-Bio, Concepción, Chile* 

**5. Conclusions** 

**Author details** 

Mario A. Ramos

Angel D. Sappa

Cristhian A. Aguilera

**Figure 13.** Evolution of an initial contour around a knot with interference from growth rings; (a) greedy

(a) (b)

From the experimental results presented above, we can conclude that the proposed method is a very good alternative for knot segmentation. The use of SA allows much more precise segmentation of knots with greater irregularities in the images. Nonetheless, the initial position of the deformable contour is of vital importance for a good identification. We can also use the results obtained to infer that both excessive humidity content in the logs and growth rings present inconveniences. The latter presents a great challenge since the knot shows up in the images obtained with X-rays both in isolation and confused with other objects. In the second case, additional information, for example morphological information on the knots, can be incorporated into the contour's energy function that describes the force of the image, offering a very good alternative. The cases examined correspond to segmentation by X-ray computerized tomography for logs having distortions mainly due to their storage; this causes the density of the material to increase, which then has an effect on the images. This case is of great interest for the wood industry. However, this technique can also be used in other situations such as the study of other defects in logs. The authors are currently developing 3D techniques and a way to reduce the method's computerization This work was partially supported from MECESUP2 Postdoctoral program and Regular Project 100710 3/R of the University of Bio-Bio, Chile; and the Spanish Government under Research Program Consolider Ingenio 2010: MIPRCV (CSD2007-00018) and Project TIN2011- 25606.

## **6. References**

	- [14] Sarigul E, Abbott L, Schmoldt D (2001) Nondestructive rule-based defect detection and indentification system in CT images of hardwood logs, Review of Progress in Nondestructive Evaluation 20: 1936-1943.

**Chapter 6**

**Applications of Simulated Annealing-Based Approaches to Electric Power Systems**

Yann-Chang Huang, Huo-Ching Sun and Kun-Yuan Huang

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/50388

© 2012 Huang et al., licensee InTech. This is an open access chapter distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

## **1. Introduction**

The numerical simulation of physical phenomena has become a major issue in the design and optimization of many systems. The complexity of the phenomena demands that multi-scale and multi-physical aspects be considered. In recent years, many researchers have been concerned with the development, study, and implementation of efficient numerical methods for applications in engineering, the natural sciences, and other fields. They also focus on numerical methods for solving highly complex and CPU-intensive problems using high-performance computing. To advance computing schemes that improve the efficiency of calculation processes, or even optimize them, researchers must be able to devise novel systematic methods for solving multi-physical problems. The applications of SA-based methods to electric power systems are surveyed here.

In the last decade, many heuristic methods have evolved for solving optimization problems that were previously difficult or impossible to solve. These methods include simulated annealing (SA), tabu search (TS), genetic algorithms (GA), differential evolution (DE), evolutionary programming (EP), evolution strategies (ES), ant colony optimization (ACO), and particle swarm optimization (PSO). This chapter reviews papers in international journals that present SA-based methods for electric power system applications.

This chapter reviews various SA-based methods (including SA and other meta-heuristic methods) for the planning, operation, and optimization of power systems, including relevant recent and historical developments. Publications in international journals that cover a broad range of applications of SA methods to power system problems are reviewed. As is well known among power engineers, many kinds of combinatorial optimization problems arise in the planning and operation of power systems: generation and transmission expansion planning, generator maintenance scheduling and unit commitment, reactive power planning, load forecasting and economic dispatch, and distribution system planning and operation are typical problems.
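To make the connection concrete, consider economic dispatch, one of the problems listed above: choose generator outputs that minimize total fuel cost while meeting demand and respecting unit limits. The sketch below is an illustrative assumption, not taken from any surveyed paper; the three-unit quadratic cost coefficients, unit limits, and demand value are invented, and the power-balance constraint is handled with a simple quadratic penalty inside the SA loop.

```python
import math
import random

# Hypothetical 3-unit system: quadratic fuel-cost coefficients (a, b, c)
# and output limits (pmin, pmax) in MW. Values are illustrative only.
UNITS = [
    {"a": 500.0, "b": 5.3, "c": 0.004, "pmin": 200.0, "pmax": 450.0},
    {"a": 400.0, "b": 5.5, "c": 0.006, "pmin": 150.0, "pmax": 350.0},
    {"a": 200.0, "b": 5.8, "c": 0.009, "pmin": 100.0, "pmax": 225.0},
]
DEMAND = 800.0  # total load to be supplied, in MW

def cost(p):
    """Total fuel cost for the output vector p (one entry per unit)."""
    return sum(u["a"] + u["b"] * pi + u["c"] * pi * pi
               for u, pi in zip(UNITS, p))

def penalized(p):
    # Quadratic penalty enforces the power-balance constraint sum(p) = DEMAND.
    return cost(p) + 1000.0 * (sum(p) - DEMAND) ** 2

def anneal_dispatch(t0=100.0, cooling=0.999, steps=20000, seed=1):
    rng = random.Random(seed)
    # Start every unit at the midpoint of its operating range.
    p = [(u["pmin"] + u["pmax"]) / 2 for u in UNITS]
    e = penalized(p)
    t = t0
    for _ in range(steps):
        # Neighborhood move: perturb one unit's output, clipped to limits.
        i = rng.randrange(len(p))
        cand = list(p)
        cand[i] = min(max(cand[i] + rng.uniform(-5.0, 5.0),
                          UNITS[i]["pmin"]), UNITS[i]["pmax"])
        e_cand = penalized(cand)
        d = e_cand - e
        # Metropolis acceptance with geometric cooling.
        if d < 0 or rng.random() < math.exp(-d / t):
            p, e = cand, e_cand
        t *= cooling
    return p, e

p_final, e_final = anneal_dispatch()
```

Real SA-based dispatch formulations in the literature differ mainly in the neighborhood move and in how the constraints (balance, ramp rates, prohibited operating zones) are handled, but the accept/cool loop has this same structure.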