#### **Appendix A: Parameter list**

We summarize the parameters used in PAGE and UPAGE in the following table.


#### **Appendix B: Derivation of a parameter update formula for UPAGE**

We here explain the details of the parameter update formulas for UPAGE (see Section 4.1). By separating $Q(\overline{\beta}, \overline{\pi}, \overline{\zeta} \,|\, \beta, \pi, \zeta)$ into the terms containing $\overline{\beta}$, $\overline{\pi}$ and $\overline{\zeta}$, the update formulas for $\beta$, $\pi$ and $\zeta$ can be derived separately.

We first derive the update formula for $\overline{\beta}$. Maximization of $Q(\overline{\beta}, \overline{\pi}, \overline{\zeta} \,|\, \beta, \pi, \zeta)$ under the constraints $\sum_{\boldsymbol{\alpha}} \overline{\beta}^{k}(\mathcal{S}[\mathbf{x}] \to \boldsymbol{\alpha}) = 1$ can be performed by the method of Lagrange multipliers:

$$\frac{\partial \mathcal{K}}{\partial \overline{\beta}^k(\mathcal{S}[\mathbf{x}] \to \boldsymbol{\alpha})} = 0,\tag{27}$$

with the Lagrangian $\mathcal{K}$ defined in Equation 28.


**7. Conclusion**

We have introduced a probabilistic program evolution algorithm named PAGE and its extension UPAGE. PAGE takes advantage of latent annotations, which enable the consideration of dependencies among nodes, and UPAGE incorporates a mixture model to take global contexts into account. By applying UPAGE in computational experiments, we have confirmed that the mixture model is highly effective for obtaining solutions in terms of the number of fitness evaluations. At the same time, UPAGE is more advantageous than PAGE in the sense that UPAGE can obtain multiple solutions for multimodal problems. We hope that it will be possible to apply PAGE and UPAGE to a wide class of real-world problems, which is an intended future area of study.

We have shown the effectiveness of PAGE and UPAGE on benchmark problems that do not have intron structures. In real-world applications, however, problems generally include intron structures, which make model and parameter inference much more difficult. For such problems, we consider intron removal algorithms ([13, 30]) to be effective, and the application of such algorithms to GP-EDAs is left as a topic of future study.

It is also desirable to estimate *μ* and *h*, as well as *β*, *π* and *ζ*, during the search. In the case of PAGE, we proposed PAGE-VB in Ref. [12], which adopted VB to estimate the annotation size *h*. In a similar fashion, it is possible to apply VB to UPAGE to enable the inference of *μ* and *h*.

| Target model | Parameter | Meaning |
|---|---|---|
| PAGE and UPAGE | *h* | Annotation size |
| | *H* | Set of annotations, *H* = {0, 1, ···, *h* − 1} |
| | *T<sub>i</sub>* | Observed derivation tree |
| | *x*<sub>*j*</sub><sup>*i*</sup> | *j*th latent annotation in *T<sub>i</sub>* |
| | R[*H*] | Set of production rules |
| | N | Set of non-terminals in CFG |
| | T | Set of terminals in CFG |
| | F | Set of function nodes in GP |
| | T | Set of terminal nodes in GP |
| | *δ*(*x*; *T*, *X*) | Frequency of a root S[*x*] in a complete tree (0 or 1) |
| | *c*(*r*; *T*, *X*) | Frequency of a production rule *r* in a complete tree |
| PAGE | *β*(*r*) | Probability of a production rule *r* |
| | *π*(S[*x*]) | Probability of a root S[*x*] |
| UPAGE | *μ* | Mixture size |
| | *ζ<sup>k</sup>* | Mixture ratio of the *k*th model |
| | *β<sup>k</sup>*(*r*) | Probability of a production rule *r* in the *k*th model |
| | *π<sup>k</sup>*(S[*x*]) | Probability of a root S[*x*] in the *k*th model |
| | *z*<sub>*i*</sub><sup>*k*</sup> | = 1 if the *i*th individual belongs to the *k*th model |



**Author details**

Yoshihiko Hasegawa

*The University of Tokyo, Japan*



$$\mathcal{K} = Q(\overline{\beta}, \overline{\pi}, \overline{\zeta} \,|\, \beta, \pi, \zeta) + \sum_{k, \mathbf{x}} \xi_{k, \mathbf{x}} \left(1 - \sum_{\boldsymbol{\alpha}} \overline{\beta}^{k}(\mathcal{S}[\mathbf{x}] \to \boldsymbol{\alpha})\right), \tag{28}$$

where $\xi_{k,\mathbf{x}}$ denote Lagrange multipliers. By calculating Equation 27, we obtain the following update formula:

$$\overline{\beta}^{k}(\mathcal{S}[\mathbf{x}] \to g\, \mathcal{S}[\mathbf{y}] \cdots \mathcal{S}[\mathbf{y}]) \propto \sum_{i=1}^{N} \sum_{\mathbf{X}_{i}} \sum_{\mathbf{Z}_{i}} P(\mathbf{X}_{i}, \mathbf{Z}_{i} \,|\, T_{i}; \beta, \pi, \zeta)\, z_{i}^{k}\, c(\mathcal{S}[\mathbf{x}] \to g\, \mathcal{S}[\mathbf{y}] \cdots \mathcal{S}[\mathbf{y}]; T_{i}, \mathbf{X}_{i}). \tag{29}$$

Because Equation 29 includes a summation over $\mathbf{X}_i$, direct calculation is intractable due to the exponential increase in computational cost. Consequently, we use forward–backward probabilities. Let $c^k(\mathcal{S}[\mathbf{x}] \to g\, \mathcal{S}[\mathbf{y}] \cdots \mathcal{S}[\mathbf{y}]; T_i)$ be

$$\begin{aligned} &c^k(\mathcal{S}[\mathbf{x}] \to g\, \mathcal{S}[\mathbf{y}] \cdots \mathcal{S}[\mathbf{y}]; T_i) \\ &\quad= \sum_{\mathbf{X}_i} \sum_{\mathbf{Z}_i} P(\mathbf{X}_i, \mathbf{Z}_i \,|\, T_i; \beta, \pi, \zeta)\, z_i^k\, c(\mathcal{S}[\mathbf{x}] \to g\, \mathcal{S}[\mathbf{y}] \cdots \mathcal{S}[\mathbf{y}]; T_i, \mathbf{X}_i). \end{aligned}$$

By differentiating the likelihood of the complete data (Equation 18) with respect to $\beta^{k}(\mathcal{S}[\mathbf{x}] \to g\, \mathcal{S}[\mathbf{y}] \cdots \mathcal{S}[\mathbf{y}])$, we have

$$\begin{aligned} &c^{k}(\mathcal{S}[\mathbf{x}] \to g\, \mathcal{S}[\mathbf{y}] \cdots \mathcal{S}[\mathbf{y}]; T_{i}) \\ &\quad= \frac{\beta^{k}(\mathcal{S}[\mathbf{x}] \to g\, \mathcal{S}[\mathbf{y}] \cdots \mathcal{S}[\mathbf{y}])}{P(T_{i}; \beta, \pi, \zeta)} \sum_{\mathbf{X}_{i}} \sum_{\mathbf{Z}_{i}} \frac{\partial P(T_{i}, \mathbf{X}_{i}, \mathbf{Z}_{i}; \beta, \pi, \zeta)}{\partial \beta^{k}(\mathcal{S}[\mathbf{x}] \to g\, \mathcal{S}[\mathbf{y}] \cdots \mathcal{S}[\mathbf{y}])}. \end{aligned}$$

The last term is calculated as

$$\begin{aligned} \sum_{\mathbf{X}_{i}} \sum_{\mathbf{Z}_{i}} \frac{\partial P(T_{i}, \mathbf{X}_{i}, \mathbf{Z}_{i}; \beta, \pi, \zeta)}{\partial \beta^{k}(\mathcal{S}[\mathbf{x}] \to g\, \mathcal{S}[\mathbf{y}] \cdots \mathcal{S}[\mathbf{y}])} &= \zeta^{k} \sum_{\mathbf{X}_{i}} \frac{\partial P(T_{i}, \mathbf{X}_{i}; \beta^{k}, \pi^{k})}{\partial \beta^{k}(\mathcal{S}[\mathbf{x}] \to g\, \mathcal{S}[\mathbf{y}] \cdots \mathcal{S}[\mathbf{y}])} \\ &= \zeta^{k} \sum_{\ell \in \mathrm{cover}(g, T_{i})} f_{T_{i}}^{\ell}(\mathbf{x}; \beta^{k}, \pi^{k}) \prod_{j \in \mathrm{ch}(\ell, T_{i})} b_{T_{i}}^{j}(\mathbf{y}; \beta^{k}, \pi^{k}). \end{aligned}$$
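The backward (inside) probabilities $b$ that appear above are what make this computation tractable. As a hedged illustration only — the tree encoding, the annotation set $H = \{0, 1\}$, and all probability values below are hypothetical, not taken from the chapter — the following sketch shows the backward recursion computing a tree likelihood in time linear in the tree size, instead of summing over all $|H|^{\#\text{nodes}}$ annotations $\mathbf{X}_i$:

```python
# Toy backward (inside) probabilities for a PCFG with latent annotations.
# P(T_i) = sum_x pi(S[x]) * b_root(x), with b computed bottom-up.

H = (0, 1)  # hypothetical annotation set

# Observed derivation tree: ('g', left, right) for a node produced by
# S -> g S S, and ('a',) for a leaf produced by S -> a.
tree = ('g', ('a',), ('g', ('a',), ('a',)))

beta_leaf = {0: 0.7, 1: 0.4}    # beta(S[x] -> a), made-up values
beta_inner = {                  # beta(S[x] -> g S[y1] S[y2]), made-up values
    0: {(0, 0): 0.10, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.10},
    1: {(0, 0): 0.20, (0, 1): 0.10, (1, 0): 0.10, (1, 1): 0.20},
}
pi = {0: 0.6, 1: 0.4}           # pi(S[x]): root annotation probability

def inside(node):
    """Return {x: b(x)}, the backward probability table of a subtree."""
    if node[0] == 'a':          # leaf production S[x] -> a
        return {x: beta_leaf[x] for x in H}
    _, left, right = node
    bl, br = inside(left), inside(right)
    return {x: sum(beta_inner[x][(y1, y2)] * bl[y1] * br[y2]
                   for y1 in H for y2 in H)
            for x in H}

b_root = inside(tree)
likelihood = sum(pi[x] * b_root[x] for x in H)
```

Each node's table is computed once from its children's tables, which is exactly why the forward–backward scheme avoids the exponential enumeration of annotations.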

By this procedure, the update formula for $\overline{\beta}$ is expressed by Equation 21, and the update formula for $\overline{\pi}$ is calculated in a similar (and much simpler) way. The update formula for $\overline{\zeta}$ is


given by

$$\begin{aligned} \overline{\zeta}^{k} &\propto \sum_{i=1}^{N} \sum_{\mathbf{X}_{i}} \sum_{\mathbf{Z}_{i}} P(\mathbf{X}_{i}, \mathbf{Z}_{i} \,|\, T_{i}; \beta, \pi, \zeta)\, z_{i}^{k} \\ &= \sum_{i=1}^{N} \frac{1}{P(T_{i}; \beta, \pi, \zeta)} \sum_{\mathbf{X}_{i}} \sum_{\mathbf{Z}_{i}} z_{i}^{k}\, P(T_{i}, \mathbf{X}_{i}, \mathbf{Z}_{i}; \beta, \pi, \zeta) \\ &= \sum_{i=1}^{N} \frac{1}{P(T_{i}; \beta, \pi, \zeta)} \sum_{\mathbf{X}_{i}} \zeta^{k}\, P(T_{i}, \mathbf{X}_{i}; \beta^{k}, \pi^{k}) \\ &= \sum_{i=1}^{N} \frac{\zeta^{k}\, P(T_{i}; \beta^{k}, \pi^{k})}{P(T_{i}; \beta, \pi, \zeta)}. \end{aligned}$$
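The last expression can be sketched as one EM update of the mixture ratios. In this illustration the function name is ours and the likelihood values `P[i][k]`, standing for $P(T_i; \beta^k, \pi^k)$, are made-up toy numbers:

```python
# One EM update of the mixture ratios zeta^k, following the derivation
# above: zeta_bar^k is proportional to
#   sum_i zeta^k * P(T_i; beta^k, pi^k) / P(T_i; beta, pi, zeta),
# normalized so the ratios sum to one.

def update_mixture_ratios(P, zeta):
    K = len(zeta)
    unnorm = [0.0] * K
    for row in P:                                        # one row per tree T_i
        denom = sum(zeta[j] * row[j] for j in range(K))  # P(T_i; beta, pi, zeta)
        for k in range(K):
            unnorm[k] += zeta[k] * row[k] / denom        # responsibility of model k
    total = sum(unnorm)                                  # equals the number of trees N
    return [u / total for u in unnorm]

# Toy data: three observed trees, two mixture components.
P = [[0.020, 0.010],
     [0.001, 0.030],
     [0.015, 0.015]]
zeta = [0.5, 0.5]
new_zeta = update_mixture_ratios(P, zeta)
```

Trees that are much better explained by one component pull that component's ratio up, which is how UPAGE's mixture can specialize to different regions of a multimodal search space.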

#### **8. References**

[12] Hasegawa, Y. & Iba, H. [2009b]. Latent variable model for estimation of distribution algorithm based on a probabilistic context-free grammar, *IEEE Transactions on Evolutionary Computation* 13(4): 858–878.

[13] Hooper, D. & Flann, N. S. [1996]. Improving the accuracy and robustness of genetic programming through expression simplification, *Proceedings of the First Annual Conference*, MIT Press, Stanford University, CA, USA.

[14] Larrañaga, P. & Lozano, J. A. [2002]. *Estimation of Distribution Algorithms*, Kluwer Academic Publishers.

[15] Looks, M. [2005]. Learning computer programs with the Bayesian optimization algorithm. Master thesis, Washington University Sever Institute of Technology.

[16] Looks, M. [2007]. Scalable estimation-of-distribution program evolution, *GECCO '07: Proceedings of the 9th annual conference on Genetic and evolutionary computation*, ACM, New York, NY, USA, pp. 539–546.

[17] Matsuzaki, T., Miyao, Y. & Tsujii, J. [2005]. Probabilistic CFG with latent annotations, *Proceedings of the 43rd Meeting of the Association for Computational Linguistics (ACL)*, Michigan, USA, pp. 75–82.

[18] Nordin, P. [1994]. A compiling genetic programming system that directly manipulates the machine code, *Advances in Genetic Programming*, MIT Press, Cambridge, MA, USA, chapter 14, pp. 311–331.

[19] Pelikan, M. & Goldberg, D. E. [2001]. Escaping hierarchical traps with competent genetic algorithms, *GECCO '01: Proceedings of the 2001 conference on Genetic and evolutionary computation*, ACM Press, New York, NY, USA, pp. 511–518.

[20] Pelikan, M., Goldberg, D. E. & Cantú-Paz, E. [1999]. BOA: The Bayesian optimization algorithm, *Proceedings of the Genetic and Evolutionary Computation Conference GECCO-99*, Vol. I, Morgan Kaufmann Publishers, San Francisco, CA, Orlando, FL, pp. 525–532.

[21] Poli, R. & McPhee, N. F. [2008]. A linear estimation-of-distribution GP system, *Proceedings of EuroGP 2008*, Springer-Verlag, pp. 206–217.

[22] Punch, W. F. [1998]. How effective are multiple populations in genetic programming, *Genetic Programming 1998: Proceedings of the Third Annual Conference*, Morgan Kaufmann, University of Wisconsin, Madison, Wisconsin, USA, pp. 308–313.

[23] Ratle, A. & Sebag, M. [2001]. Avoiding the bloat with probabilistic grammar-guided genetic programming, *Artificial Evolution 5th International Conference, Evolution Artificielle, EA 2001*, Vol. 2310 of *LNCS*, Springer Verlag, Creusot, France, pp. 255–266.

[24] Regolin, E. N. & Pozo, A. T. R. [2005]. Bayesian automatic programming, *Proceedings of the 8th European Conference on Genetic Programming*, Vol. 3447 of *Lecture Notes in Computer Science*, Springer, Lausanne, Switzerland, pp. 38–49.

[25] Sałustowicz, R. P. & Schmidhuber, J. [1997]. Probabilistic incremental program evolution, *Evolutionary Computation* 5(2): 123–141.

[26] Sastry, K. & Goldberg, D. E. [2003]. Probabilistic model building and competent genetic programming, *Genetic Programming Theory and Practise*, Kluwer, chapter 13, pp. 205–220.

[27] Sato, H., Hasegawa, Y., Bollegala, D. & Iba, H. [2012]. Probabilistic model building GP with belief propagation, *Proceedings of IEEE Congress on Evolutionary Computation (CEC 2012)*, accepted for publication.

[28] Shan, Y., McKay, R. I., Abbass, H. A. & Essam, D. [2003]. Program evolution with explicit learning: a new framework for program automatic synthesis, *Proceedings of the 2003 Congress on Evolutionary Computation CEC2003*, IEEE Press, Canberra, pp. 1639–1646.

[29] Shan, Y., McKay, R. I., Baxter, R., Abbass, H., Essam, D. & Hoai, N. X. [2004]. Grammar model-based program evolution, *Proceedings of the 2004 IEEE Congress on Evolutionary Computation*, IEEE Press, Portland, Oregon, pp. 478–485.


[1] Abbass, H. A., Hoai, X. & McKay, R. I. [2002]. AntTAG: A new method to compose computer programs using colonies of ants, *Proceedings of the IEEE Congress on Evolutionary Computation*, pp. 1654–1659.

[2] Attias, H. [1999]. Inferring parameters and structure of latent variable models by variational Bayes, *The 15th Conference of Uncertainty in Artificial Intelligence*, Morgan Kaufmann, Stockholm, Sweden, pp. 21–30.

[3] Baluja, S. [1994]. Population-based incremental learning: A method for integrating genetic search based function optimization and competitive learning, *Technical Report CMU-CS-94-163*, Pittsburgh, PA.
URL: *citeseer.ist.psu.edu/baluja94population.html*

[4] Bosman, P. A. N. & de Jong, E. D. [2004]. Grammar transformations in an EDA for genetic programming, *Technical Report UU-CS-2004-047*, Institute of Information and Computing Sciences, Utrecht University.

[5] Dempster, A., Laird, N. & Rubin, D. [1977]. Maximum likelihood from incomplete data via the EM algorithm, *Journal of the Royal Statistical Society, Series B* 39(1): 1–38.

[6] Goldberg, D. E., Deb, D. & Kargupta, H. [1993]. Rapid, accurate optimization of difficult problems using fast messy genetic algorithms, *in* S. Forrest (ed.), *Proc. of the Fifth Int. Conf. on Genetic Algorithms*, Morgan Kaufman, San Mateo, pp. 56–64.

[7] Harik, G. [1999]. Linkage learning via probabilistic modeling in the ECGA, *IlliGAL Report* (99010).

[8] Hasegawa, Y. & Iba, H. [2006]. Estimation of Bayesian network for program generation, *Proceedings of The Third Asian-Pacific Workshop on Genetic Programming*, Hanoi, Vietnam, pp. 35–46.

[9] Hasegawa, Y. & Iba, H. [2007]. Estimation of distribution algorithm based on probabilistic grammar with latent annotations, *Proceedings of IEEE Congress of Evolutionary Computation*, IEEE Press, Singapore, pp. 1143–1150.

[10] Hasegawa, Y. & Iba, H. [2008]. A Bayesian network approach for program generation, *IEEE Transactions on Evolutionary Computation* 12(6): 750–764.

[11] Hasegawa, Y. & Iba, H. [2009a]. Estimation of distribution algorithm based on PCFG-LA mixture model, *Transactions of the Japanese Society for Artificial Intelligence (in Japanese)* 24(1): 80–91.

[30] Shin, J., Kang, M., McKay, R. I., Nguyen, X., Hoang, T.-H., Mori, N. & Essam, D. [2007]. Analysing the regularity of genomes using compression and expression simplification, *Proceedings of EuroGP 2007*, Springer-Verlag, pp. 251–260.

[31] Tanev, I. [2004]. Implications of incorporating learning probabilistic context-sensitive grammar in genetic programming on evolvability of adaptive locomotion gaits of snakebot, *GECCO 2004 Workshop Proceedings*, Seattle, Washington, USA.

[32] Tanev, I. [2005]. Incorporating learning probabilistic context-sensitive grammar in genetic programming for efficient evolution and adaptation of Snakebot, *Proceedings of EuroGP 2005*, Springer Verlag, Lausanne, Switzerland, pp. 155–166.

[33] Whigham, P. A. [1995]. Grammatically-based genetic programming, *Proceedings of the Workshop on Genetic Programming: From Theory to Real-World Applications*, Tahoe City, California, USA, pp. 33–41.

[34] Whigham, P. A. [1996]. Search bias, language bias, and genetic programming, *Genetic Programming 1996: Proceedings of the First Annual Conference*, MIT Press, Stanford University, CA, USA, pp. 230–237.

[35] Whigham, P. A. [1995]. Inductive bias and genetic programming, *Proceedings of First International Conference on Genetic Algorithms in Engineering Systems: Innovations and Applications*, pp. 461–466.

[36] Wineberg, M. & Oppacher, F. [1994]. A representation scheme to perform program induction in a canonical genetic algorithm, *Parallel Problem Solving from Nature III*, Vol. 866 of *LNCS*, Springer-Verlag, Jerusalem, pp. 292–301.

[37] Yanai, K. & Iba, H. [2003]. Estimation of distribution programming based on Bayesian network, *Proceedings of the 2003 Congress on Evolutionary Computation CEC2003*, IEEE Press, Canberra, pp. 1618–1625.

[38] Yanai, K. & Iba, H. [2005]. Probabilistic distribution models for EDA-based GP, *GECCO 2005: Proceedings of the 2005 conference on Genetic and evolutionary computation*, Vol. 2, ACM Press, Washington DC, USA, pp. 1775–1776.

[39] Yanase, T., Hasegawa, Y. & Iba, H. [2009]. Binary encoding for prototype tree of probabilistic model building GP, *Proceedings of 2009 Genetic and Evolutionary Computation Conference (GECCO 2009)*, pp. 1147–1154.

© 2012 Esmeraldo et al., licensee InTech. This is an open access chapter distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


**Genetically Programmed Regression Linear Models for Non-Deterministic Estimates**

Guilherme Esmeraldo, Robson Feitosa, Dilza Esmeraldo and Edna Barros

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/48156

**1. Introduction**

Symbolic regression is a technique which characterizes, through mathematical functions, response variables based on input variables. Its main features include the need for no (or only a few) assumptions about the mathematical model, and the coverage of multidimensional data, frequently unbalanced, with big or small samples. In order to find plausible Symbolic Regression Models (SRM), we used the genetic programming (GP) technique [1].

Genetic programming is a specialization of genetic algorithms (GA), an evolutionary methodology inspired by biological evolution, used to find predictive functions. Each GP individual is evaluated by executing its function in order to determine how well its output fits the desired output [2,3].

However, depending on the problem, one may notice that the estimates of the SRM found by GP may present errors [4], affecting the precision of the predictive function. To deal with this problem, some studies [5,6] substitute the predictive functions, which are deterministic mathematical models, with linear regression statistical models (LRM) to compose the genetic individual models.

LRM, like traditional mathematical models, can be used to model a problem and make estimates. Their great advantage is the possibility of controlling the estimate errors. Nevertheless, the studies available in the literature [5,6] have considered only information criteria, such as the sum of least squares [7] and AIC [8], as evaluation indexes for fitting the dataset and comparing candidate models. Although the models obtained through this technique generate good indexes, the final models may sometimes not be representative, since the model structure assumptions were not verified, leading to incorrect estimates [9].
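To make the symbolic-regression-by-GP idea above concrete, here is a minimal sketch. The expression-tree encoding, operator set, dataset, and sum-of-squared-errors fitness are our illustrative assumptions, not the authors' implementation:

```python
# A GP individual as an expression tree, scored against a dataset by
# sum of squared errors (lower is fitter).
import operator

OPS = {'+': operator.add, '-': operator.sub, '*': operator.mul}

def evaluate(node, x):
    """Evaluate an expression tree at input value x."""
    if node == 'x':
        return x
    if isinstance(node, (int, float)):
        return node
    op, left, right = node
    return OPS[op](evaluate(left, x), evaluate(right, x))

def fitness(individual, data):
    """Sum of squared errors of the individual's predictions."""
    return sum((evaluate(individual, x) - y) ** 2 for x, y in data)

# Target relation y = x^2 + 1, sampled at a few points.
data = [(x, x * x + 1) for x in range(-3, 4)]

exact = ('+', ('*', 'x', 'x'), 1)   # encodes x*x + 1
rough = ('*', 'x', 'x')             # encodes x*x

err_exact = fitness(exact, data)
err_rough = fitness(rough, data)
```

GP would evolve a population of such trees by selection, crossover, and mutation; the chapter's proposal replaces the purely deterministic function with an LRM so that estimate errors can be controlled statistically.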
