**5. Experimental results**


**Variable Representation** 

We considered the following configuration parameters for the AMBA AHB bus: data bus width, fixed-priority arbitration mechanism, operation frequency and transfer type. With the combination of the possible values for these parameters, we built a design space with 72 distinct configurations.

In the representation of the LRMs in the proposed GP algorithm, the configuration parameters of the bus were characterized as predictive variables, and the execution time of the embedded application as the response variable. Table 3 describes each of these variables.

| Variable | Symbol | Possible values |
|---|---|---|
| Data bus width | bw | 8, 16, 32 (bits) |
| Operation frequency | fr | 100, 166, 200 (MHz) |
| Priority of the first process | p1 | Higher, lower (priority) |
| Priority of the second process | p2 | Higher, lower (priority) |
| Transfer type | ty | With preemption, without preemption |
| Execution time of the application | te | Time measured in ns |

**Table 3.** Candidate variables for the linear regression model.

It can be seen in Table 3 that all the predictive variables have discrete values, so they are classified as factors. In the LRMs, the predictive variables are therefore represented as dummy variables.

With the increase in the training set size, the probability of distortion in the estimates may increase, because the possibility of outliers in the set also increases. On the other hand, larger training sets may be more significant for obtaining a more precise model. For this reason, we used three training sets of distinct sizes to check these assumptions. We selected three sets, using the technique introduced in Subsection 3.2, with 10% (7 samples), 20% (14 samples) and 50% (36 samples) of the design space. The remaining points were grouped into test sets, used to evaluate the precision of the estimates given by the obtained models.
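To make the design-space figures concrete, the following Python sketch enumerates the 72 configurations of Table 3 and dummy-codes the factors. The variable names, the treatment coding and the plain random split are illustrative assumptions; the chapter's own selection technique from Subsection 3.2 is not reproduced here.

```python
# Illustrative sketch (not the chapter's code): enumerating the 72-point design
# space of Table 3 and dummy-coding the factors for a linear regression model.
from itertools import product
import random

# Candidate values of the configuration parameters (predictive variables).
bw_values = [8, 16, 32]            # data bus width (bits)
fr_values = [100, 166, 200]        # operation frequency (MHz)
p1_values = ["higher", "lower"]    # priority of the first process
p2_values = ["higher", "lower"]    # priority of the second process
ty_values = ["with_preemption", "without_preemption"]  # transfer type

design_space = [
    {"bw": bw, "fr": fr, "p1": p1, "p2": p2, "ty": ty}
    for bw, fr, p1, p2, ty in product(bw_values, fr_values,
                                      p1_values, p2_values, ty_values)
]
assert len(design_space) == 72     # 3 * 3 * 2 * 2 * 2 combinations

def dummy_code(cfg):
    """Encode one configuration as 0/1 dummy variables (one level dropped
    per factor, the usual treatment coding for factors in an LRM)."""
    return {
        "bw16": int(cfg["bw"] == 16), "bw32": int(cfg["bw"] == 32),
        "fr166": int(cfg["fr"] == 166), "fr200": int(cfg["fr"] == 200),
        "p1_low": int(cfg["p1"] == "lower"),
        "p2_low": int(cfg["p2"] == "lower"),
        "ty_nopreempt": int(cfg["ty"] == "without_preemption"),
    }

# Training sets with 10%, 20% and 50% of the design space (7, 14 and 36 points);
# a plain random sample stands in for the selection technique of Subsection 3.2.
random.seed(42)
for fraction, size in [("10%", 7), ("20%", 14), ("50%", 36)]:
    training = random.sample(design_space, size)
    test = [cfg for cfg in design_space if cfg not in training]
    print(fraction, len(training), "training /", len(test), "test points")
```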

According to [2], on average, 50 generations are sufficient to find an acceptable solution, and larger populations have a higher probability of finding a valid solution. So, for the GP algorithm, we considered the following parameters: 1000 candidates for each generation of LRM trees; the maximum number of generations was limited to 50; and the stop condition of the algorithm consists of the same LRM remaining the fittest candidate for 30 consecutive generations. For each generation, 999 tournaments were carried out, in which 50 LRMs were randomly chosen to participate. During the tournament, the AIC index is computed in order to evaluate each one of the participants. The winners, those with the best AIC indexes, are selected for crossover. For mutation, a mutation factor is randomly computed for every LRM tree generated by crossover. If the value computed for a tree is below 5% (an index demonstrated in [37] to find good solutions in several problem types), then the tree mutates and is then selected to take part in the next generation. Finally, the fittest LRM trees of the present generation are also carried over to the next generation.
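The generational loop implied by these parameters can be summarized with the skeleton below. It is a sketch under the stated parameters, not the authors' implementation; `random_lrm`, `aic`, `crossover` and `mutate` are hypothetical helpers standing in for the chapter's tree generation, AIC evaluation and genetic operators.

```python
# High-level skeleton (an assumption, not the authors' code) of the GP loop
# described above: 1000 candidate LRM trees per generation, at most 50
# generations, 999 tournaments of 50 randomly chosen LRMs per generation,
# AIC as the fitness index, a 5% mutation factor, and a stop condition of the
# same fittest LRM persisting for 30 consecutive generations.
import random

POP_SIZE, MAX_GENERATIONS = 1000, 50
TOURNAMENTS, TOURNAMENT_SIZE = 999, 50
MUTATION_THRESHOLD, STABLE_GENERATIONS = 0.05, 30

def evolve(random_lrm, aic, crossover, mutate):
    """random_lrm(), aic(tree), crossover(a, b) and mutate(tree) are
    problem-specific helpers supplied by the caller."""
    population = [random_lrm() for _ in range(POP_SIZE)]
    best, best_streak = None, 0
    for _ in range(MAX_GENERATIONS):
        offspring = []
        for _ in range(TOURNAMENTS):
            # Tournament: 50 random LRMs compete; the two with the best
            # (lowest) AIC win and are crossed over.
            competitors = random.sample(population, TOURNAMENT_SIZE)
            winner_a, winner_b = sorted(competitors, key=aic)[:2]
            child = crossover(winner_a, winner_b)
            # Mutation factor drawn per tree; below 5% the tree mutates.
            if random.random() < MUTATION_THRESHOLD:
                child = mutate(child)
            offspring.append(child)
        # Keep the fittest tree of the present generation (elitism).
        fittest = min(population, key=aic)
        population = offspring + [fittest]
        # Stop if the fittest LRM is unchanged for 30 consecutive generations.
        if best is not None and aic(fittest) == aic(best):
            best_streak += 1
        else:
            best, best_streak = fittest, 0
        if best_streak >= STABLE_GENERATIONS:
            break
    return min(population, key=aic)
```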

**Verification of the Assumptions About the Errors in the LRM Values**

As described in the previous section, we used three training sets for validation of the proposed approach. However, the application of this approach brought different results for these sets.

For the first set, the one with 10% of the design space, which we will call Set A, the final model was approved in the formal evaluation right in the first iteration. For Set B (the set with 20% of the design space), the final model was also approved in the formal evaluation, but five iterations were needed. The results of the formal tests for the models selected for Sets A and B can be seen in Table 4.


**Table 4.** Formal test results for verification of assumptions about the LRMs selected for the Sets A, B and C.

The test results for Sets A and B, presented in Table 4, show indexes (p-values) above the significance level, defined in this work as 5%. So, the errors of the LRMs selected for Sets A and B tend to be normally distributed, with constant variances, and independent from each other.

Finally, for Set C, the last training set, no model was approved in the formal evaluation. Table 4 also shows the test results for the final model found (best AIC) for Set C. The p-values for the *Shapiro-Wilk* and *Breusch-Pagan* tests are below the significance level, making residual analysis necessary. The final results of the residual analysis are shown in the graphics of Figure 6.
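As a minimal sketch of such a formal check, assuming scipy and statsmodels as stand-in tools (the chapter does not state which software was used) and hypothetical `residuals` and `design_matrix` arrays, the two named tests could be applied as follows.

```python
# Sketch of the formal assumption tests named above (scipy/statsmodels
# stand-ins; not necessarily the tooling used by the authors).
from scipy.stats import shapiro
from statsmodels.stats.diagnostic import het_breuschpagan

ALPHA = 0.05  # significance level adopted in this work

def check_error_assumptions(residuals, design_matrix):
    """residuals: LRM residuals on the training set; design_matrix: the
    dummy-coded predictors with an intercept column (hypothetical arrays
    supplied by the caller)."""
    # Shapiro-Wilk: H0 = residuals come from a normal distribution.
    _, p_normality = shapiro(residuals)
    # Breusch-Pagan: H0 = residual variance is constant (homoscedasticity).
    _, p_homoscedasticity, _, _ = het_breuschpagan(residuals, design_matrix)
    return {
        "normality_ok": p_normality > ALPHA,
        "homoscedasticity_ok": p_homoscedasticity > ALPHA,
    }
```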

Figure 6 presents the graphics of (a) the Q-Q plot and (b) the residuals histogram, as well as (c) the dispersion of the values observed in Set C versus the residuals and (d) the residuals in order of collection. Analyzing Figure 6 (b), the histogram alone suggests that the errors presented by the LRM selected for Set C do not follow a normal distribution, which would violate the normality assumption of the model structure. However, the distribution of the errors tends to be normal, since the points are distributed around the diagonal line of the Q-Q plot shown in Figure 6 (a). In Figure 6 (c), in turn, the assumption of homoscedasticity can be confirmed, since the dispersion of the points around the line is roughly constant. Finally, the last assumption, independence among the errors, can be verified in Figure 6 (d), since there is no apparent linear pattern in the distribution of the points.
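A possible way to reproduce the four panels of Figure 6, again assuming matplotlib/scipy as stand-ins and hypothetical `residuals` and `observed` arrays from the selected LRM, is sketched below.

```python
# Sketch of the four residual-analysis panels of Figure 6.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

def residual_diagnostics(residuals, observed):
    fig, axes = plt.subplots(2, 2, figsize=(8, 6))
    # (a) Q-Q plot: points near the diagonal suggest normally distributed errors.
    stats.probplot(residuals, dist="norm", plot=axes[0, 0])
    # (b) Histogram of the residuals.
    axes[0, 1].hist(residuals, bins=10)
    axes[0, 1].set_title("Residuals histogram")
    # (c) Observed values vs. residuals: constant spread suggests homoscedasticity.
    axes[1, 0].scatter(observed, residuals)
    axes[1, 0].axhline(0.0, linestyle="--")
    axes[1, 0].set_title("Observed vs. residuals")
    # (d) Residuals in collection order: no pattern suggests independent errors.
    axes[1, 1].plot(np.arange(len(residuals)), residuals, marker="o")
    axes[1, 1].set_title("Collection order")
    fig.tight_layout()
    return fig
```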

So, in the diagrams of the residual analysis, we could verify that all the assumptions about the error structure of the LRM selected for Set C (normality, homoscedasticity and independence of the errors) were met.



**Figure 6.** Graphics for analysis of the assumptions about the distribution of errors for the training set with 50% of the design space.



In order to check the adherence of the LRMs to the data of the respective training sets, we performed the *Mann-Whitney-Wilcoxon* test and computed the global mean, maximum and minimum errors. The results can be seen in Table 5.



| Measurement | Set A | Set B | Set C |
|---|---|---|---|
| *Mann-Whitney-Wilcoxon* test (p-value) | 100% | 100% | 79.12% |
| Global mean error | 7.81e-08% | 0% | 7.15e-06% |
| Maximum error | 1.43e-07% | 0% | 4.52e-05% |
| Minimum error | 0% | 0% | 1.88e-08% |

**Table 5.** Testing the fitness to the data from the training set and the global mean, maximum and minimum errors for the LRMs selected for Sets A, B and C.

According to the result of the *Mann-Whitney-Wilcoxon* test, presented in Table 5, we can see that the estimates given by the LRMs selected for Sets A, B and C tend to be equal to the data in the respective training sets, since the p-values are above the significance level, defined in the test as 5%. Still analyzing Table 5, we notice that the selected LRMs presented accurate estimates, since the global mean, maximum and minimum errors were almost zero.
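A sketch of the adherence check summarized in Table 5, assuming scipy as a stand-in and hypothetical `estimates` and `measured` arrays for one training set (the exact error definition used by the authors is not stated, so a relative error is assumed here):

```python
# Sketch of the Table 5 checks: Mann-Whitney-Wilcoxon test plus global mean,
# maximum and minimum errors for one training set.
import numpy as np
from scipy.stats import mannwhitneyu

def adherence_report(estimates, measured, alpha=0.05):
    # H0 of the Mann-Whitney-Wilcoxon test: both samples come from the same
    # distribution, i.e. the estimates tend to be equal to the measured data.
    _, p_value = mannwhitneyu(estimates, measured, alternative="two-sided")
    # Relative errors (assumed definition) between estimates and measurements.
    relative_errors = np.abs(estimates - measured) / np.abs(measured)
    return {
        "mww_p_value": p_value,
        "adherent": p_value > alpha,
        "global_mean_error": relative_errors.mean(),
        "maximum_error": relative_errors.max(),
        "minimum_error": relative_errors.min(),
    }
```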

Still analyzing the precision of the estimates, with respect to Set C, the diagram of accumulated errors is presented in Figure 7. It shows the cumulative error (x axis) for percentages of the training set (y axis). The accumulated errors indicate the deviation between the estimates given by the LRM and the data from the training set. In this case, the estimates given by the selected LRM differed by at most 5e-07.

**Figure 7.** Graph of accumulated errors for the LRM selected for Set C.
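One plausible reading of the accumulated-error curve, assumed here rather than taken from the chapter, is the empirical cumulative distribution of the absolute deviations over the training set; a minimal sketch:

```python
# Assumed interpretation of Figure 7: for each error threshold, the percentage
# of training-set points whose estimate deviates by at most that amount.
import numpy as np

def accumulated_error_curve(estimates, measured):
    errors = np.sort(np.abs(estimates - measured))                      # x axis
    percentages = 100.0 * np.arange(1, errors.size + 1) / errors.size   # y axis
    return errors, percentages
```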

Finally, in order to evaluate the precision of the predictions, which are the estimates given for the respective test sets of Sets A, B and C, the selected LRMs were submitted to the *Mann-Whitney-Wilcoxon* test. Besides this test, the global mean, maximum and minimum errors were computed. The results can be seen in Table 6.

**Table 6.** Test of fitness to the data of the test set and the global mean, maximum and minimum errors.

In Table 6, according to the results of the *Mann-Whitney-Wilcoxon* test, carried out with a significance level of 5%, the estimates given by the selected LRMs for the three sets tend to be equal to the data of the respective test sets. The three models had very close values for the global mean and minimum errors. For the maximum errors, there was a small variation, with the LRMs selected for Sets B and C obtaining the highest and the lowest values, respectively.



Still analyzing the results of the measurements presented in Table 6, we notice that the indexes obtained for the three sets were comparatively very close. Such results may be explained by the use of the technique for selection of the training sets, which returns samples with high representative power.

In general, the use of the approach proposed in this work, which added methods for evaluating the LRMs selected by the GP algorithm and a technique for selecting the elements of the training sets, allows obtaining solutions capable of providing precise estimates, even with the use of small samples.

**Acknowledgement**

This paper has been supported by the Brazilian Research Council (CNPq) under grant number 309089/2007-7.

**7. References**

[1] Augusto D.A (2000) Symbolic Regression Via Genetic Programming. In Proceedings of the Sixth Brazilian Symposium on Neural Networks, Rio de Janeiro.

[2] Koza J.R (1992) Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press.

[3] Spector L, Goodman E, Wu A, Langdon W.B, Voigt H.M, Gen M, Sem S, Dorigo M, Pezeshk S, Garzon M, Burke E (2001) Towards a New Evolutionary Computation: Advances in the Estimation of Distribution Algorithms. In Proceedings of the Genetic and Evolutionary Computation Conference, Morgan Kaufmann.

[4] Keijzer M (2003) Improving Symbolic Regression with Interval Arithmetic and Linear Scaling. In Ryan C, Soule T, Keijzer M, Tsang E, Poli R, Costa E, editors. Heidelberg: Springer. pp. 70-78.

[5] Esmeraldo G, Barros E (2010) A Genetic Programming Based Approach for Efficiently Exploring Architectural Communication Design Space of MPSOCS. In Proceedings of the VI Southern Programmable Logic Conference.

[6] Paterlini S, Minerva T (2010) Regression Model Selection Using Genetic Algorithms. In Proceedings of the 11th WSEAS International Conference on Recent Advances in Neural Networks, Fuzzy Systems & Evolutionary Computing.

[7] Wolberg J (2005) Data Analysis Using the Method of Least Squares: Extracting the Most Information from Experiments. Springer.

[8] Sakamoto Y, Ishiguro M, Kitagawa G (1986) Akaike Information Criterion Statistics. D. Reidel Publishing Company.

[9] Seber G.A.F, Lee A.J (2003) Linear Regression Analysis. Hoboken: Wiley.

[10] Weisberg S (2005) Applied Linear Regression, Third Edition. Hoboken: Wiley.

[11] McCulloch C.E, Searle S.R (2001) Generalized, Linear and Mixed Models. New York: Wiley.

[12] Anderson D, Feldblum S, Modlin C, Schirmacher D, Schirmacher E, Thandi E (2004) A Practitioner's Guide to Generalized Linear Models. Watson Wyatt Worldwide.

[13] Hausman J, Kuersteiner G (2008) Difference in Difference Meets Generalized Least Squares: Higher Order Properties of Hypotheses Tests. Journal of Econometrics, 144: 371-391.

[14] Nelder J.A, Wedderburn R.W (1972) Generalized Linear Models. Journal of the Royal Statistical Society Series A, 135 (3): 370-384.

[15] Chellapilla K (1997) Evolving Computer Programs Without Subtree Crossover. IEEE Transactions on Evolutionary Computation, 1(3): 209-216.

[16] Aho A.V, Lam M.S, Sethi R, Ullman J.D (2006) Compilers: Principles, Techniques, and Tools, Second Edition. Prentice Hall.
