150 Bio-Inspired Computational Algorithms and Their Applications

$$n\,c_1 = m\,c_j \iff c_j = \frac{n}{m}\,c_1$$

If V is a value function, then every function proportional to it is also a value function satisfying the same preferences. Therefore, we can arbitrarily set $c_1 = 1$ to obtain Equation 3 below.

$$c_j = \frac{n}{m} \qquad (3)$$

In consequence, Equation 2 can now be restated as follows.

$$U = \sum_{i,k} w_{ik}\,x_{ik} \qquad (4)$$

In Equation 4, the index *i* ranges over categories, whereas *k* indexes projects. The value of $w_{1k}$ is set to 1, and $w_{ik} = n/m_i$, where $m_i$ denotes the cardinality of an elementary portfolio defined over category $C_i$. Additionally, the factors $w_{ik}$ can be interpreted as importance factors. These weights express the importance given by the SDM to projects within a certain category. Therefore, they should be calculated from the SDM's preferences, expressed while solving the indifference equations between elementary portfolios, as stated by Assumption 8 and according to Equation 3. A weight must be calculated for every category. If the set of categories is too large, the resolution of the categories can be reduced to simplify the model. A temporary set of weights is obtained using these coarse categories. By interpolation on this set, the values for the original (finer-resolution) set can be obtained.

**3.2 Fuzziness of requirements**

Another important issue is the imprecise estimation of the monetary resources required by each project. If $d_k$ are the funds assigned to the k-th project, then there is an interval $[m_k, M_k]$ for $d_k$ where the SDM is uncertain about whether or not the project is being adequately supported. Therefore, the proposition "the k-th project is adequately supported" may be seen as a fuzzy statement. If we consider that the set of projects with adequate funds is fuzzy, then the SDM can define a membership function $\mu_k(d_k)$ representing the degree of truth of the previous proposition. This is a monotonically increasing function on the interval $[m_k, M_k]$, such that $\mu_k(M_k) = 1$, $\mu_k(m_k) > 0$, and $\mu_k(d_k) = 0$ when $d_k < m_k$.

The subjective value assigned by the SDM to the k-th project is based on the belief that the project receives the necessary funding for its operation. When $d_k < m_k$, the SDM is certain that the project is not sufficiently funded. When $m_k \le d_k < M_k$, the SDM hesitates about the truth of that statement. This uncertainty affects the subjective value of the project, because it reduces the feasible impact of the project, which had been subjectively estimated under the premise that funding was sufficient. The reduction of the project's subjective value can be modeled by the product of the original value and a feasibility factor f. This factor is a monotonically increasing function with $\mu_k$ as its argument such that f(0) = 0 and f(1) = 1. Equation 5 below is generated by introducing this factor into Equation 4 and assuming that $f(\mu_{ik}) > 0 \iff x_{ik} = 1$.

$$U = \sum_{i,k} f(\mu_{ik})\,w_{ik} \qquad (5)$$

The simplest definition of the feasibility factor is $f(\mu_{ik}) = \mu_{ik}$. This is equivalent to a fuzzy generalization of Equation 4. In such a case, $x_{ik}$ can be considered as the indicator function of the set of supported projects.

**3.3 Synergy and redundancy**

Redundancy between projects can be addressed using constraints. For every pair of redundant projects $(p_i, p_j)$, $i < j$, the condition $\mu_i(d_i) \times \mu_j(d_j) = 0$ should be enforced.

Let $S = \{S_1, S_2, \ldots, S_k\}$ be the set of coalitions of synergetic projects. In a model like the one represented by Equation 5, each of these coalitions should be treated as an (additional) individual project. As a result, each coalition has an associated cost (i.e., the sum of the costs of the individual projects in the coalition) and an evaluation. This evaluation should be better than the evaluation of any of the projects in the coalition. Let us assume that coalitions $S_i$ and $S_j$ become projects $p_{N+i}$ and $p_{N+j}$, respectively. If $S_i$ is a subset of $S_j$, then it does not make sense to include them both in a portfolio. Therefore, $p_{N+i}$ and $p_{N+j}$ must be considered redundant projects. Furthermore, if project $p_n$ is a member of $S_i$, then the pair $(p_n, p_{N+i})$ is also redundant (since the value of $p_n$ is included in the value of $p_{N+i}$).
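As a rough illustration of the bookkeeping described above, the following Python sketch turns each synergetic coalition into an additional project and derives the redundant pairs that follow from it. The function names (`add_coalitions`, `violates_redundancy`), the 0-based project indices, and the data layout are assumptions made for this example only; the evaluation of each coalition is left to the caller, since the text only requires it to be better than that of every member.

```python
from itertools import combinations


def add_coalitions(costs, coalitions):
    """Append one pseudo-project per coalition.

    costs      -- list of project costs, index = project id (0-based, assumed)
    coalitions -- list of sets of project ids
    Returns (extended_costs, redundant_pairs).
    """
    n = len(costs)
    sets = [set(c) for c in coalitions]
    extended = list(costs)
    redundant = set()
    for i, members in enumerate(sets):
        new_id = n + i
        # Cost of a coalition = sum of the costs of its members.
        extended.append(sum(costs[p] for p in members))
        # A member and its coalition must not be funded together.
        for p in members:
            redundant.add((p, new_id))
    # Nested coalitions are redundant with each other.
    for (i, a), (j, b) in combinations(enumerate(sets), 2):
        if a <= b or b <= a:
            redundant.add((n + i, n + j))
    return extended, redundant


def violates_redundancy(mu, redundant):
    """True if some redundant pair (i, j) has mu[i] * mu[j] != 0."""
    return any(mu[i] * mu[j] != 0 for i, j in redundant)
```

A portfolio whose membership vector makes `violates_redundancy` return `True` would be rejected under the constraint stated above.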

#### **3.4 A Genetic algorithm for optimizing public portfolio subjective value**

Suppose that a feasible region of portfolios, RF, is defined by constraints on the total budget and on the distribution of projects by area. In addition, the SDM may impose further constraints on the portfolios, such as veto constraints and the exclusion of redundant projects.


Let us use $R'_F$, $R'_F \subset R_F$, to denote the set of values for the decision variables that make every portfolio acceptable. All the veto constraints are satisfied in $R'_F$, and there are no redundant projects in the portfolios belonging to this region. The optimization problem can now be defined as follows.

**Problem definition 2**. An optimal portfolio can be selected by maximizing $U = \sum_{i,k} f(\mu_{ik}(d_{ik}))\,w_{ik}$, subject to $d \in R'_F$, where $d_{ik}$ indicates the financial support assigned to the k-th project belonging to the i-th category.
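To make the objective of Problem definition 2 concrete, the sketch below evaluates U for a given funding assignment. The dictionary layout keyed by `(i, k)`, the helper names, and the default identity feasibility factor are assumptions for illustration. The membership functions are passed in as callables, and only the total-budget constraint is checked, not the full region R'F.

```python
def portfolio_value(d, w, mu, f=lambda x: x):
    """Return U = sum over (i, k) of f(mu_ik(d_ik)) * w_ik.

    d  -- dict mapping (i, k) to the funds assigned to project k of category i
    w  -- dict mapping (i, k) to the weight w_ik
    mu -- dict mapping (i, k) to a membership function of the funds
    f  -- feasibility factor; the identity is the simplest choice
    """
    return sum(f(mu[key](d[key])) * w[key] for key in d)


def within_budget(d, budget):
    """Check only the total-budget constraint (one of several in practice)."""
    return sum(d.values()) <= budget


# Toy instance with two projects in a single category (hypothetical numbers).
step = lambda x: 0.0 if x < 10 else min(1.0, x / 20)
d = {(1, 1): 20, (1, 2): 10}
w = {(1, 1): 2.0, (1, 2): 1.0}
mu = {(1, 1): step, (1, 2): step}
value = portfolio_value(d, w, mu)
```

Passing a different `f`, for instance a convex function with f(0) = 0 and f(1) = 1, would penalize partially funded projects more strongly than the identity does.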

Public Portfolio Selection Combining Genetic Algorithms and Mathematical Decision Analysis 153

**Algorithm 1**. A Genetic Algorithm for Project Portfolio Selection.

**Input**: cycles, the number of iterations before the algorithm converges; generations, the number of generations per cycle; c\_r, the crossover rate; m\_r, the mutation rate

**Output**: best\_solution, the best solution found

set N ← the number of projects (chromosomes)
set best\_solution ← any feasible portfolio // *this is the best so far*
**for** (i = 1 to cycles) **do**
  set Population ← {best\_solution}
  **for** (j = 1 to N - 1) **do**
    set new\_solution ← best\_solution
    randomly select a gene in new\_solution and mutate it
    set Population ← Population ∪ {new\_solution}
  **end**
  evaluate every individual ∈ Population
  set best\_solution ← the fittest individual ∈ Population
  **for** (k = 1 to generations) **do**
    perform crossover on (N × c\_r) individuals ∈ Population
    perform mutation on (N × m\_r) individuals ∈ Population
    set Population ← Population ∪ {best\_solution}
    evaluate every individual ∈ Population
    set best\_solution ← the fittest individual ∈ Population
  **end**
**end**
**return** best\_solution

Fig. 1. A Genetic Algorithm for Project Portfolio Selection

Solving this problem requires a complex nonlinear programming algorithm. The number of decision variables involved can be in the order of thousands. Due to the discontinuity of $\mu_i$, the objective function is discontinuous on the hyperplanes defined by $d_{ik} = m_{ik}$. Therefore, its continuity domain is not connected. The shape of the feasible region $R'_F$ is highly convoluted, even more so if synergy and redundancy need to be addressed, and $R'_F$ hardly has the mathematical properties generally required by nonlinear programming methods. Note that veto constraints on the pairs of projects $(p_i, p_k)$ and $(p_j, p_{k'})$ are discontinuous on the hyperplanes defined by $d_{ik} = m_{ik}$ and $d_{jk'} = m_{jk'}$. In a real-world scenario, where hundreds or even thousands of projects are considered, nonlinear programming solutions cannot handle these situations. Using Equation 6, a simplified form of Problem definition 2 was efficiently solved by Fernandez et al. (2009) and later by Litvinchev et al. (2010) using a mixed-integer programming model. Unfortunately, this approach cannot handle synergy, redundancy, or veto constraints, nor can it handle the nonlinear forms of function f in Problem definition 2.

Evolutionary algorithms are less sensitive to the shape of the feasible region, the number of decision variables, and the mathematical properties of the objective function (e.g., continuity, convexity, differentiability, and local extrema). In contrast, all of these issues are a real concern for nonlinear mathematical programming techniques (Coello, 1999). While evolutionary algorithms are not time-efficient, they often find solutions that closely approximate the optimum. Problem definition 2 represents a relatively rough model. However, the main interest lies not in fine-tuning the optimization process but rather in the generality of the model and in the ability to reach the optimal solution or a close approximation.

In Figure 1, we illustrate the genetic algorithm used for solving the optimization problem stated in Problem definition 2. This algorithm is based on the work of Fernandez and Navarro (2005). As in any genetic algorithm, a fundamental issue is defining an encoding for the set of feasible solutions to the optimization problem. In this case, each individual represents a portfolio and each chromosome contains N genes, where N is the number of projects. For the chromosome, we use a floating-point encoding representing the distribution of funding among the set of projects in the portfolio. The financial support for each project is represented by its membership function, $\mu_j(d_j)$, which is real-valued with range [0, 1]. That is, a floating-point number represents each project's membership value, and this membership value is a gene in our definition of chromosomes. As discussed earlier, the number of genes can be increased in order to address the effects of synergetic projects.

The fitness value of each individual is calculated with the function *U* given by Equation 5. Recall that this is a subjective value that captures the SDM's certainty that the project receives the necessary funding for its operation. The SDM's notion that a project has been assigned sufficient funds is modeled using two parameters, α and β. The domain for both parameters is the continuous interval [0, 1]. The first parameter, α, can be interpreted as the degree of truth of the assertion "the project has sufficient financial support if it receives m monetary units of funding". When this financial support reaches the value βM, the predicate "the project has sufficient funding" is considered true. The values of these two parameters are needed to establish models for the function $\mu_j$ in order to calculate the value of *U*. To generate these models, we propose choosing parameters α and β (0 < α < 1, m/M < β ≤ 1) for modelling μ as shown in Figure 2. For the experiments presented here, the values α = 0.5 and β = 1 were used. The most promising values for these parameters are reasonably found in the intervals [0.5, 0.7] and [0.9, 1], respectively.
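One way to realize the membership model governed by α and β is the piecewise function sketched below. The text fixes only the anchor points (μ = 0 below m, μ(m) = α, and μ = 1 once the funding reaches βM); the linear interpolation between m and βM, the function name `membership`, and its signature are assumptions made for this example.

```python
def membership(d, m, M, alpha=0.5, beta=1.0):
    """Degree of truth of 'the project is adequately funded' for funding d.

    m, M  -- minimum and maximum funding requirements of the project
    alpha -- degree of truth when the project receives exactly m
    beta  -- fraction of M at which the statement becomes fully true
    The linear ramp between m and beta * M is an assumption of this sketch.
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must satisfy 0 < alpha < 1")
    if not m / M < beta <= 1:
        raise ValueError("beta must satisfy m/M < beta <= 1")
    if d < m:
        return 0.0                      # certainly under-funded
    top = beta * M
    if d >= top:
        return 1.0                      # certainly adequately funded
    # Ramp from alpha (at d = m) up to 1 (at d = beta * M).
    return alpha + (1.0 - alpha) * (d - m) / (top - m)
```

With the experimental setting α = 0.5 and β = 1, a project with m = 10 and M = 20 that receives 15 monetary units gets a membership value of 0.75 under this linear assumption. Note that the function jumps from 0 to α at d = m, which is the discontinuity discussed earlier.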



For the selection stage, the roulette wheel technique was used. That is, the probability that a particular individual is selected for reproduction is proportional to its fitness value. For the experiments, the crossover rate was set to 0.2. Therefore, twenty percent of the population is selected for crossover in any given reproductive trial. The crossover operator takes genes from each parent string and combines them to produce the offspring of the next generation. The rationale is that creating new strings from fit parent strings leads the search to new and promising zones of the search space. While many crossover techniques have been reported, this algorithm uses the classic crossover technique based on a random cut point. The number of offspring resulting from this process is one fifth the size of the population.

The replacement process dictates how to update the current population with the individuals obtained by crossover. A random replacement approach (every individual has the same probability of being replaced) is used to reduce selective pressure. A similar approach is used for implementing an elitist policy. That is, an individual is randomly chosen from the current population and is replaced by the individual with the highest evaluation. Consequently, the presence of the best individual (best\_solution in Algorithm 1) in the updated population is guaranteed.

Algorithm 1 uses a constant mutation rate that is set a priori. Each individual in the population is considered for mutation, and all the individuals have the same probability of mutating, which is defined by the mutation rate. Once an individual has been selected for mutation, one of its genes is randomly chosen. This gene is changed by adding to it a nonzero random value in the [-0.2, 0.2] interval. The resulting gene value, however, is limited to the [0, 1] interval.

Redundancy is addressed in a very simple way. If, as the result of some genetic operator, an individual (i.e., a portfolio) containing redundant projects is generated, this individual is immediately "killed". That is, its incorporation into the current population is denied.

Fig. 2. The Membership Function (vertical axis: *μ*(d), marked at 0, α, and 1; horizontal axis: d, marked at 0, m, and βM)

#### **3.5 An illustrative example**

Let us now consider the following example taken from Fernandez and Navarro (2005). The goal is to distribute a budget of 50 million dollars among 400 R&D projects. These projects are distributed in four areas, namely engineering, life sciences, formal sciences, and social sciences. There are 140 projects in the first area (engineering), 80 projects in the second one (life sciences), 100 projects in the third area (formal sciences), and 80 projects in the last area (social sciences). No synergetic effects are considered.

The classification of the projects, according to their evaluations and areas, is described in Table 1. The projects' subjective values corresponding to each category and area are shown in Table 2. These values were obtained taking a social sciences project evaluated as Below Average as the baseline (w = 1). These values define a ranking on the set of projects that can be used to allocate funds according to the conventional heuristic described in Section 1 (with all its known limitations).

| Evaluation | Area 1 | Area 2 | Area 3 | Area 4 |
|---|---|---|---|---|
| Very Good | 54 | 28 | 13 | 12 |
| Good | 23 | 9 | 18 | 24 |
| Above Average | 62 | 32 | 36 | 28 |
| Average | 1 | 9 | 17 | 11 |
| Below Average | 0 | 2 | 16 | 5 |
| **Total** | **140** | **80** | **100** | **80** |

Table 1. Distribution of Projects by Area.

| Evaluation | Area 1 | Area 2 | Area 3 | Area 4 |
|---|---|---|---|---|
| Very Good | 5.838 | 4.3785 | 3.892 | 2.9190 |
| Good | 4.540 | 3.4055 | 3.027 | 2.2700 |
| Above Average | 3.027 | 2.2700 | 2.018 | 1.5135 |
| Average | 2.108 | 1.5810 | 1.405 | 1.0540 |
| Below Average | 2.000 | 1.5000 | 1.333 | 1.0000 |

Table 2. Projects Subjective Values.

Four different instances of the problem were generated by assigning random budget ranges to each area. For each project, random values of $m_{ik}$ and $M_{ik}$ were defined, representing its minimum and maximum funding requirements. The proposed evolutionary algorithm was run 30 times to optimize the expression given by Problem definition 2. For simplicity, $f(\mu_{ik})$ was taken to be identical to $\mu_{ik}$.
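For readers who wish to reproduce an experiment of this kind, the genetic operators of Section 3.4 can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the function names, the use of Python's `random` module, the choice of returning a single child per crossover, and the fallback for an all-zero fitness vector are assumptions. Only the roulette-wheel rule, the single random cut point, and the nonzero mutation step in [-0.2, 0.2] limited to [0, 1] come from the text.

```python
import random


def roulette_select(population, fitness, rng=random):
    """Pick one individual with probability proportional to its fitness."""
    total = sum(fitness)
    if total <= 0:
        return rng.choice(population)   # assumed fallback: uniform choice
    pick = rng.uniform(0, total)
    acc = 0.0
    for individual, fit in zip(population, fitness):
        acc += fit
        if pick < acc:
            return individual
    return population[-1]               # guards against rounding at the end


def one_point_crossover(a, b, rng=random):
    """Head of parent a joined to the tail of parent b (needs len >= 2)."""
    cut = rng.randint(1, len(a) - 1)    # cut point strictly inside the string
    return a[:cut] + b[cut:]


def mutate(individual, step=0.2, rng=random):
    """Add a nonzero value in [-step, step] to one random gene, kept in [0, 1]."""
    child = list(individual)
    gene = rng.randrange(len(child))
    delta = 0.0
    while delta == 0.0:
        delta = rng.uniform(-step, step)
    child[gene] = min(1.0, max(0.0, child[gene] + delta))
    return child
```

An individual produced by `one_point_crossover` or `mutate` would still have to pass the budget, veto, and redundancy checks before entering the population, as described in Section 3.4.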

The algorithm was coded using Visual C++. Its execution time was about 25 minutes for one million generations running on a Pentium-4 processor with a 2.1 GHz clock rate, 256 MB of physical memory, and a 74.5-GB hard disk drive. The experimental results shown in Table 3 indicate a significant improvement in the value of the optimized portfolio with respect to conventional approaches.

These results represent an average saving of 6.514 million dollars, equivalent to 13.02% of the total budget. This improvement has a positive impact on the number of supported projects, as Table 4 reveals. The average number of supported projects is 12.5% higher than when conventional methods were used.

Table 3. Traditional Funding versus our Approach.

Table 4. Traditional Funding versus our Approach (portfolio's cardinality).

#### **3.6 Modeling temporal dependencies**

The model described in Problem definition 2 can be generalized to incorporate temporal restrictions.

**Problem definition 3**. An optimal portfolio of projects with temporal dependencies can be selected by maximizing $U = \sum_{i,k} f(\mu_{ik}(d_{ik}))\,w_{ik}$, subject to $(\mathbf{d}, \mathbf{t}) \in R''_F$, where the vector $\mathbf{t} = (t(p_1), t(p_2), \ldots)$ denotes the decision variables that indicate the period of time in which each project starts. $R''_F$ accounts for time-precedence restrictions, restrictions on the time at which projects can start, and the available funds for each time interval.

This problem can be solved using a genetic algorithm similar to the one previously presented. However, a different encoding for individuals must be devised. Our proposal is to encode individuals as a 2N-dimensional vector of the form $(\mu_1, t_1, \mu_2, t_2, \ldots, \mu_N, t_N)$. As before, genes corresponding to $\mu_i$ have a domain defined by the continuous interval [0, 1]. Genes corresponding to $t_i$ have a domain defined by the set {1, 2, 3, …, T}, where T is the maximum number of time periods. Crossover can only occur between genes of the same kind. However, mutations may occur at any gene. Restrictions such as time precedence and the earliest time a project can start are controlled by constraints as described by Carazo et al. (2010).

**4. Concluding remarks**

Given a set of premises, it is possible to create a value model for selecting optimal portfolios from an SDM perspective. While this problem is Turing-decidable, finding its exact solution requires exponential time. However, the use of genetic algorithms for solving this problem can closely approximate the optimal portfolio selection.

Inspired by a normative approach, the set of premises presented here is based on the following assumptions.

• To the SDM, every project and every portfolio has a subjective value that depends on its social impact. This value exists even if it cannot be initially quantified.

• The SDM either has already defined a consistent system of preferences, or has the aspiration of doing so.

• The SDM is willing to invest a considerable amount of mental effort in order to define this consistent set of preferences and produce the aforementioned value model.

As for the algorithmic solution to the portfolio problem, its computational complexity can increase considerably when synergetic effects and temporal dependencies are considered. However, strategic planning requires a high-quality model. The problems defined in this scenario are so important that they justify the use of computationally intensive solutions.

**5. Acknowledgements**

This work was sponsored in part by the Mexican Council for Science and Technology (CONACyT) under grants 57255 and 106098.

**6. References**

Abdullah, A. & Chandra, C.K. (1999). *Sustainable Transport: Priorities for Policy Sector Reform*, World Bank, Retrieved from <http://www.worldbank.org/html/extpb/sustain/sustain.htm>

Badri, M.A. & Davis, D. (2001). A Comprehensive 0-1 Goal Programming Model for Project Selection. *International Journal of Project Management*, No. 19, pp. 243-252.

Bertolini, M., Braglia, M., & Carmignani, G. (2006). Application of the AHP Methodology in Making a Proposal for a Public Work Contract. *International Journal of Project Management*, No. 24, pp. 422-430.

Boardman, A. (1996). *Cost-benefit Analysis: Concepts and Practices*, Prentice Hall.
