**6. Fitness-based synthesis approach**

To address the inability of the IPF approach to deal with multilevel controls, Ma and Srinivasan [25] developed the fitness-based synthesis (FBS) approach, which directly generates a list of households that matches several multilevel controls without first determining a joint multiway distribution. The FBS approach selects a set of households from the seed data, such as the PUMS, so that the tract-level controls are satisfied. It starts with an initial set of households that can be either a null set or a random sample from the seed data. The population of each census tract is then synthesized iteratively, with one household either added to or removed from the current list in each iteration. Count tables, defined in terms of the control attributes, track the number of households of each type that have already been included. The FBS approach only adds or removes households; swapping is not considered. The main criterion in the FBS approach is the reduction in the sum of squared errors obtained by adding a household, $F\_I^{in}$, or by removing one, $F\_{II}^{in}$, as given by Eqs. (4) and (5):

$$F\_I^{in} = \sum\_{j=1}^{J} \sum\_{k=1}^{K\_j} \left[ \left( R\_{jk}^{n-1} \right)^2 - \left( R\_{jk}^{n-1} - HT\_{jk}^{i} \right)^2 \right] \tag{4}$$

$$F\_{II}^{in} = \sum\_{j=1}^{J} \sum\_{k=1}^{K\_j} \left[ \left( R\_{jk}^{n-1} \right)^2 - \left( R\_{jk}^{n-1} + HT\_{jk}^{i} \right)^2 \right] \tag{5}$$
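As a rough illustration of how the fitness values in Eqs. (4) and (5) can drive the add/remove iteration, the following Python sketch flattens the cells of all control tables into one vector, so that `R` holds the remaining requirement per cell (target minus current count) and each row of `HT` holds one seed household's contribution to every cell. The function names, the start from a null set, the greedy choice of the household with the highest fitness, and the stopping rule (no move with positive fitness) are assumptions made for this example and are not taken from [25].

```python
import numpy as np


def fitness_add(R, HT):
    """Eq. (4): reduction in the sum of squared errors if a household is added."""
    return np.sum(R**2 - (R - HT) ** 2, axis=-1)


def fitness_remove(R, HT):
    """Eq. (5): reduction in the sum of squared errors if a household is removed."""
    return np.sum(R**2 - (R + HT) ** 2, axis=-1)


def fbs_sketch(T, HT, max_iter=10_000):
    """Greedy add/remove loop (illustrative assumption, not the published procedure).

    T  : targets for all control-table cells, flattened into one vector
    HT : one row per seed household, its contribution to each cell
    Returns how often each seed household was selected and the final residuals.
    """
    T = np.asarray(T, dtype=float)
    HT = np.asarray(HT, dtype=float)
    counts = np.zeros(HT.shape[0], dtype=int)  # copies of each seed household selected
    R = T.copy()  # null initial set: count tables are zero, so R equals T
    for _ in range(max_iter):
        f_add = fitness_add(R, HT)
        # Only households already in the list can be removed.
        f_rem = np.where(counts > 0, fitness_remove(R, HT), -np.inf)
        i_add, i_rem = int(np.argmax(f_add)), int(np.argmax(f_rem))
        if max(f_add[i_add], f_rem[i_rem]) <= 0:
            break  # no single addition or removal lowers the squared error
        if f_add[i_add] >= f_rem[i_rem]:
            counts[i_add] += 1
            R = R - HT[i_add]
        else:
            counts[i_rem] -= 1
            R = R + HT[i_rem]
    return counts, R


if __name__ == "__main__":
    # Hypothetical toy data. Cells: [1-person hh, 2-person hh, males, females].
    seed = [[1, 0, 1, 0],   # single male
            [1, 0, 0, 1],   # single female
            [0, 1, 1, 1]]   # couple
    targets = [2, 1, 2, 2]
    print(fbs_sketch(targets, seed))
```

With this toy input, household- and person-level cells are matched at the same time: each of the three seed households is selected once and all residuals reach zero. Real applications would draw on far larger seed samples and control tables.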


*A Critical Review on Population Synthesis for Activity- and Agent-Based Transportation Models*

*DOI: http://dx.doi.org/10.5772/intechopen.86307*

The two fitness values in Eqs. (4) and (5) are subject to

$$F\_I^{in} + F\_{II}^{in} = -2 \sum\_{j=1}^{J} \sum\_{k=1}^{K\_j} \left( HT\_{jk}^{i} \right)^2$$

where $R\_{jk}^{n-1} = T\_{jk} - CT\_{jk}^{n-1}$; $j$ is an index over the control (and the corresponding count) tables; $J$ is the total number of control (or count) tables; $k$ is an index over the $K\_j$ cells of table $j$; $T\_{jk}$ is the value of cell $k$ in control table $j$; $CT\_{jk}^{n-1}$ is the value of cell $k$ in count table $j$ after iteration $n-1$; $R\_{jk}^{n-1}$ is the number of households/persons still required to satisfy the target for cell $k$ in control table $j$ after iteration $n-1$; and $HT\_{jk}^{i}$ is the contribution of the $i$th household in the seed data to cell $k$ of control table $j$.

Three applications of the FBS approach were carried out to demonstrate that many controls at multiple levels can be incorporated in the synthesis and that doing so increases the accuracy of the synthesized population. All three used the 2000 Census data for 12 census tracts in Florida. The first application synthesized the population using the IPF approach with only household-level controls. The second used the proposed FBS approach with a few household- and individual-level controls. The third also used the FBS approach but with a significantly larger number of controls. The three applications were validated by comparing the mean absolute error against 22 artificial census tracts that were created by randomly selecting subsets of households from the 2000 PUMS. The validation results showed that FBS outperformed IPF and was both efficient and scalable. FBS also did not require many iterations: the number of iterations was only one to three times the number of households to be synthesized. In addition, the proposed FBS approach addresses the notorious IPF issues of zero-cell problems, computational resources (memory), and non-integer cell values in the joint-distribution tables.

Hafezi and Habib [26] refined the FBS approach and examined the refined FBS population synthesizer with three models. The first model used household-level control tables. The second used individual- and household-level control tables, and the third used weighted individual- and household-level control tables. The models were applied to the province of Nova Scotia in Atlantic Canada, using the 2006 Canadian Census and the Public Use Microdata File (PUMF). The refined approach was implemented using the sparse matrix technique package in MATLAB, which is based on high-level matrix programming for numerical computation. The three models were validated by error percentages and goodness-of-fit evaluation. The validation results indicated that the refined FBS approach can efficiently obtain a satisfactory result using both individual- and household-level control tables. However, higher homogeneity was achieved within the third model.

**7. Emerging approaches**

Other emerging approaches have been developed in an attempt to replace the IPF approach or to overcome one or more of its drawbacks. These include Bayesian network, annealing algorithm, linear programming, heuristic-based, copula-based, and entropy maximization approaches. The following paragraphs introduce each of the emerging approaches.

The Bayesian network approach was developed by Sun and Erath [27] in 2015. It is a probabilistic population synthesizer intended as an alternative way to approximate the inherent joint distribution more efficiently. Using a graphical model, the Bayesian network approach encodes probabilistic relationships, such as causality or dependence, among a set of variables. The advantages of Bayesian network models lie in their ability to learn the structure of population systems using limited amounts of microdata, particularly when the number of attributes of interest is large. The Bayesian network
