*Transportation Systems Analysis and Assessment*

than the existing ones. More specifically, the proposed HMM-based approach promised great flexibility and efficiency in terms of data preparation and model training, while being able to reproduce the structural configuration of a given population from an unlimited number of micro samples and a marginal distribution. The HMM-based approach treats population synthesis as a variant of the standard decoding problem, in which the state sequences are assumed to be unknown. Accordingly, the maximum likelihood estimators related to the transition states were determined through the Viterbi algorithm. An important advantage of the HMM-based approach is its ability to handle both continuous and discrete variables, which addresses the inherent loss of information caused by aggregating continuous variables such as age. It thereby removes the need to discretize continuous variables, which standard Markov processes require because they are limited to discrete states. The Statistics and Machine Learning Toolbox of MATLAB was used to generate sequences from an HMM estimated on the 2013 Belgian national household travel survey. Three simulations were run to illustrate the HMM-based approach. The first tested the combined effects of scalability and dimensionality, the second compared the HMM-based approach against IPF, and the third demonstrated the advantage of the HMM-based approach over IPF using various samples. The simulation results indicated that the HMM-based approach is accurate, reproducing the marginal distributions and their corresponding multivariate joint distributions with an acceptable error. Furthermore, it outperformed IPF for small sample sizes while using a smaller amount of input data than IPF. The results also demonstrated that the HMM-based approach can integrate information from several data sources to produce good estimates of the synthesized population.

**6. Fitness-based synthesis approach**

To address the inability of the IPF approach to deal with multilevel controls, Ma and Srinivasan [25] developed the fitness-based synthesis (FBS) approach, which directly generates a list of households that matches several multilevel controls without the need to determine a joint multiway distribution. The FBS approach generally involves selecting a set of households from the seed data, such as PUMS, such that tract-level controls are satisfied. It starts with an initial set of households that can be either a null set or a random sample from the seed data. The population of each census tract is then synthesized iteratively, with one household either added to or removed from the current list in each iteration. Count tables, defined in terms of the control attributes, are used to track the number of households of each type that have already been included. The FBS approach only adds or removes households; swapping is not considered. The main criteria in the FBS approach are the reduction in the sum of squared errors for addition, $F^{I}_{in}$, and the corresponding reduction for removal, $F^{II}_{in}$, as given by Eqs. (4) and (5):

$$F^{I}_{in} = \sum_{j=1}^{J}\sum_{k=1}^{K_j}\left[\left(R^{n-1}_{jk}\right)^2 - \left(R^{n-1}_{jk} - \mathit{HT}^{i}_{jk}\right)^2\right] \tag{4}$$

$$F^{II}_{in} = \sum_{j=1}^{J}\sum_{k=1}^{K_j}\left[\left(R^{n-1}_{jk}\right)^2 - \left(R^{n-1}_{jk} + \mathit{HT}^{i}_{jk}\right)^2\right] \tag{5}$$

subject to

$$F^{I}_{in} + F^{II}_{in} = -2\sum_{j=1}^{J}\sum_{k=1}^{K_j}\left(\mathit{HT}^{i}_{jk}\right)^2$$

where $i$ indexes candidate households, $n$ the iteration, $j = 1, \dots, J$ the control attributes, and $k = 1, \dots, K_j$ the categories of control $j$; $R^{n-1}_{jk}$ is the residual of category $k$ of control $j$ (control total minus current count) after iteration $n-1$, and $\mathit{HT}^{i}_{jk}$ is the contribution of household $i$ to that category. The last relation follows directly from expanding the squares in Eqs. (4) and (5).

**7. Emerging approaches**

Other emerging approaches have been developed in an attempt to replace the IPF approach or to overcome one or more of its drawbacks. These include the Bayesian network, simulated annealing, linear programming, heuristic-based, copula-based, and entropy maximization approaches. The following paragraphs introduce each of them.

The Bayesian network approach was developed by Sun and Erath [27] in 2015. It is a probabilistic population synthesizer intended as a more efficient alternative for approximating the underlying joint distribution. Using a graphical model, the Bayesian network approach encodes probabilistic relationships, such as causality or dependence, among a set of variables. The advantage of Bayesian network models lies in their ability to learn the structure of population systems from limited amounts of microdata, particularly when the number of attributes of interest is large. The approach is founded on the inference of the joint distribution; that is, it treats population synthesis as the inference of a multivariate probability distribution of demographic and socioeconomic household- and individual-level attributes. Like the Markov process-based approaches, the Bayesian network approach does not require marginals as input. In addition, it does not require any conditionals, since structure learning and parameter estimation are inherently integrated in the learning model. The performance of the approach was demonstrated through an application to the 2010 household interview travel survey of Singapore, where it achieved low SRMSE values. It also produced good heterogeneity in the synthetic population when the size of the PUMS was less than 70% of the full population.
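The generation side of a Bayesian network synthesizer can be sketched as follows, under strong simplifying assumptions that are not part of Sun and Erath's method: the graph structure (`hh_size` → `workers` → `cars`) is fixed by hand instead of being learned, the conditional probability tables are estimated by simple counting, and synthetic households are drawn by ancestral sampling. The attribute names, the toy microsample, and all function names are hypothetical. An `srmse` helper is included because SRMSE is the fit measure cited above; this sketch computes it over the cells that appear in either table (cells empty in both are ignored, which is an assumption).

```python
import math
import random
from collections import Counter, defaultdict

# Hypothetical, hand-specified DAG: each attribute maps to its single parent.
STRUCTURE = {"hh_size": None, "workers": "hh_size", "cars": "workers"}
ORDER = ["hh_size", "workers", "cars"]  # parents always precede children


def fit_cpts(records):
    """Estimate conditional probability tables by counting (maximum likelihood)."""
    cpts = {}
    for var, parent in STRUCTURE.items():
        counts = defaultdict(Counter)
        for rec in records:
            parent_value = rec[parent] if parent else None
            counts[parent_value][rec[var]] += 1
        cpts[var] = {
            pv: {value: n / sum(c.values()) for value, n in c.items()}
            for pv, c in counts.items()
        }
    return cpts


def sample_household(cpts, rng):
    """Ancestral sampling: draw each attribute conditional on its drawn parent."""
    household = {}
    for var in ORDER:
        parent = STRUCTURE[var]
        dist = cpts[var][household[parent] if parent else None]
        values, probs = zip(*dist.items())
        household[var] = rng.choices(values, weights=probs, k=1)[0]
    return household


def joint_shares(households):
    """Share of households in each joint (hh_size, workers, cars) cell."""
    counts = Counter(tuple(h[v] for v in ORDER) for h in households)
    total = sum(counts.values())
    return {cell: n / total for cell, n in counts.items()}


def srmse(synthetic, reference):
    """Standardised RMSE between two cell dicts (cells present in either one)."""
    cells = set(synthetic) | set(reference)
    n = len(cells)
    sq = sum((synthetic.get(c, 0) - reference.get(c, 0)) ** 2 for c in cells)
    return math.sqrt(sq / n) / (sum(reference.values()) / n)


micro = [  # toy microsample (hypothetical)
    {"hh_size": 1, "workers": 0, "cars": 0},
    {"hh_size": 1, "workers": 1, "cars": 1},
    {"hh_size": 2, "workers": 1, "cars": 1},
    {"hh_size": 2, "workers": 2, "cars": 2},
    {"hh_size": 3, "workers": 2, "cars": 1},
    {"hh_size": 4, "workers": 2, "cars": 2},
]
rng = random.Random(42)
synthetic = [sample_household(fit_cpts(micro), rng) for _ in range(1000)]
print(srmse(joint_shares(synthetic), joint_shares(micro)))
```

Because each attribute is drawn only from values observed together with its parent, structural zeros in the sample (here, more workers than household members) are never generated, while new combinations such as (3, 2, 2) can still appear, which is the kind of heterogeneity discussed above.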

Kim and Lee [28] developed a simulated annealing (SA) algorithm to synthesize populations for activity-based models. SA is built upon concepts from thermodynamics and metallurgy and was first introduced as a generic heuristic method for discrete optimization. The Metropolis-Hastings algorithm was employed to address the hill-climbing and cooling-schedule problems inherent in applying SA to population synthesis. The proposed algorithm consists of seven steps. The first step sets the maximum number of iterations. The second step sets the total numbers of columns and rows of the population table and enters the observed values of the sample distribution. The third step sets up the before-distribution, which is composed of random numbers that satisfy the restrictive conditions on the totals. The fourth step sets up the after-distribution, which is likewise composed of random numbers that satisfy the restrictive conditions on the totals. The fifth step calculates the absolute error of the before- and after-distributions with respect to the observed data. The sixth step calculates the selection probability. The seventh and final step iterates steps 4 through 6 and ends the calculation when the absolute error calculated in the fifth step reaches its smallest value or satisfies the ending conditions. The SA algorithm was implemented using the household travel diary survey from the Korean National Statistics Office. The results indicated that the accuracy of this algorithm requires further verification.
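The annealing loop in steps 3 through 7 can be sketched as below. This is a minimal illustration, not Kim and Lee's implementation. It assumes the target is a two-way integer table whose row and column sums must match given controls, and, instead of drawing an entirely new random after-distribution in step 4, it perturbs the current table by moving one unit between cells, a common SA neighbourhood move. The names `anneal`, `t0`, and `cooling` and the control values are hypothetical.

```python
import math
import random


def marginal_error(table, row_targets, col_targets):
    """Absolute error of the table's row and column sums against the controls."""
    row_err = sum(abs(sum(row) - t) for row, t in zip(table, row_targets))
    col_err = sum(abs(sum(col) - t) for col, t in zip(zip(*table), col_targets))
    return row_err + col_err


def random_table(n_rows, n_cols, total, rng):
    """Random non-negative integer table whose cells sum to `total`."""
    table = [[0] * n_cols for _ in range(n_rows)]
    for _ in range(total):
        table[rng.randrange(n_rows)][rng.randrange(n_cols)] += 1
    return table


def neighbour(table, rng):
    """Move one unit from a random non-empty cell to a random cell (total kept)."""
    new = [row[:] for row in table]
    filled = [(i, j) for i, row in enumerate(new) for j, v in enumerate(row) if v > 0]
    i, j = rng.choice(filled)
    new[i][j] -= 1
    new[rng.randrange(len(new))][rng.randrange(len(new[0]))] += 1
    return new


def anneal(row_targets, col_targets, max_iter=20000, t0=5.0, cooling=0.999, seed=1):
    """Simulated annealing with the Metropolis acceptance rule (hypothetical sketch)."""
    if sum(row_targets) != sum(col_targets):
        raise ValueError("row and column controls must have the same total")
    rng = random.Random(seed)
    current = random_table(len(row_targets), len(col_targets), sum(row_targets), rng)
    current_err = marginal_error(current, row_targets, col_targets)
    best, best_err = current, current_err
    temp = t0
    for _ in range(max_iter):
        if best_err == 0:
            break  # ending condition: controls matched exactly
        candidate = neighbour(current, rng)
        delta = marginal_error(candidate, row_targets, col_targets) - current_err
        # Metropolis rule: always accept improvements; accept a worse table
        # with probability exp(-delta / temp), which shrinks as temp cools.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_err = candidate, current_err + delta
            if current_err < best_err:
                best, best_err = current, current_err
        temp = max(temp * cooling, 1e-6)  # geometric cooling with a floor
    return best, best_err


table, err = anneal([30, 20], [25, 15, 10])
print(table, err)
```

Keeping the best table seen so far means the returned error never exceeds that of the random starting table, even though the search itself may temporarily accept worse tables to escape local minima.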

The linear programming (LP) approach was developed by Vovsha et al. [29] to synthesize populations as part of an activity-based model developed for the Maricopa Association of Governments. The LP approach is an analytical method that balances the weights of a list, or sample, of households to meet the controls imposed at some spatial level, typically each traffic analysis zone (TAZ). Features of the LP approach include (a) a general formulation of the balancing procedure that converges even with imperfect controls, (b) optimized discretization of weights that preserves the best possible match to the controls, and (c) the ability to set controls at multiple spatial levels. In addition, the LP approach featured an innovative method for discretizing the household weights that is integrated with the balancing procedure. Although the validation of the LP approach is questionable, it still accommodates the various fine-resolution spatial levels that newer-generation activity- and agent-based models require.
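To make the balancing idea concrete, one illustrative way to pose weight balancing with imperfect controls as a linear program is shown below. This is an assumed formulation for exposition, not the one published by Vovsha et al. [29]. Here $w_i$ is the weight of seed household $i$, $w^0_i$ its initial weight, $a_{ic}$ the number of units household $i$ contributes to control $c$, $T_c$ the control total, $s^{\pm}_c$ slack variables that absorb inconsistencies between controls, $d^{\pm}_i$ deviations from the initial weights, and $\lambda$ a hypothetical penalty weight.

$$
\begin{aligned}
\min_{w,\,s,\,d}\quad & \sum_{c}\left(s^{+}_{c}+s^{-}_{c}\right)+\lambda\sum_{i}\left(d^{+}_{i}+d^{-}_{i}\right)\\
\text{s.t.}\quad & \sum_{i} a_{ic}\,w_{i}-s^{+}_{c}+s^{-}_{c}=T_{c} \qquad \forall c\\
& w_{i}-w^{0}_{i}=d^{+}_{i}-d^{-}_{i} \qquad \forall i\\
& w_{i},\; s^{\pm}_{c},\; d^{\pm}_{i}\ge 0
\end{aligned}
$$

Because every control has its own slack pair, this LP always has a feasible solution (for example, $w = w^0$ with the slacks absorbing the gaps), even when the controls conflict. Controls at several spatial levels simply add more rows $c$, while discretizing the weights would require integer variables or a separate rounding step, which is not shown here.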

*A Critical Review on Population Synthesis for Activity- and Agent-Based Transportation Models*

The heuristic-based approach was developed by Zhuge et al. [30] to address two IPF limitations that had received less attention in earlier studies. The first limitation stems from the existence of multiple solutions for a single target marginal distribution. The second stems from the optimization nature of population synthesis, whose objective is to minimize the mean absolute percentage error (MAPE) of the control variables. The proposed heuristic-based approach consists of 11 steps arranged in three parts. The first part, comprising steps 1 and 2, generates the initial household weights. The second part, comprising steps 3 through 11, adjusts the household weights until a stopping criterion is met. The third part, comprising steps 10 and 11, calculates the adjustment step and the adjustment range, which are the two fundamental parameters of the approach. The 2007 household travel survey data from Baoding, China, were used as a case study. The results indicated that the heuristic-based approach could not perform as well as IPF when the MAPE values of the two approaches were compared.
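Zhuge et al.'s 11 steps are not reproduced here. The following hypothetical sketch only illustrates the ideas named above: MAPE over the control totals is the objective, each household weight is nudged by an adjustment step, every weight is kept within an adjustment range around its initial value, and the step is halved whenever a full pass brings no improvement. The function names, parameter defaults, and toy data are assumptions.

```python
def mape(weights, incidence, targets):
    """Mean absolute percentage error (in %) of the weighted control totals."""
    errors = []
    for c, target in enumerate(targets):
        total = sum(w * row[c] for w, row in zip(weights, incidence))
        errors.append(abs(total - target) / target)
    return 100.0 * sum(errors) / len(errors)


def heuristic_balance(incidence, targets, init=1.0, step=1.0, adj_range=20.0,
                      min_step=1e-3, tol=0.01):
    """Hypothetical greedy search on household weights to reduce MAPE.

    `step` plays the role of the adjustment step and `adj_range` bounds how far
    any weight may move from its initial value (the adjustment range).
    """
    weights = [init] * len(incidence)
    best = mape(weights, incidence, targets)
    while step >= min_step and best > tol:
        improved = False
        for i in range(len(weights)):
            for delta in (step, -step):
                trial = weights[i] + delta
                if trial < 0 or abs(trial - init) > adj_range:
                    continue  # outside the allowed adjustment range
                old, weights[i] = weights[i], trial
                err = mape(weights, incidence, targets)
                if err < best:
                    best, improved = err, True
                    break  # keep the improving move, go to the next household
                weights[i] = old  # undo a move that did not help
        if not improved:
            step /= 2  # refine the step once a full pass stalls
    return weights, best


incidence = [[1, 0, 1], [1, 1, 0], [0, 1, 1], [1, 0, 0]]  # households x controls
targets = [10, 6, 8]
weights, err = heuristic_balance(incidence, targets)
print([round(w, 3) for w in weights], round(err, 4))
```

A greedy search of this kind illustrates the first limitation discussed above: many weight vectors reach a similar MAPE, and which one is returned depends on the starting weights and the order in which households are visited.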

Most recently, Kao et al. [31] proposed the copula-based approach to address previously identified limitations of the IPF approach. Copulas, a relatively new statistical tool, are joint probability distributions with uniform marginals. Because a copula separates the marginal distributions from the dependence structure, the copula-based approach was designed to preserve both the marginal distributions and the dependence structure between variables. The method was tested for the state of Iowa, and the results were compared with those of the IPF approach using means, medians, and correlation matrices. The synthesized households reproduced the same local statistics at each block group, and the fact that they also had intervariable correlations similar to those in the PUMS suggests that the copula-based approach is applicable.

Another recent effort to develop an alternative to IPF resulted in the entropy maximization-based population synthesizer of Paul et al. [32], which handles multiple geographies and avoids algorithmic errors. It was developed as part of the Oregon Department of Transportation (ODOT) effort to use an open-source population synthesis platform. The approach consists mainly of two algorithms. The first, list balancing, finds weights that match the given marginal control distributions. The second, integerizing, implements an LP-based procedure to convert fractional weights to integers. The entropy maximization-based approach was implemented in Python and made heavy use of the Pandas and NumPy libraries, which allow operations to be vectorized to reduce the overall runtime. Validation against the IPF approach gave promising results and demonstrated a reasonable match to controls.
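The two stages can be sketched as follows, using plain Python lists instead of the vectorized Pandas/NumPy code described above. Two assumptions apply. First, each household contributes 0 or 1 to each control; under that assumption, maximizing entropy relative to the initial weights subject to the controls reduces to cyclically rescaling the weights of the households in each control (raking). Second, `integerize` uses simple largest-remainder rounding as a stand-in for the LP-based integerizing procedure. `list_balance` and `integerize` are hypothetical names, not ODOT's code.

```python
import math


def list_balance(initial_weights, incidence, targets, max_iter=1000, tol=1e-9):
    """Entropy-maximising list balancing, assuming 0/1 incidence.

    Rescales the members of each control in turn so that their weighted total
    hits the target; when a positive solution exists this converges to the
    weights closest to the initial ones in the relative-entropy sense.
    """
    w = list(initial_weights)
    for _ in range(max_iter):
        max_rel_gap = 0.0
        for c, target in enumerate(targets):
            members = [i for i, row in enumerate(incidence) if row[c]]
            total = sum(w[i] for i in members)
            if total == 0:
                continue  # no household can contribute to this control
            max_rel_gap = max(max_rel_gap, abs(total - target) / target)
            for i in members:
                w[i] *= target / total
        if max_rel_gap < tol:
            break  # every control was already matched at the start of the pass
    return w


def integerize(weights):
    """Largest-remainder rounding that preserves the rounded grand total.

    A simple stand-in for the LP-based integerizing step described above.
    """
    floors = [math.floor(x) for x in weights]
    shortfall = round(sum(weights)) - sum(floors)
    by_remainder = sorted(range(len(weights)),
                          key=lambda i: weights[i] - floors[i], reverse=True)
    for i in by_remainder[:shortfall]:
        floors[i] += 1
    return floors


incidence = [[1, 0, 1], [1, 1, 0], [0, 1, 1], [1, 0, 0]]  # households x controls
targets = [10, 6, 8]
weights = list_balance([1.0] * 4, incidence, targets)
print(weights, integerize(weights))
```

Rounding each weight independently can drift away from the controls, which is why the published approach treats integerizing as its own optimization step rather than plain rounding.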

**8. Conclusion**

This study presented a critical, comprehensive literature review of population synthesizers, from the early efforts through the most recent approaches. The review indicated that, despite its identified limitations and drawbacks, the IPF approach is the most feasible and widely used population synthesizer. All the other studies and efforts used it as a reference for comparison and produced similar or slightly improved results. IPF clearly has its drawbacks and limitations, yet the reviewed literature indicates that no single approach results in a population synthesizer that is both efficient and accurate. However, an integration of robust methods appears to be the most promising approach, as in the effort of Fournier et al. [33], in which the limitations of IPF are resolved by combining five methods into an integrated framework for population synthesis. **Table 1**, in the *Supplemental Information* section, summarizes the advantages and disadvantages of the presented approaches.

Although it is almost three decades old, the IPF approach is still used in state-of-the-art simulation platforms such as MATSim. Given that IPF is the most studied approach and that none of the alternatives provides an out-of-the-box solution, IPF remains the approach preferred by modelers and practitioners. This conclusion is confirmed by the findings of Saadi et al. [34], who investigated the influence of scalability on the accuracy of different population synthesizers using both fitting- and generation-based approaches. Their results revealed that simulation-based approaches are more stable than IPF approaches as the number of attributes increases; however, IPF approaches are less sensitive to changes in sample size.

*DOI: http://dx.doi.org/10.5772/intechopen.86307*

