**3. Model comparison**

Model comparison is widely used to select a plausible model for a given dataset from among the candidate models under consideration. Over the past years, various methods have been developed in the Bayesian framework for comparing many classes of models, such as linear/nonlinear regression models, structural equation models, multilevel models, machine learning models, and pattern recognition models.

To select a better model among the candidate models, we can adopt well-known best subset selection methods such as the Akaike information criterion (AIC), the Bayesian information criterion (BIC), the deviance information criterion (DIC), the generalized information criterion (GIC), the minimum description length (MDL), the Hannan-Quinn information criterion (HIC), and the log scoring criterion (also called the conditional predictive ordinate, i.e., CPO), which trade off a measure of model fit against a measure of model complexity. Also, the Bayes factor [17] has been developed for Bayesian model comparison and is widely used to quantify the strength of the evidence in favor of one of two candidate models. The Bayes factor for two competing models *H*<sub>0</sub> and *H*<sub>1</sub> is defined as follows:
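To make the fit-complexity trade-off behind criteria such as AIC and BIC concrete, the following minimal Python sketch computes both criteria for two Gaussian fits to the same data; the data, the two candidate models, and their parameter counts are invented purely for illustration:

```python
import numpy as np

def aic(loglik, k):
    """Akaike information criterion: 2k - 2*log-likelihood (lower is better)."""
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    """Bayesian information criterion: k*log(n) - 2*log-likelihood (lower is better)."""
    return k * np.log(n) - 2 * loglik

def gauss_loglik(y, mu, sigma):
    """Log-likelihood of i.i.d. N(mu, sigma^2) observations."""
    return float(np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                        - (y - mu) ** 2 / (2 * sigma**2)))

# Hypothetical dataset: 200 draws from N(1, 2^2).
rng = np.random.default_rng(0)
y = rng.normal(1.0, 2.0, size=200)

ll_full = gauss_loglik(y, y.mean(), y.std())  # mean and sd estimated: k = 2
ll_null = gauss_loglik(y, 0.0, 1.0)           # both parameters fixed:  k = 0

print("AIC:", aic(ll_full, 2), aic(ll_null, 0))
print("BIC:", bic(ll_full, 2, len(y)), bic(ll_null, 0, len(y)))
```

Both criteria penalize the fitted model for its two free parameters, but here the gain in log-likelihood dominates, so the fitted model wins under either criterion.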

$$\mathbf{B}_{10} = \frac{\pi(Y_{\text{obs}}, \delta \mid H_1)}{\pi(Y_{\text{obs}}, \delta \mid H_0)},$$

where $\pi(Y_{\text{obs}}, \delta \mid H_k) = \int \pi(Y_{\text{obs}}, \delta \mid \theta_k)\,\pi(\theta_k)\,d\theta_k$ is the marginal density of the data under $H_k$ with parameter vector $\theta_k$, and $\pi(\theta_k)$ is the prior density of $\theta_k$ associated with model $H_k$, for $k = 0, 1$. In general, if the Bayes factor B<sub>10</sub> > 1, model *H*<sub>1</sub> is better supported by the observed data than model *H*<sub>0</sub>, which leads to the following model comparison rule: a value of B<sub>10</sub> lying in the interval (3, 10), (10, 30), (30, 100), or (100, ∞) indicates moderate, strong, very strong, or extreme evidence in favor of model *H*<sub>1</sub>, respectively. It is rather difficult to compute the marginal likelihood $\pi(Y_{\text{obs}}, \delta \mid H_k)$ due to the intractable high-dimensional integral involved, so computing the Bayes factor B<sub>10</sub> is challenging. Many methods have been proposed to compute marginal likelihoods $\pi(Y_{\text{obs}}, \delta \mid H_k)$ or Bayes factors [3]: for example, importance sampling, path sampling, bridge sampling, the harmonic mean method, random-weight importance sampling, sequential Monte Carlo, and Pareto-smoothed importance sampling leave-one-out cross-validation.
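To make the integral concrete, here is a minimal Python sketch of one of the methods listed above, importance sampling, applied to a toy conjugate normal model (data and hyperparameters invented for illustration). Because the model is conjugate, the marginal likelihood is also available in closed form, so the Monte Carlo estimate can be checked against it:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy conjugate model under H1: y_i ~ N(theta, 1), prior theta ~ N(0, tau2).
y = rng.normal(0.5, 1.0, size=50)
n, tau2 = len(y), 1.0

def log_marginal_is(y, tau2, n_draws=100_000):
    """Importance-sampling estimate of log pi(y | H1), using the prior
    itself as the proposal (simple, but high-variance in harder problems)."""
    theta = rng.normal(0.0, np.sqrt(tau2), size=n_draws)   # proposal = prior
    # log-likelihood of the full dataset at each draw of theta
    ll = (-0.5 * (y[None, :] - theta[:, None]) ** 2).sum(axis=1) \
         - 0.5 * len(y) * np.log(2 * np.pi)
    m = ll.max()
    return m + np.log(np.mean(np.exp(ll - m)))             # stable log-mean-exp

# Closed-form log marginal for this conjugate pair, for comparison:
# marginally y ~ N_n(0, I + tau2 * 11'), so det = 1 + n*tau2 and the
# quadratic form reduces to s2 + n*ybar^2 / (1 + n*tau2).
ybar = y.mean()
s2 = ((y - ybar) ** 2).sum()
log_m_exact = (-0.5 * n * np.log(2 * np.pi)
               - 0.5 * np.log(1 + n * tau2)
               - 0.5 * (s2 + n * ybar ** 2 / (1 + n * tau2)))

print(log_marginal_is(y, tau2), log_m_exact)  # the two should agree closely
```

In realistic models no closed form exists and the prior is a poor proposal, which is precisely why the more refined schemes above (bridge sampling, path sampling, sequential Monte Carlo, etc.) were developed.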

One serious defect of the Bayes factor for model comparison is that it is not well defined under improper priors on the $\theta_k$'s and is sensitive to the choice of hyperparameters in the priors. In our experience, different priors, together with different sampling methods, lead to different values of the Bayes factor and hence to different model comparison conclusions. To address this, several modifications of the Bayes factor have been proposed, for instance the partial Bayes factor, the intrinsic Bayes factor, and the fractional Bayes factor, which are subject to a more or less arbitrary selection of training samples, of weights for averaging over training samples, and of fractions, respectively. Robust methods have also been developed to compute the sensitivity of the marginal likelihoods via simulation-based techniques, known as automated prior robustness methods. Recently, some novel methods have been proposed to deal with improper priors in computing the Bayes factor. One example is a machine learning approach: a part of the dataset under study is first used to train the Bayes factor, i.e., to transform the improper prior into a proper prior, and the remainder of the dataset is then used for model comparison, which provides a new idea for computing the Bayes factor. The robustness of model comparison remains a challenging topic that merits further study.
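The sensitivity to prior hyperparameters can be seen directly in a toy example (a hypothetical normal-mean testing problem, invented for illustration, in which both marginal likelihoods have closed forms): as the prior variance under $H_1$ is made more diffuse, B<sub>10</sub> drifts toward favoring $H_0$ regardless of the data, a manifestation of the well-known Bartlett/Lindley phenomenon:

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical data: 100 draws from N(0.3, 1).
y = rng.normal(0.3, 1.0, size=100)
n, ybar = len(y), y.mean()
s2 = ((y - ybar) ** 2).sum()

def log_B10(tau2):
    """log Bayes factor of H1 (y_i ~ N(theta, 1), theta ~ N(0, tau2))
    against H0 (y_i ~ N(0, 1)); both marginals are in closed form here."""
    log_m1 = (-0.5 * n * np.log(2 * np.pi) - 0.5 * np.log(1 + n * tau2)
              - 0.5 * (s2 + n * ybar ** 2 / (1 + n * tau2)))
    log_m0 = -0.5 * n * np.log(2 * np.pi) - 0.5 * (s2 + n * ybar ** 2)
    return log_m1 - log_m0

# Same data, same likelihoods: only the prior variance changes,
# yet the evidence swings from favoring H1 toward favoring H0.
for tau2 in (1.0, 1e2, 1e4, 1e6):
    print(f"tau2 = {tau2:>9}: log B10 = {log_B10(tau2):.2f}")
```

The penalty term $-\tfrac{1}{2}\log(1 + n\tau^2)$ grows without bound as $\tau^2 \to \infty$, so an arbitrarily vague proper prior manufactures evidence for the simpler model; this is exactly the kind of hyperparameter sensitivity that motivates the partial, intrinsic, and fractional Bayes factors discussed above.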
