**2.1 Two-level models**

Let *yij* be the dependent variable measured in the *i-th* individual in the *j-th* leveltwo unit (e.g., the j-th group); *i* ¼ 1, … ,*N*, where *N* is the total sample size, and *j* ¼ 1, … ,*J* for *J* level-two units.

The simplest two-level model is the *null model (intercept-only model, unconditional means model, or one-way random-effects analysis of the variance)*. The model is defined by two equations:

$$\begin{aligned} \boldsymbol{\nu}\_{\boldsymbol{i}\boldsymbol{j}} &= \boldsymbol{\beta}\_{0\boldsymbol{j}} + \boldsymbol{e}\_{\boldsymbol{i}\boldsymbol{j}} \\ \boldsymbol{\beta}\_{0\boldsymbol{j}} &= \boldsymbol{\gamma}\_{00} + \boldsymbol{\omega}\_{0\boldsymbol{j}} \end{aligned} \tag{2}$$

*β*0*<sup>j</sup>* is the mean of *y* in the group *j* that varies across groups; *eij* is the individual variation around this mean; *γ*<sup>00</sup> is the overall intercept, that is, the grand mean of *y*; and *u*0*<sup>j</sup>* is the deviation of *β*0*<sup>j</sup>* with respect to *γ*00.

Substitution of *β*0*<sup>j</sup>* in *yij* produces the single-equation model:

$$
\gamma\_{ij} = \gamma\_{00} + \mathfrak{u}\_{0j} + \mathfrak{e}\_{\vec{\eta}} \tag{3}
$$

Eq. (3) is composed of a fixed part, *γ*00, and a random part corresponding to two random effects, *<sup>u</sup>*0*<sup>j</sup>* and *eij*. Assuming that *eij* � *<sup>N</sup>* 0, *<sup>σ</sup>*<sup>2</sup> ð Þ and *<sup>u</sup>*0*<sup>j</sup>* � *<sup>N</sup>* 0, *<sup>σ</sup>*<sup>2</sup> *u*0 and that *eij* and *u*0*<sup>j</sup>* are independent, the variance of *yij* in Eq. (3) is

$$\begin{split} \text{var}\left(\mathbf{y}\_{\vec{\text{ij}}}\right) &= \text{var}\left(\mathbf{y}\_{00} + \boldsymbol{\mu}\_{0\vec{\text{j}}} + \boldsymbol{\sigma}\_{\vec{\text{ij}}}\right) \\ &= \boldsymbol{\sigma}\_{\boldsymbol{\upupup}0}^2 + \boldsymbol{\sigma}^2 \end{split} \tag{4}$$

where *σ*<sup>2</sup> *<sup>u</sup>*<sup>0</sup> <sup>¼</sup> *<sup>σ</sup>*<sup>2</sup> *<sup>b</sup>* is the variance between groups, and *<sup>σ</sup>*<sup>2</sup> <sup>¼</sup> *<sup>σ</sup>*<sup>2</sup> *<sup>w</sup>* is the variance within groups in Eq. (1) to calculate the ICC.

Now, let us consider a two-level model with two level-one independent variables, *x*<sup>1</sup> with a fixed effect *α*1, which does not vary between groups, and *x*<sup>2</sup> with a random effect *β*2*<sup>j</sup>* , which does vary between groups.

This model is defined by

$$\begin{aligned} \chi\_{\vec{\eta}} &= \beta\_{0\dot{\jmath}} + a\_1 \varkappa\_{1\dot{\jmath}} + \beta\_{2\dot{\jmath}} \varkappa\_{2\dot{\jmath}} + e\_{\dot{\imath}} \\ \beta\_{0\dot{\jmath}} &= \chi\_{00} + \chi\_{01} w\_{1\dot{\jmath}} + u\_{0\dot{\jmath}} \\ \beta\_{2\dot{\jmath}} &= \chi\_{10} + \chi\_{11} w\_{1\dot{\jmath}} + u\_{1\dot{\jmath}} \end{aligned} \tag{5}$$

In the above equation, both *β*0*<sup>j</sup>* and *β*2*<sup>j</sup>* depend on a level-two independent variable; *w*<sup>1</sup> and *uqj* are the deviation of the effect of the variable *w*<sup>1</sup> on *yij* in the group *j* with respect to the average effect *γq*0.

Substitution of *β*0*<sup>j</sup>* and *β*2*<sup>j</sup>* in *yij* produces the single-equation model:

$$\mathbf{y}\_{\vec{\eta}} = \mathbf{y}\_{00} + a\_1 \mathbf{x}\_{1\vec{\eta}} + \mathbf{y}\_{01} \mathbf{w}\_{1\vec{\eta}} + \mathbf{y}\_{10} \mathbf{x}\_{2\vec{\eta}} + \mathbf{y}\_{11} \mathbf{w}\_{1\vec{\eta}} \mathbf{x}\_{2\vec{\eta}} + \left(\mathbf{u}\_{0\vec{\eta}} + \mathbf{u}\_{1\vec{\eta}} \mathbf{x}\_{2\vec{\eta}} + \mathbf{e}\_{\vec{\eta}}\right) \tag{6}$$

Generalizing the above two-level model to the case where level one includes *P* independent variables *xp* that have a fixed effect, *Q*, independent variables *xq* that have a random effect, and *M* level-two independent variables *wm*, which also have a fixed effect, we have:

$$\begin{aligned} \mathbf{y}\_{\vec{\eta}} &= \mathbf{y}\_{00} + \sum\_{m=1}^{M} \mathbf{y}\_{0m} \mathbf{w}\_{m\vec{\eta}} + \sum\_{p=1}^{P} a\_p \mathbf{x}\_{p\vec{\eta}} + \sum\_{q=P+1}^{P+Q} \mathbf{y}\_{q0} \mathbf{x}\_{q\vec{\eta}} + \sum\_{q=P+1}^{P+Q} \sum\_{m=1}^{M} \mathbf{y}\_{qm} \mathbf{w}\_{m\vec{\eta}} \mathbf{x}\_{q\vec{\eta}} \\ &+ \left( \mathbf{u}\_{0j} + \sum\_{q=P+1}^{P+Q} \mathbf{x}\_{q\vec{\eta}} \mathbf{u}\_{qj} + \mathbf{e}\_{ij} \right) \end{aligned} \tag{7}$$

Model 7 is composed of *fixed effects* (the coefficients *γ* and *α*) and *random effects* (all terms in parentheses). Two-level models can also be expressed in a matrix form by

$$Y = X\beta + \mathcal{W}u + e\tag{8}$$

where *Y* is the vector of measurements of the dependent variable, *X* is the design matrix of the fixed effect parameter vector *β* (containing the overall mean, main effects, and interactions), *W* is the design matrix of the random effects given by the vector *U*, and *e* is the vector of level-one residual errors.

### **2.2 Three-level models**

Let *i*, *j*, and *k* indicate the observation units of levels 1, 2, and 3, respectively. In addition, level 3 has *K* units, each level-three unit has *Jk* level-two units, and the *j*th level-two unit in the *k*th level-three unit has *nijk* level-oneunits. The null model is

$$\begin{aligned} \chi\_{ijk} &= \beta\_{0jk} + \mathfrak{e}\_{ijk} \\ \beta\_{0jk} &= \chi\_{00k} + \mathfrak{u}\_{0jk} \\ \chi\_{00k} &= \mathfrak{xi}\_{000} + \nu\_{00k} \end{aligned} \tag{9}$$

In the first equation, *β*0*jk* is the level-one random intercept that varies between the groups of level two, and *eijk* is the residual variance at level one with respect to *β*0*jk*. In the second equation, *γ*00*<sup>k</sup>* is the level-two random intercept that varies between the level-three units, and *u*0*jk* is the residual variation of the group *j* with respect to *γ*00*<sup>k</sup>*. In the third equation, *ξ*<sup>000</sup> is the general intercept, that is, the grand mean of *y*, and *ν*00*<sup>k</sup>* is the variation between the means of the level-three groups (i.e., the deviation of the mean of group *k* with respect to the grand mean).

Substituting *γ*00*<sup>k</sup>* in *β*0*jk* and then *β*0*jk* in *yijk* yields

$$\mathcal{Y}\_{ijk} = \xi\_{000} + \nu\_{00k} + u\_{0jk} + e\_{ijk} \tag{10}$$

Assuming that *eijk* � *<sup>N</sup>* 0, *<sup>σ</sup>*<sup>2</sup> ð Þ, *<sup>u</sup>*0*jk* � *<sup>N</sup>* 0, *<sup>σ</sup>*<sup>2</sup> *u*0 � �, and *<sup>ν</sup>*00*<sup>k</sup>* � *<sup>N</sup>* 0, *<sup>σ</sup>*<sup>2</sup> *ν*0 � � and that *eijk*, *u*0*jk*, and *ν*00*<sup>k</sup>* are independent, the variance of *yij* in Eq. (10) is

$$var\left(\mathcal{Y}\_{ij\mathbb{k}}\right) = \sigma\_{\nu\_0}^2 + \sigma\_{\nu 0}^2 + \sigma^2 \tag{11}$$

One way to define the ICC at levels two and three, attributed to Davis and Scott [3], is

$$\rho\_{\text{level\\_3}} = \frac{\sigma\_{\nu\_0}^2}{\sigma\_{\nu\_0}^2 + \sigma\_{\nu\_0}^2 + \sigma^2} \tag{12}$$

$$\rho\_{\text{level\\_2}} = \frac{\sigma\_{\text{u}\_0}^2}{\sigma\_{\nu\_0}^2 + \sigma\_{\text{u}\_0}^2 + \sigma^2},\tag{13}$$

Now, we consider a three-level multilevel model with two level-one independent variables, *x*<sup>1</sup> and *x*2; the former has a fixed effect and the latter a random effect:

$$\mathcal{Y}\_{\rm ijk} = \beta\_{0jk} + a\_1 \varkappa\_{1jk} + \beta\_{2jk} \varkappa\_{2jk} + \varepsilon\_{ijk} \tag{14}$$

Let suppose that the random coefficients *β*0*jk* and *β*2*jk* are explained by a secondlevel variable, *w*1, by the relationships

$$\begin{aligned} \beta\_{0jk} &= \gamma\_{00k} + \gamma\_{01k} w\_{1jk} + u\_{0jk} \\ \beta\_{2jk} &= \gamma\_{10k} + \gamma\_{11k} w\_{1jk} + u\_{1jk} \end{aligned} \tag{15}$$

And the random coefficients in Eq. (15) are explained by a third-level variable, *z*1, by the equations

$$\begin{aligned} \gamma\_{00k} &= \xi\_{000} + \xi\_{001} z\_{1k} + \nu\_{00k} \\ \gamma\_{01k} &= \xi\_{010} + \xi\_{011} z\_{1k} + \nu\_{01k} \\ \gamma\_{10k} &= \xi\_{100} + \xi\_{101} z\_{1k} + \nu\_{10k} \\ \gamma\_{11k} &= \xi\_{110} + \xi\_{111} z\_{1k} + \nu\_{11k} \end{aligned} \tag{16}$$

Substituting Eqs. (16) in (15) and then in Eq. (14), we have the three-level multilevel model:

$$\begin{split} y\_{ijk} &= \xi\_{000} + a\_1 \mathbf{x}\_{1jk} + \xi\_{001} \mathbf{z}\_{1k} + \xi\_{010} w\_{1jk} + \xi\_{100} \mathbf{x}\_{2jk} + \xi\_{011} \mathbf{z}\_{1k} w\_{1jk} \\ &+ \xi\_{101} \mathbf{z}\_{1k} \mathbf{x}\_{2jk} + \xi\_{110} w\_{1jk} \mathbf{x}\_{2jk} + \xi\_{111} \mathbf{z}\_{1k} w\_{1jk} \mathbf{x}\_{2jk} \\ &+ (\boldsymbol{\mu}\_{0jk} + \boldsymbol{\nu}\_{00k} + \boldsymbol{\nu}\_{10k} \mathbf{x}\_{2jk} + \boldsymbol{\mu}\_{1jk} \mathbf{x}\_{2jk} + \boldsymbol{\nu}\_{01k} \boldsymbol{\nu}\_{1jk} + \boldsymbol{\nu}\_{11k} \boldsymbol{\nu}\_{1jk} \mathbf{x}\_{2jk} + \boldsymbol{\epsilon}\_{ijk}) \end{split} \tag{17}$$

where the regression coefficients *ξ* and *α* are the fixed part of the model, and the residual terms of each level contained in parentheses are the random part.

We can generalize the three-level model of Eq. (17). Suppose level one contains *P* independent variables *xp* that have a fixed effect, *αp*, and *Q* independent variables *xq q* ¼ *P* þ 1, … ,*P* þ *Q*. Level two contains *M* variables *wm m* ¼ 1, … ,*M*. Level three contains *L* independent variables *zl l* ¼ 1, … ,*L*.

*yijk* <sup>¼</sup> *<sup>ξ</sup>*<sup>000</sup> <sup>þ</sup><sup>X</sup> *P p*¼1 *αpxpijk* þ *P* X þ*Q q*¼*P*þ1 *<sup>ξ</sup><sup>q</sup>*00*xqijk* <sup>þ</sup><sup>X</sup> *M m*¼1 *<sup>ξ</sup>*0*m*0*wmjk* <sup>þ</sup><sup>X</sup> *L l*¼1 *ξ*00*lzlk* þ X *M m*¼1 X *L l*¼1 *ξ*0*mlzlkwmjk* þ *P* X þ*Q q*¼*P*þ1 X *L l*¼1 *ξ<sup>q</sup>*0*lzlkxqijk* þ *P* X þ*Q q*¼*P*þ1 X *M m*¼1 *ξqm*0*wmjkxqijk* þ *P* X þ*Q q*¼*P*þ1 X *M m*¼1 X *L l*¼1 *ξqmlzlkwmjkxqijk* þ *ν*00*<sup>k</sup>* þ *u*0*jk* þ *P* X þ*Q q*¼*P*þ1 *<sup>ν</sup>*10*kxqijk* þ *P* X þ*Q q*¼*P*þ1 *uqjkxqijk* <sup>þ</sup><sup>X</sup> *M m*¼1 *ν*0*mkwmjk* þ *P* X þ*Q q*¼*P*þ1 X *M m*¼1 *<sup>ν</sup>qmkwmjkxqijk* <sup>þ</sup> *eijk*! *:* (18)

When the three-level model includes a random slope of level one and a random slope of level two, the model easily includes many parameters (interaction and a residual effect by each random slope coefficient) that easily cause convergence

problems, except for sufficiently large data sets. Therefore, most three-level models have few random slope coefficients.

The equivalent matrix model for a three-level model is

$$Y = X\beta + \mathcal{W}u + Z\nu + e\tag{19}$$

Again, *X* is the design matrix of the fixed effect parameter vector *β* (containing the overall mean, main effects, and interactions), *W* is the design matrix of the random effects given by the vector *u*, *Z* is the design matrix of the random effects given by the vector *ν*, and *e* is the vector of level-one residual errors.

#### **2.3 Assumptions of multilevel models**

Statistical assumptions such as normal distribution, variance at each level, and independence between errors at different levels have been mentioned in the definition of the null multilevel model. They are explicitly defined in this section.

The dimension of the vector *u* depends on the number of random coefficients in the level-one equation; for example, in Eq. (14), the dimension is two. Similarly, the dimension of the vector *ν* depends on the number of random coefficients in the leveltwo equation; for example, in Eq. (15), the dimension is four. Let *<sup>e</sup>* <sup>¼</sup> ð Þ *<sup>e</sup>*111112<sup>⋯</sup> *<sup>T</sup>*, *<sup>u</sup>* <sup>¼</sup> *<sup>u</sup>*0*jk <sup>u</sup>*1*jk*⋯*usjk* � �*<sup>T</sup>* , and *ν* ¼ ð Þ *ν*00*<sup>k</sup>*⋯*ν*0*tk*⋯*ν<sup>t</sup>*0*<sup>k</sup>*⋯*νttk <sup>T</sup>* with dimensions *N*, *s*, and *t* 2, respectively.

Multilevel models' assumptions are:

$$
\begin{pmatrix} \nu \\ u \\ e \end{pmatrix} \sim N \left( \begin{pmatrix} \mathbf{0} \\ \mathbf{0} \\ \mathbf{0} \end{pmatrix}, \quad \begin{pmatrix} D & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & G & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & R \end{pmatrix} \right) \tag{20}
$$

where **0** is the vector of zeros with the appropriate dimension and

$$D = \begin{pmatrix} \sigma\_{\nu 0}^2 & \sigma\_{\nu 01} & \dots & \sigma\_{\nu 0t} \\ \sigma\_{\nu 01} & \sigma\_{\nu 1}^2 & \dots & \sigma\_{\nu 1t} \\ & \dots \\ & & \sigma\_{\nu 0t} & \dots & \sigma\_{\nu t}^2 \end{pmatrix}, \quad G = \begin{pmatrix} \sigma\_{\nu 0}^2 & \sigma\_{\nu 01} & \dots & \sigma\_{\nu 0t} \\ \sigma\_{\nu 01} & \sigma\_{\nu 1}^2 & \dots & \sigma\_{\nu 1t} \\ & \dots \\ & & \sigma\_{\nu 0s} & \sigma\_{\nu 1s} & \dots & \sigma\_{\nu t}^2 \end{pmatrix}, \quad R = \sigma^2 I \tag{21}$$

Eq. (20) says:


5.Level-one and level-three errors are independent *Cov e*ð Þ¼ , *ν* **0**.

6.Level-two and level-three errors are independent *Cov u*ð Þ¼ , *ν* **0**.
