**2. Error control in structure learning**

In real-world applications, especially in modelling brain connectivity, graphical models are not only a tool for operations such as classification or prediction; more often than not, it is the network structure of the model itself which is of particular interest. A desirable graphical model of fMRI data should therefore not only fit the overall data well statistically, but also accurately reflect the internal brain-connectivity structure. Structure-learning algorithms must therefore control or assess the error rate of the connections (edges) they detect.

#### **2.1 Criteria for error control**

Fig. 5. Two signals A (blue) and B (green) show strong pair-wise correlation, but when a third signal C (red) is considered, the residuals of A and B after removing the projection onto C hardly show any correlation. (Panels: signals A, B, and C; scatter plot of A and B; residuals of A and B; scatter plot of the residuals.)

The analysis of biomedical data typically emphasizes such features as reliability, interpretability and generality of reported results.

For example, when brain connections are reported, it is important to control or assess error rates in the claimed discoveries, addressing questions such as "how many among the reported connections are actually true connections?" and "how many true connections can be detected?"

Additionally, the ultimate goal of a biomedical experiment is usually a population inference applicable to a group of people, such as patients with a particular disease. However, subjects classified into the same experimental group according to the factor of interest can still be highly diverse with respect to other factors, such as gender, age, or race. Even repeated experiments with the same subject can be affected by various physical or psychological factors, such as drowsiness or stress. It is therefore important to integrate the information from separate experiments to make inferences on the target topic, and to keep a balance between commonality and diversity.

Finally, as a multidisciplinary field, the end users of connectivity-analysis reports are often biomedical researchers or clinicians who focus on the biological implications of the results and the effects of medication. It is therefore undesirable to simply present them with a vast network of connections whose error rates are unknown.

There are two basic types of statistical errors: type I errors, i.e., falsely claiming connections that do not actually exist; and type II errors, i.e., failing to detect connections that truly exist. Since real data are not free from noise, limited samples may appear to support the existence of a connection when it does not exist, or vice versa. It is therefore impossible to completely prevent both types of errors simultaneously; instead, one keeps a balance between them, for example by minimizing a loss function associated with the two types of errors according to Bayesian decision theory.

There are several criteria available for error-rate control (see Table 2). Generally, no single criterion is universally superior when the research scenario is not specified. Selecting an error rate is largely not the abstract question "which error rate is superior to the others?", but the practical question "which error rate is the researchers' concern?". One error-rate criterion may be favored in one scenario while another may be right in a different scenario, for example:


• We are screening candidate genes for association with a disease, hoping that at least 95% of the reported genes are truly associated with the disease. In this case, the FDR will be chosen as the error rate of interest and should be controlled under 5%.

• We are selecting electronic components to build a device. Any error in any component will cause the device to malfunction. To guarantee that the device functions well with a probability higher than 99%, the family-wise error rate should be controlled under 1%.

Since the scenario favoring the false discovery rate (FDR) (Benjamini & Yekutieli, 2001; Storey, 2002) is common in exploratory research, the FDR has become an important and widely used criterion in many fields, including the inference of brain connectivity. Simply controlling the type I and type II error rates at specified levels does not necessarily keep the FDR sufficiently low, especially in the case of large and sparse networks. For example, suppose a network includes 40 nodes, each interacting on average with 3 other nodes, i.e. there are 60 edges in the network. Then an algorithm with a *realized* type I error rate of 5% and a *realized* power of 90% (i.e. a *realized* type II error rate of 10%) will recover a network with 60 × 90% = 54 correct connections and [40 × (40 − 1)/2 − 60] × 5% = 36 false connections, which means that 36/(36 + 54) = 40% of the claimed connections do not exist in the true network.
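The arithmetic in this example generalizes to any network size and pair of realized error rates; a minimal sketch (the function name is ours):

```python
def realized_fdr(n_nodes, n_edges, type1, power):
    """Expected fraction of claimed edges that are false, for an undirected
    network recovered with the given realized type-I error rate and power."""
    n_pairs = n_nodes * (n_nodes - 1) // 2       # all possible undirected edges
    true_found = n_edges * power                 # correctly recovered edges
    false_found = (n_pairs - n_edges) * type1    # falsely claimed edges
    return false_found / (true_found + false_found)

# the 40-node, 60-edge example from the text:
print(realized_fdr(40, 60, type1=0.05, power=0.90))  # 0.4
```

The sparser the true network, the more candidate non-edges there are to generate false positives, which is why a seemingly small 5% type-I rate inflates the FDR.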

#### **2.2 Structure-learning methods with error controlled**

Score-based search methods (Heckerman et al., 1995) look for a suitable network structure by optimizing a certain criterion of goodness-of-fit, such as the Akaike information criterion (AIC) (Akaike, 1974), the Bayesian information criterion (BIC) (Schwarz, 1978), or the Bayesian Dirichlet likelihood-equivalent metric (BDE) (Heckerman et al., 1995), with a random walk (*e.g.* simulated annealing) or a greedy walk (*e.g.* hill-climbing). However, scores do not explicitly reflect the error rate of edges, and the sample sizes in real-world applications are usually not large enough to guarantee asymptotic performance.
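As a toy illustration of score-based selection, the BIC (in its log-likelihood-minus-penalty form) can be compared across two candidate structures; the numbers below are made up for illustration:

```python
import math

def bic_score(log_likelihood, n_params, n_samples):
    """Bayesian information criterion; higher is better in this convention."""
    return log_likelihood - 0.5 * n_params * math.log(n_samples)

# a denser graph fits the data slightly better but pays a larger
# complexity penalty, so the sparser structure wins here:
sparse = bic_score(log_likelihood=-1210.0, n_params=10, n_samples=200)
dense = bic_score(log_likelihood=-1205.0, n_params=25, n_samples=200)
print(sparse > dense)  # True
```

Note that the score says nothing about how many of the sparser graph's edges are spurious, which is exactly the limitation the text points out.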

| Test Results | Truth: Negative | Truth: Positive | Total |
|---|---|---|---|
| Negative | TN (true negative) | FN (false negative) | *R*<sub>1</sub> |
| Positive | FP (false positive) | TP (true positive) | *R*<sub>2</sub> |
| Total | *T*<sub>1</sub> | *T*<sub>2</sub> | |

Table 1. Results of multiple hypothesis testing, categorized according to the claimed results and the truth.

| Full Name | Abbrev. | Definition |
|---|---|---|
| False Discovery Rate (Benjamini & Yekutieli, 2001) | FDR | *E*(FP/*R*<sub>2</sub>)\* |
| Positive False Discovery Rate (Storey, 2002) | pFDR | *E*(FP/*R*<sub>2</sub> \| *R*<sub>2</sub> > 0) |
| Family-Wise Error Rate | FWER | *P*(FP ≥ 1) |
| Type I Error Rate (False Positive Rate) | *α* | *E*(FP/*T*<sub>1</sub>) |
| Specificity (True Negative Rate) | 1 − *α* | *E*(TN/*T*<sub>1</sub>) |
| Type II Error Rate (False Negative Rate) | *β* | *E*(FN/*T*<sub>2</sub>) |
| Power (Sensitivity, True Positive Rate) | 1 − *β* | *E*(TP/*T*<sub>2</sub>) |
| Positive Predictive Value | PPV | *E*(TP/*R*<sub>2</sub>) |

Table 2. Criteria for multiple hypothesis testing. Here *E*(*x*) means the expected value of *x*, and *P*(A) means the probability of event A. Please refer to Table 1 for related notations. \* If *R*<sub>2</sub> = 0, FP/*R*<sub>2</sub> is defined to be 0.

Both classical and Bayesian approaches are available for controlling errors during network learning (Listgarten & Heckerman, 2007). Classical approaches are based on the Markov property of graphical models, and treat error control as a problem of multiple testing. Since a graphical model is a graphical encoding of conditional-independence relationships, the non-adjacency of two nodes is tested by inspecting their conditional independence given other nodes. Conditional-independence relationships among the node variables are tested one by one in a certain order, and a p-value for the existence of each edge is estimated. Error-control procedures, such as the Bonferroni correction for the family-wise error rate, the Benjamini-Hochberg procedure for the FDR, or no correction for the per-test type-I error rate, are applied to the p-values to set the cut-off threshold for accepting or rejecting the existence of edges.
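As an illustration of that last step, the Benjamini-Hochberg step-up procedure over a set of edge p-values can be sketched as follows (a generic sketch, not the exact procedure of any cited paper; the p-values are made up):

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return the indices of hypotheses (candidate edges) rejected at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:  # step-up threshold rank*q/m
            k_max = rank                 # largest rank passing the threshold
    return sorted(order[:k_max])         # reject the k_max smallest p-values

# p-values for the presence of 6 candidate edges:
pvals = [0.001, 0.008, 0.039, 0.041, 0.27, 0.76]
print(benjamini_hochberg(pvals, q=0.05))  # [0, 1]
```

Swapping the threshold for `q / m` (Bonferroni) or for a fixed `q` (no correction) yields the other two procedures mentioned above.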

Recently, a series of papers has addressed the problem using the classical approach. Listgarten and Heckerman (Listgarten & Heckerman, 2007) proposed a permutation method to estimate the number of spurious connections in a graph learned from data. The basic idea is to repeatedly apply a structure-learning algorithm to data simulated from the null hypotheses via permutation. In principle, this method works with any structure-learning method, but permutation may make the already time-consuming structure-learning problem even more computationally expensive, limiting its practical usage. Kalisch and Bühlmann (Kalisch & Bühlmann, 2007) proved that for Gaussian Bayesian networks, by adaptively decreasing the type I error rate as the sample size approaches infinity, the PC algorithm (Spirtes et al., 2001) can, without errors, recover the equivalence class of the underlying sparse directed acyclic graphs, even if the number of nodes grows exponentially with the sample size. Tsamardinos and Brown (Tsamardinos & Brown, 2008) applied an FDR procedure separately to the edges related to each node. Li and Wang (Li & Wang, 2009) applied FDR-control procedures globally to all connections of interest, and proved that under mild conditions their method asymptotically controls the FDR of the "claimed" edges. They showed by empirical experiments that with moderate sample sizes (several hundred), the method is still able to control the FDR under the user-specified level.

Bayesian approaches control errors by inferring the posterior probability of edges given the data. If *G* is the learned graph and *Gi* is the true graph, then the spurious edges in *G* are those of *G* \ *Gi*, i.e. the sub-graph of *G* after the edges in *Gi* are removed. In this case, the *realized* FDR is |*G* \ *Gi*|/|*G*|, where |•| denotes the number of edges in a graph, and the *realized* type-I error rate is |*G* \ *Gi*|/|*G*<sub>full</sub> \ *Gi*|, where *G*<sub>full</sub> is the fully connected graph. Since Bayesian inference assigns a probability to each possible model, the error rate of *G* given data *D* should be integrated over all possible *Gi* according to their posterior probabilities (Listgarten & Heckerman, 2007). Therefore we have:

$$FDR(G|D) = \sum\_{G\_i} \frac{|G \backslash G\_i|}{|G|} P(G\_i|D),\tag{2}$$

where *P*(*Gi*|*D*) is the probability of a model structure *Gi* given data *D*. Similarly, the posterior type-I error rate is:

$$\alpha(G|D) = \sum\_{G\_i} \frac{|G \backslash G\_i|}{|G\_{full} \backslash G\_i|} P(G\_i|D). \tag{3}$$
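Given a set of posterior graph samples with their probabilities, Eqs. (2) and (3) are straightforward weighted averages; a sketch in which graphs are represented as sets of undirected edges (the representation and names are ours):

```python
from itertools import combinations

def posterior_fdr(G, samples):
    """Eq. (2): expected fraction of edges of G absent from the true graph,
    averaged over posterior samples (G_i, P(G_i|D))."""
    return sum(p * len(G - Gi) / len(G) for Gi, p in samples)

def posterior_type1(G, samples, nodes):
    """Eq. (3): spurious edges of G over the non-edges of each G_i."""
    full = {frozenset(e) for e in combinations(nodes, 2)}  # fully connected graph
    return sum(p * len(G - Gi) / len(full - Gi) for Gi, p in samples)

nodes = "ABC"
e = lambda a, b: frozenset((a, b))
G = {e("A", "B"), e("B", "C")}
# two posterior samples of the true graph with probabilities 0.7 and 0.3:
samples = [({e("A", "B")}, 0.7), ({e("A", "B"), e("B", "C")}, 0.3)]
print(posterior_fdr(G, samples))  # 0.7*(1/2) + 0.3*0 = 0.35
```

In practice the samples would come from MCMC or exact enumeration over small graphs, as discussed next.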

As in many other Bayesian procedures, the most difficult part of the inference is not the formulation, but rather the calculation, and especially the integration. Because the number of possible graphs increases super-exponentially as the number of nodes increases (Steinsky, 2003), it is impractical to enumerate all the possibilities and sum them up. For certain prior distributions, given the order of nodes, Friedman and Koller (Friedman & Koller, 2003) in

2003 derived a formula that can calculate the exact posterior probability of a structural feature with computational complexity bounded by *O*(*N*<sup>*D*<sub>in</sub>+1</sup>), where *N* is the number of nodes and *D*<sub>in</sub> is the upper bound of the node in-degrees. Considering similar prior distributions, but without the restriction on the order of nodes, Koivisto and Sood (Koivisto & Sood, 2004) developed a fast exact Bayesian inference algorithm based on dynamic programming that computes the exact posterior probability of a sub-network with computational complexity bounded by approximately *O*(*N*2<sup>*N*</sup>). In practice, this algorithm runs fairly fast when the number of nodes is less than 25; for networks with more than about 30 vertices, the authors suggested setting more restrictions or combining it with inexact techniques. For general situations, the posterior probability of a structural feature can be estimated with Markov chain Monte Carlo (MCMC) methods (Madigan et al., 1995). As a versatile implementation of Bayesian inference, MCMC can estimate the posterior probability under any prior probability distribution. However, MCMC usually requires intensive computation, and the results may depend on the initial state.

In Listgarten and Heckerman's simulation (Listgarten & Heckerman, 2007), the error-control curves of the Bayesian approach were smoother and more favorable than those of the classical approach, as shown in Fig. 6-A and Fig. 6-B. However, it was also pointed out that in practice the expected FDR of interest is usually very small, within a narrow range near 0, and that the classical approach showed reasonable performance in this range. (The axes of Fig. 6-A and Fig. 6-B are marked with the positive predictive value (PPV) instead of the FDR; the relationship between the two is *PPV* = 1 − *FDR*, as in Table 2.) In Li and Wang's simulation (Li & Wang, 2009), aiming to control the FDR at the conventional level of 5%, their classical approaches, the PCfdr algorithm and its heuristic modification, the PCfdr\* algorithm, controlled the FDR satisfactorily around the expected level of 5%, as shown in Fig. 6-C. For inferring brain connectivity, since brain regions are not just algebraically isolated variables, but rather are located in a three-dimensional space with complex geometric structure, it may be important in the future to exploit such geometric information to improve error control.

#### **3. Group analysis**

Biomedical experiments are usually conducted to verify or discover knowledge about a population characterized by health or by a certain disease state. However, subjects classified into the same group can still be highly diverse with respect to factors such as gender, age, or race. With careful experimental design, the effect of these confounding factors can be reduced, but inter-subject variability still plays an important role and remains a challenge. Even studies on a single subject may face challenges related to variability. For example, EEG recordings conducted at different times from the same subject can be affected by the subject's physical or psychological state, such as drowsiness or stress. Thus in this paper, the term "group analysis" is not restricted to the analysis of a group of people, but is generalized to inference that integrates information distributed across separate experiments and affected by cross-experiment variability.

#### **3.1 Commonality and diversity at different levels**

Two basic concepts in group analysis are *commonality* and *diversity*. For example, all doctors learn professional knowledge related to medicine, but a doctor could be a pediatrician, a surgeon or a physician; each has their own speciality, and this is the diversity among them. Commonality and diversity usually co-exist, and are revealed at different levels, depending on the perspective and scale at which we study the problem.


Fig. 6. Simulation results of FDR control with the Bayesian and classical approaches. Sub-figures A and B are results from (Listgarten & Heckerman, 2007). Their x-axes and y-axes are the expected and realized positive predictive values (PPV), respectively. (The relationship between PPV and FDR is PPV = 1 − FDR.) The curves of the Bayesian approach were smoother and more favorable than those of the classical approach. When the expected PPV is high, or equivalently the expected FDR is low, the classical approach performed reasonably well. Sub-figure C is the result from (Li & Wang, 2009) of controlling the FDR at the conventional level of 5% with the classical approach. The x-axis is the strength of the conditional-dependence relationships among node variables, and the y-axis is the realized FDR. The PCfdr algorithm controlled the FDR under 5%, and its heuristic modification, the PCfdr\* algorithm, controlled the FDR satisfactorily around 5%. For details of the two simulation studies, please refer to (Listgarten & Heckerman, 2007) and (Li & Wang, 2009).

Fig. 7. Three broad categories of group-analysis methods. The "virtual typical subject" approach constructs a typical subject to represent the whole group, usually by pooling or averaging the data of subjects. The "common-structure" approach imposes the same network structure on the model of every subject, and usually uses mixed-effect models to handle the parameter variability among subjects. The "individual-model" approach allows each individual subject to have its own model, and integrates the individual models with a group-level model.

Since graphical models combine network structures with probability descriptions, group analysis also needs to accommodate commonality and diversity in both the model structures and the probability parameters. A review of the literature shows that current group-analysis methods based on graphical models can be classified into three broad categories (see Fig. 7), as discussed below (Li et al., 2008).

First, we could ignore subject diversity and assume that the brains of all subjects are structured and function in a similar way, as if there were a virtual typical subject able to satisfactorily represent the whole group. This can be called the "virtual typical subject" approach. In this approach, the model for every subject has the same structure, and the data of all subjects are usually pooled or averaged to fit this single model.

This property allows the group model to be updated incrementally, without re-learning the individual models when new data are added. Niculescu-Mizil and Caruana proposed a heuristic method to link individual models (Niculescu-Mizil & Caruana, 2007): network structures are allowed to differ across subjects, and excessive diversity among the structures is penalized with a tunable parameter.


A trade-off between the two extremes is to assume that subjects' brains are structured similarly, but function with considerable differences. This can be referred to as the "common-structure" approach (Mechelli et al., 2002). In this approach, the model of every subject shares the same network structure, but the parameters can differ from subject to subject. Once the common model structure is specified, this approach focuses on dealing with the model parameters across subjects. The standard method is the mixed-effect model (Mumford & Nichols, 2006), as follows. Consider an experiment with *n* subjects where, for each subject, indexed by *k*, a regression parameter *β*<sub>k</sub> models the relationship between a response variable *Y*<sub>k</sub> and explanatory variables *X*<sub>k</sub>. The first-level model for each individual subject is

$$Y\_k = X\_k \beta\_k + \epsilon\_k, \quad \text{for } k = 1, 2, \cdots, n, \tag{4}$$

where *ε*<sub>k</sub> is the within-subject randomness following the Gaussian distribution *N*(0, *V*<sub>k</sub>). The second-level model for the group parameters is

$$\begin{bmatrix} \beta\_1 \\ \vdots \\ \beta\_n \end{bmatrix} = X\_g \beta\_g + \epsilon\_g, \tag{5}$$

where *β*<sub>g</sub> is the group-level parameter, *X*<sub>g</sub> is the group design matrix, and *ε*<sub>g</sub> is the cross-subject randomness following the Gaussian distribution *N*(0, *V*<sub>g</sub>). The combination of Eqs. (4) and (5) is called a mixed model, because both the within-subject and the cross-subject randomness are considered. A notable issue in mixed-effect models is that the estimate of the cross-subject variance *V*<sub>g</sub> can be negative under maximum-likelihood estimation. To avoid this undesirable and counter-intuitive phenomenon, the random-effect variance is usually constrained to be positive. In practice, the mixed-effect model is usually solved with the summary-statistics approach, which reformulates the model as Eqs. (6) and (7):

$$\hat{\beta}\_k = (X\_k^T V\_k^{-1} X\_k)^{-1} X\_k^T V\_k^{-1} Y\_k, \tag{6}$$

$$\hat{\beta} = \begin{bmatrix} \hat{\beta}\_1 \\ \vdots \\ \hat{\beta}\_n \end{bmatrix} = X\_g \beta\_g + \epsilon\_g + \hat{\beta} - \beta = X\_g \beta\_g + \epsilon^{\ast}\_g, \tag{7}$$

where *β̂*<sub>k</sub> is the least-squares estimate of the parameter for each subject, *β* denotes the stacked *β*<sub>k</sub>s, and *ε*<sup>\*</sup><sub>g</sub> = *ε*<sub>g</sub> + *β̂* − *β* follows *N*(0, *V*<sup>\*</sup><sub>g</sub>). Given *V*<sup>\*</sup><sub>g</sub> (the variance of *ε*<sup>\*</sup><sub>g</sub>), *β*<sub>g</sub> can be solved from the estimates *β̂*<sub>k</sub>, without requiring the underlying *β*<sub>k</sub>s. The summary-statistics approach decomposes a complicated model into two relatively easy stages, and retains the estimation for each single subject even when new subjects are added into the analysis. The summary-statistics approach assumes that *V*<sup>\*</sup><sub>g</sub> is known, but in practice it must be estimated from the data. The estimation, including both its value and its degrees of freedom, is challenging and has attracted much research attention. Methods such as Restricted Maximum Likelihood (Harville, 1977), Smoothing with

Fig. 8. Product of two Gaussian distributions. Thin lines are the contour of two bivariate Gaussian distributions, and the bold lines are the contour of their product. If the parameter likelihood for the data of two subjects is the two bivariate Gaussian distributions, then the parameter likelihood for the pooled data is the product distribution. In sub-figures B and C, the center of the product distribution is located off the x-axis, while neither of the two bivariate Gaussian distributions is centred off the x-axis. This is an undesirable and misleading phenomenon for data pooling. PPC is the abbreviation for partial correlation coefficient; *mx* is the mean of x, and *my* is that of y. This figure is from (Kasess et al., 2010).

same parameters as well. The "virtual-typical" subject is usually constructed by pooling or averaging the group data, and then one model is learned from these data. Technically, this reduces group analysis to learning a model for a single subject, for which both classical (Heckerman et al., 1995) and Bayesian (Neumann & Lohmann, 2003) approaches have been developed. When the group is homogeneous, or inter-subject variability follows certain regular distributions, this approach can increase detection sensitivity, because pooling builds a relatively large data set and averaging enhances the signal-to-noise ratio (Kasess et al., 2010). However, when the group becomes more heterogeneous, this approach can lead to undesirable and misleading results (Kasess et al., 2010). Fig. 8 shows that, by pooling data together, the group estimate of a connection parameter can be located far away from the center of the individual estimates.

The other extreme is to assume that subjects can be completely different from each other – the "individual-model" approach. In this approach, the model of each subject can be completely different. This approach is related to the concept of functional degeneracy, i.e., "the ability of elements that are structurally different to perform the same function or yield the same output" (Edelman & Gally, 2001), or more plainly, "there are multiple ways of completing the same task" (Price & Friston, 2002). Because subjects in the same group are considered to share similarity, their individual models must still be linked together in a certain way, usually by a second-level group model built over the individual models.

The most straightforward implementation of the "individual-model" approach is to directly input separately learned individual models as subject features into a second-level analysis. For example, structural features of the networks of individual models can be selected with classification and cross-validation procedures at the group level, as applied in (Li et al., 2008). A theoretically elegant alternative is to build a group-level model that describes the diversity distribution within a group, so that the group-level and individual-level models together form one large model over the group data. Usually this integrated model must be learned from the whole batch of group data, which requires intensive computation. The Bayesian group model proposed by Stephan et al. (Stephan et al., 2009) provides a rigorous theoretical background and is able to break the model learning into two separate stages, the individual and group stages.


This property allows the group model to be updated incrementally, without re-learning individual models, when new data are added. Niculescu-Mizil and Caruana proposed a heuristic method to link individual models (Niculescu-Mizil & Caruana, 2007). They allow network structures to differ across subjects, and penalize excessive diversity among the structures with a tunable parameter.

A trade-off between the two extremes is to assume that subjects' brains are structured similarly, but function with considerable differences. This can be referred to as the "common-structure" approach (Mechelli et al., 2002). In this approach, the model of every subject shares the same network structure, but the parameters can differ from subject to subject. Once the common model structure is specified, this approach focuses on handling the model parameters across subjects. The standard method is the mixed-effect model (Mumford & Nichols, 2006), as follows. Consider an experiment with *n* subjects where, for each subject indexed by *k*, a regression parameter *β<sub>k</sub>* models the relationship between a response variable *Y<sub>k</sub>* and explanatory variables *X<sub>k</sub>*. The first-level model for each individual subject is

$$\mathbf{Y}_k = \mathbf{X}_k \boldsymbol{\beta}_k + \boldsymbol{\varepsilon}_k, \quad k = 1, 2, \cdots, n, \tag{4}$$

where *ε<sub>k</sub>* is the within-subject randomness, following the Gaussian distribution *N*(0, *V<sub>k</sub>*). The second-level model for the group parameters is

$$\boldsymbol{\beta} = \begin{bmatrix} \boldsymbol{\beta}_1 \\ \vdots \\ \boldsymbol{\beta}_n \end{bmatrix} = \mathbf{X}_g \boldsymbol{\beta}_g + \boldsymbol{\varepsilon}_g, \tag{5}$$

where *β<sub>g</sub>* is the group-level parameter, *X<sub>g</sub>* is the group design matrix, and *ε<sub>g</sub>* is the cross-subject randomness following the Gaussian distribution *N*(0, *V<sub>g</sub>*). The combination of Eqs. (4) and (5) is called a mixed model, because both the within-subject and the cross-subject randomness are included. A notable issue in mixed-effect models is that the estimate of the cross-subject variance *V<sub>g</sub>* can be negative under maximum likelihood estimation. To avoid this undesirable and counter-intuitive phenomenon, the random-effect variance is usually constrained to be positive. In practice, the mixed-effect model is usually solved with the summary-statistics approach, which reformulates the model as Eqs. (6) and (7):

$$\hat{\boldsymbol{\beta}}_k = (\mathbf{X}_k^T V_k^{-1} \mathbf{X}_k)^{-1} \mathbf{X}_k^T V_k^{-1} \mathbf{Y}_k, \tag{6}$$

$$\hat{\boldsymbol{\beta}} = \begin{bmatrix} \hat{\boldsymbol{\beta}}_1 \\ \vdots \\ \hat{\boldsymbol{\beta}}_n \end{bmatrix} = \mathbf{X}_g \boldsymbol{\beta}_g + \boldsymbol{\varepsilon}_g + \hat{\boldsymbol{\beta}} - \boldsymbol{\beta} = \mathbf{X}_g \boldsymbol{\beta}_g + \boldsymbol{\varepsilon}_g^*, \tag{7}$$

where *β̂<sub>k</sub>* is the least-squares estimate of the parameter for each subject, and *ε<sub>g</sub>*<sup>∗</sup> = *ε<sub>g</sub>* + *β̂* − *β* follows *N*(0, *V<sub>g</sub>*<sup>∗</sup>). Given *V<sub>g</sub>*<sup>∗</sup> (the variance of *ε<sub>g</sub>*<sup>∗</sup>), *β<sub>g</sub>* can be solved from the estimates *β̂<sub>k</sub>*, not necessarily from the *β<sub>k</sub>*s themselves. The summary-statistics approach decomposes a complicated model into two relatively easy stages, and retains the estimate for each single subject even when new subjects are added to the analysis. The summary-statistics approach assumes that *V<sub>g</sub>*<sup>∗</sup> is known, but in practice it must be estimated from the data. This estimation, including both its value and its degrees of freedom, is challenging and has attracted much research attention. Methods such as Restricted Maximum Likelihood (Harville, 1977), smoothing with the fixed-effect model (Worsley et al., 2002), and Markov Chain Monte Carlo (Woolrich et al., 2004) have been developed.
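The two-stage summary-statistics estimation of Eqs. (4)–(7) can be sketched with a small simulation. This is an illustrative NumPy sketch, not any particular package's implementation; the values of *β<sub>g</sub>*, *V<sub>g</sub>*, and the noise scale are made up, and the group design *X<sub>g</sub>* is taken as a stack of identities, so the second stage reduces to averaging the subject-level estimates:

```python
import numpy as np

rng = np.random.default_rng(0)
n_subj, T = 20, 200
beta_g = np.array([1.5, -0.8])      # true group-level parameters (illustrative values)
V_g, sigma = 0.1, 1.0               # cross-subject variance and within-subject noise

beta_hat = []
for _ in range(n_subj):
    beta_k = beta_g + rng.normal(0.0, np.sqrt(V_g), size=2)   # Eq. (5): subject effect
    X = rng.normal(size=(T, 2))
    Y = X @ beta_k + rng.normal(0.0, sigma, size=T)           # Eq. (4): first level
    # Eq. (6): with V_k = sigma^2 * I, the GLS estimate reduces to ordinary least squares.
    beta_hat.append(np.linalg.lstsq(X, Y, rcond=None)[0])
beta_hat = np.asarray(beta_hat)

# Eq. (7): with X_g stacking identical designs and a shared V_g*, the group
# estimate is the equally weighted mean of the subject-level estimates.
beta_g_hat = beta_hat.mean(axis=0)
print(beta_g_hat)   # close to the true group parameters
```

In a real analysis the per-subject weights would come from the estimated *V<sub>g</sub>*<sup>∗</sup> rather than being equal.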

Though the aforementioned methods handle group commonality and diversity with various techniques, most adopt, or can be viewed within, a two-level framework: a lower level of models for each individual subject, and a group level that integrates the individual models and describes inter-subject commonality and diversity. The group level can enforce strong commonality, like the "virtual-typical" approach, or model group diversity probabilistically, like the "individual-model" approach in (Stephan et al., 2009). Models able to technically decouple the computation of the individual and the group levels are favoured.

#### **3.2 Desirable features of group-analysis methods**

To set clear goals for the future development of more advanced group-analysis methods, we suggest three highly desirable features: being modular, being incrementally updatable, and being scalable. Being modular means that a group-analysis method is not designed for only one particular type of single-subject model, but is versatile and applicable to different types of single-subject models. For example, both Bayesian networks and structural equation models are applicable at the subject level, so the group-analysis method should not be restricted to only one of them, but should be able to work with both, though not necessarily with a mixture of them. If the group-level model only needs inputs such as the likelihoods of individual models, then it is free from the specific format of the individual models. If a group-analysis model can serve as a module of itself, then it can handle multi-level hierarchical group structures.

Being incrementally updatable means that group-inference results can be summarized as summary statistics and used for further analysis involving newly collected data. This feature is very useful in research practice because experimental data are usually collected incrementally. For example, after a study of eighty subjects half a year ago, twenty more subjects might be recruited. Re-analyzing the entire data set of one hundred subjects may require cumbersome computation; however, if the group inference is incrementally updatable, including the additional twenty subjects may need much less computation.
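The incremental-updating idea can be illustrated with sufficient statistics: if every group-level quantity is a function of a few running sums, absorbing new subjects only touches those sums, never the old raw data. A hypothetical sketch (the statistics tracked and the numbers are illustrative):

```python
# Keep per-group sufficient statistics (count, sum, sum of squares) so the
# group mean and variance can absorb new subjects without reprocessing old ones.
def update(stats, new_values):
    n, s, ss = stats
    for x in new_values:
        n += 1
        s += x
        ss += x * x
    return n, s, ss

def summarize(stats):
    n, s, ss = stats
    mean = s / n
    var = (ss - n * mean * mean) / (n - 1)   # unbiased sample variance
    return mean, var

first = update((0, 0.0, 0.0), [1.0, 2.0, 3.0, 4.0])   # initial cohort
both = update(first, [5.0, 6.0])                      # newly recruited subjects
print(summarize(both))  # → (3.5, 3.5)
```

The same pattern underlies the summary-statistics approach of Section 3.1: the subject-level estimates are exactly the statistics that make the group stage updatable.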

Being scalable means that a group-analysis method can handle fast-growing diversity among subjects. Because modern exploratory research usually involves the investigation of a large number of candidate models, scalability has become a highly desirable feature for group analysis. For example, if the connectivity between ten brain regions is studied with Bayesian networks, then a group-analysis method should be able to handle the diversity of about 3.1 × 10<sup>17</sup> (Steinsky, 2003) possible network structures.
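The super-exponential growth of the candidate space can be checked with Robinson's recurrence for the number of labeled DAGs on *n* nodes, a closely related count (the figure cited above from Steinsky is obtained under a different counting convention, so the two numbers need not coincide). A minimal sketch:

```python
from math import comb
from functools import lru_cache

@lru_cache(maxsize=None)
def dag_count(n):
    # Robinson's recurrence:
    # a(n) = sum_{k=1..n} (-1)^(k+1) * C(n, k) * 2^(k(n-k)) * a(n-k), a(0) = 1,
    # where k counts the nodes with no incoming edge.
    if n == 0:
        return 1
    return sum((-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * dag_count(n - k)
               for k in range(1, n + 1))

print(dag_count(10))  # about 4.2e18 labeled DAGs on ten nodes
```

Either way, the count grows far too fast for exhaustive enumeration beyond a handful of nodes, which is the point being made about scalability.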

#### **4. Network analysis**

Modelling is only the first step in investigating a system, after which human-understandable information should be further extracted from the models to provide insight. For example, as final readers of a report on brain connectivity, neurologists might be interested in questions such as "which brain regions play the central role in conducting a functional task?", "in what patterns are cognitive functions segregated and integrated among brain regions?" or "how does this brain connectivity network react to the presence of a disease?" Simply reporting a vast and plain network without any highlights does not answer these questions. Graphical models notably have visualized network structures, so it is natural to analyze these structures as an important post-processing step in their applications to brain connectivity, as discussed in (Bullmore & Sporns, 2009; Stam & Reijneveld, 2007).

The history of graph theory can be traced back nearly three hundred years, marked by the preeminent Swiss mathematician Euler's paper on the Seven Bridges of Königsberg. Its application to real-world complex networks was boosted at the end of the last century by a series of discoveries on the architecture of the world-wide web, social networks, cellular networks, etc. These systems, despite their tremendous variety, share certain common properties, such as the "small-world" (Watts & Strogatz, 1998), "scale-free" (Barabasi & Albert, 1999) and "self-similarity" (Song et al., 2005) properties. These properties might hint at how these networks evolve and grow, and are also related to their functions and interactions with the environment (Watts & Strogatz, 1998).

Some well-known properties, such as the "scale-free" or "self-similarity" properties, are more suitable for large networks than for networks of moderate size (with dozens of nodes), because their statistics need large-scale observations. However, the number of time points of an fMRI scan is relatively small, approximately several hundred, and cannot reliably support the discovery of large-scale networks. Therefore, in this section, we focus on network analysis suitable for brain connectivity networks of moderate size.

#### **4.1 Network measures**



Graphs can be studied at different levels: basic nodes and edges, paths, or, more intricately, sub-graphs. According to the object of interest, network measures can be broadly classified into two categories: (1) local measures that focus on local objects in the network, for example a node, an edge, or a sub-graph, and (2) global measures that characterize the pattern of the overall architecture. Local measures usually, yet not necessarily, put a local object in a global view. For example, the importance of a node could be defined as the proportion of communication in the whole network that must pass through it. Conversely, global measures are usually built on local features. For example, the "scale-free" property concerns the distribution of node degrees. Fig. 9 illustrates the network measures listed below. Most network measures are ultimately linked to fundamental concepts such as node degree and path length.



Fig. 9. Network measures suitable for graphs of moderate size. A central node, such as the red nodes in sub-figure (a), plays an important role in network communication. Densely clustered nodes form modules, such as those circled by dotted lines in sub-figure (a). The blue edge in sub-figure (b) is a possible compensatory edge to restore the connectivity impaired by deactivation of the central node (thick black). Sub-figure (c) shows the possible connectivity patterns of node triples. Connectivity patterns that appear significantly more frequently than those in random graphs are called "network motifs".

• **Centrality and local contribution to network communication.** A local object, for instance a node or an edge, that plays an important role in network communication is considered to be central in the network. The centrality of a node can be measured by its relay of the communication between other nodes. For example, betweenness centrality (Freeman, 1977) is based on the number of shortest paths between other nodes passing through a node. It can also be assessed by the geodesic distance to other nodes, as closeness centrality (Beauchamp, 1965) does, or by deleting a node and then comparing the connectivity loss of the "impaired" network, as Shapley ratings (Kötter et al., 2007) do. Similar ideas can be applied to define the centrality of an edge or a sub-graph. Some measures are not as intuitive as the aforementioned ones: for instance, eigenvector centrality (Bonacich, 1972) and sub-graph centrality (Estrada & Rodrguez-Velzquez, 2005) are also implementations of the same concept.

• **Modularity and brain function organization.** It is believed that various cognitive functions are localized in different brain regions, and that these distributed functions are integrated for complicated information processing. Such a perspective on brain function organization naturally leads to a network structure where some nodes are densely clustered and form functional modules. The "small-world" property, at the global level, characterizes systems that are highly locally clustered, like regular lattices, but still have a small geodesic diameter, like random graphs (Watts & Strogatz, 1998). Modules can be detected by hierarchical clustering algorithms that group nodes from the most-linked pairs to the least-linked pairs, or by community-detection algorithms (Girvan & Newman, 2002) that draw module boundaries by breaking unimportant edges one by one.

• **Perturbation and compensation mechanisms.** It is believed that, as a dynamic system, the brain responds to impairment, such as that induced by disease, by recruiting other neural resources to compensate for the partially disabled function. This compensation mechanism is an important hypothesis of neuro-rehabilitation, and is also related to many neurological diseases. Network analysis of the compensation mechanism can take a perturbation-and-recovery approach: deleting a connection or deactivating a node, and then searching for the most efficient changes needed to restore the impaired connectivity. Such a mechanism could be developing a new connection to replace the deleted one, or increasing the functionality of the most central node of the "lesioned" network (Kötter et al., 2007).

• **Motif and connectivity pattern.** Inter-connected nodes are the building blocks of a big network, and the connectivity patterns among neighboring nodes characterize how information is processed at the local level. It has been found that in real-world networks certain patterns of inter-connections occur much more frequently than in random networks, and these "signature" local patterns are called network "motifs" (Milo et al., 2002; Sporns & Kötter, 2004). Another network measure related to the "small-world" property is the clustering coefficient, which is a function of the counts of pattern C and pattern D in Fig. 9-(c).
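As a toy illustration of betweenness-style centrality, the following hand-rolled sketch counts, for each node, how many shortest paths between other node pairs pass through it. The graph and counting scheme are illustrative: the example is a tree, where the shortest path between any two nodes is unique, which keeps the bookkeeping simple.

```python
from collections import deque

def shortest_path(adj, s, t):
    # BFS from s; in a tree the shortest path to t is unique.
    prev, queue = {s: None}, deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in prev:
                prev[v] = u
                queue.append(v)
    path, u = [], t
    while u is not None:
        path.append(u)
        u = prev[u]
    return path[::-1]

def betweenness(adj):
    # For every unordered pair (s, t), credit the interior nodes of their shortest path.
    nodes = list(adj)
    score = {v: 0 for v in nodes}
    for i, s in enumerate(nodes):
        for t in nodes[i + 1:]:
            for v in shortest_path(adj, s, t)[1:-1]:
                score[v] += 1
    return score

# Path graph A-B-C-D-E: the middle node relays the most communication.
adj = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
print(betweenness(adj))  # → {'A': 0, 'B': 3, 'C': 4, 'D': 3, 'E': 0}
```

For graphs with multiple shortest paths, Freeman's definition apportions credit among them; efficient algorithms such as Brandes' are used in practice.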


Fig. 10. An example of the discriminability of different centrality measures. With the betweenness, the closeness and the eigenvector centrality, all the nodes are identical, while with the sub-graph centrality, the nodes are distinguishable, with score 45.696 for blue nodes, and 45.651 for green nodes. This example is from (Estrada & Rodrguez-Velzquez, 2005).

#### **4.2 Discriminability of network measures**


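To make the pattern-counting idea behind the clustering coefficient concrete, here is a minimal sketch: the local clustering coefficient of a node is the fraction of its neighbor pairs that are themselves connected (closed triples versus all possible triples). The adjacency list and function names are our own illustration, not from any fMRI toolbox.

```python
from itertools import combinations

def local_clustering(adj, v):
    """Fraction of pairs of neighbors of v that are connected to each other."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0  # fewer than two neighbors: no triples to close
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2 * links / (k * (k - 1))

def average_clustering(adj):
    """Mean local clustering coefficient over all nodes."""
    return sum(local_clustering(adj, v) for v in adj) / len(adj)

# Toy undirected graph: triangle 0-1-2 plus a pendant node 3 attached to 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
```

On this toy graph, nodes 1 and 2 sit inside the closed triangle (coefficient 1), while node 0 also touches the pendant node, lowering its coefficient to 1/3.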
When using network measures to quantitatively reveal differences with respect to a certain network concept, we naturally expect that the measure can sensitively detect even subtle differences. Since various calculation methods can be proposed to quantify the same network concept, this raises the question of the discriminability of these methods. For example, to measure the centrality of a node, there are options such as betweenness (Freeman, 1977), closeness (Beauchamp, 1965), subgraph centrality (Estrada & Rodríguez-Velázquez, 2005), and eigenvector centrality (Bonacich, 1972). As Fig. 10 shows, certain differences can only be detected by some of these measures.
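As a toy illustration of how centrality measures differ in discriminability, consider a five-node path graph: nodes 1, 2, and 3 all have the same degree, yet closeness centrality separates the middle node from the other two. The graph and function names below are our own sketch, not part of any package mentioned in the text; the closeness implementation assumes a connected graph.

```python
from collections import deque

def degree_centrality(adj):
    """Raw degree (number of neighbors) per node."""
    return {v: len(nbrs) for v, nbrs in adj.items()}

def closeness_centrality(adj):
    """(n-1) / sum of shortest-path distances, via BFS; assumes connectivity."""
    n = len(adj)
    result = {}
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        result[src] = (n - 1) / sum(dist[v] for v in adj if v != src)
    return result

# Path graph 0-1-2-3-4.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
deg = degree_centrality(adj)
clo = closeness_centrality(adj)
```

Here `deg[1] == deg[2] == deg[3] == 2`, so degree cannot tell the three interior nodes apart, whereas closeness gives node 2 the value 2/3 and nodes 1 and 3 the smaller value 4/7.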

This situation motivates the future development of theoretical criteria for rigorously comparing network measures and guiding their design, beyond evaluating them empirically. In principle, a single real number can uniquely represent a binary graph, achieving perfect discriminability: for instance, the entries of the adjacency matrix can be concatenated into the binary expansion of a rational number. Though this simple mapping may not be meaningfully related to any network concept, it at least shows that a single index can distinguish all graphs.
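The adjacency-matrix-to-number mapping can be sketched in a few lines. This is only an illustration of the argument above (it distinguishes *labeled* graphs; isomorphic graphs with different node labelings map to different numbers), and the function name is our own.

```python
def graph_to_number(adj_matrix):
    """Read the upper-triangular adjacency entries of an undirected graph
    as the binary expansion of a rational number in [0, 1)."""
    n = len(adj_matrix)
    bits = [adj_matrix[i][j] for i in range(n) for j in range(i + 1, n)]
    return sum(bit / 2 ** (k + 1) for k, bit in enumerate(bits))

# Two different 3-node graphs map to two different numbers.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]       # edges 0-1, 1-2
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]   # edges 0-1, 0-2, 1-2
```

For example, the path encodes the bit string 101 (entries (0,1), (0,2), (1,2)), i.e. the number 0.625, while the triangle encodes 111, i.e. 0.875.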

For network measures of a graph, one available criterion is the mutual information between the measure and the "isomorphism" class of the graphs (Corneil & Gotlieb, 1970). Two graphs *G*<sub>1</sub> and *G*<sub>2</sub> are isomorphic if and only if there is a one-to-one mapping *f* between the nodes of *G*<sub>1</sub> and *G*<sub>2</sub> such that for every adjacent node pair *a* and *b* in *G*<sub>1</sub>, their images in *G*<sub>2</sub>, i.e. *f*(*a*) and *f*(*b*), are also adjacent, and vice versa. Similarly, for network measures defined on a node of a graph, discriminability can be quantified by the mutual information between the measure and the "isomorphism" class of the nodes. Two nodes *a* and *b* in a graph *G* are isomorphic if and only if there is a permutation *p* of the nodes of *G* such that *p*(*a*) = *b* and *p*(*b*) = *a*, and for every adjacent node pair *c* and *d* (which may include *a* or *b*), *p*(*c*) and *p*(*d*) are also adjacent. It is of great theoretical and practical importance to further pursue criteria for rigorously comparing the discriminability of network measures.
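For very small graphs, the node isomorphism classes (orbits) and the mutual information between a measure and those classes can be computed by brute force. The following sketch, with degree as the node measure and nodes drawn uniformly, is our own illustration of the criterion, not an algorithm from the cited work; it enumerates all node permutations and is only feasible for tiny graphs.

```python
from collections import Counter
from itertools import permutations
from math import log2

def automorphism_orbits(adj):
    """Group nodes into isomorphism classes: a and b share an orbit when
    some adjacency-preserving permutation maps a to b. Brute force."""
    nodes = sorted(adj)
    edges = {frozenset((u, v)) for u in adj for v in adj[u]}
    images = {v: {v} for v in nodes}
    for perm in permutations(nodes):
        p = dict(zip(nodes, perm))
        if all(frozenset((p[u], p[v])) in edges for u, v in map(tuple, edges)):
            for v in nodes:
                images[v].add(p[v])
    return {v: frozenset(img) for v, img in images.items()}

def mutual_information(x, y):
    """MI (in bits) between two labelings of the same node set,
    with the node drawn uniformly at random."""
    n = len(x)
    px, py = Counter(x.values()), Counter(y.values())
    pxy = Counter((x[v], y[v]) for v in x)
    return sum((c / n) * log2(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())

# Path graph 0-1-2-3-4: the orbits are {0, 4}, {1, 3}, and {2}.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
orbits = automorphism_orbits(adj)
degree = {v: len(adj[v]) for v in adj}
mi = mutual_information(degree, orbits)
```

On this path graph, MI between degree and the orbits is about 0.97 bits, below the orbit entropy of about 1.52 bits: degree merges the orbits {1, 3} and {2}, so it does not fully discriminate the node isomorphism classes.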

• **FMRIB Software Library (FSL)**: Developed mainly by the FMRIB Analysis Group at the University of Oxford.
Brief description: FSL is a comprehensive library of analysis tools for fMRI, MRI and DTI brain imaging data.
Website: http://www.fmrib.ox.ac.uk/fsl

• **MRIcro**: Developed by Professor Chris Rorden's group at the University of South Carolina.
Brief description: MRIcro allows efficient viewing and exporting of brain images. It can create Analyze format headers for exporting brain images to other platforms, such as SPM. In addition, it allows neuropsychologists to identify regions of interest (ROIs).
Website: http://www.sph.sc.edu/comd/rorden

• **FreeSurfer**: Developed by the Athinoula A. Martinos Center for Biomedical Imaging.
Brief description: FreeSurfer is a set of automated tools for reconstruction of the brain cortical surface from structural MRI data, and for overlay of functional MRI data onto the reconstructed surface.
Website: http://surfer.nmr.mgh.harvard.edu

• **Brain Connectivity Toolbox (BCT)**: Developed mainly by the Computational Cognitive Neuroscience Laboratory at Indiana University.
Brief description: This toolbox provides access to a large selection of complex network measures in Matlab. Such measures aim to characterize brain connectivity by neuro-biologically meaningful statistics, and are used in the description of structural and functional connectivity datasets.
Website: http://www.brain-connectivity-toolbox.net

There are normally fMRI datasets associated with the above software. Here we also briefly mention a few publicly available fMRI databases. Details of the experiments, designs and data content are available at the associated website links.

• **fMRI Data Center (fMRIDC)**: Funded by the National Science Foundation, the W. M. Keck Foundation.
Website: http://www.fmridc.org

• **Biomedical Informatics Research Network (BIRN) Data Repository**
Website: http://nbirn.net/bdr

• **The Neuroimaging Informatics Tools and Resources Clearinghouse (NITRC)**: Funded by the National Institutes of Health Blueprint for Neuroscience Research.
Website: http://www.nitrc.org

**7. References**

Akaike, H. (1974). A new look at the statistical model identification, *IEEE Transactions on Automatic Control* 19(6): 716–723.

Ali, R., Richardson, T. & Spirtes, P. (2009). Markov equivalence for ancestral graphs, *The Annals of Statistics* 37(5B): 2808–2837.

Barabasi, A.-L. & Albert, R. (1999). Emergence of Scaling in Random Networks, *Science* 286(5439): 509–512.

Bassett, D. S. & Bullmore, E. (2006). Small-World Brain Networks, *Neuroscientist* 12(6): 512–523.

Beauchamp, M. A. (1965). An improved index of centrality, *Behavioral Science* 10(2): 161–163.
