The purpose of this chapter is to present a review of existing segmentation techniques and the work we have done in the segmentation of brain MR images. The rest of this chapter is organized as follows: In Section 2, existing techniques for human cerebral cortical segmentation and their applications are reviewed. In Section 3, a new non-homogeneous Markov random field model based on fuzzy membership is proposed for brain MR image segmentation. In Section 4, image pre-processing steps, such as de-noising, the correction of intensity inhomogeneity and the estimation of the partial volume effect, are summarized. In Section 5, the conclusion of this chapter is given.

**2. Image segmentation methods**

A wide variety of segmentation techniques have been reviewed in (Balafar et al., 2010; Bankman, 2000; Bezdek et al., 1993; Clarke et al., 1995; Dubey et al., 2010; Pal & Pal, 1993; Pham et al., 2000; Saeed, 1998; Suri, Singh, et al., 2002b, 2002a; Zijdenbos & Dawant, 1994). We separate these techniques into nine categories based on the classification scheme in (Pham et al., 2000): (1) thresholding, (2) region growing, (3) edge detection, (4) classifiers, (5) clustering, (6) statistical models, (7) artificial neural networks, (8) deformable models, and (9) atlas-guided approaches. Other notable methods that do not belong to any of these categories are described at the end of this section. Though each technique is presented separately, multiple techniques are often used in conjunction in practical applications.

#### **2.1 Thresholding**

The simplest operation in this category is image thresholding (Pal & Pal, 1993). In this technique a threshold is selected, and the image is divided into a group of pixels with values less than the threshold and a group of pixels with values greater than or equal to the threshold. There are several thresholding methods: global thresholding, adaptive thresholding, optimal global and adaptive thresholding, local thresholding, and thresholds based on several variables (Bankman, 2000). Thresholding is a simple, fast, and easily implemented procedure that works reasonably well for images with very good contrast between distinct sub-regions. A typical example is separating CSF from highly T2-weighted brain images (Saeed, 1998). However, the distribution of intensities in brain MR images is usually very complex, and determining a threshold is difficult. In most cases, thresholding is combined with other methods (Brummer et al., 1993; Suzuki & Toriwaki, 1991).
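As a concrete illustration, global thresholding reduces to a single comparison per pixel. The minimal sketch below (NumPy) uses a hand-picked threshold and a tiny synthetic array standing in for an MR slice; both are illustrative assumptions, not values from the methods cited above:

```python
import numpy as np

def global_threshold(image, t):
    """Divide an image into pixels with value < t (0) and value >= t (1)."""
    return (image >= t).astype(np.uint8)

# Synthetic "image" with two well-separated intensity populations.
image = np.array([[10, 12, 200],
                  [11, 198, 205],
                  [9, 13, 201]])
mask = global_threshold(image, 100)  # binary mask of the bright region
```

Adaptive or local variants would recompute `t` per window rather than using one global value.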

#### **2.2 Region growing**

Region growing (or region merging) is a procedure that looks for groups of pixels with similar intensities. It starts with a pixel or a group of pixels (called seeds) that belong to the structure of interest. Subsequently, neighbouring pixels with the same properties as the seeds (or that satisfy a homogeneity criterion) are gradually appended to the growing region until no more pixels can be added (Dubey et al., 2010). The object is then represented by all pixels that have been accepted during the growing procedure. The advantage of region growing is that it is capable of correctly segmenting regions that have the same properties




and are spatially separated, and that it generates connected regions (Bankman, 2000). Instead of region merging, it is possible to start with some initial segmentation and subdivide the regions that do not satisfy a given uniformity test. This technique is called splitting (Haralick & Shapiro, 1985). A combination of splitting and merging combines the advantages of both approaches (Zucker, 1976). However, the results of region growing depend strongly on the selection of the homogeneity criterion. Another problem is that different starting points may not grow into identical regions (Bankman, 2000). Region growing has been exploited in many clinical applications (Cline et al., 1987; Tang et al., 2000).
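The procedure above can be sketched as a breadth-first traversal from the seed. The following minimal example assumes 4-connectivity and one particular homogeneity criterion (absolute difference from the running region mean within a tolerance `tol`); both choices are illustrative, not prescribed by the works cited above:

```python
import numpy as np
from collections import deque

def region_grow(image, seed, tol):
    """Grow a connected region from `seed`, accepting 4-neighbours whose
    intensity differs from the running region mean by at most `tol`."""
    h, w = image.shape
    mask = np.zeros((h, w), dtype=bool)
    mask[seed] = True
    queue = deque([seed])
    total, count = float(image[seed]), 1
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and not mask[ny, nx]:
                if abs(float(image[ny, nx]) - total / count) <= tol:
                    mask[ny, nx] = True
                    total += float(image[ny, nx])
                    count += 1
                    queue.append((ny, nx))
    return mask

# A bright 2x2 blob in the top-left corner; grow from its corner pixel.
image = np.array([[100, 101, 5],
                  [102, 100, 6],
                  [5, 4, 5]])
mask = region_grow(image, (0, 0), tol=10)
```

Changing the seed or the criterion can change the result, which is exactly the sensitivity noted in the text.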

#### **2.3 Edge detection techniques**

In edge detection techniques, the resulting segmented image is described in terms of the edges (boundaries) between different regions. Edges form at the intersection of two regions where there are abrupt changes in grey-level intensity values. Edge detection works well on images with good contrast between regions. A large number of edge operators can be used for edge detection; these operators are generally named after their inventors. The most popular ones are the Marr-Hildreth or LoG (Laplacian-of-Gaussian), Sobel, Roberts, Prewitt, and Canny operators. Binary mathematical morphology and the watershed algorithm are often used for edge detection purposes in the segmentation of brain MR images (Dogdas et al., 2002; Grau et al., 2004). However, the major drawbacks of these methods are over-segmentation, sensitivity to noise, poor detection of significant areas with low-contrast boundaries, and poor detection of thin structures (Grau et al., 2004).
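As an illustration of one such operator, the sketch below applies the Sobel operator by direct convolution (valid pixels only) and returns the gradient magnitude; the 4x4 step-edge image is a synthetic example:

```python
import numpy as np

def sobel_magnitude(image):
    """Approximate gradient magnitude with the Sobel operator (valid region only)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)  # horizontal gradient
    ky = kx.T                                                          # vertical gradient
    h, w = image.shape
    out = np.zeros((h - 2, w - 2))
    for y in range(h - 2):
        for x in range(w - 2):
            patch = image[y:y + 3, x:x + 3]
            gx = float((patch * kx).sum())
            gy = float((patch * ky).sum())
            out[y, x] = np.hypot(gx, gy)
    return out

# Vertical step edge: left half dark, right half bright.
image = np.array([[0.0, 0.0, 10.0, 10.0]] * 4)
edges = sobel_magnitude(image)  # uniform strong response along the edge
```

A full pipeline would follow this with non-maximum suppression and thresholding, as the Canny operator does.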

#### **2.4 Classifiers**

Classifier methods are known as supervised methods in pattern recognition; they seek to partition the image by using training data with known labels as references. The simplest classifier is the nearest-neighbour classifier (NNC), in which each pixel is assigned the class of the training datum with the closest intensity (Boudraa & Zaidi, 2006). Other examples of classifiers are the *k*-nearest neighbour (*k*-NN) classifier (Duda & Hart, 1973; Fukunaga, 1990), the Parzen window (Hamamoto et al., 1996), the Bayes classifier or maximum likelihood (ML) estimation (Duda & Hart, 1973), Fisher's linear discriminant (FLD) (Fisher, 1936), the nearest mean classifier (NMC) (Skurichina & Duin, 1996), and the support vector machine (SVM) (Vapnik, 1998). A weakness of classifiers is that they generally do not perform any spatial modelling. This weakness has been addressed in work extending classifier methods to segment images corrupted by intensity inhomogeneities (Wells III et al., 1996). Neighbourhood and geometric information was also incorporated into a classifier approach in (Kapur et al., 1998). In addition, classifiers require manual interaction to obtain training data; collecting training sets for each image can be time-consuming and laborious (Pham et al., 2000).
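A minimal *k*-NN classifier over intensity features might look as follows; the training intensities and tissue labels (CSF/GM/WM) are invented purely for illustration:

```python
import numpy as np
from collections import Counter

def knn_classify(train_X, train_y, x, k=3):
    """Label `x` by majority vote among the k closest training samples
    (Euclidean distance in feature space)."""
    d = np.linalg.norm(train_X - x, axis=1)
    nearest = np.argsort(d)[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

# Hypothetical 1D intensity features with tissue labels.
train_X = np.array([[30.0], [35.0], [100.0], [105.0], [180.0], [175.0]])
train_y = ["CSF", "CSF", "GM", "GM", "WM", "WM"]
label = knn_classify(train_X, train_y, np.array([102.0]), k=3)
```

Note that the vote uses intensity only; this is exactly the absence of spatial modelling criticized above.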

#### **2.5 Clustering**

Clustering is the process of organizing objects into groups whose members are similar in certain ways; its goal is to recognize structures or clusters present in a collection of unlabelled data. It is a method of unsupervised learning and a common technique for statistical data analysis used in many fields.

Segmentation of Brain MRI 149


#### **2.5.1 K-means clustering**

K-means clustering (or hard c-means clustering, HCM) (MacQueen, 1967) is one of the simplest unsupervised clustering methods, aiming to partition *N* samples into *K* clusters by minimizing an objective function so that the within-cluster sum of squares is minimized. It starts with *K* defined initial cluster centers and keeps reassigning the samples to clusters, based on the similarity between each sample and the cluster centers, until a convergence criterion is met. Given a set of samples $\{\mathbf{x}\_1, \dots, \mathbf{x}\_N\}$, where each sample is an *M*-dimensional real vector, let $N\_k$ be the number of samples in cluster *k*, denoted by $\Gamma\_k$, and let $\upsilon\_k$ be the mean value of these samples; the objective function is then defined as:

$$J = \sum\_{k=1}^{K} \sum\_{\mathbf{x}\_i \in \Gamma\_k} \left\| \mathbf{x}\_i - \upsilon\_k \right\|^2 \tag{4}$$

$$\upsilon\_k = \frac{1}{N\_k} \sum\_{\mathbf{x}\_i \in \Gamma\_k} \mathbf{x}\_i \tag{5}$$

where $\| \mathbf{x}\_i - \upsilon\_k \|$ is a distance measure between the point $\mathbf{x}\_i$ and the cluster center $\upsilon\_k$. Common distance measures are the Euclidean, chessboard, city block, Mahalanobis, and Hamming distances. The K-means algorithm has been used widely in brain MR image segmentation (Abras & Ballarin, 2005; Vemuri et al., 1995) because of its easy implementation and low time complexity. A major problem of this algorithm is that it is sensitive to the selection of the initial *K* cluster centers and may converge to a local minimum of the criterion function (Jain et al., 1999). Dozens of solutions have been proposed for selecting better initial cluster centers in order to find the global minimum (Bradley & Fayyad, 1998; Khan & Ahmad, 2004).
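The alternation between the assignment step and the mean update in Eqs. (4) and (5) can be sketched as follows; the 1D intensity samples and the initial centers are arbitrary illustrative choices, and the sketch assumes no cluster ever becomes empty:

```python
import numpy as np

def kmeans(X, k, centers, n_iter=100):
    """Lloyd's algorithm: assign each sample to the nearest center,
    then recompute each center as its cluster mean (Eqs. 4 and 5)."""
    for _ in range(n_iter):
        # Distances of every sample to every center, shape (N, k).
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Assumes every cluster keeps at least one sample.
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Two well-separated 1D intensity groups; initial centers chosen by hand.
X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
labels, centers = kmeans(X, 2, np.array([[0.0], [5.0]]))
```

A different initial `centers` array can converge to a different local minimum, which is the sensitivity discussed above.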

#### **2.5.2 Fuzzy c-means clustering (FCM)**

Fuzzy c-means clustering (FCM) (Bezdek, 1981; Dunn, 1973) is based on the same idea as the K-means algorithm: finding cluster centers by iteratively adjusting their positions to minimize an objective function. At the same time, it allows more flexibility by introducing fuzzy membership grades to multiple clusters. The objective function is defined as:

$$J\_m = \sum\_{k=1}^{K} \sum\_{i=1}^{N} u\_{ik}^m \left\| \mathbf{x}\_i - \upsilon\_k \right\|^2, \quad 1 \le m < \infty \tag{6}$$

where *m* is a constant that controls the clustering fuzziness, generally *m* = 2. $u\_{ik}$ is the fuzzy membership of $\mathbf{x}\_i$ in cluster *k*, satisfying (i) $u\_{ik} \in [0, 1]$ and (ii) $\sum\_{k=1}^{K} u\_{ik} = 1$ for each *i*. $\mathbf{x}\_i$ is the *i*-th sample in the measured data, $\upsilon\_k$ is the cluster center, and $\| \cdot \|$ is a distance measure. Fuzzy partitioning is carried out through an iterative optimization of the objective function shown above, with the membership $u\_{ik}$ and cluster centers $\upsilon\_k$ updated by:

$$u\_{ik} = \frac{1}{\sum\_{l=1}^{K} \left( \frac{\| \mathbf{x}\_i - \upsilon\_k \|}{\| \mathbf{x}\_i - \upsilon\_l \|} \right)^{2/(m-1)}} \tag{7}$$

$$\upsilon\_k = \frac{\sum\_{i=1}^{N} u\_{ik}^m \mathbf{x}\_i}{\sum\_{i=1}^{N} u\_{ik}^m} \tag{8}$$


This iteration stops when $\max\_{ik} | u\_{ik}^{(p+1)} - u\_{ik}^{(p)} | < \varepsilon$, where ε is a termination criterion between 0 and 1, and *p* is the iteration step (Kannan et al., 2010). Although clustering algorithms do not require training data, they do require an initial segmentation (or, equivalently, initial parameters). Clustering algorithms do not directly incorporate spatial modeling and can therefore be sensitive to noise and intensity inhomogeneities. This lack of spatial modeling, however, can provide significant advantages for fast computation (Hebert, 1997). Some work on improving the robustness of clustering algorithms to intensity inhomogeneities in MR images has been carried out (Pham & Prince, 1999). Robustness to noise can be incorporated through spatial correlations in an image based on a k-nearest neighbor model (R. Xu & Ohya, 2010) or Markov random field (MRF) modeling (Liu et al., 2005).
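The alternating membership and center updates, together with the stopping rule on the maximum membership change, can be sketched as follows. The toy 1D data, the initial centers, and the small constant guarding against zero distances are illustrative assumptions:

```python
import numpy as np

def fcm(X, k, centers, m=2.0, eps=1e-5, max_iter=100):
    """Fuzzy c-means: alternate the membership update and the center
    update until memberships change by less than eps."""
    N = X.shape[0]
    U = np.zeros((N, k))
    for _ in range(max_iter):
        # Membership update: u_ik = 1 / sum_l (||x_i - v_k|| / ||x_i - v_l||)^(2/(m-1))
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)  # guard against a sample sitting exactly on a center
        U_new = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1))).sum(axis=2)
        # Center update: v_k = sum_i u_ik^m x_i / sum_i u_ik^m
        w = U_new ** m
        centers = (w.T @ X) / w.sum(axis=0)[:, None]
        if np.abs(U_new - U).max() < eps:
            U = U_new
            break
        U = U_new
    return U, centers

# Two 1D intensity groups around 1 and 11; initial centers chosen by hand.
X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
U, centers = fcm(X, 2, np.array([[0.0], [5.0]]))
```

Unlike K-means, every sample keeps a graded membership in both clusters, which is what the fuzzy formulation adds.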

#### **2.6 Statistical models**

Statistical classification methods usually solve the segmentation problem either by assigning a class label to each pixel or by estimating the relative amounts of the various tissue types within a pixel (Noe et al., 2001). Statistical inference enables us to make statements about which elements of the set of possible labelings are likely to be the true ones.

#### **2.6.1 Expectation maximization (EM)**

The expectation maximization (EM) algorithm (Dempster et al., 1977) is a method for finding the maximum likelihood or maximum a posteriori (MAP) estimate of a hidden parameter θ of a probability distribution. EM is an iterative method which alternates between an expectation (E) step, in which each pixel is classified into one cluster according to the current estimates of the posterior distributions over the hidden variables, and a maximization (M) step, in which the hidden parameters are re-estimated by maximizing the likelihood function according to the current classification. These parameter estimates are then used to determine the distribution over the hidden variables in the next E step. Convergence is assured since the likelihood increases after each iteration (Zaidi et al., 2006). The underlying model in the EM algorithm can be specified according to the specific requirements of the given task (Wells III et al., 1996; Y. Zhang et al., 2001). In spite of these achievements, the algorithm has a few deficiencies: a good prior distribution and a known number of classes are required, and it is computationally intensive.
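For a two-class 1D Gaussian mixture over pixel intensities, the E and M steps described above take the following form; the toy data and the initial parameters are illustrative assumptions:

```python
import numpy as np

def em_gmm_1d(y, mu, sigma, pi, n_iter=50):
    """EM for a 1D Gaussian mixture: the E-step computes posterior class
    probabilities per pixel; the M-step re-estimates means, variances and priors."""
    y = np.asarray(y, dtype=float)
    for _ in range(n_iter):
        # E-step: responsibilities r[i, k] proportional to pi_k * N(y_i | mu_k, sigma_k^2)
        r = np.stack([p * np.exp(-(y - m) ** 2 / (2 * s ** 2)) / (np.sqrt(2 * np.pi) * s)
                      for m, s, p in zip(mu, sigma, pi)], axis=1)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted maximum-likelihood updates
        nk = r.sum(axis=0)
        mu = (r * y[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((r * (y[:, None] - mu) ** 2).sum(axis=0) / nk)
        pi = nk / len(y)
    return mu, sigma, pi

# Intensities drawn around two hypothetical tissue means (~1 and ~11).
mu, sigma, pi = em_gmm_1d([0, 1, 2, 10, 11, 12], [0.0, 8.0], [1.0, 1.0], [0.5, 0.5])
```

Note that both the number of classes (two) and the initial parameters must be supplied, which is the dependence criticized in the text.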

#### **2.6.2 Markov random field model (MRF)**

The Markov random field (MRF) model (S.Z. Li, 1995) is a statistical model that can be used within segmentation methods. MRFs model spatial interactions among neighboring or nearby pixels. In medical imaging, they are typically used because most pixels belong to the same class as their neighboring pixels (Pham et al., 2000). Let a finite lattice *I* represent a 2D image, with $i \in I$ a pixel in this image; the image is denoted by $Y = \{ y\_i, i \in I \}$, where $y\_i$ is the gray value of pixel *i*. For each pixel, the region type (or pixel class) that the pixel belongs to is specified by

a class label $X = \{ x\_i, i \in I \}$ (i.e., the image segmentation result), with $x\_i \in \Lambda$, where $\Lambda = \{ 1, \dots, K \}$ is the set of labels and *K* is the number of classes. Thus *X* (the label field) and *Y* (the gray field) are random fields on the lattice *I*, and the purpose of the MRF model is to establish the relationship between *X* and *Y*. The image model is then defined as:

$$Y\_i = \upsilon\_{X\_i} + e\_i, i \in I \tag{9}$$

where $\upsilon\_{X\_i}$ is the gray mean value of class $X\_i$, and $e\_i$ is a random variable following a Gaussian distribution. If $X\_i = k$, $k \in \Lambda$, then $e\_i \sim N(0, \sigma\_k^2)$, where $\sigma\_k^2$ is the variance of $e\_i$ for class *k*, and the conditional probability density is defined as:

$$P(Y\_i = y\_i \mid X\_i = k) = \frac{1}{\sqrt{2\pi\sigma\_k^2}} \exp[-\frac{\left(y\_i - v\_k\right)^2}{2\sigma\_k^2}]\tag{10}$$

Subsequently, the prior model of the segmentation result $X = \{ x\_i, i \in I \}$ is a 2D MRF. According to the Hammersley-Clifford theorem (Hammersley & Clifford, 1971), the prior probability of an MRF follows a Gibbs distribution, so the prior model is defined as:

$$P(X = \mathbf{x}) = \frac{1}{Z} \exp[-\sum\_{c \in \mathcal{C}} V\_c(\mathbf{x})] \tag{11}$$

where $Z = \sum\_{\mathbf{x}} \exp[-\sum\_{c \in C} V\_c(\mathbf{x})]$ is a normalizing constant called the *partition function*, and $V\_c(\mathbf{x})$ denotes the potential function of clique $c \in C$, which depends only on $x\_i$, $i \in c$. *C* is the set of second-order cliques (i.e. doubletons), and $\partial(i)$ indicates the neighborhood of pixel *i*. If the multi-level logistic (MLL) model is adopted and only the second-order neighborhood system and the pairwise potential function are considered, the energy function is defined as:

$$U(\mathbf{x}) = \sum\_{i \in I} \sum\_{j \in \partial(i)} V(x\_i, x\_j) \tag{12}$$

$$V(x\_i, x\_j) = \begin{cases} -\beta, & \text{if } x\_i = x\_j \\ \beta, & \text{if } x\_i \neq x\_j \end{cases} \tag{13}$$

Note that the energies of singletons (i.e. single pixels $i \in I$) directly reflect the probabilistic modeling of labels without context, while doubleton clique potentials express the relationship between neighboring pixel labels. On the basis of maximum a posteriori (MAP) estimation (Geman & Geman, 1993) and Bayes' theorem, the optimal solution $X^\*$ is defined as:

$$\begin{aligned} X^\* &= \arg\max\_{X} P(X \mid Y) \\ &= \arg\max\_{X} P(Y \mid X)P(X) \end{aligned} \tag{14}$$

To facilitate the solution, the negative natural logarithm of the objective function is taken, which turns the maximization into an energy minimization:

$$X^\* = \arg\min\_{\mathbf{x}} \{ U(y \mid \mathbf{x}; \theta) + U(\mathbf{x}) \} \tag{15}$$


$$U(y \mid \mathbf{x}; \theta) = \sum\_{i \in I} U(y\_i \mid x\_i) = \sum\_{i \in I} \left[ \frac{(y\_i - \upsilon\_{x\_i})^2}{2\sigma\_{x\_i}^2} + \frac{1}{2} \log(\sigma\_{x\_i}^2) \right] \tag{16}$$

$$\theta = \{ \upsilon\_k, \sigma\_k \mid k \in \Lambda \} \tag{17}$$

In this way, the segmentation problem in the MRF model is reduced to the minimization of the above energy function, which is usually computed by the iterated conditional modes (ICM) algorithm (Besag, 1986). The ICM method uses a 'greedy' strategy of iterative local minimization, and convergence is guaranteed after only a few iterations (Boudraa & Zaidi, 2006). By incorporating spatial relations among pixels, the unsupervised and nonparametric MRF model can effectively decrease the influence of image noise and deliver stable, satisfactory segmentation results for low-SNR images. This model has been widely applied in human cerebral cortical segmentation (Held et al., 1997; Y. Zhang et al., 2001). Conversely, a difficulty associated with MRF models is the proper selection of the parameters controlling the strength of spatial interaction (S.Z. Li, 1995). A setting that is too high can result in an excessively smooth segmentation and a loss of important structural details. Several schemes have been proposed for the estimation of MRF parameters (Descombes et al., 1999; Salzenstein & Pieczynski, 1997; R. Xu & Luo, 2009). In addition, MRF methods usually require computationally intensive algorithms (Pham et al., 2000).
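A minimal ICM sweep for this energy might look as follows, assuming a 4-neighborhood, the pairwise potential of Eq. (13), and the Gaussian likelihood term of Eq. (16); the toy image, class means, variances and β are illustrative choices:

```python
import numpy as np

def icm(y, labels, v, sigma, beta, n_sweeps=5):
    """Iterated conditional modes: greedily give each pixel the label that
    minimizes its local energy, i.e. the Gaussian likelihood term plus
    beta-weighted disagreement with its 4 neighbours."""
    h, w = y.shape
    K = len(v)
    for _ in range(n_sweeps):
        for i in range(h):
            for j in range(w):
                best, best_e = labels[i, j], np.inf
                for k in range(K):
                    # Likelihood (singleton) term, per Eq. (16).
                    e = (y[i, j] - v[k]) ** 2 / (2 * sigma[k] ** 2) \
                        + 0.5 * np.log(sigma[k] ** 2)
                    # Pairwise MLL potentials, per Eq. (13).
                    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ni, nj = i + di, j + dj
                        if 0 <= ni < h and 0 <= nj < w:
                            e += -beta if labels[ni, nj] == k else beta
                    if e < best_e:
                        best, best_e = k, e
                labels[i, j] = best
    return labels

# Two-class toy image: left region ~0, right column ~10, with one
# bright "noise" pixel in the middle that the smoothness term should flip.
y = np.array([[0.0, 0.0, 10.0],
              [0.0, 10.0, 10.0],
              [0.0, 0.0, 10.0]])
labels0 = (y > 5).astype(int)  # maximum-likelihood initialization
result = icm(y, labels0, v=[0.0, 10.0], sigma=[1.0, 1.0], beta=15.0)
```

With this large β the isolated bright pixel is relabelled to match its neighbours, illustrating both the denoising effect and the over-smoothing risk of a too-strong interaction parameter.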

#### **2.7 Artificial neural networks (ANNs)**

Artificial neural networks (ANNs) are parallel networks of processing elements, or nodes, that simulate biological neural networks. Each node in an ANN is capable of performing elementary computations. Learning is achieved through the adaptation of weights assigned to the connections between nodes. The massive connectionist architecture usually makes the system robust, while the parallel processing enables the system to produce output in real time. To simulate a biological neural network, the neurons and connections in an ANN model comprise the components and variables shown in Fig. 2 (Kriesel, 2007). A thorough treatment of ANNs can be found in (J.W. Clark, 1991).
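The data flow of a single processing element, where inputs are combined by a propagation function (here a weighted sum plus bias) and passed through an activation function, can be sketched as follows; the use of `tanh` as the activation is an illustrative choice:

```python
import numpy as np

def neuron(x, w, b, activation=np.tanh):
    """One processing element: a weighted sum of the inputs (propagation
    function) passed through an activation function."""
    return activation(np.dot(w, x) + b)

# Two inputs, hand-picked weights, zero bias.
out = neuron(np.array([1.0, 0.0]), np.array([0.5, 0.5]), 0.0)
```

Training an ANN then amounts to adjusting `w` and `b` across many such nodes from labelled data.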

Fig. 2. Data processing of a neuron (Images provided courtesy of D. Kriesel).

The most widespread application of ANNs in medical imaging is as a classifier (Gelenbe et al., 1996; Hall et al., 1992), in which the weights are determined from training data and the ANN is then used to segment new data. ANNs can also be used in an unsupervised fashion as a clustering method (Bezdek et al., 1993; Reddick et al., 1997), as well as for deformable models (Vilarino et al., 1998). Because of the many interconnections used in a neural network, spatial information can easily be incorporated into its classification procedures (Pham et al., 2000). However, the major disadvantage of ANNs is that they require training data. Large neural networks also require long processing times because their processing is usually simulated on a standard serial computer.


#### **2.9 Atlas-guided approaches**

Atlas-guided approaches use a pre-segmented atlas image as a reference frame for segmenting anatomical structures, which helps to obtain fully automatic cortical segmentation procedures. The standard atlas-guided approach treats segmentation as a registration problem. It first finds a one-to-one transformation that maps a pre-segmented atlas image to the target image, a process often referred to as 'atlas warping'. The warping can be performed with linear transformations (Talairach & Tournoux, 1988) or nonlinear transformations (Collins et al., 1995; Davatzikos, 1996). Atlas-guided approaches have been applied mainly in brain MRI segmentation (Collins et al., 1995), as well as in extracting the brain volume from head scans (Aboutanos & Dawant, 1997). One advantage is that the labels as well as the segmentation are transferred. These approaches also provide a standard system for studying morphometric properties (Thompson & Toga, 1997). Atlas-guided approaches are generally better suited for the segmentation of structures that are stable over the population of study. One method that helps model anatomical variability is to use probabilistic atlases (Thompson & Toga, 1997), but these require additional time and interaction to accumulate data. Another method is to use manually selected landmarks (Davatzikos, 1996) to constrain the transformation.

#### **2.10 Other techniques**

*Texture segmentation* divides an image into regions according to the textures of those regions. Haralick et al. (Haralick et al., 1973) published an extensive paper on texture in the 1970s. Later, Peleg et al. (Peleg et al., 1984) and Cross et al. (Cross & Jain, 1983) also published work on texture analysis applied to computer vision images. The application of texture to brain segmentation started in the early 1990s, when Lachmann et al. (Lachmann & Barillot, 1992) developed a method for the classification of WM, GM and CSF. This method, however, did not discuss validation schemes, making it hard to judge the performance of the segmentation algorithm. Besides, it appeared sensitive to initial textural properties, and no such discussion was carried out in the paper (Suri, Singh, et al., 2002b).

*Self-organizing maps* (SOM), introduced by Kohonen in early 1981 (Kohonen, 1990), is a type of artificial neural network, whose precursor is *learning vector quantization* (LVQ) invented by T. Kohonen (Kohonen, 1997). It is able to convert complex, nonlinear statistical relationships between high-dimensional data items into simple geometric relationships on a low-dimensional display via using unsupervised learning. The applications of SOM method can be found in (Y. Li & Chi, 2005; Tian & Fan, 2007). However, SOM algorithms are, firstly, highly dependent on the training data representatives and the initialization of the connection weights. Secondly, they are very computationally expensive if the dimensions of

*Wavelet transform*, adventured in medical imaging research in 1991 (Weaver et al., 1991), is a tool that cuts up data or functions or operators into different frequency components, and then studies each component with a resolution matched to its scale (Daubechies, 2004). Modern wavelet analysis was considered to be proposed by Grossmann and Morlet in their milestone paper (Morlet & Grossman, 1984). In medical image segmentation, wavelet transforms have been employed to combine texture analysis, edge detection, classifiers, statistical models, and deformable models, etc. Many works benefit through using image features within a spatial-frequency domain after wavelet transform to assist the

**2.10 Other techniques** 

Singh, et al., 2002b).

the data increases (Y. Li & Chi, 2005).

segmentation (Barra & Boire, 2000; Bello, 1994).

The most widely used application of ANNs in medical imaging is as a classifier (Gelenbe et al., 1996; Hall et al., 1992), in which the weights are determined from training data and the ANN is then used to segment new data. ANNs can also be used in an unsupervised fashion as a clustering method (Bezdek et al., 1993; Reddick et al., 1997), as well as for deformable models (Vilarino et al., 1998). Because of the many interconnections used in a neural network, spatial information can easily be incorporated into its classification procedures (Pham et al., 2000). However, the major disadvantage of artificial neural networks (ANNs) is that they require training data. Large neural networks also require long processing times, because their processing is usually simulated on a standard serial computer.
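The supervised-classifier use of ANNs described above can be sketched with a small one-hidden-layer network trained on synthetic two-feature "pixel" samples from two tissue classes. All sizes, learning rates, and class statistics below are illustrative choices, not values from the literature cited here.

```python
import numpy as np

# Illustrative supervised ANN classifier: a one-hidden-layer network is
# trained on synthetic two-feature pixel samples (e.g., intensity plus a
# local-mean feature) from two tissue classes, then applied to label data.
rng = np.random.default_rng(0)
n = 200
x0 = rng.normal(0.3, 0.05, (n, 2))      # class 0: darker tissue samples
x1 = rng.normal(0.7, 0.05, (n, 2))      # class 1: brighter tissue samples
X = np.vstack([x0, x1]) - 0.5           # center the features
y = np.concatenate([np.zeros(n), np.ones(n)])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Weights are determined from the training data, as in the classifier
# use of ANNs; four hidden units suffice for this toy problem.
W1 = rng.normal(0, 1, (2, 4)); b1 = np.zeros(4)
W2 = rng.normal(0, 1, (4, 1)); b2 = np.zeros(1)
lr = 1.0
for _ in range(1000):
    h = sigmoid(X @ W1 + b1)                 # hidden activations
    p = sigmoid(h @ W2 + b2).ravel()         # predicted P(class 1)
    g = (p - y) / len(y)                     # output gradient (cross-entropy)
    gh = (g[:, None] @ W2.T) * h * (1 - h)   # backpropagated hidden gradient
    W2 -= lr * (h.T @ g[:, None]); b2 -= lr * g.sum()
    W1 -= lr * (X.T @ gh); b1 -= lr * gh.sum(axis=0)

h = sigmoid(X @ W1 + b1)
p = sigmoid(h @ W2 + b2).ravel()
accuracy = float(np.mean((p > 0.5) == (y == 1)))
```

Training on labelled samples and then applying the fixed weights to unseen data mirrors the supervised workflow described above.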

#### **2.8 Deformable models**

Deformable models are physically motivated, model-based techniques for detecting region boundaries using closed parametric curves or surfaces that deform under the influence of internal and external forces. To delineate an object boundary in an image, a closed curve or surface must first be placed near the desired boundary and then allowed to undergo an iterative relaxation process. Internal forces are computed from within the curve or surface to keep it smooth throughout the deformation. External forces are usually derived from the image to drive the curve or surface toward the desired feature of interest (Pham et al., 2000). The original deformable model, called the *snake* model, was introduced in (Kass et al., 1988), in which the contour deforms to minimize a contour energy that combines the internal energy of the contour and the external energy from the image. A number of improvements have since been proposed, such as snake variations (Cohen, 1991; McInerney & Terzopoulos, 2000; C. Xu & Prince, 1998). The *level set* is another important deformable contour method; it was first proposed for image segmentation in (Malladi et al., 1995). Some researchers have applied the level set formulation with contour energy minimization to obtain better convergence (Siddiqi et al., 1998; Wang et al., 2004).
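The iterative relaxation just described can be illustrated with a greedy discrete snake on a synthetic disk image: each contour point moves, in turn, to the neighbouring pixel that lowers a sum of internal (continuity) energy and external (edge-attraction) energy. The energy weights and the one-pixel search window are illustrative simplifications, not the original formulation of (Kass et al., 1988).

```python
import numpy as np

# Greedy snake sketch: the contour shrinks under its continuity term and
# is stopped by the edge of a bright synthetic disk of radius 20.
H = W = 100
yy, xx = np.mgrid[0:H, 0:W]
img = ((xx - 50) ** 2 + (yy - 50) ** 2 <= 20 ** 2).astype(float)

# External energy: negative gradient magnitude (strong edges attract).
gy, gx = np.gradient(img)
edge = np.hypot(gx, gy)

# Initialize the contour as a circle of radius 30 around the disk.
t = np.linspace(0, 2 * np.pi, 20, endpoint=False)
pts = np.stack([50 + 30 * np.cos(t), 50 + 30 * np.sin(t)], axis=1)

alpha, gamma = 0.2, 5.0                  # illustrative energy weights
for _ in range(60):
    for i in range(len(pts)):
        prev_p, next_p = pts[i - 1], pts[(i + 1) % len(pts)]
        best, best_e = pts[i], np.inf
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                cand = pts[i] + (dx, dy)
                x, y = int(round(cand[0])), int(round(cand[1]))
                if not (0 <= x < W and 0 <= y < H):
                    continue
                # internal: stay close to neighbours; external: seek edges
                e_int = np.sum((cand - prev_p) ** 2) + np.sum((cand - next_p) ** 2)
                e = alpha * e_int - gamma * edge[y, x]
                if e < best_e:
                    best_e, best = e, cand
        pts[i] = best

radii = np.hypot(pts[:, 0] - 50, pts[:, 1] - 50)
mean_radius = float(radii.mean())
```

After the sweeps, the contour settles near the disk boundary (radius about 20), showing how the internal term shrinks the curve until the external edge force pins it.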

Deformable models are quite helpful for cerebral cortical segmentation in MR images (Davatzikos & Bryan, 1996; C. Xu et al., 1998). Their advantages are that they generate closed parametric curves or surfaces from images and incorporate a smoothness constraint that provides robustness to noise and spurious edges. Their disadvantage is that they require manual interaction to place an initial model and to choose appropriate parameters. Successes in reducing sensitivity to initialization have been reported in (Cohen, 1991; Malladi et al., 1995; C. Xu & Prince, 1998). Standard deformable models can also exhibit poor convergence to concave boundaries. This difficulty can be alleviated somewhat through the use of pressure forces (Cohen, 1991) and other modified external-force models (C. Xu & Prince, 1998). Another important extension of deformable models is adaptive model topology, achieved by using an implicit representation rather than an explicit parameterization (Malladi et al., 1995; McInerney & Terzopoulos, 1995). Several general reviews of deformable models in medical image analysis can be found in (He et al., 2008; Heimann & Meinzer, 2009; McInerney & Terzopoulos, 1996; Suri, Liu, et al., 2002).

#### **2.9 Atlas-guided approaches**

Atlas-guided approaches are a powerful tool for medical image segmentation when a standard atlas or template is available. The idea of using a brain atlas is to provide a priori knowledge that can help in grouping the segments into anatomical structures, which makes fully automatic cortical segmentation procedures possible. The standard atlas-guided approach treats segmentation as a registration problem. It first finds a one-to-one transformation that maps a pre-segmented atlas image to the target image, a process often referred to as 'atlas warping'. The warping can be performed with linear transformations (Talairach & Tournoux, 1988) or nonlinear transformations (Collins et al., 1995; Davatzikos, 1996). Atlas-guided approaches have been applied mainly in brain MRI segmentation (Collins et al., 1995), as well as in extracting the brain volume from head scans (Aboutanos & Dawant, 1997). One advantage is that labels are transferred along with the segmentation; atlases also provide a standard system for studying morphometric properties (Thompson & Toga, 1997). Atlas-guided approaches are generally better suited for segmenting structures that are stable over the population of study. One method that helps model anatomical variability is to use probabilistic atlases (Thompson & Toga, 1997), but these require additional time and interaction to accumulate data. Another method is to use manually selected landmarks (Davatzikos, 1996) to constrain the transformation.
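The label-transfer step of atlas warping can be sketched as follows, assuming the atlas-to-target transform has already been found by a registration step (here a simple known translation stands in for it). The array sizes, label layout, and shift are illustrative.

```python
import numpy as np

# Sketch of label transfer by atlas warping: the pre-segmented atlas
# labels are resampled into the target frame via the inverse transform
# (nearest-neighbour, so labels stay integer-valued).
H = W = 64
atlas_labels = np.zeros((H, W), dtype=int)
atlas_labels[20:40, 20:40] = 1          # one labelled structure in the atlas

def warp_labels(labels, shift):
    """Warp a label image by a known (dy, dx) translation, inverse-mapped."""
    out = np.zeros_like(labels)
    dy, dx = shift
    ys, xs = np.mgrid[0:H, 0:W]
    src_y, src_x = ys - dy, xs - dx      # inverse transform: target -> atlas
    valid = (src_y >= 0) & (src_y < H) & (src_x >= 0) & (src_x < W)
    out[valid] = labels[src_y[valid], src_x[valid]]
    return out

# Target is assumed to be the atlas anatomy shifted by (+5, +3) pixels.
target_seg = warp_labels(atlas_labels, (5, 3))
```

In a real pipeline the translation would be replaced by the linear or nonlinear transformation estimated during registration; the label-resampling step is otherwise the same.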

#### **2.10 Other techniques**

*Texture segmentation* partitions an image into regions according to their textures. Haralick et al. (Haralick et al., 1973) published an extensive early paper on texture in 1973; later, Peleg et al. (Peleg et al., 1984) and Cross et al. (Cross & Jain, 1983) also published work on texture analysis applied to computer vision images. The application of texture to brain segmentation started in the early 1990s, when Lachmann et al. (Lachmann & Barillot, 1992) developed a method for the classification of WM, GM and CSF. This method, however, did not discuss validation schemes, so it is hard to judge the performance of the segmentation algorithm. Moreover, it seemed sensitive to initial textural properties, an issue the paper did not discuss (Suri, Singh, et al., 2002b).
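The co-occurrence features introduced by Haralick et al. can be illustrated with a hand-rolled gray-level co-occurrence matrix (GLCM) and the contrast feature. The two tiny patches, the number of gray levels, and the (0, 1) offset are illustrative choices standing in for real image windows.

```python
import numpy as np

# Haralick-style texture feature from a GLCM, computed by hand for the
# horizontal offset (0, 1): count co-occurring gray-level pairs, normalize,
# then weight by squared level difference (the "contrast" feature).
def glcm(patch, levels):
    m = np.zeros((levels, levels))
    for a, b in zip(patch[:, :-1].ravel(), patch[:, 1:].ravel()):
        m[a, b] += 1
    return m / m.sum()                   # co-occurrence probabilities

def contrast(m):
    i, j = np.indices(m.shape)
    return float(np.sum(m * (i - j) ** 2))  # large for many intensity jumps

smooth = np.zeros((8, 8), dtype=int)     # uniform patch: no intensity jumps
rng = np.random.default_rng(1)
noisy = rng.integers(0, 4, (8, 8))       # random patch: many jumps

c_smooth = contrast(glcm(smooth, 4))
c_noisy = contrast(glcm(noisy, 4))
```

A texture-based segmenter would compute such features over sliding windows and then cluster or classify the resulting feature vectors.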

The *self-organizing map* (SOM), introduced by Kohonen in the early 1980s (Kohonen, 1990), is a type of artificial neural network closely related to *learning vector quantization* (LVQ), also developed by Kohonen (Kohonen, 1997). It converts complex, nonlinear statistical relationships among high-dimensional data items into simple geometric relationships on a low-dimensional display using unsupervised learning. Applications of the SOM method can be found in (Y. Li & Chi, 2005; Tian & Fan, 2007). However, SOM algorithms are, firstly, highly dependent on the representativeness of the training data and on the initialization of the connection weights; secondly, they become computationally expensive as the dimensionality of the data increases (Y. Li & Chi, 2005).
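The unsupervised, competitive learning at the heart of a SOM can be shown with a minimal one-dimensional map: a chain of nodes spreads out to cover synthetic 2-D data. The grid size, decay schedules, and data distribution are illustrative.

```python
import numpy as np

# Minimal 1-D self-organizing map: for each sample, find the best-matching
# unit (BMU) and pull it and its grid neighbours toward the sample, with a
# learning rate and neighbourhood radius that decay over time.
rng = np.random.default_rng(0)
data = rng.uniform(0, 1, (500, 2))             # training samples
n_nodes = 10
weights = rng.uniform(0.4, 0.6, (n_nodes, 2))  # small random initial weights

for t in range(2000):
    lr = 0.5 * (1 - t / 2000)                  # decaying learning rate
    sigma = 3.0 * (1 - t / 2000) + 0.5         # decaying neighbourhood radius
    x = data[t % len(data)]
    bmu = np.argmin(np.sum((weights - x) ** 2, axis=1))  # best-matching unit
    dist = np.abs(np.arange(n_nodes) - bmu)              # grid distance to BMU
    h = np.exp(-(dist ** 2) / (2 * sigma ** 2))          # neighbourhood function
    weights += lr * h[:, None] * (x - weights)

# After training, the chain covers the data range, not the tiny init box.
spread = float(weights[:, 0].max() - weights[:, 0].min())
```

The dependence on initialization noted above is visible here: the final map layout changes with the random seed used for the initial weights.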

The *wavelet transform*, which entered medical imaging research in 1991 (Weaver et al., 1991), is a tool that cuts up data, functions, or operators into different frequency components and then studies each component with a resolution matched to its scale (Daubechies, 2004). Modern wavelet analysis is generally considered to have been introduced by Grossmann and Morlet in their milestone paper (Morlet & Grossman, 1984). In medical image segmentation, wavelet transforms have been combined with texture analysis, edge detection, classifiers, statistical models, and deformable models, among others. Many works benefit from using image features in the spatial-frequency domain, obtained via the wavelet transform, to assist segmentation (Barra & Boire, 2000; Bello, 1994).
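The splitting into frequency components can be illustrated with one level of a 2-D Haar decomposition, the simplest wavelet, implemented directly with pairwise sums and differences. The test image (a vertical step edge) is an illustrative stand-in for an MR slice.

```python
import numpy as np

# One level of a 2-D Haar wavelet decomposition: the image is split into a
# coarse approximation (LL) and three detail sub-bands (LH, HL, HH).
def haar2d(img):
    a = (img[0::2, :] + img[1::2, :]) / 2   # row averages (low-pass)
    d = (img[0::2, :] - img[1::2, :]) / 2   # row differences (high-pass)
    ll = (a[:, 0::2] + a[:, 1::2]) / 2      # low-low: coarse approximation
    lh = (a[:, 0::2] - a[:, 1::2]) / 2      # horizontal-variation detail
    hl = (d[:, 0::2] + d[:, 1::2]) / 2      # vertical-variation detail
    hh = (d[:, 0::2] - d[:, 1::2]) / 2      # diagonal detail
    return ll, lh, hl, hh

img = np.zeros((8, 8))
img[:, 3:] = 1.0                            # a vertical step edge
ll, lh, hl, hh = haar2d(img)
```

The edge shows up only in the sub-band sensitive to horizontal intensity variation, which is exactly the kind of spatial-frequency feature the cited segmentation works exploit.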

*Multispectral segmentation* is a method for differentiating tissue classes that have similar characteristics in a single imaging modality by using several independent images of the same anatomical slice acquired with different modalities (e.g., T1, T2, proton density). Because different tissues respond differently to particular pulse sequences, this increases the capability of discriminating between tissues (Fletcher et al., 1993; Vannier et al., 1985). The most common approach to multispectral MR image segmentation is pattern recognition (Bezdek et al., 1993; Suri, Singh, et al., 2002b). These techniques generally appear to be successful, particularly for brain MR images (Reddick et al., 1997; Taxt & Lundervold, 1994), but much work remains in the area of validation.

#### **3.2 Modified non-homogeneous MRF model**

#### **3.2.1 The $\beta$ function based on fuzzy membership**

A fuzzy set is defined as follows: given a domain *X* with elements *x*, a mapping $\mu_F : X \to [0,1]$, $x \mapsto \mu_F(x)$, determines a fuzzy set *F* in the domain *X*; $\mu_F$ is called *F*'s membership function, and $\mu_F(x)$ is *x*'s membership in *F*. The greater the membership, the greater the degree to which an element belongs to the fuzzy set. As a consequence, *F* is a subset of the domain *X* whose border is not crisply defined.

In terms of the features of brain MR images, the spatial correlation of adjacent pixels varies with position in image space, which indicates that the parameter $\beta$ should be a variable that changes with spatial location. Consequently, the corresponding MRF model should be considered non-homogeneous.

Let *y* be the gray values of pixels and *x* the classification of pixels in image *I*. If pixel *i* is marked by class *k* ($v_k$ is the clustering center of class *k*, $k = 1, \ldots, K$), the parameter $\beta$ is taken to be a decreasing function of $u_{ik}$, the membership of pixel *i* in class *k*. The smaller $u_{ik}$ is, the lower the degree to which pixel *i* belongs to class *k*, which implies that the label of pixel *i* should be decided by the state of its neighborhood. The larger $u_{ik}$ is, the higher the degree to which pixel *i* belongs to class *k*, which implies that the label of pixel *i* should be decided by its own gray value. Thus, the $\beta$ function is defined as

$$\beta_i = g(u_{ik}), \qquad g \ \text{monotonically decreasing on } [0,1] \tag{18}$$

#### **3.2.2 The modified MRF model (M-MRF model)**

In the traditional MRF model (see Section 2.6.2), the parameter $\beta$ is used to calculate the *energy function* $U(x)$ and the *clique potentials* $V_c(x)$ over all possible cliques $c \in C$, which depend only on the neighborhood of pixel *i*, with $\beta_{i,j} \equiv \beta$ held constant. Using the $\beta$ function, and considering a multi-level logistic (MLL) model, a second-order neighborhood system, and the pairwise (dual) potential function, the clique potentials and the energy function can be modified as

$$V_c(x_i, x_j) = \begin{cases} -\beta_i, & \text{if } x_i = x_j \\ \phantom{-}\beta_i, & \text{if } x_i \neq x_j \end{cases} \tag{19}$$

$$U(x) = \sum_{i \in I} \sum_{j \in N_i} V_c(x_i, x_j) \tag{20}$$

And the new non-homogeneous MRF (M-MRF) model is thereby improved into

$$U(x \mid y) = U(y \mid x) + U(x) = \sum_{i \in I} \left[ \frac{(y_i - v_k)^2}{2\sigma_k^2} + \log(\sigma_k) \right] + \sum_{i \in I} \sum_{j \in N_i} V_c(x_i, x_j) \tag{21}$$

Therefore, the segmentation problem reduces to minimizing the above energy function, which is generally solved by the iterated conditional modes (ICM) algorithm (Besag, 1986). The algorithm of the M-MRF model for image segmentation is designed as follows:
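As a rough illustration of ICM-based energy minimization for MRF segmentation, the following sketch uses a two-class Gaussian likelihood and Potts-style pairwise potentials with a constant β; the M-MRF model discussed above would replace this constant with a per-pixel β derived from fuzzy memberships. The synthetic image, class means, and parameter values are all illustrative.

```python
import numpy as np

# Generic ICM loop: sweep the image and, at each pixel, pick the label
# minimizing a Gaussian data term plus a Potts smoothness term over the
# 4-neighbourhood. Convergence to a local minimum is typical of ICM.
rng = np.random.default_rng(0)
H = W = 32
truth = np.zeros((H, W), dtype=int)
truth[:, W // 2:] = 1                                  # two-region ground truth
means, sigma, beta = np.array([0.2, 0.8]), 0.15, 1.0
y = means[truth] + rng.normal(0, sigma, (H, W))        # noisy observation

labels = (y > 0.5).astype(int)                         # initial guess: threshold
for _ in range(5):                                     # ICM sweeps
    for i in range(H):
        for j in range(W):
            best_k, best_e = labels[i, j], np.inf
            for k in (0, 1):
                # data term: negative Gaussian log-likelihood (up to a constant)
                e = (y[i, j] - means[k]) ** 2 / (2 * sigma ** 2)
                # smoothness term: Potts penalty for disagreeing neighbours
                for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < H and 0 <= nj < W and labels[ni, nj] != k:
                        e += beta
                if e < best_e:
                    best_e, best_k = e, k
            labels[i, j] = best_k

accuracy = float(np.mean(labels == truth))
```

The smoothness term removes most isolated misclassifications left by the initial thresholding, which is the practical benefit of the MRF prior.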
