#### 2.3.2.2 Second-level feature

The second-level feature consists of computing a similarity-based connectivity parameter w<sub>ij</sub> between VOIs:

$$w\_{ij} = \begin{cases} e^{-\left\|\mathbf{x}\_i - \mathbf{x}\_j\right\|^2} & i \neq j \\ 0 & i = j \end{cases} \tag{6}$$

where **x**<sub>i</sub> is the feature vector containing the mean and standard deviation of the ith VOI, and w<sub>ij</sub> is the similarity coefficient between the ith and the jth VOIs. The second-level feature of any subject is denoted W<sup>r</sup>, which is a symmetric matrix.
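As a minimal sketch, Eq. (6) can be computed with NumPy; the function name `second_level_matrix` and the stacking of each VOI's mean and standard deviation into a (116, 2) array are illustrative assumptions, not the authors' code:

```python
import numpy as np

def second_level_matrix(voi_features):
    """Compute the similarity matrix W^r of Eq. (6).

    voi_features: (p, 2) array; row i holds the mean and standard
    deviation of the i-th VOI (p = 116 in this work).
    """
    # Squared Euclidean distance between every pair of feature vectors
    diff = voi_features[:, None, :] - voi_features[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    w = np.exp(-sq_dist)        # w_ij = exp(-||x_i - x_j||^2)
    np.fill_diagonal(w, 0.0)    # w_ii = 0 by definition
    return w
```

The result is symmetric with a zero diagonal, as stated in the text.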

The second-level feature is composed of the similarity coefficients between all 116 VOIs, i.e., 6670 values (116 × 115/2, the upper triangle of W<sup>r</sup>), which is clearly not an optimal dimension for the subsequent classification. Therefore, W<sup>r</sup> is further decomposed into three subsets of features. In the same way as the similarity coefficients between VOIs, we can compute similarity coefficients between subjects for a specific VOI:

$$w\_{uv} = \begin{cases} e^{-\left\|\mathbf{x}\_u - \mathbf{x}\_v\right\|^2} & u \neq v \\ 0 & u = v \end{cases} \tag{7}$$

where u and v stand for the uth and vth subjects. For each VOI, a symmetric subject-similarity matrix W<sup>s</sup> is computed.

The dimension of W<sup>s</sup> is determined by the number of subjects, N, in a group (AD, NC, or MCI). Since each subject is segmented into 116 VOIs, there are 116 such matrices W<sup>s</sup>.
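Eq. (7) applies the same kernel across subjects, once per VOI. A sketch, assuming NumPy and a hypothetical `features` array of shape (subjects, 116, 2):

```python
import numpy as np

def subject_similarity_matrices(features):
    """Eq. (7): one subject-similarity matrix W^s per VOI.

    features: (n_subjects, p, 2) array; features[u, k] holds the mean
    and standard deviation of VOI k for subject u (p = 116 here).
    Returns an array of shape (p, n_subjects, n_subjects).
    """
    n, p, _ = features.shape
    ws = np.empty((p, n, n))
    for k in range(p):
        x = features[:, k, :]                        # per-subject vectors for VOI k
        diff = x[:, None, :] - x[None, :, :]
        ws[k] = np.exp(-np.sum(diff ** 2, axis=-1))  # w_uv
        np.fill_diagonal(ws[k], 0.0)                 # w_uu = 0
    return ws
```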

On the one hand, a VOI that is not affected by AD will yield similar coefficients for AD and NC subjects. On the other hand, a VOI affected by AD will yield different similarity coefficients for the two groups. To quantify this difference, we compute the frequency distribution histogram of the upper-triangle values of W<sup>s</sup>. Figure 4 shows the cumulative probability curves of the similarity coefficients obtained for region angular L (a), region hippocampus L (b), and region cerebellum 10 R (c). There is a clear difference between the AD and NC groups in Figure 4a, and for the other two VOIs, the difference decreases gradually. Cerebellum 10 R appears as the VOI that is almost unaffected by AD, while angular L is the one most affected. The area under the curve, denoted S, quantifies these differences and is used to rank the VOIs.

Figure 4.

Statistics of the similarity coefficients between subjects for certain VOIs. (a) VOI: angular L; (b) VOI: hippocampus L; (c) VOI: cerebellum 10 R.

Figure 5.

Instance of the division for a similarity matrix. Highly involved VOIs in AD (red); less involved VOIs in AD (blue); connectivities between highly and slightly influenced VOIs appear in green.
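The exact definition of S is not fully spelled out here; one plausible reading, sketched below under that assumption, scores a VOI by the area between the AD and NC cumulative distributions of its upper-triangle similarity coefficients (the function name and binning are illustrative):

```python
import numpy as np

def voi_separation_score(ws_ad, ws_nc, bins=100):
    """Hypothetical score S for one VOI: area between the AD and NC
    cumulative distributions of upper-triangle similarity values.

    ws_ad, ws_nc: square subject-similarity matrices W^s for this VOI
    (the two groups may have different numbers of subjects).
    """
    vals_ad = ws_ad[np.triu_indices_from(ws_ad, k=1)]
    vals_nc = ws_nc[np.triu_indices_from(ws_nc, k=1)]
    edges = np.linspace(0.0, 1.0, bins + 1)   # similarities lie in [0, 1]
    cdf_ad = np.cumsum(np.histogram(vals_ad, bins=edges)[0]) / vals_ad.size
    cdf_nc = np.cumsum(np.histogram(vals_nc, bins=edges)[0]) / vals_nc.size
    # Riemann-sum approximation of the area between the two curves
    return np.sum(np.abs(cdf_ad - cdf_nc)) / bins
```

A VOI unaffected by AD yields nearly identical curves (score near 0), while an affected VOI yields a larger score, matching the behavior described for Figure 4.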

After ranking all the VOIs by S, the similarity matrix W<sup>r</sup> is recalculated according to the new order of VOIs and divided into four equal parts, as shown in Figure 5a. VOIs that are highly involved in AD appear in red and are denoted W<sup>h</sup>. VOIs that are less involved in AD appear in blue and are denoted W<sup>l</sup>. Connectivities between highly and slightly influenced VOIs, denoted W<sup>m</sup>, appear in green.

Since W<sup>r</sup> is symmetric, only the upper triangular part is taken into consideration, as in Figure 5b. The second-level feature W<sup>r</sup> is thus divided into three sets, and after converting them to vectors, the second-level features of the nth sample are represented as w<sup>h</sup><sub>n</sub>, w<sup>m</sup><sub>n</sub>, and w<sup>l</sup><sub>n</sub>, respectively. For w<sup>h</sup><sub>n</sub> and w<sup>l</sup><sub>n</sub> (red and blue parts in Figure 5b), the dimension is 1653 (58 × (58 − 1)/2), and for w<sup>m</sup><sub>n</sub> (green part), it is 3364 (58 × 58). Compared to the full 6670 dimensions (red, blue, and green parts together), each subset reduces the dimension by about 50–75%.
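The block split can be sketched with NumPy as follows; `split_similarity_matrix` is an illustrative helper, with `order` the VOI ranking from the previous step and h = 58 highly involved VOIs:

```python
import numpy as np

def split_similarity_matrix(w_r, order, h=58):
    """Split the reordered W^r into the w^h, w^m, w^l feature vectors.

    w_r: (116, 116) similarity matrix; order: VOI indices sorted from
    most to least AD-involved; h: number of highly involved VOIs.
    """
    w = w_r[np.ix_(order, order)]             # reorder rows and columns
    p = w.shape[0]
    w_h = w[:h, :h][np.triu_indices(h, k=1)]  # red block: 58*57/2 = 1653 values
    w_l = w[h:, h:][np.triu_indices(p - h, k=1)]  # blue block: 1653 values
    w_m = w[:h, h:].ravel()                   # green block: 58*58 = 3364 values
    return w_h, w_m, w_l
```

The three vector sizes sum to 6670, the full upper triangle of W<sup>r</sup>.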

#### 2.3.2.3 Third-level feature

The third-level feature is extracted from a graph, which represents the overall connectivity between a VOI and the others. A graph G = (V, E) is defined by a finite set V of vertices (VOIs) and a finite set E ⊆ V × V of edges, where the edge between the ith and the jth VOIs carries the similarity coefficient w<sub>ij</sub>.

Alzheimer's Disease Computer-Aided Diagnosis on Positron Emission Tomography Brain Images… DOI: http://dx.doi.org/10.5772/intechopen.86114

After constructing a graph for a subject, several graph measures can be computed [30]. The third-level feature is represented by two graph measures: strength and clustering coefficient.

Strength: the sum of a vertex's neighboring link weights [30]:

$$s\_i = \sum\_{j=1}^{p} w\_{ij} \tag{8}$$

where s<sub>i</sub> is the strength of the ith vertex (VOI) and p is the number of VOIs (116 here).

Clustering coefficient: the geometric mean of all triangles associated with each vertex [30]:

$$\mathbf{c} = \frac{\text{diag}\left[\left(W\_r^{\frac{1}{3}}\right)^3\right]}{\mathbf{d}(\mathbf{d}-1)}\tag{9}$$

where diag(·) is an operator that takes the diagonal values of a matrix, the cube root is applied element-wise, **c** is the clustering coefficient vector, and **d** is the degree vector whose element d<sub>i</sub> is

$$d\_i = \sum\_{j=1}^{p} a\_{ij} \tag{10}$$

where a<sub>ij</sub> is the connection status between the ith and the jth vertices: a<sub>ij</sub> = 0 when w<sub>ij</sub> = 0; otherwise, a<sub>ij</sub> = 1.
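Eqs. (8)–(10) can be sketched together in NumPy; `graph_measures` is an illustrative helper, and the cube root in Eq. (9) is taken element-wise:

```python
import numpy as np

def graph_measures(w):
    """Strength, degree, and weighted clustering coefficient (Eqs. 8-10).

    w: (p, p) symmetric similarity matrix with zero diagonal.
    """
    strength = w.sum(axis=1)                      # Eq. (8): s_i = sum_j w_ij
    a = (w > 0).astype(float)                     # binary adjacency a_ij
    degree = a.sum(axis=1)                        # Eq. (10): d_i = sum_j a_ij
    cube = np.linalg.matrix_power(np.cbrt(w), 3)  # (W^(1/3))^3, element-wise root
    denom = degree * (degree - 1)
    clustering = np.divide(np.diag(cube), denom,  # Eq. (9), guarding d_i <= 1
                           out=np.zeros_like(denom), where=denom > 0)
    return strength, degree, clustering
```

For a fully connected triangle with unit weights, every vertex has strength 2, degree 2, and clustering coefficient 1, as expected.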

These features exhibit different ranges of values; thus, z-score normalization is applied prior to classification:

$$z\_{mn} = \frac{f\_{mn} - \mu\_m}{\delta\_m} \tag{11}$$

where f<sub>mn</sub> is the value of the mth feature of the nth sample, and μ<sub>m</sub> and δ<sub>m</sub> are the mean and standard deviation of the mth feature, respectively. Most of the resulting z<sub>mn</sub> values lie within the range [−1, 1]; out-of-range values are clamped to either −1 or 1.
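A minimal sketch of Eq. (11) with the clamping step, assuming a NumPy layout where rows index features and columns index samples:

```python
import numpy as np

def zscore_clamp(f):
    """Eq. (11): z-score each feature across samples, then clamp
    out-of-range values to [-1, 1] as described in the text.

    f: (n_features, n_samples) array.
    """
    mu = f.mean(axis=1, keepdims=True)   # mu_m per feature
    sd = f.std(axis=1, keepdims=True)    # delta_m per feature
    z = (f - mu) / sd
    return np.clip(z, -1.0, 1.0)         # clamp outliers
```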

#### 2.4 Classification results

#### 2.4.1 Separation power factor approach

Classification was performed using a support vector machine (SVM) classifier. SVMs map pattern vectors to a high-dimensional feature space where a "best" separating hyperplane (the maximal margin hyperplane) is built. In the present work, we used a linear-kernel SVM classifier [29]. Given the reduced number of subjects (142 patients) in this first approach, a leave-one-out (LOO) strategy is the most suitable for classification validation. This technique iteratively holds out one subject for testing while training the classifier on the remaining subjects, so that each subject is left out once. The parameter C, used during the training phase, controls how strongly outliers are taken into account when computing the support vectors. A good way to estimate a suitable C value is cross-validation; for this database, it was fixed to C = 10.
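The validation loop can be sketched as follows, assuming scikit-learn's `SVC` and `LeaveOneOut`; this is an illustrative reimplementation, not the authors' code:

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.svm import SVC

def loo_accuracy(X, y, C=10.0):
    """Leave-one-out accuracy of a linear SVM (C = 10 as estimated
    by cross-validation in the text).

    X: (n_subjects, n_features) feature matrix; y: (n_subjects,) labels.
    """
    correct = 0
    for train_idx, test_idx in LeaveOneOut().split(X):
        clf = SVC(kernel="linear", C=C)
        clf.fit(X[train_idx], y[train_idx])          # train on all but one
        correct += int(clf.predict(X[test_idx])[0] == y[test_idx][0])
    return correct / len(y)
```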

Figure 6.

Average accuracy obtained with SVM classifier varying number of features for different VOI selection analyses and applying LOO cross-validation with estimation value C = 10.

The results achieved using the proposed method with SVM are shown in Figure 6. These results are compared to different feature selection methods: Fisher score [31], support vector machine-recursive feature elimination (SVM-RFE) [32], feature selection with random forest [33], ReliefF [34], and minimum redundancy maximum relevance (mRMR) [35]. Cross-validation average accuracy is shown as a function of the number of selected VOIs. By injecting 19, 20, or 21 VOIs with their combination of parameters into the SVM with "combination matrix" 1 (mono-parametric analysis), we obtain a classification rate of 95.07%. "Combination matrix" 2 achieved higher accuracy with a lower number of features (14 VOIs), yielding the best classification result of 96.47%. Therefore, the proposed feature selection from PET images is very effective, providing good discrimination between AD subjects and HC; the selected VOIs are illustrated on the brain image in Figure 7.

#### 2.4.2 Multilevel approach

The support vector machine (SVM) classifier was also applied in this second approach for discriminating AD or MCI from NC subjects. This approach takes into account three levels of features, making seven types of features in total, which are input to seven linear SVMs. The margin parameter C of all the SVMs is fixed to 1 for a fair comparison. The final decision is made through majority voting over the seven classifiers' outputs:

$$Y = \text{sgn}\left(\sum\_{t=1}^{7} y\_t\right) \tag{12}$$

where sgn(·) is the sign function and y<sub>t</sub> denotes the label output by SVM<sub>t</sub>.
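Eq. (12) reduces to one line of code, sketched here under the assumption that each SVM outputs a label in {−1, +1}; with seven (an odd number of) voters, no tie can occur:

```python
import numpy as np

def majority_vote(labels):
    """Eq. (12): fuse the seven SVM outputs (labels in {-1, +1})
    by taking the sign of their sum."""
    return int(np.sign(np.sum(labels)))
```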

Classification concerns AD vs. NC and MCI vs. NC. Evaluation was done using four different metrics: classification accuracy (ACC), sensitivity (SEN), specificity (SPE), and area under the curve (AUC). A tenfold cross-validation technique was used.

Figure 7.

15 VOIs' representation of the "combination matrix" 2 on a coronal plane (a), on a transverse plane (b), and on a sagittal plane (c).
