300 Fuzzy Inference System – Theory and Applications

These good properties are suitable for the prediction of non-stationary EEG signals. Therefore, ANFIS is used for time-series prediction in this study.

An effective feature extraction method can enhance the classification accuracy. An important component of most BCIs is to extract significant features from the event-related area during different MI tasks. A great many feature extraction methods have been proposed. Among them, band power and AAR parameters are commonly used [16–19]. Feature extraction based on band power is usually performed by computing the powers at the alpha and beta bands. The features are then extracted from the band powers by calculating their logarithm values [16] or averaging over them [17]. AAR parameters are another popular feature in mental tasks [18, 19]. The all-pole AAR model lends itself well to modeling EEG signals as filtered white noise with certain preferred energy bands. The EEG time series is fitted with an AAR model.

Furthermore, fractal geometry [20] provides a proper mathematical model to describe the complex and irregular shapes that exist in nature. The fractal dimension is a statistical quantity that effectively extracts fractal features. In the last decade, feature extraction characterized by the fractal dimension has been widely applied in various kinds of biomedical image and signal analyses, such as texture extraction [21], seizure onset detection in epilepsy [22], routine detection of dementia [23], and EEG analyses of sleeping newborns [24]. In this study, the discrete wavelet transform (DWT) together with a modified fractal dimension is utilized for feature extraction. That is, MFFVs are extracted from wavelet data by the modified fractal dimension. MFFVs contain not only multiple-scale attributes but also important fractal information.

The support vector machine (SVM) [25], which recognizes patterns of two categories from a set of data, is usually used for classification and regression analyses. For example, the SVM has been used to classify attention deficit hyperactivity disorder (ADHD) and bipolar mood disorder (BMD) patients, with an adaptive mutation proposed to improve performance [26], and for seizure detection in an animal model of chronic epilepsy [27]. Since it can balance accuracy and generalization simultaneously [25], it is used for classification in this study.

To evaluate the performance, several popular methods, including the AAR-parameter approach and AAR time-series prediction, are implemented for comparison. This chapter is organized as follows: Section 2 presents the materials and methods. Section 3 describes experimental results. The discussion and conclusion are given in Sections 4 and 5, respectively.

#### **2. Problem formulation**

An analysis system is proposed for MI EEG classification, as illustrated in Fig. 1. The procedure is performed in several steps, including data configuration, neuro-fuzzy prediction, feature extraction, and classification. In data configuration, the raw EEG data are first filtered to the frequency range containing the mu and beta rhythm components. ANFIS time-series predictors are trained offline with the training data. Information from the ANFIS time-series predictors is then directly applied to predict the test data. A modified fractal dimension combined with the DWT is utilized for feature extraction. The extracted fractal features are used to train the parameters of the SVM classifier offline. Finally, the SVM together with the trained parameters is utilized to discriminate the features.

Fig. 1. Flowchart of proposed system.

#### **3. Experimentation**

The EEG data were recorded by the Graz BCI group [19, 28–32]. Two data sets are used to evaluate the performance of all methods in the experiments. The first data set was recorded from three subjects during a feedback experimental recording procedure. The task was to control a bar by means of imagining left or right hand movements [19, 28, 30, 31]. The order of left and right cues was random. The first subject, S1, performed 280 trials, while the other two subjects, S2 and S3, each performed 320 trials. The length of each trial was 8–9s. The first 2s of each trial were quiet; an acoustic stimulus indicated the beginning of the trial at *t* = 2s, and a fixation cross '+' was displayed for 1s. Then, at *t* = 3s, an arrow (left or right) was displayed as a cue (the data recorded between 3 and 8s are considered event related). At the same time, each subject was asked to move a bar by imagining left or right hand movements according to the direction of the cue. The recordings were made with a g.tec amplifier and Ag/AgCl electrodes. All signals were sampled at 128 Hz and filtered between 0.5 and 30 Hz. An example of a trial for the C3 and C4 channels is given in Fig. 2(a).

The second data set was recorded from three subjects using a 64-channel Neuroscan EEG amplifier [29, 32]. The left and right mastoids served as the reference and ground, respectively. The EEG data were sampled at 250 Hz and filtered between 1 and 50 Hz. The subjects were asked to perform imagery movements prompted by a visual cue. Each trial started with an empty black screen; at *t* = 2s a short beep tone was presented and a cross '+' appeared on the screen to notify the subjects. Then at *t* = 3s an arrow lasting for 1.25s pointed to either the left or right direction. Each direction instructed the subjects to imagine either a left or right hand movement. The imagery movements were performed until the cross disappeared at *t* = 7s. No feedback was given in the experiments. The data set recorded from subject S4 contains 180 trials, while the data sets for subjects S5 and S6 contain 120 trials each. For each subject, the first half of the trials was used as training data and the latter half as test data in this study.

Neuro-Fuzzy Prediction for Brain-Computer Interface Applications 303

Fig. 2. Intermediate results. (a) An example of a trial. (b) Actual and predicted signals (C3). (c) Actual and predicted signals (C4). (Actual filtered signals: red; predicted signals: blue.)

#### **4. Methodologies**

#### **4.1 Data configuration**

The mu and beta rhythms of the EEG are the components with frequencies distributed between 8 and 30 Hz and located over the sensorimotor cortex. In addition, using a wider frequency range from the acquired EEG signals can generally achieve higher classification accuracy than a narrower one [33]. A wide frequency range containing all mu and beta rhythm components is therefore adopted to include all the important signal spectra for MI classification. In this study, the raw EEG data are filtered to the frequency range between 8 and 30 Hz with a Butterworth band-pass filter.
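As a concrete illustration, the band-pass step can be sketched with SciPy. The filter order of 4 and the use of zero-phase filtering are assumptions here; the chapter only specifies a Butterworth band-pass between 8 and 30 Hz.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass_mu_beta(eeg, fs, low=8.0, high=30.0, order=4):
    """Band-pass an EEG channel to the 8-30 Hz mu/beta range.

    order=4 and zero-phase filtering are assumed; the chapter only
    specifies a Butterworth band-pass between 8 and 30 Hz.
    """
    nyq = 0.5 * fs
    b, a = butter(order, [low / nyq, high / nyq], btype="band")
    return filtfilt(b, a, eeg)  # forward-backward: zero phase distortion

# Toy check: a 2 Hz drift plus a 15 Hz mu-band oscillation at fs = 128 Hz.
fs = 128
t = np.arange(0, 4, 1 / fs)
x = np.sin(2 * np.pi * 2 * t) + 0.5 * np.sin(2 * np.pi * 15 * t)
y = bandpass_mu_beta(x, fs)
```

After filtering, the 15 Hz mu-band component survives essentially unchanged while the 2 Hz drift is strongly attenuated.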

To make a prediction at sample *t*, the measured signals extracted from the recorded EEG time series are used from samples *t−Ld* to *t−d*. The parameters *L* and *d* are the embedding dimension and time delay, respectively. Each training input for ANFIS prediction consists of the respective measured signals of length *L* on both the *C3* and *C4* channels, which are important for BCI work because they are located over the sensorimotor cortex [34]. The training input data are represented as follows:

$$\left[\left(C3_{t-Ld},\dots,C3_{t-d},C4_{t-Ld},\dots,C4_{t-d}\right)^{t}\ \middle|\ \left(C3_{t},C4_{t}\right)^{t}\right] \tag{1}$$

There are event-related data of approximately 5s length in each trial. All parameter selection is performed on the training data. All training data are used to train the parameters of the prediction models, which are further used for feature extraction. The test data are finally used to evaluate the performance of the system with the trained parameters.
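The construction of training pairs in the spirit of Eq. (1) can be sketched as follows; the helper name and the toy signals are illustrative only.

```python
import numpy as np

def make_training_pairs(c3, c4, L, d):
    """Build ANFIS training pairs per Eq. (1): the input at sample t is the
    L past samples at delays Ld, ..., d on both C3 and C4, and the target
    is the current pair (C3_t, C4_t)."""
    X, targets = [], []
    for t in range(L * d, len(c3)):
        past_c3 = [c3[t - (L - j) * d] for j in range(L)]  # C3_{t-Ld}, ..., C3_{t-d}
        past_c4 = [c4[t - (L - j) * d] for j in range(L)]  # C4_{t-Ld}, ..., C4_{t-d}
        X.append(past_c3 + past_c4)
        targets.append((c3[t], c4[t]))
    return np.array(X), np.array(targets)

# Toy signals: C3 counts 0..9, C4 is offset by 100; L = 3, d = 1.
c3 = np.arange(10.0)
c4 = np.arange(10.0) + 100.0
X, y = make_training_pairs(c3, c4, L=3, d=1)
```

Each row of `X` stacks the three most recent delayed samples of C3 and C4; the corresponding row of `y` holds the current (C3, C4) pair to be predicted.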

#### **4.2 Neuro-fuzzy prediction**


Time-series prediction is the use of a model to forecast future events based on known past events. Although many kinds of time-series prediction methods have been presented, ANFIS time-series prediction is slightly modified and adopted in this study, since it integrates the advantages of neural networks and fuzzy systems.

The ANFIS network architecture applied for the time-series prediction of EEG data is introduced here. A detailed description of ANFIS can be found in [15]. ANFIS enhances fuzzy parameter tuning with a self-learning capability for achieving optimal prediction objectives. An ANFIS network is a multilayer feed-forward network in which each node performs a particular node function on incoming signals and is characterized by a set of parameters pertaining to that node. To reflect different adaptive capabilities, both square and circle node symbols are used. A square node (adaptive node) has parameters that need to be trained, while a circle node (fixed node) has none. The parameter set of the ANFIS network consists of the union of the parameter sets associated with each adaptive node. To achieve a desired input-output mapping, these parameters are updated according to given training data and a recursive least squares (RLS) estimate.

In this study, the ANFIS network applied for time-series prediction contains *L* inputs and one output. There are *2L* fuzzy if-then rules of Takagi and Sugeno's type [35] in the rule base. The output is the current sample, and the inputs are the past *L* samples with time delay *d*. The output of the *i*th node in the *l*th layer is denoted by $O_i^l$. The node function for each layer is described as follows.

*Layer 1*: Each node in this layer is a square node, where the degree of membership functions of input data is calculated. The output of each node in this layer is represented as

$$O_{i}^{1}=\mu_{M_{jk}}\left(C_{t-(L-j+1)d}\right),\quad j=1,2,\dots,L;\ k=1,2;\ i=1,2,\dots,2L \tag{2}$$


where $i = 2(j-1)+k$, *C* (representing *C3* or *C4*) is the input to node *i*, and $M_{jk}$ is the linguistic label associated with this node function. The bell-shaped Gaussian membership function $\mu_{M_{jk}}(C)$ is used:

$$\mu_{M_{jk}}\left(C_{t-(L-j+1)d}\right) = \exp\left(-\left(\frac{C_{t-(L-j+1)d} - a_{jk}}{\sigma_{jk}}\right)^{2}\right) \tag{3}$$

where the parameter set $\{a_{jk}, \sigma_{jk}\}$ adjusts the shape of the Gaussian membership function. The parameters in this layer are referred to as premise parameters.

*Layer 2*: Each node in this layer is a circle node labeled Π, which multiplies the incoming signals together and sends out their product.

$$O_{i}^{2} = w_{i} = \prod_{j=1}^{L} \mu_{M_{ji}}\left(C_{t-(L-j+1)d}\right), \quad i = 1,2 \tag{4}$$

Each node output represents the firing strength of a rule.

*Layer 3*: Each node in this layer is a circle node labeled *N*. The firing strength of a rule for each node in this layer is normalized.

$$O\_i^3 = \overline{w}\_i = \frac{w\_i}{\sum\_j w\_j}, \quad i = 1, 2 \tag{5}$$

*Layer 4*: Each node in this layer is a square node with its node function represented as

$$O_{i}^{4} = \overline{w}_{i} f_{i} = \overline{w}_{i}\left(\sum_{j=1}^{L} p_{ij} x_{j} + r_{i}\right), \quad i = 1,2 \tag{6}$$

where the output $f_i$ is a linear combination with the parameter set $\{p_{ij}, r_{i}\}$. The parameters in this layer are referred to as consequent parameters.

*Layer 5*: The single node in this layer is a circle node labeled Σ, which computes the overall output *y* as the sum of all incoming signals.

$$O_{1}^{5} = y = \sum_{i} \overline{w}_{i} f_{i} = \frac{\sum_{i} w_{i} f_{i}}{\sum_{i} w_{i}} \tag{7}$$

The architecture of the neuro-fuzzy prediction in this chapter is shown in Fig. 3. For ANFIS network learning, the consequent parameters are updated by the RLS learning procedure in the forward pass, while the antecedent parameters are adjusted by using the error between the predicted and actual signals. The parameter optimization for ANFIS training thus adopts a hybrid approach mixing least squares and back-propagation. Two ANFISs, labeled lANFIS and rANFIS, are used to predict the left and right training MI EEG data, respectively. The actual filtered signals and their predicted results for the C3 and C4 channels are shown in Fig. 2(b) and 2(c), respectively.
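The five-layer forward computation of Eqs. (2)–(7) for the two-rule case can be sketched in NumPy. This is a minimal forward pass only, with randomly chosen parameters; the hybrid RLS/back-propagation training described above is omitted.

```python
import numpy as np

def anfis_forward(x, a, sigma, p, r):
    """One forward pass through the five ANFIS layers of Eqs. (2)-(7)
    for two rules (i = 1, 2) over an input vector x of length L.

    a, sigma : (L, 2) premise parameters of the Gaussian MFs, Eq. (3)
    p        : (2, L) consequent weights; r : (2,) biases, Eq. (6)
    """
    # Layer 1: membership degree of each input under each rule's MF.
    mu = np.exp(-(((x[:, None] - a) / sigma) ** 2))   # shape (L, 2)
    # Layer 2: firing strength of each rule (product over inputs), Eq. (4).
    w = mu.prod(axis=0)                               # shape (2,)
    # Layer 3: normalized firing strengths, Eq. (5).
    w_bar = w / w.sum()
    # Layer 4: linear rule outputs weighted by normalized strengths, Eq. (6).
    f = p @ x + r                                     # shape (2,)
    # Layer 5: overall output as the sum of incoming signals, Eq. (7).
    return float((w_bar * f).sum()), w_bar

rng = np.random.default_rng(0)
L = 4
x = rng.normal(size=L)
a, sigma = rng.normal(size=(L, 2)), np.ones((L, 2))
p, r = rng.normal(size=(2, L)), rng.normal(size=2)
y, w_bar = anfis_forward(x, a, sigma, p, r)
```

Because Layer 3 normalizes the firing strengths, the output is always a convex combination of the two rule outputs.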

Fig. 3. Architecture of neuro-fuzzy prediction.

#### **4.3 Feature extraction**


After lANFIS and rANFIS are trained with the left and right MI training data, respectively, trial by trial, they are used to perform one-step-ahead prediction. The test data are then input to these two ANFISs sample by sample, and features are extracted by continually calculating the difference of the MFFVs between the predicted and actual signals once the length of the predicted signals reaches a 1-s window. The MFFV is outlined in the next paragraph. In this study, feature extraction is performed on the 1-s window of predicted signals instead of directly classifying the native predicted signals. A flowchart of feature extraction is shown in Fig. 4.

Fig. 4. Flowchart of feature extraction.

A signal is decomposed into numerous details in multiresolution analysis, where each scale represents a class of distinct physical characteristics within the signal. The wavelet transform is used to achieve a multiresolution representation in this study [21, 33, 36–39]. The 1-s segment is decomposed into numerous non-overlapping subbands by the wavelet transform. Fractal geometry provides a proper mathematical model to describe a complex shape that exists in nature with fractal features. Since the fractal dimension is relatively insensitive to signal scaling and shows a strong correlation with human judgment of surface roughness [20], it is chosen as the feature extraction method. A variety of approaches have been proposed to estimate the fractal dimension from signals or images [21–24]. A differential box counting (DBC) method covering a wide dynamic range with low computational complexity is modified and used in this study [33]. An MFFV is extracted by the modified fractal dimension from all the non-overlapping subbands of a 1-s segment.
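The exact modified DBC of [33] is not reproduced here; the following sketch substitutes a recursive Haar split for the DWT and a generic box-counting estimate for the fractal dimension, purely to illustrate how an MFFV (one estimate per subband) can be assembled.

```python
import numpy as np

def haar_subbands(x, levels=3):
    """Split a segment into non-overlapping subbands by recursively
    applying a single-level Haar decomposition to the approximation."""
    bands = []
    approx = np.asarray(x, dtype=float)
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        bands.append((even - odd) / np.sqrt(2))   # detail subband
        approx = (even + odd) / np.sqrt(2)        # coarser approximation
    bands.append(approx)
    return bands

def box_counting_dimension(x, scales=(1, 2, 4, 8)):
    """Stand-in fractal-dimension estimate of a 1-D signal: count the
    value-boxes needed to cover the curve at several scales and fit
    the log-log slope."""
    x = np.asarray(x, dtype=float)
    dx = (x.max() - x.min() + 1e-12) / len(x)     # vertical box unit
    counts = []
    for s in scales:
        n = 0
        for k in range(0, len(x) - 1, s):
            seg = x[k:k + s + 1]
            n += max(1, int(np.ceil((seg.max() - seg.min()) / (s * dx))))
        counts.append(n)
    slope = np.polyfit(np.log(scales), np.log(counts), 1)[0]
    return -slope

def mffv(segment, levels=3):
    """Multiscale fractal feature vector: one estimate per subband."""
    return np.array([box_counting_dimension(b)
                     for b in haar_subbands(segment, levels)])
```

A straight line yields a dimension near 1, while a noisy (rougher) signal yields a larger value, which is the property the MFFV exploits.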

The MFFV reflects the roughness and complexity of the non-overlapping subbands of a signal, and its calculation reduces a 1-s window to a single feature vector for each signal. Once the length of the predicted signals reaches a 1-s window, two sets of MFFV features are extracted from the predicted and actual signals, respectively, and subtracted for each respective subband; features are then obtained by continually calculating this difference. The left and right test data are input to both the lANFIS and rANFIS, and each ANFIS provides two predictions, from the *C3* and *C4* channels. Accordingly, four sets of MFFVs can be extracted after each new set of predictions is obtained. Each time a new set of predictions is produced, the oldest sample is removed from the 1-s segment and a new MFFV is extracted from the signals within the window. Since a large window is too redundant for real-time application, a short 1-s window is selected for feature extraction; the 1-s segment length is a compromise between the computation cost and event-related potential (ERP) component applications. If the window length is selected properly, the extracted MFFVs will produce the maximum feature separability and obtain the highest classification accuracy.
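The sliding-window feature stream described above can be sketched as follows. `toy_mffv` is a deliberately simplified stand-in (log-energy of the window's two halves) for the real fractal feature vector, and the 128-sample window length follows the first data set's 128 Hz sampling rate.

```python
from collections import deque
import math

def toy_mffv(window):
    """Stand-in for the modified fractal feature vector: log-energy of the
    window's two halves (the real system uses fractal features of DWT
    subbands)."""
    h = len(window) // 2
    return [math.log(sum(v * v for v in window[:h]) + 1e-9),
            math.log(sum(v * v for v in window[h:]) + 1e-9)]

def feature_stream(actual, predicted, fs=128):
    """Slide a 1-s window over paired actual/predicted samples; once the
    window is full, emit the per-subband MFFV difference after every new
    prediction (oldest sample dropped, newest appended)."""
    win_a, win_p = deque(maxlen=fs), deque(maxlen=fs)
    for xa, xp in zip(actual, predicted):
        win_a.append(xa)
        win_p.append(xp)
        if len(win_a) == fs:
            fa, fp = toy_mffv(list(win_a)), toy_mffv(list(win_p))
            yield [p - a for p, a in zip(fp, fa)]

actual = [math.sin(0.1 * n) for n in range(200)]
predicted = [0.9 * v for v in actual]   # imitation of a good prediction
feats = list(feature_stream(actual, predicted))
```

With 200 samples and a 128-sample window, 73 feature vectors are emitted; because the prediction is a scaled copy, every log-energy difference is close to ln(0.81).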

#### **4.4 Classification**

It can be difficult to establish stable NNs, since the appropriate numbers of hidden layers and neurons usually need to be carefully chosen to approximate the function in question to the desired accuracy. The SVM, first proposed by Vapnik [25], not only has a very solid foundation in statistical learning theory but also guarantees the optimal decision function from a set of training data. The main idea of the SVM is to construct a hyperplane as the decision surface in such a way that the margin of separation between positive and negative examples is maximized. The SVM optimization problem is

$$\min_{w,b,\xi}\ \frac{1}{2} w^{T} w + C \sum_{i=1}^{N} \xi_{i} \tag{8}$$

$$\text{subject to } \xi_{i} \ge 0 \text{ and } d_{i}\left(w^{T} x_{i} + b\right) \ge 1 - \xi_{i}, \quad i = 1,2,\dots,N$$

where $g(x) = w^{T} x + b$ represents the hyperplane, $w$ is the weighting vector, $b$ is the bias term, $x_i$ is a training vector with label $d_i$, $C$ is the weighting constant, and $\xi_i$ is the slack variable. The problem is then transformed into a convex quadratic dual problem. The discriminant function with the optimal $w_o$ and $b_o$, $g(x) = w_{o}^{T} x + b_{o}$, posterior to the optimization becomes

$$g(x) = \sum_{i=1}^{N} \alpha_{i} d_{i} K\left(x, x_{i}\right) + b_{o} \tag{9}$$

where $\alpha_i$ is a Lagrange multiplier and $K(x, x_i)$ is a kernel function. Generally, appropriate kernel functions are the polynomial kernel function $K(x_i, x_j) = \left(x_{i}^{T} x_{j} + 1\right)^{p}$ and the radial basis function (RBF) kernel function $K(x_i, x_j) = \exp\left(-\frac{1}{2\sigma^{2}}\left\|x_i - x_j\right\|^{2}\right)$. In this study, the latter is chosen for the SVM.
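The two kernels and the discriminant of Eq. (9) can be written out directly; the kernel parameters `p` and `sigma` below are arbitrary illustrative choices, and the tiny support set is synthetic.

```python
import numpy as np

def poly_kernel(xi, xj, p=3):
    """Polynomial kernel K(x_i, x_j) = (x_i . x_j + 1)^p."""
    return float((np.dot(xi, xj) + 1.0) ** p)

def rbf_kernel(xi, xj, sigma=1.0):
    """RBF kernel K(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    d = np.asarray(xi, dtype=float) - np.asarray(xj, dtype=float)
    return float(np.exp(-np.dot(d, d) / (2.0 * sigma ** 2)))

def decision(x, support, labels, alphas, b, kernel=rbf_kernel):
    """Discriminant of Eq. (9): g(x) = sum_i alpha_i d_i K(x, x_i) + b."""
    return sum(a * d * kernel(x, xi)
               for a, d, xi in zip(alphas, labels, support)) + b

# Two support vectors of opposite class, equal weight, zero bias.
support = [np.array([1.0]), np.array([-1.0])]
g_pos = decision(np.array([1.0]), support, [1, -1], [1.0, 1.0], 0.0)
g_neg = decision(np.array([-1.0]), support, [1, -1], [1.0, 1.0], 0.0)
```

The sign of `g(x)` gives the predicted class: positive near the positive support vector and negative near the negative one.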

In the proposed system, classification is performed on the MFFVs to recognize the corresponding state at the sample rate. A separate SVM classifier is produced at each sample point to classify the corresponding set of MFFVs for the training data. The sample point possessing the maximal classification rate for the training data is used as the standard classifier, which is then used for all classification performed on the test data. The best parameters selected from the training data are applied to the test data to estimate the classification accuracy of the test data.

#### **5. Results**

306 Fuzzy Inference System – Theory and Applications

signal scaling and shows a strong correlation with human judgment of surface roughness [20], it is chosen as the feature extraction method. A variety of approaches were proposed to estimate fractal dimension from signals or images [21–24]. A differential box counting (DBC) method covering a wide dynamic range with a low computational complexity is modified and used in this study [33]. A MFFV is extracted by modified fractal dimension from all the

The MFFV reflects the roughness and complexity of the non-overlapping subbands of a signal; here, MFFVs are extracted from the non-overlapping subbands of a 1-s segment. These MFFV calculations reduce each 1-s window to a single feature vector per signal, which lowers the prediction cost. Features are extracted by continually calculating the difference between the MFFVs of the predicted and actual signals once the length of the predicted signals reaches the 1-s window. In other words, two sets of MFFVs are first extracted, one from the predicted and one from the actual signals; they are then subtracted subband by subband, and the features are obtained by continually computing these differences. The left and right test data are input to both the lANFIS and rANFIS, and each ANFIS provides two predictions, one for the *C3* and one for the *C4* channel. Accordingly, four sets of MFFVs can be extracted after each new set of predictions is obtained. Each time a new set of predictions is produced, the oldest sample is removed from the 1-s segment and a new MFFV is extracted from the signals within the window. A short 1-s window is selected for feature extraction because a larger window is too redundant for real-time applications; the 1-s length is a compromise between the computation cost and the duration of event-related potential (ERP) components. If the window length is selected properly, the extracted MFFVs will produce the maximum feature separability and obtain the highest classification accuracy.

**4.4 Classification**

It can be difficult to establish stable NNs, since the appropriate numbers of hidden layers and neurons usually need to be chosen carefully to approximate the function in question to the desired accuracy. The SVM, first proposed by Vapnik [25], not only rests on a solid foundation in statistical learning theory but is also guaranteed to obtain the optimal decision function from a set of training data. The main idea of the SVM is to construct a hyperplane as the decision surface in such a way that the margin of separation between positive and negative examples is maximized. The SVM optimization problem is

$$
\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}}\ \frac{1}{2}\mathbf{w}^{T}\mathbf{w} + C\sum_{i=1}^{N}\xi_{i}
\quad\text{subject to}\quad
d_{i}\left(\mathbf{w}^{T}\mathbf{x}_{i}+b\right) \ge 1-\xi_{i},\ \ \xi_{i}\ge 0,\ \ i=1,2,\ldots,N
\tag{8}
$$

where $g(\mathbf{x})=\mathbf{w}^{T}\mathbf{x}+b$ represents the hyperplane, $\mathbf{w}$ is the weighting vector, $b$ is the bias term, $\mathbf{x}_{i}$ is the training vector with label $d_{i}$, $C$ is the weighting constant, and $\xi_{i}$ is the slack variable. The problem is then transformed into a convex quadratic dual problem. Posterior to the optimization, the discriminant function with the optimal $\mathbf{w}_{o}$ and $b_{o}$ becomes $g(\mathbf{x})=\mathbf{w}_{o}^{T}\mathbf{x}+b_{o}$.
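The soft-margin problem of Eq. (8) can be made concrete with a small sketch. The version below minimizes the equivalent unconstrained hinge-loss form, $\frac{1}{2}\mathbf{w}^{T}\mathbf{w} + C\sum_i \max(0, 1-d_i(\mathbf{w}^{T}\mathbf{x}_i+b))$, by stochastic subgradient descent on toy 2-D data; a production SVM solves the convex quadratic dual instead, so this is only a didactic approximation, and the function names, learning rate, and epoch count are assumptions.

```python
import random

def train_linear_svm(xs, ds, C=1.0, lr=0.01, epochs=200, seed=0):
    """Minimize (1/2)w.w + C * sum(hinge losses) -- the unconstrained form of
    the soft-margin primal in Eq. (8) -- by stochastic subgradient descent."""
    rng = random.Random(seed)
    dim, n = len(xs[0]), len(xs)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        order = list(range(n))
        rng.shuffle(order)
        for i in order:
            margin = ds[i] * (sum(wj * xj for wj, xj in zip(w, xs[i])) + b)
            if margin < 1.0:   # margin violated: slack xi_i would be positive
                w = [wj - lr * (wj - C * ds[i] * xj) for wj, xj in zip(w, xs[i])]
                b += lr * C * ds[i]
            else:              # outside the margin: only the regularizer acts
                w = [wj - lr * wj for wj in w]
    return w, b

def g(w, b, x):
    """Discriminant function g(x) = w.x + b; its sign is the predicted class."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b
```

With well-separated training points the learned `sign(g(x))` recovers the labels; in the chapter's setting, `xs` would be the MFFV feature vectors and `ds` the left/right MI class labels.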

**5.1 Performance of prediction methods**

To assess the performance of the proposed time-series prediction method, several prediction methods combined with power spectra features are implemented for comparison: the AAR-parameter approach and AAR time-series prediction. The power spectra features are obtained by calculating the powers at the alpha and beta bands [16, 17]. The AAR-parameter method is an AAR signal-modeling approach. The all-pole AAR model lends itself well to modeling the EEG as filtered white noise with certain preferred energy bands, and the EEG time series is fitted with an AAR model. In the experiments, the order of the AAR model is chosen as six and the AAR parameters are estimated with the RLS algorithm; to select the best value for the order of the AAR model, an information-theoretic approach is adopted [3]. The AAR parameters are used as features at each sample point for each trial. The AAR time-series prediction method is a time-series prediction approach in which the left and right ANFISs of the ANFIS time-series prediction method are replaced by left and right AAR models. The window lengths for the AAR-parameter approach and AAR time-series prediction are all 1-s windows, the same as that for the ANFIS time-series prediction.

The comparison of classification accuracy among the different time-series prediction methods using power spectra features is listed in Table 1. The average classification accuracy of the AAR-parameter approach is 67.0%, while that of AAR time-series prediction is 77.7%. ANFIS time-series prediction obtains the best average classification accuracy (82.8%).

Table 1. Comparison of performance among different time-series prediction frameworks using power spectra features

As discussed above, the good properties of the ANFIS are suitable for the prediction of non-stationary EEG signals. Table 1 lists the comparisons of performance among the different prediction frameworks using power spectra features. In addition, two-way ANOVA and multiple-comparison tests are performed to verify whether the prediction methods are significantly different. The results indicate that the AAR time-series prediction method is much better than the AAR-parameter approach in classification accuracy (*p*-value 0.0007), with an average improvement of 10.7%, while the ANFIS time-series prediction method is slightly better than the AAR prediction method (*p*-value 0.0195), increasing classification accuracy by 5.1%. Accordingly, ANFIS time-series prediction has the best classification accuracy among the three methods, which suggests that it is the best prediction framework for MI classification.

**6.2 Statistical evaluation of features**

Wavelet-fractal features are extracted from the wavelet data by the modified fractal dimension. MFFVs describe the characteristics of the fractal features at different wavelet scales, which is greatly beneficial for the analysis of EEG data. The comparison of performance between power spectra and MFFV features under ANFIS time-series prediction is listed in Table 2. In addition, two-way ANOVA and multiple-comparison tests are performed again to validate whether the two features are significantly different. The results indicate that MFFV features are significantly better than power spectra features in classification accuracy (*p*-value 0.0030), with an average improvement of 8.2%. These two results also suggest that the ANFIS prediction framework together with MFFV features is a good combination for BCI applications.

**6.3 Advantage of proposed method**

The proposed ANFIS prediction framework combined with MFFV features shows good potential for EEG-based MI classification. Furthermore, the proposed method has other advantages. Firstly, the MFFV features genuinely improve the separability of the MI data, since the power spectra features extracted from the predicted signals result in poorer performance. Secondly, the MFFV features can effectively reduce the degradation caused by noise: they are extracted by the DWT and the modified fractal dimension, where the former obtains multiscale information of the EEG signals while the latter decreases the effect of noise, because the calculation of an improved DBC method is proposed and applied to the modified fractal dimension.

**7. Conclusion**

We have proposed a BCI system that embeds neuro-fuzzy prediction in feature extraction. The results demonstrate the potential of using neuro-fuzzy prediction together with a support vector machine for MI classification. They also show that the proposed system is robust for inter-subject use under careful parameter training, which is important for BCI applications. Compared with other well-known approaches, neuro-fuzzy prediction together with the SVM achieves better results in BCI applications. In future work, more effective prediction methods, more effective features, and more powerful classifiers will be used to further improve the classification results.
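For concreteness, the AAR time-series prediction baseline (an AR model whose coefficients are tracked sample by sample with RLS) can be sketched as below. The order-6 model and RLS estimation follow the description in Section 5.1, but the forgetting factor, the initialization of the covariance matrix, and the function name are illustrative assumptions rather than the chapter's exact settings.

```python
def rls_ar_predict(y, order=6, lam=0.99, delta=100.0):
    """Track AR(order) coefficients with recursive least squares (RLS) and
    return one-step-ahead predictions. lam is the forgetting factor and
    delta initializes the inverse-covariance matrix P = delta * I."""
    a = [0.0] * order                         # current AR coefficient estimates
    P = [[delta if i == j else 0.0 for j in range(order)] for i in range(order)]
    preds = [0.0] * len(y)
    for t in range(order, len(y)):
        phi = [y[t - 1 - j] for j in range(order)]          # regressor vector
        preds[t] = sum(ai * pi for ai, pi in zip(a, phi))   # predict before update
        Pphi = [sum(P[i][j] * phi[j] for j in range(order)) for i in range(order)]
        denom = lam + sum(phi[i] * Pphi[i] for i in range(order))
        k = [v / denom for v in Pphi]                       # RLS gain vector
        err = y[t] - preds[t]                               # one-step prediction error
        a = [ai + ki * err for ai, ki in zip(a, k)]
        P = [[(P[i][j] - k[i] * Pphi[j]) / lam for j in range(order)]
             for i in range(order)]
    return preds
```

In the baseline of Section 5.1, one such adaptive model per class (left and right) replaces the corresponding ANFIS predictor, and the same 1-s window of predictions is then passed to feature extraction.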
