*2.2.2 Frequency-domain based feature extraction*

SSVEP signals are identified by oscillations whose frequencies are synchronized with the stimulus frequency [6, 21]. For this reason, many SSVEP-based BCI systems use the frequency information embedded in the signal during feature extraction [22, 23]. Within the scope of this chapter, SSVEP frequency features were extracted from the frequency-domain representation of the SSVEP signal using the Fourier transform. The relevant and distinctive SSVEP features are derived from the spectral information of the SSVEP signal in each EEG rhythm: energy, variance, and spectral entropy.

These features describe how power, variance, and irregularity (entropy) vary in the relevant frequency bands. In practice, this means the features capture how the signal's power is distributed over those bands [24].

Features based on the power spectrum, the energy of each frequency band:

$$F\_1^{(f)} = Energy\_f = \sum\_{k=1}^{M} \left| y(k) \right|^2 \tag{1}$$

Here, y is the Fourier transform of the analytic signal of the real discrete-time EEG signal x, $F_1^{(f)} = Energy_f$ stands for the EEG feature computed from y, and M corresponds to the maximum frequency index.

Features based on the variance of each EEG frequency band:

*Evaluating Steady-State Visually Evoked Potentials-Based Brain-Computer Interface System… DOI: http://dx.doi.org/10.5772/intechopen.98335*

$$F\_2^{(f)} = Variance\_f = \frac{1}{M - 1} \sum\_{k=1}^{M} \left( y\_k - \overline{y} \right)^2 \tag{2}$$

"*y*" in the formula stands for the average of the "y" signal.

Feature based on the entropy of each EEG frequency band: spectral entropy measures the irregularity of the power spectrum of the EEG signal,

$$F\_3^{(f)} = Entropy\_f = -\frac{1}{\log(M)} \sum\_{k=1}^{M} P(y(k)) \log P(y(k)) \tag{3}$$

#### *2.2.3 Wavelet transform based feature extraction*

#### *2.2.3.1 Wavelet decomposition*

The SSVEP signal is non-stationary [18]. Consequently, the wavelet transform (WT) has been used to examine not only the spectral content of the signal but also its spectral behavior over time. The method is characterized by a smooth, fast-oscillating function that is well localized in both frequency and time [12]. The WT can be implemented as a specially designed pair of Finite Impulse Response (FIR) filters whose frequency responses separate the high-frequency and low-frequency components of the input signal. The split point is usually the midpoint between 0 Hz and half the sampling rate (the Nyquist frequency). In the Multi-Resolution Algorithm (MRA) of the WT, the low-pass (LP) and high-pass (HP) filter coefficients are derived from the same wavelet. The LP filter coefficients are associated with the scaling function, which determines the oscillatory frequency and the length of the wavelet, while the HP filter is associated with the wavelet function. The outputs of the LP filters are called the approximation *(a)* coefficients, and the outputs of the HP filters are called the detail *(d)* coefficients. In the MRA of the WT, any time series can be completely decomposed in terms of *a* and *d* coefficients up to a chosen decomposition level. Applying the DWT to the raw signal produces a multi-resolution representation of various statistical and non-statistical parameters across time and frequency [24]. Subsets of the wavelet coefficients from the decomposition tree were selected as input vectors for the classifier. The SSVEP signals were decomposed into 9 levels, *i = 1, 2, ..., 9*, for the 512 Hz sampling frequency.
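The nine-level decomposition described above can be sketched with PyWavelets (a stand-in assumption; the chapter's analysis used Matlab's Wavelet Toolbox). The signal here is a random placeholder for one SSVEP epoch:

```python
import numpy as np
import pywt

fs = 512                          # sampling frequency used in the chapter
x = np.random.randn(4 * fs)       # placeholder for one SSVEP epoch

# wavedec applies the LP/HP filter pair recursively, returning
# [a9, d9, d8, ..., d1] for a 9-level decomposition
coeffs = pywt.wavedec(x, wavelet='db2', level=9)
a9, details = coeffs[0], coeffs[1:]

# each detail level d_i covers roughly the band (fs/2^(i+1), fs/2^i) Hz
for i, d in enumerate(reversed(details), start=1):
    print(f"d{i}: {len(d)} coefficients, ~{fs / 2**(i + 1):.1f}-{fs / 2**i:.1f} Hz")
```

With fs = 512 Hz, level d1 spans roughly 128-256 Hz and d9 roughly 0.5-1 Hz, so the lower detail levels and a9 line up with the classical EEG rhythms.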

#### *2.2.3.2 Parameters for feature extraction*

Using different DWT mother wavelets (Haar, Db2, Sym4, Coif1, Bior3.5, Rbior2.8), the SSVEP signals were subdivided into the EEG frequency bands (delta, theta, alpha, beta, gamma), and the energy, entropy, and variance were calculated for each band [13, 14]. Each DWT frequency band is associated with one or two EEG rhythms; thus, a set of features representing the frequency bands was obtained.

The energy at each decomposition level was calculated using the following Equations [24]:

$$F\_1^{(w)} = Ed\_i = \sum\_{j=1}^{N} \left| d\_{ij} \right|^2, i = 1, 2, 3, \dots, l \tag{4}$$

$$F\_1^{(w)} = Ea\_i = \sum\_{j=1}^{N} \left| a\_{ij} \right|^2, i = 1, 2, 3, \dots, l \tag{5}$$

where $d_{ij}$ and $a_{ij}$ represent the detail and approximation coefficients, respectively, produced by the wavelet level corresponding to each EEG band (delta, theta, alpha, beta, gamma); $i = 1, 2, 3, \dots, l$ is the wavelet decomposition level from 1 to *l*; and N stands for the number of detail and approximation coefficients at each decomposition level.

Another feature, the entropy at each decomposition level is calculated using the following Equation [25]:

$$F\_2^{(w)} = Ent\_i = -\sum\_{j=1}^{N} d\_{ij}^{\,2} \log \left( d\_{ij}^{\,2} \right), i = 1, 2, 3, \dots, l \tag{6}$$

The variance at each decomposition level was calculated using the following Equation [24]:

$$F\_3^{(w)} = Var\_i = \frac{1}{N-1} \sum\_{j=1}^{N} \left( d\_{ij} - \mu\_i \right)^2, \mu\_i = \frac{1}{N} \sum\_{j=1}^{N} d\_{ij}, i = 1, 2, 3, \dots, l \tag{7}$$

The extracted features, in different combinations, are used as (*l* + 1)-dimensional input vectors. In other words, for an *l*-level decomposition, the feature vector for any parameter can be represented as Feature = [*xd*1, *xd*2, … , *xdl*, *xal*], where *x* stands for energy, entropy, or variance.
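Building the (*l* + 1)-dimensional feature vector from Eqs. (4)-(7) can be sketched as follows (PyWavelets is an assumption; the helper name is hypothetical). A small constant is added inside the logarithm to guard against log(0):

```python
import numpy as np
import pywt

def wavelet_features(x, wavelet='db2', level=9, stat='energy'):
    """Build the feature vector [xd1, ..., xdl, xal] for one statistic."""
    coeffs = pywt.wavedec(x, wavelet=wavelet, level=level)
    a_l, details = coeffs[0], coeffs[1:]           # details = [d_l, ..., d_1]
    bands = list(reversed(details)) + [a_l]        # order: d_1, ..., d_l, a_l
    feats = []
    for c in bands:
        if stat == 'energy':                       # Eqs. (4)-(5)
            feats.append(np.sum(np.abs(c) ** 2))
        elif stat == 'entropy':                    # Eq. (6)
            c2 = c ** 2 + 1e-12
            feats.append(-np.sum(c2 * np.log(c2)))
        else:                                      # variance, Eq. (7)
            feats.append(np.var(c, ddof=1))
    return np.array(feats)
```

For a 9-level decomposition, each call returns a 10-dimensional vector; concatenating the energy, entropy, and variance vectors gives the combined feature set used in the second part of the experiments.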

#### **2.3 Machine learning classification algorithms**

The most important use of machine learning (ML) methods is classification [26]. After feature extraction, classification is performed to recognize an SSVEP signal and convert it into a command, that is, to use it as output [27]. For the classification process, datasets formed by a certain number of feature vectors, each labeled with the class it belongs to, are passed through the training stage required by the classifier. As a result of this training, a decision algorithm is created, which is then used to assign an unknown signal to the appropriate class [28, 29].

The extracted feature vectors have been tested with seven well-known and commonly-used basic classifiers. These selected classifier algorithms are *Decision Trees, Discriminant Analysis, Logistic Regression, Naive Bayes, Support Vector Machines, k-Nearest Neighbors, and Ensemble Learner.* The classifier performances were examined to determine which combination of mother wavelet function, wavelet features, and classifier algorithm gives the highest accuracy.
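A minimal sketch of comparing the seven classifiers, using scikit-learn as a stand-in for Matlab's Classification Learner (an assumption; the synthetic data here merely imitates the shape of the extracted feature vectors):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier

# synthetic stand-in for the SSVEP feature vectors (10 features per epoch)
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

classifiers = {
    'Decision Tree': DecisionTreeClassifier(random_state=0),
    'Discriminant Analysis': LinearDiscriminantAnalysis(),
    'Logistic Regression': LogisticRegression(max_iter=1000),
    'Naive Bayes': GaussianNB(),
    'SVM': SVC(),
    'k-NN': KNeighborsClassifier(),
    'Ensemble': RandomForestClassifier(random_state=0),
}

# fit each classifier and record its test-set accuracy
scores = {name: clf.fit(X_tr, y_tr).score(X_te, y_te)
          for name, clf in classifiers.items()}
best = max(scores, key=scores.get)
```

In the chapter's setting, this comparison is repeated for every (mother wavelet, feature) combination to find the best-performing triple.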

#### **2.4 Evaluation of machine learning algorithms performance**

While training an ML algorithm to classify SSVEP signals is an important step, it is essential to consider how well the algorithm generalizes to unseen data (the test set) [30]. We need to know whether the algorithm works correctly and whether we can trust its predictions. A machine learning algorithm must not simply memorize the training set; it must make reasonable predictions on future examples it has not seen before. Thus, knowing and applying the techniques used to evaluate how well an ML model generalizes to new, unseen data is one of the essential steps for BCI systems [31, 32]. To this end, the k-fold cross-validation and confusion matrix evaluation criteria were used to assess the performance of the ML algorithms in this study.


#### *2.4.1 k-fold cross-validation*

In this method, the data set is randomly divided into k segments. Of these, k-1 parts are used for training and the remaining part for testing. This process is repeated until every part has been used for testing once. The test errors are recorded each time, and after the final fold the average error is reported. The performance of each classifier algorithm is measured using this approach [30, 31]. In this study, the data set was divided into five equal parts (k = 5).
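The five-fold procedure above can be sketched with scikit-learn (an assumption; the classifier and synthetic data are placeholders):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier

# synthetic stand-in for the labeled SSVEP feature vectors
X, y = make_classification(n_samples=100, n_features=10, random_state=0)

kf = KFold(n_splits=5, shuffle=True, random_state=0)   # k = 5, as in the study
errors = []
for train_idx, test_idx in kf.split(X):
    # train on k-1 folds, test on the held-out fold
    clf = KNeighborsClassifier().fit(X[train_idx], y[train_idx])
    errors.append(1.0 - clf.score(X[test_idx], y[test_idx]))

mean_error = np.mean(errors)   # average error reported after the final fold
```

Each fold's error is recorded and only the mean over all five folds is reported, which is exactly the procedure described above.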

#### *2.4.2 Confusion matrix*

The confusion matrix is first calculated to evaluate classifier performance. It is generated by comparing the responses of the classification algorithm on the test set with the actual labels in the data set. For two-class problems, it is a table of four values [26]: True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN).

The accuracy value (ACC) is calculated from these values as the measure of classifier performance [27]:

$$\text{ACC} = \frac{\text{TP} + \text{TN}}{\text{TP} + \text{FN} + \text{FP} + \text{TN}} \tag{8}$$
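Eq. (8) reduces to a one-line computation; the counts below are hypothetical, chosen only to illustrate the arithmetic:

```python
def accuracy_from_confusion(tp, tn, fp, fn):
    """Eq. (8): ACC = (TP + TN) / (TP + FN + FP + TN)."""
    return (tp + tn) / (tp + fn + fp + tn)

# hypothetical confusion-matrix counts for a two-class SSVEP test set
acc = accuracy_from_confusion(tp=45, tn=40, fp=10, fn=5)   # (45+40)/100 = 0.85
```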

#### **2.5 Experimental design and implementation details**

In accordance with the objective of our study, the experiments were designed in two parts for time-frequency domain features. First, we measured the accuracy of each (feature, mother wavelet function) pair. Second, we combined the set of three features with each mother wavelet function to discover which one yields the best accuracy. Three features (energy, variance, and entropy) were extracted for the EEG bands (delta, theta, alpha, beta, and gamma) using six mother wavelet families (Haar, db, sym, coif, bior, rbio). The algorithms were implemented using the Signal Processing Toolbox and Wavelet Toolbox in Matlab 2019a. All classifiers and performance analyses were implemented using the Classification Learner app in the same Matlab version.
