### 2. Concise description of technical/biological artifacts

#### 2.1. Technical artifacts

Technical artifacts such as power line interference, impedance fluctuation, and wire movement superimpose their energy on observed EEG signals because of faults in the recording conditions [18, 19]. They can be prevented in simple ways: detaching the charging AC adapter from the recording device, carefully attaching the electrodes to the scalp, and using appropriate electrode wires or adhesive tape to stabilize the wires, as shown in Figure 1. The cross mark in the figure indicates that the source of the technical artifact is removed from the recording conditions.

#### 2.2. Biological artifacts

temporal resolution on the order of milliseconds, the small installation space for operating systems, and its usability in noninvasive recording [4]. Although the spatial resolution and specificity are low because EEG observes volume conduction effects in the brain network [5], it has attracted attention as a viable and inexpensive modality for studying the kaleidoscopic functional states of the cerebral cortex: where, when, how, and under what conditions our brain functions come into being [6]. Therefore, adapting EEG systems to real environments remains a major challenge for neuroscientists and neuroengineers in the final stretch of system construction.

Using an extremely small number of electrodes (a single electrode being the extreme case) for signal acquisition should make applications in daily life more practical. Recently, specialized headband-type or headset-type devices, which have fewer electrodes than gold-standard devices with 16, 32, 64, or more channels, have been developed as compact, portable, and feasible EEG systems for use in real environments [7]. These devices are usually implemented with dry electrodes and wireless sensor network technology for recording. They can reduce the burden on the user caused by an oppressive feeling on the head, eliminate the discomfort of conductive gel or paste, and improve the degree of freedom of movement by doing away with wires plugged into an amplifier [8].

However, technical/biological artifacts such as power line interference, eyeblinks, and muscle activity, caused by recording mistakes, the good conductivity of the scalp, and so on, are often mixed with EEG signals whether the device is of the gold-standard or the specialized type. They disguise themselves as EEG components in observed EEG signals and cause a discrepancy between research motivation and system realization. Removing these mimetic components (artifacts), or extracting the intrinsic EEG components from observed EEG signals, will become an even more important process in all EEG systems for practical use, even if a single electrode is integrated with the data acquisition module in a specialized device.

Disclosing the meaning of electric signals generated by various neuronal populations (sources) comes down to the EEG inverse problem, that is, a blind source separation (BSS) problem [9]. It is well known that the enormous indeterminacies in the brain make the BSS problem ill-posed; however, the statistical properties of the signals help restore its well-posedness in biosignal processing. Owing to these properties, multivariate statistical approaches such as independent component analysis (ICA) can, in theory, separate observed EEG signals into spatially and temporally distinguishable components, and the estimated components are then identified as neuronal or artifactual sources by hard/soft thresholding in order to reconstruct an artifact-free EEG matrix [10, 11]. Whereas there are several reviews of artifact rejection methods covering the overall procedure (signal separation, component identification, and signal reconstruction) for multi-channel EEG signals [12–16], we have not seen a review of artifact rejection methods for single-channel EEG signals. In this chapter, we therefore describe algorithms for artifact rejection in multi-/single-channel EEG signals.


70 Electroencephalography

Biological artifacts, which are potentials discharged by internal organs, diffuse their energy over the head and reach each electrode attached to the surface of the scalp as part of the observed EEG signal. They contaminate observed signals due to the iron accumulation in the brain and the good conductivity of the scalp, and can be broadly separated into four categories: (i) muscular, (ii) cardiac, (iii) eye movement, and (iv) eyeblink. EEG devices capture the whole electric field that reaches an electrode, even if the potential contains information on electrophysiological activity other than neuronal activity (see Figure 2). Because all electrical potentials are treated equally and blindly, recording only EEG components from electrodes placed on the scalp is hardly achievable. Furthermore, the frequency characteristics of biological artifacts and neuronal oscillations can overlap. This means that avoiding biological artifacts is far more difficult than avoiding technical artifacts. If contaminated epochs are found by visual or quantitative analysis, the EEG system has to ignore them before deciding control commands. Otherwise, counterfeit EEG patterns will lead the operator of the system into a fatal mistake [12, 17].

Figure 1. Ways of precluding technical artifacts [17]. (A) Power line interference. (B) Impedance fluctuation. (C) Wire movement.

Figure 2. Configuration of an observed EEG signal including biological artifacts.

Alternatively, signal processing techniques can extract the EEG components from observed signals. Through this process, EEG systems can provide correct outputs for their interfaces. Many studies on the detection, classification, and removal of artifacts in observed EEG signals continue to be reported [20–22].

### 3. Review of existing methods on artifact rejection

In this section, the standard assumptions about the observed cerebral signal that allow components to be separated spatially and temporally are described first, to give a clear understanding of the statistical framework before the artifact rejection methods are introduced. Then, methods for multi-/single-channel artifact rejection (principal component analysis (PCA), independent component analysis (ICA), regression, filtering, ICA-based signal decomposition, and nonnegative matrix factorization) are presented. Each algorithm has its own approach to calculating the demixing matrix, identifying the separated components, and denoising the artifactual components to complete source separation. We focus on the advantages and disadvantages of each approach.

#### 3.1. In multi-channel signals

#### 3.1.1. Standard assumption of sources

The first thing that all artifact rejection methods have to do is calculate the demixing matrix W under the standard assumption about the sources, regardless of the target object. In EEG signal processing, the observed cerebral signal x(n) is considered to be the sum of the cerebral source (local-field) activity s(n) and the noise/artifact d(n). Neuronal cells are limited to short-range connections (less than 500 μm) [23]. In addition, synchrony in local-field activity diffuses through a contiguous cortical area rather than jumping between distant and weakly connected cortical areas [24].

Therefore, the assumption that cerebral and non-cerebral sources are linearly combined allows the following formulation of the underlying biophysics of signal generation and propagation of the potential [25]:

$$\mathbf{x}(n) = \mathbf{A}\mathbf{s}(n) + \mathbf{d}(n),\tag{1}$$

where $\mathbf{x}(n) = [x_1(n), x_2(n), \ldots, x_P(n)]^{\mathrm{T}}$ is the observed $P$-channel EEG data at the $n$-th sample point (superscript T denotes the transpose of a vector or matrix); $\mathbf{s}(n) = [s_1(n), s_2(n), \ldots, s_Q(n)]^{\mathrm{T}}$ is the vector of $Q$ unknown sources, each element of which is a cerebral or non-cerebral source; $\mathbf{A}$ is the $P \times Q$ full-rank unknown mixing matrix; and $\mathbf{d}(n) = [d_1(n), d_2(n), \ldots, d_P(n)]^{\mathrm{T}}$ is the $P$-dimensional additive zero-mean noise. In real scenarios, there are likely to be more sources than observations ($Q > P$); however, treating the number of sources as equal to the number of observations ($Q = P$) is not normally a fatal problem. Thus, most algorithms extract a linear combination of sources belonging to the same subspace [26, 27].

All algorithms share the disadvantage that they can only handle over-determined mixtures in the inverse process, while having no a priori information on the characteristics of the sources. Three additional assumptions are therefore reluctantly accepted: (i) the noise/artifact is spatially uncorrelated with the observed data ($\mathbb{E}[\mathbf{A}\mathbf{s}(n)\mathbf{d}(n)^{\mathrm{T}}] = \mathbf{0}$, where $\mathbb{E}[\cdot]$ is the expectation operator) and temporally uncorrelated ($\mathbb{E}[\mathbf{d}(n)\mathbf{d}(n+\tau)^{\mathrm{T}}] = \mathbf{0}$, where $\tau$ is the time lag and $\forall \tau > 0$); (ii) the number of sources is equal to or less than the number of observations ($Q \le P$); and (iii) the mixing matrix $\mathbf{A}$ is stationary [28].
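As a minimal numerical sketch of the mixing model in Eq. (1), the following NumPy snippet builds a square, noiseless toy mixture and checks that the inverse of the mixing matrix recovers the sources. The waveforms, the matrix values, and the function name `mix_sources` are illustrative assumptions and are not taken from the chapter.

```python
import numpy as np

def mix_sources(a, s, noise_std=0.0, seed=0):
    """Return x(n) = A s(n) + d(n) for all n (columns are time points)."""
    rng = np.random.default_rng(seed)
    # d(n): additive zero-mean noise, one row per observed channel
    d = noise_std * rng.standard_normal((a.shape[0], s.shape[1]))
    return a @ s + d

# Hypothetical toy example with P = Q = 2 and a 250 Hz sampling rate.
n = np.arange(1000)
s = np.vstack([
    np.sin(2 * np.pi * 10 * n / 250),          # stand-in for a cerebral rhythm
    np.sign(np.sin(2 * np.pi * 1 * n / 250)),  # stand-in for an artifact
])
a = np.array([[1.0, 0.6],
              [0.4, 1.0]])                      # assumed mixing matrix A

x = mix_sources(a, s)                 # noiseless observation
s_hat = np.linalg.inv(a) @ x          # ideal demixing with W = A^{-1}
```

In practice A is unknown, which is why the demixing matrix has to be estimated blindly from x(n) alone.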

#### 3.1.2. Blind source separation algorithms


Under the aforementioned assumptions, BSS approaches estimate the sources $\hat{\mathbf{S}} = [\hat{\mathbf{s}}(1), \ldots, \hat{\mathbf{s}}(N)]$ from the observed EEG data $\mathbf{X} = [\mathbf{x}(1), \ldots, \mathbf{x}(N)]$. Unsupervised learning methods such as PCA and ICA jointly estimate the demixing matrix $\mathbf{W}$ ($= \mathbf{A}^{-1}$):

$$
\hat{\mathbf{s}}(n) = \mathbf{W}\mathbf{x}(n). \tag{2}
$$

Each unsupervised learning method has an algorithm that is subject to various indices: uncorrelatedness, independence, non-Gaussianity, instantaneous propagation, and linearity [29]. The linear mixture concept of blind EEG source separation is shown in Figure 3, which presents the demixing matrix $\mathbf{W}$ ($= \mathbf{W}_1\mathbf{W}_2$) as a two-step estimator because some methods first decorrelate the observed matrix by $\mathbf{W}_1$ and then demix it by $\mathbf{W}_2$. In the figure, the mixing matrix A combines three unknown cerebral sources s(n) and provides the same number of observations x(n).

Figure 3. Linear mixture concept of blind EEG source separation [15, 17].

PCA converts the observed matrix of possibly correlated variables into linearly uncorrelated variables (principal components (PCs)) using first- and second-order statistics [30]. The algorithm performs an eigenvalue decomposition to obtain the directions u of greatest variance in the input space of the EEG data X, under the assumptions that the data are jointly normally distributed and that the sources are uncorrelated. To satisfy these assumptions, the obtained matrix $\mathbf{X}_{\text{old}}$ should be standardized so that samples of the same dimension are decorrelated ($\mathbb{E}[\mathbf{x}(n)\mathbf{x}(n+\tau)^{\mathrm{T}}] = \mathbf{0}$) and each channel has unit variance ($\mathbb{V}[\mathbf{X}_p] = 1$).

In the PCA algorithm, the first PC, which has the largest variance in the standardized input space, is a linear combination of X defined by the weights $\mathbf{u}_1 = [u_1, \ldots, u_P]^{\mathrm{T}}$:

$$\mathbf{PC}_1 = \mathbf{X}^{\mathrm{T}} \mathbf{u}_1, \tag{3}$$

$$\mathbb{V}[\mathbf{PC}_1] = \mathbb{V}[\mathbf{X}^{\mathrm{T}}\mathbf{u}_1] = \mathbf{u}_1^{\mathrm{T}} \boldsymbol{\Sigma} \mathbf{u}_1, \tag{4}$$

where $\boldsymbol{\Sigma}$ ($= \mathbf{X}\mathbf{X}^{\mathrm{T}}/(N-1)$) is the covariance matrix of $\mathbf{X}$. Therefore, the algorithm formulates the given problem as an optimization problem:

$$\max \, \mathbf{u}_1^{\mathrm{T}} \boldsymbol{\Sigma} \mathbf{u}_1, \tag{5}$$

$$\text{subject to } \mathbf{u}_1^{\mathrm{T}} \mathbf{u}_1 = 1. \tag{6}$$

It can be solved by the Lagrange multiplier method:

$$L(\mathbf{u}_1, \lambda_1) = \mathbf{u}_1^{\mathrm{T}} \boldsymbol{\Sigma} \mathbf{u}_1 + \lambda_1 (1 - \mathbf{u}_1^{\mathrm{T}} \mathbf{u}_1), \tag{7}$$

$$\frac{\partial L(\mathbf{u}_1, \lambda_1)}{\partial \mathbf{u}_1} = 2 \boldsymbol{\Sigma} \mathbf{u}_1 - 2\lambda_1 \mathbf{u}_1 = \mathbf{0}, \tag{8}$$

$$\mathbf{u}_1^{\mathrm{T}} \boldsymbol{\Sigma} \mathbf{u}_1 = \lambda_1 \mathbf{u}_1^{\mathrm{T}} \mathbf{u}_1 = \lambda_1. \tag{9}$$

The covariance matrix $\boldsymbol{\Sigma}$ is sequentially decomposed into eigenvectors $\mathbf{u}_p$ and eigenvalues $\lambda_p$ under the assumption that the PCs are orthogonal. The eigenvector $\mathbf{u}_p$ is similar to a column of the inverse demixing matrix $\mathbf{W}^{-1}$. PCA-based methods are advantageous for stationary data; however, it is difficult for EEG data to satisfy their assumptions [31]. On the other hand, the PCA algorithm is often incorporated as a first decorrelation or whitening step in some ICA algorithms [32].
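The eigenvalue problem in Eqs. (5)–(9) can be checked numerically. The sketch below is a minimal NumPy illustration under the assumption that the data are only mean-centered (unit-variance scaling is omitted for brevity); the function name `pca` and the toy data are hypothetical.

```python
import numpy as np

def pca(x):
    """PCA of a (channels x samples) matrix via eigendecomposition of its covariance."""
    xc = x - x.mean(axis=1, keepdims=True)      # remove channel means
    sigma = xc @ xc.T / (xc.shape[1] - 1)       # covariance matrix Sigma
    eigval, eigvec = np.linalg.eigh(sigma)      # eigh returns ascending eigenvalues
    order = np.argsort(eigval)[::-1]            # reorder by decreasing variance
    eigval, eigvec = eigval[order], eigvec[:, order]
    pcs = eigvec.T @ xc                         # row p holds the p-th principal component
    return eigval, eigvec, pcs

# Toy two-channel data with correlated channels (values are arbitrary).
rng = np.random.default_rng(0)
x = np.array([[2.0, 0.5], [0.3, 1.0]]) @ rng.standard_normal((2, 2000))
eigval, eigvec, pcs = pca(x)
```

Consistent with Eq. (9), the sample variance of the first PC equals the largest eigenvalue, and the PCs are mutually uncorrelated.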

ICA is the most widely known and prevalent unsupervised learning algorithm for decomposing multi-channel EEG data $\mathbf{X}$ into independent components (ICs) $\hat{\mathbf{S}}$ using higher-order (spatial) moments beyond the second-order statistics used in PCA, although some ICA algorithms, like PCA, also use second-order statistics [4]. A topical review published in 2015 reported that second-order blind identification (SOBI) and information maximization (InfoMax) are the most commonly used algorithms for EEG signal processing [15]. In this chapter, we describe the InfoMax algorithm.

The fundamental problem tackled by InfoMax ICA is how to minimize the mutual information (MI) of the output vector $\hat{\mathbf{s}}$,

Review of Artifact Rejection Methods for Electroencephalographic Systems http://dx.doi.org/10.5772/68023 75

$$\mathrm{MI}(\hat{\mathbf{s}}) = \sum_{p=1}^{P} H(\hat{s}_p) - H(\hat{\mathbf{s}}).\tag{10}$$

The probability density functions of the observed signal $p(\mathbf{x})$ and the estimated signal $p(\hat{\mathbf{s}})$ have the following relationship:

$$p(\hat{\mathbf{s}})\,d\hat{\mathbf{s}} = p(\mathbf{x})\,d\mathbf{x},\tag{11}$$

$$d\hat{\mathbf{s}} = |\mathbf{J}(\mathbf{x})|\,d\mathbf{x} = |\mathbf{W}|\,d\mathbf{x},\tag{12}$$

$$p(\hat{\mathbf{s}}) = p(\mathbf{x})|\mathbf{W}|^{-1} = p(\mathbf{W}^{-1}\hat{\mathbf{s}})|\mathbf{W}|^{-1},\tag{13}$$

where $\mathbf{J}(\mathbf{x})$ is the Jacobian matrix of the transformation $\hat{\mathbf{s}} = \mathbf{W}\mathbf{x}$. The entropy of the estimate, $H(\hat{\mathbf{s}})$, is given by:

$$\begin{split} H(\hat{\mathbf{s}}) &= -\int p(\hat{\mathbf{s}}) \log p(\hat{\mathbf{s}}) d\hat{\mathbf{s}} \\ &= -\int \left( \log p(\mathbf{x}) - \log |\mathbf{W}| \right) p(\mathbf{x}) d\mathbf{x} \\ &= -\int p(\mathbf{x}) \log p(\mathbf{x}) d\mathbf{x} + \log |\mathbf{W}| \\ &= H(\mathbf{x}) + \log |\mathbf{W}|. \end{split} \tag{14}$$

Therefore, the MI can be rewritten as follows:

$$\mathrm{MI}(\hat{\mathbf{s}}) = \sum_{p=1}^{P} H(\hat{s}_p) - H(\mathbf{x}) - \log|\mathbf{W}|.\tag{15}$$

By partially differentiating this index with respect to the parameters $\mathbf{W}$, the optimized solution for source separation is obtained:

$$\frac{\partial \mathrm{MI}(\hat{\mathbf{s}})}{\partial \mathbf{W}} = \sum_{p=1}^{P} \frac{\partial \left( - \int p(\hat{\mathbf{s}}) \log p(\hat{s}_p)\, d\hat{\mathbf{s}} \right)}{\partial \mathbf{W}} - \left( \mathbf{W}^{\mathrm{T}} \right)^{-1} = -\mathbb{E}[\boldsymbol{\varphi}(\hat{\mathbf{s}}) \mathbf{x}^{\mathrm{T}}] - \left( \mathbf{W}^{\mathrm{T}} \right)^{-1}, \tag{16}$$

where

$$\varphi(\hat{s}_p) = \frac{d \log p(\hat{s}_p)}{d \hat{s}_p}.\tag{17}$$

Because an analytical solution of the above equation is difficult to compute, the algorithm uses an update rule based on the natural gradient [33] with a learning rate $\eta$, which is a positive constant:

$$\mathbf{W} \leftarrow \mathbf{W} + \eta \Delta \mathbf{W},\tag{18}$$

$$\Delta \mathbf{W} = \left( \mathbb{E}[\boldsymbol{\varphi}(\hat{\mathbf{s}}) \mathbf{x}^{\mathrm{T}}] + (\mathbf{W}^{\mathrm{T}})^{-1} \right) \mathbf{W}^{\mathrm{T}} \mathbf{W} = \left( \mathbb{E}[\boldsymbol{\varphi}(\hat{\mathbf{s}}) \hat{\mathbf{s}}^{\mathrm{T}}] + \mathbf{I} \right) \mathbf{W}. \tag{19}$$
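The update rule in Eqs. (18) and (19) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the source density is assumed to be super-Gaussian with $p(s) \propto 1/\cosh(s)$, so that the score function of Eq. (17) becomes $\varphi(\hat{s}) = -\tanh(\hat{s})$; the data are whitened first; and the names `whiten` and `infomax_ica`, the learning rate, and the toy mixture are hypothetical choices rather than the chapter's implementation.

```python
import numpy as np

def whiten(x):
    """Decorrelate a (channels x samples) matrix and scale it to unit variance."""
    xc = x - x.mean(axis=1, keepdims=True)
    cov = xc @ xc.T / (xc.shape[1] - 1)
    eigval, eigvec = np.linalg.eigh(cov)
    w1 = np.diag(eigval ** -0.5) @ eigvec.T     # PCA whitening matrix
    return w1 @ xc, w1

def infomax_ica(x, eta=0.1, n_iter=2000):
    """Natural-gradient update of Eqs. (18)-(19) with an assumed tanh score."""
    z, w1 = whiten(x)
    p, n = z.shape
    w2 = np.eye(p)
    for _ in range(n_iter):
        s_hat = w2 @ z
        phi = -np.tanh(s_hat)                   # assumed phi for p(s) ~ 1/cosh(s)
        delta_w = (phi @ s_hat.T / n + np.eye(p)) @ w2   # Eq. (19), sample mean for E[.]
        w2 = w2 + eta * delta_w                 # Eq. (18)
    return w2 @ w1                              # whitening followed by demixing

# Toy check: two Laplacian (super-Gaussian) sources and an arbitrary mixing matrix.
rng = np.random.default_rng(0)
s = rng.laplace(size=(2, 5000))
a = np.array([[2.0, 0.5],
              [0.3, 1.0]])
w = infomax_ica(a @ s)
gain = np.abs(w @ a)   # close to a scaled permutation matrix if separation succeeded
```

Because ICA cannot determine the order or the scale of the sources, success is judged by whether `gain` has one dominant entry per row rather than by whether it equals the identity matrix.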

#### 3.1.3. Component identification after source separation

After source separation, the estimated sources $\hat{\mathbf{S}}$ have to be identified as neuronal or artifactual sources in order to reconstruct the artifact-free EEG matrix $\hat{\mathbf{X}}$. Identification of components has traditionally relied on visual inspection of scalp topography and empirical judgment [10, 14]. These heavily used techniques are still applied as an expedient way of checking the results, but they increase the workload; therefore, hard/soft-threshold functions, probabilistic approaches, and machine learning algorithms using features of prepared data have been used to identify artifacts in the estimated sources automatically, reducing the workload and yielding more repeatable labels [34, 35]. Automatic and unsupervised component identification algorithms that characterize components more precisely and flexibly are still an active research area [36, 37]. Once the estimated sources have been identified, they proceed to the next step, called the denoising step, and then the underlying EEG matrix is reconstructed by the inverse of the linear demixing process (see Figure 4).
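The identification, denoising, and reconstruction steps described above can be illustrated with a minimal hard-threshold sketch. The choice of excess kurtosis as the feature, the threshold value of 5, zeroing as the denoising operation, and all names (`excess_kurtosis`, `reject_components`) are assumptions made for this example only; the cited works use a variety of features and classifiers.

```python
import numpy as np

def excess_kurtosis(s):
    """Excess kurtosis of each row of a (components x samples) matrix."""
    sc = s - s.mean(axis=1, keepdims=True)
    m2 = (sc ** 2).mean(axis=1)
    m4 = (sc ** 4).mean(axis=1)
    return m4 / m2 ** 2 - 3.0

def reject_components(x, w, threshold=5.0):
    """Label components by a hard threshold, zero the artifacts, and reconstruct."""
    s_hat = w @ x                                       # estimated sources
    is_artifact = excess_kurtosis(s_hat) > threshold    # hypothetical hard threshold
    s_clean = s_hat.copy()
    s_clean[is_artifact] = 0.0                          # denoising step: drop artifacts
    x_hat = np.linalg.inv(w) @ s_clean                  # inverse of the demixing process
    return x_hat, is_artifact

# Toy data: a 10 Hz rhythm plus a sparse, spiky "eyeblink-like" source.
n = np.arange(2000)
rhythm = np.sin(2 * np.pi * 10 * n / 250)
blink = np.zeros(2000)
blink[::400] = 50.0
a = np.array([[1.0, 0.8],
              [0.5, 0.3]])                              # assumed mixing matrix
x = a @ np.vstack([rhythm, blink])
x_hat, is_artifact = reject_components(x, np.linalg.inv(a))
```

Here the demixing matrix is taken as known so that only the identification and reconstruction steps are exercised; in a real pipeline it would come from a BSS algorithm.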

#### 3.2. In single-channel signals

#### 3.2.1. Discrepancy among standard assumptions about multi-/single-channel data

We can easily imagine that single-channel data do not always satisfy the assumptions for BSS techniques. Calculating the demixing matrix W is especially difficult for single-channel artifact rejection methods (see Figure 5), so researchers are forced to choose between adding information from a reference channel before applying a method and separating the data using only one channel.

Figure 4. Block diagram of the blind source separation [11].

Figure 5. Procedure of signal separation in single-channel artifact rejection methods.

#### 3.2.2. Regression

Regression was the most frequently used algorithm for removing artifacts up to the mid-1990s [38, 39]. In this algorithm, an observed EEG signal x(n) can be expressed as

$$\mathbf{x}(n) = \mathbf{x}_{\text{EEG}}(n) + \mathbf{x}_{\text{Art}}(n) + d(n), \tag{20}$$

where $\mathbf{x}_{\text{EEG}}(n)$, $\mathbf{x}_{\text{Art}}(n)$, and $d(n)$ are the intrinsic EEG data, the artifact, and the noise, respectively. The expected value of $d(n)$ is assumed to be 0.

The artifact is corrected by calculating propagation factors that estimate the relationship between the reference signal $\mathbf{x}_{\text{Ref}}(n)$ and the observed EEG signal, and then subtracting the regressed portion [40]. The procedure is as follows:

Step 1. Separately average the observed EEG and reference signals over T trials to estimate the artifact-related waveform variation for the channels:

$$\overline{\mathbf{x}}(n) = \frac{1}{T} \sum_{t=1}^{T} \mathbf{x}_t(n), \tag{21}$$

Step 2. Subtract the averages from the data of every trial to obtain the deviations:

$$\mathbf{x}'(n) = \mathbf{x}(n) - \overline{\mathbf{x}}(n),\tag{22}$$

where $\overline{\mathbf{x}}(n)$ is the observed EEG average duplicated into a $T \times 1$ matrix,

Step 3. Calculate the propagation factor C by linear least-squares regression, in which the observed EEG data are treated as the dependent variable and the reference data as the independent variable:

$$\mathbf{X} = \mathbf{C}(\mathbf{X}_{\text{Ref}}),\tag{23}$$

where


$$\mathbf{X} = \begin{bmatrix} \mathbf{x}'(1), \dots, \mathbf{x}'(t), \dots, \mathbf{x}'(T) \end{bmatrix}^{\mathrm{T}}, \tag{24}$$

$$\mathbf{x}'(t) = [x'(1 + N(t - 1)), \dots, x'(tN)],\tag{25}$$

Step 4. Correct the observed EEG data by subtracting the reference data scaled by the propagation factor C:

$$\hat{\mathbf{x}}(n) = \mathbf{x}(n) - \mathbf{C}\left(\mathbf{x}_{\text{Ref}}(n)\right). \tag{26}$$

Because the averaging operator emphasizes time-locked activity in observed EEG signals, this method, which also requires a reference channel, is powerful only if the system handles event-related brain potentials. Cerebral activity is usually not time-locked, which means that important non-time-locked components are lost by the averaging operation. Furthermore, the method does not take bidirectional contamination into account, so the linear subtraction also cancels cerebral information from each observed EEG signal [41]. Despite its disadvantages, regression is still used as the "gold standard" method against which the performance of other artifact rejection algorithms is compared.
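The four-step regression procedure described above can be sketched as follows. This is a minimal illustration that assumes a single EEG channel, a single reference channel (for example, an EOG channel), and a scalar propagation factor; the function name `regression_correct` and the synthetic data are hypothetical.

```python
import numpy as np

def regression_correct(x, x_ref):
    """Regression-based correction for (T trials x N samples) EEG and reference arrays."""
    # Steps 1-2: subtract the across-trial averages to obtain the deviations.
    x_dev = x - x.mean(axis=0, keepdims=True)
    ref_dev = x_ref - x_ref.mean(axis=0, keepdims=True)
    # Step 3: scalar least-squares propagation factor C (EEG regressed on reference).
    c = np.sum(ref_dev * x_dev) / np.sum(ref_dev ** 2)
    # Step 4: subtract the reference scaled by C from the observed data.
    return x - c * x_ref, c

# Synthetic check: the reference leaks into the EEG with an assumed factor of 0.5.
rng = np.random.default_rng(0)
eeg = rng.standard_normal((20, 200))          # stand-in for intrinsic EEG
ref = 5.0 * rng.standard_normal((20, 200))    # stand-in for the reference signal
x = eeg + 0.5 * ref
x_hat, c = regression_correct(x, ref)
```

In this synthetic case the reference contains no cerebral activity, so the subtraction is harmless; with real recordings the bidirectional contamination noted above means that part of the EEG is removed as well.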

#### 3.2.3. Filtering

Band-pass filtering is one of the classical and simple attempts to remove artifacts from an observed EEG signal. It is effective if the spectral distributions of the EEG component and the artifact do not overlap and the artifacts are narrow-band, such as power line noise (50/60 Hz interference) [42]. However, fixed-gain filtering is not effective for biological artifacts because it also attenuates the EEG component and changes both the amplitude and the phase of the signal [43]. To overcome these limitations, adaptive algorithms adjust the filter parameters w so as to minimize the error between the artifact-free EEG signal $\hat{x}(n)$ and the desired original signal $x(n)$.

Adaptive filtering assumes that the intrinsic EEG signal and artifact are uncorrelated; therefore, the artifact is considered to be an additive noise within the observed signal:

$$x_t(n) = s_t(n) + n_{0t}(n),\tag{27}$$

where $x_t(n)$ is the observed EEG signal of the $t$-th trial and $n_{0t}(n)$ is the additive noise to be cancelled, which is uncorrelated with the intrinsic EEG signal $s_t(n)$. The filter parameters w are iteratively adjusted by a feedback (recursive) process designed to make the output as close as possible to a desired response in the presence of additive noise interference [44, 45]. Figure 6 shows the noise canceller system using adaptive filtering. In this system, the primary input $x_t(n)$ and the reference input $x_{\text{Ref},t}(n)$ are the observed EEG and reference signals, respectively. The reference input $x_{\text{Ref},t}(n) = n_{1t}(n)$, which is a noise correlated with $n_{0t}(n)$ and uncorrelated with the intrinsic EEG signal $s_t(n)$, adds information that helps minimize the error $e_t(n)$ between the response $y_t(n)$ and the desired response.

Figure 6. Noise canceller system using adaptive filtering [17, 47].

X ¼ ½x<sup>0</sup>

x0 ðtÞ¼½x<sup>0</sup>

gation factor C:

78 Electroencephalography

3.2.3. Filtering

rejection algorithms may be compared.

(n) to suppress the limitations of this method.

ð1Þ,…, x<sup>0</sup>

x^ðnÞ ¼ xðnÞ � C

1 þ Nðt � 1Þ

Step 4. Correct the observed EEG data by subtracting the reference data scaled by the propa-

Because averaging operator emphasizes a time-locked activity in observed EEG signals, this method requires a reference channel and is powerful only if the operating system treats eventrelated brain potentials. Cerebral activities are usually not time-locked that means that important nontime-locked components will be lost by the averaging operation. Furthermore, this method does not take bidirectional contamination into account and cancels the cerebral information from each observed EEG signal upon linear subtraction [41]. Despite its disadvantages, regression is still used as the "gold standard" method to which the performance of any artifact

Band-pass is one of the classical and simple separation attempts to remove artifacts from an observed EEG signal. This method is effective if the spectral distributions of the EEG component and artifact do not overlap, and there are small band artifacts such as power line noise (50/60 Hz interference) [42]. However, fixed-gain filtering is not effective for biological artifacts because it will attenuate EEG component and change both amplitude and phase of signal if the filtering keeps doing that [43]. Some adaptive algorithms try to adapt the filter parameters w to minimize the error between the artifact-free EEG signal x^ðnÞ and the desired original signal x

Adaptive filtering assumes that the intrinsic EEG signal and artifact are uncorrelated; therefore, the artifact is considered to be an additive noise within the observed signal:

$$x\_t(n) = s\_t(n) + n\_{0t}(n),\tag{27}$$

where xt(n) is the observed EEG signal of the t-th trial and n0t(n) is the additive noise to be offset, which is uncorrelated with the intrinsic EEG signal st(n). The filter parameters w are iteratively adjusted by a feedback (recursive) process designed to make the output as close as possible to some desired response with an additive noise interference [44, 45]. Figure 6 shows the noise canceller system using adaptive filtering. In this system, the primary input xt(n) and the reference input xRef,t(n) are the observed EEG and reference signals. The reference input xRef,t(n) = n1t(n), which is a noise correlated with n0t(n) and uncorrelated with the intrinsic EEG signal st(n), adds information to minimize the error et(n) between the response yt(n) and the desired response.

Recursive least squares (RLS)-based adaptive filtering performs better than the least mean squares (LMS)-based one [46]. The algorithm can be implemented using the following equations:

$$\mathbf{g}(n) = \frac{\mathbf{R}(n-1)\mathbf{x}\_{\text{Ref}}(n)}{\lambda + \mathbf{x}\_{\text{Ref}}^{\text{T}}(n)\mathbf{R}(n-1)\mathbf{x}\_{\text{Ref}}(n)},\tag{28}$$

$$\mathbf{e}(n) = \mathbf{x}(n) - \mathbf{y}(n),\tag{29}$$

$$\mathbf{y}(n) = \mathbf{w}(n)\mathbf{x}\_{\text{Ref}}(n),\tag{30}$$

$$\mathbf{R}(n) = \frac{\mathbf{R}(n-1) - \mathbf{g}(n)\mathbf{x}\_{\text{Ref}}{}^{\text{T}}(n)\mathbf{R}(n-1)}{\lambda},\tag{31}$$

$$\mathbf{w}(n) = \mathbf{w}(n-1) + \mathbf{g}(n)\mathbf{e}(n),\tag{32}$$

where g(n) and w(n) are the gain vector and the filter parameters, and λ is the forgetting factor. The initial value of the cross-correlation matrix R(0) is δI, where δ is a sufficiently large positive value and I is the identity matrix. The updated filter parameters yield the artifact-free EEG signal as output.
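The recursion in Eqs. (28)–(32) can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the implementation used in the cited works: the reference vector is assumed to be a tap-delay line of the reference channel, the filter output is computed with the weights from the previous step, and the values of `order`, `lam` (λ), and `delta` (δ) as well as the synthetic signals are hypothetical choices.

```python
import numpy as np


def rls_noise_canceller(x, x_ref, order=4, lam=0.99, delta=100.0):
    """Adaptive noise cancellation with RLS, following Eqs. (28)-(32).

    x      : primary input (observed EEG), shape (n_samples,)
    x_ref  : reference input correlated with the artifact, shape (n_samples,)
    order  : number of filter taps (assumed value, not given in the text)
    lam    : forgetting factor lambda
    delta  : scale of the initial matrix, R(0) = delta * I
    Returns the error signal e(n), used here as the artifact-reduced estimate.
    """
    x = np.asarray(x, dtype=float)
    x_ref = np.asarray(x_ref, dtype=float)
    w = np.zeros(order)
    R = delta * np.eye(order)
    e = np.zeros_like(x)
    # Zero-pad the reference so that the first tap vectors are defined.
    padded = np.concatenate([np.zeros(order - 1), x_ref])
    for n in range(len(x)):
        u = padded[n:n + order][::-1]        # [x_ref(n), x_ref(n-1), ...]
        Ru = R @ u
        g = Ru / (lam + u @ Ru)              # Eq. (28): gain vector
        y = w @ u                            # Eq. (30), with the previous weights
        e[n] = x[n] - y                      # Eq. (29): error = cleaned sample
        R = (R - np.outer(g, u @ R)) / lam   # Eq. (31)
        w = w + g * e[n]                     # Eq. (32)
    return e


# Toy usage with synthetic data (all values are illustrative assumptions).
rng = np.random.default_rng(0)
n = np.arange(2000)
s = np.sin(2 * np.pi * 10 * n / 250)                      # stand-in for intrinsic EEG
n1 = rng.standard_normal(n.size)                          # reference noise n1(n)
n0 = 0.8 * n1 + 0.3 * np.concatenate([[0.0], n1[:-1]])    # artifact correlated with n1
e = rls_noise_canceller(s + n0, n1)
```

In this toy setting the error signal `e` plays the role of the cleaned EEG, because the filter can only reproduce the part of the primary input that is correlated with the reference.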

Consequently, the adaptive filtering approach has the potential to recover the "pure" EEG signal more rapidly and accurately than linear regression for ocular and cardiac artifacts [48]. However, it is rather difficult for the filter parameters to converge to a solution if muscular and vibration artifacts contaminate the observed EEG signal. In that situation, the algorithm sometimes does not converge because of their convulsive bursts.

Optimal filtering, such as Kalman filtering, can capture non-stationary properties of artifacts. The framework is flexible enough to handle non-linear systems by approximating the probability density function, which might lead to more effective artifact rejection methods. Many works on filtering algorithms have developed this approach into more useful modules for real-time applications [49, 50].

#### 3.2.4. ICA-based signal decomposition

ICA achieves artifact rejection with outstanding performance if the number of independent sources is equal to or lower than the number of observations. Unfortunately, this method is only applicable to multi-channel data; however, some works have extended the idea to single-channel data to unmix a set of observed signals (components) into intrinsic sources [51–53]. These methods decompose a single-channel signal into multiple components by dividing it into a sequence of blocks or different spectral modes before applying ICA; we therefore call these methods ICA-based signal decomposition approaches (see Figure 7).

Single-channel ICA is the oldest method for single-channel data and assumes that stationary sources are disjoint in the frequency domain [54]. An observed signal x(n) is split up into K short segments X, a sequence of contiguous blocks of length L, which is handled as a set of observations.

$$\mathbf{X} = \begin{bmatrix} \mathbf{x}(1), \dots, \mathbf{x}(k), \dots, \mathbf{x}(K) \end{bmatrix}^{\mathrm{T}}, \tag{33}$$

$$\mathbf{x}(k) = \left[ x\left(L(k-1) + 1\right), \dots, x(kL)\right]^{\mathrm{T}},\tag{34}$$

where k is the block index. A standard ICA algorithm is then applied to the matrix X to derive the demixing matrix W. Because artifacts overlap with EEG components and the EEG signal has nonperiodic components, this method can be applied only in limited situations. Wavelet transform (WT)-based and empirical mode decomposition (EMD)-based ICA have already been reported to be successful in removing artifacts for a problem similar to that of single-channel ICA [42].
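The blocking in Eqs. (33) and (34) amounts to reshaping the single-channel signal into a K × L matrix whose rows are then treated as observations. The following is a minimal sketch; the helper name `to_block_matrix` is hypothetical, and dropping the trailing samples that do not fill a whole block is an assumption, since the text does not say how they are handled.

```python
import numpy as np


def to_block_matrix(x, block_len):
    """Stack contiguous, non-overlapping blocks of x as rows (Eqs. (33)-(34))."""
    x = np.asarray(x, dtype=float)
    n_blocks = len(x) // block_len   # K; trailing samples are dropped (assumption)
    return x[:n_blocks * block_len].reshape(n_blocks, block_len)


x = np.arange(1, 11, dtype=float)     # toy signal x(1), ..., x(10)
X = to_block_matrix(x, block_len=3)   # K = 3 blocks of length L = 3
```

Row k of `X` holds the samples x(L(k-1)+1), ..., x(kL); any standard ICA routine could then be applied to this matrix.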

WT-based ICA transforms an observed signal into components of disjoint spectra (a matrix) instead of signal (a vector) via discrete WT [55].

Figure 7. Procedure of separation method using ICA-based signal decomposition.

$$\mathcal{W}(a,b) = \frac{1}{\sqrt{a}} \int \mathbf{x}(n) \psi\_{a,b}(n) dn,\tag{35}$$

$$
\psi\_{a,b} = \psi\left(\frac{n-b}{a}\right),
\tag{36}
$$

where W(a, b) and ψa,b denote the wavelet representation of x(n) and the mother wavelet, with a and b defining the time scale and location. Choosing the parameters is hard if the user does not have a priori knowledge of the signal of interest. Each IC obtained from the wavelet coefficients is manually identified as either neuronal or artifactual. The values of the artifactual ICs are replaced with arrays of zeros, and the ICs are then reconstructed into wavelet components. Finally, the artifact-free signal is acquired by the inverse discrete WT.
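To illustrate how a discrete WT turns a single-channel vector into a matrix of components with roughly disjoint spectra, the sketch below splits a signal into Haar subband components. The Haar wavelet, the number of levels, and the function names are assumptions made only for this illustration; the text does not specify a mother wavelet, and the ICA and manual labeling steps are not shown.

```python
import numpy as np


def _reconstruct(coeffs, level, is_detail):
    """Invert `level` Haar steps with all other coefficients set to zero."""
    sig = coeffs
    for step in range(level):
        out = np.empty(2 * len(sig))
        if is_detail and step == 0:
            # Detail coefficients alone: even = d / sqrt(2), odd = -d / sqrt(2).
            out[0::2], out[1::2] = sig / np.sqrt(2), -sig / np.sqrt(2)
        else:
            # Approximation coefficients alone: both samples get a / sqrt(2).
            out[0::2], out[1::2] = sig / np.sqrt(2), sig / np.sqrt(2)
        sig = out
    return sig


def haar_subband_components(x, levels):
    """Return a (levels + 1, len(x)) matrix of time-domain subband components.

    Rows 0..levels-1 are detail bands (finest first); the last row is the
    coarsest approximation. The rows sum back to the input signal.
    """
    x = np.asarray(x, dtype=float)
    if len(x) % (2 ** levels) != 0:
        raise ValueError("len(x) must be divisible by 2**levels")
    components = []
    approx = x
    for level in range(1, levels + 1):
        even, odd = approx[0::2], approx[1::2]
        detail = (even - odd) / np.sqrt(2)
        approx = (even + odd) / np.sqrt(2)
        components.append(_reconstruct(detail, level, is_detail=True))
    components.append(_reconstruct(approx, levels, is_detail=False))
    return np.vstack(components)


x = np.sin(2 * np.pi * np.arange(64) / 16) + 0.1 * np.arange(64)
C = haar_subband_components(x, levels=3)   # 4 components of length 64
```

Because the rows of `C` sum to the original signal, zeroing the contribution of a component before summing mimics, in a simplified way, the zero-and-invert step described above.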

EMD-based ICA decomposes an observed signal into K intrinsic mode functions (IMFs) hk(n),

$$\mathbf{x}(n) = \sum\_{k=1}^{K} h\_k(n) + d(n),\tag{37}$$

where d(n) is the residue of the original data, a slowly varying function with nonzero mean and only a few or no extrema [56]. This method can remove artifacts without a priori knowledge regarding the characteristics of the signal embedded in the data [57]. Each IMF is a monocomponent of the original data and is estimated by an iterative process called the "sifting process":

Step 1. Find the local maxima and minima in xk(n),

Step 5. Go to Step 1 until the residue is below a stopping criterion.

This decomposition is based on three conditions: (i) the number of extrema and the number of zero crossings must be equal or differ by at most one; (ii) the mean is zero; and (iii) all the maxima of the IMF are positive and all the minima are negative. Each IC obtained from the IMFs is manually identified as either neuronal or artifactual, as in WT-based ICA. The values of the artifactual ICs are replaced with arrays of zeros. Finally, the reconstructed IMFs are simply summed to acquire the artifact-free signal.

WT-based and EMD-based ICA have been reported as superb methods for artifact rejection [51, 58, 59]. Therefore, a number of researchers have tended to select them in recent years. However, this approach may fail to separate intrinsic EEG components from artifacts because the frequency characteristics of biological artifacts and EEG components can overlap. In addition, the presence of similar oscillations in different modes, or of oscillations with disparate amplitudes in the same mode, known as "mode mixing," degrades the performance of artifact rejection [60]. Signal distortion or attenuation typically occurs with the above-mentioned methods because of excessive interference. Thus, these approaches are not suitable for real-time applications.

#### 3.2.5. Nonnegative matrix factorization

In linear regression, filtering, and ICA-based signal decomposition approaches, the parameters W often cannot converge to a solution that perfectly demixes the mixtures. This implies that the active space should be partially restricted for single-channel signals.

Meanwhile, non-negative matrix factorization (NMF) [61] has recently attracted attention as an effective algorithm for removing artifacts from single-channel signals because it can find the latent features underlying the interactions between EEG components and artifacts. An M-dimensional non-negative data vector xn is placed in a column of the M × N matrix X, where N is the number of data vectors. The matrix X is based on the short-time Fourier transform and is approximately factorized into an M × K non-negative matrix H and a K × N non-negative matrix W, where K is the number of bases, which is optimized for linear approximation of the input vectors. It can be represented by the following equation:

$$\mathbf{x}\_n \approx \mathbf{y}\_n = \sum\_{k=1}^{K} \mathbf{h}\_k \mathbf{w}\_{k,n} \tag{38}$$

where hk and wk,n denote the k-th basis vector (column) of H and the (k, n)-th entry of W. This equation means that each non-negative EEG feature vector (power spectrum or amplitude spectrum) is approximated by a linear combination of the basis vectors hk weighted by the components wk,n. Therefore, it can be rewritten as

$$\mathbf{X} \approx \mathbf{H}\mathbf{W}.\tag{39}$$
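The factorization of X into H and W can be sketched with the widely used multiplicative update rules for the squared Euclidean cost. This is a minimal illustration, not necessarily the variant used in the cited works: the cost function, the random initialization, the iteration count, and the small constant `eps` that guards against division by zero are all assumptions.

```python
import numpy as np


def nmf(X, n_bases, n_iter=200, eps=1e-12, seed=0):
    """Factorize a non-negative M x N matrix X into H (M x K) and W (K x N)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    M, N = X.shape
    H = rng.random((M, n_bases)) + eps   # basis vectors h_k as columns
    W = rng.random((n_bases, N)) + eps   # weights w_{k,n}
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative.
        W *= (H.T @ X) / (H.T @ H @ W + eps)
        H *= (X @ W.T) / (H @ W @ W.T + eps)
    return H, W


# Toy usage: a synthetic non-negative "spectrogram" built from 3 bases.
rng = np.random.default_rng(1)
X = rng.random((20, 3)) @ rng.random((3, 30))
H, W = nmf(X, n_bases=3)
relative_error = np.linalg.norm(X - H @ W) / np.linalg.norm(X)
```

In practice X would hold magnitude or power spectra from a short-time Fourier transform, one frame per column, rather than random data.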

Some works reported that supervised NMF could effectively factorize the observed EEG signals into brain activity components and artifacts if the user has artifact data in advance [62, 63]. Before supervised learning is applied, a template matrix XArt is factorized into HArt and WArt. The matrix X is then factorized into H and W, where H contains the elements of the matrix HArt. With the standard NMF algorithm, the matrix HArt has no relation to the elements of H because the initial values are set randomly and updated by multiplicative rules. In the supervised learning algorithm, the matrix HArt is used as a fixed value that partially restricts the active space. By contrast, the activity components in the matrix WArt are variable. Under this constraint, the matrix H attempts to express the EEG components in the matrix X with the remaining K′ bases, so that the EEG components are stored in those bases (see Figure 8).

After this processing, the non-negative data of the artifact-free EEG are reconstructed from the following equation:

$$\hat{\mathbf{X}} = \mathbf{X} \ast \sum\_{k=K\_{\text{Art}}+1}^{K} \sum\_{n=1}^{N} \frac{H\_k W\_{k,n}}{HW} . \tag{40}$$
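The supervised procedure can be sketched as follows, under several assumptions that go beyond the text: the squared Euclidean multiplicative updates are used, the artifact bases occupy the first columns of H and are simply never updated, and Eq. (40) is read as an element-wise mask that keeps the fraction of each entry of X explained by the non-artifact bases. The function name and parameters are hypothetical.

```python
import numpy as np


def supervised_nmf_clean(X, H_art, n_eeg_bases, n_iter=200, eps=1e-12, seed=0):
    """Estimate artifact-reduced non-negative data from X with fixed bases H_art.

    X           : non-negative M x N matrix (e.g., magnitude spectrogram)
    H_art       : M x K_art artifact bases learned beforehand from a template
    n_eeg_bases : number of free bases K' left to express the EEG components
    """
    X = np.asarray(X, dtype=float)
    H_art = np.asarray(H_art, dtype=float)
    rng = np.random.default_rng(seed)
    M, N = X.shape
    K_art = H_art.shape[1]
    H = np.hstack([H_art, rng.random((M, n_eeg_bases)) + eps])
    W = rng.random((K_art + n_eeg_bases, N)) + eps
    for _ in range(n_iter):
        W *= (H.T @ X) / (H.T @ H @ W + eps)       # all activations are variable
        H_new = H * (X @ W.T) / (H @ W @ W.T + eps)
        H[:, K_art:] = H_new[:, K_art:]            # artifact bases stay fixed
    eeg_part = H[:, K_art:] @ W[K_art:, :]
    # Element-wise mask: share of each entry explained by the EEG bases.
    return X * eeg_part / (H @ W + eps)


# Toy usage with synthetic non-negative data (illustrative values only).
rng = np.random.default_rng(2)
H_art = rng.random((16, 2))
X = H_art @ rng.random((2, 25)) + rng.random((16, 3)) @ rng.random((3, 25))
X_hat = supervised_nmf_clean(X, H_art, n_eeg_bases=3)
```

Because the mask never exceeds one, every entry of `X_hat` is at most the corresponding entry of `X`; a time-domain signal would additionally require the phase of the original transform and an inverse transform, which this sketch omits.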

Figure 8. Procedure of supervised NMF.

Eq. (40) and the inverse Fourier transform make it possible to acquire the artifact-free signal. Although supervised NMF is still in its infancy, it has shown high performance for artifact rejection. However, an epoch detection step, which is not part of the normal procedure in artifact rejection, must be embedded in this epoch-based method. This inevitably increases the computational cost. Some low-cost (real-time) artifact detection algorithms for single-channel EEG signals [64, 65] offer a silver lining.
