3. Data analysis and fault diagnosis techniques

Many signal analysis and machine fault diagnosis techniques have been developed in the last few decades to improve the reliability, efficiency, and lifespan of machines, and to reduce maintenance and operation costs. In this section, the most frequently employed signal-processing and fault diagnosis techniques for rolling element bearings are briefly discussed and summarized.

#### 3.1. Time-domain data analysis techniques

#### 3.1.1. Time series analysis

Time series analysis is a mathematical method that handles an observed data series in a statistical manner. It is based on the assumption that the variation trend of historically observed data can be extended to provide an indication/prediction of the future data variation of the same monitored system. A typical time series analysis approach is to establish a predictive model based on the observed data series. The three most widely adopted univariate time series models in machine fault diagnosis are the autoregressive (AR) model, the moving average (MA) model, and the autoregressive moving average (ARMA) model [4].

A general linear ARMA (p, q) model can be expressed as

$$y\_t = c + \sum\_{i=1}^{p} \phi\_i y\_{t-i} + \sum\_{j=1}^{q} \theta\_j \varepsilon\_{t-j} + \varepsilon\_t \tag{2}$$

where y<sub>t</sub> is the time series to be modeled, c is a constant, p is the autoregressive order, q is the moving average order, φ<sub>i</sub> are the autoregressive coefficients, θ<sub>j</sub> are the moving average coefficients, and ε<sub>t</sub> are independent and identically distributed error terms, commonly assumed to have zero mean and constant variance, that is, (i) E(ε<sub>t</sub>) = 0; (ii) E(ε<sub>t</sub>ε<sub>T</sub>) = 0 for t ≠ T; and (iii) E(ε<sub>t</sub><sup>2</sup>) = σ<sup>2</sup>.

An AR model and an MA model can be viewed as special cases of an ARMA model. For instance, when all autoregressive coefficients equal zero (φ<sub>i</sub> = 0, 1 ≤ i ≤ p), an ARMA model reduces to an MA model. Conversely, when all moving average coefficients equal zero (θ<sub>j</sub> = 0, 1 ≤ j ≤ q), an ARMA model reduces to an AR model.

After selecting the most suitable time series model for the data analysis, the next step is to determine the model orders. The criteria commonly adopted to determine the orders of these time series models are the final prediction error (FPE) criterion, the Akaike information criterion (AIC), and the Bayesian information criterion (BIC).

The common methods used to estimate the autoregressive and moving average coefficients include least squares estimation, maximum likelihood estimation, and Yule-Walker estimation.

It is worth mentioning that ARMA models are built on the assumption that the signal is stationary. If the signal is not stationary, preprocessing steps need to be adopted, for example, (1) differencing the data term by term until the signal satisfies the stationarity criterion, or (2) decomposing the nonstationary signal by empirical mode decomposition (EMD) to obtain stationary intrinsic mode functions (IMFs).
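As a concrete illustration of the AR case above, the coefficients φ<sub>i</sub> of Eq. (2) (with q = 0) can be estimated from the sample autocovariances via the Yule-Walker equations. The following is a minimal numpy sketch, not a production implementation; the AR(2) test series and its true coefficients (0.6, −0.3) are arbitrary values chosen for illustration.

```python
import numpy as np

def yule_walker_ar(x, p):
    """Estimate AR(p) coefficients phi_1..phi_p of a zero-mean series x
    by solving the Yule-Walker equations R phi = r."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    # Biased sample autocovariances r(0)..r(p)
    r = np.array([np.dot(x[:n - k], x[k:]) / n for k in range(p + 1)])
    # Toeplitz system built from r(0)..r(p-1)
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    phi = np.linalg.solve(R, r[1:])
    noise_var = r[0] - np.dot(phi, r[1:])  # innovation variance estimate
    return phi, noise_var

# Synthetic AR(2) series: y_t = 0.6 y_{t-1} - 0.3 y_{t-2} + eps_t
rng = np.random.default_rng(0)
y = np.zeros(5000)
for t in range(2, len(y)):
    y[t] = 0.6 * y[t - 1] - 0.3 * y[t - 2] + rng.standard_normal()

phi, var = yule_walker_ar(y, p=2)  # phi close to (0.6, -0.3), var close to 1
```

The same estimates can serve as a de-noising prefilter: the AR residual removes the predictable (stationary) part of a signal and leaves the impulsive content.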

#### 3.1.2. Minimum entropy deconvolution

The minimum entropy deconvolution (MED) technique was originally proposed by Wiggins and has been successfully employed in processing seismic response signals [5]. The basic idea of MED is to find an inverse filter that counteracts the effect of the transmission path [6]. It is designed to reduce the spread of impulse response functions (IRFs) and thereby recover signals that are close to the original impulses giving rise to them. The MED technique has been successfully applied to bearing fault diagnosis, such as in [6].

Figure 3 illustrates the basic idea of the MED technique. In this process, an unknown impact signal x(n), assumed to be highly impulsive, passes through a structural filter h and is then mixed with noise e(n) to produce an intermediate output o(n). The signal o(n) then passes through an inverse (MED) filter f to produce a final output y(n). The process eliminates the structural resonance so that the final output y(n) after the inverse filter is as close as possible to the original input x(n).

Figure 3. Inverse filtering (deconvolution) process for MED.

The inverse filter f(l) can be modeled as a finite impulse response (FIR) filter with L coefficients:

$$y(n) = \sum\_{l=1}^{L} f(l)x(n-l), \tag{3}$$

and

44 Bearing Technology

$$f(l) \ast h(l) = \delta(n - l\_m). \tag{4}$$

where δ is the Dirac delta function and l<sub>m</sub> is the time delay introduced by the MED process, which displaces the entire signal by l<sub>m</sub> but preserves the impulse spacing of the signal.

The inverse filter f(l) is implemented for the MED technique by the objective function method (OFM). The OFM is an optimization process designed to maximize the kurtosis of the output signal, y(t). OFM achieves this by changing the coefficients of the filter f(l). The kurtosis is taken as the normalized fourth-order moment given by

$$K\{f(l)\} = \sum\_{n=1}^{N} y^4(n) / \left[\sum\_{n=1}^{N} y^2(n)\right]^2,\tag{5}$$

and the maximum kurtosis of y(t) is attained for the f(l) at which the derivative of the objective function is zero:

$$\partial K\{f(l)\}/\partial f(l) = 0, \tag{6}$$

where the filter coefficients of f(l) converge to within a given tolerance through an iterative process [7].
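The iterative solution of Eq. (6) can be sketched as follows. Setting the derivative of the objective function (5) to zero leads to Wiggins' fixed-point update f = (Σy²/Σy⁴)·R⁻¹b, where R is the autocorrelation matrix of the input and b is the cross-correlation of the input with y³. The code below is a minimal numpy sketch under that formulation; the filter length, iteration count and the simulated impulse train are illustrative assumptions, not part of the original method.

```python
import numpy as np

def kurtosis_objective(y):
    """Normalized fourth-order moment of Eq. (5)."""
    return np.sum(y ** 4) / np.sum(y ** 2) ** 2

def med_filter(x, L=30, n_iter=30):
    """Adjust an L-tap FIR filter f so that the kurtosis of y = f * x is
    maximized, via the fixed-point iteration obtained from Eq. (6)."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    # Delayed copies of the input: X[l, n] = x(n - l), taps l = 0..L-1
    X = np.array([np.concatenate((np.zeros(l), x[:N - l])) for l in range(L)])
    R = X @ X.T                       # input autocorrelation matrix
    f = np.zeros(L)
    f[L // 2] = 1.0                   # start from a delayed delta (y = x)
    for _ in range(n_iter):
        y = f @ X
        b = X @ (y ** 3)              # cross-correlation of input with y^3
        f = np.linalg.solve(R, b) * (np.sum(y ** 2) / np.sum(y ** 4))
        f /= np.sqrt(np.sum(f ** 2))  # keep the filter normalized
    return f, f @ X

# Impulse train smeared by a decaying-oscillation transmission path
rng = np.random.default_rng(1)
impulses = np.zeros(1000)
impulses[::100] = 1.0
irf = np.exp(-np.arange(40) / 5.0) * np.sin(np.arange(40))
blurred = np.convolve(impulses, irf)[:1000] + 0.02 * rng.standard_normal(1000)
fopt, deconvolved = med_filter(blurred)
# the deconvolved output is markedly more impulsive than the input:
# kurtosis_objective(deconvolved) > kurtosis_objective(blurred)
```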

#### 3.1.3. Spectral kurtosis

Spectral kurtosis (SK) was first proposed in the 1980s for detecting impulsive events in sonar signals and was first applied to bearing fault diagnosis by Antoni [8]. The method computes a kurtosis at "each frequency line" in order to discover the presence of hidden nonstationarities and to indicate the frequency band in which they occur. The method has proved to be relatively robust against strong additive noise. The SK of a nonstationary signal is defined based on the Wold-Cramer decomposition, which describes any stochastic random process x(t) as the output of a causal, linear, time-varying system [9]:

$$x(t) = \int\_{-1/2}^{+1/2} H(t, f)\, e^{j2\pi ft}\, dZ\_x(f), \tag{7}$$

where dZ<sub>x</sub>(f) is an orthonormal spectral process of unit variance and H(t, f) is the time-varying transfer function, interpreted as the complex envelope of x(t) at frequency f. The SK of a signal x(t) is defined as a normalized fourth-order spectral cumulant given by [8, 9]:

$$K\_x(f) = \frac{\langle |H(t,f)|^4 \rangle}{\langle |H(t,f)|^2 \rangle^2} - 2,\tag{8}$$

in which K<sub>x</sub>(f) is the spectral kurtosis of signal x(t) around frequency f and 〈·〉 denotes averaging over time.

Antoni and Randall [9] proposed two techniques to calculate the spectral kurtosis: one based on the short-time Fourier transform (STFT) (the so-called kurtogram, for finding the optimal filter) and the other based on 1/3-binary filter banks (the fast kurtogram, for on-line condition monitoring and fault diagnosis). The kurtogram is a powerful tool for the analysis of nonstationary signals in bearing fault diagnosis, though it has been reported that the technique fails to detect a bearing fault when the defect signal has a low signal-to-noise ratio and contains non-Gaussian noise with high peaks [10, 11].
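A minimal STFT-based estimate of Eq. (8) can be sketched as follows; the Hanning window, the segment length and the simulated impulsive signal are illustrative choices and not part of the kurtogram algorithm of [9].

```python
import numpy as np

def spectral_kurtosis(x, nperseg=256):
    """STFT-based estimate of Eq. (8): SK(f) = <|H|^4>/<|H|^2>^2 - 2,
    with <.> the average over the short-time frames."""
    x = np.asarray(x, dtype=float)
    win = np.hanning(nperseg)
    hop = nperseg // 2
    frames = [x[i:i + nperseg] * win
              for i in range(0, len(x) - nperseg + 1, hop)]
    H = np.fft.rfft(frames, axis=1)        # time-frequency matrix H(t, f)
    mag2 = np.abs(H) ** 2
    return np.mean(mag2 ** 2, axis=0) / np.mean(mag2, axis=0) ** 2 - 2

# Stationary Gaussian noise plus repetitive impulses that excite a
# resonance at 0.2 of the sampling rate (a crude bearing-fault surrogate)
rng = np.random.default_rng(2)
x = rng.standard_normal(20000)
burst = np.exp(-np.arange(50) / 8.0) * np.cos(2 * np.pi * 0.2 * np.arange(50))
for k in range(0, 20000, 500):
    x[k:k + 50] += 5 * burst
sk = spectral_kurtosis(x)
freqs = np.fft.rfftfreq(256)               # normalized frequency axis
# SK stays near zero in the noise-only bands and is large near 0.2,
# flagging the band that carries the impulsive fault energy
```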

#### 3.1.4. Singular-value decomposition

Singular-value decomposition (SVD) is a numerical method which states that a matrix A of rank L can be decomposed into the product of three matrices: U (an orthogonal matrix), Λ (a diagonal matrix) and V<sup>T</sup> (the transpose of an orthogonal matrix V) [12]. This method is usually presented as

$$\mathbf{A}\_{m \times n} = \mathbf{U}\_{m \times m} \cdot \mathbf{\Lambda}\_{m \times n} \cdot \mathbf{V}\_{n \times n}^{\mathrm{T}},\tag{9}$$

where U<sup>T</sup>U = I and V<sup>T</sup>V = I; Λ<sub>m×n</sub> is a diagonal matrix containing the square roots of the eigenvalues of A<sup>T</sup>A, which can be expressed as Λ = diag(σ<sub>1</sub>, σ<sub>2</sub>, …, σ<sub>L</sub>), where L = min(m, n). The nonzero diagonal terms σ<sub>i</sub> (i = 1, 2, …, l) together with the zero terms σ<sub>l+1</sub> = … = σ<sub>L</sub> = 0 form the singular values of the matrix A.

SVD can be an effective tool for data reduction, for example, reducing the dimension of fault features. It is also a useful tool for signal de-noising, where it is applied as follows.

Suppose a bearing CM signal x(k) can be modeled as


$$x(k) = y(k) + n(k), \tag{10}$$

where y(k) and n(k) are, respectively, the uncontaminated signal and the noise. A conversion method such as phase space reconstruction can be used to transform the signal from a one-dimensional vector into a two-dimensional matrix as follows:

$$\mathbf{A} = \overline{\mathbf{A}} + \mathbf{N},\tag{11}$$

where the matrix $\overline{\mathbf{A}}$ represents the uncontaminated data y(k), which contains the characteristic fault information, and the matrix N represents the unwanted noise n(k). From Eqs. (9) and (11), we have

$$\mathbf{A} = \overline{\mathbf{A}} + \mathbf{N} = \begin{bmatrix} \mathbf{U}\_1 & \mathbf{U}\_0 \end{bmatrix} \begin{bmatrix} \mathbf{\Lambda}\_1 & \mathbf{0} \\ \mathbf{0} & \mathbf{\Lambda}\_0 \end{bmatrix} \begin{bmatrix} \mathbf{V}\_1^{\mathrm{T}} \\ \mathbf{V}\_0^{\mathrm{T}} \end{bmatrix}. \tag{12}$$

Λ<sub>1</sub> in Eq. (12) contains the significant singular values σ<sub>i</sub> (i = 1, 2, …, l), which are used to reconstruct the uncontaminated data, and Λ<sub>0</sub> contains the small singular values σ<sub>i</sub> (i = l + 1, …, L), which can be regarded as noise.

From Eq. (12), the de-noising signal matrix can be written as

$$\overline{\mathbf{A}} = \mathbf{U}\_1 \mathbf{\Lambda}\_1 \mathbf{V}\_1^{\mathrm{T}}. \tag{13}$$

$\overline{\mathbf{A}}$ is a matrix reconstructed using only the largest l singular values, whose values are greater than a preset threshold ε. The remaining singular values are set to zero such that

$$\sigma\_i = 0 \ \text{when} \ \sigma\_i \le \varepsilon, \quad i = l + 1, \ldots, L. \tag{14}$$
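The de-noising procedure of Eqs. (10)-(14) can be sketched as follows. The Hankel-type phase-space embedding, the window length m and the fixed rank l = 2 are illustrative assumptions; in practice the retained rank is chosen by thresholding the singular values as in Eq. (14).

```python
import numpy as np

def svd_denoise(x, m=50, rank=2):
    """Embed a 1-D signal in a Hankel (phase-space) matrix, keep only the
    `rank` largest singular values (Eqs. (13)-(14)), and map the reduced
    matrix back to a signal by averaging over its anti-diagonals."""
    x = np.asarray(x, dtype=float)
    n = len(x) - m + 1
    A = np.array([x[i:i + m] for i in range(n)])  # n x m trajectory matrix
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s[rank:] = 0.0                                # zero the small singular values
    A_bar = (U * s) @ Vt                          # rank-reduced matrix, Eq. (13)
    # Anti-diagonal averaging: every A_bar[i, j] is an estimate of x(i + j)
    out = np.zeros(len(x))
    cnt = np.zeros(len(x))
    for i in range(n):
        out[i:i + m] += A_bar[i]
        cnt[i:i + m] += 1
    return out / cnt

rng = np.random.default_rng(3)
t = np.arange(2000)
clean = np.sin(2 * np.pi * 0.01 * t)      # a pure sinusoid embeds with rank 2
noisy = clean + 0.5 * rng.standard_normal(2000)
denoised = svd_denoise(noisy, m=50, rank=2)
# the residual of `denoised` against `clean` is far smaller than that of `noisy`
```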

#### 3.2. Frequency-domain data analysis techniques

#### 3.2.1. Power spectrum

Spectrum analysis is the most popular frequency-domain analysis technique; it transforms time-domain data into discrete frequency components via the Fourier transform. The power spectrum of a time-domain signal is defined as the squared magnitude of the Fourier transform of the signal. It can be written as

$$P(\omega) = \left| \int\_{-\infty}^{+\infty} x(t)\, e^{-j\omega t}\, dt \right|^2 = X(\omega)X^\*(\omega), \tag{15}$$

where X(ω) = ∫<sub>−∞</sub><sup>+∞</sup> x(t)e<sup>−jωt</sup> dt is the Fourier transform of the signal, X<sup>*</sup>(ω) is its complex conjugate, and ω is the angular frequency in rad/s. The power spectrum is frequently employed to extract the characteristic defect frequency components of a stationary CM signal.
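As a quick numerical check of Eq. (15), the sketch below computes the power spectrum of a 50 Hz tone via the FFT; the sampling rate and tone frequency are arbitrary illustrative values.

```python
import numpy as np

fs = 1000.0                            # sampling rate in Hz (illustrative)
t = np.arange(0, 1.0, 1 / fs)
x = np.sin(2 * np.pi * 50 * t)         # 50 Hz tone
X = np.fft.rfft(x)
P = np.abs(X) ** 2                     # power spectrum |X(w)|^2 = X(w) X*(w)
freqs = np.fft.rfftfreq(len(x), 1 / fs)
peak = freqs[np.argmax(P)]             # -> 50.0 Hz
```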

#### 3.2.2. Cepstrum

A cepstrum is defined as the power spectrum of the logarithm of the power spectrum of a signal [13]. The name of cepstrum was derived by reversing the first four letters of spectrum. There are four types of cepstra: a real cepstrum, a complex cepstrum, a power cepstrum and a phase cepstrum.

The real cepstrum of a signal x(t) is given by

$$c(t) = \frac{1}{2\pi} \int\_{-\pi}^{\pi} \log |X(\omega)|\, e^{j\omega t}\, d\omega. \tag{16}$$

The complex cepstrum of a signal x(t) is given by

$$\overrightarrow{c}(t) = \frac{1}{2\pi} \int\_{-\pi}^{\pi} \log\left(X(\omega)\right) e^{j\omega t}\, d\omega. \tag{17}$$

The power cepstrum of a signal x(t) is given by

$$c(t)^2 = \frac{1}{2\pi} \left| \int\_{-\pi}^{\pi} \log |X(\omega)|\, e^{j\omega t}\, d\omega \right|^2. \tag{18}$$

The most commonly used cepstrum in machine condition monitoring is the power cepstrum. The following steps can be taken to calculate the power cepstrum of a signal: signal → power spectrum of the signal → logarithm of the power spectrum → inverse Fourier transform of the log power spectrum → squared magnitude of the result (the power cepstrum). The main application of cepstrum analysis in machine condition monitoring is for signals containing families of harmonics and sidebands, where it is the whole family rather than any individual frequency component that characterizes the fault [13] (typical for bearing CM signals). The technical terms used in a cepstrum are quefrency, gamnitude and rahmonics, corresponding to frequency, amplitude and harmonics in a spectrum analysis. The cepstrum can be used for the harmonics generated by bearing faults, but only if they are well separated. In contrast, envelope analysis, introduced in the next section, is not subject to this restriction.
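The steps above can be sketched as follows; the 50 Hz harmonic family is an illustrative stand-in for a periodic fault signature, and the small constant added before the logarithm guards against taking the log of zero.

```python
import numpy as np

def power_cepstrum(x):
    """signal -> power spectrum -> log -> inverse FFT -> squared magnitude."""
    spec = np.abs(np.fft.fft(x)) ** 2
    return np.abs(np.fft.ifft(np.log(spec + 1e-12))) ** 2  # 1e-12 avoids log(0)

fs = 1000.0
t = np.arange(0, 2.0, 1 / fs)
# A family of harmonics of 50 Hz, as produced by a periodic fault
x = sum(np.sin(2 * np.pi * 50 * k * t) / k for k in range(1, 6))
ceps = power_cepstrum(x)
quefrency = np.arange(len(x)) / fs
# The whole harmonic family collapses into rahmonic peaks at multiples of
# 1/50 Hz = 0.02 s; e.g. ceps[int(0.02 * fs)] dwarfs the neighbouring bins
```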

#### 3.2.3. Envelope spectrum

It has been pointed out by Randall [13] that the benchmark method for rolling element bearing diagnosis is envelope analysis, as the spectrum of raw bearing CM signals often contains little information about bearing faults. In envelope analysis, a time waveform is bandpass filtered in a high-frequency band where the fault signal is amplified by the structural resonances, and the amplitude modulation is extracted to form the envelope signal for bearing diagnosis [13]. The procedure for envelope analysis is briefly described below:

Given a real signal x(t), its Hilbert transform h(t) = H{x(t)} is defined as


$$h(t) = H\{x(t)\} = \frac{1}{\pi} \int\_{-\infty}^{+\infty} \frac{x(\tau)}{t - \tau}\, d\tau = \frac{1}{\pi t} \ast x(t). \tag{19}$$

The real signal x(t) and its Hilbert transform h(t) together form a new complex analytic signal z(t),

$$z(t) = x(t) + jh(t). \tag{20}$$

The envelope signal E(t) is simply the absolute value of the analytic signal z(t),

$$E(t) = |z(t)| = |x(t) + jh(t)| = \sqrt{x^2(t) + h^2(t)}. \tag{21}$$

After taking a fast Fourier transform of the envelope signal E(t), an envelope spectrum can be obtained. An envelope spectrum can reveal the repetitive characteristic defect frequencies caused by bearing faults even in the presence of small random fluctuations. The envelope spectrum of the noise-added bearing defect signal shown in Figure 2(b) is given in Figure 4. It is noted that due to the high noise level in the signal (SNR = 0.05, representing an initial bearing defect), the envelope spectrum is compromised by many artificial frequency components and yields only subtle bearing defect information. Also due to such interference, the calculated defect frequency (the modulation frequency) of the outer race fault and its higher harmonics (1×BPFO, 2×BPFO and so on) shown in the envelope spectrum are slightly lower than the simulated defect frequency (1×BPFO = 78.18 Hz).

Figure 4. Envelope spectrum of the noise-added bearing defect signal shown in Figure 2(b).
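The envelope procedure of Eqs. (19)-(21) can be sketched as follows, using the one-sided-spectrum (FFT) construction of the analytic signal; the 4 kHz carrier and 78 Hz modulation rate are illustrative values chosen to mimic the simulated outer race defect.

```python
import numpy as np

def envelope_spectrum(x, fs):
    """Eqs. (19)-(21): build the analytic signal z(t) = x(t) + j h(t) by
    zeroing the negative half of the spectrum, take |z| as the envelope
    E(t), then Fourier-transform the (mean-removed) envelope."""
    N = len(x)
    X = np.fft.fft(x)
    w = np.zeros(N)
    w[0] = 1.0
    w[1:(N + 1) // 2] = 2.0            # double the positive frequencies
    if N % 2 == 0:
        w[N // 2] = 1.0
    z = np.fft.ifft(X * w)             # analytic signal z(t)
    env = np.abs(z)                    # envelope E(t)
    env -= np.mean(env)                # drop the DC term before the FFT
    return np.fft.rfftfreq(N, 1 / fs), np.abs(np.fft.rfft(env))

# A 4 kHz resonance amplitude modulated at a 78 Hz "defect" rate
fs = 20000.0
t = np.arange(0, 1.0, 1 / fs)
x = (1 + np.cos(2 * np.pi * 78 * t)) * np.sin(2 * np.pi * 4000 * t)
freqs, spec = envelope_spectrum(x, fs)
peak = freqs[np.argmax(spec)]          # -> 78.0 Hz, the modulation rate
```

Note that the raw spectrum of x peaks near the 4 kHz carrier; only the envelope spectrum exposes the low-frequency modulation rate that identifies the fault.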

#### 3.2.4. Higher-order spectra

Higher-order spectra (HOS), also known as polyspectra, consist of higher-order moments of spectra, which are able to detect nonlinear interactions between frequency components [14]. Higher-order spectra are defined as the Fourier transforms of the corresponding cumulant sequences of a signal. For instance, the first-order cumulant of a stationary process is the mean of the signal:

$$C\_{1x} = E\{x(n)\}. \tag{22}$$

The second- and third-order cumulants of a stationary process are defined as

$$C\_{2x}(k) = E\{x^\*(n)x(n + k)\}, \tag{23}$$

and

$$C\_{3x}(k, l) = E\{x^\*(n)x(n + k)x(n + l)\} - C\_{2x}(k)C\_{2x}(l - m) - C\_{2x}(l)C\_{2x}(k - m). \tag{24}$$

From Eqs. (23) and (24), it is clear that the power spectrum is, in fact, the FT of the second-order cumulant of a signal, and the third-order spectrum, also termed the bispectrum, is the FT of the third-order cumulant. Note that the bispectrum S<sub>3x</sub>(ω<sub>1</sub>, ω<sub>2</sub>) is a function of two frequencies. Therefore, it can detect phase coupling between two frequency components, which appears as a third frequency component at the sum or difference of the first two (frequencies) with a phase that is also the sum or difference of the first two.

Traditionally, the power spectrum is used to break down a time waveform into a series of frequency components. However, the power spectrum cannot determine whether peaks at harmonically related positions are phase coupled, since it uses only the magnitude of the Fourier components and neglects the phase information. Higher-order spectra such as the bispectrum use the phase information of the signal and are capable of detecting phase coupling of frequency components in the spectra. Therefore, a bispectrum provides phase information that a power spectrum analysis cannot.

The motivation behind the use of higher-order spectrum analysis is summarized as follows. Firstly, the technique can suppress Gaussian noise of unknown spectral characteristics in fault detection, parameter estimation and classification problems. If Gaussian noise is embedded in a non-Gaussian signal, a HOS transform can eliminate the noise. On the other hand, periodic and quasi-periodic signals and self-emitting signals from complex machinery in practical applications are typically non-Gaussian and will be preserved in the transform. Secondly, a HOS analysis preserves phase information. For example, there are situations in practice in which the interaction between two harmonic components creates a third component at their sum and/or difference frequencies. Thirdly, HOS can play a key role in detecting and characterizing the type of nonlinearity in a system from its output data.
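As an illustration of phase-coupling detection, the sketch below forms a direct bispectrum estimate by averaging X(f<sub>1</sub>)X(f<sub>2</sub>)X<sup>*</sup>(f<sub>1</sub> + f<sub>2</sub>) over segments; the segment length, the bin indices and the synthetic quadratically coupled tones are illustrative assumptions.

```python
import numpy as np

def bispectrum(x, nperseg=256):
    """Direct bispectrum estimate: average X(f1) X(f2) X*(f1 + f2) over
    windowed segments. Phase-coupled triplets survive the averaging;
    components with independent phases average toward zero."""
    x = np.asarray(x, dtype=float)
    win = np.hanning(nperseg)
    segs = [x[i:i + nperseg] * win
            for i in range(0, len(x) - nperseg + 1, nperseg)]
    X = np.fft.fft(segs, axis=1)
    m = nperseg // 2
    B = np.zeros((m, m), dtype=complex)
    for f1 in range(m):
        for f2 in range(m):
            B[f1, f2] = np.mean(X[:, f1] * X[:, f2] * np.conj(X[:, f1 + f2]))
    return np.abs(B)

# Tones at bins 20 and 35 plus a quadratically coupled tone at bin 55
# whose phase is the sum of the other two phases in every segment
rng = np.random.default_rng(4)
nperseg, nseg = 256, 200
f1, f2 = 20, 35
n = np.arange(nperseg)
segments = []
for _ in range(nseg):
    p1, p2 = rng.uniform(0, 2 * np.pi, 2)
    segments.append(np.cos(2 * np.pi * f1 * n / nperseg + p1)
                    + np.cos(2 * np.pi * f2 * n / nperseg + p2)
                    + np.cos(2 * np.pi * (f1 + f2) * n / nperseg + p1 + p2))
x = np.concatenate(segments)
B = bispectrum(x, nperseg)
i, j = np.unravel_index(np.argmax(B), B.shape)  # peak at the coupled pair
```

Because the phases p1 and p2 are random from segment to segment, an ordinary power spectrum of this signal is identical to that of three unrelated tones; only the bispectrum reveals that the third tone is generated by the first two.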

For a better illustration of the signal-processing techniques discussed above in data analysis, a number of case studies are given in the following section to exemplify the usage of the above algorithms in bearing fault detection.

#### 3.2.5. Case studies

Case 1: (AR & MED de-noising): In this case study, the AR model described in Section 3.1.1 and the MED method described in Section 3.1.2 are employed in the analysis to filter the noise-added bearing defect signal shown in Figure 2(b) prior to an envelope analysis to enhance the bearing defect frequency components in the envelope spectrum for a more reliable bearing fault diagnosis. The results are presented in Figures 5 and 6. Compared to the result in Figure 4, it is shown that the two-step de-noising by the AR and MED models has successfully suppressed the artificial components and enhanced the defect signal representation in the spectrum.


Figure 5. The time waveform and the envelope spectrum of the simulated bearing defect signal after preprocessing by an AR model: (a) the time waveform after filtering by an AR model; (b) envelope spectrum of the signal shown in Figure 5(a).

Figure 6. The time waveform and the envelope spectrum of the simulated bearing defect signal after preprocessing by an AR model and further filtering by an MED filter: (a) the time waveform after de-noising; (b) envelope spectrum of the signal shown in Figure 6(a).

Case 2 (fast kurtogram): The signal used in this case study is shown in Figure 7. The signal is generated by adding 0-dB white noise to the simulated bearing defect signal presented in Figure 2(a). In the analysis of the signal, the fast kurtogram algorithm [9] described in Section 3.1.3 is first employed to determine the bearing resonance band (i.e., to obtain the center frequency and the bandwidth) having the highest band energy (corresponding to the highest kurtosis value) in the signal. A five-level fast kurtogram based on a filter bank and the fast Fourier transform is shown in Figure 8, and the highest band energy is found to occur in the band encircled by the white ellipse in the figure. The band is centered at 3958.33 Hz (close to the simulated bearing resonance frequency of 4000 Hz listed in Table 2) and has a bandwidth of 416.67 Hz. The band has the highest kurtosis value of 0.1, which occurs at level 4.5 in the decomposition. The next step after the decomposition is to perform an envelope analysis on the band-filtered optimum frequency band signal obtained from the fast kurtogram; the result is shown in Figure 9(a) and (b). It is shown that the spectral kurtosis technique can detect the characteristic defect frequency from weak bearing defect signals.

Figure 7. Simulated bearing defect signal with 0-dB white noise added.

Figure 8. Fast kurtogram calculated based on the defect signal shown in Figure 7.

Figure 9. Spectrum kurtosis analysis of the bearing defect signal: (a) envelope of the band-filtered signal; (b) Fourier transform of the envelope signal.

#### 3.3. Time-frequency analysis


52 Bearing Technology


A major limitation of frequency-domain analysis is that it is only useful in dealing with stationary signals. Transient or nonstationary components in the data are not captured by a traditional frequency-domain analysis. To overcome this limitation, time-frequency analysis techniques have been developed to provide a better understanding of how spectral properties change with time. In time-frequency analysis, waveform signals are analyzed in both the time and frequency domains to capture the progressive change of spectral components. The most commonly employed time-frequency analysis techniques are the short-time Fourier transform (STFT), the wavelet transform (WT), the Wigner-Ville distribution (WVD), and adaptive signal analysis techniques such as the empirical mode decomposition (EMD) technique.

#### 3.3.1. Short-time Fourier transform

The short-time Fourier transform adds a time variable to the traditional Fourier spectrum, thus allowing it to investigate the time-varying nature of a signal. In an STFT analysis, a continuous time-domain waveform is multiplied by a sliding narrow time window, and a Fourier transform is computed on the windowed signal at each time step of the sliding window. The STFT analysis is based on the consideration that if a signal can be considered stationary over the length of the chosen sliding time window, a Fourier transform can be performed on the windowed signal segment for each new position of the sliding window to obtain a satisfactory time-frequency analysis of a nonstationary signal.

A short-time Fourier transform of a continuous signal xðtÞ can be computed by

$$\text{STFT}\_x(t,\omega) = \int\_{-\infty}^{+\infty} x(\tau)\, w^\*(\tau - t)\, e^{-j\omega\tau}\, d\tau, \tag{25}$$

where w(τ − t) is a finite time window function centered at time t. The asterisk (\*) indicates a complex conjugate, and the analysis window can be regarded as the impulse response of a low-pass filter. A spectrogram, which is the squared amplitude of an STFT, is often used to display the transformation result for signal analysis and interpretation. A major limitation of STFT is that the analysis has a uniform resolution in both the time and frequency planes, implying that a small time window will have a good time resolution but poor frequency resolution, and vice versa. This drawback limits the technique to the analysis of signals with slowly changing dynamics. Another time-frequency analysis technique, the wavelet transform, has been developed to overcome this limitation.
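The sliding-window procedure described above can be sketched in a few lines of NumPy. The function name, the Hann window, and the frame parameters below are illustrative choices, not the chapter's implementation:

```python
import numpy as np

def stft(x, fs, win_len=256, hop=128):
    """Short-time Fourier transform of a real signal via a sliding window.

    Returns (times, freqs, S); the spectrogram is |S|**2. A minimal sketch:
    production code would also handle zero-padding and overlap-add.
    """
    window = np.hanning(win_len)
    starts = range(0, len(x) - win_len + 1, hop)
    frames = np.array([x[s:s + win_len] * window for s in starts])
    S = np.fft.rfft(frames, axis=1)                    # one FFT per windowed frame
    times = np.array([s + win_len / 2 for s in starts]) / fs
    freqs = np.fft.rfftfreq(win_len, d=1.0 / fs)
    return times, freqs, S

# A stepped-frequency test: 100 Hz in the first half, 400 Hz in the second.
fs = 2000
t = np.arange(0, 1.0, 1.0 / fs)
x = np.where(t < 0.5, np.sin(2 * np.pi * 100 * t), np.sin(2 * np.pi * 400 * t))
times, freqs, S = stft(x, fs)
peak = freqs[np.argmax(np.abs(S), axis=1)]             # dominant frequency per frame
```

The per-frame peak frequency tracks the 100 Hz to 400 Hz jump, which a single Fourier transform of the whole record cannot localize in time.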

#### 3.3.2. Wavelet transform

Wavelet transform (WT) is a time scale representation of a signal. It is the inner product of a signal with the translated and scaled family of a mother wavelet function ψ(t). In general, WT analysis can be categorized into three forms: a continuous wavelet transform (CWT), a discrete wavelet transform (DWT), and a more general wavelet packet decomposition (WPD).

In CWT, the wavelet transform of a continuous signal xðtÞ is calculated using

$$W\_f(u,s) = \frac{1}{\sqrt{s}} \int\_{-\infty}^{+\infty} x(t)\, \psi^\*\!\left(\frac{t-u}{s}\right) dt, \tag{26}$$

where s is the scale parameter and u is the time translation. If one considers the wavelet function ψ(t) as a bandpass impulse response, then the wavelet transform is simply a bandpass analysis. Care should be taken to ensure that the decomposed signal can be perfectly reconstructed from its wavelet representation when using a wavelet transform. Thus, a WT has to satisfy the criterion known as the admissibility condition.

A CWT is mainly used in the data analysis of scientific research. In practical applications, the discrete version, the discrete wavelet transform (DWT), is more popular due to its small computation cost and excellent signal compaction properties. A DWT decomposes a signal into different frequency bands by passing it through a series of filters. In each step of the decomposition, the signal is passed through a pair of low- and high-pass filters simultaneously, accompanied by downsampling. A more general form of wavelet analysis is the so-called wavelet packet decomposition (WPD). In WPD, a signal is split into two parts at each stage of the decomposition: one contains a vector of approximation coefficients and the other a vector of detail coefficients. Both the details and the approximations can be split further at the next level of decomposition, which offers a greater range of possibilities for decomposing a signal than ordinary wavelet analysis. The wavelet transform has been widely employed in signal processing for condition monitoring and fault diagnosis of rotating machines [15].
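One level of the filter-and-downsample step described above can be illustrated with the Haar filter pair, the simplest admissible wavelet. This is a hedged sketch of the principle, not the chapter's implementation:

```python
import numpy as np

def haar_dwt_step(x):
    """One DWT level with the Haar filter pair.

    The signal is passed through a low-pass (averaging) and a high-pass
    (differencing) filter and downsampled by two, yielding the approximation
    and detail coefficient vectors (signal length assumed even).
    """
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)   # low-pass + downsample
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)   # high-pass + downsample
    return approx, detail

def haar_idwt_step(approx, detail):
    """Perfect reconstruction of the signal from one decomposition level."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2)
    x[1::2] = (approx - detail) / np.sqrt(2)
    return x

x = np.array([4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0])
a, d = haar_dwt_step(x)
# An ordinary DWT splits only the approximation `a` at the next level;
# a WPD would split both `a` and `d` again.
```

The round trip through `haar_idwt_step` reproduces the input exactly, which is the perfect-reconstruction property guaranteed by the admissibility condition.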

#### 3.3.3. Wigner-Ville distribution

Another popular time-frequency analysis technique is the Wigner-Ville distribution (WVD). The WVD is the core distribution of the quadratic class of time-frequency distributions. It yields an ideal resolution for mono-component, linearly frequency-modulated signals, but produces undesired cross-terms for multicomponent and nonlinearly frequency-modulated signals. The Wigner-Ville distribution can be viewed as a particular case of the Cohen class of distributions, which yield a time-frequency energy density computed by correlating the signal with a time and frequency translation of itself.

The WVD of a signal xðtÞ is defined by


$$W\_x(t,\omega) = \frac{1}{2\pi} \int\_{-\infty}^{+\infty} x\!\left(t + \frac{\tau}{2}\right) x^\*\!\left(t - \frac{\tau}{2}\right) e^{-j\omega\tau}\, d\tau, \tag{27}$$

where x\* denotes the conjugate of x. Thus, the Wigner-Ville integral is in fact a Fourier transform of the inner product of a signal and its conjugate with a time delay variable τ. The bilinear nature of this procedure avoids the loss of time-frequency resolution encountered with the finite sliding time window in STFT.

Compared with other distributions, the WVD has the desirable property of fulfilling the marginal condition, thus the total signal energy can be calculated in time or in frequency using the Plancherel formula:

$$\|x\|^2 = \int\_{-\infty}^{+\infty} |x(t)|^2\, dt = \frac{1}{2\pi} \int\_{-\infty}^{+\infty} |X(\omega)|^2\, d\omega. \tag{28}$$

The quantities |x(t)|<sup>2</sup> and |X(ω)|<sup>2</sup> can be interpreted as the energy densities in the time and frequency domains, respectively. This enables a direct computation of the energy present in a given time-frequency box from the WVD output.

Another important feature of WVD is that its first conditional moment for a given time tc equals the instantaneous frequency:

$$\frac{d\phi}{dt} = \langle \omega \rangle\_{t\_c} = \frac{1}{|x(t\_c)|^2} \int\_{-\infty}^{+\infty} \omega\, W(t\_c,\omega)\, d\omega. \tag{29}$$

Therefore, it can be computed as the average of all frequencies ω present in the time-frequency plane at a time tc.

In discrete form, the WVD is defined as

$$\text{WVD}(n,k) = \sum\_{q=-N}^{N-1} R[n,q]\, e^{-j2\pi kq/N}, \tag{30}$$

where R[n, q] is the instantaneous correlation given by

$$R[n,q] = \mathbf{x}\left[n + \frac{q}{2}\right] \cdot \mathbf{x}^\*\left[n - \frac{q}{2}\right],\tag{31}$$

in which n is the sample index of the analytic (or interpolated) form of the discrete signal x[n] and q is the discrete time-lag variable.

Since the instantaneous correlation is centered on sample n, the delay q is distributed between the delayed sample x[n − q/2] and the corresponding advanced sample x[n + q/2]. It is thus necessary to calculate the value of x[n] at the two half-integer positions using an interpolation. In addition, positions at the extremes of x[n] are padded with zeros in order to compute the Fourier transform. A major drawback of WVD analysis is that it can induce artifacts and negative values, which need to be properly compensated in the signal analysis [16].
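A discrete WVD can be sketched as follows. Instead of the half-integer interpolation described above, this sketch uses the common lag substitution R[n, q] = x[n + q] x\*[n − q], which avoids interpolation at the cost of halving the unambiguous frequency range; all names are illustrative:

```python
import numpy as np

def wigner_ville(x):
    """Discrete Wigner-Ville distribution of a (preferably analytic) signal.

    Builds the instantaneous correlation R[n, q] = x[n+q] x*[n-q] for each
    time index n (out-of-range lags are zero-padded) and Fourier-transforms
    it over the lag variable q.
    """
    x = np.asarray(x, dtype=complex)
    N = len(x)
    W = np.zeros((N, N))
    for n in range(N):
        qmax = min(n, N - 1 - n)                 # largest lag fully inside record
        R = np.zeros(N, dtype=complex)
        q = np.arange(-qmax, qmax + 1)
        R[q % N] = x[n + q] * np.conj(x[n - q])  # instantaneous correlation
        W[n] = np.fft.fft(R).real                # Fourier transform over lag q
    return W

# A pure tone concentrates its energy along a single frequency row.
N = 64
f0 = 8                                           # cycles over the record
tone = np.exp(2j * np.pi * f0 * np.arange(N) / N)
W = wigner_ville(tone)
peak_bins = W[N // 4: 3 * N // 4].argmax(axis=1)  # central samples, away from edges
```

With this lag convention the tone appears at bin 2·f0, the frequency-axis doubling mentioned above; away from the record edges every time slice peaks at the same bin, reflecting the WVD's ideal concentration for a mono-component signal.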

#### 3.3.4. Adaptive signal decomposition

Empirical mode decomposition (EMD) is an adaptive time-frequency analysis technique originally proposed by Huang [17] in 1998. It is based on the local characteristic time scales of a signal and can decompose the signal into a set of complete and almost orthogonal components termed intrinsic mode functions (IMFs). Lei et al. [18] provided a review of successful application examples of the EMD technique in fault diagnosis of rotating machines.

In EMD analysis, the decomposed signal can be represented by

$$\mathbf{x}(t) = \sum\_{i=1}^{N} \mathbf{C}\_i(t) + r\_N(t), \tag{32}$$

where C<sub>i</sub>(t) represents the ith IMF component and r<sub>N</sub>(t) is the residual after the EMD decomposition.

The IMFs represent the natural oscillatory modes embedded in the signal and can serve as basis functions, which are determined by the signal itself rather than by predetermined kernels. Thus, the decomposition is a self-adaptive signal processing method suitable for nonlinear and nonstationary data analysis.

Although the EMD technique has been successfully employed in the analysis of nonlinear and nonstationary signals in various applications, the algorithm itself also has a number of weaknesses, for instance, a lack of a solid theoretical foundation, end effects, a sifting stop criterion and extremum interpolation. To overcome some of these deficiencies, an improved EMD algorithm or a so-called ensemble empirical mode decomposition (EEMD) technique has been developed [19]. EEMD is a noise-assisted data analysis technique which imposes a white Gaussian noise into a signal and then decomposes the mixed signal by using the EMD algorithm. A major advantage of the EEMD technique is that no missing scales will be presented in the decomposition and the IMF components in different scales of the signal are automatically projected into proper reference scales established by the white noise in the background [20].
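The sifting procedure at the heart of EMD can be sketched as below. This is a deliberately simplified version, with linear envelopes instead of the cubic splines used in practice and a fixed number of sifting iterations instead of a convergence criterion; all names are illustrative:

```python
import numpy as np

def local_extrema(x):
    """Indices of interior local maxima and minima of a 1-D array."""
    dx = np.diff(x)
    maxima = np.where((dx[:-1] > 0) & (dx[1:] <= 0))[0] + 1
    minima = np.where((dx[:-1] < 0) & (dx[1:] >= 0))[0] + 1
    return maxima, minima

def sift(x, n_sifts=10):
    """Extract one IMF by repeatedly removing the mean of the two envelopes."""
    h = x.copy()
    n = np.arange(len(x))
    for _ in range(n_sifts):
        maxima, minima = local_extrema(h)
        if len(maxima) < 2 or len(minima) < 2:
            break
        upper = np.interp(n, maxima, h[maxima])   # linear upper envelope
        lower = np.interp(n, minima, h[minima])   # linear lower envelope
        h = h - (upper + lower) / 2
    return h

def emd(x, max_imfs=5):
    """Decompose x into IMFs plus a residual, so x = sum(IMFs) + residual."""
    residual = np.asarray(x, dtype=float).copy()
    imfs = []
    for _ in range(max_imfs):
        maxima, minima = local_extrema(residual)
        if len(maxima) + len(minima) < 4:          # residual is nearly monotonic
            break
        imf = sift(residual)
        imfs.append(imf)
        residual = residual - imf
    return imfs, residual

# Fast tone + slow tone: the first IMF should track the fast component.
t = np.linspace(0, 1, 1000, endpoint=False)
fast, slow = np.sin(2 * np.pi * 10 * t), np.sin(2 * np.pi * 1 * t)
imfs, residual = emd(fast + slow)
```

EEMD would wrap `emd` in a loop, adding an independent white-noise realization to the signal on each pass and averaging the IMFs over the ensemble.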

#### 3.3.5. Case studies

Case 3 (EMD de-noising): In this analysis, the noise-added signal shown in Figure 7 is adaptively decomposed into 14 IMF components and a residual component using the EMD technique described in the previous section. The decomposition result is shown in Figure 10. The correlation coefficient of each IMF component with the original signal can be calculated using

Condition Monitoring and Fault Diagnosis of Roller Element Bearing http://dx.doi.org/10.5772/67143 57

$$\rho\_{xy} = \frac{n\sum x\_i y\_{j,i} - \sum x\_i \sum y\_{j,i}}{\sqrt{n\sum x\_i^2 - \left(\sum x\_i\right)^2} \sqrt{n\sum y\_{j,i}^2 - \left(\sum y\_{j,i}\right)^2}},\tag{33}$$

where x<sub>i</sub> (i = 1, 2, …, n) is the original data, y<sub>j,i</sub> (i = 1, 2, …, n) is the data of the jth IMF component, and n is the data record length. The calculated correlation coefficients for the 14 IMF components and the residual are listed in Table 3.
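Eq. (33) is the ordinary Pearson correlation coefficient; a direct transcription can be checked against NumPy's built-in `corrcoef` (the function name is illustrative):

```python
import numpy as np

def corr_coefficient(x, y):
    """Correlation coefficient of Eq. (33) between a signal x and an IMF y."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = len(x)
    num = n * np.sum(x * y) - np.sum(x) * np.sum(y)
    den = (np.sqrt(n * np.sum(x**2) - np.sum(x)**2)
           * np.sqrt(n * np.sum(y**2) - np.sum(y)**2))
    return num / den

# Sanity check on correlated random data.
rng = np.random.default_rng(3)
a = rng.standard_normal(1000)
b = 0.5 * a + rng.standard_normal(1000)
```

In the case study, this value is computed once per IMF and the component with the largest coefficient is retained for envelope analysis.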


Figure 10. The IMF components and residual of the bearing defect signal from the EMD decomposition.


Table 3. The correlation coefficients of the IMF components and the residual.

It is shown in Table 3 that the first IMF component (IMF1) has the highest correlation coefficient with the original signal, implying that it is most closely related to the defect signal. Therefore, an envelope analysis is undertaken on the IMF1 component in the next step of the analysis. The result is shown in Figure 11, where the bearing defect frequency and its second-order harmonic can be clearly discriminated from the envelope spectrum.

Figure 11. The envelope spectrum of the first IMF component of the bearing defect signal.

Case 4 (Signal de-noising using SVD decomposition): The singular-value decomposition described in Section 3.1.4 can be employed to filter out the noise from bearing condition-monitoring signals to enhance the impulses produced by a bearing defect. In this approach, the one-dimensional time waveform of the bearing condition-monitoring signal x = (x<sub>1</sub>, x<sub>2</sub>, …, x<sub>n</sub>) (as shown in Figure 12(a)) is rearranged by an incrementing process to form a Hankel matrix A(p, q), which is then decomposed into three matrices using Eq. (9) in Section 3.1.4 to obtain the singular values σ<sub>i</sub>, i = 1, 2, …, L, whose values are shown in Figure 13(a).

Figure 12. Noise-added bearing defect signal and its envelope spectrum: (a) time waveform (note: the sampling frequency in this simulation is reduced to 5 kHz to avoid the requirement of large computer memory in the SVD decomposition); (b) envelope spectrum.

Figure 13. (a) Singular values and (b) the series of the difference between two sequential singular values.

The next step is to calculate the difference between subsequent singular values, b<sub>i</sub> = σ<sub>i</sub> − σ<sub>i+1</sub> (i = 1, 2, ⋯, L − 1), to form a series vector B = (b<sub>1</sub>, b<sub>2</sub>, …, b<sub>L−1</sub>), which reflects the variation between neighboring singular values. Vector B is plotted in Figure 13(b) for illustration. When the difference between two neighboring singular values is large, they form a larger peak in the difference vector B (see Figure 13(b)), indicating that there is a small correlation between the defect and noise signals at the corresponding singular values. The singular values up to and including the largest difference peak are kept, the singular values after this peak are set to zero, and the modified singular-value vector is then substituted into Eq. (9) to obtain a new Hankel matrix (note: as the first difference peak happens to be the largest peak in our case, the singular values associated with the subsequent three peaks are also used in the modified singular-value vector). We can then reverse the process of the first step to obtain the de-noised bearing defect signal shown in Figure 14(a). The envelope spectrum of the de-noised signal (Figure 14(a)) is shown in Figure 14(b). It is shown that the SVD de-noising can yield a much cleaner spectrum, mainly containing the bearing defect frequency component and its higher harmonics.
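The Hankel-matrix de-noising procedure of Case 4 can be sketched as follows. Here the number of retained singular values is passed in directly rather than selected from the peak of the difference series, and all names are illustrative:

```python
import numpy as np

def svd_denoise(x, rows, keep):
    """De-noise a 1-D signal by zeroing small singular values of its Hankel matrix.

    `rows` sets the Hankel matrix height; `keep` is the number of retained
    singular values (chosen here by the caller, not from the difference series).
    """
    x = np.asarray(x, dtype=float)
    cols = len(x) - rows + 1
    A = np.array([x[i:i + cols] for i in range(rows)])       # Hankel matrix
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s[keep:] = 0.0                                           # drop the noise subspace
    A_d = (U * s) @ Vt
    # Average along anti-diagonals to map the matrix back to a 1-D signal.
    out = np.zeros(len(x))
    counts = np.zeros(len(x))
    for i in range(rows):
        out[i:i + cols] += A_d[i]
        counts[i:i + cols] += 1
    return out / counts

# A noisy tone: a sinusoid has a rank-2 Hankel matrix, so keeping the two
# dominant singular values recovers it from the noise.
rng = np.random.default_rng(1)
t = np.arange(500)
clean = np.sin(2 * np.pi * t / 25)
noisy = clean + 0.5 * rng.standard_normal(500)
denoised = svd_denoise(noisy, rows=50, keep=2)
```

The anti-diagonal averaging is the "reverse the process of the first step" mentioned in the text: each signal sample appears on one anti-diagonal of the Hankel matrix, so averaging that anti-diagonal of the rank-reduced matrix gives the de-noised sample.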

#### 3.4. Statistical-based bearing fault diagnosis

#### 3.4.1. Statistical features in the time domain


Some useful statistical features obtained directly from a time waveform signal can be used to evaluate the health condition of a rolling element bearing. Such features can be grouped into two categories: (a) dimensional features and (b) nondimensional features. The group of dimensional statistical features includes the peak value, root-mean-square (RMS) value, absolute mean value, and variance. This feature group is closely related to bearing fault severity; for instance, their values increase as the bearing fault severity increases. However, care needs to be taken, as their values are also influenced by working conditions such as the load or rotating speed of a machine.

Figure 14. (a) The de-noised bearing defect signal after SVD decomposition; (b) the envelope spectrum.

For a smooth and ergodic continuous time-domain signal xðtÞ, its peak value can be calculated as

$$\mathbf{x}\_p = \mathbf{Max}[|\mathbf{x}(t)|].\tag{34}$$

Its RMS value which reflects the power level of the signal is calculated by

$$x\_{\text{rms}} = \sqrt{\int\_{-\infty}^{+\infty} x^2\, p(x)\, dx}, \tag{35}$$

where p(x) is the probability density function of the signal x(t), which represents the probability that the signal x(t) falls into a certain interval.

Alternatively, an approximate formula can be used to calculate the RMS value as

$$x\_{\text{rms}} = \sqrt{\frac{1}{T} \int\_0^T x^2(t)\, dt}. \tag{36}$$

The absolute mean value is defined as

$$\mathbf{x}\_{\text{av}} = \int\_{-\infty}^{+\infty} |\mathbf{x}| p(\mathbf{x}) d\mathbf{x}.\tag{37}$$

Or it can be calculated using the approximate formula:

$$x\_{\text{av}} = \frac{1}{T} \int\_0^T |x(t)|\, dt. \tag{38}$$

Variance is used to depict the fluctuation of a signal deviating from its mean, which can be viewed as the dynamic feature of a signal. The variance of a signal x(t) is


$$D\_x = \sigma\_x^2 = \int\_{-\infty}^{+\infty} (x - \mu\_x)^2\, p(x)\, dx, \tag{39}$$

where μ<sub>x</sub> is the mean value and σ<sub>x</sub> is the standard deviation of the signal.

Variance can also be calculated approximately using the following formula:

$$D\_x = \sigma\_x^2 = \frac{1}{T} \int\_0^T \left(x(t) - \mu\_x\right)^2 dt. \tag{40}$$

Table 4 lists the mathematical formulas for calculating the same features for a corresponding discrete time waveform, {x(n) | n = 1, 2, ⋯, N}.


Table 4. The dimensional statistical features of a discrete time series.
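The discrete forms of these dimensional features are one-liners; a NumPy sketch (the function name is illustrative):

```python
import numpy as np

def dimensional_features(x):
    """Dimensional time-domain features of a discrete waveform (cf. Table 4)."""
    x = np.asarray(x, dtype=float)
    return {
        "peak": np.max(np.abs(x)),                  # discrete form of Eq. (34)
        "rms": np.sqrt(np.mean(x**2)),              # discrete form of Eq. (36)
        "abs_mean": np.mean(np.abs(x)),             # discrete form of Eq. (38)
        "variance": np.mean((x - np.mean(x))**2),   # discrete form of Eq. (40)
    }

# For a unit-amplitude sine wave: peak = 1, rms = 1/sqrt(2), variance = 0.5.
x = np.sin(2 * np.pi * np.arange(1000) / 100)
feats = dimensional_features(x)
```

Because these values scale with the signal amplitude, they track fault severity but also react to changes in load and speed, as noted above.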


The group of nondimensional statistical features includes the crest factor, shape factor, impulse factor, clearance factor, skewness, and kurtosis, each of which is the ratio of two dimensional statistical features. These features are insensitive to changes of amplitude or frequency in a signal; they are thus largely unaffected by the working condition of a machine and can more reliably reflect the fault condition of a bearing. The formulas for these features are listed in Table 5:


Table 5. The nondimensional features of a time waveform.

In Table 5, xr is the root amplitude of a continuous time waveform, which is given by

$$\mathbf{x}\_{\tau} = \left( \int\_{-\infty}^{+\infty} |\mathbf{x}(t)|^{1/2} p(\mathbf{x}) d\mathbf{x} \right)^{2}. \tag{41}$$

Its discrete form is given by

$$\mathbf{x}\_{\mathbf{r}} = \left(\frac{1}{N} \sum\_{i=1}^{N} \sqrt{|\mathbf{x}\_{i}|}\right)^{2}. \tag{42}$$

The shape factor and impulse factor are often used to examine whether pulse shocks exist in a signal. The clearance factor is sometimes used to examine the wear condition of a machine. Yet the most frequently employed nondimensional features in data analysis are the skewness and kurtosis factors. The skewness is the third statistical moment, which measures the degree of asymmetry of the distribution of a time waveform. The kurtosis is the fourth statistical moment, which measures the "peakedness" of the data distribution.
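A NumPy sketch of these ratios, using the common definitions (the chapter's Table 5 may normalize some of them slightly differently):

```python
import numpy as np

def nondimensional_features(x):
    """Nondimensional time-domain features of a waveform (cf. Table 5)."""
    x = np.asarray(x, dtype=float)
    peak = np.max(np.abs(x))
    rms = np.sqrt(np.mean(x**2))
    abs_mean = np.mean(np.abs(x))
    root_amp = np.mean(np.sqrt(np.abs(x)))**2           # Eq. (42)
    mu, sigma = np.mean(x), np.std(x)
    return {
        "crest": peak / rms,
        "shape": rms / abs_mean,
        "impulse": peak / abs_mean,
        "clearance": peak / root_amp,
        "skewness": np.mean((x - mu)**3) / sigma**3,    # third standardized moment
        "kurtosis": np.mean((x - mu)**4) / sigma**4,    # fourth standardized moment
    }

# Gaussian noise has kurtosis near 3; impulsive bearing defects push it higher.
rng = np.random.default_rng(2)
noise = rng.standard_normal(200_000)
impulsive = noise.copy()
impulsive[::1000] += 20.0                               # inject periodic impulses
f_noise = nondimensional_features(noise)
f_imp = nondimensional_features(impulsive)
```

The jump in kurtosis and crest factor for the impulsive signal, at essentially unchanged RMS, illustrates why these ratios flag early bearing defects that amplitude-based features miss.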

#### 3.4.2. Statistical features in the frequency domain

The power spectrum depicts the power amplitude level of the frequency components in a signal. If the power amplitude levels for some of the frequency components change, the weight-averaged center frequency of a power spectrum will also change. For example, if the number of frequency components with dominant amplitude in a spectrum increases, the energy distribution will be more dispersed. On the contrary, the energy distribution will be concentrated more around the dominant frequency components if there are only a few of such components in the spectrum. Hence, the fluctuation of a signal in the frequency domain can reflect the change of machine condition by observing the change in the weight-averaged center frequency of power spectrum or the disperse degree of power amplitude level distribution.

The typical statistical features used to depict the weight-averaged center of a power spectrum are the "frequency center" and "mean-square frequency," which are defined as follows:

$$\text{Frequency center}: \text{FC} = \frac{\int\_0^{+\infty} f\, S(f)\, df}{\int\_0^{+\infty} S(f)\, df}, \tag{43}$$

and

$$\text{Mean-square frequency}:\ \text{MSF} = \frac{\int\_0^{+\infty} f^2 S(f) df}{\int\_0^{+\infty} S(f) df},\tag{44}$$

where S(f) represents the power spectrum of a continuous waveform x(t).

The statistical feature used to depict the disperse degree of energy distribution in the frequency domain is the "Variance of frequency," which is defined as

$$\text{Variance of frequency} : \text{VF} = \frac{\int\_0^{+\infty} (f - \text{FC})^2 S(f) df}{\int\_0^{+\infty} S(f) df} = \text{MSF} - \text{FC}^2. \tag{45}$$

Accordingly, for a discrete time series {x(n) | n = 1, 2, ⋯, N}, the three frequency-domain statistical features are given by

$$\text{FC} = \frac{1}{2\pi\Delta} \frac{\int\_0^\pi \omega S(\omega) d\omega}{\int\_0^\pi S(\omega) d\omega},\tag{46}$$

$$\text{MSF} = \frac{1}{4\pi^2 \Delta^2} \frac{\int\_0^\pi \omega^2 S(\omega) d\omega}{\int\_0^\pi S(\omega) d\omega},\tag{47}$$

and


$$\text{VF} = \frac{1}{4\pi^2 \Delta^2} \frac{\int\_0^\pi (\omega - 2\pi \Delta F \text{C})^2 S(\omega) d\omega}{\int\_0^\pi S(\omega) d\omega} = \text{MSF-FC}^2,\tag{48}$$

where Δ is the sampling interval (the reciprocal of the sampling frequency), and S(ω) is the power spectrum of a discrete time series x(n), which can be obtained using the following formula:

$$S(\omega) = X(\omega)\overline{X(\omega)},\tag{49}$$

and

$$X(\omega) = \sum\_{n=0}^{N-1} x(n) e^{-j\omega n},\tag{50}$$

where ω is the angular frequency.
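Evaluated on the one-sided FFT power spectrum of a sampled signal, the three features above can be sketched in summation form (a sketch of Eqs. (43)–(45); function and variable names are illustrative, not from the chapter):

```python
import numpy as np

def spectral_features(x, fs):
    """Frequency center, mean-square frequency and variance of frequency.

    Evaluates Eqs. (43)-(45) on the one-sided FFT power spectrum of a
    signal x sampled at fs Hz.
    """
    X = np.fft.rfft(x)
    S = (X * np.conj(X)).real           # power spectrum, as in Eq. (49)
    f = np.fft.rfftfreq(len(x), d=1.0 / fs)
    total = S.sum()
    fc = (f * S).sum() / total          # frequency center, Eq. (43)
    msf = (f ** 2 * S).sum() / total    # mean-square frequency, Eq. (44)
    vf = msf - fc ** 2                  # variance of frequency, Eq. (45)
    return fc, msf, vf
```

For a pure tone the frequency center coincides with the tone frequency and the variance of frequency is close to zero, which illustrates the "weight-averaged center" and "disperse degree" interpretations given above.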

#### 3.4.3. Data complexity index

The complexity of a signal can be described by two measures: entropy and Lempel-Ziv complexity.

Entropy is a measure of randomness, suggested by Shannon in 1948 [21]. Entropy is used to depict the randomness (or "uncertainty") existing in a signal or the amount of information carried by the signal.

For a discrete random variable x with possible values {xi | i = 1, 2, ⋯, N}, the Shannon entropy is given by

$$H(x) = -\sum\_{i=1}^{N} p(i) \log\_b p(i),\tag{51}$$

where p(i) is the probability of the ith event of the random variable x, and b is the base of the logarithm, which commonly takes the value 2, e, or 10 depending on the application. As the probability of an event approaches 0 or 1, the term p(i) log_b p(i) converges to zero; for a deterministic signal, whose single outcome has probability 1, the entropy is zero. For random signals with a uniform probability density function, such as pure noise, the entropy is maximal.
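Eq. (51) can be sketched directly (the function name is illustrative; zero-probability terms are skipped, since p log p → 0):

```python
import math

def shannon_entropy(probs, base=2):
    """Shannon entropy of a discrete probability distribution, Eq. (51).

    Terms with p(i) == 0 are skipped because p * log(p) -> 0 as p -> 0.
    """
    return -sum(p * math.log(p, base) for p in probs if p > 0)
```

A uniform distribution over 8 events gives the maximal value log2(8) = 3 bits, while a deterministic outcome gives 0, matching the limiting behavior described above.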

The entropy-based techniques have been widely used in bearing fault diagnosis in the last decade. It has been shown that the entropy value is closely related to the working condition of a machine and decreases monotonically as faults or conditions aggravate [22]. Entropy is often combined with other techniques to capture the detailed changes in the nonlinear and nonstationary properties of a signal in machine fault diagnosis [23, 24]. Typical entropies used in machine fault diagnosis are approximate entropy, sample entropy, fuzzy entropy and permutation entropy. Yet, some of these features are still not ideal and can be problematic in CM applications. For example, approximate entropy is heavily dependent on the data record length, which can yield lower estimated values. Sample entropy uses the Heaviside step function, which is abrupt and discontinuous at the boundary. Fuzzy entropy is calculated from a membership function that is difficult to determine accurately. Permutation entropy requires a reconstruction of the phase space, yet the embedding dimension and time lag of the reconstructed matrix need to be selected manually, which has so far limited its application.
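As an illustration of one of the entropies named above, a common formulation of sample entropy can be sketched as follows. The template handling and the default tolerance r = 0.2·std are conventional choices from the sample-entropy literature, not parameters taken from this chapter:

```python
import numpy as np

def sample_entropy(x, m=2, r=None):
    """Sample entropy of a 1-D series (illustrative sketch).

    Counts pairs of length-m and length-(m+1) templates whose Chebyshev
    distance is within tolerance r (default 0.2 * std), excluding
    self-matches, and returns -ln(A / B).
    """
    x = np.asarray(x, dtype=float)
    if r is None:
        r = 0.2 * x.std()

    def match_count(mm):
        n = len(x) - mm + 1
        templates = np.array([x[i:i + mm] for i in range(n)])
        hits = 0
        for i in range(n - 1):
            d = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
            hits += int(np.sum(d <= r))
        return hits

    b, a = match_count(m), match_count(m + 1)
    return -np.log(a / b) if a > 0 and b > 0 else float("inf")
```

A smooth periodic signal yields a lower sample entropy than broadband noise, which is the behavior exploited when entropy is used as a condition indicator.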

Ziv and Lempel [25] presented a specific complexity algorithm, termed Lempel-Ziv complexity (LZC), to calculate the complexity of a finite-length time series. LZC can reflect the rate at which new condition pattern features are generated as the nonlinearity of a time series grows [26]. Its value represents the degree of random variation of a time series. In their algorithm, the complexity value of a time series is evaluated based on a "coarse-graining" operation, by which the data sequence is transformed into a pattern of only a few symbols, for example, 0 and 1, and involves data sequence comparison and number counting in one dimension only. A flow chart of the LZC algorithm is shown in Figure 15 and the process is described below:

1. Coarse-grain process to a finite binary sequence. A discrete time series A = {a1, a2, ⋯, an} is converted into a binary sequence SN = {s1, s2, ⋯, sn} in the initiation and preprocessing phase.
2. Copy and Insert. A binary sequence up to sr (1 < r < N) of complexity cN can be reconstructed by simply copying and inserting some of the existing vocabulary of SQvr = {s1, s2, ⋯, sv} (v < r). To check whether the rest of the string SN−r = {sr+1, ⋯, sN} can be reproduced by the same approach, the process is executed by the following steps:
	- Step 1: Take Qr = {sr} and check whether this string belongs to the vocabulary of SQvr. If so, string Qr = {sr} is a simple repetition of an existing substring of SQvr (i.e., a simple "copy" of the existing vocabulary can restore it) and the complexity remains unchanged, or cN(r) = cN(r−1).
	- Step 2: Read the next symbol and take Qr+1 = {sr, sr+1}. Check whether Qr+1 = {sr, sr+1} belongs to SQvr+1.
	- Step 3: If string Qr+1 does not belong to SQvr+1, increase the complexity by one, i.e., cN(r+1) = cN(r) + 1, nullify Qr+1 = {}, read the next symbol and take Qr+2 = {sr+2}.
	- Step 4: Repeat the above process until SN = {s1, s2, ⋯, sn} is covered. The resulting cN(n) is the complexity of the given string.
3. Normalization of the complexity value. The complexity obtained above equals the number of nullifications of Q, which indicates that it is affected by the length of the string, i.e., the number of data samples N. To obtain a robust complexity measure, Lempel and Ziv [27] suggested a normalized measure cLZ = cN(n)/bN(n), which is termed LZC after their names and defined by

$$0 \le c\_{LZ} = \frac{c\_N(n)}{b\_N(n)} \le 1,\tag{52}$$

where

$$b\_N(n) = \lim\_{n \to \infty} c\_N(n) \approx \frac{n}{\log\_k n},$$

and k is the number of distinct symbols in the sequence (k = 2 for a binary sequence).

It is shown in Figure 15 that the coarse-grained process of a finite time sequence serves as the basis for the LZC algorithm. A commonly employed technique in the coarse-graining process is a single-scale process, which converts a discrete time series A = {a1, a2, ⋯, an} into a symbolic binary sequence SN = {s1, s2, ⋯, sn} using the following formula:

$$s\_i = \begin{cases} 1, \ a\_i \ge \overline{a} \\ 0, \ a\_i < \overline{a} \end{cases} \tag{53}$$

where ā = (a1 + a2 + ⋯ + an)/n is the mean value of the discrete time series, which is used as the threshold in the coarse-graining preprocess.
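The coarse-graining of Eq. (53) and the word counting of Steps 1–4 above can be sketched as follows. The counting loop follows the standard Kaspar–Schuster formulation of cN(n); the function names are illustrative:

```python
import math

def coarse_grain(series):
    """Single-scale coarse-graining, Eq. (53): mean value as threshold."""
    mean = sum(series) / len(series)
    return "".join("1" if a >= mean else "0" for a in series)

def lz_complexity(s):
    """Number of distinct parsed words c_N(n) of a symbol string s
    (Kaspar-Schuster style counting of the Lempel-Ziv parsing)."""
    n = len(s)
    if n < 2:
        return n
    c, l, i, k, k_max = 1, 1, 0, 1, 1
    while True:
        if s[i + k - 1] == s[l + k - 1]:
            k += 1                      # current word still copyable
            if l + k > n:
                c += 1                  # final (possibly partial) word
                break
        else:
            k_max = max(k_max, k)
            i += 1
            if i == l:                  # no copyable prefix: new word
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c

def normalized_lzc(series):
    """Normalized complexity c_LZ = c_N(n) / b_N(n), Eq. (52), k = 2."""
    s = coarse_grain(series)
    n = len(s)
    return lz_complexity(s) * math.log(n, 2) / n
```

A periodic series yields a small normalized complexity, while an irregular series yields a larger one; note that for short, finite sequences the normalized value is only approximately bounded by 1, since bN(n) is an asymptotic estimate.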

In the single-scale coarse-grain process, all points of a discrete time series larger than or equal to the mean value are set to 1, while all points smaller than the mean value are set to 0. The process neglects the fluctuation between the data intervals and loses much of the detailed information of the data series. In order to capture the details contained in a discrete time series, the division interval should be reduced and a multidivision scale should be adopted, so that the variation in the data can be reflected in the binary sequence through a multiscale process. The preprocess of a revised multiscale coarse grain is outlined below: the data range is divided into several division intervals, and consecutive data points are compared. When the value of the latter point increases into another division interval, the point will be assigned the binary value of 1. When the value of the latter point decreases into another division interval, the point will be assigned the binary value of 0.

Figure 15. Flow chart of the LZC algorithm.

A data sample given in Figure 16 is used here as an example to illustrate the difference between the single-scale and multiscale coarse-grain processes. For the single-scale coarse-grain process, the binary sequence of the data sample is SN = (1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0), which does not reflect the fluctuation between the data intervals of the time series. When a two-scale coarse-grain process is employed to process the same data, the binary sequence becomes SN = (1, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1). Therefore, a multiscale coarse-grain process can better capture the detailed fluctuation in a discrete time series.

Figure 16. An example of a discrete time series.

#### 3.4.4. Manifold learning


Classical dimensionality reduction techniques, such as multidimensional scaling (MDS) and independent component analysis (ICA), are only applicable to linearly structured data sets and are not suitable for high-dimensional, nonlinear data sets such as bearing CM data. To overcome this problem, a nonlinear dimensionality reduction technique, manifold learning, has recently been developed for machine fault diagnosis [28]. Manifold learning projects the original high-dimensional data onto a lower-dimensional feature space while preserving the local neighborhood structure, so as to detect the intrinsic structure of nonlinear high-dimensional data. Manifold learning can be realized through several algorithms, including locally linear embedding (LLE), isometric feature mapping (IsoMap), local tangent space alignment (LTSA) and locality preserving projection (LPP).

The application of manifold learning in mechanical fault diagnosis is twofold. Firstly, fault features of large dimension can contain many redundant components, which increase the complexity and operation time of a fault diagnosis process. Manifold learning can be used to eliminate the redundant components and extract the nonlinear features for fault classification in this case. Secondly, manifold learning can discard the noise components and extract the intrinsic manifold features related to the nonlinear dynamics of a CM signal; therefore, it can also be used as a de-noising technique. It should be noted that manifold learning only operates on a matrix, so a preprocessing step, such as a phase-space reconstruction that converts the original one-dimensional signal into two-dimensional data, is required. For detailed information on the manifold learning algorithm and its application in fault diagnosis, interested readers are referred to [28, 29].
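The phase-space reconstruction mentioned above can be sketched as a simple time-delay embedding. The embedding dimension and time lag are exactly the manually selected parameters noted earlier for permutation entropy; the values below are arbitrary examples, not choices made in this chapter:

```python
import numpy as np

def delay_embed(x, dim=3, tau=5):
    """Time-delay (phase-space) embedding of a 1-D signal.

    Returns a matrix whose rows are [x(t), x(t+tau), ..., x(t+(dim-1)*tau)],
    i.e., the two-dimensional data matrix that matrix-based methods such as
    manifold learning require as input.
    """
    x = np.asarray(x, dtype=float)
    n = len(x) - (dim - 1) * tau
    if n <= 0:
        raise ValueError("signal too short for this dim/tau")
    return np.column_stack([x[i * tau: i * tau + n] for i in range(dim)])
```

The resulting matrix can then be passed to an LLE, IsoMap, LTSA or LPP implementation for the dimensionality reduction step.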

#### 3.5. Bearing fault diagnosis based on artificial intelligent

#### 3.5.1. Shallow architecture machine learning

The most commonly employed machine-learning techniques in fault diagnosis of rotating machines are hidden Markov models (HMMs), support vector machines (SVMs) and artificial neural networks (ANNs), which exploit shallow architectures that contain either a single hidden layer or no hidden layer. Such shallow architecture-based models have achieved great success both in theory and in practical applications. However, shortcomings of these algorithms, such as poor universality, the lack of a theoretical basis for parameter selection, and the tendency to fall into local optima, have limited their application in machine fault diagnosis.

#### 3.5.2. Deep neural network

A consensus criterion for an effective bearing fault diagnosis technique is that it should not only be able to identify various bearing fault conditions but also be able to discriminate different fault severities within each fault condition [30]. This leads to a stricter requirement on identification procedures, where a classifier must have a greater capability of discriminating different fault classes. When fault data contain more than one level of fault severity in each fault condition, the accuracy of the fault diagnostic result using the shallow architecture classifiers described in the previous section decreases dramatically. Hinton and Salakhutdinov [31] proposed a deep learning technique to overcome the deficiencies of single-layer architecture classifiers and achieve a better pattern recognition capability. The concept was further extended into the so-called deep belief network (DBN) [32], which relieves the training difficulties of deep network structures by adopting a layer-by-layer unsupervised forward pretraining and then a backward fine-tuning mechanism. A description of the training process of a DBN is given in Figure 17.

Figure 17. A description of the training process of a DBN.
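The layer-by-layer pretraining stage of Figure 17 can be sketched numerically. This is a minimal illustration assuming binary restricted Boltzmann machine (RBM) layers trained with one step of contrastive divergence (CD-1); the supervised backward fine-tuning pass is omitted, and all layer sizes and learning rates are illustrative assumptions rather than this chapter's exact procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class RBM:
    """Minimal binary restricted Boltzmann machine trained with CD-1."""
    def __init__(self, n_vis, n_hid):
        self.W = 0.01 * rng.standard_normal((n_vis, n_hid))
        self.b = np.zeros(n_vis)   # visible bias
        self.c = np.zeros(n_hid)   # hidden bias

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.c)

    def cd1_update(self, v0, lr=0.1):
        h0 = self.hidden_probs(v0)
        h0_s = (rng.random(h0.shape) < h0).astype(float)   # sample hidden
        v1 = sigmoid(h0_s @ self.W.T + self.b)             # reconstruction
        h1 = self.hidden_probs(v1)
        self.W += lr * (v0.T @ h0 - v1.T @ h1) / len(v0)
        self.b += lr * (v0 - v1).mean(axis=0)
        self.c += lr * (h0 - h1).mean(axis=0)

def pretrain_dbn(data, layer_sizes, epochs=5):
    """Greedy layer-by-layer unsupervised pretraining of a DBN stack."""
    rbms, inp = [], data
    for n_hid in layer_sizes:
        rbm = RBM(inp.shape[1], n_hid)
        for _ in range(epochs):
            rbm.cd1_update(inp)
        rbms.append(rbm)
        inp = rbm.hidden_probs(inp)   # activations feed the next layer
    return rbms
```

Each layer is trained only on the activations of the layer below it, which is the "forward pretraining" part of the mechanism; in a full DBN the stacked weights would then be fine-tuned with supervised backpropagation.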

Bengio et al. [33] proposed a deep stacked auto-encoder (SAE) network using a network structure similar to DBNs, which is stacked with a number of auto-encoder networks. Furthermore, Le Cun et al. [34] proposed the convolutional neural network (CNN), a multilayer network in which each layer is composed of several two-dimensional planes, to reduce the number of parameters in the learning process using a unique weight-sharing mechanism. The above-cited deep-learning-based neural network algorithms are nowadays widely employed in machine fault diagnosis [35, 36].

#### 3.5.3. A case study on bearing fault diagnosis


Case 5: In this case study, a combination of four-level wavelet packet decomposition (WPD), locality preserving projection (LPP), particle swarm optimization (PSO) and support vector machine (SVM) algorithms is employed for intelligent bearing fault diagnosis and recognition. Bearing condition-monitoring data for various operation conditions, namely healthy bearing, outer race fault, inner race fault and ball fault, acquired from the bearing fault simulation test rig shown in Figure 18, are used in this analysis.

Figure 18. A graphical illustration of the bearing fault simulation test rig.

Three types of bearing defects are simulated in this experiment: an outer race fault, an inner race fault and a ball fault, as shown in Figure 19. The measured vibration signals from an accelerometer mounted on the test bearing housing for the four bearing operation conditions (the three simulated faults and a healthy bearing) are shown in Figure 20. One hundred data sets are acquired for each operation condition in the experiment; they are divided into two groups, one containing 70 sets of data used as the training data set and the other containing 30 sets of data used as the test data set.

Figure 19. Simulated bearing faults: (a) outer race fault; (b) inner race fault; and (c) ball fault.

Figure 20. The measured vibration signals for the four bearing operation conditions. Note: sampling frequency, 8192 Hz; data length, 8192 points.

Five major steps are taken in the analysis of the bearing condition-monitoring data:

1. Each data set is decomposed by a four-level wavelet packet decomposition, leading to 16 components of different frequency contents (mutually orthogonal subspaces). The wavelet components of the condition-monitoring data for the outer race fault are shown in Figure 21 for illustration.

Figure 21. The wavelet components of the outer race bearing defect signal.

2. The energy value of each wavelet component is calculated and used as a fault feature, since the energy of a component can discriminate different classes and contains the fault information and its fluctuation in the particular component corresponding to the occurrence of the fault. A fault feature vector containing 16 energy features is thus obtained for each data set. The energy distribution of the 16 wavelet components for the four bearing operation conditions is shown in Figure 22.

Figure 22. The energy values of the 16 wavelet components for the four bearing operation conditions.

3. For the 100 data sets of each of the four bearing operation conditions, the process in Step (2) generates a 400 × 16 feature set. The large dimension of the feature set can cause a problem for the subsequent algorithm, leading to misclassification of the bearing fault classes in the diagnosis step using the SVM algorithm. Locality preserving projection (LPP) [37], a linear dimensionality reduction algorithm which can effectively reduce the dimension while preserving the neighborhood structure of a data set, is utilized to reduce the feature set dimension to 400 × 3. The three-dimensional feature distribution for the four bearing operation conditions after the dimension reduction by LPP is shown in Figure 23.

Figure 23. The optimization result of the PSO algorithm.
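The feature-extraction part of Steps 1–2 can be sketched as follows. The chapter does not state the mother wavelet, so a Haar filter bank is assumed here, with the natural (non-frequency-ordered) packet ordering and no boundary handling; this is an illustrative sketch, not the study's exact decomposition:

```python
import numpy as np

def haar_step(x):
    """One Haar analysis step: approximation and detail at half length."""
    a = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    d = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return a, d

def wpd_energies(x, levels=4):
    """Energies of the 2**levels wavelet packet components of a signal.

    A four-level decomposition (levels=4) yields the 16-element fault
    feature vector used in Step (2); the signal length must be divisible
    by 2**levels.
    """
    nodes = [np.asarray(x, dtype=float)]
    for _ in range(levels):                  # split every node per level
        nxt = []
        for node in nodes:
            a, d = haar_step(node)
            nxt.extend([a, d])
        nodes = nxt
    return np.array([np.sum(n ** 2) for n in nodes])
```

Because the Haar filter pair is orthonormal, the 16 component energies sum to the total signal energy, so the feature vector is a partition of the signal energy across the 16 subspaces; stacking one such vector per data set produces the 400 × 16 feature matrix that LPP then reduces in Step (3).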



Figure 24. Prediction result of the SVM algorithm. Note, Levels 1–4 correspond to healthy bearing, inner race fault, ball fault and outer race fault, respectively.
