## **2.1.2 Statistical features of filtered signals**

156 Fuzzy Inference System – Theory and Applications

Each vibration signal is processed to extract eleven time-domain features and thirteen frequency-domain features from its spectrum. The time- and frequency-domain feature parameters are listed in Table 1.

Table 1. The feature parameters. The table defines the eleven time-domain parameters $p\_1$–$p\_{11}$, computed from the signal series $x(n)$, and the thirteen frequency-domain parameters $p\_{12}$–$p\_{24}$, computed from the spectrum $s(k)$, where $x(n)$ is a signal series for $n = 1, 2, \dots, N$, $N$ is the number of data points; $s(k)$ is a spectrum for $k = 1, 2, \dots, K$, $K$ is the number of spectrum lines; and $f\_k$ is the frequency value of the *k*th spectrum line.
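As a concrete illustration, several statistics of the kind collected in Table 1 can be computed directly with NumPy. The particular definitions below (RMS, skewness, kurtosis, crest factor) are common choices and are assumptions for illustration, not necessarily the exact formulas of Table 1:

```python
import numpy as np

def time_domain_features(x):
    """Common time-domain statistics of a vibration signal x(n), n = 1..N."""
    x = np.asarray(x, dtype=float)
    mean = x.mean()                                # average value
    std = x.std(ddof=1)                            # standard deviation
    rms = np.sqrt(np.mean(x ** 2))                 # root mean square
    peak = np.max(np.abs(x))                       # peak amplitude
    skew = np.mean((x - mean) ** 3) / std ** 3     # skewness
    kurt = np.mean((x - mean) ** 4) / std ** 4     # kurtosis
    crest = peak / rms                             # crest factor
    return {"mean": mean, "std": std, "rms": rms, "peak": peak,
            "skewness": skew, "kurtosis": kurt, "crest_factor": crest}
```

The same kind of routine is applied later to the filtered signals and the IMFs when building feature sets 3 and 5.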

The examination of the vibration signals indicates the presence of low-frequency interference. The signals are therefore subjected to either high-pass or band-pass filtration to remove the low-frequency interference components. *F* filters are adopted, and the selected filtration frequencies should completely cover the frequency components characterizing faults of mechanical equipment. The eleven time-domain features are extracted from each of the filtered signals. Therefore, 11 × *F* time-domain features are obtained and defined as **feature set 3**.
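The filtering step can be sketched with SciPy; a zero-phase Butterworth design is one reasonable choice (the filter family, order and cut-off values here are illustrative assumptions — the text only requires that the *F* pass bands cover the fault-related frequency components):

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass(x, fs, lo, hi, order=4):
    """Zero-phase Butterworth band-pass filter (lo, hi in Hz)."""
    b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="bandpass")
    return filtfilt(b, a, x)

def highpass(x, fs, cutoff, order=4):
    """Zero-phase Butterworth high-pass filter (cutoff in Hz)."""
    b, a = butter(order, cutoff / (fs / 2), btype="highpass")
    return filtfilt(b, a, x)
```

Each of the *F* filter outputs is then passed to the time-domain feature extraction to form feature set 3.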

In addition, the interference within the selected frequency band can be minimized by demodulation. Demodulation detection makes the diagnosis process somewhat more independent of a particular machine, since it focuses on the low-amplitude, high-frequency broadband signals characterizing machine conditions [17]. By performing demodulation and the Fourier transform on the *F* filtered signals, we can obtain *F* envelope spectra. The envelope spectra are further processed to extract another set of 13 × *F* frequency-domain features. This feature set is referred to as **feature set 4**.
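Demodulation is commonly implemented with the Hilbert transform: the magnitude of the analytic signal gives the envelope, and the Fourier transform of the envelope gives the envelope spectrum. A minimal sketch (the Hilbert route is an assumption; the text does not fix the demodulation algorithm):

```python
import numpy as np
from scipy.signal import hilbert

def envelope_spectrum(x, fs):
    """Amplitude envelope via the Hilbert transform, then its spectrum."""
    env = np.abs(hilbert(x))          # envelope of the (band-pass filtered) signal
    env = env - env.mean()            # drop the DC component of the envelope
    spec = np.abs(np.fft.rfft(env)) / len(env)
    freqs = np.fft.rfftfreq(len(env), d=1.0 / fs)
    return freqs, spec
```

For a bearing fault, the envelope spectrum concentrates the repetition frequency of the impacts, which is why the thirteen frequency-domain features are computed on it.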

## **2.1.3 Statistical features of IMFs**

To extract more information, each of the raw signals is decomposed using the EMD method. EMD is able to decompose a signal into IMFs under the simple assumption that any signal consists of different simple IMFs [18]. For a signal $x(t)$, we can decompose it into *I* IMFs $c\_1, c\_2, \dots, c\_I$ and a residue $r\_I$, which is the mean trend of $x(t)$. The IMFs cover different frequency bands ranging from high to low. The frequency components contained in each IMF differ and change with the variation of the signal $x(t)$, while $r\_I$ represents its central tendency. A more detailed explanation of EMD can be found in Ref. [18].
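A minimal EMD sketch conveys the sifting idea (cubic-spline envelopes through the local extrema, iterated a fixed number of times). Production implementations add proper stopping criteria and boundary handling, so treat this as an assumption-laden illustration rather than the exact algorithm of Ref. [18]:

```python
import numpy as np
from scipy.interpolate import CubicSpline

def sift_once(x):
    """One sifting pass: subtract the mean of the upper/lower envelopes."""
    n = np.arange(len(x))
    # interior local maxima / minima
    mx = np.where((x[1:-1] > x[:-2]) & (x[1:-1] > x[2:]))[0] + 1
    mn = np.where((x[1:-1] < x[:-2]) & (x[1:-1] < x[2:]))[0] + 1
    if len(mx) < 3 or len(mn) < 3:
        return None  # too few extrema: x is (close to) a residue
    # include the endpoints so the splines cover the whole record
    mx = np.r_[0, mx, len(x) - 1]
    mn = np.r_[0, mn, len(x) - 1]
    upper = CubicSpline(n[mx], x[mx])(n)
    lower = CubicSpline(n[mn], x[mn])(n)
    return x - (upper + lower) / 2.0

def emd(x, max_imfs=6, n_sifts=10):
    """Decompose x into IMFs plus a residue (simplified, fixed sift count)."""
    imfs, r = [], np.asarray(x, dtype=float).copy()
    for _ in range(max_imfs):
        if sift_once(r) is None:      # residue has too few extrema: stop
            break
        h = r.copy()
        for _ in range(n_sifts):
            nh = sift_once(h)
            if nh is None:
                break
            h = nh
        imfs.append(h)
        r = r - h
    return imfs, r
```

By construction the extracted IMFs plus the residue sum back to the original signal.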

Generally, the first *S* IMFs, which contain the valid information, are selected for further analysis. Similar to the feature extraction method for the raw signals, the eleven time-domain features are extracted from each IMF. Then we obtain an additional set of 11 × *S* time-domain features, referred to as **feature set 5**.

Each IMF is demodulated and its envelope spectrum is produced. We extract the thirteen frequency-domain features from each envelope spectrum and finally derive another set of 13 × *S* frequency-domain features, defined as **feature set 6**.

## **2.2 Feature selection**

Although the above features may detect faults occurring in mechanical equipment from different aspects, their importance for identifying different faults varies. Some features are sensitive and closely related to the faults, while others are not. Thus, before a feature set is fed into a classifier, the sensitive features providing mechanical fault-related information need to be selected to enhance the classification accuracy and avoid the curse of dimensionality as well. Here, an improved distance evaluation technique is presented and used to select the sensitive features from the whole feature set [6].

Suppose that a feature set of *C* machinery health conditions is

$$\left\{q\_{m,c,j},\ m = 1, 2, \dots, M\_c;\ c = 1, 2, \dots, C;\ j = 1, 2, \dots, J\right\},\tag{1}$$

where $q\_{m,c,j}$ is the *j*th eigenvalue of the *m*th sample under the *c*th condition, $M\_c$ is the sample number of the *c*th condition, and *J* is the feature number of each sample. We collect $M\_c$ samples under the *c*th condition. Therefore, for *C* conditions, we get $M\_c \times C$ samples. For each sample, *J* features are extracted to represent it. Thus, $M\_c \times C \times J$ features are obtained, which are defined as a feature set $q\_{m,c,j}$.

Then the feature selection method based on the improved distance evaluation technique can be given as follows.

**Step 1.** Calculating the average distance of the same condition samples

$$d\_{c,j} = \frac{1}{M\_c \times (M\_c - 1)} \sum\_{l,m=1}^{M\_c} \left| q\_{m,c,j} - q\_{l,c,j} \right|, \quad l, m = 1, 2, \dots, M\_c,\ l \neq m \; ; \tag{2}$$

then getting the average distance of *C* conditions

$$d\_j^{(w)} = \frac{1}{C} \sum\_{c=1}^{C} d\_{c,j} \, . \tag{3}$$

**Step 2.** Defining and calculating the variance factor of $d\_j^{(w)}$ as

$$v\_j^{(w)} = \frac{\max(d\_{c,j})}{\min(d\_{c,j})}.\tag{4}$$

**Step 3.** Calculating the average eigenvalue of all samples under the same condition

$$
u\_{c,j} = \frac{1}{M\_c} \sum\_{m=1}^{M\_c} q\_{m,c,j} \; ; \tag{5}
$$

then obtaining the average distance between different condition samples

$$d\_j^{(b)} = \frac{1}{C \times (C - 1)} \sum\_{c,e=1}^{C} \left| u\_{e,j} - u\_{c,j} \right|, \quad c, e = 1, 2, \dots, C,\ c \neq e \,. \tag{6}$$

**Step 4.** Defining and calculating the variance factor of $d\_j^{(b)}$ as

$$v\_j^{(b)} = \frac{\max(|u\_{e,j} - u\_{c,j}|)}{\min(|u\_{e,j} - u\_{c,j}|)}, \quad c, e = 1, 2, \dots, C,\ c \neq e. \tag{7}$$

**Step 5.** Defining and calculating the compensation factor as

$$\lambda\_j = \frac{1}{\dfrac{v\_j^{(w)}}{\max(v\_j^{(w)})} + \dfrac{v\_j^{(b)}}{\max(v\_j^{(b)})}} \,. \tag{8}$$

**Step 6.** Calculating the ratio of $d\_j^{(b)}$ to $d\_j^{(w)}$ and assigning the compensation factor

$$\alpha\_j = \lambda\_j \frac{d\_j^{(b)}}{d\_j^{(w)}} \; ; \tag{9}$$

then normalizing $\alpha\_j$ by its maximum value and getting the distance evaluation criteria

$$
\overline{\alpha}\_j = \frac{\alpha\_j}{\max(\alpha\_j)}.\tag{10}
$$

Clearly, a larger $\overline{\alpha}\_j$ ($j = 1, 2, \dots, J$) indicates that the corresponding feature is better able to distinguish the *C* conditions. Thus, the sensitive features can be selected from the feature set $q\_{m,c,j}$ according to the distance evaluation criteria $\overline{\alpha}\_j$, ranked from large to small.
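The six steps above can be sketched compactly with NumPy. The array layout (an equal sample count *M* for every condition, giving shape `(M, C, J)`) is a simplifying assumption:

```python
import numpy as np

def distance_evaluation(q):
    """Improved distance evaluation criteria, Eqs. (2)-(10).

    q: array of shape (M, C, J); q[m, c, j] is the j-th eigenvalue of the
       m-th sample under the c-th condition (equal M per condition assumed).
    Returns alpha_bar of shape (J,), one criterion per feature.
    """
    M, C, J = q.shape
    # Step 1: average within-condition distance d_{c,j}, then d_j^{(w)}
    diff = np.abs(q[:, None, :, :] - q[None, :, :, :])      # (M, M, C, J)
    d_cj = diff.sum(axis=(0, 1)) / (M * (M - 1))            # (C, J)
    d_w = d_cj.mean(axis=0)                                 # (J,)
    # Step 2: variance factor of d_j^{(w)}
    v_w = d_cj.max(axis=0) / d_cj.min(axis=0)
    # Step 3: condition means u_{c,j}, then between-condition distance d_j^{(b)}
    u = q.mean(axis=0)                                      # (C, J)
    udiff = np.abs(u[:, None, :] - u[None, :, :])           # (C, C, J)
    d_b = udiff.sum(axis=(0, 1)) / (C * (C - 1))            # (J,)
    # Step 4: variance factor of d_j^{(b)} (max/min over pairs c != e)
    pair = udiff[~np.eye(C, dtype=bool)]                    # (C*(C-1), J)
    v_b = pair.max(axis=0) / pair.min(axis=0)
    # Step 5: compensation factor lambda_j
    lam = 1.0 / (v_w / v_w.max() + v_b / v_b.max())
    # Step 6: compensated ratio and normalization
    alpha = lam * d_b / d_w
    return alpha / alpha.max()
```

The features are then ranked by the returned criteria and the top-ranked ones are retained as the sensitive feature set.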

#### **2.3 Review of ANFIS**


The adaptive neuro-fuzzy inference system (ANFIS) is a fuzzy Sugeno model of integration where the final fuzzy inference system is optimized using the training of artificial neural networks. It maps inputs through input membership functions and associated parameters, and then through output membership functions to outputs. The initial membership functions and rules for the fuzzy inference system can be designed by employing human expertise about the target system to be modeled. Then ANFIS can refine the fuzzy if-then rules and membership functions to describe the input/output behavior of a complex system. Jang [19] found that even if human expertise is not available it is possible to intuitively set up reasonable membership functions and then employ the training process of artificial neural networks to generate a set of fuzzy if-then rules that approximate a desired data set.

In order to improve the training efficiency and avoid possible trapping in local minima, a hybrid learning algorithm is employed to tune the parameters of the membership functions. It is a combination of the gradient descent approach and the least-squares estimate. During the forward pass, the node outputs advance up to the output membership function layer, where the consequent parameters are identified by the least-squares estimate. The backward pass uses the back-propagation gradient descent method to update the premise parameters, based on the error signals that propagate backward. A more detailed description of ANFIS can be found in Ref. [19].
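A two-rule, first-order Sugeno system makes this structure concrete. The Gaussian membership parameters and the consequent coefficients below are arbitrary assumptions, standing in for values that ANFIS would learn:

```python
import numpy as np

def gauss(x, c, s):
    """Gaussian membership grade with centre c and width s."""
    return np.exp(-((x - c) ** 2) / (2 * s ** 2))

def sugeno_two_rule(x1, x2):
    """First-order Sugeno inference with two rules (illustrative parameters)."""
    # Layers 1-2: premise membership grades and rule firing strengths
    w1 = gauss(x1, 0.0, 1.0) * gauss(x2, 0.0, 1.0)   # rule 1
    w2 = gauss(x1, 2.0, 1.0) * gauss(x2, 2.0, 1.0)   # rule 2
    # Layer 3: normalized firing strengths
    wn1, wn2 = w1 / (w1 + w2), w2 / (w1 + w2)
    # Layer 4: first-order consequents f_i = p_i*x1 + q_i*x2 + r_i
    f1 = 1.0 * x1 + 0.5 * x2 + 0.1
    f2 = -0.3 * x1 + 1.2 * x2 + 2.0
    # Layer 5: weighted-average output
    return wn1 * f1 + wn2 * f2
```

In hybrid learning, the consequent coefficients (p, q, r) would be estimated by least squares in the forward pass, and the Gaussian centres and widths by gradient descent in the backward pass.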

#### **2.4 The combination of multiple ANFISs**

The hybrid intelligent method is implemented by combining multiple ANFISs using GAs. The idea of combining multiple classifiers into a committee is based on the expectation that the committee can outperform its members. Classifiers exhibiting different behaviors provide complementary information to each other, so combining them improves performance. Diversity between classifiers is therefore considered one of the characteristics required to achieve this improvement, and it can be obtained by using different input feature sets.

In this study, the six different feature sets have been extracted and the relevant six sensitive feature sets have been selected. ANFIS is used as the committee member. The weighted averaging technique is utilized to combine the six ANFIS-based classifiers, and the final classification result of the hybrid intelligent method is given as follows:

$$\hat{y}\_n = \sum\_{k=1}^{6} w\_k \hat{y}\_{n,k}\,, \quad n = 1, 2, \dots, N';\ k = 1, 2, \dots, 6,\tag{11}$$

subject to

$$\begin{cases} \sum\_{k=1}^{6} w\_k = 1, \\ w\_k \ge 0, \end{cases} \tag{12}$$

where $\hat{y}\_n$ and $\hat{y}\_{n,k}$ represent the classification results of the *n*th sample using the hybrid intelligent method and the *k*th individual classifier respectively, $w\_k$ is the weight associated with the *k*th individual classifier, and $N'$ is the number of all samples.

Here, the weights are estimated by using GAs to optimize the fitness function defined by Equation (13). Real-coded genomes are adopted and a population size of ten individuals is used, starting with randomly generated genomes. A maximum of 100 generations is chosen as the termination criterion for the solution process. The non-uniform mutation function and the arithmetic crossover operator [20] are used, with a mutation probability of 0.01 and a crossover probability of 0.8, respectively.

$$f = \frac{1}{1 + E}\,,\tag{13}$$

where *E* is the root mean square training error expressed as

$$E = \left[\frac{1}{N''} \sum\_{n=1}^{N''} \left(y\_n - \hat{y}\_n\right)^2\right]^{\frac{1}{2}}, \quad n = 1, 2, \dots, N'',\tag{14}$$

where $y\_n$ is the real result of the *n*th training sample, and $N''$ is the number of the training samples.

#### **3. Applications to fault diagnosis of rolling element bearings**

Rolling element bearings are core components of large-scale and complex mechanical equipment. Faults occurring in the bearings may lead to fatal breakdowns of mechanical equipment. Therefore, it is important to be able to accurately and automatically detect and diagnose faults occurring in the bearings. In this section, two cases of rolling element bearing fault diagnosis are utilized to evaluate the effectiveness of the hybrid intelligent method. One is fault diagnosis on a bearing test rig from Case Western Reserve University (CWRU), which involves bearing faults with different defect sizes [21]. The other is fault diagnosis of locomotive rolling bearings having incipient and compound faults. In both cases, the vibration signals were measured under various operating loads and different bearing conditions, including different fault modes and severity degrees.

#### **3.1 Case 1: Fault diagnosis of bearings of CWRU test rig**

Faults were introduced into the tested bearings using the electron discharge machining method. The defect sizes (diameter, depth) of the three faults (outer race fault, inner race fault and ball fault) are the same: 0.007, 0.014 or 0.021 inches. Each bearing was tested under four different loads (0, 1, 2 and 3 hp). The bearing data set was obtained from the experimental system under four different health conditions: (1) normal condition; (2) with outer race fault; (3) with inner race fault; (4) with ball fault. Thus, the vibration data was collected from rolling element bearings under different operating loads and health conditions. More information regarding the experimental test rig and the data can be found in Ref. [21].

We conduct three investigations over three different data subsets (A–C) of the rolling element bearings. The detailed descriptions of the three data subsets are shown in Table 2. Data set A consists of 240 data samples of four health conditions (normal condition, outer race fault, inner race fault and ball fault) with the defect size of 0.007 inches under the four loads (0, 1, 2 and 3 hp). Each of the four health conditions includes 60 data samples. Data set A is split into two sets: 120 samples for training and 120 for testing. It is a four-class classification task corresponding to the four different health conditions.

Data set B also contains 240 data samples. 120 samples with the defect size of 0.021 inches are used as the training set. The remaining 120 samples with the defect size of 0.007 inches are identical with the 120 training samples of data set A and form the testing samples of data set B. The purpose of using this data set is to test the classification performance of the proposed method on incipient faults when it is trained with the serious fault samples.

Data set C comprises 600 data samples covering four health conditions and four different loads. Each fault condition includes three different defect sizes of 0.007, 0.014 and 0.021 inches. The 600 data samples are divided into 300 training and 300 testing instances. For data set C, in order to identify the severity degrees of faults, we solve the ten-class classification problem.

As mentioned in Section 2, the statistical features are extracted from the raw signal, the filtered signals and the IMFs of each data sample. Three band-pass (BP1–BP3) and one high-pass (HP) filters are adopted for this bearing data. The band-pass frequencies (in kHz) of the BP1–BP3 filters are chosen as BP1 (2.2–3.8), BP2 (3.0–3.8) and BP3 (3.0–4.5), respectively. The cut-off frequency of the HP filter is chosen as 2.2 kHz. These frequencies are selected to cover the frequency components representing bearing faults.

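The GA-based weight estimation of Section 2.4 (Equations (11)–(14)) can be sketched as a small real-coded GA. The population size, crossover/mutation probabilities and the arithmetic crossover follow the text; Gaussian perturbation stands in for non-uniform mutation, and projecting the weights back onto the constraint set of Equation (12) is an implementation assumption:

```python
import numpy as np

def project_to_simplex(w):
    """Enforce the constraints of Eq. (12): w_k >= 0 and sum(w) = 1."""
    w = np.clip(w, 0.0, None)
    s = w.sum()
    return w / s if s > 0 else np.full_like(w, 1.0 / len(w))

def fitness(w, preds, y):
    """f = 1/(1+E), E the RMS training error of the committee (Eqs. 13-14)."""
    yhat = preds @ w                        # committee output, Eq. (11)
    E = np.sqrt(np.mean((y - yhat) ** 2))   # Eq. (14)
    return 1.0 / (1.0 + E)

def ga_weights(preds, y, pop=10, gens=100, pc=0.8, pm=0.01, seed=0):
    """Real-coded GA: arithmetic crossover, mutation, elitist replacement."""
    rng = np.random.default_rng(seed)
    K = preds.shape[1]
    P = np.array([project_to_simplex(rng.random(K)) for _ in range(pop)])
    for _ in range(gens):
        f = np.array([fitness(w, preds, y) for w in P])
        best = P[np.argmax(f)].copy()
        p = f / f.sum()                       # fitness-proportional selection
        idx = rng.choice(pop, size=(pop, 2), p=p)
        children = []
        for i, j in idx:
            a, b = P[i], P[j]
            if rng.random() < pc:             # arithmetic crossover
                lam = rng.random()
                child = lam * a + (1 - lam) * b
            else:
                child = a.copy()
            mut = rng.random(K) < pm          # Gaussian stand-in for mutation
            child[mut] += rng.normal(0, 0.1, mut.sum())
            children.append(project_to_simplex(child))
        P = np.array(children)
        P[0] = best                           # elitism keeps the best genome
    f = np.array([fitness(w, preds, y) for w in P])
    return P[np.argmax(f)]
```

With elitism the best fitness is monotone non-decreasing over generations, so the returned weights are at least as good as the best initial genome.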
