**Reliability Evaluation of Manufacturing Systems: Methods and Applications**

Alberto Regattieri

*DIEM - Department of Industrial and Mechanical Plants, University of Bologna Italy* 

### **1. Introduction**


The measurement and optimization of the efficiency of a manufacturing system, and of complex systems in general, is a critical challenge because of its technical difficulty and its significant impact on economic performance.

Production, maintenance and spare parts management costs force companies to analyse the performance of their manufacturing systems systematically and effectively, in terms of availability and reliability (Manzini et al. 2004, 2006, 2008).

The reliability analysis of the critical components is the basic way first to establish and then to improve the efficiency of complex systems.

A number of methods (e.g. the Direct Method, Rank Method, Product Limit Estimator, Maximum Likelihood Estimation, and others; Manzini et al., 2009), all with reference to *RAMS* (Reliability, Availability, Maintainability and Safety) analysis, have been developed and can contribute significantly to the performance improvement of both industrial and non-industrial complex systems.

The literature includes a large number of methods, linked for example to preventive maintenance models; these models can determine the best frequency of maintenance actions, optimize spare parts consumption, or identify the best management of operating costs (Regattieri et al., 2005; Manzini et al., 2009).

Several studies (Ascher et al., 1984; Battini et al., 2009; Louit et al., 2007; Persona et al., 2007) state that these complex methodologies are often applied under false assumptions such as constant failure rates, statistical independence among components, renewal processes and others. This common approach results in poor evaluations of the real reliability performance of components. All subsequent analysis may be compromised by an incorrect initial assessment of the failure process. A correct definition of the model describing the failure mode is a critical issue, and it often does not receive the effort it requires.

In this chapter the author discusses the selection of the failure process model, from the fundamental initial data collection phase to the consistent methodologies used to estimate the reliability of components, also considering censored data.

This chapter introduces the basic analytical models and statistical methods used to analyze the reliability of systems; these form the basis for evaluating and predicting the stochastic failure and repair behavior of complex manufacturing systems assembled from a variety of components. The first part of the chapter therefore presents a general framework for components, describing the procedure for solving the complete Failure Process Modeling (FPM) problem from data collection to final failure modeling. In particular, it develops the fitting analysis in the renewal process and the contribution of censored data throughout the whole process. The chapter then discusses the main methods provided in the proposed framework.

Applications, strictly derived from industrial case studies, are presented to show the capability and the usefulness of the framework and methods proposed.

### **2. Failure Process Modeling (FPM) framework**

A robust reliability analysis requires an effective failure process investigation, normally based on non-trivial knowledge about the past performance of components or systems, in particular in terms of failure times. This data collection is a fundamental step. The introduction of a Computer Maintenance Management System (CMMS) and of a Maintenance Remote Control System (Persona et al., 2007) can play an important role. Ferrari et al. (2003) demonstrate the risk arising from a small data set or from hasty hypotheses that are often adopted (e.g. constant failure rate, independent identically distributed failure times, etc.).

The literature suggests different frameworks for investigating the failure process modeling of components and complex systems, generally focused on a particular feature of the problem (e.g. trend tests in failure data, renewal or non-renewal approach, etc.).

In this chapter a general framework is proposed considering all the FPM process from data collection to final failure modelling, also considering the contribution of censored data.

Figure 1 presents the proposed framework (Regattieri et al., 2010). Data collection is the first step of the procedure. This is a very important issue, since the robustness of analysis is strictly related to the collected data set. Both failure times and *censored* times are gathered. Times to failure are used for the failure process characterization and censored times finally enrich the data set used for the definition of the parameters of failure models, thus resulting in a more robust modeling.

In general, considering a population of components composed of *m* units, each specific failure (or inter-failure) time can be found. The result is a set of times called *Xi,j*, where *Xi,j* is the *i*th failure (or inter-failure) time of the *j*th unit. This is a complete data situation, that is, all *m* unit failure times are available.

Unfortunately, this situation is often unrealistic, because it would require a lot of time and information. A real-world test often ends before all units have failed, or several units have finished their work before data monitoring began, so their real working times are unknown. These conditions are usually known as *censored data situations*.

Technically, censoring may be further categorized into:

1. Individual censored data

	- All units have the same test time t\*. A unit has either failed before t\* or is still running (generating censored data);

2. Multiple censored data

	- Test times vary from unit to unit. Clearly, failure times differ but there are also different censoring times. Censored units are removed from the sample at different times, while units go into service at different times.

Fig. 1. Generalized framework for Failure Process Modeling (FPM)

With reference to Figure 1, let *X1,j < X2,j < … < Xi,j < … < Xn,j* be the ordered set of failure or inter-failure times of item *j*; censored times (denoted *Xij+*) are temporarily removed from the data set. The trend test applied to the ordered failure times (graphical trend test, Mann test, etc.) determines whether the process is stationary or not.

If the process presents a trend, the *Xi,j* are not identically distributed and a non-stationary model must be fitted. The Non-Homogeneous Poisson Process (NHPP) model is the most widely used form because of its simplicity and the significant experimental evidence available (Coetzee, 1997). At this point the censored data must be reintroduced into the model; their impact is discussed by Jiang et al. (2005).

If the failure process is trend free, the next step is to determine whether the inter-failure times are independent. Many tests for independence exist but, as stated by Ascher and Feingold, practitioners usually skip this check because they do not appreciate its relevance. An effective way of testing for dependence is the serial correlation analysis discussed by Cox and Lewis (1966). Dependence between data calls for a Branching Poisson Process (BPP), which is also analyzed by Cox and Lewis (1966). Censored data also play an important role in the BPP model and must be considered during final modelling.
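As an illustration of the independence check, the following minimal sketch computes a common form of the sample serial correlation coefficient and compares it with the approximate large-sample band ±1.96/√n for a 5% significance level. It is not necessarily the exact estimator used by Cox and Lewis (1966); the function names and the inter-failure times are hypothetical and serve only to show the mechanics.

```python
import math


def serial_correlation(x, lag):
    """Sample serial correlation coefficient of the sequence x at the given lag."""
    n = len(x)
    if not 0 < lag < n:
        raise ValueError("lag must satisfy 0 < lag < len(x)")
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    numer = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag))
    return numer / denom


def independence_not_rejected(x, max_lag, z=1.96):
    """True if every coefficient up to max_lag lies inside the approximate
    large-sample band +/- z / sqrt(n) (z = 1.96 for a 5% level)."""
    bound = z / math.sqrt(len(x))
    return all(abs(serial_correlation(x, k)) <= bound
               for k in range(1, max_lag + 1))


# Hypothetical inter-failure times (hours), listed in order of occurrence.
times = [1200.0, 800.0, 1500.0, 950.0, 1100.0, 1700.0, 600.0, 1300.0]
r1 = serial_correlation(times, 1)
print(f"lag-1 coefficient: {r1:.3f}")
print("independence not rejected:", independence_not_rejected(times, max_lag=3))
```

With so few observations the band is wide, so a check of this kind is only indicative on small data sets.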

In real applications the failure process is frequently stationary and the failure data are independent, so a renewal process is involved. In this case, the proposed framework focuses on the evaluation of reliability functions, in particular in the presence of censored data.

More precisely, non-parametric methods and distribution-based techniques are suggested to find the reliability functions, such as survival functions, hazard functions, etc., considering censored data.

The Product Limit Estimator method and Kaplan-Meier method for the first category, and Least Square Analysis and the Maximum Likelihood Estimator technique for the second category, are robust and consistent approaches.

Regattieri et al. (2010), Manzini et al. (2009) and Ebeling (2005) discuss in detail each method referred to in the presented framework.

### **3. Applications**

The proposed framework has been applied in several case studies. In this chapter, two applications are presented, in order to discuss methods, advantages and problems.

The first application deals with an important international manufacturer of oleo-dynamic valves. Using a reliability data set collected during the life of the manufacturing system, the effect of including or neglecting the censored data is discussed.

The second application involves the application of the complete framework in a light commercial vehicle manufacturing system. In particular, the estimation of the failure time distribution is discussed.

### **3.1 Application 1: The significant effect of censored data**

During production, the Company collects times to failure using the CMMS. In particular, the performance of component *r.090.1768* is analysed; this is a very important electric motor, responsible for the movement of the transfer system of the valve assembly line. Figure 2 shows a sketch of the component.

The failure process can be considered a Renewal Process; the stationarity and dependence tests are omitted for the sake of simplicity (for details see Application 2).

Using non-parametric methods, in particular the Kaplan-Meier method (Manzini et al. 2009, Ebeling, 2005), it is possible to evaluate the empirical form of the reliability function (called *R(ti)*).

Fig. 2. Component r.090.1768


Table 1 presents all the available data, in terms of times to failure (*ti*) and censored times (*ti+*).


Table 1. r.090.1768 data set

Assuming *ti* as the ranked failure times and *ni* to be the number of components at risk, prior to the *ith* failure, the estimated reliability is calculated by:

$$
\hat{R}(t_i) = \left(1 - \frac{1}{n_i}\right)^{\delta_i} \hat{R}(t_{i-1}) \tag{1}
$$

where

*δi* = 1 if a failure occurs at time *ti*, and *δi* = 0 if censoring occurs at time *ti*;

$\hat{R}(0) = 1$.

The results of reliability analysis are summarized in Table 2 and Figure 3.
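The recursion in Eq. (1) can be sketched in a few lines. The example below is a minimal illustration, not the author's implementation: the function name `kaplan_meier` and the `(time, is_failure)` representation are assumptions, while the failure and censored times are those reported for component r.090.1768 in Table 2.

```python
def kaplan_meier(observations):
    """observations: list of (time, is_failure) pairs.
    Returns [(time, R_hat)] at each failure time, following Eq. (1)."""
    ordered = sorted(observations, key=lambda obs: obs[0])
    at_risk = len(ordered)          # n_i: units at risk just before the event
    r_hat = 1.0                     # R_hat(0) = 1
    curve = []
    for time, is_failure in ordered:
        if is_failure:              # delta_i = 1: apply the factor (1 - 1/n_i)
            r_hat *= 1.0 - 1.0 / at_risk
            curve.append((time, r_hat))
        at_risk -= 1                # failed or censored, the unit leaves the risk set
    return curve


# Times to failure and censored times of component r.090.1768 (Table 2).
failures = [667, 1124, 1246, 1348, 1478, 1642, 1745, 1945, 1974, 2056, 2128,
            2461, 2489, 2497, 2674, 2687, 2756, 2785, 2894, 3097, 3467]
censored = [700, 800, 1000, 1000, 1300, 1500, 1500, 2500, 2500]

data = [(t, True) for t in failures] + [(t, False) for t in censored]
curve = kaplan_meier(data)
for time, r in curve:
    print(f"R({time}) = {r:.3f}")
```

Censored units contribute only through the size of the risk set: they lower *ni* for later failures without changing the estimate at their own censoring time.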


| *i* | *ti* | *ni* | 1 − 1/*ni* | *δi* | *R(ti)* |
|---|---|---|---|---|---|
| 0 | | | | | 1.000 |
| 1 | 667 | 30 | 0.967 | 1 | 0.967 |
| 2 | 700 + | 29 | 0.966 | 0 | |
| 3 | 800 + | 28 | 0.964 | 0 | |
| 4 | 1000 + | 27 | 0.963 | 0 | |
| 5 | 1000 + | 26 | 0.962 | 0 | |
| 6 | 1124 | 25 | 0.960 | 1 | 0.928 |
| 7 | 1246 | 24 | 0.958 | 1 | 0.889 |
| 8 | 1300 + | 23 | 0.957 | 0 | |
| 9 | 1348 | 22 | 0.955 | 1 | 0.849 |
| 10 | 1478 | 21 | 0.952 | 1 | 0.808 |
| 11 | 1500 + | 20 | 0.950 | 0 | |
| 12 | 1500 + | 19 | 0.947 | 0 | |
| 13 | 1642 | 18 | 0.944 | 1 | 0.764 |
| 14 | 1745 | 17 | 0.941 | 1 | 0.719 |
| 15 | 1945 | 16 | 0.938 | 1 | 0.674 |
| 16 | 1974 | 15 | 0.933 | 1 | 0.629 |
| 17 | 2056 | 14 | 0.929 | 1 | 0.584 |
| 18 | 2128 | 13 | 0.923 | 1 | 0.539 |
| 19 | 2461 | 12 | 0.917 | 1 | 0.494 |
| 20 | 2489 | 11 | 0.909 | 1 | 0.449 |
| 21 | 2497 | 10 | 0.900 | 1 | 0.404 |
| 22 | 2500 + | 9 | 0.889 | 0 | |
| 23 | 2500 + | 8 | 0.875 | 0 | |
| 24 | 2674 | 7 | 0.857 | 1 | 0.346 |
| 25 | 2687 | 6 | 0.833 | 1 | 0.289 |
| 26 | 2756 | 5 | 0.800 | 1 | 0.231 |
| 27 | 2785 | 4 | 0.750 | 1 | 0.173 |
| 28 | 2894 | 3 | 0.667 | 1 | 0.115 |
| 29 | 3097 | 2 | 0.500 | 1 | 0.058 |
| 30 | 3467 | 1 | 0.000 | 1 | 0.000 |

Table 2. Reliability evaluation using the Kaplan-Meier method (component r.090.1768); censored times are marked with +

Experimental evidence shows that companies often neglect censored data, considering only the times to failure (i.e. the so-called *complete* data set).

The use of the complete data set when several components are still working (i.e. there are censored data) introduces significant errors. Considering the component r.090.1768, Figure 4 shows a comparison between the reliability functions obtained by the Kaplan-Meier method applied to the set with censored data, and the Improved Direct Method (Manzini et al., 2009) applied only to the failure times (complete data set).

Fig. 3. Reliability plot using the Kaplan-Meier method (component r.090.1768)

Fig. 4. Comparison of complete (using IDM) and censored data (using Kaplan-Meier)

The error is generally an underestimation of reliability. It depends on the percentage of units not considered and on the censoring times.

In any case, if censored times are not considered, a significant error is introduced.
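The size of this error can be explored with a short numerical sketch. The text does not state the formula of the Improved Direct Method; the code below assumes the mean-rank form *R(ti)* = 1 − *i*/(*n* + 1) applied to the failure times only, and this assumption should be checked against Manzini et al. (2009). The function names are hypothetical; the data are those of component r.090.1768 in Table 2.

```python
def kaplan_meier(observations):
    """Kaplan-Meier estimate at each failure time, as in Eq. (1)."""
    ordered = sorted(observations, key=lambda obs: obs[0])
    at_risk = len(ordered)
    r_hat = 1.0
    curve = []
    for time, is_failure in ordered:
        if is_failure:
            r_hat *= 1.0 - 1.0 / at_risk
            curve.append((time, r_hat))
        at_risk -= 1
    return curve


def improved_direct(failure_times):
    """ASSUMED mean-rank form R(t_i) = 1 - i / (n + 1), failures only."""
    ordered = sorted(failure_times)
    n = len(ordered)
    return [(t, 1.0 - i / (n + 1)) for i, t in enumerate(ordered, start=1)]


failures = [667, 1124, 1246, 1348, 1478, 1642, 1745, 1945, 1974, 2056, 2128,
            2461, 2489, 2497, 2674, 2687, 2756, 2785, 2894, 3097, 3467]
censored = [700, 800, 1000, 1000, 1300, 1500, 1500, 2500, 2500]

km = kaplan_meier([(t, True) for t in failures] + [(t, False) for t in censored])
idm = improved_direct(failures)

# Positive gap: the failures-only estimate is below the Kaplan-Meier estimate.
gaps = [(t, r_km - r_idm) for (t, r_km), (_, r_idm) in zip(km, idm)]
for t, gap in gaps:
    print(f"t = {t}: KM - IDM = {gap:+.3f}")
```

Under this assumption the failures-only estimate lies below the Kaplan-Meier estimate over most of the observed range, which is consistent with the underestimation described above; the sign of the gap can change near the last failures, where very few units remain at risk.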

### **3.2 Application 2: A complete failure process modeling**

In this application the FPM process has been applied to carry out the reliability analysis of several components of the production system of a light commercial vehicle manufacturer. The plant is composed of many subsystems; for each of them a set of critical components is considered. Each component has a prevailing failure mode (wear, mechanical crash, thermal crash, etc.); from now on, the generic expression *failure* refers to the particular, prominent failure mode of each component.

Table 3 shows the subset of critical components, called S1, analyzed in the Chapter.


Table 3. Critical components and subsystems

Among them, the main welding robot named wr1 is very critical, and in particular its components named KKL5699, which are the main actuators, are considered to be mainly responsible for the poor reliability performance of the entire manufacturing system. For this reason, the Chapter presents the application of the proposed framework to the component KKL5699. Finally, the conclusions take into account all the critical components shown in Table 3.

#### **3.2.1 Analysis of KKL5699 component**

The tow system is composed of 9 identical repairable components KKL5699, working in 9 different positions, named with letters A, B, …, L, under the same operating conditions. For this reason, they are pooled in a single enhanced data set. The working time is 24 hours/day, 222 days/year.

The CMMS has collected failure data from initial installation (T0 = 0). Table 4 reports the inter-failure times *Xij* (failure *i* of item *j*) and the cumulative failure times *Fij*, as shown in Figure 5.

The data were collected during 5 years of operating time, but FPM must be an iterative procedure applied at different instants of system service. The growth of the data set allows a more robust investigation of the failure process. In particular, the chapter reports the results of analyses carried out at the end of different time intervals [T0,t]: 1,440, 3,696, 4,824, 6,720, 8,448, 11,472, 13,440, 15,560, 18,816 and 23,688 hours. For the generic time *t*, the analysis uses all the failure times collected, together with the censored times of the components in service. For each instant of analysis, 9 suspended times are collected, given by the working times of the components between their last repair action and the instant of analysis. Table 5 reports the data set of failure times, the censored times and the relative working positions available at the instant of analysis of 3,696 hours.

Fig. 5. Inter-failure time *Xij* and cumulative failure times *Fij*


Obviously, as the instant of analysis increases, the number of failure times increases too, whereas the number of censored times is constantly 9; the *Censoring Rate (CR)*, given by (2), therefore decreases:

$$CR = \frac{Nc(t)}{Ntot(t)}\tag{2}$$

where *Nc(t)* is the number of censored times available at the instant of analysis *t*, and *Ntot(t)* is the total number of times (failure and censored) available at the instant of analysis *t*.

The different Censoring Rates involved in the analysis are presented in Table 6.
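As a quick check of Eq. (2), the sketch below recomputes the censoring rate at each instant of analysis from the number of inter-failure times listed for Table 6 and the constant 9 censored times. The function and variable names are hypothetical.

```python
def censoring_rate(n_censored, n_failures):
    """Eq. (2): CR = Nc(t) / Ntot(t), with Ntot = failures + censored."""
    return n_censored / (n_censored + n_failures)


# Instants of analysis (hours) and number of inter-failure times (Table 6).
instants = [1440, 2410, 3696, 4824, 6720, 8448, 11472, 13440, 16560, 18816, 23688]
n_failures = [2, 9, 12, 18, 26, 32, 46, 55, 66, 75, 93]
N_CENSORED = 9  # one suspended time per working position

rates = [round(censoring_rate(N_CENSORED, nf), 3) for nf in n_failures]
for t, cr in zip(instants, rates):
    print(f"t = {t:>6} h  CR = {cr:.3f}")
```

The rate falls from 0.818 at 1,440 hours to 0.088 at 23,688 hours, because the denominator grows with every new failure while the numerator stays at 9.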


Table 4. Inter-failure and failure time data set (KKL5699)

According to the proposed framework, the first test deals with the stationarity condition. Figure 6, referring to the whole pooled data set, presents the *cumulative failures vs time plot*. No trend can be appreciated in the failure data. The Mann test, which counts the number of reverse arrangements, confirms this belief, both for each component and for the pooled data set. The Laplace trend test leads to the same conclusion: its test statistic is *uL* = 0.55, corresponding to a p-value *p* = 0.580. Comparing *uL*<sup>2</sup> with the χ<sup>2</sup> distribution with 1 degree of freedom, there is no evidence of a trend at a significance level of 5% (Ansell and Phillips, 1994).
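The reported figures can be cross-checked with a short sketch. The function `laplace_statistic` below uses the standard single-system, time-truncated form of the Laplace statistic, which is an assumption here: the chapter does not state which variant was applied to the pooled data, and the failure times of Table 4 are not reproduced. The p-value and χ² comparison use only the reported value *uL* = 0.55.

```python
import math


def laplace_statistic(failure_times, observation_end):
    """Laplace trend statistic for one system observed over (0, observation_end];
    failure_times are cumulative times of failure (assumed variant)."""
    n = len(failure_times)
    mean_time = sum(failure_times) / n
    return (mean_time - observation_end / 2.0) / (
        observation_end * math.sqrt(1.0 / (12.0 * n)))


def two_sided_p_value(u):
    """Two-sided p-value of u under the standard normal distribution."""
    return math.erfc(abs(u) / math.sqrt(2.0))


CHI2_1DOF_5PCT = 3.841  # 95th percentile of the chi-square distribution, 1 d.o.f.

u_reported = 0.55
p = two_sided_p_value(u_reported)
trend_detected = u_reported ** 2 > CHI2_1DOF_5PCT

print(f"p-value = {p:.3f}")            # close to the reported 0.580
print(f"uL^2 = {u_reported ** 2:.4f}, trend detected: {trend_detected}")
```

Since *uL*² is about 0.30, far below 3.841, the hypothesis of no trend is not rejected, in agreement with the text.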


Table 5. Data set at 3.956 working hours (inter-failure times and censored times) component KKL5699


Table 6. Censoring rates (KKL5699)

A first assessment shows that the component failure process is stationary. The next step deals with the renewal process hypothesis evaluation.

The *serial correlation analysis* is used to reveal the independence of the analyzed data set. According to the *autocorrelation plot*, in Figure 7, we cannot reject the null hypothesis of no autocorrelation for any length of lag (5% is the significance level adopted).

The Durbin-Watson statistic confirms this belief, then the component failure process for component KKL5699 can be considered to be a renewal process (RP).

Considering the RP assumption, *non parametric* methods or *distribution based techniques* are available to define the failure process model.

Fig. 6. Cumulative failures vs time plot (pooled data set KKL5699)

| Inter-failure time *Xij* (h) | Position | Censored time *Xij*<sup>+</sup> (h) | Position |
|---|---|---|---|
| 829 | A | 1,735 | A |
| 1,132 | A | 2,038 | B |
| 1,658 | B | 473 | C |
| 673 | C | 1,562 | D |
| 983 | C | 972 | E |
| 1,567 | C | 1,553 | F |
| 2,134 | D | 1,716 | G |
| 2,724 | E | 418 | H |
| 2,143 | F | 1,523 | L |
| 1,980 | G | | |
| 3,278 | H | | |
| 2,173 | L | | |

Table 5. Data set at 3,696 working hours (inter-failure times and censored times) for component KKL5699

A first assessment shows that the component failure process is stationary. The next step deals with the evaluation of the renewal process hypothesis.

Fig. 6. Cumulative failures vs time plot (pooled data set KKL5699)

The *serial correlation analysis* is used to check the independence of the analyzed data set. According to the *autocorrelation plot* in Figure 7, we cannot reject the null hypothesis of no autocorrelation for any length of lag (5% is the significance level adopted).

The Durbin-Watson statistic confirms this finding; the failure process for component KKL5699 can therefore be considered a renewal process (RP).

Fig. 7. *Correlogram* for pooled *Xij* with 5% significance level (KKL 5699)

Under the RP assumption, *non-parametric* methods or *distribution-based techniques* are available to define the failure process model.

| Instant of analysis (h) | N. of inter-failure times *Xij* | N. of censored times *Xij*<sup>+</sup> | Censoring rate *CR* |
|---|---|---|---|
| 1,440 | 2 | 9 | 0.818 |
| 2,410 | 9 | 9 | 0.500 |
| 3,696 | 12 | 9 | 0.429 |
| 4,824 | 18 | 9 | 0.333 |
| 6,720 | 26 | 9 | 0.257 |
| 8,448 | 32 | 9 | 0.220 |
| 11,472 | 46 | 9 | 0.164 |
| 13,440 | 55 | 9 | 0.141 |
| 16,560 | 66 | 9 | 0.120 |
| 18,816 | 75 | 9 | 0.107 |
| 23,688 | 93 | 9 | 0.088 |

Table 6. Censoring rates (KKL5699)

It is important to take into consideration the role of censored data: as demonstrated in Application 1, including them enriches the data set and thus increases confidence in the model.

To account for censored data, the data are first analyzed with the *Product Limit Estimator* method (Manzini et al., 2009; Ebeling, 2005), and then the best reliability distributions (i.e. survival function, hazard rate, etc.) are fitted by the *Least Square analysis* method. This approach is applied at each instant of analysis, i.e. for each time interval of the system service time.

From now on, only the pooled data set is considered; for this reason, the time notation *Xij* collapses into *Xz*. The value of the survival function R(*Xz*) after a working time equal to *Xz*, estimated with the Product Limit Estimator method, is given by:

$$R(X\_z) = \left(\frac{n+1-z}{n+2-z}\right)^{\delta\_z} R(X\_{z-1})\tag{3}$$

where *δz* = 1 if a failure occurs at time *Xz* and *δz* = 0 if a censoring occurs at time *Xz*; *R(0)* = 1; *n* is the total number of failure and censored events available; and *z* is the rank of the event once the times are sorted in ascending order.
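As an illustration of equation (3), the following minimal Python sketch applies the Product Limit Estimator to the inter-failure and censored times reported in Table 5. The function name `product_limit` and the tie-breaking rule (failures before censorings at equal times) are assumptions made for this example, not part of the original procedure; the data contain no ties.

```python
# Inter-failure and censored times (hours) for component KKL5699, from Table 5.
FAILURES = [829, 1132, 1658, 673, 983, 1567, 2134, 2724, 2143, 1980, 3278, 2173]
CENSORED = [1735, 2038, 473, 1562, 972, 1553, 1716, 418, 1523]


def product_limit(failures, censored):
    """Return [(time, R)] at each failure time, following equation (3)."""
    # delta = 1 marks a failure, delta = 0 marks a censoring.
    events = [(t, 1) for t in failures] + [(t, 0) for t in censored]
    # Sort in ascending order; on ties, failures come first (assumption).
    events.sort(key=lambda e: (e[0], -e[1]))
    n = len(events)
    r = 1.0  # R(0) = 1
    curve = []
    for z, (t, delta) in enumerate(events, start=1):
        # A censoring (delta = 0) leaves R unchanged but still consumes a rank.
        r *= ((n + 1 - z) / (n + 2 - z)) ** delta
        if delta:
            curve.append((t, r))
    return curve


curve = product_limit(FAILURES, CENSORED)
for t, r in curve:
    print(f"{t:5d} h  R = {r:.3f}")
```

Censored events do not lower the survival estimate directly, but each one uses up a rank *z*, which changes the weight of every later failure. This is how the censored information enters the estimate.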

All the service times are investigated. Figure 8 shows the empirical survival curves for component KKL5699 obtained at different instants of analysis, some of them widely spaced (e.g. several months apart).

The reliability evaluation after 1,440 hours (roughly 3 months of service) appears very approximate, while after 23,688 hours (more than 4 years of service) it is much more reliable. Across the data set, the survival function estimates change significantly (by up to 25%) over the life of the component; collecting and maintaining the data is therefore a very important issue.

An alternative approach is to identify a suitable statistical distribution for the principal reliability functions, such as the survival function *R(t)*, the cumulative failure probability function *F(t)* and the hazard function *h(t)*, to estimate its parameter(s), and to perform a goodness-of-fit test.

Fig. 8. Survival function at different service times (1,440, 2,410, 3,696 and 23,688 hours) for component KKL5699

In general, this approach is very attractive because in recent years a significant number of techniques based on knowledge of the reliability distributions have been developed, and they can contribute substantially to improving the performance of both industrial and non-industrial complex systems. These techniques are often referred to as *RAMS (Reliability, Availability, Maintainability and Safety)* analysis.

For example, the scientific literature includes a large number of methods, of varying complexity, linked to preventive maintenance models; these models support the determination of the best intervention interval, the optimization of the procedures that determine spare parts consumption, or the best management of operating costs.

The *2-parameter Weibull* distribution is one of the most commonly used distributions in reliability engineering because it can represent many different failure processes depending on the values of its parameters. It can therefore model a great variety of data and life characteristics. The distribution parameters can be estimated using several methods: the Least Square method (LS), the Maximum Likelihood Estimator (MLE) and others (Ebeling, 2005). In the KKL5699 component case, the Least Square method is preferred for its simplicity and robustness.
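A common way to apply the Least Square method to a Weibull model is to linearize the survival function: since R(t) = exp(−(t/η)^β), it follows that ln(−ln R(t)) = β ln t − β ln η, so a straight line fitted to the points (ln t, ln(−ln R)) gives β as the slope and η from the intercept. The sketch below is a minimal illustration under that assumption; the function name `fit_weibull_ls` is hypothetical, the squared correlation coefficient is used as a stand-in for the index of fit (the exact definition used in the study is not stated here), and the input points are synthetic values generated from a known Weibull curve, not the KKL5699 data.

```python
import math


def fit_weibull_ls(points):
    """Least-squares fit of a 2-parameter Weibull to (t, R) points.

    Requires t > 0 and 0 < R < 1. Returns (beta, eta, r2), where r2 is the
    squared correlation coefficient of the linearized data.
    """
    pts = list(points)
    xs = [math.log(t) for t, _ in pts]
    ys = [math.log(-math.log(r)) for _, r in pts]
    n = len(pts)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx                     # slope of the fitted line
    intercept = mean_y - beta * mean_x   # equals -beta * ln(eta)
    eta = math.exp(-intercept / beta)
    r2 = sxy * sxy / (sxx * syy)
    return beta, eta, r2


# Synthetic check: points lying exactly on a Weibull curve (illustrative values).
TRUE_BETA, TRUE_ETA = 2.9, 2400.0
sample = [(t, math.exp(-((t / TRUE_ETA) ** TRUE_BETA)))
          for t in (500, 1000, 1500, 2000, 2500, 3000)]
beta, eta, r2 = fit_weibull_ls(sample)
print(f"beta = {beta:.3f}, eta = {eta:.1f} h, r2 = {r2:.4f}")
```

In practice the (t, R) points would come from the Product Limit Estimator, one per observed failure time.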

Table 7 summarizes the results in terms of Weibull parameters and index of fit at the different instants of analysis.

| Instant of analysis (h) | Shape factor β | Scale factor η (h) | Index of fit | Censoring rate *CR* (%) |
|---|---|---|---|---|
| 1,440 | 1.790 | 3,160.42 | 0.9451 | 81.8 |
| 2,410 | 2.133 | 2,087.72 | 0.9642 | 50.0 |
| 3,696 | 2.489 | 2,268.32 | 0.9857 | 42.9 |
| 4,824 | 2.730 | 2,251.38 | 0.9871 | 33.3 |
| 6,720 | 2.814 | 2,378.99 | 0.9822 | 25.7 |
| 8,448 | 2.749 | 2,393.51 | 0.985 | 22.0 |
| 11,472 | 2.849 | 2,311.83 | 0.9769 | 16.4 |
| 13,440 | 2.888 | 2,335.51 | 0.9793 | 14.1 |
| 16,560 | 2.989 | 2,335.14 | 0.9788 | 12.0 |
| 18,816 | 2.914 | 2,400.01 | 0.9647 | 10.7 |
| 23,688 | 2.900 | 2,427.91 | 0.9723 | 8.8 |

Table 7. Parameters of Weibull distribution at different instants of analysis (KKL5699)

The parameters of the Weibull distribution move toward steady values of about 3.0 for β and about 2,400 hours for η. Figure 9 shows the trend of the parameters across the different instants of analysis.

Fig. 9. Trend of Weibull parameters according to the different service times (KKL5699)

Another interesting analysis deals with the link between the estimates of the Weibull parameters and the censoring rate. Figures 10 and 11 show the significant influence of *CR* on the trends of β and η. The different censoring rates are due to a fixed number of censored data (i.e. 9) and an increasing number of failure data.
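The censoring rates can be checked with a few lines of Python, assuming that *CR* is the share of censored times among all recorded events (censored plus failures). This definition is inferred from the reported values rather than stated explicitly, and the variable names are illustrative; the failure counts are those of Table 6.

```python
N_CENSORED = 9  # fixed number of censored times at every instant of analysis

# Instant of analysis (hours) -> number of inter-failure times (Table 6).
FAILURE_COUNTS = {
    1440: 2, 2410: 9, 3696: 12, 4824: 18, 6720: 26, 8448: 32,
    11472: 46, 13440: 55, 16560: 66, 18816: 75, 23688: 93,
}

# CR = censored / (censored + failures): it falls as failure data accumulate.
censoring_rate = {
    instant: N_CENSORED / (N_CENSORED + n_failures)
    for instant, n_failures in FAILURE_COUNTS.items()
}

for instant, cr in censoring_rate.items():
    print(f"{instant:6d} h  CR = {cr:.3f}")
```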

Fig. 10. Shape factor (β) according to the censoring rate (CR) (KKL5699)

Fig. 11. Scale factor (η) according to the censoring rate (CR) (KKL5699)

Reliability functions, such as the survival function, the cumulative probability of failure and the hazard rate, can be derived directly from the 2-parameter Weibull distribution using the estimated parameters of Table 7. Figure 12 shows the survival function R(t) of component KKL5699 for a generic *time interval t*, estimated after 2,410, 4,824 and 23,688 hours of service.
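For reference, the standard 2-parameter Weibull expressions are R(t) = exp(−(t/η)^β), F(t) = 1 − R(t) and h(t) = (β/η)(t/η)^(β−1). The short sketch below evaluates them with the parameters estimated after 23,688 hours (β = 2.900, η = 2,427.91 h, from Table 7); the function names and the evaluation times are illustrative choices, not taken from the study.

```python
import math

BETA = 2.900    # shape factor after 23,688 h (Table 7)
ETA = 2427.91   # scale factor in hours after 23,688 h (Table 7)


def weibull_survival(t, beta, eta):
    """Survival function R(t) = exp(-(t/eta)^beta)."""
    return math.exp(-((t / eta) ** beta))


def weibull_cdf(t, beta, eta):
    """Cumulative probability of failure F(t) = 1 - R(t)."""
    return 1.0 - weibull_survival(t, beta, eta)


def weibull_hazard(t, beta, eta):
    """Hazard rate h(t) = (beta/eta) * (t/eta)^(beta - 1)."""
    return (beta / eta) * (t / eta) ** (beta - 1)


for t in (500, 1000, 2000, 3000):
    print(f"t = {t:4d} h  R = {weibull_survival(t, BETA, ETA):.3f}  "
          f"F = {weibull_cdf(t, BETA, ETA):.3f}  "
          f"h = {weibull_hazard(t, BETA, ETA):.2e} 1/h")
```

With β greater than 1 the hazard rate increases with age, which is the behavior that typically motivates preventive replacement.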

Fig. 12. Survival function (Weibull distribution) according to different instants of analysis (KKL5699)

Fig. 13. Comparison between survival functions obtained by the non-parametric method and by the 2-parameter Weibull distribution for component KKL5699 after 23,688 hours.

The proposed framework (Fig. 1) provides both the *non-parametric* and the *distribution-based* approaches. They usually give similar results but, as stated before, the second one incorporates more information and is preferable when possible. Figure 13 compares the estimates of the survival function of component KKL5699 obtained after 23,688 hours of service with the two approaches.

#### **3.2.2 Analysis of the subset S1**

The stationarity and dependence tests on the times to failure show that all the components of subset S1 can be described by a renewal process. Failure and censored data are processed with the Product Limit Estimator method to evaluate empirical values of the cumulative failure distributions. The *2-parameter Weibull* distribution is used to evaluate the reliability functions analytically; in particular, the distribution parameters are estimated using the Least Square method for each instant of analysis. Table 8 summarizes the results in terms of Weibull parameters and index of fit after 23,688 operating hours.


Table 8. Parameters of Weibull distribution of S1 components after 23,688 operating hours

Figure 14 shows the survival functions of the critical components, calculated from the parameters reported in Table 8.

Fig. 14. Survival function of S1 components (2-parameter Weibull)

### **4. Conclusion**


Failure Process Modeling (FPM) plays a fundamental role in the reliability analysis of manufacturing systems. Complex methodologies are often applied using false assumptions such as constant failure rates, statistical independence between components, renewal processes and others. These misconceptions result in a poor evaluation of the real reliability performance of components and systems. Every subsequent analysis, however sophisticated, may be compromised by an incorrect initial assessment of the failure process.

The experimental evidence shows that correctly defining the model that describes the failure process is a critical issue, and one that often does not receive sufficient attention from engineers.

Collecting both failure data and censored data is a fundamental step. A CMMS and a system that automatically manages the alarms coming from the installed sensors can be valid tools to improve this phase.

As demonstrated in the presented applications, neglecting the censored information results in significant errors in the evaluation of the reliability performance of the components.

Knowing a fitted analytical distribution is very useful because it enables several developments: for example, the determination of the best intervention frequency, the optimization of the procedures that determine spare parts consumption, or the best management of operating costs.

The FPM procedure must also be maintained: during the service life of a system, the reliability data set grows and allows a more robust estimation of the reliability functions. The FPM process, performed using the proposed framework (Fig. 1), must be an iterative procedure that is renewed throughout the life of the system.

### **5. References**

