**Meet the editor**

Prof. Jaroslav Menčík, originally a mechanical engineer, has spent many years in technological and materials research at various institutions and universities in the Czech Republic, Australia, Germany, the UK, Japan, and other places. He is the author of two monographs on mechanical properties and many papers in scientific journals and conference proceedings, with more than 1200 citations. Since joining the University of Pardubice in 1996, he has been active in reliability engineering. He has headed several grant projects and international scientific conferences on the topic. His lectures on reliability have been attended by students from various countries and branches: mechanical, civil, and electrical engineering, transport technology, and economics. With this extensive experience, he decided to prepare a concise book for students and other people who want to get some insight into reliability, regardless of their professional orientation. In writing it, he therefore put emphasis on universal rules and methods. Jaroslav Menčík is married and has a daughter and a son.

## Contents

#### **Preface**


Jaroslav Menčík



#### **Section 2 Advanced Reliability**



Chapter 26 **Software for Reliability Analysis** Jaroslav Menčík

## Preface

Reliability, in general, is the ability to fulfill the demanded tasks. Our life is often influenced by the reliability or unreliability of the things we use, such as home appliances, machines or cars, and by the reliability of processes and services, such as the supply of electricity, telephone services, or adherence to transport timetables. Even the people we meet can be considered as reliable or unreliable; everyone has such experience.

The opposite of reliability is unreliability and failures. Failures mean losses, such as the costs of repairs and losses due to production outages, and also fatalities or injuries and damage to property or the environment, as well as sometimes the loss of the manufacturer's good reputation, which can even contribute to its bankruptcy.

In this respect, reliability is related closely to safety and also to quality. The quality of a product or service is its ability to ensure customer satisfaction. And reliability is nothing other than the ability to maintain quality over time. It would be of little use to buy a nice and powerful car if it failed the next day and had to stay in a workshop for several weeks. Nobody would say that a product is of high quality if it fails several times within a short time.

These days, much is spoken about quality, but it is actually reliability that must be aimed at. During the last 100 years, much attention has been devoted to reliability and its improvement. The main factors influencing reliability have been revealed and the foundations of reliability theory have been laid. In addition, practical techniques have been developed for achieving high reliability. All this has borne fruit. For example, the warranty period of cars has increased from 6 months, as was common 60 years ago, to 6 years today. This would not have been possible without the increase in reliability brought about by systematic effort in this direction. And this trend should continue.

Today, the methods of reliability engineering are taught in numerous courses and textbooks. These are often limited to certain branches (e.g. reliability of machines, electric appliances, software, civil engineering structures, or reliability of services). However, many rules and methods are universal and applicable in various areas of technology, as well as in our lives. Indeed, even human life can be considered from the reliability point of view, and life insurance companies use the mathematics common in reliability calculations. Moreover, as the occurrence of failures is accompanied by uncertainty, the methods for increasing reliability are, in principle, suitable for the reduction of any uncertainties.

The author of this book, originally a mechanical engineer, has spent many years in research. This is also an area with many uncertainties. Later, at the University of Pardubice, he has been teaching reliability engineering and solving related problems. His lectures are attended by students from various countries and branches: mechanical, civil, and electrical engineering, transport technology, and economics. With this extensive experience, he decided to prepare a concise book for students and others who want to get some insight into reliability, regardless of their professional orientation. Therefore, he put emphasis on reliability rules and methods useful for various branches. This book is not a cookbook with recipes for the solution of a narrow group of problems. It explains the basic principles and universal methods of wide applicability, as it is well known that progress can also be achieved by the transfer of ideas and methods from one branch to another.

The book is divided into two parts. The first part (Chapters 1.1 to 1.10) explains the basic terms and simple methods for the determination of reliability characteristics, which form the base for any reliability evaluation. To understand the contents of this section, no special knowledge is necessary. In the second part (Chapters 2.1 to 2.12), more advanced methods are explained, such as failure modes and effects analysis, the load-resistance interference method, the Monte Carlo simulation method, cost-based reliability optimization, basic approaches to reliability testing, and methods based on the Bayesian approach or fuzzy logic, suitable for the processing of rather vague information. The practical examples included help in understanding the individual topics. All examples can be solved without special software; Excel or even a pocket calculator is sufficient.

The book is complemented with information on standards for reliability evaluation, software for reliability, sources of information on reliability, a list of references, and an index.

The author wanted to write a brief book that can serve as an introduction to reliability and to the study of the specialized literature on this topic. He hopes that this book will bring the readers pleasure and help them in increasing the reliability of anything.

#### **Acknowledgments**

During the preparation of this book, I made use of the experience gained during my work and teaching at various institutions. I am grateful to my colleagues for their advice as well as to my students for their questions. Special thanks belong to Prof. Hynek Šertler and Prof. Milan Lánský, who oriented me in reliability issues after I joined the University of Pardubice; Prof. Dietrich Munz of the Research Centre of Karlsruhe (KIT); and Prof. Milan Holický of the Klokner Institute of the Czech Technical University. I am also very grateful to the Group for Reliability at the Czech Society for Quality, Prague, and its head, Prof. Václav Legát of the Czech University of Life Sciences in Prague, for the permanent effort to cultivate a sense of reliability in society.

> **Prof. Jaroslav Menčík** University of Pardubice Dopravní fakulta Jana Pernera, KMMČS Czech Republic

**Section 1**

**Reliability Basics**

**Chapter 1**

## **Basic Terms of Reliability**

## Jaroslav Menčík

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/62354

#### **Abstract**

Basic terms are explained, such as reliability, failure, fault, limit state, quality, safety, repair, renewal, maintenance, availability and dependability, and inherent and operational reliability.

**Keywords:** Reliability, safety, failure, limit state, repair, renewal, maintenance, availability, dependability

In matters of reliability, people from various branches (manufacturers, customers, technicians, and lawyers) must often communicate with one another, especially when trying to find the answers to the following questions: "What happened and why?", "Who is guilty of this accident?", "Who should pay the damages caused by the failure?", or "How should the warranty be defined?". Therefore, it is necessary that all participants understand certain technical terms in the same way. The most important expressions are explained in this chapter; more rigorous definitions can be found in the standards, such as ISO, IEC, and others, as listed in Appendix 2.

**Reliability**, in general, is the ability of an object (or process or service) to fulfil the demanded tasks and meet the specifications under given conditions. The specifications (i.e. technical parameters) must be written in the accompanying technical documentation. The conditions of use must also be specified, for example, the temperature range, in which the object will keep the assumed parameters.

Reliable operation is interrupted or terminated by failures. **Failure** is an event leading to the loss of the ability to fulfil the demanded tasks and meet the specifications. Examples are fracture of a component due to overloading or fatigue, collapse of a structure, loss of electric contact, unacceptable deformations or wear, or some parameters out of the allowable limits.

© 2016 The Author(s). Licensee InTech. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

**Fault** is a defect in the component or product. Faults can also be present in software.

**Limit state.** From the reliability point of view, every object can be either in an operational state or in a failed state. The border between both states is the limit state. Some objects fail suddenly and the failure is complete. The technical condition of other objects becomes gradually worse. For some time, however, they are able to fulfil their purpose, though to a limited extent (with worse technical parameters or lower safety); the failure is partial. However, if certain parameters exceed the corresponding limit values, the object becomes either unfit for further use or destroyed.

Reliability is a component of quality. **Quality** is the ability of a product or service to ensure full customer satisfaction. The quality of an object (e.g. a car) can be judged with respect to several criteria, such as power, maximum speed, noise, fuel consumption, and color. Examples of quality in services are the possibility to provide quick connection to the Internet or to keep the timetable of trains or buses (and also the cleanliness in the vehicles, etc.). Reliability, in essence, is the ability to keep the quality with time. Nobody would say that a car is of good quality if it fails repeatedly during a short time.

A further important characteristic is safety. **Safety** is the ability of an object not to endanger the human health or life, the environment, and properties.

If an object fails, it must be repaired. The word **repair** denotes the works for restoring full operability after a failure. However, the situation is more complex. Only some items are repaired, and we call them **repairable**. Some objects are **unrepairable** (e.g. electric lamp bulbs). If the filament has burned, the bulb must be replaced by a good one. Some items are repairable (e.g. an electric motor), but in some cases they are not repaired after failure but **replaced** by good ones, just to reduce the downtime of the object in which they were mounted (e.g. a gear box in a locomotive). If necessary, the repair of the failed part can be done at a suitable time elsewhere. For some mass-produced items, their replacement by new ones is cheaper than repair. Therefore, one often speaks of **repaired** or **unrepaired** objects.

A more general expression than repair is **renewal**, which means putting the object back into the condition as if it were "new". Renewal can be achieved by repair, replacement, or maintenance.

**Maintenance** comprises cleaning, exchange of oils and dirty filters, tightening of locked screws and other adjustments, and repair of minor faults or damaged paints – generally small works for restoring full operability.

**Availability** is the readiness for correct service. In complex structures with long life (e.g. a car or a locomotive), failures inevitably appear from time to time. In this case, the term availability is used, which considers the times of operation as well as of repairs. Later, it will be shown how availability can be calculated and used for reliability evaluation.

All these terms together form the so-called **dependability**. This is a general term used to characterize availability and the factors that influence it: reliability, maintainability, and maintenance support. Briefly,

**•** availability = readiness for correct service,


In this context, reliability has a narrower meaning than in the aforementioned definition.

When speaking of reliability, two kinds should be distinguished: inherent and operational.

**Inherent reliability** is the reliability "built into" the object during the design stage by using an appropriate concept, materials and dimensions, and also by suitable conditions of manufacture or assembly. There is much evidence that it is the design that is most important for achieving high reliability. Some means will be shown later.

**Operational reliability** is the reliability achieved in operation. It depends on the way of operation and maintenance. Reckless operation and poor maintenance can significantly shorten the life of every object. However, not even very good maintenance can compensate for the faults and weaknesses caused by improper design.

## **Author details**

Jaroslav Menčík

Address all correspondence to: jaroslav.mencik@upce.cz

Department of Mechanics, Materials and Machine Parts, Jan Perner Transport Faculty, University of Pardubice, Czech Republic

#### Concise Reliability for Engineers

**Chapter 2**

## **Probability Basics**

## Jaroslav Menčík

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/62355

#### **Abstract**

The main concepts of probability theory are explained, such as probability, random quantity, population, sample, mean, average, standard deviation, coefficient of variation, probability density, distribution function, quantile, critical value, confidence interval and testing of hypotheses. Important probability distributions are also shown.

**Keywords:** Probability, sample, mean, standard deviation, distribution function, quantile, confidence interval, testing of hypotheses

The occurrence of failures is usually accompanied by some uncertainty. This is due to many factors that we cannot control and therefore call **random**. Similarly, we speak about **random events**, which can happen or not, depending on random influences. For their prediction, we use the concept of **probability** and the related methods. However, before these methods are explained, several words are addressed here to those who have little or no knowledge of this topic. There are also methods that can improve reliability without probability tools, e.g. Failure Mode and Effect Analysis, which will be explained later. Nevertheless, such methods are suitable only in some cases, whereas the formulas based on probability can facilitate the solution of many reliability problems. Because computers can do all the necessary work, the only thing a user of probabilistic methods needs is some understanding of the basic terms and concepts. The following pages will try to help him or her.

**Probability** is a quantitative measure of the possibility that a random event occurs. The simplest definition of probability *P* is based on the occurrence of an event in a large number of repetitions of a trial:

$$P = n / N, \tag{1}$$


where *N* is the total number of trials and *n* is the number of trials with a certain outcome (e.g. the number of coin tosses landing heads up, the number of days with a maximum temperature higher than 20°C, or the number of defective components). Probability is a dimensionless quantity that can attain values between 0 and 1; 0 denotes an impossible event and 1 denotes a certain event.

A **random variable** (or random quantity) is a variable that can attain various values with certain probabilities. Random quantities can be **discrete** or **continuous**. Examples of **discrete random quantities** are the number of failures during a certain time, the number of vehicle collisions, the number of customers in a queue or the number of their complaints, and the number of faulty items in a batch. **Continuous random quantities** can attain any value (in some interval), such as the strength of a material, wind velocity, temperature, diameter, length, weight, time to failure (expressed in hours, kilometers, loading cycles, or worked pieces), duration of a repair, and also probability of failure. Examples are depicted in Figure 1.
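The frequency-based definition in Equation (1) can be illustrated by a small simulation. The following is only a sketch: the function name `relative_frequency`, the assumed true probability of 0.5 (a fair coin), and the fixed random seed are illustrative choices, not part of the original text.

```python
import random


def relative_frequency(trials: int, p_true: float, seed: int = 0) -> float:
    """Estimate P = n / N from `trials` simulated independent trials.

    `p_true` is the (assumed) true probability of the observed outcome.
    """
    rng = random.Random(seed)  # fixed seed only to make the run repeatable
    n = sum(1 for _ in range(trials) if rng.random() < p_true)
    return n / trials


# The estimate n / N settles near the true probability as N grows.
for N in (10, 1000, 100000):
    print(N, relative_frequency(N, p_true=0.5))
```

With few trials the ratio can be far from 0.5; only a large number of repetitions gives a stable estimate, which is why the definition speaks of many repetitions of a trial.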

Random quantities can be described by **probability distribution** or by single numbers, called **parameters**, if they are related to the population (i.e. the set of all possible members or values of the investigated quantity), or **characteristics**, if they are calculated from a sample of a limited size. Parameters are usually denoted by Greek letters and characteristics are denoted by Latin letters.

**Figure 1.** Examples of random quantities.

#### **1. Description by parameters**

The main parameters (or characteristics) of random quantities are given below, with the formulas for calculation from samples of limited size.

**Mean** *μ* (or **average value** *x̄* of a sample) characterizes the position of the quantity on the numerical axis; it corresponds to its centroid,

$$
\overline{x} = \frac{\sum x_j}{n}, \tag{2}
$$

where *x*<sub>j</sub> is the *j*th value and *n* is the size of the sample. The summation is done over all *n* values.

**Variance** *σ*<sup>2</sup> (or *s*<sup>2</sup>), sometimes called scatter, characterizes the dispersion of the quantity, and is calculated as

$$s^2 = \frac{\sum \left(x_j - \overline{x}\right)^2}{n-1}. \tag{3}$$

**Standard deviation** *σ* (or *s*) is defined as the square root of the variance,

$$s = \sqrt{\frac{\sum \left(x_j - \overline{x}\right)^2}{n - 1}}. \tag{4}$$

It has the same dimension as the investigated variable *x*, and therefore it is used more often than the variance.

**Coefficient of variation** *ω* (or *v*) characterizes the relative dispersion, compared to the mean value,

$$v = \frac{s}{\overline{x}}. \tag{5}$$

It can thus be used for the comparison of random variability of various quantities.

A disadvantage of the average value is its sensitivity to the extreme values; the addition of a very high or very low value can cause its significant change. A less sensitive characteristic of the "mean" of a series of values is **median** *m*. This is the value in the middle of the series of data ordered from minimum to maximum (e.g. *m* = 4 for the series 2, 6, 1, 8, 10, 4, 3).
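The characteristics defined in Equations (2) to (5) and the median can be computed with the Python standard library. This is a minimal sketch using the series from the median example above; the added extreme value of 1000 is a hypothetical illustration of the sensitivity of the average.

```python
import statistics

data = [2, 6, 1, 8, 10, 4, 3]

mean = statistics.mean(data)          # average value, Eq. (2)
variance = statistics.variance(data)  # sample variance with n - 1, Eq. (3)
std_dev = statistics.stdev(data)      # standard deviation, Eq. (4)
coeff_var = std_dev / mean            # coefficient of variation, Eq. (5)
median = statistics.median(data)      # middle value of the ordered series

print(mean, variance, std_dev, coeff_var, median)

# One extreme value shifts the average strongly but the median only slightly.
with_outlier = data + [1000]
print(statistics.mean(with_outlier), statistics.median(with_outlier))
```

Note that `statistics.variance` and `statistics.stdev` divide by *n* – 1, as in Equations (3) and (4); the functions `pvariance` and `pstdev` divide by *n* and apply when the data represent the whole population.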

#### **2. Description by probability distribution**

More comprehensive information is obtained from the probability distribution, which shows how a random variable is distributed along the numerical axis. For discrete quantities, the **probability function** *p*(*x*) is used (Fig. 2), which expresses the probabilities that the random variable *x* attains the individual values *x*\*,

$$p(x^*) = P(x = x^*). \tag{6}$$

**Figure 2.** Binomial distribution. (An example; parameters: *p* = 0.23, *n* = 10.)

**Probability density** *f*(*x*) is used for continuous quantities and shows where this quantity appears more or less often (Fig. 3a). Mathematically, the product *f*(*x*\*) d*x* expresses the probability that the variable *x* will lie within an infinitesimally narrow interval between *x*\* and *x*\* + d*x*.

**Distribution function** *F*(*x*) is used for discrete as well as continuous quantities (Fig. 3b) and expresses the probability that the random variable *x* attains values smaller than or equal to *x*\*:

$$F(x^*) = P(x \le x^*). \tag{7}$$

These functions are mutually related as

$$f(x) = \mathrm{d}F / \mathrm{d}x, \quad F(x) = \int_{-\infty}^{x} f(x)\, \mathrm{d}x, \quad \text{or} \quad F(x) = \sum_{x_i \le x} p(x_i). \tag{8}$$

Figure 3 shows two possibilities for depicting these functions: by histograms or by analytical functions. Histograms are obtained by dividing the range of all possible values into several intervals, counting the number of values in each interval and plotting rectangles of heights proportional to these numbers. To make the results more general, the frequencies of occurrence in individual intervals are usually divided by the total number of all events or values. This gives relative frequencies (a) or relative cumulative frequencies (b), which approximately correspond to probabilities.

**Figure 3.** (a) Probability density *f*(*x*) and (b) distribution function *F*(*x*) of a continuous quantity. The histograms show relative frequency (*n*<sub>j</sub>) and relative cumulative frequency (*n*<sub>c,j</sub>).

Fitting such histogram by a continuous analytical function gives the probability density or distribution function (solid curves in Fig. 3).

The probability of some event (e.g. snow height *x* lower than *x*A) can be determined as the corresponding area below the curve *f*(*x*) or, directly, as the value *F*(*x*A) of the distribution function.
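The equivalence of the two approaches (area below *f*(*x*) versus the value of *F*) can be checked numerically. The sketch below assumes, purely for illustration, that the snow height is normally distributed with a mean of 50 cm and a standard deviation of 10 cm; these numbers and the helper `area_under_density` are hypothetical.

```python
from statistics import NormalDist

# Hypothetical snow height in cm; the parameters are assumed for illustration.
snow = NormalDist(mu=50.0, sigma=10.0)


def area_under_density(dist: NormalDist, lower: float, upper: float,
                       steps: int = 10_000) -> float:
    """Integrate the probability density by the trapezoidal rule."""
    h = (upper - lower) / steps
    total = 0.5 * (dist.pdf(lower) + dist.pdf(upper))
    for i in range(1, steps):
        total += dist.pdf(lower + i * h)
    return total * h


x_a = 60.0
# The lower limit of minus infinity is replaced by mean - 8 sigma,
# below which the density is negligible.
area = area_under_density(snow, snow.mean - 8 * snow.stdev, x_a)
direct = snow.cdf(x_a)  # value of the distribution function F(x_A)

print(area, direct)
```

Both numbers agree to many decimal places, which illustrates the integral relation between *f*(*x*) and *F*(*x*).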

Also very important are the following two quantities.

**Quantile** *x*<sub>α</sub> is such a value of the random quantity *x* that the probability of *x* being smaller than (or equal to) *x*<sub>α</sub> is only *α*,

$$P(x \le x_\alpha) = \alpha. \tag{9}$$

Quantiles are inverse to the values of the distribution function (Fig. 3b),

$$x_\alpha = F^{-1}(\alpha), \tag{10}$$

and are used for the determination of the "guaranteed" or "safe" minimum value of some quantity, such as the minimum expectable strength or time to failure.

**Critical value** *x*<sup>β</sup> (Fig. 3b) is such a value of the random quantity *x* that the probability of its being exceeded is only *β*,

$$P(x > x^{\beta}) = \beta. \tag{11}$$

The critical values are used for the determination of the expectable maximum value of some quantity, such as wind velocity or maximum height of snow in some area. They are also used for hypothesis testing, for example whether two samples come from the same population. Probability *β* is complementary to *α*, *β* = 1 – *α*, so that

$$x_{\alpha} = x^{1-\alpha}, \qquad x^{\beta} = x_{1-\beta}. \tag{12}$$
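Quantiles and critical values can be obtained from the inverse distribution function. The following sketch uses `statistics.NormalDist.inv_cdf` from the Python standard library; the normally distributed strength with a mean of 400 MPa and a standard deviation of 20 MPa is an assumed example, and the helper names `quantile` and `critical_value` are illustrative.

```python
from statistics import NormalDist

# Hypothetical material strength in MPa (assumed parameters).
strength = NormalDist(mu=400.0, sigma=20.0)


def quantile(dist: NormalDist, alpha: float) -> float:
    """Value x_alpha with P(x <= x_alpha) = alpha (inverse of F)."""
    return dist.inv_cdf(alpha)


def critical_value(dist: NormalDist, beta: float) -> float:
    """Value x^beta with P(x > x^beta) = beta, i.e. the (1 - beta)-quantile."""
    return dist.inv_cdf(1.0 - beta)


x_05 = quantile(strength, 0.05)          # "guaranteed" minimum strength
x_crit = critical_value(strength, 0.05)  # value exceeded in only 5% of cases

print(x_05, x_crit)
```

The 5% quantile lies below the mean and the 5% critical value lies symmetrically above it, because the normal distribution is symmetrical; for asymmetrical distributions this symmetry does not hold.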

More about the basic probability definitions and rules can be found, for example, in [1 – 5].

#### **3. Probability distributions common in reliability**

Several probability distributions are especially important for reliability evaluation. For discrete quantities, these are the binomial and Poisson distributions. The main distributions for continuous quantities used in reliability are normal, lognormal, Weibull, and exponential. For some purposes, uniform distribution, Student's *t*-distribution, and chi-square (*χ*<sup>2</sup>) distribution are also used. Brief descriptions follow; more details can be found in the specialized literature [1 – 5].

**Binomial distribution** (Fig. 2) gives the probability of occurrence of *x* positive outcomes in *n* trials if this probability in each trial equals *p*. An example is the number of faulty items in a sample of size *n* if their proportion in the population is *p*. The probability function is

$$p(x) = \binom{n}{x} p^{x} (1 - p)^{n - x}, \tag{13}$$

and the mean value is *μ* = *np*. For a given number of trials *n*, this discrete distribution has only one parameter, *p*, which can be determined from the total number *m* of positive outcomes in *n* trials as *p* = *m*/*n*.
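Equation (13) can be evaluated directly. The sketch below uses the parameters of Figure 2 (*p* = 0.23, *n* = 10); the function name `binomial_pmf` is an illustrative choice.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Probability of exactly x positive outcomes in n trials, Eq. (13)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 10, 0.23
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

total = sum(pmf)                                # all probabilities sum to 1
mean = sum(x * px for x, px in enumerate(pmf))  # equals n * p

for x, px in enumerate(pmf):
    print(x, round(px, 4))
print(total, mean)
```

The computed mean reproduces the relation *μ* = *np* stated above.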

**Poisson distribution** is similar to binomial distribution but is better suited for rare events with low probabilities *p*. The probability function giving the probability of occurrence of *x* positive outcomes in *n* trials is

$$p(x) = \frac{\lambda^{x} e^{-\lambda}}{x!}, \tag{14}$$

where *λ* is the distribution parameter that corresponds to the average occurrence of *x* (and, in fact, to the product *np* of binomial distribution).
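The closeness of the Poisson and binomial distributions for rare events can be checked numerically. In this sketch, the values *n* = 1000 and *p* = 0.002 are assumed only for illustration, and the function names are hypothetical.

```python
from math import comb, exp, factorial


def poisson_pmf(x: int, lam: float) -> float:
    """Poisson probability function, Eq. (14)."""
    return lam**x * exp(-lam) / factorial(x)


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Binomial probability function, Eq. (13), for comparison."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Rare event: many trials, low probability in each (assumed values).
n, p = 1000, 0.002
lam = n * p  # the Poisson parameter corresponds to the product n * p

max_diff = max(abs(poisson_pmf(x, lam) - binomial_pmf(x, n, p))
               for x in range(10))
for x in range(6):
    print(x, round(binomial_pmf(x, n, p), 5), round(poisson_pmf(x, lam), 5))
print(max_diff)
```

For such a small *p*, the two probability functions differ only slightly, while the Poisson formula needs just one parameter.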

**Normal distribution**, also called the Gaussian distribution, has a symmetrical bell-shaped probability density (Fig. 4). It is used very often for continuous variables, especially if the variations are caused by many random factors and the scatter is not too big (cf. the central limit theorem). The probability density is

$$f(\mathbf{x}) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left[ -\frac{1}{2} \left( \frac{\mathbf{x} - \boldsymbol{\mu}}{\sigma} \right)^2 \right] \tag{15}$$

with the mean *μ* and standard deviation *σ* as parameters. There is no closed-form expression for the distribution function *F*(*x*); it must be calculated as the integral of the probability density, cf. Equation (8). In practice, various approximate formulas are also used.

**Figure 4.** Normal distribution (probability density).

**Standard normal distribution** corresponds to normal distribution with parameters *μ* = 0 and *σ* = 1 (Fig. 4). The expression for probability density is usually written as

$$f(u) = \frac{1}{\sqrt{2\pi}} \exp\left(-u^2/2\right),\tag{16}$$

where *u* is the standardized variable, related to the variable *x* of the normal distribution as

$$
u = (x - \mu) / \sigma. \tag{17}
$$

It expresses the distance of *x* from the mean as a multiple of the standard deviation. It is useful to remember that 68.27% of all values of normal distribution lie within the interval (*μ* ± *σ*), 95.45% within (*μ* ± 2*σ*), and 99.73% within (*μ* ± 3*σ*).
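The percentages quoted above can be reproduced with the standard normal distribution function. A minimal sketch follows; the helper names `standardize` and `share_within` and the numerical values passed to `standardize` are illustrative assumptions.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal distribution: mu = 0, sigma = 1


def standardize(x: float, mu: float, sigma: float) -> float:
    """Standardized variable u = (x - mu) / sigma, Eq. (17)."""
    return (x - mu) / sigma


def share_within(k: float) -> float:
    """Proportion of values within mu +/- k * sigma."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(100 * share_within(k), 2), "%")

# A value two standard deviations above the mean gives u = 2.
print(standardize(x=120.0, mu=100.0, sigma=10.0))
```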

**Log-normal distribution** is asymmetrical (elongated towards the right, similar to Weibull distribution with *b* = 2 in Fig. 5) and appears if the logarithm of the random variable has normal distribution.

**Weibull distribution** (Fig. 5) has the distribution function

$$F(t) = 1 - \exp\left\{ -\left[ (t - t_0)/a \right]^{b} \right\}, \tag{18}$$

with three parameters: scale parameter *a*, shape parameter *b*, and threshold parameter *t*<sub>0</sub>, which corresponds to the minimum possible value of *t*. The probability density *f*(*t*) can be obtained easily as the derivative of the distribution function. Weibull distribution is very flexible, thanks to the shape parameter *b* (Fig. 5). It is often used for the approximation of strength or time to failure. It belongs to the family of **extreme value distributions** [5, 6] and appears if the failure of the object starts in its weakest part. The determination of parameters of this very important distribution will be explained in Chapter 11.

**Exponential distribution** is a special case of Weibull distribution for shape parameter *b* = 1 (cf. Fig. 5), with the distribution function

$$F(t) = 1 - \exp\left(-t/T_0\right), \tag{19}$$

which may be used, for example, for the times between failures caused by many different reasons and also in complex systems consisting of many parts. This distribution has only one parameter, *T*<sub>0</sub>, which corresponds to the mean *μ* and has the same value as the standard deviation *σ*.
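The Weibull and exponential distribution functions are simple to evaluate. The sketch below assumes the usual forms *F*(*t*) = 1 – exp{–[(*t* – *t*<sub>0</sub>)/*a*]<sup>*b*</sup>} and *F*(*t*) = 1 – exp(–*t*/*T*<sub>0</sub>); the function names and the numerical values (times in hours) are hypothetical.

```python
import math


def weibull_cdf(t: float, a: float, b: float, t0: float = 0.0) -> float:
    """Three-parameter Weibull distribution function.

    a = scale parameter, b = shape parameter, t0 = threshold parameter.
    """
    if t <= t0:
        return 0.0  # no failures before the threshold
    return 1.0 - math.exp(-(((t - t0) / a) ** b))


def exponential_cdf(t: float, T0: float) -> float:
    """Exponential distribution function with mean T0."""
    return 1.0 - math.exp(-t / T0)


# With b = 1 and t0 = 0 the Weibull function reduces to the exponential one.
print(weibull_cdf(500.0, a=1000.0, b=1.0), exponential_cdf(500.0, T0=1000.0))

# The shape parameter b changes the course of F(t) for the same scale a.
for b in (0.5, 1.0, 2.0, 3.5):
    print(b, round(weibull_cdf(500.0, a=1000.0, b=b), 4))
```

For *t* – *t*<sub>0</sub> = *a*, the function gives *F* = 1 – e<sup>–1</sup> ≈ 0.632 for any shape parameter, so the scale parameter can be read as the time by which about 63% of objects have failed.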

The following three distributions are important especially for the determination of confidence intervals, for statistical tests, and for the Monte Carlo simulations, as it will be shown later.

**Figure 5.** Weibull distribution for various values of shape parameter *b*.

**Uniform distribution** has constant probability density, *f* = *const*, in the interval ⟨*a*; *b*⟩, so that it looks like a rectangle. The mean value is the average of both boundaries, *μ* = (*a* + *b*)/2, and the variance equals *σ*<sup>2</sup> = (*b* – *a*)<sup>2</sup>/12.

***χ*<sup>2</sup> distribution** is a distribution of the sum of *n* quantities, each defined as the square of a standard normal variable. An important parameter is the number of degrees of freedom. For more, see [1–5].

***t*-distribution** (or Student's distribution) arises from a combination of *χ*<sup>2</sup> and standard normal distribution. It looks similar to normal distribution but also depends on the number of degrees of freedom; see [1–5].

The values of distribution functions and quantiles of the above distributions can be found via special tables or using statistical or universal computer programs, such as Excel.

Finally, two important probabilistic concepts should be explained.

**Confidence interval**. A consequence of the random variability of many quantities is that every measurement or calculation gives a different result, depending on the specimen or input value used. Thus, the average *x̄* = Σ*x*<sub>j</sub>/*n* is usually determined from several (*n*) values to obtain more definite information. This, however, does not say how far the actual mean *μ* can be from it. For this reason, a confidence interval is often determined, which contains the actual value with high probability. For example, the confidence interval for the mean is

$$
\overline{x} - t_{\alpha, n-1} \frac{s}{\sqrt{n}} < \mu < \overline{x} + t_{\alpha, n-1} \frac{s}{\sqrt{n}}, \tag{20}
$$

where *x̄* and *s* are the average and standard deviation of the sample of *n* values, and *t*<sub>α, n–1</sub> is the *α*-critical value of the two-sided *t*-distribution for *n* – 1 degrees of freedom. The probability that the true mean *μ* lies inside the interval (20) is 1 – *α*. Confidence intervals can also be determined for other quantities.

A note. One-sided critical values also exist. Such a value (*α´*) corresponds to the probability that the *t*-value will be either higher or lower than the pertinent critical value; *α´* is related to *α* as *α´* = *α*/2. When using statistical tables or computer tools, one must be aware of how the pertinent quantity is defined.

**Testing of hypotheses.** Often, one must decide which of two products or technologies is better. The decision can be based on the value of a characteristic parameter (e.g. the mean). However, the values obtained for the individual candidates usually differ. If the differences are not big, one must consider that a part of the variability of individual values is due to random causes. Statistical tests can reveal whether the differences between the characteristic values of the two compared samples are only random or whether they reflect a real difference between the two types of products. The value of the pertinent test criterion is calculated from the basic statistical characteristics of each sample and compared with the critical value (of the probability distribution) of this criterion. If the calculated value is larger than the critical value, which would be exceeded by chance only with a low probability, we conclude that the difference is not random. If it is smaller, we usually conclude that there is no substantial difference between the two populations. These tests are explained in the literature [1–5] and are available in various statistical or universal programs. Excel also offers several tests (e.g. for the difference between the mean values or variances of two populations).
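As an illustration of such a test, the sketch below compares the means of two samples using the two-sample *t*-test with pooled variance. This particular test is chosen here only as one common example; the data for products A and B are invented, and the critical value 2.1009 (two-sided, 5%, 18 degrees of freedom) is taken from standard *t*-tables because the Python standard library contains no *t*-distribution.

```python
import math
import statistics


def pooled_t_statistic(a: list[float], b: list[float]) -> float:
    """Test criterion of the two-sample t-test with pooled variance."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        pooled_var * (1 / na + 1 / nb))


# Hypothetical times to failure (hours) of two product types.
product_a = [102, 98, 105, 101, 99, 103, 100, 104, 97, 101]
product_b = [110, 108, 112, 107, 111, 109, 113, 106, 110, 109]

t_value = pooled_t_statistic(product_a, product_b)
t_critical = 2.1009  # from t-tables: two-sided, alpha = 0.05, 18 d.o.f.

if abs(t_value) > t_critical:
    print("The difference between the means is not random.")
else:
    print("No substantial difference between the means was found.")
```

The test assumes normally distributed data with similar variances in both populations; if these assumptions are doubtful, a different test should be selected from the literature.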

#### **Example 1**

The diameters of machined shafts, measured on 10 pieces, were *D* = 16.02, 15.99, 16.03, 16.00, 15.98, 16.04, 16.00, 16.01, 16.01, and 15.99 mm, respectively. Calculate: (a) the average value and the standard deviation. Assume that the diameters have normal distribution, and calculate (b) the 95% confidence interval for the mean value *μ*D and also (c) the interval, which will contain 95% of all diameters.

#### **Solution**


$$\begin{aligned} 16.007 - 2.2622\,\frac{0.01889}{\sqrt{10}} &< \mu_{\rm D} < 16.007 + 2.2622\,\frac{0.01889}{\sqrt{10}}\\ 15.993 &< \mu_{\rm D} < 16.020 \text{ mm} \end{aligned}$$

**c.** Under the assumption of normal distribution, the individual values can be expected within the interval *D̄* – *u*α/2×*s* < *D* < *D̄* + *u*α/2×*s*, where *D̄* is the sample average and *u*α/2 is the *α*/2 critical value of the standard normal distribution (corresponding to the probability *α*/2 that the diameter will be larger than the upper limit of the interval, and *α*/2 that it will be smaller than the lower limit). In our case, *u*0.025 ≈ 1.96, so that 16.007 – 1.96×0.01889 < *D* < 16.007 + 1.96×0.01889; that is, *D* ∈ (15.970; 16.044) mm. The reliability of the prediction could be increased if a tolerance interval were used instead of the confidence interval; cf. Chapter 18.
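The three parts of this example can be checked with a few lines of Python. This is only a sketch using the standard library: the *t*-value 2.2622 for 9 degrees of freedom is taken from the calculation above rather than computed, and the variable names are chosen for this illustration.

```python
import math
import statistics

diameters = [16.02, 15.99, 16.03, 16.00, 15.98, 16.04, 16.00, 16.01, 16.01, 15.99]  # mm

n = len(diameters)
mean_d = statistics.mean(diameters)   # (a) sample average
std_d = statistics.stdev(diameters)   # (a) sample standard deviation (n - 1 in the denominator)

# (b) 95% confidence interval for the mean; t-value for 9 degrees of freedom from the text
t_crit = 2.2622
half_width = t_crit * std_d / math.sqrt(n)
ci_mean = (mean_d - half_width, mean_d + half_width)

# (c) interval containing about 95% of the individual diameters (normal distribution assumed)
u_crit = statistics.NormalDist().inv_cdf(0.975)   # approximately 1.96
interval_95 = (mean_d - u_crit * std_d, mean_d + u_crit * std_d)

print(f"mean = {mean_d:.3f} mm, s = {std_d:.5f} mm")
print(f"95% CI for the mean: {ci_mean[0]:.3f} to {ci_mean[1]:.3f} mm")
print(f"95% of diameters:    {interval_95[0]:.3f} to {interval_95[1]:.3f} mm")
```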

## **Author details**

Jaroslav Menčík

Address all correspondence to: jaroslav.mencik@upce.cz

Department of Mechanics, Materials and Machine Parts, Jan Perner Transport Faculty, University of Pardubice, Czech Republic

## **References**


**Chapter 3**

## **Characteristics of Reliability**

## Jaroslav Menčík

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/62356

#### **Abstract**

The basic reliability characteristics are explained: time to failure, probability of failure and of failure-free operation, repairable and unrepairable objects, mean time to repair and mean time between failures, coefficient of availability and unavailability, and failure rate. Examples for better understanding are included.

**Keywords:** Time to failure, mean time to failure, mean time between failures, mean time to repair, availability, unavailability, failure rate

Reliability is usually characterized by the probability of failure or by the time to failure. If failure is considered as a single event (e.g. collapse of a bridge), regardless of the time, only its probability is of interest. If we want to know when the failures can occur, their time characteristics are also important. In this chapter, time-dependent failures will be dealt with. Here, one must distinguish between unrepaired and repaired objects depending on whether the failed object is discarded or repaired and again put into service.

## **1. Unrepaired objects**

The basic quantity for unrepaired objects is the **time to failure** *t*<sub>f</sub>. If a group of identical objects is put into operation, the individual pieces begin to fail after some time, and it is also possible to express the number of failed pieces as a function of time, *n*<sub>f</sub>(*t*). A more universal quantity is the relative proportion of the failed items, that is, the number of failed items related to the number *n* of monitored objects, *n*<sub>f</sub>(*t*)/*n*. This ratio approximately expresses the **probability of failure** *F*(*t*) during the time interval <0; *t*>;

© 2016 The Author(s). Licensee InTech. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

$$F(t) = \int\_0^t f(t)dt \; \approx n\_f(t) / n \tag{1}$$

Function *F*(*t*) is the **distribution function of the time to failure**, also called **failure function** (Fig. 1a). An aid for easier remembering: the letter *F* is also the first letter of the word failure.

**Figure 1.** (a) Failure function *F*(*t*) and (b) reliability function *R*(*t*).

The **probability of failure-free operation** *R***(***t***)** expresses the probability that no failure occurs before the time *t*;

$$R(t) = \int_{t}^{\infty} f(t)\,dt \approx \left[n - n_f(t)\right] / n \tag{2}$$

*R*(*t*) shows the gradual loss of serviceable objects (Fig. 1b) and is called **reliability function** (therefore the symbol *R*). *R* is complementary to *F*,

$$R(t) + F(t) = 1\,; \qquad R = 1 - F\,; \qquad F = 1 - R\,. \tag{3}$$

The **probability density of the time to failure,** *f***(***t***),** expresses the probability of failure during a very short time interval ∆*t* at time *t*, related to this interval:

$$f(t) = \frac{dF(t)}{dt} \approx \frac{n_f(t + \Delta t) - n_f(t)}{n\,\Delta t};\tag{4}$$

the unit is s<sup>–1</sup> or h<sup>–1</sup>. The right-hand side of Equation (4) indicates how the probability density can be determined from empirical data: *n*<sub>f</sub>(*t* + ∆*t*) expresses the number of parts that failed from 0 to *t* + ∆*t*, and *n*<sub>f</sub>(*t*) is the number of failures that occurred until the time *t*. In fact, the probability density *f*(*t*) shows the distribution of failures in time, similar to Fig. 3a in Chapter 2.

Useful information on reliability is obtained from a very simple characteristic, the average or **mean time to failure** or *MTTF*, which is generally defined as

$$MTTF = \overline{t} = t_{\text{mean}} = \int_0^{\infty} t f(t)\,dt = \int_0^{\infty} R(t)\,dt \,. \tag{5}$$

The mean time to failure can be calculated from operational records as the average of the group of measured times to failure,

$$MTTF = (1/n) \sum_j t_{f,j}\,.\tag{6}$$

Remark: Equation (6) is appropriate if all objects have failed. If components with a very long life are tested, the tests are usually terminated after some predefined time or after the failure of a certain fraction of all components. In such cases, modified formulas for *MTTF* must be used; see Chapter 20 or [1].
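The empirical side of Equations (1), (3), and (6) can be sketched in Python as follows. The times to failure are hypothetical, and the function names are chosen only for this illustration; as stated in the remark, the plain average is meaningful only if every monitored object has failed.

```python
def failure_fraction(times_to_failure, t):
    """Empirical F(t) = n_f(t) / n: fraction of objects that failed up to time t."""
    n_failed = sum(1 for tf in times_to_failure if tf <= t)
    return n_failed / len(times_to_failure)

def mean_time_to_failure(times_to_failure):
    """MTTF as the plain average, Equation (6); valid only if all objects have failed."""
    return sum(times_to_failure) / len(times_to_failure)

# Hypothetical times to failure in hours (illustration only)
times_h = [120, 340, 410, 560, 610, 700, 820, 900, 1050, 1300]

F_600 = failure_fraction(times_h, 600)   # probability of failure up to 600 h
R_600 = 1.0 - F_600                      # probability of failure-free operation, Equation (3)
mttf = mean_time_to_failure(times_h)

print(f"F(600 h) = {F_600:.2f}, R(600 h) = {R_600:.2f}, MTTF = {mttf:.0f} h")
```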

### **2. Repaired objects**

If a repairable item fails, it is repaired and again put into operation. After the next failure, it is again repaired and put into operation, etc. One can thus speak of a flow of operations and repairs (Fig. 2). If we denote each interval as "uptime" *t*up or "downtime" *t*down, we can calculate the **mean time between failures,** *MTBF*, and the **mean time to repair,** *MTTR***:**

$$MTBF = (1/n) \sum_j t_{\text{up},j}\,, \tag{7}$$

$$MTTR = (1/n) \sum_j t_{\text{down},j}\,. \tag{8}$$

If data for a high number of values *t*up and *t*down are available, the distribution of these times can also be obtained and used.

**Figure 2.** Flow of operations (uptimes, *t*up) and repairs (downtimes, *t*d).

The mean time between failures and mean time to repair can be used to characterize the probability that the object will be serviceable at a certain instant or not. The **coefficient of availability,** *COA*, is defined as [2, 3]:

$$\text{COA} = \sum t_{\text{up}} / t_{\text{tot}} = \sum t_{\text{up}} \Big/ \left(\sum t_{\text{up}} + \sum t_{\text{down}}\right), \tag{9}$$

where ∑*t*up is the sum of times of operation during the investigated interval (e.g. 1 month or year), ∑*t*down is the sum of down times in this interval, and *t*tot is the total investigated time. The coefficient of availability can also be calculated as

$$\text{COA} = \text{MTBF} / \left(\text{MTBF} + \text{MTTR}\right) ; \tag{10}$$

*MTBF* is the mean time (of operation) between failures and *MTTR* is the mean time to repair (generally, the mean down time caused by failures).

The coefficient of availability simply says what part of the total time is available for useful work. It also expresses the average probability that the object will be able to fulfill the expected task at any instant.

The complementary quantity, **coefficient of unavailability**,

$$\text{COU} = \sum t_{\text{down}} \Big/ \left(\sum t_{\text{up}} + \sum t_{\text{down}}\right) = \text{MTTR} / \left(\text{MTBF} + \text{MTTR}\right) = 1 - \text{COA}, \tag{11}$$

says what percentage of the total time is downtime. It also expresses the probability that the object will not be able to perform its function at a demanded instant. For example, *COA* = 0.9 means that, on average, the vehicle (or machine) is in operation only 90% of the time and is idle due to failures for the remaining 10%. In other words, there is a 90% probability that the object will be available when needed and a 10% probability that it will not be available. Even simple records from operation can give the basic values of probabilities and reliability.

It must be noted here that the time of a repair is not always the same as the downtime during which the object (e.g. a machine) does not work. In addition to the net time of the repair, some logistic times are often necessary, and these sometimes last much longer than the repair itself.
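As a sketch of how Equations (7) to (11) are applied to operational records, the following Python lines use the up- and downtimes listed in Example 1 of this chapter. The variable names are chosen for this illustration only. Equations (9) and (10) give the same value here because the record contains the same number of uptimes and downtimes.

```python
# Up- and downtimes in hours, taken from Example 1 of this chapter
t_up = [28, 16, 20, 10, 30]
t_down = [3, 2, 1, 3, 2]

mtbf = sum(t_up) / len(t_up)        # mean time between failures, Equation (7)
mttr = sum(t_down) / len(t_down)    # mean time to repair, Equation (8)

availability = sum(t_up) / (sum(t_up) + sum(t_down))   # Equation (9)
availability_from_means = mtbf / (mtbf + mttr)         # Equation (10)
unavailability = 1.0 - availability                    # coefficient of unavailability, Equation (11)

print(f"MTBF = {mtbf:.1f} h, MTTR = {mttr:.1f} h")
print(f"availability = {availability:.3f}, unavailability = {unavailability:.3f}")
```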

#### **3. Failure rate**

A very important reliability characteristic is **failure rate** *λ*(*t*). Basically, failure rate expresses the probability of failure during a time unit. Unlike probability, which is nondimensional, failure rate has a dimension. It is *t*<sup>–1</sup>, for example, h<sup>–1</sup> or % per hour for machines, components, or appliances, km<sup>–1</sup> for vehicles, etc. Two cases must be distinguished, depending on whether the object is repaired after failure or not.

#### **Unrepaired objects**

The failed item is discarded. This is typical of simple unrepairable objects, such as lamp bulbs, screws, windows, integrated circuits, and many inexpensive parts. Also, a living being cannot be repaired, if it has died. Some objects could be repaired after failure but are not, because of economic reasons. Thus, the term nonrepaired objects can be used as more universal.

Failure rate expresses the probability of failure during a time unit but is related only to those objects that have remained in operation until the time *t*, that is, those that have not failed before the time *t*. Failure rate is defined as

$$\lambda(t) = f(t) / R(t) \,; \tag{12}$$

*f*(*t*) is the probability density of failure (=d*F*/d*t*), and *R*(*t*) is the probability that the object was operated until the time *t*. An illustrative idea of failure rate can be gained from a simple formula for its calculation from the data from operation:

$$\lambda(t) = \left[n_f(t + \Delta t) - n_f(t)\right] / \left\{\left[n - n_f(t)\right] \Delta t\right\} ; \tag{13}$$

*n* is the total number of the monitored objects, *n*<sub>f</sub>(*t*) is the number of the objects failed until the time *t*, [*n*<sub>f</sub>(*t* + ∆*t*) – *n*<sub>f</sub>(*t*)] is the number of objects failed during the time from *t* to *t* + ∆*t*, and ∆*t* is a short time interval. [Remark: Formula (13) is only approximate and often exhibits big scatter. A more accurate value of the instantaneous failure rate *λ*(*t*) can be obtained from several *n*<sub>f</sub> values occurring in a wider interval around the time *t*.]
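The difference between the failure rate of Equation (13) and the probability density of Equation (4) lies only in the denominator: the failure rate relates the new failures to the objects still working at time *t*, the density to all monitored objects. A minimal Python sketch with a hypothetical operational record (the numbers and function names are illustrative only):

```python
def empirical_failure_rate(n_total, n_failed_t, n_failed_t_dt, dt):
    """Equation (13): failures in (t, t + dt) related to the objects still working at t."""
    survivors = n_total - n_failed_t
    if survivors <= 0:
        raise ValueError("no objects left in operation at time t")
    return (n_failed_t_dt - n_failed_t) / (survivors * dt)

def empirical_density(n_total, n_failed_t, n_failed_t_dt, dt):
    """Equation (4): failures in (t, t + dt) related to all monitored objects."""
    return (n_failed_t_dt - n_failed_t) / (n_total * dt)

# Hypothetical record: 1000 objects, 200 failed by t = 500 h, 240 failed by t = 600 h
lam = empirical_failure_rate(1000, 200, 240, 100.0)   # 40 / (800 * 100) per hour
f = empirical_density(1000, 200, 240, 100.0)          # 40 / (1000 * 100) per hour
R = (1000 - 200) / 1000                               # reliability at t, Equation (2)

print(f"failure rate = {lam:.1e} per hour, density = {f:.1e} per hour, f/R = {f / R:.1e}")
```

The ratio *f*/*R* reproduces the failure rate, in agreement with Equation (12).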

The fraction of failed objects, *F*(*t*), increases with time, and the fraction of objects that have not failed, *R*(*t*), decreases.

Equation (12) relates three variables to each other: *λ*, *f*, and *R*. Fortunately, it can be transformed into simple relationships between two quantities. First, it can be rewritten as follows:

$$\lambda(t) = f(t) / R(t) = \left[dF(t)/dt\right] / R(t) = -\left[dR(t)/dt\right] / R(t) \,. \tag{14}$$

The separation of the variables leads to the differential equation of first order,

$$
\lambda(t)\,dt = -\,dR(t) / R(t) \,. \tag{15}
$$

The integration and transformation lead to the following expression for the probability of operation as a function of time:

$$R(t) = \exp\left(-\int\_0^t \lambda(t)dt\right). \tag{16}$$
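For readers who want the intermediate step: integrating both sides of Equation (15) from 0 to *t*, with the initial condition *R*(0) = 1 (all objects work when put into operation; this condition is implied rather than stated above), gives the following. The integration variable is written as *τ* here only to distinguish it from the upper limit.

$$\int_0^t \lambda(\tau)\,d\tau = -\int_{R(0)}^{R(t)} \frac{dR}{R} = -\ln R(t) + \ln R(0) = -\ln R(t) \quad\Rightarrow\quad R(t) = \exp\left(-\int_0^t \lambda(\tau)\,d\tau\right).$$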

The probability of failure is

$$F(t) \ = \ 1 \ -R(t) \ . \tag{17}$$

With respect to Equations (12) to (17), any of the four quantities *f*, *F*, *R*, and *λ* is sufficient for the determination of any of the remaining three quantities.

The mean time to failure can be calculated using Equation (5).

#### **Repaired objects**

After a failure, the object is repaired and continues working. In complex systems, the failed part can also be replaced by a good one to reduce the downtime. The number of working objects remains constant, so that *R*(*t*) = 1. The failure rate from Equation (12) thus corresponds to the failure probability density, *λ*(*t*) = *f*(*t*). In this case, the term **hazard rate** is used as more appropriate, but the expression failure rate is also very common.

#### **Example 1**

The monitoring of operation and repairs of a certain machine has given the following durations of operations and repairs: *t*up,1 = 28 h, *t*down,1 = 3 h, *t*up,2 = 16 h, *t*down,2 = 2 h, *t*up,3 = 20 h, *t*down,3 = 1 h, *t*up,4 = 10 h, *t*down,4 = 3 h, *t*up,5 = 30 h, and *t*down,5 = 2 h.

Tasks.


#### **Solution**


#### **Example 2**

In a town, *N* = 30 buses are necessary for assuring reliable traffic on 15 routes. However, due to failures and maintenance, several buses are unavailable every day. As follows from long-term records, the mean availability of the buses is *COA* = 0.85. How many reserve buses (*N*r) are necessary? What is the total necessary number of buses *N*tot?

#### **Solution**

The coefficient of availability can be calculated as the number of operable buses, *N*up, divided by the total number of vehicles, *COA* = *N*up/*N*tot, from which *N*tot = *N*up/*COA*. With the above numbers, *N*tot = 30/0.85 = 35.29. To reliably ensure the public traffic, 36 buses are thus necessary. The number of reserve vehicles is 36 – 30 = 6 buses.
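The same calculation can be written as a short Python sketch. The function name `total_units_needed` is hypothetical. Note that the result is based on the mean availability only, so it covers the average day, not every possible day.

```python
import math

def total_units_needed(n_required, availability):
    """Smallest whole number of units such that, on average, n_required are available."""
    if not 0.0 < availability <= 1.0:
        raise ValueError("availability must lie in (0, 1]")
    # The small tolerance prevents rounding up when the quotient is a whole
    # number that is slightly overestimated by floating-point arithmetic
    return math.ceil(n_required / availability - 1e-9)

n_total = total_units_needed(30, 0.85)   # 30 / 0.85 = 35.29..., rounded up
n_reserve = n_total - 30

print(f"total buses: {n_total}, reserve buses: {n_reserve}")
```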

## **References**


## **Chapter 4**

## **Bathtub Curve**

## Jaroslav Menčík

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/62357

#### **Abstract**

Typical time course of failure rate of unrepaired objects, called bathtub curve, is shown and its main stages are explained: period of early failures, useful life, and period of aging and deterioration. Attention is paid to the useful-life period, where the failure rate is constant and the distribution of times to failure (or between failures) is exponential. Illustrative examples are included.

**Keywords:** Failure rate, bathtub curve, early failures, steady-state operation, period of aging, exponential distribution

Failure rate, as defined in Chapter 3, can change with time. Figure 1 shows the time course of λ(*t*) typical of nonrepairable objects, such as electrical bulbs, pumps, switches, or springs, and also living beings, including humans. Such a course can be obtained if the operation of a high number of objects of the same kind is monitored. Due to its shape, resembling a longitudinal section of a bathtub, the curve has got the nickname **bathtub curve**. It can be divided into three stages with characteristic time courses related to different reasons of failures.

Stage I. Failure rate λ(*t*) is high at the beginning and decreases with time. The failures occur due to errors in design, weak components or inferior materials, due to faults appearing during manufacture or building, or due to mistakes made by inexperienced personnel or users. A weak newborn baby more easily succumbs to an infectious disease. Software errors also belong to this category. The failed components are discarded and not used any more, the customer gradually becomes familiar with the use of the product, and the errors in software are corrected. This period is called the stage of **early failures** or **infant mortality**.

Stage II. Failure rate λ is low and approximately constant. In contrast to early failures, caused by the inherent weakness of the object, the failures during stage II occur mostly due to external reasons, such as overloading, collision with another object, weather or natural catastrophes, hidden defects, and mistakes of the personnel. (In the case of people, the reasons for the "failures" during this stage are traffic accidents, diseases, wars, and murders.) Depending on the object and conditions, failure rates for various objects can be very different. Stage II represents the major part of the life and is called the **useful life** or the period of **steady-state operation**.

**Figure 1.** Bathtub curve (a schematic). λ(*t*) – failure rate; *t* – time. I – stage of early failures; II – steady state, useful life; III – wear-out period.

Stage III. Failure rate λ(*t*) increases with time. The failures in this stage are caused by wear, fatigue, corrosion, or gradual deterioration of the material, for example due to UV radiation (plastics) or ozone (rubber). This period is called the **wear-out period** or **aging**.

Figure 1 shows the general shape of the time course of failure rate. In reality, various patterns of λ(*t*) can occur. Today, many advanced products have a constant failure rate from the moment they are put into operation, without the period of early failures. This can be achieved by using high-quality materials and reliable components admitted only after entrance tests, and by excluding potentially risky solutions as early as in the design stage, thanks to computer modeling and the simulation of various design solutions and conditions of operation. Thorough controls and checks during manufacture or building are also an efficient tool for avoiding early failures or significantly reducing their number. Examples are cars, TV sets, washing machines, and other consumer goods. In the past, the so-called burn-in period was used for some products before putting them into operation. During this period, the objects were switched on for some time, often under somewhat higher voltage or load, so that the weaker components failed before the object was sold to the customer and put into service. Today, thanks to special tests and the use of high-quality components, the burn-in period is not necessary. Evidence of the generally better situation today is the significantly longer warranty time provided by the manufacturers of many products.

Also, stage III, the wear-out period, can be avoided for more complex objects if their technical condition is monitored and the critical parts approaching stage III are replaced in time by new ones. This case belongs to repairable objects. The "bathtub curve" here consists only of periods I and II (early failures and useful life) or even only period II (steady-state operation).

Remark: The failures from external reasons can happen at any time; the instantaneous resultant failure rate equals the sum of failure rates from all reasons.

#### **Special case: λ = const.**

This is a very important case, as a constant failure rate can often be assumed (approximately) for the prevailing period of useful life (stage II in Fig. 1). With λ = *const*, the probability of failure during the interval <0; *t*> follows from Equations (16) and (17) in Chapter 3:

$$F(t) = 1 - R(t) = 1 - \exp\left(-\int\_0^t \lambda dt\right) = 1 - \exp(-\lambda t) \tag{1}$$

The reliability (i.e. the fraction of serviceable objects) decreases with time as

$$R(t) \ = \exp(-\lambda t),\tag{2}$$

The distribution of times to failure is exponential with the probability density

$$f(t) = dF(t) / dt = \lambda \exp(-\lambda t) \tag{3}$$

and the mean value

$$MTTF = \overline{t} = t_{\text{mean}} = \int_0^{\infty} t f(t)\,dt = 1/\lambda. \tag{4}$$

Vice versa, the failure rate of some kind of components can be obtained from the mean time to failure,

$$
\lambda = 1 / t_{\text{mean}}.\tag{5}
$$

The time course of reliability may thus also be expressed as

$$R(t) \;= \exp(-t/t\_{\text{mean}}) \; ; \tag{6}$$

note that the argument in the exponential function is nondimensional.

The mean time to failure (and also the mean time between failures) can be calculated by Equation (4). With λ = *const*,

$$MTTF = t_{\text{mean}} = \int_0^{\infty} \exp(-\lambda t)\,dt \tag{7}$$

The empirical determination of the mean time to failure is based on the testing or monitoring of a group of components of the same kind and measuring their times to failure,

$$MTTF = t_{\text{mean}} = (1/n) \sum_j t_j\,; \tag{8}$$

the summation is done for all *n* tested objects. The mean failure rate is obtained easily as

$$
\lambda = 1 \; / \, \text{MTTF} \; . \tag{9}
$$

In design, the knowledge of failure rate λ of a component, found from the manufacturer's catalog or by measurement, enables the determination of the mean time to failure, which is important for the determination of the overall reliability of more complex systems (cf. Chapter 5).

Exponential distribution is typical of systems consisting of many elements, where failures happen for various reasons, as is usual in electric or electronic appliances. However, one should not forget that the period with constant failure rate often becomes dominant only after some time *t*<sub>0</sub> from putting the system into operation. In such cases, the time *t* in Equation (6) must be replaced by *t* – *t*<sub>0</sub>.

Note: One must always keep in mind that the mean time between failures, calculated as the reciprocal value of the failure rate, has nothing in common with the mean time to failure caused by aging or fatigue. The failure rate given in catalogs is determined from the period of steady-state operation. For example, a high-quality component has a failure rate λ = 10<sup>–6</sup> h<sup>–1</sup>. However, this does not mean that these components will work until *t*<sub>f</sub> = 1/λ = 10<sup>6</sup> h. They fail after a much shorter time, for example after 10,000 h, when they enter stage III (wear out).

#### **Example 1**

A device should work 2 h without failure, and such operation should be 99% guaranteed. (There may be only 1% probability of failure during this time.) Assume that you can choose from various devices available in the market. What are the demanded failure rate and the mean time to failure of a suitable device? Assume exponential distribution of the time to failure.

Solution. The probability of failure-free operation is *R*(*t*) = exp(–λ*t*). Taking logarithms gives ln *R* = – λ*t*, from which the demanded failure rate is λ = – (1/*t*) ln *R*.

For the demanded *t* = 2 h and *R* = 0.99, the necessary failure rate is λ = – (1/2) ln 0.99 = 0.005025 ≈ 0.005 h<sup>–1</sup>. The demanded mean time to failure is *MTTF* = 1/λ ≈ 1/0.005 = 200 h or more.
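The calculation can be reproduced with a short Python sketch (the function name is chosen for this illustration). Without rounding λ to 0.005, the limiting mean time to failure comes out at about 199 h, so the requirement of 200 h or more is slightly on the safe side.

```python
import math

def required_failure_rate(mission_time, reliability):
    """Largest constant failure rate for which R(mission_time) >= reliability."""
    return -math.log(reliability) / mission_time

lam_max = required_failure_rate(2.0, 0.99)   # per hour
mttf_min = 1.0 / lam_max                     # hours

print(f"maximum failure rate = {lam_max:.6f} per hour, minimum MTTF = {mttf_min:.1f} h")
```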

#### **Example 2**

A ventilator (air fan) has exponential distribution of times to failure with the mean time *MTTF* = 10,000 h. Calculate the probability that the ventilator does not fail during the first 800 h after being put into operation. What is the probability of failure during this time?

Solution.

Probability of not failing: *R*(*t*) = exp(– *t*/*t*mean) = exp(–800/10,000) = 0.923 (=92.3%).

Probability of failure: *F*(*t*) = 1 – *R*(*t*) = 1 – 0.923 = 0.077 (=7.7%).
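A minimal Python sketch of this calculation follows; the function name is hypothetical. The last line evaluates the same formula at *t* = *MTTF*, which shows that, under the exponential distribution, only exp(–1) of the objects survive to the mean time to failure.

```python
import math

def reliability_exponential(t, mttf):
    """R(t) = exp(-t / t_mean) for a constant failure rate, Equation (6)."""
    return math.exp(-t / mttf)

R_800 = reliability_exponential(800.0, 10_000.0)   # probability of not failing in 800 h
F_800 = 1.0 - R_800                                # probability of failure in 800 h

R_at_mttf = reliability_exponential(10_000.0, 10_000.0)   # equals exp(-1)

print(f"R(800 h) = {R_800:.3f}, F(800 h) = {F_800:.3f}, R(MTTF) = {R_at_mttf:.3f}")
```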

**Chapter 5**

## **Reliability of Systems**

## Jaroslav Menčík

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/62358

#### **Abstract**

Many objects consist of several components. The mutual arrangement of the individual elements influences the resultant reliability. The formulae are shown for the resultant reliability of series arrangement, as well as for parallel and combined arrangement. The possibility of increasing reliability by means of redundancy is explained, and also the principle of optimal allocation of reliabilities to individual elements. Everything is illustrated with examples.

**Keywords:** Reliability, systems, series system, parallel system, probability of failure, time to failure, failure rate, redundancy, reliability allocation

Many objects consist of several parts or elements. From the reliability point of view, an element is any component or object that is considered in the investigated case as a whole and is not decomposed into simpler objects. An element can be a lamp bulb, the connecting point of two electric components, a screw, an oil hose, a piston in an engine, and even the complete engine in a diesel locomotive. Also, the individual operations or their groups in a complex manufacturing or building process can be considered as elements.

An example of a simple system is an electric lamp consisting of a light bulb, socket, switch, wires, plug, and the lamp body. An extremely complex system is an aircraft, containing tens of thousands of mechanical, hydraulic, or electric elements. Each of them can fail. This increases the probability that the whole system fails. The resultant reliability depends on the reliability of the individual elements and on their number and mutual arrangement. A suitable arrangement can even increase the reliability of the system. In this chapter, important cases will be shown together with the formulas for the calculation of the resultant reliability. The two basic systems are series and parallel; their combinations are also possible.

#### **1. Series system**

From the reliability point of view, a series system (Fig. 1a) is one that fails if any of its elements fails. For example, a motorcycle cannot go if any of the following parts cannot serve: engine, tank with fuel, chain, frame, front or rear wheel, etc., and, of course, the driver. All these elements are thus arranged in series. Screws and many other small parts are elements as well. If the failure of any component does not depend on any other component, the reliability of the system is obtained simply as the product of the reliabilities of the individual elements,

$$R = R_1 \times R_2 \times R_3 \times \dots \times R_n = \prod_j R_j. \tag{1}$$

A practical conclusion is that "the reliability of a series system is always lower than the reliability of any of its components".

**Figure 1.** Examples of series system (a) and parallel system (b).

The probability of failure is complementary to reliability, i.e.

$$F = 1 - R \,. \tag{2}$$

The characteristic features of series arrangement will be shown on several examples.

#### **Example 1**

The resultant reliability of two components is *R* = *R*1 × *R*2. For example, if *F*1 = 0.1 and *F*2 = 0.2, then *R*1 = 0.9 and *R*2 = 0.8 and *R* = 0.9 × 0.8 = 0.72. This is less than the reliability of the weaker component no. 2. The probability of failure has increased to 1 – 0.72 = 0.28, i.e. more than the failure probability *F*2.

#### **Example 2**

The reliability of a series system with three elements with *R*<sub>1</sub> = 0.9, *R*<sub>2</sub> = 0.8, and *R*<sub>3</sub> = 0.5 is *R* = 0.9 × 0.8 × 0.5 = 0.36, which is less than the reliability of the worst component (*R*<sub>3</sub> = 0.5). This recalls the well-known saying "The chain is as weak as its weakest link" (which, however, does not consider that several components can fail simultaneously).

#### **Example 3**

The influence of the number of elements (and thus the complexity of the system) can be illustrated on several systems where all components have the same probability of failure *F*<sub>1</sub> = 0.02; the corresponding reliability is *R*<sub>1</sub> = 0.98. What will be the reliability of a system composed of (a) 2 components, (b) 10 components, (c) 50 components, and (d) 200 components?

Solution: (a) *R* = *R*<sub>1</sub> × *R*<sub>1</sub> = 0.98<sup>2</sup> = 0.960; (b) *R* = *R*<sub>1</sub><sup>10</sup> = 0.98<sup>10</sup> = 0.817; (c) *R* = *R*<sub>1</sub><sup>50</sup> = 0.98<sup>50</sup> = 0.364; and (d) *R* = *R*<sub>1</sub><sup>200</sup> = 0.98<sup>200</sup> = 0.0176.

One can see that the drop of reliability is significant especially for high numbers of components. Although one component has relatively high reliability (98%), a system with 200 such parts is practically unable to work, as it has reliability lower than approximately 2% and probability of failure 98%! Complex large systems must therefore be assembled from very reliable elements.
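Examples 1 to 3 can be reproduced with a few lines of Python. This is a sketch of Equation (1) only; the function name is hypothetical, and mutual independence of the element failures is assumed, as in the text.

```python
import math

def series_reliability(reliabilities):
    """Equation (1): product of the reliabilities of independent elements in series."""
    return math.prod(reliabilities)

# Examples 1 and 2
R_two = series_reliability([0.9, 0.8])
R_three = series_reliability([0.9, 0.8, 0.5])

# Example 3: n identical elements, each with reliability 0.98
R_by_count = {n: series_reliability([0.98] * n) for n in (2, 10, 50, 200)}

print(f"Example 1: R = {R_two:.2f}, Example 2: R = {R_three:.2f}")
for n, r in R_by_count.items():
    print(f"n = {n:3d}: R = {r:.4f}")
```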

Until now, we have assumed that the reliability of individual parts does not change with time. If it varies, Equation (1) changes to

$$R(t) = R_1(t) \times R_2(t) \times R_3(t) \times \dots \times R_n(t) = \prod_j R_j(t)\,; \tag{3}$$

the resultant probability of failure is obtained as

$$F(t) = 1 - R(t) \quad \text{or} \quad F(t) = 1 - \prod_j \left[1 - F_j(t)\right]. \tag{4}$$

The reliability of components is often characterized by failure rate *λ*. If the failure rate may be assumed constant (especially in systems containing many elements), the decrease of reliability with time is exponential, *R*(*t*) = exp (– *λt*), and Equation (3) changes to

$$\begin{split} R(t) &= \exp(-\lambda\_1 t) \times \exp(-\lambda\_2 t) \times \exp(-\lambda\_3 t) \times \dots \times \exp(-\lambda\_n t) \\ &= \exp\left[ -(\lambda\_1 + \lambda\_2 + \lambda\_3 + \dots + \lambda\_n)t \right] = \exp(-\lambda t) \,. \end{split} \tag{5}$$

The distribution of times to failure of such a system is again exponential, with the resultant failure rate equal to the sum of the individual failure rates,

$$
\lambda = \sum_j \lambda_j\,. \tag{6}
$$

This means that "the failure rate of a series system is always higher (and the mean time between failures shorter) than that of individual components, and the reliability *R*(*t*) decreases with time faster".

The mean time between failures is

$$\text{MTBF} = 1 / \lambda \,. \tag{7}$$
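A short Python sketch of Equations (5) to (7) follows. The three failure rates are hypothetical values chosen for illustration, and constant failure rates with independent elements are assumed.

```python
import math

def series_failure_rate(failure_rates):
    """Equation (6): the failure rates of independent elements in series add up."""
    return sum(failure_rates)

# Hypothetical constant failure rates of three elements, per hour
rates = [2e-5, 5e-5, 1e-4]

lam_system = series_failure_rate(rates)     # resultant failure rate, per hour
mtbf_system = 1.0 / lam_system              # Equation (7), hours
R_1000 = math.exp(-lam_system * 1000.0)     # Equation (5) evaluated at t = 1000 h

print(f"lambda = {lam_system:.2e} per hour, MTBF = {mtbf_system:.0f} h, R(1000 h) = {R_1000:.3f}")
```

The system MTBF is shorter than the mean time between failures of even the least reliable element (here 1/10<sup>–4</sup> = 10,000 h).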

The decrease of reliability with time is illustrated in Figure 2 for several systems with different numbers of elements. One can see a very fast drop of reliability in systems with many components. This must be accounted for if guaranteed operation of a complex object during certain time is demanded. This issue will be treated in detail later.

**Figure 2.** Series system. Time course of reliability for various number of elements *n*.

#### **2. Parallel system**

A parallel system (Fig. 1b) is one that fails only if all its parts fail. An example is a four-cylinder engine. It will fail only if all four cylinders are unable to run. If one, two, or even three cylinders do not work, the fourth one is still able to put the car into motion (though with significantly reduced power).

The probability of a simultaneous occurrence of mutually independent events equals the product of individual probabilities. In parallel systems, the resultant probability of failure is thus calculated as

$$F(t) = F_1(t) \times F_2(t) \times F_3(t) \times \dots \times F_n(t) = \prod_j F_j(t)\,. \tag{8}$$

Reliability is complementary to probability of failure, i.e.

$$R(t) = 1 - F(t) \quad \text{or} \quad R(t) = 1 - \prod_j \left[1 - R_j(t)\right]. \tag{9}$$

For example, if two components are arranged in parallel, each with reliability *R*<sub>1</sub> = *R*<sub>2</sub> = 0.9, that is, *F*<sub>1</sub> = *F*<sub>2</sub> = 0.1, the resultant probability of failure is *F* = 0.1 × 0.1 = 0.01. The resultant reliability is *R* = 1 – 0.01 = 0.99. The probability of failure has thus dropped 10 times compared with a single component. This feature is sometimes used for increasing reliability by means of redundant parts (see later).
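Equations (8) and (9) can be sketched in Python as follows. The function name is hypothetical, and independent failures of the elements are assumed. The second call shows, for illustration, the effect of adding a third identical element.

```python
import math

def parallel_reliability(reliabilities):
    """Equations (8) and (9): a parallel system fails only if all independent elements fail."""
    failure_probability = math.prod(1.0 - r for r in reliabilities)
    return 1.0 - failure_probability

R_two = parallel_reliability([0.9, 0.9])          # 1 - 0.1 * 0.1
R_three = parallel_reliability([0.9, 0.9, 0.9])   # 1 - 0.1 * 0.1 * 0.1

print(f"two elements: R = {R_two:.3f}, three elements: R = {R_three:.4f}")
```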

If the reliability of elements is characterized by failure rates, the situation is more complex than in a series system, even if the failure rates of the individual elements are constant. For the simplest case of two components, with *R*1(*t*) = exp(-*λ*1*t*) and *R*2(*t*) = exp(-*λ*2*t*),

$$\begin{aligned} F(t) &= F_1(t) \times F_2(t) = \left[ 1 - R_1(t) \right] \times \left[ 1 - R_2(t) \right] = \\ &= \left[ 1 - \exp(-\lambda_1 t) \right] \times \left[ 1 - \exp(-\lambda_2 t) \right] = \\ &= 1 - \exp(-\lambda_1 t) - \exp(-\lambda_2 t) + \exp\left[-(\lambda_1 + \lambda_2)t\right] , \end{aligned} \tag{10}$$

and

$$R(t) = 1 - F(t) \ = \exp(-\lambda\_1 t) \ + \exp(-\lambda\_2 t) \ - \exp\left[-(\lambda\_1 + \lambda\_2)t\right] \ . \tag{11}$$

The distribution is no longer exponential, and the failure rate is not constant. The mean time to failure is

$$\text{MTTF} = \int_{0}^{\infty} R(t) \, dt = \int_{0}^{\infty} \left\{ \exp(-\lambda_1 t) + \exp(-\lambda_2 t) - \exp\left[-(\lambda_1 + \lambda_2)t\right] \right\} dt = \lambda_1^{-1} + \lambda_2^{-1} - (\lambda_1 + \lambda_2)^{-1} . \tag{12}$$

For identical components, with *λ*1 = *λ*2 = *λ*,

$$\text{MTTF} = \lambda^{-1} + \lambda^{-1} - (\lambda + \lambda)^{-1} = (3/2) \, \lambda^{-1} = (3/2) \, \text{MTTF}_1 , \tag{13}$$

i.e. 50% longer than the mean time to failure MTTF<sub>1</sub> = 1/*λ* of an individual component.
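That the failure rate of two parallel components is not constant can be checked numerically. The sketch below evaluates Equations (11) and (12) and uses the standard definition of the failure rate as the ratio of the failure probability density to the reliability, *λ*(*t*) = –(d*R*/d*t*)/*R*(*t*); this definition and the function names are brought in here as assumptions, and the numerical values are hypothetical.

```python
import math

def reliability(t, lam1, lam2):
    """Reliability of two parallel exponential components, Eq. (11)."""
    return (math.exp(-lam1 * t) + math.exp(-lam2 * t)
            - math.exp(-(lam1 + lam2) * t))

def failure_rate(t, lam1, lam2):
    """Failure rate of the system, -dR/dt divided by R(t)."""
    density = (lam1 * math.exp(-lam1 * t) + lam2 * math.exp(-lam2 * t)
               - (lam1 + lam2) * math.exp(-(lam1 + lam2) * t))
    return density / reliability(t, lam1, lam2)

def mttf(lam1, lam2):
    """Mean time to failure of the two-component parallel system, Eq. (12)."""
    return 1.0 / lam1 + 1.0 / lam2 - 1.0 / (lam1 + lam2)

lam = 1.0e-3  # hypothetical failure rate of each component, 1/h
for t in (0.0, 500.0, 2000.0, 20000.0):
    print(t, failure_rate(t, lam, lam))
print(mttf(lam, lam) * lam)  # ratio of system MTTF to component MTTF
```

In this sketch the failure rate of the system starts at zero and approaches the failure rate of a single component for long times.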

The solution for parallel systems with more elements can be obtained in a similar way. However, it is much more complicated. Analytical solutions exist only in very simple cases; the Monte Carlo simulation method, explained in Chapter 15, is more effective.
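The idea of such a simulation can be indicated by a minimal sketch. It assumes independent components with exponentially distributed times to failure and active redundancy, so that the system fails at the instant when its last component fails. The function name, the number of trials, and the failure rates are hypothetical.

```python
import random

def simulate_parallel_mttf(failure_rates, trials=100_000, seed=1):
    """Monte Carlo estimate of the mean time to failure of a parallel
    system: in every trial the system lives as long as its most
    durable component."""
    rng = random.Random(seed)   # fixed seed makes the run repeatable
    total = 0.0
    for _ in range(trials):
        total += max(rng.expovariate(lam) for lam in failure_rates)
    return total / trials

# Two identical components; Eq. (13) gives 1.5 / lambda for comparison
lam = 1.0e-3
print(simulate_parallel_mttf([lam, lam]), 1.5 / lam)

# Three different components, where the analytical solution is less handy
print(simulate_parallel_mttf([1.0e-3, 2.0e-3, 5.0e-4]))
```

The estimate scatters around the exact value; the scatter decreases with the growing number of trials.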

Generally, the reliability of parallel arrangement can be characterized as follows:

"The probability of failure-free operation of a system with several parallel elements is always higher than that of the best element in the system." The situation is depicted in Figure 3. Also, the mean time to failure of a parallel system is always longer than that of any of its parts. For this reason, parallel arrangement is sometimes used to increase reliability (see further).

**Figure 3.** Parallel system. Time course of reliability for various numbers of elements *n*.

#### **3. Combined arrangement**

In some systems, series and parallel arrangements of elements appear together (Fig. 4). The resultant reliability can be found by a step-by-step solution with gradual simplification. A group of elements arranged in series is replaced by one element with equivalent reliability parameters. Parallel elements can sometimes also be replaced by an equivalent element, and so on. The situation is easier if the time dependency of reliabilities does not need to be considered. Unfortunately, if reliability is characterized by failure rates, the failure rate for a parallel arrangement is not constant, and no simple and accurate analytical solutions exist, only approximate ones. Better results can be obtained using numerical simulation methods.

**Figure 4.** Combined system.
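The step-by-step reduction can be sketched with two small helper functions, one for a series group and one for a parallel group. The layout used below (element 1 in series with a block in which element 4 is parallel to the series chain of elements 2 and 3) and all reliabilities are assumptions made only for illustration; time dependency is not considered and the components are taken as independent.

```python
from math import prod

def series(*reliabilities):
    """Equivalent reliability of independent elements in series."""
    return prod(reliabilities)

def parallel(*reliabilities):
    """Equivalent reliability of independent elements in parallel."""
    return 1.0 - prod(1.0 - r for r in reliabilities)

# Hypothetical reliabilities of the four elements
r1, r2, r3, r4 = 0.95, 0.90, 0.85, 0.80

r_23 = series(r2, r3)           # elements 2 and 3 replaced by one element
r_block = parallel(r4, r_23)    # this element in parallel with element 4
r_system = series(r1, r_block)  # the block in series with element 1
print(round(r_system, 5))
```

Each call replaces one group by a single equivalent element, which mirrors the gradual simplification described above.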

## **4. Redundancy**

Reliability can be increased if the same function is performed by two or more elements arranged in parallel. This is called redundancy. Two kinds of redundancy can be distinguished: structural and algorithmic. **Structural redundancy** uses more components for the same purpose. Examples include dual-circuit brakes in modern cars, a reserve water pump in a power plant, joining of two load-carrying parts using more rivets than necessary for safe transfer of the load, a spare electric generator to ensure safe power supply in a hospital, or a reserve electric line. **Redundancy** can be **active** (the parallel elements work or are loaded simultaneously) or **standby**. In the latter case, only one element is loaded or works, whereas the second (third, etc.) redundant element is switched on only if the first one has failed. The advantage of standby redundancy is that only one component is loaded and exposed to wear or other kinds of deterioration. A disadvantage is that such an arrangement usually needs a **switch** or similar item, which increases the costs and can also contribute to the unreliability of the system.

The second case is algorithmic redundancy. This means the repetition of some operations, for example measurement or check for defects in some kinds of nondestructive control, such as X-ray or ultrasonic revealing of internal defects in castings or fatigue cracks in airframes or wings, as well as the proofreading of a paper for finding errors. Algorithmic redundancy is commonly used in the transmission of signals and information, from the simple addition of parity bits (check digits) to complex systems for safe information coding.
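The simplest of these techniques, the parity bit, can be illustrated as follows. This is only a sketch of even parity with hypothetical function names; it reveals an odd number of corrupted bits and does not locate or correct them.

```python
def add_even_parity(bits):
    """Append one redundant bit so that the number of ones is even."""
    parity = sum(bits) % 2
    return bits + [parity]

def parity_ok(word):
    """Check a received word: an odd number of ones signals an error."""
    return sum(word) % 2 == 0

sent = add_even_parity([1, 0, 1, 1])
received = sent.copy()
received[2] ^= 1                            # one bit corrupted during transmission
print(parity_ok(sent), parity_ok(received))
```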

## **5. Reliability allocation**

Until now, we determined the resultant reliability of a system composed of several components. In the design of complex systems, the opposite problem appears: what should be the reliabilities of individual parts so that the reliability of the whole system is equal to some demanded value (or better)? Several methods of reliability allocation have been proposed. The simplest one for series systems uses **equal apportionment**, which distributes the reliability uniformly among all members. If the resultant reliability should be *R* and the system consists of *n* components in a series, each of the reliability *R*<sub>i</sub>, then it follows from Equation (1) that *R* = *R*<sub>i</sub><sup>n</sup>, so that every single element should have the reliability

$$R_i = R^{1/n} . \tag{14}$$

If failure rates are considered, then the failure rate *λ*<sub>i</sub> of every element should be

$$
\lambda_i = \lambda / n , \tag{15}
$$

where *λ* is the demanded failure rate of the system.

Other apportionments are also possible. The available components do not always have the reliability *R*<sub>i</sub> or failure rate *λ*<sub>i</sub> corresponding exactly to Equation (14) or (15). Such values can serve as a guide for finding the parameters so that the resultant reliability (1), (3), or (6) fulfills the requirements. In reliability allocation, other criteria can also be considered, such as the importance of individual parts.
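Equal apportionment according to Equations (14) and (15) can be sketched in a few lines. The function names and the demanded system values are hypothetical.

```python
def equal_apportionment_reliability(r_system, n):
    """Reliability demanded from each of n series elements, Eq. (14)."""
    return r_system ** (1.0 / n)

def equal_apportionment_failure_rate(lam_system, n):
    """Failure rate allowed for each of n series elements, Eq. (15)."""
    return lam_system / n

# Hypothetical demand: system reliability 0.95 with five elements in series
r_i = equal_apportionment_reliability(0.95, 5)
print(r_i, r_i ** 5)   # the second number reproduces the demanded 0.95

# Hypothetical demand: system failure rate 1.0e-4 per hour, ten elements
print(equal_apportionment_failure_rate(1.0e-4, 10))
```

The check `r_i ** 5` shows that the individual elements must be considerably more reliable than the system as a whole.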

#### **Example 4**

A system consists of three parallel components (Fig. 1b) with probabilities of failure (during a certain, unspecified time): *F*1 = 0.08, *F*2 = 0.20, and *F*3 = 0.20. Calculate the resultant probability of failure (*F*) and of failure-free operation (*R*). Assume that the components are independent.

Solution. In parallel systems, *F* = *F*1 × *F*2 × *F*3 = 0.08 × 0.20 × 0.20 = 0.0032. *R* = 1 – *F* = 1 – 0.0032 = 0.9968. (Compare the results with the failure probabilities of individual components!)

#### **Example 5**

Calculate the mean time to failure and failure rate of a system consisting of four elements in a series (as in Fig. 1a). The individual elements have an exponential distribution of the time to failure with failure rates *λ*1 = 8 × 10<sup>–6</sup> h<sup>–1</sup>, *λ*2 = 6 × 10<sup>–6</sup> h<sup>–1</sup>, *λ*3 = 9 × 10<sup>–6</sup> h<sup>–1</sup>, and *λ*4 = 2 × 10<sup>–5</sup> h<sup>–1</sup>. Calculate the probability of failure (in %) during *t* = 500 hours of operation.

Solution.

$$\begin{aligned} \lambda &= \lambda_1 + \lambda_2 + \lambda_3 + \lambda_4 = (8 + 6 + 9 + 20) \times 10^{-6} = 43 \times 10^{-6} \text{ h}^{-1} .\\ \text{MTTF} &= 1 / \lambda = 1 / (43 \times 10^{-6}) = 23{,}256 \text{ h} .\\ R(t) &= \exp(-\lambda t) = \exp(-43 \times 10^{-6} \times 500) = 0.9787 = 97.87\% .\\ F(t) &= 1 - R(t) = 1 - 0.9787 = 0.0213 = 2.13\% . \end{aligned}$$

#### **Example 6**

Calculate the resultant probability of failure (*F*) and failure-free operation (*R*) for a combined series-parallel system (Fig. 4). Assume that the components are independent. The failure probabilities of individual elements are: *F*1 = 0.08, *F*2 = 0.30, *F*3 = 0.20, and *F*4 = 0.10.

Solution. The system must be solved step by step. First, the reliability of elements 2 and 3 in a series is calculated: *R*2–3 = *R*2 × *R*3 = (1 – *F*2) × (1 – *F*3) = (1 – 0.3) × (1 – 0.2) = 0.7 × 0.8 = 0.56. The probability of failure is complementary to reliability, so that *F*2–3 = 1 – *R*2–3 = 1 – 0.56 = 0.44. Then, the probability of failure of the group 2–3 arranged in parallel with element 4 is obtained as *F*4,2–3 = *F*4 × *F*2–3 = 0.10 × 0.44 = 0.044. The resultant reliability of the whole system is obtained as the reliability of component 1 in a series with the subsystem 4,2–3. Here, the reliabilities must be multiplied. The resultant reliability thus is

$$\begin{aligned} R &= R_1 \times R_{4,2-3} = (1 - F_1) \times (1 - F_{4,2-3}) = \\ &= (1 - 0.08) \times (1 - 0.044) = 0.92 \times 0.956 = 0.87952. \end{aligned}$$

The resultant probability of failure is *F* = 1 – *R* = 1 – 0.87952 = 0.12048 ≈ 0.12.

#### **Example 7**

The failure rate of a system of five components arranged in a series should be *λ* = 2.0 × 10<sup>–5</sup> h<sup>–1</sup>. Determine the failure rate of individual components provided that all can have the same *λ*<sub>i</sub>.

Solution. The resultant failure rate of this series system is *λ* = *λ*1 + *λ*2 + *λ*3 + *λ*4 + *λ*5. For identical components, it is *λ* = 5*λ*<sub>i</sub>. The demanded failure rate of each part is *λ*<sub>i</sub> = *λ*/5 = 2.0 × 10<sup>–5</sup> / 5 = 4.0 × 10<sup>–6</sup> h<sup>–1</sup>.

## **Author details**

Jaroslav Menčík

Address all correspondence to: jaroslav.mencik@upce.cz

Department of Mechanics, Materials and Machine Parts, Jan Perner Transport Faculty, University of Pardubice, Czech Republic

## **Time to Failure of Deteriorating Objects**

## Jaroslav Menčík

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/62359

#### **Abstract**

This chapter explains the prediction of the time to failure in the following cases: fatigue of metallic components under cyclic loading or in the presence of cracks, static fatigue, wear and creep, variable loading (damage accumulation). Prediction of the time to failure based on monitoring of the changing response. Probabilistic aspects of the lifetime prediction. The determination of the time to failure is illustrated on examples.

**Keywords:** Failure, time to failure, fatigue, static fatigue, wear, creep, damage accumulation, prediction, monitoring

As we have seen in Chapter 4, failure rate often changes with time (Fig. 1 there). This is because the main causes of failures change with time. The early failures (stage I) are mostly caused by errors in design, manufacture, assembly, or building process or due to hidden defects, and the instant of their occurrence cannot be predicted. The failures in stage II (useful life) arise from external reasons (random overloading, collision with another object, climatic events, and errors of personnel) and also cannot be predicted. Only if the pertinent failure rate is known can one predict approximately how often failures can be expected and take suitable measures to mitigate their effects.

The failures in stage III (aging and wear-out) arise due to the internal "weakness" of the object and appear after some time of operation even under appropriate conditions of use. Many objects fail due to wear, fatigue, creep, corrosion, or other processes of gradual deterioration. Fortunately, in such cases a possibility exists to predict (with higher or lower accuracy) the time when the object is about to fail, provided that the relationship between the load intensity and the rate of deterioration is known. Two principal ways exist:

**1.** If the basic mechanism of degradation is known, one can express the deterioration rate as a function of the characteristic load and then derive a formula for the calculation of the time to failure. For a known load, the time to failure can thus be predicted in advance (e.g. during the design stage). Conversely, it is possible to find dimensions of the loaded parts that guarantee an acceptably low rate of deterioration and thus the demanded life.

**2.** Due to deterioration, the load response of many objects also gradually changes. If a quantity characterizing the degradation is known (e.g. the magnitude of vibrations), it is possible to monitor its time development in the real object. These data can be extrapolated, and the time can be predicted at which the characteristic quantity reaches the critical value.

Both approaches may be combined. During design, the time to failure can be predicted, and during operation, this prediction is updated with respect to the actual time history of the operation and response. The object should never be left in operation until the instant of expected failure. At a reasonable time before it, its condition is checked and a suitable time for repair or for the next inspection is proposed.

Both cases will be discussed here in detail.

© 2016 The Author(s). Licensee InTech. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

## **1. Prediction of the time to fatigue failure from the law of material degradation**

A typical example is the fatigue of metallic parts under cyclic or periodic loading. If the characteristic stress is higher than the so-called fatigue limit, a small crack arises in the component after some time of operation and grows slowly, and when it reaches the critical size, the component breaks. The number of cycles to failure *N*<sup>f</sup> depends on the stress amplitude *σ*<sup>a</sup> in the loading cycle [1, 2]. In the simplest case, it can be expressed by the so-called *S - N* (or Wöhler) curve (Fig. 1):

$$N_f = A \sigma_a^{-m} ; \tag{1}$$

*A* and *m* are constants obtained by testing the material or component. Several variants of the fatigue equation exist, but, basically, their character is similar to Equation (1). Sometimes, the stress range in the loading cycle, ∆*σ* = *σ*max – *σ*min, is used instead of the stress amplitude *σ*a.

Equation (1) can be rearranged to obtain the stress amplitude or range corresponding to the demanded life:

$$
\sigma_a = \left( A / N_f \right)^{1/m} . \tag{2}
$$

This formula can be used in the dimensioning of a component for a certain prescribed life.

**Figure 1.** *S* – *N* curve for fatigue. *N*<sup>f</sup> – number of cycles till failure, *σ* – characteristic stress or load (amplitude *σ*a or range ∆*σ*).
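Equations (1) and (2) can be evaluated with two one-line functions. The sketch below uses the constants *A* and *m* and the stress amplitude from Example 1 of this chapter; the function names are chosen here only for illustration.

```python
def cycles_to_failure(stress_amplitude, A, m):
    """Number of cycles to failure from the S-N curve, Eq. (1)."""
    return A * stress_amplitude ** (-m)

def allowable_amplitude(cycles, A, m):
    """Stress amplitude corresponding to a demanded life, Eq. (2)."""
    return (A / cycles) ** (1.0 / m)

A, m = 5.0e13, 4.0        # constants valid for stress given in MPa
n_f = cycles_to_failure(120.0, A, m)
print(round(n_f))
print(round(allowable_amplitude(600_000, A, m), 2))
```

Because of the high exponent *m*, a moderate reduction of the stress amplitude prolongs the life substantially.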

### **2. Time to fatigue failure of objects with cracks**

Equation (1) does not assume any previous damage to the component. Sometimes, however, one or more cracks or similar defects are present in the body from the beginning. The behavior of bodies with cracks is studied by fracture mechanics [1 - 4]. Crack growth is influenced not only by the stress, but also by the crack size (Fig. 2). Both quantities form together a very important parameter, called **stress intensity factor** *K*,

$$K_I = \sigma \, Y(a) \, a^{1/2} ; \tag{3}$$

*σ* is the nominal stress in the crack region, *a* is the characteristic length of the crack, and *Y*(*a*) is a factor characterizing the crack shape and size and the stress distribution. The subscript of *K* denotes the mode of crack opening; number I means simple opening, which is the most important case. If the stress intensity factor attains the critical value *K*<sub>IC</sub> for the given material, fast fracture follows. The corresponding critical crack length is

$$a_c = \left\{ K_{IC} / \left[ \sigma_{\max} \, Y(a_c) \right] \right\}^2 . \tag{4}$$

**Figure 2.** Body with a crack. Characteristic modes of crack opening.

In components exposed to periodic loading, the crack can grow very slowly even if the stress intensity factor is lower than the critical value. The period of subcritical crack growth can last from minutes to years and can be predicted via the relationship between the crack velocity and the stress intensity factor. The crack velocity *v* during the period most important for delayed failure (region II in Fig. 3a) can often be approximated by the Paris-Erdogan equation [3]:

$$
v = \text{d}a / \text{d}N = B \, \Delta K_I^{\,n} . \tag{5}
$$

d*a*/d*N* is the increment of crack length per loading cycle, ∆*K*I is the range of stress intensity factor in the loading cycle, ∆*K*I = *K*I,max – *K*I,min, and *B* and *n* are the constants for the given material and environment. Inserting ∆*K*I from the modified Equation (3) into Equation (5) and separating *a* and *N*, one can arrive at the following expression for the number of cycles for the crack growth from the initial length *a*<sub>i</sub> to the length *a*:

$$N = \int_{a_i}^{a} \frac{\text{d}a}{B \, \Delta\sigma^{n} \left[ Y(a) \right]^{n} a^{n/2}} . \tag{6}$$

Fast fracture occurs if the stress intensity factor attains the critical value *K*IC, also called fracture toughness. The corresponding critical crack length *a*c, used as the upper limit in the integral (6), is given by Equation (4), with *σ*max denoting the maximum stress in the loading cycle. The resultant formula for *N*<sub>f</sub> in the simplest case (constant stress range, small crack enlargement, and thus *Y* ≈ const) is basically similar to Equation (1), with ∆*σ* instead of *σ*a. The number of cycles to failure is roughly inversely proportional to some power of the characteristic stress range or amplitude. It is thus possible in design to propose such dimensions of the cross-section that the stresses will be low enough to guarantee the demanded lifetime.
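The integration of the crack growth law can be sketched numerically. The code below assumes a constant shape factor *Y*, a loading cycle from zero to *σ*max (so that ∆*σ* = *σ*max), and an exponent *n* ≠ 2; it integrates Equation (6) by the midpoint rule up to the critical length from Equation (4) and compares the result with the analytical integral valid for constant *Y*. All material constants and dimensions are hypothetical and serve only for illustration.

```python
import math

def critical_crack_length(k_ic, sigma_max, y):
    """Critical crack length, Eq. (4), with Y treated as a constant."""
    return (k_ic / (sigma_max * y)) ** 2

def cycles_numeric(a_i, a_end, delta_sigma, y, b, n, steps=20_000):
    """Midpoint-rule integration of Eq. (6): dN = da / (B * dK^n)."""
    h = (a_end - a_i) / steps
    total = 0.0
    for k in range(steps):
        a = a_i + (k + 0.5) * h
        delta_k = delta_sigma * y * math.sqrt(a)  # range of K_I, cf. Eq. (3)
        total += h / (b * delta_k ** n)
    return total

def cycles_closed_form(a_i, a_end, delta_sigma, y, b, n):
    """Analytical value of the same integral for constant Y and n != 2."""
    p = 1.0 - n / 2.0
    return (a_end ** p - a_i ** p) / (b * (delta_sigma * y) ** n * p)

# Hypothetical data: stresses in MPa, lengths in m, K in MPa m^0.5
k_ic, sigma_max, y = 60.0, 100.0, 2.0
b, n = 1.0e-11, 3.0        # crack growth constants in matching units
a_i = 1.0e-3               # initial crack length

a_c = critical_crack_length(k_ic, sigma_max, y)
print(a_c)
print(cycles_numeric(a_i, a_c, sigma_max, y, b, n))
print(cycles_closed_form(a_i, a_c, sigma_max, y, b, n))
```

Most of the life is spent while the crack is still short, so the result is much more sensitive to the initial crack length than to the critical one.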

**Figure 3.** Crack growth velocity *v* as a function of the stress intensity factor (a schematic). *a* – metallic materials, periodic loading; *b* – brittle materials, static load.

#### **3. Static fatigue, wear, and creep**

In some materials, fatigue occurs even under constant load. Examples are glass and some ceramics as well as some metals in a corrosive environment. In this case, called static fatigue, it is possible to express the time to failure *t*<sub>f</sub> as a function of the acting stress. The velocity *v* of very slow crack growth depends not on the amplitude but on the value of the stress intensity factor *K*I (Fig. 3b), and in the important part of the *v*(*K*) diagram, the velocity can be approximated by a power-law function:

$$
v = \text{d}a / \text{d}t = A K_I^{\,n} . \tag{7}
$$

The relationship for the time to failure is similar to Equation (1) or (6), with the stress amplitude *σ*<sub>a</sub> replaced by the characteristic stress *σ* and the number of cycles *N*<sub>f</sub> by the time to failure *t*<sub>f</sub>. Relationships similar to Equation (1) are also used for the prediction of the life of ball bearings and other components exposed to **wear** or of parts exposed to **creep** and other kinds of gradual deterioration. In all these cases, the time to failure is roughly inversely proportional to some power of the characteristic load *P*,

$$t_f = A P^{-n} ; \tag{8}$$

the time to failure in rotating parts can be expressed by means of the number of revolutions. Equation (8) is the simplest formula; the relationship in some cases is more complex. For details, the reader is referred to the specialized literature, for example [1 - 4].

The consequences of random variability of load and uncertainties in the determination of parameters in fatigue equation will be dealt with in Chapter 19.

### **4. Variable loading**

Until now, we have assumed a constant load amplitude. Often, it varies. Figure 4 depicts a regular operation regime of a machine, with four characteristic stages. Examples of an irregular or random regime are the loading of the bogie of a car or of the components of an engine. In all these cases, the concept of **damage accumulation** is used. Various hypotheses and models have been proposed [1, 2]. Here, only the simplest concept of linear damage accumulation, also named the Palmgren-Miner rule, will be explained.

**Figure 4.** Variable loading (a schematic for damage accumulation).

The basic idea of the **Palmgren-Miner rule** is that every loading cycle contributes to damage and exhausts a minute part of the life. Damage (in fact relative damage) *D* is then defined as the ratio of the number of loading cycles, which the object has undergone, and the number of cycles (under the same kind of loading), which would cause failure,

$$D = N / N_f . \tag{9}$$

For example, if a component could sustain 1000 loading cycles until failure, then one loading cycle has exhausted 1/1000 of its fatigue life. Failure occurs if *N* = *N*<sub>f</sub>, that is if *D* = 1 (in this case, *D* = 1 for *N* = 1000). Note the difference between *N* and *N*<sub>f</sub>!

If the loading pattern is more complex, consisting of various loading blocks, each with different amplitude and different number of loading cycles (Fig. 4), the total damage is obtained as the sum of damages caused during the individual blocks,

$$D = \sum_j D_j = \sum_j N_j / N_{f,j} . \tag{10}$$

Again, failure occurs if *D* = 1. In this way, it is possible to find the number of loading cycles or blocks until failure. If the damage, corresponding to one day of operation, is *D*1, then the object can sustain 1/*D*1 days before it fails.
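The linear damage accumulation according to Equations (9) and (10) can be sketched as follows. The code assumes that the number of cycles to failure for each block follows the *S* - *N* curve of Equation (1) and that all amplitudes lie above the fatigue limit; the constants are borrowed from Example 1 of this chapter, whereas the daily loading blocks are hypothetical.

```python
def damage_per_sequence(blocks, A, m):
    """Relative damage caused by one loading sequence, Eq. (10).
    blocks is a list of (stress amplitude, number of cycles) pairs."""
    damage = 0.0
    for stress_amplitude, cycles in blocks:
        n_f = A * stress_amplitude ** (-m)   # cycles to failure, Eq. (1)
        damage += cycles / n_f               # damage of this block, Eq. (9)
    return damage

A, m = 5.0e13, 4.0                # S-N constants, stress in MPa
daily_blocks = [(120.0, 200),     # hypothetical loading during one day
                (100.0, 1000),
                (80.0, 5000)]

d1 = damage_per_sequence(daily_blocks, A, m)
days_to_failure = 1.0 / d1        # failure is expected when D reaches 1
print(d1, days_to_failure)
```

In this hypothetical case, the many cycles of low amplitude contribute more to the total damage than the few cycles of the highest amplitude, which shows that no block should be neglected without checking.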

Special procedures have been developed for cases where the load changes irregularly, in a random way, such as in the structures of airplanes or cars. For more, see [1 - 3].

### **5. Prediction of the time to failure from the response**

The gradual deterioration of many objects can be characterized by various quantities, such as the amplitude of vibrations, noise level, deformation under load, or loss of material by wear or corrosion. The pertinent quantity (*y*) grows gradually to the critical value, corresponding to failure or just to an unacceptable condition (Fig. 5). In some cases, it is possible to monitor this quantity and its development in time. These data can be fitted by a suitable function *y*(*t*), and the time to failure is predicted as the time at which *y* reaches the critical value *y*<sub>C</sub> (Fig. 5). The knowledge of *y*<sub>C</sub> is thus very important. However, the time to failure cannot be predicted accurately due to uncertainties in the determination of material parameters, loads, influence of environment, and other factors. For these reasons, the object must never be left in operation until the instant of expected failure. At a reasonable time before it, its condition must be checked. This time is denoted as the alert time, *t*<sub>A</sub>, and corresponds to an appropriately chosen value *y*<sub>A</sub>. The determination of *t*<sub>A</sub> is also shown in Figure 5. At this time, the object is inspected, and the obtained value *y*(*t*<sub>A</sub>) serves as a basis for the decision on maintenance, renovation, termination of further operation, or leaving the object in operation until the next inspection. Practical examples are shown at the end of this chapter.

**Figure 5.** Time course of gradual deterioration (e.g. wear) – a schematic.
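The procedure of fitting and extrapolation can be indicated by a minimal sketch. It assumes that the monitored quantity grows exponentially, *y* = *c* exp(*kt*), and fits this function by linear least squares applied to ln *y*; other functions may suit real data better. The measured values are synthetic, and the alert and critical levels as well as all names are hypothetical.

```python
import math

def linear_fit(ts, ys):
    """Least-squares straight line y = a + b*t; returns (a, b)."""
    n = len(ts)
    mean_t, mean_y = sum(ts) / n, sum(ys) / n
    b = (sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys))
         / sum((t - mean_t) ** 2 for t in ts))
    return mean_y - b * mean_t, b

def exponential_fit(ts, ys):
    """Fit y = c*exp(k*t) via a straight-line fit of ln(y); returns (c, k)."""
    ln_c, k = linear_fit(ts, [math.log(y) for y in ys])
    return math.exp(ln_c), k

def time_to_reach(level, c, k):
    """Time at which the fitted exponential reaches the given level."""
    return math.log(level / c) / k

# Synthetic, noise-free "measurements" once a day during ten days
ts = list(range(1, 11))
ys = [0.10 * math.exp(0.2 * t) for t in ts]

c, k = exponential_fit(ts, ys)
y_alert, y_crit = 0.8, 1.0        # hypothetical alert and critical values
t_alert = time_to_reach(y_alert, c, k)
t_crit = time_to_reach(y_crit, c, k)
print(t_alert, t_crit)
```

With real, scattered data, the fitted constants and thus also both predicted times are uncertain, which is the reason for choosing the alert value sufficiently below the critical one.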

## **6. Probabilistic aspects of the deterioration curves and lifetime prediction**

When dealing with the constants of a fatigue curve, taken from material data sheets, or with the constants of a curve showing the development of a certain parameter with time, one must not forget that they are only approximate, obtained by testing only several samples. One should always be aware of the scatter of individual values. Actually, each curve corresponds to a particular specimen, despite the same test conditions. The testing of many samples would thus give a group or distribution of *S* – *N* or similar curves (Fig. 6). The data can be processed in several ways. It is possible to use all experimental data points and calculate the parameters corresponding to the "average" curve, giving the "mean" times to failure. In this case, however, the actual times to failure will be longer than the calculated times in 50% of all cases and shorter in the remaining 50%, which could be dangerous. The reliability of the prediction can be increased by using a confidence band around the regression curve, based on the scatter of the individual values around this curve (see also Chapters 18 and 19). Another approach fits only certain quantiles of the times to failure for various stress levels. In this way, for example, the 5% quantile curve can be obtained, that is, a curve for which the probability of failing earlier is only 5%. Similarly, the curves for other reliabilities can be constructed. This approach is possible if many data points (several tens or more) are available.

**Figure 6.** Scatter of fatigue values and curves.
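The difference between the "mean" life and a low quantile can be illustrated for one stress level. The sketch below assumes, purely for illustration, that the logarithms of the numbers of cycles to failure are normally distributed; this distribution, the function name, and the test results are assumptions, not data from this chapter. The additional uncertainty caused by the small number of specimens is ignored here.

```python
import math
from statistics import NormalDist, mean, stdev

def lifetime_quantile(cycles_to_failure, probability):
    """Life that is not reached with the given probability, assuming
    normally distributed logarithms of the cycles to failure."""
    logs = [math.log(n) for n in cycles_to_failure]
    mu, sigma = mean(logs), stdev(logs)
    z = NormalDist().inv_cdf(probability)   # quantile of the standard normal
    return math.exp(mu + z * sigma)

# Hypothetical results of six fatigue tests at one stress level
sample = [152_000, 198_000, 240_000, 263_000, 310_000, 405_000]

median_life = lifetime_quantile(sample, 0.50)   # half of the items fail earlier
safe_life = lifetime_quantile(sample, 0.05)     # only 5% fail earlier
print(round(median_life), round(safe_life))
```

Repeating the calculation for several stress levels gives the points of the corresponding quantile curve.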

When taking the constants for calculations from a material database or literature, one should therefore know how they were obtained and to what conditions they correspond. Scatter also influences the values of other material parameters, such as strength, fracture toughness *K*IC, constants in the equation for subcritical crack growth or for creep, and so on. Also here, one must distinguish between the work with the mean values or with certain quantiles.

The uncertainty due to the scatter of experimental data must always be considered in the predictions of the time to failure or the alert time. The higher the uncertainty, the longer before the expected critical state an inspection should be made. We return to this topic in Chapter 19.

### **Example 1**

A component is loaded by a sinusoidal mechanical stress of amplitude *σ*<sub>a</sub> = 120 MPa. The fatigue (*S* – *N*) curve of the material (cf. Fig. 1) is *N*<sub>f</sub> = *Aσ*<sub>a</sub><sup>–*m*</sup>, with the constants *m* = 4.0 and *A* = 5.0 × 10<sup>13</sup> (for *σ*<sub>a</sub> given in MPa).

Task. Determine the number of cycles to failure *N*<sub>f</sub>. What part of the life will be exhausted after *N* = 20,000 loading cycles?

Solution. *N*<sub>f</sub> = *Aσ*<sub>a</sub><sup>–*m*</sup> = 5.0 × 10<sup>13</sup> × 120<sup>–4.0</sup> = 241,126 cycles.

*D* = *N* / *N*<sub>f</sub> = 20,000 / 241,126 = 0.0829 = 8.29 %.

#### **Example 2**

Determine the allowable stress amplitude for the component from Example 1 so that the component can sustain *N* = 600,000 cycles.

Solution. The rearrangement of the *S* – *N* curve gives *σ*<sub>a</sub> = (*A*/*N*<sub>f</sub>)<sup>1/*m*</sup>. With *N*<sub>f</sub> = 600,000 and the constants *m* = 4.0 and *A* = 5.0 × 10<sup>13</sup>, the allowable stress is (5.0 × 10<sup>13</sup> / 600,000)<sup>1/4.0</sup> = 95.54 ≈ 95 MPa.

Note. In reality, some factor of safety would be used, either for increasing the number of cycles to failure or for reducing the allowable stress. The stress can be reduced by more ample dimensioning.

#### **Example 3**

The technical condition of a machine can be characterized by the amplitude *y* of vibrations. This amplitude was measured once a day. During 10 days, the following values were measured:


Task. Fit the measured data: (a) by a straight line, and (b) by exponential function and predict:


Solution. The measured data were plotted using Excel. Then, the command Add Trendline was used (Fig. 7). Linear approximation (a) was: *y* = *a* + *bt* = – 0.002667 + 0.054121*t*; the coefficient of determination, characterizing the quality of fit, was *r*<sup>2</sup> = 0.9445. Exponential fit (b) was *y* = 0.085250 × exp(0.19762*t*), with the coefficient of determination *r*<sup>2</sup> = 0.9952. (Remark: The higher the *r*<sup>2</sup>, the better the fit; *r*<sup>2</sup> = 1 means perfect fit.)

The approximations have given the following amplitudes of vibrations on the 14th day: *y*lin = 0.755 mm according to the linear fit, and *y*exp = 1.356 mm according to the exponential fit. The amplitude *y* = 1.00 mm could be expected for *t* = 18.52 days according to the linear fit and for *t* = 12.46 days according to the exponential approximation.

**Figure 7.** Vibration amplitude as a function of time (an example). Fitting by various regression functions and predictions.

One can see the big differences between the predictions done using different approximations. Caution is necessary especially for longer intervals of predictions and also with respect to the consequences of a wrong prediction.
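The predictions of Example 3 can be repeated with the regression constants quoted in the solution. The following sketch only evaluates the two fitted functions and their inverses; the function names are chosen here for illustration.

```python
import math

# Regression constants quoted in the solution of Example 3
a_lin, b_lin = -0.002667, 0.054121     # y = a + b*t
c_exp, k_exp = 0.085250, 0.19762       # y = c*exp(k*t)

def y_linear(t):
    return a_lin + b_lin * t

def y_exponential(t):
    return c_exp * math.exp(k_exp * t)

def t_linear(y):
    """Time at which the straight line reaches the amplitude y."""
    return (y - a_lin) / b_lin

def t_exponential(y):
    """Time at which the exponential function reaches the amplitude y."""
    return math.log(y / c_exp) / k_exp

print(y_linear(14), y_exponential(14))       # amplitudes on the 14th day, mm
print(t_linear(1.00), t_exponential(1.00))   # days until y = 1.00 mm
```

Although both functions describe the ten measured days well, their extrapolations differ by about six days for the amplitude of 1.00 mm.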

## **Author details**

Jaroslav Menčík

Address all correspondence to: jaroslav.mencik@upce.cz

Department of Mechanics, Materials and Machine Parts, Jan Perner Transport Faculty, University of Pardubice, Czech Republic

## **References**


**Chapter 7**

## **Maintenance**

## Jaroslav Menčík

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/62360

#### **Abstract**

Various approaches to maintenance are explained: maintenance after failure (breakdown maintenance), preventive maintenance, on-condition maintenance, reliability centred maintenance (RCM), the use of technical diagnostics.

**Keywords:** Maintenance, preventive maintenance, on-condition maintenance, reliability centred maintenance (RCM), technical diagnostics

The operational reliability and service life of machines, vehicles, various appliances, bridges, and many other long-life objects are strongly influenced by **maintenance**. This term generally denotes small works for restoring full operability, such as cleaning, exchange of oils and filters, tightening of loosened screws and other adjustments, as well as the repairs of paints or minor faults or the exchange of small damaged parts.

Maintenance is related to the kinds of objects and failures, and its techniques and strategies have been developed along with the development of technology. Historically first was the **maintenance after failure** or **breakdown maintenance**. It was the only common strategy when machines were relatively simple, their number and productivity were low, and the knowledge about their operation and failures was limited. The losses caused by the corresponding downtimes had to be accepted, as no other solution was known. The strategy of repairs and maintenance after failure is also used today, namely in two cases. The first is sudden failures that cannot be predicted, and the second is failures of cheap objects, where a failure has no or only insignificant consequences.

During the first half of the 20th century, substantial changes occurred. Machines, vehicles, and other objects became more complex, with failures from various causes. The number of failures grew, all the more so as some products, such as cars, were produced in large series on production lines. The failure of one machine in such a line stopped the whole line, and the losses caused by this interruption became much bigger. The failure of a complex and expensive object (e.g. a locomotive, a big ship, or an aircraft) also meant big losses. Many of these failures were due to wear or fatigue and occurred after a certain time of operation. Thanks to the understanding of these processes, it became possible to predict approximately the time to failure or to a significant degree of wear in some kinds of items (bearings, cylinders and piston rings, sealing, and valves), or the time to the critical loss of efficiency of oil or air filters, etc. All this has led to a change of the maintenance philosophy. Gradually, the so-called **preventive maintenance** was introduced. Its principle is simple: the endangered parts are repaired or replaced by new ones at fixed regular intervals based mostly on experience. The organization of preventive maintenance is simple and cheap. This approach is suitable for gradual failures (deterioration due to wear, fatigue, corrosion, etc.) and is effective especially if the scatter of the deterioration rate or the time to failure of individual items of a certain kind is low.

In complex systems consisting of many components, failures occur for various reasons and at irregular intervals, and the failures of the system are usually due to causes other than the fatigue of key components. During this useful period of life, the failure rate of the system is approximately constant and the time between failures has an exponential distribution (cf. Chapter 4). Maintenance according to a fixed plan then loses its justification.
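The reason why a fixed plan loses its justification can be illustrated numerically. With a constant failure rate, the probability of surviving a further period does not depend on the age already reached (the memoryless property of the exponential distribution). The following is a minimal sketch; the failure rate, the mission length, and the age of 5000 h are illustrative assumptions, not values from this book.

```python
import math


def survival(t, failure_rate):
    """Probability of surviving longer than t at a constant failure rate."""
    return math.exp(-failure_rate * t)


def conditional_survival(age, extra, failure_rate):
    """P(T > age + extra | T > age) for an exponentially distributed T."""
    return survival(age + extra, failure_rate) / survival(age, failure_rate)


failure_rate = 1.0 / 2000.0  # assumed: on average one failure per 2000 h
mission = 500.0              # assumed length of the next operating period, h

new_item = survival(mission, failure_rate)
old_item = conditional_survival(5000.0, mission, failure_rate)

print(f"new item:              {new_item:.4f}")
print(f"item already 5000 h old: {old_item:.4f}")
```

Both probabilities coincide, so replacing the old item by a new one at a scheduled time would not improve the chance of surviving the next period under this model.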

Another drawback of preventive maintenance is that replacing components at fixed intervals can be too early for some of them, which is uneconomical, and too late for others, which can mean worse efficiency of operation or a higher risk of failure. As the methods for ascertaining the technical condition gradually improved, an approach based on **technical diagnostics** became more common (see also Chapter 8). This kind of maintenance, based on the actual condition of the object, is called **on-condition maintenance**. Monitoring of the technical condition and its development in time is used to determine the optimum time for replacement. Modern cars, rail vehicles, aircraft, and some other items, as well as the pertinent service stations, are equipped with devices enabling the appropriate diagnostics. The diagnosed objects are utilized best, but additional costs for the diagnostics are necessary.

In machinery, chemical, and some other industries, maintenance has become a very important branch. However, the related costs are high, and ways are therefore sought to reduce them. Currently, the so-called **reliability-centred maintenance (RCM)** is being introduced [1]; see also IEC 60300-3-11. The idea is to eliminate all maintenance work that is not necessary. Often, the failure of a non-crucial component does not endanger the operation of the whole object. With the RCM approach, the system is sometimes allowed to remain in operation even after the failure of unimportant components, and their repair and maintenance are done only when the more important parts approach their limit state. Before the RCM strategy is introduced, a thorough analysis is done to reveal the consequences of all possible failures, and the maintenance plan is adjusted accordingly.

Maintenance is a wide topic that goes much beyond this concise book. The reader is referred to the literature, such as [1 - 3].

## **Author details**

Jaroslav Menčík

Address all correspondence to: jaroslav.mencik@upce.cz

Department of Mechanics, Materials and Machine Parts, Jan Perner Transport Faculty, University of Pardubice, Czech Republic

## **References**


## **Chapter 8**

## **Diagnostics**

## Jaroslav Menčík


http://dx.doi.org/10.5772/62361

#### **Abstract**

This chapter explains the tasks of technical diagnostics and basic diagnostic quantities, such as the time of use, time of operation, structural parameters, operational parameters and cost indicators. The main kinds of diagnostics are described, as well as its advantages and drawbacks.

**Keywords:** Technical diagnostics, diagnostic quantities, time of operation, structural parameters, operational parameters, cost indicators, diagnostic systems

The reliability of many objects, machines, transport means, various appliances, or processes can be increased and operational costs can be reduced by the application of diagnostics. This is a branch of technology dealing with the methods and means for ascertaining the condition of investigated objects. Even the "reliability" of humans can be increased and life can be prolonged in some cases if the health condition is diagnosed (e.g. by cancer screening of people reaching a certain age or by measuring specific body parameters of patients in hospitals). In this chapter, only the basic diagnostic terms will be explained briefly, with an emphasis on technical objects. However, the applicability is much more general.

The main tasks of **technical diagnostics** are as follows:


The **technical condition** of an object is the condition characterizing its ability to fulfill the demanded functions (under assumed conditions). It can be described by diagnostic quantities.

The **diagnostic quantity** (or parameter) is a quantity that carries information on technical condition and can be used for diagnostics. Various quantities can serve for diagnosing. The basic kinds of diagnostic quantities are as follows:

**1.** Time of use

It is the total (calendar) time of operation and breaks. This quantity yields some information on corrosion or aging but not on wear. For example, if one wants to buy a second-hand car, the first piece of information one looks at is its age.

**2.** Time of operation

	- **a.** The time necessary for doing some work (the time for which the construction was loaded or working; also the number of loading cycles). This quantity, however, does not sufficiently reflect the variable operating regime and loads. This is no problem for objects with monotonic conditions of operation.
	- **b.** The amount of the performed work (driven distance, amount of consumed fuel, area of the harvested field, number of machined pieces, product of the mileage and the weight of cargo transported by a truck, etc.). The buyer of a second-hand car is also interested in the total mileage the car has driven.

**3.** Structural parameters

Parameters that directly characterize the technical condition or extent of damage of individual components (wear of the cylinders in an engine or of other components, e.g. tires; size of cracks; magnitude of deformations; the clearance between the shaft and bearing; the value of electrical resistance; decrease of thickness of load-carrying parts due to corrosion or wear; and amount and condition of the lubricating oil).

These quantities have very high informative value. A drawback is that they often can be determined only on a disassembled object.

**4.** Operational parameters

They are derived from structural parameters and express the properties of the diagnosed object or illustrate its changed technical condition. Examples are vibrations, temperatures, fuel consumption, noise, and efficiency. They characterize the technical condition directly and can be monitored without disassembling. As such, they are sometimes measured by the user and sometimes during special tests in which the structure is exposed to specific loads (e.g. load tests of a bridge). A drawback is that operational parameters can be measured only on objects capable of operation.

**5.** Cost indicators

The changes of technical condition lead to changes in some components of operational costs (e.g. those for fuel consumption). Therefore, the following quantities are sometimes monitored: total cumulative costs *C*(*t*), average unit costs *C*1(*t*) = *C*(*t*)/*t*, and instantaneous unit costs d*C*1(*t*)/d*t*. They can be used for the determination of the cost-based optimum instant for renewal, as will be described in Chapter 17.

The advantage, as well as disadvantage, of cost indicators is that they depend not only on the technical condition of the object, but also on the current level of prices.
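The use of cost indicators can be sketched with a short calculation. The example below evaluates the cumulative costs *C*(*t*) and the average unit costs *C*1(*t*) = *C*(*t*)/*t* on a grid of times and picks the time with the lowest average unit costs as a candidate instant for renewal. The cost function (purchase price plus operating costs growing with age), all its parameters, and the choice of the minimum of *C*1(*t*) as the criterion are assumptions made for illustration only; the method of this book is given in Chapter 17.

```python
def cumulative_cost(t, purchase=20000.0, base_rate=2.0, growth=0.0005):
    """Hypothetical C(t): purchase price plus operating costs that rise with age.

    t is the time of operation in hours; costs are in arbitrary currency units.
    """
    return purchase + base_rate * t + growth * t * t


def average_unit_cost(t):
    """C1(t) = C(t) / t, the average cost per hour of operation up to time t."""
    return cumulative_cost(t) / t


# Evaluate the indicator on a grid of operating times (every 500 h).
times = range(500, 20001, 500)
best_t = min(times, key=average_unit_cost)

print(f"lowest average unit cost at about t = {best_t} h: "
      f"{average_unit_cost(best_t):.3f} per hour")
```

With real data, *C*(*t*) would come from cost records instead of a formula, and the result would shift with the current level of prices, as noted above.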

### **Application of technical diagnostics**

Diagnostics can be carried out in several ways. The simplest is **manual diagnostics**, either visual or with the use of universal devices (e.g. observation of the tires of a car; inspection of the condition of a certain object, such as a building, bridge, or dam; measurement of deformations, either permanent or under a certain load; strain gauge measurement of stresses; measurement of vibrations or noise). **Semiautomatic diagnostics** needs special devices, but it works according to a fixed regime, which can also be controlled by a technician. **Automatic diagnostics** is performed by a special system, where a computer controls the testing of all important parameters and functions. The system can work in an adaptive manner: it selects (and stores) only the important quantities to be measured, with respect to the course and results of the ongoing diagnostic process.

**Diagnostic systems** can be on-board or station-type. Stationary systems are used for various objects and are either stable (immovable) or mobile. Stable systems are used in service stations for technical control (e.g. analysis of exhaust gases or adjustment of lights in cars). A mobile system is installed in a vehicle (e.g. in a van, a special wagon or another transportable appliance for measurements on railways). For example, aircraft, locomotives or cars today often have sensors installed for the demanded quantities. A measuring vehicle then comes, and its devices are connected with the sensors via plugs.

On-board diagnostic systems are installed in the diagnosed object and measure during its usual operation (e.g. the amount of fuel or temperature of bearings). Often they are focused only on certain parameters (e.g. dynamic response of a wagon bogie during a ride).

#### **Advantages of diagnostics**


#### **Drawbacks of diagnostics**

The use of diagnostics increases the total costs. Diagnostics is useful, but not at any cost. One generally aims at the minimum sum of the costs for diagnostics and the losses caused by insufficient diagnosing, and a compromise is often necessary.

More about diagnostics can be found in the literature, for example [1, 2]. Some books deal with diagnostics from the point of view of data and information processing; others are devoted to the instruments for diagnostics or to the application of diagnostics in special branches, such as machines, electrical appliances or electricity supply, vehicles, and medicine.


## **References**


## **Chapter 9**

## **Failure Analysis**

## Jaroslav Menčík


http://dx.doi.org/10.5772/62362

#### **Abstract**

Two approaches to failure analysis are explained: analysis of individual failures and statistical analysis. Various criteria for failure sorting and classification are presented, as well as the main causes and mechanisms of failures. The text is accompanied by figures with characteristic fracture patterns. The chapter is complemented by an example of computer-aided sorting of failures in railway driving vehicles.

**Keywords:** Analysis, failure, statistics, Pareto diagram, failure mechanism, computer-aided analysis, railway vehicles

The analysis of failures is very important for revealing the cause of a particular failure, for taking appropriate measures to avoid similar failures in the future, and for improving similar products or processes. Historically, failure analysis has also contributed to the creation of new disciplines of mechanics and other branches, and to the better design and reliability of many products.

Two basic kinds of failure analysis can be distinguished: analysis of individual failures and statistical analysis. However, failures can also be sorted according to other criteria.

## **1. Analysis of individual failures**

This kind of analysis aims at finding the cause of particular failures. Generally, it has three sources: appearance of the failed object, history of the accident, and information on the loads, material properties, and conditions of the service or operation.

An observation of the appearance of the failed item can often reveal the fracture origin (Figs. 1 and 2) and the internal cause of the failure (e.g. a material defect). Fracture mechanics can help in the determination of the magnitude of stresses and forces acting at the critical place at the instant of fracture. The appearance of the fracture surface (Fig. 1) and the crack trajectory (Fig. 2) can inform about the time course of the fracture process and also about the characteristic failure mode (brittle fracture or fatigue fracture) and the kinds of acting stresses (e.g. shear stress leading to torsion fracture; Fig. 2g). Some details are visible with the naked eye, and some need an electron microscope. It is important to create detailed photographic documentation or, at least, a thorough description of the failed object.

**Figure 1.** Fracture surface of a steel shaft broken due to fatigue. The arrow indicates the origin of the fatigue process and direction of crack propagation.

Failure analysis uses the time course of the accident and the situation before it, the detailed history of operation, and the conditions of use of the object. The analysis can rely on records from operation (time course of pressures, temperatures, other loads acting on the object, the conditions of the environment and the personnel). The sources are records of measuring devices, logbooks, and protocols from inspections.

Often, a computer analysis of the stresses acting in the object is done, as well as an analysis of material properties, including the mechanical testing of specimens taken from the critical parts (tensile test, fatigue tests, test of notch resistance or fracture toughness, etc.).

Generally, this kind of analysis can result in measures for the prevention of similar failures in similar components or structures in the future. Well known are the extensive (and expensive) analyses done after aircraft accidents, but thorough analyses have become common after every accident with critical consequences.

**Figure 2.** Examples of crack trajectories. *a* – *f*: cracks in glass panes (adapted from [1]). *a* – force acting at the left edge; *b* – pressure acting over the whole area; *c* – like *b*, but more intensive load; *d* – thermal stress; *e* – fracture due to a detonation in a room; *f* – a shot through the pane; *g* – a shaft of brittle material, broken by twisting.

Some rules for failure analysis are also summarized in [1, 2]. Examples of many failures and their analysis can be found in the literature [2 - 6]. Also of interest are the TV series "Seconds from Disaster" and the analyses of aircraft accidents available on YouTube.

## **2. Statistical failure analysis**

This kind of analysis works with a high number of failures of a certain kind (e.g. failures of bridges, gear boxes or buildings, as well as vehicle collisions in the analysis of traffic accidents, interruptions of electricity supply, or deaths in a hospital). It uses the records on accidents and failures and records from service stations or repair workshops. The failures can be sorted according to various criteria, such as the kind of the failed object, the place or time of the occurrence, or the cause.

Statistical analysis needs a high number of values and the knowledge of statistical methods. Software that can sort the data according to various criteria and perform statistical tests is useful. It enables one to distinguish rare failures from systematic ones and helps in finding the common cause of some failures. Compared to the analysis of individual failures, statistical analysis can reveal hidden relations and reasons for some kinds of failures. As a consequence, it can help to introduce system measures for reliability enhancement, such as 100% control of welds or bought components, or the introduction of certain regulations, such as the demand for special qualification for some kinds of work, fire regulations, standards for building of metallic structures, prohibition of building in areas endangered by flooding or avalanches, or prescription of preventive inspections in health care.
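Sorting of failure records according to one or several criteria can be sketched in a few lines. The example below counts failures by the kind of object and by the combination of kind and cause. The records, the field layout, and the helper `count_by` are hypothetical and serve only to illustrate the idea.

```python
from collections import Counter

# Hypothetical failure records: (kind of object, cause, month of occurrence)
records = [
    ("pump", "wear", 1),
    ("pump", "overload", 2),
    ("gearbox", "fatigue", 2),
    ("pump", "wear", 7),
    ("fan", "corrosion", 7),
    ("gearbox", "fatigue", 11),
    ("pump", "wear", 12),
]


def count_by(records, *fields):
    """Count failures grouped by the chosen field indices.

    Field indices: 0 = kind of object, 1 = cause, 2 = month.
    """
    return Counter(tuple(record[i] for i in fields) for record in records)


by_kind = count_by(records, 0)
by_kind_and_cause = count_by(records, 0, 1)

print("by kind:          ", by_kind.most_common())
print("by kind and cause:", by_kind_and_cause.most_common(3))
```

A combination that occurs repeatedly, such as wear of pumps in these invented data, points to a systematic failure rather than a rare one.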

## **3. Sorting of failures**

Failures can be sorted according to various criteria. Understanding the character and cause of a particular failure helps in deciding what means should be used to avoid such failures in the future or to reduce their consequences. In this section, sorting according to the character of failures is shown first, followed by sorting according to various other criteria.

A failure can be:


The inclusion of the investigated failure into the proper group facilitates the selection of an appropriate strategy for avoiding similar failures in the future.

Failures can further be sorted according to


The individual criteria are discussed further in detail.

**Kind of the object** (component or appliance). In civil or mechanical engineering, it is usual to record, study, and analyze the failures of bridges, gear boxes, fans or pumps of certain kind; failures of traction rail vehicles, cars or airplanes of brand "xxx" or type "yyy", failures of brakes, failures of electric or hydraulic appliances, etc. In medicine, one can observe and investigate diseases (or deaths) of children or adults, some kind of disease, etc. In traffic, accidents of certain vehicles, such as buses, motorcycles, or trucks, can be studied. This topic will also be addressed at the end of this chapter.

**Severity and frequency of occurrence**. Failures can be sorted according to their consequences (e.g. "insignificant – minor – medium – serious – critical – fatal"; other classifications are possible as well). The significance of a certain kind of failure can be evaluated similarly to FMEA, as explained in Chapter 12. In addition to the severity of a particular kind of failure, it is also possible to consider its frequency or probability of occurrence ("how often does it happen?"), to assign weights to both, and to form a common criterion "consequences × frequency". Generally, and for civil engineering structures in particular, it is reasonable to distinguish failures of serviceability from failures of load-carrying capacity.
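The common criterion "consequences × frequency" can be illustrated as follows. Each kind of failure gets a severity weight and a frequency weight, and the kinds are ranked by the product of both. The failure kinds and the 1 to 10 weights are invented for this sketch; the weighting scales actually used in FMEA are described in Chapter 12.

```python
# Hypothetical weights on a 1-10 scale: kind of failure -> (severity, frequency)
failure_kinds = {
    "seal leak": (2, 9),
    "bearing seizure": (7, 3),
    "shaft fracture": (10, 1),
    "loose cover": (1, 4),
}


def combined_score(severity, frequency):
    """Common criterion formed as the product 'consequences x frequency'."""
    return severity * frequency


ranked = sorted(
    failure_kinds,
    key=lambda kind: combined_score(*failure_kinds[kind]),
    reverse=True,
)

for kind in ranked:
    severity, frequency = failure_kinds[kind]
    print(f"{kind:16s} severity={severity:2d} frequency={frequency:2d} "
          f"score={combined_score(severity, frequency):3d}")
```

In these data, the most severe failure (shaft fracture) is not the one with the highest combined score, because it is rare. Whether such a ranking is acceptable for failures with fatal consequences is a separate decision.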

**Time of operation until failure**. This quantity can be measured in hours or seconds, kilometers, number of pieces processed until the cutting tool becomes blunt, etc. The distribution of a large amount of these values, plotted along the time axis, enables one to distinguish early failures and wear-out failures, etc. It is also important for the determination of basic reliability characteristics, such as MTTF and MTBF, or for planning maintenance and renewal.
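A minimal sketch of such an evaluation is given below. It estimates the MTTF as the arithmetic mean of observed times to failure and groups the values into intervals along the time axis. The data, the interval width of 1000 h, and the 500 h threshold for "early" failures are assumptions for illustration; censored data (items that have not failed yet) are not considered here.

```python
import statistics
from collections import Counter

# Hypothetical times to failure of eight non-repaired items, in hours
times_to_failure_h = [120.0, 950.0, 1430.0, 2210.0, 2875.0, 3105.0, 3340.0, 3620.0]

# Point estimate of the mean time to failure (all items have failed)
mttf = statistics.mean(times_to_failure_h)

# Simple histogram along the time axis: number of failures per 1000 h interval
bin_width = 1000.0
histogram = Counter(int(t // bin_width) for t in times_to_failure_h)

# Failures before an assumed threshold are flagged as candidates for early failures
early = [t for t in times_to_failure_h if t < 500.0]

print(f"MTTF estimate: {mttf:.1f} h")
for k in sorted(histogram):
    print(f"{int(k * bin_width):5d}-{int((k + 1) * bin_width):5d} h: {'#' * histogram[k]}")
print("possible early failures:", early)
```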

**Time of occurrence**. The occurrence of failures can sometimes depend on time [e.g. on the time of day (hour), the season of the year (influence of weather), or even the day of the week]. For example, many years ago, the ironic term "Monday car" was used in the United States because of a much higher failure rate of cars assembled on that day, perhaps the aftermath of the weekend.

**Stage in the lifetime of the object**. Generally, three stages exist, where failure occurs or can be initiated:


The individual stages can become sources of different causes and kinds of failures (e.g. "child diseases" appearing soon after putting the object into operation or failures due to wear and aging after a long time of service). The knowledge of the typical features of failures in the individual stages helps in the choice of a proper strategy for improvement.

**Place of origin**. Three examples can be given: the fifth mould in a multisectional machine for glass bottle-making, various parts of a road in the case of traffic accidents, or the operator who has the highest rate of failures. The knowledge of the place where the failures occur most often helps one to better identify the reason for the failures. Also, it reduces the time necessary for repair.

**Failure cause**. Basically, the cause of a failure can be internal or external. An **internal cause** means that the component was "weak" for the assumed load. Such failures can be avoided by better design, dimensioning, or manufacture. Failures due to an **external cause** are those caused by overloading, collision with another object, or another failure. Mitigating them efficiently requires knowledge of the failure cause in the particular case.

#### **Principal causes of failures**

- Ignorance, insufficient knowledge;
- Negligence, disorderliness, laziness, carelessness;
- Errors, inattention, absent-mindedness, poor mental condition;
- Unsubstantiated reliance on other people;
- Excessive thriftiness, greed;
- Malicious intention.

#### **Mechanisms and causes of mechanical failures**

- ductile (with well observable permanent deformations; it occurs due to the overloading of components from tough materials);
- brittle (without observable permanent deformations; it occurs in brittle materials, at low temperatures, dynamic load, impact, notches, cracks);
- fatigue [with typical appearance (Fig. 1); it occurs under harmonic or periodic loading or even under constant load];

Other criteria for sorting can also be used. The high number of available failure data enables their sorting and analysis according to several criteria simultaneously. Such analysis can reveal relationships and influences unknown as yet.

## **4. Classification of failures**

Especially two aspects of failures are important: severity and frequency or probability of occurrence. In the case of a high number of various kinds of failures, a **Pareto analysis** is very informative. In this analysis, kinds of failures are rank-ordered according to the frequency of occurrence. The pertinent histogram shows at first sight the failures that occur most often and the rare ones (Fig. 3). The typical shape of a Pareto diagram has led to the saying "20% of all causes are responsible for 80% of all troubles, and 80% of causes are responsible for 20% of problems". However, this is only a saying and not a law of nature.

**Figure 3.** Pareto diagram (an example).
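The rank-ordering behind a Pareto diagram can be reproduced with a short script. The sketch below counts the failures of each kind, orders the kinds by frequency, and adds the cumulative percentage. The failure kinds and their counts are invented for illustration and do not correspond to Fig. 3.

```python
from collections import Counter

# Hypothetical list of observed failures, one entry per failure
observed = (
    ["wear"] * 42 + ["fatigue"] * 25 + ["corrosion"] * 15
    + ["overload"] * 10 + ["assembly error"] * 5 + ["other"] * 3
)

counts = Counter(observed)
total = sum(counts.values())

# Rank-order the kinds by frequency and accumulate the percentage
pareto = []
cumulative = 0
for kind, n in counts.most_common():
    cumulative += n
    pareto.append((kind, n, 100.0 * cumulative / total))

for kind, n, cum_percent in pareto:
    print(f"{kind:15s} {n:3d} {'#' * n:42s} {cum_percent:5.1f} %")
```

In these data, the three most frequent kinds out of six account for 82% of all failures, which is the typical pattern the saying refers to.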

Also, the knowledge of relative frequencies of occurrence, corresponding to probabilities, is important. One should keep in mind that a product with 100 failures per 1 million pieces (0.01%) is much more reliable than another product with "only" 10 failures, but per 1000 pieces (1%).

The significance of failures can be evaluated according to the consequences and to the frequency of occurrence. The overall importance is evaluated with respect to both criteria, as described in the "Severity and frequency of occurrence" paragraph and in Chapter 12.

## **5. Computer-aided failure analysis and record keeping**

Many items today are very complex and can fail for various reasons. Manufacturers or users of products such as cars, locomotives, pumps, etc., often produce or operate many pieces, so that the number of various failures can be very high. A consistent reliability analysis needs a system for the classification of failures (and also a system for recording times between failures and times to repair, a system for data collection, and tools for statistical data analysis). Here, a simple system for the classification of failures in railway driving vehicles [7] will be shown as an example. This system classifies the failures with respect to the (1) kind of vehicle, (2) structural group in the vehicle, (3) subassemblies, and (4) specification of the failure. Each of these four categories is again divided into several subcategories. The code for each failure is thus a four-digit number of the form ABCD, whose digits specify the subcategory within each category, for example:


...

7 - Diesel locomotives


...

C. Subassemblies

The number here characterizes the specific properties of individual groups, typical for some kinds of vehicles, purpose, and kind of use (feeding, power transmission, shifting, etc.). The number of subassemblies can be different in individual groups.


For example, the code 2312 in the above "ABCD" system means electric locomotive for alternating current (2), failure of mechanical equipment (3) at traction part (1), and the pertinent component had to be replaced by a new one (2).
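Decoding of such a code can be sketched as a table lookup. The tables below contain only the few entries that are mentioned in the text above; all other digits are reported as unlisted, because the complete code lists of the system [7] are not reproduced here. The function name `decode` and the structure of the tables are assumptions made for this illustration. The table for C is nested under the digit B, since the subassemblies differ between structural groups.

```python
# Only the entries quoted in the text; the full code lists are not reproduced.
KIND_OF_VEHICLE = {          # digit A
    "2": "electric locomotive for alternating current",
    "7": "diesel locomotive",
}
STRUCTURAL_GROUP = {         # digit B
    "3": "mechanical equipment",
}
SUBASSEMBLY = {              # digit C, depends on the structural group B
    "3": {"1": "traction part"},
}
SPECIFICATION = {            # digit D
    "2": "component replaced by a new one",
}


def decode(code):
    """Translate a four-digit failure code 'ABCD' into plain text."""
    if len(code) != 4 or not code.isdecimal():
        raise ValueError("the code must consist of exactly four digits")
    a, b, c, d = code

    def lookup(table, digit):
        return table.get(digit, f"unlisted ({digit})")

    return {
        "A": lookup(KIND_OF_VEHICLE, a),
        "B": lookup(STRUCTURAL_GROUP, b),
        "C": lookup(SUBASSEMBLY.get(b, {}), c),
        "D": lookup(SPECIFICATION, d),
    }


for category, meaning in decode("2312").items():
    print(category, "-", meaning)
```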

Various systems exist (also as a part of maintenance management systems) or can be created according to the specific requirements of the user (e.g. a system for recording and classifying failures combined with tools for cost analysis).


## **References**


## **Chapter 10**

## **Approaches to Ensuring the Reliability and Safety**

## Jaroslav Menčík


http://dx.doi.org/10.5772/62363

#### **Abstract**

This chapter presents various ways of ensuring reliability. The fail-safe, safe-life and damage-tolerant philosophies are explained briefly, as well as the deterministic and probabilistic approaches. Then, the important methods are explained, such as allowable stress, use of standards, load and resistance factor design, probabilistic approach and proof testing.

**Keywords:** Reliability, safety, design, fail-safe, safe-life, damage-tolerant design, allowable stress, codes, load and resistance factor design, proof testing, probability

From a reliability point of view, every technical object can be either in a serviceable state or in a failed state. The boundary between both is the **limit state**. Some objects fail suddenly. The condition of other objects changes gradually (e.g. due to wear or corrosion). They are able to fulfill their purpose for a long time, though to a limited extent (worse technical parameters or lower safety); the failure is partial. However, if a certain parameter exceeds a specified **limit value**, the object either becomes destroyed or unfit for further use; the failure is complete.

In civil and mechanical engineering, two kinds of limit states are distinguished: limit state of load-carrying capacity and that of serviceability (usability). Exceeding the limit state of load-carrying capacity leads to the destruction of the object, often with fatal consequences. If the limit state of usability is exceeded (e.g. large deformations), the object cannot fulfill its function properly, but the consequences are not fatal. Correspondingly, the demanded degrees of safety of an object can differ depending on the kind of the limit state and the consequences of exceeding it.


The boundary between serviceable and failed state can be described by one number (e.g. the stress value, whose exceeding means fracture of the component) or by some analytical expression (e.g. the relationship between the critical load for buckling of a compressed column and its slenderness). Also, the condition of the fracture of a shaft loaded simultaneously by twisting and bending depends on the ratio of both load components; see also Figure 1.

The **basic condition of reliability and safety** says: "The resistance to load effects must be higher than these effects".

## **1. Basic philosophies for ensuring safety in the design stage**

The two most common approaches are (1) Fail-safe and (2) Safe-life. A special case of the safe-life approach is the (3) Damage-tolerant design.

The **Fail-safe approach** accepts that important parts of the object can fail and tries to ensure that such a failure will not be fatal for the whole object. This is mostly achieved using redundant components or circuits. **Redundancy** can be **active**, with all parts loaded or working simultaneously, or **standby**, where the redundant component is switched on only if the principal component has failed; see also Chapter 5.
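The benefit of active redundancy can be illustrated with the standard formula for a parallel arrangement. If the object works as long as at least one of *n* components works, and the components fail independently of one another, the reliability is 1 − (1 − *R*)^*n*, where *R* is the reliability of one component. The independence of failures and the numerical value *R* = 0.90 are assumptions of this sketch; standby redundancy needs a different model and is not covered here.

```python
def active_redundancy_reliability(component_reliability, n_parallel):
    """Reliability of n independent parallel components (one working suffices)."""
    if not 0.0 <= component_reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    if n_parallel < 1:
        raise ValueError("at least one component is needed")
    return 1.0 - (1.0 - component_reliability) ** n_parallel


single = active_redundancy_reliability(0.90, 1)
double = active_redundancy_reliability(0.90, 2)
triple = active_redundancy_reliability(0.90, 3)

print(f"1 component:  {single:.3f}")
print(f"2 components: {double:.3f}")
print(f"3 components: {triple:.3f}")
```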

The **Safe-life approach** tries to do everything to ensure that the component or object can sustain all expectable loads during the assumed life. Basically, it means sufficient dimensioning (the knowledge of all possible loads is important) and the use of materials that do not deteriorate or whose rate of degradation is acceptably low. The dimensioning for "infinite" life of the items exposed to fatigue or ample dimensioning of parts exposed to creep also belongs here.

In contrast to the previous case, which assumed a "perfect" object at the beginning of service, the **damage-tolerant approach** assumes that the component contains some defects (e.g. cracks), which will gradually grow during the operation. The components are dimensioned so that these defects cannot attain critical size during the expected lifetime. The knowledge of the defect growth velocity as a function of load is necessary. This approach uses fracture mechanics, but it is similar to the safe-life approach.

The determination of the time to failure, or the dimensioning of a component for the demanded life, was explained in Chapter 6.

## **2. Deterministic versus probabilistic approach**

The procedures for the design and check of reliability depend on whether random influences are considered. Some quantities can be considered as deterministic (e.g. the number of teeth in gears or the distance of bearings in a gearbox). Other quantities, such as the strength of material, loads, or action of environment (e.g. wind velocity), have random character, with values varying in some intervals. There are also other sources of uncertainties (e.g. the computational methods and models characterizing the limit state). Historically, various approaches for ensuring reliability have been developed. They can be divided into two groups: deterministic and probabilistic.

In the deterministic approach, every quantity (load, strength, etc.) is described by one number of constant value. The design in this case is simple, as well as the check of safety, as will be shown in this section.

The probabilistic approach is based on the fact that some quantities (e.g. load or strength) vary due to random reasons and cannot be sufficiently described by one number only. This approach works with the probability distributions of the pertinent quantities and determines the probability of failure or the probability of exceeding the allowable values. The component or structure is considered safe (or reliable) if the probability of failure is lower than a certain allowable value.
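The difference from the deterministic check can be illustrated on the simplest case of one strength and one load effect. The sketch below assumes that both are independent and normally distributed, which allows the probability of failure (load effect exceeding strength) to be computed in closed form, and compares the result with a simple Monte Carlo simulation. The distribution types, their parameters, and the allowable probability are illustrative assumptions; the methods used in this book are described in Part 2.

```python
import math
import random
from statistics import NormalDist

# Assumed distributions (MPa): strength and load effect, independent and normal
STRENGTH_MEAN, STRENGTH_SD = 400.0, 30.0
LOAD_MEAN, LOAD_SD = 280.0, 25.0

# The safety margin (strength - load) of independent normal variables is normal.
margin = NormalDist(
    mu=STRENGTH_MEAN - LOAD_MEAN,
    sigma=math.hypot(STRENGTH_SD, LOAD_SD),
)
p_fail_exact = margin.cdf(0.0)   # probability that the margin is negative

# Monte Carlo check: count trials in which the load exceeds the strength.
rng = random.Random(1)           # fixed seed for a repeatable run
n_trials = 200_000
failures = sum(
    rng.gauss(STRENGTH_MEAN, STRENGTH_SD) < rng.gauss(LOAD_MEAN, LOAD_SD)
    for _ in range(n_trials)
)
p_fail_mc = failures / n_trials

p_allow = 1e-4                   # assumed allowable probability of failure
is_safe = p_fail_exact < p_allow

print(f"closed form: {p_fail_exact:.2e}")
print(f"Monte Carlo: {p_fail_mc:.2e}")
print("acceptable" if is_safe else "not acceptable for the assumed limit")
```

A deterministic check with the mean values alone (280 MPa against 400 MPa) would not reveal that the probability of failure exceeds the assumed limit.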

The probabilistic approach can give more accurate answers but is more demanding than the deterministic approach. It needs a basic knowledge of the probability theory, some computer tools for the work with random quantities (even Excel is sufficient in simple cases), and, of course, the knowledge of probability distributions of the pertinent random variables. If their types and parameters or histograms are not known, this kind of analysis cannot be made. One must also be sure that the statistical characteristics (of the materials or parts) used in the design will correspond to reality. The probabilistic methods for reliability assessment are described in Part 2 of this book.

## **3. Design using allowable stress**

This is a traditional approach in mechanical engineering. The safety condition is

"The maximum operating stress must not exceed the allowable stress".

The allowable stress is obtained by dividing the nominal strength of the material *σ*n,s (ultimate strength or yield stress; do not confuse it with standard deviation) by the so-called **factor of safety** *k*S:

$$
\sigma_{\text{allow}} = \sigma_{\text{n,s}} / k_{\text{S}}. \tag{1}
$$

The meaning of this factor could be interpreted roughly as "failure would occur at *k*S-times higher stress". However, the situation is more complex. The value of the safety factor is chosen to "cover" all uncertainties related to the material, the component, and the conditions of operation. Therefore, this approach is also somehow related to probability, but only very loosely. For example, the allowable stress is a value such that practically all pieces of this material will be stronger. For metallic parts, *σ*allow is calculated so that the "minimum" strength, given in material data sheets as the 5% quantile (i.e. corresponding to 5% probability that weaker pieces can appear), is divided by a factor of safety, which is chosen, in accordance with the years of experience, so that the probability that *σ*allow will be lower than the maximum acting stress is negligibly low.
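The check according to Eq. (1) can be written in a few lines. The nominal strength of 355 MPa, the factor of safety of 2.5, and the maximum operating stress of 120 MPa are assumed values chosen only to make the sketch concrete.

```python
def allowable_stress(nominal_strength, safety_factor):
    """Allowable stress according to Eq. (1): nominal strength / factor of safety."""
    if safety_factor < 1.0:
        raise ValueError("the factor of safety should not be lower than 1")
    return nominal_strength / safety_factor


sigma_ns = 355.0    # assumed nominal (yield) strength, MPa
k_s = 2.5           # assumed factor of safety
sigma_max = 120.0   # assumed maximum operating stress, MPa

sigma_allow = allowable_stress(sigma_ns, k_s)
is_safe = sigma_max <= sigma_allow   # the safety condition of this section

print(f"allowable stress: {sigma_allow:.1f} MPa")
print("condition satisfied" if is_safe else "condition violated")
```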

The factor of safety *k*S is a number usually between 1 and 3 and, in some cases even more. Generally, the higher the uncertainties, the higher the *k*S. Its values are based on experience; they can be found in the literature; manufacturers use their own well-proven values. The safety factor is often given as a single number (e.g. 2.5), but sometimes it is calculated as the product of several partial coefficients; for example,

$$k_{\text{S}} = s_1 \times s_2 \times s_3 \times s_4 \times s_5 \times s_6, \tag{2}$$


where *s*<sub>1</sub> characterizes the importance of the component (low or high), *s*<sub>2</sub> the technology of manufacturing, *s*<sub>3</sub> the material testing, *s*<sub>4</sub> the way of strength calculations, *s*<sub>5</sub> the quality of manufacturing (e.g. casting or machining by turning or grinding), and *s*<sub>6</sub> the possible overloading; *s*<sub>j</sub> is closer to 1 for smaller uncertainty in the *j*-th factor. This approach enables the consideration of the variability and level of knowledge about the individual factors.
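Equations (1) and (2) can be combined in a few lines of code. The following is only a minimal sketch: the six partial coefficients and the nominal yield stress are hypothetical numbers chosen for illustration, not values taken from any standard or from this book.

```python
from math import prod

def safety_factor(partial_coefficients):
    """Overall factor of safety k_S as the product of partial coefficients, Eq. (2)."""
    return prod(partial_coefficients)

def allowable_stress(nominal_strength, k_s):
    """Allowable stress according to Eq. (1): sigma_allow = sigma_n,s / k_S."""
    return nominal_strength / k_s

# Hypothetical partial coefficients s1..s6 (illustrative assumptions only)
partials = [1.2, 1.1, 1.05, 1.1, 1.15, 1.2]
k_s = safety_factor(partials)

# Hypothetical nominal yield stress in MPa (assumption)
sigma_allow = allowable_stress(355.0, k_s)

print(f"k_S = {k_s:.3f}, allowable stress = {sigma_allow:.1f} MPa")
```

Even moderate partial coefficients multiply to an overall factor above 2 here, which shows why the individual values should be chosen with care.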

Two values of safety factor should be distinguished. The first is the demanded or target value, which is used in the design stage. The other is the value corresponding to the actual situation. Sometimes, the dimensions of the cross-section (e.g. the wall thickness) should satisfy various criteria, and a criterion other than strength can be decisive (e.g. thermal resistance). In such a case, the wall will be thicker, so that the actual safety against overloading will also be higher than that originally demanded in design.

**Figure 1.** Limit curve separating safe and failure regions. The failure depends on two quantities, *X*<sub>1</sub> and *X*<sub>2</sub>. Curves a, b, c show various paths of overloading.

When the safety against overload is to be determined, one should consider the actual "path" of overloading leading to the collapse (Fig. 1). Sometimes, several loads act simultaneously, but some of them are constant, such as the dead weight of a bridge, and only some can increase, for example the traffic load. The actual safety against overloading by traffic is here obtained as the ratio of the actual traffic load at the instant of collapse to the nominal traffic load. These issues are important in dimensioning. A misunderstanding of the term "safety" can lead to unnecessarily high costs.
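The difference between the two notions of safety can be illustrated numerically. The sketch below assumes, purely for illustration, that collapse occurs when the sum of the dead load and the traffic load reaches the load-carrying capacity (a straight limit line in the sense of Fig. 1); all numbers are hypothetical.

```python
def overall_safety(capacity, dead_load, traffic_load):
    """Safety factor if all loads were increased proportionally."""
    return capacity / (dead_load + traffic_load)

def traffic_safety(capacity, dead_load, traffic_load):
    """Safety against traffic overload: the dead load stays constant,
    only the traffic load grows until collapse."""
    traffic_at_collapse = capacity - dead_load
    return traffic_at_collapse / traffic_load

# Hypothetical values in arbitrary but consistent load units
capacity, dead, traffic = 1000.0, 600.0, 200.0

k_all = overall_safety(capacity, dead, traffic)
k_traffic = traffic_safety(capacity, dead, traffic)
print(f"proportional overloading: {k_all:.2f}, traffic only: {k_traffic:.2f}")
```

With these numbers, the structure tolerates twice the nominal traffic load although the factor for proportional overloading is only 1.25, so the path of overloading matters.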

## **4. Design according to standards**

Standards are often used for the design and dimensioning of bridges, cranes, pressure vessels, and many metal or other important constructions. Also, vehicles, aircraft, and electric appliances are often designed using various standards, such as ISO, ASME, or Eurocodes; see Appendix 2. In some cases, their use is compulsory, but sometimes it is only a matter of agreement between the manufacturer and the customer. The advantage of standards is that they are usually created as a result of cooperation of many specialists, often from various countries, and are based on a thorough analysis and experimental verification. They are updated from time to time to reflect the progress in the state of knowledge. Generally, standards represent an efficient way of obtaining safe items and constructions. The advantage for the design engineer, manufacturer, or builder is that the design and calculations according to proven formulas and procedures in standards are straightforward and simple. Moreover, if the structure fails and the designers or builders can prove that they have done everything according to the standards, they cannot be prosecuted. On the other hand, the design according to codes is somewhat conservative, and the standards do not solve all eventualities (e.g. certain combinations of loads). In such cases, other approaches can be more appropriate.

## **5. Load and Resistance Factor Design (LRFD)**

This approach is common in the design of civil engineering structures, where it is used in some standards, e.g. for steel constructions [1]. The safety condition, in general, is

"The design value of load effect must not exceed the design value of the resistance."

The term "design value" means the value assumed in design (e.g. recommended or prescribed in a standard), because the actual values are not known exactly yet.

For a component or structure, this condition can be written as

$$
\gamma_n S_d \le R_d; \tag{3}
$$

*S*d is the effect of maximum load, *R*d is the resistance (e.g. the load-carrying capacity or the allowable deformation), and *γ*n is the factor characterizing the purpose of the object. The subscript **d** means "design" and denotes the value considered in design; the pertinent standards usually show how the design value is related to the mean or nominal value.

In this approach, the uncertainties are divided into two groups: those related to the load and those related to the material or components. The design value of the load effect *F*d is obtained as the product of characteristic load *F*k and partial safety factor *γ*F for the load,

$$F_d = \gamma_F F_k. \tag{4}$$

The design value of the resistance (or strength) *f*d is obtained as the characteristic value of strength *f*k divided by the partial factor of reliability of the material *γ*m; for example

$$f_d = f_k / \gamma_m; \tag{5}$$

*f*<sub>k</sub> can be the characteristic value of either the yield strength or the ultimate strength (e.g. the 5% quantile). The characteristic values and the partial safety factors are given in the pertinent standards (e.g. [1]).

As we can see, the actual values of loads and properties are replaced in the LRFD approach by the design values given in codes. This approach is reasonably conservative, and the procedures are arranged so that they enable fast checking in standard cases.
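A check according to Equations (3) to (5) can be sketched as follows. The example is a bar loaded in tension; the characteristic load, the characteristic yield strength, the partial factors, and the cross-section areas are hypothetical values chosen only for illustration and should not be read as values prescribed by any code.

```python
def design_load(F_k, gamma_F):
    """Design value of the load, Eq. (4): F_d = gamma_F * F_k."""
    return gamma_F * F_k

def design_strength(f_k, gamma_m):
    """Design value of the strength, Eq. (5): f_d = f_k / gamma_m."""
    return f_k / gamma_m

def lrfd_ok(S_d, R_d, gamma_n=1.0):
    """Safety condition, Eq. (3): gamma_n * S_d <= R_d."""
    return gamma_n * S_d <= R_d

# Hypothetical input data (assumptions)
F_d = design_load(F_k=100e3, gamma_F=1.5)        # design load in N
R_d = design_strength(f_k=235.0, gamma_m=1.15)   # design strength in MPa

area_mm2 = 800.0
S_d = F_d / area_mm2                             # load effect: tensile stress in MPa
ok = lrfd_ok(S_d, R_d)

print(f"S_d = {S_d:.1f} MPa, R_d = {R_d:.1f} MPa, condition satisfied: {ok}")
```

Here the load effect is expressed as a stress so that it can be compared directly with the design strength; with a smaller cross-section (e.g. 700 mm²) the same condition would no longer be satisfied.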

Note: Load and resistance are also used in the determination of failure probability in the so-called load-resistance interference method, as described in Chapter 14.

## **6. Probabilistic approach**

If a probabilistic approach to reliability assessment is used, only general recommendations for allowable probabilities can usually be found instead of definite obligatory values. Generally, the allowable probability of a failure should be related closely to its consequences. Some idea about these probabilities can be obtained from two examples. The first one is from aviation technology. Usually, 1:10,000 is the acceptable probability of critical failure for an aircraft at the end of its service life, just before decommissioning. The acceptable probability of failure at the time it is put into service must be several orders of magnitude lower.

The second example is the Eurocode for metal constructions [1], which gives the following design probabilities of failure for the limit states of load-carrying capacity and usability. They are differentiated according to the assumed level of reliability or safety, as given in Table 1.

**Table 1.** Recommended design probabilities of failure *P*<sub>f</sub> [1]

Note: The unrounded numbers in Table 1 look rather strange. The reason is that the reliability index *β* (see Chapter 14) was used originally instead of probabilities, and the relationship between *β* and *P*<sub>f</sub> is nonlinear (e.g. *P*<sub>f</sub> = 23×10<sup>–3</sup> corresponds to the reliability index *β* = 2.00; see the distribution function of the standard normal distribution).

The above numbers correspond to the reliability of the whole object. If it consists of many components, the failure probabilities of single elements must be appropriately lower (see Chapter 6), of the order 10<sup>–5</sup> to 10<sup>–10</sup>. Similarly, if reliability is assessed via failure rate, the allowable failure rates of elements must be very low. In such cases, specific problems arise in design. First, it can be difficult to prove very high reliability of the pertinent item, because the number of tested samples must be high and the duration of the tests is very long. For example, the failure rate of 10<sup>–6</sup> h<sup>–1</sup> could be (roughly) verified in a test with one component tested for 10<sup>6</sup> h or with 1,000,000 components tested for 1 h. Neither of these cases is practicable, and a compromise must be found. Another example is related to the guaranteed strength. A value of the 0.001% quantile of strength determined from only three tests does not inspire great confidence. Many more tests would be better. However, this would also cost much more money. The big manufacturers of standard electric and electronic components can make extensive tests and use sophisticated techniques for testing and processing the results, as will be mentioned in Chapter 20, so that their data are trustworthy. Often, however, the means for testing are much more limited, and the predictions are less safe.

Usually, the allowable reliability is a compromise between the demands for high reliability and the money available for ensuring it. In some cases, the optimum reliability can be found from the condition of minimum total costs consisting of the purchase costs and the costs caused by failure. This topic is treated in Chapter 17. However, this approach cannot be used if very high reliability or safety is demanded. In some cases, the so-called **ALARP** philosophy is used, demanding that the risk should be "as low as reasonably practicable".
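Two of the relations used above can be checked with a short script. The first function evaluates the failure probability belonging to a reliability index via the standard normal distribution function, *P*<sub>f</sub> = Φ(–*β*). The second shows how low the failure probability of a single element must be for a given system target; it assumes, as a simplification made here, that the object is a series system of *n* identical, independent elements. The system target and the number of elements are hypothetical.

```python
import math

def pf_from_beta(beta):
    """Failure probability for reliability index beta: P_f = Phi(-beta),
    where Phi is the standard normal distribution function."""
    return 0.5 * math.erfc(beta / math.sqrt(2.0))

def element_pf_for_system(pf_system, n_elements):
    """Allowable failure probability p of one element, assuming a series
    system of n independent identical elements: 1 - (1 - p)**n = pf_system."""
    return 1.0 - (1.0 - pf_system) ** (1.0 / n_elements)

pf_beta_2 = pf_from_beta(2.0)

# Hypothetical target: system failure probability 1e-4 with 100 elements
p_element = element_pf_for_system(1e-4, 100)

print(f"P_f for beta = 2.00: {pf_beta_2:.5f}")
print(f"allowable element failure probability: {p_element:.3e}")
```

For small probabilities, the allowable element value is close to the system value divided by the number of elements, which explains the very low orders of magnitude quoted above.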

Generally, the demands for increasing reliability should not be unrealistically high. Useful information on the actual situation can be obtained from the statistics of failures. If the current probability of failure of a certain item is 1:10<sup>2</sup>, it will be easier to reduce it to 1:10<sup>4</sup> than to 1:10<sup>8</sup>.

In some cases, quite different approaches are used for guaranteeing very high reliability. Sometimes, certain technologies or activities are prescribed or, vice versa, forbidden by law (e.g. building family houses in areas endangered by flooding or avalanches). Another means is proof testing, as explained in the following section.

## **7. Proof testing**

In these tests, all components are exposed to a certain overload, so high that the "weak" parts are destroyed during the test. Components made of brittle materials are destroyed completely, whereas ductile parts are often only permanently deformed. (For example, the overpressure tests common for pressure vessels also belong to proof tests.) Basically, it is sufficient if the proof-test stress or load is equal to a certain value higher than the maximum load expectable in service. In some cases, proof tests are also used for ensuring a sufficient life of components made of brittle materials exposed to static fatigue due to the corrosive action of the environment. This fatigue causes very slow growth of preexisting minute cracks until they reach the critical size. The methods of fracture mechanics, together with the knowledge of the velocity of subcritical crack growth under stress, enable the calculation of the proof-test stress necessary to guarantee a sufficient life of the parts that have passed the test. Such an approach was used, for example, in the design of glass windows in the American orbital laboratory Skylab [2, 3]. Theoretical foundations of proof testing are explained in detail in [2, 3] and [4].

## **Author details**

Jaroslav Menčík

Address all correspondence to: jaroslav.mencik@upce.cz

Department of Mechanics, Materials and Machine Parts, Jan Perner Transport Faculty, University of Pardubice, Czech Republic

## **References**


**Advanced Reliability**

## **Chapter 11**

## **Weibull Distribution**

## Jaroslav Menčík

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/62375

#### **Abstract**

Weibull distribution is very flexible in fitting empirical data, such as strength or time to failure. Several methods for the determination of parameters are described, including direct fitting using solvers available in universal programs. Also finding of parameters of exponential distribution is described. The use of Weibull distribution is illustrated on examples.

**Keywords:** Probability, reliability, Weibull distribution, exponential distribution, determination of parameters, least squares method, solver

A special position in reliability assessment pertains to Weibull distribution, which offers great flexibility in fitting empirical data. The distribution function (Fig. 1a) is

$$F(t) \ = 1 \ - \ \exp\left(-\left[\left(t - t\_0\right) / a\right]^b\right) \tag{1}$$

with parameters *a*, *b*, and *t*<sub>0</sub>. The scale parameter *a* is related to the values of *t* and ensures that the distribution is independent of the units of *t* (e.g. minutes or hours). The constant *b* is the shape parameter. Depending on its value, Weibull distribution can approximate various, even very different shapes (Fig. 5 in Chapter 2). It is suitable for the characterization of time to failure as well as strength or load; therefore, it became popular in reliability assessment. The constant *t*<sub>0</sub> is the threshold value that corresponds to the minimum possible value and characterizes the position of the distribution on the *t*-axis. (*t* is the usual symbol for time; for other quantities, other symbols may be used.)

© 2016 The Author(s). Licensee InTech. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

**Figure 1.** Weibull distribution function *F*(*t*): (a) original coordinate system, (b) transformed coordinates (Weibull probabilistic paper).

## **1. Determination of parameters in a two-parameter distribution**

The strength or time to failure cannot attain negative values, so that the threshold parameter is often assumed zero, *t*<sub>0</sub> = 0. The distribution function (1) will thus have only two parameters:

$$F(t) = 1 - \exp\left[-\left(t/a\right)^b\right].\tag{2}$$

Parameters *a* and *b* can be found easily, as the transformed data can be fitted by a straight line. Double logarithmic transformation and rearrangement change Equation (2) to

$$\ln t = \ln a + \left(1/b\right)\ln\left\{\ln\left[1/\left(1-F\right)\right]\right\},\tag{3}$$

which corresponds to the equation of straight line (Fig. 1b)


$$Y = A + BX,\tag{4}$$

where *Y* = ln *t*, *X* = ln{ln[1/(1 – *F*)]}, *A* = ln *a*, *B* = 1/*b*.

The method of linearization was very popular in the past, and it is still often used for the determination of parameters from operation data via a special diagram, called Weibull paper (Fig. 1b). For its construction, the individual measured values *t*<sub>j</sub> and the corresponding values *F*<sub>j</sub> of the empirical distribution function are needed. The *t*<sub>j</sub> values are obtained by rank-ordering the *n* data from operation (e.g. times to failure) from the minimal value (*j* = 1) to the maximal one (*j* = *n*). The corresponding values of the distribution function are calculated as

$$F_j = j / (n+1); \tag{5}$$

*j* is the rank number and *n* is the total number of measured values. The explanation of formula (5), common for order statistics, is simple. If we have, say, 100 values and order them from the minimal to the maximal, then the probability *F* that *t* will be smaller than or equal to the lowest of the 100 values, *t*<sub>1</sub>, is 1/100. The probability of *t* ≤ *t*<sub>2</sub> is 2/100, etc.; generally, *F*<sub>j</sub> = *j*/*n*. In Equation (5), 1 was added to the denominator for mathematical correctness: the probability *F* that *t* will be smaller than or equal to *t*<sub>n</sub> must be smaller than 1, simply because values higher than *t*<sub>n</sub> could appear if more measurements were made. Other formulas also exist for the calculation of the empirical *F*<sub>j</sub> values [e.g. *F*<sub>j</sub> = (*j* – ½)/*n*], but none can be recommended unequivocally, especially considering that bigger errors in the determination of distribution parameters can arise from the small amount of data than from the formula used for *F*<sub>j</sub>.

The regression constants *A*, *B* can be obtained by fitting the empirical data by a straight line (using Weibull paper or a program for curve fitting, such as "Insert Trendline" in Excel). Then, the constants in the distribution function (2) are obtained from *A* and *B* by inverse transformation:

$$b = 1/B, \quad a = \exp\left(A\right). \tag{6}$$

Plotting the empirical data into the coordinate system *X* = ln{ln[1/(1 – *F*)]}, *Y* = ln *t*, enables a good visual check. In the ideal case, if Equation (2) is valid, the data lie on a straight line.

### **2. Determination of parameters in a three-parameter distribution**

A two-parameter distribution is not always suitable. Sometimes, the transformed data do not lie on a straight line, or it is obvious that the distribution should have a threshold value *t*<sub>0</sub> higher than zero. In such a case, the use of a two-parameter distribution as a base for dimensioning could lead to uneconomical design, and a three-parameter function (1) would be better. The parameters in this distribution can be found by the procedure for a two-parameter function if *t* in Equation (2) is replaced by the expression *t* – *t*<sub>0</sub>; the constant *t*<sub>0</sub> must be chosen in advance. For various *t*<sub>0</sub> values, the shape of the empirical distribution varies. The best *t*<sub>0</sub> value is the one for which the transformed data best resemble a straight line. However, a more straightforward procedure exists.

#### **Direct determination of parameters**

The constants *a*, *b*, and *t*<sub>0</sub> can also be obtained in a simpler way without any transformation. The solution of Equation (1) for *t* gives the formula for quantiles:

$$t = t_0 + a \left\{ \ln \left[ 1 / \left( 1 - F \right) \right] \right\}^{1/b}. \tag{7}$$

This equation and the least-squares method are used to search for the values of *a*, *b*, and *t*<sub>0</sub> that minimize the sum of squared differences between the measured and the calculated values of *t*,

$$\sum_{j=1}^{n} \left(t_{j,\text{meas}} - t_{j,\text{calc}}\right)^2 = \min! \tag{8}$$

If a suitable solver is available for such minimization (one is present also in Excel), it is then sufficient to prepare one series of measured data, *t*j,meas, and another series of the *t*j,calc values, calculated via Equation (7) for the same values of *F*j using the parameters *a*, *b*, and *t*0. Solver's command to minimize the expression (8) by changing *a*, *b*, and *t*0 will do the job. An example is shown at the end of this chapter.

Remark: Formula (7) is also suitable for the determination of a "minimum guaranteed value" (e.g. strength or time to failure) for an acceptably low probability *F*.
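Where no spreadsheet solver is at hand, the minimization (8) can be sketched in a few lines of code. The sketch below does not reproduce Excel's Solver. It uses a simpler technique chosen here: for a fixed shape parameter *b*, Equation (7) is linear in *t*<sub>0</sub> and *a*, so these two follow from ordinary linear least squares, and *b* itself is scanned over a grid. The grid range and step are assumptions, and the results need not coincide exactly with Solver values. The data are the seven strength values of Example 1.

```python
import math

def fit_weibull(data, b_grid, t0_fixed=None):
    """Least-squares fit of Eq. (7) to rank-ordered data.
    For each trial b, a (and t0, unless fixed) is found by linear regression
    of t on g = [ln(1/(1-F))]**(1/b); the b with the smallest sum of squares wins."""
    t = sorted(data)
    n = len(t)
    F = [j / (n + 1) for j in range(1, n + 1)]          # Eq. (5)
    best = None
    for b in b_grid:
        g = [math.log(1.0 / (1.0 - Fj)) ** (1.0 / b) for Fj in F]
        if t0_fixed is None:
            g_mean, t_mean = sum(g) / n, sum(t) / n
            a = (sum((gj - g_mean) * (tj - t_mean) for gj, tj in zip(g, t))
                 / sum((gj - g_mean) ** 2 for gj in g))
            t0 = t_mean - a * g_mean
        else:
            t0 = t0_fixed
            a = sum(gj * (tj - t0) for gj, tj in zip(g, t)) / sum(gj * gj for gj in g)
        sse = sum((tj - (t0 + a * gj)) ** 2 for gj, tj in zip(g, t))   # Eq. (8)
        if best is None or sse < best["sse"]:
            best = {"a": a, "b": b, "t0": t0, "sse": sse}
    return best

strengths = [203, 223, 248, 265, 290, 313, 342]         # MPa, Example 1
b_grid = [0.5 + 0.01 * k for k in range(1451)]          # assumed range 0.5 to 15

fit3 = fit_weibull(strengths, b_grid)                   # three parameters
fit2 = fit_weibull(strengths, b_grid, t0_fixed=0.0)     # two parameters
print(fit3)
print(fit2)
```

The threshold *t*<sub>0</sub> is not constrained in this sketch; in practice one would check that it is physically meaningful (e.g. not negative and not above the smallest measured value).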

In addition to flexibility, Weibull distribution has one more advantage. The shape parameter *b* in Equation (1) or (2) is related to the character of failures. This is clearly visible on the bathtub curve (Fig. 1 in Chapter 4). The values *b* < 1 are typical of a decreasing failure rate *λ* and may thus indicate the period of early failures. On the contrary, *b* > 1 corresponds to an increasing failure rate *λ* and is typical of the period of aging or wear-out. The value *b* = 1 corresponds to the constant failure rate, *λ* = const, with failures from many different causes (see Chapter 4). The exponent *b* can thus give general information about the possible kind of failures and about the period in the life of an object even if the amount of data is not large. However, caution is necessary. If the data from a long period are fitted by Weibull distribution, failures from various causes and stages can be mixed, and the relation of *b* to the kind of failures is not unambiguous.
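The relation between *b* and the trend of the failure rate can be verified numerically. The sketch uses the standard expression for the failure rate of the Weibull distribution, *λ*(*t*) = (*b*/*a*)[(*t* – *t*<sub>0</sub>)/*a*]<sup>*b*–1</sup>, which is not written out in this chapter; the scale parameter and the time points are arbitrary illustrative values.

```python
def weibull_failure_rate(t, a, b, t0=0.0):
    """Failure rate of the Weibull distribution for t > t0:
    lambda(t) = (b / a) * ((t - t0) / a) ** (b - 1)."""
    return (b / a) * ((t - t0) / a) ** (b - 1.0)

a = 100.0                                   # assumed scale parameter, hours
times = [50.0, 100.0, 200.0, 400.0]         # assumed time points, hours

rates = {b: [weibull_failure_rate(t, a, b) for t in times]
         for b in (0.5, 1.0, 2.0)}

for b, values in rates.items():
    print(f"b = {b}: " + ", ".join(f"{v:.5f}" for v in values))
```

The printed rows decrease for *b* = 0.5, stay constant for *b* = 1, and increase for *b* = 2, which corresponds to the three regions of the bathtub curve.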

Remark: Weibull distribution was proposed in 1939 by the Swedish engineer Waloddi Weibull, who studied the strength of materials, life endurance of ball bearings, fatigue life of mechanical components, and other quantities. Later, it appeared that this very useful distribution belongs to the family of extreme value distributions [1, 2]. More on Weibull distribution and its applications can be found, for example, in [3 - 5].

## **3. Exponential distribution**

Let us now look at a special and very important case. With the shape parameter *b* = 1, Weibull distribution simplifies to exponential distribution

$$F(t) = 1 - \exp\left[-\left(t/a\right)\right], \text{ or } \; F(t) \; = 1 - \exp\left[-\left(t - t\_0\right)/a\right]. \tag{9}$$

The probability density and distribution function are depicted in Fig. 5. The parameters *a* and *t*<sub>0</sub> can be determined similarly as described above. If *t*<sub>0</sub> = 0, the remaining parameter *a* is usually calculated from the mean time to failure, as will be shown in Chapter 20. Typical of exponential distribution is that the standard deviation has the same or similar value as the mean.

The determination of parameters and use of Weibull and exponential distribution will be demonstrated in the following examples.

#### **Example 1**

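The strength (*S*) of a new alloy was measured on seven specimens, with the following results: 203, 223, 248, 265, 290, 313, and 342 MPa. Solve the following three problems:

A. Determine the parameters of Weibull distribution using (a) the linearized two-parameter function, (b) direct fitting of the two-parameter function by a solver, and (c) direct fitting of the three-parameter function by a solver.

B. Calculate (for each case) the probability that the strength will be lower than 120 MPa.

C. Calculate (for each distribution) the "minimum guaranteed" strength such that the probability of the actual strength being lower equals 0.05, 0.01, and 0.001.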

#### **Solution.**

#### **Task A. Determination of distribution parameters**

**a.** Linearized two-parameter Weibull distribution. The strength values, ordered from minimum to maximum, are given in Table 1 together with the values of the distribution function, calculated as *F*<sub>j</sub> = *j*/(*n* + 1), with *n* = 7; see also Fig. 2. The distribution function *F*(*t*) = 1 – exp[–(*t*/*a*)<sup>*b*</sup>] was transformed to linear form; see Equation (4) and the following formulas. The transformed values are in the columns *X*<sub>j</sub> and *Y*<sub>j</sub>. Note: The values of the distribution function are fixed (deterministic), as they correspond to the number of measured values, whereas the strengths exhibit random variations. Therefore, *F* is the independent variable and *t* is the dependent variable.


Subscript c means calculated; lin2 – linearized, two parameters; sol2 – nonlinearized, Solver, two parameters; sol3 – nonlinearized, Solver, three parameters

**Table 1.** Measured values *S*(*F*<sub>j</sub>) and those calculated using three methods.

The transformed values were fitted by the linear function (4); see columns *X*<sub>j</sub> and *Y*<sub>j</sub> in Table 1. The regression constants were *A* = 5.673642 and *B* = 0.194844. The inverse transformation has given *a* = exp *A* = 291.0928 and *b* = 1/*B* = 5.132311, so that the two-parameter distribution function is *F*(*S*) = 1 – exp[–(*S*/291.0928)<sup>5.13231</sup>]. The corresponding calculated values *S*<sub>j</sub> are in column *S*<sub>j,c,lin2</sub> and depicted by a curve in Fig. 2.
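The linearized fit of case (a) can be reproduced without a spreadsheet. The following sketch applies Equations (3) to (6) to the seven measured strengths, with an ordinary least-squares line fitted to *Y* = ln *S* as a function of *X* = ln{ln[1/(1 – *F*)]}; the function name is introduced here only for illustration.

```python
import math

def weibull_linear_fit(data):
    """Two-parameter Weibull fit by linearization, Eqs. (3)-(6).
    Returns the regression constants A, B and the parameters a, b."""
    t = sorted(data)
    n = len(t)
    F = [j / (n + 1) for j in range(1, n + 1)]                    # Eq. (5)
    X = [math.log(math.log(1.0 / (1.0 - Fj))) for Fj in F]
    Y = [math.log(tj) for tj in t]
    x_mean, y_mean = sum(X) / n, sum(Y) / n
    B = (sum((x - x_mean) * (y - y_mean) for x, y in zip(X, Y))
         / sum((x - x_mean) ** 2 for x in X))                     # slope
    A = y_mean - B * x_mean                                       # intercept
    return A, B, math.exp(A), 1.0 / B                             # Eq. (6)

strengths = [203, 223, 248, 265, 290, 313, 342]   # MPa
A, B, a, b = weibull_linear_fit(strengths)
print(f"A = {A:.6f}, B = {B:.6f}, a = {a:.4f}, b = {b:.6f}")
```

The output can be compared with the regression constants and the parameters *a* and *b* quoted above.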


#### **Task B. Determination of probability S ≤ 120 MPa**

The probabilities are as follows:

(a) 0.010533, (b) 0.012069, and (c) 0; in case (c), the minimum possible value is *t*<sub>0</sub> = 133.8 MPa.

#### **Task C. Determination of guaranteed strength**


The results are in the following table.

Note the big difference between the two- and three-parameter distributions for very low failure probabilities (cf. also Fig. 2). According to the three-parameter model, the minimum (threshold) strength is 133.8 MPa.
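For case (a), Tasks B and C can be evaluated directly from Equations (2) and (7) with the parameters *a* = 291.0928 and *b* = 5.132311 found in Task A. The sketch below is limited to this case, because the parameters of cases (b) and (c) are not listed in the text above; the function names are chosen here for illustration.

```python
import math

def weibull_cdf(t, a, b, t0=0.0):
    """Distribution function, Eq. (1); zero at or below the threshold t0."""
    if t <= t0:
        return 0.0
    return 1.0 - math.exp(-(((t - t0) / a) ** b))

def weibull_quantile(F, a, b, t0=0.0):
    """Quantile for probability F, Eq. (7)."""
    return t0 + a * math.log(1.0 / (1.0 - F)) ** (1.0 / b)

a, b = 291.0928, 5.132311            # case (a), linearized two-parameter fit

# Task B: probability that the strength is lower than 120 MPa
p_120 = weibull_cdf(120.0, a, b)

# Task C: "minimum guaranteed" strengths for the three probabilities
guaranteed = {p: weibull_quantile(p, a, b) for p in (0.05, 0.01, 0.001)}

print(f"P(S <= 120 MPa) = {p_120:.6f}")
for p, s in guaranteed.items():
    print(f"F = {p}: S = {s:.1f} MPa")
```

With a threshold *t*<sub>0</sub> above 120 MPa, as in case (c), `weibull_cdf` returns zero regardless of *a* and *b*, which is the reason for the zero probability quoted in Task B.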

**Figure 2.** Measured values of strength (*S*) and approximate distribution functions (*F*) for various approximations in Example 1. Thick curve – case c, three-parameter function; thin curves – cases a, b, two-parameter curves.

#### **Example 2**

Eight components (*n* = 8) were tested until failure. The failures occurred at the following times *t*<sub>j</sub>: 65, 75, 90, 120, 250, 510, 520, and 760 h. Calculate the mean time to failure and the failure rate. Calculate also the standard deviation, so that you can assess whether exponential distribution may be used for the time to failure.

#### **Solution.**

*MTTF* = ∑*t*<sup>j</sup> /*n* = (65+75+90+120+250+510+520+760)/8 = 298.750 h.

The sample standard deviation [Equation (4) in Chapter 2] is *σ*<sub>MTTF</sub> = 264.288 h. This is reasonably close to the sample mean, and an exponential distribution may be assumed. For this case, the failure rate is *λ* = 1/*MTTF* = 1/298.75 = 0.003347 h<sup>–1</sup>. The determination of the confidence interval for *λ* will be demonstrated in a similar case in Chapter 20.
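The calculation of Example 2 can be repeated with the Python standard library. The ratio of the standard deviation to the mean is added here as a convenient indicator; a value near 1 is consistent with the exponential distribution, although it does not prove it.

```python
import statistics

times = [65, 75, 90, 120, 250, 510, 520, 760]   # times to failure, hours

mttf = statistics.mean(times)        # mean time to failure
sd = statistics.stdev(times)         # sample standard deviation (n - 1 in the denominator)
failure_rate = 1.0 / mttf            # lambda, valid if the exponential model is accepted
ratio = sd / mttf                    # close to 1 for exponentially distributed data

print(f"MTTF = {mttf:.3f} h, standard deviation = {sd:.3f} h")
print(f"failure rate = {failure_rate:.6f} per hour, sd/mean = {ratio:.3f}")
```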

## **References**


## **Chapter 12**

## **Failure Modes and Effects Analysis**

## Jaroslav Menčík

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/62364

#### **Abstract**

Failure Mode and Effect Analysis (FMEA) is a simple procedure for systematic revealing of possible failures of structures or processes as early as in the design stage. The main steps of this procedure are explained. Classification of severity, frequency and possibility of early detection of the individual failure modes is shown, as well as the calculation of the risk priority number, which serves for finding the most dangerous causes of failures. The application of FMEA is shown on an example.

**Keywords:** Failure, failure mode, severity, frequency of occurrence, risk, FMEA

Until now, probabilistic methods were described. In this chapter, a nonprobabilistic method will be explained, which can increase reliability in a very effective way.

**Failure modes and effects analysis** (**FMEA**) is a simple procedure for systematically revealing possible failures of a structure or process as early as in the design or project stage, and for avoiding or mitigating them. The basic idea is that the prevention of failures is better and cheaper than their later detection and repair. The term failure means here any loss of the ability of the object to perform its functions properly.

FMEA was used for the first time in the Apollo project. Today, it is compulsory in the design of aircraft; it is used very often in the automotive industry and is gradually spreading into other branches. Its use is recommended by quality standards such as ISO 9000. In the past, good designers and builders used a similar approach intuitively. The advantage of FMEA is that it is a systematic procedure guaranteeing that everything will be done to prevent expectable failures of a component, structure, or process. A very important point is that FMEA is not a matter of one expert only, but uses the knowledge and experience of people from various branches. Their cooperation can have synergistic effects and bring further improvements into the design.


Failure modes and effects analysis can be done in 10 steps.

## **1. Formulation of the problem and establishing an FMEA team**

FMEA can be done for a product (component or structure) or a process. A special team is usually formed for the pertinent task. The team should consist of designers, technologists, somebody responsible for the manufacture or building, and somebody representing the future user, whose practical experience with the operation and maintenance of similar objects is invaluable.

Every FMEA team has its leader, either appointed by the management or selected by the team itself. The role of the leader is to organize and facilitate the FMEA sessions, to ensure the resources for the work, and to help the team reach consensus and progress toward the completion of the FMEA.

Before starting the analysis, it is necessary to define its scope clearly, as well as the relation of the team to the management and the team's competences and responsibility. It is also necessary to set the budget for the analysis and the deadline. All this, including the names of the team leader and members and the way of communication with the management, should be written down in a document.

## **2. Review of the construction or process**

The purpose of a product FMEA is to reveal problems that could result in safety hazards, product malfunctions, or a shortened life. The key question is "How can the product fail?"

The process FMEA should uncover the problems related to the manufacture, building, or assembly of the product. It is helpful to consider the five elements of a process: **people**, **materials**, **equipment**, **methods**, and **environment**. With these elements in mind, the key question is "How can the process failure affect the product, processing efficiency, or safety?"

During the first session, the members of the team should be sure that they understand all necessary details of the construction or process and their interrelations. To ensure it, every member should get in advance engineering drawings and documents of the product or a detailed flowchart of the process or operation. It is helpful to have an expert on the construction or process, who will be able to answer any questions the team might have.

## **3. Revealing of all potential failure modes**

Once the team members understand the product (or process), they can begin thinking about the potential failure modes that can affect the product quality, reliability and safety during its useful life. This should be done during one or more sessions organized according to the rules of brainstorming; information on previous failures is also useful.

In such meetings, no idea or comment should be rejected. However, some people personally involved in the design might feel offended when somebody else finds faults and mistakes. The role of the team leader is to facilitate the process, encourage people to contribute ideas and comments, and mitigate negative psychological effects.

## **4. Listing of potential effects of each failure mode**

Once the possible failure modes have been identified, they are written down into a special form (Fig. 1). Then, the FMEA team reviews each failure mode and identifies the potential effects of the failure should it occur. For every failure mode, there may be one or more effects. Again, everything is written into the FMEA form. This is very important, as this information is the basis for assigning risk ratings to each failure mode. It is recommended to use if-then thinking: "If this occurs, what are the consequences?" The form (Fig. 1) helps in taking measures for the elimination of some failures or the reduction of their severity.

**Figure 1.** Failure modes and effects analysis worksheet. (In real worksheets, both parts are printed together.)


## **5. Assigning severity, occurrence, and detection ratings for each effect**

Each effect is assigned three numbers characterizing its severity, frequency, and probability of early detection, and these numbers are written into the left part of the form (upper part of Fig. 1). Often, each of the ratings is based on a 10-point scale, with 1 being the best case and 10 the worst case; for example,

Severity rating scale:

10 – consequences dangerously high (failure could injure or kill); 8 – consequences very serious (failure renders the object unfit for use); 6 – moderate (failure results in partial malfunction); 4 – very low (there is minor performance loss); 3 – minor (the effects could be overcome without performance loss); 1 – none (failure would not be noticeable).

Occurrence rating scale:

10 – very high probability of occurrence [failure is (almost) inevitable]; 8 – high probability (repeated failures); 6 – moderate probability (occasional failures); 3 – low (relatively few failures); 1 – negligible (failure is unlikely).

Detectability rating scale:

10 – probability of detection (POD) is zero (the object is not inspected or the defect is not detectable); 8 – POD is low (the signs of failure are not easily detectable); 3 – POD is high (the signs of failure are easily detectable, the objects are 100% controlled); 1 – detection of approaching failure is certain [the emerging defect is obvious or there is 100% automatic control (regular inspections, if necessary)].

There are no fixed scales; the classification depends on the character of the object. However, it is important to establish a clear description of the points on each scale so that all team members have the same understanding and consensus of the ratings.

When assigning a severity rating, one must be aware that a single failure of a component can have several effects, and each effect can have a different severity.

The best method for determining the occurrence rating is to use actual data from the same or similar product or process. When actual failure data are not available, the team must estimate how often the pertinent failure mode can occur.

The detection rating expresses how likely it is that an approaching failure will be revealed before it happens. If there are no controls, the probability of detection is low and the rating is high (9 or 10).

## **6. Calculation of the Risk Priority Number (RPN) for each failure mode**

Now, the RPN is calculated by multiplying the severity rating by the occurrence rating and the detection rating for each item (see the special column in Fig. 1):

$$\text{RPN} = \text{Severity} \times \text{Occurrence} \times \text{Detection}.\tag{1}$$

This number for a single item can be between 1 and 1000.

Then, the total RPN can be calculated by summing up the risk priority numbers for all failure modes (Fig. 1, at the bottom of the table). This number alone is meaningless, because each FMEA has a different number of failure modes and effects. However, it can serve for comparison with the revised total RPN once the improving measures have been proposed (see further).

## **7. Prioritizing the failure modes for action**

The failure modes can now be ranked from the highest RPN to the lowest RPN. This can easily be accomplished by common spreadsheet programs (e.g. Excel).

The team must now decide which failure modes will be worked on to reduce their RPN. Usually, a limit value of RPN is chosen, and only the items whose RPN exceeds this limit are dealt with. However, special attention must also be paid to all cases with the highest severity ratings, such as 8 – 10, regardless of their RPN.
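The calculation and ranking described in steps 6 and 7 can be sketched in a few lines of Python. This is only an illustration: the failure modes, their ratings, and the limit values `RPN_LIMIT` and `SEVERITY_LIMIT` are hypothetical and are not taken from the worksheet in Fig. 1.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    severity: int    # S, 1 (best) to 10 (worst)
    occurrence: int  # O, 1 (best) to 10 (worst)
    detection: int   # D, 1 (best) to 10 (worst)

    @property
    def rpn(self) -> int:
        # Risk priority number, Eq. (1): RPN = S x O x D
        return self.severity * self.occurrence * self.detection


# Hypothetical ratings, for illustration only
modes = [
    FailureMode("cracked housing", 6, 3, 3),
    FailureMode("insulation breakdown", 10, 1, 8),
    FailureMode("worn bearing", 4, 6, 3),
    FailureMode("loose connector", 8, 3, 6),
]

RPN_LIMIT = 100      # assumed limit chosen by the team
SEVERITY_LIMIT = 8   # assumed lowest severity that always gets attention

# Rank from the highest RPN to the lowest and sum up the total RPN
ranked = sorted(modes, key=lambda m: m.rpn, reverse=True)
total_rpn = sum(m.rpn for m in modes)

# Select items above the RPN limit, plus all items with very high severity
for_action = [m for m in ranked
              if m.rpn > RPN_LIMIT or m.severity >= SEVERITY_LIMIT]

for m in ranked:
    flag = "action" if m in for_action else ""
    print(f"{m.name:22s} RPN = {m.rpn:4d} {flag}")
print("Total RPN:", total_rpn)
```

Note that the second selection criterion picks up the item with severity 10 even though its RPN is below the chosen limit.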

## **8. Taking action for eliminating or reducing the high-risk failure modes**

Each of the high-risk failure modes is discussed, and the team members propose measures to reduce its RPN. This number is a product of three terms (severity, occurrence, and detectability), and the reduction of any of them will reduce the RPN. However, the best way is to eliminate the cause of the particular failure. For example, if a steel component can fail due to corrosion, the use of stainless steel can avoid this danger. If there is no failure, there is no need to reduce its severity or frequency, nor to improve its detectability.

Then, measures follow for the reduction of the severity of failures or their frequency. (Some of the failure modes have similar causes.) Improvement can be achieved by a new design, by using other components or materials, or by improving the input control of components or raw materials and discarding the unsuitable ones. The third way to reduce the RPN aims at improving the detection of failures in early stages (e.g. by building in special elements or sensors or by periodic inspections). However, this does not mean an actual improvement of the structure.

## **9. Calculation of the resulting RPN as the failure modes are reduced**

For each item corrected, new ratings are determined (severity, occurrence, and detectability) as well as the new risk priority number (see part 2 of Fig. 1). Then, the total RPN is calculated for the whole structure. This number can often be several tens of percent lower than the original total RPN, partly thanks to the elimination of the causes of some failures. The comparison of both total RPNs shows how effective the FMEA was. It can also help in deciding what measures should be taken if several ways of improvement, with different RPNs, are possible.
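The comparison of the original and the revised total RPN can be illustrated by the following short sketch. All ratings are hypothetical, and the dictionary layout is just one possible way of storing the two parts of the worksheet.

```python
# (S, O, D) ratings before and after the proposed measures; hypothetical values
original = {
    "loose connector": (8, 3, 6),
    "insulation breakdown": (10, 1, 8),
    "worn bearing": (4, 6, 3),
}
revised = {
    "loose connector": (8, 1, 3),       # better locking and an added inspection
    "insulation breakdown": (10, 1, 3), # added insulation test
    "worn bearing": (4, 6, 3),          # below the limit, left unchanged
}


def total_rpn(ratings):
    """Sum of S x O x D over all failure modes."""
    return sum(s * o * d for s, o, d in ratings.values())


rpn_before = total_rpn(original)
rpn_after = total_rpn(revised)
reduction_percent = 100.0 * (rpn_before - rpn_after) / rpn_before

print(f"Total RPN before: {rpn_before}, after: {rpn_after}")
print(f"Reduction: {reduction_percent:.1f} %")
```

A failure mode whose cause has been eliminated altogether can simply be omitted from the revised table, so that it no longer contributes to the total.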

## **10. Taking action for improvements**

The recommended measures for improvement are written into the FMEA form, including their ratings and RPN. However, the most important thing is to ensure that these measures will be realized. Thus, it must also be stated who will be responsible for the corrective action, the date by which this action should be carried out, and the person who will check it (with respect to the competences of the FMEA team). The final FMEA forms are then submitted to the management.

## **Concluding remarks**

Failure Mode and Effect Analysis, although it is very simple and does not work explicitly with probabilities, can significantly reduce the number of mistakes happening during the design, manufacture, and assembly or building of an object, as well as the number of failures occurring during its life. Thus, FMEA reduces the total costs and increases the safety, reliability, lifetime, and quality of the object. Very often, the design is improved.

Further details on FMEA can be found in the literature, e.g. [1 – 3]. FMEA has been incorporated into reliability standards, such as IEC 60812. Commercial computer programs for FMEA are also available, although creating one's own purpose-tailored program is easy.

A variant of FMEA exists, called FMECA (failure mode, effects, and criticality analysis), which puts more emphasis on the assessment of consequences of possible failures [3]. The principle, however, is the same as above.

#### **Example 1**

In a Failure Modes and Effects Analysis, done during the design of a home appliance, five possible failure modes were revealed. Their severity (S), probability of occurrence (O), and possibility of early detection (D) were classified as shown in the table below. Calculate the RPN for each failure mode and the resultant RPN for the whole appliance.

#### **Solution**

The individual values of RPN (= S×O×D) and the resultant value (= ∑RPN<sub>i</sub>) are written in italics.


## **Author details**

### Jaroslav Menčík

Address all correspondence to: jaroslav.mencik@upce.cz

Department of Mechanics, Materials and Machine Parts, Jan Perner Transport Faculty, University of Pardubice, Czech Republic

## **References**


## **Fault Tree Analysis and Reliability Block Diagrams**

## Jaroslav Menčík

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/62374

#### **Abstract**

Fault tree analysis (FTA) strives to reveal all possible sources of critical failures. It starts from the most critical event ("top event"), looks for its causes, and continues in this way backwards to the initial events that finally lead to the failure. The so-called fault tree, plotted using the symbols of Boolean algebra, can then be used for the construction of a reliability block diagram, which serves for finding the critical path and the probability of failure. The principle of Markov analysis is explained as well.

**Keywords:** Failure, fault tree, fault tree analysis, FTA, top event, reliability block diagram, probability of failure, Boolean algebra

The failure modes and effects analysis (FMEA), explained in the previous chapter, strives to find all possible sources of future failures. It starts with failures of single elements, with mistakes of personnel, etc., and looks for their consequences for the structure or process. It is very efficient but has two drawbacks. First, it reveals perhaps all sources of many possible failures, but only a few of them are really serious and have fatal consequences, such as the collapse of the structure. Moreover, complex objects can fail in various ways. Second, FMEA is a rather qualitative analysis and does not give information on the probabilities of failure.

For these reasons, **fault tree analysis (FTA)** is also often used (IEC 61025). In contrast to the "bottom-up" inductive approach of FMEA, fault tree analysis is a deductive method and goes "top-down". It starts with the so-called **top event** (critical event; e.g. the aircraft crashes) and searches for all possible causes (e.g. failure of all engines, a broken wing, or an explosion in the aircraft). Then, the reasons for each of these causes are looked for, and so on, until the basic events are reached. If all these events are depicted, showing how each "upper" event follows from the "lower" events, the so-called **fault tree** is obtained, which shows the most direct ways to critical failures. Special symbols are used for creating these diagrams (Fig. 1).


**Figure 1.** Symbols for Fault Tree Analysis.

A simple example with electric lighting in a room with two lamps is shown in Fig. 2. The top event is "there is darkness in the room". This can happen if neither of the two lamps is lit, and four possible reasons exist for this (both lamps have failed, there is no voltage in the network, the switch is off or has failed, or the fuse has blown).

A single fault tree is used to analyze one and only one top event (or undesired event). FTA involves five principal steps:


In contrast to FMEA, fault tree analysis can also consider events caused by external factors.

Fault tree analysis is often used in the aviation industry, as well as chemical, petrochemical, nuclear power, and other high-hazard industries.

A fault tree can be converted into a **reliability block diagram (RBD).** This is a scheme similar to Fig. 4 in Chapter 5, with series and parallel arrangement of blocks representing the individual elements or groups of them. Each element is characterized by a failure rate. A series arrangement fails if any of its elements fails. Parallel paths are redundant, that is, all elements must fail for the parallel network to fail. If the probabilities of individual events are known, one can calculate the failure probability of the system, as shown in Chapter 5.

A reliability block diagram may be drawn using switches instead of blocks, where a closed switch represents a working component and an open switch represents a failed component. If a path can be found through the network of switches from the beginning to the end, the system is still working. The system can also be solved using the rules of Boolean algebra. In terms of the working (success) state, series paths correspond to AND gates and parallel paths to OR gates; in terms of failure, as in a fault tree, the roles are reversed.
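For the lighting example with two lamps, the probability of the top event can be evaluated either through the gates of the fault tree or through the equivalent reliability block diagram. The following sketch assumes that all basic events are independent; the numerical probabilities and the helper names (`and_gate`, `or_gate`, `series`, `parallel`) are invented for illustration.

```python
from math import prod


def and_gate(probabilities):
    """Failure logic: the output event occurs only if all input events occur."""
    return prod(probabilities)


def or_gate(probabilities):
    """Failure logic: the output event occurs if at least one input event occurs."""
    return 1.0 - prod(1.0 - p for p in probabilities)


def series(reliabilities):
    """Success logic: a series arrangement works only if all blocks work."""
    return prod(reliabilities)


def parallel(reliabilities):
    """Success logic: a parallel arrangement works if at least one block works."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


# Hypothetical failure probabilities of the basic events
p_lamp = 0.05        # one lamp has failed
p_no_voltage = 0.01  # no voltage in the network
p_switch = 0.002     # switch is off or has failed
p_fuse = 0.005       # fuse has blown

# Fault tree: darkness = (lamp 1 AND lamp 2) OR no voltage OR switch OR fuse
p_both_lamps = and_gate([p_lamp, p_lamp])
p_dark_fta = or_gate([p_both_lamps, p_no_voltage, p_switch, p_fuse])

# Reliability block diagram: two lamps in parallel, in series with the rest
r_lamps = parallel([1.0 - p_lamp, 1.0 - p_lamp])
r_system = series([r_lamps, 1.0 - p_no_voltage, 1.0 - p_switch, 1.0 - p_fuse])
p_dark_rbd = 1.0 - r_system

print(f"Fault tree: P(darkness) = {p_dark_fta:.6f}")
print(f"RBD:        P(darkness) = {p_dark_rbd:.6f}")
```

Both routes give the same probability, because the AND and OR gates of the failure logic are the complements of the parallel and series arrangements of the success logic.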

**Figure 2.** Fault tree for two lights in a room.

In complex systems consisting of many blocks, various blocks can fail simultaneously. If connections exist between certain elements, the failure of one or even more blocks does not necessarily mean the failure of the whole system. Reliability in such systems is studied by the **cut set** or **tie set** methods. A cut set is obtained by drawing a line through the blocks whose failures would cause the failure of the system. Tie sets are obtained by drawing lines through those blocks which, if working, would ensure the operation of the system. This analysis helps in revealing the possible conditions for failure or in finding an arrangement with high resistance to failure.
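For a small system, the minimal tie sets and cut sets can be found by simply testing all combinations of blocks. The sketch below does this for a hypothetical system in which block A is connected in series with the parallel pair B, C; the function `system_works` encodes this assumed structure and would have to be rewritten for any other diagram. Brute-force enumeration is practical only for a small number of blocks.

```python
from itertools import combinations

COMPONENTS = ("A", "B", "C")


def system_works(working):
    """Assumed structure: A in series with the parallel pair B, C."""
    return "A" in working and ("B" in working or "C" in working)


def minimal_sets(condition):
    """Return the minimal sets of components that satisfy the condition."""
    found = []
    # Going from small to large sets, any superset of a found set is skipped
    for size in range(1, len(COMPONENTS) + 1):
        for combo in combinations(COMPONENTS, size):
            candidate = frozenset(combo)
            if condition(candidate) and not any(m <= candidate for m in found):
                found.append(candidate)
    return found


# Tie set: the system works if (at least) these blocks work
tie_sets = minimal_sets(system_works)

# Cut set: the system fails if these blocks fail while all others work
all_blocks = frozenset(COMPONENTS)
cut_sets = minimal_sets(lambda failed: not system_works(all_blocks - failed))

print("Minimal tie sets:", [sorted(s) for s in tie_sets])
print("Minimal cut sets:", [sorted(s) for s in cut_sets])
```

For this structure the search gives the tie sets {A, B} and {A, C} and the cut sets {A} and {B, C}. The single-element cut set shows that block A is the weak point of the arrangement.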

Another approach to the reliability analysis of complex systems uses the so-called Markov chains or **Markov analysis**. This analysis is suitable for systems whose components can be in two states, failed or not failed, with transitions from one state to the other happening from time to time. It can be applied if the response (or change of state) at a certain instant does not depend on previous events (a so-called memoryless system) and if the probabilities of transition from the operable state to the failed state and vice versa are known; these probabilities are assumed constant. Markov analysis enables one to trace how the system evolves in time from certain initial conditions, and to see how quickly (and whether) it approaches a steady state after a disturbing event.

For example, let the probability of transition from the available state to the failed one be *P*<sub>A→F</sub> and from the failed state to the available one *P*<sub>F→A</sub>. If the component was initially available, *P*<sub>0</sub>(*A*) = 1, then the probability that it will be in the failed state after one step equals *P*<sub>1</sub>(*F*) = *P*<sub>A→F</sub>. The probability of the failed state after the second step is *P*<sub>2</sub>(*F*) = *P*<sub>A→F</sub> × *P*<sub>F→F</sub> + *P*<sub>A→A</sub> × *P*<sub>A→F</sub>; here the probability of remaining in the failed state is *P*<sub>F→F</sub> = 1 – *P*<sub>F→A</sub>, whereas the probability of remaining in the available state is *P*<sub>A→A</sub> = 1 – *P*<sub>A→F</sub>. The evolution can be depicted using Markov state transition diagrams and tree diagrams, which become more and more complex with each step. Computer support is thus necessary. Markov analysis is used, for example, for the simulation and analysis of the reliability of electricity supply systems or the reliability of software.

More details on fault tree analysis and reliability block diagrams can be found in the literature [1 – 3]. These methods have also been incorporated into reliability standards, e.g. IEC 61025, and commercial computer programs for FTA are also available. More about cut sets and tie sets can be found in [2, 3], and more about Markov analysis in [3 – 5].
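The step-by-step evolution of a two-state component can be traced by a very short program. The sketch below uses hypothetical transition probabilities `p_af` (available to failed) and `p_fa` (failed to available). The closed-form steady-state value used for comparison is the standard result for a two-state chain and is not derived in this chapter.

```python
def step(p_avail, p_failed, p_af, p_fa):
    """One step of a two-state Markov chain with constant transition probabilities."""
    new_avail = p_avail * (1.0 - p_af) + p_failed * p_fa
    new_failed = p_failed * (1.0 - p_fa) + p_avail * p_af
    return new_avail, new_failed


p_af = 0.1  # hypothetical probability of transition available -> failed
p_fa = 0.6  # hypothetical probability of transition failed -> available

state = (1.0, 0.0)  # the component is initially available
history = [state]
for _ in range(50):
    state = step(state[0], state[1], p_af, p_fa)
    history.append(state)

p1_failed = history[1][1]  # equals p_af
p2_failed = history[2][1]  # equals p_af * (1 - p_fa) + (1 - p_af) * p_af
steady_failed = p_af / (p_af + p_fa)  # standard steady-state result

for n in (1, 2, 5, 10, 50):
    print(f"step {n:2d}: P(failed) = {history[n][1]:.6f}")
print(f"steady state: P(failed) = {steady_failed:.6f}")
```

With these values the probability of the failed state approaches its steady-state value within a few steps; the speed of convergence is governed by the factor 1 – *P*<sub>A→F</sub> – *P*<sub>F→A</sub>.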

## **Author details**

Jaroslav Menčík

Address all correspondence to: jaroslav.mencik@upce.cz

Department of Mechanics, Materials and Machine Parts, Jan Perner Transport Faculty, University of Pardubice, Czech Republic

## **References**


## **Load – Resistance Interference Method**

## Jaroslav Menčík

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/62368

#### **Abstract**

Reliability and safety of a load-carrying structure require that its resistance R be higher than the load effect S. The so-called reliability margin G = R - S and the reliability index are used for reliability assessment and for the determination of failure probability if R and S are random quantities. This chapter explains the determination of the parameters of the reliability margin and shows its use on examples, including the finding of suitable dimensions for achieving the demanded reliability.

**Keywords:** Reliability, safety, load, resistance, reliability margin, reliability index, failure, interference, probability

Many situations exist that can be characterized as the conflict "load-resistance" or "action-barrier". The reliability is ensured if the load effect is smaller than the resistance against it. An example is a load-carrying structure, such as a road bridge or a mast of a TV transmitter exposed to wind. If the instantaneous load acting on the structure is higher than its load-carrying capacity, the structure can collapse or its deformations will be larger than allowed. Several further examples follow. If the voltage at the input of a device is higher than its electric strength, a breakdown of insulation will follow. If the amount of water flowing into a reservoir during a rain period is higher than its capacity at that time, the water overflows the upper edge. The strength of a shrink-fitted connection depends on the overlap of the bolt in the hole (i.e. on the difference between the diameter of the bolt and the diameter of the hole). If this overlap is too small, the strength of the joint is insufficient. If the bus arrives at the train station later than the time of the train departure, the passengers miss the train. The consequences of these failures can range from negligible to fatal.

In all these cases, a tool is needed that can quantify the reliability. The object is reliable if its **resistance** *R* to a certain "load" is higher than the **load effect** or **stress** *S*. (The meaning of the terms "load" and "resistance" can be very broad depending on the context.) For the quantification of reliability, the so-called **reliability margin** *G* is introduced, defined as

$$G = R - S.\tag{1}$$

This reliability margin shows, for example, how much the load-carrying capacity is higher than the load or how many minutes remain between the arrival and departure. The reliability condition can be written as

$$G = R - S > 0.\tag{2}$$

The case *G* < 0 corresponds to a resistance lower than the load, which means failure.

Often, only the load varies (the wind force, for example), whereas the resistance *R* of a structure is constant. In such case, only the stress *S* is a random quantity, and the probability of failure is determined as the probability that *S* exceeds the value of *R*,

$$P_\mathrm{f} = P(S \ge R).\tag{3}$$

If the distribution function of the wind-caused stress in the structure, *F*(*S*), is known, then its value corresponding to the known value *R* gives the probability that the stress will be lower than the strength, and the structure is safe. The probability of failure is the complement,

$$P_\mathrm{f} = 1 - F(S = R).\tag{4}$$

Note that here the letter *F* denotes the distribution function, and the probability of failure is denoted *P*<sub>f</sub>. Vice versa, it is possible to determine the necessary strength *R* of the structure such that the probability of failure will not exceed the allowable value *P*<sub>f,a</sub>.
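Both tasks, the failure probability for a given constant resistance and the resistance needed for an allowable failure probability, can be sketched with the standard Python library. The example assumes, purely for illustration, that the stress is normally distributed; the numerical values and the variable names are hypothetical. (Real wind loads often follow other distributions, for which the same idea applies with the corresponding distribution function.)

```python
from statistics import NormalDist

# Hypothetical stress distribution (MPa); normality is an assumption
stress = NormalDist(mu=300.0, sigma=40.0)

# Task 1: failure probability for a known constant resistance, Eq. (4)
R = 400.0                    # MPa, hypothetical
p_f = 1.0 - stress.cdf(R)    # P_f = 1 - F(S = R)

# Task 2: resistance needed so that P_f does not exceed an allowable value
p_f_allowable = 0.001
R_required = stress.inv_cdf(1.0 - p_f_allowable)

print(f"P_f for R = {R:.0f} MPa: {p_f:.5f}")
print(f"Required R for P_f = {p_f_allowable}: {R_required:.1f} MPa")
```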

Often, the random variability of the resistance *R* must also be considered. Especially during the design stage, the actual parameters of the structure are not yet known: for example, the strength or Young's modulus of the material, the characteristic stiffness of rubber bearings in a bridge, the thickness of the flanges of rolled steel beams, etc. Only their nominal values are known in advance. The actual values vary randomly, more or less, around them and can be determined accurately only after the components have been purchased or manufactured.

In such cases, both quantities *R* and *S* must be considered as random. They can be characterized by their probability distributions or simply by the average value and standard deviation. The situation is depicted in Figure 1. If the distributions of both quantities do not overlap at all, no failure can occur. This can be achieved if the average resistance is sufficiently higher than the average load. However, the effort to ensure that the distributions of *R* and *S* never overlap can be uneconomical, especially if the consequences of failure are not critical. Sometimes, it is reasonable to accept a reasonably low probability of failure (e.g. a small and short-time exceedance of the allowable deformation). In this case, both distributions overlap (Fig. 1) and the probability of failure is related to the area below the overlapping part of both curves (see further). It is thus useful to know the failure probability for certain combinations of *S* and *R* or, vice versa, to determine in advance what cross-section dimensions should be used if the failure probability must not exceed some allowable value. The pertinent procedure, called the "load – resistance" or "stress – strength interference method" [1, 2], is explained further.

If the stress *S* and resistance *R* are random quantities, the reliability margin *G* is also a random quantity (Fig. 1), with its own probability distribution. Its mean *μ* and standard deviation *σ* can be calculated as

$$\begin{aligned} \mu_G &= \mu_R - \mu_S, \quad & \text{(a)}\\ \sigma_G &= \left(\sigma_R^2 + \sigma_S^2\right)^{1/2}; \quad & \text{(b)}\end{aligned} \tag{5}$$

the subscripts denote the corresponding quantities. When working with empirical data, *μ* and *σ* are replaced by the sample mean *m* and sample standard deviation *s*.

S – stress; R – resistance (strength); G – reliability margin.

**Figure 1.** Stress – strength interference method (a schematic).

Failure occurs if the load effect is higher than the resistance, i.e. if the reliability margin *G* in Eq. (1) is negative (Fig. 1). The corresponding critical value of *G* is 0, and the probability of failure *P*<sub>f</sub> can be determined as

$$P_\mathrm{f} = P(R \le S) = P(G \le 0).\tag{6}$$

Reliability margin can be standardized to a nondimensional form in the following way:

$$
u = \left( G - m_G \right) / s_G. \tag{7}
$$

Its value for the transition between reliable state and failure (*G* = 0) equals

$$
u = \left( 0 - m_G \right) / s_G = -m_G / s_G. \tag{8}
$$

The ratio

$$m_G / s_G = \left(m_R - m_S\right) / \left(s_R^2 + s_S^2\right)^{1/2} = \beta\tag{9}$$

is called the **reliability index** *β* and corresponds to the distance of the mean value of the reliability margin *G* from 0, expressed as a multiple of the standard deviation of *G*. The reliability index gives a simple measure of reliability, as it shows how far the average value of the reliability margin is from the critical point. The situation is simple if the reliability margin *G* has a normal distribution: in this case, an unambiguous relationship exists between *β* and the probability of failure: *P*<sub>f</sub> equals the value of the distribution function of the standard normal distribution for *u* = – *β*. For example, *P*<sub>f</sub> = 0.02275 for *β* = 2, *P*<sub>f</sub> = 0.00135 for *β* = 3, and *P*<sub>f</sub> = 0.0000317 for *β* = 4. Some standards for civil engineering constructions admit the reliability evaluation using the reliability index and give the characteristic values of *β* for various degrees of safety [3].

The advantage of the reliability index is that it is simple and its values are of the order of units, which is close to our way of thinking. A normal distribution of *G* may be assumed if the coefficient of asymmetry *α*<sub>G</sub> < 0.3. Otherwise, no simple relation between *β* and *P*<sub>f</sub> exists, and other methods for the determination of failure probability are more appropriate (see further).
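The relationship between the reliability index and the failure probability for a normally distributed reliability margin can be checked with a few lines of Python. The function names `failure_probability` and `required_beta` are chosen here for illustration; the sketch is valid only under the assumption of a normal distribution of *G*.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def failure_probability(beta):
    """P_f = F(-beta), valid if the reliability margin G is normally distributed."""
    return STANDARD_NORMAL.cdf(-beta)


def required_beta(p_f):
    """Inverse relation: reliability index needed for a given failure probability."""
    return -STANDARD_NORMAL.inv_cdf(p_f)


for beta in (2.0, 3.0, 4.0):
    print(f"beta = {beta:.0f}: P_f = {failure_probability(beta):.7f}")

print(f"P_f = 0.0001 needs beta = {required_beta(0.0001):.3f}")
```

The loop reproduces the three values quoted in the text, and the inverse function shows which reliability index corresponds to a demanded failure probability.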

As the *S* and *R* curves can overlap or interfere (Fig. 1), the term **interference method** is used for this way of reliability assessment. Its use will be illustrated on two simple examples.

#### **Example 1**

Determine the probability of failure of a pull rod loaded by a tensile force. The force magnitude is normally distributed with the mean *m*<sub>F</sub> = 140,000 N and standard deviation *s*<sub>F</sub> = 14,000 N. The diameter *D* of the rod is 20 mm. The stress is determined as *σ* = *F*/*A*, where the cross-section area *A* = π*D*<sup>2</sup>/4 = 314.16 mm<sup>2</sup>. The mean and standard deviation of the stress are: *m*<sub>S</sub> = 140,000/314.16 = 445.6 MPa and *s*<sub>S</sub> = 14,000/314.16 = 44.6 MPa. The strength parameters of the steel used are: *m*<sub>R</sub> = 500 MPa and *s*<sub>R</sub> = 50 MPa. One can assume a normal distribution of *R* and *S* as well as of *G*.

Solution. The mean and standard deviation of the reliability margin are *m*<sub>G</sub> = *m*<sub>R</sub> – *m*<sub>S</sub> = 500.0 – 445.6 = 54.4 MPa and *s*<sub>G</sub> = (*s*<sub>R</sub><sup>2</sup> + *s*<sub>S</sub><sup>2</sup>)<sup>1/2</sup> = (50.0<sup>2</sup> + 44.6<sup>2</sup>)<sup>1/2</sup> = 67.0 MPa. The reliability index is *β* = *m*<sub>G</sub>/*s*<sub>G</sub> = 54.4/67.0 = 0.8117 (calculated with unrounded intermediate values). The probability of failure is *P*<sub>f</sub> = *F*(–*β*) = *F*(–0.8117) = 0.20848 = 20.85%. (Various programs can be used for finding the values of the standard normal distribution function *F*; the appropriate command in Excel is "norm.s.dist" or "normsdist". The statistical tables for the distribution function of the standard normal distribution give the same result.)
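The calculation of Example 1 can also be reproduced in Python instead of a spreadsheet. The sketch below follows the solution step by step; the variable names are chosen here for illustration, and the standard library class `NormalDist` plays the role of the Excel function mentioned above.

```python
from math import pi, sqrt
from statistics import NormalDist

# Input data of Example 1
m_F, s_F = 140_000.0, 14_000.0  # force: mean and standard deviation (N)
D = 20.0                        # rod diameter (mm)
m_R, s_R = 500.0, 50.0          # strength: mean and standard deviation (MPa)

# Stress parameters (N/mm^2 = MPa)
A = pi * D**2 / 4.0
m_S = m_F / A
s_S = s_F / A

# Reliability margin, Eq. (5), and reliability index, Eq. (9)
m_G = m_R - m_S
s_G = sqrt(s_R**2 + s_S**2)
beta = m_G / s_G

# Failure probability for a normally distributed margin
p_f = NormalDist().cdf(-beta)

print(f"A = {A:.2f} mm^2, m_S = {m_S:.1f} MPa, s_S = {s_S:.1f} MPa")
print(f"m_G = {m_G:.1f} MPa, s_G = {s_G:.1f} MPa")
print(f"beta = {beta:.4f}, P_f = {p_f:.4f}")
```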

#### **Example 2**

The failure probability from Example 1 is too high and must be reduced to *P*<sub>f</sub> = 0.0001. Find the appropriate diameter of the rod.

Solution. The reliability index for *P*<sub>f</sub> = 0.0001 is *β* = 3.719. The material parameters *m*<sub>R</sub> and *s*<sub>R</sub> are the same as above, so that it is necessary to determine only the stress parameters. In fact, there are two unknown parameters, *m*<sub>S</sub> and *s*<sub>S</sub>. However, we can assume that the coefficient of variation *v* of the stress in the slightly larger cross-section will be the same as in the first variant (i.e. 10%; cf. the *m*<sub>S</sub> and *s*<sub>S</sub> values above), so that *s*<sub>S</sub> = *v* *m*<sub>S</sub> = 0.1 *m*<sub>S</sub>. Thus, only the mean stress *m*<sub>S</sub> (and the corresponding rod diameter) is to be determined. Several possible methods for finding *m*<sub>S</sub> exist. The first one, exact, is based on the solution of Equation (9) for given *β*, *m*<sub>R</sub>, *s*<sub>R</sub>, and unknown *m*<sub>S</sub>. This approach leads to a quadratic equation and could be preferred by those who like mathematical analysis. The second approach uses the formula for the calculation of *β*, varies step-by-step the value of *m*<sub>S</sub> or the rod diameter, and calculates repeatedly *β* or the failure probability until the target value of *β* or *P*<sub>f</sub> is found. This solution can be facilitated using a suitable solver: if the formula for the calculation of failure probability as a function of the rod diameter has been created [using the relationships *P*<sub>f</sub> = *F*(–*β*), Eq. (9), and *A* = π*D*<sup>2</sup>/4, *σ* = *F*/*A*], it is possible to "ask" the solver to change the diameter *D* until *P*<sub>f</sub> attains the demanded value. In this way, the solver in Excel has given the (accurate) value *m*<sub>S</sub> = 285.81 MPa (for *β* = 3.719). The corresponding cross-section area (for the load 140,000 N) is *A* = *F*/*m*<sub>S</sub> = 140,000/285.81 = 489.84 mm<sup>2</sup>, and the rod diameter is *D* = 24.97 ≈ 25.0 mm. (The reader is encouraged to make the check by calculating the cross-section area *A*, mean stress *m*<sub>S</sub>, reliability index *β*, and failure probability *P*<sub>f</sub> for this diameter.)
Note how dramatically the failure probability has decreased (from 0.2085 to 0.0001) by increasing the rod diameter from 20 to 25 mm.

These computations can be done even for values of *D* or *m*<sub>S</sub> chosen "by hand", without a special algorithm. This "primitive" approach, which does not need analytical abilities or a solver, also leads quickly to an acceptable solution, all the more so because some quantities (e.g. dimensions of standard rolled steel profiles) are not continuous, but graded.
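The first ("exact") approach can be sketched as follows. Squaring the expression for *β* with *s*<sub>S</sub> = *v* *m*<sub>S</sub> gives the quadratic equation (1 – *β*<sup>2</sup>*v*<sup>2</sup>) *m*<sub>S</sub><sup>2</sup> – 2 *m*<sub>R</sub> *m*<sub>S</sub> + (*m*<sub>R</sub><sup>2</sup> – *β*<sup>2</sup> *s*<sub>R</sub><sup>2</sup>) = 0, whose smaller root is the physically meaningful one (*m*<sub>S</sub> < *m*<sub>R</sub>). This rearrangement and the variable names are worked out here for illustration and are not quoted from the text; the code then performs the check recommended to the reader.

```python
from math import pi, sqrt
from statistics import NormalDist

m_R, s_R = 500.0, 50.0   # strength parameters (MPa), as in Example 1
m_F = 140_000.0          # mean force (N)
v = 0.1                  # assumed coefficient of variation of the stress
p_f_target = 0.0001

# Reliability index for the target failure probability (normal margin assumed)
beta = -NormalDist().inv_cdf(p_f_target)

# Quadratic equation a*m_S^2 + b*m_S + c = 0 obtained by squaring the beta formula
a = 1.0 - (beta * v) ** 2
b = -2.0 * m_R
c = m_R**2 - (beta * s_R) ** 2
m_S = (-b - sqrt(b * b - 4.0 * a * c)) / (2.0 * a)  # smaller root, m_S < m_R

# Corresponding cross-section area and rod diameter
A = m_F / m_S
D = sqrt(4.0 * A / pi)

# Check: recompute beta and P_f for the mean stress found
beta_check = (m_R - m_S) / sqrt(s_R**2 + (v * m_S) ** 2)
p_f_check = NormalDist().cdf(-beta_check)

print(f"beta = {beta:.3f}, m_S = {m_S:.2f} MPa")
print(f"A = {A:.2f} mm^2, D = {D:.2f} mm")
print(f"check: beta = {beta_check:.3f}, P_f = {p_f_check:.6f}")
```

The larger root of the quadratic equation lies above *m*<sub>R</sub> and corresponds to a negative reliability index; it appears only because of the squaring and is therefore discarded.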

#### **Other distributions and approaches**

If the stress and resistance have asymmetrical distributions that can be approximated by lognormal functions, the above approach may be used if the reliability condition is defined not as the difference of the strength and stress, Equation (1), but as their ratio:

$$G = R / S.\tag{10}$$

Taking the logarithms of Equation (10) gives an expression similar to Equation (1):

$$
\log G = \log R - \log S.\tag{11}
$$

Both transformed quantities, log *R* and log *S*, have a normal distribution, and Equation (11) resembles Equation (1) in transformed coordinates (log *G* corresponding to *G*, etc.). Failure now corresponds to log *G* < 0, i.e. *G* < 1. Thus, Equation (11) can be treated by the procedures of the interference method described above.
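A minimal sketch of the log-normal variant follows. It assumes that the mean and the coefficient of variation of *R* and *S* are known and converts them to the parameters of ln *R* and ln *S* by the standard log-normal relations; these conversion formulas and the numerical values (taken over from Example 1 and treated here as log-normal) are assumptions made for illustration. Natural logarithms are used; the base of the logarithm does not influence *β*.

```python
from math import log, sqrt
from statistics import NormalDist


def log_parameters(mean, cov):
    """Mean and standard deviation of ln X for a log-normal X with given mean and CoV."""
    sigma_ln = sqrt(log(1.0 + cov**2))
    mu_ln = log(mean) - 0.5 * sigma_ln**2
    return mu_ln, sigma_ln


# Hypothetical data: means from Example 1, coefficient of variation 10 %
mu_ln_R, sigma_ln_R = log_parameters(500.0, 0.10)
mu_ln_S, sigma_ln_S = log_parameters(445.6, 0.10)

# Interference method applied to ln G = ln R - ln S; failure means ln G < 0
beta = (mu_ln_R - mu_ln_S) / sqrt(sigma_ln_R**2 + sigma_ln_S**2)
p_f = NormalDist().cdf(-beta)

print(f"beta = {beta:.4f}, P_f = {p_f:.4f}")
```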

If the distributions of the resistance and stress are only known in the form of histograms, the probability of failure can be determined by numerical integration:

$$P_\mathrm{f} = \int_{-\infty}^{\infty} F_R(S)\, f_S(S)\, \mathrm{d}S,\tag{12}$$

and the probability that failure does not occur:

$$P\_{\mathcal{T}} = \bigcap\_{-n}^{n} F\_{\mathcal{S}}(R) f\_{\mathcal{R}}(R) dR \tag{13}$$

depending on what functions are available. The differentials d*S* and d*R* are replaced by finite intervals ∆*S* and ∆*R*. For more, see [2].
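The replacement of the integral by a sum over finite intervals ∆*S* can be sketched as follows. In practice, the values of *F*<sub>R</sub> and *f*<sub>S</sub> would come from histograms; here, as an assumption made only so that the result can be checked, both quantities are taken as normal with illustrative parameters, for which the closed-form value *P*<sub>f</sub> = *F*(–*β*) is available.

```python
import math
from statistics import NormalDist


def failure_probability(f_s, F_r, s_min, s_max, n=2000):
    """Midpoint-rule sum of F_R(S) * f_S(S) * dS over [s_min, s_max]."""
    ds = (s_max - s_min) / n
    total = 0.0
    for i in range(n):
        s = s_min + (i + 0.5) * ds   # center of the i-th interval
        total += F_r(s) * f_s(s) * ds
    return total


# Illustrative normal strength and stress (assumed values, MPa).
R = NormalDist(500.0, 50.0)
S = NormalDist(350.0, 35.0)

p_num = failure_probability(S.pdf, R.cdf, 350.0 - 8 * 35.0, 350.0 + 8 * 35.0)

beta = (500.0 - 350.0) / math.hypot(50.0, 35.0)
p_exact = NormalDist().cdf(-beta)
print(f"numerical: {p_num:.6f}, closed form: {p_exact:.6f}")
```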

The probability of failure can also be determined by numerical simulation methods, such as Monte Carlo, which will be explained in the following chapter.

### **Author details**

Jaroslav Menčík

Address all correspondence to: jaroslav.mencik@upce.cz

Department of Mechanics, Materials and Machine Parts, Jan Perner Transport Faculty, University of Pardubice, Czech Republic

#### **References**


## **Chapter 15**

## **Monte Carlo Simulation Method**

## Jaroslav Menčík

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/62369

#### **Abstract**

The Monte Carlo method studies random phenomena using numerous fictitious experiments with computer-generated random numbers. This chapter explains its principle and the generation of random numbers with various probability distributions. More complex cases, such as the response surface method and the generation of correlated random quantities, are also explained. The use of the Monte Carlo method is illustrated by several examples.

**Keywords:** Probability, random numbers, Monte Carlo method, correlation, response surface method, probabilistic transformation

Random phenomena or processes can be successfully studied by the Monte Carlo method [1–4]. This is a probabilistic method based on performing numerous fictitious experiments using random numbers. It is used in various branches of science and technology. For example, in reliability it serves for the analysis of the load-carrying capacity or deformations of a construction, for the determination of the time to failure or the resonant frequency of a mechanical structure or an electric circuit, or for the study of the behavior of a complex transport or production system.

The Monte Carlo method is close to the engineering way of thinking. It is universal and does not need a special knowledge of probability theory. The only information it needs is the relationship between the output and input quantities,

$$y = f(x), \text{ or } y = f(x_1, x_2, x_3, \ldots),\tag{1}$$

and the knowledge of probability distributions of the input variables. The method uses numerous repetitions of trials with computer-generated random numbers and the relevant mathematical operations. In each "trial", the input variables *x*<sub>1</sub>, *x*<sub>2</sub>,..., *x*<sub>n</sub> are assigned random values, generated so that their distributions correspond to the probability distribution of each variable. With these input values, the output quantity *y* is calculated via Equation (1). From the results, a histogram can be constructed (Fig. 1), which corresponds to the distribution of *y*.

**Figure 1.** Histogram obtained by the Monte Carlo simulation program Ant-Hill [2].

The generated *y* values can be used for the determination of the average value and standard deviation, but also of the probability that *y* will be lower or higher than a chosen value or lie within some interval. Characteristic values that will be exceeded with a probability higher than some allowable value (e.g. the guaranteed strength or time to failure) can be determined as well. In a similar way, one can obtain critical values that are expected to be exceeded with only a small probability (e.g. maximum load or deformation).
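The whole procedure (random inputs, evaluation of *y*, and statistics of the results) can be sketched as follows. The model `stress`, the input distributions, and all numbers are hypothetical and serve only to show the mechanics.

```python
import random
import statistics


def simulate(model, samplers, n_trials, seed=0):
    """Run n_trials Monte Carlo trials of y = model(x1, x2, ...)."""
    rng = random.Random(seed)
    return [model(*(draw(rng) for draw in samplers)) for _ in range(n_trials)]


def stress(force, area):
    """Hypothetical model: tensile stress in a rod, MPa."""
    return force / area


samplers = [
    lambda rng: rng.gauss(140_000.0, 14_000.0),  # force in N, normal (assumed)
    lambda rng: rng.uniform(480.0, 500.0),       # area in mm^2, uniform (assumed)
]

y = simulate(stress, samplers, 20_000)

mean_y = statistics.fmean(y)
std_y = statistics.stdev(y)
p_exceed = sum(v > 330.0 for v in y) / len(y)    # estimate of P(y > 330 MPa)
q95 = sorted(y)[int(0.95 * len(y))]              # empirical 95% quantile

print(f"mean = {mean_y:.1f}, std = {std_y:.1f}, "
      f"P(y > 330) = {p_exceed:.4f}, 95% quantile = {q95:.1f}")
```

Running the script again with another `seed` gives slightly different numbers, which illustrates the approximate character of the results.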

Today, various commercial computer programs exist for Monte Carlo simulations, but such a program can also be written by the user. Its basis is a generator of random numbers. Actually, these numbers are not truly random; they are generated by a computer using a suitable deterministic algorithm. However, the algorithms are chosen so that the generated numbers behave nearly as if they were random.

The principle of these generators is simple. For example, the so-called congruential generator gives random numbers, distributed uniformly in the interval (0; 1), in the following way. One number is chosen as the base for the series of random numbers *u* (e.g. *u*<sub>0</sub> = 0.5284163). In the first step, this number is multiplied by some suitable number *Q*, for example 997. The product is 997 × 0.5284163 = 526.8310511. The first random number *u*<sub>1</sub> is then taken as the fractional part of the result, i.e. the part behind the decimal point; in our case, *u*<sub>1</sub> = 0.8310511. In the second step, *u*<sub>1</sub> is again multiplied by the same number *Q*, 997 × 0.8310511 = 828.5579467, and the second random number is taken as the fractional part of the result (i.e. *u*<sub>2</sub> = 0.5579467). This procedure is repeated again and again. The formula for the random number in the *j*-th step is

$$
u_j = \text{MOD}(Q \times u_{j-1});\tag{2}
$$

*u*<sub>j–1</sub> is the random number from the previous step, and the symbol MOD (read "modulo") here means the fractional part of the expression in brackets. The reader is encouraged to make several steps in this way; for a check, *u*<sub>3</sub> = 0.2728599. A long series of these numbers has approximately uniform distribution. Other algorithms also exist; for example, one generator of pseudorandom numbers with normal distribution uses the central limit theorem. In any case, the use of "commercial" generators is strongly recommended, as they have undergone thorough statistical testing to prove that they really behave as "nearly random", and their period, after which the series of generated numbers repeats, is very long (hundreds of millions of numbers). Generators of random numbers are usually a part of universal computer programs or languages. Even Excel can produce such numbers. Better tools are included in special packages for probabilistic analysis of reliability, such as FREET (www.freet.cz) or Ant-Hill (www.sbra-anthill.com), as well as in Matlab and other advanced software.
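The steps above can be reproduced with a short sketch. To avoid the accumulation of floating-point rounding errors, the seven decimal digits of *u* are stored as an integer, so that "taking the fractional part" becomes the remainder after division by 10<sup>7</sup>; this integer representation is an implementation choice of the sketch, not part of the description above. As recommended in the text, such a toy generator should not replace a tested library generator.

```python
def congruential(seed_digits, q=997, modulus=10**7):
    """Yield u_j = fractional part of (Q * u_{j-1}), with u kept as 7 digits."""
    state = seed_digits
    while True:
        state = (q * state) % modulus
        yield state / modulus


gen = congruential(5284163)          # u_0 = 0.5284163
first_three = [next(gen) for _ in range(3)]
print(first_three)                   # [0.8310511, 0.5579467, 0.2728599]
```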

## **1. Creation of random numbers with nonstandard distributions**

The commercial programs offer often-used distributions, such as uniform or normal. Random numbers corresponding to other analytically defined distributions can be generated via the uniform distribution. The basic idea is that the value of the distribution function *F* of any continuous random quantity is itself a random variable, distributed uniformly in the interval (0; 1). Thus, if the distribution function of a random quantity *x* is described by the expression *z* = *F*(*x*), then random numbers *x* can be obtained from random numbers *z* with uniform distribution in (0; 1) using the inverse formula:

$$x = F^{-1}(z).\tag{3}$$

Here, *F*<sup>–1</sup> means the inverse probabilistic transformation (Fig. 2). For example, the distribution function of the exponential distribution is *z* = *F*(*x*) = 1 – exp(–*x*/*x*<sub>0</sub>), with the parameter *x*<sub>0</sub>. The inverse transformation for this distribution is *x* = –*x*<sub>0</sub> ln(1 – *z*).
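A minimal sketch of the inverse transformation for the exponential distribution is given below. The parameter value *x*<sub>0</sub> = 2 and the sample size are arbitrary choices for illustration; the sample mean should come out close to *x*<sub>0</sub>.

```python
import math
import random


def exponential_sample(x0, rng):
    """Inverse probabilistic transformation x = -x0 * ln(1 - z)."""
    z = rng.random()                 # uniform in [0, 1)
    return -x0 * math.log(1.0 - z)


rng = random.Random(1)
xs = [exponential_sample(2.0, rng) for _ in range(50_000)]
mean_x = sum(xs) / len(xs)
print(f"sample mean = {mean_x:.3f} (parameter x0 = 2.0)")
```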

In some cases, the distribution of a random input quantity has a more complex shape and can be described by a histogram, obtained from experiments or monitoring. This histogram is then used for the construction of distribution function *F*(*x*). This function can be approximated either by constant values of *F* in the individual subintervals of *x* (if the number of classes is high) or by interpolation within each class, such as

$$F(\mathbf{x}) = F\_i + \frac{F\_{i+1} - F\_i}{\mathbf{x}\_{i+1} - \mathbf{x}\_i} (\mathbf{x} - \mathbf{x}\_i),\\\mathbf{x} = \mathbf{x}\_i + \frac{F(\mathbf{x}) - F\_i}{F\_{i+1} - F\_i} (\mathbf{x}\_{i+1} - \mathbf{x}\_i) \tag{4}$$

**Figure 2.** Generation of random numbers *x* by inverse probabilistic transformation.

*i* = 1, 2,..., *n* denotes the interval. The formula on the right gives *x* corresponding to the probability *F*. The *F* values are then generated as random numbers with uniform distribution.
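The interpolation inside the classes of a histogram can be sketched as follows. The class boundaries and counts are invented for illustration, and `make_inverse_cdf` is a hypothetical helper name.

```python
import bisect
import random


def make_inverse_cdf(edges, counts):
    """Build x = F^-1(F) by linear interpolation within histogram classes."""
    total = sum(counts)
    cdf = [0.0]
    for c in counts:
        cdf.append(cdf[-1] + c / total)
    cdf[-1] = 1.0                     # guard against rounding

    def inverse(F):
        i = bisect.bisect_right(cdf, F) - 1        # class containing F
        i = min(max(i, 0), len(counts) - 1)
        if cdf[i + 1] == cdf[i]:                   # empty class
            return edges[i]
        frac = (F - cdf[i]) / (cdf[i + 1] - cdf[i])
        return edges[i] + frac * (edges[i + 1] - edges[i])

    return inverse


# Invented histogram: four classes between 0 and 40.
edges = [0.0, 10.0, 20.0, 30.0, 40.0]
counts = [5, 20, 15, 10]
inverse = make_inverse_cdf(edges, counts)

rng = random.Random(0)
sample = [inverse(rng.random()) for _ in range(10_000)]
share_second_class = sum(10.0 <= x < 20.0 for x in sample) / len(sample)
print(f"share of values in the second class: {share_second_class:.3f}")
```

The share of generated values in each class should approach the relative frequency of that class in the histogram (here 20/50 = 0.4 for the second class).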

A typical feature of the Monte Carlo method is that the characteristic values (average, quantiles, probabilities corresponding to certain values of *y*, etc.), obtained as a result of *n* trials, are never the same as the results obtained in any other set of simulations. The results are thus only approximate and are closer to the actual values for a higher number of trials. The number of simulation steps *n*, needed for attaining some accuracy of results, is given approximately by the formula:

$$n = u_{\alpha/2}^2 \left(1 - P\right) / \left(P \delta^2\right);\tag{5}$$

*P* is the expected (estimated) probability of the investigated phenomenon, *δ* is the allowed relative error in the determination of *P*, *u*<sub>α/2</sub> is the *α*/2 critical value of the standard normal variable, and *α* is the probability that the actual value of *P* will lie outside the interval *P*(1 ± *δ*). The necessary number of simulations thus grows significantly with decreasing probability. For example, if the assumed probability *P* is 0.01, the allowed relative error *δ* is 10%, and *α* = 5% (i.e. confidence level 95%, with *u*<sub>α/2</sub> = 1.96), then ≈ 40,000 simulation trials are necessary. For *P* = 0.0001, it is as many as 4,000,000 trials, etc. [Note: Equation (5) is based on the fact that the number of outcomes of an event of probability *P* in *n* repetitions has binomial distribution, and this distribution can be approximated by normal distribution for high *n*.]
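The numbers quoted above can be checked with a short sketch. It uses the formula with the critical value squared, n = u²(1 − P)/(Pδ²), which is the form that reproduces the quoted ≈ 40,000 and ≈ 4,000,000 trials; the function name is illustrative.

```python
import math
from statistics import NormalDist


def required_trials(p, delta, alpha=0.05):
    """Approximate number of trials for relative error delta at level alpha."""
    u = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # about 1.96 for alpha = 5%
    return math.ceil(u * u * (1.0 - p) / (p * delta**2))


n_1 = required_trials(0.01, 0.10)
n_2 = required_trials(0.0001, 0.10)
print(n_1, n_2)
```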

The characteristic features of the Monte Carlo method are illustrated on several examples at the end of this chapter. The reader is encouraged to work them out on a PC.

## **2. More complex cases — Response Surface Method**

The direct use of the Monte Carlo method is suitable for simple relationships *y* = *f*(*x*<sub>1</sub>, *x*<sub>2</sub>,...). Often, however, the response must be obtained by numerical solution (e.g. the finite-element method). If one trial lasts minutes or more, thousands of simulations would consume too much time. In these cases, it is more effective to combine Monte Carlo with the response surface method (RSM). The principle is that the response is first calculated only for selected values of input variables, the results are fitted by a simple regression function (response surface; Fig. 3), and the Monte Carlo trials are done with this function.

The relationship between the output quantity *y* (deformation, load-carrying capacity, ampli‐ tude of vibrations, etc.) and the input variables can often be fitted by a polynomial function:

$$y = a_0 + \sum a_i x_i + \sum b_i x_i^2 + \dots + \sum c_{ij} x_i x_j + \dots\tag{6}$$

This approximation is suitable if the actual relationship between input and output has a similar character (e.g. *y* ∼ *x*<sup>3</sup>) or if the output quantity changes only little in the considered interval. If the relationship differs significantly from a polynomial (e.g. *y* ∼ 1/*x*<sup>3</sup> or *y* ∼ *x*<sup>1/2</sup>), expression (6) cannot give a good approximation in a wider interval. There are several ways for improvement, the starting point being a visual judgment of the character of the response. A linear or polynomial function may be used for the approximation of other relationships if suitable transformations are made. For example, the relationship *y* = *a*/*x*<sup>3</sup> can be expressed as *y* = *az* by introducing a new variable *z* = 1/*x*<sup>3</sup>; the relationship *y* = *a x*<sub>1</sub>/*x*<sub>2</sub><sup>2</sup> can be converted to the multiple linear regression *Y* = *A*<sub>0</sub> + *A*<sub>1</sub>*X*<sub>1</sub> + *A*<sub>2</sub>*X*<sub>2</sub> using logarithmic transformations, etc. Solvers in universal software (Excel, Mathcad, or Matlab) can find regression coefficients directly, without transformations.

**Figure 3.** Response surface for two independent variables (a schematic, with cuts *x*1 = const, *x*2 = const).

The fit of response function can sometimes be improved by dividing the definition interval of some input quantities into subintervals and using different regression functions for each. This may be substantiated by the physical character of the problem. For example, the elastoplastic deflection of a beam obeys another law than purely elastic deformations.

The quality of the fit can be evaluated by means of residual standard deviation *s*res, which characterizes the scatter of the measured values around the regression function. Also the maximum difference between the individual "accurate" values and those on the response surface can serve as a criterion. With good response surface, the individual differences are randomly positive and negative. Larger regions with differences of the same sign indicate that the chosen function does not correspond well to the character of the actual response.
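The combination of a response surface with Monte Carlo trials can be sketched as follows. The function `expensive_model` is a hypothetical stand-in for a slow numerical solution (here simply *y* = *a*/*x*<sup>3</sup>), the design points and the input distribution are assumed, and the fit uses the transformation *z* = 1/*x*<sup>3</sup> with an ordinary least-squares line.

```python
import math
import random


def expensive_model(x):
    """Stand-in for a slow numerical solution (hypothetical): y = a / x^3."""
    return 2.0e6 / x**3


def fit_line(z, y):
    """Least-squares line y = a0 + a1 * z; returns (a0, a1)."""
    n = len(z)
    zm, ym = sum(z) / n, sum(y) / n
    sxy = sum((zi - zm) * (yi - ym) for zi, yi in zip(z, y))
    sxx = sum((zi - zm) ** 2 for zi in z)
    slope = sxy / sxx
    return ym - slope * zm, slope


# 1. Evaluate the "expensive" model only at a few selected points.
design_x = [18.0, 19.0, 20.0, 21.0, 22.0]
design_y = [expensive_model(x) for x in design_x]

# 2. Fit the response surface in the transformed variable z = 1 / x^3.
a0, a1 = fit_line([1.0 / x**3 for x in design_x], design_y)


def surrogate(x):
    return a0 + a1 / x**3


# 3. Check the fit by the residual standard deviation.
residuals = [yi - surrogate(xi) for xi, yi in zip(design_x, design_y)]
s_res = math.sqrt(sum(r * r for r in residuals) / (len(residuals) - 2))

# 4. Run the Monte Carlo trials with the cheap surrogate.
rng = random.Random(0)
ys = [surrogate(rng.gauss(20.0, 0.5)) for _ in range(10_000)]
mean_y = sum(ys) / len(ys)
print(f"a1 = {a1:.4g}, s_res = {s_res:.2e}, mean y = {mean_y:.1f}")
```

Because the stand-in model has exactly the assumed form, the residuals here are practically zero; with a real numerical model, their size and sign pattern would be inspected as described above.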

## **3. Application of the Monte Carlo method for correlated quantities**

The application of Monte Carlo simulations in problems with several input variables is simple if the individual input quantities are mutually independent (e.g. Young's modulus and the cross-section area of a beam). Sometimes, however, a relation between them exists (e.g. between the mass density and Young's modulus of concrete). In such a case, one speaks about statistical dependence or correlation. A special case is the so-called autocorrelation, when the value of a random quantity at some point is related partly to the values at neighboring points or at preceding times. Examples are the properties of concrete or of soil at foundations, or the temperature of a building structure: it varies during a day or from day to day, but depends partly also on the season of the year.

The omission of correlations in the simulations can lead to errors. For example, a very low value of elastic modulus of concrete could be generated simultaneously with a very high value of strength, but this does not correspond to reality. If correlations are respected, the calculations reflect the reality better and the conclusions or predictions are more accurate, with smaller scatter. Sometimes, also, a quantity needed for the analysis is unavailable, but can be replaced by a correlated quantity. For example, if the direct measurement of the tensile strength of components in an existing massive steel structure is impossible, the information from hardness tests can sometimes be adapted.

The tightness of the relationship of two quantities is characterized by the correlation coefficient *r*, defined as

$$r_{ij} = \text{cov}(x_i, x_j) / (s_i s_j),\tag{7}$$

where cov(*x*<sub>i</sub>, *x*<sub>j</sub>) is the covariance of *x*<sub>i</sub> and *x*<sub>j</sub>, and *s*<sub>i</sub> and *s*<sub>j</sub> are their standard deviations. The correlation coefficient *r* is nondimensional, ranging from –1 to +1. For *r* = 0, no (linear) mutual relationship exists, whereas *r* = +1 or –1 means a deterministic (functional) relationship. For *r* > 0, the *x*<sub>j</sub> values grow with the growth of *x*<sub>i</sub>; for *r* < 0, they decrease. (Note: The absolute value of the correlation coefficient equals the square root of the coefficient of determination *r*<sup>2</sup>, given by programs for curve fitting, available also in Excel.) Three examples with various values of *r* are shown in Figure 4.

**Figure 4.** Two correlated quantities *x*<sub>1</sub> and *x*<sub>2</sub> with the same mean values (*μ*<sub>1</sub> = 100 and *μ*<sub>2</sub> = 700) and standard deviations (*σ*<sub>1</sub> = 30 and *σ*<sub>2</sub> = 150) and various correlation coefficients *r* [5].

If two correlated random quantities *x*<sub>1</sub> and *x*<sub>2</sub> are to be generated, and if the regression function *x*<sub>2,reg</sub> = *f*(*x*<sub>1</sub>) is known, as well as the coefficient of determination *r*<sup>2</sup> of this approximation, the following procedure may be used. First, the random value of *x*<sub>1</sub> is generated. Then, the corresponding value of *x*<sub>2</sub> is generated as [5, 6]

$$x_2 = f(x_1) + \Delta x_2 = f(x_1) + u s_{2,\text{res}} = f(x_1) + u s_2 \sqrt{1 - r^2},\tag{8}$$

where *s*<sub>2,res</sub> is the residual standard deviation of quantity *x*<sub>2</sub> around the regression function *f*, and *u* is a randomly generated value (quantile) of the standard normal distribution (provided that the distribution of individual values *x*<sub>2</sub> around *f* is normal). The right-hand part of Equation (8) uses the fact that the residual standard deviation *s*<sub>2,res</sub> of *x*<sub>2</sub> can be expressed by means of the standard deviation *s*<sub>2</sub> of *x*<sub>2</sub> and the coefficient of determination *r*<sup>2</sup> pertaining to the regression function *x*<sub>2</sub> = *f*(*x*<sub>1</sub>).

This approach may be used for linear, as well as nonlinear relationships between *x*1 and *x*2. With some modification, it may also be used for multiple regression [6].
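A minimal sketch of this procedure for a linear regression function is given below. The means and standard deviations are those quoted in the caption of Figure 4; the value *r* = 0.8 and the linear form *f*(*x*<sub>1</sub>) = *m*<sub>2</sub> + *r*(*s*<sub>2</sub>/*s*<sub>1</sub>)(*x*<sub>1</sub> – *m*<sub>1</sub>) are assumptions made for the illustration. The correlation coefficient of the generated pairs is then computed from its definition (covariance divided by the product of the standard deviations) as a check.

```python
import math
import random


def correlated_pair(rng, m1, s1, m2, s2, r):
    """Generate (x1, x2): regression value plus normal residual scatter."""
    x1 = rng.gauss(m1, s1)
    x2_reg = m2 + r * (s2 / s1) * (x1 - m1)        # assumed linear f(x1)
    u = rng.gauss(0.0, 1.0)                        # standard normal value
    x2 = x2_reg + u * s2 * math.sqrt(1.0 - r * r)  # residual std s2*sqrt(1-r^2)
    return x1, x2


def correlation(xs, ys):
    """Sample correlation coefficient: covariance / (s_x * s_y)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / (n - 1)
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs) / (n - 1))
    sy = math.sqrt(sum((b - my) ** 2 for b in ys) / (n - 1))
    return cov / (sx * sy)


rng = random.Random(5)
pairs = [correlated_pair(rng, 100.0, 30.0, 700.0, 150.0, 0.8)
         for _ in range(20_000)]
x1s, x2s = zip(*pairs)
r_sample = correlation(x1s, x2s)
print(f"sample correlation coefficient = {r_sample:.3f}")
```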

More information on the Monte Carlo method, especially on its use in the assessment of reliability of structures, including load-carrying capacity and lifetime, can be found in books [1 – 3] and proceedings of conferences [4], which contain many practical examples.

#### **Example 1**

Generate (e.g. using Excel) 500 random numbers with normal distribution with the mean *μ* = 5 and standard deviation *σ* = 1. Calculate the sample average *m* and standard deviation *s* and compare them with *μ* and *σ*. Determine also the 5% quantile and the probability that *x* will be larger than 8.0 and compare them with the exact values *x*<sub>0.05</sub> = 3.35515 and *P*(*x* > 8.0) = 1 – 0.99865 = 0.00135. Repeat the procedure and look at the new results. You can also make similar simulations with a lower number of generated values (e.g. *n* = 50) and also with a higher number (*n* = 5000).

Remark: Excel was mentioned here because it is ubiquitous and easy to use. Everybody can thus try to solve such problems. The necessary routines Descriptive Statistics, Histogram, Random Number Generation, and Solver are included with Excel. However, they are not always directly available. If the command Data Analysis does not appear on your screen after clicking on "Data", it must be activated. The procedure is as follows. Click on File, then on Options, then on Add-Ins; in the Manage box choose Excel Add-ins and click Go, tick Analysis ToolPak (and Solver Add-in), and, finally, click OK. When the Data tab is opened next time, the buttons Data Analysis and Solver appear in the upper part of the screen. It would be a pity not to use such powerful tools!
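Readers who prefer a scripting language to Excel can work through Example 1 with a sketch like the following. The function name and the way the empirical quantile is taken (the value at position 0.05·*n* in the sorted sample) are choices of this sketch.

```python
import random
from statistics import NormalDist, fmean, stdev


def example_1(n, seed=None):
    """Generate n normal numbers (mu = 5, sigma = 1) and summarize them."""
    rng = random.Random(seed)
    x = sorted(rng.gauss(5.0, 1.0) for _ in range(n))
    return {
        "mean": fmean(x),
        "std": stdev(x),
        "q05": x[int(0.05 * n)],                 # empirical 5% quantile
        "p_gt_8": sum(v > 8.0 for v in x) / n,   # empirical P(x > 8)
    }


exact = NormalDist(5.0, 1.0)
exact_q05 = exact.inv_cdf(0.05)
exact_p = 1.0 - exact.cdf(8.0)

for n in (50, 500, 5000):
    print(n, example_1(n))
print("exact:", round(exact_q05, 5), round(exact_p, 5))
```

Without a fixed `seed`, each run gives different sample values, and the scatter visibly decreases as *n* grows.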

#### **Example 2**

Generate 10,000 random numbers (*x*) with uniform distribution in the interval (0; 1). Calculate the mean value and standard deviation (the accurate values are *μ* = 0.5 and *σ* = 12<sup>–1/2</sup> ≈ 0.2887), plot the histogram, and check whether it corresponds to uniform distribution. Now, generate a second series of 10,000 numbers (*Y*) with the same parameters. Make the sums of the two numbers with the same subscript *j*. Calculate the parameters and create the histogram of the resultant distribution. (The accurate value of the mean is *μ* = *μ*<sub>1</sub> + *μ*<sub>2</sub> = 0.5 + 0.5 = 1, and the distribution is triangular.) If you make (in a similar way) the sum of three or more random quantities with uniform distribution, you will see that the resultant distribution resembles normal distribution more and more, in agreement with the central limit theorem.
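Example 2 can be tried with the following sketch; the simple `histogram` helper and the choice of ten classes are assumptions of the sketch.

```python
import random
from statistics import fmean, stdev


def histogram(values, lo, hi, bins):
    """Count values in equally wide classes between lo and hi."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        k = min(int((v - lo) / width), bins - 1)
        counts[k] += 1
    return counts


rng = random.Random(2)
n = 10_000
x = [rng.random() for _ in range(n)]
y = [rng.random() for _ in range(n)]
z = [a + b for a, b in zip(x, y)]        # sums of numbers with the same index

print("x:", round(fmean(x), 4), round(stdev(x), 4), histogram(x, 0.0, 1.0, 10))
print("z:", round(fmean(z), 4), round(stdev(z), 4), histogram(z, 0.0, 2.0, 10))
```

The class counts of `x` should be roughly equal, whereas those of `z` rise towards the middle classes and fall again, as expected for a triangular distribution.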

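#### **Example 3**

Determine (by the Monte Carlo method) the mean time to failure from Example 5 in Chapter 5 (four elements in a series, each with exponential distribution, with *λ*<sub>1</sub> = 8×10<sup>–6</sup> h<sup>–1</sup>, *λ*<sub>2</sub> = 6×10<sup>–6</sup> h<sup>–1</sup>, *λ*<sub>3</sub> = 9×10<sup>–6</sup> h<sup>–1</sup>, and *λ*<sub>4</sub> = 2×10<sup>–5</sup> h<sup>–1</sup>).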

#### Solution.

The problem was solved using Excel. First, the mean times to failure of the individual components were calculated (in the standard way) as *m*<sub>t,i</sub> = 1/*λ*<sub>i</sub> (i.e. *m*<sub>t,1</sub> = 125,000 h, *m*<sub>t,2</sub> = 166,667 h, *m*<sub>t,3</sub> = 111,111 h, and *m*<sub>t,4</sub> = 50,000 h). In the second step, four series of random values of the time to failure of the individual components were generated via the inverse formula *t* = –*m*<sub>t</sub> ln(1 – *F*) from random numbers *F* with uniform distribution in the interval (0; 1) and for the above parameters *m*<sub>t,1</sub>,..., *m*<sub>t,4</sub>. Each series had 1000 numbers. Then, in each simulation trial, the minimum of the four random numbers, corresponding to the times to failure of the individual elements, was found, because a system of several components in a series fails when the first element fails. The mean time to failure of the series system obtained in this way was *m*<sub>t</sub> = 23,376.3 h, which is very near the theoretical value 23,256 h. The standard deviation was *s*<sub>t</sub> = 24,137.7 h, again near the mean value, as is typical for exponential distribution. (A histogram will confirm it.)

The mean values of the simulation series for the individual components are as follows (the numbers in brackets express the standard deviation): No. 1: 120,073 (126,158), No. 2: 161,217 (162,127), No. 3: 104,902 (103,727), and No. 4: 52,001 (53,723), everything in hours. One can see that the parameters of the generated variables are all near the parent parameters. With higher numbers of simulation trials, the differences would be even smaller.

Remark: Systems with parallel arrangement of components can be solved in a similar way, but instead of searching for the minimum of the times to failure in each trial, now the maximum will be sought, because the parallel system fails only after the failure of the component with the longest time to failure.
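The same simulation can be sketched in Python. The failure rates are those of Example 3; the number of trials and the function name are choices of this sketch, so the simulated values will differ slightly from those quoted above. The `combine` argument switches between the series system (minimum) and the parallel system of the Remark (maximum).

```python
import math
import random
from statistics import fmean, stdev

rates = [8e-6, 6e-6, 9e-6, 2e-5]      # failure rates in 1/h (Example 3)


def system_times(rates, n_trials, rng, combine=min):
    """Simulate times to failure of a system; min = series, max = parallel."""
    times = []
    for _ in range(n_trials):
        # Inverse formula t = -m_t * ln(1 - F), with m_t = 1 / lambda.
        component_times = [-math.log(1.0 - rng.random()) / lam for lam in rates]
        times.append(combine(component_times))
    return times


rng = random.Random(3)
series = system_times(rates, 20_000, rng)
parallel = system_times(rates, 20_000, rng, combine=max)

mttf_series = fmean(series)
mttf_theory = 1.0 / sum(rates)        # theoretical value for the series system
print(f"series: mean = {mttf_series:.0f} h, std = {stdev(series):.0f} h, "
      f"theory = {mttf_theory:.0f} h")
print(f"parallel: mean = {fmean(parallel):.0f} h")
```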

## **Author details**

Jaroslav Menčík

Address all correspondence to: jaroslav.mencik@upce.cz

Department of Mechanics, Materials and Machine Parts, Jan Perner Transport Faculty, University of Pardubice, Czech Republic

## **References**


**Chapter 16**

## **Latin Hypercube Sampling**

## Jaroslav Menčík

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/62370

#### **Abstract**

The simultaneous influence of several random quantities can be studied by the Latin hypercube sampling method (LHS). The values of distribution functions of each quantity are distributed uniformly in the interval (0; 1) and these values of all variables are randomly combined. This method yields statistical characteristics with fewer simulation experiments than the Monte Carlo method. In this chapter, the creation of the randomized input values is explained.

**Keywords:** Probability, Monte Carlo method, Latin Hypercube Sampling, probabilistic transformation, randomization

The Monte Carlo method has two disadvantages. First, it usually needs a very high number of simulations. If the output quantity must be obtained by time-consuming numerical computations, the simulations can last a very long time, and the response surface method is not always usable. Second, it can happen that the generated random values of the distribution function *F* (which serve for the creation of random numbers with nonstandard distributions) are not distributed regularly enough in the definition interval (0; 1). Sometimes, more numbers are generated in one region than in others, and the generated quantity thus has a somewhat different distribution than demanded. This problem can appear especially if the output function depends on many input variables.

© 2016 The Author(s). Licensee InTech. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

A method called Latin Hypercube Sampling (LHS) removes this drawback. The basic idea of LHS is similar to the generation of random numbers via the inverse probabilistic transformation, Equation (3) and Figure 2 in Chapter 15 [1, 2]. The difference is that LHS creates the values of *F* not by generating random numbers dispersed in a chaotic way in the interval (0; 1), but by assigning them certain fixed values. The interval (0; 1) is divided into several layers of the same width, and the *x* values are calculated via the inverse transformation (*F*<sup>–1</sup>) for the *F* values corresponding to the center of each layer. With a reasonably high number of layers (tens or hundreds), the created quantity *x* will have the proper probability distribution. This approach is called stratified sampling. If the output variable *y* depends on several input quantities, *x*<sub>1</sub>, *x*<sub>2</sub>,..., *x*<sub>m</sub>, it is necessary that each quantity is assigned values of all layers and that the layers of the individual variables are randomly combined. This is done by randomly assigning the order numbers of layers to the individual input quantities.
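The stratified sampling and the random combination of layers can be sketched as follows. The layers are assigned by rank-ordering auxiliary random numbers (largest first), the code counts layers from 0, and the two normal input quantities and the output function are hypothetical. The five numbers passed to `rank_layers` at the end are those quoted for *X*<sub>1</sub> in this chapter's worked example.

```python
import random
from statistics import NormalDist, fmean


def rank_layers(numbers):
    """Layer index (0-based) for each trial: rank of its number, largest first."""
    order = sorted(range(len(numbers)), key=lambda k: numbers[k], reverse=True)
    layers = [0] * len(numbers)
    for rank, trial in enumerate(order):
        layers[trial] = rank
    return layers


def lhs_sample(inverse_cdfs, n_layers, rng):
    """Return n_layers trials; each trial holds one value per input variable."""
    columns = []
    for inv in inverse_cdfs:
        layers = rank_layers([rng.random() for _ in range(n_layers)])
        # Value of F at the center of the assigned layer, then x = F^-1(F).
        columns.append([inv((k + 0.5) / n_layers) for k in layers])
    return [list(trial) for trial in zip(*columns)]


# Two hypothetical normal inputs and a hypothetical output y = x1 - x2.
inverse_cdfs = [NormalDist(100.0, 10.0).inv_cdf, NormalDist(50.0, 5.0).inv_cdf]
trials = lhs_sample(inverse_cdfs, 50, random.Random(4))
y = [x1 - x2 for x1, x2 in trials]
mean_y = fmean(y)
print(f"mean of y from 50 LHS trials: {mean_y:.3f}")

# Layers (counted from 1) for the five X1 numbers of the worked example.
print([k + 1 for k in rank_layers([0.56, 0.25, 0.83, 0.17, 0.30])])
# [2, 4, 1, 5, 3]
```

Because every layer of each variable is used exactly once, the sample mean of each input reproduces its distribution mean almost exactly even with few trials.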

**Figure 1.** Principle of the LHS method.

The procedure is as follows. The definition interval of the distribution function *F* of each of *m* variables is divided into *N* layers. *N*, the same for all variables, also corresponds to the number of trials (= simulation experiments). In each trial, the order numbers of layers are assigned randomly to the individual variables (*X*<sub>1</sub>, *X*<sub>2</sub>,..., *X*<sub>m</sub>). In this way, various layers of the individual variables are always randomly combined. In practice, this is achieved by means of random numbers and their rank-ordering. Then, each input variable is assigned the value corresponding to the center of the pertinent layer of its distribution function.

The application is illustrated on a case with four random quantities (*X*<sub>1</sub>, *X*<sub>2</sub>, *X*<sub>3</sub>, and *X*<sub>4</sub>) and the definition interval of *F* divided into five layers (Fig. 1). Only five layers are used here for simplicity; usually, several tens of layers are used. In our case, *Y* will be calculated for five combinations of the four input quantities. Thus, 5 × 4 = 20 random numbers with uniform distribution in the interval (0; 1) are generated (see the table in the left part of Fig. 2). Then, the layer numbers for variable *X*<sub>1</sub> (for example) for the individual trials are assigned with respect to the order of the random values (for *X*<sub>1</sub>) ranked by size from the maximum to the minimum. The numbers in the column for *X*<sub>1</sub> are 0.56, 0.25, 0.83, 0.17, and 0.30. The highest number (0.83) belongs to the third trial, which is therefore assigned layer no. 1; the first trial (0.56) is assigned layer no. 2, the fifth trial (0.30) layer no. 3, the second trial (0.25) layer no. 4, and the fourth trial (0.17) layer no. 5. Similar operations are done for each variable. Thus, in the first trial, variables *X*<sub>1</sub>, *X*<sub>2</sub>, *X*<sub>3</sub>, and *X*<sub>4</sub> are assigned the values corresponding to the second, fourth, second, and fifth layers of their distribution functions, respectively. The inverse probabilistic transformation *F*<sup>–1</sup> is then used for the determination of *X*<sub>1</sub> from *F*<sub>1,1</sub>, etc.; see the table on the right. Now, the investigated quantity *Y* = *Y*(*X*<sub>1</sub>, *X*<sub>2</sub>, *X*<sub>3</sub>, *X*<sub>4</sub>) is calculated five times. The obtained values *Y*<sub>1</sub>, *Y*<sub>2</sub>, *Y*<sub>3</sub>, *Y*<sub>4</sub>, and *Y*<sub>5</sub> can be used for the determination of statistical characteristics (mean, standard deviation,...).

**Figure 2.** LHS method: assignment of layers to individual variables and trials. Left table: random numbers (RN); right table: layer numbers for individual trials.

Usually, several tens or hundreds of trials are made, which enable the construction of the distribution function *F*(*Y*) and the determination of the mean value, standard deviation, various quantiles, and other characteristics.

## **Author details**

Jaroslav Menčík

Address all correspondence to: jaroslav.mencik@upce.cz

Department of Mechanics, Materials and Machine Parts, Jan Perner Transport Faculty, University of Pardubice, Czech Republic

## **References**

[1] Florian A. An efficient sampling scheme: Updated Latin Hypercube Sampling. Probabilistic Engineering Mechanics. 1992; 7: issue 2, 123 – 130.

[2] Olsson A, Sandberg G, Dahlblom G. On Latin hypercube sampling for structural reliability analysis. Probabilistic Engineering Mechanics. 2002; 25: issue 1, 47 – 68.


**Chapter 17**

## **Economy of Reliability**

## Jaroslav Menčík

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/62371

#### **Abstract**


Permanent effort exists to make components and constructions more reliable and longer-lived. Higher reliability can be achieved by better design and more ample dimensioning of load-carrying parts and by better maintenance. However, all this costs money, and one can ask: How is reliability related to the costs? Does an optimum reliability exist from the costs' point of view? How can it be found? This chapter is devoted to the following issues: (1) optimum time for the renewal of objects with gradual deterioration, (2) optimum dimensions of the cross-section of load-carrying components that can fail suddenly (e.g. due to overloading) or due to fatigue or similar processes, (3) optimum probability of failure, and (4) cost-based optimum strategy of the maintenance and renewal of a group of objects.

## **1. Optimum time for the renewal of deteriorating objects**

Examples of deteriorating objects are machines, cars, bridges, cutting tools, pumps, or aircraft. Basically, there are three kinds of costs related to these objects: (1) purchase costs *C*<sub>0</sub>, (2) costs for operation *C*<sub>op</sub>, and (3) costs caused by a failure *C*<sub>f,p</sub>. Their sum gives the total costs *C*<sub>tot</sub>,

© 2016 The Author(s). Licensee InTech. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

$$C_{\text{tot}}(t) = C_0 + C_{\text{op}}(t) + C_{\text{f,p}}(t);\tag{1}$$

*t* denotes the time of operation. The general time course of the costs is depicted in Figure 1. The purchase costs *C*<sub>0</sub> are the money spent on buying, manufacturing, or building the object. *C*<sub>0</sub> is spent at the instant of purchase and remains constant until the failure or replacement by another object. The operation costs *C*<sub>op</sub>(*t*) have several components, such as the costs of energy, fuel, processed material, common maintenance, and small repairs. Basically, these costs should grow approximately linearly with time. For example, fuel is consumed continuously during any car ride. However, as the technical condition of the object gradually worsens, the operation costs start growing faster after some time (the fuel consumption of a worn engine becomes higher, some small parts must be exchanged more often, etc.). The failure-related costs *C*<sub>f,p</sub>(*t*) mean here the probable costs, which can be determined as

$$C_{\text{f,p}}(t) = C_{\text{f,tot}} \times F(t);\tag{2}$$

*C*<sub>f,tot</sub> are the total costs caused by the failure, which consist of the price for the replacement of failed components or parts and additional costs, such as damage to other objects caused by this failure, costs due to the loss of production, and costs related to injuries or casualties. *F*(*t*) means the probability of failure caused by gradual deterioration (see Figure 1 and Chapter 16). When a new object is put into service, this probability equals zero and remains very low for a relatively long part of its life. Later, however, it gradually grows faster and faster due to the worsening condition of the object.

Note: The concept of probable failure-related costs *C*<sub>f,p</sub> and economic optimization of the allowable probability of failure makes sense in cases with many objects involved, where failures can occur more often, but the total managed property is so large that the administrator can easily include the average losses in the expenses. For example, a collapse of a bridge is a very costly event. However, the administrator of a bridge network comprising hundreds of bridges knows that several million Euros will be needed every year for repairs and thus prepares funds for them. On the contrary, a small manufacturer, who owns only one workshop, can become bankrupt if the workshop collapses due to overloading by snow, just because he does not have enough money to build a new one. The total actual loss *C*<sub>0</sub> (e.g. 10<sup>5</sup> €) is incomparably higher than the probable loss *C*<sub>f,tot</sub> × *F* included in the economic optimization, which would make only 1 € for the failure probability *F* = 1:10<sup>5</sup>. All insurance systems are based on the idea of distributing the rarely occurring high losses over many subjects. (This also illustrates the fact that the concept of probable costs *C*<sub>f,p</sub> is sometimes inadequate, and the allowable failure probability must be based on other criteria.)

**Figure 1.** Development of costs with time (a schematic). *C*<sub>0</sub>, purchase costs; *C*<sub>op</sub>, operation costs (*C*<sub>op,id</sub>, ideal case); *C*<sub>f,p</sub>, failure-related probable costs; *C*<sub>tot</sub>, total costs; *t*, time.

The optimum lifetime from the economic point of view is that at which the object attains, during its life, the minimum costs per unit of the demanded production or service [e.g. per 1 km for a vehicle, per one machined part in the case of a cutting tool, per time unit of service (e.g. year) of a bridge, etc.]. The lifetime means the time until the demolition, complete overhaul, or replacement by a new one. Equation (1) can be rewritten as

$$C_{\text{tot},1}(t) = C_{\text{tot}}(t)/t = C_0/t + C_{\text{op}}(t)/t + C_{\text{f,p}}(t)/t.\tag{3}$$

The time course of individual components and of the total unit costs is depicted schematically in Figure 2. One can see that the total unit costs *C*<sub>tot,1</sub> attain a minimum at a certain time *t*<sub>opt</sub>. This is the optimum time for replacement. For all other times, the economy of operation is worse. This can be illustrated by the example of a car. Everybody understands that it will not be economical to buy a new car every year. Similarly, it will not be reasonable to use one car for 50 years or more because of the increasing fuel consumption and the necessity to buy spare parts more often (provided they would be available for such a long time).

The optimum condition for renewal must be understood in a wider sense. From a mathematical point of view, the optimum is exactly the time, for which d*C*tot,1/d*t* = 0, and nowhere else. However, only seldom can a repair be done at just this instant. On the other hand, the curve *C*tot,1(*t*) changes very slowly near the optimum (Fig. 2), and often it does not matter if the reconstruction of a bridge, for example, is done 1 month earlier or later. This makes the planning of maintenance and repairs easier.
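The search for *t*opt in Equation (3) can be sketched numerically. The following is only an illustration under stated assumptions: the quadratic operation-cost model, the Weibull form of *F*(*t*), and all numerical values are hypothetical and are not taken from the text.

```python
import math

def unit_cost(t, c0, c_op_lin, c_op_quad, c_f_tot, eta, beta):
    """Total costs per unit time, Eq. (3): [C0 + C_op(t) + C_f,p(t)] / t."""
    c_op = c_op_lin * t + c_op_quad * t**2        # assumed operation-cost model
    f_t = 1.0 - math.exp(-((t / eta) ** beta))    # assumed Weibull failure probability F(t)
    c_fp = c_f_tot * f_t                          # probable failure costs, Eq. (2)
    return (c0 + c_op + c_fp) / t

def optimum_renewal_time(times, **params):
    """Return (t_opt, minimum unit cost) found by a simple grid search."""
    return min(((t, unit_cost(t, **params)) for t in times), key=lambda p: p[1])

# Illustrative values only (costs in arbitrary money units, time in years).
params = dict(c0=20000.0, c_op_lin=1500.0, c_op_quad=60.0,
              c_f_tot=50000.0, eta=25.0, beta=3.0)
times = [0.1 * k for k in range(1, 401)]          # 0.1 ... 40 years
t_opt, c_min = optimum_renewal_time(times, **params)
print(f"t_opt = {t_opt:.1f} years, unit cost = {c_min:.0f} per year")
```

Evaluating `unit_cost` one year before and after `t_opt` shows how flat the curve is near the minimum, which is the reason why the exact timing of the renewal is not critical.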

## **2. Optimum dimensioning of load-carrying components: sudden failure**

**Figure 2.** Development of unit costs with time. (The symbols are explained in Figure 1.)

Sudden failures occur due to a "weak" component or overloading from an external cause (e.g. snow, wind, flood, or error in operation). The failure can be a fracture, permanent deformation, or collapse of a structure. Failure by overloading can occur at any time. If it happens before the end of the assumed service life, the system must be put back into its original state either by repairing the failed component or by replacing it by a new one. More ample dimensioning of the load-carrying parts means higher safety and lower probability of failure due to overloading, but also a higher price of the component. One can thus seek such dimensions, which will guarantee the best combination of high reliability and low costs.

The influence of the size of the cross-section on the probability of failure and total costs was studied by Menčík [1]. The main results for a bar loaded by tensile force are shown in Figures 3 to 5. The nondimensional scales express the ratio of individual values (*C*tot and *A*) to the reference values *C*\* or *A*\* as a function of the cross-section area. Also, the failure probability *P*f is shown. The individual curves correspond to various magnitudes of the failure-related costs (Fig. 3a) and to the various rates of the cost increase with the cross-section area (Fig. 3b).

Typical is the fast increase of costs with the decreasing cross-section in the left part of the diagram. For small cross-sections, the probability of failure increases and the failure-related costs are no longer negligible. The cost growth after the minimum has been reached is usually much slower. The analysis for various cases has also revealed that the economically optimal failure probability is sometimes relatively high (e.g. 10<sup>–3</sup> or even more). This is acceptable if the costs caused by a failure are low. For higher failure losses, the optimum failure probability will be lower. It is thus important that all possible losses be included in *C*<sub>f,a</sub>. A relatively high number of failures of some product can even cause the loss of customer confidence, which can lead to the bankruptcy of the manufacturer. Another example is the disaster of the space shuttle Challenger in 1986 due to the failure of a sealing ring. The price of such a sealing ring is minute. If, however, the price of a destroyed space shuttle is added to it, the total costs and *P*<sub>f,opt</sub> will be quite different. Differences in optimum dimensions can also arise if big differences in the probable failure-related losses exist (e.g. due to the different financial compensation for accident victims in various countries).

**Figure 3.** Total costs *C*<sub>tot</sub> and failure probability *P*<sub>f</sub> as functions of the cross-section area *A* (a schematic, after [1]). Curves 1 to 3 in (a) correspond to increasing failure costs and curves 4 to 6 in (b) correspond to increasing costs per unit of the cross-section area. The scales for costs and the cross-section area are standardized, and the scale for *P*<sub>f</sub> is logarithmic.


Based on the same data as above, Figure 4 shows the total costs *C*<sub>tot</sub> as a function of the failure probability *P*<sub>f</sub>. Obvious is the fast increase of the costs for higher values of *P*<sub>f</sub> due to the growing probable costs *P*<sub>f</sub>(*C*<sub>0</sub> + *C*<sub>f,a</sub>). The probable failure-related costs are usually negligible for *P*<sub>f</sub> < 10<sup>–6</sup>, but significant for *P*<sub>f</sub> > 10<sup>–1</sup>. It is also obvious that the probability of failure, corresponding to the minimum total costs, is relatively high. Sometimes, other criteria are thus decisive rather than the costs.

These examples assumed that the cross-section area can change continuously. The reality is often more complex: the cross-section of standardized components changes stepwise, and so also the price. The computations should thus be done for all possible nominal cross-sections and arrangements, and the variant with minimum total costs can be chosen.
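The comparison of nominal cross-sections can be sketched as follows. This is a minimal illustration, not the computation from [1]: it assumes normally distributed load and strength, a linear purchase-cost model, and total costs of the form *C*0 + *P*f(*C*0 + *C*f,a); all numbers are hypothetical.

```python
from statistics import NormalDist

def failure_probability(area, mu_load, sd_load, mu_strength, sd_strength):
    """P_f of a tensile bar: probability that the resistance R*A is below the load L."""
    mu_g = area * mu_strength - mu_load
    sd_g = ((area * sd_strength) ** 2 + sd_load ** 2) ** 0.5
    return NormalDist().cdf(-mu_g / sd_g)

def total_cost(area, c_fix, c_per_area, c_fail_add, **load_strength):
    """Return (C_tot, P_f) with C_tot = C0 + P_f * (C0 + C_f,a)."""
    c0 = c_fix + c_per_area * area                 # assumed purchase-cost model
    p_f = failure_probability(area, **load_strength)
    return c0 + p_f * (c0 + c_fail_add), p_f

# Hypothetical input data: load in N, strength in MPa, areas in mm^2.
data = dict(mu_load=100e3, sd_load=15e3, mu_strength=400.0, sd_strength=30.0)
costs = dict(c_fix=50.0, c_per_area=0.2, c_fail_add=20000.0)
nominal_areas = [300, 350, 400, 450, 500, 600, 700, 800]   # stepwise nominal sizes

results = {a: total_cost(a, **costs, **data) for a in nominal_areas}
best = min(results, key=lambda a: results[a][0])
for a, (c, p) in results.items():
    print(f"A = {a:4d} mm^2  C_tot = {c:9.1f}  P_f = {p:.2e}")
print("variant with minimum total costs:", best, "mm^2")
```

Because the sizes change stepwise, the minimum is simply picked from the list of nominal variants rather than found by differentiation.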

**Figure 4.** Total costs *C*<sub>tot</sub> as functions of failure probability *P*<sub>f</sub> [1]. Curves 1 to 3 correspond to increasing failure costs and curve 7 shows the probable failure costs *P*<sub>f</sub>. The curves correspond to the same input values as in Figure 3a.

## **3. Optimum dimensioning of components with stress-enhanced deterioration**

Examples of gradual deterioration are fatigue or wear of metallic parts as well as corrosion, creep at increased temperature, or carbonation of concrete. In these cases, the failure occurs after some time of operation depending on the load. Thus, the costs must be related to the time to failure or replacement of the component *T*<sub>f</sub>, and the cost-effectiveness is evaluated according to the unit costs (i.e. the costs per unit time of operation),

$$C_{\text{tot},1}(t) = C_{\text{tot}}/T_{\text{f}}.\tag{4}$$

The time to failure depends on the load effect *S*, which is the stress amplitude in cyclical loading, the force acting on a ball bearing, or the intensity of creep or corrosive environment. In the simplest case, the lifetime *T*<sub>f</sub> decreases with increasing *S* according to the formula:

$$T_{\text{f}} = B S^{-m};\tag{5}$$

*B* and *m* are material constants. Equation (5) is known as *S – N* curve for fatigue, with *S* denoting the stress amplitude. However, as shown in Chapter 6, Equation (5) can also be used for the prediction of the time to failure of a body with a slow crack growth or the endurance of components exposed to creep at high temperatures, the lifetime of bearings, as well as several other cases.

The load effect *S* depends on the load *L* and the size of the cross-section under load as

$$S = L/Z,\tag{6}$$

where *L* is the amplitude of characteristic load (tensile force or bending moment) and *Z* is the characteristic parameter of the cross-section (the area for tension or section modulus for bending). The expected lifetime *T*<sub>f</sub> may thus be expressed as

$$T_{\text{f}} = B\left(Z/L\right)^{m},\tag{7}$$

and can be ensured by proper dimensioning (*Z*).
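As a small numerical illustration of Equation (7), the sketch below compares how the expected lifetime of a solid circular shaft responds to a 10% larger diameter in tension and in bending. The function names and the values of *B*, *L*, and *m* are hypothetical; the formulas for the area and the section modulus of a solid circular section are standard.

```python
import math

def section_parameter(d, mode):
    """Characteristic parameter Z of a solid circular cross-section of diameter d."""
    if mode == "tension":
        return math.pi * d**2 / 4      # cross-section area
    if mode == "bending":
        return math.pi * d**3 / 32     # section modulus
    raise ValueError("mode must be 'tension' or 'bending'")

def time_to_failure(d, load, b_const, m, mode):
    """Eq. (7): T_f = B (Z/L)^m."""
    return b_const * (section_parameter(d, mode) / load) ** m

m = 3.0                                 # hypothetical fatigue exponent
for mode in ("tension", "bending"):
    ratio = (time_to_failure(22.0, 1.0e4, 1.0e6, m, mode)
             / time_to_failure(20.0, 1.0e4, 1.0e6, m, mode))
    print(f"{mode}: 10% larger diameter multiplies T_f by {ratio:.2f}")
```

The ratio depends only on the exponent: 1.1^(2*m*) in tension and 1.1^(3*m*) in bending, independent of *B* and *L*.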

The relationships between cross-section size, costs, and time to failure may be used for studying the influence of the cross-section size on the cost-effectiveness. Figures 5 and 6 show the development of unit costs and time to failure as the function of the diameter of a shaft loaded in tension (Fig. 5) and bending (Fig. 6a,b). Also, these diagrams can be plotted in nondimensional coordinates.

**Figure 5.** Unit costs *C*<sub>1</sub> and time to failure *T*<sub>f</sub> as functions of shaft diameter *D* (a schematic, [1]). Component loaded in tension, fatigue exponent *m* = 3.0.

Several important conclusions can be drawn from this analysis. An optimum with minimum unit costs exists only in some cases. Sometimes, it lies outside the suitable interval of service times or acceptable dimensions of the cross-section. An important role is played by the kind of loading and the fatigue exponent *m*. The situation for tensile loading is better: for the common values of the fatigue exponent in metals (2 < *m* < 6), an optimum often exists (Fig. 5). For bending, an increase of the characteristic cross-section dimension (e.g. diameter *D*) causes an increase of the cross-section area, *A* ~ *D*<sup>2</sup>, and also a faster increase of the section modulus (*Z* ~ *D*<sup>3</sup>) and thus a much faster growth of useful life (*T*<sub>f</sub> ~ *Z*<sup>*m*</sup>). As a consequence, any enlargement of the cross-section usually leads to the reduction of unit costs *C*<sub>1</sub> (Fig. 6a); the exception is low values of the fatigue exponent, *m* < 1.7, where a cost minimum also exists (Fig. 6b).

**Figure 6.** Unit costs *C*<sub>1</sub> and time to failure *T*<sub>f</sub> as functions of shaft diameter *D* [1]. Bending load, fatigue exponent *m* = 3.0 (a) and *m* = 1.7 (b).

Because of the many factors involved, it is impossible to formulate simple universal rules for finding the optimum size. A more practical way is to model the situation for admissible ranges of the input quantities. A graphical representation is very useful. Sometimes, it becomes obvious that the unit costs vary only monotonically or insignificantly in the possible range of input variables, which can make the choice of optimum dimensions easier.

## **4. Cost-based maintenance optimization of long-life objects**

Examples of such objects are bridges, cooling towers, locomotives, or heavy machines in power plants. Two cases can be distinguished: maintenance optimization of a single object, and of a group of similar objects, such as bridges in a railway or road network.

#### **Single object**

The optimization tries to minimize the total costs spent from putting the object into operation until its replacement or reconstruction. The total costs *C* in the period *T* are [1 – 4]:

$$C = C_{\text{i}} + C_{\text{m}} + C_{\text{r}} + C_{\text{f,p}} + C_{\text{u}} + C_{\text{a}} - V_{\text{s}};\tag{8}$$

*C*<sub>i</sub> = inspection costs, *C*<sub>m</sub> = maintenance costs, *C*<sub>r</sub> = repair costs, *C*<sub>u</sub> = user costs, *C*<sub>a</sub> = additional costs, *V*<sub>s</sub> = salvage value of the object at the end of the considered period, and *C*<sub>f,p</sub> = probable failure costs, as defined by Equation (2). These include the costs due to a collapse of the object (e.g. a bridge) or its closing if the collapse is imminent. Also, the expenses of users *C*<sub>u</sub> should be included (delays and the necessity to use longer alternative routes).

Various strategies can be used for organizing the maintenance and repairs (e.g. no repair until the replacement, or regular maintenance plus small repairs), which would enable a longer time of operation until replacement. The optimization consists of comparing the costs *C* for several variants and choosing that with the lowest possible costs. For each variant, the total costs *C*, Equation (8), are calculated for the time interval considered (e.g. 5 or 20 years), and the costs for individual variants are rank-ordered and compared (Fig. 7). As some of the input data are only estimated, it is recommended to make several estimates for each maintenance strategy: with optimistic, probable, and pessimistic input values.

**Figure 7.** Comparison of total costs for six variants (an example).
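The comparison of variants according to Equation (8) can be sketched as below. The two variants correspond to the strategies named in the text, but every cost figure is hypothetical and serves only to show the bookkeeping with optimistic, probable, and pessimistic estimates.

```python
def total_cost(c_i, c_m, c_r, c_fp, c_u, c_a, v_s):
    """Eq. (8): C = C_i + C_m + C_r + C_f,p + C_u + C_a - V_s."""
    return c_i + c_m + c_r + c_fp + c_u + c_a - v_s

# Hypothetical estimates per variant: (C_i, C_m, C_r, C_f,p, C_u, C_a, V_s)
variants = {
    "V1": {  # no repair until the replacement
        "optimistic":  (5, 0, 0, 40, 10, 2, 5),
        "probable":    (5, 0, 0, 70, 15, 3, 3),
        "pessimistic": (5, 0, 0, 120, 25, 5, 0),
    },
    "V2": {  # regular maintenance plus small repairs
        "optimistic":  (8, 20, 10, 5, 4, 2, 15),
        "probable":    (8, 25, 15, 10, 6, 3, 12),
        "pessimistic": (8, 30, 25, 20, 10, 5, 8),
    },
}

totals = {name: {case: total_cost(*vals) for case, vals in cases.items()}
          for name, cases in variants.items()}
ranking = sorted(totals, key=lambda name: totals[name]["probable"])
for name in ranking:
    print(name, totals[name])
```

Ranking by the probable estimate is one possible choice; the optimistic and pessimistic totals show how robust the ranking is against uncertain inputs.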

#### **Maintenance optimization for a group of objects**

In an ideal case, the optimum variant for maintenance of each object in the network is found as described above. In real life, however, various constraints must often be respected. Very important is the amount of money available for maintenance and repairs in the individual years. It could happen that more bridges should be reconstructed simultaneously than the budget would permit, whereas, in other years, the working capacities for repairs would not be fully used. Thus, it is necessary to calculate the cost components for every bridge and maintenance variant for every year in the considered time interval, in which the renewal should be optimized (e.g. 20 years). These costs can be written into a table with the columns corresponding to years and the rows to individual bridges and maintenance variants (Fig. 8) and compared with the money available. A comparison of all variants reveals the best strategy. In addition to the demand of uniform flow of money for maintenance and repairs, several other factors must be considered when prioritizing the objects (e.g. bridges) for repairs:


The importance of the object in the whole network or in some region.


**Figure 8.** Example of cost matrix for seven bridges during the period 2015 to 2035. Br – bridge; Vj – maintenance variant.

For these reasons, maintenance optimization is sometimes divided into three steps [1, 3, 4]:

Step 1. Condition rating prioritization. All objects are ranked according to their condition as revealed by inspections. Only those with condition worse than a certain value will be considered for the maintenance in the near future. This preselection significantly reduces the number of candidates for further steps.

Step 2. Object importance prioritization. The role of individual objects in the network is considered. One bridge can be in a worse condition than another bridge. However, if it lies on an unimportant road, whereas the other is on a main road, the latter will be given priority for repair.

Step 3. Optimization of money allocation. The possible maintenance strategies are compared with respect to the available money and the work capacities. The strategy that is economically most favorable for the whole stock over the longer period is then chosen.
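The three steps can be illustrated by a deliberately simplified sketch. The rating scales, the `Bridge` record, and the greedy funding rule used for Step 3 are assumptions made only for this example; a real Step 3 compares whole maintenance strategies over many years, as described above.

```python
from dataclasses import dataclass

@dataclass
class Bridge:
    name: str
    condition: int       # assumed scale: 1 = as new ... 7 = critical
    importance: int      # assumed scale: higher = more important road
    repair_cost: float   # hypothetical cost in million EUR

def prioritize(bridges, condition_limit, budget):
    # Step 1: keep only objects whose condition is at or beyond the limit.
    candidates = [b for b in bridges if b.condition >= condition_limit]
    # Step 2: more important objects first; worse condition breaks ties.
    candidates.sort(key=lambda b: (-b.importance, -b.condition))
    # Step 3 (greatly simplified): fund in priority order while money lasts.
    selected, remaining = [], budget
    for b in candidates:
        if b.repair_cost <= remaining:
            selected.append(b.name)
            remaining -= b.repair_cost
    return selected, remaining

stock = [
    Bridge("Br1", condition=6, importance=1, repair_cost=4.0),
    Bridge("Br2", condition=5, importance=3, repair_cost=6.0),
    Bridge("Br3", condition=2, importance=3, repair_cost=1.0),
    Bridge("Br4", condition=7, importance=2, repair_cost=5.0),
]
selected, remaining = prioritize(stock, condition_limit=5, budget=12.0)
print("repair this year:", selected, "| unspent budget:", remaining)
```

In this example, Br2 on the main road is funded before Br1, although Br1 is in a worse condition, and Br3 is not considered at all because its condition is still good.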

## **5. Time value of money**

In long-term planning, one must be aware of the difference between spending money today and in the future. Due to interest, the value of (suitably deposited) money gradually increases, so that today's value *V* will, in *n* years, correspond to *V*(1 + *r*)<sup>*n*</sup>, where *r* is the interest rate. Thus, when the total costs during a long period are calculated as a sum of expenses arising at various times, the values of individual components should be converted to the same time base, usually to the time *T*<sub>0</sub> when the study is made. The common formula for the conversion of the *j*-th component *C*<sub>*j*,*T*<sub>i</sub></sub> paid at time *T*<sub>i</sub> is

$$C_{j,T_0} = C_{j,T_i}\,\frac{1}{(1+r)^{T_i-T_0}}.\tag{9}$$

However, due to inflation, the prices of material, components, and work gradually grow, and the gain from postponing an investment is smaller. If the inflation rate is close to the interest rate, the profit can be negligible, and the standardization (9) is not necessary for the comparison of individual variants.
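The conversion (9) and the effect of inflation can be sketched as follows. The function `real_rate` is an assumption added for illustration (it combines interest and inflation into one effective rate and is not a formula from the text), and the expenses are hypothetical.

```python
def present_value(cost, rate, t_paid, t_base=0.0):
    """Eq. (9): value at time t_base of a cost paid at time t_paid."""
    return cost / (1.0 + rate) ** (t_paid - t_base)

def real_rate(interest, inflation):
    """Assumed helper: effective rate when prices grow with inflation."""
    return (1.0 + interest) / (1.0 + inflation) - 1.0

# Hypothetical repair expenses: (year of payment, cost in today's prices)
expenses = [(0, 100.0), (10, 100.0), (20, 100.0)]

for interest, inflation in [(0.04, 0.00), (0.04, 0.04)]:
    r = real_rate(interest, inflation)
    total = sum(present_value(c, r, t) for t, c in expenses)
    print(f"interest {interest:.0%}, inflation {inflation:.0%}: total = {total:.1f}")
```

When the inflation rate equals the interest rate, the effective rate is zero and the discounted total equals the plain sum of the expenses, so the conversion no longer changes the comparison of variants.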

## **Author details**

#### Jaroslav Menčík

Address all correspondence to: jaroslav.mencik@upce.cz

Department of Mechanics, Materials and Machine Parts, Jan Perner Transport Faculty, University of Pardubice, Czech Republic

## **References**


## **Reliability Assessment with a Small Amount of Data**

## Jaroslav Menčík

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/62372

#### **Abstract**

This chapter describes various methods for reduction of uncertainties in the determination of characteristic values of random quantities (quantiles of normal and Weibull distribution, tolerance limits, linearly correlated data, interference method, Monte Carlo method, bootstrap method).

**Keywords:** Random quantity, uncertainty, normal distribution, Weibull distribution, tolerance limits, correlation, interference method, Monte Carlo method, bootstrap method

Many quantities necessary for reliability assessment are obtained by observation or measurement (load and material characteristics). Often, the number of tests *n* is low. As a consequence, the distribution parameters are only estimated. Their true values can be different and thus also the other characteristics, such as quantiles. This could be dangerous especially if *n* is very low (tens or less). Uncertainty arising from a small amount of data should be taken into consideration in any reliability assessment. This chapter presents four methods that can increase the safety of these assessments. The first method is related to the determination of the guaranteed lowest or highest values (i.e. low probability quantiles of normal or Weibull distribution). The second method tries to mitigate the uncertainties related to the use of the *S*–*R* interference method. The third method is devoted to the uncertainties of the Monte Carlo simulations if the input distributions were obtained from a small amount of measurements. The last method explains briefly the principle of the so-called bootstrap technique.

## **1. Guaranteed values of quantiles**

Guaranteed or safe value of a random quantity *x* is such a value that will be exceeded (e.g. load) or not attained (strength) only with a very low probability *α*. This value corresponds to the *α*-quantile in the latter case and to the *α*-critical value in the former case; note that the *α*-critical value corresponds to the (1 – *α*)-quantile.

A quantile can be calculated as the inverse value of the distribution function. However, because usually only the estimates of the parameters of a population are known, only an estimate of the quantile is obtained in this way. Its true value can be different. As a consequence, in some cases, the actual reliability will be lower than assumed, which is dangerous. The difference can be high especially if the statistical parameters of the population were obtained from a very small number of samples. For this reason, confidence interval should always be determined for the quantiles. Two very important cases are normal and Weibull distribution.

#### **Quantiles of normal distribution**

The *α*-quantile of a quantity *x* is calculated as

$$x_{\alpha} = \mu + u_{\alpha}\sigma;\tag{1}$$

*μ* and *σ* are the mean and standard deviation and *u*<sub>α</sub> is the *α*-quantile of the standard normal distribution. The uncertainty in the determination of the distribution parameters *μ* and *σ* can be reduced by means of confidence limits for the quantile. For quantities with normal distribution, tolerance limits *x*<sub>α,tol</sub> are usually used instead. The lower (upper) tolerance limit is the value below (above) which only the fraction *α* of the population lies, the risk of a wrong estimate being *γ*. The tolerance limits are determined using the formula [1, 2]

$$x_{\alpha,\text{tol}} = \bar{x} \pm k_{\alpha} s;\tag{2}$$

*x̄* and *s* are the sample average and standard deviation, and *k*<sub>α</sub> is the one-sided tolerance factor, which depends on *α*, the number of measurements *n*, and the risk *γ* that the prediction will be wrong. The plus sign pertains to the upper limit, whereas the minus sign pertains to the lower limit.

The difference between the values obtained using Eq. (1) or (2) is large especially for low *n*, *α*, and *γ*. For example, the 0.1%-quantile according to (1) is *μ* – 3.09*σ*, while the 0.1% lower tolerance limit (for *γ* = 10%) is *x̄* – 3.44*s* for *n* = 100, *x̄* – 3.79*s* for *n* = 30, and *x̄* – 4.63*s* for *n* = 10. If, for example, strength tests were made on only 10 specimens, and the standard deviation was obtained as *s* = 0.1 of the average strength, then the strength guaranteed with 99.9% probability is lower by (4.63 – 3.09)*s* ≈ 0.15, i.e. by 15% of the average strength, than the less conservative value according to Eq. (1).

The tolerance factors *k* can be found in statistical tables, e.g. [2]. Their exact determination is difficult. An approximate formula was proposed by Wallis [1, 3]:

$$k_{\alpha} = \left( u_{\alpha} + u_{\gamma} \sqrt{\frac{1}{n} + \frac{u_{\alpha}^2}{2\left(n-1\right)} - \frac{u_{\gamma}^2}{2n\left(n-1\right)}} \right) \Bigg/ \left( 1 - \frac{u_{\gamma}^2}{2\left(n-1\right)} \right),\tag{3}$$

where *u*<sub>α</sub> and *u*<sub>γ</sub> denote the *α*- and *γ*-critical values of the standard normal distribution. The approximation (3) coincides with the exact solution for *n* ≥ 500, and the difference slightly increases with decreasing sample size. For the confidence level 1 – *γ* = 0.9, the difference exceeds 1% for *n* < 20 and 2.5% for *n* < 10 regardless of the value of *α*.
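The approximation (3) is easy to evaluate. The sketch below uses only the Python standard library; the function name `tolerance_factor` is chosen for this example, and the critical values are obtained from the inverse of the standard normal distribution function.

```python
from math import sqrt
from statistics import NormalDist

def tolerance_factor(alpha, gamma, n):
    """One-sided tolerance factor k_alpha by the approximate formula (3)."""
    u_a = NormalDist().inv_cdf(1.0 - alpha)   # alpha-critical value
    u_g = NormalDist().inv_cdf(1.0 - gamma)   # gamma-critical value
    root = sqrt(1.0 / n + u_a**2 / (2 * (n - 1)) - u_g**2 / (2 * n * (n - 1)))
    return (u_a + u_g * root) / (1.0 - u_g**2 / (2 * (n - 1)))

for n in (10, 30, 100):
    print(n, round(tolerance_factor(0.001, 0.10, n), 2))
```

For *α* = 0.1% and *γ* = 10%, the printed values can be compared with the tabulated factors 4.63, 3.79, and 3.44 quoted above; the approximation should lie slightly below them, with the largest deviation for *n* = 10.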

This method can also be used for quantities with log-normal distribution. They can be transformed (by logarithms) into variables with normal distribution. Then, tolerance limits for quantiles of this new quantity can be calculated using the above approach. Finally, the tolerance limits for the original quantity are found by inverse transformation.

#### **Quantiles of three-parameter Weibull distribution**

Weibull distribution has a very flexible character and is used very often to characterize the strength or time to failure. The *α*-quantile *xα* of a three-parameter Weibull distribution is determined as

$$x_{\alpha} = x_0 + a\left[-\ln\left(1-\alpha\right)\right]^{1/b},\tag{4}$$

where *a* and *b* are the scale and shape parameters of the distribution, respectively, and *x*<sub>0</sub> is the parameter of its position (threshold value). Again, only estimates of the true parameters can be obtained from a sample of limited size and thus also only an estimate of the quantile *x*<sub>α</sub>. Especially for small sample sizes (*n* equal to several tens or less), the sample parameters can differ significantly from those of the population [4]. Sometimes, the threshold value *x*<sub>0</sub> is assumed zero for safety reasons. However, this can yield unreasonably low values of low probability quantiles. The three-parameter distribution can be better, but confidence limits should always be given with quantiles, especially for small samples and low probability quantiles. These limits (L – lower; U – upper) can be computed as

$$x_{\alpha,\text{L,U}} = x_{\alpha} \pm \Delta_{\alpha,\gamma,n},\tag{5}$$

where *∆*<sub>α,γ,n</sub> is a certain function of the distribution shape, scatter of the measured values, probability *α*, confidence level 1 – *γ*, and number of measurements *n*. Mann et al. [5] proposed a method for the determination of the confidence limits. Unfortunately, the procedure is complicated and tabulated values must be used. Menčík [4] has proposed a simple approximate expression for *∆* based on the variation of the position and slope of the distribution function:

$$\Delta = \sqrt{t_{\gamma,n-1}^2\,\frac{s_x^2}{n} + \left[1 - \left(\frac{2n}{\chi_{\gamma,2n}^2}\right)^{1/b}\right]^2 \left(x_{\alpha} - \bar{x}\right)^2},\tag{6}$$

where *x̄* and *s*<sub>x</sub> are the sample average and standard deviation, respectively; *n* is the number of measurements from which the distribution parameters were estimated; *t*<sub>γ,n–1</sub> is the one-sided *γ*-critical value of the Student's distribution for *n* – 1 degrees of freedom; *b* is the shape parameter; and *χ*<sup>2</sup><sub>γ,2n</sub> is the *γ*-quantile of the *χ*<sup>2</sup>-distribution for 2*n* degrees of freedom. The probability that the true value of *x*<sub>α</sub> will be lower than the lower confidence limit is *γ*. The use of confidence intervals for quantiles may strongly be recommended if *n* < 100.
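Equations (4) to (6) can be evaluated as in the following sketch. The critical values of the *t*- and *χ*<sup>2</sup>-distributions are passed in as arguments, because they must be taken from statistical tables (the two numbers used below are common table values for *γ* = 0.10 and *n* = 30); the Weibull parameters and sample statistics are hypothetical.

```python
from math import log, sqrt

def weibull_quantile(alpha, a, b, x0=0.0):
    """Eq. (4): alpha-quantile of the three-parameter Weibull distribution."""
    return x0 + a * (-log(1.0 - alpha)) ** (1.0 / b)

def half_width(x_alpha, x_mean, s_x, n, b, t_crit, chi2_quant):
    """Eq. (6); t_crit = t_{gamma,n-1} and chi2_quant = chi^2_{gamma,2n} from tables."""
    position = t_crit**2 * s_x**2 / n
    slope = (1.0 - (2.0 * n / chi2_quant) ** (1.0 / b)) ** 2 * (x_alpha - x_mean) ** 2
    return sqrt(position + slope)

# Hypothetical strength data (MPa): estimated parameters and sample statistics.
a, b, x0 = 300.0, 2.5, 200.0
x_mean, s_x, n = 466.0, 114.0, 30
t_crit, chi2_quant = 1.311, 46.459      # table values for gamma = 0.10, n = 30

x_q = weibull_quantile(0.01, a, b, x0)
delta = half_width(x_q, x_mean, s_x, n, b, t_crit, chi2_quant)
print(f"1% quantile = {x_q:.1f}, lower confidence limit = {x_q - delta:.1f}")
print(f"1% quantile with x0 assumed zero = {weibull_quantile(0.01, a, b):.1f}")
```

The last line illustrates the remark above: setting the threshold *x*0 to zero while keeping the other parameters gives a much lower value of the low probability quantile.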

#### **Linearly correlated data**

If fatigue or wear processes occur, the relationship between characteristic load *S* and cycles to failure *N* can usually be described by a power-law function:

$$N = A S^{-m},\tag{7}$$

where *A* and *m* are constants. This equation corresponds to a straight line

$$y = a + bx\tag{8}$$

in coordinates *x* = log *S* and *y* = log *N*. The constants *a* = log *A* and *b* = – *m* are obtained by testing several specimens under various stress levels *S*, measuring the cycles to failure *N*, and fitting the transformed *y*(*x*) data by linear regression function. Typical of fatigue is the large scatter of times to failure. This must be taken into account when determining the guaranteed time to failure for a given service stress or the allowable stress for the required lifetime.

The situation is easier if 10 or more specimens were used for each stress level: the pertinent *N*-values in each level can be ordered so that they (approximately) correspond to quantiles. For example, if 10 specimens were tested at a stress level, then the lowest value corresponds to the 10% quantile, the second lowest corresponds to the 20% quantile, etc. Then, the *S*–*N* curves may be constructed for various probabilities of survival by fitting only the pertinent quantiles. This is the best solution. Unfortunately, fatigue tests are time-consuming and costly, so that often only several specimens are tested and Equation (8) is fitted to all data, thus representing the mean line. In this case, there is a 50% probability that the true times to failure will be lower than those predicted via this line. Therefore, confidence intervals are also needed. The confidence interval for the points on the regression line (Fig. 1) is

$$y = a + bx \pm t_{\alpha,\nu}\, s_{\text{res}} \sqrt{\frac{1}{n} + \frac{\left(x - \bar{x}\right)^2}{\left(n-1\right)s_x^2}},\tag{9}$$

where *x̄* and *s*<sub>x</sub> are the average value and standard deviation of *x*, calculated from all *n* values *x*<sub>j</sub> used for the determination of the regression constants *a* and *b*; *t*<sub>α,ν</sub> is the one-sided *α*-critical value of the *t*-distribution for *ν* = *n* – 2 degrees of freedom, and

$$s_{\text{res}} = \sqrt{\frac{\sum \left(y_j - a - bx_j\right)^2}{n-2}}\tag{10}$$

is the residual standard deviation, characterizing the scatter around the regression line; the summation is done over all measured values of *y*<sub>j</sub>.

**Figure 1.** Confidence interval around regression curve (a schematic).

The modification of Equation (9), as proposed in [4], gives the approximate expression for tolerance limits for single values of *y*,

$$y = a + bx \pm t_{\alpha,\nu}\, s_{\text{res}} \sqrt{\left(\frac{k_{\alpha}}{t_{\alpha,\nu}}\right)^2 + \frac{\left(x - \bar{x}\right)^2}{\left(n-1\right)s_x^2}};\tag{11}$$

the minus (plus) sign pertains to the lower (upper) limit. In Equation (11), *kα* is a one-sided tolerance factor, which can be calculated using Equation (3). The probability that *y*(*x*) will be lower or higher than the tolerance limit (11) equals *α*, whereas the probability that this estimate is wrong is *γ*.

The intervals (11) for all *y*(*x*) form a tolerance band around the regression line (9). The tolerance limits for the actual number of cycles (or time) to failure can be obtained using the inverse transformations *S* = 10*<sup>x</sup>* , *N* = 10*<sup>y</sup>* , *A* = 10*<sup>a</sup>* , and *m* = – *b*.

### **2. Interference method for normally distributed stress and strength**

The interference method, suitable if random "load" acts on an "object" whose "resistance" also exhibits random scatter, was explained in Chapter 14. Failure occurs if the load effect *S* is higher than the resistance *R*. If the distributions of *R* and *S* interfere, the distribution of reliability margin *G = R – S* can be found, and the probability of failure is determined as the value of the distribution function for *G* = 0. The solution is easy if *R* and *S* are normally distributed and their parameters are known, because here the distribution of *G* is also normal, with parameters

$$
\mu_G = \mu_R - \mu_S, \qquad \sigma_G = \sqrt{\sigma_R^2 + \sigma_S^2}; \tag{12}
$$

*μ* and *σ* are the mean value and standard deviation; the subscript denotes the pertinent quantity. *G* can be transformed to standard normal variable *u* using the relationship

$$
G = \mu_G + u\,\sigma_G. \tag{13}
$$

Using the failure condition, *G* = 0, the probability of failure *P<sub>f</sub>* can be found as the value of the standard normal distribution function for *u* = –*μ<sub>G</sub>*/*σ<sub>G</sub>*.
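As a minimal sketch of Equations (12) and (13) with known parameters, the probability of failure can be evaluated with the standard normal distribution function. The function name and the numerical values (strength 500 ± 30 MPa, load effect 400 ± 40 MPa) are assumptions made only for this illustration.

```python
import math
from statistics import NormalDist


def failure_probability(mu_r, sigma_r, mu_s, sigma_s):
    """P_f for normally distributed resistance R and load effect S."""
    mu_g = mu_r - mu_s                            # Eq. (12), mean of G = R - S
    sigma_g = math.sqrt(sigma_r**2 + sigma_s**2)  # Eq. (12), std. deviation of G
    u = -mu_g / sigma_g                           # Eq. (13) with G = 0
    return NormalDist().cdf(u)


# Hypothetical values in MPa
pf = failure_probability(500.0, 30.0, 400.0, 40.0)
print(f"P_f = {pf:.5f}")
```

This calculation treats the parameters as exactly known; with estimates from small samples it would be too optimistic, which is the reason for the tolerance factor introduced next.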

However, instead of the parameters *μ* and *σ* of the distributions of *R* and *S*, usually only their estimates *x̄* and *s*, obtained from samples of limited sizes *n<sub>R</sub>* and *n<sub>S</sub>*, are inserted into Eq. (13). As a consequence, one obtains only the estimates *x̄<sub>G</sub>* and *s<sub>G</sub>* of the parameters of the reliability margin. In such a case, it is necessary to use the one-sided tolerance factor *k* instead of the quantile *u* in Eq. (13). Otherwise, the actual probability of failure can differ from the forecasted one by more than one order of magnitude [6, 7]. The tolerance factor *k* should be determined for the confidence level *γ* of the forecast and for the equivalent size *n<sub>G</sub>* of the sample *G*, which must be calculated from the sample sizes and standard deviations of the samples *R* and *S* via the relationship

$$
\mathcal{G}\_- = \begin{array}{c} \mu\_{\mathcal{G}\_-} \ + \ \imath \sigma\_{\mathcal{G}\_-} \end{array} \tag{14}
$$

The probability of failure is found as that corresponding to the lower tolerance limit

$$k = \overline{x}_G / s_G. \tag{15}$$

When dimensioning for a given probability of failure, *k* is determined first via Eq. (3). For this value, *x̄<sub>G</sub>* is calculated from Eq. (15). Finally, the nominal size *x* of the cross-section is determined from *x̄<sub>G</sub>* and the known mean value of the load using Eq. (12). Diagrams for this purpose are given in [6, 7].

#### **3. Monte Carlo method**

If the investigated quantity *x* is a function of random input variables, its distribution can be obtained easily by the simulation Monte Carlo method (Chapter 15). In the simulation trials, random values are assigned to input quantities so that their distribution corresponds to the probability distribution of the pertinent variable. However, the distribution parameters used in the simulations were obtained from samples of limited size. This means that the actual distributions can differ more or less from those used in the simulation. The corresponding uncertainties and errors persist in the results regardless of the number of Monte Carlo trials but can be reduced in two ways.

The first approach [8] uses random variation of distribution parameters. If the distributions of parameters of a random quantity *x* are known, this quantity can be generated so that, in each trial, random values are also assigned to its parameters, so that they vary randomly in their probable range. For example, the random quantity *x* of normal distribution with the known mean value *μ* and standard deviation *σ* can be generated using formula (1) with *uα* replaced by the random value *u* of the standard normal distribution. If, instead of true parameters *μ* and *σ*, only the sample estimates *m* and *s* are known, the probable values of *μ* and *σ* can be generated in individual trials using modified expressions for their confidence limits. The corresponding formula for random values of *x* is

$$x = m + t_{n-1} \frac{s}{\sqrt{n}} + u\,s \sqrt{\frac{n-1}{\chi_{n-1}^{2}}}; \tag{16}$$

*u*, *t*, and *χ*<sup>2</sup> are random values of normal, *t*, and *χ*<sup>2</sup> distribution, respectively (*t*, *χ*<sup>2</sup> for *n* – 1 degrees of freedom). One value of *x* thus needs three random numbers to be generated. The quantiles of *t* and *χ*<sup>2</sup> distributions can be expressed approximately by means of quantiles of standard normal distribution; a review of various approximations can be found in [1].
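Equation (16) can be sketched with the Python standard library as shown below. The function name, the sample estimates (*m* = 100, *s* = 10, *n* = 10), and the way the random *t* and *χ*<sup>2</sup> values are produced (from gamma and normal generators, rather than from the approximations reviewed in [1]) are assumptions of this illustration.

```python
import math
import random


def sample_with_parameter_uncertainty(m, s, n, rng):
    """One random value of x according to Eq. (16).

    m, s are the sample mean and standard deviation obtained from n values.
    """
    nu = n - 1
    u = rng.gauss(0.0, 1.0)                    # standard normal value
    chi2 = rng.gammavariate(nu / 2.0, 2.0)     # chi-square value, nu d.o.f.
    # Student t value: normal / sqrt(independent chi-square / nu)
    t = rng.gauss(0.0, 1.0) / math.sqrt(rng.gammavariate(nu / 2.0, 2.0) / nu)
    return m + t * s / math.sqrt(n) + u * s * math.sqrt(nu / chi2)


rng = random.Random(42)
values = [sample_with_parameter_uncertainty(100.0, 10.0, 10, rng) for _ in range(10000)]
mean = sum(values) / len(values)
std = math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))
print(f"mean = {mean:.2f}, standard deviation = {std:.2f} (sample estimate s = 10)")
```

The simulated values scatter more widely than a normal distribution with *σ* = *s* would suggest; this extra spread reflects the uncertainty of the estimates *m* and *s*.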

The second method [4] adds a random component to each generated number. The basic random values *x<sub>0</sub>* of a quantity *x* are created (via the inverse probabilistic transformation *F*<sup>–1</sup>) from random values *F* uniformly distributed in the interval (0; 1). Then, a random component ∆ is added to each value *x<sub>0</sub>* created so that the result

$$
x(F) = x_0(F) + \Delta(F) \tag{17}
$$

has the same probability distribution around *x<sub>0</sub>* as the quantiles of the genuine variable *x*. The obtained *x* values create a blurred confidence band around their distribution function.

#### **4. Bootstrap method**

This method, which also uses Monte Carlo simulations, was originally used for finding the statistical characteristics of random quantities from a relatively small number of data *n* [9]. It creates its own population, consisting only of the experimental values. From this population, *n* values are chosen randomly (with replacement), and the characteristic *X* of interest is calculated (e.g. mean or a quantile). This procedure is repeated many times. Then, the *α*-confidence interval for *X* is determined in one of the following ways. In the first approach, the average value *x̄* and standard deviation *s<sub>X</sub>* of the pertinent quantity are calculated, and the lower (*L*) and upper (*U*) confidence limits are found using the expression

$$X_{L,U} = \overline{x} \pm u_{\alpha} s_{X}, \tag{18}$$

where *u<sub>α</sub>* is the *α*-quantile of the standard normal distribution; the probability of *X* being outside the limits (18) is 2*α*. This approach assumes that *X* has normal distribution. This need not always be true, and various improvements were later proposed. (For more details, see [10].)

Another approach generates a large number of simulated data sets (at least 100). Then, the values of the characteristic of interest, calculated for each data set, are ranked into ascending order. The confidence bounds, corresponding, for example, to the 90% confidence interval, are obtained using the 5th and 95th values of the ordered values. However, when determining the confidence bounds for quantiles, this approach may only be used for quantiles sufficiently far from the tails of the distribution.
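The ranking approach just described can be sketched as follows. The function name, its parameters, and the strength data are hypothetical; the resampling is done with replacement, and the bounds are taken from the ordered values (for 100 simulated sets and a 90% interval, these are the 5th and 95th values).

```python
import random
import statistics


def bootstrap_percentile_interval(data, statistic, n_sets=1000, confidence=0.90, seed=1):
    """Confidence bounds for `statistic` from ranked bootstrap results."""
    rng = random.Random(seed)
    n = len(data)
    # Each simulated data set: n values drawn with replacement from the data
    values = sorted(statistic(rng.choices(data, k=n)) for _ in range(n_sets))
    tail = (1.0 - confidence) / 2.0
    i_low = max(round(tail * n_sets) - 1, 0)
    i_high = min(round((1.0 - tail) * n_sets) - 1, n_sets - 1)
    return values[i_low], values[i_high]


# Hypothetical strength data (MPa)
strength = [412.0, 398.0, 431.0, 405.0, 420.0, 388.0, 415.0, 427.0, 401.0, 409.0]
low, high = bootstrap_percentile_interval(strength, statistics.fmean)
print(f"90% interval for the mean: {low:.1f} to {high:.1f} MPa")
```

Because the simulated sets contain only the measured values, the bounds for extreme quantiles cannot reach beyond the smallest and largest measurement, which illustrates the restriction mentioned above.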

The bootstrap method can also be used in reliability assessment by the Monte Carlo technique, all the more because each simulation set gives a different value of *X* (e.g. the probability of failure *P<sub>f</sub>*). Thus, the Monte Carlo simulation sets are repeated many times. In each set, *P<sub>f</sub>* is determined. Then, its probable highest value is found by one of the above approaches.

## **5. Concluding remarks**

Reliability assessment based on a small number of experimental values always means risk. A very important condition for the use of any probabilistic method is that the experimental sample adequately represents the whole population. The situation can be very dangerous if the population is not homogeneous, for example if several kinds of flaws and other defects can be responsible for the strength of a brittle material [11]. All characteristic kinds of defects must be present in the experimental sample (including the largest but rare ones); otherwise, the predicted values of low-probability strength can be wrong despite the determination of their confidence limits. The probability that a defect with a low probability of occurrence (e.g. 1:1000) will be found in a small sample of only 10 pieces is very low (only about 1%, compared with the probability of 99% that such a flaw will not be revealed). A similar situation exists, for example, when the maximum height of water waves at the sea coast should be predicted. Statistical characteristics can be obtained from long-term measurements, but if the waves at extremely rare tsunami events have not been included in the evaluation, the new coast structures will not be sufficiently protected against them.
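The value of about 1% quoted above can be checked by a short calculation, assuming that the defect occurs independently in the individual pieces with probability *p* = 0.001 and that the sample has *n* = 10 pieces:

$$P = 1 - (1 - p)^{n} = 1 - 0.999^{10} \approx 0.00996 \approx 1\%.$$

Under the same assumption, a sample of about 3000 pieces would be needed for a 95% probability of revealing such a defect, because 0.999<sup>*n*</sup> ≤ 0.05 requires *n* ≥ ln 0.05 / ln 0.999 ≈ 2994.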

When dealing with reliability assessment of some kind of structures, it is thus reasonable to gradually gather the data of all measurements (for the pertinent material or load) and to combine the newer data with older ones. For this purpose, Bayesian methods may be suitable, explained briefly in Chapter 22.

## **Author details**

Jaroslav Menčík

Address all correspondence to: jaroslav.mencik@upce.cz

Department of Mechanics, Materials and Machine Parts, Jan Perner Transport Faculty, University of Pardubice, Czech Republic

## **References**


**Chapter 19**

## **Robust Design, Sensitivity Analysis, and Tolerance Setting**

Jaroslav Menčík

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/62373

#### **Abstract**

This chapter describes various methods for reduction of uncertainties in the determination of characteristic values of random quantities (quantiles of normal and Weibull distribution, tolerance limits, linearly correlated data, interference method, Monte Carlo method, bootstrap method).

**Keywords:** Random quantity, uncertainty, normal distribution, Weibull distribution, tolerance limits, correlation, interference method, Monte Carlo method, bootstrap method

The reliability and safety of engineering objects are mostly formed during the design. Every design process has three stages:

1. Proposal of the concept
2. Determination of optimum parameters
3. Determination of tolerances

Here, stages 2 and 3 will be explained in more detail, as they are very important for reliability.

## **1. Determination of optimum parameters — Robust design**

After the concept of the construction (an engine, a bridge, a transmitter, etc.) has been proposed, it is necessary to determine all important parameters. However, input quantities often vary or can attain values different from those assumed in design. Good design ensures that the important output quantities will always lie within the allowable limits. This can be achieved by a suitable choice of nominal values of input quantities and by setting their tolerances.

The nominal values of input quantities together form the **design point**. Its position should ensure low sensitivity of the output parameters to the deviations of input quantities from their nominal values. This is called **robust design** [1]. Figure 1 illustrates the principle on an example with one input variable *x*: design point 1 has high sensitivity, whereas point 2 has low sensitivity. One can see that the changes of the output quantity *y* around point 2 are much smaller than around point 1, in both cases for the same changes of *x*. This also means that an acceptable scatter of *y* can sometimes be achieved with lower demands on the accuracy of input parameters. The reliability is influenced not only by the scatter of input quantities but also by the position of the design point. The ideal position, with the lowest sensitivity to parameter variations, corresponds to an extreme of the response function *y* = *f*(*x*<sub>1</sub>, *x*<sub>2</sub>, ..., *x*<sub>n</sub>). Various optimization methods exist for finding this position, analytical or based on computer modeling. A universal one is the "simplex method", where the input variables approach the optimum step by step according to a simple algorithm [2, 3]. The graphical representation of the response is very informative. The procedures of **design of experiments** (DOE) are also suitable; see the books by G. Taguchi and other authors [4 - 7]. The determination of optimum parameters should go hand in hand with the sensitivity analysis.

**Figure 1.** Principle of robust design. Note the influence of the position of the design point on the sensitivity of the output (*∆y*) to variations of input variable *x*.

#### **2. Sensitivity analysis**

After the design point has been found, a sensitivity analysis can be performed to show the influence of the variations of input variables on the variability of the output [8]. The results may be used for setting the tolerances of input quantities to keep the output in the allowable range. The sensitivity analysis can be done using analytical expressions or simulation methods. The analytical expression for the output variable *y*,

$$y = f(\mathbf{x}\_1, \mathbf{x}\_2, \dots, \mathbf{x}\_n) \; \tag{1}$$

is known exactly only in simple cases (e.g. resonant frequency of an oscillator or deflection of a beam). Often, the **response function** must be found by numerical solution (e.g. using the finite element method). Then, an approximate expression is obtained by regression fitting the response computed for several combinations of input variables (Fig. 3 in Chapter 15).

The sensitivity analysis is usually done in two steps. First, the influence of individual variables is investigated. Several groups of computations are carried out, and in each group, only one variable (*x<sub>i</sub>*) is changed, whereas the others keep their nominal values *x<sub>1,0</sub>*, *x<sub>2,0</sub>*, ..., *x<sub>n,0</sub>*, corresponding to the design point. Then, the *y* values for the individual groups are fitted by a suitable regression function (e.g. a polynomial),

$$y_i = a_0 + a_{i,1} x_i + a_{i,2} x_i^2 + a_{i,3} x_i^3 + \dots \tag{2}$$

or

$$y_i = y_0 + a_i (x_i - x_{i,0}) + b_i (x_i - x_{i,0})^2 + \dots; \tag{3}$$

the latter expression characterizes the changes of *y* as a function of deviations of the *i*-th input variable from the design point. These regression functions correspond to the cuts through the response surface (Fig. 3 in Chapter 15). The sensitivity analysis will depend on whether the deviations are small or large.

#### **Small changes of the input and output quantities**

In this case, linear approximation of the response function may be used, which yields simple expressions. The **sensitivity of the response** to the variations of individual variables is obtained from partial derivatives at the pertinent point,

$$
c_i = \partial y / \partial x_i \approx \Delta y / \Delta x_i. \tag{4}
$$

For linear approximation, the sensitivity coefficients *ci* correspond to the constants *ai,1* in (2) and *ai* in (3). Further information is obtained from **relative sensitivities**,

$$c_{r,i} = \frac{\partial y}{\partial x_i}\,\frac{x_{i,0}}{y_0} \approx \frac{\Delta y}{y_0} \Big/ \frac{\Delta x_i}{x_{i,0}}, \tag{5}$$

where *y<sub>0</sub>* and *x<sub>i,0</sub>* are the values corresponding to the design point. The coefficient *c<sub>r,i</sub>* expresses the change of *y* (in %, for example) caused by a 1% deviation of *x<sub>i</sub>* from the nominal value *x<sub>i,0</sub>*. For linear approximation, *c<sub>r,i</sub>* = *a<sub>i</sub>* (*x<sub>i,0</sub>*/*y<sub>0</sub>*).

Generally, two kinds of sensitivity analysis can be made: (1) deterministic, which assumes that the deviations of individual quantities from nominal values have constant magnitude, and (2) stochastic, which assumes the random scatter of individual input quantities around their nominal values.

Both approaches will be illustrated on an example [9]. A cantilever flat spring of rectangular cross-section (Fig. 2) should be used in a precise measuring device. It is necessary to get an idea how the deviations of its individual dimensions and material properties from the nominal values will influence its compliance. The spring compliance *C* is given by the formula:

$$C = y/F = 4L^3 / \left(Ewt^3\right); \tag{6}$$

*y* is deflection, *F* is load, *L* is length, *E* is elastic modulus, *w* is spring width, and *t* is spring thickness.

**Figure 2.** Spring for a measuring device (a schematic).

#### **Deterministic analysis for small deviations**

The increments of *y* are calculated via the first derivatives. The response surface is replaced by a tangent plane at the investigated point. For *y* = *f*(*x<sub>1</sub>*, *x<sub>2</sub>*, ..., *x<sub>n</sub>*), the infinitesimal increment of *y* can be expressed generally as

$$\mathrm{d}y = (\partial y / \partial x_1)\,\mathrm{d}x_1 + (\partial y / \partial x_2)\,\mathrm{d}x_2 + \dots + (\partial y / \partial x_n)\,\mathrm{d}x_n, \tag{7}$$

where ∂*y*/∂*x<sub>j</sub>* denote the partial derivatives. For practical reasons, the differentials are replaced by small finite increments ∆,

$$
\Delta y = (\partial y / \partial x_1)\,\Delta x_1 + (\partial y / \partial x_2)\,\Delta x_2 + \dots + (\partial y / \partial x_n)\,\Delta x_n. \tag{8}
$$

In our example with the spring, the partial derivative of Equation (6) with respect to the first variable (*x*1 = *L*) is

$$\partial C / \partial L = 3L^2 \times 4 / \left(Ewt^3\right) = \left[4L^3 / \left(Ewt^3\right)\right] \times 3/L = 3C/L, \tag{9}$$

and the increment of compliance due to a small increment of the beam length ∆*L* is thus

$$
\Delta C = 3C\,(\Delta L / L). \tag{10}
$$

The formulas for other variables are obtained in a similar way. The resultant expression, involving the changes of all variables, is

$$
\Delta C = C\,(3\Delta L / L - \Delta E / E - \Delta w / w - 3\Delta t / t), \tag{11}
$$

and the relative change of the compliance is

$$
\Delta C / C = 3\Delta L / L - \Delta E / E - \Delta w / w - 3\Delta t / t. \tag{12}
$$

This formula shows the influence of individual quantities. If the spring is 1% longer than the nominal value, the compliance is higher by 3%; if the elastic modulus *E* is 1% higher, the compliance is lower by 1%, etc. The constants at individual terms correspond to their exponents in Equation (6), and the signs depend on whether the quantity is in the numerator or denominator.
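The relative sensitivities in Equation (12) can also be checked numerically, which is useful when the response function is available only as a computer model. The sketch below estimates the coefficients of Equation (5) by central differences; the helper `relative_sensitivities`, the step size, and the nominal values (in mm and N/mm<sup>2</sup>) are assumptions of this illustration.

```python
def compliance(L, E, w, t):
    """Compliance of a cantilever flat spring, Eq. (6)."""
    return 4.0 * L**3 / (E * w * t**3)


def relative_sensitivities(func, nominal, rel_step=1e-6):
    """Central-difference estimate of c_r,i = (dy/dx_i) * x_i0 / y0, Eq. (5)."""
    y0 = func(**nominal)
    result = {}
    for name, x0 in nominal.items():
        h = rel_step * x0
        up = dict(nominal, **{name: x0 + h})
        down = dict(nominal, **{name: x0 - h})
        result[name] = (func(**up) - func(**down)) / (2.0 * h) * x0 / y0
    return result


# Illustrative nominal values: L, w, t in mm, E in N/mm^2
nominal = {"L": 10.0, "E": 200e3, "w": 1.0, "t": 0.05}
for name, c in relative_sensitivities(compliance, nominal).items():
    print(f"{name}: {c:+.3f}")
```

The printed coefficients should be close to +3, –1, –1, and –3, i.e. to the exponents in Equation (6).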

This preliminary analysis reveals which input quantities have a very small influence on the variability of the output quantity *y* and may thus be considered as constants in the following analysis of simultaneous random variation of the input quantities. However, one must always keep in mind that the variance of the output depends on both the sensitivity *c<sub>i</sub>* and the variance of the pertinent input quantity *x<sub>i</sub>*.

#### **Deterministic analysis for large deviations**

The above approach is acceptable if the response function is linear or if the errors due to approximation by a linear function are small. If the response function is nonlinear and the investigated ranges of input quantities are not small, the errors will not be negligible (Fig. 3). In such a case, it is better to study the influence of deviations of input quantities by modeling the response without simplifications. For example, the influence of the *j*-th variable can be studied from Equation (1), in which only *x<sub>j</sub>* varies, whereas the others keep their values corresponding to the design point.
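The size of the linearization error can be illustrated on the spring thickness, which enters Equation (6) with the third power. The sketch below compares the exact relative change of compliance with the linear estimate –3∆*t*/*t* from Equation (12); the function name and the nominal values are assumptions of this illustration.

```python
def compliance(L, E, w, t):
    """Compliance of a cantilever flat spring, Eq. (6)."""
    return 4.0 * L**3 / (E * w * t**3)


def thickness_effect(rel_dev, nominal):
    """Relative change of compliance for a relative thickness deviation rel_dev.

    Returns (exact, linear): exact from Eq. (6), linear from Eq. (12).
    """
    c0 = compliance(**nominal)
    changed = dict(nominal, t=nominal["t"] * (1.0 + rel_dev))
    exact = compliance(**changed) / c0 - 1.0
    linear = -3.0 * rel_dev
    return exact, linear


# Illustrative nominal values: L, w, t in mm, E in N/mm^2
nominal = {"L": 10.0, "E": 200e3, "w": 1.0, "t": 0.05}
for rel_dev in (0.01, 0.05, 0.10, 0.20):
    exact, linear = thickness_effect(rel_dev, nominal)
    print(f"dt/t = {rel_dev:4.0%}: exact {exact:+.3f}, linear {linear:+.3f}")
```

For a 1% deviation the two values nearly coincide, whereas for a 20% deviation the linear estimate (–0.60) clearly overstates the exact change (about –0.42).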

**Figure 3.** Error caused by linear approximation of response function *y* = *f*(*x*).

#### **Influence of random variability – small scatter**

The influence of random variability of input quantities can be investigated using the formula for the scatter of a function of several random variables. For small scatter,

$$\left(\mathbf{s}\_{\mathbf{y}}\right)^{2} = \left(\frac{\partial \mathbf{y}}{\partial \mathbf{x}\_{1}}\right)^{2} \mathbf{s}\_{\mathbf{x}1}^{2} + \left(\frac{\partial \mathbf{y}}{\partial \mathbf{x}\_{2}}\right)^{2} \mathbf{s}\_{\mathbf{x}2}^{2} + \dots + 2\left(\frac{\partial \mathbf{y}}{\partial \mathbf{x}\_{1}}\right)\left(\frac{\partial \mathbf{y}}{\partial \mathbf{x}\_{2}}\right) \text{cov}(\mathbf{x}\_{1}, \mathbf{x}\_{2}) + \dots \tag{13}$$

where *s<sub>xj</sub>*<sup>2</sup> is the scatter of the *j*-th variable (the square of its standard deviation). The far right-hand term is nonzero if the variables are correlated; often, it can be omitted. For linear approximation of *y*,

$$y = a_0 + a_1 x_1 + a_2 x_2 + \dots + a_n x_n, \tag{14}$$

the scatter is

$$s_y^2 = a_1^2 s_{x1}^2 + a_2^2 s_{x2}^2 + \dots + a_n^2 s_{xn}^2 + \dots \tag{15}$$

The individual components, *s<sub>yj</sub>*<sup>2</sup> = *a<sub>j</sub>*<sup>2</sup>*s<sub>xj</sub>*<sup>2</sup>, give the scatter of *y* caused by random variations of the *j*-th variable. Similarly to deterministic analysis, the contribution of a certain variable *x<sub>j</sub>* to the total scatter is larger for a large scatter of this variable (*s<sub>xj</sub>*<sup>2</sup>) and for a large sensitivity (*a<sub>j</sub>*) of the output *y* to its changes.

The expression obtained by dividing Equation (13) or (15) by the total scatter *s<sub>y</sub>*<sup>2</sup> gives the relative proportions of individual factors in the total scatter,

$$1 = a_1^2 \frac{s_{x1}^2}{s_y^2} + a_2^2 \frac{s_{x2}^2}{s_y^2} + \dots + a_n^2 \frac{s_{xn}^2}{s_y^2} + \dots \tag{16}$$

The square root of the scatter (13) is the standard deviation *s<sub>y</sub>*. If the input quantities have normal distribution, the confidence interval for the output quantity *y* can be calculated as

$$y_{\text{lower,upper}} = y_0 \pm u_{\alpha} s_y; \tag{17}$$

the + or – sign corresponds to the upper (or lower) confidence limit and *u<sup>α</sup>* is the *α*-critical value of standard normal distribution. The probability that *y* will lie out of these limits is 2*α*.

If Formula (15) is applied to the above example with a spring, one obtains the following expression for the standard deviation of the compliance caused, for example, by random variability of the length *L*:

$$s_{CL} = 3C\,(s_L / L); \tag{18}$$

cf. Equation (11). Similar expressions can be written for other variables. The random variability of all input quantities causes the following variability of the spring compliance:

$$s_C = \left[ (3C/L)^2 s_L^2 + (C/E)^2 s_E^2 + (C/w)^2 s_w^2 + (3C/t)^2 s_t^2 \right]^{1/2}. \tag{19}$$

The ratio of the standard deviation of a quantity and its mean is the variation coefficient,

$$
v = \sigma / \mu, \tag{20}
$$

so that the combination of Equations (19) and (20) gives the variation coefficient of the compliance,

$$
v_C = s_C / C = \left( 9v_L^2 + v_E^2 + v_w^2 + 9v_t^2 \right)^{1/2}. \tag{21}
$$
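Equations (16) and (21) can be evaluated together, so that the variation coefficient of the compliance and the share of each input in the total scatter are obtained at once. In the sketch below, the function name and the input variation coefficients of 1% are assumptions made for illustration.

```python
import math


def compliance_variation(v_L, v_E, v_w, v_t):
    """Variation coefficient of compliance, Eq. (21), and shares, Eq. (16)."""
    # Squared relative contributions; the factors 9 are the squared exponents 3
    terms = {"L": 9.0 * v_L**2, "E": v_E**2, "w": v_w**2, "t": 9.0 * v_t**2}
    total = sum(terms.values())
    v_C = math.sqrt(total)
    shares = {name: value / total for name, value in terms.items()}
    return v_C, shares


# Assumed variation coefficients of the inputs: 1% each
v_C, shares = compliance_variation(0.01, 0.01, 0.01, 0.01)
print(f"v_C = {v_C:.4f}")
for name, share in shares.items():
    print(f"share of {name}: {share:.0%}")
```

With equal input scatter, the length and the thickness each account for 45% of the scatter of the compliance, so tightening the tolerances of *E* or *w* would bring little improvement.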

#### **Stochastic analysis for large scatter**

The above approach, based on the linearization of the response function, is suitable for small values of the variation coefficients of input quantities, say *v<sub>j</sub>* ≤ 10%. If their scatter is large, it is better to study the influence of variability or deviations of input quantities by the Monte Carlo simulation method. A preliminary assessment consists of making *m* simulation experiments in which only *x<sub>j</sub>* is random, for *j* = 1, 2, ..., *n*, and then calculating the partial scatter *s<sub>yj</sub>*<sup>2</sup> of the obtained values of *y*. Using the characteristics *s<sub>xj</sub>*, *x<sub>j,0</sub>*, and *y<sub>0</sub>*, one can determine the variation coefficients *v<sub>j</sub>* or the sensitivity coefficients *a<sub>j</sub>* (= *s<sub>yj</sub>*/*s<sub>xj</sub>*).

The approximate value of the total scatter is obtained by summing up the partial scatters,

$$s_y^2 = s_{y1}^2 + s_{y2}^2 + \dots + s_{yn}^2 + \dots \tag{22}$$

A more accurate value is obtained if all input variables *x<sub>1</sub>*, *x<sub>2</sub>*, ..., *x<sub>n</sub>* are considered as random in the Monte Carlo simulations and the scatter is calculated from all values of *y*. Dividing Equation (22) by the total scatter *s<sub>y</sub>*<sup>2</sup> gives the relative influence of individual factors, as in Equation (16).
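The two-step procedure can be sketched for the spring as follows. The normal distribution of the inputs, their variation coefficients of 5%, the number of trials, and all names in the code are assumptions of this illustration, not values from the text.

```python
import random
import statistics


def compliance(L, E, w, t):
    """Compliance of a cantilever flat spring, Eq. (6)."""
    return 4.0 * L**3 / (E * w * t**3)


def simulate(nominal, v, vary, m, rng):
    """m simulated compliances; only the inputs named in `vary` are random."""
    values = []
    for _ in range(m):
        x = {name: (rng.gauss(x0, v[name] * x0) if name in vary else x0)
             for name, x0 in nominal.items()}
        values.append(compliance(**x))
    return values


rng = random.Random(0)
nominal = {"L": 10.0, "E": 200e3, "w": 1.0, "t": 0.05}   # mm, N/mm^2
v = {"L": 0.05, "E": 0.05, "w": 0.05, "t": 0.05}          # assumed scatter
m = 20000

# Step 1: partial scatters, one random input at a time
partial = {name: statistics.variance(simulate(nominal, v, {name}, m, rng))
           for name in nominal}
total_sum = sum(partial.values())                          # Eq. (22)

# Step 2: all inputs random simultaneously
total_all = statistics.variance(simulate(nominal, v, set(nominal), m, rng))

for name, s2 in partial.items():
    print(f"share of {name}: {s2 / total_sum:.1%}")
print(f"sum of partial scatters / scatter with all inputs random: {total_sum / total_all:.3f}")
```

The ratio printed in the last line shows how well the sum of partial scatters approximates the scatter obtained with all inputs random; for a nonlinear response the two generally differ.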

### **3. Determination of tolerances of input quantities**

If the variability or deviation of the output quantity *y* from the nominal value is larger than allowed, it must be reduced. The procedure depends on whether the variability is random or deterministic.

#### **Deterministic deviations**

If the deviation of *y* is caused by the deviation of one or more input quantities, Equation (8) or (12), showing the contribution of individual factors to the total deviation of *y*, can be used to decide which factor should be aimed at. Let us assume that the deviation of *y* in Equation (8) is caused only by the deviation of *x<sub>j</sub>*. The allowable magnitude of ∆*x<sub>j</sub>*, ensuring that the deviation of *y* does not exceed ∆*y*, is

$$
\Delta x_j \le \Delta y / \left( \partial y / \partial x_j \right). \tag{23}
$$

For example, the allowable length tolerance of the above spring, ensuring the compliance tolerance ∆*C*, is

$$
\Delta L = \Delta C \left[ L / (3C) \right], \quad \text{or} \quad \Delta L / L = (1/3) \left( \Delta C / C \right). \tag{24}
$$

The tolerances of other quantities can be determined in a similar way. One must bear in mind that the deviations of some input quantities influence the output in one direction, whereas the deviations of other quantities can have the opposite influence. Generally, the deviations of *y* depend on the deviations of input quantities and also on the sensitivity of *y* to the changes of *x<sub>j</sub>*. The reduction of the tolerance of *y* can thus be accomplished by tightening the tolerances of individual input quantities or by changing the position of the design point towards lower sensitivity. The decision will also depend on the costs related to the individual adjustments.

#### **Random variability of input quantities**

The following analysis assumes that the range of probable occurrence of *y* (i.e. the half-width *∆y<sub>α</sub>* of the *α*-confidence interval for *y*) is directly proportional to the standard deviation *s<sub>y</sub>*, equal to the square root of the scatter. In production, the allowable limits of a quantity *x* are also often determined as *x<sub>nom</sub>* ± *ks<sub>x</sub>*, where *k* is a constant (e.g. a suitable quantile of standard normal distribution). With this assumption, the tolerance of *y* can be reduced from *∆y<sub>α</sub>* to *∆y<sub>α</sub>'* by reducing the standard deviation of *y* from the original value *s<sub>y</sub>* to *s<sub>y</sub>'*. This may be accomplished by the reduction of the variance or influence of input factors.

Often, the influence of one factor prevails (e.g. *x<sub>k</sub>*). In such a case, most of the scatter of *y* can be reduced by reducing its component due to this factor. As follows from Equation (15), the scatter of *y* can be reduced by reducing the standard deviation *s<sub>x,k</sub>* or the sensitivity of *y* to the changes of *x<sub>k</sub>* (coefficient *a<sub>k</sub>*). The reduction of the variance of *x<sub>k</sub>* can be attained by more accurate manufacturing or by better control and sorting out of the components that are outside the tolerance limits. The reduction of the sensitivity of *y* to the changes of *x<sub>k</sub>* can be accomplished by changing the parameters of the design point (Fig. 1). An example is a prestressed flange connection in steam turbines: the use of long bolts increases the compliance of the joint and reduces the sensitivity of the prestress to the variations of pressure in the pipe and thermal dilatations of the flanges. Sometimes, both ways, the reduction of *s<sub>x,k</sub>* and of *a<sub>k</sub>*, are combined.

If several input variables vary, one must decide which scatter should be reduced. Because the standard deviation equals the square root of the scatter, reducing the scatter of a quantity that contributes only 5% to 10% of the total scatter will have a negligible effect. The costs of the pertinent improvement must also be considered, as they usually increase with tightening tolerances.
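A short numerical illustration of this point, assuming independent contributions to the scatter (the 10% share is a hypothetical value chosen for illustration): even if a component responsible for 10% of the total scatter *sy*<sup>2</sup> were eliminated completely, the standard deviation would drop only to

$$s_y' = \sqrt{s_y^2 - 0.10\, s_y^2} = \sqrt{0.90}\; s_y \approx 0.949\, s_y ,$$

that is, by about 5%. Eliminating a component that contributes 50% of the scatter would give $s_y' = \sqrt{0.50}\, s_y \approx 0.707\, s_y$, a reduction of nearly 30%.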

After having obtained the corrected standard deviation *sx,j'*, the lower (*L*) and upper (*U*) allowable limits for the input quantity *xj* can be determined as

$$x_{j,L,U} = x_{j,0} \pm k_{L,U}\, s_{x,j}' \tag{25}$$

*xj,0* is the nominal (design) value and *k* is a constant (e.g. 5% quantile of standard normal distribution). *kL* corresponds to the lower limit, whereas *kU* corresponds to the upper limit.

The above optimization can be performed even if the scatter *sy* <sup>2</sup> from the preliminary design is smaller than the allowable value. The optimization assigns such tolerances that the total costs are minimal. Sometimes, the tolerances may even be made wider, with lower costs.

Often, the scatter of some input quantities cannot be changed continuously. In such cases, the response must be evaluated for each possible value of every discontinuous quantity.

The determination of suitable tolerances will be illustrated on the following example, adapted from [9].

#### **Example 1**

A cantilever microbeam from Figure 2, with length *L* = 10 mm, width *w* = 1.0 mm and thickness *t* = 50 μm, made of a material with elastic modulus *E* = 200 GPa, has compliance *C* = 4*L*<sup>3</sup>/(*Ewt*<sup>3</sup>) = 0.16 mm/mN. Each input quantity has coefficient of variation *vL* = *vw* = *vt* = *vE* = *v* = 0.01 = 1%. The variation coefficient of the compliance, Equation (17), is *vC* = (9*vL*<sup>2</sup> + *vE*<sup>2</sup> + *vw*<sup>2</sup> + 9*vt*<sup>2</sup>)<sup>1/2</sup> = 0.0447, and the standard deviation is *sC* = *CvC* = 0.00716 mm/mN. Such variation of compliance is unacceptably high and must be reduced to *sC*' = 0.004 mm/mN.

Solution. The corresponding reduced variation coefficient is *vC*' = *sC*'/*C* = 0.004/0.16 = 0.025. It is possible to reduce the scatter of *L*, *w*, and *t*; the material (*E*) remains unchanged. The easiest way is to reduce the scatter of *L*. However, even if this scatter were zero, the variation coefficient of compliance would be *vC*' = 0.033, which is much more than demanded. Therefore, the variance of all three quantities (*L*, *w*, and *t*) must be reduced by more accurate manufacturing. If the new variation coefficients of *L*, *w*, and *t* have the same value, *vL*' = *vw*' = *vt*' = *v*', this value *v*' can be calculated from the modified Equation (18):

$$
v_C' = \left[ 9v'^2 + v_E^2 + v'^2 + 9v'^2 \right]^{1/2}. \tag{26}
$$

With the variation coefficient of elastic modulus unchanged, *vE* = 0.01, the new coefficients of variation of *L*, *w*, and *t* must be reduced to *v*' ≤ 0.005256, which is approximately *v*' = 0.005. The corresponding allowable standard deviations, obtained by multiplying the variation coefficient *v*' by the nominal values of *L*, *w*, and *t*, are *sL*' ≤ 0.05 mm, *sw*' ≤ 0.005 mm, and *st*' ≤ 0.25 μm. With *v*' = 0.005, *vC*' = 0.024 and *sC*' = 0.0038 mm/mN. However, the tolerances of individual dimensions could be adjusted with respect to the manufacturing possibilities, the principal condition being *sC*' ≤ 0.004 mm/mN.
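The arithmetic of this example can be retraced with a few lines of Python. This is only a sketch: the function and variable names are chosen here for illustration, and the required common coefficient *v*' is obtained by solving Equation (26) for *v*' with *vE* fixed.

```python
import math

# Nominal values from Example 1 (lengths in mm, E in N/mm^2)
L, w, t, E = 10.0, 1.0, 0.050, 200e3

def compliance(L, w, t, E):
    """Compliance of the cantilever, C = 4 L^3 / (E w t^3), in mm/N."""
    return 4.0 * L**3 / (E * w * t**3)

def cov_compliance(v_L, v_w, v_t, v_E):
    """Coefficient of variation of C; the third powers of L and t give the factors 9."""
    return math.sqrt(9 * v_L**2 + v_E**2 + v_w**2 + 9 * v_t**2)

C = compliance(L, w, t, E) / 1000.0          # mm/N -> mm/mN
v_C = cov_compliance(0.01, 0.01, 0.01, 0.01)
s_C = C * v_C

# Target: s_C' = 0.004 mm/mN with v_E unchanged.
# Equation (26): v_C'^2 = 19 v'^2 + v_E^2  ->  v' = sqrt((v_C'^2 - v_E^2) / 19)
s_C_target = 0.004
v_C_target = s_C_target / C
v_new = math.sqrt((v_C_target**2 - 0.01**2) / 19.0)

print(f"C = {C:.3f} mm/mN, v_C = {v_C:.4f}, s_C = {s_C:.5f} mm/mN")
print(f"required v' <= {v_new:.6f}")
print(f"allowable s_L' = {v_new * L:.4f} mm, s_w' = {v_new * w:.5f} mm, "
      f"s_t' = {v_new * t * 1000:.3f} um")
```

Unequal coefficients for *L*, *w*, and *t* can be checked in the same way by calling `cov_compliance` with the candidate values and comparing the result with `v_C_target`.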

#### **4. Uncertainties in ensuring safety and lifetime using proof testing**

If high reliability of a certain object must be ensured, a proof test is often used: the component is exposed to some overload, specified so that only sufficiently strong components survive it; the weaker ones are destroyed. In the same way, sufficient lifetime can be ensured for components made of brittle materials suffering from static fatigue. The minimum time to failure of a component that has passed a proof test is [10 - 12]

$$t_{\min} = \frac{2\sigma_{pt}^{N-2}}{(N-2)\,A\,Y^2 K_{IC}^{N-2}\,\sigma_0^{N}}, \tag{27}$$

where *K*IC is the fracture toughness of the material, *N* and *A* are the parameters of subcritical crack growth, *Y* is the geometrical factor of the typical crack responsible for fatigue failure, *σ*0 is the characteristic operational stress (assumed constant), and *σ*pt is the proof-test stress. A rearrangement of Equation (27) gives the formula for the proof stress needed to guarantee the minimum lifetime:

$$
\sigma_{pt} = \sigma_0^{N/(N-2)}\, K_{IC} \left[ \frac{N-2}{2}\, A Y^2 t_{\min} \right]^{1/(N-2)}. \tag{28}
$$

However, *K*IC, *N*, and *A* were determined by measurement and are known only approximately, and *Y* was estimated. Therefore, it is recommended to perform a sensitivity analysis and correct the proof stress appropriately. The pertinent theory, based on probabilistic analysis, is explained in [11, 12] or in [10]. For easier application, strength-probability-time diagrams were developed [13 – 15], in which the necessary proof stress can be found for the demanded time to failure and confidence level.
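A simple way to see why such a sensitivity analysis matters is to evaluate Equation (28) for slightly different values of the crack-growth exponent *N*. The sketch below does this; all numerical values (stress, toughness, *N*, *A*, *Y*, required lifetime) are hypothetical and serve only to exercise the two formulas in a consistent set of units.

```python
def t_min(sigma_pt, sigma_0, K_IC, N, A, Y):
    """Minimum time to failure after a passed proof test, Equation (27)."""
    return (2.0 * sigma_pt**(N - 2)
            / ((N - 2) * A * Y**2 * K_IC**(N - 2) * sigma_0**N))

def proof_stress(t_required, sigma_0, K_IC, N, A, Y):
    """Proof stress that guarantees the lifetime t_required, Equation (28)."""
    return (sigma_0**(N / (N - 2)) * K_IC
            * ((N - 2) / 2.0 * A * Y**2 * t_required)**(1.0 / (N - 2)))

# Hypothetical input data (stress in MPa, K_IC in MPa*m^0.5, time in s)
sigma_0 = 20.0
K_IC = 0.75
A, Y = 5.0, 1.12
t_required = 3.15e8          # about 10 years

for N in (14.0, 16.0, 18.0):  # nominal N = 16 and two perturbed values
    s_pt = proof_stress(t_required, sigma_0, K_IC, N, A, Y)
    print(f"N = {N:4.1f}: proof stress = {s_pt:7.1f} MPa, "
          f"ratio to operating stress = {s_pt / sigma_0:.2f}")
```

The same loop can be repeated for *A*, *K*IC, or *Y* to see which parameter dominates the uncertainty of the proof stress.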

## **Author details**

Jaroslav Menčík

Address all correspondence to: jaroslav.mencik@upce.cz

Department of Mechanics, Materials and Machine Parts, Jan Perner Transport Faculty, University of Pardubice, Czech Republic

## **References**


## **Chapter 20**

## **Reliability Testing and Verification**

## Jaroslav Menčík

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/62377

#### **Abstract**

This chapter describes various methods for reduction of uncertainties in the determination of characteristic values of random quantities (quantiles of normal and Weibull distribution, tolerance limits, linearly correlated data, interference method, Monte Carlo method, bootstrap method).

**Keywords:** Random quantity, uncertainty, normal distribution, Weibull distribution, tolerance limits, correlation, interference method, Monte Carlo method, bootstrap method

Reliability tests are often indispensable. The material properties, needed in design, can only sometimes be found in data sheets. If they are not available, they must be obtained by testing, for example the strength of a new alloy or concrete or the fatigue resistance of a vehicle part. Also, the manufacturers of electrical components must provide the reliability data for catalogs (e.g. the failure rate and the data characterizing the influence of some factors, such as temperature or vibrations). It is also impossible to predict with 100% accuracy the properties of a new bridge, an engine or a complex system consisting of many parts, whose properties vary more or less around the nominal values. In all these cases, tests are often necessary to verify whether the object has the demanded properties or if it conforms to the standards. Also, the information on loads (e.g. wind velocities in an unknown area) must often be obtained by measurement.

The reliability tests can be divided into two groups: those for providing detailed information on properties of new materials or components, and those for the verification of the expected values. The former are more extensive, as they must provide the mean value and statistical parameters characterizing the random variability. The extent of verification tests is smaller.

© 2016 The Author(s). Licensee InTech. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

In this chapter, the reliability tests of mass-produced components will be described first, followed by the tests of large or complex structures or components and the tests of strength and fatigue resistance.

### **1. Testing of mass-produced electrical and mechanical components**

The most important reliability characteristics are the mean failure rate and the mean time to failure or between failures. The tests can be done so that several components are loaded in a usual way (e.g. by electric current), and the times to failure of individual pieces are measured. As the time to failure of some samples can be very long, the test is sometimes terminated after failure of several pieces, at time *t*<sub>t</sub>. The total cumulated time of operation to failure is calculated generally as

$$t_{\text{tot}} = \sum t_{f,j} + m\,t_{t}, \quad j = 1, \dots, r, \tag{1}$$

where *t*f,j is the time to failure of the *j*-th piece, *r* is the number of failed specimens, and *m* is the number of pieces that have survived the test, whose duration was *t*<sub>t</sub>. The total number of all checked samples is *n* = *r* + *m*. If all pieces have failed during the test, *m* = 0 and the term *mt*<sub>t</sub> drops out. (Other test arrangements are also possible, for example with replacing the failed pieces by good ones; see [1] or the corresponding IEC standards listed in Appendix 2.) The mean time to failure is calculated as

$$\text{MTTF} = t_{\text{tot}} / r . \tag{2}$$

The individual times to failure vary, and this must also be characterized. If failures occur due to various reasons, an exponential distribution of times to failure is often assumed. A simple check of this is the standard deviation *σ*. For exponential distribution, the standard deviation has the same value as the mean *μ* (in an ideal case; in real tests it can somewhat differ). If the difference between *μ* and *σ* is larger, a statistical test should be made to check whether the exponential distribution is suitable. Common for this purpose are the goodness-of-fit tests (e.g. Kolmogorov-Smirnov or the *χ*<sup>2</sup> test); see [2 - 4]. If exponential distribution is not suitable, another distribution can be better, e.g. Weibull.

If an exponential distribution is acceptable, the estimate of the mean failure rate can be obtained easily as the reciprocal value of the mean time to failure,

$$
\overline{\lambda} = \mathbf{1} / \text{MTTF}.\tag{3}
$$

The two-sided confidence interval for the true mean failure rate *λ* is [1]:

$$
\lambda_L = \frac{\chi^2_{1-\alpha/2}(2r)}{2r}\,\overline{\lambda} \le \lambda \le \frac{\chi^2_{\alpha/2}(2r)}{2r}\,\overline{\lambda} = \lambda_U \tag{4}
$$

if the testing was terminated at the *r*th failure, and

$$
\lambda_L = \overline{\lambda}\, \frac{\chi^2_{1-\alpha/2}(2r)}{2r}, \quad \lambda_U = \overline{\lambda}\, \frac{\chi^2_{\alpha/2}(2r+2)}{2r} \tag{5}
$$

if the testing continued some time after the *r*th failure. In Equations (4) and (5), *λ̄* is the calculated mean value of *λ*, the subscripts *L* and *U* denote the lower and upper confidence limits, *χ*<sup>2</sup><sub>1–α/2</sub>(2*r*) is the (1 – *α*/2)-critical value of the chi-square distribution for 2*r* degrees of freedom (i.e. the value exceeded with probability 1 – *α*/2), *χ*<sup>2</sup><sub>α/2</sub>(2*r*) is the *α*/2-critical value for 2*r* degrees of freedom, and *χ*<sup>2</sup><sub>α/2</sub>(2*r*+2) is the *α*/2-critical value for 2*r*+2 degrees of freedom. The probability that *λ* will lie within this confidence interval is *γ* = 1 – *α*. Often, we are interested only in the maximum expectable failure rate; the pertinent formula for the upper limit of the one-sided interval is

$$
\lambda_U = \frac{\chi^2_{\alpha}(2r)}{2r}\,\overline{\lambda} , \tag{6}
$$

the probability that the actual failure rate will be higher is now *α*. As the mean time to failure is the reciprocal of the failure rate, the corresponding two-sided confidence interval for the mean time to failure is obtained as

$$t_L = \frac{2r}{\chi^2_{\alpha/2}(2r)}\,\text{MTTF} \le \overline{t} \le \frac{2r}{\chi^2_{1-\alpha/2}(2r)}\,\text{MTTF} = t_U \tag{7}$$

if the test was terminated after the *r*th failure (and analogously for a longer test).

The determination and importance of confidence limits will be illustrated on the following examples.

#### **Example 1**

Ten electrical components were tested to determine the failure rate. The tests were terminated after *t*<sub>t</sub> = 500 h. During this time, six components failed (*r* = 6), at the times 65, 75, 90, 120, 250, and 410 h. Four components survived the test. Estimate the mean time to failure and the failure rate and construct two-sided confidence intervals (for confidence level *γ* = 1 – *α* = 90%).

Solution. The mean value and standard deviation of the times to failure of the six failed components are 168.33 and 136.33 h, respectively. As the two values do not differ much, it is possible to assume exponential distribution.

The cumulated duration of tests, calculated after [1], was:

$$t_{\text{tot}} = \sum_{j=1}^{6} t_{f,j} + 4 \times t_t = 65 + 75 + 90 + 120 + 250 + 410 + 4 \times 500 = 3010 \text{ h.}$$

The mean time to failure is *t*mean = *t*tot/*r* = 3010/6 = 501.67 h, and the mean failure rate is *λ*mean = *λ̄* = 1/*t*mean = 1/501.67 = 1.993 × 10<sup>–3</sup> h<sup>–1</sup>.

The lower and upper confidence limits for *λ* were calculated according to Equation (5), because the tests were terminated before the failure of all samples. With *r* = 6 and *α* = 10%, the critical values are *χ*<sup>2</sup><sub>0.95</sub>(12) = 5.226 and *χ*<sup>2</sup><sub>0.05</sub>(14) = 23.685. Inserting them, together with *λ*mean = 1.993 × 10<sup>–3</sup> h<sup>–1</sup>, into (5) gives *λ*L = 8.68 × 10<sup>–4</sup> h<sup>–1</sup> and *λ*U = 3.93 × 10<sup>–3</sup> h<sup>–1</sup>. The confidence limits for the mean time to failure are *t*L = 1/*λ*U = 254.4 h and *t*U = 1/*λ*L = 1152.1 h. The mean time to failure thus can lie within the interval *t*mean ∈ (254 h; 1152 h).

As we can see from this example, the confidence interval obtained from only six failures is very wide. If it should be narrower (to get more accurate estimate), it is necessary either to make a longer test so that more parts of the tested group fail or to increase the number of parts tested simultaneously or both.
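The calculation of Example 1 can be sketched in Python as follows. The six failure times are those consistent with the stated total of 3010 h. The chi-square critical values are computed here with a small helper that uses the closed-form survival function valid for an even number of degrees of freedom (2*r* and 2*r* + 2 are always even); this helper and all function names are choices made for this sketch, not part of the cited procedure [1].

```python
import math

def chi2_sf_even(x, dof):
    """P(X > x) for a chi-square variable with an even number of degrees of freedom."""
    half, term, total = x / 2.0, 1.0, 1.0
    for i in range(1, dof // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

def chi2_critical(p, dof):
    """Value exceeded with probability p, found by bisection (even dof only)."""
    if dof % 2:
        raise ValueError("this sketch handles even degrees of freedom only")
    lo, hi = 0.0, 1.0
    while chi2_sf_even(hi, dof) > p:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_sf_even(mid, dof) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def failure_rate_interval(times_to_failure, n_tested, t_test, alpha=0.10):
    """Point estimates and two-sided limits for a time-terminated test, Eqs. (1)-(3), (5)."""
    r = len(times_to_failure)
    m = n_tested - r                                   # survivors
    t_tot = sum(times_to_failure) + m * t_test         # Equation (1)
    mttf = t_tot / r                                   # Equation (2)
    lam = 1.0 / mttf                                   # Equation (3)
    lam_L = lam * chi2_critical(1 - alpha / 2, 2 * r) / (2 * r)
    lam_U = lam * chi2_critical(alpha / 2, 2 * r + 2) / (2 * r)
    return t_tot, mttf, lam, lam_L, lam_U

t_tot, mttf, lam, lam_L, lam_U = failure_rate_interval(
    [65, 75, 90, 120, 250, 410], n_tested=10, t_test=500.0)
print(f"t_tot = {t_tot:.0f} h, MTTF = {mttf:.2f} h, lambda = {lam:.3e} 1/h")
print(f"lambda_L = {lam_L:.2e} 1/h, lambda_U = {lam_U:.2e} 1/h")
print(f"MTTF interval: ({1 / lam_U:.0f} h; {1 / lam_L:.0f} h)")
```

Small differences from the hand calculation in the last digit of the MTTF limits come from rounding of the intermediate failure rates in the text.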

#### **Example 2**

The above testing has continued until the time *t*<sub>t</sub> = 1000 h. During this time, two more pieces failed, at the times *t*7 = 520 h and *t*8 = 760 h.

Solution. The same procedure as above has given the following results: *t*tot = 4290 h and *r* = 8, so that the mean time to failure is now *t*mean = *t*tot/*r* = 4290/8 = 536 h and the mean failure rate is *λ*mean = 1/536 = 1.865 × 10<sup>–3</sup> h<sup>–1</sup>. Also, the confidence interval will respect that more pieces have failed. The critical values now are *χ*<sup>2</sup><sub>0.95</sub>(16) = 7.962 and *χ*<sup>2</sup><sub>0.05</sub>(18) = 28.869. With all these values, the lower and upper limits of the failure rate are *λ*L = 9.28 × 10<sup>–4</sup> h<sup>–1</sup> and *λ*U = 3.4 × 10<sup>–3</sup> h<sup>–1</sup>. The mean time to failure *t*mean thus can be expected to lie within the interval (297 h; 1078 h).

The whole test lasted twice as long as the previous one, but the new confidence interval is only slightly narrower. If significantly more accurate estimates are to be achieved, much longer tests or tests with a substantially higher number of pieces must be done. Thus, when preparing the tests for the determination of failure rate, one should estimate in advance the duration of the test, the number of tested pieces, and the number of pieces that can fail, all this for the acceptable probability *α* that the actual maximum failure rate would be higher than that obtained from the test.

The rearrangement of the expression for the upper limit of the confidence interval for *λ* gives the following relationship between the expected failure rate *λ*0, the number of tested samples *n*, test duration *t*<sub>t</sub>, and the number of failed components *r* [1]:

$$n\,t_t = \chi^2_{\alpha}(2r) / (2\lambda_0). \tag{8}$$

If the number of failed samples does not exceed *r*, the actual failure rate is not higher than *λ*0, the risk of wrong prediction being *α*.

As follows from the product *n* × *t*<sub>t</sub> in Equation (8), the number of tested parts *n* is equivalent to the test duration *t*<sub>t</sub>. This means that the same information can be obtained by testing, for example, 10 specimens for 1000 h or 1000 specimens for 10 h. If the tested objects are expensive, one would prefer testing fewer specimens for a longer time. However, at least several pieces should always be tested to reduce the risk that the only piece chosen at random for the test was especially good or especially bad.

The following table, based on Equation (8), shows the values of the product *n* × *t*<sub>t</sub> for the various numbers of failed parts during the tests; the probability of a wrong result is *α* = 10%.


**Table 1.** Extent of tests for various failure rate and the number of failed pieces.
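Values of this kind can be recomputed directly from Equation (8). The following sketch does so for *α* = 10%; the selection of failure numbers *r* and failure rates *λ*0 is an assumption made here for illustration, and the chi-square helper uses the closed form valid for even degrees of freedom.

```python
import math

def chi2_sf_even(x, dof):
    """P(X > x) for a chi-square variable with an even number of degrees of freedom."""
    half, term, total = x / 2.0, 1.0, 1.0
    for i in range(1, dof // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

def chi2_critical(p, dof):
    """Value exceeded with probability p, found by bisection (even dof only)."""
    lo, hi = 0.0, 1.0
    while chi2_sf_even(hi, dof) > p:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_sf_even(mid, dof) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def required_unit_hours(r, lambda_0, alpha=0.10):
    """Product n * t_t from Equation (8), in pieces x hours."""
    return chi2_critical(alpha, 2 * r) / (2.0 * lambda_0)

rates = (1e-4, 1e-5, 1e-6)                 # hypothetical selection of lambda_0 (1/h)
print("r   " + "".join(f"{lam:>16.0e}" for lam in rates))
for r in (1, 2, 3, 5, 10):                 # hypothetical selection of r
    row = "".join(f"{required_unit_hours(r, lam):>16,.0f}" for lam in rates)
    print(f"{r:<4d}{row}")
```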

For example, the reliability testing of components with assumed exponential distribution, failure rate *λ* = 10<sup>–4</sup> h<sup>–1</sup>, and the test terminated after the fifth failure needs *n* × *t*<sub>t</sub> = 79,936 ≈ 80,000 pieces × hour. Thus, for example, 100 components should be tested for 800 h or 800 components for 100 h. If the expected failure rate were *λ* = 10<sup>–6</sup> h<sup>–1</sup>, then *n* × *t*<sub>t</sub> ≈ 8,000,000 pieces × hour, so that 10,000 components must be tested for 800 h or 100 components for 80,000 h. One can see that testing for proving the reliability of very reliable components becomes very difficult or impracticable. Therefore, various **accelerated tests** are often used. One way, suitable for items working periodically with pauses between the operations, such as switches or valves, eliminates the idle times: the switch is switched on and off continuously.

Another way to obtain the demanded reliability information sooner uses a higher intensity of load (e.g. higher mechanical load, higher electric stress or electric current) or a more severe environment (e.g. higher temperature or vibrations). If this approach is to be effective, one must know the mechanism of degradation and the relationship between the load intensity and the rate of degradation. For example, the rate of chemical processes, which are the cause of some failures, often depends on the temperature according to the Arrhenius equation:

$$r = C \exp\left(-\frac{\Delta E}{kT}\right) \tag{9}$$

Here, *C* is a constant, *∆E* is the activation energy, *k* is the Boltzmann constant, and *T* is the absolute temperature (K). If the times to failure have exponential distribution, the failure rates or times to failure are related to the absolute temperatures as follows [1]:

$$\frac{\lambda_1}{\lambda_2} = \frac{t_2}{t_1} = \exp\left[ \frac{\Delta E}{k} \left( \frac{1}{T_2} - \frac{1}{T_1} \right) \right] \tag{10}$$

Equation (10) can be used for the determination of necessary temperature change from *T*1 to *T*2 if the test duration should be reduced from *t*1 to *t*2.

Similarly, the number of cycles to fatigue failure of periodically loaded components can be reduced by increasing the characteristic stress or load amplitude *P*. The basic relationship, based on the Wöhler-like curve [Equation (1) in Chapter 6], is

$$t_1 / t_2 = C \left( P_1 / P_2 \right)^n ; \tag{11}$$

*C* and *n* are constants for a given material and environment. Similar relationships can be used for finding the increased load for shortened tests of components exposed to creep or static fatigue (stress enhanced corrosion), with rates depending on some power of the load.

Today, mass-produced electronic and electrical components are tested in special chambers and under special conditions enabling acceptably short duration of the tests. More about these tests, denoted **HALT** (for highly accelerated life testing) or **HASS** (for highly accelerated stress screening), can be found in the literature, for example [5].

#### **Sorting tests**

These tests aim at sorting out "weak" items that could fail shortly after being put into service. However, they must not cause excessive degradation of properties in "good" components (i.e. they should not shorten their life significantly). Sorting tests can be nondestructive or destructive. **Nondestructive tests** use visual observation, X-ray, ultrasound or magnetic inspection, and special electrical or other measurements. **Destructive tests** can be arranged in several ways, for example as **proof tests** that use short-time overloading by mechanical or electrical stress exceeding the nominal value so that the weak parts are destroyed during the test. Other ways of revealing the weak parts are artificial aging under increased temperature, cyclic loading by varying temperatures (this causes additional thermal stresses that can reveal hidden defects or weak joints), the use of a burn-in period with 75% to 100% of nominal load acting for several tens of hours before putting into service, and special kinds of mechanical loading, such as impacts, vibrations of certain amplitude and frequency, overloading of rotating parts by centrifugal forces, and others.

## **2. Acceptance sampling**

This operation, common in series production, ensures that only those batches of items will be released to the customer or to the next operation, which are either perfect or contain only very small proportion of out-of-tolerance parts. Before this control is introduced, a test plan must be prepared, which contains:


Generally, three approaches are used:


than LQL. The probability *β* is called **customer's risk** and means the risk that an unsatisfactory lot will be accepted as good. On the other hand, a **producer's risk** *α* also exists, namely that a good lot, with fewer defective pieces than AQL, will be rejected. Usually, 5% or 1% is chosen for both *α* and *β*.

The curve showing how the probability of accepting the lot decreases with increasing proportion of defectives in the sample is called the **operating characteristic curve** (OCC). Figure 1 shows examples of OCCs for two different decisive numbers.

The rejected batch is either discarded or 100% checked. In the latter case, the good pieces are added to other good items. This makes the average quality of the batches composed in this way better, so that the quality demands in the tests may slightly be reduced.

**Figure 1.** Operating characteristic curve (OCC). *P -* probability of acceptance; *p -* percentage of defectives in the population; *α –* producer's risk; *β –* customer's risk. Subscripts 1 and 2 denote curves OCC1 and OCC2.
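The shape of an operating characteristic curve can be reproduced numerically. The sketch below assumes a single sampling plan with sample size *n* and decisive number *c* (the lot is accepted if at most *c* defectives are found) and a binomial model, i.e. a lot that is large compared with the sample. The plan parameters are hypothetical.

```python
from math import comb

def acceptance_probability(p, n, c):
    """Probability of finding at most c defectives in a sample of n pieces."""
    return sum(comb(n, z) * p**z * (1 - p)**(n - z) for z in range(c + 1))

n = 50                                   # hypothetical sample size
fractions = [0.0, 0.01, 0.02, 0.05, 0.10, 0.15]

for c in (1, 3):                         # two hypothetical decisive numbers
    curve = [acceptance_probability(p, n, c) for p in fractions]
    print(f"c = {c}: " + ", ".join(f"P({p:.2f}) = {P:.3f}"
                                   for p, P in zip(fractions, curve)))
```

A larger decisive number shifts the curve to the right: the producer's risk decreases, but the customer's risk increases.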

Other schemes also exist. For example, a **double sampling scheme** uses two decisive numbers, *c*1 and *c*2. If the number *z* of defectives in the first sample does not exceed *c*1, the lot is accepted, and if it is higher than *c*2, it is rejected. If *c*1 < *z* ≤ *c*2, another sample is taken and the total number of defectives in both samples is checked, etc. Further modifications, such as multiple sampling or sequential sampling, exist as well. For more, see [6].

However, doubts are sometimes cast on the cost-effectiveness of statistical control. On the one hand, this control costs money. On the other hand, losses can arise due to possible defective pieces hidden in the batches checked as good. Deming [7] has pointed out that if the cost of inspecting one piece is *k*<sub>1</sub>, the average cost of a failure caused by not inspecting is *k*<sub>2</sub>, and the average fraction of defectives is *p*, then, if *pk*<sub>2</sub> < *k*<sub>1</sub>, the lowest total costs (control costs plus costs caused by failures) will be achieved without any testing. If *pk*<sub>2</sub> > *k*<sub>1</sub>, full (100%) inspection should be used, especially for higher ratios *pk*<sub>2</sub>/*k*<sub>1</sub>. However, the situation is often not so simple; the fraction *p* of defectives can vary, and 100% testing can be impracticable because of too high investment costs or because all tests end with destruction, etc.
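The comparison behind this rule is a one-line calculation of the expected cost per piece. The following sketch makes it explicit; it assumes, for simplicity, that 100% inspection finds every defective piece and causes no other costs, and the cost figures are hypothetical.

```python
def cheapest_policy(p, k1, k2):
    """Compare the expected cost per piece without inspection (p * k2)
    and with 100% inspection (k1)."""
    cost_without = p * k2
    cost_full = k1
    if cost_without < cost_full:
        return "no inspection", cost_without
    return "100% inspection", cost_full

k1, k2 = 1.0, 500.0          # hypothetical costs per piece and per failure
for p in (0.0005, 0.001, 0.002, 0.01):
    policy, cost = cheapest_policy(p, k1, k2)
    print(f"p = {p:.4f}: p*k2/k1 = {p * k2 / k1:.2f} -> {policy} "
          f"(expected cost {cost:.2f} per piece)")
```

The break-even fraction of defectives is *p* = *k*<sub>1</sub>/*k*<sub>2</sub>; near this value the choice of policy matters little, which is why the rule is most useful for clearly high or clearly low ratios.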

The statistical acceptance was very popular in the second half of the 20th century but not so much today. There are two reasons: the demands on quality and reliability are much higher today than 50 years ago, and the allowable probabilities are often of the order 1:10<sup>6</sup>, much lower than the degree of confidence common in statistical sampling. Moreover, the controlling devices are much more powerful today. The incorporation of automated test equipment (**ATE**) into the production line enables 100% control.

## **3. Testing of large structures and complex components**

These tests will be illustrated on two cases: bridges and large components exposed to fatigue, such as parts of heavy vehicles (e.g. locomotives).

The assumed service life of road and railway bridges is many tens of years and sometimes more. During this time, the structure deteriorates and its safety decreases. Also, the loading pattern can change in a long time (new kinds of vehicles and changed traffic demands). For these reasons, bridges must sometimes be repaired or reconstructed. In such case, thorough inspections are done at suitable time, including **load tests** in important cases. In these tests, the bridge is usually loaded by a group of trucks loaded by sand or concrete blocks as much as possible so that the load-carrying capacity of the bridge is attained. During the tests, deformations and stresses at selected points are measured and compared with the values obtained by computer analysis of the structure – to see if the actual response (e.g. deflection of some parts of the bridge) corresponds to the assumed response. In some cases, dynamic properties are also studied (i.e. the response to periodic or dynamic loading). If the actual condition is worse than allowed, measures must be taken for improvement.

Large parts of mechanical structures, such as vehicles or aircraft (sometimes these objects as a whole), are mechanically loaded in order to find whether the actual response (deformations and stresses at selected points) corresponds to the values assumed in design. Also, dynamic response is investigated. Exceptionally, the object is loaded until destruction. In the past, the measurements were often the only reliable source of information on the stresses and behavior. Today, the methods of stress analysis are much better and much information can be obtained by computer simulation as early as in the design stage. Therefore, today the tests serve rather to confirm that the demanded parameters have been achieved.

The test loads are often imposed by electrohydraulic cylinders attached to the tested object. Often, special test stands are used, consisting of a massive frame with hydraulic cylinders, clamping equipment, and a controlling unit. The work of the stand is controlled by a computer. This enables one to program the demanded loading sequences. Sometimes, the load program is based on a record made during a test vehicle driving on real roads or on a test track containing typical examples of road surfaces. The test vehicle is equipped with sensors (usually strain gauges fixed at certain points of the car body) and the measured data are recorded. These data must be transformed to the data for the control of the load cylinders of the testing stand. The reason is that these cylinders are often attached to the tested structure at different points than were those used in the test vehicle driven on the track. Also, the data recorded with one test vehicle are sometimes used for the testing of other types of vehicles. The test stand can repeat the recorded load sequence again and again, so that also fatigue resistance can be tested in this way.

## **4. Tests of strength and fatigue resistance**

These tests are often arranged according to various standards. In this paragraph, we thus limit our attention to some probabilistic aspects of these tests.

Strength tests. The individual values vary, so that the number of tests should be adjusted to the purpose of the measurement and to the scatter of individual values. If only approximate information on the average strength is needed, three tests may be sufficient; the standard deviation can serve for the estimation of the confidence interval of the mean strength. However, especially for brittle materials with high scatter of individual values, the knowledge of the "minimum" strength is often demanded. This is determined as a low-probability quantile. For this purpose, more tests must be done, often several tens or more. From these tests, the parameters of strength distribution are determined. Often, Weibull distribution, but also log-normal distribution, is assumed. The determination of parameters and quantiles of Weibull distribution was described in Chapters 11 and 18. The parameters of log-normal distribution are found in several steps. In the first step, logarithms are taken from the measured values, then the average and standard deviation are calculated from the transformed data, and finally they are transformed back to the original system of units. The question of which distribution is better can be solved by means of statistical tests of goodness of fit [2 – 4].
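The steps for the log-normal distribution can be sketched as follows. The strength values are hypothetical, and the result is only a point estimate of the quantile: the uncertainty caused by the limited number of tests is not included here.

```python
import math
from statistics import NormalDist, mean, stdev

def lognormal_quantile(strengths, prob):
    """Point estimate of the prob-quantile of strength, assuming log-normal distribution."""
    logs = [math.log(x) for x in strengths]       # step 1: logarithms of the data
    mu, sigma = mean(logs), stdev(logs)           # step 2: mean and std of the logs
    z = NormalDist().inv_cdf(prob)                # quantile of standard normal distribution
    return math.exp(mu + z * sigma)               # step 3: back to the original units

# Hypothetical measured strengths (MPa)
data = [412, 455, 478, 390, 501, 467, 433, 489, 445, 420]

for prob in (0.50, 0.05, 0.01):
    print(f"{prob:.0%} quantile of strength: {lognormal_quantile(data, prob):.0f} MPa")
```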

Generally, many values are necessary to obtain reliable values of low-probability quantiles of strength. (Remember that 1% quantile corresponds to the minimum of 100 values.)

Fatigue tests. The main purpose of fatigue tests is the determination of the fatigue limit (if it exists) and finding the relationship between the characteristic stress (*S*) and the time or number of cycles to failure (*N*<sub>f</sub>). As for the fatigue limit, everything from the above paragraph on strength tests remains valid. The *S*–*N* relationship is obtained by making the tests under various characteristic stress amplitudes and fitting the data by a suitable function, for example [8, 9]:

$$N_f = A\, S^{-w} \tag{12}$$

or a similar expression. Now, two possibilities exist depending on the number of tests that were done or could be done with respect to the available money and time. If only several tests have been performed, all measured *N*<sub>f</sub>(*S*) values are fitted by the regression function (12). The consequences of the scatter of individual values are depicted in Figure 1 in Chapter 18. The regression function, obtained by the least-squares method, gives such *N*<sub>f</sub> values that there is a 50% probability that the true number of cycles to failure under a chosen stress will be lower (!) than the number obtained from the regression function. The "safe" *N*<sub>f,α</sub> values, for which an acceptably low probability *α* exists that the component or construction fails earlier, may be found as boundary values of the pertinent confidence band for all *S*-*N* data; see Chapter 18.

If more values (e.g. tens) are available for each stress level, a more accurate procedure can be used. The data for individual stress levels are rank-ordered in ascending order. Each value corresponds to some quantile of time to failure for a given stress level. For example, the shortest time of 10 values obtained for the same stress corresponds approximately to the 10% quantile of the time to failure. Now, only the *N*<sub>f,α</sub>(*S*) values, corresponding to the same quantile *α*, are fitted by regression function (12). The "safety" of the prediction of the time to failure with this function equals 1 – *α*. It is also possible to fit all measured data by a function of type (12) with additional parameters characterizing the probability that the actual number of cycles to failure will be lower than that calculated via modified Equation (12).
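The quantile-based fitting can be sketched as follows. Equation (12) is linear in logarithms, log *N*<sub>f</sub> = log *A* – *w* log *S*, so an ordinary least-squares line suffices. The data are hypothetical, and with only five specimens per level the shortest life corresponds, by the same reasoning as above, to roughly the 20% quantile.

```python
import math

def fit_power_law(stresses, cycles):
    """Least-squares fit of log N_f = log A - w log S; returns (A, w)."""
    x = [math.log(s) for s in stresses]
    y = [math.log(n) for n in cycles]
    x_mean, y_mean = sum(x) / len(x), sum(y) / len(y)
    slope = (sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
             / sum((xi - x_mean) ** 2 for xi in x))
    return math.exp(y_mean - slope * x_mean), -slope

def fit_quantile_curve(results_by_stress, rank):
    """Fit Equation (12) to the rank-th shortest life at every stress level (rank 1 = shortest)."""
    stresses = sorted(results_by_stress)
    selected = [sorted(results_by_stress[s])[rank - 1] for s in stresses]
    return fit_power_law(stresses, selected)

# Hypothetical data: stress amplitude (MPa) -> cycles to failure of five specimens
results = {
    300.0: [41e3, 55e3, 62e3, 80e3, 118e3],
    250.0: [98e3, 150e3, 171e3, 240e3, 330e3],
    200.0: [420e3, 510e3, 690e3, 905e3, 1400e3],
}

A_low, w_low = fit_quantile_curve(results, rank=1)   # shortest life at each level
A_mid, w_mid = fit_quantile_curve(results, rank=3)   # median life at each level

for label, A, w in (("shortest", A_low, w_low), ("median", A_mid, w_mid)):
    print(f"{label}: w = {w:.2f}, predicted N_f at 220 MPa = {A * 220.0 ** (-w):,.0f}")
```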

## **Author details**

Jaroslav Menčík

Address all correspondence to: jaroslav.mencik@upce.cz

Department of Mechanics, Materials and Machine Parts, Jan Perner Transport Faculty, University of Pardubice, Czech Republic

## **References**


## **Multicriterial Condition Evaluation and Fuzzy Methods**

## Jaroslav Menčík

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/62380

#### **Abstract**

This chapter describes various methods for reduction of uncertainties in the determination of characteristic values of random quantities (quantiles of normal and Weibull distribution, tolerance limits, linearly correlated data, interference method, Monte Carlo method, bootstrap method).

**Keywords:** Random quantity, uncertainty, normal distribution, Weibull distribution, tolerance limits, correlation, interference method, Monte Carlo method, bootstrap method

Large and long-life engineering structures, such as bridges, dams, buildings, or cooling towers, deteriorate gradually due to corrosion, mechanical damage, fatigue, and other processes. As a consequence, their safety slowly decreases, and after some time, they must be repaired. Every decision on a repair must be based on a good knowledge of the actual technical condition, which is gained from regular inspections. However, it is impossible to characterize the overall condition of a complex object by means of a single, simply measurable quantity. The condition is influenced by many factors, and most of them can be characterized only verbally (e.g. slightly corroded reinforcement or many short cracks in the wall). Probabilistic and exact methods cannot be applied everywhere, often because of the lack of data and the vagueness of the characteristic criteria and of the way they are evaluated, and also because of the lack of appropriate models relating the extent of the defects to the load-carrying capacity or lifetime of the object. The overall safety and reliability characteristic is obtained by suitable processing of many partial ratings, one for each criterion. Such evaluation is usually based on the judgement of a person with long practice, an expert. The results of this approach, which relies on the expert's experience and intuition, are usually reasonably good. Nevertheless, methods that are more objective and less dependent on the person of the inspector are needed. In this chapter, two procedures will be described: a simple method that assigns weights to the individual criteria and uses a simple rule for their aggregation, and an advanced method for more complex cases, which uses fuzzy logic tools.

© 2016 The Author(s). Licensee InTech. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

## **1. Simple multicriterial condition assessment**

The method can be explained using the evaluation of bridges as an example, but a similar approach may be used for other objects, too. The decision on a repair is based on the results of inspections. Various systems exist for the classification of bridge condition, usually based on a scale with several degrees. For example, in the Czech Republic a three-degree scale is used for railway bridges (1 = good, 2 = satisfactory, 3 = unsatisfactory), whereas a seven-degree scale is used for road bridges (1 = faultless, 2 = very good, 3 = good, 4 = satisfactory, 5 = bad, 6 = very bad, 7 = emergency, danger of collapse). In Poland, a continuous scale between 0 and 5 is used.

The condition evaluation is based on several criteria. For small concrete railway bridges, they pertain to: 1 = condition of the beams, 2 = condition of water insulation, 3 = condition of the cornice, and 4 = response to train passage. The bridge inspection protocol with a verbal description of the situation serves for assigning weights to the individual criteria. This is facilitated by a catalog relating the weights to various degrees of damage. The sum of the weights for the individual criteria forms the resultant characteristic of the overall condition (*R*) and serves for the decision whether or not the bridge should be repaired.

The procedure will be explained here on an example of a simple bridge [1].

### **Example 1**

A concrete bridge was inspected to evaluate its overall condition. The results of the inspection, written into a protocol, were as follows:


These results were then compared with the catalog of weights (*W*) for various conditions. An extract from the list is shown below. The first subscript denotes the criterion and the second subscript denotes the classification. The weights range from 0 to 1; higher values correspond to better condition.

**1. Concrete beams** (7-degree classification). *W*1,7 = 0.9 – 1.0: Concrete plastering is without cracks, the steel reinforcement is fully covered, the edges of beams are without rust, protective painting is in order. *W*1,6 = 0.8: Concrete plastering at the bottom contains hair cracks, the steel reinforcement is bare at lengths less than 0.05 m (i.e. not substantial), the edges of beams are with slight rust, the protective painting starts flaking off. *W*1,1 = 0.0 – 0.1: The plastering has significantly fallen away, the edges of steel beams are very rusty, with the thickness reduced by 2 to 3 mm, cracks are present in the concrete, and the concrete crumbles up to the depth 60 mm.


Based on this list, the individual criteria of the investigated bridge were assigned the following weights: *W*1 = 0.65, *W*2 = 0.45, *W*3 = 0.85, and *W*4 = 0.90. The resultant characteristic *R* of the overall condition is obtained as the sum of the individual weights:

*R* = *W*1 + *W*2 + *W*3 + *W*4 = 0.65 + 0.45 + 0.85 + 0.90 = 2.85.

This result can be interpreted using the following 3-degree scale based on experience:

*R* = 3.0 – 4.0, Degree 1. Condition: Good. Load-carrying construction needs only common maintenance.

*R* = 1.8 – 3.0, Degree 2. Condition: Satisfactory. Load-carrying construction needs repair (more extensive than common maintenance), but the defects do not endanger the safety.

*R* = 0.0 – 1.8, Degree 3. Condition: Unsatisfactory. Load-carrying construction needs total reconstruction or exchange of the load-carrying construction or substantial repair.

The above value *R* = 2.85 can thus be interpreted as "satisfactory condition, but a repair is necessary". The bridge manager will decide about the repair (also with respect to the money available, condition of other bridges in the network, etc.; see Chapter 17).
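The aggregation and classification in Example 1 can be written as a short Python sketch. The function name `overall_condition` is ours. The thresholds are those of the three-degree scale above, which applies to four criteria. Since the intervals of that scale share their end points (1.8 and 3.0), the sketch assumes that a boundary value is assigned to the better degree; the text does not specify this.

```python
def overall_condition(weights):
    """Return (R, degree, label) for a list of criterion weights in [0, 1]."""
    r = sum(weights)
    # Thresholds copied from the 3-degree scale in the text (four criteria).
    # Assumption: a value exactly on a boundary is given the better degree.
    if r >= 3.0:
        return r, 1, "good"
    if r >= 1.8:
        return r, 2, "satisfactory"
    return r, 3, "unsatisfactory"

# Weights W1..W4 assigned to the inspected bridge in Example 1.
R, degree, label = overall_condition([0.65, 0.45, 0.85, 0.90])
print(f"R = {R:.2f}, degree {degree} ({label})")
```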

Note: The method can be improved by assigning weights to the groups of individual criteria to better respect the influence of each criterion on the safety of the whole structure.

## **2. Fuzzy logic approach to condition assessment**

As we have seen, the above characterization of bridge condition was based not on exact values but on rather vague terms. There are many situations like this. In daily life, we often describe a situation as "slightly increased temperature" or "the girders are very rusty". Even a driver controls his car in terms of "fast-slow" and "near-far". The necessity of working with such "fuzzy" quantities has led to the development of methods based on fuzzy sets [2, 3]. These methods make it possible to work with linguistic and numerical quantities, to combine them, and to use mathematical and logical operators (IF, AND, OR, THEN,...). The procedures for applying fuzzy methods are similar in principle to the above multicriterial condition assessment. However, instead of one single ("sharp") value for each criterion, they use so-called membership functions and offer more flexibility and a better characterization of the situation (Figure 1A). The application of fuzzy logic to decision processes consists of three steps: fuzzification, fuzzy inference, and defuzzification.

#### **STEP 1. Fuzzification**

The real values of input variables are transformed into fuzzy values of linguistic variables. This is done by assigning a suitable attribute to each basic variable. An example of such a variable is "deflection of a beam" and an example of an attribute is "small". Often, three to seven attributes are used (e.g. positive big, positive medium, positive small, zero, negative small, negative medium, and negative big). The fuzzy approach uses membership functions that express the degree to which individual quantities correspond to their definitions. For example, suppose the usual operating temperature of a machine is from 40°C to 70°C. The temperature 75°C can be considered as increased, but still also as operating. Its correspondence to operating conditions, however, is not as high as if it were within the above interval. The fuzzy approach makes it possible to deal with just such cases. Examples of various membership functions are shown in Figure 1A; *μ*(*x*) = 1 means full correspondence of *x* with its definition.

#### **STEP 2. Fuzzy inference**

In the second step, mathematical and logic operations are performed with the fuzzified input variables. For example, "If 'A' is small and 'B' is high, then 'C' is small". The output is also fuzzy or in the form of a linguistic variable. Suitable processing of the membership functions of several input variables gives the membership function of the result. For example, if a load "about 5 kN" acts on a structure together with a load "about 10 kN", then the total load is "about 15 kN". Figure 1B shows this simple case for triangular membership functions.
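The addition of two triangular fuzzy quantities can be illustrated by a minimal Python sketch. For triangular membership functions, the sum is again triangular, and its three characteristic points are the sums of the corresponding points of the inputs. The class name `TriangularFuzzy` and the widths chosen for "about 5 kN" and "about 10 kN" are assumptions made only for this illustration. The sketch also assumes low < peak < high.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TriangularFuzzy:
    """Triangular fuzzy quantity; assumes low < peak < high."""
    low: float    # membership is 0 at and below this value
    peak: float   # membership is 1 here
    high: float   # membership is 0 at and above this value

    def membership(self, x: float) -> float:
        if x <= self.low or x >= self.high:
            return 0.0
        if x <= self.peak:
            return (x - self.low) / (self.peak - self.low)
        return (self.high - x) / (self.high - self.peak)

    def __add__(self, other: "TriangularFuzzy") -> "TriangularFuzzy":
        # Sum of two triangular fuzzy quantities: add the characteristic points.
        return TriangularFuzzy(self.low + other.low,
                               self.peak + other.peak,
                               self.high + other.high)

f1 = TriangularFuzzy(4.0, 5.0, 6.0)     # "about 5 kN" (hypothetical width)
f2 = TriangularFuzzy(8.0, 10.0, 12.0)   # "about 10 kN" (hypothetical width)
total = f1 + f2                          # "about 15 kN"
print(total, total.membership(13.5))
```

Note that the result is more spread out than either input: the uncertainty of the individual loads accumulates in the total.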

#### **STEP 3. Defuzzification**

In this step, the fuzzy result is transformed into a sharp value of the output variable, characterizing the overall condition, e.g. "the damage degree is 4.3". Various methods exist for this purpose: position of the centroid of the resultant membership function, the first of maxima, etc. If the technical condition of a structure is evaluated, the resultant statement can be "the condition is good (satisfactory, bad,...)". This serves for the decision about the further operation or repair of the object.
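The two defuzzification rules named above (centroid and first of maxima) can be sketched in Python for a membership function sampled on a grid. The resultant membership function used here, a triangle on a damage scale from 1 to 7, is hypothetical, and the function names are ours.

```python
def triangular(x, low, peak, high):
    """Triangular membership function: zero outside (low, high), one at peak."""
    if x <= low or x >= high:
        return 0.0
    if x <= peak:
        return (x - low) / (peak - low)
    return (high - x) / (high - peak)

def centroid(xs, mus):
    """Centroid of a membership function sampled at equally spaced points xs."""
    total = sum(mus)
    if total == 0.0:
        raise ValueError("membership function is zero everywhere")
    return sum(x * mu for x, mu in zip(xs, mus)) / total

def first_of_maxima(xs, mus):
    """Smallest x at which the sampled membership function reaches its maximum."""
    top = max(mus)
    return next(x for x, mu in zip(xs, mus) if mu == top)

xs = [i / 100 for i in range(100, 701)]             # damage scale 1.00 ... 7.00
mus = [triangular(x, 3.0, 4.0, 6.0) for x in xs]    # hypothetical fuzzy result

degree_centroid = centroid(xs, mus)
degree_fom = first_of_maxima(xs, mus)
print(f"centroid: {degree_centroid:.1f}, first of maxima: {degree_fom:.1f}")
```

For this triangle the centroid lies at (3 + 4 + 6)/3 ≈ 4.3, whereas the first of maxima gives 4.0. The two rules generally differ whenever the membership function is asymmetric.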

**Figure 1.** (A) Examples of membership functions and (B) example of composition of two fuzzy quantities (*F* = *F*1 + *F*2).

The condition assessment using fuzzy logic needs computer support. Special programs may be created, but commercial software can also be used. For example, Matlab offers the universal Fuzzy Logic Toolbox [4]. It enables the definition of various membership functions (e.g. for the intensity of damage and extent of damage or other quantities relevant to the particular problem). The user can also choose the rules for the inference process from a database. The solution is controlled by the editor of the fuzzy inference system (Figure 2), and the results can be presented in graphic form.

**Figure 2.** Editor of fuzzy inference system in Matlab®.

The main parts of a fuzzy-logic tool are: an editor, databases of membership functions and of rules for work with them, and a viewer of the resultant membership function. Before computer-aided condition evaluation with fuzzy methods may be applied to an engineering object, the following steps must be done:


There are many publications on fuzzy methods. Their use for reliability assessment is explained in [2, 3, 5 – 7]. Rudolf [8] developed an application of computer-aided fuzzy inference for the evaluation of bridges; see also [1].


### **References**


## **Chapter 22**

## **Bayesian Methods**

## Jaroslav Menčík

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/62378

#### **Abstract**

Probabilistic Bayesian methods enable combination of information from various sources. The Bayes theorem is explained and its use is illustrated on several examples of practical importance, such as revealing the cause of an accident or increasing the reliability of non-destructive testing. Its use for continuous quantities and for increasing the reliability of the parameters of normal or Weibull distribution is also shown.

**Keywords:** Statistics, probability, Bayes, Bayes theorem, reliability, non-destructive testing, normal distribution, Weibull distribution, combination of information

The term "Bayesian methods" denotes probabilistic methods that enable the combination of information on some event or quantity with previous information from measurement or experience. The use of additional information can increase the reliability of our information or reduce the extent of measurements needed for drawing conclusions on a certain event. Examples of application are the determination of the most probable cause of a failure, increasing the reliability of diagnostic methods, or increasing the accuracy of the determination of distribution parameters of random quantities.

Bayesian methods are based on the so-called Bayes theorem [1 – 6]. It was originally formulated for discrete quantities, but was later extended to continuous quantities as well. These methods have also been included in standards. In this chapter, their principle will be explained, and their use will be shown on several practical examples.

## **1. Bayes theorem**

Let us assume that an event (*B*) can occur if another event (*A*) has occurred. The event *A*, however, can occur in several mutually exclusive ways (*A*1, *A*2,..., *A*n). The probability of simultaneous occurrence of both events *A*<sub>j</sub> and *B* is calculated as


$$P(BA_j) = P(A_j) \times P(B \mid A_j), \tag{1}$$

where *P*(*A*<sub>j</sub>) is the probability of event *A*<sub>j</sub>, and *P*(*B*|*A*<sub>j</sub>) is the (conditional) probability that event *B* can occur provided that event *A*<sub>j</sub> has happened. The total probability of event *B* is

$$P(B) = \sum_{j=1}^{n} P(BA_j); \tag{2}$$

the summation is done for all possible cases *j* = 1, 2,..., *n*. Bayes theorem looks at the issue from the opposite side: "If event *B* has happened, what is the probability that it was as a consequence of (or after) event *A*<sub>j</sub>?" With the use of Equations (1) and (2) and the fact that *P*(*BA*<sub>j</sub>) = *P*(*A*<sub>j</sub>*B*), this probability can be expressed as [2 – 6]:

$$P(A_j \mid B) = P(A_j) \times P(B \mid A_j) / P(B), \tag{3}$$

where the total probability *P*(*B*) in the denominator is calculated from individual probabilities via Equations (2) and (1). Equation (3) is the simplest form of Bayes theorem. Its use will be shown on three examples. The first example, adapted from [4], does not solve a reliability problem, but is very instructive.

**Example 1.** Identification of origin of a sample from several possible sources.

The materials for road building are delivered from two plants with daily capacities 300 t (plant 1) and 700 t (plant 2). The long-term monitoring of quality shows that plant 1 has 2% of all batches faulty and plant 2 has 4% faulty batches. If a sample is now chosen at random at the building site, and if this sample is faulty, what is the probability that the batch came from plant 1 or from plant 2?

From the total amount of 300 + 700 = 1000 t/day, plants 1 and 2 produce 30% and 70%, respectively. Let us denote event *A*1: the sample is from plant 1; event *A*2: the sample is from plant 2. The corresponding probabilities are *P*(*A*1) = 0.3; *P*(*A*2) = 0.7. Event *B*: the sample is defective. The probability of a defective sample from plant 1 is *P*(*B*|*A*1) = 0.02, and from plant 2, it is *P*(*B*|*A*2) = 0.04. The total fraction of faulty production is: *P*(*B*) = 0.3 × 0.02 + 0.7 × 0.04 = 0.034 = 3.4%. The defective material from plant 1 represents 0.02 × 0.3 = 0.006 of the total production of both plants. This is 0.006/0.034 = 0.176 = 17.6% of the total faulty production. Similarly, plant 2 produces 82.4% of the scrap. These numbers also say that if the randomly chosen sample was faulty, a probability of 17.6% exists that it is from plant 1 and 82.4% that it is from plant 2.

Using Bayes rule (3), one can express the probability that the defective specimen is from plant 1 as *P*(*A*1|*B*) = *P*(*A*1) × *P*(*B*|*A*1)/*P*(*B*). The values 0.3 × 0.02/0.034 yield the same result 0.176 as above. Similarly, the probability that the faulty sample is from plant 2, *P*(*A*2|*B*) = 0.7 × 0.04/0.034 = 0.824 (=1 – 0.176).

If the quality is not considered, the probability that a randomly chosen sample comes from plant 1 equals 30% (i.e. the fraction of production from plant 1). If, however, additional information "the sample was defective" was used together with the information on quality in both plants, this probability has dropped to 17.6%. The same information has increased the probability of the sample being from plant 2 from 70% to 82.4%. Although the probability that a sample is from plant 2 was higher even without the Bayes rule (70%), the strengthening of this hypothesis is obvious.
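Equations (1) to (3) translate directly into a few lines of Python. The following sketch (the function name `bayes_update` is ours) reproduces the numbers of Example 1.

```python
def bayes_update(priors, likelihoods):
    """Return the posteriors P(A_j | B) and the total probability P(B).

    priors      -- P(A_j) for mutually exclusive hypotheses A_j (summing to 1)
    likelihoods -- P(B | A_j) in the same order
    """
    joint = [p * l for p, l in zip(priors, likelihoods)]   # Equation (1)
    p_b = sum(joint)                                       # Equation (2)
    if p_b == 0.0:
        raise ValueError("event B has zero probability under every hypothesis")
    return [j / p_b for j in joint], p_b                   # Equation (3)

# Example 1: plants 1 and 2 supply 30% and 70%; 2% and 4% of batches are faulty.
posteriors, p_b = bayes_update([0.3, 0.7], [0.02, 0.04])
print(f"P(B) = {p_b:.3f}")                  # P(B) = 0.034
print([round(p, 3) for p in posteriors])    # [0.176, 0.824]
```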

#### **Further strengthening of the hypothesis by using more tests**

The hypothesis "the material is from plant 1 (or 2)" can be strengthened (or weakened) by checking more specimens. If *n* specimens are taken from one batch, and if all prove to be defective (= event *B'*), then the expression *P*(*B*|*A*<sub>i</sub>) in Bayes rule (3) must be replaced by the expression *P*(*B'*|*A*<sub>i</sub>) = [*P*(*B*|*A*<sub>i</sub>)]<sup>*n*</sup>. For example, if three specimens were taken from a batch from the above example, and if all were faulty, then *P*(*B'*|*A*1) = 0.02<sup>3</sup>, *P*(*B'*|*A*2) = 0.04<sup>3</sup>, *P*(*B'*) = 0.3 × 0.02<sup>3</sup> + 0.7 × 0.04<sup>3</sup> = 0.0000472, and *P*(*A*1|*B'*) = 0.3 × 0.02<sup>3</sup>/0.0000472 = 0.05. Similarly, *P*(*A*2|*B'*) = 0.95. In such a case, it is nearly certain that the batch was from plant 2.
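The same result is obtained if the Bayes rule is applied repeatedly, one specimen at a time, with each posterior serving as the prior for the next specimen. The sketch below checks this equivalence numerically. It relies on the assumption, already implicit in the power formula above, that the specimens are faulty independently of one another once the plant is given. The helper name `update` is ours.

```python
def update(priors, likelihoods):
    """One application of Bayes rule (3); returns the posterior probabilities."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]

priors = [0.3, 0.7]          # plants 1 and 2
likelihoods = [0.02, 0.04]   # probability of a faulty specimen from each plant

# Sequential updating: three faulty specimens, one after another.
belief = priors
for _ in range(3):
    belief = update(belief, likelihoods)   # the posterior becomes the next prior

# Direct calculation with P(B'|A_i) = P(B|A_i)**3, as in the text.
direct = update(priors, [l ** 3 for l in likelihoods])

print([round(p, 3) for p in belief])
print([round(p, 3) for p in direct])
```

Both lines print the same pair of probabilities, about 0.05 for plant 1 and 0.95 for plant 2.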

**Example 2.** Revealing the most probable cause of an accident.

This example is adapted from [2]. An explosion occurred during a repair of a tank for liquid natural gas. The accident could have happened due to (1) static electricity, (2) a fault in the electric equipment, (3) work with open flame during the repair, or (4) an intentional act (sabotage). Engineers for risk analysis estimated that the accident could happen with a probability of 25% due to static electricity, 20% due to a fault in the electric equipment, 40% due to work with open flame, and 75% due to sabotage. The discussion with them also gave the following subjective assessment of the probability of the individual causes: 0.30, 0.40, 0.15, and 0.15. What is the most probable cause of the explosion in view of all this information?

Solution. Event *A*: presence of conditions for explosion: *P*(*A*1) = 0.30; *P*(*A*2) = 0.40; *P*(*A*3) = 0.15; *P*(*A*4) = 0.15 (note: Σ*P*(*A*<sub>i</sub>) = 1.00). Event *B*: explosion. The probabilities of explosion under particular conditions are *P*(*B*|*A*1) = 0.25; *P*(*B*|*A*2) = 0.20; *P*(*B*|*A*3) = 0.40; *P*(*B*|*A*4) = 0.75. Total probability of the accident: *P*(*B*) = 0.30 × 0.25 + 0.40 × 0.20 + 0.15 × 0.40 + 0.15 × 0.75 = 0.3275. Probability that the explosion has happened due to: (1) static electricity: *P*(*A*1|*B*) = 0.30 × 0.25/0.3275 = 0.229 = 22.9%, (2) electric appliance: *P*(*A*2|*B*) = 24.4%, (3) open flame: *P*(*A*3|*B*) = 18.3%, and (4) sabotage: *P*(*A*4|*B*) = 34.4%. [Compare these updated probabilities with the original estimates *P*(*A*<sub>i</sub>).]

**Example 3**. Increasing the reliability of nondestructive testing.

Welded components are tested for the occurrence of defects (cracks). The device used for nondestructive testing is not perfect. It classifies a defect correctly (as a defect) only with probability 98%, whereas in 2% of all cases it does not recognize the crack and classifies the component as good. Conversely, the device marks 96% of good parts as good, but classifies 4% of them as containing a crack. According to long-term inspection records, 3% of all tested components contain cracks. The questions are: If the tested part was classified as "wrong" (i.e. with a defect), what is the probability that it is actually (a) wrong or (b) good? And what if the component was classified as "good"?

Solution. Event *A*1: Component contains a defect, *A*2: component is good. *P*(*A*1) = 0.03; *P*(*A*2) = 0.97. Event *B*: component is classified as wrong. *P*(*B*|*A*1) = 0.98; *P*(*B*|*A*2) = 0.04. The fraction of tested components marked as wrong: *P*(*B*) = 0.03 × 0.98 + 0.97 × 0.04 = 0.0682.

Case 1a. The probability that a component marked as wrong is actually wrong is *P*(*A*1|*B*) = *P*(*A*1) × *P*(*B*|*A*1)/*P*(*B*) = 0.03 × 0.98/0.0682 = 0.431 = 43.1%. Case 1b. The probability that a component marked as wrong is actually good is *P*(*A*2|*B*) = 0.97 × 0.04/0.0682 = 0.569 = 56.9%. (Remark: Because good parts form such a high proportion of all components (97%), good but rejected parts make up a large share of the rejects.)

Event *B'*: Component is classified as good. *P*(*B'*|*A*1) = 0.02; *P*(*B'*|*A*2) = 0.96. The total fraction of components, denoted as good, is *P*(*B'*) = 0.03 × 0.02 + 0.97 × 0.96 = 0.9318. Case 2a. Probability that the component marked as good is actually wrong, is *P*(*A*1|*B'*) = 0.03 × 0.02/0.9318 = 0.00064 = 0.06%. Case 2b. Probability that the component marked as good is actually good is *P*(*A*2|*B'*) = 0.99936 = 99.94%.

Recommendation: All rejected components could be tested once more to reduce the number of discarded good components.
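The effect of the recommended second test can be estimated with the same Bayes rule. The sketch below assumes that the errors of repeated tests are independent, which is our simplification and need not hold in practice (the same crack may be missed again for the same physical reason). The function name is ours as well.

```python
def prob_defective_given_rejections(prior_defect, p_reject_if_defect,
                                    p_reject_if_good, n_tests):
    """P(component is defective | it was rejected in n independent tests)."""
    defect = prior_defect * p_reject_if_defect ** n_tests
    good = (1.0 - prior_defect) * p_reject_if_good ** n_tests
    return defect / (defect + good)

# Data of Example 3: 3% defective parts, 98% of defects found, 4% false alarms.
once = prob_defective_given_rejections(0.03, 0.98, 0.04, 1)
twice = prob_defective_given_rejections(0.03, 0.98, 0.04, 2)
print(f"rejected once:  {once:.3f}")    # 0.431
print(f"rejected twice: {twice:.3f}")   # 0.949
```

Under this assumption, a component rejected in both tests is defective with a probability of about 95%, compared with 43% after a single test, so far fewer good components would be discarded.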

A similar approach can be used in medicine (e.g. in cancer screening).

## **2. Bayes rule for continuous quantities**

If the probability of event *B* depends on the value of a continuous quantity *A*, described by the probability density *f*(*A*), it is possible to calculate the total probability of this event as

$$P(B) = \int P(B \mid A)\, f(A)\, \mathrm{d}A; \tag{4}$$

the integration is performed over the whole domain of *A*. [The integration has replaced the summation in Equation (2).] An example is the nondestructive detection of cracks in welds: *P*(*B*) is the probability of crack detection, *f*(*A*) is the probability density of the crack size *A*, and *P*(*B*|*A*) is the probability of detection of a crack of size *A*, the so-called "probability of detection" curve (POD curve for short) of the device.

Now, a question can be asked: If event *B* (result of the test) has occurred, what is the actual distribution of random variable *A*? Bayes rule (3) can be modified also for this case; the formula for updated distribution of quantity *A* is [1, 3]:

$$f(A \mid B) = f(A) \times P(B \mid A) / P(B), \tag{5}$$

where *P*(*B*) is given by Equation (4). For example, the updated distribution of crack lengths can be used for the estimation of time to fatigue failure by the Monte Carlo method [1, 2].
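Equations (4) and (5) can be evaluated numerically. In the sketch below, both the prior density of crack size (exponential with a mean of 1 mm) and the POD curve (1 – exp(–*A*/2 mm)) are hypothetical choices made only for illustration, and the integration uses a simple trapezoidal rule over a range truncated at 40 mm.

```python
import math

def trapezoid(f, a, b, n):
    """Trapezoidal rule for the integral of f over [a, b] with n steps."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

MEAN_SIZE = 1.0    # mm, hypothetical mean crack size (exponential prior)
POD_SCALE = 2.0    # mm, hypothetical scale of the POD curve
UPPER = 40.0       # mm, truncation of the integration range
STEPS = 4000

def prior_density(a):
    return math.exp(-a / MEAN_SIZE) / MEAN_SIZE

def pod(a):
    """Probability of detecting a crack of size a."""
    return 1.0 - math.exp(-a / POD_SCALE)

# Equation (4): total probability that a crack is detected.
p_detect = trapezoid(lambda a: pod(a) * prior_density(a), 0.0, UPPER, STEPS)

def posterior_density(a):
    """Equation (5): density of crack size, given that the crack was detected."""
    return prior_density(a) * pod(a) / p_detect

mean_prior = trapezoid(lambda a: a * prior_density(a), 0.0, UPPER, STEPS)
mean_detected = trapezoid(lambda a: a * posterior_density(a), 0.0, UPPER, STEPS)
print(f"P(B) = {p_detect:.3f}")
print(f"mean size: all cracks {mean_prior:.2f} mm, detected cracks {mean_detected:.2f} mm")
```

With these assumed functions, the detected cracks are on average larger than the cracks in the whole population, because small cracks are more likely to be missed.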

## **3. Other applications**

Bayesian methods can also be used for the improvement of parameter estimate of various probability distributions. Three examples follow.

#### **Parameters of normal distribution**

The mean value *μ* and standard deviation *σ* of a population with normal distribution are usually unknown, so that they are replaced by their estimates *m* and *s* from a sample of size *n*. The estimate of the mean value can be refined via confidence interval:

$$
m - t_{\alpha,\nu}\, s/\sqrt{n} \le \mu \le m + t_{\alpha,\nu}\, s/\sqrt{n}, \tag{6}
$$

where *t*<sub>α,ν</sub> is the *α*-critical value of Student's *t*-distribution for *ν* = *n* – 1 degrees of freedom.

The estimate can be made more accurate if additional information is available (e.g. estimates *m*<sub>0</sub> and *s*<sub>0</sub> from previous measurements or records). If the number *n*<sub>0</sub> of these values is known, and if the assumption can be made that all samples (new and old) belong to the same population, the following procedure may be used. The updated average is calculated as the weighted average of both sample averages:

$$m_u = (n m + n_0 m_0) / n_u; \quad n_u = n + n_0, \tag{7}$$

where *n*<sub>u</sub> is the updated number of values. The updated standard deviation is

$$s_u = \sqrt{\frac{(n-1)s^2 + (n_0-1)s_0^2 + n m^2 + n_0 m_0^2 - n_u m_u^2}{n_u - 1}}. \tag{8}$$

Then, the updated confidence interval for *μ* can be calculated with *m*, *s*, and *n* in (6) replaced by the updated values *m*<sub>u</sub>, *s*<sub>u</sub>, and *n*<sub>u</sub>. If *n*<sub>0</sub> is unknown, the literature on Bayesian methods recommends an approximate formula [1, 5, 6]:

$$n_0 = s^2 / s_0^2, \tag{9}$$

based on the idea that *m*<sub>0</sub> and *s*<sub>0</sub> carry information corresponding to a fictitious sample of a certain size *n*<sub>0</sub>. The smaller the scatter *s*<sub>0</sub><sup>2</sup> compared to *s*<sup>2</sup>, the more important are the original results and the larger is the size of the fictitious sample. (An important condition for this estimate is that the "a priori" values of *m*<sub>0</sub> and *s*<sub>0</sub> were obtained from large samples.)
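The updating of the mean and standard deviation for a known number of old values, Equations (7) and (8), can be checked by a short Python sketch. The two samples are hypothetical (for instance strength values in MPa), and the function name is ours. The updated estimates are compared with those calculated directly from the pooled data, with which they should coincide.

```python
import math
import statistics

def update_normal_estimates(m, s, n, m0, s0, n0):
    """Combine new (m, s, n) and previous (m0, s0, n0) sample estimates."""
    n_u = n + n0
    m_u = (n * m + n0 * m0) / n_u                                   # Equation (7)
    var_u = ((n - 1) * s ** 2 + (n0 - 1) * s0 ** 2
             + n * m ** 2 + n0 * m0 ** 2 - n_u * m_u ** 2) / (n_u - 1)
    return m_u, math.sqrt(var_u), n_u                               # Equation (8)

# Hypothetical measurements, e.g. strength in MPa.
new = [412.0, 398.0, 405.0, 420.0, 409.0]
old = [401.0, 415.0, 407.0, 399.0, 411.0, 404.0, 418.0, 396.0]

m_u, s_u, n_u = update_normal_estimates(
    statistics.mean(new), statistics.stdev(new), len(new),
    statistics.mean(old), statistics.stdev(old), len(old))

pooled = new + old
print(f"updated: m_u = {m_u:.2f}, s_u = {s_u:.2f}, n_u = {n_u}")
print(f"pooled:  m   = {statistics.mean(pooled):.2f}, s   = {statistics.stdev(pooled):.2f}")
```

The practical value of the formulas is that only the summary statistics of the old sample are needed, not the individual old values.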

#### **Quantiles of normal distribution**

The ISO 12491 standard "Statistical methods for quality control of building materials and components" recommends the following formula for the Bayesian estimate of *p*-quantile of normal variable *x*:

$$x_{p,B} = m_u + t_p\, s_u \sqrt{1 + 1/n_u}, \tag{10}$$

where *t*<sub>p</sub> = *t*<sub>p</sub>(*α*, *p*, *ν*<sub>u</sub>) is the *p*-quantile of Student's *t*-distribution for *ν*<sub>u</sub> = *n*<sub>u</sub> – 1 degrees of freedom. If no additional information is available, the standard recommends the original values *m*, *s*, and *n*.

#### **Weibull distribution**

Some quantities, such as the strength or time to failure due to fatigue, can often be approximated by Weibull distribution:

$$F(x) = 1 - \exp\left(-\left[(x - x_0)/a\right]^b\right). \tag{11}$$

The parameters *a*, *b*, and *x*<sub>0</sub> are determined from tests. (The threshold value *x*<sub>0</sub> is often assumed equal to zero.) Sometimes, the number of tests is too low for obtaining reliable values of all parameters. Fortunately, the investigated component or structure is often not an entirely new solution but rather an improvement of the current conception. In such a case, one can expect that the failure mechanism will be similar to that of the previous construction. As the parameter *b* is closely related to the character of failures, one can assume that the value of *b* will be approximately the same as for the previous components and use it as a known constant. Under this assumption, the determination of the remaining parameters *a* and *x*<sub>0</sub> from a small number of values is more reliable. This approach is called "Weibayes" [7]. The assumed value of *b* is more reliable if it was determined from many tests. It is thus advisable to keep records from all tests, for possible use in the future!
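For the common special case of a zero threshold (*x*<sub>0</sub> = 0), the Weibayes estimate of the scale parameter has a closed form: with *b* fixed, the maximum-likelihood estimate is *a* = (Σ*x*<sub>i</sub><sup>*b*</sup>/*r*)<sup>1/*b*</sup>, where the sum runs over all tested units and *r* is the number of units that failed. The sketch below implements this formula; the test times and the assumed shape parameter are hypothetical, and the function name is ours.

```python
def weibayes_scale(times, b, failures=None):
    """Scale parameter a of a Weibull distribution with x0 = 0 and known shape b.

    times    -- test times of all units (failed and unfailed)
    failures -- number of units that actually failed (default: all of them)
    """
    r = len(times) if failures is None else failures
    if r < 1:
        raise ValueError("at least one failure is needed")
    return (sum(t ** b for t in times) / r) ** (1.0 / b)

times = [410.0, 520.0, 680.0, 750.0]   # hypothetical times to failure, hours
b_assumed = 2.5                        # hypothetical shape taken from an earlier design
a_hat = weibayes_scale(times, b_assumed)
print(f"a = {a_hat:.0f} h for assumed b = {b_assumed}")
```

For *b* = 1 the formula reduces to the arithmetic mean of the times to failure, which is the familiar estimate for the exponential distribution.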

## **4. Software for Bayesian methods**

The problems from the above first three examples can be solved easily using Excel and standard Bayesian notation. Some simple programs can be found in the literature, for example [2, 3]. At ETH Zürich, a PC program Combinfo was created, which enables the combination of data from various sources [8], including vague information, such as probability estimates by experts or by judgment. The program allows assigning various weights to individual information. Bayesian methods are also incorporated into software packages for reliability analysis, such as www.reliability.com, www.weibull.com, www.reliasoft.com, or www.itemsoft.com.


## **References**



## **Epilogue — Ways to Higher Reliability**

## Jaroslav Menčík

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/62379

#### **Abstract**

This last chapter summarizes the means and recommendations for increasing reliability that are suitable for design and operation.

**Keywords:** Reliability, materials, design, control, standards, maintenance, computers, measurements, diagnostics, failure analysis

Many current products are much more reliable than in the past, and this trend should continue. This book has summarized the methods that contribute to higher reliability. The first part (Chapters 1 to 10) explained the basic terms and methods, whereas the second part (Chapters 11 to 22) explained more advanced tools for reliability evaluation and optimization. This epilogue summarizes the means that have enabled the growth of reliability and gives a brief list of recommendations for increasing reliability.

## **1. Experience from failures and accidents**

All big accidents of aircraft, ships, big structures, or chemical plants have been thoroughly investigated. Intensive attention has also been devoted to frequent failures. The analysis of the causes and time course of failures has contributed to the improvement of the pertinent objects or constructions, to the improvement of manufacturing and building processes, and to the creation of various standards and procedures for increasing reliability and safety.

## **2. Reliability theory and statistical methods**

The basic concepts and quantities for reliability characterization and measurement have been defined. Gradually, the main kinds of failures and their causes were identified and the characteristic course of failure rate during the life of various objects (bathtub curve) was explained. Statistical analysis enables a better understanding and prediction of failures. The theory of probability has led to the measures for increasing reliability (e.g. the use of redundancy or the optimization of reliability of complex systems by allocating various reliabilities to the individual components). It also enables one to formulate reasonably reliable conclusions from limited information (e.g. minimum strength of a material or statistical acceptance). However, efficient methods for increasing reliability have also been developed, which do not work with probability, such as failure modes and effects analysis (FMEA).

## **3. Better materials, better components, and better technologies**

Due to systematic research, many new materials with outstanding properties have been developed since World War II (e.g. plastics such as Teflon, carbon fibers, and synthetic diamond). Various methods of surface treatment have also been developed, which increase the resistance to corrosion, high temperatures, or wear (e.g. hard TiN layers on machine tools), or the strength (e.g. glass strengthening by ion exchange in the surface layer). A great variety of components are available on the market. The manufacturers of various machines, appliances, and other products can buy components tailored for particular purposes and thus bring their products to perfection. High-precision tools and technologies also exist, which make it easier to achieve the demanded parameters of components or products.

## **4. Better knowledge in mechanics, materials science, and other branches**

During the last 50 years, various branches of engineering sciences have made significant progress (e.g. strength of materials, fatigue, fracture mechanics, dynamics, heat transfer, and flow of liquids and gases as well as control). In design and dimensioning, one can use better models for the response of structures and appliances to operation loads. Today, a much better knowledge of materials and the causes and time course of their deterioration and failure also exists.

## **5. Possibility to analyze, simulate, and test the objects via computer models**

The improvement thanks to computers is enormous. Computers can quickly process a huge amount of information. In the past, stresses and deformations could be analyzed only in components of simple shapes and loads, and the results were often only approximate. The testing of physical models and actual constructions, which is cumbersome and expensive, was therefore very important at that time. Today, computers allow the analysis and solution of very complex problems. For example, programs for finite-element analysis can determine the stresses in complicated bodies fairly accurately and reveal their critical parts. As early as in the design stage (which is crucial for reliability), it is possible to reveal the behavior under many load variants and conditions, including extreme ones. Unsuitable solutions can thus be excluded in advance. This facilitates the finding of the optimum shape or configuration, especially if computer programs for optimization are used. All this reduces the necessary extent of tests of prototypes (which are, nevertheless, still important). Computers also allow one to store and process information from the operation, which can be used for the optimization of maintenance and gradual improvements of the object.

## **6. Obtaining reliable design data by measurements and tests**

The properties of materials or standard components can be obtained from material data sheets or manufacturers' catalogues. If they are missing, or in very demanding cases, they are obtained via special tests. The important parts or prototypes are tested during the development. Overload tests done before putting the component or object into service can reveal weak pieces. Examples are load tests of bridges, overpressure tests for pressure vessels, tests of rotating parts at significantly higher speeds, and high voltage tests of electrical components and appliances. A special kind is the proof test, in which all "weak" parts with unacceptably large defects are destroyed by controlled overloading.

## **7. Better techniques for the measurement of various quantities and for control; the use of diagnostics and design of intelligent devices**

There has been a great improvement in measuring technologies, sensors, and devices for the analysis and processing of various quantities and signals (e.g. vibration diagnostics). All these are significantly enhanced by computers. Today, it is possible to measure gradual changes and deterioration of a component or machine and the changes of the operating conditions. In this way, the appliance can be switched off and repaired before a serious failure happens. Many production processes are 100% monitored. For example, in a production line for glass bottles (at a rate of one bottle per second), all important parameters are measured on every bottle, together with their changes with time. This, together with the identification of individual moulds, makes possible an early intervention targeted only at the problematic mould. The evolution proceeds towards smart devices with self-control. Two examples of intelligent elements from everyday life can be named: indication of unfastened seatbelts in a car and automatic dynamic balancing of the content in a home washing machine before spinning.

## **8. Codes and mandatory procedures to ensure reliability and safety**

Experience, gathered continuously for a long time, has been incorporated into standards and regulations. These include a variety of proven procedures and practices that guarantee a universally acceptable level of reliability; see, for example, the codes for the design of steel structures or standards for production and acceptance control. Codes also serve as a reference in disputes arising from malfunctions or accidents.

## **9. Organizational measures, consistent control of processes, and operation**

Even the best technical ideas, solutions, and regulations are useless if their application is not ensured. In complex processes, this must be achieved organizationally. Where necessary, checks must be done consistently at the input, during the process, and at the output. It is reasonable to seek ways for the elimination of human errors (e.g. by replacing physical or mental work by machines and computers). If this is not possible, emphasis must be put on personal responsibility. For example, in manual welding or inspection of welds, every qualified worker has his personal stamp to confirm that it was he who has done the operation.

## **10. Better approaches to maintenance**

Starting from the originally used maintenance after failure, the development progressed through preventive maintenance, done at fixed intervals, to on-condition maintenance, which relies heavily on technical diagnostics and decides with respect to the actual condition. The newest trend is reliability-centered maintenance, which reduces the pertinent costs by eliminating all maintenance work that a thorough analysis has shown to be unnecessary.

## **11. Competition and legal responsibility for defects and failures**

In economic systems where supply exceeds demand and a possibility of choice exists, emphasis is put on reliability. Free market and competition put permanent pressure on manufacturers to improve their products. If several firms can make certain products at a similar price, the firm whose products are more reliable will win, as the losses due to the failures of its products will be lower. The pressure toward increasing reliability is also supported by legislation, which imposes legal responsibility for any defect or failure and for the damage caused by it.

## **12. Recommendations for reliable design and operation**


## **Author details**

#### Jaroslav Menčík

Address all correspondence to: jaroslav.mencik@upce.cz

Department of Mechanics, Materials and Machine Parts, Jan Perner Transport Faculty, University of Pardubice, Czech Republic


## **Sources of Information on Reliability**

## Jaroslav Menčík

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/62365

This appendix gives the names of publishing houses, journals, web servers, and professional bodies related to reliability. The web addresses correspond to December 2015.

## **1. Books**

Useful books on reliability, probability, risk, maintenance, and related subjects are listed in the References in this book. Many others can be found in various libraries. Some books are freely accessible via *http://www.knovel.com*; it is sufficient to write the book name or a keyword into a "search window" there. The primary sources of books on reliability are various publishing houses. Among the best known, the following can be named:

**Elsevier Science***(http://www.elsevier.com)*

**Springer***(http://www.springer.com)*

**John Wiley & Sons***(http://wiley.com)*

**Taylor and Francis***(http://www.tandfonline.com)*

## **2. Journals**

The web pages presented below offer information on the reliability-related journals, contents of individual volumes and abstracts, as well as other useful information. Readers without a subscription can often buy the pertinent article and sometimes even get free access to the full text via the web.

© 2016 The Author(s). Licensee InTech. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

**Reliability Engineering & System Safety***(www.journals.elsevier.com/reliability-engineering-and-system-safety)*

A scientific journal devoted to the development and application of methods for the enhancement of the safety and reliability of complex technological systems. It is published by Elsevier in association with the European Safety and Reliability Association and the Safety Engineering and Risk Analysis Division.

#### **Reliability Edge***(http://www.reliabilitynews.com)*

A journal published four times a year by ReliaSoft Corporation. It brings articles related to reliability engineering theories and principles along with useful information on ReliaSoft's upcoming training seminars and product updates.

### **Reliability HotWire***(http://weibull.com/hotwire/index.htm)*

An Internet journal bringing news from the field of reliability and solutions of various practical problems (see also the title "Weibull" below).

**Quality and Reliability Engineering International***(http://www3.interscience.wiley.com/cgi-bin/jhome/3680) (http://onlinelibrary.wiley.com/journal/10.1002/(ISSN)1099-1638)*

A scientific journal devoted to the problems of quality and reliability. Published by John Wiley & Sons, Ltd.

#### **IEEE Transactions on Reliability***(http://www.ieee.org, http://ieeexplore.ieee.org)*

A scientific journal devoted to the principles and practices of reliability, maintainability, and product liability pertaining to electrical and electronic equipment.

**IEEE Transactions on Device and Materials Reliability***(http://ieeexplore.org/xpl/RecentIssue.jsp?punumber=7928)*

A scientific journal devoted to the reliability of electronic elements.

#### **Microelectronics Reliability***(http://www.journals.elsevier.com/microelectronics-reliability)*

A scientific journal devoted to the reliability of electronic components. Published by Elsevier.

**Software Testing, Verification, and Reliability***(http://www3.interscience.wiley.com/cgi-bin/jhome/13635)*

A scientific journal devoted to the problems of testing and reliability of software. Published by John Wiley & Sons, Ltd.

**Maintenance Technology***(http://mt-online.com)*

An electronic journal devoted to maintenance.

**Maintenance Resources***(http://www.maintenanceresources.com/productsshowcase/index.htm)*

A journal for professionals on reliability and maintenance.

**Warranty Week***(http://www.warrantyweek.com)*

The newsletter for warranty management professionals.

## **3. Internet**

The Internet is a very important source. This section presents links to several servers devoted to reliability and related topics. However, the named sources correspond to the date when this book was published, and changes cannot be excluded (as is usual with the Internet).

### **Reliability Web***(http://www.reliabilityweb.com)*

This website contains a lot of useful information, including links to freely accessible specialized papers on reliability, a calendar of events, information on books to buy, and a discussion forum.

#### **ReliaSoft Corporation***(http://www.reliasoft.com)*

This corporation offers various software and courses oriented to the automotive industry, but not exclusively. It also operates the portal *http://www.reliawiki.com*, with useful resources freely downloadable, including comprehensive handbooks, such as Life Data Analysis Reference, Accelerated Life Testing Reference, System Analysis Reference: Reliability, Availability & Optimization, Experiment Design & Analysis Reference, and Reliability Growth Analysis Reference. **Reliawiki** also provides access to the magazine Reliability HotWire.

#### **Barringer & Associates***(http://www.barringer1.com)*

A consultancy firm offering various courses in the reliability area, many interesting texts (including some standards of Military Handbook), and software products, some of them for free. It also offers books for sale.

#### **System Reliability Center***(http://src.alionscience.com)*

The Center provides expert services, information support, and education for people engaged in reliability. It also provides a forum for the exchange of information on reliability, as well as information on literature, software tools, standards, publications, and older issues of the journal published by SRC. It operates a Toolbox with answers to many practical problems.

#### **Weibull***(http://www.weibull.com)*

The reliability engineering web site devoted to theory, data analysis, and modelling. It includes sections on life testing, system reliability and maintainability, reliability growth analysis, FMEA and FMECA, reliability-centered maintenance (RCM), and design of experiments (DOE). It also contains the info on books and free access to the magazine *Reliability HotWire* for reliability professionals.

#### **Quanterion Solutions, Inc.***(https://quanterion.com)*

A firm offering (among other products) courses, consulting services, special books on reliability, and downloadable informative texts and data related to various reliability problems.

#### **Maintenance Resources***(http://www.maintenanceresources.com)*

Web pages related to maintenance and maintainability.

### **Maintenance World***(http://www.maintenanceworld.com)*

A resource where reliability and maintenance managers and professionals can exchange experience. It informs on conferences and courses and contains freely accessible articles dealing with the solution of many practical problems related to operation and maintenance.

#### **Plant Maintenance Resource Center***(http://www.plant-maintenance.com)*

A portal for industrial maintenance, informing about issues on maintenance and reliability, including books. It contains many articles devoted to practical problems. Unfortunately, some links are no longer active, as this service was regarded as a competitor to Google.

**Google, Wikipedia, and Wikimedia***(http://www.google.com, http://www.wikipedia.com, http://www.wikimedia.com)*

Web portals enabling search for information on any topic, including reliability, probability, mathematics, maintenance, and many others.

#### **Information on failure rates**

Data on the reliability of mass-produced components are very important for the assessment of reliability and availability of various appliances and systems. Some values of failure rate can be found, for example, in the databases **Electronic Parts Reliability Data** and **Nonelectronic Parts Reliability Data**. The data contained in these databases represent a compilation of field experience in military, commercial, and industrial applications. Both databases were created by Reliability Information Analysis Center (RIAC), originally for the U.S. Department of Defense. Some of the data from the database Electronic Parts Reliability Data (EPRD) are available at *http://theriac.org/productsandservices/products/downloads/content/EPRD%20Sample.pdf*, and some data from the database Nonelectronic Parts Reliability Data (NPRD) are available at *http://theriac.org/productsandservices/products/downloads/content/NPRD-2011SamplePages.pdf*.

More comprehensive versions of these databases can be purchased, for example via Quanterion Solutions, Inc., or other providers. Some data on the reliability of electric components can be found in the Military Handbook Reliability Prediction of Electronic Equipment (MIL HDBK 217), 1991 issue; see *http://www.sre.org/pubs/Mil-Hdbk-217F.pdf*.

Note: Some of the data available in the above databases are not "up-to-date" but still contain useful information.
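The way such failure-rate data are typically used can be sketched as follows. The example assumes a series system of independent components with constant failure rates (exponential distribution of time to failure). The component names and numbers are hypothetical placeholders, not values taken from EPRD, NPRD, or MIL-HDBK-217.

```python
from math import exp

# Hypothetical failure rates in failures per million hours (placeholders only).
failure_rates_per_1e6_h = {
    "resistor": 0.05,
    "capacitor": 0.20,
    "relay": 1.50,
    "connector": 0.25,
}

# Series system with constant failure rates: the component rates add up.
lambda_system = sum(failure_rates_per_1e6_h.values()) * 1e-6  # failures per hour
mean_time_to_failure = 1.0 / lambda_system                    # hours

def reliability(t_hours, lam=lambda_system):
    """Probability of failure-free operation up to t_hours (exponential model)."""
    return exp(-lam * t_hours)

print(f"system failure rate:  {lambda_system:.2e} per hour")
print(f"mean time to failure: {mean_time_to_failure:.0f} h")
print(f"R(10 000 h):          {reliability(10_000):.4f}")
```

The result is only as good as the input data, so the validity and operating conditions of the database values should always be checked.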

## **4. Professional bodies**

#### **European Safety and Reliability Association (ESRA)***(http://www.esrahomepage.org)*

A nonprofit professional organization aimed at the advancement of safety and reliability technology in all areas. It informs on various activities and conferences, publishes ESRA Newsletters, and organizes the annual European Safety and Reliability Conference ESREL.

#### **European Safety, Reliability & Data Association (ESReDA)***(http://www.esreda.org)*

This Association provides a forum for the exchange of information, data, and current research in Safety and Reliability.

#### **International Association for Probabilistic Safety Assessment and Management***(http://www.iapsam.org)*

The main purpose of IAPSAM is to sponsor and oversee the organization of the International Conferences on Probabilistic Safety Assessment and Management (PSAM).

#### **IEEE Reliability Society***(http://www.ieee.org/portal/site/relsoc)*

A division of the important professional organization IEEE. It publishes Transactions on Reliability and other journals; organizes international symposia on reliability, availability, quality, and system safety; and informs on various events, standards, and literature.

#### **American Society for Quality, Reliability Division***(http://asq.org/reliability)*

A section for reliability within the American Society for Quality. It informs on various activities, conferences, literature, journals, newsletters, and courses.

**Safety Engineering and Risk Analysis Division (SERAD)***(https://community.asme.org/safety\_engineering\_risk\_analysis\_division/w/wiki/3574.about.aspx)*

A division of the American Society of Mechanical Engineers (ASME), which stimulates an interest in risk analysis and safety information applied to mechanical engineering.

**Society of Automotive Engineers, Reliability, Maintainability, Supportability, and Logistics Division (G-11)***(http://www.sae.org/standardsdev/aerospace/g11.htm)*

A division of SAE providing an industry/government forum to review RMS technology and its interfaces with logistics support, engineering design and development, maintainability, reliability, and diagnostics, especially for automotive and aerospace industries.

#### **Safety and Reliability Society***(http://www.sars.org.uk)*

An international professional organisation. Among other activities, it publishes a quarterly journal *Safety and Reliability*, with each volume devoted to a certain topic.

#### **Society for Maintenance and Reliability Professionals***(http://www.smrp.org)*

A U.S./international nonprofit professional society aiming at the advancement of reliability and physical asset management industry. It is valuable for practitioners looking to expand their knowledge and skills in maintenance and reliability and build business connections with other professionals.

#### **Society of Reliability Engineers***(http://www.sre.org)*

A professional society. The web site contains info on activities, articles, and references.

In addition to those mentioned above, national organizations for reliability and quality exist in various countries and can be found via the Internet. An example follows.

### **Czech Society for Quality, Reliability Section***(http://www.csq.cz)*

CSQ is an association bringing together individuals and organizations engaged in quality management, reliability, risk and security, environmental management, automotive industry, technical standardization, and others. The special sections organize courses and seminars and publish the corresponding materials (mostly in Czech language).

## **Author details**

Jaroslav Menčík

Address all correspondence to: jaroslav.mencik@upce.cz

Department of Mechanics, Materials and Machine Parts, Jan Perner Transport Faculty, University of Pardubice, Czech Republic

**Chapter 25**

## **Standards Related to Reliability**

Jaroslav Menčík

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/62366

Because of their great importance, reliability issues are covered by many standards, which provide well-tested procedures and design values. The standards are the result of cooperation of many experts from various countries and contribute worldwide to the high level of quality, reliability, and safety of products and services and to the reduction of failures and accidents. Standards also facilitate the resolution of legal disputes related to the compensation of various damages. In some cases, the use of standards is compulsory, but sometimes it is a matter of agreement between the supplier and the customer. However, competition among many manufacturers leads to high emphasis on quality and reliability so that the use of pertinent standards gradually becomes the norm. Generally, standards offer safe, though somewhat conservative, rules and values, and they are updated from time to time to reflect new information or methods.

There are many associations for standardization around the world. The names of many of them as well as the numbers and titles of numerous individual standards can be found, for example, at the website of IHS Global *(http://global.ihs.com)*, where the standards can also be purchased. They can also be found (and bought) via various standardization bodies given below. In this chapter, several institutions will be listed, whose standards for reliability are used internationally. Then, some of these standards will be named as examples.

## **1. IEC – International Electrotechnical Commission** *(http://iec.ch)*

The technical committee TC 56 "Dependability" of this organization for standardization prepares the international standards related to reliability. An overview of all valid IEC standards can be found via the above web site.

## **2. ISO – International Organization for Standardization** *(http://www.iso.org)*

This organization prepares and approves the standards for quality and many other subjects.


Reliability and quality standards can also be found under BS (British Standards), DIN (Deutsches Institut für Normung), ASME (American Society of Mechanical Engineers), NBS (National Bureau of Standards, USA), JSA (Japanese Standards Association), KSA (Korean Standards Association), GOST (Gosudarstvennye Standarty — Russian state standards), and various others. Standards for civil engineering constructions can be found, for example, under EN (Euronorms, i.e. European standards or Eurocodes; see *http://www.eurocodes-online.com*).

Standards for military applications, but not only for them, can be found via *http://www.dstan.mod.uk* (United Kingdom defence standards), *http://nso.nato.int/nso* (NATO Standardization Office), and *http://www.defense.gov* (DoD, the U.S. Department of Defense) or via websites such as *http://quicksearch.dla.mil*, *https://assist.dla.mil*, or *http://everyspec.com*. Access to some of them is restricted to authorized people.

A selection of some international standards related to reliability follows.

IEC/ISO 31010. Risk management. Risk assessment techniques.

IEC 60050-191. International electrotechnical vocabulary. Part 191: Dependability and quality of service.

IEC 60050-192. International electrotechnical vocabulary. Part 192: Dependability.

IEC 60300-1. Dependability management. Part 1: Guidance for management and application.

IEC 60300-3-1. Dependability management. Part 3-1: Application guide: Analysis techniques for dependability — Guide on methodology.

IEC 60300-3-2. Dependability management. Part 3-2: Application guide: Collection of dependability data from the field.

IEC 60300-3-3. Dependability management. Part 3-3: Application guide: Life cycle costing.

IEC 60300-3-4. Dependability management. Part 3-4: Application guide: Guide to the specification of dependability requirements.

IEC 60300-3-5. Dependability management. Part 3-5: Application guide: Reliability test conditions and statistical test principles.

IEC 60300-3-6. Dependability management. Part 3: Application guide Section 6: Software aspects of dependability.

IEC 60300-3-10. Dependability management. Part 3-10: Application guide: Maintainability.

IEC 60300-3-11. Dependability management. Part 3-11: Application guide: Reliability centred maintenance.

IEC 60300-3-12. Dependability management. Part 3-12: Application guide: Integrated logistic support.

IEC 60300-3-14. Dependability management. Part 3-14: Application guide: Maintenance and maintenance support.

IEC 60300-3-15. Dependability management. Part 3-15: Application guide: Engineering of system dependability.

IEC 60300-3-16. Dependability management. Part 3-16: Application guide: Guidelines for specification of maintenance support services.

IEC 60319. Presentation and specification of reliability data for electronic components.

IEC 60605-2. Equipment reliability testing. Part 2: Design test cycles.

IEC 60605-4. Equipment reliability testing. Part 4: Statistical procedures for exponential distribution: Point estimates, confidence interval, prediction intervals, and tolerance intervals.

IEC 60605-6. Equipment reliability testing. Part 6: Tests for the validity and estimation of the constant failure rate and constant failure intensity.

IEC 60706-2. Maintainability of equipment. Part 2: Maintainability requirements and studies during the design and development phase.

IEC 60706-3. Maintainability of equipment. Part 3: Verification and collection, analysis, and presentation of data.

IEC 60706-5. Maintainability of equipment. Part 5: Testability and diagnostic testing.

IEC 60812. Analysis techniques for system reliability. Procedure for failure mode and effects analysis (FMEA).

IEC 61014. Programmes for reliability growth.

IEC 61025. Fault tree analysis (FTA).

IEC 61070. Compliance test procedures for steady-state availability.

IEC 61078. Analysis techniques for dependability. Reliability block diagram and Boolean methods.

IEC 61123. Reliability testing. Compliance test plans for success ratio.

IEC 61124. Reliability testing. Compliance tests for constant failure rate and constant failure intensity.

IEC 61160. Design review.

IEC 61163-1. Reliability stress screening. Part 1: Repairable assemblies manufactured in lots.

IEC 61163-2. Reliability stress screening. Part 2: Electronic components.

IEC 61164. Reliability growth — Statistical test and estimation methods.

IEC 61165. Application of Markov techniques.

IEC 61649. Weibull analysis.

IEC 61650. Reliability data analysis techniques: Procedures for comparison of two constant failure rates and two constant failure (event) intensities.

IEC 61703. Mathematical expressions for reliability, availability, maintainability, and maintenance support terms.

IEC 61709. Electric components: Reliability. Reference conditions for failure rates and stress models for conversion.

IEC 61710. Power law model: Goodness-of-fit tests and estimation methods.

IEC 61882. Hazard and operability studies (HAZOP studies): Application guide.

IEC 62198. Managing risk in projects: Application guidelines.

IEC 61713. Software dependability through the software life-cycle processes: Application guide.

ISO 9000. Quality management.

This is a group of standards that addresses various aspects of quality management. It provides guidance and tools for companies and organizations that want to ensure that their products and services consistently meet the customer's requirements and that quality is permanently improved.

The updated version ISO 9001:2008 – Quality Management Systems has several parts: Principles, Vocabulary, Requirements, etc. This standard sets out the criteria for a quality management system and is the only standard in this family that can be certified to. It can be used by any organization regardless of its field of activity. ISO 9001:2008 is implemented by more than 1 million companies and organizations in more than 170 countries.

There are also branch-related groups of standards based on the internationally recognized standard ISO 9001. For example, standards AS/EN 9100, AS/EN 9110, and AS/EN 9120 are devoted to quality management systems with specific requirements for the aviation, space, and defense industries. They are published by the International Aerospace Quality Group (IAQG). In particular, AS/EN 9120 focuses on product safety and reliability and addresses critical product performance, conformity to specifications, and airworthiness.

ISO 2394. General principles on reliability for structures.

ISO 12491. Statistical methods for quality control of building materials and components.

ISO 13822. Bases for design of structures: Assessment of existing structures.

ISO/IEC Guide 51. Safety aspects: Guidelines for their inclusion in standards.

ISO/IEC Guide 73. Risk management: Vocabulary.

CAN/CSA – Q634-91. Risk Analysis Requirements and Guidelines.

EN 1990. Eurocode: Basis of structural design.

EN 50126. Railway applications. The Specification and Demonstration of Reliability, Availability, Maintainability and Safety (RAMS). This is a group of standards of CENELEC (European Committee for Electrotechnical Standardization) for reliability and safety in the rail industry in Europe and other countries. It has several parts: EN 50126-1: Part 1: Generic RAMS process, EN 50126-2: Part 2: Systems approach to safety, EN 50126-3: Guide to application of EN 50126-1 for rolling stock RAM, EN 50126-4: Functional safety: Electrical/electronic/programmable electronic systems, and EN 50126-5: Functional safety: Software.

The standards for reliability and quality in the railway industry can also be found under the International Railway Industry Standard (IRIS). Those for the road vehicle industry can be found, for example, under SAE (Society of Automotive Engineers) or VDI (Verein Deutscher Ingenieure).

As said above, the standards are updated from time to time, and some can even lose their validity, be withdrawn, or be replaced by others. Thus, when looking for a certain standard, one must make sure whether it is valid or if it has undergone some changes.

## **3. MIL-HDBK and MIL-STD**

Various volumes of Military Handbook (MIL-HDBK) or Military Standards (MIL-STD), issued by the U.S. Department of Defense (DoD), deal with reliability (among other things). They are recognized worldwide and have become international standards in their own right. Some titles follow. However, the reader must always check whether the pertinent issue is valid. Nevertheless, even "cancelled" issues can contain useful information.

MIL-HDBK-189. Military Handbook: Reliability growth management.

MIL-HDBK-217. Military Handbook: Reliability prediction of electronic equipment. It contains failure rate models for numerous components, usually more conservative than in other standards.

MIL-HDBK-338B. Military Handbook: Electronic reliability design handbook.

MIL-HDBK-344. Military Handbook: Environmental stress screening of electronic equipment.

MIL-HDBK-472. Military Standardization Handbook: Maintainability prediction.

MIL-STD-105E. Military Standard: Sampling procedures and tables for inspection by attributes. This standard was officially cancelled in 1995. A similar topic is dealt with in MIL-STD-1916: DoD preferred methods for acceptance of product.

MIL-HDBK-781D. Military Handbook: Reliability test methods, plans, and environments for engineering development, qualification, and production.

MIL-STD-785B. Military Standard: Reliability program for systems and equipment development and production.

MIL-STD-1629A. Military Standard: Procedures for performing a failure mode, effects, and criticality analysis.

MIL-STD-2074. Military Standard: Failure classification for reliability testing.

MIL-STD-2155. Military Standard: Failure reporting, analysis, and corrective action system (FRACAS).

## **Author details**

Jaroslav Menčík

Address all correspondence to: jaroslav.mencik@upce.cz

Department of Mechanics, Materials and Machine Parts, Jan Perner Transport Faculty, University of Pardubice, Czech Republic

**Chapter 26**

## **Software for Reliability Analysis**

## Jaroslav Menčík

Additional information is available at the end of the chapter

http://dx.doi.org/10.5772/62367

This short overview lists only some of the commercial programs for reliability assessment. They range from simple ones at moderate prices, suitable for a limited range of problems, to program systems for the analysis of complex problems. To gain practice and to solve simple problems, the reader can create their own programs based on universal software such as Excel, although the possibilities of such programs are limited.

### **VaP – Variables Processor** *(http://www.petschacher.at)*

VaP is a simple Monte Carlo simulation program that enables the probabilistic analysis of a user-defined function *G*(*x*) depending on one or more random input variables. Several types of probability distributions can be used. VaP calculates the expected value, standard deviation, skewness, and kurtosis of *G*. It shows the histogram of *G* and calculates the probability that *G* is less than zero. (The program was originally developed for students of civil engineering at ETH, with *G* denoting the reliability margin, cf. Chapter 14, and can also use the first-order reliability method, FORM.) The main results can be saved and printed. Favorable price-performance ratio.
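The kind of calculation described above can be tried with a few lines of general-purpose code. The following is a minimal sketch and not VaP's actual implementation: it assumes a hypothetical reliability margin *G* = *R* - *S* with independent, normally distributed resistance *R* and load effect *S*. The function name `simulate_margin` and all numerical values are illustrative assumptions.

```python
import math
import random


def simulate_margin(n_trials=100_000, seed=1):
    """Monte Carlo estimate of the statistics of G = R - S.

    Assumed (hypothetical) inputs: R ~ N(500, 40), S ~ N(350, 30), independent.
    """
    rng = random.Random(seed)
    # One trial = one random resistance minus one random load effect
    g = [rng.gauss(500.0, 40.0) - rng.gauss(350.0, 30.0) for _ in range(n_trials)]

    mean = sum(g) / n_trials
    # Central moments of the simulated sample
    m2 = sum((x - mean) ** 2 for x in g) / n_trials
    m3 = sum((x - mean) ** 3 for x in g) / n_trials
    m4 = sum((x - mean) ** 4 for x in g) / n_trials
    std = math.sqrt(m2)

    return {
        "mean": mean,
        "std": std,
        "skewness": m3 / std ** 3,
        "kurtosis": m4 / std ** 4,  # equals 3 for a normal distribution
        "p_failure": sum(1 for x in g if x < 0) / n_trials,  # P(G < 0)
    }


if __name__ == "__main__":
    for name, value in simulate_margin().items():
        print(f"{name}: {value:.5g}")
```

For these assumed inputs, *G* is itself normal with mean 150 and standard deviation 50, so the estimated failure probability should lie close to Φ(-3) ≈ 0.00135. The scatter of the estimate decreases as the number of trials grows.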

### **Anthill** *(http://www.sbra-anthill.com)*

Anthill is a computer program for the calculation of reliability and other statistical analyses based on the Monte Carlo method. The user-defined model for analysis can use various mathematical and logical functions and predefined histograms. After the trials with random values of the input quantities have been run, the sampled values are analyzed statistically and the results are displayed. The resulting histograms and statistical parameters can be saved for further postprocessing. Favorable price-performance ratio.

### **Feasible Reliability Engineering Tool (FReET)** *(http://www.freet.cz)*

FReET is a multipurpose probabilistic software for statistical, sensitivity, and reliability analysis of engineering problems, developed at the Brno University of Technology, Institute of Structural Mechanics. It allows the simulation of random uncertainties in various problems, especially in civil and mechanical engineering (material properties, loading, and geometrical imperfections). It uses the following probabilistic techniques: crude Monte Carlo simulation, Latin hypercube sampling, simulated annealing, first-order reliability method (FORM), and others. Favorable price-performance ratio.
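Latin hypercube sampling, mentioned above, differs from crude Monte Carlo in that the probability range of each input variable is divided into equally probable strata and every stratum is sampled exactly once. The sketch below illustrates this idea for variables uniformly distributed on [0, 1); it is an assumption-laden illustration and does not reproduce FReET's algorithm. The function name `latin_hypercube` is hypothetical.

```python
import random


def latin_hypercube(n_samples, n_dims, seed=0):
    """Return n_samples points in [0, 1)^n_dims, one per stratum in every dimension."""
    rng = random.Random(seed)
    columns = []
    for _ in range(n_dims):
        # One random point inside each of the n_samples equal-width strata
        column = [(i + rng.random()) / n_samples for i in range(n_samples)]
        # Random pairing of strata between the individual dimensions
        rng.shuffle(column)
        columns.append(column)
    # Transpose: one tuple of coordinates per sample
    return [tuple(column[i] for column in columns) for i in range(n_samples)]


if __name__ == "__main__":
    for point in latin_hypercube(5, 2):
        print(point)
```

To obtain samples of a variable with another distribution, each coordinate would be passed through the inverse cumulative distribution function of that variable.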

### **Strurel** *(http://www.strurel.com)*

Strurel is a set of programs for the reliability analysis of structures (especially in civil engineering). It consists of three programs: Statrel (reliability-oriented statistical analysis), Comrel (time-invariant and time-variant analysis), and Sysrel (a program for system reliability analysis). It can work with analytical functions and performs reliability analysis using various methods, such as Monte Carlo, FORM, and SORM (first-order and second-order reliability methods), which are used for problems of the load-resistance type.
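For the simplest load-resistance problem, a linear margin *G* = *R* - *S* with independent normal variables, the first-order result can be written in closed form: the reliability index is β = (μ_R - μ_S)/√(σ_R² + σ_S²) and the failure probability is Φ(-β). The following sketch evaluates these two expressions; the function names and the numerical values are illustrative assumptions and have no relation to the internals of Strurel.

```python
import math


def reliability_index(mean_r, std_r, mean_s, std_s):
    """Reliability index for G = R - S with independent normal R and S."""
    return (mean_r - mean_s) / math.hypot(std_r, std_s)


def failure_probability(beta):
    """P(G < 0) = Phi(-beta), written with the complementary error function."""
    return 0.5 * math.erfc(beta / math.sqrt(2.0))


if __name__ == "__main__":
    # Hypothetical resistance and load effect (same units)
    beta = reliability_index(500.0, 40.0, 350.0, 30.0)
    print(f"beta = {beta:.2f}, Pf = {failure_probability(beta):.2e}")
```

With the assumed values, β = 3 and the failure probability is about 1.35 × 10⁻³. For nonlinear margins or non-normal variables, FORM requires an iterative search for the design point, which is not shown here.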

In addition to the above software, which is oriented mostly toward determining the failure probability of a single component or structure, there are also software systems that combine various tools and enable comprehensive reliability analysis of very complex objects, such as aircraft or weapon systems. Four brands are mentioned here.

### **Item Software** *(http://www.itemsoft.com)*

This U.S. software firm offers various products, such as ITEM ToolKit, a suite of several analytical and reliability prediction modules in one integrated environment. The modules include:

MIL-HDBK-217 module. A reliability prediction program based on the internationally recognized method of calculating electronic equipment reliability defined in military handbook MIL-HDBK-217 (published by the U.S. Department of Defense).

IEC 62380 Electronic Reliability Prediction module. It supports reliability prediction methods based on the European reliability prediction standard IEC 62380.

NSWC Mechanical Reliability Prediction module. It uses a series of models for various types of mechanical devices including actuators, springs, bearings, seals, electric motors, pumps, compressors, brakes, and clutches to predict failure rates based on temperature, stresses, flow rates, and various other parameters. The module is based on the Naval Surface Warfare Center Handbook of Reliability Prediction Procedures for Mechanical Equipment.

China 299B Electronic Reliability Prediction module: A reliability prediction program based on the internationally recognized method of calculating electronic equipment reliability provided in the Chinese Military/Commercial Standard GJB/z 299B.

Telcordia Electronic Reliability Prediction module: Based on the Telcordia (Bellcore) TR-332 and SR-332 standards, it calculates the reliability (steady-state failure rate) for various categories of electronic, electrical, and electromechanical components for various quality levels, environmental conditions, electrical stress conditions, and other parameters.

In addition to these modules, the ITEM ToolKit contains several others, for example for failure modes, effects, and criticality analysis (FMECA), fault tree analysis (FTA), construction of reliability block diagrams (RBD), Markov analysis, and maintenance.
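The basic idea behind handbook-based prediction can be illustrated with a simple parts-count calculation: if all parts have constant failure rates and the failure of any part means failure of the equipment (series arrangement), the failure rate of the equipment is the sum of the part failure rates. The sketch below assumes exactly this simplification. The part list and the failure rates are hypothetical and are not taken from any handbook, and the adjustment factors for quality, environment, and stress used in the actual standards are omitted.

```python
import math


def series_failure_rate(parts):
    """Sum quantity * failure rate over all part types (series system, constant rates)."""
    return sum(quantity * rate for quantity, rate in parts.values())


# Hypothetical data: part type -> (quantity, failure rate per 10^6 hours)
PARTS = {
    "resistor": (20, 0.002),
    "capacitor": (10, 0.010),
    "integrated circuit": (4, 0.050),
    "connector": (2, 0.100),
}

if __name__ == "__main__":
    lam = series_failure_rate(PARTS)            # failures per 10^6 hours
    mtbf_hours = 1e6 / lam                      # mean time between failures
    r_one_year = math.exp(-lam * 1e-6 * 8760)   # probability of surviving 8760 h
    print(f"lambda = {lam:.3f} per 1e6 h")
    print(f"MTBF = {mtbf_hours:.3g} h")
    print(f"R(1 year) = {r_one_year:.4f}")
```

Commercial modules add extensive component libraries and the model equations of the individual standards, but the summation over parts remains the core of the calculation.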

### **ReliaSoft** *(http://www.reliasoft.com)*

This U.S. software firm offers a group of programs in one integrated environment, such as Weibull++® for reliability analysis; ALTA® for accelerated life testing data analysis; DOE++® for design of experiments; BlockSim® for the creation of reliability block diagrams based on fault tree analysis; RENO® simulation software for risk and decision analysis; Xfmea® for facilitating data management and reporting for all types of FMEA and FMECA; RCM++® for the support of reliability-centered maintenance; Lambda Predict® for reliability assessment based on standards; RBI® for risk-based inspection analysis for oil, gas, chemical, and power plants in adherence to the guidelines presented in the American Petroleum Institute's publications API RP 580 and RP 581 and in those of the American Society of Mechanical Engineers (ASME); XFRACAS®, a web-based system for incident and failure reporting, analysis, and corrective action; and RGA® for the analysis and support of reliability growth.

### **Isograph** *(http://www.isograph.com)*

Isograph offers various software for reliability analysis, such as:

Availability Workbench: A system for availability simulation and reliability-centered maintenance (RCM). It is used to optimize maintenance and spare parts, predict system availability and throughput, and estimate life-cycle costs. It includes Weibull analysis and life-cycle costing modules as well as modeling methods such as FMECA, reliability block diagram analysis, and fault tree analysis.

Reliability Workbench: An integrated visual environment in which failure rate and maintainability prediction, FMECA, reliability allocation, reliability block diagram, fault tree, event tree, and Markov analysis are combined. Failure rate predictions are calculated from the Telcordia, MIL-HDBK-217, 217 Plus, and IEC TR 62380 standards for electronic equipment and the NSWC-98/LE1 Handbook for mechanical parts. FMECA, reliability block diagram, and fault tree analysis are performed according to well-known standards such as MIL-STD-1629 and IEC 61508.

Hazop+: Software for hazard and operability studies, with a visual environment that uses forms for entering Hazop information. Extensive reporting facilities are available.

More information, including other products, is available at the web site.

### **PTC Windchill**, formerly **Relex Software** *(http://www.ptc.com/product/windchill/quality)*

PTC Windchill Quality Solutions combines quality, reliability, and risk management into an integrated toolset with the following products:

PTC Windchill CAPA: manages corrective and preventive actions and demonstrates compliance.

PTC Windchill Nonconformance: manages, corrects, and tracks internal quality issues.

PTC Windchill FRACAS: identifies and prioritizes failure-related trends.

PTC Windchill FMEA: identifies and mitigates potential failures.

PTC Windchill MSG-3: manages aircraft reliability according to industry standards.

PTC Windchill FTA: utilizes fault tree analysis to investigate safety and reliability issues.

PTC Windchill Prediction: predicts failure rate of components and system reliability.

PTC Windchill RBD: Reliability Block Diagrams manage quality in complex systems.

PTC Windchill Maintainability: predicts maintenance and repairs.

PTC Windchill LCC: Life Cycle Cost software analyzes the lifetime cost of a product.

PTC Windchill Weibull: performs life data (Weibull) analysis on the life data of a product.

PTC Windchill ALT: Accelerated Life Testing software predicts product reliability.

PTC Windchill Markov: Visual analysis software that models complex systems.

PTC Windchill Customer Experience Management: reports and manages quality-related field issues.

## **Author details**

Jaroslav Menčík

Address all correspondence to: jaroslav.mencik@upce.cz

Department of Mechanics, Materials and Machine Parts, Jan Perner Transport Faculty, University of Pardubice, Czech Republic

## *Authored by Jaroslav Mencik*

Our life is strongly influenced by the reliability of the things we use, as well as of processes and services. Failures cause losses in industry and society. Methods for reliability assessment and optimization are thus very important. This book explains the fundamental concepts and tools. It is divided into two parts. Chapters 1 to 10 explain the basic terms and methods for the determination of reliability characteristics, which form the basis of any reliability evaluation. In the second part (Chapters 11 to 23) advanced methods are explained, such as Failure Modes and Effects Analysis and Fault Tree Analysis, the Load-Resistance interference method, the Monte Carlo simulation technique, cost-based reliability optimization, reliability testing, and methods based on the Bayesian approach or fuzzy logic for processing vague information. The book is written in a readable way, and practical examples help the reader understand the topics. It is complemented with references and a list of standards, software, and sources of information on reliability.

Concise Reliability for Engineers


Photo by agsandrew / AdobeStock