**2. The most important problems of Shewhart control chart application**

It was noted in [13] that "There is no friendship between business and the theory of variation – incomprehension is going on and on". This is not just our viewpoint. We have already mentioned similar statements by Hoyer and Ellis [14–16]. A well-known expert in SPC, W. Woodall, published a survey in 2000, "Controversies and Contradictions in Statistical Process Control" [17]. One of the main problems discussed there was the relationship between hypothesis testing and control charting. Woodall's main conclusion on this issue was that it is a simplification to consider control charting as something equivalent to hypothesis testing ([17], *343*)<sup>1</sup>. This approach can be a serious obstacle to the correct application of ShCCs in Phase I of process analysis. We agree. Moreover, this widespread view of the equivalence of these two entities can prevent the correct use of control charts both in Phase I and in Phase II.

<sup>1</sup> Here and everywhere below, the number printed in italics after the comma indicates the page in the reference.


What is even more important, very few practitioners know about this problem and hardly ever reflect on it.

In 2016, D. Steinberg wrote a survey of the state of the art in industrial statistics, in which he mentioned the following problems in SPC: multivariate data, profile data, and data from phasor measurement units [18]. He expressed concern that too many research papers rely on unrealistic assumptions that do not hold in practice.

In 2017, W. Woodall wrote a follow-up to his 2000 survey called "Bridging the gap between theory and practice in basic statistical process monitoring" [19]. This time, he revisited some of the problems he had discussed before and raised some new ones. Among the old problems was, again, the relation between statistical theory and practice. At the end of that paper, Woodall made several useful suggestions that could improve the quality of statistical papers in the area of SPC and the quality of related studies. At the same time, he made a proposal that we consider completely unacceptable: according to Woodall, the use of the moving range chart should be discontinued. We beg to differ with this suggestion and will explain our viewpoint below.

In 2022, W. Woodall published a new paper on SPC issues titled "Recent Critiques of Statistical Process Monitoring Approaches". In the introductory section, he wrote: "Hundreds of flawed papers on statistical process monitoring (SPM) methods have appeared in the literature over the past five to ten years. The presence of so many flawed methods, and so many incorrect theories, reflects badly on the SPM research field. Critiques of the various misguided approaches have been published in the last two years in an effort to stem this tide" [20]. Let us look at the flawed methods listed by Woodall: Use of Inadvisable Weighted Averages, Use of Auxiliary Information, Rules Equivalent to Runs Rules, Neutrosophic Methods, Mixing Various Charts, The Generally Weighted Moving Average Chart, Misuses of the EWMA Statistic, Repetitive Sampling Methods, use of the coefficient of variation, the multivariate coefficient of variation, and various capability indices, etc.

It is noteworthy that the overwhelming majority of practitioners use only simple ShCCs, because many newer types of charts (e.g., CUSUM, EWMA, changepoint, etc.) turn out to be too difficult for engineers, operators, and workers.

We are sure that there are at least two root causes of this sad situation. One, mentioned above, was described in the Hoyer and Ellis papers [14–16]. The other is more fundamental. In a report written in 1996, G. Box noted [21]: "An important issue in the 1930s was whether statistics was to be treated as a branch of Science or Mathematics. Unfortunately, to my mind, the latter view has been adopted in the United States and in many other countries." Statistics has for some time been categorized as one of the mathematical sciences, and this view has dominated "university teaching, research, the awarding of advanced degrees, promotion, tenure of faculty, the distribution of grants by funding agencies and the characteristics of statistical journals". Judging by the above-mentioned papers, nothing has changed since 1996. All the flawed techniques listed by Woodall in [20] are mathematical exercises, or "Statistical Gymnastics," as Ch. Quesenberry caustically put it in [22]. Shewhart's close friend and associate W. Edwards Deming, concluding his foreword to Shewhart's 1939 book, wrote: "Another half-century may pass before the full spectrum of Dr. Shewhart's contributions has been revealed in liberal education, science, and industry" ([23], *ii*). It seems that another half-century may pass before all who are trying to use control charts efficiently have understood the main ideas of Shewhart and Deming.

D. Steinberg, in his paper [18], cited the well-known statistician B. Gunter, who wrote in a 2008 panel discussion in *Technometrics* on the future of industrial statistics: "I fear that *Technometrics* has evolved from primarily making connections to the real, hard, and complex questions of scientific practice to primarily producing artificial formulations of those questions suitable for compact "solution" by mathematical characterization. To understand what is useful and not merely wrong in industrial statistical practice, we need to pay much more attention to the messy details that make up reality". Steinberg did not agree with Gunter's statement that most academic papers have "become completely cut off from real problems". But he agreed "that many of the most challenging and exciting problems arising today are not getting space in our journals and that we need better theory to guide us in attacking such problems" ([18], *52*).

Let us sum up the main idea of all the papers cited above: too many statistical works have drifted far from real practice and do not help practitioners solve their real problems. This directly contradicts the Shewhart–Deming approach and the basic idea of the Shewhart control chart, which "stands out as the only one that actually examines the data for the internal consistency which is a prerequisite for any extrapolation into the future. Thus, unlike all "tests" and "interval estimates" of statistical inference Shewhart's process behavior charts are tools for Analytic Studies. Rather than mathematical modeling, or estimation, Shewhart's charts are concerned with taking appropriate actions in the future based upon an analysis of the data from the past. Out of all the statistical procedures available today, they alone were designed for the inductive inferences of the real world" ([9], *19*). We see the tendency to disregard reality in favor of the world of mathematical models, and to ignore the problems of simple control charts in favor of ever more complex designs. Many books and standards widely used by practitioners all over the world teach the theory of control charts based on very unrealistic assumptions about real processes and their behavior (see [1–5, 7, 12], to name a few). In the following discussion, we examine some of these issues more thoroughly: the various types of assignable causes of variation, examples of unanswered questions in the theory of control charts, and other related topics.

#### **3. What is an assignable cause of variation, and how does it change a process?**

According to Shewhart ([24], *14*), "…in the majority of cases there are unknown causes of variability in the quality of a product which do not belong to a constant system…these causes were called *Assignable"*. He further explains that an assignable cause of variation is one which can be found without excessive waste of time and money. Most important for us, Shewhart points out that it is impossible in principle to establish a criterion for revealing an assignable cause a priori, either by a formal or by a mathematical method.

A famous statistician and quality guru, Dr. Deming wrote in his foreword to Shewhart's book ([23], *ii*): "The great contribution of control charts is to separate variation into two sources: (1) the system itself ('chance causes', Dr. Shewhart called them), the responsibility of management; and (2) assignable causes, called by Deming 'special causes', specific to some ephemeral event …" A process is called statistically controlled, or stable, or predictable if all assignable causes of variation have been removed.

On the other hand, Wheeler and Chambers [6] defined assignable causes of variation as follows: "Uncontrolled variation that is characterized by a pattern of variation that changes over time". In a later paper, Wheeler [25] pointed out that chance (common) causes differ from assignable causes in their impact on a process. So these definitions do not fundamentally contradict each other.


Woodall [17] provided the following definition: "'Common cause' variation is considered to be due to the inherent nature of the process and cannot be altered without changing the process itself. 'Assignable (or special) causes' of variation are unusual shocks or other disruptions to the process, the causes of which can and should be removed".

Finally, Montgomery [10] stated that: "…common causes are sources of variability that are embedded in the system or the process itself, while assignable causes usually arise from an external source."

Thus, there are slightly different views on whether assignable causes of variation are the result of intervention into the system from the outside or not; however, there is full agreement that they are "some ephemeral events that can usually be discovered …and removed". The tool for distinguishing assignable causes of variation from common causes is the control chart, introduced by W. Shewhart in 1924. By 2010, more than 4000 research papers had been published on this topic [26]. An intensive analysis of books and the main reviews in this area [1–10, 25, 27, 28] showed that practically all papers on ShCCs considered a very simple model of chart behavior. Almost all researchers studied the statistical properties of simple ShCCs when some assignable cause of variation changed either the mean, or the standard deviation, or both, of the underlying distribution, whose type remained unchanged (and was almost always normal).

Alternatively, if one looks through the explanation of assignable causes of variation in the very popular SPC manual [29], used in the auto industry for many years, she/he will see a picture (page 30 in [29]) which clearly shows that an assignable cause of variation can lead to an arbitrary change of the distribution function type<sup>2</sup>. Just this was the main idea of the work [30]. The authors studied the case when, after a special cause of variation emerged, the underlying normal distribution function (DF) transformed into either a uniform or a log-normal distribution. It was found that the probabilities of detecting a shift in the mean changed radically compared with the case of a normal DF (see Figures 4–7 in Ref. [30]); a small simulation sketch at the end of this section illustrates the effect. Indeed, as soon as one accepts the possibility of the DF type changing after the impact of an assignable cause of variation, a lot of different possibilities emerge, and only one of them has been investigated and described in the literature. It was proposed in Ref. [30] to introduce two types of assignable causes of variation: those not changing the underlying DF and those changing it. It is worth stressing that this idea can be generalized, as ShCCs do not need any assumptions about the DF type. So, a more general proposal could be the introduction of two types of assignable causes of variation: those not changing the system where the process is going on, and those changing that system. Though more than 10 years have passed since the paper [30] was published, this idea has been neither supported nor refuted by the statistical community. Here, we would like to revisit this notion from a different angle. Let us look at **Figure 1**, taken from our work [31]. One can see in **Figure 1** the ShCC with two red circles and two green ovals on it. The red circles mark the points falling beyond the chart's limits, and the ovals show the points where the process mean jumped. Obviously, both situations emerged due to some special causes of variation. But is there any difference between these two cases: one in which the assignable cause was evanescent and the system did not change, and another in which the assignable cause changed the system? As far as we know, such a question has never been discussed in the SPC literature. Does it deserve to be discussed? We are sure it does.

<sup>2</sup> It is noteworthy that using an appropriate ShCC to analyze the stability of key processes is a mandatory requirement of the automotive standard ISO/TS 16949.

#### **Figure 1.**

*Daily sales of a distribution network. Here 1 centner = 100 kilograms.*

In the first case, the search for the root cause of interference in the process has to be made by the process team (engineers, operators, line managers, etc.), while in the second case this search is an act of top management, since only top managers are responsible for the system as a whole. This brings us back to the idea of different types of assignable causes of variation. Having generalized and slightly modified the definitions from [30, 31], we suggest the following version:

*Definition 1*: An assignable cause of variation of type I (*Intrinsic*) does not change the system within which a process works (e.g., it does not change the type of the underlying DF). As a result, it is quite natural to consider such assignable causes as belonging to the system (though this is not a necessary condition).

*Definition 2*: An assignable cause of variation of type X (*eXtrinsic*) changes the system within which a process works (e.g., it changes the parameters, the type, or both, of the underlying DF). As a result, it is quite natural to consider such an assignable cause as, most probably, not belonging to the system (though this, too, is not a necessary condition).

If the statistical community agrees with our suggestions, then distinguishing between these types of assignable causes of variation will help practitioners grasp who has to intervene in the process in the first place. This knowledge is essential for improving the process successfully. Instability due to assignable causes of type I requires searching for a root cause within the system; instability due to assignable causes of type X requires searching for a root cause outside the system.
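To make the distinction more tangible, here is a minimal simulation sketch in Python (our own illustration, not a reproduction of the settings in [30]): the same one-sigma shift of the mean is applied to a process whose DF stays normal (a type I cause) and to processes whose DF type has changed to uniform or log-normal while keeping the same mean and sigma (type X causes). All parameter values below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

# In-control reference: standard normal, so the three-sigma limits are simply +/-3.
UCL, LCL = 3.0, -3.0
N = 1_000_000          # number of simulated individual values
SHIFT = 1.0            # shift of the mean, in units of the in-control sigma

def beyond_limits(x):
    """Fraction of points falling outside the three-sigma limits."""
    return np.mean((x > UCL) | (x < LCL))

# Type I cause: the mean shifts, the DF type stays normal.
normal_shifted = rng.normal(SHIFT, 1.0, N)

# Type X cause: same mean and sigma, but the DF has become uniform
# (mean SHIFT, sigma 1  ->  half-width sqrt(3)).
uniform_shifted = rng.uniform(SHIFT - np.sqrt(3), SHIFT + np.sqrt(3), N)

# Type X cause: same mean and sigma, but the DF has become a rescaled log-normal.
ln = rng.lognormal(0.0, 0.8, N)
lognormal_shifted = (ln - ln.mean()) / ln.std() + SHIFT

for name, x in [("normal, mean shifted", normal_shifted),
                ("uniform, same mean/sigma", uniform_shifted),
                ("log-normal, same mean/sigma", lognormal_shifted)]:
    print(f"{name:27s} P(point beyond limits) = {beyond_limits(x):.4f}")
```

Although all three disturbed processes have the same mean and standard deviation, the chance of a point signaling beyond the limits differs dramatically: it drops to zero for the uniform case and grows for the heavy-tailed one, which is in line with the effect reported in [30].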

#### **4. The importance of time order of process values**

As stressed by Shper and Adler [32], the problem of data nonrandomness has been underestimated in recent years, though it was of primary importance to Shewhart. More than once, W. Shewhart returned to this issue, explaining the key role of the order of points in understanding whether a process is stable or not. On page 12 of [23], he makes clear that all attempts to find a DF that could thoroughly describe a state of statistical control are useless and senseless.


Some statisticians considered the normal law to be such a DF, but these hopes were completely refuted. Further, on page 27, W. Shewhart continues: "…the significance of observed order is independent of the frequency distribution…" and "… are primitive". Shewhart's conclusion about the importance of the order of points was firmly supported by such outstanding gurus as W. Deming (see Deming's foreword in [23]) and G. Box [33].

Let us consider a revealing example of an erroneous conclusion caused by neglecting the role of the order of points in real processes. A well-known expert in SPC, W. Woodall has, ever since his survey of 2000 [17], argued for the elimination of the moving range chart from the arsenal of SPC tools. His arguments were based on the results of simulations of random data in the paper of Rigdon et al. [34], to which Woodall referred in [17]. However, the moving range automatically takes the order of points into account owing to its structure of successive differences. That is why it contains much more important information about the process than the standard deviation (SD), which completely ignores the succession of process points. Therefore, using the average moving range of two (AMR) to estimate the ShCC limits allows anyone to take the order of points within a process into account. If we eliminate the moving range chart, we automatically neglect many patterns within a process.
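A toy numerical illustration of this difference (ours, with made-up numbers): the same set of values in two different time orders yields exactly the same SD, while the average moving range changes several-fold.

```python
import numpy as np

def amr(x):
    """Average moving range of two successive points."""
    return np.mean(np.abs(np.diff(x)))

values = np.array([1, 9, 2, 8, 3, 7, 4, 6, 5], dtype=float)  # oscillating order
sorted_values = np.sort(values)                               # trend-like order

# The standard deviation is blind to the time order of the points ...
print(np.std(values, ddof=1), np.std(sorted_values, ddof=1))  # identical
# ... while the average moving range is not.
print(amr(values), amr(sorted_values))                        # 4.5 vs 1.0
```

Limits computed from the AMR would therefore be several times wider for the oscillating sequence than for the trend-like one, even though the set of values is identical.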

Another famous SPC expert, L. Nelson, also recommended avoiding the use of moving ranges because of problems with their interpretation [34]. But, in fact, he considered the moving ranges necessary for calculating the limits of the x-chart because they are better than SDs. The reason is clear: the moving range measures variation from point to point irrespective of their level, which can vary due to trends, oscillations, patterns, etc. Now, one may ask: how often are patterns present in real processes, and do they influence the outcomes or not?

Without any doubt, patterns are present in all real processes, but sometimes their influence may be small and, consequently, can be ignored. The famous Box adage about models applies here, of course. But, in practice, the level of pattern influence is rarely known beforehand. Shewhart foresaw this many years ago: "… a sequence is called random if it is known to have been produced by a random operation, but is assumed to be nonrandom if occurring in experience and not known to have been given by a random operation" ([23], *17*). This means that whether an observed sequence is or is not random can be verified only in the future and not by any ingenious mathematics. It follows immediately from this remark of Shewhart's that no universal indicator of process nonrandomness exists or can even exist. So, what should we do in such a situation? We need a variety of dissimilar metrics/indices/rules revealing different types of nonrandomness. A number of such indices are well known and widely used. The so-called additional rules for ShCC interpretation are the first that come to mind. These rules are described in practically all books, guides, and standards on SPC (see, e.g., §5.7 in [9]). It is noteworthy that each such rule can reveal only one single case of nonrandomness. In other words, the standard set of rules covers a minuscule part of the potential possibilities. Besides these rules, there is the run test for randomness [35], a useful rule based on the number of runs in the data. Again, it reveals only one type of nonrandomness: a relationship between the numbers of points lying above and below the chart's central line.

A new test of data randomness was proposed in [32]: the ratio of AMR to SD. As noted above, the value of AMR is very sensitive to any patterns of nonrandomness, so this ratio deviates from its standard value as soon as the data show any kind of nonrandomness<sup>3</sup>.

#### **Figure 2.**

*The limits of AMR/SD ratio for k = 10, 20, 30, 40, 50, 100, 150, and 200.*

What is important about this index is that it covers not a single case of nonrandomness but, rather, an unknown set of possibilities. **Figure 2** (taken from [32]) shows the dependence of the confidence limits for AMR/SD on the number of points (*k*) under investigation. If the value of AMR/SD lies above or below the lines in that figure, one may state with 95% confidence that the data are not random. The exact values of the probabilities are given in [32].
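A minimal sketch of how this ratio behaves (our own illustration; the exact 95% limits should be taken from **Figure 2** or [32], not from this code): for i.i.d. normal data the AMR is close to d2·SD, i.e., AMR/SD ≈ 1.128, while a trend pulls the ratio down and a sawtooth oscillation pushes it up.

```python
import numpy as np

rng = np.random.default_rng(42)

def amr_over_sd(x):
    """AMR/SD ratio: average moving range of two divided by the sample SD."""
    return np.mean(np.abs(np.diff(x))) / np.std(x, ddof=1)

k = 50
random_data   = rng.normal(10, 1, k)                               # i.i.d. noise
trending_data = np.linspace(0, 10, k) + rng.normal(0, 1, k)        # slow trend
sawtooth_data = 10 + np.where(np.arange(k) % 2, 1.0, -1.0) + rng.normal(0, 0.2, k)

print("random   :", round(amr_over_sd(random_data), 2))    # near d2 = 1.13
print("trend    :", round(amr_over_sd(trending_data), 2))  # well below 1.13
print("sawtooth :", round(amr_over_sd(sawtooth_data), 2))  # well above 1.13
```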

The question was raised in Shper and Adler [32]: why do current studies of ShCCs ignore the order of points? Our explanation is as follows. This neglect is caused by a lack of understanding of the difference between analytic and enumerative studies. As Deming explained many times (see, e.g., [37]), an enumerative study deals with units taken from some population that has definite values of mean, SD, DF, etc. An analytic study deals with a process that goes on and on; it does not have a definite mean or DF because it changes permanently. Many statistical methods, such as confidence intervals and hypothesis testing, can be applied to enumerative studies but are inapplicable to analytic ones. Deming's conclusion in Ref. [37] was quite unambiguous: "Statistical theory for analytic problems has not yet been developed" [37].

#### **5. The impact of non-normality on the control limits of ShCCs**

It is well known that the traditional theory of ShCCs is based on (1) the assumption of data randomness and (2) the assumption that the data are normally distributed. We discussed the first assumption in the previous section. Below, we consider the second.

It is assumed here that the reader is somewhat familiar with the construction and use of ShCCs. According to the generally accepted view, each ShCC has limits separating the zone of system variability from the area where assignable causes dwell. These limits can be calculated with very simple formulas based on the three-sigma rule suggested by Shewhart [24].

<sup>3</sup> The inverse ratio, SD to (AMR/*d*2), was called a stability index in [36] (*d*2 is one of the constants used to construct ShCCs). We think that this name is incorrect, but this is a topic for another paper.


A typical formula for the control limits (CL) of many popular control charts for variables looks like [38]:

$$\text{CL} = \text{Average} \pm \text{Scaling Factor} \times \text{Some Measure of Dispersion} \tag{1}$$

The scaling factors in (1) are frequently called control chart constants and are usually denoted by capital letters with indices, for example, *A*2, *D*3, *D*4, and *E*2 (to name a few of the most widely used). They, in turn, depend on the so-called bias correction factors *d*2, *d*3, and *d*4 ([9], *416*):

$$A\_2 = \frac{3}{d\_2 \sqrt{n}}\tag{2}$$

$$D\_3 = 1 - 3\frac{d\_3}{d\_2} \tag{3}$$

$$D\_4 = 1 + 3\frac{d\_3}{d\_2} \tag{4}$$

$$E\_2 = \frac{3}{d\_2} \tag{5}$$

$$E\_5 = \frac{3}{d\_4} \tag{6}$$
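As a small worked illustration of Eqs. (1), (4), and (5), the limits of an *x-mR* chart can be computed as follows (hypothetical toy data; the constants are the standard tabulated values for moving ranges of two):

```python
import numpy as np

# Standard constants for the individuals (x-mR) chart, moving ranges of n = 2:
d2 = 1.128           # bias correction factor d2 for n = 2
E2 = 3 / d2          # ~2.66, Eq. (5)
D4 = 3.267           # = 1 + 3*d3/d2 with d3 = 0.853, Eq. (4); D3 = 0 for n = 2

x = np.array([10.2, 9.8, 10.5, 10.1, 9.6, 10.4, 9.9, 10.3, 10.0, 9.7])  # toy data
amr = np.mean(np.abs(np.diff(x)))   # average moving range

# Eq. (1): CL = Average +/- Scaling Factor * Measure of Dispersion
x_ucl = x.mean() + E2 * amr
x_lcl = x.mean() - E2 * amr
mr_ucl = D4 * amr                   # upper limit of the mR chart

print(f"x-chart : CL = {x.mean():.2f}, UCL = {x_ucl:.2f}, LCL = {x_lcl:.2f}")
print(f"mR-chart: CL = {amr:.2f}, UCL = {mr_ucl:.2f}")
```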

The factors *d*2, *d*3, and *d*4 (and, therefore, the chart coefficients) are treated as constants in the SPC literature most widely used by practitioners (e.g., [6, 8–10]). Statisticians working in the field, however, have long known that non-normality has a great impact on the bias correction factors. On the one hand, this knowledge is based on the works of many outstanding statisticians of the first half of the last century (see references and other details in [39] and, e.g., §7.3 in the excellent book [40]). On the other hand, the bias correction factors change insignificantly in many cases, but not in all possible ones. This issue was carefully studied in our work [39], where such skewed distributions as the exponential, log-normal, Weibull, Burr, and Pareto were simulated and the values of *d*2, *d*3, and *d*4 were estimated. The main results of that paper are as follows.

In the conclusion to his landmark work of 1967 [41], Irving Burr wrote: "… we can use the ordinary normal curve control charts constants unless the population is markedly non-normal. When it is, the tables provide guidance on what constants to use." Unfortunately, Burr did not point out what the words "markedly non-normal population" mean operationally. Moreover, there has been no discussion of this issue up to now. So, we proposed in [39] that a twofold increase in the probability of falling beyond the control limits be considered the condition of significant deviation. Then, after simulation, the results shown in **Tables 1**–**3** were obtained. **Table 1** presents the parameters of the investigated DFs and their notation; *β*1 and *β*2 are the squared skewness and the traditional kurtosis, respectively.



#### **Table 1.**

*Notations and parameters of DFs studied in [39].*


*\*This row gives the generally accepted values of bias correction factors.*

*\*\*δE2, % denotes the relative deviation of E2 from its standard value of 2.66; similarly for δE5, % and δD4, %.*

#### **Table 2.**

*Results for x-mR chart.*

The means of *d*2, *d*3, and *d*4, and the corresponding values of *A*2, *E*2, and *E*5 calculated using Eqs. (2)–(6), together with their relative deviations from the standard values for the *x-mR* chart, are given in **Table 2**. **Table 3** provides similar information for the *X̄-R* chart with subgroup sizes *n* = 2 and 3. The DFs showing less than a twofold relative increase in probability are excluded from **Table 3**. Almost all figures in **Table 2** relate to DFs having more than a twofold increase in the corresponding probabilities.



#### **Table 3.**

*Results for the X̄-R chart.*

It was recommended in [39] that in all cases where one encounters "markedly non-normal" data, he/she should use the algorithm for constructing the ShCC proposed in [39] and the corrected values of the chart constants given there.
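For readers who want to see how such bias correction factors react to non-normality, here is a minimal simulation sketch (our own illustration; it does not reproduce the distributions or the precision of [39]): d2 = E[R]/σ and d3 = SD[R]/σ are estimated for subgroups of size n = 2 drawn from a normal and from an exponential population.

```python
import numpy as np

rng = np.random.default_rng(7)

def estimate_d2_d3(sample, n=2, subgroups=1_000_000):
    """Estimate d2 = E[R]/sigma and d3 = SD[R]/sigma for subgroups of size n."""
    x = sample((subgroups, n))
    r = x.max(axis=1) - x.min(axis=1)   # subgroup ranges
    sigma = x.std()                     # overall sigma of the simulated population
    return r.mean() / sigma, r.std() / sigma

# Normal population: the estimates should be close to the tabulated 1.128 and 0.853.
print("normal     :", estimate_d2_d3(lambda size: rng.normal(0.0, 1.0, size)))

# Exponential population: markedly skewed, and the factors shift noticeably.
print("exponential:", estimate_d2_d3(lambda size: rng.exponential(1.0, size)))
```

For the exponential case, d2 drops toward 1.0 (the mean absolute difference of two independent exponentials equals their σ), so limits built with the normal-theory constants would be noticeably misplaced.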
