**1. Introduction**

Statistical process control and monitoring (SPCM) methods originally arose in the context of industrial/manufacturing applications, developed during and after World War II. Since then, they have become a popular way of monitoring processes of all kinds. Today, large volumes of data are available from a variety of sources and environments that need to be monitored, so one must make sense of these data and then make efficient monitoring decisions based on them. When constructing and applying monitoring tools, a fundamental assumption, necessary to justify the end results, concerns the distribution from which the data have been generated. This assumption lies at the heart of outbreak detection, particularly for estimating in-control false discovery rates.

Selecting an appropriate probability distribution for the data is one of the most important and challenging aspects of data analysis. Estimates of outbreak false discovery rates often hinge on this selection. We focus on the distribution of the times between events (TBEs) because there are typically many more individual TBE values than aggregated counts, which makes it easier to fit an appropriate distribution. The most commonly assumed distribution for TBE applications is the Weibull distribution, which is asymmetric and sometimes severely skewed. Depending on the context, however, other distributions may also be used, such as the exponential distribution (a special case of the Weibull distribution) or the gamma distribution. If the TBEs are exponentially distributed, then the related counts are Poisson distributed. This book chapter focuses on counting processes when the distribution of the TBE values is known to be Weibull.
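As a quick illustration of this relationship, the following R sketch (with an arbitrary rate of five events per day and an arbitrary seed) simulates exponential TBEs and checks that the resulting daily counts have mean approximately equal to variance, as expected for Poisson counts.

```r
# Quick simulation check: when TBEs are exponential, daily counts behave like
# Poisson counts (mean approximately equal to variance). The rate of 5 events
# per day and the seed are arbitrary illustrative choices.
set.seed(1)
tbe <- rexp(100000, rate = 5)             # exponential TBEs, in days
cnt <- tabulate(floor(cumsum(tbe)) + 1)   # daily counts
c(mean = mean(cnt), var = var(cnt))       # both close to 5, as for Poisson(5)
```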


**2. Monitoring homogeneous counts**

The challenge of making and meeting the distributional assumption is faced by all practitioners and data analysts. In many monitoring settings, event data are collected in a nearly continuous stream, and it is often more meaningful to monitor the individual TBE data [1–4] when outbreaks are of large magnitude. These individual event data are aggregated over fixed time intervals (e.g., daily) to form counts. In this chapter, these counts are monitored to detect outbreaks resulting from small changes in the incidence of events. The focus is on the steady-state situation because this is the most common situation in event monitoring. Note that we cannot stop the process and investigate the out-of-control situation, because in nonmanufacturing settings the process is often not under our control. Events may include warranty claims for a product, health presentations at emergency departments, sales of online products, etc. Here the term "quality of the process" is used in a general, context-dependent sense. In the case of sales, an outbreak would represent an increased sales opportunity, provided the inventory can support the outbreak and the products are not sold out before the next order arrives. For warranty claims, however, an outbreak would represent an undesirable increase in claims that may require a failure mode and effects analysis [5]. Monitoring of in-control nonhomogeneous counting processes has traditionally been carried out using either the Poisson or the negative binomial distribution for the counts. Many statistical tools used in SPCM for count data are documented in Sparks et al. [1–3], Sparks et al. [6], Weiß [7, 8], Weiß and Testik [9, 10], Yontay et al. [11], and Albarracin et al. [12]. These control charts are perhaps the most well-known count monitoring methodologies. The control chart graphic is a time series plot of a signal-to-noise ratio designed to help the user make decisions about the outbreak of events.

In any case, before defining an "in-control" process, we need information about the probability distribution of the events being monitored. When this information is available, it is possible to calculate probabilities from the in-control event distribution, which could be defined by the TBEs or by the counts of events within a fixed time interval. Deciding whether the observed events constitute an outbreak is based on whether they are extreme compared with what is usual, i.e., whether counts are higher than expected. This is usually gauged by some upper threshold for the signal-to-noise ratios.

In the vast majority of SPCM applications, it is common to assume that the underlying probability distribution is of a (given) known form. In this chapter we assume that the TBEs follow a Weibull distribution. However, we explore approaches to monitoring the counts of these events over fixed periods of varying length to find the period width that leads to the earliest detection of outbreaks in terms of the average time to signal (ATS). The distribution of these counts when the TBEs are Weibull distributed is neither Poisson nor negative binomial. Therefore, this chapter offers a different approach from others in the literature. In addition, the appropriate period of aggregation for the counts is explored in terms of how it influences early detection of outbreak events.

In practice, event data are often collected in a continuous stream defined by the TBEs. Moreover, where outbreaks are of large magnitude, it is more meaningful to monitor these individual TBE values [4]. In such situations, one also needs to deal with the issue of autocorrelation, which may be thought of as the effect of time: data values in close proximity in time and space are likely dependent. This violates one of the basic assumptions in process monitoring, and a common way to deal with this issue is to fit a time series model and monitor the standardized residuals. The assumption of a distribution is also an important part of this analysis. In this chapter, our focus is on the Weibull distributional assumption for the TBE values. However, rather than monitoring the TBEs, we monitor the counts over a fixed time interval, because this improves the early detection of smaller outbreaks (Sparks et al. [4]).
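As a point of reference, that conventional residual-based approach can be sketched in a few lines of R; the vector `counts` of daily counts, the AR(1) model order, and the 3-sigma style limit below are illustrative assumptions rather than part of the method developed in this chapter.

```r
# Minimal sketch of the conventional residual-monitoring approach mentioned
# above. 'counts' (a vector of daily counts), the AR(1) order, and the 3-sigma
# style limit are illustrative assumptions, not the method used in this chapter.
fit   <- arima(counts, order = c(1, 0, 0))   # fit a simple AR(1) model to the counts
z     <- residuals(fit) / sqrt(fit$sigma2)   # standardized one-step residuals
alarm <- which(z > 3)                        # days whose residuals exceed the upper limit
```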


Considering a fixed distribution for the TBEs during the monitoring period, we assume that the time of day and date stamp of every event are available. For a series comprising $n$ events, let the day numbers of the events be denoted by

$$d_1, d_2, \dots, d_n$$

with the event times within each day defined as

$$\tau_1, \tau_2, \dots, \tau_n.$$

Note that these times are measured in fractions of a day, i.e., $0 \le \tau_i < 1$. Daily counts are obtained by counting how many $d_i$ values are the same and then inserting zero counts for any days that are missing from the $d_i$ values. The TBEs are then given by

$$w_1 = d_2 - d_1 + \tau_2 - \tau_1, \; \dots, \; w_{n-1} = d_n - d_{n-1} + \tau_n - \tau_{n-1}$$

where $w_i$ represents the time between the $(i+1)$th and $i$th events. We flag an outbreak whenever the $w_i$, $i = 1, \dots, n-1$, are consistently lower than their expected value. These TBEs are assumed to be Weibull distributed with fixed scale and shape parameters. Using R code, we define the counting process, say for daily counts, as

    # n_events, shape and scale are assumed to be set beforehand
    x <- rweibull(n = n_events, shape = shape, scale = scale)   # simulate n_events TBEs
    counts <- table(floor(cumsum(x)))   # events per whole day (days with zero events are absent here)
    counts <- counts[-length(counts)]   # drop the final, incomplete day
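To make this construction concrete, the following sketch computes the TBEs $w_i$ and the daily counts, including zero-count days, from a small set of made-up day numbers $d_i$ and within-day times $\tau_i$.

```r
# Made-up example: day numbers d_i and within-day times tau_i for six events.
d   <- c(0, 0, 1, 3, 3, 4)
tau <- c(0.10, 0.62, 0.05, 0.40, 0.91, 0.33)

t <- d + tau    # event times measured in days
w <- diff(t)    # TBEs: w_i = d_{i+1} - d_i + tau_{i+1} - tau_i

# Daily counts c_i, with zero-count days (here day 2) included explicitly
daily_counts <- tabulate(d + 1, nbins = max(d) + 1)
```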

Denote these counts by $c_i$ for the $i$th day. We define the exponentially weighted moving average (EWMA) statistic for these daily homogeneous counts as follows:

$$e_i = \max\left(0, \, \alpha c_i + (1 - \alpha)\, e_{i-1}\right) \tag{1}$$

where $e_0 = E(c_i)$, which is assumed constant while the process is in control, for days $i = 1, 2, \dots$. The smoothing parameter $\alpha$ ($0 < \alpha < 1$) determines how much memory of past observations is retained in $e_i$. Smaller values of $\alpha$ retain more memory of past counts than larger values do; therefore, smaller values of $\alpha$ are more efficient at detecting smaller changes in mean counts, while larger values of $\alpha$ are more efficient at detecting larger changes. The EWMA statistic $e_i$ has a minimum value of zero, and we do not allow it to fall below zero because we are only interested in outbreaks, i.e., counts that are higher than expected. This EWMA statistic controls the worst-case scenario, whereas the traditional EWMA statistic can wander well below zero before going out of control, causing it to take a long time to signal the outbreak. An outbreak is flagged when $e_i$ exceeds a predetermined threshold $h_c(ARL_0, \text{scale}, \text{shape}, \alpha)$.

For a given pair of shape and scale parameters, $h_c$ is determined so that a desired in-control average run length ($ARL_0$) is achieved. Appendix A provides models for establishing the thresholds for $0.2 \le \text{scale} \le 0.46$, $0.6 \le \text{shape} \le 1.4$, $\alpha = 0.1$, and $ARL_0 = 100$, $200$, $300$, or $400$. Given an $h_c$, when the process is out of control we want the ARL to be as low as possible. When monitoring daily counts, the ARL is defined as the number of days before an outbreak is signaled. Later we use the ATS to assess the relative performance of control chart plans, because the ARL may vary according to the aggregation period selected for the counts. Note that a false outbreak is flagged when this EWMA statistic becomes significantly larger than expected even though the process is in control.
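A minimal R sketch of the monitoring recursion in Eq. (1) is given below. The threshold used in the example call is illustrative only; in practice $h_c$ would be obtained from the Appendix A models, or calibrated by simulation, for the desired $ARL_0$.

```r
# Run the lower-bounded EWMA of Eq. (1) over a vector of daily counts and
# return the first day on which it exceeds the threshold hc (NA if it never does).
ewma_signal <- function(counts, alpha, e0, hc) {
  e <- e0                                     # e_0 = in-control mean daily count
  for (i in seq_along(counts)) {
    e <- max(0, alpha * counts[i] + (1 - alpha) * e)
    if (e > hc) return(i)                     # run length (in days) to the signal
  }
  NA                                          # no outbreak flagged in this series
}

# Illustrative call on the simulated daily counts above (hc = 33 is not a calibrated value)
ewma_signal(as.numeric(counts), alpha = 0.1, e0 = mean(counts), hc = 33)
```

Wrapping such a call in a loop over many simulated in-control count series yields a Monte Carlo estimate of the $ARL_0$ attained by a trial threshold, which can then be adjusted until the target value is reached.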

The in-control ATS ($ATS_0$) for the plan monitoring the process with Weibull shape and scale parameters of 0.85 and 0.02, respectively, is set equal to 400. The $ATS_0$ values for the plans associated with the other processes are equal to 300, 200, and 100, respectively. For each process, 100,000 events are employed to estimate the count mean. In addition, 500 events are used as the burn-in period of the simulations. The performance of the devised plans is presented in **Tables 1–4**. The lowest ATS values in the tables below are shown in bold text to make it easier to see which plans are more efficient in certain situations.
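As a rough sketch of this design, the in-control mean daily count for the Table 1 process can be estimated as follows; the seed is arbitrary and the snippet assumes the same `rweibull` simulation as earlier.

```r
# Rough sketch: estimate the in-control mean daily count e_0 for the Table 1
# process (shape = 1.25, scale = 0.035) from 100,000 simulated Weibull TBEs.
set.seed(1)                                           # arbitrary seed
x  <- rweibull(100000, shape = 1.25, scale = 0.035)   # in-control TBEs, in days
dc <- tabulate(floor(cumsum(x)) + 1)                  # daily counts, zero days included
dc <- dc[-length(dc)]                                 # drop the final, incomplete day
e0 <- mean(dc)                                        # estimated in-control mean count
```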


**Table 1** shows the performance results for the plans employed to monitor a counting process where the in-control TBEs are Weibull distributed with scale and shape parameters of 0.035 and 1.25, respectively. The $ATS_0$ (in days) for these plans is set approximately equal to 100. As shown in **Table 1**, for outbreaks of larger magnitude, plans with a larger smoothing parameter are superior; conversely, plans with a smaller smoothing parameter are better at detecting outbreaks of smaller magnitude.

**EWMA counts (*α*), shape = 1.25**

| Scale | 0.04 | 0.06 | 0.08 | 0.10 | 0.125 | 0.15 | 0.175 | 0.20 |
|---|---|---|---|---|---|---|---|---|
| Threshold | 31.835 | 32.162 | 32.524 | 32.807 | 33.207 | 33.470 | 33.803 | 34.108 |
| 0.035 | 100.05 | 101.16 | 100.41 | 101.95 | 100.31 | 100.02 | 100.55 | 101.21 |
| 0.034 | 41.952 | 40.255 | **39.997** | 41.552 | 40.954 | 38.936 | 40.372 | 43.733 |
| 0.033 | 19.116 | 18.757 | **18.275** | 19.098 | 19.564 | 19.106 | 19.538 | 20.302 |
| 0.032 | 13.054 | 11.924 | 11.721 | **11.259** | 11.359 | 10.733 | 11.370 | 11.843 |
| 0.031 | 8.598 | 8.231 | 8.156 | 7.860 | 7.847 | **7.491** | 7.515 | 7.566 |
| 0.030 | 6.774 | 5.937 | 5.943 | 5.749 | 5.628 | **5.393** | 5.508 | 5.437 |
| 0.029 | 5.824 | 4.715 | 4.724 | 4.696 | 4.426 | 4.211 | **4.161** | 4.183 |
| 0.028 | 4.408 | 4.149 | 3.912 | 3.739 | 3.660 | 3.424 | 3.446 | **3.283** |
| 0.027 | 3.699 | 3.358 | 3.386 | 3.025 | 3.002 | 2.886 | 2.832 | **2.788** |

**Table 1.**
*Performance of plans when the in-control TBEs are Weibull distributed with scale = 0.035 and shape = 1.25.*

**EWMA counts (*α*), shape = 1.15**

| Scale | 0.04 | 0.06 | 0.08 | 0.10 | 0.125 | 0.15 | 0.175 | 0.20 |
|---|---|---|---|---|---|---|---|---|
| Threshold | 36.20 | 37.058 | 37.4917 | 37.904 | 38.315 | 38.725 | 39.1025 | 39.515 |
| 0.03 | 205.09 | 200.54 | 201.68 | 201.21 | 200.23 | 200.82 | 200.96 | 200.51 |
| 0.029 | 50.792 | **47.665** | 54.813 | 53.764 | 52.916 | 54.227 | 56.616 | 61.987 |
| 0.028 | 22.027 | 21.034 | 20.747 | **21.067** | 21.624 | 22.292 | 23.361 | 23.265 |
| 0.027 | 12.791 | 11.790 | 11.953 | 11.655 | **11.265** | 11.344 | 11.397 | 12.319 |
| 0.026 | 8.678 | 8.414 | 7.811 | 7.414 | 7.211 | **7.230** | 7.333 | 7.497 |
| 0.025 | 6.718 | 6.163 | 5.718 | 5.508 | 5.444 | 5.243 | **5.159** | 5.162 |
| 0.024 | 5.078 | 4.650 | 4.462 | 4.283 | 4.228 | 3.921 | 3.920 | **3.877** |
| 0.023 | 4.482 | 4.010 | 3.742 | 3.558 | 3.465 | 3.190 | 3.155 | **3.072** |

**Table 2.**
*Performance of plans when the in-control TBEs are Weibull distributed with scale = 0.03 and shape = 1.15.*
