readiness is given as green (implying 'good to go') if the threshold is less than 15%, yellow if it is between 15% and 25%, and red if it is greater than 25%. In this example, the software falls in the yellow range, indicating that some caution is needed if the project decides to proceed with the planned delivery.

As illustrated above, residual defects, which are derived from the defect arrival curve using SRGM, play a key role in assessing software quality in terms of delivery readiness. Our recommendation is that readiness be rated green (implying 'good to go') if the threshold is less than 15%, yellow if it is between 15% and 25%, and red if it is greater than 25%. In addition, it is important to track backlog defects at delivery, so as not to deliver known issues. Our recommendation is that all customer critical and major issues be resolved by delivery. We will address how to predict backlog defects in Section 3.2.2.

50 Telecommunication Networks - Trends and Developments

3.2. Enhanced SRGM: early defect prediction model (eDPM)

Typical SRGM techniques require defect data from the software test period. This limits their use during the early phases of software development, during which it is usually necessary to make important (and time-intensive) decisions about the development process, such as the level of staffing, the amount of required testing, or the number of features to focus on. Considering the industry trend towards very short software development lifecycles (i.e., agile development), it is essential to be able to make such decisions accurately very early in the development phase. Specifically, in order to determine the staffing requirements for development and test activities during the early planning phase, many projects now need to understand what the defect find curve will look like during the internal test period. Early software defect prediction is therefore needed for the early identification of software quality, cost overrun, and the optimal development strategy.

We propose a novel method, eDPM, for predicting defect arrival curves based on the feature arrival curve during the planning phase. The feature arrival curve often gives the number of sub-features for each feature of the project, together with the times when each sub-feature is expected to be completed. Such information is usually available during the development planning phase of the software development life cycle. Specifically, eDPM uses data from a previous release of the same product, together with the feature arrival curve for the upcoming release. In order to produce a reliability modeling approach that covers the whole development process, the eDPM approach has been integrated into BRACE as an enhanced SRGM.

3.2.1. Transformation functions

eDPM uses two transformation functions: one horizontal shift and one vertical shift. We performed a statistical correlation study using a quantile-quantile (Q-Q) plot [22] and found a very high correlation. Figure 6 illustrates a Q-Q plot for Project B data. If the data points follow a straight line, this indicates a strong correlation between the two factors being considered; statistically, the distributions of the two factors are the same. That is, both curves are similar in shape.

Figure 6. A sample Q-Q plot for project B release 5 data.
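The straight-line check behind the Q-Q comparison can be sketched as follows. This is an illustrative reconstruction with synthetic data, not the chapter's Project B data: paired quantiles from two samples with similarly shaped distributions fall close to a line, which we summarize with their correlation.

```python
import numpy as np

def qq_points(sample_a, sample_b, n_quantiles=20):
    """Pair matching quantiles of two samples, as plotted in a Q-Q plot."""
    probs = np.linspace(0.05, 0.95, n_quantiles)
    return np.quantile(sample_a, probs), np.quantile(sample_b, probs)

def qq_linearity(sample_a, sample_b):
    """Correlation of the paired quantiles; a value near 1.0 indicates
    the two distributions are similar in shape."""
    qa, qb = qq_points(sample_a, sample_b)
    return float(np.corrcoef(qa, qb)[0, 1])

# Synthetic stand-ins for feature and defect timing data: the same
# distribution family with different scales, so the shapes match.
rng = np.random.default_rng(0)
features = rng.gamma(shape=2.0, scale=3.0, size=200)
defects = rng.gamma(shape=2.0, scale=5.0, size=300)
print(round(qq_linearity(features, defects), 3))
```

A value close to 1.0 corresponds to the near-straight line seen in Figure 6.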

Let (x, y) represent a feature curve and (x_new, y_new) represent a defect arrival curve. We can move the feature curve to the right, closer to the defect arrival curve, with the horizontal shift function in (5):

$$
x_{new} = x + \alpha + \beta \tag{5}
$$

where α and β are parameters intrinsic to an individual release. The parameter α represents the average time to find a defect, and β represents the additional delay in the defect find process, likely due to a test resource constraint or critical bugs. Next, we use a simple form for the vertical shift, as shown in (6).

$$y_{new} = \gamma y \tag{6}$$

The parameter γ is determined as the ratio of the defect count to the feature count along the best-fitted line in the Q-Q plot; it represents the number of defects per feature. Combining (5) and (6), we can transform the feature curve into a prediction of the defect arrival curve. If previous release data is not available, we can use defect data from the initial test period instead. Figure 7 demonstrates that the feature ready curve is a good leading indicator for defect arrival curves, and feature arrival data is readily available for most projects.
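A minimal sketch of the two transformation functions, assuming weekly cumulative counts, the additive horizontal shift of Eq. (5), and the scaling of Eq. (6). The parameter values and feature counts below are hypothetical:

```python
import numpy as np

def transform_feature_curve(t, features, alpha, beta, gamma):
    """Map a cumulative feature arrival curve (t, features) onto a
    predicted cumulative defect arrival curve:
      horizontal shift  t_new = t + alpha + beta   (Eq. 5)
      vertical scaling  y_new = gamma * features   (Eq. 6)
    """
    t_new = np.asarray(t) + alpha + beta
    y_new = gamma * np.asarray(features)
    return t_new, y_new

# Hypothetical weekly cumulative feature counts for one release.
weeks = np.arange(10)
features = np.array([0, 2, 5, 9, 14, 18, 21, 23, 24, 24])

# Assumed parameters: alpha = 3 weeks to find a defect on average,
# beta = 1 extra week of delay, gamma = 4 defects per feature.
t_pred, defects_pred = transform_feature_curve(weeks, features,
                                               alpha=3, beta=1, gamma=4)
print(int(t_pred[0]), int(defects_pred[-1]))  # -> 4 96
```

The predicted curve starts four weeks after feature work begins and tops out at 96 defects, i.e., the 24 features times the assumed 4 defects per feature.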

3.2.2. Case studies

We will now present five case studies to demonstrate the robustness of eDPM for practical use. It should be pointed out that the term "feature" is used here in a generic sense to represent a sub-feature, epic, story, or sprint, depending on the availability of metrics for individual projects. Similarly, the term "release" represents a set of features defined for each software delivery. The release content continues to evolve over the software lifecycle, so it is important to continuously monitor the release content and adjust the transformation functions to improve prediction accuracy. While out of scope for this chapter, we have recently developed an algorithm that automates the estimation of parameters as new feature and defect data becomes available.

Figure 7. eDPM defect arrival curve prediction based on feature arrival curve.

Figure 8. eDPM case study #1: Previous release data.

Case Study #1—Previous release data: This case study considers a project without feature ready data available. We use data from the previous release, called Release N − 1, to predict the defect arrival curve for Release N. Using the transformation functions described in Section 3.2.1 and the test defect data from Release N, we can predict the defect arrival curve, as shown in Figure 8. Actual data is overlaid and compared against the predicted arrival curve. For several weeks (between weeks 10 and 5), the actual data did not follow the predicted curve because a few critical issues slowed down the test progress. Once those issues were cleared with fixes, test progress recovered quickly and defects started to come in as expected.
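In practice, the shift and scale parameters must be estimated from whatever actual data exists when the prediction is made. The chapter does not specify a fitting procedure, so the sketch below assumes a simple least-squares grid search over a combined horizontal shift (α + β) and the scale γ; the curves are synthetic.

```python
import numpy as np

def fit_edpm(t_prev, y_prev, t_act, y_act, shifts, gammas):
    """Grid-search the horizontal shift (alpha + beta combined) and the
    vertical scale gamma that best map a previous-release curve onto
    the actual defect data observed so far (least squares)."""
    best, best_err = None, np.inf
    for s in shifts:
        # Shift the previous-release curve, then sample it at the
        # weeks where actual data exists.
        sampled = np.interp(t_act, np.asarray(t_prev) + s, y_prev)
        for g in gammas:
            err = float(np.sum((g * sampled - y_act) ** 2))
            if err < best_err:
                best, best_err = (s, g), err
    return best

# Synthetic check: actuals generated with shift 4 and gamma 2.0
# should be recovered by the search.
t_prev = np.arange(12)
y_prev = np.minimum(3.0 * t_prev, 30.0)
t_act = np.arange(4, 10)
y_act = 2.0 * np.interp(t_act, t_prev + 4, y_prev)

shift, gamma = fit_edpm(t_prev, y_prev, t_act, y_act,
                        shifts=range(8), gammas=[1.0, 1.5, 2.0, 2.5])
print(shift, gamma)  # -> 4 2.0
```

A grid search keeps the sketch dependency-free; a production fit could use a continuous optimizer instead.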


Software Quality Assurance

53

http://dx.doi.org/10.5772/intechopen.79839



Case Study #2—Test cases executed: This case considers a project without accurate data on feature ready dates, but with a good record of test cases executed prior to handing the features over to the test team. For eDPM purposes, the test case data is considered equivalent to feature ready data. Figure 9 demonstrates near-perfect prediction of the defect arrival curve using test cases executed.

Figure 9. eDPM case study #2: Test cases executed vs. test defects.

Figure 10. eDPM case study #3: Development start vs. feature ready vs. test defects.


Case Study #3—Feature development start data: This case study considers a project with a good record of feature development status. We first use feature ready dates to predict the defect arrival curve; as expected, this gives a good prediction. The next task is to evaluate whether we can also use development start dates to predict the feature ready curve and the defect arrival curve. This is important to help a project plan development and test resources around both feature ready dates and test defects. In this case, we apply the transformation functions in two phases: one for predicting the feature ready curve and the other for predicting the defect arrival curve. As illustrated in Figure 10, the predictions are remarkably accurate in both cases.

Case Study #4—DevOps CI/CD story points completed: This case considers a DevOps continuous integration and delivery (CI/CD) project with a delivery interval of 2 or 3 weeks. The project implements a full Agile development process. Figure 11 shows cumulative views of story points and integration test defects on two different vertical scales. The vertical lines represent individual release dates. A user story is a very low-level definition of a requirement, containing just enough information for the developers to produce a reasonable estimate of the effort to implement it. A story point is a measure of the effort required to implement a story. It is a relative measure of complexity, albeit a subjective one. However, if it is applied in a consistent manner, it can be used as a good leading indicator for predicting defects, as shown in Figure 11.

It was later confirmed that two major process changes were made during the reported period. When eDPM was applied to the data (Transformation #1), we were able to identify a trend change after several months, beyond which the transformation was no longer valid. It turned out that the trend change occurred when a major process change was made. A second set of transformation functions, Transformation #2, was then used; its predicted values match the actual defect data very closely. Several months later, we encountered another trend change, which turned out to be caused by another major process change, and we applied a third set of transformation functions, Transformation #3. With the successive use of eDPM, we demonstrated that defects can be predicted with reasonable accuracy over the entire reported period.
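One way to operationalize the detection of such trend changes is to monitor the relative prediction error and re-fit the transformation functions once it drifts past a tolerance. The sketch below is illustrative only; the threshold and data are hypothetical, not taken from the chapter's projects.

```python
import numpy as np

def trend_change_week(predicted, actual, tol=0.15):
    """Return the first week index where the relative prediction error
    exceeds tol, signalling that the current transformation functions
    are no longer valid and should be re-fitted; None if no drift."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    rel_err = np.abs(predicted - actual) / np.maximum(actual, 1.0)
    drift = np.nonzero(rel_err > tol)[0]
    return int(drift[0]) if drift.size else None

# Hypothetical cumulative defect counts: the last two weeks drift
# away from the prediction after a process change.
predicted = [10, 20, 30, 40, 50, 60]
actual = [10, 21, 29, 41, 62, 80]
print(trend_change_week(predicted, actual))  # -> 4
```

When a drift week is detected, a new transformation (as in Transformations #2 and #3 above) would be fitted from that point on.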

Another benefit of eDPM is that it helps quantify process improvement. One of the parameters, γ, as described in Eq. (6), represents defects per story point in this case study. By comparing the values of γ between two transformation periods, we can calculate the relative change in γ. This relative change represents the percentage improvement due to the process change. Using this approach, we quantified improvements of 10% and 70% for the first and second process changes, respectively.
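The improvement calculation reduces to a relative change in γ between two transformation periods. A minimal sketch, with hypothetical γ values chosen only to illustrate improvements of the same magnitudes as those reported:

```python
def process_improvement(gamma_before, gamma_after):
    """Relative drop in gamma (defects per story point) between two
    transformation periods, expressed as a percentage."""
    return 100.0 * (gamma_before - gamma_after) / gamma_before

# Hypothetical gamma values; a drop from 0.50 to 0.45 defects per
# story point is a 10% improvement, and 0.50 to 0.15 is 70%.
print(round(process_improvement(0.50, 0.45)))  # -> 10
print(round(process_improvement(0.50, 0.15)))  # -> 70
```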

Figure 11. eDPM case study #4: Story points vs. integration test defects.

Case Study #5—Defect closure data: In this case study, we consider a project with both defect arrival and closure data, in addition to sub-feature arrival data. The project is still in an early test phase, but project management wants to know whether the defect backlog can reach zero by the delivery date. First, we predict the defect arrival curve with eDPM, based on the sub-feature arrival data and the actual defect arrivals so far. We then predict the closure curve from the predicted arrival curve, again using eDPM. By combining the predicted arrival and closure curves, we can calculate the defect backlog by subtracting the closure curve from the arrival curve. Figure 12 shows the predicted arrival and closure curves, along with the predicted defect backlog curve. The project can now identify actions to take to bring the backlog curve closer to zero at the delivery date.

Figure 12. eDPM case study #5: sub-feature, defect arrival/closure/backlog.

4. Customer defect prediction

The assumption that the defect curve can be extended from the development phase into the operational phase (e.g., [23–25]) does not hold in practice, as there are usually discontinuities due to changes in the intensity of testing, as well as operational conditions not always being exactly the same as test environments [7].

In Section 3, we demonstrated that the last curve prior to software delivery represents the final product, from which the total number of defects and residual defects can be calculated. Previous release data, or historical data from other projects, will be helpful for determining the percentage of delivered defects to be found during the operation period. See [16] for detailed discussions.

Figure 13. Cumulative view of project B customer defect prediction vs. actual.

To highlight the procedure and results, we will use defect data taken from Project B. Figure 13 shows a cumulative view of customer defect prediction. Note that the curve should always be above the actual data after delivery. The difference between the curve and the last actual data
