In recent decades, digital mammography has emerged as a promising technique that offers the possibility of a second-opinion consultation, or of computer-aided detection (CAD) schemes that assist radiologists in detecting radiological features that could point to different pathologies (Banik et al., 2011; Hupse & Karssemeijer, 2009; Lado et al., 2001).

Nowadays, the utility of CAD systems has already been demonstrated, and several computerized systems dedicated to detection and diagnosis tasks have been approved by the Food and Drug Administration (FDA), such as Second Look (CADx Medical Systems, Inc) (approved in 2002), MammoReader (Intelligent Systems Software, Inc) (approved in 2002), and the Kodak Mammography CAD Engine (Eastman Kodak Company) (approved in 2004).

It is clear that, in order to detect lesions automatically, it could be very useful to learn from the radiologists' experience, as well as to quantify the different image features that clinicians employ to reach their diagnosis. Even though a computer system will never reach the specialists' level of knowledge, its ability to detect and classify abnormalities can be improved by analyzing the differences between the human observer and the computer (Kuprinski & Nishikawa, 1997). It is therefore necessary to understand both the content of the medical image and the process radiologists follow to analyze the information. Given how difficult it is for radiologists to interpret mammograms, mammographic CAD systems are aimed at limited goals, such as the detection and classification of masses and microcalcifications. CAD systems dedicated to detecting abnormalities, not only in the breast but also in other medical fields (Doi, 2007), produce suspicious areas that must be identified as lesions or false detections, so that clinicians are not confused when analyzing the areas detected by the computer. Because of this, a significant stage in nearly all CAD schemes consists in reducing the number of false positives, by applying different algorithms and diverse statistical methods (Lado et al., 2006; Tourassi et al., 2005).

There are several models usually employed by CAD systems in any field to reduce false detections, such as linear discriminant analysis (LDA) (Yoshida et al., 2002), neural networks (Park et al., 2011), or generalized additive models (GAMs) (Lado et al., 2006). However, reducing false positives can be difficult if an inadequate method or algorithm is selected, since this leads to incorrect results: correct detections are rejected while false positives are kept. Because of this, researchers should pay close attention to the false positive reduction step.

One of the most important aspects to consider when diagnostic imaging systems are analyzed is the evaluation of their diagnostic performance. Receiver operating characteristic (ROC) curves are the method usually selected for this task, since they indicate the trade-off between sensitivity and specificity available from a diagnostic system, describing the inherent discrimination capability of these systems (Metz, 1986).

The method of ROC curves can be generalized to the diagnostic performance of both human observers and CAD systems. In fact, a large number of automated systems dedicated to the early detection and diagnosis of cancer are evaluated employing ROC methodology, not only in the field of breast lesions (Obuchowski, 2005), but also in nearly any type of cancer or disease (Keotan et al., 2002; Li et al., 2005).

In previous works (Lado et al., 2006; 2008) GAMs were applied to the reduction of false positives in CAD systems dedicated to the detection of microcalcifications. In the first work (Lado et al., 2006), the main goal was to overcome the limitations imposed by LDA in the type …

#### **2. Generalized additive models**

In this work, we are interested in predicting the presence or absence of a lesion, using a regression model for a binary response. Explicitly, let *Y* be a binary (0/1) response variable, and **X** = (*X*1, ..., *Xq*) the *q*-vector of associated continuous covariates. In this framework, denoting *p*(**X**) = *p*(*Y* = 1|**X**), the logistic generalized linear model (GLM) (McCullagh & Nelder, 1989) takes the form:

$$p(\mathbf{X}) = p(Y = 1 \mid \mathbf{X}) = \frac{\exp\left(a_0 + a_1 \cdot X_1 + \dots + a_q \cdot X_q\right)}{1 + \exp\left(a_0 + a_1 \cdot X_1 + \dots + a_q \cdot X_q\right)}\tag{1}$$

where (*a*0, *a*1, ..., *aq*) is a vector of coefficients. In some instances, GLMs can be very restrictive, since they assume linearity in the covariates. This constraint can be avoided by replacing the linear index *η* = *a*0 + *a*1 · *X*1 + ... + *aq* · *Xq* with a non-parametric structure. Accordingly, here we shall concentrate on the generalized additive model (GAM) (Hastie & Tibshirani, 1990), which generalizes the GLM by introducing one-dimensional, non-parametric functions in place of the linear components. Specifically, GAMs express the conditional mean as

$$p(\mathbf{X}) = p(Y = 1 \mid \mathbf{X}) = \frac{\exp\left(a + f_1(X_1) + \dots + f_q(X_q)\right)}{1 + \exp\left(a + f_1(X_1) + \dots + f_q(X_q)\right)}\tag{2}$$

where *a* is a constant and *fj* is the unknown smooth partial function, or effect curve, associated with each continuous covariate *Xj*. Note that identification is guaranteed by introducing the constant *a* into the model and requiring the partial functions to have zero mean. The GAM is widely used as an extension of the traditional GLM (McCullagh & Nelder, 1989), especially when continuous covariates are present. The GAM is more flexible than the GLM, since the researcher does not assume a parametric form for the effects of the continuous covariates, but only that these effects can be represented by arbitrary unknown smooth functions. GAMs are also easy to interpret, because the additive components describe the influence of each covariate separately. Several contributions to GAMs can be found in the literature. Hastie and Tibshirani discussed various approaches using smoothing splines (Hastie & Tibshirani, 1990). Wood introduced a numerical procedure based on regression splines (Wood, 2006). Nowadays, standard software is available to fit this model, such as the mgcv package in R.
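The difference between models (1) and (2) lies only in how the predictor inside the logistic function is built. The following minimal Python sketch illustrates this; the function names and the toy partial functions are assumptions made for illustration and are not part of the original study, which relies on fitted smooth functions rather than fixed ones.

```python
import math


def logistic(eta):
    """Return exp(eta) / (1 + exp(eta)), computed in a numerically stable way."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def glm_probability(x, a0, coefs):
    """Model (1): linear index a0 + a1*x1 + ... + aq*xq."""
    eta = a0 + sum(a * xj for a, xj in zip(coefs, x))
    return logistic(eta)


def gam_probability(x, a, partial_functions):
    """Model (2): additive index a + f1(x1) + ... + fq(xq)."""
    eta = a + sum(f(xj) for f, xj in zip(partial_functions, x))
    return logistic(eta)


# Hypothetical example with q = 2 covariates.
x = [1.0, 0.0]
p_glm = glm_probability(x, a0=0.0, coefs=[0.5, -0.5])
# Toy effect curves (assumed, not estimated from data).
p_gam = gam_probability(x, a=0.0, partial_functions=[lambda v: v ** 2 - 1.0, math.sin])
```

In practice the partial functions are estimated from the data (for example with smoothing or regression splines), and the zero-mean constraint mentioned above is imposed during fitting; the sketch only shows how a prediction is formed once they are known.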

A generalization of the "pure" GAM in (2) is the GAM with "factor-by-curve" interactions. In this type of model, the relationship between *Y* and each of the continuous covariates *Xj* may vary among the subsets defined by the levels 1, ..., *L* of a categorical covariate *Z* (called a factor). Explicitly, in the factor-by-curve logistic GAM the effect of each covariate *Xj* can be expressed as

$$f_j(Z, x) = \begin{cases} f_j^1(x) & \text{if } Z = 1 \\ \vdots \\ f_j^L(x) & \text{if } Z = L \end{cases}$$

In this way, the effect of each continuous covariate *Xj* is decomposed into the effects *f*<sup>*l*</sup><sub>*j*</sub> associated with each level *l* (1, ..., *L*) of the factor *Z*.
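A factor-by-curve effect can be pictured as a lookup: the level of the factor selects which effect curve is applied to the covariate. The short Python sketch below shows this idea under stated assumptions; the helper name `factor_by_curve`, the level labels and the two toy curves are hypothetical and are not taken from the study.

```python
def factor_by_curve(level_curves):
    """Build f_j(Z, x) from a mapping {level: curve}, one curve per factor level."""
    def f_j(z, x):
        if z not in level_curves:
            raise KeyError(f"unknown factor level: {z!r}")
        return level_curves[z](x)
    return f_j


# Hypothetical two-level factor with one toy curve per level.
f_1 = factor_by_curve({
    1: lambda x: 0.5 * x,   # effect curve for level 1 (assumed shape)
    2: lambda x: -0.25 * x,  # effect curve for level 2 (assumed shape)
})

effect_level_1 = f_1(1, 2.0)
effect_level_2 = f_1(2, 2.0)
```

The additive index of the factor-by-curve model is then the constant plus the sum of these level-specific effects over all covariates.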

#### **3. Database and CAD system**

#### **3.1 Mammogram selection**

The mammogram database consisted of 174 mammograms containing 77 biopsy-proven clusters of microcalcifications, with no more than one cluster per mammogram.

Fig. 1. Region of a mammogram containing a cluster of microcalcifications, delimited by the white square, and a zoomed window containing some microcalcifications

The cases were randomly selected from the mammographic screening program that has been under way since 1992 in the Community of Galicia (Spain). This program is integrated into the European Network of Reference Centers for Breast Cancer Screening.

The average radiation dose was 1.26 mGy for the craniocaudal projections and 1.49 mGy for the mediolateral oblique projections. The radiological classification criteria followed the guidelines of the Breast Imaging Reporting and Data System (BI-RADS), which establishes the following groups: a) category 0: need for additional imaging evaluation; b) category 1: negative; c) category 2: benign finding, noncancerous; d) category 3: probably benign finding, short-interval follow-up suggested; e) category 4: suspicious abnormality, biopsy considered; f) category 5: highly suggestive of malignancy, appropriate action needed.

All the images were digitized at a resolution of 2000x2500 pixels and 4096 gray levels employing a Lumiscan 85 laser scanner (Lumysis Inc., Sunnyvale, CA).

Two experienced radiologists, by consensus, categorized the mammograms into two groups according to breast tissue, resulting in 118 dense mammograms, with the remaining 56 classified as fatty mammograms. These two experts also marked the location of each cluster of microcalcifications in the digital images, and the marks were stored in truth data files as x and y coordinates. These truth data files were used to compare the experimental results, obtained with the computerized system for detecting microcalcifications, with the true position of the clusters. Figure 1 shows a region of a mammogram containing clustered microcalcifications.

#### **3.2 CAD system**


To detect the clusters of microcalcifications, a CAD system was developed, which has been extensively described elsewhere (Lado et al., 2001). Briefly, the method is a five-step process (Figure 2): a) detection of the breast border, employing a tracking algorithm that computes the gradient of gray levels; b) application of the wavelet transform to enhance microcalcifications, by dividing each mammogram into vertical lines and applying the wavelet transform to each line. After a local threshold is applied to the wavelet image, a binary image containing the possible seed (origin) points of microcalcifications is obtained; c) gray level thresholding to extract the possible microcalcifications, and application of a contrast-size test and morphologic operators, including a region growing algorithm "to grow" the microcalcifications from the corresponding seed points; d) a clustering procedure to group the microcalcifications, following the criterion given by Kopans (Kopans, 1989), which considers a cluster of microcalcifications to be five or more signals within a region of 1 cm² of area; and e) reduction of false positives, employing different techniques.

Fig. 2. Scheme of the CAD system for detecting microcalcifications

#### **4. Feature extraction and GAM study**

When the CAD system described above was applied to the complete dataset of fatty and dense mammograms, 72 true positives (TPs) were detected, but the system also yielded 740 false positives (FPs). This corresponds to a sensitivity of 93.5% (72/77) at a false positive rate of 4.25 FPs/image (740/174).

Although the sensitivity achieved by the automated system is very promising, it is important to keep the number of false detections low, so as not to confuse the radiologist by flagging normal areas as suspicious, and to reduce the number of biopsies to be performed. Our system produced a high number of false detections per image (4.25). Because of this, an FP reduction step becomes necessary.

To reduce false detections, various features of the detected clusters (true and false positives) were extracted:

1. avglbreast (X1): average gray level value of the breast image containing the detected region, ranging from 0 to 4095 (mean value of 2765.08±275.97).
2. avglROI (X2): average gray level value of the region of interest (ROI) containing the detected region, ranging from 0 to 4095 (mean value of 2976.70±359.61).
3. avglcluster (X3): average gray level value of the pixels belonging to the detected microcalcifications, ranging from 0 to 4095 (mean value of 3080.32±340.91).
4. avgldist (X4): average distance among the detected microcalcifications in each cluster, measured in pixels (mean value of 74.69±34.63).
5. dimx and dimy (X5 and X6): x and y dimensions of the ROI containing the detected cluster (mean values of 85.25±70.76 and 82.91±69.48, respectively).
6. size (X7): size of each detected cluster, in pixels (mean value of 10356.02±24558.54).
7. size/avgldist (X8): relationship between the size and the mean distance among microcalcifications for a cluster (mean value of 99.26±133.51).
8. size/avglcluster (X9): relationship between the size and the distribution of gray level values of a cluster (mean value of 3.34±7.56).
9. dif1 (X10): difference between the average gray level value of the cluster, avglcluster, and the average gray level value of the ROI, avglROI (mean value of 103.63±73.16).
10. dif2 (X11): difference between the average gray level value of the cluster, avglcluster, and the average gray level value of the breast image, avglbreast (mean value of 315.25±300.00).
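The last four covariates in the list (X8 to X11) are derived from the base measurements. The Python sketch below shows one way they could be computed for a single detected cluster. It assumes that the "relationship" between two quantities is their ratio, as the names size/avgldist and size/avglcluster suggest; the function name and the example values are hypothetical.

```python
def derived_features(size, avgldist, avglcluster, avglroi, avglbreast):
    """Return the derived covariates X8..X11 for one detected cluster.

    Assumption: 'relationship between' two quantities is read as their ratio.
    """
    if avgldist <= 0 or avglcluster <= 0:
        raise ValueError("avgldist and avglcluster must be positive")
    return {
        "size/avgldist": size / avgldist,        # X8
        "size/avglcluster": size / avglcluster,  # X9
        "dif1": avglcluster - avglroi,           # X10: cluster vs. ROI gray level
        "dif2": avglcluster - avglbreast,        # X11: cluster vs. breast gray level
    }


# Hypothetical cluster: 1200 pixels, gray levels on the 0..4095 scale.
example = derived_features(
    size=1200, avgldist=60.0, avglcluster=3000.0, avglroi=2900.0, avglbreast=2700.0
)
```

Under this reading, the reported means are mutually consistent: the mean of avglcluster minus the mean of avglROI is 3080.32 − 2976.70 = 103.62, which matches the mean of dif1 up to rounding.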


The response of the model was the binary variable true (0/1): a value of 0 indicates that the detected cluster is a false positive. If true equals 1, the detected cluster corresponds to a real cluster, and it is a correct detection.

The analysis of these feature values was performed employing GAMs, with the breast tissue (*BT*) added to the model as the factor, corresponding either to dense tissue (*d*) or to fatty tissue (*f*), as previously classified by the radiologists. Explicitly, we considered the following GAM:

$$p(\mathit{BT}, \mathbf{X}) = p(\text{true} = 1 \mid \mathit{BT}, \mathbf{X}) = \frac{\exp\left(a + f_1(\mathit{BT}, X_1) + \dots + f_{11}(\mathit{BT}, X_{11})\right)}{1 + \exp\left(a + f_1(\mathit{BT}, X_1) + \dots + f_{11}(\mathit{BT}, X_{11})\right)}\tag{3}$$

with *fj*(*BT*, *x*) = *f*<sup>*d*</sup><sub>*j*</sub>(*x*) for dense tissue and *fj*(*BT*, *x*) = *f*<sup>*f*</sup><sub>*j*</sub>(*x*) for fatty tissue.

As stated before, the present work was an attempt to improve the sensitivity of our computerized system by testing the discriminatory capability of different subsets of covariates. A question that tends to arise in regression models of type (3) is how to determine the best subset or subsets of *q* (*q* < 11) predictors, which will establish the model or models with the best discrimination capacity. As a general rule, as more variables are added to the model, the "apparent" fit to the observed data improves. However, the resulting estimates are not always satisfactory, for two reasons. On the one hand, including irrelevant variables increases the variance of the estimates, thereby reducing the predictive capacity of the model; on the other, including many variables makes the model difficult to interpret.

To choose the model, we used an automatic forward stepwise selection procedure. This procedure selects the model containing the subset of *q* variables that provides the best discrimination capacity, and eliminates the remainder from the model, according to an optimality criterion based on the ROC curve. The area under the ROC curve (AUC) is one of the most widely used criteria for comparing the performance of a series of binary response regression models. The ROC curve relies on false/true-positive/negative tests, where sensitivity is the proportion of event responses that were predicted to be events and specificity is the proportion of non-event responses that were predicted to be non-events. The plot of sensitivity (i.e., hit rate) versus 1 − specificity (i.e., false alarm rate) is the ROC curve; the area under this curve measures the accuracy of the detection system and does not require any assumptions concerning either the shape or form of the underlying signal and noise distributions (Saveland & Neuenschwander, 1990). This statistic is a threshold-independent measure of model discrimination, where 0.5 suggests no discrimination, 0.7-0.8 suggests acceptable discrimination, and 0.8-0.9 suggests excellent discrimination (Hosmer & Lemeshow, 2000).
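The two ingredients described above, the AUC and a forward stepwise search driven by it, can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the procedure used in the study: the AUC is computed with the rank-based (Mann-Whitney) formulation, the stopping rule (stop when the AUC no longer improves) is an assumption, and `score_subset` is a hypothetical callback that would fit a model on a covariate subset and return its AUC.

```python
def auc(scores, labels):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def forward_select(candidates, score_subset, min_gain=0.0):
    """Greedy forward selection: add the covariate that raises the score most.

    Stops when the best addition improves the score by no more than min_gain
    (an assumed stopping rule). Starts from 0.5, the AUC of no discrimination.
    """
    selected, best = [], 0.5
    remaining = list(candidates)
    while remaining:
        scored = [(score_subset(selected + [c]), c) for c in remaining]
        top_score, top = max(scored, key=lambda t: t[0])
        if top_score - best <= min_gain:
            break
        selected.append(top)
        remaining.remove(top)
        best = top_score
    return selected, best


# Toy scoring function standing in for "fit the model, return its AUC".
toy_gain = {"X3": 0.25, "X7": 0.125, "X1": 0.0}
chosen, chosen_auc = forward_select(
    ["X1", "X3", "X7"], lambda subset: 0.5 + sum(toy_gain[c] for c in subset)
)
```

With real data, `score_subset` would fit the GLM or GAM on the training cases and evaluate the estimated probabilities with `auc`, so that covariates are ranked by the discrimination they add.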

To obtain the corresponding AUCs for the different covariate subsets, the models (3) were trained on half of the outputs of the detection scheme, which comprised 36 true positives and 370 false detections, corresponding to 90 mammograms. The cases employed for training were randomly extracted from the total number of outputs of the CAD scheme. The models were finally tested on the other half of the cases: 36 true positives and 370 false positives (84 mammograms) that had not been used at the initial training stage. The performance of the developed GAMs and GLMs, with the different feature values, was analyzed employing ROC analysis, taking as the decision variable the estimated probabilities obtained with the models.

Fig. 3. Possible subset models. For each subset size (x axis: number of covariates; y axis: AUC), the AUC is shown for a) GLM, b) GAM and c) GLM vs. GAM.

Once the correlations had been calculated, subset selection over the whole set of covariates was performed, and the corresponding AUCs were obtained for each model. Figure 3 presents the values for all the possible subset model combinations, employing GLMs and GAMs. This figure can be interpreted as follows: the *x* axis gives the number of covariates included in the statistical model, while the *y* axis represents the AUC obtained with a model employing that number of covariates. For example, for a number of covariates equal to 1, a vertical line of points corresponding to different AUC values is shown, the first of them with a value close to 0.60 in both the GLM and the GAM, and the last greater than 0.80 in both models. Each of these points corresponds to the AUC obtained with a single covariate between *X*1 and *X*11. The intermediate values are the AUCs yielded by the models constructed employing the rest of the covariates.

To obtain the corresponding AUCs for various and different covariate subsets, the models (3) were trained on half of the outputs of the detection scheme, randomly extracted from the total number of outputs of the CAD system. The models were finally tested on the other half of the cases that had not been used at the initial training stage. The performances of the developed GAMs and GLMs, with the different feature values, were analyzed employing ROC analysis, and considering as the decision variable the estimated probabilities obtained with the models.
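The training and test halves described above can be sketched as a random split of the CAD outputs. The equal class counts in both halves (36 true positives and 370 false positives each) suggest, although the text does not state it, that the split was balanced by class; the Python sketch below makes that assumption explicit. The function name, the record layout and the seed are hypothetical.

```python
import random


def split_half_by_class(detections, seed=0):
    """Randomly split detections into two halves, separately within each class.

    Each detection is a dict with a binary 'true' field (1 = real cluster).
    """
    rng = random.Random(seed)
    train, test = [], []
    for label in (0, 1):
        group = [d for d in detections if d["true"] == label]
        rng.shuffle(group)
        half = len(group) // 2
        train.extend(group[:half])
        test.extend(group[half:])
    return train, test


# Synthetic stand-in for the CAD outputs: 72 true positives, 740 false positives.
detections = [{"id": i, "true": 1} for i in range(72)]
detections += [{"id": 72 + i, "true": 0} for i in range(740)]
train, test = split_half_by_class(detections)
```

Fixing the seed makes the split reproducible, which matters when AUCs from different covariate subsets are compared on the same test half.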

### **5. Results and discussion**

This research work aimed to study how the different features extracted from true and false positive clusters of microcalcifications behave in the presence of categorical covariates and factors that can influence and even condition their behaviour. The main goal is, in this way, to discriminate between true clusters and false detections.

The interactions among the different variables were considered in the study. Prior to the selection of variables, the correlation among the different covariates was calculated (Table 1).


Table 1. Matrix correlations(×100) between covariates

A high correlation can be observed for several features, particularly between X2 and X3, or among X9, X10 and X11. Surely, this is due to the fact that these variables can be very similar. For example, X2 and X3 represent gray level values for the cluster of microcalcifications and the ROI containing it, and both regions may nearly contain the same pixel values, this resulting in a high similarity between them. However, there are other features with a low correlation, for example the properties based on gray level values and the properties based on either distances or sizes.

8 Will-be-set-by-IN-TECH

positives and 370 false detections, corresponding to 90 mammograms. The cases employed for training the technique were randomly extracted from the total number of outputs of the CAD scheme. The models were finally tested on the other half of the cases: 36 true positives and 370 false positives (84 mammograms) that had not been used at the initial training stage. The performances of the developed GAMs and GLMs, with the different feature values, were analyzed employing ROC analysis, and considering as the decision variable the estimated


| | X1 | X2 | X3 | X4 | X5 | X6 | X7 | X8 | X9 | X10 | X11 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| X1 | 100 | 56 | 54 | -1 | -3 | -4 | -21 | -30 | -1 | -4 | -4 |
| X2 | 56 | 100 | 98 | -2 | 0 | -2 | -35 | 60 | 2 | 1 | -3 |
| X3 | 54 | 98 | 100 | 1 | 4 | 4 | -16 | 64 | 6 | 6 | 2 |
| X4 | -1 | -2 | 1 | 100 | 74 | 75 | 19 | 3 | 65 | 64 | 66 |
| X5 | -3 | 0 | 4 | 74 | 100 | 67 | 20 | 7 | 82 | 88 | 83 |
| X6 | -4 | -2 | 4 | 75 | 67 | 100 | 25 | 8 | 83 | 86 | 84 |
| X7 | -21 | -35 | -16 | 19 | 20 | 25 | 100 | 1 | 22 | 25 | 23 |
| X8 | -30 | 60 | 64 | 3 | 7 | 8 | 1 | 100 | 9 | 10 | 6 |
| X9 | -1 | 2 | 6 | 65 | 82 | 83 | 22 | 9 | 100 | 93 | 100 |
| X10 | -4 | 1 | 6 | 64 | 88 | 86 | 25 | 10 | 93 | 100 | 94 |
| X11 | -4 | -3 | 2 | 66 | 83 | 84 | 23 | 6 | 100 | 94 | 100 |

Table 1. Correlation matrix (×100) between covariates

Fig. 3. Possible subset model. For each subset size, the AUC is shown for a) GLM, b) GAM and c) GLM vs. GAM.

Once the correlations had been calculated, subset selection was performed over the whole set of covariates, and the corresponding AUCs were obtained for each model. Figure (3) presents the values for all the possible subset combinations, employing GLMs and GAMs. The figure can be interpreted as follows: the *x* axis gives the number of covariates included in the statistical model, while the *y* axis represents the AUC obtained with a model employing that number of covariates. For example, for a number of covariates equal to 1, a vertical line of points is shown, each point being the AUC of a model built with a single, different covariate between X1 and X11. The lowest of these values is close to 0.60 in both the GLM and the GAM, and the highest is greater than 0.80 in both models; the intermediate values are the AUCs yielded by the models built with each of the remaining covariates.

| *q* | fatty: variables | fatty: AUC (×100) | dense: variables | dense: AUC (×100) |
|---|---|---|---|---|
| 1 | 10 | 79.20 | 8 | 73.60 |
| 2 | 3 | 86.40 | 7 | 74.50 |
| 3 | 11 | 86.70 | 1 | 74.90 |
| 4 | 4 | 87.00 | 2 | 74.90 |
| 5 | 2 | 86.40 | 3 | 74.90 |
| 6 | 5 | 86.40 | 4 | 74.90 |
| 7 | 6 | 85.80 | 11 | 72.30 |
| 8 | 7 | 85.70 | 5 | 72.10 |
| 9 | 8 | 85.70 | 6 | 69.90 |
| 10 | 1 | 85.60 | 9 | 73.60 |
| 11 | 9 | 85.60 | 10 | 74.90 |

Table 3. AUC values for the different GLMs and number of covariates in each subset, for both fatty and dense breast tissue

| *q* | fatty: variables | fatty: AUC (×100) | dense: variables | dense: AUC (×100) |
|---|---|---|---|---|
| 1 | 10 | 79.20 | 1 | 79.00 |
| 2 | 3 | 86.10 | 11 | 84.40 |
| 3 | 5 | 86.90 | 5 | 85.10 |
| 4 | 11 | 85.30 | 6 | 86.80 |
| 5 | 1 | 81.80 | 7 | 86.60 |
| 6 | 9 | 87.90 | 3 | 87.00 |
| 7 | 2 | 81.20 | 9 | 87.70 |
| 8 | 4 | 80.10 | 10 | 87.00 |
| 9 | 6 | 69.10 | 2 | 66.70 |
| 10 | 7 | 87.00 | 4 | 64.10 |
| 11 | 8 | 62.60 | 8 | 70.50 |

Table 4. AUC values for the different GAMs and number of covariates in each subset, for both fatty and dense breast tissue

From Figure (3), it can be observed that, if only one covariate is considered, the best AUC obtained in the GLM is lower than in the rest of cases; however, the AUC is very similar when selecting 2 or more covariates.

When applying the GAM the situation is different, and the best AUC values are obtained when three covariates are considered. Finally, the comparison of the best models for both GLM and GAM indicates that GAM performs better when up to 8 covariates are considered. If this number increases, the best results are provided by GLMs. Table (2) lists the AUC values obtained for both types of models with different numbers of covariates.

From Table (2), it can be observed that, for both the linear and the additive models, if only one variable is considered in the model, the best results are obtained for X10, that is, for the difference between the average gray level value of the cluster, avglcluster, and the average gray level value of the ROI, avglROI.


| *q* | GLM: variables | GLM: AUC (×100) | GAM: variables | GAM: AUC (×100) |
|---|---|---|---|---|
| 1 | 10 | 82.10 | 10 | 82.10 |
| 2 | 8 | 85.10 | 1 | 87.80 |
| 3 | 4 | 86.10 | 11 | 89.80 |
| 4 | 2 | 86.30 | 8 | 89.30 |
| 5 | 3 | 86.30 | 9 | 87.90 |
| 6 | 1 | 86.20 | 3 | 86.60 |
| 7 | 5 | 86.20 | 4 | 86.60 |
| 8 | 6 | 83.00 | 2 | 82.80 |
| 9 | 7 | 85.80 | 5 | 79.90 |
| 10 | 9 | 85.80 | 6 | 78.90 |
| 11 | 11 | 85.40 | 7 | 78.30 |

Table 2. AUC values for the different models and number of covariates in each subset.

As an increasing number of covariates were included in both models, greater AUC values were obtained; in particular, better results were achieved with GAMs when up to 8 covariates were included in the model, and with GLMs when 9 or more variables were considered. In the previous models, no distinction was made regarding the type of tissue, and a single analysis was performed for both GLM and GAM.

However, to study how the previous models behave for each type of breast tissue, separate analyses were performed for clusters of microcalcifications embedded in fatty and in dense tissue. Different subset combinations were again obtained for both GLMs and GAMs, and the corresponding AUCs were calculated (Tables (3) and (4)). From these tables, it can be observed that the best AUCs are obtained for fatty tissue, while those for dense parenchyma are always lower in the corresponding models. This is consistent with the fact that, for fatty tissue, the contrast value, that is, the difference between the microcalcification and the background surrounding it, is greater than for dense tissue. Thus, detection is improved even for radiologists. The dense parenchyma always presents greater difficulties to perform a correct diagnosis.

Apart from this, it can also be seen that, when only one covariate is included in the model, the best results are again obtained with X10; however, when more variables are present, the selection is not the same for GLMs and GAMs, and it does not match the selection performed when the tissue type was not separately considered. Moreover, for fatty tissue GLM performs better, while, on the contrary, for dense breasts the optimal results are obtained when employing GAMs to select covariates.

[Fig. 5 panel legends. Global analysis: GLM (AUC = 0.863), GAM (AUC = 0.898). Fatty: GLM (AUC = 0.870), GAM (AUC = 0.879). Dense: GLM (AUC = 0.749), GAM (AUC = 0.877). Axes: sen (sensitivity) against 1−esp (1 − specificity), both from 0.0 to 1.0.]

Figure (4) shows the AUC values for the best subset model combinations, employing GLMs and GAMs, for both fatty and dense tissue. Figure (5) shows the ROC curves, with their AUCs, for the global analysis and for both fatty and dense tissue, for the best GLM and GAM.

It can be observed that, for GLM, the results obtained for fatty tissue are higher than those obtained for dense tissue. However, when employing the GAM, the differences are smaller, and fewer covariates have to be included in the model.

Fig. 4. "Optimal" models for both fatty and dense tissue. For each subset size, the AUC is shown for each model.

Fig. 5. ROC curves obtained for the GAM and GLM, for the global analysis, and for both types of breast tissue

Sensitivity and false positive rates were also calculated for the best GLM and the best GAM. For linear models, results yielded a sensitivity of 88.31%, at a false positive rate of 3.7 FPs per image. For the same sensitivity, the false positive rate achieved when reducing false positives with additive models was 2.7, which demonstrates the benefits of this type of model for discrimination tasks employing factors and interactions.
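An operating point such as the one quoted above can be read off the model outputs as in the following sketch, which assumes that a detection is kept when its estimated probability is at or above a threshold. The function name `operating_point` and the toy data are hypothetical, and the source does not describe how the threshold was actually chosen.

```python
def operating_point(scores, labels, n_images, target_sensitivity):
    """Return (threshold, sensitivity, false positives per image).

    The highest threshold whose sensitivity reaches the target is selected,
    which keeps the number of retained false detections as low as possible.
    """
    n_pos = sum(labels)
    if n_pos == 0 or n_images <= 0:
        raise ValueError("need at least one true cluster and one image")
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= thr)
        fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= thr)
        sensitivity = tp / n_pos
        if sensitivity >= target_sensitivity:
            return thr, sensitivity, fp / n_images
    raise ValueError("target sensitivity must not exceed 1")

# Hypothetical toy data: eight detections coming from two images.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
thr, sens, fp_per_image = operating_point(scores, labels, n_images=2,
                                          target_sensitivity=0.75)
```

Comparing two models at the same sensitivity, as done in the text, amounts to calling this function once per model with the same target; the reported change from 3.7 to 2.7 corresponds to a relative reduction of about 27% in false positives.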
