[1,] "Barolo" "Barolo" "Barolo" "Barolo" "Barolo" "Barolo"

[2,] "Barbera" "Barbera" "Barbera" "Barbera" "Barbera" "Barbera"

CoClust selects 6 clusters and allocates the two types of wine to each cluster. Thus, across clusters, we can perfectly recognize the two types of Italian wines, and in each cluster we have different wines with different (e.g., independent) chemical characteristics.

Similarly, the other two copula models can be used as in clustC and clustG above. The results do not appear to be affected by the type of model used. However, based on the log-likelihood of the copula fitted on the final clustering, the most appropriate model appears to be the Gumbel model, with a log-likelihood equal to 527.3022 (compared to 500.8835 for the Frank copula and 429.184 for the Clayton copula).

Recent Applications in Data Clustering

6. Conclusion

In this chapter, we describe a copula-based clustering algorithm and its implementation in the R package CoClust. One major advantage of this new package is that it provides an algorithm that is able to cluster multivariate observations by taking into account their underlying complex multivariate dependence structure. Being copula-based, the CoClust algorithm inherits the benefits of the copula. Thus, potentially any type of multivariate dependence structure can be handled, and the most appropriate method can be employed to estimate both a probability model for each cluster/margin and the copula model.

The current version of the R package implements the clustering procedure in the main function CoClust. It enables the user to simultaneously choose the copula model, the estimation method for the margins and for the copula, and the aggregation function for constructing the k-plet of observation allocation candidates. Moreover, the range (or set) of the number of clusters from among which the procedure automatically selects the best one, and the sample size to be used to select it, can be varied.

As with many other software packages, the CoClust package is continually being augmented and improved. We are currently investigating possible graphical solutions for the final clustering and implementing some measures to validate the clustering solution. Another future direction includes expanding the functionality of the CoClust package to allow comparing the solution of other clustering algorithms, such as mixture-based clustering and hierarchical clustering methods.

Acknowledgements

The author acknowledges the support of the Free University of Bozen-Bolzano, Faculty of Economics and Management, via the project "Aggregation functions for Innovation and Data

A. Appendix

The following two functions are useful to evaluate the goodness of the final clustering obtained through the CoClust algorithm when a true clustering or a benchmark clustering is available. The arguments of these two functions are ccfit, which is the CoClust object returned by the corresponding R function; ind.t, which is the true clustering expressed through the clustered index matrix, with clusters by columns and the row indices of the matrix in Eq. (8) by rows; and nmarg, which is the dimension of the copula model, that is, the selected number of clusters. For an example of the use of these two functions, see Section 4.3, "Example 2".

```
R> library("gtools")
```

```
# Matched k-plets as a percentage of length(ind.t)
pca.coclust <- function(ccfit, ind.t, nmarg){
    n.marg <- ccfit@"Number.of.Clusters"
    # All orderings of the cluster columns (permutations() comes from gtools)
    ind.perm <- permutations(n.marg, n.marg)
    n.comb <- nrow(ind.perm)
    pca.k <- NA  # returned unchanged when n.marg differs from nmarg
    if(n.marg == nmarg){
        ind.cc <- ccfit@"Index.Matrix"[, 1:n.marg]
        n.kp <- nrow(ind.cc)
        res <- rep(NA, n.kp)
        for(i in 1:n.kp){
            dum <- ind.cc[i, ]
            res0 <- rep(NA, n.comb)
            for(j in 1:n.comb){
                # Reorder the k-plet and compare it, as a string, with ind.t
                ind.ccs <- dum[ind.perm[j, ]]
                ind.ccs <- paste(ind.ccs, collapse = "-")
                res0[j] <- as.integer(ind.ccs %in% ind.t)
            }
            # The k-plet counts as matched if any ordering is found in ind.t
            res[i] <- any(res0)
        }
        pca.k <- sum(res) / length(ind.t) * 100
    }
    return(pca.k)
}

# Matched k-plets as a percentage of the rows of the CoClust index matrix
pcc.coclust <- function(ccfit, ind.t, nmarg){
    n.marg <- ccfit@"Number.of.Clusters"
    ind.perm <- permutations(n.marg, n.marg)
    n.comb <- nrow(ind.perm)
    pcc.k <- NA  # returned unchanged when n.marg differs from nmarg
    if(n.marg == nmarg){
        ind.cc <- ccfit@"Index.Matrix"[, 1:n.marg]
        n.kp <- nrow(ind.cc)
        res <- rep(NA, n.kp)
        for(i in 1:n.kp){
            dum <- ind.cc[i, ]
            res0 <- rep(NA, n.comb)
            for(j in 1:n.comb){
                ind.ccs <- dum[ind.perm[j, ]]
                ind.ccs <- paste(ind.ccs, collapse = "-")
                res0[j] <- sum(ind.ccs %in% ind.t)
            }
            res[i] <- any(res0)
        }
        pcc.k <- sum(res) / nrow(ccfit@"Index.Matrix") * 100
    }
    return(pcc.k)
}
```
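The two functions above count a k-plet as matched when any reordering of its indices appears in ind.t. The following toy sketch illustrates that matching rule and the two denominators on made-up data. It does not use a CoClust object: the matrix `found`, the vector `truth`, and the helper `key` are hypothetical names introduced only for this illustration, and the sketch assumes that the benchmark k-plets are stored as "-"-joined strings. Instead of enumerating permutations, it sorts the indices of each k-plet, which gives the same order-invariant comparison.

```r
# Toy illustration with made-up indices; no CoClust object is involved.
# Each row of `found` is a k-plet (one observation index per cluster).
found <- matrix(c(1, 5, 9,
                  6, 2, 10,
                  3, 7, 12),
                ncol = 3, byrow = TRUE)

# Hypothetical benchmark k-plets, assumed to be "-"-joined strings.
truth <- c("1-5-9", "2-6-10", "3-7-11", "4-8-12")

# Canonical key: sort the indices so that the column order does not matter.
key <- function(idx) paste(sort(as.integer(idx)), collapse = "-")

found.key <- apply(found, 1, key)
truth.key <- vapply(strsplit(truth, "-", fixed = TRUE), key, character(1))

# TRUE when a found k-plet equals a benchmark k-plet up to reordering.
hit <- found.key %in% truth.key

# Same denominators as pca.k and pcc.k in the functions above.
pca.toy <- sum(hit) / length(truth.key) * 100  # relative to the benchmark
pcc.toy <- sum(hit) / nrow(found) * 100        # relative to the found k-plets
```

Here two of the three found k-plets match a benchmark k-plet, so `pca.toy` is 50 and `pcc.toy` is about 66.7. Sorting avoids the factorial growth of the permutation loop as the number of clusters increases, but it is only a sketch of the comparison, not a replacement for the package's own functions.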

CoClust: An R Package for Copula-Based Cluster Analysis

http://dx.doi.org/10.5772/intechopen.74865
