**2.2 Groupings by** *k***-means partition**

The *k*-means algorithm is commonly used to partition an *N*-dimensional population into *k* groups based on a sample [52, 53], where *k* corresponds to the number of clusters to be calculated, arbitrarily specified by the researcher. The algorithm classifies objects into *k* clusters so that, within each cluster, the objects are as similar as possible (i.e., the intra-class variance is minimized), while each cluster is as different as possible from the rest [54, 55]. Since the members of each cluster are the most similar to each other, the centre (centroid) of each group is represented by its mean. Briefly, the standard computational procedure is as follows: 1) the researcher specifies an arbitrary number of *k* clusters to be calculated; alternatively, the centroids can be specified directly; 2) if the centroids are not specified, they are obtained randomly for each group; 3) each object is assigned to its closest centroid according to the Euclidean distance; 4) the centroids are updated considering the newly assigned objects; 5) each observation is reviewed with respect to the other clusters to confirm its membership in the respective group. The assignment and update steps are repeated until convergence or until the maximum number of iterations is reached [53] (a minimal sketch of this procedure is given below).

This method offers advantages when the researcher has prior knowledge of the data being analyzed. For example, in taxonomy, the number of *k* clusters can refer to the number of data classes to classify [56, 57] or to the taxa that are known or to be tested [21]. In method validation or optimization analyses, it could correspond to the number of systems or criteria under consideration [58]. An optimal number of *k* clusters can be even more useful when the method is combined with other multivariate analysis techniques; e.g., in hierarchical clustering on principal components with *k*-means partitioning (HCPC), which is explained in the subsequent sections.

If there is not enough information to select a specific number of *k* clusters, the optimal number of partitions can be inferred using the "elbow" method [49, 59, 60]. This method consists of applying the *k*-means algorithm to the data with different numbers of *k* clusters and then plotting the internal variance of the groups, i.e., the total within-cluster sum of squares (WCSS), against the number of groups. The optimal number of *k* clusters is indicated by the point where the slope of the WCSS curve tends to flatten, that is, where adding further clusters yields only a marginal reduction in the within-cluster variance [59, 61, 62] (see the elbow sketch below). Due to the randomness with which the initial centroids are selected, the clusters obtained may vary when the analysis is replicated. A suggested solution is to run the *k*-means algorithm several times and retain the solution with the lowest WCSS [49]. Furthermore, it is suggested to compare different indices and select the optimal number of *k* clusters based on the majority rule (**Figure 5**).
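To make the procedure above concrete, the following is a minimal sketch of the assignment and update steps using NumPy; the function name `kmeans` and its arguments are illustrative and do not correspond to any package cited in this chapter.

```python
import numpy as np

def kmeans(X, k, max_iter=100, tol=1e-6, seed=None):
    """Minimal k-means following steps 1-5 described above (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Step 2: initial centroids drawn at random from the observations
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 3: assign each object to its closest centroid (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: update each centroid as the mean of its newly assigned objects
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster ends up empty
                new_centroids[j] = members.mean(axis=0)
        # Step 5 / stopping rule: stop when the centroids no longer change
        if np.allclose(new_centroids, centroids, atol=tol):
            break
        centroids = new_centroids
    # Total within-cluster sum of squares (WCSS), used later by the elbow method
    wcss = float(((X - centroids[labels]) ** 2).sum())
    return labels, centroids, wcss
```

In practice, a library implementation such as `sklearn.cluster.KMeans` would typically be preferred, as in the elbow example that follows.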
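Likewise, a hedged sketch of the "elbow" method is given below, assuming scikit-learn and matplotlib are available; the data matrix `X` and the range of *k* values tested are placeholders chosen only for illustration. The `n_init` argument restarts *k*-means from several random centroid sets and keeps the run with the lowest WCSS, which addresses the initialization randomness mentioned above.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

X = np.random.default_rng(0).normal(size=(150, 4))  # placeholder data matrix

ks = range(1, 11)
wcss = []
for k in ks:
    # n_init random restarts; the best (lowest-WCSS) solution is kept
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wcss.append(km.inertia_)  # inertia_ is the total within-cluster sum of squares

plt.plot(list(ks), wcss, marker="o")
plt.xlabel("Number of clusters k")
plt.ylabel("Total within-cluster sum of squares (WCSS)")
plt.title("Elbow plot: the optimal k is where the curve flattens")
plt.show()
```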
