**3. Description measures**

In a multi-label dataset, the number of labels varies from one instance to another. As a result, in some datasets each instance carries only a few labels relative to the total number of labels. This property may influence the performance of the methods and approaches used to solve the multi-label classification problem. Therefore, a statistical analysis of the labels is needed to describe a dataset [7, 8].

#### **3.1 Label cardinality LC**

LC is the average number of labels per instance (Eq. (1)), where $N$ is the number of instances and $|y_i|$ is the number of labels assigned to instance $i$.

$$LC = \frac{1}{N} \sum_{i=1}^{N} |y_i| \tag{1}$$
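As a minimal sketch of Eq. (1), assuming each instance's label set $y_i$ is stored as a Python set (the function name `label_cardinality` and the toy labels are hypothetical, chosen only for illustration), LC is simply the mean size of those sets:

```python
from typing import Hashable, Sequence, Set


def label_cardinality(label_sets: Sequence[Set[Hashable]]) -> float:
    """Eq. (1): average number of labels per instance, (1/N) * sum |y_i|."""
    n = len(label_sets)
    if n == 0:
        raise ValueError("the dataset must contain at least one instance")
    return sum(len(y) for y in label_sets) / n


# Hypothetical toy dataset with N = 4 instances.
Y = [{"sports"}, {"sports", "politics"}, {"music"}, {"politics", "music", "sports"}]
print(label_cardinality(Y))  # (1 + 2 + 1 + 3) / 4 = 1.75
```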

#### **3.2 Label density LD**

LD normalizes the label cardinality by the overall number of labels $Q$: it is the average fraction of the $Q$ labels that are assigned to an instance (Eq. (2)), so that $LD = LC/Q$.

*Classification in Multi-Label Datasets DOI: http://dx.doi.org/10.5772/intechopen.109352*

$$LD = \frac{1}{N} \sum_{i=1}^{N} \frac{|y_i|}{Q} \tag{2}$$
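A sketch of Eq. (2) under the same assumption (label sets stored as Python sets; the name `label_density` and the toy data are hypothetical). $Q$ is passed in explicitly because it counts every label in the label space, including labels that may never occur in a given sample:

```python
from typing import Hashable, Sequence, Set


def label_density(label_sets: Sequence[Set[Hashable]], q: int) -> float:
    """Eq. (2): (1/N) * sum |y_i| / Q, i.e. label cardinality divided by Q."""
    n = len(label_sets)
    if n == 0 or q <= 0:
        raise ValueError("need at least one instance and Q > 0")
    # Summing |y_i| first and dividing once by N * Q is algebraically
    # identical to Eq. (2) and avoids accumulating rounding error.
    return sum(len(y) for y in label_sets) / (n * q)


# Hypothetical toy dataset: N = 4 instances over a label space of Q = 5 labels.
Y = [{"sports"}, {"sports", "politics"}, {"music"}, {"politics", "music", "sports"}]
print(label_density(Y, q=5))  # 7 / 20 = 0.35
```

Multiplying the result by $Q$ recovers the label cardinality of Eq. (1), here $0.35 \times 5 = 1.75$.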

#### **3.3 Distinct label sets DL**

DL counts the number of distinct label sets that occur in the dataset $D$, i.e., how many different label combinations appear across all instances (Eq. (3)).

$$DL = \left|\left\{\, y_i \subseteq Y \;:\; \exists\, \mathbf{x}_i \in \mathbf{X},\ (\mathbf{x}_i, y_i) \in D \,\right\}\right| \tag{3}$$
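A minimal sketch of Eq. (3), again assuming label sets are stored as Python sets (the name `distinct_label_sets` and the toy data are hypothetical). Converting each label set to a `frozenset` makes it hashable and order-insensitive, so collecting them in a set keeps exactly one copy of each label combination:

```python
from typing import FrozenSet, Hashable, Sequence, Set


def distinct_label_sets(label_sets: Sequence[Set[Hashable]]) -> int:
    """Eq. (3): number of different label sets y_i occurring in the dataset."""
    # {"a", "b"} and {"b", "a"} become the same frozenset, so duplicated
    # label combinations collapse to a single entry of the outer set.
    unique: Set[FrozenSet[Hashable]] = {frozenset(y) for y in label_sets}
    return len(unique)


# Hypothetical toy dataset: 5 instances but only 3 different label combinations.
Y = [{"sports"}, {"politics", "sports"}, {"music"}, {"sports", "politics"}, {"sports"}]
print(distinct_label_sets(Y))  # 3
```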
