### *5.1.1 Copy transformation method*

This method [18] creates a single-label dataset from the original multi-label one by replacing each multi-label instance that has |Yi| labels with |Yi| single-label instances. Its variants are dubbed copy-weight, the select family of transformations, and the ignore transformation. Copy-weight associates a weight with each produced instance. In the select family, for each set of created instances only one is kept: select max keeps the instance with the most frequent label, select min keeps the instance with the least frequent label, and select random keeps one instance chosen at random. The ignore transformation simply deletes all multi-label instances.
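These four variants can be sketched in Python as follows; the toy dataset and function names are illustrative, not part of [18]:

```python
from collections import Counter
import random

# Toy multi-label dataset: (features, label set) pairs -- illustrative only.
data = [("x1", {"A", "B"}), ("x2", {"B"}), ("x3", {"A", "B", "C"})]

def copy_transform(data):
    """Copy: replace each instance having |Yi| labels by |Yi| single-label copies."""
    return [(x, y) for x, ys in data for y in ys]

def copy_weight_transform(data):
    """Copy-weight: as copy, but each produced instance carries a weight (here 1/|Yi|)."""
    return [(x, y, 1 / len(ys)) for x, ys in data for y in ys]

def select_transform(data, mode="max"):
    """Select family: keep one label per instance by global label frequency."""
    freq = Counter(y for _, ys in data for y in ys)
    pick = {"max": lambda ys: max(ys, key=freq.__getitem__),
            "min": lambda ys: min(ys, key=freq.__getitem__),
            "random": lambda ys: random.choice(sorted(ys))}[mode]
    return [(x, pick(ys)) for x, ys in data]

def ignore_transform(data):
    """Ignore: delete every multi-label instance."""
    return [(x, next(iter(ys))) for x, ys in data if len(ys) == 1]
```

For example, `select_transform(data, "max")` maps `"x1"` to `"B"`, since B is the most frequent label in the whole toy dataset.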

### *5.1.2 Binary relevance (BR)*

BR [17] is one of the most popular transformation methods. It generates one dataset per label, where each dataset contains all instances but with a single binary class. For each instance of the i-th dataset, if its set of labels contains the i-th label, then its class is positive; otherwise, its class is negative.

For each dataset, a classifier is trained. To classify a new instance, BR returns the union of the labels predicted by the generated classifiers.
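Both steps can be sketched as follows; the function names and toy data are illustrative, and any binary learner can stand behind each per-label classifier:

```python
def br_datasets(data, labels):
    """One binary dataset per label: the class is +1 if the instance's
    label set contains that label, -1 otherwise."""
    return {l: [(x, 1 if l in ys else -1) for x, ys in data] for l in labels}

def br_predict(x, classifiers):
    """Union of the labels whose binary classifier predicts positive."""
    return {l for l, h in classifiers.items() if h(x) == 1}
```

With `data = [("x1", {"A", "B"}), ("x2", {"B"})]`, `br_datasets(data, ["A", "B"])["A"]` is `[("x1", 1), ("x2", -1)]`; one classifier is then trained on each such dataset.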

Although BR is a simple transformation method, it has been strongly criticized for its inability to exploit label dependency information [19].

### *5.1.3 Label power set (LP)*

The LP method [7] considers each set of labels of an instance as a single class. To classify a new instance, LP outputs the most probable class, i.e., a whole label set.

LP takes label dependence into account, but it has two drawbacks. First, the learning step becomes difficult when the number of label sets increases, especially since this number can grow exponentially [20]. Second, the class imbalance problem can appear when some label sets are represented by very few instances in the training dataset [20].
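The transformation itself is a one-liner; counting the resulting classes also makes the imbalance drawback visible. The toy names below are illustrative:

```python
from collections import Counter

def lp_transform(data):
    """Label powerset: each distinct label set becomes one atomic class."""
    return [(x, frozenset(ys)) for x, ys in data]

def labelset_frequencies(data):
    """Class frequencies after the LP transformation; label sets seen only
    once or twice in training are the source of the class imbalance problem."""
    return Counter(frozenset(ys) for _, ys in data)
```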

#### *5.1.4 Random K-labelsets (RAKEL)*

RAKEL [7] generates m label powerset (LP) classifiers. To build each LP classifier, a k-labelset is randomly selected without replacement from Lk, the set of all k-labelsets of L, and the corresponding training dataset is built. Note that the number of iterations m and the labelset size k are user-specified parameters. The training steps are detailed in this algorithm:

```
Input: training dataset D, set of labels L, parameters m and k.
Output: m classifiers Hi and the corresponding k-labelsets Zi
Begin
 1. Construct the set R of all k-labelsets of L
 2. for i:=1 to min(m, |Lk|) do
           2.1. Select randomly a k-labelset Zi from R; R:=R\Zi
           2.2. Construct the corresponding training dataset Di:
                • Di:=Ø
                • For each instance (Xj,Sj) from D do
                      ◦ W:=Sj ∩ Zi
                      ◦ If W = Ø, then W represents the empty class
                      ◦ Di:=Di ∪ {(Xj,W)}
           2.3. Build the LP classifier Hi using Di
 End.
```
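The training loop above can be sketched in Python as follows; `build_lp` stands for any LP learner and is a hypothetical parameter:

```python
import random
from itertools import combinations

def rakel_train(data, labels, m, k, build_lp):
    """Draw min(m, |Lk|) distinct k-labelsets at random and build one LP
    classifier per labelset, with labels projected onto that labelset."""
    R = list(combinations(sorted(labels), k))   # the set of all k-labelsets
    random.shuffle(R)                           # selection without replacement
    models = []
    for Z in R[:min(m, len(R))]:
        Zset = set(Z)
        # D_i: keep the features, intersect each label set S_j with Z_i
        Di = [(x, frozenset(ys & Zset)) for x, ys in data]
        models.append((Zset, build_lp(Di)))
    return models
```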
To classify a new instance, each classifier votes on the labels of its corresponding k-labelset, as illustrated in this algorithm:

```
Input: new instance X, the m k-labelsets Zj, L, the m LP classifiers Hj and the threshold T.
Output: vector of predictions V
Begin
 1. for i:=1 to |L| do sumi:=0; votesi:=0
 2. for j:=1 to m do
     for each label li ∈ Zj do sumi:=sumi + Hj(X,li); votesi:=votesi + 1
 3. for i:=1 to |L| do
     Avgi:=sumi/votesi
     If (Avgi > T), then Vi:=1 else Vi:=0
 End.
```
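The voting scheme above can be sketched as follows; each LP classifier is assumed to return a set of labels, and the names are illustrative:

```python
def rakel_predict(x, models, labels, T=0.5):
    """Average the per-label votes of the LP classifiers and keep the
    labels whose average exceeds the threshold T."""
    sums = {l: 0 for l in labels}
    votes = {l: 0 for l in labels}
    for Z, h in models:
        predicted = h(x)                # label set output by this LP model
        for l in Z:                     # a model votes only on its own labelset
            sums[l] += 1 if l in predicted else 0
            votes[l] += 1
    return {l for l in labels if votes[l] > 0 and sums[l] / votes[l] > T}
```

With two toy models covering labelsets {A, B} and {B, C}, a label such as B is kept only if more than a fraction T of the models that cover it predict it.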
#### *5.1.5 Ranking by pair-wise comparison (RPC)*

RPC [21] produces |L|·(|L|−1)/2 binary datasets from the original dataset, one for each pair of labels (li, lj) with 1 ≤ i < j ≤ |L|. Each dataset contains only the instances that have the label li or lj, but not both, and it is used to train a binary classifier. To classify a new instance, each binary classifier votes for one of its two labels, and the labels are then ranked by the number of votes they receive.
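The pairwise dataset construction and the voting-based ranking can be sketched as follows; the function names are illustrative:

```python
from itertools import combinations
from collections import Counter

def rpc_datasets(data, labels):
    """One binary dataset per label pair (li, lj): keep only the instances
    that carry exactly one of the two labels."""
    return {(li, lj): [(x, li if li in ys else lj)
                       for x, ys in data if (li in ys) != (lj in ys)]
            for li, lj in combinations(sorted(labels), 2)}

def rpc_rank(x, pair_classifiers, labels):
    """Each pairwise classifier votes for one of its two labels; the labels
    are then ranked by the number of votes received."""
    votes = Counter({l: 0 for l in labels})
    for h in pair_classifiers:
        votes[h(x)] += 1
    return [l for l, _ in votes.most_common()]
```

Note that an instance carrying both li and lj is excluded from the (li, lj) dataset, since it gives no preference between the two labels.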

#### *5.1.6 Calibrated label ranking (CLR)*

CLR [22] extends RPC by introducing a virtual label, known as the calibration label, which acts as a breaking point in the ranking: it splits the set of labels into two sets, the relevant labels and the irrelevant labels.
