**3. The proposed method**

This section presents the overall flow of our proposed method, as shown in **Figure 2**. The multilabel data matrix is first converted into a similarity matrix, which we call the multilabel-based Laplacian graph, and this graph is used as the input to the GCN model. Each node in the output layer predicts the probability that a sample belongs to the corresponding label.

#### **3.1 Multilabel-based Laplacian graph**

This section presents the proposed method. Before that, let us describe some notational conventions. Matrices are written in boldface capital letters (e.g., $\mathbf{X}$) and vectors in boldface lowercase letters (e.g., $\mathbf{x}$). The transpose of a matrix is denoted by $\mathbf{X}^{\top}$. For a matrix $\mathbf{X} \in \mathbb{R}^{n \times m}$, the $j$-th column and the $ij$-th entry are denoted by $\mathbf{x}\_j$ and $x\_{ij}$, respectively. $\mathbf{I}$ denotes the identity matrix, $\|\cdot\|\_2$ is the $l\_2$-norm, and $\mathbf{1}$ denotes a column vector with all elements equal to one.

#### **Figure 2.**

*An illustration of the workflow of the proposed method. Green represents the training model; blue represents the test model.*

Based on [32], we formally present our multilabel-based Laplacian graph. For a multilabel dataset, let $\mathbf{X} = [\mathbf{x}\_1, \cdots, \mathbf{x}\_n] \in \mathbb{R}^{n \times m}$ be the data matrix, with $n$ and $m$ representing the number of samples and the number of dimensions, respectively. $\mathbf{S} \in \mathbb{R}^{n \times n}$ is the multilabel-based Laplacian graph, and we use a sparse representation method to construct this graph as follows:

$$\begin{aligned} \min\_{\mathbf{S}} \quad & \sum\_{i,j=1}^{n} \|\mathbf{x}\_{i} - \mathbf{x}\_{j}\|\_{2}^{2} \, s\_{ij} + \beta \sum\_{i=1}^{n} \|\mathbf{s}\_{i}\|\_{2}^{2} \\ \text{s.t.} \quad & s\_{ii} = 0, \; s\_{ij} \ge 0, \; \mathbf{1}^{\top} \mathbf{s}\_{i} = 1. \end{aligned} \tag{1}$$

The constraint $\mathbf{1}^{\top}\mathbf{s}\_i = 1$ normalizes each row of $\mathbf{S}$ and, together with the nonnegativity constraint, acts as a sparse constraint on $\mathbf{S}$; sparse representation is robust to noise [33]. The second term is added to regularize the loss function, and $\beta$ is an adjustable parameter.
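Under this formulation, problem (1) decouples across the rows of $\mathbf{S}$: writing $d\_{ij} = \|\mathbf{x}\_i - \mathbf{x}\_j\|\_2^2$, each row $\mathbf{s}\_i$ is the Euclidean projection of $-\mathbf{d}\_i/(2\beta)$ onto the probability simplex, with the diagonal entry fixed to zero. The following NumPy sketch illustrates one way to construct the graph under this reading; the function names are ours, not from the paper.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex
    {s : s >= 0, sum(s) = 1} (Duchi et al., 2008)."""
    u = np.sort(v)[::-1]                 # sort in descending order
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css)[0][-1]
    theta = css[rho] / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def multilabel_laplacian_graph(X, beta=1.0):
    """Solve problem (1) row by row: s_i = proj_simplex(-d_i / (2*beta)),
    with the diagonal of S fixed to zero (no self-similarity)."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances d_ij = ||x_i - x_j||_2^2.
    sq = np.sum(X**2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    S = np.zeros((n, n))
    for i in range(n):
        idx = np.arange(n) != i          # exclude the diagonal entry s_ii
        S[i, idx] = project_simplex(-D[i, idx] / (2.0 * beta))
    return S
```

Each row of the returned matrix is nonnegative, sums to one, and is typically sparse, since the simplex projection zeroes out entries corresponding to distant samples.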

#### **3.2 Graph convolutional network**

Based on [34], we adapt the GCN, originally used for single-label classification, to multilabel classification. The GCN is a modification of the first-order Chebyshev (ChebNet) approximation [35]. The ChebNet convolution of an input vector $\mathbf{x}$ with a filter $g\_{\theta}$ is formulated as follows:

$$\mathbf{x} \star \mathbf{g}\_{\boldsymbol{\theta}} = \theta\_0 \mathbf{x} - \theta\_1 \mathbf{D}^{-\frac{1}{2}} \mathbf{A} \mathbf{D}^{-\frac{1}{2}} \mathbf{x},\tag{2}$$

where $\star$ denotes the convolution operator, $\mathbf{A}$ is the adjacency matrix, and $\mathbf{D}$ is the degree matrix. By using the single parameter $\theta = \theta\_0 = -\theta\_1$ to avoid overfitting, Eq. (2) can be rewritten as:


$$\mathbf{x} \star \mathbf{g}\_{\boldsymbol{\theta}} = \theta \left( \mathbf{I}\_{\boldsymbol{n}} + \mathbf{D}^{-\frac{1}{2}} \mathbf{A} \mathbf{D}^{-\frac{1}{2}} \right) \mathbf{x}. \tag{3}$$

Repeated use of this graph convolution operation may cause serious problems such as vanishing gradients. Therefore, $\mathbf{I}\_n + \mathbf{D}^{-\frac{1}{2}}\mathbf{A}\mathbf{D}^{-\frac{1}{2}}$ in Eq. (3) is modified to $\tilde{\mathbf{D}}^{-\frac{1}{2}}\tilde{\mathbf{A}}\tilde{\mathbf{D}}^{-\frac{1}{2}}$ with $\tilde{\mathbf{A}} = \mathbf{A} + \mathbf{I}\_n$ and $\tilde{D}\_{ii} = \sum\_j \tilde{A}\_{ij}$, finally giving a layerwise propagation rule that supports multidimensional inputs as follows:

$$\mathbf{H}^{(l+1)} = \sigma \left( \tilde{\mathbf{D}}^{-\frac{1}{2}} \tilde{\mathbf{A}} \tilde{\mathbf{D}}^{-\frac{1}{2}} \mathbf{H}^{(l)} \mathbf{W}^{(l)} \right). \tag{4}$$

Here, $\mathbf{H}^{(l)}$ is the output of the activation function in the $l$-th layer of the GCN, $\mathbf{W}^{(l)}$ is a trainable weight matrix corresponding to the $l$-th layer, and $\mathbf{H}^{(0)}$ is the data matrix. $\sigma(\cdot)$ denotes a specific activation function, such as the sigmoid function.
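To make the propagation rule concrete, the following is a minimal NumPy sketch of a single layer of Eq. (4); the function name and the ReLU default are our choices, not from the paper.

```python
import numpy as np

def gcn_layer(A, H, W, activation=lambda z: np.maximum(z, 0.0)):
    """One layer of Eq. (4): H' = sigma(D~^{-1/2} A~ D~^{-1/2} H W),
    where A~ = A + I adds self-loops and D~ is the degree matrix of A~."""
    A_tilde = A + np.eye(A.shape[0])
    d = A_tilde.sum(axis=1)                     # D~_ii = sum_j A~_ij
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt   # D~^{-1/2} A~ D~^{-1/2}
    return activation(A_hat @ H @ W)            # sigma can also be a sigmoid
```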

This paper considers only a two-layer GCN model. We modify Eq. (4) by replacing the adjacency matrix with the multilabel-based Laplacian graph, obtaining the two-layer GCN method proposed in this paper:

$$\begin{aligned} \mathbf{H}^{(1)} &= \sigma \Big( \hat{\mathbf{D}}^{-\frac{1}{2}} \hat{\mathbf{S}} \hat{\mathbf{D}}^{-\frac{1}{2}} \mathbf{H}^{(0)} \mathbf{W}^{(0)} \Big) \\ \mathbf{H}^{(2)} &= \sigma \Big( \hat{\mathbf{D}}^{-\frac{1}{2}} \hat{\mathbf{S}} \hat{\mathbf{D}}^{-\frac{1}{2}} \mathbf{H}^{(1)} \mathbf{W}^{(1)} \Big), \end{aligned} \tag{5}$$

where $\hat{\mathbf{S}} = \mathbf{S} + \mathbf{I}\_n$ and $\hat{D}\_{ii} = \sum\_j \hat{S}\_{ij}$. For semi-supervised multilabel classification, we evaluate the mean square error over all labeled samples:

$$\text{Mean Square Error} = \frac{1}{t} \sum\_{i=1}^{t} \left( \mathbf{H}\_i^{(2)} - \mathbf{Y}\_i \right)^2,\tag{6}$$

where $\mathbf{Y} \in [0, 1]^{n \times c}$ is the ground truth label matrix with $c$ labels, and $t$ is the number of labeled samples.
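Putting Eqs. (5) and (6) together, a minimal sketch of the forward pass and the loss might look as follows, reusing `gcn_layer` from the sketch above; the sigmoid output layer and the reading of Eq. (6) as a per-sample squared error averaged over the $t$ labeled samples are our assumptions.

```python
import numpy as np

def two_layer_gcn(S, X, W0, W1):
    """Two-layer model of Eq. (5): the multilabel-based Laplacian graph S
    plays the role of the adjacency matrix (gcn_layer adds S + I inside)."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    H1 = gcn_layer(S, X, W0)                       # hidden layer (ReLU)
    H2 = gcn_layer(S, H1, W1, activation=sigmoid)  # per-label probabilities
    return H2

def mse_loss(H2, Y, labeled_idx):
    """Eq. (6): mean square error over the t labeled samples only."""
    diff = H2[labeled_idx] - Y[labeled_idx]        # t x c residual matrix
    return np.mean(np.sum(diff**2, axis=1))        # (1/t) * sum_i ||H_i - Y_i||^2
```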
