… framework facilitates a holistic approach to identify what kinds of facial treatment should be performed to enhance attractiveness, rather than predicting a beauty score [20–22].

**3. Method**

The method is divided into two main steps, namely, training the feature extractor and training the classification model, as shown in **Figure 1**. The dataset is first used to train the feature extraction model based on a convolutional autoencoder. In this step, the convolutional autoencoder learns to encode each input image into a compact representation and then to reconstruct the input from it. Thus, it learns the features of the input data by minimizing the reconstruction error. We then extract the encoder and use it as the feature extractor.

**Figure 2** illustrates the method of training the feature extractor. In the first step, the model is trained only to learn filters that extract features from which the input can be reconstructed. These filters in the encoder are then extracted and used as the feature extractor for the classification model, which is a fully connected layer.

#### **3.1 Feature extractor**

Autoencoders are a well-known unsupervised learning algorithm whose original purpose is to find lower-dimensional latent representations of datasets, but they are also capable of solving other problems, such as image denoising, enhancement, or colorization. The main idea behind autoencoders is to reduce the input to a latent space of lower dimension and then try to reconstruct the input from this representation; thus, the autoencoder uses its input as the reference output during learning. The two parts are called the encoder and the decoder, respectively. By reducing the number of variables that represent the data, we force the model to keep only the meaningful information from which the input is reconstructable. It can also be viewed as a compression technique, as shown in **Figure 2**.
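As a toy illustration of this idea (our own sketch, not the chapter's code), a linear autoencoder trained by gradient descent learns to compress 4-D points that lie on a 2-D subspace into a 2-D code, and its reconstruction error shrinks as training proceeds:

```python
import numpy as np

rng = np.random.default_rng(1)
Z = rng.normal(size=(100, 2))            # hidden 2-D "ground truth"
A = rng.normal(size=(2, 4))
X = Z @ A                                # 4-D data lying on a 2-D subspace

W = rng.normal(scale=0.1, size=(4, 2))   # encoder weights
U = rng.normal(scale=0.1, size=(2, 4))   # decoder weights
lr = 0.01

def loss():
    # mean squared reconstruction error over all samples
    return np.mean(np.sum((X @ W @ U - X) ** 2, axis=1))

before = loss()
for _ in range(500):
    R = X @ W @ U - X                        # residuals, shape (100, 4)
    gU = 2.0 * (X @ W).T @ R / len(X)        # gradient w.r.t. decoder
    gW = 2.0 * X.T @ (R @ U.T) / len(X)      # gradient w.r.t. encoder
    W -= lr * gW
    U -= lr * gU
after = loss()
print(before, after)
```

Because the 4-D data really has only 2 degrees of freedom, the 2-D code is enough to reconstruct it almost exactly, which is the "keep only meaningful information" behavior described above.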

A conventional autoencoder is composed of two parts, corresponding to the encoder *fW*(·) and the decoder *gU*(·). It aims to find a code for each input that minimizes the difference between the input, *xi*, and the output, *gU*(*fW*(*xi*)), over all samples [23]:

$$\min\_{W,U} \frac{1}{n} \sum\_{i=1}^{n} \left\| g\_U(f\_W(x\_i)) - x\_i \right\|\_2^2, \tag{1}$$

In the fully connected autoencoder,

$$f\_W(x) = \sigma(Wx) \equiv h, \qquad g\_U(h) = \sigma(Uh) \tag{2}$$

where *x* and *h* are vectors, *W* and *U* are the learned weights, and σ is the activation function. After learning, the embedded vector *h* is a unique representation of the input. In our application, the convolutional autoencoder (CAE) is defined as

$$f\_W(x) = \sigma(x \* W) \equiv h, \qquad g\_U(h) = \sigma(h \* U) \tag{3}$$

where *x* and *h* are matrices or tensors and "\*" is the convolution operator.
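The convolutional encoder/decoder pair of Eq. (3) can be sketched in a few lines. The kernel size, activation, and padding below are our own assumptions, since the chapter does not specify them:

```python
import numpy as np

sigma = np.tanh  # assumed activation; the chapter leaves sigma unspecified

def conv2d(x, k):
    """Naive 'valid' 2-D convolution of image x with kernel k (the "*" in Eq. (3))."""
    kh, kw = k.shape
    H = x.shape[0] - kh + 1
    W = x.shape[1] - kw + 1
    out = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

rng = np.random.default_rng(0)
x = rng.uniform(size=(8, 8))               # toy "image"
W_k = rng.normal(scale=0.1, size=(3, 3))   # encoder kernel W
U_k = rng.normal(scale=0.1, size=(3, 3))   # decoder kernel U

h = sigma(conv2d(x, W_k))                  # f_W(x) = sigma(x * W)
x_hat = sigma(conv2d(np.pad(h, 2), U_k))   # g_U(h) = sigma(h * U), padded back to 8x8
print(h.shape, x_hat.shape)                # (6, 6) (8, 8)
```

In practice the code *h* is a feature map rather than a flat vector, and training minimizes the same reconstruction objective as Eq. (1).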


*A Deep Learning-Based Aesthetic Surgery Recommendation System*

We propose a CAE-based feature extraction method that learns generic features of the face. The encoder serves as the feature extractor that encodes the image of a facial part, for example, the eyes, into a vector *h*. **Table 1** shows the model structures used in our experiments. We tried four models, which differ in the number of layers and the dimension of the embedded vector: the autoencoders have either 3 or 4 layers, and the embedded vector has dimension 32 or 64.
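The four variants can be enumerated compactly; the variant names below are our own shorthand, not the chapter's (Table 1 gives the full layer structures):

```python
# The four CAE variants differ only in encoder depth (3 or 4 layers)
# and embedded-vector dimension (32 or 64).
variants = [
    {"name": f"CAE-{layers}L-{dim}d", "layers": layers, "embedding_dim": dim}
    for layers in (3, 4)
    for dim in (32, 64)
]
for v in variants:
    print(v["name"])
```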

After training the autoencoder, it serves as the feature extractor, as shown in **Figure 2**. In this step, we apply transfer learning to transfer the well-learned filters for facial-part feature extraction. We try two different lengths of the embedding vector, 32 and 64, as shown in **Table 1**. We also try both freezing and retraining the extractor.

**Figure 1.**
*Flowchart of the method.*

**Figure 2.**
*Method of training the feature extractor.*

*DOI: http://dx.doi.org/10.5772/intechopen.86411*

#### **3.2 Perfection prediction system**

This model predicts the probability of perfection; the output of the model is a one-dimensional vector in the interval [0, 1] that reflects the probability of perfection. The probability of perfection is defined as follows:

• If the face is perfect (no surgery is needed), the probability is 1.

• If the face is not perfect (surgery needs to be performed), the probability is 0.

However, in a real situation, we cannot obtain a big dataset of perfect/non-perfect faces. Thus, we assume that the outcome of surgery is perfect (value is 1) and the original face is not perfect (value is 0).
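This labeling assumption can be written as a simple rule; the function and file names below are hypothetical, purely for illustration:

```python
# Labeling rule implied by Section 3.2: post-operative photos stand in for
# "perfect" faces (label 1) and pre-operative photos for "not perfect"
# faces (label 0).
def perfection_label(is_post_surgery: bool) -> int:
    return 1 if is_post_surgery else 0

dataset = [("pre_001.jpg", False), ("post_001.jpg", True), ("pre_002.jpg", False)]
labels = [perfection_label(post) for _, post in dataset]
print(labels)  # [0, 1, 0]
```

The design choice here is pragmatic: paired pre-/post-surgery photos are obtainable, whereas ground-truth "perfection" judgments are not, so surgery outcome is used as a proxy label.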


*Advanced Analytics and Artificial Intelligence Applications*
