In the polar coordinate system, a straight line is characterized by the pair (r, θ), where r is the distance from the origin to the line and θ is the angle between the x-axis and the normal of the line. Since the line is perpendicular to its normal, the slope of the line is

$$\tan\alpha = -\frac{\cos\theta}{\sin\theta} \tag{15}$$

where α is the inclination angle of the line. Through some mathematical manipulations, the equation of the straight line becomes

$$\frac{y - r\sin\theta}{x - r\cos\theta} = -\frac{\cos\theta}{\sin\theta} \tag{16}$$

Combining the above two equations yields

$$y\sin\theta - r\sin^2\theta = -x\cos\theta + r\cos^2\theta \tag{17}$$

And the resulting equation can be rewritten as

$$x\cos\theta + y\sin\theta = r \tag{18}$$

By substituting the coordinates x and y of every pixel located in the edges into the above equation, many possible combinations of r and θ are acquired, where the range of r is $0 < r \le \sqrt{\text{width}^2 + \text{height}^2}$ and the range of θ is $-90^\circ < \theta \le 90^\circ$. Therefore, we choose the combinations whose occurrences exceed a given threshold. The chosen combinations correspond to the prominent lines.

The results obtained from the prominent line detection performed on the photos of different compositions are demonstrated in Figure 18, associated with the histograms of detected line orientations, respectively. In each histogram, the scope of 180 angle degrees is uniformly partitioned into 10 bins; that is, each bin contains 18 angle degrees. In the horizontal axis of a histogram, the bins are arranged in order from −90° to 90°.

Figure 18. The detection results of the prominent lines appearing in six photos of different image compositions associated with their respective histograms of detected lines: (a) central; (b) rule of thirds; (c) vertical; (d) horizontal; (e) diagonal; (f) perspective.
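As an illustration, the following is a minimal sketch of the voting procedure described above, assuming a binary edge map stored as a NumPy array; the angular resolution and the handling of negative r values are implementation choices, not prescribed by the text.

```python
import numpy as np

def prominent_lines(edges, threshold, n_theta=180):
    """Vote for (r, theta) pairs satisfying x*cos(theta) + y*sin(theta) = r (Eq. 18)."""
    height, width = edges.shape
    r_max = int(np.ceil(np.sqrt(width ** 2 + height ** 2)))
    # theta sampled over (-90, 90] degrees; r is offset by r_max so it stays indexable
    thetas = np.deg2rad(np.arange(n_theta) - 89.0)
    accumulator = np.zeros((2 * r_max + 1, n_theta), dtype=np.int32)

    ys, xs = np.nonzero(edges)                      # coordinates of the edge pixels
    for x, y in zip(xs, ys):
        rs = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        accumulator[rs + r_max, np.arange(n_theta)] += 1

    # keep only the (r, theta) combinations whose occurrences exceed the threshold
    r_idx, t_idx = np.nonzero(accumulator > threshold)
    return [(int(r) - r_max, float(np.rad2deg(thetas[t]))) for r, t in zip(r_idx, t_idx)]
```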

#### 6. Machine learning

Machine learning can be used to determine whether a photo is favorable or not. Basically, machine learning methods are grouped into two kinds: supervised and unsupervised. In supervised machine learning, each sample photo used for training carries a label that indicates whether it is beautiful, while in unsupervised learning, there is no label. Choosing different machine learning methods leads to distinct functionalities and results.

#### 6.1. Comparison of machine learning methods

Among the unsupervised methods, the K-means algorithm is useful for clustering the n-dimensional feature vectors extracted from photos. After performing the K-means algorithm and examining each cluster, the groups can be divided into three possible types that stand for the favorable, unfavorable, and ambiguous classes; the former two clusters are then reserved. When a new photo is input, the distances between its feature vector and the centers of these two clusters are computed, from which an aesthetic score of the photo can be determined. On the other hand, two models can be used in supervised learning: decision trees and neural networks. The benefit of a decision tree is that its classification rules are readable, which can be used to tell why a photo is favorable according to the machine learning result. An example of a decision tree is shown in Figure 19. The nodes in the decision tree can be all low-level features, as shown in Figure 19(a), or can comprise some semantic ones, as shown in Figure 19(b).
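A brief sketch of the unsupervised route, assuming scikit-learn is available and that the feature vectors have already been extracted; the assignment of clusters to the favorable, unfavorable, and ambiguous types is done by inspection, and the distance-based scoring rule below is one plausible realization of the description above.

```python
import numpy as np
from sklearn.cluster import KMeans

def fit_clusters(features, k=3):
    """Cluster the n-dimensional photo feature vectors; the resulting clusters
    are then inspected and labeled favorable / unfavorable / ambiguous."""
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit(features)

def aesthetic_score(x, favorable_center, unfavorable_center):
    """Score in [0, 1]: the closer a new photo's feature vector lies to the
    favorable center, relative to the unfavorable one, the higher the score."""
    d_fav = np.linalg.norm(x - favorable_center)
    d_unf = np.linalg.norm(x - unfavorable_center)
    return d_unf / (d_fav + d_unf + 1e-12)
```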

Figure 19. A simplified decision tree for the perception of beauty trained by 100 samples: (a) without semantic nodes; (b) with some semantic nodes.

With the neural network approach, the aesthetic score of a photo can be measured. There are two neurons in the output layer, and the summation of their values is exactly one. The values of the "high quality" and "low quality" neurons are their respective probabilities. The structure of such a neural network for perceiving the beauty of a photo is illustrated in Figure 20.
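The chapter does not state how the two output activations are constrained to sum to one; a softmax over the two output neurons is one standard way to achieve this, sketched below.

```python
import numpy as np

def quality_probabilities(z_high, z_low):
    """Softmax over the 'high quality' and 'low quality' output neurons,
    so the two values are probabilities that sum to exactly one."""
    z = np.array([z_high, z_low])
    e = np.exp(z - z.max())        # subtract the max for numerical stability
    p_high, p_low = e / e.sum()
    return p_high, p_low
```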

#### 6.2. Decision tree

A decision tree [10] is a popular supervised learning approach because the decision process is made by walking through a path of the tree, and each path can be written as a readable classification rule. In a decision tree, the internal nodes represent features, and their child edges are predicates on the features, such as "is larger than" and "is less than." A node without any children is a leaf node, and the class labels are placed on the leaf nodes. When a series of feature values is fed to a decision tree, a path is established from the root, via some internal nodes, to a leaf node. The label of that leaf node is the classification result.
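To make the path-as-rule idea concrete, here is a toy sketch using a hypothetical nested-tuple tree with made-up feature names; each root-to-leaf walk corresponds to one readable rule.

```python
def classify(tree, sample):
    """Walk the tree until a leaf (a class label string) is reached.

    A node is (feature, threshold, left_subtree, right_subtree); the path
    taken reads as a rule such as
    'IF brightness <= 0.6 AND sharpness > 0.4 THEN favorable'."""
    while isinstance(tree, tuple):
        feature, threshold, left, right = tree
        tree = left if sample[feature] <= threshold else right
    return tree

# a toy tree with hypothetical low-level features
toy_tree = ("brightness", 0.6,
            ("sharpness", 0.4, "unfavorable", "favorable"),
            "unfavorable")
print(classify(toy_tree, {"brightness": 0.5, "sharpness": 0.7}))  # favorable
```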

Decision tree algorithms are based on information theory; the main idea is to calculate the entropy of the classes of the data when the data are split by specified features and branching values. The entropy of the entire data is computed by

$$\text{Info}(D) = -\sum_{i=1}^{m} p_i \log_2(p_i), \tag{19}$$

where D is the input data, pi is the percentage of class i that appears in all of the data and m is the number of classes. After using feature F to split D into v partitions, the information needed to classify D is computed by

$$\text{Info}_F(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \times \text{Info}(D_j), \tag{20}$$

where |D| is the cardinality of the data, |Dj| is the cardinality of the data in partition j, and Info(Dj) is the entropy of partition j.

Figure 20. The structure of a neural network for perceiving the beauty of a photo.

The value of v is set to 2 if we want to build a binary decision tree, in which each internal node has exactly two children. To split the input data into two partitions, a threshold is given initially; to find the optimal threshold, each distinct value of the selected feature F in the data is tried iteratively.

The information gain for feature F is defined as follows

$$\text{Gain}(F) = \text{Info}(D) - \text{Info}_F(D) \tag{21}$$

The feature F with the highest information gain is then selected as the feature for splitting the data. Nevertheless, the information gain tends to be biased toward the features with more levels. For example, if the brightness has 256 levels and the sharpness has 128 levels, then the result is often biased toward the brightness. Therefore, a gain ratio is used to normalize the information gain to eliminate bias, which is expressed below

$$\text{Gain}_{\text{ratio}}(F) = \frac{\text{Gain}(F)}{\text{SplitInfo}_F(D)}, \tag{22}$$

where


$$\text{SplitInfo}_F(D) = -\sum_{j=1}^{v} \frac{|D_j|}{|D|} \times \log_2\left(\frac{|D_j|}{|D|}\right). \tag{23}$$

The $\text{Gain}_{\text{ratio}}(F)$ is adopted to replace the information gain to prevent bias toward the features with more levels.
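The following sketch puts Eqs. (19)-(23) together for a binary split, including the search over distinct feature values for the optimal threshold mentioned earlier; the function names are illustrative.

```python
import math
from collections import Counter

def entropy(labels):
    """Info(D) of Eq. (19): the entropy of the class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(values, labels, t):
    """Gain ratio of Eq. (22) for splitting feature values at threshold t."""
    left = [y for v, y in zip(values, labels) if v <= t]
    right = [y for v, y in zip(values, labels) if v > t]
    if not left or not right:
        return 0.0                       # a one-sided split carries no information
    n = len(labels)
    info_f = sum(len(p) / n * entropy(p) for p in (left, right))   # Eq. (20)
    gain = entropy(labels) - info_f                                # Eq. (21)
    split_info = -sum(len(p) / n * math.log2(len(p) / n)
                      for p in (left, right))                      # Eq. (23)
    return gain / split_info

def best_threshold(values, labels):
    """Try every distinct feature value as a candidate binary-split threshold."""
    return max(set(values), key=lambda t: gain_ratio(values, labels, t))
```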

A major problem that affects the performance of decision trees is over-fitting. If the depth of the tree is too large, some unnecessary nodes are produced, which reduce the accuracy of the decision tree. As a result, pruning must be applied to the tree. A post-pruning method involves trial and error: a node is tentatively pruned by replacing it with a class leaf, and if the accuracy of the tree is better after the replacement, the pruning is accepted; otherwise, the original sub-tree is kept. An example is shown in Figure 21: if the accuracy of Figure 21(b) is better than that of Figure 21(a), then the original tree is replaced with the pruned tree in Figure 21(b); otherwise, the unpruned tree in Figure 21(a) is kept.

Figure 21. Pruning of a decision tree: (a) the original tree; (b) the tree has been pruned by replacing a sub-tree with a class leaf.
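A minimal sketch of this trial-and-error post-pruning step, assuming the tree is stored as nested lists of the form [feature, threshold, left, right] and that an `evaluate` callback returns accuracy on a held-out set; both are assumptions, since the chapter does not fix a representation.

```python
from copy import deepcopy

def try_prune(tree, path, majority_class, evaluate):
    """Tentatively replace the sub-tree reached by `path` (a sequence of
    child indexes) with a class leaf; keep the pruned tree only if its
    accuracy is at least as good, as in Figure 21."""
    pruned = deepcopy(tree)
    node = pruned
    for index in path[:-1]:          # walk down to the parent of the sub-tree
        node = node[index]
    node[path[-1]] = majority_class  # the sub-tree becomes a class leaf
    return pruned if evaluate(pruned) >= evaluate(tree) else tree
```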

#### 6.3. Artificial neural network

The multilayer perceptron (MLP) is a feed-forward neural network whose architecture is composed of three main substructures, namely the input, hidden, and output layers. Figure 22 shows the fundamental architecture of an MLP.

The MLP comprises various neurons and synapses associated with connection weights, in which the output of a neuron is derived from an activation function of the weighted sum of its inputs. During training, the weights are randomly initialized within a range. Whenever the output of a neuron differs from the expected value, the weights are adjusted iteratively until they are almost unchanged.

A neuron in the hidden and output layers is activated by applying inputs x1(p), x2(p), … ,xn(p) at iteration p, which are weighted by w1(p), w2(p), … ,wn(p), respectively. For instance, the sigmoid function serves as the activation function of the neuron, which is expressed as follows

$$y(\mathbf{x}) = Y^{\text{sigmoid}}\left[\sum_{i=1}^{n} x_i(p)\, w_i(p) - \theta\right] = \frac{1}{1 + \exp\left(-\left(\sum_{i=1}^{n} x_i(p)\, w_i(p) - \theta\right)\right)}, \tag{24}$$

where θ is a given bias acting as the quantity deviated from the original input of the neuron.

However, a single layer of such neurons can only solve linearly separable problems. To overcome linear inseparability, a hidden layer is added to constitute the MLP. Because such a neural network is trained with supervised learning, a back-propagation algorithm is developed for updating the weights from the output layer to the input layer [11]. Once all the weights have been trained, the MLP can be employed to predict the output immediately when an input is fed.

Figure 23 shows a three-layer perceptron comprising a hidden layer, which requires the backpropagation mechanism (algorithm) to update the weights in the course of training.

Figure 22. Example of the architecture of a multilayer perceptron.


Figure 23. Three-layered back-propagation neural network.


In Figure 23, Ni, Nj, and Nk are the numbers of neurons in the input, hidden, and output layers of the network, respectively. The weights wij and wjk and the biases are initialized by taking random numbers that are uniformly distributed within a small empirical range, say $\left(-\frac{2.4}{N_i}, +\frac{2.4}{N_i}\right)$.

Both the weights wij and wjk are further updated through the delta updating rule depicted below. At iteration p, wij and wjk are adjusted by Δwij(p) and Δwjk(p) according to the following formulas

$$\Delta w_{ij}(p) = \alpha\, x_i(p)\, \delta_j(p) \quad \text{and} \quad \Delta w_{jk}(p) = \alpha\, y_j(p)\, \delta_k(p), \tag{25}$$

where α is the learning rate, ranging from 0.1 to 0.5, and δj(p) and δk(p) are the error gradients for the hidden and output layers, respectively.

Subsequently, the weights wij and wjk at iteration p + 1 are calculated as follows:

$$w_{ij}(p+1) = w_{ij}(p) + \Delta w_{ij}(p) \quad \text{and} \quad w_{jk}(p+1) = w_{jk}(p) + \Delta w_{jk}(p) \tag{26}$$

The respective outputs yj and yk of neuron j in the hidden layer and neuron k in the output layer at iteration p are given by:

$$y_j(p) = Y^{\text{sigmoid}}\left[\sum_{i=1}^{n} x_i(p)\, w_{ij}(p) - \theta_j\right], \quad j = 1, 2, \ldots, m \tag{27}$$

$$y_k(p) = Y^{\text{sigmoid}}\left[\sum_{j=1}^{m} y_j(p)\, w_{jk}(p) - \theta_k\right], \quad k = 1, 2, \ldots, l \tag{28}$$

where θj and θk stand for the input biases of neurons j and k, respectively.

The error between the desired value and the value predicted by the three-layer perceptron is obtained from

$$e\_k(p) = y\_{d,k}(p) - y\_k(p) \tag{29}$$

where yd,k (p) and yk (p) are the desired and predicted outputs, respectively.

The error gradient for the output layer is computed as follows

$$\delta\_k(p) = y\_k(p)[1 - y\_k(p)]e\_k(p) \tag{30}$$

The weights between the hidden layer and the output layer are adjusted by

$$
\Delta w\_{jk}(p) = \alpha y\_j(p)\delta\_k(p) = \alpha y\_j(p)e\_k(p)y\_k(p)[1 - y\_k(p)]\tag{31}
$$

Thus, the error for the output of the output layer can be propagated back to the hidden layer, and the error for the output of the hidden layer is computed as follows

$$e_j(p) = \sum_{k=1}^{l} \delta_k(p)\, w_{jk}(p) \tag{32}$$

The error gradient for the hidden layer is formulated below

$$\delta_j(p) = y_j(p)\,[1 - y_j(p)]\, e_j(p) \tag{33}$$

The weights between the input layer and the hidden layer are computed by

$$\Delta w_{ij}(p) = \alpha\, x_i(p)\, \delta_j(p) = \alpha\, x_i(p)\, e_j(p)\, y_j(p)\,[1 - y_j(p)] \tag{34}$$

Thus, the iteration p is increased by 1, and the procedure is repeated until the sum of squared errors is sufficiently small or the number of iterations reaches a given maximum. The sum of squared errors is defined as follows

$$E = \frac{1}{2} \sum\_{p=1}^{N\_T} \sum\_{k=1}^{l} \left( y\_{d,k}(p) - y\_k(p) \right)^2 \tag{35}$$

where NT is the number of training samples and l is the number of neurons in the output layer. In our proposed method, such a three-layer perceptron is applied to classifying the type of image composition for an input photo.
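The training loop below is a compact sketch of Eqs. (24)-(35) for such a three-layer perceptron, using NumPy. For simplicity, the biases are initialized in the same empirical range as the weights but, unlike the weights, are kept fixed, since the chapter states no bias update rule.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))              # Eq. (24)

def train_mlp(X, Y, n_hidden, alpha=0.1, max_iter=10000, tol=1e-3, seed=0):
    """Three-layer perceptron trained with the back-propagation rules (25)-(34).

    X: (N, n_inputs) training inputs; Y: (N, n_outputs) desired outputs.
    """
    rng = np.random.default_rng(seed)
    n_in, n_out = X.shape[1], Y.shape[1]
    lim = 2.4 / n_in                             # small empirical init range
    w_ij = rng.uniform(-lim, lim, (n_in, n_hidden))
    w_jk = rng.uniform(-lim, lim, (n_hidden, n_out))
    th_j = rng.uniform(-lim, lim, n_hidden)      # biases, kept fixed here
    th_k = rng.uniform(-lim, lim, n_out)

    for _ in range(max_iter):
        E = 0.0
        for x, y_d in zip(X, Y):
            y_j = sigmoid(x @ w_ij - th_j)       # Eq. (27)
            y_k = sigmoid(y_j @ w_jk - th_k)     # Eq. (28)
            e_k = y_d - y_k                      # Eq. (29)
            d_k = y_k * (1 - y_k) * e_k          # Eq. (30)
            e_j = w_jk @ d_k                     # Eq. (32)
            d_j = y_j * (1 - y_j) * e_j          # Eq. (33)
            w_jk += alpha * np.outer(y_j, d_k)   # Eqs. (25), (31)
            w_ij += alpha * np.outer(x, d_j)     # Eqs. (25), (34)
            E += 0.5 * np.sum(e_k ** 2)          # Eq. (35)
        if E < tol:                              # stop once the error is small
            break
    return w_ij, w_jk, th_j, th_k
```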

#### 7. Personal preference

Photo aesthetics is subjective to different groups of people. To deal with this problem, we adopt social networks to collect people's preferences, for instance, the attributes of their personal information and the features of their favorite pictures. The correlation between the attributes of people and photo features is calculated. A bias is then used to adjust each feature value according to the person, which is formulated as follows


$$\hat{b}_i = b_i + \sum_{j=1}^{n} \text{norm}(p_j)\, \gamma_{ij}\, \mu_i \tag{36}$$

where i is the index of a photo feature (brightness, color contrast, etc.), j is the index of a personal attribute (gender, age, education, etc.), and n is the number of personal attributes. Besides, bi is the original bias for each feature value, pj is a photographer's attribute value, norm(pj) is the normalized attribute value (ranging from 0 to 1), γij is the correlation between a photo feature value and a photographer's attribute value, which ranges from −1 to +1, and μi is the personal influence for a feature value. The bias value can be used in the decision tree described in the previous section.
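A small sketch of Eq. (36) with illustrative names; the normalization functions and correlation values are assumed to be supplied by the social network analysis described above.

```python
def personalized_bias(b_i, attributes, norms, gamma_i, mu_i):
    """Eq. (36): shift the original bias b_i of photo feature i by the
    photographer's normalized attributes norm(p_j), weighted by the
    correlations gamma_ij and the personal influence mu_i."""
    return b_i + mu_i * sum(norm(p) * g
                            for p, norm, g in zip(attributes, norms, gamma_i))
```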
