$$\Pr(Y_j \mid d_i) = \frac{1}{\sigma_j \sqrt{2\pi}}\, e^{-\frac{(Y - \mu_j)^2}{2\sigma_j^2}}, \quad Y_1 < Y < Y_k, \; \sigma_j > 0, \tag{19}$$

where $\mu_j$ and $\sigma_j$ are the mean and standard deviation of the set $[Y_1, Y_2, \ldots, Y_k]$. Eq. (19) means that the Naïve Bayes classifier will label a new $Y_x$ with the class label $d_i$ that achieves the highest posterior probability [13].

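As a concrete illustration of this rule, the short Python sketch below fits a per-class Gaussian as in Eq. (19) and picks the label with the highest (prior-weighted) posterior; the sensing data and function names are hypothetical, chosen only to mirror the notation above.

```python
import numpy as np

def gaussian_likelihood(y, mu, sigma):
    # Eq. (19): Gaussian likelihood Pr(Y | d) with mean mu and std sigma
    return np.exp(-(y - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

def naive_bayes_classify(y_x, Y, d):
    # Y: training observations, d: class labels in {+1, -1}
    best_label, best_posterior = None, -np.inf
    for label in np.unique(d):
        subset = Y[d == label]
        mu, sigma = subset.mean(), subset.std()
        prior = subset.size / Y.size
        # unnormalized posterior: prior * likelihood
        posterior = prior * gaussian_likelihood(y_x, mu, sigma)
        if posterior > best_posterior:
            best_label, best_posterior = label, posterior
    return best_label

# Hypothetical sensing decisions and labels
Y = np.array([0.2, 0.3, 0.25, 1.1, 1.3, 1.2])
d = np.array([-1, -1, -1, +1, +1, +1])
print(naive_bayes_classify(1.0, Y, d))  # -> 1
```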

2.2.3. Support vector machine (SVM) classifier

For a given training set of pairs $(Y_i, d_i)$, $i = 1, 2, \ldots, M$, where $Y_i \in \mathbb{R}$ and $d_i \in \{+1, -1\}$, the minimum weight $w$ and the constant $b$ that maximize the margin between the positive and negative classes (i.e., $w \cdot Y_i + b = \pm 1$) with respect to the hyper-plane equation $w \cdot Y_i + b = 0$ can be estimated by the support vector machine classifier through the following optimization function:

$$\min_{w,\, b} \frac{\|w\|^2}{2}, \quad \text{where } \|w\|^2 = w^T w \tag{20}$$

subject to $d_i (w \cdot Y_i + b) \ge 1$, $i = 1, 2, \ldots, M$.

The solution of this quadratic optimization problem can be expressed using the Lagrangian function

$$L(w, b, \alpha) = \frac{\|w\|^2}{2} - \sum_{i=1}^{M} \alpha_i \big( d_i (w \cdot Y_i + b) - 1 \big), \quad \alpha_i \ge 0, \tag{21}$$

where $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_M)$ are the Lagrangian multipliers. If we set $\nabla L(w, b, \alpha) = 0$, we obtain $w = \sum_{i=1}^{M} \alpha_i d_i Y_i$ and $\sum_{i=1}^{M} \alpha_i d_i = 0$, and by substituting them into Eq. (21), the dual optimization problem that describes the hyper-plane can be written as

$$\min_{\alpha} \frac{1}{2} \sum_{i=1}^{M} \sum_{j=1}^{M} \alpha_i \alpha_j d_i d_j (Y_i \cdot Y_j) - \sum_{j=1}^{M} \alpha_j, \quad \text{subject to } \sum_{i=1}^{M} \alpha_i d_i = 0, \; \alpha_j \ge 0. \tag{22}$$
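To make the dual problem concrete, the following Python sketch solves Eq. (22) numerically with SciPy's SLSQP solver; the one-dimensional toy data and every variable name here are hypothetical illustrations, not part of the chapter.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical 1-D sensing decisions Y_i and labels d_i in {+1, -1}
Y = np.array([0.2, 0.4, 0.3, 1.2, 1.4, 1.3])
d = np.array([-1.0, -1.0, -1.0, 1.0, 1.0, 1.0])
M = len(Y)

# Matrix of products d_i d_j (Y_i . Y_j) from the quadratic term of Eq. (22)
H = np.outer(d, d) * np.outer(Y, Y)

def dual_objective(alpha):
    # Eq. (22): (1/2) sum_ij alpha_i alpha_j d_i d_j (Y_i . Y_j) - sum_j alpha_j
    return 0.5 * alpha @ H @ alpha - alpha.sum()

res = minimize(
    dual_objective,
    x0=np.zeros(M),
    method="SLSQP",
    bounds=[(0, None)] * M,                                 # alpha_i >= 0
    constraints=[{"type": "eq", "fun": lambda a: a @ d}],   # sum_i alpha_i d_i = 0
)
alpha = res.x  # nonzero entries correspond to the support vectors
```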

From expression (22), we can assess $\alpha$ and compute $w$ using $w = \sum_{i=1}^{M} \alpha_i d_i Y_i$. Then, by choosing an index $j$ with $\alpha_j > 0$ from the vector $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_M)$ and calculating $b$ from

$$b = d_j - \sum_{i=1}^{M} \alpha_i d_i (Y_i \cdot Y_j), \tag{23}$$

we classify a new instance $Y_x$ using the following classification function:

$$\mathrm{class}(Y_x) = \mathrm{sign}\left( \sum_{i=1}^{M} \alpha_i d_i (Y_i \cdot Y_x) + b \right)$$
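Continuing the same hypothetical sketch (reusing `Y`, `d`, and the recovered `alpha` from above), the stationarity condition gives $w$, Eq. (23) gives $b$, and the classification function reduces to a sign test:

```python
# Continues the sketch above: Y, d, and alpha as computed there.

# w from the stationarity condition w = sum_i alpha_i d_i Y_i
w = np.sum(alpha * d * Y)

# b from Eq. (23), using any support vector j (alpha_j > 0)
j = int(np.argmax(alpha))
b = d[j] - np.sum(alpha * d * (Y * Y[j]))

def classify(y_x):
    # class(Y_x) = sign(sum_i alpha_i d_i (Y_i . Y_x) + b)
    return np.sign(np.sum(alpha * d * Y * y_x) + b)

print(classify(1.0))  # -> 1.0 for this toy data
```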

2.2.4. Decision tree (DT) classifier

For a given training set of pairs of sensing decisions $(Y_i, d_i)$, $i = 1, 2, \ldots, M$, $d_i \in \{-1, +1\}$, the decision tree classifier builds a binary tree based on either an impurity or a node-error splitting rule in order to split the training set into separate subsets. It then applies the splitting rule recursively to each subset until the leaves become pure. After that, it minimizes the error in each leaf by taking the majority vote of the training examples in that leaf [14]. To classify a new example $Y_x$, the DT classifier selects the leaf into which $Y_x$ falls and assigns $Y_x$ the class label that occurs most frequently in that leaf.
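As an illustration of this recursive split-and-vote procedure, here is a minimal Python sketch for one-dimensional sensing decisions; the node-error split criterion, the threshold search over midpoints, and the toy data are hypothetical simplifications, not the chapter's implementation.

```python
import numpy as np

def majority(d):
    # Majority vote of the labels that reach a leaf
    return 1.0 if np.sum(d == 1.0) >= np.sum(d == -1.0) else -1.0

def build_tree(Y, d, depth=0, max_depth=3):
    # Stop when the leaf is pure (or the tree is too deep); vote on the leaf
    if len(np.unique(d)) == 1 or depth == max_depth:
        return ("leaf", majority(d))
    # Candidate thresholds: midpoints between consecutive sorted samples;
    # keep the split that minimizes the total node error of the two children
    order = np.argsort(Y)
    Ys, ds = Y[order], d[order]
    best = None
    for t in (Ys[:-1] + Ys[1:]) / 2:
        left, right = ds[Ys <= t], ds[Ys > t]
        err = np.sum(left != majority(left)) + np.sum(right != majority(right))
        if best is None or err < best[0]:
            best = (err, t)
    t = best[1]
    return ("node", t,
            build_tree(Y[Y <= t], d[Y <= t], depth + 1, max_depth),
            build_tree(Y[Y > t], d[Y > t], depth + 1, max_depth))

def classify(tree, y_x):
    # Follow the splits down to the leaf where y_x falls; return its label
    if tree[0] == "leaf":
        return tree[1]
    _, t, left, right = tree
    return classify(left if y_x <= t else right, y_x)

# Hypothetical sensing decisions and labels
Y = np.array([0.2, 0.3, 0.25, 1.1, 1.3, 1.2])
d = np.array([-1.0, -1.0, -1.0, 1.0, 1.0, 1.0])
tree = build_tree(Y, d)
print(classify(tree, 1.0))  # -> 1.0
```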
