*2.1.2.3 Attribute selection measures (ASM)*

While building a decision tree, the principal issue is how to select the best attribute for the root node and for the sub-nodes. To resolve this, there is a technique known as attribute selection measure, or ASM. With it, one can easily choose the best attribute for the nodes of the tree.

The two basic techniques for ASM are:

1. Information gain;

2. Gini index.

**Information gain:** Information gain is built on the idea of impurity. An impurity measure is a heuristic for selecting the splitting criterion that separates a given dataset of class-labelled training tuples into individual classes. If we divide a dataset D into smaller partitions according to the outcomes of the splitting criterion, each partition should ideally be pure, with all the tuples in a partition belonging to the same class. Information gain is the change in entropy after segmenting a dataset on an attribute; it quantifies how much information an attribute provides about the class. According to the value of information gain, the node is split and the decision tree is built.

The information gain can be calculated as:

$$\text{IG} = \text{Entropy}(S) - \text{weighted average} \times \text{Entropy}(\text{each feature})$$

Entropy is a measure of the randomness in the data. It can be calculated as:

$$E = -P(\text{yes})\log_2 P(\text{yes}) - P(\text{no})\log_2 P(\text{no})$$
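To make the two definitions concrete, here is a minimal Python sketch (not from the source); the `entropy` and `information_gain` helpers and the 9-yes/5-no example split are illustrative assumptions:

```python
import math
from collections import Counter

def entropy(labels):
    """E = -sum(p * log2(p)) over the class proportions in `labels`."""
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())

def information_gain(parent_labels, child_partitions):
    """IG = Entropy(S) minus the weighted average entropy of the partitions."""
    total = len(parent_labels)
    weighted = sum((len(part) / total) * entropy(part)
                   for part in child_partitions)
    return entropy(parent_labels) - weighted

# Hypothetical parent set (9 "yes" / 5 "no") split into two partitions.
parent = ["yes"] * 9 + ["no"] * 5
left   = ["yes"] * 6 + ["no"] * 1   # outcome 1 of the split
right  = ["yes"] * 3 + ["no"] * 4   # outcome 2 of the split

print(round(entropy(parent), 3))                       # 0.940
print(round(information_gain(parent, [left, right]), 3))  # ~0.152
```

A pure partition has entropy 0, so a split whose partitions are nearly pure yields a high information gain and is preferred.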

**Gini Index:** The Gini index is the measure of impurity or purity used while developing a decision tree with the CART algorithm. An attribute with a low Gini index should be preferred over one with a high Gini index. The CART algorithm creates only binary splits, and it uses the Gini index to choose them. It can be calculated using the following formula:

$$\text{GI} = 1 - \sum_{j} P_j^2$$
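A minimal Python sketch of this formula (not from the source), with hypothetical label lists for illustration:

```python
from collections import Counter

def gini_index(labels):
    """GI = 1 - sum(p_j^2) over the class proportions p_j in `labels`."""
    total = len(labels)
    return 1 - sum((count / total) ** 2 for count in Counter(labels).values())

# A pure node has GI = 0; a 50/50 binary node has the maximum GI of 0.5.
print(gini_index(["yes"] * 4))                 # 0.0
print(gini_index(["yes"] * 2 + ["no"] * 2))    # 0.5
```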
