**6. K-nearest neighbor (KNN) algorithm**

The K-nearest neighbor (KNN) classification rule is one of the most well-known and widely used nonparametric pattern classification methods. Its simplicity and effectiveness have led to its adoption in a large number of classification problems. When there is little or no prior knowledge about the distribution of the data, the KNN method should be one of the first choices for classification. It is a powerful non-parametric classification system that bypasses the problem of probability densities completely.

Nearest Neighbor (NN) is a "lazy" learning method because the training data are not preprocessed in any way. The class assigned to a pattern is the class of the nearest pattern known to the system, measured in terms of a distance defined on the feature (attribute) space. On this space, each pattern defines a region (called its Voronoi region). When the distance is the classical Euclidean distance, Voronoi regions are delimited by linear borders. To improve over 1-NN classification, more than one neighbor may be used to determine the class of a pattern (K-NN), or distances other than the Euclidean may be used. The KNN rule classifies χ by assigning it the label most frequently represented among the K nearest samples; that is, a decision is made by examining the labels of the K nearest neighbors and taking a vote. KNN classification was developed from the need to perform discriminant analysis when reliable parametric estimates of probability densities are unknown or difficult to determine (Parvin et al., 2008).

K-NN is one of the most widely used classification algorithms. It operates by comparing a given new record with the training records and finding the training records that are most similar to it. It searches the space for the k training records that are nearest to the new record; these serve as the new record's neighbors. In this algorithm, "nearest" is defined in terms of a distance metric such as the Euclidean distance. The Euclidean distance between two records (or two points in n-dimensional space) is defined by:

$$\text{If } \chi_1 = (\chi_{11}, \chi_{12}, \dots, \chi_{1n}) \text{ and } \chi_2 = (\chi_{21}, \chi_{22}, \dots, \chi_{2n}), \text{ then}$$

$$\text{dist}(\chi_1, \chi_2) = \sqrt{\sum_{i=1}^{n} (\chi_{1i} - \chi_{2i})^2} \tag{1}$$

where $\chi_1$ and $\chi_2$ are two records with n attributes. This formula measures the distance between the two patterns $\chi_1$ and $\chi_2$ (Moradian & Baraani, 2009). The K-nearest neighbor classifier is a supervised learning algorithm in which a new instance query is classified based on the majority category among its K nearest neighbors. The training samples are described by n-dimensional numeric attributes; each sample represents a point in an n-dimensional pattern space, and all of the training samples are thus stored in an n-dimensional pattern space.
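As a quick illustration of Eq. (1), the distance between two records can be computed in MATLAB with a single vectorized expression. The vectors `x1` and `x2` below are illustrative examples, not data from the text:

```matlab
% Euclidean distance (Eq. 1) between two records with n attributes.
x1 = [1.0 2.0 3.0];               % illustrative record with n = 3 attributes
x2 = [4.0 0.0 3.0];               % a second illustrative record
dist = sqrt(sum((x1 - x2).^2));   % sqrt(9 + 4 + 0) = 3.6056 (approx.)
```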

The following discussion introduces an example demonstrating the general concept of this algorithm in detail. The K-nearest neighbor algorithm is very simple: it uses the minimum distance from the query instance to the training samples to determine the nearest neighbors, and after the K nearest neighbors are gathered, it takes a simple majority of them as the prediction for the query instance. The data for the KNN algorithm consist of several multivariate attributes that are used to classify the object Y. Suppose that the factor K is set equal to 8 (there are 8 nearest neighbors) as a parameter of this algorithm. The distance between the query instance and all of the training samples is then computed. Because the attributes are quantitative, the next step is to find the K nearest neighbors. A training sample is included as a nearest neighbor if its distance to the query is less than or equal to the Kth smallest distance; in other words, the distances of all training samples to the query are sorted and the Kth minimum distance is determined. The unknown sample is assigned the most common class among its K nearest neighbors. As illustrated above, it is necessary to find the distances between the query and all training samples; these K closest training samples are the K nearest neighbors of the unknown sample. Closeness is defined in terms of the Euclidean distance (Bawaneh et al., 2008). A sketch of these steps in MATLAB is given below.
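The following minimal MATLAB sketch shows the distance computation, sorting, and neighbor selection just described. The names `X`, `query`, `K`, and the data values are illustrative assumptions; the implicit expansion in `X - query` requires MATLAB R2016b or later (older releases would need `bsxfun`):

```matlab
% Distances from a query record to every training record (rows of X),
% followed by selection of the K nearest neighbors.
X = [1 1; 2 2; 3 3; 6 6; 7 7; 0 1; 5 5; 2 1];   % 8 illustrative training records
query = [2 3];                                   % illustrative query instance
K = 3;                                           % number of nearest neighbors
d = sqrt(sum((X - query).^2, 2));   % Eq. (1): distance to each training record
[~, idx] = sort(d);                 % sort distances in ascending order
neighbors = idx(1:K);               % indices of the K nearest training records
```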

Let us consider a set of patterns $\chi_1, \dots, \chi_N \in R^P$ of known classification, where each pattern belongs to one of the classes $C_1, C_2, \dots, C_S$. The nearest neighbor (NN) classification rule assigns a pattern Z of unknown classification to the class of its nearest neighbor, where $\chi_i$ is the nearest neighbor to Z if

$$Dist(\chi_i, Z) = \min \left\{ Dist(\chi_l, Z), \quad l = 1, 2, \dots, N \right\} \tag{2}$$

Dist is the Euclidean distance between two patterns in $R^P$. This scheme is called the 1-NN rule, since it classifies a pattern based on only one neighbor of Z. The k-NN rule considers the k nearest neighbors of Z and uses the majority rule. Let $t_l$, where $l = 1, 2, \dots, s$, be the number of neighbors from class $l$ among the k nearest neighbors of Z (Pal & Ghosh, 2001).

$$\sum_{l=1}^{s} t_l = k \tag{3}$$

Then Z is assigned to class j if


$$t_j = \max \left\{ t_l, \quad l = 1, 2, \dots, s \right\} \tag{4}$$
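Assuming class labels are positive integers, the vote counts $t_l$ of Eq. (3) and the majority rule of Eq. (4) can be sketched in MATLAB as follows; `neighborLabels` is an illustrative vector holding the classes of the k nearest neighbors:

```matlab
% Eqs. (3)-(4): count the neighbors in each class and assign Z to the
% class with the most votes.
neighborLabels = [2 1 2 2 1];           % classes of the k = 5 nearest neighbors
t = accumarray(neighborLabels(:), 1);   % t(l) = number of neighbors in class l
[~, j] = max(t);                        % Z is assigned to class j (here j = 2)
```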

The KNN algorithm can be computed step by step as follows:

1. Determine the parameter K, the number of nearest neighbors.
2. Calculate the distance between the query instance and all of the training samples.
3. Sort the distances and determine the K nearest neighbors based on the Kth minimum distance.
4. Gather the class labels of the K nearest neighbors.
5. Take the simple majority of the neighbors' classes as the prediction for the query instance.


A MATLAB program for K-NN is:
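The listing below is a minimal sketch that follows the steps above; the function name `knn_classify` and all variable names are illustrative assumptions, and the implicit expansion in `Xtrain - query` again requires MATLAB R2016b or later:

```matlab
function predicted = knn_classify(Xtrain, ytrain, query, K)
% KNN_CLASSIFY  Classify a single query record with the K-NN rule.
%   Xtrain : N-by-n matrix of training records
%   ytrain : N-by-1 vector of integer class labels
%   query  : 1-by-n query record
%   K      : number of nearest neighbors

% Step 2: Euclidean distance (Eq. 1) from the query to every training record
d = sqrt(sum((Xtrain - query).^2, 2));

% Step 3: sort the distances and keep the K nearest training records
[~, idx] = sort(d);
nearest = idx(1:K);

% Steps 4-5: simple majority vote among the K neighbors (Eqs. 3-4)
predicted = mode(ytrain(nearest));
end
```

For example, with four illustrative training records in two classes:

```matlab
Xtrain = [7 7; 7 4; 3 4; 1 4];                       % illustrative training set
ytrain = [1; 1; 2; 2];                               % class of each record
predicted = knn_classify(Xtrain, ytrain, [3 7], 3)   % returns 2
```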

**Figure 4.** Flowchart for the K-nearest neighbors
