**1. Introduction**

Wireless sensor networks (WSNs) are used for a wide variety of purposes, including systems that must function safely. Although WSNs historically linked geographically close systems, more mission-critical subsystems such as cars and drones are now joining them. As a result, it has become imperative to build WSNs that are fault tolerant. Data security plays a crucial part in successful communication. Earlier approaches relied on encryption, firewalls, and virtual private networks (VPNs) to provide data security, but these methods alone are not enough to secure the data. A machine learning approach therefore offers an effective way to deal with the problem. Many researchers have studied data safety and arrived at various conclusions. In hardware implementations, these methods tend to be complex. Machine learning is therefore the preferred solution in this study: it is easier to apply, takes less time than the other methods, and is cost-effective.

From education to the entertainment industry, data is the backbone of every sector, so data security and safety are significant. Attackers may duplicate data packets or even the IP address itself, which makes it difficult to identify malicious data in the network. Machine learning techniques offer an efficient solution.

A hardware model using sensors is implemented in [1], but this method appears complicated. Attack detection has been achieved through blockchain technology [2], but this method suffers from computational delay, blockchain overheads, and implementation cost. Other machine learning classifiers have also been used to detect attacks; the major drawbacks reported in that research are longer computation time and more false positives [3, 4]. In the present study, false positives are comparatively fewer, as shown in the confusion matrix and discussed in the results section.

The dataset considered in the present study is the KDD Cup 99 dataset, which contains a large number of records and is publicly available. Four major attack types in this dataset are considered. In a DoS (Denial of Service) attack, an attacker overloads the network or a host so that legitimate users are denied access. In a Probe attack, an attacker scans the network to gather information about its hosts and services. In an R2L (Remote to Local) attack, an unauthorized user who can send data packets to a system over the network, but has no access to it as a local user, gains local access. In a U2R (User to Root) attack, an unauthorized user gains root privileges.
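
As an illustration of the four attack categories, the following minimal sketch maps the raw attack labels of the KDD Cup 99 training data to DoS, Probe, R2L, and U2R. The names `ATTACK_CATEGORY` and `categorize` are hypothetical helpers introduced here for illustration and are not part of the proposed method. The grouping follows the commonly used assignment of the 22 training-set attack types, and labels outside that list are reported as `unknown` rather than guessed.

```python
from collections import Counter

# Commonly used grouping of KDD Cup 99 training labels into four categories.
# (Illustrative helper, not the authors' code.)
ATTACK_CATEGORY = {
    "normal": "normal",
    # Denial of Service
    "back": "dos", "land": "dos", "neptune": "dos",
    "pod": "dos", "smurf": "dos", "teardrop": "dos",
    # Probe
    "ipsweep": "probe", "nmap": "probe",
    "portsweep": "probe", "satan": "probe",
    # Remote to Local
    "ftp_write": "r2l", "guess_passwd": "r2l", "imap": "r2l",
    "multihop": "r2l", "phf": "r2l", "spy": "r2l",
    "warezclient": "r2l", "warezmaster": "r2l",
    # User to Root
    "buffer_overflow": "u2r", "loadmodule": "u2r",
    "perl": "u2r", "rootkit": "u2r",
}


def categorize(label: str) -> str:
    """Map a raw label such as 'smurf.' to its attack category."""
    # Raw KDD Cup 99 labels carry a trailing dot, so strip it first.
    key = label.strip().rstrip(".").lower()
    return ATTACK_CATEGORY.get(key, "unknown")


# Toy labels for demonstration only, not real dataset statistics.
sample_labels = ["smurf.", "normal.", "neptune.", "rootkit."]
category_counts = Counter(categorize(label) for label in sample_labels)
print(category_counts)
```

Collapsing every category other than `normal` into a single attack class gives the binary attack/normal labelling used when the data are classified as attacked or normal.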

To classify the data as attacked or normal, four classifiers are used: Random Forest (RF), Support Vector Machine (SVM), Multilayer Perceptron (MLP), and Stochastic Gradient Descent (SGD). Raw data cannot be used directly to train and test a machine learning model, so data preprocessing steps such as feature selection and encoding are applied first. The preprocessed data are then applied to the different classifiers. Efficiency parameters such as accuracy, precision, recall, F-measure, selectivity, specificity, and G-mean are computed, and the final result is obtained by comparing all of these parameters. Different percentages of attacks can be introduced into the dataset, so that an efficient classifier can be identified for each percentage of attacks. The efficiency parameters are obtained from the confusion matrix, which contains the True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN) values.
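
To make the relationship between the confusion matrix and the efficiency parameters concrete, the sketch below derives several of them from the four counts. It assumes the standard textbook definitions, with G-mean taken as the geometric mean of recall and specificity; the definitions used later in this paper may differ in detail. The function name `confusion_metrics` and the example counts are hypothetical and are not results of the present study.

```python
import math


def confusion_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Derive common efficiency parameters from confusion-matrix counts."""

    def ratio(numerator: float, denominator: float) -> float:
        # Return 0.0 instead of raising when a denominator is zero.
        return numerator / denominator if denominator else 0.0

    accuracy = ratio(tp + tn, tp + tn + fp + fn)
    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)          # true positive rate
    specificity = ratio(tn, tn + fp)     # true negative rate
    f_measure = ratio(2 * precision * recall, precision + recall)
    # Assumed definition: geometric mean of recall and specificity.
    g_mean = math.sqrt(recall * specificity)

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f_measure": f_measure,
        "g_mean": g_mean,
    }


# Made-up counts for illustration only.
metrics = confusion_metrics(tp=90, tn=80, fp=20, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because every parameter is a function of the same four counts, classifiers can be compared by computing one confusion matrix per classifier and per attack percentage and then tabulating these values side by side.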

In the present study, a brief description of the dataset available on the internet is presented, and the preprocessing of the data is discussed in detail. The dataset is then applied to four different types of classifiers, and the most efficient classifier is identified with respect to the confusion matrix parameters.

The paper is organized as follows. Section 2 discusses the motivation for the present study. Section 3 reviews related work in the field of intrusion detection systems, the various data faults that occur, and the types of classifiers used. Section 4 introduces the proposed method. Section 5 discusses the performance measures and analyses. Finally, Section 6 concludes the paper with future research directions.
