**iv.** *Machine learning*: there are two main branches of machine learning: supervised learning, in which the anomaly pattern is learnt and known in advance, and unsupervised learning, in which detection is done by inference or feature extraction. The latter is more challenging because the anomaly pattern is unknown and the model learnt from the data points must itself be analyzed. The supervised mode comprises the following methods: Decision Table, Random Forest, K-nearest Neighbor, SVMs, Deep Learning, and Naive Bayes. The popular *unsupervised* algorithms are K-means clustering, DBSCAN, N-SVM, Stream Clustering, and LDA (Latent Dirichlet Allocation).

**2.3. Problematics**

Listed below are the 10 main issues, some of which are inherent to the IoT network and others to the time series properties.

**i.** Missing data points/holes: data can go missing because of device malfunction, for instance, or because of issues related to device identification. "Potent, climate warming gases are being emitted into the atmosphere but are not being recorded in official inventories," a BBC (http://www.bbc.com/news/science-environment-40669449) investigation has found.

**ii.** Data corruption: data can be corrupted by external factors or device malfunction; thus, it is important to ensure that the data points analyzed are accurate and come from the system under investigation.

**iii.** Encrypted data: in most IoT networks, data are encrypted during transmission and normally decrypted for customer usage. If detection is to be performed on encrypted data, anomaly detection might not be straightforward.

**iv.** Sensor fusion: data points from different sensors can be aggregated for a specific function. For example, parameters such as temperature, carbon footprint, and wind speed can be captured from different sensors and merged for modelling on a server for an environmental impact study. In such cases, the TSA needs to deal with multiple datasets. Sensor fusion is also considered akin to evolving sources.

**v.** Real-time detection: this is probably more inherent to the network itself, but the processing and programming aspects of the TSA are also determinants.

**vi.** Seasonality: also called periodic time series, it arises when the time series is influenced by seasonal factors such as day, night, month, and so on.

**vii.** Heteroscedasticity: it involves frequent changes in variance that can render the transformation of the time series more complex.

**viii.** Noisy data: data points with very low amplitude can be drowned in the intrinsic electronic noise of the transmission. Network equipment vendors are proposing edge computing routers that would clean the IoT device data at a closer location before running the complete analytics on the cloud.

**ix.** Traffic surge: at times, there could be excessive throughput, such as the number of SMS messages on the eve of New Year, that could overload the ADE.

**3. Conclusion**

This chapter highlights the challenges relevant to the core elements involved in the development of an anomaly detection engine (ADE). It was found that an accurate and reliable ADE relies on three main selection factors, namely the quality of the data points, the time series transformation, and where the analytics are executed. Moreover, due to the heterogeneous nature of networking environments, the convergence of communication and data protocols in IoT requires special attention when it comes to anomaly detection software development. For instance, raw data points from a smart water application are very different from those of a health care IoT application; hence, the domain of application is another determining factor in the construction of an efficient ADE. Machine learning in the unsupervised mode is very efficient in situations where datasets are unpredictable. Moreover, cases where data points show nonlinear time series require multivariate analysis, which makes the process more computing intensive. This property is not favorable to real-time anomaly detection, as more computation at the ADE level will affect the accuracy of the ADE. From a software development perspective, the trend is similar to that of data mining tools embedded in popular database servers: once the dataset is compiled, the user can choose the most appropriate statistical tool. In the near future, ERP solution providers will probably propose the ADE as a customizable module that best fits the customers' requirements. Future work will investigate the challenges arising from empirical experiments and how anomaly detection can be offered as a service in cloud computing.
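The tension noted above between computational cost and real-time detection can be illustrated with a minimal sketch. It is not one of the machine learning algorithms named in this chapter but a simple label-free statistical baseline: each new data point is scored against the median and the median absolute deviation (MAD) of a fixed-size window of preceding points, so the work per point is bounded by the window size. The function name `rolling_mad_anomalies`, the window length, the threshold, and the sample readings are all assumptions chosen for illustration.

```python
from collections import deque
from statistics import median


def rolling_mad_anomalies(values, window=8, threshold=3.5):
    """Return the indices of points whose robust z-score exceeds `threshold`.

    Each point is compared with the median and the MAD of the `window`
    points that precede it. No labels are required, and the cost per
    point depends only on the window size, not on the stream length.
    """
    history = deque(maxlen=window)  # sliding window of the most recent points
    flagged = []
    for i, x in enumerate(values):
        if len(history) == window:
            med = median(history)
            mad = median(abs(v - med) for v in history)
            # A zero MAD means the window is (almost) constant and the
            # score cannot be scaled; this sketch simply skips such points.
            if mad > 0:
                # 0.6745 rescales the MAD so that the score is comparable
                # to a standard z-score for normally distributed data.
                score = 0.6745 * abs(x - med) / mad
                if score > threshold:
                    flagged.append(i)
        history.append(x)
    return flagged


# Hypothetical temperature readings with a single spike at index 9.
readings = [20.1, 20.3, 19.9, 20.2, 20.0, 20.4, 19.8, 20.1,
            20.2, 35.0, 20.0, 20.3]
print(rolling_mad_anomalies(readings))
```

Because the median and the MAD are robust to isolated outliers, the spike does not distort the scores of the points that follow it. A seasonal or heteroscedastic series would still need to be transformed first, since a single fixed window and threshold assume a roughly stable level and variance.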
