Preface

Anomaly detection is the practice of pinpointing outliers in a set of similar items. Its main objective is to discern between typical and atypical data. Various strategies, including statistical techniques, supervised and unsupervised learning, semi-supervised learning, time-series analysis, and deep learning (DL) methodologies, can be employed to perform anomaly detection. Anomaly detection is versatile, finding use in diverse domains such as medicine, finance, manufacturing, identity protection, data security, networking, video surveillance, and cyber security.

The emergence of next-generation networks has brought forth an array of sophisticated and complex systems that are often vast, decentralized, and dynamic. These networks, which might include 6G and beyond, Internet of Things (IoT) networks, or even intricate data center networks, have multifaceted traffic patterns, a multitude of interconnected devices, and varied user behaviors. Such complexity makes them susceptible to a wide range of anomalies, from performance hiccups to security breaches. Traditional anomaly detection methods might struggle with the sheer scale and intricacy of these networks. In contrast, machine learning (ML), with its capacity to learn from vast amounts of data, can potentially discern patterns and anomalies that would be imperceptible to conventional methods. Given the critical nature of networks, especially in sectors like healthcare or finance, it is paramount not just to detect anomalies but also to understand the "why" behind them. By making artificial intelligence's (AI) decisions interpretable, it becomes easier for network administrators or security experts to take informed actions, rectify issues, or even preemptively address potential vulnerabilities.

In light of the aforementioned context, the editor requested chapters that address various aspects of anomaly detection, such as novel strategies based on AI, machine learning, optimization, control, statistics, and social computing, among others. We received several chapter proposals, and each was reviewed thoroughly. Seven chapters, organized into three sections, were finally selected for this book; they are summarized as follows:

Chapter 1 by Venkata Krishna Parimala offers a foundational introduction to the book, establishing a solid framework for understanding anomaly detection from AI and ML perspectives. This chapter lays the groundwork by elucidating key concepts, methodologies, and the significance of anomaly detection in the realm of AI and ML, setting the stage for more detailed explorations in subsequent chapters.

The first section of the book deals with anomaly detection in time series data. Chapter 2 presents a review of anomaly detection in medical time series with generative adversarial networks. The author, Miloš Cekić, discusses anomaly detection in medical time series, such as diagnosing diseases like epilepsy or preventing fatal events like cardiac arrhythmias. Generative adversarial networks (GANs) have demonstrated potential in various areas, including cybersecurity and data augmentation, and have recently been applied to detect anomalies in medical time series. This chapter reviews the use of GANs in this context, addressing the nature of time-series anomalies, challenges in medical time series, and DL issues. The discussion includes popular GAN models and their application in detecting anomalies in ECG and EEG medical time series. Chapter 3 discusses anomaly detection with time series in IoT, exploring ML and DL approaches. According to the authors, Menachem Domb et al., with 85 billion devices anticipated by 2025, significant cyber security challenges arise; the chapter reviews these issues and suggests protocols and solutions for a safer IoT landscape. Chapter 4 explores the intricacies of anomaly detection in time series. The authors, Farrukh Arslan et al., delve into the numerous challenges anomaly detection confronts in contemporary applications.

The second section of the book discusses anomaly detection vs. intrusion detection. In Chapter 5, Surendra Bhosale et al. explore anomaly detection using the Adaptive Dolphin Atom Search Optimization (DASO) method. They utilize DASO combined with deep RNN techniques to address anomaly detection and intrusions. In Chapter 6, Siamak Parhizkari explains anomaly detection within intrusion detection systems. The chapter examines various facets of anomaly detection, including signature-based detection and both supervised and unsupervised learning methods. It further details the application of anomaly detection in intrusion detection systems.

The third section of the book presents anomaly detection models. In Chapter 7, Hironori Uchida et al. explain software log anomaly detection models. The chapter presents technological advancements in automated software log analysis. Despite DL's high accuracy in software log anomaly detection, its adoption in software development remains limited. Evaluations of five models, including the proposed Neocortical Algorithm, on the BGL dataset revealed overfitting tendencies and highlighted the need for diverse datasets.

This book will assist researchers in understanding the advancements occurring in the field of anomaly detection with AI and ML techniques and their applications.

I would like to thank Publishing Process Manager Ms. Karla Skuliber and other members of the editorial team at IntechOpen for their kind cooperation and help. I also extend my sincere thanks to the contributing authors and reviewers for their interest and support.

> **Venkata Krishna Parimala** Computer Science Department, Sri Padmavati Mahila University, Tirupati, India

Section 1

## Anomaly Detection with Time Series Data

#### **Chapter 1**

## Introductory Chapter: Anomaly Detection – Recent Advances, AI and ML Perspectives and Applications

*Venkata Krishna Parimala*

#### **1. Introduction**

The significance of anomaly detection transcends industries and impacts various facets of daily life and societal functioning. In the world of finance, it serves as a guardian of economic stability. Beyond fraud detection, it helps regulatory authorities monitor for signs of market manipulation or systemic risks that could lead to economic downturns. It is not just about protecting individual investors; it's about safeguarding the entire financial infrastructure on which modern economies rely.

In healthcare, the stakes are even more personal. Anomaly detection algorithms are being integrated into wearable devices, constantly monitoring physiological data to provide real-time health insights. This has the potential to revolutionize preventive medicine by catching symptoms before they manifest into more severe conditions, thereby facilitating early intervention and potentially saving lives.

In transportation, particularly in aviation and autonomous vehicles, anomaly detection is critical for ensuring safety. Algorithms continuously monitor system health and can alert human operators or initiate fail-safes if something goes awry. The ability to detect a malfunction before it leads to a catastrophic failure could mean the difference between a controlled emergency landing and a tragic accident.

The technology also has growing applications in environmental protection. Algorithms can analyze satellite imagery to identify illegal deforestation or poaching activities, enabling timely intervention. Similarly, in marine biology, anomaly detection helps researchers identify unusual patterns in sea temperature or marine life behavior, offering early indicators of environmental issues like ocean acidification.

Additionally, anomaly detection plays a critical role in the realm of data integrity and information verification. In the age of 'fake news,' these algorithms can sift through vast amounts of data to flag misinformation or anomalous reporting, thereby helping to maintain the integrity of public discourse.

Finally, the technology is making inroads into the field of disaster management. By analyzing data from seismic sensors, weather satellites, and historical records, anomaly detection can provide early warnings for natural disasters like earthquakes, tsunamis, or hurricanes, enabling timely evacuations and preparation, thereby minimizing loss of life and property.

The significance of anomaly detection is multi-dimensional, affecting both individual lives and the larger fabric of society. Its potential to drive proactive solutions, prevent crises, and even save lives makes it an indispensable tool in the modern data-driven world.

#### **2. The limitations of traditional methods**

Traditional methods of anomaly detection have provided a foundational framework for identifying outliers in data, but as data have grown more complex, these methods are showing their limitations more prominently. One of the most glaring issues is the assumption of a specific data distribution. Traditional techniques often assume that data follow a Gaussian or similar distribution, an assumption that is frequently violated in real-world applications. This not only affects the accuracy but also limits the type of anomalies that can be detected.
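
To make the distributional assumption concrete, the classical z-score rule can be sketched in a few lines (an illustrative NumPy sketch with synthetic data, not a method from this chapter); its threshold `k` is only meaningful insofar as the data are roughly Gaussian:

```python
import numpy as np

def zscore_anomalies(x, k=3.0):
    """Flag points more than k standard deviations from the mean.

    Implicitly assumes roughly Gaussian data; on skewed or heavy-tailed
    data the estimated mean/std are distorted and the threshold loses
    its intended meaning.
    """
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return np.abs(z) > k

# On well-behaved data the rule isolates the injected outlier.
rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, 500)
data = np.append(normal, 10.0)      # one obvious anomaly at the end
flags = zscore_anomalies(data)
print(bool(flags[-1]))              # the outlier is caught
```

On skewed or heavy-tailed data, the same rule either floods the analyst with false alarms or, because extreme values inflate the estimated standard deviation, masks the very anomalies it is meant to catch.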

Another substantial limitation is scalability. Traditional methods were not designed to handle the massive datasets generated in contemporary applications, such as social media analytics, sensor networks, and large-scale e-commerce. Processing large datasets often requires significant computational resources, making these methods inefficient and sometimes impractical for big data applications.

Sensitivity to parameter settings is another drawback. The effectiveness of traditional methods often hinges on the appropriate selection of parameters like thresholds or cluster sizes. Inconsistent or suboptimal parameter selection can result in missed anomalies or an excessive number of false alarms. This makes traditional methods highly dependent on domain expertise and often requires manual tuning, which is both time-consuming and susceptible to human error.

Traditional methods also struggle with high-dimensional data. In scenarios where multiple attributes or features are involved, the effectiveness of traditional methods diminishes. They often suffer from the "curse of dimensionality," a phenomenon where the data become increasingly sparse as the dimensionality increases, making it challenging to identify meaningful patterns.

The issue of temporal dynamics is another limitation. Traditional methods are often ill-suited for detecting anomalies in time-series data where temporal correlations are essential. They usually treat data points as independent entities, ignoring the temporal relationships that are often crucial for accurate anomaly detection in sequences.
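
A toy illustration of this limitation (my own sketch, not drawn from the text): a value that is unremarkable globally can be clearly anomalous in its temporal context, and only a context-aware detector sees it:

```python
import numpy as np

# A slow sine wave with a contextual anomaly: the value 0.0 is common
# globally, but not at the crest of the cycle (index 25).
t = np.arange(200)
x = np.sin(2 * np.pi * t / 100)
x[25] = 0.0   # near the crest, sin is close to 1, so 0.0 is wrong *in context*

# A global z-score treats points independently and sees nothing unusual:
# 0.0 is close to the series mean.
z = np.abs((x - x.mean()) / x.std())
print(z[25])               # tiny: the anomaly is invisible globally

# A residual against a centered moving average restores temporal context.
w = 9
baseline = np.convolve(x, np.ones(w) / w, mode="same")
resid = np.abs(x - baseline)
print(resid.argmax())      # → 25
```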

Lastly, interpretability and transparency, although considered a strength of traditional methods, can also be a limitation. The simplified models may offer easier interpretation but at the cost of capturing the complexities of the data. This trade-off often leads to models that are overly simplistic, failing to capture the nuanced behaviors that more advanced models can identify.

#### **3. The role of AI and ML in anomaly detection**

The infusion of artificial intelligence (AI) and machine learning (ML) technologies into anomaly detection is revolutionizing the field, offering a robust set of tools and methodologies that far exceed the capabilities of traditional techniques. These advanced algorithms are designed to tackle multi-dimensional and large-scale data, making them well-suited for modern applications that often involve big data and streaming analytics.

#### *Introductory Chapter: Anomaly Detection – Recent Advances, AI and ML Perspectives… DOI: http://dx.doi.org/10.5772/intechopen.113968*

Machine learning models like Random Forests and Support Vector Machines have been particularly effective in feature selection and reducing dimensionality, which are common challenges in high-dimensional data spaces. Deep learning techniques, such as Long Short-Term Memory (LSTM) networks, have shown exceptional performance in time-series anomaly detection, a critical aspect in sectors like finance and industrial automation. More recently, Generative Adversarial Networks (GANs) have been adapted for anomaly detection, proving effective in learning complex data distributions without the need for explicit labeling.
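
The reconstruction idea that underlies several of these approaches can be sketched compactly; the following uses a plain-NumPy PCA (a deliberately simple stand-in, not any of the specific models named above) to score points in a high-dimensional space by how poorly a low-dimensional summary reconstructs them:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic "normal" data living near a 2-D subspace of a 20-D space.
latent = rng.normal(size=(300, 2))
mix = rng.normal(size=(2, 20))
X = latent @ mix + 0.05 * rng.normal(size=(300, 20))

# Fit a 2-component PCA on the (assumed-normal) training data.
mu = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
components = Vt[:2]                 # principal directions

def score(x):
    """Reconstruction error after projecting onto the learned subspace."""
    c = (x - mu) @ components.T
    recon = c @ components + mu
    return np.linalg.norm(x - recon)

normal_scores = np.array([score(x) for x in X])
outlier = rng.normal(size=20) * 3.0   # does not respect the subspace
print(score(outlier) > normal_scores.max())
```

Deep models generalize the same recipe by replacing the linear projection with a learned nonlinear encoder and decoder.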

One of the most compelling advancements is the introduction of semi-supervised and unsupervised learning techniques. These models do not require a fully labeled dataset for training, a feature that is particularly advantageous in scenarios where labeling is costly or impractical. This opens up new avenues for anomaly detection in fields like cybersecurity, where attacks are continually evolving, and manual labeling quickly becomes obsolete.

Furthermore, the AI and ML models are increasingly becoming capable of real-time learning, a critical requirement in dynamic environments. For example, reinforcement learning algorithms can interact with their environment in real-time, adapting their anomaly detection strategies as they gain more information. This is invaluable in applications such as autonomous driving and real-time network security, where the cost of failing to detect an anomaly could be catastrophic.

In addition to performance benefits, AI and ML are also contributing to the explainability and interpretability of anomaly detection models. With the advent of techniques like Local Interpretable Model-agnostic Explanations (LIME) and SHAP (SHapley Additive exPlanations), these complex models are becoming less of a 'black box,' thereby gaining greater acceptance in fields that require rigorous validation, such as healthcare and aviation.
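
As a rough intuition for what such explanations provide, here is a crude perturbation probe in the same model-agnostic spirit (explicitly not an implementation of LIME or SHAP, whose dedicated libraries should be used in practice):

```python
import numpy as np

def occlusion_attribution(score_fn, x, baseline=0.0):
    """Rank features by how much replacing each one with a baseline value
    changes the anomaly score: a crude, model-agnostic probe in the
    spirit of LIME/SHAP, not an implementation of either."""
    ref = score_fn(x)
    attr = np.zeros_like(x, dtype=float)
    for i in range(len(x)):
        x_pert = x.copy()
        x_pert[i] = baseline
        attr[i] = ref - score_fn(x_pert)   # drop in score when feature i is removed
    return attr

# Toy anomaly score: distance from the origin, so feature 2 (value 5.0)
# should dominate the explanation.
score_fn = lambda v: float(np.linalg.norm(v))
x = np.array([0.1, -0.2, 5.0, 0.05])
attr = occlusion_attribution(score_fn, x)
print(attr.argmax())   # → 2: feature 2 carries the score
```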

Anomaly detection is a growing field with applications across various domains such as healthcare, building management, cybersecurity, weather forecasting, and surveillance. With the advent of AI and ML, sophisticated techniques are being developed to tackle complex anomaly detection tasks. However, each domain has its own set of challenges and requirements that influence the choice of techniques and their effectiveness.

In healthcare, Cekić [1] sheds light on the importance of anomaly detection in medical time series data, such as electrocardiography (ECG) and electroencephalography (EEG), highlighting the use of Generative Adversarial Networks (GANs) for this purpose. While GANs have shown promise, they also present challenges related to medical data, such as limited labeled samples and the complex nature of anomalies. In a similar vein, Esmaeili et al. [2] investigate the use of GANs for anomaly detection in biomedical imaging. Their study, conducted on seven different medical imaging datasets, shows highly variable performance (AUC: 0.475-0.991; sensitivity: 0.17-0.98; specificity: 0.14-0.97), indicating the method's limitations and the need for further research.

In the context of building management, Copiaco et al. [3] take a unique approach by using two-dimensional (2D) image representations of energy time-series data for deep anomaly detection. Their method achieved impressive F1-scores of 93.63% and 99.89% on simulated and real-world datasets, respectively. Himeur et al. [4] expand on this by surveying AI and big data analytics in building automation and management systems (BAMSs). They identify current limitations, including such systems' primary focus on heating, ventilation, and air conditioning (HVAC) controls, and suggest AI as a promising solution.

Cybersecurity is another critical application area. Javaheri et al. [5] focus on Distributed Denial of Service (DDoS) attacks, providing a comprehensive survey that proposes effective defensive strategies. They emphasize the use of fuzzy logic-based methods as a promising avenue for future research. Zehra et al. [6] discuss the security challenges in Network Function Virtualization (NFV), advocating for machine learning-based anomaly detection techniques to enhance network security.

In other specialized applications, Jin et al. [7] provide a comprehensive review of Graph Neural Networks (GNNs) for time series analysis, which includes forecasting, classification, and anomaly detection. Their work serves as a guide to understand the strengths and limitations of using GNNs for time-series data. Patriarca et al. [8] delve into the importance of weather forecasting for aerodrome operations and propose a machine learning-based approach for anomaly detection in historical weather data. Finally, Şengönül et al. [9] explore the use of AI in surveillance video anomaly detection, noting the increasing need for automated systems due to the sheer volume of video data being generated.

In summary, while AI and machine learning offer promising solutions for anomaly detection across domains, the effectiveness of these techniques varies significantly. The limitations often arise from domain-specific challenges such as data sparsity, complexity of the anomalies, and computational constraints. Therefore, tailored approaches and continuous research are essential for advancing the field.

#### **Author details**

Venkata Krishna Parimala Computer Science Department, Sri Padmavati Mahila University, Tirupati, India

\*Address all correspondence to: pvk@spmvv.ac.in

© 2023 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


#### **References**

[1] Cekić M. Anomaly Detection in Medical Time Series with Generative Adversarial Networks: A Selective Review. London: IntechOpen; 2023. DOI: 10.5772/intechopen.112582

[2] Esmaeili M et al. Generative adversarial networks for anomaly detection in biomedical imaging: A study on seven medical image datasets. IEEE Access. 2023;**11**:17906-17921. DOI: 10.1109/ACCESS.2023.3244741

[3] Copiaco A, Himeur Y, Amira A, Mansoor W, Fadli F, Atalla S, et al. An innovative deep anomaly detection of building energy consumption using energy time-series images. Engineering Applications of Artificial Intelligence. 2023;**119**:105775. DOI: 10.1016/j.engappai.2022.105775

[4] Himeur Y, Elnour M, Fadli F, et al. AI-big data analytics for building automation and management systems: A survey, actual challenges and future perspectives. Artificial Intelligence Review. 2023;**56**:4929-5021. DOI: 10.1007/s10462-022-10286-2

[5] Javaheri D, Gorgin S, Lee J-A, Masdari M. Fuzzy logic-based DDoS attacks and network traffic anomaly detection methods: Classification, overview, and future perspectives. Information Sciences. 2023;**626**:315-338. DOI: 10.1016/j.ins.2023.01.067

[6] Zehra S, Faseeha U, Syed HJ, Samad F, Ibrahim AO, Abulfaraj AW, et al. Machine learning-based anomaly detection in NFV: A comprehensive survey. Sensors. 2023;**23**:5340. DOI: 10.3390/s23115340

[7] Jin M, Koh HY, Wen Q, Zambon D, Alippi C, Webb GI, et al. A survey on graph neural networks for time series: Forecasting, classification, imputation, and anomaly detection. 2023. arXiv:2307.03759 [cs.LG]. DOI: 10.48550/arXiv.2307.03759

[8] Patriarca R, Simone F, Di Gravio G. Supporting weather forecasting performance management at aerodromes through anomaly detection and hierarchical clustering. Expert Systems with Applications. 2023;**213**(Part C):119210. DOI: 10.1016/j.eswa.2022.119210

[9] Şengönül E, Samet R, Abu Al-Haija Q, Alqahtani A, Alturki B, Alsulami AA. An analysis of artificial intelligence techniques in surveillance video anomaly detection: A comprehensive survey. Applied Sciences. 2023;**13**(8):4956. DOI: 10.3390/app13084956

#### **Chapter 2**

## Anomaly Detection in Medical Time Series with Generative Adversarial Networks: A Selective Review

*Miloš Cekić*

#### **Abstract**

Anomaly detection in medical data is often of critical importance, from diagnosing and potentially localizing disease processes such as epilepsy to detecting and preventing fatal events such as cardiac arrhythmias. Generative adversarial networks (GANs) have shown promise in various applications since their inception, proving effective in cybersecurity, data denoising, and data augmentation, and have more recently found a potentially important place in the detection of anomalies in medical time series. This chapter provides a selective review of this novel use of GANs, in the process highlighting the nature of anomalies in time series, special challenges related to medical time series, and some general issues in approaching time series anomaly detection with deep learning. We cover the most frequently applied GAN models and briefly detail the current landscape of applying GANs to anomaly detection in two commonly used medical time series, electrocardiography (ECG) and electroencephalography (EEG).

**Keywords:** anomaly detection, medical time series, generative adversarial network (GAN), electrocardiogram (ECG), electroencephalogram (EEG)

#### **1. Introduction**

The increasingly widespread deployment of advanced technology in modern healthcare has led to exponential growth in the generation and collection of medical time series data, which offer unprecedented opportunities for the detection and diagnosis of disease processes. Being able to use these "big" data to detect and localize anomalies can lead to early and precise disease recognition, timely and proactive intervention, development of personalized treatment plans, improved patient outcomes, and better risk management while at the same time offering opportunities for better understanding and classification of disease pathology [1–5].

The extraction of meaningful insights from medical time series data, however, poses a number of significant challenges. For one, the data are highly complex and often multidimensional/multimodal and nonstationary. They also usually suffer from noise, missing values, and artifacts as well as from the inherent variability of human physiology and broad range of normality across individuals, which makes the identification of "abnormal" a nontrivial task [6]. Anomalies within the data can be highly heterogeneous and can manifest as subtle deviations or sudden, drastic changes. Additionally, specific disease data that could be used to train analytical models are at best highly heterogeneous and more likely simply not available; even when data are available, properly labeled data are generally lacking. Finally, at the present time, there is no expert knowledge related to large medical datasets that can parallel classical medical knowledge—it is sometimes not even clear what exactly to look for since traditional medical semiology operates in a different conceptual space (defined signs/symptoms vs. patterns in massive data) [1–3, 6].

Traditionally, statistical and rule-based methods have been employed for anomaly detection in medical time series data [4, 7, 8]. These methods rely on predefined thresholds, statistical models, or expert knowledge to identify deviations from normal. These approaches, however, often struggle to capture complex and nonlinear patterns that may be present in the data. This has recently led to a growing interest in leveraging machine learning and deep learning techniques for anomaly detection in medical imaging [1–3, 6, 7, 9–11]. A widely used type of model is the generative adversarial network (GAN), which has demonstrated superiority in a variety of tasks in medical imaging due to its powerful ability to learn the distribution of the training data and to generate novel but realistic samples that reflect the underlying data characteristics [12–26]. A GAN trained on normal instances only, for example, can capture the complex patterns and dependencies inherent in the data, enabling the generation of synthetic samples that closely adhere to the learned distribution. Anomalies, being significantly different from normal, can then be identified by how far they deviate from their reconstruction [27–32]. The application of GANs in anomaly detection for medical time series has demonstrated promising results, and GAN-based approaches have been shown to be able to effectively capture temporal dependencies, handle complex patterns, and adapt to individual patient variations. Moreover, they can detect subtle anomalies that may go unnoticed by traditional methods [29, 31, 32].
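
The reconstruction-based scoring idea can be sketched as follows. Combining a reconstruction residual with a discriminator feature-matching term follows the pattern popularized by AnoGAN-style methods, but `reconstruct` and `disc_features` below are toy stand-ins (assumptions for illustration) for a trained generator inversion and a trained discriminator:

```python
import numpy as np

def anomaly_score(x, reconstruct, disc_features, lam=0.1):
    """Residual between x and its best 'GAN' reconstruction, plus a
    discriminator feature-matching term; both callables are toy
    stand-ins for trained networks."""
    x_hat = reconstruct(x)
    residual = np.abs(x - x_hat).sum()
    feat = np.abs(disc_features(x) - disc_features(x_hat)).sum()
    return (1 - lam) * residual + lam * feat

# Toy stand-ins: "normal" signals are unit sines, so the best the
# generator can do for any input is the closest normal sine.
t = np.linspace(0, 2 * np.pi, 100)
reconstruct = lambda x: np.sin(t)                        # toy inversion
disc_features = lambda x: np.array([x.mean(), x.std()])  # toy features

normal = np.sin(t)
ectopic = np.sin(t)
ectopic[40:45] += 2.0    # a short burst, loosely a "premature beat"
print(anomaly_score(normal, reconstruct, disc_features) <
      anomaly_score(ectopic, reconstruct, disc_features))
```

A normal signal reconstructs perfectly and scores near zero, while the distorted signal deviates from anything the "generator" can produce and scores high.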

This chapter endeavors to provide a brief review of the current landscape of the use of GANs in anomaly detection in medical time series data. We first present a brief overview of properties of anomalies, time series data, and specific challenges related to medical time series. We then discuss general ways of approaching time series with deep learning methods before discussing GANs and GAN applications to anomaly detection in general and to time series in particular. We then review the current state of the use of GANs in medical imaging and anomaly detection in specific fields of electrocardiography (ECG) and electroencephalography (EEG). Finally, we briefly discuss some challenges and future directions.

#### **2. Anomalies in time series: problem complexities and challenges**

#### **2.1 Properties of time series data**

While time is a fundamental concept in nearly all data, time series explicitly involve the temporal dimension. The following is a brief summary of the specific properties of time series and how they affect anomaly detection.

*Anomaly Detection in Medical Time Series with Generative Adversarial Networks… DOI: http://dx.doi.org/10.5772/intechopen.112582*

#### *2.1.1 Temporality*

A time series is an ordered sequence of data points indexed by time (usually but not always across equal temporal intervals) [33–36]. We can define a time series as a vector $X = \{x_1, x_2, \ldots, x_t\}$, where $x_i$ represents the datum at time $i \in T$ and $T = \{1, 2, \ldots, t\}$. The (necessary) assumption of continuity of the underlying generative process implies that each point is in some way conditioned on previous values (past states of the process), with this dependency captured as a joint distribution of a set of observations: $p(x_1, x_2, \ldots, x_t) = p(x_1) \prod_{i=2}^{t} p(x_i \mid x_1, x_2, \ldots, x_{i-1})$. The influence of the past is generally assumed to decrease with time, though this may not necessarily be the case [33].

#### *2.1.2 Dimensionality*

Time series data may be univariate or multivariate, with the dimension representing the number of individual data attributes captured at each time point. The above specification of a time series vector is univariate. Multivariate series can be defined as a time-ordered set of multidimensional vectors $X_t$ (rather than points), with $X_t = (x_t^1, x_t^2, \ldots, x_t^d)$, where $d$ is the number of dimensions; the multivariate time series is then a tensor $X_i^j$, where $j \in D = \{1, 2, \ldots, d\}$ indexes the dimensions and $i \in T = \{1, 2, \ldots, t\}$ denotes time [9, 33]. Alternatively, multivariate series can also be conceptualized as a collection of univariate time series. While analysis of univariate time series needs to consider only the relationship between the current state and previous states (temporal dependency), multivariate series entail dependencies and correlations (semantics) both across previous states (temporal) within a series and across other dimensions (spatial) at any given time point, keeping in mind that any given datum may also depend on a mixture within and across different time series (spatiotemporal dependencies). These dependencies may be multiscale (short-, medium-, or long-range) and in some cases nonstationary or dynamic, meaning that the scale and structure of the dependencies themselves may vary in time [37–40].

#### *2.1.3 Nonstationarity*

A time series is assumed to be stationary if its statistical properties do not change over time. Most real-world time series are not stationary, however, meaning the mean and variance (and other moments of the distribution) vary. Common sources of nonstationarity include trends (baseline drift that may be local or global and linear or nonlinear), seasonal cycles (with a stable period), nonseasonal cycles (with a variable period), pulses and steps (including concept drift and change points, instances where the relationship between input and output changes), and random/irregular movement. Because nonstationarity implies that the data distribution itself changes in some way, an appropriate model will need to somehow capture the underlying generative process rather than the statistics of the apparent data [33–36].
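
One simple, commonly used response (sketched here with synthetic data; the trailing-window z-score is an illustrative choice, not a recommendation from the text) is to compute statistics over a moving window so that the detection threshold follows the drifting baseline rather than the global statistics:

```python
import numpy as np

# A series with a linear trend: a fixed global threshold drifts out of
# calibration, but a rolling (local) z-score tracks the moving baseline.
rng = np.random.default_rng(2)
t = np.arange(400)
x = 0.05 * t + rng.normal(0.0, 1.0, 400)   # trend + noise
x[300] += 8.0                              # injected anomaly

def rolling_zscore(x, w=50):
    """z-score of each point against a trailing window that excludes it."""
    z = np.zeros_like(x)
    for i in range(w, len(x)):
        window = x[i - w:i]
        z[i] = (x[i] - window.mean()) / window.std()
    return z

z = rolling_zscore(x)
print(np.abs(z).argmax())   # the injected anomaly stands out locally
```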

#### *2.1.4 Noise*

Real-world datasets typically contain a significant amount of noise or unwanted signal, which represents the semantic boundary between normal data and true anomalies. The classical definition of an anomaly or outlier is "an observation which deviates so much from other observations as to arouse suspicions that it was generated by a different mechanism" [41]. In this context, it is helpful to differentiate between outlier and anomaly: *outlier* refers to "unusual data objects that are statistically rare but may or may not result from the same generative process as other data objects," while, in contrast, *anomalies* are defined as "data objects that have resulted from a distinct generative process that may or may not appear in the data as outliers" [42]. This distinction is necessarily contextual and application-specific, although typically anomalies will have a much higher outlier score than noise [43] since they are presumably generated by a different underlying process.

#### **2.2 Properties of anomalies**

The detection of anomalies is a specific problem in pattern recognition that is distinct from other analytical and learning tasks. Key complexities are discussed below. These issues apply to any anomaly detection and not just time series, but with time series and medical time series in particular additional specific challenges arise.

#### *2.2.1 Unknown nature of anomalies*

Anomalies are by definition unknown and may involve abrupt behaviors, patterns, or distributions that remain unseen until they occur. Even if a particular type of anomaly is known and categorized (there are two distinct issues here, since even if an anomaly type exists, it may not be immediately classifiable due to the heterogeneity of its manifestations), recognizing it may still be difficult. With machine learning, there is the added complexity that, even if various diseases are categorized in terms of their specific symptoms, the disease process may not be clearly defined within the particular modality performed for its detection [10, 11, 33, 37–39].

#### *2.2.2 Anomaly class heterogeneity*

Since anomalies are irregular and heterogeneous, one class of anomalies may have very different abnormal characteristics when compared to another class of anomalies; in other words, not only are anomaly classes themselves heterogeneous, the heterogeneity *within* anomaly classes or types is itself heterogeneous [11, 36, 39].

#### *2.2.3 Dataset/class imbalance*

Anomalies are typically rare events, occurring much less often than normal instances, which account for the overwhelming majority of the data. It is therefore extremely difficult and labor-intensive (if not impossible) to collect a sufficient amount of labeled and/or clearly defined abnormal instances that could be used for anomaly definition and model training. The result is severe class imbalance in any potential training set [29, 32, 40].

#### *2.2.4 Types of anomalies*

Anomalies can generally be classified into three types: point anomalies, contextual anomalies, and collective anomalies, which in time series correspond to abnormal time points, abnormal time intervals/subsequences, and abnormal time series, respectively [10, 29, 33, 37, 39].

*Anomaly Detection in Medical Time Series with Generative Adversarial Networks… DOI: http://dx.doi.org/10.5772/intechopen.112582*

#### *2.2.4.1 Point anomalies*

A point anomaly is a single datum at which the value of the observed variable differs significantly either from the entire time series (global) or from neighboring points in the series (local or contextual). Point anomalies may be univariate or multivariate and usually entail extreme values. An example is an abnormal blood pressure reading, which would then need to be classified either as noise or as a potential indicator of a deeper problem.
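As an illustration, global point anomalies in a univariate series can be flagged with a simple z-score rule. The sketch below uses invented blood pressure values and an arbitrary threshold; it is a minimal example of the idea, not a clinically validated detector:

```python
from statistics import mean, pstdev

def point_anomalies(series, threshold=2.5):
    """Flag indices whose z-score exceeds `threshold` (a global criterion)."""
    mu, sigma = mean(series), pstdev(series)
    return [i for i, x in enumerate(series) if abs(x - mu) / sigma > threshold]

# Hypothetical systolic blood pressure readings (mmHg); index 5 is extreme.
bp = [118, 121, 119, 122, 120, 210, 117, 123, 119, 121]
print(point_anomalies(bp))  # → [5]
```

Whether the flagged reading is noise or clinically meaningful cannot, of course, be decided by the score alone.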

#### *2.2.4.2 Time interval/subsequence anomalies*

This type involves a subsequence of points that does not reflect the normal behavior of the system and in which each individual observation may be within normal range but the subsequence as a whole is anomalous. The subsequence may affect a single (univariate) or multiple (multivariate) time-dependent variables. An example would be an epileptic seizure on electroencephalogram (EEG), which may not be out of normal range in any individual point, but in which the multivariate pattern across multiple electrodes over a finite time period reflects the abnormality.
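To illustrate how a subsequence can be anomalous while every individual sample stays in range, the toy sketch below flags windows whose variability collapses, as with a flat-lined sensor; the signal values, window width, and `min_std` threshold are all invented for illustration:

```python
from statistics import pstdev

def flatline_windows(series, width=5, min_std=0.5):
    """Flag window start indices whose variability collapses, even though
    every individual sample stays inside the normal range."""
    flags = []
    for start in range(len(series) - width + 1):
        if pstdev(series[start:start + width]) < min_std:
            flags.append(start)
    return flags

# Heart-rate-like signal: normal beat-to-beat variation, then a flat
# in-range segment (e.g., a stuck sensor) starting at index 5.
hr = [72, 75, 70, 74, 71, 73, 73, 73, 73, 73, 73, 76, 71, 74, 70]
print(flatline_windows(hr))  # → [5, 6]
```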

#### *2.2.4.3 Collective or time series anomalies*

This class includes cases where the entirety of one (or several) time series of a multivariate dataset is anomalous with respect to the dataset as a whole. This type is distinct from subsequence anomalies in that the anomaly extends over the full length of the series. The idea here is that what is represented is not a temporary anomaly in the functioning of a part of a multidimensional system that returns to normal at some point, but rather a persistent underlying abnormality in some portion of the system that can only be detected in the context of the entire dataset. An example would be an abnormal signal related to a voxel (volume element) or group of voxels in a resting-state functional magnetic resonance imaging (rs-fMRI) study, representing a brain region that might harbor an underlying disease process such as an epileptogenic focus or tumor. The only way to detect this is by examining the relationships of the various time series to each other, both within individual patients and across individuals.

#### **2.3 Challenges specific to medical time series**

Anomaly detection in medical time series data comes with unique challenges, stemming from the nature of the data, the inherent variability in human physiology, and the requirement for the results to be interpretable by healthcare professionals. The following are the major domain-specific challenges.

#### *2.3.1 Noise and artifacts*

Medical time series data can contain a significant amount of noise due to sensor inaccuracies, patient movement, measurement errors, and physiological artifacts [1, 2]. For example, an ECG signal may contain noise from muscle contractions, or a glucose monitor may have inaccuracies due to calibration errors. These artifacts can distort the underlying physiological signal and lead to false detections of anomalies.

#### *2.3.2 Missing and irregularly sampled data*

Medical time series data often suffer from missing values and irregular sampling intervals. For example, a patient might remove a wearable device for a period, leading to missing data, or a sensor might malfunction. Irregularly sampled data can arise in outpatient settings where measurements are taken at each visit, but the visits occur at irregular intervals. These irregularities pose challenges for conventional time series analysis methods, which typically assume regular sampling, and require specialized techniques to handle missing values, synchronize timestamps, and ensure consistent analysis across different time series [1, 2, 6].
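One common mitigation is to interpolate the irregular samples onto a regular grid before analysis. The sketch below assumes sorted timestamps and purely linear dynamics between visits (a strong simplification); the visit times and glucose values are invented:

```python
def resample_linear(times, values, step):
    """Linearly interpolate irregularly sampled (time, value) pairs onto a
    regular grid with spacing `step`. Assumes `times` is sorted."""
    grid, out, j = [], [], 0
    t = times[0]
    while t <= times[-1]:
        while times[j + 1] < t:            # advance to the bracketing interval
            j += 1
        t0, t1 = times[j], times[j + 1]
        frac = (t - t0) / (t1 - t0)
        grid.append(t)
        out.append(values[j] + frac * (values[j + 1] - values[j]))
        t += step
    return grid, out

# Outpatient glucose measurements (mmol/L) at irregular visit times (days).
times = [0, 3, 4, 9, 10]
vals = [5.0, 5.6, 5.4, 6.2, 6.0]
grid, series = resample_linear(times, vals, step=1)
print(len(series))  # one interpolated value per day
```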

#### *2.3.3 Nonstationarity*

Medical time series data often exhibit nonstationarity, meaning that their statistical properties change over time. This could be due to a patient's changing health status, the effect of medications or interventions, or changes in external conditions such as time of day or physical activity. Traditional time series analysis techniques often assume stationarity, so nonstationarity poses a significant challenge [35, 43].
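A standard workaround is to score each point against a recent window rather than against global statistics. In the toy sketch below (a synthetic linear drift with one genuine spike, arbitrary window and threshold), the spike is invisible to the global z-score but obvious to the rolling one:

```python
from statistics import mean, pstdev

def zscore_flags(series, threshold=3.0):
    """Indices flagged by a single global mean/std (assumes stationarity)."""
    mu, sigma = mean(series), pstdev(series)
    return {i for i, x in enumerate(series) if abs(x - mu) / sigma > threshold}

def rolling_flags(series, width=50, threshold=3.0):
    """Indices flagged against the statistics of a trailing window."""
    flags = set()
    for i in range(width, len(series)):
        window = series[i - width:i + 1]
        mu, sigma = mean(window), pstdev(window)
        if sigma > 0 and abs(series[i] - mu) / sigma > threshold:
            flags.add(i)
    return flags

# Slowly drifting baseline (e.g., sensor drift) with one genuine spike.
x = [0.02 * t for t in range(1000)]
x[500] += 3.0
print(500 in zscore_flags(x), 500 in rolling_flags(x))  # global misses it; rolling catches it
```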

#### *2.3.4 High dimensionality*

Medical time series data can involve a high number of variables or channels, possibly with different types of data that are collected differently (e.g., blood pressure, heart rate, electrocardiogram, oxygen (O2) saturation, and respiration may all be monitored simultaneously in the intensive care unit (ICU)), or may consist of massively multidimensional imaging data (e.g., for the open source dataset in the Human Connectome Project, each raw rs-fMRI time point contains 673,920 voxels or dimensions, which over the span of an approximately 15-minute scan generates roughly 8 × 10⁸ data points per run per subject). This high dimensionality presents challenges in data storage, computation, visualization, and analysis. It also increases the required complexity of anomaly detection algorithms, as they need to be able to handle and interpret data across multiple dimensions [1, 4, 5, 38].

#### *2.3.5 Intra- and interpatient variability*

There is a high degree of variability in physiological parameters both within the same individual over time (intrapatient variability) and between different individuals (interpatient variability). This variability complicates the task of defining what constitutes an "anomaly." An anomalous value for one individual might be normal for another, and a reading that is normal for a person at rest might be abnormal during exercise. Anomaly detection methods need to account for both sources of variability, for example by developing patient-specific or subgroup-specific models that accurately capture normal and abnormal patterns [1].


#### *2.3.6 Lack of labeled data*

Supervised learning methods, which can be very effective for anomaly detection, require labeled data for training. Obtaining such labeled data in healthcare settings is challenging, however, as it typically requires expert clinicians to manually review and label the data, which is time-consuming and expensive [2]. This essentially forces most anomaly detection (AD) models to be semi-supervised or unsupervised (see below).

#### *2.3.7 Privacy and security*

Medical data are highly sensitive, and strong safeguards are required to protect patient privacy. This can make it challenging to gather sufficient data for building robust anomaly detection models, and it also requires careful consideration when deploying these models to ensure that patient data are handled securely [1, 27, 28].

#### *2.3.8 Interpretability and explainability*

Healthcare professionals need to understand the detected anomalies and trust the reasoning behind them to make informed decisions. Anomaly detection methods should therefore provide interpretable results, visualizations, or explanations that can lead to improved or optimized treatment decisions. Many advanced machine learning models, however, and particularly deep learning models, often act as "black boxes," making their predictions difficult to interpret. This lack of interpretability can hinder the adoption of these models in clinical practice [1, 28].

#### **2.4 Summary**

Anomaly detection in multidimensional time series data is a significant challenge both from conceptual and technical points of view. Anomalies are by nature rare and heterogeneous, and therefore not easily amenable to classification, which means that they must generally be classified not as what they are but rather as what they are not. This makes not just detection but even simply definition of anomalies challenging: what is normal? How "different" must something be from some generalization (e.g., mean or average) of normal to constitute an anomaly? How do we define this difference? If the normal data change over time, how do we learn and describe this aspect of normality? What is the normal evolution of the behavior of our system? For multivariate time series, the issues multiply due to dependencies in the data that may extend over time and across multiple variables and may involve different spatiotemporal scales simultaneously. Determining normal behavior then is quite challenging, and the detection of potentially multiscale spatiotemporal anomalies even more so. As an extended example, consider rs-fMRI once again: say a brain region comprising a set of voxels is abnormal and therefore generates an anomalous signal. Detecting this anomaly requires that we are somehow able to learn all the multivariate dependencies at multiple scales across multiple subjects (the normal population): spatially between not just voxels but groups of voxels comprising brain regions, which may vary in size; temporally in that the length of the anomalous series may vary from short unusual behaviors to longer term abnormalities; and spatiotemporally in that different length signals from regions of different sizes may affect other regions at different spatial scales. Since anomalies are heterogeneous and we cannot know *a priori* exactly what
we are looking for, detecting an "anomalous brain region" in the highly multidimensional rs-fMRI signal requires significant model complexity and computational power.

Finally, there are additional practical concerns: generalized unavailability of labeled training data, unreliable or noisy data, heterogeneously organized or collected data, data privacy (extremely important in medical datasets), interpretability, throughput and automatization, and scalability from specific research purposes to clinical applicability.

#### **3. Anomaly detection in time series: concepts and models**

#### **3.1 Basic paradigms**

Traditional anomaly detection methods in healthcare include statistical approaches, rule-based methods, and machine learning techniques. Statistical methods often rely on distribution-based models, such as Gaussian or Hidden Markov Models, to capture the normal patterns and detect deviations from them. More sophisticated classical models include autoregressive integrated moving average (ARIMA) models, exponential smoothing state space model (ETS), and Seasonal-Trend decomposition using LOESS (STL). These approaches are useful for detecting point or global anomalies, but are not helpful in the detection of contextual or collective anomalies [42–45] and are not useful for multivariate or multidimensional data [37–39]. Rule-based methods utilize expert-defined rules or thresholds to identify anomalies based on predefined criteria. Machine learning models, especially unsupervised machine learning algorithms, like clustering (K-means, density-based spatial clustering of applications with noise (DBSCAN)) or K-nearest neighbors (K-NN), are often used to detect outliers in time series data. Supervised learning methods, like support vector machine (SVM) or Random Forests, can also be used when labeled data are available. Deep learning methods, however, are best for capturing complex nonlinear relationships and dependencies in time series data [33, 35, 44–47]. Long Short-Term Memory (LSTM) networks and autoencoders (AEs) have been common choices due to their ability to model temporal dependencies, with GANs being utilized more recently for anomaly detection in time series [29, 33, 39].

Generally speaking, machine (and deep) learning methods can be broadly categorized along two axes: learning scheme vs. anomaly determination. Note that additional axes are possible, e.g., univariate vs. multivariate, or anomaly type (point, subsequence, or time series); however, these will not be specifically considered here, since most medical time series are multivariate and most deep learning approaches attempt to find anomalies in an anomaly-type-agnostic manner [33].

#### *3.1.1 Learning scheme*

The three major learning schemes in machine learning are supervised, unsupervised, and semi-supervised. Supervised models aim to learn a mapping from data to their corresponding annotations and then use this mapping to perform classification on test data. Important examples of a supervised approach in the medical field are applications to automated brain tumor and ischemic stroke recognition and segmentation (using the extensively labeled Multimodal Brain Tumor Image
Segmentation, BraTS [48], and Ischemic Stroke Lesion Segmentation, ISLES, datasets [49]). Although these networks can successfully learn to recognize visual anomalies in medical images [2, 4], they require accurate labels and annotations and can only learn to recognize specific predefined abnormalities that are already present in the training set. Given the difficulty and time-consuming nature of labeling data manually, this approach is usually only appropriate for very specific use-cases.

Most approaches to anomaly detection in time series therefore take the form of unsupervised or semi-supervised learning [32, 38, 45]. In unsupervised anomaly detection, the algorithms separate anomalies without prior knowledge or any explicit distinction between normal and abnormal, whereas in semi-supervised approaches all training data are assumed to be normal. Unsupervised methods are the most flexible, since they depend entirely on the internal features of the data; however, this type of approach generates its own set of problems, including potentially nonconvergent training, unclear recognition of anomalies, and difficulty of interpretation. For most medical data, including time series, a semi-supervised approach is generally considered most appropriate [6, 10, 11, 45–47], given that the overwhelming majority of the data collected are normal and that the anomalies themselves are highly heterogeneous.

#### *3.1.2 Anomaly determination*

Most anomaly detection involves an "anomaly score"—a number that is calculated based on the analytical technique that can then be compared to what is expected from a normal dataset. Once the data are learned (the model or distribution is assumed and fitted), a measure of the difference of each particular datum (whatever form this takes, point, subsequence, etc.) from the learned distribution must be determined, and this is the anomaly score. The following approach is nearly universal in machine (and by extension deep) learning for anomaly detection in time series: pick or create a model/architecture that we think will appropriately model the dataset, train the model on normal data in order to learn the data distribution (presumably with all dependencies), and then test new data with reference to the learned distribution using an anomaly score, which then determines if the datum in question was generated by the same underlying process the model was trained on or something different [11, 29].

Three basic approaches to determining anomaly scores are used, regardless of the time-modeling approach taken (see below): forecasting/prediction, reconstruction, and distance/dissimilarity [45].
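The near-universal train-on-normal, score-new-data paradigm described above can be sketched in a few lines. Here the "model" is just a Gaussian fit with a 3-sigma threshold, standing in for whatever architecture is actually used; the class name `GaussianAnomalyScorer` and the temperature readings are invented for illustration:

```python
from statistics import mean, pstdev

class GaussianAnomalyScorer:
    """Minimal instance of the paradigm: fit a model of normality on normal
    data, then score each new datum by its deviation from that model."""

    def fit(self, normal_data):
        self.mu = mean(normal_data)
        self.sigma = pstdev(normal_data)
        return self

    def score(self, x):                    # anomaly score: deviation in sigmas
        return abs(x - self.mu) / self.sigma

    def is_anomaly(self, x, threshold=3.0):
        return self.score(x) > threshold

# Train on normal body temperatures (°C), then score new readings.
scorer = GaussianAnomalyScorer().fit([36.5, 36.7, 36.6, 36.8, 36.4, 36.6])
print(scorer.is_anomaly(36.7), scorer.is_anomaly(39.5))  # → False True
```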

#### *3.1.2.1 Prediction*

Prediction or forecasting-based models learn to predict expected future values based on the learned data, with anomalies determined based on the residual between the predicted value and the observed quantity. Most forecasting models use a sliding window to forecast one point at a time, although short sequences can also be generated. There is no robust forecasting-based model for rapidly and continuously changing time series, however, since such time series can only be predicted in the very short term if at all, and forecasting models are known to demonstrate significantly increased prediction errors as the number of time points increases [39]. This also makes them generally unsuitable for subsequence anomalies. Even in the deep learning context, forecasting-based models can only make short-term predictions with acceptable accuracy.

There is certainly a place for forecasting-based models in medical time series analysis, however. For any real-time applications where early or real-time anomaly detection is important, forecasting models are crucial. An example would be patients on telemetry or in the ICU, where blood pressure, respiration, and ECG may, individually or in combination, signal an impending collapse. This may also be the case in predicting the onset of seizures with EEG or cardiac arrest with ECG, where any advance notice of a critical event may significantly alter outcomes.
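A minimal forecasting-based scorer, using a simple moving-average predictor on invented respiratory-rate values, illustrates the residual idea; a real system would use a learned model and a principled threshold rather than these toy choices:

```python
from statistics import mean

def forecast_residuals(series, width=5):
    """One-step-ahead forecast as the mean of the previous `width` points;
    the anomaly score at each step is the absolute residual."""
    scores = []
    for t in range(width, len(series)):
        prediction = mean(series[t - width:t])
        scores.append(abs(series[t] - prediction))
    return scores

# Respiratory-rate-like series (breaths/min) with a sudden jump at the end.
rr = [16, 15, 16, 17, 16, 16, 15, 16, 17, 28]
print(forecast_residuals(rr))  # → [0, 1, 0, 1, 12]
```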

#### *3.1.2.2 Reconstruction*

Reconstruction-based models are not subject to the constraints of prediction models. With this type of model, normal behavior is modeled by encoding subsequences of normal training data (usually input as a sliding window that provides the temporal context for each datum) into a lower-dimensional latent space. In a semi-supervised context where the model is trained on only normal data, the model should be incapable of generating an anomalous output sequence. Anomalies are therefore detected by reconstructing a subsequence/sliding window from the test data and comparing it to the actual values, which generates a "reconstruction" error. Anomalies are generally flagged when the reconstruction error exceeds (or, in probabilistic formulations, the reconstruction probability falls below) a specified threshold.

Most deep learning methods, including generative models such as autoencoders (AEs), variational autoencoders (VAEs), generative adversarial networks (GANs), and transformers, use reconstruction error as the anomaly score [29, 33]. Although these models differ in their architectures, training, and objective functions, most approaches using one of them calculate anomaly scores as reconstruction errors. Note that in multivariate time series with multiscale spatiotemporal dependencies, deciding what exactly constitutes "similarity" may be difficult. Fortunately, analytically defining similarity is usually not necessary, and the anomaly score is often related to the loss function of the model. While reconstruction-based AD methods are fairly intuitive and quite widely used, they can be plagued by difficulties such as the computational cost of data reconstruction, mode collapse, nonconvergence, and instability [33, 50].
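The reconstruction principle can be demonstrated without a neural network. Below, a rank-1 PCA model (computed by power iteration) stands in for the encoder/decoder; the training windows are deliberately constructed as amplitude-scaled copies of one template, so normal windows reconstruct almost perfectly while a spiked window incurs a large reconstruction error. All data and thresholds are contrived for illustration:

```python
import math

def first_component(windows, iters=20):
    """Mean and first principal direction of the windows (power iteration)."""
    d = len(windows[0])
    mu = [sum(w[k] for w in windows) / len(windows) for k in range(d)]
    centered = [[w[k] - mu[k] for k in range(d)] for w in windows]
    v = [1.0] * d
    for _ in range(iters):
        proj = [sum(c[k] * v[k] for k in range(d)) for c in centered]
        v = [sum(p * c[k] for p, c in zip(proj, centered)) for k in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return mu, v

def reconstruction_error(window, mu, v):
    """L2 distance between a window and its rank-1 reconstruction."""
    centered = [x - m for x, m in zip(window, mu)]
    coeff = sum(c * vk for c, vk in zip(centered, v))
    return math.sqrt(sum((c - coeff * vk) ** 2 for c, vk in zip(centered, v)))

# "Normal" training windows: amplitude-scaled copies of one smooth template.
normal = [[(1 + 0.1 * s) * math.sin(0.4 * t) for t in range(8)] for s in range(10)]
mu, v = first_component(normal)
normal_err = max(reconstruction_error(w, mu, v) for w in normal)
spike = [math.sin(0.4 * t) for t in range(8)]
spike[4] += 3.0  # a transient the model never saw
print(normal_err < 1e-6, reconstruction_error(spike, mu, v) > 1.0)  # → True True
```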

#### *3.1.2.3 Distance/dissimilarity*

Distance-based models rely on a similarity metric that flags anomalies if their distance from normal exceeds a certain threshold. Clustering is an unsupervised machine learning approach that is effective for grouping data and detecting anomalies; it involves mapping the time series data into a multidimensional space where the data are grouped near centroids based on feature similarity. Anomalies are then detected if they are far from existing clusters or have a low probability of belonging to any cluster. Examples of such methods include the K-means algorithm, the one-class support vector machine (OCSVM), and the Gaussian mixture model (GMM) [33]. More sophisticated methods such as dynamic time warping (DTW) provide a more complex comparison of temporal sequences (or subsequences, usually determined with a sliding window of fixed length) by allowing nonlinear alignment between sequences that are locally out of phase [11, 45]. Clustering methods are currently the benchmarks for anomaly detection in time series [39], and some recent studies demonstrate that many advanced algorithms do not deliver improved performance on basic univariate time series in comparison to more traditional methods and may in fact result in inferior
performance [47]. The performance of clustering methods is generally degraded on complex high-dimensional datasets [33, 43], however. In such cases, dimensionality can be reduced, whether guided by expert opinion or by feature selection and extraction techniques such as deep autoencoders, principal component analysis (PCA) [51], and multidimensional scaling (MDS) [52, 53], in effect creating a hybrid approach that uses deep learning for dimensionality reduction and clustering methods for anomaly detection.
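As a minimal distance-based example, the sketch below scores a test point by its mean Euclidean distance to the k nearest normal points; the two-dimensional feature vectors (imagined as mean heart rate and heart-rate variability per window) and all values are invented, and a real pipeline would first extract such features from the raw series:

```python
import math

def knn_score(point, normal_points, k=3):
    """Anomaly score = mean Euclidean distance to the k nearest normal points."""
    dists = sorted(math.dist(point, q) for q in normal_points)
    return sum(dists[:k]) / k

# Hypothetical normal feature vectors: [mean HR, HR variability] per window.
normal = [[72, 4.1], [75, 3.8], [70, 4.5], [74, 4.0], [71, 4.2], [73, 3.9]]
print(knn_score([73, 4.0], normal) < knn_score([110, 0.5], normal))  # → True
```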

#### **3.2 Capturing temporal context**

The history of a sequence contains significant information regarding its potential future behavior and most deep learning approaches in some way depend on modeling the temporal dimension in order to explicitly capture the past during reconstruction or prediction.

#### *3.2.1 Input*

Input shape is essential to capturing temporal context and can take the form of individual (multidimensional) points or windows, which consist of a subsequence that contains some portion of the historical information. The width of the window is usually predetermined and can be based on the known or estimated/expected characteristics of the dataset and the presumed underlying process. Windows can be advanced by some number of data points (window step) and used in order (sliding windows) or they can be shuffled and entered out of order depending on the application and dataset. To specifically address the challenge of comparing subsequences rather than points, many models use representations of subsequences instead of the raw data. A sliding window decomposition/extraction is usually performed in the preprocessing stage after other operations such as missing value imputation, changing the sampling rate, or data normalization, have been completed [38, 39].
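The window decomposition itself is straightforward; a minimal sketch (with an arbitrary width and step) is:

```python
def sliding_windows(series, width, step=1):
    """Decompose a series into overlapping subsequences of length `width`,
    advanced by `step` points, as typically done during preprocessing."""
    return [series[i:i + width] for i in range(0, len(series) - width + 1, step)]

print(sliding_windows([1, 2, 3, 4, 5, 6], width=4, step=2))  # → [[1, 2, 3, 4], [3, 4, 5, 6]]
```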

#### *3.2.2 Temporal modeling*

Several approaches to modeling temporal context and dependencies are commonly used in deep learning models. These essentially provide ways of organizing "memory": previously encountered data are weighted appropriately and used to generate the current or future output. These approaches may constitute the model architecture itself or may operate at the midlevel of network dynamics, and they can be combined with more basic activation functions as well as with higher-level deep learning architectures.

#### *3.2.2.1 Recurrent neural networks*

Recurrent neural networks (RNNs) are a class of neural networks specifically designed for modeling sequential data and are well suited for capturing temporal and long-term dependencies. Unlike traditional feedforward neural networks, RNNs have a recurrent connection that allows information to be looped back and processed at each time step. This enables RNNs to retain memory of previous time steps that they can use to inform predictions at subsequent time steps. The hidden state of the RNN serves as an internal representation that evolves over time, capturing the context and history of the time series [52, 53]. A major shortcoming of RNNs is their instability due to the vanishing or exploding gradient problem, in which gradients shrink or grow uncontrollably as they are propagated back through many time steps [54]. The earliest and most widely used RNN modification designed to address this problem is the long short-term memory (LSTM) unit [55], which avoids the problem by controlling retained information through a memory cell and input, output, and forget gates. If the LSTM unit detects an important feature from an earlier input sequence, it can carry this information over an extended distance, sometimes up to thousands of steps [39, 55]. A simpler and more computationally efficient but similarly effective modification is the gated recurrent unit (GRU) [56], which modulates the flow of information inside the unit but without a separate memory cell [52, 53]. Both LSTM and GRU cells are able to learn long-term dependencies by determining the number of weighted previous states to keep or forget at each time step, and both have been used with success in time series anomaly detection [39].
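The core recurrence can be illustrated with a scalar toy cell whose weights are fixed by hand rather than learned; the point is only that the hidden state carries history, so the same current input can yield different states depending on what came before:

```python
import math

def rnn_step(h, x, w_h=0.5, w_x=1.0, b=0.0):
    """One step of a scalar Elman-style RNN: the new hidden state mixes the
    previous hidden state (memory) with the current input."""
    return math.tanh(w_h * h + w_x * x + b)

def run(sequence, h0=0.0):
    h = h0
    for x in sequence:
        h = rnn_step(h, x)
    return h

# Same final input (0.9), different histories -> different hidden states:
# the recurrent connection carries information from earlier time steps.
print(run([0.1, 0.1, 0.9]), run([0.9, 0.9, 0.9]))
```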

#### *3.2.2.2 Convolutional neural networks*

Convolutional neural networks (CNNs) have traditionally been used for image analysis but can be adapted to time series data and in some applications demonstrate better performance than RNNs [57–59], which nevertheless remain the most commonly used approach to temporal modeling. CNNs treat time series data as a one-dimensional (1D) array rather than a sequence and employ convolutional operations to capture local patterns and dependencies within the data. By applying one-dimensional convolutions, CNNs can learn hierarchical representations of time series and automatically extract relevant features at different timescales. Pooling layers can be added to downsample the output and reduce dimensionality. The learned features can then be fed into fully connected layers for classification or regression tasks [29, 33]. Anomaly detection can be performed by training the model on a normal dataset and then computing prediction or reconstruction error during inference. CNNs can be used for multidimensional time series as well as for real-time detection of anomalies; however, the extensive computational demands of CNNs make them less efficient for real-time monitoring. A specific approach for time series, the temporal convolutional network (TCN) [60], uses one-way convolutions in order to maintain temporally ordered/causal relations in the data. The TCN can generate sequences of any length and employs dilated convolutions, in which the receptive field of the convolutional filters expands exponentially, allowing the model to capture long-range dependencies in time series.
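The causal, dilated convolution at the heart of the TCN can be sketched directly with a toy kernel and toy data; note how changing a future sample cannot affect earlier outputs:

```python
def causal_dilated_conv(series, kernel, dilation=1):
    """1-D causal convolution: output[t] depends only on series[t] and earlier
    samples, spaced `dilation` steps apart (implicitly left-padded with zeros)."""
    out = []
    for t in range(len(series)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation          # only the current and past samples
            acc += w * (series[idx] if idx >= 0 else 0.0)
        out.append(acc)
    return out

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = causal_dilated_conv(x, kernel=[0.5, 0.5], dilation=2)
# Causality check: changing a *future* sample leaves earlier outputs intact.
x2 = x[:]
x2[5] = 100.0
y2 = causal_dilated_conv(x2, kernel=[0.5, 0.5], dilation=2)
print(y[:5] == y2[:5])  # → True
```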

Convolutional neural networks and RNNs can be combined in the same network architecture in order to capture spatial and temporal dependencies distinctly but simultaneously. In such models, CNNs are usually used to capture local patterns and features, with the output then fed into a RNN (either LSTM or GRU) that then processes the features extracted from the CNN in a sequential manner. The RNN performs the task of modeling the temporal context and relationships between the features extracted by the CNN [34, 61].

#### *3.2.2.3 Attention*

The attention mechanism, initially popularized in the context of natural language processing [62], has been extended to handle general time series data and provides a
recent alternative to both RNN and CNN models. In the context of time series analysis, the attention mechanism allows the model to focus on specific temporal segments or patterns within the input sequence that are most relevant for making predictions. Using attention, the model can dynamically weigh the contribution of each time step based on its relevance, rather than relying solely on a fixed window or fixed-size context. Attention mechanisms typically involve a scoring mechanism that calculates the relevance or attention weight for each time step, followed by a weighted combination of the time step representations to produce a context vector that is then used for making predictions or further processing. Attention mechanisms in time series analysis have shown effectiveness in tasks such as sequence classification, forecasting, and anomaly detection, as they enable models to selectively attend to relevant temporal information while disregarding irrelevant or noisy segments of the time series [63–65].
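A minimal sketch of the scoring-and-weighting step, with toy two-dimensional keys and scalar values, shows the mechanism; in a real model the queries, keys, and values are learned projections of the data:

```python
import math

def attention_weights(query, keys):
    """Scaled dot-product scores, softmax-normalized over time steps."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(scores)                        # for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Context value: attention-weighted combination of the values."""
    return sum(wi * vi for wi, vi in zip(attention_weights(query, keys), values))

# One key vector and one scalar value per time step (all values invented).
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [10.0, 20.0, 30.0]
w = attention_weights([1.0, 0.0], keys)
print(abs(sum(w) - 1.0) < 1e-9, w[0] > w[1])  # weights sum to 1; step 0 outranks step 1
```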

#### *3.2.2.4 Graph neural networks*

Graph neural networks (GNNs) for time series data combine the power of graph structures with the ability to model temporal dependencies. GNNs enable the representation and analysis of time series data as graphs, where each data point is a node and the temporal relationships between them are represented as edges [66]. By incorporating recurrent or convolutional mechanisms, GNNs can capture the dynamics and patterns of time series data within the graph framework. These models leverage information from neighboring nodes and the temporal context to make predictions or perform tasks such as forecasting, classification, or anomaly detection. GNNs for time series offer a flexible and effective approach for handling complex temporal dependencies while leveraging the benefits of graph representations, enabling improved understanding and analysis of time-varying data [66]. GNN ideas have been implemented in graph convolutional networks [67] and graph attention networks [68] and are a promising future direction.
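One mean-aggregation message-passing step over a toy sensor graph (invented adjacency and readings) conveys the basic idea; real GNNs interleave such neighborhood aggregation with learned transformations:

```python
def message_passing_step(features, adjacency):
    """One GNN-style update: each node's new feature is the average of its
    own feature and its neighbors' features (mean aggregation)."""
    new = []
    for node, feat in enumerate(features):
        neigh = [features[j] for j in adjacency[node]]
        new.append((feat + sum(neigh)) / (1 + len(neigh)))
    return new

# Four sensors in a line, 0 - 1 - 2 - 3; node 3 carries an anomalous reading,
# and one round of message passing spreads its influence to node 2.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
features = [1.0, 1.0, 1.0, 9.0]
print(message_passing_step(features, adjacency))
```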

#### **3.3 Summary**

Deep learning models can automatically learn representations from raw time series data and capture both local and global dependencies, making them well suited for anomaly detection tasks. One of the key advantages of using deep learning for anomaly detection in time series is the ability to handle high-dimensional and complex data. Deep learning models such as RNNs, CNNs, and their variants have been successfully applied to capture temporal dependencies, spatial patterns, and multiscale features in time series data; they can effectively extract meaningful representations from the input data and detect anomalies based on deviations from learned normal patterns. These medium-scale models can be combined and integrated into larger scale unsupervised architectures such as autoencoders and GANs, which have been widely employed to learn compact representations of normal patterns in data. By reconstructing the input data or generating synthetic data samples, these models can detect anomalies by measuring the reconstruction error or the divergence between the real and generated data. More recent ways of capturing the temporal context include attention and GNNs, which have shown promise in anomaly detection in time series.

#### **4. Generative adversarial models for anomaly detection**

#### **4.1 Motivation for the use of GANs in medical data**

The key motivation for utilizing GANs for anomaly detection in medical time series is that they almost directly address many of the challenges specific to medical time series discussed above [1, 14, 16–21]. Since GANs can learn to recreate normal data patterns and detect anomalies as deviations from these patterns, they are well suited to operating in an unsupervised setting. They can easily be combined with a temporal model (e.g., an RNN or attention) to capture complex temporal and multivariate dependencies, including intricate data structures, nonlinearities, and high-dimensional relationships, in datasets involving multiple variables recorded over time. They can generate realistic synthetic data to supplement original medical datasets, which often lack volume and diversity, thus aiding in the training of more robust and generalizable models. Finally, they have shown impressive generalization capabilities, learning representations that transfer well to unseen data, which is particularly valuable in medical anomaly detection, where the ability to accurately detect anomalies in previously unseen or rare cases is crucial.

#### **4.2 GAN overview**

Generative adversarial networks are a class of deep learning models introduced by Goodfellow et al. in 2014 [69] designed to generate synthetic data that resemble the training dataset by learning and reproducing the distribution inherent in the data. A GAN consists of two parts, a generator (G) and a discriminator (D), which are usually implemented as neural networks. Both networks play a two-player minimax game, in which G tries to generate data that D cannot distinguish from the real training data, while D tries to correctly classify data as real (from the training data) or fake (from G).

The generator's role is to produce synthetic data samples, such as images, texts, or audio, which mimic the distribution of the training data. The generator is a mapping function that projects random noise or an input vector sampled from a prior distribution (which may be uniform or Gaussian) to the data space. The network gradually learns to transform the noise into output that is generated from the dataset distribution and is ideally indistinguishable from the training data. The discriminator, on the other hand, acts as a binary classifier. It is trained to distinguish between real data samples from the training set and synthetic samples generated by the generator. The discriminator takes both real and generated samples as inputs and outputs a probability score indicating the likelihood that the input is real. The objective of the discriminator is to learn to accurately classify the data samples.

The generator and the discriminator are trained simultaneously. Initially, the generator produces poor quality outputs and the discriminator can easily tell the difference between real and fake. As training progresses, the generator becomes better at generating fake outputs that appear real, and the discriminator becomes better at distinguishing between the real and the fake. The training continues until a point at which the discriminator can no longer distinguish fake data from real data (Nash equilibrium). The optimization process uses backpropagation and an optimization algorithm (like stochastic gradient descent) to adjust the parameters of G and D. The loss function used for the training is typically binary cross-entropy.
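To make the training dynamics concrete, the following toy sketch pits a one-parameter affine "generator" against a logistic-regression "discriminator" on one-dimensional Gaussian data; the architectures, data, learning rates, and the common non-saturating generator update are illustrative simplifications, not any model from this chapter:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Real data distribution the generator must learn to imitate: N(3, 0.5).
w_g, b_g = 1.0, 0.0     # generator G(z) = w_g * z + b_g
w_d, b_d = 0.0, 0.0     # discriminator D(x) = sigmoid(w_d * x + b_d)
lr, batch = 0.05, 64

for step in range(2000):
    x_real = rng.normal(3.0, 0.5, batch)
    z = rng.normal(0.0, 1.0, batch)
    x_fake = w_g * z + b_g

    # Discriminator ascent on E[log D(x)] + E[log(1 - D(G(z)))]
    d_real = sigmoid(w_d * x_real + b_d)
    d_fake = sigmoid(w_d * x_fake + b_d)
    w_d += lr * (np.mean((1 - d_real) * x_real) - np.mean(d_fake * x_fake))
    b_d += lr * (np.mean(1 - d_real) - np.mean(d_fake))

    # Generator ascent on E[log D(G(z))] (the non-saturating variant used in practice)
    d_fake = sigmoid(w_d * x_fake + b_d)
    w_g += lr * np.mean((1 - d_fake) * w_d * z)
    b_g += lr * np.mean((1 - d_fake) * w_d)

# After training, the generator's offset b_g should have drifted toward the real mean of 3.
print(round(b_g, 2))
```

As training proceeds, the discriminator's early advantage pushes the generator's output distribution toward the real one, mirroring the alternating dynamic described above.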

*Anomaly Detection in Medical Time Series with Generative Adversarial Networks… DOI: http://dx.doi.org/10.5772/intechopen.112582*

In the context of anomaly detection, after the GAN is trained, new data instances can be fed to the trained generator, which then attempts to reconstruct or regenerate these instances. The regenerated samples are then compared to the original data. If the difference between the original and the regenerated sample surpasses a defined threshold, the data instance is flagged as an anomaly. The underlying assumption is that the GAN will be less successful in accurately reproducing instances that are significantly different from the distribution it was trained on, i.e., anomalous instances. This procedure allows for effective anomaly detection in an unsupervised setting [27, 29–33].
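A minimal sketch of this reconstruction-and-threshold procedure follows; the `reconstruct` function is a trivial stand-in for a trained GAN, and the window sizes, noise levels, and 99th-percentile threshold are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Trivial stand-in for a trained GAN's reconstruction: each window is
# "reconstructed" as its own mean, mimicking a model that has learned the
# flat structure of the normal training windows.
def reconstruct(windows):
    return np.broadcast_to(windows.mean(axis=1, keepdims=True), windows.shape)

def anomaly_scores(windows):
    return np.mean((windows - reconstruct(windows)) ** 2, axis=1)

normal = rng.normal(0.0, 0.1, (200, 50))   # flat, low-noise "normal" windows
anomalous = normal.copy()
anomalous[:, 25] += 3.0                    # inject a large spike mid-window

# Threshold chosen from normal data only, e.g., the 99th percentile of its scores.
tau = np.percentile(anomaly_scores(normal), 99)
flags = anomaly_scores(anomalous) > tau
print(flags.mean())  # → 1.0 (every spiked window is flagged)
```

The key point is that the threshold is calibrated on normal data alone, so no anomalous labels are needed at training time.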

The objective function summarizing this process is:

$$\min_{G}\max_{D} V(D,G) = \mathbb{E}_{\mathbf{x}\sim p_{\text{data}}(\mathbf{x})}[\log D(\mathbf{x})] + \mathbb{E}_{\mathbf{z}\sim p_{\mathbf{z}}(\mathbf{z})}[\log(1 - D(G(\mathbf{z})))] \tag{1}$$

where $\mathbb{E}_{\mathbf{x}}$ and $\mathbb{E}_{\mathbf{z}}$ denote expectations over real data samples and random latent vectors, respectively; D(x) is the discriminator's probability estimate that x is real; G(z) is the output of the generator G for a given input vector z; and D(G(z)) is the discriminator's probability estimate that the generated (fake) sample G(z) is real.

Generative adversarial networks have achieved remarkable success in various domains, including image synthesis, video generation, and text generation, and have been applied to additional tasks such as image-to-image translation, style transfer, super-resolution, and data augmentation. The architecture described above is the basic, or "vanilla," GAN; it has been modified and extended to address specific challenges. For example, Wasserstein GAN [70] was developed to address unstable optimization and mode collapse; it shares the same minimax training procedure with the original GAN model but adjusts the loss function to minimize the Wasserstein distance between the real and fake data distributions. Conditional GAN (C-GAN) [71] adds conditioning information to both the generator and discriminator, which allows the model to generate data based on a chosen set of conditional parameters. Deep convolutional GAN (DC-GAN) [72] uses convolutional layers in both the generator and discriminator in order to generate more realistic images. Self-attention GAN (SA-GAN) [73] adds a self-attention mechanism to the convolutional GAN to model long-range, multilevel dependencies across image regions, so that high-resolution images are not generated from spatially local properties alone. In bidirectional GAN (Bi-GAN) [74], an encoder is added to the generator and discriminator to map the data into a latent space, enabling learning of a bidirectional mapping between the data space and the latent space. Finally, cycle-consistent GAN (CycleGAN) trains two GANs simultaneously with a cycle consistency loss in order to encourage learned mappings in both directions and optimize image-to-image translation [75].

#### **4.3 GANs for anomaly detection**

Generative adversarial networks have proven to be effective as an unsupervised anomaly detection technique and have overcome significant challenges that are common to medical datasets such as a lack of adequately labeled datasets, dearth of anomalous data, and unbalanced datasets [28–31]. The following is a brief summary of the major models, which constitute the core models that have been adapted and modified for different applications.

#### *4.3.1 AnoGAN*

The AnoGAN model [76] was one of the earliest uses of GANs for anomaly detection. It uses a deep convolutional GAN (DC-GAN) architecture (see above) trained on normal data. Once the model is trained, anomaly scoring for a new instance is performed by calculating the anomaly score as a discrepancy between the instance and its reconstructed version obtained from the latent (random) space of the GAN. To reconstruct an instance, AnoGAN employs an optimization process that finds the closest point to it in the latent space by minimizing the difference between the reconstructed output and the original input using gradient descent and updating the latent code iteratively until convergence. The anomaly score is then calculated based on the difference between the original instance and the reconstructed output. A problem with this approach is that the GAN only implicitly models the data distribution and the optimization procedure for recovering the latent representation of a given sample is computationally costly and not practical for large datasets [29, 31]. The same authors followed up [77] with a modified model based on Wasserstein GAN, fast unsupervised anomaly detection with generative adversarial networks (f-AnoGAN), which substantially sped up the process of mapping to the latent space by moving from an iterative gradient descent approach to a learned mapping, which made the model more suitable for real-time anomaly detection.
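The iterative latent-space search used by AnoGAN can be sketched as follows; a toy linear map stands in for the trained generator so that the gradient of the reconstruction loss is analytic, whereas the real model obtains it by backpropagation through a DC-GAN:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear "generator" G(z) = A @ z standing in for a trained DC-GAN.
A = rng.normal(size=(20, 4))

def G(z):
    return A @ z

x = G(np.array([1.0, -2.0, 0.5, 0.0]))   # a query sample the generator can produce

# AnoGAN-style latent search: start from a random z and run gradient
# descent on the reconstruction loss ||G(z) - x||^2.
z = rng.normal(size=4)
lr = 0.01
for _ in range(1000):
    residual = G(z) - x
    z -= lr * (A.T @ residual)            # gradient of 0.5 * ||G(z) - x||^2

anomaly_score = np.sum((G(z) - x) ** 2)   # small here, since x lies in G's range
```

A genuinely anomalous x would lie outside the generator's range, leaving a large residual after convergence; the per-sample inner loop is exactly the computational cost that f-AnoGAN later removed by learning the mapping.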

#### *4.3.2 Efficient GAN (E-GAN)*

Efficient GAN [78] is based on AnoGAN, but it uses the bidirectional GAN (Bi-GAN) rather than a DC-GAN, incorporating an encoder into the architecture in order to alleviate the computational complexity associated with inference. Here, the discriminator distinguishes between two joint distributions: a given sample paired with its corresponding latent code (the output of the encoder) versus an original latent vector paired with its generated synthetic sample (the output of the generator). The encoder acts as a regularization mechanism, helping to mitigate mode collapse and stabilize the training process, and results in significantly improved efficiency in detecting anomalies.

#### *4.3.3 GANomaly*

GANomaly [79] represents an addition of an autoencoder to AnoGAN in order to learn both the image and latent representations jointly. Here, the generator is constructed of an encoder and decoder, with an additional encoder that takes the output from the generated sample space and maps it back to a latent space. The discriminator as usual compares the generated samples to the original data. The total training loss then consists of the reconstructed loss in the latent space, the reconstructed loss in the sample space, and the adversarial loss in the sample space. The anomaly score is based on the encoder loss [31]. An important aspect of this model is that the space used for comparison is from the original sample space rather than depending on random sampling from a latent space as in other GAN-based models. This results in a model with high detection accuracy that has been adapted extensively to specific applications. Skip-GANomaly, for example, adds skip connections between the encoder and generator, leading to improved reconstruction accuracy and thus more accurate anomaly detection [29].
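The composite GANomaly training objective might be sketched as follows with dummy arrays; the loss weights, tensor shapes, and the feature-matching form of the adversarial term are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Dummy tensors standing in for one training batch; all shapes are illustrative.
x = rng.normal(size=(8, 64))                       # original samples
x_hat = x + rng.normal(0, 0.1, (8, 64))            # generator (decoder) output
z = rng.normal(size=(8, 16))                       # latent code from the first encoder
z_hat = z + rng.normal(0, 0.1, (8, 16))            # second encoder applied to x_hat
f_real = rng.normal(size=(8, 32))                  # discriminator features on real data
f_fake = f_real + rng.normal(0, 0.1, (8, 32))      # discriminator features on x_hat

l_con = np.mean(np.abs(x - x_hat))                 # reconstruction loss, sample space
l_enc = np.mean((z - z_hat) ** 2)                  # reconstruction loss, latent space
l_adv = np.mean((f_real - f_fake) ** 2)            # adversarial (feature-matching) loss

w_adv, w_con, w_enc = 1.0, 50.0, 1.0               # illustrative weights
total_loss = w_adv * l_adv + w_con * l_con + w_enc * l_enc

anomaly_score = l_enc   # at test time, the encoder loss alone scores anomalies
```

Scoring with the latent-space loss alone is what lets the model compare a test sample against its own re-encoded reconstruction rather than against random latent samples.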


#### **4.4 GANs for anomaly detection in time series**

Generative adversarial networks offer a powerful framework for modeling and generating complex data distributions, making them well suited for capturing the intricate patterns and dynamics present in time series data. Several types of GAN architectures have been adapted for anomaly detection in time series. The following is a selected review of several important models.

#### *4.4.1 TAnoGAN*

Time Series Anomaly Detection with Generative Adversarial Networks (TAnoGAN) [80] is the simplest GAN-based model adapted specifically for time series. It is similar to the AnoGAN model, except that rather than using deep CNNs, both the generator and the discriminator are composed of LSTM layers in order to model temporality. A sliding window inputs real subsequences and the generator outputs simulated sequences of the same length, which are then compared by the discriminator using pointwise distance. A shortcoming of this model is that it is incapable of dealing with multivariate time series.
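The sliding-window preparation that feeds such LSTM-based models can be sketched as follows; the window length and stride here are arbitrary illustrative choices:

```python
import numpy as np

# Cut a univariate series into fixed-length subsequences, the input format
# consumed by LSTM-based generators and discriminators like TAnoGAN's.
def sliding_windows(series, length=30, stride=1):
    n = (len(series) - length) // stride + 1
    return np.stack([series[i * stride : i * stride + length] for i in range(n)])

series = np.sin(np.linspace(0, 20 * np.pi, 1000))   # a toy periodic signal
windows = sliding_windows(series, length=30, stride=5)
print(windows.shape)  # → (195, 30)
```

Each row then plays the role of one "real" subsequence for the discriminator, while the generator emits simulated sequences of the same length.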

#### *4.4.2 MAD-GAN*

Multivariate Anomaly Detection GAN (MAD-GAN) [81] was designed specifically to address anomaly detection in multivariate time series. MAD-GAN also consists of LSTM layers in the generator and discriminator, and its training is similar to that of TAnoGAN. The model generates a residual loss based on the idea that the generator implicitly models the data distribution by learning to map it back into its latent space. During inference, this residual loss is combined with the standard discrimination loss to determine whether the sample is abnormal or not.

#### *4.4.3 TadGAN*

Time series anomaly detection using generative adversarial network (TadGAN) [82] also uses the LSTM structure in its generator and discriminator, but introduces an encoder in order to generate the latent space from which the generator produces synthetic data samples (rather than from a random latent space as usual). There are two discriminators, one for the samples and one for the latent space. The model uses Wasserstein distance for the discriminator loss and cycle consistency loss for the reconstruction error, which are computed through a combination of pointwise difference, area difference, and dynamic time warping. Anomalies are detected through a combination of reconstruction and discriminator errors.
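The blending of reconstruction and critic errors into a single anomaly score might look like the following sketch, where both error vectors are simulated and the convex weight is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for per-window reconstruction errors and critic (discriminator)
# outputs that a trained TadGAN-style model would produce.
recon_err = rng.gamma(2.0, 1.0, 500)
critic_score = rng.normal(0.0, 1.0, 500)

def zscore(v):
    return (v - v.mean()) / v.std()

# Standardize both signals so they are comparable, then blend them;
# alpha = 0.5 is an arbitrary illustrative weight.
alpha = 0.5
anomaly_score = alpha * zscore(recon_err) + (1 - alpha) * zscore(critic_score)

# Windows scoring far above the mean combined score are flagged as anomalies.
flagged = anomaly_score > anomaly_score.mean() + 3 * anomaly_score.std()
```

Standardizing before combining keeps either error source from dominating purely because of its scale.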

#### *4.4.4 Beat-GAN*

Beat-GAN [59] was specifically developed for ECG anomaly detection. It utilizes an encoder-decoder architecture as the generator, and both the encoder and decoder are CNNs rather than LSTMs; temporality is handled with a one-dimensional filter sliding along the temporal dimension. An interesting contribution of this model, intended to deal with the inherent nonstationarity of the data (heart rate variability), is a modified form of time warping in which data are imputed during decelerations and removed during accelerations in order to generate a steady "beat." The model functions essentially like an autoencoder, but uses the discriminator for regularization to improve stability. During inference, anomalies are detected by a combination of pairwise reconstruction error and discriminator error.

#### **5. GANs in medicine**

#### **5.1 GANs in medical imaging**

Due to their capacity to generate realistic images, it is not surprising that the most common use of GANs in medical imaging has been in data synthesis [83–85]. While some of these uses do not directly involve anomaly detection, they are at least obliquely related, given that they are in some way involved in detecting and diagnosing specific disease or behavioral states [21, 28]. The following are some of the most common current applications.

#### *5.1.1 Data augmentation*

As previously discussed, scarcity of labeled data represents one of the main limitations to the application of deep learning in medicine [86, 87]. Medical datasets are often imbalanced and lack diversity, which can lead to biased or poor-performing models. GANs have been shown to be able to generate synthetic medical data, helping to augment existing datasets, rectify imbalance, increase diversity, and improve the performance of machine learning models. A key advantage of being able to generate synthetic data with the same statistical characteristics as the original data but without personal health information is the ability to widely share and analyze data without the risk of violating patient privacy, which is often a barrier to producing large public datasets.

From an anomaly detection perspective, data synthesis can be used to turn an unsupervised or semi-supervised anomaly problem into a supervised binary (or multiclass) classification problem: GANs are used to generate synthetic data that statistically resemble anomalies, which can then be used for balanced classification training in a supervised manner. This is, in fact, the most common current application of GANs in anomaly detection [21, 29, 30] and has been applied to images [83, 84] as well as time series such as ECG [88] and EEG [89]. While this approach to anomaly recognition is more straightforward than classical anomaly detection, it assumes that the known anomalous data cover the entire distribution of possible anomalies, which may not be the case and could result in excellent classification of known anomalies but possible misclassification of unknown ones [39, 85].
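The augmentation recipe described above can be sketched as follows; `generate_synthetic_anomalies` is a hypothetical stand-in for sampling a GAN trained on the few available anomalies, and all data shapes and class proportions are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

normal = rng.normal(0.0, 1.0, (500, 32))     # abundant normal windows
real_anom = rng.normal(3.0, 1.0, (10, 32))   # rare labeled anomalies

def generate_synthetic_anomalies(n):
    # Hypothetical stand-in for sampling a GAN trained on the real anomalies.
    return rng.normal(3.0, 1.0, (n, 32))

# Pad the minority class with synthetic samples to build a balanced,
# fully labeled dataset for an ordinary supervised classifier.
synthetic = generate_synthetic_anomalies(490)
X = np.vstack([normal, real_anom, synthetic])
y = np.concatenate([np.zeros(500), np.ones(10 + 490)])
print(X.shape, y.mean())  # → (1000, 32) 0.5
```

The caveat noted above applies directly here: the synthetic anomalies can only resemble the known ones, so anomalies outside that distribution may still be missed.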

#### *5.1.2 Image-to-image translation*

A powerful application specific to GANs is their ability to perform image-to-image translation, such as converting MRI images to computed tomography (CT) images or vice versa, enhancing the quality of medical images, or generating angiography images from MRI images. There are multiple potential benefits, including a decreased need for multimodal studies, reduced acquisition time and radiation exposure, and increased availability of appropriate imaging in cases where access to multimodal imaging is limited, e.g., in a clinical scenario where a certain imaging modality (such as MRI) might be optimal for diagnosis or treatment planning but only another modality (such as CT) is available to the patient. The predominant type of GAN used for this application is CycleGAN [21, 90–93].

#### *5.1.3 Super-resolution/denoising*

Generative adversarial networks can also be used to increase the resolution of medical images, known as super-resolution, and to reduce noise and artifacts. Since the quality of medical images is often degraded by various factors such as hardware limitation or patient movement, this can be extremely helpful for discernment of fine details that can be critical for correct diagnosis. Super-resolution and noise reduction are types of image-to-image translation that involve converting low-resolution images into high-resolution images by imputing data. This is especially interesting in cases where the imaging modality is intrinsically low-resolution, such as positron emission tomography (PET) [94–99].

#### *5.1.4 Image segmentation*

Image segmentation is important for measuring and visualizing anatomical structures, delineating pathological regions, and for surgical planning and image-guided interventions. The process of applying GANs to image segmentation is slightly different than that of applying vanilla GAN: the generator now aims to create an image where each pixel corresponds to a particular class label and the discriminator attempts to differentiate between the ground-truth segmentation (real) and the generator's segmentation (synthetic). This is again a type of image-to-image translation for which GANs have been used successfully [100–102].

#### *5.1.5 Disease progression modeling*

Disease progression modeling involves predicting how a disease will develop in a patient over time, which can allow for early intervention, optimized treatment plans, and better patient outcomes. In the context of disease progression modeling, the generator could be conditioned on a particular disease stage or on past medical history in order to generate synthetic data predicting what could potentially happen at the next disease stage or time point. Different models or approaches might be used to handle different kinds of data (continuous, discrete, etc.) and different diseases. GANs have been successfully applied to tumor growth [103] and Alzheimer's disease prediction [104].

#### *5.1.6 Brain decoding*

Brain decoding involves using machine learning algorithms to map patterns of brain activity (measured via EEG or fMRI) to mental states or processes. For example, using visual image reconstruction decoding, researchers have been able to reproduce images a person is viewing directly from brain activity [105–107]. This process is often referred to as "mind reading" or "brain-to-image reconstruction." The most direct clinical applicability of this technique at this time is in brain-computer interfaces, which would potentially allow disabled and paralyzed individuals to communicate and control external devices more easily. The technique also offers the potential for significant advancements in the understanding of the biology of consciousness and mind-body medicine.

#### **5.2 GANs in medical time series: ECG**

#### *5.2.1 ECG overview*

Electrocardiography (ECG or EKG) is a diagnostic tool that records the electrical activity of the heart over a period of time through electrodes placed on the skin, typically in a standard 12-lead setup for a clinical ECG. The ECG waveform represents the electrical depolarization and repolarization of the cardiac muscle during each heartbeat and can provide a large amount of valuable information such as the heart rate, rhythm, and the size and position of the chambers. It can also show evidence of damage to the cardiac muscle (ischemia or infarction), effects of drugs or devices (such as a pacemaker), and other types of heart disease or conditions (e.g., pericarditis, electrolyte imbalances). A significant advantage of ECG is that it is noninvasive, inexpensive, and relatively quick to perform. However, it requires expert interpretation, and while it is highly valuable, it may not provide a definitive diagnosis on its own and may need to be combined with other studies [108, 109].

#### *5.2.2 GANs in ECG anomaly detection*

As discussed above, Beat-GAN was specifically designed to detect anomalies in ECG. It outperformed other anomaly detection methods (including OCSVM) and achieved high accuracy and fast inference. It was also to some extent interpretable since it was able to pinpoint anomalies in sample space. The model has also been applied successfully to time series in other domains [59]. Shin et al. [110] deployed the AnoGAN architecture, but modified it with extensive data preprocessing as well as dimensionality reduction with t-distributed stochastic neighbor embedding (t-SNE), which they utilized to generate an experimentally determined objective decision boundary that could effectively differentiate between normal signal and arrhythmia based on anomaly score.

Li et al. [111] proposed the single-lead convolutional generative adversarial network (SLC-GAN) for automated myocardial infarction (MI) detection in single-lead ECG. The model combines a GAN with multiple convolutional layers (DC-GAN) in the generator and discriminator with an added CNN classifier for MI detection. The GAN portion learns to generate synthetic ECG data, which are then used to augment the volume of the training data for the classifier. The model achieved excellent classification accuracy and provides a good example of using synthetic data to turn an unsupervised problem into a supervised one and improve performance. Xia et al. [112] extended this idea by applying a transformer model in the generator with a CNN discriminator. This model was then used to generate synthetic data, which were used to augment training of a classifier combining a CNN-based feature extraction block with a bidirectional LSTM (Bi-LSTM) architecture. The overall model achieved superior performance and demonstrated improved classification compared to models that did not use added synthetic data. A similar conclusion was reached by Rath et al. [113], who tested several machine and deep learning methods and found that the best performance on ECG classification was achieved by a GAN-LSTM ensemble model.

Qin et al. [114] proposed ECG-ADGAN, a semi-supervised model that incorporates a Bi-LSTM network in addition to multiple 1D convolutional layers into the GAN generator in order to preserve temporal patterns of the ECG signal. Training takes place in two stages, with stage I following normal GAN training to Nash equilibrium between the generator and discriminator and stage II freezing the generator and training the discriminator/classifier separately, specifically for anomaly detection. The authors also utilized mini-batch training during stage I to improve convergence. The model demonstrated superior detection of unknown anomalies when compared to supervised learning methods.

Wang and colleagues [115] further extended this approach beyond binary classification (normal vs. abnormal) by incorporating GAN into a two-level hierarchical framework in order to not only detect but also classify different types of arrhythmias. The first level consists of a memory-augmented deep autoencoder with GAN (MadeGAN) designed to perform anomaly detection; the second level consists of a multibranching deep CNN architecture utilizing transfer learning to allow classification of different types of heart disease given the fundamental imbalance in the training dataset. This framework was able to effectively capture disease-altered features of ECG signals and accurately predict and classify heart disease with better performance compared to existing methods.

#### **5.3 GANs in medical time series: EEG**

#### *5.3.1 EEG overview*

An electroencephalogram (EEG) is a neuroimaging technique used to record the electrical activity of the brain. It is carried out using multiple electrodes placed on the scalp according to a standardized placement system, usually the 10–20 system. These electrodes measure voltage fluctuations resulting from ionic current flows within neural populations in the brain. The resulting traces, EEG waves, represent the summation of postsynaptic potentials (PSPs) from a large number of neurons, specifically from cortical pyramidal neurons, detected as fluctuations in voltage over time [116].

EEG waves are characterized by their frequency (measured in Hertz), amplitude (measured in microvolts), and waveform morphology. They are typically categorized into bandwidths: delta (0.5–4 Hz), theta (4–8 Hz), alpha (8–12 Hz), beta (12–30 Hz), and gamma (30–100 Hz), each of which may correspond to different states of brain activity or consciousness. For instance, alpha waves are typically associated with relaxed, closed-eye states, while beta waves are associated with active thinking, attentional focus, or rapid eye movement (REM) sleep [116].
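As a small illustration of these bands, band power can be estimated from a raw trace with a simple FFT periodogram; this is only a sketch with an assumed sampling rate, and clinical pipelines would typically use Welch's method and artifact rejection:

```python
import numpy as np

fs = 256                                  # assumed sampling rate in Hz
t = np.arange(0, 10, 1 / fs)
rng = np.random.default_rng(0)
# Synthetic trace: a 10 Hz (alpha-band) rhythm plus background noise.
eeg = np.sin(2 * np.pi * 10 * t) + 0.2 * rng.normal(size=t.size)

freqs = np.fft.rfftfreq(eeg.size, 1 / fs)
psd = np.abs(np.fft.rfft(eeg)) ** 2       # simple periodogram

# Sum spectral power within each of the standard bands listed above.
bands = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 12),
         "beta": (12, 30), "gamma": (30, 100)}
power = {name: psd[(freqs >= lo) & (freqs < hi)].sum()
         for name, (lo, hi) in bands.items()}

dominant = max(power, key=power.get)
print(dominant)  # → alpha
```

Band powers of this kind are among the simplest features fed to seizure detectors and to the GAN-based EEG models discussed below.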

EEG can be used for the detection and study of various neurological and psychiatric conditions such as epilepsy, sleep disorders, encephalopathies, and even cognitive processes. The main advantage of EEG in cognitive imaging is its high temporal resolution, which allows for the study of fast-dynamic processes within the brain. Its spatial resolution is limited, however, and it is less effective for capturing activity occurring deep within the brain. Anomaly detection in EEG traces involves all the challenges discussed above: multidimensional time series with complex dependencies, highly complex and nonstationary normal behavior, and rarity and extreme heterogeneity of disease patterns, from normal brain aging to active seizures.

#### *5.3.2 GANs in EEG epilepsy detection*

Epilepsy is a chronic neurological disorder marked by recurrent seizures, which are symptoms of abnormal excessive or synchronous neuronal activity in the brain. Seizures can vary greatly in their presentation, from minor sensory disturbances or momentary lapses in consciousness ("absence seizures") to full-body convulsions ("grand mal seizures"). EEG is crucial in diagnosing and localizing the disorder, as well as in detecting seizures when they occur. Seizures are characterized by a variety of EEG patterns and frequencies such as spike-and-wave discharges (70 ms waveforms often followed by a slow wave, with specificity based on the frequency band), sharp waves (typically seen in focal seizures), polyspikes (typically seen in generalized seizures), focal slowing (which helps with localization and can be seen before, during, or after focal seizures), or generalized paroxysmal fast activity (rapid continuous spiking typically seen in severe diseases such as Lennox-Gastaut syndrome). Seizure detection and monitoring has an important role in diagnosis, improving quality of life, and general understanding of the disease. For example, alerting a patient about an impending seizure might allow them to take appropriate safety precautions or breakthrough medications for seizure control [117–121]. In this context, automatic seizure detection or prediction essentially consists of a binary classification task between the ictal (seizure) or pre-ictal and nonictal (nonseizure) EEG patterns.

A number of machine and deep learning approaches have been applied to epileptic seizure detection and prediction, but these predominantly apply feature extraction such as wavelets or independent component analysis (ICA) followed by a classifier such as random forest or support vector machine (SVM) [122]. One of the only models to directly apply a semi-supervised GAN model to seizure detection was introduced by You et al. [119], who modified the AnoGAN model (DC-GAN architecture) for seizure detection in a behind-the-ear EEG two-channel signal. The EEG signal channels were filtered and transformed into spectrogram images, which were combined to form a virtual channel image that was fed to the network for training. The GAN was trained on normal data and anomalies were detected using a combination of residual and discriminative loss. The authors noted that the addition of a Gram matrix of the feature maps from each convolutional layer was shown to improve performance. The model demonstrated a 96.3% sensitivity for automated seizure detection.

Zhu and Shoaran [123] utilized an unsupervised adversarial model to map power spectrum features from intracranial EEG recordings into a subject-invariant feature space via domain transfer learning. The model consisted of an encoder and decoder functioning as a generator for both the source (labeled data) and target (unlabeled or new data) domains, with a discriminator designed to try to differentiate between the domains. A discriminative model was then trained on the resultant subject-invariant features to generate predictions about the target patients. The model demonstrated improved performance compared to the more conventional subject-specific approach, allowing for better generalization.

In contrast, Truong et al. [124] used a GAN to extract features from the EEG data that could then undergo binary classification. The generator was trained to synthesize realistic short-time Fourier transform images from a noise vector that were then passed through the discriminator. After training, the discriminator was able to collect and flatten features that could be used with any generic classifier, in this case two fully connected layers. The model again consisted of a modified DC-GAN architecture that was trained in an unsupervised manner where information regarding seizure onset was disregarded.

While there are very few applications of GANs directly to seizure detection, most applications use GANs in order to produce balanced training datasets with generated synthetic data, again effectively changing the problem from an unsupervised to a supervised one. One of the first applications of GANs to EEG data generation was proposed by Pascual et al. [125] who used their conditional GAN model to generate synthetic ictal signals conditioned on interictal data from individual patients. The model consisted of a convolutional autoencoder as the generator, in this case translating the latent code into an ictal sample rather than restoring the original sample. The discriminator had the same architecture as the encoder of the generator and was trained to distinguish between real and fake ictal signals. The inclusion of synthetic samples resulted in improved classification performance compared to training on only real samples. The additional benefit of the synthetic procedure, as the authors point out, was deidentification of the original data and significant improvement in data privacy. Multiple additional groups have applied similar data augmentation approaches with various modifications, including different feature extraction methods, different generator and discriminator architectures, different loss functions, utilization of LSTM/GRU cells or attention instead of CNNs, and the application of different classifiers [126–143].

#### **6. Conclusion: challenges and future directions**

Although GANs have shown significant promise in anomaly detection in medical time series, serious challenges remain, including notorious training instability, the lack of clear evaluation metrics for generated data, limited interpretability, the inability to explicitly model causal relationships, questions about the preservation of temporal dynamics, privacy concerns, and ethical considerations in the use of generated data. For example, evaluating the performance of GAN-based anomaly detection in time series is nontrivial, since traditional evaluation metrics such as precision, recall, or F1-score, and even more advanced techniques such as maximum mean discrepancy or Fréchet inception distance [144, 145], may be unable to fully capture the highly complex characteristics of the data. Additional issues such as fairness in AI models also need to be considered to ensure that algorithmic decisions do not discriminate against certain demographic groups.

Future research efforts should focus on refining training techniques, incorporating domain knowledge, and developing hybrid approaches to enhance the performance and applicability of GANs in anomaly detection in time series data. Since GANs were originally designed to generate realistic images, their use is still predominantly focused on the generation of synthetic data. This is, of course, extremely useful as it can help resolve issues of unbalanced datasets and lack of anomalous and labeled data and transform unsupervised or semi-supervised approaches to anomaly detection into more manageable supervised problems. Direct anomaly detection with GANs is becoming more common, however, as different research groups realize that applying generative deep learning to data directly is feasible and provides significant advantages (and since generation of synthetic data inherently biases the network away from correct classification or detection of unknown or unusual anomalies). While GANs have been applied successfully, at least initially, in anomaly detection in time series such as ECG and EEG, there are many other areas where they could be extremely useful, such as rs-fMRI, ICU physiological data, or even monitoring of medical records for the possibility of subtle signs of chronic disease. In these areas, the application would leverage the model's ability to learn the underlying distribution, and modifications such as adding or embedding different modules (such as autoencoders) into the adversarial structure and/or including recurrence, attention, or graph structures could serve to better model the long-range spatiotemporal dynamics and dependencies in the data. Additional potential areas of future research include building patient-specific models, improving generality through transfer learning, and real-time applications in the monitoring of healthcare data. 
Finally, there is a need for more research on the practical deployment of GANs in clinical settings, which involves not only technical considerations but also evaluation in terms of clinical outcomes, cost-effectiveness, user experience, and ethical, legal, and societal implications.

### **Author details**

Miloš Cekić, University of California, Los Angeles, Los Angeles, California, USA

\*Address all correspondence to: mcekic@mednet.ucla.edu

© 2023 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

*Anomaly Detection in Medical Time Series with Generative Adversarial Networks… DOI: http://dx.doi.org/10.5772/intechopen.112582*

#### **References**

[1] Ching T, Himmelstein DS, Beaulieu-Jones BK, Kalinin AA, Do BT, Way GP, et al. Opportunities and obstacles for deep learning in biology and medicine. Journal of the Royal Society Interface. 2018;**15**:20170387. DOI: 10.1098/rsif.2017.0387

[2] Roy S, Meena T, Lim SJ. Demystifying supervised learning in healthcare 4.0: A new reality of transforming diagnostic medicine. Diagnostics. 2022;**12**:2549. DOI: 10.3390/diagnostics12102549

[3] Kaushik S, Choudhury A, Sheron PK, Dasgupta N, Natarajan S, Pickett LA, et al. AI in healthcare: Time-series forecasting using statistical, neural, and ensemble architectures. Frontiers in Big Data. 2020;**3**:4. DOI: 10.3389/fdata. 2020.00004

[4] Wang WK, Chen I, Hershkovich L, Yang J, Shetty A, Singh G, et al. A systematic review of time series classification techniques used in biomedical applications. Sensors. 2022; **22**(20):8016. DOI: 10.3390/s22208016

[5] Kline A, Wang H, Li Y, Dennis S, Hutch M, Xu Z, et al. Multimodal machine learning in precision health: A scoping review. NPJ Digital Medicine. 2022;**5**:171. DOI: 10.1038/s41746-022-00712-8

[6] Fernando T, Gammulle H, Denman S, Sridharan S, Fookes C. Deep learning for medical anomaly detection—A survey. ACM Computing Surveys. 2021;**54**(7): 141. DOI: 10.1145/3464423

[7] Tschuchnig ME, Gadermayr M. Anomaly detection in medical imaging— A mini review. arXiv. arXiv preprint arXiv:2108.11986. 2021. DOI: 10.48550/ arXiv.2108.11986

[8] Samariya D, Ma J. Anomaly detection on health data. In: Traina A, Wang H, Zhang Y, Siuly S, Zhou R, Chen L, editors. Health Information Science (HIS 2022). Cham, Switzerland: Springer Nature; 2022. LNCS, (13705):34-41. DOI: 10.1007/978-3-031-20627-6\_4

[9] Chalapathy R, Chawla S. Deep learning for anomaly detection: A survey. arXiv. arXiv preprint arXiv: 1901.03407. 2019. DOI: 10.48550/ arXiv.1901.03407

[10] Pang G, Shen C, Cao L, Van Den Hengel A. Deep learning for anomaly detection: A review. ACM Computing Surveys. 2022;**54**(2):1-38. DOI: 10.1145/ 3439950

[11] Li G, Jung JJ. Deep learning for anomaly detection in multivariate time series: Approaches, applications, and challenges. Information Fusion. 2023;**91**: 93-102. DOI: 10.1016/j.inffus.2022. 10.008

[12] Gui J, Sun Z, Wen Y, Tao D, Ye J. A review on generative adversarial networks: Algorithms, theory, and applications. IEEE Transactions on Knowledge and Data Engineering. 2023; **35**(4):3313-3332. DOI: 10.1109/ TKDE.2021.3130191

[13] Aggarwal A, Mittal M, Battineni G. Generative adversarial network: An overview of theory and applications. International Journal of Information Management, Data Insights. 2021;**1**: 100004. DOI: 10.1016/j.jjimei.2020. 100004

[14] Dash A, Ye J, Wang G. A review of generative adversarial networks (GANs) and its applications in a wide variety of disciplines–from medical to remote sensing. International Journal of Applied Earth Observation and Geoinformation. 2021;**108**:102734. DOI: 10.48550/arXiv.2110.01442

[15] Jabbar A, Li X, Omar B. A survey on generative adversarial networks: Variants, applications, and training. ACM Computing Surveys (CSUR). 2021; **54**(8):1-49. DOI: 10.1145/3463475

[16] Yi X, Walia E, Babyn P. Generative adversarial network in medical imaging: A review. Medical Image Analysis. 2019; **2019**(58):101552. DOI: 10.1016/j.media. 2019.101552

[17] Koshino K, Werner RA, Pomper MG, Bundschuh RA, Toriumi F, Higuchi T, et al. Narrative review of generative adversarial networks in medical and molecular imaging. The Annals of Translational Medicine. 2021;**9**(9):821. DOI: 10.21037/atm-20-6325

[18] Lan L, You L, Zhang Z, Fan Z, Zhao W, Zeng N, et al. Generative adversarial networks and its applications in biomedical informatics. Frontiers in Public Health. 2020;**8**:164. DOI: 10.3389/ fpubh.2020.00164

[19] Kazeminia S, Baur C, Kuijper A, van Ginneken B, Navab N, Albarqouni S, et al. GANs for medical image analysis. Artificial Intelligence in Medicine. 2020; **109**:101938. DOI: 10.1016/j.artmed. 2020.101938

[20] Iqbal A, Sharif M, Yasmin M, Raza M, Aftab S. Generative adversarial networks and its applications in the biomedical image segmentation: A comprehensive survey. The International Journal of Multimedia Information Retrieval. 2022;**11**:333-368. DOI: 10.1007/s13735-022-00240-x

[21] Laino ME, Cancian P, Politi LS, Della Porta MG, Saba L, Savevski V. Generative adversarial networks in brain imaging: A narrative review. Journal of Imaging. 2022;**8**:83. DOI: 10.3390/ jimaging8040083

[22] Soomro TA, Zheng L, Afifi AJ, Ali A, Soomro S, Yin M, et al. Image segmentation for MR brain tumor detection using machine learning: A review. IEEE Reviews in Biomedical Engineering. 2023;**16**:70-90. DOI: 10.1109/RBME.2022.3185292

[23] Krithika M, Suganthi K. Review of medical image synthesis using GAN techniques. ITM Web of Conferences. 2021;**37**:01005. DOI: 10.1051/itmconf/ 20213701005

[24] Ali H, Biswas R, Mohsen F, Shah U, Alamgir A, Mousa O, et al. The role of generative adversarial networks in brain MRI: A scoping review. Insights Into Imaging. 2022;**13**:98. DOI: 10.1186/ s13244-022-01237-0

[25] Jeong JJ, Tariq A, Adejumo T, Trivedi H, Gichoya JW, Banerjee I. Systematic review of generative adversarial networks (GANs) for medical image classification and segmentation. Journal of Digital Imaging. 2022;**35**:137-152. DOI: 10.1007/ s10278-021-00556-w

[26] Yahaya MSM, Teo J. Data augmentation using generative adversarial networks for images and biomarkers in medicine and neuroscience. Frontiers in Applied Mathematics and Statistics. 2023;**9**: 1162760. DOI: 10.3389/fams. 2023.1162760

[27] Sabuhi M, Zhou M, Bezemer CP, Musilek P. Applications of generative adversarial models in anomaly detection: A systematic literature review. IEEE Access. 2021;**9**:161003-161029. DOI: 10.1109/ACCESS.2021.3131949


[28] Wang R, Bashyam V, Yang Z, Yu F, Tassopoulou V, Chitapalli SS, et al. Applications of generative adversarial networks in neuroimaging and clinical neuroscience. NeuroImage. 2023;**269**: 119898. DOI: 10.1016/j.neuroimage. 2023.119898

[29] Li H, Li Y. Anomaly detection based on GAN: A survey. Applied Intelligence. 2023;**53**:8209-8231. DOI: 10.1007/ s10489-022-03905-6

[30] Di Mattia F, Galeone P, De Simoni M, Ghelfi E. A survey on GANs for anomaly detection. arXiv. arXiv preprint arXiv:1906.11632. 2019. DOI: 10.48550/arXiv.1906.11632

[31] Esmaeili M, Toosi A, Roshanpoor A, Changizi V, Ghazisaeedi M, Rahmim A, et al. Generative adversarial networks for anomaly detection in biomedical imaging: A study on seven medical image datasets. IEEE Access. 2023;**11**: 17906. DOI: 10.1109/ACCESS.2023. 3244741

[32] Chen X, Konukoglu E. Unsupervised abnormality detection in medical images with deep generative methods. In: Biomedical Image Synthesis and Simulation: Methods and Applications. London, UK; Academic Press; 2022. DOI: 10.1016/B978-0-12-824349- 7.00022-0

[33] Choi K, Yi J, Park C, Yoon S. Deep learning for anomaly detection in timeseries data: Review, analysis, and guidelines. IEEE Access. 2021;**9**:120043. DOI: 10.1109/ACCESS.2021.3107975

[34] Hamilton JD. Time Series Analysis. Princeton, NJ, USA: Princeton University Press; 2020. DOI: 10.1515/ 9780691218632

[35] Mills T. Applied Time Series Analysis: A Practical Guide to Modeling and Forecasting. London, UK: Academic Press; 2019. DOI: 10.1016/B978-0- 12-813117-6.00001-6

[36] Shumway RH, Stoffer DS. Time Series Analysis and its Applications. 4th ed. New York, NY, USA: Springer; 2017. DOI: 10.1007/978-3-319-52452-8

[37] Chandola V, Banerjee A, Kumar V. Anomaly detection for discrete sequences: A survey. IEEE Transactions on Knowledge and Data Engineering. 2012;**24**(5):823-839. DOI: 10.1109/ TKDE.2010.235

[38] Blazquez-Garcıa A, Conde A, Mori U, Lozano JA. A review on outlier/ anomaly detection in time series data. ACM Computing Surveys. 2021;**54**(3): 1-33. DOI: 10.1145/3444690

[39] Darban AA, Webb GI, Pan S, Aggarwal CC, Salehi M. Deep learning for time series anomaly detection: A survey. arXiv. arXiv preprint arXiv: 2211.05244. 2022. DOI: 10.48550/ arXiv.2211.05244

[40] Brophy E, Wang Z, She Q, Ward T. Generative adversarial networks in time series: A systematic literature review. ACM Computing Surveys. 2023;**55**(10):199. DOI: 10.1145/3559540

[41] Hawkins DM. Identification of Outliers. London, UK: Springer Netherlands; 1980. DOI: 10.1007/ 978-94-015-3994-4

[42] Ranga Suri NNR, Murty MN, Athithan G. Outlier Detection: Techniques and Applications. New York, NY, USA: Springer; 2019. DOI: 10.1007/978-3-030-05127-3

[43] Aggarwal CC. Outlier Analysis. 2nd ed. New York, NY, USA: Springer; 2017. DOI: 10.1007/978-3-319-47578-3

[44] Munir M, Chattha MA, Dengel A, Ahmed S. A comparative analysis of traditional and deep learning-based anomaly detection methods for streaming data. In: 18th IEEE International Conference on Machine Learning and Applications (ICMLA). Los Alamitos, CA, USA: IEEE Computer Society Conference Publishing Services; 2019. pp. 561-566. DOI: 10.1109/ ICMLA.2019.00105

[45] Ruff L, Kauffmann JR, Vandermeulen RA, Montavon G, Samek W, Kloft M, et al. A unifying review of deep and shallow anomaly detection. Proceedings of the IEEE. 2021; **109**(5):756-795. DOI: 10.1109/ JPROC.2021.3052449

[46] Rewicki F, Denzler J, Niebling J. Is it worth it? Comparing six deep and classical methods for unsupervised anomaly detection in time series. Applied Sciences. 2023;**13**(3):1778. DOI: 10.3390/app13031778

[47] Audibert J, Michiardi P, Guyard F, Marti S, Zuluaga MA. Do deep neural networks contribute to multivariate time series anomaly detection? Pattern Recognition. 2022;**132**:108945. DOI: 10.1016/j.patcog.2022.108945

[48] Baid U et al. The RSNA-ASNR-MICCAI BraTS 2021 benchmark on brain tumor segmentation and radiogenomic classification. arXiv. arXiv preprint arXiv:2107.02314. 2020. DOI: 10.48550/ arXiv.2107.02314

[49] Petzsche MRH et al. ISLES 2022: a multi-center magnetic resonance imaging stroke lesion segmentation dataset. Scientific Data. 2022;**9**:762. DOI: 10.1038/s41597-022-01875-5

[50] Hsu CY, Liu WC. Multiple timeseries convolutional neural network for fault detection and diagnosis and empirical study in semiconductor manufacturing. Journal of Intelligent Manufacturing. 2020;**32**:1-14. DOI: 10.1007/s10845-020-01591-0

[51] Bao Y, Tang Z, Li H, Zhang Y. Computer vision and deep learning– based data anomaly detection method for structural health monitoring. Structural Health Monitoring. 2019; **18**(2):401-421. DOI: 10.1177/ 1475921718757405

[52] Tang Z, Chen Z, Bao Y, Li H. Convolutional neural network-based data anomaly detection method using multiple information for structural health monitoring. Structural Control and Health Monitoring. 2019;**26**(1): e2296. DOI: 10.1002/stc.2296

[53] Chung J, Gulcehre C, Cho KH, Bengio Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv. arXiv preprint arXiv:1412.3555. 2014. DOI: 10.48550/arXiv.1412.3555

[54] Bengio Y, Simard P, Frasconi P. Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks. 1994; **5**(2):157-166. DOI: 10.1109/72.279181

[55] Hochreiter S, Schmidhuber J. Long short-term memory. Neural Computation. 1997;**9**(8):1735-1780. DOI: 10.1162/neco.1997.9.8.1735

[56] Cho K, van Merrienboer B, Gulcehre C, Bahdanau D, Bougares F, Schwenk H, et al. Learning phrase representations using RNN encoderdecoder for statistical machine translation. arXiv. arXiv preprint arXiv: 1406.1078. 2014. DOI: 10.48550/ arXiv.1406.1078

[57] Choi Y, Lim H, Choi H, Kim IJ. GAN-based anomaly detection and localization of multivariate time series data for power plant. In: Proc. IEEE Int. Conf. Big Data Smart Comput. (BigComp). Los Alamitos, CA, USA: IEEE Computer Society Conference Publishing Services; 2020. pp. 71-74. DOI: 10.1109/BigComp48618.2020.00-97

[58] Wen T, Keyes R. Time series anomaly detection using convolutional neural networks and transfer learning. arXiv. arXiv preprint arXiv:1905.13628. 2019. DOI: 10.48550/arXiv.1905. 13628

[59] Zhou B, Liu S, Hooi B, Cheng X, Ye J. BeatGAN: Anomalous rhythm detection using adversarially generated time series. In: Proc. 28th Int. Joint Conf. Artif. Intell. Menlo Park, CA, USA: AAAI Press; 2019. pp. 4433-4439. DOI: 10.24963/ ijcai.2019/616

[60] Lea C, Vidal R, Reiter A, Hager GD. Temporal convolutional networks: A unified approach to action segmentation. arXiv. arXiv preprint arXiv:1611.05267. 2016. DOI: 10.48550/arXiv.1611.05267

[61] Mamandipoor B, Majd M, Sheikhalishahi S, Modena C, Osmani V. Monitoring and detecting faults in wastewater treatment plants using deep learning. Environmental Monitoring and Assessment. 2020;**192**(2):1-12. DOI: 10.1007/s10661-020-8064-1

[62] Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is all you need. arXiv. arXiv preprint arXiv:1706.03762. 2017. DOI: 10.48550/arXiv.1706.03762

[63] Guo Y, Liao W, Wang Q, Yu L, Ji T, Li P. Multidimensional time series anomaly detection: A GRU-based Gaussian mixture variational autoencoder approach. In: Proceedings of the 10th Asian Conference on Machine Learning. Brookline, MA, USA: MLR Press/Microtome Publishing; 2018. pp. 97-112

[64] Lee TJ, Gottschlich J, Tatbul N, Metcalf E, Zdonik S. Greenhouse: A zero-positive machine learning system for time-series anomaly detection. arXiv. arXiv preprint arXiv:1801.03168. 2018. DOI: 10.48550/arXiv.1801.03168

[65] Lu Z, Lv W, Xie Z, Du B, Xiong G, Sun L, et al. Graph sequence neural network with an attention mechanism for traffic speed prediction. ACM Transactions on Intelligent Systems and Technology (TIST). 2022;**13**(2):1-24. DOI: 10.1145/3470889

[66] Wu L, Cui P, Pei J, Zhao L, Song L. Graph neural networks. In: Wu L, Cui P, Pei J, Zhao L, editors. Graph Neural Networks: Foundations, Frontiers, and Applications. Singapore: Springer; 2022. DOI: 10.1007/978-981-16-6054-2\_3

[67] Kipf TN, Welling M. Semisupervised classification with graph convolutional networks. arXiv. arXiv preprint arXiv:1609.02907. 2016. DOI: 10.48550/arXiv.1609.02907

[68] Veličković P, Cucurull G, Casanova A, Romero A, Lio P, Bengio Y. Graph attention networks. arXiv. arXiv preprint arXiv:1710.10903. 2017. DOI: 10.48550/arXiv.1710.10903

[69] Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, et al. Generative adversarial nets. Advances in Neural Information Processing Systems. 2014;**27**:2672-2680

[70] Arjovsky M, Chintala S, Bottou L. Wasserstein generative adversarial networks. Proceedings of the 34th international conference on machine learning. PMLR. 2017;**70**:214-223

[71] Mirza M, Osindero S. Conditional generative adversarial nets. arXiv. arXiv preprint arXiv:1411.1784. 2014. DOI: 10.48550/arXiv.1411.1784

[72] Radford A, Metz L, Chitala S. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv. arXiv preprint arXiv:1511.06434. 2015. DOI: 10.48550/arXiv.1511.06434

[73] Zhang H, Goodfellow I, Metaxas D, Odena A. Self-attention generative adversarial networks. arXiv. arXiv preprint arXiv:1805.08318. 2018. DOI: 10.48550/arXiv.1805.08318

[74] Donahue J, Krahenbuhl P, Darrell T. Adversarial feature learning. arXiv. arXiv preprint arXiv:1605.09782. 2016. DOI: 10.48550/arXiv.1605.09782

[75] Zhu JY, Park T, Isola P, Efros AA. Unpaired image-to-image translation using cycle-consistent adversarial networks. arXiv. arXiv preprint arXiv: 1703.10593. 2017. DOI: 10.48550/ arXiv.1703.10593

[76] Schlegl T, Seebock P, Waldstein SM, Schmidt-Erfurth U, Langs G. Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. In: Information Processing in Medical Imaging, Lecture Notes in Computer Science. Cham: Springer; 2017. pp. 146-157. DOI: 10.1007/978-3-319-59050-9\_12

[77] Schlegl T, Seebock P, Waldstein SM, Langs G, Schmidt-Erfurth U. f-AnoGAN: Fast unsupervised anomaly detection with generative adversarial networks. Medical Image Analysis. 2019;**54**:30-44. DOI: 10.1016/j.media.2019.01.010

[78] Zenati H, Foo CS, Lecouat B, Manek G, Chandrasekhar VR. Efficient GAN-based anomaly detection. arXiv. arXiv preprint arXiv:1902.03984. 2019. DOI: 10.48550/arXiv.1902.03984

[79] Akcay S, Atapour-Abarghouei A, Breckon TP. GANomaly: Semisupervised anomaly detection via adversarial training. In: Lecture Notes in Computer Science. Berlin, Germany: Springer; 2019. pp. 622-637. DOI: 10.1007/978-3-030-20893-6\_39

[80] Bashar MA, Nayak R. TANoGAN: Time series anomaly detection with generative adversarial networks. In: 2020 IEEE Symposium Series on Computational Intelligence (SSCI). Piscataway, NJ, USA: IEEE Publishing; 2020. pp. 1778-1785. DOI: 10.1109/ SSCI47803.2020.9308512

[81] Li D, Chen D, Jin B, Shi L, Goh J, Ng SK. MAD-GAN: Multivariate anomaly detection for time series data with generative adversarial networks. In: Lecture Notes in Computer Science. Berlin, Germany: Springer; 2019. pp. 703-716. DOI: 10.1007/978-3-030-30490-4\_56

[82] Geiger A, Liu D, Alnegheimish S, Cuesta-Infante A, Veeramachaneni K. TadGAN: Time series anomaly detection using generative adversarial networks. In: Proceedings - 2020 IEEE International Conference on Big Data. Piscataway, NJ, USA: IEEE Publishing; 2020. pp. 33-43. DOI: 10.1109/ bigdata50022.2020.9378139

[83] Sorin V, Barash Y, Konen E, Klang E. Creating artificial images for radiology applications using generative adversarial networks (GANs) – A systematic review. Acta Radiologica. 2020;**27**:1175-1185. DOI: 10.1016/j.acra.2019.12.024

[84] Hosny A, Parmar C, Quackenbush J, Schwartz LH, Aerts HJWL. Artificial intelligence in radiology. Nature Reviews Cancer. 2018;**18**:500-510. DOI: 10.1038/s41568-018-0016-5

[85] Festag S, Denzler J, Spreckelsen C. Generative adversarial networks for biomedical time series forecasting and imputation. Journal of Biomedical Informatics. 2022;**129**:104058. DOI: 10.1016/j.jbi.2022.104058

[86] Islam J, Zhang Y. GAN-based synthetic brain PET image generation. Brain Informatics. 2020;**7**(1):3. DOI: 10.1186/s40708-020- 00104-2

[87] Hirte AU, Platscher M, Joyce T, Heit JJ, Tranvinh E, Federau C. Realistic generation of diffusion-weighted magnetic resonance brain images with deep generative models. Magnetic Resonance Imaging. 2021;**81**:60-66. DOI: 10.1016/j.mri.2021.06.001

[88] Thambawita V et al. DeepFake electrocardiograms using generative adversarial networks are the beginning of the end for privacy issues in medicine. Scientific Reports. 2021;**11**:21896. DOI: 10.1038/s41598-021-01295-2

[89] Lashgari E, Liang D, Maoz U. Data augmentation for deep-learning-based electroencephalography. Journal of Neuroscience Methods. 2020;**346**: 108885. DOI: 10.1016/j.jneumeth.2020. 108885

[90] Cheng D, Qiu N, Zhao F, Mao Y, Li C. Research on the modality transfer method of brain imaging based on generative adversarial network. Frontiers in Neuroscience. 2021;**15**: 655019. DOI: 10.3389/fnins.2021.655019

[91] Yurt M, Dar SU, Erdem A, Erdem E, Oguz KK, Çukur T. mustGAN: Multi-stream generative adversarial networks for MR image synthesis. Medical Image Analysis. 2021;**70**:101944. DOI: 10.1016/j.media.2020.101944

[92] Jin CB, Kim H, Liu M, Jung W, Joo S, Park E, et al. Deep CT to MR synthesis using paired and unpaired data. Sensors. 2019;**19**(10):2361. DOI: 10.3390/s19102361

[93] Lan H, Alzheimer Disease Neuroimaging Initiative, Toga AW, Sepehrband F. Three-dimensional self-attention conditional GAN with spectral normalization for multimodal neuroimaging synthesis. Magnetic Resonance in Medicine. 2021;**86**:1718-1733. DOI: 10.1002/mrm.28819

[94] Zhao K, Zhou L, Gao S, Wang X, Wang Y, Zhao X, et al. Study of low-dose PET image recovery using supervised learning with CycleGAN. PLoS One. 2020;**15**:e0238455. DOI: 10.1371/journal. pone.0238455

[95] Sundar LKS, Iommi D, Muzik O, Chalampalakis Z, Klebermass EV, Hienert M, et al. Conditional generative adversarial networks aided motion correction of dynamic 18F-FDG PET brain studies. Journal of Nuclear Medicine. 2021;**62**:871-880. DOI: 10.2967/jnumed.120.248856

[96] Delannoy Q, Pham CH, Cazorla C, Tor-Díez C, Dollé G, Meunier H, et al. SegSRGAN: Super-resolution and segmentation using generative adversarial networks—Application to neonatal brain MRI. Computers in Biology and Medicine. 2020;**120**:103755. DOI: 10.1016/j.compbiomed.2020. 103755

[97] Shaul R, David I, Shitrit O, Raviv TR. Subsampled brain MRI reconstruction by generative adversarial neural networks. Medical Image Analysis. 2020;**65**:101747. DOI: 10.1016/j.media.2020.101747

[98] An Y, Lam HK, Ling SH. Autodenoising for EEG signals using generative adversarial network. Sensors. 2022;**22**:1750. DOI: 10.3390/s22051750

[99] Wolterink JM, Leiner T, Viergever MA, Isgum I. Generative adversarial networks for noise reduction in low-dose CT. IEEE Transactions on Medical Imaging. 2017;**36**(12):2536-2545. DOI: 10.1109/TMI.2017.2708987

[100] Sille R, Choudhury T, Sharma A, Chauhan P, Tomar R, Sharma D. A novel generative adversarial network-based approach for automated brain tumour segmentation. Medicina. 2023;**59**(1):119. DOI: 10.3390/medicina59010119

[101] Yuan W, Wei J, Wang J, Ma Q, Tasdizen T. Unified generative adversarial networks for multimodal segmentation from unpaired 3D medical images. Medical Image Analysis. 2020; **64**:101731. DOI: 10.1016/j.media.2020. 101731

[102] Oh KT, Lee S, Lee H, Yun M, Yoo SK. Semantic segmentation of white matter in FDG-PET using generative adversarial network. Journal of Digital Imaging. 2020;**33**:816-825. DOI: 10.1007/ s10278-020-00321-5

[103] Elazab A, Wang C, Gardezi SJS, Bai H, Hu Q, Wang T, et al. GP-GAN: Brain tumor growth prediction using stacked 3D generative adversarial networks from longitudinal MR images. Neural Networks. 2020;**132**:321-332. DOI: 10.1016/j.neunet.2020.09.004

[104] Han C, Rundo L, Murao K, Noguchi T, Shimahara Y, Milacski ZÁ, et al. MADGAN: Unsupervised medical anomaly detection GAN using multiple adjacent brain MRI slice reconstruction. BMC Bioinformatics. 2021;**22**(Suppl 2):31. DOI: 10.1186/ s12859-020-03936-1

[105] Ren Z, Li J, Xue X, Li X, Yang F, Jiao Z, et al. Reconstructing seen image from brain activity by visually-guided cognitive representation and adversarial learning. NeuroImage. 2021;**228**:117602. DOI: 10.1016/j.neuroimage.2020.117602

[106] Huang W, Yan H, Wang C, Yang X, Li J, Zuo Z, et al. Deep natural image reconstruction from human brain activity based on conditional progressively growing generative adversarial networks. Neuroscience Bulletin. 2021;**37**:369-379. DOI: 10.1007/ s12264-020-00613-4

[107] Al-Tahan H, Mohsenzadeh Y. Reconstructing feedback representations in the ventral visual pathway with a generative adversarial autoencoder. PLoS Computational Biology. 2021;**17**: 1-19. DOI: 10.1371/journal.pcbi.1008775

[108] Goldberger AL, Goldberger ZD, Shvilkin A. Goldberger's Clinical Electrocardiography: A Simplified Approach. 9th ed. Philadelphia, PA, USA: Elsevier; 2017. DOI: 10.1016/C2014-0-03319-9

[109] Skandarani Y, Lalande A, Afilalo J, Jodoin PM. Generative adversarial networks in cardiology. The Canadian Journal of Cardiology. 2022;**38**:196-203. DOI: 10.1016/j.cjca.2021.11.003

[110] Shin DH, Park RC, Chung K. Decision boundary-based anomaly detection model using improved AnoGAN from ECG data. IEEE Access. 2020;**8**:108664-108674. DOI: 10.1109/ ACCESS.2020.3000638

[111] Li W, Tang YM, Yu KM, To S. SLC-GAN: An automated myocardial infarction detection model based on generative adversarial networks and convolutional neural networks with single-lead electrocardiogram synthesis. Information Sciences. 2022;**589**:738-750. DOI: 10.1016/j.ins.2021.12.083

[112] Xia Y, Xu Y, Chen P, Zhang J, Zhang Y. Generative adversarial network with transformer generator for boosting ECG classification. Biomedical Signal Processing and Control. 2023;**80**: 104276. DOI: 10.1016/j. bspc.2022.104276

[113] Rath A, Mishra D, Panda G, Satapathy SC. Heart disease detection using deep learning methods from imbalanced ECG samples. Biomedical Signal Processing and Control. 2021;**68**: 102820. DOI: 10.1016/j.bspc.2021. 102820

[114] Qin J, Gao F, Wang Z, Wong DC, Zhao Z, Relton SD, et al. A novel temporal generative adversarial network for electrocardiography anomaly detection. Artificial Intelligence in Medicine. 2023;**136**:102489. DOI: 10.1016/j.artmed.2023.102489

[115] Wang Z, Stavrakis S, Yao B. Hierarchical deep learning with generative adversarial network for automatic cardiac diagnosis from ECG signals. Computers in Biology and Medicine. 2023;**155**:106641. DOI: 10.1016/j.compbiomed.2023.106641

[116] Nunez PL, Srinivasan R. Electric Fields of the Brain: The Neurophysics of EEG. 2nd ed. Oxford, UK: Oxford University Press; 2006. DOI: 10.1093/ acprof:oso/9780195050387.001.0001

[117] Habashi AG, Azab AM, Eldawlatly S, Aly GM. Generative adversarial networks in EEG analysis: An overview. Journal of Neuroengineering and Rehabilitation. 2023;**20**:40. DOI: 10.1186/s12984-023- 01169-w

[118] Wei Z, Zou J, Zhang J, Xu J. Automatic epileptic EEG detection using convolutional neural network with improvements in time-domain. Biomedical Signal Processing and Control. 2019;**53**:101551. DOI: 10.1016/ j.bspc. 2019.04.028

[119] You S, Cho BH, Yook S, Kim JY, Shon YM, Seo DW, et al. Unsupervised automatic seizure detection for focalonset seizures recorded with behind-theear EEG using an anomaly-detecting generative adversarial network. Computer Methods and Programs in Biomedicine. 2020;**193**:105472. DOI: 10.1016/j.cmpb.2020.105472

[120] Tomson T, Nashef L, Ryvlin P. Sudden unexpected death in epilepsy: Current knowledge and future directions. Lancet Neurology. 2008;**7**: 1021-1031. DOI: 10.1016/S1474-4422 (08)70202-3

[121] Usman SM, Khalid S, Bashir Z. Epileptic seizure prediction using scalp electroencephalogram signals. Biocybernetics and Biomedical Engineering. 2021;**41**:211-220. DOI: 10.1016/j.bbe.2021.01.001

[122] Natu M, Bachute M, Gite S, Kotecha K, Vidyarthi A. Review on epileptic seizure prediction: Machine learning and deep learning approaches. Computational and Mathematical Methods in Medicine. 2022;**2022**: 7751263. DOI: 10.1155/2022/7751263

[123] Zhu B, Shoaran M. Unsupervised domain adaptation for cross-subject few-shot neurological symptom detection. In: International IEEE/EMBS Conference on Neural Engineering. Piscataway, NJ, USA: IEEE Publishing; 2021. DOI: 10.1109/NER49283. 2021.9441235

[124] Truong ND, Kuhlmann L, Bonyadi MR, Querlioz D, Zhou L, Kavehei O. Epileptic seizure forecasting with generative adversarial networks. IEEE Access. 2019;**7**:143999-144009. DOI: 10.1109/ACCESS.2019.2944691

[125] Pascual D, Amirshahi A, Aminifar A, Atienza D, Ryvlin P, Wattenhofer R. EpilepsyGAN: Synthetic epileptic brain activities with privacy preservation. IEEE Transactions on Biomedical Engineering. 2021;**68**(8): 2435-2446. DOI: 10.1109/TBME.2020. 3042574

[126] Yin X, Han Y, Sun H, Xu Z, Yu H, Duan X. Multi-attention generative adversarial network for multivariate time series prediction. IEEE Access. 2021;**9**:57351-57363

[127] Usman SM, Khalid S, Bashir S. A deep learning based ensemble learning method for epileptic seizure prediction. Computers in Biology and Medicine. 2021;**136**:104710. DOI: 10.1016/j.compbiomed.2021.104710

[128] Salazar A, Vergara L, Safont G. Generative adversarial networks and Markov random fields for oversampling very small training sets. Expert Systems with Applications. 2021; **163**:113819. DOI: 10.1016/j.eswa.2020. 113819

[129] Yin X, Han Y, Xu Z, Liu J. VAECGAN: A generating framework for longterm prediction in multivariate time series. Cybersecurity. 2021;**4**:22. DOI: 10.1186/s42400-021-00090-w

[130] Rasheed K, Qadir J, O'Brien TJ, Kuhlmann L, Razi A. A generative model to synthesize EEG data for epileptic seizure prediction. IEEE Transactions on Neural Systems and Rehabilitation Engineering. 2020;**29**:2322-2332. DOI: 10.1109/TNSRE.2021.3125023

[131] Geng D, Alkhachroum A, Bicchi MAM, Jagid JR, Cajigas I, Chen ZS. Deep learning for robust detection of interictal epileptiform discharges. Journal of Neural Engineering. 2021;**18**:056015. DOI: 10.1088/1741-2552/abf28e

[132] Luo TJ, Fan Y, Chen L, Guo G, Zhou C. EEG signal reconstruction using a generative adversarial network with Wasserstein distance and temporalspatial-frequency loss. Frontiers in Neuroinformatics. 2020;**14**:15. DOI: 10.3389/fninf.2020.00015

[133] Wang J, Mu W, Wang A, Wang L, Han J, Wang P, et al. Generative adversarial networks for electroencephalogram signal analysis: A mini review. In: International Winter Conference on Brain Computer Interface (BCI). Piscataway, NJ, USA: IEEE Publishing; 2023. DOI: 10.1109/ BCI57258.2023.10078666

[134] Handa P, Gupta E, Muskan S, Goel N. A review on software and hardware developments in automatic epilepsy diagnosis using EEG datasets. Expert Systems. 2023:e13374. DOI: 10.1111/exsy.13374

[135] Daoud H, Bayoumi M. Generative adversarial network based semisupervised learning for epileptic focus localization. In: IEEE International Conference on Bioinformatics and Biomedicine (BIBM). Piscataway, NJ, USA: IEEE Publishing; 2021. DOI: 10.1109/BIBM52615.2021.9669695

[136] Dong Z, Zhou S. EEG-based seizure detection using generative model and deep learning. In: IEEE International Conference on E-Health and Bioengineering (EHB). Piscataway, NJ, USA: IEEE Publishing; 2022. DOI: 10.1109/EHB55594.2022.9991438

[137] Ganti B, Chaitanya G, Balamurugan S, Nagaraj N, Balasubramanian K, Pati S. Time-series generative adversarial network approach of deep learning improves seizure detection from the human thalamic SEEG. Frontiers in Neurology. 2022;**13**:755094. DOI: 10.3389/fneur.2022.755094

[138] Xu M, Jie J, Zhou W, Zhou H, Jin S. Synthetic epileptic brain activities with TripleGAN. Computational and Mathematical Methods in Medicine. 2022;**2022**:2841228. DOI: 10.1155/2022/ 2841228

[139] Zhang X, Yao L, Dong M, Liu Z, Zhang Y, Li Y. Adversarial representation learning for robust patient-independent epileptic seizure detection. IEEE Journal of Biomedical and Health Informatics. 2020;**24**(10): 2852-2859. DOI: 10.1109/JBHI.2020. 2971610

[140] Boonyakitanont P, Lek-uthai A, Chomtho K, Songsiri J. A review of feature extraction and performance evaluation in epileptic seizure detection using EEG. Biomedical Signal Processing and Control. 2020;**57**:101702. DOI: 10.1016/j.bspc.2019.101702

[141] Cherian R, Kanaga EG. Theoretical and methodological analysis of EEG based seizure detection and prediction: An exhaustive review. Journal of Neuroscience Methods. 2022;**369**: 109483. DOI: 10.1016/j.jneumeth. 2022.109483

[142] Nafea MS, Ismail ZH. Supervised machine learning and deep learning techniques for epileptic seizure recognition using EEG signals—A systematic literature review. Bioengineering. 2022;**9**:781. DOI: 10.3390/bioengineering9120781

[143] Yuan J, Ran X, Liu K, Yao C, Yao Y, Wu H, et al. Machine learning applications on neuroimaging for

diagnosis and prognosis of epilepsy: A review. Journal of Neuroscience Methods. 2022;**368**:109441. DOI: 10.1016/j.jneumeth.2021.109441

[144] Xu Q, Huang G, Yuan Y, Guo C, Sun Y, Wu F, et al. An empirical study on evaluation metrics of generative adversarial networks. arXiv. arXiv preprint arXiv:1806.07755. 2018. DOI: 10.48550/arXiv.1806.07755

[145] Borji A. Pros and cons of GAN evaluation measures: New developments. Computer Vision and Image Understanding. 2022;**215**:103329. DOI: 10.1016/j.cviu.2021.103329

#### **Chapter 3**

## Anomaly Detection in IoT: Recent Advances, AI and ML Perspectives and Applications

*Menachem Domb, Sujata Joshi and Arulmozhi Khn* 

#### **Abstract**

IoT comprises sensors and other small devices interconnected locally and via the Internet. Typical IoT devices collect data from the environment through sensors, analyze it, and act back on the physical world through actuators. We can find them integrated into home appliances, healthcare, control systems, and wearables. This chapter presents a variety of applications where IoT devices are used for anomaly detection and correction. We review recent advancements in machine/deep learning models and techniques for anomaly detection in IoT networks and describe significant in-depth applications in various domains: anomaly detection for IoT time-series data, cybersecurity, healthcare, smart cities, and more. The number of connected devices is increasing daily; by 2025, there will be approximately 85 billion IoT devices, spread across manufacturing (40%), medical (30%), and retail and security (20%). This significant shift toward the Internet of Things (IoT) has created opportunities for future IoT applications. The chapter examines the security issues of IoT standards, protocols, and practical operations and identifies the hazards associated with the existing IoT model. It analyzes new security protocols and solutions to moderate these challenges. This chapter's outcome can benefit the research community by encapsulating the information related to IoT and proposing innovative solutions.

**Keywords:** anomaly detection, internet of things (IoT), cybersecurity, data security, threats, risks, smart devices, time-series data, AI, machine learning, deep learning, transfer learning, healthcare, smart city, IoT environments, IoT intrusion detection, network security, convolutional neural network

#### **1. Introduction**

The wide variety of IoT devices lacking any standard creates connectivity issues and increases the security vulnerability of IoT local networks and the entire Internet. Machine learning techniques are already used in ECG, X-ray, pattern recognition, cancer detection, brain signal modeling, and IoT services on electrical impedance planes to discover defects. Extending ML and DL technologies to detect anomalies where they already operate is a natural and effective transition. Anomalies are events or patterns that deviate significantly from predictable behavior, and detection methods are expected to promptly identify anomaly occurrences and their probable cause. In keeping with this chapter's topic, we focus on applications incorporating machine learning and deep learning methods. Chatterjee and Ahmed [1] provide a comprehensive survey of anomaly detection in IoT and propose four measurements for evaluating IoT anomaly detection methods: how they approach the problem, how they are applied, the type of method, and the algorithm's latency. Anomaly detection using deep learning is described by Chalapathy and Chawla [2], and Yassine et al. [3] review the methodologies, situations, and computation platforms used for anomaly detection in the energy industry. Talagala et al. [4] propose an unsupervised, distribution-based approach for anomaly detection in high-dimensional data. Yin et al. [5] extract unique temporal features from a given temporal data file using a combination of CNN and LSTM and continue in [5] to detect anomalies involving CNN, LSTM, and a deep neural network (DNN).

They [1] also define 18 application types of anomaly detection processes. The following are examples of various application types. Sobhani et al. [6] demonstrate that the accuracy of final load projections is improved when eliminating observations from the original input using local load information. T. Asakura et al. [7] detect damage to industrial rotating equipment by calculating the feature vectors of the anomalous vibration data extracted from sensors' vibration signal features to construct a monitoring system for machinery equipment. Huang et al. [8] propose anomaly detection in manufacturing using density-peak weighted fuzzy C-means (WFCM). Yasaei et al. [9] detect unexpected event changes in sensor signals using an adaptive data-driven monitoring method. Zekry et al. [10] use a convolutional LSTM model for anomaly detection in the context of connected vehicles. Wang et al. [11] detect log anomalies in IoT systems using a natural language processing approach, extracting the relevance between words and vectorizing them; the method trains supervised models to detect anomalies while reducing computation time. Xu et al. [12] used I-LSTM and deep learning on smart-city data for multi-classification anomaly detection to improve smart homes' service quality. Tripathi et al. [13] proposed reliable and transparent city connectivity using IoT, MEC, and Blockchain consensus. Ullah et al. [14] presented timely identification of abnormal incidents in surveillance networks, incorporating LSTM with CNN, where CNN features are collected from successive frames and LSTM is used to distinguish between normal and abnormal values. The in-depth features and multi-layer BD-LSTM provide high-level training and validation data for real-world IoT surveillance networks. The DeL-IoT framework [15] detects IoT abnormalities by observing flow-level traffic instances that pass through switches; this IoT anomaly identification and prediction framework uses a deep learning technique to identify anomalies.

Mirsky et al. [16] proposed a Blockchain-based distributed anomaly detection algorithm using a Markov chain (MC) to simulate sequences efficiently. Y. An et al. [17] proposed an anomaly detection approach capable of relieving network congestion and freeing CPUs from the computing pressures of centralized servers, unlocking the potential of edge intelligence in IoT. Shen et al. [18] propose a privacy-preserving SVM training strategy using encrypted IoT data: data providers encrypt their data locally using their private keys and then record the encrypted data on the Blockchain.

The rest of the chapter is organized as follows. The next section outlines security issues unique to the IoT environment. Section 3 presents a generic two-stage anomaly detection approach: in the first stage, a process builds an envelope around the weighted average, and in the second stage, the comparison is performed. Section 4 presents anomaly detection using Random Forest machine learning, and Section 5 concludes.

#### **2. IoT security issues**

We see a considerable rise in the use of IoT applications in our day-to-day lives. The IoT enhances web-based applications by enabling connections via the Internet between people and their equipment/devices, linking "everyone" and "everything" in real-world and virtual environments [19]. Utilizing IoT applications and services is now easier than ever because of the exponential expansion of smart devices. As the asset value of the data kept, processed, and conveyed increases along with scale, so do the attacks against it. Projections indicate a rise in the number and severity of threats and attacks against IoT devices, necessitating more robust security measures. This section investigates recent IoT cybersecurity solutions.

Artificial intelligence, machine/deep learning, and networking have become central areas of IoT-related research. Adopting ultra-lightweight protocols for security and core functionality is a significant development in the IoT.

IoT security is constantly evolving, with new risks continually being found. IoT security discussions focus on ACL techniques, interim encryption techniques, hardware-specific security solutions, and measures against SQL-related attacks. This section identifies IoT-related cybersecurity risks, provides classifications, and surveys prevailing solutions to address them. The following questions are addressed:


#### **2.1 Literature review**

Recent industrial trends include embedded networking in the wireless segment, where IoT is the major player. The demand for smart applications and systems grew, leading to the rise of IoT in commercial segments [20]. Due to the immense growth of the retail segment, the usage of smart applications has spiked, increasing dependence on them, which in turn leads to higher risks. IoT devices have emerged as a prime target for intrusion activities because of the lightweight protocols and standards currently present on these devices [21, 22], and the entities that make up these devices have easier access to servers [23] because security is not fully resolved. The problem with the traditional model is the lack of algorithms for low-powered devices and the incompatibility of security tools due to differences in policy and implementation methods [24]. A variety of hardware-based techniques and unique solutions have been suggested in recent research to address traditional security challenges.

Xin Zhang and Fengtong Wen [25] proposed an authentication scheme for IoT in which two algorithmic models ensure valid authentication. The scope of the security solution offered in this work is constrained to protecting only lightweight sensor devices from standard network-layer and physical-layer attacks. M. Dahman Alshehri and Farookh K. Hussain [26] proposed a cluster-based fuzzy architecture and a secured communications model for IoT nodes. This study effectively provides a detection technique against the network's malicious nodes but does not cater to the threats posed by the audit attack surface, and it does not adequately analyze operational communication and computing costs. Chen et al. [27] offered a unique low-rate Denial-of-Service attack detection approach that incorporates trust evaluation with the Hilbert-Huang Transformation in Zigbee WSNs to address the security risks posed by large numbers of low-energy devices that are susceptible to attacks. This work's signal and anomaly detection technique helps reduce the attack level. It has an extensible design because it supports cloud and edge computing, but higher storage overheads persist as a problem. In traditional network security, an IDS is entrusted with identifying and tracking threat behaviors; such models do not expressly target the IoT environment.

#### **2.2 Security architecture and communication**

This section discusses the IoT security architecture. Use cases for IoT range from single-node devices to cross-platform deployments of technology and real-time cloud systems [28]. IoT operations consist of three main tasks: transmitting, retrieving, and processing data.

1. Application Layer: embedded interface modules enable devices to communicate with the underlying architecture. The Device Management Plane identifies the data's source and destination to maintain the device's input-output operations. For instance, the Aggregator aggregates the given device data assets into a fixed set.

2. Communication Layer: an intermediary layer with network components that establish various communication protocols and standards. This layer comprises stacks of current protocols and criteria for controlling traffic throughout the system. Standard protocols enable proper communication among IoT devices; such systems need a defined set of simple rules to initialize and share data.

**Figure 1** depicts the multi-layer architecture of IoT.

The IoT's communication protocols include:


*Anomaly Detection in IoT: Recent Advances, AI and ML Perspectives and Applications DOI: http://dx.doi.org/10.5772/intechopen.111944*

**Figure 1.** *IoT multi-layer architecture.*


Heterogeneous physical components such as switches, actuators, gateways, sensor nodes, and other embedded devices make up this unstable environment. A significant impact on networking principles is made by the intelligent device's engineering process, which is the backbone of the whole concept. Gadgets with the self-configuring capabilities of the M2M communication paradigm are IoT innovations. Through algorithms and auxiliary technology, this configuration gives nodes the intelligence they need to make decisions for themselves under any circumstance [29, 30]. It is helpful during rescue operations and other emergencies where configuring the network for a specific area is complex and there is no support for damaged nodes. However, as machines are not failsafe, a system becomes vulnerable if it depends too heavily on them. Adversaries currently exploit weak authentication, unpatched firmware, and online credential vulnerabilities [31].

Following are some of the IoT security issues:



6. Exposure Threats: IoT endpoints, such as sensors and IP cameras in public spaces, are the threat points easiest for an adversary to access. As a result, the user's integrity and authentication are threatened by physical and proximity-based threats [34]. Changing protocol methods to safeguard devices from adversaries is the biggest security difficulty in this area.

#### **2.3 Classification of IoT attacks**

Several commercial businesses have made significant financial investments to secure their IoT-based networks in recent years. IoT attacks are split into two modules:

#### *2.3.1 Protocol-based attacks*

Protocol-based attacks utilize known published protocols to serve their benefits, affecting the communication channel. It is divided into two types:

1. Communication protocol attacks: (a) attacks on communication protocols, where several types of exploitation occur when nodes transition, such as sniffer attacks, flooding attacks, and key pre-sharing attacks; (b) network protocol attacks, where connection establishment is exploited, including wormhole attacks, selective-forwarding attacks, and sniffing attacks.

#### *2.3.2 Transmitted data attacks*

Threats to initial packets and messages moving across communication nodes. Some of the most severely affected security exploitations are data leakage, malicious-node VM formation, hash collision, and denial of service. Active and passive attacks compromise the system's security. Network effectiveness is less affected by passive-attack protection systems, which are restricted to monitoring techniques. Modern, responsive security techniques are needed to counter active attacks, reducing risk without degrading network performance.


passwords and user credentials by exploiting program gaps or developing workarounds for the current authentication procedure.

e. Port Scanning: involves synchronize (SYN) requests, target ports, sources, firewalls, packets, open nodes [38], and listening nodes. SYN scans are a frequently used technique that creates a partial connection to the target node on the target port by sending a synchronize packet to test the host system's initial response.
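The initial-response probing described above can be illustrated at a higher level with an ordinary TCP connect probe. This is an illustrative sketch, not from the chapter: a true half-open SYN scan sends only the SYN packet and needs raw-socket privileges, whereas this `probe_port` helper (a hypothetical name) completes the handshake using only the standard library.

```python
import socket

def probe_port(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if `port` on `host` accepts a TCP connection.

    A real SYN ("half-open") scan inspects the SYN/ACK without
    completing the handshake, which requires raw-socket privileges;
    this full connect() probe illustrates the same idea of testing
    the host's initial response.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        try:
            s.connect((host, port))
            return True   # host answered and completed the handshake
        except (socket.timeout, ConnectionRefusedError, OSError):
            return False  # closed, filtered, or unreachable

# Example: the result depends on what is listening locally.
print(probe_port("127.0.0.1", 80))
```

In practice an attacker sweeps many ports quickly, which is why intrusion detection systems flag bursts of connection attempts from one source as a scanning signature.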

#### **2.4 IoT security solutions**

In contrast to traditional security, which is tool-centric, the most recent cybersecurity solutions focus on software-oriented techniques [39, 40]. The security characteristics that current systems address are authentication, trust, and integrity. Even in its current state, the Internet of Things (IoT) cannot support powerful devices and is not adaptable enough to keep up. **Table 1** summarizes the IoT protocols, emphasizing their characteristics and security concerns.

**Table 1.**

*Summary list of security protocols for IoT.*

According to the findings, protocol-based security solutions protect against most IoT attack surfaces [41]. Using secure techniques performed over the Data Link and Transport layers, protocols like CoAP and DDS enable efficient immunity against well-known attacks like DDoS and botnet attacks. In Sigfox and EnOcean, new methodologies prevent new threat issues like asynchronous code definition and poor payload encryption. The lightweight protocols MQTT and BLE have also emerged as a viable defense against dangers posed by malicious nodes and Man-in-the-Middle attacks. Divided security management allows more straightforward management of security measures and increases the efficacy of the most suggested solutions.

#### **2.5 Summary**

This section discussed IoT's current cybersecurity trends by researching various protocols, standards, and threats. The research findings on cybersecurity risks convey that traditional methods are not efficient enough against attacks in heterogeneous IoT environments. Our study further reveals that most cybersecurity solutions employ encryption techniques with low energy use, which also succeed in securing against channel attacks in IoT. IoT security has improved through integration with various technologies.

The complications of the IoT system have increased, and the openness of security features has decreased. Even though attempts have been made to address the previously discussed issues through the evolution of communication technologies and protocols, there is always room for further research.

#### **3. Anomaly detection using an optimized envelope**

IoT systems collect vast amounts of data to track and analyze the structure of future recorded data. However, this data cannot be stored as is due to limited storage; it must be reduced in a way that still allows future data analysis based on past data without compromise. We propose a parameterized method of sampling the data optimally. Our approach has three parameters: an averaging process for constructing an average data cycle from past observations, an envelope method for defining an interval around the average data cycle, and an entropy method for comparing new data cycles to the constructed envelope, enabling the identification of anomalies and prediction of future cycle behavior. This section concentrates on finding the optimal envelope using entropy methods.

We often have sequential data collected by sensors, and computational power and bandwidth constraints prohibit us from collecting large-scale data. Sampling preserves the most critical information from the original data and reduces the complexity of the subsequent knowledge discovery task to a tractable version without compromising performance. Dictionary learning [42] helps extract patterns hidden in data. We can apply dictionary learning to sequential data for natural language processing, video analysis, and nonsequential data tasks [43]. Given IoT data collected sequentially, we can find a method that maintains a basis with enough elements to describe the sequential patterns of the data. It helps to extract a set of common sequential patterns from the sequential telematics data. In a smart home system, we may collect the most frequent activity trajectories of home members to use for member authentication. We aim to find an optimal sampling method given a set of time-series records, where we collect information before and after the sampling reduction process regarding the data's purpose in the context of the relevant application. Many known data reduction techniques enable restoring the original data set from the reduced one, among them compression and compaction routines and dictionary methods. Given the sequential data, we may apply classification and prediction. Classification defines, for example, whether a series of daily temperatures represents an El-Niño year or whether the data points to a suspected intrusion.

#### **3.1 Related work**

Vlachos et al. [44] proposed a procedure for obtaining the best practical estimated gap between two extreme measurements related to any data sequence. Sakurada and Yairi [45] use auto-encoders with nonlinear dimensionality reduction for the anomaly detection task. Reeves et al. [46] generate domain representations using scalable layers. Chilimbi and Hirzel [47] implement an iterative scheme that uses temporal data to construct a profile; they then identify repeated data sequences with the same order, prefetch them, and let the program continue executing the prefetched instructions. Lane and Brodley [48] use instance-based learning (IBL) for boundary determination based on good user behavior and heuristics. Kasiviswanathan et al. [49] detect and cluster user content for optimization. Mairal et al. [42] create a dictionary and adapt it to specific data using data vectors, proposing an optimization algorithm for dictionary learning based on stochastic approximations. Aldroubi et al. [50] claim that a collection of subspaces gives the best sparse representation, providing optimized sampling in a union of subspaces. Rubinstein et al. [51] survey the various options up to the most recent contributions and structures. Cherian et al. [52] propose learning over-complete dictionary models where the signal can have both Gaussian and (sparse) Laplacian noise; dictionary learning in this setting leads to a complex nonconvex optimization problem, further exacerbated by large input datasets. Duarte-Carvajalino and Sapiro [53] introduce a framework for the joint design and optimization of the nonparametric dictionary and the sensing matrix. They demonstrate the use of random sensing matrices and of matrices optimized independently of the learning of the dictionary. They complement the classical image datasets, maximizing the size of the sampling data to keep the balance between the sampling data and the information extracted from it.

Our problem statement focuses on extracting concepts, methods, rules, and measurements so that, at the end of the process, the original sampling data becomes redundant and need no longer be stored. However, we incorporate an ongoing learning process to keep improving and adjusting the extracted artifacts to natural changes in the sampled mechanism's behavior. Our study concentrates on time-dependent streaming sampling data divided into fixed periods, so the analysis process is repeated for each period/cycle. We propose a condensed and adjustable representation of the data. Reeves et al. [46] offer an alternative on the subject.

#### **3.2 Introducing the envelope approach**

Assuming periodic data sampling and extraction of logical artifacts at the period level, we analyze the data collected over several periods. We divide the period into time units. For example, we divide it into daily time units for a year. We average the samples collected during each time unit and extract one value representing it. We repeat this process for the period and get a graph illustrating the average values for an intermediate and typical period. We then calculate the envelope around this average. The generated envelope represents the standard range of values such that unanalyzed periods are compared to this envelope. This period is normal if its graph value is entirely within the envelope. If it is totally out of the envelope, it is an exception.

If just sections of the graph are within the envelope, we use an entropy measure to calculate the "distance" of the given period from the standard envelope. Assuming an existing entropy threshold, we can decide whether the period is typical. We apply the same concept at the unit level and determine whether a specific time unit in a period is within the standard. This particular check is relevant to anomaly detection of IoT behavior. **Figure 2** depicts the main blocks of the envelope construction process.
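The two-stage process above can be sketched in code. This is an illustrative sketch, not the chapter's implementation: the chapter does not give its exact entropy formula, so the binary Shannon entropy of the out-of-envelope fraction is used here as a plausible stand-in, and the min-max envelope (one of the three variants tested later) and the threshold value are assumptions.

```python
import numpy as np

def build_envelope(cycles: np.ndarray):
    """Stage 1: per-time-unit (lower, upper) bounds.

    `cycles` has shape (n_cycles, n_time_units); each row is one
    period (e.g. a year of daily averages).  Here the envelope is the
    per-unit min/max over the training cycles.
    """
    return cycles.min(axis=0), cycles.max(axis=0)

def classify_cycle(cycle: np.ndarray, lower: np.ndarray,
                   upper: np.ndarray, threshold: float = 0.4) -> str:
    """Stage 2: compare a new cycle to the envelope.

    Entirely inside -> normal; entirely outside -> anomalous;
    otherwise compute an entropy-style distance (assumption: binary
    Shannon entropy of the out-of-envelope fraction) and compare it
    to a threshold.
    """
    outside = (cycle < lower) | (cycle > upper)
    p = outside.mean()
    if p == 0.0:
        return "normal"      # graph entirely within the envelope
    if p == 1.0:
        return "anomalous"   # graph entirely outside the envelope
    h = -p * np.log2(p) - (1 - p) * np.log2(1 - p)
    return "anomalous" if h > threshold else "normal"
```

The same `classify_cycle` check can be applied at the time-unit level by testing whether an individual unit falls outside the per-unit bounds, which matches the chapter's unit-level anomaly check.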

The process has three key elements: an average measure per time unit, the boundaries around the middle chart, and an entropy value representing the distance of an actual chart from the envelope. We tune the intensity of each component to generate a balanced and reliable anomaly detection method. We start by analyzing typical data collected from several time-dependent cycles, determining the average value per time unit, and drawing the boundaries around the average to get the envelope, as described in detail in **Figure 3**.

**Figure 4** describes the anomaly detection process, which sums up the number of cases in the examined chart that exceed the envelope boundaries and records in which direction.

This envelope method is generic and may be used in any application for anomaly detection, such as IoT sensors. Under high variation, it can detect damaged or attacked sensors or support automatic instant corrections where abnormal behavior is seen. We may run a backtracking process for ongoing calibration of system parameters. The idea may also be extended to a multi-dimensional envelope that captures dependencies among several columns within the same record.

#### **3.3 Experiment**

We obtained detailed meteorological data for El-Niño (EN) and non-El-Niño (NEN) years from 1980 to 1998. We took data from the El-Niño years 1982, 1983, 1987, 1988, 1991, and 1992 for the positive envelopes; all other years in the range were non-El-Niño years. We tested three methods for generating envelopes: (1) minimum and maximum over all cycles, (2) average cycle ± standard deviation, and (3) confidence interval (CI). **Figure 5** visually confirms that 1995 is a regular year concerning its temperature spread. The red and blue charts represent the envelope's upper and lower borders, respectively, while the green chart represents the temperature in 1995. Most temperatures lie within the envelope's upper/lower boundaries, generating a relatively low entropy of 0.3631, beneath the selected threshold, so we conclude that 1995 is indeed an NEN year. For 1992 and 1988, we got entropy values of 0.4266 and 0.3857, above the threshold; hence they are classified as EN years. However, we did not get a precise classification when we applied the ± standard deviation and confidence interval (CI) methods.

**Figure 2.**
*The process of constructing the optimal envelope.*

**Figure 3.**
*The process of constructing the optimal envelope.*
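The three envelope constructions compared in the experiment can be sketched as follows. This is an illustrative sketch: the chapter does not state its confidence level, so a 95% normal-approximation interval for the per-unit mean is assumed for the CI variant.

```python
import numpy as np

def envelope(cycles: np.ndarray, method: str = "minmax"):
    """Per-time-unit (lower, upper) bounds from training cycles.

    `cycles` has shape (n_cycles, n_units).  Three variants match the
    experiment: 'minmax' (min/max over all cycles), 'std' (average
    cycle +/- one standard deviation), and 'ci' (confidence interval;
    95% normal approximation assumed here).
    """
    mean = cycles.mean(axis=0)
    if method == "minmax":
        return cycles.min(axis=0), cycles.max(axis=0)
    if method == "std":
        sd = cycles.std(axis=0, ddof=1)
        return mean - sd, mean + sd
    if method == "ci":
        half = 1.96 * cycles.std(axis=0, ddof=1) / np.sqrt(len(cycles))
        return mean - half, mean + half
    raise ValueError(f"unknown method: {method}")
```

Note the differing widths: the CI envelope shrinks as more training cycles are added (it bounds the mean, not individual observations), which is one plausible reason the min-max envelope classified the EN/NEN years more reliably in the experiment.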

#### **3.4 Summary**

Classification methods have recently gained attention due to rising IoT security issues and threats. In this section, we proposed an envelope construction to classify streams of time-dependent events within a defined data cycle. We discussed three envelope construction options: min-max, standard deviation, and confidence interval (CI). We described an entropy calculation and a threshold determination to classify whether a given stream data cycle is abnormal. We used meteorological data streams to demonstrate our proposed technology's correct classification of daily temperature streams over a year cycle. Possible extensions include discovering early trends of behavior changes, determining the number of data cycles required for constructing the optimal envelope, exploring the possibility of dividing one cycle into segments and associating a different envelope with each segment, and defining rules for anomaly discovery.

**Figure 4.** *Classifying an unclassified Cycle.*

### **4. Anomaly detection using random forest machine learning**

The number of IoT devices keeps growing, and the total data transmitted over their sensors is growing accordingly. Sensors are typically low in storage, memory, and processing power. Data security and privacy are among this ever-increasing domain's significant concerns and drawbacks. A penetration discovery tool is recommended to predict possible attacks. Training data leads to the definition of good and bad patterns for generating a lightweight activation framework comprising machine learning rule discovery, threat modeling, and timely reaction to rule violations. The model discovers exceptions and immediately updates the system. Random Forest (RF) is used for anomaly detection and rule generation. We converge IoT groups' resource sharing to build an efficient IoT security framework. IoT networks collect and exchange vast amounts of data, raising major security issues. To cope with this, we propose a decentralized, layered, distributed, and parallel processing model embedded in the IoT network.

The IoT network utilizes the remaining resources to execute the RF method to detect abnormalities. The model supports continued use and is decentralized over time.

The system identifies repeated patterns, while the machine learning algorithms discover the geometric, arithmetic, and additive patterns. The patterns are translated into rules to be executed in violation cases. An adaptive extension is used to detect changes in the generated data and adapt the decision model to manage suspected situations.

The aim is a framework that collects training data, analyzes it to detect patterns, proportions, etc., and converts them into rules. The combined set of rules and RF trees is deployed in the IoT devices and network. The rules are executed when data is received from or transmitted to an IoT device, and whether the result is positive or negative, the corresponding action is triggered to cope with the situation.

#### **4.1 Literature review**

Eghbal et al. [54] propose analyzing numerical data and generating fuzzy rules. The algorithm uses some rule-and-data-dependent parameters and a function that modifies the rule evaluation measures to assess the candidate rules effectively. Ref. [55] uses Sugeno integrals, qualitative criteria aggregations in which it is possible to assign weights to criteria groups; it shows how to extract if-then rules expressing the selection of good situations based on local regulations and evaluations to detect bad conditions. Ref. [56] deals with converting data into the appropriate layout, which requires a significant investment in manual reformatting; the paper introduces a synthesis engine to extract structured relational data, using examples to synthesize a program in an extraction language that extends regular expressions with geometric constructs. Ref. [57] proposes a fast and compact decision-rules algorithm that works online to learn rule sets. It presents a technique to detect local drifts relying on the rule-set modularity: each rule monitors the evolution of performance metrics to detect concept drift, providing valuable information about the dynamics of the data-generating process, faster adaptation to changes, and more compact rule sets. Refs. [58, 59] use averaging techniques to propose a method in which a previous algorithm for association-rule mining specifies the minimum support automatically; fuzzy logic distributes the data into different clusters, and the most appropriate threshold is then introduced to the user automatically. Ref. [60] suggests a two-stage hybrid model for data classification and rule extraction. The first stage uses a Fuzzy ARTMAP (QFAM) classifier with Q-learning and a Genetic Algorithm for rule extraction from QFAM. Given a new data sample, the model can predict the sample's target class and give a fuzzy if-then rule to explain the forecast; Q-values are applied to reduce the number of prototypes generated by QFAM. Ref. [61] proposes a granular-rules extraction method to simplify a data set into a granular-rule set with unique granular rules. Ref. [62] describes a QAR (Quick Access Recorder) anomaly detection algorithm; the method retains the time-characteristic data and strengthens the relationship between the condition and decision attributes. Ref. [63] describes an approach to data mining with Excel using the XLMiner add-on, presenting an example of mining association rules to illustrate the approach's steps. Ref. [64] introduces an algorithm for choosing which instances to request next in a setting where the learner can access a pool of unlabeled samples and request some labels. Ref. [65] focuses on understanding the role of the stochastic process and how it defines a distribution over functions; it presents the basic equations for incorporating training data and examines how to learn the hyper-parameters using the marginal likelihood. Ref. [66] proposes an active learning algorithm that balances exploration with refining the decision boundary by dynamically adjusting the exploration probability at each step. Ref. [67] offers a multiclass learning model that selects informative training compounds to support learning progress; Random Forest (RF) is used to predict quantitative compound activities, and the global prediction is made by aggregating the predictions of the ensemble. Y. Brostaux [68] investigates the impact of noise in training data on the RF learning curve.

The reviewed literature focuses on improving known rule discovery mechanisms to make them lightweight enough to execute in resource-constrained settings. In most cases, the proposed solution remains general purpose but runs with fewer resources. Our proposal exploits the unique attributes of IoT to build a combined, comprehensive framework for IoT security.

#### **4.2 Rules generation and deployment process**

The process consists of seven stages. Stage 1 collects training data from the IoT network; Stage 2 applies discovery techniques to extract essential measurements and patterns; Stage 3 generates a rule for each measurement and pattern; Stage 4 evaluates the effectiveness of each rule against a set of training data; Stage 5 checks the completeness and integrity of the generated rule set; Stage 6 replays the same training data, expecting all the designated rules to be executed; Stage 7 deploys the resulting rule set. The system is then ready to accept IoT traffic data in real time and automatically check it against the rule set. **Figure 6** depicts the seven-stage anomaly detection process.
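The stage flow above can be sketched as follows; this is a toy illustration using min/max rules over a short series, not the authors' implementation:

```python
# Hypothetical sketch of the seven-stage process; the stage logic shown
# (min/max rules over a toy series) is illustrative only.

def collect():                                    # Stage 1: training data
    return [3.1, 3.4, 2.9, 3.0, 3.6]

def discover(data):                               # Stage 2: measurements/patterns
    return [("min", min(data)), ("max", max(data))]

def to_rule(pattern):                             # Stage 3: one rule per pattern
    kind, bound = pattern
    check = (lambda x, b=bound: x >= b) if kind == "min" else \
            (lambda x, b=bound: x <= b)
    return kind, bound, check

def pipeline():
    data = collect()
    rules = [to_rule(p) for p in discover(data)]
    # Stage 4: every rule must hold on the training data it was derived from.
    assert all(check(x) for _, _, check in rules for x in data)
    # Stages 5-6 (completeness check and a replay simulation) would run
    # here before Stage 7 deploys the rule set to the devices.
    return rules

rules = pipeline()
violations = [kind for kind, _, check in rules if not check(9.9)]
print(violations)   # the "max" rule fires for an out-of-range reading
```

Once deployed (Stage 7), the same `check` callables would run against each incoming record on the device.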

#### **4.3 Extracting simple rules from training data**

The sensor record layout includes a record ID, a timestamp, and a value per attribute. Simple rules, such as if-then, max, and min rules, are extracted directly from the record and its associated workflows.
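A minimal sketch of this extraction step (the record layout follows the text, but the attribute names and readings are hypothetical):

```python
# Illustrative extraction of simple min/max rules from training records.
# The record layout (ID, timestamp, per-attribute values) follows the
# text; the attribute names and readings are made up.

def extract_simple_rules(records, attributes):
    """Derive one (min, max) if-then rule per attribute."""
    rules = {}
    for attr in attributes:
        values = [rec["values"][attr] for rec in records]
        rules[attr] = (min(values), max(values))
    return rules

def apply_rules(rules, record):
    """If a value falls outside its learned range, then flag it."""
    return {attr: v for attr, v in record["values"].items()
            if not rules[attr][0] <= v <= rules[attr][1]}

training = [
    {"id": 1, "ts": "2023-01-01T00:00", "values": {"temp": 20.5, "rh": 41}},
    {"id": 2, "ts": "2023-01-01T00:05", "values": {"temp": 22.0, "rh": 48}},
]
rules = extract_simple_rules(training, ["temp", "rh"])
incoming = {"id": 3, "ts": "2023-01-01T00:10", "values": {"temp": 95.0, "rh": 45}}
print(apply_rules(rules, incoming))   # only "temp" is out of range
```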

#### **4.4 Compound and multi-stage rules extraction**

IoT rule engines assume real-time data streaming, instant reasoning, and actuation, using machine learning to extract compound rules from the continuous stream of data records. The outcome contains thresholds, measurements, and decision trees that keep expanding, consuming vast memory, storage, and runtime when a decision tree must be analyzed for a specific rule and its path traced to understand the logic. Complex Event Processing (CEP) engines support matching time-series data patterns from different sources, but they fit IoT poorly because their logic requires high processing power and considerable time. We cope with these drawbacks by reducing the number of decision trees and narrowing the search navigation scope to achieve a reasonable search time. IoT attributes and functionality are used to optimize tree navigation and process sharing. We use the bootstrap aggregation technique, taking the majority vote across the decision trees.

*Anomaly Detection in IoT: Recent Advances, AI and ML Perspectives and Applications DOI: http://dx.doi.org/10.5772/intechopen.111944*

#### **Figure 6.**

*The anomaly detection process.*

Using many trees reduces the depth and width of each tree and ultimately saves pruning and analysis time. The algorithm accepts the number of trees, K, and the number of features, F, randomly sampled for building each decision tree. For extensive, high-dimensional data, a large K is used. The single-core performance of a Random Forest can be estimated from the following parameters: number of trees (K), number of features (F), number of rows (R), and maximum depth (D). The estimated runtime is K × F² × R × 2^D. Hence, keeping only the most critical features, lowering the number of records, and keeping the maximum depth low will improve overall Random Forest performance.
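The runtime estimate and the majority-vote aggregation can be illustrated directly (the parameter values below are arbitrary):

```python
from collections import Counter

def estimated_runtime(K, F, R, D):
    """Single-core cost estimate from the text: K * F^2 * R * 2^D."""
    return K * F**2 * R * 2**D

# Lowering the maximum depth shrinks the exponential 2^D factor:
deep = estimated_runtime(K=100, F=10, R=10_000, D=12)
shallow = estimated_runtime(K=100, F=10, R=10_000, D=6)
print(deep // shallow)   # halving the depth makes it 64x cheaper

def majority_vote(tree_predictions):
    """Bootstrap aggregation: each tree votes and the majority wins."""
    return Counter(tree_predictions).most_common(1)[0][0]

print(majority_vote(["anomaly", "normal", "anomaly"]))
```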

#### **4.5 Experiment and summary**

For practical purposes, we used Excel functions and macros to generate compound rules such as pattern recognition, and ran the Excel machine learning extension to create additional rules. We loaded the spreadsheet with 8 years of training data. All IoT devices are interconnected; in each device, we installed the RF search executable and deployed the generated simple rules and RF trees. Some generated rules do not require a real-time reaction but consume processing power and memory beyond the capacity of a typical sensor; these are executed as cloud processes. To obtain meaningful testing data, we intentionally injected into the El-Niño file abnormal extreme values (e.g., above the maximum or below the minimum), wrong correlations, and classification interrupts. We then loaded the data by streaming it to the testing environment. The corresponding rules and RF trees instantly detected all anomalies, and we did not notice any interruptions or delays in the data flow.
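The anomaly-injection step can be sketched as follows; the value ranges and injection rate are invented for illustration, and the real experiment corrupted the El-Niño data rather than synthetic values:

```python
import random

# Illustrative injection of abnormal extreme values into a test stream;
# ranges, rate, and margin are hypothetical.
random.seed(7)

def inject_extremes(stream, lo, hi, rate=0.05, margin=5.0):
    """Replace a random fraction of points with values beyond [lo, hi]."""
    corrupted, injected = [], []
    for i, x in enumerate(stream):
        if random.random() < rate:
            x = hi + margin if random.random() < 0.5 else lo - margin
            injected.append(i)
        corrupted.append(x)
    return corrupted, injected

clean = [20.0 + random.random() for _ in range(1000)]   # all within [20, 21)
corrupted, injected = inject_extremes(clean, lo=20.0, hi=21.0)

# Simple min/max rules recover exactly the injected positions.
detected = [i for i, x in enumerate(corrupted) if not 20.0 <= x <= 21.0]
assert detected == injected
print(len(injected), "anomalies injected and detected")
```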

This section demonstrates the ability to build a lightweight, simple, and handy framework for anomaly detection, rule extraction, and rule execution, given enough training data. We then described accuracy and performance improvements. Based on the accuracy and performance results, the feasibility and effectiveness of the proposed framework were demonstrated empirically.

#### **5. Specific examples and case studies of successful anomaly detection**

This section outlines practical and successful anomaly detection examples in various application domains. Most modern hospitals have automated laboratories, such as Chemistry, where all blood tests are executed by dedicated machinery that is calibrated at regular intervals according to the manufacturer's instructions. Some laboratory managers additionally run ongoing anomaly detection daemons to ensure real-time control. We received a request to develop an ongoing anomaly detection process that also considers actual historical testing results and incorporates a check based on the history of the specific population that visited the lab in the past. We collected 3 years of lab results per machine, ran our envelope construction process, and produced a very compressed envelope that accounts for many parameters. As a result, any machinery problem is detected in near real time, preventing the escape of any exceptional result.
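A percentile-based envelope conveys the idea; this is a simplification, and the bounds, data, and helper names are illustrative rather than the chapter's actual envelope construction process:

```python
# Illustrative per-analyte envelope built from historical lab results.
# Using robust percentiles instead of raw min/max compresses the envelope,
# so rare historical extremes do not widen the normal range.

def build_envelope(history, lo_pct=0.5, hi_pct=99.5):
    values = sorted(history)
    n = len(values)
    lo = values[int(n * lo_pct / 100)]
    hi = values[min(n - 1, int(n * hi_pct / 100))]
    return lo, hi

def is_anomalous(envelope, result):
    lo, hi = envelope
    return not lo <= result <= hi

# Stand-in for 3 years of results from one machine (synthetic values).
history = [4.0 + 0.001 * i for i in range(1000)]
env = build_envelope(history)
print(env)                      # a compressed band inside the raw range
print(is_anomalous(env, 7.2))   # a reading far outside the envelope
```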

Another example is detecting abnormal sequencing, timing, and frequency in data streamed from a permanent external resource, using a sensor for each sampled attribute. The system listens to the communication line for a while when receiving transmissions from the designated source. Based on the collected features, such as timing, interval length, and frequency, the method constructs a multi-dimensional envelope corresponding to each feature. The multi-dimensional envelope, together with a weighted compound entropy measurement, provides comprehensive anomaly detection for communications.
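One plausible form of such a weighted compound entropy measurement is sketched below; the equal-width binning and the weighting scheme are assumptions, since the chapter does not specify them:

```python
import math
from collections import Counter

def shannon_entropy(samples, bins=10):
    """Entropy (bits) of a feature's empirical distribution over bins."""
    lo, hi = min(samples), max(samples)
    width = (hi - lo) / bins or 1.0          # all-equal samples -> one bin
    counts = Counter(min(bins - 1, int((s - lo) / width)) for s in samples)
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def compound_entropy(feature_windows, weights):
    """Weighted sum of per-feature entropies (timing, interval, frequency)."""
    return sum(w * shannon_entropy(s) for w, s in zip(weights, feature_windows))

# Regular inter-arrival times concentrate in few bins (low entropy);
# jittery timings spread across bins and raise the compound score.
regular = [1.0] * 50 + [1.01] * 50
jittery = [0.2 * (i % 10) for i in range(100)]
score_ok = compound_entropy([regular], [1.0])
score_bad = compound_entropy([jittery], [1.0])
assert score_bad > score_ok
```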

#### **6. Limitations and practical considerations related to IoT anomaly detection mechanism**

Anomaly detection systems include a preprocessing stage that defines the normal value range: any value within the specified range is designated normal, while any other value is an exception. For a time-dependent data stream, the normal value range may vary with a repeatable cycle, such as a season or another recurring time range. The correct determination of the repeatable cycle is therefore crucial to the accuracy of the anomaly detection process. The following vital limitations and vulnerabilities are essential to mention:



#### **7. Conclusion**

This chapter deals with current and future trends in anomaly detection concepts and technologies in the IoT context. We started with an overview of IoT applications spread over most functional domains, such as industrial machinery, health, the smart home, and the smart city. Most new developments in IoT focus on solutions to the severe security exposure caused by interconnecting numerous IoT devices to the Internet, and these solutions provide tools for detecting and identifying operational anomalies. We therefore devoted Section 2 to IoT operation and communications security aspects. We then elaborated on generating an envelope for anomaly detection over temporal transactions, which are the nature of IoT activity and networks. Finally, we elaborated on an advanced anomaly detection technique using Random Forests distributed over a network of IoT devices.

IoT keeps evolving and spreading fast everywhere in all functional domains in the modern world. Thus, new developments and recent trends will continue growing, so new chapters will follow.

#### **Author details**

Menachem Domb<sup>1</sup>\*, Sujata Joshi<sup>2</sup> and Arulmozhi Khn<sup>2</sup>

1 Ashkelon Academy College [AAC], Ashkelon, Israel

2 Symbiosis Institute of Digital and Telecom Management, Symbiosis International (Deemed University), Pune, India

\*Address all correspondence to: dombmnc@edu.aac.ac.il

© 2023 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

### **References**

[1] Chatterjee A, Ahmed BS. IoT anomaly detection methods and applications (survey). Internet of Things. 2022;**19**:100568. DOI: 10.1016/j. iot.2022.100568

[2] Chalapathy R, Chawla S. Deep learning for anomaly detection: A survey. 2019. arXiv:1901.03407 Google Scholar

[3] Himeur Y, Ghanem K, Alsalemi A, Bensaali F, Amira A. Artificial intelligence based anomaly detection of energy consumption in buildings: A review, current trends, and new perspectives. Applied Energy. Elsevier; 2021;**287**:1- 26. Article 116601. DOI: 10.1016/j. apenergy.2021.116601. Available from: https://www.sciencedirect.com/science/ article/pii/S0306261921001409

[4] Talagala PD, Hyndman RJ, Smith-Miles K. Anomaly detection in high-dimensional data. Journal of Computational and Graphical Statistics. 2021;**30**(2):360-374. DOI: 10.1080/10618600.2020.1807997

[5] Yin C, Zhang S, Wang J, Xiong NN. Anomaly detection based on convolutional recurrent auto-encoder for IoT time series. IEEE Transactions on Systems, Man, and Cybernetics: Systems. 2022;**52**(1):112-122. DOI: 10.1109/ TSMC.2020.2968516

[6] Sobhani M, Hong T, Martin C. Temperature anomaly detection for electric load forecasting. International Journal of Forecasting. 2020; 36 (2): 324-333. DOI: 10.1016/j. ijforecast.2019.04.022. Available from: https://www.sciencedirect.com/science/ article/pii/S0169207019301633

[7] Asakura T, Yashima W, Suzuki K, Shimotou M. Anomaly detection in a logistic operating system using the Mahalanobis–Taguchi method. Applied Sciences. Basel, Switzerland: MDPI; 2020;**10**(12):1-25. DOI: 10.3390/app10124376. Available from: https://www.mdpi.com/2076-3417/10/12/4376

[8] Huang S, Guo Y, Yang N, Zha S, Liu D, Fang W. A weighted fuzzy C-means clustering method with density peak for anomaly detection in IoT-enabled manufacturing process. Journal of Intelligent Manufacturing. Germany: Springer; 2021;**32**:1845-1861. DOI: 10.1007/s10845-020-01690-y

[9] Yasaei R, Hernandez F, Al Faruque MA. IoT-CAD: Context-aware adaptive anomaly detection in IoT systems through sensor association. In: 2020 IEEE/ACM International Conference on Computer-Aided Design, ICCAD. NY, USA: ACM; 2020. pp. 1-9

[10] Zekry A, Sayed A, Moussa M, Elhabiby M. Anomaly detection using IoT sensor-assisted ConvLSTM models for connected vehicles. In: 2021 IEEE 93rd Vehicular Technology Conference, VTC2021-Spring. New York, USA: IEEE; 2021. pp. 1-6. DOI: 10.1109/ VTC2021-Spring51267.2021.9449086

[11] Wang J, Tang Y, He S, Zhao C, Sharma PK, Alfarraj O, et al. LogEvent2vec: LogEvent-to-vector based anomaly detection for large-scale logs in the Internet of Things. Sensors. Basel, Switzerland: MDPI; 2020;**20**(9):1-27. DOI: 10.3390/ s20092451. Available from: https://www. mdpi.com/1424-8220/20/9/2451

[12] Xu R, Cheng Y, Liu Z, Xie Y, Yang Y. Improved long short-term memory (LSTM) based anomaly detection with concept drift adaptive method for supporting IoT services. Future Generation Computer Systems. 2020;**112**:228-242. DOI: 10.1016/j.future.2020.05.035. Available from: https://www.sciencedirect.com/science/article/pii/S0167739X20302235

[13] Tripathi G, Abdul Ahad M, Paiva S. SMS: A secure healthcare model for smart cities. Electronics. Basel, Switzerland: MDPI; 2020;**9**(7):1-18. DOI: 10.3390/ electronics9071135. Available from: https://www.mdpi. com/2079-9292/9/7/1135

[14] Ullah W, Ullah A, Haq IU, Muhammad K, Sajjad M, Baik SW. CNN features with bi-directional LSTM for realtime anomaly detection in surveillance networks. Multimedia Tools and Applications. 2021;**80**(11):16979-16995

[15] Tsogbaatar E, Bhuyan MH, Tanaka Y, Fall D, Gonchigsumlaa K, Elmroth E, et al. Del-IoT: A deep ensemble learning approach to uncover anomalies in IoT, Internet of Things. 2021;**14**:100391. DOI: 10.1016/j.iot.2021.100391. Available from: https://www.sciencedirect.com/ science/article/pii/S2542660521000354

[16] Mirsky Y, Golomb T, Elovici Y. Lightweight collaborative anomaly detection for the IoT using blockchain. Journal of Parallel and Distributed Computing. 2020;**145**:75-97. DOI: 10.1016/ j.jpdc.2020.06.008. Available from: https://www.sciencedirect.com/science/ article/pii/S0743731520303154

[17] An Y, Yu FR, Li J, Chen J, Leung VCM. Edge intelligence (EI)-enabled HTTP anomaly detection framework for the Internet of things (IoT). IEEE Internet of Things Journal. 2021;**8**(5):3554-3566. DOI: 10.1109/JIOT.2020.3024645

[18] Shen M, Tang X, Zhu L, Du X, Guizani M. Privacy-preserving support vector machine training over blockchain-based encrypted IoT data in smart cities. IEEE Internet of Things Journal. 2019;**6**(5):7702-7712. DOI: 10.1109/JIOT.2019.2901840

[19] Wan J et al. Software defined industrial IoT in the context of industry 4.0. IEEE Sensors Journal. 2016;**16**(20):7373-7380. DOI: 10.1109/ JSEN.2016.2565621

[20] Lemayian JP, Al-Turjman F. Intelligent IoT communication in smart environments: An overview. In: Artificial Intelligence in IoT. Transactions on Computational Science and Computational Intelligence. Singapore: Springer; 2019. DOI: 10.1007/978-3-030-04110-6\_10

[21] Wang KH, Chen CM, Fang W, Wu TY. A new ultra-lightweight authentication protocol in IoT environment for RFID tags. The Journal of Supercomputing. 2018;**74**(1):65-70. DOI: 10.1007/s11227-017-2105-8

[22] Singh S, Sharma PK, Moon SY, Park JH. Advanced lightweight encryption algorithms for IoT devices: Survey, challenges and solutions. Journal of Ambient Intelligence and Humanized Computing. Germany: Springer; 2017;**18**:1. DOI: 10.1007/ s12652-017-0494-4

[23] Rachit SB, Ragiri PR. Security trends in Internet of Things: A survey. SN Applied Sciences. 2021;**3**(1):1-14. DOI: 10.1007/s42452-021-04156-9

[24] Bembe M, Abu-Mahfouz A, Masonta M, Ngqondi T. A survey on low-power wide area networks for IoT applications. Telecommunication Systems. 2019;**71**(2):249-274. DOI: 10.1007/s11235-019-00557-9

[25] Zhang X, Wen F. A novel anonymous user WSN authentication for Internet of Things. Soft Computing. 2019;**23**(14):5683-5691. DOI: 10.1007/ s00500-018-3226-6

[26] Alshehri MD, Hussain FK. A fuzzy security protocol for trust management in the Internet of things (Fuzzy-IoT). Computing. 2019;**101**(7):791-818. DOI: 10.1007/s00607-018-0685-7

[27] Chen H, Meng C, Shan Z, Fu Z, Bhargava BK. A novel low-rate denial of service attack detection approach in Zigbee wireless sensor network by combining Hilbert-Huang transformation and trust evaluation. IEEE Access. 2019;**7**:32853-32866. DOI: 10.1109/ACCESS.2019.2903816

[28] Gubbi J, Palaniswami M, Buyya R, Marusic S. Internet of Things: A vision, architectural elements, and future directions. Future Generation Computer Systems. 2013;**29**(7):1645-1660. DOI: 10.1016/j.future.2013.01.010

[29] Li S, Da Xu L, Zhao S. 5G Internet of Things: A survey. Journal of Industrial Information Integration. 2018;**10**:1-9. DOI: 10.1016/j.jii.2018.01.005

[30] Arfaoui G et al. A security architecture for 5G networks. IEEE Access. 2018;**6**:22466-22479. DOI: 10.1109/ACCESS.2018.2827419

[31] Mohanty SN et al. An efficient lightweight integrated blockchain (ELIB) model for IoT security and privacy. Future Generation Computer Systems. 2020;**102**:1027-1037. DOI: 10.1016/j. future.2019.09.050

[32] Chatterjee S, Mukherjee R, Ghosh S, Ghosh D, Ghosh S, Mukherjee A. Internet of Things and cognitive radio - Issues and challenges. In: 2017 4th International Conference on Opto-Electronics and Applied Optics (Optronix) 2017. NY, USA: IEEE; 2018. pp. 1-4. DOI: 10.1109/ OPTRONIX.2017.8349993

[33] Fortino G, Russo W, Savaglio C. Simulation of agent-oriented Internet of things systems. In: CEUR Workshop Proc. Vol. 1664. 2016. pp. 8-13

[34] Leloglu E. A review of security concerns in the Internet of Things. Journal of Communications and Computers. 2017;**5**(01):121-136. DOI: 10.4236/jcc.2017.51010

[35] Goyal P, Sahoo AK, Sharma TK. Internet of things: Architecture and enabling technologies. Materials Today: Proceedings. 2019;**34**(January):719-735. DOI: 10.1016/j.matpr.2020.04.678

[36] Soni A, Upadhyay R, Jain A. Internet of Things and Wireless Physical Layer Security: A Survey. In: Computer Communication, Networking and Internet Security: Proceedings of IC3T. Singapore: Springer; 2017. pp. 115-123. DOI: 10.1007/978-981-10-3226-4\_11

[37] Xu H, Sgandurra D, Mayes K, Li P, Wang R. Analyzing the resilience of the Internet of things against physical and proximity attacks. In: Security, Privacy, and Anonymity in Computation, Communication, and Storage: SpaCCS 2017 International Workshops, Guangzhou, China, 12-15 December 2017, Proceedings. Lecture Notes in Computer Science, Vol. 10658. Cham, Switzerland: Springer International Publishing; 2017. pp. 291-301. DOI: 10.1007/978-3-319-72395-2\_27

[38] Salim MM, Rathore S, Park JH. Distributed denial of service attacks and its defenses in IoT: A survey. The Journal of Supercomputing. US: Springer; 2020;**76**(7). DOI: 10.1007/s11227-019-02945-z

[39] Stiawan D, Idris MY, Malik RF, Nurmaini S, Alsharif N, Budiarto R. Investigating Brute force attack patterns in IoT network. Journal of Electrical and Computer Engineering. Hindawi; 2019;**2019**:1-14. DOI: 10.1155/2019/4568368

[40] Shen H, Shen J, Khan MK, Lee JH. Efficient RFID authentication using elliptic curve cryptography for the Internet of Things. Wireless Personal Communications. 2017;**96**(4):5253-5266. DOI: 10.1007/s11277-016-3739-1

[41] Om Kumar CU, Sathia Bhama PRK. Detecting and confronting flash attacks from IoT botnets. The Journal of Supercomputing. 2019;**75**(12):8312-8338. DOI: 10.1007/s11227-019-03005-2

[42] Mairal J, Ponce J, Bach F, Sapiro G. Online dictionary learning for sparse coding. In: 26th Annual International Conference on Machine Learning. NY, USA: ACM; 2009. pp. 689-696

[43] Dietterich TG. Machine Learning for Sequential Data, Joint IAPR and Structural and Syntactic Pattern Recognition (SSPR). Germany: Springer; 2002. pp. 15-30

[44] Vlachos M, Freris NM, Kyrillidis A. Compressive mining: Fast and optimal data mining in the compressed domain. The VLDB Journal. 2015;**24**(1):1-24

[45] Sakurada M, Yairi T. Anomaly detection using autoencoders nonlinear dimensional reduction, MLSDA 2014. In: Machine Learning for Sensory Data Analysis. NY, USA: ACM; 2014. pp. 4-11. DOI: 10.1145/2689746.2689747

[46] Reeves G, Liu J, Nath S, Zhao F. Managing massive time series streams with multi-scale compressed trickles. Proceedings of the VLDB Endowment. 2009;**2**(1):97-108

[47] Chilimbi TM, Hirzel M. Dynamic hot data stream prefetching for general purpose programs. In: ACM SIGPLAN Notices. Vol. 37(5). NY, USA: ACM; 2002. pp. 199-209

[48] Lane T, Brodley CE. Temporal sequence learning and data reduction for anomaly detection. ACM TISSEC. 1999;**2**(3):295-331

[49] Kasiviswanathan SP, Melville P, Banerjee A, Sindhwani V. Emerging topic detection using dictionary learning. In: Proceedings of the 20th ACM international conference on Information and knowledge management. NY, USA: ACM; 2011. pp. 745-754

[50] Aldroubi A, Cabrelli C, Molter U. Optimal nonlinear models for sparsity and sampling. Journal of Fourier Analysis and Applications. 2008;**14**(5-6):793-812

[51] Rubinstein R, Bruckstein AM, Elad M. Dictionaries for sparse representation modeling. Proceedings of the IEEE. 2010;**98**(6):1045-1057

[52] Cherian A, Sra S, Papanikolopoulos N. Denoising sparse noise via online dictionary learning. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). NY, USA: IEEE; 2011. pp. 2060-2063

[53] Duarte-Carvajalino JM, Sapiro G. Learning to sense sparse signals: Simultaneous sensing matrix and sparsifying dictionary optimization, DTIC Document, Tech. Rep. 2008

[54] Mansoori EG, Zolghadri MJ, Katebi SD. SGERD: A steady-state genetic algorithm for extracting fuzzy classification rules from data. IEEE Transactions of Fuzzy Systems. 2008;**16**(4):1061-1071 ISSN: 1063-6706

[55] Extracting decision rules from qualitative data using Sugeno integral. In: Proceedings of the 13th European Conference, ECSQARU 2015, Compiègne, France. July 2015; Vol. 9161. pp. 14-24. ISBN 978-3-319-20806-0. ISSN 0302-9743

[56] Daniel B, Gulwani S, Hart T, Zorn B. FlashRelate: extracting relational data from semi-structured spreadsheets using examples, Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation. New York: ACM. June 2015; Vol. 50(6). pp. 218-228

[57] Kosina P, Gama J. Very fast decision rules for classification in data streams. Data Mining and Knowledge Discovery. 2015;**29**(1):168-202. ISSN: 1384-5810

[58] Jafarzadeh H, Torkashvand RR, Asgari C, Amiry A. Provide a new approach for mining fuzzy association rules using apriori algorithm. Indian Journal of Science and Technology. 2015;**8**(S7):127-134 ISSN: 0974-6846

[59] Pourpanaha F, Limb CP, Saleh JM. A hybrid model of fuzzy ARTMAP and genetic algorithm for data classification and rule extraction. Expert Systems with Applications. 2016;**49**(7):4-85

[60] Mashinchi R, Selamat A, Ibrahim S, Krejcar O. Granular-Rule Extraction to Simplify Data. In: Nguyen N, Trawiński B, Kosala R, editors. Intelligent Information and Database Systems. ACIIDS 2015. Lecture Notes in Computer Science. vol. 9012. Germany, Cham: Springer; 2015. pp. 421-429. DOI: 10.1007/978-3-319-15705-4\_41

[61] Yang H, Xiao C, Qiao Y. Study on anomaly detection algorithm of QAR data based on attribute support of rough set. International Journal of Hybrid Information Technology. 2015;**8**(1):371-382 ISSN: 1738-9968

[62] Tang H. A simple approach of data mining in excel. In: 2008 4th International Conference on Wireless Communications, Networking and Mobile Computing, Dalian, China. Piscataway, NJ, USA: IEEExplore; 2008. pp. 1-4. DOI: 10.1109/WiCom.2008.2679

[63] Tong S, Koller D. Support Vector Machine Active Learning with Applications to Text Classification. Journal of Machine Learning Research. NY, USA: Microtom Publishing; 2001;**2**(1):45-66. DOI: 10.1162/153244302760185243

[64] Rasmussen CE, Williams CKI. Gaussian Processes for Machine Learning. Cambridge, MA: MIT Press; 2006

[65] Osugi T, Kim D, Scott S. Balancing Exploration and Exploitation: A New Algorithm for Active Machine Learning. In: 5th IEEE International Conference on Data Mining. NY, USA: IEEE; 2005. pp. 8. DOI: 10.1109/ICDM.2005.33

[66] Lang T, Flachsenberg F, von Luxburg U, Rarey M. Feasibility of active machine learning for multiclass compound classification. 2016. DOI: 10.1021/acs.jcim.5b00332

[67] Bharathidason S, Jothi Venkataeswaran C. Improving classification accuracy based on random forest model with uncorrelated high performing trees. International Journal of Computer Applications. 2014;**101**(13)

[68] Brostaux Y. Random forests and decision trees classifiers: Effects of data quality on the learning curve. IBS 2006 poster

#### **Chapter 4**

## Anomaly Detection in Time Series: Current Focus and Future Challenges

*Farrukh Arslan, Aqib Javaid, Muhammad Danish Zaheer Awan and Ebad-ur-Rehman*

#### **Abstract**

Anomaly detection in time series has become an increasingly vital task, with applications such as fraud detection and intrusion monitoring. Tackling this problem requires an array of approaches, including statistical analysis, machine learning, and deep learning. Various techniques have been proposed to cater to the complexity of this problem. However, there are still numerous challenges in the field concerning how best to process high-dimensional and complex data streams in real time. This chapter offers insight into the cutting-edge models for anomaly detection in time series. Several of the models are discussed and their advantages and disadvantages are explored. We also look at new areas of research that are being explored by researchers today as their current focuses and how those new models or techniques are being implemented in them as they try to solve unique problems posed by complex data, high-volume data streams, and a need for real-time processing. These research areas will provide concrete examples of the applications of discussed models. Lastly, we identify some of the current issues and suggest future directions for research concerning anomaly detection systems. We aim to provide readers with a comprehensive picture of what is already out there so they can better understand the space – preparing them for further development within this growing field.

**Keywords:** anomaly detection, anomaly detection in time series, high dimensional data, big data, current focus and future challenges, machine learning, deep learning, forecasting, real time

#### **1. Introduction**

Time series data mining is becoming increasingly important due to the advances in technology which have allowed us to collect and store large amounts of structured temporal data. A wide range of tasks can be performed with this time series data such as classification [1], clustering [2], forecasting [3] and outlier detection [4–6]. Extracting meaningful insights from this data opens up new opportunities for research across diverse areas.

Anomaly detection in time series is a critical task with significant implications in numerous fields, including finance [7], healthcare [8], and security [9]. Identifying and analyzing outliers in time-series data is a critically important operation for obtaining meaningful insights [6]. As described in a seminal paper, there are two types of univariate time-series outliers: type I and type II [10, 11]. Whilst a type I outlier affects only an individual observation, a type II outlier also influences subsequent observations. It is essential to have an in-depth understanding of both outlier types if a meaningful analysis of the data is to be undertaken.

The detection of unusual patterns or events within data streams can provide valuable insights and help identify potentially harmful or fraudulent behavior. However, processing high-dimensional [12, 13] and complex data streams [14] in real-time [15] remains a challenging task. In recent years, many statistical [16], machine learning [17], and deep learning [18] techniques have been proposed to tackle these challenges.

This chapter provides an overview of the current state-of-the-art models for anomaly detection in time series, including their strengths and limitations. We also explore the areas of research that researchers in this field currently focus on, where recent machine learning and deep learning techniques and models are being applied to solve the unique challenges posed by high-dimensional and complex data, high-volume data streams, and the need for real-time processing.

One of the primary challenges of anomaly detection in time series is dealing with high-dimensional and complex data. Traditional statistical methods such as ARIMA [19] and exponential smoothing [20] have been used in the past but lack the flexibility to handle complex data with multiple attributes. Newer techniques, including machine learning-based approaches such as isolation forests [21], autoencoder-based methods [22], and deep learning techniques such as LSTM and CNNs [18], have shown promising results in detecting anomalies in high-dimensional and complex data. However, these approaches also have their limitations, including high computational costs and the need for extensive training data [23].
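As a concrete statistical baseline, a minimal exponential-smoothing detector flags points whose deviation from the smoothed level exceeds a threshold (the smoothing factor and threshold here are illustrative choices, not from the cited works):

```python
def exp_smoothing_anomalies(series, alpha=0.3, threshold=3.0):
    """Flag indices whose deviation from the smoothed level exceeds threshold."""
    anomalies, level = [], series[0]
    for i, x in enumerate(series[1:], start=1):
        if abs(x - level) > threshold:
            anomalies.append(i)        # skip the update so the outlier
        else:                          # does not corrupt the level
            level = alpha * x + (1 - alpha) * level
    return anomalies

series = [10.0, 10.2, 9.9, 10.1, 25.0, 10.0, 9.8]
print(exp_smoothing_anomalies(series))   # [4]: only the spike is flagged
```

Such single-level smoothing illustrates the limitation noted above: it tracks one attribute at a time and cannot capture interactions in multi-attribute data.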

Another challenge in anomaly detection in time series is dealing with high-volume data streams in real-time [24]. Traditional batch processing methods are not suitable for real-time applications where timely detection is critical. As a result, new methods such as online anomaly detection algorithms [25] and sliding window-based [26] approaches have been proposed, which process data in a continuous and efficient manner.
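A sliding-window detector of the kind referenced above can be sketched with a fixed-size buffer that scores each new point against the window statistics; the window size and z-score threshold are illustrative choices:

```python
from collections import deque
import math

class SlidingWindowDetector:
    """Online anomaly detection over a fixed-size sliding window."""
    def __init__(self, window=20, z_threshold=3.0):
        self.buf = deque(maxlen=window)
        self.z = z_threshold

    def update(self, x):
        """Return True if x is anomalous w.r.t. the current window."""
        if len(self.buf) >= 2:
            mean = sum(self.buf) / len(self.buf)
            var = sum((v - mean) ** 2 for v in self.buf) / len(self.buf)
            std = math.sqrt(var)
            anomalous = std > 0 and abs(x - mean) > self.z * std
        else:
            anomalous = False              # not enough history yet
        if not anomalous:
            self.buf.append(x)             # keep the window anomaly-free
        return anomalous

det = SlidingWindowDetector(window=10)
flags = [det.update(x) for x in [1.0, 1.1, 1.0, 0.9, 1.2, 0.8, 1.0, 9.0, 1.1]]
print(flags)   # only the 9.0 spike is flagged
```

Because each update touches only the bounded buffer, memory and per-point cost stay constant regardless of stream length, which is the property that makes such detectors suitable for continuous processing.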

In addition to the current state-of-the-art models, techniques, and areas of research, this chapter also identifies some of the current issues and suggests future directions for research concerning anomaly detection systems. These include the need for more interpretable models, the development of novel unsupervised methods, the integration of domain knowledge, and the need to address the issue of data imbalance.

Subsequent sections of this chapter discuss recent algorithms and models, both individually and within the current broad areas of research: forecasting, real-time anomaly detection, big and high-dimensional data processing, anomaly detection using artificial intelligence (AI), and industrial control systems. A literature review covers the models and strategies implemented in these areas, all of which address the challenges discussed above. Overall, the purpose is to provide readers with a literature review of the current focuses of researchers and to facilitate further development and research in time-series anomaly detection.

#### **2. Big data and high dimensionality**

Modern data sets are increasingly high-volume, high-velocity, and high-variety, making it difficult to identify anomalies accurately, and researchers and developers have started to investigate new approaches for coping with this complexity. As the number of features grows, exponentially more data is needed to create accurate models, leading to sparse and isolated data points. Gartner [13] defines big data as a collection of attributes that require cost-effective analytics to generate insight. The key challenges associated with big data are described by the 5 Vs: value, veracity, variety, velocity, and volume [27].

The following paragraphs give an overview of how real-time big data processing, combined with machine learning algorithms, can be used to detect anomalous events, along with its current limitations and challenges.

McNeil et al. [28] examined existing tools used to detect malware on mobile devices. It was noted that these methods lacked the capability to incorporate group user profiling, which is necessary to automate behavior-based dynamic analysis for focused malware identification. To overcome this, they demonstrated a scalable architecture called SCREDENT, which allows users to classify, identify, and forecast possible target malwares in real time. While initial evaluation indicated that the approach had promise, further testing failed to demonstrate desired results.

In reference [29], a new architecture was proposed to detect threats in real-time using stream processing and machine learning. This architecture promotes an environment with minimal human oversight, allowing for improved detection of both known and previously unseen cyberattacks in order to hone attack classification and anomaly detection capabilities. However, their results did not benefit from the open KDD dataset as much as expected.

In reference [30], another approach targets complex network infrastructures with vast log files, assessing security logs that contain information from various devices through data mining and machine learning techniques. The approach is split into two phases: defining and configuring the detection mechanism, then executing it at runtime. In practice, however, the implementation required considerable human intervention and thus more automation, and its output accuracy was not precise enough.

Recent research has highlighted the use of machine learning models for anomaly detection, but the inability of inherent system performance to keep up with increasing network traffic is an issue that needs to be addressed. To do so, a novel model utilizing Hadoop, HDFS, MapReduce, cloud infrastructure, and multiple machine learning algorithms was developed. The Weka interface was used to assess accuracy and efficiency with naive Bayes, decision tree, and support vector methods. However, the cloud infrastructure and real-time data streaming aspects of this project have not yet been sufficiently discussed [31].

Research paper [32] focuses on anomaly detection in streaming data, and provides a new approach to evaluate online anomaly detection with entropy and Pearson correlation. Big data streaming components like Kafka queues and Spark Streaming are used as a means of ensuring scalability and generality, although some processes which were potentially complicated by long batch processing periods or data limitations were not resolved.

Researchers have proposed a method for anomaly detection in smart grids that operates in real time with minimal energy consumption [33]. The proposed in-memory distributed framework, comprising Spark Streaming and the Lambda System, is viable for scalable live streaming. However, model training took longer, and there were scheduling issues with real-time tasks.

In reference [34], another framework was presented - one which involves sensor data preprocessing: anomaly detection using principal statistical analysis and Bayesian networks; as well as sensor data redundancy elimination using static or dynamic Bayesian networks (SBNs/DBNs). Included were two algorithms: static sensor data redundancy detection algorithm (SSDRDA) and real-time sensor data redundancy detection algorithm (RSDRDA), both serving to reduce redundant data in either static datasets or real-time scenarios respectively.

Applying the above frameworks to anomaly detection in time series requires modifications for efficiency. Anomaly detection in real-time big data analytics is a promising area of study, particularly when machine learning techniques are incorporated, and advancements in this field are likely to yield high accuracy and efficiency. Thus, the potential benefits of such research should not be underestimated.

The following points lay out the foremost research challenges in this field, in order to promote progress.

**Redundancy:** Managing real-time big data from diverse sources is challenging. Current technologies like Hadoop and Spark fail to address redundancy, data quality and reliability, cost [35], and storage schema [36]. A new framework is needed to tackle these complexities.

**Computational cost:** Anomaly detection requires multiple techniques, increasing computational cost. Large datasets and high dimensionality cause algorithmic instability and computational expense [23]. Big data and cloud technology enable parallel and distributed processing and reduce computing costs, while cheaper processors and higher-performance chips improve system power and real-time data processing, minimizing computational expense.

**Nature of Input data:** Input data has instances with binary/categorical or continuous attributes, and can be univariate or multivariate. Anomaly detection algorithm selection depends on data diversity and attribute type [37]. A hybrid framework using unsupervised machine learning algorithms can detect anomalies in different datasets.

**Noise and missing values:** Streaming data from network sensors comes in many forms and, at high speeds, can produce false alarms due to noise and missing values [37]. Noise can also hide true anomalies [38]. An automatic noise-cleansing module within the detection framework can remove unnecessary features and handle missing values.

**Parameters Selection:** Optimal parameters for machine learning algorithms are hard to select [39]. A real-time anomaly detector must consider single and multiple hyperparameters, which may change over time [40]. Parameter choice affects algorithm performance; eccentricity-based techniques can reduce the selection effort [41].

**Inadequate Architecture:** Organizations need big data architectures for large volumes of real-time data, and existing architectures are insufficient. Real-time analytics and application components can create an efficient environment [42]. Big data technologies and hybrid machine learning algorithms can solve these architectural problems while achieving scalability for data both in motion and at rest.

**Data visualizations:** Data and reports need effective visual insights. Anomalies from connected devices can be presented with heat maps, scatter plots, parallel coordinates, and node-link graphs in 2D/3D views. 3D interaction requires an understanding of the data and support for user rotation and zoom [43]. Open-source visualization techniques built into frameworks can automatically select techniques for a better user experience.

**Heterogeneity of data:** Unstructured data is varied and large, such as emails, faxes, form documents, social media posts, etc. Transcription is expensive. Hybrid Machine Learning algorithms can identify data types quickly and accurately. Complex machine learning models can recognize heterogeneous information sources from unstructured text.

**Accuracy:** Anomaly detection with existing technologies is inaccurate. A hybrid machine learning algorithm can analyze large data from modern applications with low memory and power. Our team combines real time big data technologies with this algorithm for efficient and accurate results.
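The noise-and-missing-value challenge above is often addressed by a preprocessing step before detection. As a minimal illustrative sketch (the gap data and window size are invented for the example, and real pipelines would use more robust methods), the code below linearly interpolates missing readings and applies a median filter to suppress an impulsive noise spike.

```python
from statistics import median

def fill_missing(x):
    """Linearly interpolate None gaps (assumes first/last values exist)."""
    x = list(x)
    i = 0
    while i < len(x):
        if x[i] is None:
            j = i
            while x[j] is None:          # find the end of the gap
                j += 1
            step = (x[j] - x[i - 1]) / (j - i + 1)
            for k in range(i, j):
                x[k] = x[i - 1] + step * (k - i + 1)
            i = j
        i += 1
    return x

def median_filter(x, w=3):
    """Median smoothing to suppress impulsive sensor noise."""
    half = w // 2
    return [median(x[max(0, i - half):i + half + 1]) for i in range(len(x))]

raw = [1.0, 1.1, None, None, 1.4, 9.0, 1.6, 1.7]   # a gap and a noise spike
filled = fill_missing(raw)
clean = median_filter(filled)
```

Note the trade-off: a median filter that removes single-sample noise spikes will also erase genuine single-point anomalies, so in practice the cleansing step must be tuned against the anomaly types of interest.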

#### **3. Transformer**

The Transformer architecture, introduced by Vaswani et al. [44], has been widely used in various natural language processing tasks, such as machine translation and text classification. However, its effectiveness in anomaly detection is limited by the lack of a specific mechanism to capture anomalous patterns. To address this shortcoming, researchers have created a variety of modifications intended to improve its performance, including the incorporation of an Anomaly-Attention mechanism. One such modification is the Anomaly Transformer, as illustrated in reference [45], which utilizes this mechanism to improve the detection of anomalous patterns in data. Thus, the Anomaly Transformer architecture represents an important advancement in the application of Transformers for anomaly detection.
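A toy example can show why attention distributions carry anomaly signal. This is not the learned Anomaly-Attention of [45]: here the attention scores are simply negative squared differences between raw values (a hand-built stand-in for learned query-key projections), and the series is invented. A point that resembles nothing else ends up attending almost entirely to itself, while normal points spread their attention broadly.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def similarity_attention(x):
    """For each point, a distribution over all points, scored by
    negative squared difference (a stand-in for learned attention)."""
    return [softmax([-(xi - xj) ** 2 for xj in x]) for xi in x]

series = [1.0] * 8
series[4] = 5.0                      # anomalous point
weights = similarity_attention(series)
```

Here `weights[4]` concentrates almost all mass on index 4 itself, whereas `weights[0]` is spread nearly uniformly over the seven normal points; thresholding a concentration measure of each row already separates the anomaly in this toy setting.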

Recent studies have focused on leveraging Transformer-based architectures for the time series anomaly detection task. These approaches can model temporal dependencies, enabling better anomaly detection quality [45]. For instance, TranAD [46], MT-RVAE [47], and TransAnomaly [48] all fuse Transformers with neural generative models, i.e., VAEs [49] or GANs [50], demonstrating improved performance in anomaly detection. These models are explained further below.

**TranAD:** TranAD [46] is a robust adversarial training procedure designed to address small deviations in anomalies that the typical Transformer-based network may overlook. This GAN-style approach consists of two transformer encoders and two decoders in order to maintain stability. The results of an ablation study illustrate the performance of this architecture, with F1 scores dropping by nearly 11% when the Transformer-based encoder-decoder was replaced. This clearly demonstrates the efficacy of using Transformers for time series anomaly detection. It's valuable for modern industrial systems where instant detection of anomalies is compulsory.

**MT-RVAE:** Two different approaches combine a VAE and a Transformer to create novel models for time-series analysis. MT-RVAE, proposed by Wang et al. [47], uses a multiscale Transformer model to extract information from sequences of varying scales, as in complex satellite systems with several subsystems whose temporal features must be analyzed in correlation [47]. This addresses the limitation of traditional Transformers, which were restricted to local information extraction when analyzing sequential data. TransAnomaly, proposed by Zhang et al. [48], combines a VAE with a Transformer for increased parallelization; the combination is predicted to reduce training costs by up to 80%.

**GTA:** GTA [51] leverages Transformers and graph-based learning to accurately detect anomalies in multivariate time series data, even when there are few dimensions or limited close relationships among sequences. This method features a multi-branch attention mechanism composed of global-learned attention, regular multi-head attention, and neighborhood convolution for increased accuracy, as well as a graph convolution structure for modeling influence propagation processes. Thus, GTA seeks to provide an improved approach for analyzing and detecting anomalies in multivariate time series data compared with previous methods. It is valuable for Internet-connected sensory devices, such as smart power grids and water distribution networks, which remain under threat of cyber-attacks [51].

**AnomalyTrans:** AnomalyTrans [45] is a novel approach to distinguishing anomalies. Drawing inspiration from TranAD, AnomalyTrans makes it more difficult for anomalies to create strong connections with the entire time series while retaining connectivity between adjacent time points. The model leverages a Transformer and a Gaussian prior-association to reach this objective. By utilizing a minimax strategy to optimize the anomaly model, AnomalyTrans enforces restrictions on the prior- and series-associations that result in a greater divergence between them.
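The association-discrepancy idea can be sketched numerically. In the real model both distributions are learned; here they are fixed by hand as an assumption for illustration: the prior-association is a Gaussian kernel over temporal distance, a "normal" point attends broadly (near-uniform), and an "anomalous" point attends only to its neighbours. The symmetric KL divergence between prior and series rows is then smaller for the anomaly, matching the intuition that anomalies fail to build series-wide associations.

```python
import math

def gaussian_row(n, i, sigma):
    """Normalized Gaussian kernel over temporal distance from point i."""
    w = [math.exp(-((i - j) ** 2) / (2 * sigma ** 2)) for j in range(n)]
    s = sum(w)
    return [v / s for v in w]

def sym_kl(p, q):
    kl = lambda a, b: sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)
    return kl(p, q) + kl(q, p)

n = 8
prior = gaussian_row(n, i=3, sigma=1.0)    # prior-association (local kernel)
broad = [1.0 / n] * n                      # a normal point attends broadly
narrow = gaussian_row(n, i=3, sigma=0.5)   # an anomaly attends only nearby

disc_normal = sym_kl(prior, broad)
disc_anomaly = sym_kl(prior, narrow)
```

A low discrepancy therefore serves as an anomaly indicator in this simplified picture, which is the quantity the minimax training in AnomalyTrans is designed to amplify.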

**D3TN:** The Disentangled Dynamic Deviation Transformer Network (D3TN) [52] is a highly effective system for multivariate time series anomaly detection. It considers both short-term and long-term temporal dependencies as well as complex inter-sensor dependencies. To better model the static topology of fixed inter-sensor relationships, a new disentangled multi-scale aggregation scheme for graph convolution was introduced. A self-attention mechanism was also employed to capture dynamic directed interactions in various subspaces that vary with time and unexpected events. Moreover, parallel processing of the time series helps model complex temporal correlations that span multiple time periods.

**DATN:** The Decompositional Auto-Transformer Network (DATN) [53] is a unique anomaly detection method for time series. This novel approach breaks complex time series into seasonal and trend components before modeling them with deep networks. Additionally, the design integrates an auto-transformer block to detect important representations and dependencies based on the seasonality and trends in the series. Furthermore, rather than a traditional, complex Transformer decoder, the design substitutes a more efficient linear decoder.
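The seasonal/trend split that DATN builds on can be illustrated with the classical moving-average decomposition (a simplified stand-in for DATN's learned decomposition; the series, its period, and the window size are invented for the example): a centred moving average estimates the trend, and subtracting it leaves the seasonal pattern.

```python
def moving_average(x, w=5):
    """Centred moving average; edge windows shrink to available points."""
    half = w // 2
    out = []
    for i in range(len(x)):
        lo, hi = max(0, i - half), min(len(x), i + half + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out

# Hypothetical series: linear trend plus a period-4 seasonal pattern.
series = [0.5 * t + [0, 2, 0, -2][t % 4] for t in range(40)]
trend = moving_average(series)
seasonal_residual = [s - m for s, m in zip(series, trend)]
```

Away from the edges the residual repeats with the seasonal period, so deviations from that repeating pattern (or from the smooth trend) can each be scored separately, which is the motivation for decomposition-based detectors.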

Transformers have been applied to anomaly detection in many real-world time series datasets. SMD is a 5-week-long dataset with 38 characteristics acquired from one of the leading Internet companies. Pooled Server Metrics (PSM) was procured internally from multiple server nodes at eBay and consists of 26 variables. Besides these, the Mars Science Laboratory rover (MSL) and Soil Moisture Active Passive (SMAP) satellite datasets from NASA contain 55 and 25 features respectively, with anomalies derived from the Incident Surprise Anomaly (ISA) reports for spacecraft monitoring systems. Finally, the Secure Water Treatment (SWaT) dataset contains 51 indicators derived from the continuous operation of a critical infrastructure system [45, 53].

Possible future challenges are outlined in the paragraphs below.

**Inductive Biases for Time Series Transformers:** Transformers are powerful, general networks for modeling long-range dependencies, but they require a considerable amount of data to train effectively without overfitting. Time series data often follow seasonal or periodic patterns, as well as other trends, which suggests that incorporating this information into Transformers could improve performance. For instance, recent studies have demonstrated the effectiveness of frequency processing [54] and of capturing series periodicity [55]. Additionally, both explicitly allowing cross-channel dependency [56] and preventing it via a channel-independent attention module [57] have yielded better models for certain tasks. The challenge lies in balancing the design of inductive biases that suppress noise while amplifying signal, a task whose solution is yet to come but promises exciting possibilities ahead.

#### *Anomaly Detection in Time Series: Current Focus and Future Challenges DOI: http://dx.doi.org/10.5772/intechopen.111886*

**Transformers and GNN for Time Series:** As datasets with multi-dimensional and spatial–temporal characteristics become more widespread, it is essential to have tools that can effectively capture the complexities these data represent. Graph Neural Networks (GNNs) are one method of modeling the dependencies and relations between dimensions. Recent studies have shown that combining GNNs with Transformers/attention leads to impressive performance improvements in areas such as traffic forecasting [58, 59] and multi-modal forecasting [60, 61], and can deepen the understanding of latent causality in spatial–temporal data. This is an important development that could lead to more effective use of Transformer-GNN hybrid models for spatial–temporal modeling of time series going forward.

**Pre-trained Transformers:** While large-scale pre-trained Transformers have yielded notable improvements across a wide range of natural language processing [62, 63] and computer vision [64] tasks, research on their efficacy for time series applications has been limited, with existing work primarily focused on classification [65, 66]. Developing effective pre-trained Transformer models that address a broader range of use cases within time series analysis will require further examination in the future.

**Architecture Level Variants:** Considering the success of Transformer variants in NLP and CV, it may be beneficial to transfer this concept over to time series data and tasks. We can look into more architecture-level designs for Transformers which may optimize performance on time series specific models. Examples of these variants include lightweight [67, 68], cross-block connectivity [69], adaptive computation time [70, 71], and recurrence [72]. These architecture-level designs provide us with a whole new range of opportunities for improvement.

**Transformers with Neural Architecture Search:** Tuning Transformer hyperparameters such as the embedding dimension and the numbers of heads and layers can have a significant impact on performance. Thankfully, Neural Architecture Search (NAS) provides a means to automatically find architectures that optimize performance, and NAS technologies from NLP and CV have recently been applied to Transformers [73, 74]. For machine data, which may be both high-dimensional and long, this technique is especially important for designing memory- and computation-efficient Transformers. We anticipate further progress in this area as the field gears up for more efficient time series Transformers.

#### **4. Non-pattern anomaly detection**

Non-Pattern Anomaly Detection (NP-AD) is an under-explored but powerful method of identifying anomalies in time series. Existing techniques use initial profiling to determine which behavior should be tagged as "normal" or "abnormal," but this definition fails to capture the nuanced changes between situations under different conditions. Researchers have recognized the importance of such a technique and emphasized its potential for detecting abnormalities even in the absence of the statistical methods that often dominate machine learning processes. A team of researchers compared current machine learning algorithms relating to NP-AD approaches and assessed, across various datasets, their capacity to detect anomalies in diverse situations [75].

#### **5. Hybrid models**

Multivariate time-series anomaly detection is a complex challenge due to the imbalance of anomalous data and its underlying intricacies. Combining different methods for detecting anomalies in time series has been well explored, resulting in improved accuracy. Notably, hybrid models combining statistical and deep learning approaches have been found to provide greater precision when determining uncertainty and quantifying forecasts associated with these models.

One example hybrid approach is called HAD-MDGAT – it's based on a GAT (graph attention network) combined with multi-channel temporal stacked Denoising Autoencoder (MDA), designed to learn temporal and spatial correlations among observations. Ablation study results show that MDA enhances anomaly detection accuracy dramatically; this model with an MDA layer scored 10.86% higher than one without the extra layer [76].

A research paper published in reference [77] outlined a novel Long Short-Term Memory (LSTM) network-based method for accurately forecasting multivariate time series data. In addition, the study featured an LSTM autoencoder network-based approach coupled with a one-class Support Vector Machine (OCSVM) algorithm, which was employed for anomaly detection. The findings demonstrated that the LSTM autoencoder-based method outperforms the previously proposed LSTM-based method; moreover, the proposed forecasting approach surpassed several other methods by NASA. The LSTM-based methodology is well suited to forecasting, while the combination of an LSTM autoencoder with the OCSVM is suitable for detecting anomalies [77].

MES-LSTM is a combination of a multivariate forecasting model and Long Short-Term Memory, a form of Recurrent Neural Network (RNN). Accurate attribution is an important part of any system as it reinforces confidence in the mechanics and makes sure learning processes are not based on spurious effects. While MES-LSTM does a great job of anomaly detection, overall performance could still benefit from improvement [78].

A hybrid deep-learning model that integrates long short-term memory (LSTM) and autoencoder (AE) networks was proposed for anomaly detection in Indoor Air Quality (IAQ) time series data. The LSTM cells are stacked to learn the long-term dependencies in the time series, while the AE helps identify an optimal threshold based on the reconstruction loss rates across all sequences. This powerful combination helps detect outliers with precision and efficiency [79].
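The threshold-from-reconstruction-loss step can be sketched without the network itself. As an illustrative assumption, the autoencoder is elided and its per-sequence reconstruction errors on a mostly-normal validation set are given as a synthetic list; a high percentile of those errors is a common recipe for the detection threshold (the 99th percentile here is an arbitrary choice, not the setting of [79]).

```python
def percentile(xs, q):
    """Nearest-rank percentile, q in [0, 100]."""
    s = sorted(xs)
    k = int(round(q / 100 * (len(s) - 1)))
    return s[min(max(k, 0), len(s) - 1)]

# Hypothetical reconstruction errors from a validation set assumed to be
# mostly normal; the autoencoder producing them is elided here.
val_errors = [0.01 * (i % 10) for i in range(200)]
threshold = percentile(val_errors, 99)

# New sequences are flagged when their error exceeds the threshold.
test_errors = [0.03, 0.05, 0.42, 0.02]
flags = [e > threshold for e in test_errors]
```

Choosing the percentile trades missed anomalies against false alarms, which is why [79] treats threshold selection as part of the model rather than a fixed constant.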

A SeqVAE-CNN model has been proposed to carry out unsupervised deep-learning-based anomaly detection. This model takes inspiration from Variational Autoencoders (VAEs) and Convolutional Neural Networks (CNNs), creating a Seq2Seq structure that can capture both temporal relationships and spatial features in multivariate time-series data. The experimental results on 8 datasets from different domains suggest it has higher performance for anomaly detection; indeed, the highest AUROC and F1 scores were observed when using their model [80].

Researchers have also proposed a hybrid VAE-LSTM model for unsupervised anomaly detection in time series. This model combines the features extracted by the VAE module [81], which capture local patterns within short windows, with a Long Short-Term Memory (LSTM) module, which captures long-term correlations in the time series. Separately, because electrical power grids are vulnerable to cyber-attacks and existing attack detection methods are limited, a Graph Convolutional Long Short-Term Memory (GC-LSTM) network combined with a deep convolutional network has been proposed to further improve time series classification and analysis with respect to anomaly detection and attack graph models [82].

A new hybrid anomaly detector merges two detection approaches, physics-based Key Performance Indicators (KPIs) and an unsupervised Variational Autoencoder (VAE), thereby improving accuracy and decreasing the possibility of overlooking defective elements in safety-critical scenarios. Its performance is compared against different VAE architectures such as the long short-term memory VAE (LSTM-VAE) and the bidirectional LSTM VAE (BiLSTM-VAE). Additionally, the hyperparameters of these structures can be optimized with the help of a genetic algorithm, as presented in reference [83].

Due to the many advantages of conventional time series anomaly detection models, further innovation in this area has the potential to yield beneficial results. In fact, tackling the complexities associated with real-world time series requires advanced solutions, such as hybridizing hybrid model classes. Research shows that this technique can provide great improvements in forecasting accuracy, and it has been gaining much attention recently.

#### **6. Forecasting and anomaly**

Time series forecasting has always been a useful tool for detecting trends and patterns in data: it predicts upcoming time stamps from previous or existing trends. Anomaly detection and time series forecasting have been interlinked repeatedly by researchers in this field. Several machine learning algorithms have been implemented, sometimes merged with one another into novel strategies, to predict whether the next time stamp is normal or abnormal.

The power of forecasting lies in its potential to revolutionize healthcare. The goal? To empower medical professionals to take proactive and timely action, reducing patient transfers and hospital stay lengths, ultimately leading to improved survival rates. But the accuracy of predictions relies heavily on expertly combining machine learning algorithms like autoencoders and extreme gradient boosting (XGBoost) [84].

Autoencoders excel in feature extraction [85] and are uniquely adept at unsupervised anomaly detection when labeled data is scarce or nonexistent [86]. They are trained via reconstruction error, triggering an alert only when that error exceeds a predetermined threshold, prompting a swift remedial response. As for XGBoost, this decision-tree-based ensemble method takes physiological variables at time tᵢ as input and outputs the variables for the next temporal unit, tᵢ₊₁ [84].

All told, tapping into modern technology's full potential could allow for massive improvements in healthcare outcomes - starting with careful utilization of solutions like autoencoders or XGBoost models.

Recent research has sought to compare the performance of supervised and unsupervised algorithms on physiological data. Heart rate data, due to its ubiquity and non-invasiveness, is ideal for predicting anomalies. Five algorithms were evaluated for detecting anomalies in heart rate: two unsupervised techniques and three supervised methods. The models were tested on real heart rate data, and the findings demonstrated that both the local outlier factor and random forest algorithms were effective in detecting abnormalities in this type of data. Additionally, results showed that when real labeled data is unavailable, simulated data can bring algorithms to a similar level of performance, enabling rapid initial deployment without prior knowledge [8].

DeepAnT is a deep learning-based anomaly detection approach for streaming and non-streaming time series data. It can detect a broad range of anomalies, from point anomalies to contextual anomalies and discords. Instead of learning what anomalies look like, DeepAnT uses unlabeled data to model normal time series behavior. Its two key components are a time series predictor, which uses a CNN and takes context into account, and an anomaly detector module, which identifies whether an upcoming time stamp is normal or anomalous.

DeepAnT stands out against the competition by needing only a relatively small data set to generate a model. It utilizes the parameter sharing of a convolutional neural network (CNN), which allows for good generalization capabilities. Unsupervised anomaly detection in DeepAnT removes the need for labeling, making it directly applicable to real-world scenarios with large streams of complex data from heterogeneous sensors. Neural networks are popular because they enable automatic feature discovery without prior domain knowledge; this capability makes them excellent candidates for time series anomaly detection. By leveraging a CNN on raw data, DeepAnT is more robust to variations than many other neural networks and statistical models on the market [87].

Using a data-driven approach can be beneficial in many contexts, especially when there is access to an abundance of untagged data. However, data quality has a great impact on accuracy: if too much of the dataset is contaminated (5% or more), wrong inferences may be drawn upon deployment. Additionally, selecting the right network architecture and hyperparameters is often difficult, although new automated techniques have been developed that may assist in optimizing these settings instead of relying on human expertise [88]. Last but not least, one major drawback is the susceptibility to adversarial examples [89], which could restrict its usage in safety-critical systems. Fortunately, research into understanding and defending against such cases has increased progressively over time, with some successful results achieved.
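DeepAnT's predictor-plus-detector split can be sketched in miniature. As a deliberate simplification, the CNN predictor is replaced here by a window-mean forecast (a stand-in, not DeepAnT's actual model), and the detector flags time stamps whose observed value lies far from the prediction; the series and threshold are invented for the example.

```python
def predict_next(history, w=5):
    """Stand-in predictor: mean of the last w points (DeepAnT itself
    uses a CNN trained on windows of normal data)."""
    tail = history[-w:]
    return sum(tail) / len(tail)

def detect(series, w=5, threshold=1.0):
    """Flag time stamps whose observed value is far from the prediction."""
    anomalies = []
    for t in range(w, len(series)):
        pred = predict_next(series[:t], w)
        if abs(series[t] - pred) > threshold:   # distance-based detector
            anomalies.append(t)
    return anomalies

series = [2.0] * 30
series[20] = 7.0                                # injected point anomaly
anomalies = detect(series)
```

The separation matters in practice: improving the predictor (e.g., swapping the mean for a CNN) improves detection without touching the detector, and vice versa for the scoring rule.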

Light curve prediction and anomaly detection using LSTM neural networks is an important research area for time-domain astronomy. A series of processing steps was applied to star images collected by the mini-GWAC system of the National Astronomical Observatories of China, yielding light luminance data over a period of time. Researchers explored an LSTM neural network model to accurately predict light curves, with an optimal structure obtained through model training and validation; meanwhile, an anomaly detection mechanism based on prediction error was implemented. Results showed that this method has great potential when tested on real light curve data [90]. More historical data and certain well-known astronomical principles are needed to improve the method further.

Motorsports teams have limited access to sensors during competitions, limiting predictive capabilities and providing an edge to competitors. The proposed variational autoencoder-based selective prediction (VASP) framework addresses this challenge by combining the tasks of anomaly detection and time series prediction in one approach. VASP consists of a variational autoencoder (VAE), an anomaly detector, and LSTM predictors, which work together to produce more robust predictions. Even if anomalies occur in the input signals, VASP's accuracy is not significantly impacted, unlike that of other deep learning approaches such as long short-term memory (LSTM) neural networks, making it an effective technique for more robust predictive insights [91].

#### **7. Anomaly detection using AI**

Time series data bring their own set of challenges when model analysis is applied, such as notions of time and uncertainty and the presence of drift. Typically, the time series window is broken down into two pieces, with either sliding endpoints or landmark endpoints. Following reference [92], anomalies and outliers are treated here as the same. Detecting these outliers has been, and remains, an active area of exploration for researchers and practitioners alike. Time series data is one of the most useful modalities available for a variety of applications, and outlier detection plays a key role in its analysis. Companies such as Microsoft [93] have even created outlier detection services that monitor business data and trigger alerts when outliers are present.

As stated in reference [94], AI assurance is an important process that must be incorporated throughout the engineering lifecycle of an AI system. This process should ensure the system is dependable and its outcomes valid, trustworthy, and ethical. Moreover, it should also be data-driven, explainable to all users, unbiased in its learning processes, and fair for all involved.

One of the most discussed recent AI algorithms for anomaly detection in time series is the generative adversarial network (GAN), proposed by Goodfellow [50]. A GAN is composed of two models: a generator, used here to generate expected normal behavior, and a discriminator, which distinguishes between "normal" and "abnormal" behaviors. When dealing with imbalanced industrial time series data, GANs can be used to derive an anomaly detection architecture that outperforms classic algorithms and other deep learning models such as big-GAN, AnoGAN, and DBN [95]; the cited article further elaborates on the inner workings of GANs and their core design considerations. Additionally, drawing on research by Li et al. [96], this architecture can feature a dynamic threshold generated by the discriminator, which serves as a predictive warning for system failures or anomalies. The GAN-based approach diagnoses faults by generating much higher anomaly scores when a fault sample is fed into the trained model [95].
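The scoring side of such GAN-based detectors can be sketched with an AnoGAN-style combined score: a weighted sum of the reconstruction residual from the generator and a discrimination term. Everything here is a hand-built stand-in for illustration, where a real generator and discriminator would be learned from normal data, and the weight `lam` and the toy functions are assumptions, not values from [95] or [96].

```python
import math

# Hand-built stand-ins for trained GAN components (illustration only):
def reconstruct(x):
    """The 'generator's' best reconstruction of any input is the
    normal operating level, taken here to be 0.0."""
    return 0.0

def discriminate(x):
    """Probability the 'discriminator' assigns to x being normal."""
    return 1.0 / (1.0 + math.exp(abs(x) - 3.0))

def anomaly_score(x, lam=0.5):
    """AnoGAN-style score: weighted residual plus discrimination term."""
    residual = abs(x - reconstruct(x))
    discrimination = 1.0 - discriminate(x)
    return lam * residual + (1.0 - lam) * discrimination

normal_score = anomaly_score(0.2)   # typical reading: low score
fault_score = anomaly_score(8.0)    # fault sample: high score
```

A fault sample scores much higher on both terms, which is the behavior the text above describes: the trained model assigns far larger anomaly scores to fault inputs than to normal ones.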

Researchers have also developed a new anomaly detection algorithm for multivariate time series called the Graph Attention Network and Temporal Convolutional Network for Multivariate Time Series Anomaly Detection (GTAD). This algorithm takes into account the correlations and temporal dependencies that many existing algorithms fail to address, promising better results when spotting anomalies in complex data sets. GTAD is an unsupervised approach powered by graph attention networks and temporal convolutional networks [97].

TadGAN is an unsupervised anomaly detection approach built on generative adversarial networks (GANs). At its core are long short-term memory (LSTM) recurrent neural networks, which serve as the base models for both the generator and the critic. TadGAN is distinctive in capturing temporal correlations through a cycle-consistency loss, yielding more accurate time-series reconstruction [98].
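TadGAN derives its final anomaly score by fusing the generator's reconstruction error with the critic's output. The sketch below illustrates one such fusion (z-normalize each signal, then take a convex combination); the synthetic arrays, the weight `alpha`, and the sign convention (larger critic value = more anomalous) are illustrative assumptions, not TadGAN's exact formulation.

```python
import numpy as np

def fuse_scores(recon_err, critic_score, alpha=0.5):
    # Z-normalize each signal, then take a convex combination, so the
    # two anomaly indicators contribute on a comparable scale.
    z = lambda x: (x - x.mean()) / x.std()
    return alpha * z(recon_err) + (1.0 - alpha) * z(critic_score)

rng = np.random.default_rng(1)
n = 200
recon_err = rng.normal(1.0, 0.1, n)   # stand-in reconstruction errors
critic = rng.normal(0.0, 0.1, n)      # stand-in critic outputs
recon_err[120:125] += 2.0             # both indicators react to an anomaly
critic[120:125] += 2.0

score = fuse_scores(recon_err, critic)
print(int(np.argmax(score)))
```

Fusing two imperfect indicators this way tends to suppress false alarms that only one of them raises.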

**Future concerns:** Combining information across the different dimensions of a multivariate time series is a key focus of future work on these algorithms [95]. For GAN-based anomaly detection models, choosing the right sliding window length and maintaining stability during training remain difficult, and further research is needed to train GANs more effectively [50].
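Window-based detectors such as these all begin by slicing the series into overlapping windows, which is where the window-length choice enters. A minimal sketch of that preprocessing step (the function name and parameters are illustrative):

```python
import numpy as np

def sliding_windows(series, length, stride=1):
    # Stack overlapping windows of a (time, features) array into a
    # (num_windows, length, features) tensor for window-based detectors.
    t = series.shape[0]
    starts = np.arange(0, t - length + 1, stride)
    return np.stack([series[i:i + length] for i in starts])

x = np.random.default_rng(2).normal(size=(100, 3))  # 100 steps, 3 features
w = sliding_windows(x, length=30, stride=5)
print(w.shape)  # (15, 30, 3)
```

A longer `length` gives the model more temporal context but delays detection and increases memory cost, which is exactly the trade-off noted above.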

#### **7.1 AI-based toolkits for automated anomaly detection**

**TODS:** TODS is a comprehensive automated time series outlier detection system with a modular design that enables easy construction of pipelines. It includes a range of primitives for data processing, time series analysis, feature analysis, detection algorithms, and reinforcement methods, making it suitable for both research and industrial applications [99, 100].

**ANOVIZ:** ANOVIZ is an anomaly detection solution for multivariate time series. Alongside accurate detections, it provides easy-to-use visualizations and user interfaces that support the explanation and assessment of detection quality [101].

**AnomalyKiTS:** AnomalyKiTS is a system that allows end users to detect anomalies in time series data. It provides a range of algorithms, as well as an enrichment module for labeling identified anomalies, and offers four categories of model-building capabilities so that users can select the best option for their needs [102].

**TranAD:** TranAD is a deep transformer network designed for strong detection and diagnosis results. Its focus score-based self-conditioning and adversarial training extract multi-modal features, while model-agnostic meta learning (MAML) enables quick, efficient training even with limited data. TranAD has been shown to outperform existing baseline methods [46]. When deciding which anomaly detection toolkit to use, the range of data sizes, formats, and anomaly types must be considered.

**Future directions:** To maintain the quality of the pipeline discovery system, researchers plan to add more primitives and improved searchers to ensure optimal performance. To incorporate predefined rules into pipelines efficiently, learning-based active learning techniques should also be developed for the reinforcement module. Moreover, existing solutions may not be comprehensive enough for certain applications, such as scenarios that require semi-supervised or prediction-based unsupervised anomaly detection methods.

Recent research has used machine learning techniques, leveraging stacked and bidirectional LSTMs, to detect anomalies ahead of time in future forecasting; the analysis produced promising results [103], validating such models for anomaly detection. A review of commercial AI-based energy monitoring and anomaly detection solutions for buildings [104] provides an overview of the available systems. Efficient predictive maintenance of equipment across industries requires detecting anomalies in time-varying multivariate data; to that end, researchers presented MTV (Multivariate Time Series Visualization), a visual analytics system that streamlines collaboration between humans and AI [105].
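The forecast-then-compare pattern behind such LSTM detectors can be sketched with a toy example: predict each value, then flag points whose absolute residual exceeds a threshold. Here a lag-1 persistence forecast stands in for the stacked/bidirectional LSTM of [103], and the four-standard-deviation threshold is an illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(3)
series = np.sin(np.linspace(0, 20, 400)) + rng.normal(0, 0.05, 400)
series[250] += 1.5                    # injected point anomaly

# A lag-1 persistence forecast stands in for a trained forecaster:
# predict x[t] with x[t-1] and flag unusually large residuals.
forecast = series[:-1]
residual = np.abs(series[1:] - forecast)
threshold = residual.mean() + 4.0 * residual.std()
flags = np.flatnonzero(residual > threshold) + 1  # re-align to series index
print(flags)
```

With a learned forecaster the residuals shrink on normal data, so the same thresholding becomes more sensitive to genuine anomalies.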

#### **8. Conclusion**

In conclusion, this chapter has provided insight into the cutting-edge models for anomaly detection in time series, discussed their merits and pitfalls, and highlighted new areas of research being explored to solve the unique problems posed by high-dimensional and complex data, high-volume data streams, and the need for real-time processing. These research areas provide concrete examples of the applications of the discussed models, and the cited works show readers how these models can be used in real-world scenarios. We have also identified some current issues and suggested future directions for research on anomaly detection systems.

As the field of anomaly detection in time series continues to evolve and new challenges arise, it is crucial that researchers remain focused on developing innovative solutions that can effectively process high-dimensional and complex data in real-time. By better understanding the existing state-of-the-art models and the challenges that still need to be addressed, researchers can identify new opportunities for developing more effective anomaly detection systems.

We have not explored all current algorithms, models, and emerging research areas; this chapter has offered an overview of some of them so that readers can see what is happening in the field. Interested readers are encouraged to pursue further study and prepare themselves for continued development in this growing area.

#### **Acknowledgements**

The authors acknowledge the editor of the book for his support throughout the writing process.

#### **Conflict of interest**

The authors declare no conflict of interest.

#### **Notes/thanks/other declarations**

The authors would like to thank the Editor of the book and the publisher for giving them a valuable opportunity to prepare a book chapter.

#### **Author details**

Farrukh Arslan\*, Aqib Javaid, Muhammad Danish Zaheer Awan and Ebad-ur-Rehman Department of Electrical Engineering, University of Engineering and Technology, Lahore, Pakistan

\*Address all correspondence to: farrukh\_arslan@uet.edu.pk

© 2023 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

*Anomaly Detection in Time Series: Current Focus and Future Challenges DOI: http://dx.doi.org/10.5772/intechopen.111886*

#### **References**

[1] Sen PC, Hajra M, Ghosh M. Supervised classification algorithms in machine learning: A survey and review. In: Advances in Intelligent Systems and Computing. Singapore: Springer Singapore; 2020. pp. 99-111

[2] Ezugwu AE, Ikotun AM, Oyelade OO, Abualigah L, Agushaka JO, Eke CI, et al. A comprehensive survey of clustering algorithms: State-of-the-art machine learning applications, taxonomy, challenges, and future research prospects. Engineering Applications of Artificial Intelligence. 2022;**110**(104743):104743

[3] Petropoulos F, Apiletti D, Assimakopoulos V, Babai MZ, Barrow DK, Ben Taieb S, et al. Forecasting: Theory and practice. International Journal of Forecasting. 2022;**38**(3):705-871

[4] Ratanamahatana CA, Lin J, Gunopulos D, Keogh E, Vlachos M, Das G. Mining time series data. In: Data Mining and Knowledge Discovery Handbook. Boston: Springer; 2009. pp. 1049-1077

[5] Fu T-C. A review on time series data mining. Engineering Applications of Artificial Intelligence. 2011;**24**(1):164-181

[6] Esling P, Agon C. Time-series data mining. ACM Computing Surveys. 2012;**45**(1):1-34

[7] Hilal W, Gadsden SA, Yawney J. Financial fraud: A review of anomaly detection techniques and recent advances. Expert Systems with Applications. 2022;**193**(116429):116429

[8] Šabić E, Keeley D, Henderson B, Nannemann S. Healthcare and anomaly detection: Using machine learning to predict anomalies in heart rate data. AI & Society. 2021;**36**(1):149-158

[9] Sharma B, Sharma L, Lal C. Anomaly detection techniques using deep learning in IoT: A survey. In: 2019 International Conference on Computational Intelligence and Knowledge Economy (ICCIKE). Dubai, United Arab Emirates: IEEE; 2019. pp. 146-149

[10] Gupta M, Gao J, Aggarwal CC, Han J. Outlier detection for temporal data: A survey. IEEE Transactions on Knowledge and Data Engineering. 2014;**26**(9):2250-2267

[11] Fox AJ. Outliers in time series. Journal of the Royal Statistical Society. 1972;**34**(3):350-363

[12] Bommert A, Sun X, Bischl B, Rahnenführer J, Lang M. Benchmark for filter methods for feature selection in high-dimensional classification data. Computational Statistics and Data Analysis. 2020;**143**(106839):106839

[13] Big data basics for digital marketers [Internet]. Gartner. [cited 28 April 2023]. Available from: https://www.gartner.com/en/marketing/insights/articles/big-data-basics-for-digital-marketers

[14] Blázquez-García A, Conde A, Mori U, Lozano JA. A review on outlier/anomaly detection in time series data. ACM Computing Surveys. 2022;**54**(3):1-33

[15] Ahmad S, Purdy S. Real-time anomaly detection for streaming analytics. arXiv [cs.AI]. 2016

[16] Barbariol T, Chiara FD, Marcato D, Susto GA. A review of tree-based approaches for anomaly detection. In: Springer Series in Reliability Engineering. Cham: Springer International Publishing; 2022. pp. 149-185

[17] Nassif AB, Talib MA, Nasir Q, Dakalbab FM. Machine learning for anomaly detection: A systematic review. IEEE Access. 2021;**9**:78658-78700

[18] Schmidl S, Wenig P, Papenbrock T. Anomaly detection in time series: A comprehensive evaluation. Proceedings VLDB Endowment. 2022;**15**(9):1779-1797

[19] Kozitsin V, Katser I, Lakontsev D. Online forecasting and anomaly detection based on the ARIMA model. Applied Sciences (Basel). 2021;**11**(7):3194

[20] Tang H, Wang Q, Jiang G. Time series anomaly detection model based on multifeatures. Computational Intelligence and Neuroscience. 2022;**2022**:2371549

[21] Xu H, Pang G, Wang Y, Wang Y. Deep isolation forest for anomaly detection. arXiv [cs.LG]. 2022

[22] Thill M, Konen W, Wang H, Bäck T. Temporal convolutional autoencoder for unsupervised anomaly detection in time series. Applied Soft Computing. 2021;**112**(107751):107751

[23] Fan J, Han F, Liu H. Challenges of big data analysis. National Science Review. 2014;**1**(2):293-314

[24] Toledano M, Cohen I, Ben-Simhon Y, Tadeski I. Real-time anomaly detection system for time series at scale. In: Anandakrishnan A, Kumar S, Statnikov A, Faruquie T, Xu D, editors. Proceedings of the KDD 2017: Workshop on Anomaly Detection in Finance. PMLR; 2018. pp. 56-65

[25] Mason A, Zhao Y, He H, Gompelman R, Mandava S. Online anomaly detection of time series at scale. In: 2019 International Conference on Cyber Situational Awareness, Data Analytics and Assessment (Cyber SA). Oxford, UK: IEEE; 2019. pp. 1-8

[26] Ranjan KG, Tripathy DS, Prusty BR, Jena D. An improved sliding window prediction-based outlier detection and correction for volatile time-series. International Journal of Numerical Modelling. 2021;**34**(1):e2816

[27] Zhai Y, Ong Y-S, Tsang IW. The emerging "big dimensionality". IEEE Computational Intelligence Magazine. 2014;**9**(3):14-26

[28] McNeil P, Shetty S, Guntu D, Barve G. SCREDENT: Scalable real-time anomalies detection and notification of targeted malware in mobile devices. Procedia Computer Science. 2016;**83**:1219-1225

[29] Lopez MA, Gonzalez Pastana Lobato A, Duarte OCMB, Pujolle G. An evaluation of a virtual network function for real-time threat detection using stream processing. In: 2018 Fourth International Conference on Mobile and Secure Services (MobiSecServ), Miami Beach, FL, USA; 2018. pp. 1-5. DOI: 10.1109/MOBISECSERV.2018.8311440

[30] Goncalves D, Bota J, Correia M. Big data analytics for detecting host misbehavior in large logs. In: 2015 IEEE Trustcom/BigDataSE/ISPA. Helsinki, Finland: IEEE; 2015

[31] Cui B, He S. Anomaly detection model based on Hadoop platform and weka interface. In: 2016 10th International Conference on Innovative Mobile and Internet Services in Ubiquitous Computing (IMIS). Fukuoka, Japan: IEEE; 2016. pp. 84-89

[32] Rettig L, Khayati M, Cudré-Mauroux P, Piorkówski M. Online anomaly detection over big data streams. In: Braschler M, Stadelmann T, Stockinger K, editors. Applied Data Science. Cham: Springer; 2019. DOI: 10.1007/978-3-030-11821-1\_16

[33] Liu X, Nielsen PH. Regression-Based Online Anomaly Detection for Smart Grid Data. arXiv (Cornell University); 2016

[34] Xie S, Chen Z. Anomaly detection and redundancy elimination of big sensor data in Internet of things [Internet]. arXiv [cs.DC]. 2017

[35] Bhadani AK, Jothimani D. Big data: Challenges, opportunities, and realities. In: Effective Big Data Management and Opportunities for Implementation. IGI Global; 2016. pp. 1-24

[36] Hashem IAT, Yaqoob I, Anuar NB, Mokhtar S, Gani A, Ullah KS. The rise of "big data" on cloud computing: Review and open research issues. Information Systems. 2015;**47**:98-115

[37] Chandola V, Banerjee A, Kumar V. Anomaly detection. ACM Computing Surveys. 2009;**41**(3):1-58

[38] Erfani SM, Rajasegarar S, Karunasekera S, Leckie C. High-dimensional and large-scale anomaly detection using a linear one-class SVM with deep learning. Pattern Recognition. 2016;**58**:121-134

[39] Mirsky Y, Shabtai A, Shapira B, Elovici Y, Rokach L. Anomaly detection for smartphone data streams. Pervasive and Mobile Computing. 2017;**35**:83-107

[40] Sarker RA, Elsayed SM, Ray T. Differential evolution with dynamic parameters selection for optimization problems. IEEE Transactions on Evolutionary Computation. 2014;**18**(5):689-707

[41] Akoglu L, Tong H, Koutra D. Graph-based anomaly detection and description: A survey [Internet]. arXiv [cs.SI]. 2014

[42] Katal A, Wazid M, Goudar RH. Big data: Issues, challenges, tools and good practices. In: 2013 Sixth International Conference on Contemporary Computing (IC3). Noida, India: IEEE; 2013. pp. 404-409

[43] Shiravi H, Shiravi A, Ghorbani AA. A survey of visualization systems for network security. IEEE Transactions on Visualization and Computer Graphics. 2012;**18**(8):1313-1329

[44] Vaswani A, Shazeer NM, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is all you need. NIPS [Internet]; 2017. Available from: https://www.semanticscholar.org/paper/Attention-is-All-you-Need-Vaswani-Shazeer/204e3073870fae3d05bcbc2f6a8e263d9b72e776

[45] Xu J, Wu H, Wang J, Long M. Anomaly Transformer: Time series anomaly detection with Association Discrepancy [Internet]. arXiv [cs.LG]. 2021 [cited 28 April 2023]. Available from: http://arxiv.org/abs/2110.02642

[46] Tuli S, Casale G, Jennings NR. TranAD: Deep transformer networks for anomaly detection in multivariate time series data [Internet]. arXiv [cs.LG]. 2022 [cited 28 April 2023]. Available from: http://arxiv.org/abs/2201.07284

[47] Wang X, Pi D, Zhang X, Liu H, Guo C. Variational transformer-based anomaly detection approach for multivariate time series. Measurement (Lond) [Internet]. 2022;**191**(110791):110791. Available from: https://www.sciencedirect.com/science/article/pii/S0263224122000914

[48] Zhang H, Xia Y, Yan T, Liu G. Unsupervised anomaly detection in multivariate time series through transformer-based variational autoencoder. In: 2021 33rd Chinese Control and Decision Conference (CCDC). Kunming, China: IEEE; 2021. pp. 281-286

[49] Kingma DP, Welling M. Auto-Encoding Variational Bayes [Internet]. arXiv [stat.ML]. 2013 [cited 28 April 2023]. Available from: http://arxiv.org/ abs/1312.6114

[50] Goodfellow IJ, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, et al. Generative Adversarial Networks [Internet]. arXiv [stat.ML]. 2014 [cited 28 April 2023]. Available from: http://arxiv.org/abs/1406.2661

[51] Chen Z, Chen D, Zhang X, Yuan Z, Cheng X. Learning graph structures with Transformer for multivariate time series anomaly detection in IoT [Internet]. arXiv [cs.LG]. 2021 [cited 28 April 2023]. Available from: http://arxiv.org/abs/2104.03466

[52] Wang C, Xing S, Gao R, Yan L, Xiong N, Wang R. Disentangled dynamic deviation transformer networks for multivariate time series anomaly detection. Sensors (Basel) [Internet]. 2023 [cited 28 April 2023];**23**(3):1104. Available from: https://www.mdpi.com/1424-8220/23/3/1104

[53] Wu B, Fang C, Yao Z, Tu Y, Chen Y. Decompose auto-transformer time series anomaly detection for network management. Electronics [Internet]. 2023;**12**(2):354. DOI: 10.3390/electronics12020354

[54] Zhou T, Ma Z, Wen Q, Wang X, Sun L, Jin R. FEDformer: Frequency Enhanced Decomposed Transformer for long-term series forecasting [Internet]. arXiv [cs.LG]. 2022 [cited 28 April 2023]. Available from: http://arxiv.org/abs/2201.12740

[55] Wu H, Xu J, Wang J, Long M. Autoformer: Decomposition Transformers with Auto-Correlation for long-term series forecasting [Internet]. arXiv [cs.LG]. 2021 [cited 28 April 2023]. pp. 22419-22430. Available from: https://proceedings.neurips.cc/paper/2021/hash/bcc0d400288793e8bdcd7c19a8ac0c2b-Abstract.html

[56] Zhang Y, Yan J. Crossformer: Transformer utilizing cross-dimension dependency for multivariate time series forecasting [Internet]. 2023 [cited 28 April 2023]. Available from: https://openreview.net/pdf?id=vSVLM2j9eie

[57] Nie Y, Nguyen NH, Sinthong P, Kalagnanam J. A time series is worth 64 words: Long-term forecasting with transformers [Internet]. arXiv [cs.LG]. 2022 [cited 28 April 2023]. Available from: http://arxiv.org/abs/2211.14730

[58] Cai L, Janowicz K, Mai G, Yan B, Zhu R. Traffic transformer: Capturing the continuity and periodicity of time series for traffic forecasting. Transactions in GIS [Internet]. 2020 [cited 28 April 2023];**24**(3):736-755. Available from: https://research-information.bris.ac.uk/en/publications/traffic-transformercapturing-the-continuity-andperiodicity-of-t

[59] Xu M, Dai W, Liu C, Gao X, Lin W, Qi G-J, et al. Spatial-Temporal Transformer Networks for traffic flow forecasting [Internet]. arXiv [eess.SP]. 2020 [cited 28 April 2023]. Available from: https://paperswithcode.com/paper/spatial-temporal-transformernetworks-for

[60] Li L, Yao J, Wenliang L, He T, Xiao T, Yan J, et al. GRIN: Generative relation and intention network for multi-agent trajectory prediction. Advances in Neural Information Processing Systems [Internet]. 2021 [cited 28 April 2023];**34**:27107-27118. Available from: https://proceedings.neurips.cc/paper/2021/hash/e3670ce0c315396e4836d7024abcf3dd-Abstract.html

[61] Ding C, Sun S, Zhao J. MST-GAT: A multimodal spatial–temporal graph attention network for time series anomaly detection. Information Fusion [Internet]. 2023;**89**:527-536. Available from: https://www.sciencedirect.com/science/article/pii/S156625352200104X

[62] Devlin J, Chang M-W, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North. Stroudsburg, PA, USA: Association for Computational Linguistics; 2019. pp. 4171-4186

[63] Brown TB, Mann B, Ryder N, Subbiah M, Kaplan J, Dhariwal P, et al. Language Models are Few-Shot Learners [Internet]. arXiv [cs.CL]. 2020 [cited 28 April 2023]. pp. 1877-1901. Available from: https://proceedings.neurips.cc/paper/2020/hash/1457c0d6bfcb4967418bfb8ac142f64a-Abstract.html

[64] Chen H, Wang Y, Guo T, Xu C, Deng Y, Liu Z, et al. Pre-trained image processing transformer. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE; 2021. pp. 12299-12310

[65] Zerveas G, Jayaraman S, Patel D, Bhamidipaty A, Eickhoff C. A Transformer-based Framework for Multivariate Time Series Representation Learning. Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining. 2021

[66] Yang C-HH, Tsai Y-Y, Chen P-Y. Voice2Series: Reprogramming acoustic models for time series classification. In: Meila M, Zhang T, editors. Proceedings of Machine Learning Research [Internet]. 2021. pp. 11808-11819. Available from: https://proceedings.mlr.press/v139/yang21j.html

[67] Wu Z, Liu Z, Lin J, Lin Y, Han S. Lite Transformer with Long-Short Range Attention [Internet]. arXiv [cs.CL]. 2020 [cited 28 April 2023]. Available from: https://iclr.cc/virtual\_2020/poster\_ByeMPlHKPH.html

[68] Mehta S, Ghazvininejad M, Iyer S, Zettlemoyer L, Hajishirzi H. DeLighT: Deep and Light-weight Transformer [Internet]. openreview.net. 2023 [cited 28 April 2023]. Available from: https://openreview.net/forum?id=ujmgfuxSLrO

[69] Bapna A, Chen MX, Firat O, Cao Y, Wu Y. Training deeper neural machine translation models with transparent attention [Internet]. arXiv [cs.CL]. 2018 [cited 28 April 2023]. Available from: https://aclanthology.org/D18-1338.pdf

[70] Dehghani M, Gouws S, Vinyals O, Uszkoreit J, Kaiser Ł. Universal Transformers [Internet]. arXiv [cs.LG]. [cited 28 April 2023]. Available from: http://arxiv.org/abs/1807.03819v3

[71] Xin J, Tang R, Lee J, Yu Y, Lin J. DeeBERT: Dynamic early exiting for accelerating BERT inference. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA, USA: Association for Computational Linguistics; 2020. pp. 2246-2251

[72] Dai Z, Yang Z, Yang Y, Carbonell J, Le QV, Salakhutdinov R. Transformer-XL: Attentive language models beyond a fixed-length context [Internet]. arXiv [cs.LG]. 2019 [cited 28 April 2023]. Available from: http://arxiv.org/abs/1901.02860

[73] So DR, Liang C, Le QV. The Evolved Transformer [Internet]. arXiv [cs.LG]. 2019 [cited 28 April 2023]. Available from: http://arxiv.org/abs/1901.11117

[74] Chen M, Peng H, Fu J, Ling H. AutoFormer: Searching transformers for visual recognition. In: 2021 IEEE/CVF International Conference on Computer Vision (ICCV). IEEE; 2021. pp. 12270-12280

[75] Tkach V, Kudin A, Kebande VR, Baranovskyi O, Kudin I. Non-pattern-based anomaly detection in time-series. Electronics (Basel) [Internet]. 2023 [cited 28 April 2023];**12**(3):721. Available from: https://www.mdpi.com/2079-9292/12/3/721

[76] Zhou L, Zeng Q, Li B. Hybrid anomaly detection via multihead dynamic graph attention networks for multivariate time series. IEEE Access [Internet]. 2022;**10**:40967-40978. Available from: https://ieeexplore.ieee.org/abstract/document/9758699/

[77] Nguyen HD, Tran KP, Thomassey S, Hamad M. Forecasting and anomaly detection approaches using LSTM and LSTM autoencoder techniques with the applications in supply chain management. International Journal of Information Management [Internet]. 2021;**57**(102282):102282. Available from: https://www.sciencedirect.com/science/article/pii/S026840122031481X

[78] Mathonsi T, van Zyl TL. Statistics and deep learning-based hybrid model for interpretable anomaly detection [Internet]. arXiv [cs.LG]. 2022 [cited 28 April 2023]. Available from: http://arxiv. org/abs/2202.12720

[79] Nizam H, Zafar S, Lv Z, Wang F, Hu X. Real-time deep anomaly detection framework for multivariate time-series data in industrial IoT. IEEE Sensors Journal [Internet]. 2022;**22**(23):22836-22849. Available from: https://ieeexplore.ieee.org/abstract/document/9915308/

[80] Choi T, Lee D, Jung Y, Choi H-J. Multivariate time-series anomaly detection using SeqVAE-CNN hybrid model. In: 2022 International Conference on Information Networking (ICOIN). Jeju-si, Korea: IEEE; 2022. pp. 250-253

[81] Lin S, Clark R, Birke R, Schonborn S, Trigoni N, Roberts S. Anomaly detection for time series using VAE-LSTM hybrid model. In: ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Barcelona, Spain: IEEE; 2020. pp. 4322-4326

[82] Presekal A, Stefanov A, Rajkumar VS, Palensky P. Attack graph model for cyber-physical power systems using hybrid deep learning. IEEE Transactions on Smart Grid [Internet]. 2023:1-1. Available from: https://ieeexplore.ieee.org/abstract/document/10017381/

[83] Terbuch A, O'Leary P, Khalili-Motlagh-Kasmaei N, Auer P, Zohrer A, Winter V. Detecting anomalous multivariate time-series via hybrid machine learning. IEEE Transactions on Instrumentation and Measurement [Internet]. 2023;**72**:1-11. Available from: https://ieeexplore.ieee.org/abstract/document/10015855/

[84] Boloka T, Crafford G, Mokuwe W, Van Eden B. Anomaly detection monitoring system for healthcare. In: 2021 Southern African Universities Power Engineering Conference/ Robotics and Mechatronics/Pattern Recognition Association of South Africa (SAUPEC/RobMech/PRASA). Potchefstroom, South Africa: IEEE; 2021. pp. 1-6


[85] Luo A, Yang F, Li X, Nie D, Jiao Z, Zhou S, et al. Hybrid graph neural networks for crowd counting. Proceedings of the AAAI Conference on Artificial Intelligence [Internet]. 2020 [cited 28 April 2023];**34**(07):11693-11700. Available from: https://ojs.aaai.org/index.php/AAAI/article/view/6839

[86] Goldstein M, Uchida S. A comparative evaluation of unsupervised anomaly detection algorithms for multivariate data. PLoS One [Internet]. 2016;**11**(4):e0152173. DOI: 10.1371/journal.pone.0152173

[87] Karadayi Y, Aydin MN, Ogrenci AS. Unsupervised anomaly detection in multivariate spatio-temporal data using deep learning: Early detection of COVID-19 outbreak in Italy. IEEE Access [Internet]. 2020;**8**:164155-164177. Available from: https://ieeexplore.ieee.org/abstract/document/9187620/

[88] Zoph B, Le QV. Neural architecture search with reinforcement learning [Internet]. arXiv [cs.LG]. 2016 [cited 28 April 2023]. Available from: http://arxiv.org/abs/1611.01578

[89] Kurakin A, Goodfellow I, Bengio S. Adversarial machine learning at scale [Internet]. arXiv [cs.CV]. 2016 [cited 28 April 2023]. Available from: http://arxiv.org/abs/1611.01236

[90] Zhang R, Zou Q. Time series prediction and anomaly detection of light curve using LSTM neural network. Journal of Physics: Conference Series. 2018;**1061**:012012

[91] von Schleinitz J, Graf M, Trutschnig W, Schröder A. VASP: An autoencoder-based approach for multivariate anomaly detection and robust time series prediction with application in motorsport. Engineering Applications of Artificial Intelligence [Internet]. 2021;**104**(104354):104354. Available from: https://www.sciencedirect.com/science/article/pii/S0952197621002025

[92] Haris M, Sharif U, Gupta K, Mohammed A, Jiwani N. Anomaly detection in time series using deep learning [Internet]. Ijeast.com. [cited 28 April 2023]. Available from: https://www.ijeast.com/papers/296-305%20Tesma0706.pdf

[93] Ren H, Xu B, Wang Y, Yi C, Huang C, Kou X, et al. Time-series anomaly detection service at Microsoft. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. New York, NY, USA: ACM; 2019

[94] Batarseh FA, Freeman L, Huang C-H. A survey on artificial intelligence assurance. Journal of Big Data [Internet]. 2021;**8**(1):60. DOI: 10.1186/s40537-021-00445-7

[95] Jiang W, Hong Y, Zhou B, He X, Cheng C. A GAN-based anomaly detection approach for imbalanced industrial time series. IEEE Access [Internet]. 2019;**7**:143608-143619. Available from: https://ieeexplore.ieee.org/abstract/document/8853246/

[96] Li D, Chen D, Jin B, Shi L, Goh J, Ng S-K. MAD-GAN: Multivariate anomaly detection for time series data with generative adversarial networks. In: Artificial Neural Networks and Machine Learning – ICANN 2019: Text and Time Series. Cham: Springer International Publishing; 2019. pp. 703-716

[97] He Y, Zhao J. Temporal convolutional networks for anomaly detection in time series. Journal of Physics: Conference Series. 2019;**1213**:042050

[98] Geiger A, Liu D, Alnegheimish S, Cuesta-Infante A, Veeramachaneni K. TadGAN: Time series anomaly detection using generative adversarial networks. In: 2020 IEEE International Conference on Big Data (Big Data). Atlanta, GA, USA: IEEE; 2020. pp. 33-43

[99] Lai K-H, Zha D, Wang G, Xu J, Zhao Y, Kumar D, et al. TODS: An automated time series outlier detection system. Proceedings of the AAAI Conference on Artificial Intelligence [Internet]. 2021 [cited 28 April 2023];**35**(18):16060-16062. Available from: https://ojs.aaai.org/index.php/AAAI/article/view/18012

[100] Milutinovic M, Schoenfeld B, Martinez-Garcia D, Ray S, Shah S, Yan D. On Evaluation of AutoML Systems [Internet]. Automl.org. [cited 28 April 2023]. Available from: https://www.automl.org/wp-content/uploads/2020/07/AutoML\_2020\_paper\_59.pdf

[101] Trirat P, Nam Y, Kim T, Lee J-G. ANOVIZ: A visual inspection tool of anomalies in multivariate time series [Internet]. Github.io. [cited 28 April 2023]. Available from: https://itouchz.github.io/files/AnoViz\_AAAI23.pdf

[102] Patel D, Ganapavarapu G, Jayaraman S, Lin S, Bhamidipaty A, Kalagnanam J. AnomalyKiTS: Anomaly detection toolkit for time series. Proceedings of the AAAI Conference on Artificial Intelligence [Internet]. 2022 [cited 28 April 2023];**36**(11):13209-13211. Available from: https://ojs.aaai.org/index.php/AAAI/article/view/21730

[103] Girish L, Rao SKN. Anomaly detection in cloud environment using artificial intelligence techniques. Computing [Internet]. 2023;**105**(3):675-688. DOI: 10.1007/s00607-021-00941-x

[104] Himeur Y, Ghanem K, Alsalemi A, Bensaali F, Amira A. Artificial intelligence based anomaly detection of energy consumption in buildings: A review, current trends and new perspectives. Applied Energy [Internet]. 2021;**287**(116601):116601. Available from: https://www.sciencedirect.com/science/article/pii/S0306261921001409

[105] Liu D, Alnegheimish S, Zytek A, Veeramachaneni K. MTV: Visual analytics for detecting, investigating, and annotating anomalies in multivariate time series. Proceedings of the ACM on Human-Computer Interaction [Internet]. 2022;**6**(CSCW1):1-30. DOI: 10.1145/3512950

Section 2
