## **1. Introduction**

### **1.1 Modern-day AI algorithms**

Machine learning (ML) and artificial intelligence (AI) have significantly impacted society. AI algorithms, such as deep neural networks (DNNs), achieve accuracy that exceeds human-level perception in various applications, including medical imaging, natural language processing, and computer vision [1–3]. The popularity of AI algorithms has been driven mainly by the availability of large datasets for applications such as image classification and object detection [1, 4, 5], as well as by the increased computing power provided by next-generation ML hardware accelerators and general-purpose computing platforms. **Figure 1** illustrates the taxonomy of ML algorithms, which can be broadly categorized into supervised and unsupervised learning.

**Figure 1.**

*Taxonomy of ML algorithms and their different learning techniques.*

Unsupervised learning involves extracting features from a distribution without data annotation; typical tasks include selecting samples, denoising data, and clustering data into groups. An unsupervised learning algorithm aims to find the optimal representation of the data: one that preserves maximum information about the input *x* while remaining simpler than the data itself, through constraints such as lower-dimensional, sparse, or independent representations [6, 7]. Popular unsupervised learning techniques include clustering, principal component analysis (PCA), autoencoders, and Gaussian potential functions.
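
To make these ideas concrete, the short Python sketch below compresses unlabeled samples into a lower-dimensional representation with PCA and then groups them with k-means clustering. It is an illustration only; the scikit-learn toolchain, the synthetic data, and all parameter choices are assumptions, not part of the chapter.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Unlabeled data: 500 samples, 64 features (e.g., flattened 8x8 patches).
rng = np.random.default_rng(0)
x = rng.normal(size=(500, 64))

# PCA finds a lower-dimensional representation that preserves
# maximum variance (information) about the input x.
pca = PCA(n_components=8)
z = pca.fit_transform(x)          # (500, 8) compressed representation

# k-means clusters the compressed samples into groups, with no labels.
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(z)
print(z.shape, labels[:10])
```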

Supervised learning involves training an ML model on a set of labeled training data and then evaluating it on a labeled testing set. There are two types of supervised learning: classical approaches and deep learning. Classical approaches rely on conventional techniques, often employing a probabilistic model whose predictions are determined by a set of parameters. Popular classical techniques include decision trees, support vector machines (SVMs), Markov chains, and maximum likelihood estimation (MLE) [8–11]. However, classical techniques have limitations, such as difficulty in scaling, lack of generalization, and the need for significant data engineering for each algorithm. At the same time, they serve as the foundation for deep learning algorithms, which overcome these limitations. This chapter focuses on the deep learning techniques used in supervised learning. Convolutional neural networks (CNNs) stand out for their superior performance in various machine-learning tasks, including computer vision, object detection, and object segmentation. Recurrent neural networks (RNNs) excel at processing temporal data, while graph convolutional networks (GCNs) combine graphs and neural networks for a variety of applications. The chapter covers recent advancements in CNNs, RNNs, and GCNs, discussing their structures, training methods, and execution efficiency for training and inference operations.
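
As a minimal illustration of the classical supervised pipeline described above (train on labeled data, then evaluate on a held-out labeled set), the sketch below fits an SVM and a decision tree with scikit-learn. The dataset and hyperparameters are illustrative assumptions.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Labeled training/testing split, as in the supervised setting above.
x, y = load_digits(return_X_y=True)
x_tr, x_te, y_tr, y_te = train_test_split(x, y, test_size=0.2, random_state=0)

# Two classical supervised models: an SVM and a decision tree.
for model in (SVC(kernel="rbf"), DecisionTreeClassifier(max_depth=10)):
    model.fit(x_tr, y_tr)                       # train on labeled data
    print(type(model).__name__, "test accuracy:", model.score(x_te, y_te))
```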


**Figure 2.**

*(a) Convolution layer consisting of the input feature map (IFM), kernel, and the output feature map (OFM), (b) FC layer in a CNN.*

Conventional CNNs consist of layers connected either sequentially or with skip connections. Besides convolutional layers, ReLU, pooling, and batch-normalization operations are commonly used to enhance performance. **Figure 2** illustrates the typical structure of a convolutional and a fully connected (FC) layer. Sequential layers usually involve a stack of convolution (conv) layers that extract features from the input. Common convolution kernel sizes include 7×7, 5×5, 3×3, and 1×1. Additionally, as proposed in MobileNet [12], depthwise separable convolutions factorize a standard N×N convolution into two parts: a depthwise convolution that applies a single N×N filter per input channel, followed by a 1×1 pointwise convolution that combines the channel outputs. Depthwise separable convolution achieves comparable accuracy at a substantially lower hardware cost. Pooling layers are utilized periodically to reduce the feature-map size and suppress noisy input. To perform classification on the extracted features, a set of FC (classifier) layers is utilized. These layers, along with the conv layers, have a set of weights that are trained to achieve the best accuracy. Popular CNN structures, such as AlexNet [1], GoogLeNet [13], ResNet [14], DenseNet [15], MobileNet [12], and SqueezeNet [16], utilize a combination of convolutional, pooling, and FC layers to achieve high accuracy in a variety of ML tasks. CNNs like DenseNet and ResNet also utilize skip connections from previous layers to create a highly branched structure that improves the feature extraction process; these skip connections are present within the conv layers only. However, conventional CNNs face several limitations, such as the vanishing-gradient problem, high hardware cost during training and inference, and over-parameterization [17–19]. To address these issues, neural architecture search (NAS) was introduced to automatically find the optimal network architecture for the specific design point of the target application. Different design points, such as higher accuracy, better generalization, higher hardware efficiency, and lower memory access, have been proposed for NAS. Popular techniques include NASNet [20], FBNet [21], AmoebaNet [22], PNAS [23], EcoNAS [24], and MnasNet [25].
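
The following PyTorch sketch assembles the layer types discussed above (a standard convolution with batch normalization and ReLU, a MobileNet-style depthwise separable block, pooling, and an FC classifier) into a toy network. The layer sizes are illustrative assumptions, not a reproduction of any cited architecture.

```python
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    """Minimal CNN mirroring the layer types discussed above."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        # Standard 3x3 convolution producing an output feature map (OFM).
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.BatchNorm2d(16),
            nn.ReLU(),
        )
        # Depthwise separable block (MobileNet-style [12]): a per-channel
        # 3x3 depthwise convolution followed by a 1x1 pointwise convolution.
        self.dw_sep = nn.Sequential(
            nn.Conv2d(16, 16, kernel_size=3, padding=1, groups=16),  # depthwise
            nn.Conv2d(16, 32, kernel_size=1),                        # pointwise
            nn.ReLU(),
        )
        # Pooling reduces the feature-map size before classification.
        self.pool = nn.AdaptiveAvgPool2d(1)
        # FC (classifier) layer operating on the extracted features.
        self.fc = nn.Linear(32, num_classes)

    def forward(self, ifm: torch.Tensor) -> torch.Tensor:
        x = self.dw_sep(self.conv(ifm))
        return self.fc(self.pool(x).flatten(1))

logits = TinyCNN()(torch.randn(1, 3, 32, 32))  # IFM: 1 x 3 x 32 x 32
print(logits.shape)                            # torch.Size([1, 10])
```

Note that the depthwise convolution is expressed with `groups=16`, so each of the 16 filters sees only one input channel, and the 1×1 pointwise convolution then mixes channels, exactly the factorization described above.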

RNNs are a commonly used deep learning technique that offers an effective solution for modeling data with sequential or temporal structure and variable-length inputs and outputs in various applications [26–28]. An RNN processes sequential data one element at a time, using a connectionist model that selectively passes information, which enables it to model input and/or output data consisting of sequences of dependent elements. RNNs can also simultaneously model sequential and time dependencies at different scales. They employ a feedforward network augmented with edges connecting adjacent time steps, which introduces the notion of time into the model: while conventional edges do not form cycles, the recurrent edges that connect adjacent time steps can. Modern RNN architectures fall into two main categories. The first is long short-term memory (LSTM), which introduces a memory cell, a computation unit that replaces traditional nodes in the hidden layer of a network [29]. The second is the bi-directional RNN (BRNN), as proposed in [30].
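
The sketch below illustrates both variants using PyTorch's built-in LSTM: a unidirectional LSTM that carries a memory cell across time steps, and a bi-directional version that runs the recurrence forward and backward over the sequence. The dimensions are arbitrary assumptions.

```python
import torch
import torch.nn as nn

# An LSTM processes a sequence one element at a time, carrying a memory
# cell (c) and hidden state (h) across adjacent time steps.
lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)

seq = torch.randn(1, 20, 8)          # batch of 1, 20 time steps, 8 features
out, (h, c) = lstm(seq)              # out: hidden state at every time step
print(out.shape, h.shape, c.shape)   # (1, 20, 16), (1, 1, 16), (1, 1, 16)

# A bi-directional RNN (BRNN) [30] runs the recurrence in both directions
# and concatenates the forward and backward hidden states per time step.
brnn = nn.LSTM(input_size=8, hidden_size=16, batch_first=True,
               bidirectional=True)
out_bi, _ = brnn(seq)
print(out_bi.shape)                  # (1, 20, 32)
```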

As more applications rely on graphs to represent data, CNNs and RNNs, which are designed for Euclidean data, become limited in their ability to capture the hidden patterns in graph-structured (non-Euclidean) data. For instance, in e-commerce, a graph-based learning system can exploit the interactions between users and products to make highly accurate recommendations. However, the complexity and irregularity of graphs pose significant challenges to existing DNNs. To address this issue, graph neural networks (GNNs) were introduced; they can be categorized into three types: recurrent GNNs (RecGNNs) [31–33], convolutional GNNs (CGNNs) [34–36], and graph autoencoders (GAEs) [37–39]. RecGNNs use recurrent neural architectures to learn node representations, while CGNNs generalize the convolution operation to graph data by aggregating features from neighboring nodes. GAEs map nodes into a latent feature space, using an encoder and a decoder to learn low-dimensional network embeddings that preserve the graph's topological information.
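
The following NumPy sketch shows one common CGNN propagation step on a toy graph: neighboring node features are aggregated through a normalized adjacency matrix and then transformed by a learnable weight matrix. The normalization scheme (self-loops plus symmetric degree normalization) is one popular choice, not necessarily the exact formulation of the works cited above.

```python
import numpy as np

# Toy graph: 4 nodes with adjacency matrix A and node features X (4 x 3).
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
X = np.random.default_rng(0).normal(size=(4, 3))
W = np.random.default_rng(1).normal(size=(3, 2))   # learnable weights

# One CGNN layer: add self-loops, symmetrically normalize by node degree,
# then aggregate each node's neighborhood features and transform them.
A_hat = A + np.eye(4)                               # self-loops
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
H = np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W, 0)  # ReLU
print(H.shape)  # (4, 2): new per-node representations
```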

### **1.2 Hardware implications of DNNs**

State-of-the-art DNNs, such as CNNs, RNNs, and GCNs, have diverse structures that place significant demands on compute and memory resources. Achieving higher accuracy with these models requires increased computational complexity and model size, which in turn requires more memory to store both the weights and the activations. The increased model size and complexity also lead to a larger volume of on-chip data movement. For instance, ResNet-50 [14] trained on the ImageNet dataset [1] requires 50 MB of memory and 4 GFLOPs per inference, while DenseNet-121 [15] requires 110 MB of memory and 8 GFLOPs per inference. Conventional architectures that separate memory and computation incur a considerable number of external memory accesses, which reduce energy efficiency and performance; on average, an external memory access costs roughly 1000× more energy than a computation [40]. To estimate the total energy spent performing inference with VGG-16 and ResNet-50 on conventional von Neumann architectures, note that a floating-point 32-bit (FP-32) multiplication costs 3.2 pJ and an FP-32 add costs 0.9 pJ at the 45 nm technology node [41]. Under these costs, performing inference on one image consumes roughly 65 mJ with VGG-16 and 16 mJ with ResNet-50; scaled to 1000 inferences, VGG-16 consumes 65 J while ResNet-50 consumes 16 J. In conclusion, the higher accuracy achieved by DNNs comes at the cost of higher computational complexity, increased memory requirements, more off-chip memory accesses, and lower energy efficiency.
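
These figures follow from simple arithmetic, reproduced in the sketch below under the assumption that each multiply-accumulate (MAC) costs one FP-32 multiply plus one FP-32 add at the cited 45 nm energies. The per-inference MAC counts (roughly 15.8 billion for VGG-16 and 4 billion for ResNet-50) are assumptions consistent with the totals quoted above.

```python
# Back-of-the-envelope reproduction of the inference-energy figures above:
# each MAC = one FP-32 multiply (3.2 pJ) + one FP-32 add (0.9 pJ) at 45 nm [41].
PJ_PER_MAC = 3.2 + 0.9                      # pJ per multiply + add

# Assumed MAC counts per inference (not from the chapter).
for name, macs in (("VGG-16", 15.8e9), ("ResNet-50", 4.0e9)):
    e_inf = macs * PJ_PER_MAC * 1e-12       # joules per inference
    print(f"{name}: {e_inf * 1e3:.0f} mJ/inference, "
          f"{e_inf * 1000:.0f} J per 1000 inferences")
```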

This chapter delves into in-memory computing (IMC) as an alternative to the traditional von Neumann architecture, offering improved energy efficiency, better performance, and reduced off-chip memory access. IMC has emerged as a promising solution to the memory-access, energy-efficiency, and performance bottlenecks encountered by DNN applications. IMC hardware architectures, such as those based on SRAM and on nanoscale nonvolatile memory (e.g., resistive RAM, or RRAM), provide a dense and parallel structure that achieves high performance and energy efficiency [42–57]. Additionally, the chapter introduces chiplet-based IMC architectures, as well as a benchmarking simulator for this class of architecture: the SIAM simulator and the architectures enabled through SIAM are described in detail. Overall, this chapter explores various benchmarking simulators for IMC architectures.
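
To convey the intuition behind crossbar-based IMC, the sketch below models an idealized RRAM array in NumPy: weights are stored as cell conductances, inputs are applied as word-line voltages, and Kirchhoff's current law delivers the matrix-vector product on the bit lines. This is a deliberately simplified illustration with assumed conductance and voltage ranges; it does not represent the SIAM simulator's device or circuit models.

```python
import numpy as np

rng = np.random.default_rng(0)

# Idealized RRAM crossbar: the weight matrix is programmed as an array of
# cell conductances G (siemens), one conductance per weight.
G = rng.uniform(1e-6, 1e-4, size=(64, 128))   # 64 bit lines x 128 word lines

# The input activation vector is applied as word-line voltages.
v = rng.uniform(0.0, 0.2, size=128)

# Kirchhoff's current law sums the per-cell currents G[i, j] * v[j] along
# each bit line, so the bit-line currents i_bl = G @ v realize the
# matrix-vector product inside the memory array, with no weight movement.
i_bl = G @ v

# In a real design, ADCs digitize the bit-line currents; here we read them
# out directly.
print(i_bl[:4])
```

Because the weights never leave the array, this style of computation avoids the costly external memory accesses quantified in the previous subsection, which is the core appeal of IMC for DNN workloads.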
