*End-to-End Benchmarking of Chiplet-Based In-Memory Computing DOI: http://dx.doi.org/10.5772/intechopen.111926*

consumption of an IMC system under a given DNN workload. NeuroSim is highly flexible and allows users to assess the performance of IMC-based AI accelerators under different system specifications, considering both traditional CMOS-based memory technology (such as SRAM) and emerging nonvolatile memory technologies (such as ReRAM and STT-MRAM) for the in-memory compute elements. NeuroSim assumes a tile-based architecture [44], consisting of multiple tiles in which processing elements (PEs) and IMC-based crossbar arrays reside. The Predictive Technology Model (PTM) [94] is used to simulate lower-level components, such as buffers, ADCs, and multiplexers, and has been verified against circuit simulation (e.g., SPICE), reaching more than 90% accuracy. Interfaces between NeuroSim and popular ML frameworks such as PyTorch and TensorFlow have also been created to make it more user-friendly [95]. However, one major drawback of NeuroSim is that it assumes an H-tree-based bus interconnect for inter-tile communication, which can consume up to 90% of the total energy of DNN inference [96]. To overcome this issue, Krishnan et al. [97] (**Figure 7**) proposed an evaluation framework for IMC-based AI accelerators that incorporates cycle-accurate network-on-chip (NoC) simulation [98]. Like NeuroSim, MNSIM [99] evaluates the performance of IMC-based systems executing AI applications; in addition, MNSIM integrates software-hardware co-design techniques into the evaluation framework. GENIEx, proposed by Chakraborty et al. [78], is another evaluation framework for crossbar-based IMC accelerators that accounts for non-idealities in the memory elements.
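To make the tile-based cost estimation concrete, the sketch below is a first-order analytical model, in the spirit of (but much simpler than) NeuroSim-style frameworks: it maps a weight matrix onto fixed-size crossbar arrays and sums per-component energies. All component energies and the read latency are hypothetical placeholder values, not calibrated numbers from any of the cited tools.

```python
import math

def layer_energy_latency(rows, cols, xbar=128,
                         e_mac_pj=0.05, e_adc_pj=2.0, e_noc_pj_per_bit=0.1,
                         t_read_ns=10.0, bits_out=8):
    """Estimate energy (pJ) and latency (ns) of one matrix-vector multiply
    for a (rows x cols) weight matrix mapped onto xbar x xbar crossbar arrays.
    All per-component energy numbers are illustrative assumptions."""
    n_arrays = math.ceil(rows / xbar) * math.ceil(cols / xbar)
    # Analog MACs: every mapped cell contributes once per read.
    e_mac = n_arrays * xbar * xbar * e_mac_pj
    # One ADC conversion per column of each array.
    e_adc = n_arrays * xbar * e_adc_pj
    # Partial sums shipped over the interconnect (H-tree or NoC).
    e_noc = n_arrays * xbar * bits_out * e_noc_pj_per_bit
    energy = e_mac + e_adc + e_noc
    # Arrays operate in parallel, so latency is a single read cycle here;
    # a real simulator would also model buffers, accumulation, and routing.
    return energy, t_read_ns, e_noc / energy
```

For example, `layer_energy_latency(512, 512)` maps the layer onto sixteen 128 x 128 arrays and reports the interconnect's share of total energy; sweeping the per-bit interconnect cost shows how communication can come to dominate, which is what motivates replacing the H-tree bus with a simulated NoC in [97].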

While it is essential to evaluate hardware performance under AI workloads, it is also important to assess the accuracy of the AI workload when implemented on-chip, since memory imperfections can lower the accuracy of DNNs. RxNN [100] and SpikeSim [66] are frameworks that evaluate the accuracy of DNN workloads in the presence of memory imperfections. These techniques focus on evaluating IMC systems executing DNN inference. However, emerging edge devices require online learning, which involves training the DNN, so evaluating AI accelerators on DNN inference alone is insufficient. An evaluation framework for IMC-based AI accelerators with on-chip training is introduced in [101]. The authors incorporate the non-linearity, asymmetry, and device-to-device and cycle-to-cycle variations of the weight update into the Python wrapper, along with peripheral circuits for error/weight-gradient computation in the NeuroSim core, for a given AI

#### **Figure 7.**

*Block diagram of an IMC benchmarking simulator proposed in [97]. The simulator consists of a circuit part and an interconnect part that together perform system-level benchmarking of IMC architectures.*

workload. The training framework is based on the authors' prior work [102], where they proposed an SRAM-based transposable function. Since SRAM-based arrays can perform write operations quickly while consuming minimal energy, the weight-gradient computation is implemented with SRAM-based arrays rather than with nonvolatile memory technologies.
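A minimal sketch of how such weight-update non-idealities can be injected into training is given below. It uses a soft-bound exponential update model, a common way to express nonlinearity and asymmetry in analog synapses, plus additive cycle-to-cycle noise; the model form and all parameter values are illustrative assumptions, not the exact device model of [101].

```python
import math
import random

def nonideal_update(w, grad, lr=0.1, alpha_p=1.0, alpha_d=2.0,
                    c2c_sigma=0.01, rng=random):
    """Apply one non-ideal update to a normalized weight w in [0, 1].
    alpha_p/alpha_d set the nonlinearity of potentiation vs. depression
    (alpha_p != alpha_d makes the update asymmetric); c2c_sigma models
    cycle-to-cycle variation as Gaussian noise on each programming step."""
    delta = -lr * grad  # ideal gradient-descent step
    if delta >= 0:
        # Potentiation: the step shrinks as w approaches its upper bound.
        step = delta * math.exp(-alpha_p * w)
    else:
        # Depression: the step shrinks near the lower bound, with a
        # different slope, giving asymmetric potentiation/depression.
        step = delta * math.exp(-alpha_d * (1.0 - w))
    step += rng.gauss(0.0, c2c_sigma)  # cycle-to-cycle variation
    return min(1.0, max(0.0, w + step))  # conductance stays in range
```

Device-to-device variation can be modeled on top of this by drawing per-synapse `alpha_p`/`alpha_d` values from a distribution; applying `nonideal_update` in place of the ideal update during training reveals how much accuracy the non-idealities cost, which is exactly the kind of question the framework in [101] is built to answer.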
