**3. Neuromorphic hardware-software co-design**

#### **3.1 Algorithm-driven neuromorphic hardware optimizations**

The precision of data representation in neural network models has a significant impact on computing accuracy. Although analog conductance states can in principle provide many levels, ReRAM devices struggle to deliver high-precision computing because of manufacturing and hardware design costs. ReRAM devices with limited physical precision therefore pose a severe challenge to the accuracy of neuromorphic computing.

Researchers have attempted to alleviate the accuracy loss caused by this limited precision through hardware-software co-optimization. For example, Wang et al. proposed a new quantization regularization tailored to the computing characteristics of ReRAM devices and leveraged different levels of regularization for different network layers [31]. They also minimized the impact of quantization by dynamically tuning biases while keeping the weights fixed. Their quantization method achieved minimal accuracy loss under the limited resolution of synaptic weights. Yang et al. investigated a novel approach that performed quantization and training concurrently by jointly optimizing the continuous and quantized weights in stochastic gradient descent [32]. Researchers also found that quantizing partial sums is an effective way to perform high-precision calculations in ReRAM-based neuromorphic computing systems. A team from the University of Illinois at Urbana-Champaign developed a comprehensive quantization approach that considered inputs, weights, and partial sums [33]. They developed a deep reinforcement learning-based search method that automatically discovers the optimal mixed-precision configuration across these three types of data. **Table 1** presents a comparison of the accuracy achieved by these quantization methods.
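The core idea of co-training under a limited set of conductance states can be sketched in a few lines. The snippet below is a simplified illustration, not the exact method of [31-33]: `quantize_weights` snaps continuous weights to the nearest of `n_levels` evenly spaced conductance states, and `quantization_regularizer` is a hypothetical penalty that pulls weights toward those states so that quantization and training can proceed together.

```python
import numpy as np

def quantize_weights(w, n_levels, w_min=-1.0, w_max=1.0):
    """Map continuous weights onto a limited set of conductance levels."""
    levels = np.linspace(w_min, w_max, n_levels)        # allowed conductance states
    idx = np.abs(w[..., None] - levels).argmin(axis=-1) # nearest level per weight
    return levels[idx]

def quantization_regularizer(w, n_levels, strength=1.0):
    """Hypothetical penalty pulling each weight toward its nearest quantization
    level, so quantization-aware training can run alongside normal SGD."""
    return strength * np.sum((w - quantize_weights(w, n_levels)) ** 2)

w = np.array([0.12, -0.48, 0.91, -0.05])
wq = quantize_weights(w, n_levels=4)  # 4 states: -1, -1/3, 1/3, 1
```

Adding the regularizer to the training loss drives the penalty toward zero, so the final continuous weights sit close to representable conductance states and the post-training quantization step loses little accuracy.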

*Enabling Neuromorphic Computing for Artificial Intelligence with Hardware-Software… DOI: http://dx.doi.org/10.5772/intechopen.111963*

#### **Table 1.**

*The accuracy comparison of quantization methods.*

DNN models are becoming increasingly complex and involve tremendous numbers of parameters. It has been noted that sparsity exists in neural networks: a significant number of parameters are redundant and can be pruned without causing accuracy loss [34, 35]. In ReRAM-based neuromorphic designs, pruned neural networks can reduce hardware costs and improve computation speed and energy efficiency. Therefore, identifying and eliminating redundancies is crucial for enhancing computational efficiency while maintaining accuracy. SNrram [63] introduced a sparsity transfer algorithm to standardize sparsity in weights and activations, together with an indexing register unit designed to store sparsity indexes and parse the data. Later, researchers proposed ReCom, another hardware-software co-designed accelerator for high-sparsity networks based on ReRAM [64]. ReCom utilized a group lasso algorithm to standardize the shape of filters and accomplished pruning by compressing sparse weights through per-layer regularization. Other methods besides regularization have also proven effective for ReRAM-based network pruning [65, 66]. **Table 2** illustrates a comparison of these pruning methods.

#### **Table 2.**

*The comparison of pruning methods.*

*\* IRU represents the indexing register unit. † SWOF represents structurally compressed weight-oriented fetching. ‡ IPMC represents the in-layer pipeline for memory and computation.*
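The common core of these pruning approaches, stripped of hardware detail, is magnitude pruning plus a compact index of the surviving weights, in the spirit of SNrram's indexing register unit. The sketch below is illustrative; the function name and index layout are assumptions, not taken from [63-66].

```python
import numpy as np

def prune_and_index(w, sparsity=0.5):
    """Magnitude-prune a weight matrix and record the surviving positions,
    analogous in spirit to storing sparsity indexes for a crossbar array."""
    flat = np.abs(w).ravel()
    k = int(sparsity * flat.size)          # number of weights to remove
    if k > 0:
        threshold = np.partition(flat, k - 1)[k - 1]
        mask = np.abs(w) > threshold       # keep only larger-magnitude weights
    else:
        mask = np.ones_like(w, dtype=bool) # sparsity 0: keep everything
    pruned = np.where(mask, w, 0.0)
    index = np.flatnonzero(mask)           # compact index of retained weights
    return pruned, index
```

Only the retained weights (and their indices) need to be mapped to resistive devices, which is where the hardware savings in area and energy come from.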

#### **3.2 Hardware-driven hardware-software co-optimization**

In ReRAM-based neuromorphic systems, process variation [67, 68], circuit noise [69, 70], retention issues, and endurance issues [71–73] greatly limit practical applications in the real world [74].

Bit failure in resistive devices is a common fault in high-density crossbar arrays. When a failure occurs, the ReRAM device can no longer switch its conductance in response to the writing voltage, leaving stuck resistive devices that appear as constant weights in neural network computation. These stuck devices can destroy the overall accuracy of the neural network. To address this challenge, Liu et al. first proposed a hardware-software co-design approach to improve computational accuracy in neuromorphic designs with high defect rates [75]. The approach combined two strategies: network retraining (software) and redundant resistive devices (hardware). The network retraining consisted of a standard weight-tuning process and a retraining method that prevented the weights mapped to defective resistive devices from being updated. The redundant-device strategy deployed additional columns for highly significant weights that landed on defects. Similarly, researchers at Northeastern University proposed a hierarchical progressive pruning method to improve the fault tolerance of ReRAM computing under stuck-off defects, along with a differential mapping scheme that supports their method for both stuck-on and stuck-off defects [76].
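The retraining idea — keep training the network but freeze the weights that map onto defective cells — reduces to masking the weight update. A minimal sketch follows; the array representation and names are assumptions for illustration, not taken from [75]:

```python
import numpy as np

def masked_update(w, grad, defect_mask, stuck_values, lr=0.1):
    """One retraining step that skips weight updates at defective devices.

    defect_mask: True where a ReRAM cell is stuck and cannot be reprogrammed.
    stuck_values: the constant conductance those cells express as weights.
    """
    w_new = w - lr * grad                        # ordinary gradient step
    return np.where(defect_mask, stuck_values, w_new)  # pin stuck cells
```

Because the stuck weights are held at their actual (defective) values during retraining, the remaining healthy weights learn to compensate for them, which is what recovers the accuracy.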

A series of research works have utilized network retraining techniques to minimize the accuracy loss introduced by imperfect hardware [77–79]. Nevertheless, because the location and number of defective resistive devices differ from chip to chip, the network must be retrained for every instance, which imposes an enormous computational burden. To address this challenge, researchers at Purdue University proposed CxDNN, a solution that combines hardware-software compensation techniques for DNNs [80]. CxDNN consists of three optimization steps: a quantization and conversion algorithm, a re-training method, and hardware compensation. The quantization and conversion algorithm extracts a fixed-point neural network from openly available weights according to the precision of the inputs, weights, and ADC/DAC components. The re-training step mitigates the accuracy loss resulting from the nonlinear behavior of components such as ADCs/DACs and leverages the available weights to accelerate the calculation. Finally, the hardware compensation mechanism adjusts a compensation factor for each column of the crossbar arrays based on relative and absolute errors, reducing the accuracy loss caused by hardware limitations.
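A simplified take on per-column compensation: fit one multiplicative factor per crossbar column that maps the measured column output back toward the ideal output in the least-squares sense. This illustrates the general idea only, not CxDNN's exact mechanism [80]; the function and variable names are assumptions.

```python
import numpy as np

def column_compensation(ideal_out, measured_out, eps=1e-8):
    """Per-column multiplicative compensation factors.

    For each crossbar column, solve min_f || ideal - f * measured ||^2,
    which has the closed form f = <ideal, measured> / <measured, measured>.
    """
    num = np.sum(ideal_out * measured_out, axis=0)
    den = np.sum(measured_out ** 2, axis=0) + eps  # eps guards all-zero columns
    return num / den

# toy example: the second column systematically reads 20% low
ideal = np.array([[1.0, 2.0], [2.0, 4.0]])
measured = ideal * np.array([1.0, 0.8])
factors = column_compensation(ideal, measured)  # ~ [1.0, 1.25]
```

At inference time, each column's analog output is simply multiplied by its factor, so the correction costs one scalar multiply per column rather than a full retraining pass per chip.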

In addition to the aforementioned permanent defects, ReRAM-based systems can also experience instabilities during processing, including noise [81], drifting [82], and programming errors [83]. To mitigate these issues during computation, researchers proposed FTNNA, a ReRAM-customized advanced error-correcting output coding (ECOC) scheme [84]. FTNNA applied collaborative logistic classifiers to replace the


classic softmax function and adjusted the weights of these classifiers through transfer learning. Furthermore, they designed a variable-length, decode-free coding scheme to reduce neural competition [84]. This approach yielded significant accuracy improvements without any hardware-specific calibration. Researchers at Tsinghua University proposed a re-configurable redundancy scheme to recover the accuracy degradation caused by stuck resistive devices [85]. Many other studies are dedicated to addressing various aspects of ReRAM technology, such as stuck failures [86, 87], temperature [88], IR-drop [89, 90], and other factors, to enhance the error tolerance of ReRAM-based neuromorphic computing.
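The error-correcting output coding idea can be illustrated independently of the hardware. The sketch below is a generic ECOC decoder, not FTNNA's variable-length decode-free scheme [84]: each class is assigned a codeword, and because the codebook's minimum Hamming distance is 3, a single erroneous output bit (e.g., caused by device noise or drift) still decodes to the correct class. The codebook here is made up for illustration.

```python
import numpy as np

# Hypothetical 3-class codebook with minimum Hamming distance 3,
# so any single flipped output bit is still decoded correctly.
CODEBOOK = np.array([
    [0, 0, 0, 0, 0],  # class 0
    [1, 1, 1, 0, 0],  # class 1
    [1, 0, 0, 1, 1],  # class 2
])

def ecoc_decode(bits):
    """Decode a (possibly noisy) binary output vector to the nearest codeword."""
    distances = np.sum(CODEBOOK != np.asarray(bits), axis=1)  # Hamming distances
    return int(distances.argmin())
```

For example, `[1, 1, 0, 0, 0]` is one bit flip away from class 1's codeword and still decodes to class 1, which is how the redundancy in the output code absorbs transient device errors without any per-chip calibration.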
