**2. Background and motivation**

#### **2.1 Unsustainable trend for DL compute**

Deep neural networks (DNNs) [15–18] have advanced the state of the art in a wide range of applications, particularly those mimicking human sensory processing, such as image recognition, object detection, and time-series signal (e.g., speech) processing. However, because models are growing exponentially while performing high-dimensional tensor processing and global gradient backpropagation, their compute requirements are scaling unsustainably. Specifically, a 2018 study by OpenAI [19] showed that the compute used for DNN training is doubling every 3.4 months. Compared with hardware capability driven by Moore's law, which doubles every 24 months, the gap between compute demand and hardware supply is growing exponentially, at roughly 8x per year, or about 500x every 3 years (see **Figure 1**). There is also growing evidence that the current trajectory of deep learning compute scaling is economically, technologically, and environmentally unsustainable [20–22]. On a more optimistic note, other recent studies have claimed that the rate of DNN model growth slowed between 2018 and 2020 [23, 24] and is expected to plateau, or even shrink, in the future [25].
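For concreteness, the 8x/year figure follows directly from the two doubling periods quoted above: demand grows by $2^{12/3.4} \approx 11.5\times$ per year while supply grows by $2^{12/24} \approx 1.41\times$ per year, so the gap widens by

$$
\frac{2^{12/3.4}}{2^{12/24}} \approx \frac{11.5}{1.41} \approx 8\times \ \text{per year}, \qquad 8^3 = 512 \approx 500\times \ \text{over 3 years}.
$$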

#### **Figure 1.**

*Figure taken from [19], with annotations added. There is a significant and growing gap between DNN computation demand and the hardware resource supply provided by Moore's law. This demand-supply gap increases at a rate of 8x/year; to keep pace, either the amount of hardware or the training time would need to grow by 8x every year. A promising candidate solution is to revisit and develop computing paradigms based on the brain's neocortex, an existence proof of a highly efficient computing machine for intelligent sensory processing.*

There are significant ongoing efforts to deal with this explosion in DNN compute complexity. Reducing data value precision through quantization and shrinking DNN model sizes through pruning can help mitigate the computational complexity [26–31]. More efficient ways of using the computation hardware infrastructure, including static or dynamic exploitation of value sparsity to avoid unnecessary computation, are also being developed [32–38]. These are all valuable efforts to mitigate the complexity explosion and to sustain the current productive trends. However, there is also a need to concurrently explore other promising alternative paradigms and approaches.
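As a concrete illustration of the first two approaches, the following is a minimal sketch using PyTorch's built-in magnitude pruning and dynamic int8 quantization utilities. The toy model, layer sizes, and 50% pruning amount are illustrative assumptions, not taken from the cited works [26–38].

```python
# Minimal sketch of pruning and quantization as compute-mitigation
# techniques. Model and hyperparameters are illustrative only.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# A small feed-forward model standing in for a real DNN.
model = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

# Pruning: zero out the 50% smallest-magnitude weights of each Linear
# layer. This creates value sparsity; dense kernels will not skip the
# zeros automatically, but sparse formats/kernels can exploit them.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")  # bake the zeroed mask into the weights

# Quantization: replace float32 Linear weights with int8 at inference
# time, cutting memory traffic and per-operation arithmetic cost.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 784)
print(quantized(x).shape)  # torch.Size([1, 10])
```

Note that pruning by itself only zeroes weights; realizing actual compute savings requires sparse storage formats or hardware and kernels that skip the zeros, which is precisely what the sparsity-exploitation efforts cited above target.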
