*6.1.2 MNIST handwritten digit recognition benchmark*

Here, we demonstrate much larger multi-column, multi-layer TNN designs, with each layer composed of multiple TNN mini-columns, trained to recognize the 10 classes of handwritten digits 0 through 9 [59]. Three multi-layer TNNs are presented here: 2-layer, 3-layer and 4-layer designs containing 389 K, 1.3 M and 3 M total synapses, respectively, as shown in **Table 2**. With increasing numbers of layers, these three TNNs achieve 93, 97 and 99% accuracy on the MNIST dataset, while consuming only about 2, 8 and 18 mW of power, respectively [12]. This demonstrates the huge energy-efficiency potential of TNNs. Using *TNNSim*, we also illustrate online incremental learning, wherein a TNN design learns a new, previously unseen pattern using unsupervised STDP. **Figure 11** shows the converged weights of a single mini-column that is first trained using only digits 0–8 (iteration 0). When the previously unseen digit '9' is introduced, the mini-column learns it in an unsupervised fashion within around 500 new examples (iteration 500), of which about 50 are examples of '9', without forgetting the previously learned digits [11]. This demonstrates the ability of this mini-column to perform online continual learning.

#### **Table 2.**

*Design space exploration for MNIST: Using TNN7 macros, PPA for three multi-layer TNN prototype designs from [10]. The 2-layer, 3-layer and 4-layer TNN designs achieve progressively decreasing error rates (7%, 3% and 1%) while consuming increasing power; the 4-layer design incurs just 18 mW in delivering state-of-the-art accuracy of 99%. Note that the synapse counts of these multi-column multi-layer TNN designs are significantly higher than those of the single mini-column TNN designs in Table 1.*

*Cortical Columns Computing Systems: Microarchitecture Model, Functional Building Blocks… DOI: http://dx.doi.org/10.5772/intechopen.110252*

#### **Figure 11.**

*Online incremental learning: A single TNN mini-column trained only on the digits 0–8 learns a new, previously unseen digit '9' within 500 samples (of which about 50 are samples of digit '9') in an unsupervised manner [11]. This demonstrates the online learning capability of TNNs, which can quickly learn new classes from relatively few examples without forgetting previously learned information.*
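The unsupervised learning dynamic just described can be sketched in a few lines. The following is a minimal, illustrative Python sketch (not the actual *TNNSim* code; all function names, constants, and the two toy patterns are our own assumptions): a small ramp-no-leak column with 1-winner-take-all lateral inhibition, where STDP potentiates the winner's synapses whose input spikes causally precede its output spike and depresses the rest.

```python
import math

# Illustrative sketch (not TNNSim): a 2-neuron, 8-input ramp-no-leak
# mini-column with 1-WTA lateral inhibition and unsupervised STDP.

def response(weights, in_times, threshold):
    """Ramp-no-leak body: each neuron spikes at the time of the input
    spike whose weight pushes its accumulated potential past threshold."""
    order = sorted(range(len(in_times)), key=lambda i: in_times[i])
    out = []
    for row in weights:
        acc, t_spike = 0.0, math.inf
        for i in order:                      # accumulate in time order
            acc += row[i]
            if acc >= threshold:
                t_spike = in_times[i]
                break
        out.append(t_spike)
    return out

def winner_take_all(spike_times):
    """1-WTA lateral inhibition: only the earliest spike survives."""
    t_min = min(spike_times)
    return None if math.isinf(t_min) else spike_times.index(t_min)

def stdp_update(weights, in_times, out_time, winner, mu=0.05, wmax=1.0):
    """Unsupervised STDP on the winner only: potentiate synapses whose
    input spike precedes (or coincides with) the output spike; depress
    the rest. Weights are clamped to [0, wmax]."""
    row = weights[winner]
    for i, t_in in enumerate(in_times):
        delta = mu if t_in <= out_time else -mu
        row[i] = min(wmax, max(0.0, row[i] + delta))

# Two toy 8-input patterns: A spikes early on inputs 0-3, B on inputs 4-7.
A = [0., 0., 0., 0., 8., 8., 8., 8.]
B = [8., 8., 8., 8., 0., 0., 0., 0.]
W = [[0.6, 0.5, 0.5, 0.5, 0.4, 0.5, 0.5, 0.5],   # small asymmetry so each
     [0.4, 0.5, 0.5, 0.5, 0.6, 0.5, 0.5, 0.5]]   # neuron can claim a pattern
THRESH = 2.0

for _ in range(10):                              # unlabeled, interleaved stream
    for pat in (A, B):
        times = response(W, pat, THRESH)
        win = winner_take_all(times)
        if win is not None:
            stdp_update(W, pat, times[win], win)
```

After this unlabeled stream, neuron 0 fires first for pattern A and neuron 1 for pattern B, with no labels involved anywhere; giving the column a spare neuron and a third, previously unseen pattern would let it acquire a new class in the same fashion, which is the behavior Figure 11 demonstrates at full MNIST scale.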

#### **6.2 Proposed C3S simulation & synthesis tools (ongoing work)**

Building on the three components described previously, the overarching goal of our current research is to develop an end-to-end framework that can automatically translate application-specific C3S models in software into highly customized hardware designs. It can be utilized as an architecture and design synthesis framework for implementing C3S processing units. We envision this as a complete design framework spanning applications, architecture, microarchitecture, and a custom macro suite, to generate application-specific C3S processing units for diverse sensory processing applications. As shown in **Figure 12**, the framework consists of two main components.

C3S-Sim (Application and Architectural Design Exploration): This is the software simulator framework, consisting of (1) a PyTorch simulator (extended from *TNNSim* to incorporate C3S functional modeling), and (2) a cycle-accurate architectural simulator in C++ that mimics the hardware and derives accurate latency and performance information for C3S designs.
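The division of labor between the two simulators can be illustrated with a toy example (our own sketch, not the C3S-Sim interface): a cycle-by-cycle model of a single ramp-no-leak neuron that reports both its functional output (the spike time) and an architectural metric (cycles consumed), the kind of latency figure a cycle-accurate simulator would return.

```python
# Toy sketch (not the actual C3S-Sim interface): step one ramp-no-leak
# neuron forward a clock cycle at a time, returning its output spike time
# together with the number of cycles simulated.
def simulate_neuron_cycles(weights, in_times, threshold, max_cycles=16):
    acc = 0.0
    for cycle in range(max_cycles):
        # accumulate the weight of every input spike arriving this cycle
        for w, t in zip(weights, in_times):
            if t == cycle:
                acc += w
        if acc >= threshold:
            return cycle, cycle + 1          # (spike time, cycles consumed)
    return None, max_cycles                  # no spike within this gamma cycle
```

For example, with weights `[2, 1, 1]`, input spike times `[0, 1, 3]`, and threshold 3, the neuron crosses threshold when the second input arrives, so the model reports a spike at cycle 1 after simulating 2 cycles.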

C3S-Syn (Micro-architectural Design and Implementation): This is the hardware implementation framework that takes in the PyTorch C3S functional models and generates C3S hardware designs. It is envisioned to include (1) PyVerilog-based conversion for automated RTL generation from PyTorch, and (2) an automated RTL-to-GDSII flow that leverages C3S-specialized custom macro cells and generates an application-specific post-layout netlist and PPA results for a given C3S design.
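To make the RTL-generation step concrete, here is a deliberately tiny, hand-written stand-in for that kind of translation (the envisioned flow uses PyVerilog-based conversion; this sketch, including its module and function names, is purely illustrative): a trained, quantized weight vector is emitted as a small Verilog ROM.

```python
# Illustrative sketch only: emit a trained, quantized synaptic weight
# vector as a Verilog ROM module. The real flow would generate RTL from
# the PyTorch model via PyVerilog-based conversion.
def emit_synapse_rom(name, weights, width=3):
    """Return Verilog source for a combinational weight ROM (sketch)."""
    aw = max(1, (len(weights) - 1).bit_length())   # address width in bits
    lines = [
        f"module {name} (",
        f"  input  wire [{aw - 1}:0] addr,",
        f"  output reg  [{width - 1}:0] w",
        ");",
        "  always @(*) begin",
        "    case (addr)",
    ]
    for i, wt in enumerate(weights):
        lines.append(f"      {aw}'d{i}: w = {width}'d{wt};")
    lines.append(f"      default: w = {width}'d0;")
    lines += ["    endcase", "  end", "endmodule"]
    return "\n".join(lines)

# Example: four 3-bit weights for one (hypothetical) synapse column.
rtl = emit_synapse_rom("col0_w_rom", [3, 0, 7, 5])
```

The design choice mirrored here is the one the chapter's flow implies: learned parameters are frozen into application-specific hardware structures rather than loaded at runtime, which is what makes C3S-specialized custom macro cells and per-application PPA figures meaningful.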

#### **Figure 12.**

*Envisioned end-to-end cortical columns computing system (C3S) design framework consisting of C3S-Sim for application exploration and C3S-Syn for microarchitectural implementation. C3S-Sim consists of a PyTorch tool (extension of TNNSim) to design application-specific C3S functional models and a cycle-accurate architectural simulator for hardware performance estimation. C3S-Syn incorporates the extended microarchitecture model and functional building blocks for C3S implementation, with an automated design flow to translate PyTorch functional models to application-specific hardware designs.*
