**1. Introduction**

The field of deep learning (DL) has seen extraordinary progress over the last decade and has established itself as the de facto standard technology for sensory processing tasks such as visual object recognition and detection, audio and time-series signal processing, and natural language processing. However, this progress has relied heavily on ever-increasing hardware resources and on growing power and energy consumption. These hardware resources, such as CPUs, GPUs, and specialized accelerators, primarily employ the Turing computation model and the von Neumann computer architecture. This computation model and architecture were originally developed for numerical and symbolic processing tasks that are difficult for humans. In contrast, the human neocortex is highly efficient at sensory processing and pattern recognition, and it is capable of online continual learning. Current deep learning attempts to replicate human-like sensory processing capability using conventional Turing-von Neumann computing machines.

Hans Moravec and others have observed that these two computation models and systems are quite different and distinct; this observation is known as Moravec's Paradox [1–3]. The current overarching grand challenge is: Can we design much more energy-efficient computing hardware systems for human-like sensory processing with continuous learning capability by mimicking the architecture, organization, and operation of the human neocortex? This grand challenge is not new; it has been around for decades. In 1990, Carver Mead coined the term "neuromorphic computing" [4] for this grand challenge, but the same notion and aspiration can be traced back to Frank Rosenblatt's "perceptrons" from the late 1950s [5]. This chapter describes our particular approach and strategy in pursuing this grand challenge as part of the current resurgence of interest in neuromorphic computing [6].

Our research builds on the seminal work by James E. Smith on temporal neural networks (TNNs) [7, 8] and is strongly influenced by Jeff Hawkins' recent book "A Thousand Brains: A New Theory of Intelligence" [9]. Unlike convolutional neural networks (CNNs), TNNs are temporal spiking neural networks that adhere strongly to biological plausibility [10]. TNNs encode and process information in temporal form, mimicking the sensory signal processing of the brain's neocortex. We have developed a microarchitecture model for implementing highly efficient TNN designs [11, 12] and demonstrated state-of-the-art clustering performance on a wide variety of time-series signals [13].

Hawkins' new theory of intelligence [9], informed by extensive neuroscience research, identifies Cortical Columns (CCs) as the key compute units within the human neocortex. The neocortex gains its intelligence through the CCs' ability to model sensory information in structured Reference Frames (RFs) and to continuously update those models as the sensors interact with the environment. Each CC is computationally powerful and can learn any specific task. There is great synergy between CCs and TNNs.

Our current research focuses on extending the TNN design and implementation framework to incorporate CC attributes. Unlike artificial neural networks, which separate training and inference, CCs store and process information in RFs, support online continuous learning, and can dynamically adapt to changes in sensory input. Sensory processing units built from such CCs can be truly "intelligent" in Hawkins' sense, enabling contextualization and personalization of applications and services to support edge AI on diverse mobile and wearable devices.

This chapter presents a framework for designing and implementing *Cortical Columns Computing Systems (C3S)*. The framework consists of three major components: (1) a microarchitecture model for designing and implementing cortical columns and CC-based computing systems using off-the-shelf digital CMOS technology; (2) a suite of specialized functional building blocks for implementing application-specific CC-based processing units with significant improvements in Power-Performance-Area (PPA) efficiency; and (3) a PyTorch-based software simulator for rapid design space exploration targeting specific applications, together with a design synthesis tool for direct CMOS implementation of special-purpose C3S sensory processing units in the form of chiplets, potentially supporting the latest Universal Chiplet Interconnect Express (UCIe) [14] standard. We conclude with the progress we have made on C3S and our future research directions.
