**5. Proposed QoS-QoR aware CNN FPGA accelerator Co-design approach for the future IoT world**

#### **5.1 QoS-QoR CNN accelerator for IoT devices**

Motivated by the ideas and challenges discussed above, we propose a QoS-QoR aware CNN FPGA accelerator co-design process that includes a hardware-oriented CNN topology and an accelerator design that takes CNN-specific properties into consideration. The CNNs and accelerators are created in tandem to find the best balance between QoS and QoR. The targeted QoS, QoR, and hardware resource limitations are the inputs to this procedure, while the resulting CNN model and its related accelerator architecture are the outputs. The entire process is broken down into three steps:


*Future Internet of Things: Connecting the Unconnected World and Things Based on 5/6G… DOI: http://dx.doi.org/10.5772/intechopen.104673*

• Step Three: Hardware-aware CNN exploration and exploitation. We begin exploring CNNs at the first level by stacking the selected bundles and using stochastic coordinate descent (SCD) to explore CNNs under the given QoS and QoR constraints. The QoS of the CNNs produced by SCD is precisely assessed and fed back to SCD to update the CNN model. To increase QoR, the generated CNNs that meet the QoS criteria are then trained and fine-tuned.
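The SCD loop in Step Three can be sketched as follows. This is a minimal illustration, not the actual co-design framework: the CNN configuration is reduced to a toy vector of bundle channel counts, and `estimate_latency_ms` / `estimate_accuracy` are hypothetical stand-ins for the real QoS and QoR evaluators.

```cpp
#include <array>
#include <cstdlib>
#include <algorithm>

// Toy model of a CNN configuration: channel counts of three bundles.
struct CnnConfig { std::array<int, 3> channels; };

// Hypothetical QoS proxy: estimated latency grows with the total channel count.
double estimate_latency_ms(const CnnConfig& c) {
    double total = 0.0;
    for (int ch : c.channels) total += 0.05 * ch;
    return total;
}

// Hypothetical QoR proxy: accuracy improves with diminishing returns as channels grow.
double estimate_accuracy(const CnnConfig& c) {
    double acc = 0.0;
    for (int ch : c.channels) acc += ch / (ch + 32.0);
    return acc / 3.0;
}

// Stochastic coordinate descent: perturb one randomly chosen coordinate at a
// time; keep the move only if QoR improves while the QoS budget stays met.
CnnConfig scd_search(CnnConfig cfg, double qos_budget_ms, int iters, unsigned seed) {
    std::srand(seed);
    for (int i = 0; i < iters; ++i) {
        int coord = std::rand() % 3;              // pick one coordinate
        int step  = (std::rand() % 2) ? 16 : -16; // random direction
        CnnConfig cand = cfg;
        cand.channels[coord] = std::max(16, cand.channels[coord] + step);
        if (estimate_latency_ms(cand) <= qos_budget_ms &&       // QoS constraint
            estimate_accuracy(cand) > estimate_accuracy(cfg))   // QoR gain
            cfg = cand;
    }
    return cfg;
}
```

The accepted-moves-only rule guarantees the search never leaves the QoS-feasible region once inside it, mirroring the feedback of assessed QoS back into the model update described above.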

Based on this, we propose an accelerator design for the selected CNN that provides a pipelined architecture for efficient CNN implementation with a maximal resource-sharing technique. It contains a folded structure that reuses the same hardware components to compute the CNN bundles sequentially, saving resources when targeting tiny IoT devices. To improve QoS, it also uses an unfolded structure that computes the operations inside each bundle in a pipelined manner. By combining these two levels of design, the proposed architecture benefits from both the folded and the pipelined structures. The accelerator itself is implemented using Xilinx Vivado high-level synthesis (HLS).
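The folded/unfolded combination can be illustrated in software as follows. This is only a sketch under assumed dimensions: a hypothetical 1-D convolution engine stands in for the real bundle computation, and the HLS pipeline directive is shown as a comment to mark where the unfolded structure would be applied in an actual Vivado HLS flow.

```cpp
#include <vector>
#include <cstddef>

// A single shared convolution engine. In an HLS flow this function would be
// synthesized once and time-multiplexed across bundles (the folded dimension),
// while its inner loops are pipelined (the unfolded dimension).
std::vector<float> conv1d_engine(const std::vector<float>& in,
                                 const std::vector<float>& w) {
    std::vector<float> out(in.size() - w.size() + 1, 0.0f);
    for (std::size_t o = 0; o < out.size(); ++o) {
        // #pragma HLS PIPELINE II=1  // unfolded: one output per cycle
        for (std::size_t k = 0; k < w.size(); ++k)
            out[o] += in[o + k] * w[k];
    }
    return out;
}

// Folded schedule: the bundles of the network run sequentially through the
// same engine, so the hardware cost stays that of a single engine.
std::vector<float> run_folded(std::vector<float> x,
                              const std::vector<std::vector<float>>& bundles) {
    for (const auto& w : bundles)  // sequential reuse = resource sharing
        x = conv1d_engine(x, w);
    return x;
}
```

The trade-off shown here is the one the text describes: folding saves area by serializing bundles, while pipelining inside each bundle recovers throughput.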

#### **5.2 Proposed architecture: Acceleration and designing tools**

HLS approaches have improved the productivity of FPGA-based hardware design in recent years by enabling FPGAs to be programmed in high-level languages (e.g., C/C++) [58]. Designing a high-performance FPGA-based CNN accelerator, however, is far from simple, as it requires specialized hardware development, repeated hardware/software testing to ensure functional accuracy, and efficient design-space exploration to find well-tuned configurations. To increase the effectiveness of accelerator design, there has been growing interest in automation frameworks that develop CNN accelerators from a higher level of abstraction, using dedicated algorithmic descriptions of CNNs and high-quality predefined hardware templates for rapid design and prototyping.

However, design issues remain, as recent trends in cloud and embedded FPGAs create fundamentally distinct challenges in satisfying the diverse demands of CNN applications. For example, the newest cloud FPGAs frequently employ multiple dies to double the available resources and deliver higher throughput; when accelerator architectures struggle to scale up or down to the chip size, cross-die routing and distributed on-chip memory can easily cause timing violations and reduce the achievable performance. Embedded FPGAs, on the other hand, combine heterogeneous components (such as a CPU and a GPU) to efficiently handle different aspects of the targeted tasks; without a highly flexible task-partitioning scheme, it is very difficult to fully use the on-chip resources and reap the benefits of the specialized hardware. Meanwhile, many researchers are experimenting with fast CONV algorithms to improve performance [59]. While these accelerators deliver better performance than classic designs, they are constrained to particular use cases and require more complicated design approaches.
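As one example of the fast CONV family (an illustration, not necessarily the algorithms of [59]), the Winograd minimal filtering algorithm F(2,3) computes two outputs of a 3-tap 1-D convolution with four multiplications instead of six; 2-D CNN variants tile this same idea across the feature maps.

```cpp
#include <array>

// Winograd F(2,3): two outputs of a 3-tap convolution in 4 multiplications.
std::array<float, 2> winograd_f23(const std::array<float, 4>& d,
                                  const std::array<float, 3>& g) {
    // Transformed operands: the four multiplications.
    float m1 = (d[0] - d[2]) * g[0];
    float m2 = (d[1] + d[2]) * (g[0] + g[1] + g[2]) * 0.5f;
    float m3 = (d[2] - d[1]) * (g[0] - g[1] + g[2]) * 0.5f;
    float m4 = (d[1] - d[3]) * g[2];
    // Inverse transform back to the two outputs.
    return { m1 + m2 + m3, m2 - m3 - m4 };
}

// Direct 3-tap convolution for comparison: six multiplications.
std::array<float, 2> direct_f23(const std::array<float, 4>& d,
                                const std::array<float, 3>& g) {
    return { d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
             d[1] * g[0] + d[2] * g[1] + d[3] * g[2] };
}
```

On an FPGA the saved multiplications translate directly into fewer DSP blocks, which is why such algorithms are attractive despite the extra transform logic and the restriction to fixed tile and filter sizes.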
As shown in **Figure 4**, the proposed QoS-QoR aware CNN FPGA accelerator co-design consists of a Zynq processor used for all task management, such as predictions, GPIO management, and automatically mapping the CNN accelerator with the right parameters. The AXI DMA is used to speed up the data and communication exchange between the DDR memory and the CNN accelerator. This co-design aims to test the created CNN accelerators on an edge object-detection application.

**Figure 4.** *QoS-QoR aware DNN/CNN FPGA accelerator Co-design.*
