Efficient FPGA Resource Utilization in Wired-Logic Processors Using Coarse and Fine Segmentation of LUTs for Non-Linear Functions
D Li, T Zhao, K Kobayashi, A Kosuge… - 2024 IEEE International Symposium on Circuits and Systems (ISCAS), 2024 - ieeexplore.ieee.org
A coarse- and fine-grained lookup table (LUT) segmentation technique is developed for wired-logic artificial intelligence (AI) processors to improve field-programmable gate array (FPGA) resource utilization efficiency. While wired-logic processors have achieved several orders of magnitude higher energy efficiency than conventional FPGA-based deep neural network (DNN) processors on the CIFAR-10 dataset by eliminating DRAM/BRAM access during inference, large hardware resources are required for large-scale DNNs with long-bit-width data. Implementing even small DNNs proves challenging, as they can exceed the hardware resources available in commercial FPGAs. To address these issues and enable the implementation of larger-scale neural networks alongside the processing of long-bit-width data, two techniques are proposed: (1) an LUT segmentation technique based on coarse and fine granularity, and (2) accuracy optimization through the incorporation of redundant bits. Applying these techniques to state-of-the-art wired-logic processors markedly enhances the scalability of a single FPGA, enabling the implementation of larger-scale neural networks across various tasks, including CIFAR-10 classification and keyword spotting. The hardware resource requirements for non-linear functions in processing elements decreased by 92% and 92.8%, respectively. The recognition accuracy for CIFAR-10 remains unchanged, while the keyword spotting task incurs only a negligible 1.2% accuracy degradation.