Abstract:
Over the past few years, Convolutional Neural Networks (CNNs) have driven major breakthroughs in fields such as computer vision and natural language processing, leading to the adoption of increasingly complex CNNs. Consequently, fast and power-efficient processing of such networks has become critically important. Conventional hardware solutions, namely CPUs and GPUs, fail to meet these requirements: CPUs are not suited to massively parallel multiply-accumulate (MAC) operations, and GPUs are not power efficient. Field Programmable Gate Arrays (FPGAs) have emerged as a promising alternative due to their extensive programmability, ease of executing parallel operations, and wide interfacing capabilities. In this paper, we design a hardware accelerator that speeds up the inference phase of LeNet-5 to enable faster classification of handwritten digits. We employ software quantization for ease of implementation on FPGAs, and partial pipelining to process the various layers of a typical CNN. Targeting the Xilinx Zynq-7000 SoC, we report a speedup of at least 4.7x and a power-efficiency improvement of 32x compared to similar works.