Department of Electrical and Electronics Engineering
Permanent URI for this collection: http://localhost:4000/handle/123456789/1925
68 results
Search Results
Item FPGA-based implementation of single-cycle high-throughput LDPC encoder for 5G New Radio (IEEE, 2025-02) Asati, Abhijit
This paper presents a novel architecture for a high-throughput encoder for quasi-cyclic low-density parity-check (QC-LDPC) codes. This low-complexity encoder is specifically tailored for the 5th Generation (5G) New Radio (NR) standard. To achieve high throughput, we employ an automated approach to design customized encoders for individual base graphs using a Verilog Code Generator (VeCoGen) that we developed in MATLAB. Our proposed architecture improves efficiency by rearranging the wire indices during the hardware description language (HDL) code generation stage itself, instead of performing shifting operations on the data carried by those wires on the fly. This eliminates the need for RAM (for storing base graph coefficients) and dedicated barrel shifters. The parity bits are computed in parallel, while maintaining optimal performance and gate count through effective exploitation of the base graph’s sparsity. Consequently, the encoder can be designed as a fully combinational circuit between the input and output registers, enabling the generation of the complete codeword within a single clock cycle. This results in exceptionally high throughput, optimal hardware utilization, and a minimal number of XOR operations, making our approach highly effective for 5G NR applications.
Item Dedicated hardware architecture for localizing iris in VW images (Elsevier, 2022-07) Asati, Abhijit; Gupta, Anu
This study presents dedicated hardware for iris localization that can be used as a coprocessor in the development of real-time and low-cost embedded iris biometric systems. Though the hardware architecture is described for iris localization in visible wavelength (VW) images, the concept used can be applied to near-infrared (NIR) images as well.
In general, the architecture can be used for a class of iris localization algorithms based on edge-map generation and the circular Hough transform (CHT). The architecture presented here generates the edge-maps for limbic and pupil boundary detection using median filtering followed by Sobel edge detection; however, an additional reflection removal module is used for pupil boundary detection. Further, the CHT hardware module detects a circle in each edge-map. The proposed architecture was implemented in the programmable logic of the Zynq-7000 SoC device from Xilinx. This hardware implementation gives an iris localization accuracy of 98.43% and an average processing time of 5.148 ms for UBIRIS.v1 VW database images (200 × 150 pixels). The algorithm used is suitable for less unconstrained and frontal-view iris images captured with subjects’ active participation; however, the images may contain non-ideal issues such as reflection and occlusion by eyelids and eyelashes.
Item Comparative Analysis of ST, ECRL and Static Logic Style at Different Process Technologies (IEEE, 2023-04) Asati, Abhijit
At lower VLSI process technology nodes, selecting a suitable logic style becomes important when designing low-power VLSI circuits, so as to minimize the chip’s power and meet the power density requirement with minimum sacrifice in speed. This study focuses on the design and optimization of digital code converters. To compare propagation delay, power consumption and power-delay product (PDP) at 32 nm and 22 nm process technologies, sub-threshold (ST) logic style implementations of Gray code to Binary code (GB), Binary code to Gray code (BG), and BCD code to Excess-3 code (BE3) converters are used. These implementations are compared with Efficient Charge Recovery Logic (ECRL) and static CMOS logic style implementations.
Item Lightweight convolutional neural network architecture implementation using TensorFlow lite (Springer, 2023-06) Asati, Abhijit
Recently, with the increase in the precision of convolutional neural networks (CNN) on a wide variety of classification and recognition tasks, the demand for their deployment has dramatically increased. The focus is increasingly on lightweight, faster, and low-power implementations. In this paper, we have implemented a CNN model on an embedded platform, the ‘Raspberry Pi 4-Model B edge computing system (RP4-BECS)’. This CNN model was initially trained and verified in MATLAB and then implemented on the Machine Learning (ML) framework to generate a TensorFlow lite (TF-lite) flat buffer format.
This implementation offers a reduced model size with good prediction accuracy and lower inference time compared with the available literature. We attempted three trials for all the digits from 0 to 9 to evaluate average prediction accuracy and average inference time. An average prediction accuracy of 99.32% and an average inference time of 22.53 ms are achieved for the Sign Language Digits Database (SLDD). Further, an average prediction accuracy of 99.09% and an average inference time of 13.28 ms are achieved for the Modified National Institute of Standards and Technology Database (MNIST). The model sizes implemented using TF-Lite are substantially reduced, to 1.53 MB for SLDD and 148 KB for the MNIST database. The obtained accuracy, inference time and model sizes are better than published results.
Item Verification of Hardware Resource Utilization through High Level Synthesis for FPGA Implementation (IEEE, 2023) Asati, Abhijit; Shenoy, Meetha V.
Recently, there has been a sharp rise in demand for hardware implementations because of the improved accuracy of Convolutional Neural Networks (CNN) on a wide range of classification and recognition applications. To achieve the needed performance, they require heavy processor operations and memory bandwidth. For optimized hardware deployment, which necessitates thorough optimization of system architectures and algorithms to get particularly efficient designs, a target system’s hardware resources and an estimation of its performance at a higher level of abstraction are crucial. Since the programmable hardware fabric may be customized for each unique network, Field Programmable Gate Arrays (FPGA) can achieve this efficiency. This paper shows the high-level synthesis (HLS) of each of the different layers of the optimized CNN using the MATLAB HDL Coder.
Along with its HDL resource utilization report, we also investigated the computational processes and hardware resource estimation of the previously developed optimized CNN. The hardware resources required by all the convolutional and fully connected layers of the optimized CNN match exactly with the previously calculated resources. So, the hardware resource utilization is verified through HLS. The architecture takes fixed-point math into account. All layers are synthesized in Vivado 2022.2 with the Zynq UltraScale+ MPSoC ZCU104 Evaluation Kit as the target.
Item Hardware Software Co-design of k-means Clustering Algorithm (IEEE, 2023) Asati, Abhijit
The k-means clustering algorithm is a method frequently used for grouping data points according to their similarity. In this research, we investigate the viability of using a hardware-software co-design (HSC) strategy to speed up the k-means algorithm's execution. The studies are carried out using a Zedboard HSC platform based on the Zynq 7000 architecture, which incorporates both a processing system (PS) part, implemented as a set of instructions (the software component), and a programmable logic (PL) part, implemented on configurable FPGA fabric using RTL code (the hardware component). In implementing the k-means clustering algorithm, the distance calculations are carried out by the PS and the results are communicated to the PL, which performs the distance comparison and cluster reassignment. To reduce the resource utilization and the execution time, three different design configurations are studied using the HSC approach, in which the PL part follows different architectures.
The results compare execution speed, resource utilization and power across the different design architectures for the PL part.
Item Design and Analysis of a Scan Chain in Subthreshold Region (IEEE, 2023) Asati, Abhijit
Testing of manufactured integrated circuits (ICs) is performed using design for testability (DFT) techniques such as the scan chain, which is the most popular in sequential circuits. The scan cell involves the modification of a D flip-flop (DFF) with a multiplexer at its input. During testing, a pattern is applied through the scan input pin (SI), and individual flip-flops toggle their values as the test patterns are shifted in; hence a significant amount of power is consumed in the scan chain. Although moving to a lower technology node decreases the power consumption in a circuit, a further drastic reduction (i.e. of the order of 10^6) in power consumption is obtained by operating the circuit in the subthreshold region. In this work, a scan chain is designed to operate correctly in the subthreshold region using suitable device sizes, using both transmission gate (TG) based and true single phase clocked (TSPC) logic for 16, 22 & 32 nm technology nodes. Further, their average powers are compared. In addition, Monte Carlo simulation and comparative analysis are performed to study the effect of variation of power supply and temperature.
Item Convolutional Neural Network Hardware Optimization Using Bayesian Method (IEEE, 2024-04) Asati, Abhijit; Shenoy, Meetha V.
Convolutional Neural Network (CNN) models have demonstrated significant benefits in the realm of computer vision and applications related to image processing. Optimizing hyperparameters in CNN models is crucial to ensuring an effective implementation of the model, whether on software, hardware, or a ‘software-hardware co-design’ platform, thereby enhancing overall performance and results.
This work proposes a CNN architecture and applies the Bayesian optimization algorithm to find the best set of hyperparameter values, which reduces both training and recognition time. In addition, a new parameter, the ‘Network optimization parameter’ (NOP), is defined, which considers optimization of hardware resources for a given accuracy of the trained model. This parameter needs to be minimized; it helps evaluate the best set of hyperparameter values and is essential for further implementing the CNN model on the hardware platform. The optimization is performed on both a Central Processing Unit (CPU) and a Graphics Processing Unit (GPU) to clearly understand the impact of using different processing units. An accuracy of 99.48% is achieved for the Modified National Institute of Standards and Technology (MNIST) database, and an accuracy of 88.78% is achieved for the Canadian Institute For Advanced Research (CIFAR-10) database. The proposed models are highly optimized and have lower resource requirements (due to lower layer complexity and smaller filter sizes) while delivering higher accuracies compared to the available literature. Further, the calculated NOP for the proposed network is considerably lower than in the published literature.
Item Improved Implementation of PYNQ-Based FFT Hardware Accelerator (IEEE, 2024) Asati, Abhijit
In recent times, hardware accelerators, which are implemented on Field Programmable Gate Arrays (FPGAs) to speed up commonly used software functions, have gained a lot of traction. The interest increased even more when Xilinx released Python Productivity for Zynq (PYNQ), a framework offering a much easier design methodology for such accelerators. In this study, we have designed a hardware accelerator for a Fast Fourier Transform (FFT) filter using PYNQ.
The performance of the designed accelerator is compared with its already available software implementation in the Numerical Python (NumPy) library and with published designs of FFT filter accelerators that used custom-designed Direct Memory Access (DMA) Intellectual Property (IP). However, customizing the IPs requires a lot of specialization in various areas, such as architecture, low-level programming, API development, etc. Therefore, in this research, we have used the standard DMA IP available in Vivado along with a run-time non-configurable (i.e. fixed-size) FFT to show that one can still achieve a high speed-up, or even a higher one, with lower hardware utilization (%) using smart design choices. The technique to employ that fixed-size FFT design for different input sample sizes at run time is also explained in the paper. The results for delay, resources, and power consumption for different sample sizes are then shown and compared, which will aid in choosing the platform for computing the FFT and designing such accelerators accordingly.