BITS Faculty Publications

Permanent URI for this community: http://localhost:4000/handle/123456789/1867


3-D device matrix approach: A new algorithm for plotting energy band diagrams in semiconductors
(IEEE, 2012) Asati, Abhijit
The energy band diagram is one of the important results produced by a device simulator. To provide more flexibility and emphasis in this domain, an algorithm is presented that can draw the device band diagram in any plane, in either direction, for user-specified biasing conditions. The algorithm uses a three-dimensional matrix of the device, which is reduced to a two-dimensional matrix and subsequently to a vector after the user selects a plane and a line. The approach also highlights the basics of device-level tool development. This work is useful to students and device-level engineers, offering them an interactive and flexible way to draw the band diagram anywhere in the device. The algorithm is implemented in a modular, matrix-based manner using MATLAB®.
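The plane-then-line reduction described above can be sketched in Python as follows. This is a minimal illustration, assuming the device matrix is stored as nested lists indexed m[x][y][z] and holds one band-energy value per mesh point; the function names and toy values are hypothetical and are not the authors' MATLAB code.

```python
from typing import List

Matrix2D = List[List[float]]
Matrix3D = List[List[List[float]]]  # hypothetical device matrix indexed as m[x][y][z]


def select_plane(m: Matrix3D, axis: str, index: int) -> Matrix2D:
    """Fix one coordinate of the 3-D device matrix and return the 2-D plane."""
    nx, ny, nz = len(m), len(m[0]), len(m[0][0])
    if axis == "x":  # plane of (y, z) values, rows indexed by y
        return [[m[index][y][z] for z in range(nz)] for y in range(ny)]
    if axis == "y":  # plane of (x, z) values, rows indexed by x
        return [[m[x][index][z] for z in range(nz)] for x in range(nx)]
    if axis == "z":  # plane of (x, y) values, rows indexed by x
        return [[m[x][y][index] for y in range(ny)] for x in range(nx)]
    raise ValueError("axis must be 'x', 'y' or 'z'")


def select_line(plane: Matrix2D, along_rows: bool, index: int,
                reverse: bool = False) -> List[float]:
    """Reduce the plane to a vector (row or column `index`), optionally reversed
    so the band diagram can be drawn in either direction."""
    line = list(plane[index]) if along_rows else [row[index] for row in plane]
    return line[::-1] if reverse else line


# Toy matrix: an energy-like value that falls by 10 units per step in x.
m = [[[100 - 10 * x + y + z for z in range(2)] for y in range(3)] for x in range(4)]
profile = select_line(select_plane(m, "z", 0), along_rows=False, index=1)
```

Here `profile` is the vector that would be plotted as the band diagram along x at y = 1, z = 0; passing `reverse=True` gives the same cut drawn in the opposite direction.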
Accurate Iris Localization Using Edge Map Generation and Adaptive Circular Hough Transform for Less Constrained Iris Images
(IJECE, 2016-08) Gupta, Anu; Asati, Abhijit
This paper proposes an accurate iris localization algorithm for iris images acquired under near-infrared (NIR) illumination that contain noise due to eyelids, eyelashes, lighting reflections, non-uniform illumination, eyeglasses, eyebrow hair, etc. The two main contributions of the paper are an edge map generation technique for pupil boundary detection and an adaptive circular Hough transform (CHT) algorithm for limbic boundary detection, which together make iris localization both more accurate and faster. The edge map for pupil boundary detection is generated as the intersection (logical AND) of two binary edge maps obtained using thresholding, morphological operations, and Sobel edge detection, which minimizes false edges caused by noise. The adaptive CHT algorithm for limbic boundary detection searches the image for a set of two arcs instead of a full circle, which counters iris occlusion by the eyelids and eyelashes. The proposed CHT and adaptive CHT implementations for pupil and limbic boundary detection, respectively, use a two-dimensional accumulator array that reduces memory requirements. The proposed algorithm achieves accuracies of 99.7% and 99.38% on the challenging CASIA-Iris-Thousand (version 4.0) and CASIA-Iris-Lamp (version 3.0) databases, respectively. The average time cost per image is 905 ms. Compared with previous work, the proposed algorithm shows better results.
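To make the arc-based idea concrete, here is a minimal Python sketch of a circular Hough transform in which edge points vote only through two lateral arcs (left and right of the candidate center) and a single 2-D accumulator is reused for each radius. The ±45° arc width, the 2° angle step, and the function name are assumptions for illustration; the paper's adaptive CHT is not specified here and may differ.

```python
import math
from typing import List, Tuple


def arc_hough_circle(edges: List[List[int]], radii: range,
                     arc_half_width_deg: int = 45) -> Tuple[int, int, int, int]:
    """Return (votes, cx, cy, r) of the best circle found by arc-restricted voting.

    Assumption: only the left and right arcs vote, so the top and bottom of the
    circle (where eyelids and eyelashes occlude the iris) are ignored.
    """
    h, w = len(edges), len(edges[0])
    points = [(x, y) for y in range(h) for x in range(w) if edges[y][x]]
    span = range(-arc_half_width_deg, arc_half_width_deg + 1, 2)
    angles = [math.radians(a) for a in span]           # right arc, around 0 deg
    angles += [math.radians(180 + a) for a in span]    # left arc, around 180 deg
    best = (0, -1, -1, -1)
    for r in radii:
        acc = [[0] * w for _ in range(h)]  # 2-D accumulator, reused for every radius
        for x, y in points:
            voted = set()  # each edge point votes at most once per center cell
            for t in angles:
                cx = int(round(x - r * math.cos(t)))
                cy = int(round(y - r * math.sin(t)))
                if 0 <= cx < w and 0 <= cy < h and (cx, cy) not in voted:
                    voted.add((cx, cy))
                    acc[cy][cx] += 1
        for cy in range(h):
            for cx in range(w):
                if acc[cy][cx] > best[0]:
                    best = (acc[cy][cx], cx, cy, r)
    return best
```

Keeping one accumulator per radius, instead of a full (center, radius) 3-D array, is one way to keep memory at O(width × height), in the spirit of the reduced-memory accumulator the abstract mentions.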
Adiabatic Logic Code Converter Design at Different Sub-micron Technologies
(Springer, 2022-05) Asati, Abhijit
In modern VLSI design, the focus is shifting toward low-power design techniques to reduce the power density on the chip. Adiabatic logic is suitable for designing low-power VLSI circuits. In this work, we focus on the design and analysis of digital code converters. The code converters designed in this work are gray-to-binary, binary-to-gray, and BCD-to-excess-3, implemented using different adiabatic logic styles at different technology nodes. The adiabatic styles used are ECRL, IPGL, and 2N-2N2P, and a static CMOS implementation is taken as the reference for comparing power and PDP. All simulations are performed in the LTspice simulator at the 32, 22, and 16 nm technology nodes using PTM models. We also compare delay, power, and power-delay product (PDP) of the circuits across the different logic styles and technology nodes.
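The logic functions implemented by the three code converters are standard, and a bit-level Python reference model such as the one below can be used to check a transistor-level design's truth table. This sketch covers only the Boolean behavior, not any adiabatic or static CMOS circuit; the function names are hypothetical.

```python
def binary_to_gray(b: int) -> int:
    """Gray code: each bit is the XOR of adjacent binary bits, G_i = B_(i+1) XOR B_i."""
    return b ^ (b >> 1)


def gray_to_binary(g: int) -> int:
    """Inverse mapping: B_i is the XOR of all Gray bits at positions >= i."""
    b = g
    shift = g >> 1
    while shift:
        b ^= shift
        shift >>= 1
    return b


def bcd_to_excess3(digit: int) -> int:
    """The excess-3 code of a single BCD digit is the digit plus 3 (0 -> 0011)."""
    if not 0 <= digit <= 9:
        raise ValueError("a BCD digit must be in the range 0..9")
    return digit + 3
```

For example, `binary_to_gray(0b0110)` gives `0b0101`, and `format(bcd_to_excess3(5), "04b")` gives `"1000"`.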
Analysis & implementation of ultra low-power 4-bit CLA in subthreshold regime
(IEEE, 2014) Gupta, Anu; Asati, Abhijit
The paper presents the analysis and implementation of ultra-low-power, low-voltage, and low-area 4-bit carry look-ahead adder (CLA) circuits. A sub-threshold design technique is used to reduce power consumption and area while keeping the logic design of the proposed circuit simple. Simulation results show the superiority of the sub-threshold circuits over the conventional low-power design technique in terms of power, area, and power-delay product (PDP). The CLA is implemented with TSMC 0.18 μm process models in Cadence Virtuoso Schematic Composer, with improved driving ability and circuit robustness at a 0.4 V single-ended supply voltage, and simulations are carried out in Spectre S. The proposed 4-bit CLA can operate at up to 5 MHz, consumes 0.035 μW of power, and occupies an area of 60 × 92.5 μm².
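The look-ahead carries of a 4-bit CLA follow from the generate (g_i = a_i AND b_i) and propagate (p_i = a_i XOR b_i) signals. The Python sketch below is a bit-level functional model of those textbook equations, the kind of reference a sub-threshold implementation could be checked against; it says nothing about the transistor-level circuit, and the function name is hypothetical.

```python
from typing import Tuple


def cla4(a: int, b: int, cin: int = 0) -> Tuple[int, int]:
    """4-bit carry look-ahead addition; returns (sum, carry_out)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]    # generate bits
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]  # propagate bits
    # Fully expanded look-ahead equations: every carry depends only on g, p and cin.
    c1 = g[0] | (p[0] & cin)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & cin)
    c3 = (g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
          | (p[2] & p[1] & p[0] & cin))
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & cin))
    carries = [cin, c1, c2, c3]
    s = sum((p[i] ^ carries[i]) << i for i in range(4))  # s_i = p_i XOR c_i
    return s, c4
```

Because c1 to c4 are computed in parallel rather than rippled, the carry path depth stays constant for the 4-bit block, which is the property a CLA trades extra gates for.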
Analysis of Logical Effort-Based Optimization in the Deep Submicron Technologies
(Springer, 2022-12) Asati, Abhijit
A convenient way to estimate and optimize the delay of VLSI digital circuits is the popular logical effort-based optimization. In this paper, we analyze the effect of circuit parameters such as logical effort (G), branching effort (B), electrical effort (H), and parasitic effort (P) on the delay of a given circuit at two technology nodes, 180 nm and 16 nm. The analysis results show how delay varies with each logical effort parameter. The discrepancy between the simulated delay and the logical effort delay is expressed by a parameter τ′, which is compared with τ, the delay of an inverter driving an identical inverter with no parasitics, for the chosen technology. The effectiveness of logical effort-based optimization is explored. Further, logical effort-based delay reduction, super buffer-based delay reduction, and the delay of an unoptimized circuit are compared. The effect of technology on the logical effort method for each parameter at deep submicron sizes is also investigated in this work.
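The standard logical effort relations behind this analysis are F = G·B·H for the path effort and D = N·F^(1/N) + P for the minimum path delay in units of τ, where N is the number of stages. A small Python sketch of these relations follows; the function names and the example gate values (g = 4/3 and p = 2 for a NAND2, g = 1 and p = 1 for an inverter, in the usual normalization) are illustrative assumptions rather than data from the paper.

```python
import math
from typing import Sequence


def path_effort(g: Sequence[float], b: Sequence[float], h: float) -> float:
    """Path effort F = G * B * H, where G and B are products of per-stage values."""
    return math.prod(g) * math.prod(b) * h


def min_path_delay(g: Sequence[float], b: Sequence[float], h: float,
                   p: Sequence[float]) -> float:
    """Minimum path delay in units of tau: D = N * F**(1/N) + P."""
    n = len(g)
    stage_effort = path_effort(g, b, h) ** (1.0 / n)  # optimal effort per stage
    return n * stage_effort + sum(p)
```

For three inverters driving a load 64 times their input capacitance, F = 64, each stage bears an effort of 4, and D = 3·4 + 3 = 15 τ. For a NAND2 driving an inverter with H = 9, F = 12 and D ≈ 2·√12 + 3 ≈ 9.93 τ; multiplying by the technology's τ converts this to an absolute delay.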
Area, Speed and Power Optimized Implementation of a Band-Pass FIR Filter Using High-Level Synthesis
(Springer, 2021-07) Asati, Abhijit; Shekhar, Chandra
This paper proposes an area-, speed-, and power-optimized band-pass digital signal processing filter targeted at a Kintex-7 field-programmable gate array (FPGA) device. The filter was designed in MATLAB and Simulink, and its code was generated using HDL Coder from MathWorks. The implementation was created using a novel high-level synthesis design method that reduces the pessimism associated with bit-width constraints for inputs, outputs, and intermediate data nodes during synthesis. The register-transfer level (RTL) code generated by MATLAB HDL Coder was implemented on a Xilinx Kintex-7 device using Vivado. The results obtained are superior to those of previous implementations for the same filter specifications. We also performed an RTL simulation of the filter and compared the functional verification results with a golden double-precision implementation in MATLAB. The results suggest that constraining the bit width and reducing pessimism have less than 1% impact on filter accuracy, within the limits specified by the architecture specifications.
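The comparison against a golden double-precision model can be illustrated with a minimal Python sketch: a direct-form FIR filter is run once in floating point and once with coefficients, inputs, and outputs rounded to a fixed-point grid, and the worst-case difference is reported. The 5-tap coefficients are a hypothetical toy band-pass response, H(ω) = e^(−j2ω)(0.3 − 0.2 cos 2ω), which peaks at ω = π/2; they are not the paper's filter, and no HDL Coder or Vivado behavior is modeled.

```python
from typing import List, Sequence


def fir(x: Sequence[float], h: Sequence[float]) -> List[float]:
    """Direct-form FIR filter y[n] = sum_k h[k] * x[n - k], zero initial state."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, hk in enumerate(h):
            if n - k >= 0:
                acc += hk * x[n - k]
        y.append(acc)
    return y


def quantize(v: float, frac_bits: int) -> float:
    """Round v to the nearest multiple of 2**-frac_bits (no saturation modeled)."""
    scale = 1 << frac_bits
    return round(v * scale) / scale


def max_abs_error(x: Sequence[float], h: Sequence[float], frac_bits: int) -> float:
    """Worst-case deviation of a fixed-point version from the double-precision golden model."""
    golden = fir(x, h)
    hq = [quantize(c, frac_bits) for c in h]
    xq = [quantize(s, frac_bits) for s in x]
    fixed = [quantize(v, frac_bits) for v in fir(xq, hq)]
    return max(abs(a - b) for a, b in zip(golden, fixed))


H_TOY = [-0.1, 0.0, 0.3, 0.0, -0.1]  # hypothetical toy band-pass taps
```

Sweeping `frac_bits` in such a model is one simple way to see how far bit widths can be tightened before accuracy degrades noticeably.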
Area-optimal FPGA implementation of the YOLO v2 algorithm using High-Level Synthesis
(IEEE, 2020) Asati, Abhijit; Shekhar, Chandra
Field-programmable gate arrays (FPGAs) have been used as pre-silicon validation platforms in VLSI design. In this paper, we propose an FPGA-based you-only-look-once (YOLO) v2 object detector implementation that is faster, achieves higher accuracy, and requires fewer resources than the alternatives. It is built on a deep convolutional neural network (CNN). We apply high-level synthesis (HLS) to model and optimize the implementation using multiple directives, such as pipelining, loop unrolling, and inlining. The proposed YOLO v2 design is implemented on a Xilinx Zynq xc7z020clg484-1 device, and its functionality is tested by simulation with the xSim simulator. The proposed implementation not only runs faster but also uses an order of magnitude fewer resources than implementations available in the literature.
Automated HDL generation of two’s complement Dadda multiplier with Parallel Prefix Adders
(IJAREEIE, 2013) Asati, Abhijit
Dadda multipliers are among the fastest multipliers owing to their logarithmic delay. The partial products of two's complement multiplication are generated by the algorithm described by Baugh and Wooley. The complicated and irregular reduction of partial products by the Dadda algorithm, together with the use of parallel prefix adders with logarithmic delay in the final addition stage, makes it difficult to write generic Verilog code for these multipliers. To solve this difficulty, we describe a C program that automatically generates a Verilog file for a Dadda multiplier of user-defined size with parallel prefix adders such as the Kogge-Stone, Brent-Kung, and Han-Carlson adders. We compare their post-layout results, including propagation delay, area, and power consumption. The Verilog code was synthesized using a 90 nm technology library. We observed that the multiplier using the Kogge-Stone adder in the final stage gives higher speed and a lower power-delay product than those using the Brent-Kung and Han-Carlson adders.
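A generator like the one described needs a width-generic definition of the prefix adder. The Python sketch below is a bit-accurate functional model of a Kogge-Stone adder of any width, in which each level combines (G, P) pairs at distance 1, 2, 4, and so on, giving log2(width) levels. It is not the authors' C program and emits no Verilog; it is only the kind of reference model generated HDL could be checked against, and the function name is hypothetical.

```python
from typing import Tuple


def kogge_stone_add(a: int, b: int, width: int, cin: int = 0) -> Tuple[int, int]:
    """Add two `width`-bit operands with a Kogge-Stone prefix network; returns (sum, cout)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]
    G, P = g[:], p[:]
    G[0] = g[0] | (p[0] & cin)  # fold the carry-in into bit 0's generate signal
    dist = 1
    while dist < width:  # one prefix level per doubling of the span
        new_G, new_P = G[:], P[:]
        for i in range(dist, width):
            # Prefix operator: (G, P)_i o (G, P)_(i-dist)
            new_G[i] = G[i] | (P[i] & G[i - dist])
            new_P[i] = P[i] & P[i - dist]
        G, P = new_G, new_P
        dist *= 2
    # After the last level, G[i] is the carry out of bit i (group generate of bits i..0).
    s = 0
    for i in range(width):
        carry_in = cin if i == 0 else G[i - 1]
        s |= (p[i] ^ carry_in) << i
    return s, G[width - 1]
```

Every bit position has its own prefix node at every level, which is what gives Kogge-Stone its minimal logic depth and large wiring and area cost compared with Brent-Kung and Han-Carlson.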
Automated HDL Generation of Two’s Complement Wallace Multiplier With Parallel Prefix Adders
(IJAREEIE, 2013) Asati, Abhijit
Dadda multipliers are among the fastest multipliers owing to their logarithmic delay. The partial products of two's complement multiplication are generated by the algorithm described by Baugh and Wooley. The complicated and irregular reduction of partial products by the Dadda algorithm, together with the use of parallel prefix adders with logarithmic delay in the final addition stage, makes it difficult to write generic Verilog code for these multipliers. To solve this difficulty, we describe a C program that automatically generates a Verilog file for a Dadda multiplier of user-defined size with parallel prefix adders such as the Kogge-Stone, Brent-Kung, and Han-Carlson adders. We compare their post-layout results, including propagation delay, area, and power consumption. The Verilog code was synthesized using a 90 nm technology library. We observed that the multiplier using the Kogge-Stone adder in the final stage gives higher speed and a lower power-delay product than those using the Brent-Kung and Han-Carlson adders.
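The Baugh-Wooley scheme keeps every partial product positive by complementing the bits that involve exactly one sign bit and adding two correction constants. Expanding the two's complement product of n-bit operands gives the complemented terms plus the constant 2^n − 2^(2n−1), which modulo 2^(2n) equals adding a 1 in column n and a 1 in column 2n − 1. The Python sketch below generates that partial-product set and sums it as a check; it is an illustration with hypothetical function names, not the generator described in the abstract.

```python
from typing import List, Tuple


def baugh_wooley_terms(a: int, b: int, n: int) -> List[Tuple[int, int]]:
    """Partial products (column, bit) for n-bit two's complement a * b (Baugh-Wooley)."""
    abit = [(a >> i) & 1 for i in range(n)]  # works for negative Python ints too
    bbit = [(b >> j) & 1 for j in range(n)]
    terms = [(i + j, abit[i] & bbit[j]) for i in range(n - 1) for j in range(n - 1)]
    for i in range(n - 1):
        # Terms mixing one sign bit with one magnitude bit are complemented.
        terms.append((i + n - 1, 1 - (abit[i] & bbit[n - 1])))
        terms.append((i + n - 1, 1 - (abit[n - 1] & bbit[i])))
    terms.append((2 * n - 2, abit[n - 1] & bbit[n - 1]))  # sign x sign term is positive
    terms.append((n, 1))          # correction constant in column n
    terms.append((2 * n - 1, 1))  # correction constant in column 2n - 1
    return terms


def baugh_wooley_multiply(a: int, b: int, n: int) -> int:
    """Sum the partial products modulo 2**(2n) and read the result as signed."""
    total = sum(bit << col for col, bit in baugh_wooley_terms(a, b, n)) % (1 << (2 * n))
    return total - (1 << (2 * n)) if total >> (2 * n - 1) else total
```

In hardware, these (column, bit) terms are exactly what a Wallace or Dadda tree then compresses column by column.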
Clock Gating Analysis of TG Based D Flip-Flop for Different Technology Nodes
(IEEE, 2020) Asati, Abhijit
Dynamic power dissipation depends on the switching activity of the circuit. In this paper, we analyze the power consumption of a transmission gate (TG) based D flip-flop at different technology nodes and the power saving obtained by applying a dynamic XOR-based clock gating technique to this flip-flop. The TG-based D flip-flop is implemented at three technology nodes: 32 nm, 22 nm, and 16 nm. Circuit-level simulation results show the power consumption of the D flip-flop with and without clock gating at several operating frequencies and data activity factors at these nodes. Although power dissipation decreases at lower technology nodes, this work shows that additional power savings may be obtained with the dynamic XOR-based clock gating approach at higher operating frequencies and low data activity.
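The reason XOR-based gating pays off at low data activity can be seen in a cycle-level Python sketch: the clock reaches the flip-flop only when D XOR Q = 1, that is, only when the stored value would actually change. This is a behavioral illustration under that assumption, with a hypothetical function name; it models no transistor-level power.

```python
from typing import List, Sequence, Tuple


def simulate_dff(data: Sequence[int], gated: bool, q0: int = 0) -> Tuple[List[int], int]:
    """Cycle-level D flip-flop model, optionally with XOR-based clock gating.

    Returns the Q trace and the number of clock edges that reach the flip-flop.
    """
    q = q0
    trace: List[int] = []
    delivered_edges = 0
    for d in data:
        # With gating, the clock passes only when D differs from Q (D XOR Q = 1).
        enable = (d ^ q) if gated else 1
        if enable:
            delivered_edges += 1
            q = d  # the flip-flop captures D on a delivered edge
        trace.append(q)
    return trace, delivered_edges


low_activity = [0, 0, 1, 1, 1, 0, 0, 0]
ungated_trace, ungated_edges = simulate_dff(low_activity, gated=False)
gated_trace, gated_edges = simulate_dff(low_activity, gated=True)
```

For this low-activity sequence both versions produce the same Q trace, but the gated flip-flop sees 2 clock edges instead of 8. In a real circuit the XOR and gating logic draw power of their own, so the net saving shrinks as data activity rises.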
Comparative Analysis of ST, ECRL and Static Logic Style at Different Process Technologies
(IEEE, 2023-04) Asati, Abhijit
At lower VLSI process technologies, selecting a suitable logic style becomes important in designing low-power VLSI circuits, so that chip power is minimized to meet power density requirements with minimal sacrifice in speed. This study emphasizes the design and optimization of digital code converters. Sub-threshold (ST) logic style implementations of Gray code to Binary code (GB), Binary code to Gray code (BG), and BCD code to Excess-3 code (BE3) converters are used to compare propagation delay, power consumption, and power-delay product (PDP) at the 32 nm and 22 nm process technologies. These implementations are compared with Efficient Charge Recovery Logic (ECRL) and static CMOS logic style implementations.
Computational Operations and Hardware Resource Estimation in a Convolutional Neural Network Architecture
(Springer, 2022-05) Asati, Abhijit; Shenoy, Meetha V
Convolutional neural network (CNN) models have proved very advantageous in computer vision and image processing applications. Recently, owing to the increased accuracy of CNNs on a wide variety of classification and recognition tasks, the demand for real-time hardware implementations has increased dramatically. These implementations involve intensive processing and high memory bandwidth to achieve the desired performance. Estimating the hardware resources and approximate performance of a target system at a higher level of abstraction is therefore very important for an optimized hardware implementation. In this paper, we first develop an ‘Optimized CNN model’ and then explore the approximate operations and hardware resource estimates for this model, along with a suitable hardware implementation process. We also compare the computed operations and hardware resource estimates with those of a few published CNN architectures, which shows that the optimization process substantially reduces hardware resources while providing similar accuracy. This research focuses mainly on the computational complexity of the convolutional and fully connected layers of our implemented CNN model.
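The operation and parameter counts for convolutional and fully connected layers follow from standard formulas: a k × k convolution with C_in input and C_out output channels performs k·k·C_in multiply-accumulates (MACs) per output pixel per filter, and a dense layer performs one MAC per weight. A minimal Python sketch of these counts follows; the function names and the example layer shapes are hypothetical, not the paper's ‘Optimized CNN model’.

```python
from typing import Tuple


def conv2d_cost(h_in: int, w_in: int, c_in: int, c_out: int, k: int,
                stride: int = 1, padding: int = 0) -> Tuple[int, int, int, int]:
    """Return (h_out, w_out, MACs, parameters) for a k x k convolution layer with bias."""
    h_out = (h_in + 2 * padding - k) // stride + 1
    w_out = (w_in + 2 * padding - k) // stride + 1
    macs = h_out * w_out * c_out * (k * k * c_in)  # one MAC per weight per output pixel
    params = (k * k * c_in + 1) * c_out            # weights plus one bias per filter
    return h_out, w_out, macs, params


def dense_cost(n_in: int, n_out: int) -> Tuple[int, int]:
    """Return (MACs, parameters) for a fully connected layer with bias."""
    return n_in * n_out, (n_in + 1) * n_out
```

For instance, 32 filters of size 3 × 3 on a 28 × 28 × 1 input (no padding) give a 26 × 26 × 32 output, 194,688 MACs, and 320 parameters; a 128-to-10 dense layer needs 1,280 MACs and 1,290 parameters. Multiplying the parameter count by the chosen bytes per weight gives a first-order on-chip memory estimate.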
Convolutional Neural Network Hardware Optimization Using Bayesian Method
(IEEE, 2024-04) Asati, Abhijit; Shenoy, Meetha V.
Convolutional Neural Network (CNN) models have demonstrated significant benefits in computer vision and image processing applications. Optimizing the hyperparameters of a CNN model is crucial for an effective implementation, whether on software, hardware, or a ‘software-hardware co-design’ platform, thereby enhancing overall performance and results. This work proposes a CNN architecture and applies the Bayesian optimization algorithm to find the set of hyperparameter values that reduces both training and recognition time. In addition, a new parameter, the ‘Network optimization parameter’ (NOP), is defined that accounts for hardware resource optimization at a given accuracy of the trained model. This parameter is to be minimized; it helps identify the best set of hyperparameter values and is essential for subsequently implementing the CNN model on a hardware platform. The optimization is performed on two processors, a Central Processing Unit (CPU) and a Graphics Processing Unit (GPU), to understand the impact of using different processing units. An accuracy of 99.48% is achieved on the Modified National Institute of Standards and Technology (MNIST) database, and an accuracy of 88.78% on the Canadian Institute For Advanced Research (CIFAR-10) database. The proposed models are highly optimized and have lower resource requirements (owing to lower layer complexity and smaller filter sizes) while delivering higher accuracy than models in the available literature. Further, the calculated NOP for the proposed network is substantially lower than for networks in the published literature.
Dedicated hardware architecture for localizing iris in VW images
(Elsevier, 2022-07) Asati, Abhijit; Gupta, Anu
This study presents dedicated hardware for iris localization that can be used as a coprocessor in the development of real-time, low-cost embedded iris biometric systems. Although the hardware architecture is described for iris localization in visible wavelength (VW) images, the concept can be applied to near-infrared (NIR) images as well. In general, the architecture can be used for a class of iris localization algorithms based on edge map generation and the circular Hough transform (CHT). The architecture presented here generates the edge maps for limbic and pupil boundary detection using median filtering followed by Sobel edge detection; an additional reflection removal module is used for pupil boundary detection. The CHT hardware module then detects a circle in each edge map. The proposed architecture was implemented in the programmable logic of the Xilinx Zynq-7000 SoC device. This hardware implementation gives an iris localization accuracy of 98.43% and an average processing time of 5.148 ms for UBIRIS.v1 VW database images (200 × 150 pixels). The algorithm used is suitable for less unconstrained, frontal-view iris images captured with the subjects’ active participation; however, the images may contain non-ideal issues such as reflections and occlusion by eyelids and eyelashes.
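The edge-map stage (median filtering followed by Sobel edge detection) can be sketched in software as a reference for a hardware pipeline. The Python version below uses a 3 × 3 median window, the |Gx| + |Gy| magnitude approximation often used in hardware, and a hypothetical threshold; window size, threshold, and function names are assumptions, and the reflection removal module is not modeled.

```python
from typing import List

Image = List[List[int]]


def median3x3(img: Image) -> Image:
    """3x3 median filter; border pixels are copied unchanged."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = sorted(img[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            out[y][x] = window[4]  # middle of the 9 sorted values
    return out


def sobel_edges(img: Image, threshold: int) -> Image:
    """Binary edge map from the Sobel operator using |Gx| + |Gy| as the magnitude."""
    h, w = len(img), len(img[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = ((img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1])
                  - (img[y - 1][x - 1] + 2 * img[y][x - 1] + img[y + 1][x - 1]))
            gy = ((img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1])
                  - (img[y - 1][x - 1] + 2 * img[y - 1][x] + img[y - 1][x + 1]))
            edges[y][x] = 1 if abs(gx) + abs(gy) > threshold else 0
    return edges


# Toy 6x6 step image (dark left, bright right) with one bright noise pixel.
step = [[0, 0, 0, 100, 100, 100] for _ in range(6)]
step[1][1] = 255
clean_edges = sobel_edges(median3x3(step), threshold=200)
noisy_edges = sobel_edges(step, threshold=200)
```

On this toy image, filtering first leaves only the true vertical edge (columns 2 and 3 of each interior row), while running Sobel directly on the noisy image produces a spurious edge next to the bright pixel; this is the motivation for placing the median filter before edge detection.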
Design and Analysis of a Scan Chain in Subthreshold Region
(IEEE, 2023) Asati, Abhijit
Manufactured integrated circuits (ICs) are tested using design for testability (DFT) techniques such as the scan chain, which is the most popular technique for sequential circuits. A scan cell is a D flip-flop (DFF) modified with a multiplexer at its input. During testing, a pattern is applied through the scan input pin (SI), and individual flip-flops toggle their values as the test patterns are shifted in; hence, the scan chain consumes a significant amount of power. Although moving to a lower technology node decreases the power consumption of a circuit, a further drastic reduction in power consumption (on the order of 10⁶) is obtained by operating the circuit in the subthreshold region. In this work, a scan chain is designed to operate correctly in the subthreshold region using suitable device sizes, with both transmission gate (TG) based and true single-phase clocked (TSPC) logic, for the 16, 22, and 32 nm technology nodes. Further, their average powers are compared. In addition, Monte Carlo simulation and comparative analysis are performed to study the effect of power supply and temperature variation.
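The shift operation that makes scan testing power-hungry can be sketched behaviorally in Python: each scan cell is a multiplexer in front of a D flip-flop, and with scan enable (SE) asserted every cell captures its neighbor's value on each clock. Counting cell toggles during shift-in is a common proxy for shift power. The function names and the toggle-count metric are illustrative assumptions, not the circuit-level analysis in the paper.

```python
from typing import List, Sequence, Tuple


def scan_cell_input(d: int, si: int, se: int) -> int:
    """Multiplexer in front of the DFF: SE=1 selects the scan input, SE=0 functional data."""
    return si if se else d


def shift_in(chain: List[int], pattern: Sequence[int]) -> Tuple[List[int], int]:
    """Shift `pattern` into the scan chain with SE=1; return final state and total toggles."""
    state = chain[:]
    toggles = 0
    for bit in pattern:
        # Cell 0 captures SI; cell k captures cell k-1. Functional data (d) is ignored.
        nxt = [scan_cell_input(d=0, si=bit, se=1)]
        nxt += [scan_cell_input(d=0, si=state[k - 1], se=1) for k in range(1, len(state))]
        toggles += sum(old != new for old, new in zip(state, nxt))
        state = nxt
    return state, toggles
```

Shifting [1, 0, 1, 1] into an all-zero four-cell chain leaves the chain at [1, 1, 0, 1] (the first bit shifted in ends up in the last cell) after 9 cell toggles, whereas shifting an all-zero pattern causes none. The pattern-dependent toggling is what drives shift power.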
Design and ASIC implementation of column compression Wallace/Dadda multiplier in sub-threshold regime
(IEEE, 2015) Gupta, Anu; Asati, Abhijit
In this paper, the design and comparative analysis of the best-known column compression multipliers, by Wallace [5] and Dadda [6], are carried out in the sub-threshold regime. To reduce the hardware, and ultimately the area and power, energy-efficient basic modules (AND gates, half adders, full adders, and partial-product generation units) have been analyzed. In the final stage, a ripple carry adder (RCA) and a Han-Carlson adder are used to implement the Wallace and Dadda multipliers. The performance metrics considered for the analysis of the adders are power, delay, and PDP. Simulation studies are carried out for an 8×8 input data width. Spectre simulations using 45 nm CMOS technology at a 0.4 V supply voltage show that the proposed circuits are energy efficient. The proposed Wallace/Dadda multipliers using the Han-Carlson adder (HCA) outperform their counterparts, exhibiting lower power consumption and shorter propagation delay than Wallace/Dadda multipliers using RCA operated in the subthreshold region.
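The difference between the two column compression schemes shows up in their stage-by-stage matrix heights. Dadda reduces only as far as the next target height in the sequence d₁ = 2, d_(j+1) = ⌊1.5·d_j⌋, while Wallace compresses every full group of three rows into two at each stage. The Python sketch below computes both height sequences for an n × n multiplier, whose partial-product matrix starts at height n; it is a row-count illustration with hypothetical function names, not a gate-level design.

```python
from typing import List


def dadda_stage_heights(n: int) -> List[int]:
    """Dadda target heights below n, largest first (d_1 = 2, d_(j+1) = floor(1.5 * d_j))."""
    heights = []
    d = 2
    while d < n:
        heights.append(d)
        d = (3 * d) // 2
    return heights[::-1]


def wallace_stage_heights(n: int) -> List[int]:
    """Matrix height after each Wallace stage: each full group of 3 rows becomes 2 rows."""
    heights = []
    h = n
    while h > 2:
        h = 2 * (h // 3) + h % 3
        heights.append(h)
    return heights
```

For the 8×8 case studied here both schemes pass through heights 6, 4, 3, and 2, so they need the same four reduction stages before the final adder (RCA or Han-Carlson); at 16×16 the sequences differ (13, 9, 6, 4, 3, 2 for Dadda versus 11, 8, 6, 4, 3, 2 for Wallace). Dadda's "reduce only as needed" rule is what generally lets it use fewer adder cells per stage.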
Design of a Static Current Simulator Using Device Matrix Approach
(IEEE, 2012) Asati, Abhijit
The I-V characteristic is one of the important results produced by a device simulator. In this article, a novel, interactive, matrix-based algorithm is presented to draw the device structure in 2-D or 3-D style and to plot the I-V characteristic of the device for user-specified doping and biasing conditions. The algorithm creates a 2-D or 3-D matrix of the device from the device description provided by the user. Various operations and mathematical computations are performed on this device matrix, and the I-V characteristic is plotted from the results. This approach offers a novel idea for basic device-level tool development. Students and device-level engineers may find this work useful, as it offers an interactive and instant way to draw the I-V characteristics of a device. The algorithm is implemented in a modular, matrix-based manner using MATLAB®.
Design of ultra low power flip flops in sub-threshold region for bio-medical application in 45nm, 32nm and 22nm technologies
(IEEE, 2015) Asati, Abhijit
Designing low-power circuits is currently one of the most important research topics. Especially for medical implant devices, which run on non-rechargeable batteries, power consumption becomes the most important issue because these batteries are very expensive. Most human body signals are of low frequency, which makes power consumption a more important factor than performance (speed). Hence, by designing circuits in the subthreshold region, one can save power at the cost of a lower maximum operating frequency. This paper presents flip-flop architectures and their performance metrics in the subthreshold region of operation.