BITS Faculty Publications
Permanent URI for this community: http://localhost:4000/handle/123456789/1867
49 results
Search Results
Item: VLSI for embedded intelligence (Springer, 2025)
Gupta, Anu; Chaturvedi, Nitin
This book constitutes the proceedings of the 27th International Symposium on VLSI Design and Test, VDAT 2023. The 32 regular papers and 16 short papers presented in this book were carefully reviewed and selected from 220 submissions. They are organized in topical sections as follows: Low-Power Integrated Circuits and Devices; FPGA-Based Design and Embedded Systems; Memory, Computing, and Processor Design; CAD for VLSI; Emerging Integrated Circuits and Systems; VLSI Testing and Security; and System-Level Design.

Item: Tunable energy-efficient approximate circuits for self-powered AI and autonomous edge computing systems (IEEE, 2025-03)
Chaturvedi, Nitin
Artificial Intelligence is applied in various domains of compute-intensive applications, ranging from image recognition and healthcare to statistical analysis. Additionally, recent advances in Deep Neural Networks (DNNs), which run on millions of devices for various AI tasks, deliver accuracy comparable to human levels. However, this computational accuracy comes at the cost of increased computational resources and power consumption in traditional computing units. The problem becomes more complex when deploying computationally intensive machine learning (ML) models on energy-constrained edge devices. Approximate computing has emerged as a promising paradigm for error-tolerant AI/ML applications deployed on such devices: the complexity of hardware computing units can be reduced by optimizing circuit logic while slightly trading off computational accuracy. Therefore, we propose novel approximate compressors for designing the multiply-and-accumulate (MAC) hardware units of Deep Neural Networks and Convolutional Neural Networks (DNN/CNN) that achieve energy-efficient and faster computations with slightly reduced precision.
We also propose tunable compressors and a MAC unit that support switching between two approximation modes to enable runtime adjustment of energy efficiency and accuracy for energy-autonomous edge devices. We validated and verified the design of the proposed approximate circuits at the 7 nm and 55 nm technology nodes. Simulation results for the tunable compressor show an average reduction of 49% in energy consumption and 30% in delay compared to the state-of-the-art compressor. In addition, an average reduction of 36% in energy consumption and 18% in delay was observed for the MAC unit compared with the conventional MAC.

Item: Design of a Programmable Delay Line with On-Chip Calibration to Achieve Immunity Against Process Variations (Springer, 2022-12)
Chaturvedi, Nitin
CMOS Delay Lines (DLs) have recently been gaining interest rapidly due to increased demand for high-precision delay in VLSI systems. Delay lines serve as a fundamental block for a wide range of applications, including Delay Locked Loops (DLLs), Phase Locked Loops (PLLs), ring oscillators, and clock synchronizers, where they provide precise time delays. However, one of the major challenges faced by the CMOS delay line is the deviation in delay due to process, voltage, and temperature (PVT) variations. In this work, we aim to mitigate the impact of one of these variations, process variation, on the delay line. We propose a programmable delay line based on a voltage-controlled buffer that is insensitive to process variation. To achieve immunity against process variations and obtain a high-precision delay value, a novel calibration technique is proposed that dynamically tunes the biasing voltage of the buffer, resulting in a constant delay under all process corners.
Our simulation results for the proposed DL demonstrate a total delay of 559 ps with a delay error of less than 2%.

Item: Design of In-Memory Computing Enabled SRAM Macro (IEEE, 2022)
Chaturvedi, Nitin
The era of nanoscale devices has resulted in tremendously fast and compact modern processing systems. The von Neumann architecture, comprising separate memory and processing units, is still one of the most widely adopted architectures in these computing systems. However, the growing computational requirements of emerging applications with large data sets pose a great challenge to these conventional computing systems because of the constant data transfer between the two physically separate memory and computing blocks. The heavy data transport between the processing core and memory results in large power consumption, especially for big-data applications. To address this challenge, we propose to bring processing closer to the memory. In this work, we design an In-Memory Computing enabled SRAM macro (IMC-SRAM) that is capable of performing logical computations within memory in addition to normal memory operations. We utilize a differential 9T bitcell and modified peripheral circuitry to realize Boolean logic operations such as AND/NAND and OR/NOR within the memory array. The proposed design has been validated using SPICE simulations at an operating frequency of 1 GHz across all process corners using NCSU 45 nm technology.

Item: Design of a tunable delay line with on-chip calibration to generate process-invariant PWM signal for in-memory computing (Springer, 2023-06)
Shenoy, Meetha V.; Chaturvedi, Nitin
Recent compute-in-memory (CiM) architectures have been proposed as a promising solution to support Deep Neural Networks and Convolutional Neural Networks in solving large and complex tasks in various machine learning applications.
The CiM architecture overcomes the limitation of the current von Neumann architecture by performing logic computations within the memory, also called in-memory computing. In most CiM designs, the in-memory logic operations are performed on the weights stored in memory using inputs that are applied through bitlines or wordlines as pulse width modulated (PWM) signals. For precise operation, the applied input signals must be stable. However, one of the main challenges faced during input signal generation is the deviation in pulse width due to process, voltage, and temperature variations. In this work, we aim to mitigate the impact of one of these variations, process variation, on the generated PWM signals. We propose a tunable delay line that provides a linear PWM signal corresponding to an input vector, which is further utilized to perform local computation in memory. Further, to minimize the impact of process variations, we propose an autonomous on-chip calibration circuit that dynamically tunes the delay lines to obtain stable and process-invariant pulse width modulated signals. Our simulation results for the proposed DL demonstrate a total delay of 559 ps with a delay error of less than 2% under various process corners.

Item: A High-Speed Bitwise Computation in SRAM Using Assisted Bitline Charging/Discharging (Springer, 2023)
Chaturvedi, Nitin
Today’s era of nanoscale integration and lower technology nodes has yielded tremendously fast and compact processors and memories. However, the fundamental principle of VLSI architectures such as von Neumann remains intact. Hence, the bottleneck associated with them, such as restricted throughput due to frequent data movement, needs to be addressed in the era of pervasive computing, which consists of extensive data-intensive applications such as AI, ML, and DL. One possible solution to overcome this bottleneck is direct in-memory logic operations.
Therefore, this paper aims to design a robust Compute-in-Memory SRAM (CiM-SRAM) capable of executing logical functions directly within memory in addition to normal read and write operations. This work proposes a logic-decoupled bitcell capable of computing universal logical functions such as NAND/AND and NOR/OR. Further, to accelerate the computation, we utilize a bitline assist circuit that rapidly charges/discharges the bitline, thereby reducing the computational time by 20%. The proposed design has been simulated at a frequency of 1 GHz across all process corners using 45 nm technology and validated to demonstrate functional feasibility and significant performance improvement.

Item: Modeling, hardware architecture, and performance analyses of an AEAD-based lightweight cipher (Springer, 2024-02)
Chaturvedi, Nitin
Ensuring data security and integrity is crucial for achieving the highest level of protection and performance in modern cyber-physical systems (CPS). Authenticated encryption with associated data (AEAD) is an efficient and secure way to encrypt data that ensures confidentiality and authenticity. The proposed work focuses on image encryption using the TinyJAMBU cipher within the AEAD scheme. In this paper, image encryption using the TinyJAMBU cipher is proposed with both software and hardware modeling, and the encryption is evaluated over standard metrics. The hardware architecture for TinyJAMBU has been implemented on the Xilinx Virtex-7 FPGA device. The implementation results are compared with realizations of other contemporary ciphers; the TinyJAMBU-128 implementation is better in terms of look-up tables (LUTs), slice utilization, and power consumption. In the experimentation phase, the results of TinyJAMBU-128/192/256 for image encryption have been compared with existing image encryption techniques.
It has been observed that, compared to other implementations, the proposed image encryption application using TinyJAMBU provides better results for PSNR, MSE, RMSE, and UACI.

Item: Design and implementation of successive approximation register data converter (AIP, 2024)
Gupta, Anu; Chaturvedi, Nitin; Shekhar, Chandra
Analog-to-Digital Converters (ADCs) serve as crucial interfaces between the analog and digital domains, facilitating the transformation of analog signals into digital representations. Data processing in the digital domain presents distinct performance advantages over the analog domain in particular aspects. To facilitate the reverse conversion of processed digital signals back into the real-world signal domain, Charge Redistribution Digital-to-Analog Converters (DACs) are employed. DACs also play a pivotal role as significant components in specific ADC architectures, such as the Successive Approximation Register (SAR) Analog-to-Digital (A/D) Converter. Moreover, a Strong-Arm Latch Comparator has been utilized to compare the input analog voltage with the output voltage of the DAC. This paper primarily focuses on the implementation and thorough analysis of the SAR-ADC. The study includes calculating the precise range of analog voltages and the corresponding digital outputs. The maximum Differential Non-Linearity (DNL) error, offset error, and full-scale error for this specific SAR-ADC have been measured and found to be 0.28 LSB, 0.2 LSB, and 0.22 LSB, respectively. The results presented in this paper provide valuable insights into the performance and accuracy of the SAR-ADC, paving the way for further advancements and applications in the domain of A/D conversion.

Item: Design and Analysis of Modified Strong Arm Latch Comparator with Reduced Kickback Noise (Springer, 2024-10)
Gupta, Anu; Shekhar, Chandra; Chaturvedi, Nitin
This research paper introduces three techniques to reduce kickback noise in the Strong Arm Latch Comparator (SAL).
The first technique focuses on utilizing high clock power and generating two clocks with different duty cycles. Applying a single clock to the kickback-reducing circuit initially addressed the issue, but the reduction in kickback noise did not meet the desired level. To overcome this limitation, a new design is proposed, incorporating a delay in the programmability of the kickback-reducing circuit, which effectively eliminates the need for kickback and clock requirements. A comparative study is conducted, evaluating all the designs, including the proposed design, based on power, delay, and analysis of various types of noise. Results show that the proposed technique outperforms other kickback-reducing designs in terms of propagation latency, power consumption, and kickback currents. Additionally, the impact of a comparator’s common-mode voltage (Vcm) on its performance in TSMC 180 nm CMOS technology is demonstrated using the Cadence Schematic Editor tool.

Item: A CMOS/MTJ Based Novel Non-volatile SRAM Cell with Asynchronous Write Termination for Normally OFF Applications (Springer, 2019)
Chaturvedi, Nitin
Non-volatile SRAM (NV-SRAM) enables normally-off computing while achieving faster power off/on times by storing the state in its locally embedded non-volatile elements. Emerging magnetic memory such as the spin transfer torque magnetic tunnel junction (STT-MTJ) is preferred in NV-SRAM design because of attractive features such as unlimited endurance, high density, scalability, and CMOS compatibility. However, the write operation in an MTJ is stochastic, which means the duration of an MTJ write is non-deterministic. As a result, it suffers from reliability issues such as write errors. Existing solutions to reduce the write error rate mainly consist of increasing the write pulse duration, which leads to high power consumption. However, if write completion could be detected on the fly and the write current cut off immediately, energy consumption could be reduced to a large extent.
Therefore, this work proposes a novel non-volatile SRAM cell with an asynchronous write termination scheme. In the proposed NV-SRAM cell, the write operation is continuously monitored and terminated as soon as the MTJ has switched to the required state. Our analysis indicates that the proposed cell reduces write power by 23% when compared with the cell without write assist. Moreover, the proposed write termination circuit achieves 2.52–14% more power saving when compared to existing write termination circuits.
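
The write-termination idea in the last abstract above can be illustrated with a rough energy model. The sketch below is not the authors' circuit or analysis: the function name `write_energy_pj`, the uniform spread of switching times, the 10 ns pulse, and the 0.1 mW write power are all hypothetical values chosen for illustration. It only shows why cutting the write current at the moment of switching saves energy when switching time is stochastic.

```python
import random

def write_energy_pj(switch_times_ns, pulse_ns, write_power_mw, terminate):
    """Total write energy in pJ (mW * ns = pJ) over a set of MTJ writes.

    terminate=False: every write draws current for the full fixed pulse.
    terminate=True: current is cut as soon as the MTJ switches (idealized;
    detection delay and the monitor circuit's own power are ignored).
    """
    total = 0.0
    for t_switch in switch_times_ns:
        duration = min(t_switch, pulse_ns) if terminate else pulse_ns
        total += write_power_mw * duration
    return total

# Hypothetical numbers, not taken from the paper.
rng = random.Random(0)  # fixed seed so the run is reproducible
switch_times = [rng.uniform(2.0, 10.0) for _ in range(1000)]  # assumed spread, ns

fixed = write_energy_pj(switch_times, pulse_ns=10.0, write_power_mw=0.1, terminate=False)
early = write_energy_pj(switch_times, pulse_ns=10.0, write_power_mw=0.1, terminate=True)
print(f"energy saved by early termination: {100 * (1 - early / fixed):.1f}%")
```

In this toy model the fixed pulse must be sized for the slowest expected switch, so every faster write wastes the remainder of the pulse. A real design would also have to account for the energy of the monitoring circuit itself, which this sketch leaves out.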