Department of Electrical and Electronics Engineering

Permanent URI for this collection: http://localhost:4000/handle/123456789/1925

Ab Initio Study of Carbon Nanotube Field Effect Transistor Gas Sensor for Detection of Ammonia and Nitrogen Dioxide Gas
(IEEE, 2022-07) Gupta, Navneet; Chaturvedi, Nitin
Label-free sensors are capable of sensing low concentrations of gas molecules. In this article, the importance of the Carbon Nanotube Field Effect Transistor (CNFET) is described for gas sensing applications. A first-principles study is carried out to investigate the CNFET for the detection of low concentrations of ammonia (NH3) and nitrogen dioxide (NO2) gas molecules. By discussing the electronic and transport properties of the CNFET, we find that it can be used for gas sensing applications. Detailed analysis of binding energy, E-k diagram, density of states (DOS), device density of states (DDOS), transmission pathways and current-voltage (I-V) characteristics has been performed using density functional theory (DFT) and the non-equilibrium Green's function (NEGF) method. It has been observed that the CNFET has potential application as a gas sensor at room temperature. Our theoretical findings are corroborated with experimental data and this virtual device structure can be converted into a physical device to get nano dimensions integrated gas senso
Adaptive Block Pinning Based: Dynamic Cache Partitioning for Multi-core Architectures
(IJCST, 2010-12) Chaturvedi, Nitin
This paper explores the various techniques currently used for partitioning last-level (L2/L3) caches in multicore architectures, identifies their strengths and weaknesses, and proposes a novel partitioning scheme known as Adaptive Block Pinning, which would result in better utilization of the cache resources in CMPs. The widening speed gap between processors and memory, along with the issue of limited on-chip memory bandwidth, makes last-level cache utilization a crucial factor in designing future multicore processors. Contention for such a shared resource has been shown to severely degrade performance when running multiple applications. As architectures incorporate more cores, multiple application workloads become increasingly attractive, further exacerbating contention at the last-level cache. Several Non-Uniform Cache Architecture (NUCA) schemes have been proposed which try to optimally use the capacity of last-level shared caches and lower access times on average. This is done by continually monitoring the cache usage of each core and dynamically partitioning the cache so as to increase the overall hit ratio.
An Adaptive Block Pinning Cache for Reducing Network Traffic in Multi-core Architectures
(IEEE, 2013) Chaturvedi, Nitin
With the advent of new technologies, the exponential increase in multi-core processor (CMP) cache sizes, accompanied by growing on-chip wire delays, makes it difficult to implement traditional caches with a single, uniform access latency. Non-Uniform Cache Architecture (NUCA) designs have been proposed to address this issue. A NUCA partitions the complete cache memory into multiple smaller banks and allows banks near the processor cores to have lower access latencies than those further away, thus reducing the effects of the cache's internal wire delays. Traditionally, NUCA organizations have been classified as static (S-NUCA) and dynamic (D-NUCA). While in S-NUCA a data block is mapped to a unique bank in the NUCA cache, D-NUCA allows a data block to be mapped to multiple banks. In D-NUCA designs, data blocks can migrate toward the processor core that accesses them most frequently. This migration of data blocks increases network traffic. The short lifetime of data blocks and low spatial locality in many applications result in the eviction of blocks with a few unused words. This effectively increases the miss rate and wastes on-chip network bandwidth. Unused word transfers also waste a large fraction of on-chip energy. In this paper, we present an efficient and implementable cache design that eliminates unnecessary coherence traffic and matches data movements to an application's spatial locality. It also presents one way to scale on-chip coherence with less cost-effective techniques such as shared caches augmented to track cached copies, explicit eviction notification and hierarchical design. Based on our scalability analysis of this cache design, we predict that it consistently reduces the miss rate and improves the fraction of transmitted data that is actually utilized by the application.
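The difference between static and dynamic NUCA mapping described in this abstract can be illustrated with a minimal sketch. The block size, bank count, bank-set size and function names below are assumptions chosen for illustration; they are not taken from the paper, and real designs may select banks differently.

```python
# Hypothetical parameters for illustration; none of these values come from the paper.
BLOCK_BYTES = 64      # assumed cache block size in bytes
NUM_BANKS = 16        # assumed number of NUCA banks
BANKS_PER_SET = 4     # assumed number of banks in one D-NUCA bank set


def snuca_bank(address: int) -> int:
    """S-NUCA: every block address maps to exactly one bank."""
    block_number = address // BLOCK_BYTES
    return block_number % NUM_BANKS


def dnuca_candidate_banks(address: int) -> list[int]:
    """D-NUCA: a block may reside in (and migrate among) any bank of its bank set."""
    block_number = address // BLOCK_BYTES
    num_sets = NUM_BANKS // BANKS_PER_SET
    bank_set = block_number % num_sets
    # In this sketch the banks of one set are interleaved across the array.
    return [bank_set + i * num_sets for i in range(BANKS_PER_SET)]
```

With these assumed values, `snuca_bank(64)` returns a single bank, while `dnuca_candidate_banks(64)` returns four candidate banks. Because a block can sit in any of them, D-NUCA needs a search step and generates migration traffic.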
An adaptive coherence protocol with adaptive cache for multi-core architectures
(IEEE, 2013) Chaturvedi, Nitin
Next-generation multicore processors and their applications will process massive amounts of data with significant sharing. Data movement between cores and the shared cache hierarchy, and its management, impacts memory access latency and consumes power. The efficiency of high-performance shared-memory multicore processors depends on the design of the on-chip cache hierarchy and the coherence protocol. Current multicore cache hierarchies use a fixed cache block size in the cache organization and in the design of the coherence protocols. The fixed block size is chosen to match the average spatial locality requirement across a range of applications, but it also wastes bandwidth because of unnecessary coherence traffic for shared data. The additional bandwidth has a direct impact on the overall energy consumption. In this paper, we present a new adaptable and implementable cache design, together with a novel cache coherence protocol design, that eliminates unnecessary coherence traffic and matches data movements to an application's spatial locality.
An adaptive migration–replication scheme (AMR) for shared cache in chip multiprocessors
(Springer, 2015-07) Chaturvedi, Nitin
Most of today’s chip multiprocessors implement last-level shared caches as non-uniform cache architectures. A major problem faced by such multicore architectures is cache line placement, especially in scenarios where multiple cores compete for line usage in the single non-uniform shared L2 cache. Block migration has been suggested to overcome the problem of optimum placement of cache blocks. Previous research, however, shows that an uncontrolled block migration scheme leads to scenarios where a cache line ‘ping-pongs’ between two requesting cores resulting in higher access latency for both the requestors and greater power dissipation. To address this problem, this paper first proposes a mechanism to dynamically profile data block usage from different cores on the chip. We then propose an adaptive migration–replication scheme for shared last-level non-uniform cache architectures that adapts between selectively replicating frequently used cache lines near the requesting cores and cache line migration towards the requesting core in case of fewer requests. AMR eliminates ‘ping-ponging’ of cache lines between the banks of the requesting cores. However, any mechanism that dynamically adapts between migration and replication at runtime is bound to have a complex search scheme to locate data blocks. To simplify the data lookup policy, this work also presents an efficient data access mechanism for non-uniform cache architectures. Our proposal relies on low overhead and highly accurate in-hardware pointers to keep track of the on-chip location of the cache block. We show that our proposed scheme reduces the completion time by on average 12.25, 8.1 and 3 % and energy consumption by 11.65, 8.5 and 2.1 % when compared to state-of-the-art last-level cache management schemes S-NUCA, D-NUCA and HK-NUCA, respectively. SPEC and PARSEC benchmarks were used to thoroughly evaluate our proposal.
Adaptive Zone-Aware Multi-bank on Chip last level L2 Cache Partitioning for Chip Multiprocessors
(IJCA, 2010) Chaturvedi, Nitin
This paper proposes a novel, efficient Non-Uniform Cache Architecture (NUCA) scheme for the Last-Level Cache (LLC) to reduce the average on-chip access latency and improve core isolation in Chip Multiprocessors (CMP). The architecture proposed is expected to improve upon the various NUCA schemes proposed so far, such as S-NUCA, D-NUCA and SP-NUCA [9][10][5], in terms of average access latency without a significant reduction in the hit rate. The complete set of L2 banks is divided into various zones. Each core belongs to one particular zone which is the closest to it. Consequently, adjacent cores are grouped into the same zone. Each zone individually follows the SP-NUCA scheme [5] for maintaining core isolation and sharing common blocks. However, blocks that need to be shared by cores which belong to different zones are replicated. This scheme is much more scalable than the SP-NUCA scheme and bounds the maximum on-chip access latency to a lower value as the number of cores increases.
A CMOS/MTJ Based Novel Non-volatile SRAM Cell with Asynchronous Write Termination for Normally OFF Applications
(Springer, 2019) Chaturvedi, Nitin
Non-volatile SRAM (NV-SRAM) enables normally-off computing while achieving faster power off/on time by storing the state in its locally embedded non-volatile elements. Emerging magnetic memory such as the spin transfer torque magnetic tunnel junction (STT-MTJ) is preferred in NV-SRAM design because of its attractive features such as unlimited endurance, high density, scalability and CMOS compatibility. However, the write operation in an MTJ is stochastic, which means the duration of an MTJ write is nondeterministic. As a result, it suffers from reliability issues such as write errors. Existing solutions to reduce the write error rate mainly consist of increasing the write pulse duration, which leads to high power consumption. However, if write completion could be detected on the fly and the write current cut off immediately, energy consumption could be reduced to a large extent. Therefore, this work proposes a novel non-volatile SRAM cell with an asynchronous write termination scheme. In the proposed NV-SRAM cell, the write operation is continuously monitored and terminated as soon as the MTJ is switched to the required state. Our analysis indicates that the proposed cell reduces write power by 23% when compared with the cell without write assist. Moreover, the proposed write termination circuit achieves 2.52–14% more power saving when compared to existing write termination circuits.
A comparative analysis of read/write assist techniques on performance & margin in 6T SRAM cell design
(IEEE, 2017) Chaturvedi, Nitin
With advances in technology, the shrinking of feature sizes into the nanometer regime has resulted in the scaling of operating voltages and dimensions. Reducing them can greatly boost energy efficiency, but it also leads to increased design challenges. To deal with the activity limitations imposed by the low overdrive voltage and the intrinsic read stability/write margin trade-off, large-scale SRAM arrays largely rely on assist techniques. These techniques address the problem of preserving the functionality of the 6T SRAM cell by improving the read and write margins of the cell. In this paper, we present a comprehensive analysis of the effectiveness of some assist methods. This paper presents the margin sensitivity analysis of assist techniques to assess the effectiveness of assist methods and to investigate their direct impact on the voltage-sensitive yield. In addition, the effects of temperature variation and process variation have also been analyzed.
Design and analysis of 6T SRAM cell with NBL write assist technique using FinFET
(IEEE, 2017) Chaturvedi, Nitin
Using FinFETs to design SRAM cells has shown many advantages over planar bulk devices, due to the additional control on the gates and the fully depleted behavior. Improvements have been noted in sub-threshold slope, drive currents, short-channel effects and mismatches. As memories become denser, the stability of the SRAM cells becomes a point of great concern. This calls for assist circuitry to improve the reliability and stability of the cells. In this work, a write assist technique is discussed to improve the stability of the device. This design decreases the WL CRIT drastically and reduces the write delay of the cell. The simulations have been carried out in HSPICE with 32 nm PTM libraries for FinFET.
Design and Analysis of Modified Strong Arm Latch Comparator with Reduced Kickback Noise
(Springer, 2024-10) Gupta, Anu; Shekhar, Chandra; Chaturvedi, Nitin
This research paper introduces three techniques to reduce kickback noise in the Strong Arm Latch Comparator (SAL). The first technique focuses on utilizing high clock power and generating two clocks with different duty cycles. While initially addressing the issue by applying a single clock to the kickback-reducing circuit, the reduction of kickback noise did not meet the desired level. To overcome this limitation, a new design is proposed, incorporating a delay in the programmability of the kickback-reducing circuit, which effectively eliminates the need for kickback and clock requirements. A comparative study is conducted, evaluating all the designs, including the proposed design, based on power, delay, and analysis of various types of noise. Results show that the proposed technique outperforms other kickback-reducing designs in terms of propagation latency, power consumption, and kickback currents. Additionally, the impact of a comparator’s common-mode voltage (Vcm) on its performance in TSMC 180 nm CMOS technology is demonstrated using the Cadence Schematic Editor tool.
Design and implementation of successive approximation register data converter
(AIP, 2024) Gupta, Anu; Chaturvedi, Nitin; Shekhar, Chandra
Analog-to-Digital Converters (ADCs) serve as crucial interfaces between the analog and digital domains, facilitating the transformation of analog signals into digital representations. Data processing in the digital domain presents distinct performance advantages over the analog domain in particular aspects. To facilitate the reverse conversion of processed digital signals back into the real-world signal domain, Charge Redistribution Digital-to-Analog Converters (DACs) are employed. DACs also play a pivotal role as significant components in specific ADC architectures, such as the Successive Approximation Register (SAR) Analog-to-Digital (A/D) Converter. Moreover, a Strong-Arm Latch Comparator has been utilized to compare the input analog voltage with the output voltage of the DAC. This paper primarily focuses on the implementation and thorough analysis of the SAR-ADC. The study includes calculating the precise range of analog voltages and the corresponding digital outputs. The maximum Differential Non-Linearity (DNL) error, offset error, and full-scale error for this specific SAR-ADC have been measured and found to be 0.28 LSB, 0.2 LSB, and 0.22 LSB, respectively. The results presented in this paper provide valuable insights into the performance and accuracy of the SAR-ADC, paving the way for further advancements and applications in the domain of A/D conversion.
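The successive-approximation principle mentioned in this abstract (a comparator checks the input against a DAC output, one bit per step) can be sketched behaviorally as follows. This is an idealized model that assumes a perfect DAC and comparator. The function name, the resolution argument and the unipolar reference are illustrative assumptions and do not describe the circuit in the paper.

```python
def sar_convert(v_in: float, v_ref: float, bits: int) -> int:
    """Ideal SAR conversion: binary search for the output code, MSB first."""
    code = 0
    for bit in reversed(range(bits)):
        trial = code | (1 << bit)              # tentatively set the current bit
        v_dac = v_ref * trial / (1 << bits)    # ideal DAC output for the trial code
        if v_in >= v_dac:                      # comparator decision
            code = trial                       # keep the bit; otherwise it stays 0
    return code
```

For example, with an assumed 1.0 V reference and 4-bit resolution, an input of 0.3 V resolves to code 4, since 4/16 V ≤ 0.3 V < 5/16 V. A real converter deviates from this ideal staircase, which is what error figures such as DNL, offset error and full-scale error quantify.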
Design of a Low Power 11T-1MTJ Non-Volatile SRAM Cell with Half-Select Free Operation
(IEEE, 2020) Chaturvedi, Nitin
Over the past few decades, CMOS scaling has been a key driving factor in achieving faster, cheaper and denser digital systems. However, as the technology scales down, there is an exponential increase in leakage current, which poses serious design challenges for low-power systems. SRAM, being the biggest on-chip component, suffers from large static power dissipation, which in turn significantly affects the overall performance of the system. In addition to large power consumption, the SRAM cell also suffers from the half-select disturbance issue, which severely degrades the reliability of the system. To address these challenges, we review and compare various existing SRAM cells in order to select the best SRAM cell design (TFC-9T), which offers the advantages of low power and half-select disturbance free operation. To further reduce the static power consumption, we propose to modify the selected TFC-9T SRAM cell using an emerging non-volatile magnetic tunnel junction (MTJ).
Design of a Low Power Approximate Adder based on Magnetic Tunnel Junction for Image Processing Applications
(IEEE, 2021) Chaturvedi, Nitin
With the growth of big data applications such as voice/speech recognition, data mining, and computer vision, conventional computing systems face significant challenges. The increasing computational complexity and large data sets result in large power consumption. To address this challenge, we propose to combine the benefits of approximate and in-memory computing, which effectively reduces power consumption without any significant impact on the output. In this work, a low-power approximate adder based on a nonvolatile memory element, the Magnetic Tunnel Junction (MTJ), is designed for a wide range of applications. Furthermore, the proposed approximate adder is demonstrated by performing edge detection on a 512x512 image using the Sobel Edge Detection Algorithm. The effect on image quality is also investigated using metrics such as mean square error (MSE), peak signal to noise ratio (PSNR), and structural similarity index (SSIM).
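Two of the image-quality metrics named in this abstract, MSE and PSNR, follow standard definitions and can be sketched in a few lines. The function names and the 8-bit peak value of 255 are assumptions for illustration, and images are represented here as plain nested lists of pixel values rather than in the paper's actual data format.

```python
import math


def mse(reference, test):
    """Mean square error between two equally sized grayscale images."""
    total = 0
    count = 0
    for ref_row, test_row in zip(reference, test):
        for ref_px, test_px in zip(ref_row, test_row):
            total += (ref_px - test_px) ** 2
            count += 1
    return total / count


def psnr(reference, test, peak=255):
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    error = mse(reference, test)
    if error == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / error)
```

A lower MSE and a higher PSNR indicate that the output produced with approximate arithmetic stays closer to the exact result.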
Design of a Programmable Delay Line with On-Chip Calibration to Achieve Immunity Against Process Variations
(Springer, 2022-12) Chaturvedi, Nitin
In recent times, CMOS Delay Lines (DL) are rapidly gaining interest due to increased demand for high precision delay in VLSI systems. Delay lines serve as a fundamental block for a wide range of applications including Delay Locked Loops (DLL), Phase Locked Loop (PLL), ring oscillators, clock synchronizers, etc. providing precise time delays. However, one of the major challenges faced by the CMOS delay line is the deviation in delays due to process, voltage and temperature (PVT) variations. Addressing this challenge, in this work, we aim to mitigate the impact of one of these variations on the delay line. Therefore, we propose to design a programmable delay line (DL) based on a voltage-controlled buffer which is insensitive to the process variation. To achieve immunity against process variations and obtain a high precision delay value, a novel calibration technique is proposed which dynamically tunes the biasing voltage of the buffer resulting in a constant delay under all process corners. Our simulation results for the proposed DL demonstrate a total delay of 559 psec with a delay error of less than 2%.
Design of a Robust Logic Gate using Magnetic Tunnel Junction
(IEEE, 2019) Chaturvedi, Nitin
In the era of big data, limited communication bandwidth poses a great challenge for the conventional von Neumann architecture. Moreover, significant data movement between memory and processor to handle ever-growing data sets further degrades system performance. The most efficient way to address this issue is to perform computation within the memory. This promising solution of integrating logic within the memory avoids expensive data transfers between memory and processor, thereby resulting in higher performance and energy efficiency. Therefore, in this paper, emerging non-volatile memory such as Magnetic Random Access Memory (MRAM) is explored as one of the most promising candidates for computing within the memory. It offers several additional advantages such as zero standby leakage power consumption and instant-on capability. This work presents the structure of Computational Random-Access Memory (CRAM) and the design of universal logic gates such as NAND and NOR. Next, to increase the reliability of these gates, a novel technique is proposed which significantly reduces the functional error probability.
Design of a STT-MTJ Based Random-Access Memory With In-situ Processing for Data-Intensive Applications
(IEEE, 2022-08) Chaturvedi, Nitin
During the last few years, deep learning techniques have been frequently applied in large-scale image processing, detection in a variety of computer vision and cognitive tasks, and information analysis applications. The execution of deep learning algorithms like CNN and FCNN requires high-dimensional matrix multiplication, which contributes to significant computational power. The frequent data movement between memory and core is one of the main reasons for considerable power consumption and latency, hence becoming a major performance bottleneck for conventional computing systems. To address this challenge, we propose an in-memory computing array that can perform computation directly within the memory, reducing the overhead associated with data movement. The proposed Random-Access Memory with in-situ Processing (RAMP) array reconfigures the emerging magnetic random-access memory to realize logic and arithmetic functions inside the memory. Furthermore, the array supports independent operations over multiple rows and columns, which helps in accelerating the execution of matrix operations. To validate the functionality and evaluate the performance of the proposed array, we perform extensive SPICE simulations. At 45 nm, the proposed array takes 5.39 ns, 0.68 ns, 0.68 ns, and 0.7 ns and consumes 2.2 pJ/bit, 0.21 pJ/bit, 0.23 pJ/bit, and 0.7 pJ/bit while performing memory write, memory read, logic, and arithmetic operations, respectively.
Design of a tunable delay line with on-chip calibration to generate process-invariant PWM signal for in-memory computing
(Springer, 2023-06) Shenoy, Meetha V.; Chaturvedi, Nitin
Recent compute-in-memory (CiM) architectures have been proposed as a promising solution to support Deep Neural Networks and Convolutional Neural Networks in solving large and complex tasks in various machine learning applications. The CiM architecture overcomes the limitation of the current von Neumann architecture by performing logic computations within the memory, also known as in-memory computing. In most CiM architectures, the in-memory logic operations are performed on the weights stored in memory using inputs that are applied through bitlines or wordlines as pulse width modulated (PWM) signals. For precise operation, the applied input signals must be stable. However, one of the main challenges faced during input signal generation is the deviation in the width values due to process, voltage, and temperature variations. Addressing this challenge, in this work, we aim to mitigate the impact of one of these variations on the generated PWM signals. We propose a tunable delay line that provides a linear PWM signal corresponding to an input vector, which is further utilized to perform local computation in memory. Further, to minimize the impact of process variations, we propose an autonomous on-chip calibration circuit that dynamically tunes the delay lines to obtain stable and process-invariant pulse width modulated signals. Our simulation results for the proposed DL demonstrate a total delay of 559 psec with a delay error of less than 2% under various process corners.
Design of an MTJ/CMOS-Based Asynchronous System for Ultra-Low Power Energy Autonomous Applications
(World Scientific, 2021) Chaturvedi, Nitin
Most of today’s IoT-based computing systems offer an opportunity to build smarter systems for application areas such as healthcare monitoring and wireless sensor nodes. Since these systems are energy limited and remain idle for most of the time, they suffer from large leakage power dissipation. Another problem faced by such computing systems is sporadic power failures when employed with energy harvesters where the system loses its current state and needs long reinitialization time. To address these problems, this work combines asynchronous design techniques with nonvolatility to achieve ultra-low power operation during active mode and data retention during power failure. This paper first presents a detailed analysis of different implementations of volatile c-element and compares their performance in terms of power and delay. Then one of the implementations is selected for nonvolatile design of a hybrid c-element using emerging spin transfer torque–magnetic tunnel junction (STT–MTJ) technology which allows energy-efficient data retention during idle mode/power-off mode and during sudden power failures. Using this hybrid c-element, we design a novel nonvolatile weak conditioned half-buffer. The extensive analysis of these designs with different design metrics is performed at the circuit level using Synopsys HSPICE circuit simulator.
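The C-element that this abstract builds on has a simple, well-known behavior: the output follows the inputs when they agree and holds its previous value when they differ. A minimal behavioral sketch of a volatile two-input C-element is given below. The class and method names are illustrative assumptions, and the model does not capture the nonvolatile STT-MTJ backup described in the paper.

```python
class CElement:
    """Behavioral model of a two-input Muller C-element."""

    def __init__(self, initial: int = 0):
        self.output = initial

    def update(self, a: int, b: int) -> int:
        # Output changes only when both inputs agree; otherwise the state is held.
        if a == b:
            self.output = a
        return self.output
```

Starting from an output of 0, the input sequence (1, 0), (1, 1), (0, 1), (0, 0) produces the outputs 0, 1, 1, 0. This state-holding property is what makes the C-element a basic building block of asynchronous pipelines.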
Design of In-Memory Computing Enabled SRAM Macro
(IEEE, 2022) Chaturvedi, Nitin
The era of nanoscale devices has resulted in tremendously fast and compact modern processing systems. The von Neumann architecture, comprising separate memory and processing units, is still one of the most widely adopted architectures in these computing systems. However, the growing computational requirements of emerging applications with large data sets pose a great challenge to these conventional computing systems due to constant data transfer between the two physically separate memory and computing blocks. The heavy data transportation between the processing core and memory results in large power consumption, especially for big-data applications. Addressing this challenge, we propose to bring processing closer to the memory. In this work, we design an In-Memory Computing enabled SRAM macro (IMC-SRAM) which is capable of performing logical computations within memory in addition to normal memory operations. We utilize a differential 9T bitcell and modified peripheral circuitry to realize Boolean logic operations such as AND/NAND and OR/NOR within the memory array. The proposed design has been validated using SPICE simulations with an operating frequency of 1 GHz across all process corners using NCSU 45 nm technology.
Design of non-volatile asynchronous circuit using CMOS-FDSOI/FinFET technologies
(IEEE, 2016) Chaturvedi, Nitin
This paper investigates the application of Spin-Transfer Torque Magnetic Tunnel Junctions (STT-MTJ) in nonvolatile memory (NVM) design. MTJs are favored in NVM design as they can provide indefinite data retention and very high read/write speeds. In this work, we present the design and analysis of a non-volatile, low-power Muller C-element with almost-zero leakage current and instantaneous back-up and wake-up times. The simulation results of the C-element based on a technology incorporating CMOS FD-SOI and Spin Transfer Torque MTJs are compared with those of a design in which FinFETs are utilized instead of the FD-SOI transistors. The two implementations are compared on the basis of idle power consumption, the energy required for read and write functionality, and output delay, in addition to a scalability analysis of both technologies.