# Low-Power High-Speed and Compact Ternary VLSI Circuit Designs using Carbon Nanotube Field Effect Transistors

#### **THESIS**

Submitted in partial fulfillment of the requirements for the degree of **DOCTOR OF PHILOSOPHY**

by SNEH LATA MUROTIYA 2010PHXF026P

Under the Supervision of **Prof. Anu Gupta**

BIRLA INSTITUTE OF TECHNOLOGY & SCIENCE, PILANI 2015

## BIRLA INSTITUTE OF TECHNOLOGY AND SCIENCE, PILANI

## **CERTIFICATE**

This is to certify that the thesis entitled "Low-Power High-Speed and Compact Ternary VLSI Circuit Designs using Carbon Nanotube Field Effect Transistors" and submitted by Sneh Lata Murotiya, ID No 2010PHXF026P for award of Ph.D. of the Institute embodies original work done by her under my supervision.

(Signature of the Supervisor)

**Dr. Anu Gupta**, Associate Professor, Birla Institute of Technology & Science, Pilani, Pilani - 333031 (Rajasthan) INDIA

Date:

# Dedicated To My Beloved and Inspiring Parents Shrimati Manju Devi And Shri Suresh Kumar Murotiya

## **ACKNOWLEDGEMENTS**

Foremost, I would like to express my humble gratitude and sincere thanks to my supervisor Prof. Anu Gupta for her valuable guidance, encouragement, suggestions, and moral support throughout the period of this research work. It has been a privilege for me to work and learn under her.

Next, I would like to express my deep sense of gratitude to BITS Pilani for providing all the necessary facilities and support to complete the research work. My special thanks to Prof. V. Sambasiva Rao, Acting Vice Chancellor, BITS Pilani, and Prof. Ashoke Kumar Sarkar, Director, BITS Pilani, Pilani campus, for giving me an opportunity to pursue my research successfully. I am thankful to Prof. Sanjay Kumar Verma, Dean, Academic Research (Ph.D. Programme), BITS Pilani, and Dr. Hemant R. Jadhav, Professor-in-Charge, Academic Research (Ph.D. Programme), BITS Pilani, Pilani Campus, for providing all necessary guidelines and extending full support, which were crucial for the completion of this thesis.
I am thankful to the members of the Doctoral Advisory Committee, Prof. Navneet Gupta and Prof. Abhijit Rameshwar Asati, for their critical comments, which helped me improve the quality of the manuscript. Thanks are due to Prof. Vinod Kumar Chaubey, Prof. Surekha Bhanot, Mr. Pawan Sharma, Ms. Vinita Tiwari, Mr. Nitin Chaturvedi and other faculty colleagues of the Dept. of Electrical & Electronics Engineering for their constant motivation and encouragement.

Lastly, in everyday life, the portion of the Earth exposed to the Sun is said to have day, whereas the remaining portion is said to have night. Similarly, behind all accomplishments there are equal sacrifices. I am thankful to Almighty Gods for giving their divine blessings to my husband Mr. Deepak Kabra and our daughter Ms. Vaani Kabra. I deeply value the cooperation and support received from my mother-in-law Sumitra Devi, father-in-law Shri Babu Lal Kabra, sisters Mrs. Suman Ladda, Mrs. Saroj Kaliya, Ms. Sangeeta Murotiya and Ms. Soniya Murotiya, brother Mr. Kamlesh Murotiya, Mrs. Vanita Murotiya and other family members.

Sneh Lata Murotiya

#### **ABSTRACT**

The carbon nanotube field effect transistor (CNTFET) shows great promise as an extension to the silicon MOSFET for building high-performance, low-power VLSI circuits. Three-valued (ternary) logic is a promising alternative to traditional binary logic for accomplishing simplicity and energy efficiency in modern digital design. Ternary logic has an elegant association with the CNTFET because the best way to design ternary circuits is the multiple-threshold method, and the desired threshold voltage can be easily achieved by using CNTs of different diameters in the CNTFET device. This thesis develops designs of a ternary arithmetic and logic unit (TALU) and content addressable memory cells using CNTFETs. First, a 2-bit hardware optimized ternary ALU (HO-TALU) is presented. The 2-bit HO-TALU minimizes the required hardware at both the architecture level and the circuit level.
At the architecture level, HO-TALU has a new adder-subtractor (AS) module which performs both addition and subtraction operations using only an adder module with the help of multiplexers. Thus, it eliminates the subtractor module from the conventional architecture. At the circuit level, HO-TALU minimizes ternary function expressions and utilizes binary gates along with ternary gates in the realization of the functional modules: AS, multiplier, comparator and exclusive-OR. The AS module has a minor loss in power-delay product (PDP), but the multiplier, comparator and exclusive-OR modules show improved PDP. As a consequence, HO-TALU achieves a significant reduction in device count, with a marginal increase in PDP for addition and subtraction operations only, in comparison with CNTFET-based ternary designs available in the literature. The design of the 2-bit HO-TALU is modified to develop a 2-bit HO-TALU slice which can be easily cascaded to construct an N-bit HO-TALU. The ternary full adder (TFA), which is a basic sub-block of the AS module, is modified using different circuit techniques to improve its efficiency in terms of PDP. Three new designs of TFA are presented. The first TFA design, named high speed TFA (HS-TFA), uses symmetric pull-up and pull-down networks along with a resistive voltage divider as its integral part, which is configured using transistors. Compared to a recently developed TFA available in the literature, HS-TFA achieves improved speed but higher power dissipation. In order to reduce power consumption, a second TFA, named low power TFA (LP-TFA), is proposed. LP-TFA makes use of the complementary pass transistor logic style and achieves low power consumption with a marginal decrease in PDP. To improve PDP further, a third TFA is implemented in dynamic logic. This TFA is named dynamic TFA (DTFA) and uses a keeper designed for ternary values in order to alleviate the charge sharing problem.
The realization of all three TFAs takes advantage of the inherent binary nature (0 and 1) of the input carry, leading to simplicity in the designs. Next, a new design of the comparator module of the 2-bit HO-TALU is presented. First, a 1-bit comparator is developed using pass transistor logic with a reduced number of stages in the critical delay path. Then, the 1-bit design is utilized to create 2-bit and N-bit comparators, where a static binary tree configuration is used to correct the voltage levels. The proposed 2-bit comparator achieves better PDP than available counterparts. This comparator, HS-TFA and DTFA have high driving capability. Moreover, all new TFAs and the 2-bit comparator are less sensitive to voltage and temperature variations than existing designs. Next, the design of a 2-bit power optimized ternary ALU (PO-TALU) using CNTFETs is presented. The 2-bit PO-TALU functional modules, adder-subtractor-exclusive-OR (ASE) and multiplier, are designed using a new complementary CNTFET-based binary computational unit and a low-complexity encoder. ASE eliminates the exclusive-OR and subtractor modules from the conventional architecture. The multiplier uses a new efficient carry-add (CA) block in place of a ternary half adder. As a result, the PO-TALU design achieves significant improvements in power and power-delay product, along with device count, compared to existing designs. The design of a 2-bit PO-TALU slice is shown so that a parallel N-bit PO-TALU can be constructed with N/2 slices connected in cascade. Further, the growth of bandwidth-hungry real-time applications such as the internet has raised a demand for high speed content addressable memory (CAM) circuits to perform table lookup tasks. Binary CAM (BCAM) and ternary CAM (TCAM) cells based on low-capacitance search logic are presented in CNTFET technology. A new three-valued CAM (3CAM) cell is also presented.
This cell uses CNTFETs with two different threshold voltages in the implementation of a low-capacitance search network, which leads to a fast and compact CAM design compared with the CNTFET-based 3CAM cell recently published in the literature.

# TABLE OF CONTENTS

| Chapter | Title |
|---|---|
| | CERTIFICATE |
| | ACKNOWLEDGEMENTS |
| | ABSTRACT |
| | TABLE OF CONTENTS |
| | LIST OF TABLES |
| | LIST OF FIGURES |
| | LIST OF ABBREVIATIONS |
| 1. | Introduction |
| | 1.1. Background |
| | 1.2. Motivation and Objectives |
| | 1.3. Thesis Outline |
| 2. | Literature Review |
| | 2.1. Introduction |
| | 2.2. Carbon Nanotube Field Effect Transistor (CNTFET) |
| | 2.3. Three-valued (Ternary) Arithmetic and Logic Circuits |
| | 2.3.1. Ternary Circuit based on MOSFET |
| | 2.3.2. Ternary Circuit based on CNTFET |
| | 2.4. Content Addressable Memory (CAM) Cell |
| | 2.4.1. CAM Cell based on MOSFET |
| | 2.4.2. CAM Cell based on CNTFET |
| | 2.5. Research Gaps and Scope of the Presented Work |
| 3. | Design of 2-bit Hardware Optimized Ternary ALU (HO-TALU) using CNTFETs |
| | 3.1. Introduction |
| | 3.2. Design of Ternary Logic Gates |
| | 3.3. Architecture & Functions of 2-bit HO-TALU |
| | 3.4. Synthesis, Minimization and Realization of 2-bit HO-TALU Function |
| | 3.5. Design & Implementation of 2-bit HO-TALU Functional Module |
| | 3.5.1. Adder-Subtractor (AS) Module |
| | 3.5.2. Comparator Module |
| | 3.5.3. Exclusive-OR Module |
| | 3.5.4. Multiplier Module |
| | 3.5.5. T-OR/T-NOR/T-AND/T-NAND Module |
| | 3.6. 2-bit HO-TALU Slice for N-bit HO-TALU |
| | 3.7. Results and Discussion |
| | 3.7.1. Functional Verification of 2-bit HO-TALU |
| | 3.7.2. Hardware Efficiency Evaluation of 2-bit HO-TALU |
| | 3.7.3. Performance Evaluation of 2-bit HO-TALU |
| | 3.8. Conclusion |
| 4. | Performance Boosted Designs of Sub-Blocks of 2-bit Hardware Optimized Ternary ALU (HO-TALU) using CNTFETs |
| | 4.1. Introduction |
| | 4.2. Designs of Ternary Full Adder (TFA) |
| | 4.2.1. High Speed TFA (HS-TFA) |
| | 4.2.2. Low Power TFA (LP-TFA) |
| | 4.2.3. Dynamic TFA (DTFA) |
| | 4.2.4. Results and Discussion |
| | 4.3. Design of Comparator Module |
| | 4.3.1. 1-bit Comparator |
| | 4.3.2. Design of N-bit Comparator |
| | 4.3.3. Results and Discussion |
| | 4.4. Conclusion |
| 5. | Design of 2-bit Power Optimized Ternary ALU (PO-TALU) using CNTFETs |
| | 5.1. Introduction |
| | 5.2. Architecture & Functions of 2-bit PO-TALU |
| | 5.3. Minimization and Realization of 2-bit PO-TALU Functions |
| | 5.4. Design & Implementation of 2-bit PO-TALU Functional Module |
| | 5.4.1. Adder-Subtractor-Exclusive-OR (ASE) Module |
| | 5.4.2. Multiplier Module |
| | 5.4.3. Comparator Module |
| | 5.5. Implementation of 2-bit PO-TALU Slice |
| | 5.6. Results and Discussion |
| | 5.6.1. Functional Verification of 2-bit PO-TALU |
| | 5.6.2. Performance Evaluation of 2-bit PO-TALU |
| | 5.7. Conclusion |
| 6. | Design of High Speed Content Addressable Memory (CAM) Cells using CNTFETs |
| | 6.1. Introduction |
| | 6.2. Design of CAM Cells |
| | 6.2.1. Binary CAM (BCAM) Cell |
| | 6.2.2. Ternary CAM (TCAM) Cell |
| | 6.2.3. Three-Valued CAM (3CAM) Cell |
| | 6.3. Results and Discussion |
| | 6.4. Conclusion |
| 7. | Conclusion and Future Work |
| | 7.1. Conclusion |
| | 7.2. Future Scope of Work |
| | REFERENCES |
| | APPENDIX I |
| | APPENDIX II |
| | LIST OF PUBLICATION |
| | BRIEF BIOGRAPHY OF THE CANDIDATE |
| | BRIEF BIOGRAPHY OF THE SUPERVISOR |

# **LIST OF TABLES**

| Table | Title |
|---|---|
| Table 2.1 | Technology parameters for CNTFET [117] |
| Table 3.1 | Definition of logic states in ternary logic |
| Table 3.2 | Truth table of ternary inverters |
| Table 3.3 | Truth table of ternary NAND and NOR gates |
| Table 3.4 | Function table of HO-TALU |
| Table 3.5 | Truth table of ternary half adder (THA) |
| Table 3.6 | Truth table of ternary half subtractor (THS) |
| Table 3.7 | Truth table of ternary full adder (TFA) and full subtractor (TFS) |
| Table 3.8 | Truth table of 2-bit ternary comparator |
| Table 3.9 | Truth table of 1-bit ternary XOR |
| Table 3.10 | Truth table of 1-bit ternary multiplier |
| Table 3.11 | Comparison of ternary circuits based on device count |
| Table 3.12 | Simulation results of ternary circuits |
| Table 4.1 | Truth table of ternary full adder (TFA) |
| Table 4.2 | Switching activity for Sum and Carry generator of high speed ternary full adder (HS-TFA) |
| Table 4.3 | Truth table of 1-to-6 ternary decoder |
| Table 4.4 | Simulation results of CNTFET-based ternary full adder (TFA) designs |
| Table 4.5 | Decoding of outputs for comparison response |
| Table 4.6 | Truth table of 1-bit comparator |
| Table 4.7 | Simulation results of 2-bit comparator circuits |
| Table 5.1 | Function table of 2-bit PO-TALU |
| Table 5.2 | Truth table of ternary encoder |
| Table 5.3 | Function select table for ASE module |
| Table 5.4 | Addition rules for ternary half adder (THA) |
| Table 5.5 | Subtraction rules for ternary half subtractor (THS) |
| Table 5.6 | Addition rules for ternary full adder (TFA) |
| Table 5.7 | Subtraction rules for ternary full subtractor (TFS) |
| Table 5.8 | Truth table of carry add (CA) |
| Table 5.9 | Design rules for ternary 1-bit multiplication |
| Table 5.10 | Decoding of outputs for comparison response |
| Table 5.11 | Simulation results of CNTFET-based adder circuits |
| Table 5.12 | Simulation results of CNTFET-based multiplier circuits |
| Table 6.1 | Ternary encoding for 16T TCAM cell |

# **LIST OF FIGURES**

| Figure | Caption |
|---|---|
| Figure 1.1 | Evolution of MOSFET gate length (filled blue circles and open blue circles for ITRS targets) and integration complexity of microprocessor chip (red stars), as a function of time [7] |
| Figure 1.2 | Schematic view of Si based NWFET [30] |
| Figure 1.3 | Schematic view of n-type III-V compound semiconductor FET [33] |
| Figure 1.4 | Schematic view of graphene nanoribbon transistor [22] |
| Figure 1.5 | Schematic view of CNTFET [39] |
| Figure 1.6 | Advantages of ternary logic circuits [61] |
| Figure 2.1 | Unrolled sheet of graphite and the rolled lattice structure of CNT [86] |
| Figure 2.2 | Carbon nanotube field effect transistor (CNTFET) (a) schematic view of a CNTFET device, (b) SB-CNTFET, (c) M-CNTFET, (d) T-CNTFET |
| Figure 3.1 | Negative ternary inverter (NTI) [199] |
| Figure 3.2 | Positive ternary inverter (PTI) [199] |
| Figure 3.3 | Standard ternary inverter (STI) [199] |
| Figure 3.4 | Standard ternary NAND (STNAND) gate [199] |
| Figure 3.5 | Standard ternary NOR (STNOR) gate [199] |
| Figure 3.6 | (a) Pin out diagram of 2-bit HO-TALU |
| Figure 3.6 | (b) Architecture of 2-bit HO-TALU |
| Figure 3.7 | 1-to-3-line ternary decoder (a) logic level diagram (b) truth table |
| Figure 3.8 | Logic level diagram of function selection logic block with active high outputs (FSB-AHO) |
| Figure 3.9 | Logic level diagram of transmission gate block with active high enable (TGB-AHE) |
| Figure 3.10 | Ternary function implementation for 2-bit HO-TALU |
| Figure 3.11 | K-map of ternary half adder (THA) |
| Figure 3.12 | Logic level diagram of ternary half adder (THA) |
| Figure 3.13 | Logic level diagram of 2-bit ternary adder-subtractor (AS) |
| Figure 3.14 | Logic level diagram of ternary half adder-subtractor (HAS) |
| Figure 3.15 | K-map of ternary half subtractor (THS) |
| Figure 3.16 | (a) Logic diagram of $S_1/D_1$ generator of ternary full adder-subtractor (FAS) |
| Figure 3.16 | (b) Logic diagram of $C_1/B_1$ generator of ternary full adder-subtractor (FAS) |
| Figure 3.17 | K-map for ternary full adder (TFA) |
| Figure 3.18 | K-map for ternary full subtractor (TFS) |
| Figure 3.19 | (a) Ternary K-map for EQ of comparator module |
| Figure 3.19 | (b) Ternary K-map for LE of comparator module |
| Figure 3.20 | (a) Logic level diagram for EQ generator of comparator module |
| Figure 3.20 | (b) Logic level diagram for LE and GR generator of comparator module |
| Figure 3.21 | Block diagram of the exclusive-OR module |
| Figure 3.22 | Logic level diagram of THA_SUM block of exclusive-OR module |
| Figure 3.23 | Block diagram of multiplier module |
| Figure 3.24 | K-map of 1-bit ternary multiplier |
| Figure 3.25 | Logic level diagram of 1-bit ternary multiplier |
| Figure 3.26 | Logic level diagram of (a) T-NAND (b) T-AND (c) T-NOR (d) T-OR modules |
| Figure 3.27 | Block diagram of 2-bit HO-TALU slice |
| Figure 3.28 | Block diagram for modified adder-subtractor (MAS) |
| Figure 3.29 | Cascaded configuration of modified adder-subtractor (MAS) for N-bit HO-TALU |
| Figure 3.30 | Logic diagram for modified comparator (MCOMP) |
| Figure 3.31 | Cascaded configuration of modified comparator (MCOMP) for N-bit HO-TALU |
| Figure 3.32 | Transient waveform of full adder-subtractor (FAS) |
| Figure 3.33 | Comparison of ternary circuits based on device count |
| Figure 3.34 | Comparison of HO-TALU sub modules based on PDP |
| Figure 4.1 | (a) Pin diagram of high speed ternary full adder (HS-TFA) |
| Figure 4.1 | (b) Block diagram of high speed ternary full adder (HS-TFA) |
| Figure 4.2 | Logic level diagram of 1-to-6-line ternary decoder |
| Figure 4.3 | (a) Schematic diagram for Sum generator of high speed ternary full adder (HS-TFA) |
| Figure 4.3 | (b) Schematic diagram for Carry generator of high speed ternary full adder (HS-TFA) |
| Figure 4.4 | Schematic diagram of ternary buffer (TB) |
| Figure 4.5 | (a) Pin diagram of low power ternary full adder (LP-TFA) |
| Figure 4.5 | (b) Block diagram of low power ternary full adder (LP-TFA) |
| Figure 4.6 | K-map of low power ternary full adder (LP-TFA) for (a) Sum (b) Carry |
| Figure 4.7 | Schematic diagram of a 1-to-5-line ternary decoder |
| Figure 4.8 | (a) Schematic diagram of Sum generator of low power ternary full adder (LP-TFA) |
| Figure 4.8 | (b) Schematic diagram of Carry generator of low power ternary full adder (LP-TFA) |
| Figure 4.9 | (a) Pin diagram of dynamic ternary full adder (DTFA) |
| Figure 4.9 | (b) Block diagram of dynamic ternary full adder (DTFA) |
| Figure 4.10 | (a) Schematic diagram of Sum generator of dynamic ternary full adder (DTFA) |
| Figure 4.10 | (b) Schematic diagram of Carry generator of dynamic ternary full adder (DTFA) |
| Figure 4.11 | Schematic diagram of 1-to-4-line ternary decoder |
| Figure 4.12 | Transient waveform of high speed ternary full adder (HS-TFA) |
| Figure 4.13 | (a) Delay versus output load capacitor plot for five ternary full adder (TFA) designs |
| Figure 4.13 | (b) Power consumption versus output load capacitor plot for five ternary full adder (TFA) designs |
| Figure 4.13 | (c) Power-delay product (PDP) versus output load capacitor plot for five ternary full adder (TFA) designs |
| Figure 4.14 | Power consumption versus operating frequency plot for five ternary full adder (TFA) designs |
| Figure 4.15 | Power-delay product (PDP) versus supply voltage plot for five ternary full adder (TFA) designs |
| Figure 4.16 | Power-delay product (PDP) versus temperature plot for five ternary full adder (TFA) designs |
| Figure 4.17 | 1-bit comparator (a) pin diagram (b) block diagram |
| Figure 4.18 | K-map for 1-bit comparator |
| Figure 4.19 | Schematic diagram of 1-bit comparator |
| Figure 4.20 | 2-bit comparator (a) pin diagram (b) block diagram |
| Figure 4.21 | 4-bit comparator (a) pin diagram (b) block diagram |
| Figure 4.22 | (a) Schematic diagram of binary grouping block |
| Figure 4.22 | (b) Schematic diagram of inverted binary grouping block |
| Figure 4.23 | Transient waveform of 2-bit comparator |
| Figure 4.24 | (a) Delay versus output load capacitor plot for 2-bit comparator circuits |
| Figure 4.24 | (b) Power consumption versus output load capacitor plot for 2-bit comparator circuits |
| Figure 4.24 | (c) Power-delay product (PDP) versus output load capacitor plot for 2-bit comparator circuits |
| Figure 4.25 | Power consumption versus operating frequency plot for 2-bit comparator circuits |
| Figure 4.26 | Power-delay product (PDP) versus supply voltage plot for 2-bit comparator circuits |
| Figure 4.27 | Power-delay product (PDP) versus temperature plot for 2-bit comparator circuits |
| Figure 5.1 | (a) Pin diagram of 2-bit PO-TALU |
| Figure 5.1 | (b) Architecture of 2-bit PO-TALU |
| Figure 5.2 | Logic level diagram of 1-to-6-line ternary decoder |
| Figure 5.3 | Logic level diagram of function select logic block with active low outputs (FSB-ALO) |
| Figure 5.4 | Logic level diagram of transmission gate block with active low enable (TGB-ALE) |
| Figure 5.5 | Ternary function implementation for 2-bit PO-TALU |
| Figure 5.6 | Design of ternary encoder |
| Figure 5.7 | Block diagram of adder-subtractor-exclusive-OR (ASE) module |
| Figure 5.8 | (a) Schematic diagram of $S_0/D_0/E_0$ generator of half adder-subtractor-exclusive-OR (HASE) |
| Figure 5.8 | (b) Schematic diagram of $C_0/B_0$ generator of half adder-subtractor-exclusive-OR (HASE) |
| Figure 5.9 | K-maps of $X_{1S}$, $X_{2S}$, $X_{1C}$ and $X_{2C}$ for ternary half adder (THA) |
| Figure 5.10 | K-maps of $X_{1D}$, $X_{2D}$, $X_{1B}$ and $X_{2B}$ for ternary half subtractor (THS) |
| Figure 5.11 | (a) Schematic diagram of $S_1/D_1/E_1$ generator of full adder-subtractor-exclusive-OR (FASE) |
| Figure 5.11 | (b) Schematic diagram of $C_1/B_1$ generator of full adder-subtractor-exclusive-OR (FASE) |
| Figure 5.12 | (a) K-maps of $X_{3S}$ and $X_{4S}$ for ternary full adder (TFA) |
| Figure 5.12 | (b) K-maps of $X_{3C}$ and $X_{4C}$ for ternary full adder (TFA) |
| Figure 5.13 | (a) K-maps of $X_{3D}$ and $X_{4D}$ for ternary full subtractor (TFS) |
| Figure 5.13 | (b) K-maps of $X_{3B}$ and $X_{4B}$ for ternary full subtractor (TFS) |
| Figure 5.14 | Block diagram of multiplier functional module |
| Figure 5.15 | Logic level diagram of carry add (CA) |
| Figure 5.16 | K-maps of $X_{1CA}$ and $X_{2CA}$ for carry add (CA) |
| Figure 5.17 | (a) Schematic diagram for $P_0$ (Product) generator of ternary 1-bit multiplier |
| Figure 5.17 | (b) Schematic diagram for $C_0$ (Carry) generator of ternary 1-bit multiplier |
| Figure 5.18 | K-maps of $X_{1P0}$, $X_{2P0}$, $X_{1C0}$ and $X_{2C0}$ for ternary 1-bit multiplier |
| Figure 5.19 | Pin diagram of 2-bit PO-TALU Slice |
| Figure 5.20 | Block diagram for modified adder-subtractor-exclusive-OR (MASE) |
| Figure 5.21 | Cascaded configuration for modified adder-subtractor-exclusive-OR (MASE) of N-bit PO-TALU |
| Figure 5.22 | Transient waveform of full adder-subtractor-exclusive-OR (FASE) |
| Figure 5.23 | (a) Delay versus output load capacitor plot for six ternary full adder (TFA) designs |
| Figure 5.23 | (b) Power consumption versus output load capacitor plot for six ternary full adder (TFA) designs |
| Figure 5.23 | (c) Power-delay product (PDP) versus output load capacitor plot for six ternary full adder (TFA) designs |
| Figure 5.24 | Power consumption versus operating frequency plot for six ternary full adder (TFA) designs |
| Figure 5.25 | Power-delay product (PDP) versus supply voltage plot for six ternary full adder (TFA) designs |
| Figure 5.26 | Power-delay product (PDP) versus temperature plot for six ternary full adder (TFA) designs |
| Figure 6.1 | Schematic diagram of 9T binary CAM (BCAM) cell |
| Figure 6.2 | Schematic diagram of 16T ternary CAM (TCAM) cell |
| Figure 6.3 | Schematic diagram of 11T three-valued CAM (3CAM) cell |
| Figure 6.4 | Schematic diagram for current race sensing scheme [232] |
| Figure 6.5 | Transient waveform of 11T three-valued CAM (3CAM) cell |

### **LIST OF ABBREVIATIONS**

- 3CAM: Three-valued Content Addressable Memory
- AHAL: Active High/Active Low
- ALU: Arithmetic and
Logic Unit
- AS: Adder-Subtractor
- ASE: Adder-Subtractor-Exclusive-OR
- BCAM: Binary Content Addressable Memory
- BTA: Balanced Ternary Adder
- CA: Carry-Add
- CAM: Content Addressable Memory
- CMOS: Complementary Metal Oxide Semiconductor
- CNT: Carbon Nanotube
- CPL: Complementary Pass-transistor Logic
- CPU: Central Processing Unit
- CNTFET: Carbon Nanotube Field Effect Transistor
- DECMOS: Depletion Enhancement Complementary Metal Oxide Semiconductor
- DRAM: Dynamic Random Access Memory
- DTFA: Dynamic Ternary Full Adder
- FA: Full Adder
- FAS: Full Adder-Subtractor
- FASE: Full Adder-Subtractor-Exclusive-OR
- FSB-AHO: Function Selection logic Block with Active High Outputs
- FSB-ALO: Function Select logic Block with Active Low Outputs
- HA: Half Adder
- HAS: Half Adder-Subtractor
- HASE: Half Adder-Subtractor-Exclusive-OR
- HO-TALU: Hardware Optimized Ternary ALU
- HS-TFA: High Speed Ternary Full Adder
- IC: Integrated Circuit
- LP-TFA: Low Power Ternary Full Adder
- LSB: Least Significant Bit
- MAS: Modified Adder-Subtractor
- MASE: Modified Adder-Subtractor-Exclusive-OR
- M-CNTFET: MOSFET-like CNTFET
- MCOMP: Modified Comparator
- MSB: Most Significant Bit
- MVL: Multiple-Valued Logic
- NTI: Negative Ternary Inverter
- NTNAND: Negative Ternary NAND
- NTNOR: Negative Ternary NOR
- NWFET: Nanowire Field Effect Transistor
- Pd: Palladium
- PDN: Pull-Down Network
- PDP: Power-Delay Product
- PO-TALU: Power Optimized Ternary ALU
- PTI: Positive Ternary Inverter
- PTNAND: Positive Ternary NAND
- PTNOR: Positive Ternary NOR
- PU: Pull-Up
- PUN: Pull-Up Network
- QAT: Quasi-Adiabatic Ternary
- QoS: Quality of Service
- RBSD: Redundant Binary Signed Digit
- RSFG: Recharged CMOS Semi Floating Gate
- SB: Schottky Barrier
- SB-CNTFET: Schottky Barrier CNTFET
- Si: Silicon
- SNM: Static Noise Margin
- SRAM: Static Random Access Memory
- STDL: Simple Ternary Differential Logic
- STI: Standard Ternary Inverter
- STNAND: Standard Ternary NAND
- STNOR: Standard Ternary NOR
- SUS-LOC: Supplementary Symmetrical Logic Circuit Structure
- SWCNT: Single Walled Carbon Nanotube
- TALU: Ternary ALU
- TAND: Ternary AND
- TB: Ternary Buffer
- TC: Two's complement
- TCAM: Ternary Content Addressable Memory
- T-CNTFET: Tunneling CNTFET
- TDDL: Ternary Dynamic Differential Logic
- TFA/S: Ternary Full Adder/Subtractor
- THA/S: Ternary Half Adder/Subtractor
- TGB-AHE: Transmission Gate Block with Active High Enable
- TGB-ALE: Transmission Gate Block with Active Low Enable
- TG: Transmission Gate
- TOR: Ternary OR

### 1.1 Background

Since the introduction of integrated circuits (ICs) in 1952 [1] and the realization of the first IC at Texas Instruments in 1958 [2], the last five decades have witnessed phenomenal growth of the silicon (Si) based microelectronics industry. Rapid advancements in this industry have been achieved mainly through continuous scaling, or miniaturization, of all electronic components (passive and active) integrated on ICs. IC miniaturization techniques sustained the scaling of complementary metal oxide semiconductor (CMOS) devices and of the metallic interconnects that are used to connect device terminals with the power supply voltage [3]. Miniaturization in IC technology reduces testing requirements at the system level, achieves significant cost savings and faster switching, and leads to compact, low-power and highly reliable designs. As a consequence, it provides faster and improved ICs for high definition digital television, digital receivers, DSP, high speed microprocessors, communication, business transactions, traffic control, space guidance, medical treatment, weather monitoring, the internet, and many other commercial, industrial, and scientific enterprises [4]. Further, according to Moore's law, the number of transistors that can be manufactured on a single chip is expected to grow exponentially with time [5]. This prediction turned out to be true, as illustrated in Figure 1.1 [6-7]. Figure 1.1 plots the growth in the number of transistors integrated on a single microprocessor chip as a function of time. As can be observed, integration density doubles every 18 months.
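The doubling trend stated above corresponds to $N(t) = N_0 \cdot 2^{t/T}$ with $T = 1.5$ years. The following Python sketch only illustrates that arithmetic. The function names and the idealized, constant doubling period are assumptions made for illustration and are not taken from the thesis or from the data behind Figure 1.1.

```python
import math


def density_growth_factor(years: float, doubling_period_years: float = 1.5) -> float:
    """Relative integration density after `years`.

    Assumes an idealized trend with one doubling per `doubling_period_years`
    (18 months by default), i.e. N(t) / N0 = 2 ** (t / T).
    """
    if doubling_period_years <= 0:
        raise ValueError("doubling period must be positive")
    return 2.0 ** (years / doubling_period_years)


def years_to_reach(factor: float, doubling_period_years: float = 1.5) -> float:
    """Years needed for the density to grow by `factor` under the same assumption."""
    if factor <= 0:
        raise ValueError("growth factor must be positive")
    return doubling_period_years * math.log2(factor)


if __name__ == "__main__":
    # Print the idealized growth factor for a few time spans.
    for span in (1.5, 3.0, 6.0, 15.0):
        print(f"{span:5.1f} years -> x{density_growth_factor(span):.0f}")
```

Under this idealized assumption, ten doublings (a factor of 1024) take 15 years. Real integration data only follow the trend approximately.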
To meet the IC density predicted by Moore's law, technology scaling has been pursued aggressively since the 1970s. The gate length of a metal oxide semiconductor field effect transistor (MOSFET) is scaled down by a factor of 0.7 every 2 years, as shown in Figure 1.1. Since 2006, at the 65 nm technology node, the gate length of the MOSFET has been in the deep sub-micron/nanometer range. Today, the technology node is 20 nm, and a 14 nm feature size is expected in the near future [7-8].

**Figure 1.1:** Evolution of MOSFET gate length (filled blue circles and open blue circles for ITRS targets) and integration complexity of microprocessor chip (red stars), as a function of time [7].

Further, scaling down the gate length of CMOS technology into the nanometer range results in various critical challenges and reliability issues. One issue is increased leakage current, which arises from several quantum mechanical tunneling mechanisms, including band-to-band tunneling, direct gate oxide tunneling, and source-to-drain tunneling [9]. Other issues are large process variations, the effects of crystal misalignments, the randomness of discrete doping, and increased interface scattering as the mean free path of electrons becomes comparable to component dimensions [8-9]. These device-level effects cause the current-voltage (I-V) characteristics to differ substantially from those of a well-tempered MOSFET. As a result, researchers have major concerns about further improving device performance by scaling down the feature size of the MOSFET. Besides, circuit-level effects such as short channel effects, increased resistance of metallic on-chip interconnects, and power dissipation will surely reduce the suitability of the MOSFET for advanced applications in time to come [10-13]. Researchers developed double-gate MOSFETs and FinFET/tri-gate devices [14-15] to reduce short channel effects.
In these devices, the gate is placed on two or three sides of the channel, which results in better control of the channel and a considerable reduction in drain-to-source sub-threshold leakage current. Researchers have also begun exploring new devices and channel materials for the sub-10 nm technology node, which could be possible alternatives to Si-CMOS. Based on the ITRS [16], some of the emerging devices that have the capability to replace Si technology in the post-Si era are the nanowire field effect transistor (NWFET) [17], the III-V compound semiconductor field effect transistor [18-21], the graphene field effect transistor [22-24], and the carbon nanotube field effect transistor (CNTFET) [25-27].

The NWFET uses a semiconducting nanowire with a diameter of around 0.5 nm as the channel material. This nanowire can be made from Si, germanium, III-V, In<sub>2</sub>O<sub>3</sub>, ZnO or SiC semiconductors [28-29]. The schematic view of a silicon-based NWFET is shown in Figure 1.2 [30]. The main advantages offered by the NWFET, owing to its small diameter, are 1-D conduction and minimized short channel effects. The basic challenge faced by this device is the fabrication of diffused P-N junctions. For this, current technology utilizes metal source-drain junctions, which result in ambipolar conduction [31] and produce a large OFF-state current.

**Figure 1.2:** Schematic view of Si based NWFET [30]

In the III-V compound semiconductor FET, a III-V compound semiconductor such as InSb, InAs or InGaAs is used as the channel material. These materials provide high carrier mobility in the channel. As a consequence, III-V compound semiconductor FETs are able to deliver three times higher performance at the same power consumption, or to reduce power by one tenth at the same performance, compared to Si-MOSFETs [32]. The schematic view of an n-type transistor is shown in Figure 1.3, where ZrO<sub>2</sub> and InGaAs are used as the gate dielectric and the channel material, respectively [33].
In this device, the carrier mobility is found to be 3000 cm<sup>2</sup>/V-s. Two major challenges faced by the III-V compound semiconductor FET are the lower bandgap of the III-V material, which results in excessive leakage and large static power consumption, and the formation of a compatible high-k dielectric interface [34], which is necessary for electrostatic control of the device.

**Figure 1.3:** Schematic view of n-type III-V compound semiconductor FET [33]

The graphene nanoribbon transistor uses a monolayer of carbon atoms, packed into a 2-D honeycomb lattice, as the channel material. Figure 1.4 shows the schematic view of this device, which is fabricated with nanoribbons having a width around 2 nm [22]. The use of graphene as the channel material provides very high mobility (15,000 cm<sup>2</sup>/V-s) resulting in fast switching, a monolayer-thin body for optimum electrostatic scaling, and excellent thermal conductivity [35]. Consequently, the graphene nanoribbon transistor is capable of delivering 100 or 1000 times higher performance than the Si-MOSFET [36]. The main challenge faced by this device is the comparatively low I<sub>ON</sub>/I<sub>OFF</sub> ratio (~7) [37], which causes an enormous amount of energy to be consumed in an integrated circuit made of billions of graphene transistors [38].

**Figure 1.4:** Schematic view of graphene nanoribbon transistor [22]

CNTFET uses a single semiconducting single-wall carbon nanotube (SWCNT) or an array of them as the channel material. The gate electrode is placed above the CNT channel and separated from it by a thin layer of gate dielectric. The schematic view of CNTFET is shown in Figure 1.5, where an array of four SWCNTs is used for the channel [39].
CNTFET could be a more achievable and promising candidate to extend or complement the traditional Si device due to its excellent properties such as ballistic transport operation [40], high carrier mobility (10<sup>3</sup>-10<sup>4</sup> cm<sup>2</sup>/V-s) [41], easy integration of high-k dielectric material [42] (other than SiO<sub>2</sub>) resulting in better gate electrostatics, strong chemical bonding, high thermal conductivity (1700-3000 W/mK), high chemical stability [41], and better matching of P- and N-type CNTFETs, which simplifies transistor sizing in complex circuits.

**Figure 1.5:** Schematic view of CNTFET [39]

The first CNT-based transistor was announced by Martel et al. [43] and Tans et al. [44] in 1998. After that, significant advancements were achieved in the fabrication of CNT-based devices and circuits. Based on CNTFETs, some state-of-the-art designs such as logic gates, a five-stage ring oscillator fabricated along a single CNT, a capacitive sensor interface circuit, a percolation-transport-based decoder, and stand-alone circuit elements such as half-adder sum generators, D-latches and static random access memory (SRAM) cells have been fabricated [44-50]. In 2006, IBM demonstrated the first IC built using SWCNTs [51]. Cao et al. [52] announced that they had made a medium-scale IC using CNTFETs on a thin plastic substrate. Recently, Shulaker et al. [53] used the CNT imperfection-immune methodology presented in [54-55] to fabricate the first CNT computer built entirely using CNTFETs. Similar to the first silicon-based computer, the CNT computer is a synchronous digital system which runs stored programs and is programmable. The operating system of this computer achieves multitasking by executing a counting program and an integer-sorting program concurrently.
Although the operating frequency of the CNT computer is reported to be only 1 kHz due to academic experimental limitations and capacitive loading introduced by the measurement setup, this demonstration is an important milestone in the development of complex and highly energy-efficient CNT-based electronic systems. At present, the fundamental challenges faced by CNT technology are CNT misalignment and unwanted growth of metallic tubes [56].

The above-mentioned emerging devices have the potential to become the successor of Si-CMOS in the near future. CNTFET and NWFET are 1-D devices, the graphene nanoribbon FET is a 2-D device, and the III-V compound semiconductor FET is a 3-D device. 1-D devices provide ballistic transport operation without any scattering and therefore attain superior performance in comparison with 2-D and 3-D devices. CNTFET provides easy integration of high-k dielectric material due to the absence of dangling bonds, which in turn results in lower sub-threshold slopes and lower OFF current. As previously mentioned, the mobility of carriers in NWFET, graphene FET, III-V semiconductor FET and CNTFET is higher than in the Si-MOSFET, which results in higher carrier velocities and fast switching. In CNTFET and graphene transistors, the carrier mobility is of the same order of magnitude (10<sup>3</sup>-10<sup>4</sup> cm<sup>2</sup>/V-s), which makes them promising candidates for future high-speed circuits. Furthermore, based on ITRS 2009 [16], CNTFET and graphene transistors demonstrated the highest possibility of becoming a part of future devices. When this research work began, R&D in CNTFET was ahead of that in the graphene transistor. Therefore, in this thesis, CNTFET-based circuits are targeted.

### 1.2 Motivation and Objectives

As described earlier, the scaling of CMOS technology has been pursued aggressively over the last few decades to integrate a larger number of transistors on a single chip. However, material properties are directly related to dimension.
For traditional Si-based devices, as the physical gate length reaches the nanoscale range, many device-level effects (such as increased leakage current, variations in doping, larger process variations and reduced gate control) are exhibited by MOSFETs [9]. To overcome these issues, researchers are exploring new alternatives to the Si-CMOS process. CNTFET has proved to be a promising alternative due to its various superior properties such as its unique 1-D band structure, ballistic transport operation and low OFF-current [40-42], together with its resemblance to the MOSFET in terms of intrinsic attributes. Thereby, CNTFET is a promising device which enables high-performance and low-power designs for the next generation of modern electronics [57-58].

Further, digital system design has traditionally been associated with binary logic, where digital computations are performed on two possible logic values, '0' and '1', in the Boolean space. Since the world around us is multi-valued, many practical applications such as robotics, process control and decision support systems need more than two-valued logic for an efficient and optimum solution. In 1921, Post [59] introduced a definition of multi-valued logic (MVL) as an extension of conventional binary logic. In 1964, Alexander et al. [60] announced that the most efficient radix for the realization of switching circuits is the natural base (e = 2.7183), which shows that the best integer radix is three rather than two. In the 1970s, MVL-based system implementations were reported in the technical literature and referred to as voltage-mode ternary circuits [61-63]. Over the last couple of decades, three-valued or ternary logic has attracted considerable interest because of several advantages over binary logic in the design of digital VLSI circuits. Figure 1.6 summarizes the advantages offered by ternary logic [61]. This logic reduces the chip area occupied by the interconnection wires and functional units in VLSI integration.
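
The radix argument above can be illustrated with a short numerical sketch. It assumes the common radix-economy cost model, in which hardware cost is proportional to the radix multiplied by the number of digits, so that the cost per unit of information scales as r / ln(r). This model and the helper names `radix_cost` and `digits_needed` are illustrative assumptions, not taken from [60].

```python
import math


def radix_cost(radix: int) -> float:
    """Relative cost per unit of information, r / ln(r) (assumed cost model)."""
    return radix / math.log(radix)


def digits_needed(num_values: int, radix: int) -> int:
    """Smallest digit count d such that radix**d >= num_values."""
    digits, capacity = 0, 1
    while capacity < num_values:
        capacity *= radix
        digits += 1
    return digits


# Among integer radices 2..10, the cost is lowest for the radix closest to e.
best_radix = min(range(2, 11), key=radix_cost)
print(best_radix)  # 3

# Digits required to cover the range of a 14-bit binary word.
print(digits_needed(2**14, 2), digits_needed(2**14, 3))  # 14 9
```

Under this model radix 3 has the lowest cost among the integer radices, and a 14-bit range fits in 9 ternary digits because 3^9 = 19683 exceeds 2^14 = 16384.
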
Ternary signals carry more information on a single wire, thus reducing the number of wires and IC pins required for the same range of data. This decreases the number of interconnects and consequently leads to increased space between any two wires without any increase in total silicon area. This also decreases the resistance and capacitance associated with interconnects and contacts, and as a consequence, ternary logic achieves simplicity and increased energy efficiency in digital design [62]. Furthermore, other added advantages are less complex error detection/error correction codes and high-speed serial/serial-parallel arithmetic operation. For example, 14-bit binary addition can be performed by a 9-trit ternary adder, which reduces the number of ripple carries to approximately half with respect to its binary implementation and thus increases the speed of electronic circuits approximately by a factor of two. Raychowdhury and Roy [64] demonstrated that an efficient MVL implementation of a signed 32-bit multiplier is able to reduce both chip area and power by more than 50% in comparison to its fastest binary counterpart. In [65], MVL blocks were added to binary logic ICs to improve the overall performance of the system. Similarly, the advantages of ternary logic have been confirmed in a number of applications including memory, communication, machine learning, fuzzy logic, artificial intelligence, robotics, data mining, digital signal processing, digital control systems and image processing [66].

**Figure 1.6:** Advantages of ternary logic circuits [61]

A voltage-mode MVL circuit processes information based on voltage levels. The best way to design and implement these circuits is the multi-threshold method [64]. In CMOS technology, multi-threshold design relies on body effects, where different bias voltages are applied to the body or bulk terminal of the transistors.
To get these bias voltages, multiple supply voltages are required, which leads to a costly as well as complex power grid design. On the other hand, ternary logic has an elegant association with CNTFET devices. In particular, CNTFET provides a unique opportunity of achieving two distinct threshold voltages merely by employing CNTs with different diameters [67-69]. Therefore, a multi-threshold design can be accomplished easily with CNTFETs.

In the past, the important concerns of VLSI designers were propagation delay, area, cost and reliability. However, in recent years power consumption has been given importance along with the other constraints, due to increasing levels of integration and the desire for portability. There has been remarkable success and growth of portable applications including notebook and laptop computers, audio- and video-based multimedia products, and personal wireless communication systems such as digital assistants and communicators, which require high-speed computation and vastly increased capabilities with low power consumption [5]. The ever-increasing market segment of portable electronic devices calls for the implementation of long-lasting battery-operated systems. The progress of battery technology is slow compared to advances in microelectronics technology. Thus, battery technology alone is unlikely to provide a power solution for mobile systems [70]. It has become imperative to develop VLSI circuits and systems which reduce heat dissipation in order to allow a large density of functions on a single chip. The situation has been further aggravated by the fact that the clock rates of microprocessors have already reached the 1 GHz mark, leading to a significant increase in switching power consumption. Furthermore, energy-efficient circuits are also required in high-performance computers and AC-powered systems, in which sinking a large amount of heat through packages is becoming a difficult problem.
Hence, designers are facing more constraints: small chip area, high throughput, high speed, and at the same time, low power dissipation.

In today's digital world, operations such as automation, process control and many other complex computations are accomplished by various programmable chips like microprocessors, microcontrollers and dedicated processors. The most basic and important processing unit of these chips is the arithmetic and logic unit (ALU), which is responsible for performing various arithmetic and logic operations such as addition, subtraction, multiplication, magnitude comparison and XOR. The ALU is the heart of the instruction execution portion of every processor. For example, the architecture of the 8085 microprocessor includes an 8-bit ALU to process binary data. Some other state-of-the-art binary and ternary ALU designs can be found in [71-73]. The increasing demand for high performance in modern information processing systems clearly points to the need for efficient implementation of ALU designs in terms of hardware, speed and power. Therefore, it is essential to develop an efficient ALU using CNTFETs for ternary logic.

In this thesis, we target the realization of a CNTFET-based ternary ALU (TALU) for advanced electronic systems. Novel designs of the TALU as well as its functional modules, which include adder, subtractor, multiplier, comparator and exclusive-OR, are introduced and compared with existing state-of-the-art works. These designs are evaluated based on four metrics: device count, propagation delay, power dissipation and power-delay product. As driving capability is an important parameter for digital circuits, the presented designs are tested under different loading conditions. These designs are also analyzed at different frequencies to examine their performance with variation in operating frequency.
Further, another important characteristic of digital designs which should be considered is their susceptibility to voltage and temperature variations. For this, the presented circuits are evaluated over a wide range of supply voltage and temperature.

On the other hand, the growing attraction of bandwidth-hungry real-time applications and increased usage of the internet have raised a demand for very high-speed networks. On the internet, a message such as an e-mail or a webpage is transferred by first breaking it into data packets and then sending them towards the destination. Each data packet contains a payload and a header, which carries information such as data length, data type, sequence number, source address and destination address [74]. Based on the information in the header, the data packet is transferred to an output port by the network switch. A router, which is a more sophisticated switch, maintains a routing table and routes incoming data packets from source to destination according to the information stored in the routing table [74]. Routers also send information to each other to update their routing tables. In general, an optical-fiber-based physical medium transports the data packet from one router to another. Advances in optical fiber technologies, such as wavelength division multiplexing, have achieved very high-speed data transportation on optical fibers. To get the benefits offered by optical fiber technology, routers or network switches should have the ability to meet the increased data transfer rates [75].

In a network switch, the most time-consuming task is table lookup. New approaches like policy-based routing, flow analysis and Quality of Service (QoS) are increasing the number and variety of table lookups. To maintain QoS, low-priority packets, such as data, are transferred after high-priority packets, such as voice and video. These new approaches need multiple lookups for each packet before it is delivered.
For the table lookup task, software solutions like the radix tree are relatively slow and do not scale with the size of the table. A hash function can perform the lookup task in one memory access under normal conditions; however, its worst-case search time is considerably higher than that of tree searches [76]. As a consequence, many of the software solutions executing table lookup tasks at different network layers are now being substituted by their hardware counterparts. One of the most efficient hardware solutions is content addressable memory (CAM), which can be integrated as a co-processor with the network processing unit to perform the table lookup task. Further, CAMs are also used in many other key applications including tag directories in associative cache memory systems [77-78], translation look-aside buffers in virtual memories [79-80], parametric curve extraction [81], data compression [82], image coding [83], real-time pattern searching in virus (or intrusion) detection systems [84] and gene pattern matching in bioinformatics [85]. Since most of these applications use smaller CAMs, current research related to CAMs is mainly governed by network applications, which demand high-density CAMs with low power and high search speed. Design of low-power and high-speed CAM structures continues to be in high demand, and the ballistic transport operation and low OFF-current characteristics of CNTFETs make them excellent candidates for high-speed CAM designs with increased integration density. This thesis focuses on the design of CNTFET-based CAM cells for fast match operation. Novel CAM structures are presented and compared with their existing counterparts.

### **Objectives of the Research are:**

- 1) To develop architectures and circuits of ternary logic based ALU (TALU) optimized in terms of hardware using CNTFETs.
- 2) To improve the performance of sub-blocks of the above hardware-efficient TALU for power-delay product (PDP) efficiency using different circuit techniques.
- 3) To find a new architecture of TALU and circuits of its sub-blocks optimized for low power ternary system using CNTFETs.
- 4) To design ternary logic based CAM cell for fast search operation using CNTFETs.

### 1.3 Thesis Outline

This thesis is organized as follows:

**Chapter 2** deals with the literature survey. First, details of the CNTFET device are given, and then ternary logic and arithmetic circuits implemented in CMOS and CNTFET technology are reviewed. Further, a survey of CAM cells realized using CMOS as well as CNTFET is included.

**Chapter 3** presents a design of a 2-bit hardware-optimized TALU (HO-TALU) using CNTFETs. The architecture and functionality of the 2-bit HO-TALU are described. HO-TALU introduces an adder-subtractor (AS) module which eliminates a subtractor block from the conventional architecture. This section is followed by the description of ternary function minimization and realization. Design and implementation of HO-TALU functional modules and their integration over a TALU slice are explained. HO-TALU modules utilize binary gates together with ternary gates. The last section of this chapter demonstrates results for the functional test and performance evaluation of HO-TALU, including its hardware assessment.

**Chapter 4** explains the performance-boosted designs of sub-blocks of the 2-bit HO-TALU using CNTFETs. First, three designs of the ternary full adder (TFA), which is an important sub-block of the AS module, are described. The first TFA design contains symmetric pull-up and pull-down networks along with a resistive voltage divider as its integral part, which is configured using transistors and leads to a high-speed design. The second TFA is designed based on the complementary pass transistor logic style in order to achieve low power consumption. The third TFA is implemented using the dynamic logic style in order to get a reduced power-delay product. All new TFA designs are analyzed, evaluated and compared with the existing adder designs.
This section is followed by the demonstration of the design of a new comparator module for the 2-bit HO-TALU. This circuit is designed using the pass transistor logic style and minimizes the number of stages to get improved performance. It is used in the implementation of 2-bit and N-bit comparators, which use a binary tree configuration to correct the voltage levels. The new comparator design is analyzed, evaluated and compared with the existing comparator designs.

**Chapter 5** describes a design of a 2-bit power-optimized TALU (PO-TALU) using CNTFETs. The architecture and functions of the 2-bit PO-TALU are explained, followed by the demonstration of ternary function minimization and realization. The PO-TALU functional blocks, adder-subtractor-exclusive-OR (ASE) and multiplier, are designed using a complementary CNTFET-based binary computational unit and a low-complexity encoder. ASE eliminates the exclusive-OR and subtractor blocks from the conventional architecture. The multiplier uses a new carry-add (CA) block in place of the ternary half adder. Implementation of these blocks is shown, followed by the extension of PO-TALU to a 2-bit slice. The last section of this chapter demonstrates simulation results and a comparison with existing CNTFET-based designs.

**Chapter 6** presents binary CAM (BCAM) and ternary CAM (TCAM) cells designed based on low-capacitance search logic in CNTFET technology. A new three-valued CAM (3CAM) cell is also presented using CNTFETs. This cell uses a multi-threshold voltage structure in the implementation of a low-capacitance search network, which leads to a fast and compact CAM design. The presented CAM cells are simulated and compared with the existing memory designs.

Finally, **Chapter 7** presents the summary of the work demonstrated in this thesis, including key findings, main contributions and important observations, and also discusses possible directions for future work.
### 2.1 Introduction

In chapter 1, the potential of CNTFET for high-performance and low-power modern designs, due to its various excellent properties such as its unique 1-D band structure, ballistic transport operation and low OFF-current [40-42], has been demonstrated. The relevance of and motivation for developing CNTFET-based designs of a ternary (three-valued) arithmetic and logic unit (TALU) and content addressable memory (CAM) cells have also been discussed. The interest of scientists and researchers in ternary logic has been increasing over the past few decades because it provides several advantages such as reduced chip area, fewer interconnects, less complex error detection/error correction codes and high-speed serial/serial-parallel arithmetic operations. As a consequence, significant published literature is available on the design and implementation of ternary arithmetic and logic circuits using MOSFETs [61-63]. In addition, due to the unique property of CNTFET that its threshold voltage is controlled by the CNT diameter, a number of researchers have found it to be a fundamental device for ternary design [64] [67-68]. In this chapter, the literature available on the designs and circuit implementations of ternary arithmetic and logic circuits based on CMOS as well as CNTFET technology is reviewed.

Further, CAM has been a subject of research for the last few decades. Several circuit techniques and architectures have been developed to reduce the cell area, delay and power consumption of CAMs. This chapter provides a brief review of various designs of CAM cells developed in CMOS and CNTFET technology.

In section 2.2, the electronic properties of CNTFET which make it very competitive in future electronics are provided. Ternary logic and arithmetic circuits implemented in CMOS and CNTFET technology are reviewed in section 2.3. A review of CAM designs realized using CMOS as well as CNTFET is given in section 2.4.
This is followed by section 2.5, in which gaps in the published research work are compiled along with the problem statement of the thesis.

### 2.2 Carbon Nano Tube Field Effect Transistor (CNTFET)

A carbon nanotube (CNT) is an allotrope of carbon with a cylindrical structure, which can be single-walled (SWCNT) or multi-walled. A SWCNT is obtained by rolling up a sheet of graphite along a wrapping vector $C_h = n_1 a + n_2 b$, where $n_1$ and $n_2$ are positive integers which specify the chirality of the tube, and $a$ and $b$ are lattice unit vectors [86], as shown in Figure 2.1. Depending upon the values of $n_1$ and $n_2$, a SWCNT can be either metallic or semiconducting. If $n_1 - n_2$ is a multiple of 3, the SWCNT is metallic; otherwise it is semiconducting [87]. Similarly, SWCNTs are further classified into three groups according to the values of $n_1$ and $n_2$: (1) armchair CNT when $n_1 = n_2 = n$, (2) zigzag CNT when $n_1 = 0$ or $n_2 = 0$, and (3) chiral CNT when $n_1$ and $n_2$ are different and nonzero. All armchair CNTs behave as conductors. On the other hand, zigzag and chiral CNTs show conductor behavior when the difference between the indices $(n_1 - n_2)$ is an integer multiple of 3; otherwise they are semiconducting CNTs, which are used in CNTFETs [88].

**Figure 2.1:** Unrolled sheet of graphite and the rolled lattice structure of CNT [86]

CNTFET is a type of FET that makes use of a single semiconducting SWCNT or an array of them as a channel formed between two metal electrodes acting as the source and drain contacts. The device is turned ON and OFF through the gate electrode placed around the CNT channel. The schematic view of CNTFET is shown in Figure 2.2 (a) [39]. Undoped segments of CNTs serve as the channel under the gate electrode, while heavily doped CNT segments placed between the gate and the source/drain electrodes offer low electrical resistance in the ON-state of the CNTFET [89].
Since the electrons are confined to the narrow CNTs, carrier mobility goes up substantially on account of ballistic transport operation, in comparison with the bulk MOSFET.

**Figure 2.2:** Carbon nanotube field effect transistor (CNTFET) (a) Schematic view of a CNTFET device, (b) SB-CNTFET, (c) M-CNTFET, (d) T-CNTFET

Three types of CNTFET devices have been reported in the literature. They are known as the Schottky barrier CNTFET (SB-CNTFET), MOSFET-like CNTFET (M-CNTFET) and band-to-band tunneling CNTFET (T-CNTFET). SB-CNTFET works on the principle of direct tunneling through a Schottky barrier (SB) at the source/drain-channel junction. This device is fabricated by making a direct contact between a metal and a semiconducting CNT, and is shown in Figure 2.2 (b). The presence of the SB at the CNT-metal junction limits the transconductance of the CNTFET in the ON state and decreases its current delivery capability, which in turn reduces the suitability of SB-CNTFET for high-speed applications. Besides, SB-CNTFET shows strong ambipolar behavior, which limits the usage of this device in complementary transistor-based circuits. To eliminate the above-mentioned drawbacks of SB-CNTFET, M-CNTFET has been developed and is shown in Figure 2.2 (c). This device operates like a normal MOSFET with high speed and low power consumption. It is fabricated using heavily doped source and drain CNT regions. Due to the absence of an SB at the source/drain-channel junction, M-CNTFET has a significantly higher ON current, which makes it very suitable for ultra-high-performance digital circuits. T-CNTFET, which is shown in Figure 2.2 (d), has low ON current and very good cut-off characteristics. As a result, this device proves to be a strong candidate for subthreshold and ultra-low-power design [90]. Based on the stated advantages as well as the similarities of M-CNTFET to the MOSFET in terms of operation and intrinsic attributes, this kind of CNTFET is used in this thesis for implementation of the presented circuits.
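
The chirality rules given earlier in this section can be checked with a short sketch. The function name `classify_swcnt` and the returned labels are illustrative choices, not part of any cited work. The code applies only the two rules stated in the text: the geometric class follows from the index pattern, and the tube is metallic when $n_1 - n_2$ is a multiple of 3.

```python
from typing import Tuple


def classify_swcnt(n1: int, n2: int) -> Tuple[str, str]:
    """Return (geometry, conduction) for a SWCNT with chirality (n1, n2)."""
    if n1 < 0 or n2 < 0 or (n1 == 0 and n2 == 0):
        raise ValueError("chirality indices must be non-negative and not both zero")

    # Geometric class from the index pattern.
    if n1 == n2:
        geometry = "armchair"
    elif n1 == 0 or n2 == 0:
        geometry = "zigzag"
    else:
        geometry = "chiral"

    # Conduction type: metallic when n1 - n2 is a multiple of 3.
    conduction = "metallic" if (n1 - n2) % 3 == 0 else "semiconducting"
    return geometry, conduction


for chirality in [(19, 0), (13, 0), (9, 0), (5, 5), (6, 4)]:
    print(chirality, classify_swcnt(*chirality))
```

Armchair tubes always come out metallic under this rule because $n_1 - n_2 = 0$, which matches the statement that all armchair CNTs behave as conductors.
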
The gate width of a CNTFET can be approximated as [91]:

$$W \approx \max(W_{\min}, N \times S) \tag{2.1}$$

where $W_{\min}$ is the minimum gate width, $N$ is the number of tubes and $S$ is the pitch, which is the distance between the centers of two adjoining CNTs under the same gate. The gate must be at least as wide as the minimum width and wide enough to cover all $N$ tubes, hence the maximum of the two terms. The threshold voltage is the voltage needed to turn ON the device electrostatically via the gate. For a CNTFET, it can be approximated to first order as the half band gap and calculated as [91]:

$$V_{th} \approx \frac{E_g}{2e} = \frac{1}{\sqrt{3}} \frac{a V_{\pi}}{e D_{CNT}} \approx \frac{0.43}{D_{CNT}(\text{nm})} \tag{2.2}$$

where $V_{\pi}$ (= 3.033 eV) is the carbon $\pi$-$\pi$ bond energy in the tight-binding model, $a$ (= 0.249 nm) is the carbon lattice constant and $e$ is the unit electron charge. $D_{CNT}$ is the diameter of the CNT, which depends on the chirality vector $(n_1, n_2)$ and can be calculated as [91]:

$$D_{CNT} = \frac{\sqrt{3} a_0}{\pi} \sqrt{n_1^2 + n_2^2 + n_1 n_2} \approx 0.0783\,(\text{nm}) \sqrt{n_1^2 + n_2^2 + n_1 n_2} \tag{2.3}$$

where $a_0$ ($\approx$ 0.142 nm) is the nearest-neighbor carbon-carbon bond length, with $a \approx \sqrt{3} a_0$; this is the value consistent with the numerical factor 0.0783 nm.

According to eq. (2.2) and (2.3), the threshold voltage of a CNTFET is inversely proportional to the CNT diameter, and the CNT diameter is directly proportional to the magnitude of the chirality vector. For a CNTFET with chirality (19, 0), the CNT diameter is 1.487 nm and consequently the threshold voltage is 0.289 V. Similarly, for a CNTFET with chirality (13, 0), the CNT diameter is 1.02 nm and consequently the threshold voltage is 0.422 V. The threshold voltage of a P-CNTFET is the same as that of an N-CNTFET with the opposite sign. As the chirality vector increases, the threshold voltage of the CNTFET decreases. Thereby, CNTFET provides a unique opportunity for setting the threshold voltage by varying the chirality vector of the CNT. Different research groups have demonstrated advances in manufacturing processes for well-controlled CNTs. For example, Li et al. [92] have used discrete catalytic nano-particles of various sizes for growth of single-wall CNTs (SWCNTs) with controlled chirality vectors.
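
The worked values quoted from eq. (2.2) and (2.3) can be reproduced with a minimal sketch. It uses only the two rounded coefficients given in the text, 0.0783 nm for the diameter and 0.43 V·nm for the threshold voltage. The function and constant names are illustrative assumptions.

```python
import math

DIAMETER_COEFF_NM = 0.0783  # rounded prefactor of eq. (2.3), in nm
VTH_COEFF_V_NM = 0.43       # rounded prefactor of eq. (2.2), in V*nm


def cnt_diameter_nm(n1: int, n2: int) -> float:
    """CNT diameter in nm for chirality (n1, n2), following eq. (2.3)."""
    return DIAMETER_COEFF_NM * math.sqrt(n1 * n1 + n2 * n2 + n1 * n2)


def threshold_voltage_v(n1: int, n2: int) -> float:
    """First-order threshold voltage magnitude in volts, following eq. (2.2)."""
    return VTH_COEFF_V_NM / cnt_diameter_nm(n1, n2)


for chirality in [(19, 0), (13, 0), (10, 0)]:
    d = cnt_diameter_nm(*chirality)
    vth = threshold_voltage_v(*chirality)
    print(f"{chirality}: D = {d:.3f} nm, |Vth| = {vth:.3f} V")
```

For (19, 0) and (13, 0) the sketch gives values that agree with the diameters and threshold voltages quoted in the text to within rounding. The (10, 0) entry is an additional illustrative chirality showing that a smaller diameter yields a higher threshold voltage.
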
Ohno et al. [93] have presented a possibility of chirality assignment of SWCNTs by micro-photocurrent spectroscopy. Wang et al. [94] have described a synthesis process using different carbon precursors on Co–Mo catalysts for fabricating SWCNTs with a well-controlled chirality structure. Lin et al. [95] have reported post-processing techniques to control the threshold voltage of a multiple-tube CNTFET.

Other excellent properties which make CNTFET a potential candidate for building highly efficient electronic systems requiring high performance and low power are as follows:

1. Long scattering mean free path ($\sim 1\,\mu m$) [96], which leads to lower delay and less heating, both of which are very consequential from the IC point of view [97-98].
2. High carrier mobility ($10^3$-$10^4$ cm<sup>2</sup>/V-s) in semiconducting CNTs [41], which provides high ON current (>1 mA/$\mu$m).
3. Easy integration of high-k dielectric material (other than SiO<sub>2</sub>) due to the absence of dangling bonds, resulting in better gate electrostatics [42].
4. Strong chemical bonding, high thermal conductivity (1700-3000 W/mK) and chemical stability, which lead to high current densities ($\sim 10^{10}$ A/cm<sup>2</sup>) [41].
5. Better matching of complementary CNTFETs: P- and N-type CNTFETs of the same size have equal carrier mobility and thereby deliver the same drive currents, which is very important for transistor sizing in complex circuits [98].

Besides the mentioned advantages, this emerging technology also faces some major challenges that must be resolved to make it feasible for commercial purposes. These challenges are as follows:

1. CNT packing density [89]
2. CNT diameter [99] and density variation [100]
3. CNT misalignment [101-102]
4. Metallic-CNT (m-CNT) growth

Encouraging efforts are being made from time to time to resolve these challenges. CNTFETs based on perfectly aligned and denser SWCNT arrays were demonstrated in [103-104].
CNT synthesis processes such as wafer-scale CNT transfer along with wafer-scale-aligned growth [104], multiple cycles of chemical vapor deposition growth [105] and CNT transfer through multiple sacrificial layers [106] enable packing of nearly 5-50 CNTs/μm. Durkop et al. [42] developed a CNTFET with high-quality ohmic contacts, high-k dielectric HfO<sub>2</sub> films and electrostatically doped source and drain regions. Researchers have described various CNT doping methods such as direct chemical doping [107] and atomic layer deposition [108]. For P-CNTFET, Mann et al. [109] have used palladium (Pd), which leads to an ohmic contact between the CNT valence band and the Pd electrode. Similarly, for N-CNTFET, Zhang et al. [110] have utilized scandium (Sc), which leads to an ohmic contact between the CNT conduction band and the Sc electrode. Different research groups analyzed the impact of CNT diameter and density variations on the performance of CNTFET-based circuits in [111-112]. Zhang et al. [112] introduced an integrated framework and an aligned-active layout technique to overcome the effect of CNT variations. For the elimination of unwanted m-CNTs, various processing methods such as selective chemical etching [113], current-induced electrical burning [114] and VLSI-compatible m-CNT removal [115] have been described. Patil et al. [116] demonstrated an automated algorithm and design technique to implement misaligned-CNT-immune logic structures.

Researchers [117-129] have proposed different CNTFET device models in the literature. The Stanford model [117] is used in this thesis to evaluate CNTFET-based circuits under various test conditions and to compare them with their existing counterparts at the 32 nm technology node. The operating voltage (V<sub>dd</sub>) for all proposed designs is chosen as 0.9 V, which is the default value of the Stanford CNTFET model. This standard model has been designed for the MOSFET-like single-walled CNTFET (M-CNTFET), in which each transistor may include one or more CNTs as its channel.
This model considers a realistic, circuit-compatible CNTFET structure and includes practical device non-idealities. The modelled non-idealities incorporate inter-CNT charge screening effects, scattering, Schottky-barrier effects at the contacts, parasitic doped source-drain extension regions, the back-gate (substrate bias) effect, and source/drain and gate resistances and capacitances. The model also includes a full transcapacitance network to deliver more accurate transient and dynamic response. The technology parameters of the CNTFET, along with their brief descriptions and numeric values, are given in Table 2.1 [117].

**Table 2.1:** Technology parameters for CNTFET [117]

| Parameter | Description | Value |
|---|---|---|
| $L_{ch}$ | Physical channel length | 32.0 nm |
| $L_{geff}$ | The mean free path in the intrinsic CNT channel region | 100.0 nm |
| $L_{ss}$ | The length of doped CNT source-side extension region | 32.0 nm |
| $L_{dd}$ | The length of doped CNT drain-side extension region | 32.0 nm |
| $E_{fi}$ | The Fermi level of the doped S/D tube | 0.6 eV |
| $K_{gate}$ | The dielectric constant of high-k top gate dielectric material | 16.0 |
| $T_{ox}$ | The thickness of high-k top gate dielectric material | 4.0 nm |
| $C_{sub}$ | The coupling capacitance between the channel region and the substrate | 40.0 pF/m |
| $V_{fbn}$, $V_{fbp}$ | Flatband voltage for n-CNTFET and p-CNTFET, respectively | 0.0 eV, 0.0 eV |
| L_channel | Physical gate length | 32.0 nm |
| Pitch | The distance between the centers of two adjacent CNTs | 20.0 nm |
| $L_{eff}$ | The mean free path in p+/n+ doped CNT | 15.0 nm |
| phi_M | The work function of Source/Drain metal contact | 4.6 eV |
| phi_S | CNT work function | 4.5 eV |

# 2.3 Three-valued (Ternary) Arithmetic and Logic Circuits

Ternary arithmetic and logic circuits designed in
MOSFET as well as CNTFET technology are discussed in the following sub-sections.

## 2.3.1 Ternary Circuits based on MOSFET

Several authors [130-139] have presented MOSFET-based designs of ternary logic circuits. In most cases, they used power supply voltages higher than the device threshold voltage, large off-chip resistors and multiple power sources, which result in high power consumption. Balla and Antoniou [62] developed a low-power ternary logic family that contains a set of inverters, NAND gates and NOR gates. Using these gates, they implemented a half adder, a full adder and a 1-trit multiplier, which were further used to construct a shift register, an N-trit adder, an N-trit multiplier and a cyclic convolution circuit. These circuits were constructed using MC 4007 and MC14011 discrete transistors and tested to analyze their performance. It was shown that they achieve a significant reduction in power-delay product (PDP) and device count with respect to the earlier ternary designs presented in [136] and [139]. Heung and Mouftah [63] presented a ternary logic family that does not include resistors. They developed inverters, NAND gates and NOR gates based on depletion-enhancement complementary metal-oxide-semiconductor (DECMOS) technology, and then presented a design of a ternary full adder using these gates. These circuits use two power supplies lower than the threshold voltage of the transistors. They were studied using the SPICE 2G simulation package. It was shown that they provide low power consumption and high speed in comparison with their binary counterparts. The ternary circuits of [62] and [63] described above require four types of devices, namely depletion PMOS, depletion NMOS, enhancement PMOS and enhancement NMOS. The standard CMOS process does not support depletion MOSFETs. Therefore, these designs are not compatible with the standard CMOS process.
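
The gate-level behaviour of such a ternary logic family can be summarised in a few lines. The following Python sketch is an illustration only and is not taken from [62] or [63]: it assumes the common unbalanced encoding with logic levels 0, 1 and 2, and the usual definitions in which the ternary inverter returns 2 − x and the ternary NAND and NOR gates are the inverted minimum and maximum of their inputs. All function names are hypothetical.

```python
LEVELS = (0, 1, 2)  # assumed unbalanced ternary logic levels


def t_inv(x):
    """Standard ternary inverter: 0 -> 2, 1 -> 1, 2 -> 0."""
    return 2 - x


def t_nand(a, b):
    """Ternary NAND: inverted minimum of the two inputs."""
    return t_inv(min(a, b))


def t_nor(a, b):
    """Ternary NOR: inverted maximum of the two inputs."""
    return t_inv(max(a, b))


def truth_table(gate):
    """Return the 3 x 3 truth table of a two-input gate (rows: a, columns: b)."""
    return [[gate(a, b) for b in LEVELS] for a in LEVELS]


if __name__ == "__main__":
    for name, gate in (("NAND", t_nand), ("NOR", t_nor)):
        print(name)
        for row in truth_table(gate):
            print(" ", row)
```

When the inputs are restricted to the levels 0 and 2, these definitions reduce to the binary NAND and NOR functions, which is one reason this encoding is convenient for comparing ternary gates with their binary counterparts.
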
Srivastava and Venkatapathy [140] developed a positive ternary inverter (PTI), a negative ternary inverter (NTI) and a ternary full adder without using depletion-mode transistors or resistors. They designed these circuits using a CMOS inverter and pass transistors (at the output) in a 2 $\mu$m n-well standard CMOS process to operate them below +2 V. In these circuits, the width/length (W/L) ratios of the transistors were adjusted to get optimum performance. It was shown that the PTI and NTI improve transient time, noise margin and chip area by factors of four, half and two, respectively, in comparison with their counterparts of [63] implemented in DECMOS technology. However, these designs offered no flexibility to adjust the threshold voltage of the MOSFETs through process modification. Srivastava [141] used the back-gate bias method in addition to the W/L ratio of the MOSFETs to place the transition region at the desired location (around midway between the low and high voltage levels) in the dc voltage transfer characteristics. The standard ternary inverter (STI), NTI and PTI were designed for operation at a low voltage ($\pm$1 V) in 2 $\mu$m n-well standard CMOS technology, simulated with SPICE 3 and used in the design of CMOS ternary logic circuits. Wang et al. [142] reported dynamic ternary logic circuits in which the Yoeli-Rosenfeld algebra [9] was implemented. In these circuits, an overlapped four-phase clocking scheme was used and the different basic circuit blocks were connected according to the permitted fan-out diagrams. These circuits were simulated using SPICE with 2 µm CMOS process parameters. It was found that the dynamic circuits reduce the speed-power-area product by a factor of three to four compared with the static ternary circuits presented in [140]. However, these circuits suffer from dc power dissipation and degraded voltage swing due to ratioed logic.
Wu and Huang [144-145] suggested dynamic ternary circuits which use two-phase non-overlapped clocks and have full voltage swing without any dc power dissipation. Based on NMOS differential tree, they also presented simple ternary differential logic (STDL) for dynamic complex circuits to form a pipeline system. These circuits were simulated using SPICE program with 1.2 µm CMOS process parameters. It was shown that PDP of these circuits is only 23% to that of dynamic designs presented in [142]. These circuits also have advantage in term of layout area with respect to the designs of [142]. However, power supply voltages used in circuits of [144] are too low and thus, noise and impulse spike can easily influence them. Dynamic circuits of [145] require threshold adjusted MOS (non-standard CMOS) processing and four power supplies. In addition, the highest voltage available in the circuit is used to drive the precharge transistors and to increase the threshold voltage of PMOSFETs by means of body effect where the bulk potential of these transistors is being raised above the high voltage level. Thus, output logic swing available is less than the maximum voltage and noise margin is consequently reduced. A. Herrfeld and Hentschke [146] developed ternary dynamic differential logic (TDDL) and presented a TDDL-based ternary full adder. This dynamic circuit technique needs only a single clock signal and its inverse. The TDDL-based circuit uses enhancement mode MOS transistors with threshold voltages (V<sub>th</sub>) < $\Delta V$ , where $\Delta V$ represents the differential voltage between two adjacent states resulting in larger noise margin. Other favorable properties are no static power consumption and use of standard CMOS process restricted to enhancement mode P-type and N-type MOS transistors. However, these circuits are quite complex (minimum 15 transistors required for an inverter). 
Totto and Saletti [147] also suggested a dynamic circuit solution that allows the implementation of ternary circuits using a standard CMOS process, with only three power supplies, the maximum possible noise margin and zero static power dissipation, at the expense of a slightly more complex circuit structure (a minimum of 9 transistors is required for an inverter). Mateo and Rubio [149] presented quasi-adiabatic ternary (QAT) CMOS logic in order to obtain the area-reduction benefit of ternary logic for low-power digital ICs. They realized basic ternary gates and a ternary half adder. These circuits were simulated in HSPICE using the level-six model of a 1 μm CMOS technology. It was demonstrated that the PDP of the QAT-based ternary half adder increases by one order of magnitude in comparison with that of the adiabatic binary half adder of [148] having non-fully-adiabatic switching, but decreases by two orders of magnitude compared to that of the static binary half adder and the dynamic ternary half adder of [142]. The QAT-based ternary half adder also shows 65% area saving with respect to the adiabatic binary logic of [148]. Mateo and Rubio also demonstrated a QAT-based 5 x 5 trit multiplier implemented in 0.7 µm CMOS technology in [150]. The reported results show that the PDP of the QAT-based 5 x 5 multiplier increases by one order of magnitude in comparison with that of the fully adiabatic binary 8 x 8 multipliers of [151] and [152] having non-fully-adiabatic switching and breakage of reversibility, but decreases by one and seven orders of magnitude compared to that of the static CMOS binary 8 x 8 multiplier and the quasi-adiabatic binary 8 x 8 multiplier of [153], respectively. The QAT-based multiplier also shows 60% area saving and a routing benefit with respect to the fully adiabatic binary ones of [151] and [152]. Shivashankar and Shivaprasad [154-155] presented a systematic procedure for the simplification and implementation of ternary functions using a 3-to-1-line ternary multiplexer as the building block.
Map method which reduces number of design steps, was used for simplification of ternary functions. One or more input variables were considered as data select variables in the realization of ternary functions. As a consequence, this realization requires fewer multiplexers in comparison with the designs presented in [156]. An algorithm for reducing unary operators and ternary gates required in the data paths was also discussed. Authors considered single level and multi-level multiplexing techniques, and developed designs of a ternary adder, subtractor and ternary to analogue converter. Sipos et al. [157] described a design method for ternary multiplexers with any number of inputs. They used 3-to-1-line ternary multiplexer as a basic circuit to design those multiplexers having higher number of inputs. This basic multiplexer was built using minimum and maximum ternary functions, and the control circuit was build using ternary circuits named as indicators of logic levels. Designs of multiplexer were implemented using supplementary symmetrical logic circuit (SUS-LOC) structure, and simulated in ORCAD environment using transistors from Breakout library to validate their operation. Sathish et al. [158] presented a method for defining, implementing, analyzing, testing ternary circuits with VHDL Simulator. They demonstrated VHDL modeling of ternary circuits such as 9-to-1-line and 27-to-1-line multiplexers, half adder, half subtractor, full adder, full subtractor, 1-bit multiplier, 1-bit and 2-bit comparator, ripple carry adder and carry save adder, 1-bit and 2-bit position shifter and barrel shifter, where all circuits were implemented using 3-to-1-line multiplexers. All the designs were simulated using VHDL simulator with the help of technology dependent package called 9-state StdLogic\_1164 package to verify their functionality and timing specifications. Gundersen et al. 
[159] presented a carry-free balanced ternary adder (BTA) implemented using recharged CMOS semi-floating-gate (RSFG) devices. They also realized a balanced ternary subtractor by applying inverted inputs to the BTA. This adder contains RSFG ternary inverter blocks, an auto-zero circuit that converts an input signal to a valid recharge signal, and metal plate capacitors. The BTA offers carry-free addition and thus can be used as a basic block in realizing fast multiplier circuits. The authors also described a design of a ternary counter based on RSFG devices in [160]. This counter uses balanced ternary notation and is suitable for implementing a fast adder structure that can add both positive and negative operands. They also presented a comparator structure based on RSFG ternary inverter blocks and metal plate capacitors. The RSFG-based designs of [159] and [160] were simulated using the Cadence analog design environment in a 90 nm CMOS process. These circuits operate at a clock frequency of 1 GHz with a power supply voltage of only 1.0 V. Zeng et al. [161] presented a design of a ternary full adder based on multi-valued switch-level theory. Using this ternary full adder, they designed a ternary ripple carry adder characterized by low power and high speed. These designs use complementary pass-transistor logic (CPL), which leads to a simple, regular and symmetric structure compared with gate-level designs. In [162], the authors described a low-power design of a ternary magnitude comparator (TMC) based on the switch-level design technique. This circuit has a full-swing output signal, which improves noise margin, and fewer transistors, ensuring a simpler circuit and smaller area than the gate-level TMC. The designs of [161] and [162] were simulated in PSPICE using TSMC 0.25 µm CMOS device parameters. Simulation results show that they consume less energy (approximately by a factor of two) in comparison with gate-level designs. Dhande et al.
[73] presented the architecture, design and implementation of 2 bit ternary arithmetic and logic unit (TALU). This design performs operations on 2-bit operands and can be extended for N-bit operands by cascading N/2 TALU slices. It uses ternary decodes and ternary gates. These CMOS ternary gates were realized in enhancement and depletion MOS technology making them suitable for VLSI implementation. TALU sub-blocks were simulated in PSPICE program. It was found that sub-blocks of TALU get significant reduction in power consumption, delay and hardware complexity in comparison with that of [62] [163] [164] and [165]. For instance, TALU sub-block such as full adder (FA) uses 56 ternary gates only while FA of [62] [163] [164] and [165] use 108, 115, 83 and 120 binary gates with an addition encoder, respectively. Aline et al. [166] demonstrated the design of typical blocks of a ternary DSP using SUS-LOC structure. SUS-LOC uses enhancement and depletion type MOSFETs due to their different threshold voltages. The authors designed a library of basic ternary logic elements, memory and arithmetic cells, and used VHDL to get performance modeling and architecture-level simulation. All reported results were extracted in 0.25 µm CMOS technology with transistor model cards derived from level 3 of SPICE. It was shown that DSP sub-modules such as adder, register and shifter etc. get advantages in delay and energy consumption compared to binary CMOS circuits. Chen and Rajashekhara [167] presented a multiplier design using ternary logic and redundant binary signed-digit (RBSD) numbers. Bit pair recoding was used for generation of partial products in RBSD form using two's complement (TC) of multiplier and multiplicand operands. Then RBSD adders were used for adding partial products. The resultant RBSD product was converted back into TC form by utilizing a RBSD to TC converter designed based on borrow look back technique presented in [168]. 
The RBSD number system allows carry-free addition of partial products, which leads to a high-speed multiplier design. In addition, since each ternary digit supports one RBSD digit, the use of ternary logic in RBSD adders reduces circuit complexity and the number of interconnects in comparison with the designs of [169] and [170], where two bits per RBSD digit are used due to binary logic. All functional units and the entire design of the 4 x 4 multiplier were simulated using SPICE to confirm the correctness of the logic. Layouts of the RBSD adder and partial product generator were produced using MAGIC software on a SUN workstation. Wang et al. [171] introduced the principle of energy recovery and the switch-level design technique for the design of ternary circuits. They presented a design of a 4 x 4 ternary adiabatic multiplier. In this design, double power clocks were used for charging and discharging the output node capacitances in an adiabatic manner through bootstrapped NMOS transistors and a cross-memory structure. This design was simulated in PSPICE with TSMC 0.25 µm CMOS device parameters. The reported results indicated that it consumes 91% less energy than the double pass-transistor logic based ternary multiplier.

## 2.3.2 Ternary Circuits based on CNTFET

In recent years, the CNTFET has been extensively studied as a potential alternative to the conventional MOSFET for implementing two-valued [172-191] and multi-valued [192-196, 198-213] circuits. However, the implementation of multiple-valued circuits could be of more interest in CNTFET technology, because the best way to design voltage-mode multi-valued circuits is the multiple-threshold method, and the desired threshold voltage can be obtained merely by using CNTs of different diameters in the CNTFET device. Roychowdhury et al. [64, 193] developed a functionally complete set of ternary operators based on CNTFETs. This set contains the literal and its complement, the cycle and its complement, min, and tsum operators.
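
The multiple-threshold remark above can be made concrete with the first-order relations commonly quoted for CNTFETs. They are not stated in this section and are given here only as a sketch under assumptions: the diameter of a CNT with chirality $(n, m)$ is taken as $D = a\sqrt{n^2 + nm + m^2}/\pi$, and the magnitude of the threshold voltage of the intrinsic channel is approximated as $V_{th} \approx aV_{\pi}/(\sqrt{3}\,e\,D)$. The constants below (carbon-carbon bond length of 0.142 nm and $V_{\pi}$ = 3.033 eV) and all function names are illustrative assumptions; the Stanford model [117] describes the device in far more detail.

```python
import math

A_CC_NM = 0.142                  # assumed carbon-carbon bond length (nm)
A_NM = math.sqrt(3) * A_CC_NM    # resulting lattice constant a (nm)
V_PI_EV = 3.033                  # assumed pi-bond energy (eV)


def is_metallic(n, m):
    """A tube is treated as metallic when (n - m) is a multiple of 3."""
    return (n - m) % 3 == 0


def diameter_nm(n, m):
    """CNT diameter in nm for chirality (n, m)."""
    return A_NM * math.sqrt(n * n + n * m + m * m) / math.pi


def threshold_voltage(n, m):
    """First-order |V_th| in volts (half the band gap divided by e)."""
    if is_metallic(n, m):
        raise ValueError("metallic tube: this approximation does not apply")
    return A_NM * V_PI_EV / (math.sqrt(3) * diameter_nm(n, m))


if __name__ == "__main__":
    # Hypothetical chiralities chosen only to show the trend.
    for chirality in [(19, 0), (13, 0), (10, 0)]:
        d = diameter_nm(*chirality)
        vth = threshold_voltage(*chirality)
        print(chirality, f"D = {d:.3f} nm, |Vth| = {vth:.3f} V")
```

Under these assumptions the threshold voltage is inversely proportional to the diameter, so a (19, 0) tube gives roughly 0.29 V while tubes with smaller chirality indices give higher thresholds. This is the property that multiple-threshold ternary designs exploit.
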
In design of these operators, all logic levels are expressed in terms of voltage values considering sufficient noise margin to avoid error in computation. Using these ternary operators, multiplexers, half adders, and ripple carry full adders were developed and simulated using HSPICE with circuit-compatible model of CNTFET presented in [194]. However, these ternary circuits require large ohmic resistive loads (at least $100 \text{ M}\Omega$ values) which result in area overhead and larger power dissipation. Besides, resistive load is difficult to be integrated into CNTFET technology. Lin et al. [195] presented a design of CNTFET-based ternary inverter in which resistive load was replaced by P-CNTFET active load. Based on the same design technique, they developed ternary NAND and NOR gates in [67]. These gates were simulated using HSPICE with Stanford CNTFET model of [117]. The reported results demonstrated that PDP of this ternary inverter is reduced by 300% in comparison with its counterpart presented in [64]. In addition, these gates find less chip area, larger noise margin and better integration in comparison with CNTFET based ternary designs of [64]. Authors also presented a design technique which uses both ternary logic gates and binary logic gates to take advantage of both logic design styles' merits. It was shown that this design technique leads to 90% reduction in PDPs of ternary half adder and 1-bit multiplier with respect to their counterparts of [73] designed using CNTFET based ternary gates only. Nan et al. [196] demonstrated CNTFET-based ternary structures without larger off chip resistor that use a combination of different back biasing voltages and diameter of CNTFET for low power consumption. These circuits were simulated using HSPICE with CNTFET model of [197]. It was found that the presented STI design gets at least 1000 times reduction in PDP with respect to STI designs of [64] and [67]. 
Similarly, it shows a reduction in leakage current by five and nine orders of magnitude in comparison with the STIs of [64] and [67], respectively. In addition, it uses four transistors instead of the six transistors used in the STI of [67]. Liang et al. [198] designed pseudo-complementary CNTFET-based ternary circuits by considering a trade-off between static power consumption and area cost. They replaced the resistors used in [64] with P-CNTFETs (with their gates connected to ground) and accomplished threshold voltage control by adjusting the chirality vector of the CNTFETs. HSPICE simulation results verified the correctness of the pseudo-complementary approach. The authors also described a realistic framework, based on transistor-level analysis, to estimate the error rate of ternary gates. They developed stochastic computational models for reliability evaluation of ternary gates. Moaiyeri et al. [199] demonstrated CNTFET-based ternary circuits that implement all types of ternary logic, including positive, negative and standard, in a single structure. They presented designs of ternary inverters, NAND and NOR gates, a half adder and a 1-bit multiplier. These circuits were simulated using HSPICE with the 32 nm CNTFET model of [117]. Simulation results demonstrated reductions in energy consumption of 53% and 40% on average for ternary logic and arithmetic circuits, respectively, in comparison with the corresponding designs presented in [67]. In addition, these circuits show higher driving capability, larger noise margin and greater robustness to process variations than the ternary logic and arithmetic circuits presented in [67]. Vudadha et al. [201] presented a ternary-multiplexer-based design methodology for the realization of CNTFET-based ternary circuits. Using a transmission-gate-based ternary 3:1 multiplexer, they developed ternary half adder and 1-bit comparator circuits. These designs were simulated in HSPICE with the CNTFET model of [117].
It was found that the ternary half adder achieves reductions in delay and PDP of 27% and 23%, respectively, in comparison with its counterpart presented in [67]. In [203], Vudadha et al. presented a design of an encoder optimized at transistor level for the implementation of ternary functions. Using this encoder, they implemented a ternary half adder. Results obtained from HSPICE demonstrated improvements in delay, power and PDP of 22%, 20% and 39%, respectively, in comparison with the ternary half adder presented in [67]. Vudadha et al. [205] developed a CNTFET-based ternary comparator that uses binary logic along with ternary logic for optimized implementation. The 1-bit comparator design was extended to N-bit operand length by using a grouping technique based on a prefix structure. This comparator design was simulated for different operand lengths in HSPICE with the CNTFET model of [117]. Simulation results indicated a power dissipation of 0.65 μW and a delay of 21 ps for the 1-bit design. In [206], the authors reported a different implementation of the ternary comparator that reduces design complexity by eliminating the need for a complex ternary decoder. HSPICE simulation results demonstrated that the 1-bit comparator design reduces power consumption and delay by 81% and 41.6%, respectively, with respect to its counterpart realized using the design technique of [67]. Nepal [207] demonstrated a CNTFET-based dynamic ternary structure based on the complete model approach of [208]. Using this technique, the author presented designs of a ternary inverter, buffer and MIN gate. These designs use two power supply voltages ($V_{dd}$ and $V_{dd}/2$) and single-diameter CNTFETs. HSPICE simulation results indicated that the ternary inverter reduces PDP by one order of magnitude with respect to the inverter design presented in [67]. Moaiyeri et al. [209] presented two designs of a CNTFET-based ternary full adder. These designs use a capacitor-based scaled analog summer along with a ternary buffer.
In the first design, the ternary buffer contains two cascaded ternary inverters, in which the first works as a threshold detector and the second works as a standard inverter to obtain the output from its complement. In the second design, the ternary buffer contains only a threshold detector to generate the Sum signal. The first design uses 5 capacitors and 24 transistors, and the second uses 5 capacitors and 18 transistors, while a ternary full adder designed by cascading two half adders of [67] uses more than 200 transistors with one extra power supply. However, in the first design, there exists a long path between the input and output, and the capacitors used also increase delay and power consumption. Although the second design has more driving capability and less delay, it has more static power consumption than the first. Apart from this, both designs suffer from low noise margin due to the parasitic effects caused by the capacitors. Ebrahimi et al. [210] presented a CNTFET-based ternary full adder that contains two cascaded half adder blocks to produce the output Sum. The so-called half adder does not generate the final output Carry; a separate sub-circuit is used to generate Carry. The presented design uses 106 transistors, including the ternary inverters that provide complementary input signals. HSPICE simulation results demonstrated reductions in the PDP of the presented circuit of 61% and 85% compared with the first and second designs of [209], respectively, at a 3 fF output load. However, this ternary full adder has low driving power due to its long critical path consisting of several pass transistors in series.

#### **Present Scenario**

In 2013-2015, some researchers presented ternary-logic-based digital circuits using CNTFETs. For instance, Moaiyeri et al. [200] developed a universal approach for implementing CNTFET-based ternary circuits with no static power dissipation.
In this method, the path from power supply $(V_{dd})$ to ground is eliminated in the static state of the circuit which leads to considerable improvement in power consumption and energy efficiency. HSPICE simulation results with 32 nm CNTFET model of [117] demonstrated that these circuits get 82% reduction in static power consumption in comparison with that of ternary designs presented in [199]. Vudadha et al. [202] designed ternary half adder using a combination of binary 2:1 multiplexer and ternary 3:1 multiplexer. Simulation results indicated 58% reduction in power and 64% reduction in PDP with respect to ternary multiplexer based design of [201]. Sridevi et al. [204] realized ternary combinational circuits including half adder, half subtractor, full adder, full subtractor and 2-bit comparator, based on negation of literals technique. These circuits were evaluated using HSPICE simulator with CNTFET model of [117]. The reported results dictated 5–145 times improvement in PDP with less gate count with respect to ternary–binary combinational gate designs presented in [67]. Mirzaee et al. [211] implemented a ternary full adder on the basis of cascading two half adder blocks to generate Sum and, a carry generator to produce Carry. In this design, resistors and capacitors, which were implemented using transistors, were utilized for voltage division. Authors also presented designs of ternary half adder and 4-bit ternary ripple adder. HSPICE simulation results indicated improvement in delay and power consumption by 66% and 45%, respectively, for ternary full adder in comparison with its counterpart presented in [210] at 0.7 V power supply voltage. In addition, it uses 46 fewer transistors compared to ternary full adder of [210]. However, it suffers from high power dissipation due to used resistors, and low noise margin due to used capacitors. Keshavarzian et al. [212] presented a CNTFET-based ternary full adder implemented using a Sum generator and a Carry generator. 
Each of these generators contains two pull-up networks, two pull-down networks and one resistive voltage divider configured using transistors. This ternary full adder (TFA) uses 106 transistors in its realization. A ternary buffer was used to give the TFA a high driving capability. HSPICE simulation results indicated reductions in PDP of 82%, 93% and 53% compared with the first and second TFA designs of [209] and the TFA of [210], respectively. Sridharan et al. [213] demonstrated CNTFET-based designs for single-trit and multi-trit adders. They presented a single-trit adder design with a less complex encoder and carry generation unit (in comparison with the designs of [73] and [67]). This design uses 142 transistors in its implementation. For the multi-trit design, the authors used single-trit adder blocks with a reduced number of encoder and decoder blocks (in comparison with direct cascading of single-trit adders). The presented adder designs were simulated in HSPICE with the CNTFET model of [117] and the device parameters of [214]. The reported results demonstrated a 79% reduction in PDP for the 3-trit adder and an 88% reduction in PDP for the 9-trit adder, in comparison with the gate-level designs of [67]. However, these designs contain a direct path between $V_{dd}$ and ground for some input combinations, which results in high static power consumption. Next, Panahi et al. [215] developed two designs of a ternary half subtractor based on CNTFETs. The first design is based on the complementary CNTFET design style with a resistive voltage divider configured using transistors. It uses 42 transistors, including the ternary inverters for the complements of the inputs. The second design uses transmission gates and reduces the number of transistors to 18. These designs were simulated in HSPICE with the CNTFET model of [117]. It was shown that the second design achieves a 400% improvement in PDP with respect to the first design.
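
The arithmetic that the half adders, full adders and multi-trit adders reviewed above must realise is independent of the circuit style. The sketch below is a behavioural Python model only, assuming unbalanced ternary digits 0, 1 and 2 and little-endian trit lists; the function names are hypothetical and it does not reproduce any of the cited transistor-level designs.

```python
def half_adder(a, b):
    """Return (sum, carry) for two trits in {0, 1, 2}."""
    total = a + b
    return total % 3, total // 3


def full_adder(a, b, cin):
    """Return (sum, carry) for two trits and a carry-in of 0 or 1."""
    total = a + b + cin
    return total % 3, total // 3


def ripple_carry_add(x, y):
    """Add two equal-length little-endian trit lists; the result has one extra trit."""
    if len(x) != len(y):
        raise ValueError("operands must have the same number of trits")
    carry = 0
    result = []
    for a, b in zip(x, y):
        s, carry = full_adder(a, b, carry)
        result.append(s)
    result.append(carry)
    return result


def to_trits(value, width):
    """Convert a non-negative integer to a little-endian trit list of given width."""
    trits = []
    for _ in range(width):
        trits.append(value % 3)
        value //= 3
    return trits


def from_trits(trits):
    """Convert a little-endian trit list back to an integer."""
    return sum(t * 3 ** i for i, t in enumerate(trits))


if __name__ == "__main__":
    x, y = to_trits(17, 3), to_trits(25, 3)
    print(x, "+", y, "=", ripple_carry_add(x, y))
```

Because the largest possible digit sum is 2 + 2 + 1 = 5, the carry in this model only ever takes the values 0 and 1, even though the sum digit is fully ternary.
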
# 2.4 Content Addressable Memory (CAM) Cell

A content addressable memory (CAM) is an application-specific memory that compares input search data against stored data and returns the address of the matching data within a single clock cycle, which makes it faster than other software and hardware search systems. There are two types of CAM: binary CAM (BCAM) and ternary CAM (TCAM). A BCAM stores logic 0 and logic 1. It performs exact-match searches and is therefore useful for tag comparison in cache memory. A TCAM provides the added flexibility of pattern matching with the use of a don't care (X) value. It stores and searches for an X value along with logic 0 and logic 1. This X value provides a wildcard entry, for which the memory cell shows a match irrespective of the input bit. This feature makes TCAMs popular for networking applications such as packet forwarding and packet classification in network routers.

## 2.4.1 CAM Cell based on MOSFET

The first BCAM cell, referred to as the PMOS-dominated diode cell, was described by Koo [216] in 1970. This cell contains two NMOS transistors and seven PMOS transistors, including one diode-connected PMOS for match-line pull-up. In this cell, the bit-line load is data-dependent, which leads to unpredictable read and write delays. In addition, the diode-based match-line pull-up leads to a slower search operation, and the pull-up current being provided by the bit lines rather than a hard supply creates buffering and electromigration concerns. Kadota et al. [217] presented a 10T active pull-down cell. This cell contains eight NMOS transistors and two PMOS transistors. It resolves the difficulties faced by the diode-based cell of [216]: the bit-line load is data-independent and the match-line pull-down occurs through an active network connected to a hard supply. In this cell, the bit-line state during match-line precharge has the opposite polarity from that of read/write precharge, which results in increased search cycle time.
This cell also suffers from charge-sharing problem. Improved 10T active pull-down cell was demonstrated by Uvieghara et al. [218]. This cell resolves charge sharing problem by swapping of transistors in its XOR based comparison logic portion but the problem of incompatibility with bit line states still remains. Bergh et al. [219] presented a 9T cell with dedicated search lines in triple-metal process. These extra lines offer several advantages. Bit line loads are data independent. The states of match line are independent from read/write activity. Bit lines and search lines have reduced load in comparison with them where a single pair is used. As a consequence, the presented cell leads to high speed and low power consumption with respect to that of above described cells. In addition, read and search operations can be carried out in parallel, which results in increased processing throughput by a factor of two. However, this cell requires wider pull down transistor. Miyatake et al. [220] presented a 9T CAM cell with PMOS as a bit-match device. This match device causes reduced voltage swing in pulling down the match line, in comparison with NMOS pull-down. As a consequence, the presented cell finds smaller match line charging and discharging currents, reduced noise level, and lower power consumption (all approximately by 40%) in comparison with conventional 9T cell presented in [221]. However, the PMOS-based match-line pull-down driver is weak in driving down the word match line for mismatch case in comparison with NMOS-based driver. Liu et al. [222] developed a low-voltage 12T CAM cell with a fast tag-compare capability, based on partially depleted SOI CMOS dynamic threshold techniques. In addition to ten transistors used in conventional cell, this cell uses two extra pass transistors for dynamically controlling of the bodies of transistors in compare network. 
According to results obtained with the MEDICI simulator, it is 45% faster than the conventional 10T cell of [218] at a supply voltage of 0.7 V.

Thirugnanam et al. [223] introduced a 9T toggling match-line CAM. Compared to a basic cell, they used one additional wire, called the active high/active low (AHAL) signal, which is shared between two adjacent rows. The presented cell alternates between active-high and active-low output in every access; the AHAL signal is turned ON in alternate cycles to indicate the active state of the match line. Simulations were performed in HSPICE using 0.25 µm CMOS technology. The reported results demonstrated that the presented cell shows a 40% reduction in power consumption over read misses, because switching activity in the match lines is halved, in comparison with the selective-precharge CAM cell of [224] and the modified CAM cell of [225].

Mundy et al. [226] demonstrated the first dynamic 5T BCAM cell based on PMOS devices in 1972. This cell contains two 3T dynamic random access memory (DRAM) cells connected back to back with one common read access device. In this cell, ample charge storage is not achieved because of the low gate capacitance, which is a series connection of the gate-oxide capacitance and the depletion-layer capacitance.

Yamagata et al. [227] presented a dynamic 5T cell with DRAM-type capacitors. This cell uses NMOS devices and performs all operations with signals complementary to those of the PMOS-based cell of [226]. It achieves true differential charge storage, but the associated DRAM-type capacitors are difficult to fabricate and rarely available to ASIC designers.

Wade and Sodini [228] demonstrated a dynamic 5T cell which does not require advanced DRAM fabrication techniques to resolve the charge-storage problem. This cell utilizes a cross-coupled connection of bit lines.
When writes are performed, the storage device is switched OFF; there is no series connection between the depletion capacitance and the gate capacitance, and sufficient charge storage is obtained. The presented cell performs faster read and search operations than its counterpart presented in [227]. The dynamic cells of [226-228] need a refresh operation and have data-dependent bit-line loads.

In [229], Jones presented an 8T dynamic latch cell based on 4T-DRAM. Although this cell needs more area than the 3T-DRAM based cells, its realization and operation mechanism are much simpler. This cell achieves fully differential data storage with positive-feedback latching and is compatible with standard SRAM read/write peripherals. However, it suffers from a match-line charge-sharing problem.

Pagiamtzis et al. [230] described a 16T TCAM cell with NOR-based comparison logic. This cell uses two 6T SRAM cells for data storage and four NMOS transistors for bit comparison. It provides a full-swing voltage at the gates of the comparison transistors and leads to a fast match operation. Roth et al. [231] presented a modified version of this cell. They used PMOS devices instead of NMOS devices for the comparison circuitry, which enables a more compact layout by reducing the number of n-diffusion to p-diffusion spacings in the cell. Additionally, the modified cell reduces wiring capacitance and thereby leads to low power consumption. However, the cell has a slow search operation due to the high equivalent resistance of the comparison transistors.

Arsovski et al. [232] reported a 12T TCAM cell with NAND-type compare logic. This cell uses asymmetric 4T static cells for data storage and four NMOS transistors for the compare network. The asymmetrical arrangement of the storage cell contains a hard node which stores full-rail voltage signals.
The listed cell area is 17.54 µm<sup>2</sup>, which is comparable with that of a conventional BCAM cell; therefore the presented cell provides a ternary implementation at no extra cost.

Choi et al. [233-234] presented a 16T TCAM cell with a NAND-type compare network. This cell utilizes two 6T SRAM storage cells and four NMOS compare transistors. It shows low-power characteristics but suffers from a high search delay in comparison with the TCAM presented in [231].

Sultan et al. [235] demonstrated a 12T TCAM cell. This cell contains two asymmetric 4T data storage cells and one comparison circuit based on low-capacitance search logic. This comparison logic reduces the match-line capacitance (by 50% or 75%, depending on the globally masked bits in the search data) in comparison with the NOR comparison logic based TCAM cells of [230] and [231]. The layout of the presented cell, drawn in Cadence Virtuoso with UMC 0.13 µm technology, occupies 12.93 µm<sup>2</sup>.

Mohan et al. [236] demonstrated the suitability of 5T-SRAM cells for conventional TCAM cells to reduce leakage as well as cell area. They also proposed an NMOS-coupled 14T TCAM and a PMOS-coupled 14T TCAM, which use the unused state of the conventional TCAM to further reduce cell leakage by eliminating one of the sub-threshold leakage paths. PTM simulation results demonstrated that the presented cells show a 40% reduction in leakage with a small degradation (< 8%) in static noise margin (SNM) over the conventional TCAM cell. The reported results also demonstrated that these cells take at most 3.6 ns for read and write operations.

Kumar et al. [237] reported a 16T TCAM cell with match-line testing circuitry in 180 nm technology. In this cell, a network containing a transmission gate and a capacitor is added to the existing ternary NOR cell to check the masking condition during the search operation.
It is shown that the match-line conditions were tested successfully and that the resulting outputs agree with the desired ones.

Fries et al. [238] presented a dynamic TCAM with a coupled match line. This cell uses four N-type transistors and performs all three basic operations: match, read and write. The match line in the cell is coupled with one of the cell transistors, resulting in long matching delays. The authors also demonstrated a match-line cut-off scheme to deal with the coupling effects.

Valerie et al. [239] reported a dynamic TCAM cell which uses six NMOS transistors and two DRAM-type capacitors. In this cell, the match-line pull-down device, which is controlled by the cell node, is isolated from match-line coupling in order to increase the speed of match-line discharge.

Noda et al. [240] proposed a planar dynamic TCAM fabricated in 130 nm CMOS technology. This cell contains eight NMOS transistors and four planar capacitors. These capacitors are arranged in two complementary pairs, which improves the stability of the TCAM. In addition, this planar dynamic concept leads to a small cell area of 4.79 µm<sup>2</sup>, which is approximately half that of an SRAM-based TCAM cell implemented in the same technology. Noda et al. [241] also proposed a cost-efficient dynamic TCAM implemented in 130 nm embedded DRAM technology. This cell consists of six NMOS transistors and two capacitors. It uses a dual-oxide process for reliable data retention and fast search operation. The presented cell has a small area of 3.59 µm<sup>2</sup>, which is 47% less than that of a conventional static TCAM and 25% less than that of the dynamic TCAM presented in [240], both fabricated in 130 nm CMOS technology. Additionally, it leads to low power consumption and a smaller noise level due to the low capacitance of its search lines and match lines. Along with these advantages, the larger storage-node capacitance (30 fF) increases its soft-error immunity and robustness.

Frias et al.
[242] introduced five decoupled dynamic TCAM cells. They categorized these cells based on the number of transistors used. The presented cells contain six, six-and-a-half, seven-and-a-half, and ten-and-a-half transistors (one transistor is shared between two adjacent cells). These cells achieve a shorter matching delay because the match lines are decoupled from the cell transistors. For evaluation, they were simulated using 0.25 µm CMOS technology and compared with the dynamic cells presented in [228]. Among all cells, the 7.5-T DDCAM achieves the shortest match delay (89.7 ps). Similarly, the 6-T DDCAM cell has the smallest match delay-current product, which is one third of that of the other cells.

## 2.4.2 CAM Cell based on CNTFET

The design of a fast and compact CAM structure is of the highest priority, and ballistic transport and low off-current make the CNTFET a suitable device for high-performance CAM designs with increased integration density. A CAM performs parallel data comparison along with data storage. It consists of SRAM cells for data storage and a compare network for data comparison. Different CNTFET-based SRAM cell designs have been proposed in the literature for binary and ternary storage.

Bachtold et al. [46] presented a resistive-load CNTFET-based SRAM cell. However, this cell needs large off-chip resistors (100 MΩ), which results in high power dissipation and increased cell area. To address this problem, Lin et al. [243-244] developed a resistor-less CNTFET-based SRAM cell which employs a P-CNTFET as the active load. This design uses two different diameters for the P-CNTFET and the N-CNTFET. The authors also introduced a metric called 'SPR' (static power-noise margin product to power-delay-product ratio) to capture different figures of merit, including stability, power consumption and write time.
The presented cell was simulated in HSPICE using the Stanford CNTFET model of [117] and the Berkeley Predictive 32 nm CMOS model of [245] for comparison with its CMOS counterpart. Simulation results indicated that the SPR of the cell is four times higher than that of its CMOS counterpart with the same configuration. The reported results also demonstrated greater robustness to process, voltage and temperature variations than the CMOS SRAM cell.

Kim et al. [246] presented a CNTFET-based 8T SRAM cell. This design disconnects the feedback loop of the two back-to-back inverters of the 6T SRAM structure during the write operation and separates the write and read bits with an 8T configuration. During the write operation, the presented technique reduces the number of discharges and minimizes dynamic power consumption. HSPICE simulations were performed with the Stanford CNTFET model of [117] and the Berkeley Predictive 32 nm CMOS model of [245]. Compared to the 6T CMOS SRAM cell, the presented 8T cell shows a 48% reduction in dynamic power consumption with a 56% wider SNM, at the expense of 2% and 3% increases in leakage power and write delay, respectively.

You and Nepal [247] reported two designs of a ternary SRAM cell; one utilizes 14 transistors and the other utilizes 8 transistors. The first design uses six-transistor ternary inverters, and the second design uses three-transistor ternary inverters with one extra power supply $(V_{dd}/2)$. In these designs, CNTFETs with different diameters are utilized. SPICE simulations indicated that the 8T ternary cell has a higher power consumption than the 14T ternary cell. It was also demonstrated that the delays of the two designs are comparable and that there exists a trade-off between power and area when choosing between them.

Lin et al. [248] presented a CNTFET-based ternary memory cell. This cell uses a transmission gate for the write operation, and a buffer along with another transmission gate for the read operation.
Two back-to-back six-transistor ternary inverters were used for ternary data storage. The presented cell eliminates the need for an extra power supply voltage by using CNTFET chirality vectors for threshold voltage control. This cell achieves a high SNM due to its separate read and write operations, and 90% lower standby power consumption than the conventional binary CMOS cell. The reported cell also shows a 41.6% area saving compared to its CMOS ternary counterpart at 32 nm.

Das et al. [249] evaluated the performance of a CNTFET-based 4-bit binary CAM (BCAM) array. They put four 8T NOR-type BCAM cells in parallel to form this array. Simulations were performed in HSPICE with the Stanford CNTFET model of [117]. The authors used a current-race match-line sense amplifier to indicate the match-line conditions. The reported results demonstrated that the presented CAM array shows a 2-4 times speed improvement with a 17.4% power saving with respect to its CMOS counterpart.

Nepal and You [250] developed an alternative design of a CNTFET-based TCAM cell (i.e. 3CAM) using true three-valued structures. First, they demonstrated a CNTFET-based 8T NOR-type BCAM cell. This cell uses the 6T SRAM cell of [244] for data storage and four N-CNTFETs forming a match pull-down network for the compare logic. Secondly, they described a CNTFET-based 16T TCAM cell with NOR-based compare logic. This cell contains two 6T SRAM cells of [244] and four compare transistors. It needs additional area due to the use of two storage cells. To address this problem, the authors reported a 3CAM cell which has the same usability as a TCAM. The presented 3CAM cell uses the ternary memory cell of [247] for data storage and does not require the second binary SRAM which is used in the TCAM structure. HSPICE simulation results demonstrated that the 3CAM cell has a search delay of 1.61 ns, which is comparable to that of the TCAM cell, and a power consumption of 2.29 μW, which is higher than the 1.26 μW of the TCAM cell.
The presented 3CAM cell shows a 25% area saving over the TCAM cell.

# 2.5 Research Gaps and Scope of the Presented Work

Based on the literature review in the previous sections, the following issues have been considered and addressed:

1. Research work on the development of ternary logic and arithmetic circuits using CNTFETs for compact VLSI sub-systems is minimal. Hardware optimization of such designs at the architecture and circuit levels is one of the issues addressed in this thesis.
2. Exploration of the use of CNTFETs for energy-efficient ternary sub-systems is limited. The design space of speed- and power-optimized ternary circuits based on CNTFETs has been taken up in the presented work.
3. Little work has been done on constructing CAM structures using CNTFETs for real-time applications. There is a need for the design of CAM circuits based on CNTFETs.

# Design of 2-bit Hardware Optimized Ternary ALU (HO-TALU) using CNTFETs

## 3.1 Introduction

An ALU is one of the main components of the central processing unit (CPU) of a digital computer and is found even in the simplest microprocessors, where it is responsible for performing arithmetic and logic operations. The increasing demand for highly optimized modern information processing systems clearly points to the need for an ALU implementation that is efficient in terms of power, speed and hardware.

This chapter presents the design of a 2-bit hardware optimized ternary ALU (HO-TALU) using CNTFETs. The 2-bit HO-TALU minimizes the required hardware at both the architecture level and the circuit level. At the architecture level, HO-TALU has a new adder-subtractor (AS) module which performs both addition and subtraction using only an adder module with the help of multiplexers. Thus, it eliminates the subtractor module from the conventional architecture. At the circuit level, HO-TALU minimizes ternary function expressions in comparison with the function minimization proposed in [73].
Additionally, it utilizes binary gates along with ternary gates in the realization of its functional modules: AS, multiplier, comparator and exclusive-OR. HSPICE simulation results show that the multiplier, comparator and exclusive-OR have an advantage in power-delay product (PDP), whereas the AS has a minor loss in PDP. As a consequence, HO-TALU achieves a significant reduction in device count, with a marginal increase in PDP for addition and subtraction operations only, in comparison with the CNTFET-based ternary designs available in the literature. The design of the 2-bit HO-TALU is modified to develop a 2-bit HO-TALU slice which can be easily cascaded to construct an N-bit HO-TALU.

In section 3.2, the designs of ternary gates are demonstrated. The architecture and functions of HO-TALU are presented in section 3.3. Section 3.4 provides ternary function minimization and realization, and sections 3.5 and 3.6 explain the design of the HO-TALU functional modules and their integration into a TALU slice, respectively. In section 3.7, transient simulation results for functional test and performance evaluation are presented with a hardware assessment, followed by the conclusion in section 3.8.

# 3.2 Design of Ternary Logic Gates

Ternary logic is a type of multi-valued logic which adds a third value to conventional binary logic. Table 3.1 shows the definition of the ternary logic states, denoted by 0, 1 and 2, and their equivalent voltage levels. An n-variable ternary function $f(a_1, a_2, \ldots, a_n)$ is a logic function which maps $\{0, 1, 2\}^n$ to $\{0, 1, 2\}$. The basic ternary operations (AND, OR and NOT) are defined as follows.

$$\begin{aligned} a_1 \cdot a_2 \cdot a_3 \cdots a_n &= \min(a_1, a_2, a_3, \ldots, a_n) \\ a_1 + a_2 + a_3 + \cdots + a_n &= \max(a_1, a_2, a_3, \ldots, a_n) \\ \bar{a}_n &= 2 - a_n \end{aligned}$$ (3.1)

where $a_1, a_2, a_3, \ldots, a_n \in \{0, 1, 2\}$, $-$ denotes arithmetic subtraction, and the operators $\cdot$, $+$ and $\bar{\ }$ denote the ternary AND, OR and NOT operations, respectively [143].
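The operations of eq. (3.1) can be checked numerically with a few lines of Python. This is only a minimal sketch and not part of the circuit design; the function names `t_and`, `t_or` and `t_not` are hypothetical and are introduced here for illustration.

```python
from itertools import product

LEVELS = (0, 1, 2)  # the three ternary logic states


def t_and(*inputs):
    """Ternary AND of eq. (3.1): the minimum of the inputs."""
    return min(inputs)


def t_or(*inputs):
    """Ternary OR of eq. (3.1): the maximum of the inputs."""
    return max(inputs)


def t_not(a):
    """Ternary NOT of eq. (3.1): the arithmetic difference 2 - a."""
    return 2 - a


if __name__ == "__main__":
    # Tabulate the three operations for every pair of ternary inputs.
    print("a b | a.b  a+b  not(a)")
    for a, b in product(LEVELS, repeat=2):
        print(f"{a} {b} |  {t_and(a, b)}    {t_or(a, b)}     {t_not(a)}")
```

With these definitions the complement of an AND equals the OR of the complements for every input pair, which is the ternary counterpart of De Morgan's law.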
The most fundamental building blocks of a ternary system are the NOT (or inverter), NAND and NOR gates. These gates operate according to their respective conventions provided by eq. (3.1).

**Table 3.1:** Definition of logic states in ternary logic

| Logic state | Voltage level (V) |
|-------------|-------------------|
| 0 | 0 |
| 1 | 0.45 (V<sub>dd</sub>/2) |
| 2 | 0.9 (V<sub>dd</sub>) |

There are three types of inverters, namely the positive ternary inverter (PTI), the negative ternary inverter (NTI) and the standard ternary inverter (STI), which are defined as follows.

$$NOT_{k}(a) = \begin{cases} 2 - a & \text{if } a \neq 1 \\ k & \text{if } a = 1 \end{cases}$$ (3.2)

In eq. (3.2), 'a' is a ternary input signal; the variable k is 0 for the NTI, 1 for the STI, and 2 for the PTI. The truth table of the ternary inverters is shown in Table 3.2. Similar to the ternary inverters, there can be three types of ternary NAND and NOR gates.

**Table 3.2:** Truth table of ternary inverters

| a | NTI | PTI | STI |
|---|-----|-----|-----|
| 0 | 2 | 2 | 2 |
| 1 | 0 | 2 | 1 |
| 2 | 0 | 0 | 0 |

Two-input ternary NAND and NOR gates with inputs 'a' and 'b' are defined as follows.

$$NAND_{k}(a,b) = \begin{cases} 2 - \min(a,b) & \text{if } \min(a,b) \neq 1 \\ k & \text{if } \min(a,b) = 1 \end{cases}$$ (3.3)

$$NOR_{k}(a,b) = \begin{cases} 2 - \max(a,b) & \text{if } \max(a,b) \neq 1 \\ k & \text{if } \max(a,b) = 1 \end{cases}$$ (3.4)

According to eq. (3.3) and (3.4), depending upon the logic value of the variable k, i.e. 0, 1 or 2, a ternary NAND/NOR gate operates as a negative ternary NAND/NOR (NTNAND/NTNOR) gate, a standard ternary NAND/NOR (STNAND/STNOR) gate or a positive ternary NAND/NOR (PTNAND/PTNOR) gate, respectively. Table 3.3 gives the truth table of the ternary NAND and NOR gates. Further, the designs of ternary logic gates presented in [199] are used in this work.
These gates use the complementary CNTFET logic style and the unique feature of the CNTFET that allows threshold voltage adjustment without using multiple power supply voltages.

**Table 3.3:** Truth table of ternary NAND and NOR gates

| a | b | NTNAND | PTNAND | STNAND | NTNOR | PTNOR | STNOR |
|---|---|--------|--------|--------|-------|-------|-------|
| 0 | 0 | 2 | 2 | 2 | 2 | 2 | 2 |
| 0 | 1 | 2 | 2 | 2 | 0 | 2 | 1 |
| 0 | 2 | 2 | 2 | 2 | 0 | 0 | 0 |
| 1 | 0 | 2 | 2 | 2 | 0 | 2 | 1 |
| 1 | 1 | 0 | 2 | 1 | 0 | 2 | 1 |
| 1 | 2 | 0 | 2 | 1 | 0 | 0 | 0 |
| 2 | 0 | 2 | 2 | 2 | 0 | 0 | 0 |
| 2 | 1 | 0 | 2 | 1 | 0 | 0 | 0 |
| 2 | 2 | 0 | 0 | 0 | 0 | 0 | 0 |

Figure 3.1 shows the schematic diagram and symbol of the NTI, where D represents the diameter of the CNTFET. The NTI design uses a low-threshold N-CNTFET (T1) and a high-threshold P-CNTFET (T2). When the input voltage ($V_a$) is 0 V, T1 is turned OFF and T2 is turned ON, and the output is pulled up to 0.9 V. As $V_a$ changes to 0.45 V or 0.9 V, T1 is turned ON and T2 is turned OFF, and the output is discharged to 0 V. The operation of the NTI confirms its entries in Table 3.2.

**Figure 3.1:** Negative ternary inverter (NTI) [199]

Figure 3.2 shows the schematic diagram and symbol of the PTI. This PTI design uses a high-threshold N-CNTFET (T3) and a low-threshold P-CNTFET (T4). When the input voltage ($V_a$) is 0 V or 0.45 V, T3 is turned OFF and T4 is turned ON, and the output is pulled up to 0.9 V. As $V_a$ changes to 0.9 V, T3 is turned ON and T4 is turned OFF, and the output is discharged to 0 V. The operation of the PTI confirms its entries in Table 3.2.

**Figure 3.2:** Positive ternary inverter (PTI) [199]

Figure 3.3 shows the schematic diagram and symbol of the STI. The STI is realized by combining the NTI and PTI circuits through a network which contains one P-CNTFET (T5) and one N-CNTFET (T6) having the same geometries.
Consequently, the output voltage $(V_{out})$ is obtained as the average of the output voltages of the NTI $(V_{out\_N})$ and the PTI $(V_{out\_P})$, which is represented as follows.

$$V_{out} = \frac{V_{out\_N} + V_{out\_P}}{2}$$ (3.5)

When the input voltage $(V_a)$ is 0 V, both $V_{out\_N}$ and $V_{out\_P}$ are 0.9 V and consequently $V_{out}$ is 0.9 V. Similarly, when $V_a = 0.9$ V, both $V_{out\_N}$ and $V_{out\_P}$ are 0 V and consequently $V_{out}$ is 0 V. Finally, when $V_a = 0.45$ V, $V_{out\_P}$ and $V_{out\_N}$ are 0.9 V and 0 V, respectively, and consequently $V_{out}$ is 0.45 V. Further, based on eq. (3.3), (3.4) and the described method of designing ternary inverters, CNTFET-based STNAND/STNOR gates are implemented. Figures 3.4 and 3.5 show the schematic diagrams and symbols of these gates, respectively.

**Figure 3.3:** Standard ternary inverter (STI) [199]

**Figure 3.4:** Standard ternary NAND (STNAND) gate [199]

**Figure 3.5:** Standard ternary NOR (STNOR) gate [199]

# 3.3 Architecture & Functions of 2-bit HO-TALU

Figure 3.6 shows the pin diagram and architecture of the 2-bit HO-TALU. There are two ternary data inputs, A $(A_1 A_0)$ and B $(B_1 B_0)$, and multiple outputs: Sum/Difference and Carry/Borrow; Product and $C_{out}$; GR, EQ and LE; and the logic outputs A+B, $\overline{A+B}$, A.B and $\overline{A.B}$. Two select inputs $(S_1$ and $S_0)$ select the desired logic or arithmetic operation, as described in Table 3.4. HO-TALU is capable of providing four arithmetic operations and five logic operations. The arithmetic operations are addition, subtraction, multiplication and comparison. The logic operations are AND, NAND, OR, NOR and XOR.
The architecture of HO-TALU consists of the following main components: 1-to-3-line decoders, a function selection logic block with active high outputs (FSB-AHO), transmission gate blocks with active high enable (TGB-AHE) and different functional blocks (referred to as modules), which include AS, multiplier, comparator, exclusive-OR, T-OR, T-NOR, T-AND and T-NAND. The terms 'Bi' and 'T' are used to indicate the binary and ternary nature of logic gates, respectively.

**Table 3.4:** Function table of 2-bit HO-TALU

| $S_1$ | $S_0$ | Function |
|-------|-------|----------------|
| 0 | 0 | Addition |
| 0 | 1 | Subtraction |
| 0 | 2 | Multiplication |
| 1 | 0 | Comparison |
| 1 | 1 | OR |
| 1 | 2 | NOR |
| 2 | 0 | AND |
| 2 | 1 | NAND |
| 2 | 2 | XOR |

**Figure 3.6 (a):** Pin out diagram of 2-bit HO-TALU

**Figure 3.6 (b):** Architecture of 2-bit HO-TALU

## 1-to-3-line Ternary Decoder

Figure 3.7 shows the logic diagram of the 1-to-3-line ternary decoder and its truth table [199]. It is a one-input, three-output combinational circuit that generates three unary functions, $a^0$, $a^1$ and $a^2$, for an input 'a'. This design uses an NTI to produce $a^0$, a PTI followed by a binary inverter to generate $a^2$, and a binary NOR gate with inputs $a^0$ and $a^2$ to produce $a^1$. The logic response of the decoder is given as:

$$a^{c} = \begin{cases} 2 & \text{if } a = c \\ 0 & \text{if } a \neq c \end{cases}$$ (3.6)

where the variable c takes the values 0, 1 and 2. In Figure 3.6 (b), decoder DEC1 generates the unary functions $A_0^0$, $A_0^1$ and $A_0^2$ for $A_0$, and DEC2 generates $A_1^0$, $A_1^1$ and $A_1^2$ for $A_1$. Similarly, decoder DEC3 generates $B_0^0$, $B_0^1$ and $B_0^2$ for $B_0$, and DEC4 generates $B_1^0$, $B_1^1$ and $B_1^2$ for $B_1$. These unary functions are fed to the functional modules to produce the desired outputs.
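The gate-level composition of the decoder can be checked against eq. (3.6) with a short behavioural model. The Python sketch below is illustrative only: it models logic values rather than CNTFET circuits, and the helper names (`ternary_not`, `binary_not`, `binary_nor`, `decode`) are assumptions made for this example.

```python
def ternary_not(a, k):
    """Eq. (3.2): k = 0 gives the NTI, k = 1 the STI and k = 2 the PTI."""
    return 2 - a if a != 1 else k


def nti(a):
    return ternary_not(a, 0)


def pti(a):
    return ternary_not(a, 2)


def binary_not(x):
    """Binary inverter acting on the two-level signals 0 and 2."""
    return 2 if x == 0 else 0


def binary_nor(x, y):
    """Binary NOR acting on the two-level signals 0 and 2."""
    return 2 if x == 0 and y == 0 else 0


def decode(a):
    """Return the unary functions (a^0, a^1, a^2) of a ternary input a."""
    a0 = nti(a)                # high only when a = 0
    a2 = binary_not(pti(a))    # high only when a = 2
    a1 = binary_nor(a0, a2)    # high only when neither a^0 nor a^2 is high
    return a0, a1, a2


if __name__ == "__main__":
    for a in (0, 1, 2):
        print(a, decode(a))
```

For each input exactly one of the three outputs is at logic 2, which is the one-hot behaviour that eq. (3.6) describes.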
**Figure 3.7:** 1-to-3-line ternary decoder (a) logic level diagram (b) truth table

## **Function Select Logic Block with Active High Outputs (FSB-AHO)**

Figure 3.8 shows the logic diagram of the function select logic block with active high outputs (FSB-AHO). It has two inputs, $S_0$ and $S_1$, and nine outputs: ADD, SUB, MULT, COMP, OR, NOR, AND, NAND and XOR. It contains an array of nine binary AND gates with two 1-to-3-line decoders (DEC1 and DEC2). DEC1 generates the unary functions $S_0^0$, $S_0^1$ and $S_0^2$ for $S_0$. Similarly, DEC2 generates the unary functions $S_1^0$, $S_1^1$ and $S_1^2$ for $S_1$. These functions are applied to the AND gates to produce the desired output. FSB-AHO selects one particular TALU function depending on the combination of $S_0$ and $S_1$, as described in Table 3.4.

Consider the case when $S_1S_0=02$. $S_1^0$, $S_1^1$ and $S_1^2$ are 2, 0 and 0, respectively. Similarly, $S_0^0$, $S_0^1$ and $S_0^2$ are 0, 0 and 2, respectively. The third AND gate of the array, which has inputs $S_1^0$ and $S_0^2$, makes its output MULT high, while every other AND gate has at least one input equal to 0, which makes its output low. The high MULT further enables TGB-AHE2 for the multiplication operation, as shown in Figure 3.6 (b). Thus, depending upon the logic states of $S_0$ and $S_1$, only one particular output of the FSB-AHO block is active (high) for the desired TALU function.

**Figure 3.8:** Logic level diagram of function selection logic block with active high outputs (FSB-AHO)

## **Transmission Gate Block with Active High Enable (TGB-AHE)**

Figure 3.9 shows the logic level diagram of the transmission gate block with active high enable (TGB-AHE). It contains an array of transmission gates (TGs) which connect the decoder output lines to the data inputs of the functional modules. This array is activated when the enable input (EN) is high. In Figure 3.6(b), the number of TGs used in the array is mentioned with each individual TGB-AHE.
Further, each TG is implemented using the parallel connection of a P-CNTFET and an N-CNTFET. In a TG array, the gates of all N-CNTFETs are connected to EN and the gates of all P-CNTFETs are connected to $\overline{\text{EN}}$, which is created by using a binary inverter. When EN is 2 (high), the N-CNTFET gate is at 0.9 V and the P-CNTFET gate is at 0 V; as a consequence, both transistors conduct and there is a closed path between input (I/P) and output (O/P). Similarly, when EN is 0 (low), the N-CNTFET gate is at 0 V and the P-CNTFET gate is at 0.9 V; consequently both transistors are OFF and there is an open circuit between I/P and O/P.

Each TGB-AHE gets its value of EN from one or more outputs of FSB-AHO, either directly or through some logic, as shown in Figure 3.6(b). The FSB-AHO outputs MULT, COMP, AND, NAND, OR, NOR and XOR are connected directly to EN of TGB-AHE2, TGB-AHE3, TGB-AHE4, TGB-AHE5, TGB-AHE6, TGB-AHE7 and TGB-AHE8, respectively. Since TGB-AHE1 is associated with the AS functional module, which performs addition and subtraction, its enable input (EN1) must be high whenever one of these operations is desired. For this, a binary OR gate is added to FSB-AHO. The FSB-AHO outputs ADD and SUB are applied to this gate, which generates EN1.

To demonstrate the working of TGB-AHE, an addition operation is assumed. FSB-AHO drives its output ADD high. Consequently, the binary OR gate sets EN1 to 0.9 V, which enables TGB-AHE1. This enabled block supplies the decoder-generated unary functions of the inputs to the AS module. This module performs the addition operation and produces the Sum/Difference and Carry/Borrow outputs. The other modules remain isolated from the data inputs due to the low value of the enable signal (EN) of their TGB-AHEs.
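The selection path described above (two decoders, nine binary AND gates and the extra OR gate for EN1) can be summarized in a behavioural Python sketch. It is a simplified logic-level model under stated assumptions: `decode`, `fsb_aho` and `as_enable` are hypothetical names, binary AND and OR on the levels 0 and 2 are modelled with `min` and `max`, and the transmission gates themselves are not modelled.

```python
# Output names in the order of Table 3.4 (row index = 3 * S1 + S0).
FUNCTIONS = ("ADD", "SUB", "MULT", "COMP", "OR", "NOR", "AND", "NAND", "XOR")


def decode(s):
    """1-to-3-line decoder response of eq. (3.6): (s^0, s^1, s^2)."""
    return tuple(2 if s == c else 0 for c in (0, 1, 2))


def fsb_aho(s1, s0):
    """Return the nine active-high outputs for the select inputs S1 and S0."""
    d1, d0 = decode(s1), decode(s0)
    outputs = {}
    for index, name in enumerate(FUNCTIONS):
        i, j = divmod(index, 3)
        # Binary AND of two signals restricted to the levels 0 and 2.
        outputs[name] = min(d1[i], d0[j])
    return outputs


def as_enable(outputs):
    """EN1 for the AS module: binary OR of the ADD and SUB outputs."""
    return max(outputs["ADD"], outputs["SUB"])


if __name__ == "__main__":
    selected = fsb_aho(0, 2)  # the S1S0 = 02 case discussed in the text
    print([name for name, level in selected.items() if level == 2])
    print("EN1 for subtraction:", as_enable(fsb_aho(0, 1)))
```

For every one of the nine select combinations exactly one output is high, and EN1 is high only for the addition and subtraction rows of Table 3.4.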
**Figure 3.9:** Logic level diagram of transmission gate block with active high enable (TGB-AHE)

# 3.4 Synthesis, Minimization and Realization of 2-bit HO-TALU Function

A ternary n-variable function F $(a_1, a_2, \ldots, a_n)$ is expressed in its canonical sum form as follows, based on theorem 1 of [143].

$$F(a_1, a_2, \dots, a_n) = F_1(a_1, a_2, \dots, a_n) + 1 \cdot F_2(a_1, a_2, \dots, a_n)$$ (3.7)

where $F_1$ is a function which contains the terms of 2's and $F_2$ is a function which contains the terms of 1's. Accordingly, a ternary function can be synthesized from its truth table. For the minimization of ternary functions, there are three basic methods:

1. Algebraic method
2. Tabular method
3. Ternary K-map method

In this work, the ternary K-map method [143] is used for function minimization. Further, the realization of minimized ternary functions is shown in Figure 3.10. This implementation method employs 1-to-3-line ternary decoders to convert ternary input signals (0, 1 and 2) into binary signals (0 and 2). These signals are processed by a computation unit containing binary logic gates, and the binary outputs are then converted back into ternary outputs using an encoder based on ternary logic gates.

**Figure 3.10:** Ternary function implementation for 2-bit HO-TALU

The design procedure is as follows:

1. Construct the truth table and draw the K-map for the output variables.
2. Find the arrays (i.e. cell groupings) of 3 x 1, 3 x 2 and 3 x 3 cells with 2's terms and 1's terms. Cells with 2's terms can be considered as don't cares for the formation of arrays with 1's terms. Produce and minimize the ternary function.
3. Realize the minimized ternary function.

To elaborate the synthesis, minimization and realization of ternary functions, the design of a ternary half adder (THA) is explained as follows. The truth table of the THA is given in Table 3.5. The K-maps of the outputs $S_0$ (Sum) and $C_0$ (Carry) are shown in Figure 3.11.
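As an illustration of the canonical sum form of eq. (3.7), the following Python sketch collects the 2's terms and 1's terms of a truth table and evaluates $F_1 + 1 \cdot F_2$ with the min/max operations of eq. (3.1). It is a minimal sketch under stated assumptions: the helper names `literal`, `synthesize` and `evaluate` are hypothetical, the half-adder truth table is generated arithmetically as the base-3 sum and carry of the two inputs, and no K-map grouping is attempted.

```python
from itertools import product

LEVELS = (0, 1, 2)


def literal(a, c):
    """Unary function a^c of eq. (3.6): 2 if a equals c, otherwise 0."""
    return 2 if a == c else 0


def synthesize(truth_table):
    """Split a truth table into the 2's terms (F1) and the 1's terms (F2)."""
    f1 = [inputs for inputs, out in truth_table.items() if out == 2]
    f2 = [inputs for inputs, out in truth_table.items() if out == 1]
    return f1, f2


def evaluate(f1, f2, inputs):
    """Evaluate F = F1 + 1.F2 using ternary AND (min) and OR (max)."""
    def sum_of_products(terms):
        return max(
            (min(literal(x, c) for x, c in zip(inputs, term)) for term in terms),
            default=0,  # an empty term list behaves as the constant 0
        )
    return max(sum_of_products(f1), min(1, sum_of_products(f2)))


# Assumed half-adder behaviour: base-3 sum and carry of a + b.
SUM_TABLE = {(a, b): (a + b) % 3 for a, b in product(LEVELS, repeat=2)}
CARRY_TABLE = {(a, b): (a + b) // 3 for a, b in product(LEVELS, repeat=2)}

if __name__ == "__main__":
    s1, s2 = synthesize(SUM_TABLE)
    c1, c2 = synthesize(CARRY_TABLE)
    for a, b in product(LEVELS, repeat=2):
        print(a, b, evaluate(s1, s2, (a, b)), evaluate(c1, c2, (a, b)))
```

Each stored term is one cell of the K-map, so the sketch reproduces the unminimized function; grouping cells as in step 2 of the procedure would only shorten the term lists.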
**Table 3.5:** Truth table of ternary half adder (THA)

| A | B | S<sub>0</sub> (Sum) | C<sub>0</sub> (Carry) |
|---|---|----------------------|------------------------|
| 0 | 0 | 0 | 0 |
| 0 | 1 | 1 | 0 |
| 0 | 2 | 2 | 0 |
| 1 | 0 | 1 | 0 |
| 1 | 1 | 2 | 0 |
| 1 | 2 | 0 | 1 |
| 2 | 0 | 2 | 0 |
| 2 | 1 | 0 | 1 |
| 2 | 2 | 1 | 1 |

**Figure 3.11:** K-map of ternary half adder (THA)

From the K-maps, the functions $F_1$ and $F_2$ are derived for $S_0$ and $C_0$. The simplified expressions of these variables are:

$$S_0(\text{Sum}) = F_1 + 1 \cdot F_2 = A_0^2 B_0^0 + A_0^1 B_0^1 + A_0^0 B_0^2 + 1 \cdot (A_0^1 B_0^0 + A_0^0 B_0^1 + A_0^2 B_0^2)$$ (3.8)

$$C_0(\text{Carry}) = F_3 + 1 \cdot F_4 = 0 + 1 \cdot (A_0^2 B_0^1 + A_0^2 B_0^2 + A_0^1 B_0^2)$$ (3.9)

In the case of the THA, minimization by grouping of cells is not possible. Based on eq. (3.8) and (3.9), the THA is implemented as shown in Figure 3.12. This implementation is taken from [199] and used in this work.

**Figure 3.12:** Logic level diagram of ternary half adder (THA)

# 3.5 Design & Implementation of 2-bit HO-TALU functional module

The design and implementation of the 2-bit HO-TALU functional modules, including AS, comparator, multiplier, exclusive-OR, T-OR, T-NOR, T-AND and T-NAND, are described in the following sub-sections.

## 3.5.1 Adder-Subtractor (AS) Module

Figure 3.13 shows the block diagram of the proposed AS module. It performs addition and subtraction operations on A $(A_1A_0)$ and B $(B_1B_0)$, which are represented by A+B and A-B, respectively. The AS module consists of one half adder-subtractor (HAS) block and one full adder-subtractor (FAS) block. $A_0$ and $B_0$ are added or subtracted using the HAS, which generates the outputs $S_0/D_0$ and $C_0/B_0$. This $C_0/B_0$ is passed through a 1-to-3-line decoder to generate its unary functions. Then, these functions are supplied to the FAS, which adds or subtracts $A_1$, $B_1$ and $C_0/B_0$, and produces the outputs $S_1/D_1$ and $C_1/B_1$.
$S_0/D_0$ and $S_1/D_1$ represent the least significant bit (LSB) and most significant bit (MSB) of the Sum/Difference TALU output, and $C_1/B_1$ represents the Carry/Borrow TALU output. M is the mode input which selects between addition and subtraction. It is supplied by the SUB output line of FSB-AHO. Therefore, when M=0, the AS performs addition, and when M=2, it performs subtraction. The designs of the HAS and FAS blocks are described in the following sub-sections.

**Figure 3.13:** Logic level diagram of 2-bit ternary adder-subtractor (AS)

#### Half Adder-Subtractor (HAS) Block

The schematic diagram of the HAS block is shown in Figure 3.14. It performs both addition and subtraction using only one THA with three multiplexers (MUXs), based on the similarity in the realization of the THA and the ternary half subtractor (THS). The implementation of the THA is demonstrated in the previous section (section 3.4). For the THS, using its truth table given in Table 3.6, the K-map is drawn and shown in Figure 3.15. From this K-map, the simplified expressions of the outputs $D_0$ (Difference) and $B_0$ (Borrow) are derived as:

$$D_0(\text{Difference}) = A_0^2 B_0^0 + A_0^1 B_0^2 + A_0^0 B_0^1 + 1 \cdot (A_0^1 B_0^0 + A_0^0 B_0^2 + A_0^2 B_0^1)$$ (3.10)

$$B_0(\text{Borrow}) = 1 \cdot (A_0^0 B_0^1 + A_0^0 B_0^2 + A_0^1 B_0^2)$$ (3.11)

The minimized expressions of the output variables $S_0$ (Sum) and $C_0$ (Carry) of the THA are given in eq. (3.8) and (3.9) in section 3.4. Comparing eq. (3.8) and (3.10), the expression for $S_0$ becomes that of $D_0$ if the variables $B_0^1$ and $B_0^2$ are interchanged. Similarly, comparing eq. (3.9) and (3.11), the expression for $C_0$ becomes that of $B_0$ if the variable $A_0^2$ is replaced with $A_0^0$. Hence, a THA can also produce $D_0$ and $B_0$, given a proper selection of input variables, which is accomplished using MUXs as shown in Figure 3.14.
When M=0, MUX1, MUX2 and MUX3 select $B_0^1$, $B_0^2$ and $A_0^2$, respectively, and HAS computes the functions given by eq. (3.8) and (3.9) for the addition operation. Similarly, when M=2, MUX1, MUX2 and MUX3 select $B_0^2$, $B_0^1$ and $A_0^0$, respectively, and HAS computes the functions given by eq. (3.10) and (3.11) for the subtraction operation.

**Figure 3.14:** Logic level diagram of ternary half adder-subtractor (HAS)

**Table 3.6:** Truth table of ternary half subtractor (THS)

| A<sub>0</sub> | B<sub>0</sub> | D<sub>0</sub> (Difference) | B<sub>0</sub> (Borrow) |
|---|---|---|---|
| 0 | 0 | 0 | 0 |
| 0 | 1 | 2 | 1 |
| 0 | 2 | 1 | 1 |
| 1 | 0 | 1 | 0 |
| 1 | 1 | 0 | 0 |
| 1 | 2 | 2 | 1 |
| 2 | 0 | 2 | 0 |
| 2 | 1 | 1 | 0 |
| 2 | 2 | 0 | 0 |

**Figure 3.15:** K-map of ternary half subtractor (THS)

## Full Adder-Subtractor (FAS) Block

The schematic diagram of the FAS block is shown in Figure 3.16. It performs both addition and subtraction using only one ternary full adder (TFA) with six MUXs, based on the similarity in realization of TFA and the ternary full subtractor (TFS). For TFA, the K-map drawn from its truth table (Table 3.7) is shown in Figure 3.17. From this K-map, simplified expressions of outputs $S_1$ (Sum) and $C_1$ (Carry) are derived as follows.
$$\begin{split} S_1(\text{Sum}) &= C_0^0(A_1^0B_1^2 + A_1^1B_1^1 + A_1^2B_1^0) + C_0^1(A_1^1B_1^0 + A_1^0B_1^1 + A_1^2B_1^2) \\ &+ C_0^2(A_1^0B_1^0 + A_1^1B_1^2 + A_1^2B_1^1) + 1.\{C_0^0(A_1^1B_1^0 + A_1^0B_1^1 + A_1^2B_1^2) \\ &+ C_0^1(A_1^0B_1^0 + A_1^1B_1^2 + A_1^2B_1^1) + C_0^2(A_1^0B_1^2 + A_1^1B_1^1 + A_1^2B_1^0)\} \end{split}$$ (3.12)

$$C_1(\text{Carry}) = A_1^2B_1^2C_0^2 + 1.\{C_0^0(A_1^1B_1^2 + A_1^2B_1^1 + A_1^2B_1^2) + C_0^1(A_1^2 + B_1^2 + A_1^1B_1^1) + C_0^2\,\overline{A_1^0B_1^0}\}$$ (3.13)

Similarly, for TFS, the K-map drawn from its truth table (Table 3.7) is shown in Figure 3.18. From this K-map, simplified expressions of outputs $D_1$ (Difference) and $B_1$ (Borrow) are derived as:

$$\begin{split} D_1(\text{Difference}) &= C_0^0(A_1^0B_1^1 + A_1^1B_1^2 + A_1^2B_1^0) + C_0^2(A_1^1B_1^0 + A_1^0B_1^2 + A_1^2B_1^1) \\ &+ C_0^1(A_1^0B_1^0 + A_1^1B_1^1 + A_1^2B_1^2) + 1.\{C_0^0(A_1^1B_1^0 + A_1^0B_1^2 + A_1^2B_1^1) \\ &+ C_0^2(A_1^0B_1^0 + A_1^1B_1^1 + A_1^2B_1^2) + C_0^1(A_1^0B_1^1 + A_1^1B_1^2 + A_1^2B_1^0)\} \end{split}$$ (3.14)

$$B_1(\text{Borrow}) = A_1^0B_1^2C_0^2 + 1.\{C_0^0(A_1^1B_1^2 + A_1^0B_1^1 + A_1^0B_1^2) + C_0^1(A_1^0 + B_1^2 + A_1^1B_1^1) + C_0^2\,\overline{A_1^2B_1^0}\}$$ (3.15)

**Figure 3.16 (a):** Logic diagram of S<sub>1</sub>/D<sub>1</sub> generator of ternary full adder-subtractor (FAS)

**Figure 3.16 (b):** Logic diagram of C<sub>1</sub>/B<sub>1</sub> generator of ternary full adder-subtractor (FAS)

**Table 3.7:** Truth table of ternary full adder (TFA) and full subtractor (TFS)

| A<sub>1</sub> | B<sub>1</sub> | C<sub>0</sub> | S<sub>1</sub> (Sum) | C<sub>1</sub> (Carry) | D<sub>1</sub> (Difference) | B<sub>1</sub> (Borrow) |
|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | 1 | 0 | 2 | 1 |
| 0 | 0 | 2 | 2 | 0 | 1 | 1 |
| 0 | 1 | 0 | 1 | 0 | 2 | 1 |
| 0 | 1 | 1 | 2 | 0 | 1 | 1 |
| 0 | 1 | 2 | 0 | 1 | 0 | 1 |
| 0 | 2 | 0 | 2 | 0 | 1 | 1 |
| 0 | 2 | 1 | 0 | 1 | 0 | 1 |
| 0 | 2 | 2 | 1 | 1 | 2 | 2 |
| 1 | 0 | 0 | 1 | 0 | 1 | 0 |
| 1 | 0 | 1 | 2 | 0 | 0 | 0 |
| 1 | 0 | 2 | 0 | 1 | 2 | 1 |
| 1 | 1 | 0 | 2 | 0 | 0 | 0 |
| 1 | 1 | 1 | 0 | 1 | 2 | 1 |
| 1 | 1 | 2 | 1 | 1 | 1 | 1 |
| 1 | 2 | 0 | 0 | 1 | 2 | 1 |
| 1 | 2 | 1 | 1 | 1 | 1 | 1 |
| 1 | 2 | 2 | 2 | 1 | 0 | 1 |
| 2 | 0 | 0 | 2 | 0 | 2 | 0 |
| 2 | 0 | 1 | 0 | 1 | 1 | 0 |
| 2 | 0 | 2 | 1 | 1 | 0 | 0 |
| 2 | 1 | 0 | 0 | 1 | 1 | 0 |
| 2 | 1 | 1 | 1 | 1 | 0 | 0 |
| 2 | 1 | 2 | 2 | 1 | 2 | 1 |
| 2 | 2 | 0 | 1 | 1 | 0 | 0 |
| 2 | 2 | 1 | 2 | 1 | 2 | 1 |
| 2 | 2 | 2 | 0 | 2 | 1 | 1 |

**Figure 3.17:** K-map for ternary full adder (TFA)

**Figure 3.18:** K-map for ternary full subtractor (TFS)

Comparing eq. (3.12) and (3.14), the function for $D_1$ is the same as that of $S_1$ if $B_1^1$ is replaced by $B_1^2$, $B_1^2$ is replaced by $B_1^1$, $C_0^1$ is replaced by $C_0^2$, and $C_0^2$ is replaced by $C_0^1$. Similarly, comparing eq. (3.13) and (3.15), the function for $B_1$ is the same as that of $C_1$ if variable $A_1^0$ is replaced by $A_1^2$, and $A_1^2$ is replaced by $A_1^0$. Hence, a TFA can also generate $D_1$ and $B_1$, along with $S_1$ and $C_1$, given a proper selection of input variables. For this, MUXs are used in the FAS block.
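The input interchange for the full adder-subtractor can be illustrated at the value level. The sketch below is a hedged behavioural model, not the gate-level circuit of Figure 3.16. It assumes that interchanging two unary lines of an input is equivalent to feeding the adder the value with those two logic levels swapped. `tfa`, `fas` and the two swap tables are hypothetical names introduced for illustration.

```python
from itertools import product

def tfa(a, b, c):
    """Behavioural ternary full adder: returns (sum, carry)."""
    total = a + b + c
    return total % 3, total // 3

SWAP_1_2 = {0: 0, 1: 2, 2: 1}   # models the X^1 <-> X^2 interchange
SWAP_0_2 = {0: 2, 1: 1, 2: 0}   # models the X^0 <-> X^2 interchange

def fas(a, b, c, subtract=False):
    """Full adder-subtractor built from the adder model plus input swaps."""
    if not subtract:
        return tfa(a, b, c)
    # Difference: sum logic with B and C each having levels 1 and 2 interchanged.
    diff, _ = tfa(a, SWAP_1_2[b], SWAP_1_2[c])
    # Borrow: carry logic with levels 0 and 2 of A interchanged.
    _, borrow = tfa(SWAP_0_2[a], b, c)
    return diff, borrow

# Exhaustive check against direct subtraction, matching the TFS columns of Table 3.7.
for a, b, c in product(range(3), repeat=3):
    d = a - b - c
    assert fas(a, b, c, subtract=True) == (d % 3, -(d // 3))
```

The check passes because a - b - c is congruent to a + (-b) + (-c) modulo 3, and negation modulo 3 swaps levels 1 and 2. The borrow count likewise equals the carry of (2 - a) + b + c.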
When M=0, the MUXs pass the unary functions unchanged and FAS computes the functions given by eq. (3.12) and (3.13) for the addition operation. When M=2, the MUXs select the interchanged unary functions and FAS computes the functions given by eq. (3.14) and (3.15) for the subtraction operation.

It is worth mentioning that, by performing addition and subtraction with a single AS module, the HO-TALU architecture implements these two operations with less hardware than the existing TALU architecture, which contains separate adder and subtractor modules.

#### 3.5.2 Comparator Module

A comparator module compares A $(A_1A_0)$ and B $(B_1B_0)$ and determines their relative magnitudes. The result of the comparison is specified by three output variables GR, EQ and LE, which indicate whether A is greater than B (A>B), A is equal to B (A=B), or A is less than B (A<B), respectively. The truth table of the comparator is given in Table 3.8. According to Table 3.8, the following observations hold:

1. If one of the output variables GR, EQ and LE is 2, the remaining two variables are always 0.
2. If two of GR, EQ and LE are 0, the remaining one is always 2.
3. Two of GR, EQ and LE are never equal to 2 at the same time.
4. GR, EQ and LE take only the two logic values 2 and 0; they can never be 1.
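These four observations can be confirmed by enumerating all 81 input combinations. The following sketch is a behavioural illustration under stated assumptions, not the thesis implementation. `compare` models the comparator by integer comparison, and `t_nor` assumes a ternary NOR defined as the standard ternary inversion (2 - v) of the maximum. Both names are hypothetical.

```python
from itertools import product

def compare(a1, a0, b1, b0):
    """Behavioural 2-trit comparator returning (EQ, LE, GR) with levels 0 or 2."""
    a, b = 3 * a1 + a0, 3 * b1 + b0
    return (2 if a == b else 0, 2 if a < b else 0, 2 if a > b else 0)

def t_nor(x, y):
    """Assumed ternary NOR: inversion (2 - v) of the maximum of the inputs."""
    return 2 - max(x, y)

for a1, a0, b1, b0 in product(range(3), repeat=4):
    eq, le, gr = compare(a1, a0, b1, b0)
    # Observations 1-4: exactly one output is 2, the others are 0, none is 1.
    assert sorted((eq, le, gr)) == [0, 0, 2]
    # Consequence: any one output is the NOR of the other two.
    assert gr == t_nor(eq, le)
```

Because the outputs are one-hot and restricted to levels 0 and 2, the comparator behaves like a binary circuit, so one output can be derived from the other two.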
**Table 3.8:** Truth table of 2-bit ternary comparator

| A<sub>1</sub> | A<sub>0</sub> | B<sub>1</sub> | B<sub>0</sub> | EQ | LE | GR |
|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 2 | 0 | 0 |
| 0 | 0 | 0 | 1 | 0 | 2 | 0 |
| 0 | 0 | 1 | 2 | 0 | 2 | 0 |
| 0 | 1 | 1 | 0 | 0 | 2 | 0 |
| 0 | 1 | 2 | 1 | 0 | 2 | 0 |
| 0 | 1 | 2 | 2 | 0 | 2 | 0 |
| 0 | 2 | 0 | 0 | 0 | 0 | 2 |
| 0 | 2 | 1 | 1 | 0 | 2 | 0 |
| 0 | 2 | 2 | 2 | 0 | 2 | 0 |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| 1 | 0 | 0 | 0 | 0 | 0 | 2 |
| 1 | 0 | 0 | 1 | 0 | 0 | 2 |
| 1 | 0 | 1 | 2 | 0 | 2 | 0 |
| 1 | 1 | 1 | 0 | 0 | 0 | 2 |
| 1 | 1 | 2 | 1 | 0 | 2 | 0 |
| 1 | 1 | 2 | 2 | 0 | 2 | 0 |
| 1 | 2 | 0 | 0 | 0 | 0 | 2 |
| 1 | 2 | 1 | 1 | 0 | 0 | 2 |
| 1 | 2 | 2 | 2 | 0 | 2 | 0 |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| 2 | 0 | 0 | 0 | 0 | 0 | 2 |
| 2 | 0 | 0 | 1 | 0 | 0 | 2 |
| 2 | 0 | 1 | 2 | 0 | 0 | 2 |
| 2 | 1 | 1 | 0 | 0 | 0 | 2 |
| 2 | 1 | 2 | 1 | 2 | 0 | 0 |
| 2 | 1 | 2 | 2 | 0 | 2 | 0 |
| 2 | 2 | 0 | 0 | 0 | 0 | 2 |
| 2 | 2 | 1 | 1 | 0 | 0 | 2 |
| 2 | 2 | 2 | 2 | 2 | 0 | 0 |

Based on the first three observations, GR is expressed in terms of EQ and LE as follows.

$$GR = \overline{EQ + LE}$$ (3.16)

To derive EQ and LE, K-maps are drawn as shown in Figure 3.19.
From these K-maps, simplified expressions for EQ and LE are derived and expressed as:

$$EQ = (A_1^0 B_1^0 + A_1^1 B_1^1 + A_1^2 B_1^2)(A_0^0 B_0^0 + A_0^1 B_0^1 + A_0^2 B_0^2)$$ (3.17)

$$LE = (A_1^0 B_1^1 + \overline{A_1^2} B_1^2) + (A_0^0 B_0^1 + \overline{A_0^2} B_0^2)(A_1^0 + B_1^2 + A_1^1 B_1^1)$$ (3.18)

Based on the fourth observation above, the comparator is implemented using binary gates only, in order to achieve improved performance. The logic diagram of the comparator module is shown in Figure 3.20. The circuits shown in Figure 3.20 (a) and (b) compute the functions given by eq. (3.17) and (3.18) and accordingly generate EQ and LE, respectively. These output variables are then passed through one NOR gate to generate GR based on eq. (3.16), as shown in Figure 3.20 (b).

**Figure 3.19 (a):** Ternary K-map for EQ of comparator module

**Figure 3.19 (b):** Ternary K-map for LE of comparator module

**Figure 3.20 (a):** Logic level diagram for EQ generator of comparator module

**Figure 3.20 (b):** Logic level diagram for LE and GR generator of comparator module

The comparator design achieves hardware optimization at three levels. At the first level, it implements GR using only one NOR gate. At the second level, simplified logic expressions are used for EQ and LE. At the third level, these functions are implemented using binary gates only instead of ternary gates. Therefore, for the comparison operation, HO-TALU reduces the number of logic gates to approximately one fourth with respect to its counterpart presented in [73].

#### 3.5.3 Exclusive-OR Module

The block diagram of the exclusive-OR module is shown in Figure 3.21. It performs the XOR operation on A $(A_1A_0)$ and B $(B_1B_0)$, and generates two outputs $A_0 \oplus B_0$ and $A_1 \oplus B_1$. The ternary XOR operation is carried out using mod-3 addition, where the carry bit generated from ternary addition is ignored [251].
As a consequence, the exclusive-OR module is implemented using the sum generation circuitry of THA, which is represented by the THA\_SUM block. This module consists of two THA\_SUM blocks, where each block performs a 1-bit XOR operation. The truth table of the 1-bit XOR operation for $A_0$ and $B_0$ is given in Table 3.9. The schematic diagram of the THA\_SUM block is shown in Figure 3.22, where eight binary NAND gates, two ternary NAND gates and an inverter produce the XOR function, which is defined as follows.

$$A_0 \oplus B_0 = A_0^2 B_0^0 + A_0^1 B_0^1 + A_0^0 B_0^2 + 1.(A_0^1 B_0^0 + A_0^0 B_0^1 + A_0^2 B_0^2)$$ (3.19)

The proposed exclusive-OR module leads to a compact structure with respect to its counterpart presented in [73], which utilizes both the sum and carry generators of the THA block.

**Figure 3.21:** Block diagram of the exclusive-OR module

**Table 3.9:** Truth table of 1-bit ternary XOR

| A<sub>0</sub> | B<sub>0</sub> | $A_0 \oplus B_0$ |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 0 | 2 | 2 |
| 1 | 0 | 1 |
| 1 | 1 | 2 |
| 1 | 2 | 0 |
| 2 | 0 | 2 |
| 2 | 1 | 0 |
| 2 | 2 | 1 |

**Figure 3.22:** Logic level diagram of THA\_SUM block of exclusive-OR module

# 3.5.4 Multiplier Module

Multiplication of ternary numbers is performed in the same way as multiplication of decimal or binary numbers. The multiplicand is multiplied by each bit of the multiplier, starting from the least significant position, and each such multiplication forms a partial product. Successive partial products are shifted one position to the left and the final product is obtained from the sum of the partial products. Figure 3.23 shows the block diagram of the multiplier module (decoded unary functions for input data are not shown). The multiplier architecture presented in [73] is used here. It performs multiplication of $A_1A_0$ and $B_1B_0$, and produces a four-bit product $M_3M_2M_1M_0$ with $C_{out}$.
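The shift-and-add procedure can be sketched behaviourally as follows. This is a minimal illustration under stated assumptions, not the gate-level structure of Figure 3.23. Column sums and their carry resolution stand in for the THA/TFA adder array, $C_{out}$ is not modelled, and the function names `mul1` and `multiply_2trit` are hypothetical.

```python
from itertools import product

def mul1(a, b):
    """1-trit multiplier: returns (product trit, carry trit)."""
    p = a * b
    return p % 3, p // 3

def multiply_2trit(a1, a0, b1, b0):
    """Shift-and-add multiplication of A1A0 by B1B0; returns trits (M3, M2, M1, M0)."""
    cols = [0, 0, 0, 0]                      # column sums, index = trit weight
    for i, b in enumerate((b0, b1)):         # multiplier trits, LSB first
        for j, a in enumerate((a0, a1)):     # multiplicand trits, LSB first
            p, c = mul1(a, b)
            cols[i + j] += p                 # partial product shifted by i positions
            cols[i + j + 1] += c             # its carry goes one position higher
    trits, carry = [], 0
    for col in cols:                         # final addition of the partial products
        total = col + carry
        trits.append(total % 3)
        carry = total // 3
    return tuple(reversed(trits))

# Exhaustive check against integer multiplication.
for a1, a0, b1, b0 in product(range(3), repeat=4):
    m3, m2, m1, m0 = multiply_2trit(a1, a0, b1, b0)
    assert 27 * m3 + 9 * m2 + 3 * m1 + m0 == (3 * a1 + a0) * (3 * b1 + b0)
```

For example, `multiply_2trit(2, 2, 2, 2)` corresponds to 8 × 8 = 64, which is 2101 in base 3.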
**Figure 3.23:** Block diagram of multiplier module

The multiplier module contains 1-bit multipliers, ternary half adders (THAs) and ternary full adders (TFAs) for the generation of partial products, shifting and final addition of partial products. Table 3.10 gives the truth table of the 1-bit multiplier, which multiplies $A_0$ and $B_0$ and generates outputs $P_0$ and $C_0$. Figure 3.24 shows the K-map for the same. From the K-map, simplified expressions of $P_0$ and $C_0$ are derived and expressed as:

$$P_0 = (A_0^2 B_0^1 + A_0^1 B_0^2) + 1.(A_0^1 B_0^1 + A_0^2 B_0^2)$$ (3.20)

$$C_0 = 1.(A_0^2 B_0^2)$$ (3.21)

**Table 3.10:** Truth table of 1-bit ternary multiplier

| A<sub>0</sub> | B<sub>0</sub> | P<sub>0</sub> (Product) | C<sub>0</sub> (Carry) |
|---|---|---|---|
| 0 | 0 | 0 | 0 |
| 0 | 1 | 0 | 0 |
| 0 | 2 | 0 | 0 |
| 1 | 1 | 1 | 0 |
| 1 | 2 | 2 | 0 |
| 2 | 2 | 1 | 1 |

**Figure 3.24:** K-map of 1-bit ternary multiplier

Figure 3.25 gives the logic diagram of the 1-bit ternary multiplier, which computes the functions given by eq. (3.20) and (3.21). The 1-bit multiplier is implemented using the design methodology presented in [199]. The implementation of THA is already discussed in section 3.4 and shown in Figure 3.12. The realization of TFA is the same as that of the FAS block discussed in section 3.5.1 and shown in Figure 3.16, excluding the MUXs. The combination of binary and ternary logic gates used in the sub-blocks of the multiplier leads to a hardware-efficient design with respect to the multiplier design presented in [73].

**Figure 3.25:** Logic level diagram of 1-bit ternary multiplier

# 3.5.5 T-OR/T-NOR/T-AND/T-NAND Module

Figure 3.26 shows the logic diagram of the T-OR/T-NOR/T-AND/T-NAND modules. They perform logic operations on A $(A_1A_0)$ and B $(B_1B_0)$ using the ternary gates discussed in section 3.2.
Since logic operations manipulate the bits of the operands separately and treat each bit as a ternary variable, each type of ternary gate is repeated twice to get 2-bit logic outputs.

**Figure 3.26:** Logic level diagram of (a) T-NAND (b) T-AND (c) T-NOR (d) T-OR modules

# 3.6. 2-bit HO-TALU Slice for N-bit HO-TALU

The design of the 2-bit HO-TALU is extended to implement a 2-bit HO-TALU slice, which can be duplicated N/2 times to build an N-bit TALU. The pin diagram of the 2-bit HO-TALU slice is shown in Figure 3.27. Compared to the design of the 2-bit HO-TALU, this slice has new input signals, named cascaded signals: Carry<sub>c</sub>/Borrow<sub>c</sub>, GR<sub>c</sub>, EQ<sub>c</sub> and LE<sub>c</sub>.

**Figure 3.27:** Block diagram of 2-bit HO-TALU slice

To incorporate these signals in the 2-bit TALU slice, some modifications are required in the 2-bit HO-TALU design. The modification in the AS module is shown in Figure 3.28 (decoded unary functions for input data are not shown). The modified AS (MAS) module uses an FAS block in place of HAS to deal with the cascaded carry/borrow input, i.e. Carry<sub>c</sub>/Borrow<sub>c</sub> ($C_c/B_c$). For an N-bit HO-TALU, the cascaded configuration of this module is shown in Figure 3.29, where input $C_{c0}/B_{c0}$ of MAS1 is connected to ground (logic 0). Input $C_{c1}/B_{c1}$ of MAS2 is connected to output $C_1/B_1$ of MAS1. Similarly, for the other MAS modules, input $C_c/B_c$ comes from output C/B of the respective previous block, so the generated carries or borrows propagate in a chain through the MAS modules. As soon as the previous carry or borrow is available, the correct Sum/Difference and Carry/Borrow emerge from outputs S/D and C/B of the MAS modules.

**Figure 3.28:** Block diagram for modified adder-subtractor (MAS)

**Figure 3.29:** Cascaded configuration of modified adder-subtractor (MAS) for N-bit HO-TALU

The modified comparator (MCOMP) module of the 2-bit slice is shown in Figure 3.30.
MCOMP contains the 2-bit HO-TALU comparator module, which is named the COMP\_Pre block here, and a small added circuit. In this module, the cascading comparator inputs GR<sub>c</sub>, EQ<sub>c</sub> and LE<sub>c</sub>, and the signals GR<sub>i</sub>, EQ<sub>i</sub> and LE<sub>i</sub> generated by the COMP\_Pre block, are applied to the added circuit, which produces three outputs EQ, LE and GR. This added circuit is implemented based on the following observations:

1. For EQ to be 2, both EQ<sub>c</sub> and EQ<sub>i</sub> should be 2.
2. For LE to be 2, either LE<sub>i</sub> should be 2 or both EQ<sub>i</sub> and LE<sub>c</sub> should be 2.
3. For GR to be 2, either GR<sub>i</sub> should be 2 or both EQ<sub>i</sub> and GR<sub>c</sub> should be 2.

Accordingly, EQ, LE and GR are obtained and expressed as follows.

$$EQ = EQ_c \cdot EQ_i$$ (3.22)

$$LE = LE_i + LE_c \cdot EQ_i$$ (3.23)

$$GR = GR_i + GR_c \cdot EQ_i$$ (3.24)

**Figure 3.30:** Logic diagram for modified comparator (MCOMP)

For an N-bit TALU, the cascaded configuration of the MCOMP module is shown in Figure 3.31, where the cascading comparator inputs $EQ_{c0}$, $GR_{c0}$ and $LE_{c0}$ of MCOMP1 are connected to $V_{dd}$, 0 and 0, respectively. Inputs $EQ_{c1}$, $GR_{c1}$ and $LE_{c1}$ of MCOMP2 are connected to outputs $EQ_1$, $GR_1$ and $LE_1$ of MCOMP1. Similarly, for the other MCOMP blocks, inputs $EQ_c$, $GR_c$ and $LE_c$ come from outputs EQ, GR and LE of the respective previous blocks, so the comparison results propagate in a chain through the MCOMP blocks. The final N-bit comparison result emerges from the last block, MCOMP<sub>N/2</sub>.

For the 2-bit HO-TALU slice, the logic operation modules of HO-TALU are used without any modification.
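The cascading rule of the modified comparator can be sketched as a chain of 2-trit slices. The model below is illustrative only. `comp_pre` is a behavioural stand-in for the COMP\_Pre block, AND and OR on the 0/2 signals are modelled as minimum and maximum, and the trit lists are assumed to be MSB-first with even length. All function names are hypothetical.

```python
from itertools import product

def comp_pre(a1, a0, b1, b0):
    """Behavioural stand-in for COMP_Pre: (EQ_i, LE_i, GR_i), levels 0 or 2."""
    a, b = 3 * a1 + a0, 3 * b1 + b0
    return (2 if a == b else 0, 2 if a < b else 0, 2 if a > b else 0)

def mcomp(slice_a, slice_b, eq_c, le_c, gr_c):
    """One MCOMP slice: combine the local result with the cascaded inputs."""
    eq_i, le_i, gr_i = comp_pre(*slice_a, *slice_b)
    eq = min(eq_c, eq_i)                  # observation 1: both must be 2
    le = max(le_i, min(le_c, eq_i))       # eq. (3.23)
    gr = max(gr_i, min(gr_c, eq_i))       # eq. (3.24)
    return eq, le, gr

def compare_n(a_trits, b_trits):
    """Chain MCOMP slices, starting from the least significant 2-trit slice."""
    eq, le, gr = 2, 0, 0                  # EQ_c0 = Vdd (logic 2), LE_c0 = GR_c0 = 0
    for k in range(len(a_trits) - 2, -1, -2):
        eq, le, gr = mcomp(a_trits[k:k + 2], b_trits[k:k + 2], eq, le, gr)
    return eq, le, gr

# Exhaustive check for 4-trit operands (two cascaded slices).
for a in product(range(3), repeat=4):
    for b in product(range(3), repeat=4):
        expected = (2 if a == b else 0, 2 if a < b else 0, 2 if a > b else 0)
        assert compare_n(list(a), list(b)) == expected
```

Since each more significant slice overrides the cascaded result unless its own trits are equal, the output of the last slice reflects the full N-trit comparison.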
Further, as the number of bits increases, multiplication becomes more complex. For N-bit multiplication, the multiplier block presented in [62] is used, where the parallel N-bit design requires $N^2$ one-bit ternary multipliers, (N - 1) THAs, and N(N - 1) TFAs.

**Figure 3.31:** Cascaded configuration of modified comparator (MCOMP) for N-bit HO-TALU

# 3.7 Results and Discussion

The proposed 2-bit HO-TALU design is analyzed and evaluated using the Synopsys HSPICE simulator with the Stanford model of 32 nm CNTFET [117], which includes non-idealities of the CNTFET. Details of the Stanford model are given in section 2.2 of chapter 2. The chirality vector of the CNTFETs used in binary gates and the TG block is (19, 0). The threshold voltage of these transistors is 0.289 V, with a diameter of 1.487 nm. The diameters of the CNTFETs used in ternary gates are given in section 3.2 of this chapter. Other CNTFET technology parameters have the same values as mentioned in section 2.2 of chapter 2.

To compare the proposed ternary designs in CNTFET technology, the TALU design presented in [73] is reproduced. For this, the circuits of the TALU modules presented in [73] are implemented using the CNTFET-based ternary gates of [199], as these logic gates outperform other existing CNTFET-based gates. The THA design presented in [199] is energy-efficient and compact with respect to other CNTFET-based THA circuits. As a consequence, THS, TFA and TFS are also implemented using the design methodology of [199] and are referred to as the CNTFET-based circuits of [199] for the comparison of HAS and FAS. Further, in order to compare with 32 nm CMOS technology, the proposed circuits are implemented using the CMOS-based binary and ternary logic gates presented in [140-141]. The CMOS-based ternary gates use multiple voltages for threshold voltage adjustment and rely on the multi-threshold method for ternary operation. The Berkeley Predictive 32 nm CMOS model [145] is used to simulate the CMOS-based designs.
For the reproduced designs, the aspect ratios of MOSFETs, the diameters of CNTFETs and the values of other parameters are chosen according to the information given in the respective papers.

#### 3.7.1 Functional Verification of 2-bit HO-TALU

To verify the functionality of the 2-bit HO-TALU, its sub-circuits as well as the entire design are tested through transient simulations. For instance, the transient waveform of the FAS cell is shown in Figure 3.32. The first three waveforms represent inputs $A_1$, $B_1$ and $C_0$ (input Carry/Borrow). When mode input M=0, FAS performs addition ($A_1+B_1+C_0$) and generates outputs $S_1$ (Sum) and $C_1$ (Carry). These signals are shown in the fourth and fifth waveforms, respectively. Similarly, when M=2, FAS performs subtraction ($A_1-B_1-C_0$) and generates two outputs $D_1$ (Difference) and $B_1$ (Borrow). These signals are shown in the sixth and seventh waveforms, respectively. Depending upon the value of mode input M, FAS performs correct ternary addition and subtraction, and thus the functionality of FAS is verified. Similarly, transient waveforms of the proposed HAS, multiplier, comparator, exclusive-OR and logic operation modules (only T-AND is included) are shown in the Appendix and confirm their correct operation.

**Figure 3.32:** Transient waveform of full adder-subtractor (FAS)

# 3.7.2. Hardware Efficiency Evaluation of 2-bit HO-TALU

The 2-bit HO-TALU design is evaluated on the basis of hardware efficiency. For addition and subtraction operations, the HO-TALU design introduces an AS module, while the existing TALU designs contain separate adder and subtractor modules. The sub-blocks of AS are HAS and FAS. HAS is compared with its counterparts of [73] and [199], each of which is considered as a combination of THA and THS. Similarly, FAS is compared with its counterparts of [73] and [199], each of which is considered as a combination of TFA and TFS.
The designs of the 2-bit multiplier, 2-bit comparator and 2-bit exclusive-OR are compared with their counterparts presented in [73]. The comparison of the proposed ternary circuits based on device count is given in Table 3.11 and shown in Figure 3.33. The HAS block reduces the number of transistors by 34% and 76% compared to the designs presented in [199] and [73], respectively. Similarly, the FAS block reduces the number of transistors by 41% and 82% compared to the designs presented in [199] and [73], respectively. The designs of the 2-bit multiplier, 2-bit comparator and 2-bit exclusive-OR achieve reductions in device count of 64%, 82% and 76%, respectively, in comparison with their counterparts presented in [73].

**Table 3.11:** Comparison of ternary circuits based on device count

| S. No. | Circuits | Device count | % improvement in device count |
|--------|----------|--------------|-------------------------------|
| i | Combination of HA and HS designs of [199] | 184 | |
| ii | Combination of HA and HS circuits of [73] using CNTFETs | 504 | |
| iii | **Proposed** HAS | 122 | 34 wrt (i), 76 wrt (ii) |
| iv | Combination of FA and FS designs of [199] | 428 | |
| v | Combination of FA and FS circuits of [73] using CNTFETs | 1448 | |
| vi | **Proposed** FAS | 250 | 41 wrt (iv), 82 wrt (v) |
| vii | 2-bit multiplier circuit of [73] using CNTFETs | 3532 | |
| viii | **Proposed** 2-bit multiplier | 1256 | 64 wrt (vii) |
| ix | 2-bit comparator circuit of [73] using CNTFETs | 600 | |
| x | **Proposed** 2-bit comparator | 104 | 82 wrt (ix) |
| xi | 2-bit exclusive-OR circuit of [73] using CNTFETs | 504 | |
| xii | **Proposed** 2-bit exclusive-OR | 116 | 76 wrt (xi) |

**Figure 3.33:** Comparison of ternary circuits based on device count

#### 3.7.3 Performance Evaluation of 2-bit HO-TALU

To evaluate the performance of the proposed ternary circuits, speed and power are extracted from transient simulations. The average power consumption is measured over a long period of time.
For worst-case delay determination, all possible output transition delays are measured. Further, due to the increased demand for high-speed, high-throughput computation and complex functionality in mobile environments, reduction of delay and power consumption is very challenging. On account of the trade-off between power consumption and delay, the efficiency of the circuits is evaluated by computing the PDP, which is the product of the average power consumption and the maximum delay. Delay, power and PDP of the ternary circuits in both 32 nm CNTFET and 32 nm MOSFET technologies, at 0.9 V supply voltage, room temperature, 2.1 fF output load and 250 MHz operating frequency, are listed in Table 3.12. The ternary circuits are also simulated at 1 GHz operating frequency with 3 fF output load and at three different supply voltages. Results obtained from these simulations are listed in [253]. To evaluate performance at the architecture level, different HO-TALU data-paths containing the decoder, FSB-AHO, TGB-AHE and a separate sub-block have been simulated. Delay, power and PDP results are extracted and shown in Table 1 of Appendix II. The comparison of CNTFET-based ternary designs based on PDP is shown in Figure 3.34. Table 3.12 shows that the proposed CNTFET-based circuits achieve about two hundred times lower PDP than their CMOS counterparts, which verifies the potential benefit of CNTFET circuits. In comparison with the circuits of [73] implemented using the CNTFET-based gates of [199], the proposed multiplier, comparator and exclusive-OR reduce PDP by 75%, 65% and 28%, respectively. However, the PDP of sub-modules HAS and FAS is marginally increased, by 2% and 5%, respectively, in comparison with their CNTFET-based counterparts presented in [199]. Thus, all 2-bit HO-TALU modules achieve good hardware efficiency, with a minor loss of PDP for addition and subtraction operations only, with respect to CNTFET circuits available in the literature.
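As a worked example of the PDP metric, the short sketch below recomputes the PDP of the two multiplier rows of Table 3.12 from their delay and power values and derives the relative reduction. It is an arithmetic illustration only. The function name `pdp` is hypothetical, and the numbers are the multiplier entries reported in the table.

```python
def pdp(delay_s, power_w):
    """Power-delay product in joules: maximum delay times average power."""
    return delay_s * power_w

# 2-bit multiplier rows of Table 3.12 (delay in s, power in W).
reference = pdp(1.96e-10, 23.3e-6)   # multiplier of [73] using CNTFETs
proposed = pdp(1.45e-10, 7.82e-6)    # proposed CNTFET-based multiplier

reduction_percent = 100 * (reference - proposed) / reference
print(f"reference PDP = {reference:.3e} J")
print(f"proposed PDP  = {proposed:.3e} J")
print(f"reduction     = {reduction_percent:.0f}%")
```

The same two-line computation applies to any pair of rows in the table, so a reported percentage can be cross-checked against the listed delay and power values.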
**Table 3.12:** Simulation results of ternary circuits

| Circuits | Delay (×10<sup>-10</sup> s) | Power (×10<sup>-6</sup> W) | PDP (×10<sup>-16</sup> J) |
|----------|------------------------------|-----------------------------|---------------------------|
| CNTFET-based THA of [199] | 0.69 | 1.01 | 0.69 |
| CNTFET-based HAS for addition operation (**proposed**) | 0.71 | 1.02 | 0.72 |
| HAS based on CMOS logic gates of [141] for addition operation | 1.77 | 81.35 | 144 |
| CNTFET-based TFA of [199] | 0.80 | 1.45 | 1.16 |
| CNTFET-based FAS for addition operation (**proposed**) | 0.82 | 1.49 | 1.22 |
| FAS based on CMOS logic gates of [141] for addition operation | 2.45 | 145 | 355 |
| 2-bit multiplier of [73] using CNTFETs | 1.96 | 23.3 | 45.67 |
| CNTFET-based 2-bit multiplier (**proposed**) | 1.45 | 7.82 | 11.34 |
| 2-bit comparator of [73] using CNTFETs | 0.81 | 0.99 | 0.80 |
| CNTFET-based 2-bit comparator (**proposed**) | 0.48 | 0.63 | 0.30 |
| 1-bit exclusive-OR of [73] using CNTFETs | 0.69 | 1.01 | 0.69 |
| CNTFET-based 1-bit exclusive-OR (**proposed**) | 0.69 | 0.60 | 0.41 |

**Figure 3.34:** Comparison of ternary circuits based on PDP

## 3.8 Conclusion

In this chapter, the design of a 2-bit HO-TALU using CNTFETs has been presented. The 2-bit HO-TALU has a new AS module, which performs both addition and subtraction using only an adder module with the help of MUXs. Thus, it eliminates the subtractor module from the conventional architecture. HO-TALU minimizes ternary function expressions and utilizes binary gates along with ternary gates in the realization of the functional modules: AS, multiplier, comparator and exclusive-OR. As a consequence, the sub-blocks of AS, HAS and FAS, use nearly 76% and 82% fewer transistors, respectively, than conventional designs which contain separate adder and subtractor blocks.
Multiplier, comparator and exclusive-OR show reduction in device count by 64%, 82% and 76%, respectively, with respect to their existing counterparts. Results obtained from HSPICE simulator with Stanford model of 32nm CNTFET have shown that all HO-TALU modules achieve great improvement (nearly two hundred times) in PDP with respect to their CMOS-based counterpart, which verifies the potential benefit of CNTFET circuits. In comparison with existing CNTFET-based designs, proposed multiplier, comparator and exclusive-OR get reduction in PDP by 75%, 65% and 28%, respectively. But, PDP of submodules HAS and FAS has marginally increased by 2% and 5%, respectively. Thus, all HO-TALU modules achieve good hardware efficiency with a minor loss of PDP for addition and subtraction operations only, with respect to CNTFET circuits available in the literature. Besides, design of 2-bit HO-TALU is extended to develop a 2-bit HO-TALU slice which could be easily cascaded to construct an N-bit HO-TALU.

In the next chapter, TFA which is a basic sub-block of AS, is modified using different circuit techniques to improve its efficiency in terms of PDP. Three TFA designs named as high speed ternary full adder (HS-TFA), low power ternary full adder (LP-TFA), dynamic ternary full adder (DTFA) are presented. In addition, a modified comparator with improved PDP is also presented for modern electronics with CNTFETs.

# Performance Boosted Designs of Sub-Blocks of 2-bit Hardware Optimized Ternary ALU (HO-TALU) using CNTFETs

# 4.1 Introduction

In chapter 3, design of a 2-bit hardware optimized ternary ALU (HO-TALU) has been presented using CNTFETs. This chapter presents performance boosted designs of sub-blocks of CNTFET-based 2-bit HO-TALU using different circuit techniques. Three new designs of ternary full adder (TFA) which is an important sub-block of adder-subtractor (AS) are proposed. These designs are optimized in terms of speed, power and finally power-delay product (PDP).
The first TFA design, named high speed TFA (HS-TFA), uses symmetric pull-up and pull-down networks along with a resistive voltage divider as its integral part, which is configured using transistors. Compared to a recently developed TFA available in the literature, HS-TFA achieves improved speed but high power dissipation. In order to reduce power consumption, a second TFA, named low power TFA (LP-TFA), is proposed. LP-TFA makes use of the complementary pass transistor logic style and achieves low power consumption with a marginal decrease in PDP. To improve PDP further, a third TFA is implemented in dynamic logic. This TFA, named dynamic TFA (DTFA), uses a keeper designed for ternary values in order to alleviate the charge sharing problem. The realization of all three TFAs takes advantage of the inherent binary nature (0 and 1) of the input carry, leading to simplicity in the designs.

Next, a new design of the comparator module of the 2-bit HO-TALU is presented. First, a 1-bit comparator is developed using pass transistor logic with a reduced number of stages in the critical delay path. Then, the 1-bit design is used to create 2-bit and N-bit comparators, where a static binary tree configuration is used to correct the voltage levels. The proposed 2-bit comparator achieves better PDP in comparison with available counterparts. This comparator, HS-TFA and DTFA have high driving capability. Moreover, all new TFAs and the 2-bit comparator are less sensitive to voltage and temperature variations with respect to existing designs.

In section 4.2, the TFA designs, which include HS-TFA, LP-TFA and DTFA, are presented along with their simulation results and comparison. Section 4.3 describes the design of the ternary comparator with its evaluation, followed by the conclusion in section 4.4.
# 4.2 Designs of Ternary Full Adder (TFA)

A TFA adds three bits (A, B and $C_{in}$), in which A and B are significant bits (1-bit ternary numbers) and $C_{in}$ is the carry bit generated by the previous bit addition during N-bit operation. The maximum value of $A+B+C_{in}$ is 4 at the least significant position (where no carry enters) and 5 at the other positions, so the carry passed to the next position is at most logic 1. Therefore, $C_{in}$ never takes logic 2 in N-bit ternary addition. Using this property, the TFA is designed based on the binary nature (0 and 1) of $C_{in}$ and the ternary nature of A and B.

#### 4.2.1 High Speed TFA (HS-TFA)

The pin diagram and block diagram of the first proposed TFA, named HS-TFA, are shown in Figure 4.1. The truth table of TFA is given in Table 4.1, where A and B are ternary in nature, and C<sub>in</sub> is binary in nature (0 and 1). HS-TFA has three inputs A, B and C<sub>in</sub>, and two outputs Sum and Carry. Inputs A, B and C<sub>in</sub> are passed through 1-to-6-line ternary decoders DEC1, DEC2 and DEC3, respectively, to generate their unary functions. HS-TFA consists of a Sum generator and a Carry generator to produce the Sum and Carry signals. The Sum generator contains a symmetrical pull-up network (PUN) and pull-down network (PDN), along with a resistive voltage divider which is implemented using two constantly switched ON transistors T1 and T2. A block marked as 'PUN for $\Sigma$ in =1, 2, 4 & 5' represents a network that is switched ON when $\sum (A+B+C_{in})$ = 1, 2, 4 or 5. Similarly, a block marked as 'PDN for $\Sigma$ in =0, 1, 3 & 4' represents a network that is switched ON when $\sum (A+B+C_{in})$ = 0, 1, 3 or 4. These blocks are designed based on the switching activities required to generate Sum. The detailed schematic of the Sum generator is shown in Figure 4.3 (a). Further, T1 (P-CNTFET) and T2 (N-CNTFET), having the same geometry, present the same resistance and perform voltage division to get logic 1 for Sum.
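The switching rule of the Sum generator can be illustrated with an idealized behavioural model. The sketch below is not a circuit simulation. It assumes ideal switches and an exact $V_{dd}/2$ division by T1 and T2, borrows $V_{dd}$ = 0.9 V from the chapter 3 simulation settings purely for illustration, and uses hypothetical names (`sum_node_voltage`, `to_logic`, `max_carry`).

```python
from itertools import product

VDD = 0.9  # illustrative supply voltage in volts (assumed, taken from chapter 3 settings)

PUN_ON = {1, 2, 4, 5}   # values of A + B + Cin that switch the pull-up network ON
PDN_ON = {0, 1, 3, 4}   # values of A + B + Cin that switch the pull-down network ON

def sum_node_voltage(a, b, cin):
    """Idealized Sum node voltage for ternary A, B and binary Cin."""
    total = a + b + cin
    pun, pdn = total in PUN_ON, total in PDN_ON
    if pun and pdn:
        return VDD / 2          # T1 and T2 divide the voltage: logic 1
    return VDD if pun else 0.0  # only PUN ON: logic 2, only PDN ON: logic 0

def to_logic(voltage):
    """Map 0, Vdd/2 and Vdd to logic 0, 1 and 2."""
    return round(2 * voltage / VDD)

def max_carry(positions):
    """Largest carry that can leave each position when all operand trits are 2."""
    carry = 0
    for _ in range(positions):
        carry = (2 + 2 + carry) // 3
    return carry

# The switching rule reproduces (A + B + Cin) mod 3 for every valid input.
for a, b, cin in product(range(3), range(3), range(2)):
    assert to_logic(sum_node_voltage(a, b, cin)) == (a + b + cin) % 3

# The carry never exceeds logic 1, so Cin can be treated as binary.
assert max_carry(8) == 1
```

In this idealized view, the two switching sets overlap exactly at the totals 1 and 4, which are the cases where Sum must be logic 1.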
Table 4.2 summarizes how the ON states of the PUN and PDN connect Sum to the appropriate voltage source ($V_{dd}$ or ground) for each possible combination of A, B and $C_{in}$. When $\sum(A+B+C_{in}) = 0$, the PDN is turned ON, which connects Sum to ground through T2. Although T1 is also ON, the PUN is OFF, so there is no path to $V_{dd}$. When $\sum(A+B+C_{in}) = 1$, both PUN and PDN are ON, nodes $X_2$ and $X_1$ are charged to $V_{dd}$ and 0, respectively, and T1 and T2 perform voltage division between the node voltages of $X_1$ and $X_2$, producing Sum as $(X_1+X_2)/2 = V_{dd}/2$ (i.e. the voltage level of logic 1). When $\sum(A+B+C_{in}) = 2$, the PUN is switched ON and connects Sum to $V_{dd}$ through T1. Similarly, for other values of $\sum(A+B+C_{in})$, Sum gets the proper value through the PUN, the PDN or both, as shown in Table 4.2.

**Figure 4.1 (a):** Pin diagram of high speed ternary full adder (HS-TFA)

**Figure 4.1 (b):** Block diagram of high speed ternary full adder (HS-TFA)

**Table 4.1:** Truth table of ternary full adder (TFA)

| A | B | $C_{in}$ | Sum | Carry |
|---|---|----------|-----|-------|
| 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | 1 | 0 |
| 0 | 1 | 0 | 1 | 0 |
| 0 | 1 | 1 | 2 | 0 |
| 0 | 2 | 0 | 2 | 0 |
| 0 | 2 | 1 | 0 | 1 |
| 1 | 0 | 0 | 1 | 0 |
| 1 | 0 | 1 | 2 | 0 |
| 1 | 1 | 0 | 2 | 0 |
| 1 | 1 | 1 | 0 | 1 |
| 1 | 2 | 0 | 0 | 1 |
| 1 | 2 | 1 | 1 | 1 |
| 2 | 0 | 0 | 2 | 0 |
| 2 | 0 | 1 | 0 | 1 |
| 2 | 1 | 0 | 0 | 1 |
| 2 | 1 | 1 | 1 | 1 |
| 2 | 2 | 0 | 1 | 1 |
| 2 | 2 | 1 | 2 | 1 |

**Table 4.2:** Switching activity for Sum and Carry generator of HS-TFA

| $\sum (A+B+C_{in})$ | Sum generator: PUN | Sum generator: PDN | Sum generator: $X_1$ | Sum generator: $X_2$ | Sum | Carry generator: PUN | Carry generator: $X_3$ | Carry |
|---|-----|-----|---|---|---|-----|---|---|
| 0 | OFF | ON | 0 | - | 0 | OFF | - | 0 |
| 1 | ON | ON | 0 | 2 | 1 | OFF | - | 0 |
| 2 | ON | OFF | - | 2 | 2 | OFF | - | 0 |
| 3 | OFF | ON | 0 | - | 0 | ON | 2 | 1 |
| 4 | ON | ON | 0 | 2 | 1 | ON | 2 | 1 |
| 5 | ON | OFF | - | 2 | 2 | ON | 2 | 1 |

The Carry generator contains a block marked 'PUN for $\Sigma$in = 3, 4 & 5', which represents a network that is switched ON when $\sum(A+B+C_{in})$ = 3, 4 or 5. This block is designed based on the switching activities required to generate Carry. The detailed schematic of the Carry generator is shown in Figure 4.3 (b). The Carry generator also contains two transistors T3 and T4, which are always switched ON. They perform voltage division between node voltage $X_3$ and ground, and produce Carry as $X_3/2$. The switching of the PUN, which connects node $X_3$ to $V_{dd}$, for the different combinations of A, B and $C_{in}$ is also given in Table 4.2. When $\sum(A+B+C_{in})$ is more than 2, the PUN is switched ON and $X_3$ is charged to $V_{dd}$; T3 and T4 then make Carry equal to logic 1. If $\sum(A+B+C_{in})$ is 0, 1 or 2, the PUN is OFF and Carry is connected only to ground, through T4. Further, a ternary buffer (TB) is used at both outputs Sum and Carry to decouple the next stage gate inputs from the present stage outputs. Moreover, it provides high performance without sacrificing the overall energy efficiency of the TFA.
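The Carry generator and the claim that the carry stays binary during N-bit addition can be sketched in the same behavioural style. The names `hs_tfa_carry` and `ripple_add`, and the little-endian digit lists, are illustrative assumptions rather than part of the design; an undriven $X_3$ (the '-' entries of Table 4.2) is modelled simply as Carry pulled to 0 by T4.

```python
CARRY_PUN_ON = {3, 4, 5}  # input sums for which the Carry pull-up network conducts

def hs_tfa_carry(a, b, cin):
    """Behavioural Carry output of HS-TFA (logic levels 0 and 1)."""
    if (a + b + cin) in CARRY_PUN_ON:
        x3 = 2          # X3 charged to V_dd
        return x3 // 2  # T3/T4 halve X3 -> logic 1
    return 0            # PUN off: Carry tied to ground through T4

def ripple_add(a_digits, b_digits):
    """Add two little-endian ternary digit lists with a chain of TFAs.

    Returns (sum_digits, carries), where carries[i] is the carry out of
    digit position i.
    """
    carry, sums, carries = 0, [], []
    for a, b in zip(a_digits, b_digits):
        sums.append((a + b + carry) % 3)
        carry = hs_tfa_carry(a, b, carry)
        carries.append(carry)
    return sums, carries

# Worst case: every digit is 2 in both operands.
sums, carries = ripple_add([2, 2, 2, 2], [2, 2, 2, 2])
print(sums, carries)
```

Even in this worst case no position produces a carry above 1, which is why a binary $C_{in}$ is sufficient.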
Based on the switching activities given in Table 4.2 and the truth table of the TFA given in Table 4.1, the $X_1$, $X_2$ and $X_3$ signals are derived and expressed as:

$$\begin{split} X_{1} &= 0 * \{ A^{0} (B^{0} \overline{C_{in}^{2}} + B^{1} C_{in}^{0} + B^{2} \overline{C_{in}^{0}}) + A^{1} (B^{0} C_{in}^{0} + B^{1} \overline{C_{in}^{0}} + B^{2} \overline{C_{in}^{2}}) \\ &+ A^{2} (B^{0} \overline{C_{in}^{0}} + B^{1} \overline{C_{in}^{2}} + B^{2} C_{in}^{0}) \} \end{split} \tag{4.1}$$

$$\begin{split} X_{2} &= 2 * \{ \overline{A^{0}} (\overline{B^{0}} C_{in}^{0} + \overline{B^{1}} C_{in}^{2} + \overline{B^{2}}\, \overline{C_{in}^{0}}) + \overline{A^{1}} (\overline{B^{0}} C_{in}^{2} + \overline{B^{1}}\, \overline{C_{in}^{0}} + \overline{B^{2}} C_{in}^{0}) \\ &+ \overline{A^{2}} (\overline{B^{0}}\, \overline{C_{in}^{0}} + \overline{B^{1}} C_{in}^{0} + \overline{B^{2}} C_{in}^{2}) \} \end{split} \tag{4.2}$$

$$X_{3} = 2 * \{ \overline{A^{0}}\, \overline{B^{2}} C_{in}^{0} + \overline{A^{1}} (\overline{B^{2}}\, \overline{C_{in}^{0}} + B^{0} C_{in}^{0}) + \overline{A^{2}} (B^{0} \overline{C_{in}^{0}} + C_{in}^{0}) \} \tag{4.3}$$

where $A^0, A^1, A^2, \overline{A^0}, \overline{A^1}$ and $\overline{A^2}$ are the unary functions of A generated by a 1-to-6-line decoder DEC1; $B^0, B^1, B^2, \overline{B^0}, \overline{B^1}$ and $\overline{B^2}$ are the unary functions of B generated by DEC2; and $C^0_{in}, C^2_{in}, \overline{C^0_{in}}$ and $\overline{C^2_{in}}$ are the unary functions of $C_{in}$ generated by DEC3. The terms of eq. (4.1) drive the gates of N-CNTFETs, which conduct when the gate signal is 2, whereas the terms of eq. (4.2) and (4.3) drive the gates of P-CNTFETs, which conduct when the gate signal is 0. Figure 4.2 shows the logic diagram of the 1-to-6-line ternary decoder, which utilizes the design concept presented in [213]. The decoder has one input 'a' and six outputs $a^0$, $a^1$, $a^2$, $\overline{a^0}$, $\overline{a^1}$ and $\overline{a^2}$, which are known as unary functions.
It contains a negative ternary inverter (NTI) followed by a binary inverter to generate $a^0$ and $\overline{a^0}$, a positive ternary inverter (PTI) followed by a binary inverter to produce $\overline{a^2}$ and $a^2$, and finally a binary NOR gate with inputs $a^0$ and $a^2$, followed by a binary inverter, to get $a^1$ and $\overline{a^1}$. Table 4.3 shows the unary functions generated by the decoder for input 'a'.

**Figure 4.2:** Logic level diagram of 1-to-6-line ternary decoder

**Table 4.3:** Truth table of 1-to-6-line ternary decoder

| a | $a^0$ | $a^1$ | $a^2$ | $\overline{a^0}$ | $\overline{a^1}$ | $\overline{a^2}$ |
|---|-------|-------|-------|------------------|------------------|------------------|
| 0 | 2 | 0 | 0 | 0 | 2 | 2 |
| 1 | 0 | 2 | 0 | 2 | 0 | 2 |
| 2 | 0 | 0 | 2 | 2 | 2 | 0 |

The schematic diagrams of the Sum generator and Carry generator of HS-TFA are shown in Figure 4.3. The PUNs of the Sum and Carry generators are realized based on eq. (4.2) and (4.3) using P-CNTFETs. The PDN of the Sum generator is constructed based on eq. (4.1) using N-CNTFETs. For each possible combination of A, B and $C_{in}$, HS-TFA provides a suitable path for the desired output logic values. For instance, when A = 2, B = 2 and $C_{in} = 0$, then $A^2 = B^2 = C_{in}^0 = 2$, and all other true unary functions are 0. In the circuitry of the Sum generator, the N-CNTFETs whose gates are connected to $A^2$, $B^2$ and $C_{in}^0$ switch ON and create a PD path which makes $X_1$ equal to 0. Similarly, the P-CNTFETs whose gates are connected to $\overline{A^2}$, $\overline{B^2}$ and $C_{in}^2$ turn ON and create a PU path to make $X_2$ equal to 2. T1 and T2 perform voltage division and produce logic 1 at Sum. In the PUN of the Carry generator, the transistors whose gates are connected to $\overline{A^2}$, $B^0$ and $\overline{C_{in}^0}$ turn ON and create a PU path to make $X_3$ equal to 2. Then T3 and T4 perform voltage division and produce logic 1 at Carry.
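The decoder structure described above (NTI, PTI, binary inverters and a NOR gate) can be checked against Table 4.3 with a short behavioural sketch. The function names and the dictionary keys are hypothetical, chosen only for this illustration; the gates are modelled on logic levels 0 and 2 with no electrical detail.

```python
def nti(a):
    """Negative ternary inverter: output is 2 only when the input is 0."""
    return 2 if a == 0 else 0

def pti(a):
    """Positive ternary inverter: output is 0 only when the input is 2."""
    return 0 if a == 2 else 2

def inv(x):
    """Binary inverter on the levels 0 and 2."""
    return 2 - x

def nor(x, y):
    """Binary NOR on the levels 0 and 2."""
    return 2 if (x == 0 and y == 0) else 0

def decode(a):
    """1-to-6-line ternary decoder: returns the six unary functions of a."""
    a0 = nti(a)
    a2_bar = pti(a)
    a2 = inv(a2_bar)
    a1 = nor(a0, a2)
    return {"a0": a0, "a1": a1, "a2": a2,
            "a0_bar": inv(a0), "a1_bar": inv(a1), "a2_bar": a2_bar}

for a in range(3):
    print(a, decode(a))
```

For each input exactly one of the true outputs is at logic 2, and each complemented output is the binary inverse of its true counterpart.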
Further, the schematic diagram of the TB is shown in Figure 4.4. It contains one NTI, one PTI, two binary inverters and two transistors T5 and T6.

**Figure 4.3 (a):** Schematic diagram for Sum generator of high speed ternary full adder (HS-TFA)

**Figure 4.3 (b):** Schematic diagram for Carry generator of high speed ternary full adder (HS-TFA)

**Figure 4.4:** Schematic diagram of ternary buffer (TB)

#### 4.2.2 Low Power TFA (LP-TFA)

The pin diagram and block diagram of the second proposed TFA are shown in Figure 4.5. This TFA is designed in the pass transistor logic style in order to achieve low power and is named low power TFA (LP-TFA). Inputs A and B are passed through 1-to-5-line decoders (DEC1 and DEC2) to generate their unary functions. Input $C_{in}$ is passed through an NTI and a binary inverter. LP-TFA contains a Sum generator to produce the Sum and $\overline{Sum}$ signals, and a Carry generator to produce the Carry signal. The Sum generator is designed in the complementary pass-transistor logic (CPL) style. It consists of two N-CNTFET based pass-transistor networks, one for each signal rail (Sum and $\overline{Sum}$). These N-CNTFET networks pass logic 0 and logic 1 but degrade logic 2 due to the threshold voltage drop across the N-type transistors. This degradation makes swing (or level) restoration necessary for Sum (and $\overline{Sum}$) in order to avoid static currents in the subsequent output circuitry. As a consequence, two small P-CNTFETs T1 and T2 are used for level restoration of logic 2. In addition, a TB is used to obtain high speed and high driving capability without sacrificing the overall energy efficiency of the design.

**Figure 4.5 (a):** Pin diagram of low power ternary full adder (LP-TFA)

**Figure 4.5 (b):** Block diagram of low power ternary full adder (LP-TFA)

The Carry generator of LP-TFA contains only one N-CNTFET pass transistor network because no swing restoration is required for the Carry signal, as its maximum voltage level is $V_{dd}/2$ only.
From the truth table given in Table 4.1, the K-map for the LP-TFA design is drawn and shown in Figure 4.6, where A and B are ternary signals and $C_{in}$ is a binary signal (0 and 1). Based on the K-map, the switch level logic expressions of Sum and Carry are derived and expressed as:

$$Sum = P_1 * C_{in}^0 + P_2 * \overline{C_{in}^0} \tag{4.4}$$

$$Carry = P_3 * C_{in}^0 + P_4 * \overline{C_{in}^0} \tag{4.5}$$

Here, signals $P_1$ and $P_2$ are equal to Sum when $C_{in} = 0$ and $C_{in} = 1$, respectively. The simplified expressions of $P_1$ and $P_2$ are as follows.

$$P_{1} = A * B^{0} + (1 * A^{0} + \overline{A^{2}} * \overline{A^{0}})B^{1} + (A^{0} * \overline{A^{2}} + 1 * A^{2})B^{2} \tag{4.6}$$

$$P_{2} = A * B^{2} + (A^{0} * \overline{A^{2}} + 1 * A^{2})B^{1} + (\overline{A^{2}} * \overline{A^{0}} + 1 * A^{0})B^{0} \tag{4.7}$$

Similarly, signals $P_3$ and $P_4$ are equal to the Carry signal when $C_{in}=0$ and $C_{in}=1$, respectively, and their simplified expressions are:

$$P_{3} = 0 * A^{0} + 0 * B^{0} + 0 * \overline{A^{2}}\,\overline{B^{2}} + 1 * B^{2}\overline{A^{0}} + 1 * A^{2}\overline{B^{0}} \tag{4.8}$$

$$P_{4} = 1 * B^{2} + 1 * A^{2} + 1 * \overline{A^{0}}\,\overline{B^{0}} + 0 * A^{0}\overline{B^{2}} + 0 * B^{0}\overline{A^{2}} \tag{4.9}$$

where $A^0, \overline{A^0}, A^2$ and $\overline{A^2}$ are the unary functions of A generated by a 1-to-5-line decoder DEC1; $B^0, B^1$ and $B^2$ are the unary functions of B generated by decoder DEC2; and $C^0_{in}$ and $\overline{C^0_{in}}$ are generated by an NTI and a binary inverter. The schematic diagram of a 1-to-5-line decoder is shown in Figure 4.7. It contains one NTI, one PTI, one binary inverter and one binary NOR gate for the generation of unary functions.

**Figure 4.6:** K-map of low power ternary full adder (LP-TFA) for (a) Sum (b) Carry

**Figure 4.7:** Schematic diagram of a 1-to-5-line ternary decoder

The schematic diagrams of the Sum generator and Carry generator of the CNTFET-based LP-TFA are shown in Figure 4.8.
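One way to read the Carry expressions (4.5), (4.8) and (4.9) is as a pass-transistor network in which each product term passes a constant (0 or 1) whenever all of its unary control signals are at logic 2. The sketch below follows that reading; it is an interpretation for illustration, with hypothetical helper names (`unary`, `drive`, `lp_tfa_carry`), and it takes the constant in the last term of eq. (4.9) to be 0. It checks that the conducting branches never conflict and that the result matches the Carry column of Table 4.1.

```python
from itertools import product

def unary(x):
    """Unary control signals of a ternary digit, as booleans (True = logic 2)."""
    return {"0": x == 0, "2": x == 2, "n0": x != 0, "n2": x != 2}

def drive(terms):
    """Resolve a pass network given (passed_value, branch_conducts) pairs.

    Every conducting branch must pass the same value; otherwise the node
    would be floating or contended.
    """
    values = {value for value, conducts in terms if conducts}
    if len(values) != 1:
        raise ValueError("floating or conflicting node")
    return values.pop()

def lp_tfa_carry(a, b, cin):
    A, B = unary(a), unary(b)
    # eq. (4.8): Carry for C_in = 0
    p3 = drive([(0, A["0"]), (0, B["0"]), (0, A["n2"] and B["n2"]),
                (1, B["2"] and A["n0"]), (1, A["2"] and B["n0"])])
    # eq. (4.9): Carry for C_in = 1
    p4 = drive([(1, B["2"]), (1, A["2"]), (1, A["n0"] and B["n0"]),
                (0, A["0"] and B["n2"]), (0, B["0"] and A["n2"])])
    return p3 if cin == 0 else p4  # eq. (4.5): C_in selects P3 or P4

carry_ok = all(lp_tfa_carry(a, b, c) == (a + b + c) // 3
               for a, b, c in product(range(3), range(3), range(2)))
print(carry_ok)
```

Several terms overlap (for example A = 0 and B = 0 both select the constant 0 in eq. (4.8)), but overlapping branches always pass the same value, so the output node is never contended.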
Two transistors T3 and T4 perform voltage division between 0 and $V_{dd}$ and generate $V_x$ equal to $V_{dd}/2$. The N-CNTFET pass transistor based network 1 of the Sum generator realizes $P_1$ and $P_2$ based on eq. (4.6) and (4.7), respectively. According to eq. (4.4), these signals are passed through transistors T5 and T6 to generate Sum. Similarly, the N-CNTFET pass transistor based network 2 of the Sum generator realizes $\overline{P_1}$ and $\overline{P_2}$, which are passed through transistors T7 and T8 to generate $\overline{Sum}$. Transistors T1 and T2 are utilised for level restoration of logic 2. Since level restoration is not required for logic 1, these transistors should be OFF when Sum and $\overline{Sum}$ are equal to logic 1. For this, the threshold voltage of T1 and T2 is set to 0.55 V, which is greater than the voltage level of logic 1 (0.45 V). The TB shown in Figure 4.4 is used at Sum and $\overline{Sum}$ as the output buffering circuitry.

**Figure 4.8 (a):** Schematic diagram of Sum generator of low power ternary full adder (LP-TFA)

**Figure 4.8 (b):** Schematic diagram of Carry generator of low power ternary full adder (LP-TFA)

The N-CNTFET pass transistor based network 3 of the Carry generator realizes $P_3$ and $P_4$ based on eq. (4.8) and (4.9), respectively. Then, according to eq. (4.5), these signals are passed through transistors T9 and T10 to generate Carry. This implementation reduces the transistor overhead for Carry by approximately 51% and eliminates the need for one extra power supply ($V_{dd}/2$) for logic 1, with respect to the CPL based CMOS TFA circuit presented in [161].

#### 4.2.3 Dynamic TFA (DTFA)

The pin diagram and block diagram of the third proposed TFA are shown in Figure 4.9. This TFA is designed in the dynamic logic style in order to achieve high performance and is named dynamic TFA (DTFA). DTFA has five inputs A, B, $C_{in}$, CLK and $\overline{CLK}$, and two outputs Sum and Carry.
Inputs A and B are passed through 1-to-4-line decoders (DEC1 and DEC2) to generate their unary functions. Input $C_{in}$ is passed through an NTI and a binary inverter. Two transistors T1 and T2 perform voltage division between 0 and $V_{dd}$ to generate the voltage $V_{ddm}$ (equal to $V_{dd}/2$) for logic 1 at the outputs. DTFA contains a Sum generator and a Carry generator to produce the Sum and Carry signals. The Sum generator contains three clock operated transistors T3, T4 and T5, which are responsible for the dynamic operation of DTFA. A pull-up network (PUN1) and a pull-down network (PDN1) are used to obtain logic 2 and logic 0 for Sum. DTFA uses a keeper designed for ternary logic (named ternary keeper) to alleviate charge sharing problems. In addition, a TB is used at the output to decouple the next stage gate inputs from the present stage output, as well as to provide high output driving capability without sacrificing the overall energy efficiency of the design. The Carry generator of DTFA contains two clock operated transistors T6 and T7 and only one pull-down network (PDN2), owing to the binary nature (0 and 1) of the Carry signal. Similar to the Sum generator, it uses a ternary keeper and a TB at the output.

**Figure 4.9 (a):** Pin diagram of dynamic ternary full adder (DTFA)

**Figure 4.9 (b):** Block diagram of dynamic ternary full adder (DTFA)

The schematic diagrams of the Sum generator and Carry generator of the CNTFET-based DTFA are shown in Figure 4.10. Inputs $A_n$, $\overline{A_n}$, $A_p$ and $\overline{A_p}$ are the unary functions of A generated by DEC1; $B_n$, $\overline{B_n}$, $B_p$ and $\overline{B_p}$ are the unary functions of B generated by DEC2; and the corresponding signals of $C_{in}$ are generated by an NTI followed by a binary inverter. The schematic diagram of the 1-to-4-line ternary decoder is shown in Figure 4.11. It contains one NTI, one PTI and two binary inverters for the generation of unary functions.
PUN1 and PDN1 of the Sum generator and PDN2 of the Carry generator are realized based on the truth table of the TFA given in Table 4.1. Whenever Sum must be 2 or 0, exactly one pull-up (PU) path or one pull-down (PD) path, respectively, exists in the Sum generator. Whenever Carry must be 0, a PD path exists in the Carry generator circuit. To minimize the sub-threshold leakage current, the PUN and PDN use CNTFETs with a smaller diameter, chosen as 0.626 nm [253].

**Figure 4.10 (a):** Schematic diagram of Sum generator of dynamic ternary full adder (DTFA)

**Figure 4.10 (b):** Schematic diagram of Carry generator of dynamic ternary full adder (DTFA)

**Figure 4.11:** Schematic diagram of 1-to-4-line ternary decoder

DTFA operation is divided into two phases: the precharge phase and the evaluation phase.

1) Precharge phase: When CLK = 1, Sum and Carry are precharged to $V_{ddm}$ (logic 1) through T5 and T7, respectively. During this phase, evaluation transistors T3, T4 and T6 are OFF and hence the pull-up (to logic 2) and pull-down (to logic 0) paths are disabled.
2) Evaluation phase: When CLK = 0, precharge transistors T5 and T7 are OFF and evaluation transistors T3, T4 and T6 are ON. The Sum signal is conditionally charged to logic 2 or discharged to logic 0, depending on the input values and on the pull-up and pull-down topology of the Sum generator. Similarly, the Carry signal is conditionally discharged to logic 0, depending on the input values and on the pull-down topology of the Carry generator. If all PU and PD paths are turned OFF, Sum and Carry remain at their precharged value (logic 1), which is further maintained by the ternary keepers.

The keeper circuit contains transistors T8 and T9, driven by the NTI and PTI, respectively, as shown in Figure 4.10. The keeper is activated only when Sum (or Carry) is at logic 1. T8 and T9 are sized smaller (number of CNTs = 2) than the equivalent sizing of the other pull-up and pull-down transistors so that the desired output level can be obtained.
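The two-phase operation can be summarised by a behavioural sketch of one clock cycle. This is a logic-level illustration only, under the assumption that PUN1 conducts exactly when Sum must be 2, PDN1 when Sum must be 0, and PDN2 when Carry must be 0; the function name `dtfa_cycle` is hypothetical, and timing, charge sharing and keeper strength are not modelled.

```python
from itertools import product

def dtfa_cycle(a, b, cin):
    """Return (Sum, Carry) after one precharge/evaluation cycle of DTFA."""
    # Precharge phase (CLK = 1): both outputs are set to V_ddm, i.e. logic 1.
    sum_node, carry_node = 1, 1

    # Evaluation phase (CLK = 0): conditional charge or discharge.
    total = a + b + cin
    if total % 3 == 2:
        sum_node = 2      # a pull-up path in PUN1 charges Sum to logic 2
    elif total % 3 == 0:
        sum_node = 0      # a pull-down path in PDN1 discharges Sum to logic 0
    # otherwise Sum keeps its precharged logic 1 (held by the ternary keeper)

    if total < 3:
        carry_node = 0    # a pull-down path in PDN2 discharges Carry
    # otherwise Carry keeps its precharged logic 1 (held by the ternary keeper)
    return sum_node, carry_node

dtfa_ok = all(dtfa_cycle(a, b, c) == ((a + b + c) % 3, (a + b + c) // 3)
              for a, b, c in product(range(3), range(3), range(2)))
print(dtfa_ok)
```

Because logic 1 is the precharged value, the Carry generator needs no pull-up network at all, and the Sum generator needs active paths only for logic 0 and logic 2.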
The TB shown in Figure 4.4 is used at Sum and Carry as the output buffering circuitry. In order to save eight transistors, a common NTI and PTI are shared between the ternary keeper and the TB in both the Sum generator and the Carry generator.

#### 4.2.4 Results and Discussion

In this section, the proposed TFA designs are analyzed and evaluated under various test conditions using the Synopsys HSPICE simulator with the 32 nm Stanford CNTFET model of [117], which considers practical non-idealities of the CNTFET. Details of the Stanford model have been given in section 2.2 of chapter 2. In the TFA designs, the chirality vector of the CNTFETs utilized for the voltage divider is (14, 0). The threshold voltage of these transistors is 0.392 V with a diameter of 1.096 nm. The chirality vector of the PUN and PDN transistors of DTFA is (8, 0). The threshold voltage of these transistors is 0.686 V with a diameter of 0.626 nm. The chirality vector of the remaining CNTFETs of all TFA designs is (19, 0). The threshold voltage of these transistors is 0.289 V with a diameter of 1.487 nm. The other technology parameters of the CNTFET have the same values as mentioned in section 2.2 of chapter 2.

For comparison with the proposed designs, the recently published CNTFET-based TFA designs of [212] and [213] are reproduced. Further, the design of the ternary half adder (THA) presented in [199] leads to an energy-efficient and compact design with respect to other CNTFET-based THA circuits. As a consequence, a TFA is also implemented using the design methodology of [199] and is referred to as the CNTFET-based TFA of [199]. In this TFA, $V_{dd}/2$ is generated using the same method as in LP-TFA and DTFA, in order to allow comparison with the proposed TFA designs, which do not have any extra power supply. For the reproduced designs, the chirality vector of the CNTFETs and the values of other device parameters are chosen according to the information given in the respective papers.
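The relation between the chirality vectors and the diameters and threshold voltages quoted above can be reproduced with two approximations that are widely used in the CNTFET literature: $D \approx 0.0783\sqrt{n^2 + nm + m^2}$ nm and $V_{th} \approx 0.43 / D$ V (with D in nm). These formulas and the function names below are assumptions brought in for illustration; they are not stated in this section, and the simulated devices follow the Stanford model rather than these closed forms.

```python
import math

def cnt_diameter_nm(n, m):
    """Approximate CNT diameter in nm for chirality (n, m)."""
    return 0.0783 * math.sqrt(n * n + n * m + m * m)

def cnt_vth_volts(n, m):
    """Approximate CNTFET threshold voltage in volts (inverse to diameter)."""
    return 0.43 / cnt_diameter_nm(n, m)

for chirality in [(19, 0), (14, 0), (8, 0)]:
    d = cnt_diameter_nm(*chirality)
    vth = cnt_vth_volts(*chirality)
    print(chirality, f"D = {d:.3f} nm", f"Vth = {vth:.3f} V")
```

Under these approximations, a smaller chirality index gives a thinner tube and a higher threshold voltage, which is consistent with the use of (8, 0) devices where low leakage is wanted.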
#### **Simulation Setup**

All the designs are simulated at room temperature, at 250 MHz operating frequency and at 0.9 V power supply voltage. Moreover, a capacitor of 2.1 fF is connected at all output nodes of the circuit to include loading effects. A complete input pattern including all 324 possible input transitions is applied to the circuit. In total, 387 delays, which cover all Sum and Carry transitions, are measured. For the static TFAs (HS-TFA, LP-TFA and the TFAs of [199], [212] and [213]), delay is measured from 50% of the voltage level of the input to 50% of the voltage level of the output. For DTFA, delay is calculated from 50% of the voltage level of the clock signal (CLK) to 50% of the voltage level of the output, because outputs appear only when the clock signal makes a transition from logic 2 to logic 0 for the evaluation phase. The maximum value of the measured delays is stated as the delay of the circuit.

The power consumption of the circuits includes power from the supply ($V_{dd}$) as well as from the input sources (and also from the clock for DTFA). To obtain this, the average power drawn from $V_{dd}$, the input sources and the clock is measured separately with the pattern of 324 input transitions over a long time period, and then added. Although some of the input transitions may not alter the values of the output nodes, they can cause switching activity at internal nodes, resulting in some power consumption. As a consequence, an input pattern containing all possible transitions ensures that the measured average power is an accurate estimate of the power consumption of the circuit. Due to the trade-off between power consumption and delay, circuits are also evaluated based on PDP, which is computed by multiplying the average power consumption by the maximum delay. Further, to evaluate the performance at the TALU architecture level, the data-paths available in the HO-TALU, which contain the decoder, FSB-AHO, TGB-AHE and TFA blocks, have been simulated.

#### **Evaluation of Proposed TFA**

Figure 4.12 shows the sample transient waveform of HS-TFA.
The first three waveforms represent inputs A, B and $C_{in}$, and the last two waveforms show the Sum and Carry signals, which confirm its correct operation. Transient waveforms of LP-TFA and DTFA are included in the Appendix.

**Figure 4.12:** Transient waveform of high speed ternary full adder (HS-TFA)

A comparison of delay, power consumption, PDP and device count of the CNTFET-based TFA designs is shown in Table 4.4. Simulation results show that HS-TFA is the fastest among all the CNTFET-based static designs (LP-TFA and the TFAs of [199], [212] and [213]). In comparison with the TFA of [199], HS-TFA reduces delay and device count by 9% and 50%, respectively, but consumes 69.7% more power. Compared to the design of [212], HS-TFA reduces delay and device count by 49% and 13%, respectively, but consumes 36.8% more power. In comparison with the TFA of [213], HS-TFA operates 16% faster with 87% less power and 25% fewer devices. Additionally, performance parameters extracted at the architecture level are shown in Table 2 of Appendix II.

According to the results reported above, LP-TFA has the lowest power among all the CNTFET-based TFA designs. Compared to the design of [199], LP-TFA consumes 24% less power with a 52% reduction in device count, but it has 20% more delay. In comparison with the TFA of [212], it reduces delay, power and device count by 30.5%, 66.6% and 16%, respectively. Compared to the design of [213], LP-TFA consumes 97% less power with a 28% reduction in device count, but it has 13% more delay.

Simulation results listed in Table 4.4 show that DTFA achieves the lowest delay and PDP among all the CNTFET-based designs. In comparison with the design of [199], it reduces delay, power and device count by 24%, 15% and 51%, respectively. Compared to the design of [212], DTFA gains advantages in delay, power and device count of 57.6%, 62.5% and 14%, respectively.
Similarly, it shows 98% less power, 30% less delay and 28% less device count with respect to the TFA design of [213]. According to the simulation results, the TFAs of [199] and [212] have PDP of the same order as the proposed designs, while the TFA of [213] has PDP of a higher order; therefore, the proposed TFAs are compared only with the designs of [199] and [212] in the following simulations.

**Table 4.4:** Simulation results of CNTFET-based ternary full adder (TFA) designs

| Circuits | Delay (×10<sup>-10</sup> s) | Power (×10<sup>-6</sup> W) | PDP (×10<sup>-16</sup> J) | Device Count |
|-------------------|------|------|------|-----|
| HS-TFA (proposed) | 0.73 | | 5.05 | 106 |
| LP-TFA (proposed) | | | 1.45 | 102 |
| DTFA (proposed) | 0.61 | | 0.99 | 105 |
| TFA of [199] | 0.80 | | 1.53 | 214 |
| TFA of [212] | 1.44 | | 6.26 | 122 |
| TFA of [213] | 0.87 | 53.7 | 46.7 | 142 |

As driving capability is an important parameter for a digital circuit, the proposed TFA designs are tested under different loading conditions to examine their driving capability. For this purpose, simulations are performed with output loads ranging from 2 fF to 6 fF at room temperature with 0.9 V power supply and 250 MHz operating frequency. The delay, power consumption and PDP of the TFA designs versus load capacitance are plotted in Figure 4.13. Although HS-TFA has high power consumption and high PDP variation, it shows lower delay and lower delay variation than the other static designs (LP-TFA and the TFAs of [199] and [212]) at all values of output load. Figure 4.13 shows that LP-TFA consumes less power than the other TFA designs under all loading conditions. In addition, it shows a marginal decrease in PDP variation but high delay variation with respect to the other static designs at all output loads.
It can be seen that DTFA has lower delay and PDP, as well as lower variation in both, than the other TFA designs across the different loads. Therefore, the superiority of DTFA and HS-TFA becomes more pronounced with increased load capacitance, which shows their high driving capability compared to existing TFA designs.

To examine the performance of the TFA designs at different frequencies, simulations are conducted at operating frequencies varying from 100 MHz to 1000 MHz at room temperature, with 0.9 V power supply and 2.1 fF output load. The power consumption of the ternary circuits is plotted in Figure 4.14. According to the simulation results, the proposed TFA designs work properly at the different frequencies; LP-TFA and DTFA consume less power, but HS-TFA has higher power consumption, compared to existing TFA designs.

**Figure 4.13 (a):** Delay versus output load capacitor plot for five ternary full adder (TFA) designs

**Figure 4.13 (b):** Power consumption versus output load capacitor plot for five ternary full adder (TFA) designs

**Figure 4.13 (c):** Power-delay product (PDP) versus output load capacitor plot for five ternary full adder (TFA) designs

**Figure 4.14:** Power consumption versus operating frequency plot for five ternary full adder (TFA) designs

Another important characteristic of the TFA designs which should be considered is their susceptibility to voltage and temperature variations. For this, simulations are performed at supply voltages ranging from 0.7 V to 1.1 V. The other simulation parameters are room temperature, 250 MHz frequency and 2.1 fF output load. The PDP computed from this simulation is plotted in Figure 4.15. According to the plotted results, DTFA, LP-TFA and HS-TFA are less sensitive to voltage variation than their counterparts of [199] and [212]. Further, to examine the sensitivity of the TFA designs to temperature variations, simulations are conducted at temperatures varying from 0°C to 100°C.
The other test parameters are 0.9 V supply voltage, 250 MHz frequency and 2.1 fF output load. The PDP of the TFA designs with temperature variation is plotted in Figure 4.16. It can be inferred from the results that DTFA, LP-TFA and HS-TFA operate reliably and outperform existing TFA designs over a wide range of ambient temperature. Hence, the proposed TFAs can provide a suitable solution for high performance, low power and compact TALU designs.

**Figure 4.15:** Power-delay product (PDP) versus supply voltage plot for five ternary full adder (TFA) designs

**Figure 4.16:** Power-delay product (PDP) versus temperature plot for five ternary full adder (TFA) designs

# 4.3 Design of Comparator Module

Conventional designs of the ternary comparator presented in [73], [206] and [252] generate three primary outputs, GR, LE and EQ, that indicate the A > B, A < B and A = B conditions, respectively. When A > B, outputs GR, LE and EQ become 2, 0 and 0, respectively. For A < B, outputs GR, LE and EQ are 0, 2 and 0, respectively. Similarly, when A = B, outputs GR, LE and EQ become 0, 0 and 2, respectively. It is observed that two outputs are sufficient to interpret the magnitude relationship between A and B, as shown in Table 4.5. Therefore, only the two outputs GR and EQ are considered for the response of the proposed comparator. However, this makes the decoding logic of the comparator response more complex in those applications where three outputs are desired.

**Table 4.5:** Decoding of outputs for comparison response

| GR | EQ | Result |
|----|----|---------|
| 2 | 0 | A > B |
| 0 | 2 | A = B |
| 0 | 0 | A < B |
| 2 | 2 | Invalid |

#### 4.3.1 1-bit Comparator

The pin diagram and block diagram of the proposed 1-bit comparator are shown in Figure 4.17. The 1-bit comparator compares two 1-bit ternary numbers (A and B) and generates outputs GR and EQ. Input A is passed through an NTI and a PTI to generate $A_n$ and $A_p$, respectively. Similarly, B is passed through an NTI and a PTI to generate $B_n$ and $B_p$, respectively.
The 1-bit comparator is designed in the pass transistor logic style. It contains network 1 and network 2 for the generation of GR and EQ, respectively. The truth table of the 1-bit comparator is given in Table 4.6. Based on this table, K-maps are drawn for GR and EQ and shown in Figure 4.18. From the K-maps, the switch level expressions of these outputs are derived and expressed as:

$$GR = 0 * A_n + B_n * \overline{A_n} A_p + B_p * \overline{A_p} \tag{4.10}$$

$$EQ = A_n * B_n + (0 * A_n + A_p * \overline{A_n})\, \overline{B_n} B_p + (0 * A_p + 2 * \overline{A_p})\, \overline{B_p} \tag{4.11}$$

Here, an overbar on a control signal indicates a branch that conducts when that signal is 0, which is realized with a P-CNTFET; control signals without an overbar drive N-CNTFETs.

**Figure 4.17:** 1-bit ternary comparator (a) pin diagram (b) block diagram

**Table 4.6:** Truth table of 1-bit comparator

| A | B | GR | EQ |
|---|---|----|----|
| 0 | 0 | 0 | 2 |
| 0 | 1 | 0 | 0 |
| 0 | 2 | 0 | 0 |
| 1 | 0 | 2 | 0 |
| 1 | 1 | 0 | 2 |
| 1 | 2 | 0 | 0 |
| 2 | 0 | 2 | 0 |
| 2 | 1 | 2 | 0 |
| 2 | 2 | 0 | 2 |

**Figure 4.18:** K-map for 1-bit comparator

The transistor level implementations of the pass transistor based network 1 and network 2 of the 1-bit comparator are shown in Figure 4.19. These networks are realized based on eq. (4.10) and (4.11). Since P-CNTFET and N-CNTFET devices of the same size have the same carrier mobility, and consequently the same current driving capability, both device types are utilized in the realization of the pass transistor based networks. For all possible combinations of A and B, the 1-bit comparator circuit provides a suitable path that produces the desired output logic. For instance, when A = 0, $A_n$ is 2 and transistor T1 is ON, which passes 0 to GR. Now, consider the three different values of B. When B = 0, $B_n$ is 2, transistor T5 is ON and passes $A_n$ (logic 2) to EQ. In this case, outputs GR and EQ are set to 0 and 2, respectively, which indicates the equal condition. When B = 1, $B_p$ and $B_n$ are 2 and 0, respectively. Transistors T6, T8 and T9 are ON and pass 0 to EQ. In this case, both outputs GR and EQ are set to 0, which indicates the lesser condition. Similarly, when B = 2, transistors T10 and T12 are ON and pass 0 to EQ.
Both outputs GR and EQ remain at 0 to show the lesser condition. Compared to the 1-bit comparator of [206], the proposed one reduces the number of transistors from 32 to 20 (including inverters).

**Figure 4.19:** Schematic diagram of 1-bit comparator

#### 4.3.2 Design of N-bit Comparator

An N-bit comparator compares two N-bit ternary numbers $A_{N-1}...A_1A_0$ and $B_{N-1}...B_1B_0$, and produces two outputs $GR_{[N-1:0]}$ and $EQ_{[N-1:0]}$. It is designed using the proposed 1-bit comparator blocks and the binary tree network of [206]. The 1-bit comparators generate greater and equal signals, indicated by $GR_i$ and $EQ_i$, respectively, for bit positions i = 0 ... N-1. The binary tree network contains binary grouping blocks which combine these signals to form group signals, defined as follows.

$$GR_{[2j+1:2j]} = GR_{[2j+1]} + EQ_{[2j+1]}\, GR_{[2j]} \tag{4.12}$$

$$EQ_{[2j+1:2j]} = EQ_{[2j+1]}\, EQ_{[2j]} \tag{4.13}$$

where j = 0 ... N/2 - 1. In the tree configuration, this grouping is repeated at every stage until the final greater and equal signals, indicated by $GR_{[N-1:0]}$ and $EQ_{[N-1:0]}$, are obtained.

Figure 4.20 shows the pin diagram and block diagram of a 2-bit comparator. The first stage contains two 1-bit comparator blocks which generate $GR_i$ and $EQ_i$ for i = 0, 1. The second stage contains a binary grouping block which generates the complements of $GR_{[1:0]}$ and $EQ_{[1:0]}$. To obtain $GR_{[1:0]}$ and $EQ_{[1:0]}$, two binary inverters are used at the outputs. Further, to show the implementation of an N-bit design, the example of a 4-bit comparator is considered.

**Figure 4.20:** 2-bit comparator (a) pin diagram (b) block diagram

Figure 4.21 shows the pin diagram and block diagram of a 4-bit comparator. The first stage contains four 1-bit comparator blocks which generate $GR_i$ and $EQ_i$ for i = 0 ... 3. The second stage contains two binary grouping blocks which generate $GR_{[1:0]}$, $EQ_{[1:0]}$, $GR_{[3:2]}$ and $EQ_{[3:2]}$.
The third stage contains one inverted binary grouping block which generates the final outputs $GR_{[3:0]}$ and $EQ_{[3:0]}$ .

Figure 4.21: 4-bit comparator (a) pin diagram (b) block diagram

The schematic diagrams of the binary grouping block and its inverted version are shown in Figure 4.22. They are implemented in the complementary CNTFET logic style. Although the 1-bit comparator circuitry generates degraded logic levels at its outputs due to the threshold voltage drop across N-CNTFET and P-CNTFET devices, the binary grouping block provides level restoration and, as a consequence, full logic levels are obtained at the final outputs of the N-bit design. Compared to the N-bit comparator of [206], the proposed one reduces the number of stages in the critical path by eliminating a slower NOR gate, which leads to a high-speed design.

Figure 4.22(a): Schematic diagram of binary grouping block

Figure 4.22(b): Schematic diagram of inverted binary grouping block

#### 4.3.3 Results and Discussion

In this section, the proposed 2-bit comparator is analyzed and simulated using the Synopsys HSPICE simulator with the 32 nm Stanford CNTFET model under various conditions [117]. The chirality vector of all CNTFETs used in the comparator circuit is (19, 0). The threshold voltage of these transistors is 0.289 V with a diameter of 1.487 nm. Other technology parameters of the CNTFETs have the same values as mentioned in section 2.2 of chapter 2. For comparison with the proposed comparator design, the CNTFET-based comparators of [206] and [252] are reproduced. In addition, the 2-bit comparator of [73] is implemented using the CNTFET-based ternary gates of [199], as these logic gates outperform other existing CNTFET-based gates, and is referred to as the CNTFET-based comparator of [73] for comparison. For the reproduced designs, the chirality vector of the CNTFETs and the values of other device parameters are chosen according to the information given in the respective papers.
Here, the ternary design of chapter 3, which was published in [252], is referred to as the design of [252].

#### **Simulation Setup**

Transient simulation is performed at room temperature, at 250 MHz operating frequency and at 0.9 V supply voltage. A load capacitor of 2.1 fF is used at all output nodes of the circuit. The average power consumption is measured over a long period of time. For worst-case delay determination, all possible output transition delays are measured. The delay is measured from the 50% point of the rising edge of the input data to the 50% point of the rising edge of the comparator output. On account of the trade-off between power consumption and delay, PDP is computed as the product of the average power and the worst-case delay.

#### **Evaluation of Proposed 2-bit Comparator**

Figure 4.23 shows a sample transient waveform of the 2-bit comparator. The first four waveforms represent inputs $A_0$ , $A_1$ , $B_0$ and $B_1$ , and the last two waveforms show outputs GR and EQ, which confirm its correct operation. A comparison of delay, power consumption, PDP and device count of the CNTFET-based 2-bit comparator designs is shown in Table 4.7. It can be seen from the simulation results that the proposed design achieves 16% less delay, 14% less power and 29% less PDP with 34% less device count in comparison with the comparator of [206]. Compared to the comparator of [252], the proposed one achieves reductions in delay, power, PDP and device count of 14%, 33%, 43% and 48%, respectively. In comparison with the comparator of [73], the proposed comparator gains advantages in delay, power, PDP and device count of 49%, 57%, 79% and 91%, respectively. Further, simulation results are also obtained with a different simulation setup in which a 500 MHz operating frequency, a 2 fF output load and a 20 ns transient time are used. These results are presented in [254].
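The relative savings quoted above can be recomputed directly from the entries of Table 4.7. The following Python sketch is only an illustrative cross-check: the dictionary layout and the function names `pdp`, `reduction_percent` and `savings` are assumptions introduced here, while the numbers are copied from the table.

```python
# Values copied from Table 4.7
# (delay in 1e-10 s, power in 1e-6 W, PDP in 1e-16 J).
designs = {
    "proposed": {"delay": 0.41, "power": 0.42, "pdp": 0.17, "devices": 54},
    "[206]":    {"delay": 0.49, "power": 0.49, "pdp": 0.24, "devices": 82},
    "[252]":    {"delay": 0.48, "power": 0.63, "pdp": 0.30, "devices": 104},
    "[73]":     {"delay": 0.81, "power": 0.99, "pdp": 0.80, "devices": 600},
}


def pdp(delay, power):
    """Power-delay product in units of 1e-16 J (1e-10 s * 1e-6 W)."""
    return delay * power


def reduction_percent(reference, proposed):
    """Saving of the proposed value relative to a reference value, in percent."""
    return 100.0 * (reference - proposed) / reference


def savings(reference_name):
    """Per-metric saving of the proposed design against one reference design."""
    ref, new = designs[reference_name], designs["proposed"]
    return {key: round(reduction_percent(ref[key], new[key]), 1) for key in ref}


if __name__ == "__main__":
    for name in ("[206]", "[252]", "[73]"):
        print(name, savings(name))
```

Because PDP is the product of the tabulated delay and power, `pdp(0.41, 0.42)` should agree with the tabulated PDP of the proposed design up to rounding.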
Figure 4.23: Transient waveform of 2-bit comparator

**Table 4.7:** Simulation results of 2-bit comparator circuits

| Circuits | Delay (×10<sup>-10</sup> s) | Power (×10<sup>-6</sup> W) | PDP (×10<sup>-16</sup> J) | Device count |
|---------------------|------|------|------|-----|
| Proposed comparator | 0.41 | 0.42 | 0.17 | 54 |
| Comparator of [206] | 0.49 | 0.49 | 0.24 | 82 |
| Comparator of [252] | 0.48 | 0.63 | 0.30 | 104 |
| Comparator of [73] | 0.81 | 0.99 | 0.80 | 600 |

To examine the driving capability of the comparator designs, they are simulated with output load capacitors ranging from 2 fF to 6 fF, at 250 MHz operating frequency, room temperature and 0.9 V supply voltage. The delay, power consumption and PDP of the 2-bit comparator designs against load capacitor variation are plotted in Figure 4.24. According to the plotted results, the delay and power of the proposed 2-bit comparator are lower than those of the other designs for all output load capacitors. In addition, the superiority of the proposed 2-bit comparator becomes more pronounced as the load capacitance increases, which demonstrates its high driving capability.

To evaluate the performance of the comparator designs at different frequencies, simulations are conducted at operating frequencies ranging from 100 MHz to 1000 MHz with 0.9 V supply voltage, room temperature and 2.1 fF output load. Figure 4.25 plots the power consumption of the 2-bit comparator designs versus operating frequency. Simulation results show that the proposed 2-bit comparator operates reliably and consumes less power than the other designs at all frequencies.

The comparator designs are also evaluated at different supply voltages, ranging from 0.7 V to 1.1 V. For this, simulations are performed at room temperature with 250 MHz frequency and 2.1 fF output load. The PDP values of the 2-bit comparators computed from this simulation are plotted in Figure 4.26.
It can be inferred from results that the proposed 2-bit design is robust to voltage variations, and has low PDP at all supply voltages, in comparison with other designs. Further, to examine the sensitivity of the comparator designs to temperature variations, simulations are conducted at different temperatures varying from $0^{\circ}$ C to $100^{\circ}$ C. Other test parameters are 0.9 V supply voltage, 250 MHz frequency and 2.1 fF output load. PDP of 2-bit comparator with temperature variation is plotted in Figure 4.27. The plotted result demonstrates less susceptibility of proposed 2-bit comparator design to temperature variations with respect to other designs. Simulation has also been performed to assess process variations at two extreme corners where the CNT diameter of all transistors used in the design is taken as a $\pm 10\%$ variation from the original value. Performance parameters including delay, power consumption and PDP, extracted from these simulations are shown in Table 3 of Appendix II. According to the reported results, the proposed comparator leads to an efficient design in comparison with other existing counterparts. Figure 4.24 (a): Delay versus output load capacitor plot for 2-bit comparator circuits **Figure 4.24 (b):** Power consumption versus output load capacitor plot for 2-bit comparator circuits **Figure 4.24 (c):** Power-delay product (PDP) versus output load capacitor plot for 2-bit comparator circuits **Figure 4.25:** Power consumption versus operating frequency plot for 2-bit comparator circuits **Figure 4.26:** Power-delay product (PDP) versus supply voltage plot for 2-bit comparator circuits **Figure 4.27:** Power-delay product (PDP) versus temperature plot for 2-bit comparator circuits ## 4.4 Conclusion This chapter has presented three novel designs of CNTFET-based TFA which is a basic subblock of AS functional module of 2-bit HO-TALU, using different circuit techniques. 
The first TFA, named HS-TFA, contains symmetric pull-up and pull-down networks along with a resistive voltage divider as its integral part, which is configured using transistors. Compared to the most energy-efficient TFA available in the literature, HS-TFA has high driving capability and reduces delay by 9%, but it shows high power dissipation. The second TFA, named LP-TFA, has been developed using the complementary pass transistor logic style. LP-TFA reduces power by 24% and improves PDP by 5%, but it has 20% more delay. The third TFA, named dynamic TFA (DTFA), has been implemented based on dynamic logic and uses a ternary keeper to compensate for the charge loss caused by charge sharing. DTFA has high driving capability and achieves reductions in power, delay and PDP of 24%, 15% and 35%, respectively. However, it additionally needs CNTFET devices with a smaller diameter (0.626 nm) in order to reduce charge leakage. All three TFAs have been designed based on the inherent binary nature (0 and 1) of the input carry, which leads to a reduced device count.

Further, a new design of a 1-bit comparator has been developed using pass transistor logic with a reduced number of stages in the critical delay path. This design has been used to create 2-bit and N-bit comparators, where a static binary tree configuration has been utilized to restore the voltage levels. The proposed 2-bit comparator has high driving capability and achieves a 29% reduction in PDP with 34% less device count compared to its counterpart available in the literature. However, it has only two output signals to indicate the greater, lesser and equal conditions, which makes the decoding logic of the comparator response complex in applications where three outputs (one for each condition) are desired. Apart from these, all new TFAs and the 2-bit comparator show less susceptibility to voltage and temperature variations than existing designs.
In the next chapter, design of 2-bit power optimized ternary ALU (PO-TALU) using CNTFETs is presented. 2-bit PO-TALU makes use of new complementary CNTFET-based design style and a low complexity encoder in implementation of ternary functions to achieve low power consumption. In addition, it incorporates adder-subtractor-exclusive-OR module which leads to compact TALU structure. # Design of 2-bit Power Optimized Ternary ALU (PO-TALU) using CNTFETs ## 5.1 Introduction Minimizing area, delay and cost have always been main constraints for VLSI designers, but recently, reducing the power consumption has also received considerable attention due to increasing values of integration density, and the need of portable and reliable circuits. There is a tremendous interest towards compact and portable applications such as notebook and laptop computers, which require high throughput and immensely increased capability. Therefore, low power consumption has become one of the important constraints in designing of modern processor. Further, an ALU is an important part of a digital computer where it performs all arithmetic and logical operations. Modern CPU and graphics processing unit need very powerful ALU. This chapter presents a design of 2-bit power optimized ternary ALU (PO-TALU) using CNTFETs. 2-bit PO-TALU functional modules: adder-subtractor-exclusive-OR (ASE) and multiplier, are designed using new complementary CNTFET-based binary computational unit and a low complexity encoder (in comparison with prior design of [213] and [252]). ASE eliminates exclusive-OR and subtractor modules from the conventional architecture. Multiplier uses a new efficient carry-add (CA) block in place of ternary half adder. As a result, PO-TALU design gets significant improvements in terms of power and power-delay product (PDP) with device count compared to existing designs. Design of 2-bit PO-TALU slice is shown so that parallel N-bit PO-TALU can be constructed with N/2 slices connected in cascade. 
The rest of the chapter is organized as follows. In section 5.2, the architecture and functions of PO-TALU are presented. Section 5.3 demonstrates minimization and realization of PO-TALU ternary functions. In section 5.4, design and implementation of PO-TALU functional modules are described. Section 5.5 presents the extension of PO-TALU to a 2-bit PO-TALU slice. Section 5.6 gives simulation results and a comparison with existing CNTFET-based designs, followed by the conclusion in section 5.7.

#### 5.2 Architecture & Functions of 2-bit PO-TALU

Figure 5.1 shows the pin diagram and architecture of the proposed 2-bit PO-TALU. The ternary data inputs from A (A<sub>1</sub> A<sub>0</sub>) are combined with the ternary data inputs from B (B<sub>1</sub> B<sub>0</sub>) and operations are performed to generate one of the following outputs: Sum/Diff/XOR & Carry/Borrow, PROD, GR & EQ, 'A.B' and 'A+B'. Two select inputs (S<sub>1</sub> and S<sub>0</sub>) are used to select one desired operation, as described in Table 5.1. PO-TALU performs six arithmetic and three logic operations. The arithmetic operations include addition (ADD), subtraction (SUB), increment by logic 1 (INC), decrement by logic 1 (DEC), multiplication (MUL) and comparison (COMP). The logic functions include AND, exclusive-OR (XOR) and OR. Compared to the TALU architecture of [252], the proposed one eliminates the exclusive-OR module and performs two more arithmetic functions, INC and DEC, without adding any extra functional module. Moreover, the logic 1 required for the INC and DEC operations is generated by using two constantly switched-on transistors T1 and T2. T1 (P-CNTFET) and T2 (N-CNTFET), with the same geometry and equal threshold voltage, present the same resistance due to the equal carrier mobility of N-type and P-type CNTFETs. As a consequence, they can be utilized to perform voltage division for the generation of V<sub>dd</sub>/2 (i.e. the voltage level of logic 1). Hence, PO-TALU does not require an extra power supply for V<sub>dd</sub>/2.
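The divider argument above can be written as a one-line worked equation. This is an idealised sketch, not taken from the source: it assumes T1 sits between V<sub>dd</sub> and the logic 1 node, T2 between that node and ground, that both behave as equal on-resistances $R_{T1} = R_{T2} = R$, and that the node is not loaded.

$$V_{\text{logic 1}} = V_{dd} \, \frac{R_{T2}}{R_{T1} + R_{T2}} = V_{dd} \, \frac{R}{2R} = \frac{V_{dd}}{2}$$

For example, with the 0.9 V supply used in the chapter 4 simulations, the node would sit at about 0.45 V under these assumptions; any mismatch between the two devices, or loading of the node, would shift this level.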
The term 'Bi' is used to refer to binary logic gates in the ternary system. The PO-TALU design is composed of the following main components: a 1-to-6-line ternary decoder, a function select logic block with active low outputs (FSB-ALO), a transmission gate block with active low enable (TGB-ALE) and separate functional modules such as ASE, multiplier and comparator.

Figure 5.1 (a): Pin diagram of 2-bit PO-TALU

Figure 5.1 (b): Architecture of 2-bit PO-TALU

Table 5.1: Function table of 2-bit PO-TALU

| $S_1$ | $S_0$ | Function |
|-------|-------|----------|
| 0 | 0 | ADD |
| 0 | 1 | SUB |
| 0 | 2 | INC |
| 1 | 0 | DEC |
| 1 | 1 | MUL |
| 1 | 2 | COMP |
| 2 | 0 | XOR |
| 2 | 1 | AND |
| 2 | 2 | OR |

#### 1-to-6-line Ternary Decoder

The 1-to-6-line ternary decoder presented in chapter 4 (Figure 4.2) is used here and shown in Figure 5.2. It contains one negative ternary inverter (NTI), one positive ternary inverter (PTI), three binary inverters and one binary NOR gate for the generation of unary functions. As shown in Figure 5.1(b), DEC1 (1-to-6-line decoder) generates unary functions $A_0^0$ , $A_0^1$ , $A_0^2$ , $\overline{A_0^0}$ , $\overline{A_0^1}$ and $\overline{A_0^2}$ for $A_0$ , and DEC2 generates $A_1^0$ , $A_1^1$ , $A_1^2$ , $\overline{A_1^0}$ , $\overline{A_1^1}$ and $\overline{A_1^2}$ for $A_1$ . Similarly, DEC3 generates $B_0^0$ , $B_0^1$ , $B_0^2$ , $\overline{B_0^0}$ , $\overline{B_0^1}$ and $\overline{B_0^2}$ for $B_0$ , and DEC4 generates $B_1^0$ , $B_1^1$ , $B_1^2$ , $\overline{B_1^0}$ , $\overline{B_1^1}$ and $\overline{B_1^2}$ for $B_1$ . These unary functions are fed to the different functional modules to produce the desired outputs.

Figure 5.2: Logic level diagram of 1-to-6-line ternary decoder

## **Function Select Logic Block with Active Low Outputs (FSB-ALO)**

The logic level diagram of the function select logic block with active low outputs (FSB-ALO) is shown in Figure 5.3.
FSB-ALO has two select inputs $S_0$ and $S_1$ , and nine active low outputs ADD, SUB, INC, DEC, MUL, COMP, XOR, AND & OR. It comprises binary NAND gates and two 1-to-3-line decoders (DEC1 and DEC2). Here, the decoder design of [199] is used because only the unary functions are required for $S_1$ and $S_0$ . DEC1 produces signals $S_0^0$ , $S_0^1$ and $S_0^2$ for $S_0$ . Similarly, DEC2 produces signals $S_1^0$ , $S_1^1$ and $S_1^2$ for $S_1$ . These signals are applied to the NAND gates to produce the desired outputs. FSB-ALO selects one particular TALU operation depending on the combination of $S_0$ and $S_1$ , as described in Table 5.1. Consider the case when $S_1S_0 = 12$ . $S_1^0$ , $S_1^1$ and $S_1^2$ are 0, 2 and 0, respectively. Similarly, $S_0^0$ , $S_0^1$ and $S_0^2$ are 0, 0 and 2, respectively. In the array of NAND gates, the NAND gate that drives COMP makes its output equal to logic 0 because both of its inputs ( $S_1^1$ and $S_0^2$ ) are equal to logic 2. All other NAND gates have one (or more) input equal to logic 0, which makes their outputs equal to logic 2. The active low 'COMP' further enables TGB-ALE3 to pass input data for the comparison operation, as shown in Figure 5.1(b). Thus, for each possible combination of $S_0$ and $S_1$ , exactly one output is active low, corresponding to the respective TALU function.

**Figure 5.3:** Logic level diagram of function select logic block with active low outputs (FSB-ALO)

#### **Transmission Gate Block with Active Low Enable (TGB-ALE)**

Figure 5.4 shows the logic level diagram of the transmission gate block with active low enable (TGB-ALE). A TGB-ALE contains an array of transmission gates (TGs), which connect the decoder output lines generated for inputs A and B to the data inputs of a functional block. This array is activated when the enable input (EN) is low. In Figure 5.1(b), the number of TGs used in the array is mentioned with each individual TGB-ALE. A TG is implemented using the parallel connection of a P-CNTFET and an N-CNTFET.
In a TG array, the gate of P-CNTFET of all TGs is connected to EN and the gate of all N-CNTFETs is connected to $\overline{\text{EN}}$ which is generated by using a binary inverter. When EN is equal to logic 0, the P-CNTFET gate is at ground and the N-CNTFET gate is at $V_{dd}$ , thereby, both transistors conduct and there is a closed path between input (I/P) and output (O/P) of TG. When EN is equal to logic 2, the P- CNTFET gate is at $V_{dd}$ and the N-CNTFET gate is at ground, both transistors are OFF and there is an open circuit between I/P and O/P of TG. Thus, the active low value of EN enables TGB-ALE. TGB-ALE gets value of EN from one or more outputs of FSB-ALO through either some logic or directly, as shown in Figure 5.1(b). FSB-ALO outputs MUL, COMP, AND & OR are connected directly to EN2 of TGB-ALE2, EN3 of TGB-ALE3, EN4 of TGB-ALE4 and EN5 of TGB-ALE5, respectively. Since TGB-ALE1 is associated with the ASE functional module which performs five operations ADD, SUB, INC, DEC and XOR, its enable input EN1 must be active low whenever one of these operations is desired. For this, a small logic circuit containing three binary NAND gates and one binary NOR gate is added with FSB-ALO. FSB-ALO outputs ADD and XOR are applied to the first NAND gate G0, SUB and DEC are applied to the second NAND gate G1 and, INC and DEC are connected to third NAND gate G2. The outputs of gate G0, G1 and G2 are then passed to a NOR gate G3 which generates EN1 of TGB-ALE1. Once the active low value of EN enabled a TGB-ALE, ternary input data values are transferred to the respective functional module and desired operation is performed. Further, INC and DEC functions of the PO-TALU are expressed as 'A+1' and 'A-1' respectively. These operations are executed by ASE module which performs 'A+B' and 'A-B'; therefore, B (B<sub>1</sub>B<sub>0</sub>) should be set to logic 1. To set B<sub>1</sub>B<sub>0</sub> as 01, two N-CNTFETs T3 and T4 with two TGs: TG1 and TG2 are used. 
As shown in Figure 5.1(b), the gate of T3, T4 and TG transistors are connected to the output of gate G2. Whenever FSB-ALO output INC (or DEC) is made active low, output of gate G2 is set to logic 2, T3 and T4 are activated while TG1 and TG2 are disabled, which pass logic 0 (ground) and logic 1 (i.e. $V_{dd}$ /2 generated by T1 and T2) in place of B<sub>1</sub> and B<sub>0</sub>, respectively, to the decoders. In this way, logic 1 is passed to the ASE module to perform increment and decrement by logic 1. When output of gate G2 is set to logic 0, T3 and T4 are OFF and, TG1 and TG2 are activated which pass B<sub>1</sub> and B<sub>0</sub> to the decoders so that desired operation can be performed on input data A (A<sub>1</sub>A<sub>0</sub>) and B (B<sub>1</sub>B<sub>0</sub>). **Figure 5.4:** Logic level diagram of transmission gate block with active low enable (TGB-ALE) #### 5.3 Minimization and Realization of 2-bit PO-TALU Functions 2-bit PO-TALU uses ternary K-map method for ternary function minimization. The detail of this K-map method is provided in chapter 3. Further, a conventional method for realization of ternary function employs a ternary to binary decoder; a binary computation unit and an encoder for converting binary outputs back to ternary outputs. It is observed that encoder and computation circuit can be improved upon (when compared to ternary designs presented in [213] and [252]). Figure 5.5 shows the block diagram of ternary function implementation for PO-TALU. In order to achieve low power design, its functional modules: ASE and multiplier are designed using complementary CNTFET-based binary computation unit and pass transistor-based encoder circuit. Design of encoder is described below. Figure 5.5: Ternary function implementation for 2-bit PO-TALU #### **Design of Ternary Encoder** Figure 5.6 shows the design of ternary encoder and its symbol. 
This design contains two transistors T1 and T2, which have identical parameters and operate as a resistive voltage divider. As a consequence, the output node 'Out' can be expressed as:

$$Out = \frac{X_1 + X_2}{2}$$ (5.1)

Here, $X_1$ and $X_2$ are encoder inputs having binary nature (0 and 2). The truth table of the encoder is shown in Table 5.2. According to (5.1), Out is the average value of $X_1$ and $X_2$ . If $X_1$ and $X_2$ are both 0, Out is also equal to 0. In the same manner, if $X_1$ and $X_2$ are both 2, Out is also 2. Finally, if $X_1$ and $X_2$ are 0 and 2, respectively, Out is equal to 1. The same result occurs if $X_1$ and $X_2$ are 2 and 0, respectively. Further, the proposed encoder design uses pass transistor logic and eliminates a direct path from $V_{dd}$ to ground, which results in less power dissipation compared to the encoder design presented in [213]. In this way, the proposed encoder leads to low power designs of the TALU functional modules (as shown in section 5.6.2).

**Table 5.2:** Truth table of ternary encoder

| $X_1$ | $X_2$ | Out |
|-------|-------|-----|
| 0 | 0 | 0 |
| 0 | 2 | 1 |
| 2 | 2 | 2 |

Figure 5.6: Design of ternary encoder

# 5.4 Design & Implementation of 2-bit PO-TALU Functional Module

## 5.4.1 Adder-Subtractor-Exclusive-OR (ASE) Module

The proposed ASE module differs from the adder-subtractor (AS) module of [252] in that it also performs the XOR operation, along with addition and subtraction, using one common adder structure, by utilizing the concept of Modulo-3 addition of ternary numbers [251]. Figure 5.7 shows the block diagram of the proposed ASE module, where operations are performed on $A_1A_0$ and $B_1B_0$ . Outputs $S_0/D_0/E_0$ and $S_1/D_1/E_1$ represent the least significant bit (LSB) and most significant bit (MSB) of the Sum/Difference/XOR output of PO-TALU, and output $C_1/B_1$ represents the Carry/Borrow output of PO-TALU.
$M_0$ and $M_1$ are binary mode inputs used for operation selection of the ASE module, as demonstrated in Table 5.3.

Figure 5.7: Block diagram of adder-subtractor-exclusive-OR (ASE) module

**Table 5.3:** Function select table for ASE module

| $M_0$ | $M_1$ | Operation |
|-------|-------|-------------|
| 0 | 2 | Addition |
| 2 | 2 | Subtraction |
| 0 | 0 | XOR |

As shown in Figure 5.1(b), signal $M_0$ is generated through the binary NAND gate G1, whose inputs are SUB and DEC, and $M_1$ is directly connected to the XOR output of FSB-ALO. For the different operations, the following conditions hold, based on the logic states of $M_0$ and $M_1$ .

- a) The value of $M_0$ is logic 2 whenever SUB (or DEC) is active low; otherwise it is logic 0.
- b) The value of $M_1$ is logic 0 whenever XOR is active low; otherwise it is logic 2.
- c) When the ADD (or INC) function is selected, $M_0 = 0$ and $M_1 = 2$ , which cause the ASE module to perform addition.
- d) For the SUB (or DEC) function, both $M_0$ and $M_1$ are logic 2 and the ASE module works as a subtractor.
- e) If the XOR function is selected, both $M_0$ and $M_1$ are logic 0 and the ASE module performs addition with zero output carry.

The ASE module contains two sub-blocks, half adder-subtractor-exclusive-OR (HASE) and full adder-subtractor-exclusive-OR (FASE), to execute operations on 2-bit ternary numbers. As shown in Figure 5.7, HASE performs operations on $A_0$ and $B_0$ , and generates $S_0/D_0/E_0$ with $C_0/B_0$ . The value of $C_0/B_0$ is transferred to FASE, which adds $A_1$ and $B_1$ with $C_0/B_0$ , and produces $S_1/D_1/E_1$ and $C_1/B_1$ .

#### Half Adder-Subtractor-Exclusive-OR (HASE) Block

The proposed HASE performs addition, subtraction and XOR operations using only a ternary half adder (THA) circuit, with the help of multiplexers and pass transistors. The design of HASE is shown in Figure 5.8.
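As a behavioural summary of the mode selection in Table 5.3 and of what the HASE block is expected to compute, the following Python sketch may help. It is an arithmetic model only, not the transistor-level design; the function names `nand`, `mode_inputs` and `hase` are hypothetical, and the FSB-ALO lines are modelled simply as one active-low (0) line among lines at logic 2.

```python
LOW, HIGH = 0, 2  # binary signals use the ternary logic levels 0 and 2


def nand(*inputs):
    """Binary NAND on 0/2 logic levels."""
    return LOW if all(x == HIGH for x in inputs) else HIGH


def mode_inputs(selected):
    """Return (M0, M1) for the selected ASE operation name.

    FSB-ALO outputs are active low: the selected line is 0, the others are 2.
    """
    lines = {op: (LOW if op == selected else HIGH)
             for op in ("ADD", "SUB", "INC", "DEC", "XOR")}
    m0 = nand(lines["SUB"], lines["DEC"])  # gate G1
    m1 = lines["XOR"]                      # direct connection to XOR output
    return m0, m1


def hase(a0, b0, m0, m1):
    """Behavioural model of HASE for ternary digits a0, b0 in {0, 1, 2}.

    Returns (S0/D0/E0, C0/B0). Only the three mode combinations of
    Table 5.3 are meaningful inputs.
    """
    if m0 == HIGH:                        # subtraction
        digit = (a0 - b0) % 3
        carry = 1 if a0 < b0 else 0       # borrow
    else:                                 # addition or XOR (modulo-3 sum)
        digit = (a0 + b0) % 3
        carry = 1 if a0 + b0 >= 3 else 0
    if m1 == LOW:                         # XOR mode forces C0/B0 to logic 0
        carry = 0
    return digit, carry
```

For instance, `hase(2, 2, *mode_inputs("ADD"))` models 2 + 2, while the same operands with `mode_inputs("XOR")` give the same digit with the carry suppressed.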
Based on the ternary addition rules given in Table 5.4 and on Table 5.2 (the truth table of the encoder), the proposed design finds encoder input variables $X_{1S}$ and $X_{2S}$ for the Sum (S<sub>0</sub>) output and $X_{1C}$ and $X_{2C}$ for the Carry (C<sub>0</sub>) output of THA. The values of $X_{1S}$ , $X_{2S}$ , $X_{1C}$ and $X_{2C}$ for different input combinations are also included in Table 5.4. The K-maps of $X_{1S}$ , $X_{2S}$ , $X_{1C}$ and $X_{2C}$ are shown in Figure 5.9. From the K-maps, the simplified expressions of these encoder inputs are derived and expressed as follows.

$$X_{1S} = \overline{\overline{A_0^0} B_0^2 + \overline{A_0^1} B_0^1 + \overline{A_0^2} B_0^0}, \quad X_{2S} = \overline{A_0^1 B_0^2 + A_0^2 B_0^1 + A_0^0 B_0^0}$$

$$X_{1C} = 0, \quad X_{2C} = \overline{A_0^0 + B_0^0 + A_0^1 B_0^1}$$ (5.2)

Table 5.4: Addition rules for ternary half adder (THA)

| $A_0$ | $B_0$ | $X_{1S}$ | $X_{2S}$ | S<sub>0</sub> (Sum) | $X_{1C}$ | $X_{2C}$ | C<sub>0</sub> (Carry) |
|-------|-------|----------|----------|---------------------|----------|----------|-----------------------|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 1 | 0 | 2 | 1 | 0 | 0 | 0 |
| 0 | 2 | 2 | 2 | 2 | 0 | 0 | 0 |
| 1 | 1 | 2 | 2 | 2 | 0 | 0 | 0 |
| 1 | 2 | 0 | 0 | 0 | 0 | 2 | 1 |
| 2 | 2 | 0 | 2 | 1 | 0 | 2 | 1 |

**Figure 5.8 (a)**: Schematic diagram of $S_0/D_0/E_0$ generator of half adder-subtractor-exclusive-OR (HASE)

**Figure 5.8 (b):** Schematic diagram of $C_0/B_0$ generator of half adder-subtractor-exclusive-OR (HASE)

Figure 5.9: K-maps of $X_{1S}$ , $X_{2S}$ , $X_{1C}$ and $X_{2C}$ for ternary half adder (THA)

For the subtraction operation, based on the ternary subtraction rules provided in Table 5.5 and on Table 5.2, encoder input variables $X_{1D}$ & $X_{2D}$ for Difference ( $D_0$ ) generation and $X_{1B}$ & $X_{2B}$ for Borrow ( $B_0$ ) generation are found.
Table 5.5 also includes the values of $X_{1D}$ , $X_{2D}$ , $X_{1B}$ and $X_{2B}$ . The K-maps of $X_{1D}$ , $X_{2D}$ , $X_{1B}$ and $X_{2B}$ are shown in Figure 5.10. From the K-maps, the simplified expressions of these encoder inputs are derived and expressed as:

$$X_{1D} = \overline{\overline{A_0^0} B_0^1 + \overline{A_0^1} B_0^2 + \overline{A_0^2} B_0^0}, \quad X_{2D} = \overline{A_0^1 B_0^1 + A_0^2 B_0^2 + A_0^0 B_0^0}$$

$$X_{1B} = 0, \quad X_{2B} = \overline{A_0^2 + B_0^0 + A_0^1 B_0^1}$$ (5.3)

**Table 5.5**: Subtraction rules for ternary half subtractor (THS)

| $A_0$ | $B_0$ | $X_{1D}$ | $X_{2D}$ | D<sub>0</sub> (Difference) | $X_{1B}$ | $X_{2B}$ | B<sub>0</sub> (Borrow) |
|-------|-------|----------|----------|----------------------------|----------|----------|------------------------|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 1 | 2 | 2 | 2 | 0 | 2 | 1 |
| 0 | 2 | 0 | 2 | 1 | 0 | 2 | 1 |
| 1 | 0 | 0 | 2 | 1 | 0 | 0 | 0 |
| 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 2 | 2 | 2 | 2 | 0 | 2 | 1 |
| 2 | 0 | 2 | 2 | 2 | 0 | 0 | 0 |
| 2 | 1 | 0 | 2 | 1 | 0 | 0 | 0 |
| 2 | 2 | 0 | 0 | 0 | 0 | 0 | 0 |

According to (5.2) and (5.3), THA can also produce $X_{1D}$ , $X_{2D}$ , $X_{1B}$ and $X_{2B}$ with a proper selection of input variables. As a consequence, HASE adds multiplexers to THA to perform the subtraction operation, utilizing the design concept of HAS presented in [252]. For the XOR function, it adds two transistors T1 and T2 with a binary inverter at the output of Encoder2 to make its Carry/Borrow output ( $C_0/B_0$ ) equal to logic 0, so that the propagated Carry does not affect the Modulo-3 addition of $A_1$ and $B_1$ in FASE. When mode input $M_1$ = 2, T1 is ON and T2 is OFF, so the output of Encoder2 (the generated Carry/Borrow) is passed to $C_0/B_0$ for the addition and subtraction operations. When $M_1$ = 0, T1 is OFF and T2 is ON, so logic 0 is passed to $C_0/B_0$ for the XOR operation.
In this way, when $M_0$ = 0 and $M_1$ = 2, HASE operates as THA and computes the functions given by (5.2) for the encoder inputs, which are then passed through Encoder1 and Encoder2 to produce the final Sum and Carry outputs, respectively. Similarly, when $M_0$ = 2 and $M_1 = 2$ , HASE behaves as a ternary half subtractor (THS) and computes the functions given by (5.3) for the encoder inputs; the encoder outputs then give the final subtraction outputs. Finally, when $M_0 = 0$ and $M_1 = 0$ , HASE operates as an XOR circuit and performs modulo-3 addition with zero output Carry.

**Figure 5.10:** K-maps of $X_{1D}$ , $X_{2D}$ , $X_{1B}$ and $X_{2B}$ for ternary half subtractor (THS)

#### Full Adder-Subtractor-Exclusive-OR (FASE) Block

The proposed FASE performs addition, subtraction and XOR operations using only one ternary full adder (TFA) circuit, with the help of multiplexers and pass transistors. A TFA adds three bits, of which two are significant bits (1-bit ternary numbers) and the third is the Carry bit ( $C_0$ ) generated by the previous bit addition during N-bit operation. The maximum sum of two 1-bit ternary numbers is 4 at the least significant position and 5 at other positions (where the incoming Carry is included), which limits the maximum value of $C_0$ to logic 1. Therefore, $C_0$ never takes the value logic 2 in ternary addition [257]. Using this concept, the TFA is designed based on the binary nature (0 and 1) of $C_0$ . Consequently, the proposed TFA eliminates the need for a 1-to-6-line decoder for $C_0$ and uses only one NTI and a binary inverter to generate $C_0^0$ and $C_0^1$ . The design of FASE is shown in Figure 5.11. Similar to HASE, the proposed FASE implements encoder input variables ( $X_{3S}$ , $X_{4S}$ , $X_{3C}$ and $X_{4C}$ ) for the realization of TFA, and adds multiplexers and two transistors (with one binary inverter) at the output of Encoder2 to construct the subtractor and exclusive-OR structures, respectively. Design rules for the addition operation, including $X_{3S}$ , $X_{4S}$ , $X_{3C}$ and $X_{4C}$ , are given in Table 5.6.
The K-maps for $X_{3S}$, $X_{4S}$, $X_{3C}$ and $X_{4C}$ are shown in Figure 5.12. From the K-maps, the simplified expressions of these encoder inputs are derived as follows.

$$X_{3S} = \overline{C_0^0 Y_1 + C_0^1 Y_2}, \qquad X_{4S} = \overline{C_0^0 \overline{Y_2} + C_0^1 \overline{Y_3}}$$

$$X_{3C} = 0, \qquad X_{4C} = \overline{B_1^0 \overline{A_1^2} + A_1^0 \overline{B_1^2} + C_0^0 [(\overline{A_1^2} + \overline{B_1^0}) (A_1^0 + \overline{B_1^2})]}$$ (5.4)

where

$$Y_1 = \overline{(\overline{A_1^0} + \overline{B_1^0}) (\overline{A_1^1} + \overline{B_1^2}) (\overline{A_1^2} + \overline{B_1^1})}, \qquad Y_2 = \overline{(\overline{A_1^0} + \overline{B_1^2}) (\overline{A_1^1} + \overline{B_1^1}) (\overline{A_1^2} + \overline{B_1^0})}$$

$$Y_3 = \overline{(\overline{A_1^0} + \overline{B_1^1}) (\overline{A_1^1} + \overline{B_1^0}) (\overline{A_1^2} + \overline{B_1^2})}$$

**Figure 5.11 (a):** Schematic diagram of $S_1/D_1/E_1$ generator of full adder-subtractor-exclusive OR (FASE)

**Figure 5.11 (b):** Schematic diagram of $C_1/B_1$ generator of full adder-subtractor-exclusive OR (FASE)

**Figure 5.12 (a):** K-maps of $X_{3S}$ and $X_{4S}$ for ternary full adder (TFA)

**Table 5.6**: Addition rules for ternary full adder (TFA)

| $A_1$ | $B_1$ | $C_0$ | $X_{3S}$ | $X_{4S}$ | $S_1$ (Sum) | $X_{3C}$ | $X_{4C}$ | $C_1$ (Carry) |
|-------|-------|-------|----------|----------|-------------|----------|----------|---------------|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | 0 | 2 | 1 | 0 | 0 | 0 |
| 0 | 1 | 0 | 0 | 2 | 1 | 0 | 0 | 0 |
| 0 | 1 | 1 | 2 | 2 | 2 | 0 | 0 | 0 |
| 0 | 2 | 0 | 2 | 2 | 2 | 0 | 0 | 0 |
| 0 | 2 | 1 | 0 | 0 | 0 | 0 | 2 | 1 |
| 1 | 0 | 0 | 0 | 2 | 1 | 0 | 0 | 0 |
| 1 | 0 | 1 | 2 | 2 | 2 | 0 | 0 | 0 |
| 1 | 1 | 0 | 2 | 2 | 2 | 0 | 0 | 0 |
| 1 | 1 | 1 | 0 | 0 | 0 | 0 | 2 | 1 |
| 1 | 2 | 0 | 0 | 0 | 0 | 0 | 2 | 1 |
| 1 | 2 | 1 | 0 | 2 | 1 | 0 | 2 | 1 |
| 2 | 0 | 0 | 2 | 2 | 2 | 0 | 0 | 0 |
| 2 | 0 | 1 | 0 | 0 | 0 | 0 | 2 | 1 |
| 2 | 1 | 0 | 0 | 0 | 0 | 0 | 2 | 1 |
| 2 | 1 | 1 | 0 | 2 | 1 | 0 | 2 | 1 |
| 2 | 2 | 0 | 0 | 2 | 1 | 0 | 2 | 1 |
| 2 | 2 | 1 | 2 | 2 | 2 | 0 | 2 | 1 |

**Figure 5.12 (b):** K-maps of $X_{3C}$ and $X_{4C}$ for ternary full adder (TFA)

To examine the structural symmetry between the TFA and the ternary full subtractor (TFS), the encoder input variables $X_{3D}$ and $X_{4D}$ for the Difference ($D_1$) output, and $X_{3B}$ and $X_{4B}$ for the Borrow ($B_1$) output of the TFS, are found from the design rules for subtraction given in Table 5.7. This table also includes the values of $X_{3D}$, $X_{4D}$, $X_{3B}$ and $X_{4B}$. The K-maps for these encoder inputs are shown in Figure 5.13. From the K-maps, the simplified expressions are derived as follows.

$$X_{3D} = \overline{C_0^0 Z_2 + C_0^1 Z_1}, \qquad X_{4D} = \overline{C_0^0 \overline{Z_3} + C_0^1 \overline{Z_2}}$$

$$X_{3B} = 0, \qquad X_{4B} = \overline{B_1^0 \overline{A_1^0} + A_1^2 \overline{B_1^2} + C_0^0 [(\overline{A_1^0} + \overline{B_1^0}) (A_1^2 + \overline{B_1^2})]}$$ (5.5)

where

$$Z_1 = \overline{(\overline{A_1^0} + \overline{B_1^2}) (\overline{A_1^1} + \overline{B_1^0}) (\overline{A_1^2} + \overline{B_1^1})}, \qquad Z_2 = \overline{(\overline{A_1^0} + \overline{B_1^0}) (\overline{A_1^1} + \overline{B_1^1}) (\overline{A_1^2} + \overline{B_1^2})}$$

$$Z_3 = \overline{(\overline{A_1^0} + \overline{B_1^1}) (\overline{A_1^1} + \overline{B_1^2}) (\overline{A_1^2} + \overline{B_1^0})}$$

Eqs. (5.4) and (5.5) confirm that the TFA and the TFS share the same schematic and differ only in their inputs.
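As a cross-check on Tables 5.6 and 5.7, the rows can be regenerated from ordinary base-3 arithmetic. The sketch below is a behavioural model only, not the gate-level circuit of Eqs. (5.4) and (5.5). The function names are hypothetical, and the encoder mapping (0, 0) to 0, (0, 2) to 1 and (2, 2) to 2 is an assumption inferred from the table columns rather than taken from Table 5.2.

```python
def encoder_inputs(value):
    """Return the (X3, X4) pair that the tables list next to a ternary output."""
    x3 = 2 if value == 2 else 0   # high only when the output is 2
    x4 = 2 if value >= 1 else 0   # high when the output is 1 or 2
    return x3, x4


def encode(x3, x4):
    """Assumed encoder behaviour: (0,0) -> 0, (0,2) -> 1, (2,2) -> 2."""
    return (x3 + x4) // 2


def tfa_row(a, b, c):
    """One row of Table 5.6: (X3S, X4S, S1, X3C, X4C, C1)."""
    total = a + b + c
    s, carry = total % 3, total // 3
    return (*encoder_inputs(s), s, *encoder_inputs(carry), carry)


def tfs_row(a, b, c):
    """One row of Table 5.7: (X3D, X4D, D1, X3B, X4B, B1)."""
    diff = a - b - c
    d = diff % 3                      # Python's % keeps the result in 0..2
    borrow = 1 if diff < 0 else 0
    return (*encoder_inputs(d), d, *encoder_inputs(borrow), borrow)


if __name__ == "__main__":
    for a in range(3):
        for b in range(3):
            for c in range(2):        # the carry/borrow input is only 0 or 1
                print(a, b, c, tfa_row(a, b, c), tfs_row(a, b, c))
```

Running the loop prints the 18 input combinations in the same order as the two tables, which makes a row-by-row comparison straightforward.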
Depending on the values of the mode inputs $M_0$ and $M_1$, the proposed FASE receives selective inputs and accordingly computes the functions given by Eqs. (5.4) and (5.5) to perform addition and subtraction, respectively. For the XOR operation, FASE adds $A_1$ and $B_1$ with $C_0$ equal to logic 0, which is generated by the HASE. In this way, it completes the modulo-3 addition that realizes the XOR function of $A_1$ and $B_1$, with a zero output carry. The operating mechanism of the proposed FASE is similar to that of the proposed HASE.

**Table 5.7**: Subtraction rules for ternary full subtractor (TFS)

| $A_1$ | $B_1$ | $C_0$ | $X_{3D}$ | $X_{4D}$ | $D_1$ (Difference) | $X_{3B}$ | $X_{4B}$ | $B_1$ (Borrow) |
|-------|-------|-------|----------|----------|--------------------|----------|----------|----------------|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | 2 | 2 | 2 | 0 | 2 | 1 |
| 0 | 1 | 0 | 2 | 2 | 2 | 0 | 2 | 1 |
| 0 | 1 | 1 | 0 | 2 | 1 | 0 | 2 | 1 |
| 0 | 2 | 0 | 0 | 2 | 1 | 0 | 2 | 1 |
| 0 | 2 | 1 | 0 | 0 | 0 | 0 | 2 | 1 |
| 1 | 0 | 0 | 0 | 2 | 1 | 0 | 0 | 0 |
| 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 1 | 1 | 2 | 2 | 2 | 0 | 2 | 1 |
| 1 | 2 | 0 | 2 | 2 | 2 | 0 | 2 | 1 |
| 1 | 2 | 1 | 0 | 2 | 1 | 0 | 2 | 1 |
| 2 | 0 | 0 | 2 | 2 | 2 | 0 | 0 | 0 |
| 2 | 0 | 1 | 0 | 2 | 1 | 0 | 0 | 0 |
| 2 | 1 | 0 | 0 | 2 | 1 | 0 | 0 | 0 |
| 2 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 2 | 1 | 2 | 2 | 2 | 0 | 2 | 1 |

**Figure 5.13 (a):** K-maps of $X_{3D}$ and $X_{4D}$ for ternary full subtractor (TFS)

**Figure 5.13 (b):** K-maps of $X_{3B}$ and $X_{4B}$ for ternary full subtractor (TFS)

It is worth mentioning that, by performing the addition, subtraction and XOR operations in a single ASE module, the PO-TALU design saves one complete exclusive-OR functional module in comparison with the TALU design of [252]. In comparison with the TALU design presented in [73], it additionally saves a subtractor functional module.

# **5.4.2** Multiplier Module

Figure 5.14 shows the implementation of the multiplier functional module (the decoded unary functions for the input data are not shown). This module multiplies $A_1A_0$ by $B_1B_0$ and produces a four-bit product $M_3M_2M_1M_0$. It contains 1-bit multipliers, a THA, a TFA and a new block named carry-adder (CA), which together perform partial product generation, the left shift operations and the summation of all shifted partial products. These blocks are described below.

The design of the CA is shown in Figure 5.15. It is used for the addition of intermediate carry bits, which take only the two logic values 0 and 1. It has two inputs, $C_1$ and $C_2$, and one output, $O_{CA}$, which is ternary in nature. The truth table of the CA is shown in Table 5.8. Based on Table 5.2 and Table 5.8, the proposed design finds the encoder input variables $X_{1CA}$ and $X_{2CA}$. The values of $X_{1CA}$ and $X_{2CA}$ are also included in Table 5.8. The K-maps for $X_{1CA}$ and $X_{2CA}$ are shown in Figure 5.16.
The simplified expressions of these encoder inputs are derived from the K-maps and expressed as:

$$X_{1CA} = \overline{C_{1n} + C_{2n}}, \qquad X_{2CA} = \overline{C_{1n} \cdot C_{2n}}$$ (5.6)

**Figure 5.14:** Block diagram of multiplier functional module

**Figure 5.15:** Logic level diagram of carry-adder (CA)

**Table 5.8:** Truth table of carry-adder (CA)

| $C_1$ | $C_2$ | $X_{1CA}$ | $X_{2CA}$ | $O_{CA}$ |
|-------|-------|-----------|-----------|----------|
| 0 | 0 | 0 | 0 | 0 |
| 0 | 1 | 0 | 2 | 1 |
| 1 | 0 | 0 | 2 | 1 |
| 1 | 1 | 2 | 2 | 2 |

**Figure 5.16:** K-maps of $X_{1CA}$ and $X_{2CA}$ for CA

The schematic diagram of the ternary 1-bit multiplier is shown in Figure 5.17. Based on the design rules of multiplication provided in Table 5.9 and on Table 5.2, the proposed design finds the encoder input variables $X_{1P0}$ and $X_{2P0}$ for the output product ($P_0$), and $X_{1C0}$ and $X_{2C0}$ for the output carry ($C_0$). The values of $X_{1P0}$, $X_{2P0}$, $X_{1C0}$ and $X_{2C0}$ are given in Table 5.9. The K-maps for $X_{1P0}$, $X_{2P0}$, $X_{1C0}$ and $X_{2C0}$ are shown in Figure 5.18. The simplified expressions of these encoder inputs are derived from the K-maps and expressed as follows.
$$X_{1P0} = A_0^2 B_0^1 + A_0^1 B_0^2, \qquad X_{2P0} = \overline{\overline{A_0^0} + \overline{B_0^0}}$$

$$X_{1C0} = 0, \qquad X_{2C0} = \overline{(\overline{A_0^2} + \overline{B_0^2})}$$ (5.7)

**Figure 5.17 (a):** Schematic diagram for $P_0$ (Product) generator of ternary 1-bit multiplier

**Figure 5.17 (b):** Schematic diagram for $C_0$ (Carry) generator of ternary 1-bit multiplier

**Table 5.9:** Design rules for ternary 1-bit multiplication

| $A_0$ | $B_0$ | $X_{1P0}$ | $X_{2P0}$ | $P_0$ (Product) | $X_{1C0}$ | $X_{2C0}$ | $C_0$ (Carry) |
|-------|-------|-----------|-----------|-----------------|-----------|-----------|---------------|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 1 | 0 | 2 | 1 | 0 | 0 | 0 |
| 0 | 2 | 2 | 2 | 2 | 0 | 0 | 0 |
| 1 | 1 | 0 | 2 | 1 | 0 | 0 | 0 |
| 1 | 2 | 0 | 0 | 0 | 0 | 2 | 1 |
| 2 | 2 | 0 | 2 | 1 | 0 | 2 | 1 |

**Figure 5.18:** K-maps of $X_{1P0}$, $X_{2P0}$, $X_{1C0}$ and $X_{2C0}$ for ternary 1-bit multiplier

The proposed 1-bit multiplier computes the functions given by Eq. (5.7) and generates $X_{1P0}$, $X_{2P0}$, $X_{1C0}$ and $X_{2C0}$. $X_{1P0}$ and $X_{2P0}$ are fed to Encoder1, which produces $P_0$. Similarly, $X_{1C0}$ and $X_{2C0}$ are fed to Encoder2, which generates $C_0$. Further, the implementations of the THA and TFA are the same as those of the HASE and FASE, respectively, excluding the multiplexers and pass transistors.

The proposed multiplier functional module reduces the number of transistors at two levels. At the first level, it uses only two CA blocks instead of the four HA blocks used in the multiplier designs presented in [73] and [252]. Additionally, the CA contains fewer transistors than the HA. At the second level, it uses the proposed encoder-based 1-bit multiplier, THA and TFA, which are very compact in comparison with their counterparts in [73] and [252].
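The partial-product scheme described above can be sketched behaviourally. The model below is computed from ordinary base-3 arithmetic rather than transcribed from Table 5.9, and it does not reproduce the block wiring of Figure 5.14: the function names and the points at which `carry_add` is applied are assumptions chosen for illustration.

```python
def mul1(a, b):
    """1-bit (one-trit) multiplier: return (P0, C0) for a, b in {0, 1, 2}."""
    prod = a * b
    return prod % 3, prod // 3


def carry_add(c1, c2):
    """Carry-adder (CA): add two carries that are each 0 or 1 (Table 5.8)."""
    return c1 + c2


def mul2(a1, a0, b1, b0):
    """Multiply A1A0 by B1B0 and return the digits (M3, M2, M1, M0)."""
    p00, c00 = mul1(a0, b0)
    p10, c10 = mul1(a1, b0)
    p01, c01 = mul1(a0, b1)
    p11, c11 = mul1(a1, b1)

    m0 = p00                                             # weight 1
    k1, m1 = divmod(p10 + p01 + c00, 3)                  # weight 3
    k2, m2 = divmod(p11 + carry_add(c10, c01) + k1, 3)   # weight 9
    m3 = carry_add(c11, k2)                              # weight 27
    return m3, m2, m1, m0


if __name__ == "__main__":
    print(mul2(2, 2, 2, 2))   # digits of 8 * 8 = 64 in base 3
```

In this model the only places where two binary-valued carries meet are the weight-9 and weight-27 columns, which is why a block with a ternary output and two 0/1 inputs is sufficient there.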
## **5.4.3** Comparator Module

The comparator module is a circuit that compares A $(A_1A_0)$ with B $(B_1B_0)$ and produces two outputs, GR and EQ. The decoding of these two outputs into a comparison response is shown in Table 5.10. The 2-bit comparator design presented in chapter 4 is used here. This design contains two pass-transistor-based 1-bit ternary comparators, one binary grouping block and two binary inverters for non-inverted outputs.

**Table 5.10:** Decoding of outputs for comparison response

| GR | EQ | Result |
|----|----|---------|
| 2 | 0 | Greater |
| 0 | 2 | Equal |
| 0 | 0 | Lesser |
| 2 | 2 | Invalid |

The designs of the logic functional modules TAND and TOR are the same as described in chapter 3.

# 5.5 Implementation of 2-bit PO-TALU Slice

The 2-bit PO-TALU design is extended to build a 2-bit PO-TALU slice, which can be cascaded N/2 times to construct an N-bit TALU. Figure 5.19 shows the pin diagram of the 2-bit PO-TALU slice. Compared to the 2-bit PO-TALU, it has some extra inputs named cascaded signals: Carry<sub>c</sub>/Borrow<sub>c</sub>, GR<sub>c</sub> and EQ<sub>c</sub>. The PO-TALU design is modified to include these inputs and realize the 2-bit PO-TALU slice.

The modified ASE (MASE) functional module is shown in Figure 5.20. MASE uses a FASE instead of the HASE to deal with the cascaded Carry/Borrow signal (i.e. Carry<sub>c</sub>/Borrow<sub>c</sub>). For an N-bit TALU, the cascaded configuration of MASE is shown in Figure 5.21. For addition and subtraction operations, the operating mechanism of MASE is the same as that of the MAS module of the 2-bit TALU slice presented in chapter 3. For the XOR operation, each MASE module performs modulo-3 addition with zero input and output Carry signals.
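The cascading of MASE modules can be illustrated with a small behavioural model. This is a sketch under stated assumptions, not the circuit of Figures 5.20 and 5.21: the string-valued `mode` argument stands in for the mode inputs, digits are held least significant first, and all function names are hypothetical.

```python
def fase(a, b, cin, mode):
    """One-digit adder/subtractor/XOR; returns (digit, carry_or_borrow)."""
    if mode == "add":
        total = a + b + cin
        return total % 3, total // 3
    if mode == "sub":
        diff = a - b - cin
        return diff % 3, 1 if diff < 0 else 0
    if mode == "xor":                  # modulo-3 addition, carry forced to 0
        return (a + b) % 3, 0
    raise ValueError(f"unknown mode: {mode}")


def mase_slice(a_pair, b_pair, cin, mode):
    """2-digit slice made of two FASE stages; pairs are (low, high)."""
    d0, c = fase(a_pair[0], b_pair[0], cin, mode)
    d1, c = fase(a_pair[1], b_pair[1], c, mode)
    return (d0, d1), c


def cascade(a_digits, b_digits, mode):
    """Ripple N/2 slices over little-endian digit lists of even length N."""
    out, c = [], 0
    for i in range(0, len(a_digits), 2):
        pair, c = mase_slice(a_digits[i:i + 2], b_digits[i:i + 2], c, mode)
        out.extend(pair)
    return out, c


def to_int(digits):
    """Value of a little-endian base-3 digit list."""
    return sum(d * 3 ** i for i, d in enumerate(digits))


if __name__ == "__main__":
    a, b = [2, 2, 1, 0], [1, 2, 2, 0]      # 17 and 25 in base 3
    print(cascade(a, b, "add"))
    print(cascade(b, a, "sub"))
    print(cascade(a, b, "xor"))
```

The carry (or borrow) returned by one slice is the only signal passed to the next, which mirrors the role of the cascaded Carry/Borrow input; in XOR mode that signal stays at zero throughout the chain.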
**Figure 5.19:** Pin diagram of 2-bit PO-TALU slice

**Figure 5.20:** Block diagram for modified adder-subtractor-exclusive-OR (MASE)

**Figure 5.21:** Cascaded configuration for modified adder-subtractor-exclusive-OR (MASE) of N-bit PO-TALU

For the modification of the comparator module, the grouping logic method presented in chapter 4 is utilized. The multiplier module is extended based on the design methodology of [62]. The logic modules of PO-TALU, which use only ternary AND (TAND) and ternary OR (TOR) gates, do not require any modification.

#### 5.6 Results and Discussion

In this section, the proposed 2-bit PO-TALU design is analyzed and simulated using the HSPICE simulator with the Stanford model for 32 nm CNTFETs [117]. This standard model has been described in section 2.2 of chapter 2. The chirality vector of the constantly switched ON transistors, which are used in the generation of $V_{dd}/2$ for logic 1 as well as in the proposed encoder, is (14, 0). The diameter of these transistors is 1.096 nm and their threshold voltage is 0.392 V. The chirality vector of all remaining transistors used in the PO-TALU design is (19, 0). The diameter of these transistors is 1.487 nm and their threshold voltage is 0.289 V. The other technology parameters of the CNTFET have the same values as mentioned in section 2.2 of chapter 2.

To compare the performance of the proposed circuits described in the previous sections, the CNTFET-based ternary circuits presented in [199] [212] [213] [252] [255] [256] and [257] are reproduced and simulated. The ternary designs of [199] and [252] use an extra power supply (i.e. $V_{dd}/2$), which has been eliminated using constantly switched ON transistors having the chirality vector of (14, 0). The diameter of the CNTFETs and the values of the other device parameters are chosen according to the information provided in the respective papers.
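The diameters and threshold voltages quoted above for the (14, 0) and (19, 0) tubes follow from the chirality vector. The sketch below uses the commonly quoted approximations $D_{CNT} = \sqrt{3}\,a_0\sqrt{n^2 + nm + m^2}/\pi$ and $V_{th} \approx 0.43/D_{CNT}$ (with $D_{CNT}$ in nm); the constants $a_0 = 0.142$ nm and 0.43 V nm are assumptions made here, not values taken from this section, and the function names are hypothetical.

```python
import math

A_CC_NM = 0.142          # assumed carbon-carbon bond length, in nm
VTH_COEFF_V_NM = 0.43    # assumed coefficient in Vth ~ 0.43 / D_CNT


def cnt_diameter_nm(n, m):
    """Nanotube diameter in nm for the chirality vector (n, m)."""
    return math.sqrt(3) * A_CC_NM * math.sqrt(n * n + n * m + m * m) / math.pi


def threshold_voltage_v(n, m):
    """Approximate threshold voltage magnitude in volts."""
    return VTH_COEFF_V_NM / cnt_diameter_nm(n, m)


if __name__ == "__main__":
    for chirality in [(14, 0), (19, 0)]:
        d = cnt_diameter_nm(*chirality)
        vth = threshold_voltage_v(*chirality)
        print(chirality, f"d = {d:.3f} nm", f"Vth = {vth:.3f} V")
```

Because the threshold voltage is inversely proportional to the diameter, the narrower (14, 0) tube has the higher threshold, which is what allows a design to mix two threshold levels by choosing chirality.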
Here, the ternary designs of chapter 3, which were published in [252], are referred to as the designs of [252]. Similarly, the ternary designs of chapter 4, which were published in [255], [256] and [257], are referred to as the designs of [255], [256] and [257], respectively.

#### **5.6.1** Functional Verification of 2-bit PO-TALU

For functional verification of the 2-bit PO-TALU, the sub-circuits as well as the entire PO-TALU design are tested through transient simulations. The simulated waveform of the proposed FASE is shown in Figure 5.22. The first three waveforms represent the inputs $A_1$, $B_1$ and $C_0$ (input Carry/Borrow). When the mode inputs are $M_0 = 0$ and $M_1 = 2$, FASE performs addition $(A_1+B_1+C_0)$ and generates the outputs $S_1$ (Sum) and $C_1$ (Carry), which are shown by the fourth and fifth waveforms, respectively. Similarly, when $M_0 = M_1 = 2$, FASE performs subtraction $(A_1-B_1-C_0)$ and generates the two outputs $D_1$ (Difference) and $B_1$ (Borrow), which are shown by the fifth and sixth waveforms, respectively. Finally, when $M_0 = M_1 = 0$, FASE performs modulo-3 addition with zero output carry to execute the XOR operation. For this, it produces $E_1$ (i.e. $A_1$ XOR $B_1$) and $C_1$ (Carry\_XOR), which are displayed in the remaining two waveforms. For every combination of the mode inputs $M_0$ and $M_1$, FASE performs the correct ternary addition, subtraction or XOR operation, and thus the functionality of FASE is verified. Similarly, the simulated transient waveforms included in the Appendix confirm the correct functionality of the proposed HASE, CA, 1-bit multiplier and logic function modules (only TOR is shown).

## 5.6.2 Performance Evaluation of 2-bit PO-TALU

To evaluate the performance of the proposed ternary circuits, speed and power are extracted from transient simulations. The average power consumption is measured over a long period of time. For worst-case delay determination, all possible output transition delays are measured.
On account of the trade-off between power consumption and delay, the efficiency of the circuits is evaluated by computing the power-delay product (PDP), which is the product of the average power consumption and the maximum delay. Simulations are performed at room temperature, at 250 MHz operating frequency and at 0.9 V power supply voltage, with an output load capacitor of 2.1 fF. Further, to evaluate the performance at the TALU architecture level, the datapaths of PO-TALU, which contain the decoder, FSB-ALO, TGB-ALE and functional module, have been simulated.

A comparison of the delay, power consumption, PDP and device count of CNTFET-based adder circuits is shown in Table 5.11. The simulation results show that the proposed THA achieves a 68% improvement in power, with a reduction in delay of 8% and 11% compared to the THA of [199] and the HAS of [252], respectively. In addition, it shows a reduction in device count of 41% and 53% with respect to the THA of [199] and the HAS of [252], respectively.

**Figure 5.22:** Transient waveform of full adder-subtractor-exclusive-OR (FASE)

**Table 5.11:** Simulation results of CNTFET-based ternary adder circuits

| Circuits | Delay (×10<sup>-10</sup> s) | Power (×10<sup>-6</sup> W) | PDP (×10<sup>-16</sup> J) | Device Count |
|---------------|------|------|------|-----|
| Proposed THA | 0.63 | 0.47 | 0.30 | 66 |
| THA of [199] | 0.69 | 1.47 | 1.02 | 112 |
| Proposed HASE | 0.66 | 0.49 | 0.32 | 82 |
| HAS of [252] | 0.71 | 1.48 | 1.05 | 142 |
| Proposed TFA | 0.83 | 0.82 | 0.68 | 114 |
| TFA of [256] | 0.61 | 1.63 | 0.99 | 105 |
| TFA of [257] | 1.00 | 1.45 | 1.45 | 102 |
| TFA of [199] | 0.80 | 1.91 | 1.53 | 214 |
| TFA of [255] | 0.73 | 6.89 | 5.05 | 106 |
| TFA of [212] | 1.44 | 4.35 | 6.26 | 122 |
| TFA of [213] | 0.87 | 53.7 | 46.7 | 142 |
| Proposed FASE | 0.87 | 0.86 | 0.75 | 140 |
| FAS of [252] | 0.82 | 1.95 | 1.61 | 280 |

Similarly, the
proposed HASE (for ADD operation) consumes 66% less power, with 4% and 7% less delay, compared to the THA of [199] and the HAS of [252], respectively. Additionally, it reduces the device count by 26% and 42% in comparison with the THA of [199] and the HAS of [252], respectively.

According to the results reported in Table 5.11, the proposed TFA shows the lowest power among all the CNTFET-based TFA designs. Compared to the design of [256], it consumes 50% less power, but its delay and device count are higher by 26% and 7%, respectively. In comparison with the TFA of [257], it achieves a 44% reduction in power and a 20% reduction in delay, but the device count is increased by 10%. Compared to the design of [199], the proposed TFA shows an improvement in power and device count of 57% and 47%, with an increase in delay of only 3%. In comparison with the TFA of [255], it saves 88% in power but shows a 12% and 7% increase in delay and device count, respectively. Compared to the TFA of [212], it achieves a reduction in power, delay and device count of 81%, 42% and 7%, respectively. Similarly, the proposed TFA shows an improvement in power, delay and device count of 98%, 5% and 20%, respectively, with respect to the design of [213]. In comparison with the FAS of [252], it achieves a 58% reduction in power and a 59% reduction in device count with comparable delay performance. Table 5.11 also shows that the proposed FASE (for ADD operation) consumes less power and uses fewer devices than the FAS of [252] (0.86 µW versus 1.95 µW, and 140 versus 280 devices), at a comparable delay.

Simulation results of CNTFET-based multiplier circuits are listed in Table 5.12. According to Table 5.12, the proposed 1-bit and 2-bit multipliers have the lowest power among all CNTFET-based multiplier designs. The proposed 1-bit multiplier improves power, delay and device count by 70%, 5% and 37%, respectively, compared to its counterpart presented in [199].
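The PDP column and the percentage reductions quoted from Table 5.11 can be recomputed directly from the tabulated delay, power and device-count values. The snippet below is a minimal sketch with hypothetical helper names; it copies only a few rows of the table, and it computes each reduction relative to the reference design.

```python
# name: (delay in 1e-10 s, power in 1e-6 W, device count), from Table 5.11
TABLE_5_11 = {
    "Proposed THA": (0.63, 0.47, 66),
    "THA of [199]": (0.69, 1.47, 112),
    "Proposed TFA": (0.83, 0.82, 114),
    "TFA of [212]": (1.44, 4.35, 122),
}


def pdp(delay_e10_s, power_uw):
    """Power-delay product in units of 1e-16 J (1e-10 s times 1e-6 W)."""
    return delay_e10_s * power_uw


def reduction_pct(proposed, reference):
    """Percentage by which `proposed` is lower than `reference`."""
    return 100.0 * (reference - proposed) / reference


if __name__ == "__main__":
    for new, old in [("Proposed THA", "THA of [199]"),
                     ("Proposed TFA", "TFA of [212]")]:
        d_new, p_new, n_new = TABLE_5_11[new]
        d_old, p_old, n_old = TABLE_5_11[old]
        print(new, "vs", old)
        print("  PDP:", round(pdp(d_new, p_new), 2), "vs", round(pdp(d_old, p_old), 2))
        print("  power reduction %:", round(reduction_pct(p_new, p_old)))
        print("  delay reduction %:", round(reduction_pct(d_new, d_old)))
        print("  device reduction %:", round(reduction_pct(n_new, n_old)))
```

Small differences from the tabulated PDP values can arise because the table entries for delay and power are themselves rounded.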
Similarly, the proposed 2-bit multiplier shows a 62% reduction in power, a 31% reduction in delay and a 65% reduction in device count with respect to the 2-bit multiplier of [252].

The ternary circuits are also simulated at 500 MHz operating frequency with a 1 fF output load and a small transient time interval. The results obtained from these simulations are listed in [260]. Additionally, the performance parameters extracted from the simulations done at architecture level are shown in Table 4 of Appendix II. Further, as seen in Table 5.11, the PDPs of the TFAs presented in [199] [212] [255] [256] and [257] are of the same order, while the PDP of the design presented in [213] is of a higher order; therefore, the proposed TFA is compared only with the former TFA designs in the next simulations.

**Table 5.12:** Simulation results of CNTFET-based multiplier circuits

| Circuits | Delay (×10<sup>-10</sup> s) | Power (×10<sup>-6</sup> W) | PDP (×10<sup>-16</sup> J) | Device Count |
|---------------------------|------|------|-------|------|
| Proposed 1-bit multiplier | 0.51 | 0.29 | 0.15 | 50 |
| 1-bit multiplier of [199] | 0.54 | 0.97 | 0.52 | 80 |
| Proposed 2-bit multiplier | 1.00 | 2.94 | 2.94 | 458 |
| 2-bit multiplier of [252] | 1.45 | 7.82 | 11.34 | 1296 |

To examine the driving capability, the proposed designs are tested under different loading conditions. Simulations are performed with output loads ranging from 2 fF to 6 fF at room temperature, with a 0.9 V power supply and 250 MHz operating frequency. The delay, power consumption and PDP of the TFA designs versus load capacitance are plotted in Figure 5.23. The plotted results show that the proposed design outperforms all other designs in terms of power and PDP at all values of output load.
**Figure 5.23 (a):** Delay versus output load capacitor plot for six ternary full adder (TFA) designs

**Figure 5.23 (b):** Power consumption versus output load capacitor plot for six ternary full adder (TFA) designs

**Figure 5.23 (c):** Power-delay product (PDP) versus output load capacitor plot for six ternary full adder (TFA) designs

To evaluate the performance of the proposed designs at different frequencies, simulations are performed at operating frequencies varying from 100 MHz to 1000 MHz at room temperature, with a 0.9 V power supply and a 2.1 fF output load. The power consumption of the TFA designs at different frequencies is shown in Figure 5.24. For the THA and multiplier designs, the power consumption results at different frequencies are shown in Figure 1 and Figure 2 of Appendix II. According to the plotted results, the proposed designs work properly and consume less power than all other existing designs at all frequencies.

The proposed designs are also tested under supply voltage variations to check their sensitivity to these variations. Simulations are performed at supply voltages ranging from 0.7 V to 1.1 V. The other simulation parameters are room temperature, 250 MHz frequency and a 2.1 fF output load. For the TFA designs, the PDP computed from these simulations is plotted in Figure 5.25. The PDP results of the THA and multiplier are plotted in Figure 3 and Figure 4 of Appendix II. According to the plotted results, the proposed designs are less sensitive to voltage variation than their counterparts.

Simulations are also conducted at temperatures varying from 0°C to 100°C. The other test parameters are 0.9 V supply voltage, 250 MHz frequency and a 2.1 fF output load. The PDP under temperature variation is shown in Figure 5.26. It can be inferred from the results that the proposed designs operate reliably and outperform the existing designs over a wide range of ambient temperatures.
**Figure 5.24:** Power consumption versus operating frequency plot for six ternary full adder (TFA) designs

**Figure 5.25:** Power-delay product (PDP) versus supply voltage plot for six ternary full adder (TFA) designs

**Figure 5.26:** Power-delay product (PDP) versus temperature plot for six ternary full adder (TFA) designs

## 5.7 Conclusion

This chapter has presented a 2-bit PO-TALU in CNTFET technology. The PO-TALU functional modules, ASE and multiplier, have been designed using a new complementary CNTFET-based binary computational unit and a low-complexity encoder. ASE eliminates an exclusive-OR module and a subtractor module from the conventional architecture. The multiplier uses a new efficient CA block in place of the THA. In comparison with existing energy-efficient CNTFET-based designs, HSPICE simulation results have shown that the sub-blocks of ASE, HASE and FASE, consume 66% and 47% less power, respectively. HASE shows a reduction in delay and device count of 4% and 26%, respectively. FASE shows a 25% reduction in device count but has 29% more delay. The sub-block of the multiplier module, the 1-bit multiplier, shows a reduction in power, delay and device count of 70%, 5% and 37%, respectively. ASE and multiplier are less sensitive to voltage and temperature variations. The design of the 2-bit PO-TALU has been modified to implement a 2-bit PO-TALU slice, which can be easily cascaded to form an N-bit PO-TALU.

# Design of High Speed Content Addressable Memory (CAM) cells using CNTFETs

#### 6.1. Introduction

Content addressable memory (CAM) is an application-specific memory that is accessed based on the stored data rather than on a physical address location. A CAM compares the search data in parallel with the stored data, and the result of this comparison is determined by the state of the match lines. There are two types of CAM: binary CAM (BCAM) and ternary CAM (TCAM). BCAM is capable of storing and searching two logic states: 0 and 2.
This cell performs exact-match searches and is mainly used for tag comparison in cache memory. TCAM is capable of storing and searching three logic states: 0, 2 and don't care (X). Hence, TCAM provides the added flexibility of pattern matching through the use of X. An alternate design of a TCAM using a three-valued circuit structure (i.e. 3CAM) has been presented in the literature in order to reduce cell area. A 3CAM works on true ternary logic values: 0, 1 and 2, where logic 1 represents the don't care (X) state.

TCAMs are popular mainly for realizing network applications such as packet forwarding and packet classification [230]. In particular, high-performance network routers require a large number of fast TCAM cells to achieve the desired fast look-up operation in larger routing tables [241]. Therefore, the design of fast and compact CAM structures continues to be of the highest priority for real-time applications.

In this chapter, BCAM and TCAM cells designed based on the low-capacitance search logic of [235] are presented in CNTFET technology. A new three-valued CAM (3CAM) cell is also presented. This cell uses CNTFETs with two different threshold voltages in the implementation of a low-capacitance search network, which leads to a fast and compact CAM design with respect to the CNTFET-based 3CAM cell recently reported in the literature.

In section 6.2, the design and implementation of CNTFET-based BCAM, TCAM and 3CAM cells are presented. Section 6.3 provides simulation results and a comparison with the existing designs, followed by the conclusion in section 6.4.

# 6.2 Design of CAM Cells

## 6.2.1 Binary CAM (BCAM) Cell

Figure 6.1 shows a schematic diagram of the CNTFET-based BCAM cell, which is implemented using nine transistors and named the 9T BCAM cell. It is capable of storing and searching two logic states: 0 and 2. The 9T BCAM consists of a basic 6T SRAM cell for data storage and a search network for data comparison.
The SRAM cell comprises a latch containing transistors T1, T2, T3 and T4, along with two access transistors T5 and T6. T5 and T6 are turned ON whenever the word line WL is activated for a read or write operation, and they connect the cell to the complementary data line columns DL and $\overline{DL}$. The data is stored at the nodes Q and $\overline{Q}$ of the cell.

**Figure 6.1:** Schematic diagram of 9T Binary CAM (BCAM) cell

#### **Read/Write Operation**

For a read operation, the DL and $\overline{DL}$ lines are precharged high and left floating. When WL is high, T5 and T6 are ON, the voltage levels of Q and $\overline{Q}$ are transferred to DL and $\overline{DL}$, and the data stored in the cell is read. During this operation, both storage nodes Q and $\overline{Q}$ remain unchanged. For a write operation, the desired data and its complement are placed on DL and $\overline{DL}$. When WL is high, T5 and T6 are ON, the voltage levels of DL and $\overline{DL}$ are transferred to Q and $\overline{Q}$, and the data is written into the cell.

For successful read and write operations, the SRAM transistors should be properly sized [244]. The sizing ratio of the latch pull-up transistors (T1 and T3) to the access transistors (T5 and T6) is taken as 0.5, and the sizing ratio of the latch pull-down transistors (T2 and T4) to the access transistors (T5 and T6) is taken as 1.5. In a CNTFET, sizing is decided by the number of tubes. Consequently, T1 and T3 use one tube, T2 and T4 use three tubes, and T5 and T6 use two tubes. The chirality vectors for the P-CNTFETs (T1 and T3) and N-CNTFETs (T2, T4, T5 and T6) of the SRAM cell are chosen as (16, 0) and (19, 0), respectively, for the best combined performance in terms of stability, power consumption and write time of the SRAM cell [244].

#### **Search Operation**

The search network of the 9T BCAM cell is designed based on low-capacitance search logic to speed up the compare operation.
It contains three transistors, T7, T8 and T9, each with a chirality vector of (19, 0) and three tubes. The 9T BCAM compares the data stored at Q and its complement ($\overline{Q}$) with the data placed on the search line SL and its complement ($\overline{SL}$), respectively. When the value stored at Q matches the value at SL, logic 0 is passed through either T7 or T8 to the node $V_x$, which is the gate of T9. T9 is therefore turned OFF and the match line ML is disconnected from ground, indicating a match condition. Similarly, when the data at Q does not match SL, $V_x$ is charged to 0.9 V (logic 2) through either T7 or T8, which turns ON T9 and shorts ML to ground, indicating a mismatch condition.

For instance, when SL = 2 and $\overline{SL} = 0$, T7 is OFF and T8 is ON, which transfers the voltage level of $\overline{Q}$ to $V_x$. For Q = 2, $\overline{Q}$ is logic 0 and hence logic 0 is transferred to $V_x$, giving a match condition. When Q = 0, $\overline{Q}$ is logic 2 and hence logic 2 is transferred to $V_x$, giving a mismatch condition. Similarly, when SL = 0 and $\overline{SL} = 2$, T7 is ON and T8 is OFF, which transfers the voltage level of Q to $V_x$. For Q = 0, logic 0 is transferred to $V_x$, giving a match condition, and when Q = 2, logic 2 is transferred to $V_x$, giving a mismatch condition.

## 6.2.2 Ternary CAM (TCAM) Cell

Figure 6.2 shows a schematic diagram of the CNTFET-based TCAM cell, which is implemented using 16 transistors and named the 16T TCAM cell. This cell is capable of storing and searching three logic states: 0, 2 and don't care (X). These states are encoded by two bits as shown in Table 6.1. The 16T TCAM incorporates two SRAM cells for data storage and one compare network for data comparison. These SRAM cells store '02', '20' and '22' for the cell storage of 0, 2 and don't care (X), respectively. This 'X' value represents a stored don't care. The state '00' is not used and is never allowed for data storage. The design parameters of the SRAM cells are the same as those of the 9T BCAM.
The read and write operations of the TCAM cell are similar to those of the BCAM cell.

**Table 6.1:** Ternary encoding for 16T ternary CAM (TCAM) cell

| Ternary state | Stored data $Q_0$ | Stored data $Q_1$ | Search data $SL_0$ | Search data $SL_1$ |
|---------------|-------------------|-------------------|--------------------|--------------------|
| 0 | 0 | 2 | 0 | 2 |
| 2 | 2 | 0 | 2 | 0 |
| X | 2 | 2 | 0 | 0 |

**Figure 6.2:** Schematic diagram of 16T ternary CAM (TCAM) cell

## **Search Operation**

The compare network of the 16T TCAM uses low-capacitance search logic in order to achieve a high-speed search operation. It contains four transistors, T13, T14, T15 and T16, with a chirality vector of (19, 0). The gate of T16 is connected to a control line denoted 'SG'. This control line is generated by NORing the search lines $SL_0$ and $SL_1$, and is shared by all the cells in the same column of the memory.

The cell compares the data stored at nodes $Q_0$ and $Q_1$ with the search data placed on $SL_0$ and $SL_1$. For global masking, it searches for X by setting both $SL_0$ and $SL_1$ to 0. The state '22' is not allowed as a search state. When $SL_0 = SL_1 = 0$, the SG line is set to logic 2 and T16 is turned ON. In this case, T13 and T14 are OFF, and the node $V_y$, which is the gate of T15, is connected to ground through T16. Thus the SG line removes the floating condition at $V_y$ when bits are globally masked.

A 16T TCAM cell behaves like a 9T BCAM cell whenever 0 or 2 is stored or being searched. For instance, when $SL_0=0$ and $SL_1=2$, T13 is OFF and T14 is ON, which transfers the logic level of $\overline{Q_1}$ to $V_y$. For $Q_0=0$ and $Q_1=2$, $\overline{Q_1}$ is logic 0 and hence 0 is transferred to $V_y$, which turns OFF T15 and disconnects the match line ML from ground to indicate a match case.
When $Q_0=2$ and $Q_1=0$, $\overline{Q_1}$ is logic 2 and hence $V_y$ is set to logic 2, which turns ON T15 and connects ML to ground to indicate a mismatch case. In the same way, when $SL_0=2$ and $SL_1=0$, T13 is ON and T14 is OFF, which transfers $\overline{Q_0}$ to $V_y$. For $Q_0=2$ and $Q_1=0$, $\overline{Q_0}$ is logic 0 and thus $V_y$ is set to logic 0, which turns OFF T15 and disconnects ML from ground, indicating a match case. When $Q_0=0$ and $Q_1=2$, $\overline{Q_0}$ is logic 2 and thereby $V_y$ is set to logic 2, which turns ON T15 and connects ML to ground to indicate a mismatch case.

When a 16T TCAM stores an X, i.e. $Q_0 = Q_1 = 2$, $V_y$ is set to logic 0 through T13 for $SL_0 = 2$ and $SL_1 = 0$, through T14 for $SL_0 = 0$ and $SL_1 = 2$, or through T16 for $SL_0 = 0$ and $SL_1 = 0$. This value of $V_y$ turns OFF T15 and hence ML is disconnected from ground. Therefore, a stored X always shows a match irrespective of the search data. When there is a search for X, i.e. $SL_0 = SL_1 = 0$, both T13 and T14 are OFF. In this case, the control line SG is set to logic 2, which connects $V_y$ to ground through T16. T15 is turned OFF, and thereby ML is not shorted to ground and indicates a match regardless of the stored data. This match case confirms the global masking feature of the 16T TCAM.

#### 6.2.3 Three-Valued CAM (3CAM) Cell

Figure 6.3 shows the schematic diagram of the proposed CNTFET-based 3CAM cell. This cell is implemented using 11 transistors and named the 11T-3CAM cell. It differs from the TCAM in that it works on true ternary logic. The 11T-3CAM is capable of storing and searching three logic states: 0, 1 and 2, where logic 1 is used for the don't care state (X).

**Figure 6.3:** Schematic diagram of 11T three-valued CAM (3CAM) cell

## **Read/Write Operation**

The 11T-3CAM cell uses one ternary memory (TMEM) cell of [247] for data storage, and eliminates the need for the second storage cell (i.e. SRAM) typically used in traditional TCAM construction.
The TMEM cell contains two cross-coupled static ternary inverters (STIs). An STI has one input signal Q, which can take one of three values (0, 1 or 2), and produces an output $\overline{Q}$ with a logic value of 2-Q (i.e. 2, 1 or 0, respectively). The two STIs contain transistors T1, T2, T3, T4, T5 and T6. They use two power supplies: a regular supply voltage (V<sub>dd</sub>) and a lower supply voltage (V<sub>dd</sub>/2). The chirality vector of T1 and T4 is (8, 2). The number of CNTs is eight. The chirality vector of T2 and T5 is also (8, 2). The threshold voltage of these transistors is 0.599 V with a diameter of 0.718 nm. The number of CNTs is twenty. The chirality vector of T3 and T6 is (10, 0). The threshold voltage of these transistors is 0.55 V with a diameter of 0.783 nm. The number of CNTs is two. These transistors are connected between voltage rail $V_{dd}/2$ and nodes Q and $\overline{Q}$, respectively. Their gates are connected to V<sub>dd</sub>, so they are always ON and in effect act as pull-ups for logic 1. However, they also allow a dc path to exist whenever the pull-up for logic 2 or the pull-down for logic 0 is active. To minimize this contention, T3 and T6 are made weaker by reducing the number of tubes to two, lowering the pitch to 10 nm and increasing the channel length to 64 nm.

Along with the two cross-coupled STIs, the TMEM cell contains two access transistors, T7 and T8, with a chirality vector of (19, 0). The threshold voltage of these transistors is 0.289 V and the diameter is 1.487 nm. The number of CNTs is three. The TMEM cell stores data at Q and $\overline{Q}$. For a read operation, the complementary data line columns DL and $\overline{DL}$ are precharged to logic 2 and left floating. When WL is asserted, T7 and T8 are ON, the voltage levels of Q and $\overline{Q}$ are transferred to DL and $\overline{DL}$, and the data stored in the cell is read. During the read operation, both storage nodes Q and $\overline{Q}$ remain unchanged.
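The diameters and threshold voltages quoted above for the (8, 2), (10, 0) and (19, 0) devices can be cross-checked with the standard geometric relation between chirality and tube diameter. The sketch below is a rough check under stated assumptions: the carbon-carbon bond length of 0.142 nm is the textbook value, and the threshold estimate uses the common first-order approximation $V_{th} \approx 0.43 / d$ (with $d$ in nm), which is assumed here rather than taken from this chapter. The function names are hypothetical.

```python
import math

A_CC_NM = 0.142          # carbon-carbon bond length in nm (textbook value)
VTH_COEFF_V_NM = 0.43    # assumed first-order constant: Vth ~ 0.43 / d[nm]


def cnt_diameter_nm(n, m):
    """Diameter of an (n, m) carbon nanotube in nanometres."""
    return math.sqrt(3) * A_CC_NM / math.pi * math.sqrt(n * n + n * m + m * m)


def cnt_threshold_v(n, m):
    """First-order threshold-voltage estimate; an approximation only."""
    return VTH_COEFF_V_NM / cnt_diameter_nm(n, m)


if __name__ == "__main__":
    for chirality in [(8, 2), (10, 0), (19, 0)]:
        d = cnt_diameter_nm(*chirality)
        vth = cnt_threshold_v(*chirality)
        print(f"{chirality}: d = {d:.3f} nm, Vth ~ {vth:.3f} V")
```

Under these assumptions the computed diameters agree with the quoted 0.718 nm, 0.783 nm and 1.487 nm, and the threshold estimates land within a few millivolts of the quoted 0.599 V, 0.55 V and 0.289 V.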
For a write operation, the desired data and its complement are placed on DL and $\overline{DL}$. When WL is asserted, T7 and T8 are ON, the voltage levels of DL and $\overline{DL}$ are transferred to Q and $\overline{Q}$, and the data is written into the cell.

#### **Search Operation**

11T-3CAM contains a new compare network for data comparison. This compare network uses low-capacitance search logic to obtain a fast match operation. It consists of three transistors: T9, T10 and T11. 11T-3CAM compares the complementary ternary data stored at Q and $\overline{Q}$ with the complementary data placed on the search lines SL and $\overline{SL}$. Exact matching is implemented by storing and subsequently searching for either logic 0 or logic 2. Pattern matching is done with local and global masking. In local masking, logic 1 is stored at both Q and $\overline{Q}$, which represents a stored don't care. Global masking is performed by searching for logic 1 on both SL and $\overline{SL}$, which represents a searched don't care state. To obtain a match whenever logic 1 is stored or searched, the threshold voltage of T11 is set to 0.55 V using a chirality vector of (10, 0). The diameter of T11 is 0.783 nm. The threshold voltage of T9 and T10 is set to 0.289 V using a chirality vector of (19, 0). The diameter of these transistors is 1.487 nm.

A 3CAM cell behaves like TCAM whenever logic 0 or logic 2 is stored or searched. For example, when logic 2 is stored, Q is at logic 2 and $\overline{Q}$ is at logic 0. If there is a search for logic 2, i.e. SL=2 and $\overline{SL}=0$, T9 is OFF and T10 is ON, which drives node $V_z$ to logic 0; as a consequence T11 is OFF and the match line (ML) is not shorted to ground, indicating a match condition. If there is a search for logic 0, i.e. SL=0 and $\overline{SL}=2$, T9 is ON and T10 is OFF, which pulls $V_z$ up to logic 2; thus T11 is ON, which connects ML to ground to indicate a mismatch condition.
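The 11T-3CAM search behaviour, including the masking cases for stored or searched logic 1 covered in this subsection, can be captured in a short voltage-level sketch. It is an illustration only: the assignment of T9 to node Q and T10 to node $\overline{Q}$ is inferred from the stored-2 example above, the 0.45 V and 64 mV values of $V_z$ are the ones reported in this subsection rather than computed, and all names are hypothetical.

```python
VDD = 0.9                                  # supply voltage in volts
LEVEL_V = {0: 0.0, 1: VDD / 2, 2: VDD}     # ternary logic value -> voltage
VTH_T11 = 0.55                             # threshold voltage of T11 (from the text)
VZ_CONFLICT_V = 0.064                      # reported Vz when searching logic 1
                                           # against a stored 0 or 2


def vz_voltage(stored, search):
    """Voltage at node Vz (gate of T11) for a stored and a searched trit."""
    if stored not in LEVEL_V or search not in LEVEL_V:
        raise ValueError("stored and search must be 0, 1 or 2")
    q, q_bar = stored, 2 - stored
    if search == 0:        # SL = 0, SL_bar = 2: T9 ON, Q reaches Vz (inferred)
        return LEVEL_V[q]
    if search == 2:        # SL = 2, SL_bar = 0: T10 ON, Q_bar reaches Vz (inferred)
        return LEVEL_V[q_bar]
    # search == 1: T9 and T10 are both ON
    return LEVEL_V[1] if stored == 1 else VZ_CONFLICT_V


def cam3_match(stored, search):
    """True when Vz stays below the T11 threshold, so ML is not discharged."""
    return vz_voltage(stored, search) < VTH_T11


if __name__ == "__main__":
    for stored in (0, 1, 2):
        print(stored, [cam3_match(stored, search) for search in (0, 1, 2)])
```

The high threshold of T11 is what makes the half-supply level harmless: only a full logic 2 at $V_z$ turns T11 ON, so every case involving logic 1 evaluates as a match.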
For the case when logic 1 is stored (Q = $\overline{Q}$ = 0.45 V), $V_z$ is charged up to 0.45 V through T9 if SL=0 (and $\overline{SL}=2$), or through T10 if SL=2 (and $\overline{SL}=0$), or through both T9 and T10 if SL=1 (and $\overline{SL}=1$). In each of these cases, T11 is OFF because of its high threshold voltage; hence ML is disconnected from ground and the stored logic 1 always shows a match regardless of the search value. Now, when there is a search for logic 1, both T9 and T10 are ON. In this case, $V_z$ settles at 64 mV when Q=0 (and $\overline{Q}=2$) or Q=2 (and $\overline{Q}=0$), due to the voltage conflict between T9 and T10. This value of $V_z$ turns OFF T11 and thereby ML is disconnected from ground, regardless of the stored value. Thus, the proposed cell shows a match for both local and global masking.

## **6.3** Results and Discussion

In this section, CNTFET-based CAM cells are analyzed and evaluated using the Synopsys HSPICE simulator with the 32 nm Stanford CNTFET model of [117], which considers practical non-idealities of CNTFETs. Details of the Stanford model have been given in section 2.2 of chapter 2. In order to compare the performance of the proposed designs, the CAM cells presented in [250] are reproduced and simulated. For fair comparison, an equivalent number of CNTs is used in the CNTFETs of all cells while keeping the same transistor size ratios.

#### **Simulation Setup**

To sense ML during a CAM search operation and generate the match result, the current race sensing scheme shown in Figure 6.4 is used [232]. Prior to a search operation, ML is precharged low and the MLPRE signal is turned ON. As a consequence, transistor T5 is turned OFF and transistor T4 is turned ON, and node M\_SENSE is precharged high, which drives the sense amplifier output MLSO low. At the same time, the search lines of the memory are set to their new data values and hence there is only a single ML/SL precharge phase.
In the evaluation phase, the MLPRE signal is turned off and the $\overline{EN}$ signal is asserted low, which turns ON transistor T2 and connects ML to the current source implemented using a biased transistor T1. Since ML is disconnected from ground in a match case, it charges up at a higher rate than an ML having at least one mismatch. Once the voltage at ML rises above the threshold voltage of transistor T5 (sense amplifier), M\_SENSE is discharged to ground and MLSO is latched to a high value, indicating a match. After that, $\overline{EN}$ is switched to a high value, which turns OFF T2 and disconnects the current source from ML.

**Figure 6.4:** Schematic diagram for current race sensing scheme [232]

A current race sensing scheme saves power in two ways. First, it turns OFF the current when ML reaches the threshold voltage of T5 and consequently limits the voltage swing of all MLs to approximately half of $V_{dd}$ (0.45 V). It thereby reduces the ML power dissipation by a factor of two in comparison with a full-swing ML sensing scheme. Second, because ML is precharged to ground, the search lines do not need to be reset between consecutive searches; this minimizes SL switching activity and reduces SL power dissipation by a factor of two.

For the current race sensing circuit, a chirality vector of (19, 0) is used for all transistors, with three tubes each. The $V_{BIAS}$ signal is set to 0.58 V, which generates a current of 10 $\mu$A in T1, so that the maximum voltage on ML in case of any mismatch stays well below the threshold voltage of T5. All transient simulations are performed at room temperature with a 0.9 V power supply voltage. These simulations have been performed on a $1 \times 20$ memory array to obtain a realistic capacitive loading at ML. The match delay is measured from the time when ML starts charging until it is latched by the sense amplifier.
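To see why the match delay in this scheme scales with the match-line capacitance, a first-order estimate treats the matched ML as a capacitor charged by a constant current until it reaches the sense amplifier trip point. The sketch below makes strong simplifying assumptions: it ignores leakage, the finite output resistance of T1 and the latch time of the sense amplifier, the trip voltage is left as a parameter, and the capacitance values in the loop are purely hypothetical. Only the 10 µA bias current and the approximate 0.45 V swing limit come from the text.

```python
def ml_charge_time_s(c_ml_farad, v_trip_volt, i_src_amp):
    """Time for a constant current to charge C_ML from 0 V to the trip voltage.

    First-order estimate t = C * V / I, ignoring leakage and latch delay.
    """
    if i_src_amp <= 0:
        raise ValueError("source current must be positive")
    return c_ml_farad * v_trip_volt / i_src_amp


if __name__ == "__main__":
    I_SRC = 10e-6      # bias current through T1, from the text
    V_TRIP = 0.45      # approximate ML swing limit quoted in the text
    # Hypothetical ML capacitances, chosen only to show the linear scaling.
    for c_ml in (0.1e-15, 0.2e-15, 0.4e-15):
        t = ml_charge_time_s(c_ml, V_TRIP, I_SRC)
        print(f"C_ML = {c_ml:.1e} F -> t = {t:.2e} s")
```

In this simplified picture, doubling the ML capacitance doubles the charging time, which is the proportionality that a low-capacitance compare network exploits.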
Average power is computed from the power measured for every operation.

#### **Functional Verification of CAM Cells**

Figure 6.5 shows the simulated transient waveform of the 11T-3CAM cell for different search data. The first waveform shows the MLPRE signal, which goes high to precharge ML and low to evaluate ML, and repeats for three cycles. The second and third waveforms represent $\overline{EN}$ and SL, respectively. The remaining waveforms show ML and MLSO for different stored Q values.

Figure 6.5: Transient waveform of 11T three-valued CAM (3CAM) cell

In the first search evaluation cycle, when SL=2 and MLPRE is low, ML and MLSO are high for Q=2 and Q=1, and low for Q=0, indicating match and mismatch conditions, respectively. Similarly, in the third search evaluation cycle, when SL=0 and MLPRE is low, ML and MLSO are low for Q=2, and high for Q=1 and Q=0, indicating mismatch and match conditions, respectively. In the second search evaluation cycle, when MLPRE is low and SL=1, ML and MLSO are high and always indicate a match irrespective of Q, which verifies global masking. Similarly, when Q=1, ML and MLSO are high and always indicate a match irrespective of SL, which verifies local masking. Hence, the simulated waveform confirms the correct functionality of 11T-3CAM. Similarly, the simulated transient waveforms of the 9T BCAM and 16T TCAM cells, included in the Appendix, confirm their correct functionality.

#### **Simulation Results**

A comparison of the CNTFET-based CAM cells is shown in Table 6.2. Since match delay is directly proportional to ML capacitance, the lower ML capacitance of the proposed designs leads to a high-speed match operation. 9T-BCAM reduces match delay by 74%, with a 6% saving in power and a 10% saving in device count, compared to the BCAM of [250]. Similarly, 16T TCAM shows a 73% reduction in match delay with power comparable to that of its counterpart in [250].
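The percentage figures quoted in this section follow directly from the entries of Table 6.2. As a quick arithmetic check, the sketch below recomputes them; the data are the table values (match delay in units of 10<sup>-11</sup> s, power in µW, transistor count), while the dictionary layout and function name are illustrative choices.

```python
# (match delay [1e-11 s], power [uW], transistor count), copied from Table 6.2
TABLE_6_2 = {
    "9T BCAM (proposed)": (0.61, 3.34, 9),
    "BCAM of [250]": (2.41, 3.57, 10),
    "16T TCAM (proposed)": (0.60, 4.34, 16),
    "TCAM of [250]": (2.22, 4.35, 16),
    "11T-3CAM (proposed)": (0.69, 8.8, 11),
    "3CAM of [250]": (1.95, 8.9, 12),
}

PAIRS = [
    ("9T BCAM (proposed)", "BCAM of [250]"),
    ("16T TCAM (proposed)", "TCAM of [250]"),
    ("11T-3CAM (proposed)", "3CAM of [250]"),
    ("11T-3CAM (proposed)", "TCAM of [250]"),
]


def reduction_pct(reference, proposed):
    """Percentage by which `proposed` is lower than `reference`."""
    return 100.0 * (reference - proposed) / reference


if __name__ == "__main__":
    for new, ref in PAIRS:
        delay, power, count = (
            reduction_pct(r, p) for p, r in zip(TABLE_6_2[new], TABLE_6_2[ref])
        )
        print(f"{new} vs {ref}: delay {delay:.1f}%, "
              f"power {power:.1f}%, count {count:.1f}%")
```

The recomputed values agree with the quoted percentages to within about one percentage point; the small differences come from rounding. A negative power figure, as in the 11T-3CAM versus TCAM comparison, means the proposed cell consumes more power than the reference.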
**Table 6.2.** Comparison of CNTFET-based CAM cells

| Memory Cells | Match Delay (×10<sup>-11</sup> s) | Power consumption (×10<sup>-6</sup> W) | Transistor Count |
|---------------------|------|------|----|
| 9T BCAM (proposed) | 0.61 | 3.34 | 9 |
| BCAM of [250] | 2.41 | 3.57 | 10 |
| 16T TCAM (proposed) | 0.60 | 4.34 | 16 |
| TCAM of [250] | 2.22 | 4.35 | 16 |
| 11T-3CAM (proposed) | 0.69 | 8.8 | 11 |
| 3CAM of [250] | 1.95 | 8.9 | 12 |

Since the 3CAM cell has the same usability as the TCAM cell, 11T-3CAM is compared with TCAM cells as well as with existing 3CAM cells. Compared to the 3CAM of [250], 11T-3CAM reduces match delay and transistor count by 64% and 8%, respectively, without any loss in power performance. Although 11T-3CAM has higher power consumption than the TCAM of [250], because of the voltage conflict that occurs between the storage of logic 1 and logic 0 (or logic 2), it achieves a 69% reduction in match delay and a 31% reduction in transistor count. Further, to determine the stability of the proposed cell, we measured the minimum noise voltage at each of the storage nodes in 3CAM that would flip the state of the cell, using the method described in [250]. The storage cell of the proposed 3CAM cell achieves a read margin of 0.12 V and a write margin of 0.22 V.

#### 6.4 Conclusion

This chapter has presented the design of high-speed CAM cells in CNTFET technology. BCAM and TCAM cells have been designed based on low-capacitance search logic. BCAM provides storing and searching of two logic values, 0 and 2, while TCAM provides the added flexibility of pattern matching with the use of don't care (X). A new 3CAM cell has also been developed. The 3CAM cell stores three logic values (0, 1 and 2) in a single ternary SRAM cell and eliminates the need for a second binary SRAM typically used in TCAM construction.
The proposed 3CAM cell uses CNTFETs with two different threshold voltages (0.289 V and 0.55 V) in the implementation of its low-capacitance search network. HSPICE simulation results have confirmed that all presented CAM cells function correctly during the read, write and search operations. Compared to its existing CNTFET-based counterpart, the BCAM cell reduces match delay by 74% with a 6% saving in power and a 10% saving in device count. Similarly, the TCAM cell shows a 73% reduction in match delay with the same device count and comparable power performance. The 3CAM cell reduces match delay and transistor count by 64% and 8%, respectively, without any loss in power performance. Hence, the observed results show that the presented CAM cells are capable of enhancing the performance of memory systems.

## 7.1. Conclusion

The CNTFET is a promising alternative to the traditional Si MOSFET for high-performance and low-power VLSI circuits. CNTFETs are well suited to ternary logic. In particular, the best way to design ternary circuits is the multiple-threshold method, and the desired threshold voltage can be achieved by using CNTs of different diameters in the CNTFET device. Additionally, ternary logic reduces chip area as well as the complexity of interconnects by increasing their information content. Further, the arithmetic and logic unit (ALU) is the most basic and important component of the processor in a digital computer. A modern computer needs an ALU implementation that is efficient in terms of hardware for increased integration density, speed for high throughput and greatly increased capabilities, and power for compact and portable applications. Next, content addressable memory (CAM) is an application-specific memory which performs parallel data comparison along with data storage. CAM is mainly popular for network applications, which require a large number of fast CAM cells to achieve high-speed look-up operations in large routing tables.
In this thesis, an efficient ternary ALU (TALU) and high-speed CAM cells have been developed using CNTFETs. The design of a 2-bit hardware optimized TALU (HO-TALU) has been presented in chapter 3. HO-TALU has a new adder-subtractor (AS) module which performs both addition and subtraction operations using only an adder module with the help of multiplexers. Thus, it eliminates the subtractor module from the conventional architecture. HO-TALU minimizes ternary function expressions and uses binary gates along with ternary gates in the realization of its functional modules: AS, multiplier, comparator and exclusive-OR. As a consequence, the sub-blocks of AS, HAS and FAS, use nearly 76% and 82% fewer transistors, respectively, than conventional designs which contain separate adder and subtractor blocks. The multiplier, comparator and exclusive-OR show reductions in device count of 64%, 82% and 76%, respectively, with respect to their existing counterparts. Results obtained from the HSPICE simulator with the Stanford model of the 32 nm CNTFET have shown that all HO-TALU modules achieve a large improvement (nearly two hundred times) in power-delay product (PDP) with respect to their CMOS-based counterparts, which verifies the potential benefit of CNTFET circuits. In comparison with existing CNTFET-based designs, the proposed multiplier, comparator and exclusive-OR reduce PDP by 75%, 65% and 28%, respectively. However, the PDP of sub-modules HAS and FAS has marginally increased by 2% and 5%, respectively. Thus, all HO-TALU modules achieve good hardware efficiency with a minor loss of PDP for addition and subtraction operations only, with respect to CNTFET circuits available in the literature. In addition, the design of the 2-bit HO-TALU is extended to develop a 2-bit HO-TALU slice which can be easily cascaded to construct an N-bit HO-TALU. The ternary full adder (TFA), which is a basic sub-block of AS, has been modified using different circuit techniques to improve its efficiency in terms of PDP, as presented in chapter 4.
Three new TFA designs have been developed. The first TFA, named HS-TFA, contains symmetric pull-up and pull-down networks along with a resistive voltage divider as its integral part, which is configured using transistors. Compared to the most energy-efficient TFA available in the literature, HS-TFA has high driving capability and reduces delay by 9%, but it shows high power dissipation. The second TFA, named low power TFA (LP-TFA), has been developed using the complementary pass-transistor logic style. LP-TFA reduces power by 24% and improves PDP by 5%, but it has 20% more delay. The third TFA, named dynamic TFA (DTFA), has been implemented using dynamic logic, with a ternary keeper to compensate for charge loss due to the charge sharing problem. DTFA has high driving capability and reduces power, delay and PDP by 24%, 15% and 35%, respectively. However, it additionally needs CNTFET devices with a smaller diameter (0.626 nm) in order to reduce charge leakage. All three TFAs have been designed based on the inherently binary nature (0 and 1) of the input carry, which leads to reduced device count. Further, a new 1-bit comparator has been developed using pass transistor logic with a reduced number of stages in the critical delay path. This design has been used to create 2-bit and N-bit comparators, where a static binary tree configuration has been used to correct the voltage levels. The proposed 2-bit comparator has high driving capability and achieves a 29% reduction in PDP with 34% lower device count compared to its counterpart available in the literature. However, it has two output signals to check the greater, lesser and equal conditions, which makes the decoding logic of the comparator response complex in applications where three outputs (one for each condition) are desired. Apart from these, all new TFAs and the 2-bit comparator show less susceptibility to voltage and temperature variations than existing designs.
The design of a 2-bit power optimized TALU (PO-TALU) has been presented in chapter 5. The 2-bit PO-TALU functional modules, adder-subtractor-exclusive-OR (ASE) and multiplier, have been designed using a new complementary CNTFET-based binary computational unit and a low-complexity encoder. ASE eliminates an exclusive-OR module and a subtractor module from the conventional architecture. The multiplier uses a new efficient block, named the carry add (CA) block, in place of THA. In comparison with existing energy-efficient CNTFET-based designs, HSPICE simulation results have shown that the sub-blocks of ASE, half adder-subtractor-exclusive-OR (HASE) and full adder-subtractor-exclusive-OR (FASE), consume 66% and 47% less power. HASE shows reductions in delay and device count of 4% and 26%, respectively. FASE shows a 25% reduction in device count but has 29% more delay. The sub-block of the multiplier module, the 1-bit multiplier, shows reductions in power, delay and device count of 70%, 5% and 37%, respectively. ASE and the multiplier are less sensitive to voltage and temperature variations. The design of the 2-bit PO-TALU has been modified to implement a 2-bit PO-TALU slice which can be easily cascaded to form an N-bit PO-TALU. Hence, the TALU designs presented in this thesis can serve as efficient functional units for modern ternary microprocessors built with CNTFETs in the nanoscale era.

The design of high-speed CAM cells has been presented in chapter 6. Binary CAM (BCAM) and ternary CAM (TCAM) cells have been designed based on low-capacitance search logic. BCAM provides storing and searching of two logic values, 0 and 2, while TCAM provides the added flexibility of pattern matching with the use of don't care (X). A new three-valued CAM (3CAM) cell has also been developed. The 3CAM cell stores three logic values (0, 1 and 2) in a single ternary SRAM cell and eliminates the need for a second binary SRAM typically used in TCAM construction.
The proposed 3CAM cell uses CNTFETs with two different threshold voltages (0.289 V and 0.55 V) in the implementation of its low-capacitance search network. HSPICE simulation results have confirmed that all presented CAM cells function correctly during the read, write and search operations. Compared to its existing CNTFET-based counterpart, the BCAM cell reduces match delay by 74% with a 6% saving in power and a 10% saving in device count. Similarly, the TCAM cell shows a 73% reduction in match delay with the same device count and comparable power performance. The 3CAM cell reduces match delay and transistor count by 64% and 8%, respectively, without any loss in power performance. Hence, the observed results show that the presented CAM cells are capable of enhancing the performance of memory systems.

## 7.2 Future Scope of Work

The CNTFET, with its ballistic transport, excellent thermal conductivity and high current driving capability, has turned out to be a promising alternative to the conventional Si-MOSFET. The design space of CNTFET-based digital and memory circuits can be carried forward for the development of highly efficient modern electronics. All CNTFET-based TALU designs presented in this thesis perform nine ternary operations. It would be interesting to extend these designs to a larger number of ternary operations by modifying the function select logic, decoder and functional modules. The presented work can therefore be used to design ternary microprocessors with CNTFETs. In addition, functional modules of the TALU such as the AS and multiplier are the building blocks of many other applications, including video and image processing and DSP architectures. It is possible to explore the use of TALU functional modules in the efficient realization of these applications. A fast and compact 3CAM cell designed using CNTFETs has been shown in chapter 6.
This cell has twice the power consumption of a conventional TCAM cell because of the voltage conflict that occurs between the storage of logic 1 and logic 0 (or logic 2); therefore, alternative ways need to be found to reduce this. Further, it is possible to explore other circuit elements of CAM design, such as the sense amplifier and the match line precharge control circuit, using CNTFETs for real-time applications like multimedia data transmission. Apart from the circuit design domain, it would be interesting to explore techniques for layout formation of the presented designs.

## REFERENCES

- 1. D. H. Roberts, "Silicon integrated circuits: A personal view of the first 25 years," *Electronics and Power*, vol. 30, no. 4, pp. 282-284, Apr. 1984. - 2. J. S. Kilby, "The Integrated Circuit's Early History," in *Proc. IEEE*, Jan. 2000, vol. 88, no. 1, pp. 109-111. - 3. N. Srivastava and K. Banerjee, "Performance Analysis of Carbon Nanotube Interconnects for VLSI Applications," in *Proc. IEEE/ACM Int. Conf. Computer-Aided Design*, Nov. 2005, pp. 383-390. - 4. S. C. Kang and Y. Leblebici, "Introduction," in *CMOS digital integrated circuits: analysis and design*, Tata McGraw-Hill Edition: New Delhi, 2003, pp. 1-45. - 5. E. Mollick, "Establishing Moore's Law," in *IEEE Annals History Computing*, July-Sept. 2006, vol. 28, no. 3, pp. 62-75. - 6. M. Horowitz, E. Alon, D. Patil, S. Naffziger, R. Kumar and K. Bernstein, "Scaling, Power, and the Future of CMOS," in *Proc. IEEE Int. Electron Device Meeting, IEDM Tech. Dig.*, Washington, DC, Dec. 2005, pp. 7-15. - 7. F. Schwierz, J. Pezoldt and R. Granzner, "Two-dimensional materials and their prospects in transistor electronics," *Nanoscale*, vol. 7, no. 18, pp. 8261-8283, Apr. 2015. - 8. The International Technology Roadmap for Semiconductors, 2013 ITRS edition. Available: http://www.itrs.net. - 9. K. Roy, S. Mukhopadhyay and H.
Meimand-Mehmoodi, "Leakage current mechanisms and leakage reduction techniques in deep-submicron CMOS circuits," in *Proc. IEEE*, Feb. 2003, *vol.* 91, no. 2, pp. 305-327. - 10. C. Durkan, "Current at the nanoscale an introduction to nanoelectronics," Imperial College Press, Cambridge: UK. (Book cover all effects), Mar. 2007. - 11. K. Saraswat, H. Cho, P. Kapur and K-H. Koo, "Performance Comparison between Copper, Carbon Nanotube and Optical Interconnects," in *Proc.IEEE Int. Symp. Circuits and Systems*, Seattle, WA, May 2008, pp. 2781-2784. - 12. A. Naeemi, R. Sarvari, and J. D. Meindl, "Performance comparison between carbon nanotube and copper interconnects for GSI," in *Proc. IEEE Int. Electron Device Meeting, IEDM Tech. Dig.*, Dec. 2004, pp. 699-702. - 13. P. Avouris, M. Radoslavjevic and S. Wind, "Carbon nanotube electronics and Optoelectronics, in Carbon Nanotubes," *Springer-Verlag*, Berlin, Germany, 2004. - 14. D. Hisamoto, L. Wen-Chin, J. Kedzierski, E. Anderson, H.Takeuchi, K.Asano, K. Tsu-Jae, J. Bokor and H. Chenming, "A folded-channel MOSFET for deep-sub-tenth micron era," in *Proc. IEEE Int. Electron Device Meeting, IEDM Tech. Dig.*, San Francisco, CA, USA, Dec. 1998, pp. 1032-1034. - J. Kavalieros, B. Doyle, S.Datta, G. Dewey, M. Doczy, B.Jin, D.Lionberger, M.Metz, W.Rachmady, M. Radosavljevic, U.Shah, N. Zelick and R.Chau, "Tri-Gate Transistor Architecture with High-k Gate Dielectrics, Metal Gates and Strain Engineering," in *Proc. IEEE Int. Symp. VLSI Technol.*, Honolulu, HI, 2006, pp. 50-51. - 16. Emerging Research Devices, in International Technology Roadmap for Semiconductors, 2009 ITRS edition. Available: http://www.itrs.net. - 17. Y. Cui, Z. Zhong, D. Wang, W. U. Wang, and C. M. Lieber, "High Performance Silicon Nanowire Field Effect Transistors," *Nano Lett.*, vol. 3, no. 2, pp. 149-152, 2003. - 18. T. Ashley, A.R.Barnes, L. Buckle, S. Datta, A.B. Dean,M.T. Emery,M.Fearn, D.G. Hayes,K.P. Hilton,R. Jefferies, T.Martin, K.J. Nash,T.J. 
Phillips,W.A Tang,P.J. Wilding and R.Chau, "Novel InSb-based quantum well transistors for ultra-high speed, low power logic applications," in *Proc. IEEE 7<sup>th</sup>Int. Solid-State and Integrated Circuits Technology*, Oct. 2004, vol. 3, pp. 2253-2256. - 19. S. Datta, T.Ashley, J. Brask and L.Buckle, M.Doczy, M.Emeny, D.Hayes, K. Hilton, R.Jefferies, T. Martin, T.J. Phillips, D. Wallis, P. Wilding and R.Chau, "85nm gate length enhancement and depletion mode InSb quantum well transistors for ultra high speed and very low power digital logic applications," in *Proc. IEEE Int. Electron Device Meeting, IEDM Tech. Dig.*, Washington, DC, Dec. 2005, pp. 763-766. - 20. K. Chien-I, H. Heng-Tung, W. Chien-Ying, Y. Chang-Edward, Y.Miyamoto, C. Yu-Lin D. Biswas, "A 40-nm-Gate InAs/In0.7Ga0.3As Composite-Channel HEMT with - 2200 mS/mm and 500-GHz fT," in *IEEE Int. Indium Phosphide & Related Materials*, Newport Beach, CA, May 2009, pp. 128-131. - 21. T. Ashley, A. B. Dean, C. T. Elliott, R. Jefferies, F. Khaleque, and T. J. Phillips, "High speed, low-power InSb transistors," in *Proc. IEEE Int. Electron Device Meeting, IEDM Tech. Dig.*, Washington, DC, USA, Dec. 1997, pp. 751-754. - 22. X. Wang, Y. Ouyang, X. Li, H. Wang, J. Guo, and H. Dai, "Room Temperature All Semiconducting sub-10nm Graphene Nanoribbon Field-Effect Transistors," *Phys. Rev. Lett.*, vol. 100, no. 20, 2008. DOI: 10.1103/PhysRevLett.100.206803 - 23. C. M. Lemme, T. J. Echtermeyer, M. Baus, and H. Kurz, "A Graphene Field-Effect Device," *IEEE Electron Device Letters.*, vol. 28, no. 4, pp. 282-284, 2007. - 24. S. Kim et al., "Realization of a high mobility dual-gated graphene field-effect transistor with Al<sub>2</sub>O<sub>3</sub> dielectric," *Applied Physics Letters*, vol. 94, no. 6, 2009. DOI: 10.1063/1.3077021 - 25. A. Javey, J. Guo, Q. Wang, M. Lundstrom, and H. Dai, "Ballistic carbon nanotube field-effect transistors," *Nature*, vol. 424, pp. 654-657, 2003.DOI: 10.1038/nature01797 - 26. Y. B. 
Kim, "Challenges for nanoscale MOSFETs and emerging nanoelectronics," *Trans. Electr. Electron. Mater.*, vol. 11, no. 3, pp.93–105, 2010. - 27. J. Deng, N.Patil, K. Ryu, A. Badmaev, Z. Chongwu, S. Mitra, H.-S.P.Wong, "Carbon nanotube transistorcircuits: circuit-level performance benchmarking and design options for living with imperfections," in *Proc. IEEE Int. Solid State Circuits* (ISSCC), Dig. Technical Papers, San Francisco, CA, Feb. 2007, pp. 70-588. - 28. P. Clarke. (2010, Feb. 23). Junctionless transistor is ready for 20-nm node, says researcher [Online]. Available: http://www.eetimes.com/document.asp?doc\_id=1173121. - 29. S. D. Sung, Y. H. Kyoung, C. H. Keun, L. Ming, Y. Y. Yeoh, L. Sung-Young, K.M. Sung, Y.Eun Jung, K. Min Sang, C. W. Oh, K. Sung Hwan, K.Dong-Won and P. Donggun, "High-Performance Twin Silicon Nanowire MOSFET (TSNWFET) on Bulk Si Wafer," *IEEE Trans. Nanotechnol.*, vol. 7, no. 2, pp. 181-184, 2008. - 30. S. M. Koo, M. D. Edelstein, Q. Li, C. A. Richter, and E. M. Vogel, "Silicon nanowires as enhancement-mode Schottky barrier field-effect transistors," *Nanotechnology*, vol. 16, no. 9, pp. 1482-1485, 2005. - 31. R. Chau, S. Datta, M. Doczy, B. Doyle, B. Jin, J. Kavalieros, A. Majumdar, M. Metz and M. Radosavljevic, "Benchmarking nanotechnology for high-performance and low-power logic transistor applications," *IEEE Trans. Nanotechnology*, vol. 4, no. 2, pp. 153-158, 2005. - 32. P. Clarke. (2010, Dec. 2). Intel's Gargini pushes III-V-on-silicon as 2015 transistor option [Online]. Available:http://www.eetimes.com/document.asp?doc\_id=1173048 - 33. N. Goel, D.Heh, S. Koveshnikov, I.Ok, S.Oktyabrsky, V. Tokranov, R. Kambhampatic, M.Yakimov, Y. Sun, P.Pianetta, C.K. Gaspe, M.B. Santos, J.Lee, S.Datta, P. Majhi, W.Tsai, "Addressing the gate stack challenge for high mobility InxGa1-xAs channels for NFETs," in *Proc. IEEE Int. Electron Device Meeting*, *IEDM Tech. Dig.*, Dec. 2008, pp. 1-4. - 34. Y. Sun, E.W. Kiewra, J.P. 
de Souza, S.J Koester, J.J. Bucchignano, N.Ruiz, K.E. Fogel, D.K. Sadana, G.G. Shahidi, J.Fompeyrine, D.J. Webb, M. Sousa, C. Marchiori, R. Germann and K.T.Shiu, "High mobility III-V channel MOSFETs for post-Si CMOS applications," in *IEEE Int. Conf. IC Design and Technology*. *ICICDT*, Austin, TX, May 2009, pp. 161-164. - 35. A. K. Geim and K. S. Novoselov, "The rise of graphene," *Nat Mater*, vol. 6, no. 3, pp. 183-191, Mar. 2007. - 36. R. C. Johnson. (2010, Jan. 2). Graphene wafers ready to fab carbon chips [Online]. Available: http://www.eetimes.com/document.asp?doc\_id=1172918 - 37. J. Kedzierski, H. Pei-Lan, P.Healey, P.W. Wyatt, C.L. Keast, M. Sprinkle, C. Berger, H.de, A.Walt, "Epitaxial Graphene Transistors on SiC Substrates," *IEEE trans. Electron Devices*, vol. 55, no. 8, pp. 2078-2085, Aug. 2008. - 38. K. Bullis. (2008, Jan. 28). Graphene Transistors [Online]. Available: http://www.technologyreview.com/news/409449/graphene-transistors/ - 39. J. Deng and H.-S. P.Wong, "A Compact SPICE Model for Carbon-Nanotube Field-Effect Transistors Including Nonidealities and Its Application—Part II: Full Device - Model and Circuit Performance Benchmarking, " *IEEE Trans. Electron Device*, vol. 54, no.12, pp. 3195 3205, Dec. 2007. - 40. Z. Yao, C. L. Kane, and C. Dekker, "High-Field Electrical Transport in Single-Wall Carbon Nanotubes," *Phys. Rev. Lett.*, vol. 84, no. 13, pp. 2941-2944, Mar. 2000. - 41. T. Durkop, S. A. Getty, E. Cobas, and M. S. Fuhrer, "Extraordinary Mobility in Semiconducting Carbon Nanotubes," *Nano Lett.*, vol. 4, no. 1, pp. 35-39, 2004. - 42. A. Javey, J. Guo, D. B. Farmer, Q. Wang, D. Wang, R.G. Gordon, M. Lundstrom and H. Dai, "Carbon Nanotube Field-Effect Transistors with Integrated Ohmic Contacts and High-k Gate Dielectrics," *Nano Lett.*, vol. 4, no. 3, pp. 447-450, Mar. 2004. - 43. R. Martel, T. Schmidt, H. R. Shea, T. Hertel, and P. Avouris, "Single- and multi-wall carbon nanotube field-effect transistors," *Applied Physics Lett.*, vol. 
73, no. 17, pp. 2447-2449, Oct. 1998. - 44. S. J. Tans, A. R. M. Verschueren, and C. Dekker, "Room-temperature transistor based on a single carbon nanotube," *Nature*, vol. 393, no. 6680, pp. 49-52, May.1998. - 45. V. Derycke, R. Martel, J. Appenzeller, and P. Avouris, "Carbon Nanotube Inter- and Intramolecular Logic Gates," *Nano Letters*, vol. 1, no. 9, pp. 453-456, 2001. - 46. A. Bachtold, P. Hadley, T. Nakanishi, and C. Dekker, "Logic Circuits with Carbon Nanotube Transistors," *Science*, vol. 294, no. 5545, pp. 1317-1320, Nov. 2001. - Z. Chen, J. Appenzeller, Y. M. Lin, J. S. Oakley, A. G. Rinzler, J. Tang, S. J. Wind, P. M. Solomon, P. Avouris, "An integrated logic circuit assembled on a single carbon nanotube," *Science*, vol. 311, no. 5768, pp. 1735, Mar. 2006. - 48. N. Patil, A. Lin, E. Myers, H. S. P.Wong and S. P. Mitra, "Integrated wafer-scale growth and transfer of directional carbon nanotubes and misaligned-carbon-nanotube-immune logic," in *Proc. Symp. VLSI Tech.*, Honolulu, HE, pp. 205–206, 2008. - 49. N. Patil, A.Lin, Z. Jie, W. Hai, K. Anderson, H.-S.P. Wong, S. Mitra, "Scalable carbon nanotube computational and storage circuits immune to metallic and mispositioned carbon nanotubes," *IEEE Trans.Nano Technol.*, vol.10, no. 4, pp. 744–750, July 2011. - 50. M. Shulaker, J. V. Rethy, G. Hills, H. Chen, G. Gielen, H.P. Wong, S. Mitra, "Experimental demonstration of a fully digital capacitive sensor interface built entirely using carbon-nanotube FETs," in *Proc. IEEE Int. Solid-State Circuits Conf. Digest of Technical papers (ISSCC)*, Feb. 2013,pp. 112–113. - 51. IBM Milestone Advances Effort to Enhance Semiconductors through Nanotechnology. Yorktown Heights, NY, USA, 2006, Mar 24. [Online]. Available:https://www-03.ibm.com/press/us/en/pressrelease/22191.wss. - 52. Q. Cao, H.-S. Kim, N. Pimparkar, J. P. Kulkarni, C. Wang, M. Shim, K. Roy, M. A. Alam, J. A. 
Rogers, "Medium-scale carbon nanotube thin-film integrated circuits on flexible plastic substrates," *Nature*, vol. 454, no. 7203, pp. 495-500, Jul. 2008.
- 53. M. Shulaker, G. Hills, N. Patil, H. Wei, H. Y. Chen, H. S. P. Wong and S. Mitra, "Carbon nanotube computer," *Nature*, vol. 501, no. 7468, pp. 526-530, Sep. 2013.
- 54. J. Zhang, A. Lin, N. Patil, H. Wei, L. Wei, H.-S. P. Wong, and S. Mitra, "Robust digital VLSI using carbon nanotubes," *IEEE Trans. CAD*, vol. 31, no. 4, pp. 453–471, Apr. 2012.
- 55. N. Patil, "Design and Fabrication of Imperfection-Immune Carbon Nanotube Digital VLSI Circuits," PhD thesis, Stanford Univ., 2010.
- 56. R. Chau, B. Doyle, S. Datta, J. Kavalieros, and K. Zhang, "Integrated nanoelectronics for the future," *Nat. Mater.*, vol. 6, no. 11, pp. 810-812, Nov. 2007.
- 57. J. Guo, S. Datta, M. Lundstrom, M. Brink, P. McEuen, A. Javey, H. Dai, K. Hyoungsub and P. McIntyre, "Assessment of Silicon MOS and Carbon Nanotube FET Performance Limits Using a General Theory of Ballistic Transistors," in *Proc. IEEE Int. Electron Device Meeting, IEDM*, San Francisco, CA, USA, Dec. 2002, pp. 711-714.
- 58. J. Guo, A. Javey, H. Dai, and M. Lundstrom, "Performance Analysis and Design Optimization of Near Ballistic Carbon Nanotube Field-Effect Transistors," in *Proc. IEEE Int. Electron Devices Meet, IEDM Tech. Dig.*, Dec. 2004, pp. 703-706.
- 59. E. L. Post, "Introduction to a general theory of elementary propositions," *American Journal of Mathematics*, vol. 43, no. 3, pp. 163-185, Jul. 1921.
- 60. A. C. Alexander, Notes on the Synthesis of Form, Cambridge, MA: Harvard University Press, 1964.
- 61. M. Kameyama, S. Kawahito, T. Higuchi, "A multiplier chip with multiple-valued bidirectional current-mode logic circuits," *Computer*, vol. 21, no. 4, pp. 43-56, Apr. 1988.
- 62. P. C. Balla and A. Antoniou, "Low power dissipation MOS ternary logic family," *IEEE J. Solid-State Circuits*, vol. 19, no. 5, pp. 739–749, Oct. 1984.
- 63. A. Heung and H. T.
Mouftah, "Depletion/enhancement CMOS for a lower power family of three-valued logic circuits," *IEEE J. Solid-State Circuits*, vol. 20, no. 2, pp. 609–616, Apr. 1985.
- 64. A. Raychowdhury and K. Roy, "Carbon-nanotube-based voltage-mode multiple-valued logic design," *IEEE Trans. Nanotechnol.*, vol. 4, no. 2, pp. 168–179, Mar. 2005.
- 65. D. A. Rich, "A survey of multivalued memories," *IEEE Trans. Comput.*, vol. 35, no. 2, pp. 99–106, Feb. 1986.
- 66. M. K. Raja and N. Koppala, "Modeling and Implementation of Reliable Ternary Arithmetic and Logic Unit Design Using VHDL," *Int. J. Engineering Research and Applications*, vol. 4, no. 6, pp. 259-264, June 2014.
- 67. S. Lin, Y. Kim, and F. Lombardi, "CNTFET-based design of ternary logic gates and arithmetic circuits," *IEEE Trans. Nanotechnol.*, vol. 10, no. 2, pp. 217–225, 2010.
- 68. P. Keshavarzian and K. Navi, "Universal ternary logic circuit design through carbon nanotube technology," *Int. J. Nanotechnol.*, vol. 6, no. 10–11, pp. 942–953, 2009.
- 69. T. E. Rani, M. A. Rani, and R. Rao, "Area Optimized Low Power Arithmetic And Logic Unit," in 3<sup>rd</sup> Int. Con. Electronics Comp. Tech. (ICECT), Kanyakumari, April 2011, pp. 224-228.
- 70. R. A. Powers, "Batteries for low power electronics," in *Proc. IEEE*, Apr. 1995, vol. 83, no. 4, pp. 687-693.
- 71. A. Srivastava and D. Govindarajan, "A Fast ALU Design in CMOS for Low Voltage Operation," *VLSI Design*, vol. 14, no. 4, pp. 315-327, 2002.
- 72. A. Srivastava and C. Srinivasan, "ALU design using reconfigurable CMOS logic," in 45<sup>th</sup> Midwest Symp. Circuits and Systems, MWSCAS, 2002, vol. 2, pp. 663-666.
- 73. A. P. Dhande and V. T. Ingole, "Design & Implementation of 2-Bit Ternary ALU slice," in *Proc.* 3<sup>rd</sup> *Int. conf. Sciences of electronic, Technologies of information and telecommunications*, Tunisia, Mar. 2005, pp. 1–11.
- 74. A. S. Tanenbaum, Computer Networks. Prentice Hall, Upper Saddle River, NJ, 2003.
- 75. P.
Gupta, "Algorithms for routing lookups and packet classification," Ph.D. Thesis, Department of Computer Science, Stanford University, CA, 2000.
- 76. A. J. McAuley, and P. Francis, "Fast routing table lookup using CAMs," in *Proc. INFOCOM*, San Francisco, CA, 1993, vol. 3, pp. 1382-1391.
- 77. P. F. Lin, and J. B. Kuo, "A 1-V 128-kb four-way set-associative CMOS cache memory using word line-oriented tag-compare WLOTC structure with the content addressable memory (CAM) 10-transistor tag cell," *IEEE J. of Solid-state Circuits*, vol. 36, no. 4, pp. 666-675, Apr. 2001.
- 78. P. F. Lin, and J. B. Kuo, "A 0.8-V 128-kb four-way set-associative two-level CMOS cache memory using two-stage wordline/bitline-oriented tag-compare (WLOTC / BLOTC) scheme," *IEEE J. of Solid-state Circuits*, vol. 37, no. 10, pp. 1307-1317, Oct. 2002.
- 79. H. Higuchi, S. Tachibana, M. Minami, and T. Nagano, "A 5-mW, 10-ns cycle TLB using a high-performance CAM with low-power match detection circuits," *IEICE Trans. on Electronics*, vol. E79-C, no. 6, Jun. 1996.
- 80. M. Sumita, "A 800 MHz single cycle access 32 entry fully associative TLB with a 240ps access match circuit," *Dig. Technical Papers*, Symp. VLSI Circuits, Jun. 2001, pp. 231-232.
- 81. J. P. Wade, and C. G. Sodini, "A ternary content-addressable search engine," *IEEE J. of Solid-state Circuits*, vol. 24, no. 4, Aug. 1989.
- 82. K. J. Lin, and C. W. Wu, "A low-power CAM design for LZ data compression," *IEEE Trans. on Computers*, vol. 49, no. 10, Oct. 2000.
- 83. T. Ogura, M. Nakanishi, T. Baba, Y. Nakabayshi, and R. Kasai, "A 336-kb content addressable memory for highly parallel image processing," in *Proc. IEEE conf. Custom Integrated Circuits (CICC)*, May 1996, pp. 273-276.
- 84. F. Yu, R. H. Katz, and T. V. Lakshman, "Gigabit rate packet pattern-matching using TCAM," in *Proc. IEEE Int. Conf. Network Protocols (ICNP)*, Berlin, Germany, Oct. 2004, pp. 1-10.
- 85. F. Yu, and R. H.
Katz, "Efficient multi-match packet classification with TCAM," in *Proc. IEEE Symp. High Performance Interconnects (HOTI)*, Stanford, CA, Aug. 2004, pp. 1-7.
- 86. R. Saito, G. Dresselhaus, and M. S. Dresselhaus, "Physical Properties of Carbon Nanotubes," Imperial College Press, London, 1998.
- 87. G. Cho, Y.B. Kim, Y. F. Lombardi and M. Choi, "Performance evaluation of CNFET based logic gates," in *Proc. IEEE Instrumentation and Measurement Technology Conf.*, I2MTC, Singapore, May 2009, pp. 909–912.
- 88. G. W. Hanson, "Fundamentals of Nanoelectronics," Prentice–Hall Inc., New Jersey, USA, 2008.
- 89. J. Appenzeller, "Carbon Nanotubes for High-Performance Electronics—Progress and Prospect," in *Proc. IEEE*, vol. 96, no. 2, Feb. 2008, pp. 201-211.
- 90. A. Raychowdhury and K. Roy, "Carbon Nanotube Electronics: Design of High-Performance and Low-Power Digital Circuits," *IEEE Trans. Circuits and Systems I: Regular Papers*, vol. 54, no. 11, pp. 2391-2401, Nov. 2007.
- 91. J. Deng and H. S. P. Wong, "A Compact SPICE Model for Carbon-Nanotube Field-Effect Transistors Including Nonidealities and Its Application—Part I: Model of the intrinsic channel region," *IEEE Trans. Electron Devices*, vol. 54, no. 12, pp. 3186-3194, Dec. 2007.
- 92. Y. Li, W. Kim, Y. Zhang, M. Rolandi, and D. Wang, "Growth of Single-Walled Carbon Nanotubes from Discrete Catalytic Nanoparticles of Various sizes," *J. Phys. Chem.*, vol. 105, no. 46, pp. 11424–11431, Oct. 2001.
- 93. Y. Ohno, S. Kishimoto, T. Mizutani, T. Okazaki, and H. Shinohara, "Chirality assignment of individual single-walled carbon nanotubes in carbon nanotube field-effect transistors by micro-photocurrent spectroscopy," *Appl. Phys. Lett.*, vol. 84, no. 8, pp. 1368–1370, Feb. 2004.
- 94. B. Wang, P. Poa, L. Wei, L. Li, Y. Yang, and Y. Chen, "(n,m) Selectivity of single-walled carbon nanotubes by different carbon precursors on Co–Mo catalysts," *J. American Chemical Society*, vol. 129, no. 9, pp. 9014–9019, 2007.
- 95. A.
Lin, N. Patil, K. Ryu, A. Badmaev, L. G. De Arco, C. Zhou, S. Mitra, and H.-S. P. Wong, "Threshold Voltage and On–Off Ratio Tuning for Multiple-Tube Carbon Nanotube FETs," in *IEEE Trans. Nanotechnology*, vol. 8, no. 1, Aug. 2008, pp. 4-9.
- 96. A. Javey, J. Guo, D. B. Farmer, Q. Wang, E. Yenilmez, R. G. Gordon, M. Lundstrom and H. Dai, "Self-Aligned Ballistic Molecular Transistors and Electrically Parallel Nanotube Arrays," *Nano Lett.*, vol. 4, no. 7, pp. 1319-1322, Jul. 2004.
- 97. Y.-M. Lin, J. Appenzeller, Z. Chen, Z.-G Chen, H.-M Cheng and Ph. Avouris, "Demonstration of a High Performance 40nm Gate Carbon Nanotube Field Effect Transistor," in *Proc. IEEE Device Research Conference Digest*, DRC, Santa Barbara, CA, June 2005, pp. 113-114.
- 98. A. Javey, Q. Wang, W. Kim and H. Dai, "Advancements in Complementary Carbon Nanotube Field Effect Transistors," in *Proc. IEEE Int. Electron Devices Meeting*, IEDM Tech. Dig., Dec. 2003, pp. 31.2.1-31.2.4.
- 99. Y. C. Tseng, K. Phoa, D. Carlton, and J. Bokor, "Effect of Diameter Variation in a Large Set of Carbon Nanotube Transistors," *Nano Lett.*, vol. 6, no. 7, pp. 1364-1368, Jul. 2006.
- 100. A. Raychowdhury, V. De, J. Kurtin, S. Borkar, K. Roy and A. Keshavarzi, "Variation Tolerance in a Multichannel Carbon-Nanotube Transistor for High-Speed Digital Circuits," *IEEE Trans. on Electron Devices*, vol. 56, no. 3, pp. 383-392, Mar. 2009.
- 101. S. Han, X. Liu, and C. Zhou, "Template-Free Directional Growth of Single-Walled Carbon Nanotubes on a- and r-Plane Sapphire," *J. American Chemical Society*, vol. 127, no. 15, pp. 5294-5295, Apr. 2005.
- 102. C. Kocabas, M. Shim, and J. A. Rogers, "Spatially Selective Guided Growth of High-Coverage Arrays and Random Networks of Single-Walled Carbon Nanotubes and Their Integration into Electronic Devices," *J. American Chemical Society*, vol. 128, no. 14, pp. 4540-4541, Apr. 2006.
- 103. S. J. Kang, C. Kocabas, T. Ozel, M. Shim, N. Pimparkar, M. A. Alam, S. V. Rotkin and J. A.
Rogers, "High-Performance Electronics Using Dense, Perfectly Aligned Arrays of Single-Walled Carbon Nanotubes," *Nat. Nanotechnol.*, vol. 2, no. 4, pp. 230-236, Apr. 2007.
- 104. N. Patil, A. Lin, E.R. Myers, K. Ryu, A. Badmaev, Z. Chongwu, H.-S.P. Wong and S. Mitra, "Wafer-Scale Growth and Transfer of Aligned Single-Walled Carbon Nanotubes," *IEEE Trans. Nanotechnology*, vol. 8, no. 4, pp. 498-504, Mar. 2009.
- 105. S. W. Hong, T. Banks and J. A. Rogers, "Improved Density in Aligned Arrays of Single-Walled Carbon Nanotubes by Sequential Chemical Vapor Deposition on Quartz," *Advanced Materials*, vol. 22, no. 45, pp. 1826-1830, 2010.
- 106. M. M. Shulaker, H. Wei, N. Patil, J. Provine, H.-Y. Chen, H.-S. P. Wong, and S. Mitra, "Linear Increases in Carbon Nanotube Density through Multiple Transfer Technique," *Nano Letters*, vol. 11, no. 5, pp. 1881-1886, 2011.
- 107. J. Kong, C. Zhou, E. Yenilmez and H. Dai, "Alkaline Metal-Doped n-type Semiconducting Nanotubes as Quantum Dots," *Appl. Phys. Lett.*, vol. 77, no. 24, pp. 3977–3979, 2000.
- 108. N. Moriyama, Y. Ohno, T. Kitamura, S. Kishimoto and T. Mizutani, "Change in Carrier Type in High-K Gate Carbon Nanotube Field-Effect Transistors by Interface Fixed Charges," *Nanotechnology*, vol. 21, no. 16, 2010.
- 109. D. Mann, A. Javey, J. Kong, Q. Wang, and H. Dai, "Ballistic Transport in Metallic Nanotubes with Reliable Pd Ohmic Contacts," *Nano Letters*, vol. 3, no. 11, pp. 1541-1544, Oct. 2003.
- 110. Z. Y. Zhang, S. Wang, L. Ding, X. L. Liang, H. L. Xu, J. Shen, Q. Chen, R. L. Cui, Y. Li and L.-M. Peng, "High-Performance N-Type Carbon Nanotube Field-Effect Transistors with Estimated Sub-10-ps Gate Delay," *Applied Physics Letters*, vol. 92, no. 13, pp. 133117, Mar. 2008.
- 111. J. Zhang, N. Patil, A. Hazeghi, H.-S. P. Wong and S. Mitra, "Characterization and Design of Logic Circuits in the Presence of Carbon Nanotube Density Variations," *IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems*, vol. 30, no. 8, pp.
1103-1113, 2011.
- 112. J. Zhang, "Variation-aware design of carbon nanotube digital VLSI circuits," Ph.D. thesis, Department of Electrical Engineering, Stanford University, Stanford, CA, 2011.
- 113. G. Zhang, P. Qi, X. Wang, Y. Lu, X. Li, R. Tu, S. Bangsaruntip, D. Mann, L. Zhang and H. Dai, "Selective Etching of Metallic Carbon Nanotubes by Gas-Phase Reaction," *Science*, vol. 314, no. 5801, pp. 974-977, Nov. 2006.
- 114. P. G. Collins, M. S. Arnold, and P. Avouris, "Engineering Carbon Nanotubes and Nanotube Circuits Using Electrical Breakdown," *Science*, vol. 292, no. 5517, pp. 706-709, Apr. 2001.
- 115. N. Patil, A. Lin, J. Zhang, H. Wei, K. Anderson, H.-S.P. Wong, and S. Mitra, "VMR: VLSI-compatible metallic carbon nanotube removal for imperfection-immune cascaded multi-stage digital logic circuits using Carbon Nanotube FETs," in *Proc. IEEE Int. Electron Devices Meet. (IEDM)*, Baltimore, MD, Dec. 2009, pp. 1-4.
- 116. N. Patil, J. Deng, A. Lin, H. Wong, and S. Mitra, "Design Methods for Misaligned and Mispositioned Carbon-Nanotube Immune Circuits," *IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems*, vol. 27, no. 10, pp. 1725-1736, 2008.
- 117. Stanford University CNFET model Website. Stanford University, Stanford, CA, 2008 [Online]. Available: http://nano.stanford.edu/model.php?id=23
- 118. G. Gelao, R. Marani, R. Diana, A.G. Perri, "A Semi-Empirical SPICE Model for n-type Conventional CNTFETs," *IEEE Trans. Nanotechnology*, vol. 10, no. 3, pp. 506-512, May 2010.
- 119. R. Marani and A.G. Perri, "A Compact, Semi-empirical Model of Carbon Nanotube Field Effect Transistors oriented to Simulation Software," *Current Nanoscience*, vol. 7, no. 2, pp. 245-253, 2011.
- 120. R. Marani, G. Gelao and A.G. Perri, "Comparison of ABM SPICE library with Verilog-A for Compact CNTFET model implementation," *Current Nanoscience*, vol. 8, no. 4, pp. 556-565, Sep. 2012.
- 121. R. Marani, G. Gelao and A.G.
Perri, "Modelling of Carbon Nanotube Field Effect Transistors oriented to SPICE software for A/D circuit design," *Microelectronics Journal*, vol. 44, no. 1, pp. 33-38, Jan. 2013.
- 122. C. Dwyer, M. Cheung and D. J. Sorin, "Semi empirical SPICE models for carbon nanotube FET logic," in *Proc. IEEE Conf. Nanotechnology*, Aug. 2004, pp. 386–388.
- 123. F. Pregaldiny, J. B. Kammerer and C. Lallement, "Compact Modelling and Applications of CNTFETs for Analog and Digital Circuit Design," in *Proc.* 13<sup>th</sup> IEEE Int. Conf. Electronics, Circuits and Systems, ICECS, Nice, Dec. 2006, pp. 1030–1033.
- 124. S. Fregonese, H. Cazin d'Honincthun, J. Goguet, C. Maneux, T. Zimmer, J.-P. Bourgoin, P. Dollfus, S. Galdin-Retailleau, "Computationally Efficient Physics-Based Compact CNTFET Model for Circuit Design," *IEEE Trans. Electron Devices*, vol. 55, no. 6, pp. 1317-1327, June 2008.
- 125. C. Maneux, J. Goguet, S. Fregonese, T. Zimmer, H. Cazin d'Honincthun, S. Galdin-Retailleau, "Analysis of CNTFET physical compact model," in *Int. Conf. Design and Test of Integrated Systems in Nanoscale Technology*, DTIS, Tunis, Sept. 2006, pp. 40-45.
- 126. R. Marani and A. G. Perri, "Modelling of CNTFETs for Computer Aided Design of A/D Electronic Circuits," *Current Nanoscience*, vol. 10, no. 3, pp. 326-333, June 2014.
- 127. B.-J. Liu, L. Cai, X. Yang, H. Huang, X. Zhao and Y. Wen, "An accurate numerical model for charge density in ballistic carbon nanotube field effect transistors (CNTFETs)," *Int. J. Physical Sciences*, vol. 7, no. 1, pp. 18-23, Jan. 2012.
- 128. S. Yamacli and M. Avci, "Accurate SPICE compatible CNT interconnect and CNTFET models for circuit design and simulation," *Mathematical and Computer Modelling*, vol. 58, no. 1–2, pp. 368–378, July 2013.
- 129. G. Cho, F. Lombardi and Y. B. Kim, "Modelling a CNTFET with Undeposited CNT Defects," in *Proc. IEEE 25th Int. Symp. Defect and Fault Tolerance in VLSI Systems (DFT)*, Kyoto, Oct. 2010, pp. 289-296.
- 130. H.T.
Mouftah and I.B. Jordan, "Integrated circuits for ternary logic," in *Proc. Int. Symp. Multiple Valued Logic*, Morgantown, West Virginia, United States, May 1974, pp. 285–302.
- 131. H.T. Koanantakool, "Implementation of ternary identity cells using CMOS integrated circuits," *Electronics Lett.*, vol. 14, no. 15, pp. 462-464, July 1978.
- 132. H.T. Mouftah and I.B. Jordan, "Design of ternary COS/MOS memory and sequential circuits," *IEEE Trans. Computers*, vol. C-26, no. 3, pp. 281-288, March 1977.
- 133. H.T. Mouftah, "A study on the implementation of three-valued logic," in *Proc. 6th int. Symp. Multiple-valued logic (ISMVL)*, Bloomington, IL, May 1976, pp. 123-126.
- 134. J.M. Carmona, J.L. Huertas, and J.I. Acha, "Realization of three-valued C.M.O.S. cycling gates," *Electronics Lett.*, vol. 14, no. 9, pp. 288-290, 1978.
- 135. J. L. Huertas and J. M. Carmona, "Low-power ternary CMOS circuits," in *Proc. IEEE int. Symp. Multiple-valued logic (ISMVL)*, 1979, pp. 170-174.
- 136. H.T. Mouftah and K.C. Smith, "Design and Implementation of Three-Valued Logic Systems with M.O.S. Integrated Circuits," in *Proc. IEE Electronic Circuits & Systems*, vol. 127, no. 4, Aug. 1980, pp. 165-168.
- 137. L. Meng and G. Wei-Nan, "The New Method of Implementation for Ternary Logic System," in *Proc. IEEE 13th Int. Symp. Multiple-Valued Logic (ISMVL)*, May 1983, pp. 56-60.
- 138. H. M. Aytac, "Ternary Logic Based on a Novel MOS Building Block Circuit," *Int. J. Electronics*, vol. 63, no. 2, 1987.
- 139. H. T. Mouftah and K.C. Smith, "Injected voltage low-power CMOS for 3-valued logic," in *Proc. IEEE Electronic Circuits and Systems*, G, vol. 129, no. 6, Dec. 1982, pp. 270–272.
- 140. A. Srivastava, K. Venkatapathy, "Design and Implementation of a Low Power Ternary Full Adder," *VLSI Design*, vol. 4, no. 1, pp. 75–81, 1996.
- 141. A.
Srivastava, "Back gate bias method of threshold voltage control for the design of low voltage CMOS ternary logic circuits," *Microelectronics Reliability*, vol. 40, no. 12, pp. 2107–2110, 2000.
- 142. J. S. Wang, C.-Y. Wu, M.-K. Tsai, "Low power dynamic ternary Logic," in *Proc. IEEE Electronic Circuits and Systems*, G, vol. 135, no. 6, Dec. 1988, pp. 221-230.
- 143. M. Yoeli and G. Rosenfeld, "Logical Design of Ternary Switching Circuits," *IEEE Trans. Electronic Computers*, vol. 14, no. 1, pp. 19-29, Mar. 1965.
- 144. C. Y. Wu and H. Y. Huang, "A new two-phase pipelined dynamic CMOS ternary logic," in *IEEE Proc. ISCAS*, May 1990, vol. 1, pp. 582-586.
- 145. W. Chung-Yu and H. Hong-Yi, "Design and application of pipelined dynamic CMOS ternary logic and simple ternary differential logic," *IEEE J. Solid-State Circuits*, vol. 28, no. 8, pp. 895-906, Aug. 1993.
- 146. A. Herrfeld and S. Hentschke, "CMOS ternary dynamic differential logic," *Electronics Letters*, vol. 30, no. 10, pp. 762-763, May 1994.
- 147. F. Toto and R. Saletti, "CMOS dynamic ternary circuit with full logic swing and zero-static power consumption," *Electron. Letters*, vol. 34, no. 11, pp. 1083–1084, 1998.
- 148. S.G. Younis and T. F. Knight, "Practical implementation of charge recovery asymptotically zero power CMOS," in *Proc. Symp. Research on integrated systems*, 1993, pp. 236-250.
- 149. D. Mateo and A. Rubio, "Quasi-adiabatic ternary CMOS logic," *Electronics Letters*, vol. 32, no. 2, pp. 99–101, 1996.
- 150. D. Mateo and A. Rubio, "Design and Implementation of a 5×5 trits Multiplier in a Quasi-Adiabatic Ternary CMOS Logic," *IEEE J. Solid State Circuits*, vol. 33, no. 7, Jul. 1998.
- 151. S. G. Younis and T. F. Knight, "Asymptotically zero energy split-level charge recovery logic," in *Int. Workshop Low Power Design*, 1994, pp. 177–182.
- 152. S. G. Younis, "Asymptotically zero energy computing with split-level charge recovery logic," Ph.D. dissertation, Mass. Inst.
Technol., Cambridge, June 1994.
- 153. Y. Ye and K. Roy, "Reversible and quasistatic adiabatic logic," in *European conf. Circuit Theory and Design*, 1997, pp. 912–917.
- 154. H. N. Shivashankar and A. P. Shivaprasad, "Ternary multiplexer," *Int. J. Electronics*, vol. 53, no. 4, pp. 363-370, 1982.
- 155. H. N. Shivashankar and A. P. Shivaprasad, "Ternary function and circuit design using ternary Multiplexers," *Int. J. Electronics*, vol. 56, no. 1, pp. 135-150, 1983.
- 156. M. Kameyama and T. Higuchi, "Synthesis of optimal T-gate Networks in Multiple-valued Logic," *IEEE Trans. Computers*, vol. 26, no. 12, pp. 1297-1302, Dec. 1977.
- 157. E. Sipos, G. Oltean and C. Miron, "A Method to Design Ternary Multiplexers Controlled by Ternary Signals Based on SUS-LOC," in *Proc. IEEE Int. con. Automation, Quality and Testing, Robotics*, AQTR, Cluj-Napoca, vol. 3, May 2008, pp. 402-407.
- 158. A. S. Kumar and A. S. Priya, "Modeling of Combinational Circuits Based on Ternary Multiplexer Using VHDL," *Int. J. Computer Science and Engineering (IJCSE)*, vol. 2, no. 5, pp. 1777-1791, 2010.
- 159. H. Gundersen and Y. Berg, "A Novel Balanced Ternary Adder Using CMOS Recharged Semi-Floating Gate Devices," in *Proc. 36th IEEE Int. Symp. Multiple-Valued Logic (ISMVL)*, Singapore, May 2006, pp. 18.
- 160. H. Gundersen and Y. Berg, "Fast Addition Using Balanced Ternary Counters Designed With CMOS Semi-Floating Gate Devices," in *Proc. 37th IEEE Int. Symp. Multiple-Valued Logic (ISMVL)*, Oslo, May 2007, pp. 30.
- 161. X. Zeng and P. Wang, "Design of Low-Power Complementary Pass Transistor and Ternary Adder Based on Multi-valued Switch-signal Theory," in *Proc. IEEE Int. con. ASIC*, Changsha, Hunan, Oct. 2009, pp. 851-854.
- 162. X. Zeng, P. Wang, "Design of Low Power Ternary Magnitude Comparator Based on Multi-Valued Switch-Signal Theory," *Asia-Pacific con. Information Processing*, 2009, pp. 258-261.
- 163. S.
Muta, "Micropower CMOS implementation of three valued logic function," in 13th Int. Symp. Multiple-Valued Logic, ISMVL, May 1983, pp. 61-63.
- 164. E. McCluskey, "Logic design of MOS ternary logic," in *Int. Symp. Multiple-Valued Logic*, ISMVL, May 1980, pp. 1-5.
- 165. T. Sasao, "Compact SOP representation for multiple output functions-An encoding method using multiple valued logic," in *Proc. IEEE 31st Int. Symp. Multiple-Valued Logic, ISMVL*, 2001, 6 pages.
- 166. M. Aline, T. Saidi, E. Kinvi-Boh, O. Sentieys and E. D. Olson, "Design and characterization of a low-power ternary DSP," in *Proc. Int. Signal Processing Con. (ISPC)*, USA, 2003.
- 167. I. S. E Chen and T.N. Rajashekhara, "A Fast Multiplier Design Using Signed-Digit Numbers and 3-Valued Logic," in *Proc. 33rd Midwest Symp. Circuits and Systems*, Calgary, Alta, Aug. 1990, vol. 2, pp. 881-884.
- 168. T.N. Rajashekhara and A.S. Nale, "Conversion Representation from Signed-Digit to Radix Complement," *Int. J. Electronics*, vol. 69, no. 6, 1990.
- 169. N. Takagi, H. Yasuura and S. Yajima, "High Speed VLSI Multiplication Algorithm with A Redundant Binary Addition Tree," *IEEE Trans. Computers*, vol. C-34, no. 9, pp. 789-796, Sep. 1985.
- 170. T.N. Rajashekhara and O. Kal, "Fast Multiplier Design Using Redundant Signed-Digit Numbers," *Int. J. Electronics*, vol. 69, no. 3, 1990.
- 171. P. Wang, K. Li and F. Mei, "Design of Ternary Adiabatic Multiplier on Switch-Level," *J. Electronics (China)*, vol. 28, no. 3, pp. 375-382, May 2011.
- 172. K. Navi, M. Rashtian, A. Khatir, P. Keshavarzian and O. Hashemipour, "High speed capacitor–inverter based carbon nanotube full adder," *Nanoscale Res. Lett.*, vol. 5, no. 5, pp. 859–862, Mar. 2010.
- 173. A. Khatir, S. Abdolahzadegan and I. Mahmoudi, "High Speed Multiple Valued Logic Full Adder Using Carbon Nano Tube Field Effect Transistor," *Int. J. VLSI design & Communication Systems (VLSICS)*, vol. 2, no. 1, Mar. 2011.
- 174. A. Ghorbani and M.
Sarkhosh, "A New Low Power Full Adder Cell Based On Carbon Nanotube Field Effect Transistors," *J. Basic. Appl. Sci. Res.*, vol. 3, no. 3, pp. 1267-1272, 2013.
- 175. S. Mehrabi, R. F. Mirzaee, K. Navi, and O. Hashemipour, "A High-Efficient Multi-Output Mixed Dynamic/Static Single-Bit Adder Cell," *ISRN Electronics*, vol. 2013, Article ID 376869, 8 pages, 2013. DOI:10.1155/2013/376869
- 176. S. Mehrabi, R. F. Mirzaee, M. H. Moaiyeri, K. Navi and O. Hashemipour, "CNFET-Based Design of Energy-Efficient Symmetric Three-Input XOR and Full Adder Circuits," *Arabian J. Science and Engineering*, vol. 38, no. 12, pp. 3367-3382, Dec. 2013.
- 177. M. H. Moaiyeri, R. F. Mirzaee, K. Navi and A. Momeni, "Design and analysis of a high-performance CNFET-based Full Adder," *Int. J. Electronics*, vol. 99, no. 1, pp. 113-130, 2012.
- 178. K. Navi, A. Momeni, F. Sharif and P. Keshavarzian, "Two novel ultra-high speed carbon nanotube full-adder cells," *IEICE Electronics Express*, vol. 6, no. 19, pp. 1395-1401, Oct. 2009.
- 179. R. S. Rad, M. Norouzi and S. K. H. Rabori, "A Great Efficiency Full Adder Cell Based on Carbon Nano-Tube Technology," *Research J. Applied Sciences, Engineering and Technology*, vol. 5, no. 14, pp. 3791-3795, 2013.
- 180. K. Navi, R. Sharifi Rad, M. Hossein Moaiyeri and A. Momeni, "A low-voltage and energy-efficient full adder cell based on carbon nanotube technology," *Nano-Micro Letters*, vol. 2, no. 2, pp. 114-120, 2010.
- 181. M. Sarkhosh, A. Ghorbani, S. Naderi, T. Panahi and P. Keshavarzian, "Best Performance Parallel Prefix Adder Cells by Carbon Nano Tube Field Effect Transistors," *J. Basic. Appl. Sci. Res.*, vol. 2, no. 12, pp. 12294-12301, 2012.
- 182. M. H. Ghadiry, A. A. Manaf, M. T. Ahmadi, H. Sadeghi and M. N. Senejani, "Design and Analysis of a New Carbon Nanotube Full Adder Cell," *J. Nano Materials*, vol. 2011, Article ID 906237, 6 pages, 2011.
- 183. A. Taeb, K. Navi, M. R. Taheri and A.
Zakerolhoseini, "Design of an energy-efficient CNFET Full Adder Cell," *IJCSI Int. J. Computer Science Issues*, vol. 9, no. 3, May 2012.
- 184. M. Bagherizadeh, M. Eshghi, "Two novel low-power and high-speed dynamic carbon nanotube full-adder cells," *Nanoscale Research Letters*, vol. 6, no. 1, pp. 519-526, 2011.
- 185. M. Moradi and K. Navi, "Performance Analysis of 3 Improved Modified 1-Bit Full Adder Cells Based on CNTFET Technology," *European J. Scientific Research*, vol. 62, no. 4, pp. 588-599, Oct. 2011.
- 186. R. F. Mirzaee, M. H. Moaiyeri and K. Navi, "High Speed NP-CMOS and Multi-Output Dynamic Full Adder Cells," *World Academy of Science, Engineering and Technology*, vol. 4, no. 3, 2010.
- 187. S. Wairya, R. K. Nagaria and S. Tiwari, "New Design Methodologies For High-Speed Mixed-Mode CMOS Full Adder Circuits," *Int. J. VLSI design & Communication Systems (VLSICS)*, vol. 2, no. 2, June 2011.
- 188. M.H. Moaiyeri, R. Chavoshisani, A. Jalali, K. Navi, O. Hashemipour, "High-performance mixed-mode universal min-max circuits for nanotechnology," *Circuits Syst. Signal Process*, vol. 31, no. 2, pp. 465–488, Apr. 2012.
- 189. R. Zarhoun, M. H. Moaiyeri, S. S. Farahani, and K. Navi, "An Efficient 5-Input Exclusive-OR Circuit Based on Carbon Nanotube FETs," *ETRI Journal*, vol. 36, no. 1, pp. 89-98, Feb. 2014.
- 190. S. S. Farahani, R. Zarhoun, M. H. Moaiyeri and K. Navi, "An Efficient CNTFET-Based 7-Input Minority Gate," *Int. J. VLSI design & Communication Systems (VLSICS)*, vol. 4, no. 1, Feb. 2013.
- 191. S. Das, S. Bhattacharya and D. Das, "Design of Digital Logic Circuits using Carbon Nanotube Field Effect Transistors," *Int. J. Soft Computing and Engineering (IJSCE)*, vol. 1, no. 6, Dec. 2011.
- 192. I. O'Connor, J. Liu and F. Gaffiot, "CNTFET-Based Logic Circuit Design," in Int. con. Design and Test of Integrated Systems in Nanoscale Technology, DTIS, Tunis, Sep. 2006, pp. 46–51.
- 193. A. Raychowdhury and K.
Roy, "A Novel Multi Valued Logic Design Using Ballistic CNTFETs," in *Proc. 34th IEEE Int. Symp. Multiple-Valued Logic (ISMVL)*, Toronto, Canada, May 2004. DOI: 10.1109/ISMVL.2004.1319913.
- 194. A. Raychowdhury, S. Mukhopadhyay, and K. Roy, "Circuit Compatible Modeling Of Carbon Nanotube FET's in The Ballistic Limit Of Performance," in *Proc. 3rd IEEE Conf. Nanotechnology*, Aug. 2003, vol. 12–14, pp. 343–346.
- 195. S. Lin, Y. Kim, F. Lombardi, "A Novel CNFET Based Ternary Logic Gate Design," in Proc. 52<sup>nd</sup> IEEE Int. Midwest Symp. Circuits and Systems, MWSCAS, Cancun, Aug. 2009, pp. 435-438.
- 196. N. Haiqing, "Novel ternary logic design based on CNFET," in *SoC Design Con. (ISOCC)*, Seoul, Nov. 2010, pp. 115-118.
- 197. M.A. Khayer, R.K. Lake, "Drive currents and leakage currents in InSb and InAs nanowire and nanotube band-to-band tunneling FETs," *IEEE J. Electron Device Letters*, vol. 30, no. 12, pp. 1257-1259, 2009.
- 198. J. Liang, L. Chen, J. Han and F. Lombardi, "Design and Reliability Analysis of Multiple Valued Logic Gates using Carbon Nanotube FETs," in *Proc. IEEE/ACM Int. Symp. Nanoscale Architectures (NANOARCH)*, Jul. 2012, pp. 131-138.
- 199. M.H. Moaiyeri, A. Doostaregan, K. Navi, "Design of energy-efficient and robust ternary circuits for nanotechnology," *IET Circuits Devices System*, vol. 5, no. 4, pp. 285–296, 2011.
- 200. M. H. Moaiyeri, R.F. Mirzaee, A. Doostaregan, K. Navi, O. Hashemipour, "A universal method for designing low-power carbon nanotube FET-based multiple-valued logic circuits," *Computers & Digital Techniques IET*, vol. 7, no. 4, Jul. 2013.
- 201. C. Vudadha, V. Sreehari, M.B. Srinivas, "Multiplexer Based Design for Ternary Logic Circuits," in *Proc. IEEE 8<sup>th</sup> conf. Ph.D. Research in Microelectronics and Electronics (PRIME)*, Aachen, Germany, June 2012, pp. 1-4.
- 202. C. Vudadha, V. Sreehari, M.B. Srinivas, "2:1 Multiplexer Based Design for Ternary Logic Circuits," in *Proc. IEEE Asia Pacific Conf.
Postgraduate Research*, Dec. 2013, pp. 46-51.
- 203. P. V. Saidutt, V. Srinivas, P.S. Phaneendra, N. M. Muthukrishnan, "Design of Encoder for Ternary Logic Circuits," in *Proc. IEEE Asia Pacific Conf. Postgraduate Research*, Dec. 2012, pp. 85-88.
- 204. V. Sridevi, T. Jayanthy, "Minimization of CNTFET Ternary Combinational Circuits Using Negation of Literals Technique," *Arabian J. Science and Engineering*, vol. 39, no. 6, pp. 4875-4890, May 2014.
- 205. C. Vudadha, P.S. Phaneendra, G. Makkena, V. Sreehari, N.M. Muthukrishnan and M.B. Srinivas, "Design of CNFET based Ternary Comparator using Grouping Logic," in *Proc. IEEE Faible Tension Faible Consommation (FTFC)*, Paris, June 2012, pp. 1-4.
- 206. C. Vudadha, P.S. Phaneendra, V. Shreehari, M.B. Shrinivas, "CNTFET based Ternary Magnitude Comparator," in *Proc. Int. Symp. Communications and Information Technologies (ISCIT)*, Gold Coast, QLD, Oct. 2012, pp. 942-946.
- 207. K. Nepal, "Dynamic circuits for ternary computation in carbon nanotube based field effect transistors," in *Proc.* 8<sup>th</sup> *IEEE Int. NEWCAS Conf.*, Montreal, QC, June 2010, pp. 53–56.
- 208. R. Mariani, F. Pessolano and R. Saletti, "A new CMOS ternary logic design for low-power low-voltage circuits," in *Proc. PATMOS*, 7th Int. Workshop Program, 1997, pp. 8-10.
- 209. M.H. Moaiyeri, R.F. Mirzaee, K. Navi and O. Hashemipour, "Efficient CNTFET-based ternary Full Adder cells for nanoelectronics," *Nano-Micro Letters*, vol. 3, no. 1, pp. 43–50, 2011.
- 210. S.A. Ebrahimi, P. Keshavarzian, S. Sorouri and M. Shahsavari, "Low power CNTFET-based ternary full adder cell for nanoelectronics," *Int. J. of Soft Computing and Engineering*, vol. 2, no. 2, pp. 291-295, 2012.
- 211. R.F. Mirzaee, M.H. Moaiyeri, M. Maleknejad, K. Navi, and O. Hashemipour, "Dramatically Low-Transistor-Count High-Speed Ternary Adders," in *Proc. IEEE 43rd Int. Symp. Multiple-Valued Logic (ISMVL)*, Toyama, May 2013, pp. 170-175.
- 212. P. Keshavarzian and R.
Sarikhani, "A Novel CNTFET-based Ternary Full Adder," *Circuits Systems and Signal Processing*, vol. 33, no. 3, pp. 665–679, Mar. 2014.
- 213. K. Sridharan, S. Gurindagunta and V. Pudi, "Efficient Multi-ternary Digit Adder Design in CNTFET Technology," *IEEE Trans. Nanotechnology*, vol. 12, no. 3, pp. 283-287, May 2013.
- 214. G. Cho, Y.-B. Kim, and F. Lombardi, "Assessment of CNTFET-based circuit performance and robustness to PIV variations," in *Proc. IEEE Int. Midwest Symp. Circuits Syst.*, Aug. 2009, pp. 1106–1109.
- 215. T. Panahi, S. Naderi, T. Heidari, E. Z. nejad and P. Keshavarzian, "New Ternary Logic Subtractor Using Carbon Nanotube Field-Effect Transistors," *Int. J. Soft Computing and Engineering (IJSCE)*, vol. 2, no. 6, Jan. 2013.
- 216. J.T. Koo, "Integrated-circuit content-addressable memories," *IEEE J. Solid-State Circuits*, vol. 5, no. 5, pp. 208-215, 1970.
- 217. H. Kadota, J. Miyake, Y. Nishimichi, H. Kudoh, K. Kagawa, "An 8-Kbit content-addressable and re-entrant memory," *IEEE J. Solid-State Circuits*, vol. 20, no. 5, pp. 951-957, Oct. 1985.
- 218. G.A. Uvieghara, Y. Nakagome, D.-K. Jeong and D. Hodges, "An on-chip smart memory for a data-flow CPU," *IEEE J. Solid-State Circuits*, vol. 25, no. 1, pp. 84-94, Feb. 1990.
- 219. H. Bergh, J. Eneland, L.-E. Lundstrom, "A fault-tolerant associative memory with high-speed operation," *IEEE J. Solid-State Circuits*, vol. 25, no. 4, pp. 912-919, Aug. 1990.
- 220. H. Miyatake, M. Tanaka, and Y. Mori, "A design for high-speed low power CMOS fully parallel content-addressable memory macros," *IEEE J. Solid-State Circuits*, vol. 36, no. 6, pp. 956–968, Jun. 2001.
- 221. W.R. Daasch, "Inexact match associative memory cell," *Electronics Letters*, vol. 27, no. 18, pp. 1623-1625, Aug. 1991.
- 222. S. C. Liu, F. A. Wu, and J. B. Kuo, "A novel low-voltage content-addressable memory (CAM) cell with a fast tag-compare capability using partially depleted (PD) SOI CMOS dynamic-threshold (DTMOS) techniques," *IEEE J.
Solid-State Circuits*, vol. 36, no. 4, pp. 712–716, Apr. 2001.
- 223. G. Thirugnanam, N. Vijaykrishnan, and M. J. Irwin, "A novel low power CAM design," in *Proc. 14th Annual IEEE Int. ASIC/SOC Conf.*, Arlington, VA, 2001, pp. 198–202.
- 224. T. Jamil, "RAM versus CAM," *IEEE Potentials*, April/May 1997, pp. 26-29.
- 225. C. A. Zukowski and W. Shao-Yi, "Use of Selective Precharge for Low-Power Content Addressable Memories," in *Proc. IEEE Int. Symp. Circuits and Systems, ISCAS*, June 1997, vol. 3, pp. 1788-1791.
- 226. J.L. Mundy, J.F. Burgess, R. E. Joynson and C. Neugebauer, "Low-cost associative memory," *IEEE J. Solid-State Circuits*, vol. 7, no. 5, pp. 364-369, Oct. 1972.
- 227. T. Yamagata, M. Mihara, T. Hamamoto, Y. Murai, T. Kobayashi, M. Yamada and H. Ozaki, "A 288-kb fully parallel content addressable memory using a stacked-capacitor cell structure," *IEEE J. Solid-State Circuits*, vol. 27, no. 12, pp. 1927-1933, Dec. 1992.
- 228. J. P. Wade, C.G. Sodini, "Dynamic cross-coupled bitline content addressable memory cell for high density arrays," in *Proc. Int. Electron Devices Meeting, IEDM*, 1985, vol. 31, pp. 284-287.
- 229. S. Jones, "Design, selection and implementation of a content-addressable memory for VLSI CMOS chip architecture," in *Proc. IEEE Computers and Digital Techniques*, May 1988, vol. 135, no. 3, pp. 165-172.
- 230. K. Pagiamtzis and A. Sheikholeslami, "Content-Addressable Memory (CAM) Circuits and Architectures: A Tutorial and Survey," *IEEE J. Solid-State Circuits*, vol. 41, no. 3, pp. 712–727, Mar. 2006.
- 231. A. Roth, D. Foss, R. McKenzie, and D. Perry, "Advanced ternary CAM circuits on 0.13 μm logic process technology," in *Proc. IEEE Custom Integrated Circuits Conf.* (CICC), Oct. 2004, pp. 465–468.
- 232. I. Arsovski, T. Chandler, and A. Sheikholeslami, "A ternary content-addressable memory (TCAM) based on 4T static storage and including a current-race sensing scheme," *IEEE J. Solid-State Circuits*, vol. 38, no. 1, pp. 155-158, Jan.
2003.
- 233. S. Choi, K. Sohn, M.-W. Lee, S. Kim, H.-M. Choi, D. Kim, U.-R. Cho, H.-G. Byun, Y.-S. Shin, and H.-J. Yoo, "A 0.7 fJ/bit/search, 2.2 ns search time hybrid type TCAM architecture," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, 2004, pp. 498–499.
- 234. S. Choi, K. Sohn, and H.-J. Yoo, "A 0.7 fJ/bit/search, 2.2-ns search time hybrid-type TCAM architecture," *IEEE J. Solid-State Circuits*, vol. 40, no. 1, pp. 254–260, Jan. 2005.
- 235. M. Sultan, M. Siddiqui, Sonika and G. S. Visweswaran, "A Low-Power Ternary Content Addressable Memory (TCAM) With Segmented And Non-Segmented Matchlines," in *IEEE Region 10 Conf.*, TENCON, Hyderabad, Nov. 2008, pp. 1-5.
- 236. N. Mohan and M. Sachdev, "Low-Leakage Storage Cells for Ternary Content Addressable Memories," *IEEE Trans. VLSI Systems*, vol. 17, no. 5, pp. 604-612, May 2009.
- 237. S. Kumar, A. Noor, B. K. Kaushik, and B. Kumar, "Design of Ternary Content Addressable Memory (TCAM) with 180 nm," in *Proc. Int. Conf. Devices and Communications (ICDeCom)*, Mesra, Feb. 2011, pp. 1-5.
- 238. J.G. Delgado-Frias, A. Yu and J. Nyathi, "A dynamic content addressable memory using a 4-transistor cell," in *3<sup>rd</sup> IEEE Int. Workshop Design Mixed-Mode Integrated Circuits and Applications*, Puerto Vallarta, 1999, pp. 110-113.
- 239. V. Lines, A. Ahmed, P. Ma, and S. Ma, "66 MHz 2.3M ternary dynamic content addressable memory," in *Record IEEE Int. Workshop Memory Technology, Design and Testing*, San Jose, CA, Aug. 2000, pp. 101–105.
- 240. H. Noda, K. Inoue, H. J. Mattausch, T. Koide, and K. Arimoto, "A cost-efficient dynamic ternary CAM in 130 nm CMOS technology with planar complementary capacitors and TSR architecture," in *Symp. VLSI Circuits Dig. Tech. Papers*, Kyoto, Japan, Jun. 2003, pp. 83–84.
- 241. H. Noda, K. Inoue, M. Kuroiwa, F. Igaue, K. Yamamoto, H. J. Mattausch, T. Koide, A. Amo, A. Hachisuka, S. Soeda, I. Hayashi, F. Morishita, K. Dosaka, K. Arimoto, K. Fujishima, K.
Anami, and T. Yoshihara, "A cost-efficient high-performance dynamic TCAM with pipelined hierarchical search and shift redundancy architecture," *IEEE J. Solid-State Circuits*, vol. 40, no. 1, pp. 245–253, Jan. 2005.
- 242. J. G. Delgado-Frias, J. Nyathi, and S. B. Tatapudi, "Decoupled Dynamic Ternary Content Addressable Memories," *IEEE Trans. Circuits and Systems I: Regular Papers*, vol. 52, no. 10, pp. 2139-2147, Oct. 2005.
- 243. S. Lin, Y. B. Kim, F. Lombardi, and Y. J. Lee, "A new SRAM cell design using CNTFETs," in *Proc. IEEE Int. SoC Design Conf. (ISOCC)*, Busan, Nov. 2008, vol. 1, pp. 168–171.
- 244. S. Lin, Y. B. Kim, and F. Lombardi, "Design of a CNTFET-Based SRAM Cell by Dual-Chirality Selection," *IEEE Trans. Nanotechnology*, vol. 9, no. 1, pp. 30-37, Jan. 2010.
- 245. Berkeley Predictive Technology Model website [Online]. Available: http://www.eas.asu.edu/~ptm/.
- 246. Young Bok Kim, Yong-Bin Kim, F. Lombardi and Y. J. Lee, "A Low Power 8T SRAM Cell Design technique for CNFET," in *Proc. Int. SoC Design Conf.*, 2008, pp. 176-179.
- 247. K. You and K. Nepal, "Design of a ternary static memory cell using carbon nanotube-based transistors," *Micro & Nano Letters*, vol. 6, no. 6, pp. 381-385, 2011.
- 248. S. Lin, Y.-B. Kim and F. Lombardi, "Design of a Ternary Memory Cell Using CNTFETs," *IEEE Trans. Nanotechnology*, vol. 11, no. 5, Sep. 2012.
- 249. D. Das, A. S. Roy, and H. Rahaman, "Design of Content Addressable Memory Architecture using Carbon Nanotube Field Effect Transistors," *Progress in VLSI Design and Test*, vol. 7373, pp. 233-242, 2012.
- 250. K. Nepal and K. You, "Carbon nanotube field effect transistor-based content addressable memory architectures," *Micro & Nano Letters*, vol. 7, no. 1, pp. 20–23, 2012.
- 251. R.P. Hallworth and F. G. Heath, "Semiconductor circuits for ternary logic," in *IEE Proc. Part C: Monographs*, Mar. 1962, vol. 109, no. 15, pp. 219-225.
- 252. S. L.
Murotiya and A. Gupta, "Design of CNTFET based 2-bit ternary ALU for nanoelectronics," *Int. J. Electronics*, vol. 101, no. 9, pp. 1244-1257, Sep. 2014.
- 253. S. Sinha, A. Balijepalli and Y. Cao, "Compact Modeling of Carbon Nanotube Transistor and Interconnects," *IEEE Trans. Electron Devices*, vol. 56, pp. 2232-2242, 2009.
- 254. S. L. Murotiya, A. Gupta and S. Vasishth, "Novel design of ternary magnitude comparator using CNTFETs," in *IEEE Annual India Conference (INDICON)*, Pune, Dec. 2014, pp. 1-4.
- 255. S. L. Murotiya and A. Gupta, "Design of High Speed Ternary Full Adder and Three-Input XOR Circuits Using CNTFETs," in *28<sup>th</sup> Int. Conf. VLSI Design (VLSID)*, Bangalore, Jan. 2015, pp. 292-297.
- 256. S. L. Murotiya, A. Gupta and S. Vasishth, "CNTFET-based design of dynamic ternary full adder cell," in *IEEE Annual India Conference (INDICON)*, Pune, Dec. 2014, pp. 1-5.
- 257. S. L. Murotiya and A. Gupta, "A Novel design of Ternary Full Adder using CNTFETs," *Arabian J. Science and Engineering*, vol. 39, no. 11, pp. 7839-7846, Nov. 2014.
- 258. E. Seevinck, F. J. List, and J. Lohstroh, "Static-noise margin analysis of MOS SRAM cells," *IEEE J. Solid-State Circuits*, vol. SSC-22, no. 5, pp. 748–754, Oct. 1987.
- 259. B. C. Paul, S. Fujita, M. Okajima, T. H. Lee, H.-S. P. Wong and Y. Nishi, "Impact of a process variation on nanowire and nanotube device performance," *IEEE Trans. Electron Devices*, vol. 54, no. 9, pp. 2369–2376, Sep. 2007.
- 260. S. L. Murotiya and A. Gupta, "Hardware-efficient low-power 2-bit ternary ALU design in CNTFET technology," *Int. J. Electronics*, Taylor & Francis, Sep. 2015.
DOI:10.1080/00207217.2015.1082199

## **APPENDIX I**

**Figure 1:** Transient waveform for half adder-subtractor (HAS) of 2-bit HO-TALU

**Figure 8:** Transient waveform for half adder-subtractor-exclusive-OR (HASE) of 2-bit PO-TALU

## **APPENDIX II**

**Table 1:** Simulation results of HO-TALU ternary circuits at architecture level

| Circuits | Delay (×10<sup>-10</sup> s) | Power (×10<sup>-6</sup> W) | PDP (×10<sup>-16</sup> J) |
|---|---|---|---|
| CNTFET-based THA of [199] | 0.72 | 1.04 | 0.75 |
| CNTFET-based HAS for addition operation (**proposed**) | 0.73 | 1.05 | 0.77 |
| CNTFET-based TFA of [199] | 0.83 | 1.49 | 1.24 |
| CNTFET-based FAS for addition operation (**proposed**) | 0.85 | 1.54 | 1.31 |
| 2-bit multiplier of [73] using CNTFETs | 1.99 | 23.7 | 47.16 |
| CNTFET-based 2-bit multiplier (**proposed**) | 1.47 | 7.88 | 11.58 |
| 2-bit comparator of [73] using CNTFETs | 0.83 | 1.03 | 0.85 |
| CNTFET-based 2-bit comparator (**proposed**) | 0.51 | 0.68 | 0.35 |
| 1-bit exclusive-OR of [73] using CNTFETs | 0.72 | 1.04 | 0.75 |
| CNTFET-based 1-bit exclusive-OR (**proposed**) | 0.71 | 0.63 | 0.45 |

**Table 2:** Simulation results of CNTFET-based ternary full adder (TFA) designs at architecture level

| Circuits | Delay (×10<sup>-10</sup> s) | Power (×10<sup>-6</sup> W) | PDP (×10<sup>-16</sup> J) |
|---|---|---|---|
| HS-TFA (proposed) | 0.75 | 6.92 | 5.19 |
| LP-TFA (proposed) | 1.01 | 1.46 | 1.47 |
| DTFA (proposed) | 0.99 | 1.64 | 1.62 |
| TFA of [199] | 0.84 | 1.95 | 1.64 |
| TFA of [212] | 1.56 | 4.41 | 6.88 |
| TFA of [213] | 0.88 | 53.7 | 47.26 |

**Table 3:** Simulation results of 2-bit comparator circuits for process variations

| Circuits | Delay (×10<sup>-10</sup> s) | Power (×10<sup>-6</sup> W) | PDP (×10<sup>-16</sup> J) | CNT Diameter Variation |
|---|---|---|---|---|
| Proposed comparator | 0.29 | 0.49 | 0.14 | +10 % |
| | 0.45 | 0.39 | 0.18 | -10 % |
| Comparator of [206] | 0.31 | 0.6 | 0.19 | +10 % |
| | 0.56 | 0.44 | 0.25 | -10 % |

**Table 4:** Simulation results of CNTFET-based PO-TALU ternary circuits at architecture level

| Circuits | Delay (×10<sup>-10</sup> s) | Power (×10<sup>-6</sup> W) | PDP (×10<sup>-16</sup> J) |
|---|---|---|---|
| **Proposed** THA | 0.64 | 0.51 | 0.33 |
| THA of [199] | 0.72 | 1.51 | 1.09 |
| **Proposed** HASE | 0.69 | 0.53 | 0.37 |
| HAS of [252] | 0.73 | 1.50 | 1.10 |
| **Proposed** TFA | 0.85 | 0.88 | 0.75 |
| TFA of [256] | 0.99 | 1.64 | 1.62 |
| TFA of [257] | 1.01 | 1.46 | 1.47 |
| TFA of [199] | 0.84 | 1.95 | 1.64 |
| TFA of [255] | 0.75 | 6.92 | 5.19 |
| TFA of [212] | 1.56 | 4.41 | 6.88 |
| TFA of [213] | 0.88 | 53.7 | 47.26 |
| **Proposed** FASE | 0.89 | 0.90 | 0.80 |
| FAS of [252] | 0.85 | 1.99 | 1.69 |

**Figure 1:** Power consumption versus operating frequency plot for ternary half adder (THA) designs

**Figure 2:** Power consumption versus operating frequency plot for ternary multiplier designs

**Figure 3:** Power-delay product (PDP) versus supply voltage plot for ternary half adder (THA) designs

**Figure 4:** Power-delay product (PDP) versus supply voltage plot for ternary multiplier designs

## **LIST OF PUBLICATIONS**

#### **Publications in Peer-Reviewed Journals:**

- 1. **S. L. Murotiya** and A. Gupta, "Design of CNTFET based 2-bit ternary ALU for nanoelectronics," *Int. J. Electronics*, Taylor & Francis, vol. 101, no. 9, pp. 1244-1257, Aug. 2013.
- 2. **S. L. Murotiya** and A. Gupta, "A Novel design of Ternary Full Adder using CNTFETs," *Arabian J. Science and Engineering*, Springer, vol. 39, no.
11, pp. 7839-7846, Nov. 2014.
- 3. **S. L. Murotiya** and A. Gupta, "Design of content-addressable memory cell using CNTFETs," *Int. J. Electronics Letters*, Taylor & Francis, vol. 3, no. 3, pp. 131-138, May 2014.
- 4. **S. L. Murotiya** and A. Gupta, "Hardware-efficient low-power 2-bit ternary ALU design in CNTFET technology," *Int. J. Electronics*, Taylor & Francis, Sep. 2015. DOI:10.1080/00207217.2015.1082199

#### **Publications in Peer-Reviewed Conferences:**

- 1. **S. L. Murotiya**, A. Gupta and S. Vasishth, "Novel design of ternary magnitude comparator using CNTFETs," in *IEEE Annual India Conference (INDICON)*, Pune, Dec. 2014, pp. 1-4.
- 2. **S. L. Murotiya** and A. Gupta, "Design of High Speed Ternary Full Adder and Three-Input XOR Circuits Using CNTFETs," in *28<sup>th</sup> Int. Conf. VLSI Design (VLSID)*, Bangalore, Jan. 2015, pp. 292-297.
- 3. **S. L. Murotiya** and A. Gupta, "CNTFET based design of content addressable memory cells," in *4<sup>th</sup> IEEE Int. Conf. Computer and Communication Technology (ICCCT)*, Allahabad, Sep. 2013, pp. 1-4.
- 4. **S. L. Murotiya**, A. Gupta and S. Vasishth, "CNTFET-based design of dynamic ternary full adder cell," in *IEEE Annual India Conference (INDICON)*, Pune, Dec. 2014, pp. 1-5.

## **BRIEF BIOGRAPHY OF THE CANDIDATE**

Snehlata Murotiya received the B.E. degree (Electronics & Communication Engineering) from Jai Narain Vyas University, Jodhpur (Rajasthan) in 2006. She obtained the M.E. degree with specialization in Microelectronics from Birla Institute of Technology & Science, Pilani in 2009. She joined Birla Institute of Technology & Science, Pilani in 2009 as a Lecturer in the Electrical & Electronics Engineering Department. Since 2010, she has been pursuing her doctoral research in the area of digital VLSI circuits and architectures for nanotechnology. She has over 5 research publications in reputed peer-reviewed international journals and 9 in national/international conference proceedings. During her M.E. program, she received a GATE scholarship. She also received a scholarship from NXP Semiconductors (formerly a part of Philips India Limited).

## **BRIEF BIOGRAPHY OF THE SUPERVISOR**

Dr. Anu Gupta received the M.Sc. (Physics-Electronics) degree from Delhi University in 1988, and the M.E. and Ph.D. degrees from Birla Institute of Technology and Science (BITS) Pilani in 1995 and 2003, respectively. In 1995, she joined BITS Pilani as Assistant Lecturer. She was designated Assistant Professor in 2003 and Associate Professor in 2010 at BITS Pilani. Her research interests include low-power, high-performance analog, digital, and mixed-signal design for FPGA/ASIC applications. She has over 77 research publications (of which 23 are in reputed peer-reviewed international and national journals and 54 are in conference proceedings). She is currently guiding three Ph.D. candidates. She is a member of IEEE and the VLSI Society of India, and a Life Fellow of IETE.