# Parallel Multipliers: Architectural Exploration and Designing

## **THESIS**

Submitted in partial fulfillment of the requirements for the degree of

**DOCTOR OF PHILOSOPHY**

By

**SUBHENDU KUMAR SAHOO**

Under the Supervision of

**DR. CHANDRA SHEKHAR**

BIRLA INSTITUTE OF TECHNOLOGY AND SCIENCE
PILANI (RAJASTHAN), INDIA
2006

# **Chapter 7**

# **Conclusion**

In this chapter, we summarize the results of the investigations reported in the previous chapters of this thesis. We also highlight the novel aspects of some of the contributions made in this work and point out areas for further investigation.

### 7.1 Summary of work done

The primary objective of this thesis has been to explore different architectures for parallel multipliers, evaluate them, and suggest changes to the architectures that improve their performance. The first chapter presents the steps in parallel multiplication.

The second chapter discusses two partial product generation methods. For the first method, which uses the radix-4 Booth algorithm, different circuits are explored and a new circuit is proposed, which is shown to perform better than the known circuits in terms of transistor count, delay and power. For the second method, which uses radix-64 encoding, a technique is applied that reduces the delay in partial product generation for parallel multipliers.

The third chapter explores partial product row accumulation methods: carry save adders (array multiplier), 3:2 compressors in a Wallace tree, 4:2 compressors in a Wallace tree, and RB adders. For each accumulation method, the worst-case delay for multipliers of operand size 8, 16, 32, 54 and 64 bits is obtained in terms of $T_{XOR}$. This gives an estimate of the accumulation delay for different multiplier sizes under the various accumulation methods. A novel 4:2 compressor circuit is reported, which outperforms the best reported 4:2 compressors in terms of energy-delay product.
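To make the role of a 4:2 compressor in row accumulation concrete, the following is a minimal behavioural sketch in Python. It assumes the conventional construction from two cascaded full adders, not the novel circuit reported in this thesis, and the function names (`full_adder`, `compressor_4_2`, `preserves_weight`) are illustrative only. The check confirms that the three outputs encode the same arithmetic value as the five inputs.

```python
from itertools import product


def full_adder(a, b, c):
    """Return (sum, carry) for three input bits."""
    s = a ^ b ^ c
    carry = (a & b) | (b & c) | (a & c)
    return s, carry


def compressor_4_2(x1, x2, x3, x4, cin):
    """Conventional 4:2 compressor modelled as two cascaded full adders.

    Assumption: this is the textbook structure, used here only as a
    behavioural reference. Returns (sum, carry, cout), where carry and
    cout both have twice the weight of sum.
    """
    s1, cout = full_adder(x1, x2, x3)   # cout does not depend on cin
    s, carry = full_adder(s1, x4, cin)
    return s, carry, cout


def preserves_weight():
    """Check x1+x2+x3+x4+cin == sum + 2*(carry+cout) for all 32 inputs."""
    for bits in product((0, 1), repeat=5):
        s, carry, cout = compressor_4_2(*bits)
        if sum(bits) != s + 2 * (carry + cout):
            return False
    return True
```

Because `cout` is produced by the first full adder alone, it never depends on `cin`, so compressors placed side by side in a row do not form a ripple-carry chain.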
The fastest adder architecture, the CLA, is explored in the fourth chapter. A novel architecture for addition as well as RB to NB conversion, based on RB arithmetic, is also presented. Theoretical as well as simulation results show that this novel architecture is faster than the CLA.

In the fifth chapter, 10 different architectures are defined from different combinations of schemes for partial product generation, accumulation and final addition. The critical paths of all these multipliers are calculated theoretically, and the theoretical worst-case delays of all ten architectures are obtained for operand sizes of 8, 16, 32, 54 and 64 bits. These calculations can be used to choose a suitable architecture before designing a multiplier for a specific delay requirement.

All these multipliers are also synthesized using the Magma EDA tool. In chapter six, the synthesis results are used to point out, for each operand size, the best architecture (among those defined in this thesis) in terms of a specific figure of merit (T, A, P, ATP, AT<sup>2</sup>). These results can be used directly in designing need-specific multipliers.

### 7.2 Future work

This section indicates areas that may be explored further to extend the scope of the present work.

#### 7.2.1 Exploration and designing of serial parallel multiplier architectures

Digital signal processing is used in a wide range of applications such as telephone, radio, video and sonar. The sample rate requirements vary from application to application and can range from 10 kHz to 100 MHz. Real-time implementation of these systems requires hardware architectures that can process input signal samples as they are received, as opposed to storing them in memory and processing them in batch mode. Bit-serial systems, which process one bit of the input sample in each clock cycle, are area efficient and ideal for low-speed applications.
In contrast, parallel systems process one whole word in a single clock cycle and are suited to high-speed applications. For moderate-speed applications, however, both of these schemes are inefficient. In such a situation, serial parallel systems are an efficient solution. Such systems use most of the basic units of parallel multipliers. Exploring serial parallel multipliers together with the novel ideas presented in this thesis can result in better multipliers for such applications. For a multiplier designer, this is an additional dimension to explore when choosing the optimum application-specific multiplier architecture.

#### 7.2.2 Development of a multiplier synthesis tool

In the present work, all multipliers are synthesized using specific process technology parameters: a supply voltage of 1.08 V and a 130 nm technology library. The supply voltage can be varied slightly to obtain a summary of the figures of merit at each setting. Similarly, different technology libraries can be used to obtain the performance of all the architectures, for different operand sizes, in terms of the figures of merit. This extensive exploration can be built into a tool that identifies the most suitable multiplier architecture for a specific technology library. The result would be an efficient multiplier synthesis tool that synthesizes a multiplier according to a designer's preferred figure of merit.
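The selection step of such a tool could be sketched as follows. This is only an illustration under stated assumptions: ATP is taken to mean the area-delay-power product and AT<sup>2</sup> the area times delay squared, the names `SynthesisResult`, `FIGURES_OF_MERIT` and `best_architecture` are hypothetical, and the numbers in `EXAMPLE_RESULTS` are placeholders rather than synthesis results from this thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SynthesisResult:
    """One synthesized architecture at a fixed operand size and library."""
    name: str
    delay_ns: float   # T
    area_um2: float   # A
    power_mw: float   # P


# Lower is better for every figure of merit listed here.
# Assumption: ATP = A * T * P and AT2 = A * T**2.
FIGURES_OF_MERIT = {
    "T": lambda r: r.delay_ns,
    "A": lambda r: r.area_um2,
    "P": lambda r: r.power_mw,
    "ATP": lambda r: r.area_um2 * r.delay_ns * r.power_mw,
    "AT2": lambda r: r.area_um2 * r.delay_ns ** 2,
}


def best_architecture(results, figure_of_merit):
    """Return the result that minimizes the chosen figure of merit."""
    score = FIGURES_OF_MERIT[figure_of_merit]
    return min(results, key=score)


# Placeholder values for illustration only; not measured data.
EXAMPLE_RESULTS = [
    SynthesisResult("arch_a", delay_ns=2.0, area_um2=1000.0, power_mw=5.0),
    SynthesisResult("arch_b", delay_ns=1.5, area_um2=1600.0, power_mw=8.0),
    SynthesisResult("arch_c", delay_ns=3.0, area_um2=700.0, power_mw=3.0),
]

if __name__ == "__main__":
    for fom in FIGURES_OF_MERIT:
        print(fom, best_architecture(EXAMPLE_RESULTS, fom).name)
```

A complete tool would hold one such table per technology library, supply voltage and operand size, then apply the same lookup to the table matching the designer's target.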