- Open Access
- Authors : Prity Mishra
- Paper ID : IJERTV13IS030228
- Volume & Issue : Volume 13, Issue 03 (March 2024)
- Published (First Online): 03-04-2024
- ISSN (Online) : 2278-0181
- Publisher Name : IJERT
- License: This work is licensed under a Creative Commons Attribution 4.0 International License
FPGA Realization of Single-Cycle, 32-Bit Booth Recoding Signed Multiplier Enhanced by High-Speed Compressors
Prity Mishra
Electronics and Telecommunication Engineering International Institute of Information Technology, Bhubaneshwar, Odisha, India
AbstractA multiplier is one of the integral and frequently used parts of every digital system. In this paper, a 32×32-bit signed multiplier has been implemented using the booth recoding algorithm and high-speed compressors. This in turn facilitates hardware reduction as partial product rows are significantly reduced, thus improving efficiency in terms of speed and power consumption. An analytical comparison of this multiplier with the conventional multiplier has also been provided. This booth recoding multiplier has been designed using the SYSTEM VERILOG HDL and FPGA realization has been achieved through Xilinx Kintex-7 FPGA in Xilinx Vivado 2021.2 software.
KeywordsBooth Recoding; 32×32 bit signed multiplier; Agile compressors; Parallel addition; Verilog HDL; FPGA.
-
INTRODUCTION
Low-power VLSI circuits have emerged as critical criteria for designing energy-efficient electronic systems tailored for high- performing portable devices. Across various domains such as microprocessors, image and video processing, digital filters, error correction and coding, neural networks, and machine learning, multipliers play an integral role in enhancing computational efficiency. This paper presents the implementation of a 32×32-bit signed multiplier using the Booth recoding algorithm and high-speed agile compressors to improve processing speed. The inputs can be either signed or unsigned.
Conventional multiplication relies on the partial products method, where each bit of the multiplier is multiplied with the multiplicand to form partial product rows. Subsequently, these rows are left-shifted and summed to produce the final result.
A similarity is observed in case of multiplication of two binary numbers. This process is simplified as it involves only two digits1 and 0whereby multiplication with 1 results in a direct copy and same with 0 is discarded. However, for numbers with a high number of bits, processing time increases, leading to elevated time complexity and hardware demands.
In this paper, the focus will be on overcoming those major drawbacks with the proposed multiplier design. The paper has been divided into subsequent sections as follows. Section II provides a comprehensive overview of the traditional booth recoding algorithm. Section III goes into detail explaining the operation of the high-speed, agile compressors. The comparative analysis and the experimental results have been included in Section IV, followed by the concluding remarks in Section V.
-
CONVENTIONAL BOOTH RECODING ALGORITHM
-
Abstract block diagram of top module of the multiplier and the signal description
For booth recoding algorithm-based multiplier, two 32-bit inputs are required to represent the multiplicand and multiplier. In addition to that, two control signals are also fed to indicate whether the multiplier and multiplicand are signed or unsigned binary values. The output is a 64-bit value.
32
64
32
mplier
mplier_s_u
prod
mplicand mplicand_s_u
Table 1:Signal Description of Top Module Figure 1:32-bit Booth Recoding Multiplier
Signal name
Width
Source
Description
mplier
32
input
Top module multiplier input
mplier_s_u
1
input
1multiplier is signed 0multiplier is unsigned
mplicand
32
input
Top module multiplicand input
mplicand_s_u
1
input
1multiplicand is signed
0multiplicand is
unsigned
prod
64
output
Output of the multiplier block
-
Mathematical representation of signed and unsigned numbers
Signed number:
2s complement representation of A:
=23131 + 23030 + (2 1)22929
15
=
=0
221 21
+22828 + (2 1)22727
+22626 + (2 1)22525
..
..
+222 + (2 1)211
+200 + 201 where 1 0
=23131 + 23030 + 23029
22929 + 22828 + 22827
22727 + 22626 + 22625
..
..
233 + 222 + 221
211 + 200 + 201 where 1 0
= 230(2 31 + 30 + 29)
+228(2 29 + 28 + 27)
+226(2 27 + 26 + 25)
..
..
+22(2 3 + 2 + 1)
+20(2 1 + 0 + 1) where 1 0
where 21 = 2 2 + 21 + 22 (2)
When equations (1) and (2) are compared, it is evident that preprocessing of integer A can be done before recoding the basic signed and unsigned integer.
Table 2:Truth Table for F-Block Values
+
(+/)
(× 1/0)
(× 2/0)
0
0
0
0
0
0
0
0
0
1
1
0
1
0
0
1
0
1
0
1
0
0
1
1
2
0
1
1
1
0
0
-2
1
1
1
1
0
1
-1
1
1
0
1
1
0
-1
1
1
0
1
1
1
0
0
0
0
=
15
=0
= 15
22( 2 2+1
22 2
+ 2
+ 21)
-
Sub-modules of the Booth Recoding Multiplier
-
33rd bit extender unit
=0
where 2 = 2 2+1 + 2 + 21 (1)
Unsigned number:
2s complement representation of A:
=23232 + 23131 + 23130
23030 + 22929 + 22928
22828 + 22727 + 22726
Here, a signed multiplier is being implemented to performed unsigned operations. Thus, in order to cover the complete range of unsigned multiplication, 1 extra bit is being extended at MSB in the next step. The 33rd bit for unsigned and signed number is
mplier_s_u
..
..
222 + 211 + 210
200 + 211 + 212 where 1 = 2 0
mplier[31:0]
0
mplier[31] 1
mplier[31:0]
mux_out
a[32:0]
=231(2 32 + 31 + 30)
+229(2 30 + 29 + 28)
+227(2 28 + 27 + 26)
mplicand[31:0]
mplicand[31:0]
mplicand[31]
1
mux_out
0
b[32:0]
..
..
+21(2 2 + 1 + 0)
+21(2 0 + 1 + 2) where 1
= 2 0
mplicand_s_u
zero and sign-extended respectively as shown in Fig.2 below.
=
16
=0
221( 2 2
+ 21
+
22)
Figure 2:33rd-bit extender unit block diagram
-
Preprocessing/F block formation unit
-
In this unit, a 0 is added at the least significant bit (LSB) of the a(Here, a is the 32-bit input to pre-processing block which is also the output of the 33rd bit extender unit)
-
Following that, blocks of 3 bits each are made. The most significant bit (MSB) of the previous block is made the least significant bit (LSB) of current one.
-
As a 33-bit number a is in consideration, so after addition of extra bit of 0 in the least significant bit (LSB) position, we have a shortage of one bit needed to form the F-block.
-
To counter this, a bit is again added to the most significant bit (MSB), which is the same as the 33rd bit, irrespective of whether the number is signed or unsigned.
Table 3:Values after extension
-
-
Partial product generation unit
There will be 16 rows of partial products of the 16 F-blocks that
21
2
2+1
21
2
2+1 1
Extra bit added for F-block formation
Extended 33-bit number (output of the 33rd bit extender unit)
0 added in
LSB
a[32]
a[32:0]
0
21
2
0
1
2
Total width of the number after pre-processing becomes 34- bits. Pre-processing is just a simple rewiring process.
Table 4:Grouping for F-Block
2+1
Figure 3:Partial Product Generator
F0
{ 1, 0, 0}
F2
{ 3, 2, 1}
F4
{ 5, 4, 3}
F6
{ 7, 6, 5}
F8
{ 9, 8, 7}
F10
{ 11, 10, 9}
F12
{ 13, 12, 11}
F14
{ 15, 14, 13}
F16
{ 17, 16, 15}
F18
{ 19, 18, 17}
F20
{ 21, 20, 19}
F22
{ 23, 22, 21}
F24
{ 25, 24, 23}
F26
{ 27, 26, 25}
F28
{ 29, 28, 27}
F30
{ 31, 30, 29}
F32
{ 32, 32, 31}
have been generated. Using Table 4, , 1 and 2 have been
obtained and will be used to do the necessary bit manipulation to get the required partial products. The hardware realization for
, 1 and 2 is given above(Fig.3).
Note: is high when f2i is -ve, otherwise low.
1 is high when f2i0, otherwise low.
2 is high when f2i = ±2, otherwise low.
All the F-blocks need to be processed individually to find the respective , 1 and 2.
Using booth recoding algorithm, our partial products will be reduced. There will only be 17 rows of partial product to be added. For example, to generate ROW_0 partial product, we
will need to operate 0, 01 and 02 in b(33-bit multiplicand; output of preprocessing unit).Similarly, other rows are operated upon to get the respective partial product rows.
b[32]
b[32]
b[1]
b[0]
b[31]
0
1
2
0
1
0
1
2
1
Sign Extension Region
Where n=0,2,4,.,32
row_n[63] row_n[34]
row_n[33]
row_n[32]
row_n[1]
row_n[0]
-
Partial product addition unit
-
Figure 4:Row#n calculation using Fn2, Fn1 and Fn
utilising compressors over traditional adders aims to further augment computational speed and efficiency, thereby enhancing
This unit undertakes the summation of all partial products,
denoted as ROW#0 through ROW#32. Carries are efficiently propagated diagonally leftward and downward to enhance computational speed. However, during the processing of the final row, ROW#32, which lacks a row beneath it, horizontal carry propagation becomes necessary.
It is imperative to acknowledge that horizontal carry propagation is viable due to the unoccupied augend, as evidenced by the absence of a subsequent row. Notably, in cases where the value of f2i is negative, the computation necessitates the derivation of the 2's complement of the number "b". Despite the depiction in Fig.4, the operational scope is confined to XOR operations. Furthermore, the addition of 1 to the Least Significant Bit (LSB) position, in alignment with the binary values of the respective ROWs, is indispensable.
To address this requirement, an additional ROW, denoted as ROW#-1, is introduced. This supplementary row serves the purpose of facilitating the addition of 1 to the aligned LSB position for the relevant ROWs, ensuring computational accuracy and integrity within the system architecture.
Conventional approaches in multiplier design have often relied on the utilization of half and full adders. However, our approach diverges from tradition as we opt for the integration of high- speed agile compressors within our multiplier architecture. Critical path in conventional addition is longer compared to compressors. We use this fact and our strategic choice of
the overall performance of the multiplier.
-
-
COMPRESSORS
Binary multiplication involves the execution of numerous partial product additions. In conventional multiplier architectures, full adders are conventionally employed for this purpose. However, full adders can only accommodate the addition of up to three inputs simultaneously. Consequently, when adding all partial products, a substantial number of full adders are necessitated. To mitigate the requirement for a vast number of adders in this operational scheme, we introduce high-speed compressors. These compressors are developed based on the principle of binary counter properties. Among the various compressor designs, the 5-3 compressor stands out, as it allows the addition of up to five inputs simultaneously, yielding a three-bit output. This innovation serves to streamline the computational process and optimize resource utilization within the multiplier architecture. Table-5 showcases the counter property of 5-3 compressor.
Input Condition
Out3
Out2
Out1
Decimal value
All inputs are zero
0
0
0
0
Any one input is one
0
0
1
1
Any two input are one
0
1
0
2
Any three input are one
0
1
1
3
Any four input are one
1
0
0
4
Any five input are one
1
0
1
5
Table 5:Counter-Property of a 5:3 Compressor
A B C D E
Different compressors:-their design and sub-units
Some basic different compressors architecture re designed and implemented by using the same logic. Their block diagrams are as follows
Out 3 Out 2 Out 1
Full adder
Full adder
Half adder
Figure 6:Structure of a 5:3 Compressor
A B C D
Full adder
A
B
S5
C D
S4 S3
E
F G
S2 S1
Half adder
Half adder
Parallel Addition
Full Adder
4:3 compressor
Out 3 Out 2 Out 1
Figure 5:Structure of a 4:3 compressor
Out 3 Out 2 Out 1
Figure 7:Structure of a 7:3 Compressor
5:3 compressor
Full Adder
Full Adder
Full Adder
Parallel Addition
S2
S3
S5
S6
5:3 compressor
Full Adder
Full Adder
A B C
D E F
G H I
J K L
M N O
S4 S1
Out 4 Out 3 Out 2 Out 1
Figure 8:Structure of a 15:4 Compressor
ull
dder
Half
dder
S S4 S2
Half
dder
2 1
S3 S1
S S
Half
dder
ull
dder
Half
dder
2
S3
1
S4 S2 S1
ut 3 ut 2 ut 1
ut 4 ut 3 ut 2
ut 1
Figure 9:Parallel Addition Unit for 7:3 compressor Figure 10:Parallel Addition Unit for15:4 compressor
4
3
4 3
21
H
H
1 input bits
4 input bits
row2 1 row0 3 2
row2 0
row0 2
row0 1
0 row0 0
4 carry bits
3
2 1
3 3 2 1 0
H Half dder
ull dder
Figure 11:Partial Product Addition Using High-speed Agile Compressor
-
RESULTS AND COMPARISON
The implemented single cycled, 32-bit signed multiplier using booth recoding algorithm and high speed compressors has been successfully verified through a written testbench script. The code was written and executed on the Xilinx Vivado 2021.2 platform. The synthesized design was targeted towards the Kintex-7 FPGA, specifically the Device- XC7K70T with Package- FBG484 and speed-1. Table-6 presents a comparison between the implemented single-cycle 32-bit signed multiplier and the 32-bit complex Vedic multiplier [2].
Table 6:A Comparison with The Vedic Multiplier
Parameters
32-bit Vedic Multiplier
Implemented 32-bit Signed Multiplier
Improvement
Slice LUTs
7874
1978
74.88%
Bonded IOBs
258
130
49.61%
Table 7:FPGA Hardware Utilization
Parameter
Utilization
Bonded IOB
130
Slice
SLICEL
310
SLICEM
243
LUT as Logic
Using o6 output only
1713
Using o5 and o6
264
Figure 12:Behavioural Simulation Results
(On Chip)
Figure 13:RTL Schematic of implemented multiplier
-
CONCLUSION
The architecture of the multiplier proposed by us in this paper can multiply two signed as well as unsigned 32-bit integers using booth recoding algorithm and high speed agile compressors. The booth recoding algorithm significantly reduces the number of partial product rows which in turn improve the speed as compared to conventional multipliers and the high speed compressors have a much shorter critical path as compared to traditional adders. The implemented multiplier's performance has been examined using various parameters, including power and hardware utilization. Compared to the 32-bit complex Vedic multiplier [2], the slice LUTs and bonded IOBs utilizations are significantly better, with improvements of 74.88% and 49.61%, respectively.
Figure 14:On-Chip Power
REFERENCES
-
Saha, P., Banerjee, A., Bhattacharyya, P., and Dandapat, A. (2011, January). High speed SI design of complex multiplier using vedicmathematics. In Students Technology Symposium (TechSym), 2011 IEEE (pp. 237-241). IEEE.
-
Ankush Nikam, Swati Salunke, Sweta Bhurse. Design and Implementation of 32bit omplex ultiplier using Vedic lgorithm IJERT ,2015 March,Vol 4.
-
. Dandapat, S. Ghosal, P. Sarkar, D. ukhopadhyay, 1.2-ns16×16- Bit Binary ultiplier Using High Speed ompressors, International Journal of Electrical and Electronics Engineering, 2010.
-
Shubhajit Roy Chowdhury, Aritra Banerjee, Aniruddha Roy, Hiranmay Saha,Design, Simulation and Testing of a High Speed Low Power 1 -4 Compressor for High Speed Multiplication pplications, irst International Conference on Emerging Trends in Engineering and Technology, 2008.
-
. orris ano, Digital Design,3rd edition, Prentice Hall,2002..