- Open Access
- Authors : Piyush Pati
- Paper ID : IJERTV12IS030115
- Volume & Issue : Volume 12, Issue 03 (March 2023)
- Published (First Online): 25-03-2023
- ISSN (Online) : 2278-0181
- Publisher Name : IJERT
- License: This work is licensed under a Creative Commons Attribution 4.0 International License
FPGA Implementation of Single Cycle Signed Multiplier using Booth Recoding Algorithm
Piyush Pati
Electronics and Instrumentation Engineering Odisha University of Technology and Research, Bhubaneswar, Odisha, India
AbstractThe focus of this paper is on the implementation of a single cycle signed multiplier through use of the booth recoding algorithm on an FPGA. By utilizing fewer partial products, this implementation offers benefits such as reduced delay, power consumption, and usage of hardware resources. Additionally, this signed multiplier is capable of performing multiplication of both signed and unsigned numbers. The paper presents a comparative analysis of the 32-bit multiplier's performance in terms of power consumption and FPGA hardware resource utilization. The proposed 32-bit multiplier is designed using Verilog HDL and implemented through Xilinx Vivado 2022.2 software for Xilinx Virtex-7 FPGA.
KeywordsBooth recoding; Signed Multiplier; FPGA; Verilog HDL.
-
INTRODUCTION
A
B
Y
0
0
0
0
1
0
1
0
0
1
1
1
The process for binary multiplication is analogous to that of the decimal system. The regulations governing binary multiplication are illustrated in the following table.
Binary multiplication offers several benefits. It involves adding the multiplicand to itself, after a suitable shift based on the multiplier, which simplifies the process into a series of shifting and adding steps. These steps should be repeated until the MSB of the multiplier has been shifted, and the final addition has been performed.
The paper is organized as follows. Section II presents the design of booth recoding multiplier. Section III contains the sub-modules of implemented booth recoding multiplier. Section IV describes partial product addition unit. Experimental results and comparison are given in Section V. Finally, Section VI concludes the paper.
-
DESIGN OF BOOTH RECODING MULTIPLIER
-
32×32 bit multiplier(Heading 2)
The Booth recoding multiplier requires two 32-bit inputs representing the multiplier and multiplicand, respectively. The output of the multiplier is 64-bit. Additionally, two control signals are included to specify whether the multiplier and multiplicand are signed or unsigned.
Table 1:1-bit binary multiplication
In binary arithmetic, when multiplying two numbers, each bit of the multiplier is multiplied by the multiplicand one at a time, similar to how it's done in the decimal system. The resulting partial products are arranged such that the least significant bit (LSB) is positioned under the corresponding bit in the multiplier.
Once the partial products have been calculated through binary multiplication, they are summed together to form the final product.
It is important to keep in mind that any multiplication by zero would result in all bits of the partial product being zero, which can be skipped in intermediate steps.
Furthermore, when multiplied by 1, the bits of the
32-bit booth recoding multiplier
mplier mplier_s_u
mplicand mplicand_s_u
Signal Name
Width
Source
Description
mplier
32
input
Top module multiplier input
mplier_s_u
1
input
1= multiplier is signed, 0 =multiplier is unsigned
mplicand
32
input
Top module multiplicand input
mplicand_s_u
1
input
1 = multiplier is signed, 0 =multiplier is unsigned
prod
64
output
Output from the multiplier block
Figure 1: 32-bit booth recoding multiplier top module
Table 2: Signal Description of top module
prod
multiplicand remain unaltered, but they are shifted one bit position to the left. Performing intermediate sums of partial products simplifies the multiplication process of binary numbers.
-
Mathematical Representation of Signed Number
2s complement representation of A
= 23131 + 23030 + (2 1)2929 + 22828 + (2 1)2727 + 22626 +
(2 1)2525 + . + 222 + (2 1)11 + 200 + 201
where 1 0
21(22 + 1 + 0) +
21(20 + 1 + 2)
Where 1 = 2 0
= 23131 + 23030 + 23029
22929 + 22828 + 22827
=
16
=0
221(22 + 21 + 22)
22727 + 22626 + 22625
=
16
=0
2212
(2)
233 + 222 + 221
211 + 200 + 201
21 = 22 + 21 + 22
Comparing equation 1 and 2 it is apparent that one can
process the integer A before re-coding basic unsigned and signed integer.
+
(+/)
(x1/0)
(x2/0)
0
0
0
0
0
0
0
0
0
1
1
0
1
0
0
1
0
1
0
1
0
0
1
1
2
0
1
1
1
0
0
-2
1
1
1
1
0
1
-1
1
1
0
1
1
0
-1
1
1
0
1
1
1
0
0
0
0
where 1 0
= 230(23131 + 23030 + 23029) 228(22929 + 22828 + 22827)
226(22727 + 22626 + 22625)
22(233 + 222 + 221)
20(211 + 200 + 201)
where 1 0
=
15
=0
22(22+1 + 2 + 21)
=
15
=0
22 2 (1)
2 = 22+1 + 2 + 21
-
Mathematical Representation of Unsigned Number
= 23232 + 23131 + 23130
23030 + 22929 + 22928
22828 + 22727 + 22726
222 + 211 + 210
200 + 211 + 212
Where 1 = 2 0
= 231(232 + 31 + 30) + 229(230 + 29 + 28) +
Table 3:Booth recoding truth table
-
-
SUB-MODULES OF IMPLEMENTED BOOTH RECODING MULTIPLIER
-
33rd bit extension unit
Our implementation involves using a signed multiplier to perform an unsigned operation. To cover the entire range of unsigned multiplication, we add an extra bit at the MS. For unsigned numbers, the 33rd bit is zero, while for signed numbers, the 33rd bit is sign-extended.
F8
{9, 8, 7}
F10
{11, 10, 9}
F12
{13, 12, 11}
F14
{15, 14, 13}
F16
{17, 16, 15}
F18
{19, 18, 17}
F20
{21, 20, 19}
F22
{23, 22, 21}
F24
{25, 24, 23}
F26
{27, 26, 25}
F28
{29, 28, 27}
F30
{31, 30, 29}
F32
{32, 32, 31}
mplier_s_u
mplier [31]
mplier [31:0]
mplicand [31:0]
mplicand [31]
0
mux_out
1
1
mux_out
0
a [32:0]
b [32:0]
mplicand_s_u
Figure 2: 33rd bit extension unit block diagram
-
Preprocessing/F block formation unit
-
During the pre-processing phase, we will append a "0" to the least significant bit (LSB) of "a", which is a 32-bit input to the pre-processing block, and serves as the output of the 33rd bit extension unit.
-
Following that, we need to create blocks consisting of
Table 5: F Block Formation
-
-
Partial product generation
-
There will be 16 rows of partial products for each of the
16 "F" blocks. Table 3 was used to generate , 1 and 2
which are then used for the required bit manipulation to
generate the necessary partial products. The hardware
implementation for , 1 and 2 is provided below.
21
2
three bits each, where the least significant bit (LSB) of the current block becomes the most significant bit (MSB) of the previous block.
-
Since we are forming blocks in the 33-bit number "a", adding a "0" to the least significant bit results in a shortage of one bit required to form the F block.
-
To address this situation, we will add a number to the most significant bit (MSB) that is the same as the 33rd bit, regardless of whether we are generating signed or unsigned partial products.
2+1
21
2
2+1
21
2
1
0
1 2
Extra bit added for block formation
Extended 33-bit number (output of 33rd bit extension unit)
0 added in LSB
a [32]
a [32:0]
0
2+1
Table 4:Extended 34-bit multiplier
After preprocessing total width of the number is 34-bit.
Note: preprocessing of number a is nothing but just rewiring.
Figure 3: Hardware realization for , 1 and 2
Note: 1. is high when 2 is -ve, low otherwise.
-
1 is high for 2 0 ,low otherwise.
-
2 is high for 2 = ±2 ,low otherwise.
-
-
We need to process all the F block in this hardware to get
F0
{1, 0, 0}
F2
{3, 2, 1}
F4
{5, 4, 3}
F6
{7, 6, 5}
corresponding , 1 and 2. For example, if we process F0 block we can get the 0, 01 and 02. Similarly, we
can get the values for 2, 2
and 2 ——32, 32
1 2 1
and 322by processing corresponding F blocks.
-
As we are using booth recoding algorithm, so our partial
product will reduce. There will be 17 rows of partial
product which we need to add. For generating ROW#0
partial product we need to 0, 01 and 02 in b (33-bit multiplicand, output of preprocessing unit). Similarly for
ROW#2 we need to use 2, 21 and 22 on b and so
on for other ROWS
-
-
PARTIAL PRODUCT ADDITION UNIT
The process in this unit involves the addition of all partial products, namely ROW#0, ROW#2, …, ROW#32. To achieve this, we use full adders and half adders. The carries are propagated diagonally to the left and downward to achieve superior speed. However, when processing the last row,
b [32]
b [32]
0 1
b [31]
02
b [1]
0 1
b [0]
0 1
02
01
0
ROW#32, there is no row below it, and so we propagate the carries horizontally instead.
Note: Horizontal carry propagation is feasible because the augend is unoccupied, given the absence of a row beneath it.
It is worth noting that if the value of 2 is negative, we must
compute the 2's complement of the number "b". However, the
structure depicted in figures 4 and 5 merely perform XOR operations. Additionally, we need to add 1 to the LSB position, aligned with the binary values of the respective ROWs. To address this, we introduce an extra ROW, known as ROW#-1, which will facilitate the addition of 1 to the aligned LSB position for the relevant ROWs.
row0[63]
row0[33] row0[32] row0[34]
row0[1]
row0[0]
ROW#-1= {32b0, 32, 0 , 30, 0 , 28, 0 , 26, 0
,24, 0 ,22, 0 ,20, 0 , 18, 0 , 16, 0 , 14, 0 , 12,
0 ,10, 0 ,8, 0 ,6, 0 , 4, 0 , 2, 0 , 0}
A. Addition Stages
During the initial stage of addition, we will add ROW#-1, ROW#0, and ROW#2 (shifted left by 2 positions). In the
Figure 4: ROW#0 calculation using 0, 01 and 02
Similar structure can be used for calculation of other ROWs.
subsequent stage, we will add the sum obtained from the previous stage, ROW#4 (shifted left by 4 positions), and the carry generated by the first stage (shifted left by 1 position). This procedure is repeated in a similar manner for the succeeding
b [32]
b [32]
b [31]
b [1]
b [0]
stages of addition.
0 1 02 0 1
R#-1[0]
R#0[0]
R#-1[1]
R#0[1]
R#2[0]
R#-1[2]
R#0[2]
R#2[61]
R#-1[63] R#0[63]
0 1 02
01
——–
0
C#2[63]
S#2[63]
C#2[2]
S#2[2]
C#2[1]
S#2[1]
C#2[0]
S#2[0]
1b0
S#2[0]
C#2[0]
S#2[1]
R#4[0]
C#2[3]
S#2[4]
R#4[59]
C#2[62] S#2[63]
Figure 6 : First stage addition block
row2[33] row2[32] row2[34]
row2[1]
row2[0]
——–
——–
row2[63]
Figure 5: ROW#2 calculation using 2, 21 and 22
C#4[63]
S#4[63]
C#4[4]
S#4[4]
C#4[1]
S#4[1]
C#4[0]
S#4[0]
Figure 7: Second stage addition block
1b0 S#32[0]
C#32[0]
S#32[1]
C#32[16]
S#32[17]
C#32[62]
S#32[63]
carry
——–
carry
——–
Figure 11 : On Chip Power
M[63]
M[17]
M[1]
Parameter
Utilization
Bonded IOB
130
Slice
SLICEL
223
SLICEM
144
LUT as Logic
using o5 output only
0
using o6 output only
919
using o5 and 06
386
M[0]
Figure 8: Final Stage addition block
-
RESULTS AND COMPARISON
A test bench program has bee developed to verify the results of the implemented single-cycle signed multiplier using the Booth recoding algorithm. The code was written and executed using Xilinx Vivado 2022.2. The synthesized design was targeted towards the Virtex-7 FPGA, specifically the Device-XC7V585T with Package-FFG1157 and speed-1. Table-6 presents a comparison between the implemented single-cycle signed multiplier and the 32-bit complex Vedic multiplier [2].
Parameter
32-bit Vedic Multiplier
Implemented 32-bit Signed Multiplier
Improvement
Slice LUTs
7874
1305
83.43%
Bonded IOBs
258
130
49.61%
Table 6: Comparison with Vedic multiplier
Figure 9 : Simulation Waveform
Figure 10 : RTL schematic of Implemented multiplier
Table 7:FPGA hardware utilization
-
CONCLUSION
The architecture we propose in this paper can multiply two 32-bit signed numbers using the Booth recoding algorithm. Compared to conventional multipliers, our multiplier algorithm requires only half of the partial product addition. We examine the implemented multiplier's performance using various parameters, including power and hardware utilization. Compared to the 32-bit complex Vedic multiplier [2], the slice LUTs and bonded IOBs utilizations are significantly better, with improvements of 83.43% and 49.61%, respectively.
REFERENCES
[1] Saha, P., Banerjee, A., Bhattacharyya, P., and Dandapat, A. (2011, January). High speed ASIC design of complex multiplier using vedicmathematics. In Students Technology Symposium (TechSym), 2011 IEEE (pp. 237-241). IEEE. [2] Ankush Nikam, Swati Salunke, Sweta Bhurse. Design and Implementation of 32bit Complex Multiplier using Vedic Algorithm IJERT ,2015 March,Vol 4. [3] A. Dandapat, S. Ghosal, P. Sarkar, D. Mukhopadhyay, A 1.2-ns16×16- Bit Binary Multiplier Using High Speed Compressors, International Journal of Electrical and Electronics Engineering, 2010. [4] M. Morris Mano, Digital Design,3rd edition, Prentice Hall,2002.