FPGA Realization of Single-Cycle, 32-Bit Booth Recoding Signed Multiplier Enhanced by High-Speed Compressors

Prity Mishra

doi:10.17577/IJERTV13IS030228

Volume 13, Issue 03 (March 2024)

FPGA Realization of Single-Cycle, 32-Bit Booth Recoding Signed Multiplier Enhanced by High-Speed Compressors

DOI : 10.17577/IJERTV13IS030228

Download Full-Text PDF Cite this Publication

Open Access
[post-views]
Authors : Prity Mishra
Paper ID : IJERTV13IS030228
Volume & Issue : Volume 13, Issue 03 (March 2024)
Published (First Online): 03-04-2024
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License

PDF Version

View

Text Only Version

FPGA Realization of Single-Cycle, 32-Bit Booth Recoding Signed Multiplier Enhanced by High-Speed Compressors

Prity Mishra

Electronics and Telecommunication Engineering International Institute of Information Technology, Bhubaneshwar, Odisha, India

AbstractA multiplier is one of the integral and frequently used parts of every digital system. In this paper, a 32×32-bit signed multiplier has been implemented using the booth recoding algorithm and high-speed compressors. This in turn facilitates hardware reduction as partial product rows are significantly reduced, thus improving efficiency in terms of speed and power consumption. An analytical comparison of this multiplier with the conventional multiplier has also been provided. This booth recoding multiplier has been designed using the SYSTEM VERILOG HDL and FPGA realization has been achieved through Xilinx Kintex-7 FPGA in Xilinx Vivado 2021.2 software.

KeywordsBooth Recoding; 32×32 bit signed multiplier; Agile compressors; Parallel addition; Verilog HDL; FPGA.

INTRODUCTION

Low-power VLSI circuits have emerged as critical criteria for designing energy-efficient electronic systems tailored for high- performing portable devices. Across various domains such as microprocessors, image and video processing, digital filters, error correction and coding, neural networks, and machine learning, multipliers play an integral role in enhancing computational efficiency. This paper presents the implementation of a 32×32-bit signed multiplier using the Booth recoding algorithm and high-speed agile compressors to improve processing speed. The inputs can be either signed or unsigned.

Conventional multiplication relies on the partial products method, where each bit of the multiplier is multiplied with the multiplicand to form partial product rows. Subsequently, these rows are left-shifted and summed to produce the final result.

A similarity is observed in case of multiplication of two binary numbers. This process is simplified as it involves only two digits1 and 0whereby multiplication with 1 results in a direct copy and same with 0 is discarded. However, for numbers with a high number of bits, processing time increases, leading to elevated time complexity and hardware demands.

In this paper, the focus will be on overcoming those major drawbacks with the proposed multiplier design. The paper has been divided into subsequent sections as follows. Section II provides a comprehensive overview of the traditional booth recoding algorithm. Section III goes into detail explaining the operation of the high-speed, agile compressors. The comparative analysis and the experimental results have been included in Section IV, followed by the concluding remarks in Section V.

CONVENTIONAL BOOTH RECODING ALGORITHM

Abstract block diagram of top module of the multiplier and the signal description

For booth recoding algorithm-based multiplier, two 32-bit inputs are required to represent the multiplicand and multiplier. In addition to that, two control signals are also fed to indicate whether the multiplier and multiplicand are signed or unsigned binary values. The output is a 64-bit value.

32

64

32

mplier

mplier_s_u

prod

mplicand mplicand_s_u

Table 1:Signal Description of Top Module Figure 1:32-bit Booth Recoding Multiplier

Signal name	Width	Source	Description
mplier	32	input	Top module multiplier input
mplier_s_u	1	input	1multiplier is signed 0multiplier is unsigned
mplicand	32	input	Top module multiplicand input
mplicand_s_u	1	input	1multiplicand is signed 0multiplicand is unsigned
prod	64	output	Output of the multiplier block

Mathematical representation of signed and unsigned numbers

Signed number:

2s complement representation of A:

=23131 + 23030 + (2 1)22929

15

=

=0

221 21

+22828 + (2 1)22727

+22626 + (2 1)22525

..

+222 + (2 1)211

+200 + 201 where 1 0

=23131 + 23030 + 23029

22929 + 22828 + 22827

22727 + 22626 + 22625

..

233 + 222 + 221

211 + 200 + 201 where 1 0

= 230(2 31 + 30 + 29)

+228(2 29 + 28 + 27)

+226(2 27 + 26 + 25)

..

+22(2 3 + 2 + 1)

+20(2 1 + 0 + 1) where 1 0

where 21 = 2 2 + 21 + 22 (2)

When equations (1) and (2) are compared, it is evident that preprocessing of integer A can be done before recoding the basic signed and unsigned integer.

Table 2:Truth Table for F-Block Values

+				(+/)	(Ã— 1/0)	(Ã— 2/0)
0	0	0	0	0	0	0
0	0	1	1	0	1	0
0	1	0	1	0	1	0
0	1	1	2	0	1	1
1	0	0	-2	1	1	1
1	0	1	-1	1	1	0
1	1	0	-1	1	1	0
1	1	1	0	0	0	0

=

15

=0

= 15

22( 2 2+1

22 2

+ 2

+ 21)

Sub-modules of the Booth Recoding Multiplier

33rd bit extender unit

=0

where 2 = 2 2+1 + 2 + 21 (1)

Unsigned number:

2s complement representation of A:

=23232 + 23131 + 23130

23030 + 22929 + 22928

22828 + 22727 + 22726

Here, a signed multiplier is being implemented to performed unsigned operations. Thus, in order to cover the complete range of unsigned multiplication, 1 extra bit is being extended at MSB in the next step. The 33rd bit for unsigned and signed number is

mplier_s_u

..

..

222 + 211 + 210

200 + 211 + 212 where 1 = 2 0

mplier[31:0]
0

mplier[31] 1

mplier[31:0]
mux_out

a[32:0]
=231(2 32 + 31 + 30)

+229(2 30 + 29 + 28)

+227(2 28 + 27 + 26)

mplicand[31:0]
mplicand[31:0]
mplicand[31]

1

mux_out

0

b[32:0]
..

..

+21(2 2 + 1 + 0)

+21(2 0 + 1 + 2) where 1

= 2 0

mplicand_s_u

zero and sign-extended respectively as shown in Fig.2 below.

=

16

=0

221( 2 2

+ 21

+

22)

Figure 2:33rd-bit extender unit block diagram
Preprocessing/F block formation unit
- In this unit, a 0 is added at the least significant bit (LSB) of the a(Here, a is the 32-bit input to pre-processing block which is also the output of the 33rd bit extender unit)
- Following that, blocks of 3 bits each are made. The most significant bit (MSB) of the previous block is made the least significant bit (LSB) of current one.
- As a 33-bit number a is in consideration, so after addition of extra bit of 0 in the least significant bit (LSB) position, we have a shortage of one bit needed to form the F-block.
- To counter this, a bit is again added to the most significant bit (MSB), which is the same as the 33rd bit, irrespective of whether the number is signed or unsigned.
Table 3:Values after extension

Partial product generation unit

There will be 16 rows of partial products of the 16 F-blocks that

21

2

2+1

21

2

2+1 1

Extra bit added for F-block formation

Extended 33-bit number (output of the 33rd bit extender unit)

0 added in

LSB

a[32]

a[32:0]

0

21

2

0

1

2

Total width of the number after pre-processing becomes 34- bits. Pre-processing is just a simple rewiring process.

Table 4:Grouping for F-Block

2+1

Figure 3:Partial Product Generator

F0	{ 1, 0, 0}
F2	{ 3, 2, 1}
F4	{ 5, 4, 3}
F6	{ 7, 6, 5}
F8	{ 9, 8, 7}
F10	{ 11, 10, 9}
F12	{ 13, 12, 11}
F14	{ 15, 14, 13}
F16	{ 17, 16, 15}
F18	{ 19, 18, 17}
F20	{ 21, 20, 19}
F22	{ 23, 22, 21}
F24	{ 25, 24, 23}
F26	{ 27, 26, 25}
F28	{ 29, 28, 27}
F30	{ 31, 30, 29}
F32	{ 32, 32, 31}

have been generated. Using Table 4, , 1 and 2 have been

obtained and will be used to do the necessary bit manipulation to get the required partial products. The hardware realization for

, 1 and 2 is given above(Fig.3).

Note: is high when f2i is -ve, otherwise low.

1 is high when f2i0, otherwise low.

2 is high when f2i = Â±2, otherwise low.

All the F-blocks need to be processed individually to find the respective , 1 and 2.

Using booth recoding algorithm, our partial products will be reduced. There will only be 17 rows of partial product to be added. For example, to generate ROW_0 partial product, we

will need to operate 0, 01 and 02 in b(33-bit multiplicand; output of preprocessing unit).Similarly, other rows are operated upon to get the respective partial product rows.

b[32]

b[1]

b[0]

b[31]

0

1

2

0

1

0

1

2

1

Sign Extension Region

Where n=0,2,4,.,32

row_n[63] row_n[34]

row_n[33]

row_n[32]

row_n[1]

row_n[0]

Partial product addition unit

Figure 4:Row#n calculation using Fn2, Fn1 and Fn

utilising compressors over traditional adders aims to further augment computational speed and efficiency, thereby enhancing

This unit undertakes the summation of all partial products,

denoted as ROW#0 through ROW#32. Carries are efficiently propagated diagonally leftward and downward to enhance computational speed. However, during the processing of the final row, ROW#32, which lacks a row beneath it, horizontal carry propagation becomes necessary.

It is imperative to acknowledge that horizontal carry propagation is viable due to the unoccupied augend, as evidenced by the absence of a subsequent row. Notably, in cases where the value of f2i is negative, the computation necessitates the derivation of the 2's complement of the number "b". Despite the depiction in Fig.4, the operational scope is confined to XOR operations. Furthermore, the addition of 1 to the Least Significant Bit (LSB) position, in alignment with the binary values of the respective ROWs, is indispensable.

To address this requirement, an additional ROW, denoted as ROW#-1, is introduced. This supplementary row serves the purpose of facilitating the addition of 1 to the aligned LSB position for the relevant ROWs, ensuring computational accuracy and integrity within the system architecture.

Conventional approaches in multiplier design have often relied on the utilization of half and full adders. However, our approach diverges from tradition as we opt for the integration of high- speed agile compressors within our multiplier architecture. Critical path in conventional addition is longer compared to compressors. We use this fact and our strategic choice of

the overall performance of the multiplier.

COMPRESSORS

Binary multiplication involves the execution of numerous partial product additions. In conventional multiplier architectures, full adders are conventionally employed for this purpose. However, full adders can only accommodate the addition of up to three inputs simultaneously. Consequently, when adding all partial products, a substantial number of full adders are necessitated. To mitigate the requirement for a vast number of adders in this operational scheme, we introduce high-speed compressors. These compressors are developed based on the principle of binary counter properties. Among the various compressor designs, the 5-3 compressor stands out, as it allows the addition of up to five inputs simultaneously, yielding a three-bit output. This innovation serves to streamline the computational process and optimize resource utilization within the multiplier architecture. Table-5 showcases the counter property of 5-3 compressor.

Input Condition	Out3	Out2	Out1	Decimal value
All inputs are zero	0	0	0	0
Any one input is one	0	0	1	1
Any two input are one	0	1	0	2
Any three input are one	0	1	1	3
Any four input are one	1	0	0	4
Any five input are one	1	0	1	5

Table 5:Counter-Property of a 5:3 Compressor

A B C D E

Different compressors:-their design and sub-units

Some basic different compressors architecture re designed and implemented by using the same logic. Their block diagrams are as follows

Out 3 Out 2 Out 1

Full adder

Half adder

Figure 6:Structure of a 5:3 Compressor

A B C D

Full adder

A

B

S5

C D

S4 S3

E

F G

S2 S1

Half adder

Parallel Addition

Full Adder

4:3 compressor

Out 3 Out 2 Out 1

Figure 5:Structure of a 4:3 compressor

Out 3 Out 2 Out 1

Figure 7:Structure of a 7:3 Compressor

5:3 compressor

Full Adder

Parallel Addition

S2

S3

S5

S6

5:3 compressor

Full Adder

A B C

D E F

G H I

J K L

M N O

S4 S1

Out 4 Out 3 Out 2 Out 1

Figure 8:Structure of a 15:4 Compressor

ull

dder

Half

dder

S S4 S2

Half

dder

2 1

S3 S1

S S

Half

dder

ull

dder

Half

dder

2

S3

1

S4 S2 S1

ut 3 ut 2 ut 1

ut 4 ut 3 ut 2

ut 1

Figure 9:Parallel Addition Unit for 7:3 compressor Figure 10:Parallel Addition Unit for15:4 compressor

4

3

4 3

21

H

1 input bits

4 input bits

row2 1 row0 3 2

row2 0

row0 2

row0 1

0 row0 0

4 carry bits

3

2 1

3 3 2 1 0

H Half dder

ull dder

Figure 11:Partial Product Addition Using High-speed Agile Compressor

RESULTS AND COMPARISON

The implemented single cycled, 32-bit signed multiplier using booth recoding algorithm and high speed compressors has been successfully verified through a written testbench script. The code was written and executed on the Xilinx Vivado 2021.2 platform. The synthesized design was targeted towards the Kintex-7 FPGA, specifically the Device- XC7K70T with Package- FBG484 and speed-1. Table-6 presents a comparison between the implemented single-cycle 32-bit signed multiplier and the 32-bit complex Vedic multiplier [2].

Table 6:A Comparison with The Vedic Multiplier

Parameters

32-bit Vedic Multiplier

Implemented 32-bit Signed Multiplier

Improvement

Slice LUTs

7874

1978

74.88%

Bonded IOBs

258

130

49.61%

Table 7:FPGA Hardware Utilization

Parameter

Utilization

Bonded IOB

130

Slice

SLICEL

310

SLICEM

243

LUT as Logic

Using o6 output only

1713

Using o5 and o6

264

Figure 12:Behavioural Simulation Results

(On Chip)

Figure 13:RTL Schematic of implemented multiplier
CONCLUSION

The architecture of the multiplier proposed by us in this paper can multiply two signed as well as unsigned 32-bit integers using booth recoding algorithm and high speed agile compressors. The booth recoding algorithm significantly reduces the number of partial product rows which in turn improve the speed as compared to conventional multipliers and the high speed compressors have a much shorter critical path as compared to traditional adders. The implemented multiplier's performance has been examined using various parameters, including power and hardware utilization. Compared to the 32-bit complex Vedic multiplier [2], the slice LUTs and bonded IOBs utilizations are significantly better, with improvements of 74.88% and 49.61%, respectively.

Figure 14:On-Chip Power

REFERENCES

Saha, P., Banerjee, A., Bhattacharyya, P., and Dandapat, A. (2011, January). High speed SI design of complex multiplier using vedicmathematics. In Students Technology Symposium (TechSym), 2011 IEEE (pp. 237-241). IEEE.
Ankush Nikam, Swati Salunke, Sweta Bhurse. Design and Implementation of 32bit omplex ultiplier using Vedic lgorithm IJERT ,2015 March,Vol 4.
. Dandapat, S. Ghosal, P. Sarkar, D. ukhopadhyay, 1.2-ns16Ã—16- Bit Binary ultiplier Using High Speed ompressors, International Journal of Electrical and Electronics Engineering, 2010.
Shubhajit Roy Chowdhury, Aritra Banerjee, Aniruddha Roy, Hiranmay Saha,Design, Simulation and Testing of a High Speed Low Power 1 -4 Compressor for High Speed Multiplication pplications, irst International Conference on Emerging Trends in Engineering and Technology, 2008.
. orris ano, Digital Design,3rd edition, Prentice Hall,2002..

Parameters	32-bit Vedic Multiplier	Implemented 32-bit Signed Multiplier	Improvement
Slice LUTs	7874	1978	74.88%
Bonded IOBs	258	130	49.61%

Parameter		Utilization
Bonded IOB		130
Slice	SLICEL	310
Slice	SLICEM	243
LUT as Logic	Using o6 output only	1713
LUT as Logic	Using o5 and o6	264