DA-Based DCT with Error-Compensated Adder Tree

Chinababu Panduru; P. Mahesh Kumar

doi:10.17577/IJERTCONV1IS06150

ICSEM - 2013 (Volume 1 - Issue 06)


Open Access
Authors : Chinababu Panduru, P. Mahesh Kumar,
Paper ID : IJERTCONV1IS06150
Volume & Issue : ICSEM – 2013 (Volume 1 – Issue 06)
Published (First Online): 30-07-2018
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License


CHINABABU PANDURU,

II M.Tech, CREC, Tirupati, INDIA panduruchinababu@gmail.com

P. MAHESH KUMAR,

Assistant Professor, ECE, CREC, Tirupati, INDIA maheshpenubaku@gmail.com

Abstract- In this paper, an error-compensated adder-tree (ECAT) that operates shifting and addition in parallel is proposed to deal with truncation errors and to achieve a low-error, high-throughput discrete cosine transform (DCT) design. Instead of the 12 bits used in previous works, a 9-bit distributed arithmetic (DA) precision is chosen for this work while still meeting peak-signal-to-noise-ratio (PSNR) requirements. Thus, an area-efficient DCT core is implemented that achieves a 1 Gpels/s throughput rate with a gate count of 22.2 K for the PSNR requirements outlined in the previous works.

Index Terms- Distributed arithmetic (DA)-based, error-compensated adder-tree (ECAT), 2-D discrete cosine transform (DCT).

INTRODUCTION

Discrete cosine transform (DCT) is a widely used tool in image and video compression applications [1]. Recently, high-throughput DCT designs have been adopted to meet the requirements of real-time applications. The high-throughput shift-adder-tree (SAT) and adder-tree (AT), which unroll the shifting and addition words in parallel for DA-based computation, were introduced in [3] and [4], respectively. However, they incur a large truncation error. To reduce the truncation error effect, several error-compensation bias methods have been presented [5]-[7], based on statistical analysis of the relationship between partial products and the multiplier-multiplicand. However, the elements of the truncation part outlined in this work are independent, so the previously described compensation methods cannot be applied. This brief addresses a DA-based DCT core with an error-compensated adder-tree (ECAT). The proposed ECAT operates shifting and addition in parallel by unrolling all the words required to be computed. Furthermore, the error-compensated circuit alleviates the truncation error for a high-accuracy design.

MATHEMATICAL DERIVATION OF DISTRIBUTED ARITHMETIC

The inner product is an important tool in digital signal processing applications. It can be written as follows:

$$Y = A^{T} X = \sum_{i=1}^{L} A_i X_i \tag{1}$$

where $A_i$, $X_i$, and $L$ are the ith fixed coefficient, the ith input data, and the number of inputs, respectively. Assume that coefficient $A_i$ is a Q-bit two's complement binary fraction. The inner product computation in (1) can then be implemented by using shifts and adders instead of multipliers. Therefore, low hardware cost can be achieved by using a DA-based architecture.

Fig. 1. Q P-bit words shifting and addition operations in parallel.
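As a minimal sketch of how (1) reduces to shifts and additions, the Python snippet below decomposes each coefficient into its Q two's complement bits and accumulates one bit plane at a time. The function names and the example values are hypothetical, and the model is a floating-point illustration rather than the paper's hardware.

```python
def to_twos_complement_bits(value, q):
    """Return the Q bits (MSB first) of a Q-bit two's complement fraction in [-1, 1)."""
    scaled = round(value * (1 << (q - 1)))
    if not -(1 << (q - 1)) <= scaled < (1 << (q - 1)):
        raise ValueError("value does not fit in a Q-bit two's complement fraction")
    scaled &= (1 << q) - 1  # wrap negative values into their Q-bit pattern
    return [(scaled >> (q - 1 - j)) & 1 for j in range(q)]


def da_inner_product(coeffs, inputs, q):
    """Compute sum(A_i * X_i) using only additions and power-of-two weights."""
    bits = [to_twos_complement_bits(a, q) for a in coeffs]
    total = 0.0
    for j in range(q):
        # Bit plane j: add the inputs whose coefficient has bit j set.
        y_j = sum(b[j] * x for b, x in zip(bits, inputs))
        if j == 0:
            y_j = -y_j  # the sign bit carries weight -1 in two's complement
        total += y_j * 2.0 ** -j  # corresponds to a right shift by j
    return total


# Illustrative values only (not from the paper).
result = da_inner_product([0.5, -0.25, 0.75], [3, 5, -2], q=4)
print(result)
```

Each bit plane needs only additions of the inputs, and the power-of-two weight corresponds to a fixed wiring shift in hardware, which is why no multiplier is required.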

ECAT ARCHITECTURE

The shifting and addition computation can be written as follows:

$$Y = \sum_{j=0}^{Q-1} y_j \cdot 2^{-j} \tag{2}$$

In Fig. 1, the Q P-bit words are shifted and added in parallel by unrolling all computations. Furthermore, the operation in Fig. 1 can be divided into two parts: the main part (MP), which includes the P most significant bits (MSBs), and the truncation part (TP), which has the Q least significant bits (LSBs). Then, the shifting and addition output can be expressed as follows:

$$Y = MP + TP \cdot 2^{-(P-2)} \tag{3}$$
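The split in (3) can be checked numerically with a small sketch. It assumes that each word $y_j$ is a signed P-bit fraction and that word j loses its j+1 lowest bits to the TP. That assumption is consistent with the $2^{-(P-2)}$ weight in (3), but the text does not state it explicitly. The function names and the sample words are hypothetical.

```python
def split_main_and_truncation(words, p):
    """Split the aligned sum of shifted words into (MP, TP).

    words[j] is y_j as a signed P-bit integer, so y_j = words[j] / 2**(p - 1).
    Assumption: word j loses its j + 1 lowest bits to the truncation part.
    MP is returned as a real value; TP is in units of the MP's LSB.
    """
    lsb = 2.0 ** -(p - 2)  # weight of the MP's least significant bit
    mp_units = 0
    tp = 0.0
    for j, w in enumerate(words):
        if not -(1 << (p - 1)) <= w < (1 << (p - 1)):
            raise ValueError("word does not fit in P bits")
        cut = j + 1
        mp_units += w >> cut                       # arithmetic shift keeps the sign
        tp += (w & ((1 << cut) - 1)) / (1 << cut)  # bits that fall below the MP
    return mp_units * lsb, tp


def exact_sum(words, p):
    """Reference value of (2) without any truncation."""
    return sum(w / 2 ** (p - 1) * 2.0 ** -j for j, w in enumerate(words))


# Illustrative 8-bit words (not from the paper).
words = [37, -50, 99]
mp, tp = split_main_and_truncation(words, p=8)
print(mp, tp, mp + tp * 2.0 ** -(8 - 2), exact_sum(words, p=8))
```

Dropping `tp` entirely corresponds to direct truncation; the ECAT instead replaces it with an estimated bias.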

The proposed ECAT is explained as follows.

Proposed Error-Compensated Scheme

Fig. 2. Proposed ECAT architecture of shifting and addition operators for the (P, Q) = (12, 6) example.

From Fig. 1, (3) can be approximated as

$$Y \approx MP + \sigma \cdot 2^{-(P-2)} \tag{4}$$

where $\sigma$ is the compensated bias from the TP to the MP, given by

$$\sigma = \mathrm{Round}(TP_{major} + TP_{minor}) \tag{5}$$

where Round() rounds to the nearest integer. $TP_{major}$ carries more weight than $TP_{minor}$ in its contribution to $\sigma$. Therefore, the compensated bias can be calculated by obtaining $TP_{major}$ and estimating $TP_{minor}$.
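A hedged sketch of the bias computation for a single bit pattern follows. It assumes that $TP_{major}$ is the most significant column of the TP, computed exactly, and that $TP_{minor}$ is replaced by its statistical mean under equiprobable bits. Both choices are assumptions made for illustration; the text only says that $TP_{major}$ is obtained and $TP_{minor}$ is estimated. The names are hypothetical.

```python
import math


def tp_value(tp_words):
    """Exact TP in MP-LSB units; bit k of each word has weight 2**-(k + 1)."""
    return sum(bit * 2.0 ** -(k + 1) for bits in tp_words for k, bit in enumerate(bits))


def compensated_bias(tp_words):
    """Round(TP_major + estimated TP_minor), rounding halves upward."""
    # TP_major: the first (weight 1/2) truncated bit of every word, taken exactly.
    tp_major = 0.5 * sum(bits[0] for bits in tp_words)
    # TP_minor: every remaining bit is replaced by its mean value of 1/2 (assumption).
    tp_minor_estimate = sum(
        0.5 * 2.0 ** -(k + 1) for bits in tp_words for k in range(1, len(bits))
    )
    return math.floor(tp_major + tp_minor_estimate + 0.5)


# Q = 3 example: word j contributes j truncated bits, most significant first.
pattern = [(1,), (0, 1), (1, 1, 0)]
sigma = compensated_bias(pattern)
print(sigma, tp_value(pattern) - sigma)
```

For this pattern the exact TP is 1.5 and the bias is 1, so the residual error is 0.5 of an MP LSB instead of the 1.5 that direct truncation would leave.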

Performance Simulation for an Error- Compensated Circuit

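The $\varepsilon$, $\varepsilon_{max}$, and $\varepsilon_{mse}$ are defined as follows, with $\sigma$ the compensated bias of (5):

$$\varepsilon = \mathrm{Avg}\{|TP - \sigma|\} \tag{6}$$

$$\varepsilon_{max} = \max\{|TP - \sigma|\} \tag{7}$$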

TABLE I

COMPARISONS OF ABSOLUTE AVERAGE ERROR $\varepsilon$, MAXIMUM ABSOLUTE ERROR $\varepsilon_{max}$, AND MEAN SQUARE ERROR $\varepsilon_{mse}$

Error	(P,Q)	(12,3)	(12,6)	(12,9)	(12,12)
		Case 1	Case 3	Case 2	Case 2
$\varepsilon$	Direct-T	1.0625	2.5078	4.0010	5.5001
$\varepsilon$	Proposed	0.2656	0.3789	0.3804	0.4738
$\varepsilon$	Post-T	0.2500	0.2500		0.2500
$\varepsilon_{max}$	Direct-T	2.1250	5.0156	8.0020	11.000
$\varepsilon_{max}$	Proposed	0.6250	1.5000	2.0020	3.0000
$\varepsilon_{max}$	Post-T	0.5000	0.5000	0.5000	0.5000
$\varepsilon_{mse}$	Direct-T	1.3516	6.7614	16.730	31.224
$\varepsilon_{mse}$	Proposed	0.1016	0.2184	0.2222	0.3472
$\varepsilon_{mse}$	Post-T	0.0859	0.0834	0.0833	0.0833
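The three error measures can be evaluated by exhaustive enumeration for a small case. The sketch below does this for Q = 3 under stated assumptions: word j (j = 1..Q) contributes j independent, equiprobable bits to the TP, Direct-T discards the TP, Post-T rounds the exact TP, and the proposed scheme rounds the exact most significant TP column plus the mean of the remaining bits. These modelling choices and all names are assumptions, not the paper's circuit description.

```python
from itertools import product
import math


def tp_patterns(q):
    """Yield every TP bit pattern; word j (1..q) contributes j bits, MSB first."""
    sizes = range(1, q + 1)
    for flat in product((0, 1), repeat=sum(sizes)):
        words, start = [], 0
        for size in sizes:
            words.append(flat[start:start + size])
            start += size
        yield words


def tp_value(words):
    """Exact TP in MP-LSB units."""
    return sum(bit * 2.0 ** -(k + 1) for bits in words for k, bit in enumerate(bits))


def round_half_up(x):
    return math.floor(x + 0.5)


def direct_t(words):
    return 0  # truncation with no compensation


def post_t(words):
    return round_half_up(tp_value(words))  # round after the full-width sum


def proposed(words):
    tp_major = 0.5 * sum(bits[0] for bits in words)
    tp_minor_mean = sum(0.5 * 2.0 ** -(k + 1) for bits in words for k in range(1, len(bits)))
    return round_half_up(tp_major + tp_minor_mean)


def error_metrics(q, bias):
    """Return (average |error|, maximum |error|, mean square error) of TP - bias."""
    errors = [tp_value(w) - bias(w) for w in tp_patterns(q)]
    n = len(errors)
    return (
        sum(abs(e) for e in errors) / n,
        max(abs(e) for e in errors),
        sum(e * e for e in errors) / n,
    )


for name, bias in (("Direct-T", direct_t), ("Proposed", proposed), ("Post-T", post_t)):
    avg, worst, mse = error_metrics(3, bias)
    print(f"{name}: {avg:.4f} {worst:.4f} {mse:.4f}")
```

Under these assumptions the enumeration gives 1.0625, 2.1250, 1.3516 for Direct-T, 0.2656, 0.6250, 0.1016 for the proposed scheme, and 0.2500, 0.5000, 0.0859 for Post-T, which agree with the (12,3) column of Table I. The larger cases in the table may use a different estimate of the minor part, so this sketch is only checked against Q = 3.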

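TABLE II

COMPARISONS OF THE PROPOSED ECAT WITH OTHER ARCHITECTURES FOR A SIX 8-BIT WORDS EXAMPLE

	Shift-and-add	SAT	Proposed ECAT
Area (gates)	236	406	463
Delay (ns)	10.8	3.72	3.89
Area×delay	100%	59.3%	70.7%
$\varepsilon_{mse}$	0.326	6.761	0.218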

$$\varepsilon_{mse} = \mathrm{Avg}\{(TP - \sigma)^2\} \tag{8}$$

where Avg is the average operator.

Proposed ECAT Architecture

The proposed ECAT has the highest accuracy with a moderate area-delay product. The shift-and-add [2] method has the smallest area, but its overall computation time of 10.8 (= 1.8 × 6) ns is the longest. The SAT [10], which truncates the TP and computes in parallel, takes 3.72 ns to complete the computation and uses 406 gates, giving the best area-delay product. However, as Table II shows, the SAT is the worst option for system accuracy. Therefore, the ECAT is suitable for high-speed and low-error applications.
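The normalized area-delay figures in Table II follow directly from the area and delay rows. A short sketch of the arithmetic, taking the shift-and-add design as the 100% baseline (variable and function names are hypothetical):

```python
# (area in gates, delay in ns) as listed in Table II
designs = {
    "Shift-and-add": (236, 10.8),
    "SAT": (406, 3.72),
    "Proposed ECAT": (463, 3.89),
}


def normalized_area_delay(table, baseline):
    """Return each design's area x delay as a percentage of the baseline design."""
    base_area, base_delay = table[baseline]
    reference = base_area * base_delay
    return {
        name: round(100.0 * area * delay / reference, 1)
        for name, (area, delay) in table.items()
    }


percentages = normalized_area_delay(designs, "Shift-and-add")
print(percentages)
```

The computed values round to 100.0, 59.3, and 70.7, matching the Area×delay row.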

PROPOSED 8 × 8 2-D DCT CORE DESIGN

The 1-D DCT employs the DA-based architecture and the proposed ECAT to achieve a high-speed, small-area, and low-error design. The 1-D 8-point DCT can be expressed as follows:

$$Z_n = \frac{1}{2} k_n \sum_{m=0}^{7} x_m \cos\left(\frac{(2m+1) n \pi}{16}\right) \tag{9}$$

where $x_m$ denotes the input data, $Z_n$ denotes the transform output, and $k_n = 1/\sqrt{2}$ for $n = 0$ and $k_n = 1$ otherwise. The proposed 2-D DCT is designed using two 1-D DCT cores and one transpose buffer. For accuracy, the DA-precision and transpose buffer word lengths are chosen to be 9 bits and 12 bits, respectively, so that the system meets the PSNR requirements outlined in previous works. Moreover, the 2-D DCT core accepts 9-bit image input and produces 12-bit output.
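The row-column structure described above (a 1-D pass, a transpose, and a second 1-D pass) can be sketched in floating point as follows. This is a reference model only: it assumes the standard scaling $k_0 = 1/\sqrt{2}$ and $k_n = 1$ otherwise, and it does not model the 9-bit DA precision, the 12-bit transpose buffer, or the ECAT. Function names are hypothetical.

```python
import math


def dct_1d(x):
    """8-point 1-D DCT following (9)."""
    if len(x) != 8:
        raise ValueError("expected 8 samples")
    out = []
    for n in range(8):
        k_n = 1.0 / math.sqrt(2.0) if n == 0 else 1.0
        acc = sum(x[m] * math.cos((2 * m + 1) * n * math.pi / 16.0) for m in range(8))
        out.append(0.5 * k_n * acc)
    return out


def transpose(block):
    return [list(col) for col in zip(*block)]


def dct_2d(block):
    """8 x 8 2-D DCT as two 1-D passes separated by a transpose."""
    first_pass = [dct_1d(row) for row in block]                # first 1-D core, row-wise
    second_pass = [dct_1d(row) for row in transpose(first_pass)]  # transpose buffer, second core
    return transpose(second_pass)


# A flat block should produce a single DC coefficient.
flat_block = [[1.0] * 8 for _ in range(8)]
coeffs = dct_2d(flat_block)
print(round(coeffs[0][0], 6))
```

A flat block of ones maps to a DC coefficient of 8 with all other coefficients at zero up to rounding error, which is a quick sanity check on the scaling in (9).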
DISCUSSION AND COMPARISONS

Table III compares the proposed 8 × 8 2-D DCT core with previous 2-D DCT cores. In [3] and [4], the SAT and AT architectures for DA-based DCTs improve the throughput rate of the NEDA method. However, the DA-precision must be chosen as 13 bits to meet the system accuracy, at the cost of more area overhead. The proposed DCT core uses the low-error ECAT to achieve a high-speed design, and the DA-precision can be chosen as 9 bits to meet the PSNR requirements, which reduces hardware cost. The proposed DCT core has the highest hardware efficiency, defined as follows (based on the accuracy required by the presented standards):

$$\text{Hardware Efficiency } (10^{3}\ \text{pels/s per gate}) = \frac{\text{Throughput Rate}}{\text{Gate Counts}} \tag{10}$$
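As a quick check of (10), the throughput of 1 Gpels/s and the gate count of 22.2 K quoted in the abstract can be plugged in directly. The function name is hypothetical.

```python
def hardware_efficiency(throughput_pels_per_s, gate_count):
    """Hardware efficiency in units of 10^3 pels/s per gate, following (10)."""
    return throughput_pels_per_s / gate_count / 1e3


# Figures quoted in the abstract: 1 Gpels/s with 22.2 K gates.
efficiency = hardware_efficiency(1e9, 22.2e3)
print(round(efficiency, 2))
```

The result is about 45, in line with the hardware efficiency listed for the proposed core.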

TABLE III

COMPARISONS OF DIFFERENT 2-D DCT ARCHITECTURES WITH THE PROPOSED ARCHITECTURE

	Shams et al. [2]	Huang et al. [4]	Proposed
Architecture	NEDA	DA-based	DA-based
Technology	0.18 µm	0.18 µm	0.18 µm
Multipliers/ROMs	0/0	0/0	0/0
Adders	92	50 + 16 AT	46 + 16 ECAT
DA-precision	12 bits	13 bits	9 bits
Throughput Rate (pels/s)	77 M	400 M	1 G
Hardware Efficiency	3.42	10.05	45

77 MHz = 1 GHz/13

Fig. 3. Architecture of the proposed 1-D 8-point DCT

Fig. 4. Core layout and characteristics

CONCLUSION

In this brief, a high-speed and low-error 8 × 8 2-D DCT design with ECAT is proposed that improves the throughput rate significantly, by up to about 13 times at high compression rates, by operating the shifting and addition in parallel. Furthermore, the proposed error-compensated circuit alleviates the truncation error in the ECAT. In this way, the DA-precision can be chosen as 9 bits instead of 12 bits while still meeting the PSNR requirements. Thus, the proposed DCT core has higher hardware efficiency than those in previous works for the same PSNR requirements. Finally, an area-efficient 2-D DCT core is implemented using a TSMC 0.18-µm process, and the maximum throughput rate is 1 Gpels/s. In summary, the proposed architecture is suitable for high compression rate applications in VLSI designs.

ACKNOWLEDGEMENT

P. Chinababu would like to thank Mr. P. Mahesh Kumar, Assistant Professor in the ECE department, who guided the project throughout, supported it with technical ideas about the paper, and provided the motivation to complete the work effectively and successfully.

REFERENCES

[1] Y. Wang, J. Ostermann, and Y. Zhang, Video Processing and Communications, 1st ed. Englewood Cliffs, NJ: Prentice-Hall, 2002.
[2] A. M. Shams, A. Chidanandan, W. Pan, and M. A. Bayoumi, "NEDA: A low-power high-performance DCT architecture," IEEE Trans. Signal Process., vol. 54, no. 3, pp. 955-964, Mar. 2006.
[3] C. Peng, X. Cao, D. Yu, and X. Zhang, "A 250 MHz optimized distributed architecture of 2D 8×8 DCT," in Proc. Int. Conf. ASIC, 2007, pp. 189-192.
[4] C. Y. Huang, L. F. Chen, and Y. K. Lai, "A high-speed 2-D transform architecture with unique kernel for multi-standard video applications," in Proc. IEEE Int. Symp. Circuits Syst., 2008, pp. 21-24.
[5] S. S. Kidambi, F. El-Guibaly, and A. Antoniou, "Area-efficient multipliers for digital signal processing applications," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 43, no. 2, pp. 90-95, Feb. 1996.
[6] K. J. Cho, K. C. Lee, J. G. Chung, and K. K. Parhi, "Design of low-error fixed-width modified Booth multiplier," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 12, no. 5, pp. 522-531, May 2004.
[7] L. D. Van and C. C. Yang, "Generalized low-error area-efficient fixed-width multipliers," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 52, no. 8, pp. 1608-1619, Aug. 2005.
