- Open Access
- Total Downloads : 15
- Authors : Chinababu Panduru, P. Mahesh Kumar
- Paper ID : IJERTCONV1IS06150
- Volume & Issue : ICSEM – 2013 (Volume 1 – Issue 06)
- Published (First Online): 30-07-2018
- ISSN (Online) : 2278-0181
- Publisher Name : IJERT
- License: This work is licensed under a Creative Commons Attribution 4.0 International License
DA-Based DCT with Error-Compensated Adder Tree
CHINABABU PANDURU,
II M.Tech, CREC, Tirupati, INDIA panduruchinababu@gmail.com
P. MAHESH KUMAR,
Assistant Professor, ECE, CREC, Tirupati, INDIA maheshpenubaku@gmail.com
Abstract- In this paper, an error-compensated adder-tree (ECAT) that operates the shifting and addition in parallel is proposed to deal with the truncation errors and to achieve a low-error, high-throughput discrete cosine transform (DCT) design. Instead of the 12 bits used in previous works, a 9-bit distributed arithmetic (DA) precision is chosen for this work so as to meet peak-signal-to-noise-ratio (PSNR) requirements. Thus, an area-efficient DCT core is implemented to achieve a 1 Gpels/s throughput rate with a gate count of 22.2 K for the PSNR requirements outlined in the previous works.
Index Terms- Distributed arithmetic (DA)-based, error-compensated adder-tree (ECAT), 2-D discrete cosine transform (DCT).
-
INTRODUCTION
The discrete cosine transform (DCT) is a widely used tool in image and video compression applications [1]. Recently, high-throughput DCT designs have been adopted to fit the requirements of real-time applications. The high-throughput shift-adder-tree (SAT) and adder-tree (AT), which unroll the shifting and addition of words in parallel for DA-based computation, were introduced in [3] and [4], respectively. However, these designs suffer from a large truncation error. To reduce the truncation error, several error-compensation bias methods have been presented [5]–[7], based on statistical analysis of the relationship between the partial products and the multiplier-multiplicand. However, the elements of the truncation part outlined in this work are independent, so the previously described compensation methods cannot be applied. This brief addresses a DA-based DCT core with an error-compensated adder-tree (ECAT). The proposed ECAT operates shifting and addition in parallel by unrolling all the words required to be computed. Furthermore, the error-compensated circuit alleviates the truncation error for a high-accuracy design.
-
MATHEMATICAL DERIVATION OF DISTRIBUTED ARITHMETIC
The inner product is an important tool in digital signal processing applications. It can be written as follows:
$$Y = A^{T} X = \sum_{i=1}^{L} A_i X_i \tag{1}$$
where $A_i$, $X_i$, and $L$ are the ith fixed coefficient, the ith input data, and the number of inputs, respectively. Assume that coefficient $A_i$ is a Q-bit two's-complement binary fraction number. The inner product computation in (1) can then be implemented by using shifters and adders instead of multipliers. Therefore, low hardware cost can be achieved by using a DA-based architecture.
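The replacement of multipliers by shifts and additions can be illustrated with a minimal Python sketch. It assumes the standard two's-complement expansion of each coefficient (sign bit of weight -1, bit j of weight 2^-j); the function names `coeff_bits` and `da_inner_product` are hypothetical and exact rational arithmetic stands in for hardware word lengths.

```python
from fractions import Fraction

def coeff_bits(a, q):
    """Return the Q two's-complement bits [b0, ..., b_{Q-1}] of a fraction in [-1, 1).

    b0 is the sign bit (weight -1); bit j (j >= 1) has weight 2**-j.
    """
    scaled = a * 2 ** (q - 1)
    if scaled.denominator != 1 or not -(2 ** (q - 1)) <= scaled < 2 ** (q - 1):
        raise ValueError("coefficient is not representable with q bits")
    word = int(scaled) % (2 ** q)  # two's-complement encoding as an unsigned word
    return [(word >> (q - 1 - j)) & 1 for j in range(q)]

def da_inner_product(coeffs, inputs, q):
    """Compute sum(A_i * X_i) bit plane by bit plane, without multiplying by A_i."""
    bits = [coeff_bits(a, q) for a in coeffs]
    total = Fraction(0)
    for j in range(q):
        # Add up the inputs whose coefficient has bit j set.
        y_j = sum(x for b, x in zip(bits, inputs) if b[j])
        if j == 0:
            y_j = -y_j  # the sign-bit plane carries weight -1
        total += Fraction(y_j, 2 ** j)  # a right shift by j positions in hardware
    return total
```

For example, with coefficients 3/8, -5/8, 1/4, inputs 7, -2, 5, and Q = 4, `da_inner_product` returns 41/8, the same value as the direct sum of products.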
Fig. 1. Q P-bit words shifting and addition operations in parallel.
-
ECAT ARCHITECTURE
The shifting and addition computation can be written as follows:
$$Y = \sum_{j=0}^{Q-1} y_j \cdot 2^{-j} \tag{2}$$
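The words $y_j$ are not defined in the text. One standard way to obtain them, assuming the usual two's-complement expansion of the Q-bit coefficients with bits $A_{i,j} \in \{0, 1\}$ (notation introduced here for illustration), is:

```latex
\begin{align}
A_i &= -A_{i,0} + \sum_{j=1}^{Q-1} A_{i,j}\, 2^{-j}, \\
Y &= \sum_{i=1}^{L} A_i X_i
   = -\sum_{i=1}^{L} A_{i,0} X_i + \sum_{j=1}^{Q-1} \left( \sum_{i=1}^{L} A_{i,j} X_i \right) 2^{-j}
   = \sum_{j=0}^{Q-1} y_j\, 2^{-j}, \\
y_0 &= -\sum_{i=1}^{L} A_{i,0} X_i, \qquad
y_j = \sum_{i=1}^{L} A_{i,j} X_i \quad (1 \le j \le Q-1).
\end{align}
```

Under this reading, each $y_j$ is a sum of selected inputs, so the whole inner product needs only additions and shifts.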
In Fig. 1, the Q P-bit words are shifted and added in parallel by unrolling all computations. Furthermore, the operation in Fig. 1 can be divided into two parts: the main part (MP), which includes the P most significant bits (MSBs), and the truncation part (TP), which has the Q least significant bits (LSBs). Then, the shifting and addition output can be expressed as follows:
$$Y = MP + TP \cdot 2^{-(P-2)} \tag{3}$$
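The relationship between serial shift-and-add, the unrolled (parallel) form, and the MP/TP split can be sketched in Python. This is an illustration only: it works on integers scaled by 2^(Q-1) rather than on the fractional scaling of (3), and the function names and the parameter `t` (number of truncated LSBs) are assumptions, not taken from the paper.

```python
def shift_add_serial(words):
    """Horner-style shift-and-add: one word is consumed per step (Q steps)."""
    acc = 0
    for y in words:
        acc = 2 * acc + y
    return acc  # equals sum(y_j * 2**(Q-1-j))

def shift_add_unrolled(words):
    """All Q shifted words are formed at once and summed, as an adder tree would."""
    q = len(words)
    return sum(y << (q - 1 - j) for j, y in enumerate(words))

def split_main_truncation(total, t):
    """Split a sum into a main part and the t least significant bits."""
    mp = total >> t          # arithmetic shift, i.e. floor division by 2**t
    tp = total - (mp << t)   # always in the range 0 .. 2**t - 1
    return mp, tp
```

With `words = [-5, 3, 7, -2, 0, 6]`, both functions return -58, and `split_main_truncation(-58, 3)` gives `(-8, 6)`, so that -8 · 2^3 + 6 = -58. Dropping `tp` without compensation is what introduces the truncation error discussed in the text.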
The proposed ECAT is explained as follows.
-
Proposed Error-Compensated Scheme
Fig. 2. Proposed ECAT architecture of shifting and addition operators for the (P,Q )=(12,6) example.
From Fig. 1, (3) can be approximated as
$$Y \approx MP + \sigma \cdot 2^{-(P-2)} \tag{4}$$
where $\sigma$ is the compensated bias from the TP to the MP:
$$\sigma = \mathrm{Round}(TP_{major} + TP_{minor}) \tag{5}$$
where Round() rounds to the nearest integer. The $TP_{major}$ has more weight than $TP_{minor}$ when contributing towards $\sigma$. Therefore, the compensated bias $\sigma$ can be calculated by obtaining $TP_{major}$ and estimating $TP_{minor}$.
-
Performance Simulation for an Error-Compensated Circuit
The errors $\varepsilon$, $\varepsilon_{max}$, and $\varepsilon_{mse}$ are defined as follows:
$$\varepsilon = \mathrm{Avg}\{|TP - \sigma|\} \tag{6}$$
$$\varepsilon_{max} = \max\{|TP - \sigma|\} \tag{7}$$
$$\varepsilon_{mse} = \mathrm{Avg}\{(TP - \sigma)^2\} \tag{8}$$
where Avg is the average operator.
TABLE I
COMPARISONS OF ABSOLUTE AVERAGE ERROR $\varepsilon$, MAXIMUM ABSOLUTE ERROR $\varepsilon_{max}$, AND MEAN SQUARE ERROR $\varepsilon_{mse}$
| Error | (P,Q) | (12,3) Case 1 | (12,6) Case 3 | (12,9) Case 2 | (12,12) Case 2 |
|---|---|---|---|---|---|
| $\varepsilon$ | Direct-T | 1.0625 | 2.5078 | 4.0010 | 5.5001 |
| $\varepsilon$ | Proposed | 0.2656 | 0.3789 | 0.3804 | 0.4738 |
| $\varepsilon$ | Post-T | 0.2500 | 0.2500 | 0.2500 | 0.2500 |
| $\varepsilon_{max}$ | Direct-T | 2.1250 | 5.0156 | 8.0020 | 11.000 |
| $\varepsilon_{max}$ | Proposed | 0.6250 | 1.5000 | 2.0020 | 3.0000 |
| $\varepsilon_{max}$ | Post-T | 0.5000 | 0.5000 | 0.5000 | 0.5000 |
| $\varepsilon_{mse}$ | Direct-T | 1.3516 | 6.7614 | 16.730 | 31.224 |
| $\varepsilon_{mse}$ | Proposed | 0.1016 | 0.2184 | 0.2222 | 0.3472 |
| $\varepsilon_{mse}$ | Post-T | 0.0859 | 0.0834 | 0.0833 | 0.0833 |
TABLE II
COMPARISONS OF THE PROPOSED ECAT WITH OTHER ARCHITECTURES FOR A SIX 8-BIT WORDS EXAMPLE
| | Shift-and-add | SAT | Proposed ECAT |
|---|---|---|---|
| Area (gates) | 236 | 406 | 463 |
| Delay (ns) | 10.8 | 3.72 | 3.89 |
| Area×delay | 100% | 59.3% | 70.7% |
| $\varepsilon_{mse}$ | 0.326 | 6.761 | 0.218 |
-
Proposed ECAT Architecture
The proposed ECAT has the highest accuracy with a moderate area-delay product. The shift-and-add [2] method has the smallest area, but the overall computation time is equal to 10.8 (= 1.8 × 6) ns, which is the longest. Similarly, the SAT [10], which truncates the TP and computes in parallel, takes 3.72 ns to complete the computation and uses 406 gates, which is the best area-delay product performance. However, for system accuracy, the SAT is the worst option, as shown in Table II. Therefore, the ECAT is suitable for high-speed and low-error applications.
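The error measures in (6)-(8) can be explored numerically with a small exhaustive simulation. The sketch below rests on an assumed model that is not spelled out in the text: word j (j = 1..q) loses j LSBs, all bits are uniformly distributed, and TP is the sum of the lost fractions in units of the MP least significant bit. Direct truncation uses a zero bias, post-truncation rounds the exact TP, and `major_plus_mean` is a simplified estimator (top truncated bit of each word plus the mean of the remaining bits), not the circuit of the paper. All names are hypothetical.

```python
import itertools
import math

def round_half_up(x):
    """Round to the nearest integer, with ties going up."""
    return math.floor(x + 0.5)

def truncation_errors(q):
    """Return (eps, eps_max, eps_mse) for three bias choices, by exhaustive search."""
    widths = range(1, q + 1)  # number of truncated bits of word j
    # Mean of the bits below the top truncated bit of each word (uniform bits).
    minor_mean = sum((1 - 2.0 ** -(j - 1)) / 4 for j in widths)
    errors = {"direct": [], "major_plus_mean": [], "post": []}
    for fields in itertools.product(*(range(2 ** j) for j in widths)):
        tp = sum(f / 2 ** j for f, j in zip(fields, widths))
        tp_major = sum((f >> (j - 1)) / 2 for f, j in zip(fields, widths))
        sigma = {
            "direct": 0,                                          # no compensation
            "major_plus_mean": round_half_up(tp_major + minor_mean),
            "post": round_half_up(tp),                            # needs the full TP
        }
        for name in errors:
            errors[name].append(tp - sigma[name])
    return {
        name: (
            sum(abs(e) for e in errs) / len(errs),   # eps, as in (6)
            max(abs(e) for e in errs),               # eps_max, as in (7)
            sum(e * e for e in errs) / len(errs),    # eps_mse, as in (8)
        )
        for name, errs in errors.items()
    }
```

Under this model, `truncation_errors(3)` gives (1.0625, 2.125, 1.3515625) for direct truncation, (0.265625, 0.625, 0.1015625) for the simplified estimator, and (0.25, 0.5, 0.0859375) for post-truncation, which agree to four decimals with the (12,3) column of Table I. The other columns use cases that are not described in this text, so no agreement is claimed for them. The run time grows as 2^(q(q+1)/2), so larger q is better handled by random sampling.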
-
-
PROPOSED 8 × 8 2-D DCT CORE DESIGN
The 1-D DCT employs the DA-based architecture and the proposed ECAT to achieve a high-speed, small-area, and low-error design. The 1-D 8-point DCT can be expressed as follows:
$$Z_n = \frac{1}{2} k_n \sum_{m=0}^{7} x_m \cos\left(\frac{(2m+1) n \pi}{16}\right) \tag{9}$$
where $x_m$ denotes the input data, $Z_n$ denotes the transform output, and $k_n$ is the usual DCT scale factor ($k_n = 1/\sqrt{2}$ for $n = 0$ and $k_n = 1$ otherwise). The proposed 2-D DCT is designed using two 1-D DCT cores and one transpose buffer. For accuracy, the DA-precision and transpose buffer word lengths are chosen to be 9 bits and 12 bits, respectively, so that the system can meet the PSNR requirements outlined in previous works. Moreover, the 2-D DCT core accepts 9-bit image input and produces 12-bit output.
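As a floating-point reference for (9) and for the row-column structure (two 1-D passes around a transpose), the following Python sketch may help. It assumes the standard scale factor k_n = 1/sqrt(2) for n = 0 and 1 otherwise, and it does not model the 9-bit DA precision, the 12-bit transpose buffer, or the ECAT itself; the function names are hypothetical.

```python
import math

def dct_1d(x):
    """8-point DCT: Z_n = (1/2) k_n sum_m x_m cos((2m+1) n pi / 16)."""
    if len(x) != 8:
        raise ValueError("expected 8 samples")
    out = []
    for n in range(8):
        k_n = 1 / math.sqrt(2) if n == 0 else 1.0
        acc = sum(x[m] * math.cos((2 * m + 1) * n * math.pi / 16) for m in range(8))
        out.append(0.5 * k_n * acc)
    return out

def transpose(block):
    """Swap rows and columns of a list-of-lists block."""
    return [list(col) for col in zip(*block)]

def dct_2d(block):
    """Row-column 2-D DCT: 1-D DCT on rows, transpose, 1-D DCT again, transpose back."""
    first_pass = [dct_1d(row) for row in block]
    second_pass = [dct_1d(row) for row in transpose(first_pass)]
    return transpose(second_pass)
```

For a constant 8 × 8 block of value 10, `dct_2d` returns a DC coefficient of 80 (up to rounding) and zeros elsewhere, which is a convenient sanity check for a fixed-point implementation.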
-
DISCUSSION AND COMPARISONS
Table III compares the proposed 8 × 8 2-D DCT core with previous 2-D DCT cores. In [3] and [4], the SAT and AT architectures for DA-based DCTs improve the throughput rate of the NEDA method. However, the DA-precision must be chosen as 13 bits to meet the system accuracy, which incurs more area overhead. The proposed DCT core uses the low-error ECAT to achieve a high-speed design, and the DA-precision can be chosen as 9 bits to meet the PSNR requirements, reducing hardware costs. The proposed DCT core has the highest hardware efficiency, defined as follows (based on the accuracy required by the presented standards):
$$\text{Hardware Efficiency}\ (10^{3}\ \text{pels/s}) = \frac{\text{Throughput Rate}}{\text{Gate Counts}} \tag{10}$$
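As a quick arithmetic check of (10), the figures quoted in the abstract for the proposed core (1 Gpels/s and 22.2 K gates) can be plugged in. The helper name below is hypothetical, and the result is read as thousands of pels/s per gate.

```python
def hardware_efficiency(throughput_pels_per_s, gate_count):
    """Throughput divided by gate count, expressed in units of 10^3 pels/s."""
    return throughput_pels_per_s / gate_count / 1e3

# Proposed core: 1 Gpels/s throughput with 22.2 K gates.
proposed = hardware_efficiency(1e9, 22.2e3)
```

Here `proposed` evaluates to about 45.05, consistent with the rounded value of 45 listed for the proposed design.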
TABLE III
COMPARISONS OF DIFFERENT 2-D DCT ARCHITECTURES WITH THE PROPOSED ARCHITECTURE
| | Shams et al. [2] | Huang et al. [4] | Proposed |
|---|---|---|---|
| Architecture | NEDA | DA-based | DA-based |
| Technology | 0.18 µm | 0.18 µm | 0.18 µm |
| Multipliers/ROMs | 0/0 | 0/0 | 0/0 |
| Adders | 92 | 50 + 16 AT | 46 + 16 ECAT |
| DA-precision | 12 bits | 13 bits | 9 bits |
| Throughput rate (pels/s) | 77 M | 400 M | 1 G |
| Hardware efficiency | 3.42 | 10.05 | 45 |
Note: 77 MHz = 1 GHz/13.
Fig. 3. Architecture of the proposed 1-D 8-point DCT
Fig. 4. Core layout and characteristics
-
CONCLUSION
In this brief, a high-speed and low-error 8 × 8 2-D DCT design with ECAT is proposed. By operating the shifting and addition in parallel, it improves the throughput rate by up to about 13 times at high compression rates. Furthermore, the proposed error-compensated circuit alleviates the truncation error in the ECAT. In this way, the DA-precision can be chosen as 9 bits instead of 12 bits so as to meet the PSNR requirements. Thus, the proposed DCT core has a higher hardware efficiency than those in previous works for the same PSNR requirements. Finally, an area-efficient 2-D DCT core is implemented using a TSMC 0.18-µm process, and the maximum throughput rate is 1 Gpels/s. In summary, the proposed architecture is suitable for high compression rate applications in VLSI designs.
ACKNOWLEDGEMENT
P. Chinababu would like to thank Mr. P. Mahesh Kumar, Assistant Professor in the ECE department, who guided the project throughout, contributed technical ideas for the paper, and provided the motivation to complete the work effectively and successfully.
REFERENCES
[1] Y. Wang, J. Ostermann, and Y. Zhang, Video Processing and Communications, 1st ed. Englewood Cliffs, NJ: Prentice-Hall, 2002.
[2] A. M. Shams, A. Chidanandan, W. Pan, and M. A. Bayoumi, "NEDA: A low-power high-performance DCT architecture," IEEE Trans. Signal Process., vol. 54, no. 3, pp. 955–964, Mar. 2006.
[3] C. Peng, X. Cao, D. Yu, and X. Zhang, "A 250 MHz optimized distributed architecture of 2D 8×8 DCT," in Proc. Int. Conf. ASIC, 2007, pp. 189–192.
[4] C. Y. Huang, L. F. Chen, and Y. K. Lai, "A high-speed 2-D transform architecture with unique kernel for multi-standard video applications," in Proc. IEEE Int. Symp. Circuits Syst., 2008, pp. 21–24.
[5] S. S. Kidambi, F. E. Guibaly, and A. Antoniou, "Area-efficient multipliers for digital signal processing applications," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 43, no. 2, pp. 90–95, Feb. 1996.
[6] K. J. Cho, K. C. Lee, J. G. Chung, and K. K. Parhi, "Design of low-error fixed-width modified Booth multiplier," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 12, no. 5, pp. 522–531, May 2004.
[7] L. D. Van and C. C. Yang, "Generalized low-error area-efficient fixed-width multipliers," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 52, no. 8, pp. 1608–1619, Aug. 2005.