- Open Access
- Total Downloads : 415
- Authors : B. Doss, K. Soundararajan, Y. Narasimha Murthy
- Paper ID : IJERTV4IS010763
- Volume & Issue : Volume 04, Issue 01 (January 2015)
- Published (First Online): 29-01-2015
- ISSN (Online) : 2278-0181
- Publisher Name : IJERT
- License: This work is licensed under a Creative Commons Attribution 4.0 International License
Implementation of Adaptive FIR Filter with Area Optimization Based on DA and OBC
B. Doss
Department Of ECE,JNTUA CEA Anantapur, India
K. Soundararajan
Department of ECE, KITES Engg, College
Hyderabad, India
-
Narasimha Murthy
Reader, SSBN Degree & PG College
Anantapur, India
Abstract In this paper an efficient pipelined architecture of an adaptive FIR filter based on distributed arithmetic (DA) is discussed. On comparing with existing DA based adaptive filter implementations the proposed architecture has achieved low- complexity. For achieving this advantage Offset Binary Coding (OBC) has been used in this filter design. By doing so, the number of LUTs used has been decreased from 16 to 8. In addition to this, DA-based inner product computations have been done by using conditional signed carry-save accumulation instead of traditional adder based shift accumulation. Unlike existing DA-based designs the proposed design involves the same number of multiplexers but a smaller LUT and nearly half of the number of adders.
Keywords Adaptive FIR filter, Distributed Arithmetic, carry- save accumulation, Adaptive LMS algorithm, offset Binary Coding.
-
INTRODUCTION
Now-a-days in numerous DSP applications adaptive filters are the most important and useful blocks. Because of its simplicity and satisfactory convergence performance, the tapped-delay line finite impulse response (FIR) is the most popularly used adaptive filter. The weight updating in this filter follows Widrow-Hoff least mean square (LMS) algorithm [1]. A long critical path on the forward path of FIR filter design will be resulted in the direct form realization. It is because of the computation of an inner product to obtain a filter output. So it is the case while dealing with input signals of high sampling rate, the reduction in the critical path of the structure is necessary.
In the current scenario some distributed arithmetic techniques such as multiplier less algorithms [2] have gained much popularity. This structure has more advantages such as high throughput and because of its regularity it is cost- effective and area efficient. Allred et al [3] proposed hardware-efficient DA-based design which uses two different lookup tables (LUTs) for updating weight and filter coefficients. Sang Yoon Park and Pramod Kumar proposed an architecture [4] for implementing adaptive FIR filters. But the architecture is complex for higher order filters.
So in this brief, low-complex architecture is proposed for implementing adaptive FIR filter. Such a simpler design is possible with the help of Offset Binary Coding. Offset binary coding is a digital coding scheme. It is also known as excess-k. In this coding scheme minimum negative value is represented by all zeros and the maximum positive value is represented by all-ones. This coding scheme has no standards,
but the offset K=2^ (n-1) is most often used for an n-bit binary word. Here the zero value is represented by all zeros except the most significant bit, which is similar to twos complement notation except the most significant bit is inverted.
Offset binary coding finds more applications in Digital Signal Processing (DSP) [5]. The basic units of DSP chips are analog to digital (A/D) and digital to analog (D/A) converters. These are unipolar so cannot process bipolar signals. For that biasing the analog signals with a DC offset is needed. The result is in offset binary format. Offset binary format cannot be handled directly by most of the standard computer CPU chips. They typically require some data conversion techniques to process offset binary. But without requiring any data conversion DSP chips can handle offset binary. Just by inverting the most significant bit offset binary can be converted into twos complement form.
-
CONCEPT OF ADAPTIVE LMS ALGORITHM
The LMS algorithm computes both the filter output and an error value during each cycle. Usually the difference between current filter output and desired responses is termed as error value. Then the filter coefficients are updated with an estimated error value in every training cycle. The filter coefficients are updated during the nth iteration according to the following equations:
+ 1 = + () (1) Error signal is given as
= () (2) Output signal of the adaptive filter is
= () (3)
Where is the filter coefficients vector, and () is the filter input vector, d(n) is the desired response, is convergence factor. The feedback error will be available This is called adaptation delay. So the delayed error is used for updating the current weight in pipelined structures instead of the most recent error, the parameter m is the adaptation delay. The equation that governs such weight update in adaptive filter using LMS algorithm is given by
+ 1 = + ( ) (4)
Here x(n) is filter input vector and w(n) is weight vector.
input as the result is represented in twos complement format. The output will be set to one if the appeared address is MSB slice.
-
DA-BASED APPROACH FOR INNER-PRODUCT COMPUTATON
In the LMS adaptive filter, inner-product computations of the critical path in each cycle are performed by using the following expression.
1
W0l w1l
w2l
L+2
0
X0
X1
X1+x0
X2
x2+x0
X2+x1
X2+x1+x0
X3+x0
X3+x1
X3+x1+x0
x3+x2
X3+x2+x0
X3+x2+x1
X3+x2+x1+x0
/
Shift accumulator Sign control
>>1
+/-
=
=0
(5)
D
Where and form the N-point vectors corresponding
to the current weights and the most recent N-1 input values respectively for 0 k N-1. Let L be the bit width of the weight and each component of the weight vector in its 2s complement representation is expressed as:
1
= 0 + 2 (6)
=1
Where kl denotes the lth bit of .Equation (5) can be written as
y
Fig. 1. 4-point inner product computation in Conventional DA-based implementation
Thus the XOR gates produce the LUT output of MSB slice in ones complement form without affecting the output of
1
1
1
other bit slices. At last both the outputs of XOR gate (i.e. sum
= 0 + 2
(7)
and carry words) resulted after L number of clock cycles have
=0
=0
=1
been added by the final adder. And the input carry to the final adder is set to one to account for performing 2s complement operation of the LUT output of the MSB slice.
In the above equation by altering the summations order of k and l indices it can be converted to sum-of-products form.
The equation that gives data in kth LUT location is as follows
1
1
1
1
= 0 + 2
(8)
=
(10)
=0
=1
=1
=0
From the above equation the inner-product can be calculated as
In the above expression is the (j+1)th bit of N-bit
1
= 2
=1
1
0, = 9
=0
binary representation of integer k for 0 k 2N-1. Here Ckfor 0 k 2N-1 can be recomputed and stored in RAM- based LUT of 2N words. However, only (2N-1) registers are stored in LUT instead of storing 2N words.
The partial sum for l = 0,1,.,L-1 will have 2N possible values. The inner-product of above equation will be calculated for L cycles of shift accumulation; thereafter for L number of bit slices where 0 l L-1 corresponding LUT-read operations will be performed as shown in fig.1. And the shift accumulation is performed using carry-save accumulator as it involves significant critical path.
In carry-save accumulation the weight vector bit slices are given one after the next in the order LSB to MSB. In the case of MSB slices the accumulation of output in its 2s complement form is needed. Therefore, all the bits of LUT output are passed through the XOR gates with a sign-control
-
EXISTING DA-BASED ADAPTIVE FILTER STRUCTURE
For our convenience we can decompose the computations of the large ordered adaptive filters into small adaptive filtering block. This makes the job of computing inner product which is the most important task in DA-based implementation of adaptive filters easier as the inner product computation of long vectors requires a very large LUT [3].
The existing DA-based LMS adaptive filter design requires 16 delay element DA-table structure. The 4th order adaptive filter (N=4) implementation involves a four-point inner product block, weight increment block and the circuit for generating error e, and a control word generator that generates the control word t for the barrel shifters as shown in fig 2.
But the same structure can be implemented with the help of 8-delay elements. Since with the help of basic elements x(n), x(n-1), x(n-2), x(n-3) the remaining structure can be implemented. It can be achieved with the help of Offset Binary Coding (OBC).
Fig. 2. Structure of DA-based LMS adaptive filter of filter length N=4
-
PROPOSED OFFSET BINARY CODING BASED ADAPTIVE FILTER DESSIGN
In the proposed approach, for the implementation of DA- based adaptive filter, we make use of the DA-table structure that makes use of the 8 delay elements. Hence by using the proposed structure, we can reduce the original DA-table structure by two times, which increases the area efficiency of the design twice. The proposed structure for the DA-table which makes use of eight delay elements is shown in the figure. 3.
The proposed structure for 4th order adaptive filter (N=4) is shown in the fig 2. Generally, it consists of a four-point inner product block, a weight increment block and the circuit for generating error e, and a control word generator that generates the control word t for the barrel shifters.
The four point inner product block which was shown in fig. 5 , consists of a DA-table which has an array of 15 registers. It is capable of storing the partial inner products y, for 0 < l 15 and a 16:1 multiplexer is used to select one of those registers at any particular instant. Weights A= {w, w, w ,w} for 0 l L-1 are fed to the multiplexer as control bits in the LSB-MSB order. The output of the MUX is then fed to the carry-save accumulation block. After L clock cycles, the carry-save accumulation block accumulates all the partial inner products and generates a sum word and a carry word of size L+2 bit length each.
Fig. 3. Proposed structure of DA table with 8 delay elements
The sum word is shifted right by one position right and added to the carry to generate the filter output y(n-1) which is then subtracted from the desired signal d(n) to obtain the error e(n). After that the sign-magnitude separator is used to separate the sign bit and magnitude bits from the obtained error.
Fig.4. Logic used for the generation of control word t for the barrel shifter for L=8
Fig. 5. Structure of four point inner product block
The magnitude bits are used by the control word generator to generate the control word t for the barrel shifter. The logic used for the generation of control word t for barrel shifter is shown in the figure 4.
The convergence factor µ is taken as 1/N. Generally, we can take µ as 2-i/N, where i is a small integer. The weight increment unit for N=4 consist of four barrel shifters and four adder/subtractor cells. The structure of weight increment cell is shown in figure 6.
Fig. 6. Structure of weight increment block for N=4
In general Barrel Shifter is the most helpful unit in Digital Signal Processing blocks. Mostly it is used in multiplication and division operations. It performs multiplication just by right shifting the data in registers by required number of times. That means to multiply the data by 2 times, the data is shifted right by one position. Similarly division can be performed by left shifting the data as per requirement. Thus the barrel shifter reduces the complexity of operations there by area optimization has been achieved too.
Here the barrel shifter shifts the input values , where k=0,1,.N-1 by defined number of locations. Barrel shifter yields the number of increments to be added or subtracted from the present weights. The sign bit from the sign- magnitude separator is used as the control for adder/subtractor cells such that depending on the value of the sign bit whether it is zero or one, the barrel shifter output is respectively added or subtracted from the content of the weight corresponding current value in register.
Thus the fourth order filter implementation is done successfully with 8-DA table structure. It reduces the complexity as the basic terms like x(n), x(n-1), x(n-2), x(n-3) are more enough to generate all the other terms. So higher order filters can also be easily implemented in hardware as this design achieves low-complexity.
A. Higher Order Filter Implementation
The structure of higher order filter of size N=16 is shown in figure 7. The design involves four sets of inner product blocks and weight increment blocks. All the blocks are connected in a proper manner in order to give a desired result.
The four 4-point inner product blocks and weight increment blocks all together is known as 16-bit data computing block as this sub block computes 16-bit sum and carry words. These are used in further computations. As in the case of fourth order filter implementation, here also the L+2 bit sums and carry are produced by the four inner product blocks. And these will be added by using two binary adder trees. The output of four 4-point inner-product blocks i.e. sum words are added with four carry-in bits. Since the carry words are of double the weight compared to the sum words, two carry-in bits are set as input carry at the first level binary adder tree of carry words, which is equivalent to inclusion of four carry-in bits to the sum words. Sign magnitude separator is used to separate sign bits and magnitude bits from the calculated error as in the case of smaller order filter designs. Outputs of sign-magnitude separator and control word generator are fed commonly to all the weight increment blocks. The logic used for control word generator is also same as that of previous logic. Further, the filtering process is same as that of the 4th order adaptive FIR filter except that the process is performed on 16-bit data instead of 4-bit data.
Similarly, the structure can be extended to 32-bit filter implementation. For that purpose two 16-bit data computing blocks which are mentioned earlier have been used. The structure of 32-bit filter is shown in the fig.8. Just by performing the binary addition of 16-bit sum and carry of the two 16-bit data computing blocks, the filter of length N=32 can be realized. The connections between the blocks must be given properly for achieving better results.
Fig.7. Structure of DA-based LMS adaptive filter of length N=16 and P=4
d(n) D
>>4
>>1
D
( 2)>
SIGN-MAG SEPARATO
SUM CARRY 16-BIT DATA COMPUTING BLOCK
SUM CARRY 16-BIT DATA COMPUTING BLOCK
x(n+1) sign mag
CONTROL WORD
To weight increment block
Fig. 8. Structure of DA-based LMS adaptive filter of length N=32 and P=4
-
SYNTHESIS RESULTS
TABLE I. SYNTHESIS REPORT
Design
Filter Length, N
No. of slices
No. of slice flip-flops
No. of 4-i/p LUTs
Availability
4656
9312
9312
Existing
4
252
271
469
8
406
464
733
Design with
an LUT
using 16-
16
838
912
1509
delay
elements
32
1287
1376
2313
Proposed
4
267
204
494
8
437
330
783
Design
with an LUT
16
901
644
1609
using
8-delay
32
1382
974
2463
elements
In this section we have implemented the proposed design with VHDL and synthesized our design using Xilinx ISE 13.2. The results obtained are tabulated above. From the above results it is clearly noticed that the hardware complexities are reduced significantly.
-
CONCLUSION
-
Low-complexity design implementations have greater importance in efficient hardware implementations. In this brief an architecture for the implementation of adaptive FIR filters is proposed. This design achieved low-complexity compared to existing DA-based implementations.
REFERENCES
-
S. Haykin and B. Widrow, Least-Mean-Square Adaptive Filters. Hoboken, NJ, USA: Wiley, 2003.
-
S. A. White, Applications of the distributed arithmetic to digital signal processing: A tutorial review, IEEE ASSP Mag., vol. 6, no. 3, pp. 419, Jul. 1989.
-
D. J. Allred, H. Yoo, V. Krishnan, W. Huang, and D. V. Anderson, LMS adaptive lters using distributed arithmetic for high throughput, IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 52, no. 7, pp. 13271337, Jul. 2005.
-
Sang Yoon Park and Promod Kumar Meher, Low- power, High- Throughput, and Low-area Adaptive FIR Filter Based on Distributed Arithmetic, IEEE Trans. On Circuits and Systems-II, Express Briefs, Vol.60, no.6, pp.346-350, June 2013.R. Nicole, Title of paper with only first word capitalized, J. Name Stand. Abbrev., in press.
-
W. Huang and D.V. Anderson, Modified Sliding-Block Distributed Arithmetic with Offset Binary Coding for Adaptive Filters, Journal of Signal Processing Systems, April 13, 2010.M. Young, The Technical Writers Handbook. Mill Valley, CA: University Science, 1989.