Low-power and High Speed 128-Point Pipeline FFT/IFFT Processor for OFDM Applications

Dr. D. Bhattacharya; Anil G L

doi:10.17577/IJERTV1IS6344

Volume 01, Issue 06 (August 2012)


Open Access
Authors : Dr. D. Bhattacharya, Anil G L
Paper ID : IJERTV1IS6344
Volume & Issue : Volume 01, Issue 06 (August 2012)
Published (First Online): 30-08-2012
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License


DR. D. BHATTACHARYA (1), ANIL G L (2)

(1) Professor, Department of ECE at Vel Tech Technical University, Chennai, India
(2) PhD Scholar, Department of ECE at Vel Tech Technical University, Chennai, India

ABSTRACT

This paper presents a low-power, high-speed 128-point pipelined Fast Fourier Transform (FFT) and inverse Fast Fourier Transform (IFFT) processor for OFDM. The modified architecture also introduces a ROM module and supports variable FFT/IFFT lengths from 128 to 2048 points for OFDM applications such as digital audio broadcasting (DAB), digital video broadcasting-terrestrial (DVB-T), asymmetric digital subscriber line (ADSL) and very-high-speed digital subscriber line (VDSL). The 128-point architecture consists of an optimized pipeline implementation based on a radix-2 butterfly processing element. To reduce power consumption and chip area, special current-mode SRAMs replace the shift registers in the delay lines. In low-power operation, with the supply voltage scaled down to 2.3 V, the processor consumes 176 mW at 17.8 MHz.

KEYWORDS

Low power, FFT, IFFT, OFDM

INTRODUCTION

The FFT (Fast Fourier Transform) and its inverse (IFFT) are key components of OFDM (Orthogonal Frequency Division Multiplexing) systems. Recently, the demand for long, high-speed and low-power FFTs has increased in OFDM applications. There are three main kinds of architecture for implementing an FFT processor. One is the single-memory architecture, which has one processing element and one main memory and hence occupies a small area. The second is the dual-memory architecture, which has two memories. It has a higher throughput than the single-memory architecture because it can store butterfly outputs and read butterfly inputs at the same time. The fast Fourier transform plays an important role in many digital signal processing (DSP) systems. Recent advances in semiconductor processing technology have enabled the deployment of dedicated FFT processors in applications such as telecommunications, speech and image processing. In OFDM communication systems in particular, the FFT and inverse FFT (IFFT) are central. Owing to its effectiveness in overcoming adverse channel effects [1, 2] and its efficient spectrum utilization, the OFDM technique has been widely adopted in wireline and wireless communication standards.

The OFDM technique has been adopted in several standards such as digital audio broadcasting (DAB) [3], digital video broadcasting-terrestrial (DVB-T) [4], asymmetrical digital subscriber line (ADSL) [5] and very-high-speed digital subscriber line (VDSL) [6]. Therefore, efficient and low-power VLSI implementation of FFT processors is essential for successful deployment of these OFDM-based systems. The DAB, DVB-T, ADSL and VDSL standards require various FFT sizes, as shown in Table 1. From this Table, it is clear that variable-length FFT hardware is a crucial module in a low-cost solution for the above communication systems.

The Cooley-Tukey N-point FFT algorithm requires O(N log N) computations, a large saving over direct computation of the discrete Fourier transform (DFT), which requires O(N^2). However, hardware implementation of the algorithm is both computationally intensive, in terms of arithmetic operations, and communication intensive, in terms of data swapping. For real-time processing of the FFT, O(log N) arithmetic operations are required per sample cycle. High-speed real-time processing can be accomplished in two different ways. In the conventional general-purpose digital signal processor (DSP) approach, the computation is carried out by a single processor driven at a high clock frequency, which is O(log N) times the data sample frequency. In the application-specific parallel or pipelined processor approach, the required operations are performed at a clock frequency equal to the sample frequency, and this approach usually consumes less power.
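To make the operation counts above concrete, the following minimal sketch tabulates them for a few transform lengths. It is illustrative only: the function name `operation_counts` is hypothetical, and the radix-2 figure counts one twiddle multiplication per butterfly, trivial ones included, so it is an upper bound rather than a count taken from the paper.

```python
import math

def operation_counts(n: int) -> dict:
    """Complex-multiplication counts for an n-point transform (n a power of two)."""
    stages = int(math.log2(n))
    assert 2 ** stages == n, "n must be a power of two"
    return {
        # Direct DFT: one multiplication per (input, output) pair.
        "direct_dft": n * n,
        # Radix-2 FFT: n/2 butterflies in each of log2(n) stages.
        "radix2_fft": (n // 2) * stages,
        # Butterflies that must be sustained per incoming sample in real time.
        "butterflies_per_sample": stages / 2,
    }

for n in (128, 1024, 2048):
    print(n, operation_counts(n))
```

The last entry grows as log N, which is why a single shared processor needs a clock on the order of log N times the sample rate, while a pipeline with one butterfly unit per stage can run at the sample rate.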

In this paper, we aim to implement a low-power variable-length FFT processor. To this end, we adopt several optimization techniques in the circuit design to accomplish an area- and power-efficient pipelined FFT processor.

Pipelined FFT/IFFT processor architecture

Radix-2 FFT/IFFT architecture

The radix-2 multi-path delay commutator [7] is a pipelined implementation of the radix-2 FFT/IFFT algorithm. A radix-2 multi-path delay commutator architecture with N = 8 is shown in Fig. 1. The input sequence is divided into two parallel data streams by a commutator and then, with proper scheduling of the two streams, the butterfly operation in a processing element (PE) and the twiddle factor multiplication are executed. In total, (log2 N - 2) multipliers, log2 N radix-2 butterfly units, and (3N/2) - 2 delay elements are required. With a proper input buffering scheme, the processing element can work at 100% utilization.

Table 1: FFT sizes required by several OFDM communication systems

Communication system	OFDM Size
ADSL	512
VDSL	8192,4096,2048,1024,512
DAB	2048,1024,512,256
DVB-T	8192,2048

The radix-2 single-path delay feedback architecture (shown in Fig. 2) utilizes the delay elements more efficiently by sharing the same storage between the butterfly outputs and inputs [8]. A single data stream goes through the multiplier at every stage. This architecture has the same number of processing elements (PEs) and multipliers as the radix-2 multi-path delay commutator architecture, but needs only N - 1 delay elements. Note that the butterfly units and multipliers work at 50% utilization since they are bypassed half of the time.
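The resource counts quoted for the two radix-2 pipelines can be tabulated with a short sketch. The helper names `mdc_resources` and `sdf_delay_lines` are hypothetical. The per-stage delay-line lengths N/2, N/4, ..., 1 are the usual arrangement for a single-path delay feedback pipeline and are assumed here, not taken from Fig. 2.

```python
import math

def mdc_resources(n: int) -> dict:
    """Resources of a radix-2 multi-path delay commutator, per the counts in the text."""
    stages = int(math.log2(n))
    return {
        "multipliers": stages - 2,
        "butterflies": stages,
        "delay_elements": 3 * n // 2 - 2,
    }

def sdf_delay_lines(n: int) -> list:
    """Assumed feedback delay-line length at each radix-2 stage: N/2, N/4, ..., 1."""
    return [n >> s for s in range(1, int(math.log2(n)) + 1)]

n = 128
print(mdc_resources(n))
print(sdf_delay_lines(n), "total =", sum(sdf_delay_lines(n)))
```

Summing the halving delay lines gives N - 1, which matches the delay-element count stated for the single-path delay feedback architecture.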

Radix-2/4/8 FFT/IFFT algorithm and architecture

Fig. 1 Radix-2 multi-path delay commutator FFT/IFFT architecture and PE

Fig. 2 Radix-2 single-path delay feedback FFT architecture and PE

The N-point DFT is formulated as

$$X_z = \sum_{n=0}^{N-1} x_n W^{nz}, \qquad z = 0, 1, 2, \ldots, N-1 \tag{1}$$

where $W^{nz} = e^{-j 2\pi nz / N}$. The basic concept underlying the radix-2 FFT/IFFT algorithm is the use of symmetry between the twiddle factors $W^{nz}$ and $W^{nz+N/2}$ ($W^{nz} = -W^{nz+N/2}$).

Exploiting twiddle factor symmetry further, the multiplication by the twiddle factors $W^{N/8}$, $W^{3N/8}$, $W^{5N/8}$ and $W^{7N/8}$ can be further simplified since their real and imaginary parts have equal magnitude. The complex multiplications by these four twiddle factors can be formulated as:

$$(a + jb)\,W^{N/8} = -(a + jb)\,W^{5N/8} = \frac{\sqrt{2}}{2}\left[(a + b) + j(b - a)\right] \tag{2}$$

$$(a + jb)\,W^{3N/8} = -(a + jb)\,W^{7N/8} = \frac{\sqrt{2}}{2}\left[(b - a) - j(a + b)\right] \tag{3}$$

Note that these complex multiplications can be realized by two real multiplications and two additions.

The signal flow graph (SFG) of the radix-2/4/8 FFT/IFFT algorithm is shown in Fig. 3 [9]. Instead of one single butterfly, the radix-2/4/8 algorithm implements the radix-8 butterfly using three radix-2 stages. Therefore its SFG is equivalent to that of the radix-2^3 algorithm [10]. Note that by modifying the radix-2 single-path delay feedback FFT/IFFT architecture, a radix-2/4/8 architecture was proposed in [9]. There are three types of basic processing elements, called PE1, PE2 and PE3, and each processes one FFT stage. The architecture is made up of a repeated cascade of PE1, PE2, PE3 and a general complex multiplier for twiddle-factor multiplication. The number of delay elements needed decreases by half in every stage. The block diagrams of these three types of processing elements are illustrated in Fig. 4.
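The simplified multiplications by the four odd-eighth twiddle factors, referred to in the text as (2) and (3), can be checked numerically. The sketch below assumes the forward-transform convention W = exp(-j2π/N); the function names are hypothetical. Each simplified product uses two real additions and two multiplications by √2/2.

```python
import cmath
import math

def twiddle(n: int, k: int) -> complex:
    """W^k for an n-point transform, assuming W = exp(-j*2*pi/n)."""
    return cmath.exp(-2j * math.pi * k / n)

def mul_w_n8(a: float, b: float) -> complex:
    """(a + jb) * W^(N/8): two additions, two multiplications by sqrt(2)/2."""
    c = math.sqrt(2) / 2
    return complex(c * (a + b), c * (b - a))

def mul_w_3n8(a: float, b: float) -> complex:
    """(a + jb) * W^(3N/8): two additions, two multiplications by sqrt(2)/2."""
    c = math.sqrt(2) / 2
    return complex(c * (b - a), -c * (a + b))

n, a, b = 128, 3.0, -1.5
x = complex(a, b)
print(abs(mul_w_n8(a, b) - x * twiddle(n, n // 8)))       # close to zero
print(abs(mul_w_3n8(a, b) - x * twiddle(n, 3 * n // 8)))  # close to zero
```

The products with W^(5N/8) and W^(7N/8) are the negatives of these two results, so no further multiplier is needed for them.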
Proposed variable-length FFT/IFFT processor architecture

At the architecture level, to reduce power consumption and chip area, it is desirable to adopt the FFT algorithm with the least computational complexity and the architecture with the least hardware complexity. The block diagram of the proposed variable-length FFT processor based on the radix-2/4/8 single-path delay feedback architecture is depicted in Fig. 5. The proposed processor can perform FFT operations of three different lengths: 2048-point, 1024-point and 128-point. To accommodate different numbers of FFT stages, the first two stages are radix-2 PEs, which have the same structure as the PE3 unit in the radix-2/4/8 architecture, and each of the following three blocks is made up of a set of PE1, PE2 and PE3 and a twiddle-factor multiplier. If 512-point FFT is executed, then input signals skip the first two stages through the control of the multiplexer, MUX2. If a 128-point FFT is performed, the first stage is bypassed through MUX1.

Architecture considerations

Comparison of FFT architectures

Over the years, various FFT architectures have been proposed with a view to providing speedy and efficient implementation of the all-important FFT operation. In Table 2, we list some computational features of the radix-2/4/8 FFT architecture used in the proposed IC and several other recent architectures. In the Table, we compare their computational complexity and memory requirements. It is apparent that the number of nontrivial complex multiplications decreases as the radix gets higher.

Fig. 3 Signal flow graph of the radix-2/4/8 FFT/IFFT algorithm

Fig. 4 Block diagrams of the PE units (PE1, PE2 and PE3) in the radix-2/4/8 architecture

Fig. 5 Block diagram of proposed variable-length FFT/IFFT processor

Table 2: Comparison of several FFT architectures

Architecture	Arithmetic	Radix	Data flow	Complex adders	Adder utilization	Complex multipliers	Multiplier utilization	Data memory	Twiddle-factor ROM
Proposed chip	Bit-parallel	radix-2/4/8	feedback	2 log2 N	50%	log8 N - 1	87.5%	N - 1	0.25N
He & Torkelson [10]	Bit-parallel	radix-4	feedback	4 log4 N	50%	log4 N - 1	75%	N - 1	N
Hui et al. [11]	Digit-serial	radix-4	feedforward	12 log4 N	100%	3(log4 N - 1)	100%	2.5N	N
Chang & Parhi [12]	Digit-serial	radix-4	feedforward	8(log4 N + 1)	100%	3(log4 N - 1)	100%	1.18N	0.5N

In addition, in bit-parallel operation, higher-radix algorithms also have better hardware utilization in multipliers. As to the adders in butterfly units, if the higher-radix butterfly operation is implemented by concatenating radix-2 butterfly units, such as in [10], then only 50% adder utilization can be achieved. Note that in the digit-serial architectures [11, 12], the word length of the data in adders and multipliers is reduced to one digit and thus fewer full adders are required. On the other hand, to achieve almost 100% utilization in adders and multipliers, the word length of the signals in these two architectures must be restricted to match the throughput of the radix-4 commutator (4 digits in these cases). Nevertheless, the area occupied by one complex multiplier far exceeds the area of one complex adder. Thus, a great saving in silicon cost can be accomplished with fewer complex multipliers.

The feedback FFT architecture needs the least data memory, N - 1 words. On the other hand, the feed-forward architecture requires more memory elements, as in [11, 12]. Other memory blocks are the look-up-table ROMs that store twiddle factors. If the number of nontrivial complex multiplications is decreased, then there are fewer twiddle-factor ROMs. The twiddle-factor ROM for the first multiplier stores twiddle factors with a phase spacing of 2π/N. In the later stages, the phase spacing increases. If the symmetry of the sine/cosine function is further exploited, more ROM can be saved. In the proposed chip, the twiddle-factor ROMs store only one-eighth cycle of the sine/cosine waveforms, and we take advantage of the symmetry of all the twiddle factors instead of the redundancy within each group of W^n, W^2n and W^3n, for n = 0, 1, ..., N/4, in the radix-4 algorithm as in [12]. Consequently a smaller ROM table is built.
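The one-eighth-cycle storage idea can be illustrated in software. The sketch below is an assumption-laden model, not the chip's ROM addressing logic: `build_rom` and `twiddle_from_rom` are hypothetical names, the table holds floating-point values rather than the chip's fixed-point words, and W = exp(-j2π/N) is assumed. It rebuilds any twiddle factor from samples covering phases 0 to π/4 only.

```python
import cmath
import math

def build_rom(n: int) -> list:
    """(cos, sin) samples for phases 0 .. pi/4, i.e. indices 0 .. n/8 inclusive."""
    return [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
            for k in range(n // 8 + 1)]

def twiddle_from_rom(rom: list, n: int, k: int) -> complex:
    """Rebuild W^k = exp(-j*2*pi*k/n) from the one-eighth-cycle table."""
    k %= n
    sign = 1
    if k >= n // 2:            # half-cycle symmetry: W^k = -W^(k - N/2)
        k -= n // 2
        sign = -1
    quarter = k >= n // 4      # quarter-cycle symmetry: W^k = -j * W^(k - N/4)
    if quarter:
        k -= n // 4
    if k > n // 8:             # reflection about pi/4: cos and sin swap roles
        s, c = rom[n // 4 - k]
    else:
        c, s = rom[k]
    w = complex(c, -s)
    if quarter:
        w *= -1j
    return sign * w

n = 128
rom = build_rom(n)
worst = max(abs(twiddle_from_rom(rom, n, k) - cmath.exp(-2j * math.pi * k / n))
            for k in range(n))
print(len(rom), "table entries, worst-case error", worst)
```

For N = 128 the table holds N/8 + 1 entries, yet every one of the 128 twiddle factors is recovered using only sign changes and real/imaginary swaps.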

In summary, the radix-2/4/8 algorithm can bring forth a variable-length FFT processor with the least overall hardware complexity. Although its adder and multiplier utilization is not as good as that of other architectures, we adopt this architecture because it strikes a balance between hardware complexity and computational efficiency.

Complex multiplier against CORDIC

The CORDIC algorithm has been used for the twiddle factor multiplication in FFT processors due to its efficiency in vector rotation [13]. In this subsection, we evaluate and compare the performance and complexity of a CORDIC and a complex multiplier in phase rotation. In Table 3, the conventional CORDIC algorithm refers to the radix-2 CORDIC, and radix-2/4 CORDIC refers to the work in [14] that enhances operation speed and reduces the number of micro-rotation stages by 25%. The complex multiplier used in the proposed chip consists of three multiplications and five additions [15]. To make a fair comparison, we set the precision to 16 bits in all algorithms. To avoid rounding error propagation [14, 16], 19 bits are allocated in the data path of the CORDIC-based architectures.
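A complex product with three real multiplications and five real additions can be formed in several ways. The sketch below shows one well-known arrangement as an illustration; it is not claimed to be the specific structure of [15] or of the chip, and the function name is hypothetical.

```python
def complex_multiply_3m5a(a, b, c, d):
    """Return (real, imag) of (a + jb)(c + jd) with 3 multiplications and 5 additions."""
    k1 = c * (a + b)   # addition 1, multiplication 1
    k2 = a * (d - c)   # addition 2, multiplication 2
    k3 = b * (c + d)   # addition 3, multiplication 3
    real = k1 - k3     # addition 4: ac - bd
    imag = k1 + k2     # addition 5: ad + bc
    return real, imag

# (3 - 2j)(5 + 7j) = 29 + 11j
print(complex_multiply_3m5a(3, -2, 5, 7))
```

Compared with the direct form (four multiplications, two additions), one multiplier is traded for three adders, which pays off when a multiplier is much larger than an adder.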

In the conventional CORDIC algorithm, a ROM table that stores the rotation sequences with N/4 16-bit words in the range [0, π/2] is used. Two 19-bit adders are required in each micro-rotation stage, so the conventional CORDIC architecture needs 2 × 16 × 19 = 608 full adders for 16 micro-rotation stages. An additional constant multiplication by the scaling factor 0.100110110111010 (binary) is performed in the final scaling stage and needs 2 × 9 × 19 adders. Without pipelining, its critical path delay is 19 × 16 times the full-adder delay (TFA) in the 16 micro-rotation stages plus 28 TFA in the scaling stage.
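The hardware figures for the conventional CORDIC can be reproduced with simple arithmetic. This sketch assumes that the factor of 9 in the scaling-stage adder count corresponds to the nine nonzero bits of the quoted binary constant (one shift-and-add term per nonzero bit); that reading is an interpretation, not a statement from the paper. All names are hypothetical.

```python
import math

SCALE_BITS = "100110110111010"  # fractional bits of the scaling constant quoted in the text

def binary_fraction(bits: str) -> float:
    """Value of the binary fraction 0.<bits>."""
    return sum(int(b) * 2.0 ** -(i + 1) for i, b in enumerate(bits))

def cordic_gain_inverse(iterations: int) -> float:
    """Product of 1/sqrt(1 + 2^(-2i)), the scale correction of radix-2 CORDIC."""
    return math.prod(1 / math.sqrt(1 + 2.0 ** (-2 * i)) for i in range(iterations))

micro_rotation_adders = 2 * 16 * 19               # two 19-bit adders in each of 16 stages
scaling_adders = 2 * SCALE_BITS.count("1") * 19   # assumed: one add per nonzero bit, two data paths
critical_path_tfa = 19 * 16 + 28                  # unpipelined delay in full-adder delays

print(micro_rotation_adders, scaling_adders, critical_path_tfa)
print(binary_fraction(SCALE_BITS), cordic_gain_inverse(16))
```

The binary constant evaluates to about 0.6072, agreeing with the 16-iteration CORDIC scale correction to within the 15-bit resolution of the constant.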

In [14], the ROM table is further reduced to N/8 words with 23 bits per word due to the higher radix adopted in the later stages. According to the authors, each stage is based on a similar cell with a 4-2 adder/subtractor using two-level carry-save adders (CSA) and redundant arithmetic representation intended to improve the performance. Two registers are used to buffer the intermediate sum and carry in each stage. Meanwhile two full adders are connected to perform the 4-2 compression. As a result, a total of 2 × 19 × 17 × 2 full adders are provided in the 17 stages, including additional micro-rotation-repetition stages and 2 scaling stages. Because of pipelining in every stage, the critical path delay is reduced to about 2TFA with a penalty of a large number (17 × 19 × 2 × 2) of pipeline registers. Actually, its CORDIC outputs are still in the form of redundant arithmetic representation and will be transformed back to the binary format after the butterfly operation by carry-lookahead adders.

In the proposed chip, complex multiplication consists of five real additions and three real multiplications. The real addition is implemented by carry-select adders with a maximum delay of about 8TFA, and each utilizes 30 full adders in the first 16-bit addition and 63 full adders in the last 33-bit addition. Because Wallace tree multipliers are adopted for the three 16 × 17 multiplications, the critical path delay is reduced to 7TFA. One 16 × 17 Wallace tree multiplier needs about 280 full adders, and two pipeline stages are inserted before and after the multiplication.

We can see that the CORDIC algorithm may be too slow without pipelining. On the other hand, Wallace tree multiplication reduces the critical path delay of the complex multiplier approach. Considering all aspects of the speed-area trade-off, and that the FFT processor targets low power consumption rather than high speed, we use the complex multiplier for twiddle factor multiplication.

Circuit design

To serve as a key component in OFDM communication systems, the variable-length FFT processor must be designed to reduce its power consumption as well as chip area.

4.1 Word-length minimization

In the design of this application-specific variable-length FFT processor, the word lengths of various signals are minimized according to their respective signal-to-noise ratio (SNR) requirements. To decide the optimal word length, input waveforms with Gaussian noise are fed to the FFT with fixed-point arithmetic implementation. The frequency-domain FFT output signals are obtained and the output SNR is computed. Figure 6a shows the output SNR against the FFT input word length under different input SNR conditions. Accordingly, the word length of the input is set to 9 bits. As to the precision of the sine and cosine tables, the output SNR against the word length of the twiddle factors is shown in Fig. 6b when the input signal has an SNR of 30 dB. A word length of 9 bits is thus chosen for the twiddle factors. The word-length minimization process then goes on module by module until the word lengths of all signals in the processor are determined; they are labeled in Fig. 5.

Fig. 6 Output SNR against word length of the FFT processor input and of twiddle factor

4.2 RAM-based delay line

A single-path delay feedback FFT processor needs several long and wide delay lines. Conventionally, delay lines are mostly implemented in shift registers, made up of cascades of data registers, as shown in Fig. 7a. At each clock edge, all data move forward in a lock-step fashion and approximately half of the registers change states, wasting much power. To save power and chip area, SRAM has been utilised to replace the shift registers. Since the read and write operations must be performed in one clock cycle, intuitively a dual-port memory is required. Two single-port SRAMs are adopted in [17], and the authors claimed that a single-port memory can save 33% in area over a dual-port memory. Here we use one single-port SRAM as shown in Fig. 7b. The SRAM is designed manually. In the first half clock cycle, the read operation is performed while the write operation follows in the next half clock cycle. To prevent the data access of the SRAM from becoming a critical path, two registers, one before the PE and the other after, are inserted. Furthermore, a ring counter is used instead of the conventional address decoder since data to and from the SRAM are accessed sequentially. To further reduce power consumption, true-single-phase-clock (TSPC) flip-flops are used in the ring counters.

Fig. 7 Conventional shift-register-based delay line and proposed SRAM-based delay line

4.3 Current-mode SRAM

The current-mode technique has been used in reading SRAM cell contents. It has been proposed that the current-mode technique can also be applied to the writing operation of SRAM so as to further reduce power consumption [18]. This is because voltage swings of the SRAM bit lines and data lines can be kept very small in the current-mode read/write operations and thus the dynamic power dissipation can be significantly decreased.

The current-mode SRAM cell used is based on that proposed in [18]. It consists of seven transistors, one more than the conventional 6-transistor SRAM cell, and is depicted in Fig. 8a. An extra transistor, Meq, is inserted to equalize the output voltages of the two inverters before each write operation, and therefore a small current difference can be sensed through access transistors controlled by the word-line enable signal and amplified by the inverters. When Meq is off, the cell performs as the conventional 6T SRAM memory cell.

Fig. 8 Schematic diagrams of proposed 7T current-mode SRAM memory cell and of SRAM write circuitry using N-type current conveyor

During write access, a current difference, ΔI, appears on the write data lines wdlp and wdln. The N-type current conveyor (shown in Fig. 8b) is enabled by the signal WY. Then the currents are conveyed to the bit lines blp and bln without attenuation. Because the control signal WY is enabled, a virtual short circuit exists between the write data lines wdlp and wdln. Both the voltages at wdlp and wdln are equal to VDD - (V1 + V2), which can be designed to approach the ground voltage. Thus the voltage swing on data lines can be kept as small as possible. The read operation in this SRAM is implemented by a sense amplifier, which has the same structure as in the conventional SRAM, and a column decoder. As in conventional SRAM, a read access starts with the word line being enabled and the pair of bit lines driven by a differential current, which is then steered to the sense amplifier, where the data are sensed and buffered.

4.4 Complex multiplication and twiddle-factor ROM

In the proposed FFT processor, due to the radix-2/4/8 algorithm, each complex multiplication by W^(N/8), W^(3N/8), W^(5N/8) and W^(7N/8) is reduced to two real multiplications by the constant √2/2 as shown in (2) and (3), which can be further simplified to shift and add operations [9].
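Multiplication by the constant √2/2 with shifts and adds can be sketched as follows. The 8-bit approximation 181/256 (binary 0.10110101) is an assumption chosen for illustration; the coefficient precision and the exact decomposition used in the chip or in [9] are not given in the text, and the function name is hypothetical.

```python
def mul_sqrt2_over_2(x: int) -> int:
    """Approximate x * sqrt(2)/2 for an integer x using shifts and adds only.

    Uses 181/256 = 2^-1 + 2^-3 + 2^-4 + 2^-6 + 2^-8 (binary 0.10110101),
    computed as (128 + 32 + 16 + 4 + 1) * x followed by a shift right by 8.
    """
    return ((x << 7) + (x << 5) + (x << 4) + (x << 2) + x) >> 8

for x in (256, 1000, -1000):
    print(x, mul_sqrt2_over_2(x), x * 2 ** 0.5 / 2)
```

Four adders and fixed wiring shifts replace a general multiplier, at the cost of a relative error of about 0.01% from truncating the constant to eight fractional bits.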
Experimental results

The whole chip, except for the SRAM modules, was designed in a gate-level hardware description language. The critical path lies in the complex multiplier. The layout of the SRAM modules, containing the ring counters, timing control units and SRAM cells, was designed manually. The proposed FFT processor is fabricated using a 0.35 μm CMOS process. The chip's die photo is shown in Fig. 10. The multipliers are marked as MUT with their corresponding twiddle-factor ROMs right beside them, and the PEs (processing elements) are labeled as Ux. Considering circuit overheads in SRAM, all delay lines longer than 64 are implemented by SRAM, while shorter ones are realized by registers. A brief summary of the chip is given in Table 3. There was an error in some of the ROM values but, discounting that error, the rest of the chip operates as designed. The FFT processor can operate up to 17.8 MHz, dissipating 176 mW, at a 2.3 V supply voltage, and up to 45 MHz, consuming 640 mW, at a 3.3 V supply voltage.

We compare the proposed chip with several FFT processors [9, 17, 20, 21] in terms of FFT size, algorithm, process, supply voltage, power consumption, clock rate, execution time and area. Because these FFT processors are fabricated in different CMOS technologies and the FFT sizes are also different, it is not easy to make a fair comparison. We adopted three indices to make comparisons and adjusted the numbers by estimation, assuming all processors perform a 1024-point FFT. We use the normalized area, a metric in [21], and it is given by

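$$\text{Normalized area} = \frac{\text{Area of 1024-point FFT}}{\left(\text{Technology} / 0.35\,\mu\text{m}\right)^{2}}$$

The second index, FFT/Energy, measures energy efficiency:

$$\text{FFT/Energy} = \frac{\text{Technology}}{\text{Power of 1024-point FFT} \times \text{Execution time} \times 10^{-6}}$$

Another metric considering both energy efficiency and speed performance is the energy-time product, and it is given by

$$\text{Energy} \times \text{Time} = \frac{\text{Execution time}}{\text{FFT/Energy}}$$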

We can see from the Table that the proposed chip has the smallest normalized area and the smallest energy-time product. Although the FFT processor in [21] has the best energy efficiency when operating at 1.1 V, its slow execution speed at that low voltage prevents its use in the high-speed applications listed in Table 1.

Table 3 chip summary

Process	TSMC 0.35 μm 1P4M
Area	3.9 mm × 5.5 mm
Transistor count	598 078
Maximum frequency	45 MHz at 3.3 V
Power consumption (at highest speed)	640 mW (at 45 MHz, 3.3 V)
Power consumption (at lowest voltage)	176 mW (at 17.8 MHz, 2.3 V)
Package	68 PGA

Fig. 9 Block diagram of twiddle-factor ROM

Fig. 10 Die photograph of proposed FFT processor
Conclusions

In this paper, we have reported the design of an FFT/IFFT processor chip that is suitable for OFDM communication systems, such as DAB, DVB-T, ADSL and VDSL, for performing complex FFTs/IFFTs of lengths 128/1024/2048. The proposed variable-length FFT processor achieves not only efficient hardware utilization but also low power consumption. Its single-path delay feedback FFT/IFFT architecture requires fewer delay elements, and the radix-2/4/8 FFT algorithm replaces some complex multipliers with shift-and-add operations. In addition, other circuit techniques have been applied to reduce complexity as well as power consumption. The chip was implemented using a 0.35 μm CMOS process. The measured results show that the chip can operate up to 45 MHz under a 3.3 V supply voltage, consuming 640 mW. When the supply voltage is scaled down to 2.3 V, the processor consumes only 176 mW at 17.8 MHz.
References

1 Bingham, J.A.C.: Multicarrier modulation for data transmission: an idea whose time has come, IEEE Commun. Mag., 1990, 28, (7), pp. 5-14
2 Cimini, L.J.: Analysis and simulation of a digital mobile channel using orthogonal frequency division multiplexing, IEEE Trans. Commun., 1985, 33, (7), pp. 665-675
3 ETSI EN 300 401 (v1.3.2): Radio broadcasting systems; digital audio broadcasting (DAB) to mobile, portable and fixed receivers, Sep. 2000
4 ETSI EN 300 744 (v1.2.1): Digital video broadcasting (DVB); framing structure, channel coding and modulation for digital terrestrial television, Jul. 1999
5 T1E1.4/98-007R4: Standards project for interfaces relating to carrier to customer connection of asymmetrical digital subscriber line (ADSL) equipment, Jun. 1998
6 ETSI TS 101 270-2 (V1.1.1): Transmission and multiplexing (TM); access transmission systems on metallic access cables; very high speed digital subscriber line (VDSL); Part 2: Transceiver specification, Feb. 2001
7 Rabiner, L.R., and Gold, B.: Theory and application of digital signal processing (Prentice-Hall, Inc., NJ, 1975)
8 Groginsky, H.L., and Works, G.A.: A pipeline fast Fourier transform, IEEE Trans. Comput., 1970, 19, (11), pp. 1015-1019
9 Jia, L., Gao, Y., Isoaho, J., and Tenhunen, H.: A new VLSI-oriented FFT algorithm and implementation. Proc. IEEE ASIC Conf., 1998, pp. 337-341
10 He, S., and Torkelson, M.: Designing pipeline FFT processor for OFDM (de)modulation. Proc. IEEE URSI Int. Symp. Signals, Systems and Electronics, 1998, pp. 257-262
11 Hui, C.C.W., Ding, T.J., and McCanny, J.V.: A 64-point Fourier transform chip for video motion compensation using phase correlation, IEEE J. Solid-State Circuits, 1996, 31, pp. 1751-1761
12 Chang, Y.-N., and Parhi, K.K.: An efficient pipelined FFT architecture, IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., 2003, 50, (6), pp. 322-325
13 Hu, Y.H.: CORDIC based VLSI architecture for digital signal processing, IEEE Signal Process. Mag., 1992, (4), pp. 16-35
14 Sarmiento, R., Tobajas, F., de Armas, V., Esper-Chain, R., Lopez, J.F., Montiel-Nelson, J.A., and Nunez, A.: A CORDIC processor for FFT computation and its implementation using gallium arsenide technology, IEEE Trans. VLSI Syst., 1998, 6, (1), pp. 18-30
15 Wenzler, A., and Luder, E.: New structures for complex multipliers and their noise analysis. Proc. IEEE Int. Symp. on Circuits and Systems, May 1995, Vol. 2, pp. 1432-1435
16 Hu, Y.H.: The quantization effects of the CORDIC algorithm, IEEE Trans. Signal Process., 1992, 40, (4), pp. 834-844
17 Li, W., and Wanhammar, L.: A pipeline FFT processor. Proc. Workshop Signal Processing Systems Design and Implementation, 1999, pp. 654-662
18 Wang, J.-S., Tseng, W., and Li, H.-Y.: Low-power embedded SRAM with the current-mode write technique, IEEE J. Solid-State Circuits, 2000, 35, (1), pp. 119-124
19 Tan, L.K., and Samueli, H.: A 200 MHz quadrature digital synthesizer/mixer in 0.8 μm CMOS, IEEE J. Solid-State Circuits, 1995, 30, (3), pp. 193-200
20 Bidet, E., Castelain, D., Joanblanq, C., and Senn, P.: A fast single-chip implementation of 8192 complex point FFT, IEEE J. Solid-State Circuits, 1995, 30, (3), pp. 300-305
21 Baas, B.M.: A low-power, high-performance, 1024-point FFT processor, IEEE J. Solid-State Circuits, 1999, 34, (3), pp. 380-387

Dr. D. Bhattacharya finished his Master's degree in 2002 at Calcutta University in the field of Electronics & Communication Engineering. He obtained his PhD from the Department of Communication System at Lancaster University, UK, as an international student in 2007. Presently, he is working as Professor in the Department of ECE at Vel Tech Technical University, Chennai. He has more than 5 years of experience in engineering education and 5 years of experience in research. He has worked at almost 5 renowned universities throughout Europe. He is currently associated with the Telecom Centre of Excellence (TCOE) at Vel Tech University as Head, where he continues his research activities.
