- Open Access
- Authors : Bharathi.S.H, Dr.K. Nagabhushana Raju
- Paper ID : IJERTV1IS5230
- Volume & Issue : Volume 01, Issue 05 (July 2012)
- Published (First Online): 02-08-2012
- ISSN (Online) : 2278-0181
- Publisher Name : IJERT
- License: This work is licensed under a Creative Commons Attribution 4.0 International License
Verilog realization of Diagonal-Down-Left intra prediction for H.264 Video Encoder
Bharathi.S.H*, Dr.K. Nagabhushana Raju**
* Associate Professor, Department of Electronics and Communication Engineering, Reva Institute of Technology and Management, Bangalore.
** Associate Professor, Department of Instrumentation, Sri Krishnadevaraya University, Anantapur, Andhra Pradesh.
Abstract
This paper proposes a Verilog realization of Diagonal-Down-Left intra prediction for an H.264 Video Encoder. The algorithm is capable of processing video frames with the preferred compression controlled by the user input. The algorithm and architecture for the Diagonal-Down-Left mode of intra prediction were designed and developed in Verilog. The complete H.264 Advanced Video Codec for this mode of intra prediction was first coded in MATLAB in order to verify the functionality of each block, and was then realized in Verilog. The PSNR is computed for the reconstructed picture. The quality of the reconstructed picture in the Verilog realization is found to exceed 32 dB PSNR.
Introduction
Video compression systems are used in many commercial products, from consumer electronic devices such as digital camcorders and cellular phones to video teleconferencing systems, which makes video compression hardware an essential part of these products. To improve the performance of existing applications and to extend video compression to new real-time applications, a new international standard for video compression was developed. This standard offers significantly better compression efficiency than previous video compression standards and was developed jointly by the ITU and ISO standardization organizations. Hence it is known by two names, H.264 and MPEG-4 Part 10 [1, 2, 3, 4].
H.264 offers a significant improvement over previous video coding standards in peak signal-to-noise ratio (PSNR) and visual quality, owing to features such as variable block sizes for motion compensation, multiple reference frames, the Integer Transform, the Deblocking Filter, and Context Adaptive Variable Length Coding (CAVLC). The intra prediction technique is one of the most important features contributing to the success of H.264/AVC [1, 2]. The present work realizes one of the intra prediction techniques, Diagonal-Down-Left, for 4×4 luma components to achieve compression. The implementation conforms to the baseline, main and extended profiles, since only Intra (I) frames are used. The H.264 video coding standard supports intra prediction for various block sizes. For coding the luma signal, one 16×16 macro block may be predicted as a whole or as individual 4×4 sub blocks. There are nine intra prediction modes for 4×4 sub blocks.
In this work, we have realized the Diagonal-Down-Left intra prediction technique, as stated in the standard, for an H.264 Video Encoder in the Verilog hardware description language. The algorithm and architecture for the Diagonal-Down-Left mode of intra prediction have been developed. The algorithm is capable of processing video frames with the preferred compression controlled by the user input.
H.264/AVC Encoder
The block diagram of the H.264/AVC Encoder is presented in Fig. 1. It includes two dataflow paths, a forward path and a reconstruction path. An input frame Fn is presented for encoding. Every frame is processed in units of a Macroblock (MB) of size 16×16 pixels. Each macro block is encoded in intra or inter mode. In both cases a prediction macro block P is formed based on a reconstructed frame. In intra mode, P is formed from samples in the current frame n that have been previously encoded and reconstructed. The prediction P is subtracted from the current macro block to produce a residual or difference macro block Dn.
Figure 1. Block Diagram of H.264/AVC Encoder
The residual macro block Dn is transformed and quantized to give a set of quantized transform coefficients X. These coefficients are reordered and entropy encoded. The compressed bit stream is passed to the Network Abstraction Layer (NAL) for transmission over a band-limited serial channel [5]. In the reconstruction path, the quantized macro block coefficients X are decoded to reconstruct a frame for encoding of further macro blocks. The coefficients X are inverse quantized and inverse transformed to produce a reconstructed difference macro block, which differs from the original Dn because quantization is lossy. The prediction macro block P is added to this reconstructed difference to create a reconstructed macro block, which is then filtered to improve the quality of the reconstructed picture [6, 7, 8, 9]. The video sequences may optionally also be encoded using motion estimation and compensation, which exploits temporal redundancy to obtain a further compression of 10 % or more.
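The forward and reconstruction paths described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the integer transform and entropy coding are omitted, a plain scalar quantizer with step `qstep` stands in for the transform and quantization stage, and the function names `quantize`, `encode_block` and `reconstruct_block` are hypothetical, not taken from the paper.

```python
def quantize(residual, qstep):
    """Round-to-nearest scalar quantization of one residual value (assumed stand-in)."""
    sign = -1 if residual < 0 else 1
    return sign * ((abs(residual) + qstep // 2) // qstep)


def encode_block(current, prediction, qstep):
    """Forward path: subtract the prediction P, then quantize the residual Dn."""
    residual = [c - p for c, p in zip(current, prediction)]
    return [quantize(r, qstep) for r in residual]


def reconstruct_block(levels, prediction, qstep):
    """Reconstruction path: rescale the levels, add P back, clip to 8 bits."""
    recon = []
    for level, p in zip(levels, prediction):
        value = p + level * qstep
        recon.append(max(0, min(255, value)))
    return recon


# Four pixels of a block and a flat prediction, quantization step 8.
current = [100, 104, 98, 120]
prediction = [96, 96, 96, 96]
levels = encode_block(current, prediction, 8)
recon = reconstruct_block(levels, prediction, 8)
```

The encoder keeps `recon`, not `current`, as the basis for later predictions, so that encoder and decoder form their predictions from identical data.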
H.264 Intra Prediction
The intra prediction algorithm predicts the pixels in an MB using the pixels in the available neighbouring blocks. For the luma component of an MB, a 16×16 predicted luma block is formed by performing intra prediction for each 4×4 luma block in the MB. There are nine prediction modes for each 4×4 luma block and four prediction modes for a 16×16 luma block.
Figure 2. 4×4 luma block and neighbouring pixels in a sub block
Figure 3. 4×4 luma prediction modes
The nine 4×4 luma intra prediction modes, namely Vertical, Horizontal, DC, Diagonal-Down-Left, Diagonal-Down-Right, Vertical-Right, Horizontal-Down, Vertical-Left and Horizontal-Up, are designed in a directional manner [2, 3]. A 4×4 luma block consisting of the pixels a to p is shown in Fig. 2. The pixels A to M belong to the neighbouring blocks and are assumed to be already encoded and reconstructed. They are therefore available in both the encoder and the decoder to generate a prediction for the current MB. Each 4×4 luma prediction mode generates 16 predicted pixel values using some or all of the neighbouring pixels A to M, as shown in Fig. 3. The arrows indicate the direction of prediction in each mode. For every mode except the Vertical and Horizontal modes, the predicted pixels are calculated as a weighted average of the neighbouring pixels A to M [1, 2].
DC mode can always be used, regardless of the availability of the neighbouring pixels. However, it is adapted according to which of the neighbouring pixels A to M are available. If pixels E, F, G and H have not yet been encoded and reconstructed, the value of pixel D is copied to these positions and they are marked as available for DC mode. The other prediction modes can only be used if all of the required neighbouring pixels are available [4, 5]. The 4×4 luma prediction modes available for a 4×4 luma block therefore depend on the availability of the neighbouring 4×4 luma blocks.
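The substitution of D for unavailable pixels E to H can be sketched in a few lines of Python. This is a hedged illustration only: the function name `top_neighbours` and the use of `None` to signal an unavailable upper-right block are assumptions made for the example, not part of the paper's design.

```python
def top_neighbours(above, above_right=None):
    """Return the eight top neighbours [A, B, C, D, E, F, G, H].

    above       -- the four reconstructed pixels A..D directly above the block
    above_right -- the four pixels E..H, or None when they have not yet been
                   encoded and reconstructed (assumed convention)
    """
    if len(above) != 4:
        raise ValueError("expected exactly four pixels A..D")
    if above_right is None:
        # E, F, G and H are unavailable: copy D into all four positions.
        return list(above) + [above[3]] * 4
    if len(above_right) != 4:
        raise ValueError("expected exactly four pixels E..H")
    return list(above) + list(above_right)


padded = top_neighbours([10, 20, 30, 40])
full = top_neighbours([10, 20, 30, 40], [50, 60, 70, 80])
```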
Architectural Details of Diagonal-Down-Left (Mode 3) Intra prediction module
Intra prediction is done to achieve compression within a frame. Neighbouring pixels within a picture frame tend to have similar values. To exploit this spatial redundancy, the prediction is based on the values of the reconstructed pixels of the previous sub block. The Diagonal-Down-Left intra prediction (Mode 3) technique predicts the pixel values to be processed in the Diagonal-Down-Left direction. The current pixels can be predicted once the last row of reconstructed pixels is available. A, B, C and D are the neighbouring pixel values in the last row of reconstructed pixels, as shown in Fig. 4. The direction of prediction and the prediction equations are shown in Fig. 5.
Figure 4. Architecture of Diagonal-Down-Left intra prediction
Figure 5. (a) The direction of prediction for Diagonal-Down-Left mode
pred[0,0] = (A + 2B + C + 2) >> 2
pred[0,1] = (B + 2C + D + 2) >> 2
pred[0,2] = (C + 2D + E + 2) >> 2
pred[0,3] = (D + 2E + F + 2) >> 2
pred[1,0] = (B + 2C + D + 2) >> 2
pred[1,1] = (C + 2D + E + 2) >> 2
pred[1,2] = (D + 2E + F + 2) >> 2
pred[1,3] = (E + 2F + G + 2) >> 2
pred[2,0] = (C + 2D + E + 2) >> 2
pred[2,1] = (D + 2E + F + 2) >> 2
pred[2,2] = (E + 2F + G + 2) >> 2
pred[2,3] = (F + 2G + H + 2) >> 2
pred[3,0] = (D + 2E + F + 2) >> 2
pred[3,1] = (E + 2F + G + 2) >> 2
pred[3,2] = (F + 2G + H + 2) >> 2
pred[3,3] = (G + 3H + 2) >> 2
Figure 5. (b) Prediction equations for 4×4 Diagonal-Down-Left mode intra prediction.
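The sixteen equations of Figure 5(b) depend only on the sum of the two position indices, which makes them easy to express as a loop. The following Python sketch is one way to write them, assuming the eight neighbours are passed as a list in the order A to H. The function name `predict_ddl` is illustrative and not taken from the paper.

```python
def predict_ddl(top):
    """Diagonal-Down-Left prediction of a 4x4 block.

    top -- the eight reconstructed top neighbours [A, B, C, D, E, F, G, H]
    Returns a 4x4 list of lists indexed as pred[i][j].
    """
    if len(top) != 8:
        raise ValueError("expected eight neighbours A..H")
    pred = [[0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            k = i + j  # every position on one anti-diagonal shares k
            if k == 6:
                # pred[3,3] has no neighbour beyond H, so H is weighted by 3.
                pred[i][j] = (top[6] + 3 * top[7] + 2) >> 2
            else:
                pred[i][j] = (top[k] + 2 * top[k + 1] + top[k + 2] + 2) >> 2
    return pred


block = predict_ddl([10, 20, 30, 40, 50, 60, 70, 80])
```

For this linear ramp of neighbours the first row of `block` is 20, 30, 40, 50, and each following row is the previous one shifted left by one position, which reflects the diagonal direction of the mode.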
Inspection of the above equations shows that many of them share common terms and that several are identical. For implementation, the prediction equations are therefore reorganized as shown in Table 1.
Table 1. Reorganized prediction equations for 4×4 Diagonal-Down-Left Mode

| Pixel Position | Prediction Equation |
|---|---|
| pred[0,0] | [(A+B)+(B+C)+2] >> 2 |
| pred[0,1], pred[1,0] | [(C+D)+(B+C)+2] >> 2 |
| pred[0,2], pred[1,1], pred[2,0] | [(C+D)+(D+E)+2] >> 2 |
| pred[0,3], pred[1,2], pred[2,1], pred[3,0] | [(E+F)+(D+E)+2] >> 2 |
| pred[1,3], pred[2,2], pred[3,1] | [(E+F)+(F+G)+2] >> 2 |
| pred[2,3], pred[3,2] | [(G+H)+(F+G)+2] >> 2 |
| pred[3,3] | [(G+H)+(H+H)+2] >> 2 |

If pixels E, F, G and H have not yet been encoded and reconstructed, the value of pixel D is copied to these positions and they are marked as available for DC mode. The other prediction modes can only be used if all of the required neighbouring pixels are available [4, 5].
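The benefit of the reorganization in Table 1 is that each pairwise sum is computed once and then reused by two neighbouring equations. A short Python sketch of this idea follows. It assumes the neighbours are given as a list in the order A to H, and the name `predict_ddl_shared` is hypothetical.

```python
def predict_ddl_shared(top):
    """Diagonal-Down-Left prediction using the shared pairwise sums of Table 1.

    top -- the eight reconstructed top neighbours [A, B, C, D, E, F, G, H]
    """
    if len(top) != 8:
        raise ValueError("expected eight neighbours A..H")
    ext = list(top) + [top[7]]  # repeat H so that the last pair is (H+H)
    # Eight pairwise sums: (A+B), (B+C), (C+D), (D+E), (E+F), (F+G), (G+H), (H+H)
    pair = [ext[k] + ext[k + 1] for k in range(8)]
    # Seven distinct predicted values, one per anti-diagonal of the block.
    diag = [(pair[k] + pair[k + 1] + 2) >> 2 for k in range(7)]
    return [[diag[i + j] for j in range(4)] for i in range(4)]


block = predict_ddl_shared([10, 20, 30, 40, 50, 60, 70, 80])
```

Because (A+B)+(B+C) equals A+2B+C, and (G+H)+(H+H) equals G+3H, the result is identical to the direct form of the equations. Only eight additions are needed for the pair sums and seven more for the diagonal values, instead of evaluating sixteen separate equations.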
Fig. 4 shows the architecture of Diagonal-Down-Left intra prediction. The prediction technique uses only the last row of the previously reconstructed sub block, namely the pixel values A, B, C and D indicated in Fig. 4. Partial sums such as (A+B), (B+C), (C+D) and (D+D) are computed and stored as intermediate values P1, P2, P3 and P4. P, Q, R and S are the predicted pixels obtained from the equations explained above. The picture is divided into macro blocks of size 16×16 pixels and is processed macro block by macro block, from top to bottom and from left to right. The order in which macro blocks are processed is also shown in Fig. 4. Each macro block is further divided into 4×4 pixel sub blocks A0 to A15, which are processed in the order A0, A1, A2, A3; A4, A5, ..., A15. The pixel values of the current sub block are p1, p2, p3, ..., p16. In Fig. 4, A4 is the current sub block to be processed and A3 is the sub block that has already been processed and reconstructed. For the first sub block A0 of a macro block, no pixels are available to generate the predicted block. The prediction pixels are therefore taken as 0, 0, 0, 0, i.e., the block is processed without prediction. Fig. 4 also presents the reconstruction of the current sub block A4 as an example. Reconstruction is performed by the Transform, Quantization, Inverse Quantization and Inverse Transform (TQIQIT) process. The inputs to the integer transform are the residual values of the sub block, obtained as the pixel-wise difference between the current sub block (A4) and the prediction sub block (A3). The reconstructed residual pixels of the sub block are obtained after TQIQIT processing. These reconstructed residual pixels are then added to the corresponding prediction sub block pixels to give the reconstructed sub block A4.
Detailed architecture of Diagonal-Down-left Intra prediction module
The Diagonal-Down-Left intra predict module applies the Diagonal-Down-Left (Mode 3) mode of intra prediction to predict the values of the pixels to be processed. This method uses the top neighbouring pixels, i.e. the last row of pixel values of the previously reconstructed sub block. The intrapredict module for this mode, as coded in this work, has three sub modules: Intrapredict_memories_DDL_Y, Intrapredict_memories_DDL_C and Intrapredict_Control. The detailed architecture of the intrapredict module is presented in Fig. 6, whose inner blocks show these three sub modules. The Intrapredict_memories_DDL_Y module comprises the four_pix_out_Y and ram_predict_DDL_Y sub modules for the luma components. Likewise, the Intrapredict_memories_DDL_C sub module comprises the four_pix_out_C and ram_predict_DDL_C sub modules for the chroma components. The detailed architecture of each sub module is presented in the following sections.
Figure 6. Architecture of Diagonal_Down_left intra prediction Module
Inputs: Pix_Y_dram_in [7:0], Pix_Y_dram_valid_in, Pix_Cb_dram_in [7:0], Pix_Cb_dram_valid_in, Pix_Cr_dram_in [7:0], Pix_Cr_dram_valid_in, Pix_res_recon_in [7:0], Pix_res_recon_valid_in, clk, reset_n, halt
Outputs: Pix_Y_recon_DDL_out [7:0], Pix_Y_recon_valid_out, Pix_Cb_recon_DDL_out [7:0], Pix_Cb_recon_valid_out, Pix_Cr_recon_DDL_out [7:0], Pix_Cr_recon_valid_out, Pix_res_DDL_out0 [7:0], Pix_res_DDL_out1 [7:0], Pix_res_DDL_out2 [7:0], Pix_res_DDL_out3 [7:0], Pix_res_valid_out, Pix_Y_req_dram_out, Pix_Cb_req_dram_out, Pix_Cr_req_dram_out
Pix_Y_dram_in carries the 8-bit luma (Y) pixel values from the dual RAM. The luma pixel values are validated by the pix_Y_dram_valid_in signal: when this signal is high, the luma pixel values are considered valid. Similarly, pix_Cb_dram_in and pix_Cr_dram_in carry the chroma pixel values from the dual RAM (chroma), whose validity is indicated by pix_Cb_dram_valid_in and pix_Cr_dram_valid_in respectively.
The intrapredict module generates residual pixels. Residual pixel values are the difference between the actual pixel values and the predicted pixel values. These residual pixels are applied to the Transform, Quantization, Inverse Quantization and Inverse Transform (TQIQIT) module to obtain the reconstructed residual pixel values. pix_res_recon_in carries these reconstructed residual values back into the intrapredict module, and its validity is indicated by a high level on pix_res_recon_valid_in. The intrapredict module outputs the reconstructed luma pixels on pix_Y_recon_DDL_out and the reconstructed chroma pixels on pix_Cb_recon_DDL_out and pix_Cr_recon_DDL_out. The validity of these reconstructed pixel values is indicated by pix_Y_recon_valid_out, pix_Cb_recon_valid_out and pix_Cr_recon_valid_out respectively. pix_res_DDL_out0, pix_res_DDL_out1, pix_res_DDL_out2 and pix_res_DDL_out3 are the residual pixel values obtained by applying the Diagonal-Down-Left mode of intra prediction. They are the pixel values of one column of the 4×4 residual pixel block and are applied to the TQIQIT module to obtain the pix_res_recon pixel values. pix_Y_req_dram_out is a handshaking signal: when it is high, a request is sent to the dual RAM to output pixels.
Figure 7. Architecture of Intrapredict_memories_DDL_Y Module
Sub modules: four_pix_out_Y, ram_predict_DDL_Y
Inputs: Pix_Y_dram_in [7:0], Pix_Y_dram_valid_in, Pix_Y_res_recon_in [9:0], Pix_Y_res_recon_valid_in, Pix_Y_req_cntrl_in, clk, reset_n, halt
Other labelled signals: Pix0_out [7:0], Pix1_out [7:0], Pix2_out [7:0], Pix3_out [7:0], pix_out_valid, pix_pred0_DDL [7:0], pix_pred1_DDL [7:0], pix_pred2_DDL [7:0], pix_pred3_DDL [7:0], pix_pred_valid
Outputs: Pix_Y_recon_DDL_out [7:0], Pix_Y_recon_valid_out, Pix_Y_res0_DDL_out [9:0], Pix_Y_res1_DDL_out [9:0], Pix_Y_res2_DDL_out [9:0], Pix_Y_res3_DDL_out [9:0], Pix_Y_res_valid_out, Pix_Y_req_dram_out
The intrapredict module for Diagonal-Down-Left intra prediction consists of three sub modules: intrapredict_memories_DDL_Y, intrapredict_memories_DDL_C and intrapredict_cntrl. This section explains the intrapredict_memories_DDL_Y sub module in detail.
The Intrapredict_memories_DDL_Y module includes two RAMs, one for storing a 4×4 block of the original image and the other for storing the reconstructed pixel values. The reconstructed pixel values are needed to generate the predicted sub block, and the predicted values are saved so that the pixel values can be reconstructed. Fig. 7 presents the architecture of intrapredict_memories_DDL_Y in depth. pix_Y_dram_in carries the luma pixel values from the dual RAM, and their validity is indicated by the signal pix_Y_dram_valid_in. pix_Y_res_recon_in is the input to the intrapredict_memories_DDL_Y sub block from the TQIQIT module, which applies Transformation, Quantization, Inverse Quantization and Inverse Transformation to the residual pixels to reconstruct the residual sub block. The validity of the pix_Y_res_recon_in pixel values is indicated by driving pix_Y_res_recon_valid_in high. pix_Y_req_cntrl_in is the request signal used to request pixels from the dual RAM.
Intrapredict_memories_Y is the sub module of the intrapredict module that generates pix_Y_recon_DDL_out, the reconstructed luma pixel values obtained by adding the predicted pixel values to the pix_Y_res_recon_in signal. The validity of the pix_Y_recon_DDL_out pixel values is indicated by pix_Y_recon_valid_out. pix_Y_res0_DDL_out, pix_Y_res1_DDL_out, pix_Y_res2_DDL_out and pix_Y_res3_DDL_out are the luma residual pixel values, formed as the difference between the actual pixel values to be processed and the predicted pixel values generated by the ram_predict_DDL_Y module using the Diagonal-Down-Left mode of intra prediction. These residual pixel values are applied to the TQIQIT module, and their validity is indicated by driving the pix_Y_res_valid_out signal high. pix_Y_req_dram_out is the handshaking signal: when it is high, a request is sent to the dual RAM to output the pixels.
Figure 8. Architecture of four_pix_out_Y module
Inputs: Pix_Y_dram_in [7:0], Pix_valid_dram_in, Pix_req_cntrl_in, clk, Reset_in, halt
Outputs (current sub block column pixels): Pix0_out [7:0], Pix1_out [7:0], Pix2_out [7:0], Pix3_out [7:0], Pix_valid_out, Pix_req_dram_out, Do-read_out
Intrapredict_memories_DDL_Y in turn has two sub modules, four_pix_out_Y and ram_predict_DDL_Y. Since the intrapredict_memories_DDL_Y module generates both the residual pixel values and the reconstructed luma pixels, two RAMs are required in this module. The detailed architectures of four_pix_out_Y and ram_predict_DDL_Y are presented below. Fig. 8 presents the architecture of the four_pix_out_Y module.
The current sub block pixel values from the dual RAM module are input to the four_pix_out_y module on the data bus pix_y_dram_in [7:0]. Their validity is signalled by simultaneously asserting the signal pix_valid_dram_in. The four_pix_out_y module outputs the current pixel values pix0_out to pix3_out, whose validity is indicated by the pix_valid_out signal. pix_Y_req_dram_out is the handshaking signal: when it is high, a request is sent to the dual RAM to output the pixels.
3.2 Architecture of ram_predict_DDL_Y Module
The ram_predict_DDL_Y module, shown in Fig. 9, outputs the predicted pixel values along the direction of prediction. As explained in the earlier section, the Diagonal-Down-Left mode of intra prediction can be applied if the top neighbouring pixels A, B, C and D of the reconstructed sub block are available. The predicted pixels pix_pred0_DDL to pix_pred3_DDL are computed using the prediction equations explained in the algorithm.
Figure 9. Architecture of ram_predict_DDL_Y Module
Table 2 describes the predicted pixel assignment in Diagonal-Down-Left intra prediction, where [0,0], [0,1], [0,2], ..., [3,3] are the positions of pixels in the 4×4 luma matrix. Careful analysis of the equations used in the Diagonal-Down-Left mode of 4×4 luma intra prediction shows that partial sums such as (A+B), (B+C), (C+D), (D+E), (E+F), (F+G), (G+H) and (H+H) are common to several equations, where A, B, C, D, E, F, G and H are the top neighbouring reconstructed pixels. If pixels E, F, G and H have not yet been encoded and reconstructed, the value of pixel D is copied to these positions and they are marked as available for DC mode [6]. The present architecture therefore first calculates the common parts of the equations and stores them in temporary registers P1, P2, P3 and P4. The ram_predict_DDL_Y module computes the predicted values and outputs them at every clock cycle.
Table 2. Predicted pixel assignment in Diagonal-Down-Left intra prediction

| Pixel position | Prediction equation and assigned value |
|---|---|
| pred[0,0] | [(A+B)+(B+C)+2] >> 2 = P |
| pred[0,1], pred[1,0] | [(C+D)+(B+C)+2] >> 2 = Q |
| pred[0,2], pred[1,1], pred[2,0] | [(C+D)+(D+E)+2] >> 2 = R |
| pred[0,3], pred[1,2], pred[2,1], pred[3,0] | [(E+F)+(D+E)+2] >> 2 = S |
| pred[1,3], pred[2,2], pred[3,1] | [(E+F)+(F+G)+2] >> 2 = S |
| pred[2,3], pred[3,2] | [(G+H)+(F+G)+2] >> 2 = S |
| pred[3,3] | [(G+H)+(H+H)+2] >> 2 = S |

The TQIQIT module processes the residual values to produce the reconstructed residual signal for the luma component, called pix_Y_res_rec. These values are added to the predicted pixel values pix_pred0_DDL to pix_pred3_DDL in the ram_predict_DDL_Y module. pix_Y_rec_DDL_out [7:0] is the reconstructed pixel value. The last-row pixel values of this reconstructed sub block are also output as pix_pred0_DDL to pix_pred3_DDL, with pix_pred_valid as the valid signal.
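The assignment of the single value S to the last four rows of Table 2 follows once E, F, G and H are replaced by D, since every pairwise sum from (D+E) onwards then equals (D+D). The Python sketch below works this through. It is an interpretation of the text rather than a transcription of the Verilog: it assumes that P4 holds (D+D) and that E to H are always substituted by D, and the function names `ddl_pqrs` and `ddl_block_from_pqrs` are hypothetical.

```python
def ddl_pqrs(a, b, c, d):
    """Compute P, Q, R, S from the last row A..D of the reconstructed sub block."""
    # Intermediate registers holding the shared partial sums.
    p1, p2, p3, p4 = a + b, b + c, c + d, d + d
    p = (p1 + p2 + 2) >> 2  # (A + 2B + C + 2) >> 2
    q = (p2 + p3 + 2) >> 2  # (B + 2C + D + 2) >> 2
    r = (p3 + p4 + 2) >> 2  # (C + 3D + 2) >> 2, using E = D
    s = (p4 + p4 + 2) >> 2  # (4D + 2) >> 2, which is simply D
    return p, q, r, s


def ddl_block_from_pqrs(p, q, r, s):
    """Place P, Q, R, S on the anti-diagonals of the 4x4 predicted block."""
    diag = [p, q, r, s, s, s, s]
    return [[diag[i + j] for j in range(4)] for i in range(4)]


p, q, r, s = ddl_pqrs(10, 20, 30, 40)
block = ddl_block_from_pqrs(p, q, r, s)
```

Under these assumptions only four distinct values need to be computed per sub block, and S needs no arithmetic at all because it reduces to D.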
Simulation Results of Diagonal-Down-Left Intra prediction
The H.264 video codec with Diagonal-Down-Left intra prediction was first implemented in Matlab to verify the functionality of the various modules of the codec and to estimate the quality of the reconstructed image. The quality of the reconstructed image was tested for various quantization steps. After confirming the Matlab results, the Diagonal-Down-Left intra prediction module was implemented in Verilog and simulated using Modelsim for a quantization step of 8. The Verilog module writes the reconstructed Y, Cb and Cr components as text files. A C++ program converts the text files to tif format, and a Matlab program reconstructs the image and computes the quality of the reconstructed image. Fig. 10 shows the waveforms for Diagonal-Down-Left intra prediction. pix_pred_DDL_out0 to pix_pred_DDL_out3 are the predicted pixels obtained from the ram_predict_Y module. pix_Y_res0_DDL_out, pix_Y_res1_DDL_out, pix_Y_res2_DDL_out and pix_Y_res3_DDL_out are the luma residual pixel values, formed as the difference between the actual pixels pix0_out to pix3_out and the predicted pixels. The first prediction starts at 99481 ns, and the intra prediction process continues up to 1638235 ns, as shown in Fig. 11. The original and reconstructed pictures of Lena are shown in Fig. 12 as an example. The Matlab program computes the quality of the reconstructed picture. The PSNR obtained for the Lena image with the Verilog encoder employing Diagonal-Down-Left intra prediction is 37.3 dB.
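For reference, the PSNR figure quoted above follows the standard definition PSNR = 10 log10(peak² / MSE), with peak = 255 for 8-bit pixels. The paper computes it in Matlab. The Python sketch below shows the same calculation on flat lists of pixel values, and the function name `psnr` and its argument layout are assumptions made for the example.

```python
import math


def psnr(original, reconstructed, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixel positions.
    mse = sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak * peak / mse)


# Two of four pixels are off by one grey level.
value = psnr([100, 100, 100, 100], [101, 99, 100, 100])
```

A higher value means the reconstructed picture is closer to the original, and identical pictures give an infinite PSNR.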
Figure 10. Waveforms for Diagonal-Down-Left Intra prediction: First pixel residual Value after prediction
Figure 11. Waveforms for Diagonal-Down-Left Intra prediction: End of intra prediction
Figure 12. Simulation results of Diagonal-Down-Left intra prediction for H.264 Video Encoder: (a) original Lena image (512×256 pixels); (b) reconstructed Lena image using Verilog, PSNR: 37.3 dB.
4. Conclusion
The architectural design of an H.264 Video Encoder using Diagonal-Down-Left intra prediction was presented. The intra prediction module is coded in Verilog and integrated with other functional modules such as Transform and Quantization. The complete video codec was also realized in Matlab to validate the Verilog design. The results show that the reconstructed picture obtained after applying Diagonal-Down-Left intra prediction is visually close to the original, with a PSNR of 37.3 dB for the Lena image.
References
[1] ITU-T Recommendation H.264 and ISO/IEC 14496-10 (MPEG-4) AVC, Advanced Video Coding for Generic Audiovisual Services, Version 3, 2005.
[2] Joint Video Team, Draft ITU-T Recommendation and Final Draft International Standard of joint video specifications, ITU-T Recommendation H.264 and ISO/IEC 14496-10 AVC, March 2005.
[3] Iain E. G. Richardson, H.264 and MPEG-4 Video Compression, John Wiley and Sons, 2003.
[4] Feng Pan, Xiao Lin, Susanto Rahardja, Keng Pang Lim, Z. G. Li, Dajun Wu, Si Wu, Fast mode decision algorithm for intraprediction in H.264/AVC video coding, IEEE Transactions on Circuits and Systems for Video Technology, Volume 15, July 2005.
[5] S. Kwon, A. Tamhankar, K. R. Rao, Overview of H.264/MPEG-4 Part 10, IEEE Transactions on Circuits and Systems for Video Technology, 2003.
[6] Thomas Wiegand, Gary J. Sullivan, Gisle Bjøntegaard, and Ajay Luthra, Overview of the H.264/AVC Video Coding Standard, IEEE Transactions on Circuits and Systems for Video Technology, July 2003.
[7] Gulistan Raja, Sadiqullah Khan, Muhammad Javed Mirza, VLSI architecture and implementation of H.264 integer transform, The IEEE 2005 Workshop on Signal Processing Systems (SIPS05), 2005.
[8] N. Keshaveni, S. Ramachandran, K. S. Gurumurthy, Design and Implementation of Integer Transform and Quantization Processor for H.264 Encoder on FPGA, International Conference on Advances in Computing, Control and Telecommunication Technologies, December 2009.