FPGA Implementation of Image Processing Architecture for Various DIP Applications

V. Balaji; R. Sakthi Kumar

doi:10.17577/IJERTV3IS10994

Volume 03, Issue 01 (January 2014)


Open Access
Published (First Online): 28-01-2014
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License


PG scholar

Abstract

Digital image processing is an ever-expanding and dynamic area with applications that reach into everyday life, such as medicine, security, space exploration, surveillance, identification and authentication, and automated industrial inspection. Such applications involve operations like image compression, image enhancement, object detection, and noise removal. Implementing image processing applications on a general-purpose computer is easy but inefficient because of additional constraints on memory and other peripheral devices, and most general-purpose hardware is not suited to hard real-time constraints. This paper presents an FPGA implementation of median-filter image processing. The processor architecture combines a reconfigurable binary processing module, input and output image controller units, and peripheral circuits. The reconfigurable binary processing module performs the DCT and a Sobel filter on a 256×256 image. The peripheral circuits control the overall image processing and the dynamic reconfiguration process. The simulation and experimental results demonstrate that the processor is suitable for real-time binary image processing applications.

Introduction

Image processing is any form of signal processing for which the input is an image, such as a photograph or a video signal; the output may be either an image or a set of characteristics or parameters related to the image. Most image processing techniques treat the image as a two-dimensional signal and apply standard signal processing techniques to it. Digital image processing is the use of computer algorithms to process digital images, and it has many advantages over analog image processing. It allows a much wider range of algorithms to be applied to the input data and can avoid problems such as the build-up of noise and signal distortion during processing. Since images are defined over two dimensions, digital image processing may be modeled in the form of multidimensional systems. General-purpose chips have the architecture of a digital processor, in which each digital processor handles the image pixel by pixel. When large images are processed, the chip becomes extremely large. Thus, further analysis is needed to design a high-performance, small chip with a wide range of applications for real-time binary image processing.

This paper presents a binary image processor that consists of a reconfigurable binary processing module (including reconfigurable binary compute units and output control logic), input and output image controller units, and peripheral circuits. The reconfigurable binary compute units have a mixed-grained architecture, which offers flexibility, efficiency, and high speed. The processor performance is enhanced by dynamic reconfiguration. The processor is implemented to perform real-time binary image processing applications. It is found that the processor can process pixel-level images and extract image features. Basic mathematical median operations and complicated algorithms can easily be implemented on it. The processor has the advantages of small size, high speed, simple structure, and a wide range of applications.

CSD (canonical signed digit) is a simple and hardware-efficient technique for implementing multiplications by constants, in particular the trigonometric (cosine) coefficients used here. Instead of full multipliers, it uses simple shift, add, and subtract operations to achieve this objective. The discrete cosine transform (DCT) is one of the most widely used transforms. The DCT, first proposed by Ahmed et al. in 1974 [9], has gained importance in recent years, especially in image compression and video compression. This paper focuses on an efficient hardware implementation of the DCT that decreases the number of computations, enhances the accuracy with which the original data are reconstructed, and decreases chip area, as a result of which the power consumption also decreases. The DCT is also the transform used in standard image compression algorithms such as JPEG.

A programmable single-instruction multiple-data (SIMD) real-time vision chip was presented to achieve high-speed target tracking [10]. In [24], a programmable binary morphology coprocessor was introduced into the visual content analysis engine of a chip used for visual surveillance. A reconfigurable image processing accelerator incorporating eight macro processing elements was designed to support real-time change detection and background registration based on a video object segmentation algorithm. Recently, a vision chip with a massively parallel cellular array of processing elements was presented for image processing using asynchronous or synchronous processing techniques. Other general-purpose chips have the architecture of a digital processor array, in which each digital processor handles one pixel. When large images are processed, the chips become extremely large. Thus, further studies are needed to design a high-performance, small chip with a wide application range for real-time binary image processing and DCT applications.
Reconfigurable Image Processor

Although field-programmable gate arrays (FPGAs) were introduced a decade ago, they have only recently become very popular. This is not only because programmable logic saves development cost and time over increasingly complex ASIC designs, but also because the gate count per FPGA chip has reached numbers that allow the implementation of more complex applications [11]. Many present-day applications use a processor and other logic on two or more separate chips. However, with the anticipated ability to build chips with over ten million transistors, it will become possible to implement a processor within a sea of programmable logic, all on one chip.

Such a design approach would allow a great degree of programming freedom, both in hardware and in software: EDA tools could decide which parts of a source program are actually executed in software and which parts are accelerated in hardware. The hardware implementation may be needed for application interfacing reasons or may simply represent a coprocessor used to improve execution time. Programmable logic need not be used only for application speed-up; it can also be employed as intelligent glue logic for custom interfacing purposes, such as in embedded controller applications. Current single-chip embedded processors attempt to provide very flexible interfaces that can be used in a large number of applications.
1. Implementation of On-Chip Processor
  
  Fig. 1. Reconfigurable Image Processor
  
  However, they can often result in interfaces that are less efficient than intended. Furthermore, it might be desirable to perform some bit-level data computations between the main processor and the actual I/O interface. This paper also investigates the requirements for providing a general-purpose field-configurable interface for embedded processor applications. The reconfigurable image processor is shown in Fig. 1. The processor architecture combines a reconfigurable binary processing module, input and output image controller units, peripheral circuits, an on-chip memory unit, and a Nios II processor. The reconfigurable binary processing module performs image compression and edge detection. The input image is given to the pre-processing controller unit; after pre-processing, the image is loaded into the on-chip memory unit. Initially, the analogue image is converted into digital form and impulse noise is added using MATLAB. The image is then resized to 180 × 180 pixels, and a total of 3600 blocks are stored in a text file. The text file is read by ModelSim, which calculates the median values and removes the salt-and-pepper noise. The Nios II processor is used as the controller circuit. A gated clock disables idle blocks to reduce unnecessary transitions, and FIFO synchronization is used to synchronize all the units.
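The median step described above can be sketched in software as follows. This is a minimal Python illustration, not the paper's ModelSim or hardware implementation. The 3×3 window (suggested by 3600 blocks covering 180 × 180 pixels, but not stated in the paper), the function name `median_filter_3x3`, and the choice to leave border pixels unchanged are all assumptions made for the example.

```python
from statistics import median


def median_filter_3x3(image):
    """Replace each interior pixel by the median of its 3x3 neighbourhood.

    `image` is a list of rows of integer grey levels. Border pixels are
    copied unchanged (an assumption made to keep the sketch short).
    """
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            window = [image[r + dr][c + dc]
                      for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            out[r][c] = int(median(window))
    return out


# A flat grey patch corrupted by one "salt" (255) and one "pepper" (0) pixel.
noisy = [
    [10, 10, 10, 10],
    [10, 255, 10, 10],
    [10, 10, 0, 10],
    [10, 10, 10, 10],
]
clean = median_filter_3x3(noisy)
```

Because the two corrupted pixels are outliers within every 3×3 window, the median discards them and the interior of `clean` returns to the background level.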
  
  DISCRETE COSINE TRANSFORM - to compress the image

  SOBEL FILTER - to detect edges
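As an illustration of the edge-detection role, the sketch below applies the standard 3×3 Sobel kernels in Python. The paper does not give its kernels or its magnitude formula, so the kernels, the |Gx| + |Gy| approximation, the zeroed border, and the name `sobel_edges` are assumptions chosen because they are common in hardware implementations.

```python
# Standard Sobel kernels for horizontal and vertical gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_edges(image):
    """Return |Gx| + |Gy| for each interior pixel; border pixels are set to 0."""
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = gy = 0
            for dr in range(3):
                for dc in range(3):
                    pixel = image[r + dr - 1][c + dc - 1]
                    gx += SOBEL_X[dr][dc] * pixel
                    gy += SOBEL_Y[dr][dc] * pixel
            out[r][c] = abs(gx) + abs(gy)
    return out


# A vertical step edge: dark on the left, bright on the right.
step = [[0, 0, 100, 100] for _ in range(4)]
edges = sobel_edges(step)
```

Using the sum of absolute values instead of a square root keeps the operation to adders and shifts, which is why it is a frequent choice for FPGA designs.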
2. Image Processing Applications
  
  The reconfigurable binary compute units have a mixed-grained architecture, which offers high flexibility, efficiency, and performance. The performance of the processor is enhanced by the dynamic reconfiguration approach. The processor is implemented to perform real-time binary image processing. It is found that the processor can process pixel-level images and extract image features, such as boundary and motion images. Basic mathematical median operations and complicated algorithms can easily be implemented on it. The processor has the merits of high speed, simple structure, and a wide application range. Although field-programmable gate arrays (FPGAs) were introduced a decade ago, they have only recently become more popular. This is not only because programmable logic saves development cost and time over increasingly complex ASIC designs, but also because the gate count per FPGA chip has reached numbers that allow the implementation of more complex applications.
Discrete Cosine Transform

Multimedia data processing, which encompasses almost every aspect of our daily life, such as communication, broadcasting, data search, advertisement, and video games, has become an integral part of our lifestyle. The most significant part of a multimedia system is the application involving images or video, which requires computationally intensive data processing. Moreover, as the use of mobile devices increases exponentially, there is a growing demand for multimedia applications that run on these portable devices. Compression techniques are widely used to reduce the volume of multimedia data sent over wireless channels. The discrete cosine transform (DCT) is one of the major compression schemes owing to its near-optimal performance. Its energy compaction efficiency is also higher than that of most other practical transforms.
Low Complexity 2-D DCT Using 1-D DCT

Decomposed Matrix

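The 1-D 8-point DCT can be expressed as follows:

$$Z_n = \frac{1}{2} K_n \sum_{m=0}^{7} x_m \cos\left(\frac{(2m+1)n\pi}{16}\right), \quad n = 0, 1, \ldots, 7 \qquad (2)$$

where $x_m$ denotes the input data, $Z_n$ denotes the transform output, and $K_n = \sqrt{1/2}$ for $n = 0$ and $K_n = 1$ otherwise.

By neglecting the scaling factor 1/2, the 1-D 8-point DCT in (2) can be divided into even and odd parts.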

Fig. 2. Decomposed DCT

In the 8-point DCT, the 8 input values are multiplied by the 8 × 8 DCT matrix, so 64 multipliers are needed to produce all 8 outputs. In the decomposed DCT architecture, adding one pre-processing unit cuts multiplier usage by 50% (only 32 multipliers are used). The pre-processing unit uses only adders, so the overall hardware complexity is reduced.
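The saving can be checked numerically with a short sketch. The Python below is an illustration under stated assumptions, not the paper's RTL: it compares a direct 8 × 8 evaluation (64 multiplications) with an even/odd decomposition that first forms sums and differences of mirrored inputs (adders only) and then applies two 4 × 4 matrices (32 multiplications). The coefficient names follow the usual convention `c_k = cos(k*pi/16)`; the function names and matrix layout are hypothetical.

```python
import math

# Cosine basis c_k = cos(k*pi/16), k = 1..7 (named c1..c7 here).
c1, c2, c3, c4, c5, c6, c7 = (math.cos(k * math.pi / 16) for k in range(1, 8))

EVEN = [[c4, c4, c4, c4],      # Z0 (c4 equals the K_0 = sqrt(1/2) factor)
        [c2, c6, -c6, -c2],    # Z2
        [c4, -c4, -c4, c4],    # Z4
        [c6, -c2, c2, -c6]]    # Z6
ODD = [[c1, c3, c5, c7],       # Z1
       [c3, -c7, -c1, -c5],    # Z3
       [c5, -c1, c7, c3],      # Z5
       [c7, -c5, c3, -c1]]     # Z7


def dct8_direct(x):
    """8-point DCT as one 8x8 matrix product (64 multiplications); 1/2 scaling omitted."""
    out = []
    for n in range(8):
        kn = math.sqrt(0.5) if n == 0 else 1.0
        out.append(kn * sum(x[m] * math.cos((2 * m + 1) * n * math.pi / 16)
                            for m in range(8)))
    return out


def dct8_decomposed(x):
    """Same transform via a pre-processing stage plus two 4x4 products."""
    sums = [x[i] + x[7 - i] for i in range(4)]    # pre-processing: adders only
    diffs = [x[i] - x[7 - i] for i in range(4)]
    even = [sum(EVEN[r][i] * sums[i] for i in range(4)) for r in range(4)]   # 16 mults
    odd = [sum(ODD[r][i] * diffs[i] for i in range(4)) for r in range(4)]    # 16 mults
    z = [0.0] * 8
    z[0::2] = even
    z[1::2] = odd
    return z


samples = [1, 2, 3, 4, 5, 6, 7, 8]
direct = dct8_direct(samples)
decomposed = dct8_decomposed(samples)
```

Both routines produce the same eight outputs up to floating-point rounding, which is the property the 50% multiplier reduction relies on.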
Binary Conversion

Many techniques have been used to efficiently convert these floating-point values into a binary representation for digital implementation; only then can the DCT be implemented in VLSI.

The two ways of floating-point to binary conversion are:

(1) The integral and fractional parts are converted separately. The fractional part is repeatedly multiplied by 2, and each bit is taken as it appears to the left of the binary point.
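Method (1) can be sketched in Python as below. The repeated multiplication by 2 for the fractional part follows the description above; converting the integral part by repeated division by 2 is the usual companion step and is an assumption here, as are the function names and the choice to truncate rather than round.

```python
def integer_to_bits(value):
    """Binary digits of a non-negative integer, by repeated division by 2."""
    if value == 0:
        return "0"
    bits = ""
    while value > 0:
        bits = str(value % 2) + bits
        value //= 2
    return bits


def fraction_to_bits(value, n_bits):
    """First n_bits binary digits of 0 <= value < 1, by repeated doubling.

    After each doubling, the digit left of the binary point (0 or 1) is the
    next bit; it is then removed before the next step. The result is truncated.
    """
    bits = ""
    for _ in range(n_bits):
        value *= 2
        bit = int(value)
        bits += str(bit)
        value -= bit
    return bits


def to_fixed_point(value, n_frac_bits):
    """Convert the integral and fractional parts separately and join them."""
    whole = int(value)
    return integer_to_bits(whole) + "." + fraction_to_bits(value - whole, n_frac_bits)


example = to_fixed_point(5.8125, 4)        # "101.1101"
c3_bits = fraction_to_bits(0.8315, 6)      # cos(3*pi/16) to four decimals
c7_bits = fraction_to_bits(0.1951, 6)      # cos(7*pi/16) to four decimals
```

With six fractional bits, the two cosine values come out as `110101` (53/64) and `001100` (12/64).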
1. DCT coefficients
  
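  The 1-D DCT given by equation (2) can be split into two matrices, the odd and the even.

  The odd 1-D DCT can be expressed as

  $$\begin{bmatrix} Z_1 \\ Z_3 \\ Z_5 \\ Z_7 \end{bmatrix} = \begin{bmatrix} a & c & e & g \\ c & -g & -a & -e \\ e & -a & g & c \\ g & -e & c & -a \end{bmatrix} \begin{bmatrix} x_0 - x_7 \\ x_1 - x_6 \\ x_2 - x_5 \\ x_3 - x_4 \end{bmatrix} \qquad (3)$$

  The even 1-D DCT can be expressed as

  $$\begin{bmatrix} Z_0 \\ Z_2 \\ Z_4 \\ Z_6 \end{bmatrix} = \begin{bmatrix} d & d & d & d \\ b & f & -f & -b \\ d & -d & -d & d \\ f & -b & b & -f \end{bmatrix} \begin{bmatrix} x_0 + x_7 \\ x_1 + x_6 \\ x_2 + x_5 \\ x_3 + x_4 \end{bmatrix} \qquad (4)$$

  where $c_k = \cos(k\pi/16)$, and a = c1, b = c2, c = c3, d = c4, e = c5, f = c6, g = c7 are the cosine basis.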
  
  From equations (3) and (4), it can be seen that the DCT operation involves multiplying the same input values by various cosine coefficients. Hence, a substructure sharing technique is used to reduce the number of operators [6]. The cosine basis is quantized to 8 bits for energy efficiency. The cosine coefficients are represented as CSD numbers, which have fewer ones than the binary representation. The cosine basis is chosen up to four decimal places, and each value is represented as a 7-bit binary number. The number of bits has an impact on the quality of the system. The values of the cosine basis are shown in the table below. The stronger operator, multiplication, is reduced to simple shift and add operations by applying Horner's rule, which reduces the power consumption. For example, consider the cosine coefficients c and g: $c \cdot X = (2^5 + 2^4 + 2^2 + 1)X = (2^4 \cdot 3 + 5)X$ and $g \cdot X = (2^3 + 2^2)X = (2^2 \cdot 3)X$, so the common term they share is 3X. The common terms among the cosine basis are 1X, 3X, 5X, and -1X, and these are shared to compute the partial outputs.
  
  Table 1. Cosine Basis Set
  
  These blocks are termed precomputing units, and one unit is shown in the Figure. The intermediate results from the precomputing blocks are added in the final stage, yielding the DCT coefficients. The term 3A is constructed by expressing it as 3A = 1A + 2A = {1A + (1A<<1)}. Similarly, 5A can be expressed as {1A + (1A<<2)}.
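The sharing of precomputed terms can be illustrated with a short Python sketch. It assumes the coefficient c is quantised to the integer 53 = 2^5 + 2^4 + 2^2 + 1 and g to 12 = 2^3 + 2^2, so the products come out scaled by 64; the function names are hypothetical and the code models only the arithmetic, not the hardware structure.

```python
def precompute(x):
    """Shared partial products built from shifts and adds only."""
    x3 = x + (x << 1)   # 3X = 1X + 2X
    x5 = x + (x << 2)   # 5X = 1X + 4X
    return x3, x5


def times_c(x):
    """c*X with c quantised to 53 = 2^4 * 3 + 5 (result scaled by 64)."""
    x3, x5 = precompute(x)
    return (x3 << 4) + x5


def times_g(x):
    """g*X with g quantised to 12 = 2^2 * 3 (result scaled by 64)."""
    x3, _ = precompute(x)
    return x3 << 2


product_c = times_c(7)   # equals 53 * 7
product_g = times_g(7)   # equals 12 * 7
```

In hardware the 3X term would be computed once and routed to both products; the sketch recomputes it in each function only to keep the two routines independent.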
  - Multiplication is expensive in hardware
  - Decompose constant multiplications into shifts and additions
    - 13*X = (1101)2*X = X + (X<<2) + (X<<3)
    - Signed digits can reduce the number of additions/subtractions
    - Canonical Signed Digits (CSD)
    - (57)10 = (0111001)2 = (100-1001)CSD, i.e. 64 - 8 + 1, where -1 denotes a digit of value -1
    - Up to 50% reduction
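The CSD recoding in the list above can be reproduced with the small Python sketch below. It uses the standard non-adjacent-form recoding, which is one common way to obtain CSD digits and is not necessarily the procedure used by the authors; `to_csd` and `csd_value` are hypothetical helper names.

```python
def to_csd(n):
    """CSD digits of a non-negative integer, most significant first.

    Each digit is -1, 0 or +1, and no two adjacent digits are non-zero.
    """
    digits = []
    while n != 0:
        if n & 1:
            digit = 2 - (n & 3)   # +1 when n % 4 == 1, -1 when n % 4 == 3
            n -= digit
        else:
            digit = 0
        digits.append(digit)
        n >>= 1
    return digits[::-1] or [0]


def csd_value(digits):
    """Evaluate a digit list (most significant first) back to an integer."""
    value = 0
    for digit in digits:
        value = 2 * value + digit
    return value


csd_57 = to_csd(57)                              # 1 0 0 -1 0 0 1
nonzero_csd = sum(1 for d in csd_57 if d != 0)   # 3 non-zero digits
nonzero_bin = bin(57).count("1")                 # 4 ones in plain binary
```

Each non-zero digit corresponds to one shifted term to add or subtract, so 57*X needs three terms in CSD form against four in plain binary.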
Performance Results

The image is converted into pixel values using MATLAB, and the values are stored in a text file. The text file is read by ModelSim-Altera, and the corresponding 2-D DCT coefficients are calculated. These values are then fed to the IDCT module, which returns the spatial data sequence. These data are written to a text file, from which the image can be reconstructed using MATLAB.
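The DCT followed by IDCT round trip can be sketched in Python for a single 8 × 8 block. This is a floating-point reference model under assumptions: it uses the orthonormal DCT matrix and the row-column method (a 1-D DCT along one dimension, then along the other), and it leaves out the text-file exchange with MATLAB and ModelSim as well as any fixed-point quantisation. All names are hypothetical.

```python
import math

N = 8


def dct_matrix(n=N):
    """Orthonormal DCT matrix: entry (k, m) = scale_k * cos((2m+1) k pi / (2n))."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos((2 * m + 1) * k * math.pi / (2 * n))
                     for m in range(n)])
    return rows


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


C = dct_matrix()
CT = transpose(C)


def dct2(block):
    """2-D DCT by the row-column method: Z = C * X * C^T."""
    return matmul(matmul(C, block), CT)


def idct2(coeffs):
    """Inverse 2-D DCT: X = C^T * Z * C."""
    return matmul(matmul(CT, coeffs), C)


# An arbitrary 8x8 test block of 8-bit pixel values.
block = [[(17 * r + 31 * c) % 256 for c in range(N)] for r in range(N)]
coeffs = dct2(block)
restored = idct2(coeffs)
max_error = max(abs(restored[r][c] - block[r][c])
                for r in range(N) for c in range(N))
```

In floating point the reconstruction error stays at rounding level; a fixed-point hardware version would show a larger error that depends on the coefficient word length.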

Fig. 3. Simulated output

Table 2. Area comparison table

Fig. 4. Input and reconstructed image

Conclusion

In this paper, a reconfigurable binary image processor was proposed for real-time binary image processing applications. The processor combines a reconfigurable binary processing module, input and output image controller units, and peripheral circuits. The reconfigurable binary processing module has a mixed-grained architecture, which provides high efficiency and increases processor performance. The basic DCT application and mathematical morphology operations can be easily implemented on its simple structure. With its simple structure, high speed, and wide range of applications, the processor is suitable for binary image processing, which increases the efficiency of the system. The filter removes noise even at higher noise densities and preserves the edges and fine details. The performance of the filter is better than that of other filters of this type. The developed filters were tested on 180 × 180, 8-bits/pixel images at different noise levels, and the results were compared with a MATLAB implementation.

References

[1] Y. Liu and C. Pomalaza-Raez, "A low-complexity algorithm for the on-chip moment computation of binary images," in Proc. Int. Conf. Mechatron. Autom., 2009, pp. 1871-1876.
[2] E. C. Pedrino, O. Morandin, Jr., and V. O. Roda, "Intelligent FPGA based system for shape recognition," in Proc. 7th Southern Conf. Programmable Logic, 2011, pp. 197-202.
[3] M. F. Talu and I. Turkoglu, "A novel object recognition method based on improved edge tracing for binary images," in Proc. Int. Conf. Appl. Inform. Commun. Technol., 2009, pp. 1-5.
[4] A. J. Lipton, H. Fujiyoshi, and R. S. Patil, "Moving target classification and tracking from real-time video," in Proc. Workshop Appl. Comput. Vision, 1998, pp. 8-14.
[5] J. Kim, J. Park, K. Lee et al., "A portable surveillance camera architecture using one-bit motion detection," IEEE Trans. Consumer Electron., vol. 53, no. 4, pp. 1254-1259, Nov. 2007.
[6] D. J. Dailey, F. W. Cathey, and S. Pumrin, "An algorithm to estimate mean traffic speed using uncalibrated cameras," IEEE Trans. Intell. Transportation Syst., vol. 1, no. 2, pp. 98-107, Jun. 2000.
[7] T. Ikenaga and T. Ogura, "A fully parallel 1-Mb CAM LSI for real-time pixel-parallel image processing," IEEE J. Solid-State Circuits, vol. 35, no. 4, pp. 536-544, Apr. 2000.
[8] E. C. Pedrino, J. H. Saito, and V. O. Roda, "Architecture for binary mathematical morphology reconfigurable by genetic programming," in Proc. 6th Southern Programmable Logic Conf., 2010, pp. 93-98.
[9] M. R. Lyu, J. Song, and M. Cai, "A comprehensive method for multilingual video text detection, localization, and extraction," IEEE Trans. Circuit Syst. Video Technol., vol. 15, no. 2, pp. 243-255, Feb. 2005.
[10] W. Miao, Q. Lin, W. Zhang et al., "A programmable SIMD vision chip for real-time vision applications," IEEE J. Solid-State Circuits, vol. 43, no. 6, pp. 1470-1479, Jun. 2008.
[11] B. Zhang, K. Mei, and N. Zheng, "Reconfigurable processor for binary image processing," IEEE Trans. Circuits Syst. Video Technol., May 2013.
[12] K. Fujii, M. Nakanishi, S. Shigematsu et al., "A 500-dpi cellular-logic processing array for fingerprint-image enhancement and verification," in Proc. IEEE Custom Integr. Circuits Conf., May 2002, pp. 261-264.
[13] H. J. Park, K. B. Kim, J. H. Kim et al., "A novel motion detection pointing device using a binary CMOS image sensor," in Proc. IEEE Int. Symp. Circuits Syst., May 2007, pp. 837-840.
[14] M. Laiho, J. Poikonen, and A. Paasio, "Space-dependent binary image processing within a 64×64 mixed-mode array processor," in Proc. Eur. Conf. Circuit Theory Design, 2009, pp. 189-192.
[15] E. N. Malamas, A. G. Malamos, and T. A. Varvarigou, "Fast implementation of binary morphological operations on hardware-efficient systolic architectures," J. VLSI Signal Process., vol. 25, no. 1, pp. 79-93, 2000.
[16] J. Velten and A. Kummert, "Implementation of a high-performance hardware architecture for binary morphological image processing operations," in Proc. 47th IEEE Int. Midwest Symp. Circuits Syst., Jul. 2004, pp. 241-244.
[17] R. Dominguez-Castro, S. Espejo, A. Rodriguez-Vazquez et al., "A 0.8-µm CMOS 2-D programmable mixed-signal focal-plane array processor with on-chip binary imaging and instructions storage," IEEE J. Solid-State Circuits, vol. 32, no. 7, pp. 1013-1026, Jul. 1997.

BIOGRAPHIES

V. Balaji received the B.E. degree in Electronics and Communication Engineering from Sri Ramakrishna Engineering College, Coimbatore, in 2011. He is currently pursuing the M.E. degree in VLSI Design at Kalaignar Karunanidhi Institute of Technology, Coimbatore. His areas of interest are image processing and very-large-scale integration (VLSI) architecture design for embedded vision systems.

R. Sakthikumar received the B.E. degree in Electronics and Communication Engineering from Sri Subramanya College of Engineering and Technology, Palani, in 2011. He is currently pursuing the M.E. degree in VLSI Design at Sengunthar Engineering College, Tiruchengode. His areas of interest include image processing.
