- Open Access
- Total Downloads : 276
- Authors : Roshi Saxena, Sushil Bansal
- Paper ID : IJERTV2IS100518
- Volume & Issue : Volume 02, Issue 10 (October 2013)
- Published (First Online): 17-10-2013
- ISSN (Online) : 2278-0181
- Publisher Name : IJERT
- License: This work is licensed under a Creative Commons Attribution 4.0 International License
Framework for Improving Accuracy in Text Extraction from Natural Image
Roshi Saxena*, Sushil Bansal#
* Student (M.E., 2010-2013), Department of Computer Science, Chitkara University
# Assistant Professor & H.O.D., Department of CSE, Chitkara University
ABSTRACT
Text embedded in natural scene images contains a large amount of useful information. Extracting text from natural scene images is a well-known problem in the image processing area. Data that appear as text in natural scene images may differ in size, style, font, orientation, contrast, and background, which makes extracting the information with high accuracy an extremely challenging task. Natural scene text extraction with high accuracy is still an open problem. In this paper we present an algorithm and a graphical user interface to extract text from scene images with higher precision and recall rates.
KEYWORDS:
Text, Images, Natural, Accurate
-
INTRODUCTION
Today, much useful information is available as text present in natural images, e.g., brand names embedded in clothes, or text written on nameplates, signboards, etc. There should be some mechanism to extract this text from natural images. Recent studies propose methods to extract text from images, but these approaches did not work well for characters that are small in size. In this paper we present an algorithm that extracts small-sized characters and also works well with text present in noisy images.
We present a framework that extracts text from natural images with higher accuracy. It is tested on data from the ICDAR 2003 dataset.
-
PREVIOUS WORK
[1] Kim et al. presented a method that extracts text regions in natural scene images using low-level image features and verifies the extracted regions through a high-level text stroke feature; the two levels of features are then combined hierarchically. The low-level features are color continuity, gray-level variation, and color variance. [2] Shivananda V. Seeri, Ranjana B. Battur, and Basavaraj S. Sannakashappanavar presented a method to extract characters from natural scene images; their algorithm works well with medium-sized characters. [3] Xiaoqing Liu et al. proposed multiscale edge-based text extraction from complex images, a method which automatically detects and extracts text present in complex images using multiscale edge information. This method is robust with respect to font size, color, orientation, and alignment, and has good character extraction performance. [4] Nobuo Ezaki, Marius Bulacu, and Lambert Schomaker presented a text extraction method for blind persons. [5] Xu-Cheng Yin, Xuwang Yin, Kaizhu Huang, and Hong-Wei Hao presented robust text detection in natural scene images; a fast and effective pruning algorithm extracts Maximally Stable Extremal Regions (MSERs) as character candidates using a strategy of minimizing regularized variations. [6] Yang et al. addressed the problem of automatically recognizing and translating signs.
-
PROPOSED ALGORITHM
The text extraction method used in our algorithm is an edge-based method combined with a reverse edge-based method. Our method converts the RGB image into the HSI plane to extract the exact color of the text from the image. To implement the method, we have prepared a graphical user interface that follows these steps:
-
Load the input image and separate the R, G, B channels
-
Convert RGB image into HSI image
-
Detect edges by applying the Sobel operator to the image in the following way:
a) Apply the Sobel horizontal kernel to get the horizontal gradient image.
b) Apply the Sobel vertical kernel to get the vertical gradient image.
c) Find the magnitude of the gradient image.
-
Apply Otsu's thresholding to binarize the image.
-
Apply connected-component analysis to find text in the image.
-
Apply filtering to the text to remove false objects.
-
Test on the database to calculate the accuracy.
-
The advantage of our method is that the graphical user interface also extracts small-sized characters with higher accuracy. A sketch of the full pipeline is given below.
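To make the listed steps concrete, the following is a minimal sketch of the pipeline in Python with OpenCV and NumPy. It is our illustrative approximation, not the authors' GUI code; the grayscale intensity channel and the area threshold used for filtering are assumptions (a true RGB-to-HSI conversion is sketched in the implementation section).

```python
import cv2
import numpy as np

def extract_text_regions(path, min_area=20):
    """Illustrative sketch of the proposed pipeline (not the authors' code)."""
    bgr = cv2.imread(path)                                  # step 1: load image
    b, g, r = cv2.split(bgr)                                #         separate channels
    intensity = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)       # step 2: intensity plane
    gx = cv2.Sobel(intensity, cv2.CV_64F, 1, 0, ksize=3)    # step 3a: horizontal gradient
    gy = cv2.Sobel(intensity, cv2.CV_64F, 0, 1, ksize=3)    # step 3b: vertical gradient
    mag = cv2.convertScaleAbs(np.hypot(gx, gy))             # step 3c: gradient magnitude
    _, binary = cv2.threshold(mag, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)  # step 4: Otsu
    n, labels, stats, _ = cv2.connectedComponentsWithStats(binary)  # step 5: components
    # step 6: filter false objects; the area criterion is an assumption
    keep = [i for i in range(1, n) if stats[i, cv2.CC_STAT_AREA] >= min_area]
    return labels, keep
```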
-
Implementation
-
Load Input image from the database
First, the input image is loaded from the database. After loading, the image is separated into its R, G, and B channels.
-
Conversion of RGB image into HSI plane
The RGB model is used to display images; in the RGB color model, each color appears in its primary spectral components of red, green, and blue.
The HSI color model is used to process images. Hue captures the dominant color perceived by an observer. Saturation captures the relative purity, i.e., the amount of white light mixed with a hue. Intensity measures the brightness at different points, i.e., the total amount of light passing through a particular area. Hue, saturation, and intensity are three important descriptors used in describing colors.
-
Hue describes a pure color (e.g., pure red, pure yellow, pure green).
-
Saturation measures the degree to which a pure color is diluted by white light.
-
Intensity is the gray-level value of the color.
-
Hue and saturation carry the chrominance (chromatic) information, while intensity carries the gray-level luminance (achromatic) information.
Converting Color from RGB to HSI: To convert an RGB image into HSI, the following formulas are used:

$$H = \begin{cases} \theta, & B \le G \\ 360^\circ - \theta, & B > G \end{cases} \qquad \theta = \cos^{-1}\left\{ \frac{\tfrac{1}{2}\left[(R-G)+(R-B)\right]}{\left[(R-G)^{2}+(R-B)(G-B)\right]^{1/2}} \right\}$$

$$S = 1 - \frac{3}{R+G+B}\,\min(R,G,B) \qquad I = \frac{1}{3}(R+G+B)$$

[Figure: Image being converted into HSI]
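A minimal NumPy sketch of these formulas, assuming RGB values normalized to [0, 1] (this sketch is ours, not the authors' implementation):

```python
import numpy as np

def rgb_to_hsi(rgb):
    """Convert an RGB image (float array in [0, 1]) to HSI using the formulas above."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    eps = 1e-8  # guard against division by zero

    # theta = arccos( 0.5*[(R-G)+(R-B)] / [(R-G)^2 + (R-B)(G-B)]^(1/2) )
    num = 0.5 * ((r - g) + (r - b))
    den = np.sqrt((r - g) ** 2 + (r - b) * (g - b)) + eps
    theta = np.degrees(np.arccos(np.clip(num / den, -1.0, 1.0)))

    h = np.where(b <= g, theta, 360.0 - theta)                 # hue in degrees
    s = 1.0 - 3.0 * np.minimum(np.minimum(r, g), b) / (r + g + b + eps)
    i = (r + g + b) / 3.0
    return np.dstack([h, s, i])
```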
-
Edge Detection
Edges characterize boundaries in images. Edges are areas with strong intensity contrast, i.e., a jump in intensity from one pixel to the next. Detecting edges reduces the amount of data and filters out useless information while preserving the important structural properties of an image. The Sobel operator is applied to the HSI image to detect the edges.
The Sobel operator is a discrete differentiation operator that performs a 2-D spatial gradient measurement on an image, computing an approximation of the gradient of the image intensity function. It is based on convolving the image with a small, separable, integer-valued filter in the horizontal and vertical directions and is therefore relatively inexpensive in terms of computation. The operator uses two 3×3 kernels which are convolved with the original image to calculate approximations of the derivatives, one for horizontal changes and one for vertical. If we define A as the source image, and Gx and Gy as two images which at each point contain the horizontal and vertical derivative approximations, the computations are as follows:

$$G_x = \begin{bmatrix} -1 & 0 & +1 \\ -2 & 0 & +2 \\ -1 & 0 & +1 \end{bmatrix} * A \qquad G_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ +1 & +2 & +1 \end{bmatrix} * A$$

where $*$ denotes the 2-dimensional convolution operation. The magnitude of the gradient is then calculated using the formula:

$$|G| = \sqrt{G_x^{2} + G_y^{2}}$$

[Figure: Image after edge detection]
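As a sketch, the two kernels and the magnitude computation can be written with SciPy's convolve2d (our illustration; the kernel signs follow the standard Sobel convention):

```python
import numpy as np
from scipy.signal import convolve2d

def sobel_magnitude(intensity):
    """Apply the two Sobel kernels to a 2-D intensity image and
    return the gradient magnitude |G| = sqrt(Gx^2 + Gy^2)."""
    kx = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]])   # horizontal changes
    ky = kx.T                     # vertical changes
    gx = convolve2d(intensity, kx, mode="same", boundary="symm")
    gy = convolve2d(intensity, ky, mode="same", boundary="symm")
    return np.hypot(gx, gy)
```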
-
Image Binarization
A binary image is an image that uses only two values for its pixels (0 and 1). To binarize our image, we use Otsu's method, which assumes that the image to be thresholded contains two classes of pixels, i.e., foreground and background, and then calculates the optimum threshold separating those two classes so that their combined spread is minimal.
Otsu's thresholding method involves iterating through all the possible threshold values and calculating a measure of spread for the pixel levels each side of the threshold, i.e. the pixels that either fall in foreground or background. The aim is to find the threshold value where the sum of foreground and background spreads is at its minimum.
[Figure: Image after binarization]
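A compact sketch of this search over a 256-bin histogram; it maximizes the between-class variance, which is equivalent to minimizing the combined within-class spread described above (this is our illustration):

```python
import numpy as np

def otsu_threshold(gray):
    """Return the Otsu threshold for an 8-bit grayscale image."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    p = hist / hist.sum()
    best_t, best_between = 0, 0.0
    for t in range(1, 256):                      # iterate all candidate thresholds
        w0, w1 = p[:t].sum(), p[t:].sum()        # class probabilities
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (np.arange(t) * p[:t]).sum() / w0          # background mean
        mu1 = (np.arange(t, 256) * p[t:]).sum() / w1     # foreground mean
        between = w0 * w1 * (mu0 - mu1) ** 2             # between-class variance
        if between > best_between:
            best_between, best_t = between, t
    return best_t

# usage: binary = (gray >= otsu_threshold(gray)).astype(np.uint8)
```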
-
Dilation and Extraction of connected Components of text
To compute the dilation, we superimpose the structuring element on the input image so that the origin of the structuring element coincides with the input pixel position. If at least one pixel in the structuring element coincides with a foreground pixel in the image underneath, then the input pixel is set to the foreground value. In this way we can extract the components which are connected to each other.
[Figure: Dilated image]
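A brief sketch of this step using SciPy's morphology routines; the 3×3 structuring element is an assumption, since the paper does not specify its size:

```python
import numpy as np
from scipy import ndimage

def text_candidates(binary):
    """Dilate the binary edge map, then label connected components
    as candidate text regions (structuring element size assumed)."""
    dilated = ndimage.binary_dilation(binary, structure=np.ones((3, 3)))
    labels, n = ndimage.label(dilated)
    return dilated, labels, n   # each label 1..n is a candidate component
```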
-
Filtration by removing Noise
The image is reversed after dilation and extraction of connected components. An OR operation is then applied to the original image and the reversed image to remove the noise.
[Figure: Final image]
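A sketch of this step as we read it: the dilated image is inverted (reversed) and OR-ed with the original binary image. The exact choice of operands is our interpretation of the description above:

```python
import numpy as np

def filter_noise(binary, dilated):
    """Invert (reverse) the dilated image and OR it with the original
    binary image to suppress isolated noise pixels, as described above.
    The operand choice is our interpretation of the paper."""
    reversed_img = np.logical_not(dilated)
    return np.logical_or(binary, reversed_img)
```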
-
EXPERIMENTAL RESULTS
We conducted the following tests on the framework we prepared; accuracy was then determined by calculating the precision and recall rates.
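The paper does not state how precision and recall are computed; for reference, the standard definitions used in text-extraction evaluation are:

$$\text{Precision} = \frac{\lvert\text{correctly extracted text objects}\rvert}{\lvert\text{all extracted objects}\rvert} \qquad \text{Recall} = \frac{\lvert\text{correctly extracted text objects}\rvert}{\lvert\text{ground-truth text objects}\rvert}$$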
Tests 1-6
[Figures for each test: RGB image; image after edge detection; binary image; image after dilation; edge-detected image; reverse edge image; final image]
5.1 Test Results

Test      Precision Rate (%)   Recall Rate (%)
Test 1    100                  100
Test 2    98.0769              98.0769
Test 3    100                  100
Test 4    100                  100
Test 5    100                  100
Test 6    98.1818              98.0769
Overall   99.38                99.34

5.2 Comparison with other Methods
The method proposed in this paper was compared with existing text extraction algorithms, and the following results were obtained.

Method               Precision Rate (%)   Recall Rate (%)
Proposed Algorithm   99.38                99.34
Shivanand S. Seeri   98.46                97.83
Samarabandu          91.8                 96.6
J. Gllavata          83.9                 88.7
Wang                 89.8                 92.1
K.C. Kim             63.7                 82.8
J. Yang              84.90                90.0

The comparison shows that the proposed method performs better than the existing methods and extracts small-sized characters with higher accuracy.
-
CONCLUSION AND FUTURE SCOPE
In this paper we have presented an approach that extracts characters from natural scene images with higher accuracy, precision, and recall rates. The algorithm was tested on small-sized characters and works well with them. A limitation of the algorithm is that it does not work well with character images that are blurred. Future work involves extracting text characters from blurred images with higher accuracy.
-
REFERENCES
-
Kim K.C., Byun H.R., Song Y.W., Chi S.Y., Kim K.K., Chung Y.K., "Scene Text Extraction in Natural Scene Images Using Hierarchical Feature Combining and Verification," Proc. 17th International Conference on Pattern Recognition (ICPR 2004), Vol. 2, pp. 679-682.
-
Shivananda V. Seeri, Ranjana B. Battur, Basavaraj S. Sannakashappanavar, "Text Extraction from Natural Scene Images," International Journal of Advanced Research in Electronics and Communication Engineering, Vol. 1, October 2012.
-
Xiaoqing Liu and Jagath Samarabandu, "Multiscale Edge-Based Text Extraction from Complex Images," IEEE, 2006.
-
Nobuo Ezaki, Marius Bulacu, Lambert Schomaker, "Text Detection from Natural Scene Images: Towards a System for Visually Impaired Persons," Proc. 17th Int. Conf. on Pattern Recognition (ICPR 2004), IEEE Computer Society, Vol. II, pp. 683-686, 23-26 August 2004, Cambridge, UK.
-
Xu-Cheng Yin, Xuwang Yin, Kaizhu Huang, Hong-Wei Hao, "Robust Text Detection in Natural Scene Images," IEEE Xplore, June 2013.
-
J. Yang, J. Gao, Y. Zhang, X. Chen, and A. Waibel, "An Automatic Sign Recognition and Translation System," Proceedings of the Workshop on Perceptive User Interfaces (PUI'01), 2001, pp. 1-8.
-
A.K. Jain, Fundamentals of Digital Image Processing, Englewood Cliffs, NJ: Prentice Hall, 1989, Ch. 9.
-
R.C. Gonzalez, Digital Image Processing Using MATLAB
-
N. Otsu, "A Threshold Selection Method from Gray-Level Histograms," IEEE Trans. Systems, Man, and Cybernetics, Vol. 9, 1979, pp. 62-66.
-
S.M. Lucas, A. Panaretos, L. Sosa, A. Tang, S. Wong, and R. Young, "ICDAR 2003 Robust Reading Competitions," Proc. of the ICDAR, 2003, pp. 682-687.