- Open Access
- Authors : W. Josephin, Dr. R. K. Selvakumar
- Paper ID : IJERTV2IS70308
- Volume & Issue : Volume 02, Issue 07 (July 2013)
- Published (First Online): 12-07-2013
- ISSN (Online) : 2278-0181
- Publisher Name : IJERT
- License: This work is licensed under a Creative Commons Attribution 4.0 International License
An Edge Based Text Segmentation From Complex Images
W. Josephin
Dept. of Computer Science,
St. Johns College, Palayamkottai, Tamil Nadu
Dr. R. K. Selvakumar
Dept. of Computer Science & Engineering,
Cape Institute of Technology, Levengipuram, Tirunelveli, Tamil Nadu
Abstract
Text in images is a significant cue for visual content understanding and retrieval, and the detection and extraction of text in images is used in many applications. This paper proposes an edge-based text segmentation algorithm that is robust with respect to the font size, style, color/intensity, orientation and alignment of text. The method can quickly and effectively localize and extract text regions from real scenes, and can be used in a wide variety of application fields, such as vehicle license plate detection and recognition, document retrieval and page segmentation.
Keywords: text extraction, scene text, thresholding.
1. Introduction
Text appears in images either in document form, such as scanned CD/book covers, or in video images. Video text can be broadly classified into two categories: overlay text and scene text. Scene text occurs naturally as part of the scene, such as text on information boards/signs, nameplates, food containers, etc. Automatic detection and extraction of text in images is used in many applications. Scene text extraction can be used in mobile robot navigation to detect text-based landmarks, in vehicle license plate detection/recognition, in object identification, etc.
Text embedded in images carries a large amount of useful semantic information that can be used to fully understand images. Among the several textural properties of an image, edge-based methods focus on the high contrast between the text and the background. The edges of the text boundary are identified and merged, and then several heuristics are used to filter out the non-text regions. Usually, an edge filter (e.g., the Canny operator) is used for edge detection, and a smoothing operation or a morphological operator is used for the merging stage.
Yassin et al. [1] presented a morphological approach for text extraction. The RGB components of a color input image are combined to give an intensity image Y as follows: Y = 0.299 R + 0.587 G + 0.114 B,
where R, G, and B are the red, green and blue components respectively. Although this approach is simple and many researchers have adopted it to deal with color images, it has difficulties dealing with objects that have similar gray scale values, yet different colors in a color space. After the color conversion, the edges are identified using a morphological gradient operator. The resulting edge image is then thresholded to obtain a binary edge image. Adaptive thresholding is performed for each candidate region in the intensity image, which is less sensitive to illumination conditions and reflections. Edges that are spatially close are
grouped by dilation to form candidate regions, while small components are removed by erosion. Non-text components are filtered out using size, thickness, aspect ratio, and gray level homogeneity.
Wumo et al. [2] proposed a sparse-representation-based method for text detection in scene images. Edge information is extracted using the Canny operator, and the edge points are grouped into connected components. Each connected component is labeled as text or non-text by a two-level labeling process. The core of the labeling process is a sparsity test using an over-complete dictionary, which is learned from edge segments of isolated character images. Layout analysis is further applied to verify the text candidates.
Wang et al. [3] proposed a connected-component (CC) based method that combines color clustering, a black adjacency graph (BAG), an aligning-and-merging-analysis (AMA) scheme and a set of heuristic rules to detect text for sign recognition applications such as street indicators and billboards. As the authors note, uneven reflectance results in incomplete character segmentation, which increases the false alarm rate.
Kim et al. [4] combined a Support Vector Machine (SVM) and the continuously adaptive mean shift algorithm (CAMSHIFT) to detect and identify text regions. Gao et al. [9] developed a three-layer hierarchical adaptive text detection algorithm for natural scenes. This method was applied in a prototype Chinese sign translation system, which mostly deals with horizontal and/or vertical alignments.
X. Liu et al. [5] proposed a statistics-based method to detect and localize text by calculating the spatial intensity variation. This method is very simple and fast. However, in real scenes, due to uneven illumination, reflections and shadows, an image background may contain areas with high spatial intensity variation that do not contain text. X. Liu and J. Samarabandu [6] developed a single-scale edge-based text region extraction algorithm for indoor scene images, which is robust with respect to font sizes, styles, color/intensity, orientations, effects of illumination, reflections, shadows and perspective distortion. X. Liu et al. [7] developed a multiscale edge-based text extraction algorithm, which can localize and extract text from both document images and indoor/outdoor scene images.
In this paper we propose a single-scale edge-based text segmentation method that can quickly and effectively localize and extract scene text, especially from outdoor scene images and object label images.
2. Proposed Method
Edges are a reliable feature of text regardless of color/intensity, layout, orientations etc. Edge strength and density are two distinguishing characteristics of text embedded in images, which can be used as main features for detecting scene text. Characters are made of strokes with different orientations, which can also be interpreted as edges with different orientations. Thus, variance information of the edge orientation is also an important feature of text. The proposed method uses the edge density, strength and orientation variance to extract text from real scenes.
The block diagram of the proposed text segmentation algorithm is given in Figure 2.1. As the diagram shows, a color image enters the system as input, and the segmented text on a clear black background is the output.
Input Color Image → Preprocessing → Directional Filtering → Edge Selection → Feature Map Generation → Feature Clustering → Heuristic Filtering → Boundary Boxes Generation → Text Extraction
Figure 2.1: Block Diagram of the proposed Text Segmentation Algorithm
2.1. Preprocessing
If the input image is a color image, its RGB components are combined to give the intensity image Y as follows:
Y = 0.299R + 0.587G + 0.114B
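For illustration, a minimal NumPy sketch of this conversion (the function name is ours):

```python
import numpy as np

def rgb_to_intensity(image: np.ndarray) -> np.ndarray:
    """Combine the R, G, B planes of an H x W x 3 image into the
    intensity image Y = 0.299R + 0.587G + 0.114B."""
    r, g, b = image[..., 0], image[..., 1], image[..., 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    # The weights sum to 1, so Y stays within the 8-bit range.
    return y.astype(np.uint8)
```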
2.2. Candidate Text Region Detection
In this stage a feature map is built using three important properties of edges: edge strength, density and variance of orientations. The feature map is a grayscale image of the same size as the input image, in which the pixel intensity represents the likelihood of text.
2.2.1. Directional Filtering
The magnitude of the second-order derivative of intensity is used as a measure of edge strength, as this allows better detection of the intensity peaks that normally characterize text in images. The edge density is calculated from the average edge strength within a window. Four orientations (0°, 45°, 90°, 135°) are used to evaluate the variance of orientations: 0° denotes the horizontal orientation, 90° the vertical orientation, and 45° and 135° the two diagonal directions. After convolving the image with the compass operator [8] shown in Fig. 2.2, we obtain four oriented edge intensity images Eθ (θ = 0°, 45°, 90°, 135°), which contain all the edge properties required by our method.
0° kernel:          45° kernel:
 -1 -1 -1            -1 -1  2
  2  2  2            -1  2 -1
 -1 -1 -1             2 -1 -1

90° kernel:         135° kernel:
 -1  2 -1             2 -1 -1
 -1  2 -1            -1  2 -1
 -1  2 -1            -1 -1  2

Figure 2.2: Compass Operator
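As an illustration of this step, the sketch below convolves the intensity image with the four kernels of Figure 2.2 to produce the oriented edge intensity images Eθ; taking the absolute value of the response as the edge-strength magnitude is our assumption:

```python
import numpy as np
from scipy.ndimage import convolve

# The four 3x3 compass kernels of Fig. 2.2 (0, 45, 90 and 135 degrees).
KERNELS = {
    0:   np.array([[-1, -1, -1], [ 2,  2,  2], [-1, -1, -1]]),
    45:  np.array([[-1, -1,  2], [-1,  2, -1], [ 2, -1, -1]]),
    90:  np.array([[-1,  2, -1], [-1,  2, -1], [-1,  2, -1]]),
    135: np.array([[ 2, -1, -1], [-1,  2, -1], [-1, -1,  2]]),
}

def directional_edges(y: np.ndarray) -> dict:
    """Convolve the intensity image with each compass kernel and
    return the four oriented edge intensity images E_theta."""
    y = y.astype(np.float64)
    return {theta: np.abs(convolve(y, k)) for theta, k in KERNELS.items()}
```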
2.2.2. Edge Selection
Vertical edges form the most important strokes of characters, and their lengths reflect the heights of the corresponding characters. By extracting and grouping these strokes, we can locate text of different heights (sizes). Some non-character objects in real scenes also produce strong vertical edges, and these have very large lengths. By grouping vertical edges into long and short edges, we can eliminate vertical edges of extremely long length, while short edges are retained for further processing.
After thresholding, the long vertical edges may become broken short edges which may cause false alarms. In order to eliminate the false grouping, we use a two stage edge generation method.
The first stage is used to obtain strong vertical edges as follows.
Edge90bw_strong = |E90|z    (1)

where E90 is the 90° edge intensity image, i.e., the 2D convolution of the original image with the 90° kernel, and |·|z is a thresholding operator that yields a binary map of the vertical edges.
The second stage is used to obtain the weak vertical edges, Edge90bw_weak, as follows.
A morphological dilation operation is performed on the strong vertical edges obtained in the first stage. Then a closing operator with an appropriate structuring element is applied to the resulting vertical edges. The difference between the vertical edges obtained in these two steps is then computed, and the following operation yields the weak vertical edges:

Edge90bw_weak = |E90 × (closed − dilated)|z    (2)
The resultant vertical edges of the two-stage edge generation method are the combination of the strong and weak edges, as described below.
Edge90bw = Edge90bw_strong + Edge90bw_weak    (3)
A morphological thinning operator followed by a connected component labeling and analysis algorithm is then applied to the resultant vertical edges:

thinned = Thinning(Edge90bw)
labeled = BWLabel(thinned, 4)
After the connected component labeling, each edge is uniquely labeled as a single connected component with its unique component number. The labeled edge image is processed with a length labeling process as described in [6]. As a result, all the pixels belonging to the same edge are labeled with the same number which is proportional to its length. A high value in the length labeled image represents a long edge. Therefore a simple thresholding is used to separate the short edges.
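A rough Python/OpenCV sketch of this two-stage edge generation and short-edge selection is given below. The thresholds and structuring-element sizes are illustrative (the paper does not state them), the thinning step is omitted (OpenCV's core API has no thinning operator), and component height is used as a simple stand-in for the length labeling process of [6]:

```python
import cv2
import numpy as np

def short_vertical_edges(e90, strong_t=80, max_len=40):
    """Two-stage vertical edge generation followed by length-based
    filtering; all parameter values are illustrative."""
    # Stage 1: strong vertical edges by global thresholding (Eq. 1).
    strong = (e90 > strong_t).astype(np.uint8)
    # Stage 2: weak edges from the difference of the closed and dilated
    # strong-edge maps, masked by E90 and thresholded (Eq. 2).
    kernel = np.ones((3, 3), np.uint8)
    dilated = cv2.dilate(strong, kernel)
    closed = cv2.morphologyEx(dilated, cv2.MORPH_CLOSE,
                              np.ones((7, 7), np.uint8))
    weak = ((closed - dilated) * (e90 > strong_t / 2)).astype(np.uint8)
    edges = np.clip(strong + weak, 0, 1)  # Eq. 3
    # Label each edge and keep only the short ones; component height
    # stands in for the paper's length-labeling step.
    n, labels, stats, _ = cv2.connectedComponentsWithStats(edges,
                                                           connectivity=4)
    keep = np.zeros_like(edges)
    for i in range(1, n):
        if stats[i, cv2.CC_STAT_HEIGHT] < max_len:
            keep[labels == i] = 1
    return keep
```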
2.2.3. Feature Map Generation
Regions containing text have significantly higher values of average edge density, strength and variance of orientations than non-text regions. We generate a feature map that suppresses false regions and enhances true candidate text regions. The feature map is generated using the following procedure.
DilCandidate = Dilation(short90bw)m×m
refinedθ = DilCandidate × Eθ,  θ = 0°, 45°, 90°, 135°
fmap(i,j) = N{ Σθ Σm=−c..c Σn=−c..c refinedθ(i+m, j+n) × weight(i,j) }
N is a normalization operation that scales the intensity of the feature map into the range [0, 255], and weight(i,j) is a weight function that determines the weight of pixel (i,j) based on the number of edge orientations within a window.
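A simplified sketch of this procedure follows; since the paper does not give the window size c or the exact weight function, the sketch pools with a local mean over a small window and counts the number of responding orientations as the weight:

```python
import numpy as np
from scipy.ndimage import binary_dilation, uniform_filter

def feature_map(short90, edges, win=7):
    """Dilate the short vertical edges, mask each oriented edge image
    E_theta with the result, pool the masked responses over a local
    window, weight by the number of responding orientations and
    normalize to [0, 255]."""
    candidate = binary_dilation(short90 > 0, np.ones((win, win)))
    refined = {t: e * candidate for t, e in edges.items()}
    # Local mean of the refined responses, summed over orientations
    # (the constant window factor is absorbed by the normalization).
    pooled = sum(uniform_filter(r, size=win) for r in refined.values())
    # Weight: number of orientations with any response in the window.
    weight = sum((uniform_filter((r > 0).astype(float), size=win) > 0)
                 .astype(float) for r in refined.values())
    fmap = pooled * weight
    return np.uint8(255 * fmap / (fmap.max() + 1e-9))
```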
2.3. Text Region Localization
This stage involves three steps: feature clustering, heuristic filtering and boundary boxes generation.
2.3.1. Feature Clustering
Normally, text embedded in an image appears in clusters, so clustering characteristics can be used to localize text regions. The intensity of the feature map represents the likelihood of text. Therefore, a simple global thresholding is employed to highlight the regions with a high likelihood of text, resulting in a binary image. A morphological dilation is then performed on this binary image to obtain the text blobs.
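A minimal sketch of the clustering step, with an illustrative threshold and kernel size:

```python
import cv2
import numpy as np

def text_blobs(fmap, t=100, blob_kernel=(5, 5)):
    """Global thresholding of the feature map followed by dilation to
    merge nearby high-response pixels into text blobs."""
    _, binary = cv2.threshold(fmap, t, 255, cv2.THRESH_BINARY)
    return cv2.dilate(binary, np.ones(blob_kernel, np.uint8))
```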
2.3.2. Heuristic Filtering
A connected component labeling algorithm is employed on the text blobs obtained from the previous step. Then two constraints are used to filter out those blobs which do not contain text. The first constraint is used to filter out all the very small isolated blobs as described below.
Area_region ≥ (1/15) × Area_max
The second constraint filters out blobs whose widths are much smaller than their heights:

Ratio_w/h = Width_region / Height_region ≥ 0.2
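The two constraints might be applied to labeled blobs as follows (a sketch; OpenCV's component statistics stand in for the paper's connected component analysis):

```python
import cv2
import numpy as np

def filter_blobs(blobs):
    """Keep blobs at least 1/15 the area of the largest blob and with
    width/height ratio of at least 0.2."""
    n, labels, stats, _ = cv2.connectedComponentsWithStats(blobs)
    max_area = stats[1:, cv2.CC_STAT_AREA].max() if n > 1 else 0
    kept = [i for i in range(1, n)
            if stats[i, cv2.CC_STAT_AREA] >= max_area / 15
            and stats[i, cv2.CC_STAT_WIDTH] / stats[i, cv2.CC_STAT_HEIGHT] >= 0.2]
    return kept, labels, stats
```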
2.3.3. Boundary Boxes Generation
The retained blobs are enclosed in boundary boxes. The coordinates of each box are obtained from the minimum and maximum coordinates (top-left and bottom-right) of the corresponding blob.
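Continuing the previous sketch, the boxes can be read directly from the component statistics, which already record each blob's extreme coordinates:

```python
import cv2

def bounding_boxes(kept, stats):
    """Boundary box (x0, y0, x1, y1) of each retained blob from its
    minimum and maximum extents."""
    return [(stats[i, cv2.CC_STAT_LEFT], stats[i, cv2.CC_STAT_TOP],
             stats[i, cv2.CC_STAT_LEFT] + stats[i, cv2.CC_STAT_WIDTH],
             stats[i, cv2.CC_STAT_TOP] + stats[i, cv2.CC_STAT_HEIGHT])
            for i in kept]
```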
2.4. Text Extraction
The corresponding regions (blobs) are taken from the original grayscale image. Finally, an adaptive threshold is applied to these regions, which segments the real text regions from the image.
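A sketch of the extraction step using a local-mean adaptive threshold; the block size and offset are our assumptions, and THRESH_BINARY_INV yields the segmented text on a clear black background as described for Figure 2.1:

```python
import cv2
import numpy as np

def extract_text(gray, boxes, block=15, offset=5):
    """Binarize each boxed region of the grayscale image with an
    adaptive (local mean) threshold; the paper does not state its
    parameter values."""
    out = np.zeros_like(gray)  # clear black background
    for x0, y0, x1, y1 in boxes:
        region = gray[y0:y1, x0:x1]
        out[y0:y1, x0:x1] = cv2.adaptiveThreshold(
            region, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
            cv2.THRESH_BINARY_INV, block, offset)
    return out
```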
3. Experimental Results and Discussion
Our algorithm is currently implemented in the IDL language under Windows XP. Experiments were carried out on two types of images: outdoor scene images and object label images. Each image is in BMP or JPEG format. Text in the images varies in font size, color, orientation, alignment and perspective projection, under different lighting conditions.
In order to evaluate the performance of the proposed method, we use 45 test images. The results of the proposed algorithm when run on some typical images are shown below.
Figures 3.1–3.4: Outdoor scene images; Figures 3.5–3.7: Object label images. Each figure shows (a) the original image, (b) the real text regions and (c) the segmented text.
The accuracy of the algorithm's output is computed by manually counting the number of correctly located text blocks, which are regarded as ground truth. Precision rate and recall rate are used to evaluate the performance.
Precision = CorrectlyLocated / (CorrectlyLocated + FalsePositive) × 100

Recall = CorrectlyLocated / (CorrectlyLocated + FalseNegative) × 100
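For completeness, the two rates computed directly from the counts:

```python
def precision_recall(correctly_located, false_positives, false_negatives):
    """Precision and recall rates (%) from manually counted blocks."""
    precision = 100.0 * correctly_located / (correctly_located + false_positives)
    recall = 100.0 * correctly_located / (correctly_located + false_negatives)
    return precision, recall
```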
Table I shows the precision and recall rates of the proposed method alongside those of existing methods. The proposed method performs well overall, which indicates that it is efficient at extracting text from outdoor scene images and object label images.
Table I: Performance Comparison

| Method | No. of Test Images | No. of Text Blocks | No. Correctly Located | Precision Rate (%) | Recall Rate (%) |
|---|---|---|---|---|---|
| Proposed Method | 45 | 184 | 175 | 94.1 | 97.2 |
| X. Liu et al. [7] | 75 | 75 | – | 91.8 | 96.6 |
| J. Samarabandu et al. [6] | 25 | 208 | 201 | 91.8 | 96.6 |
| Wang et al. [3] | 325 | 3597 | 3314 | 89.8 | 92.1 |
| Kim et al. [4] | – | 839 | 645 | 63.7 | 82.8 |
| Xi et al. [10] | 90 | 244 | 231 | 88.5 | 94.7 |
| Wolf et al. [12] | 60 | 371 | – | – | 93.5 |
| Gao et al. [9] | – | 823 | – | 89.9 | 93.3 |
| Gllavata et al. [11] | 326 | 1104 | 979 | 83.9 | 88.7 |

In Table II, we provide another evaluation using the false positive rate, defined as follows.
FalsePositiveRate = FalsePositive / (CorrectlyLocated + FalsePositive) × 100
Table II: False Positive Rate Comparison

| Method | Image Type | False Positive Rate (%) |
|---|---|---|
| Proposed Method | Outdoor scene and object label text | 6.1 |
| J. Samarabandu et al. [6] | Indoor scene text | 5.0 |
| Wang et al. [3] | Outdoor scene text | 10.5 |
| Xi et al. [10] | Text captions | 12.3 |
| Gllavata et al. [11] | Overlay text | 17.0 |
4. Conclusions
In this paper, a single-scale edge-based segmentation algorithm is proposed that can automatically detect and extract text from outdoor scene images and object label images. The method is robust with respect to the font size, style, color/intensity, orientation and alignment of text. The experimental results show that the proposed method is effective and efficient at extracting text regions from complex images.
References
[1] Yassin M. Y. Hasan and Lina J. Karam, "Morphological text extraction from images," IEEE Transactions on Image Processing, vol. 9, no. 11, November 2000.
[2] Wumo Pan, T. D. Bui and C. Y. Suen, "Text detection from scene images using sparse representation," in Proc. 19th International Conference on Pattern Recognition (ICPR 2008), Dec. 2008, pp. 1–5.
[3] Kongqiao Wang and Jari A. Kangas, "Character location in scene images from digital camera," Pattern Recognition, vol. 36, no. 10, pp. 2287–2299, 2003.
[4] K. C. Kim, H. R. Byun, Y. J. Song, Y. M. Choi, S. Y. Chi, K. K. Kim and Y. K. Chung, "Scene text extraction in natural scene images using hierarchical feature combining and verification," in Proc. 17th International Conference on Pattern Recognition (ICPR 2004), Aug. 2004, vol. 2, pp. 679–682.
[5] X. Liu and J. Samarabandu, "A simple and fast text localization algorithm for indoor mobile robot navigation," in Proc. SPIE-IS&T Electronic Imaging, San Jose, California, USA, Jan. 2005, vol. 5672, pp. 139–150.
[6] X. Liu and J. Samarabandu, "An edge-based text region extraction algorithm for indoor mobile robot navigation," in Proc. IEEE International Conference on Mechatronics and Automation (ICMA 2005), Niagara Falls, Canada, July 2005, pp. 701–706.
[7] X. Liu et al., "Multiscale edge-based text extraction from complex images," in Proc. IEEE International Conference on Multimedia and Expo (ICME 2006), 2006.
[8] A. K. Jain, Fundamentals of Digital Image Processing, Englewood Cliffs, NJ: Prentice Hall, 1989, ch. 9, pp. 356–357.
[9] Jiang Gao and Jie Yang, "An adaptive algorithm for text detection from natural scenes," in Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001), 2001, pp. II-84–II-89.
[10] Jie Xi, Xian-Sheng Hua, Xiang-Rong Chen, Liu Wenyin and Hong-Jiang Zhang, "A video text detection and recognition system," in Proc. IEEE International Conference on Multimedia and Expo (ICME 2001), 2001, pp. 873–876.
[11] J. Gllavata, R. Ewerth and B. Freisleben, "A robust algorithm for text detection in images," in Proc. 3rd International Symposium on Image and Signal Processing and Analysis (ISPA 2003), 2003, pp. 611–616.
[12] C. Wolf, J.-M. Jolion and F. Chassaing, "Text localization, enhancement and binarization in multimedia documents," in Proc. 16th International Conference on Pattern Recognition (ICPR 2002), Aug. 2002, vol. 2, pp. 1037–1040.