- Open Access
- Total Downloads : 731
- Authors : K. Radha, J. Nithya
- Paper ID : IJERTV2IS4474
- Volume & Issue : Volume 02, Issue 04 (April 2013)
- Published (First Online): 20-04-2013
- ISSN (Online) : 2278-0181
- Publisher Name : IJERT
- License: This work is licensed under a Creative Commons Attribution 4.0 International License
Image Similarity Assessment Based On Feature Based Sparse Representation
Image Similarity Assessment Based On Feature Based Sparse Representation
K. Radha J. Nithya
PG Scholar Associate Professor Department of Information Technology Department of Information Technology K.S.Rangasamy College of Technology K.S.Rangasamy College of Technology
Tiruchengode. Tiruchengode.
Abstract : Image similarity assessment is essentially important to several multimedia information processing systems and applications.The major goal of image similarity assessment is to design algorithms for repeated and objective evaluation of similarity in a consistent manner with subjective human evaluation.In the existing method, similarity metrics Human visual system (HVS) and Natural scene statistics (NSS) are efficient to measure the quality of an image evaluated with its original version, particularly for some image restoration applications. The HVS and NSS mainly focus on evaluating the similarities between a reference image and its non-geometrically variational versions, such as decompressed and brightness/contrast-enhanced versions. In several applications, assessment of the similarities between a reference image and its geometrically variational versions, such as conversion, turning round, sizing, spinning, and other deformations, is required. A feature based sparse representation(FBSR) approach is proposed to measure the information present in an image, based on robust image quality extraction. The identification of the feature points in an image is followed by describing each feature point using a descriptor. The descriptors of an image is represented via sparse representation and the similarity is assessed between the two images using sparse coding technique. The feature based approach mainly focus on reducing the computational complexities for lexicon feature extraction and image matching by performing sparse coding, which can be further reduced using new algorithm such as the Efficient greedy algorithm and Online lexicon learning algorithm.
Key Terms : SIFT Feature Extraction , Lexicon Learning , Image Similarity Assessment , Sparse Representation , Image copy Detection .
1.INTRODUCTION
In the domain of image processing, assessment of image similarity is significant for many multimedia applications. The similarity measurement is to automatically assess the similarities among the images in a consistent manner. A simple and popularly used metric is the peak signal-to-noise ratio (PSNR) or the corresponding mean squared error(MSE).MSE is a standard criteria for assessment of image quality and fidelity measure[1].The goal of a signal fidelity measure is to compare two signals by providing a quantitative score that describes the degree of relationship/ reliability or, on the other hand, the phase of inaccuracy/deformation between images.It is simple, It is parameter free and inexpensive. The MSE possesses the features to satisfying properties of convexity, symmetry, and differentiability. The MSE is also a desirable measure in the
statistics and estimation framework. It has been employed extensively for Optimizing and assessing a wide variety of signal processing applications. The relationship between MSE/PSNR and human judgement of quality is not suitable for most applications[1]. Image Quality Assessment (QA) algorithms are used for understanding the similarity with a reference or perfect image[2]. Full-Reference QA methods attempt to achieve consistency in quality prediction by modelling salient physiological and psychovisual features of the human visual system (HVS), or by signal fidelity measures[2].
To Quantify the loss of image information to the distortion process, QA systems are regularly involved with judging the visual quality of natural images and videos for
human utilization. The Human Visual System(HVS)/Natural scene statistics (NSS) mainly focus on assessing the similarity between a reference image and its non-geometrically variational versions to improve the PSNR metric[3]. Some advanced approaches, Structural Similarity Index (SSIM) and Visual information fidelity(VIF) are used to capture the loss of image structure.
The VIF method is better than a HVS based method and VIF performs well in single-distortion as well as in cross distortion scenarios. SSIM/VIF can tolerate slightly geometric variations[2].The feature based sparse representation(FBSR) approach is proposed to extract the information that is present in a reference image and how much of this information can be extracted from the test image to measure the similarity between the two images.The FBSR adopt the Scale Inariant Feature Transform(SIFT)[4].The SIFT features are invariant to image size and turning round, and to afford strong matching across a sizeable range of affine alteration, change in 3D viewpoint, noise, and change illumination. The features are highly unique and a single feature can be correctly matched with high probability against a large database of features from many images. SIFT features are widely used in several multimedia applications, such as image retrieval, recognition, copy detection, and near- duplicate detection[4]. The image is characterized by a set of perspective invariant region descriptors so that recognition can proceed successfully even though changes in viewpoint illumination. The similarity with text retrieval implementation matches on descriptors are pre-computed and the scheme builds method of indexing descriptors extracted from local regions, and is robust to background clutter[5]. The local region descriptors are hierarchically quantized in a term tree. The term tree allows a larger and more inequitable words to be used efficiently, it leads to a dramatic improvement in
retrieval quality and the tree is directly defines the quantization. The quantization and the indexing are consequently fully integrated[6].
Overview of Proposed Scheme
In several Multimedia applications, the assessment of similarity between a reference image and its geometrically variational versions is required. A feature based sparse representation approach is proposed to measure the information present in an image, based on vigorous image quality extraction. The identification of the quality points in an image is followed by describing each quality point using a descriptor. The descriptors of an image is represented by sparse depiction and the similarity is assessed between the two images using sparse coding method. The quality descriptor is sparsely depicted in terms of a lexicon or transferred as a linear combination of lexicon atoms, and to achieve efficient feature representations and vigorous image similarity assessment. Atom means a unique sample or basic unit learned from a set of training data.
A lexicon consisting of several atoms can provide sparse depiction of the data as a linear amalgamation of a few atoms.The feature based approach mainly focus on reducing the computational complexities for lexicon feature extraction and image matching by performing sparse convention, which can be further reduced using new algorithms, such as the Efficient greedy algorithm and Online lexicon learning algorithm.
II PROPOSED QUALITY BASED SPARSE REPRESENTATION ON IMAGE SIMILARITY ASSESSMENT (FBSR) METHODOLOGY:
Sparse representation has resulted in major impact on computer vision and pattern recognition. The goal of FBSR is to obtain the compact representation of the observed signal and also to extract semantic information.The selection of lexicon plays a key task to achieve the image similarity assessment.In the proposed scheme , thesparse representation and lexicon learning techniques are used. FBSR scheme first apply the standard SIFT to detect the keypoints of an image based on the keypoints, the quality points are extracted from the test and reference image. To make the quality points more compact using the lexicon learning algorithm and finally using the sparse coding to achieve the image similarity[3].
Scale Invariant Feature Transform(SIFT)
SIFT transforms image data into scale-invariant coordinates virtual to local quality points and generates large numbers of points that compactly cover the image over the full range of scales and locations[3]. The quantity of features is mostly important for entity identification, where the ability to distinguish small objects in cluttered backgrounds requires that at least 3 points be correctly matched from each entity for reliable identification.
-
SIFT Feature Extraction
SIFT features are first extracted from a set of reference images and stored in a database. Fig1 shows the reference
image. A new image is harmonized by independently evaluating each feature from the new image to this previous database and finding candidate matching features based on Euclidean distance of their feature vectors.SIFT feature extraction has been performed in four stages [3]
1. Scale-space extrema detection
Search over all scales and the entire image locations. Fig-2 Shows the difference-of-Gaussian function to identify potential interest points that are invariant to scale and orientation. The scale space of image is defined as a function of L(u,v,z), that is produced from the convolution of a variable scale Gaussian, G(u,v,z), L(u,v,z)=G(u,v,z)*l(u,v), where * is the convolution operation in u and v and
G(u,v,z)=(1)
To efficiently detect stable keypoint locations in scale space, In the proposed scheme using scale-space extrema in the difference-of-Guassian function convolved with the image (u,v,z), which can be computed from the difference of two nearby scales separated by a constant multiplicative factor k.
C(u,v,z)=(G(u,v,kz)-G(u,v,z))*I(u,v)=L(u,v,kz)- L(u,v,z)….(2)
Fig 1. Reference Image
Confined extrema detection
Each sample point is compared to perceive the confined maxima and minima.The samples are compared with eight neighbors of the current image and nine neighbors of the scale above and below. Maxima is selected only if it is larger than all neighbors and minima is selected only if it is smaller than all neighbors.
2 .Keypoint localization
A keypoint has been found by comparing a pixel to its neighbors and is to perform a detailed fit to the nearby data for location, scale, and ratio of key curvatures. The low dissimilarity points or poorly restricted behind an boundaries are removed by key point localization.
C(U)=C+U……..(3)
Fig 2. Diffrence of Gaussion
Where C and its imitatives are estimated at the sample point U and is the offset from the sample point.
Eliminating Edge response:
The stability is improved by removing the low contrast keypoints.The difference-of-Gaussian function will have a strapping response along boundaries, even if the position along the boundaries are poorly determined and therefore unstable to small amounts of noise.
3Orientation assignment
Based on local image slant information, one or more orientations are allocated to each key peak location. All potential actions are performed on image data that has been changed relative to the dispersed orientation, size, and position for each surface, thereby providing invariance to these revolutions.
4.Keypoint descriptor Generation
The key points are transformed into a representation that allows for significant levels of local shape distortion and change in illumination.
Representation of Feature Descriptor
A keypoint descriptor is created by first computing the gradient magnitude and direction at each image sample point in a region around the keypoint location. The key points are weighted by a Gaussian window, indicated by the overlaid circle and the samples are accumulated into orientation histograms summarizing the contents over 4×4 subregions with the length of each corresponding sum of the gradient magnitudes near that direction within the region. The 2×2 descriptor array computed from an 8×8 set of samples, the proposed scheme use 4×4 descriptors computed from a 16×16 sample array and each descriptor is 128 dimen sional vector. Usually hundreds to thousands of keypoints may be extracted from an image. .
To make the scale invariant feature transform quality is more compact, the bag of words (BoW) representation approach quantizes SIFT descriptors by vector quantization technique into a collection of visual words based on a pre- defined illustration terms or term tree[4] .
Term Tree
The term tree defines a hierarchical quantization that is built by hierarchical k-means clustering. A large set of consign descriptor vectors are used in the unsupervised training of the tree. The indexing descriptors extracted from local image region and is robust to background clutter.The local region descriptors are hierarchically quantized into a term tree. The term tree allows a larger and more inequitable term to be used efficiently, which leads to a striking enhancement in seizure quality. The most important property of the scheme is that the tree directly defines the quantization.The quantization and the indexing are fully integrated.
4.1.2 BoW representation
BoW representation approach quantizes SIFT feature descriptor by vector quantization into a collection of visual words based on the visual term tree.The BoW representation approach assessing the similarity between SIFT feature descriptors can be measured by matching their corresponding visual words by histogram matching.
Dessriptor Testing
In the descriptor testing , two parameters used to vary the complexity of the descriptor. 1. The number of orientations r in the histograms and the width n of the n array of orientation histograms.2.The size of the resulting descriptor Vector is rn2. The complexity of the descriptor groes, it will be able to discriminate better in a large database and it will also be more sensitive to shape distortions and occlusion[3].
-
Lexicon Learning
The feature vectors are represented in terms of a lexicon atoms.The lexicon atoms are Constructed for a set of SIFT features using the K-Singular Value Decomposition lexicon learning algorithm. K-SVD simplify the K-Means algorithm[10].The K-SVD is an iterative process that alternates between sparse coding of the prototypes based on the current lexicon and an update process for the lexicon atoms to better glowing the data. The update of the lexicon columns are joint with an update of the sparse representation coefficients correlated to it, resulting in rushed convergence. The K-SVD algorithm is stretchy and can effort with any detection scheme. Given a set of K SIFT feature vectors and i=1, 2,3..k and apply K-SVD algorithm to find the lexicon D. K-SVD has been performed in two steps.
-
Sparse coding stage
The feature vectors are sparsely decomposed into training samples by using the orthogonal matching pursuit and the samples are grouped into more discriminative samples.
-
Lexicon update stage
Update the lexicon for improved robust training samples.The lexicon is together with the non-zero coefficients of sparse representation samples. The size of the lexicon is L×N,L<N<<K, L is the length of the SIFT and N is the Sparse coefficient length, by formulating the problem as
D, ]
focus to , L, ..(4)
Where is the sparse representation
coefficients of and – is the norm of , counts the number of nonzero coefficients of and L is the most desired number of nonzero coefficients of . During the lexicon learning the -minimization in (4) can be converted into an – minimization problem.The obtained lexicon fature D is an overcomplete lexicon where the lexicon contains N Prototype feature vector atoms as the column vectors in D. Each feature vector can be sparsely represented in terms of lexicon atoms D satisfying , where 0 is an error tolerance.
ALGORITHM I: PROPOSED FSR
Input: A reference image and a test image .
Output: The similarity value between and , i.e., , )
.1. Extract the SIFT feature vectors , , , from , followed by learning the lexicon feature sparsely representing .
2. Extract the SIFT feature vectors , , , from , followed by learning the lexicon feature sparsely representing .
Online lexicon learning:
Require: u p(x) R
(regularization parameter), (initial lexicon), T
(number of iterations).
1: 0, 0 (reset the past information).
2: for t = 1 to T do
3: Draw from p(u).
4: Sparse coding: compute using LARS
arg +||||1.
5: = +
6: = +
7: Compute using lexicon update , with as warm restart, so that
arg + .
8: end for
9: Return (learned lexicon).
Lexicon Update:
Require: D = [d1, . . . ,dk] (input lexicon),
A = [a1, . . . ,ak]
B = [b1, . . . ,bk]
1: repeat
2: for j = 1 to k do
3: renew the j-th column to optimize
() (Orthogonal projection vector)
4: end for
5: until convergence
6: Return D (updated lexicon).
3: Perform -minimization by solving (3) for with respect to []
4: Calculate the reconstruction errors, and , for , with respect to and , respectively.
5:Perform voting by comparing and , , and get the percentages of and , with respect to and , respectively.
6:Calculate , ) = .(4)
-
-
Sparse Representation Based Image Similarity Assessment
Consider the two SIFT feature descriptors , i= 1, 2,3 , and , j=1,2,3 extracted from the reference image( ) and test image( ) with length L. and are the number of feature descriptors of test and reference image .The lexicon features of and are (of size
) And (of size L ) where L< and L<
.Hence = and , where and are two sparse coefficient vectors with length and
Assess the similarity between a reference and test image, use the discriminative features of sparse representation[9]. Concatenating the lexicon atoms for Measure how much of information present in the reference image can be exracted from the test image by greedy algorithm. Concatenating lexicon features for representing each SIFT feature descriptors( ) of test image. The joint
lexicon = can be defined as
= subject to 2
(5)
Where is the sparse coefficient vector of test image . To solve the sparsest solution for , (2) can be transmit to an – minimization problem as
(D) D)..(6)
where D in is the lexicon, each column representing a basis vector, and is a loss function such that l(u,D) should be small if D is good at representing the signal u in a sparse fashion. The number of samples n is usually large, whereas
the signal length m is relatively small, but each signal only uses a few elements of D in its representation and the overcomplete dictionaries with k > m are allowed.define l(u,D) as the optimal value of the sparse coding problem
l(u,D) +
where is a regularization parameter problem is also known as basis pursuit and is the sparse decomposition coefficient.The basis pursuit is well known that 1
Reference Image
Test Image
Reference Image
Test Image
Feature extraction using SIFT
Feature extraction using SIFT
Feature extraction using SIFT
Feature extraction using SIFT
Feature vectors Feature Vectors
lexicon Learning using K-SVD
lexicon Learning using K-SVD
lexicon Learning using K-SVD
lexicon Learning using K-SVD
lexicon Atoms
Information Extraction by sparse coding
Information Extraction by sparse coding
FBSR Similarity Value Fig 3.Concept of proposed scheme
regularization yields a sparse solution for , but there is no straight logical relation between the value of and the
corresponding effective sparsity ||||0.To prevent D from having arbitrarily large values, Pursuit is common to constrain its columns d1, . . . ,dk to have an 2-norm less than or equal
to one, and call C the convex set of matrices verifying this
constraint:
C {D s.t. j = 1, . . . ,k, dj 1} The problem of minimizing the empirical cost fn(D) is not convex with respect to D and It can be rewritten as a joint optimization problem with respect to the lexicon D and the coefficients = [1, . . . ,n] in of the sparse decompositions, which is not equally curved, but curved with respect to each of the two variables D and when the other one is fixed.
=
+T) (7)
Where T is the real parameter and apply efficient greedy algorithm to directly solve (7) in order to find the sparse representation ) of with respect to the lexicon .The position of non-zero coefficients in sparse representation highly concentrated on only one sub lexicon and the remaining coefficients are zero.The sparse representation is perceptive to expect that the atoms for sparsely representing feature vectors of test image should be selected from the sub lexicon of test image. Based on the sparse representation, calculate the reconstruction error as . Reconstruction error of reference image is calculated using
the lexicon atoms from the reference image. Reconstruction error of test image is calculated using the lexicon atoms from the test image[11]. Reconstruction error of reference image is greater than test image , the reference image atoms are more suitable to representing the feature vectors of test image and reference image lexicon atoms get vote. Reconstruction error of test image is greater than reference image, the test image atoms are more suitable to representing the feature vectors of test image and the test image get vote. Considering all the SIFT feature vectors of test image , the obtained percentage of votes from and
are denoted by and and respectively. Based on the voting strategy define the similarity between the two images, and , as
SIM( , ) = ..(8)
Achieve better reconstruction errors for lexicon atoms of test image ,the K-SVD using three rules
-
The number of lexicon atoms in reference should be larger than test image.
-
The K-SVD perform number of iterations for learning lexicon atoms of reference image should be larger than test image.
-
More number of non-zero coefficients are used to representing the each feature vector for learning lexicon atoms of reference image.
-
Considering all the SIFT feature vectors of test image and obtained percentage of votes from test and û image lexicon atoms. The vote for reference and test image range is 0<=1.The similarity range of two images [-1,1] which can be shifted to [0,1].The larger similarity shows that more lexicon atoms are learned from reference image.The result indicates that a considerable amount of information existing in reference image can be extracted from the test image. More lexicon atoms are learned from test image and the result shows that less /no informations are extracted from test image.
III Conclusion
The core proposed a feature -based image similarity assessment technique by exploring the two aspects of a feature detector in terms of representation and matching in FBSR . The reference image features were extracted by using Scale Invariant feature Transform. SIFT features are first extracted from a set of reference images and stored in a database. A new image is matched by individually comparing each feature from the new image to this previous database and finding candidate matching features based on Euclidean distance of their feature vectors.SIFT feature extraction first search over all scales and image locations. A difference-of- Guassian funcion to identify potential interest points that are invariant to scale and orientation.To efficiently detect stable keypoint locations in scale space.In the proposed scheme using scale Extrema in the difference-of-Guassian function convolved with the image, which can be computed from the difference of two nearby scales separated by a constant multiplicative factor k and the key point has been found by comparing a pixel to its neighbors. BoW representation approach quantizes SIFT feature descriptor by vector quantization into a collection of visual words based on the
visual vocabulary tree.The BoW representation approach assessing the similarity between SIFT feature descriptors can be measured by matching their corresponding visual words by histogram matching. The SIFT is one of the most pervasive and vigorous image features , and it has been widely used in several multimedia applications, image retrival , appreciation , copy detection and near-duplicate detection.The proposed FBSR may be extended to video copy detection by learning the lexicon features for each video sequence/clip.
REFERENCES
-
Z.Wang and A. C. Bovik, Mean squared error: Love it or leave it?A new look at signal fidelity measures, IEEE Signal Process. Mag., vol.
26, no. 1, pp. 98117, Jan. 2009.
-
H. R. Sheikh and A. C. Bovik, Image information and visual quality,
IEEE Trans. Image Process., vol. 15, no. 2, pp. 430444, Feb. 2006.
-
D. G. Lowe, Distinctive image features from scale-invariant keypoints,
Int. J. Comput. Vision, vol. 60, no. 2, pp. 91110, 2004.
-
J. Sivic and A. Zisserman, Video Google: A text retrieval approach to object matching in videos, in Proc. IEEE Int. Conf. Computer Vision, Nice, France, Oct. 2003, vol. 2, pp. 14701477.
-
D. Nistér and H. Stewénius, Scalable recognition with a vocabulary tree, in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2006, pp. 21612168.
-
L. W. Kang, C. Y. Hsu, H. W. Chen, and C. S. Lu, Secure SIFTbased sparse representation for image copy detection and recognition,in Proc. IEEE Int. Conf. Multimedia and Expo, Singapore, Jul. 2010.
-
K. Mikolajczyk and C. Schmid, A performance evaluation of local descriptors, IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 10, pp. 16151630, Oct. 2005.
-
J. Wright, Y. Ma, J. Mairal, G. Sapiro, T. Huang, and S. Yan, Sparse representation for computer vision and pattern recognition, Proc.
IEEE, vol. 98, no. 6, pp. 10311044, Jun. 2010.
-
M. Aharon, M. Elad, and A. M. Bruckstein, The K-SVD: An algorithm for designing of overcomplete dictionaries for sparse representation,
IEEE Trans. Signal Process., vol. 54, no. 11, pp. 43114322,Nov. 2006.
-
S. Mallat and Z. Zhang, Matching pursuits with time-frequency dictionaries, IEEE Trans. Signal Process., vol. 41, no. 12, pp. 33973415, Dec. 1993.
-
J. Mairal, F. Bach, J. Ponce, and G. Sapiro, Online learning for matrix factorization and sparse coding, J. Mach. Learn. Res., vol. 11, pp.
1960, 2010.
-
A. Yang, A. Ganesh, Y. Ma, and S. Sastry, Fast l -minimization algorithms and an application in robust face recognition: A review, in
Proc. IEEE Int. Conf. Image Process., Hong Kong, Sep. 2010.
-
C. Y. Hsu, C. S. Lu, and S. C. Pei, Secure and robust SIFT, in Proc. ACM Int. Conf. Multimedia, Beijing, China, 2009, pp. 637640.
-
C. Y. Hsu, C. S. Lu, and S. C. Pei, Homomorphic encryption-based secure SIFT for privacy-preserving feature extraction, in Proc. IS&T/ SPIE Media Watermarking, Forensics, and Security, Jan. 2011.
International Journal of Engineering Research & Technology (IJERT)
ISSN: 2278-0181
Vol. 2 Issue 4, April – 2013