Generation of Stereoscopic Images by using Monocular Camera

DOI : 10.17577/IJERTV4IS060745




Swapnil Lonare

M.Tech Student

Department of Electronics Engineering (Communication), Abha Gaikwad-Patil College of Engineering,

Nagpur, India 440016

Prof. Mithilesh Mahendra

Assistant Professor

Department of Electronics Engineering (Communication), Abha Gaikwad-Patil College of Engineering,

Nagpur, India 440016

Abstract: Stereoscopic 3D displays play a very important role in future applications. The proposed technique takes two images or video frames captured with a monocular camera as input and transforms them into a stereoscopic image pair that is comfortable to view. Scale Invariant Feature Transform (SIFT) and Speeded Up Robust Features (SURF) are used for better results. As the SURF algorithm is the fastest descriptor, it reduces the time required for feature detection.

Keywords: SIFT, SURF, stereoscope.

  1. INTRODUCTION

    3D displays have nowadays become a common household item, yet most people don't know how they work. Here our prime focus will be on the core points, for the sake of simplicity. A human being's ability to perceive the third dimension goes hand-in-hand with binocular vision. It is because we have two eyes that we are able to see in 3D with ease. Our brain combines the perspectives of our two eyes to give us a sense of how close or far an object is. Each eye actually sees the world from a slightly different position, and these two different viewpoints together give rise to what is called stereoscopic vision. To demonstrate stereoscopic vision, try a quick exercise. First close your left eye and put your right hand about four inches in front of your right eye, then move your hand a little. Repeat the same process with your right eye closed, and you will find a big difference in your sense of depth and of the position of your hand in 3D space. Your brain can put together that rich sense of relative placement, and provide an accurate indication of how far your hand actually is from your face, only when each eye contributes its alternate perspective. If you stop moving your hand and simply close each eye in turn, you will notice that each eye gives a different view of your hand and of how it sits in your field of vision. The key to a 3D display, then, is to give each eye an alternate view of the scene, that is, an alternative perspective of the same scene. Performing this task in a theater is a bit challenging, since there is only one screen to look at. How does a three-dimensional display provide an individual image for each eye? Amazingly, there are a number of ways to achieve this goal, one of which involves the old-school anaglyph red-and-blue colored glasses.

    But when it comes to the most widely used modern applications, there are two stereoscopic 3D systems poised to capture the market: alternate-frame sequencing and dual-projector polarization. A stereoscopic image pair can be captured in many ways, for example by using a custom-built rig of two cameras that simulates two human eyes.

    Fig. 1. Stereo camera rig: (a) unrectified camera rig; (b) rectified camera rig. A stereo camera rig is rectified to simulate the human visual system: the optical axes of the two cameras are parallel to each other and perpendicular to the baseline, as shown in (b). A pair of images casually taken with a regular monocular camera, however, usually does not meet this requirement, as shown in (a).

    This paper presents a technique to rectify two such images into a stereo image pair, as if they had been taken by a rectified stereo camera rig. As shown in Fig. 1(b), a stereoscopic camera system has two cameras (lenses). These two cameras share the same intrinsic camera parameters and the same orientation; their optical axes are parallel to each other and perpendicular to the baseline. The two cameras are typically separated by a distance roughly equal to the distance between two human eyes, that is, about 2.5 inches. Whenever needed, the two cameras are carefully toed in slightly for better depth composition. Such camera rigs are difficult for common users to build and use.

    The emerging consumer-level binocular camera systems, such as the FinePix REAL 3D W3 camera, make it easier to create a stereo image pair. Professional binocular cameras, however, remain more difficult to manufacture and use due to their necessarily large form factor.
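    The paper does not list an implementation for this rectification step, but the pipeline it implies (match features between the two casually taken images, estimate the epipolar geometry, then warp both images so that epipolar lines become horizontal scanlines) can be sketched with standard tools. The following is a minimal sketch, assuming OpenCV 4.4 or later, where SIFT is exposed as cv2.SIFT_create; the 0.75 ratio threshold and the function name are illustrative choices, not the authors' exact settings.

import cv2
import numpy as np

def rectify_pair(img_left, img_right):
    # Detect SIFT keypoints and 128D descriptors in both images.
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img_left, None)
    kp2, des2 = sift.detectAndCompute(img_right, None)

    # Nearest-neighbour ratio test (Lowe's criterion) to keep
    # only distinctive matches.
    matches = cv2.BFMatcher(cv2.NORM_L2).knnMatch(des1, des2, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]

    pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in good])

    # Fundamental matrix from the matches; RANSAC discards outliers.
    F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC)
    pts1, pts2 = pts1[mask.ravel() == 1], pts2[mask.ravel() == 1]

    # Homographies H1, H2 that map epipolar lines to matching
    # horizontal scanlines (Hartley's uncalibrated method).
    h, w = img_left.shape[:2]
    _, H1, H2 = cv2.stereoRectifyUncalibrated(pts1, pts2, F, (w, h))

    left = cv2.warpPerspective(img_left, H1, (w, h))
    right = cv2.warpPerspective(img_right, H2, (w, h))
    return left, right

    Because stereoRectifyUncalibrated needs only point matches and the fundamental matrix, no prior camera calibration is required, which is what makes a casually captured monocular pair usable.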

  2. SIFT

    The SIFT detector has four main stages: scale-space extrema detection, keypoint localization, orientation assignment and keypoint descriptor extraction [5]. The first stage uses Difference of Gaussians (DoG) to find potential keypoints: several Gaussian-blurred images at distinct scales are formed from the input image, and DoGs are computed from neighbours in scale space. In the second stage, candidate keypoints are located by finding extrema in the DoG images that are locally extremal in both space and scale. Spatially unstable keypoints are removed by thresholding against the ratio of eigenvalues of the Hessian matrix (unstable edge keypoints have a high ratio, while stable corner keypoints have a low ratio); low-contrast keypoints are also eliminated, and the remaining keypoints are localized by interpolating across the DoG images. The third stage assigns a principal orientation to each keypoint. The final stage computes a highly distinctive descriptor for each keypoint. To achieve orientation invariance, the descriptor coordinates and gradient orientations are rotated relative to the keypoint orientation. For every keypoint, a set of orientation histograms is created on 4×4 pixel neighbourhoods with 8 bins each, using the magnitudes and orientations of samples in a 16×16 region around the keypoint. The resulting feature descriptor is a vector of 128 elements, which is then normalized to unit length to handle illumination differences. The descriptor size can be varied, but the best results are reported with 128D SIFT descriptors [5]. SIFT descriptors are invariant to rotation, scale and contrast, and partially invariant to other transformations.
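    As a rough illustration of the first two stages only, the DoG stack can be built from successively blurred copies of the image. This sketch assumes an input file named left.png (an illustrative name) and omits keypoint interpolation, contrast thresholding and edge rejection.

import cv2
import numpy as np

# Build a small Gaussian scale space and the Difference-of-Gaussians
# (DoG) stack in which SIFT searches for extrema.
img = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)

sigma = 1.6                   # base scale, as used in [5]
k = 2 ** 0.5                  # multiplicative step between scales
blurred = [cv2.GaussianBlur(img, (0, 0), sigma * k ** i) for i in range(5)]
dog = [blurred[i + 1] - blurred[i] for i in range(4)]

# A pixel in an interior DoG level is a candidate keypoint when it is
# an extremum among its 26 neighbours: 8 in its own level plus 9 in
# the level above and 9 in the level below.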

    The size of the SIFT descriptor is controlled by its width, i.e. by the array of orientation histograms (n × n) and the number of orientation bins in each histogram (r); the resulting SIFT descriptor has rn² elements [5]. The value of n affects the window size around the keypoint, since a 4 × 4 pixel subregion is used to capture pattern information for each histogram; e.g. for n = 3, a window of size 12 × 12 will be used around the keypoint. Different sizes were analyzed in [5], and the 128D SIFT descriptor, i.e. n = 4 and r = 8, was reported to be far better in terms of matching precision. Most other works have used the standard 128D SIFT features, while very few have opted for smaller SIFT descriptors in small-scale applications; e.g. 3 × 3 subregions with 4 orientation bins each, giving 36D SIFT features, are used in [18] with a small number of target images. Smaller descriptors use less memory and result in faster classification, but at the cost of lower precision rates. No research article has examined the classification performance of SIFT descriptors of sizes other than 128.
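    The dimensionalities quoted above follow directly from the rn² formula; a trivial check (the helper name is ours, for illustration only):

def sift_descriptor_size(n, r):
    # n x n orientation histograms with r bins each -> r * n^2 values.
    return r * n * n

assert sift_descriptor_size(4, 8) == 128  # standard SIFT [5]
assert sift_descriptor_size(3, 4) == 36   # compact variant used in [18]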

  3. RESULTS

The results of SIFT are shown in Fig. 2. The left and right images, taken with a monocular camera, are shown in Fig. 2(a); they are the common input images for the SIFT algorithm. The detected SIFT features of the left and right images are shown in Figs. 2(b) and 2(c), and the rectified stereo pair is shown in Fig. 2(d).

Fig. 2(a). Left and right input images.

Fig. 2(b). SIFT features of the left image.

Fig. 2(c). SIFT features of the right image.

Fig. 2(d). The two images rectified into a stereo image pair.

Table 1. Average disparity, SIFT-based, using the nearest-neighbour ratio test and the sum of absolute differences.

Before calibration: 6.859855
After calibration: 1.006139e-01
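The paper does not spell out how this average disparity is measured. One plausible reading, assuming it is the mean vertical offset between matched keypoints, is sketched below; in a perfectly rectified pair corresponding points share a scanline, so the value should fall toward zero after calibration, consistent with the table above.

import numpy as np

def average_vertical_disparity(pts_left, pts_right):
    # pts_left, pts_right: (N, 2) arrays of matched (x, y) keypoint
    # positions from the left and right images. Rectification should
    # drive the vertical component of the disparity toward zero.
    return float(np.mean(np.abs(pts_left[:, 1] - pts_right[:, 1])))

Evaluating this measure on the SIFT matches before and after applying the rectifying homographies would yield a before/after comparison of the kind reported above.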

  4. CONCLUSION

This paper has evaluated feature detection methods for stereoscopic image generation. Based on the experimental results, it is found that SIFT detected a larger number of features, though at the cost of speed. Our future work is to make these algorithms handle all types of images, and video as well.

REFERENCES

  1. T. Lindeberg, "Feature detection with automatic scale selection," International Journal of Computer Vision, Vol. 30, pp. 79-116, 1998.

  2. K. Mikolajczyk and C. Schmid, "An affine invariant interest point detector," Proc. European Conference on Computer Vision, pp. 128-142, 2002.

  3. T. Tuytelaars and L. Van Gool, "Wide baseline stereo based on local, affinely invariant regions," Proc. British Machine Vision Conference, pp. 412-425, 2000.

  4. J. Matas, O. Chum, M. Urban and T. Pajdla, "Robust wide baseline stereo from maximally stable extremal regions," Proc. British Machine Vision Conference, Vol. 1, pp. 384-393, 2002.

  5. D. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, Vol. 60, pp. 91-110, 2004.

  6. H. Bay, T. Tuytelaars and L. Van Gool, "SURF: Speeded Up Robust Features," Proc. European Conference on Computer Vision, pp. 404-417, 2006.

  7. G. Carneiro and A.D. Jepson, "Multi-scale phase-based local features," Proc. International Conference on Computer Vision and Pattern Recognition, Vol. 1, pp. 736-743, 2003.

  8. F. Schaffalitzky and A. Zisserman, "Multi-view image matching for unordered image sets," Proc. European Conference on Computer Vision, Vol. 1, pp. 414-431, 2002.

  9. C. Harris and M. Stephens, "A combined corner and edge detector," Proc. Alvey Vision Conference, pp. 147-151, 1988.

  10. E. Rosten, R. Porter and T. Drummond, "Faster and better: a machine learning approach to corner detection," IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 32, pp. 105-119, 2010.

  11. T. Kadir and M. Brady, "Saliency, scale and image description," International Journal of Computer Vision, Vol. 45, pp. 83-105, 2001.

  12. K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas, F. Schaffalitzky, T. Kadir and L. Van Gool, "A comparison of affine region detectors," International Journal of Computer Vision, Vol. 65, pp. 43-72, 2005.

  13. L. Juan and O. Gwun, "A comparison of SIFT, PCA-SIFT and SURF," International Journal of Image Processing, Vol. 3, pp. 143-152, 2009.

  14. J. Bauer, N. Sünderhauf and P. Protzel, "Comparing several implementations of two recently published feature detectors," Proc. International Conference on Intelligent and Autonomous Systems, 2007.

  15. Y. Ke and R. Sukthankar, "PCA-SIFT: a more distinctive representation for local image descriptors," Proc. International Conference on Computer Vision and Pattern Recognition, Vol. 2, pp. 506-513, 2004.

  16. P.A. Viola and M.J. Jones, "Rapid object detection using a boosted cascade of simple features," Proc. International Conference on Computer Vision and Pattern Recognition, Vol. 1, pp. 511-518, 2001.

  17. Y. Zhan-Long and G. Bao-Long, "Image mosaic based on SIFT," Proc. International Conference on Intelligent Information Hiding and Multimedia Signal Processing, pp. 1422-1425, 2008.

  18. D. Wagner, G. Reitmayr, A. Mulloni, T. Drummond and D. Schmalstieg, "Pose tracking from natural features on mobile phones," Proc. International Symposium on Mixed and Augmented Reality, pp. 125-134, 2008.

  19. D. Nister and H. Stewenius, "Scalable recognition with a vocabulary tree," Proc. International Conference on Computer Vision and Pattern Recognition, Vol. 2, pp. 2161-2168, 2006. Available from http://www.vis.uky.edu/stewe/ukbench/.

  20. C. Evans, "Notes on the OpenSURF Library," University of Bristol (UK), 2009. Available from http://www.chrisevansdev.
