Video Retrieval From Compressed Videos

DOI: 10.17577/IJERTV1IS7085


M. Nagaraju¹, M. Thanoj², P. Manikanta¹, T. K. Praneeth¹, S. Bhavani¹

¹ Assistant Professor, IT Dept, Gudlavalleru Engineering College, Gudlavalleru
² HOD, CSE Dept, Andhra Loyola Institute of Engineering and Technology, Vijayawada

Abstract

This paper proposes a thorough scheme, built on a camera zooming descriptor with a two-level threshold, to automatically retrieve close-ups directly from Moving Picture Experts Group (MPEG) compressed videos based on camera motion analysis. A new algorithm for fast camera motion estimation in the compressed domain is presented. In the retrieval process, camera-motion-based semantic retrieval is built. To improve the coverage of the proposed scheme, close-up retrieval in all kinds of videos is investigated. Extensive experiments illustrate that the proposed scheme provides promising retrieval results under a real-time, automatic application scenario.

Keywords

Camera motion analysis, close-up retrieval, Moving Picture Experts Group (MPEG) compressed videos

  1. INTRODUCTION

    Moving Picture Experts Group (MPEG) video compression techniques are already well developed and widely deployed, yet the goal of effective retrieval directly in the compressed domain has yet to be realized. Most existing digital video processing techniques have difficulty extracting meaningful features from MPEG compressed videos for further analysis: to extract relevant features, the content must in principle be decoded first. Since decoding is time consuming, especially when a large video database must be processed, feature extraction directly in the compressed domain is particularly attractive, as it enables fast and reliable information retrieval and analysis tools. While considerable research has been conducted on image and video analysis, such as low-level content feature extraction [1, 2], high-level semantic feature extraction [3, 4], and video indexing/retrieval [5–11], only a few solutions have addressed this challenging task with limited decoding of MPEG video streams. To this end, our work in this paper focuses on automatically retrieving a high-level semantic concept, i.e., the close-up, directly in the MPEG compressed domain based on camera motion analysis. To reduce the semantic gap, we propose a thorough scheme with two stages: camera motion feature extraction directly in the MPEG compressed domain, followed by semantic retrieval of close-ups that exploits a camera zooming descriptor with a two-level threshold, all without human intervention and under a real-time application scenario.

    Camera motion is an important feature in video analysis, and the various camera operations used in video production have been described in [12–16]. In many applications, however, only the pan, tilt, and zoom parameters are considered. Furthermore, models of camera motion can be used to detect important events. Herein, a new algorithm for fast camera motion estimation in the compressed domain is presented.

    Previous research on close-up detection has mostly targeted sports video [17–19]. To improve the coverage of the proposed scheme, we investigate close-up retrieval extensively in all kinds of videos. The basic principle is to extract a zooming descriptor directly in the compressed domain; when its value exceeds the corresponding level of a two-level threshold, the frame is retrieved as a close-up.

    The remainder of this paper is organized as follows. Section II introduces the new algorithm for fast camera motion estimation in compressed domain. Section III presents automatic close-up retrieval from MPEG compressed videos. Section IV contains the experimental results and evaluations. Section V provides conclusions.

  2. FAST CAMERA MOTION ESTIMATION IN COMPRESSED DOMAIN

    In the past few decades, significant research has been carried out to estimate camera motion parameters in the pixel domain. The general principle of this work can be described as follows. Assuming that the camera undergoes rotation and zoom but no translation, the change of image intensity between neighboring frames can be modeled by the following 6-parameter projective transformation:

    x' = (p1·x + p2·y + p3) / (p5·x + p6·y + 1)
    y' = (−p2·x + p1·y + p4) / (p5·x + p6·y + 1)    (1)

    where (x, y) and (x', y') are the image coordinates of corresponding points in two neighboring frames and p1, …, p6 are the parameters of camera motion.

    Reference [16] put forward a compressed-domain parameter estimation under two assumptions that hold from one frame to the next: 1) we can set p5 = p6 = 0, i.e., perspective distortion effects are minimal; and 2) we can set p2 = 0, i.e., the camera does not rotate about the axis of the camera lens. In this case, the 6-parameter model can be approximated by the three-parameter transformation:

    x' = s·(x + fp)
    y' = s·(y + ft)    (2)

    Here s is referred to as the camera zoom factor, fp as the camera pan rate, and ft as the camera tilt rate. A zooming action takes place through a change of camera focal length from f to f', characterized by s = f'/f. As p1 = s, p3 = s·fp, and p4 = s·ft, (2) can be rearranged into the linear-in-the-parameters form

    x' = p1·x + p3
    y' = p1·y + p4    (3)
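    As a quick numeric illustration of the three-parameter model in (2) (in the form reconstructed above), the sketch below applies the transformation to a sample point and reads off the induced displacement; all values are hypothetical.

```python
def apply_camera_model(x, y, s, fp, ft):
    """Three-parameter camera model: zoom factor s, pan rate fp, tilt rate ft."""
    return s * (x + fp), s * (y + ft)

# Hypothetical mild zoom-in (s = 1.02) with a slight pan (fp = 0.5):
x, y = 100.0, -40.0
x2, y2 = apply_camera_model(x, y, s=1.02, fp=0.5, ft=0.0)
mvx, mvy = x2 - x, y2 - y   # displacement between the two frames
# mvx = 2.51, mvy = -0.8: points far from the image center drift outward,
# which is the characteristic motion-field signature of a zoom-in.
```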

    Since MPEG has already exploited inter-frame redundancy via motion estimation and compensation, the above parameters can be estimated quickly from P-frames, in which the correspondence from image point (x, y) to (x', y') can be approximated by the coordinates of corresponding macro-blocks, connected by the motion vector MV, where MVx = x' − x and MVy = y' − y.

    Let (i0, j0) represent the image coordinates (i.e., standard row and column coordinates) of the center of the image, and let (ik, jk) represent the row and column coordinates of the center of the k-th inter-coded macro-block in the current P-frame. Then, with respect to image-centered Cartesian axes, the center of the k-th inter-coded macro-block has coordinates

    (xk, yk) = (jk − j0, i0 − ik)    (4)

    and its corresponding motion vector in consistent units is (MVxk, MVyk). This macro-block is matched with the point in the previous anchor frame that has image-centered Cartesian coordinates

    (uk, vk) = (xk − MVxk, yk − MVyk)    (5)

    Hence, we take the correspondence from (uk, vk) to (xk, yk) in each inter-coded macro-block as a sample of the unknown projective transformation. These samples can then be used to form a linear-in-the-parameters least-squares regression problem whose solution yields estimates of the unknown parameters. In this case, the cost to be minimized is

    J = Σk { [xk − s·(uk + fp)]² + [yk − s·(vk + ft)]² }    (6)

    with the sum taken over the N inter-coded macro-blocks. Finally, after a series of mathematical manipulations, the camera zoom parameter s can be estimated via the following equation:

    s = [N·Σk(uk·xk + vk·yk) − (Σk uk)·(Σk xk) − (Σk vk)·(Σk yk)] / [N·Σk(uk² + vk²) − (Σk uk)² − (Σk vk)²]    (7)

    In an MPEG video, all of the required data can be obtained directly: (ik, jk) are the image coordinates of the center of the k-th inter-coded macro-block, (MVxk, MVyk) is derived from its corresponding motion vector, and N is the number of inter-coded macro-blocks.
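    To make the estimation step concrete, the following is a minimal NumPy sketch of the zoom-factor computation in (7), as reconstructed above. It assumes the macro-block centers and motion vectors have already been parsed from a P-frame by an external bitstream reader, and that the motion vectors follow the convention MV = current − anchor used here; the function name and array layout are illustrative, not part of the original scheme.

```python
import numpy as np

def estimate_zoom_factor(mb_centers, motion_vectors, image_center):
    """Least-squares estimate of the camera zoom factor s, per Eq. (7).

    mb_centers     : (N, 2) array of (row i_k, col j_k) centers of the
                     inter-coded macro-blocks in the current P-frame
    motion_vectors : (N, 2) array of (row, col) motion-vector components,
                     in the convention MV = current - anchor
    image_center   : (i_0, j_0), row/column coordinates of the image center
    """
    i_k, j_k = mb_centers[:, 0], mb_centers[:, 1]
    i0, j0 = image_center
    # Image-centered Cartesian axes (Eq. (4)): x to the right, y upward.
    x = j_k - j0
    y = i0 - i_k
    mvx = motion_vectors[:, 1]
    mvy = -motion_vectors[:, 0]        # rows grow downward: flip the sign
    # Matched coordinates in the previous anchor frame (Eq. (5)).
    u, v = x - mvx, y - mvy
    # Mean-centering absorbs the pan/tilt intercepts s*fp and s*ft,
    # leaving a one-parameter regression for the shared slope s.
    uc, vc = u - u.mean(), v - v.mean()
    xc, yc = x - x.mean(), y - y.mean()
    denom = np.sum(uc**2 + vc**2)
    if denom == 0.0:                   # degenerate frame: no usable spread
        return 1.0                     # report "no zoom"
    return np.sum(uc * xc + vc * yc) / denom
```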

  3. CLOSE-UP RETRIEVAL

    General observations reveal that close-ups are dominated by large foreground areas inside the video frames, so most close-ups can be detected by measuring the proportion of foreground size to frame size. To improve the coverage of close-up detection and to implement it across all kinds of videos, the camera motion descriptor is utilized, i.e., a zooming descriptor with a two-level threshold that automatically retrieves close-ups. The zoom factor modifies the perspective projection, not the displacement of world points relative to the camera reference frame. The parameter s indicates the zooming effect of the camera: s > 1 represents zoom-in, s < 1 represents zoom-out, and s = 1 represents no zoom. Let the two threshold levels be denoted THS and THM; the former is the threshold for a single detected zoom-in frame, and the latter is the threshold for continuous multiple detected zoom-in frames. The threshold operation is then as follows: 1) if only a single zoom-in frame is detected, the threshold is set to THS, and if its zoom factor s exceeds THS, the frame is retrieved as a close-up; 2) if continuous multiple zoom-in frames are detected, the threshold is set to THM; the maximum zoom factor smax and the minimum zoom factor smin over these frames are obtained, and if the ratio of smax to smin exceeds THM, these frames are retrieved as close-ups.

    The above can be described as

    CF = FS, if s > THS
    CF = FM, if smax/smin > THM    (8)

    where FS denotes a single detected zoom-in frame, FM denotes continuous multiple detected zoom-in frames, and CF denotes the retrieved close-ups. The proposed automatic close-up retrieval directly in the MPEG compressed domain, using the zoom-in descriptor with a two-level threshold, is summarized in Fig. 1.

    It is well known that a limited number of samples cannot cover a wide variety of videos, and threshold determination is basically an ill-posed problem by nature. Therefore, a practical solution is to estimate the value statistically or to determine it empirically. Starting from this foundation, the proposed approach retrieves close-ups with far fewer heuristics than conventional methods require. In our close-up retrieval scheme, THS is set to 1.001, and THM is determined empirically.
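    To illustrate the decision rule, here is a small Python sketch that applies the two-level threshold to a sequence of per-frame zoom factors; the run-grouping logic and the default THM value are our own assumptions for demonstration, since THM is set empirically.

```python
def retrieve_closeups(zoom_factors, ths=1.001, thm=1.05):
    """Apply the two-level threshold (THS, THM) to per-frame zoom factors.

    zoom_factors : list of per-frame zoom factors s (one per P-frame)
    Returns the indices of frames retrieved as close-ups.
    thm=1.05 is an illustrative default; the paper determines it empirically.
    """
    closeups, run = [], []
    # Append a sentinel so the final run of zoom-in frames is flushed.
    for k, s in enumerate(zoom_factors + [0.0]):
        if s > 1.0:                # zoom-in frame: extend the current run
            run.append(k)
            continue
        if len(run) == 1:          # single zoom-in frame: compare s to THS
            if zoom_factors[run[0]] > ths:
                closeups.extend(run)
        elif len(run) > 1:         # continuous run: compare smax/smin to THM
            vals = [zoom_factors[i] for i in run]
            if max(vals) / min(vals) > thm:
                closeups.extend(run)
        run = []
    return closeups
```

    For example, retrieve_closeups([1.0, 1.002, 1.0]) returns [1]: the single zoom-in frame at index 1 exceeds THS = 1.001.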

  4. EXPERIMENTAL RESULTS AND EVALUATIONS

    To evaluate its performance, the proposed automatic close-up retrieval scheme is tested on a database of different MPEG video clips, including documentary videos grabbed from the well-known TREC2001 video sequences, movies, sports, and news. These video clips are chosen for the complexity of their extensive graphical effects. Videos are captured at 30 fps with a frame size of 352×240 and stored in MPEG format. The ground truth for the number of close-ups in these video clips is determined manually. To assess accuracy, a statistical performance measurement is implemented for each video clip. The quality measures are defined as recall and precision:

    Recall = D / (D + MD)
    Precision = D / (D + FA)

    where D is the number of correct retrievals, MD is the number of missed retrievals, and FA is the number of false alarms. Recall measures the ratio of correct retrievals to the ground truth in a video clip, while precision measures the ratio of correct retrievals to the total retrievals made by the algorithm. Based on this evaluation metric, we conduct experiments on the above test database; the corresponding recall and precision rates for close-up retrieval are listed in Table 1. As Table 1 shows, the proposed close-up retrieval algorithm achieves on average a 92.22% recall rate with an 87.33% precision rate. The results demonstrate that the proposed scheme is computationally efficient and consistent, and that it achieves superior performance in terms of both recall and precision. Close-ups of persons and objects can both be retrieved correctly and effectively. Samples of close-up retrieval results from the test video clips are shown in Fig. 2. Furthermore, a detailed analysis of the experimental data shows that the camera zoom factor s does not increase gradually but fluctuates up and down while the camera continues zooming in.
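    For completeness, the two metrics can be computed as follows; the counts in the usage comment are hypothetical, not the paper's actual tallies.

```python
def recall_precision(d, md, fa):
    """Recall = D/(D+MD); Precision = D/(D+FA), guarding empty denominators."""
    recall = d / (d + md) if (d + md) > 0 else 0.0
    precision = d / (d + fa) if (d + fa) > 0 else 0.0
    return recall, precision

# Hypothetical counts: 46 correct retrievals, 4 misses, 6 false alarms
# -> recall = 0.92, precision ~ 0.885
print(recall_precision(46, 4, 6))
```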

    Table 1 Summary of experimental results for close-up retrieval

    Fig. 2 Samples of close-up retrieval results from the test database.

  5. CONCLUSION

The main contributions of this paper are summarized as follows. By exploiting the new fast camera motion estimation in the compressed domain, close-up retrieval using a zooming descriptor with a two-level threshold is built. The whole process runs under a real-time application scenario and without human intervention. The usability and efficiency of the proposed scheme are demonstrated through extensive experiments, which show that computational complexity and retrieval performance are well balanced. Close-ups of persons and objects can both be retrieved correctly and effectively.

REFERENCES

  1. J. Jiang, Y. Weng, P. J. Li. Dominant Colour Extraction in DCT Domain. Image and Vision Computing, vol. 24, no. 12, pp. 1269–1277, 2006.

  2. J. Jiang, Y. Weng. Video Extraction for Fast Content Access to MPEG Compressed Videos. IEEE Transactions on Circuits and Systems for Video Technology, vol. 14, no. 5, pp. 595–605, 2004.

  3. J. Vendrig, M. Worring. Systematic Evaluation of Logical Story Unit Segmentation. IEEE Transactions on Multimedia, vol. 4, no. 4, pp. 492–499, 2002.

  4. H. W. Agius, M. C. Angelides. Modeling Content for Semantic-level Querying of Multimedia. Multimedia Tools and Applications, vol. 15, no. 1, pp. 5–37, 2001.

  5. T. Athanasiadis, P. Mylonas, Y. Avrithis, S. Kollias. Semantic Image Segmentation and Object Labeling. IEEE Transactions on Circuits and Systems for Video Technology, vol. 17, no. 3, pp. 298–312, 2007.

  6. D. Djordjevic, E. Izquierdo. An Object- and User-driven System for Semantic-based Image Annotation and Retrieval. IEEE Transactions on Circuits and Systems for Video Technology, vol. 17, no. 3, pp. 313–323, 2007.

  7. D. Vallet, P. Castells, M. Fernandez, P. Mylonas, Y. Avrithis. Personalized Content Retrieval in Context Using Ontological Knowledge. IEEE Transactions on Circuits and Systems for Video Technology, vol. 17, no. 3, pp. 336–346, 2007.

  8. J. W. Hsieh, S. L. Yu, Y. S. Chen. Motion-based Video Retrieval by Trajectory Matching. IEEE Transactions on Circuits and Systems for Video Technology, vol. 16, no. 3, pp. 396–409, 2006.

  9. K. W. Sze, K. M. Lam, G. Qiu. A New Key Frame Representation for Video Segment Retrieval. IEEE Transactions on Circuits and Systems for Video Technology, vol. 15, no. 9, pp. 1148–1155, 2005.

  10. F. Jing, M. Li, H. J. Zhang, B. Zhang. Relevance Feedback in Region-based Image Retrieval. IEEE Transactions on Circuits and Systems for Video Technology, vol. 14, no. 5, pp. 672–681, 2004.

  11. S. Antani, R. Kasturi, R. Jain. A Survey on the Use of Pattern Recognition Methods for Abstraction, Indexing and Retrieval of Images and Video. Pattern Recognition, vol. 35, no. 4, pp. 945–965, 2002.

  12. Y. H. Ho, C. W. Lin, J. F. Chen, H. Y. M. Liao. Fast Coarse-to-fine Video Retrieval Using Shot-level Spatio-temporal Statistics. IEEE Transactions on Circuits and Systems for Video Technology, vol. 16, no. 5, pp. 642–648, 2006.

  13. D. Farin, P. H. N. de With. Enabling Arbitrary Rotational Camera Motion Using Multisprites with Minimum Coding Cost. IEEE Transactions on Circuits and Systems for Video Technology, vol. 16, no. 4, pp. 492–506, 2006.

  14. Y. Su, M. T. Sun, V. Hsu. Global Motion Estimation from Coarsely Sampled Motion Vector Field and the Applications. IEEE Transactions on Circuits and Systems for Video Technology, vol. 15, no. 2, pp. 232–242, 2005.

  15. J. C. Huang, W. S. Hsieh. Automatic Feature-based Global Motion Estimation in Video Sequences. IEEE Transactions on Consumer Electronics, vol. 50, no. 3, pp. 911–915, 2004.

  16. Y. P. Tan, D. D. Saur, S. R. Kulkarni, P. J. Ramadge. Rapid Estimation of Camera Motion from Compressed Video with Application to Video Annotation. IEEE Transactions on Circuits and Systems for Video Technology, vol. 10, no. 1, pp. 133–146, 2000.

  17. L. Liu, X. Ye, M. Yao, S. Zhang. A Semantic Description Scheme of Soccer Video Based on MPEG-7. In Proceedings of the 5th Pacific Rim Conference on Multimedia, Lecture Notes in Computer Science, Springer-Verlag, Tokyo, Japan, vol. 3332, pp. 298–305, 2004.

  18. D. W. Tjondronegoro, Y. P. Chen, B. Pham. Classification of Self-consumable Highlights for Soccer Video Summaries. In Proceedings of the 5th IEEE International Conference on Multimedia and Expo, IEEE Press, Taipei, Taiwan, vol. 1, pp. 579–582, 2004.

  19. G. Jin, L. Tao, G. Xu. Hidden Markov Model Based Events Detection in Soccer Video. In Proceedings of the International Conference on Image Analysis and Recognition, Lecture Notes in Computer Science, Springer-Verlag, Porto, Portugal, vol. 3211, pp. 605–612, 2004.
