Adaptive Silhouette Extraction And Gait Recognition In Dynamic Environments Using Fuzzy Inference System

DOI : 10.17577/IJERTV1IS8110


Naveen Rohila Research Scholar

Lingayas University Faridabad, Haryana India-121002

Abstract

With crime rates rising, there is a need for an unobtrusive system that can identify a person. Such a system should distinguish between the normal, abnormal and suspicious walk of a person so that an alarm can be raised in time. Gait, i.e. the manner of walking, is unique to a person and can be observed unobtrusively. Researchers have worked on gait analysis for more than a decade, but many challenges remain. The results of gait recognition are directly affected by the quality of the segmented silhouettes. This paper presents a novel method for foreground segmentation in real-world applications. The method is capable of detaching objects such as a purse, paper or bag carried by the person under investigation. Features are extracted in colour space and processed to update the background model. Fuzzy logic with Bayesian classification is used to deal with real-world constraints.

Videos were recorded at different locations, processed and analysed to check the robustness and accuracy of the algorithm. The results are comparatively better than those of other methods such as Gaussian classification or the nearest-neighbour method.

  1. Introduction

    Gait is the manner of walking. It has several advantages over other biometrics: it can be observed at a distance, it requires no additional skill on the part of the person under investigation, and it can be captured without that person's active participation. These advantages make it particularly valuable in surveillance systems.

    Gait recognition depends directly on the quality of segmentation of the walking person. Extracting a human silhouette from a video sequence is a challenging task, especially when the person is carrying luggage or is in contact with some article and the background is dynamic, i.e. contains traffic, running fountains, moving tree leaves, etc.

    Silhouette extraction, namely segmenting a human body or objects from the background, is the most important step in gait recognition. One central task in human silhouette extraction is background modelling [2,7]. Once a background model is prepared, walking humans in the video frames can be detected as the difference between the current video frame and the background model.
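    The idea of detecting a walker as the difference between the current frame and a background model can be illustrated with a minimal sketch. It is not the paper's algorithm: the function name `foreground_mask`, the fixed threshold of 30 grey levels and the synthetic 4×4 frames are assumptions chosen only for illustration.

```python
import numpy as np

def foreground_mask(frame, background, threshold=30.0):
    """Flag pixels whose colour differs from the background model.

    `threshold` is a hypothetical fixed value; the paper replaces such a
    fixed test with adaptive, fuzzy decisions.
    """
    diff = np.abs(frame.astype(np.float64) - background.astype(np.float64))
    # A pixel is foreground if its largest per-channel difference is large.
    return diff.max(axis=-1) > threshold

# Synthetic example: a uniform background and a 2x2 bright "walker".
background = np.full((4, 4, 3), 100, dtype=np.uint8)
frame = background.copy()
frame[1:3, 1:3] = 200
mask = foreground_mask(frame, background)
```

    A fixed threshold like this fails under the lighting changes and dynamic backgrounds described in the next section, which is the motivation for an adaptive background model.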

    This paper presents an accurate and robust silhouette extraction and human tracking algorithm that can operate in real-world, unconstrained environments with complex and dynamic backgrounds. Video cameras are used to collect data of walking persons. Videos were captured at Govt. Polytechnic for Women, Faridabad, in institution galleries, in front of the institution gate and in open areas while students and staff were moving. Ninety-three persons, including staff and students, participated in these experiments. From these videos we extract gait information to perform automated gait recognition. The most important steps are to extract the human body from the background and to fill the corresponding image region with white pixels so as to block the identifying features. Then, for gait analysis, we extract feature information from the human silhouette and use statistical models to model and identify gait.

  2. Major Challenges

    In real-world environments, an efficient and robust silhouette extraction and human tracking algorithm must deal with the following major challenges:

    1. Lighting changes from morning to evening because of changing sunlight, indoor lights being turned on or off, adjustment of blinds, etc.

    2. Vehicles pass in front of the main gate and the parking area where the camera was fixed.

    3. The background is dynamic, with movements of tree leaves, running fountains, flying birds, etc. The dynamic nature of the background is one of the major sources of false alarms in video-based surveillance systems.

    4. The accuracy and robustness of the extracted silhouettes. Accurate extraction is essential for gait analysis and modelling, and abnormal gait detection relies entirely on feature information extracted from the human silhouette.

  3. Related work

    Nikolaos V. Boulgouris, Dimitrios Hatzinakos and Konstantinos N. (Kostas) Plataniotis [1] provide an overview of the basic research directions in gait analysis and recognition. They discuss biometric and non-biometric applications of gait recognition, the gait cycle, holistic approaches such as PCA and modular approaches such as HMMs and kinematic models. They also compare the experimental assessment of various methods, discuss future advances in gait analysis and recognition for surveillance applications, and analyse gait signals by calculating different static and dynamic parameters.

    Zongyi Liu and Sudeep Sarkar [2] claim that gait recognition can be improved by normalizing gait dynamics and using accurate shape information. A population Hidden Markov Model is defined for a set of individuals. The distance between two gait sequences is computed between the corresponding dynamics-normalized gait cycles and is quantified by the sum of the distances between corresponding gait sequences. Linear discriminant analysis is applied to maximize the discrimination between persons.

    In [3], [4], Mark S. Nixon and John N. Carter compare gait with other biometrics. They discuss the early approaches with their results and limitations, the recent approaches with their results, the experimental databases available for this purpose and how they compare, and future prospects. The work addresses recognition of a person by gait using statistical techniques and uses the HiD database.

    Pavan Turaga, Rama Chellappa, V.S. Subrahmaniam and Octavian Druea [5] give a comprehensive survey of efforts over the past couple of decades to address the representation, recognition and learning of human activities from video, and related problems. Two major levels of complexity are discussed: actions and activities. Actions are characterized by simple motion patterns, while activities are more complex and involve coordinated actions. Probabilistic Petri nets, HMMs and syntactic approaches are discussed for representing activities and controlling sensors.

    In [6], Dacheng Tao, Xuelong Li, Xindong Wu and Stephen J. Maybank introduce a set of Gabor-based human gait appearance models, because Gabor functions are similar to the receptive field profiles of mammalian cortical simple cells. The very high dimensionality of the feature space makes training difficult. To solve this problem they propose general tensor discriminant analysis (GTDA), which seamlessly incorporates the structure information of the object (the Gabor-based human gait appearance model) as a natural constraint. GTDA differs from previous tensor-based discriminant analysis methods in that its training converges.

    In [7], L. Wang, T. Tan, H. Ning and W. Hu give a simple but efficient gait recognition algorithm using spatial-temporal silhouette analysis. For each image sequence, LDM-based human body pose recovery is developed to estimate the LDM parameters from both manually labelled and automatically extracted silhouettes, where the automatic silhouette extraction uses a coarse-to-fine localization and extraction procedure. The estimated LDM parameters are used for model-based gait recognition by employing dynamic time warping for matching and adopting the combination scheme of AdaBoost.M2. While existing model-based gait recognition approaches focus primarily on the lower limbs, the estimated LDM parameters make it possible to study full-body model-based gait recognition by using the dynamics of the upper limbs, the shoulders and the head as well. In the experiments, LDM-based gait recognition is tested on gait sequences that differ in shoe type, surface, carrying condition and time.

    In [8], C.Y. Yam, M.S. Nixon and J.N. Carter observe that current approaches are mostly statistical and concentrate on walking only. By analysing leg motion, they show that people can be recognised not only by their walking gait but also by their running gait. This is achieved by either of two new modelling approaches, which employ coupled oscillators and the biomechanics of human locomotion as the underlying concepts. These models give a plausible method for data reduction by providing estimates of the inclination of the thigh and of the leg from the image data. Both approaches derive a phase-weighted Fourier description gait signature by automated non-invasive means. One approach is completely automated, whereas the other requires specification of a single parameter to distinguish between walking and running.

    In [9], Zongyi Liu and Sudeep Sarkar conclude that gait recognition can be improved by normalizing gait dynamics and using accurate shape information. A population Hidden Markov Model is defined for a set of individuals. The distance between two gait sequences is computed between the corresponding dynamics-normalized gait cycles and is quantified by the sum of the distances between corresponding gait sequences. Linear discriminant analysis is applied to maximize the discrimination between persons.

    In [10], Nixon observes that automatic recognition by gait is attracting increasing interest and has the unique capability of recognizing people at a distance when other biometrics are obscured. This interest is reinforced by the longstanding computer vision interest in automated non-invasive analysis of human motion. The recognition capability of gait is supported by studies in other domains such as medicine (biomechanics), mathematics and psychology, which continue to suggest that gait is unique. Examples of recognition by gait can also be found in literature, with an early reference by Shakespeare to recognizing people by the way they walk. Current approaches confirm, now on much larger databases, the early results that suggested gait could be used for identification. This has been especially influenced by the Human ID at a Distance research program, with its wide range of data and approaches. Gait has benefited from developments in other biometrics and has led to new insight, particularly with regard to covariates. As such, gait is an interesting research area, contributing not only to biometrics but also to the stock of new techniques for the extraction and description of objects moving within image sequences.

    In [11], Baofeng Guo and Mark S. Nixon discuss how to discard irrelevant and redundant information and identify the most important attributes. The technique applied is based on mutual information, which evaluates the statistical dependence between two random variables and has an established relation with Bayes classification. Experiments are carried out on a 73-dimensional model-based gait feature set and on a 64 by 64 pixel model-free gait symmetry map from the Southampton HiD Gait Database. In [12], G.C. Hapsari and A.S. Prabuwono review human motion analysis, explaining the methodologies of human motion surveillance, the choice of system and how such systems work. The future prospects of gait in home appliances for fire alarm systems and in remote robotic surgery are also discussed.

    In [13], M. Sujaritha and S. Annadurai propose an adaptive spatial Gaussian mixture model for clustering-based colour image segmentation. A new clustering objective function that incorporates spatial information is introduced in the Bayesian framework. The weighting parameter that controls the importance of the spatial information is made adaptive to the image content, to increase smoothness within piecewise homogeneous regions and reduce edge blurring; hence the name adaptive spatial finite mixture model. In [14], Lun-Yu Chang and Winston H. Hsu propose a new foreground detection method for static cameras. It merges multiple modalities into a Markov Random Field (MRF) energy function and performs much better than conventional methods. Both the colour appearance of the frame and the spatial constraints of the foreground object are considered, so a more precise foreground shape is obtained.

    In [15], Chris Stauffer and W.E.L. Grimson discuss modelling each pixel as a mixture of Gaussians and using an on-line approximation to update the model. The Gaussian distributions of the adaptive mixture model are then evaluated to determine which are most likely to result from a background process. Each pixel is classified according to whether the Gaussian distribution that represents it most effectively is considered part of the background model. In [16], Jaime Gallego, Montse Pardas, Gloria present a segmentation system for monocular video sequences from a static camera, aimed at foreground/background separation and tracking. A pixel-wise model for the background is combined with a general-purpose region-based model for the foreground. The background is modelled using one Gaussian per pixel, thus achieving a precise and updated model.
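    To make the notion of a per-pixel Gaussian background with on-line updates more concrete, the following is a minimal sketch of a single-Gaussian model on grey-level frames. It is an illustration only, not the method of [15] or [16]: the class name `GaussianBackground` and the parameters `rate`, `k` and `min_std` are assumptions introduced here.

```python
import numpy as np

class GaussianBackground:
    """One running Gaussian (mean, variance) per pixel of a grey-level frame."""

    def __init__(self, first_frame, rate=0.05, k=2.5, min_std=2.0):
        self.mean = first_frame.astype(np.float64)
        self.var = np.full_like(self.mean, min_std ** 2)
        self.rate = rate            # hypothetical learning rate
        self.k = k                  # foreground if |x - mean| > k * std
        self.min_var = min_std ** 2

    def apply(self, frame):
        frame = frame.astype(np.float64)
        diff = frame - self.mean
        foreground = np.abs(diff) > self.k * np.sqrt(self.var)
        bg = ~foreground
        # Only pixels explained by the background update the model.
        self.mean[bg] += self.rate * diff[bg]
        self.var[bg] += self.rate * (diff[bg] ** 2 - self.var[bg])
        np.maximum(self.var, self.min_var, out=self.var)
        return foreground

model = GaussianBackground(np.full((3, 3), 100.0))
frame = np.full((3, 3), 100.0)
frame[1, 1] = 200.0                 # one pixel covered by a moving object
mask = model.apply(frame)
```

    A mixture of several Gaussians per pixel, as in [15], extends this idea so that repetitive background motion such as moving leaves can be represented by more than one mode.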

  4. Overview of Algorithm

    In our approach the camera is fixed. Whenever motion is detected, the camera starts recording. At this point it must be decided whether the motion is in the background or in the foreground. The features of brightness and chromaticity are invariant in an image. The distortions in brightness and chromaticity of the foreground and background are used to decide whether the motion is in the foreground or the background. Instead of computing on each pixel, computations are done on blocks of four pixels to save computation time. If the motion is in the foreground, the moving object is detached and a silhouette is generated. If the motion is in the background, the background model is updated as shown in Figure 1. Features are extracted and stored in a database, where they are used during training and classification. Because the background is time varying, the background model must be adaptive [15]. Periodic updates do not give good results when the update period is very small or very large, and sometimes lead to false alarms. High-level knowledge is fused with low-level feature-based classification results to handle time-varying backgrounds, and a fuzzy logic inference system is used to detach moving objects, shadows and objects carried by walking persons from the human silhouettes.

    Under brightness changes the chromaticity distortion remains invariant, but the chromaticity of the foreground differs from that of the background. In the background modelling and classification scheme of the proposed algorithm, past data frames are used to compute the joint distribution of $\alpha_i$ and $CD_i$ to build the background model. For each block the brightness and chromaticity distortions are calculated. Using these two features and the background model, the foreground is separated from the background. The fuzzy logic inference system is based on the principle that if the chromaticity distortion is greater than a threshold, the block belongs to the foreground, otherwise to the background. If the chromaticity distortion is very small and the brightness distortion is close to one, the block is considered shadow.

    Figure 1. Working Algorithm of Gait Recognition

  5. Feature Extraction

    The extracted features are of two types: first, those used to detach the foreground and generate silhouettes, and second, those used to identify a person. These feature variables should be invariant under brightness changes and also in RGB colour space. The features in the first category are brightness and chromaticity distortion, while the features in the second category are aspect ratio, diagonal angle and centroid, which are used during the experiments.

    Figure 2. Background Modeling in RGB Space

    The brightness distortion factor $\alpha_i$ and the chromaticity distortion $CD_i$ for pixel i in RGB colour space, when the background colour vector is $\mu_i$ and $I_i$ is the observed colour vector as shown in Figure 2, are calculated in Equations 1 and 2 respectively.

    $$\alpha_i = \arg\min_{\alpha_i}\,(I_i - \alpha_i\mu_i)^2 = \frac{\dfrac{I_R(i)\,\mu_R(i)}{\sigma_R^2(i)} + \dfrac{I_G(i)\,\mu_G(i)}{\sigma_G^2(i)} + \dfrac{I_B(i)\,\mu_B(i)}{\sigma_B^2(i)}}{\left[\dfrac{\mu_R(i)}{\sigma_R(i)}\right]^2 + \left[\dfrac{\mu_G(i)}{\sigma_G(i)}\right]^2 + \left[\dfrac{\mu_B(i)}{\sigma_B(i)}\right]^2}$$

    Equation 1

    where $[I_R(i), I_G(i), I_B(i)]$ represent the values of the red, green and blue components of the i-th pixel in RGB colour space, and $[\mu_R(i), \mu_G(i), \mu_B(i)]$ and $[\sigma_R(i), \sigma_G(i), \sigma_B(i)]$ are the mean and standard deviation of these colour components.
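    The brightness distortion, and the chromaticity distortion that accompanies it, can be evaluated as sketched here. This assumes the usual closed form in which each colour channel is weighted by its standard deviation; the function names and the array names `I`, `mu` and `sigma` (observed colour, background mean and background standard deviation) are illustrative choices, not taken from the paper's implementation.

```python
import numpy as np

def brightness_distortion(I, mu, sigma):
    """Scale factor alpha that brings the background colour closest to I.

    I, mu and sigma are arrays whose last axis holds the R, G, B channels.
    """
    num = np.sum(I * mu / sigma ** 2, axis=-1)
    den = np.sum((mu / sigma) ** 2, axis=-1)
    return num / den

def chromaticity_distortion(I, mu, sigma):
    """Sigma-normalized distance between I and the scaled background colour."""
    alpha = brightness_distortion(I, mu, sigma)[..., None]
    return np.sqrt(np.sum(((I - alpha * mu) / sigma) ** 2, axis=-1))

mu = np.array([100.0, 120.0, 80.0])      # hypothetical background mean
sigma = np.array([5.0, 5.0, 5.0])        # hypothetical background std

darker = 0.5 * mu                         # same colour at half brightness
alpha_dark = brightness_distortion(darker, mu, sigma)
cd_dark = chromaticity_distortion(darker, mu, sigma)

other = np.array([200.0, 20.0, 80.0])     # a differently coloured pixel
cd_other = chromaticity_distortion(other, mu, sigma)
```

    A pixel that only gets darker or brighter keeps a chromaticity distortion near zero, while a pixel of a different colour yields a large value, which is the property the foreground test relies on.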

    $$CD_i = \lVert I_i - \alpha_i\mu_i \rVert = \sqrt{\left(\frac{I_R(i) - \alpha_i\,\mu_R(i)}{\sigma_R(i)}\right)^2 + \left(\frac{I_G(i) - \alpha_i\,\mu_G(i)}{\sigma_G(i)}\right)^2 + \left(\frac{I_B(i) - \alpha_i\,\mu_B(i)}{\sigma_B(i)}\right)^2}$$

    Equation 2

    5.1 Knowledge Based Adaptive Background Update

      Selecting the size of the shifting window is a major problem that must be handled carefully. If the window is very small, the background is updated very fast, and a person who stops moving for a while is absorbed into the background. If it is very large, the background is updated very slowly: if a new object is introduced into the background or a background object is moved, the object is classified as foreground before the background is updated, and hence becomes part of the silhouette, which may lead to false alarms.

      This problem is solved by updating the image blocks that belong to the human body very slowly, so that the human body is not absorbed into the background. Blocks outside the predicted body region are updated much faster, so that background objects that change position can be absorbed into the background.

      Let $p_n = (x_n, y_n)$ and $d_n = (w_n, h_n)$ represent the centroid and the dimensions (width and height) of the human body silhouette in frame n, and let $v_n = (\Delta x_n, \Delta y_n)$ be the motion vector of the centroid from frame n-1 to frame n. The position and dimensions, and hence the bounding box, of the human body in frame n+1 can then be calculated from the following equations.

      $$p_{n+1} = \sum_{i=0}^{L} a_i\,(p_{n-i} + v_{n-i})$$

      Equation 3

      $$d_{n+1} = \sum_{i=0}^{L} b_i\,d_{n-i}$$

      Equation 4

      Here L is the number of previous frames used for bounding box prediction, $a_i$ and $b_i$ are weight parameters, which are greater when the frame is closer to the current frame, $p_{n-i}$ is a previous centroid and $d_{n-i}$ is the corresponding previous dimensions.

    5.2 Detaching Moving Objects Using Fuzzy Logic

      Moving objects, such as papers or some other thing carried by a person, should be classified as background. If such an object is counted as part of the silhouette, recognition of the person will not be accurate and the recognition rate will fall.

      Let the current frame be n, and let the human silhouette in frame n-1, denoted by $o_{n-1}$, already have been correctly extracted. Let the foreground image region in frame n be $o_n$, which might contain the human body and moving objects. A fuzzy logic inference system is now required to preserve the human body in frame n while detaching blocks that correspond to non-human artefacts. The fuzzy rules are based on the following observations:

      1. If an image block in $o_n$ belongs to the human body, it should have a high probability of finding a good match in $o_{n-1}$. The sum of absolute differences (SAD) between these two image blocks is a measure of the goodness of the match: the smaller the SAD, the better the block matching.

      2. If many blocks in its neighbourhood have good human body matches in $o_{n-1}$, it is highly probable that this block also belongs to the human body.

      3. If this block is far from the predicted position of the human body centroid, there is only a small possibility that this block belongs to the human body.

      Based on the above observations, the following features are extracted from each block in $o_n$:

      • SAD in motion matching: For each block in $o_n$, we find its best match in the range of 16×16 blocks centred at the position of the observed block in frame n-1. The SAD between this block and its best match in frame n-1 forms the first feature variable.

      • Neighbourhood ratio: The fraction of neighbouring blocks that have a good match in the previous body silhouette.

      • Spatial distance: The distance between the new block and the predicted human body centroid.
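      The SAD and spatial distance features listed above can be sketched as follows. This is an illustrative block-matching search, not the paper's code: the function names, the pixel-based `search` radius and the synthetic frame are assumptions, and the paper's own search range is expressed in 16×16 blocks.

```python
import numpy as np

def best_match_sad(block, prev_frame, top, left, search=4):
    """Smallest sum of absolute differences between `block` and same-sized
    windows of `prev_frame` whose top-left corner lies within `search`
    pixels of (top, left)."""
    h, w = block.shape
    H, W = prev_frame.shape
    block = block.astype(np.int64)       # avoid uint8 wrap-around
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + h > H or x + w > W:
                continue                  # window falls outside the frame
            cand = prev_frame[y:y + h, x:x + w].astype(np.int64)
            sad = int(np.abs(block - cand).sum())
            if best is None or sad < best:
                best = sad
    return best

def spatial_distance(block_centre, predicted_centroid):
    """Euclidean distance between a block centre and the predicted centroid."""
    return float(np.hypot(block_centre[0] - predicted_centroid[0],
                          block_centre[1] - predicted_centroid[1]))

prev_frame = np.arange(256, dtype=np.uint8).reshape(16, 16)
block = prev_frame[4:8, 5:9].copy()      # content that moved by one pixel
sad_moved = best_match_sad(block, prev_frame, top=5, left=6, search=2)
foreign = np.full((4, 4), 255, dtype=np.uint8)
sad_foreign = best_match_sad(foreign, prev_frame, top=5, left=6, search=2)
dist = spatial_distance((3.0, 4.0), (0.0, 0.0))
```

      A block that reappears nearby in the previous frame gets a SAD of zero, while a block with no counterpart gets a large SAD; the neighbourhood ratio would then be the fraction of adjacent blocks whose SAD falls below some chosen limit.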

      The fuzzy rules defined in our program to extract the silhouette of the foreground human body are:

      1. If SAD is very small AND NR is large AND D is small then SC is high.

      2. If SAD is large AND NR is small AND D is very large then SC is low.

      3. If SAD is low AND NR is large AND D is medium then SC is high.

      4. If SAD is medium AND NR is medium AND D is medium then SC is medium.

      5. If SAD is large AND NR is medium AND D is medium then SC is medium.

      6. If SAD is large AND NR is large AND D is small then SC is high.

      7. If SAD is medium AND NR is large AND D is medium then SC is high.

      8. If SAD is medium AND NR is large AND D is small then SC is high.

      9. If SAD is very large AND NR is small AND D is large then SC is low.

      10. If SAD is medium AND NR is small AND D is very large then SC is low.

      11. If SAD is low AND NR is huge AND D is medium then SC is high.

      12. If SAD is low AND NR is medium AND D is medium then SC is high.

      where SAD = sum of absolute differences, NR = neighbourhood ratio, D = spatial distance and SC = silhouette confidence.

  6. Gait Classification using Bayesian Classification

    After segmentation, the next step is classification, for which we use Bayesian classification. Let the gait vector of a subject stored in the database be $i_c$, and let the vector of the subject whose identity is to be verified be $i_{test}$. The Euclidean distance between the known and unknown vectors is calculated as

    $$d = \lVert i_{test} - i_c \rVert$$

    Equation 5

    The variation in the value of d is the measure of whether the subject is identified or not. The value of d varies in two ways, as the intra-class distance $d_c$ and the inter-class distance $d_I$, and thus two training sets are created. The mean and variance of each set are calculated using Equations 6, 7, 8 and 9.

    $$\mu_c = \frac{1}{N}\sum_{i=1}^{N} d_i, \quad d \in D_c$$

    Equation 6

    $$\mu_I = \frac{1}{N}\sum_{i=1}^{N} d_i, \quad d \in D_I$$

    Equation 7

    $$\sigma_c^2 = \frac{1}{N}\sum_{i=1}^{N} (d_i - \mu_c)^2, \quad d \in D_c$$

    Equation 8

    $$\sigma_I^2 = \frac{1}{N}\sum_{i=1}^{N} (d_i - \mu_I)^2, \quad d \in D_I$$

    Equation 9

    Covariance matrices are diagonal and are calculated using Equation 10 for each class separately.

    $$\Sigma = \mathrm{diag}\{\sigma_1^2, \sigma_2^2, \ldots, \sigma_m^2\}$$

    Equation 10

    The similarity of a new sequence to the known ones stored in the database is now described using the information calculated by Equations 6 to 10. The intra-class and inter-class likelihoods [16] are calculated by the following equations:

    $$P(d \mid \Omega_c) = \frac{1}{1 + e^{(d - \mu_c)/b_c}}$$

    Equation 11

    $$P(d \mid \Omega_I) = \frac{1}{1 + e^{-(d - \mu_I)/b_I}}$$

    Equation 12

    [Equation 13: an expression in $P(d)$, $P(d \mid \Omega_c)$ and $P(d \mid \Omega_I)$; its operators are not legible in the source.]

    The posterior probabilities for the intra class and the inter class are calculated by assuming that initially both classes are equally likely, and are given in Equations 14, 15 and 16.

    $$P(\Omega_c \mid d) = \frac{P(d \mid \Omega_c)\,P(\Omega_c)}{P(d)}, \qquad P(\Omega_c) = P(\Omega_I)$$

    Equation 14

    $$P(d) = P(\Omega_c)\,P(d \mid \Omega_c) + P(\Omega_I)\,P(d \mid \Omega_I)$$

    Equation 15

    $$P(\Omega_c \mid d) = \frac{P(d \mid \Omega_c)}{P(d \mid \Omega_c) + P(d \mid \Omega_I)}$$

    Equation 16

    A suitable threshold Th is chosen to classify whether the vector under test belongs to class c or I.

  7. Experimental Results

    For these experiments, videos were recorded at different locations with the camera kept fixed. We used a SONY HVR-Z7U video camera. Videos were recorded at 25 frames per second in real-world conditions.

    For persons moving singly and parallel to the camera in real-world conditions, the recognition rate is 91.3%. Silhouettes extracted using the fuzzy inference system are compared with those of the spatial Gaussian, updated median background and morphological techniques, as shown in Figure 3.

    Figure 3. Silhouette extracted using (a) Fuzzy inference system (b) spatial Gaussian (c) updated median background (d) morphological technique

    To prove the robustness of the algorithm, videos were recorded in corridors, as shown in Figure 4, where students are moving and sunlight is coming in through the windows. One student has a paper in one hand and a mobile phone in the other hand. Some students are farther from the camera than this student, and three other students are moving together.

    7.1 Challenges

      1. Sunlight from windows.

      2. Reflection from the floor.

      3. More than one person is walking at a time in the video.

      4. One student is carrying a paper and a mobile phone.

      5. Some students are moving together.

      6. Students are at different distances from the camera.

    7.2 Results

      As seen from Figure 4, the final result of the fuzzy inference system method suppresses the sunlight, the reflection and the articles touching body parts. For classification, however, there should be only one moving person in the frame at a time; the method does not handle more than one.

      This problem may be resolved by selecting the person nearest to the camera, suppressing the others and then normalizing the result. The case then reduces to a video in which a single person is walking, and the recognition results are up to 92.3%. The values of the parameters during the segmentation process are SAD = 0.354, Neighbourhood = 7.74, Distance = 28.671 and Silhouette = 0.817, as shown in Figure 5.

      Figure 4. Different stages of silhouette extraction using fuzzy inference system

      Figure 5. Different values of parameters while applying fuzzy inference technique

      Figure 6 shows the Receiver Operating Characteristic (ROC) curve between the false acceptance rate (FAR) and the false rejection rate (FRR) for different threshold values. These curves help in selecting a threshold at which both FAR and FRR are kept low. In our case it is 3%.
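      How FAR and FRR trade off against the threshold can be sketched as follows. The scores are made-up illustrative values, not the paper's data, and the convention that a comparison is accepted when its score is at least the threshold is an assumption; the function name `far_frr` is likewise hypothetical.

```python
import numpy as np

def far_frr(genuine, impostor, thresholds):
    """FAR and FRR per threshold, accepting when score >= threshold."""
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    # FAR: impostor comparisons wrongly accepted.
    far = np.array([(impostor >= t).mean() for t in thresholds])
    # FRR: genuine comparisons wrongly rejected.
    frr = np.array([(genuine < t).mean() for t in thresholds])
    return far, frr

genuine = [0.9, 0.8, 0.75, 0.6]      # hypothetical same-person scores
impostor = [0.1, 0.3, 0.4, 0.65]     # hypothetical different-person scores
thresholds = np.linspace(0.0, 1.0, 21)
far, frr = far_frr(genuine, impostor, thresholds)

# Index of the threshold where FAR and FRR are closest to each other.
balanced = int(np.argmin(np.abs(far - frr)))
```

      Raising the threshold lowers FAR and raises FRR, so the operating point is usually chosen where the two curves meet or where the more costly error is acceptably small.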

  8. Conclusion

      Although the fuzzy logic inference system is able to remove articles in contact with the body, shadows and reflections from the silhouette, a person wearing a muffler around the neck or a shawl is still a challenge, as shown in Figure 7. In all the cases given in Figure 7 the recognition rate is below 50%.

      Figure 6. Receiver operating characteristic curves

      Figure 7. Cases where recognition rate is less than 50%.

      Handling more than one person in the test videos is still a challenge. Selecting one person and detaching the rest along with the background is still a manual process and is an area for further research.

  9. References

    1. Nikolaos V. Boulgouris, Dimitrios Hatzinakos, Konstantinos N. (Kostas) Plataniotis, Gait Recognition: A challenging signal processing technology for biometric identification, IEEE Signal Processing Magazine, Nov. 2005, pp. 78-89

    2. Zongyi Liu and Sudeep Sarkar, Improved Gait Recognition by Gait Dynamics Normalization, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 6, June 2006

    3. Mark S. Nixon, John N. Carter, On Gait as a Biometric: Progress and Prospects, MSNJNC 2004, pp. 1401-1404

    4. Mark S. Nixon and John N. Carter, Automatic Recognition by Gait, Proceedings of IEEE, 94(11):2013-2024, Nov. 2006, ISSN 00189216

    5. Pavan Turaga, Rama Chellappa, V.S. Subrahmaniam and Octavian Druea, Machine Recognition of Human Activities: A Survey, IEEE transactions 2008

    6. Dacheng Tao, Xuelong Li, Xindong Wu, and Stephen J. Maybank, Human Carrying Status in Visual Surveillance, Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR06)-7695-2597-0/06

    7. L. Wang, T. Tan, H. Ning, and W. Hu, Silhouette Analysis based Gait Recognition for Human Identification, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 12, pp. 1505-1518

    8. C.Y. Yam, M.S. Nixon and J.N. Carter, Automated person Recognition by Walking and Running via model-based approaches, Pattern Recognition, vol. 37, no. 5, pp. 1057-1072, 2004

    9. Zongyi Liu and Sudeep Sarkar, Improved Gait Recognition by Gait Dynamics Normalization, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 6, June 2006

    10. Mark S. Nixon and John N. Carter, Advances in Automatic Gait Recognition, Proceeding of International Conference on Automatic Face and Gesture Recognition, pp. 139-146, 2004

    11. Baofeng Guo and Mark S. Nixon, Gait Feature Subset Selection by Mutual Information, IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans, vol. 39, no. 1, January 2009

    12. G.C. Hapsari and A.S. Prabuwono, Human Motion Recognition in Real-time Surveillance System: A Review, Journal of Applied Science, ISSN 1812-5654, 10(22): 2793-2798, 2010

    13. M. Sujaritha and S. Annadurai, Colour Image Segmentation using Adaptive Spatial Gaussian Mixture Model, Journal of World Academy of Science, Engineering and Technology, 2010

    14. Lun-Yu Chang and Winston H. Hsu, Foreground Segmentation for Static Video via Multi-Core and Multi-Model Graph-Cut, IEEE Conferences Multimedia and Expo, 2009. ICM

    15. Chris Stauffer and W.E.L. Grimson, Adaptive background mixture models for real time tracking, Tutorial: The Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge

    16. Jaime Gallego, Montse Pardas, Gloria , Bayesian Foreground Segmentation and Tracking using Pixelwise Background Model and Region Based Foreground Model,
