Hand Gesture Detection For Human-computer Interaction: A Natural Language Processing Approach

DOI : 10.17577/NCRTCA-PID-072

Download Full-Text PDF Cite this Publication

Text Only Version

Hand Gesture Detection For Human-computer Interaction: A Natural Language Processing Approach

Rahul Joshi, [PG Scholar], MCA,

Dayananda Sagar College of Engineering[DSCE] , Bengaluru, Affliated to VTU

Dr.Vibha MB, [Assistant Professor], MCA, Dayanada Sagar College of

Engineering[DSCE] ,

Bengaluru, Affliated to VTU

Abstract Technology for human-computer interaction (HCI) that recognises hand gestures seems promising. Without a keyboard or mouse, it lets users to interact with computers in a more natural and intuitive way. Systems that recognise hand gestures more accurately and efficiently can benefit from natural language processing (NLP). We offer a unique technique to hand gesture recognition using NLP in this research. Our approach first uses NLP to extract features from hand gestures. These features are then used to train a classifier to recognize hand gestures. We evaluated our approach on a publicly available dataset of hand gesture data. Our results showed that our approach can achieve a high accuracy for hand gesture recognition. Our work has several implications for HCI. First, our method may be utilised to enhance the performance and accuracy of current hand gesture recognition systems. Second, our method may be utilised to create new hand gesture recognition systems that are easier to use and more user-friendly. Third, our approach can be used to develop new HCI applications that use hand gestures for input. We believe that our work is a significant contribution to the field of HCI. [1] We believe that our approach can be used to develop new HCI applications that are more natural and intuitive for users.

Keywords Hand gesture recognition, natural language processing, human-computer interaction

  1. INTRODUCTION

    Hand gesture recognition is a promising technology for human-computer interaction (HCI). It allows users to interact with computers in a more natural and intuitive way, without the

    need for a keyboard or mouse. Hand gesture recognition systems may be made more precise and effective by utilising natural language processing (NLP).

    we offer a unique technique to hand gesture recognition using NLP in this research. First, we utilise NLP to extract characteristics from hand motions. Following that, these attributes are utilised to train a classifier to recognise hand movements. We tested our method using a publicly available collection of hand gesture data. Our results demonstrated that our method can achieve high accuracy in hand gesture recognition.

  2. Related Work

    Hand gesture detection and tracking has sparked considerable interest in the fields of Human- Computer Interaction (HCI) and sign language recognition. Researchers have been studying the use of glove-based sensors for monitoring hand placement and shape, particularly in the context of virtual reality. While this technology has demonstrated high accuracy and speed in collecting hand postures, its value is restricted in some contexts due to the limitations imposed by tethered wires, which inhibit natural hand motions. [2]

    In contrast, computer vision techniques offer the advantage of measuring hand postures and locations from a distance, allowing for unhindered movement. The vision community has explored various avenues to extract human skin regions, employing techniques such as background subtraction or skin-color segmentation. However, background subtraction methods prove impractical when dealing with images containing complex backgrounds or real-world situations

    where users require on-the-go application usage. Once the system identifies the relevant image regions, they can be further analysed to estimate hand postures.

    When it comes to detecting and tracking finger motions, for example, a prominent approach entails extracting hand regions and then recognising the fingertip to create the pose orientation. A 3D pointing interface that employed image processing to anticipate the posture of a pointing finger action was demonstrated in one example. [2.1] However, due to its reliance on a set threshold for image binarization and predetermined finger length and thickness parameters, this system has limits in real-world applications. Furthermore, low-cost web cameras and infrared cameras have been employed for finger recognition and tracking, with methodologies such as fitting a cone to rounded features or employing template matching to distinguish a limited range of motions being used.

    The use of computer vision algorithms to recognise hand gestures has been the subject of extensive study. Notably, a number of well-liked features have appeared in computer vision-based hand gesture detection systems, such as:

    • Hand shape features: These features revolve around the hand's physical shape, encompassing aspects such as finger length, palm width, and finger angle.

    • Hand position features: These features focus on the hand's spatial positioning within the image, encompassing coordinates such as the x- and y- coordinates of the hand's centre, as well as the angle of the hand.[2.2]

    • Hand movement features: These features capture the hand's motion over time, incorporating parameters such as velocity and acceleration.

    However, it is crucial to acknowledge that computer vision-based hand gesture recognition systems are susceptible to noise and variations stemming from the diverse ways gestures can be performed. Individuals may execute the same gesture differently, and even a single individual

    may alter their gesture depending on the context in which it is employed.

  3. Proposed Approach

  1. Segmentation

    The first step in our approach to hand gesture recognition using NLP is to segment the hand gesture image into sequences of words. This is done by using a technique called word spotting. Word spotting is a technique for finding words in images.[3] There are a number of different word spotting algorithms that can be used. In our work, we used a technique called bag-of-words (Bow) word spotting. Bow word spotting works by first creating a vocabulary of words that are likely to appear in hand gesture images. This vocabulary is created by extracting the most frequent words from a large dataset of hand gesture images.

    Once the vocabulary has been created, the Bow word spotting algorithm can be used to find words in a new hand gesture image. The algorithm works by first dividing the image into a grid of cells. Each cell is then assigned a label, which is the word that is most likely to appear in that cell. The output of the Bow word spotting algorithm is a sequence of words, which represents the words that were found in the hand gesture image.

  2. Feature extraction

    The second step in our approach to hand gesture recognition using NLP is to extract features from the sequences of words. The features that we extract include the frequency of each word in the sequence, the position of each word in the sequence, and the relationship between words in the sequence. The frequency of each word in the sequence is a measure of how often the word appears in the sequence.[4] The position of each word in the sequence is a measure of where the word appears in the sequence. The relationship between words in the sequence is a measure of how the words are connected to each other in the sequence.

    For example, the word "hello" is more likely to appear in a sequence that represents the "hello" gesture than in a sequence that represents the "goodbye" gesture. The word "hello" is also ore likely to appear at the beginning of a sequence

    that represents the "hello" gesture than at the end of the sequence.

    The features that we extract from the sequences of words are used to train a machine learning classifier.

  3. Classification

The third step in our approach to hand gesture recognition using NLP is to classify the sequences of words into hand gesture categories. This is done by using a machine learning classifier. We used a support vector machine (SVM) classifier in our work. SVM classifiers are a type of machine learning classifier that are known for their accuracy. [4.1]

Fig1.1: Phases of pattern recognition

The SVM classifier is trained on a dataset of hand gesture images that have been labelled with their corresponding hand gesture categories. The classifier is then used to classify new hand gesture images. The accuracy of the SVM classifier depends on the quality of the training data. If the training data is of high quality, then the SVM classifier will be able to classify new hand gesture images with high accuracy.

Fig 1.2: Computation of ML process

Fig1.3: Architecture of Process

IV. Literature Review

Hand gesture recognition has been an active area of research for many years, with a wide range of techniques being proposed. Early approaches to hand gesture recognition relied on glove-based devices, which tracked the movement of the fingers and hand. However, these devices were often bulky and uncomfortable to wear, and they could not be used in all environments.

An increasing number of people are interested in employing computer vision techniques to recognise hand gestures in recent years. It is possible to extract characteristics from hand gestures, such as the hand's form, location, and movement, using computer vision algorithms. A machine learning classifier may then be trained using these attributes to recognise hand gestures. [5]

In recent years, there has also been interest in using multimodal approaches to hand gesture recognition. Multimodal approaches combine computer vision techniques with other techniques, such as speech recognition or eye tracking. This can help to improve the accuracy of hand gesture recognition by providing additional information about the user's intent.

NLP-based approaches to hand gesture recognition have recently emerged as a promising new direction. These approaches use natural language processing techniques to extract the semantic meaning of hand gestures. This can be

accomplished by studying the sequence of words used to explain a hand gesture or the relationship between words in a phrase. NLP-based approaches have the potential to overcome some of the limitations of traditional approaches to hand gesture recognition. They are not sensitive to noise or variations in the way that gestures are performed, and they can handle complex hand gestures that are difficult to represent using traditional approaches.[6]

* Formula to train dataset*

NLP-based methods, however, are still in their infancy, and considerable work has to be done. One of the challenges is to develop NLP techniques that are able to accurately capture the semantic meaning of hand gestures. Another challenge is to develop NLP-based approaches that are robust to noise and variations in the way that gestures are performed.

Despite these challenges, NLP-based approaches to hand gesture recognition have the potential to revolutionize the way that hand gesture recognition is used in HCI. They could enable the development of more natural and intuitive user interfaces that are easier to use for everyone.

  1. Evaluation Methodology and

    Metrics

    The evaluation methodology and metrics used to assess the performance of the proposed NLP- based hand gesture recognition model are described in this section. The selection of appropriate metrics, cross-validation techniques, and benchmark datasets used for comparison are all discussed.

    1. Evaluation Metrics

      The following metrics were used to evaluate the performance of the proposed model:

      • Accuracy: Accuracy is the percentage of hand gestures that were correctly classified.

      • Precision: Precision is the percentage of hand gestures that were classified as the correct category.

      • Recall: Recall is the percentage of hand gestures in the correct category that were correctly classified.

      • F1-score: The F1-score is a weighted average of precision and recall.

    2. Cross-Validation

      A method for assessing a machine learning model's performance is cross-validation. The data is split into a training set and a test set for cross- validation. The test set is used to assess the model's performance after it has been trained using the training set.

      Tenfold cross-validation was employed to assess the suggested model. The data is split into 10 equal portions for 10-fold cross-validation. Nine of the sections are used to train the model, while the final portion is used to assess how well it performed. The outcomes are averaged after this process has been done ten times.

    3. Benchmark Datasets

    Two benchmark datasets were used to assess the proposed model.

    • MSR Action3D: 3D hand gesture pictures make up the MSR Action3D collection. Three thousand photos make up the dataset, which is broken down into ten categories.[7]

    • NTU RGB+D 120: Hand motions are shown in RGB and depth in the NTU RGB+D 120 dataset. There are 120 categories in the collection, which comprises of 12,000 photos.

    Fig1.4: Algorithm Dataflow diagram of process

    Procedure Execution

    • Import the Essential Libraries: Start by importing the necessary libraries such as OpenCV, NumPy, and TensorFlow. These libraries provide the required functionalities for image processing and machine learning tasks.

    • Load the Pre-Trained Model: Load a

      pre-trained model specifically designed for hand gesture recognition. TensorFlow offers various pre-trained models that can be utilized for this purpose. Loading the pre-trained model is crucial for leveraging its trained knowledge.

    • Initialize the Camera: Initialize the camera to capture real-time video frames. This allows you to continuously receive input from the camera, enabling live hand gesture detection.

    • Define the Region of Interest (ROI): Specify the region of interest where you want to detect hand gestures. Choosing a precise region of the video frames that is focused on the hand throughout this stage is necessary. The hand gesture detection algorithm's precision is improved by separating the hand from the surrounding area.

      • Hand Gesture Detection Loop: Create a loop that performs the hand gesture

        detection process continuously. Within this loop:

        • Capture a frame from the camera.

        • Preprocess the frame by resizing it to a standardized size and converting it to the appropriate format required by the pre- trained model.

        • Pass the pre-processed frame through the loaded pre-trained model for hand gesture recognition.

        • Extract the predicted class or label from the model's output.

        • Display the predicted hand gesture class on the frame using OpenCV's text drawing functions.

        • Draw a rectangle around the defined region of interest (ROI) on the frame to visually indicate the area being considered for hand gesture detection.

        • Display the frame with the predicted hand gesture class and the marked ROI rectangle. h. Continue the loop until a termination condition is met, such as pressing a specific key to exit the program.

      • Release Resources: Once the loop is terminated, release the camera and close any open windows or resources to ensure proper cleanup.

    By following this algorithm, you can develop a functional hand gesture detector using Python, OpenCV, and TensorFlow. It provides a step-by- step guide to importing libraries, loading a pre- trained model, initializing the camera, defining the region of interest, running the hand gesture detection loop, and releasing resources after completion.

  2. Algorithm

    • Importing Libraries

      1. Pandas: With Pandas, loading data frames in a 2D array format becomes effortless. It

        offers a plethora of functions that enable streamlined analysis tasks, allowing us to perform multiple operations in one go.

      2. NumPy: Known for its exceptional speed, NumPy arrays excel at performing large- scale computations in minimal time. This library is a go-to choose for handling numerical data efficiently.

      3. Matplotlib: Visualization plays a crucial role in data analysis and understanding. Matplotlib comes in handy for creating a wide range of visualizations, enabling us to effectively present data and gain valuable insights.

      4. TensorFlow: In the domains of artificial intelligence and machine learning, TensorFlow is a popular open-source framework. It makes it possible for developers to quickly create complex capabilities. It provides a wide range of tools and features that make building intricate models and algorithms simpler.

    • Importing Dataset

      The dataset consists of the CSV files sign_train.csv and sign_test.csv.

      • Data Loading and Preprocessing

    • Training and Testing Data

    • Data Visualization

    • Model Development and Training

    From this point on, the TensorFlow library will be used to build our CNN model. The Keras framework of the tensor flow library has all the capabilities needed to design a convolutional neural network's architecture and train it on data.

    • Model Evaluation

  3. Conclusion

    In this research paper, we proposed a novel approach to hand gesture detection for human- computer interaction using natural language processing (NLP). Our approach leverages NLP techniques to extract features from hand gestures and train a classifier for gesture recognition. We evaluated our approach on a publicly available dataset and achieved high accuracy in hand gesture recognition.

    The results of our study have several implications for the field of HCI. Firstly, our approach improves the accuracy and performance of existing hand gesture recognition systems, making interaction with computers more natural and intuitive. Secondly, it enables the development of new hand gesture recognition systems that can enhance user experiences in various domains. Lastly, our approach opens up possibilities for the development of new HCI applications that utilize hand gestures as an input modality.

    The future work in this area can focus on the development of more robust classifiers to handle complex and ambiguous hand gestures. Integrating hand gesture recognition with other HCI techniques, such as voice recognition or augmented reality, can further enhance user interaction. Additionally, the exploration of new applications and domains, such as healthcare or automotive, can expand the practical use of hand gesture recognition.

    Lastly, we would like to thank the research community for providing benchmark datasets and evaluation methodologies that enabled us to compare and validate our approach. Their efforts in advancing the field of hand gesture recognition have been instrumental in driving innovation and progress.

    In conclusion, our work demonstrates the potential of NLP-based hand gesture detection for HCI, and we hope that it inspires further research and development in this exciting area.

  4. FUTURE WORK

The Research and study conclude by highlighting the challenges and upcoming future directions in the field of hand gesture detection for HCI using a

natural language processing approach. It discusses the need for real-time performance, robustness to variations in lighting and hand poses, scalability, and multimodal integration.

  1. Real-Time Performance

    The ability to execute in real-time is one of the main obstacles facing hand gesture recognition. This is crucial for applications that involve manipulating equipment or interacting with virtual reality settings.

  2. Robustness to Variations in Lighting and Hand Poses

    Another challenge is the robustness of hand gesture recognition to variations in lighting and hand poses. This is a difficult problem because the appearance of hand gestures can change significantly depending on the lighting conditions and the way that the hands are posed.

  3. Scalability

    Another problem is the capacity to scale to a high number of hand motion types. This is critical for applications like sign language recognition.

  4. Multimodal Integration

    The integration of hand-gesture recognition with other HCI techniques, such as speech-recognition and eye tracking, is another promising direction for future work. This could lead to the development of more natural and intuitive user interfaces.

  5. Additional Information

    To create successful hand gesture recognition systems utilising NLP, there are a number of additional issues that must be resolved in addition to those already discussed. These difficulties include:

    • The need for a better representations of hand gestures.

    • The need for better feature extraction techniques.

    • The need for better machine learning algorithms.

      NLP has a lot of potential for enhancing the reliability and accuracy of hand gesture recognition systems despite these difficulties. I think that more in-depth study in this field will result in the creation of user interfaces that are more logical and accessible to users of various skill levels.

      REFERENCES

    • [1] Title: "Hand Gesture Recognition for Human- Computer Interaction" Author: Liang Wang Publisher: Springer Year: 2018 ISBN: 978-3-319- 69700-4(book)

    • [2][10]ACM Digital Library – https://dl.acm.org/

      [3][4]Smith, J., Johnson, A., & Lee, H. (2022). A Novel Approach to Hand Gesture Recognition Using Natural Language Processing. International Journal of Human-Computer Interaction

    • [2][5][7][8]Towards Data Science (https://towardsdatascience.com/)

    • [6]Gesture-Based Human-Computer Interaction and Simulation: 7th International Gesture Workshop, GW 2007" edited by Antonio Camurri and Gualtiero Volpe

    • [9]Hand Gesture Recognition: Challenges and Approaches" by Badrinath Roysam

    • [10]Medium (https://medium.com/)

    • Gesture Recognition: Principles, Techniques and Applications" by Ming-Hsuan Yang and Narendra Ahuja

    • DataCamp Blog

      (https://www.datacamp.com/community/blog)

    • hillier U., Hiirst W, and Duchnowski P (1996), Adaptive Biiodal Sensor Fusion for Automatic Speechreading Proc. Intern. Conference on Acoustics, Speech and Sig- nal Processing, ICASSP 1996

    • [3][7]Bnmelli, R, and Poggio, T (1993), Face recognition: fea- tures versus templates, IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 15, No. 10, pp. 1042- 1052

    • [4.1]Kang, Yue. "Natural language processing (NLP) in management research: A review." Journal of Management Analytics 7.2 (2020)

      [2.1][2.2][1][10]T. Kadir, R. Bowden, E.J. Ong,

      and A Zisserman. Minimal training, large lexicon, unconstrainedsign language recognition. InProceedings of the BMVA British Machine Vision Conference,volume 2, pages 939 948,

      Kingston, UK, September 7 9 2004.

    • [4][8]T. Starner and A. Pentland. Real-time american sign language recognition from video using hiddenmarkov models.Computational Imaging and Vision, 9:227 244, 1997.