Enhancing Sign Language Detection through Mediapipe and Convolutional Neural Networks (CNN)
- URL: http://arxiv.org/abs/2406.03729v2
- Date: Tue, 27 Aug 2024 10:57:01 GMT
- Title: Enhancing Sign Language Detection through Mediapipe and Convolutional Neural Networks (CNN)
- Authors: Aditya Raj Verma, Gagandeep Singh, Karnim Meghwal, Banawath Ramji, Praveen Kumar Dadheech
- Abstract summary: This research combines MediaPipe and CNNs for the efficient and accurate interpretation of an ASL dataset.
The accuracy achieved by the model on ASL datasets is 99.12%.
The system will have applications in the communication, education, and accessibility domains.
- Score: 3.192629447369627
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: This research combines MediaPipe and CNNs for the efficient and accurate interpretation of ASL data for real-time sign language detection. The system presented here captures and processes hand gestures in real time; the intended purpose was to create an easy, accurate, and fast way of entering commands without the need to touch anything. MediaPipe provides a powerful framework for real-time hand tracking, capturing and preprocessing hand movements in a way that increases the accuracy of the gesture recognition system. Integrating a CNN with MediaPipe makes the model more efficient for real-time processing. The model was tested on American Sign Language (ASL) datasets and achieved an accuracy of 99.12%. The results were then compared to those of existing methods, using established evaluation techniques, to assess how well the system performed. The system will have applications in the communication, education, and accessibility domains; further improving systems such as the one described in this paper will assist people with hearing impairments and make content more accessible to them. We tested recognition and translation performance on an ASL dataset and achieved better accuracy than previous models. The aim of the research is to recognize the characters of American Sign Language from hand images captured by a web camera, using MediaPipe and CNNs.
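As a rough illustration of the pipeline the abstract describes, the sketch below uses MediaPipe to extract 21 hand landmarks per webcam frame and feeds them to a small Keras CNN. The network shape, the 26-letter class count, and the landmark-grid input layout are assumptions for illustration, not the paper's exact configuration.

```python
# Minimal sketch of a MediaPipe + CNN ASL pipeline (assumed details marked).
import cv2
import mediapipe as mp
import numpy as np
import tensorflow as tf

NUM_CLASSES = 26  # assumption: one class per ASL letter

def build_cnn(input_shape=(21, 3, 1), num_classes=NUM_CLASSES):
    """Small CNN over a 21x3 hand-landmark grid (hypothetical layout)."""
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, (3, 2), padding="same", activation="relu",
                               input_shape=input_shape),
        tf.keras.layers.Conv2D(64, (3, 2), padding="same", activation="relu"),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])

hands = mp.solutions.hands.Hands(max_num_hands=1, min_detection_confidence=0.5)
model = build_cnn()  # in practice, load trained weights instead

cap = cv2.VideoCapture(0)  # webcam capture, as in the paper
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # MediaPipe expects RGB; OpenCV delivers BGR frames.
    results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_hand_landmarks:
        lm = results.multi_hand_landmarks[0].landmark
        feats = np.array([[p.x, p.y, p.z] for p in lm], dtype=np.float32)
        probs = model.predict(feats.reshape(1, 21, 3, 1), verbose=0)
        print("predicted class:", int(np.argmax(probs)))
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
```

In practice the CNN would first be trained on features extracted the same way; only the landmark-based preprocessing and the MediaPipe-to-CNN handoff are taken from the abstract.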
Related papers
- Deep Neural Network-Based Sign Language Recognition: A Comprehensive Approach Using Transfer Learning with Explainability [0.0]
We suggest a novel solution that uses a deep neural network to fully automate sign language recognition.
This methodology integrates sophisticated preprocessing methodologies to optimise the overall performance.
Our model's ability to provide informational clarity was assessed using the SHAP (SHapley Additive exPlanations) method.
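For illustration only, a minimal SHAP workflow over a sign classifier might look like the sketch below; the tiny Keras model and random placeholder data stand in for the paper's setup, which is not published here.

```python
# Hypothetical SHAP attribution sketch; model and data are placeholders.
import numpy as np
import shap
import tensorflow as tf

model = tf.keras.Sequential([                       # placeholder classifier
    tf.keras.layers.Flatten(input_shape=(28, 28, 1)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(26, activation="softmax"),
])
X = np.random.rand(120, 28, 28, 1).astype(np.float32)  # placeholder images

explainer = shap.DeepExplainer(model, X[:100])   # background for expectations
shap_values = explainer.shap_values(X[100:105])  # per-class attributions
shap.image_plot(shap_values, X[100:105])         # show which pixels drive each class
```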
arXiv Detail & Related papers (2024-09-11T17:17:44Z)
- EvSign: Sign Language Recognition and Translation with Streaming Events [59.51655336911345]
Event camera could naturally perceive dynamic hand movements, providing rich manual clues for sign language tasks.
We propose an efficient transformer-based framework for event-based SLR and SLT tasks.
Our method performs favorably against existing state-of-the-art approaches at only 0.34% of the computational cost.
arXiv Detail & Related papers (2024-07-17T14:16:35Z)
- Mediapipe and CNNs for Real-Time ASL Gesture Recognition [0.1529342790344802]
This research paper describes a real-time system for identifying American Sign Language (ASL) movements.
The suggested method makes use of the Mediapipe library for feature extraction and a Convolutional Neural Network (CNN) for ASL gesture classification.
arXiv Detail & Related papers (2023-05-09T09:35:45Z)
- Fine-tuning of sign language recognition models: a technical report [0.0]
We focus on investigating two questions: how fine-tuning on datasets from other sign languages helps improve sign recognition quality, and whether sign recognition is possible in real-time without using GPU.
We provide code for reproducing model training experiments, converting models to ONNX format, and inference for real-time gesture recognition.
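A hedged sketch of that workflow, assuming a PyTorch model: export to ONNX, then run CPU-only inference with onnxruntime for GPU-free real-time use. The placeholder network and input shape below are illustrative, not the authors'.

```python
# Illustrative ONNX export + CPU inference; the model is a stand-in.
import numpy as np
import onnxruntime as ort
import torch

model = torch.nn.Sequential(              # placeholder sign recognizer
    torch.nn.Flatten(),
    torch.nn.Linear(3 * 224 * 224, 100),  # 100 sign classes (assumed)
)
model.eval()

dummy = torch.randn(1, 3, 224, 224)       # assumed input: one RGB frame
torch.onnx.export(model, dummy, "sign_model.onnx",
                  input_names=["frame"], output_names=["logits"])

# CPU-only session, matching the report's no-GPU real-time goal.
session = ort.InferenceSession("sign_model.onnx",
                               providers=["CPUExecutionProvider"])
frame = np.random.randn(1, 3, 224, 224).astype(np.float32)
logits = session.run(None, {"frame": frame})[0]
print("predicted sign id:", int(logits.argmax()))
```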
arXiv Detail & Related papers (2023-02-15T14:36:18Z)
- Combining Efficient and Precise Sign Language Recognition: Good pose estimation library is all you need [2.9005223064604078]
Sign language recognition could significantly improve the user experience for d/Deaf people with general consumer technology.
Current sign language recognition architectures are usually computationally heavy and require robust GPU-equipped hardware to run in real-time.
We build upon the SPOTER architecture, which comes close to the performance of large models employed for this task.
arXiv Detail & Related papers (2022-09-30T17:30:32Z)
- CLIP-ViP: Adapting Pre-trained Image-Text Model to Video-Language Representation Alignment [146.3128011522151]
We propose an Omni Crossmodal Learning method equipped with a Video Proxy mechanism on the basis of CLIP, namely CLIP-ViP.
Our approach improves the performance of CLIP on video-text retrieval by a large margin.
Our model also achieves SOTA results on a variety of datasets, including MSR-VTT, DiDeMo, LSMDC, and ActivityNet.
arXiv Detail & Related papers (2022-09-14T05:47:02Z)
- Sign Language Recognition via Skeleton-Aware Multi-Model Ensemble [71.97020373520922]
Sign language is commonly used by deaf or mute people to communicate.
We propose a novel Multi-modal Framework with a Global Ensemble Model (GEM) for isolated Sign Language Recognition (SLR).
Our proposed SAM-SLR-v2 framework is exceedingly effective and achieves state-of-the-art performance by significant margins.
arXiv Detail & Related papers (2021-10-12T16:57:18Z)
- Word-level Sign Language Recognition with Multi-stream Neural Networks Focusing on Local Regions and Skeletal Information [7.667316027377616]
Word-level sign language recognition (WSLR) has attracted attention because it is expected to overcome the communication barrier between people with speech impairments and those who can hear.
A method designed for action recognition has achieved state-of-the-art accuracy.
We propose a novel WSLR method that takes into account information specifically useful for the WSLR problem.
arXiv Detail & Related papers (2021-06-30T11:30:06Z)
- LeBenchmark: A Reproducible Framework for Assessing Self-Supervised Representation Learning from Speech [63.84741259993937]
Self-Supervised Learning (SSL) using huge unlabeled data has been successfully explored for image and natural language processing.
Recent works also investigated SSL from speech.
We propose LeBenchmark: a reproducible framework for assessing SSL from speech.
arXiv Detail & Related papers (2021-04-23T08:27:09Z)
- Skeleton Based Sign Language Recognition Using Whole-body Keypoints [71.97020373520922]
Sign language is used by deaf or speech impaired people to communicate.
Skeleton-based recognition is becoming popular because it can be further ensembled with RGB-D based methods to achieve state-of-the-art performance.
Inspired by the recent development of whole-body pose estimation [Jin et al., 2020], we propose recognizing sign language based on whole-body keypoints and features.
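As one concrete way to realize this idea, the sketch below collects whole-body keypoints per frame with MediaPipe Holistic; the cited work uses its own whole-body pose estimator, so the library choice and the zero-padding of missed detections are assumptions.

```python
# Illustrative whole-body keypoint extraction with MediaPipe Holistic.
import cv2
import mediapipe as mp
import numpy as np

holistic = mp.solutions.holistic.Holistic(min_detection_confidence=0.5)

def frame_keypoints(frame_bgr):
    """Concatenate pose (33), left-hand (21), and right-hand (21) landmarks."""
    res = holistic.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    parts = []
    for lms, n in [(res.pose_landmarks, 33),
                   (res.left_hand_landmarks, 21),
                   (res.right_hand_landmarks, 21)]:
        if lms:
            parts.append(np.array([[p.x, p.y, p.z] for p in lms.landmark]))
        else:
            parts.append(np.zeros((n, 3)))  # pad undetected parts with zeros
    return np.concatenate(parts)            # (75, 3) skeleton for this frame

# A sign clip then becomes a (T, 75, 3) sequence for a skeleton-based model.
```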
arXiv Detail & Related papers (2021-03-16T03:38:17Z)
- Transferring Cross-domain Knowledge for Video Sign Language Recognition [103.9216648495958]
Word-level sign language recognition (WSLR) is a fundamental task in sign language interpretation.
We propose a novel method that learns domain-invariant visual concepts and fertilizes WSLR models by transferring knowledge of subtitled news sign to them.
arXiv Detail & Related papers (2020-03-08T03:05:21Z)