Combining Efficient and Precise Sign Language Recognition: Good pose
estimation library is all you need
- URL: http://arxiv.org/abs/2210.00893v1
- Date: Fri, 30 Sep 2022 17:30:32 GMT
- Title: Combining Efficient and Precise Sign Language Recognition: Good pose
estimation library is all you need
- Authors: Matyáš Boháček, Zhuo Cao, Marek Hrúz
- Abstract summary: Sign language recognition could significantly improve the user experience for d/Deaf people with general consumer technology.
Current sign language recognition architectures are usually computationally heavy and require robust GPU-equipped hardware to run in real-time.
We build upon the SPOTER architecture, which comes close to the performance of large models employed for this task.
- Score: 2.9005223064604078
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Sign language recognition could significantly improve the user experience for d/Deaf people with general consumer technology, such as IoT devices or videoconferencing. However, current sign language recognition architectures are usually computationally heavy and require robust GPU-equipped hardware to run in real-time. Some models aim for lower-end devices (such as smartphones) by minimizing their size and complexity, which leads to worse accuracy. This trade-off hinders accurate in-the-wild applications. We build upon the SPOTER architecture, which belongs to the latter group of light methods, as it came close to the performance of large models employed for this task. By substituting its original third-party pose estimation module with the MediaPipe library, we achieve an overall state-of-the-art result on the WLASL100 dataset. Significantly, our method beats previous larger architectures while still being twice as computationally efficient and almost 11 times faster on inference when compared to a relevant benchmark. To demonstrate our method's combined efficiency and precision, we built an online demo that enables users to translate sign lemmas of American Sign Language in their browsers. To the best of our knowledge, this is the first publicly available online application demonstrating this task.
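The key change the abstract describes is replacing SPOTER's third-party pose estimator with MediaPipe. As a rough illustration (not the authors' released code), the sketch below uses MediaPipe Holistic to extract per-frame body and hand keypoints from a sign video; the landmark subset, zero-padding for undetected parts, and the use of 2D coordinates are assumptions, not the paper's exact preprocessing.

```python
# Hypothetical sketch: per-frame keypoint extraction with MediaPipe Holistic,
# producing the kind of input a SPOTER-style transformer classifier consumes.
import cv2
import mediapipe as mp
import numpy as np

POSE, HAND = 33, 21  # MediaPipe landmark counts for pose and each hand

def extract_keypoints(video_path: str) -> np.ndarray:
    """Return an array of shape (frames, 75, 2) with (x, y) landmarks."""
    frames = []
    cap = cv2.VideoCapture(video_path)
    with mp.solutions.holistic.Holistic(static_image_mode=False) as holistic:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            # MediaPipe expects RGB; OpenCV decodes frames as BGR.
            result = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            points = []
            for lm_set, count in ((result.pose_landmarks, POSE),
                                  (result.left_hand_landmarks, HAND),
                                  (result.right_hand_landmarks, HAND)):
                if lm_set is None:
                    points.extend([(0.0, 0.0)] * count)  # part not detected
                else:
                    points.extend((lm.x, lm.y) for lm in lm_set.landmark)
            frames.append(points)
    cap.release()
    return np.asarray(frames, dtype=np.float32)
```

A SPOTER-style model would then feed this (frames × keypoints) sequence to a small transformer. MediaPipe returns landmarks normalized to [0, 1] image coordinates, which makes the features largely resolution-independent and keeps the downstream model light.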
Related papers
- CLSP: High-Fidelity Contrastive Language-State Pre-training for Agent State Representation [49.19402798479942]
Multimodal learning has become an important research area in artificial intelligence.
For intelligent agents, the state is a crucial modality to convey precise information alongside common modalities like images, videos, and language.
We propose a High-Fidelity Contrastive Language-State Pre-training method, which can accurately encode state information into general representations.
arXiv Detail & Related papers (2024-09-24T07:08:00Z) - Deep Neural Network-Based Sign Language Recognition: A Comprehensive Approach Using Transfer Learning with Explainability [0.0]
We suggest a novel solution that uses a deep neural network to fully automate sign language recognition.
This method integrates sophisticated preprocessing to optimise overall performance.
Our model's ability to provide informational clarity was assessed using the SHAP (SHapley Additive exPlanations) method.
arXiv Detail & Related papers (2024-09-11T17:17:44Z) - Enhancing Sign Language Detection through Mediapipe and Convolutional Neural Networks (CNN) [3.192629447369627]
This research combines MediaPipe and CNNs for the efficient and accurate interpretation of an ASL dataset; a minimal landmark-classifier sketch appears after this list.
The accuracy achieved by the model on ASL datasets is 99.12%.
The system will have applications in the communication, education, and accessibility domains.
arXiv Detail & Related papers (2024-06-06T04:05:12Z) - SignBERT+: Hand-model-aware Self-supervised Pre-training for Sign
Language Understanding [132.78015553111234]
Hand gestures play a crucial role in the expression of sign language.
Current deep learning based methods for sign language understanding (SLU) are prone to over-fitting due to insufficient sign data resources.
We propose the first self-supervised pre-trainable SignBERT+ framework with model-aware hand prior incorporated.
arXiv Detail & Related papers (2023-05-08T17:16:38Z) - Incremental Online Learning Algorithms Comparison for Gesture and Visual
Smart Sensors [68.8204255655161]
This paper compares four state-of-the-art algorithms in two real applications: gesture recognition based on accelerometer data and image classification.
Our results confirm these systems' reliability and the feasibility of deploying them in tiny-memory MCUs.
arXiv Detail & Related papers (2022-09-01T17:05:20Z) - Paraphrastic Representations at Scale [134.41025103489224]
We release trained models for English, Arabic, German, French, Spanish, Russian, Turkish, and Chinese languages.
We train these models on large amounts of data, achieving significantly improved performance over the original papers.
arXiv Detail & Related papers (2021-04-30T16:55:28Z) - Skeleton Based Sign Language Recognition Using Whole-body Keypoints [71.97020373520922]
Sign language is used by deaf or speech impaired people to communicate.
Skeleton-based recognition is becoming popular because it can be further ensembled with RGB-D based methods to achieve state-of-the-art performance.
Inspired by the recent development of whole-body pose estimation (Jin et al., 2020), we propose recognizing sign language based on whole-body keypoints and features.
arXiv Detail & Related papers (2021-03-16T03:38:17Z) - FastHand: Fast Hand Pose Estimation From A Monocular Camera [12.790733588554588]
We propose a fast and accurate framework for hand pose estimation, dubbed "FastHand".
FastHand offers high accuracy scores while reaching a speed of 25 frames per second on an NVIDIA Jetson TX2 graphics processing unit.
arXiv Detail & Related papers (2021-02-14T04:12:41Z) - Real-Time Execution of Large-scale Language Models on Mobile [49.32610509282623]
We find the best BERT model structure for a given computation size to match specific devices.
Our framework can guarantee the identified model to meet both resource and real-time specifications of mobile devices.
Specifically, our model is 5.2x faster on CPU and 4.1x faster on GPU with 0.5-2% accuracy loss compared with BERT-base.
arXiv Detail & Related papers (2020-09-15T01:59:17Z)
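For the MediaPipe-plus-CNN entry above, a minimal sketch of a landmark classifier is shown below. Everything here is an assumption for illustration (the class name LandmarkCNN, static single-hand frames, 21 MediaPipe hand landmarks with (x, y, z) coordinates, and 26 fingerspelling classes); the cited paper's actual architecture is not specified in its summary.

```python
# Hypothetical sketch: a tiny 1D CNN classifying one hand's MediaPipe
# landmarks (21 points x 3 coordinates) into 26 fingerspelling classes.
import torch
import torch.nn as nn

class LandmarkCNN(nn.Module):
    def __init__(self, num_classes: int = 26):  # e.g. A-Z, an assumption
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(3, 32, kernel_size=3, padding=1),   # channels = (x, y, z)
            nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),                      # pool over 21 landmarks
            nn.Flatten(),
            nn.Linear(64, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 3, 21) landmark coordinates from MediaPipe Hands
        return self.net(x)

model = LandmarkCNN()
dummy = torch.randn(1, 3, 21)   # one hand's landmarks
print(model(dummy).shape)       # torch.Size([1, 26])
```

Operating on landmark coordinates rather than raw pixels keeps the model tiny, which is the same efficiency argument the main paper makes for pose-based recognition.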