Evaluating the Immediate Applicability of Pose Estimation for Sign
Language Recognition
- URL: http://arxiv.org/abs/2104.10166v1
- Date: Tue, 20 Apr 2021 14:41:45 GMT
- Title: Evaluating the Immediate Applicability of Pose Estimation for Sign
Language Recognition
- Authors: Amit Moryossef, Ioannis Tsochantaridis, Joe Dinn, Necati Cihan
Camgöz, Richard Bowden, Tao Jiang, Annette Rios, Mathias Müller, Sarah
Ebling
- Abstract summary: We evaluate representations based on skeleton poses, as these are explainable, person-independent, privacy-preserving, low-dimensional representations.
We perform two independent studies using two state-of-the-art pose estimation systems.
We analyze the applicability of the pose estimation systems to sign language recognition by evaluating the failure cases of the recognition models.
- Score: 33.26064598621083
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Signed languages are visual languages produced by the movement of the hands,
face, and body. In this paper, we evaluate representations based on skeleton
poses, as these are explainable, person-independent, privacy-preserving,
low-dimensional representations. Basically, skeletal representations generalize
over an individual's appearance and background, allowing us to focus on the
recognition of motion. But how much information is lost by the skeletal
representation? We perform two independent studies using two state-of-the-art
pose estimation systems. We analyze the applicability of the pose estimation
systems to sign language recognition by evaluating the failure cases of the
recognition models. Importantly, this allows us to characterize the current
limitations of skeletal pose estimation approaches in sign language
recognition.
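The abstract's claim that skeletal representations "generalize over an individual's appearance" typically rests on normalizing keypoints so position and body size cancel out. A minimal sketch of such a normalization (the joint layout and normalization choices here are illustrative assumptions, not the paper's exact pipeline):

```python
import numpy as np

def normalize_pose(keypoints: np.ndarray) -> np.ndarray:
    """Center a 2D skeleton on its mean joint and scale it to unit size.

    keypoints: array of shape (num_joints, 2) holding (x, y) coordinates.
    Returns a translation- and scale-invariant copy, one simple way to
    abstract away signer position and body size.
    """
    centered = keypoints - keypoints.mean(axis=0)
    scale = np.linalg.norm(centered, axis=1).max()
    if scale > 0:
        centered = centered / scale
    return centered

# Two skeletons of the same pose at different positions and scales
pose_a = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
pose_b = pose_a * 3.0 + np.array([5.0, 2.0])  # shifted and enlarged
assert np.allclose(normalize_pose(pose_a), normalize_pose(pose_b))
```

After normalization, two captures of the same sign performed by differently sized signers at different screen positions map to the same low-dimensional representation, leaving motion as the discriminative signal.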
Related papers
- Linguistic Indicators of Early Cognitive Decline in the DementiaBank Pitt Corpus: A Statistical and Machine Learning Study [4.417564179511245]
This study analyzes spontaneous speech transcripts from the DementiaBank Pitt Corpus using three linguistic representations. Syntactic and grammatical features retain strong discriminative power even in the absence of lexical content. This study supports the use of linguistically grounded features for transparent and reliable language-based cognitive screening.
arXiv Detail & Related papers (2026-02-11T16:53:57Z) - Sign language recognition from skeletal data using graph and recurrent neural networks [0.0]
This work presents an approach for recognizing isolated sign language gestures using skeleton-based pose data extracted from video sequences. A Graph-GRU temporal network is proposed to model both spatial and temporal dependencies between frames, enabling accurate classification. The model is trained and evaluated on the AUTSL (Ankara University Turkish Sign Language) dataset, achieving high accuracy.
arXiv Detail & Related papers (2025-11-08T00:04:42Z) - Meaningful Pose-Based Sign Language Evaluation [29.030154300749086]
The study covers keypoint distance-based, embedding-based, and back-translation-based metrics. We show tradeoffs between different metrics in different scenarios through automatic meta-evaluation of sign-level retrieval and a human correlation study of text-to-pose translation across different sign languages.
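A keypoint distance-based metric of the kind surveyed above can be as simple as the mean Euclidean error between corresponding joints. A minimal sketch (the function name and shapes are assumptions for illustration; the paper's actual metrics may differ):

```python
import numpy as np

def mean_joint_error(pred: np.ndarray, ref: np.ndarray) -> float:
    """Mean Euclidean distance between corresponding joints.

    pred, ref: arrays of shape (num_frames, num_joints, 2).
    Lower scores mean the produced pose sequence is closer to the reference.
    """
    return float(np.linalg.norm(pred - ref, axis=-1).mean())

ref = np.zeros((4, 5, 2))            # reference pose sequence
pred = ref + np.array([3.0, 4.0])    # every joint offset by (3, 4)
assert mean_joint_error(pred, ref) == 5.0
```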
arXiv Detail & Related papers (2025-10-08T19:00:24Z) - Sign Language Recognition Based On Facial Expression and Hand Skeleton [2.5879170041667523]
We propose a sign language recognition network that integrates skeleton features of hands and facial expression.
By incorporating facial expression information, the accuracy and robustness of sign language recognition are improved.
arXiv Detail & Related papers (2024-07-02T13:02:51Z) - Self-Supervised Representation Learning with Spatial-Temporal Consistency for Sign Language Recognition [96.62264528407863]
We propose a self-supervised contrastive learning framework to excavate rich context via spatial-temporal consistency.
Inspired by the complementary property of motion and joint modalities, we first introduce first-order motion information into sign language modeling.
Our method is evaluated with extensive experiments on four public benchmarks, and achieves new state-of-the-art performance with a notable margin.
arXiv Detail & Related papers (2024-06-15T04:50:19Z) - Sign Language Recognition without frame-sequencing constraints: A proof
of concept on the Argentinian Sign Language [42.27617228521691]
This paper presents a general probabilistic model for sign classification that combines sub-classifiers based on different types of features.
The proposed model achieved an accuracy rate of 97% on an Argentinian Sign Language dataset.
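One simple way to combine sub-classifiers over different feature types, as described above, is a naive product of their class distributions. This sketch is only an illustrative assumption; the paper's probabilistic model is more general, and the feature names here are hypothetical:

```python
import numpy as np

def combine_subclassifiers(probs: list[np.ndarray]) -> np.ndarray:
    """Fuse per-feature class probabilities with a naive product rule.

    probs: list of arrays, each of shape (num_classes,), one per
    sub-classifier (e.g. hand shape, location, movement features).
    Returns a normalized class distribution.
    """
    fused = np.prod(np.stack(probs), axis=0)
    return fused / fused.sum()

hand_shape = np.array([0.6, 0.3, 0.1])  # hypothetical sub-classifier outputs
movement = np.array([0.5, 0.2, 0.3])
fused = combine_subclassifiers([hand_shape, movement])
assert fused.argmax() == 0  # both sub-classifiers favor class 0
```

The product rule assumes the sub-classifiers' features are conditionally independent given the sign class, which keeps fusion cheap but is only an approximation.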
arXiv Detail & Related papers (2023-10-26T14:47:11Z) - Perceptual Grouping in Contrastive Vision-Language Models [59.1542019031645]
We show how vision-language models are able to understand where objects reside within an image and group together visually related parts of the imagery.
We propose a minimal set of modifications that results in models that uniquely learn both semantic and spatial information.
arXiv Detail & Related papers (2022-10-18T17:01:35Z) - Leveraging Pre-trained Language Model for Speech Sentiment Analysis [58.78839114092951]
We explore the use of pre-trained language models to learn sentiment information of written texts for speech sentiment analysis.
We propose a pseudo label-based semi-supervised training strategy using a language model on an end-to-end speech sentiment approach.
arXiv Detail & Related papers (2021-06-11T20:15:21Z) - Skeleton Based Sign Language Recognition Using Whole-body Keypoints [71.97020373520922]
Sign language is used by deaf or speech-impaired people to communicate.
Skeleton-based recognition is becoming popular because it can be further ensembled with RGB-D based methods to achieve state-of-the-art performance.
Inspired by the recent development of whole-body pose estimation (Jin et al., 2020), we propose recognizing sign language based on whole-body keypoints and features.
arXiv Detail & Related papers (2021-03-16T03:38:17Z) - Regional Attention Network (RAN) for Head Pose and Fine-grained Gesture
Recognition [9.131161856493486]
We propose a novel end-to-end Regional Attention Network (RAN), which is a fully Convolutional Neural Network (CNN).
Our regions consist of one or more consecutive cells and are adapted from the strategies used in computing HOG (Histogram of Oriented Gradient) descriptor.
The proposed approach outperforms the state-of-the-art by a considerable margin in different metrics.
arXiv Detail & Related papers (2021-01-17T10:14:28Z) - Pose-based Body Language Recognition for Emotion and Psychiatric Symptom
Interpretation [75.3147962600095]
We propose an automated framework for body language based emotion recognition starting from regular RGB videos.
In collaboration with psychologists, we extend the framework for psychiatric symptom prediction.
Because a specific application domain of the proposed framework may only supply a limited amount of data, the framework is designed to work on a small training set.
arXiv Detail & Related papers (2020-10-30T18:45:16Z) - Using Human Psychophysics to Evaluate Generalization in Scene Text
Recognition Models [7.294729862905325]
We characterize two important scene text recognition models by measuring their domains.
These domains specify the ability of readers to generalize to different word lengths, fonts, and amounts of occlusion.
arXiv Detail & Related papers (2020-06-30T19:51:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.