Related papers: Ham2Pose: Animating Sign Language Notation into Pose Sequences

Ham2Pose: Animating Sign Language Notation into Pose Sequences

URL: http://arxiv.org/abs/2211.13613v2
Date: Sat, 1 Apr 2023 17:13:25 GMT
Title: Ham2Pose: Animating Sign Language Notation into Pose Sequences
Authors: Rotem Shalev-Arkushin, Amit Moryossef, Ohad Fried
Abstract summary: Translating spoken languages into Sign languages is necessary for open communication between the hearing and hearing-impaired communities. We propose the first method for animating a text written in HamNoSys, a lexical Sign language notation, into signed pose sequences.
Score: 9.132706284440276
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Translating spoken languages into Sign languages is necessary for open communication between the hearing and hearing-impaired communities. To achieve this goal, we propose the first method for animating a text written in HamNoSys, a lexical Sign language notation, into signed pose sequences. As HamNoSys is universal by design, our proposed method offers a generic solution invariant to the target Sign language. Our method gradually generates pose predictions using transformer encoders that create meaningful representations of the text and poses while considering their spatial and temporal information. We use weak supervision for the training process and show that our method succeeds in learning from partial and inaccurate data. Additionally, we offer a new distance measurement that considers missing keypoints, to measure the distance between pose sequences using DTW-MJE. We validate its correctness using AUTSL, a large-scale Sign language dataset, show that it measures the distance between pose sequences more accurately than existing measurements, and use it to assess the quality of our generated pose sequences. Code for the data pre-processing, the model, and the distance measurement is publicly released for future research.

Related papers

Pose-Based Sign Language Spotting via an End-to-End Encoder Architecture [0.4083182125683813]
We present a first step toward sign language retrieval by addressing the challenge of detecting the presence or absence of a query sign video.<n>Unlike conventional approaches that rely on intermediate gloss recognition or text-based matching, we propose an end-to-end model that directly operates on pose keypoints extracted from sign videos.<n>Our architecture employs an encoder-only backbone with a binary classification head to determine whether the query sign appears within the target sequence.
arXiv Detail & Related papers (2025-12-09T15:49:23Z)
Text-Driven Diffusion Model for Sign Language Production [13.671593137551268]
We introduce the hfut-lmc team's solution to the SLRTP Sign Production Challenge. The challenge aims to generate semantically aligned sign language pose sequences from text inputs. Our solution achieves a BLEU-1 score of 20.17, placing second in the challenge.
arXiv Detail & Related papers (2025-03-20T07:45:27Z)
Signs as Tokens: A Retrieval-Enhanced Multilingual Sign Language Generator [55.94334001112357]
We introduce a multilingual sign language model, Signs as Tokens (SOKE), which can generate 3D sign avatars autoregressively from text inputs. We propose a retrieval-enhanced SLG approach, which incorporates external sign dictionaries to provide accurate word-level signs.
arXiv Detail & Related papers (2024-11-26T18:28:09Z)
SHuBERT: Self-Supervised Sign Language Representation Learning via Multi-Stream Cluster Prediction [65.1590372072555]
SHuBERT (Sign Hidden-Unit BERT) is a self-supervised contextual representation model learned from 1,000 hours of American Sign Language video.<n>SHuBERT adapts masked token prediction objectives to multi-stream visual sign language input, learning to predict multiple targets corresponding to clustered hand, face, and body pose streams.<n>SHuBERT achieves state-of-the-art performance across multiple tasks including sign language translation, isolated sign language recognition, and fingerspelling detection.
arXiv Detail & Related papers (2024-11-25T03:13:08Z)
MS2SL: Multimodal Spoken Data-Driven Continuous Sign Language Production [93.32354378820648]
We propose a unified framework for continuous sign language production, easing communication between sign and non-sign language users. A sequence diffusion model, utilizing embeddings extracted from text or speech, is crafted to generate sign predictions step by step. Experiments on How2Sign and PHOENIX14T datasets demonstrate that our model achieves competitive performance in sign language production.
arXiv Detail & Related papers (2024-07-04T13:53:50Z)
Self-Supervised Representation Learning with Spatial-Temporal Consistency for Sign Language Recognition [96.62264528407863]
We propose a self-supervised contrastive learning framework to excavate rich context via spatial-temporal consistency. Inspired by the complementary property of motion and joint modalities, we first introduce first-order motion information into sign language modeling. Our method is evaluated with extensive experiments on four public benchmarks, and achieves new state-of-the-art performance with a notable margin.
arXiv Detail & Related papers (2024-06-15T04:50:19Z)
T2S-GPT: Dynamic Vector Quantization for Autoregressive Sign Language Production from Text [59.57676466961787]
We propose a novel dynamic vector quantization (DVA-VAE) model that can adjust the encoding length based on the information density in sign language. Experiments conducted on the PHOENIX14T dataset demonstrate the effectiveness of our proposed method. We propose a new large German sign language dataset, PHOENIX-News, which contains 486 hours of sign language videos, audio, and transcription texts.
arXiv Detail & Related papers (2024-06-11T10:06:53Z)
SignMusketeers: An Efficient Multi-Stream Approach for Sign Language Translation at Scale [22.49602248323602]
A persistent challenge in sign language video processing is how we learn representations of sign language. Our proposed method focuses on just the most relevant parts in a signing video: the face, hands and body posture of the signer. Our approach is based on learning from individual frames (rather than video sequences) and is therefore much more efficient than prior work on sign language pre-training.
arXiv Detail & Related papers (2024-06-11T03:00:41Z)
SignBERT+: Hand-model-aware Self-supervised Pre-training for Sign Language Understanding [132.78015553111234]
Hand gesture serves as a crucial role during the expression of sign language. Current deep learning based methods for sign language understanding (SLU) are prone to over-fitting due to insufficient sign data resource. We propose the first self-supervised pre-trainable SignBERT+ framework with model-aware hand prior incorporated.
arXiv Detail & Related papers (2023-05-08T17:16:38Z)
On the Importance of Signer Overlap for Sign Language Detection [65.26091369630547]
We argue that the current benchmark data sets for sign language detection estimate overly positive results that do not generalize well. We quantify this with a detailed analysis of the effect of signer overlap on current sign detection benchmark data sets. We propose new data set partitions that are free of overlap and allow for more realistic performance assessment.
arXiv Detail & Related papers (2023-03-19T22:15:05Z)
Bridging the Gap between Language Models and Cross-Lingual Sequence Labeling [101.74165219364264]
Large-scale cross-lingual pre-trained language models (xPLMs) have shown effectiveness in cross-lingual sequence labeling tasks. Despite the great success, we draw an empirical observation that there is a training objective gap between pre-training and fine-tuning stages. In this paper, we first design a pre-training task tailored for xSL named Cross-lingual Language Informative Span Masking (CLISM) to eliminate the objective gap. Second, we present ContrAstive-Consistency Regularization (CACR), which utilizes contrastive learning to encourage the consistency between representations of input parallel
arXiv Detail & Related papers (2022-04-11T15:55:20Z)
Signing at Scale: Learning to Co-Articulate Signs for Large-Scale Photo-Realistic Sign Language Production [43.45785951443149]
Sign languages are visual languages, with vocabularies as rich as their spoken language counterparts. Current deep-learning based Sign Language Production (SLP) models produce under-articulated skeleton pose sequences. We tackle large-scale SLP by learning to co-articulate between dictionary signs. We also propose SignGAN, a pose-conditioned human synthesis model that produces photo-realistic sign language videos.
arXiv Detail & Related papers (2022-03-29T08:51:38Z)
Continuous 3D Multi-Channel Sign Language Production via Progressive Transformers and Mixture Density Networks [37.679114155300084]
Sign Language Production (SLP) must embody both the continuous articulation and full morphology of sign to be truly understandable by the Deaf community. We propose a novel Progressive Transformer architecture, the first SLP model to translate from spoken language sentences to continuous 3D sign pose sequences. We present extensive data augmentation techniques to reduce prediction drift, alongside an adversarial training regime and a Mixture Density Network (MDN) formulation to produce realistic and expressive sign pose sequences.
arXiv Detail & Related papers (2021-03-11T22:11:17Z)
Pose-based Sign Language Recognition using GCN and BERT [0.0]
Word-level sign language recognition (WSLR) is the first important step towards understanding and interpreting sign language. recognizing signs from videos is a challenging task as the meaning of a word depends on a combination of subtle body motions, hand configurations, and other movements. Recent pose-based architectures for W SLR either model both the spatial and temporal dependencies among the poses in different frames simultaneously or only model the temporal information without fully utilizing the spatial information. We tackle the problem of W SLR using a novel pose-based approach, which captures spatial and temporal information separately and performs late fusion.
arXiv Detail & Related papers (2020-12-01T19:10:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.