Related papers: A Simple Baseline for Spoken Language to Sign Language Translation with 3D Avatars

A Simple Baseline for Spoken Language to Sign Language Translation with 3D Avatars

URL: http://arxiv.org/abs/2401.04730v2
Date: Wed, 3 Jul 2024 17:04:14 GMT
Title: A Simple Baseline for Spoken Language to Sign Language Translation with 3D Avatars
Authors: Ronglai Zuo, Fangyun Wei, Zenggui Chen, Brian Mak, Jiaolong Yang, Xin Tong,
Abstract summary: Spoken2Sign is a system for translating spoken languages into sign languages. We present a simple baseline consisting of three steps: creating a gloss-video dictionary, estimating a 3D sign for each sign video, and training a Spoken2Sign model. As far as we know, we are the first to present the Spoken2Sign task in an output format of 3D signs.
Score: 49.60328609426056
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The objective of this paper is to develop a functional system for translating spoken languages into sign languages, referred to as Spoken2Sign translation. The Spoken2Sign task is orthogonal and complementary to traditional sign language to spoken language (Sign2Spoken) translation. To enable Spoken2Sign translation, we present a simple baseline consisting of three steps: 1) creating a gloss-video dictionary using existing Sign2Spoken benchmarks; 2) estimating a 3D sign for each sign video in the dictionary; 3) training a Spoken2Sign model, which is composed of a Text2Gloss translator, a sign connector, and a rendering module, with the aid of the yielded gloss-3D sign dictionary. The translation results are then displayed through a sign avatar. As far as we know, we are the first to present the Spoken2Sign task in an output format of 3D signs. In addition to its capability of Spoken2Sign translation, we also demonstrate that two by-products of our approach-3D keypoint augmentation and multi-view understanding-can assist in keypoint-based sign language understanding. Code and models are available at https://github.com/FangyunWei/SLRT.

Related papers

Lost in Translation, Found in Embeddings: Sign Language Translation and Alignment [84.39962912136525]
We develop a model for sign language understanding that performs sign language translation (SLT) and sign-subtitle alignment (SSA)<n>Our approach is built upon three components: (i) a lightweight visual backbone that captures manual and non-manual cues from human keypoints and lip-region images; (ii) a Sliding Perceiver mapping network that aggregates consecutive visual features into word-level embeddings; and (iii) a multi-task scalable training strategy that jointly optimises SLT and SSA.
arXiv Detail & Related papers (2025-12-08T21:05:46Z)
Speak2Sign3D: A Multi-modal Pipeline for English Speech to American Sign Language Animation [0.0]
We introduce a complete pipeline that converts English speech into smooth, realistic 3D sign language animations.<n>Our system starts with Whisper to translate spoken English into text.<n>Then, we use a MarianMT machine translation model to translate that text into American Sign Language (ASL) gloss.<n>We animate the translated gloss using a 3D keypoint-based motion system trained on Sign3D-WLASL.
arXiv Detail & Related papers (2025-07-09T04:13:49Z)
SignX: The Foundation Model for Sign Recognition [28.651340554377906]
This paper proposes SignX, a foundation model framework for sign recognition. It is a concise yet powerful framework applicable to multiple human activity recognition scenarios. Experimental results show that SignX can recognize signs from sign language video, producing predicted gloss representations with greater accuracy than has been reported in prior work.
arXiv Detail & Related papers (2025-04-22T23:23:39Z)
Lost in Translation, Found in Context: Sign Language Translation with Contextual Cues [56.038123093599815]
Our objective is to translate continuous sign language into spoken language text. We incorporate additional contextual cues together with the signing video. We show that our contextual approach significantly enhances the quality of the translations.
arXiv Detail & Related papers (2025-01-16T18:59:03Z)
Signs as Tokens: A Retrieval-Enhanced Multilingual Sign Language Generator [55.94334001112357]
We introduce a multilingual sign language model, Signs as Tokens (SOKE), which can generate 3D sign avatars autoregressively from text inputs. We propose a retrieval-enhanced SLG approach, which incorporates external sign dictionaries to provide accurate word-level signs.
arXiv Detail & Related papers (2024-11-26T18:28:09Z)
EvSign: Sign Language Recognition and Translation with Streaming Events [59.51655336911345]
Event camera could naturally perceive dynamic hand movements, providing rich manual clues for sign language tasks. We propose efficient transformer-based framework for event-based SLR and SLT tasks. Our method performs favorably against existing state-of-the-art approaches with only 0.34% computational cost.
arXiv Detail & Related papers (2024-07-17T14:16:35Z)
T2S-GPT: Dynamic Vector Quantization for Autoregressive Sign Language Production from Text [59.57676466961787]
We propose a novel dynamic vector quantization (DVA-VAE) model that can adjust the encoding length based on the information density in sign language. Experiments conducted on the PHOENIX14T dataset demonstrate the effectiveness of our proposed method. We propose a new large German sign language dataset, PHOENIX-News, which contains 486 hours of sign language videos, audio, and transcription texts.
arXiv Detail & Related papers (2024-06-11T10:06:53Z)
SignBLEU: Automatic Evaluation of Multi-channel Sign Language Translation [3.9711029428461653]
We introduce a new task named multi-channel sign language translation (MCSLT) We present a novel metric, SignBLEU, designed to capture multiple signal channels. We found that SignBLEU consistently correlates better with human judgment than competing metrics.
arXiv Detail & Related papers (2024-06-10T05:01:26Z)
Sign2GPT: Leveraging Large Language Models for Gloss-Free Sign Language Translation [30.008980708977095]
We introduce Sign2GPT, a novel framework for sign language translation. We propose a novel pretraining strategy that directs our encoder to learn sign representations from automatically extracted pseudo-glosses. We evaluate our approach on two public benchmark sign language translation datasets.
arXiv Detail & Related papers (2024-05-07T10:00:38Z)
Improving Continuous Sign Language Recognition with Cross-Lingual Signs [29.077175863743484]
We study the feasibility of utilizing multilingual sign language corpora to facilitate continuous sign language recognition. We first build two sign language dictionaries containing isolated signs that appear in two datasets. Then we identify the sign-to-sign mappings between two sign languages via a well-optimized isolated sign language recognition model.
arXiv Detail & Related papers (2023-08-21T15:58:47Z)
Changing the Representation: Examining Language Representation for Neural Sign Language Production [43.45785951443149]
We apply Natural Language Processing techniques to the first step of the Neural Sign Language Production pipeline. We use language models such as BERT and Word2Vec to create better sentence level embeddings. We introduce Text to HamNoSys (T2H) translation, and show the advantages of using a phonetic representation for sign language translation.
arXiv Detail & Related papers (2022-09-16T12:45:29Z)
Scaling up sign spotting through sign language dictionaries [99.50956498009094]
The focus of this work is $textitsign spotting$ - given a video of an isolated sign, our task is to identify $textitwhether$ and $textitwhere$ it has been signed in a continuous, co-articulated sign language video. We train a model using multiple types of available supervision by: (1) $textitwatching$ existing footage which is sparsely labelled using mouthing cues; (2) $textitreading$ associated subtitles which provide additional translations of the signed content. We validate the effectiveness of our approach on low
arXiv Detail & Related papers (2022-05-09T10:00:03Z)
Skeleton Based Sign Language Recognition Using Whole-body Keypoints [71.97020373520922]
Sign language is used by deaf or speech impaired people to communicate. Skeleton-based recognition is becoming popular that it can be further ensembled with RGB-D based method to achieve state-of-the-art performance. Inspired by the recent development of whole-body pose estimation citejin 2020whole, we propose recognizing sign language based on the whole-body key points and features.
arXiv Detail & Related papers (2021-03-16T03:38:17Z)
Watch, read and lookup: learning to spot signs from multiple supervisors [99.50956498009094]
Given a video of an isolated sign, our task is to identify whether and where it has been signed in a continuous, co-articulated sign language video. We train a model using multiple types of available supervision by: (1) watching existing sparsely labelled footage; (2) reading associated subtitles which provide additional weak-supervision; and (3) looking up words in visual sign language dictionaries. These three tasks are integrated into a unified learning framework using the principles of Noise Contrastive Estimation and Multiple Instance Learning.
arXiv Detail & Related papers (2020-10-08T14:12:56Z)

This list is automatically generated from the titles and abstracts of the papers in this site.