Changing the Representation: Examining Language Representation for
Neural Sign Language Production
- URL: http://arxiv.org/abs/2210.06312v1
- Date: Fri, 16 Sep 2022 12:45:29 GMT
- Title: Changing the Representation: Examining Language Representation for
Neural Sign Language Production
- Authors: Harry Walsh, Ben Saunders, Richard Bowden
- Abstract summary: We apply Natural Language Processing techniques to the first step of the Neural Sign Language Production pipeline.
We use language models such as BERT and Word2Vec to create better sentence-level embeddings.
We introduce Text to HamNoSys (T2H) translation, and show the advantages of using a phonetic representation for sign language translation.
- Score: 43.45785951443149
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural Sign Language Production (SLP) aims to automatically translate from
spoken language sentences to sign language videos. Historically, the SLP task
has been broken into two steps: first, translating from a spoken language
sentence to a gloss sequence, and second, producing a sign language video
given a sequence of glosses. In this paper we apply Natural Language Processing
techniques to the first step of the SLP pipeline. We use language models such
as BERT and Word2Vec to create better sentence-level embeddings, and apply
several tokenization techniques, demonstrating how these improve performance on
the low-resource translation task of Text to Gloss. We introduce Text to
HamNoSys (T2H) translation, and show the advantages of using a phonetic
representation for sign language translation rather than a sign level gloss
representation. Furthermore, we use HamNoSys to extract the hand shape of a
sign and use this as additional supervision during training, further increasing
the performance on T2H. Assembling best practice, we achieve a BLEU-4 score of
26.99 on the mDGS dataset and 25.09 on PHOENIX14T, setting two new
state-of-the-art baselines.
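To make the first step of the pipeline concrete, the sketch below shows one way a pretrained BERT model could supply sentence-level embeddings for the Text to Gloss translation step: tokenize the spoken language sentence, run it through BERT, and mean-pool the token states into a single vector that a downstream translation encoder could consume. This is a minimal sketch under assumptions of our own (the bert-base-german-cased checkpoint, mean pooling, and the use of a single pooled vector); it is not the authors' released implementation.

```python
# Hedged sketch: sentence-level embeddings from a pretrained BERT model,
# as a possible input representation for a Text to Gloss translation model.
# Checkpoint choice and mean pooling are assumptions, not the paper's exact setup.
import torch
from transformers import AutoTokenizer, AutoModel

MODEL_NAME = "bert-base-german-cased"  # assumed checkpoint for German source sentences
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
bert = AutoModel.from_pretrained(MODEL_NAME)

def sentence_embedding(sentence: str) -> torch.Tensor:
    """Mean-pool BERT's last hidden states into one vector per sentence."""
    inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        outputs = bert(**inputs)
    hidden = outputs.last_hidden_state             # (1, seq_len, 768)
    mask = inputs["attention_mask"].unsqueeze(-1)  # (1, seq_len, 1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # (1, 768)

emb = sentence_embedding("morgen regnet es im süden")
print(emb.shape)  # torch.Size([1, 768])
```

In a similar spirit, the tokenization experiments mentioned above could be prototyped by swapping a word-level tokenizer for a subword (e.g. BPE) tokenizer on the gloss or HamNoSys side; the exact schemes compared in the paper are not reproduced here.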
Related papers
- Lost in Translation, Found in Context: Sign Language Translation with Contextual Cues [56.038123093599815]
Our objective is to translate continuous sign language into spoken language text.
We incorporate additional contextual cues together with the signing video.
We show that our contextual approach significantly enhances the quality of the translations.
arXiv Detail & Related papers (2025-01-16T18:59:03Z)
- Signs as Tokens: An Autoregressive Multilingual Sign Language Generator [55.94334001112357]
We introduce a multilingual sign language model, Signs as Tokens (SOKE), which can generate 3D sign avatars autoregressively from text inputs.
To align sign language with the LM, we develop a decoupled tokenizer that discretizes continuous signs into token sequences representing various body parts.
These sign tokens are integrated into the raw text vocabulary of the LM, allowing for supervised fine-tuning on sign language datasets.
arXiv Detail & Related papers (2024-11-26T18:28:09Z)
- A Spatio-Temporal Representation Learning as an Alternative to Traditional Glosses in Sign Language Translation and Production [9.065171626657818]
This paper addresses the challenges associated with the use of glosses in Sign Language Translation (SLT) and Sign Language Production (SLP).
We introduce Universal Gloss-level Representation (UniGloR), a framework designed to capture the spatio-temporal features inherent in sign language.
Our experiments in a keypoint-based setting demonstrate that UniGloR either outperforms or matches the performance of previous SLT and SLP methods.
arXiv Detail & Related papers (2024-07-03T07:12:36Z)
- T2S-GPT: Dynamic Vector Quantization for Autoregressive Sign Language Production from Text [59.57676466961787]
We propose a novel dynamic vector quantization (DVA-VAE) model that can adjust the encoding length based on the information density in sign language.
Experiments conducted on the PHOENIX14T dataset demonstrate the effectiveness of our proposed method.
We propose a new large German sign language dataset, PHOENIX-News, which contains 486 hours of sign language videos, audio, and transcription texts.
arXiv Detail & Related papers (2024-06-11T10:06:53Z)
- VK-G2T: Vision and Context Knowledge enhanced Gloss2Text [60.57628465740138]
Existing sign language translation methods follow a two-stage pipeline: first converting the sign language video to a gloss sequence (i.e. Sign2Gloss) and then translating the generated gloss sequence into a spoken language sentence (i.e. Gloss2Text).
We propose a vision and context knowledge enhanced Gloss2Text model, named VK-G2T, which leverages the visual content of the sign language video to learn the properties of the target sentence and exploit the context knowledge to facilitate the adaptive translation of gloss words.
arXiv Detail & Related papers (2023-12-15T21:09:34Z)
- Gloss-free Sign Language Translation: Improving from Visual-Language Pretraining [56.26550923909137]
Gloss-Free Sign Language Translation (SLT) is a challenging task due to its cross-domain nature.
We propose a novel Gloss-Free SLT based on Visual-Language Pretraining (GFSLT-VLP).
Our approach involves two stages: (i) integrating Contrastive Language-Image Pre-training with masked self-supervised learning to create pre-tasks that bridge the semantic gap between visual and textual representations and restore masked sentences, and (ii) constructing an end-to-end architecture with an encoder-decoder-like structure that inherits the parameters of the pre-trained Visual Encoder and Text Decoder from the first stage.
arXiv Detail & Related papers (2023-07-27T10:59:18Z)
- Explore More Guidance: A Task-aware Instruction Network for Sign Language Translation Enhanced with Data Augmentation [20.125265661134964]
Sign language recognition and translation systems first use a recognition module to generate glosses from sign language videos.
In this work, we propose a task-aware instruction network, namely TIN-SLT, for sign language translation.
arXiv Detail & Related papers (2022-04-12T17:09:44Z)