Changing the Representation: Examining Language Representation for
Neural Sign Language Production
- URL: http://arxiv.org/abs/2210.06312v1
- Date: Fri, 16 Sep 2022 12:45:29 GMT
- Title: Changing the Representation: Examining Language Representation for
Neural Sign Language Production
- Authors: Harry Walsh, Ben Saunders, Richard Bowden
- Abstract summary: We apply Natural Language Processing techniques to the first step of the Neural Sign Language Production pipeline.
We use language models such as BERT and Word2Vec to create better sentence-level embeddings.
We introduce Text to HamNoSys (T2H) translation, and show the advantages of using a phonetic representation for sign language translation.
- Score: 43.45785951443149
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Neural Sign Language Production (SLP) aims to automatically translate from
spoken language sentences to sign language videos. Historically, the SLP task
has been broken into two steps: first, translating from a spoken language
sentence to a gloss sequence and, second, producing a sign language video
given a sequence of glosses. In this paper we apply Natural Language Processing
techniques to the first step of the SLP pipeline. We use language models such
as BERT and Word2Vec to create better sentence-level embeddings, and apply
several tokenization techniques, demonstrating how these improve performance on
the low resource translation task of Text to Gloss. We introduce Text to
HamNoSys (T2H) translation, and show the advantages of using a phonetic
representation for sign language translation rather than a sign level gloss
representation. Furthermore, we use HamNoSys to extract the hand shape of a
sign and use this as additional supervision during training, further increasing
the performance on T2H. Combining best practices, we achieve a BLEU-4 score of
26.99 on the mDGS dataset and 25.09 on PHOENIX14T, two new state-of-the-art
baselines.
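The sentence-level embedding step described above can be illustrated with a toy mean-pooling sketch. The vocabulary, 4-dimensional vectors, and `sentence_embedding` helper below are hypothetical stand-ins for illustration only, not the paper's actual BERT/Word2Vec pipeline (real word vectors are typically 300-768 dimensions and learned from large corpora):

```python
import numpy as np

# Hypothetical embedding table standing in for pre-trained
# Word2Vec/BERT vectors (real models use 300-768 dimensions).
EMBEDDINGS = {
    "hello": np.array([0.1, 0.3, -0.2, 0.5]),
    "world": np.array([0.4, -0.1, 0.0, 0.2]),
    "<unk>": np.zeros(4),  # fallback for out-of-vocabulary tokens
}

def sentence_embedding(sentence: str) -> np.ndarray:
    """Mean-pool per-token vectors into one sentence-level embedding."""
    tokens = sentence.lower().split()
    vecs = [EMBEDDINGS.get(t, EMBEDDINGS["<unk>"]) for t in tokens]
    return np.mean(vecs, axis=0)

emb = sentence_embedding("Hello world")
print(emb)  # elementwise mean of the two word vectors
```

Mean pooling is only one of several ways to collapse token vectors into a sentence vector; contextual models like BERT instead produce token embeddings that already depend on the whole sentence before any pooling is applied.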
Related papers
- T2S-GPT: Dynamic Vector Quantization for Autoregressive Sign Language Production from Text [59.57676466961787]
We propose a novel dynamic vector quantization (DVA-VAE) model that can adjust the encoding length based on the information density in sign language.
Experiments conducted on the PHOENIX14T dataset demonstrate the effectiveness of our proposed method.
We propose a new large German sign language dataset, PHOENIX-News, which contains 486 hours of sign language videos, audio, and transcription texts.
arXiv Detail & Related papers (2024-06-11T10:06:53Z)
- A Data-Driven Representation for Sign Language Production [26.520016084139964]
Sign Language Production aims to automatically translate spoken language sentences into continuous sequences of sign language.
Current state-of-the-art approaches rely on scarce linguistic resources to work.
This paper introduces an innovative solution by transforming the continuous pose generation problem into a discrete sequence generation problem.
arXiv Detail & Related papers (2024-04-17T15:52:38Z)
- VK-G2T: Vision and Context Knowledge enhanced Gloss2Text [60.57628465740138]
Existing sign language translation methods follow a two-stage pipeline: first converting the sign language video to a gloss sequence (i.e. Sign2Gloss) and then translating the generated gloss sequence into a spoken language sentence (i.e. Gloss2Text).
We propose a vision and context knowledge enhanced Gloss2Text model, named VK-G2T, which leverages the visual content of the sign language video to learn the properties of the target sentence and exploit the context knowledge to facilitate the adaptive translation of gloss words.
arXiv Detail & Related papers (2023-12-15T21:09:34Z)
- Gloss-free Sign Language Translation: Improving from Visual-Language
Pretraining [56.26550923909137]
Gloss-Free Sign Language Translation (SLT) is a challenging task due to its cross-domain nature.
We propose a novel Gloss-Free SLT framework based on Visual-Language Pretraining (GFSLT-VLP).
Our approach involves two stages: (i) integrating Contrastive Language-Image Pre-training with masked self-supervised learning to create pre-tasks that bridge the semantic gap between visual and textual representations and restore masked sentences, and (ii) constructing an end-to-end architecture with an encoder-decoder-like structure that inherits the parameters of the pre-trained Visual Encoder and Text Decoder from the first stage.
arXiv Detail & Related papers (2023-07-27T10:59:18Z)
- Cross-modality Data Augmentation for End-to-End Sign Language Translation [66.46877279084083]
End-to-end sign language translation (SLT) aims to convert sign language videos into spoken language texts directly without intermediate representations.
It has been a challenging task due to the modality gap between sign videos and texts and the scarcity of labeled data.
We propose a novel Cross-modality Data Augmentation (XmDA) framework to transfer the powerful gloss-to-text translation capabilities to end-to-end sign language translation.
arXiv Detail & Related papers (2023-05-18T16:34:18Z)
- Explore More Guidance: A Task-aware Instruction Network for Sign
Language Translation Enhanced with Data Augmentation [20.125265661134964]
Sign language recognition and translation systems first use a recognition module to generate glosses from sign language videos.
In this work, we propose a task-aware instruction network, namely TIN-SLT, for sign language translation.
arXiv Detail & Related papers (2022-04-12T17:09:44Z)
- Data Augmentation for Sign Language Gloss Translation [115.13684506803529]
Sign language translation (SLT) is often decomposed into video-to-gloss recognition and gloss-to-text translation.
We focus here on gloss-to-text translation, which we treat as a low-resource neural machine translation (NMT) problem.
By pre-training on the resulting synthetic data, we improve translation from American Sign Language (ASL) to English and German Sign Language (DGS) to German by up to 3.14 and 2.20 BLEU, respectively.
arXiv Detail & Related papers (2021-05-16T16:37:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.