Scaling Back-Translation with Domain Text Generation for Sign Language
Gloss Translation
- URL: http://arxiv.org/abs/2210.07054v1
- Date: Thu, 13 Oct 2022 14:25:08 GMT
- Title: Scaling Back-Translation with Domain Text Generation for Sign Language
Gloss Translation
- Authors: Jinhui Ye, Wenxiang Jiao, Xing Wang and Zhaopeng Tu
- Abstract summary: Sign language gloss translation aims to translate sign glosses into spoken language texts.
Back translation (BT) generates pseudo-parallel data by translating in-domain spoken language texts into sign glosses.
We propose a Prompt-based domain text Generation (PGEN) approach to produce large-scale in-domain spoken language text data.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Sign language gloss translation aims to translate the sign glosses into
spoken language texts, which is challenging due to the scarcity of labeled
gloss-text parallel data. Back translation (BT), which generates
pseudo-parallel data by translating in-domain spoken language texts into sign
glosses, has been applied to alleviate the data scarcity problem. However, the
lack of large-scale, high-quality in-domain spoken language text data limits the
effect of BT. In this paper, to overcome this limitation, we propose a Prompt-based
domain text Generation (PGEN) approach to produce large-scale in-domain
spoken language text data. Specifically, PGEN randomly concatenates
sentences from the original in-domain spoken language text data as prompts to
induce a pre-trained language model (i.e., GPT-2) to generate spoken language
texts in a similar style. Experimental results on three benchmarks of sign
language gloss translation in varied languages demonstrate that BT with spoken
language texts generated by PGEN significantly outperforms the compared
methods. In addition, as the scale of spoken language texts generated by PGEN
increases, the BT technique can achieve further improvements, demonstrating the
effectiveness of our approach. We release the code and data to facilitate
future research in this field.
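
As a rough illustration of the pipeline the abstract describes, below is a minimal sketch of PGEN-style prompting followed by back-translation. It is a hypothetical reconstruction, not the authors' released code: the Hugging Face `gpt2` checkpoint, the sampling settings, and the `text_to_gloss` placeholder are all assumptions.

```python
# Minimal sketch of PGEN-style generation plus back-translation (BT).
# Hypothetical reconstruction; the "gpt2" checkpoint, sampling settings,
# and the text_to_gloss placeholder are assumptions, not the paper's code.
import random

from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")


def pgen_generate(in_domain_sentences, num_prompt_sents=3, max_new_tokens=80):
    """Concatenate randomly sampled in-domain sentences as a prompt and let
    the language model continue in a similar style (the core PGEN idea)."""
    prompt = " ".join(random.sample(in_domain_sentences, num_prompt_sents))
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,  # sampling yields diverse in-domain continuations
        top_p=0.9,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Keep only the newly generated tokens, not the prompt itself.
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()


def make_pseudo_pair(generated_text, text_to_gloss):
    """Back-translate a generated sentence into a pseudo gloss sequence,
    forming one (gloss, text) training pair; text_to_gloss stands in for
    any trained text-to-gloss translation model."""
    return text_to_gloss(generated_text), generated_text
```

Pseudo pairs produced this way would then be mixed with the genuine gloss-text data to train the gloss-to-text model, which is the standard BT recipe the abstract builds on.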
Related papers
- T2S-GPT: Dynamic Vector Quantization for Autoregressive Sign Language Production from Text [59.57676466961787]
We propose a novel dynamic vector quantization (DVA-VAE) model that can adjust the encoding length based on the information density in sign language.
Experiments conducted on the PHOENIX14T dataset demonstrate the effectiveness of our proposed method.
We propose a new large German sign language dataset, PHOENIX-News, which contains 486 hours of sign language videos, audio, and transcription texts.
arXiv Detail & Related papers (2024-06-11T10:06:53Z) - Gloss-free Sign Language Translation: Improving from Visual-Language
Pretraining [56.26550923909137]
Gloss-Free Sign Language Translation (SLT) is a challenging task due to its cross-domain nature.
We propose a novel Gloss-Free SLT framework based on Visual-Language Pretraining (GFSLT-VLP).
Our approach involves two stages: (i) integrating Contrastive Language-Image Pre-training with masked self-supervised learning to create pre-tasks that bridge the semantic gap between visual and textual representations and restore masked sentences, and (ii) constructing an end-to-end architecture with an encoder-decoder-like structure that inherits the parameters of the pre-trained Visual Encoder and Text Decoder from the first stage.
arXiv Detail & Related papers (2023-07-27T10:59:18Z) - Cross-modality Data Augmentation for End-to-End Sign Language Translation [66.46877279084083]
End-to-end sign language translation (SLT) aims to convert sign language videos into spoken language texts directly without intermediate representations.
It has been a challenging task due to the modality gap between sign videos and texts and the scarcity of labeled data.
We propose a novel Cross-modality Data Augmentation (XmDA) framework to transfer the powerful gloss-to-text translation capabilities to end-to-end sign language translation.
arXiv Detail & Related papers (2023-05-18T16:34:18Z) - Code-Switching Text Generation and Injection in Mandarin-English ASR [57.57570417273262]
We investigate text generation and injection for improving the performance of a streaming model widely used in industry, the Transformer-Transducer (T-T).
We first propose a strategy to generate code-switching text data and then investigate injecting the generated text into the T-T model, either explicitly through Text-To-Speech (TTS) conversion or implicitly by tying the speech and text latent spaces.
Experimental results on a T-T model trained with a dataset containing 1,800 hours of real Mandarin-English code-switched speech show that our approaches to injecting generated code-switching text significantly boost the performance of T-T models.
arXiv Detail & Related papers (2023-03-20T09:13:27Z) - Improving Sign Language Translation with Monolingual Data by Sign
Back-Translation [105.83166521438463]
We propose a sign back-translation (SignBT) approach, which incorporates massive spoken language texts into sign language translation training.
With a text-to-gloss translation model, we first back-translate the monolingual text to its gloss sequence.
Then, the paired sign sequence is generated by splicing pieces from an estimated gloss-to-sign bank at the feature level (a minimal sketch of this splicing step appears after this list).
arXiv Detail & Related papers (2021-05-26T08:49:30Z) - Sign Language Transformers: Joint End-to-end Sign Language Recognition
and Translation [59.38247587308604]
We introduce a novel transformer-based architecture that jointly learns Continuous Sign Language Recognition and Translation.
We evaluate the recognition and translation performances of our approaches on the challenging RWTH-PHOENIX-Weather-2014T dataset.
Our translation networks outperform both sign video to spoken language and gloss to spoken language translation models.
arXiv Detail & Related papers (2020-03-30T21:35:09Z)
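
To contrast with PGEN's strategy of generating more text, the SignBT entry above back-translates monolingual text into glosses and then splices sign features from a gloss-to-sign bank. The sketch below is a hypothetical illustration of that splicing step only; the bank layout, feature shapes, and random clip selection are assumptions, not the paper's implementation.

```python
# Hypothetical sketch of SignBT-style feature splicing: map each gloss of a
# back-translated gloss sequence to a stored sign-feature clip and concatenate.
# The bank layout and feature shapes are assumptions for illustration.
import random

import numpy as np


def splice_sign_features(gloss_sequence, gloss_to_sign_bank, feature_dim=512):
    """gloss_to_sign_bank maps a gloss token to a list of candidate clips,
    each a (num_frames, feature_dim) array estimated from real sign videos."""
    clips = []
    for gloss in gloss_sequence:
        candidates = gloss_to_sign_bank.get(gloss)
        if not candidates:
            continue  # skip glosses with no estimated sign features
        clips.append(random.choice(candidates))
    if not clips:
        return np.empty((0, feature_dim))
    return np.concatenate(clips, axis=0)  # pseudo sign-feature sequence


# Toy usage: two German glosses paired with random stand-in clips.
bank = {"MORGEN": [np.random.randn(12, 512)], "REGEN": [np.random.randn(9, 512)]}
pseudo_sign = splice_sign_features(["MORGEN", "REGEN"], bank)  # shape (21, 512)
```

The spliced pseudo sign sequence is then paired with the original text, yielding sign-to-text training pairs at the feature level, as the entry above describes.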