Data Augmentation for Sign Language Gloss Translation
- URL: http://arxiv.org/abs/2105.07476v1
- Date: Sun, 16 May 2021 16:37:36 GMT
- Title: Data Augmentation for Sign Language Gloss Translation
- Authors: Amit Moryossef, Kayo Yin, Graham Neubig, Yoav Goldberg
- Abstract summary: Sign language translation (SLT) is often decomposed into video-to-gloss recognition and gloss-to-text translation.
We focus here on gloss-to-text translation, which we treat as a low-resource neural machine translation (NMT) problem.
By pre-training on the resulting synthetic data, we improve translation from American Sign Language (ASL) to English and from German Sign Language (DGS) to German by up to 3.14 and 2.20 BLEU, respectively.
- Score: 115.13684506803529
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sign language translation (SLT) is often decomposed into video-to-gloss
recognition and gloss-to-text translation, where a gloss is a sequence of
transcribed spoken-language words in the order in which they are signed. We
focus here on gloss-to-text translation, which we treat as a low-resource
neural machine translation (NMT) problem. However, gloss-to-text translation
differs from traditional low-resource NMT in that gloss-text pairs often have
higher lexical overlap and lower syntactic overlap than pairs of
spoken languages. We exploit this lexical overlap and handle syntactic
divergence by proposing two rule-based heuristics that generate pseudo-parallel
gloss-text pairs from monolingual spoken language text. By pre-training on the
resulting synthetic data, we improve translation from American Sign Language
(ASL) to English and from German Sign Language (DGS) to German by up to 3.14
and 2.20 BLEU, respectively.
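The two rule-based heuristics are not spelled out in this summary. As a rough illustration of the general idea, the minimal Python sketch below maps spoken-language text to a synthetic gloss by dropping function words and uppercasing what remains; the word list and rules are illustrative assumptions, not the paper's actual heuristics.

import re

# Hypothetical, non-exhaustive set of function words that glosses often omit.
FUNCTION_WORDS = {
    "a", "an", "the", "is", "am", "are", "was", "were", "be", "been",
    "to", "of", "in", "on", "at", "do", "does", "did", "will", "would",
    "have", "has", "had",
}

def text_to_pseudo_gloss(sentence: str) -> str:
    """Map a spoken-language sentence to a synthetic gloss string."""
    tokens = re.findall(r"[A-Za-z']+", sentence.lower())
    content = [t for t in tokens if t not in FUNCTION_WORDS]
    # Glosses are conventionally written in uppercase.
    return " ".join(t.upper() for t in content)

text = "The weather will be sunny in the north tomorrow."
print(text_to_pseudo_gloss(text), "->", text)  # one pseudo-parallel pair

Each (pseudo-gloss, text) pair then serves as synthetic parallel data for pre-training, before fine-tuning on real gloss-text pairs.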
Related papers
- VK-G2T: Vision and Context Knowledge enhanced Gloss2Text [60.57628465740138]
Existing sign language translation methods follow a two-stage pipeline: first converting the sign language video to a gloss sequence (i.e., Sign2Gloss) and then translating the generated gloss sequence into a spoken language sentence (i.e., Gloss2Text).
We propose a vision and context knowledge enhanced Gloss2Text model, named VK-G2T, which leverages the visual content of the sign language video to learn the properties of the target sentence and exploits context knowledge to facilitate the adaptive translation of gloss words.
arXiv Detail & Related papers (2023-12-15T21:09:34Z) - Is context all you need? Scaling Neural Sign Language Translation to
Large Domains of Discourse [34.70927441846784]
Sign Language Translation (SLT) is a challenging task that aims to generate spoken language sentences from sign language videos.
We propose a novel multi-modal transformer architecture that tackles the translation task in a context-aware manner, as a human would.
We report significant improvements on state-of-the-art translation performance using contextual information, nearly doubling the reported BLEU-4 scores of baseline approaches.
arXiv Detail & Related papers (2023-08-18T15:27:22Z) - Gloss-free Sign Language Translation: Improving from Visual-Language
Pretraining [56.26550923909137]
Gloss-Free Sign Language Translation (SLT) is a challenging task due to its cross-domain nature.
We propose a novel Gloss-Free SLT based on Visual-Language Pretraining (GFSLT-VLP).
Our approach involves two stages: (i) integrating Contrastive Language-Image Pre-training with masked self-supervised learning to create pre-tasks that bridge the semantic gap between visual and textual representations and restore masked sentences, and (ii) constructing an end-to-end architecture with an encoder-decoder-like structure that inherits the parameters of the pre-trained Visual Encoder and Text Decoder from the first stage.
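As a minimal sketch of how the two stage-(i) objectives might be combined, the PyTorch snippet below pairs a CLIP-style contrastive loss with a masked reconstruction loss; the function names, shapes, and temperature are assumptions for illustration, not details from the paper.

import torch
import torch.nn.functional as F

def contrastive_loss(visual_feats, text_feats, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired (visual, text) embeddings."""
    v = F.normalize(visual_feats, dim=-1)  # (B, D)
    t = F.normalize(text_feats, dim=-1)    # (B, D)
    logits = v @ t.T / temperature         # (B, B) similarity matrix
    targets = torch.arange(v.size(0), device=logits.device)
    return (F.cross_entropy(logits, targets)
            + F.cross_entropy(logits.T, targets)) / 2

def masked_reconstruction_loss(decoder_logits, target_ids, mask):
    """Cross-entropy restricted to the masked token positions."""
    per_token = F.cross_entropy(decoder_logits.transpose(1, 2),  # (B, V, L)
                                target_ids, reduction="none")    # (B, L)
    return (per_token * mask).sum() / mask.sum().clamp(min=1)

During pre-training the two losses would simply be summed; stage (ii) then reuses the pre-trained weights in the end-to-end encoder-decoder.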
arXiv Detail & Related papers (2023-07-27T10:59:18Z) - Enhancing Cross-lingual Transfer via Phonemic Transcription Integration [57.109031654219294]
PhoneXL is a framework incorporating phonemic transcriptions as an additional linguistic modality for cross-lingual transfer.
Our pilot study reveals phonemic transcription provides essential information beyond the orthography to enhance cross-lingual transfer.
arXiv Detail & Related papers (2023-07-10T06:17:33Z) - Cross-modality Data Augmentation for End-to-End Sign Language Translation [66.46877279084083]
End-to-end sign language translation (SLT) aims to convert sign language videos into spoken language texts directly without intermediate representations.
It is a challenging task due to the modality gap between sign videos and texts and the scarcity of labeled data.
We propose a novel Cross-modality Data Augmentation (XmDA) framework to transfer the powerful gloss-to-text translation capabilities to end-to-end sign language translation.
arXiv Detail & Related papers (2023-05-18T16:34:18Z) - Better Sign Language Translation with Monolingual Data [6.845232643246564]
Sign language translation (SLT) systems rely heavily on the availability of large-scale parallel gloss-to-text (G2T) pairs.
This paper proposes a simple and efficient rule transformation method to transcribe the large-scale target monolingual data into its pseudo glosses automatically.
Empirical results show that the proposed approach can significantly improve the performance of SLT.
arXiv Detail & Related papers (2023-04-21T09:39:54Z) - Changing the Representation: Examining Language Representation for
Neural Sign Language Production [43.45785951443149]
We apply Natural Language Processing techniques to the first step of the Neural Sign Language Production pipeline.
We use language models such as BERT and Word2Vec to create better sentence level embeddings.
We introduce Text to HamNoSys (T2H) translation, and show the advantages of using a phonetic representation for sign language translation.
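A minimal sketch of one plausible reading of "better sentence level embeddings" is shown below: mean-pooling BERT hidden states over the non-padding tokens. The checkpoint and pooling strategy are assumptions, not necessarily the paper's configuration.

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def sentence_embedding(sentence: str) -> torch.Tensor:
    """Mean-pooled BERT embedding of a sentence."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state   # (1, L, 768)
    mask = inputs["attention_mask"].unsqueeze(-1)    # (1, L, 1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)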
arXiv Detail & Related papers (2022-09-16T12:45:29Z) - Improving Sign Language Translation with Monolingual Data by Sign
Back-Translation [105.83166521438463]
We propose a sign back-translation (SignBT) approach, which incorporates massive spoken language texts into sign training.
With a text-to-gloss translation model, we first back-translate the monolingual text to its gloss sequence.
Then, the paired sign sequence is generated by splicing pieces from an estimated gloss-to-sign bank at the feature level.
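A schematic sketch of this two-step pipeline follows; the text-to-gloss model and the gloss-to-sign feature bank are stand-in placeholders (assumptions), whereas in the paper both are estimated from data.

from typing import Callable, Dict, List, Tuple

def sign_back_translate(
    monolingual_texts: List[str],
    text_to_gloss: Callable[[str], List[str]],   # trained text-to-gloss model
    sign_bank: Dict[str, List[List[float]]],     # gloss -> sign feature pieces
) -> List[Tuple[List[List[float]], str]]:
    """Create synthetic (sign feature sequence, text) training pairs."""
    pairs = []
    for text in monolingual_texts:
        glosses = text_to_gloss(text)            # step 1: back-translate to gloss
        features: List[List[float]] = []
        for g in glosses:                        # step 2: splice feature pieces
            features.extend(sign_bank.get(g, []))
        if features:
            pairs.append((features, text))
    return pairs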
arXiv Detail & Related papers (2021-05-26T08:49:30Z) - Better Sign Language Translation with STMC-Transformer [9.835743237370218]
Sign Language Translation first uses a Sign Language Recognition system to extract sign language glosses from videos.
A translation system then generates spoken language translations from the sign language glosses.
This paper introduces the STMC-Transformer, which improves on the previous state of the art by over 5 and 7 BLEU on gloss-to-text and video-to-text translation, respectively.
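For reference, BLEU comparisons like this one are typically computed with sacreBLEU; the snippet below shows standard usage, with made-up placeholder sentences rather than the paper's outputs.

import sacrebleu

hypotheses = ["the weather will be sunny tomorrow"]           # system outputs
references = [["the weather is going to be sunny tomorrow"]]  # one reference stream

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.2f}")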
arXiv Detail & Related papers (2020-04-01T17:20:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.