Improving Sign Language Translation with Monolingual Data by Sign
Back-Translation
- URL: http://arxiv.org/abs/2105.12397v1
- Date: Wed, 26 May 2021 08:49:30 GMT
- Authors: Hao Zhou, Wengang Zhou, Weizhen Qi, Junfu Pu, Houqiang Li
- Abstract summary: We propose a sign back-translation (SignBT) approach, which incorporates massive spoken language texts into sign training.
With a text-to-gloss translation model, we first back-translate the monolingual text to its gloss sequence.
Then, the paired sign sequence is generated by splicing pieces from an estimated gloss-to-sign bank at the feature level.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite existing pioneering works on sign language translation (SLT), there
is a non-trivial obstacle, i.e., the limited quantity of parallel sign-text
data. To tackle this parallel data bottleneck, we propose a sign
back-translation (SignBT) approach, which incorporates massive spoken language
texts into SLT training. With a text-to-gloss translation model, we first
back-translate the monolingual text to its gloss sequence. Then, the paired
sign sequence is generated by splicing pieces from an estimated gloss-to-sign
bank at the feature level. Finally, the synthetic parallel data serves as a
strong supplement for the end-to-end training of the encoder-decoder SLT
framework.
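The back-translation pipeline described above can be sketched in code. This is a minimal illustrative mock, not the authors' implementation: `text_to_gloss` stands in for their trained text-to-gloss translation model, and the gloss-to-sign bank is simplified to a dictionary mapping each gloss to candidate feature pieces (lists of floats in place of real video features).

```python
import random

def text_to_gloss(text):
    """Placeholder for the text-to-gloss back-translation model;
    here a trivial word-level uppercase mapping."""
    return [w.upper() for w in text.split()]

def splice_sign_features(glosses, gloss_bank):
    """Splice feature pieces from the estimated gloss-to-sign bank
    to form a synthetic sign feature sequence."""
    sequence = []
    for g in glosses:
        pieces = gloss_bank.get(g)
        if pieces:  # skip glosses missing from the bank
            sequence.extend(random.choice(pieces))
    return sequence

def sign_back_translate(monolingual_texts, gloss_bank):
    """Turn monolingual texts into synthetic (sign features, text)
    pairs that supplement parallel SLT training data."""
    pairs = []
    for text in monolingual_texts:
        glosses = text_to_gloss(text)
        features = splice_sign_features(glosses, gloss_bank)
        pairs.append((features, text))
    return pairs

# Toy bank: each gloss maps to one or more candidate feature pieces.
bank = {"I": [[0.1, 0.2]], "GO": [[0.3, 0.4], [0.5, 0.6]]}
print(sign_back_translate(["i go"], bank))
```

The resulting synthetic pairs would then be mixed with the genuine parallel data to train the encoder-decoder SLT model end to end.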
To promote the SLT research, we further contribute CSL-Daily, a large-scale
continuous SLT dataset. It provides both spoken language translations and
gloss-level annotations. The topics revolve around people's daily lives (e.g.,
travel, shopping, medical care), which represent the most likely SLT
application scenarios.
Extensive experimental results and analysis of SLT methods are reported on
CSL-Daily. With the proposed sign back-translation method, we obtain a
substantial improvement over previous state-of-the-art SLT methods.
Related papers
- Scaling Sign Language Translation [38.43594795927101]
Sign language translation (SLT) addresses the problem of translating information from a sign language in video to a spoken language in text.
In this paper, we push forward the frontier of SLT by scaling pretraining data, model size, and number of translation directions.
Experiments show substantial quality improvements over the vanilla baselines, surpassing the previous state-of-the-art (SOTA) by wide margins.
arXiv Detail & Related papers (2024-07-16T15:36:58Z) - Gloss-free Sign Language Translation: Improving from Visual-Language
Pretraining [56.26550923909137]
Gloss-Free Sign Language Translation (SLT) is a challenging task due to its cross-domain nature.
We propose a novel Gloss-Free SLT framework based on Visual-Language Pretraining (GFSLT-).
Our approach involves two stages: (i) integrating Contrastive Language-Image Pre-training with masked self-supervised learning to create pre-tasks that bridge the semantic gap between visual and textual representations and restore masked sentences, and (ii) constructing an end-to-end architecture with an encoder-decoder-like structure that inherits the parameters of the pre-trained Visual and Text Decoder from
arXiv Detail & Related papers (2023-07-27T10:59:18Z) - Cross-modality Data Augmentation for End-to-End Sign Language Translation [66.46877279084083]
End-to-end sign language translation (SLT) aims to convert sign language videos into spoken language texts directly without intermediate representations.
It has been a challenging task due to the modality gap between sign videos and texts and the scarcity of labeled data.
We propose a novel Cross-modality Data Augmentation (XmDA) framework to transfer the powerful gloss-to-text translation capabilities to end-to-end sign language translation.
arXiv Detail & Related papers (2023-05-18T16:34:18Z) - Better Sign Language Translation with Monolingual Data [6.845232643246564]
Sign language translation (SLT) systems rely heavily on the availability of large-scale parallel gloss-to-text (G2T) pairs.
This paper proposes a simple and efficient rule transformation method to transcribe the large-scale target monolingual data into its pseudo glosses automatically.
Empirical results show that the proposed approach can significantly improve the performance of SLT.
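The rule-transformation idea above can be sketched as follows. This is a hypothetical toy illustration of the general approach (the paper's actual rules are not given here): spoken-language text is mapped to pseudo glosses by dropping function words, stripping punctuation, and rendering tokens in gloss-style uppercase.

```python
# Function words that sign glosses typically omit (illustrative set).
STOPWORDS = {"the", "a", "an", "is", "are", "to", "of"}

def text_to_pseudo_gloss(sentence):
    # Rule 1: lowercase and tokenize on whitespace.
    tokens = sentence.lower().split()
    # Rule 2: drop function words.
    tokens = [t for t in tokens if t not in STOPWORDS]
    # Rule 3: strip punctuation and render in gloss-style uppercase.
    return [t.strip(".,!?").upper() for t in tokens if t.strip(".,!?")]

print(text_to_pseudo_gloss("The weather is nice today."))
# → ['WEATHER', 'NICE', 'TODAY']
```

Such pseudo gloss-text pairs can then augment the scarce genuine parallel data.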
arXiv Detail & Related papers (2023-04-21T09:39:54Z) - LSA-T: The first continuous Argentinian Sign Language dataset for Sign
Language Translation [52.87578398308052]
Sign language translation (SLT) is an active field of study that encompasses human-computer interaction, computer vision, natural language processing and machine learning.
This paper presents the first continuous Argentinian Sign Language (LSA) dataset.
It contains 14,880 sentence-level videos of LSA extracted from the CN Sordos YouTube channel, with labels and keypoint annotations for each signer.
arXiv Detail & Related papers (2022-11-14T14:46:44Z) - Scaling Back-Translation with Domain Text Generation for Sign Language
Gloss Translation [36.40377483258876]
Sign language gloss translation aims to translate the sign glosses into spoken language texts.
Back translation (BT) generates pseudo-parallel data by translating in-domain spoken language texts into sign glosses.
We propose a Prompt based domain text Generation (PGEN) approach to produce the large-scale spoken language text data.
arXiv Detail & Related papers (2022-10-13T14:25:08Z) - A Token-level Contrastive Framework for Sign Language Translation [9.185037439012952]
Sign Language Translation is a promising technology to bridge the communication gap between deaf and hearing people.
We propose ConSLT, a novel token-level Contrastive learning framework for Sign Language Translation.
arXiv Detail & Related papers (2022-04-11T07:33:26Z) - A Simple Multi-Modality Transfer Learning Baseline for Sign Language
Translation [54.29679610921429]
Existing sign language datasets contain only about 10K-20K pairs of sign videos, gloss annotations and texts.
Data is thus a bottleneck for training effective sign language translation models.
This simple baseline surpasses the previous state-of-the-art results on two sign language translation benchmarks.
arXiv Detail & Related papers (2022-03-08T18:59:56Z) - SimulSLT: End-to-End Simultaneous Sign Language Translation [55.54237194555432]
Existing sign language translation methods need to read the entire video before starting the translation.
We propose SimulSLT, the first end-to-end simultaneous sign language translation model.
SimulSLT achieves BLEU scores that exceed the latest end-to-end non-simultaneous sign language translation model.
arXiv Detail & Related papers (2021-12-08T11:04:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.