SLTUNET: A Simple Unified Model for Sign Language Translation
- URL: http://arxiv.org/abs/2305.01778v1
- Date: Tue, 2 May 2023 20:41:59 GMT
- Title: SLTUNET: A Simple Unified Model for Sign Language Translation
- Authors: Biao Zhang, Mathias Müller, Rico Sennrich
- Abstract summary: We propose a simple unified neural model designed to support multiple sign-to-gloss, gloss-to-text and sign-to-text translation tasks.
Jointly modeling different tasks endows SLTUNET with the capability to explore the cross-task relatedness that could help narrow the modality gap.
- We show in experiments that SLTUNET achieves competitive and even state-of-the-art performance on PHOENIX-2014T and CSL-Daily.
- Score: 40.93099095994472
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite recent successes with neural models for sign language translation
(SLT), translation quality still lags behind spoken languages because of the
data scarcity and modality gap between sign video and text. To address both
problems, we investigate strategies for cross-modality representation sharing
for SLT. We propose SLTUNET, a simple unified neural model designed to support
multiple SLT-related tasks jointly, such as sign-to-gloss, gloss-to-text and
sign-to-text translation. Jointly modeling different tasks endows SLTUNET with
the capability to explore the cross-task relatedness that could help narrow the
modality gap. In addition, this allows us to leverage the knowledge from
external resources, such as abundant parallel data used for spoken-language
machine translation (MT). We show in experiments that SLTUNET achieves
competitive and even state-of-the-art performance on PHOENIX-2014T and
CSL-Daily when augmented with MT data and equipped with a set of optimization
techniques. We further use the DGS Corpus for end-to-end SLT for the first
time. It covers broader domains with a significantly larger vocabulary, which
is more challenging and which we consider to allow for a more realistic
assessment of the current state of SLT than the former two. Still, SLTUNET
obtains improved results on the DGS Corpus. Code is available at
https://github.com/bzhangGo/sltunet.
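The abstract describes training one shared model on mixed sign-to-gloss, gloss-to-text and sign-to-text data. Below is a minimal sketch of that kind of multi-task data mixing, where a task tag prepended to each example tells a single shared encoder-decoder which task it is solving. The tag format and function names are illustrative, not taken from the paper or its code.

```python
import random

# Illustrative task names for the three SLT-related tasks the paper lists.
TASKS = ("sign2gloss", "gloss2text", "sign2text")

def make_mixed_batches(examples, batch_size, seed=0):
    """Pool examples from all tasks, prepend a task tag, shuffle, and
    slice into batches so one shared model sees every task interleaved.
    `examples` maps task name -> list of (source, target) pairs."""
    pool = []
    for task, pairs in examples.items():
        for src, tgt in pairs:
            # The tag tells the shared decoder which output space to use.
            pool.append((f"<{task}>", src, tgt))
    random.Random(seed).shuffle(pool)
    return [pool[i:i + batch_size] for i in range(0, len(pool), batch_size)]

# Toy data standing in for sign video features, glosses, and text.
data = {
    "sign2gloss": [("video_feats_1", "GLOSS-A GLOSS-B")],
    "gloss2text": [("GLOSS-A GLOSS-B", "some text"), ("GLOSS-C", "other text")],
    "sign2text": [("video_feats_2", "a spoken sentence")],
}
batches = make_mixed_batches(data, batch_size=2)
```

Interleaving tasks in every epoch, rather than training them sequentially, is what lets the shared parameters exploit cross-task relatedness; external MT pairs could be added to the pool the same way under an extra tag.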
Related papers
- Diverse Sign Language Translation [27.457810402402387]
We introduce a Diverse Sign Language Translation (DivSLT) task, aiming to generate diverse yet accurate translations for sign language videos.
We employ large language models (LLMs) to generate multiple references for the widely-used CSL-Daily and PHOENIX14T SLT datasets.
Specifically, we investigate multi-reference training strategies to enable our DivSLT model to achieve diverse translations.
arXiv Detail & Related papers (2024-10-25T14:28:20Z)
- Simultaneous Machine Translation with Large Language Models [51.470478122113356]
We investigate the possibility of applying Large Language Models to SimulMT tasks.
We conducted experiments using the Llama2-7b-chat model on nine different languages from the MuST-C dataset.
The results show that the LLM outperforms dedicated MT models in terms of BLEU and LAAL metrics.
arXiv Detail & Related papers (2023-09-13T04:06:47Z)
- Cross-modality Data Augmentation for End-to-End Sign Language Translation [66.46877279084083]
End-to-end sign language translation (SLT) aims to convert sign language videos into spoken language texts directly without intermediate representations.
It has been a challenging task due to the modality gap between sign videos and texts and the scarcity of labeled data.
We propose a novel Cross-modality Data Augmentation (XmDA) framework to transfer the powerful gloss-to-text translation capabilities to end-to-end sign language translation.
arXiv Detail & Related papers (2023-05-18T16:34:18Z)
- A Simple Multi-Modality Transfer Learning Baseline for Sign Language Translation [54.29679610921429]
Existing sign language datasets contain only about 10K-20K pairs of sign videos, gloss annotations and texts.
Data is thus a bottleneck for training effective sign language translation models.
This simple baseline surpasses the previous state-of-the-art results on two sign language translation benchmarks.
arXiv Detail & Related papers (2022-03-08T18:59:56Z)
- SimulSLT: End-to-End Simultaneous Sign Language Translation [55.54237194555432]
Existing sign language translation methods must read the entire video before starting to translate.
We propose SimulSLT, the first end-to-end simultaneous sign language translation model.
SimulSLT achieves BLEU scores that exceed those of the latest end-to-end non-simultaneous sign language translation model.
arXiv Detail & Related papers (2021-12-08T11:04:52Z)
- Improving Sign Language Translation with Monolingual Data by Sign Back-Translation [105.83166521438463]
We propose a sign back-translation (SignBT) approach, which incorporates massive spoken language texts into sign training.
With a text-to-gloss translation model, we first back-translate the monolingual text to its gloss sequence.
Then, the paired sign sequence is generated by splicing pieces from an estimated gloss-to-sign bank at the feature level.
arXiv Detail & Related papers (2021-05-26T08:49:30Z)
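The SignBT summary above describes a two-stage pipeline: back-translate monolingual text to a gloss sequence, then splice feature pieces from an estimated gloss-to-sign bank into a pseudo sign sequence. A minimal sketch of that data flow follows; the toy text-to-gloss rule and the feature bank are stand-ins for the trained model and real video features, not the paper's implementation.

```python
def text_to_gloss(text):
    # Stage 1: back-translate spoken text to a gloss sequence.
    # A toy rule (uppercase content words) stands in for the trained model.
    return [w.upper() for w in text.split() if len(w) > 2]

def splice_sign_features(glosses, gloss2sign_bank):
    # Stage 2: build a pseudo sign sequence at the feature level by
    # splicing per-gloss pieces from an estimated gloss-to-sign bank.
    features = []
    for g in glosses:
        features.extend(gloss2sign_bank.get(g, [0.0]))  # fallback for unseen glosses
    return features

# Toy bank mapping glosses to short feature snippets.
bank = {"HELLO": [0.1, 0.2], "WORLD": [0.3]}
glosses = text_to_gloss("hello world")
pseudo_signs = splice_sign_features(glosses, bank)
```

The resulting (pseudo sign sequence, text) pairs can then augment the scarce parallel training data, which is the point of back-translation in MT generally.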
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.