Gloss Attention for Gloss-free Sign Language Translation
- URL: http://arxiv.org/abs/2307.07361v1
- Date: Fri, 14 Jul 2023 14:07:55 GMT
- Title: Gloss Attention for Gloss-free Sign Language Translation
- Authors: Aoxiong Yin, Tianyun Zhong, Li Tang, Weike Jin, Tao Jin, Zhou Zhao
- Abstract summary: We show how gloss annotations make sign language translation easier.
We then propose \emph{gloss attention}, which enables the model to keep its attention within video segments that have the same semantics locally.
Experimental results on multiple large-scale sign language datasets show that our proposed GASLT model significantly outperforms existing methods.
- Score: 60.633146518820325
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most sign language translation (SLT) methods to date require the use of gloss
annotations to provide additional supervision information; however, gloss
annotations are not easy to acquire. To solve this problem, we first perform an
analysis of existing models to confirm how gloss annotations make SLT easier.
We find that gloss provides the model with two kinds of information: 1) it
helps the model implicitly learn the location of semantic boundaries in
continuous sign language videos, and 2) it helps the model understand the sign
language video globally. We then propose \emph{gloss attention}, which enables
the model to keep its attention within video segments that have the same
semantics locally, just as gloss helps existing models do. Furthermore, we
transfer the knowledge of sentence-to-sentence similarity from the natural
language model to our gloss attention SLT network (GASLT) to help it understand
sign language videos at the sentence level. Experimental results on multiple
large-scale sign language datasets show that our proposed GASLT model
significantly outperforms existing methods. Our code is provided in
\url{https://github.com/YinAoXiong/GASLT}.
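The abstract describes gloss attention only at a high level. As a rough, hypothetical illustration (not the authors' implementation, which learns which positions to attend to rather than fixing them), the sketch below restricts self-attention over frame features to a local temporal window, so each frame attends only to nearby frames; this captures the general idea of keeping attention within locally consistent video segments. The window size and single-head formulation are illustrative assumptions.

```python
# Minimal sketch (not the GASLT code): local, "gloss-like" self-attention over
# frame features, restricting each frame to a temporal window of neighbors so
# that attention stays within locally consistent segments.
import torch
import torch.nn.functional as F

def local_window_attention(x: torch.Tensor, window_size: int = 7) -> torch.Tensor:
    """x: (batch, num_frames, dim) frame features; returns the same shape."""
    b, t, d = x.shape
    scores = torch.matmul(x, x.transpose(1, 2)) / d ** 0.5          # (b, t, t)

    # Band mask: frame i may only attend to frames within +/- window_size // 2.
    idx = torch.arange(t, device=x.device)
    band = (idx[None, :] - idx[:, None]).abs() <= window_size // 2  # (t, t)
    scores = scores.masked_fill(~band, float("-inf"))

    attn = F.softmax(scores, dim=-1)                                 # (b, t, t)
    return torch.matmul(attn, x)

# Example: 2 clips of 100 frames with 512-dim features.
out = local_window_attention(torch.randn(2, 100, 512))
print(out.shape)  # torch.Size([2, 100, 512])
```

In GASLT itself, a further knowledge-transfer objective brings sentence-to-sentence similarity from a natural language model into the SLT network; that part is omitted from this sketch.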
Related papers
- Universal Gloss-level Representation for Gloss-free Sign Language Translation and Production [9.065171626657818]
Universal Gloss-level Representation (UniGloR) is a unified and self-supervised solution for both Sign Language Translation and Sign Language Production.
Our results demonstrate UniGloR's effectiveness in the translation and production tasks.
Our study suggests that self-supervised learning can be performed in a unified manner, paving the way for innovative and practical applications.
arXiv Detail & Related papers (2024-07-03T07:12:36Z)
- Sign2GPT: Leveraging Large Language Models for Gloss-Free Sign Language Translation [30.008980708977095]
We introduce Sign2GPT, a novel framework for sign language translation.
We propose a novel pretraining strategy that directs our encoder to learn sign representations from automatically extracted pseudo-glosses.
We evaluate our approach on two public benchmark sign language translation datasets.
arXiv Detail & Related papers (2024-05-07T10:00:38Z)
- Gloss-free Sign Language Translation: Improving from Visual-Language Pretraining [56.26550923909137]
Gloss-Free Sign Language Translation (SLT) is a challenging task due to its cross-domain nature.
We propose a novel Gloss-Free SLT framework based on Visual-Language Pretraining (GFSLT-VLP).
Our approach involves two stages: (i) integrating Contrastive Language-Image Pre-training with masked self-supervised learning to create pre-tasks that bridge the semantic gap between visual and textual representations and restore masked sentences, and (ii) constructing an end-to-end architecture with an encoder-decoder-like structure that inherits the parameters of the pre-trained Visual Encoder and Text Decoder from the first stage.
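The two-stage description above can be made concrete with a small, hypothetical sketch of the stage-one objectives: a CLIP-style contrastive loss aligning pooled video and text embeddings, plus a masked-token reconstruction loss. This is not the GFSLT-VLP code; the encoders, decoder, and tensor shapes are assumptions for illustration.

```python
# Rough sketch (not the GFSLT-VLP implementation) of the two stage-one
# objectives named above. Encoders and the decoder are assumed to exist
# elsewhere; only the two losses are shown.
import torch
import torch.nn.functional as F

def contrastive_loss(video_emb, text_emb, temperature=0.07):
    """video_emb, text_emb: (batch, dim) pooled embeddings of paired samples."""
    v = F.normalize(video_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = v @ t.t() / temperature                      # (batch, batch)
    targets = torch.arange(v.size(0), device=v.device)
    # Symmetric InfoNCE: video-to-text and text-to-video directions.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

def masked_reconstruction_loss(decoder_logits, token_ids, mask):
    """decoder_logits: (batch, len, vocab); token_ids: (batch, len) long;
    mask: (batch, len) bool, True where a token was masked out."""
    loss = F.cross_entropy(decoder_logits.transpose(1, 2), token_ids,
                           reduction="none")              # (batch, len)
    return (loss * mask.float()).sum() / mask.float().sum().clamp(min=1)
```

In practice the two losses would be summed and minimized jointly during the pre-training stage before the end-to-end architecture is assembled.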
arXiv Detail & Related papers (2023-07-27T10:59:18Z)
- Gloss-Free End-to-End Sign Language Translation [59.28829048788345]
We design the Gloss-Free End-to-end sign language translation framework (GloFE).
Our method improves the performance of SLT in the gloss-free setting by exploiting the shared underlying semantics of signs and the corresponding spoken translation.
We obtained state-of-the-art results on large-scale datasets, including OpenASL and How2Sign.
arXiv Detail & Related papers (2023-05-22T09:57:43Z)
- A Simple Multi-Modality Transfer Learning Baseline for Sign Language Translation [54.29679610921429]
Existing sign language datasets contain only about 10K-20K pairs of sign videos, gloss annotations and texts.
Data is thus a bottleneck for training effective sign language translation models.
This simple baseline surpasses the previous state-of-the-art results on two sign language translation benchmarks.
arXiv Detail & Related papers (2022-03-08T18:59:56Z)
- SimulSLT: End-to-End Simultaneous Sign Language Translation [55.54237194555432]
Existing sign language translation methods need to read the entire video before starting translation.
We propose SimulSLT, the first end-to-end simultaneous sign language translation model.
SimulSLT achieves BLEU scores that exceed those of the latest end-to-end non-simultaneous sign language translation model.
arXiv Detail & Related papers (2021-12-08T11:04:52Z)
- Predictive Representation Learning for Language Modeling [33.08232449211759]
Correlates of secondary information appear in LSTM representations even though they are not part of an explicitly supervised prediction task.
We propose Predictive Representation Learning (PRL), which explicitly constrains LSTMs to encode specific predictions.
arXiv Detail & Related papers (2021-05-29T05:03:47Z)
- Transferring Cross-domain Knowledge for Video Sign Language Recognition [103.9216648495958]
Word-level sign language recognition (WSLR) is a fundamental task in sign language interpretation.
We propose a novel method that learns domain-invariant visual concepts and fertilizes WSLR models by transferring knowledge of subtitled news signs to them.
arXiv Detail & Related papers (2020-03-08T03:05:21Z)