Related papers: Continuous Bangla Sign Language Translation: Mitigating the Expense of Gloss Annotation with the Assistance of Graph

Continuous Bangla Sign Language Translation: Mitigating the Expense of Gloss Annotation with the Assistance of Graph

URL: http://arxiv.org/abs/2508.10687v1
Date: Thu, 14 Aug 2025 14:32:31 GMT
Title: Continuous Bangla Sign Language Translation: Mitigating the Expense of Gloss Annotation with the Assistance of Graph
Authors: Safaeid Hossain Arib, Rabeya Akter, Sejuti Rahman,
Abstract summary: In societies that prioritize spoken languages, sign language often faces underestimation, leading to social exclusion.<n>The Continuous Bangla Sign Language Translation project aims to address this gap by enhancing translation methods.<n>Our contributions include architectural fusion, exploring various fusion strategies, and achieving a new state-of-the-art performance on diverse sign language datasets.
Score: 0.8192907805418583
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Millions of individuals worldwide are affected by deafness and hearing impairment. Sign language serves as a sophisticated means of communication for the deaf and hard of hearing. However, in societies that prioritize spoken languages, sign language often faces underestimation, leading to communication barriers and social exclusion. The Continuous Bangla Sign Language Translation project aims to address this gap by enhancing translation methods. While recent approaches leverage transformer architecture for state-of-the-art results, our method integrates graph-based methods with the transformer architecture. This fusion, combining transformer and STGCN-LSTM architectures, proves more effective in gloss-free translation. Our contributions include architectural fusion, exploring various fusion strategies, and achieving a new state-of-the-art performance on diverse sign language datasets, namely RWTH-PHOENIX-2014T, CSL-Daily, How2Sign, and BornilDB v1.0. Our approach demonstrates superior performance compared to current translation outcomes across all datasets, showcasing notable improvements of BLEU-4 scores of 4.01, 2.07, and 0.5, surpassing those of GASLT, GASLT and slt_how2sign in RWTH-PHOENIX-2014T, CSL-Daily, and How2Sign, respectively. Also, we introduce benchmarking on the BornilDB v1.0 dataset for the first time. Our method sets a benchmark for future research, emphasizing the importance of gloss-free translation to improve communication accessibility for the deaf and hard of hearing.

Related papers

RVLF: A Reinforcing Vision-Language Framework for Gloss-Free Sign Language Translation [44.39679803351263]
We build a large vision-language model (LVLM) specifically designed for sign language.<n>For a sufficient representation of sign language, RVLF introduces an effective semantic representation learning mechanism.<n>Then, to improve sentence-level semantic misalignment, we introduce a GRPO-based optimization strategy.
arXiv Detail & Related papers (2025-12-08T08:11:53Z)
POSESTITCH-SLT: Linguistically Inspired Pose-Stitching for End-to-End Sign Language Translation [21.374625125480133]
We propose POSESTITCH-SLT, a novel pre-training scheme for neural machine translation for sign languages.<n>We show that a simple transformer-based encoder-decoder architecture outperforms the prior art when considering template-generated sentence pairs in training.
arXiv Detail & Related papers (2025-10-31T21:44:59Z)
A multitask transformer to sign language translation using motion gesture primitives [0.6249768559720122]
This work introduces a multitask transformer architecture that includes a gloss learning representation to achieve a more suitable translation.<n>The proposed approach outperforms the state-of-the-art evaluated on the CoL-SLTD dataset, achieving a BLEU-4 of 72,64% in split 1, and a BLEU-4 of 14,64% in split 2.
arXiv Detail & Related papers (2025-03-25T13:53:25Z)
Spatio-temporal transformer to support automatic sign language translation [0.0]
This paper introduces a Transformer-based architecture that encodestemporal motion gestures, preserving both local and long-range spatial information.<n>The proposed approach was validated on the Colombian Sign Language Translation dataset.
arXiv Detail & Related papers (2025-02-04T18:59:19Z)
Gloss-free Sign Language Translation: Improving from Visual-Language Pretraining [56.26550923909137]
Gloss-Free Sign Language Translation (SLT) is a challenging task due to its cross-domain nature. We propose a novel Gloss-Free SLT based on Visual-Language Pretraining (GFSLT-) Our approach involves two stages: (i) integrating Contrastive Language-Image Pre-training with masked self-supervised learning to create pre-tasks that bridge the semantic gap between visual and textual representations and restore masked sentences, and (ii) constructing an end-to-end architecture with an encoder-decoder-like structure that inherits the parameters of the pre-trained Visual and Text Decoder from
arXiv Detail & Related papers (2023-07-27T10:59:18Z)
Cross-modality Data Augmentation for End-to-End Sign Language Translation [66.46877279084083]
End-to-end sign language translation (SLT) aims to convert sign language videos into spoken language texts directly without intermediate representations. It has been a challenging task due to the modality gap between sign videos and texts and the data scarcity of labeled data. We propose a novel Cross-modality Data Augmentation (XmDA) framework to transfer the powerful gloss-to-text translation capabilities to end-to-end sign language translation.
arXiv Detail & Related papers (2023-05-18T16:34:18Z)
A Simple Multi-Modality Transfer Learning Baseline for Sign Language Translation [54.29679610921429]
Existing sign language datasets contain only about 10K-20K pairs of sign videos, gloss annotations and texts. Data is thus a bottleneck for training effective sign language translation models. This simple baseline surpasses the previous state-of-the-art results on two sign language translation benchmarks.
arXiv Detail & Related papers (2022-03-08T18:59:56Z)
SimulSLT: End-to-End Simultaneous Sign Language Translation [55.54237194555432]
Existing sign language translation methods need to read all the videos before starting the translation. We propose SimulSLT, the first end-to-end simultaneous sign language translation model. SimulSLT achieves BLEU scores that exceed the latest end-to-end non-simultaneous sign language translation model.
arXiv Detail & Related papers (2021-12-08T11:04:52Z)
Improving Sign Language Translation with Monolingual Data by Sign Back-Translation [105.83166521438463]
We propose a sign back-translation (SignBT) approach, which incorporates massive spoken language texts into sign training. With a text-to-gloss translation model, we first back-translate the monolingual text to its gloss sequence. Then, the paired sign sequence is generated by splicing pieces from an estimated gloss-to-sign bank at the feature level.
arXiv Detail & Related papers (2021-05-26T08:49:30Z)
Sign Language Transformers: Joint End-to-end Sign Language Recognition and Translation [59.38247587308604]
We introduce a novel transformer based architecture that jointly learns Continuous Sign Language Recognition and Translation. We evaluate the recognition and translation performances of our approaches on the challenging RWTH-PHOENIX-Weather-2014T dataset. Our translation networks outperform both sign video to spoken language and gloss to spoken language translation models.
arXiv Detail & Related papers (2020-03-30T21:35:09Z)

This list is automatically generated from the titles and abstracts of the papers in this site.