Saudi Sign Language Translation Using T5
- URL: http://arxiv.org/abs/2510.11183v1
- Date: Mon, 13 Oct 2025 09:18:34 GMT
- Title: Saudi Sign Language Translation Using T5
- Authors: Ali Alhejab, Tomas Zelezny, Lamya Alkanhal, Ivan Gruber, Yazeed Alharbi, Jakub Straka, Vaclav Javorek, Marek Hruz, Badriah Alkalifah, Ahmed Ali
- Abstract summary: This paper explores the application of T5 models for Saudi Sign Language (SSL) translation using a novel dataset. The SSL dataset includes three challenging testing protocols, enabling comprehensive evaluation across different scenarios. In our experiments, we investigate the impact of pre-training on American Sign Language (ASL) data by comparing T5 models pre-trained on the YouTubeASL dataset with models trained directly on the SSL dataset.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper explores the application of T5 models for Saudi Sign Language (SSL) translation using a novel dataset. The SSL dataset includes three challenging testing protocols, enabling comprehensive evaluation across different scenarios. Additionally, it captures unique SSL characteristics, such as face coverings, which pose challenges for sign recognition and translation. In our experiments, we investigate the impact of pre-training on American Sign Language (ASL) data by comparing T5 models pre-trained on the YouTubeASL dataset with models trained directly on the SSL dataset. Experimental results demonstrate that pre-training on YouTubeASL significantly improves models' performance (roughly $3\times$ in BLEU-4), indicating cross-linguistic transferability in sign language models. Our findings highlight the benefits of leveraging large-scale ASL data to improve SSL translation and provide insights into the development of more effective sign language translation systems. Our code is publicly available at our GitHub repository.
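As a hedged illustration of the recipe the abstract describes (ASL pre-training, then fine-tuning on SSL, evaluated with BLEU-4), the sketch below fine-tunes a T5 checkpoint with Hugging Face transformers and scores outputs with sacrebleu. The pose-to-embedding front end, the keypoint dimension, and the checkpoint name are assumptions, not the authors' implementation; their actual code lives in the GitHub repository mentioned above.

```python
import torch.nn as nn
from transformers import T5ForConditionalGeneration, T5Tokenizer
from sacrebleu import corpus_bleu

# Assumed setup: a checkpoint already pre-trained on YouTubeASL pose features
# would be loaded here instead of the generic "t5-base" weights.
tok = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

# Hypothetical front end: project per-frame pose keypoints into T5's
# embedding space so videos can be fed as `inputs_embeds`.
pose_proj = nn.Linear(255, model.config.d_model)  # 255 = assumed keypoint dim

def training_step(pose_feats, target_text):
    """One fine-tuning step on an SSL (video features, Arabic text) pair."""
    labels = tok(target_text, return_tensors="pt").input_ids
    out = model(inputs_embeds=pose_proj(pose_feats), labels=labels)
    return out.loss  # backpropagate with any optimizer

def bleu4(hypotheses, references):
    """BLEU-4 (sacrebleu's default), the metric behind the reported ~3x gain."""
    return corpus_bleu(hypotheses, [references]).score
```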
Related papers
- Lost in Translation, Found in Embeddings: Sign Language Translation and Alignment [84.39962912136525]
We develop a model for sign language understanding that performs sign language translation (SLT) and sign-subtitle alignment (SSA). Our approach is built upon three components: (i) a lightweight visual backbone that captures manual and non-manual cues from human keypoints and lip-region images; (ii) a Sliding Perceiver mapping network that aggregates consecutive visual features into word-level embeddings; and (iii) a multi-task scalable training strategy that jointly optimises SLT and SSA.
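The Sliding Perceiver is only named here; as a rough illustration, the sketch below shows a Perceiver-style block in which learned latent queries cross-attend to a sliding window of visual features, producing one word-level embedding per window. Module names and hyperparameters are assumptions, not the authors' design.

```python
import torch
import torch.nn as nn

class SlidingPerceiverSketch(nn.Module):
    """Hypothetical Perceiver-style pooling: latent queries cross-attend to a
    sliding window of frame features, yielding one embedding per window."""

    def __init__(self, feat_dim=512, num_latents=1, window=16, stride=8, heads=8):
        super().__init__()
        self.window, self.stride = window, stride
        self.latents = nn.Parameter(torch.randn(num_latents, feat_dim))
        self.cross_attn = nn.MultiheadAttention(feat_dim, heads, batch_first=True)
        self.ff = nn.Sequential(nn.LayerNorm(feat_dim), nn.Linear(feat_dim, feat_dim))

    def forward(self, feats):  # feats: (B, T, D) per-frame visual features
        outs = []
        for start in range(0, feats.size(1) - self.window + 1, self.stride):
            win = feats[:, start:start + self.window]       # (B, W, D)
            q = self.latents.expand(feats.size(0), -1, -1)  # (B, L, D)
            pooled, _ = self.cross_attn(q, win, win)        # latents read the window
            outs.append(self.ff(pooled).mean(dim=1))        # one embedding per window
        return torch.stack(outs, dim=1)                     # (B, num_windows, D)
```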
arXiv Detail & Related papers (2025-12-08T21:05:46Z)
- Logos as a Well-Tempered Pre-train for Sign Language Recognition [75.42794328290088]
This paper presents Logos, a novel Russian Sign Language (RSL) dataset. It is shown that a model pre-trained on the Logos dataset can be used as a universal encoder for other-language SLR tasks. We show that explicitly labeling visually similar signs improves the quality of the trained model as a visual encoder for downstream tasks.
arXiv Detail & Related papers (2025-05-15T16:31:49Z)
- SSLR: A Semi-Supervised Learning Method for Isolated Sign Language Recognition [2.409285779772107]
Sign language recognition systems aim to recognize sign gestures and translate them into spoken language. One of the main challenges in SLR is the scarcity of annotated datasets. We propose a semi-supervised learning approach for SLR, employing a pseudo-label method to annotate unlabeled samples.
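The entry names pseudo-labeling without details; the sketch below shows one generic self-training round (keep high-confidence predictions on unlabeled clips as pseudo-labels, then retrain). The threshold and interfaces are illustrative assumptions.

```python
import torch

def pseudo_label_round(model, unlabeled_loader, threshold=0.9, device="cpu"):
    """Generic self-training step: keep unlabeled clips whose top predicted
    sign class exceeds `threshold` and use that class as a pseudo-label."""
    model.eval()
    pseudo = []
    with torch.no_grad():
        for clips in unlabeled_loader:  # clips: (B, ...) video features
            probs = torch.softmax(model(clips.to(device)), dim=-1)
            conf, labels = probs.max(dim=-1)
            keep = conf >= threshold    # only confident predictions survive
            pseudo += list(zip(clips[keep].cpu(), labels[keep].cpu()))
    return pseudo  # merge with the labeled set and retrain the recognizer
```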
arXiv Detail & Related papers (2025-04-23T11:59:52Z)
- Where Did Your Model Learn That? Label-free Influence for Self-supervised Learning [0.48933451909251774]
Self-supervised learning has revolutionized learning from large-scale unlabeled datasets. Yet the relationship between pretraining data and learned representations remains poorly understood. We introduce Influence-SSL, a novel and label-free approach for defining influence functions tailored to SSL.
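Influence-SSL itself is not specified here; as a rough label-free illustration, the sketch below scores a training example by the alignment between the gradient of its SSL objective and the gradient at a query example, a common first-order influence approximation. The objective and all names are assumptions, not the paper's method.

```python
import torch

def grad_vector(model, loss):
    """Flatten d(loss)/d(params) into a single vector."""
    params = [p for p in model.parameters() if p.requires_grad]
    grads = torch.autograd.grad(loss, params)
    return torch.cat([g.reshape(-1) for g in grads])

def label_free_influence(model, ssl_loss_fn, train_x, query_x):
    """First-order influence proxy: cosine similarity between the SSL-loss
    gradients of a training example and a query example (no labels needed)."""
    g_train = grad_vector(model, ssl_loss_fn(model, train_x))
    g_query = grad_vector(model, ssl_loss_fn(model, query_x))
    return torch.nn.functional.cosine_similarity(g_train, g_query, dim=0)
```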
arXiv Detail & Related papers (2024-12-22T21:43:56Z)
- The American Sign Language Knowledge Graph: Infusing ASL Models with Linguistic Knowledge [6.481946043182915]
We introduce the American Sign Language Knowledge Graph (ASLKG), compiled from twelve sources of expert linguistic knowledge.
We use the ASLKG to train neuro-symbolic models for three ASL understanding tasks, achieving accuracies of 91% on isolated sign recognition (ISR), 14% for predicting the semantic features of unseen signs, and 36% for classifying the topic of YouTube-ASL videos.
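How the knowledge graph is infused is not described in this summary; one plausible (assumed) neuro-symbolic pattern is sketched below: predict linguistic features from video, then score each candidate sign by similarity to its knowledge-graph feature vector, which also extends naturally to unseen signs.

```python
import torch
import torch.nn as nn

class KGFeatureMatcher(nn.Module):
    """Hypothetical neuro-symbolic ISR: map video to predicted linguistic
    features, then match against each sign's knowledge-graph entry."""

    def __init__(self, video_encoder, video_dim, kg_table):
        super().__init__()
        self.video_encoder = video_encoder           # any clip -> (B, video_dim)
        self.to_feats = nn.Linear(video_dim, kg_table.size(1))
        self.register_buffer("kg_table", kg_table)   # (num_signs, kg_dim)

    def forward(self, clip):
        pred = self.to_feats(self.video_encoder(clip))  # predicted KG features
        # Logits: similarity of predicted features to each sign's KG vector.
        return pred @ self.kg_table.t()                 # (B, num_signs)
```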
arXiv Detail & Related papers (2024-11-06T00:16:16Z)
- Diverse Sign Language Translation [27.457810402402387]
We introduce a Diverse Sign Language Translation (DivSLT) task, aiming to generate diverse yet accurate translations for sign language videos.
We employ large language models (LLMs) to generate multiple references for the widely-used CSL-Daily and PHOENIX14T SLT datasets.
Specifically, we investigate multi-reference training strategies to enable our DivSLT model to achieve diverse translations.
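Multi-reference training is named but not specified; a common recipe (assumed here, not necessarily the authors') computes the cross-entropy against each available reference and optimizes the minimum, so the model may match any one valid translation:

```python
import torch
import torch.nn.functional as F

def min_multi_reference_loss(logits_per_ref, refs, pad_id=0):
    """One generic multi-reference objective: per example, take the smallest
    cross-entropy over its reference translations.

    logits_per_ref: list of (B, T_i, V) decoder logits, one per reference
    refs:           list of (B, T_i) token-id tensors, same order
    """
    losses = []
    for logits, ref in zip(logits_per_ref, refs):
        ce = F.cross_entropy(logits.transpose(1, 2), ref,
                             ignore_index=pad_id, reduction="none")  # (B, T_i)
        losses.append(ce.sum(dim=1) / (ref != pad_id).sum(dim=1))    # per-example mean
    return torch.stack(losses, dim=1).min(dim=1).values.mean()       # min over refs
```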
arXiv Detail & Related papers (2024-10-25T14:28:20Z)
- Scaling Sign Language Translation [38.43594795927101]
Sign language translation (SLT) addresses the problem of translating information from a sign language in video to a spoken language in text.
In this paper, we push forward the frontier of SLT by scaling pretraining data, model size, and number of translation directions.
Experiments show substantial quality improvements over the vanilla baselines, surpassing the previous state-of-the-art (SOTA) by wide margins.
arXiv Detail & Related papers (2024-07-16T15:36:58Z)
- Gloss-free Sign Language Translation: Improving from Visual-Language Pretraining [56.26550923909137]
Gloss-Free Sign Language Translation (SLT) is a challenging task due to its cross-domain nature.
We propose a novel gloss-free SLT framework based on Visual-Language Pretraining (GFSLT-VLP).
Our approach involves two stages: (i) integrating Contrastive Language-Image Pre-training with masked self-supervised learning to create pre-tasks that bridge the semantic gap between visual and textual representations and restore masked sentences, and (ii) constructing an end-to-end architecture with an encoder-decoder-like structure that inherits the parameters of the pre-trained Visual Encoder and Text Decoder from the first stage.
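Stage (i) pairs a CLIP-style contrastive objective with masked modeling; as a hedged illustration, the sketch below shows only the symmetric contrastive part over matched video/text embedding pairs (the masked-sentence pre-task is omitted, and all names are assumptions).

```python
import torch
import torch.nn.functional as F

def clip_style_contrastive(video_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of matched (video, text) embeddings,
    as in CLIP; row i of each tensor is assumed to describe the same clip."""
    v = F.normalize(video_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = v @ t.t() / temperature                    # (B, B) similarities
    targets = torch.arange(v.size(0), device=v.device)  # matched pairs on diagonal
    return 0.5 * (F.cross_entropy(logits, targets)        # video -> text
                  + F.cross_entropy(logits.t(), targets)) # text -> video
```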
arXiv Detail & Related papers (2023-07-27T10:59:18Z)
- A Simple Multi-Modality Transfer Learning Baseline for Sign Language Translation [54.29679610921429]
Existing sign language datasets contain only about 10K-20K pairs of sign videos, gloss annotations and texts.
Data is thus a bottleneck for training effective sign language translation models.
The proposed simple transfer-learning baseline surpasses the previous state-of-the-art results on two sign language translation benchmarks.
arXiv Detail & Related papers (2022-03-08T18:59:56Z)
- Improving Sign Language Translation with Monolingual Data by Sign Back-Translation [105.83166521438463]
We propose a sign back-translation (SignBT) approach, which incorporates massive spoken language texts into SLT training.
With a text-to-gloss translation model, we first back-translate the monolingual text to its gloss sequence.
Then, the paired sign sequence is generated by splicing pieces from an estimated gloss-to-sign bank at the feature level.
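The summary describes two steps: back-translate text to a gloss sequence, then assemble a synthetic sign-feature sequence by splicing per-gloss pieces from an estimated feature bank. The sketch below shows only the splicing step, with a hypothetical bank and interfaces.

```python
import torch

def splice_pseudo_sign_sequence(gloss_seq, feature_bank):
    """Assemble a synthetic sign-feature sequence for a back-translated gloss
    sequence by concatenating stored per-gloss feature clips.

    feature_bank: dict gloss -> (T_g, D) tensor estimated from real sign data.
    """
    pieces = [feature_bank[g] for g in gloss_seq if g in feature_bank]
    return torch.cat(pieces, dim=0)  # (sum of T_g, D) pseudo video features

# Assumed usage: glosses = text_to_gloss_model.translate(monolingual_sentence)
#                pseudo_feats = splice_pseudo_sign_sequence(glosses, bank)
#                -> train the SLT model on (pseudo_feats, original text) pairs.
```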
arXiv Detail & Related papers (2021-05-26T08:49:30Z)
- LeBenchmark: A Reproducible Framework for Assessing Self-Supervised Representation Learning from Speech [63.84741259993937]
Self-Supervised Learning (SSL) using huge unlabeled data has been successfully explored for image and natural language processing.
Recent works also investigated SSL from speech.
We propose LeBenchmark: a reproducible framework for assessing SSL from speech.
arXiv Detail & Related papers (2021-04-23T08:27:09Z)
- End-to-end Generative Zero-shot Learning via Few-shot Learning [76.9964261884635]
State-of-the-art approaches to Zero-Shot Learning (ZSL) train generative nets to synthesize examples conditioned on the provided metadata.
We introduce an end-to-end generative ZSL framework that uses such an approach as a backbone and feeds its synthesized output to a Few-Shot Learning algorithm.
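The described pipeline (a generator synthesizes class-conditioned examples, which feed a few-shot learner) is illustrated below under stated assumptions: the generator interface and the nearest-centroid few-shot classifier are hypothetical stand-ins, not the paper's components.

```python
import torch

def zsl_via_fsl(generator, class_attributes, query_feats, n_per_class=5):
    """Hypothetical pipeline: synthesize a few feature vectors per unseen class
    from its metadata, then classify queries by nearest class centroid.

    generator:        (attr, n) -> (n, D) synthetic features for one class
    class_attributes: dict class_id -> (A,) attribute/metadata tensor
    query_feats:      (Q, D) real features of unseen-class examples
    """
    centroids, ids = [], []
    for cid, attr in class_attributes.items():
        fake = generator(attr, n_per_class)   # synthetic few-shot support set
        centroids.append(fake.mean(dim=0))    # prototype per unseen class
        ids.append(cid)
    centroids = torch.stack(centroids)        # (C, D)
    dists = torch.cdist(query_feats, centroids)  # (Q, C) distances
    return [ids[j] for j in dists.argmin(dim=1)]  # predicted class ids
```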
arXiv Detail & Related papers (2021-02-08T17:35:37Z)