Related papers: SL-SLR: Self-Supervised Representation Learning for Sign Language Recognition

SL-SLR: Self-Supervised Representation Learning for Sign Language Recognition

URL: http://arxiv.org/abs/2509.05188v1
Date: Fri, 05 Sep 2025 15:38:19 GMT
Title: SL-SLR: Self-Supervised Representation Learning for Sign Language Recognition
Authors: Ariel Basso Madjoukeng, Jérôme Fink, Pierre Poitier, Edith Belise Kenmogne, Benoit Frenay,
Abstract summary: Sign language recognition ( SLR) is a machine learning task aiming to identify signs in videos.<n>In SLR, only certain parts provide information that is truly useful for its recognition.<n>This paper proposes a self-supervised learning framework designed to learn meaningful representations for SLR.
Score: 0.5872014229110214
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Sign language recognition (SLR) is a machine learning task aiming to identify signs in videos. Due to the scarcity of annotated data, unsupervised methods like contrastive learning have become promising in this field. They learn meaningful representations by pulling positive pairs (two augmented versions of the same instance) closer and pushing negative pairs (different from the positive pairs) apart. In SLR, in a sign video, only certain parts provide information that is truly useful for its recognition. Applying contrastive methods to SLR raises two issues: (i) contrastive learning methods treat all parts of a video in the same way, without taking into account the relevance of certain parts over others; (ii) shared movements between different signs make negative pairs highly similar, complicating sign discrimination. These issues lead to learning non-discriminative features for sign recognition and poor results in downstream tasks. In response, this paper proposes a self-supervised learning framework designed to learn meaningful representations for SLR. This framework consists of two key components designed to work together: (i) a new self-supervised approach with free-negative pairs; (ii) a new data augmentation technique. This approach shows a considerable gain in accuracy compared to several contrastive and self-supervised methods, across linear evaluation, semi-supervised learning, and transferability between sign languages.

Related papers

SSLR: A Semi-Supervised Learning Method for Isolated Sign Language Recognition [2.409285779772107]
Sign language recognition systems aim to recognize sign gestures and translate them into spoken language.<n>One of the main challenges in SLR is the scarcity of annotated datasets.<n>We propose a semi-supervised learning approach for SLR, employing a pseudo-label method to annotate unlabeled samples.
arXiv Detail & Related papers (2025-04-23T11:59:52Z)
Rethinking Prompting Strategies for Multi-Label Recognition with Partial Annotations [7.7149881834358345]
We study the impact of positive and negative prompt learning on Multi-Label Recognition (MLR) We introduce PositiveCoOp and NegativeCoOp, where only one prompt is learned with VLM guidance while the other is replaced by an embedding vector. We observe that negative prompts degrade MLR performance, and learning only positive prompts, combined with learned negative embeddings, outperforms dual prompt learning approaches.
arXiv Detail & Related papers (2024-09-12T20:02:51Z)
Self-Supervised Representation Learning with Spatial-Temporal Consistency for Sign Language Recognition [96.62264528407863]
We propose a self-supervised contrastive learning framework to excavate rich context via spatial-temporal consistency. Inspired by the complementary property of motion and joint modalities, we first introduce first-order motion information into sign language modeling. Our method is evaluated with extensive experiments on four public benchmarks, and achieves new state-of-the-art performance with a notable margin.
arXiv Detail & Related papers (2024-06-15T04:50:19Z)
A Probabilistic Model Behind Self-Supervised Learning [53.64989127914936]
In self-supervised learning (SSL), representations are learned via an auxiliary task without annotated labels. We present a generative latent variable model for self-supervised learning. We show that several families of discriminative SSL, including contrastive methods, induce a comparable distribution over representations.
arXiv Detail & Related papers (2024-02-02T13:31:17Z)
ML-LMCL: Mutual Learning and Large-Margin Contrastive Learning for Improving ASR Robustness in Spoken Language Understanding [55.39105863825107]
We propose Mutual Learning and Large-Margin Contrastive Learning (ML-LMCL) to improve automatic speech recognition (ASR) robustness. In fine-tuning, we apply mutual learning and train two SLU models on the manual transcripts and the ASR transcripts, respectively. Experiments on three datasets show that ML-LMCL outperforms existing models and achieves new state-of-the-art performance.
arXiv Detail & Related papers (2023-11-19T16:53:35Z)
Improving Continuous Sign Language Recognition with Cross-Lingual Signs [29.077175863743484]
We study the feasibility of utilizing multilingual sign language corpora to facilitate continuous sign language recognition. We first build two sign language dictionaries containing isolated signs that appear in two datasets. Then we identify the sign-to-sign mappings between two sign languages via a well-optimized isolated sign language recognition model.
arXiv Detail & Related papers (2023-08-21T15:58:47Z)
RaSa: Relation and Sensitivity Aware Representation Learning for Text-based Person Search [51.09723403468361]
We propose a Relation and Sensitivity aware representation learning method (RaSa) RaSa includes two novel tasks: Relation-Aware learning (RA) and Sensitivity-Aware learning (SA) Experiments demonstrate that RaSa outperforms existing state-of-the-art methods by 6.94%, 4.45% and 15.35% in terms of Rank@1 on datasets.
arXiv Detail & Related papers (2023-05-23T03:53:57Z)
Rethinking Robust Contrastive Learning from the Adversarial Perspective [2.3333090554192615]
We find significant disparities between adversarial and clean representations in standard-trained networks. adversarial training mitigates these disparities and fosters the convergence of representations toward a universal set.
arXiv Detail & Related papers (2023-02-05T22:43:50Z)
Non-contrastive representation learning for intervals from well logs [58.70164460091879]
The representation learning problem in the oil & gas industry aims to construct a model that provides a representation based on logging data for a well interval. One of the possible approaches is self-supervised learning (SSL) We are the first to introduce non-contrastive SSL for well-logging data.
arXiv Detail & Related papers (2022-09-28T13:27:10Z)
Unified Contrastive Learning in Image-Text-Label Space [130.31947133453406]
Unified Contrastive Learning (UniCL) is effective way of learning semantically rich yet discriminative representations. UniCL stand-alone is a good learner on pure imagelabel data, rivaling the supervised learning methods across three image classification datasets.
arXiv Detail & Related papers (2022-04-07T17:34:51Z)
Simple Contrastive Representation Adversarial Learning for NLP Tasks [17.12062566060011]
Two novel frameworks, supervised contrastive adversarial learning (SCAL) and unsupervised SCAL (USCAL), are proposed. We employ it to Transformer-based models for natural language understanding, sentence semantic textual similarity and adversarial learning tasks. Experimental results on GLUE benchmark tasks show that our fine-tuned supervised method outperforms BERT$_base$ over 1.75%.
arXiv Detail & Related papers (2021-11-26T03:16:09Z)

This list is automatically generated from the titles and abstracts of the papers in this site.