Related papers: SSLR: A Semi-Supervised Learning Method for Isolated Sign Language Recognition

SSLR: A Semi-Supervised Learning Method for Isolated Sign Language Recognition

URL: http://arxiv.org/abs/2504.16640v1
Date: Wed, 23 Apr 2025 11:59:52 GMT
Title: SSLR: A Semi-Supervised Learning Method for Isolated Sign Language Recognition
Authors: Hasan Algafri, Hamzah Luqman, Sarah Alyami, Issam Laradji,
Abstract summary: Sign language recognition systems aim to recognize sign gestures and translate them into spoken language.<n>One of the main challenges in SLR is the scarcity of annotated datasets.<n>We propose a semi-supervised learning approach for SLR, employing a pseudo-label method to annotate unlabeled samples.
Score: 2.409285779772107
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Sign language is the primary communication language for people with disabling hearing loss. Sign language recognition (SLR) systems aim to recognize sign gestures and translate them into spoken language. One of the main challenges in SLR is the scarcity of annotated datasets. To address this issue, we propose a semi-supervised learning (SSL) approach for SLR (SSLR), employing a pseudo-label method to annotate unlabeled samples. The sign gestures are represented using pose information that encodes the signer's skeletal joint points. This information is used as input for the Transformer backbone model utilized in the proposed approach. To demonstrate the learning capabilities of SSL across various labeled data sizes, several experiments were conducted using different percentages of labeled data with varying numbers of classes. The performance of the SSL approach was compared with a fully supervised learning-based model on the WLASL-100 dataset. The obtained results of the SSL model outperformed the supervised learning-based model with less labeled data in many cases.

Related papers

Interpretable Zero-Shot Learning with Locally-Aligned Vision-Language Model [56.573203512455706]
Large-scale vision-language models (VLMs) have achieved remarkable success in zero-shot learning (ZSL) by leveraging large-scale visual-text pair datasets.<n>One approach to address this issue is to develop interpretable models by integrating language.<n>We propose LaZSL, a locally-aligned vision-language model for interpretable ZSL.
arXiv Detail & Related papers (2025-06-30T13:14:46Z)
Logos as a Well-Tempered Pre-train for Sign Language Recognition [75.42794328290088]
This paper presents Logos, a novel Russian Sign Language (RSL) dataset.<n>It is shown that a model, pre-trained on the Logos dataset can be used as a universal encoder for other language SLR tasks.<n>We show that explicitly labeling visually similar signs improves trained model quality as a visual encoder for downstream tasks.
arXiv Detail & Related papers (2025-05-15T16:31:49Z)
A Closer Look at Benchmarking Self-Supervised Pre-training with Image Classification [51.35500308126506]
Self-supervised learning (SSL) is a machine learning approach where the data itself provides supervision, eliminating the need for external labels. We study how classification-based evaluation protocols for SSL correlate and how well they predict downstream performance on different dataset types.
arXiv Detail & Related papers (2024-07-16T23:17:36Z)
Improving Continuous Sign Language Recognition with Cross-Lingual Signs [29.077175863743484]
We study the feasibility of utilizing multilingual sign language corpora to facilitate continuous sign language recognition. We first build two sign language dictionaries containing isolated signs that appear in two datasets. Then we identify the sign-to-sign mappings between two sign languages via a well-optimized isolated sign language recognition model.
arXiv Detail & Related papers (2023-08-21T15:58:47Z)
Towards the extraction of robust sign embeddings for low resource sign language recognition [7.969704867355098]
We show that keypoint-based embeddings can transfer between sign languages and achieve competitive performance. We furthermore achieve better performance using fine-tuned transferred embeddings than models trained only on the target sign language.
arXiv Detail & Related papers (2023-06-30T11:21:40Z)
Explaining, Analyzing, and Probing Representations of Self-Supervised Learning Models for Sensor-based Human Activity Recognition [2.2082422928825136]
Self-supervised learning (SSL) frameworks have been extensively applied to sensor-based Human Activity Recognition (HAR) In this paper, we aim to analyze deep representations of two recent SSL frameworks, namely SimCLR and VICReg.
arXiv Detail & Related papers (2023-04-14T07:53:59Z)
Automatic Pronunciation Assessment using Self-Supervised Speech Representation Learning [13.391307807956673]
We propose a novel automatic pronunciation assessment method based on self-supervised learning (SSL) models. First, the proposed method fine-tunes the pre-trained SSL models with connectionist temporal classification to adapt the English pronunciation of English-as-a-second-language learners. We show that the proposed SSL model-based methods outperform the baselines, in terms of the Pearson correlation coefficient, on datasets of Korean ESL learner children and Speechocean762.
arXiv Detail & Related papers (2022-04-08T06:13:55Z)
Augmented Contrastive Self-Supervised Learning for Audio Invariant Representations [28.511060004984895]
We propose an augmented contrastive SSL framework to learn invariant representations from unlabeled data. Our method applies various perturbations to the unlabeled input data and utilizes contrastive learning to learn representations robust to such perturbations.
arXiv Detail & Related papers (2021-12-21T02:50:53Z)
UniSpeech-SAT: Universal Speech Representation Learning with Speaker Aware Pre-Training [72.004873454347]
Two methods are introduced for enhancing the unsupervised speaker information extraction. Experiment results on SUPERB benchmark show that the proposed system achieves state-of-the-art performance. We scale up training dataset to 94 thousand hours public audio data and achieve further performance improvement.
arXiv Detail & Related papers (2021-10-12T05:43:30Z)
Multi-Modal Zero-Shot Sign Language Recognition [51.07720650677784]
We propose a multi-modal Zero-Shot Sign Language Recognition model. A Transformer-based model along with a C3D model is used for hand detection and deep features extraction. A semantic space is used to map the visual features to the lingual embedding of the class labels.
arXiv Detail & Related papers (2021-09-02T09:10:39Z)
LeBenchmark: A Reproducible Framework for Assessing Self-Supervised Representation Learning from Speech [63.84741259993937]
Self-Supervised Learning (SSL) using huge unlabeled data has been successfully explored for image and natural language processing. Recent works also investigated SSL from speech. We propose LeBenchmark: a reproducible framework for assessing SSL from speech.
arXiv Detail & Related papers (2021-04-23T08:27:09Z)
End-to-end Generative Zero-shot Learning via Few-shot Learning [76.9964261884635]
State-of-the-art approaches to Zero-Shot Learning (ZSL) train generative nets to synthesize examples conditioned on the provided metadata. We introduce an end-to-end generative ZSL framework that uses such an approach as a backbone and feeds its synthesized output to a Few-Shot Learning algorithm.
arXiv Detail & Related papers (2021-02-08T17:35:37Z)
Transferring Cross-domain Knowledge for Video Sign Language Recognition [103.9216648495958]
Word-level sign language recognition (WSLR) is a fundamental task in sign language interpretation. We propose a novel method that learns domain-invariant visual concepts and fertilizes WSLR models by transferring knowledge of subtitled news sign to them.
arXiv Detail & Related papers (2020-03-08T03:05:21Z)

This list is automatically generated from the titles and abstracts of the papers in this site.