Logos as a Well-Tempered Pre-train for Sign Language Recognition
- URL: http://arxiv.org/abs/2505.10481v1
- Date: Thu, 15 May 2025 16:31:49 GMT
- Title: Logos as a Well-Tempered Pre-train for Sign Language Recognition
- Authors: Ilya Ovodov, Petr Surovtsev, Karina Kvanchiani, Alexander Kapitanov, Alexander Nagaev,
- Abstract summary: This paper presents Logos, a novel Russian Sign Language (RSL) dataset.<n>It is shown that a model, pre-trained on the Logos dataset can be used as a universal encoder for other language SLR tasks.<n>We show that explicitly labeling visually similar signs improves trained model quality as a visual encoder for downstream tasks.
- Score: 75.42794328290088
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper examines two aspects of the isolated sign language recognition (ISLR) task. First, despite the availability of a number of datasets, the amount of data for most individual sign languages is limited. It poses the challenge of cross-language ISLR model training, including transfer learning. Second, similar signs can have different semantic meanings. It leads to ambiguity in dataset labeling and raises the question of the best policy for annotating such signs. To address these issues, this study presents Logos, a novel Russian Sign Language (RSL) dataset, the most extensive ISLR dataset by the number of signers and one of the largest available datasets while also the largest RSL dataset in size and vocabulary. It is shown that a model, pre-trained on the Logos dataset can be used as a universal encoder for other language SLR tasks, including few-shot learning. We explore cross-language transfer learning approaches and find that joint training using multiple classification heads benefits accuracy for the target lowresource datasets the most. The key feature of the Logos dataset is explicitly annotated visually similar sign groups. We show that explicitly labeling visually similar signs improves trained model quality as a visual encoder for downstream tasks. Based on the proposed contributions, we outperform current state-of-the-art results for the WLASL dataset and get competitive results for the AUTSL dataset, with a single stream model processing solely RGB video. The source code, dataset, and pre-trained models are publicly available.
Related papers
- Isharah: A Large-Scale Multi-Scene Dataset for Continuous Sign Language Recognition [5.772467856207932]
We present Isharah, a large multi-scene dataset for continuous SLR.<n>The dataset consists of 30,000 video clips performed by 18 deaf and professional signers.<n>This paper also introduces multiple sign language understanding benchmarks, including signer-independent and unseen-sentence CSLR.
arXiv Detail & Related papers (2025-06-04T06:48:21Z) - Language Representations Can be What Recommenders Need: Findings and Potentials [57.90679739598295]
We show that item representations, when linearly mapped from advanced LM representations, yield superior recommendation performance.<n>This outcome suggests the possible homomorphism between the advanced language representation space and an effective item representation space for recommendation.<n>Our findings highlight the connection between language modeling and behavior modeling, which can inspire both natural language processing and recommender system communities.
arXiv Detail & Related papers (2024-07-07T17:05:24Z) - A Tale of Two Languages: Large-Vocabulary Continuous Sign Language Recognition from Spoken Language Supervision [74.972172804514]
We introduce a multi-task Transformer model, CSLR2, that is able to ingest a signing sequence and output in a joint embedding space between signed language and spoken language text.
New dataset annotations provide continuous sign-level annotations for six hours of test videos, and will be made publicly available.
Our model significantly outperforms the previous state of the art on both tasks.
arXiv Detail & Related papers (2024-05-16T17:19:06Z) - Transfer Learning for Cross-dataset Isolated Sign Language Recognition in Under-Resourced Datasets [2.512406961007489]
We use a temporal graph convolution-based sign language recognition approach to evaluate five supervised transfer learning approaches.
Experiments demonstrate that improvement over finetuning based transfer learning is possible with specialized supervised transfer learning methods.
arXiv Detail & Related papers (2024-03-21T16:36:40Z) - Improving Continuous Sign Language Recognition with Cross-Lingual Signs [29.077175863743484]
We study the feasibility of utilizing multilingual sign language corpora to facilitate continuous sign language recognition.
We first build two sign language dictionaries containing isolated signs that appear in two datasets.
Then we identify the sign-to-sign mappings between two sign languages via a well-optimized isolated sign language recognition model.
arXiv Detail & Related papers (2023-08-21T15:58:47Z) - Towards the extraction of robust sign embeddings for low resource sign
language recognition [7.969704867355098]
We show that keypoint-based embeddings can transfer between sign languages and achieve competitive performance.
We furthermore achieve better performance using fine-tuned transferred embeddings than models trained only on the target sign language.
arXiv Detail & Related papers (2023-06-30T11:21:40Z) - AnnoLLM: Making Large Language Models to Be Better Crowdsourced Annotators [98.11286353828525]
GPT-3.5 series models have demonstrated remarkable few-shot and zero-shot ability across various NLP tasks.
We propose AnnoLLM, which adopts a two-step approach, explain-then-annotate.
We build the first conversation-based information retrieval dataset employing AnnoLLM.
arXiv Detail & Related papers (2023-03-29T17:03:21Z) - LSA-T: The first continuous Argentinian Sign Language dataset for Sign
Language Translation [52.87578398308052]
Sign language translation (SLT) is an active field of study that encompasses human-computer interaction, computer vision, natural language processing and machine learning.
This paper presents the first continuous Argentinian Sign Language (LSA) dataset.
It contains 14,880 sentence level videos of LSA extracted from the CN Sordos YouTube channel with labels and keypoints annotations for each signer.
arXiv Detail & Related papers (2022-11-14T14:46:44Z) - A Simple Multi-Modality Transfer Learning Baseline for Sign Language
Translation [54.29679610921429]
Existing sign language datasets contain only about 10K-20K pairs of sign videos, gloss annotations and texts.
Data is thus a bottleneck for training effective sign language translation models.
This simple baseline surpasses the previous state-of-the-art results on two sign language translation benchmarks.
arXiv Detail & Related papers (2022-03-08T18:59:56Z) - Transferring Cross-domain Knowledge for Video Sign Language Recognition [103.9216648495958]
Word-level sign language recognition (WSLR) is a fundamental task in sign language interpretation.
We propose a novel method that learns domain-invariant visual concepts and fertilizes WSLR models by transferring knowledge of subtitled news sign to them.
arXiv Detail & Related papers (2020-03-08T03:05:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.