A Comprehensive Study on Deep Learning-based Methods for Sign Language
Recognition
- URL: http://arxiv.org/abs/2007.12530v2
- Date: Fri, 19 Mar 2021 19:32:15 GMT
- Title: A Comprehensive Study on Deep Learning-based Methods for Sign Language
Recognition
- Authors: Nikolas Adaloglou, Theocharis Chatzis, Ilias Papastratis, Andreas
Stergioulas, Georgios Th. Papadopoulos, Vassia Zacharopoulou, George J.
Xydopoulos, Klimnis Atzakas, Dimitris Papazachariou, and Petros Daras
- Abstract summary: The aim of the present study is to provide insights into sign language recognition, focusing on mapping non-segmented video streams to glosses.
To the best of our knowledge, this is the first sign language dataset where sentence- and gloss-level annotations are provided for a video capture.
- Score: 14.714669469867871
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, a comparative experimental assessment of computer
vision-based methods for sign language recognition is conducted. The most
recent deep neural network methods in this field are implemented and
thoroughly evaluated on multiple publicly available datasets. The aim of the
present study is to provide insights into sign language recognition, focusing
on mapping non-segmented video streams to glosses. For this task, two new
sequence
training criteria, known from the fields of speech and scene text recognition,
are introduced. Furthermore, a plethora of pretraining schemes is thoroughly
discussed. Finally, a new RGB+D dataset for the Greek sign language is created.
To the best of our knowledge, this is the first sign language dataset where
sentence- and gloss-level annotations are provided for a video capture.
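The abstract does not name the two sequence training criteria; in continuous sign language recognition, such alignment-free losses are typically variants of Connectionist Temporal Classification (CTC), the standard criterion imported from speech and scene text recognition. Below is a minimal, hedged sketch of plain CTC training on per-frame video features with PyTorch; the toy encoder, vocabulary size, and tensor shapes are illustrative assumptions, not the paper's actual setup.

```python
# Minimal sketch: CTC-style sequence training for continuous sign language
# recognition, i.e. mapping a non-segmented frame sequence to a gloss
# sequence. All shapes and the toy model are illustrative assumptions.
import torch
import torch.nn as nn

NUM_GLOSSES = 311          # vocabulary size (assumption); index 0 is the CTC blank
T, B, FEAT = 120, 4, 512   # frames per clip, batch size, per-frame feature dim

# A stand-in temporal encoder: per-frame features -> per-frame gloss logits.
encoder = nn.Sequential(
    nn.Linear(FEAT, 256),
    nn.ReLU(),
    nn.Linear(256, NUM_GLOSSES),
)

frames = torch.randn(T, B, FEAT)              # (time, batch, feature)
log_probs = encoder(frames).log_softmax(-1)   # CTC expects log-probabilities

# Target gloss sequences, padded to the longest sentence in the batch.
targets = torch.randint(1, NUM_GLOSSES, (B, 9))
input_lengths = torch.full((B,), T, dtype=torch.long)
target_lengths = torch.tensor([9, 7, 9, 5])

ctc = nn.CTCLoss(blank=0, zero_infinity=True)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()
```

The point of a CTC-family loss here is that it marginalizes over all monotonic alignments between the frame sequence and the gloss sequence, which is what makes training on non-segmented video possible without frame-level labels.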
Related papers
- Self-Supervised Representation Learning with Spatial-Temporal Consistency for Sign Language Recognition [96.62264528407863]
We propose a self-supervised contrastive learning framework to excavate rich context via spatial-temporal consistency.
Inspired by the complementary property of motion and joint modalities, we first introduce first-order motion information into sign language modeling (a sketch of such motion features appears after this list).
Our method is evaluated with extensive experiments on four public benchmarks, and achieves new state-of-the-art performance with a notable margin.
arXiv Detail & Related papers (2024-06-15T04:50:19Z)
- Transfer Learning for Cross-dataset Isolated Sign Language Recognition in Under-Resourced Datasets [2.512406961007489]
We use a temporal graph convolution-based sign language recognition approach to evaluate five supervised transfer learning approaches.
Experiments demonstrate that specialized supervised transfer learning methods can improve on finetuning-based transfer learning.
arXiv Detail & Related papers (2024-03-21T16:36:40Z)
- Learning from What is Already Out There: Few-shot Sign Language Recognition with Online Dictionaries [0.0]
We open-source the UWB-SL-Wild few-shot dataset, the first training resource of its kind, consisting of dictionary-scraped videos.
We introduce a novel approach to training sign language recognition models in a few-shot scenario, resulting in state-of-the-art results.
arXiv Detail & Related papers (2023-01-10T03:21:01Z)
- LSA-T: The first continuous Argentinian Sign Language dataset for Sign Language Translation [52.87578398308052]
Sign language translation (SLT) is an active field of study that encompasses human-computer interaction, computer vision, natural language processing and machine learning.
This paper presents the first continuous Argentinian Sign Language (LSA) dataset.
It contains 14,880 sentence-level videos of LSA extracted from the CN Sordos YouTube channel, with labels and keypoint annotations for each signer.
arXiv Detail & Related papers (2022-11-14T14:46:44Z)
- Self-Supervised Speech Representation Learning: A Review [105.1545308184483]
Self-supervised representation learning methods promise a single universal model that would benefit a wide variety of tasks and domains.
Speech representation learning is experiencing similar progress in three main categories: generative, contrastive, and predictive methods.
This review presents approaches for self-supervised speech representation learning and their connection to other research areas.
arXiv Detail & Related papers (2022-05-21T16:52:57Z)
- Wav2Seq: Pre-training Speech-to-Text Encoder-Decoder Models Using Pseudo Languages [58.43299730989809]
We introduce Wav2Seq, the first self-supervised approach to pre-train both parts of encoder-decoder models for speech data.
We induce a pseudo language as a compact discrete representation, and formulate a self-supervised pseudo speech recognition task.
This process stands on its own, or can be applied as low-cost second-stage pre-training.
arXiv Detail & Related papers (2022-05-02T17:59:02Z)
- data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language [85.9019051663368]
data2vec is a framework that uses the same learning method for speech, NLP, or computer vision.
The core idea is to predict latent representations of the full input data based on a masked view of the input in a self-distillation setup (a simplified sketch of this setup also follows the related-papers list).
Experiments on the major benchmarks of speech recognition, image classification, and natural language understanding demonstrate a new state of the art or competitive performance.
arXiv Detail & Related papers (2022-02-07T22:52:11Z)
- Sign Language Video Retrieval with Free-Form Textual Queries [19.29003565494735]
We introduce the task of sign language retrieval with free-form textual queries.
The objective is to find the signing video in the collection that best matches the written query.
We propose SPOT-ALIGN, a framework for interleaving iterative rounds of sign spotting and feature alignment to expand the scope and scale of available training data.
arXiv Detail & Related papers (2022-01-07T15:22:18Z)
- Skeleton Based Sign Language Recognition Using Whole-body Keypoints [71.97020373520922]
Sign language is used by deaf or speech-impaired people to communicate.
Skeleton-based recognition is becoming popular because it can be further ensembled with RGB-D-based methods to achieve state-of-the-art performance.
Inspired by the recent development of whole-body pose estimation (Jin et al., 2020), we propose recognizing sign language based on whole-body keypoints and features.
arXiv Detail & Related papers (2021-03-16T03:38:17Z)
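Two of the entries above touch on skeleton-based modeling: one introduces first-order motion information over joint modalities, the other recognizes signs from whole-body keypoints. Below is a minimal sketch of first-order motion features, assuming they are frame-to-frame differences of keypoint coordinates and assuming a COCO-WholeBody-style 133-point layout; both are illustrative assumptions, not details taken from the papers.

```python
# Hedged sketch: "first-order motion" for skeleton-based sign language
# modeling, taken here to mean frame-to-frame differences of joint
# coordinates. The 133-keypoint layout and the fusion by channel
# concatenation are assumptions for illustration.
import torch

def first_order_motion(joints: torch.Tensor) -> torch.Tensor:
    """joints: (T, V, C) = (frames, keypoints, coordinates).

    Returns per-frame joint velocities with the same shape, using a
    zero vector for the first frame so the two streams stay aligned.
    """
    motion = joints[1:] - joints[:-1]        # (T-1, V, C)
    pad = torch.zeros_like(joints[:1])       # (1, V, C)
    return torch.cat([pad, motion], dim=0)   # (T, V, C)

T, V, C = 64, 133, 2            # 133 whole-body keypoints in 2D (assumption)
joints = torch.randn(T, V, C)
motion = first_order_motion(joints)

# One simple way to use both modalities: concatenate along the coordinate
# axis and feed the result to a graph/temporal encoder.
fused = torch.cat([joints, motion], dim=-1)  # (T, V, 2*C)
print(fused.shape)                           # torch.Size([64, 133, 4])
```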
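The data2vec entry describes predicting latent representations of the full input from a masked view in a self-distillation setup. A simplified sketch of that idea follows, with an exponential-moving-average (EMA) teacher and a toy MLP encoder; the mask ratio, the smooth-L1 loss, and the use of a single output layer as the target are assumptions here (the actual method regresses an average over the teacher's top layers).

```python
# Hedged sketch of the data2vec idea: a student sees a masked view of the
# input and regresses the latents an EMA ("self-distilled") teacher produces
# from the full input. The tiny encoder and mask ratio are assumptions.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

student = nn.Sequential(nn.Linear(64, 128), nn.GELU(), nn.Linear(128, 128))
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad_(False)

def ema_update(tau: float = 0.999) -> None:
    # Teacher tracks an exponential moving average of the student weights.
    with torch.no_grad():
        for ps, pt in zip(student.parameters(), teacher.parameters()):
            pt.mul_(tau).add_(ps, alpha=1 - tau)

x = torch.randn(8, 50, 64)                    # (batch, timesteps, features)
mask = torch.rand(8, 50) < 0.5                # mask half of the timesteps
x_masked = x.masked_fill(mask.unsqueeze(-1), 0.0)

with torch.no_grad():
    targets = teacher(x)                      # latents from the *full* input

pred = student(x_masked)                      # student sees the masked view
loss = F.smooth_l1_loss(pred[mask], targets[mask])
loss.backward()
ema_update()
```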
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.