ChaLearn LAP Large Scale Signer Independent Isolated Sign Language
Recognition Challenge: Design, Results and Future Research
- URL: http://arxiv.org/abs/2105.05066v1
- Date: Tue, 11 May 2021 14:17:39 GMT
- Title: ChaLearn LAP Large Scale Signer Independent Isolated Sign Language
Recognition Challenge: Design, Results and Future Research
- Authors: Ozge Mercanoglu Sincan, Julio C. S. Jacques Junior, Sergio Escalera,
Hacer Yalim Keles
- Abstract summary: This work summarises the ChaLearn LAP Large Scale Signer Independent Isolated SLR Challenge, organised at CVPR 2021.
We discuss the challenge design, top winning solutions and suggestions for future research.
Winning teams achieved more than 96% recognition rate, and their approaches benefited from pose/hand/face estimation, transfer learning, external data, fusion/ensemble of modalities and different strategies to model spatio-temporal information.
- Score: 28.949528008976493
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The performances of Sign Language Recognition (SLR) systems have improved
considerably in recent years. However, several open challenges still need to be
solved to allow SLR to be useful in practice. The research in the field is in
its infancy with regard to the robustness of the models to a large diversity of
signs and signers, and to the fairness of the models to performers from different
demographics. This work summarises the ChaLearn LAP Large Scale Signer
Independent Isolated SLR Challenge, organised at CVPR 2021 with the goal of
overcoming some of the aforementioned challenges. We analyse and discuss the
challenge design, top winning solutions and suggestions for future research.
The challenge attracted 132 participants in the RGB track and 59 in the
RGB+Depth track, receiving more than 1.5K submissions in total. Participants
were evaluated using a new large-scale multi-modal Turkish Sign Language
(AUTSL) dataset, consisting of 226 sign labels and 36,302 isolated sign video
samples performed by 43 different signers. Winning teams achieved more than 96%
recognition rate, and their approaches benefited from pose/hand/face
estimation, transfer learning, external data, fusion/ensemble of modalities and
different strategies to model spatio-temporal information. However, methods
still fail to distinguish among very similar signs, in particular those sharing
similar hand trajectories.
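The abstract names the main ingredients of the winning approaches (multi-modal fusion/ensembles, transfer learning, spatio-temporal modelling) but not their exact architectures. Below is a minimal sketch only, assuming PyTorch and hypothetical names (ClipClassifier, fused_prediction), of score-level (late) fusion of two modality-specific spatio-temporal classifiers over RGB and depth clips; it illustrates the general technique, not any winning team's code.

```python
# Minimal sketch (not the winners' code): score-level fusion of two
# modality-specific spatio-temporal classifiers, assuming PyTorch.
import torch
import torch.nn as nn


class ClipClassifier(nn.Module):
    """Toy per-modality classifier: a small 3D CNN stem followed by pooling.

    Stands in for whatever backbone (e.g. a pretrained video network) a real
    entry would use for one modality (RGB, depth, or pose maps).
    """

    def __init__(self, in_channels: int, num_classes: int = 226):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv3d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),   # pool over time and space
            nn.Flatten(),
        )
        self.head = nn.Linear(32, num_classes)

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (batch, channels, frames, height, width)
        return self.head(self.backbone(clip))


@torch.no_grad()
def fused_prediction(rgb_model, depth_model, rgb_clip, depth_clip,
                     weights=(0.5, 0.5)):
    """Late fusion: weighted average of per-modality class probabilities."""
    probs = (
        weights[0] * torch.softmax(rgb_model(rgb_clip), dim=-1)
        + weights[1] * torch.softmax(depth_model(depth_clip), dim=-1)
    )
    return probs.argmax(dim=-1)


if __name__ == "__main__":
    rgb_model = ClipClassifier(in_channels=3)
    depth_model = ClipClassifier(in_channels=1)
    rgb_clip = torch.randn(2, 3, 16, 112, 112)    # dummy RGB clips
    depth_clip = torch.randn(2, 1, 16, 112, 112)  # dummy depth clips
    print(fused_prediction(rgb_model, depth_model, rgb_clip, depth_clip))
```

Weighted averaging of softmax scores is only one fusion strategy; feature-level fusion or per-class weights would slot into the same structure.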
Related papers
- Leveraging Contrastive Learning and Self-Training for Multimodal Emotion Recognition with Limited Labeled Samples [18.29910296652917]
We present our submission solutions for the Semi-Supervised Learning Sub-Challenge (MER2024-SEMI).
This challenge tackles the issue of limited annotated data in emotion recognition.
Our proposed method is validated to be effective on the MER2024-SEMI Challenge, achieving a weighted average F-score of 88.25% and ranking 6th on the leaderboard.
arXiv Detail & Related papers (2024-08-23T11:33:54Z)
- A Transformer Model for Boundary Detection in Continuous Sign Language [55.05986614979846]
The Transformer model is employed for both Isolated Sign Language Recognition and Continuous Sign Language Recognition.
Training uses isolated sign videos, in which hand keypoint features extracted from the input video are enriched.
The trained model, coupled with a post-processing method, is then applied to detect isolated sign boundaries within continuous sign videos.
arXiv Detail & Related papers (2024-02-22T17:25:01Z)
- Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond [89.54151859266202]
The 2023 Multilingual Speech Universal Performance Benchmark (ML-SUPERB) Challenge expands upon the acclaimed SUPERB framework.
The challenge garnered 12 model submissions and 54 language corpora, resulting in a comprehensive benchmark encompassing 154 languages.
The findings indicate that merely scaling models is not the definitive solution for multilingual speech tasks.
arXiv Detail & Related papers (2023-10-09T08:30:01Z)
- Towards the extraction of robust sign embeddings for low resource sign language recognition [7.969704867355098]
We show that keypoint-based embeddings can transfer between sign languages and achieve competitive performance.
We furthermore achieve better performance using fine-tuned transferred embeddings than models trained only on the target sign language.
arXiv Detail & Related papers (2023-06-30T11:21:40Z)
- Slovo: Russian Sign Language Dataset [83.93252084624997]
This paper presents the Russian Sign Language (RSL) video dataset Slovo, produced using crowdsourcing platforms.
The dataset contains 20,000 FullHD recordings, divided into 1,000 classes of isolated RSL gestures performed by 194 signers.
arXiv Detail & Related papers (2023-05-23T21:00:42Z)
- MER 2023: Multi-label Learning, Modality Robustness, and Semi-Supervised Learning [90.17500229142755]
The first Multimodal Emotion Recognition Challenge (MER 2023) was successfully held at ACM Multimedia.
This paper introduces the motivation behind the challenge, describes the benchmark dataset, and provides some statistics about the participants.
We believe this high-quality dataset can become a new benchmark in multimodal emotion recognition, especially for the Chinese research community.
arXiv Detail & Related papers (2023-04-18T13:23:42Z)
- Word level Bangla Sign Language Dataset for Continuous BSL Recognition [0.0]
We develop an attention-based Bi-GRU model that captures the temporal dynamics of pose information for individuals communicating through sign language.
The accuracy of the model is reported to be 85.64%.
arXiv Detail & Related papers (2023-02-22T18:55:54Z)
- NEVIS'22: A Stream of 100 Tasks Sampled from 30 Years of Computer Vision Research [96.53307645791179]
We introduce the Never-Ending VIsual-classification Stream (NEVIS'22), a benchmark consisting of a stream of over 100 visual classification tasks.
Despite being limited to classification, the resulting stream has a rich diversity of tasks, from OCR to texture analysis, scene recognition, and so forth.
Overall, NEVIS'22 poses an unprecedented challenge for current sequential learning approaches due to the scale and diversity of tasks.
arXiv Detail & Related papers (2022-11-15T18:57:46Z)
- Word separation in continuous sign language using isolated signs and post-processing [47.436298331905775]
We propose a two-stage model for Continuous Sign Language Recognition.
In the first stage, the predictor model, which includes a combination of CNN, SVD, and LSTM, is trained with the isolated signs.
In the second stage, we apply a post-processing algorithm to the Softmax outputs obtained from the first stage (see the sketch after this list).
arXiv Detail & Related papers (2022-04-02T18:34:33Z)
- AUTSL: A Large Scale Multi-modal Turkish Sign Language Dataset and Baseline Methods [6.320141734801679]
We present a new large-scale multi-modal Turkish Sign Language dataset (AUTSL) with a benchmark.
Our dataset consists of 226 signs performed by 43 different signers and 38,336 isolated sign video samples.
We trained several deep learning based models and provide empirical evaluations using the benchmark.
arXiv Detail & Related papers (2020-08-03T15:12:05Z)
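Two entries above (the Transformer boundary-detection paper and the two-stage word-separation paper) share a common pattern: run an isolated-sign classifier over a continuous video and post-process its softmax scores to locate sign boundaries. The following is a hypothetical NumPy sketch of that general idea (the function name segment_signs, the smoothing window and the threshold are illustrative), not the post-processing algorithm of either paper.

```python
# Hypothetical sketch of softmax post-processing for sign boundary detection,
# not the algorithm of either paper above: smooth per-window confidences and
# threshold them to split a continuous video into isolated-sign segments.
import numpy as np


def segment_signs(window_probs: np.ndarray, smooth: int = 5,
                  threshold: float = 0.6):
    """window_probs: (num_windows, num_classes) softmax outputs from an
    isolated-sign classifier applied in a sliding window.

    Returns a list of (start, end) window indices for detected signs.
    """
    conf = window_probs.max(axis=1)                 # top-1 confidence per window
    kernel = np.ones(smooth) / smooth
    conf = np.convolve(conf, kernel, mode="same")   # temporal smoothing
    active = conf > threshold                       # sign vs. transition

    segments, start = [], None
    for i, flag in enumerate(active):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            segments.append((start, i - 1))
            start = None
    if start is not None:
        segments.append((start, len(active) - 1))
    return segments


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Fake scores: confident in windows 10-30 and 50-70, noisy elsewhere.
    probs = rng.dirichlet(np.ones(226), size=100) * 0.3
    probs[10:30, 5] = 0.9
    probs[50:70, 17] = 0.9
    print(segment_signs(probs))
```

In practice the per-window probabilities would come from the trained isolated-sign model applied in a sliding window, and the smoothing length and threshold would be tuned on validation data.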
This list is automatically generated from the titles and abstracts of the papers in this site.