ChaLearn LAP Large Scale Signer Independent Isolated Sign Language
Recognition Challenge: Design, Results and Future Research
- URL: http://arxiv.org/abs/2105.05066v1
- Date: Tue, 11 May 2021 14:17:39 GMT
- Title: ChaLearn LAP Large Scale Signer Independent Isolated Sign Language
Recognition Challenge: Design, Results and Future Research
- Authors: Ozge Mercanoglu Sincan, Julio C. S. Jacques Junior, Sergio Escalera,
Hacer Yalim Keles
- Abstract summary: This work summarises the ChaLearn LAP Large Scale Signer Independent Isolated SLR Challenge, organised at CVPR 2021.
We discuss the challenge design, top winning solutions and suggestions for future research.
Winning teams achieved more than 96% recognition rate, and their approaches benefited from pose/hand/face estimation, transfer learning, external data, fusion/ensemble of modalities and different strategies to model spatio-temporal information.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The performances of Sign Language Recognition (SLR) systems have improved
considerably in recent years. However, several open challenges still need to be
solved before SLR can be useful in practice. Research in the field is still in
its infancy with regard to the robustness of models to a large diversity of
signs and signers, and to the fairness of models toward performers from
different demographics. This work summarises the ChaLearn LAP Large Scale Signer
Independent Isolated SLR Challenge, organised at CVPR 2021 with the goal of
overcoming some of the aforementioned challenges. We analyse and discuss the
challenge design, top winning solutions and suggestions for future research.
The challenge attracted 132 participants in the RGB track and 59 in the
RGB+Depth track, receiving more than 1.5K submissions in total. Participants
were evaluated using a new large-scale multi-modal Turkish Sign Language
(AUTSL) dataset, consisting of 226 sign labels and 36,302 isolated sign video
samples performed by 43 different signers. Winning teams achieved more than 96%
recognition rate, and their approaches benefited from pose/hand/face
estimation, transfer learning, external data, fusion/ensemble of modalities and
different strategies to model spatio-temporal information. However, methods
still fail to distinguish among very similar signs, in particular those sharing
similar hand trajectories.
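As an illustration of the fusion/ensemble strategy the winning teams relied on, below is a minimal PyTorch sketch of late score fusion across RGB and depth streams. The stream models, feature sizes and fusion weights are illustrative assumptions, not any team's actual configuration; only the 226-class AUTSL vocabulary comes from the paper.

```python
import torch
import torch.nn as nn

class LateFusionEnsemble(nn.Module):
    """Weighted average of per-modality class probabilities (illustrative)."""
    def __init__(self, streams: dict, weights: dict):
        super().__init__()
        self.streams = nn.ModuleDict(streams)
        self.weights = weights

    def forward(self, inputs: dict) -> torch.Tensor:
        fused = None
        for name, model in self.streams.items():
            probs = model(inputs[name]).softmax(dim=-1) * self.weights[name]
            fused = probs if fused is None else fused + probs
        return fused

# Hypothetical usage with two toy streams over pooled clip features.
num_classes = 226  # AUTSL has 226 sign labels
rgb_net = nn.Sequential(nn.Flatten(), nn.Linear(512, num_classes))
depth_net = nn.Sequential(nn.Flatten(), nn.Linear(512, num_classes))
ensemble = LateFusionEnsemble({"rgb": rgb_net, "depth": depth_net},
                              {"rgb": 0.6, "depth": 0.4})
batch = {"rgb": torch.randn(2, 512), "depth": torch.randn(2, 512)}
pred = ensemble(batch).argmax(dim=-1)  # predicted sign label per clip
```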
Related papers
- Training Strategies for Isolated Sign Language Recognition [72.27323884094953]
This paper introduces a comprehensive model training pipeline for Isolated Sign Language Recognition.
The constructed pipeline incorporates carefully selected image and video augmentations to tackle the challenges of low data quality and varying sign speeds.
We achieve a state-of-the-art result on the WLASL and Slovo benchmarks with 1.63% and 14.12% improvements compared to the previous best solution.
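The summary does not specify the augmentation pipeline; the following is a hedged sketch of how per-frame spatial augmentation plus a speed perturbation for varying sign speeds might look with torchvision and PyTorch, with all transform choices and parameters assumed.

```python
import torch
import torchvision.transforms as T

# Spatial augmentations applied per frame (illustrative choices).
frame_aug = T.Compose([
    T.RandomResizedCrop(224, scale=(0.8, 1.0)),
    T.ColorJitter(brightness=0.2, contrast=0.2),
])

def temporal_resample(video: torch.Tensor, num_frames: int,
                      speed_range=(0.7, 1.3)) -> torch.Tensor:
    """Randomly stretch or compress a (T, C, H, W) clip in time to
    simulate varying signing speeds, then sample a fixed frame count."""
    t = video.shape[0]
    speed = torch.empty(1).uniform_(*speed_range).item()
    span = min(t, max(2, int(round(t / speed))))
    start = torch.randint(0, t - span + 1, (1,)).item()
    idx = torch.linspace(start, start + span - 1, num_frames).round().long()
    return video[idx]

clip = torch.rand(40, 3, 256, 256)             # toy 40-frame video
clip = temporal_resample(clip, num_frames=16)  # speed perturbation
clip = torch.stack([frame_aug(f) for f in clip])  # per-frame spatial aug
```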
arXiv Detail & Related papers (2024-12-16T08:37:58Z)
- Findings of the Second BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora [79.03392191805028]
The BabyLM Challenge is a community effort to close the data-efficiency gap between human and computational language learners.
Participants compete to optimize language model training on a fixed language data budget of 100 million words or less.
arXiv Detail & Related papers (2024-12-06T16:06:08Z)
- Leveraging Contrastive Learning and Self-Training for Multimodal Emotion Recognition with Limited Labeled Samples [18.29910296652917]
We present our solutions for the Semi-Supervised Learning Sub-Challenge (MER2024-SEMI).
This challenge tackles the issue of limited annotated data in emotion recognition.
Our proposed method is validated to be effective on the MER2024-SEMI Challenge, achieving a weighted average F-score of 88.25% and ranking 6th on the leaderboard.
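Self-training with confidence-filtered pseudo-labels is a standard recipe for limited-label settings like this one; the sketch below shows the generic idea, with the classifier, feature size and confidence threshold all assumed rather than taken from the submission.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def pseudo_label_step(model, unlabeled, threshold=0.95):
    """Keep only confident predictions on unlabeled data as pseudo-labels
    (a standard self-training recipe; the threshold is an assumed value)."""
    model.eval()
    with torch.no_grad():
        probs = F.softmax(model(unlabeled), dim=-1)
    conf, labels = probs.max(dim=-1)
    keep = conf >= threshold
    return unlabeled[keep], labels[keep]

# Toy usage: a linear "emotion classifier" over pooled multimodal features.
clf = nn.Linear(128, 6)                       # 6 emotion classes (assumed)
x_unlab = torch.randn(32, 128)
x_pl, y_pl = pseudo_label_step(clf, x_unlab)  # confident subset + labels
# x_pl/y_pl would then be mixed into the labeled set for the next round.
```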
arXiv Detail & Related papers (2024-08-23T11:33:54Z)
- A Transformer Model for Boundary Detection in Continuous Sign Language [55.05986614979846]
The Transformer model is employed for both Isolated Sign Language Recognition and Continuous Sign Language Recognition.
The training process involves using isolated sign videos, where hand keypoint features extracted from the input video are enriched.
The trained model, coupled with a post-processing method, is then applied to detect isolated sign boundaries within continuous sign videos.
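The paper's exact post-processing is not described here; a generic sketch of boundary detection from per-frame probabilities (moving-average smoothing plus edge extraction, assuming the clip starts and ends in a transition state) might look as follows.

```python
import numpy as np

def detect_boundaries(frame_probs: np.ndarray, smooth: int = 5,
                      threshold: float = 0.5):
    """Locate isolated-sign boundaries in a continuous video from
    per-frame 'sign vs. transition' probabilities (a generic sketch,
    not the paper's exact method)."""
    # Moving-average smoothing suppresses single-frame flicker.
    kernel = np.ones(smooth) / smooth
    smoothed = np.convolve(frame_probs, kernel, mode="same")
    active = smoothed >= threshold
    # Boundaries are the rising/falling edges of the active mask;
    # assumes the clip starts and ends in a transition state.
    edges = np.flatnonzero(np.diff(active.astype(int)))
    if len(edges) % 2:
        edges = edges[:-1]
    return [(int(s) + 1, int(e)) for s, e in edges.reshape(-1, 2)]

probs = np.r_[np.zeros(10), np.ones(30) * 0.9, np.zeros(10)]
print(detect_boundaries(probs))  # [(10, 39)] -- one sign, frames 10..39
```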
arXiv Detail & Related papers (2024-02-22T17:25:01Z)
- Towards the extraction of robust sign embeddings for low resource sign language recognition [7.969704867355098]
We show that keypoint-based embeddings can transfer between sign languages and achieve competitive performance.
We furthermore achieve better performance using fine-tuned transferred embeddings than models trained only on the target sign language.
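As a sketch of the transfer recipe (pretrain a keypoint encoder on a source sign language, then fine-tune it with a new head on the target one), here is an assumed Transformer-based embedder in PyTorch; the architecture, checkpoint name and class count are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

class KeypointEmbedder(nn.Module):
    """Transformer encoder over per-frame keypoint vectors; an
    illustrative stand-in for the paper's sign embeddings."""
    def __init__(self, kp_dim=150, dim=256):
        super().__init__()
        self.inp = nn.Linear(kp_dim, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4,
                                           batch_first=True)
        self.enc = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, kp):               # kp: (B, T, kp_dim)
        z = self.enc(self.inp(kp))
        return z.mean(dim=1)             # mean-pool over time -> embedding

embedder = KeypointEmbedder()
# embedder.load_state_dict(torch.load("source_sl_embedder.pt"))  # assumed file
head = nn.Linear(256, 100)               # 100 target signs (toy number)
emb = embedder(torch.randn(4, 32, 150))  # 4 clips, 32 frames, 75 xy keypoints
logits = head(emb)                       # fine-tune embedder + head jointly
```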
arXiv Detail & Related papers (2023-06-30T11:21:40Z)
- MER 2023: Multi-label Learning, Modality Robustness, and Semi-Supervised Learning [90.17500229142755]
The first Multimodal Emotion Recognition Challenge (MER 2023) was successfully held at ACM Multimedia.
This paper introduces the motivation behind the challenge, describes the benchmark dataset, and provides some statistics about the participants.
We believe this high-quality dataset can become a new benchmark in multimodal emotion recognition, especially for the Chinese research community.
arXiv Detail & Related papers (2023-04-18T13:23:42Z)
- Word level Bangla Sign Language Dataset for Continuous BSL Recognition [0.0]
We develop an attention-based Bi-GRU model that captures the temporal dynamics of pose information for individuals communicating through sign language.
The accuracy of the model is reported to be 85.64%.
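A minimal PyTorch sketch of an attention-based Bi-GRU over pose sequences, in the spirit of the described model; all layer sizes and the pose dimensionality are guesses.

```python
import torch
import torch.nn as nn

class AttentiveBiGRU(nn.Module):
    """Bi-GRU with additive temporal attention over pose sequences
    (sizes are assumptions, not the paper's)."""
    def __init__(self, pose_dim=66, hidden=128, num_classes=30):
        super().__init__()
        self.gru = nn.GRU(pose_dim, hidden, batch_first=True,
                          bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)
        self.head = nn.Linear(2 * hidden, num_classes)

    def forward(self, poses):                    # (B, T, pose_dim)
        out, _ = self.gru(poses)                 # (B, T, 2*hidden)
        weights = self.attn(out).softmax(dim=1)  # attention over time
        context = (weights * out).sum(dim=1)     # weighted temporal pooling
        return self.head(context)

model = AttentiveBiGRU()
logits = model(torch.randn(2, 40, 66))  # 2 clips, 40 frames of 33 xy joints
```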
arXiv Detail & Related papers (2023-02-22T18:55:54Z)
- NEVIS'22: A Stream of 100 Tasks Sampled from 30 Years of Computer Vision Research [96.53307645791179]
We introduce the Never-Ending VIsual-classification Stream (NEVIS'22), a benchmark consisting of a stream of over 100 visual classification tasks.
Despite being limited to classification, the resulting stream has a rich diversity of tasks from OCR, to texture analysis, scene recognition, and so forth.
Overall, NEVIS'22 poses an unprecedented challenge for current sequential learning approaches due to the scale and diversity of tasks.
arXiv Detail & Related papers (2022-11-15T18:57:46Z)
- Word separation in continuous sign language using isolated signs and post-processing [47.436298331905775]
We propose a two-stage model for Continuous Sign Language Recognition.
In the first stage, the predictor model, which combines a CNN, SVD, and an LSTM, is trained on isolated signs.
In the second stage, we apply a post-processing algorithm to the Softmax outputs obtained from the first part of the model.
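A hedged stage-1 sketch, assuming per-frame CNN features, an SVD projection for dimensionality reduction and an LSTM classifier; all sizes are illustrative, and the stage-2 post-processing would consume the softmax scores produced at the end.

```python
import torch
import torch.nn as nn

# Stage-1 sketch: CNN features per frame, SVD-style projection, LSTM
# over the reduced sequence (layer sizes are illustrative).
cnn = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1), nn.Flatten())  # -> (B*T, 16)
lstm = nn.LSTM(input_size=8, hidden_size=64, batch_first=True)
head = nn.Linear(64, 50)                            # toy sign vocabulary

video = torch.rand(2, 20, 3, 112, 112)              # (B, T, C, H, W)
b, t = video.shape[:2]
feats = cnn(video.flatten(0, 1)).reshape(b, t, -1)  # per-frame features
# Project onto the top-8 right singular vectors (SVD reduction).
_, _, vh = torch.linalg.svd(feats.reshape(-1, 16), full_matrices=False)
reduced = feats @ vh[:8].T                          # (B, T, 8)
out, _ = lstm(reduced)
probs = head(out[:, -1]).softmax(dim=-1)  # softmax scores fed to stage 2
```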
arXiv Detail & Related papers (2022-04-02T18:34:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.