One-Stage-TFS: Thai One-Stage Fingerspelling Dataset for Fingerspelling Recognition Frameworks
- URL: http://arxiv.org/abs/2411.02768v1
- Date: Tue, 05 Nov 2024 03:26:26 GMT
- Title: One-Stage-TFS: Thai One-Stage Fingerspelling Dataset for Fingerspelling Recognition Frameworks
- Authors: Siriwiwat Lata, Sirawan Phiphitphatphaisit, Emmanuel Okafor, Olarik Surinta
- Abstract summary: The Thai One-Stage Fingerspelling dataset is a comprehensive resource designed to advance research in hand gesture recognition.
This dataset comprises 7,200 images capturing 15 one-stage consonant gestures performed by undergraduate students from Rajabhat Maha Sarakham University, Thailand.
- Abstract: The Thai One-Stage Fingerspelling (One-Stage-TFS) dataset is a comprehensive resource designed to advance research in hand gesture recognition, with a particular focus on Thai sign language recognition. This dataset comprises 7,200 images capturing 15 one-stage consonant gestures performed by undergraduate students from Rajabhat Maha Sarakham University, Thailand. The contributors include both expert students from the Special Education Department with proficiency in Thai sign language and students from other departments without prior sign language experience. Images were collected between July and December 2021 using a DSLR camera, with contributors demonstrating hand gestures against both simple and complex backgrounds. The One-Stage-TFS dataset presents challenges in detecting and recognizing hand gestures, offering opportunities to develop novel end-to-end recognition frameworks. Researchers can utilize this dataset to explore deep learning methods, such as YOLO, EfficientDet, RetinaNet, and Detectron, for hand detection, followed by feature extraction and recognition using techniques like convolutional neural networks, transformers, and adaptive feature fusion networks. The dataset is accessible via the Mendeley Data repository and supports a wide range of applications in computer science, including deep learning, computer vision, and pattern recognition, thereby encouraging further innovation and exploration in these fields.
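As a concrete starting point, the sketch below illustrates the two-stage detect-then-classify pipeline the abstract describes, assuming the Ultralytics YOLO package for hand detection and a torchvision ResNet-18 fine-tuned for the 15 consonant classes. The weight files, image path, and class head are illustrative placeholders, not artifacts shipped with the dataset.

```python
# Sketch of a detect-then-classify pipeline over One-Stage-TFS images.
# Assumes the Ultralytics YOLO package for hand detection and a torchvision
# ResNet-18 with a 15-way head for gesture classification; the weight files
# ("hand_detector.pt", "tfs_classifier.pt") and "sample.jpg" are placeholders.
import torch
from PIL import Image
from torchvision import models, transforms
from ultralytics import YOLO

detector = YOLO("hand_detector.pt")  # hypothetical hand-detection weights

classifier = models.resnet18(weights=None)
classifier.fc = torch.nn.Linear(classifier.fc.in_features, 15)  # 15 gestures
classifier.load_state_dict(torch.load("tfs_classifier.pt"))  # hypothetical
classifier.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

image = Image.open("sample.jpg").convert("RGB")
for box in detector(image)[0].boxes.xyxy:  # one Results object per image
    x1, y1, x2, y2 = map(int, box.tolist())
    crop = preprocess(image.crop((x1, y1, x2, y2))).unsqueeze(0)
    with torch.no_grad():
        pred = classifier(crop).argmax(dim=1).item()
    print(f"hand at ({x1},{y1},{x2},{y2}) -> gesture class {pred}")
```

Splitting detection and recognition this way mirrors the two-stage framing above; a single end-to-end detector with 15 gesture classes is an equally valid design that the dataset is meant to enable.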
Related papers
- Deep Neural Network-Based Sign Language Recognition: A Comprehensive Approach Using Transfer Learning with Explainability [0.0]
We suggest a novel solution that uses a deep neural network to fully automate sign language recognition.
This methodology integrates sophisticated preprocessing techniques to optimize overall performance.
Our model's ability to provide informational clarity was assessed using the SHAP (SHapley Additive exPlanations) method; a minimal SHAP sketch follows this entry.
arXiv Detail & Related papers (2024-09-11T17:17:44Z)
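For illustration only, here is a self-contained SHAP sketch with synthetic data. The paper applies SHAP to a deep network; a random forest over landmark-style features is used here purely to keep the example dependency-light, and every shape and class count below is an assumption.

```python
# Self-contained SHAP sketch with synthetic data. A random forest stands in
# for the paper's deep network; feature and class dimensions are assumptions.
import numpy as np
import shap
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 42))     # e.g., 21 hand landmarks x (x, y)
y = rng.integers(0, 15, size=200)  # 15 synthetic gesture classes

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

explainer = shap.TreeExplainer(clf)
shap_values = explainer.shap_values(X[:5])  # per-class feature attributions
print(np.shape(shap_values))  # attribution layout varies across SHAP versions
```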
- Self-Supervised Representation Learning with Spatial-Temporal Consistency for Sign Language Recognition [96.62264528407863]
We propose a self-supervised contrastive learning framework to excavate rich context via spatial-temporal consistency.
Inspired by the complementary property of motion and joint modalities, we first introduce first-order motion information into sign language modeling.
Our method is evaluated with extensive experiments on four public benchmarks and achieves new state-of-the-art performance by a notable margin; a toy contrastive-loss sketch follows this entry.
arXiv Detail & Related papers (2024-06-15T04:50:19Z)
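In that spirit, here is a toy InfoNCE sketch that pulls together two augmented views of the same skeleton sequence. The encoder, augmentations, and tensor shapes are illustrative stand-ins, not the paper's actual model.

```python
# Toy contrastive (InfoNCE) sketch over two augmented "views" of the same
# skeleton sequence. Encoder, augmentations, and shapes are placeholders.
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """Matching rows of z1/z2 are positives; all other rows are negatives."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature  # (B, B) similarity matrix
    targets = torch.arange(z1.size(0))  # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

encoder = torch.nn.Sequential(  # toy sequence encoder
    torch.nn.Flatten(), torch.nn.Linear(30 * 27 * 2, 128), torch.nn.ReLU(),
    torch.nn.Linear(128, 64),
)

seq = torch.randn(8, 30, 27, 2)             # 30-frame, 27-joint 2-D skeletons
view1 = seq + 0.01 * torch.randn_like(seq)  # spatial jitter augmentation
view2 = seq.flip(dims=[1])                  # temporal-reverse augmentation
loss = info_nce(encoder(view1), encoder(view2))
print(loss.item())
```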
- Arabic Handwritten Text for Person Biometric Identification: A Deep Learning Approach [0.9910347287556193]
This study thoroughly investigates how well deep learning models can recognize Arabic handwritten text for person biometric identification.
It compares three advanced architectures -- ResNet50, MobileNetV2, and EfficientNetB7 -- using three widely recognized datasets.
Results show that EfficientNetB7 outperforms the others, achieving test accuracies of 98.57%, 99.15%, and 99.79% on the AHAWP, Khatt, and LAMIS-MSHD datasets, respectively.
arXiv Detail & Related papers (2024-06-01T11:43:00Z)
- Enhancing Brazilian Sign Language Recognition through Skeleton Image Representation [2.6311088262657907]
This work proposes an Isolated Sign Language Recognition (ISLR) approach where body, hands, and facial landmarks are extracted throughout time and encoded as 2-D images.
We show that our method surpasses the state of the art on two widely recognized Brazilian Sign Language (LIBRAS) datasets.
In addition to being more accurate, our method is more time-efficient and easier to train, since it relies on a simpler network architecture and solely on RGB data as input; a small landmark-to-image sketch follows this entry.
arXiv Detail & Related papers (2024-04-29T23:21:17Z)
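A minimal sketch of the core encoding idea, under the assumption that landmarks arrive as a (frames, joints, coordinates) array; the exact image layout the paper uses may differ.

```python
# Sketch of encoding a landmark sequence as a 2-D image: time on one axis,
# joint coordinates on the other, values mapped to pixel intensities.
# The (frames, joints, coords) layout is an assumption for illustration.
import numpy as np

def landmarks_to_image(seq: np.ndarray) -> np.ndarray:
    """seq: (T frames, J joints, 2 coords) -> (T, J*2) grayscale uint8 image."""
    flat = seq.reshape(seq.shape[0], -1)   # one row of pixels per frame
    lo, hi = flat.min(), flat.max()
    norm = (flat - lo) / (hi - lo + 1e-8)  # rescale values into [0, 1]
    return (norm * 255).astype(np.uint8)

seq = np.random.rand(64, 33, 2)            # e.g., 33 pose landmarks over time
img = landmarks_to_image(seq)
print(img.shape, img.dtype)                # (64, 66) uint8
```

The payoff of this representation is that an ordinary 2-D CNN can then consume gesture dynamics as a plain image, which is what makes the simpler architecture in the entry above possible.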
- Learning Transferable Pedestrian Representation from Multimodal Information Supervision [174.5150760804929]
VAL-PAT is a novel framework that learns transferable representations to enhance various pedestrian analysis tasks with multimodal information.
We first perform pre-training on the LUPerson-TA dataset, where each image contains text and attribute annotations.
We then transfer the learned representations to various downstream tasks, including person re-identification (reID), person attribute recognition, and text-based person search.
arXiv Detail & Related papers (2023-04-12T01:20:58Z)
- Unified Contrastive Learning in Image-Text-Label Space [130.31947133453406]
Unified Contrastive Learning (UniCL) is an effective way of learning semantically rich yet discriminative representations.
On its own, UniCL is a good learner on pure image-label data, rivaling supervised learning methods across three image classification datasets.
arXiv Detail & Related papers (2022-04-07T17:34:51Z)
- SHREC 2021: Track on Skeleton-based Hand Gesture Recognition in the Wild [62.450907796261646]
Recognition of hand gestures can be performed directly from the stream of hand skeletons estimated by software.
Despite the recent advancements in gesture and action recognition from skeletons, it is unclear how well the current state-of-the-art techniques can perform in a real-world scenario.
This paper presents the results of the SHREC 2021: Track on Skeleton-based Hand Gesture Recognition in the Wild contest.
arXiv Detail & Related papers (2021-06-21T10:57:49Z)
- Skeleton Based Sign Language Recognition Using Whole-body Keypoints [71.97020373520922]
Sign language is used by deaf or speech-impaired people to communicate.
Skeleton-based recognition is becoming popular because it can be further ensembled with RGB-D-based methods to achieve state-of-the-art performance.
Inspired by the recent development of whole-body pose estimation [Jin et al., 2020], we propose recognizing sign language based on whole-body keypoints and features; a keypoint-extraction sketch follows this entry.
arXiv Detail & Related papers (2021-03-16T03:38:17Z)
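One accessible way to obtain whole-body keypoints (pose, face, both hands) is MediaPipe Holistic, sketched below. The paper builds on a different whole-body pose estimator, so treat this purely as an illustrative substitute, with "sample.jpg" as a placeholder path.

```python
# Extract whole-body keypoints with MediaPipe Holistic, an illustrative
# substitute for the whole-body pose estimator the paper builds on.
import cv2
import mediapipe as mp

holistic = mp.solutions.holistic.Holistic(static_image_mode=True)
image = cv2.cvtColor(cv2.imread("sample.jpg"), cv2.COLOR_BGR2RGB)
results = holistic.process(image)  # runs pose, face, and hand models

for name, lm in [("pose", results.pose_landmarks),
                 ("face", results.face_landmarks),
                 ("left hand", results.left_hand_landmarks),
                 ("right hand", results.right_hand_landmarks)]:
    n = len(lm.landmark) if lm else 0  # None when the part is not detected
    print(f"{name}: {n} keypoints")
holistic.close()
```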
- Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision [57.031588264841]
We leverage a noisy dataset of over one billion image alt-text pairs, obtained without expensive filtering or post-processing steps.
A simple dual-encoder architecture learns to align visual and language representations of the image and text pairs using a contrastive loss.
We show that the scale of our corpus can make up for its noise and leads to state-of-the-art representations even with such a simple learning scheme; a toy dual-encoder sketch follows this entry.
arXiv Detail & Related papers (2021-02-11T10:08:12Z)
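A toy sketch of the symmetric dual-encoder contrastive objective: both towers here are single linear layers over pre-extracted features, a deliberate simplification of the deep vision and text encoders such systems actually use.

```python
# Toy dual-encoder contrastive objective: embed images and texts separately,
# then align matching pairs with a symmetric cross-entropy loss.
# Both "towers" are stand-in linear layers over pre-extracted features.
import torch
import torch.nn.functional as F

img_encoder = torch.nn.Linear(512, 128)  # stand-in image tower
txt_encoder = torch.nn.Linear(300, 128)  # stand-in text tower

images = torch.randn(16, 512)            # synthetic image features
texts = torch.randn(16, 300)             # synthetic text features

zi = F.normalize(img_encoder(images), dim=1)
zt = F.normalize(txt_encoder(texts), dim=1)
logits = zi @ zt.t() / 0.07              # pairwise similarity matrix
labels = torch.arange(16)                # i-th image matches i-th text
loss = (F.cross_entropy(logits, labels) +        # image -> text direction
        F.cross_entropy(logits.t(), labels)) / 2 # text -> image direction
print(loss.item())
```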
- Force myography benchmark data for hand gesture recognition and transfer learning [5.110894308882439]
We contribute to the advancement of this field by making accessible a benchmark dataset of 18 unique gestures collected from 20 persons using a commercially available sensor setup.
We illustrate one use case for such data, showing how gesture recognition accuracy can be improved by utilizing transfer learning to incorporate data from multiple other persons; a cross-person fine-tuning sketch follows this entry.
arXiv Detail & Related papers (2020-07-29T15:43:59Z)
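A sketch of the cross-person transfer idea with synthetic stand-in data: pre-train a gesture classifier on pooled recordings from the other subjects, then fine-tune only the final layer on a small calibration set from the target person. The 18-class count follows the entry above; the channel count, widths, and data are assumptions.

```python
# Cross-person transfer sketch: pre-train on other subjects, then adapt the
# final layer to the target person. All data below is synthetic stand-in.
import torch
import torch.nn.functional as F

model = torch.nn.Sequential(
    torch.nn.Linear(16, 64), torch.nn.ReLU(),  # assumed 16 sensor channels
    torch.nn.Linear(64, 18),                   # 18 gesture classes
)

def fit(model, X, y, params, steps=200, lr=1e-2):
    """Train only the given parameters on (X, y) with cross-entropy."""
    opt = torch.optim.Adam(params, lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        F.cross_entropy(model(X), y).backward()
        opt.step()

# Stage 1: pre-train on pooled data from the other persons (synthetic here).
X_src, y_src = torch.randn(1900, 16), torch.randint(0, 18, (1900,))
fit(model, X_src, y_src, model.parameters())

# Stage 2: adapt to the target person with a small calibration set,
# updating only the final classification layer.
X_tgt, y_tgt = torch.randn(36, 16), torch.randint(0, 18, (36,))
fit(model, X_tgt, y_tgt, model[-1].parameters(), steps=50)
```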
- A Comprehensive Study on Deep Learning-based Methods for Sign Language Recognition [14.714669469867871]
The aim of the present study is to provide insights on sign language recognition, focusing on mapping non-segmented video streams to glosses.
To the best of our knowledge, this is the first sign language dataset where sentence- and gloss-level annotations are provided for a video capture.
arXiv Detail & Related papers (2020-07-24T14:07:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the generated summaries and is not responsible for any consequences of their use.