Hierarchical Windowed Graph Attention Network and a Large Scale Dataset for Isolated Indian Sign Language Recognition
- URL: http://arxiv.org/abs/2407.14224v2
- Date: Fri, 27 Sep 2024 12:31:24 GMT
- Title: Hierarchical Windowed Graph Attention Network and a Large Scale Dataset for Isolated Indian Sign Language Recognition
- Authors: Suvajit Patra, Arkadip Maitra, Megha Tiwari, K. Kumaran, Swathy Prabhu, Swami Punyeshwarananda, Soumitra Samanta
- Abstract summary: We introduce a large-scale isolated ISL dataset and a novel SL recognition model based on a skeleton graph structure.
The dataset covers 2,002 common words used daily in the deaf community, recorded by 20 deaf adult signers (10 male and 10 female).
We propose an SL recognition model, the Hierarchical Windowed Graph Attention Network (HWGAT), built on the human upper-body skeleton graph.
- Score: 0.20075899678041528
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automatic Sign Language (SL) recognition is an important task in the computer vision community. Building a robust SL recognition system requires a considerable amount of data, which is particularly lacking for Indian Sign Language (ISL). In this paper, we introduce a large-scale isolated ISL dataset and a novel SL recognition model based on a skeleton graph structure. The dataset covers 2,002 common words used daily in the deaf community, recorded by 20 deaf adult signers (10 male and 10 female) and containing 40,033 videos. We propose an SL recognition model, the Hierarchical Windowed Graph Attention Network (HWGAT), which utilizes the human upper-body skeleton graph. HWGAT tries to capture distinctive motions by attending to the different body parts induced by the skeleton graph. The utility of the proposed dataset and the usefulness of our model are evaluated through extensive experiments. We pre-trained the proposed model on the presented dataset and fine-tuned it across different sign language datasets, boosting performance by 1.10, 0.46, 0.78, and 6.84 percentage points on INCLUDE, LSA64, AUTSL, and WLASL respectively, compared to existing state-of-the-art keypoint-based models.
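As a rough illustration of the windowed-attention idea (a minimal sketch under my own assumptions, not the authors' released code): the layer below restricts self-attention among skeleton joints to body-part windows derived from the skeleton graph. The 7-joint partition and feature sizes are hypothetical.

```python
import torch
import torch.nn as nn

class WindowedGraphAttention(nn.Module):
    def __init__(self, dim: int, part_ids: torch.Tensor):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)
        # Joints may attend only to joints in the same body-part window.
        mask = part_ids.unsqueeze(0) == part_ids.unsqueeze(1)  # (J, J) bool
        self.register_buffer("attn_mask", mask)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, joints, dim) keypoint features for one time window
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)  # (B, J, J)
        scores = scores.masked_fill(~self.attn_mask, float("-inf"))
        return self.proj(scores.softmax(dim=-1) @ v)

# Hypothetical 7-joint upper-body partition: head / left arm / right arm.
part_ids = torch.tensor([0, 1, 1, 1, 2, 2, 2])
layer = WindowedGraphAttention(dim=64, part_ids=part_ids)
out = layer(torch.randn(2, 7, 64))  # -> (2, 7, 64)
```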
Related papers
- Bengali Sign Language Recognition through Hand Pose Estimation using Multi-Branch Spatial-Temporal Attention Model [0.5825410941577593]
We propose a spatial-temporal attention-based BSL recognition model that operates on hand joint skeletons extracted from image sequences.
Our model captures discriminative structural displacements and short-range dependencies from unified joint features projected into a high-dimensional feature space.
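A minimal sketch of this spatial-temporal attention pattern, assuming standard multi-head attention applied first across joints per frame and then across frames per joint; the dimensions and module layout are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

class SpatialTemporalAttention(nn.Module):
    def __init__(self, in_dim: int = 3, dim: int = 128, heads: int = 4):
        super().__init__()
        self.embed = nn.Linear(in_dim, dim)          # joint (x, y, z) -> feature
        self.spatial = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.temporal = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, joints: torch.Tensor) -> torch.Tensor:
        # joints: (batch, time, num_joints, 3) hand keypoint coordinates
        b, t, j, _ = joints.shape
        x = self.embed(joints)
        xs = x.reshape(b * t, j, -1)                 # attend across joints
        xs, _ = self.spatial(xs, xs, xs)
        xt = xs.reshape(b, t, j, -1).transpose(1, 2).reshape(b * j, t, -1)
        xt, _ = self.temporal(xt, xt, xt)            # attend across frames
        return xt.reshape(b, j, t, -1).transpose(1, 2)  # (batch, time, joints, dim)

model = SpatialTemporalAttention()
out = model(torch.randn(2, 16, 21, 3))  # 21 hand joints -> (2, 16, 21, 128)
```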
arXiv Detail & Related papers (2024-08-26T08:55:16Z) - SkelCap: Automated Generation of Descriptive Text from Skeleton Keypoint Sequences [2.0257616108612373]
We structured this dataset around AUTSL, a comprehensive isolated Turkish sign language dataset.
We also developed a baseline model, SkelCap, which can generate textual descriptions of body movements.
The model achieved promising results, with a ROUGE-L score of 0.98 and a BLEU-4 score of 0.94 in the signer-agnostic evaluation.
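For context on the reported ROUGE-L figure, here is a self-contained implementation of the standard LCS-based ROUGE-L F-score; it is not the authors' evaluation code.

```python
def rouge_l(candidate: str, reference: str, beta: float = 1.2) -> float:
    c, r = candidate.split(), reference.split()
    # Longest common subsequence length via dynamic programming.
    dp = [[0] * (len(r) + 1) for _ in range(len(c) + 1)]
    for i, cw in enumerate(c, 1):
        for j, rw in enumerate(r, 1):
            dp[i][j] = dp[i-1][j-1] + 1 if cw == rw else max(dp[i-1][j], dp[i][j-1])
    lcs = dp[len(c)][len(r)]
    if lcs == 0:
        return 0.0
    prec, rec = lcs / len(c), lcs / len(r)
    return (1 + beta**2) * prec * rec / (rec + beta**2 * prec)

print(rouge_l("the signer raises both hands", "the signer raises both hands"))  # 1.0
```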
arXiv Detail & Related papers (2024-05-05T15:50:02Z) - Slovo: Russian Sign Language Dataset [83.93252084624997]
This paper presents the Russian Sign Language (RSL) video dataset Slovo, produced using crowdsourcing platforms.
The dataset contains 20,000 FullHD recordings, divided into 1,000 classes of isolated RSL gestures performed by 194 signers.
arXiv Detail & Related papers (2023-05-23T21:00:42Z) - Fine-tuning of sign language recognition models: a technical report [0.0]
We focus on investigating two questions: how fine-tuning on datasets from other sign languages helps improve sign recognition quality, and whether sign recognition is possible in real time without using a GPU.
We provide code for reproducing model training experiments, converting models to ONNX format, and inference for real-time gesture recognition.
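A minimal sketch of that ONNX workflow, using a placeholder model and input shape rather than the report's actual architecture: export from PyTorch, then run CPU-only inference with onnxruntime.

```python
import torch
import numpy as np
import onnxruntime as ort

# Placeholder recognizer: flatten a keypoint clip and classify into 64 signs.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(32 * 2 * 42, 64))
model.eval()

dummy = torch.randn(1, 32, 2, 42)  # e.g. (batch, frames, xy, keypoints)
torch.onnx.export(model, dummy, "slr.onnx",
                  input_names=["keypoints"], output_names=["logits"],
                  dynamic_axes={"keypoints": {0: "batch"}})

# CPU-only inference: no GPU required.
sess = ort.InferenceSession("slr.onnx", providers=["CPUExecutionProvider"])
logits = sess.run(None, {"keypoints": np.random.randn(1, 32, 2, 42).astype(np.float32)})[0]
print(logits.shape)  # (1, 64)
```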
arXiv Detail & Related papers (2023-02-15T14:36:18Z) - Learning from What is Already Out There: Few-shot Sign Language Recognition with Online Dictionaries [0.0]
We open-source the UWB-SL-Wild few-shot dataset, the first training resource of its kind, consisting of dictionary-scraped videos.
We introduce a novel approach to training sign language recognition models in a few-shot scenario, achieving state-of-the-art results.
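The summary does not spell out the training recipe, so purely as illustration, here is a common few-shot baseline: nearest-class-mean classification over embeddings of dictionary-scraped support videos. All names and sizes are hypothetical.

```python
import torch
import torch.nn.functional as F

def few_shot_predict(support: torch.Tensor, support_labels: torch.Tensor,
                     query: torch.Tensor, num_classes: int) -> torch.Tensor:
    # support: (n_examples, dim) embeddings of dictionary-scraped videos
    # query:   (n_queries, dim) embeddings of videos to classify
    prototypes = torch.stack([support[support_labels == c].mean(0)
                              for c in range(num_classes)])
    sims = F.cosine_similarity(query.unsqueeze(1), prototypes.unsqueeze(0), dim=-1)
    return sims.argmax(dim=1)  # nearest class prototype

support = torch.randn(10, 256)                 # 5 classes x 2 shots each
labels = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3, 4, 4])
pred = few_shot_predict(support, labels, torch.randn(3, 256), num_classes=5)
```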
arXiv Detail & Related papers (2023-01-10T03:21:01Z) - Privacy-Preserved Neural Graph Similarity Learning [99.78599103903777]
We propose a novel Privacy-Preserving neural Graph Matching network model, named PPGM, for graph similarity learning.
To prevent reconstruction attacks, the proposed model does not communicate node-level representations between devices.
To mitigate attacks on graph properties, only obfuscated features that mix information from both graphs' vectors are communicated.
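A loose illustration of that communication constraint under my own assumptions (not the PPGM architecture): node embeddings are pooled away before anything crosses devices, and the transmitted vector already blends information from both sides.

```python
import torch

def graph_summary(node_embeddings: torch.Tensor) -> torch.Tensor:
    # Pool away the node dimension so individual node representations
    # never leave the device.
    return node_embeddings.mean(dim=0)

def obfuscated_message(own_summary: torch.Tensor,
                       partner_summary: torch.Tensor) -> torch.Tensor:
    # What crosses the wire is a blend of both graph-level vectors,
    # never either graph's features alone.
    return 0.5 * (own_summary + partner_summary)

g_a = torch.randn(30, 64)   # device A: 30 nodes, 64-dim embeddings
g_b = torch.randn(45, 64)   # device B: 45 nodes
msg = obfuscated_message(graph_summary(g_a), graph_summary(g_b))
similarity = torch.cosine_similarity(graph_summary(g_a), msg, dim=0)
```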
arXiv Detail & Related papers (2022-10-21T04:38:25Z) - Sign Language Recognition via Skeleton-Aware Multi-Model Ensemble [71.97020373520922]
Sign language is commonly used by deaf or mute people to communicate.
We propose a novel Multi-modal Framework with a Global Ensemble Model (GEM) for isolated Sign Language Recognition (SLR).
Our proposed SAM-SLR-v2 framework is exceedingly effective and achieves state-of-the-art performance by significant margins.
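As a sketch of the ensemble step only (the GEM fusion itself is more elaborate), a weighted late fusion of per-modality class scores might look like this; the weights and branch names are illustrative.

```python
import torch

def ensemble_logits(per_modality_logits: list[torch.Tensor],
                    weights: list[float]) -> torch.Tensor:
    # Convert each branch's logits to probabilities, weight, and sum.
    probs = [w * l.softmax(dim=-1) for l, w in zip(per_modality_logits, weights)]
    return torch.stack(probs).sum(dim=0)

# Three hypothetical branches over the 226 AUTSL classes.
rgb, depth, skeleton = (torch.randn(4, 226) for _ in range(3))
pred = ensemble_logits([rgb, depth, skeleton], [1.0, 0.5, 1.0]).argmax(dim=1)
```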
arXiv Detail & Related papers (2021-10-12T16:57:18Z) - Multi-Modal Zero-Shot Sign Language Recognition [51.07720650677784]
We propose a multi-modal Zero-Shot Sign Language Recognition model.
A Transformer-based model, along with a C3D model, is used for hand detection and deep feature extraction.
A semantic space is used to map the visual features to the lingual embedding of the class labels.
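A compact sketch of that zero-shot matching step, assuming a learned linear projection into the semantic space and off-the-shelf word vectors for the class labels (both are my placeholders, not the paper's components).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

visual_dim, text_dim, num_classes = 512, 300, 50
project = nn.Linear(visual_dim, text_dim)              # learned visual -> semantic map
label_embeddings = torch.randn(num_classes, text_dim)  # e.g. word vectors of labels

video_features = torch.randn(8, visual_dim)            # from the visual backbone
z = F.normalize(project(video_features), dim=-1)
e = F.normalize(label_embeddings, dim=-1)
pred = (z @ e.t()).argmax(dim=1)                       # nearest label in semantic space
```

Because classification reduces to nearest-neighbor search in the semantic space, unseen classes need only a label embedding, not training videos.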
arXiv Detail & Related papers (2021-09-02T09:10:39Z) - Skeleton Based Sign Language Recognition Using Whole-body Keypoints [71.97020373520922]
Sign language is used by deaf or speech impaired people to communicate.
Skeleton-based recognition is becoming popular because it can be ensembled with RGB-D based methods to achieve state-of-the-art performance.
Inspired by the recent development of whole-body pose estimation (Jin et al., 2020), we propose recognizing sign language based on whole-body keypoints and features.
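A small sketch of preparing whole-body keypoints as model input, assuming the 133-point COCO-WholeBody layout (body 0-16, hands 91-132); the subset choice and normalization scheme are illustrative, not the paper's.

```python
import numpy as np

def prepare_keypoints(kps: np.ndarray, keep: np.ndarray) -> np.ndarray:
    # kps: (frames, 133, 3) whole-body keypoints as (x, y, confidence)
    sel = kps[:, keep, :2]                             # drop confidence channel
    center = sel.mean(axis=(0, 1), keepdims=True)      # clip-level centering
    scale = np.abs(sel - center).max() + 1e-6
    return (sel - center) / scale                      # roughly in [-1, 1]

keep = np.concatenate([np.arange(0, 17),               # body joints
                       np.arange(91, 133)])            # both hands
x = prepare_keypoints(np.random.rand(64, 133, 3), keep)  # (64, 59, 2)
```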
arXiv Detail & Related papers (2021-03-16T03:38:17Z) - BSL-1K: Scaling up co-articulated sign language recognition using mouthing cues [106.21067543021887]
We show how to use mouthing cues from signers to obtain high-quality annotations from video data.
The BSL-1K dataset is a collection of British Sign Language (BSL) signs of unprecedented scale.
arXiv Detail & Related papers (2020-07-23T16:59:01Z)