Hierarchical Windowed Graph Attention Network and a Large Scale Dataset for Isolated Indian Sign Language Recognition
- URL: http://arxiv.org/abs/2407.14224v1
- Date: Fri, 19 Jul 2024 11:48:36 GMT
- Title: Hierarchical Windowed Graph Attention Network and a Large Scale Dataset for Isolated Indian Sign Language Recognition
- Authors: Suvajit Patra, Arkadip Maitra, Megha Tiwari, K. Kumaran, Swathy Prabhu, Swami Punyeshwarananda, Soumitra Samanta
- Abstract summary: We propose a large-scale isolated ISL dataset and a novel SL recognition model based on skeleton graph structure.
The dataset covers 2,002 daily used common words in the deaf community recorded by 20 (10 male and 10 female) deaf adult signers.
- Score: 0.20075899678041528
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automatic Sign Language (SL) recognition is an important task in the computer vision community. Building a robust SL recognition system requires a considerable amount of data, which is lacking, particularly for Indian Sign Language (ISL). In this paper, we propose a large-scale isolated ISL dataset and a novel SL recognition model based on a skeleton graph structure. The dataset covers 2,002 common words used daily in the deaf community, recorded by 20 (10 male and 10 female) deaf adult signers, and contains 40,033 videos. We propose an SL recognition model, the Hierarchical Windowed Graph Attention Network (HWGAT), which utilizes the human upper-body skeleton graph structure. HWGAT tries to capture distinctive motions by attending to different body parts induced by the skeleton graph structure. The utility of the proposed dataset and the usefulness of our model are evaluated through extensive experiments. We pre-trained the proposed model on the proposed dataset and fine-tuned it across different sign language datasets, boosting performance by 1.10, 0.46, 0.78, and 6.84 percentage points on INCLUDE, LSA64, AUTSL, and WLASL, respectively, compared to existing state-of-the-art skeleton-based models.
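The core mechanism described above, attention restricted to body-part windows over skeleton joints, can be sketched as follows. This is only an illustrative toy (a 5-joint skeleton, two hypothetical overlapping windows, random weights), not the authors' HWGAT implementation; the actual windowing scheme, hierarchy, and parameters are defined in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

num_joints, dim = 5, 8
X = rng.normal(size=(num_joints, dim))  # per-joint input features

# Hypothetical body-part windows: a joint may only attend to joints
# inside a shared window (joint 2 here belongs to both windows).
windows = [[0, 1, 2], [2, 3, 4]]
mask = np.full((num_joints, num_joints), -np.inf)
for w in windows:
    for i in w:
        for j in w:
            mask[i, j] = 0.0

# Single attention head with random projections (illustrative only).
Wq = rng.normal(size=(dim, dim))
Wk = rng.normal(size=(dim, dim))
Wv = rng.normal(size=(dim, dim))

Q, K, V = X @ Wq, X @ Wk, X @ Wv
scores = Q @ K.T / np.sqrt(dim) + mask  # masked scaled dot-product
attn = np.exp(scores - scores.max(axis=1, keepdims=True))
attn /= attn.sum(axis=1, keepdims=True)  # row-wise softmax
out = attn @ V  # attended joint features

# Attention outside a joint's window is exactly zero.
print(out.shape)  # (5, 8)
print(attn[0, 4])  # 0.0
```

In a full model such window-masked attention layers would be stacked and applied per frame over a video, with the windows encoding the upper-body part structure; the hedged point here is only how a mask confines attention to graph-induced body parts.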
Related papers
- SkelCap: Automated Generation of Descriptive Text from Skeleton Keypoint Sequences [2.0257616108612373]
We structured this dataset around AUTSL, a comprehensive isolated Turkish sign language dataset.
We also developed a baseline model, SkelCap, which can generate textual descriptions of body movements.
The model achieved promising results, with a ROUGE-L score of 0.98 and a BLEU-4 score of 0.94 in the signer-agnostic evaluation.
arXiv Detail & Related papers (2024-05-05T15:50:02Z) - SignDiff: Learning Diffusion Models for American Sign Language Production [27.899654531461238]
The field of Sign Language Production lacked a large-scale, pre-trained model based on deep learning for continuous American Sign Language (ASL) production in the past decade.
We propose SignDiff, a dual-condition diffusion pre-training model that can generate human sign language speakers from a skeleton pose.
Our ASL Production (ASLP) method proposes two new improved modules and a new loss function to improve the accuracy and quality of sign language skeletal posture.
arXiv Detail & Related papers (2023-08-30T15:14:56Z) - SkeletonMAE: Graph-based Masked Autoencoder for Skeleton Sequence Pre-training [110.55093254677638]
We propose an efficient skeleton sequence learning framework, named Skeleton Sequence Learning (SSL).
In this paper, we build an asymmetric graph-based encoder-decoder pre-training architecture named SkeletonMAE.
Our SSL generalizes well across different datasets and outperforms the state-of-the-art self-supervised skeleton-based action recognition methods.
arXiv Detail & Related papers (2023-07-17T13:33:11Z) - Slovo: Russian Sign Language Dataset [83.93252084624997]
This paper presents the Russian Sign Language (RSL) video dataset Slovo, produced using crowdsourcing platforms.
The dataset contains 20,000 FullHD recordings, divided into 1,000 classes of isolated RSL gestures performed by 194 signers.
arXiv Detail & Related papers (2023-05-23T21:00:42Z) - Isolated Sign Language Recognition based on Tree Structure Skeleton Images [2.179313476241343]
We use the Tree Structure Skeleton Image (TSSI) as an alternative input to improve the accuracy of skeleton-based models for sign recognition.
We trained a DenseNet-121 using this type of input and compared it with other skeleton-based deep learning methods.
Our model (SL-TSSI-DenseNet) outperforms other state-of-the-art skeleton-based models.
arXiv Detail & Related papers (2023-04-10T01:58:50Z) - Learning from What is Already Out There: Few-shot Sign Language Recognition with Online Dictionaries [0.0]
We open-source the UWB-SL-Wild few-shot dataset, the first training resource of its kind, consisting of dictionary-scraped videos.
We introduce a novel approach to training sign language recognition models in a few-shot scenario, resulting in state-of-the-art results.
arXiv Detail & Related papers (2023-01-10T03:21:01Z) - Skeleton Prototype Contrastive Learning with Multi-Level Graph Relation Modeling for Unsupervised Person Re-Identification [63.903237777588316]
Person re-identification (re-ID) via 3D skeletons is an important emerging topic with many merits.
Existing solutions rarely explore valuable body-component relations in skeletal structure or motion.
This paper proposes a generic unsupervised Prototype Contrastive learning paradigm with Multi-level Graph Relation learning.
arXiv Detail & Related papers (2022-08-25T00:59:32Z) - Sign Language Recognition via Skeleton-Aware Multi-Model Ensemble [71.97020373520922]
Sign language is commonly used by deaf or mute people to communicate.
We propose a novel Multi-modal Framework with a Global Ensemble Model (GEM) for isolated Sign Language Recognition (SLR).
Our proposed SAM-SLR-v2 framework is exceedingly effective and achieves state-of-the-art performance by significant margins.
arXiv Detail & Related papers (2021-10-12T16:57:18Z) - Multi-Modal Zero-Shot Sign Language Recognition [51.07720650677784]
We propose a multi-modal Zero-Shot Sign Language Recognition model.
A Transformer-based model along with a C3D model is used for hand detection and deep feature extraction.
A semantic space is used to map the visual features to the lingual embedding of the class labels.
arXiv Detail & Related papers (2021-09-02T09:10:39Z) - Skeleton Based Sign Language Recognition Using Whole-body Keypoints [71.97020373520922]
Sign language is used by deaf or speech impaired people to communicate.
Skeleton-based recognition is becoming popular that it can be further ensembled with RGB-D based method to achieve state-of-the-art performance.
Inspired by the recent development of whole-body pose estimation (Jin et al., 2020), we propose recognizing sign language based on whole-body keypoints and features.
arXiv Detail & Related papers (2021-03-16T03:38:17Z)
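Several entries above describe ensembling skeleton-based recognizers with RGB-D-based ones. A minimal sketch of the generic idea, late fusion of per-modality class scores, is shown below; the weights, modalities, and random logits are illustrative assumptions, not the actual SAM-SLR-v2 or GEM fusion scheme.

```python
import numpy as np


def softmax(z):
    # Numerically stable softmax over the last axis.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


num_classes = 4
rng = np.random.default_rng(1)

# Logits from two hypothetical modality-specific models for one clip.
skeleton_logits = rng.normal(size=num_classes)
rgbd_logits = rng.normal(size=num_classes)

# Weighted sum of class probabilities; the 0.6/0.4 split is an assumption.
fused = 0.6 * softmax(skeleton_logits) + 0.4 * softmax(rgbd_logits)
pred = int(np.argmax(fused))
print(pred, fused.round(3))
```

Because each softmax output sums to one, any convex combination of them is again a probability distribution, which is why simple weighted score fusion is a common baseline for multi-modal SLR ensembles.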
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.