A Prototype-Based Generalized Zero-Shot Learning Framework for Hand
Gesture Recognition
- URL: http://arxiv.org/abs/2009.13957v1
- Date: Tue, 29 Sep 2020 12:18:35 GMT
- Title: A Prototype-Based Generalized Zero-Shot Learning Framework for Hand
Gesture Recognition
- Authors: Jinting Wu, Yujia Zhang and Xiaoguang Zhao
- Abstract summary: We propose an end-to-end prototype-based framework for hand gesture recognition.
The first branch is a prototype-based detector that learns gesture representations.
The second branch is a zero-shot label predictor which takes the features of unseen classes as input and outputs predictions.
- Score: 5.992264231643021
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hand gesture recognition plays a significant role in human-computer
interaction for understanding various human gestures and their intent. However,
most prior works can only recognize gestures of limited labeled classes and
fail to adapt to new categories. The task of Generalized Zero-Shot Learning
(GZSL) for hand gesture recognition aims to address the above issue by
leveraging semantic representations and detecting both seen and unseen class
samples. In this paper, we propose an end-to-end prototype-based GZSL framework
for hand gesture recognition which consists of two branches. The first branch
is a prototype-based detector that learns gesture representations and
determines whether an input sample belongs to a seen or unseen category. The
second branch is a zero-shot label predictor which takes the features of unseen
classes as input and outputs predictions through a learned mapping mechanism
between the feature and the semantic space. We further establish a hand gesture
dataset that specifically targets this GZSL task, and comprehensive experiments
on this dataset demonstrate the effectiveness of our proposed approach on
recognizing both seen and unseen gestures.
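
The sketch below illustrates how such a two-branch pipeline could be wired together. It is a minimal illustration only, assuming a generic feature encoder, one learnable prototype per seen class, a fixed distance threshold for the seen/unseen decision, and word-vector-style semantic embeddings; the names, dimensions, and design choices are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal sketch of a two-branch prototype-based GZSL pipeline (illustrative only).
# Assumptions: a toy feature encoder, one prototype per seen class, a distance
# threshold for the seen/unseen decision, and fixed semantic class embeddings.
import torch
import torch.nn as nn


class PrototypeGZSL(nn.Module):
    def __init__(self, in_dim, feat_dim, sem_dim, num_seen,
                 seen_sem, unseen_sem, threshold=1.0):
        super().__init__()
        # Branch 1: prototype-based detector (one learnable prototype per seen class).
        self.encoder = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                     nn.Linear(128, feat_dim))
        self.prototypes = nn.Parameter(torch.randn(num_seen, feat_dim))
        self.threshold = threshold
        # Branch 2: zero-shot label predictor (maps visual features to the semantic space).
        self.to_semantic = nn.Linear(feat_dim, sem_dim)
        # Class semantic embeddings (e.g., word vectors); fixed, not learned here.
        self.register_buffer("seen_sem", seen_sem)      # (num_seen, sem_dim)
        self.register_buffer("unseen_sem", unseen_sem)  # (num_unseen, sem_dim)

    def forward(self, x):
        feat = self.encoder(x)                          # (B, feat_dim)
        # Distance of each sample to every seen-class prototype.
        dists = torch.cdist(feat, self.prototypes)      # (B, num_seen)
        min_dist, seen_pred = dists.min(dim=1)
        is_unseen = min_dist > self.threshold           # far from all prototypes -> unseen
        # For samples flagged unseen, predict the nearest unseen-class semantic embedding.
        sem = self.to_semantic(feat)                    # (B, sem_dim)
        unseen_pred = torch.cdist(sem, self.unseen_sem).argmin(dim=1)
        return is_unseen, seen_pred, unseen_pred


# Usage with random toy data (shapes only; real inputs would be gesture features).
seen_sem = torch.randn(5, 50)    # 5 seen classes, 50-d semantic vectors
unseen_sem = torch.randn(3, 50)  # 3 unseen classes
model = PrototypeGZSL(in_dim=64, feat_dim=32, sem_dim=50, num_seen=5,
                      seen_sem=seen_sem, unseen_sem=unseen_sem)
is_unseen, seen_pred, unseen_pred = model(torch.randn(4, 64))
```

In a real system, the encoder and prototypes would be trained with a prototype-based loss on seen classes, the feature-to-semantic mapping with a regression or compatibility loss against the semantic embeddings, and the threshold tuned on validation data.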
Related papers
- Zero-Shot Underwater Gesture Recognition [3.4078654008228924]
Hand gesture recognition allows humans to interact with machines non-verbally and has significant applications in underwater exploration using autonomous underwater vehicles.
Recently, a new gesture-based language called CADDIAN has been devised for divers, and supervised learning methods have been applied to recognize the gestures with high accuracy.
In this work, we advocate the need for zero-shot underwater gesture recognition (ZSUGR), where the objective is to train a model with visual samples of gestures from a few "seen" classes only and transfer the gained knowledge at test time to recognize semantically similar unseen gesture classes as well.
arXiv Detail & Related papers (2024-07-19T08:16:46Z) - MASA: Motion-aware Masked Autoencoder with Semantic Alignment for Sign Language Recognition [94.56755080185732]
We propose a Motion-Aware masked autoencoder with Semantic Alignment (MASA) that integrates rich motion cues and global semantic information.
Our framework can simultaneously learn local motion cues and global semantic features for comprehensive sign language representation.
arXiv Detail & Related papers (2024-05-31T08:06:05Z) - Towards Open-set Gesture Recognition via Feature Activation Enhancement
and Orthogonal Prototype Learning [4.724899372568309]
Gesture recognition is a foundational task in human-machine interaction.
A robust system must be able to effectively discern and reject unknown gestures of disinterest.
We propose a more effective prototype learning (PL) method that leverages two novel and inherent distinctions: feature activation level and projection inconsistency.
arXiv Detail & Related papers (2023-12-05T06:49:15Z) - Disentangled Interaction Representation for One-Stage Human-Object
Interaction Detection [70.96299509159981]
Human-Object Interaction (HOI) detection is a core task for human-centric image understanding.
Recent one-stage methods adopt a transformer decoder to collect image-wide cues that are useful for interaction prediction.
Traditional two-stage methods benefit significantly from their ability to compose interaction features in a disentangled and explainable manner.
arXiv Detail & Related papers (2023-12-04T08:02:59Z) - SignBERT+: Hand-model-aware Self-supervised Pre-training for Sign
Language Understanding [132.78015553111234]
Hand gestures play a crucial role in the expression of sign language.
Current deep-learning-based methods for sign language understanding (SLU) are prone to over-fitting due to insufficient sign data resources.
We propose the first self-supervised pre-trainable SignBERT+ framework with model-aware hand prior incorporated.
arXiv Detail & Related papers (2023-05-08T17:16:38Z) - Learning Common Rationale to Improve Self-Supervised Representation for
Fine-Grained Visual Recognition Problems [61.11799513362704]
We propose learning an additional screening mechanism to identify discriminative clues commonly seen across instances and classes.
We show that a common rationale detector can be learned by simply exploiting the GradCAM induced from the SSL objective.
arXiv Detail & Related papers (2023-03-03T02:07:40Z) - Cross-modal Representation Learning for Zero-shot Action Recognition [67.57406812235767]
We present a cross-modal Transformer-based framework, which jointly encodes video data and text labels for zero-shot action recognition (ZSAR).
Our model employs a conceptually new pipeline by which visual representations are learned in conjunction with visual-semantic associations in an end-to-end manner.
Experimental results show our model considerably improves upon the state of the art in ZSAR, reaching encouraging top-1 accuracy on the UCF101, HMDB51, and ActivityNet benchmark datasets.
arXiv Detail & Related papers (2022-05-03T17:39:27Z) - The Overlooked Classifier in Human-Object Interaction Recognition [82.20671129356037]
We encode the semantic correlation among classes into the classification head by initializing the weights with language embeddings of HOIs.
We propose a new loss named LSE-Sign to enhance multi-label learning on a long-tailed dataset.
Our simple yet effective method enables detection-free HOI classification, outperforming state-of-the-art methods that require object detection and human pose by a clear margin. (A minimal sketch of this weight-initialization idea appears after this list.)
arXiv Detail & Related papers (2022-03-10T23:35:00Z) - GAN for Vision, KG for Relation: a Two-stage Deep Network for Zero-shot
Action Recognition [33.23662792742078]
We propose a two-stage deep neural network for zero-shot action recognition.
In the sampling stage, we utilize a generative adversarial network (GAN) trained on action features and word vectors of seen classes.
In the classification stage, we construct a knowledge graph based on the relationship between word vectors of action classes and related objects.
arXiv Detail & Related papers (2021-05-25T09:34:42Z) - FineHand: Learning Hand Shapes for American Sign Language Recognition [16.862375555609667]
We present an approach for effective learning of hand shape embeddings, which are discriminative for ASL gestures.
For hand shape recognition, our method uses a mix of manually labelled hand shapes and high-confidence predictions to train a deep convolutional neural network (CNN).
We will demonstrate that higher quality hand shape models can significantly improve the accuracy of final video gesture classification.
arXiv Detail & Related papers (2020-03-04T23:32:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.