One for All: An End-to-End Compact Solution for Hand Gesture Recognition
- URL: http://arxiv.org/abs/2105.07143v1
- Date: Sat, 15 May 2021 05:10:47 GMT
- Title: One for All: An End-to-End Compact Solution for Hand Gesture Recognition
- Authors: Monu Verma, Ayushi Gupta, Santosh Kumar Vipparthi
- Abstract summary: This paper proposes an end-to-end compact CNN framework: the fine-grained feature attentive network for hand gesture recognition (Fit-Hand).
The pipeline of the proposed architecture consists of two main units: a FineFeat module and a dilated convolutional (Conv) layer.
The effectiveness of Fit-Hand is evaluated using subject-dependent (SD) and subject-independent (SI) validation setups over seven benchmark datasets.
- Score: 8.321276216978637
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Hand gesture recognition (HGR) is a challenging task, as its performance is influenced by factors such as illumination variations, cluttered backgrounds, and spontaneous capture. Conventional CNN networks for HGR follow a two-stage pipeline to deal with these challenges: complex signs, illumination variations, and complex and cluttered backgrounds. Existing approaches require expert knowledge as well as auxiliary computation at stage 1 to remove the complexities from the input images. Therefore, in this paper we propose a novel end-to-end compact CNN framework: the fine-grained feature attentive network for hand gesture recognition (Fit-Hand), to solve the challenges discussed above. The pipeline of the proposed architecture consists of two main units: a FineFeat module and a dilated convolutional (Conv) layer. The FineFeat module extracts fine-grained feature maps by employing an attention mechanism over multiscale receptive fields. The attention mechanism is introduced to capture effective features by enlarging the average behaviour of multi-scale responses. Moreover, the dilated convolution provides global features of hand gestures through a larger receptive field. In addition, an integration layer is utilized to combine the features of the FineFeat module and the dilated layer, which enhances the discriminability of the network by capturing complementary context information of hand postures. The effectiveness of Fit-Hand is evaluated using subject-dependent (SD) and subject-independent (SI) validation setups over seven benchmark datasets: MUGD-I, MUGD-II, MUGD-III, MUGD-IV, MUGD-V, Finger Spelling, and OUHANDS. Furthermore, to investigate deeper insights into the proposed Fit-Hand framework, we performed ten ablation studies.
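The abstract describes the pipeline but this page carries no code, so the following is a minimal PyTorch sketch of how such a pipeline could be assembled. The kernel sizes (3/5/7), channel widths, dilation rate, sigmoid gating, and concatenation-based integration layer are all illustrative assumptions on my part, not the authors' published configuration.

```python
# Minimal sketch of a Fit-Hand-style pipeline, following the abstract:
# a multiscale attentive branch (FineFeat), a dilated-conv branch for
# global context, and an integration layer that fuses the two.
# All hyperparameters below are assumptions, not the paper's values.
import torch
import torch.nn as nn

class FineFeat(nn.Module):
    """Fine-grained features via attention over multiscale receptive fields."""
    def __init__(self, in_ch=3, out_ch=32, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, k, padding=k // 2) for k in kernel_sizes
        )

    def forward(self, x):
        responses = [torch.relu(b(x)) for b in self.branches]
        # Attention built from the average behaviour of the multi-scale
        # responses: the mean response gates each branch before summation.
        avg = torch.stack(responses).mean(dim=0)
        attn = torch.sigmoid(avg)
        return sum(attn * r for r in responses)

class FitHand(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.finefeat = FineFeat(in_ch=3, out_ch=32)
        # Dilated branch: a larger receptive field for global hand structure
        # (dilation=2 with padding=2 keeps the spatial size unchanged).
        self.dilated = nn.Conv2d(3, 32, kernel_size=3, padding=2, dilation=2)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(64, num_classes)

    def forward(self, x):
        fine = self.finefeat(x)             # fine-grained local cues
        glob = torch.relu(self.dilated(x))  # global context
        # Integration layer: combine complementary context information.
        fused = torch.cat([fine, glob], dim=1)
        return self.fc(self.pool(fused).flatten(1))

# Smoke test on a dummy batch of 128x128 RGB frames.
logits = FitHand(num_classes=10)(torch.randn(2, 3, 128, 128))
print(logits.shape)  # torch.Size([2, 10])
```

Deriving the attention map from the averaged multi-scale responses is one plausible reading of "enlarging the average behaviour of multi-scale responses"; the paper itself should be consulted for the exact gating and fusion operations.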
Related papers
- IVGF: The Fusion-Guided Infrared and Visible General Framework [41.07925395888705]
Infrared and visible dual-modality tasks can achieve robust performance even in extreme scenes by fusing complementary information.
We propose a fusion-guided infrared and visible general framework, IVGF, which can be easily extended to many high-level vision tasks.
arXiv Detail & Related papers (2024-09-02T06:38:37Z) - UGMAE: A Unified Framework for Graph Masked Autoencoders [67.75493040186859]
We propose UGMAE, a unified framework for graph masked autoencoders.
We first develop an adaptive feature mask generator to account for the unique significance of nodes.
We then design a ranking-based structure reconstruction objective joint with feature reconstruction to capture holistic graph information.
arXiv Detail & Related papers (2024-02-12T19:39:26Z) - Exploiting Modality-Specific Features For Multi-Modal Manipulation Detection And Grounding [54.49214267905562]
We construct a transformer-based framework for multi-modal manipulation detection and grounding tasks.
Our framework simultaneously explores modality-specific features while preserving the capability for multi-modal alignment.
We propose an implicit manipulation query (IMQ) that adaptively aggregates global contextual cues within each modality.
arXiv Detail & Related papers (2023-09-22T06:55:41Z) - Object Segmentation by Mining Cross-Modal Semantics [68.88086621181628]
We propose a novel approach by mining the Cross-Modal Semantics to guide the fusion and decoding of multimodal features.
Specifically, we propose a novel network, termed XMSNet, consisting of (1) all-round attentive fusion (AF), (2) coarse-to-fine decoder (CFD), and (3) cross-layer self-supervision.
arXiv Detail & Related papers (2023-05-17T14:30:11Z) - Semantic Feature Integration network for Fine-grained Visual Classification [5.182627302449368]
We propose the Semantic Feature Integration network (SFI-Net) to address the above difficulties.
By eliminating unnecessary features and reconstructing the semantic relations among discriminative features, our SFI-Net achieves satisfactory performance.
arXiv Detail & Related papers (2023-02-13T07:32:25Z) - Progressively Dual Prior Guided Few-shot Semantic Segmentation [57.37506990980975]
The few-shot semantic segmentation task aims at performing segmentation on query images given a few annotated support samples.
We propose a progressively dual prior guided few-shot semantic segmentation network.
arXiv Detail & Related papers (2022-11-20T16:19:47Z) - MGA-VQA: Multi-Granularity Alignment for Visual Question Answering [75.55108621064726]
Learning to answer visual questions is a challenging task since the multi-modal inputs lie in two different feature spaces.
We propose a Multi-Granularity Alignment architecture for the Visual Question Answering task (MGA-VQA).
Our model splits alignment into different levels to achieve learning better correlations without needing additional data and annotations.
arXiv Detail & Related papers (2022-01-25T22:30:54Z) - An Attention-Based Deep Learning Model for Multiple Pedestrian Attributes Recognition [4.6898263272139795]
This paper provides a novel solution to the problem of automatic characterization of pedestrians in surveillance footage.
We propose a multi-task deep model that uses an element-wise multiplication layer to extract more comprehensive feature representations.
Our experiments were performed on two well-known datasets (RAP and PETA) and point to the superiority of the proposed method with respect to the state-of-the-art.
arXiv Detail & Related papers (2020-04-02T16:21:14Z) - Global Context-Aware Progressive Aggregation Network for Salient Object Detection [117.943116761278]
We propose a novel network named GCPANet to integrate low-level appearance features, high-level semantic features, and global context features.
We show that the proposed approach outperforms the state-of-the-art methods both quantitatively and qualitatively.
arXiv Detail & Related papers (2020-03-02T04:26:10Z)