Short-Term Temporal Convolutional Networks for Dynamic Hand Gesture
Recognition
- URL: http://arxiv.org/abs/2001.05833v1
- Date: Tue, 31 Dec 2019 23:30:27 GMT
- Title: Short-Term Temporal Convolutional Networks for Dynamic Hand Gesture
Recognition
- Authors: Yi Zhang, Chong Wang, Ye Zheng, Jieyu Zhao, Yuqi Li and Xijiong Xie
- Abstract summary: We present a multimodal gesture recognition method based on 3D densely connected convolutional networks (3D-DenseNets) and improved temporal convolutional networks (TCNs).
In spatial analysis, we adopt 3D-DenseNets to learn short-term spatio-temporal features effectively.
In temporal analysis, we use TCNs to extract temporal features and employ improved Squeeze-and-Excitation Networks (SENets) to strengthen the representational power of the temporal features from each layer of the TCNs.
- Score: 23.054444026402738
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The purpose of gesture recognition is to recognize meaningful movements of
the human body, and it is an important problem in computer vision. In this
paper, we present a multimodal gesture recognition method based on 3D densely
connected convolutional networks (3D-DenseNets) and improved temporal
convolutional networks (TCNs). The key idea of our approach is to find a
compact and effective representation of spatial and temporal features by
dividing the task of gesture video analysis into two sequential parts: spatial
analysis and temporal analysis. In spatial analysis, we adopt 3D-DenseNets to
learn short-term spatio-temporal features effectively. Subsequently, in
temporal analysis, we use TCNs to extract temporal features and employ improved
Squeeze-and-Excitation Networks (SENets) to strengthen the representational
power of the temporal features from each layer of the TCNs. The method has been
evaluated on the VIVA and the NVIDIA Dynamic Hand Gesture datasets. Our
approach obtains very competitive performance on the VIVA benchmark, with a
classification accuracy of 91.54%, and achieves state-of-the-art performance
with 86.37% accuracy on the NVIDIA benchmark.
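The following is a minimal, illustrative PyTorch sketch of the two-stage pipeline the abstract describes: a 3D convolutional front end produces one feature vector per short clip, and a dilated TCN whose layer outputs are re-weighted by a squeeze-and-excitation (SE) block models the clip sequence. This is not the authors' released code; the plain Conv3d stand-in for the 3D-DenseNet, the channel sizes, clip shapes, and class count are all assumptions chosen for brevity.

import torch
import torch.nn as nn


class SpatialFrontEnd(nn.Module):
    """Stand-in for the 3D-DenseNet: maps one short clip to a feature vector."""
    def __init__(self, in_ch=3, feat_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(in_ch, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),           # global pooling over (T, H, W)
        )
        self.fc = nn.Linear(32, feat_dim)

    def forward(self, clip):                   # clip: (B, C, T, H, W)
        x = self.conv(clip).flatten(1)         # (B, 32)
        return self.fc(x)                      # (B, feat_dim)


class SETemporal(nn.Module):
    """Squeeze-and-excitation over the channels of a temporal feature map."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):                      # x: (B, C, T)
        w = self.gate(x.mean(dim=2))           # squeeze over time -> (B, C)
        return x * w.unsqueeze(-1)             # excite: channel-wise rescale


class TCNLayer(nn.Module):
    """Dilated causal 1D convolution followed by SE re-weighting."""
    def __init__(self, channels, dilation):
        super().__init__()
        self.pad = (3 - 1) * dilation          # left padding keeps causality
        self.conv = nn.Conv1d(channels, channels, kernel_size=3, dilation=dilation)
        self.se = SETemporal(channels)

    def forward(self, x):                      # x: (B, C, T)
        y = nn.functional.pad(x, (self.pad, 0))
        y = torch.relu(self.conv(y))
        return x + self.se(y)                  # residual connection


class GestureClassifier(nn.Module):
    """Spatial analysis per clip, then temporal analysis over the clip sequence."""
    def __init__(self, feat_dim=128, num_classes=25, num_layers=3):
        super().__init__()
        self.spatial = SpatialFrontEnd(feat_dim=feat_dim)
        self.temporal = nn.Sequential(
            *[TCNLayer(feat_dim, dilation=2 ** i) for i in range(num_layers)]
        )
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, clips):                  # clips: (B, N_clips, C, T, H, W)
        b, n = clips.shape[:2]
        feats = self.spatial(clips.flatten(0, 1)).view(b, n, -1)   # (B, N, F)
        feats = self.temporal(feats.transpose(1, 2))               # (B, F, N)
        return self.head(feats.mean(dim=2))                        # (B, classes)


if __name__ == "__main__":
    model = GestureClassifier()
    video = torch.randn(2, 8, 3, 4, 32, 32)    # 2 videos, 8 short clips each
    print(model(video).shape)                  # torch.Size([2, 25])

On a batch of two videos split into eight short clips, the sketch returns class logits of shape (2, 25). Swapping the front end for a real 3D-DenseNet and tuning the SE reduction ratio would follow the paper's design more closely.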
Related papers
- Deepfake Detection: Leveraging the Power of 2D and 3D CNN Ensembles [0.0]
This work presents an innovative approach to validate video content.
The methodology blends advanced 2-dimensional and 3-dimensional Convolutional Neural Networks.
Experimental validation underscores the effectiveness of this strategy, showcasing its potential in countering deepfake generation.
arXiv Detail & Related papers (2023-10-25T06:00:37Z) - Gait Recognition in the Wild with Multi-hop Temporal Switch [81.35245014397759]
Gait recognition in the wild is a more practical problem that has attracted the attention of the multimedia and computer vision communities.
This paper presents a novel multi-hop temporal switch method to achieve effective temporal modeling of gait patterns in real-world scenes.
arXiv Detail & Related papers (2022-09-01T10:46:09Z) - A Spatio-Temporal Multilayer Perceptron for Gesture Recognition [70.34489104710366]
We propose a multilayer state-weighted perceptron for gesture recognition in the context of autonomous vehicles.
An evaluation on the TCG and Drive&Act datasets is provided to showcase the promising performance of our approach.
We deploy our model to our autonomous vehicle to show its real-time capability and stable execution.
arXiv Detail & Related papers (2022-04-25T08:42:47Z) - DS-Net: Dynamic Spatiotemporal Network for Video Salient Object
Detection [78.04869214450963]
We propose a novel dynamic spatiotemporal network (DS-Net) for more effective fusion of temporal and spatial information.
We show that the proposed method achieves superior performance to state-of-the-art algorithms.
arXiv Detail & Related papers (2020-12-09T06:42:30Z) - Multi-view Depth Estimation using Epipolar Spatio-Temporal Networks [87.50632573601283]
We present a novel method for multi-view depth estimation from a single video.
Our method achieves temporally coherent depth estimation results by using a novel Epipolar Spatio-Temporal (EST) transformer.
To reduce the computational cost, inspired by recent Mixture-of-Experts models, we design a compact hybrid network.
arXiv Detail & Related papers (2020-11-26T04:04:21Z) - Multi-Temporal Convolutions for Human Action Recognition in Videos [83.43682368129072]
We present a novel multi-temporal convolution block capable of extracting features at multiple temporal resolutions.
The proposed blocks are lightweight and can be integrated into any 3D-CNN architecture.
arXiv Detail & Related papers (2020-11-08T10:40:26Z) - Searching Multi-Rate and Multi-Modal Temporal Enhanced Networks for
Gesture Recognition [89.0152015268929]
We propose the first neural architecture search (NAS)-based method for RGB-D gesture recognition.
The proposed method includes two key components: 1) enhanced temporal representation via the 3D Central Difference Convolution (3D-CDC) family, and 2) optimized backbones for multi-modal-rate branches and lateral connections.
The resultant multi-rate network provides a new perspective to understand the relationship between RGB and depth modalities and their temporal dynamics.
arXiv Detail & Related papers (2020-08-21T10:45:09Z) - Directional Temporal Modeling for Action Recognition [24.805397801876687]
We introduce a channel independent directional convolution (CIDC) operation, which learns to model the temporal evolution among local features.
Our CIDC network can be attached to any activity recognition backbone network.
arXiv Detail & Related papers (2020-07-21T18:49:57Z) - Interpreting video features: a comparison of 3D convolutional networks
and convolutional LSTM networks [1.462434043267217]
We compare how 3D convolutional networks and convolutional LSTM networks learn features across temporally dependent frames.
Our findings indicate that the 3D convolutional model concentrates on shorter events in the input sequence, and places its spatial focus on fewer, contiguous areas.
arXiv Detail & Related papers (2020-02-02T11:27:07Z)