Searching Multi-Rate and Multi-Modal Temporal Enhanced Networks for
Gesture Recognition
- URL: http://arxiv.org/abs/2008.09412v1
- Date: Fri, 21 Aug 2020 10:45:09 GMT
- Title: Searching Multi-Rate and Multi-Modal Temporal Enhanced Networks for
Gesture Recognition
- Authors: Zitong Yu, Benjia Zhou, Jun Wan, Pichao Wang, Haoyu Chen, Xin Liu,
Stan Z. Li, Guoying Zhao
- Abstract summary: We propose the first neural architecture search (NAS)-based method for RGB-D gesture recognition.
The proposed method includes two key components: 1) enhanced temporal representation via the 3D Central Difference Convolution (3D-CDC) family, and 2) optimized backbones for multi-sampling-rate branches and lateral connections among modalities.
The resultant multi-modal multi-rate network provides a new perspective to understand the relationship between RGB and depth modalities and their temporal dynamics.
- Score: 89.0152015268929
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Gesture recognition has attracted considerable attention owing to its great
potential in applications. Although great progress has been made recently in
multi-modal learning methods, existing methods still lack effective
integration to fully explore synergies among spatio-temporal modalities for
gesture recognition. This is partly because existing manually designed network
architectures are inefficient at jointly learning multiple modalities. In this
paper, we propose the first
neural architecture search (NAS)-based method for RGB-D gesture recognition.
The proposed method includes two key components: 1) enhanced temporal
representation via the proposed 3D Central Difference Convolution (3D-CDC)
family, which is able to capture rich temporal context via aggregating temporal
difference information; and 2) optimized backbones for multi-sampling-rate
branches and lateral connections among varied modalities. The resultant
multi-modal multi-rate network provides a new perspective to understand the
relationship between RGB and depth modalities and their temporal dynamics.
Comprehensive experiments are performed on three benchmark datasets (IsoGD,
NvGesture, and EgoGesture), demonstrating state-of-the-art performance in
both single- and multi-modality settings. The code is available at
https://github.com/ZitongYu/3DCDC-NAS
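For readers unfamiliar with central-difference convolutions, the sketch below illustrates the generic spatio-temporal form of the 3D-CDC idea described in the abstract: a vanilla 3D convolution blended with a theta-weighted central-difference term. This is a minimal illustration written for this summary, not the authors' implementation (the linked repository contains the actual 3D-CDC family, which also includes temporal-only variants); the class name and the default theta value are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CDC3D(nn.Module):
    """Minimal sketch of a 3D central-difference convolution (3D-CDC).

    Output = vanilla 3D conv - theta * (center voxel * sum of kernel weights),
    i.e. a theta-weighted blend of a plain convolution and a central-difference
    convolution. theta=0 recovers a standard Conv3d. Illustrative only; not the
    authors' code.
    """

    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1, padding=1, theta=0.6):
        super().__init__()
        self.theta = theta  # assumed default, tuned per task in practice
        self.conv = nn.Conv3d(in_ch, out_ch, kernel_size,
                              stride=stride, padding=padding, bias=False)

    def forward(self, x):                      # x: (N, C, T, H, W)
        out_normal = self.conv(x)              # vanilla 3D convolution
        if self.theta == 0:
            return out_normal
        # Sum each kernel over its (T, H, W) support -> an equivalent 1x1x1 kernel
        # applied to the center voxel, giving the central-difference term.
        kernel_diff = self.conv.weight.sum(dim=(2, 3, 4), keepdim=True)
        out_diff = F.conv3d(x, kernel_diff, stride=self.conv.stride, padding=0)
        return out_normal - self.theta * out_diff


if __name__ == "__main__":
    clip = torch.randn(2, 3, 16, 112, 112)     # a batch of RGB clips
    layer = CDC3D(3, 64)
    print(layer(clip).shape)                   # torch.Size([2, 64, 16, 112, 112])
```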
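The second component, optimized backbones for multi-sampling-rate branches with lateral connections, can be pictured with the hand-written two-branch toy below. In the paper the per-branch backbones and lateral connections are found by NAS and span RGB and depth streams; this sketch only shows the data flow between a low-rate and a high-rate branch, and every layer shape and name here is an assumption made for illustration.

```python
import torch
import torch.nn as nn


class TwoRateFusion(nn.Module):
    """Illustrative sketch (not the searched architecture): two branches process
    the same clip at different temporal sampling rates, and a time-strided
    lateral connection fuses high-rate features into the low-rate branch."""

    def __init__(self, channels=32, rate_ratio=4):
        super().__init__()
        self.rate_ratio = rate_ratio                        # high-rate branch keeps rate_ratio x more frames
        self.slow = nn.Conv3d(3, channels, 3, padding=1)    # low-rate branch stem
        self.fast = nn.Conv3d(3, channels // 4, 3, padding=1)  # high-rate branch stem
        # Lateral connection: time-strided conv maps fast features onto the slow time axis.
        self.lateral = nn.Conv3d(channels // 4, channels, kernel_size=(5, 1, 1),
                                 stride=(rate_ratio, 1, 1), padding=(2, 0, 0))

    def forward(self, clip):                                # clip: (N, 3, T, H, W)
        slow_in = clip[:, :, ::self.rate_ratio]             # low sampling rate
        slow_feat = self.slow(slow_in)
        fast_feat = self.fast(clip)                         # high sampling rate
        fused = slow_feat + self.lateral(fast_feat)         # lateral fusion
        return fused, fast_feat


if __name__ == "__main__":
    clip = torch.randn(1, 3, 32, 56, 56)
    fused, fast = TwoRateFusion()(clip)
    print(fused.shape, fast.shape)
```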
Related papers
- GSPR: Multimodal Place Recognition Using 3D Gaussian Splatting for Autonomous Driving [9.023864430027333]
Multimodal place recognition has gained increasing attention due to its ability to overcome the weaknesses of uni-sensor systems.
We propose a 3D Gaussian-based multimodal place recognition neural network dubbed GSPR.
arXiv Detail & Related papers (2024-10-01T00:43:45Z) - EPAM-Net: An Efficient Pose-driven Attention-guided Multimodal Network for Video Action Recognition [0.0]
We present an efficient pose-driven attention-guided multimodal network (EPAM-Net) for action recognition in videos.
Specifically, X3D networks are adapted to extract spatio-temporal features from RGB videos and their skeleton sequences.
Our model provides a 6.2-9.9x reduction in FLOPs (floating-point operations, measured in multiply-adds) and a 9-9.6x reduction in the number of network parameters.
arXiv Detail & Related papers (2024-08-10T03:15:24Z) - Multi-stage Factorized Spatio-Temporal Representation for RGB-D Action
and Gesture Recognition [30.975823858419965]
We propose an innovative architecture called Multi-stage Factorized Spatio-Temporal (MFST) for RGB-D action and gesture recognition.
The MFST model comprises a 3D Central Difference Convolution Stem (CDC-Stem) module and multiple factorized spatio-temporal stages.
arXiv Detail & Related papers (2023-08-23T08:49:43Z) - Two Approaches to Supervised Image Segmentation [55.616364225463066]
The present work develops comparison experiments between deep learning and multiset-neuron approaches.
The deep learning approach confirmed its potential for performing image segmentation.
The alternative multiset methodology allowed for enhanced accuracy while requiring little computational resources.
arXiv Detail & Related papers (2023-07-19T16:42:52Z) - Gait Recognition in the Wild with Multi-hop Temporal Switch [81.35245014397759]
Gait recognition in the wild is a more practical problem that has attracted the attention of the multimedia and computer vision communities.
This paper presents a novel multi-hop temporal switch method to achieve effective temporal modeling of gait patterns in real-world scenes.
arXiv Detail & Related papers (2022-09-01T10:46:09Z) - Learning Multi-Granular Spatio-Temporal Graph Network for Skeleton-based
Action Recognition [49.163326827954656]
We propose a novel multi-granular spatio-temporal graph network for skeleton-based action classification.
We develop a dual-head graph network consisting of two interleaved branches, which enables us to extract features at two spatio-temporal resolutions.
We conduct extensive experiments on three large-scale datasets.
arXiv Detail & Related papers (2021-08-10T09:25:07Z) - Multi-view Depth Estimation using Epipolar Spatio-Temporal Networks [87.50632573601283]
We present a novel method for multi-view depth estimation from a single video.
Our method achieves temporally coherent depth estimation results by using a novel Epipolar Spatio-Temporal (EST) transformer.
To reduce the computational cost, inspired by recent Mixture-of-Experts models, we design a compact hybrid network.
arXiv Detail & Related papers (2020-11-26T04:04:21Z) - Unpaired Multi-modal Segmentation via Knowledge Distillation [77.39798870702174]
We propose a novel learning scheme for unpaired cross-modality image segmentation.
In our method, we heavily reuse network parameters by sharing all convolutional kernels across CT and MRI.
We have extensively validated our approach on two multi-class segmentation problems.
arXiv Detail & Related papers (2020-01-06T20:03:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.