Human Action Recognition Based on Multi-scale Feature Maps from Depth
Video Sequences
- URL: http://arxiv.org/abs/2101.07618v1
- Date: Tue, 19 Jan 2021 13:46:42 GMT
- Title: Human Action Recognition Based on Multi-scale Feature Maps from Depth
Video Sequences
- Authors: Chang Li and Qian Huang and Xing Li and Qianhan Wu
- Abstract summary: We present a novel framework focusing on multi-scale motion information to recognize human actions from depth video sequences.
We employ depth motion images (DMI) as the templates to generate the multi-scale static representation of actions.
We extract the multi-granularity descriptor called LP-DMI-HOG to provide more discriminative features.
- Score: 12.30399970340689
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human action recognition is an active research area in computer vision.
Although great progress has been made, previous methods mostly recognize actions
based on depth data at only one scale, and thus they often neglect multi-scale
features that provide additional information for action recognition in practical
application scenarios. In this paper, we present a novel framework focusing on
multi-scale motion information to recognize human actions from depth video
sequences. We propose a multi-scale feature map called Laplacian pyramid depth
motion images (LP-DMI). We employ depth motion images (DMI) as the templates to
generate the multi-scale static representation of actions. Then, we calculate
LP-DMI to enhance multi-scale dynamic information of motions and reduce
redundant static information in human bodies. We further extract the
multi-granularity descriptor called LP-DMI-HOG to provide more discriminative
features. Finally, we utilize an extreme learning machine (ELM) for action
classification. The proposed method yields recognition accuracies of 93.41%,
85.12%, and 91.94% on the public MSRAction3D, UTD-MHAD, and DHA datasets,
respectively. Through extensive experiments, we show that our method
outperforms state-of-the-art benchmarks.
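The abstract describes a staged feature pipeline: build a DMI template from the depth frames, expand it into a Laplacian pyramid (LP-DMI), and extract HOG descriptors per pyramid level (LP-DMI-HOG). A minimal Python sketch of these feature-extraction stages is given below; the DMI formulation (accumulated absolute frame differences), pyramid depth, working image size, and HOG parameters are illustrative assumptions, not the authors' exact settings.

```python
"""
A minimal sketch of the feature-extraction stages described above:
DMI template -> Laplacian pyramid (LP-DMI) -> per-level HOG (LP-DMI-HOG).
The DMI formulation, pyramid depth, working image size, and HOG
parameters here are illustrative assumptions, not the paper's settings.
"""
import numpy as np
import cv2
from skimage.feature import hog


def depth_motion_image(depth_frames):
    """Accumulate absolute frame-to-frame depth differences into one template."""
    frames = np.asarray(depth_frames, dtype=np.float32)   # (T, H, W)
    dmi = np.sum(np.abs(np.diff(frames, axis=0)), axis=0)
    # Normalize to [0, 255] so later processing is scale-independent.
    dmi = 255.0 * (dmi - dmi.min()) / (dmi.max() - dmi.min() + 1e-8)
    return dmi.astype(np.float32)


def laplacian_pyramid(image, levels=3):
    """Band-pass pyramid: each level keeps motion-edge detail at one scale."""
    gaussian = [image.astype(np.float32)]
    for _ in range(levels):
        gaussian.append(cv2.pyrDown(gaussian[-1]))
    pyramid = []
    for i in range(levels):
        h, w = gaussian[i].shape
        upsampled = cv2.pyrUp(gaussian[i + 1], dstsize=(w, h))
        pyramid.append(gaussian[i] - upsampled)   # band-pass detail at scale i
    pyramid.append(gaussian[-1])                  # coarsest low-pass residual
    return pyramid


def lp_dmi_hog(depth_frames, levels=3, cell=(8, 8), block=(2, 2)):
    """Concatenate HOG descriptors from every pyramid level of the DMI."""
    dmi = depth_motion_image(depth_frames)
    descriptors = []
    for level in laplacian_pyramid(dmi, levels):
        level = cv2.resize(level, (64, 64))       # fixed size per level (assumed)
        level = (level - level.min()) / (level.max() - level.min() + 1e-8)
        descriptors.append(hog(level, orientations=9,
                               pixels_per_cell=cell, cells_per_block=block))
    return np.concatenate(descriptors)
```

Concatenating the descriptors across pyramid levels yields the multi-granularity feature vector that the classifier consumes.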
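The final stage classifies the LP-DMI-HOG vectors with an extreme learning machine (ELM). The sketch below is a textbook single-hidden-layer ELM (random input weights, closed-form ridge solution for the output weights) standing in for that stage; the hidden-layer size, regularization strength, and activation are assumed values rather than the paper's configuration.

```python
"""
A minimal ELM standing in for the final classification stage:
random input weights, closed-form ridge solution for the output weights.
Hidden size, regularization, and activation are assumed values.
"""
import numpy as np


class SimpleELM:
    def __init__(self, n_hidden=512, reg=1e-3, seed=0):
        self.n_hidden = n_hidden
        self.reg = reg
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Random projection followed by a fixed nonlinearity.
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        X = np.asarray(X, dtype=np.float64)
        y = np.asarray(y, dtype=int)
        self.W = self.rng.standard_normal((X.shape[1], self.n_hidden)) * 0.1
        self.b = self.rng.standard_normal(self.n_hidden) * 0.1
        H = self._hidden(X)
        Y = np.eye(y.max() + 1)[y]                 # one-hot class targets
        # Ridge-regularized least squares for the output weights (beta).
        self.beta = np.linalg.solve(H.T @ H + self.reg * np.eye(self.n_hidden),
                                    H.T @ Y)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=np.float64)
        return np.argmax(self._hidden(X) @ self.beta, axis=1)


# Usage sketch (shapes are illustrative): each clip is a (T, H, W) depth array.
# X = np.stack([lp_dmi_hog(clip) for clip in clips])
# clf = SimpleELM().fit(X, labels)
# preds = clf.predict(X)
```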
Related papers
- MSSPlace: Multi-Sensor Place Recognition with Visual and Text Semantics [41.94295877935867]
We study the impact of leveraging a multi-camera setup and integrating diverse data sources for multimodal place recognition.
Our proposed method named MSSPlace utilizes images from multiple cameras, LiDAR point clouds, semantic segmentation masks, and text annotations to generate comprehensive place descriptors.
arXiv Detail & Related papers (2024-07-22T14:24:56Z) - Joint Depth Prediction and Semantic Segmentation with Multi-View SAM [59.99496827912684]
We propose a Multi-View Stereo (MVS) technique for depth prediction that benefits from rich semantic features of the Segment Anything Model (SAM).
This enhanced depth prediction, in turn, serves as a prompt to our Transformer-based semantic segmentation decoder.
arXiv Detail & Related papers (2023-10-31T20:15:40Z) - MSMG-Net: Multi-scale Multi-grained Supervised Metworks for Multi-task
Image Manipulation Detection and Localization [1.14219428942199]
A novel multi-scale multi-grained deep network (MSMG-Net) is proposed to automatically identify manipulated regions.
In our MSMG-Net, a parallel multi-scale feature extraction structure is used to extract multi-scale features.
The MSMG-Net can effectively perceive the object-level semantics and encode the edge artifact.
arXiv Detail & Related papers (2022-11-06T14:58:21Z) - UMSNet: An Universal Multi-sensor Network for Human Activity Recognition [10.952666953066542]
This paper proposes a universal multi-sensor network (UMSNet) for human activity recognition.
In particular, we propose a new lightweight sensor residual block (called LSR block), which improves the performance.
Our framework has a clear structure and can be directly applied to various types of multi-modal Time Series Classification tasks.
arXiv Detail & Related papers (2022-05-24T03:29:54Z) - Slow-Fast Visual Tempo Learning for Video-based Action Recognition [78.3820439082979]
Action visual tempo characterizes the dynamics and the temporal scale of an action.
Previous methods capture the visual tempo either by sampling raw videos with multiple rates, or by hierarchically sampling backbone features.
We propose a Temporal Correlation Module (TCM) that extracts action visual tempo from low-level backbone features at a single layer.
arXiv Detail & Related papers (2022-02-24T14:20:04Z) - DeepMultiCap: Performance Capture of Multiple Characters Using Sparse
Multiview Cameras [63.186486240525554]
DeepMultiCap is a novel method for multi-person performance capture using sparse multi-view cameras.
Our method can capture time-varying surface details without the need for pre-scanned template models.
arXiv Detail & Related papers (2021-05-01T14:32:13Z) - M2TR: Multi-modal Multi-scale Transformers for Deepfake Detection [74.19291916812921]
Forged images generated by Deepfake techniques pose a serious threat to the trustworthiness of digital information.
In this paper, we aim to capture the subtle manipulation artifacts at different scales for Deepfake detection.
We introduce a high-quality Deepfake dataset, SR-DF, which consists of 4,000 DeepFake videos generated by state-of-the-art face swapping and facial reenactment methods.
arXiv Detail & Related papers (2021-04-20T05:43:44Z) - Relational Graph Learning on Visual and Kinematics Embeddings for
Accurate Gesture Recognition in Robotic Surgery [84.73764603474413]
We propose a novel online approach of multi-modal graph network (i.e., MRG-Net) to dynamically integrate visual and kinematics information.
The effectiveness of our method is demonstrated with state-of-the-art results on the public JIGSAWS dataset.
arXiv Detail & Related papers (2020-11-03T11:00:10Z) - Depth Guided Adaptive Meta-Fusion Network for Few-shot Video Recognition [86.31412529187243]
Few-shot video recognition aims at learning new actions with only very few labeled samples.
We propose a depth guided Adaptive Meta-Fusion Network for few-shot video recognition, termed AMeFu-Net.
arXiv Detail & Related papers (2020-10-20T03:06:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.