Integrating Human Parsing and Pose Network for Human Action Recognition
- URL: http://arxiv.org/abs/2307.07977v1
- Date: Sun, 16 Jul 2023 07:58:29 GMT
- Title: Integrating Human Parsing and Pose Network for Human Action Recognition
- Authors: Runwei Ding, Yuhang Wen, Jinfu Liu, Nan Dai, Fanyang Meng, Mengyuan Liu
- Abstract summary: We introduce the human parsing feature map as a novel modality for action recognition.
We propose Integrating Human Parsing and Pose Network (IPP-Net) for action recognition.
IPP-Net is the first to leverage both skeletons and human parsing feature maps in a dual-branch approach.
- Score: 12.308394270240463
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human skeletons and RGB sequences are both widely adopted input modalities
for human action recognition. However, skeletons lack appearance features, and
color data contain a large amount of irrelevant content. To address this, we
introduce the human parsing feature map as a novel modality, since it can
selectively retain spatiotemporal features of the body parts while filtering
out noise from outfits, backgrounds, etc. We propose an Integrating Human
Parsing and Pose Network (IPP-Net) for action recognition, which is the first
to leverage both skeletons and human parsing feature maps in a dual-branch
approach. The human pose branch feeds compact skeletal representations of
different modalities into a graph convolutional network to model pose features.
In the human parsing branch, multi-frame body-part parsing features are
extracted with a human detector and parser and are then learned by a
convolutional backbone. A late ensemble of the two branches produces the final
predictions, combining robust keypoints with rich semantic body-part features.
Extensive experiments on the NTU RGB+D and NTU RGB+D 120 benchmarks consistently
verify the effectiveness of the proposed IPP-Net, which outperforms existing
action recognition methods. Our code is publicly available at
https://github.com/liujf69/IPP-Net-Parsing .
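The late ensemble in the abstract can be illustrated at the score level. The sketch below is a minimal, hypothetical illustration rather than the IPP-Net code: it assumes each branch (pose GCN, parsing CNN) outputs one vector of class logits per sample, turns each vector into probabilities with a softmax, and fuses them by a weighted sum. The names `softmax`, `late_ensemble` and `predict` and the branch weights are assumptions; the abstract does not say whether IPP-Net fuses raw scores or probabilities, or which weights it uses.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert one vector of class logits to probabilities.

    Subtracting the maximum logit first avoids overflow in math.exp.
    """
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_ensemble(branch_logits: Sequence[Sequence[float]],
                  weights: Sequence[float]) -> List[float]:
    """Weighted sum of per-branch class probabilities for one sample.

    branch_logits: one logit vector per branch (e.g. pose GCN, parsing CNN).
    weights: one fusion weight per branch (hypothetical values).
    """
    if len(branch_logits) != len(weights):
        raise ValueError("expected exactly one weight per branch")
    num_classes = len(branch_logits[0])
    fused = [0.0] * num_classes
    for weight, logits in zip(weights, branch_logits):
        if len(logits) != num_classes:
            raise ValueError("all branches must score the same classes")
        for c, prob in enumerate(softmax(logits)):
            fused[c] += weight * prob
    return fused


def predict(scores: Sequence[float]) -> int:
    """Index of the highest fused score (the first one wins on ties)."""
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Made-up logits for one sample over three action classes.
    pose_logits = [0.2, 1.0, 0.3]     # pose branch alone favours class 1
    parsing_logits = [0.1, 0.2, 2.5]  # parsing branch alone favours class 2
    fused = late_ensemble([pose_logits, parsing_logits], weights=[1.0, 1.0])
    print(predict(fused))  # 2
```

In this toy input the two branches disagree, and the fused scores follow the more confident parsing branch. Because fusion happens only on output scores, each branch can be trained separately, which is what makes a late ensemble a convenient way to combine heterogeneous backbones such as a GCN and a CNN.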
Related papers
- DROP: Decouple Re-Identification and Human Parsing with Task-specific Features for Occluded Person Re-identification [15.910080319118498]
The paper introduces the Decouple Re-identificatiOn and human Parsing (DROP) method for occluded person re-identification (ReID).
Unlike mainstream approaches that use global features for simultaneous multi-task learning of ReID and human parsing, DROP argues that the inferior performance of such approaches stems from the distinct feature requirements of ReID and human parsing.
Experimental results highlight the efficacy of DROP, notably a Rank-1 accuracy of 76.8% on Occluded-Duke, surpassing two mainstream methods.
(arXiv, 2024-01-31T17:54:43Z)
- Explore Human Parsing Modality for Action Recognition [17.624946657761996]
We propose a new dual-branch framework called Ensemble Human Parsing and Pose Network (EPP-Net).
EPP-Net is the first to leverage both skeletons and human parsing modalities for action recognition.
(arXiv, 2024-01-04T08:43:41Z)
- Parsing is All You Need for Accurate Gait Recognition in the Wild [51.206166843375364]
This paper presents a novel gait representation, named Gait Parsing Sequence (GPS).
GPSs are sequences of fine-grained human segmentation, extracted from video frames, so they have much higher information entropy.
We also propose a novel human parsing-based gait recognition framework, named ParsingGait.
The experimental results show a significant improvement in accuracy brought by the GPS representation and the superiority of ParsingGait.
(arXiv, 2023-08-31T13:57:38Z)
- Direct Dense Pose Estimation [138.56533828316833]
Dense human pose estimation is the problem of learning dense correspondences between RGB images and the surfaces of human bodies.
Prior dense pose estimation methods are all based on the Mask R-CNN framework and operate top-down, first identifying a bounding box for each person.
We propose a novel alternative method for solving the dense pose estimation problem, called Direct Dense Pose (DDP).
(arXiv, 2022-04-04T06:14:38Z)
- Technical Report: Disentangled Action Parsing Networks for Accurate Part-level Action Parsing [65.87931036949458]
Part-level Action Parsing aims at part state parsing for boosting action recognition in videos.
We present a simple yet effective approach, named disentangled action parsing (DAP).
(arXiv, 2021-11-05T02:29:32Z)
- HighlightMe: Detecting Highlights from Human-Centric Videos [52.84233165201391]
We present a domain- and user-preference-agnostic approach to detect highlightable excerpts from human-centric videos.
We use an autoencoder network equipped with spatial-temporal graph convolutions to detect human activities and interactions.
We observe a 4-12% improvement in the mean average precision of matching the human-annotated highlights over state-of-the-art methods.
(arXiv, 2021-10-05T01:18:15Z)
- Revisiting Skeleton-based Action Recognition [107.08112310075114]
PoseC3D is a new approach to skeleton-based action recognition, which relies on a 3D heatmap stack instead of a graph sequence as the base representation of human skeletons.
On four challenging datasets, PoseC3D consistently obtains superior performance, when used alone on skeletons and in combination with the RGB modality.
(arXiv, 2021-04-28T06:32:17Z)
- An Adversarial Human Pose Estimation Network Injected with Graph Structure [75.08618278188209]
In this paper, we design a novel generative adversarial network (GAN) to improve the localization accuracy of visible joints when some joints are invisible.
The network consists of two simple but efficient modules, Cascade Feature Network (CFN) and Graph Structure Network (GSN).
(arXiv, 2021-03-29T12:07:08Z)
- GPRAR: Graph Convolutional Network based Pose Reconstruction and Action Recognition for Human Trajectory Prediction [1.2891210250935146]
Existing prediction models are prone to errors in real-world settings, where observations are often noisy.
We introduce GPRAR, a graph convolutional network based pose reconstruction and action recognition for human trajectory prediction.
We show that GPRAR improves prediction accuracy by up to 22% and 50% under noisy observations on the JAAD and TITAN datasets.
(arXiv, 2021-03-25T20:12:14Z)
- Group-Skeleton-Based Human Action Recognition in Complex Events [15.649778891665468]
We propose a novel group-skeleton-based human action recognition method in complex events.
This method first utilizes multi-scale spatial-temporal graph convolutional networks (MS-G3Ds) to extract skeleton features from multiple persons.
Results on the HiEve dataset show that our method can give superior performance compared to other state-of-the-art methods.
(arXiv, 2020-11-26T13:19:14Z)
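The PoseC3D entry in this list represents skeletons as a 3D heatmap stack rather than a graph sequence. A minimal pure-Python sketch of that representation, under stated assumptions: each joint k in each frame becomes a Gaussian blob conf_k * exp(-((x - x_k)^2 + (y - y_k)^2) / (2 * sigma^2)) centred on its 2D keypoint, and the per-frame maps are stacked into a K x T x H x W volume. The helper names `joint_heatmap` and `heatmap_volume` and the value of `sigma` are hypothetical; this is not the PoseC3D implementation and omits any further preprocessing the original method may apply.

```python
import math
from typing import List, Sequence, Tuple

# One 2D keypoint: (x, y, confidence) in heatmap pixel coordinates.
Keypoint = Tuple[float, float, float]


def joint_heatmap(kp: Keypoint, height: int, width: int,
                  sigma: float) -> List[List[float]]:
    """Gaussian blob centred on one keypoint, scaled by its confidence."""
    x0, y0, conf = kp
    denom = 2.0 * sigma ** 2
    return [[conf * math.exp(-((x - x0) ** 2 + (y - y0) ** 2) / denom)
             for x in range(width)]
            for y in range(height)]


def heatmap_volume(frames: Sequence[Sequence[Keypoint]], height: int,
                   width: int, sigma: float = 1.0) -> List[List[List[List[float]]]]:
    """Stack per-frame joint heatmaps into a K x T x H x W volume.

    frames[t][k] is keypoint k in frame t; the result is indexed [k][t][y][x].
    """
    num_joints = len(frames[0])
    return [[joint_heatmap(frame[k], height, width, sigma) for frame in frames]
            for k in range(num_joints)]


if __name__ == "__main__":
    # Two frames, two joints; coordinates and confidences are made up.
    frames = [
        [(1.0, 2.0, 1.0), (3.0, 0.0, 0.5)],
        [(2.0, 2.0, 1.0), (3.0, 1.0, 0.5)],
    ]
    volume = heatmap_volume(frames, height=4, width=5, sigma=1.0)
    print(len(volume), len(volume[0]), len(volume[0][0]), len(volume[0][0][0]))  # 2 2 4 5
    print(volume[0][0][2][1])  # 1.0: the peak sits on joint 0 in frame 0
```

Putting the joint axis first makes each joint a separate input channel, so a 3D CNN can then convolve jointly over time and image space instead of over a skeleton graph.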