Relation-Based Associative Joint Location for Human Pose Estimation in
Videos
- URL: http://arxiv.org/abs/2107.03591v3
- Date: Fri, 30 Jun 2023 09:52:30 GMT
- Title: Relation-Based Associative Joint Location for Human Pose Estimation in
Videos
- Authors: Yonghao Dang and Jianqin Yin and Shaojie Zhang
- Abstract summary: We design a lightweight and plug-and-play joint relation extractor (JRE) to model the associative relationship between joints explicitly and automatically.
The JRE flexibly learns the relationship between any two joints, allowing it to learn the rich spatial configuration of human poses.
Then, combined with temporal semantic continuity modeling, we propose a Relation-based Pose Semantics Transfer Network (RPSTN) for video-based human pose estimation.
- Score: 5.237054164442403
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video-based human pose estimation (VHPE) is a vital yet challenging task.
While deep learning methods have made significant progress in VHPE, most
approaches to this task implicitly model the long-range interaction between
joints by enlarging the receptive field of the convolution. Unlike prior
methods, we design a lightweight and plug-and-play joint relation extractor
(JRE) to model the associative relationship between joints explicitly and
automatically. The JRE takes the pseudo heatmaps of joints as input and
calculates the similarity between pseudo heatmaps. In this way, the JRE
flexibly learns the relationship between any two joints, allowing it to learn
the rich spatial configuration of human poses. Moreover, the JRE can infer
invisible joints according to the relationship between joints, which is
beneficial for the model to locate occluded joints. Then, combined with
temporal semantic continuity modeling, we propose a Relation-based Pose
Semantics Transfer Network (RPSTN) for video-based human pose estimation.
Specifically, to capture the temporal dynamics of poses, the pose semantic
information of the current frame is transferred to the next with a joint
relation guided pose semantics propagator (JRPSP). The proposed model can
transfer pose semantic features from non-occluded frames to occluded frames,
making our method robust to occlusion. Furthermore, the proposed JRE
module is also suitable for image-based human pose estimation. The proposed
RPSTN achieves state-of-the-art results on the video-based Penn Action dataset,
Sub-JHMDB dataset, and PoseTrack2018 dataset. Moreover, the proposed JRE
improves the performance of backbones on the image-based COCO2017 dataset. Code
is available at https://github.com/YHDang/pose-estimation.
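The JRE's core idea, deriving joint relations from pairwise similarity between pseudo heatmaps, can be sketched roughly as follows. This is an illustrative assumption, not the paper's exact formulation: the function name, the use of cosine similarity, and the toy heatmaps are all hypothetical; the released code at the repository above defines the actual operator.

```python
import numpy as np

def joint_relation_matrix(heatmaps):
    """Hypothetical JRE-style relation computation: flatten each joint's
    pseudo heatmap and take pairwise cosine similarity, yielding a J x J
    joint-relation matrix. heatmaps has shape (J, H, W)."""
    num_joints = heatmaps.shape[0]
    flat = heatmaps.reshape(num_joints, -1)              # (J, H*W)
    norms = np.linalg.norm(flat, axis=1, keepdims=True)  # (J, 1)
    unit = flat / np.maximum(norms, 1e-8)                # avoid divide-by-zero
    return unit @ unit.T                                 # (J, J) similarities

# Toy example: 3 joints on a 4x4 grid.
hm = np.zeros((3, 4, 4))
hm[0, 1, 1] = 1.0   # joint 0
hm[1, 1, 1] = 1.0   # joint 1, identical response to joint 0
hm[2, 3, 3] = 1.0   # joint 2, disjoint support
R = joint_relation_matrix(hm)
```

Under this sketch, overlapping heatmaps (joints 0 and 1) get a relation score near 1 while disjoint ones (joint 2) score near 0; a learned module could then use such a matrix to propagate evidence from visible joints to occluded ones.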
Related papers
- G-HOP: Generative Hand-Object Prior for Interaction Reconstruction and Grasp Synthesis [57.07638884476174]
G-HOP is a denoising diffusion based generative prior for hand-object interactions.
We represent the human hand via a skeletal distance field to obtain a representation aligned with the signed distance field for the object.
We show that this hand-object prior can then serve as generic guidance to facilitate other tasks like reconstruction from interaction clip and human grasp synthesis.
arXiv Detail & Related papers (2024-04-18T17:59:28Z) - Video-Based Human Pose Regression via Decoupled Space-Time Aggregation [0.5524804393257919]
We develop an efficient and effective video-based human pose regression method, which bypasses intermediate representations such as heatmaps and instead directly maps the input to the joint coordinates.
Our method is capable of efficiently and flexibly utilizing the spatial dependency of adjacent joints and the temporal dependency of each joint itself.
Our approach either surpasses or is on par with the state-of-the-art heatmap-based multi-frame human pose estimation methods.
arXiv Detail & Related papers (2024-03-29T02:26:22Z) - Spatio-temporal MLP-graph network for 3D human pose estimation [8.267311047244881]
Graph convolutional networks and their variants have shown significant promise in 3D human pose estimation.
We introduce a new weighted Jacobi feature propagation rule obtained through graph filtering with implicit fairing.
We also employ adjacency modulation with the aim of learning meaningful correlations beyond the predefined connections between body joints.
arXiv Detail & Related papers (2023-08-29T14:00:55Z) - Joint-Relation Transformer for Multi-Person Motion Prediction [79.08243886832601]
We propose the Joint-Relation Transformer to enhance interaction modeling.
Our method achieves a 13.4% improvement in 900ms VIM on 3DPW-SoMoF/RC and 17.8%/12.0% improvements in 3s MPJPE.
arXiv Detail & Related papers (2023-08-09T09:02:47Z) - (Fusionformer): Exploiting the Joint Motion Synergy with Fusion Network
Based On Transformer for 3D Human Pose Estimation [1.52292571922932]
Many previous methods lack an understanding of local joint information; a cited prior work considers only the temporal relationship of a single joint.
Our proposed Fusionformer method introduces a global spatio-temporal self-trajectory module and a cross spatio-temporal self-trajectory module.
The results show an improvement of 2.4% MPJPE and 4.3% P-MPJPE on the Human3.6M dataset.
arXiv Detail & Related papers (2022-10-08T12:22:10Z) - RelPose: Predicting Probabilistic Relative Rotation for Single Objects
in the Wild [73.1276968007689]
We describe a data-driven method for inferring the camera viewpoints given multiple images of an arbitrary object.
We show that our approach outperforms state-of-the-art SfM and SLAM methods given sparse images on both seen and unseen categories.
arXiv Detail & Related papers (2022-08-11T17:59:59Z) - Kinematics Modeling Network for Video-based Human Pose Estimation [9.506011491028891]
Estimating human poses from videos is critical in human-computer interaction.
Joints cooperate rather than move independently during human movement.
We propose a plug-and-play kinematics modeling module (KMM) to explicitly model temporal correlations between joints.
arXiv Detail & Related papers (2022-07-22T09:37:48Z) - MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose
Estimation in Video [75.23812405203778]
Recent solutions have been introduced to estimate 3D human pose from a 2D keypoint sequence by considering body joints among all frames globally to learn spatio-temporal correlation.
We propose MixSTE, which has a temporal transformer block to separately model the temporal motion of each joint and a spatial transformer block to model inter-joint spatial correlation.
In addition, the network output is extended from the central frame to the entire frames of the input video, improving the coherence between the input and output sequences.
arXiv Detail & Related papers (2022-03-02T04:20:59Z) - Motion Prediction via Joint Dependency Modeling in Phase Space [40.54430409142653]
We introduce a novel convolutional neural model to leverage explicit prior knowledge of motion anatomy.
We then propose a global optimization module that learns the implicit relationships between individual joint features.
Our method is evaluated on large-scale 3D human motion benchmark datasets.
arXiv Detail & Related papers (2022-01-07T08:30:01Z) - Hierarchical Neural Implicit Pose Network for Animation and Motion
Retargeting [66.69067601079706]
HIPNet is a neural implicit pose network trained on multiple subjects across many poses.
We employ a hierarchical skeleton-based representation to learn a signed distance function on a canonical unposed space.
We achieve state-of-the-art results on various single-subject and multi-subject benchmarks.
arXiv Detail & Related papers (2021-12-02T03:25:46Z) - Pose And Joint-Aware Action Recognition [87.4780883700755]
We present a new model for joint-based action recognition, which first extracts motion features from each joint separately through a shared motion encoder.
Our joint selector module re-weights the joint information to select the most discriminative joints for the task.
We show large improvements over the current state-of-the-art joint-based approaches on JHMDB, HMDB, Charades, AVA action recognition datasets.
arXiv Detail & Related papers (2020-10-16T04:43:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.