Learning Dynamical Human-Joint Affinity for 3D Pose Estimation in Videos
- URL: http://arxiv.org/abs/2109.07353v1
- Date: Wed, 15 Sep 2021 15:06:19 GMT
- Title: Learning Dynamical Human-Joint Affinity for 3D Pose Estimation in Videos
- Authors: Junhao Zhang, Yali Wang, Zhipeng Zhou, Tianyu Luan, Zhe Wang, Yu Qiao
- Abstract summary: Graph Convolution Network (GCN) has been successfully used for 3D human pose estimation in videos.
A new Dynamical Graph Network (DG-Net) estimates 3D pose by adaptively learning spatial/temporal joint relations from videos.
- Score: 47.601288796052714
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Graph Convolution Network (GCN) has been successfully used for 3D human pose
estimation in videos. However, it is often built on a fixed human-joint
affinity defined by the human skeleton. This may reduce the adaptation capacity of
GCN to tackle complex spatio-temporal pose variations in videos. To alleviate
this problem, we propose a novel Dynamical Graph Network (DG-Net), which can
dynamically identify human-joint affinity and estimate 3D pose by adaptively
learning spatial/temporal joint relations from videos. Different from
traditional graph convolution, we introduce Dynamical Spatial/Temporal Graph
convolution (DSG/DTG) to discover spatial/temporal human-joint affinity for
each video exemplar, depending on the spatial distance and temporal movement similarity
between human joints in that video. Hence, they can effectively identify
which joints are spatially closer and/or move consistently, reducing
depth ambiguity and/or motion uncertainty when lifting 2D pose to 3D pose. We
conduct extensive experiments on three popular benchmarks, i.e., Human3.6M,
HumanEva-I, and MPI-INF-3DHP, where DG-Net outperforms a number of recent state-of-the-art
approaches with fewer input frames and a smaller model size.
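The core idea, building a per-sample joint affinity from spatial proximity, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the distance-based logits, the temperature parameter, and the softmax normalization are all assumptions.

```python
import numpy as np

def dynamic_spatial_affinity(joints_2d, temperature=1.0):
    """Illustrative per-sample joint affinity from pairwise distances.

    joints_2d: (J, 2) array of 2D joint coordinates for one frame.
    Returns a (J, J) row-stochastic affinity matrix in which spatially
    closer joints receive larger weights (a hypothetical simplification
    of the paper's DSG convolution).
    """
    diff = joints_2d[:, None, :] - joints_2d[None, :, :]   # (J, J, 2)
    dist = np.linalg.norm(diff, axis=-1)                   # (J, J)
    # Closer joints -> higher affinity; softmax over each row.
    logits = -dist / temperature
    logits -= logits.max(axis=1, keepdims=True)            # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)
```

Unlike a fixed skeleton adjacency, such a matrix changes with every input pose, which is the adaptivity the abstract argues for.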
Related papers
- 3D-UGCN: A Unified Graph Convolutional Network for Robust 3D Human Pose Estimation from Monocular RGB Images [17.673385426594418]
This paper proposes an improved method based on the spatial-temporal graph convolution network (UGCN) to address the issue of missing human pose skeleton sequences in single-view videos.
We present the improved UGCN, which allows the network to process 3D human pose data and refines the 3D human pose skeleton sequence.
arXiv Detail & Related papers (2024-07-23T02:50:27Z)
- STGFormer: Spatio-Temporal GraphFormer for 3D Human Pose Estimation in Video [7.345621536750547]
This paper presents a graph-based framework for 3D human pose estimation in video.
Specifically, we develop a graph-based attention mechanism, integrating graph information directly into the respective attention layers.
We demonstrate that our method achieves state-of-the-art performance in 3D human pose estimation.
arXiv Detail & Related papers (2024-07-14T06:45:27Z)
- A hybrid classification-regression approach for 3D hand pose estimation using graph convolutional networks [1.0152838128195467]
We propose a two-stage GCN-based framework that learns per-pose relationship constraints.
The first phase quantizes the 2D/3D space to classify the joints into 2D/3D blocks based on their locality.
The second stage uses a GCN-based module that uses an adaptative nearest neighbor algorithm to determine joint relationships.
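A rough sketch of such an adaptive nearest-neighbour step might connect each joint to its k spatially closest joints. This is a hypothetical stand-in, not the paper's actual algorithm; the function name and the binary adjacency are assumptions.

```python
import numpy as np

def knn_adjacency(joints, k=2):
    """Hypothetical nearest-neighbour joint graph: connect each joint
    to its k spatially closest joints (a stand-in for an adaptive
    nearest-neighbour step determining joint relationships).

    joints: (J, D) array of joint coordinates.
    Returns a (J, J) binary adjacency matrix with k ones per row.
    """
    dist = np.linalg.norm(joints[:, None] - joints[None, :], axis=-1)
    np.fill_diagonal(dist, np.inf)             # exclude self-edges
    nearest = np.argsort(dist, axis=1)[:, :k]  # k nearest per joint
    num_joints = len(joints)
    adj = np.zeros((num_joints, num_joints))
    rows = np.repeat(np.arange(num_joints), k)
    adj[rows, nearest.ravel()] = 1.0
    return adj
```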
arXiv Detail & Related papers (2021-05-23T10:09:10Z)
- 3D Human Pose Regression using Graph Convolutional Network [68.8204255655161]
We propose a graph convolutional network named PoseGraphNet for 3D human pose regression from 2D poses.
Our model's performance is close to the state-of-the-art, but with much fewer parameters.
arXiv Detail & Related papers (2021-05-21T14:41:31Z)
- Neural Monocular 3D Human Motion Capture with Physical Awareness [76.55971509794598]
We present a new trainable system for physically plausible markerless 3D human motion capture.
Unlike most neural methods for human motion capture, our approach is aware of physical and environmental constraints.
It produces smooth and physically principled 3D motions in an interactive frame rate in a wide variety of challenging scenes.
arXiv Detail & Related papers (2021-05-03T17:57:07Z)
- Graph and Temporal Convolutional Networks for 3D Multi-person Pose Estimation in Monocular Videos [33.974241749058585]
We propose a novel framework integrating graph convolutional networks (GCNs) and temporal convolutional networks (TCNs) to robustly estimate camera-centric multi-person 3D poses.
In particular, we introduce a human-joint GCN, which employs the 2D pose estimator's confidence scores to improve the pose estimation results, together with a human-bone GCN.
The two GCNs work together to estimate the spatial frame-wise 3D poses and can make use of both visible joint and bone information in the target frame to estimate the occluded or missing human-part information.
arXiv Detail & Related papers (2020-12-22T03:01:19Z)
- HMOR: Hierarchical Multi-Person Ordinal Relations for Monocular Multi-Person 3D Pose Estimation [54.23770284299979]
This paper introduces a novel form of supervision - Hierarchical Multi-person Ordinal Relations (HMOR)
HMOR encodes interaction information as the ordinal relations of depths and angles hierarchically.
An integrated top-down model is designed to leverage these ordinal relations in the learning process.
The proposed method significantly outperforms state-of-the-art methods on publicly available multi-person 3D pose datasets.
arXiv Detail & Related papers (2020-08-01T07:53:27Z)
- Motion Guided 3D Pose Estimation from Videos [81.14443206968444]
We propose a new loss function, called motion loss, for the problem of monocular 3D Human pose estimation from 2D pose.
In computing motion loss, a simple yet effective representation for keypoint motion, called pairwise motion encoding, is introduced.
We design a new graph convolutional network architecture, U-shaped GCN (UGCN), which captures both short-term and long-term motion information.
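One plausible reading of pairwise motion encoding is a comparison of per-joint displacements across frames; the sketch below follows that assumption and is illustrative only, not the paper's exact definition.

```python
import numpy as np

def pairwise_motion_encoding(seq):
    """Illustrative pairwise motion encoding (assumed form).

    seq: (T, J, 2) 2D keypoints over T frames.
    Returns (T-1, J, J, 2): for each frame transition, the displacement
    of each joint minus that of every other joint. Joints moving
    consistently produce near-zero entries.
    """
    vel = seq[1:] - seq[:-1]                      # (T-1, J, 2) per-joint motion
    return vel[:, :, None, :] - vel[:, None, :, :]
```

Under this reading, a motion loss would penalize discrepancies between the encoding of the predicted 3D sequence and that of the ground truth.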
arXiv Detail & Related papers (2020-04-29T06:59:30Z)
- A Graph Attention Spatio-temporal Convolutional Network for 3D Human Pose Estimation in Video [7.647599484103065]
We improve the learning of constraints on the human skeleton by modeling local and global spatial information via attention mechanisms.
Our approach effectively mitigates depth ambiguity and self-occlusion, generalizes to half upper body estimation, and achieves competitive performance on 2D-to-3D video pose estimation.
arXiv Detail & Related papers (2020-03-11T14:54:40Z) - Anatomy-aware 3D Human Pose Estimation with Bone-based Pose
Decomposition [92.99291528676021]
Instead of directly regressing the 3D joint locations, we decompose the task into bone direction prediction and bone length prediction.
Our motivation is the fact that the bone lengths of a human skeleton remain consistent across time.
Our full model outperforms the previous best results on Human3.6M and MPI-INF-3DHP datasets.
arXiv Detail & Related papers (2020-02-24T15:49:37Z)
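The bone-based decomposition described in the last entry can be sketched as follows; this is a minimal illustration using a hypothetical 5-joint chain, not the authors' network.

```python
import numpy as np

# Hypothetical 5-joint kinematic chain: parents[j] is joint j's parent (-1 = root).
PARENTS = [-1, 0, 1, 2, 3]

def decompose_bones(joints_3d, parents=PARENTS):
    """Split a 3D pose into bone lengths and unit bone directions."""
    lengths, directions = [], []
    for j, p in enumerate(parents):
        if p < 0:
            continue                       # skip the root joint
        bone = joints_3d[j] - joints_3d[p]
        norm = np.linalg.norm(bone)
        lengths.append(norm)
        directions.append(bone / norm)
    return np.array(lengths), np.array(directions)

def recompose(root, lengths, directions, parents=PARENTS):
    """Rebuild joint positions from the root, bone lengths, and directions."""
    joints = [np.asarray(root, dtype=float)]
    non_root_parents = [p for p in parents if p >= 0]
    for k, p in enumerate(non_root_parents):
        joints.append(joints[p] + lengths[k] * directions[k])
    return np.stack(joints)
```

Because bone lengths stay constant over time, a network can predict them once per sequence and spend per-frame capacity only on directions, which is the motivation the entry states.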
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.