Learning Dynamics via Graph Neural Networks for Human Pose Estimation
and Tracking
- URL: http://arxiv.org/abs/2106.03772v1
- Date: Mon, 7 Jun 2021 16:36:50 GMT
- Title: Learning Dynamics via Graph Neural Networks for Human Pose Estimation
and Tracking
- Authors: Yiding Yang, Zhou Ren, Haoxiang Li, Chunluan Zhou, Xinchao Wang, Gang
Hua
- Abstract summary: We propose a novel online approach to learning the pose dynamics, which are independent of pose detections in the current frame.
Specifically, we derive this prediction of dynamics through a graph neural network (GNN) that explicitly accounts for both spatial-temporal and visual information.
Experiments on PoseTrack 2017 and PoseTrack 2018 datasets demonstrate that the proposed method achieves results superior to the state of the art on both human pose estimation and tracking tasks.
- Score: 98.91894395941766
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-person pose estimation and tracking serve as crucial steps for video
understanding. Most state-of-the-art approaches rely on first estimating poses
in each frame and only then implementing data association and refinement.
Despite the promising results achieved, such a strategy is inevitably prone to
missed detections, especially in heavily cluttered scenes, since this
tracking-by-detection paradigm is, by nature, largely dependent on visual
evidence that is absent in the case of occlusion. In this paper, we propose a
novel online approach to learning the pose dynamics, which are independent of
pose detections in the current frame, and hence may serve as a robust estimation
even in challenging scenarios including occlusion. Specifically, we derive this
prediction of dynamics through a graph neural network (GNN) that explicitly
accounts for both spatial-temporal and visual information. It takes as input
the historical pose tracklets and directly predicts the corresponding poses in
the following frame for each tracklet. The predicted poses will then be
aggregated with the detected poses, if any, at the same frame so as to produce
the final pose, potentially recovering the occluded joints missed by the
estimator. Experiments on PoseTrack 2017 and PoseTrack 2018 datasets
demonstrate that the proposed method achieves results superior to the state of
the art on both human pose estimation and tracking tasks.
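As an illustration of the aggregation step described in the abstract, the following Python sketch fuses a predicted pose with a detected pose joint by joint. The abstract does not specify the fusion rule, so the confidence-weighted average, the `aggregate_pose` function, the `(x, y, confidence)` joint layout, and the `min_conf` threshold are all assumptions made for this example, not the authors' implementation.

```python
from typing import List, Optional, Tuple

# Hypothetical joint layout: (x, y, confidence).
Joint = Tuple[float, float, float]


def aggregate_pose(
    predicted: List[Joint],
    detected: Optional[List[Joint]],
    min_conf: float = 0.1,  # assumed threshold below which a joint counts as missing
) -> List[Joint]:
    """Fuse a predicted pose with a detected pose, joint by joint."""
    if detected is None:
        # No detection was matched to this tracklet: keep the prediction.
        return list(predicted)
    if len(predicted) != len(detected):
        raise ValueError("predicted and detected poses must have the same joints")

    fused: List[Joint] = []
    for (px, py, pc), (dx, dy, dc) in zip(predicted, detected):
        if dc < min_conf:
            # The detector missed this joint (e.g. occlusion): use the prediction.
            fused.append((px, py, pc))
        elif pc < min_conf:
            # The prediction is unreliable: use the detection.
            fused.append((dx, dy, dc))
        else:
            # Both are usable: confidence-weighted average of the coordinates.
            total = pc + dc
            fused.append(
                ((px * pc + dx * dc) / total, (py * pc + dy * dc) / total, max(pc, dc))
            )
    return fused


# Usage: the second joint is missed by the detector and recovered from the prediction.
predicted = [(10.0, 20.0, 0.8), (30.0, 40.0, 0.6)]
detected = [(12.0, 22.0, 0.8), (0.0, 0.0, 0.0)]
fused = aggregate_pose(predicted, detected)
```

In this sketch, a joint that the detector misses falls back to the predicted location, which mirrors the abstract's point that predictions can recover occluded joints; a learned fusion could replace the fixed weighting.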
Related papers
- Improving Multi-Person Pose Tracking with A Confidence Network [37.84514614455588]
We develop a novel keypoint confidence network and a tracking pipeline to improve human detection and pose estimation.
Specifically, the keypoint confidence network is designed to determine whether each keypoint is occluded.
In the tracking pipeline, we propose the Bbox-revision module to reduce missing detection and the ID-retrieve module to correct lost trajectories.
arXiv Detail & Related papers (2023-10-29T06:36:27Z)
- Human Pose-based Estimation, Tracking and Action Recognition with Deep Learning: A Survey [15.920237822185301]
This paper presents a survey of pose-based applications utilizing deep learning, encompassing pose estimation, pose tracking, and action recognition.
Pose estimation involves the determination of human joint positions from images or image sequences.
Pose tracking is an emerging research direction aimed at generating consistent human pose trajectories over time.
Action recognition targets the identification of action types using pose estimation or tracking data.
arXiv Detail & Related papers (2023-10-19T17:59:04Z)
- Understanding Pose and Appearance Disentanglement in 3D Human Pose Estimation [72.50214227616728]
Several methods have proposed to learn image representations in a self-supervised fashion so as to disentangle the appearance information from the pose one.
We study disentanglement from the perspective of the self-supervised network, via diverse image synthesis experiments.
We design an adversarial strategy focusing on generating natural appearance changes of the subject, and against which we could expect a disentangled network to be robust.
arXiv Detail & Related papers (2023-09-20T22:22:21Z)
- DiffPose: SpatioTemporal Diffusion Model for Video-Based Human Pose Estimation [16.32910684198013]
We present DiffPose, a novel diffusion architecture that formulates video-based human pose estimation as a conditional heatmap generation problem.
We show two unique characteristics of DiffPose on the pose estimation task: (i) the ability to combine multiple sets of pose estimates to improve prediction accuracy, particularly for challenging joints, and (ii) the ability to adjust the number of iterative steps for feature refinement without retraining the model.
arXiv Detail & Related papers (2023-07-31T14:00:23Z)
- Neural Rendering of Humans in Novel View and Pose from Monocular Video [68.37767099240236]
We introduce a new method that generates photo-realistic humans under novel views and poses given a monocular video as input.
Our method significantly outperforms existing approaches under unseen poses and novel views given monocular videos as input.
arXiv Detail & Related papers (2022-04-04T03:09:20Z)
- Investigating Pose Representations and Motion Contexts Modeling for 3D Motion Prediction [63.62263239934777]
We conduct an in-depth study on various pose representations with a focus on their effects on the motion prediction task.
We propose a novel RNN architecture termed AHMR (Attentive Hierarchical Motion Recurrent network) for motion prediction.
Our approach outperforms the state-of-the-art methods in short-term prediction and achieves much enhanced long-term prediction proficiency.
arXiv Detail & Related papers (2021-12-30T10:45:22Z)
- MSR-GCN: Multi-Scale Residual Graph Convolution Networks for Human Motion Prediction [34.565986275769745]
We propose a novel Multi-Scale Residual Graph Convolution Network (MSR-GCN) for the human pose prediction task.
Our proposed approach is evaluated on two standard benchmark datasets, i.e., the Human3.6M dataset and the CMU Mocap dataset.
arXiv Detail & Related papers (2021-08-16T15:26:23Z)
- Multi-level Motion Attention for Human Motion Prediction [132.29963836262394]
We study the use of different types of attention, computed at joint, body part, and full pose levels.
Our experiments on Human3.6M, AMASS and 3DPW validate the benefits of our approach for both periodical and non-periodical actions.
arXiv Detail & Related papers (2021-06-17T08:08:11Z)
- TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild [77.59069361196404]
TRiPOD is a novel method for predicting body dynamics based on graph attentional networks.
To incorporate a real-world challenge, we learn an indicator representing whether an estimated body joint is visible/invisible at each frame.
Our evaluation shows that TRiPOD outperforms all prior work and state-of-the-art methods specifically designed for each of the trajectory and pose forecasting tasks.
arXiv Detail & Related papers (2021-04-08T20:01:00Z)
- Deep Dual Consecutive Network for Human Pose Estimation [44.41818683253614]
We propose a novel multi-frame human pose estimation framework, leveraging abundant temporal cues between video frames to facilitate keypoint detection.
Our method ranks No. 1 in the Multi-frame Person Pose Challenge on the large-scale benchmark datasets PoseTrack 2017 and PoseTrack 2018.
arXiv Detail & Related papers (2021-03-12T13:11:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.