Jointformer: Single-Frame Lifting Transformer with Error Prediction and
Refinement for 3D Human Pose Estimation
- URL: http://arxiv.org/abs/2208.03704v1
- Date: Sun, 7 Aug 2022 12:07:19 GMT
- Title: Jointformer: Single-Frame Lifting Transformer with Error Prediction and
Refinement for 3D Human Pose Estimation
- Authors: Sebastian Lutz and Richard Blythman and Koustav Ghosal and Matthew
Moynihan and Ciaran Simms and Aljosa Smolic
- Abstract summary: 3D human pose estimation technologies have the potential to greatly increase the availability of human movement data.
The best-performing models for single-image 2D-3D lifting use graph convolutional networks (GCNs) that typically require some manual input to define the relationships between different body joints.
We propose a novel transformer-based approach that uses the more generalised self-attention mechanism to learn these relationships.
- Score: 11.592567773739407
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Monocular 3D human pose estimation technologies have the potential to greatly
increase the availability of human movement data. The best-performing models
for single-image 2D-3D lifting use graph convolutional networks (GCNs) that
typically require some manual input to define the relationships between
different body joints. We propose a novel transformer-based approach that uses
the more generalised self-attention mechanism to learn these relationships
within a sequence of tokens representing joints. We find that the use of
intermediate supervision, as well as residual connections between the stacked
encoders benefits performance. We also suggest that using error prediction as
part of a multi-task learning framework improves performance by allowing the
network to compensate for its confidence level. We perform extensive ablation
studies to show that each of our contributions increases performance.
Furthermore, we show that our approach outperforms the recent state of the art
for single-frame 3D human pose estimation by a large margin. Our code and
trained models are made publicly available on Github.
Related papers
- Double-chain Constraints for 3D Human Pose Estimation in Images and
Videos [21.42410292863492]
Reconstructing 3D poses from 2D poses lacking depth information is challenging due to the complexity and diversity of human motion.
We propose a novel model, called Double-chain Graph Convolutional Transformer (DC-GCT), to constrain the pose.
We show that DC-GCT achieves state-of-the-art performance on two challenging datasets.
arXiv Detail & Related papers (2023-08-10T02:41:18Z) - STPOTR: Simultaneous Human Trajectory and Pose Prediction Using a
Non-Autoregressive Transformer for Robot Following Ahead [8.227864212055035]
We develop a neural network model to predict future human motion from an observed human motion history.
We propose a non-autoregressive transformer architecture to leverage its parallel nature for easier training and fast, accurate predictions at test time.
Our model is well-suited for robotic applications in terms of test accuracy and speed favorably with respect to state-of-the-art methods.
arXiv Detail & Related papers (2022-09-15T20:27:54Z) - Non-Local Latent Relation Distillation for Self-Adaptive 3D Human Pose
Estimation [63.199549837604444]
3D human pose estimation approaches leverage different forms of strong (2D/3D pose) or weak (multi-view or depth) paired supervision.
We cast 3D pose learning as a self-supervised adaptation problem that aims to transfer the task knowledge from a labeled source domain to a completely unpaired target.
We evaluate different self-adaptation settings and demonstrate state-of-the-art 3D human pose estimation performance on standard benchmarks.
arXiv Detail & Related papers (2022-04-05T03:52:57Z) - 3D Pose Estimation and Future Motion Prediction from 2D Images [26.28886209268217]
This paper considers to jointly tackle the highly correlated tasks of estimating 3D human body poses and predicting future 3D motions from RGB image sequences.
Based on Lie algebra pose representation, a novel self-projection mechanism is proposed that naturally preserves human motion kinematics.
arXiv Detail & Related papers (2021-11-26T01:02:00Z) - Higher-Order Implicit Fairing Networks for 3D Human Pose Estimation [1.1501261942096426]
We introduce a higher-order graph convolutional framework with initial residual connections for 2D-to-3D pose estimation.
Our model is able to capture the long-range dependencies between body joints.
Experiments and ablations studies conducted on two standard benchmarks demonstrate the effectiveness of our model.
arXiv Detail & Related papers (2021-11-01T13:48:55Z) - Synthetic Training for Monocular Human Mesh Recovery [100.38109761268639]
This paper aims to estimate 3D mesh of multiple body parts with large-scale differences from a single RGB image.
The main challenge is lacking training data that have complete 3D annotations of all body parts in 2D images.
We propose a depth-to-scale (D2S) projection to incorporate the depth difference into the projection function to derive per-joint scale variants.
arXiv Detail & Related papers (2020-10-27T03:31:35Z) - Unsupervised Cross-Modal Alignment for Multi-Person 3D Pose Estimation [52.94078950641959]
We present a deployment friendly, fast bottom-up framework for multi-person 3D human pose estimation.
We adopt a novel neural representation of multi-person 3D pose which unifies the position of person instances with their corresponding 3D pose representation.
We propose a practical deployment paradigm where paired 2D or 3D pose annotations are unavailable.
arXiv Detail & Related papers (2020-08-04T07:54:25Z) - HMOR: Hierarchical Multi-Person Ordinal Relations for Monocular
Multi-Person 3D Pose Estimation [54.23770284299979]
This paper introduces a novel form of supervision - Hierarchical Multi-person Ordinal Relations (HMOR)
HMOR encodes interaction information as the ordinal relations of depths and angles hierarchically.
An integrated top-down model is designed to leverage these ordinal relations in the learning process.
The proposed method significantly outperforms state-of-the-art methods on publicly available multi-person 3D pose datasets.
arXiv Detail & Related papers (2020-08-01T07:53:27Z) - Kinematic-Structure-Preserved Representation for Unsupervised 3D Human
Pose Estimation [58.72192168935338]
Generalizability of human pose estimation models developed using supervision on large-scale in-studio datasets remains questionable.
We propose a novel kinematic-structure-preserved unsupervised 3D pose estimation framework, which is not restrained by any paired or unpaired weak supervisions.
Our proposed model employs three consecutive differentiable transformations named as forward-kinematics, camera-projection and spatial-map transformation.
arXiv Detail & Related papers (2020-06-24T23:56:33Z) - Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image
Synthesis [72.34794624243281]
We propose a self-supervised learning framework to disentangle variations from unlabeled video frames.
Our differentiable formalization, bridging the representation gap between the 3D pose and spatial part maps, allows us to operate on videos with diverse camera movements.
arXiv Detail & Related papers (2020-04-09T07:55:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.