3D Human Pose Estimation with Spatial and Temporal Transformers
- URL: http://arxiv.org/abs/2103.10455v1
- Date: Thu, 18 Mar 2021 18:14:37 GMT
- Title: 3D Human Pose Estimation with Spatial and Temporal Transformers
- Authors: Ce Zheng, Sijie Zhu, Matias Mendieta, Taojiannan Yang, Chen Chen,
Zhengming Ding
- Abstract summary: We present PoseFormer, a purely transformer-based approach for 3D human pose estimation in videos.
Inspired by recent developments in vision transformers, we design a spatial-temporal transformer structure.
We quantitatively and qualitatively evaluate our method on two popular and standard benchmark datasets.
- Score: 59.433208652418976
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformer architectures have become the model of choice in natural language
processing and are now being introduced into computer vision tasks such as
image classification, object detection, and semantic segmentation. However, in
the field of human pose estimation, convolutional architectures still remain
dominant. In this work, we present PoseFormer, a purely transformer-based
approach for 3D human pose estimation in videos without convolutional
architectures involved. Inspired by recent developments in vision transformers,
we design a spatial-temporal transformer structure to comprehensively model the
human joint relations within each frame as well as the temporal correlations
across frames, then output an accurate 3D human pose of the center frame. We
quantitatively and qualitatively evaluate our method on two popular and
standard benchmark datasets: Human3.6M and MPI-INF-3DHP. Extensive experiments
show that PoseFormer achieves state-of-the-art performance on both datasets.
Code is available at \url{https://github.com/zczcwh/PoseFormer}
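To make the spatial-temporal structure described above concrete, here is a minimal sketch in PyTorch: a spatial transformer treats the joints of each frame as tokens to model intra-frame joint relations, a temporal transformer treats the resulting per-frame embeddings as tokens to model cross-frame correlations, and a linear head regresses the 3D pose of the center frame. All dimensions, layer counts, and the pooling step are illustrative assumptions, not PoseFormer's actual configuration; refer to the linked repository for the real model.

import torch
import torch.nn as nn

class SpatialTemporalPoseSketch(nn.Module):
    def __init__(self, num_joints=17, num_frames=9, dim=32):
        super().__init__()
        # Spatial stage: each joint of a frame is one token.
        self.joint_embed = nn.Linear(2, dim)  # lift 2D keypoints to token features
        self.spatial_pos = nn.Parameter(torch.zeros(1, num_joints, dim))
        spatial_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.spatial_encoder = nn.TransformerEncoder(spatial_layer, num_layers=2)
        # Temporal stage: each frame's concatenated joint features form one token.
        frame_dim = num_joints * dim
        self.temporal_pos = nn.Parameter(torch.zeros(1, num_frames, frame_dim))
        temporal_layer = nn.TransformerEncoderLayer(d_model=frame_dim, nhead=4, batch_first=True)
        self.temporal_encoder = nn.TransformerEncoder(temporal_layer, num_layers=2)
        # Regression head for the 3D pose of the center frame.
        self.head = nn.Linear(frame_dim, num_joints * 3)
        self.num_joints = num_joints

    def forward(self, x):  # x: (B, F, J, 2) sequence of 2D keypoints
        B, F, J, _ = x.shape
        joints = self.joint_embed(x.reshape(B * F, J, 2)) + self.spatial_pos
        joints = self.spatial_encoder(joints)          # joint relations within each frame
        frames = joints.reshape(B, F, -1) + self.temporal_pos
        frames = self.temporal_encoder(frames)         # correlations across frames
        center = frames.mean(dim=1)                    # simple stand-in for center-frame aggregation
        return self.head(center).reshape(B, self.num_joints, 3)

# Example: 4 clips of 9 frames with 17 joints in (x, y) -> a (4, 17, 3) center-frame pose.
out = SpatialTemporalPoseSketch()(torch.randn(4, 9, 17, 2))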
Related papers
- UPose3D: Uncertainty-Aware 3D Human Pose Estimation with Cross-View and Temporal Cues [55.69339788566899]
UPose3D is a novel approach for multi-view 3D human pose estimation.
It improves robustness and flexibility without requiring direct 3D annotations.
arXiv Detail & Related papers (2024-04-23T00:18:00Z)
- SkelFormer: Markerless 3D Pose and Shape Estimation using Skeletal Transformers [57.46911575980854]
We introduce SkelFormer, a novel markerless motion capture pipeline for multi-view human pose and shape estimation.
Our method first uses off-the-shelf 2D keypoint estimators, pre-trained on large-scale in-the-wild data, to obtain 3D joint positions.
Next, we design a regression-based inverse-kinematic skeletal transformer that maps the joint positions to pose and shape representations from heavily noisy observations.
arXiv Detail & Related papers (2024-04-19T04:51:18Z)
- Towards Precise 3D Human Pose Estimation with Multi-Perspective Spatial-Temporal Relational Transformers [28.38686299271394]
We propose a framework for 3D sequence-to-sequence (seq2seq) human pose detection.
The spatial module represents human pose features from intra-image content, while the frame-image relation module extracts temporal relationships across frames.
Our method is evaluated on Human3.6M, a popular 3D human pose detection dataset.
arXiv Detail & Related papers (2024-01-30T03:00:25Z)
- Multiple View Geometry Transformers for 3D Human Pose Estimation [35.26756920323391]
We aim to improve the 3D reasoning ability of Transformers in multi-view 3D human pose estimation.
We propose a novel hybrid model, MVGFormer, which has a series of geometric and appearance modules organized in an iterative manner.
arXiv Detail & Related papers (2023-11-18T06:32:40Z)
- Co-Evolution of Pose and Mesh for 3D Human Body Estimation from Video [23.93644678238666]
We propose a Pose and Mesh Co-Evolution network (PMCE) to recover 3D human motion from a video.
The proposed PMCE outperforms previous state-of-the-art methods in terms of both per-frame accuracy and temporal consistency.
arXiv Detail & Related papers (2023-08-20T16:03:21Z)
- Unsupervised 3D Pose Estimation with Non-Rigid Structure-from-Motion Modeling [83.76377808476039]
We propose a new modeling method for human pose deformations and design an accompanying diffusion-based motion prior.
Inspired by the field of non-rigid structure-from-motion, we divide the task of reconstructing 3D human skeletons in motion into the estimation of a 3D reference skeleton and a per-frame skeleton deformation.
A mixed spatial-temporal NRSfMformer simultaneously estimates the 3D reference skeleton and the per-frame skeleton deformation from a sequence of 2D observations.
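Concretely, the decomposition described above can be written (with assumed notation, not taken from the paper) as X_t = X_ref + ΔX_t for the frame-t skeleton, with the 2D observations x_t ≈ Π(X_t) for a camera projection Π; the model jointly recovers X_ref and the per-frame deformations ΔX_t from the sequence of x_t.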
arXiv Detail & Related papers (2023-08-18T16:41:57Z)
- EVOPOSE: A Recursive Transformer For 3D Human Pose Estimation With Kinematic Structure Priors [72.33767389878473]
We propose EvoPose, a transformer-based model that effectively introduces human body prior knowledge into 3D human pose estimation.
A Structural Priors Representation (SPR) module represents human priors as structural features carrying rich body patterns.
A Recursive Refinement (RR) module refines the 3D pose outputs by reusing the estimated results while simultaneously injecting the human priors.
arXiv Detail & Related papers (2023-06-16T04:09:16Z)
- CrossFormer: Cross Spatio-Temporal Transformer for 3D Human Pose Estimation [24.08170512746056]
3D human pose estimation can be handled by encoding the geometric dependencies between the body parts and enforcing the kinematic constraints.
Recently, Transformers have been adopted to encode the long-range dependencies between joints in the spatial and temporal domains.
We propose a novel pose estimation Transformer featuring rich representations of body joints critical for capturing subtle changes across frames.
arXiv Detail & Related papers (2022-03-24T23:40:11Z)
- THUNDR: Transformer-based 3D HUmaN Reconstruction with Markers [67.8628917474705]
THUNDR is a transformer-based deep neural network methodology for reconstructing the 3D pose and shape of people.
We show state-of-the-art results on Human3.6M and 3DPW, for both the fully-supervised and the self-supervised models.
We observe very solid 3D reconstruction performance for difficult human poses collected in the wild.
arXiv Detail & Related papers (2021-06-17T09:09:24Z)
- ProtoRes: Proto-Residual Architecture for Deep Modeling of Human Pose [6.9997407868865364]
We tackle the problem of constructing a full static human pose based on sparse and variable user inputs.
We propose a novel neural architecture that combines residual connections with prototype encoding of a partially specified pose to create a new complete pose.
We develop a user interface to integrate our neural model in Unity, a real-time 3D development platform.
arXiv Detail & Related papers (2021-06-03T16:56:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.