TAPE: Temporal Attention-based Probabilistic human pose and shape Estimation
- URL: http://arxiv.org/abs/2305.00181v1
- Date: Sat, 29 Apr 2023 06:08:43 GMT
- Title: TAPE: Temporal Attention-based Probabilistic human pose and shape Estimation
- Authors: Nikolaos Vasilikopoulos, Nikos Kolotouros, Aggeliki Tsoli, Antonis
Argyros
- Abstract summary: Existing methods ignore the ambiguities of the reconstruction and provide a single deterministic estimate for the 3D pose.
We present a Temporal Attention based Probabilistic human pose and shape Estimation method (TAPE) that operates on an RGB video.
We show that TAPE outperforms state-of-the-art methods in standard benchmarks.
- Score: 7.22614468437919
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reconstructing 3D human pose and shape from monocular videos is a
well-studied but challenging problem. Common challenges include occlusions, the
inherent ambiguities in the 2D to 3D mapping and the computational complexity
of video processing. Existing methods ignore the ambiguities of the
reconstruction and provide a single deterministic estimate for the 3D pose. In
order to address these issues, we present a Temporal Attention based
Probabilistic human pose and shape Estimation method (TAPE) that operates on an
RGB video. More specifically, we use an attention-based neural network to
encode video frames into temporal features.
Given these features, we output a per-frame but temporally-informed probability
distribution for the human pose using Normalizing Flows. We show that TAPE
outperforms state-of-the-art methods in standard benchmarks and serves as an
effective video-based prior for optimization-based human pose and shape
estimation. Code is available at: https://github.com/nikosvasilik/TAPE
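The probabilistic head the abstract describes (per-frame temporal features feeding a Normalizing Flow over pose) can be sketched with a single conditional affine flow. Everything below is an illustrative assumption rather than TAPE's actual architecture: the feature and pose dimensions, the linear conditioners `W_mu`/`W_s`, and the use of only one affine layer.

```python
import numpy as np

# Hedged sketch: a conditional affine Normalizing Flow that maps a base
# Gaussian sample z ~ N(0, I) to pose parameters theta, conditioned on a
# per-frame temporal feature c. All shapes and conditioners are assumptions.

rng = np.random.default_rng(0)

F, D = 8, 6                           # hypothetical feature / pose dimensions
W_mu = rng.normal(0.0, 0.1, (D, F))   # linear conditioner for the shift mu(c)
W_s = rng.normal(0.0, 0.1, (D, F))    # linear conditioner for the log-scale s(c)

def sample_pose(c):
    """Draw one pose sample theta = mu(c) + exp(s(c)) * z, with z ~ N(0, I)."""
    z = rng.standard_normal(D)
    return W_mu @ c + np.exp(W_s @ c) * z

def log_prob(theta, c):
    """Exact log-density of theta under the flow, via change of variables."""
    mu, s = W_mu @ c, W_s @ c
    z = (theta - mu) * np.exp(-s)              # invert the affine flow
    log_base = -0.5 * (z @ z + D * np.log(2.0 * np.pi))
    return log_base - s.sum()                  # subtract log|det Jacobian|

c = rng.standard_normal(F)            # stand-in for one frame's temporal feature
theta = sample_pose(c)
print("log p(theta | c) =", log_prob(theta, c))
```

Because a flow gives an exact, differentiable `log_prob`, a term like `-log_prob(theta, c)` can be added to an optimization objective, which is how a learned distribution can serve as the video-based prior the abstract mentions.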
Related papers
- UPose3D: Uncertainty-Aware 3D Human Pose Estimation with Cross-View and Temporal Cues [55.69339788566899]
UPose3D is a novel approach for multi-view 3D human pose estimation.
It improves robustness and flexibility without requiring direct 3D annotations.
arXiv Detail & Related papers (2024-04-23T00:18:00Z) - A Single 2D Pose with Context is Worth Hundreds for 3D Human Pose Estimation [18.72362803593654]
The dominant paradigm in 3D human pose estimation that lifts a 2D pose sequence to 3D heavily relies on long-term temporal clues.
This can be attributed to their inherent inability to perceive spatial context as plain 2D joint coordinates carry no visual cues.
We propose a straightforward yet powerful solution: leveraging the readily available intermediate visual representations produced by off-the-shelf (pre-trained) 2D pose detectors.
arXiv Detail & Related papers (2023-11-06T18:04:13Z) - Co-Evolution of Pose and Mesh for 3D Human Body Estimation from Video [23.93644678238666]
We propose a Pose and Mesh Co-Evolution network (PMCE) to recover 3D human motion from a video.
The proposed PMCE outperforms previous state-of-the-art methods in terms of both per-frame accuracy and temporal consistency.
arXiv Detail & Related papers (2023-08-20T16:03:21Z) - Live Stream Temporally Embedded 3D Human Body Pose and Shape Estimation [13.40702053084305]
We present a temporally embedded 3D human body pose and shape estimation (TePose) method that improves the accuracy and temporal consistency of pose estimation in live stream videos.
A multi-scale convolutional network is presented as the motion discriminator for adversarial training using datasets without any 3D labeling.
arXiv Detail & Related papers (2022-07-25T21:21:59Z) - Deep Two-Stream Video Inference for Human Body Pose and Shape Estimation [18.14237514372724]
We propose a new framework to generate 3D human pose and mesh from RGB videos.
We train a two-stream temporal network based on transformer to predict SMPL parameters.
The proposed algorithm is extensively evaluated on the Human3.6M and 3DPW datasets.
arXiv Detail & Related papers (2021-10-22T10:01:13Z) - Self-Attentive 3D Human Pose and Shape Estimation from Videos [82.63503361008607]
We present a video-based learning algorithm for 3D human pose and shape estimation.
We exploit temporal information in videos and propose a self-attention module.
We evaluate our method on the 3DPW, MPI-INF-3DHP, and Human3.6M datasets.
arXiv Detail & Related papers (2021-03-26T00:02:19Z) - Beyond Static Features for Temporally Consistent 3D Human Pose and Shape from a Video [68.4542008229477]
We present a temporally consistent mesh recovery system (TCMR).
It effectively focuses on the past and future frames' temporal information without being dominated by the current static feature.
It significantly outperforms previous video-based methods in temporal consistency with better per-frame 3D pose and shape accuracy.
arXiv Detail & Related papers (2020-11-17T13:41:34Z) - Synthetic Training for Monocular Human Mesh Recovery [100.38109761268639]
This paper aims to estimate 3D mesh of multiple body parts with large-scale differences from a single RGB image.
The main challenge is lacking training data that have complete 3D annotations of all body parts in 2D images.
We propose a depth-to-scale (D2S) projection to incorporate the depth difference into the projection function to derive per-joint scale variants.
arXiv Detail & Related papers (2020-10-27T03:31:35Z) - Pose2Mesh: Graph Convolutional Network for 3D Human Pose and Mesh Recovery from a 2D Human Pose [70.23652933572647]
We propose a novel graph convolutional neural network (GraphCNN)-based system that estimates the 3D coordinates of human mesh vertices directly from the 2D human pose.
We show that our Pose2Mesh outperforms the previous 3D human pose and mesh estimation methods on various benchmark datasets.
arXiv Detail & Related papers (2020-08-20T16:01:56Z) - Coherent Reconstruction of Multiple Humans from a Single Image [68.3319089392548]
In this work, we address the problem of multi-person 3D pose estimation from a single image.
A typical regression approach in the top-down setting of this problem would first detect all humans and then reconstruct each one of them independently.
Our goal is to train a single network that learns to avoid these problems and generate a coherent 3D reconstruction of all the humans in the scene.
arXiv Detail & Related papers (2020-06-15T17:51:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.