Attention-Propagation Network for Egocentric Heatmap to 3D Pose Lifting
- URL: http://arxiv.org/abs/2402.18330v1
- Date: Wed, 28 Feb 2024 13:50:39 GMT
- Title: Attention-Propagation Network for Egocentric Heatmap to 3D Pose Lifting
- Authors: Taeho Kang, Youngki Lee
- Abstract summary: We present EgoTAP, a heatmap-to-3D pose lifting method for highly accurate stereo egocentric 3D pose estimation.
Our method significantly outperforms the previous state-of-the-art qualitatively and quantitatively.
- Score: 8.134443548271301
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present EgoTAP, a heatmap-to-3D pose lifting method for highly accurate
stereo egocentric 3D pose estimation. Severe self-occlusion and out-of-view
limbs in egocentric camera views make accurate pose estimation a challenging
problem. To address the challenge, prior methods employ joint heatmaps,
probabilistic 2D representations of the body pose, but heatmap-to-3D pose
conversion remains an inaccurate process. We propose a novel
heatmap-to-3D lifting method composed of the Grid ViT Encoder and the
Propagation Network. The Grid ViT Encoder summarizes joint heatmaps into
an effective feature embedding using self-attention. The Propagation Network
then estimates the 3D pose, utilizing skeletal information to better localize
obscured joints. Our method significantly outperforms the previous
state-of-the-art both qualitatively and quantitatively, as demonstrated by a
23.9% reduction in MPJPE. Our source code is available on GitHub.
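As a rough illustration of the two-stage design the abstract describes, the sketch below wires a self-attention encoder over stacked joint heatmaps to a skeleton-guided regressor. The class names echo the Grid ViT Encoder and Propagation Network, but every dimension, layer choice, and the parent-to-child propagation scheme are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: a two-stage heatmap-to-3D lifting pipeline in the
# spirit of the abstract. All sizes and design choices are assumptions.
import torch
import torch.nn as nn

class GridViTEncoder(nn.Module):
    """Summarizes stacked joint heatmaps into a feature embedding via self-attention."""
    def __init__(self, num_joints=15, patch=8, heatmap=64, dim=128):
        super().__init__()
        n_patches = (heatmap // patch) ** 2
        self.patchify = nn.Conv2d(num_joints, dim, kernel_size=patch, stride=patch)
        self.pos = nn.Parameter(torch.zeros(1, n_patches, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, heatmaps):                                # (B, J, H, W)
        x = self.patchify(heatmaps).flatten(2).transpose(1, 2)  # (B, P, dim)
        return self.encoder(x + self.pos).mean(dim=1)           # (B, dim)

class PropagationNetwork(nn.Module):
    """Regresses joints parent-to-child so well-observed joints inform occluded ones."""
    def __init__(self, parents, dim=128):
        super().__init__()
        self.parents = parents                 # parent index per joint, -1 = root
        self.heads = nn.ModuleList(
            nn.Linear(dim + (0 if p < 0 else 3), 3) for p in parents)

    def forward(self, feat):                                    # (B, dim)
        joints = []
        for j, p in enumerate(self.parents):
            inp = feat if p < 0 else torch.cat([feat, joints[p]], dim=-1)
            joints.append(self.heads[j](inp))
        return torch.stack(joints, dim=1)                       # (B, J, 3)

# Toy usage: a 3-joint chain (root -> joint 1 -> joint 2)
enc = GridViTEncoder(num_joints=3)
prop = PropagationNetwork(parents=[-1, 0, 1])
pose3d = prop(enc(torch.rand(2, 3, 64, 64)))                    # (2, 3, 3)
```

The propagation loop encodes the intuition from the abstract: a well-observed parent joint constrains where an occluded child joint can be.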
Related papers
- UPose3D: Uncertainty-Aware 3D Human Pose Estimation with Cross-View and Temporal Cues [55.69339788566899]
UPose3D is a novel approach for multi-view 3D human pose estimation.
It improves robustness and flexibility without requiring direct 3D annotations.
arXiv Detail & Related papers (2024-04-23T00:18:00Z)
- A Single 2D Pose with Context is Worth Hundreds for 3D Human Pose Estimation [18.72362803593654]
The dominant paradigm in 3D human pose estimation that lifts a 2D pose sequence to 3D heavily relies on long-term temporal clues.
This can be attributed to their inherent inability to perceive spatial context as plain 2D joint coordinates carry no visual cues.
We propose a straightforward yet powerful solution: leveraging the readily available intermediate visual representations produced by off-the-shelf (pre-trained) 2D pose detectors.
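A minimal sketch of the stated idea, under assumed names and shapes: sample the frozen detector's intermediate feature map at each 2D joint location and concatenate those features with the raw coordinates before lifting. The ContextLifter MLP is a placeholder, not the paper's architecture.

```python
# Hypothetical sketch: enriching plain 2D joint coordinates with intermediate
# feature maps from a frozen, off-the-shelf 2D detector before lifting to 3D.
import torch
import torch.nn as nn
import torch.nn.functional as F

def sample_joint_features(feature_map, joints_2d):
    """Bilinearly sample detector features at each 2D joint location.
    feature_map: (B, C, H, W); joints_2d: (B, J, 2) normalized to [-1, 1]."""
    grid = joints_2d.unsqueeze(2)                      # (B, J, 1, 2)
    feats = F.grid_sample(feature_map, grid, align_corners=False)
    return feats.squeeze(-1).transpose(1, 2)           # (B, J, C)

class ContextLifter(nn.Module):
    def __init__(self, num_joints=17, feat_dim=256, hidden=512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(num_joints * (2 + feat_dim), hidden), nn.ReLU(),
            nn.Linear(hidden, num_joints * 3))

    def forward(self, joints_2d, feature_map):
        ctx = sample_joint_features(feature_map, joints_2d)     # (B, J, C)
        x = torch.cat([joints_2d, ctx], dim=-1).flatten(1)
        return self.mlp(x).view(-1, joints_2d.shape[1], 3)      # (B, J, 3)

lifter = ContextLifter()
pose3d = lifter(torch.rand(2, 17, 2) * 2 - 1, torch.rand(2, 256, 32, 32))
```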
arXiv Detail & Related papers (2023-11-06T18:04:13Z)
- Co-Evolution of Pose and Mesh for 3D Human Body Estimation from Video [23.93644678238666]
We propose a Pose and Mesh Co-Evolution network (PMCE) to recover 3D human motion from a video.
The proposed PMCE outperforms previous state-of-the-art methods in terms of both per-frame accuracy and temporal consistency.
arXiv Detail & Related papers (2023-08-20T16:03:21Z)
- Neural Voting Field for Camera-Space 3D Hand Pose Estimation [106.34750803910714]
We present a unified framework for camera-space 3D hand pose estimation from a single RGB image based on 3D implicit representation.
We propose a novel unified 3D dense regression scheme to estimate camera-space 3D hand pose via dense 3D point-wise voting in camera frustum.
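A toy sketch of dense point-wise voting as summarized above: an implicit network maps each 3D query point (paired with an image feature) to per-joint offset votes and confidences, and the votes are aggregated across points. All shapes and the aggregation rule are assumptions, not the paper's formulation.

```python
# Illustrative sketch of dense 3D point-wise voting. Names and shapes assumed.
import torch
import torch.nn as nn

class VotingField(nn.Module):
    def __init__(self, num_joints=21, feat_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + feat_dim, 256), nn.ReLU(),
            nn.Linear(256, num_joints * 3 + num_joints))  # offsets + confidence

    def forward(self, points, feats):                 # (N, 3), (N, feat_dim)
        out = self.net(torch.cat([points, feats], dim=-1))
        J = out.shape[-1] // 4
        offsets = out[:, : 3 * J].view(-1, J, 3)
        conf = out[:, 3 * J :].softmax(dim=0)         # weight votes across points
        votes = points.unsqueeze(1) + offsets         # each point votes p + offset
        return (votes * conf.unsqueeze(-1)).sum(dim=0)  # (J, 3) aggregated pose

field = VotingField()
pose = field(torch.rand(512, 3), torch.rand(512, 64))  # (21, 3)
```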
arXiv Detail & Related papers (2023-05-07T16:51:34Z)
- Building Spatio-temporal Transformers for Egocentric 3D Pose Estimation [9.569752078386006]
We leverage information from past frames to guide our self-attention-based 3D estimation procedure, Ego-STAN.
Specifically, we build a spatio-temporal Transformer model that attends to semantically rich convolutional neural network-based feature maps.
We demonstrate Ego-STAN's superior performance on the xR-EgoPose dataset.
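A rough sketch of that design under assumed shapes: feature-map tokens from the current and past frames form one sequence for a Transformer encoder, which is pooled into a 3D joint regression. Layer sizes and pooling are illustrative, not Ego-STAN's actual architecture.

```python
# Sketch (assumed shapes): a Transformer attending over CNN feature-map tokens
# gathered from the current and past frames, then regressing 3D joints.
import torch
import torch.nn as nn

class SpatioTemporalLifter(nn.Module):
    def __init__(self, feat_dim=256, num_joints=16, frames=5, tokens=49):
        super().__init__()
        self.pos = nn.Parameter(torch.zeros(1, frames * tokens, feat_dim))
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=3)
        self.head = nn.Linear(feat_dim, num_joints * 3)

    def forward(self, feats):      # (B, T, tokens, feat_dim) CNN feature tokens
        B, T, N, C = feats.shape
        x = self.encoder(feats.reshape(B, T * N, C) + self.pos)
        return self.head(x.mean(dim=1)).view(B, -1, 3)   # (B, J, 3)

model = SpatioTemporalLifter()
out = model(torch.rand(2, 5, 49, 256))                   # (2, 16, 3)
```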
arXiv Detail & Related papers (2022-06-09T22:33:27Z)
- PONet: Robust 3D Human Pose Estimation via Learning Orientations Only [116.1502793612437]
We propose a novel Pose Orientation Net (PONet) that is able to robustly estimate 3D pose by learning orientations only.
PONet estimates the 3D orientation of body limbs from local image evidence and uses these orientations to recover the 3D pose.
We evaluate our method on multiple datasets, including Human3.6M, MPII, MPI-INF-3DHP, and 3DPW.
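The orientation-only formulation is easy to make concrete: given a unit direction per limb and known bone lengths, the 3D pose follows by accumulating bones down the kinematic tree. The toy skeleton and lengths in this sketch are assumptions.

```python
# Sketch of the "orientations only" idea: joints recovered from unit limb
# directions plus known bone lengths by walking from the root.
import numpy as np

def pose_from_orientations(directions, lengths, parents):
    """directions: (J, 3) unit vectors (root entry unused); lengths: (J,);
    parents: parent index per joint, -1 for the root."""
    joints = np.zeros((len(parents), 3))
    for j, p in enumerate(parents):
        if p >= 0:
            joints[j] = joints[p] + lengths[j] * directions[j]
    return joints

parents = [-1, 0, 1, 1]                  # root -> spine -> two children
dirs = np.array([[0, 0, 0], [0, 1, 0], [1, 0, 0], [-1, 0, 0]], float)
print(pose_from_orientations(dirs, np.array([0, 0.5, 0.3, 0.3]), parents))
```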
arXiv Detail & Related papers (2021-12-21T12:48:48Z)
- HandsFormer: Keypoint Transformer for Monocular 3D Pose Estimation of Hands and Object in Interaction [33.661745138578596]
We propose a robust and accurate method for estimating the 3D poses of two hands in close interaction from a single color image.
Our method starts by extracting a set of potential 2D locations for the joints of both hands as extrema of a heatmap.
We use appearance and spatial encodings of these locations as input to a transformer, and leverage the attention mechanisms to sort out the correct configuration of the joints.
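The first stage can be sketched directly: candidate joint locations are the local maxima of a heatmap, found by comparing each pixel with a max-pooled copy of itself. The kernel size and threshold below are assumptions.

```python
# Sketch: candidate 2D joint locations as local maxima (extrema) of a heatmap.
import torch
import torch.nn.functional as F

def heatmap_extrema(heatmap, threshold=0.3):
    """heatmap: (H, W). Returns (K, 2) pixel coordinates of local maxima."""
    h = heatmap.unsqueeze(0).unsqueeze(0)            # (1, 1, H, W)
    pooled = F.max_pool2d(h, kernel_size=3, stride=1, padding=1)
    is_peak = (h == pooled) & (h > threshold)        # keep strong local maxima
    ys, xs = torch.nonzero(is_peak[0, 0], as_tuple=True)
    return torch.stack([xs, ys], dim=-1)             # (K, 2) as (x, y)

hm = torch.zeros(64, 64)
hm[20, 30], hm[40, 10] = 0.9, 0.8
print(heatmap_extrema(hm))                           # tensor([[30, 20], [10, 40]])
```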
arXiv Detail & Related papers (2021-04-29T20:19:20Z)
- KAMA: 3D Keypoint Aware Body Mesh Articulation [79.04090630502782]
We propose an analytical solution to articulate a parametric body model, SMPL, via a set of straightforward geometric transformations.
Our approach offers significantly better alignment to image content when compared to state-of-the-art approaches.
Results on the challenging 3DPW and Human3.6M benchmarks demonstrate that our approach yields state-of-the-art body mesh fits.
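A sketch of the geometric core of such analytical articulation: the rotation aligning a rest-pose bone direction with the direction estimated from 3D keypoints, via Rodrigues' formula. SMPL-specific details such as joint regressors and twist handling are omitted, and this is not the paper's exact procedure.

```python
# Sketch: per-bone rotation computed analytically from keypoint directions.
import numpy as np

def align_rotation(a, b):
    """Rotation matrix sending unit vector a onto unit vector b (Rodrigues)."""
    v = np.cross(a, b)
    c, s = float(np.dot(a, b)), float(np.linalg.norm(v))
    if s < 1e-8:
        if c > 0:
            return np.eye(3)                  # already aligned
        # anti-parallel: rotate 180 degrees about any axis orthogonal to a
        axis = np.cross(a, [1.0, 0.0, 0.0])
        if np.linalg.norm(axis) < 1e-8:
            axis = np.cross(a, [0.0, 1.0, 0.0])
        axis /= np.linalg.norm(axis)
        return 2.0 * np.outer(axis, axis) - np.eye(3)
    K = np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])
    return np.eye(3) + K + K @ K * ((1.0 - c) / s**2)

template = np.array([0.0, 1.0, 0.0])          # rest-pose bone direction
target = np.array([1.0, 0.0, 0.0])            # direction estimated from keypoints
R = align_rotation(template, target)
print(np.allclose(R @ template, target))      # True
```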
arXiv Detail & Related papers (2021-04-27T23:01:03Z)
- Multi-View Multi-Person 3D Pose Estimation with Plane Sweep Stereo [71.59494156155309]
Existing approaches for multi-view 3D pose estimation explicitly establish cross-view correspondences to group 2D pose detections from multiple camera views.
We present our multi-view 3D pose estimation approach based on plane sweep stereo to jointly address the cross-view fusion and 3D pose reconstruction in a single shot.
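A toy sketch of the plane-sweep idea for a single joint: sweep depth hypotheses along the reference-view ray, project each hypothesis into a second view, and keep the depth whose reprojection lands nearest a 2D detection there. The synthetic cameras and matching cost are assumptions.

```python
# Toy plane-sweep sketch for one joint; camera setup is synthetic.
import numpy as np

def sweep_depth(joint_ref, K_ref, K_other, R, t, detections_other, depths):
    """Return the depth hypothesis whose reprojection best matches a detection."""
    ray = np.linalg.inv(K_ref) @ np.array([*joint_ref, 1.0])  # back-projected ray
    best, best_cost = None, np.inf
    for d in depths:
        X = ray * d                              # 3D hypothesis in the ref camera
        x = K_other @ (R @ X + t)                # project into the other view
        x = x[:2] / x[2]
        cost = min(np.linalg.norm(x - p) for p in detections_other)
        if cost < best_cost:
            best, best_cost = d, cost
    return best

# Synthetic two-camera setup with a 20 cm baseline
K = np.diag([500.0, 500.0, 1.0]); K[0, 2] = K[1, 2] = 320.0
R, t = np.eye(3), np.array([-0.2, 0.0, 0.0])
X_true = np.array([0.1, 0.0, 2.0])               # a joint 2 m from the camera
ref = (K @ X_true)[:2] / X_true[2]
oth = K @ (R @ X_true + t); oth = oth[:2] / oth[2]
print(sweep_depth(ref, K, K, R, t, [oth], np.linspace(1.0, 5.0, 81)))  # 2.0
```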
arXiv Detail & Related papers (2021-04-06T03:49:35Z)
- Fusing Wearable IMUs with Multi-View Images for Human Pose Estimation: A Geometric Approach [76.10879433430466]
We propose to estimate 3D human pose from multi-view images and a few IMUs attached to a person's limbs.
It operates by first detecting 2D poses from the two signals and then lifting them to 3D space.
This simple two-step approach reduces the error of the state-of-the-art by a large margin on a public dataset.
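The lifting step can be sketched with standard direct linear transform (DLT) triangulation from calibrated views; the paper's IMU fusion happens upstream, in the 2D detection stage, and is not reproduced here.

```python
# Minimal DLT triangulation sketch for one joint across calibrated views.
import numpy as np

def triangulate(points_2d, projections):
    """points_2d: list of (u, v); projections: list of 3x4 camera matrices."""
    rows = []
    for (u, v), P in zip(points_2d, projections):
        rows.append(u * P[2] - P[0])             # DLT constraint for u
        rows.append(v * P[2] - P[1])             # DLT constraint for v
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    X = Vt[-1]                                   # null vector = homogeneous point
    return X[:3] / X[3]

P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X = np.array([0.2, 0.1, 4.0, 1.0])               # ground-truth joint
uv = [(P @ X)[:2] / (P @ X)[2] for P in (P1, P2)]
print(triangulate(uv, [P1, P2]))                 # ~[0.2, 0.1, 4.0]
```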
arXiv Detail & Related papers (2020-03-25T00:26:54Z)
- Metric-Scale Truncation-Robust Heatmaps for 3D Human Pose Estimation [16.463390330757132]
We propose metric-scale truncation-robust volumetric heatmaps, whose dimensions are defined in metric 3D space near the subject.
We train a fully-convolutional network to estimate such heatmaps from monocular RGB in an end-to-end manner.
As our method is simple and fast, it can become a useful component for real-time top-down multi-person pose estimation systems.
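A sketch of reading a metric joint position out of such a volumetric heatmap with a soft-argmax, where the grid axes span a fixed metric extent centered on the subject. The 2 m extent and 32^3 resolution are assumptions.

```python
# Sketch: soft-argmax over a volumetric heatmap defined in metric space.
import torch

def soft_argmax_3d(volume, extent_m=2.0):
    """volume: (D, H, W) unnormalized scores; returns (3,) metric coordinates
    in [-extent/2, extent/2] along each axis, centered on the subject."""
    D, H, W = volume.shape
    probs = volume.flatten().softmax(dim=0).view(D, H, W)
    axes = [torch.linspace(-extent_m / 2, extent_m / 2, n) for n in (D, H, W)]
    z = (probs.sum(dim=(1, 2)) * axes[0]).sum()
    y = (probs.sum(dim=(0, 2)) * axes[1]).sum()
    x = (probs.sum(dim=(0, 1)) * axes[2]).sum()
    return torch.stack([x, y, z])

vol = torch.zeros(32, 32, 32)
vol[16, 8, 24] = 30.0                                # a confident peak
print(soft_argmax_3d(vol))                           # ~[0.55, -0.48, 0.03]
```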
arXiv Detail & Related papers (2020-03-05T22:38:13Z)