Exploring Latent Cross-Channel Embedding for Accurate 3D Human Pose
Reconstruction in a Diffusion Framework
- URL: http://arxiv.org/abs/2401.09836v1
- Date: Thu, 18 Jan 2024 09:53:03 GMT
- Title: Exploring Latent Cross-Channel Embedding for Accurate 3D Human Pose
Reconstruction in a Diffusion Framework
- Authors: Junkun Jiang and Jie Chen
- Abstract summary: Monocular 3D human pose estimation poses significant challenges due to inherent depth ambiguities that arise during the reprojection process from 2D to 3D.
Recent advancements in diffusion models have shown promise in incorporating structural priors to address reprojection ambiguities.
We propose a novel cross-channel embedding framework that aims to fully explore the correlation between joint-level features of 3D coordinates and their 2D projections.
- Score: 6.669850111205944
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Monocular 3D human pose estimation poses significant challenges due to the
inherent depth ambiguities that arise during the reprojection process from 2D
to 3D. Conventional approaches that rely on estimating an over-fit projection
matrix struggle to effectively address these challenges and often result in
noisy outputs. Recent advancements in diffusion models have shown promise in
incorporating structural priors to address reprojection ambiguities. However,
there is still ample room for improvement as these methods often overlook the
exploration of correlation between the 2D and 3D joint-level features. In this
study, we propose a novel cross-channel embedding framework that aims to fully
explore the correlation between joint-level features of 3D coordinates and
their 2D projections. In addition, we introduce a context guidance mechanism to
facilitate the propagation of joint graph attention across latent channels
during the iterative diffusion process. To evaluate the effectiveness of our
proposed method, we conduct experiments on two benchmark datasets, namely
Human3.6M and MPI-INF-3DHP. Our results demonstrate a significant improvement
in terms of reconstruction accuracy compared to state-of-the-art methods. The
code for our method will be made available online for further reference.
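The pipeline the abstract describes — start from noise and iteratively denoise 3D joint coordinates while conditioning each step on the observed 2D projections — can be illustrated with a toy sketch. The snippet below is a hypothetical illustration, not the authors' implementation: `toy_denoiser` is a hand-written stand-in for the learned cross-channel embedding network, and the step count and update coefficients are arbitrary assumptions.

```python
import random

NUM_JOINTS = 17  # Human3.6M uses a 17-joint skeleton


def toy_denoiser(pose3d, kp2d, t):
    """Stand-in for the learned network. A real model would apply joint
    graph attention across latent channels; this toy version just pulls the
    (x, y) channels toward the 2D evidence and shrinks depth noise."""
    out = []
    for (x, y, z), (u, v) in zip(pose3d, kp2d):
        x += 0.5 * (u - x)   # condition on the 2D keypoint (context guidance)
        y += 0.5 * (v - y)
        z *= 0.9             # toy structural prior on depth
        out.append((x, y, z))
    return out


def reverse_diffusion(kp2d, steps=50, seed=0):
    """Reverse process: initialize the 3D pose from Gaussian noise, then
    denoise iteratively, conditioning every step on the 2D detections."""
    rng = random.Random(seed)
    pose3d = [(rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1))
              for _ in range(NUM_JOINTS)]
    for t in reversed(range(steps)):
        pose3d = toy_denoiser(pose3d, kp2d, t)
    return pose3d


kp2d = [(0.0, 0.0)] * NUM_JOINTS  # dummy 2D detections
pose = reverse_diffusion(kp2d)
```

After enough steps the (x, y) channels converge to the conditioning keypoints while the depth channel collapses toward the prior — a crude analogue of how the diffusion prior resolves reprojection ambiguity.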
Related papers
- UPose3D: Uncertainty-Aware 3D Human Pose Estimation with Cross-View and Temporal Cues [55.69339788566899]
UPose3D is a novel approach for multi-view 3D human pose estimation.
It improves robustness and flexibility without requiring direct 3D annotations.
arXiv Detail & Related papers (2024-04-23T00:18:00Z)
- RadOcc: Learning Cross-Modality Occupancy Knowledge through Rendering Assisted Distillation [50.35403070279804]
3D occupancy prediction is an emerging task that aims to estimate the occupancy states and semantics of 3D scenes using multi-view images.
We propose RadOcc, a Rendering assisted distillation paradigm for 3D Occupancy prediction.
arXiv Detail & Related papers (2023-12-19T03:39:56Z)
- Learn to Optimize Denoising Scores for 3D Generation: A Unified and Improved Diffusion Prior on NeRF and 3D Gaussian Splatting [60.393072253444934]
We propose a unified framework aimed at enhancing the diffusion priors for 3D generation tasks.
We identify a divergence between the diffusion priors and the training procedures of diffusion models that substantially impairs the quality of 3D generation.
arXiv Detail & Related papers (2023-12-08T03:55:34Z)
- StableDreamer: Taming Noisy Score Distillation Sampling for Text-to-3D [88.66678730537777]
We present StableDreamer, a methodology incorporating three advances.
First, we formalize the equivalence of the SDS generative prior and a simple supervised L2 reconstruction loss.
Second, our analysis shows that while image-space diffusion contributes to geometric precision, latent-space diffusion is crucial for vivid color rendition.
arXiv Detail & Related papers (2023-12-02T02:27:58Z)
- JOTR: 3D Joint Contrastive Learning with Transformers for Occluded Human Mesh Recovery [84.67823511418334]
This paper presents JOTR, a 3D JOint contrastive learning with TRansformers framework for handling occluded 3D human mesh recovery.
Our method includes an encoder-decoder transformer architecture that fuses 2D and 3D representations to achieve 2D & 3D aligned results.
arXiv Detail & Related papers (2023-07-31T02:58:58Z)
- Learning Scene Flow With Skeleton Guidance For 3D Action Recognition [1.5954459915735735]
This work demonstrates the use of 3D flow sequences by a deep temporal model for 3D action recognition.
An extended deep skeleton is also introduced to learn the most discriminant action motion dynamics.
A late fusion scheme is adopted between the two models for learning the high level cross-modal correlations.
arXiv Detail & Related papers (2023-06-23T04:14:25Z) - DiffuPose: Monocular 3D Human Pose Estimation via Denoising Diffusion
Probabilistic Model [25.223801390996435]
This paper focuses on reconstructing a 3D pose from a single 2D keypoint detection.
We build a novel diffusion-based framework to effectively sample diverse 3D poses from an off-the-shelf 2D detector.
We evaluate our method on the widely adopted Human3.6M and HumanEva-I datasets.
arXiv Detail & Related papers (2022-12-06T07:22:20Z) - (Fusionformer):Exploiting the Joint Motion Synergy with Fusion Network
Based On Transformer for 3D Human Pose Estimation [1.52292571922932]
Many previous methods lack the understanding of local joint information.cite8888987considers the temporal relationship of a single joint in this work.
Our proposed textbfFusionformer method introduces a global-temporal self-trajectory module and a cross-temporal self-trajectory module.
The results show an improvement of 2.4% MPJPE and 4.3% P-MPJPE on the Human3.6M dataset.
arXiv Detail & Related papers (2022-10-08T12:22:10Z) - Uncertainty-Aware Adaptation for Self-Supervised 3D Human Pose
Estimation [70.32536356351706]
We introduce MRP-Net that constitutes a common deep network backbone with two output heads subscribing to two diverse configurations.
We derive suitable measures to quantify prediction uncertainty at both pose and joint level.
We present a comprehensive evaluation of the proposed approach and demonstrate state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2022-03-29T07:14:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.