$\text{Di}^2\text{Pose}$: Discrete Diffusion Model for Occluded 3D Human Pose Estimation
- URL: http://arxiv.org/abs/2405.17016v1
- Date: Mon, 27 May 2024 10:01:36 GMT
- Title: $\text{Di}^2\text{Pose}$: Discrete Diffusion Model for Occluded 3D Human Pose Estimation
- Authors: Weiquan Wang, Jun Xiao, Chunping Wang, Wei Liu, Zhao Wang, Long Chen,
- Abstract summary: We introduce the Discrete Diffusion Pose ($\text{Di}^2\text{Pose}$), a novel framework designed for occluded 3D human pose estimation.
$\text{Di}^2\text{Pose}$ employs a two-stage process: it first converts 3D poses into a discrete representation through a \emph{pose quantization step}.
This design confines the search space to physically viable configurations.
- Score: 17.281031933210762
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Continuous diffusion models have demonstrated their effectiveness in addressing the inherent uncertainty and indeterminacy in monocular 3D human pose estimation (HPE). Despite their strengths, the need for large search spaces and the corresponding demand for substantial training data make these models prone to generating biomechanically unrealistic poses. This challenge is particularly noticeable in occlusion scenarios, where the complexity of inferring 3D structures from 2D images intensifies. In response to these limitations, we introduce the Discrete Diffusion Pose ($\text{Di}^2\text{Pose}$), a novel framework designed for occluded 3D HPE that capitalizes on the benefits of a discrete diffusion model. Specifically, $\text{Di}^2\text{Pose}$ employs a two-stage process: it first converts 3D poses into a discrete representation through a \emph{pose quantization step}, which is subsequently modeled in latent space through a \emph{discrete diffusion process}. This methodological innovation restrictively confines the search space towards physically viable configurations and enhances the model's capability to comprehend how occlusions affect human pose within the latent space. Extensive evaluations conducted on various benchmarks (e.g., Human3.6M, 3DPW, and 3DPW-Occ) have demonstrated its effectiveness.
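As a rough illustration of the two-stage pipeline described in the abstract, the sketch below quantizes per-joint pose features against a learned codebook and then corrupts the resulting discrete indices with an absorbing-state (mask-token) diffusion step. The joint count, codebook size, feature dimension, and the mask-token transition are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch (not the authors' code) of the two-stage idea in the abstract:
# (1) quantize a continuous 3D pose into discrete codebook indices,
# (2) corrupt those indices with a discrete (absorbing / mask-token) diffusion step.
# Joint count, codebook size, and the mask-token transition are assumptions.
import torch

NUM_JOINTS, CODEBOOK_SIZE, DIM = 17, 512, 64
MASK_TOKEN = CODEBOOK_SIZE  # extra index acting as the absorbing state

codebook = torch.randn(CODEBOOK_SIZE, DIM)  # learned in the pose-quantization stage

def quantize_pose(joint_features: torch.Tensor) -> torch.Tensor:
    """Map per-joint features (NUM_JOINTS, DIM) to nearest codebook indices."""
    dists = torch.cdist(joint_features, codebook)  # (NUM_JOINTS, CODEBOOK_SIZE)
    return dists.argmin(dim=-1)                    # discrete pose representation

def forward_diffuse(indices: torch.Tensor, t: float) -> torch.Tensor:
    """Replace each index with the mask token with probability t in [0, 1]."""
    mask = torch.rand_like(indices, dtype=torch.float) < t
    return torch.where(mask, torch.full_like(indices, MASK_TOKEN), indices)

# Usage: encode a pose, corrupt it at timestep t; a denoising network would then
# be trained to recover the original indices conditioned on the 2D observation.
feats = torch.randn(NUM_JOINTS, DIM)
z0 = quantize_pose(feats)
zt = forward_diffuse(z0, t=0.5)
```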
Related papers
- Mixed Diffusion for 3D Indoor Scene Synthesis [55.94569112629208]
We present MiDiffusion, a novel mixed discrete-continuous diffusion model architecture.
We represent a scene layout by a 2D floor plan and a set of objects, each defined by its category, location, size, and orientation.
Our experimental results demonstrate that MiDiffusion substantially outperforms state-of-the-art autoregressive and diffusion models in floor-conditioned 3D scene synthesis.
arXiv Detail & Related papers (2024-05-31T17:54:52Z) - ManiPose: Manifold-Constrained Multi-Hypothesis 3D Human Pose Estimation [54.86887812687023]
Most 3D-HPE methods rely on regression models, which assume a one-to-one mapping between inputs and outputs.
We propose ManiPose, a novel manifold-constrained multi-hypothesis model capable of proposing multiple candidate 3D poses for each 2D input.
Unlike previous multi-hypothesis approaches, our solution is completely supervised and does not rely on complex generative models.
arXiv Detail & Related papers (2023-12-11T13:50:10Z) - DiffHPE: Robust, Coherent 3D Human Pose Lifting with Diffusion [54.0238087499699]
We show that diffusion models enhance the accuracy, robustness, and coherence of human pose estimations.
We introduce DiffHPE, a novel strategy for harnessing diffusion models in 3D-HPE.
Our findings indicate that while standalone diffusion models provide commendable performance, their accuracy is even better in combination with supervised models.
arXiv Detail & Related papers (2023-09-04T12:54:10Z) - DiffPose: Multi-hypothesis Human Pose Estimation using Diffusion models [5.908471365011943]
We propose \emph{DiffPose}, a conditional diffusion model that predicts multiple hypotheses for a given input image.
We show that DiffPose slightly improves upon the state of the art for multi-hypothesis pose estimation for simple poses and outperforms it by a large margin for highly ambiguous poses.
arXiv Detail & Related papers (2022-11-29T18:55:13Z) - A generic diffusion-based approach for 3D human pose prediction in the wild [68.00961210467479]
3D human pose forecasting, i.e., predicting a sequence of future human 3D poses given a sequence of past observed ones, is a challenging spatio-temporal task.
We provide a unified formulation in which incomplete elements (no matter in the prediction or observation) are treated as noise and propose a conditional diffusion model that denoises them and forecasts plausible poses.
We evaluate our method on four standard datasets and obtain significant improvements over the state of the art.
arXiv Detail & Related papers (2022-10-11T17:59:54Z) - Uncertainty-Aware Adaptation for Self-Supervised 3D Human Pose Estimation [70.32536356351706]
We introduce MRP-Net that constitutes a common deep network backbone with two output heads subscribing to two diverse configurations.
We derive suitable measures to quantify prediction uncertainty at both pose and joint level.
We present a comprehensive evaluation of the proposed approach and demonstrate state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2022-03-29T07:14:58Z) - Kinematic-Structure-Preserved Representation for Unsupervised 3D Human Pose Estimation [58.72192168935338]
Generalizability of human pose estimation models developed using supervision on large-scale in-studio datasets remains questionable.
We propose a novel kinematic-structure-preserved unsupervised 3D pose estimation framework, which is not restrained by any paired or unpaired weak supervisions.
Our proposed model employs three consecutive differentiable transformations: forward kinematics, camera projection, and spatial-map transformation.
arXiv Detail & Related papers (2020-06-24T23:56:33Z)