Related papers: HyperDiff: Hypergraph Guided Diffusion Model for 3D Human Pose Estimation

URL: http://arxiv.org/abs/2508.14431v1
Date: Wed, 20 Aug 2025 05:03:55 GMT
Title: HyperDiff: Hypergraph Guided Diffusion Model for 3D Human Pose Estimation
Authors: Bing Han, Yuhua Huang, Pan Gao
Abstract summary: This paper introduces a novel 3D pose estimation method, HyperDiff, which integrates diffusion models with HyperGCN. Results demonstrate that HyperDiff achieves state-of-the-art performance on the Human3.6M and MPI-INF-3DHP datasets.
Score: 15.321095223060768
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Monocular 3D human pose estimation (HPE) often encounters challenges such as depth ambiguity and occlusion during the 2D-to-3D lifting process. Additionally, traditional methods may overlook multi-scale skeleton features when utilizing skeleton structure information, which can negatively impact the accuracy of pose estimation. To address these challenges, this paper introduces a novel 3D pose estimation method, HyperDiff, which integrates diffusion models with HyperGCN. The diffusion model effectively captures data uncertainty, alleviating depth ambiguity and occlusion. Meanwhile, HyperGCN, serving as a denoiser, employs multi-granularity structures to accurately model high-order correlations between joints. This improves the model's denoising capability especially for complex poses. Experimental results demonstrate that HyperDiff achieves state-of-the-art performance on the Human3.6M and MPI-INF-3DHP datasets and can flexibly adapt to varying computational resources to balance performance and efficiency.

Related papers

HDiffTG: A Lightweight Hybrid Diffusion-Transformer-GCN Architecture for 3D Human Pose Estimation [21.823965837699166]
HDiffTG is a novel 3D Human Pose Estimation (3DHPE) method that integrates a Transformer, a Graph Convolutional Network (GCN), and a diffusion model into a unified framework. We show that HDiffTG significantly improves pose estimation accuracy and robustness while maintaining a lightweight design.
arXiv Detail & Related papers (2025-05-07T09:26:37Z)
Benchmarking 3D Human Pose Estimation Models under Occlusions [6.858859328420893]
Human Pose Estimation (HPE) involves detecting and localizing keypoints on the human body from visual data. This paper presents a benchmark on the robustness of 3D HPE models under realistic occlusion conditions. We evaluate nine state-of-the-art 2D-to-3D HPE models, spanning convolutional, transformer-based, graph-based, and diffusion-based architectures.
arXiv Detail & Related papers (2025-04-14T16:00:25Z)
DeProPose: Deficiency-Proof 3D Human Pose Estimation via Adaptive Multi-View Fusion [57.83515140886807]
We introduce the task of Deficiency-Aware 3D Pose Estimation. DeProPose is a flexible method that simplifies the network architecture to reduce training complexity. We have developed a novel 3D human pose estimation dataset.
arXiv Detail & Related papers (2025-02-23T03:22:54Z)
A Lesson in Splats: Teacher-Guided Diffusion for 3D Gaussian Splats Generation with 2D Supervision [65.33043028101471]
We present a novel framework for training 3D image-conditioned diffusion models using only 2D supervision. Most existing 3D generative models rely on full 3D supervision, which is impractical due to the scarcity of large-scale 3D datasets.
arXiv Detail & Related papers (2024-12-01T00:29:57Z)
$\text{Di}^2\text{Pose}$: Discrete Diffusion Model for Occluded 3D Human Pose Estimation [17.281031933210762]
We introduce the Discrete Diffusion Pose ($\text{Di}^2\text{Pose}$), a novel framework designed for occluded 3D human pose estimation. $\text{Di}^2\text{Pose}$ employs a two-stage process: it first converts 3D poses into a discrete representation through a pose quantization step. This methodological innovation restrictively confines the search space towards physically viable configurations.
arXiv Detail & Related papers (2024-05-27T10:01:36Z)
DiffHPE: Robust, Coherent 3D Human Pose Lifting with Diffusion [54.0238087499699]
We show that diffusion models enhance the accuracy, robustness, and coherence of human pose estimations. We introduce DiffHPE, a novel strategy for harnessing diffusion models in 3D-HPE. Our findings indicate that while standalone diffusion models provide commendable performance, their accuracy is even better in combination with supervised models.
arXiv Detail & Related papers (2023-09-04T12:54:10Z)
Unsupervised 3D Pose Estimation with Non-Rigid Structure-from-Motion Modeling [83.76377808476039]
We propose a new modeling method for human pose deformations and design an accompanying diffusion-based motion prior. Inspired by the field of non-rigid structure-from-motion, we divide the task of reconstructing 3D human skeletons in motion into the estimation of a 3D reference skeleton and of the skeleton deformation of each frame. A mixed spatial-temporal NRSfMformer is used to estimate both simultaneously from a sequence of 2D observations.
arXiv Detail & Related papers (2023-08-18T16:41:57Z)
DiffuPose: Monocular 3D Human Pose Estimation via Denoising Diffusion Probabilistic Model [25.223801390996435]
This paper focuses on reconstructing a 3D pose from a single 2D keypoint detection. We build a novel diffusion-based framework to effectively sample diverse 3D poses from an off-the-shelf 2D detector. We evaluate our method on the widely adopted Human3.6M and HumanEva-I datasets.
arXiv Detail & Related papers (2022-12-06T07:22:20Z)
Uncertainty-Aware Adaptation for Self-Supervised 3D Human Pose Estimation [70.32536356351706]
We introduce MRP-Net that constitutes a common deep network backbone with two output heads subscribing to two diverse configurations. We derive suitable measures to quantify prediction uncertainty at both pose and joint level. We present a comprehensive evaluation of the proposed approach and demonstrate state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2022-03-29T07:14:58Z)
Synthetic Training for Monocular Human Mesh Recovery [100.38109761268639]
This paper aims to estimate the 3D mesh of multiple body parts with large-scale differences from a single RGB image. The main challenge is the lack of training data with complete 3D annotations of all body parts in 2D images. We propose a depth-to-scale (D2S) projection to incorporate the depth difference into the projection function to derive per-joint scale variants.
arXiv Detail & Related papers (2020-10-27T03:31:35Z)
Kinematic-Structure-Preserved Representation for Unsupervised 3D Human Pose Estimation [58.72192168935338]
The generalizability of human pose estimation models developed using supervision on large-scale in-studio datasets remains questionable. We propose a novel kinematic-structure-preserved unsupervised 3D pose estimation framework, which is not restrained by any paired or unpaired weak supervisions. Our proposed model employs three consecutive differentiable transformations: forward-kinematics, camera-projection, and spatial-map transformation.
arXiv Detail & Related papers (2020-06-24T23:56:33Z)

This list is automatically generated from the titles and abstracts of the papers on this site.