Related papers: UnPose: Uncertainty-Guided Diffusion Priors for Zero-Shot Pose Estimation

URL: http://arxiv.org/abs/2508.15972v1
Date: Thu, 21 Aug 2025 21:31:04 GMT
Title: UnPose: Uncertainty-Guided Diffusion Priors for Zero-Shot Pose Estimation
Authors: Zhaodong Jiang, Ashish Sinha, Tongtong Cao, Yuan Ren, Bingbing Liu, Binbin Xu
Abstract summary: UnPose is a framework for zero-shot, model-free 6D object pose estimation and reconstruction. It exploits 3D priors and uncertainty estimates from a pre-trained diffusion model. It significantly outperforms existing approaches in both 6D pose estimation accuracy and 3D reconstruction quality.
Score: 19.76147681894604
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Estimating the 6D pose of novel objects is a fundamental yet challenging problem in robotics, often relying on access to object CAD models. However, acquiring such models can be costly and impractical. Recent approaches aim to bypass this requirement by leveraging strong priors from foundation models to reconstruct objects from single or multi-view images, but typically require additional training or produce hallucinated geometry. To this end, we propose UnPose, a novel framework for zero-shot, model-free 6D object pose estimation and reconstruction that exploits 3D priors and uncertainty estimates from a pre-trained diffusion model. Specifically, starting from a single-view RGB-D frame, UnPose uses a multi-view diffusion model to estimate an initial 3D model using 3D Gaussian Splatting (3DGS) representation, along with pixel-wise epistemic uncertainty estimates. As additional observations become available, we incrementally refine the 3DGS model by fusing new views guided by the diffusion model's uncertainty, thereby continuously improving the pose estimation accuracy and 3D reconstruction quality. To ensure global consistency, the diffusion prior-generated views and subsequent observations are further integrated in a pose graph and jointly optimized into a coherent 3DGS field. Extensive experiments demonstrate that UnPose significantly outperforms existing approaches in both 6D pose estimation accuracy and 3D reconstruction quality. We further showcase its practical applicability in real-world robotic manipulation tasks.

Related papers

UA-Pose: Uncertainty-Aware 6D Object Pose Estimation and Online Object Completion with Partial References [14.762839788171584]
We propose UA-Pose, an uncertainty-aware approach for 6D object pose estimation and online object completion. We evaluate our method on the YCB-Video, YCBInEOAT, and HO3D datasets, including RGBD sequences of YCB objects manipulated by robots and human hands.
arXiv Detail & Related papers (2025-06-09T17:58:12Z)
Any6D: Model-free 6D Pose Estimation of Novel Objects [76.30057578269668]
We introduce Any6D, a model-free framework for 6D object pose estimation. It requires only a single RGB-D anchor image to estimate both the 6D pose and size of unknown objects in novel scenes. We evaluate our method on five challenging datasets.
arXiv Detail & Related papers (2025-03-24T13:46:21Z)
HIPPo: Harnessing Image-to-3D Priors for Model-free Zero-shot 6D Pose Estimation [23.451960895369517]
This work focuses on model-free zero-shot 6D object pose estimation for robotics applications. We propose a novel framework named HIPPo, which eliminates the need for curated CAD models and reference images. Our HIPPo Dreamer can generate a 3D mesh of any unseen objects from a single glance in just a few seconds.
arXiv Detail & Related papers (2025-02-14T23:44:26Z)
Sparse-view Pose Estimation and Reconstruction via Analysis by Generative Synthesis [25.898616784744377]
Given a sparse set of observed views, the observations may not provide sufficient direct evidence to obtain complete and accurate 3D. We propose SparseAGS, a method that adapts this analysis-by-synthesis approach by: a) including novel-view-synthesis-based generative priors in conjunction with photometric objectives to improve the quality of the inferred 3D, and b) explicitly reasoning about outliers and using a discrete search with a continuous optimization-based strategy to correct them.
arXiv Detail & Related papers (2024-12-04T18:59:24Z)
UPose3D: Uncertainty-Aware 3D Human Pose Estimation with Cross-View and Temporal Cues [55.69339788566899]
UPose3D is a novel approach for multi-view 3D human pose estimation. It improves robustness and flexibility without requiring direct 3D annotations.
arXiv Detail & Related papers (2024-04-23T00:18:00Z)
Zero123-6D: Zero-shot Novel View Synthesis for RGB Category-level 6D Pose Estimation [66.3814684757376]
This work presents Zero123-6D, the first work to demonstrate the utility of Diffusion Model-based novel-view-synthesizers in enhancing RGB 6D pose estimation at category-level. The outlined method shows reduction in data requirements, removal of the necessity of depth information in zero-shot category-level 6D pose estimation task, and increased performance, quantitatively demonstrated through experiments on the CO3D dataset.
arXiv Detail & Related papers (2024-03-21T10:38:18Z)
A generic diffusion-based approach for 3D human pose prediction in the wild [68.00961210467479]
3D human pose forecasting, i.e., predicting a sequence of future human 3D poses given a sequence of past observed ones, is a challenging spatio-temporal task. We provide a unified formulation in which incomplete elements (no matter in the prediction or observation) are treated as noise and propose a conditional diffusion model that denoises them and forecasts plausible poses. We investigate our findings on four standard datasets and obtain significant improvements over the state-of-the-art.
arXiv Detail & Related papers (2022-10-11T17:59:54Z)
NeRF-Pose: A First-Reconstruct-Then-Regress Approach for Weakly-supervised 6D Object Pose Estimation [44.42449011619408]
We present a weakly-supervised reconstruction-based pipeline, named NeRF-Pose, which needs only 2D object segmentation and known relative camera poses during training. A NeRF-enabled PnP+RANSAC algorithm is used to estimate stable and accurate pose from the predicted correspondences. Experiments on LineMod-Occlusion show that the proposed method has state-of-the-art accuracy in comparison to the best 6D pose estimation methods.
arXiv Detail & Related papers (2022-03-09T15:28:02Z)
Spatial Attention Improves Iterative 6D Object Pose Estimation [52.365075652976735]
We propose a new method for 6D pose estimation refinement from RGB images. Our main insight is that after the initial pose estimate, it is important to pay attention to distinct spatial features of the object. We experimentally show that this approach learns to attend to salient spatial features and learns to ignore occluded parts of the object, leading to better pose estimation across datasets.
arXiv Detail & Related papers (2021-01-05T17:18:52Z)
Kinematic-Structure-Preserved Representation for Unsupervised 3D Human Pose Estimation [58.72192168935338]
Generalizability of human pose estimation models developed using supervision on large-scale in-studio datasets remains questionable. We propose a novel kinematic-structure-preserved unsupervised 3D pose estimation framework, which is not restrained by any paired or unpaired weak supervisions. Our proposed model employs three consecutive differentiable transformations named as forward-kinematics, camera-projection and spatial-map transformation.
arXiv Detail & Related papers (2020-06-24T23:56:33Z)

This list is automatically generated from the titles and abstracts of the papers on this site.