HIPPo: Harnessing Image-to-3D Priors for Model-free Zero-shot 6D Pose Estimation
- URL: http://arxiv.org/abs/2502.10606v1
- Date: Fri, 14 Feb 2025 23:44:26 GMT
- Title: HIPPo: Harnessing Image-to-3D Priors for Model-free Zero-shot 6D Pose Estimation
- Authors: Yibo Liu, Zhaodong Jiang, Binbin Xu, Guile Wu, Yuan Ren, Tongtong Cao, Bingbing Liu, Rui Heng Yang, Amir Rasouli, Jinjun Shan,
- Abstract summary: This work focuses on model-free zero-shot 6D object pose estimation for robotics applications.
We propose a novel framework named HIPPo, which eliminates the need for curated CAD models and reference images.
Our HIPPo Dreamer can generate a 3D mesh of any unseen objects from a single glance in just a few seconds.
- Score: 23.451960895369517
- License:
- Abstract: This work focuses on model-free zero-shot 6D object pose estimation for robotics applications. While existing methods can estimate the precise 6D pose of objects, they heavily rely on curated CAD models or reference images, the preparation of which is a time-consuming and labor-intensive process. Moreover, in real-world scenarios, 3D models or reference images may not be available in advance and instant robot reaction is desired. In this work, we propose a novel framework named HIPPo, which eliminates the need for curated CAD models and reference images by harnessing image-to-3D priors from Diffusion Models, enabling model-free zero-shot 6D pose estimation. Specifically, we construct HIPPo Dreamer, a rapid image-to-mesh model built on a multiview Diffusion Model and a 3D reconstruction foundation model. Our HIPPo Dreamer can generate a 3D mesh of any unseen objects from a single glance in just a few seconds. Then, as more observations are acquired, we propose to continuously refine the diffusion prior mesh model by joint optimization of object geometry and appearance. This is achieved by a measurement-guided scheme that gradually replaces the plausible diffusion priors with more reliable online observations. Consequently, HIPPo can instantly estimate and track the 6D pose of a novel object and maintain a complete mesh for immediate robotic applications. Thorough experiments on various benchmarks show that HIPPo outperforms state-of-the-art methods in 6D object pose estimation when prior reference images are limited.
Related papers
- 6DOPE-GS: Online 6D Object Pose Estimation using Gaussian Splatting [7.7145084897748974]
We present 6DOPE-GS, a novel method for online 6D object pose estimation & tracking with a single RGB-D camera.
We show that 6DOPE-GS matches the performance of state-of-the-art baselines for model-free simultaneous 6D pose tracking and reconstruction.
We also demonstrate the method's suitability for live, dynamic object tracking and reconstruction in a real-world setting.
arXiv Detail & Related papers (2024-12-02T14:32:19Z) - Zero123-6D: Zero-shot Novel View Synthesis for RGB Category-level 6D Pose Estimation [66.3814684757376]
This work presents Zero123-6D, the first work to demonstrate the utility of Diffusion Model-based novel-view-synthesizers in enhancing RGB 6D pose estimation at category-level.
The outlined method shows reduction in data requirements, removal of the necessity of depth information in zero-shot category-level 6D pose estimation task, and increased performance, quantitatively demonstrated through experiments on the CO3D dataset.
arXiv Detail & Related papers (2024-03-21T10:38:18Z) - GigaPose: Fast and Robust Novel Object Pose Estimation via One Correspondence [64.77224422330737]
GigaPose is a fast, robust, and accurate method for CAD-based novel object pose estimation in RGB images.
Our approach samples templates in only a two-degrees-of-freedom space instead of the usual three.
It achieves state-of-the-art accuracy and can be seamlessly integrated with existing refinement methods.
arXiv Detail & Related papers (2023-11-23T18:55:03Z) - 6D Object Pose Estimation from Approximate 3D Models for Orbital
Robotics [19.64111218032901]
We present a novel technique to estimate the 6D pose of objects from single images.
We employ a dense 2D-to-3D correspondence predictor that regresses 3D model coordinates for every pixel.
Our method achieves state-of-the-art performance on the SPEED+ dataset and has won the SPEC2021 post-mortem competition.
arXiv Detail & Related papers (2023-03-23T13:18:05Z) - OnePose++: Keypoint-Free One-Shot Object Pose Estimation without CAD
Models [51.68715543630427]
OnePose relies on detecting repeatable image keypoints and is thus prone to failure on low-textured objects.
We propose a keypoint-free pose estimation pipeline to remove the need for repeatable keypoint detection.
A 2D-3D matching network directly establishes 2D-3D correspondences between the query image and the reconstructed point-cloud model.
arXiv Detail & Related papers (2023-01-18T17:47:13Z) - MegaPose: 6D Pose Estimation of Novel Objects via Render & Compare [84.80956484848505]
MegaPose is a method to estimate the 6D pose of novel objects, that is, objects unseen during training.
We present a 6D pose refiner based on a render&compare strategy which can be applied to novel objects.
Second, we introduce a novel approach for coarse pose estimation which leverages a network trained to classify whether the pose error between a synthetic rendering and an observed image of the same object can be corrected by the refiner.
arXiv Detail & Related papers (2022-12-13T19:30:03Z) - Shape, Pose, and Appearance from a Single Image via Bootstrapped
Radiance Field Inversion [54.151979979158085]
We introduce a principled end-to-end reconstruction framework for natural images, where accurate ground-truth poses are not available.
We leverage an unconditional 3D-aware generator, to which we apply a hybrid inversion scheme where a model produces a first guess of the solution.
Our framework can de-render an image in as few as 10 steps, enabling its use in practical scenarios.
arXiv Detail & Related papers (2022-11-21T17:42:42Z) - Learning 6D Pose Estimation from Synthetic RGBD Images for Robotic
Applications [0.6299766708197883]
The proposed pipeline can efficiently generate large amounts of photo-realistic RGBD images for the object of interest.
We develop a real-time two-stage 6D pose estimation approach by integrating the object detector YOLO-V4-tiny and the 6D pose estimation algorithm PVN3D.
The resulting network shows competitive performance compared to state-of-the-art methods when evaluated on LineMod dataset.
arXiv Detail & Related papers (2022-08-30T14:17:15Z) - CenterSnap: Single-Shot Multi-Object 3D Shape Reconstruction and
Categorical 6D Pose and Size Estimation [19.284468553414918]
This paper studies the complex task of simultaneous multi-object 3D reconstruction, 6D pose and size estimation from a single-view RGB-D observation.
Existing approaches mainly follow a complex multi-stage pipeline which first localizes and detects each object instance in the image and then regresses to either their 3D meshes or 6D poses.
We present a simple one-stage approach to predict both the 3D shape and estimate the 6D pose and size jointly in a bounding-box free manner.
arXiv Detail & Related papers (2022-03-03T18:59:04Z) - CPS++: Improving Class-level 6D Pose and Shape Estimation From Monocular
Images With Self-Supervised Learning [74.53664270194643]
Modern monocular 6D pose estimation methods can only cope with a handful of object instances.
We propose a novel method for class-level monocular 6D pose estimation, coupled with metric shape retrieval.
We experimentally demonstrate that we can retrieve precise 6D poses and metric shapes from a single RGB image.
arXiv Detail & Related papers (2020-03-12T15:28:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.