POPE: 6-DoF Promptable Pose Estimation of Any Object, in Any Scene, with
One Reference
- URL: http://arxiv.org/abs/2305.15727v1
- Date: Thu, 25 May 2023 05:19:17 GMT
- Title: POPE: 6-DoF Promptable Pose Estimation of Any Object, in Any Scene, with
One Reference
- Authors: Zhiwen Fan, Panwang Pan, Peihao Wang, Yifan Jiang, Dejia Xu, Hanwen
Jiang, Zhangyang Wang
- Abstract summary: We propose a general paradigm for object pose estimation, called Promptable Object Pose Estimation (POPE).
POPE enables zero-shot 6DoF object pose estimation for any target object in any scene, while only a single reference is adopted as the support view.
Comprehensive experimental results demonstrate that POPE exhibits unrivaled robust performance in zero-shot settings.
- Score: 72.32413378065053
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the significant progress in six degrees-of-freedom (6DoF) object pose
estimation, existing methods have limited applicability in real-world scenarios
involving embodied agents and downstream 3D vision tasks. These limitations
mainly come from the necessity of 3D models, closed-category detection, and a
large number of densely annotated support views. To mitigate this issue, we
propose a general paradigm for object pose estimation, called Promptable Object
Pose Estimation (POPE). The proposed approach POPE enables zero-shot 6DoF
object pose estimation for any target object in any scene, while only a single
reference is adopted as the support view. To achieve this, POPE leverages a
pre-trained large-scale 2D foundation model within a framework that combines
hierarchical feature representations with 3D geometric principles. Moreover,
it estimates the relative camera pose between object prompts and the target
object in new views, enabling both two-view and multi-view 6DoF pose estimation
tasks. Comprehensive experimental results demonstrate that POPE exhibits
unrivaled robust performance in zero-shot settings, by achieving a significant
reduction in the averaged Median Pose Error by 52.38% and 50.47% on the LINEMOD
and OnePose datasets, respectively. We also conduct more challenging tests on
casually captured images (see Figure 1), which further demonstrate the
robustness of POPE. The project page can be found at
https://paulpanwang.github.io/POPE/.
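At its core, POPE's two-view setting reduces to classical relative pose recovery from 2D-2D correspondences. Below is a minimal numpy sketch of that underlying geometry using the textbook eight-point algorithm; it is only an illustration of the principle, not POPE's actual pipeline, which obtains its matches from hierarchical foundation-model features and adds robust estimation on top.

```python
import numpy as np

def relative_pose(pts_ref, pts_tgt, K):
    """Recover relative rotation R and unit-scale translation t of the target
    view w.r.t. the reference view from matched pixel keypoints, via the
    classical eight-point algorithm (a sketch of the two-view geometry POPE
    builds on, not the paper's method)."""
    Kinv = np.linalg.inv(K)
    def to_cam(p):  # pixels -> normalized camera coordinates
        q = np.hstack([p, np.ones((len(p), 1))]) @ Kinv.T
        return q[:, :2] / q[:, 2:3]
    x1, x2 = to_cam(pts_ref), to_cam(pts_tgt)
    # Eight-point: each match contributes one row of A with A @ vec(E) = 0
    A = np.column_stack([
        x2[:, 0] * x1[:, 0], x2[:, 0] * x1[:, 1], x2[:, 0],
        x2[:, 1] * x1[:, 0], x2[:, 1] * x1[:, 1], x2[:, 1],
        x1[:, 0], x1[:, 1], np.ones(len(x1))])
    E = np.linalg.svd(A)[2][-1].reshape(3, 3)
    U, _, Vt = np.linalg.svd(E)
    if np.linalg.det(U) < 0: U = -U
    if np.linalg.det(Vt) < 0: Vt = -Vt
    W = np.array([[0., -1, 0], [1, 0, 0], [0, 0, 1]])
    def depths(R, t, i):  # triangulated depths of match i in both views
        h1, h2 = np.append(x1[i], 1.0), np.append(x2[i], 1.0)
        M = np.column_stack([R @ h1, -h2])
        return np.linalg.lstsq(M, -t, rcond=None)[0]
    # Of the four candidate decompositions of E, keep the one that places
    # the most points in front of both cameras (cheirality check)
    best, best_score = None, -1
    for R in (U @ W @ Vt, U @ W.T @ Vt):
        for t in (U[:, 2], -U[:, 2]):
            score = sum(np.all(depths(R, t, i) > 0) for i in range(len(x1)))
            if score > best_score:
                best, best_score = (R, t), score
    return best
```

Note that t is recoverable only up to scale from two views, which is why such methods report a unit translation direction.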
Related papers
- Omni6DPose: A Benchmark and Model for Universal 6D Object Pose Estimation and Tracking [9.365544189576363]
6D Object Pose Estimation is a crucial yet challenging task in computer vision, suffering from a significant lack of large-scale datasets.
This paper introduces Omni6DPose, a dataset characterized by its diversity in object categories, large scale, and variety in object materials.
We introduce GenPose++, an enhanced version of the SOTA category-level pose estimation framework, incorporating two pivotal improvements.
arXiv Detail & Related papers (2024-06-06T17:57:20Z)
- LocaliseBot: Multi-view 3D object localisation with differentiable rendering for robot grasping [9.690844449175948]
We focus on object pose estimation.
Our approach relies on three pieces of information: multiple views of the object, the camera's parameters at those viewpoints, and 3D CAD models of objects.
We show that the estimated object pose results in 99.65% grasp accuracy with the ground truth grasp candidates.
arXiv Detail & Related papers (2023-11-14T14:27:53Z)
- 3D-Aware Hypothesis & Verification for Generalizable Relative Object Pose Estimation [69.73691477825079]
We present a new hypothesis-and-verification framework to tackle the problem of generalizable object pose estimation.
To measure reliability, we introduce a 3D-aware verification that explicitly applies 3D transformations to the 3D object representations learned from the two input images.
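The idea of scoring a pose hypothesis by transforming one view's 3D representation and checking agreement with the other's can be sketched as follows. All names and the scoring form here are assumptions for illustration, not the paper's actual learned verification network:

```python
import numpy as np

def verify_hypothesis(feats_ref, pts_ref, feats_tgt, pts_tgt, R):
    """Score a rotation hypothesis R by rotating the 3D points learned from
    the reference image and measuring feature agreement with the target's
    3D representation (illustrative stand-in for a learned verifier)."""
    rotated = pts_ref @ R.T
    # For each rotated reference point, find its nearest target point
    d = np.linalg.norm(rotated[:, None, :] - pts_tgt[None, :, :], axis=-1)
    nn = d.argmin(axis=1)
    # Agreement: feature similarity of matched points, downweighted by
    # their residual 3D distance; a good hypothesis aligns both
    sim = (feats_ref * feats_tgt[nn]).sum(axis=1)
    return (sim * np.exp(-d[np.arange(len(nn)), nn])).mean()
```

In a hypothesis-and-verification loop, one would evaluate this score over sampled rotations and keep the argmax.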
arXiv Detail & Related papers (2023-10-05T13:34:07Z)
- Learning to Estimate 6DoF Pose from Limited Data: A Few-Shot, Generalizable Approach using RGB Images [60.0898989456276]
We present a new framework named Cas6D for few-shot 6DoF pose estimation that is generalizable and uses only RGB images.
To address the false positives of target object detection in the extreme few-shot setting, our framework utilizes a self-supervised pre-trained ViT to learn robust feature representations.
Experimental results on the LINEMOD and GenMOP datasets demonstrate that Cas6D outperforms state-of-the-art methods by 9.2% and 3.8% accuracy (Proj-5) under the 32-shot setting.
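The Proj-5 accuracy cited above is commonly computed from the mean 2D projection error between the estimated and ground-truth poses; under that reading (an assumption here, not a detail given in the summary), it can be sketched as:

```python
import numpy as np

def mean_projection_error(pts3d, pose_gt, pose_est, K):
    """Mean 2D projection error in pixels between two poses: the quantity
    behind Proj-N accuracy, where a pose counts as correct under Proj-5
    when this error is below 5 px."""
    def project(pose):
        R, t = pose
        cam = pts3d @ R.T + t          # object points in camera frame
        uv = cam @ K.T                 # apply intrinsics
        return uv[:, :2] / uv[:, 2:3]  # perspective divide
    diff = project(pose_gt) - project(pose_est)
    return np.linalg.norm(diff, axis=1).mean()
```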
arXiv Detail & Related papers (2023-06-13T07:45:42Z)
- Rigidity-Aware Detection for 6D Object Pose Estimation [60.88857851869196]
Most recent 6D object pose estimation methods first use object detection to obtain 2D bounding boxes before actually regressing the pose.
We propose a rigidity-aware detection method exploiting the fact that, in 6D pose estimation, the target objects are rigid.
Key to the success of our approach is a visibility map, which we propose to build using a minimum barrier distance between every pixel in the bounding box and the box boundary.
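The minimum barrier distance behind that visibility map can be approximated with a Dijkstra-style propagation: the barrier cost of a path is max(I) - min(I) along it, and each pixel takes the smallest such cost over paths from the box boundary, so pixels whose appearance differs from the boundary stand out. A minimal sketch (illustrative only; the paper's exact construction may differ):

```python
import heapq
import numpy as np

def min_barrier_distance(img, seeds):
    """Dijkstra-style approximation of the minimum barrier distance from a
    set of seed pixels (e.g. the bounding-box boundary) on a grayscale
    image. Path cost = max intensity - min intensity along the path."""
    h, w = img.shape
    dist = np.full((h, w), np.inf)
    hi = np.zeros((h, w))   # running path maximum
    lo = np.zeros((h, w))   # running path minimum
    pq = []
    for r, c in seeds:
        dist[r, c] = 0.0
        hi[r, c] = lo[r, c] = img[r, c]
        heapq.heappush(pq, (0.0, r, c))
    while pq:
        d, r, c = heapq.heappop(pq)
        if d > dist[r, c]:
            continue  # stale queue entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w:
                nhi = max(hi[r, c], img[nr, nc])
                nlo = min(lo[r, c], img[nr, nc])
                nd = nhi - nlo
                if nd < dist[nr, nc]:
                    dist[nr, nc] = nd
                    hi[nr, nc], lo[nr, nc] = nhi, nlo
                    heapq.heappush(pq, (nd, nr, nc))
    return dist
```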
arXiv Detail & Related papers (2023-03-22T09:02:54Z)
- GPV-Pose: Category-level Object Pose Estimation via Geometry-guided Point-wise Voting [103.74918834553249]
GPV-Pose is a novel framework for robust category-level pose estimation.
It harnesses geometric insights to enhance the learning of category-level pose-sensitive features.
It produces superior results to state-of-the-art competitors on common public benchmarks.
arXiv Detail & Related papers (2022-03-15T13:58:50Z)
- Single-stage Keypoint-based Category-level Object Pose Estimation from an RGB Image [27.234658117816103]
We propose a single-stage, keypoint-based approach for category-level object pose estimation.
The proposed network performs 2D object detection, detects 2D keypoints, estimates 6-DoF pose, and regresses relative bounding cuboid dimensions.
We conduct extensive experiments on the challenging Objectron benchmark, outperforming state-of-the-art methods on the 3D IoU metric.
arXiv Detail & Related papers (2021-09-13T17:55:00Z)
- CosyPose: Consistent multi-view multi-object 6D pose estimation [48.097599674329004]
First, we present a single-view single-object 6D pose estimation method, which we use to generate 6D object pose hypotheses.
Second, we develop a robust method for matching individual 6D object pose hypotheses across different input images.
Third, we develop a method for global scene refinement given multiple object hypotheses and their correspondences across views.
arXiv Detail & Related papers (2020-08-19T14:11:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.