Learning to Estimate 6DoF Pose from Limited Data: A Few-Shot,
Generalizable Approach using RGB Images
- URL: http://arxiv.org/abs/2306.07598v1
- Date: Tue, 13 Jun 2023 07:45:42 GMT
- Title: Learning to Estimate 6DoF Pose from Limited Data: A Few-Shot,
Generalizable Approach using RGB Images
- Authors: Panwang Pan, Zhiwen Fan, Brandon Y. Feng, Peihao Wang, Chenxin Li,
Zhangyang Wang
- Abstract summary: We present a new framework named Cas6D for few-shot 6DoF pose estimation that is generalizable and uses only RGB images.
To address the false positives of target object detection in the extreme few-shot setting, our framework utilizes a self-supervised pre-trained ViT to learn robust feature representations.
Experimental results on the LINEMOD and GenMOP datasets demonstrate that Cas6D outperforms state-of-the-art methods by 9.2% and 3.8% accuracy (Proj-5) under the 32-shot setting.
- Score: 60.0898989456276
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The accurate estimation of six degrees-of-freedom (6DoF) object poses is
essential for many applications in robotics and augmented reality. However,
existing methods for 6DoF pose estimation often depend on CAD templates or
dense support views, restricting their usefulness in realworld situations. In
this study, we present a new cascade framework named Cas6D for few-shot 6DoF
pose estimation that is generalizable and uses only RGB images. To address the
false positives of target object detection in the extreme few-shot setting, our
framework utilizes a selfsupervised pre-trained ViT to learn robust feature
representations. Then, we initialize the nearest top-K pose candidates based on
similarity score and refine the initial poses using feature pyramids to
formulate and update the cascade warped feature volume, which encodes context
at increasingly finer scales. By discretizing the pose search range using
multiple pose bins and progressively narrowing the pose search range in each
stage using predictions from the previous stage, Cas6D can overcome the large
gap between pose candidates and ground truth poses, which is a common failure
mode in sparse-view scenarios. Experimental results on the LINEMOD and GenMOP
datasets demonstrate that Cas6D outperforms state-of-the-art methods by 9.2%
and 3.8% accuracy (Proj-5) under the 32-shot setting compared to OnePose++ and
Gen6D.
Related papers
- ADen: Adaptive Density Representations for Sparse-view Camera Pose Estimation [17.097170273209333]
Recovering camera poses from a set of images is a foundational task in 3D computer vision.
Recent data-driven approaches aim to directly output camera poses, either through regressing the 6DoF camera poses or formulating rotation as a probability distribution.
We propose ADen to unify the two frameworks by employing a generator and a discriminator.
arXiv Detail & Related papers (2024-08-16T22:45:46Z) - POPE: 6-DoF Promptable Pose Estimation of Any Object, in Any Scene, with
One Reference [72.32413378065053]
We propose a general paradigm for object pose estimation, called Promptable Object Pose Estimation (POPE)
POPE enables zero-shot 6DoF object pose estimation for any target object in any scene, while only a single reference is adopted as the support view.
Comprehensive experimental results demonstrate that POPE exhibits unrivaled robust performance in zero-shot settings.
arXiv Detail & Related papers (2023-05-25T05:19:17Z) - MV6D: Multi-View 6D Pose Estimation on RGB-D Frames Using a Deep
Point-wise Voting Network [14.754297065772676]
We present a novel multi-view 6D pose estimation method called MV6D.
We base our approach on the PVN3D network that uses a single RGB-D image to predict keypoints of the target objects.
In contrast to current multi-view pose detection networks such as CosyPose, our MV6D can learn the fusion of multiple perspectives in an end-to-end manner.
arXiv Detail & Related papers (2022-08-01T23:34:43Z) - Unseen Object 6D Pose Estimation: A Benchmark and Baselines [62.8809734237213]
We propose a new task that enables and facilitates algorithms to estimate the 6D pose estimation of novel objects during testing.
We collect a dataset with both real and synthetic images and up to 48 unseen objects in the test set.
By training an end-to-end 3D correspondences network, our method finds corresponding points between an unseen object and a partial view RGBD image accurately and efficiently.
arXiv Detail & Related papers (2022-06-23T16:29:53Z) - Coupled Iterative Refinement for 6D Multi-Object Pose Estimation [64.7198752089041]
Given a set of known 3D objects and an RGB or RGB-D input image, we detect and estimate the 6D pose of each object.
Our approach iteratively refines both pose and correspondence in a tightly coupled manner, allowing us to dynamically remove outliers to improve accuracy.
arXiv Detail & Related papers (2022-04-26T18:00:08Z) - FS6D: Few-Shot 6D Pose Estimation of Novel Objects [116.34922994123973]
6D object pose estimation networks are limited in their capability to scale to large numbers of object instances.
In this work, we study a new open set problem; the few-shot 6D object poses estimation: estimating the 6D pose of an unknown object by a few support views without extra training.
arXiv Detail & Related papers (2022-03-28T10:31:29Z) - NeRF-Pose: A First-Reconstruct-Then-Regress Approach for
Weakly-supervised 6D Object Pose Estimation [44.42449011619408]
We present a weakly-supervised reconstruction-based pipeline, named NeRF-Pose, which needs only 2D object segmentation and known relative camera poses during training.
A NeRF-enabled RAN+SAC algorithm is used to estimate stable and accurate pose from the predicted correspondences.
Experiments on LineMod-Occlusion show that the proposed method has state-of-the-art accuracy in comparison to the best 6D pose estimation methods.
arXiv Detail & Related papers (2022-03-09T15:28:02Z) - Adversarial samples for deep monocular 6D object pose estimation [16.308526930732718]
Estimating object 6D pose from an RGB image is important for many real-world applications such as autonomous driving and robotic grasping.
We study adversarial samples that can fool state-of-the-art (SOTA) deep learning based 6D pose estimation models.
arXiv Detail & Related papers (2022-03-01T09:16:37Z) - SO-Pose: Exploiting Self-Occlusion for Direct 6D Pose Estimation [98.83762558394345]
SO-Pose is a framework for regressing all 6 degrees-of-freedom (6DoF) for the object pose in a cluttered environment from a single RGB image.
We introduce a novel reasoning about self-occlusion, in order to establish a two-layer representation for 3D objects.
Cross-layer consistencies that align correspondences, self-occlusion and 6D pose, we can further improve accuracy and robustness.
arXiv Detail & Related papers (2021-08-18T19:49:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.