GDR-Net: Geometry-Guided Direct Regression Network for Monocular 6D
Object Pose Estimation
- URL: http://arxiv.org/abs/2102.12145v1
- Date: Wed, 24 Feb 2021 09:11:31 GMT
- Title: GDR-Net: Geometry-Guided Direct Regression Network for Monocular 6D
Object Pose Estimation
- Authors: Gu Wang, Fabian Manhardt, Federico Tombari, Xiangyang Ji
- Abstract summary: 6D pose estimation from a single RGB image is a fundamental task in computer vision.
We propose a simple yet effective Geometry-guided Direct Regression Network (GDR-Net) to learn the 6D pose in an end-to-end manner.
Our approach remarkably outperforms state-of-the-art methods on LM, LM-O and YCB-V datasets.
- Score: 71.83992173720311
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: 6D pose estimation from a single RGB image is a fundamental task in computer
vision. The current top-performing deep learning-based methods rely on an
indirect strategy, i.e., first establishing 2D-3D correspondences between the
coordinates in the image plane and object coordinate system, and then applying
a variant of the PnP/RANSAC algorithm. However, this two-stage pipeline is
not end-to-end trainable and is therefore hard to employ for many tasks requiring
differentiable poses. On the other hand, methods based on direct regression are
currently inferior to geometry-based methods. In this work, we perform an
in-depth investigation on both direct and indirect methods, and propose a
simple yet effective Geometry-guided Direct Regression Network (GDR-Net) to
learn the 6D pose in an end-to-end manner from dense correspondence-based
intermediate geometric representations. Extensive experiments show that our
approach remarkably outperforms state-of-the-art methods on LM, LM-O and YCB-V
datasets. The code will be available at https://git.io/GDR-Net.
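To make the indirect strategy the abstract contrasts against concrete, here is a minimal, hypothetical sketch: given noise-free dense 2D-3D correspondences, the pose can be recovered with a PnP-style solver. A simple Direct Linear Transform is used below as a stand-in for the PnP/RANSAC variants real pipelines employ; all names and numbers are illustrative, not GDR-Net's implementation.

```python
import numpy as np

def dlt_pose(pts3d, pts2d, K):
    """Recover (R, t) from noise-free 2D-3D correspondences with the
    Direct Linear Transform -- a minimal stand-in for a PnP solver."""
    n = pts3d.shape[0]
    A = np.zeros((2 * n, 12))
    for i, (X, x) in enumerate(zip(pts3d, pts2d)):
        Xh = np.append(X, 1.0)               # homogeneous 3D point
        u, v = x
        A[2 * i, 0:4] = Xh
        A[2 * i, 8:12] = -u * Xh
        A[2 * i + 1, 4:8] = Xh
        A[2 * i + 1, 8:12] = -v * Xh
    _, _, Vt = np.linalg.svd(A)
    P = Vt[-1].reshape(3, 4)                 # projection matrix, up to scale
    M = np.linalg.inv(K) @ P
    scale = 1.0 / np.linalg.norm(M[2, :3])   # rows of R have unit norm
    if np.linalg.det(M[:, :3]) < 0:
        scale = -scale
    M = scale * M
    U, _, Vt = np.linalg.svd(M[:, :3])       # project onto SO(3)
    R = U @ Vt
    return R, M[:, 3]

# Synthetic check: project object points with a known pose, then recover it.
rng = np.random.default_rng(0)
K = np.array([[600.0, 0.0, 320.0], [0.0, 600.0, 240.0], [0.0, 0.0, 1.0]])
angle = np.deg2rad(25.0)
R_gt = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                 [np.sin(angle),  np.cos(angle), 0.0],
                 [0.0, 0.0, 1.0]])
t_gt = np.array([0.1, -0.05, 0.8])
pts3d = rng.uniform(-0.1, 0.1, size=(12, 3))   # object-space model points
cam = (R_gt @ pts3d.T).T + t_gt
proj = (K @ cam.T).T
pts2d = proj[:, :2] / proj[:, 2:3]             # dense 2D-3D correspondences
R_est, t_est = dlt_pose(pts3d, pts2d, K)
```

Because the SVD-based solve is not differentiable in a numerically friendly way and discards correspondence confidence, GDR-Net instead regresses the pose directly from such dense geometric maps.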
Related papers
- Towards Human-Level 3D Relative Pose Estimation: Generalizable, Training-Free, with Single Reference [62.99706119370521]
Humans can easily deduce the relative pose of an unseen object, without label/training, given only a single query-reference image pair.
We propose a novel 3D generalizable relative pose estimation method by elaborating (i) with a 2.5D shape from an RGB-D reference, (ii) with an off-the-shelf differentiable renderer, and (iii) with semantic cues from a pretrained model like DINOv2.
arXiv Detail & Related papers (2024-06-26T16:01:10Z)
- RDPN6D: Residual-based Dense Point-wise Network for 6Dof Object Pose Estimation Based on RGB-D Images [13.051302134031808]
We introduce a novel method for calculating the 6DoF pose of an object using a single RGB-D image.
Unlike existing methods that either directly predict objects' poses or rely on sparse keypoints for pose recovery, our approach addresses this challenging task using dense correspondence.
arXiv Detail & Related papers (2024-05-14T10:10:45Z)
- RGB-based Category-level Object Pose Estimation via Decoupled Metric Scale Recovery [72.13154206106259]
We propose a novel pipeline that decouples the 6D pose and size estimation to mitigate the influence of imperfect scales on rigid transformations.
Specifically, we leverage a pre-trained monocular estimator to extract local geometric information.
A separate branch is designed to directly recover the metric scale of the object based on category-level statistics.
arXiv Detail & Related papers (2023-09-19T02:20:26Z)
- Pseudo Flow Consistency for Self-Supervised 6D Object Pose Estimation [14.469317161361202]
We propose a 6D object pose estimation method that can be trained with pure RGB images without any auxiliary information.
We evaluate our method on three challenging datasets and demonstrate that it outperforms state-of-the-art self-supervised methods significantly.
arXiv Detail & Related papers (2023-08-19T13:52:18Z)
- Geometric-aware Pretraining for Vision-centric 3D Object Detection [77.7979088689944]
We propose a novel geometric-aware pretraining framework called GAPretrain.
GAPretrain serves as a plug-and-play solution that can be flexibly applied to multiple state-of-the-art detectors.
We achieve 46.2 mAP and 55.5 NDS on the nuScenes val set using the BEVFormer method, with a gain of 2.7 and 2.1 points, respectively.
arXiv Detail & Related papers (2023-04-06T14:33:05Z)
- Unseen Object 6D Pose Estimation: A Benchmark and Baselines [62.8809734237213]
We propose a new task that enables and facilitates algorithms to estimate the 6D pose of novel objects during testing.
We collect a dataset with both real and synthetic images and up to 48 unseen objects in the test set.
By training an end-to-end 3D correspondences network, our method finds corresponding points between an unseen object and a partial view RGBD image accurately and efficiently.
arXiv Detail & Related papers (2022-06-23T16:29:53Z)
- Coupled Iterative Refinement for 6D Multi-Object Pose Estimation [64.7198752089041]
Given a set of known 3D objects and an RGB or RGB-D input image, we detect and estimate the 6D pose of each object.
Our approach iteratively refines both pose and correspondence in a tightly coupled manner, allowing us to dynamically remove outliers to improve accuracy.
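The paper's coupled refinement uses learned updates on RGB(-D) inputs; as a loose, hypothetical illustration of the underlying idea only (alternating pose fitting with outlier removal), the toy sketch below works on 3D-3D correspondences with the classical Kabsch algorithm, not the authors' method.

```python
import numpy as np

def kabsch(src, dst):
    """Least-squares rigid transform (R, t) mapping src points onto dst."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T   # guard against reflections
    return R, cd - R @ cs

def refine_pose(src, dst, iters=5):
    """Alternate pose fitting and residual-based outlier rejection,
    a toy analogue of coupling refinement with correspondence pruning."""
    inliers = np.ones(len(src), dtype=bool)
    for _ in range(iters):
        R, t = kabsch(src[inliers], dst[inliers])
        res = np.linalg.norm((R @ src.T).T + t - dst, axis=1)
        inliers = res <= np.quantile(res, 0.6)   # trim to the best 60%
    return kabsch(src[inliers], dst[inliers])

# Synthetic check: 30 correspondences, the first 5 corrupted as outliers.
rng = np.random.default_rng(1)
src = rng.uniform(-1.0, 1.0, size=(30, 3))
angle = 0.7
R_gt = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                 [np.sin(angle),  np.cos(angle), 0.0],
                 [0.0, 0.0, 1.0]])
t_gt = np.array([0.2, -0.1, 0.3])
dst = (R_gt @ src.T).T + t_gt
dst[:5] += np.array([2.0, -1.5, 1.0])   # gross outliers
R_est, t_est = refine_pose(src, dst)
```

A single least-squares fit over all points is skewed by the corrupted correspondences; re-fitting on the trimmed set recovers the true pose, which is the intuition behind dynamically removing outliers during refinement.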
arXiv Detail & Related papers (2022-04-26T18:00:08Z)
- DGECN: A Depth-Guided Edge Convolutional Network for End-to-End 6D Pose Estimation [19.303780745324502]
We propose a Depth-Guided Edge Convolutional Network (DGECN) for the 6D pose estimation task.
We take advantage of estimated depth information to guide both the correspondence-extraction process and the cascaded differentiable RANSAC algorithm with geometric information.
Experiments demonstrate that our proposed network outperforms current works on both effectiveness and efficiency.
arXiv Detail & Related papers (2022-04-21T09:19:50Z)
- 6D Rotation Representation For Unconstrained Head Pose Estimation [2.1485350418225244]
We address the problem of ambiguous rotation labels by introducing the rotation matrix formalism for our ground truth data.
This way, our method can learn the full rotation appearance, in contrast to previous approaches that restrict the pose prediction to a narrow angular range.
Experiments on the public AFLW2000 and BIWI datasets demonstrate that our proposed method significantly outperforms other state-of-the-art methods by up to 20%.
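Rotation-matrix targets of this kind are commonly regressed through the continuous 6D representation of Zhou et al. (the first two matrix columns, re-orthonormalized by Gram-Schmidt). The abstract does not spell out the exact parametrization, so the following is a sketch of that standard representation, not necessarily the paper's precise formulation.

```python
import numpy as np

def rot6d_to_matrix(r6):
    """Map a 6D vector (two stacked 3-vectors) to a rotation matrix via
    Gram-Schmidt -- a continuous representation well suited to regression."""
    a1, a2 = r6[:3], r6[3:]
    b1 = a1 / np.linalg.norm(a1)
    b2 = a2 - np.dot(b1, a2) * b1      # remove the component along b1
    b2 = b2 / np.linalg.norm(b2)
    b3 = np.cross(b1, b2)              # completes a right-handed frame
    return np.stack([b1, b2, b3], axis=1)   # b1, b2, b3 as columns

def matrix_to_rot6d(R):
    """Inverse map: keep the first two columns of R."""
    return R[:, :2].T.reshape(6)
```

Any 6D vector (with non-degenerate halves) maps to a valid rotation, and the map has no discontinuities, which is why it avoids the ambiguities of narrow-range Euler-angle labels.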
arXiv Detail & Related papers (2022-02-25T08:41:13Z)
- L6DNet: Light 6 DoF Network for Robust and Precise Object Pose Estimation with Small Datasets [0.0]
We propose a novel approach to perform 6 DoF object pose estimation from a single RGB-D image.
We adopt a hybrid pipeline in two stages: data-driven and geometric.
Our approach is more robust and accurate than state-of-the-art methods.
arXiv Detail & Related papers (2020-02-03T17:41:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.