Towards Two-view 6D Object Pose Estimation: A Comparative Study on
Fusion Strategy
- URL: http://arxiv.org/abs/2207.00260v1
- Date: Fri, 1 Jul 2022 08:22:34 GMT
- Title: Towards Two-view 6D Object Pose Estimation: A Comparative Study on
Fusion Strategy
- Authors: Jun Wu, Lilu Liu, Yue Wang, Rong Xiong
- Abstract summary: Current RGB-based 6D object pose estimation methods have achieved noticeable performance on datasets and real-world applications.
This paper proposes a framework for 6D object pose estimation that learns implicit 3D information from two RGB images.
- Score: 16.65699606802237
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Current RGB-based 6D object pose estimation methods have achieved
noticeable performance on datasets and real-world applications. However,
predicting 6D pose from single-view 2D image features is susceptible to
disturbance from environmental changes and from textureless or
similar-looking object surfaces. Hence, RGB-based methods generally achieve
less competitive results than RGBD-based methods, which deploy both image
features and 3D structure features. To narrow this performance gap, this
paper proposes a framework for 6D object pose estimation that learns implicit
3D information from two RGB images. Combining the learned 3D information and
2D image features, we establish more stable correspondences between the scene
and the object models. To find the approach that best utilizes 3D information
from RGB inputs, we investigate three different strategies: Early-Fusion,
Mid-Fusion, and Late-Fusion. We ascertain that Mid-Fusion is the best
approach for recovering the most precise 3D keypoints for object pose
estimation. Experiments show that our method outperforms state-of-the-art
RGB-based methods and achieves comparable results with RGBD-based methods.
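As a rough illustration of the three strategies compared in the abstract, the sketch below shows where in a two-view pipeline the views are merged: before encoding (Early), between the encoder and the prediction head (Mid), or after per-view prediction (Late). The linear encoder, linear head, toy dimensions, and simple averaging are placeholder assumptions, not the authors' architecture.

```python
# Illustrative sketch of Early-, Mid-, and Late-Fusion for two-view inputs.
# The linear encoder/head and averaging operator are placeholder assumptions.
import numpy as np

rng = np.random.default_rng(0)
D_IN, D_FEAT, D_OUT = 6, 8, 3          # toy dimensions, not the paper's
W_ENC = rng.standard_normal((D_IN, D_FEAT)) * 0.1
W_HEAD = rng.standard_normal((D_FEAT, D_OUT)) * 0.1

def encode(x):
    """Stand-in per-view feature encoder (a CNN backbone in practice)."""
    return np.tanh(x @ W_ENC)

def head(f):
    """Stand-in head mapping fused features to 3D keypoint coordinates."""
    return f @ W_HEAD

def early_fusion(va, vb):
    # Merge the raw inputs before any encoding.
    return head(encode(0.5 * (va + vb)))

def mid_fusion(va, vb):
    # Encode each view separately, then merge intermediate features --
    # the variant the paper found recovers the most precise 3D keypoints.
    return head(0.5 * (encode(va) + encode(vb)))

def late_fusion(va, vb):
    # Run the full pipeline per view and merge only the final predictions.
    return 0.5 * (head(encode(va)) + head(encode(vb)))

va = rng.standard_normal((1, D_IN))
vb = rng.standard_normal((1, D_IN))
print(early_fusion(va, vb).shape, mid_fusion(va, vb).shape,
      late_fusion(va, vb).shape)  # each (1, 3)
```

The only structural difference between the three variants is the point at which the two views are combined, which is exactly the axis the paper's comparison varies.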
Related papers
- Towards Human-Level 3D Relative Pose Estimation: Generalizable, Training-Free, with Single Reference [62.99706119370521]
Humans can easily deduce the relative pose of an unseen object, without label/training, given only a single query-reference image pair.
We propose a novel 3D generalizable relative pose estimation method by elaborating (i) with a 2.5D shape from an RGB-D reference, (ii) with an off-the-shelf differentiable renderer, and (iii) with semantic cues from a pretrained model like DINOv2.
arXiv Detail & Related papers (2024-06-26T16:01:10Z) - RDPN6D: Residual-based Dense Point-wise Network for 6Dof Object Pose Estimation Based on RGB-D Images [13.051302134031808]
We introduce a novel method for calculating the 6DoF pose of an object using a single RGB-D image.
Unlike existing methods that either directly predict objects' poses or rely on sparse keypoints for pose recovery, our approach addresses this challenging task using dense correspondence.
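Dense correspondences are typically turned into a pose with a closed-form or PnP-style solver. As a generic illustration (not RDPN6D's actual solver), the sketch below uses the Kabsch algorithm to recover a rigid 6DoF pose from 3D-3D point correspondences, such as those an RGB-D image provides once image pixels are back-projected with depth.

```python
# Generic illustration: rigid pose from dense 3D-3D correspondences via
# the Kabsch algorithm. Not the solver used by RDPN6D.
import numpy as np

def kabsch_pose(src, dst):
    """Recover R, t with dst ~= src @ R.T + t from (N, 3) correspondences."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)        # cross-covariance of centered sets
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_d - R @ mu_s
    return R, t

# Synthetic check: rotate/translate a point cloud and recover the pose.
rng = np.random.default_rng(1)
pts = rng.standard_normal((100, 3))
angle = 0.3
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.5, -0.2, 1.0])
R_est, t_est = kabsch_pose(pts, pts @ R_true.T + t_true)
print(np.allclose(R_est, R_true, atol=1e-6),
      np.allclose(t_est, t_true, atol=1e-6))  # True True
```

The dense setting simply supplies many such correspondences per object, which makes the least-squares fit well conditioned even when individual matches are noisy.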
arXiv Detail & Related papers (2024-05-14T10:10:45Z) - MatchU: Matching Unseen Objects for 6D Pose Estimation from RGB-D Images [57.71600854525037]
We propose a Fuse-Describe-Match strategy for 6D pose estimation from RGB-D images.
MatchU is a generic approach that fuses 2D texture and 3D geometric cues for 6D pose prediction of unseen objects.
arXiv Detail & Related papers (2024-03-03T14:01:03Z) - 3D Neural Embedding Likelihood: Probabilistic Inverse Graphics for
Robust 6D Pose Estimation [50.15926681475939]
Inverse graphics aims to infer the 3D scene structure from 2D images.
We introduce probabilistic modeling to quantify uncertainty and achieve robustness in 6D pose estimation tasks.
3DNEL effectively combines learned neural embeddings from RGB with depth information to improve robustness in sim-to-real 6D object pose estimation from RGB-D images.
arXiv Detail & Related papers (2023-02-07T20:48:35Z) - DPODv2: Dense Correspondence-Based 6 DoF Pose Estimation [24.770767430749288]
We propose a three-stage 6 DoF object detection method called DPODv2 (Dense Pose Object Detector).
We combine a 2D object detector with a dense correspondence estimation network and a multi-view pose refinement method to estimate a full 6 DoF pose.
DPODv2 achieves excellent results on multiple datasets while remaining fast and scalable, independent of the data modality and the type of training data.
arXiv Detail & Related papers (2022-07-06T16:48:56Z) - Coupled Iterative Refinement for 6D Multi-Object Pose Estimation [64.7198752089041]
Given a set of known 3D objects and an RGB or RGB-D input image, we detect and estimate the 6D pose of each object.
Our approach iteratively refines both pose and correspondence in a tightly coupled manner, allowing us to dynamically remove outliers to improve accuracy.
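The coupled refine-and-reject loop described above can be caricatured with a toy translation-only refiner that alternates a pose update on the current inliers with a residual-based re-classification of correspondences. The translation-only model, the median-scaled threshold, and all names are illustrative assumptions, not the authors' method.

```python
# Toy version of tightly coupled refinement: alternate a pose update with
# residual-based outlier re-classification. Translation-only model and the
# median-scaled threshold are illustrative assumptions.
import numpy as np

def refine_translation(src, dst, iters=5, thresh=3.0):
    """Estimate t with dst ~= src + t while dynamically removing outliers."""
    inliers = np.ones(len(src), dtype=bool)
    t = np.zeros(3)
    for _ in range(iters):
        t = (dst[inliers] - src[inliers]).mean(axis=0)    # pose step
        resid = np.linalg.norm(dst - (src + t), axis=1)   # correspondence step
        inliers = resid < thresh * (np.median(resid) + 1e-9)
    return t, inliers

rng = np.random.default_rng(2)
src = rng.standard_normal((200, 3))
t_true = np.array([1.0, 2.0, 3.0])
dst = src + t_true + 0.01 * rng.standard_normal((200, 3))
dst[:40] += 10.0                                          # 20% gross outliers
t_est, inliers = refine_translation(src, dst)
print(np.allclose(t_est, t_true, atol=0.05),
      (~inliers[:40]).all())  # True True
```

Because the pose estimate and the inlier set each improve the other, the first iteration's biased estimate is enough to expose the gross outliers, and the loop converges in a couple of passes.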
arXiv Detail & Related papers (2022-04-26T18:00:08Z) - Pose Estimation of Specific Rigid Objects [0.7931904787652707]
We address the problem of estimating the 6D pose of rigid objects from a single RGB or RGB-D input image.
This problem is of great importance to many application fields such as robotic manipulation, augmented reality, and autonomous driving.
arXiv Detail & Related papers (2021-12-30T14:36:47Z) - Learning Stereopsis from Geometric Synthesis for 6D Object Pose
Estimation [11.999630902627864]
Current monocular-based 6D object pose estimation methods generally achieve less competitive results than RGBD-based methods.
This paper proposes a 3D geometric volume based pose estimation method with a short baseline two-view setting.
Experiments show that our method outperforms state-of-the-art monocular-based methods and is robust across different objects and scenes.
arXiv Detail & Related papers (2021-09-25T02:55:05Z) - SO-Pose: Exploiting Self-Occlusion for Direct 6D Pose Estimation [98.83762558394345]
SO-Pose is a framework for regressing all 6 degrees-of-freedom (6DoF) for the object pose in a cluttered environment from a single RGB image.
We introduce novel reasoning about self-occlusion to establish a two-layer representation for 3D objects.
By enforcing cross-layer consistencies that align correspondences, self-occlusion, and 6D pose, we can further improve accuracy and robustness.
arXiv Detail & Related papers (2021-08-18T19:49:29Z) - Refer-it-in-RGBD: A Bottom-up Approach for 3D Visual Grounding in RGBD
Images [69.5662419067878]
Grounding referring expressions in RGBD images is an emerging field.
We present a novel task of 3D visual grounding in single-view RGBD image where the referred objects are often only partially scanned due to occlusion.
Our approach first fuses the language and the visual features at the bottom level to generate a heatmap that localizes the relevant regions in the RGBD image.
Then our approach conducts an adaptive feature learning based on the heatmap and performs the object-level matching with another visio-linguistic fusion to finally ground the referred object.
arXiv Detail & Related papers (2021-03-14T11:18:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences.