Object Level Depth Reconstruction for Category Level 6D Object Pose
Estimation From Monocular RGB Image
- URL: http://arxiv.org/abs/2204.01586v1
- Date: Mon, 4 Apr 2022 15:33:28 GMT
- Title: Object Level Depth Reconstruction for Category Level 6D Object Pose
Estimation From Monocular RGB Image
- Authors: Zhaoxin Fan, Zhenbo Song, Jian Xu, Zhicheng Wang, Kejian Wu, Hongyan
Liu and Jun He
- Abstract summary: We propose a novel approach named Object Level Depth reconstruction Network (OLD-Net) taking only RGB images as input for category-level 6D object pose estimation.
We propose to directly predict object-level depth from a monocular RGB image by deforming the category-level shape prior into object-level depth and the canonical NOCS representation.
Experiments on the challenging CAMERA25 and REAL275 datasets indicate that our model achieves state-of-the-art performance.
- Score: 12.382992538846896
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, RGBD-based category-level 6D object pose estimation has achieved
promising improvement in performance, however, the requirement of depth
information prohibits broader applications. In order to relieve this problem,
this paper proposes a novel approach named Object Level Depth reconstruction
Network (OLD-Net) taking only RGB images as input for category-level 6D object
pose estimation. We propose to directly predict object-level depth from a
monocular RGB image by deforming the category-level shape prior into
object-level depth and the canonical NOCS representation. Two novel modules
named Normalized Global Position Hints (NGPH) and Shape-aware Decoupled Depth
Reconstruction (SDDR) module are introduced to learn high fidelity object-level
depth and delicate shape representations. At last, the 6D object pose is solved
by aligning the predicted canonical representation with the back-projected
object-level depth. Extensive experiments on the challenging CAMERA25 and
REAL275 datasets indicate that our model, though simple, achieves
state-of-the-art performance.
Related papers
- RGB-based Category-level Object Pose Estimation via Decoupled Metric
Scale Recovery [72.13154206106259]
We propose a novel pipeline that decouples the 6D pose and size estimation to mitigate the influence of imperfect scales on rigid transformations.
Specifically, we leverage a pre-trained monocular estimator to extract local geometric information.
A separate branch is designed to directly recover the metric scale of the object based on category-level statistics.
arXiv Detail & Related papers (2023-09-19T02:20:26Z) - MV-ROPE: Multi-view Constraints for Robust Category-level Object Pose and Size Estimation [23.615122326731115]
We propose a novel solution that makes use of RGB video streams.
Our framework consists of three modules: a scale-aware monocular dense SLAM solution, a lightweight object pose predictor, and an object-level pose graph.
Our experimental results demonstrate that when utilizing public dataset sequences with high-quality depth information, the proposed method exhibits comparable performance to state-of-the-art RGB-D methods.
arXiv Detail & Related papers (2023-08-17T08:29:54Z) - StereoPose: Category-Level 6D Transparent Object Pose Estimation from
Stereo Images via Back-View NOCS [106.62225866064313]
We present StereoPose, a novel stereo image framework for category-level object pose estimation.
For a robust estimation from pure stereo images, we develop a pipeline that decouples category-level pose estimation into object size estimation, initial pose estimation, and pose refinement.
To address the issue of image content aliasing, we define a back-view NOCS map for the transparent object.
The back-view NOCS aims to reduce the network learning ambiguity caused by content aliasing, and leverage informative cues on the back of the transparent object for more accurate pose estimation.
arXiv Detail & Related papers (2022-11-03T08:36:09Z) - PhoCaL: A Multi-Modal Dataset for Category-Level Object Pose Estimation
with Photometrically Challenging Objects [45.31344700263873]
We introduce a multimodal dataset for category-level object pose estimation with photometrically challenging objects termed PhoCaL.
PhoCaL comprises 60 high quality 3D models of household objects over 8 categories including highly reflective, transparent and symmetric objects.
It ensures sub-millimeter accuracy of the pose for opaque textured, shiny and transparent objects, no motion blur and perfect camera synchronisation.
arXiv Detail & Related papers (2022-05-18T09:21:09Z) - Coupled Iterative Refinement for 6D Multi-Object Pose Estimation [64.7198752089041]
Given a set of known 3D objects and an RGB or RGB-D input image, we detect and estimate the 6D pose of each object.
Our approach iteratively refines both pose and correspondence in a tightly coupled manner, allowing us to dynamically remove outliers to improve accuracy.
arXiv Detail & Related papers (2022-04-26T18:00:08Z) - Next-Best-View Prediction for Active Stereo Cameras and Highly
Reflective Objects [12.21992378133376]
We propose a next-best-view framework to strategically select camera viewpoints for completing depth data on reflective objects.
We employ an RGB-based pose estimator to obtain current pose predictions from the existing data.
Our active depth acquisition method outperforms two strong baselines for both depth completion and object pose estimation performance.
arXiv Detail & Related papers (2022-02-27T01:48:02Z) - Single-stage Keypoint-based Category-level Object Pose Estimation from
an RGB Image [27.234658117816103]
We propose a single-stage, keypoint-based approach for category-level object pose estimation.
The proposed network performs 2D object detection, detects 2D keypoints, estimates 6-DoF pose, and regresses relative bounding cuboid dimensions.
We conduct extensive experiments on the challenging Objectron benchmark, outperforming state-of-the-art methods on the 3D IoU metric.
arXiv Detail & Related papers (2021-09-13T17:55:00Z) - DONet: Learning Category-Level 6D Object Pose and Size Estimation from
Depth Observation [53.55300278592281]
We propose a method of Category-level 6D Object Pose and Size Estimation (COPSE) from a single depth image.
Our framework makes inferences based on the rich geometric information of the object in the depth channel alone.
Our framework competes with state-of-the-art approaches that require labeled real-world images.
arXiv Detail & Related papers (2021-06-27T10:41:50Z) - Spatial Attention Improves Iterative 6D Object Pose Estimation [52.365075652976735]
We propose a new method for 6D pose estimation refinement from RGB images.
Our main insight is that after the initial pose estimate, it is important to pay attention to distinct spatial features of the object.
We experimentally show that this approach learns to attend to salient spatial features and learns to ignore occluded parts of the object, leading to better pose estimation across datasets.
arXiv Detail & Related papers (2021-01-05T17:18:52Z) - Shape Prior Deformation for Categorical 6D Object Pose and Size
Estimation [62.618227434286]
We present a novel learning approach to recover the 6D poses and sizes of unseen object instances from an RGB-D image.
We propose a deep network to reconstruct the 3D object model by explicitly modeling the deformation from a pre-learned categorical shape prior.
arXiv Detail & Related papers (2020-07-16T16:45:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.