DONet: Learning Category-Level 6D Object Pose and Size Estimation from
Depth Observation
- URL: http://arxiv.org/abs/2106.14193v1
- Date: Sun, 27 Jun 2021 10:41:50 GMT
- Title: DONet: Learning Category-Level 6D Object Pose and Size Estimation from
Depth Observation
- Authors: Haitao Lin, Zichang Liu, Chilam Cheang, Lingwei Zhang, Yanwei Fu,
Xiangyang Xue
- Abstract summary: We propose a method of Category-level 6D Object Pose and Size Estimation (COPSE) from a single depth image.
Our framework makes inferences based on the rich geometric information of the object in the depth channel alone.
Our framework competes with state-of-the-art approaches that require labeled real-world images.
- Score: 53.55300278592281
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a method of Category-level 6D Object Pose and Size Estimation
(COPSE) from a single depth image, without external pose-annotated real-world
training data. While previous works exploit visual cues in RGB(D) images, our
method makes inferences based on the rich geometric information of the object
in the depth channel alone. Essentially, our framework explores such geometric
information by learning a unified 3D Orientation-Consistent Representation
(3D-OCR) module, which is further enforced by the Geometry-constrained
Reflection Symmetry (GeoReS) module. The magnitudes of the object size and the
center point are finally estimated by the Mirror-Paired Dimensional Estimation
(MPDE) module. Extensive experiments on the category-level NOCS
benchmark demonstrate that our framework competes with state-of-the-art
approaches that require labeled real-world images. We also deploy our approach
to a physical Baxter robot to perform manipulation tasks on unseen but
category-known instances, and the results further validate the efficacy of our
proposed model. Our videos are available in the supplementary material.
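The abstract's core geometric idea, exploiting reflection symmetry and mirror-paired points to recover object size and center, can be illustrated with a toy sketch. This is not the authors' implementation; the function names, the known symmetry plane, and the axis-aligned extent computation are all simplifying assumptions for illustration.

```python
import numpy as np

def reflect_points(points, plane_normal, plane_point):
    """Reflect a 3D point cloud across a plane (toy illustration of a
    GeoReS-style reflection-symmetry constraint; plane assumed known here)."""
    n = plane_normal / np.linalg.norm(plane_normal)
    d = (points - plane_point) @ n            # signed distance to the plane
    return points - 2.0 * d[:, None] * n      # mirror each point across it

def size_and_center_from_mirror_pair(points, mirrored):
    """Estimate a center and extent from the union of the observed and
    mirrored clouds (a high-level stand-in for mirror-paired estimation)."""
    full = np.vstack([points, mirrored])
    lo, hi = full.min(axis=0), full.max(axis=0)
    return (lo + hi) / 2.0, hi - lo           # center, size
```

The mirrored copy "completes" the unobserved half of the object, which is why the center and extent of the union are better conditioned than those of the partial observation alone.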
Related papers
- MV-ROPE: Multi-view Constraints for Robust Category-level Object Pose and Size Estimation [23.615122326731115]
We propose a novel solution that makes use of RGB video streams.
Our framework consists of three modules: a scale-aware monocular dense SLAM solution, a lightweight object pose predictor, and an object-level pose graph.
Our experimental results demonstrate that when utilizing public dataset sequences with high-quality depth information, the proposed method exhibits comparable performance to state-of-the-art RGB-D methods.
arXiv Detail & Related papers (2023-08-17T08:29:54Z)
- MegaPose: 6D Pose Estimation of Novel Objects via Render & Compare [84.80956484848505]
MegaPose is a method to estimate the 6D pose of novel objects, that is, objects unseen during training.
We present a 6D pose refiner based on a render-and-compare strategy that can be applied to novel objects.
We also introduce a novel approach for coarse pose estimation, which leverages a network trained to classify whether the pose error between a synthetic rendering and an observed image of the same object can be corrected by the refiner.
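The render-and-compare refinement idea can be sketched generically: render the object at candidate pose corrections and keep the one that best matches the observation. This toy 2D example stands in for the real pipeline; the "renderer" is just a rotation, and the grid search over angle deltas is an assumption, not MegaPose's learned refiner.

```python
import numpy as np

def render(points, angle):
    """Toy 'renderer': rotate a 2D point set by `angle`, standing in for
    rendering the object at a candidate pose."""
    c, s = np.cos(angle), np.sin(angle)
    R = np.array([[c, -s], [s, c]])
    return points @ R.T

def refine_pose(points, observed, angle,
                deltas=np.linspace(-0.2, 0.2, 41), iters=5):
    """Greedy render-and-compare refinement: try small pose corrections and
    keep the one whose rendering best matches the observation."""
    for _ in range(iters):
        errs = [np.mean((render(points, angle + d) - observed) ** 2)
                for d in deltas]
        angle += deltas[int(np.argmin(errs))]
    return angle
```

In the actual method the comparison is done by a learned network on rendered and observed images rather than by a pixel-wise error over a delta grid.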
arXiv Detail & Related papers (2022-12-13T19:30:03Z)
- Category-Agnostic 6D Pose Estimation with Conditional Neural Processes [19.387280883044482]
We present a novel meta-learning approach for 6D pose estimation on unknown objects.
Our algorithm learns object representation in a category-agnostic way, which endows it with strong generalization capabilities across object categories.
arXiv Detail & Related papers (2022-06-14T20:46:09Z)
- Learning Geometry-Guided Depth via Projective Modeling for Monocular 3D Object Detection [70.71934539556916]
We learn geometry-guided depth estimation with projective modeling to advance monocular 3D object detection.
Specifically, a principled geometry formula with projective modeling of 2D and 3D depth predictions in the monocular 3D object detection network is devised.
Our method improves the detection performance of the state-of-the-art monocular method by 2.80% on the moderate test setting, without using extra data.
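The projective relation behind geometry-guided depth can be written down directly: under a pinhole camera, an object of physical height H that appears h pixels tall at focal length f lies at depth z = f * H / h. The snippet below is an illustrative reduction of that idea, not the paper's full formulation, and the parameter names are ours.

```python
def depth_from_height(focal_px, object_height_m, bbox_height_px):
    """Pinhole projective relation: depth z = f * H / h, where f is the focal
    length in pixels, H the object's physical height in meters, and h the
    object's apparent height in pixels (illustrative simplification)."""
    return focal_px * object_height_m / bbox_height_px
```

For example, a 1.5 m tall object spanning 105 pixels under a 700-pixel focal length sits at 700 * 1.5 / 105 = 10 m, which is how a 2D box height carries a depth signal.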
arXiv Detail & Related papers (2021-07-29T12:30:39Z)
- Aug3D-RPN: Improving Monocular 3D Object Detection by Synthetic Images with Virtual Depth [64.29043589521308]
We propose a rendering module to augment the training data by synthesizing images with virtual-depths.
The rendering module takes as input an RGB image and its corresponding sparse depth image, and outputs a variety of photo-realistic synthetic images.
Besides, we introduce an auxiliary module to improve the detection model by jointly optimizing it through a depth estimation task.
arXiv Detail & Related papers (2021-07-28T11:00:47Z)
- Learning Monocular Depth in Dynamic Scenes via Instance-Aware Projection Consistency [114.02182755620784]
We present an end-to-end joint training framework that explicitly models 6-DoF motion of multiple dynamic objects, ego-motion and depth in a monocular camera setup without supervision.
Our framework is shown to outperform the state-of-the-art depth and motion estimation methods.
arXiv Detail & Related papers (2021-02-04T14:26:42Z)
- Shape Prior Deformation for Categorical 6D Object Pose and Size Estimation [62.618227434286]
We present a novel learning approach to recover the 6D poses and sizes of unseen object instances from an RGB-D image.
We propose a deep network to reconstruct the 3D object model by explicitly modeling the deformation from a pre-learned categorical shape prior.
arXiv Detail & Related papers (2020-07-16T16:45:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.