Occlusion-Aware Self-Supervised Monocular 6D Object Pose Estimation
- URL: http://arxiv.org/abs/2203.10339v1
- Date: Sat, 19 Mar 2022 15:12:06 GMT
- Title: Occlusion-Aware Self-Supervised Monocular 6D Object Pose Estimation
- Authors: Gu Wang, Fabian Manhardt, Xingyu Liu, Xiangyang Ji, Federico Tombari
- Abstract summary: We propose a novel monocular 6D pose estimation approach by means of self-supervised learning.
We leverage current trends in noisy student training and differentiable rendering to further self-supervise the model.
Our proposed self-supervision outperforms all other methods relying on synthetic data.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: 6D object pose estimation is a fundamental yet challenging problem in
computer vision. Convolutional Neural Networks (CNNs) have recently proven to
be capable of predicting reliable 6D pose estimates even under monocular
settings. Nonetheless, CNNs are notoriously data-hungry, and acquiring
adequate annotations is often time-consuming and labor-intensive. To
overcome this limitation, we propose a novel monocular 6D pose
estimation approach by means of self-supervised learning, removing the need for
real annotations. After training our proposed network fully supervised with
synthetic RGB data, we leverage current trends in noisy student training and
differentiable rendering to further self-supervise the model on
unannotated real RGB(-D) samples, seeking a visually and geometrically
optimal alignment. Moreover, employing both visible and amodal mask
information, our self-supervision becomes very robust towards challenging
scenarios such as occlusion. Extensive evaluations demonstrate that our
proposed self-supervision outperforms all other methods relying on synthetic
data or employing elaborate techniques from the domain adaptation realm.
Notably, our self-supervised approach consistently improves over its
synthetically trained baseline and often almost closes the gap towards its
fully supervised counterpart. The code and models are publicly available at
https://github.com/THU-DA-6D-Pose-Group/self6dpp.git.
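The occlusion-aware idea in the abstract, combining a photometric term on the visible region with a silhouette term against the amodal mask, can be sketched as below. This is an illustrative toy loss, not the paper's exact formulation; all function and weight names are assumptions.

```python
import numpy as np

def self_supervised_loss(rendered_rgb, observed_rgb,
                         rendered_mask, visible_mask, amodal_mask,
                         w_visual=1.0, w_geom=1.0):
    """Illustrative render-and-compare self-supervision (a sketch,
    not the paper's exact loss).

    - The photometric term is restricted to the *visible* mask, so
      pixels belonging to an occluder never penalize the pose.
    - The geometric term compares the rendered full-object silhouette
      against the *amodal* mask (the complete silhouette, including
      occluded parts), which stays informative under occlusion.
    """
    vis = visible_mask.astype(bool)
    # Photometric alignment only where the object is actually visible.
    visual_term = (np.abs(rendered_rgb[vis] - observed_rgb[vis]).mean()
                   if vis.any() else 0.0)
    # Silhouette alignment against the amodal (occlusion-complete) mask.
    geom_term = np.abs(rendered_mask.astype(float)
                       - amodal_mask.astype(float)).mean()
    return w_visual * visual_term + w_geom * geom_term
```

In the actual method the rendered image and mask come from a differentiable renderer driven by the student's pose prediction, so this alignment objective provides gradients with respect to the pose itself.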
Related papers
- SC-DepthV3: Robust Self-supervised Monocular Depth Estimation for
Dynamic Scenes [58.89295356901823]
Self-supervised monocular depth estimation has shown impressive results in static scenes.
However, it relies on the multi-view consistency assumption for training, which is violated in dynamic object regions.
We introduce an external pretrained monocular depth estimation model for generating single-image depth prior.
Our model can predict sharp and accurate depth maps, even when training from monocular videos of highly-dynamic scenes.
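One plausible way a single-image depth prior can help in dynamic regions, sketched purely for illustration (the paper's actual formulation may differ), is to flag pixels where the self-supervised depth disagrees strongly with the pretrained prior; the multi-view photometric loss can then be down-weighted there. All names and the threshold are assumptions.

```python
import numpy as np

def dynamic_region_mask(self_sup_depth, prior_depth, rel_threshold=0.2):
    """Hypothetical sketch: mark pixels where the self-supervised depth
    deviates from the single-image prior by more than a relative
    threshold -- a cheap proxy for 'likely dynamic object'."""
    rel_err = np.abs(self_sup_depth - prior_depth) / np.maximum(prior_depth, 1e-6)
    return rel_err > rel_threshold
```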
arXiv Detail & Related papers (2022-11-07T16:17:47Z)
- FS6D: Few-Shot 6D Pose Estimation of Novel Objects [116.34922994123973]
6D object pose estimation networks are limited in their capability to scale to large numbers of object instances.
In this work, we study a new open-set problem, few-shot 6D object pose estimation: estimating the 6D pose of an unknown object from a few support views without extra training.
arXiv Detail & Related papers (2022-03-28T10:31:29Z)
- VIPose: Real-time Visual-Inertial 6D Object Pose Tracking [3.44942675405441]
We introduce a novel Deep Neural Network (DNN) called VIPose to address the object pose tracking problem in real-time.
The key contribution is the design of a novel DNN architecture which fuses visual and inertial features to predict the objects' relative 6D pose.
The approach achieves accuracy comparable to state-of-the-art techniques, with the additional benefit of running in real time.
arXiv Detail & Related papers (2021-07-27T06:10:23Z)
- Unsupervised Domain Adaptation with Temporal-Consistent Self-Training for 3D Hand-Object Joint Reconstruction [131.34795312667026]
We introduce an effective approach to this challenge by exploiting 3D geometric constraints within a cycle generative adversarial network (CycleGAN).
In contrast to most existing works, we propose to enforce short- and long-term temporal consistency to fine-tune the domain-adapted model in a self-supervised fashion.
We demonstrate that our approach outperforms state-of-the-art 3D hand-object joint reconstruction methods on three widely-used benchmarks.
arXiv Detail & Related papers (2020-12-21T11:27:56Z)
- se(3)-TrackNet: Data-driven 6D Pose Tracking by Calibrating Image Residuals in Synthetic Domains [12.71983073907091]
This work proposes a data-driven optimization approach for long-term, 6D pose tracking.
It aims to identify the optimal relative pose given the current RGB-D observation and a synthetic image conditioned on the previous best estimate and the object's model.
The proposed approach achieves consistently robust estimates and outperforms alternatives, even those trained with real images.
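The render-and-correct tracking idea described above can be sketched as a loop that composes relative pose corrections onto a running estimate. `predict_delta` here stands in for both the learned network and the renderer conditioned on the previous estimate; it is a placeholder, not the paper's API.

```python
import numpy as np

def track(initial_pose, frames, predict_delta):
    """Toy long-term tracking loop in the spirit of the method above.

    At every frame, the real system renders a synthetic view from the
    previous best estimate and feeds it, together with the new RGB-D
    observation, to a network that outputs a *relative* pose correction.
    """
    pose = initial_pose.copy()              # 4x4 homogeneous object pose
    for frame in frames:
        delta = predict_delta(frame, pose)  # 4x4 relative transform
        pose = delta @ pose                 # compose correction onto estimate
    return pose
```

With an oracle predictor that returns the exact residual between the true pose and the current estimate, a single correction step recovers the true pose, which is the behavior the learned network approximates.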
arXiv Detail & Related papers (2020-07-27T21:09:36Z)
- Self6D: Self-Supervised Monocular 6D Object Pose Estimation [114.18496727590481]
We propose the idea of monocular 6D pose estimation by means of self-supervised learning.
We leverage recent advances in neural rendering to further self-supervise the model on unannotated real RGB-D data.
arXiv Detail & Related papers (2020-04-14T13:16:36Z)
- CPS++: Improving Class-level 6D Pose and Shape Estimation From Monocular Images With Self-Supervised Learning [74.53664270194643]
Modern monocular 6D pose estimation methods can only cope with a handful of object instances.
We propose a novel method for class-level monocular 6D pose estimation, coupled with metric shape retrieval.
We experimentally demonstrate that we can retrieve precise 6D poses and metric shapes from a single RGB image.
arXiv Detail & Related papers (2020-03-12T15:28:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.