Occlusion-Aware Self-Supervised Monocular 6D Object Pose Estimation
- URL: http://arxiv.org/abs/2203.10339v1
- Date: Sat, 19 Mar 2022 15:12:06 GMT
- Title: Occlusion-Aware Self-Supervised Monocular 6D Object Pose Estimation
- Authors: Gu Wang, Fabian Manhardt, Xingyu Liu, Xiangyang Ji, Federico Tombari
- Abstract summary: We propose a novel monocular 6D pose estimation approach by means of self-supervised learning.
We leverage current trends in noisy student training and differentiable rendering to further self-supervise the model.
Our proposed self-supervision outperforms all other methods relying on synthetic data.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: 6D object pose estimation is a fundamental yet challenging problem in
computer vision. Convolutional Neural Networks (CNNs) have recently proven to
be capable of predicting reliable 6D pose estimates even under monocular
settings. Nonetheless, CNNs are notoriously data-hungry, and acquiring
adequate annotations is often time-consuming and labor-intensive. To
overcome this limitation, we propose a novel monocular 6D pose
estimation approach by means of self-supervised learning, removing the need for
real annotations. After training our proposed network fully supervised with
synthetic RGB data, we leverage current trends in noisy student training and
differentiable rendering to further self-supervise the model on
unannotated real RGB(-D) samples, seeking a visually and geometrically
optimal alignment. Moreover, employing both visible and amodal mask
information, our self-supervision becomes very robust towards challenging
scenarios such as occlusion. Extensive evaluations demonstrate that our
proposed self-supervision outperforms all other methods relying on synthetic
data or employing elaborate techniques from the domain adaptation realm.
Notably, our self-supervised approach consistently improves over its
synthetically trained baseline and often almost closes the gap towards its
fully supervised counterpart. The code and models are publicly available at
https://github.com/THU-DA-6D-Pose-Group/self6dpp.git.
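The occlusion-aware idea in the abstract, combining a photometric term on the visible region with a silhouette term against the amodal mask, can be sketched as below. This is an illustrative toy loss, not the paper's exact formulation; all function and weight names are assumptions.

```python
import numpy as np

def self_supervised_loss(rendered_rgb, observed_rgb,
                         rendered_mask, visible_mask, amodal_mask,
                         w_visual=1.0, w_geom=1.0):
    """Illustrative render-and-compare self-supervision (a sketch,
    not the paper's exact loss).

    - The photometric term is restricted to the *visible* mask, so
      pixels belonging to an occluder never penalize the pose.
    - The geometric term compares the rendered full-object silhouette
      against the *amodal* mask (the complete silhouette, including
      occluded parts), which stays informative under occlusion.
    """
    vis = visible_mask.astype(bool)
    # Photometric alignment only where the object is actually visible.
    visual_term = (np.abs(rendered_rgb[vis] - observed_rgb[vis]).mean()
                   if vis.any() else 0.0)
    # Silhouette alignment against the amodal (occlusion-complete) mask.
    geom_term = np.abs(rendered_mask.astype(float)
                       - amodal_mask.astype(float)).mean()
    return w_visual * visual_term + w_geom * geom_term
```

In the actual method the rendered image and mask come from a differentiable renderer driven by the student's pose prediction, so this alignment objective provides gradients with respect to the pose itself.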
Related papers
- SC-DepthV3: Robust Self-supervised Monocular Depth Estimation for
Dynamic Scenes [58.89295356901823]
Self-supervised monocular depth estimation has shown impressive results in static scenes.
However, it relies on the multi-view consistency assumption for training, which is violated in dynamic object regions.
We introduce an external pretrained monocular depth estimation model for generating single-image depth prior.
Our model can predict sharp and accurate depth maps, even when training from monocular videos of highly-dynamic scenes.
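One plausible way a single-image depth prior can help in dynamic regions, sketched purely for illustration (the paper's actual formulation may differ), is to flag pixels where the self-supervised depth disagrees strongly with the pretrained prior; the multi-view photometric loss can then be down-weighted there. All names and the threshold are assumptions.

```python
import numpy as np

def dynamic_region_mask(self_sup_depth, prior_depth, rel_threshold=0.2):
    """Hypothetical sketch: mark pixels where the self-supervised depth
    deviates from the single-image prior by more than a relative
    threshold -- a cheap proxy for 'likely dynamic object'."""
    rel_err = np.abs(self_sup_depth - prior_depth) / np.maximum(prior_depth, 1e-6)
    return rel_err > rel_threshold
```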
arXiv Detail & Related papers (2022-11-07T16:17:47Z)
- FS6D: Few-Shot 6D Pose Estimation of Novel Objects [116.34922994123973]
6D object pose estimation networks are limited in their capability to scale to large numbers of object instances.
In this work, we study a new open-set problem, few-shot 6D object pose estimation: estimating the 6D pose of an unknown object from a few support views without extra training.
arXiv Detail & Related papers (2022-03-28T10:31:29Z)
- VIPose: Real-time Visual-Inertial 6D Object Pose Tracking [3.44942675405441]
We introduce a novel Deep Neural Network (DNN) called VIPose to address the object pose tracking problem in real-time.
The key contribution is the design of a novel DNN architecture which fuses visual and inertial features to predict the objects' relative 6D pose.
The approach achieves accuracy comparable to state-of-the-art techniques, with the additional benefit of running in real time.
arXiv Detail & Related papers (2021-07-27T06:10:23Z)
- Unsupervised Domain Adaptation with Temporal-Consistent Self-Training for 3D Hand-Object Joint Reconstruction [131.34795312667026]
We introduce an effective approach to this challenge by exploiting 3D geometric constraints within a cycle generative adversarial network (CycleGAN).
In contrast to most existing works, we propose to enforce short- and long-term temporal consistency to fine-tune the domain-adapted model in a self-supervised fashion.
We demonstrate that our approach outperforms state-of-the-art 3D hand-object joint reconstruction methods on three widely-used benchmarks.
arXiv Detail & Related papers (2020-12-21T11:27:56Z)
- se(3)-TrackNet: Data-driven 6D Pose Tracking by Calibrating Image Residuals in Synthetic Domains [12.71983073907091]
This work proposes a data-driven optimization approach for long-term, 6D pose tracking.
It aims to identify the optimal relative pose given the current RGB-D observation and a synthetic image conditioned on the previous best estimate and the object's model.
The proposed approach achieves consistently robust estimates and outperforms alternatives, even those trained with real images.
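The render-and-correct tracking idea described above can be sketched as a loop that composes relative pose corrections onto a running estimate. `predict_delta` here stands in for both the learned network and the renderer conditioned on the previous estimate; it is a placeholder, not the paper's API.

```python
import numpy as np

def track(initial_pose, frames, predict_delta):
    """Toy long-term tracking loop in the spirit of the method above.

    At every frame, the real system renders a synthetic view from the
    previous best estimate and feeds it, together with the new RGB-D
    observation, to a network that outputs a *relative* pose correction.
    """
    pose = initial_pose.copy()              # 4x4 homogeneous object pose
    for frame in frames:
        delta = predict_delta(frame, pose)  # 4x4 relative transform
        pose = delta @ pose                 # compose correction onto estimate
    return pose
```

With an oracle predictor that returns the exact residual between the true pose and the current estimate, a single correction step recovers the true pose, which is the behavior the learned network approximates.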
arXiv Detail & Related papers (2020-07-27T21:09:36Z)
- Self6D: Self-Supervised Monocular 6D Object Pose Estimation [114.18496727590481]
We propose the idea of monocular 6D pose estimation by means of self-supervised learning.
We leverage recent advances in neural rendering to further self-supervise the model on unannotated real RGB-D data.
arXiv Detail & Related papers (2020-04-14T13:16:36Z)
- CPS++: Improving Class-level 6D Pose and Shape Estimation From Monocular Images With Self-Supervised Learning [74.53664270194643]
Modern monocular 6D pose estimation methods can only cope with a handful of object instances.
We propose a novel method for class-level monocular 6D pose estimation, coupled with metric shape retrieval.
We experimentally demonstrate that we can retrieve precise 6D poses and metric shapes from a single RGB image.
arXiv Detail & Related papers (2020-03-12T15:28:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.