FSD: Fast Self-Supervised Single RGB-D to Categorical 3D Objects
- URL: http://arxiv.org/abs/2310.12974v1
- Date: Thu, 19 Oct 2023 17:59:09 GMT
- Title: FSD: Fast Self-Supervised Single RGB-D to Categorical 3D Objects
- Authors: Mayank Lunayach, Sergey Zakharov, Dian Chen, Rares Ambrus, Zsolt Kira,
Muhammad Zubair Irshad
- Abstract summary: This work addresses the challenging task of 3D object recognition without relying on real-world 3D labeled data.
Our goal is to predict the 3D shape, size, and 6D pose of objects within a single RGB-D image, operating at the category level and eliminating the need for CAD models during inference.
- Score: 37.175069234979645
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we address the challenging task of 3D object recognition
without relying on real-world 3D labeled data. Our goal is to predict the
3D shape, size, and 6D pose of objects within a single RGB-D image, operating
at the category level and eliminating the need for CAD models during inference.
While existing self-supervised methods have made strides in this field, they
often suffer from inefficiencies arising from non-end-to-end processing,
reliance on separate models for different object categories, and slow surface
extraction during the training of implicit reconstruction models; these issues
hinder both the speed and the real-world applicability of 3D recognition.
Our proposed method leverages a multi-stage training pipeline,
designed to efficiently transfer synthetic performance to the real-world
domain. This transfer is achieved through a combination of 2D and 3D supervised
losses during the synthetic domain training, followed by the incorporation of
2D supervised and 3D self-supervised losses on real-world data in two
additional learning stages. By adopting this comprehensive strategy, our method
successfully overcomes the aforementioned limitations and outperforms existing
self-supervised 6D pose and size estimation baselines on the NOCS test-set with
a 16.4% absolute improvement in mAP for 6D pose estimation while running in
near real-time at 5 Hz.
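To make the staged schedule concrete, here is a minimal PyTorch-style sketch of how such a loss curriculum could be wired up. The loss terms, stage names, and the dict-valued model interface are illustrative assumptions, not the authors' implementation.

```python
import torch

# Illustrative stand-ins for the paper's loss terms (assumptions, not the
# authors' exact formulations).
def loss_2d(pred, batch):       # 2D supervision, e.g. predicted vs. GT masks
    return ((pred["mask"] - batch["mask"]) ** 2).mean()

def loss_3d_sup(pred, batch):   # 3D supervision, available only on synthetic data
    return ((pred["pose"] - batch["pose_gt"]) ** 2).mean()

def loss_3d_self(pred, batch):  # 3D self-supervision against the sensor depth
    return ((pred["depth"] - batch["depth"]) ** 2).mean()

# Stage schedule mirroring the abstract: full 2D + 3D supervision on synthetic
# data, then 2D supervision + 3D self-supervision on real data in two stages.
STAGE_LOSSES = {
    "stage1_synthetic": lambda p, b: loss_2d(p, b) + loss_3d_sup(p, b),
    "stage2_real":      lambda p, b: loss_2d(p, b) + loss_3d_self(p, b),
    "stage3_real":      lambda p, b: loss_2d(p, b) + loss_3d_self(p, b),
}

def run_stage(model, loader, stage, lr=1e-4):
    """Run one training stage of the multi-stage pipeline."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for batch in loader:
        pred = model(batch["rgbd"])           # assumed dict-valued model output
        loss = STAGE_LOSSES[stage](pred, batch)
        opt.zero_grad()
        loss.backward()
        opt.step()
```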
Related papers
- Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding [50.448520056844885]
We propose a generative Bayesian network to produce diverse synthetic scenes with real-world patterns.
A series of experiments demonstrates our method's consistent superiority over existing state-of-the-art pre-training approaches.
arXiv Detail & Related papers (2024-06-17T07:43:53Z)
- Zero123-6D: Zero-shot Novel View Synthesis for RGB Category-level 6D Pose Estimation [66.3814684757376]
This work presents Zero123-6D, the first work to demonstrate the utility of diffusion-model-based novel-view synthesizers in enhancing category-level RGB 6D pose estimation.
The method reduces data requirements, removes the need for depth information in the zero-shot category-level 6D pose estimation task, and improves performance, as demonstrated quantitatively through experiments on the CO3D dataset.
arXiv Detail & Related papers (2024-03-21T10:38:18Z)
- Advancing 6D Pose Estimation in Augmented Reality -- Overcoming Projection Ambiguity with Uncontrolled Imagery [0.0]
This study addresses the challenge of accurate 6D pose estimation in Augmented Reality (AR).
We propose a novel approach that strategically decomposes the estimation of z-axis translation and focal length.
This methodology not only streamlines the 6D pose estimation process but also significantly enhances the accuracy of 3D object overlaying in AR settings.
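The ambiguity being decomposed is a property of the pinhole model: an object's apparent image size depends only on the ratio of focal length to depth, so the two cannot be separated from a single uncontrolled image. A quick numeric illustration (the function names are ours, not the paper's):

```python
# Pinhole projection: apparent size s = f * S / z for an object of true size S.
# Any (f, z) pair with the same ratio f / z produces the same image, which is
# the projection ambiguity the paper decomposes.
def apparent_size(focal_px, true_size_m, z_m):
    return focal_px * true_size_m / z_m

S = 0.30  # a 30 cm object
print(apparent_size(focal_px=1000, true_size_m=S, z_m=2.0))  # 150.0 px
print(apparent_size(focal_px=1500, true_size_m=S, z_m=3.0))  # 150.0 px, same image

# Once the focal length is estimated separately, z follows from apparent size:
def recover_z(focal_px, true_size_m, apparent_px):
    return focal_px * true_size_m / apparent_px

print(recover_z(1000, S, 150.0))  # 2.0 (meters)
```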
arXiv Detail & Related papers (2024-03-20T09:22:22Z)
- RiCS: A 2D Self-Occlusion Map for Harmonizing Volumetric Objects [68.85305626324694]
Ray-marching in Camera Space (RiCS) is a new method that represents the self-occlusions of 3D foreground objects as a 2D self-occlusion map.
We show that our representation not only enhances image quality but also models temporally coherent, complex shadow effects.
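As a rough illustration of the underlying idea, the toy sketch below ray-marches a signed-distance scene in camera space and records, per visible pixel, whether the surface point is shadowed by the object itself along the light direction. The SDF, camera, and light setup are invented for illustration and do not reproduce the paper's formulation.

```python
import numpy as np

def sdf(p):
    """Toy object: a sphere of radius 0.5 as a signed distance function."""
    return np.linalg.norm(p) - 0.5

def march(origin, direction, t_max=4.0, eps=1e-3, max_steps=128):
    """Sphere-trace one ray; return the hit point, or None if it escapes."""
    t = 0.0
    for _ in range(max_steps):
        p = origin + t * direction
        d = sdf(p)
        if d < eps:
            return p
        t += d
        if t > t_max:
            return None
    return None

to_light = np.array([1.0, 0.0, 0.0])   # directional light from +x
res = 64
occ = np.full((res, res), np.nan)      # the 2D self-occlusion map (NaN = background)
for i in range(res):
    for j in range(res):
        x = j / (res - 1) * 2 - 1
        y = i / (res - 1) * 2 - 1
        # Orthographic camera ray, marched in camera space (looking down +z).
        hit = march(np.array([x, y, -2.0]), np.array([0.0, 0.0, 1.0]))
        if hit is not None:
            # Re-march from just off the surface toward the light; a second hit
            # means this visible point is shadowed by the object itself.
            occ[i, j] = 1.0 if march(hit + 5e-3 * to_light, to_light) is not None else 0.0
```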
arXiv Detail & Related papers (2022-05-14T05:35:35Z)
- Homography Loss for Monocular 3D Object Detection [54.04870007473932]
A differentiable loss function, termed Homography Loss, is proposed to achieve this goal; it exploits both 2D and 3D information.
Our method outperforms other state-of-the-art methods by a large margin on the KITTI 3D dataset.
arXiv Detail & Related papers (2022-04-02T03:48:03Z)
- Learning Stereopsis from Geometric Synthesis for 6D Object Pose Estimation [11.999630902627864]
Current monocular 6D object pose estimation methods generally achieve less competitive results than RGB-D-based methods.
This paper proposes a pose estimation method based on a 3D geometric volume, built in a short-baseline two-view setting.
Experiments show that our method outperforms state-of-the-art monocular methods and is robust across different objects and scenes.
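The abstract does not detail the volume construction, but a standard way to build a 3D geometric volume from a short-baseline two-view pair is a plane-sweep cost volume; the PyTorch sketch below is that generic recipe, under our own assumptions, not the paper's architecture. A pose or depth head would then operate on the resulting volume.

```python
import torch
import torch.nn.functional as F

def plane_sweep_volume(feat_ref, feat_src, K, T_ref_to_src, depths):
    """Build a (B, D, H, W) matching-cost volume by warping source-view
    features onto fronto-parallel depth planes of the reference view."""
    B, C, H, W = feat_ref.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)]).float().reshape(3, -1)
    K_inv = torch.inverse(K)
    costs = []
    for d in depths:
        # Back-project reference pixels to depth d, move them into the source
        # camera frame, and re-project to source pixel coordinates.
        pts = T_ref_to_src[:3, :3] @ (K_inv @ pix * d) + T_ref_to_src[:3, 3:]
        uv = K @ pts
        uv = uv[:2] / uv[2:].clamp(min=1e-6)
        # Normalize to [-1, 1] for grid_sample.
        u = uv[0] / (W - 1) * 2 - 1
        v = uv[1] / (H - 1) * 2 - 1
        grid = torch.stack([u, v], dim=-1).reshape(1, H, W, 2).expand(B, H, W, 2)
        warped = F.grid_sample(feat_src, grid, align_corners=True)
        costs.append((feat_ref * warped).mean(dim=1))   # feature correlation
    return torch.stack(costs, dim=1)
```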
arXiv Detail & Related papers (2021-09-25T02:55:05Z)
- 3D Registration for Self-Occluded Objects in Context [66.41922513553367]
We introduce the first deep learning framework capable of effectively handling this scenario.
Our method consists of an instance segmentation module followed by a pose estimation one.
It allows us to perform 3D registration in a one-shot manner, without requiring an expensive iterative procedure.
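A schematic of the described two-stage, one-shot pipeline might look as follows; both modules are hypothetical placeholders, since the abstract does not specify their internals:

```python
import numpy as np

# Hypothetical stand-ins for the two learned modules named in the abstract;
# the actual framework would use trained networks here.
def segment_instances(scene_points):
    """Return one boolean mask per detected object instance."""
    center = scene_points.mean(axis=0)
    return [np.linalg.norm(scene_points - center, axis=1) < 1.0]

def estimate_pose(instance_points):
    """Regress a rigid pose (R, t) for one instance in a single forward pass."""
    return np.eye(3), instance_points.mean(axis=0)

def register_one_shot(scene_points):
    """Stage 1: instance segmentation isolates each (self-occluded) object.
    Stage 2: a pose estimator outputs the registration directly, so no
    expensive iterative refinement (e.g. ICP) is needed."""
    return [estimate_pose(scene_points[m]) for m in segment_instances(scene_points)]

poses = register_one_shot(np.random.rand(1000, 3))
```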
arXiv Detail & Related papers (2020-11-23T08:05:28Z)
- SDF-SRN: Learning Signed Distance 3D Object Reconstruction from Static Images [44.78174845839193]
Recent efforts have turned to learning 3D reconstruction from RGB images with annotated 2D silhouettes, without 3D supervision.
These techniques still require multi-view annotations of the same object instance during training.
We propose SDF-SRN, an approach that requires only a single view of objects at training time.
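One common way to supervise an SDF from 2D silhouettes is to render a soft silhouette differentiably and compare it to the annotated mask; the sketch below illustrates that general recipe, not SDF-SRN's exact formulation, and the analytic sphere stands in for a trained network.

```python
import torch
import torch.nn.functional as F

def render_silhouette(sdf_fn, rays_o, rays_d, n_samples=64, tau=0.01):
    """Differentiable soft silhouette: per ray, take the maximum soft
    occupancy sigmoid(-sdf / tau) over uniformly spaced depth samples."""
    ts = torch.linspace(0.5, 3.5, n_samples)
    pts = rays_o[:, None, :] + ts[None, :, None] * rays_d[:, None, :]
    sdf = sdf_fn(pts.reshape(-1, 3)).reshape(-1, n_samples)
    return torch.sigmoid(-sdf / tau).amax(dim=1)        # (num_rays,) in (0, 1)

def silhouette_loss(sdf_fn, rays_o, rays_d, gt_mask):
    pred = render_silhouette(sdf_fn, rays_o, rays_d)
    return F.binary_cross_entropy(pred.clamp(1e-5, 1 - 1e-5), gt_mask)

# Toy check with an analytic sphere SDF standing in for the network:
sphere_sdf = lambda p: p.norm(dim=-1) - 0.5
rays_o = torch.zeros(4, 3); rays_o[:, 2] = -2.0         # camera at z = -2
rays_d = torch.tensor([[0.0, 0.0, 1.0]]).expand(4, 3)   # looking down +z
mask = torch.ones(4)                                    # all rays hit in this toy
print(silhouette_loss(sphere_sdf, rays_o, rays_d, mask))
```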
arXiv Detail & Related papers (2020-10-20T17:59:47Z)
- se(3)-TrackNet: Data-driven 6D Pose Tracking by Calibrating Image Residuals in Synthetic Domains [12.71983073907091]
This work proposes a data-driven optimization approach for long-term, 6D pose tracking.
It aims to identify the optimal relative pose given the current RGB-D observation and a synthetic image conditioned on the previous best estimate and the object's model.
The proposed approach achieves consistently robust estimates and outperforms alternatives, even though those alternatives were trained with real images.
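The render-and-compare tracking loop described here can be sketched as follows; the renderer and the relative-pose network are hypothetical placeholders for the paper's learned components:

```python
import numpy as np

# Placeholders for the paper's learned components: a renderer for the object
# model and the trained relative-pose network (both hypothetical here).
def render_synthetic(object_model, pose_4x4):
    return np.zeros((128, 128, 4))            # RGB-D render at the given pose

def predict_relative_pose(net, observed_rgbd, synthetic_rgbd):
    return np.eye(4)                          # small SE(3) correction

def track(net, object_model, init_pose, rgbd_stream):
    """Long-term 6D tracking: each frame is compared against a synthetic
    render at the previous best estimate, and the predicted relative
    transform is composed onto the running pose."""
    pose = init_pose
    for observed in rgbd_stream:
        synthetic = render_synthetic(object_model, pose)
        delta = predict_relative_pose(net, observed, synthetic)
        pose = delta @ pose
        yield pose
```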
arXiv Detail & Related papers (2020-07-27T21:09:36Z)
- Exemplar Fine-Tuning for 3D Human Model Fitting Towards In-the-Wild 3D Human Pose Estimation [107.07047303858664]
Large-scale human datasets with 3D ground-truth annotations are difficult to obtain in the wild.
We address this problem by augmenting existing 2D datasets with high-quality 3D pose fits.
The resulting annotations are sufficient to train 3D pose regressor networks from scratch that outperform the current state of the art on in-the-wild benchmarks.
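The core fitting step, optimizing a 3D pose so its projection matches 2D keypoint annotations, can be illustrated with a drastically simplified rigid fit; the paper fits a full parametric human model via exemplar fine-tuning, which this sketch does not attempt.

```python
import torch

def hat(k):
    """Skew-symmetric matrix of a 3-vector (differentiable)."""
    z = torch.zeros((), dtype=k.dtype)
    return torch.stack([
        torch.stack([z, -k[2], k[1]]),
        torch.stack([k[2], z, -k[0]]),
        torch.stack([-k[1], k[0], z]),
    ])

def rodrigues(rvec):
    """Axis-angle to rotation matrix via the Rodrigues formula."""
    theta = rvec.norm()
    K = hat(rvec / (theta + 1e-8))
    return torch.eye(3) + torch.sin(theta) * K + (1 - torch.cos(theta)) * (K @ K)

def fit_pose(j3d, j2d, K_cam, steps=300, lr=5e-2):
    """Fit a rigid (R, t) so projected 3D joints match 2D annotations."""
    rvec = torch.full((3,), 1e-3, requires_grad=True)   # avoid norm() grad at 0
    tvec = torch.tensor([0.0, 0.0, 3.0], requires_grad=True)
    opt = torch.optim.Adam([rvec, tvec], lr=lr)
    for _ in range(steps):
        cam = j3d @ rodrigues(rvec).t() + tvec          # camera-frame joints
        proj = cam @ K_cam.t()
        uv = proj[:, :2] / proj[:, 2:].clamp(min=1e-6)  # perspective divide
        loss = ((uv - j2d) ** 2).mean()                 # reprojection error
        opt.zero_grad()
        loss.backward()
        opt.step()
    return rvec.detach(), tvec.detach()
```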
arXiv Detail & Related papers (2020-04-07T20:21:18Z)