Disentangled Implicit Shape and Pose Learning for Scalable 6D Pose
Estimation
- URL: http://arxiv.org/abs/2107.12549v1
- Date: Tue, 27 Jul 2021 01:55:30 GMT
- Title: Disentangled Implicit Shape and Pose Learning for Scalable 6D Pose
Estimation
- Authors: Yilin Wen, Xiangyu Li, Hao Pan, Lei Yang, Zheng Wang, Taku Komura,
Wenping Wang
- Abstract summary: We present a novel approach for scalable 6D pose estimation, by self-supervised learning on synthetic data of multiple objects using a single autoencoder.
We test our method on two multi-object benchmarks with real data, T-LESS and NOCS REAL275, and show it outperforms existing RGB-based methods in terms of pose estimation accuracy and generalization.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 6D pose estimation of rigid objects from a single RGB image has seen
tremendous improvements recently by using deep learning to combat complex
real-world variations, but a majority of methods build models on the per-object
level, failing to scale to multiple objects simultaneously. In this paper, we
present a novel approach for scalable 6D pose estimation, by self-supervised
learning on synthetic data of multiple objects using a single autoencoder. To
handle multiple objects and generalize to unseen objects, we disentangle the
latent object shape and pose representations, so that the latent shape space
models shape similarities, and the latent pose code is used for rotation
retrieval by comparison with canonical rotations. To encourage shape space
construction, we apply contrastive metric learning and enable the processing of
unseen objects by referring to similar training objects. The different
symmetries across objects induce inconsistent latent pose spaces, which we
capture with a conditioned block producing shape-dependent pose codebooks by
re-entangling shape and pose representations. We test our method on two
multi-object benchmarks with real data, T-LESS and NOCS REAL275, and show it
outperforms existing RGB-based methods in terms of pose estimation accuracy and
generalization.
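The rotation-retrieval step described above can be illustrated with a minimal sketch: the latent pose code of a query image is compared against a codebook of codes computed for renderings under known canonical rotations, and the best-matching entry's rotation is returned. The function and array names below are hypothetical, not the paper's actual implementation.

```python
import numpy as np

def retrieve_rotation(pose_code, codebook_codes, codebook_rotations):
    """Pick the canonical rotation whose latent code best matches pose_code.

    pose_code: (d,) latent pose embedding of the query image (hypothetical).
    codebook_codes: (K, d) latent codes of K canonically rotated renderings.
    codebook_rotations: (K, 3, 3) rotation matrices used for those renderings.
    """
    # Cosine similarity between the query code and every codebook entry.
    q = pose_code / np.linalg.norm(pose_code)
    c = codebook_codes / np.linalg.norm(codebook_codes, axis=1, keepdims=True)
    best = int(np.argmax(c @ q))
    return codebook_rotations[best]
```

In the paper the codebooks are additionally shape-conditioned (produced by the conditioned block), so different objects retrieve against different pose codebooks; the sketch above shows only the retrieval-by-comparison idea.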
Related papers
- Generalizable Single-view Object Pose Estimation by Two-side Generating and Matching [19.730504197461144]
We present a novel generalizable object pose estimation method to determine the object pose using only one RGB image.
Our method offers generalization to unseen objects without extensive training, operates with a single reference image of the object, and eliminates the need for 3D object models or multiple views of the object.
arXiv Detail & Related papers (2024-11-24T14:31:50Z)
- Object Pose Estimation Using Implicit Representation For Transparent Objects [0.0]
The render-and-compare method renders the object from multiple views and compares it against the given 2D image.
We show that if the object is represented as an implicit (neural) representation in the form of Neural Radiance Field (NeRF), it exhibits a more realistic rendering of the actual scene.
We evaluated our NeRF implementation of the render-and-compare method on transparent datasets and found that it surpassed the current state-of-the-art results.
arXiv Detail & Related papers (2024-10-17T11:51:12Z)
- SABER-6D: Shape Representation Based Implicit Object Pose Estimation [15.744920692895919]
We propose a novel encoder-decoder architecture, named SABER, to learn the 6D pose of the object in the embedding space.
We use shape representation as an auxiliary task, which helps in learning the rotation space of an object from 2D images.
arXiv Detail & Related papers (2024-08-11T21:59:34Z)
- FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects [55.77542145604758]
FoundationPose is a unified foundation model for 6D object pose estimation and tracking.
Our approach can be instantly applied at test-time to a novel object without fine-tuning.
arXiv Detail & Related papers (2023-12-13T18:28:09Z)
- MegaPose: 6D Pose Estimation of Novel Objects via Render & Compare [84.80956484848505]
MegaPose is a method to estimate the 6D pose of novel objects, that is, objects unseen during training.
We present a 6D pose refiner based on a render&compare strategy which can be applied to novel objects.
We also introduce a novel approach for coarse pose estimation, which leverages a network trained to classify whether the pose error between a synthetic rendering and an observed image of the same object can be corrected by the refiner.
arXiv Detail & Related papers (2022-12-13T19:30:03Z)
- Generative Category-Level Shape and Pose Estimation with Semantic Primitives [27.692997522812615]
We propose a novel framework for category-level object shape and pose estimation from a single RGB-D image.
To handle the intra-category variation, we adopt a semantic primitive representation that encodes diverse shapes into a unified latent space.
We show that the proposed method achieves state-of-the-art pose estimation performance and better generalization on a real-world dataset.
arXiv Detail & Related papers (2022-10-03T17:51:54Z)
- Unseen Object 6D Pose Estimation: A Benchmark and Baselines [62.8809734237213]
We propose a new task that enables and facilitates algorithms to estimate the 6D pose of novel objects during testing.
We collect a dataset with both real and synthetic images and up to 48 unseen objects in the test set.
By training an end-to-end 3D correspondences network, our method finds corresponding points between an unseen object and a partial view RGBD image accurately and efficiently.
arXiv Detail & Related papers (2022-06-23T16:29:53Z)
- Shape Prior Deformation for Categorical 6D Object Pose and Size Estimation [62.618227434286]
We present a novel learning approach to recover the 6D poses and sizes of unseen object instances from an RGB-D image.
We propose a deep network to reconstruct the 3D object model by explicitly modeling the deformation from a pre-learned categorical shape prior.
arXiv Detail & Related papers (2020-07-16T16:45:05Z)
- CPS++: Improving Class-level 6D Pose and Shape Estimation From Monocular Images With Self-Supervised Learning [74.53664270194643]
Modern monocular 6D pose estimation methods can only cope with a handful of object instances.
We propose a novel method for class-level monocular 6D pose estimation, coupled with metric shape retrieval.
We experimentally demonstrate that we can retrieve precise 6D poses and metric shapes from a single RGB image.
arXiv Detail & Related papers (2020-03-12T15:28:13Z)
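Several of the papers above (the transparent-object NeRF method and MegaPose) rely on a render-and-compare strategy: render the object under candidate poses, score each rendering against the observed image, and keep the best. A minimal sketch, with `render_fn` standing in for any renderer, e.g. a NeRF queried at a given pose; the interface is hypothetical and not any paper's actual API:

```python
import numpy as np

def render_and_compare(image, candidate_poses, render_fn):
    """Score each candidate pose by photometric error between its rendering
    and the observed image; return the best-scoring pose.

    image: observed image as a float array.
    candidate_poses: iterable of candidate poses (any representation
        render_fn accepts).
    render_fn: callable mapping a pose to a rendered image of the same
        shape as `image` (stand-in for a real renderer).
    """
    # Mean squared photometric error per candidate; lower is better.
    errors = [np.mean((render_fn(p) - image) ** 2) for p in candidate_poses]
    return candidate_poses[int(np.argmin(errors))]
```

Real systems refine this loop iteratively (as in MegaPose's refiner) rather than scoring a fixed candidate set, but the compare-by-rendering-error core is the same.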
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.