Category-Level 6D Object Pose and Size Estimation using Self-Supervised
Deep Prior Deformation Networks
- URL: http://arxiv.org/abs/2207.05444v1
- Date: Tue, 12 Jul 2022 10:24:52 GMT
- Title: Category-Level 6D Object Pose and Size Estimation using Self-Supervised
Deep Prior Deformation Networks
- Authors: Jiehong Lin, Zewei Wei, Changxing Ding, Kui Jia
- Abstract summary: It is difficult to precisely annotate object instances and their semantics in 3D space, and as such, synthetic data are extensively used for these tasks.
In this work, we aim to address this issue in the task setting of Sim2Real, unsupervised domain adaptation for category-level 6D object pose and size estimation.
We propose a method that is built upon a novel Deep Prior Deformation Network, shortened as DPDN.
- Score: 39.6823489555449
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It is difficult to precisely annotate object instances and their semantics in
3D space, and as such, synthetic data are extensively used for these tasks,
e.g., category-level 6D object pose and size estimation. However, the easy
annotations in synthetic domains bring the downside effect of synthetic-to-real
(Sim2Real) domain gap. In this work, we aim to address this issue in the task
setting of Sim2Real, unsupervised domain adaptation for category-level 6D
object pose and size estimation. We propose a method that is built upon a novel
Deep Prior Deformation Network, shortened as DPDN. DPDN learns to deform
features of categorical shape priors to match those of object observations, and
is thus able to establish deep correspondence in the feature space for direct
regression of object poses and sizes. To reduce the Sim2Real domain gap, we
formulate a novel self-supervised objective upon DPDN via consistency learning:
we apply two rigid transformations to each object observation in parallel and
feed both into DPDN to yield two sets of predictions. An inter-consistency term
enforces agreement between the dual predictions to improve the sensitivity of
DPDN to pose changes, while intra-consistency terms enforce self-adaptation
within each branch. We train
DPDN on both training sets of the synthetic CAMERA25 and real-world REAL275
datasets; our results outperform the existing methods on REAL275 test set under
both the unsupervised and supervised settings. Ablation studies also verify the
efficacy of our designs. Our code is released publicly at
https://github.com/JiehongLin/Self-DPDN.
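The inter-consistency idea above can be made concrete with a small sketch. The code below is a minimal, hypothetical illustration (numpy only, with placeholder pose predictions standing in for DPDN's outputs, not the authors' implementation): two random rigid transformations are applied to an observation, each augmented copy yields a pose estimate, and the loss penalizes disagreement between the two estimates after each augmentation is undone.

```python
import numpy as np

def random_rigid_transform(rng):
    """Sample a random proper rotation (via QR) and a small translation."""
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(Q) < 0:
        Q[:, 0] = -Q[:, 0]  # flip one column so det(Q) = +1
    t = rng.normal(scale=0.1, size=3)
    return Q, t

def inter_consistency_loss(aug1, pred1, aug2, pred2):
    """Undo each augmentation and penalize disagreement between the
    two recovered canonical pose estimates (the inter-consistency term).

    aug_i  = (R_i, t_i): rigid transform applied to the observation.
    pred_i = (R_hat, t_hat): predicted pose of the transformed input.
    """
    (R1, t1), (R2, t2) = aug1, aug2
    Rp1, tp1 = pred1
    Rp2, tp2 = pred2
    # If the observation has pose (R, t), the transformed copy has pose
    # (R_i R, R_i t + t_i); applying R_i^T recovers the canonical pose.
    Rc1, tc1 = R1.T @ Rp1, R1.T @ (tp1 - t1)
    Rc2, tc2 = R2.T @ Rp2, R2.T @ (tp2 - t2)
    return np.linalg.norm(Rc1 - Rc2) + np.linalg.norm(tc1 - tc2)
```

With perfect predictions the two recovered poses coincide and the loss is zero; any inconsistency between the two branches produces a positive penalty, which is what drives the self-supervised signal.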
Related papers
- Boosting Cross-Domain Point Classification via Distilling Relational Priors from 2D Transformers [59.0181939916084]
Traditional 3D networks mainly focus on local geometric details and ignore the topological structure between local geometries.
We propose a novel Relational Priors Distillation (RPD) method to extract priors from the well-trained transformers on massive images.
Experiments on the PointDA-10 and the Sim-to-Real datasets verify that the proposed method consistently achieves the state-of-the-art performance of UDA for point cloud classification.
arXiv Detail & Related papers (2024-07-26T06:29:09Z)
- Syn-to-Real Unsupervised Domain Adaptation for Indoor 3D Object Detection [50.448520056844885]
We propose a novel framework for syn-to-real unsupervised domain adaptation in indoor 3D object detection.
Our adaptation results from synthetic dataset 3D-FRONT to real-world datasets ScanNetV2 and SUN RGB-D demonstrate remarkable mAP25 improvements of 9.7% and 9.1% over Source-Only baselines.
arXiv Detail & Related papers (2024-06-17T08:18:41Z)
- IPoD: Implicit Field Learning with Point Diffusion for Generalizable 3D Object Reconstruction from Single RGB-D Images [50.4538089115248]
Generalizable 3D object reconstruction from single-view RGB-D images remains a challenging task.
We propose a novel approach, IPoD, which harmonizes implicit field learning with point diffusion.
Experiments conducted on the CO3D-v2 dataset affirm the superiority of IPoD, achieving 7.8% improvement in F-score and 28.6% in Chamfer distance over existing methods.
arXiv Detail & Related papers (2024-03-30T07:17:37Z)
- Geometry-Aware Network for Domain Adaptive Semantic Segmentation [64.00345743710653]
We propose a novel Geometry-Aware Network for Domain Adaptation (GANDA) to shrink the domain gaps.
We exploit 3D topology on the point clouds generated from RGB-D images for coordinate-color disentanglement and pseudo-labels refinement in the target domain.
Our model outperforms state-of-the-art methods on GTA5->Cityscapes and SYNTHIA->Cityscapes.
arXiv Detail & Related papers (2022-12-02T00:48:44Z)
- DCL-Net: Deep Correspondence Learning Network for 6D Pose Estimation [43.963630959349885]
We introduce a new method of Deep Correspondence Learning Network for direct 6D object pose estimation, shortened as DCL-Net.
We show that DCL-Net outperforms existing methods on three benchmarking datasets, including YCB-Video, LineMOD, and Occlusion-LineMOD.
arXiv Detail & Related papers (2022-10-11T08:04:40Z)
- Unseen Object Instance Segmentation with Fully Test-time RGB-D Embeddings Adaptation [14.258456366985444]
A popular recent solution leverages RGB-D features of large-scale synthetic data and applies the trained model to unseen real-world scenarios.
We re-emphasize the adaptation process across Sim2Real domains in this paper.
We propose a framework to conduct the Fully Test-time RGB-D Embeddings Adaptation (FTEA) based on parameters of the BatchNorm layer.
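Test-time adaptation of BatchNorm parameters, as named in the FTEA blurb, can be illustrated in general terms. The sketch below is a hypothetical, simplified 1D batch norm whose running statistics are re-estimated from unlabeled test batches; it shows the general technique only, not the authors' exact FTEA method.

```python
import numpy as np

class BatchNormTTA:
    """Minimal 1D batch norm with test-time statistics adaptation
    (illustrative sketch; affine parameters stay frozen)."""

    def __init__(self, dim, momentum=0.1):
        self.mean = np.zeros(dim)   # running mean, re-estimated at test time
        self.var = np.ones(dim)     # running variance
        self.gamma = np.ones(dim)   # learned scale (frozen after training)
        self.beta = np.zeros(dim)   # learned shift (frozen after training)
        self.momentum = momentum

    def adapt(self, x):
        """Update running statistics from an unlabeled test batch."""
        m = self.momentum
        self.mean = (1 - m) * self.mean + m * x.mean(axis=0)
        self.var = (1 - m) * self.var + m * x.var(axis=0)

    def __call__(self, x):
        """Normalize with the (adapted) statistics."""
        return self.gamma * (x - self.mean) / np.sqrt(self.var + 1e-5) + self.beta
```

The appeal of adapting only normalization statistics is that no labels and no backpropagation are required: each incoming test batch shifts the estimated feature distribution toward the target domain.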
arXiv Detail & Related papers (2022-04-21T02:35:20Z)
- Non-Local Latent Relation Distillation for Self-Adaptive 3D Human Pose Estimation [63.199549837604444]
3D human pose estimation approaches leverage different forms of strong (2D/3D pose) or weak (multi-view or depth) paired supervision.
We cast 3D pose learning as a self-supervised adaptation problem that aims to transfer the task knowledge from a labeled source domain to a completely unpaired target.
We evaluate different self-adaptation settings and demonstrate state-of-the-art 3D human pose estimation performance on standard benchmarks.
arXiv Detail & Related papers (2022-04-05T03:52:57Z)
- Introducing Pose Consistency and Warp-Alignment for Self-Supervised 6D Object Pose Estimation in Color Images [38.9238085806793]
Most successful approaches to estimate the 6D pose of an object typically train a neural network by supervising the learning with annotated poses in real world images.
We propose a two-stage 6D object pose estimation framework that can be applied on top of existing neural-network-based approaches.
arXiv Detail & Related papers (2020-03-27T11:53:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.