DiffusionNOCS: Managing Symmetry and Uncertainty in Sim2Real Multi-Modal
Category-level Pose Estimation
- URL: http://arxiv.org/abs/2402.12647v2
- Date: Tue, 5 Mar 2024 07:12:34 GMT
- Title: DiffusionNOCS: Managing Symmetry and Uncertainty in Sim2Real Multi-Modal
Category-level Pose Estimation
- Authors: Takuya Ikeda, Sergey Zakharov, Tianyi Ko, Muhammad Zubair Irshad,
Robert Lee, Katherine Liu, Rares Ambrus, Koichi Nishiwaki
- Abstract summary: We propose a probabilistic model that relies on diffusion to estimate dense canonical maps crucial for recovering partial object shapes.
We introduce critical components to enhance performance by leveraging the strength of the diffusion models with multi-modal input representations.
Despite being trained solely on our generated synthetic data, our approach achieves state-of-the-art performance and unprecedented generalization qualities.
- Score: 20.676510832922016
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper addresses the challenging problem of category-level pose
estimation. Current state-of-the-art methods for this task face challenges when
dealing with symmetric objects and when attempting to generalize to new
environments solely through synthetic data training. In this work, we address
these challenges by proposing a probabilistic model that relies on diffusion to
estimate dense canonical maps crucial for recovering partial object shapes as
well as establishing correspondences essential for pose estimation.
Furthermore, we introduce critical components to enhance performance by
leveraging the strength of the diffusion models with multi-modal input
representations. We demonstrate the effectiveness of our method by testing it
on a range of real datasets. Despite being trained solely on our generated
synthetic data, our approach achieves state-of-the-art performance and
unprecedented generalization qualities, outperforming baselines, even those
specifically trained on the target domain.
Related papers
- Testing Generalizability in Causal Inference [3.547529079746247]
There is no formal procedure for statistically evaluating generalizability in machine learning algorithms.
We propose a systematic and quantitative framework for evaluating model generalizability in causal inference settings.
By basing simulations on real data, our method ensures more realistic evaluations, which is often missing in current work.
arXiv Detail & Related papers (2024-11-05T11:44:00Z) - MITA: Bridging the Gap between Model and Data for Test-time Adaptation [68.62509948690698]
Test-Time Adaptation (TTA) has emerged as a promising paradigm for enhancing the generalizability of models.
We propose Meet-In-The-Middle based MITA, which introduces energy-based optimization to encourage mutual adaptation of the model and data from opposing directions.
arXiv Detail & Related papers (2024-10-12T07:02:33Z) - Zero-Shot Object-Centric Representation Learning [72.43369950684057]
We study current object-centric methods through the lens of zero-shot generalization.
We introduce a benchmark comprising eight different synthetic and real-world datasets.
We find that training on diverse real-world images improves transferability to unseen scenarios.
arXiv Detail & Related papers (2024-08-17T10:37:07Z) - A Simple Background Augmentation Method for Object Detection with Diffusion Model [53.32935683257045]
In computer vision, it is well-known that a lack of data diversity will impair model performance.
We propose a simple yet effective data augmentation approach by leveraging advancements in generative models.
Background augmentation, in particular, significantly improves the models' robustness and generalization capabilities.
arXiv Detail & Related papers (2024-08-01T07:40:00Z) - SeMOPO: Learning High-quality Model and Policy from Low-quality Offline Visual Datasets [32.496818080222646]
We propose a new approach to model-based offline reinforcement learning.
We provide a theoretical guarantee of model uncertainty and performance bound of SeMOPO.
Experimental results show that our method substantially outperforms all baseline methods.
arXiv Detail & Related papers (2024-06-13T15:16:38Z) - Robust Latent Representation Tuning for Image-text Classification [9.789498730131607]
We propose a robust latent representation tuning method for large models.
Our approach introduces a modality latent translation module to maximize the correlation between modalities, resulting in a robust representation.
Within this framework, common semantics are refined during training, and robust performance is achieved even in the absence of one modality.
arXiv Detail & Related papers (2024-06-10T06:29:00Z) - pix2gestalt: Amodal Segmentation by Synthesizing Wholes [34.45464291259217]
pix2gestalt is a framework for zero-shot amodal segmentation.
We learn a conditional diffusion model for reconstructing whole objects in challenging zero-shot cases.
arXiv Detail & Related papers (2024-01-25T18:57:36Z) - GenPose: Generative Category-level Object Pose Estimation via Diffusion
Models [5.1998359768382905]
We propose a novel solution by reframing categorylevel object pose estimation as conditional generative modeling.
Our approach achieves state-of-the-art performance on the REAL275 dataset, surpassing 50% and 60% on strict 5d2cm and 5d5cm metrics.
arXiv Detail & Related papers (2023-06-18T11:45:42Z) - Cluster-level pseudo-labelling for source-free cross-domain facial
expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER)
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
arXiv Detail & Related papers (2022-10-11T08:24:50Z) - Improving robustness of jet tagging algorithms with adversarial training [56.79800815519762]
We investigate the vulnerability of flavor tagging algorithms via application of adversarial attacks.
We present an adversarial training strategy that mitigates the impact of such simulated attacks.
arXiv Detail & Related papers (2022-03-25T19:57:19Z) - On Modality Bias Recognition and Reduction [70.69194431713825]
We study the modality bias problem in the context of multi-modal classification.
We propose a plug-and-play loss function method, whereby the feature space for each label is adaptively learned.
Our method yields remarkable performance improvements compared with the baselines.
arXiv Detail & Related papers (2022-02-25T13:47:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.