DiffusionNOCS: Managing Symmetry and Uncertainty in Sim2Real Multi-Modal Category-level Pose Estimation
- URL: http://arxiv.org/abs/2402.12647v2
- Date: Tue, 5 Mar 2024 07:12:34 GMT
- Authors: Takuya Ikeda, Sergey Zakharov, Tianyi Ko, Muhammad Zubair Irshad,
Robert Lee, Katherine Liu, Rares Ambrus, Koichi Nishiwaki
- Abstract summary: We propose a probabilistic model that relies on diffusion to estimate dense canonical maps crucial for recovering partial object shapes.
We introduce critical components to enhance performance by leveraging the strength of the diffusion models with multi-modal input representations.
Despite being trained solely on our generated synthetic data, our approach achieves state-of-the-art performance and unprecedented generalization qualities.
- Score: 20.676510832922016
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper addresses the challenging problem of category-level pose
estimation. Current state-of-the-art methods for this task face challenges when
dealing with symmetric objects and when attempting to generalize to new
environments solely through synthetic data training. In this work, we address
these challenges by proposing a probabilistic model that relies on diffusion to
estimate dense canonical maps crucial for recovering partial object shapes as
well as establishing correspondences essential for pose estimation.
Furthermore, we introduce critical components to enhance performance by
leveraging the strength of the diffusion models with multi-modal input
representations. We demonstrate the effectiveness of our method by testing it
on a range of real datasets. Despite being trained solely on our generated
synthetic data, our approach achieves state-of-the-art performance and
unprecedented generalization qualities, outperforming baselines, even those
specifically trained on the target domain.
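The abstract says the predicted dense canonical maps establish the correspondences used for pose estimation, but it does not name the solver. A minimal sketch of one common choice in NOCS-style pipelines is shown here: a least-squares similarity alignment (Umeyama) between canonical coordinates and the matching camera-frame 3D points. The function name `umeyama_similarity` and the synthetic data are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def umeyama_similarity(src, dst):
    """Least-squares scale s, rotation R, translation t with dst ~ s * R @ src + t.

    src: (N, 3) canonical (NOCS) coordinates predicted for N pixels.
    dst: (N, 3) camera-frame points back-projected from depth at the same pixels.
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst

    # Cross-covariance between the centred point sets.
    cov = dst_c.T @ src_c / len(src)
    U, D, Vt = np.linalg.svd(cov)

    # Reflection guard: force a proper rotation (det(R) = +1).
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0

    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / len(src)
    s = np.trace(np.diag(D) @ S) / var_src
    t = mu_dst - s * R @ mu_src
    return s, R, t


# Usage on synthetic correspondences (hypothetical values, noise-free).
rng = np.random.default_rng(0)
nocs = rng.uniform(-0.5, 0.5, size=(200, 3))
angle = 0.7
R_true = np.array([
    [np.cos(angle), -np.sin(angle), 0.0],
    [np.sin(angle), np.cos(angle), 0.0],
    [0.0, 0.0, 1.0],
])
s_true, t_true = 0.3, np.array([0.1, -0.2, 0.8])
cam_points = s_true * nocs @ R_true.T + t_true

s_est, R_est, t_est = umeyama_similarity(nocs, cam_points)
print(s_est, t_est)
```

With real predictions the correspondences are noisy, so an alignment like this is usually wrapped in an outlier-rejection loop such as RANSAC. For symmetric objects, several sampled canonical maps could each be aligned this way and compared; how the paper selects among them is not stated in the abstract.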
Related papers
- SeMOPO: Learning High-quality Model and Policy from Low-quality Offline Visual Datasets [32.496818080222646]
We propose a new approach to model-based offline reinforcement learning.
We provide a theoretical guarantee of model uncertainty and performance bound of SeMOPO.
Experimental results show that our method substantially outperforms all baseline methods.
arXiv Detail & Related papers (2024-06-13T15:16:38Z)
- Robust Latent Representation Tuning for Image-text Classification [9.789498730131607]
We propose a robust latent representation tuning method for large models.
Our approach introduces a modality latent translation module to maximize the correlation between modalities, resulting in a robust representation.
Within this framework, common semantics are refined during training, and robust performance is achieved even in the absence of one modality.
arXiv Detail & Related papers (2024-06-10T06:29:00Z)
- Cross-Database Liveness Detection: Insights from Comparative Biometric Analysis [20.821562115822182]
Liveness detection is the capability to differentiate between genuine and spoofed biometric samples.
This research presents a comprehensive evaluation of liveness detection models.
Our work offers a blueprint for navigating the evolving rhythms of biometric security.
arXiv Detail & Related papers (2024-01-29T15:32:18Z)
- pix2gestalt: Amodal Segmentation by Synthesizing Wholes [34.45464291259217]
pix2gestalt is a framework for zero-shot amodal segmentation.
We learn a conditional diffusion model for reconstructing whole objects in challenging zero-shot cases.
arXiv Detail & Related papers (2024-01-25T18:57:36Z)
- Steerable Conditional Diffusion for Out-of-Distribution Adaptation in Imaging Inverse Problems [78.76955228709241]
We introduce a novel sampling framework called Steerable Conditional Diffusion.
This framework adapts the denoising network specifically to the available measured data.
We achieve substantial enhancements in OOD performance across diverse imaging modalities.
arXiv Detail & Related papers (2023-08-28T08:47:06Z)
- GenPose: Generative Category-level Object Pose Estimation via Diffusion Models [5.1998359768382905]
We propose a novel solution by reframing category-level object pose estimation as conditional generative modeling.
Our approach achieves state-of-the-art performance on the REAL275 dataset, surpassing 50% and 60% on the strict 5°2cm and 5°5cm metrics, respectively.
arXiv Detail & Related papers (2023-06-18T11:45:42Z)
- Cluster-level pseudo-labelling for source-free cross-domain facial expression recognition [94.56304526014875]
We propose the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for Facial Expression Recognition (FER).
Our method exploits self-supervised pretraining to learn good feature representations from the target data.
We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER.
arXiv Detail & Related papers (2022-10-11T08:24:50Z)
- Improving robustness of jet tagging algorithms with adversarial training [56.79800815519762]
We investigate the vulnerability of flavor tagging algorithms via application of adversarial attacks.
We present an adversarial training strategy that mitigates the impact of such simulated attacks.
arXiv Detail & Related papers (2022-03-25T19:57:19Z)
- On Modality Bias Recognition and Reduction [70.69194431713825]
We study the modality bias problem in the context of multi-modal classification.
We propose a plug-and-play loss function method, whereby the feature space for each label is adaptively learned.
Our method yields remarkable performance improvements compared with the baselines.
arXiv Detail & Related papers (2022-02-25T13:47:09Z)
- Progressive residual learning for single image dehazing [57.651704852274825]
A progressive residual learning strategy has been proposed to combine the physical model-free dehazing process with reformulated scattering model-based dehazing operations.
The proposed method performs favorably against the state-of-the-art methods on public dehazing benchmarks with better model interpretability and adaptivity for complex data.
arXiv Detail & Related papers (2021-03-14T16:54:44Z)
- A Multi-Channel Neural Graphical Event Model with Negative Evidence [76.51278722190607]
Event datasets are sequences of events of various types occurring irregularly over the time-line.
We propose a non-parametric deep neural network approach in order to estimate the underlying intensity functions.
arXiv Detail & Related papers (2020-02-21T23:10:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.