Diffusion Features for Zero-Shot 6DoF Object Pose Estimation
- URL: http://arxiv.org/abs/2411.16668v1
- Date: Mon, 25 Nov 2024 18:53:56 GMT
- Title: Diffusion Features for Zero-Shot 6DoF Object Pose Estimation
- Authors: Bernd Von Gimborn, Philipp Ausserlechner, Markus Vincze, Stefan Thalhammer
- Abstract summary: This study assesses the influence of Latent Diffusion Model (LDM) backbones on zero-shot pose estimation.
A template-based multi-staged method for estimating poses in a zero-shot fashion using LDMs is presented.
- Score: 7.949705607963995
- Abstract: Zero-shot object pose estimation enables the retrieval of object poses from images without necessitating object-specific training. In recent approaches this is facilitated by vision foundation models (VFM), which are pre-trained models that are effectively general-purpose feature extractors. The characteristics exhibited by these VFMs vary depending on the training data, network architecture, and training paradigm. The prevailing choice in this field is self-supervised Vision Transformers (ViT). This study assesses the influence of Latent Diffusion Model (LDM) backbones on zero-shot pose estimation. To facilitate a comparison between the two families of models on common ground, we adopt and modify a recent approach. Therefore, a template-based multi-staged method for estimating poses in a zero-shot fashion using LDMs is presented. The efficacy of the proposed approach is empirically evaluated on three standard datasets for object-specific 6DoF pose estimation. The experiments demonstrate an Average Recall improvement of up to 27% over the ViT baseline. The source code is available at: https://github.com/BvG1993/DZOP.
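As a rough illustration of the template-based idea mentioned in the abstract, the sketch below picks the template whose feature vector is most similar to a query feature vector under cosine similarity. It is not the authors' pipeline: the function names `cosine_similarity` and `best_template`, the toy three-dimensional vectors, and the choice of cosine similarity as the matching score are all assumptions made for illustration. In a real system the vectors would come from a feature extractor such as an LDM or ViT backbone, and each template would be a rendering of the object at a known pose.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_template(
    query: Sequence[float], templates: List[Sequence[float]]
) -> Tuple[int, float]:
    """Return (index, score) of the template most similar to the query.

    Hypothetical helper: each template stands in for the features of an
    object rendering at a known pose, so the index selects a coarse pose.
    """
    if not templates:
        raise ValueError("at least one template is required")
    scores = [cosine_similarity(query, t) for t in templates]
    index = max(range(len(scores)), key=scores.__getitem__)
    return index, scores[index]


# Toy data (assumed values, not taken from the paper).
templates = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
query = [0.6, 0.8, 0.0]
index, score = best_template(query, templates)
print(index, round(score, 3))
```

In this toy case the third template is selected because its direction is closest to the query. A multi-stage method would typically follow such a coarse match with further refinement steps.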
Related papers
- Diff9D: Diffusion-Based Domain-Generalized Category-Level 9-DoF Object Pose Estimation [68.81887041766373]
We introduce a diffusion-based paradigm for domain-generalized 9-DoF object pose estimation.
We propose an effective diffusion model to redefine 9-DoF object pose estimation from a generative perspective.
We show that our method achieves state-of-the-art domain generalization performance.
arXiv Detail & Related papers (2025-02-04T17:46:34Z)
- Category Level 6D Object Pose Estimation from a Single RGB Image using Diffusion [9.025235713063509]
We tackle the harder problem of pose estimation for category-level objects from a single RGB image.
We propose a novel solution that eliminates the need for specific object models or depth information.
Our approach outperforms the current state-of-the-art on the REAL275 dataset by a significant margin.
arXiv Detail & Related papers (2024-12-16T03:39:33Z)
- Particle-based 6D Object Pose Estimation from Point Clouds using Diffusion Models [15.582644209879957]
This work proposes training a diffusion-based generative model for 6D object pose estimation.
During inference, the trained generative model allows for sampling multiple particles, i.e., pose hypotheses.
We propose two novel and effective pose selection strategies that do not require any additional training or computationally intensive operations.
arXiv Detail & Related papers (2024-12-01T14:52:44Z)
- Opinion-Unaware Blind Image Quality Assessment using Multi-Scale Deep Feature Statistics [54.08757792080732]
We propose integrating deep features from pre-trained visual models with a statistical analysis model to achieve opinion-unaware BIQA (OU-BIQA).
Our proposed model exhibits superior consistency with human visual perception compared to state-of-the-art BIQA models.
arXiv Detail & Related papers (2024-05-29T06:09:34Z)
- Diffusion-Based Particle-DETR for BEV Perception [94.88305708174796]
Bird-Eye-View (BEV) is one of the most widely-used scene representations for visual perception in Autonomous Vehicles (AVs).
Recent diffusion-based methods offer a promising approach to uncertainty modeling for visual perception but fail to effectively detect small objects in the large coverage of the BEV.
Here, we address this problem by combining the diffusion paradigm with current state-of-the-art 3D object detectors in BEV.
arXiv Detail & Related papers (2023-12-18T09:52:14Z)
- FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects [55.77542145604758]
FoundationPose is a unified foundation model for 6D object pose estimation and tracking.
Our approach can be instantly applied at test-time to a novel object without fine-tuning.
arXiv Detail & Related papers (2023-12-13T18:28:09Z)
- GenPose: Generative Category-level Object Pose Estimation via Diffusion Models [5.1998359768382905]
We propose a novel solution by reframing category-level object pose estimation as conditional generative modeling.
Our approach achieves state-of-the-art performance on the REAL275 dataset, surpassing 50% and 60% on strict 5°2cm and 5°5cm metrics.
arXiv Detail & Related papers (2023-06-18T11:45:42Z)
- A Billion-scale Foundation Model for Remote Sensing Images [5.065947993017157]
Three key factors in pretraining foundation models are the pretraining method, the size of the pretraining dataset, and the number of model parameters.
This paper examines the effect of increasing the number of model parameters on the performance of foundation models in downstream tasks.
To the best of our knowledge, this is the first billion-scale foundation model in the remote sensing field.
arXiv Detail & Related papers (2023-04-11T13:33:45Z)
- MegaPose: 6D Pose Estimation of Novel Objects via Render & Compare [84.80956484848505]
MegaPose is a method to estimate the 6D pose of novel objects, that is, objects unseen during training.
We present a 6D pose refiner based on a render&compare strategy which can be applied to novel objects.
Second, we introduce a novel approach for coarse pose estimation which leverages a network trained to classify whether the pose error between a synthetic rendering and an observed image of the same object can be corrected by the refiner.
arXiv Detail & Related papers (2022-12-13T19:30:03Z)
- Probabilistic Modeling for Human Mesh Recovery [73.11532990173441]
This paper focuses on the problem of 3D human reconstruction from 2D evidence.
We recast the problem as learning a mapping from the input to a distribution of plausible 3D poses.
arXiv Detail & Related papers (2021-08-26T17:55:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.