Related papers: Category Level 6D Object Pose Estimation from a Single RGB Image using Diffusion

URL: http://arxiv.org/abs/2412.11420v1
Date: Mon, 16 Dec 2024 03:39:33 GMT
Title: Category Level 6D Object Pose Estimation from a Single RGB Image using Diffusion
Authors: Adam Bethell, Ravi Garg, Ian Reid
Abstract summary: We tackle the harder problem of pose estimation for category-level objects from a single RGB image. We propose a novel solution that eliminates the need for specific object models or depth information. Our approach outperforms the current state-of-the-art on the REAL275 dataset by a significant margin.
Score: 9.025235713063509
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Estimating the 6D pose and 3D size of an object from an image is a fundamental task in computer vision. Most current approaches are restricted to specific instances with known models or require ground truth depth information or point cloud captures from LIDAR. We tackle the harder problem of pose estimation for category-level objects from a single RGB image. We propose a novel solution that eliminates the need for specific object models or depth information. Our method utilises score-based diffusion models to generate object pose hypotheses to model the distribution of possible poses for the object. Unlike previous methods that rely on costly trained likelihood estimators to remove outliers before pose aggregation using mean pooling, we introduce a simpler approach using Mean Shift to estimate the mode of the distribution as the final pose estimate. Our approach outperforms the current state-of-the-art on the REAL275 dataset by a significant margin.
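The abstract's aggregation step, taking the mode of a set of sampled pose hypotheses rather than their mean, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the hypotheses are plain 3D translation vectors, a Gaussian kernel, and a hand-picked bandwidth. The function name `mean_shift_mode` and all parameter values are hypothetical. Rotations live on a manifold and would need a different distance and averaging rule, which this sketch does not cover.

```python
import numpy as np


def mean_shift_mode(samples, bandwidth=0.05, max_iters=100, tol=1e-6):
    """Estimate the mode (densest point) of `samples` with Gaussian mean shift.

    samples:   (N, D) array of hypotheses, e.g. N sampled 3D translations.
    bandwidth: Gaussian kernel width (illustrative default, tune per dataset).
    """
    samples = np.asarray(samples, dtype=float)
    # Start one trajectory at every sample; each climbs to a nearby local mode.
    points = samples.copy()
    for _ in range(max_iters):
        # Squared distances between every current point and every sample.
        diff = points[:, None, :] - samples[None, :, :]
        sq_dist = np.sum(diff ** 2, axis=-1)
        weights = np.exp(-0.5 * sq_dist / bandwidth ** 2)
        # Move each point to the kernel-weighted mean of the samples.
        new_points = weights @ samples / weights.sum(axis=1, keepdims=True)
        largest_shift = np.max(np.linalg.norm(new_points - points, axis=1))
        points = new_points
        if largest_shift < tol:
            break
    # Score each converged point by its kernel density and keep the best one.
    diff = points[:, None, :] - samples[None, :, :]
    density = np.exp(-0.5 * np.sum(diff ** 2, axis=-1) / bandwidth ** 2).sum(axis=1)
    return points[np.argmax(density)]


# Synthetic hypotheses (made-up data): 80 near the true translation, 20 outliers.
rng = np.random.default_rng(0)
true_t = np.array([0.1, -0.2, 0.9])
inliers = true_t + 0.01 * rng.standard_normal((80, 3))
outliers = rng.uniform(-1.0, 1.0, size=(20, 3))
hypotheses = np.vstack([inliers, outliers])

mode = mean_shift_mode(hypotheses)
mean = hypotheses.mean(axis=0)
print("mode error:", np.linalg.norm(mode - true_t))
print("mean error:", np.linalg.norm(mean - true_t))
```

In this toy setup the plain mean is pulled toward the outliers, while the mode stays inside the dense cluster, which is the motivation the abstract gives for avoiding a separate learned outlier filter before pooling.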

Related papers

MonoDiff9D: Monocular Category-Level 9D Object Pose Estimation via Diffusion Model [34.52439917115497]
We propose a diffusion-based monocular category-level 9D object pose generation method, MonoDiff9D. We first estimate coarse depth via DINOv2 from the monocular image in a zero-shot manner and convert it into a point cloud. We then fuse the global features of the point cloud with the input image and use the fused features along with the encoded time step to condition MonoDiff9D.
arXiv Detail & Related papers (2025-04-14T17:21:10Z)
Any6D: Model-free 6D Pose Estimation of Novel Objects [76.30057578269668]
We introduce Any6D, a model-free framework for 6D object pose estimation. It requires only a single RGB-D anchor image to estimate both the 6D pose and size of unknown objects in novel scenes. We evaluate our method on five challenging datasets.
arXiv Detail & Related papers (2025-03-24T13:46:21Z)
Particle-based 6D Object Pose Estimation from Point Clouds using Diffusion Models [15.582644209879957]
This work proposes training a diffusion-based generative model for 6D object pose estimation. During inference, the trained generative model allows for sampling multiple particles, i.e., pose hypotheses. We propose two novel and effective pose selection strategies that do not require any additional training or computationally intensive operations.
arXiv Detail & Related papers (2024-12-01T14:52:44Z)
DVMNet: Computing Relative Pose for Unseen Objects Beyond Hypotheses [59.51874686414509]
Current approaches approximate the continuous pose representation with a large number of discrete pose hypotheses. We present a Deep Voxel Matching Network (DVMNet) that eliminates the need for pose hypotheses and computes the relative object pose in a single pass. Our method delivers more accurate relative pose estimates for novel objects at a lower computational cost compared to state-of-the-art methods.
arXiv Detail & Related papers (2024-03-20T15:41:32Z)
Diff-DOPE: Differentiable Deep Object Pose Estimation [29.703385848843414]
We introduce Diff-DOPE, a 6-DoF pose refiner that takes as input an image, a 3D textured model of an object, and an initial pose of the object. The method uses differentiable rendering to update the object pose to minimize the visual error between the image and the projection of the model. We show that this simple, yet effective, idea is able to achieve state-of-the-art results on pose estimation datasets.
arXiv Detail & Related papers (2023-09-30T18:52:57Z)
MV-ROPE: Multi-view Constraints for Robust Category-level Object Pose and Size Estimation [23.615122326731115]
We propose a novel solution that makes use of RGB video streams. Our framework consists of three modules: a scale-aware monocular dense SLAM solution, a lightweight object pose predictor, and an object-level pose graph. Our experimental results demonstrate that when utilizing public dataset sequences with high-quality depth information, the proposed method exhibits comparable performance to state-of-the-art RGB-D methods.
arXiv Detail & Related papers (2023-08-17T08:29:54Z)
Learning to Estimate 6DoF Pose from Limited Data: A Few-Shot, Generalizable Approach using RGB Images [60.0898989456276]
We present a new framework named Cas6D for few-shot 6DoF pose estimation that is generalizable and uses only RGB images. To address the false positives of target object detection in the extreme few-shot setting, our framework utilizes a self-supervised pre-trained ViT to learn robust feature representations. Experimental results on the LINEMOD and GenMOP datasets demonstrate that Cas6D outperforms state-of-the-art methods by 9.2% and 3.8% accuracy (Proj-5) under the 32-shot setting.
arXiv Detail & Related papers (2023-06-13T07:45:42Z)
Generalizable Pose Estimation Using Implicit Scene Representations [4.124185654280966]
6-DoF pose estimation is an essential component of robotic manipulation pipelines. We address the generalization capability of pose estimation using models that contain enough information about the object to render it in different poses. Our final evaluation shows a significant improvement in inference performance and speed compared to existing approaches.
arXiv Detail & Related papers (2023-05-26T20:42:52Z)
PoseMatcher: One-shot 6D Object Pose Estimation by Deep Feature Matching [51.142988196855484]
We propose PoseMatcher, an accurate model-free one-shot object pose estimator. We create a new training pipeline for object-to-image matching based on a three-view system. To enable PoseMatcher to attend to distinct input modalities, an image and a point cloud, we introduce IO-Layer.
arXiv Detail & Related papers (2023-04-03T21:14:59Z)
NOPE: Novel Object Pose Estimation from a Single Image [67.11073133072527]
We propose an approach that takes a single image of a new object as input and predicts the relative pose of this object in new images without prior knowledge of the object's 3D model. We achieve this by training a model to directly predict discriminative embeddings for viewpoints surrounding the object. This prediction is done using a simple U-Net architecture with attention and conditioned on the desired pose, which yields extremely fast inference.
arXiv Detail & Related papers (2023-03-23T18:55:43Z)
Unseen Object 6D Pose Estimation: A Benchmark and Baselines [62.8809734237213]
We propose a new task that enables and facilitates algorithms to estimate the 6D pose of novel objects during testing. We collect a dataset with both real and synthetic images and up to 48 unseen objects in the test set. By training an end-to-end 3D correspondences network, our method finds corresponding points between an unseen object and a partial-view RGBD image accurately and efficiently.
arXiv Detail & Related papers (2022-06-23T16:29:53Z)

This list is automatically generated from the titles and abstracts of the papers on this site.