RayPose: Ray Bundling Diffusion for Template Views in Unseen 6D Object Pose Estimation
- URL: http://arxiv.org/abs/2510.18521v1
- Date: Tue, 21 Oct 2025 11:01:20 GMT
- Title: RayPose: Ray Bundling Diffusion for Template Views in Unseen 6D Object Pose Estimation
- Authors: Junwen Huang, Shishir Reddy Vutukur, Peter KT Yu, Nassir Navab, Slobodan Ilic, Benjamin Busam
- Abstract summary: We reformulate template-based object pose estimation as a ray alignment problem. Inspired by recent progress in diffusion-based camera pose estimation, we embed this formulation into a diffusion transformer architecture. A coarse-to-fine training strategy based on narrowed template sampling improves performance without modifying the network architecture.
- Score: 57.182509595660946
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Typical template-based object pose pipelines estimate the pose by retrieving the closest matching template and aligning it with the observed image. However, failure to retrieve the correct template often leads to inaccurate pose predictions. To address this, we reformulate template-based object pose estimation as a ray alignment problem, where the viewing directions from multiple posed template images are learned to align with a non-posed query image. Inspired by recent progress in diffusion-based camera pose estimation, we embed this formulation into a diffusion transformer architecture that aligns a query image with a set of posed templates. We reparameterize object rotation using object-centered camera rays and model object translation by extending scale-invariant translation estimation to dense translation offsets. Our model leverages geometric priors from the templates to guide accurate query pose inference. A coarse-to-fine training strategy based on narrowed template sampling improves performance without modifying the network architecture. Extensive experiments across multiple benchmark datasets show competitive results of our method compared to state-of-the-art approaches in unseen object pose estimation.
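The abstract's reparameterization of object rotation via object-centered camera rays can be illustrated with a small sketch. This is a minimal, hypothetical implementation assuming a Plücker-style (direction, moment) ray encoding; the function name and encoding are illustration choices, not the paper's exact parameterization.

```python
import numpy as np

def pose_to_ray_bundle(R, t, K, pixels):
    """Convert a camera pose (R, t) into per-pixel rays expressed in the
    object-centered frame.

    R: (3,3) world-to-camera rotation, t: (3,) translation,
    K: (3,3) camera intrinsics, pixels: (N,2) pixel coordinates.
    Returns an (N,6) array of Plücker rays (direction, moment).
    """
    # Back-project pixels to unit ray directions in the camera frame.
    ones = np.ones((pixels.shape[0], 1))
    homog = np.hstack([pixels, ones])                  # (N, 3)
    dirs_cam = (np.linalg.inv(K) @ homog.T).T          # (N, 3)
    dirs_cam /= np.linalg.norm(dirs_cam, axis=1, keepdims=True)

    # Rotate directions into the object frame; the camera center in
    # that frame is -R^T t.
    dirs_obj = dirs_cam @ R                            # row i equals R^T d_i
    center = -R.T @ t                                  # (3,)

    # Plücker parameterization: direction d and moment m = c x d.
    moments = np.cross(np.broadcast_to(center, dirs_obj.shape), dirs_obj)
    return np.concatenate([dirs_obj, moments], axis=1)
```

A diffusion model can then denoise such per-pixel ray bundles instead of a rotation matrix directly, which is the general idea the abstract describes.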
Related papers
- PoseGAM: Robust Unseen Object Pose Estimation via Geometry-Aware Multi-View Reasoning [49.66437612420291]
PoseGAM is a geometry-aware multi-view framework that directly predicts object pose from a query image and multiple template images. We construct a large-scale synthetic dataset containing more than 190k objects under diverse environmental conditions.
arXiv Detail & Related papers (2025-12-11T17:29:25Z)
- Co-op: Correspondence-based Novel Object Pose Estimation [14.598853174946656]
Co-op is a novel method for accurately and robustly estimating the 6DoF pose of objects unseen during training from a single RGB image. Our method requires only the CAD model of the target object and can precisely estimate its pose without any additional fine-tuning.
arXiv Detail & Related papers (2025-03-22T11:24:19Z)
- Diffeomorphic Template Registration for Atmospheric Turbulence Mitigation [50.16004183320537]
We describe a method for recovering the irradiance underlying a collection of images corrupted by atmospheric turbulence.
We select one of the images as a reference, and model the deformation in this image by the aggregation of the optical flow from it to other images.
We achieve state-of-the-art performance despite its simplicity.
arXiv Detail & Related papers (2024-05-06T17:39:53Z)
- Cameras as Rays: Pose Estimation via Ray Diffusion [54.098613859015856]
Estimating camera poses is a fundamental task for 3D reconstruction and remains challenging given sparsely sampled views.
We propose a distributed representation of camera pose that treats a camera as a bundle of rays.
Our proposed methods, both regression- and diffusion-based, demonstrate state-of-the-art performance on camera pose estimation on CO3D.
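As a concrete illustration of the bundle-of-rays camera representation, the camera center can be recovered from a set of Plücker rays by least squares. This is a minimal sketch under the assumption that each ray stores a direction d and moment m = c x d; it is not the paper's implementation.

```python
import numpy as np

def skew(v):
    """Cross-product matrix so that skew(v) @ x == np.cross(v, x)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def camera_center_from_rays(dirs, moments):
    """Least-squares camera center c from Plücker rays (d_i, m_i).

    Uses m_i = c x d_i, i.e. -skew(d_i) @ c = m_i, stacked over all rays.
    Needs at least two non-parallel rays for a unique solution.
    """
    A = np.concatenate([-skew(d) for d in dirs], axis=0)   # (3N, 3)
    b = np.concatenate(moments, axis=0)                    # (3N,)
    c, *_ = np.linalg.lstsq(A, b, rcond=None)
    return c
```

This inversion is what makes the ray bundle a valid distributed pose representation: the full camera pose is recoverable from the denoised rays.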
arXiv Detail & Related papers (2024-02-22T18:59:56Z)
- FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects [55.77542145604758]
FoundationPose is a unified foundation model for 6D object pose estimation and tracking.
Our approach can be instantly applied at test-time to a novel object without fine-tuning.
arXiv Detail & Related papers (2023-12-13T18:28:09Z)
- FoundPose: Unseen Object Pose Estimation with Foundation Features [11.32559845631345]
FoundPose is a model-based method for 6D pose estimation of unseen objects from a single RGB image.
The method can quickly onboard new objects using their 3D models without requiring any object- or task-specific training.
arXiv Detail & Related papers (2023-11-30T18:52:29Z)
- DTF-Net: Category-Level Pose Estimation and Shape Reconstruction via Deformable Template Field [29.42222066097076]
Estimating 6D poses and reconstructing 3D shapes of objects in open-world scenes from RGB-depth image pairs is challenging.
We propose the DTF-Net, a novel framework for pose estimation and shape reconstruction based on implicit neural fields of object categories.
arXiv Detail & Related papers (2023-08-04T10:35:40Z)
- LocPoseNet: Robust Location Prior for Unseen Object Pose Estimation [69.70498875887611]
LocPoseNet is able to robustly learn location prior for unseen objects.
Our method outperforms existing works by a large margin on LINEMOD and GenMOP.
arXiv Detail & Related papers (2022-11-29T15:21:34Z)
- Category Level Object Pose Estimation via Neural Analysis-by-Synthesis [64.14028598360741]
In this paper we combine a gradient-based fitting procedure with a parametric neural image synthesis module.
The image synthesis network is designed to efficiently span the pose configuration space.
We experimentally show that the method can recover orientation of objects with high accuracy from 2D images alone.
arXiv Detail & Related papers (2020-08-18T20:30:47Z)
- Novel Object Viewpoint Estimation through Reconstruction Alignment [45.16865218423492]
We take a reconstruct-and-align approach to estimating the viewpoint of a novel object.
In particular, we propose learning two networks: the first maps images to a 3D geometry-aware feature bottleneck and is trained via an image-to-image translation loss.
At test time, our model finds the relative transformation that best aligns the bottleneck features of our test image to a reference image.
arXiv Detail & Related papers (2020-06-05T17:58:14Z)
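Feature-based alignment and retrieval of the kind used throughout these template-based pipelines typically reduces to a nearest-neighbor search in feature space. A minimal sketch (function name and inputs are hypothetical), which also illustrates the retrieval step whose failure motivates RayPose's ray-alignment reformulation:

```python
import numpy as np

def best_aligning_index(query_feat, template_feats):
    """Return the index of the template whose (flattened) feature map is
    most similar to the query feature, by cosine similarity.

    query_feat: array of any shape; template_feats: (K, ...) stack of
    template features with the same per-template shape.
    """
    q = query_feat.ravel()
    q = q / np.linalg.norm(q)
    T = template_feats.reshape(len(template_feats), -1)
    T = T / np.linalg.norm(T, axis=1, keepdims=True)
    return int(np.argmax(T @ q))
```

When the query view falls between sampled templates, this hard argmax picks a wrong or ambiguous neighbor, which is exactly the failure mode the ray-alignment formulation is designed to avoid.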
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.