AlignPose: Generalizable 6D Pose Estimation via Multi-view Feature-metric Alignment
- URL: http://arxiv.org/abs/2512.20538v1
- Date: Tue, 23 Dec 2025 17:29:08 GMT
- Title: AlignPose: Generalizable 6D Pose Estimation via Multi-view Feature-metric Alignment
- Authors: Anna Šárová Mikeštíková, Médéric Fourmy, Martin Cífka, Josef Sivic, Vladimir Petrik
- Abstract summary: We introduce AlignPose, a 6D object pose estimation method that aggregates information from multiple RGB views. The key component of this approach is a new multi-view feature-metric refinement specifically designed for object pose. It optimizes a single, consistent world-frame object pose by minimizing the feature discrepancy between on-the-fly rendered object features and observed image features.
- Score: 18.198789096671245
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Single-view RGB model-based object pose estimation methods achieve strong generalization but are fundamentally limited by depth ambiguity, clutter, and occlusions. Multi-view pose estimation methods have the potential to solve these issues, but existing works rely on precise single-view pose estimates or lack generalization to unseen objects. We address these challenges via the following three contributions. First, we introduce AlignPose, a 6D object pose estimation method that aggregates information from multiple extrinsically calibrated RGB views and does not require any object-specific training or symmetry annotation. Second, the key component of this approach is a new multi-view feature-metric refinement specifically designed for object pose. It optimizes a single, consistent world-frame object pose by minimizing the feature discrepancy between on-the-fly rendered object features and observed image features across all views simultaneously. Third, we report extensive experiments on four datasets (YCB-V, T-LESS, ITODD-MV, HouseCat6D) using the BOP benchmark evaluation and show that AlignPose outperforms other published methods, especially on challenging industrial datasets where multiple views are readily available in practice.
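The multi-view feature-metric refinement described in the abstract can be illustrated with a toy sketch: a single shared world-frame pose is refined by minimizing stacked feature residuals across all calibrated views simultaneously. Everything below is a hedged stand-in, not the AlignPose implementation: the 6D pose is reduced to a 3-vector, and each view's on-the-fly feature renderer is replaced by a fixed linear map (`A`, `b` are hypothetical).

```python
import numpy as np

# Toy sketch of multi-view feature-metric pose refinement (an illustration,
# NOT the AlignPose implementation). The pose is simplified to a 3-vector
# and the per-view differentiable feature renderer is stood in by a linear
# map; all names here (A, b, residuals) are hypothetical.
rng = np.random.default_rng(0)
n_views, feat_dim = 4, 8
true_pose = np.array([0.3, -0.1, 0.5])

# Per-view stand-ins for "rendered object features as a function of pose".
A = [rng.normal(size=(feat_dim, 3)) for _ in range(n_views)]
b = [rng.normal(size=feat_dim) for _ in range(n_views)]
# "Observed image features" in each view, generated from the true pose.
observed = [A[v] @ true_pose + b[v] for v in range(n_views)]

def residuals(pose):
    # Stack the feature discrepancies of all views, so one consistent
    # world-frame pose is optimized against every view at once.
    return np.concatenate([A[v] @ pose + b[v] - observed[v]
                           for v in range(n_views)])

# Gauss-Newton iterations on the single shared pose.
pose = np.zeros(3)
for _ in range(5):
    J = np.vstack(A)  # Jacobian of the stacked residuals w.r.t. the pose
    pose = pose - np.linalg.solve(J.T @ J, J.T @ residuals(pose))

print(np.round(pose, 3))  # recovers true_pose
```

Because the stand-in renderer is linear, Gauss-Newton converges in one step here; the point of the sketch is only the structure of the objective, i.e. one pose shared by all views rather than per-view estimates fused afterwards.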
Related papers
- PoseGAM: Robust Unseen Object Pose Estimation via Geometry-Aware Multi-View Reasoning [49.66437612420291]
PoseGAM is a geometry-aware multi-view framework that directly predicts object pose from a query image and multiple template images. We construct a large-scale synthetic dataset containing more than 190k objects under diverse environmental conditions.
arXiv Detail & Related papers (2025-12-11T17:29:25Z) - One2Any: One-Reference 6D Pose Estimation for Any Object [98.50085481362808]
6D object pose estimation remains challenging for many applications due to dependencies on complete 3D models, multi-view images, or training limited to specific object categories. We propose a novel method, One2Any, that estimates the relative 6-degrees-of-freedom (DOF) object pose using only a single reference and a single query RGB-D image. Experiments on multiple benchmark datasets demonstrate that our model generalizes well to novel objects, achieving state-of-the-art accuracy and even rivaling methods that require multi-view or CAD inputs, at a fraction of the compute.
arXiv Detail & Related papers (2025-05-07T03:54:59Z) - BoxDreamer: Dreaming Box Corners for Generalizable Object Pose Estimation [81.24160191975781]
This paper presents a general RGB-based approach for object pose estimation, specifically designed to address challenges in sparse-view settings. To overcome these limitations, we introduce the corner points of the object bounding box as an intermediate representation of the object pose. The 3D object corners can be reliably recovered from sparse input views, while the 2D corner points in the target view are estimated through a novel reference-based approach.
arXiv Detail & Related papers (2025-04-10T17:58:35Z) - Active 6D Pose Estimation for Textureless Objects using Multi-View RGB Frames [10.859307261818362]
Estimating the 6D pose of textureless objects from RGB images is an important problem in robotics. We propose a comprehensive active perception framework for estimating the 6D poses of textureless objects using only RGB images.
arXiv Detail & Related papers (2025-03-05T18:28:32Z) - Generalizable Single-view Object Pose Estimation by Two-side Generating and Matching [19.730504197461144]
We present a novel generalizable object pose estimation method to determine the object pose using only one RGB image.
Our method offers generalization to unseen objects without extensive training, operates with a single reference image of the object, and eliminates the need for 3D object models or multiple views of the object.
arXiv Detail & Related papers (2024-11-24T14:31:50Z) - POPE: 6-DoF Promptable Pose Estimation of Any Object, in Any Scene, with One Reference [72.32413378065053]
We propose a general paradigm for object pose estimation, called Promptable Object Pose Estimation (POPE).
POPE enables zero-shot 6DoF object pose estimation for any target object in any scene, while only a single reference is adopted as the support view.
Comprehensive experimental results demonstrate that POPE exhibits unrivaled robust performance in zero-shot settings.
arXiv Detail & Related papers (2023-05-25T05:19:17Z) - Multi-view object pose estimation from correspondence distributions and epipolar geometry [0.0]
We present a multi-view pose estimation method which aggregates learned 2D-3D distributions from multiple views for both the initial estimate and optional refinement.
Our method reduces pose estimation errors by 80-91% compared to the best single-view method, and we present state-of-the-art results on T-LESS with four views, even compared with methods using five and eight views.
arXiv Detail & Related papers (2022-10-03T13:30:40Z) - Coupled Iterative Refinement for 6D Multi-Object Pose Estimation [64.7198752089041]
Given a set of known 3D objects and an RGB or RGB-D input image, we detect and estimate the 6D pose of each object.
Our approach iteratively refines both pose and correspondence in a tightly coupled manner, allowing us to dynamically remove outliers to improve accuracy.
arXiv Detail & Related papers (2022-04-26T18:00:08Z) - CosyPose: Consistent multi-view multi-object 6D pose estimation [48.097599674329004]
First, we present a single-view single-object 6D pose estimation method, which we use to generate 6D object pose hypotheses.
Second, we develop a robust method for matching individual 6D object pose hypotheses across different input images.
Third, we develop a method for global scene refinement given multiple object hypotheses and their correspondences across views.
arXiv Detail & Related papers (2020-08-19T14:11:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.