GRLoc: Geometric Representation Regression for Visual Localization
- URL: http://arxiv.org/abs/2511.13864v1
- Date: Mon, 17 Nov 2025 19:30:22 GMT
- Title: GRLoc: Geometric Representation Regression for Visual Localization
- Authors: Changyang Li, Xuejian Ma, Lixiang Liu, Zhan Li, Qingan Yan, Yi Xu,
- Abstract summary: Absolute Pose Regression has emerged as a compelling paradigm for visual localization.<n>In this work, we propose a geometrically-grounded alternative.<n>We demonstrate state-of-the-art performance on 7-Scenes and Cambridge Landmarks datasets.
- Score: 14.972457710617816
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Absolute Pose Regression (APR) has emerged as a compelling paradigm for visual localization. However, APR models typically operate as black boxes, directly regressing a 6-DoF pose from a query image, which can lead to memorizing training views rather than understanding 3D scene geometry. In this work, we propose a geometrically-grounded alternative. Inspired by novel view synthesis, which renders images from intermediate geometric representations, we reformulate APR as its inverse that regresses the underlying 3D representations directly from the image, and we name this paradigm Geometric Representation Regression (GRR). Our model explicitly predicts two disentangled geometric representations in the world coordinate system: (1) a ray bundle's directions to estimate camera rotation, and (2) a corresponding pointmap to estimate camera translation. The final 6-DoF camera pose is then recovered from these geometric components using a differentiable deterministic solver. This disentangled approach, which separates the learned visual-to-geometry mapping from the final pose calculation, introduces a strong geometric prior into the network. We find that the explicit decoupling of rotation and translation predictions measurably boosts performance. We demonstrate state-of-the-art performance on 7-Scenes and Cambridge Landmarks datasets, validating that modeling the inverse rendering process is a more robust path toward generalizable absolute pose estimation.
Related papers
- RnG: A Unified Transformer for Complete 3D Modeling from Partial Observations [70.83499963694238]
RnG (Reconstruction and Generation) is a novel feed-forward Transformer that unifies reconstruction and generation.<n>It reconstructs visible geometry and generates plausible, coherent unseen geometry and appearance.<n>Our method achieves state-of-the-art performance in both generalizable 3D reconstruction and novel view generation.
arXiv Detail & Related papers (2026-03-01T17:25:32Z) - G3Splat: Geometrically Consistent Generalizable Gaussian Splatting [30.752029360892504]
We introduce G3Splat, which enforces geometric priors to obtain geometrically consistent 3D scene representations.<n>Trained on RE10K, our approach achieves state-of-the-art performance in (i) geometrically consistent reconstruction, (ii) relative pose estimation, and (iii) novel-view synthesis.
arXiv Detail & Related papers (2025-12-19T13:11:55Z) - Perspective from a Higher Dimension: Can 3D Geometric Priors Help Visual Floorplan Localization? [8.82283453148819]
Self-localization of building's floorplans has attracted researchers' interest.<n>Since floorplans are minimalist representations of a building's structure, modal and geometric differences between visual perceptions and floorplans pose challenges to this task.<n>Existing methods cleverly utilize 2D geometric features and pose filters to achieve promising performance.<n>This paper views the 2D Floorplan localization problem from a higher dimension by injecting 3D geometric priors into the visual FLoc algorithm.
arXiv Detail & Related papers (2025-07-25T01:34:26Z) - Dens3R: A Foundation Model for 3D Geometry Prediction [44.13431776180547]
Dens3R is a 3D foundation model designed for joint geometric dense prediction.<n>By integrating image-pair matching features with intrinsic invariance modeling, Dens3R accurately regresses multiple geometric quantities.
arXiv Detail & Related papers (2025-07-22T07:22:30Z) - FLARE: Feed-forward Geometry, Appearance and Camera Estimation from Uncalibrated Sparse Views [100.45129752375658]
We present FLARE, a feed-forward model designed to infer high-quality camera poses and 3D geometry from uncalibrated sparse-view images.<n>Our solution features a cascaded learning paradigm with camera pose serving as the critical bridge, recognizing its essential role in mapping 3D structures onto 2D image planes.
arXiv Detail & Related papers (2025-02-17T18:54:05Z) - DualPM: Dual Posed-Canonical Point Maps for 3D Shape and Pose Reconstruction [67.13370009386635]
We introduce Dual Point Maps (DualPM), where a pair of point maps is extracted from the same image-one associating pixels to their 3D locations on the object and the other to a canonical version of the object in its rest pose.<n>We show that 3D reconstruction and 3D pose estimation can be reduced to the prediction of DualPMs.
arXiv Detail & Related papers (2024-12-05T18:59:48Z) - Geometric Point Attention Transformer for 3D Shape Reassembly [17.34739330880715]
We present a network specifically designed to address the challenges of reasoning about geometric relationships.<n>We integrate both global shape information and local pairwise geometric features, along with poses represented as rotation and translation vectors for each part.<n>We evaluate our model on both the semantic and geometric assembly tasks, showing that it outperforms previous methods in absolute pose estimation.
arXiv Detail & Related papers (2024-11-26T15:29:38Z) - RelPose++: Recovering 6D Poses from Sparse-view Observations [66.6922660401558]
We address the task of estimating 6D camera poses from sparse-view image sets (2-8 images)
We build on the recent RelPose framework which learns a network that infers distributions over relative rotations over image pairs.
Our final system results in large improvements in 6D pose prediction over prior art on both seen and unseen object categories.
arXiv Detail & Related papers (2023-05-08T17:59:58Z) - Self-supervised Geometric Perception [96.89966337518854]
Self-supervised geometric perception is a framework to learn a feature descriptor for correspondence matching without any ground-truth geometric model labels.
We show that SGP achieves state-of-the-art performance that is on-par or superior to the supervised oracles trained using ground-truth labels.
arXiv Detail & Related papers (2021-03-04T15:34:43Z) - Exploring intermediate representation for monocular vehicle pose
estimation [38.85309013717312]
We present a new learning-based framework to recover vehicle pose in SO(3) from a single RGB image.
In contrast to previous works that map from local appearance to observation angles, we explore a progressive approach by extracting meaningful Intermediate Geometrical Representations (IGRs)
This approach features a deep model that transforms perceived intensities to IGRs, which are mapped to a 3D representation encoding object orientation in the camera coordinate system.
arXiv Detail & Related papers (2020-11-17T06:30:51Z) - Geometric Correspondence Fields: Learned Differentiable Rendering for 3D
Pose Refinement in the Wild [96.09941542587865]
We present a novel 3D pose refinement approach based on differentiable rendering for objects of arbitrary categories in the wild.
In this way, we precisely align 3D models to objects in RGB images which results in significantly improved 3D pose estimates.
We evaluate our approach on the challenging Pix3D dataset and achieve up to 55% relative improvement compared to state-of-the-art refinement methods in multiple metrics.
arXiv Detail & Related papers (2020-07-17T12:34:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.