Visible Structure Retrieval for Lightweight Image-Based Relocalisation
- URL: http://arxiv.org/abs/2511.12503v1
- Date: Sun, 16 Nov 2025 08:32:18 GMT
- Title: Visible Structure Retrieval for Lightweight Image-Based Relocalisation
- Authors: Fereidoon Zangeneh, Leonard Bruns, Amit Dekel, Alessandro Pieropan, Patric Jensfelt
- Abstract summary: We propose a new paradigm for making structure-based relocalisation tractable. We learn a direct mapping from image observations to the visible scene structure in a compact neural network. Given a query image, a forward pass through our novel visible structure retrieval network yields the subset of 3D structure points in the map that the image views.
- Score: 41.33166541705719
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Accurate camera pose estimation from an image observation in a previously mapped environment is commonly done through structure-based methods: by finding correspondences between 2D keypoints on the image and 3D structure points in the map. In order to make this correspondence search tractable in large scenes, existing pipelines either rely on search heuristics, or perform image retrieval to reduce the search space by comparing the current image to a database of past observations. However, these approaches result in elaborate pipelines or storage requirements that grow with the number of past observations. In this work, we propose a new paradigm for making structure-based relocalisation tractable. Instead of relying on image retrieval or search heuristics, we learn a direct mapping from image observations to the visible scene structure in a compact neural network. Given a query image, a forward pass through our novel visible structure retrieval network allows obtaining the subset of 3D structure points in the map that the image views, thus reducing the search space of 2D-3D correspondences. We show that our proposed method enables performing localisation with an accuracy comparable to the state of the art, while requiring a lower computational and storage footprint.
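The retrieval step the abstract describes can be illustrated with a toy sketch. This is not the authors' implementation: the network here is a hypothetical single linear layer mapping a global image descriptor to per-point visibility scores, and all names, dimensions, and the threshold are illustrative assumptions. It only shows the paradigm: one forward pass selects a subset of map points, and 2D-3D correspondence search (followed by PnP + RANSAC in a real pipeline) is then restricted to that subset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy map: N 3D structure points (a real map would come from SfM).
N_POINTS = 1000
map_points = rng.normal(size=(N_POINTS, 3))

# Hypothetical "visible structure retrieval" head: a single linear
# layer from a global image descriptor to per-point visibility logits.
DESC_DIM = 128
W = rng.normal(scale=0.1, size=(N_POINTS, DESC_DIM))
b = np.zeros(N_POINTS)

def retrieve_visible(descriptor, threshold=0.5):
    """One forward pass -> indices of map points predicted visible."""
    logits = W @ descriptor + b
    scores = 1.0 / (1.0 + np.exp(-logits))  # sigmoid
    return np.nonzero(scores > threshold)[0]

# Query: a random descriptor stands in for a learned image encoding.
desc = rng.normal(size=DESC_DIM)
visible_idx = retrieve_visible(desc)

# Correspondence search is now restricted to this subset instead of
# all N_POINTS, which is what makes the search tractable.
candidate_points = map_points[visible_idx]
print(len(visible_idx), "of", N_POINTS, "points retained")
```

Because the map points are stored once and the network is compact, the storage cost does not grow with the number of past observations, unlike image-retrieval databases.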
Related papers
- NOVA3R: Non-pixel-aligned Visual Transformer for Amodal 3D Reconstruction [99.52487968452198]
NOVA3R is an effective approach for non-pixel-aligned 3D reconstruction from a set of unposed images in a feed-forward manner. It produces physically plausible geometry with fewer duplicated structures in overlapping regions. It outperforms state-of-the-art methods in terms of reconstruction accuracy and completeness.
arXiv Detail & Related papers (2026-03-04T15:36:25Z) - SAB3R: Semantic-Augmented Backbone in 3D Reconstruction [19.236494823612507]
We introduce a new task, Map and Locate, which unifies the objectives of open-vocabulary segmentation and 3D reconstruction. Specifically, Map and Locate involves generating a point cloud from an unposed video and segmenting object instances based on open-vocabulary queries. This task serves as a critical step toward real-world embodied AI applications and introduces a practical task that bridges reconstruction, recognition and reorganization.
arXiv Detail & Related papers (2025-06-02T18:00:04Z) - Improved Scene Landmark Detection for Camera Localization [11.56648898250606]
A method based on scene landmark detection (SLD) was recently proposed to address these limitations.
It involves training a convolutional neural network (CNN) to detect a few predetermined, salient, scene-specific 3D points or landmarks.
We show that the accuracy gap was due to insufficient model capacity and noisy labels during training.
arXiv Detail & Related papers (2024-01-31T18:59:12Z) - DisPlacing Objects: Improving Dynamic Vehicle Detection via Visual Place Recognition under Adverse Conditions [29.828201168816243]
We investigate whether a prior map can be leveraged to aid in the detection of dynamic objects in a scene without the need for a 3D map.
We contribute an algorithm which refines an initial set of candidate object detections and produces a refined subset of highly accurate detections using a prior map.
arXiv Detail & Related papers (2023-06-30T10:46:51Z) - NAVI: Category-Agnostic Image Collections with High-Quality 3D Shape and Pose Annotations [64.95582364215548]
NAVI is a new dataset of category-agnostic image collections with high-quality 3D scans and per-image 2D-3D alignments.
These 2D-3D alignments allow us to extract accurate derivative annotations such as dense pixel correspondences, depth and segmentation maps.
arXiv Detail & Related papers (2023-06-15T13:11:30Z) - RelPose++: Recovering 6D Poses from Sparse-view Observations [66.6922660401558]
We address the task of estimating 6D camera poses from sparse-view image sets (2-8 images).
We build on the recent RelPose framework which learns a network that infers distributions over relative rotations over image pairs.
Our final system results in large improvements in 6D pose prediction over prior art on both seen and unseen object categories.
arXiv Detail & Related papers (2023-05-08T17:59:58Z) - CheckerPose: Progressive Dense Keypoint Localization for Object Pose Estimation with Graph Neural Network [66.24726878647543]
Estimating the 6-DoF pose of a rigid object from a single RGB image is a crucial yet challenging task.
Recent studies have shown the great potential of dense correspondence-based solutions.
We propose a novel pose estimation algorithm named CheckerPose, which improves on three main aspects.
arXiv Detail & Related papers (2023-03-29T17:30:53Z) - Flattening-Net: Deep Regular 2D Representation for 3D Point Cloud Analysis [66.49788145564004]
We present an unsupervised deep neural architecture called Flattening-Net to represent irregular 3D point clouds of arbitrary geometry and topology.
Our methods perform favorably against the current state-of-the-art competitors.
arXiv Detail & Related papers (2022-12-17T15:05:25Z) - Predicting Visual Overlap of Images Through Interpretable Non-Metric Box Embeddings [29.412748394892105]
We propose an interpretable image-embedding that cuts the search in scale space to essentially a lookup.
We show how this embedding yields competitive image-matching results, while being simpler, faster, and also interpretable by humans.
arXiv Detail & Related papers (2020-08-13T10:01:07Z) - Object Detection on Single Monocular Images through Canonical Correlation Analysis [3.4722706398428493]
We retrieve 3-D object information from single monocular images without using extra 3-D data such as point clouds or depth images.
We propose a two-dimensional CCA framework to fuse monocular images and corresponding predicted depth images for basic computer vision tasks.
arXiv Detail & Related papers (2020-02-13T05:03:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.