FUSELOC: Fusing Global and Local Descriptors to Disambiguate 2D-3D Matching in Visual Localization
- URL: http://arxiv.org/abs/2408.12037v2
- Date: Tue, 26 Aug 2025 04:54:06 GMT
- Title: FUSELOC: Fusing Global and Local Descriptors to Disambiguate 2D-3D Matching in Visual Localization
- Authors: Son Tung Nguyen, Alejandro Fontan, Michael Milford, Tobias Fischer
- Abstract summary: Direct 2D-3D matching requires significantly less memory but suffers from lower accuracy due to the larger and more ambiguous search space. We address this ambiguity by fusing local and global descriptors using a weighted average operator. We achieve performance close to hierarchical methods while using 43% less memory and running 1.6 times faster.
- Score: 52.57327385675752
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Hierarchical visual localization methods achieve state-of-the-art accuracy but require substantial memory as they need to store all database images. Direct 2D-3D matching requires significantly less memory but suffers from lower accuracy due to the larger and more ambiguous search space. We address this ambiguity by fusing local and global descriptors using a weighted average operator. This operator rearranges the local descriptor space so that geographically nearby local descriptors are closer in the feature space according to the global descriptors. This decreases the number of irrelevant competing descriptors, especially if they are geographically distant, thus increasing the correct matching likelihood. We consistently improve the accuracy over local-only systems, and we achieve performance close to hierarchical methods while using 43% less memory and running 1.6 times faster. Extensive experiments on four challenging datasets -- Cambridge Landmarks, Aachen Day/Night, RobotCar Seasons, and Extended CMU Seasons -- demonstrate that, for the first time, direct matching algorithms can benefit from global descriptors without compromising computational efficiency. Our code is available at https://github.com/sontung/descriptor-disambiguation.
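The weighted average fusion described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `weight` parameter, the L2 normalization steps, and the assumption that local and global descriptors share the same dimensionality are all hypothetical choices made for illustration. The toy data shows two database points whose local descriptors are identical (an ambiguous match) but which come from images with different global descriptors.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit length."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, eps)


def fuse_descriptors(local_desc, global_desc, weight=0.5):
    """Hypothetical fusion: weighted average of local and global descriptors.

    local_desc:  (N, D) local descriptors extracted from one image.
    global_desc: (D,)   global descriptor of that same image.
    weight:      share given to the local part (assumed, not from the paper).
    """
    local_desc = l2_normalize(np.asarray(local_desc, dtype=np.float64))
    global_desc = l2_normalize(np.asarray(global_desc, dtype=np.float64))
    fused = weight * local_desc + (1.0 - weight) * global_desc[None, :]
    return l2_normalize(fused)


# Toy scene: the same local appearance occurs in two distant places, A and B.
shared_local = np.array([[1.0, 0.0, 0.0, 0.0]])
global_a = np.array([0.0, 1.0, 0.0, 0.0])
global_b = np.array([0.0, 0.0, 1.0, 0.0])

# The query image was taken near place A, so its global descriptor resembles A's.
query_global = np.array([0.0, 0.9, 0.1, 0.0])

# Local-only matching: both database entries are equally close to the query.
local_db = np.vstack([shared_local, shared_local])
local_dists = np.linalg.norm(l2_normalize(local_db) - l2_normalize(shared_local), axis=1)

# Fused matching: the global part pulls the query towards the entry from place A.
fused_db = np.vstack([
    fuse_descriptors(shared_local, global_a),
    fuse_descriptors(shared_local, global_b),
])
fused_query = fuse_descriptors(shared_local, query_global)
fused_dists = np.linalg.norm(fused_db - fused_query, axis=1)
nearest = int(np.argmin(fused_dists))

print("local-only distances:", local_dists)
print("fused distances:", fused_dists)
print("nearest fused entry:", "A" if nearest == 0 else "B")
```

In this toy setting the local-only distances tie, while the fused distances separate the two candidates. The fused vectors keep the same dimensionality as the local descriptors, so under these assumptions the matching step itself is no more expensive than local-only nearest-neighbour search.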
Related papers
- ImLoc: Revisiting Visual Localization with Image-based Representation [61.282162006394934]
We propose to augment each image with estimated depth maps to capture the geometric structure. This representation is easy to build and maintain, but achieves highest accuracy in challenging conditions. Our method achieves a new state-of-the-art accuracy on various standard benchmarks and outperforms existing memory-efficient methods at comparable map sizes.
arXiv Detail & Related papers (2026-01-07T18:51:51Z)
- Robust Scene Coordinate Regression via Geometrically-Consistent Global Descriptors [52.57327385675752]
We propose an aggregator module that learns global descriptors consistent with both geometrical structure and visual similarity. This corrects erroneous associations caused by unreliable overlap scores. Experiments on challenging benchmarks show substantial localization gains in large-scale environments.
arXiv Detail & Related papers (2025-12-19T04:24:03Z)
- NeuraLoc: Visual Localization in Neural Implicit Map with Dual Complementary Features [50.212836834889146]
We propose an efficient and novel visual localization approach based on the neural implicit map with complementary features.
Specifically, to enforce geometric constraints and reduce storage requirements, we implicitly learn a 3D keypoint descriptor field.
To further address the semantic ambiguity of descriptors, we introduce additional semantic contextual feature fields.
arXiv Detail & Related papers (2025-03-08T08:04:27Z)
- Coupled Laplacian Eigenmaps for Locally-Aware 3D Rigid Point Cloud Matching [0.0]
We propose a new technique, based on graph Laplacian eigenmaps, to match point clouds by taking into account fine local structures.
To deal with the order and sign ambiguity of Laplacian eigenmaps, we introduce a new operator, called Coupled Laplacian.
We show that the similarity between those aligned high-dimensional spaces provides a locally meaningful score to match shapes.
arXiv Detail & Related papers (2024-02-27T10:10:12Z)
- Improved Scene Landmark Detection for Camera Localization [11.56648898250606]
A method based on scene landmark detection (SLD) was recently proposed to address these limitations.
It involves training a convolutional neural network (CNN) to detect a few predetermined, salient, scene-specific 3D points or landmarks.
We show that the accuracy gap was due to insufficient model capacity and noisy labels during training.
arXiv Detail & Related papers (2024-01-31T18:59:12Z)
- ALSTER: A Local Spatio-Temporal Expert for Online 3D Semantic Reconstruction [62.599588577671796]
We propose an online 3D semantic segmentation method that incrementally reconstructs a 3D semantic map from a stream of RGB-D frames.
Unlike offline methods, ours is directly applicable to scenarios with real-time constraints, such as robotics or mixed reality.
arXiv Detail & Related papers (2023-11-29T20:30:18Z)
- D2S: Representing sparse descriptors and 3D coordinates for camera relocalization [1.2974519529978974]
We propose a learning-based approach to represent complex local descriptors and their scene coordinates.
Our method is characterized by its simplicity and cost-effectiveness.
Our approach outperforms the previous regression-based methods in both indoor and outdoor environments.
arXiv Detail & Related papers (2023-07-28T01:20:12Z)
- Yes, we CANN: Constrained Approximate Nearest Neighbors for local feature-based visual localization [2.915868985330569]
Constrained Approximate Nearest Neighbors (CANN) is a joint solution of k-nearest-neighbors across both the geometry and appearance space using only local features.
Our method significantly outperforms both state-of-the-art global feature-based retrieval and approaches using local feature aggregation schemes.
arXiv Detail & Related papers (2023-06-15T10:12:10Z)
- Rapid Person Re-Identification via Sub-space Consistency Regularization [51.76876061721556]
Person Re-Identification (ReID) matches pedestrians across disjoint cameras.
Existing ReID methods adopting real-value feature descriptors have achieved high accuracy, but they are low in efficiency due to the slow Euclidean distance computation.
We propose a novel Sub-space Consistency Regularization (SCR) algorithm that can speed up the ReID procedure by 0.25 times.
arXiv Detail & Related papers (2022-07-13T02:44:05Z)
- LoGG3D-Net: Locally Guided Global Descriptor Learning for 3D Place Recognition [31.105598103211825]
We show that an additional training signal (local consistency loss) can guide the network to learn local features which are consistent across revisits.
We formulate our approach in an end-to-end trainable architecture called LoGG3D-Net.
arXiv Detail & Related papers (2021-09-17T03:32:43Z)
- On the Limits of Pseudo Ground Truth in Visual Camera Re-localisation [83.29404673257328]
Re-localisation benchmarks measure how well each method replicates the results of a reference algorithm.
This raises the question of whether the choice of the reference algorithm favours a certain family of re-localisation methods.
This paper analyzes two widely used re-localisation datasets and shows that evaluation outcomes indeed vary with the choice of the reference algorithm.
arXiv Detail & Related papers (2021-09-01T12:01:08Z)
- SSC: Semantic Scan Context for Large-Scale Place Recognition [13.228580954956342]
We explore the use of high-level features, namely semantics, to improve the representation ability of descriptors.
We propose a novel global descriptor, Semantic Scan Context, which explores semantic information to represent scenes more effectively.
Our approach outperforms the state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2021-07-01T11:51:19Z)
- Efficient Regional Memory Network for Video Object Segmentation [56.587541750729045]
We propose a novel local-to-local matching solution for semi-supervised VOS, namely Regional Memory Network (RMNet).
The proposed RMNet effectively alleviates the ambiguity of similar objects in both memory and query frames.
Experimental results indicate that the proposed RMNet performs favorably against state-of-the-art methods on the DAVIS and YouTube-VOS datasets.
arXiv Detail & Related papers (2021-03-24T02:08:46Z)
- Leveraging Local and Global Descriptors in Parallel to Search Correspondences for Visual Localization [6.326242067588544]
We propose a novel parallel search framework to get nearest neighbor candidates of a query local feature.
We also utilize local descriptors to construct random tree structures for obtaining nearest neighbor candidates of the query local feature.
arXiv Detail & Related papers (2020-09-23T01:49:03Z)
- DH3D: Deep Hierarchical 3D Descriptors for Robust Large-Scale 6DoF Relocalization [56.15308829924527]
We propose a Siamese network that jointly learns 3D local feature detection and description directly from raw 3D points.
For detecting 3D keypoints we predict the discriminativeness of the local descriptors in an unsupervised manner.
Experiments on various benchmarks demonstrate that our method achieves competitive results for both global point cloud retrieval and local point cloud registration.
arXiv Detail & Related papers (2020-07-17T20:21:22Z)
- D2D: Keypoint Extraction with Describe to Detect Approach [48.0325745125635]
We present a novel approach that exploits the information within the descriptor space to propose keypoint locations.
We propose an approach that inverts this process by first describing and then detecting the keypoint locations.
arXiv Detail & Related papers (2020-05-27T19:27:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.