On the Limits of Pseudo Ground Truth in Visual Camera Re-localisation
- URL: http://arxiv.org/abs/2109.00524v1
- Date: Wed, 1 Sep 2021 12:01:08 GMT
- Title: On the Limits of Pseudo Ground Truth in Visual Camera Re-localisation
- Authors: Eric Brachmann, Martin Humenberger, Carsten Rother, Torsten Sattler
- Abstract summary: Re-localisation benchmarks measure how well each method replicates the results of a reference algorithm.
This raises the question of whether the choice of the reference algorithm favours a certain family of re-localisation methods.
This paper analyzes two widely used re-localisation datasets and shows that evaluation outcomes indeed vary with the choice of the reference algorithm.
- Score: 83.29404673257328
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Benchmark datasets that measure camera pose accuracy have driven progress in
visual re-localisation research. To obtain poses for thousands of images, it is
common to use a reference algorithm to generate pseudo ground truth. Popular
choices include Structure-from-Motion (SfM) and
Simultaneous-Localisation-and-Mapping (SLAM) using additional sensors like
depth cameras if available. Re-localisation benchmarks thus measure how well
each method replicates the results of the reference algorithm. This raises the
question of whether the choice of the reference algorithm favours a certain family
of re-localisation methods. This paper analyzes two widely used re-localisation
datasets and shows that evaluation outcomes indeed vary with the choice of the
reference algorithm. We thus question common beliefs in the re-localisation
literature, namely that learning-based scene coordinate regression outperforms
classical feature-based methods, and that RGB-D-based methods outperform
RGB-based methods. We argue that any claims on ranking re-localisation methods
should take the type of the reference algorithm, and the similarity of the
methods to the reference algorithm, into account.
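To make the evaluation setting above concrete, the sketch below computes the translation and rotation error of an estimated camera pose against a reference (pseudo-ground-truth) pose, plus a recall under a 5 cm / 5 deg threshold. The pose convention and thresholds are illustrative assumptions, not the paper's exact protocol.
```python
import numpy as np

def pose_errors(T_est, T_ref):
    """Translation (pose units) and rotation (degrees) error between two
    4x4 camera-to-world pose matrices."""
    # Translation error: Euclidean distance between camera centres.
    t_err = np.linalg.norm(T_est[:3, 3] - T_ref[:3, 3])
    # Rotation error: angle of the relative rotation R_est^T @ R_ref.
    R_rel = T_est[:3, :3].T @ T_ref[:3, :3]
    cos_angle = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
    r_err = np.degrees(np.arccos(cos_angle))
    return t_err, r_err

def recall_at_threshold(est_poses, ref_poses, max_t=0.05, max_r=5.0):
    """Fraction of frames within a 5 cm / 5 deg threshold (a common indoor
    criterion; the thresholds here are an assumption)."""
    hits = [t <= max_t and r <= max_r
            for t, r in (pose_errors(e, g) for e, g in zip(est_poses, ref_poses))]
    return sum(hits) / len(hits)
```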
Related papers
- FUSELOC: Fusing Global and Local Descriptors to Disambiguate 2D-3D Matching in Visual Localization [57.59857784298536]
Direct 2D-3D matching algorithms require significantly less memory but suffer from lower accuracy due to the larger and more ambiguous search space.
We address this ambiguity by fusing local and global descriptors using a weighted average operator within a 2D-3D search framework.
We consistently improve the accuracy over local-only systems and achieve performance close to hierarchical methods while halving memory requirements.
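A minimal sketch of the weighted-average fusion idea, assuming the local and global descriptors have already been projected to the same dimensionality and L2-normalised; the weight, re-normalisation, and matching step below are illustrative, not the paper's exact operator.
```python
import numpy as np

def fuse_descriptors(local_desc, global_desc, w=0.5):
    # Weighted average of a local and a global descriptor (same dimension assumed).
    fused = w * local_desc + (1.0 - w) * global_desc
    return fused / (np.linalg.norm(fused) + 1e-12)   # re-normalise

def match_2d_3d(query_desc, point_descs):
    # Direct 2D-3D matching: pick the 3D point with the most similar descriptor.
    return int(np.argmax(point_descs @ query_desc))
```
The intent, as the abstract describes it, is that the global component disambiguates the otherwise large and ambiguous 2D-3D search space.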
arXiv Detail & Related papers (2024-08-21T23:42:16Z)
- RGB-based Category-level Object Pose Estimation via Decoupled Metric Scale Recovery [72.13154206106259]
We propose a novel pipeline that decouples the 6D pose and size estimation to mitigate the influence of imperfect scales on rigid transformations.
Specifically, we leverage a pre-trained monocular estimator to extract local geometric information.
A separate branch is designed to directly recover the metric scale of the object based on category-level statistics.
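One way to read the decoupling is as a similarity transform whose rigid part and scale are estimated separately and only composed at the end; the function below is an assumption for illustration, not the paper's network heads.
```python
import numpy as np

def compose_similarity(R, t, s):
    # Compose a separately estimated scale s with a rigid transform (R, t):
    # x -> s * R @ x + t. Errors in s no longer corrupt the rigid part.
    T = np.eye(4)
    T[:3, :3] = s * R
    T[:3, 3] = t
    return T
```
Here (R, t) would come from the geometry branch driven by the pre-trained monocular estimator, and s from the branch using category-level statistics.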
arXiv Detail & Related papers (2023-09-19T02:20:26Z)
- Hyperspectral Target Detection Based on Low-Rank Background Subspace Learning and Graph Laplacian Regularization [2.9626402880497267]
Hyperspectral target detection aims to find dim and small objects based on their spectral characteristics.
Existing representation-based methods are hindered by the problem of the unknown background dictionary.
This paper proposes an efficient optimization approach based on low-rank representation (LRR) and graph Laplacian regularization (GLR).
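For orientation only, a generic low-rank representation objective with a graph Laplacian regulariser has the following shape; the exact terms, constraints, and weights used in the paper may differ.
```latex
\min_{Z,\,E} \; \|Z\|_{*} + \lambda \|E\|_{2,1} + \beta \, \operatorname{tr}\!\left( Z L Z^{\top} \right)
\quad \text{s.t.} \quad X = D Z + E
```
Here X stacks the hyperspectral pixels, D is the (unknown) background dictionary, Z the low-rank coefficient matrix, E a column-sparse residual in which targets stand out, L the graph Laplacian, and λ, β trade-off weights.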
arXiv Detail & Related papers (2023-06-01T13:51:08Z)
- Learning to Localize in Unseen Scenes with Relative Pose Regressors [5.672132510411465]
Relative pose regressors (RPRs) localize a camera by estimating its relative translation and rotation to a pose-labelled reference.
In practice, however, the performance of RPRs is significantly degraded in unseen scenes.
We implement aggregation with concatenation, projection, and attention operations (Transformers) and learn to regress the relative pose parameters from the resulting latent codes.
Compared to state-of-the-art RPRs, our model is shown to localize significantly better in unseen environments, across both indoor and outdoor benchmarks, while maintaining competitive performance in seen scenes.
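A minimal PyTorch sketch of the aggregation-then-regression idea (concatenation, projection, attention, then a head predicting a translation and a unit quaternion); the layer sizes and module names are assumptions, not the paper's architecture.
```python
import torch
import torch.nn as nn

class RelativePoseHead(nn.Module):
    """Aggregate query and reference features by concatenation + projection +
    self-attention, then regress the relative pose (3D translation + quaternion)."""

    def __init__(self, feat_dim=256, n_heads=4):
        super().__init__()
        self.project = nn.Linear(2 * feat_dim, feat_dim)                  # concat -> projection
        self.attend = nn.TransformerEncoderLayer(feat_dim, n_heads, batch_first=True)
        self.regress = nn.Linear(feat_dim, 7)                             # t (3) + quaternion (4)

    def forward(self, query_feats, ref_feats):
        # query_feats, ref_feats: (B, N, feat_dim) token sequences
        tokens = self.project(torch.cat([query_feats, ref_feats], dim=-1))
        latent = self.attend(tokens).mean(dim=1)                          # pooled latent code
        out = self.regress(latent)
        t, q = out[:, :3], nn.functional.normalize(out[:, 3:], dim=-1)
        return t, q
```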
arXiv Detail & Related papers (2023-03-05T17:12:50Z)
- ImPosIng: Implicit Pose Encoding for Efficient Camera Pose Estimation [2.6808541153140077]
Implicit Pose Encoding (ImPosing) embeds images and camera poses into a common latent representation using two separate neural networks.
Rather than being regressed directly, the camera position and orientation are refined by evaluating candidates in the latent space in a hierarchical manner.
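A toy sketch of the shared-latent-space idea: one encoder embeds the image, another embeds candidate poses, and candidates are ranked by similarity rather than regressed directly. Both encoders and the pose parameterisation (position + quaternion) are stand-ins, not the paper's networks.
```python
import torch
import torch.nn as nn

# Toy encoders mapping an image and a 7D pose (x, y, z + quaternion) into a
# shared 128D latent space.
image_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 128))
pose_encoder = nn.Sequential(nn.Linear(7, 128), nn.ReLU(), nn.Linear(128, 128))

def score_candidates(image, candidate_poses, top_k=8):
    """Rank candidate poses by latent similarity and keep the best for refinement."""
    z_img = nn.functional.normalize(image_encoder(image), dim=-1)            # (1, 128)
    z_pose = nn.functional.normalize(pose_encoder(candidate_poses), dim=-1)  # (M, 128)
    sims = (z_pose @ z_img.T).squeeze(-1)                                    # (M,)
    return candidate_poses[sims.topk(top_k).indices]
```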
arXiv Detail & Related papers (2022-05-05T13:33:25Z)
- Deep Metric Learning for Ground Images [4.864819846886142]
We address the initial localization task, in which no prior knowledge about the current robot position is available.
We propose a deep metric learning approach that retrieves the most similar reference images to the query image.
In contrast to existing approaches to image retrieval for ground images, our approach achieves significantly better recall performance and improves the localization performance of a state-of-the-art ground texture based localization method.
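The retrieval step itself reduces to a nearest-neighbour search in the learned embedding space; a generic sketch assuming L2-normalised embeddings, not the paper's code.
```python
import numpy as np

def retrieve(query_embedding, reference_embeddings, k=5):
    # Indices of the k reference ground images most similar to the query
    # (cosine similarity, since embeddings are assumed L2-normalised).
    sims = reference_embeddings @ query_embedding
    return np.argsort(-sims)[:k]
```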
arXiv Detail & Related papers (2021-09-03T14:43:59Z)
- Recall@k Surrogate Loss with Large Batches and Similarity Mixup [62.67458021725227]
Direct optimization of an evaluation metric by gradient descent is not possible when the metric is non-differentiable.
In this work, a differentiable surrogate loss for the recall is proposed.
The proposed method achieves state-of-the-art results in several image retrieval benchmarks.
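The core trick can be sketched generically: approximate the positive's rank with sigmoids of similarity differences so that a recall-like quantity becomes differentiable. The temperature and exact relaxation below are assumptions; the paper's loss (and its similarity mixup) differs in detail.
```python
import torch

def smooth_recall_at_k(sim_pos, sim_neg, k=1, tau=0.05):
    """sim_pos: (B,) query-positive similarities; sim_neg: (B, N) query-negative ones."""
    # Soft rank of each positive: how many negatives score higher, counted softly.
    soft_rank = torch.sigmoid((sim_neg - sim_pos.unsqueeze(1)) / tau).sum(dim=1)
    # Soft indicator that the positive falls within the top k.
    soft_hit = torch.sigmoid(k - 0.5 - soft_rank)
    return soft_hit.mean()  # maximise during training (e.g. minimise 1 - value)
```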
arXiv Detail & Related papers (2021-08-25T11:09:11Z)
- LM-Reloc: Levenberg-Marquardt Based Direct Visual Relocalization [54.77498358487812]
LM-Reloc is a novel approach for visual relocalization based on direct image alignment.
We propose a loss formulation inspired by the classical Levenberg-Marquardt algorithm to train LM-Net.
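For reference, the classical Levenberg-Marquardt update that the loss formulation is said to be inspired by looks as follows; this is the textbook step, not LM-Net or its training loss.
```python
import numpy as np

def lm_step(J, r, lam):
    # Solve (J^T J + lam * diag(J^T J)) dx = -J^T r for the update dx.
    # Large lam behaves like gradient descent, small lam like Gauss-Newton.
    JTJ = J.T @ J
    return np.linalg.solve(JTJ + lam * np.diag(np.diag(JTJ)), -J.T @ r)
```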
arXiv Detail & Related papers (2020-10-13T12:15:20Z)
- Making Affine Correspondences Work in Camera Geometry Computation [62.7633180470428]
Local features provide region-to-region rather than point-to-point correspondences.
We propose guidelines for effective use of region-to-region matches in the course of a full model estimation pipeline.
Experiments show that affine solvers can achieve accuracy comparable to point-based solvers at faster run-times.
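For context, a region-to-region (affine) correspondence is a point match plus a 2x2 local affine transformation; the container below only illustrates the data involved, not the paper's pipeline.
```python
import numpy as np
from dataclasses import dataclass

@dataclass
class AffineCorrespondence:
    x1: np.ndarray  # 2D point in the first image
    x2: np.ndarray  # corresponding 2D point in the second image
    A: np.ndarray   # 2x2 affine map of a small patch around x1 onto a patch around x2
```
For epipolar geometry, each such correspondence contributes three linear constraints instead of the single one given by a point pair, which is why affine solvers need fewer matches per minimal sample.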
arXiv Detail & Related papers (2020-07-20T12:07:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.