CoPR: Towards Accurate Visual Localization With Continuous
Place-descriptor Regression
- URL: http://arxiv.org/abs/2304.07426v1
- Date: Fri, 14 Apr 2023 23:17:44 GMT
- Authors: Mubariz Zaffar, Liangliang Nan, Julian Francisco Pieter Kooij
- Abstract summary: Visual Place Recognition (VPR) estimates the camera location of a query image by retrieving the most similar reference image from a map of geo-tagged reference images.
References for VPR are only available at sparse poses in a map, which enforces an upper bound on the maximum achievable localization accuracy.
We propose Continuous Place-descriptor Regression (CoPR) to densify the map and improve localization accuracy.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Visual Place Recognition (VPR) is an image-based localization method that
estimates the camera location of a query image by retrieving the most similar
reference image from a map of geo-tagged reference images. In this work, we
look into two fundamental bottlenecks for its localization accuracy: reference
map sparseness and viewpoint invariance. Firstly, the reference images for VPR
are only available at sparse poses in a map, which enforces an upper bound on
the maximum achievable localization accuracy through VPR. We therefore propose
Continuous Place-descriptor Regression (CoPR) to densify the map and improve
localization accuracy. We study various interpolation and extrapolation models
to regress additional VPR feature descriptors from only the existing
references. Secondly, we compare different feature encoders and show that CoPR
adds value for all of them. We evaluate our models on three existing public
datasets and report on average around 30% improvement in VPR-based localization
accuracy using CoPR, on top of the 15% increase by using a viewpoint-variant
loss for the feature encoder. The complementary relation between CoPR and
Relative Pose Estimation is also discussed.
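To make the two contributions concrete, here is a minimal sketch of VPR retrieval and CoPR-style map densification. The paper studies several interpolation and extrapolation models; the plain linear interpolation below is only an illustrative stand-in, and the function names and the points_between parameter are ours.

```python
import numpy as np

def retrieve(query_desc, ref_descs, ref_poses):
    """Plain VPR: return the pose of the reference whose descriptor
    is nearest (L2) to the query descriptor."""
    dists = np.linalg.norm(ref_descs - query_desc, axis=1)
    return ref_poses[int(np.argmin(dists))]

def densify_linear(ref_descs, ref_poses, points_between=3):
    """CoPR-style densification with the simplest possible regressor,
    linear interpolation: synthesize descriptors at poses between each
    pair of consecutive references and add them to the map."""
    descs, poses = [], []
    for i in range(len(ref_poses) - 1):
        # t = 0 keeps the original reference; interior t values are new
        for t in np.linspace(0.0, 1.0, points_between + 2)[:-1]:
            descs.append((1 - t) * ref_descs[i] + t * ref_descs[i + 1])
            poses.append((1 - t) * ref_poses[i] + t * ref_poses[i + 1])
    descs.append(ref_descs[-1])
    poses.append(ref_poses[-1])
    return np.stack(descs), np.stack(poses)
```

With points_between=3, a map of N references grows to 4(N-1)+1 entries, raising the accuracy ceiling that reference sparseness imposes on retrieval-based localization.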
Related papers
- Close, But Not There: Boosting Geographic Distance Sensitivity in Visual Place Recognition
We formulate how limitations in the Geographic Distance Sensitivity of current VPR embeddings result in a high probability of incorrectly sorting the top-k retrievals.
We propose a novel mining strategy, CliqueMining, that selects positive and negative examples by sampling cliques from a graph of visually similar images.
Our approach boosts the sensitivity of VPR embeddings at small distance ranges, significantly improving the state of the art on relevant benchmarks.
arXiv Detail & Related papers (2024-07-02T16:49:01Z)
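A hedged sketch of the clique-based mining idea above, assuming L2-normalized global descriptors and known reference positions; the paper's actual graph construction and positive/negative criteria may differ, and all thresholds here are ours.

```python
import numpy as np

def similarity_graph(descs, sim_thresh=0.8):
    """Adjacency sets over a graph whose edges connect visually
    similar images (cosine similarity of normalized descriptors)."""
    sims = descs @ descs.T
    n = len(descs)
    return {i: {j for j in range(n) if j != i and sims[i, j] > sim_thresh}
            for i in range(n)}

def greedy_clique(adj, seed):
    """Grow a clique of mutually similar images around a seed image."""
    clique = {seed}
    for cand in adj[seed]:
        if clique <= adj[cand]:  # cand must be similar to every member
            clique.add(cand)
    return clique

def mine_examples(adj, positions, seed, pos_radius=10.0):
    """Split one clique of look-alikes into positives and negatives of
    the seed by geographic distance, forcing the embedding to separate
    visually similar images that are metrically far apart."""
    pos, neg = [], []
    for i in greedy_clique(adj, seed) - {seed}:
        dist = np.linalg.norm(positions[i] - positions[seed])
        (pos if dist <= pos_radius else neg).append(i)
    return pos, neg
```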
- Breaking the Frame: Visual Place Recognition by Overlap Prediction
We propose a novel visual place recognition approach based on overlap prediction, called VOP.
VOP identifies co-visible image sections by obtaining patch-level embeddings with a Vision Transformer backbone.
Our approach uses a voting mechanism to assess overlap scores for potential database images.
arXiv Detail & Related papers (2024-06-23T20:00:20Z)
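A rough illustration of the voting mechanism described above, assuming L2-normalized patch embeddings; the similarity threshold and one-vote-per-patch scoring are our simplifications of VOP's learned overlap prediction.

```python
import numpy as np

def overlap_vote(query_patches, db_patches_list, sim_thresh=0.9):
    """Each query patch embedding votes for every database image that
    contains a sufficiently similar patch; images are ranked by the
    accumulated overlap score."""
    scores = np.zeros(len(db_patches_list))
    for q in query_patches:                        # q: (D,) one patch
        for i, db in enumerate(db_patches_list):   # db: (P, D) patches
            if (db @ q).max() > sim_thresh:        # best matching patch
                scores[i] += 1.0
    ranking = np.argsort(-scores)
    return ranking, scores
```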
- NYC-Indoor-VPR: A Long-Term Indoor Visual Place Recognition Dataset with Semi-Automatic Annotation
This paper introduces the NYC-Indoor-VPR dataset, a unique and rich collection of over 36,000 images compiled from 13 distinct crowded scenes in New York City.
To establish the ground truth for VPR, we propose a semi-automatic annotation approach that computes the positional information of each image.
Our method specifically takes pairs of videos as input and yields matched pairs of images along with their estimated relative locations.
arXiv Detail & Related papers (2024-03-31T00:20:53Z)
- Deep Homography Estimation for Visual Place Recognition
We propose a transformer-based deep homography estimation (DHE) network.
It takes the dense feature map extracted by a backbone network as input and fits a homography for fast and learnable geometric verification.
Experiments on benchmark datasets show that our method can outperform several state-of-the-art methods.
arXiv Detail & Related papers (2024-02-25T13:22:17Z)
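The role homography fitting plays in reranking can be sketched as follows. DHE fits the homography with a learned transformer network on dense features; the classical RANSAC call from OpenCV below is only a stand-in, and the keypoint/match data layout is assumed.

```python
import cv2
import numpy as np

def rerank_by_homography(query_kpts, db_kpts, matches, candidate_ids):
    """Re-order retrieval candidates by how many local correspondences a
    single homography can explain (geometric verification).
    matches[i] is an assumed list of (query_idx, db_idx) pairs."""
    inliers = []
    for cid in candidate_ids:
        pairs = matches[cid]
        if len(pairs) < 4:            # a homography needs >= 4 points
            inliers.append(0)
            continue
        src = np.float32([query_kpts[q] for q, _ in pairs])
        dst = np.float32([db_kpts[cid][d] for _, d in pairs])
        _, mask = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
        inliers.append(int(mask.sum()) if mask is not None else 0)
    order = np.argsort(-np.asarray(inliers))
    return [candidate_ids[i] for i in order]
```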
- CPR++: Object Localization via Single Coarse Point Supervision
Coarse point refinement (CPR) is the first attempt to alleviate semantic variance from an algorithmic perspective.
CPR reduces semantic variance by selecting a semantic centre point in a neighbourhood region to replace the initial annotated point.
CPR++ can obtain scale information and further reduce the semantic variance in a global region.
arXiv Detail & Related papers (2024-01-30T17:38:48Z)
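A simplified sketch of the refinement step described above: move a coarse annotation to the most semantically confident location in its neighbourhood. CPR learns this selection and CPR++ extends it with scale information over a global region; the plain argmax below only illustrates the local idea.

```python
import numpy as np

def refine_point(score_map, point, radius=8):
    """Replace an annotated point with the highest-scoring position in
    its neighbourhood, reducing the semantic variance of the label.
    score_map is an assumed per-pixel semantic confidence map."""
    y, x = point
    h, w = score_map.shape
    y0, y1 = max(0, y - radius), min(h, y + radius + 1)
    x0, x1 = max(0, x - radius), min(w, x + radius + 1)
    window = score_map[y0:y1, x0:x1]
    dy, dx = np.unravel_index(int(np.argmax(window)), window.shape)
    return (y0 + dy, x0 + dx)
```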
- $R^{2}$Former: Unified $R$etrieval and $R$eranking Transformer for Place Recognition
We propose a unified place recognition framework that handles both retrieval and reranking.
The proposed reranking module takes feature correlation, attention value, and xy coordinates into account.
$R^{2}$Former significantly outperforms state-of-the-art methods on major VPR datasets.
arXiv Detail & Related papers (2023-04-06T23:19:32Z)
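From the abstract alone, the reranking module consumes per-pair feature correlation, attention values, and xy coordinates. The sketch below assembles that kind of input for the strongest token pairs; the pair selection, layout, and top_m are assumptions, and the transformer that scores the pairs is omitted.

```python
import numpy as np

def rerank_pair_features(q_tok, q_xy, q_attn, d_tok, d_xy, d_attn, top_m=32):
    """Stack (correlation, attention values, xy coordinates) for the
    top_m most correlated query/database token pairs into a (top_m, 7)
    array, the kind of evidence a reranking transformer could consume."""
    corr = q_tok @ d_tok.T                          # token-level correlation
    flat = np.argsort(-corr, axis=None)[:top_m]     # strongest pairs first
    qi, di = np.unravel_index(flat, corr.shape)
    return np.stack([corr[qi, di], q_attn[qi], d_attn[di],
                     q_xy[qi, 0], q_xy[qi, 1],
                     d_xy[di, 0], d_xy[di, 1]], axis=1)
```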
- Data-efficient Large Scale Place Recognition with Graded Similarity Supervision
Visual place recognition (VPR) is a fundamental task of computer vision for visual localization.
Existing methods are trained using image pairs labeled in a binary fashion as depicting either the same place or not.
We deploy an automatic re-annotation strategy to re-label VPR datasets.
We propose a new Generalized Contrastive Loss (GCL) that uses graded similarity labels for training contrastive networks.
arXiv Detail & Related papers (2023-03-21T10:56:57Z)
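A hedged PyTorch sketch of a graded-similarity contrastive loss in the spirit of GCL: a continuous label sim in [0, 1] replaces the usual binary same-place label, so partially overlapping views pull together proportionally. The exact formulation in the paper may differ.

```python
import torch

def graded_contrastive_loss(desc_a, desc_b, sim, margin=1.0):
    """Contrastive loss whose attracting and repelling terms are weighted
    by a graded similarity label sim in [0, 1] rather than a binary one."""
    dist = torch.norm(desc_a - desc_b, dim=1)
    attract = sim * dist.pow(2)
    repel = (1.0 - sim) * torch.clamp(margin - dist, min=0).pow(2)
    return (attract + repel).mean()
```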
- Learning to Localize in Unseen Scenes with Relative Pose Regressors
Relative pose regressors (RPRs) localize a camera by estimating its relative translation and rotation to a pose-labelled reference.
In practice, however, the performance of RPRs is significantly degraded in unseen scenes.
We implement aggregation with concatenation, projection, and attention operations (Transformers) and learn to regress the relative pose parameters from the resulting latent codes.
Compared to state-of-the-art RPRs, our model is shown to localize significantly better in unseen environments, across both indoor and outdoor benchmarks, while maintaining competitive performance in seen scenes.
arXiv Detail & Related papers (2023-03-05T17:12:50Z)
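A minimal PyTorch sketch of the aggregate-then-regress pattern described above, using the concatenation-plus-projection variant; the attention/Transformer aggregation and the backbone that produces the paired features are omitted, and all layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class RelativePoseHead(nn.Module):
    """Aggregates query and reference features into a latent code and
    regresses the relative translation and rotation (unit quaternion)."""
    def __init__(self, feat_dim=512, latent_dim=256):
        super().__init__()
        self.project = nn.Sequential(
            nn.Linear(2 * feat_dim, latent_dim), nn.ReLU(),
            nn.Linear(latent_dim, latent_dim), nn.ReLU())
        self.trans_head = nn.Linear(latent_dim, 3)   # relative translation
        self.rot_head = nn.Linear(latent_dim, 4)     # relative quaternion

    def forward(self, query_feat, ref_feat):
        z = self.project(torch.cat([query_feat, ref_feat], dim=1))
        t = self.trans_head(z)
        q = self.rot_head(z)
        return t, q / q.norm(dim=1, keepdim=True)    # normalize quaternion
```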
- ImPosIng: Implicit Pose Encoding for Efficient Camera Pose Estimation
Implicit Pose Encoding (ImPosing) embeds images and camera poses into a common latent representation with two separate neural networks.
By evaluating candidates through the latent space in a hierarchical manner, the camera position and orientation are not directly regressed but refined.
arXiv Detail & Related papers (2022-05-05T13:33:25Z)
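A sketch of the refine-instead-of-regress idea above: sample candidate positions, score them in the shared latent space, and shrink the search region around the best candidate. embed_image and embed_pose stand for the two trained networks; the sampling scheme and shrink factor are assumptions.

```python
import numpy as np

def hierarchical_pose_search(embed_image, embed_pose, query_img,
                             bounds, levels=4, n_cand=64):
    """Coarse-to-fine localization through a shared latent space: at each
    level, keep the candidate pose whose embedding best matches the image
    embedding, then search a smaller region around it."""
    (x0, x1), (y0, y1) = bounds
    z_img = embed_image(query_img)                       # (D,)
    best_xy = None
    for _ in range(levels):
        xs = np.random.uniform(x0, x1, n_cand)
        ys = np.random.uniform(y0, y1, n_cand)
        z_pose = embed_pose(np.stack([xs, ys], axis=1))  # (n_cand, D)
        best = int(np.argmax(z_pose @ z_img))            # latent similarity
        best_xy = (xs[best], ys[best])
        sx, sy = (x1 - x0) / 4.0, (y1 - y0) / 4.0        # shrink the region
        x0, x1 = best_xy[0] - sx, best_xy[0] + sx
        y0, y1 = best_xy[1] - sy, best_xy[1] + sy
    return best_xy
```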
- On the Limits of Pseudo Ground Truth in Visual Camera Re-localisation
Re-localisation benchmarks measure how well each method replicates the results of a reference algorithm.
This raises the question of whether the choice of the reference algorithm favours a certain family of re-localisation methods.
This paper analyzes two widely used re-localisation datasets and shows that evaluation outcomes indeed vary with the choice of the reference algorithm.
arXiv Detail & Related papers (2021-09-01T12:01:08Z)
- Graph-PCNN: Two Stage Human Pose Estimation with Graph Pose Refinement
We propose a two-stage graph-based and model-agnostic framework, called Graph-PCNN.
In the first stage, a heatmap regression network is applied to obtain a rough localization result, and a set of proposal keypoints, called guided points, is sampled.
In the second stage, a different visual feature is extracted for each guided point by the localization subnetwork.
The relationship between guided points is explored by the graph pose refinement module to get more accurate localization results.
arXiv Detail & Related papers (2020-07-21T04:59:15Z)
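A rough sketch of the two stages described above: guided points sampled from the first-stage heatmap, then refined by propagating information over the guided-point graph. The feature-similarity averaging below is our crude stand-in for the learned graph pose refinement module.

```python
import numpy as np

def sample_guided_points(heatmap, num_points=16):
    """Stage 1: take the highest-scoring heatmap locations around the
    rough localization as guided points (proposal keypoints)."""
    flat = np.argsort(-heatmap, axis=None)[:num_points]
    ys, xs = np.unravel_index(flat, heatmap.shape)
    return np.stack([ys, xs], axis=1).astype(float)

def graph_refine(points, features, adjacency, steps=3):
    """Stage 2: refine each guided point by a feature-similarity-weighted
    average of its neighbours' positions on the guided-point graph."""
    adj = adjacency + np.eye(len(points))            # keep each point's own vote
    refined = points.copy()
    for _ in range(steps):
        w = adj * np.maximum(features @ features.T, 0.0)  # similarity weights
        w = w / (w.sum(axis=1, keepdims=True) + 1e-8)
        refined = w @ refined
    return refined
```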