Weatherproofing Retrieval for Localization with Generative AI and
Geometric Consistency
- URL: http://arxiv.org/abs/2402.09237v1
- Date: Wed, 14 Feb 2024 15:24:20 GMT
- Title: Weatherproofing Retrieval for Localization with Generative AI and
Geometric Consistency
- Authors: Yannis Kalantidis, Mert Bülent Sarıyıldız, Rafael S.
Rezende, Philippe Weinzaepfel, Diane Larlus, Gabriela Csurka
- Abstract summary: State-of-the-art visual localization approaches rely on a first image retrieval step.
We improve this retrieval step and tailor it to the final localization task.
We experimentally show that those changes translate into large improvements for the most challenging visual localization datasets.
- Score: 32.46493952272438
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: State-of-the-art visual localization approaches generally rely on a first
image retrieval step whose role is crucial. Yet, retrieval often struggles when
facing varying conditions, due to e.g. weather or time of day, with dramatic
consequences on the visual localization accuracy. In this paper, we improve
this retrieval step and tailor it to the final localization task. Among the
several changes we advocate for, we propose to synthesize variants of the
training set images, obtained from generative text-to-image models, in order to
automatically expand the training set towards a number of nameable variations
that particularly hurt visual localization. After expanding the training set,
we propose a training approach that leverages the specificities and the
underlying geometry of this mix of real and synthetic images. We experimentally
show that those changes translate into large improvements for the most
challenging visual localization datasets. Project page:
https://europe.naverlabs.com/ret4loc
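The training-set expansion described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the list of variations, the prompt template, and the `synthesize` callable are all hypothetical stand-ins (the abstract only names weather and time of day as examples, and does not specify a generative model or its API). The sketch keeps every synthetic variant grouped under its source image, on the assumption that a variant depicts the same place as the real image it was generated from.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical "nameable variations"; the abstract only mentions
# weather and time of day as examples.
VARIATIONS: List[str] = ["rainy weather", "snowy weather", "foggy weather", "at night"]


def expand_training_set(
    image_ids: Sequence[str],
    synthesize: Callable[[str, str], str],
    variations: Sequence[str] = VARIATIONS,
) -> Dict[str, List[str]]:
    """Map each real image to itself plus one synthetic variant per variation.

    `synthesize(image_id, prompt)` is an assumed interface that would wrap a
    text-to-image model and return an identifier for the generated image.
    """
    expanded: Dict[str, List[str]] = {}
    for image_id in image_ids:
        group = [image_id]  # the real image always comes first
        for variation in variations:
            # Hypothetical prompt template, for illustration only.
            prompt = f"the same scene, {variation}"
            group.append(synthesize(image_id, prompt))
        expanded[image_id] = group
    return expanded


def stub_synthesize(image_id: str, prompt: str) -> str:
    """Stand-in generator that only builds a name; no image is produced."""
    return f"{image_id}::{prompt}"


if __name__ == "__main__":
    groups = expand_training_set(["img_001", "img_002"], stub_synthesize)
    for source, group in groups.items():
        print(source, len(group))
```

Grouping variants by source image means a training loop could treat all members of a group as views of the same location, which is one plausible way to exploit the mix of real and synthetic images.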
Related papers
- CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition [73.51329037954866]
We propose a robust global representation method with cross-image correlation awareness for visual place recognition.
Our method uses the attention mechanism to correlate multiple images within a batch.
Our method outperforms state-of-the-art methods by a large margin with significantly less training time.
arXiv Detail & Related papers (2024-02-29T15:05:11Z)
- Learning with Multi-modal Gradient Attention for Explainable Composed Image Retrieval [15.24270990274781]
We propose a new gradient-attention-based learning objective that explicitly forces the model to focus on the local regions of interest being modified in each retrieval step.
We show how MMGrad can be incorporated into an end-to-end model training strategy with a new learning objective that explicitly forces these MMGrad attention maps to highlight the correct local regions corresponding to the modifier text.
arXiv Detail & Related papers (2023-08-31T11:46:27Z)
- Self-Supervised Feature Learning for Long-Term Metric Visual Localization [16.987148593917905]
We present a novel self-supervised feature learning framework for metric visual localization.
We use a sequence-based image matching algorithm to generate image correspondences without ground-truth labels.
We can then sample image pairs to train a deep neural network that learns sparse features with associated descriptors and scores without ground-truth pose supervision.
arXiv Detail & Related papers (2022-11-30T21:15:05Z)
- Pretraining is All You Need for Image-to-Image Translation [59.43151345732397]
We propose to use pretraining to boost general image-to-image translation.
We show that the proposed pretraining-based image-to-image translation (PITI) is capable of synthesizing images of unprecedented realism and faithfulness.
arXiv Detail & Related papers (2022-05-25T17:58:26Z)
- Controllable Person Image Synthesis with Spatially-Adaptive Warped Normalization [72.65828901909708]
Controllable person image generation aims to produce realistic human images with desirable attributes.
We introduce a novel Spatially-Adaptive Warped Normalization (SAWN), which integrates a learned flow-field to warp modulation parameters.
We propose a novel self-training part replacement strategy to refine the pretrained model for the texture-transfer task.
arXiv Detail & Related papers (2021-05-31T07:07:44Z)
- Cross-Descriptor Visual Localization and Mapping [81.16435356103133]
Visual localization and mapping is the key technology underlying the majority of Mixed Reality and robotics systems.
We present three novel scenarios for localization and mapping which require the continuous update of feature representations.
Our data-driven approach is agnostic to the feature descriptor type, has low computational requirements, and scales linearly with the number of description algorithms.
arXiv Detail & Related papers (2020-12-02T18:19:51Z)
- Domain-invariant Similarity Activation Map Contrastive Learning for Retrieval-based Long-term Visual Localization [30.203072945001136]
In this work, a general architecture is first formulated probabilistically to extract domain-invariant features through multi-domain image translation.
A novel gradient-weighted similarity activation mapping loss (Grad-SAM) is then incorporated for finer localization with high accuracy.
Extensive experiments have been conducted to validate the effectiveness of the proposed approach on the CMUSeasons dataset.
Our method performs on par with, or even outperforms, state-of-the-art image-based localization baselines at medium or high precision.
arXiv Detail & Related papers (2020-09-16T14:43:22Z)
- Look here! A parametric learning based approach to redirect visual attention [49.609412873346386]
We introduce an automatic method to make an image region more attention-capturing via subtle image edits.
Our model predicts a distinct set of global parametric transformations to be applied to the foreground and background image regions.
Our edits enable inference at interactive rates on any image size, and easily generalize to videos.
arXiv Detail & Related papers (2020-08-12T16:08:36Z)
- Generating Person Images with Appearance-aware Pose Stylizer [66.44220388377596]
We present a novel end-to-end framework to generate realistic person images based on given person poses and appearances.
The core of our framework is a novel generator called Appearance-aware Pose Stylizer (APS) which generates human images by coupling the target pose with the conditioned person appearance progressively.
arXiv Detail & Related papers (2020-07-17T15:58:05Z)
- Adversarial Transfer of Pose Estimation Regression [11.117357750374035]
We develop a deep adaptation network for learning scene-invariant image representations and use adversarial learning to generate representations for model transfer.
We evaluate our network on two public datasets, Cambridge Landmarks and 7-Scenes, demonstrate its superiority over several baselines, and compare it to state-of-the-art methods.
arXiv Detail & Related papers (2020-06-20T21:16:37Z)