Intelligent Reference Curation for Visual Place Recognition via Bayesian Selective Fusion
- URL: http://arxiv.org/abs/2010.09228v2
- Date: Sun, 3 Jan 2021 22:28:28 GMT
- Title: Intelligent Reference Curation for Visual Place Recognition via Bayesian Selective Fusion
- Authors: Timothy L. Molloy and Tobias Fischer and Michael Milford and Girish N. Nair
- Abstract summary: A key challenge in visual place recognition is recognizing places despite drastic visual appearance changes.
We propose a novel approach, dubbed Bayesian Selective Fusion, for actively selecting and fusing informative reference images.
Our approach is well suited for long-term robot autonomy where dynamic visual environments are commonplace.
- Score: 24.612272323346144
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A key challenge in visual place recognition (VPR) is recognizing places
despite drastic visual appearance changes due to factors such as time of day,
season, weather or lighting conditions. Numerous approaches based on
deep-learnt image descriptors, sequence matching, domain translation, and
probabilistic localization have had success in addressing this challenge, but
most rely on the availability of carefully curated representative reference
images of the possible places. In this paper, we propose a novel approach,
dubbed Bayesian Selective Fusion, for actively selecting and fusing informative
reference images to determine the best place match for a given query image. The
selective element of our approach avoids the counterproductive fusion of every
reference image and enables the dynamic selection of informative reference
images in environments with changing visual conditions (such as indoors with
flickering lights, outdoors during sunshowers or over the day-night cycle). The
probabilistic element of our approach provides a means of fusing multiple
reference images that accounts for their varying uncertainty via a novel
training-free likelihood function for VPR. On difficult query images from two
benchmark datasets, we demonstrate that our approach matches and exceeds the
performance of several alternative fusion approaches along with
state-of-the-art techniques that are provided with prior (unfair) knowledge of
the best reference images. Our approach is well suited for long-term robot
autonomy where dynamic visual environments are commonplace since it is
training-free, descriptor-agnostic, and complements existing techniques such as
sequence matching.
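To make the fusion idea concrete, below is a minimal Python sketch of selective Bayesian fusion over place likelihoods. The softmax-of-negative-distances likelihood, the entropy-based selection rule, the `max_entropy` threshold, and the function names are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def place_likelihoods(query_desc, ref_descs):
    # Training-free likelihood over places for one reference set:
    # a softmax of negative squared descriptor distances. This stands
    # in for the paper's likelihood function (an assumption).
    d2 = np.sum((ref_descs - query_desc) ** 2, axis=1)
    w = np.exp(-(d2 - d2.min()))  # shift exponent for numerical stability
    return w / w.sum()

def bayesian_selective_fusion(query_desc, ref_sets, max_entropy=0.9):
    # Fuse only "informative" reference sets: those whose likelihood is
    # peaked, i.e. whose normalized entropy falls below a threshold.
    n_places = ref_sets[0].shape[0]
    posterior = np.full(n_places, 1.0 / n_places)  # uniform prior over places
    for refs in ref_sets:
        lik = place_likelihoods(query_desc, refs)
        ent = -np.sum(lik * np.log(lik + 1e-12)) / np.log(n_places)
        if ent < max_entropy:         # selective step: skip uninformative sets
            posterior *= lik          # Bayesian fusion step
            posterior /= posterior.sum()
    return int(np.argmax(posterior)), posterior
```

For example, with `ref_sets = [day_descs, night_descs]`, where each array holds one descriptor per place from a reference traverse, the function returns the best-matching place index for the query and the fused posterior over places.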
Related papers
- Neural Cover Selection for Image Steganography [7.7961128660417325]
In steganography, selecting an optimal cover image, referred to as cover selection, is pivotal for effective message concealment.
Inspired by recent advancements in generative models, we introduce a novel cover selection framework.
Our method shows significant advantages in message recovery and image quality.
arXiv Detail & Related papers (2024-10-23T18:32:34Z)
- Dual-Image Enhanced CLIP for Zero-Shot Anomaly Detection [58.228940066769596]
We introduce a Dual-Image Enhanced CLIP approach, leveraging a joint vision-language scoring system.
Our method processes pairs of images, utilizing each as a visual reference for the other, thereby enriching the inference process with visual context.
Our approach exploits the potential of joint vision-language anomaly detection and achieves performance comparable to current SOTA methods across various datasets.
arXiv Detail & Related papers (2024-05-08T03:13:20Z)
- CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition [73.51329037954866]
We propose a robust global representation method with cross-image correlation awareness for visual place recognition.
Our method uses the attention mechanism to correlate multiple images within a batch.
Our method outperforms state-of-the-art methods by a large margin with significantly less training time.
arXiv Detail & Related papers (2024-02-29T15:05:11Z)
- Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis [65.7968515029306]
We propose a novel Coarse-to-Fine Latent Diffusion (CFLD) method for Pose-Guided Person Image Synthesis (PGPIS).
A perception-refined decoder is designed to progressively refine a set of learnable queries and extract semantic understanding of person images as a coarse-grained prompt.
arXiv Detail & Related papers (2024-02-28T06:07:07Z)
- Unsupervised Complementary-aware Multi-process Fusion for Visual Place Recognition [28.235055888073855]
We propose an unsupervised algorithm that finds the most robust set of VPR techniques to use in the current deployment environment.
The proposed dynamic multi-process fusion (Dyn-MPF) has superior VPR performance compared to a variety of challenging competitive methods.
arXiv Detail & Related papers (2021-12-09T04:57:33Z)
- Two-stage Visual Cues Enhancement Network for Referring Image Segmentation [89.49412325699537]
Referring Image Segmentation (RIS) aims at segmenting the target object from an image referred to by a given natural language expression.
In this paper, we tackle this problem by devising a Two-stage Visual cues enhancement Network (TV-Net).
Through the two-stage enhancement, the proposed TV-Net achieves better performance in learning fine-grained matching between the natural language expression and the image.
arXiv Detail & Related papers (2021-10-09T02:53:39Z)
- Enhance Images as You Like with Unpaired Learning [8.104571453311442]
We propose a lightweight one-path conditional generative adversarial network (cGAN) to learn a one-to-many relation from low-light to normal-light image space.
Our network learns to generate a collection of enhanced images from a given input conditioned on various reference images.
Our model achieves competitive visual and quantitative results on par with fully supervised methods on both noisy and clean datasets.
arXiv Detail & Related papers (2021-10-04T03:00:44Z)
- Cross-Modal Retrieval Augmentation for Multi-Modal Classification [61.5253261560224]
We explore the use of unstructured external knowledge sources of images and their corresponding captions for improving visual question answering.
First, we train a novel alignment model for embedding images and captions in the same space, which achieves substantial improvement on image-caption retrieval.
Second, we show that retrieval-augmented multi-modal transformers using the trained alignment model improve results on VQA over strong baselines.
arXiv Detail & Related papers (2021-04-16T13:27:45Z)
- Robust Place Recognition using an Imaging Lidar [45.37172889338924]
We propose a methodology for robust, real-time place recognition using an imaging lidar.
Our method is invariant to sensor orientation and can handle reverse and upside-down revisits.
arXiv Detail & Related papers (2021-03-03T01:08:31Z)
- Learning to Compose Hypercolumns for Visual Correspondence [57.93635236871264]
We introduce a novel approach to visual correspondence that dynamically composes effective features by leveraging relevant layers conditioned on the images to match.
The proposed method, dubbed Dynamic Hyperpixel Flow, learns to compose hypercolumn features on the fly by selecting a small number of relevant layers from a deep convolutional neural network.
arXiv Detail & Related papers (2020-07-21T04:03:22Z)