ConvSequential-SLAM: A Sequence-based, Training-less Visual Place
Recognition Technique for Changing Environments
- URL: http://arxiv.org/abs/2009.13454v1
- Date: Mon, 28 Sep 2020 16:31:29 GMT
- Title: ConvSequential-SLAM: A Sequence-based, Training-less Visual Place
Recognition Technique for Changing Environments
- Authors: Mihnea-Alexandru Tomită, Mubariz Zaffar, Michael Milford, Klaus
McDonald-Maier and Shoaib Ehsan
- Abstract summary: Visual Place Recognition (VPR) is the ability to correctly recall a previously visited place under changing viewpoints and appearances.
We present a new handcrafted VPR technique that achieves state-of-the-art place matching performance under challenging conditions.
- Score: 19.437998213418446
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual Place Recognition (VPR) is the ability to correctly recall a
previously visited place under changing viewpoints and appearances. A large
number of handcrafted and deep-learning-based VPR techniques exist, where the
former suffer from appearance changes and the latter have significant
computational needs. In this paper, we present a new handcrafted VPR technique
that achieves state-of-the-art place matching performance under challenging
conditions. Our technique combines the strengths of two existing training-less VPR
techniques, SeqSLAM and CoHOG, which are robust to conditional and
viewpoint changes, respectively. This blend, namely ConvSequential-SLAM,
utilises sequential information and block-normalisation to handle appearance
changes, while using regional-convolutional matching to achieve
viewpoint invariance. We analyse the content overlap between query frames to
find a minimum sequence length, while also re-using the image entropy
information for environment-based sequence-length tuning. State-of-the-art
performance is reported against 8 contemporary VPR techniques on 4
public datasets. Qualitative insights and an ablation study on sequence length
are also provided.
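As an illustration of the mechanisms the abstract names, here is a minimal sketch of entropy-driven sequence-length selection and sequential matching of block-normalised regional descriptors. This is not the authors' code: the helper names (`image_entropy`, `entropy_to_seq_len`, `block_descriptor`, `sequence_match`), the grid size, and the entropy thresholds are illustrative assumptions.

```python
import numpy as np

def image_entropy(gray: np.ndarray, bins: int = 256) -> float:
    """Shannon entropy (bits) of an 8-bit grayscale intensity histogram."""
    hist, _ = np.histogram(gray, bins=bins, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def entropy_to_seq_len(h: float, k_min: int = 3, k_max: int = 15,
                       h_lo: float = 4.0, h_hi: float = 7.5) -> int:
    """Map frame entropy to a sequence length: information-poor (low-entropy)
    frames get longer sequences. Thresholds are assumed tuning constants,
    not values from the paper."""
    t = np.clip((h_hi - h) / (h_hi - h_lo), 0.0, 1.0)
    return int(round(k_min + t * (k_max - k_min)))

def block_descriptor(gray: np.ndarray, grid: int = 4) -> np.ndarray:
    """Block-normalised regional descriptor: split the image into grid x grid
    regions and L2-normalise each region independently."""
    H, W = gray.shape
    regions = []
    for i in range(grid):
        for j in range(grid):
            block = gray[i * H // grid:(i + 1) * H // grid,
                         j * W // grid:(j + 1) * W // grid]
            v = block.astype(np.float64).ravel()
            regions.append(v / (np.linalg.norm(v) + 1e-9))
    return np.concatenate(regions)

def sequence_match(query_descs, ref_descs):
    """Slide the query sequence over the reference descriptors and return the
    start index with the smallest accumulated cosine distance."""
    k = len(query_descs)
    best_idx, best_cost = -1, np.inf
    for s in range(len(ref_descs) - k + 1):
        cost = 0.0
        for t, q in enumerate(query_descs):
            r = ref_descs[s + t]
            cost += 1.0 - float(q @ r) / (np.linalg.norm(q) * np.linalg.norm(r) + 1e-9)
        if cost < best_cost:
            best_idx, best_cost = s, cost
    return best_idx, best_cost
```

In use, a query frame's entropy would pick `k = entropy_to_seq_len(image_entropy(frame))`, the last `k` query frames would be turned into descriptors, and `sequence_match` would score them against the reference map. The paper's regional-convolutional matching additionally searches over region shifts for viewpoint invariance, which this sketch omits.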
Related papers
- Context-Based Visual-Language Place Recognition [4.737519767218666]
A popular approach to vision-based place recognition relies on low-level visual features.
We introduce a novel VPR approach that remains robust to scene changes and does not require additional training.
Our method constructs semantic image descriptors by extracting pixel-level embeddings using a zero-shot, language-driven semantic segmentation model.
arXiv Detail & Related papers (2024-10-25T06:59:11Z)
- Exploring Pre-trained Text-to-Video Diffusion Models for Referring Video Object Segmentation [72.90144343056227]
We explore the visual representations produced from a pre-trained text-to-video (T2V) diffusion model for video understanding tasks.
We introduce a novel framework, termed "VD-IT", tailored with purpose-designed components built upon a fixed T2V model.
Our VD-IT achieves highly competitive results, surpassing many existing state-of-the-art methods.
arXiv Detail & Related papers (2024-03-18T17:59:58Z)
- CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition [73.51329037954866]
We propose a robust global representation method with cross-image correlation awareness for visual place recognition.
Our method uses the attention mechanism to correlate multiple images within a batch.
Our method outperforms state-of-the-art methods by a large margin with significantly less training time.
arXiv Detail & Related papers (2024-02-29T15:05:11Z)
- Towards Seamless Adaptation of Pre-trained Models for Visual Place Recognition [72.35438297011176]
We propose a novel method to realize seamless adaptation of pre-trained models for visual place recognition (VPR).
Specifically, to obtain both global and local features that focus on salient landmarks for discriminating places, we design a hybrid adaptation method.
Experimental results show that our method outperforms the state-of-the-art methods with less training data and training time.
arXiv Detail & Related papers (2024-02-22T12:55:01Z)
- Multi-Technique Sequential Information Consistency For Dynamic Visual Place Recognition In Changing Environments [23.33092172788319]
Visual place recognition (VPR) is an essential component of robot navigation and localization systems.
No single VPR technique excels in every environmental condition.
We propose a VPR system dubbed Multi-Sequential Information Consistency (MuSIC).
arXiv Detail & Related papers (2024-01-16T10:35:01Z)
- Harnessing Diffusion Models for Visual Perception with Meta Prompts [68.78938846041767]
We propose a simple yet effective scheme to harness a diffusion model for visual perception tasks.
We introduce learnable embeddings (meta prompts) to the pre-trained diffusion models to extract proper features for perception.
Our approach achieves new performance records in depth estimation on NYU Depth V2 and KITTI, and in semantic segmentation on Cityscapes.
arXiv Detail & Related papers (2023-12-22T14:40:55Z)
- Condition-Invariant Semantic Segmentation [77.10045325743644]
We implement Condition-Invariant Semantic Segmentation (CISS) on the current state-of-the-art domain adaptation architecture.
Our method achieves the second-best performance on the normal-to-adverse Cityscapes→ACDC benchmark.
CISS is shown to generalize well to domains unseen during training, such as BDD100K-night and ACDC-night.
arXiv Detail & Related papers (2023-05-27T03:05:07Z)
- A-MuSIC: An Adaptive Ensemble System For Visual Place Recognition In Changing Environments [22.58641358408613]
Visual place recognition (VPR) is an essential component of robot navigation and localization systems.
No single VPR technique excels in every environmental condition.
We propose an adaptive VPR system dubbed Adaptive Multi-Self Identification and Correction (A-MuSIC).
A-MuSIC matches or beats state-of-the-art VPR performance across all tested benchmark datasets.
arXiv Detail & Related papers (2023-03-24T19:25:22Z)
- COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval [59.15034487974549]
We propose a novel COllaborative Two-Stream vision-language pretraining model termed COTS for image-text retrieval.
Our COTS achieves the highest performance among all two-stream methods and comparable performance while being 10,800X faster at inference.
Importantly, our COTS is also applicable to text-to-video retrieval, yielding new state-of-the-art results on the widely-used MSR-VTT dataset.
arXiv Detail & Related papers (2022-04-15T12:34:47Z)
- STA-VPR: Spatio-temporal Alignment for Visual Place Recognition [17.212503755962757]
We propose an adaptive dynamic time warping algorithm to align local features from the spatial domain while measuring the distance between two images.
A local matching DTW algorithm is applied to perform image sequence matching based on temporal alignment.
The results show that the proposed method significantly improves CNN-based methods; a minimal DTW sketch follows this entry.
arXiv Detail & Related papers (2021-03-25T03:27:42Z)
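The STA-VPR entry above relies on dynamic time warping; below is a minimal sketch of plain DTW over a precomputed image-distance matrix. The adaptive weighting and local-matching refinements that the paper describes are omitted, so treat this as a generic baseline, not STA-VPR's algorithm.

```python
import numpy as np

def dtw_cost(D: np.ndarray) -> float:
    """Accumulated cost of the cheapest monotonic alignment through D, where
    D[i, j] is the distance between query frame i and reference frame j."""
    n, m = D.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Three standard DTW moves: advance query, advance reference, or both.
            acc[i, j] = D[i - 1, j - 1] + min(acc[i - 1, j],
                                              acc[i, j - 1],
                                              acc[i - 1, j - 1])
    return float(acc[n, m])

# Usage: slide a reference window along the map and keep the window with the
# lowest dtw_cost; D can be built from any per-image distance, e.g. the cosine
# distances produced by the descriptor sketch earlier on this page.
```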
- Early Bird: Loop Closures from Opposing Viewpoints for Perceptually-Aliased Indoor Environments [35.663671249819124]
We present novel research that simultaneously addresses viewpoint change and perceptual aliasing.
We show that our integration of VPR with SLAM significantly boosts the performance of VPR, feature correspondence, and pose graph submodules.
For the first time, we demonstrate a localization system capable of state-of-the-art performance despite perceptual aliasing and extreme 180-degree-rotated viewpoint change.
arXiv Detail & Related papers (2020-10-03T20:18:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.