AnyLoc: Towards Universal Visual Place Recognition
- URL: http://arxiv.org/abs/2308.00688v2
- Date: Wed, 29 Nov 2023 04:44:30 GMT
- Title: AnyLoc: Towards Universal Visual Place Recognition
- Authors: Nikhil Keetha, Avneesh Mishra, Jay Karhade, Krishna Murthy Jatavallabhula, Sebastian Scherer, Madhava Krishna, Sourav Garg
- Abstract summary: Visual Place Recognition (VPR) is vital for robot localization.
Most performant VPR approaches are environment- and task-specific.
We develop a universal solution to VPR -- a technique that works across a broad range of structured and unstructured environments.
- Score: 12.892386791383025
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Visual Place Recognition (VPR) is vital for robot localization. To date, the
most performant VPR approaches are environment- and task-specific: while they
exhibit strong performance in structured environments (predominantly urban
driving), their performance degrades severely in unstructured environments,
rendering most approaches too brittle for robust real-world deployment. In this
work, we develop a universal solution to VPR -- a technique that works across a
broad range of structured and unstructured environments (urban, outdoors,
indoors, aerial, underwater, and subterranean environments) without any
re-training or fine-tuning. We demonstrate that general-purpose feature
representations derived from off-the-shelf self-supervised models with no
VPR-specific training are the right substrate upon which to build such a
universal VPR solution. Combining these derived features with unsupervised
feature aggregation enables our suite of methods, AnyLoc, to achieve up to 4X
higher performance than existing approaches. We further obtain a
6% improvement in performance by characterizing the semantic properties of
these features, uncovering unique domains which encapsulate datasets from
similar environments. Our detailed experiments and analysis lay a foundation
for building VPR solutions that may be deployed anywhere, anytime, and across
any view. We encourage readers to explore our project page and interactive
demos: https://anyloc.github.io/.
Related papers
- Flex: End-to-End Text-Instructed Visual Navigation with Foundation Models [59.892436892964376]
We investigate the minimal data requirements and architectural adaptations necessary to achieve robust closed-loop performance with vision-based control policies.
Our findings are synthesized in Flex (Fly-lexically), a framework that uses pre-trained Vision Language Models (VLMs) as frozen patch-wise feature extractors.
We demonstrate the effectiveness of this approach on quadrotor fly-to-target tasks, where agents trained via behavior cloning successfully generalize to real-world scenes.
arXiv Detail & Related papers (2024-10-16T19:59:31Z)
- EffoVPR: Effective Foundation Model Utilization for Visual Place Recognition [6.996304653818122]
We propose a simple yet powerful approach to better exploit the potential of a foundation model for Visual Place Recognition.
We first demonstrate that features extracted from self-attention layers can serve as a powerful re-ranker for VPR.
We then demonstrate that a single-stage method leveraging internal ViT layers for pooling can generate global features that achieve state-of-the-art results.
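The re-ranking idea invites a compact sketch: score each shortlisted candidate by how strongly its local features agree with the query's. Below is a hedged illustration, not EffoVPR's method; mutual-nearest-neighbor counting is an assumed scoring rule, and random tensors stand in for features that would come from ViT self-attention layers.

```python
# Hedged sketch of local-feature re-ranking in the spirit of EffoVPR. The
# features would come from a ViT's self-attention layers; here random tensors
# stand in, and mutual-nearest-neighbor counting is an assumed scoring rule.
import torch
import torch.nn.functional as F

def mutual_nn_count(q: torch.Tensor, c: torch.Tensor) -> int:
    """Count mutual nearest neighbors between two (n, d) local-feature sets."""
    sim = F.normalize(q, dim=1) @ F.normalize(c, dim=1).T
    q2c = sim.argmax(dim=1)   # best candidate feature for each query feature
    c2q = sim.argmax(dim=0)   # best query feature for each candidate feature
    idx = torch.arange(q.shape[0])
    return int((c2q[q2c] == idx).sum())

def rerank(query_feats, candidate_feats):
    """Reorder top-k retrieval candidates by local-feature agreement."""
    scores = [mutual_nn_count(query_feats, c) for c in candidate_feats]
    return sorted(range(len(candidate_feats)), key=lambda i: -scores[i])

query = torch.randn(100, 384)                          # query's local features
shortlist = [torch.randn(100, 384) for _ in range(5)]  # top-5 global-retrieval hits
print(rerank(query, shortlist))                        # indices, best match first
```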
arXiv Detail & Related papers (2024-05-28T11:24:41Z)
- FSD V2: Improving Fully Sparse 3D Object Detection with Virtual Voxels [57.05834683261658]
We present FSDv2, an evolution that aims to simplify the previous FSDv1 while eliminating the inductive bias introduced by its handcrafted instance-level representation.
We develop a suite of components to complement the virtual voxel concept, including a virtual voxel encoder, a virtual voxel mixer, and a virtual voxel assignment strategy.
arXiv Detail & Related papers (2023-08-07T17:59:48Z)
- A-MuSIC: An Adaptive Ensemble System For Visual Place Recognition In Changing Environments [22.58641358408613]
Visual place recognition (VPR) is an essential component of robot navigation and localization systems.
No single VPR technique excels in every environmental condition.
An adaptive VPR system dubbed Adaptive Multi-Self Identification and Correction (A-MuSIC) is proposed.
A-MuSIC matches or beats state-of-the-art VPR performance across all tested benchmark datasets.
arXiv Detail & Related papers (2023-03-24T19:25:22Z)
- MixVPR: Feature Mixing for Visual Place Recognition [3.6739949215165164]
Visual Place Recognition (VPR) is a crucial part of mobile robotics and autonomous driving.
We introduce MixVPR, a new holistic feature aggregation technique that takes feature maps from pre-trained backbones as a set of global features.
We demonstrate the effectiveness of our technique through extensive experiments on multiple large-scale benchmarks.
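A rough sketch of what holistic feature mixing can look like follows: the flattened spatial positions of a frozen backbone's feature map are repeatedly mixed by residual MLP blocks, then projected to a compact global descriptor. The block count, dimensions, and projections are illustrative assumptions; the official MixVPR code is the reference implementation.

```python
# Rough sketch of MLP-based holistic feature mixing in the spirit of MixVPR.
# Block count, dimensions, and the final projections are illustrative
# assumptions; consult the official implementation for the real architecture.
import torch
import torch.nn as nn

class FeatureMixer(nn.Module):
    def __init__(self, n_tokens: int, dim: int, n_blocks: int = 4, out_dim: int = 512):
        super().__init__()
        # Each residual block mixes information across flattened spatial positions.
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.LayerNorm(n_tokens), nn.Linear(n_tokens, n_tokens), nn.ReLU())
            for _ in range(n_blocks)
        )
        self.channel_proj = nn.Linear(dim, out_dim)  # compress channels
        self.row_proj = nn.Linear(n_tokens, 1)       # collapse positions

    def forward(self, fmap: torch.Tensor) -> torch.Tensor:
        x = fmap.flatten(2)                 # (B, C, H*W): feature map as token rows
        for blk in self.blocks:
            x = x + blk(x)                  # residual mixing over positions
        x = self.channel_proj(x.transpose(1, 2))              # (B, H*W, out_dim)
        return self.row_proj(x.transpose(1, 2)).squeeze(-1)   # (B, out_dim)

# Feature map from a frozen pre-trained backbone (random stand-in here).
desc = FeatureMixer(n_tokens=20 * 20, dim=1024)(torch.randn(2, 1024, 20, 20))
print(desc.shape)  # torch.Size([2, 512])
```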
arXiv Detail & Related papers (2023-03-03T19:24:03Z)
- StructVPR: Distill Structural Knowledge with Weighting Samples for Visual Place Recognition [49.58170209388029]
Visual place recognition (VPR) is usually considered as a specific image retrieval problem.
We propose StructVPR, a novel training architecture for VPR, to enhance structural knowledge in RGB global features.
StructVPR achieves state-of-the-art performance while maintaining a low computational cost.
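The sample-weighting idea can be made concrete with a toy distillation loss in which each training sample contributes in proportion to a weight. The cosine objective and random weights below are stand-ins, not StructVPR's actual formulation, which derives weights from the usefulness of each sample's structural knowledge.

```python
# Toy per-sample-weighted distillation loss in the spirit of StructVPR's
# "weighting samples" idea. The random weights and cosine objective are
# stand-ins; the paper derives weights from how helpful each sample's
# structural (e.g., segmentation-based) knowledge is.
import torch
import torch.nn.functional as F

def weighted_distill_loss(student: torch.Tensor, teacher: torch.Tensor,
                          weights: torch.Tensor) -> torch.Tensor:
    """Pull student global features toward teacher features, sample by sample."""
    per_sample = 1.0 - F.cosine_similarity(student, teacher, dim=1)
    return (weights * per_sample).mean()

rgb_emb = torch.randn(8, 256, requires_grad=True)  # student: RGB global features
seg_emb = torch.randn(8, 256)                      # teacher: structure-aware features
w = torch.rand(8)                                  # hypothetical per-sample weights
weighted_distill_loss(rgb_emb, seg_emb, w).backward()
```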
arXiv Detail & Related papers (2022-12-02T02:52:01Z)
- SwitchHit: A Probabilistic, Complementarity-Based Switching System for Improved Visual Place Recognition in Changing Environments [20.917586014941033]
There is no universal VPR technique that can work in all types of environments.
Running multiple VPR techniques in parallel may be prohibitive for resource-constrained embedded platforms.
This paper presents SwitchHit, a probabilistic, complementarity-based switching VPR system.
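A toy version of such a switching scheme: run one technique, and if its estimated success probability is low, hand the query to the untried technique most complementary to the one that just failed. The probability model, threshold, and complementarity table below are invented for illustration; in the actual system these quantities would be estimated, not hard-coded.

```python
# Toy complementarity-based switching in the spirit of SwitchHit. The success
# probabilities, threshold, and complementarity table are invented here.
def switching_vpr(query, techniques, complementarity, threshold=0.5):
    """techniques: {name: fn(query) -> (match_id, p_success)}.
    complementarity: {name: {other: P(other succeeds | name fails)}}."""
    name, tried = next(iter(techniques)), set()
    while True:
        tried.add(name)
        match, p = techniques[name](query)
        if p >= threshold:
            return name, match          # confident enough: stop, save compute
        untried = {k: v for k, v in complementarity[name].items() if k not in tried}
        if not untried:
            return name, match          # nothing left to switch to
        name = max(untried, key=untried.get)  # most complementary fallback

# Hypothetical stub techniques for illustration only.
techs = {"netvlad": lambda q: (3, 0.2), "cosplace": lambda q: (7, 0.9)}
comp = {"netvlad": {"cosplace": 0.8}, "cosplace": {"netvlad": 0.4}}
print(switching_vpr("query.png", techs, comp))  # ('cosplace', 7)
```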
arXiv Detail & Related papers (2022-03-01T16:23:22Z)
- Semantic Tracklets: An Object-Centric Representation for Visual Multi-Agent Reinforcement Learning [126.57680291438128]
We study whether scalability can be achieved via a disentangled representation.
We evaluate semantic tracklets on the visual multi-agent particle environment (VMPE) and on the challenging visual multi-agent GFootball environment.
Notably, this method is the first to successfully learn a strategy for five players in the GFootball environment using only visual data.
arXiv Detail & Related papers (2021-08-06T22:19:09Z)
- Demonstration-efficient Inverse Reinforcement Learning in Procedurally Generated Environments [137.86426963572214]
Inverse Reinforcement Learning can extrapolate reward functions from expert demonstrations.
We show that our approach, DE-AIRL, is demonstration-efficient and still able to extrapolate reward functions which generalize to the fully procedural domain.
arXiv Detail & Related papers (2020-12-04T11:18:02Z)
- Shared Space Transfer Learning for analyzing multi-site fMRI data [83.41324371491774]
Multi-voxel pattern analysis (MVPA) learns predictive models from task-based functional magnetic resonance imaging (fMRI) data.
MVPA works best with a well-designed feature set and an adequate sample size.
Most fMRI datasets are noisy, high-dimensional, expensive to collect, and small in sample size.
This paper proposes Shared Space Transfer Learning (SSTL), a novel transfer learning approach.
arXiv Detail & Related papers (2020-10-24T08:50:26Z)