MultiRes-NetVLAD: Augmenting Place Recognition Training with Low-Resolution Imagery
- URL: http://arxiv.org/abs/2202.09146v1
- Date: Fri, 18 Feb 2022 11:53:01 GMT
- Title: MultiRes-NetVLAD: Augmenting Place Recognition Training with Low-Resolution Imagery
- Authors: Ahmad Khaliq, Michael Milford and Sourav Garg
- Abstract summary: We augment NetVLAD representation learning with low-resolution image pyramid encoding.
The resultant multi-resolution feature pyramid can be conveniently aggregated through VLAD into a single compact representation.
We show that the underlying learnt feature tensor can be combined with existing multi-scale approaches to improve their baseline performance.
- Score: 28.875236694573815
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual Place Recognition (VPR) is a crucial component of 6-DoF localization,
visual SLAM and structure-from-motion pipelines, tasked with generating an initial
list of place match hypotheses by matching global place descriptors. However,
commonly used CNN-based methods either process multiple image resolutions after
training or use a single resolution and limit multi-scale feature extraction to
the last convolutional layer during training. In this paper, we augment NetVLAD
representation learning with low-resolution image pyramid encoding, which leads
to richer place representations. The resultant multi-resolution feature pyramid
can be conveniently aggregated through VLAD into a single compact
representation, avoiding the need for concatenation or summation of multiple
patches in recent multi-scale approaches. Furthermore, we show that the
underlying learnt feature tensor can be combined with existing multi-scale
approaches to improve their baseline performance. Evaluation on 15
viewpoint-varying and viewpoint-consistent benchmarking datasets confirms that
the proposed MultiRes-NetVLAD leads to state-of-the-art Recall@N performance
for global descriptor-based retrieval, compared against 11 existing techniques.
Source code is publicly available at
https://github.com/Ahmedest61/MultiRes-NetVLAD.
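The abstract's central idea, pooling local features from several image resolutions into one VLAD descriptor and scoring retrieval with Recall@N, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see their repository for that). Assumptions: the CNN backbone is replaced by a hypothetical per-pixel linear projection `proj`, lower resolutions come from 2x2 average pooling, the soft assignment uses the simplest tied NetVLAD form softmax(-alpha * ||x - c_k||^2) with an assumed `alpha`, and the helper names (`soft_assign_vlad`, `downsample2x`, `multires_vlad`, `recall_at_n`) are illustrative only.

```python
import numpy as np


def soft_assign_vlad(descs, centroids, alpha=100.0):
    """NetVLAD-style soft-assignment VLAD (tied form, an assumption).

    descs: (N, D) local descriptors; centroids: (K, D) cluster centres.
    Returns an L2-normalised vector of length K * D.
    """
    resid = descs[:, None, :] - centroids[None, :, :]      # (N, K, D) residuals x_i - c_k
    logits = -alpha * (resid ** 2).sum(axis=-1)            # (N, K) = -alpha * ||x_i - c_k||^2
    logits -= logits.max(axis=1, keepdims=True)            # numerical stability
    a = np.exp(logits)
    a /= a.sum(axis=1, keepdims=True)                      # softmax over clusters
    v = (a[:, :, None] * resid).sum(axis=0)                # (K, D) weighted residual sums
    v /= np.linalg.norm(v, axis=1, keepdims=True) + 1e-12  # intra-normalisation per cluster
    v = v.ravel()
    return v / (np.linalg.norm(v) + 1e-12)                 # global L2 normalisation


def downsample2x(img):
    """2x2 average pooling; a simple stand-in for resizing the input image."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return img.reshape(h // 2, 2, w // 2, 2, -1).mean(axis=(1, 3))


def multires_vlad(img, proj, centroids, levels=3):
    """Pool local features from `levels` image resolutions into ONE VLAD vector.

    img: (H, W, C) image; proj: (C, D) hypothetical per-pixel 'backbone'.
    """
    feats, x = [], img
    for _ in range(levels):
        feats.append(x.reshape(-1, x.shape[-1]) @ proj)  # (H_r * W_r, D) descriptors
        x = downsample2x(x)                              # next, lower resolution
    # All levels share one codebook, so the output length is K * D for any `levels`.
    return soft_assign_vlad(np.concatenate(feats, axis=0), centroids)


def recall_at_n(q_desc, db_desc, gt, ns=(1, 5, 10)):
    """Fraction of queries with at least one true match among the top-N results.

    gt[i] is the set of database indices that are correct matches for query i.
    """
    sims = q_desc @ db_desc.T             # cosine similarity for L2-normalised descriptors
    ranked = np.argsort(-sims, axis=1)    # database indices, best first
    return {
        n: float(np.mean([any(j in gt[i] for j in ranked[i, :n].tolist())
                          for i in range(len(gt))]))
        for n in ns
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    proj = rng.standard_normal((3, 16))       # C=3 -> D=16
    centroids = rng.standard_normal((8, 16))  # K=8 clusters
    imgs = rng.random((4, 32, 32, 3))
    desc = np.stack([multires_vlad(im, proj, centroids) for im in imgs])
    print(desc.shape)  # (4, 128): K * D, independent of the number of levels
    # Query set == database here, so each query should retrieve itself first.
    print(recall_at_n(desc, desc, [{i} for i in range(4)]))
```

Because descriptors from every pyramid level are soft-assigned against the same `centroids`, adding levels changes only how many local descriptors feed the VLAD, not the length of the final vector. This is the compactness property the abstract contrasts with concatenating or summing per-patch descriptors.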
Related papers
- Multimodal Learned Sparse Retrieval with Probabilistic Expansion Control [66.78146440275093]
Learned sparse retrieval (LSR) is a family of neural methods that encode queries and documents into sparse lexical vectors.
We explore the application of LSR to the multi-modal domain, with a focus on text-image retrieval.
Current approaches like LexLIP and STAIR require complex multi-step training on massive datasets.
Our proposed approach efficiently transforms dense vectors from a frozen dense model into sparse lexical vectors.
Date: 2024-02-27T14:21:56Z
- Deep Homography Estimation for Visual Place Recognition [49.235432979736395]
We propose a transformer-based deep homography estimation (DHE) network.
It takes the dense feature map extracted by a backbone network as input and fits a homography for fast, learnable geometric verification.
Experiments on benchmark datasets show that our method can outperform several state-of-the-art methods.
Date: 2024-02-25T13:22:17Z
- Optimal Transport Aggregation for Visual Place Recognition [9.192660643226372]
We introduce SALAD, which reformulates NetVLAD's soft-assignment of local features to clusters as an optimal transport problem.
In SALAD, we consider both feature-to-cluster and cluster-to-feature relations and we also introduce a 'dustbin' cluster, designed to selectively discard features deemed non-informative.
Our single-stage method not only surpasses single-stage baselines on public VPR datasets but also surpasses two-stage methods that add a re-ranking step at significantly higher cost.
Date: 2023-11-27T15:46:19Z
- ClusVPR: Efficient Visual Place Recognition with Clustering-based Weighted Transformer [13.0858576267115]
We present ClusVPR, a novel approach that tackles the specific issues of redundant information in duplicate regions and representations of small objects.
ClusVPR introduces a unique paradigm called Clustering-based weighted Transformer Network (CWTNet).
We also introduce the optimized-VLAD layer that significantly reduces the number of parameters and enhances model efficiency.
Date: 2023-10-06T09:01:15Z
- RIAV-MVS: Recurrent-Indexing an Asymmetric Volume for Multi-View Stereo [20.470182157606818]
It follows a "learning-to-optimize" paradigm that iteratively indexes a plane-sweeping cost volume and regresses the depth map via a convolutional Gated Recurrent Unit (GRU).
We conduct extensive experiments on real-world MVS datasets and show that our method achieves state-of-the-art performance in terms of both within-dataset evaluation and cross-dataset generalization.
Date: 2022-05-28T03:32:56Z
- Learning Enriched Features for Fast Image Restoration and Enhancement [166.17296369600774]
This paper pursues the holistic goal of maintaining spatially precise high-resolution representations through the entire network.
We learn an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
Our approach achieves state-of-the-art results for a variety of image processing tasks, including defocus deblurring, image denoising, super-resolution, and image enhancement.
Date: 2022-04-19T17:59:45Z
- Reuse your features: unifying retrieval and feature-metric alignment [3.845387441054033]
DRAN is the first network able to produce the features for the three steps of visual localization.
It achieves competitive performance in terms of robustness and accuracy under challenging conditions in public benchmarks.
Date: 2022-04-13T10:42:00Z
- Multi-level Second-order Few-shot Learning [111.0648869396828]
We propose a Multi-level Second-order (MlSo) few-shot learning network for supervised or unsupervised few-shot image classification and few-shot action recognition.
We leverage so-called power-normalized second-order base learner streams combined with features that express multiple levels of visual abstraction.
We demonstrate respectable results on standard datasets such as Omniglot, mini-ImageNet, tiered-ImageNet, Open MIC, fine-grained datasets such as CUB Birds, Stanford Dogs and Cars, and action recognition datasets such as HMDB51, UCF101, and mini-MIT.
Date: 2022-01-15T19:49:00Z
- HAT: Hierarchical Aggregation Transformers for Person Re-identification [87.02828084991062]
This work is the first to take advantage of both CNNs and Transformers for high-performance image-based person Re-ID.
Date: 2021-07-13T09:34:54Z
- Learning Enriched Features for Real Image Restoration and Enhancement [166.17296369600774]
Convolutional neural networks (CNNs) have achieved dramatic improvements over conventional approaches for image restoration tasks.
We present a novel architecture with the collective goals of maintaining spatially-precise high-resolution representations through the entire network.
Our approach learns an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
Date: 2020-03-15T11:04:30Z
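The SALAD entry in this list reformulates NetVLAD's soft assignment as an optimal transport problem with a "dustbin" cluster. A minimal log-domain Sinkhorn sketch of that idea follows. It is an illustration under stated assumptions, not a reproduction of SALAD: the marginals (each feature sends unit mass, each real cluster receives unit mass, the dustbin absorbs the remaining n - m) follow the common dustbin formulation, the dustbin score is a fixed constant rather than a learned parameter, features rather than residuals are summed per cluster, and SALAD's global token is omitted. The names `sinkhorn_with_dustbin` and `ot_aggregate` are hypothetical.

```python
import numpy as np


def logsumexp(x, axis):
    """Numerically stable log(sum(exp(x))) along one axis."""
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))


def sinkhorn_with_dustbin(scores, dustbin_score=1.0, iters=100):
    """Log-domain Sinkhorn on a feature-to-cluster score matrix plus a dustbin column.

    scores: (n, m) with n > m. Assumed marginals: every feature sends mass 1,
    every real cluster receives mass 1, and the dustbin receives the rest (n - m).
    Returns the (n, m + 1) transport plan P.
    """
    n, m = scores.shape
    assert n > m, "need more features than clusters for a non-empty dustbin"
    s = np.concatenate([scores, np.full((n, 1), dustbin_score)], axis=1)
    log_mu = np.zeros(n)                                   # row marginals: log(1)
    log_nu = np.log(np.append(np.ones(m), float(n - m)))   # column marginals
    u, v = np.zeros(n), np.zeros(m + 1)
    for _ in range(iters):
        u = log_mu - logsumexp(s + v[None, :], axis=1)     # rescale to match row sums
        v = log_nu - logsumexp(s + u[:, None], axis=0)     # rescale to match column sums
    return np.exp(s + u[:, None] + v[None, :])


def ot_aggregate(feats, scores, **kw):
    """Aggregate (n, D) features into an (m * D,) descriptor; dustbin mass is discarded."""
    p = sinkhorn_with_dustbin(scores, **kw)[:, :-1]        # drop the dustbin column
    agg = p.T @ feats                                      # (m, D) mass-weighted feature sums
    agg /= np.linalg.norm(agg, axis=1, keepdims=True) + 1e-12
    g = agg.ravel()
    return g / (np.linalg.norm(g) + 1e-12)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    feats = rng.standard_normal((50, 16))
    scores = rng.standard_normal((50, 8))   # stand-in for network-predicted scores
    P = sinkhorn_with_dustbin(scores, iters=200)
    print(P.sum(axis=0).round(3))           # approximately 1 per cluster, 42 for the dustbin
    print(ot_aggregate(feats, scores).shape)
```

Compared with a per-feature softmax, the transport constraints also force each cluster to receive a fixed amount of mass, and any feature whose mass flows mostly into the dustbin contributes little to the final descriptor.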