StructVPR: Distill Structural Knowledge with Weighting Samples for
Visual Place Recognition
- URL: http://arxiv.org/abs/2212.00937v4
- Date: Wed, 22 Mar 2023 00:30:39 GMT
- Title: StructVPR: Distill Structural Knowledge with Weighting Samples for
Visual Place Recognition
- Authors: Yanqing Shen, Sanping Zhou, Jingwen Fu, Ruotong Wang, Shitao Chen, and
Nanning Zheng
- Abstract summary: Visual place recognition (VPR) is usually considered a specific image retrieval problem.
We propose StructVPR, a novel training architecture for VPR, to enhance structural knowledge in RGB global features.
Our method achieves state-of-the-art performance while maintaining a low computational cost.
- Score: 49.58170209388029
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual place recognition (VPR) is usually considered a specific image
retrieval problem. Limited by existing training frameworks, most deep
learning-based works cannot extract sufficiently stable global features from
RGB images and rely on a time-consuming re-ranking step to exploit spatial
structural information for better performance. In this paper, we propose
StructVPR, a novel training architecture for VPR, to enhance structural
knowledge in RGB global features and thus improve feature stability in a
constantly changing environment. Specifically, StructVPR uses segmentation
images as a more definitive source of structural knowledge input into a CNN
network and applies knowledge distillation to avoid online segmentation and
inference of the seg-branch at test time. Since not all samples contain
high-quality, helpful knowledge, and some even hurt distillation performance,
we partition the samples and weight each sample's distillation loss to
enhance the expected knowledge precisely. Finally, StructVPR achieves
impressive performance on several benchmarks using only global retrieval and
even outperforms many two-stage approaches by a large margin. With an
additional re-ranking step, our method achieves state-of-the-art performance
while maintaining a low computational cost.
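The sample-weighted distillation described in the abstract can be sketched in a few lines. The function names, the KL-divergence form of the loss, and the temperature value are illustrative assumptions for this sketch, not details taken from the paper:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    exps = [math.exp(x / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """KL divergence between teacher and student soft distributions,
    the usual knowledge-distillation objective."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def weighted_distillation_loss(batch, temperature=4.0):
    """Sum of per-sample KD losses, each scaled by its weight.

    `batch` is a list of (student_logits, teacher_logits, weight)
    triples; a weight of 0 drops a sample judged to carry unhelpful
    structural knowledge, mirroring the partition-and-weight idea.
    """
    return sum(w * kd_loss(s, t, temperature) for s, t, w in batch)
```

In this sketch the per-sample weight is supplied externally; the paper's contribution is precisely how those weights are derived from a partition of the training samples.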
Related papers
- BEV$^2$PR: BEV-Enhanced Visual Place Recognition with Structural Cues [44.96177875644304]
We propose a new image-based visual place recognition (VPR) framework by exploiting the structural cues in bird's-eye view (BEV) from a single camera.
The BEV$^2$PR framework generates a composite descriptor with both visual cues and spatial awareness based on a single camera.
arXiv Detail & Related papers (2024-03-11T10:46:43Z)
- Human as Points: Explicit Point-based 3D Human Reconstruction from Single-view RGB Images [78.56114271538061]
We introduce an explicit point-based human reconstruction framework called HaP.
Our approach is featured by fully-explicit point cloud estimation, manipulation, generation, and refinement in the 3D geometric space.
Our results may indicate a paradigm rollback to the fully-explicit and geometry-centric algorithm design.
arXiv Detail & Related papers (2023-11-06T05:52:29Z)
- AnyLoc: Towards Universal Visual Place Recognition [12.892386791383025]
Visual Place Recognition (VPR) is vital for robot localization.
Most performant VPR approaches are environment- and task-specific.
We develop a universal solution to VPR -- a technique that works across a broad range of structured and unstructured environments.
arXiv Detail & Related papers (2023-08-01T17:45:13Z)
- Structural and Statistical Texture Knowledge Distillation for Semantic Segmentation [72.67912031720358]
We propose a novel Structural and Statistical Texture Knowledge Distillation (SSTKD) framework for semantic segmentation.
For structural texture knowledge, we introduce a Contourlet Decomposition Module (CDM) that decomposes low-level features.
For statistical texture knowledge, we propose a Denoised Texture Intensity Equalization Module (DTIEM) to adaptively extract and enhance statistical texture knowledge.
arXiv Detail & Related papers (2023-05-06T06:01:11Z)
- Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-modal Structured Representations [70.41385310930846]
We present an end-to-end framework Structure-CLIP to enhance multi-modal structured representations.
We use scene graphs to guide the construction of semantic negative examples, which results in an increased emphasis on learning structured representations.
A Knowledge-Enhanced Encoder (KEE) is proposed to leverage scene graph knowledge (SGK) as input to further enhance structured representations.
arXiv Detail & Related papers (2023-05-06T03:57:05Z)
- High-Fidelity Visual Structural Inspections through Transformers and Learnable Resizers [2.126862120884775]
Recent advances in unmanned aerial vehicles (UAVs) and artificial intelligence have made visual inspections faster, safer, and more reliable.
High-resolution segmentation is extremely challenging due to the high computational memory demands.
We propose a hybrid strategy that can adapt to different inspection tasks by managing the global and local semantics trade-off.
arXiv Detail & Related papers (2022-10-21T18:08:26Z)
- An Empirical Investigation of Representation Learning for Imitation [76.48784376425911]
Recent work in vision, reinforcement learning, and NLP has shown that auxiliary representation learning objectives can reduce the need for large amounts of expensive, task-specific data.
We propose a modular framework for constructing representation learning algorithms, then use our framework to evaluate the utility of representation learning for imitation.
arXiv Detail & Related papers (2022-05-16T11:23:42Z)
- Over-and-Under Complete Convolutional RNN for MRI Reconstruction [57.95363471940937]
Recent deep learning-based methods for MR image reconstruction usually leverage a generic auto-encoder architecture.
We propose an Over-and-Under Complete Convolutional Recurrent Neural Network (OUCR), which consists of an overcomplete and an undercomplete Convolutional Recurrent Neural Network (CRNN).
The proposed method achieves significant improvements over compressed sensing and popular deep learning-based methods with fewer trainable parameters.
arXiv Detail & Related papers (2021-06-16T15:56:34Z)
- Comparative Code Structure Analysis using Deep Learning for Performance Prediction [18.226950022938954]
This paper aims to assess the feasibility of using purely static information (e.g., abstract syntax tree or AST) of applications to predict performance change based on the change in code structure.
Our evaluations of several deep embedding learning methods demonstrate that tree-based Long Short-Term Memory (LSTM) models can leverage the hierarchical structure of source code to discover latent representations and achieve up to 84% (individual problem) and 73% (combined dataset with multiple problems) accuracy in predicting the change in performance.
arXiv Detail & Related papers (2021-02-12T16:59:12Z)
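The static-structure idea in the last entry can be illustrated with Python's built-in `ast` module. A histogram of AST node types is a deliberately simple stand-in for the tree-based LSTM embeddings the paper actually evaluates; the function names here are illustrative:

```python
import ast
from collections import Counter

def ast_node_histogram(source: str) -> Counter:
    """Count AST node types as a purely static code-structure feature."""
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))

def structure_delta(old_src: str, new_src: str) -> dict:
    """Per-node-type change between two versions of the same code,
    the kind of structural signal a performance predictor could consume."""
    old, new = ast_node_histogram(old_src), ast_node_histogram(new_src)
    keys = set(old) | set(new)
    return {k: new[k] - old[k] for k in keys if new[k] != old[k]}
```

A change that introduces a loop, for example, shows up as a positive `For` count in the delta, which a downstream model could associate with a likely performance change.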
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.