Related papers: Structured Pruning for Efficient Visual Place Recognition

Structured Pruning for Efficient Visual Place Recognition

URL: http://arxiv.org/abs/2409.07834v1
Date: Thu, 12 Sep 2024 08:32:25 GMT
Title: Structured Pruning for Efficient Visual Place Recognition
Authors: Oliver Grainge, Michael Milford, Indu Bodala, Sarvapali D. Ramchurn, Shoaib Ehsan
Abstract summary: Visual Place Recognition (VPR) is fundamental for the global re-localization of robots and devices. Our work introduces a novel structured pruning method that both streamlines common VPR architectures and removes redundancy within the feature embedding space. This dual focus significantly enhances the efficiency of the system, reducing both map and model memory requirements and decreasing feature extraction and retrieval latencies.
Score: 24.433604332415204
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Visual Place Recognition (VPR) is fundamental for the global re-localization of robots and devices, enabling them to recognize previously visited locations based on visual inputs. This capability is crucial for maintaining accurate mapping and localization over large areas. Because VPR methods need to operate in real time on embedded systems, it is critical to optimize them for minimal resource consumption. While the most efficient VPR approaches employ standard convolutional backbones with fixed descriptor dimensions, these often lead to redundancy in the embedding space as well as in the network architecture. Our work introduces a novel structured pruning method that not only streamlines common VPR architectures but also strategically removes redundancies within the feature embedding space. This dual focus significantly enhances the efficiency of the system, reducing both map and model memory requirements and decreasing feature extraction and retrieval latencies. Across models, our approach reduces memory usage by 21% and latency by 16%, while lowering recall@1 accuracy by less than 1%. These gains support real-time applications on edge devices with negligible accuracy loss.
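The abstract does not state which pruning criterion the authors use. As a hedged illustration of what "structured" pruning means at the channel level, the sketch below ranks a convolution's output filters by L1 norm (a common criterion, assumed here and not necessarily the authors' choice), removes the weakest filters together with the matching input channels of the next layer, and shows how dropping descriptor dimensions shrinks a float32 reference map. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# Conv weights as nested lists indexed [out_channel][in_channel][row][col].
Tensor4D = List[List[List[List[float]]]]


def filter_l1_norms(weights: Tensor4D) -> List[float]:
    """Sum of absolute weights of each output filter."""
    return [
        sum(abs(w) for kernel in filt for row in kernel for w in row)
        for filt in weights
    ]


def channels_to_keep(weights: Tensor4D, keep_ratio: float) -> List[int]:
    """Indices of the largest-L1 filters (at least one), in original order."""
    norms = filter_l1_norms(weights)
    n_keep = max(1, int(len(norms) * keep_ratio))
    ranked = sorted(range(len(norms)), key=lambda i: norms[i], reverse=True)
    return sorted(ranked[:n_keep])


def prune_layer_pair(
    w1: Tensor4D, w2: Tensor4D, keep: Sequence[int]
) -> Tuple[Tensor4D, Tensor4D]:
    """Remove whole output filters of layer 1 and the matching input
    channels of layer 2, so the pruned pair stays shape-consistent."""
    w1_pruned = [w1[i] for i in keep]
    w2_pruned = [[filt[i] for i in keep] for filt in w2]
    return w1_pruned, w2_pruned


def prune_descriptor_dims(
    descriptors: List[List[float]], keep_dims: Sequence[int]
) -> List[List[float]]:
    """Keep only the selected embedding dimensions of every descriptor."""
    return [[d[j] for j in keep_dims] for d in descriptors]


def map_bytes(n_images: int, dim: int, bytes_per_value: int = 4) -> int:
    """Memory of a reference map storing n_images float32 descriptors."""
    return n_images * dim * bytes_per_value
```

For example, halving a hypothetical 4096-D descriptor to 2048 dimensions cuts a 10,000-image map from `map_bytes(10000, 4096)` = 163,840,000 bytes to 81,920,000 bytes, and brute-force retrieval cost, which scales with descriptor length, drops in proportion. Pruning whole filters (rather than individual weights) is what lets the smaller layers run as ordinary dense convolutions.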

Related papers

Generative MIMO Beam Map Construction for Location Recovery and Beam Tracking [67.65578956523403]
This paper proposes a generative framework to recover location labels directly from sparse channel state information (CSI) measurements. Instead of directly storing raw CSI, we learn a compact low-dimensional radio map embedding and leverage a generative model to reconstruct the high-dimensional CSI. Numerical experiments demonstrate that the proposed model can improve localization accuracy by over 30% and achieve a 20% capacity gain in non-line-of-sight (NLOS) scenarios.
arXiv Detail & Related papers (2025-11-21T07:25:49Z)
EmbodiedPlace: Learning Mixture-of-Features with Embodied Constraints for Visual Place Recognition [9.75969669445091]
Visual Place Recognition (VPR) is a scene-oriented image retrieval problem in computer vision. We propose a novel, simple re-ranking method that refines global features through a Mixture-of-Features (MoF) approach under embodied constraints.
arXiv Detail & Related papers (2025-06-16T06:40:12Z)
Any Image Restoration via Efficient Spatial-Frequency Degradation Adaptation [158.37640586809187]
Restoring any degraded image efficiently via just one model has become increasingly significant. Our approach, termed AnyIR, takes a unified path that leverages inherent similarity across various degradations. To fuse the degradation awareness and the contextualized attention, a spatial-frequency parallel fusion strategy is proposed.
arXiv Detail & Related papers (2025-04-19T09:54:46Z)
SelaVPR++: Towards Seamless Adaptation of Foundation Models for Efficient Place Recognition [69.58329995485158]
Recent studies show that visual place recognition (VPR) methods using pre-trained visual foundation models can achieve promising performance. We propose a novel method to realize seamless adaptation of foundation models to VPR. In pursuit of higher efficiency and better performance, we propose an extension of SelaVPR, called SelaVPR++.
arXiv Detail & Related papers (2025-02-23T15:01:09Z)
Any Image Restoration with Efficient Automatic Degradation Adaptation [132.81912195537433]
We propose a unified approach to joint embedding that leverages the inherent similarities across various degradations for efficient and comprehensive restoration. Our network sets new SOTA records while reducing model complexity by approximately 82% in trainable parameters and 85% in FLOPs.
arXiv Detail & Related papers (2024-07-18T10:26:53Z)
HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models [96.76995840807615]
HiRes-LLaVA is a novel framework designed to process high-resolution inputs of any size without altering the original contextual and geometric information. HiRes-LLaVA comprises two innovative components: (i) a SliceRestore adapter that reconstructs sliced patches into their original form, efficiently extracting both global and local features via down-up-sampling and convolution layers, and (ii) a Self-Mining Sampler that compresses the vision tokens based on the tokens themselves.
arXiv Detail & Related papers (2024-07-11T17:42:17Z)
EfficientMorph: Parameter-Efficient Transformer-Based Architecture for 3D Image Registration [1.741980945827445]
We propose EfficientMorph, a transformer-based architecture for unsupervised 3D image registration. It optimizes the balance between local and global attention through a plane-based attention mechanism, reduces computational redundancy via cascaded group attention, and captures fine details without compromising computational efficiency.
arXiv Detail & Related papers (2024-03-16T22:01:55Z)
Local positional graphs and attentive local features for a data and runtime-efficient hierarchical place recognition pipeline [11.099588962062937]
This paper proposes a runtime and data-efficient hierarchical VPR pipeline that extends existing approaches and presents novel ideas. First, we propose Local Positional Graphs (LPG), a training-free and runtime-efficient approach to encode spatial context information of local image features. Second, we present Attentive Local SPED (ATLAS), an extension of our previous local features approach with an attention module. Third, we present a hierarchical pipeline that exploits hyperdimensional computing to use the same local features as holistic HDC-descriptors for fast candidate selection and for candidate reranking.
arXiv Detail & Related papers (2024-03-15T13:26:39Z)
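The hyperdimensional-computing (HDC) step in the third contribution can be illustrated generically. The sketch below is not LPG or ATLAS: it assumes a random sign projection to turn each local feature into a bipolar hypervector, uses a hypothetical image-cell id as a crude position code, and combines features with the standard HDC operations of binding (element-wise product) and bundling (element-wise sum). Holistic vectors built this way can be compared with cosine similarity for fast candidate selection.

```python
import math
import random
from typing import List, Sequence


def random_bipolar(dim: int, rng: random.Random) -> List[int]:
    """Random hypervector with entries drawn uniformly from {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(dim)]


def encode_feature(feature: Sequence[float], proj: Sequence[Sequence[float]]) -> List[int]:
    """Random projection followed by sign: local feature -> bipolar hypervector."""
    return [1 if sum(p * f for p, f in zip(row, feature)) >= 0 else -1 for row in proj]


def bind(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Binding = element-wise product; for bipolar vectors it is self-inverse."""
    return [x * y for x, y in zip(a, b)]


def bundle(vectors: Sequence[Sequence[int]]) -> List[int]:
    """Bundling = element-wise sum; the result stays similar to each input."""
    return [sum(col) for col in zip(*vectors)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def holistic_descriptor(
    features: Sequence[Sequence[float]],
    cells: Sequence[int],
    proj: Sequence[Sequence[float]],
    cell_vectors: Sequence[Sequence[int]],
) -> List[int]:
    """Bind each local feature to the hypervector of the (hypothetical) image
    cell it came from, then bundle everything into one holistic descriptor."""
    return bundle(
        [bind(encode_feature(f, proj), cell_vectors[c]) for f, c in zip(features, cells)]
    )
```

Because binding is self-inverse, `bind(bind(a, p), p)` recovers `a`, which is what lets a bundled vector still be probed for the feature stored at a given position.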
Deep Homography Estimation for Visual Place Recognition [49.235432979736395]
We propose a transformer-based deep homography estimation (DHE) network. It takes the dense feature map extracted by a backbone network as input and fits homography for fast and learnable geometric verification. Experiments on benchmark datasets show that our method can outperform several state-of-the-art methods.
arXiv Detail & Related papers (2024-02-25T13:22:17Z)
Design Space Exploration of Low-Bit Quantized Neural Networks for Visual Place Recognition [26.213493552442102]
Visual Place Recognition (VPR) is a critical task for performing global re-localization in visual perception systems. Recent works have focused on recall@1 as the performance measure, with limited attention to resource utilization. This has resulted in methods that use deep learning models too large to deploy on low-powered edge devices. We study the impact of compact convolutional network architecture design in combination with full-precision and mixed-precision post-training quantization on VPR performance.
arXiv Detail & Related papers (2023-12-14T15:24:42Z)
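Two ideas recur in this entry and in the main abstract: post-training quantization and recall@1. The sketch below is a generic illustration, not the study's pipeline: it assumes symmetric per-tensor int8 quantization and L2 nearest-neighbour retrieval, and the ground-truth format (a set of accepted reference indices per query) is a hypothetical convention.

```python
from typing import List, Sequence, Set, Tuple


def quantize_int8(values: Sequence[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor post-training quantization to int8 codes."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes: Sequence[int], scale: float) -> List[float]:
    """Map int8 codes back to approximate float values."""
    return [c * scale for c in codes]


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def recall_at_1(
    queries: Sequence[Sequence[float]],
    refs: Sequence[Sequence[float]],
    ground_truth: Sequence[Set[int]],
) -> float:
    """Fraction of queries whose nearest reference is a correct match.

    ground_truth[i] holds the reference indices accepted for query i
    (for example, all references within a distance threshold of its pose).
    """
    hits = 0
    for i, q in enumerate(queries):
        best = min(range(len(refs)), key=lambda j: squared_l2(q, refs[j]))
        hits += best in ground_truth[i]
    return hits / len(queries)
```

To estimate the accuracy cost of quantization, one could compute `recall_at_1` once on the original descriptors and once on descriptors passed through `dequantize(*quantize_int8(d))`; each value's rounding error is bounded by half the scale.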
DETR Doesn't Need Multi-Scale or Locality Design [69.56292005230185]
This paper presents an improved DETR detector that maintains a "plain" nature. It uses a single-scale feature map and global cross-attention calculations without specific locality constraints. We show that two simple techniques are surprisingly effective within a plain design to compensate for the lack of multi-scale feature maps and locality constraints.
arXiv Detail & Related papers (2023-08-03T17:59:04Z)
Iterative Soft Shrinkage Learning for Efficient Image Super-Resolution [91.3781512926942]
Image super-resolution (SR) has witnessed extensive neural network designs, from CNN to transformer architectures. This work investigates the potential of network pruning for super-resolution, to take advantage of off-the-shelf network designs and reduce the underlying computational overhead. We propose a novel Iterative Soft Shrinkage-Percentage (ISS-P) method that optimizes the sparse structure of a randomly initialized network at each iteration and, on the fly, tweaks unimportant weights by a small amount proportional to their magnitude.
arXiv Detail & Related papers (2023-03-16T21:06:13Z)
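The soft-shrinkage idea can be sketched on a flat weight list. This is a simplified illustration under stated assumptions, not the ISS-P implementation: it assumes a fixed pruning ratio, a fixed shrink factor, and toy precomputed gradients, whereas the authors may schedule these differently. Instead of zeroing the smallest-magnitude weights at every iteration, each one is scaled by `(1 - shrink)`, so the amount removed is proportional to its own magnitude and it can still recover if gradients later make it important.

```python
from typing import List, Sequence, Set


def unimportant_indices(weights: Sequence[float], ratio: float) -> Set[int]:
    """Indices of the `ratio` fraction of weights with the smallest magnitude."""
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    return set(order[: int(len(weights) * ratio)])


def soft_shrink_step(weights: Sequence[float], ratio: float, shrink: float) -> List[float]:
    """Scale unimportant weights by (1 - shrink); nothing is set to zero."""
    low = unimportant_indices(weights, ratio)
    return [w * (1.0 - shrink) if i in low else w for i, w in enumerate(weights)]


def hard_prune(weights: Sequence[float], ratio: float) -> List[float]:
    """Final step: zero out the unimportant weights to obtain a sparse model."""
    low = unimportant_indices(weights, ratio)
    return [0.0 if i in low else w for i, w in enumerate(weights)]


def train_with_soft_shrinkage(
    weights: Sequence[float],
    grads_per_step: Sequence[Sequence[float]],
    lr: float,
    ratio: float,
    shrink: float,
) -> List[float]:
    """Toy loop: gradient step, then soft shrinkage, repeated; hard-prune at the end."""
    w = list(weights)
    for grads in grads_per_step:
        w = [wi - lr * g for wi, g in zip(w, grads)]
        w = soft_shrink_step(w, ratio, shrink)
    return hard_prune(w, ratio)
```

The design choice being illustrated is that the unimportant set is recomputed every iteration, so the sparse structure is still being optimized during training rather than fixed up front.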
Improving Worst Case Visual Localization Coverage via Place-specific Sub-selection in Multi-camera Systems [29.519262914510396]
6-DoF visual localization systems utilize principled approaches rooted in 3D geometry to perform accurate camera pose estimation of images to a map. We demonstrate substantially improved worst-case localization performance compared to using off-the-shelf pipelines. Our proposed approach is particularly applicable to the crowdsharing model of autonomous vehicle deployment.
arXiv Detail & Related papers (2022-06-28T10:59:39Z)
FasterPose: A Faster Simple Baseline for Human Pose Estimation [65.8413964785972]
We propose FasterPose, a design paradigm for cost-effective networks with low-resolution (LR) representations for efficient pose estimation. We study the training behavior of FasterPose and formulate a novel regressive cross-entropy (RCE) loss function to accelerate convergence. Compared with the previously dominant pose estimation network, our method reduces FLOPs by 58% while improving accuracy by 1.3%.
arXiv Detail & Related papers (2021-07-07T13:39:08Z)
Binary Neural Networks for Memory-Efficient and Effective Visual Place Recognition in Changing Environments [24.674034243725455]
Visual place recognition (VPR) is a robot's ability to determine, using visual data, whether a place was visited before. CNN-based approaches are unsuitable for resource-constrained platforms, such as small robots and drones. We propose a new class of highly compact models that drastically reduce memory requirements and computational effort.
arXiv Detail & Related papers (2020-10-01T22:59:34Z)
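The memory savings behind binary networks come from storing each value in one bit and replacing multiply-accumulate operations with bit operations. As a generic illustration (not the architecture proposed in the paper), the sketch below packs the signs of a vector into a Python integer and computes the dot product of two ±1 vectors as `n - 2 * hamming`, the XNOR/popcount identity that binary layers rely on. Function names are hypothetical.

```python
from typing import Sequence


def pack_signs(values: Sequence[float]) -> int:
    """Bit i is 1 when values[i] >= 0 (sign +1), else 0 (sign -1)."""
    bits = 0
    for i, v in enumerate(values):
        if v >= 0:
            bits |= 1 << i
    return bits


def binary_dot(a_bits: int, b_bits: int, n: int) -> int:
    """Dot product of two packed ±1 vectors of length n.

    Matching bits contribute +1 and differing bits -1, so
    dot = (n - hamming) - hamming = n - 2 * hamming.
    """
    mask = (1 << n) - 1
    hamming = bin((a_bits ^ b_bits) & mask).count("1")
    return n - 2 * hamming


def memory_bytes(n_values: int, bits_per_value: int) -> int:
    """Storage for n_values at the given bit width (assumes a multiple of 8 bits)."""
    return n_values * bits_per_value // 8
```

For a 4096-value vector, `memory_bytes(4096, 32)` gives 16,384 bytes in float32 versus `memory_bytes(4096, 1)` = 512 bytes when packed, a 32× reduction.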

This list is automatically generated from the titles and abstracts of the papers on this site.