SSR: A Generic Framework for Text-Aided Map Compression for Localization
- URL: http://arxiv.org/abs/2603.04272v1
- Date: Wed, 04 Mar 2026 16:55:48 GMT
- Title: SSR: A Generic Framework for Text-Aided Map Compression for Localization
- Authors: Mohammad Omama, Po-han Li, Harsh Goel, Minkyu Choi, Behdad Chalaki, Vaishnav Tadiparthi, Hossein Nourkhiz Mahjoub, Ehsan Moradi Pari, Sandeep P. Chinchali
- Abstract summary: We propose a text-enhanced compression framework that reduces both memory and bandwidth footprints while retaining high-fidelity localization. Similarity Space Replication learns an adaptive image embedding in one shot that captures only the information "complementary" to the text descriptions. We validate our compression framework on multiple downstream localization tasks, including Visual Place Recognition and object-centric Monte Carlo localization.
- Score: 13.691397425850097
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Mapping is crucial in robotics for localization and downstream decision-making. As robots are deployed in ever-broader settings, the maps they rely on continue to increase in size. However, storing these maps indefinitely (cold storage), transferring them across networks, or sending localization queries to cloud-hosted maps imposes prohibitive memory and bandwidth costs. We propose a text-enhanced compression framework that reduces both memory and bandwidth footprints while retaining high-fidelity localization. The key idea is to treat text as an alternative modality: one that can be losslessly compressed with large language models. We propose leveraging lightweight text descriptions combined with very small image feature vectors, which capture "complementary information" as a compact representation for the mapping task. Building on this, our novel technique, Similarity Space Replication (SSR), learns an adaptive image embedding in one shot that captures only the information "complementary" to the text descriptions. We validate our compression framework on multiple downstream localization tasks, including Visual Place Recognition as well as object-centric Monte Carlo localization in both indoor and outdoor settings. SSR achieves 2 times better compression than competing baselines on state-of-the-art datasets, including TokyoVal, Pittsburgh30k, Replica, and KITTI.
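The idea of an image embedding that stores only what the text descriptions miss can be illustrated with a small sketch. This is not the paper's learning procedure: it is a closed-form stand-in that assumes cosine similarities, random stand-in features, and a hypothetical helper `complementary_embedding`. It picks a k-dimensional vector per map image so that text similarity plus the dot product of these small vectors approximates the similarity matrix of the full image features.

```python
import numpy as np

def normalize(x):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def complementary_embedding(img_feats, txt_feats, k):
    """Hypothetical helper (an assumption, not the paper's method).

    Returns Z of shape (n, k) such that S_txt + Z @ Z.T approximates S_img,
    using the top-k non-negative eigen-components of the residual S_img - S_txt.
    """
    s_img = img_feats @ img_feats.T        # similarity space of the full image features
    s_txt = txt_feats @ txt_feats.T        # similarity already explained by text
    residual = s_img - s_txt               # what the text alone does not capture
    residual = (residual + residual.T) / 2 # guard against numerical asymmetry
    vals, vecs = np.linalg.eigh(residual)  # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:k]       # indices of the k largest eigenvalues
    lam = np.clip(vals[top], 0.0, None)    # Z @ Z.T can only represent the PSD part
    return vecs[:, top] * np.sqrt(lam)

# Synthetic stand-ins for real features (assumed shapes and noise levels).
rng = np.random.default_rng(0)
n, k = 50, 8
img = normalize(rng.normal(size=(n, 64)))
txt = normalize(img[:, :16] + 0.1 * rng.normal(size=(n, 16)))

z = complementary_embedding(img, txt, k)
s_img = img @ img.T
err_text_only = np.linalg.norm(s_img - txt @ txt.T)
err_text_plus_z = np.linalg.norm(s_img - (txt @ txt.T + z @ z.T))
```

In this toy setting, each map entry would store its text description plus k floats instead of the 64-dimensional image feature; comparing `err_text_only` with `err_text_plus_z` shows how much of the original similarity structure the small vectors recover.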
Related papers
- ImLoc: Revisiting Visual Localization with Image-based Representation [61.282162006394934]
We propose to augment each image with estimated depth maps to capture the geometric structure. This representation is easy to build and maintain, yet achieves the highest accuracy in challenging conditions. Our method achieves a new state-of-the-art accuracy on various standard benchmarks and outperforms existing memory-efficient methods at comparable map sizes.
arXiv Detail & Related papers (2026-01-07T18:51:51Z) - CoPatch: Zero-Shot Referring Image Segmentation by Leveraging Untapped Spatial Knowledge in CLIP [26.827036116024914]
CoPatch is a zero-shot RIS framework that enhances spatial representations in both text and image modalities. We show that CoPatch significantly improves spatial grounding in zero-shot RIS across RefCOCO, RefCOCO+, RefCOCOg, and PhraseCut (+2 to 7 mIoU) without requiring any additional training.
arXiv Detail & Related papers (2025-09-27T04:12:10Z) - A-SCoRe: Attention-based Scene Coordinate Regression for wide-ranging scenarios [1.2093553114715083]
A-SCoRe is an attention-based model that leverages attention at the descriptor-map level to produce meaningful, highly semantic 2D descriptors. Results show that our method achieves performance comparable to state-of-the-art methods on multiple benchmarks while being lightweight and much more flexible.
arXiv Detail & Related papers (2025-03-18T07:39:50Z) - R-SCoRe: Revisiting Scene Coordinate Regression for Robust Large-Scale Visual Localization [66.87005863868181]
We introduce a covisibility graph-based global encoding learning and data augmentation strategy. We revisit the network architecture and local feature extraction module. Our method achieves state-of-the-art on challenging large-scale datasets without relying on network ensembles or 3D supervision.
arXiv Detail & Related papers (2025-01-02T18:59:08Z) - Map-Assisted Remote-Sensing Image Compression at Extremely Low Bitrates [47.47031054057152]
Generative models have been explored to compress RS images into extremely low-bitrate streams.
These generative models struggle to reconstruct visually plausible images due to the highly ill-posed nature of extremely low-bitrate image compression.
We propose an image compression framework that utilizes a pre-trained diffusion model with powerful natural image priors to achieve high-realism reconstructions.
arXiv Detail & Related papers (2024-09-03T14:29:54Z) - Language-Oriented Semantic Latent Representation for Image Transmission [38.62941652189033]
The new paradigm of semantic communication (SC) focuses on delivering the meanings behind bits.
Recent advances in data-to-text models facilitate language-oriented SC.
We propose a novel SC framework that communicates both text and a compressed image embedding.
arXiv Detail & Related papers (2024-05-16T10:41:31Z) - NeuMap: Neural Coordinate Mapping by Auto-Transdecoder for Camera Localization [60.73541222862195]
NeuMap is an end-to-end neural mapping method for camera localization.
It encodes a whole scene into a grid of latent codes, with which a Transformer-based auto-decoder regresses 3D coordinates of query pixels.
arXiv Detail & Related papers (2022-11-21T04:46:22Z) - Learning to Localize Through Compressed Binary Maps [83.03367511221437]
We learn to compress the map representation such that it is optimal for the localization task.
Our experiments show that it is possible to learn a task-specific compression which reduces storage requirements by two orders of magnitude over general-purpose codecs.
arXiv Detail & Related papers (2020-12-20T14:47:15Z) - Cross-Descriptor Visual Localization and Mapping [81.16435356103133]
Visual localization and mapping is the key technology underlying the majority of Mixed Reality and robotics systems.
We present three novel scenarios for localization and mapping which require the continuous update of feature representations.
Our data-driven approach is agnostic to the feature descriptor type, has low computational requirements, and scales linearly with the number of description algorithms.
arXiv Detail & Related papers (2020-12-02T18:19:51Z)