SMOL-MapSeg: Show Me One Label
- URL: http://arxiv.org/abs/2508.05501v1
- Date: Thu, 07 Aug 2025 15:36:17 GMT
- Title: SMOL-MapSeg: Show Me One Label
- Authors: Yunshuang Yuan, Frank Thiemann, Thorsten Dahms, Monika Sester
- Abstract summary: We show that SMOL-MapSeg can accurately segment classes defined by OND knowledge. It can also adapt to unseen classes through few-shot fine-tuning. It outperforms a UNet-based baseline in average segmentation performance.
- Score: 0.4499833362998489
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Historical maps are valuable for studying changes to the Earth's surface. With the rise of deep learning, models like UNet have been used to extract information from these maps through semantic segmentation. Recently, pre-trained foundation models have shown strong performance across domains such as autonomous driving, medical imaging, and industrial inspection. However, they struggle with historical maps. These models are trained on modern or domain-specific images, where patterns can be tied to predefined concepts through common sense or expert knowledge. Historical maps lack such consistency -- similar concepts can appear in vastly different shapes and styles. To address this, we propose On-Need Declarative (OND) knowledge-based prompting, which introduces explicit prompts to guide the model on what patterns correspond to which concepts. This allows users to specify the target concept and pattern during inference (on-need inference). We implement this by replacing the prompt encoder of the foundation model SAM with our OND prompting mechanism and fine-tune it on historical maps. The resulting model is called SMOL-MapSeg (Show Me One Label). Experiments show that SMOL-MapSeg can accurately segment classes defined by OND knowledge. It can also adapt to unseen classes through few-shot fine-tuning. Additionally, it outperforms a UNet-based baseline in average segmentation performance.
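The idea of telling a model at inference time which pattern stands for which concept can be illustrated with a toy example. The sketch below is not the SMOL-MapSeg architecture (the abstract only states that SAM's prompt encoder is replaced and fine-tuned); it swaps in a much simpler technique, nearest-prototype matching, to show how one labeled example can define the target class. All names (masked_mean, cosine, segment_with_example), the per-pixel feature vectors, and the threshold value are hypothetical.

```python
from math import sqrt


def masked_mean(features, mask):
    """Average the feature vectors at the positions where mask is 1."""
    dim = len(features[0][0])
    total = [0.0] * dim
    count = 0
    for feature_row, mask_row in zip(features, mask):
        for vec, selected in zip(feature_row, mask_row):
            if selected:
                count += 1
                for i, value in enumerate(vec):
                    total[i] += value
    if count == 0:
        raise ValueError("example mask selects no pixels")
    return [t / count for t in total]


def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def segment_with_example(query_features, example_features, example_mask,
                         threshold=0.9):
    """Label each query pixel 1 if it resembles the example's masked pattern.

    Toy stand-in for prompting with one labeled example: the example mask
    decides which pattern counts as the target concept (threshold is an
    assumed value, not taken from the paper).
    """
    prototype = masked_mean(example_features, example_mask)
    return [
        [1 if cosine(vec, prototype) >= threshold else 0 for vec in row]
        for row in query_features
    ]


# Two hypothetical map patterns, encoded as 2-D feature vectors per pixel.
example_features = [[[1.0, 0.0], [0.0, 1.0]],
                    [[1.0, 0.0], [0.0, 1.0]]]
query_features = [[[0.9, 0.1], [0.1, 0.9]],
                  [[0.0, 1.0], [1.0, 0.0]]]

# The user marks the first pattern as the target concept ...
print(segment_with_example(query_features, example_features,
                           [[1, 0], [1, 0]]))  # [[1, 0], [0, 1]]
# ... or, on the same example image, the second pattern instead.
print(segment_with_example(query_features, example_features,
                           [[0, 1], [0, 1]]))  # [[0, 1], [1, 0]]
```

Changing only the example mask changes which pixels of the query are selected, which is the sense in which the target concept is supplied on need rather than fixed during training.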
Related papers
- DiffuMatch: Category-Agnostic Spectral Diffusion Priors for Robust Non-rigid Shape Matching [53.39693288324375]
We show that both in-network regularization and functional map training can be replaced with data-driven methods. We first train a generative model of functional maps in the spectral domain using score-based generative modeling. We then exploit the resulting model to promote the structural properties of ground truth functional maps on new shape collections.
arXiv Detail & Related papers (2025-07-31T16:44:54Z)
- Now you see me! A framework for obtaining class-relevant saliency maps [38.663697418404546]
Saliency maps have been developed to gain insight into which input features neural networks use for a specific prediction. Although widely employed, these methods often result in overly general saliency maps that fail to identify the specific information that triggered the classification. We suggest a framework that incorporates attributions across classes to arrive at saliency maps that actually capture the class-relevant information.
arXiv Detail & Related papers (2025-03-10T13:59:57Z)
- Semantic Segmentation for Sequential Historical Maps by Learning from Only One Map [0.4915744683251151]
We propose an automated approach to digitization using deep-learning-based semantic segmentation. A key challenge in this process is the lack of ground-truth annotations required for training deep neural networks. We introduce a weakly-supervised age-tracing strategy for model fine-tuning.
arXiv Detail & Related papers (2025-01-03T14:55:22Z)
- A Top-down Graph-based Tool for Modeling Classical Semantic Maps: A Crosslinguistic Case Study of Supplementary Adverbs [50.982315553104975]
Semantic map models (SMMs) construct a network-like conceptual space from cross-linguistic instances or forms. Most SMMs are manually built by human experts using bottom-up procedures. We propose a novel graph-based algorithm that automatically generates conceptual spaces and SMMs in a top-down manner.
arXiv Detail & Related papers (2024-12-02T12:06:41Z)
- TopoSD: Topology-Enhanced Lane Segment Perception with SDMap Prior [70.84644266024571]
We propose to train a perception model to "see" standard definition maps (SDMaps).
We encode SDMap elements into neural spatial map representations and instance tokens, and then incorporate such complementary features as prior information.
Based on the lane segment representation framework, the model simultaneously predicts lanes, centrelines and their topology.
arXiv Detail & Related papers (2024-11-22T06:13:42Z)
- DiffMap: Enhancing Map Segmentation with Map Prior Using Diffusion Model [15.803614800117781]
We propose DiffMap, a novel approach specifically designed to model the structured priors of map segmentation masks.
By incorporating this technique, the performance of existing semantic segmentation methods can be significantly enhanced.
Our model demonstrates superior proficiency in generating results that more accurately reflect real-world map layouts.
arXiv Detail & Related papers (2024-05-03T11:16:27Z)
- Context-Aware Meta-Learning [52.09326317432577]
We propose a meta-learning algorithm that emulates Large Language Models by learning new visual concepts during inference without fine-tuning.
Our approach exceeds or matches the state-of-the-art algorithm, P>M>F, on 8 out of 11 meta-learning benchmarks.
arXiv Detail & Related papers (2023-10-17T03:35:27Z)
- Neural Map Prior for Autonomous Driving [17.198729798817094]
High-definition (HD) semantic maps are crucial in enabling autonomous vehicles to navigate urban environments.
The traditional method of creating offline HD maps involves labor-intensive manual annotation processes.
Recent studies have proposed an alternative approach that generates local maps using online sensor observations.
In this study, we propose Neural Map Prior (NMP), a neural representation of global maps.
arXiv Detail & Related papers (2023-04-17T17:58:40Z)
- Open-domain Visual Entity Recognition: Towards Recognizing Millions of Wikipedia Entities [54.26896306906937]
We present OVEN-Wiki, where a model needs to link an image to a Wikipedia entity with respect to a text query.
We show that a PaLI-based auto-regressive visual recognition model performs surprisingly well, even on Wikipedia entities that have never been seen during fine-tuning.
While PaLI-based models obtain higher overall performance, CLIP-based models are better at recognizing tail entities.
arXiv Detail & Related papers (2023-02-22T05:31:26Z)
- Seeing the Un-Scene: Learning Amodal Semantic Maps for Room Navigation [143.6144560164782]
We introduce a learning-based approach for room navigation using semantic maps.
We train a model to generate amodal semantic top-down maps indicating beliefs of location, size, and shape of rooms.
Next, we use these maps to predict a point that lies in the target room and train a policy to navigate to the point.
arXiv Detail & Related papers (2020-07-20T02:19:26Z)