Automatic Uncertainty-Aware Synthetic Data Bootstrapping for Historical Map Segmentation
- URL: http://arxiv.org/abs/2511.15875v1
- Date: Wed, 19 Nov 2025 21:02:40 GMT
- Title: Automatic Uncertainty-Aware Synthetic Data Bootstrapping for Historical Map Segmentation
- Authors: Lukas Arzoumanidis, Julius Knechtel, Jan-Henrik Haunert, Youness Dehbi,
- Abstract summary: Most deep learning-based methods rely on large amounts of annotated training data, which are typically unavailable for historical maps.<n>We bootstrap an effectively unlimited number of synthetic historical maps suitable for tasks such as land-cover interpretation.<n>To evaluate the effectiveness and applicability of our approach, the generated training datasets were employed for domain-adaptive semantic segmentation.
- Score: 0.48998185508205744
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The automated analysis of historical documents, particularly maps, has drastically benefited from advances in deep learning and its success across various computer vision applications. However, most deep learning-based methods heavily rely on large amounts of annotated training data, which are typically unavailable for historical maps, especially for those belonging to specific, homogeneous cartographic domains, also known as corpora. Creating high-quality training data suitable for machine learning often takes a significant amount of time and involves extensive manual effort. While synthetic training data can alleviate the scarcity of real-world samples, it often lacks the affinity (realism) and diversity (variation) necessary for effective learning. By transferring the cartographic style of an original historical map corpus onto vector data, we bootstrap an effectively unlimited number of synthetic historical maps suitable for tasks such as land-cover interpretation of a homogeneous historical map corpus. We propose an automatic deep generative approach and a alternative manual stochastic degradation technique to emulate the visual uncertainty and noise, also known as data-dependent uncertainty, commonly observed in historical map scans. To quantitatively evaluate the effectiveness and applicability of our approach, the generated training datasets were employed for domain-adaptive semantic segmentation on a homogeneous map corpus using a Self-Constructing Graph Convolutional Network, enabling a comprehensive assessment of the impact of our data bootstrapping methods.
Related papers
- Selective Embedding for Deep Learning [0.4499833362998489]
Deep learning algorithms are sensitive to input data, and performance often deteriorates under nonstationary conditions.<n>This study introduces selective embedding, a novel data loading strategy, which alternates short segments of data from multiple sources within a single input channel.
arXiv Detail & Related papers (2025-07-16T15:45:01Z) - Semantic Segmentation for Sequential Historical Maps by Learning from Only One Map [0.4915744683251151]
We propose an automated approach to digitization using deep-learning-based semantic segmentation.<n>A key challenge in this process is the lack of ground-truth annotations required for training deep neural networks.<n>We introduce a weakly-supervised age-tracing strategy for model fine-tuning.
arXiv Detail & Related papers (2025-01-03T14:55:22Z) - Sexism Detection on a Data Diet [14.899608305188002]
We show how we can leverage influence scores to estimate the importance of a data point while training a model.
We evaluate the model performance trained on data pruned with different pruning strategies on three out-of-domain datasets.
arXiv Detail & Related papers (2024-06-07T12:39:54Z) - Stochastic Amortization: A Unified Approach to Accelerate Feature and Data Attribution [62.71425232332837]
We show that training amortized models with noisy labels is inexpensive and surprisingly effective.
This approach significantly accelerates several feature attribution and data valuation methods, often yielding an order of magnitude speedup over existing approaches.
arXiv Detail & Related papers (2024-01-29T03:42:37Z) - Terrain-Informed Self-Supervised Learning: Enhancing Building Footprint Extraction from LiDAR Data with Limited Annotations [1.3243401820948064]
Building footprint maps offer promise of precise footprint extraction without extensive post-processing.
Deep learning methods face challenges in generalization and label efficiency.
We propose terrain-aware self-supervised learning tailored to remote sensing.
arXiv Detail & Related papers (2023-11-02T12:34:23Z) - Neural Semantic Surface Maps [52.61017226479506]
We present an automated technique for computing a map between two genus-zero shapes, which matches semantically corresponding regions to one another.
Our approach can generate semantic surface-to-surface maps, eliminating manual annotations or any 3D training data requirement.
arXiv Detail & Related papers (2023-09-09T16:21:56Z) - Semantic Segmentation of Vegetation in Remote Sensing Imagery Using Deep
Learning [77.34726150561087]
We propose an approach for creating a multi-modal and large-temporal dataset comprised of publicly available Remote Sensing data.
We use Convolutional Neural Networks (CNN) models that are capable of separating different classes of vegetation.
arXiv Detail & Related papers (2022-09-28T18:51:59Z) - CHALLENGER: Training with Attribution Maps [63.736435657236505]
We show that utilizing attribution maps for training neural networks can improve regularization of models and thus increase performance.
In particular, we show that our generic domain-independent approach yields state-of-the-art results in vision, natural language processing and on time series tasks.
arXiv Detail & Related papers (2022-05-30T13:34:46Z) - Progressive Spatio-Temporal Bilinear Network with Monte Carlo Dropout
for Landmark-based Facial Expression Recognition with Uncertainty Estimation [93.73198973454944]
The performance of our method is evaluated on three widely used datasets.
It is comparable to that of video-based state-of-the-art methods while it has much less complexity.
arXiv Detail & Related papers (2021-06-08T13:40:30Z) - Dataset Cartography: Mapping and Diagnosing Datasets with Training
Dynamics [118.75207687144817]
We introduce Data Maps, a model-based tool to characterize and diagnose datasets.
We leverage a largely ignored source of information: the behavior of the model on individual instances during training.
Our results indicate that a shift in focus from quantity to quality of data could lead to robust models and improved out-of-distribution generalization.
arXiv Detail & Related papers (2020-09-22T20:19:41Z) - Dataset Condensation with Gradient Matching [36.14340188365505]
We propose a training set synthesis technique for data-efficient learning, called dataset Condensation, that learns to condense large dataset into a small set of informative synthetic samples for training deep neural networks from scratch.
We rigorously evaluate its performance in several computer vision benchmarks and demonstrate that it significantly outperforms the state-of-the-art methods.
arXiv Detail & Related papers (2020-06-10T16:30:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.