Needles in the Landscape: Semi-Supervised Pseudolabeling for Archaeological Site Discovery under Label Scarcity
- URL: http://arxiv.org/abs/2510.16814v1
- Date: Sun, 19 Oct 2025 12:54:38 GMT
- Title: Needles in the Landscape: Semi-Supervised Pseudolabeling for Archaeological Site Discovery under Label Scarcity
- Authors: Simon Jaxy, Anton Theys, Patrick Willett, W. Chris Carleton, Ralf Vandam, Pieter Libin,
- Abstract summary: Archaeological predictive modelling estimates where undiscovered sites are likely to occur by combining known locations with environmental, cultural, and geospatial variables.<n>We address this challenge using a deep learning approach but must contend with structural label scarcity inherent to archaeology: positives are rare, and most locations are unlabeled.<n>Our results indicate that semi-supervised learning offers a promising approach to identifying undiscovered sites across large, sparsely annotated landscapes.
- Score: 0.3957768262206626
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Archaeological predictive modelling estimates where undiscovered sites are likely to occur by combining known locations with environmental, cultural, and geospatial variables. We address this challenge using a deep learning approach but must contend with structural label scarcity inherent to archaeology: positives are rare, and most locations are unlabeled. To address this, we adopt a semi-supervised, positive-unlabeled (PU) learning strategy, implemented as a semantic segmentation model and evaluated on two datasets covering a representative range of archaeological periods. Our approach employs dynamic pseudolabeling, refined with a Conditional Random Field (CRF) implemented via an RNN, increasing label confidence under severe class imbalance. On a geospatial dataset derived from a digital elevation model (DEM), our model performs on par with the state-of-the-art, LAMAP, while achieving higher Dice scores. On raw satellite imagery, assessed end-to-end with stratified k-fold cross-validation, it maintains performance and yields predictive surfaces with improved interpretability. Overall, our results indicate that semi-supervised learning offers a promising approach to identifying undiscovered sites across large, sparsely annotated landscapes.
Related papers
- Learning to Explore: Policy-Guided Outlier Synthesis for Graph Out-of-Distribution Detection [51.93878677594561]
In unsupervised graph-level OOD detection, models are typically trained using only in-distribution (ID) data.<n>We propose a Policy-Guided Outlier Synthesis framework that replaces statics with a learned exploration strategy.
arXiv Detail & Related papers (2026-02-28T11:40:18Z) - NordFKB: a fine-grained benchmark dataset for geospatial AI in Norway [36.2419347658476]
We present NordFKB, a fine-grained benchmark dataset for geospatial AI in Norway.<n>The dataset contains high-resolution orthophotos paired with detailed annotations for 36 semantic classes.<n> NordFKB provides a robust foundation for advancing AI methods in mapping, land administration, and spatial planning.
arXiv Detail & Related papers (2025-12-10T18:47:25Z) - Topology-Aware Modeling for Unsupervised Simulation-to-Reality Point Cloud Recognition [63.55828203989405]
We introduce a novel Topology-Aware Modeling (TAM) framework for Sim2Real UDA on object point clouds.<n>Our approach mitigates the domain gap by leveraging global spatial topology, characterized by low-level, high-frequency 3D structures.<n>We propose an advanced self-training strategy that combines cross-domain contrastive learning with self-training.
arXiv Detail & Related papers (2025-06-26T11:53:59Z) - Performance and Generalizability Impacts of Incorporating Geolocation into Deep Learning for Dynamic PM2.5 Estimation [1.278093617645299]
We build on a recently published deep learning-based PM2.5 estimation model that achieves state-of-the-art performance on data observed in the contiguous United States.<n>We examine three approaches for incorporating geolocation: excluding geolocation as a baseline, using raw geographic coordinates, and leveraging pretrained location encoders.
arXiv Detail & Related papers (2025-05-24T02:00:34Z) - A Sensor Agnostic Domain Generalization Framework for Leveraging Geospatial Foundation Models: Enhancing Semantic Segmentation viaSynergistic Pseudo-Labeling and Generative Learning [5.299218284699214]
High-performance segmentation models are challenged by annotation scarcity and variability across sensors, illumination, and geography.<n>This paper introduces a domain generalization approach to leveraging emerging geospatial foundation models by combining soft-alignment pseudo-labeling with source-to-target generative pre-training.<n> Experiments with hyperspectral and multispectral remote sensing datasets confirm our method's effectiveness in enhancing adaptability and segmentation.
arXiv Detail & Related papers (2025-05-02T19:52:02Z) - A Deep Learning Architecture for Land Cover Mapping Using Spatio-Temporal Sentinel-1 Features [1.907072234794597]
The study focuses on three distinct regions - Amazonia, Africa, and Siberia - and evaluates the model performance across diverse ecoregions within these areas.<n>The results demonstrate the effectiveness and the capabilities of the proposed methodology in achieving overall accuracy (O.A.) values, even in regions with limited training data.
arXiv Detail & Related papers (2025-03-10T12:15:35Z) - Downstream-Pretext Domain Knowledge Traceback for Active Learning [138.02530777915362]
We propose a downstream-pretext domain knowledge traceback (DOKT) method that traces the data interactions of downstream knowledge and pre-training guidance.
DOKT consists of a traceback diversity indicator and a domain-based uncertainty estimator.
Experiments conducted on ten datasets show that our model outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2024-07-20T01:34:13Z) - Edge Classification on Graphs: New Directions in Topological Imbalance [53.42066415249078]
We identify a novel Topological Imbalance Issue', which arises from the skewed distribution of edges across different classes.
We introduce Topological Entropy (TE), a novel topological-based metric that measures the topological imbalance for each edge.
We develop two strategies - Topological Reweighting and TE Wedge-based Mixup - to focus training on (synthetic) edges based on their TEs.
arXiv Detail & Related papers (2024-06-17T16:02:36Z) - PixelDINO: Semi-Supervised Semantic Segmentation for Detecting
Permafrost Disturbances [15.78884578132055]
We focus on the remote detection of retrogressive thaw slumps (RTS), a permafrost disturbance comparable to landslides induced by thawing.
We present a semi-supervised learning approach to train semantic segmentation models to detect RTS.
Our framework called PixelDINO is trained in parallel on labelled data as well as unlabelled data.
arXiv Detail & Related papers (2024-01-17T15:20:10Z) - Energy-based Out-of-Distribution Detection for Graph Neural Networks [76.0242218180483]
We propose a simple, powerful and efficient OOD detection model for GNN-based learning on graphs, which we call GNNSafe.
GNNSafe achieves up to $17.0%$ AUROC improvement over state-of-the-arts and it could serve as simple yet strong baselines in such an under-developed area.
arXiv Detail & Related papers (2023-02-06T16:38:43Z) - TraND: Transferable Neighborhood Discovery for Unsupervised Cross-domain
Gait Recognition [77.77786072373942]
This paper proposes a Transferable Neighborhood Discovery (TraND) framework to bridge the domain gap for unsupervised cross-domain gait recognition.
We design an end-to-end trainable approach to automatically discover the confident neighborhoods of unlabeled samples in the latent space.
Our method achieves state-of-the-art results on two public datasets, i.e., CASIA-B and OU-LP.
arXiv Detail & Related papers (2021-02-09T03:07:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.