Related papers: Needles in the Landscape: Semi-Supervised Pseudolabeling for Archaeological Site Discovery under Label Scarcity

Needles in the Landscape: Semi-Supervised Pseudolabeling for Archaeological Site Discovery under Label Scarcity

URL: http://arxiv.org/abs/2510.16814v1
Date: Sun, 19 Oct 2025 12:54:38 GMT
Title: Needles in the Landscape: Semi-Supervised Pseudolabeling for Archaeological Site Discovery under Label Scarcity
Authors: Simon Jaxy, Anton Theys, Patrick Willett, W. Chris Carleton, Ralf Vandam, Pieter Libin,
Abstract summary: Archaeological predictive modelling estimates where undiscovered sites are likely to occur by combining known locations with environmental, cultural, and geospatial variables.<n>We address this challenge using a deep learning approach but must contend with structural label scarcity inherent to archaeology: positives are rare, and most locations are unlabeled.<n>Our results indicate that semi-supervised learning offers a promising approach to identifying undiscovered sites across large, sparsely annotated landscapes.
Score: 0.3957768262206626
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Archaeological predictive modelling estimates where undiscovered sites are likely to occur by combining known locations with environmental, cultural, and geospatial variables. We address this challenge using a deep learning approach but must contend with structural label scarcity inherent to archaeology: positives are rare, and most locations are unlabeled. To address this, we adopt a semi-supervised, positive-unlabeled (PU) learning strategy, implemented as a semantic segmentation model and evaluated on two datasets covering a representative range of archaeological periods. Our approach employs dynamic pseudolabeling, refined with a Conditional Random Field (CRF) implemented via an RNN, increasing label confidence under severe class imbalance. On a geospatial dataset derived from a digital elevation model (DEM), our model performs on par with the state-of-the-art, LAMAP, while achieving higher Dice scores. On raw satellite imagery, assessed end-to-end with stratified k-fold cross-validation, it maintains performance and yields predictive surfaces with improved interpretability. Overall, our results indicate that semi-supervised learning offers a promising approach to identifying undiscovered sites across large, sparsely annotated landscapes.

Related papers

Learning to Explore: Policy-Guided Outlier Synthesis for Graph Out-of-Distribution Detection [51.93878677594561]
In unsupervised graph-level OOD detection, models are typically trained using only in-distribution (ID) data.<n>We propose a Policy-Guided Outlier Synthesis framework that replaces statics with a learned exploration strategy.
arXiv Detail & Related papers (2026-02-28T11:40:18Z)
NordFKB: a fine-grained benchmark dataset for geospatial AI in Norway [36.2419347658476]
We present NordFKB, a fine-grained benchmark dataset for geospatial AI in Norway.<n>The dataset contains high-resolution orthophotos paired with detailed annotations for 36 semantic classes.<n> NordFKB provides a robust foundation for advancing AI methods in mapping, land administration, and spatial planning.
arXiv Detail & Related papers (2025-12-10T18:47:25Z)
Topology-Aware Modeling for Unsupervised Simulation-to-Reality Point Cloud Recognition [63.55828203989405]
We introduce a novel Topology-Aware Modeling (TAM) framework for Sim2Real UDA on object point clouds.<n>Our approach mitigates the domain gap by leveraging global spatial topology, characterized by low-level, high-frequency 3D structures.<n>We propose an advanced self-training strategy that combines cross-domain contrastive learning with self-training.
arXiv Detail & Related papers (2025-06-26T11:53:59Z)
Performance and Generalizability Impacts of Incorporating Geolocation into Deep Learning for Dynamic PM2.5 Estimation [1.278093617645299]
We build on a recently published deep learning-based PM2.5 estimation model that achieves state-of-the-art performance on data observed in the contiguous United States.<n>We examine three approaches for incorporating geolocation: excluding geolocation as a baseline, using raw geographic coordinates, and leveraging pretrained location encoders.
arXiv Detail & Related papers (2025-05-24T02:00:34Z)
A Sensor Agnostic Domain Generalization Framework for Leveraging Geospatial Foundation Models: Enhancing Semantic Segmentation viaSynergistic Pseudo-Labeling and Generative Learning [5.299218284699214]
High-performance segmentation models are challenged by annotation scarcity and variability across sensors, illumination, and geography.<n>This paper introduces a domain generalization approach to leveraging emerging geospatial foundation models by combining soft-alignment pseudo-labeling with source-to-target generative pre-training.<n> Experiments with hyperspectral and multispectral remote sensing datasets confirm our method's effectiveness in enhancing adaptability and segmentation.
arXiv Detail & Related papers (2025-05-02T19:52:02Z)
A Deep Learning Architecture for Land Cover Mapping Using Spatio-Temporal Sentinel-1 Features [1.907072234794597]
The study focuses on three distinct regions - Amazonia, Africa, and Siberia - and evaluates the model performance across diverse ecoregions within these areas.<n>The results demonstrate the effectiveness and the capabilities of the proposed methodology in achieving overall accuracy (O.A.) values, even in regions with limited training data.
arXiv Detail & Related papers (2025-03-10T12:15:35Z)
Downstream-Pretext Domain Knowledge Traceback for Active Learning [138.02530777915362]
We propose a downstream-pretext domain knowledge traceback (DOKT) method that traces the data interactions of downstream knowledge and pre-training guidance. DOKT consists of a traceback diversity indicator and a domain-based uncertainty estimator. Experiments conducted on ten datasets show that our model outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2024-07-20T01:34:13Z)
Edge Classification on Graphs: New Directions in Topological Imbalance [53.42066415249078]
We identify a novel Topological Imbalance Issue', which arises from the skewed distribution of edges across different classes. We introduce Topological Entropy (TE), a novel topological-based metric that measures the topological imbalance for each edge. We develop two strategies - Topological Reweighting and TE Wedge-based Mixup - to focus training on (synthetic) edges based on their TEs.
arXiv Detail & Related papers (2024-06-17T16:02:36Z)
PixelDINO: Semi-Supervised Semantic Segmentation for Detecting Permafrost Disturbances [15.78884578132055]
We focus on the remote detection of retrogressive thaw slumps (RTS), a permafrost disturbance comparable to landslides induced by thawing. We present a semi-supervised learning approach to train semantic segmentation models to detect RTS. Our framework called PixelDINO is trained in parallel on labelled data as well as unlabelled data.
arXiv Detail & Related papers (2024-01-17T15:20:10Z)
Energy-based Out-of-Distribution Detection for Graph Neural Networks [76.0242218180483]
We propose a simple, powerful and efficient OOD detection model for GNN-based learning on graphs, which we call GNNSafe. GNNSafe achieves up to $17.0%$ AUROC improvement over state-of-the-arts and it could serve as simple yet strong baselines in such an under-developed area.
arXiv Detail & Related papers (2023-02-06T16:38:43Z)
TraND: Transferable Neighborhood Discovery for Unsupervised Cross-domain Gait Recognition [77.77786072373942]
This paper proposes a Transferable Neighborhood Discovery (TraND) framework to bridge the domain gap for unsupervised cross-domain gait recognition. We design an end-to-end trainable approach to automatically discover the confident neighborhoods of unlabeled samples in the latent space. Our method achieves state-of-the-art results on two public datasets, i.e., CASIA-B and OU-LP.
arXiv Detail & Related papers (2021-02-09T03:07:07Z)

This list is automatically generated from the titles and abstracts of the papers in this site.