Related papers: Baltimore Atlas: FreqWeaver Adapter for Semi-supervised Ultra-high Spatial Resolution Land Cover Classification

Baltimore Atlas: FreqWeaver Adapter for Semi-supervised Ultra-high Spatial Resolution Land Cover Classification

URL: http://arxiv.org/abs/2506.15565v2
Date: Sat, 23 Aug 2025 03:08:04 GMT
Title: Baltimore Atlas: FreqWeaver Adapter for Semi-supervised Ultra-high Spatial Resolution Land Cover Classification
Authors: Junhao Wu, Aboagye-Ntow Stephen, Chuyuan Wang, Gang Chen, Xin Huang,
Abstract summary: Land Cover Classification identifies land cover types on sub-meter remote imagery.<n>Most existing methods focus on 1 m imagery and rely heavily on large-scale annotations.<n>We introduce Baltimore Atlas, a generalization; land cover classification framework that reduces reliance on large-scale training data.
Score: 9.706130801069143
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Ultra-high Spatial Resolution (UHSR) Land Cover Classification is increasingly important for urban analysis, enabling fine-scale planning, ecological monitoring, and infrastructure management. It identifies land cover types on sub-meter remote sensing imagery, capturing details such as building outlines, road networks, and distinct boundaries. However, most existing methods focus on 1 m imagery and rely heavily on large-scale annotations, while UHSR data remain scarce and difficult to annotate, limiting practical applicability. To address these challenges, we introduce Baltimore Atlas, a UHSR land cover classification framework that reduces reliance on large-scale training data and delivers high-accuracy results. Baltimore Atlas builds on three key ideas: (1) Baltimore Atlas Dataset, a 0.3 m resolution dataset based on aerial imagery of Baltimore City; (2) FreqWeaver Adapter, a parameter-efficient adapter that transfers SAM2 to this domain, leveraging foundation model knowledge to reduce training data needs while enabling fine-grained detail and structural modeling; (3) Uncertainty-Aware Teacher Student Framework, a semi-supervised framework that exploits unlabeled data to further reduce training dependence and improve generalization across diverse scenes. Using only 5.96% of total model parameters, our approach achieves a 1.78% IoU improvement over existing parameter-efficient tuning strategies and a 3.44% IoU gain compared to state-of-the-art high-resolution remote sensing segmentation methods on the Baltimore Atlas Dataset.

Related papers

Minimal High-Resolution Patches Are Sufficient for Whole Slide Image Representation via Cascaded Dual-Scale Reconstruction [13.897013242536849]
Whole-slide image (WSI) analysis remains challenging due to gigapixel scale and sparsely distributed diagnostic regions.<n>We propose a Cascaded Dual-Scale Reconstruction framework, demonstrating that only an average of 9 high-resolution patches per WSI are sufficient for robust slide-level representation.
arXiv Detail & Related papers (2025-08-03T08:01:30Z)
Efficient Self-Supervised Learning for Earth Observation via Dynamic Dataset Curation [67.23953699167274]
Self-supervised learning (SSL) has enabled the development of vision foundation models for Earth Observation (EO)<n>In EO, this challenge is amplified by the redundancy and heavy-tailed distributions common in satellite imagery.<n>We propose a dynamic dataset pruning strategy designed to improve SSL pre-training by maximizing dataset diversity and balance.
arXiv Detail & Related papers (2025-04-09T15:13:26Z)
One Look is Enough: A Novel Seamless Patchwise Refinement for Zero-Shot Monocular Depth Estimation Models on High-Resolution Images [25.48185527420231]
We propose Patch Refine Once (PRO), an efficient and generalizable tile-based framework.<n>Our PRO consists of two key components: (i) Grouped Patch Consistency Training that enhances test-time efficiency while mitigating the depth discontinuity problem.<n>Our PRO can be well harmonized, making their DE capabilities still effective for the grid input of high-resolution images with little depth discontinuities at the grid boundaries.
arXiv Detail & Related papers (2025-03-28T11:46:50Z)
Towards Degradation-Robust Reconstruction in Generalizable NeRF [58.33351079982745]
Generalizable Radiance Field (GNeRF) across scenes has been proven to be an effective way to avoid per-scene optimization. There has been limited research on the robustness of GNeRFs to different types of degradation present in the source images.
arXiv Detail & Related papers (2024-11-18T16:13:47Z)
High-Precision Dichotomous Image Segmentation via Probing Diffusion Capacity [69.32473738284374]
Diffusion models have revolutionized text-to-image synthesis by delivering exceptional quality, fine detail resolution, and strong contextual awareness.<n>We propose DiffDIS, a diffusion-driven segmentation model that taps into the potential of the pre-trained U-Net within diffusion models.<n>Experiments on the DIS5K dataset demonstrate the superiority of DiffDIS, achieving state-of-the-art results through a streamlined inference process.
arXiv Detail & Related papers (2024-10-14T02:49:23Z)
Seismic Fault SAM: Adapting SAM with Lightweight Modules and 2.5D Strategy for Fault Detection [11.868792440783054]
This paper proposes Seismic Fault SAM, which applies the general pre-training foundation model-Segment Anything Model (SAM)-to seismic fault interpretation. Our innovative points include designing lightweight Adapter modules, freezing most of the pre-training weights, and only updating a small number of parameters. Experimental results on the largest publicly available seismic dataset, Thebe, show that our method surpasses existing 3D models on both OIS and ODS metrics.
arXiv Detail & Related papers (2024-07-19T08:38:48Z)
PatchRefiner: Leveraging Synthetic Data for Real-Domain High-Resolution Monocular Metric Depth Estimation [42.29746147944489]
PatchRefiner is an advanced framework for metric single image depth estimation aimed at high-resolution real-domain inputs. PatchRefiner adopts a tile-based methodology, reconceptualizing high-resolution depth estimation as a refinement process. Our evaluations demonstrate PatchRefiner's superior performance, significantly outperforming existing benchmarks on the Unreal4KStereo dataset.
arXiv Detail & Related papers (2024-06-10T18:00:03Z)
TransLandSeg: A Transfer Learning Approach for Landslide Semantic Segmentation Based on Vision Foundation Model [4.8312235770932]
We propose TransLandSeg, which is a transfer learning approach for landslide semantic segmentation based on a vision foundation model (VFM) Our proposed adaptive transfer learning (ATL) architecture enables the powerful segmentation capability of SAM to be transferred to landslide detection by training only 1.3% of the number of the parameters of SAM.
arXiv Detail & Related papers (2024-03-15T09:18:53Z)
Learning without Exact Guidance: Updating Large-scale High-resolution Land Cover Maps from Low-resolution Historical Labels [4.833320222969612]
Large-scale high-resolution (HR) land-cover mapping is a vital task to survey the Earth's surface and resolve many challenges facing humanity. We propose an efficient, weakly supervised framework (Paraformer) to guide large-scale HR land-cover mapping.
arXiv Detail & Related papers (2024-03-05T08:02:00Z)
Kick Back & Relax++: Scaling Beyond Ground-Truth Depth with SlowTV & CribsTV [50.616892315086574]
This paper proposes two novel datasets: SlowTV and CribsTV. These are large-scale datasets curated from publicly available YouTube videos, containing a total of 2M training frames. We leverage these datasets to tackle the challenging task of zero-shot generalization.
arXiv Detail & Related papers (2024-03-03T17:29:03Z)
360 Layout Estimation via Orthogonal Planes Disentanglement and Multi-view Geometric Consistency Perception [56.84921040837699]
Existing panoramic layout estimation solutions tend to recover room boundaries from a vertically compressed sequence, yielding imprecise results. We propose an orthogonal plane disentanglement network (termed DOPNet) to distinguish ambiguous semantics. We also present an unsupervised adaptation technique tailored for horizon-depth and ratio representations. Our solution outperforms other SoTA models on both monocular layout estimation and multi-view layout estimation tasks.
arXiv Detail & Related papers (2023-12-26T12:16:03Z)
Recognize Any Regions [55.76437190434433]
RegionSpot integrates position-aware localization knowledge from a localization foundation model with semantic information from a ViL model.<n>Experiments in open-world object recognition show that our RegionSpot achieves significant performance gain over prior alternatives.
arXiv Detail & Related papers (2023-11-02T16:31:49Z)
Low-Resolution Self-Attention for Semantic Segmentation [93.30597515880079]
We introduce the Low-Resolution Self-Attention (LRSA) mechanism to capture global context at a significantly reduced computational cost.<n>Our approach involves computing self-attention in a fixed low-resolution space regardless of the input image's resolution.<n>We demonstrate the effectiveness of our LRSA approach by building the LRFormer, a vision transformer with an encoder-decoder structure.
arXiv Detail & Related papers (2023-10-08T06:10:09Z)
Multi-tiling Neural Radiance Field (NeRF) -- Geometric Assessment on Large-scale Aerial Datasets [5.391764618878545]
In this paper, we aim to scale the Neural Radiance Fields (NeRF) on large-scael aerial datasets. Specifically, we introduce a location-specific sampling technique as well as a multi-camera tiling (MCT) strategy to reduce memory consumption. We implement our method on a representative approach, Mip-NeRF, and compare its geometry performance with threephotgrammetric MVS pipelines.
arXiv Detail & Related papers (2023-10-01T00:21:01Z)
Semi-supervised Learning from Street-View Images and OpenStreetMap for Automatic Building Height Estimation [59.6553058160943]
We propose a semi-supervised learning (SSL) method of automatically estimating building height from Mapillary SVI and OpenStreetMap data. The proposed method leads to a clear performance boosting in estimating building heights with a Mean Absolute Error (MAE) around 2.1 meters. The preliminary result is promising and motivates our future work in scaling up the proposed method based on low-cost VGI data.
arXiv Detail & Related papers (2023-07-05T18:16:30Z)
Rethinking Lightweight Salient Object Detection via Network Depth-Width Tradeoff [26.566339984225756]
Existing salient object detection methods often adopt deeper and wider networks for better performance. We propose a novel trilateral decoder framework by decoupling the U-shape structure into three complementary branches. We show that our method achieves better efficiency-accuracy balance across five benchmarks.
arXiv Detail & Related papers (2023-01-17T03:43:25Z)
PreTraM: Self-Supervised Pre-training via Connecting Trajectory and Map [58.53373202647576]
We propose PreTraM, a self-supervised pre-training scheme for trajectory forecasting. It consists of two parts: 1) Trajectory-Map Contrastive Learning, where we project trajectories and maps to a shared embedding space with cross-modal contrastive learning, and 2) Map Contrastive Learning, where we enhance map representation with contrastive learning on large quantities of HD-maps. On top of popular baselines such as AgentFormer and Trajectron++, PreTraM boosts their performance by 5.5% and 6.9% relatively in FDE-10 on the challenging nuScenes dataset.
arXiv Detail & Related papers (2022-04-21T23:01:21Z)
High Quality Segmentation for Ultra High-resolution Images [72.97958314291648]
We propose the Continuous Refinement Model for the ultra high-resolution segmentation refinement task. Our proposed method is fast and effective on image segmentation refinement.
arXiv Detail & Related papers (2021-11-29T11:53:06Z)
LoveDA: A Remote Sensing Land-Cover Dataset for Domain Adaptive Semantic Segmentation [7.629717457706323]
LoveDA dataset contains 5987 HSR images with 166 annotated objects from three different cities. LoveDA dataset is suitable for both land-cover semantic segmentation and unsupervised domain adaptation (UDA) tasks.
arXiv Detail & Related papers (2021-10-17T06:12:48Z)
Best-Buddy GANs for Highly Detailed Image Super-Resolution [71.13466303340192]
We consider the single image super-resolution (SISR) problem, where a high-resolution (HR) image is generated based on a low-resolution (LR) input. Most methods along this line rely on a predefined single-LR-single-HR mapping, which is not flexible enough for the SISR task. We propose best-buddy GANs (Beby-GAN) for rich-detail SISR. Relaxing the immutable one-to-one constraint, we allow the estimated patches to dynamically seek the best supervision.
arXiv Detail & Related papers (2021-03-29T02:58:27Z)
GANav: Group-wise Attention Network for Classifying Navigable Regions in Unstructured Outdoor Environments [54.21959527308051]
We present a new learning-based method for identifying safe and navigable regions in off-road terrains and unstructured environments from RGB images. Our approach consists of classifying groups of terrain classes based on their navigability levels using coarse-grained semantic segmentation. We show through extensive evaluations on the RUGD and RELLIS-3D datasets that our learning algorithm improves the accuracy of visual perception in off-road terrains for navigation.
arXiv Detail & Related papers (2021-03-07T02:16:24Z)
Foveation for Segmentation of Ultra-High Resolution Images [8.037287701125832]
We introduce foveation module, a learnable "dataloader" which adaptively chooses the appropriate configuration (FoV/resolution trade-off) of the input patch to feed to the downstream segmentation model. We demonstrate that the foveation module consistently improves segmentation performance over the cases trained with patches of fixed FoV/resolution trade-off.
arXiv Detail & Related papers (2020-07-29T21:44:39Z)

This list is automatically generated from the titles and abstracts of the papers in this site.