Baltimore Atlas: FreqWeaver Adapter for Semi-supervised Ultra-high Spatial Resolution Land Cover Classification
- URL: http://arxiv.org/abs/2506.15565v2
- Date: Sat, 23 Aug 2025 03:08:04 GMT
- Title: Baltimore Atlas: FreqWeaver Adapter for Semi-supervised Ultra-high Spatial Resolution Land Cover Classification
- Authors: Junhao Wu, Aboagye-Ntow Stephen, Chuyuan Wang, Gang Chen, Xin Huang,
- Abstract summary: Land Cover Classification identifies land cover types on sub-meter remote imagery.<n>Most existing methods focus on 1 m imagery and rely heavily on large-scale annotations.<n>We introduce Baltimore Atlas, a generalization; land cover classification framework that reduces reliance on large-scale training data.
- Score: 9.706130801069143
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Ultra-high Spatial Resolution (UHSR) Land Cover Classification is increasingly important for urban analysis, enabling fine-scale planning, ecological monitoring, and infrastructure management. It identifies land cover types on sub-meter remote sensing imagery, capturing details such as building outlines, road networks, and distinct boundaries. However, most existing methods focus on 1 m imagery and rely heavily on large-scale annotations, while UHSR data remain scarce and difficult to annotate, limiting practical applicability. To address these challenges, we introduce Baltimore Atlas, a UHSR land cover classification framework that reduces reliance on large-scale training data and delivers high-accuracy results. Baltimore Atlas builds on three key ideas: (1) Baltimore Atlas Dataset, a 0.3 m resolution dataset based on aerial imagery of Baltimore City; (2) FreqWeaver Adapter, a parameter-efficient adapter that transfers SAM2 to this domain, leveraging foundation model knowledge to reduce training data needs while enabling fine-grained detail and structural modeling; (3) Uncertainty-Aware Teacher Student Framework, a semi-supervised framework that exploits unlabeled data to further reduce training dependence and improve generalization across diverse scenes. Using only 5.96% of total model parameters, our approach achieves a 1.78% IoU improvement over existing parameter-efficient tuning strategies and a 3.44% IoU gain compared to state-of-the-art high-resolution remote sensing segmentation methods on the Baltimore Atlas Dataset.
Related papers
- Minimal High-Resolution Patches Are Sufficient for Whole Slide Image Representation via Cascaded Dual-Scale Reconstruction [13.897013242536849]
Whole-slide image (WSI) analysis remains challenging due to gigapixel scale and sparsely distributed diagnostic regions.<n>We propose a Cascaded Dual-Scale Reconstruction framework, demonstrating that only an average of 9 high-resolution patches per WSI are sufficient for robust slide-level representation.
arXiv Detail & Related papers (2025-08-03T08:01:30Z) - Efficient Self-Supervised Learning for Earth Observation via Dynamic Dataset Curation [67.23953699167274]
Self-supervised learning (SSL) has enabled the development of vision foundation models for Earth Observation (EO)<n>In EO, this challenge is amplified by the redundancy and heavy-tailed distributions common in satellite imagery.<n>We propose a dynamic dataset pruning strategy designed to improve SSL pre-training by maximizing dataset diversity and balance.
arXiv Detail & Related papers (2025-04-09T15:13:26Z) - One Look is Enough: A Novel Seamless Patchwise Refinement for Zero-Shot Monocular Depth Estimation Models on High-Resolution Images [25.48185527420231]
We propose Patch Refine Once (PRO), an efficient and generalizable tile-based framework.<n>Our PRO consists of two key components: (i) Grouped Patch Consistency Training that enhances test-time efficiency while mitigating the depth discontinuity problem.<n>Our PRO can be well harmonized, making their DE capabilities still effective for the grid input of high-resolution images with little depth discontinuities at the grid boundaries.
arXiv Detail & Related papers (2025-03-28T11:46:50Z) - Towards Degradation-Robust Reconstruction in Generalizable NeRF [58.33351079982745]
Generalizable Radiance Field (GNeRF) across scenes has been proven to be an effective way to avoid per-scene optimization.
There has been limited research on the robustness of GNeRFs to different types of degradation present in the source images.
arXiv Detail & Related papers (2024-11-18T16:13:47Z) - High-Precision Dichotomous Image Segmentation via Probing Diffusion Capacity [69.32473738284374]
Diffusion models have revolutionized text-to-image synthesis by delivering exceptional quality, fine detail resolution, and strong contextual awareness.<n>We propose DiffDIS, a diffusion-driven segmentation model that taps into the potential of the pre-trained U-Net within diffusion models.<n>Experiments on the DIS5K dataset demonstrate the superiority of DiffDIS, achieving state-of-the-art results through a streamlined inference process.
arXiv Detail & Related papers (2024-10-14T02:49:23Z) - Seismic Fault SAM: Adapting SAM with Lightweight Modules and 2.5D Strategy for Fault Detection [11.868792440783054]
This paper proposes Seismic Fault SAM, which applies the general pre-training foundation model-Segment Anything Model (SAM)-to seismic fault interpretation.
Our innovative points include designing lightweight Adapter modules, freezing most of the pre-training weights, and only updating a small number of parameters.
Experimental results on the largest publicly available seismic dataset, Thebe, show that our method surpasses existing 3D models on both OIS and ODS metrics.
arXiv Detail & Related papers (2024-07-19T08:38:48Z) - PatchRefiner: Leveraging Synthetic Data for Real-Domain High-Resolution Monocular Metric Depth Estimation [42.29746147944489]
PatchRefiner is an advanced framework for metric single image depth estimation aimed at high-resolution real-domain inputs.
PatchRefiner adopts a tile-based methodology, reconceptualizing high-resolution depth estimation as a refinement process.
Our evaluations demonstrate PatchRefiner's superior performance, significantly outperforming existing benchmarks on the Unreal4KStereo dataset.
arXiv Detail & Related papers (2024-06-10T18:00:03Z) - TransLandSeg: A Transfer Learning Approach for Landslide Semantic Segmentation Based on Vision Foundation Model [4.8312235770932]
We propose TransLandSeg, which is a transfer learning approach for landslide semantic segmentation based on a vision foundation model (VFM)
Our proposed adaptive transfer learning (ATL) architecture enables the powerful segmentation capability of SAM to be transferred to landslide detection by training only 1.3% of the number of the parameters of SAM.
arXiv Detail & Related papers (2024-03-15T09:18:53Z) - Learning without Exact Guidance: Updating Large-scale High-resolution Land Cover Maps from Low-resolution Historical Labels [4.833320222969612]
Large-scale high-resolution (HR) land-cover mapping is a vital task to survey the Earth's surface and resolve many challenges facing humanity.
We propose an efficient, weakly supervised framework (Paraformer) to guide large-scale HR land-cover mapping.
arXiv Detail & Related papers (2024-03-05T08:02:00Z) - Kick Back & Relax++: Scaling Beyond Ground-Truth Depth with SlowTV &
CribsTV [50.616892315086574]
This paper proposes two novel datasets: SlowTV and CribsTV.
These are large-scale datasets curated from publicly available YouTube videos, containing a total of 2M training frames.
We leverage these datasets to tackle the challenging task of zero-shot generalization.
arXiv Detail & Related papers (2024-03-03T17:29:03Z) - 360 Layout Estimation via Orthogonal Planes Disentanglement and Multi-view Geometric Consistency Perception [56.84921040837699]
Existing panoramic layout estimation solutions tend to recover room boundaries from a vertically compressed sequence, yielding imprecise results.
We propose an orthogonal plane disentanglement network (termed DOPNet) to distinguish ambiguous semantics.
We also present an unsupervised adaptation technique tailored for horizon-depth and ratio representations.
Our solution outperforms other SoTA models on both monocular layout estimation and multi-view layout estimation tasks.
arXiv Detail & Related papers (2023-12-26T12:16:03Z) - Recognize Any Regions [55.76437190434433]
RegionSpot integrates position-aware localization knowledge from a localization foundation model with semantic information from a ViL model.<n>Experiments in open-world object recognition show that our RegionSpot achieves significant performance gain over prior alternatives.
arXiv Detail & Related papers (2023-11-02T16:31:49Z) - Low-Resolution Self-Attention for Semantic Segmentation [93.30597515880079]
We introduce the Low-Resolution Self-Attention (LRSA) mechanism to capture global context at a significantly reduced computational cost.<n>Our approach involves computing self-attention in a fixed low-resolution space regardless of the input image's resolution.<n>We demonstrate the effectiveness of our LRSA approach by building the LRFormer, a vision transformer with an encoder-decoder structure.
arXiv Detail & Related papers (2023-10-08T06:10:09Z) - Multi-tiling Neural Radiance Field (NeRF) -- Geometric Assessment on Large-scale Aerial Datasets [5.391764618878545]
In this paper, we aim to scale the Neural Radiance Fields (NeRF) on large-scael aerial datasets.
Specifically, we introduce a location-specific sampling technique as well as a multi-camera tiling (MCT) strategy to reduce memory consumption.
We implement our method on a representative approach, Mip-NeRF, and compare its geometry performance with threephotgrammetric MVS pipelines.
arXiv Detail & Related papers (2023-10-01T00:21:01Z) - Semi-supervised Learning from Street-View Images and OpenStreetMap for
Automatic Building Height Estimation [59.6553058160943]
We propose a semi-supervised learning (SSL) method of automatically estimating building height from Mapillary SVI and OpenStreetMap data.
The proposed method leads to a clear performance boosting in estimating building heights with a Mean Absolute Error (MAE) around 2.1 meters.
The preliminary result is promising and motivates our future work in scaling up the proposed method based on low-cost VGI data.
arXiv Detail & Related papers (2023-07-05T18:16:30Z) - Rethinking Lightweight Salient Object Detection via Network Depth-Width
Tradeoff [26.566339984225756]
Existing salient object detection methods often adopt deeper and wider networks for better performance.
We propose a novel trilateral decoder framework by decoupling the U-shape structure into three complementary branches.
We show that our method achieves better efficiency-accuracy balance across five benchmarks.
arXiv Detail & Related papers (2023-01-17T03:43:25Z) - PreTraM: Self-Supervised Pre-training via Connecting Trajectory and Map [58.53373202647576]
We propose PreTraM, a self-supervised pre-training scheme for trajectory forecasting.
It consists of two parts: 1) Trajectory-Map Contrastive Learning, where we project trajectories and maps to a shared embedding space with cross-modal contrastive learning, and 2) Map Contrastive Learning, where we enhance map representation with contrastive learning on large quantities of HD-maps.
On top of popular baselines such as AgentFormer and Trajectron++, PreTraM boosts their performance by 5.5% and 6.9% relatively in FDE-10 on the challenging nuScenes dataset.
arXiv Detail & Related papers (2022-04-21T23:01:21Z) - High Quality Segmentation for Ultra High-resolution Images [72.97958314291648]
We propose the Continuous Refinement Model for the ultra high-resolution segmentation refinement task.
Our proposed method is fast and effective on image segmentation refinement.
arXiv Detail & Related papers (2021-11-29T11:53:06Z) - LoveDA: A Remote Sensing Land-Cover Dataset for Domain Adaptive Semantic
Segmentation [7.629717457706323]
LoveDA dataset contains 5987 HSR images with 166 annotated objects from three different cities.
LoveDA dataset is suitable for both land-cover semantic segmentation and unsupervised domain adaptation (UDA) tasks.
arXiv Detail & Related papers (2021-10-17T06:12:48Z) - Best-Buddy GANs for Highly Detailed Image Super-Resolution [71.13466303340192]
We consider the single image super-resolution (SISR) problem, where a high-resolution (HR) image is generated based on a low-resolution (LR) input.
Most methods along this line rely on a predefined single-LR-single-HR mapping, which is not flexible enough for the SISR task.
We propose best-buddy GANs (Beby-GAN) for rich-detail SISR. Relaxing the immutable one-to-one constraint, we allow the estimated patches to dynamically seek the best supervision.
arXiv Detail & Related papers (2021-03-29T02:58:27Z) - GANav: Group-wise Attention Network for Classifying Navigable Regions in
Unstructured Outdoor Environments [54.21959527308051]
We present a new learning-based method for identifying safe and navigable regions in off-road terrains and unstructured environments from RGB images.
Our approach consists of classifying groups of terrain classes based on their navigability levels using coarse-grained semantic segmentation.
We show through extensive evaluations on the RUGD and RELLIS-3D datasets that our learning algorithm improves the accuracy of visual perception in off-road terrains for navigation.
arXiv Detail & Related papers (2021-03-07T02:16:24Z) - Foveation for Segmentation of Ultra-High Resolution Images [8.037287701125832]
We introduce foveation module, a learnable "dataloader" which adaptively chooses the appropriate configuration (FoV/resolution trade-off) of the input patch to feed to the downstream segmentation model.
We demonstrate that the foveation module consistently improves segmentation performance over the cases trained with patches of fixed FoV/resolution trade-off.
arXiv Detail & Related papers (2020-07-29T21:44:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.