DragOSM: Extract Building Roofs and Footprints from Aerial Images by Aligning Historical Labels
- URL: http://arxiv.org/abs/2509.17951v1
- Date: Mon, 22 Sep 2025 16:10:13 GMT
- Title: DragOSM: Extract Building Roofs and Footprints from Aerial Images by Aligning Historical Labels
- Authors: Kai Li, Xingxing Weng, Yupeng Deng, Yu Meng, Chao Pang, Gui-Song Xia, Xiangyu Zhao,
- Abstract summary: We propose Drag OpenStreetMap Labels (DragOSM) to align dislocated historical labels with roofs and footprints. DragOSM formulates label alignment as an interactive denoising process, modeling the positional discrepancy as a Gaussian distribution. We present a new dataset, Repairing Buildings in OSM (ReBO), comprising 179,265 buildings with both OpenStreetMap and manually corrected annotations across 5,473 images from 41 cities.
- Score: 48.74862499599635
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Extracting polygonal roofs and footprints from remote sensing images is critical for large-scale urban analysis. Most existing methods rely on segmentation-based models that assume clear semantic boundaries of roofs, but these approaches struggle in off-nadir images, where the roof and footprint are significantly displaced and facade pixels are fused with the roof boundary. With the increasing availability of open vector map annotations, e.g., OpenStreetMap, utilizing historical labels for off-nadir image annotation has become viable because remote sensing images are georeferenced once captured. However, these historical labels commonly suffer from significant positional discrepancies with new images and carry only one annotation (roof or footprint), which fails to describe the correct structure of a building. To address these discrepancies, we first introduce the concept of an alignment token, which encodes the correction vector that guides label correction. Based on this concept, we then propose Drag OpenStreetMap Labels (DragOSM), a novel model designed to align dislocated historical labels with roofs and footprints. Specifically, DragOSM formulates the label alignment as an interactive denoising process, modeling the positional discrepancy as a Gaussian distribution. During training, it learns to correct these errors by simulating misalignment with random Gaussian perturbations; during inference, it iteratively refines the positions of input labels. To validate our method, we further present a new dataset, Repairing Buildings in OSM (ReBO), comprising 179,265 buildings with both OpenStreetMap and manually corrected annotations across 5,473 images from 41 cities. Experimental results on ReBO demonstrate the effectiveness of DragOSM. Code, dataset, and trained models are publicly available at https://github.com/likaiucas/DragOSM.git.
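The denoising formulation described in the abstract can be summarized in a small sketch. The snippet below is a minimal, illustrative reconstruction in Python/NumPy, not the released DragOSM code: the `predict_correction` interface, the `sigma` value, and the number of refinement steps are assumptions, while the overall structure (Gaussian perturbation of aligned labels during training, iterative correction of OSM labels at inference) follows the abstract.

```python
import numpy as np

def predict_correction(image, polygon, model):
    """Stand-in for a learned regressor that returns a (dx, dy) shift in pixels.
    DragOSM's actual alignment-token architecture is not reproduced here."""
    return model(image, polygon)

def make_training_pair(image, aligned_polygon, sigma=10.0, rng=None):
    """Simulate a dislocated OSM label by perturbing an aligned polygon with a
    random Gaussian offset; the regression target is the vector that undoes it."""
    rng = rng or np.random.default_rng()
    offset = rng.normal(0.0, sigma, size=2)      # positional discrepancy ~ N(0, sigma^2)
    noisy_polygon = aligned_polygon + offset     # broadcast over (N, 2) vertices
    target_correction = -offset                  # what the model should learn to predict
    return image, noisy_polygon, target_correction

def align_label(image, osm_polygon, model, steps=4):
    """Iteratively refine a historical label at inference time."""
    polygon = osm_polygon.astype(float).copy()
    for _ in range(steps):
        dxdy = predict_correction(image, polygon, model)
        polygon = polygon + dxdy                 # drag the label toward the roof/footprint
    return polygon
```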
Related papers
- SAModified: A Foundation Model-Based Zero-Shot Approach for Refining Noisy Land-Use Land-Cover Maps [2.374912052693646]
Land-use and land cover (LULC) analysis is critical in remote sensing. Automating LULC map generation using machine learning is rendered challenging due to noisy labels. We propose a zero-shot approach using the foundation model Segment Anything Model (SAM). We achieve a significant reduction in label noise and an improvement of approximately 5% in the performance of the downstream segmentation model when trained with denoised labels.
arXiv Detail & Related papers (2024-12-17T05:23:00Z)
- Training Matting Models without Alpha Labels [22.249204770416927]
This work explores using rough annotations such as trimaps coarsely indicating the foreground/background as supervision.
We show that cooperation between semantics learned from the indicated known regions and properly assumed matting rules can help infer alpha values in transition areas.
Experiments on AM-2K and P3M-10K dataset show that our paradigm achieves comparable performance with the fine-label-supervised baseline.
arXiv Detail & Related papers (2024-08-20T04:34:06Z) - iEdit: Localised Text-guided Image Editing with Weak Supervision [53.082196061014734]
We propose a novel learning method for text-guided image editing.
It generates images conditioned on a source image and a textual edit prompt.
It shows favourable results against its counterparts in terms of image fidelity, CLIP alignment score, and qualitative editing of both generated and real images.
arXiv Detail & Related papers (2023-05-10T07:39:14Z) - Diffusion-Based Scene Graph to Image Generation with Masked Contrastive
Pre-Training [112.94542676251133]
We propose to learn scene graph embeddings by directly optimizing their alignment with images.
Specifically, we pre-train an encoder to extract both global and local information from scene graphs.
The resulting method, called SGDiff, allows for the semantic manipulation of generated images by modifying scene graph nodes and connections.
arXiv Detail & Related papers (2022-11-21T01:11:19Z) - Weakly-Supervised Salient Object Detection Using Point Supervison [17.88596733603456]
Current state-of-the-art saliency detection models rely heavily on large datasets of accurate pixel-wise annotations.
We propose a novel weakly-supervised salient object detection method using point supervision.
Our method outperforms previous state-of-the-art methods trained with stronger supervision.
arXiv Detail & Related papers (2022-03-22T12:16:05Z) - CAMERAS: Enhanced Resolution And Sanity preserving Class Activation
Mapping for image saliency [61.40511574314069]
Backpropagation image saliency aims at explaining model predictions by estimating model-centric importance of individual pixels in the input.
We propose CAMERAS, a technique to compute high-fidelity backpropagation saliency maps without requiring any external priors.
arXiv Detail & Related papers (2021-06-20T08:20:56Z) - Rethinking Localization Map: Towards Accurate Object Perception with
Self-Enhancement Maps [78.2581910688094]
This work introduces a novel self-enhancement method to harvest accurate object localization maps and object boundaries with only category labels as supervision.
In particular, the proposed Self-Enhancement Maps achieve the state-of-the-art localization accuracy of 54.88% on ILSVRC.
arXiv Detail & Related papers (2020-06-09T12:35:55Z) - Learning to segment from misaligned and partial labels [0.0]
Many non-urban settings lack the ground-truth needed for accurate segmentation.
Open source infrastructure annotations like OpenStreetMaps (OSM) are representative of this issue.
We present a novel and generalizable two-stage framework that enables improved pixel-wise image segmentation given misaligned and missing annotations.
arXiv Detail & Related papers (2020-05-27T06:02:58Z)
- Weakly-Supervised Salient Object Detection via Scribble Annotations [54.40518383782725]
We propose a weakly-supervised salient object detection model to learn saliency from scribble labels.
We present a new metric, termed saliency structure measure, to measure the structure alignment of the predicted saliency maps.
Our method not only outperforms existing weakly-supervised/unsupervised methods, but also is on par with several fully-supervised state-of-the-art models.
arXiv Detail & Related papers (2020-03-17T12:59:50Z)