Learning to Extract Building Footprints from Off-Nadir Aerial Images
- URL: http://arxiv.org/abs/2204.13637v1
- Date: Thu, 28 Apr 2022 16:56:06 GMT
- Title: Learning to Extract Building Footprints from Off-Nadir Aerial Images
- Authors: Jinwang Wang, Lingxuan Meng, Weijia Li, Wen Yang, Lei Yu, Gui-Song Xia
- Abstract summary: Existing approaches assume that the roof and footprint of a building are well overlapped, which may not hold in off-nadir aerial images.
We propose an offset vector learning scheme, which turns the building footprint extraction problem into an instance-level joint prediction problem.
A new dataset, Buildings in Off-Nadir Aerial Images (BONAI), is created and released in this paper.
- Score: 33.2991137981025
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Extracting building footprints from aerial images is essential for precise
urban mapping with photogrammetric computer vision technologies. Existing
approaches mainly assume that the roof and footprint of a building are well
overlapped, which may not hold in off-nadir aerial images as there is often a
big offset between them. In this paper, we propose an offset vector learning
scheme, which turns the building footprint extraction problem in off-nadir
images into an instance-level joint prediction problem of the building roof and
its corresponding "roof to footprint" offset vector. Thus the footprint can be
estimated by translating the predicted roof mask according to the predicted
offset vector. We further propose a simple but effective feature-level offset
augmentation module, which can significantly refine the offset vector
prediction by introducing little extra cost. Moreover, a new dataset, Buildings
in Off-Nadir Aerial Images (BONAI), is created and released in this paper. It
contains 268,958 building instances across 3,300 aerial images with fully
annotated instance-level roof, footprint, and corresponding offset vector for
each building. Experiments on the BONAI dataset demonstrate that our method
achieves the state-of-the-art, outperforming other competitors by 3.37 to 7.39
points in F1-score. The codes, datasets, and trained models are available at
https://github.com/jwwangchn/BONAI.git.
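As a minimal illustration of the offset vector learning idea described above, the sketch below recovers a footprint mask by translating a predicted binary roof mask according to its predicted "roof to footprint" offset. It is not taken from the released BONAI code; the function name and the (dy, dx) pixel-offset convention are assumptions made for the example.

```python
import numpy as np

def footprint_from_roof(roof_mask: np.ndarray, offset: tuple) -> np.ndarray:
    """Translate a binary roof mask by a per-building (dy, dx) offset
    to estimate the footprint mask (illustrative sketch only)."""
    dy, dx = int(round(offset[0])), int(round(offset[1]))
    h, w = roof_mask.shape
    footprint = np.zeros_like(roof_mask)
    # Source and destination windows, clipped to the image bounds.
    src_rows = slice(max(0, -dy), min(h, h - dy))
    src_cols = slice(max(0, -dx), min(w, w - dx))
    dst_rows = slice(max(0, dy), min(h, h + dy))
    dst_cols = slice(max(0, dx), min(w, w + dx))
    footprint[dst_rows, dst_cols] = roof_mask[src_rows, src_cols]
    return footprint
```

In the paper's pipeline, the roof mask and offset vector are predicted jointly per building instance, and the feature-level offset augmentation module refines the offset before this translation step.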
Related papers
- AerialMegaDepth: Learning Aerial-Ground Reconstruction and View Synthesis [57.249817395828174]
We propose a scalable framework combining pseudo-synthetic renderings from 3D city-wide meshes with real, ground-level crowd-sourced images.
The pseudo-synthetic data simulates a wide range of aerial viewpoints, while the real, crowd-sourced images help improve visual fidelity for ground-level images.
Using this hybrid dataset, we fine-tune several state-of-the-art algorithms and achieve significant improvements on real-world, zero-shot aerial-ground tasks.
arXiv Detail & Related papers (2025-04-17T17:57:05Z) - FG$^2$: Fine-Grained Cross-View Localization by Fine-Grained Feature Matching [69.81167130510333]
We propose a novel fine-grained cross-view localization method that estimates the 3 Degrees of Freedom pose of a ground-level image in an aerial image of the surroundings.
The pose is estimated by aligning a point plane generated from the ground image with a point plane sampled from the aerial image.
Compared to the previous state-of-the-art, our method reduces the mean localization error by 28% on the VIGOR cross-area test set.
arXiv Detail & Related papers (2025-03-24T14:34:20Z) - AIM2PC: Aerial Image to 3D Building Point Cloud Reconstruction [2.9998889086656586]
Recent methods primarily focus on rooftops from aerial images, often overlooking essential geometrical details.
There is a notable lack of datasets containing complete 3D point clouds for entire buildings, along with challenges in obtaining reliable camera pose information for aerial images.
This paper presents a novel methodology, AIM2PC, which utilizes our generated dataset that includes complete 3D point clouds and determined camera poses.
arXiv Detail & Related papers (2025-03-24T10:34:07Z) - Extracting polygonal footprints in off-nadir images with Segment Anything Model [27.5051982104645]
We present OBMv2, an end-to-end and promptable model for polygonal footprint prediction.
Unlike its predecessor OBM, OBMv2 introduces a novel Self Offset Attention (SOFA) mechanism that improves performance across diverse building types.
We propose a Multi-level Information System (MISS) to effectively leverage roof masks, building masks, and offsets for accurate footprint prediction.
arXiv Detail & Related papers (2024-08-16T10:21:13Z) - DRAGON: Drone and Ground Gaussian Splatting for 3D Building Reconstruction [6.204957247203803]
DRAGON can take drone and ground building imagery as input and produce a 3D NVS model.
We compiled a semi-synthetic dataset of 9 large building scenes using Google Earth Studio.
arXiv Detail & Related papers (2024-07-01T19:52:32Z) - Classifying geospatial objects from multiview aerial imagery using semantic meshes [2.116528763953217]
We propose a new method to predict tree species based on aerial images of forests in the U.S.
We show that our proposed multiview method improves classification accuracy from 53% to 75% relative to an orthomosaic baseline on a challenging cross-site tree classification task.
arXiv Detail & Related papers (2024-05-15T17:56:49Z) - Aerial Lifting: Neural Urban Semantic and Building Instance Lifting from Aerial Imagery [51.73680703579997]
We present a neural radiance field method for urban-scale semantic and building-level instance segmentation from aerial images.
Objects in urban aerial images, including buildings, cars, and roads, exhibit substantial variations in size.
We introduce a scale-adaptive semantic label fusion strategy that enhances the segmentation of objects of varying sizes.
We then introduce a novel cross-view instance label grouping strategy to mitigate the multi-view inconsistency problem in the 2D instance labels.
arXiv Detail & Related papers (2024-03-18T14:15:39Z) - Raising the Bar of AI-generated Image Detection with CLIP [50.345365081177555]
The aim of this work is to explore the potential of pre-trained vision-language models (VLMs) for universal detection of AI-generated images.
We develop a lightweight detection strategy based on CLIP features and study its performance in a wide variety of challenging scenarios.
arXiv Detail & Related papers (2023-11-30T21:11:20Z) - Building Extraction from Remote Sensing Images via an Uncertainty-Aware
Network [18.365220543556113]
Building extraction plays an essential role in many applications, such as city planning and urban dynamic monitoring.
We propose a novel and straightforward Uncertainty-Aware Network (UANet) to alleviate this problem.
Results demonstrate that the proposed UANet outperforms other state-of-the-art algorithms by a large margin.
arXiv Detail & Related papers (2023-07-23T12:42:15Z) - Semi-supervised Learning from Street-View Images and OpenStreetMap for
Automatic Building Height Estimation [59.6553058160943]
We propose a semi-supervised learning (SSL) method of automatically estimating building height from Mapillary SVI and OpenStreetMap data.
The proposed method leads to a clear performance boosting in estimating building heights with a Mean Absolute Error (MAE) around 2.1 meters.
The preliminary result is promising and motivates our future work in scaling up the proposed method based on low-cost VGI data.
arXiv Detail & Related papers (2023-07-05T18:16:30Z) - Learning to Generate 3D Representations of Building Roofs Using
Single-View Aerial Imagery [68.3565370706598]
We present a novel pipeline for learning the conditional distribution of a building roof mesh given pixels from an aerial image.
Unlike alternative methods that require multiple images of the same object, our approach enables estimating 3D roof meshes using only a single image for predictions.
arXiv Detail & Related papers (2023-03-20T15:47:05Z) - BiSVP: Building Footprint Extraction via Bidirectional Serialized Vertex
Prediction [43.61580149432732]
BiSVP is a refinement-free and end-to-end building footprint extraction method.
We propose a cross-scale feature fusion (CSFF) module to facilitate high resolution and rich semantic feature learning.
Our BiSVP outperforms state-of-the-art methods by considerable margins on three building instance segmentation benchmarks.
arXiv Detail & Related papers (2023-03-01T07:50:34Z) - Ground Plane Matters: Picking Up Ground Plane Prior in Monocular 3D
Object Detection [92.75961303269548]
The ground plane prior is a very informative geometry clue in monocular 3D object detection (M3OD).
We propose a Ground Plane Enhanced Network (GPENet) which resolves both issues at one go.
Our GPENet can outperform other methods and achieve state-of-the-art performance, well demonstrating the effectiveness and the superiority of the proposed approach.
arXiv Detail & Related papers (2022-11-03T02:21:35Z)