PolyFootNet: Extracting Polygonal Building Footprints in Off-Nadir Remote Sensing Images
- URL: http://arxiv.org/abs/2408.08645v4
- Date: Tue, 22 Apr 2025 09:27:53 GMT
- Title: PolyFootNet: Extracting Polygonal Building Footprints in Off-Nadir Remote Sensing Images
- Authors: Kai Li, Yupeng Deng, Jingbo Chen, Yu Meng, Zhihao Xi, Junxian Ma, Chenhao Wang, Maolin Wang, Xiangyu Zhao,
- Abstract summary: PolyFootNet is a novel deep-learning framework that directly outputs polygonal building footprints without requiring external post-processing steps. A key contribution of PolyFootNet is introducing the Self Offset Attention mechanism, grounded in Nadaraya-Watson regression, to effectively mitigate the accuracy discrepancy observed between low-rise and high-rise buildings.
- Score: 27.58630526006379
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Extracting polygonal building footprints from off-nadir imagery is crucial for diverse applications. Current deep-learning-based extraction approaches predominantly rely on semantic segmentation paradigms and post-processing algorithms, limiting their boundary precision and applicability. However, existing polygonal extraction methodologies are inherently designed for near-nadir imagery and fail under the geometric complexities introduced by off-nadir viewing angles. To address these challenges, this paper introduces Polygonal Footprint Network (PolyFootNet), a novel deep-learning framework that directly outputs polygonal building footprints without requiring external post-processing steps. PolyFootNet employs a High-Quality Mask Prompter to generate precise roof masks, which guide polygonal vertex extraction in a unified model pipeline. A key contribution of PolyFootNet is introducing the Self Offset Attention mechanism, grounded in Nadaraya-Watson regression, to effectively mitigate the accuracy discrepancy observed between low-rise and high-rise buildings. This approach allows low-rise building predictions to leverage angular corrections learned from high-rise building offsets, significantly enhancing overall extraction accuracy. Additionally, motivated by the inherent ambiguity of building footprint extraction tasks, we systematically investigate alternative extraction paradigms and demonstrate that a combined approach of building masks and offsets achieves superior polygonal footprint results. Extensive experiments validate PolyFootNet's effectiveness, illustrating its promising potential as a robust, generalizable, and precise polygonal building footprint extraction method from challenging off-nadir imagery. To facilitate further research, we will release pre-trained weights of our offset prediction module at https://github.com/likaiucas/PolyFootNet.
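Because the abstract describes Self Offset Attention as grounded in Nadaraya-Watson regression, the kernel-regression view of attention can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation; the function name, feature shapes, and the pairing of low-rise queries with high-rise keys/values are assumptions made for illustration.

```python
import numpy as np

def nadaraya_watson_attention(queries, keys, values, bandwidth=1.0):
    """Nadaraya-Watson kernel regression written in attention form
    (illustrative sketch, not the authors' Self Offset Attention).

    queries : (Q, D) feature vectors, e.g. of low-rise buildings
    keys    : (K, D) feature vectors, e.g. of high-rise buildings
    values  : (K, V) quantities to regress, e.g. offset-angle corrections
    """
    # Squared distances between every query and every key: (Q, K)
    d2 = ((queries[:, None, :] - keys[None, :, :]) ** 2).sum(-1)
    # Gaussian kernel weights, normalised per query (a softmax over keys)
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))
    w /= w.sum(axis=1, keepdims=True)
    # Kernel-weighted average of the values: (Q, V)
    return w @ values

# Toy usage: 3 low-rise queries borrow corrections from 5 high-rise samples.
rng = np.random.default_rng(0)
q = rng.normal(size=(3, 8))
k = rng.normal(size=(5, 8))
v = rng.normal(size=(5, 2))          # e.g. 2-D offset corrections
print(nadaraya_watson_attention(q, k, v).shape)  # (3, 2)
```

Under this reading, each low-rise prediction becomes a kernel-weighted average of quantities observed for high-rise buildings, which matches the abstract's description of low-rise predictions leveraging angular corrections learned from high-rise offsets.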
Related papers
- LDPoly: Latent Diffusion for Polygonal Road Outline Extraction in Large-Scale Topographic Mapping [5.093758132026397]
We introduce LDPoly, the first framework for extracting polygonal road outlines from high-resolution aerial images.
We evaluate LDPoly on a new benchmark dataset, Map2ImLas, which contains detailed polygonal annotations for various topographic objects in several Dutch regions.
arXiv Detail & Related papers (2025-04-29T11:13:33Z) - Pix2Poly: A Sequence Prediction Method for End-to-end Polygonal Building Footprint Extraction from Remote Sensing Imagery [2.867517731896504]
Pix2Poly is an end-to-end trainable and differentiable deep neural network capable of directly generating explicit high-quality building footprints in a ring graph format.
Compared to previous graph learning methods, ours is a truly end-to-end trainable approach that extracts high-quality building footprints and road networks without requiring complicated, computationally intensive loss functions and intricate training pipelines.
arXiv Detail & Related papers (2024-12-10T20:10:46Z) - SpaceMesh: A Continuous Representation for Learning Manifold Surface Meshes [61.110517195874074]
We present a scheme to directly generate manifold, polygonal meshes of complex connectivity as the output of a neural network.
Our key innovation is to define a continuous latent connectivity space at each mesh vertex, which implies the discrete mesh.
In applications, this approach not only yields high-quality outputs from generative models, but also enables directly learning challenging geometry processing tasks such as mesh repair.
arXiv Detail & Related papers (2024-09-30T17:59:03Z) - RoIPoly: Vectorized Building Outline Extraction Using Vertex and Logit Embeddings [5.093758132026397]
We propose a novel query-based approach for extracting building outlines from aerial or satellite imagery.
We formulate each polygon as a query and constrain each query's attention to the most relevant regions of a potential building.
We evaluate our method on the vectorized building outline extraction dataset CrowdAI and the 2D floorplan reconstruction dataset Structured3D.
arXiv Detail & Related papers (2024-07-20T16:12:51Z) - Enhancing Polygonal Building Segmentation via Oriented Corners [0.3749861135832072]
This paper introduces a novel deep convolutional neural network named OriCornerNet, which directly extracts delineated building polygons from input images.
Our approach involves a deep model that predicts building footprint masks, corners, and orientation vectors that indicate directions toward adjacent corners.
Performance evaluations conducted on SpaceNet Vegas and CrowdAI-small datasets demonstrate the competitive efficacy of our approach.
arXiv Detail & Related papers (2024-07-17T01:59:06Z) - Bringing Masked Autoencoders Explicit Contrastive Properties for Point Cloud Self-Supervised Learning [116.75939193785143]
Contrastive learning (CL) for Vision Transformers (ViTs) in image domains has achieved performance comparable to CL for traditional convolutional backbones.
In 3D point cloud pretraining with ViTs, masked autoencoder (MAE) modeling remains dominant.
arXiv Detail & Related papers (2024-07-08T12:28:56Z) - Object-level Scene Deocclusion [92.39886029550286]
We present a new self-supervised PArallel visible-to-COmplete diffusion framework, named PACO, for object-level scene deocclusion.
To train PACO, we create a large-scale dataset with 500k samples to enable self-supervised learning.
Experiments on COCOA and various real-world scenes demonstrate the superior capability of PACO for scene deocclusion, surpassing the state of the art by a large margin.
arXiv Detail & Related papers (2024-06-11T20:34:10Z) - Arbitrary-Scale Point Cloud Upsampling by Voxel-Based Network with
Latent Geometric-Consistent Learning [52.825441454264585]
We propose an arbitrary-scale Point cloud Upsampling framework using a Voxel-based Network (PU-VoxelNet).
Thanks to the completeness and regularity inherited from the voxel representation, voxel-based networks can provide a predefined grid space to approximate the 3D surface.
A density-guided grid resampling method is developed to generate high-fidelity points while effectively avoiding sampling outliers.
arXiv Detail & Related papers (2024-03-08T07:31:14Z) - 360 Layout Estimation via Orthogonal Planes Disentanglement and Multi-view Geometric Consistency Perception [56.84921040837699]
Existing panoramic layout estimation solutions tend to recover room boundaries from a vertically compressed sequence, yielding imprecise results.
We propose an orthogonal plane disentanglement network (termed DOPNet) to distinguish ambiguous semantics.
We also present an unsupervised adaptation technique tailored for horizon-depth and ratio representations.
Our solution outperforms other SoTA models on both monocular layout estimation and multi-view layout estimation tasks.
arXiv Detail & Related papers (2023-12-26T12:16:03Z) - Progressive Evolution from Single-Point to Polygon for Scene Text [79.29097971932529]
We introduce Point2Polygon, which can efficiently transform single-points into compact polygons.
Our method uses a coarse-to-fine process, starting with creating anchor points based on recognition confidence, then vertically and horizontally refining the polygon.
When training detectors with polygons generated by our method, we attained 86% of the accuracy achieved with ground-truth (GT) polygons. Additionally, the proposed Point2Polygon can be seamlessly integrated to empower single-point spotters to generate polygons.
arXiv Detail & Related papers (2023-12-21T12:08:27Z) - Leveraging Large-Scale Pretrained Vision Foundation Models for
Label-Efficient 3D Point Cloud Segmentation [67.07112533415116]
We present a novel framework that adapts various foundational models for the 3D point cloud segmentation task.
Our approach involves making initial predictions of 2D semantic masks using different large vision models.
To generate robust 3D semantic pseudo labels, we introduce a semantic label fusion strategy that effectively combines all the results via voting.
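As a rough illustration of the voting step described above, the following is a generic majority-vote fusion sketch under assumed array shapes; the paper's actual fusion strategy may weight the models differently.

```python
import numpy as np

def fuse_labels_by_voting(per_model_labels):
    """Majority-vote fusion of per-model semantic labels into pseudo labels.

    per_model_labels : (M, N) int array, labels from M models for N points
    returns          : (N,) fused pseudo label per point
    """
    labels = np.asarray(per_model_labels)
    n_classes = labels.max() + 1
    # Count, for every point, how many models voted for each class ...
    votes = np.apply_along_axis(np.bincount, 0, labels, minlength=n_classes)
    # ... and keep the class with the most votes.
    return votes.argmax(axis=0)

# Toy usage: three models label five points; ties resolve to the lower class id.
print(fuse_labels_by_voting([[0, 1, 2, 2, 0],
                             [0, 1, 1, 2, 1],
                             [1, 1, 2, 0, 1]]))   # -> [0 1 2 2 1]
```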
arXiv Detail & Related papers (2023-11-03T15:41:15Z) - Prompt-Driven Building Footprint Extraction in Aerial Images with
Offset-Building Model [11.1278832358904]
We propose a promptable framework for roof and offset extraction.
Within this framework, we propose a novel Offset-Building Model (OBM).
Our model reduces offset errors by 16.6% and improves roof Intersection over Union (IoU) by 10.8% compared to other models.
arXiv Detail & Related papers (2023-10-25T15:44:50Z) - Take-A-Photo: 3D-to-2D Generative Pre-training of Point Cloud Models [97.58685709663287]
Generative pre-training can boost the performance of fundamental models in 2D vision.
In 3D vision, the over-reliance on Transformer-based backbones and the unordered nature of point clouds have restricted the further development of generative pre-training.
We propose a novel 3D-to-2D generative pre-training method that is adaptable to any point cloud model.
arXiv Detail & Related papers (2023-07-27T16:07:03Z) - Semi-supervised Learning from Street-View Images and OpenStreetMap for
Automatic Building Height Estimation [59.6553058160943]
We propose a semi-supervised learning (SSL) method for automatically estimating building height from Mapillary SVI and OpenStreetMap data.
The proposed method leads to a clear performance boost, estimating building heights with a Mean Absolute Error (MAE) of around 2.1 meters.
The preliminary result is promising and motivates our future work in scaling up the proposed method based on low-cost VGI data.
arXiv Detail & Related papers (2023-07-05T18:16:30Z) - BiSVP: Building Footprint Extraction via Bidirectional Serialized Vertex
Prediction [43.61580149432732]
BiSVP is a refinement-free and end-to-end building footprint extraction method.
We propose a cross-scale feature fusion (CSFF) module to facilitate high resolution and rich semantic feature learning.
Our BiSVP outperforms state-of-the-art methods by considerable margins on three building instance segmentation benchmarks.
arXiv Detail & Related papers (2023-03-01T07:50:34Z) - PolyBuilding: Polygon Transformer for End-to-End Building Extraction [9.196604757138825]
PolyBuilding predicts vector representations of buildings from remote sensing images.
The model learns the relations among polygon queries and encodes context information from the image to predict the final set of building polygons.
It also achieves a new state-of-the-art in terms of pixel-level coverage, instance-level precision and recall, and geometry-level properties.
arXiv Detail & Related papers (2022-11-03T04:53:17Z) - Towards General-Purpose Representation Learning of Polygonal Geometries [62.34832826705641]
We develop a general-purpose polygon encoding model, which can encode a polygonal geometry into an embedding space.
We conduct experiments on two tasks: 1) shape classification based on MNIST; 2) spatial relation prediction based on two new datasets - DBSR-46K and DBSR-cplx46K.
Our results show that NUFTspec and ResNet1D outperform multiple existing baselines with significant margins.
arXiv Detail & Related papers (2022-09-29T15:59:23Z) - Learning to Extract Building Footprints from Off-Nadir Aerial Images [33.2991137981025]
Existing approaches assume that the roof and footprint of a building overlap well, which may not hold in off-nadir aerial images.
We propose an offset vector learning scheme, which turns the building footprint extraction problem into an instance-level joint prediction problem.
A new dataset, Buildings in Off-Nadir Aerial Images (BONAI), is created and released in this paper.
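The roof-plus-offset formulation summarized here (and central to the PolyFootNet abstract above) can be illustrated with a minimal sketch: a predicted roof polygon is translated by a predicted roof-to-footprint offset to obtain the footprint polygon. The names and the single per-building offset are simplifying assumptions for illustration; in practice the offset is predicted by a network.

```python
import numpy as np

def roof_to_footprint(roof_polygon, offset):
    """Translate a roof polygon by a roof-to-footprint offset vector
    (illustrative sketch; BONAI/PolyFootNet predict the offset with a network).

    roof_polygon : (N, 2) array of (x, y) vertices in pixel coordinates
    offset       : (2,) vector pointing from the roof to the footprint
    """
    return np.asarray(roof_polygon, dtype=float) + np.asarray(offset, dtype=float)

# Toy usage: a square roof shifted by a per-building offset of (-12, 18) px.
roof = np.array([[100, 100], [140, 100], [140, 140], [100, 140]])
print(roof_to_footprint(roof, (-12.0, 18.0)))
```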
arXiv Detail & Related papers (2022-04-28T16:56:06Z) - PolyWorld: Polygonal Building Extraction with Graph Neural Networks in
Satellite Images [10.661430927191205]
This paper introduces PolyWorld, a neural network that directly extracts building vertices from an image and connects them correctly to create precise polygons.
PolyWorld significantly outperforms the state-of-the-art in building polygonization.
arXiv Detail & Related papers (2021-11-30T15:23:17Z) - Voxel-based Network for Shape Completion by Leveraging Edge Generation [76.23436070605348]
We develop a voxel-based network for point cloud completion by leveraging edge generation (VE-PCN).
We first embed point clouds into regular voxel grids, and then generate complete objects with the help of the hallucinated shape edges.
This decoupled architecture, together with multi-scale grid feature learning, is able to generate more realistic on-surface details.
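A generic sketch of the first step mentioned above, embedding a point cloud into a regular occupancy voxel grid, is shown below; it is illustrative only, and VE-PCN's actual voxel embedding and features may differ.

```python
import numpy as np

def voxelize_occupancy(points, voxel_size=0.05):
    """Embed a point cloud into a regular boolean occupancy grid (generic sketch).

    points     : (N, 3) array of xyz coordinates
    voxel_size : edge length of each voxel
    """
    pts = np.asarray(points, dtype=float)
    origin = pts.min(axis=0)
    # Integer voxel index of every point, relative to the grid origin.
    idx = np.floor((pts - origin) / voxel_size).astype(int)
    grid = np.zeros(idx.max(axis=0) + 1, dtype=bool)
    grid[tuple(idx.T)] = True          # mark every occupied voxel
    return grid, origin

# Toy usage: 1000 random points in a unit cube on a 0.1-sized grid.
pts = np.random.default_rng(1).random((1000, 3))
grid, origin = voxelize_occupancy(pts, voxel_size=0.1)
print(grid.shape, grid.sum())
```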
arXiv Detail & Related papers (2021-08-23T05:10:29Z) - Hierarchical Convolutional Neural Network with Feature Preservation and
Autotuned Thresholding for Crack Detection [5.735035463793008]
Drone imagery is increasingly used in automated inspection for infrastructure surface defects.
This paper proposes a deep learning approach using hierarchical convolutional neural networks with feature preservation.
The proposed technique is then applied to identify cracks on the surface of roads, bridges, or pavements.
arXiv Detail & Related papers (2021-04-21T13:07:58Z) - Quantization in Relative Gradient Angle Domain For Building Polygon
Estimation [88.80146152060888]
CNN approaches often generate imprecise building morphologies including noisy edges and round corners.
We propose a module that uses prior knowledge of building corners to create angular and concise building polygons from CNN segmentation outputs.
Experimental results demonstrate that our method refines CNN output from a rounded approximation to a more clear-cut angular shape of the building footprint.
arXiv Detail & Related papers (2020-07-10T21:33:06Z)