RoIPoly: Vectorized Building Outline Extraction Using Vertex and Logit Embeddings
- URL: http://arxiv.org/abs/2407.14920v1
- Date: Sat, 20 Jul 2024 16:12:51 GMT
- Title: RoIPoly: Vectorized Building Outline Extraction Using Vertex and Logit Embeddings
- Authors: Weiqin Jiao, Hao Cheng, Claudio Persello, George Vosselman
- Abstract summary: We propose a novel query-based approach for extracting building outlines from aerial or satellite imagery.
We formulate each vertex as a query and constrain the query attention on the most relevant regions of a potential building.
We evaluate our method on the vectorized building outline extraction dataset CrowdAI and the 2D floorplan reconstruction dataset Structured3D.
- Score: 5.093758132026397
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Polygonal building outlines are crucial for geographic and cartographic applications. The existing approaches for outline extraction from aerial or satellite imagery are typically decomposed into subtasks, e.g., building masking and vectorization, or treat this task as a sequence-to-sequence prediction of ordered vertices. The former lacks efficiency, and the latter often generates redundant vertices, both resulting in suboptimal performance. To handle these issues, we propose a novel Region-of-Interest (RoI) query-based approach called RoIPoly. Specifically, we formulate each vertex as a query and constrain the query attention on the most relevant regions of a potential building, yielding reduced computational overhead and more efficient vertex level interaction. Moreover, we introduce a novel learnable logit embedding to facilitate vertex classification on the attention map; thus, no post-processing is needed for redundant vertex removal. We evaluated our method on the vectorized building outline extraction dataset CrowdAI and the 2D floorplan reconstruction dataset Structured3D. On the CrowdAI dataset, RoIPoly with a ResNet50 backbone outperforms existing methods with the same or better backbones on most MS-COCO metrics, especially on small buildings, and achieves competitive results in polygon quality and vertex redundancy without any post-processing. On the Structured3D dataset, our method achieves the second-best performance on most metrics among existing methods dedicated to 2D floorplan reconstruction, demonstrating our cross-domain generalization capability. The code will be released upon acceptance of this paper.
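The abstract states that vertex classification removes the need for post-processing to drop redundant vertices. The following is a minimal sketch of that idea only, not the authors' implementation: it assumes each building is predicted as a fixed-length, ordered list of vertex queries, each with a coordinate and a validity logit, and that a polygon is decoded by keeping the vertices whose sigmoid score passes a threshold. The function name `decode_polygon`, the threshold of 0.5, and the sample values are all hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Map a raw logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_polygon(vertex_coords, vertex_logits, threshold=0.5):
    """Keep, in their original order, the vertex queries classified as valid.

    vertex_coords: list of (x, y) tuples, one per vertex query (assumed layout).
    vertex_logits: list of raw validity logits, one per vertex query.
    threshold:     hypothetical cut-off on the sigmoid score.
    """
    if len(vertex_coords) != len(vertex_logits):
        raise ValueError("expected one logit per vertex query")
    return [
        xy
        for xy, logit in zip(vertex_coords, vertex_logits)
        if sigmoid(logit) >= threshold
    ]


# Made-up predictions for one building: six vertex queries, two of which
# (a near-duplicate corner and a padding slot) receive negative logits.
coords = [(0.0, 0.0), (4.0, 0.0), (4.1, 0.1), (4.0, 3.0), (0.0, 3.0), (0.0, 0.0)]
logits = [3.2, 2.7, -1.9, 2.4, 3.0, -4.0]

polygon = decode_polygon(coords, logits)
print(polygon)
```

Because low-scoring queries are simply skipped, the surviving vertices already form the polygon in order, which is the sense in which no separate redundancy-removal step is required in this sketch.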
Related papers
- P2PFormer: A Primitive-to-polygon Method for Regular Building Contour Extraction from Remote Sensing Images [5.589842901102337]
Existing methods struggle with irregular contours, rounded corners, and redundancy points.
We introduce a novel, streamlined pipeline that generates regular building contours without post-processing.
P2PFormer achieves new state-of-the-art performance on the WHU, CrowdAI, and WHU-Mix datasets.
arXiv Detail & Related papers (2024-06-05T04:38:45Z)
- PlaneRecTR: Unified Query Learning for 3D Plane Recovery from a Single View [12.343189317320004]
PlaneRecTR is a Transformer-based architecture that unifies all subtasks related to single-view plane recovery with a single compact model.
Our proposed unified learning achieves mutual benefits across subtasks, obtaining a new state-of-the-art performance on public ScanNet and NYUv2-Plane datasets.
arXiv Detail & Related papers (2023-07-25T18:28:19Z)
- BiSVP: Building Footprint Extraction via Bidirectional Serialized Vertex Prediction [43.61580149432732]
BiSVP is a refinement-free and end-to-end building footprint extraction method.
We propose a cross-scale feature fusion (CSFF) module to facilitate high resolution and rich semantic feature learning.
Our BiSVP outperforms state-of-the-art methods by considerable margins on three building instance segmentation benchmarks.
arXiv Detail & Related papers (2023-03-01T07:50:34Z)
- Flattening-Net: Deep Regular 2D Representation for 3D Point Cloud Analysis [66.49788145564004]
We present an unsupervised deep neural architecture called Flattening-Net to represent irregular 3D point clouds of arbitrary geometry and topology.
Our methods perform favorably against the current state-of-the-art competitors.
arXiv Detail & Related papers (2022-12-17T15:05:25Z)
- BuildMapper: A Fully Learnable Framework for Vectorized Building Contour Extraction [3.862461804734488]
We propose the first end-to-end learnable building contour extraction framework, named BuildMapper.
BuildMapper can directly and efficiently delineate building polygons just as a human does.
We show that BuildMapper can achieve a state-of-the-art performance, with a higher mask average precision (AP) and boundary AP than both segmentation-based and contour-based methods.
arXiv Detail & Related papers (2022-11-07T08:58:35Z)
- CAGroup3D: Class-Aware Grouping for 3D Object Detection on Point Clouds [55.44204039410225]
We present a novel two-stage fully sparse convolutional 3D object detection framework, named CAGroup3D.
Our proposed method first generates some high-quality 3D proposals by leveraging the class-aware local group strategy on the object surface voxels.
To recover the features of missed voxels due to incorrect voxel-wise segmentation, we build a fully sparse convolutional RoI pooling module.
arXiv Detail & Related papers (2022-10-09T13:38:48Z)
- Accurate Polygonal Mapping of Buildings in Satellite Imagery [30.262871819346213]
This paper studies the problem of polygonal mapping of buildings by tackling the issue of mask reversibility.
We propose a novel interaction mechanism of feature embedding sourced from different levels of supervision signals to obtain reversible building masks.
We show that the learned reversible building masks take all the merits of the advances of deep convolutional neural networks for high-performing polygonal mapping of buildings.
arXiv Detail & Related papers (2022-08-01T04:54:55Z)
- Neural 3D Scene Reconstruction with the Manhattan-world Assumption [58.90559966227361]
This paper addresses the challenge of reconstructing 3D indoor scenes from multi-view images.
Planar constraints can be conveniently integrated into the recent implicit neural representation-based reconstruction methods.
The proposed method outperforms previous methods by a large margin on 3D reconstruction quality.
arXiv Detail & Related papers (2022-05-05T17:59:55Z)
- Learning Local Displacements for Point Cloud Completion [93.54286830844134]
We propose a novel approach aimed at object and semantic scene completion from a partial scan represented as a 3D point cloud.
Our architecture relies on three novel layers that are used successively within an encoder-decoder structure.
We evaluate both architectures on object and indoor scene completion tasks, achieving state-of-the-art performance.
arXiv Detail & Related papers (2022-03-30T18:31:37Z)
- Pyramid R-CNN: Towards Better Performance and Adaptability for 3D Object Detection [89.66162518035144]
We present a flexible and high-performance framework, named Pyramid R-CNN, for two-stage 3D object detection from point clouds.
We propose a novel second-stage module, named pyramid RoI head, to adaptively learn the features from the sparse points of interest.
Our pyramid RoI head is robust to the sparse and imbalanced circumstances, and can be applied upon various 3D backbones to consistently boost the detection performance.
arXiv Detail & Related papers (2021-09-06T14:17:51Z)
- PC-RGNN: Point Cloud Completion and Graph Neural Network for 3D Object Detection [57.49788100647103]
LiDAR-based 3D object detection is an important task for autonomous driving.
Current approaches suffer from sparse and partial point clouds of distant and occluded objects.
In this paper, we propose a novel two-stage approach, namely PC-RGNN, dealing with such challenges by two specific solutions.
arXiv Detail & Related papers (2020-12-18T18:06:43Z)