HiT: Building Mapping with Hierarchical Transformers
- URL: http://arxiv.org/abs/2309.09643v2
- Date: Wed, 10 Jan 2024 09:50:20 GMT
- Authors: Mingming Zhang, Qingjie Liu, Yunhong Wang
- Abstract summary: We propose a simple and novel building mapping method with Hierarchical Transformers, called HiT.
HiT builds on a two-stage detection architecture by adding a polygon head parallel to classification and bounding box regression heads.
Our method achieves new state-of-the-art instance segmentation and polygonal metrics compared with existing methods.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning-based methods have been extensively explored for automatic
building mapping from high-resolution remote sensing images over recent years.
While most building mapping models produce vector polygons of buildings for
geographic and mapping systems, dominant methods typically decompose polygonal
building extraction into several sub-problems, including segmentation,
polygonization, and regularization, leading to complex inference procedures,
low accuracy, and poor generalization. In this paper, we propose a simple and
novel building mapping method with Hierarchical Transformers, called HiT,
improving polygonal building mapping quality from high-resolution remote
sensing images. HiT builds on a two-stage detection architecture by adding a
polygon head parallel to the classification and bounding box regression heads.
HiT simultaneously outputs building bounding boxes and vector polygons and is
fully end-to-end trainable. The polygon head formulates a building polygon as
serialized vertices with a bidirectional characteristic, a simple and elegant
polygon representation that avoids any fixed start- or end-vertex hypothesis. Under this
new perspective, the polygon head adopts a transformer encoder-decoder
architecture to predict serialized vertices supervised by the designed
bidirectional polygon loss. Furthermore, a hierarchical attention mechanism
combined with convolution operation is introduced in the encoder of the polygon
head, providing more geometric structures of building polygons at vertex and
edge levels. Comprehensive experiments on two benchmarks (the CrowdAI and Inria
datasets) demonstrate that our method achieves new state-of-the-art results on
instance segmentation and polygonal metrics compared with existing methods.
Moreover, qualitative results verify the superiority and effectiveness
of our model under complex scenes.
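The key idea in the abstract — supervising serialized vertices without assuming a fixed start vertex or traversal direction — can be illustrated with a minimal sketch. This is not the paper's actual bidirectional polygon loss (HiT supervises a transformer decoder; see the paper for the exact formulation); it is an assumed toy version that scores a predicted vertex sequence against a target polygon under the best of all cyclic rotations and both orientations:

```python
import math

def bidirectional_polygon_loss(pred, target):
    """Illustrative sketch (not HiT's exact loss): mean L1 distance
    between predicted and target vertex sequences, minimized over all
    cyclic rotations of the target and both traversal directions, so
    no fixed start vertex or orientation is assumed."""
    n = len(target)
    best = math.inf
    for reverse in (False, True):
        # Try both traversal directions of the target polygon.
        seq = list(reversed(target)) if reverse else list(target)
        for shift in range(n):
            # Try every cyclic rotation (i.e., every start vertex).
            rolled = seq[shift:] + seq[:shift]
            loss = sum(abs(px - tx) + abs(py - ty)
                       for (px, py), (tx, ty) in zip(pred, rolled)) / n
            best = min(best, loss)
    return best
```

Under this invariance, a prediction that traces the same polygon from a different start vertex, or in the opposite winding order, incurs zero loss — which is the motivation the abstract gives for dropping the start/end-vertex hypothesis.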
Related papers
- GeoLRM: Geometry-Aware Large Reconstruction Model for High-Quality 3D Gaussian Generation [65.33726478659304]
We introduce the Geometry-Aware Large Reconstruction Model (GeoLRM), an approach which can predict high-quality assets with 512k Gaussians and 21 input images in only 11 GB GPU memory.
Previous works neglect the inherent sparsity of 3D structure and do not utilize explicit geometric relationships between 3D and 2D images.
GeoLRM tackles these issues by incorporating a novel 3D-aware transformer structure that directly processes 3D points and uses deformable cross-attention mechanisms.
arXiv Detail & Related papers (2024-06-21T17:49:31Z) - P2PFormer: A Primitive-to-polygon Method for Regular Building Contour Extraction from Remote Sensing Images [5.589842901102337]
Existing methods struggle with irregular contours, rounded corners, and redundant points.
We introduce a novel, streamlined pipeline that generates regular building contours without post-processing.
P2PFormer achieves new state-of-the-art performance on the WHU, CrowdAI, and WHU-Mix datasets.
arXiv Detail & Related papers (2024-06-05T04:38:45Z) - PolyBuilding: Polygon Transformer for End-to-End Building Extraction [9.196604757138825]
PolyBuilding predicts vector representations of buildings from remote sensing images.
The model learns the relations among its polygon predictions and encodes context information from the image to produce the final set of building polygons.
It also achieves a new state-of-the-art in terms of pixel-level coverage, instance-level precision and recall, and geometry-level properties.
arXiv Detail & Related papers (2022-11-03T04:53:17Z) - Towards General-Purpose Representation Learning of Polygonal Geometries [62.34832826705641]
We develop a general-purpose polygon encoding model, which can encode a polygonal geometry into an embedding space.
We conduct experiments on two tasks: 1) shape classification based on MNIST; 2) spatial relation prediction based on two new datasets - DBSR-46K and DBSR-cplx46K.
Our results show that NUFTspec and ResNet1D outperform multiple existing baselines with significant margins.
arXiv Detail & Related papers (2022-09-29T15:59:23Z) - PolyWorld: Polygonal Building Extraction with Graph Neural Networks in Satellite Images [10.661430927191205]
This paper introduces PolyWorld, a neural network that directly extracts building vertices from an image and connects them correctly to create precise polygons.
PolyWorld significantly outperforms the state-of-the-art in building polygonization.
arXiv Detail & Related papers (2021-11-30T15:23:17Z) - Automated LoD-2 Model Reconstruction from Very-High-Resolution Satellite-derived Digital Surface Model and Orthophoto [1.2691047660244335]
We propose a model-driven method that reconstructs LoD-2 building models following a "decomposition-optimization-fitting" paradigm.
Our proposed method has addressed a few technical caveats over existing methods, resulting in practically high-quality results.
arXiv Detail & Related papers (2021-09-08T19:03:09Z) - Machine-learned 3D Building Vectorization from Satellite Imagery [7.887221474814986]
We propose a machine learning based approach for automatic 3D building reconstruction and vectorization.
Taking a single-channel photogrammetric digital surface model (DSM) and panchromatic (PAN) image as input, we first filter out non-building objects and refine the building shapes.
The refined DSM and the input PAN image are then used through a semantic segmentation network to detect edges and corners of building roofs.
arXiv Detail & Related papers (2021-04-13T19:57:30Z) - DSG-Net: Learning Disentangled Structure and Geometry for 3D Shape Generation [98.96086261213578]
We introduce DSG-Net, a deep neural network that learns a disentangled structured and geometric mesh representation for 3D shapes.
This supports a range of novel shape generation applications with disentangled control, such as varying the structure while keeping the geometry unchanged, or vice versa.
Our method not only supports controllable generation applications but also produces high-quality synthesized shapes, outperforming state-of-the-art methods.
arXiv Detail & Related papers (2020-08-12T17:06:51Z) - Quantization in Relative Gradient Angle Domain For Building Polygon Estimation [88.80146152060888]
CNN approaches often generate imprecise building morphologies including noisy edges and round corners.
We propose a module that uses prior knowledge of building corners to create angular and concise building polygons from CNN segmentation outputs.
Experimental results demonstrate that our method refines CNN output from a rounded approximation to a more clear-cut angular shape of the building footprint.
arXiv Detail & Related papers (2020-07-10T21:33:06Z) - PUGeo-Net: A Geometry-centric Network for 3D Point Cloud Upsampling [103.09504572409449]
We propose a novel deep neural network based method, called PUGeo-Net, to generate uniform dense point clouds.
Thanks to its geometry-centric nature, PUGeo-Net works well for both CAD models with sharp features and scanned models with rich geometric details.
arXiv Detail & Related papers (2020-02-24T14:13:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.