Related papers: Multi-task deep learning for large-scale building detail extraction from high-resolution satellite imagery

Multi-task deep learning for large-scale building detail extraction from high-resolution satellite imagery

URL: http://arxiv.org/abs/2310.18899v1
Date: Sun, 29 Oct 2023 04:43:30 GMT
Title: Multi-task deep learning for large-scale building detail extraction from high-resolution satellite imagery
Authors: Zhen Qian, Min Chen, Zhuo Sun, Fan Zhang, Qingsong Xu, Jinzhao Guo, Zhiwei Xie, Zhixin Zhang
Abstract summary: Multi-task Building Refiner (MT-BR) is an adaptable neural network tailored for simultaneous extraction of building details from satellite imagery. For large-scale applications, we devise a novel spatial sampling scheme that strategically selects limited but representative image samples. MT-BR consistently outperforms other state-of-the-art methods in extracting building details across various metrics.
Score: 13.544826927121992
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Understanding urban dynamics and promoting sustainable development requires comprehensive insights about buildings. While geospatial artificial intelligence has advanced the extraction of such details from Earth observational data, existing methods often suffer from computational inefficiencies and inconsistencies when compiling unified building-related datasets for practical applications. To bridge this gap, we introduce the Multi-task Building Refiner (MT-BR), an adaptable neural network tailored for simultaneous extraction of spatial and attributional building details from high-resolution satellite imagery, exemplified by building rooftops, urban functional types, and roof architectural types. Notably, MT-BR can be fine-tuned to incorporate additional building details, extending its applicability. For large-scale applications, we devise a novel spatial sampling scheme that strategically selects limited but representative image samples. This process optimizes both the spatial distribution of samples and the urban environmental characteristics they contain, thus enhancing extraction effectiveness while curtailing data preparation expenditures. We further enhance MT-BR's predictive performance and generalization capabilities through the integration of advanced augmentation techniques. Our quantitative results highlight the efficacy of the proposed methods. Specifically, networks trained with datasets curated via our sampling method demonstrate improved predictive accuracy relative to those using alternative sampling approaches, with no alterations to network architecture. Moreover, MT-BR consistently outperforms other state-of-the-art methods in extracting building details across various metrics. The real-world practicality is also demonstrated in an application across Shanghai, generating a unified dataset that encompasses both the spatial and attributional details of buildings.

Related papers

Synthetic Data Matters: Re-training with Geo-typical Synthetic Labels for Building Detection [13.550020274133866]
We propose re-training models at test time using synthetic data tailored to the target region's city layout.<n>This method generates geo-typical synthetic data that closely replicates the urban structure of a target area.<n>Experiments demonstrate significant performance enhancements, with median improvements of up to 12%, depending on the domain gap.
arXiv Detail & Related papers (2025-07-22T14:53:13Z)
Knowledge-guided Complex Diffusion Model for PolSAR Image Classification in Contourlet Domain [58.46450049579116]
We propose a knowledge-guided complex diffusion model for PolSAR image classification in the Contourlet domain.<n> Specifically, the Contourlet transform is first applied to decompose the data into low- and high-frequency subbands.<n>A knowledge-guided complex diffusion network is then designed to model the statistical properties of the low-frequency components.
arXiv Detail & Related papers (2025-07-08T04:50:28Z)
Spatial Understanding from Videos: Structured Prompts Meet Simulation Data [79.52833996220059]
We present a unified framework for enhancing 3D spatial reasoning in pre-trained vision-language models without modifying their architecture.<n>This framework combines SpatialMind, a structured prompting strategy that decomposes complex scenes and questions into interpretable reasoning steps, with ScanForgeQA, a scalable question-answering dataset built from diverse 3D simulation scenes.
arXiv Detail & Related papers (2025-06-04T07:36:33Z)
A Novel Shape-Aware Topological Representation for GPR Data with DNN Integration [3.367318729981566]
Ground Penetrating Radar (GPR) is a widely used Non-Destructive Testing (NDT) technique for subsurface exploration.<n>This study presents a novel framework that enhances the detection of underground utilities, especially pipelines.<n>We propose a novel shape-aware topological representation that amplifies structural features in the input data, thereby improving the model's responsiveness to the geometrical features of buried objects.
arXiv Detail & Related papers (2025-05-26T10:43:34Z)
Latent Diffusion Planning for Imitation Learning [78.56207566743154]
Latent Diffusion Planning (LDP) is a modular approach consisting of a planner and inverse dynamics model. By separating planning from action prediction, LDP can benefit from the denser supervision signals of suboptimal and action-free data. On simulated visual robotic manipulation tasks, LDP outperforms state-of-the-art imitation learning approaches.
arXiv Detail & Related papers (2025-04-23T17:53:34Z)
Towards Scalable Foundation Model for Multi-modal and Hyperspectral Geospatial Data [14.104497777255137]
We introduce Low-rank Efficient Spatial-Spectral Vision Transformer with three key innovations. We pretrain LESS ViT using a Hyperspectral Masked Autoencoder framework with integrated positional and channel masking strategies. Experimental results demonstrate that our proposed method achieves competitive performance against state-of-the-art multi-modal geospatial foundation models.
arXiv Detail & Related papers (2025-03-17T05:42:19Z)
Classification of residential and non-residential buildings based on satellite data using deep learning [0.0]
In this paper, we are proposing a novel deep learning approach that combines high-resolution satellite data and vector data to achieve high-performance building classification. Experimental results on a large-scale dataset demonstrate the effectiveness of our model, achieving an impressive overall F1 -score of 0.9936.
arXiv Detail & Related papers (2024-11-11T11:23:43Z)
Fine-Grained Building Function Recognition from Street-View Images via Geometry-Aware Semi-Supervised Learning [18.432786227782803]
We propose a geometry-aware semi-supervised framework for fine-grained building function recognition. We use geometric relationships among multi-source data to enhance pseudo-label accuracy in semi-supervised learning. Our proposed framework exhibits superior performance in fine-grained functional recognition of buildings.
arXiv Detail & Related papers (2024-08-18T12:48:48Z)
IsUMap: Manifold Learning and Data Visualization leveraging Vietoris-Rips filtrations [0.08796261172196743]
We present a systematic and detailed construction of a metric representation for locally distorted metric spaces. Our approach addresses limitations in existing methods by accommodating non-uniform data distributions and intricate local geometries.
arXiv Detail & Related papers (2024-07-25T07:46:30Z)
Hierarchical Features Matter: A Deep Exploration of GAN Priors for Improved Dataset Distillation [51.44054828384487]
We propose a novel parameterization method dubbed Hierarchical Generative Latent Distillation (H-GLaD) This method systematically explores hierarchical layers within the generative adversarial networks (GANs) In addition, we introduce a novel class-relevant feature distance metric to alleviate the computational burden associated with synthetic dataset evaluation.
arXiv Detail & Related papers (2024-06-09T09:15:54Z)
Mechanistic Design and Scaling of Hybrid Architectures [114.3129802943915]
We identify and test new hybrid architectures constructed from a variety of computational primitives. We experimentally validate the resulting architectures via an extensive compute-optimal and a new state-optimal scaling law analysis. We find MAD synthetics to correlate with compute-optimal perplexity, enabling accurate evaluation of new architectures.
arXiv Detail & Related papers (2024-03-26T16:33:12Z)
GBSS:a global building semantic segmentation dataset for large-scale remote sensing building extraction [10.39943244036649]
We construct a Global Building Semantic dataset (The dataset will be released), which comprises 116.9k pairs of samples (about 742k buildings) from six continents. There are significant variations of building samples in terms of size and style, so the dataset can be a more challenging benchmark for evaluating the generalization and robustness of building semantic segmentation models.
arXiv Detail & Related papers (2024-01-02T12:13:35Z)
Semantic Segmentation of Vegetation in Remote Sensing Imagery Using Deep Learning [77.34726150561087]
We propose an approach for creating a multi-modal and large-temporal dataset comprised of publicly available Remote Sensing data. We use Convolutional Neural Networks (CNN) models that are capable of separating different classes of vegetation.
arXiv Detail & Related papers (2022-09-28T18:51:59Z)
A diverse large-scale building dataset and a novel plug-and-play domain generalization method for building extraction [2.578242050187029]
We introduce a new building dataset and propose a novel domain generalization method to facilitate the development of building extraction from remote sensing images. The WHU-Mix building dataset consists of a training/validation set containing 43,727 diverse images collected from all over the world, and a test set containing 8402 images from five other cities on five continents. To further improve the generalization ability of a building extraction model, we propose a domain generalization method named batch style mixing (BSM)
arXiv Detail & Related papers (2022-08-22T01:43:13Z)
CAFE: Learning to Condense Dataset by Aligning Features [72.99394941348757]
We propose a novel scheme to Condense dataset by Aligning FEatures (CAFE) At the heart of our approach is an effective strategy to align features from the real and synthetic data across various scales. We validate the proposed CAFE across various datasets, and demonstrate that it generally outperforms the state of the art.
arXiv Detail & Related papers (2022-03-03T05:58:49Z)
Shared Space Transfer Learning for analyzing multi-site fMRI data [83.41324371491774]
Multi-voxel pattern analysis (MVPA) learns predictive models from task-based functional magnetic resonance imaging (fMRI) data. MVPA works best with a well-designed feature set and an adequate sample size. Most fMRI datasets are noisy, high-dimensional, expensive to collect, and with small sample sizes. This paper proposes the Shared Space Transfer Learning (SSTL) as a novel transfer learning approach.
arXiv Detail & Related papers (2020-10-24T08:50:26Z)
Spatial-Spectral Residual Network for Hyperspectral Image Super-Resolution [82.1739023587565]
We propose a novel spectral-spatial residual network for hyperspectral image super-resolution (SSRNet) Our method can effectively explore spatial-spectral information by using 3D convolution instead of 2D convolution, which enables the network to better extract potential information. In each unit, we employ spatial and temporal separable 3D convolution to extract spatial and spectral information, which not only reduces unaffordable memory usage and high computational cost, but also makes the network easier to train.
arXiv Detail & Related papers (2020-01-14T03:34:55Z)

This list is automatically generated from the titles and abstracts of the papers in this site.