BuildFormer: Automatic building extraction with vision transformer
- URL: http://arxiv.org/abs/2111.15637v1
- Date: Mon, 29 Nov 2021 11:23:52 GMT
- Title: BuildFormer: Automatic building extraction with vision transformer
- Authors: Libo Wang, Yuechi Yang, Rui Li
- Abstract summary: We propose a novel transformer-based network for extracting buildings from fine-resolution remote sensing images, namely BuildFormer.
In comparison with ResNet, the proposed method achieves an improvement of 2% in mIoU on the WHU building dataset.
- Score: 7.577142111447444
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Building extraction from fine-resolution remote sensing images plays a vital
role in numerous geospatial applications, such as urban planning, population
statistics, economic assessment and disaster management. With the advancement of
deep learning technology, deep convolutional neural networks (DCNNs) have
dominated the automatic building extraction task for many years. However, the
local property of DCNNs limits the extraction of global information, weakening
the network's ability to recognize building instances. Recently, the
Transformer has become a hot topic in the computer vision domain, achieving
state-of-the-art performance in fundamental vision tasks such as image
classification, semantic segmentation and object detection. Inspired by this,
in this paper we propose a novel transformer-based network for extracting
buildings from fine-resolution remote sensing images, namely BuildFormer.
Compared with ResNet, the proposed method achieves an improvement of 2%
in mIoU on the WHU building dataset.
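The paper itself is only summarized above, so the following is a minimal, self-contained sketch of the general idea the abstract describes: replacing a convolutional backbone with a vision-transformer encoder so every token attends globally, then decoding per-pixel building masks and scoring them with IoU. All class names, layer sizes and the toy IoU helper are illustrative assumptions, not the authors' BuildFormer implementation.

```python
# A minimal, illustrative sketch (NOT the authors' BuildFormer): a ViT-style
# encoder for binary building masks, plus a toy IoU metric. All names and
# hyperparameters below are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyViTSegmenter(nn.Module):
    def __init__(self, in_ch=3, embed_dim=256, depth=4, heads=8, patch=16):
        super().__init__()
        # Patch embedding: a strided conv turns the image into a token grid.
        # (Positional embeddings are omitted here for brevity.)
        self.embed = nn.Conv2d(in_ch, embed_dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=heads,
            dim_feedforward=4 * embed_dim, batch_first=True)
        # Self-attention gives every token a global receptive field, which is
        # the property the abstract contrasts with the locality of DCNNs.
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Conv2d(embed_dim, 1, kernel_size=1)  # per-token logit

    def forward(self, x):
        b, _, h, w = x.shape
        t = self.embed(x)                             # (B, C, h/p, w/p)
        gh, gw = t.shape[2], t.shape[3]
        t = t.flatten(2).transpose(1, 2)              # (B, N, C) tokens
        t = self.encoder(t)                           # global token mixing
        t = t.transpose(1, 2).reshape(b, -1, gh, gw)  # back to a feature map
        logits = self.head(t)
        # Upsample coarse token logits back to full pixel resolution.
        return F.interpolate(logits, size=(h, w), mode="bilinear",
                             align_corners=False)

def binary_iou(pred, gt, eps=1e-6):
    """IoU for one class; mIoU (as reported above) averages IoU over classes."""
    inter = (pred & gt).sum().item()
    union = (pred | gt).sum().item()
    return (inter + eps) / (union + eps)

if __name__ == "__main__":
    model = ToyViTSegmenter()
    tile = torch.randn(2, 3, 256, 256)                # fine-resolution tiles
    mask = model(tile).sigmoid() > 0.5                # (2, 1, 256, 256) bool
    print(mask.shape, binary_iou(mask, torch.zeros_like(mask)))
```

The 2% mIoU figure quoted above comes from the paper's own comparison on the WHU building dataset; the helper here only shows how such a score is computed, not how to reproduce it.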
Related papers
- Boosting Cross-Domain Point Classification via Distilling Relational Priors from 2D Transformers [59.0181939916084]
Traditional 3D networks mainly focus on local geometric details and ignore the topological structure between local geometries.
We propose a novel Relational Priors Distillation (RPD) method to extract relational priors from transformers well-trained on massive images.
Experiments on the PointDA-10 and the Sim-to-Real datasets verify that the proposed method consistently achieves state-of-the-art unsupervised domain adaptation (UDA) performance for point cloud classification.
arXiv Detail & Related papers (2024-07-26T06:29:09Z) - RSBuilding: Towards General Remote Sensing Image Building Extraction and Change Detection with Foundation Model [22.56227565913003]
We propose a comprehensive remote sensing image building model, termed RSBuilding, developed from the perspective of the foundation model.
RSBuilding is designed to enhance cross-scene generalization and task understanding.
Our model was trained on a dataset comprising up to 245,000 images and validated on multiple building extraction and change detection datasets.
arXiv Detail & Related papers (2024-03-12T11:51:59Z) - Point-aware Interaction and CNN-induced Refinement Network for RGB-D
Salient Object Detection [95.84616822805664]
We introduce a CNN-assisted Transformer architecture and propose a novel RGB-D SOD network with Point-aware Interaction and CNN-induced Refinement.
To alleviate the block effect and detail destruction problems naturally brought by the Transformer, we design a CNN-induced refinement (CNNR) unit for content refinement and supplementation.
arXiv Detail & Related papers (2023-08-17T11:57:49Z) - Building Extraction from Remote Sensing Images via an Uncertainty-Aware
Network [18.365220543556113]
Building extraction plays an essential role in many applications, such as city planning and urban dynamic monitoring.
We propose a novel and straightforward Uncertainty-Aware Network (UANet) to alleviate this problem.
Results demonstrate that the proposed UANet outperforms other state-of-the-art algorithms by a large margin.
arXiv Detail & Related papers (2023-07-23T12:42:15Z) - Vision Transformer with Convolutions Architecture Search [72.70461709267497]
We propose an architecture search method, Vision Transformer with Convolutions Architecture Search (VTCAS).
The high-performance backbone network searched by VTCAS introduces the desirable features of convolutional neural networks into the Transformer architecture.
It enhances the robustness of the neural network for object recognition, especially in low-illumination indoor scenes.
arXiv Detail & Related papers (2022-03-20T02:59:51Z) - Rich CNN-Transformer Feature Aggregation Networks for Super-Resolution [50.10987776141901]
Recent vision transformers along with self-attention have achieved promising results on various computer vision tasks.
We introduce an effective hybrid architecture for super-resolution (SR) tasks, which leverages local features from CNNs and long-range dependencies captured by Transformers (a generic sketch of this local-global fusion pattern appears after this list).
Our proposed method achieves state-of-the-art SR results on numerous benchmark datasets.
arXiv Detail & Related papers (2022-03-15T06:52:25Z) - Dual-Tasks Siamese Transformer Framework for Building Damage Assessment [11.888964682446879]
We present the first attempt at designing a Transformer-based damage assessment architecture (DamFormer).
To the best of our knowledge, it is the first time that such a deep Transformer-based network is proposed for multitemporal remote sensing interpretation tasks.
arXiv Detail & Related papers (2022-01-26T14:11:16Z) - Efficient Hybrid Transformer: Learning Global-local Context for Urban
Sence Segmentation [11.237929167356725]
We propose an efficient hybrid Transformer (EHT) for semantic segmentation of urban scene images.
EHT takes advantage of both CNNs and Transformers, learning global-local context to strengthen the feature representation.
The proposed EHT achieves a 67.0% mIoU on the UAVid test set and outperforms other lightweight models significantly.
arXiv Detail & Related papers (2021-09-18T13:55:38Z) - PC-RGNN: Point Cloud Completion and Graph Neural Network for 3D Object
Detection [57.49788100647103]
LiDAR-based 3D object detection is an important task for autonomous driving.
Current approaches suffer from sparse and partial point clouds of distant and occluded objects.
In this paper, we propose a novel two-stage approach, namely PC-RGNN, dealing with such challenges by two specific solutions.
arXiv Detail & Related papers (2020-12-18T18:06:43Z) - Ventral-Dorsal Neural Networks: Object Detection via Selective Attention [51.79577908317031]
We propose a new framework called Ventral-Dorsal Networks (VDNets).
Inspired by the structure of the human visual system, we propose the integration of a "Ventral Network" and a "Dorsal Network".
Our experimental results reveal that the proposed method outperforms state-of-the-art object detection approaches.
arXiv Detail & Related papers (2020-05-15T23:57:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences arising from its use.