Swin Transformer coupling CNNs Makes Strong Contextual Encoders for VHR
Image Road Extraction
- URL: http://arxiv.org/abs/2201.03178v2
- Date: Sun, 28 May 2023 06:57:17 GMT
- Title: Swin Transformer coupling CNNs Makes Strong Contextual Encoders for VHR
Image Road Extraction
- Authors: Tao Chen, Yiran Liu, Haoyu Jiang, Ruirui Li
- Abstract summary: We propose a dual-branch network block named ConSwin that combines ResNet and Swin Transformer for road extraction tasks.
Our proposed method outperforms state-of-the-art methods on both the Massachusetts and CHN6-CUG datasets in terms of overall accuracy, IoU, and F1 metrics.
- Score: 11.308473487002782
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Accurately segmenting roads is challenging due to substantial intra-class
variations, indistinct inter-class distinctions, and occlusions caused by
shadows, trees, and buildings. To address these challenges, attention to
important texture details and perception of global geometric contextual
information are essential. Recent research has shown that CNN-Transformer
hybrid structures outperform architectures that use a CNN or Transformer alone. While CNNs excel
at extracting local detail features, Transformers naturally perceive global
contextual information. In this paper, we propose a dual-branch network block
named ConSwin that combines ResNet and Swin Transformer for road extraction
tasks. This ConSwin block harnesses the strengths of both approaches to better
extract detailed and global features. Based on ConSwin, we construct an
hourglass-shaped road extraction network and introduce two novel connection
structures to better transmit texture and structural detail information to the
decoder. Our proposed method outperforms state-of-the-art methods on both the
Massachusetts and CHN6-CUG datasets in terms of overall accuracy, IOU, and F1
indicators. Additional experiments validate the effectiveness of our proposed
module, while visualization results demonstrate its ability to obtain better
road representations.
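The abstract describes ConSwin only at a high level: a CNN branch for local texture detail and a Swin Transformer branch for global context, fused in one block. As a hedged illustration of that dual-branch idea (not the authors' implementation), the sketch below pairs a toy mean-filter "CNN" branch with self-attention inside non-overlapping windows (the Swin-style mechanism) and fuses them by channel concatenation. The function names, the use of unlearned identity projections for Q/K/V, and concatenation as the fusion choice are all assumptions for illustration.

```python
import numpy as np

def local_branch(x, k=3):
    """CNN-style branch: k x k mean filter per channel (toy stand-in for ResNet convs)."""
    H, W, C = x.shape
    p = k // 2
    xp = np.pad(x, ((p, p), (p, p), (0, 0)), mode="edge")
    out = np.zeros_like(x)
    for i in range(H):
        for j in range(W):
            out[i, j] = xp[i:i + k, j:j + k].mean(axis=(0, 1))
    return out

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def window_attention(x, w=4):
    """Swin-style branch: self-attention within non-overlapping w x w windows.
    Q = K = V = tokens (no learned projections, for simplicity)."""
    H, W, C = x.shape
    assert H % w == 0 and W % w == 0, "spatial dims must divide the window size"
    out = np.empty_like(x)
    for i in range(0, H, w):
        for j in range(0, W, w):
            tokens = x[i:i + w, j:j + w].reshape(w * w, C)
            attn = softmax(tokens @ tokens.T / np.sqrt(C))  # (w*w, w*w)
            out[i:i + w, j:j + w] = (attn @ tokens).reshape(w, w, C)
    return out

def conswin_block(x, w=4):
    """Dual-branch block: fuse local (CNN) and global (attention) features
    by channel concatenation -- one plausible fusion, not the paper's exact design."""
    return np.concatenate([local_branch(x), window_attention(x, w)], axis=-1)

feat = np.random.rand(8, 8, 16).astype(np.float32)
fused = conswin_block(feat)
print(fused.shape)  # (8, 8, 32)
```

In the real network, both branches would carry learned weights and the fused features would feed an hourglass-shaped encoder-decoder; the sketch only shows how the two receptive-field regimes are computed on the same feature map and combined.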
Related papers
- Interaction-Guided Two-Branch Image Dehazing Network [1.26404863283601]
Image dehazing aims to restore clean images from hazy ones.
CNNs and Transformers have demonstrated exceptional performance in local and global feature extraction.
We propose a novel dual-branch image dehazing framework that guides CNN and Transformer components interactively.
arXiv Detail & Related papers (2024-10-14T03:21:56Z)
- SwinV2DNet: Pyramid and Self-Supervision Compounded Feature Learning for Remote Sensing Images Change Detection [12.727650696327878]
We propose an end-to-end compounded dense network SwinV2DNet to inherit advantages of transformer and CNN.
It captures the change relationship features through the densely connected Swin V2 backbone.
It provides the low-level pre-changed and post-changed features through a CNN branch.
arXiv Detail & Related papers (2023-08-22T03:31:52Z)
- Deeply-Coupled Convolution-Transformer with Spatial-temporal Complementary Learning for Video-based Person Re-identification [91.56939957189505]
We propose a novel spatial-temporal complementary learning framework named Deeply-Coupled Convolution-Transformer (DCCT) for high-performance video-based person Re-ID.
Our framework attains better performance than most state-of-the-art methods.
arXiv Detail & Related papers (2023-04-27T12:16:44Z)
- ConvFormer: Combining CNN and Transformer for Medical Image Segmentation [17.88894109620463]
We propose a hierarchical CNN and Transformer hybrid architecture, called ConvFormer, for medical image segmentation.
Our ConvFormer, trained from scratch, outperforms various CNN- or Transformer-based architectures, achieving state-of-the-art performance.
arXiv Detail & Related papers (2022-11-15T23:11:22Z)
- Cross-receptive Focused Inference Network for Lightweight Image Super-Resolution [64.25751738088015]
Transformer-based methods have shown impressive performance in single image super-resolution (SISR) tasks.
However, the potential of Transformers to incorporate contextual information for dynamic feature extraction has been neglected.
We propose a lightweight Cross-receptive Focused Inference Network (CFIN) that consists of a cascade of CT Blocks mixed with CNN and Transformer.
arXiv Detail & Related papers (2022-07-06T16:32:29Z)
- Transformer-Guided Convolutional Neural Network for Cross-View Geolocalization [20.435023745201878]
We propose a novel Transformer-guided convolutional neural network (TransGCNN) architecture.
Our TransGCNN consists of a CNN backbone extracting feature map from an input image and a Transformer head modeling global context.
Experiments on popular benchmark datasets demonstrate that our model achieves top-1 accuracy of 94.12% and 84.92% on CVUSA and CVACT_val, respectively.
arXiv Detail & Related papers (2022-04-21T08:46:41Z)
- Rich CNN-Transformer Feature Aggregation Networks for Super-Resolution [50.10987776141901]
Recent vision transformers along with self-attention have achieved promising results on various computer vision tasks.
We introduce an effective hybrid architecture for super-resolution (SR) tasks, which leverages local features from CNNs and long-range dependencies captured by transformers.
Our proposed method achieves state-of-the-art SR results on numerous benchmark datasets.
arXiv Detail & Related papers (2022-03-15T06:52:25Z)
- CSformer: Bridging Convolution and Transformer for Compressive Sensing [65.22377493627687]
This paper proposes a hybrid framework that integrates detailed spatial information from the CNN with the global context provided by the Transformer for enhanced representation learning.
The proposed approach is an end-to-end compressive image sensing method, composed of adaptive sampling and recovery.
The experimental results demonstrate the effectiveness of the dedicated transformer-based architecture for compressive sensing.
arXiv Detail & Related papers (2021-12-31T04:37:11Z)
- Semi-Supervised Vision Transformers [76.83020291497895]
We study the training of Vision Transformers for semi-supervised image classification.
We find that Vision Transformers perform poorly in a semi-supervised ImageNet setting.
CNNs achieve superior results in the small labeled-data regime.
arXiv Detail & Related papers (2021-11-22T09:28:13Z)
- Transformers Solve the Limited Receptive Field for Monocular Depth Prediction [82.90445525977904]
We propose TransDepth, an architecture which benefits from both convolutional neural networks and transformers.
This is the first paper to apply transformers to pixel-wise prediction problems involving continuous labels.
arXiv Detail & Related papers (2021-03-22T18:00:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.