Fully Transformer Network for Change Detection of Remote Sensing Images
- URL: http://arxiv.org/abs/2210.00757v1
- Date: Mon, 3 Oct 2022 08:21:25 GMT
- Title: Fully Transformer Network for Change Detection of Remote Sensing Images
- Authors: Tianyu Yan and Zifu Wan and Pingping Zhang
- Abstract summary: We propose a novel learning framework named Fully Transformer Network (FTN) for remote sensing image CD.
It improves the feature extraction from a global view and combines multi-level visual features in a pyramid manner.
Our proposed method achieves a new state-of-the-art performance on four public CD benchmarks.
- Score: 22.989324947501014
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, change detection (CD) of remote sensing images has achieved great
progress with the advances of deep learning. However, current methods generally
deliver incomplete CD regions and irregular CD boundaries due to the limited
representation ability of the extracted visual features. To alleviate these
issues, in this work we propose a novel learning framework named Fully
Transformer Network (FTN) for remote sensing image CD, which improves the
feature extraction from a global view and combines multi-level visual features
in a pyramid manner. More specifically, the proposed framework first utilizes
the advantages of Transformers in long-range dependency modeling. It can help
to learn more discriminative global-level features and obtain complete CD
regions. Then, we introduce a pyramid structure to aggregate multi-level visual
features from Transformers for feature enhancement. The pyramid structure
grafted with a Progressive Attention Module (PAM) can improve the feature
representation ability with additional interdependencies through channel
attentions. Finally, to better train the framework, we utilize deeply-supervised
learning with multiple boundary-aware loss functions.
Extensive experiments demonstrate that our proposed method achieves a new
state-of-the-art performance on four public CD benchmarks. For model
reproduction, the source code is released at https://github.com/AI-Zhpp/FTN.
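The abstract names the framework's key ingredients, pyramid aggregation of Transformer features, a Progressive Attention Module built on channel attentions, and deeply-supervised training with boundary-aware losses, but gives no layer definitions. Below is a minimal PyTorch sketch of those three ideas; the names (ChannelAttention, PyramidFusion, boundary_aware_loss) are hypothetical stand-ins, not the authors' implementation, which is available at the repository linked above.

```python
# Minimal sketch of pyramid aggregation with channel attention and
# deeply-supervised boundary-aware training (not the released FTN code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation-style channel attention (stand-in for the PAM)."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                      # x: (B, C, H, W)
        w = self.fc(x.mean(dim=(2, 3)))        # global average pool -> (B, C)
        return x * w[:, :, None, None]         # channel-wise re-weighting

class PyramidFusion(nn.Module):
    """Top-down aggregation of multi-level features, gated by channel attention."""
    def __init__(self, channels, num_levels=3):
        super().__init__()
        self.att = nn.ModuleList(ChannelAttention(channels)
                                 for _ in range(num_levels - 1))

    def forward(self, feats):                  # feats: high -> low resolution
        out = feats[-1]
        for i in range(len(feats) - 2, -1, -1):
            out = F.interpolate(out, size=feats[i].shape[-2:],
                                mode='bilinear', align_corners=False)
            out = self.att[i](out + feats[i])  # aggregate, then re-weight channels
        return out

def boundary_aware_loss(logits, target):
    """BCE plus an L1 term on finite-difference edge maps; one common way to
    emphasize change boundaries (the paper's exact losses may differ)."""
    bce = F.binary_cross_entropy_with_logits(logits, target)
    p = torch.sigmoid(logits)
    def edges(t):
        return (t[..., 1:, :] - t[..., :-1, :]).abs().mean() + \
               (t[..., :, 1:] - t[..., :, :-1]).abs().mean()
    return bce + (edges(p) - edges(target)).abs()

# Deep supervision: attach the loss to every level's side output.
feats = [torch.randn(2, 64, s, s) for s in (64, 32, 16)]
head = nn.Conv2d(64, 1, 1)                     # shared 1x1 prediction head
target = torch.randint(0, 2, (2, 1, 64, 64)).float()
outputs = feats + [PyramidFusion(64)(feats)]
loss = sum(boundary_aware_loss(
               F.interpolate(head(f), size=(64, 64), mode='bilinear',
                             align_corners=False), target)
           for f in outputs)
print(loss.item())
```

The deep-supervision loop simply attaches the same boundary-aware loss to every level's side output; FTN's exact loss mix may differ.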
Related papers
- EfficientCD: A New Strategy For Change Detection Based With Bi-temporal Layers Exchanged [3.3885253104046993]
We propose a novel deep learning framework named EfficientCD for remote sensing image change detection.
The framework employs EfficientNet as its backbone network for feature extraction.
EfficientCD has been experimentally validated on four remote sensing datasets.
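The summary does not say how the bi-temporal layers are exchanged. As one plausible reading, the toy PyTorch sketch below swaps half of the feature channels between the two temporal streams after each shared backbone stage; the three-stage CNN is a stand-in for EfficientCD's EfficientNet backbone, and exchange/BiTemporalEncoder are hypothetical names.

```python
# Toy sketch of bi-temporal feature exchange (stand-in backbone, not EfficientCD).
import torch
import torch.nn as nn

def exchange(a, b):
    """Swap the second half of the channels between the two temporal streams."""
    c = a.shape[1] // 2
    return (torch.cat([a[:, :c], b[:, c:]], dim=1),
            torch.cat([b[:, :c], a[:, c:]], dim=1))

class BiTemporalEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        # Stand-in stages; EfficientCD uses EfficientNet stages instead.
        self.stages = nn.ModuleList(
            nn.Sequential(nn.Conv2d(cin, cout, 3, stride=2, padding=1),
                          nn.BatchNorm2d(cout), nn.ReLU(inplace=True))
            for cin, cout in [(3, 32), (32, 64), (64, 128)])

    def forward(self, t1, t2):
        feats = []
        for stage in self.stages:             # shared weights for both epochs
            t1, t2 = stage(t1), stage(t2)
            t1, t2 = exchange(t1, t2)         # cross-temporal information flow
            feats.append(torch.abs(t1 - t2))  # per-level change evidence
        return feats

enc = BiTemporalEncoder()
x1, x2 = torch.randn(2, 3, 256, 256), torch.randn(2, 3, 256, 256)
print([f.shape for f in enc(x1, x2)])
```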
arXiv Detail & Related papers (2024-07-22T19:11:50Z)
- TransY-Net: Learning Fully Transformer Networks for Change Detection of Remote Sensing Images [64.63004710817239]
We propose a novel Transformer-based learning framework named TransY-Net for remote sensing image CD.
It improves the feature extraction from a global view and combines multi-level visual features in a pyramid manner.
Our proposed method achieves a new state-of-the-art performance on four optical and two SAR image CD benchmarks.
arXiv Detail & Related papers (2023-10-22T07:42:19Z)
- Unified Frequency-Assisted Transformer Framework for Detecting and Grounding Multi-Modal Manipulation [109.1912721224697]
We present the Unified Frequency-Assisted transFormer framework, named UFAFormer, to address the DGM4 (detecting and grounding multi-modal manipulation) problem.
By leveraging the discrete wavelet transform, we decompose images into several frequency sub-bands, capturing rich face forgery artifacts.
Our proposed frequency encoder, incorporating intra-band and inter-band self-attentions, explicitly aggregates forgery features within and across diverse sub-bands.
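The wavelet step itself is standard and easy to reproduce. Here is a short PyWavelets sketch of a one-level 2-D DWT splitting an image into one approximation and three detail sub-bands; the frequency encoder that consumes them is UFAFormer-specific and not reproduced here.

```python
# One-level 2-D discrete wavelet transform with PyWavelets.
import numpy as np
import pywt

img = np.random.rand(256, 256).astype(np.float32)  # stand-in grayscale face crop
# dwt2 returns one approximation band plus three detail bands.
approx, (horiz, vert, diag) = pywt.dwt2(img, 'haar')
for name, band in [('approx', approx), ('horiz', horiz),
                   ('vert', vert), ('diag', diag)]:
    print(name, band.shape)                        # each band is 128x128
# The detail (high-frequency) bands are where blending and forgery artifacts
# tend to concentrate, hence their use as inputs to a frequency encoder.
```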
arXiv Detail & Related papers (2023-09-18T11:06:42Z)
- Effective Image Tampering Localization via Enhanced Transformer and Co-attention Fusion [5.691973573807887]
We propose an effective image tampering localization network (EITLNet) based on a two-branch enhanced transformer encoder.
The features extracted from RGB and noise streams are fused effectively by the coordinate attention-based fusion module.
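Coordinate attention is the published module of Hou et al. (CVPR 2021), so it can be sketched concretely; the fusion layout around it (concatenate the two streams, reduce with a 1x1 convolution, then apply the attention) is an assumption, not necessarily EITLNet's exact wiring.

```python
# Compact coordinate attention (after Hou et al., 2021) used as a fusion gate.
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        mid = max(8, channels // reduction)
        self.conv1 = nn.Conv2d(channels, mid, 1)
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.ReLU(inplace=True)
        self.conv_h = nn.Conv2d(mid, channels, 1)
        self.conv_w = nn.Conv2d(mid, channels, 1)

    def forward(self, x):                          # x: (B, C, H, W)
        b, c, h, w = x.shape
        xh = x.mean(dim=3, keepdim=True)           # (B, C, H, 1): pool along W
        xw = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)  # (B, C, W, 1)
        y = self.act(self.bn(self.conv1(torch.cat([xh, xw], dim=2))))
        yh, yw = y.split([h, w], dim=2)
        ah = torch.sigmoid(self.conv_h(yh))                      # (B, C, H, 1)
        aw = torch.sigmoid(self.conv_w(yw)).permute(0, 1, 3, 2)  # (B, C, 1, W)
        return x * ah * aw                         # position-aware re-weighting

rgb_feat = torch.randn(2, 64, 32, 32)              # from the RGB stream
noise_feat = torch.randn(2, 64, 32, 32)            # from the noise stream
fuse = nn.Sequential(nn.Conv2d(128, 64, 1), CoordinateAttention(64))
print(fuse(torch.cat([rgb_feat, noise_feat], dim=1)).shape)  # (2, 64, 32, 32)
```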
arXiv Detail & Related papers (2023-09-17T15:43:06Z)
- Vision Transformer with Convolutions Architecture Search [72.70461709267497]
We propose an architecture search method, Vision Transformer with Convolutions Architecture Search (VTCAS).
The high-performance backbone network searched by VTCAS introduces the desirable features of convolutional neural networks into the Transformer architecture.
It enhances the robustness of the neural network for object recognition, especially in low-illumination indoor scenes.
arXiv Detail & Related papers (2022-03-20T02:59:51Z)
- HAT: Hierarchical Aggregation Transformers for Person Re-identification [87.02828084991062]
We take advantage of both CNNs and Transformers for image-based person Re-ID with high performance.
This work is the first to take advantage of both CNNs and Transformers for image-based person Re-ID.
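As a rough illustration of the CNN-plus-Transformer recipe, the sketch below projects multi-scale CNN maps to a common width, flattens them to tokens, and runs a shared Transformer encoder over the concatenated sequence; the channel sizes assume ResNet-50-style stages, and TransformerAggregator is an illustrative name, not HAT's actual architecture.

```python
# Hedged sketch: aggregate multi-scale CNN features with a Transformer encoder.
import torch
import torch.nn as nn

class TransformerAggregator(nn.Module):
    def __init__(self, dim=256, num_heads=8):
        super().__init__()
        self.proj = nn.ModuleList(nn.Conv2d(c, dim, 1)
                                  for c in (512, 1024, 2048))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=num_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, feats):                     # CNN maps at three scales
        tokens = torch.cat(
            [p(f).flatten(2).transpose(1, 2)      # (B, H*W, dim) per scale
             for p, f in zip(self.proj, feats)], dim=1)
        tokens = self.encoder(tokens)             # attention across all scales
        return tokens.mean(dim=1)                 # pooled Re-ID embedding

feats = [torch.randn(2, 512, 32, 16), torch.randn(2, 1024, 16, 8),
         torch.randn(2, 2048, 8, 4)]              # e.g. ResNet-50 stages 3-5
print(TransformerAggregator()(feats).shape)       # torch.Size([2, 256])
```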
arXiv Detail & Related papers (2021-07-13T09:34:54Z)
- ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias [76.16156833138038]
We propose a novel Vision Transformer Advanced by Exploring intrinsic Inductive Bias from convolutions, i.e., ViTAE.
ViTAE has several spatial pyramid reduction modules to downsample and embed the input image into tokens with rich multi-scale context.
In each transformer layer, ViTAE has a convolution block in parallel to the multi-head self-attention module, whose features are fused and fed into the feed-forward network.
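The layer layout is described concretely enough to sketch: a convolution branch in parallel with multi-head self-attention, fused before the feed-forward network. The PyTorch sketch below simplifies ViTAE (it uses a single depthwise convolution and omits the spatial pyramid reduction modules), and ParallelConvAttnLayer is an illustrative name.

```python
# Simplified ViTAE-style layer: parallel conv and self-attention branches.
import torch
import torch.nn as nn

class ParallelConvAttnLayer(nn.Module):
    def __init__(self, dim=192, num_heads=3, grid=(14, 14)):
        super().__init__()
        self.grid = grid                            # token grid for the conv branch
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.conv = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)  # depthwise
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))

    def forward(self, x):                           # x: (B, N, dim), N = H*W
        b, n, d = x.shape
        h, w = self.grid
        y = self.norm1(x)
        attn_out, _ = self.attn(y, y, y)            # global branch
        conv_out = self.conv(y.transpose(1, 2).reshape(b, d, h, w))
        conv_out = conv_out.flatten(2).transpose(1, 2)  # local branch as tokens
        x = x + attn_out + conv_out                 # fuse the two branches
        return x + self.ffn(self.norm2(x))          # feed-forward with residual

tokens = torch.randn(2, 14 * 14, 192)
print(ParallelConvAttnLayer()(tokens).shape)        # torch.Size([2, 196, 192])
```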
arXiv Detail & Related papers (2021-06-07T05:31:06Z)
- Less is More: Pay Less Attention in Vision Transformers [61.05787583247392]
The Less attention vIsion Transformer (LIT) builds upon the fact that convolutions, fully-connected layers, and self-attention have almost equivalent mathematical expressions for processing image patch sequences.
The proposed LIT achieves promising performance on image recognition tasks, including image classification, object detection and instance segmentation.
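A toy sketch of the resulting recipe, plain MLP blocks early and self-attention only in a later stage; widths, depths, and token counts here are placeholders, not LIT's configuration.

```python
# Toy "pay less attention" stage layout: MLP early, self-attention later.
import torch
import torch.nn as nn

def mlp_block(dim):
    return nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, 4 * dim),
                         nn.GELU(), nn.Linear(4 * dim, dim))

class LITStages(nn.Module):
    def __init__(self, dim=96, num_heads=4):
        super().__init__()
        self.early = mlp_block(dim)                 # cheap, attention-free stage
        self.late = nn.TransformerEncoderLayer(d_model=dim, nhead=num_heads,
                                               batch_first=True)

    def forward(self, x):                           # x: (B, N, dim) patch tokens
        x = x + self.early(x)                       # early stage: MLP only
        return self.late(x)                         # attention where it pays off

tokens = torch.randn(2, 28 * 28, 96)                # 28x28 patch grid
print(LITStages()(tokens).shape)                    # torch.Size([2, 784, 96])
```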
arXiv Detail & Related papers (2021-05-29T05:26:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.