SwinV2DNet: Pyramid and Self-Supervision Compounded Feature Learning for
Remote Sensing Images Change Detection
- URL: http://arxiv.org/abs/2308.11159v1
- Date: Tue, 22 Aug 2023 03:31:52 GMT
- Title: SwinV2DNet: Pyramid and Self-Supervision Compounded Feature Learning for
Remote Sensing Images Change Detection
- Authors: Dalong Zheng, Zebin Wu, Jia Liu, Zhihui Wei
- Abstract summary: We propose an end-to-end compounded dense network, SwinV2DNet, to inherit the advantages of both transformer and CNN.
It captures the change relationship features through the densely connected Swin V2 backbone.
It provides the low-level pre-changed and post-changed features through a CNN branch.
- Score: 12.727650696327878
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Among the current mainstream change detection networks, the
transformer is deficient at capturing accurate low-level details, while the
convolutional neural network (CNN) lacks the capacity to model global
information and long-range spatial relationships. Meanwhile, neither of the
widely used early fusion and late fusion frameworks learns complete change
features well. Therefore, based on Swin Transformer V2 (Swin V2) and VGG16, we
propose an end-to-end compounded dense network, SwinV2DNet, to inherit the
advantages of both transformer and CNN and overcome the shortcomings of
existing networks in feature learning. Firstly, it captures the
change relationship features through the densely connected Swin V2 backbone,
and provides the low-level pre-changed and post-changed features through a CNN
branch. Based on these three change features, we accomplish accurate change
detection results. Secondly, combining transformer and CNN, we propose a mixed
feature pyramid (MFP) that provides inter-layer interaction information and
intra-layer multi-scale information for complete feature learning. MFP is a
plug-and-play module that experiments show is also effective in other change
detection networks. Furthermore, we impose a self-supervision strategy to guide
a new CNN branch, which solves the untrainable problem of the CNN branch and
provides semantic change information for the encoder features. Compared with
other advanced methods, we obtain state-of-the-art (SOTA) change detection
scores and fine-grained change maps on four commonly used public remote sensing
datasets. The code is available at
https://github.com/DalongZ/SwinV2DNet.
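The abstract's compounded design can be illustrated with a minimal NumPy sketch: pre-changed and post-changed CNN features supply a low-level change cue, a coarser transformer feature supplies the change relation, and an MFP-style module mixes inter-layer (upsample-and-fuse) with intra-layer (multi-scale pooling) information. This is an assumption-laden toy, not the authors' implementation; all shapes, the nearest-neighbor upsampling, and the simple absolute-difference change cue are stand-ins.

```python
import numpy as np

def upsample2x(x):
    # Nearest-neighbor upsampling: inter-layer interaction lets a coarse
    # (semantic) level refine a finer level, as in a feature pyramid.
    return x.repeat(2, axis=0).repeat(2, axis=1)

def avg_pool(x, k):
    # Stride-1 average pooling with window k and edge padding ("same" size),
    # a cheap stand-in for intra-layer multi-scale context.
    h, w = x.shape
    p = k // 2
    xp = np.pad(x, p, mode="edge")
    out = np.empty_like(x)
    for i in range(h):
        for j in range(w):
            out[i, j] = xp[i:i + k, j:j + k].mean()
    return out

def mixed_pyramid(fine, coarse, scales=(1, 3, 5)):
    # Inter-layer: fuse the upsampled coarse level into the fine level.
    inter = fine + upsample2x(coarse)
    # Intra-layer: stack multi-scale views of the fused map.
    return np.stack([avg_pool(inter, k) for k in scales], axis=0)

rng = np.random.default_rng(0)
pre = rng.standard_normal((8, 8))       # low-level feature, pre-change image
post = rng.standard_normal((8, 8))      # low-level feature, post-change image
relation = rng.standard_normal((4, 4))  # coarse change-relation feature

# |post - pre| approximates a low-level change cue; the relation feature
# contributes the global context from the transformer backbone.
change = mixed_pyramid(np.abs(post - pre), relation)
print(change.shape)  # (3, 8, 8)
```

The scale-1 channel is just the fused map itself, while larger windows add progressively wider context, which is the intuition behind combining inter-layer and intra-layer information in one module.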
Related papers
- Relating CNN-Transformer Fusion Network for Change Detection [23.025190360146635]
RCTNet introduces an early fusion backbone to exploit both spatial and temporal features.
Experiments demonstrate RCTNet's clear superiority over traditional RS image CD methods.
arXiv Detail & Related papers (2024-07-03T14:58:40Z)
- ELGC-Net: Efficient Local-Global Context Aggregation for Remote Sensing Change Detection [65.59969454655996]
We propose an efficient change detection framework, ELGC-Net, which leverages rich contextual information to precisely estimate change regions.
Our proposed ELGC-Net sets a new state-of-the-art performance in remote sensing change detection benchmarks.
We also introduce ELGC-Net-LW, a lighter variant with significantly reduced computational complexity, suitable for resource-constrained settings.
arXiv Detail & Related papers (2024-03-26T17:46:25Z)
- OA-CNNs: Omni-Adaptive Sparse CNNs for 3D Semantic Segmentation [70.17681136234202]
We reexamine the design distinctions and test the limits of what a sparse CNN can achieve.
We propose two key components, i.e., adaptive receptive fields (spatially) and adaptive relation, to bridge the gap.
This exploration led to the creation of Omni-Adaptive 3D CNNs (OA-CNNs), a family of networks that integrates a lightweight module.
arXiv Detail & Related papers (2024-03-21T14:06:38Z)
- Explicit Change Relation Learning for Change Detection in VHR Remote Sensing Images [12.228675703851733]
We propose a network architecture NAME for the explicit mining of change relation features.
The change features of change detection should be divided into pre-changed image features, post-changed image features and change relation features.
Our network performs better, in terms of F1, IoU, and OA, than those of the existing advanced networks for change detection.
arXiv Detail & Related papers (2023-11-14T08:47:38Z)
- ConvFormer: Plug-and-Play CNN-Style Transformers for Improving Medical Image Segmentation [10.727162449071155]
We build CNN-style Transformers (ConvFormer) to promote better attention convergence and thus better segmentation performance.
In contrast to positional embedding and tokenization, ConvFormer adopts 2D convolution and max-pooling for both position information preservation and feature size reduction.
arXiv Detail & Related papers (2023-09-09T02:18:17Z)
- MCTNet: A Multi-Scale CNN-Transformer Network for Change Detection in Optical Remote Sensing Images [7.764449276074902]
We propose a hybrid network based on multi-scale CNN-transformer structure, termed MCTNet.
We show that our MCTNet achieves better detection performance than existing state-of-the-art CD methods.
arXiv Detail & Related papers (2022-10-14T07:54:28Z)
- Transformer-Guided Convolutional Neural Network for Cross-View Geolocalization [20.435023745201878]
We propose a novel Transformer-guided convolutional neural network (TransGCNN) architecture.
Our TransGCNN consists of a CNN backbone extracting feature map from an input image and a Transformer head modeling global context.
Experiments on popular benchmark datasets demonstrate that our model achieves top-1 accuracy of 94.12% and 84.92% on CVUSA and CVACT_val, respectively.
arXiv Detail & Related papers (2022-04-21T08:46:41Z)
- HAT: Hierarchical Aggregation Transformers for Person Re-identification [87.02828084991062]
We take advantage of both CNNs and Transformers for image-based person Re-ID with high performance.
This work is the first to take advantage of both CNNs and Transformers for image-based person Re-ID.
arXiv Detail & Related papers (2021-07-13T09:34:54Z)
- Container: Context Aggregation Network [83.12004501984043]
Recent findings show that a simple solution without any traditional convolutional or Transformer components can produce effective visual representations.
We present CONTAINER (CONText AggregatIon NEtwoRk), a general-purpose building block for multi-head context aggregation.
In contrast to Transformer-based methods that do not scale well to downstream tasks relying on larger input image resolutions, our efficient network, CONTAINER-LIGHT, can be employed in object detection and instance segmentation networks.
arXiv Detail & Related papers (2021-06-02T18:09:11Z)
- Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation [63.46694853953092]
Swin-Unet is an Unet-like pure Transformer for medical image segmentation.
Tokenized image patches are fed into the Transformer-based U-shaped Encoder-Decoder architecture.
arXiv Detail & Related papers (2021-05-12T09:30:26Z)
- Curriculum By Smoothing [52.08553521577014]
Convolutional Neural Networks (CNNs) have shown impressive performance in computer vision tasks such as image classification, detection, and segmentation.
We propose an elegant curriculum-based scheme that smooths the feature embedding of a CNN using anti-aliasing or low-pass filters.
As the amount of information in the feature maps increases during training, the network is able to progressively learn better representations of the data.
arXiv Detail & Related papers (2020-03-03T07:27:44Z)
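The smoothing curriculum described above can be sketched with a toy NumPy example: a low-pass (Gaussian) filter is applied to a feature map, and its strength is annealed toward zero over training so the network gradually sees more high-frequency detail. The separable Gaussian kernel and the linear sigma schedule here are illustrative assumptions, not the paper's training code.

```python
import numpy as np

def gaussian_kernel1d(sigma, radius=3):
    # Discrete 1-D Gaussian, normalized to sum to 1 (a low-pass filter).
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def smooth(feat, sigma):
    # Separable low-pass filtering of a 2-D feature map ("same" size via
    # edge padding); sigma == 0 means no smoothing at all.
    if sigma <= 0:
        return feat
    k = gaussian_kernel1d(sigma)
    r = len(k) // 2
    padded = np.pad(feat, r, mode="edge")
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)

def curriculum_sigma(epoch, total_epochs, sigma0=2.0):
    # Anneal the filter strength: heavy smoothing early, none at the end.
    return sigma0 * max(0.0, 1.0 - epoch / total_epochs)

rng = np.random.default_rng(1)
feature_map = rng.standard_normal((16, 16))
for epoch in (0, 5, 10):
    s = curriculum_sigma(epoch, total_epochs=10)
    print(epoch, round(s, 2), round(smooth(feature_map, s).std(), 3))
```

As the printed standard deviations show, stronger smoothing suppresses high-frequency variation early on, and by the final epoch the feature map passes through untouched.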
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.