MCTNet: A Multi-Scale CNN-Transformer Network for Change Detection in
Optical Remote Sensing Images
- URL: http://arxiv.org/abs/2210.07601v3
- Date: Sat, 29 Jul 2023 03:13:05 GMT
- Title: MCTNet: A Multi-Scale CNN-Transformer Network for Change Detection in
Optical Remote Sensing Images
- Authors: Weiming Li, Lihui Xue, Xueqian Wang, and Gang Li
- Abstract summary: We propose a hybrid network based on multi-scale CNN-transformer structure, termed MCTNet.
We show that our MCTNet achieves better detection performance than existing state-of-the-art CD methods.
- Score: 7.764449276074902
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: For the task of change detection (CD) in remote sensing images, deep
convolution neural networks (CNNs)-based methods have recently aggregated
transformer modules to improve the capability of global feature extraction.
However, they suffer degraded CD performance on small changed areas due to the
simple single-scale integration of deep CNNs and transformer modules. To
address this issue, we propose a hybrid network based on multi-scale
CNN-transformer structure, termed MCTNet, where the multi-scale global and
local information are exploited to enhance the robustness of the CD performance
on changed areas with different sizes. Especially, we design the ConvTrans
block to adaptively aggregate global features from transformer modules and
local features from CNN layers, which provides abundant global-local features
with different scales. Experimental results demonstrate that our MCTNet
achieves better detection performance than existing state-of-the-art CD
methods.
Related papers
- CTA-Net: A CNN-Transformer Aggregation Network for Improving Multi-Scale Feature Extraction [14.377544481394013]
CTA-Net combines CNNs and ViTs, with transformers capturing long-range dependencies and CNNs extracting localized features.
This integration enables efficient processing of detailed local and broader contextual information.
Experiments on small-scale datasets with fewer than 100,000 samples show that CTA-Net achieves superior performance.
arXiv Detail & Related papers (2024-10-15T09:27:26Z) - Relating CNN-Transformer Fusion Network for Change Detection [23.025190360146635]
RCTNet introduces an early fusion backbone to exploit both spatial and temporal features.
Experiments demonstrate RCTNet's clear superiority over traditional RS image CD methods.
arXiv Detail & Related papers (2024-07-03T14:58:40Z) - ELGC-Net: Efficient Local-Global Context Aggregation for Remote Sensing Change Detection [65.59969454655996]
We propose an efficient change detection framework, ELGC-Net, which leverages rich contextual information to precisely estimate change regions.
Our proposed ELGC-Net sets a new state-of-the-art performance in remote sensing change detection benchmarks.
We also introduce ELGC-Net-LW, a lighter variant with significantly reduced computational complexity, suitable for resource-constrained settings.
arXiv Detail & Related papers (2024-03-26T17:46:25Z) - Multiscale Low-Frequency Memory Network for Improved Feature Extraction
in Convolutional Neural Networks [13.815116154370834]
We introduce a novel framework, the Multiscale Low-Frequency Memory (MLFM) Network.
The MLFM efficiently preserves low-frequency information, enhancing performance in targeted computer vision tasks.
Our work builds upon the existing CNN foundations and paves the way for future advancements in computer vision.
arXiv Detail & Related papers (2024-03-13T00:48:41Z) - TransY-Net:Learning Fully Transformer Networks for Change Detection of
Remote Sensing Images [64.63004710817239]
We propose a novel Transformer-based learning framework named TransY-Net for remote sensing image CD.
It improves the feature extraction from a global view and combines multi-level visual features in a pyramid manner.
Our proposed method achieves a new state-of-the-art performance on four optical and two SAR image CD benchmarks.
arXiv Detail & Related papers (2023-10-22T07:42:19Z) - Affine-Consistent Transformer for Multi-Class Cell Nuclei Detection [76.11864242047074]
We propose a novel Affine-Consistent Transformer (AC-Former), which directly yields a sequence of nucleus positions.
We introduce an Adaptive Affine Transformer (AAT) module, which can automatically learn the key spatial transformations to warp original images for local network training.
Experimental results demonstrate that the proposed method significantly outperforms existing state-of-the-art algorithms on various benchmarks.
arXiv Detail & Related papers (2023-10-22T02:27:02Z) - SwinV2DNet: Pyramid and Self-Supervision Compounded Feature Learning for
Remote Sensing Images Change Detection [12.727650696327878]
We propose an end-to-end compounded dense network SwinV2DNet to inherit advantages of transformer and CNN.
It captures the change relationship features through the densely connected Swin V2 backbone.
It provides the low-level pre-changed and post-changed features through a CNN branch.
arXiv Detail & Related papers (2023-08-22T03:31:52Z) - Vision Transformer with Convolutions Architecture Search [72.70461709267497]
We propose an architecture search method-Vision Transformer with Convolutions Architecture Search (VTCAS)
The high-performance backbone network searched by VTCAS introduces the desirable features of convolutional neural networks into the Transformer architecture.
It enhances the robustness of the neural network for object recognition, especially in the low illumination indoor scene.
arXiv Detail & Related papers (2022-03-20T02:59:51Z) - Rich CNN-Transformer Feature Aggregation Networks for Super-Resolution [50.10987776141901]
Recent vision transformers along with self-attention have achieved promising results on various computer vision tasks.
We introduce an effective hybrid architecture for super-resolution (SR) tasks, which leverages local features from CNNs and long-range dependencies captured by transformers.
Our proposed method achieves state-of-the-art SR results on numerous benchmark datasets.
arXiv Detail & Related papers (2022-03-15T06:52:25Z) - Container: Context Aggregation Network [83.12004501984043]
Recent finding shows that a simple based solution without any traditional convolutional or Transformer components can produce effective visual representations.
We present the model (CONText Ion NERtwok), a general-purpose building block for multi-head context aggregation.
In contrast to Transformer-based methods that do not scale well to downstream tasks that rely on larger input image resolutions, our efficient network, named modellight, can be employed in object detection and instance segmentation networks.
arXiv Detail & Related papers (2021-06-02T18:09:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.