STeInFormer: Spatial-Temporal Interaction Transformer Architecture for Remote Sensing Change Detection
- URL: http://arxiv.org/abs/2412.17247v1
- Date: Mon, 23 Dec 2024 03:40:04 GMT
- Title: STeInFormer: Spatial-Temporal Interaction Transformer Architecture for Remote Sensing Change Detection
- Authors: Xiaowen Ma, Zhenkai Wu, Mengting Ma, Mengjiao Zhao, Fan Yang, Zhenhong Du, Wei Zhang,
- Abstract summary: We present STeInFormer, a spatial-temporal interaction Transformer architecture for multi-temporal feature extraction.
We also propose a parameter-free multi-frequency token mixer to integrate frequency-domain features that provide spectral information for RSCD.
- Abstract: Convolutional neural networks and attention mechanisms have greatly benefited remote sensing change detection (RSCD) because of their outstanding discriminative ability. Existing RSCD methods typically follow a paradigm that uses a non-interactive Siamese neural network for multi-temporal feature extraction and change detection heads for feature fusion and change representation. However, this paradigm does not account for the temporal and spatial characteristics of RSCD, and its lack of spatial-temporal interaction hinders high-quality feature extraction. To address this problem, we present STeInFormer, a spatial-temporal interaction Transformer architecture for multi-temporal feature extraction, which is the first general backbone network specifically designed for RSCD. In addition, we propose a parameter-free multi-frequency token mixer to integrate frequency-domain features that provide spectral information for RSCD. Experimental results on three datasets validate the effectiveness of the proposed method, which outperforms state-of-the-art methods and achieves the most favorable efficiency-accuracy trade-off. Code is available at https://github.com/xwmaxwma/rschange.
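The abstract does not describe how the parameter-free multi-frequency token mixer is built. As a rough illustration only, the sketch below assumes an FFT-based design: it low-pass filters a feature map at several hypothetical radial cutoffs and averages the band-limited reconstructions, so every token is mixed with all others through the frequency domain without any learnable weights. The function names, the `cutoffs` values, and the averaging rule are assumptions, not STeInFormer's actual implementation (the linked repository is the authority for that).

```python
import numpy as np


def radial_lowpass_mask(h: int, w: int, cutoff: float) -> np.ndarray:
    """Boolean (h, w) mask keeping frequencies whose normalized radius is <= cutoff.

    Radius is measured in units of the per-axis Nyquist frequency (0.5 cycles/sample),
    so the corners of the spectrum sit at sqrt(2) ~= 1.414 (hypothetical convention).
    """
    fy = np.fft.fftfreq(h)[:, None]  # vertical frequencies in cycles/sample
    fx = np.fft.fftfreq(w)[None, :]  # horizontal frequencies in cycles/sample
    radius = np.sqrt(fy ** 2 + fx ** 2) / 0.5
    return radius <= cutoff


def multi_frequency_mix(x: np.ndarray, cutoffs=(0.25, 0.5, 1.0)) -> np.ndarray:
    """Hypothetical parameter-free token mixer for an (H, W, C) feature map.

    Each cutoff yields one band-limited reconstruction; averaging them blends
    coarse (low-frequency) structure with finer spatial detail. No learnable weights.
    """
    h, w, _ = x.shape
    spectrum = np.fft.fft2(x, axes=(0, 1))  # per-channel 2D FFT over the token grid
    views = []
    for cutoff in cutoffs:
        mask = radial_lowpass_mask(h, w, cutoff)[:, :, None]  # broadcast over channels
        # The mask is symmetric in +/- frequency, so the inverse FFT is real up to rounding.
        views.append(np.fft.ifft2(spectrum * mask, axes=(0, 1)).real)
    return np.mean(views, axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.standard_normal((16, 16, 8))  # toy 16x16 token grid with 8 channels
    mixed = multi_frequency_mix(feats)
    print(mixed.shape)  # (16, 16, 8)
```

Because the mixer only applies fixed masks to Fourier coefficients, it adds no parameters and preserves the feature-map shape; how the real backbone weights or combines frequency bands may differ from this averaging choice.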
Related papers
- Decomposing and Fusing Intra- and Inter-Sensor Spatio-Temporal Signal for Multi-Sensor Wearable Human Activity Recognition
We propose the DecomposeWHAR model to better model the relationships between modality variables.
The decomposition creates high-dimensional representations of each intra-sensor variable.
The fusion phase begins by capturing relationships between intra-sensor variables and fusing their features at both the channel and variable levels.
Date: 2025-01-19T01:52:28Z
- Relating CNN-Transformer Fusion Network for Change Detection
RCTNet introduces an early fusion backbone to exploit both spatial and temporal features.
Experiments demonstrate RCTNet's clear superiority over traditional RS image CD methods.
Date: 2024-07-03T14:58:40Z
- DDLNet: Boosting Remote Sensing Change Detection with Dual-Domain Learning
Remote sensing change detection (RSCD) aims to identify the changes of interest in a region by analyzing multi-temporal remote sensing images.
Existing RSCD methods focus on contextual modeling in the spatial domain to enhance the changes of interest.
We propose DDLNet, an RSCD network based on dual-domain learning (i.e., frequency and spatial domains).
Date: 2024-06-19T14:54:09Z
- Time-Selective RNN for Device-Free Multi-Room Human Presence Detection Using WiFi CSI
Device-free human presence detection is a crucial technology for various applications, including home automation, security, and healthcare.
Recent research has explored the use of wireless channel state information extracted from commercial WiFi access points (APs) to provide detailed channel characteristics.
We propose a device-free human presence detection system for multi-room scenarios using a time-selective conditional dual feature extract recurrent network.
Date: 2023-04-25T19:21:47Z
- STNet: Spatial and Temporal feature fusion network for change detection in remote sensing images
We propose STNet, a remote sensing change detection network based on spatial and temporal feature fusions.
Experimental results on three benchmark datasets for RSCD demonstrate that the proposed method achieves the state-of-the-art performance.
Date: 2023-04-22T14:40:41Z
- Deep Metric Learning for Unsupervised Remote Sensing Change Detection
Remote Sensing Change Detection (RS-CD) aims to detect relevant changes from Multi-Temporal Remote Sensing Images (MT-RSIs).
Existing RS-CD methods owe their performance to training on large annotated datasets.
This paper proposes an unsupervised CD method based on deep metric learning that can deal with both of these issues.
Date: 2023-03-16T17:52:45Z
- Gait Recognition in the Wild with Multi-hop Temporal Switch
Gait recognition in the wild is a more practical problem that has attracted the attention of the multimedia and computer vision communities.
This paper presents a novel multi-hop temporal switch method to achieve effective temporal modeling of gait patterns in real-world scenes.
Date: 2022-09-01T10:46:09Z
- Deep Cellular Recurrent Network for Efficient Analysis of Time-Series Data with Spatial Information
This work proposes a novel deep cellular recurrent neural network (DCRNN) architecture to process complex multi-dimensional time series data with spatial information.
The proposed architecture achieves state-of-the-art performance while using substantially fewer trainable parameters than comparable methods in the literature.
Date: 2021-01-12T20:08:18Z
- Searching Multi-Rate and Multi-Modal Temporal Enhanced Networks for Gesture Recognition
We propose the first neural architecture search (NAS)-based method for RGB-D gesture recognition.
The proposed method includes two key components: 1) enhanced temporal representation via the 3D Central Difference Convolution (3D-CDC) family, and 2) optimized backbones for multi-modal-rate branches and lateral connections.
The resultant multi-rate network provides a new perspective to understand the relationship between RGB and depth modalities and their temporal dynamics.
Date: 2020-08-21T10:45:09Z
- Multi-Tones' Phase Coding (MTPC) of Interaural Time Difference by Spiking Neural Network
We propose a pure spiking neural network (SNN) based computational model for precise sound localization in the noisy real-world environment.
We implement this algorithm in a real-time robotic system with a microphone array.
The experimental results show a mean azimuth error of 13 degrees, surpassing the accuracy of the other biologically plausible neuromorphic approach for sound source localization.
Date: 2020-07-07T08:22:56Z
- Temporal-Spatial Neural Filter: Direction Informed End-to-End Multi-channel Target Speech Separation
Target speech separation refers to extracting the target speaker's speech from mixed signals.
Two main challenges are the complex acoustic environment and the real-time processing requirement.
We propose a temporal-spatial neural filter, which directly estimates the target speech waveform from multi-speaker mixture.
Date: 2020-01-02T11:12:50Z