MS-Former: Memory-Supported Transformer for Weakly Supervised Change
Detection with Patch-Level Annotations
- URL: http://arxiv.org/abs/2311.09726v1
- Date: Thu, 16 Nov 2023 09:57:29 GMT
- Title: MS-Former: Memory-Supported Transformer for Weakly Supervised Change
Detection with Patch-Level Annotations
- Authors: Zhenglai Li, Chang Tang, Xinwang Liu, Changdong Li, Xianju Li, Wei
Zhang
- Abstract summary: We propose a memory-supported transformer (MS-Former) for weakly supervised change detection.
MS-Former consists of a bi-directional attention block (BAB) and a patch-level supervision scheme (PSS).
Experimental results on three benchmark datasets demonstrate the effectiveness of our proposed method in the change detection task.
- Score: 50.79913333804232
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fully supervised change detection methods have achieved significant
advancements in performance, yet they depend severely on acquiring costly
pixel-level labels. Considering that the patch-level annotations also contain
abundant information corresponding to both changed and unchanged objects in
bi-temporal images, an intuitive solution is to segment the changes with
patch-level annotations. How to capture the semantic variations associated with
the changed and unchanged regions from the patch-level annotations to obtain
promising change results is the critical challenge for the weakly supervised
change detection task. In this paper, we propose a memory-supported transformer
(MS-Former), a novel framework consisting of a bi-directional attention block
(BAB) and a patch-level supervision scheme (PSS) tailored for weakly supervised
change detection with patch-level annotations. More specifically, the BAB
captures contexts associated with the changed and unchanged regions from the
temporal difference features to construct informative prototypes stored in the
memory bank. In turn, the BAB extracts useful information from the
prototypes as supplementary contexts to enhance the temporal difference
features, thereby better distinguishing changed and unchanged regions. After
that, the PSS guides the network in learning valuable knowledge from the
patch-level annotations, thus further elevating the performance. Experimental
results on three benchmark datasets demonstrate the effectiveness of our
proposed method in the change detection task. The demo code for our work will
be publicly available at \url{https://github.com/guanyuezhen/MS-Former}.
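The abstract's read/write interplay between the temporal-difference features and the prototype memory bank can be illustrated with a minimal numpy sketch. The class names, the exponential-moving-average update, and the dot-product attention read-out below are illustrative assumptions, not details confirmed by the paper:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class MemoryBank:
    """Hypothetical prototype memory: one slot per class (changed / unchanged)."""
    def __init__(self, num_classes=2, dim=64, momentum=0.9):
        self.protos = np.zeros((num_classes, dim))
        self.momentum = momentum

    def write(self, feats, patch_labels):
        # feats: (N, dim) temporal-difference features; patch_labels: (N,) in {0, 1}.
        # Update each class prototype with an EMA over that class's mean feature.
        for c in range(self.protos.shape[0]):
            mask = patch_labels == c
            if mask.any():
                mean = feats[mask].mean(axis=0)
                self.protos[c] = (self.momentum * self.protos[c]
                                  + (1 - self.momentum) * mean)

    def read(self, feats):
        # Each feature attends over the stored prototypes and gathers
        # supplementary context, which is added back to enhance the feature.
        attn = softmax(feats @ self.protos.T)   # (N, num_classes)
        context = attn @ self.protos            # (N, dim)
        return feats + context                  # enhanced features
```

In this sketch the `write` path corresponds to constructing prototypes from the temporal-difference features, and the `read` path to enhancing those features with memory-derived context; the actual BAB uses learned attention rather than this plain dot product.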
Related papers
- Enhancing Perception of Key Changes in Remote Sensing Image Change Captioning [49.24306593078429]
We propose a novel framework for remote sensing image change captioning, guided by Key Change Features and Instruction tuning (KCFI).
KCFI includes a ViTs encoder for extracting bi-temporal remote sensing image features, a key feature perceiver for identifying critical change areas, and a pixel-level change detection decoder.
To validate the effectiveness of our approach, we compare it against several state-of-the-art change captioning methods on the LEVIR-CC dataset.
arXiv Detail & Related papers (2024-09-19T09:33:33Z) - ChangeBind: A Hybrid Change Encoder for Remote Sensing Change Detection [16.62779899494721]
Change detection (CD) is a fundamental task in remote sensing (RS) which aims to detect the semantic changes between the same geographical regions at different time stamps.
We propose an effective Siamese-based framework to encode the semantic changes occurring in the bi-temporal RS images.
arXiv Detail & Related papers (2024-04-26T17:47:14Z) - Segment Any Change [64.23961453159454]
We propose a new type of change detection model that supports zero-shot prediction and generalization on unseen change types and data distributions.
AnyChange is built on the segment anything model (SAM) via our training-free adaptation method, bitemporal latent matching.
We also propose a point query mechanism to enable AnyChange's zero-shot object-centric change detection capability.
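The "training-free" bitemporal latent matching idea can be sketched as comparing per-proposal latent embeddings across the two time points. The cosine-similarity criterion and threshold below are illustrative assumptions about how such matching could work, not the paper's exact formulation:

```python
import numpy as np

def bitemporal_latent_matching(emb_t1, emb_t2, threshold=0.5):
    """Hypothetical sketch: flag object proposals whose latent embeddings
    differ between the two time points (low cosine similarity -> change).

    emb_t1, emb_t2: (N, D) embeddings of the same N proposals at times 1 and 2.
    Returns a boolean array, True where a semantic change is predicted.
    """
    def normalize(x):
        return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-8)
    sim = (normalize(emb_t1) * normalize(emb_t2)).sum(axis=-1)  # per-proposal cosine
    return sim < threshold
```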
arXiv Detail & Related papers (2024-02-02T07:17:39Z) - MapFormer: Boosting Change Detection by Using Pre-change Information [2.436285270638041]
We leverage existing maps describing features of the earth's surface for change detection in bi-temporal images.
We show that the simple integration of the additional information via concatenation of latent representations suffices to significantly outperform state-of-the-art change detection methods.
Our approach outperforms existing change detection methods by an absolute 11.7% and 18.4% in terms of binary change IoU on DynamicEarthNet and HRSCD, respectively.
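The paper's stated baseline, integrating the pre-change map by concatenating latent representations, amounts to a channel-wise concatenation of feature tensors. The shapes and function name below are illustrative assumptions:

```python
import numpy as np

def fuse_with_map(img_feats, map_feats):
    """Sketch: concatenate bi-temporal image features with pre-change map
    features along the channel axis, as in the concatenation baseline.

    img_feats: (C1, H, W), map_feats: (C2, H, W) -> (C1 + C2, H, W)
    """
    return np.concatenate([img_feats, map_feats], axis=0)
```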
arXiv Detail & Related papers (2023-03-31T07:39:12Z) - Neighborhood Contrastive Transformer for Change Captioning [80.10836469177185]
We propose a neighborhood contrastive transformer to improve the model's perceiving ability for various changes under different scenes.
The proposed method achieves the state-of-the-art performance on three public datasets with different change scenarios.
arXiv Detail & Related papers (2023-03-06T14:39:54Z) - Self-Pair: Synthesizing Changes from Single Source for Object Change
Detection in Remote Sensing Imagery [6.586756080460231]
We train a change detector using two spatially unrelated images with corresponding semantic labels such as building.
We show that manipulating the source image as an after-image is crucial to the performance of change detection.
Our method outperforms existing methods based on single-temporal supervision.
arXiv Detail & Related papers (2022-12-20T13:26:42Z) - UIA-ViT: Unsupervised Inconsistency-Aware Method based on Vision
Transformer for Face Forgery Detection [52.91782218300844]
We propose a novel Unsupervised Inconsistency-Aware method based on Vision Transformer, called UIA-ViT.
Due to the self-attention mechanism, the attention map among patch embeddings naturally represents the consistency relation, making the vision Transformer suitable for the consistency representation learning.
arXiv Detail & Related papers (2022-10-23T15:24:47Z) - DeViT: Deformed Vision Transformers in Video Inpainting [59.73019717323264]
We extend previous Transformers with patch alignment by introducing Deformed Patch-based Homography (DePtH)
Second, we introduce Mask Pruning-based Patch Attention (MPPA) to improve patch-wised feature matching.
Third, we introduce a Spatial-Temporal weighting Adaptor (STA) module to obtain accurate attention to spatial-temporal tokens.
arXiv Detail & Related papers (2022-09-28T08:57:14Z) - Detection and Description of Change in Visual Streams [20.62923173347949]
We propose a new approach to incorporating unlabeled data into training to generate natural language descriptions of change.
We also develop a framework for estimating the time of change in a visual stream.
We use learned representations for change evidence and consistency of perceived change, and combine these in a regularized graph cut based change detector.
arXiv Detail & Related papers (2020-03-27T20:49:38Z) - DASNet: Dual attentive fully convolutional siamese networks for change
detection of high resolution satellite images [17.839181739760676]
The research objective is to identify the change information of interest and filter out the irrelevant change information as interference factors.
Recently, the rise of deep learning has provided new tools for change detection, which have yielded impressive results.
We propose a new method, namely, dual attentive fully convolutional Siamese networks (DASNet) for change detection in high-resolution images.
arXiv Detail & Related papers (2020-03-07T16:57:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.