LG-CD: Enhancing Language-Guided Change Detection through SAM2 Adaptation
- URL: http://arxiv.org/abs/2509.21894v1
- Date: Fri, 26 Sep 2025 05:30:11 GMT
- Title: LG-CD: Enhancing Language-Guided Change Detection through SAM2 Adaptation
- Authors: Yixiao Liu, Yizhou Yang, Jinwen Li, Jun Tao, Ruoyu Li, Xiangkun Wang, Min Zhu, Junlong Cheng
- Abstract summary: We propose a novel Language-Guided Change Detection model (LG-CD). This model leverages natural language prompts to direct the network's attention to regions of interest. Our experiments on three datasets demonstrate that LG-CD consistently outperforms state-of-the-art change detection methods.
- Score: 9.324344835427858
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Remote Sensing Change Detection (RSCD) typically identifies changes in land cover or surface conditions by analyzing multi-temporal images. Currently, most deep learning-based methods primarily focus on learning unimodal visual information, while neglecting the rich semantic information provided by multimodal data such as text. To address this limitation, we propose a novel Language-Guided Change Detection model (LG-CD). This model leverages natural language prompts to direct the network's attention to regions of interest, significantly improving the accuracy and robustness of change detection. Specifically, LG-CD utilizes a visual foundational model (SAM2) as a feature extractor to capture multi-scale pyramid features from high-resolution to low-resolution across bi-temporal remote sensing images. Subsequently, multi-layer adapters are employed to fine-tune the model for downstream tasks, ensuring its effectiveness in remote sensing change detection. Additionally, we design a Text Fusion Attention Module (TFAM) to align visual and textual information, enabling the model to focus on target change regions using text prompts. Finally, a Vision-Semantic Fusion Decoder (V-SFD) is implemented, which deeply integrates visual and semantic information through a cross-attention mechanism to produce highly accurate change detection masks. Our experiments on three datasets (LEVIR-CD, WHU-CD, and SYSU-CD) demonstrate that LG-CD consistently outperforms state-of-the-art change detection methods. Furthermore, our approach provides new insights into achieving generalized change detection by leveraging multimodal information.
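The abstract describes fusing textual and visual information through cross-attention (in the TFAM and V-SFD modules). The paper's implementation is not reproduced here; as a minimal NumPy sketch of the general cross-attention mechanism it refers to, with all shapes, projection matrices, and names being illustrative assumptions rather than the authors' design:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(visual, text, d_k):
    """Condition visual features on text tokens via cross-attention.

    visual: (N, d) flattened spatial features (queries)
    text:   (T, d) token embeddings of the language prompt (keys/values)
    """
    d = visual.shape[1]
    # Stand-ins for learned projections, drawn randomly for illustration.
    rng = np.random.default_rng(0)
    Wq = rng.standard_normal((d, d_k)) / np.sqrt(d)
    Wk = rng.standard_normal((d, d_k)) / np.sqrt(d)
    Wv = rng.standard_normal((d, d_k)) / np.sqrt(d)
    Q, K, V = visual @ Wq, text @ Wk, text @ Wv
    attn = softmax(Q @ K.T / np.sqrt(d_k))   # (N, T): per-pixel weights over tokens
    return attn @ V                          # (N, d_k): text-conditioned features

# Toy example: 16 spatial positions, 4 prompt tokens, 8-dim features.
vis = np.random.default_rng(1).standard_normal((16, 8))
txt = np.random.default_rng(2).standard_normal((4, 8))
fused = cross_attention(vis, txt, d_k=8)
print(fused.shape)  # (16, 8)
```

In such a scheme, each spatial location attends over the prompt tokens, so regions whose features align with the prompt semantics receive text-weighted updates; the actual V-SFD additionally integrates multi-scale SAM2 features, which this sketch omits.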
Related papers
- GRAD-Former: Gated Robust Attention-based Differential Transformer for Change Detection [0.7865560760233441]
Change detection (CD) in remote sensing aims to identify semantic differences between satellite images captured at different times. Traditional transformer-based methods suffer from quadratic computational complexity when applied to very high-resolution (VHR) satellite images. We present GRAD-Former, a novel framework that enhances contextual understanding while maintaining efficiency through reduced model size.
arXiv Detail & Related papers (2026-03-01T15:56:42Z) - RemoteVAR: Autoregressive Visual Modeling for Remote Sensing Change Detection [52.32112533846212]
Remote sensing change detection is central to applications such as environmental monitoring and disaster assessment. Visual autoregressive models have recently shown impressive image generation capability, but their adoption for pixel-level discriminative tasks remains limited due to weak controllability, suboptimal dense prediction performance, and exposure bias. We introduce a new VAR-based change detection framework that addresses these limitations by conditioning autoregressive prediction on multi-resolution fused bi-temporal features via cross-attention, and by employing an autoregressive training strategy designed specifically for change map prediction.
arXiv Detail & Related papers (2026-01-17T03:50:00Z) - UniVCD: A New Method for Unsupervised Change Detection in the Open-Vocabulary Era [0.0]
Change detection (CD) identifies scene changes from multi-temporal observations and is widely used in urban development and environmental monitoring. Most existing CD methods rely on supervised learning, making performance strongly dataset-dependent and incurring high annotation costs. We propose Unified Open-Vocabulary Change Detection (UniVCD), an unsupervised, open-vocabulary change detection method built on frozen SAM2 and CLIP.
arXiv Detail & Related papers (2025-12-15T08:42:23Z) - Referring Change Detection in Remote Sensing Imagery [49.841833753558575]
We introduce Referring Change Detection (RCD), which leverages natural language prompts to detect specific classes of changes in remote sensing images. We propose a two-stage framework consisting of (I) RCDNet, a cross-modal fusion network designed for referring change detection, and (II) RCDGen, a diffusion-based synthetic data generation pipeline.
arXiv Detail & Related papers (2025-12-12T16:57:12Z) - Multimodal Feature Fusion Network with Text Difference Enhancement for Remote Sensing Change Detection [36.96267014127019]
MMChange is a multimodal RSCD method that combines image and text modalities to enhance accuracy and robustness. To overcome the semantic limitations of image features, we employ a vision language model (VLM) to generate semantic descriptions of bitemporal images. A Textual Difference Enhancement (TDE) module captures fine-grained semantic shifts, guiding the model toward meaningful changes.
arXiv Detail & Related papers (2025-09-04T07:39:18Z) - Wavelet-Guided Dual-Frequency Encoding for Remote Sensing Change Detection [67.84730634802204]
Change detection in remote sensing imagery plays a vital role in various engineering applications, such as natural disaster monitoring, urban expansion tracking, and infrastructure management. Most existing methods still rely on spatial-domain modeling, where the limited diversity of feature representations hinders the detection of subtle change regions. We observe that frequency-domain feature modeling, particularly in the wavelet domain, amplifies fine-grained differences in frequency components, enhancing the perception of edge changes that are challenging to capture in the spatial domain.
arXiv Detail & Related papers (2025-08-07T11:14:16Z) - MGCR-Net:Multimodal Graph-Conditioned Vision-Language Reconstruction Network for Remote Sensing Change Detection [55.702662643521265]
We propose the multimodal graph-conditioned vision-language reconstruction network (MGCR-Net) to explore the semantic interaction capabilities of multimodal data. Experimental results on four public datasets demonstrate that MGCR-Net achieves superior performance compared to mainstream CD methods.
arXiv Detail & Related papers (2025-08-03T02:50:08Z) - DeltaVLM: Interactive Remote Sensing Image Change Analysis via Instruction-guided Difference Perception [0.846600473226587]
We introduce remote sensing image change analysis (RSICA) as a new paradigm that combines the strengths of change detection and visual question answering. We propose DeltaVLM, an end-to-end architecture tailored for interactive RSICA. DeltaVLM features three innovations: (1) a fine-tuned bi-temporal vision encoder to capture temporal differences; (2) a visual difference perception module with a cross-semantic relation measuring mechanism to interpret changes; and (3) an instruction-guided Q-former to effectively extract query-relevant difference information.
arXiv Detail & Related papers (2025-07-30T03:14:27Z) - Detect Changes like Humans: Incorporating Semantic Priors for Improved Change Detection [52.62459671461816]
This paper explores incorporating semantic priors from visual foundation models to improve the ability to detect changes. Inspired by the human visual paradigm, a novel dual-stream feature decoder is derived to distinguish changes by combining semantic-aware features and difference-aware features.
arXiv Detail & Related papers (2024-12-22T08:27:15Z) - Enhancing Perception of Key Changes in Remote Sensing Image Change Captioning [49.24306593078429]
We propose a novel framework for remote sensing image change captioning, guided by Key Change Features and Instruction-tuning (KCFI).
KCFI includes a ViTs encoder for extracting bi-temporal remote sensing image features, a key feature perceiver for identifying critical change areas, and a pixel-level change detection decoder.
To validate the effectiveness of our approach, we compare it against several state-of-the-art change captioning methods on the LEVIR-CC dataset.
arXiv Detail & Related papers (2024-09-19T09:33:33Z) - TransY-Net: Learning Fully Transformer Networks for Change Detection of Remote Sensing Images [64.63004710817239]
We propose a novel Transformer-based learning framework named TransY-Net for remote sensing image CD.
It improves the feature extraction from a global view and combines multi-level visual features in a pyramid manner.
Our proposed method achieves a new state-of-the-art performance on four optical and two SAR image CD benchmarks.
arXiv Detail & Related papers (2023-10-22T07:42:19Z) - Affine-Consistent Transformer for Multi-Class Cell Nuclei Detection [76.11864242047074]
We propose a novel Affine-Consistent Transformer (AC-Former), which directly yields a sequence of nucleus positions.
We introduce an Adaptive Affine Transformer (AAT) module, which can automatically learn the key spatial transformations to warp original images for local network training.
Experimental results demonstrate that the proposed method significantly outperforms existing state-of-the-art algorithms on various benchmarks.
arXiv Detail & Related papers (2023-10-22T02:27:02Z) - Dsfer-Net: A Deep Supervision and Feature Retrieval Network for Bitemporal Change Detection Using Modern Hopfield Networks [35.415260892693745]
We propose a Deep Supervision and FEature Retrieval network (Dsfer-Net) for bitemporal change detection.
Specifically, the highly representative deep features of bitemporal images are jointly extracted through a fully convolutional Siamese network.
Our end-to-end network establishes a novel framework by aggregating retrieved features and feature pairs from different layers.
arXiv Detail & Related papers (2023-04-03T16:01:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.