UniChange: Unifying Change Detection with Multimodal Large Language Model
- URL: http://arxiv.org/abs/2511.02607v1
- Date: Tue, 04 Nov 2025 14:31:06 GMT
- Title: UniChange: Unifying Change Detection with Multimodal Large Language Model
- Authors: Xu Zhang, Danyang Li, Xiaohang Dong, Tianhao Wu, Hualong Yu, Jianye Wang, Qicheng Li, Xiang Li,
- Abstract summary: Change detection (CD) is a fundamental task for monitoring and analyzing land cover dynamics. Current models typically acquire limited knowledge from single-type annotated data. We develop UniChange to leverage diverse binary change detection (BCD) and semantic change detection (SCD) datasets.
- Score: 17.98018484822312
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Change detection (CD) is a fundamental task for monitoring and analyzing land cover dynamics. While recent high-performance models and high-quality datasets have significantly advanced the field, a critical limitation persists. Current models typically acquire limited knowledge from single-type annotated data and cannot concurrently leverage diverse binary change detection (BCD) and semantic change detection (SCD) datasets. This constraint leads to poor generalization and limited versatility. The recent advancements in Multimodal Large Language Models (MLLMs) introduce new possibilities for a unified CD framework. We leverage the language priors and unification capabilities of MLLMs to develop UniChange, the first MLLM-based unified change detection model. UniChange integrates generative language abilities with specialized CD functionalities. Our model successfully unifies both BCD and SCD tasks through the introduction of three special tokens: [T1], [T2], and [CHANGE]. Furthermore, UniChange utilizes text prompts to guide the identification of change categories, eliminating the reliance on predefined classification heads. This design allows UniChange to effectively acquire knowledge from multi-source datasets, even when their class definitions conflict. Experiments on four public benchmarks (WHU-CD, S2Looking, LEVIR-CD+, and SECOND) demonstrate SOTA performance, achieving IoU scores of 90.41, 53.04, 78.87, and 57.62, respectively, surpassing all previous methods. The code is available at https://github.com/Erxucomeon/UniChange.
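The abstract describes unifying BCD and SCD through three special tokens and free-text category prompts. The following is a minimal, hypothetical sketch of how such a prompt might be assembled; the token names [T1], [T2], and [CHANGE] come from the abstract, but the template, function names, and category lists are illustrative assumptions, not UniChange's actual implementation.

```python
# Hypothetical sketch: assembling a unified change-detection prompt with the
# three special tokens [T1], [T2], and [CHANGE] described in the abstract.
# The prompt template and category lists are assumptions for illustration.

SPECIAL_TOKENS = ["[T1]", "[T2]", "[CHANGE]"]

def build_prompt(categories):
    """Compose a text prompt where [T1]/[T2] mark the bi-temporal image slots
    and [CHANGE] marks where the change mask is decoded. Because the category
    list is plain text rather than a fixed classification head, conflicting
    label sets from different datasets can coexist in one model."""
    category_str = ", ".join(categories)
    return (
        f"Image at time 1: [T1] Image at time 2: [T2] "
        f"Identify changes among: {category_str}. Answer: [CHANGE]"
    )

# Binary change detection: a single generic "change" category.
bcd_prompt = build_prompt(["change"])
# Semantic change detection: dataset-specific classes (example labels only).
scd_prompt = build_prompt(["building", "tree", "water", "low vegetation"])
```

The same template serves both tasks; only the category string differs, which is the sense in which text prompts replace per-dataset classification heads.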
Related papers
- Make Some Noise: Unsupervised Remote Sensing Change Detection Using Latent Space Perturbations [0.0]
Unsupervised change detection (UCD) in remote sensing aims to localise semantic changes between two images of the same region without relying on labelled data during training. We propose MaSoN, an end-to-end UCD framework that synthesises diverse changes directly in the latent feature space during training. It generates changes that are dynamically estimated using feature statistics of target data, enabling diverse yet data-driven variation aligned with the target domain.
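The summary above says changes are synthesised in latent space with magnitudes estimated from target feature statistics. A toy sketch of that idea, assuming a simple per-channel noise scaling (the actual MaSoN perturbation scheme is not specified here and this is not it):

```python
import numpy as np

# Toy sketch: synthesize pseudo-changes in feature space by adding noise
# scaled to the target data's own per-channel statistics. The scaling rule
# and strength value are illustrative assumptions, not the paper's method.

def perturb_latents(features, rng, strength=0.5):
    """features: (N, C) latent vectors from unlabeled target images.
    Adds Gaussian noise whose per-channel std matches the data, so the
    synthetic 'changes' stay aligned with the target domain."""
    channel_std = features.std(axis=0, keepdims=True)  # data-driven scale
    noise = rng.normal(size=features.shape) * channel_std * strength
    return features + noise

rng = np.random.default_rng(0)
feats = rng.normal(loc=2.0, scale=3.0, size=(128, 16))
perturbed = perturb_latents(feats, rng)
```

Scaling by the observed channel std is one way to keep perturbations "data-driven" rather than fixed-magnitude, which is the property the summary emphasises.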
arXiv Detail & Related papers (2026-02-23T14:27:36Z) - One Language-Free Foundation Model Is Enough for Universal Vision Anomaly Detection [65.11602552904456]
Universal visual anomaly detection (AD) aims to identify anomaly images and segment anomaly regions towards open and dynamic scenarios. Current methods often struggle with complex prompt engineering, elaborate adaptation modules, and challenging training strategies. This paper presents an embarrassingly simple, general, and effective framework for Universal vision Anomaly Detection (UniADet).
arXiv Detail & Related papers (2026-01-09T06:05:18Z) - Parallel Universes, Parallel Languages: A Comprehensive Study on LLM-based Multilingual Counterfactual Example Generation [49.2073409243885]
Large language models (LLMs) excel at generating English counterfactuals and demonstrate multilingual proficiency. We conduct automatic evaluations on both directly generated counterfactuals in the target languages and those derived via English translation across six languages. We identify and categorize four main types of errors that consistently appear in the generated counterfactuals across languages.
arXiv Detail & Related papers (2026-01-01T08:53:49Z) - UniVCD: A New Method for Unsupervised Change Detection in the Open-Vocabulary Era [0.0]
Change detection (CD) identifies scene changes from multi-temporal observations and is widely used in urban development and environmental monitoring. Most existing CD methods rely on supervised learning, making performance strongly dataset-dependent and incurring high annotation costs. We propose Unified Open-Vocabulary Change Detection (UniVCD), an unsupervised, open-vocabulary change detection method built on frozen SAM2 and CLIP.
arXiv Detail & Related papers (2025-12-15T08:42:23Z) - The Unreasonable Effectiveness of Model Merging for Cross-Lingual Transfer in LLMs [45.08958917457921]
Large language models (LLMs) still struggle across tasks outside of high-resource languages. In this work, we investigate cross-lingual transfer to lower-resource languages where task-specific post-training data is scarce.
arXiv Detail & Related papers (2025-05-23T20:28:31Z) - SChanger: Change Detection from a Semantic Change and Spatial Consistency Perspective [0.6749750044497732]
We develop a fine-tuning strategy called the Semantic Change Network (SCN) to address the data scarcity issue. We observe that the locations of changes between the two images are spatially identical, a concept we refer to as spatial consistency. This enhances the modeling of multi-scale changes and helps capture underlying relationships in change detection semantics.
arXiv Detail & Related papers (2025-03-26T17:15:43Z) - New Dataset and Methods for Fine-Grained Compositional Referring Expression Comprehension via Specialist-MLLM Collaboration [49.180693704510006]
Referring Expression Comprehension (REC) is a cross-modal task that evaluates the interplay of language understanding, image comprehension, and language-to-image grounding. It serves as an essential testing ground for Multimodal Large Language Models (MLLMs).
arXiv Detail & Related papers (2025-02-27T13:58:44Z) - ChangeDiff: A Multi-Temporal Change Detection Data Generator with Flexible Text Prompts via Diffusion Model [21.50463332137926]
This paper focuses on the semantic CD (SCD) task and develops a multi-temporal SCD data generator, ChangeDiff. ChangeDiff generates change data in two steps: first, it uses text prompts and a text-to-layout model to create continuous layouts, and then it employs layout-to-image generation to convert these layouts into images. Our generated data shows significant progress in temporal continuity, spatial diversity, and quality realism, empowering change detectors with accuracy and transferability.
arXiv Detail & Related papers (2024-12-20T03:58:28Z) - ViewDelta: Scaling Scene Change Detection through Text-Conditioning [0.0]
We introduce a general framework for Scene Change Detection (SCD) that addresses the core ambiguity of distinguishing "relevant" from "nuisance" changes. We propose ViewDelta, a text-conditioned change detection framework that uses natural language prompts to define relevant changes. Our code and dataset are available at https://joshuakgao.io/viewdelta/.
arXiv Detail & Related papers (2024-12-10T15:51:17Z) - ChangeAnywhere: Sample Generation for Remote Sensing Change Detection via Semantic Latent Diffusion Model [4.677012401985776]
ChangeAnywhere is a novel CD sample generation method using the semantic latent diffusion model and single-temporal images.
ChangeAnywhere captures the two essentials of CD samples: change implies semantic difference, and non-change implies reasonable variation under the same semantic constraints.
The resulting ChangeAnywhere-100K dataset significantly improves both zero-shot and few-shot performance on two CD benchmark datasets for various deep learning-based CD models.
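The principle quoted above, that change implies semantic difference, reduces to a simple rule once two co-registered semantic label maps exist. A toy sketch (the label maps here are made up for illustration; ChangeAnywhere itself *generates* such pairs with a latent diffusion model rather than assuming they exist):

```python
import numpy as np

# Toy illustration of "change implies semantically different": given two
# co-registered semantic label maps, the binary change mask is simply the
# set of pixels whose class labels disagree. The example maps are invented.

def change_mask(sem_t1, sem_t2):
    """Pixels whose semantic class differs between the two dates are change."""
    return sem_t1 != sem_t2

t1 = np.array([[0, 0, 1],
               [2, 2, 1]])
t2 = np.array([[0, 1, 1],
               [2, 0, 1]])
mask = change_mask(t1, t2)  # True where labels disagree
```

This is also why semantically annotated single-temporal images are a useful starting point for synthesising CD samples: a binary label falls out of any pair of semantic maps for free.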
arXiv Detail & Related papers (2024-04-13T03:46:35Z) - A New Learning Paradigm for Foundation Model-based Remote Sensing Change Detection [54.01158175996638]
Change detection (CD) is a critical task to observe and analyze dynamic processes of land cover.
We propose a Bi-Temporal Adapter Network (BAN), which is a universal foundation model-based CD adaptation framework.
arXiv Detail & Related papers (2023-12-02T15:57:17Z) - Multi-Modal Few-Shot Temporal Action Detection [157.96194484236483]
Few-shot (FS) and zero-shot (ZS) learning are two different approaches for scaling temporal action detection to new classes.
We introduce a new multi-modality few-shot (MMFS) TAD problem, which can be considered as a marriage of FS-TAD and ZS-TAD.
arXiv Detail & Related papers (2022-11-27T18:13:05Z) - Few-Shot Class-Incremental Learning by Sampling Multi-Phase Tasks [59.12108527904171]
A model should recognize new classes and maintain discriminability over old classes.
The task of recognizing few-shot new classes without forgetting old classes is called few-shot class-incremental learning (FSCIL).
We propose a new paradigm for FSCIL based on meta-learning by LearnIng Multi-phase Incremental Tasks (LIMIT).
arXiv Detail & Related papers (2022-03-31T13:46:41Z) - Unsupervised Domain Adaptation of a Pretrained Cross-Lingual Language Model [58.27176041092891]
Recent research indicates that pretraining cross-lingual language models on large-scale unlabeled texts yields significant performance improvements.
We propose a novel unsupervised feature decomposition method that can automatically extract domain-specific features from the entangled pretrained cross-lingual representations.
Our proposed model leverages mutual information estimation to decompose the representations computed by a cross-lingual model into domain-invariant and domain-specific parts.
arXiv Detail & Related papers (2020-11-23T16:00:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.