ReasonCD: A Multimodal Reasoning Large Model for Implicit Change-of-Interest Semantic Mining
- URL: http://arxiv.org/abs/2512.19354v1
- Date: Mon, 22 Dec 2025 12:54:26 GMT
- Title: ReasonCD: A Multimodal Reasoning Large Model for Implicit Change-of-Interest Semantic Mining
- Authors: Zhenyang Huang, Xiao Yu, Yi Zhang, Decheng Wang, Hang Ruan,
- Abstract summary: Methods that use semantic guidance for detecting users' CRoI overly rely on explicit textual descriptions of CRoI.<n>This paper proposes a multimodal reasoning change detection model named ReasonCD, capable of mining users' implicit intent task intents.<n> Experimental results show that the ReasonCD model not only excels at basic reasoning-based change detection tasks but can also explain the reasoning process to aid human decision-making.
- Score: 8.920164654015808
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Remote sensing image change detection is one of the fundamental tasks in remote sensing intelligent interpretation. Its core objective is to identify changes within change regions of interest (CRoI). Current multimodal large models encode rich human semantic knowledge, which is utilized for guidance in tasks such as remote sensing change detection. However, existing methods that use semantic guidance for detecting users' CRoI overly rely on explicit textual descriptions of CRoI, leading to the problem of near-complete performance failure when presented with implicit CRoI textual descriptions. This paper proposes a multimodal reasoning change detection model named ReasonCD, capable of mining users' implicit task intent. The model leverages the powerful reasoning capabilities of pre-trained large language models to mine users' implicit task intents and subsequently obtains different change detection results based on these intents. Experiments on public datasets demonstrate that the model achieves excellent change detection performance, with an F1 score of 92.1\% on the BCDD dataset. Furthermore, to validate its superior reasoning functionality, this paper annotates a subset of reasoning data based on the SECOND dataset. Experimental results show that the model not only excels at basic reasoning-based change detection tasks but can also explain the reasoning process to aid human decision-making.
Related papers
- LLM-Augmented Changepoint Detection: A Framework for Ensemble Detection and Automated Explanation [2.6022681036325874]
This paper introduces a novel changepoint detection framework that combines ensemble statistical methods with Large Language Models (LLMs)<n>The proposed ensemble method aggregates results from ten distinct changepoint detection algorithms, achieving superior performance and robustness compared to individual methods.<n>For private or domain-specific data, a Retrieval-Augmented Generation (RAG) solution enables explanations grounded in user-provided documents.
arXiv Detail & Related papers (2026-01-06T12:04:38Z) - Adaptation Method for Misinformation Identification [8.581136866856255]
We propose ADOSE, an Active Domain Adaptation (ADA) framework for multimodal fake news detection.<n>ADOSE actively annotates a small subset of target samples to improve detection performance.<n>ADOSE outperforms existing ADA methods by 2.72% $sim$ 14.02%, indicating the superiority of our model.
arXiv Detail & Related papers (2025-04-19T04:18:32Z) - Detect Changes like Humans: Incorporating Semantic Priors for Improved Change Detection [52.62459671461816]
This paper explores incorporating semantic priors from visual foundation models to improve the ability to detect changes.<n>Inspired by the human visual paradigm, a novel dual-stream feature decoder is derived to distinguish changes by combining semantic-aware features and difference-aware features.
arXiv Detail & Related papers (2024-12-22T08:27:15Z) - Show Me What and Where has Changed? Question Answering and Grounding for Remote Sensing Change Detection [82.65760006883248]
We introduce a new task named Change Detection Question Answering and Grounding (CDQAG)
CDQAG extends the traditional change detection task by providing interpretable textual answers and intuitive visual evidence.
We construct the first CDQAG benchmark dataset, termed QAG-360K, comprising over 360K triplets of questions, textual answers, and corresponding high-quality visual masks.
arXiv Detail & Related papers (2024-10-31T11:20:13Z) - A Hitchhikers Guide to Fine-Grained Face Forgery Detection Using Common Sense Reasoning [9.786907179872815]
The potential of vision and language remains underexplored in face forgery detection.
There is a need for a methodology that converts face forgery detection to a Visual Question Answering (VQA) task.
We propose a multi-staged approach that diverges from the traditional binary decision paradigm to address this gap.
arXiv Detail & Related papers (2024-10-01T08:16:40Z) - Innovative Horizons in Aerial Imagery: LSKNet Meets DiffusionDet for
Advanced Object Detection [55.2480439325792]
We present an in-depth evaluation of an object detection model that integrates the LSKNet backbone with the DiffusionDet head.
The proposed model achieves a mean average precision (MAP) of approximately 45.7%, which is a significant improvement.
This advancement underscores the effectiveness of the proposed modifications and sets a new benchmark in aerial image analysis.
arXiv Detail & Related papers (2023-11-21T19:49:13Z) - Align, Perturb and Decouple: Toward Better Leverage of Difference
Information for RSI Change Detection [24.249552791014644]
Change detection is a widely adopted technique in remote sense imagery (RSI) analysis.
We propose a series of operations to fully exploit the difference information: Alignment, Perturbation and Decoupling.
arXiv Detail & Related papers (2023-05-30T03:39:53Z) - Explaining Cross-Domain Recognition with Interpretable Deep Classifier [100.63114424262234]
Interpretable Deep (IDC) learns the nearest source samples of a target sample as evidence upon which the classifier makes the decision.
Our IDC leads to a more explainable model with almost no accuracy degradation and effectively calibrates classification for optimum reject options.
arXiv Detail & Related papers (2022-11-15T15:58:56Z) - FairMOT: On the Fairness of Detection and Re-Identification in Multiple
Object Tracking [92.48078680697311]
Multi-object tracking (MOT) is an important problem in computer vision.
We present a simple yet effective approach termed as FairMOT based on the anchor-free object detection architecture CenterNet.
The approach achieves high accuracy for both detection and tracking.
arXiv Detail & Related papers (2020-04-04T08:18:00Z) - DASNet: Dual attentive fully convolutional siamese networks for change
detection of high resolution satellite images [17.839181739760676]
The research objective is to identity the change information of interest and filter out the irrelevant change information as interference factors.
Recently, the rise of deep learning has provided new tools for change detection, which have yielded impressive results.
We propose a new method, namely, dual attentive fully convolutional Siamese networks (DASNet) for change detection in high-resolution images.
arXiv Detail & Related papers (2020-03-07T16:57:10Z) - Stance Detection Benchmark: How Robust Is Your Stance Detection? [65.91772010586605]
Stance Detection (StD) aims to detect an author's stance towards a certain topic or claim.
We introduce a StD benchmark that learns from ten StD datasets of various domains in a multi-dataset learning setting.
Within this benchmark setup, we are able to present new state-of-the-art results on five of the datasets.
arXiv Detail & Related papers (2020-01-06T13:37:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.