Related papers: Text-conditioned State Space Model For Domain-generalized Change Detection Visual Question Answering

URL: http://arxiv.org/abs/2508.08974v3
Date: Fri, 24 Oct 2025 10:53:51 GMT
Title: Text-conditioned State Space Model For Domain-generalized Change Detection Visual Question Answering
Authors: Elman Ghazaei, Erchan Aptoula
Abstract summary: Change detection methods typically require expert knowledge for accurate interpretation. A new multi-modal and multi-domain dataset, BrightVQA, is introduced to facilitate domain generalization research. A Text-Conditioned State Space Model (TCSSM) framework is proposed to leverage both bi-temporal imagery and geo-disaster-related textual information.
Score: 4.698129958118586
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: The Earth's surface is constantly changing, and detecting these changes provides valuable insights that benefit various aspects of human society. While traditional change detection methods have been employed to detect changes from bi-temporal images, these approaches typically require expert knowledge for accurate interpretation. To enable broader and more flexible access to change information by non-expert users, the task of Change Detection Visual Question Answering (CDVQA) has been introduced. However, existing CDVQA methods have been developed under the assumption that training and testing datasets share similar distributions. This assumption does not hold in real-world applications, where domain shifts often occur. In this paper, the CDVQA task is revisited with a focus on addressing domain shift. To this end, a new multi-modal and multi-domain dataset, BrightVQA, is introduced to facilitate domain generalization research in CDVQA. Furthermore, a novel state space model, termed Text-Conditioned State Space Model (TCSSM), is proposed. The TCSSM framework is designed to leverage both bi-temporal imagery and geo-disaster-related textual information in a unified manner to extract domain-invariant features. The input-dependent parameters of TCSSM are dynamically predicted from both the bi-temporal images and the geo-disaster-related descriptions, thereby facilitating the alignment between bi-temporal visual data and the associated textual descriptions. Extensive experiments are conducted to evaluate the proposed method against state-of-the-art models, and superior performance is consistently demonstrated. The code and dataset will be made publicly available upon acceptance at https://github.com/Elman295/TCSSM.

Related papers

Exploiting Domain Properties in Language-Driven Domain Generalization for Semantic Segmentation [16.081767698947186]
We present a novel domain generalization framework for semantic segmentation, namely Domain-aware Prompt-driven Masked Transformer (DPMFormer). Firstly, we introduce domain-aware prompt learning to facilitate semantic alignment between visual and textual cues. To capture various domain-specific properties with a single source dataset, we propose domain-aware contrastive learning along with texture perturbation that diversifies the observable domains.
arXiv Detail & Related papers (2025-12-03T06:58:38Z)
Out-of-Context Misinformation Detection via Variational Domain-Invariant Learning with Test-Time Training [7.447483980331488]
Out-of-context misinformation (OOC) is a low-cost form of misinformation in news reports. We propose VDT to enhance the domain adaptation capability for OOC misinformation detection.
arXiv Detail & Related papers (2025-11-13T11:34:26Z)
Detect Changes like Humans: Incorporating Semantic Priors for Improved Change Detection [52.62459671461816]
This paper explores incorporating semantic priors from visual foundation models to improve the ability to detect changes. Inspired by the human visual paradigm, a novel dual-stream feature decoder is derived to distinguish changes by combining semantic-aware features and difference-aware features.
arXiv Detail & Related papers (2024-12-22T08:27:15Z)
Generalize or Detect? Towards Robust Semantic Segmentation Under Multiple Distribution Shifts [56.57141696245328]
In open-world scenarios, where both novel classes and domains may exist, an ideal segmentation model should detect anomaly classes for safety. Existing methods often struggle to distinguish between domain-level and semantic-level distribution shifts.
arXiv Detail & Related papers (2024-11-06T11:03:02Z)
Show Me What and Where has Changed? Question Answering and Grounding for Remote Sensing Change Detection [82.65760006883248]
We introduce a new task named Change Detection Question Answering and Grounding (CDQAG). CDQAG extends the traditional change detection task by providing interpretable textual answers and intuitive visual evidence. We construct the first CDQAG benchmark dataset, termed QAG-360K, comprising over 360K triplets of questions, textual answers, and corresponding high-quality visual masks.
arXiv Detail & Related papers (2024-10-31T11:20:13Z)
A Late-Stage Bitemporal Feature Fusion Network for Semantic Change Detection [32.112311027857636]
We propose a novel late-stage bitemporal feature fusion network to address the issue of semantic change detection. Specifically, we propose a local-global attentional aggregation module to strengthen feature fusion, and a local-global context enhancement module to highlight pivotal semantics. Our proposed model achieves new state-of-the-art performance on both datasets.
arXiv Detail & Related papers (2024-06-15T16:02:10Z)
SRC-Net: Bi-Temporal Spatial Relationship Concerned Network for Change Detection [9.682463974799893]
Change detection (CD) in remote sensing imagery is a crucial task with applications in environmental monitoring, urban development, and disaster management. We propose SRC-Net: a bi-temporal spatial relationship concerned network for CD.
arXiv Detail & Related papers (2024-06-09T06:53:39Z)
Cross-Domain Few-Shot Object Detection via Enhanced Open-Set Object Detector [72.05791402494727]
This paper studies the challenging task of cross-domain few-shot object detection (CD-FSOD). It aims to develop an accurate object detector for novel domains with minimal labeled examples.
arXiv Detail & Related papers (2024-02-05T15:25:32Z)
Context-aware Domain Adaptation for Time Series Anomaly Detection [69.3488037353497]
Time series anomaly detection is a challenging task with a wide range of real-world applications. Recent efforts have been devoted to time series domain adaptation to leverage knowledge from similar domains. We propose a framework that combines context sampling and anomaly detection into a joint learning procedure.
arXiv Detail & Related papers (2023-04-15T02:28:58Z)
MapFormer: Boosting Change Detection by Using Pre-change Information [2.436285270638041]
We leverage existing maps describing features of the earth's surface for change detection in bi-temporal images. We show that the simple integration of the additional information via concatenation of latent representations suffices to significantly outperform state-of-the-art change detection methods. Our approach outperforms existing change detection methods by an absolute 11.7% and 18.4% in terms of binary change IoU on DynamicEarthNet and HRSCD, respectively.
arXiv Detail & Related papers (2023-03-31T07:39:12Z)
TAL: Two-stream Adaptive Learning for Generalizable Person Re-identification [115.31432027711202]
We argue that both domain-specific and domain-invariant features are crucial for improving the generalization ability of re-id models. We propose two-stream adaptive learning (TAL) to simultaneously model these two kinds of information. Our framework can be applied to both single-source and multi-source domain generalization tasks.
arXiv Detail & Related papers (2021-11-29T01:27:42Z)

This list is automatically generated from the titles and abstracts of the papers in this site.