NeXt2Former-CD: Efficient Remote Sensing Change Detection with Modern Vision Architectures
- URL: http://arxiv.org/abs/2602.18717v1
- Date: Sat, 21 Feb 2026 04:51:53 GMT
- Title: NeXt2Former-CD: Efficient Remote Sensing Change Detection with Modern Vision Architectures
- Authors: Yufan Wang, Sokratis Makrogiannis, Chandra Kambhamettu
- Abstract summary: NeXt2Former-CD is an end-to-end framework that integrates a Siamese ConvNeXt encoder with DINOv3 weights, a deformable attention-based temporal fusion module, and a Mask2Former decoder. Our model maintains inference latency comparable to SSM-based approaches, suggesting it is practical for high-resolution change detection tasks.
- Score: 11.733678383805897
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: State Space Models (SSMs) have recently gained traction in remote sensing change detection (CD) for their favorable scaling properties. In this paper, we explore the potential of modern convolutional and attention-based architectures as a competitive alternative. We propose NeXt2Former-CD, an end-to-end framework that integrates a Siamese ConvNeXt encoder initialized with DINOv3 weights, a deformable attention-based temporal fusion module, and a Mask2Former decoder. This design is intended to better tolerate residual co-registration noise and small object-level spatial shifts, as well as semantic ambiguity in bi-temporal imagery. Experiments on LEVIR-CD, WHU-CD, and CDD datasets show that our method achieves the best results among the evaluated methods, improving over recent Mamba-based baselines in both F1 score and IoU. Furthermore, despite a larger parameter count, our model maintains inference latency comparable to SSM-based approaches, suggesting it is practical for high-resolution change detection tasks.
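The abstract reports F1 and IoU for the change class. As a reference for what those two numbers measure, here is a minimal sketch, assuming flattened binary masks (1 = change, 0 = no change) and counting true positives, false positives and false negatives over the pixels passed in. Benchmarks commonly accumulate these counts over the whole test set before dividing; the exact evaluation protocol is not stated in the abstract, and `change_metrics` is a hypothetical helper name.

```python
from typing import Dict, Sequence


def change_metrics(pred: Sequence[int], target: Sequence[int]) -> Dict[str, float]:
    """Precision, recall, F1 and IoU of the 'change' class (1 = change, 0 = no change)."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    pairs = list(zip(pred, target))
    tp = sum(1 for p, t in pairs if p == 1 and t == 1)  # changed, predicted changed
    fp = sum(1 for p, t in pairs if p == 1 and t == 0)  # unchanged, predicted changed
    fn = sum(1 for p, t in pairs if p == 0 and t == 1)  # changed, but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 = 2TP / (2TP + FP + FN); returns 0.0 by convention if there is no change at all.
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "iou": iou}


pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 1, 1, 0]
print(change_metrics(pred, target))
```

For these toy masks there are 2 true positives, 1 false positive and 1 false negative, so F1 = 4/6 (about 0.667) and IoU = 2/4 = 0.5. Computed from the same counts, IoU = F1 / (2 - F1), so the two metrics move together, with IoU always the lower of the two when 0 < F1 < 1.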
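The abstract does not specify how the deformable attention-based temporal fusion module is built. As a rough illustration of the mechanism it names, here is a minimal single-head, single-scale sketch of deformable-attention sampling: a query at a reference point reads the value map at a few offset locations (via bilinear interpolation) and mixes them with softmax weights. Assumptions, all hypothetical: the query location comes from the first image and the values from the second image's feature map; the offsets and weight logits are fixed numbers here, whereas in practice they would be predicted from the query by learned layers; `bilinear` and `deformable_sample` are illustrative names, not the paper's code.

```python
import math
from typing import List, Sequence, Tuple

Grid = List[List[float]]  # H x W single-channel feature map, indexed [y][x]


def bilinear(feat: Grid, x: float, y: float) -> float:
    """Bilinearly sample `feat` at continuous (x, y); out-of-bounds taps count as 0."""
    h, w = len(feat), len(feat[0])
    x0, y0 = math.floor(x), math.floor(y)
    value = 0.0
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= xi < w and 0 <= yi < h:
                # Tent weights: 1 at the tap itself, falling to 0 one pixel away.
                value += (1 - abs(x - xi)) * (1 - abs(y - yi)) * feat[yi][xi]
    return value


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def deformable_sample(
    value: Grid,
    ref: Tuple[float, float],
    offsets: Sequence[Tuple[float, float]],
    logits: Sequence[float],
) -> float:
    """One query, one head, one scale: sum_k A_k * value(ref + offset_k), A = softmax(logits)."""
    weights = softmax(logits)
    rx, ry = ref
    return sum(
        a * bilinear(value, rx + dx, ry + dy) for a, (dx, dy) in zip(weights, offsets)
    )


# Hypothetical toy case: a feature at x=1 in the first image shows up
# one pixel to the right (x=2) in the second image's feature map.
feat_t2 = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
ref = (1.0, 1.0)                    # query location taken from the first image
offsets = [(0.0, 0.0), (1.0, 0.0)]  # fixed here; learned from the query in practice
logits = [0.0, 4.0]                 # fixed here; learned from the query in practice

print(bilinear(feat_t2, *ref))                           # fixed lookup misses the shifted feature
print(deformable_sample(feat_t2, ref, offsets, logits))  # offset tap recovers most of it
```

The fixed lookup prints 0.0 while the deformable sample prints about 0.98: because the sampling points can move, a one-pixel misregistration does not erase the match, which is the kind of tolerance to co-registration noise and small spatial shifts the abstract says its design is intended to provide. Full implementations add multiple heads, multiple feature scales, and learned value and output projections.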
Related papers
- GRAD-Former: Gated Robust Attention-based Differential Transformer for Change Detection [0.7865560760233441]
Change detection (CD) in remote sensing aims to identify semantic differences between satellite images captured at different times. Traditional transformer-based methods suffer from quadratic computational complexity when applied to very high-resolution (VHR) satellite images. We present GRAD-Former, a novel framework that enhances contextual understanding while maintaining efficiency through reduced model size.
arXiv Detail & Related papers (2026-03-01T15:56:42Z)
- Learning Invariant Visual Representations for Planning with Joint-Embedding Predictive World Models [9.714188952666918]
We improve robustness to slow features while operating in a reduced latent space, up to 10x smaller than that of DINO-WM. Our model is agnostic to the choice of pretrained visual encoder and maintains robustness when paired with DINOv2, SimDINOv2, and iBOT features.
arXiv Detail & Related papers (2026-02-20T22:19:46Z)
- Foundation Model-Driven Semantic Change Detection in Remote Sensing Imagery [12.711361119734542]
We propose PerASCD, a semantic change detection (SCD) method driven by the RS foundation model PerA. We introduce a modular Cascaded Gated Decoder (CG-Decoder) that simplifies complex SCD decoding pipelines. Our decoder achieves state-of-the-art (SOTA) performance on two public benchmark datasets.
arXiv Detail & Related papers (2026-02-14T13:56:31Z)
- Towards Remote Sensing Change Detection with Neural Memory [61.39582645714727]
ChangeTitans is a Titans-based framework for remote sensing change detection. First, we propose VTitans, which integrates neural memory with segmented local attention. Second, we present a hierarchical VTitans-Adapter to refine multi-scale features across different network layers. Third, we introduce TS-CBAM, a two-stream fusion module, to suppress pseudo-changes and enhance detection accuracy.
arXiv Detail & Related papers (2026-02-11T03:50:51Z)
- FoBa: A Foreground-Background co-Guided Method and New Benchmark for Remote Sensing Semantic Change Detection [48.06921153684768]
We present a new benchmark for remote sensing semantic change detection (SCD) called LevirSCD. The dataset covers 16 change categories and 210 specific change types, with more fine-grained class definitions. We propose a foreground-background co-guided SCD (FoBa) method, which leverages foregrounds enriched with contextual information to guide the model. FoBa achieves competitive results compared to current SOTA methods, with improvements of 1.48%, 3.61%, and 2.81% in the SeK metric, respectively.
arXiv Detail & Related papers (2025-09-19T09:19:57Z)
- DC-Mamba: Bi-temporal deformable alignment and scale-sparse enhancement for remote sensing change detection [9.305032436286773]
We introduce DC-Mamba, an "align-then-enhance" framework built upon the ChangeMamba backbone. It integrates two lightweight, plug-and-play modules: (1) Bi-Temporal Deformable Alignment (BTDA), which explicitly introduces geometric awareness to correct spatial misalignments at the semantic feature level; and (2) a Scale-Sparse Change Amplifier (SSCA), which uses multi-source cues to selectively amplify high-confidence change signals while suppressing noise before the final classification.
arXiv Detail & Related papers (2025-09-19T03:49:23Z)
- STNMamba: Mamba-based Spatial-Temporal Normality Learning for Video Anomaly Detection [48.997518615379995]
Video anomaly detection (VAD) has been extensively researched due to its potential for intelligent video systems. Most existing methods based on CNNs and transformers still suffer from substantial computational burdens. We propose a lightweight and effective Mamba-based network named STNMamba to enhance the learning of spatial-temporal normality.
arXiv Detail & Related papers (2024-12-28T08:49:23Z)
- Cross-Domain Few-Shot Object Detection via Enhanced Open-Set Object Detector [72.05791402494727]
This paper studies the challenging problem of cross-domain few-shot object detection (CD-FSOD).
It aims to develop an accurate object detector for novel domains with minimal labeled examples.
arXiv Detail & Related papers (2024-02-05T15:25:32Z)
- Complexity Matters: Rethinking the Latent Space for Generative Modeling [65.64763873078114]
In generative modeling, numerous successful approaches leverage a low-dimensional latent space, e.g., Stable Diffusion.
In this study, we aim to shed light on this under-explored topic by rethinking the latent space from the perspective of model complexity.
arXiv Detail & Related papers (2023-07-17T07:12:29Z)
- Joint Spatial-Temporal and Appearance Modeling with Transformer for Multiple Object Tracking [59.79252390626194]
We propose a novel solution named TransSTAM, which leverages Transformer to model both the appearance features of each object and the spatial-temporal relationships among objects.
The proposed method is evaluated on multiple public benchmarks including MOT16, MOT17, and MOT20, and it achieves a clear performance improvement in both IDF1 and HOTA.
arXiv Detail & Related papers (2022-05-31T01:19:18Z)
- Enhancing Object Detection for Autonomous Driving by Optimizing Anchor Generation and Addressing Class Imbalance [0.0]
This study presents an enhanced 2D object detector based on Faster R-CNN that is better suited for the context of autonomous vehicles.
The proposed modifications over the Faster R-CNN do not increase computational cost and can easily be extended to optimize other anchor-based detection frameworks.
arXiv Detail & Related papers (2021-04-08T16:58:31Z)
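The GRAD-Former summary above (and the SSM motivation in the NeXt2Former-CD abstract) rests on how full self-attention scales with image size. A back-of-the-envelope sketch, assuming ViT-style non-overlapping 16x16 patches (an illustrative choice, not taken from either paper); `token_count` and `attention_scores` are hypothetical helper names:

```python
def token_count(height: int, width: int, patch: int) -> int:
    """Number of non-overlapping patch tokens for an image (ViT-style patching)."""
    return (height // patch) * (width // patch)


def attention_scores(tokens: int) -> int:
    """Entries in one full self-attention score matrix (per head, per layer): N x N."""
    return tokens * tokens


PATCH = 16  # assumed patch size, for illustration only
for side in (256, 512, 1024):
    n = token_count(side, side, PATCH)
    print(f"{side}x{side}: {n} tokens, {attention_scores(n):,} attention scores")
```

Going from 256 to 1024 pixels per side multiplies the token count by 16 but the number of attention scores by 256, whereas a cost linear in the number of tokens, which is what SSM-style scans aim for, grows only 16-fold. Attention that reads a fixed number of locations per query, such as windowed or deformable attention, also stays linear in the token count.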
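The last entry modifies anchor generation in Faster R-CNN, but its summary does not say how. For context, here is a minimal sketch of the standard scheme such work starts from: every feature-map cell receives one anchor per (scale, aspect ratio) pair, where each anchor has area scale squared and height/width equal to the ratio. The scales, ratios, stride, and helper names are illustrative assumptions; this sketch centers anchors on cell centers, while some implementations place them at multiples of the stride instead.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in input-image pixels


def base_anchors(scales: Sequence[float], ratios: Sequence[float]) -> List[Box]:
    """Zero-centered anchors; each has area scale**2 and height / width == ratio."""
    anchors = []
    for r in ratios:
        for s in scales:
            w = s / math.sqrt(r)
            h = s * math.sqrt(r)
            anchors.append((-w / 2, -h / 2, w / 2, h / 2))
    return anchors


def grid_anchors(feat_h: int, feat_w: int, stride: int,
                 scales: Sequence[float], ratios: Sequence[float]) -> List[Box]:
    """Tile the base anchors over every feature-map cell, centered on each cell."""
    base = base_anchors(scales, ratios)
    anchors = []
    for i in range(feat_h):
        for j in range(feat_w):
            cx, cy = (j + 0.5) * stride, (i + 0.5) * stride
            anchors.extend((cx + x1, cy + y1, cx + x2, cy + y2) for x1, y1, x2, y2 in base)
    return anchors


# Illustrative settings only: a 2x3 feature map with stride 16, 2 scales x 3 ratios.
anchors = grid_anchors(2, 3, 16, scales=[32, 64], ratios=[0.5, 1.0, 2.0])
print(len(anchors))  # 2 * 3 cells * 6 anchors per cell = 36
```

Tuning the scale and ratio lists to the object sizes in a target domain is one common way such generators are adapted, though the entry above does not state which changes it makes.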
This list is automatically generated from the titles and abstracts of the papers on this site.