Morphing Through Time: Diffusion-Based Bridging of Temporal Gaps for Robust Alignment in Change Detection
- URL: http://arxiv.org/abs/2511.07976v1
- Date: Wed, 12 Nov 2025 01:32:01 GMT
- Title: Morphing Through Time: Diffusion-Based Bridging of Temporal Gaps for Robust Alignment in Change Detection
- Authors: Seyedehanita Madani, Vishal M. Patel,
- Abstract summary: We introduce a modular pipeline that improves spatial and temporal robustness without altering existing change detection networks.<n>A diffusion module synthesizes intermediate morphing frames that bridge large appearance gaps, enabling RoMa to estimate stepwise correspondences.<n>Experiments on LEVIR-CD, WHU-CD, and DSIFN-CD show consistent gains in both registration accuracy and downstream change detection.
- Score: 51.56484100374058
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Remote sensing change detection is often challenged by spatial misalignment between bi-temporal images, especially when acquisitions are separated by long seasonal or multi-year gaps. While modern convolutional and transformer-based models perform well on aligned data, their reliance on precise co-registration limits their robustness in real-world conditions. Existing joint registration-detection frameworks typically require retraining and transfer poorly across domains. We introduce a modular pipeline that improves spatial and temporal robustness without altering existing change detection networks. The framework integrates diffusion-based semantic morphing, dense registration, and residual flow refinement. A diffusion module synthesizes intermediate morphing frames that bridge large appearance gaps, enabling RoMa to estimate stepwise correspondences between consecutive frames. The composed flow is then refined through a lightweight U-Net to produce a high-fidelity warp that co-registers the original image pair. Extensive experiments on LEVIR-CD, WHU-CD, and DSIFN-CD show consistent gains in both registration accuracy and downstream change detection across multiple backbones, demonstrating the generality and effectiveness of the proposed approach.
Related papers
- Time2General: Learning Spatiotemporal Invariant Representations for Domain-Generalization Video Semantic Segmentation [9.929390581043334]
Domain Generalized Video Semantic (DGVSS) is trained on a single labeled driving domain.<n>Time2General achieves a substantial improvement in cross-domain accuracy and temporal stability over prior DGVSS and VSS baselines.
arXiv Detail & Related papers (2026-02-10T10:55:25Z) - Error-Propagation-Free Learned Video Compression With Dual-Domain Progressive Temporal Alignment [92.57576987521107]
We propose a novel unifiedtransform framework with dual-domain progressive temporal alignment and quality-conditioned mixture-of-expert (QCMoE)<n>QCMoE allows continuous and consistent rate control with appealing R-D performance.<n> Experimental results show that the proposed method achieves competitive R-D performance compared with the state-of-the-arts.
arXiv Detail & Related papers (2025-12-11T09:14:51Z) - UniDiff: A Unified Diffusion Framework for Multimodal Time Series Forecasting [90.47915032778366]
We propose UniDiff, a unified diffusion framework for multimodal time series forecasting.<n>At its core lies a unified and parallel fusion module, where a single cross-attention mechanism integrates structural information from timestamps and semantic context from texts.<n>Experiments on real-world benchmark datasets across eight domains demonstrate that the proposed UniDiff model achieves state-of-the-art performance.
arXiv Detail & Related papers (2025-12-08T05:36:14Z) - DiffRegCD: Integrated Registration and Change Detection with Diffusion Features [74.3102451211493]
We present DiffRegCD, an integrated framework that unifies dense registration and change detection in a single model.<n>Experiments on aerial (LEVIR-CD, DSIFN-CD, WHU-CD, SYSU-CD) and ground level (VL-CMU-CD) datasets show that DiffRegCD consistently surpasses recent baselines.
arXiv Detail & Related papers (2025-11-11T07:32:19Z) - Efficient Large-Deformation Medical Image Registration via Recurrent Dynamic Correlation [3.0110453390062446]
Convolutional networks aggregate local features but lack direct modeling of voxel correspondences.<n>We propose a Recurrent Correlation-based matching framework, supporting efficient matching toward large deformations.<n>We conduct experiments on brain MRI and abdominal CT datasets under two settings: with and without affine pre-registration.
arXiv Detail & Related papers (2025-10-25T17:49:29Z) - Wavelet-Guided Dual-Frequency Encoding for Remote Sensing Change Detection [67.84730634802204]
Change detection in remote sensing imagery plays a vital role in various engineering applications, such as natural disaster monitoring, urban expansion tracking, and infrastructure management.<n>Most existing methods still rely on spatial-domain modeling, where the limited diversity of feature representations hinders the detection of subtle change regions.<n>We observe that frequency-domain feature modeling particularly in the wavelet domain amplify fine-grained differences in frequency components, enhancing the perception of edge changes that are challenging to capture in the spatial domain.
arXiv Detail & Related papers (2025-08-07T11:14:16Z) - OptiCorNet: Optimizing Sequence-Based Context Correlation for Visual Place Recognition [2.3093110834423616]
This paper presents OptiCorNet, a novel sequence modeling framework.<n>It unifies spatial feature extraction and temporal differencing into a differentiable, end-to-end trainable module.<n>Our approach outperforms state-of-the-art baselines under challenging seasonal and viewpoint variations.
arXiv Detail & Related papers (2025-07-19T04:29:43Z) - Multimodal LLM-Guided Semantic Correction in Text-to-Image Diffusion [52.315729095824906]
MLLM Semantic-Corrected Ping-Pong-Ahead Diffusion (PPAD) is a novel framework that introduces a Multimodal Large Language Model (MLLM) as a semantic observer during inference.<n>It performs real-time analysis on intermediate generations, identifies latent semantic inconsistencies, and translates feedback into controllable signals that actively guide the remaining denoising steps.<n>Extensive experiments demonstrate PPAD's significant improvements.
arXiv Detail & Related papers (2025-05-26T14:42:35Z) - Fast Diffeomorphic Image Registration using Patch based Fully Convolutional Networks [5.479932919974457]
This paper proposes a novel unsupervised learning-based fully convolutional network (FCN) framework for fast diffeomorphic image registration.
Experiments are conducted on three distinct T1-weighted magnetic resonance imaging (T1w MRI) datasets.
arXiv Detail & Related papers (2024-04-05T17:46:38Z) - Improving Misaligned Multi-modality Image Fusion with One-stage
Progressive Dense Registration [67.23451452670282]
Misalignments between multi-modality images pose challenges in image fusion.
We propose a Cross-modality Multi-scale Progressive Dense Registration scheme.
This scheme accomplishes the coarse-to-fine registration exclusively using a one-stage optimization.
arXiv Detail & Related papers (2023-08-22T03:46:24Z) - Breaking Modality Disparity: Harmonized Representation for Infrared and
Visible Image Registration [66.33746403815283]
We propose a scene-adaptive infrared and visible image registration.
We employ homography to simulate the deformation between different planes.
We propose the first ground truth available misaligned infrared and visible image dataset.
arXiv Detail & Related papers (2023-04-12T06:49:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.