Related papers: Robust Scene Change Detection Using Visual Foundation Models and Cross-Attention Mechanisms

Robust Scene Change Detection Using Visual Foundation Models and Cross-Attention Mechanisms

URL: http://arxiv.org/abs/2409.16850v1
Date: Wed, 25 Sep 2024 11:55:27 GMT
Title: Robust Scene Change Detection Using Visual Foundation Models and Cross-Attention Mechanisms
Authors: Chun-Jung Lin, Sourav Garg, Tat-Jun Chin, Feras Dayoub,
Abstract summary: We present a novel method for scene change detection that leverages the robust feature extraction capabilities of a visual foundational model, DINOv2. We evaluate our approach on two benchmark datasets, VL-CMU-CD and PSCD, along with their viewpoint-varied versions. Our experiments demonstrate significant improvements in F1-score, particularly in scenarios involving geometric changes between image pairs.
Score: 27.882122236282054
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We present a novel method for scene change detection that leverages the robust feature extraction capabilities of a visual foundational model, DINOv2, and integrates full-image cross-attention to address key challenges such as varying lighting, seasonal variations, and viewpoint differences. In order to effectively learn correspondences and mis-correspondences between an image pair for the change detection task, we propose to a) ``freeze'' the backbone in order to retain the generality of dense foundation features, and b) employ ``full-image'' cross-attention to better tackle the viewpoint variations between the image pair. We evaluate our approach on two benchmark datasets, VL-CMU-CD and PSCD, along with their viewpoint-varied versions. Our experiments demonstrate significant improvements in F1-score, particularly in scenarios involving geometric changes between image pairs. The results indicate our method's superior generalization capabilities over existing state-of-the-art approaches, showing robustness against photometric and geometric variations as well as better overall generalization when fine-tuned to adapt to new environments. Detailed ablation studies further validate the contributions of each component in our architecture. Source code will be made publicly available upon acceptance.

Related papers

Leveraging Geometric Priors for Unaligned Scene Change Detection [53.523333385654546]
Unaligned Scene Change Detection aims to detect scene changes between image pairs captured at different times without assuming viewpoint alignment.<n>We introduce geometric priors for the first time to address the core challenges of unaligned SCD.<n>We propose a training-free framework that integrates them with the powerful representations of a visual foundation model.
arXiv Detail & Related papers (2025-09-14T14:31:08Z)
SChanger: Change Detection from a Semantic Change and Spatial Consistency Perspective [0.6749750044497732]
We develop a fine-tuning strategy called the Semantic Change Network (SCN) to address the data scarcity issue. We observe that the locations of changes between the two images are spatially identical, a concept we refer to as spatial consistency. This enhances the modeling of multi-scale changes and helps capture underlying relationships in change detection semantics.
arXiv Detail & Related papers (2025-03-26T17:15:43Z)
Detect Changes like Humans: Incorporating Semantic Priors for Improved Change Detection [52.62459671461816]
This paper explores incorporating semantic priors from visual foundation models to improve the ability to detect changes.<n>Inspired by the human visual paradigm, a novel dual-stream feature decoder is derived to distinguish changes by combining semantic-aware features and difference-aware features.
arXiv Detail & Related papers (2024-12-22T08:27:15Z)
Dual-Image Enhanced CLIP for Zero-Shot Anomaly Detection [58.228940066769596]
We introduce a Dual-Image Enhanced CLIP approach, leveraging a joint vision-language scoring system. Our methods process pairs of images, utilizing each as a visual reference for the other, thereby enriching the inference process with visual context. Our approach significantly exploits the potential of vision-language joint anomaly detection and demonstrates comparable performance with current SOTA methods across various datasets.
arXiv Detail & Related papers (2024-05-08T03:13:20Z)
Fiducial Focus Augmentation for Facial Landmark Detection [4.433764381081446]
We propose a novel image augmentation technique to enhance the model's understanding of facial structures. We employ a Siamese architecture-based training mechanism with a Deep Canonical Correlation Analysis (DCCA)-based loss. Our approach outperforms multiple state-of-the-art approaches across various benchmark datasets.
arXiv Detail & Related papers (2024-02-23T01:34:00Z)
Forgery-aware Adaptive Transformer for Generalizable Synthetic Image Detection [106.39544368711427]
We study the problem of generalizable synthetic image detection, aiming to detect forgery images from diverse generative methods. We present a novel forgery-aware adaptive transformer approach, namely FatFormer. Our approach tuned on 4-class ProGAN data attains an average of 98% accuracy to unseen GANs, and surprisingly generalizes to unseen diffusion models with 95% accuracy.
arXiv Detail & Related papers (2023-12-27T17:36:32Z)
Explicit Correspondence Matching for Generalizable Neural Radiance Fields [49.49773108695526]
We present a new NeRF method that is able to generalize to new unseen scenarios and perform novel view synthesis with as few as two source views. The explicit correspondence matching is quantified with the cosine similarity between image features sampled at the 2D projections of a 3D point on different views. Our method achieves state-of-the-art results on different evaluation settings, with the experiments showing a strong correlation between our learned cosine feature similarity and volume density.
arXiv Detail & Related papers (2023-04-24T17:46:01Z)
Learning Transformations To Reduce the Geometric Shift in Object Detection [60.20931827772482]
We tackle geometric shifts emerging from variations in the image capture process. We introduce a self-training approach that learns a set of geometric transformations to minimize these shifts. We evaluate our method on two different shifts, i.e., a camera's field of view (FoV) change and a viewpoint change.
arXiv Detail & Related papers (2023-01-13T11:55:30Z)
Background-Mixed Augmentation for Weakly Supervised Change Detection [18.319961338185458]
Change detection (CD) is to decouple object changes (i.e., object missing or appearing) from background changes (i.e., environment variations) Recent deep learning-based methods develop novel network architectures or optimization strategies with paired-training examples. We develop a novel weakly supervised training algorithm that only needs image-level labels.
arXiv Detail & Related papers (2022-11-21T14:12:53Z)
dual unet:a novel siamese network for change detection with cascade differential fusion [4.651756476458979]
We propose a novel Siamese neural network for change detection task, namely Dual-UNet. In contrast to previous individually encoded the bitemporal images, we design an encoder differential-attention module to focus on the spatial difference relationships of pixels. Experiments demonstrate that the proposed approach consistently outperforms the most advanced methods on popular seasonal change detection datasets.
arXiv Detail & Related papers (2022-08-12T14:24:09Z)
A Model for Multi-View Residual Covariances based on Perspective Deformation [88.21738020902411]
We derive a model for the covariance of the visual residuals in multi-view SfM, odometry and SLAM setups. We validate our model with synthetic and real data and integrate it into photometric and feature-based Bundle Adjustment.
arXiv Detail & Related papers (2022-02-01T21:21:56Z)
Semantic Change Detection with Asymmetric Siamese Networks [71.28665116793138]
Given two aerial images, semantic change detection aims to locate the land-cover variations and identify their change types with pixel-wise boundaries. This problem is vital in many earth vision related tasks, such as precise urban planning and natural resource management. We present an asymmetric siamese network (ASN) to locate and identify semantic changes through feature pairs obtained from modules of widely different structures.
arXiv Detail & Related papers (2020-10-12T13:26:30Z)
Co-Attention for Conditioned Image Matching [91.43244337264454]
We propose a new approach to determine correspondences between image pairs in the wild under large changes in illumination, viewpoint, context, and material. While other approaches find correspondences between pairs of images by treating the images independently, we instead condition on both images to implicitly take account of the differences between them.
arXiv Detail & Related papers (2020-07-16T17:32:00Z)
DASNet: Dual attentive fully convolutional siamese networks for change detection of high resolution satellite images [17.839181739760676]
The research objective is to identity the change information of interest and filter out the irrelevant change information as interference factors. Recently, the rise of deep learning has provided new tools for change detection, which have yielded impressive results. We propose a new method, namely, dual attentive fully convolutional Siamese networks (DASNet) for change detection in high-resolution images.
arXiv Detail & Related papers (2020-03-07T16:57:10Z)

This list is automatically generated from the titles and abstracts of the papers in this site.