The Change You Want to See
- URL: http://arxiv.org/abs/2209.14341v1
- Date: Wed, 28 Sep 2022 18:10:09 GMT
- Title: The Change You Want to See
- Authors: Ragav Sachdeva, Andrew Zisserman
- Abstract summary: Given two images of the same scene, being able to automatically detect the changes in them has practical applications in a variety of domains.
We tackle the change detection problem with the goal of detecting "object-level" changes in an image pair despite differences in their viewpoint and illumination.
- Score: 91.3755431537592
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We live in a dynamic world where things change all the time. Given two images
of the same scene, being able to automatically detect the changes in them has
practical applications in a variety of domains. In this paper, we tackle the
change detection problem with the goal of detecting "object-level" changes in
an image pair despite differences in their viewpoint and illumination. To this
end, we make the following four contributions: (i) we propose a scalable
methodology for obtaining a large-scale change detection training dataset by
leveraging existing object segmentation benchmarks; (ii) we introduce a
novel co-attention-based architecture that is able to implicitly determine
correspondences between an image pair and find changes in the form of bounding
box predictions; (iii) we contribute four evaluation datasets that cover a
variety of domains and transformations, including synthetic image changes, real
surveillance images of a 3D scene, and synthetic 3D scenes with camera motion;
(iv) we evaluate our model on these four datasets and demonstrate zero-shot and
beyond training transformation generalization.
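The co-attention idea in contribution (ii) — letting each spatial feature of one image attend over all features of the other to establish implicit correspondences — can be sketched in a few lines. This is a minimal, illustrative numpy version under assumed shapes; the function name and the concatenation-based change head are not the paper's actual implementation.

```python
import numpy as np

def co_attend(feat_a, feat_b):
    """Soft cross-image attention: every spatial feature of image A
    attends over all spatial features of image B.

    feat_a: (N, D) flattened features of image A
    feat_b: (M, D) flattened features of image B
    returns: (N, 2*D) A-features concatenated with B-features aligned to A
    """
    scores = feat_a @ feat_b.T / np.sqrt(feat_a.shape[1])  # (N, M) similarities
    scores -= scores.max(axis=1, keepdims=True)            # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)                # softmax over B locations
    aligned_b = attn @ feat_b                              # (N, D) soft correspondences
    # A downstream head would compare feat_a with aligned_b to localize changes.
    return np.concatenate([feat_a, aligned_b], axis=1)

rng = np.random.default_rng(0)
fa, fb = rng.normal(size=(16, 8)), rng.normal(size=(20, 8))
out = co_attend(fa, fb)
print(out.shape)  # (16, 16)
```

Because the attention is computed between the two images rather than within one, the model can align features despite viewpoint differences before deciding what has changed.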
Related papers
- ViewDelta: Text-Prompted Change Detection in Unaligned Images [0.0]
We propose a novel change detection method that is the first to utilize unaligned images and textual prompts to output a binary segmentation of changes relevant to user-provided text.
Our architecture not only enables flexible detection across diverse change detection use cases, but also yields state-of-the-art performance on established benchmarks.
arXiv Detail & Related papers (2024-12-10T15:51:17Z)
- Multi-View Pose-Agnostic Change Localization with Zero Labels [4.997375878454274]
We propose a label-free, pose-agnostic change detection method that integrates information from multiple viewpoints.
With as few as 5 images of the post-change scene, our approach can learn additional change channels in a 3D Gaussian Splatting (3DGS) representation.
Our change-aware 3D scene representation additionally enables the generation of accurate change masks for unseen viewpoints.
arXiv Detail & Related papers (2024-12-05T06:28:54Z)
- Robust Scene Change Detection Using Visual Foundation Models and Cross-Attention Mechanisms [27.882122236282054]
We present a novel method for scene change detection that leverages the robust feature extraction capabilities of a visual foundation model, DINOv2.
We evaluate our approach on two benchmark datasets, VL-CMU-CD and PSCD, along with their viewpoint-varied versions.
Our experiments demonstrate significant improvements in F1-score, particularly in scenarios involving geometric changes between image pairs.
arXiv Detail & Related papers (2024-09-25T11:55:27Z)
- Zero-Shot Scene Change Detection [14.095215136905553]
Our method takes advantage of the change detection effect of the tracking model by inputting reference and query images instead of consecutive frames.
We extend our approach to video, leveraging rich temporal information to enhance the performance of scene change detection.
arXiv Detail & Related papers (2024-06-17T05:03:44Z)
- Cross-domain and Cross-dimension Learning for Image-to-Graph Transformers [48.74331852418905]
Direct image-to-graph transformation is a challenging task that involves solving object detection and relationship prediction in a single model.
Due to this task's complexity, large training datasets are rare in many domains, making the training of deep-learning methods challenging.
We introduce a set of methods enabling cross-domain and cross-dimension learning for image-to-graph transformers.
arXiv Detail & Related papers (2024-03-11T10:48:56Z)
- ObjectCompose: Evaluating Resilience of Vision-Based Models on Object-to-Background Compositional Changes [64.57705752579207]
We evaluate the resilience of vision-based models against diverse object-to-background context variations.
We harness the generative capabilities of text-to-image, image-to-text, and image-to-segment models to automatically generate object-to-background changes.
arXiv Detail & Related papers (2024-03-07T17:48:48Z)
- Cohere3D: Exploiting Temporal Coherence for Unsupervised Representation Learning of Vision-based Autonomous Driving [73.3702076688159]
We propose a novel contrastive learning algorithm, Cohere3D, to learn coherent instance representations in a long-term input sequence.
We evaluate our algorithm by finetuning the pretrained model on various downstream perception, prediction, and planning tasks.
arXiv Detail & Related papers (2024-02-23T19:43:01Z)
- The Change You Want to See (Now in 3D) [65.61789642291636]
The goal of this paper is to detect what has changed, if anything, between two "in the wild" images of the same 3D scene.
We contribute a change detection model that is trained entirely on synthetic data and is class-agnostic.
We release a new evaluation dataset consisting of real-world image pairs with human-annotated differences.
arXiv Detail & Related papers (2023-08-21T01:59:45Z)
- Learning Transformations To Reduce the Geometric Shift in Object Detection [60.20931827772482]
We tackle geometric shifts emerging from variations in the image capture process.
We introduce a self-training approach that learns a set of geometric transformations to minimize these shifts.
We evaluate our method on two different shifts, i.e., a camera's field of view (FoV) change and a viewpoint change.
arXiv Detail & Related papers (2023-01-13T11:55:30Z)
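The geometric shifts discussed in the last entry above (field-of-view and viewpoint changes) are commonly modeled as planar homographies. As a generic illustration only — not the paper's learned transformation set — the sketch below warps 2D points with a 3x3 homography:

```python
import numpy as np

def apply_homography(points, H):
    """Warp 2D points with a 3x3 homography, one standard way to model
    viewpoint and FoV shifts between two views of a planar scene."""
    pts = np.hstack([points, np.ones((len(points), 1))])  # to homogeneous coords
    warped = pts @ H.T                                    # apply projective map
    return warped[:, :2] / warped[:, 2:3]                 # back to Euclidean

corners = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
zoom = np.diag([2.0, 2.0, 1.0])  # a pure scaling roughly mimics an FoV change
print(apply_homography(corners, zoom))  # each corner scaled by 2
```

Self-training approaches of this kind would estimate such a transformation from the data and apply it before (or inside) the detector to cancel the shift.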
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.