SceneDiff: A Benchmark and Method for Multiview Object Change Detection
- URL: http://arxiv.org/abs/2512.16908v1
- Date: Thu, 18 Dec 2025 18:59:02 GMT
- Title: SceneDiff: A Benchmark and Method for Multiview Object Change Detection
- Authors: Yuqun Wu, Chih-hao Lin, Henry Che, Aditi Tiwari, Chuhang Zou, Shenlong Wang, Derek Hoiem
- Abstract summary: We introduce the SceneDiff Benchmark, the first multiview change detection benchmark with object instance annotations. We also introduce SceneDiff, a new training-free approach for multiview object change detection. Our method aligns the captures in 3D, extracts object regions, and compares spatial and semantic region features to detect changes.
- Score: 24.67954935241515
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate the problem of identifying objects that have been added, removed, or moved between a pair of captures (images or videos) of the same scene at different times. Detecting such changes is important for many applications, such as robotic tidying or construction progress and safety monitoring. A major challenge is that varying viewpoints can cause objects to falsely appear changed. We introduce SceneDiff Benchmark, the first multiview change detection benchmark with object instance annotations, comprising 350 diverse video pairs with thousands of changed objects. We also introduce the SceneDiff method, a new training-free approach for multiview object change detection that leverages pretrained 3D, segmentation, and image encoding models to robustly predict across multiple benchmarks. Our method aligns the captures in 3D, extracts object regions, and compares spatial and semantic region features to detect changes. Experiments on multi-view and two-view benchmarks demonstrate that our method outperforms existing approaches by large margins (94% and 37.4% relative AP improvements). The benchmark and code will be publicly released.
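The pipeline described in the abstract (align the two captures in 3D, extract object regions, then compare spatial and semantic region features) can be sketched at a high level as follows. This is a minimal illustration of the region-comparison step only, not the authors' released code: the region representation, the thresholds, and all function names here are hypothetical, and the real method additionally relies on pretrained 3D, segmentation, and image-encoding models.

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb + 1e-8)

def dist(p, q):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

def detect_changes(regions_a, regions_b, dist_thresh=0.2, sim_thresh=0.8):
    """Compare object regions from two captures already aligned in 3D.

    Each region is a (centroid, feature) pair: a 3D centroid in the shared
    coordinate frame and a semantic embedding (e.g. from an image encoder).
    Returns indices of added regions (in B), removed regions (in A), and
    moved pairs (i in A, j in B). Matching is greedy for simplicity.
    """
    removed, moved, matched_b = [], [], set()
    for i, (ca, fa) in enumerate(regions_a):
        best_j, best_sim = -1, -1.0
        for j, (_, fb) in enumerate(regions_b):
            s = cosine(fa, fb)
            if s > best_sim:
                best_j, best_sim = j, s
        if best_j >= 0 and best_sim >= sim_thresh and best_j not in matched_b:
            matched_b.add(best_j)
            if dist(ca, regions_b[best_j][0]) > dist_thresh:
                moved.append((i, best_j))  # same object, new 3D location
        else:
            removed.append(i)              # no semantic counterpart in B
    added = [j for j in range(len(regions_b)) if j not in matched_b]
    return added, removed, moved

# Synthetic example with one-hot features: one static object, one moved
# object, one removed object in capture A, and one new object in capture B.
e = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
regions_a = [((0.0, 0, 0), e[0]),   # stays in place
             ((1.0, 0, 0), e[1]),   # will move
             ((2.0, 0, 0), e[2])]   # will be removed
regions_b = [((0.0, 0, 0), e[0]),
             ((1.0, 0, 1), e[1]),   # moved by 1 unit
             ((3.0, 3, 3), e[3])]   # newly added object
print(detect_changes(regions_a, regions_b))  # ([2], [2], [(1, 1)])
```

The key point the sketch captures is the abstract's claim about viewpoint robustness: because regions are compared by 3D position and semantic feature rather than by pixel location, an object that merely looks different from a new viewpoint does not register as a change.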
Related papers
- ChangingGrounding: 3D Visual Grounding in Changing Scenes [92.00984845186679]
Real-world robots localize objects from natural-language instructions while the scenes around them keep changing. Most existing 3D visual grounding (3DVG) methods still assume a reconstructed and up-to-date point cloud. We introduce ChangingGrounding, the first benchmark that explicitly measures how well an agent can exploit past observations.
arXiv Detail & Related papers (2025-10-16T17:59:16Z) - Multi-View Pose-Agnostic Change Localization with Zero Labels [4.997375878454274]
We propose a label-free, pose-agnostic change detection method that integrates information from multiple viewpoints. With as few as 5 images of the post-change scene, our approach can learn an additional change channel in a 3DGS. Our change-aware 3D scene representation additionally enables the generation of accurate change masks for unseen viewpoints.
arXiv Detail & Related papers (2024-12-05T06:28:54Z) - 3DGS-CD: 3D Gaussian Splatting-based Change Detection for Physical Object Rearrangement [2.2122801766964795]
We present 3DGS-CD, the first 3D Gaussian Splatting (3DGS)-based method for detecting physical object rearrangements in 3D scenes. Our approach estimates 3D object-level changes by comparing two sets of unaligned images taken at different times. Our method can accurately identify changes in cluttered environments using sparse post-change images (as few as one) within as little as 18 s.
arXiv Detail & Related papers (2024-11-06T07:08:41Z) - 3D-Aware Instance Segmentation and Tracking in Egocentric Videos [107.10661490652822]
Egocentric videos present unique challenges for 3D scene understanding.
This paper introduces a novel approach to instance segmentation and tracking in first-person video.
By incorporating spatial and temporal cues, we achieve superior performance compared to state-of-the-art 2D approaches.
arXiv Detail & Related papers (2024-08-19T10:08:25Z) - Zero-Shot Scene Change Detection [14.095215136905553]
Our method takes advantage of the change detection effect of the tracking model by inputting reference and query images instead of consecutive frames. We extend our approach to video, leveraging rich temporal information to enhance the performance of scene change detection.
arXiv Detail & Related papers (2024-06-17T05:03:44Z) - Towards Generalizable Multi-Camera 3D Object Detection via Perspective Debiasing [28.874014617259935]
Multi-Camera 3D Object Detection (MC3D-Det) has gained prominence with the advent of bird's-eye view (BEV) approaches.
We propose a novel method that aligns 3D detection with 2D camera plane results, ensuring consistent and accurate detections.
arXiv Detail & Related papers (2023-10-17T15:31:28Z) - Tracking Passengers and Baggage Items using Multiple Overhead Cameras at Security Checkpoints [2.021502591596062]
We introduce a novel framework to track multiple objects in overhead camera videos for airport checkpoint security scenarios.
We propose a Self-Supervised Learning (SSL) technique to provide the model information about instance segmentation uncertainty from overhead images.
Our results show that self-supervision improves object detection accuracy by up to 42% without increasing the inference time of the model.
arXiv Detail & Related papers (2022-12-31T12:57:09Z) - The Change You Want to See [91.3755431537592]
Given two images of the same scene, being able to automatically detect the changes in them has practical applications in a variety of domains.
We tackle the change detection problem with the goal of detecting "object-level" changes in an image pair despite differences in their viewpoint and illumination.
arXiv Detail & Related papers (2022-09-28T18:10:09Z) - A Simple Baseline for Multi-Camera 3D Object Detection [94.63944826540491]
3D object detection with surrounding cameras has been a promising direction for autonomous driving.
We present SimMOD, a Simple baseline for Multi-camera Object Detection.
We conduct extensive experiments on the 3D object detection benchmark of nuScenes to demonstrate the effectiveness of SimMOD.
arXiv Detail & Related papers (2022-08-22T03:38:01Z) - Tag-Based Attention Guided Bottom-Up Approach for Video Instance Segmentation [83.13610762450703]
Video instance segmentation is a fundamental computer vision task that deals with segmenting and tracking object instances across a video sequence.
We introduce a simple, end-to-end trainable bottom-up approach that produces instance mask predictions at pixel-level granularity, instead of the typical region-proposal-based approach.
Our method provides competitive results on the YouTube-VIS and DAVIS-19 datasets and has minimal run-time compared to other contemporary state-of-the-art methods.
arXiv Detail & Related papers (2022-04-22T15:32:46Z) - Objects are Different: Flexible Monocular 3D Object Detection [87.82253067302561]
We propose a flexible framework for monocular 3D object detection which explicitly decouples the truncated objects and adaptively combines multiple approaches for object depth estimation.
Experiments demonstrate that our method outperforms the state-of-the-art method by a relative 27% for the moderate level and 30% for the hard level on the test set of the KITTI benchmark.
arXiv Detail & Related papers (2021-04-06T07:01:28Z)
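The "relative" improvements quoted throughout these summaries (27% here, 94% and 37.4% AP in the SceneDiff abstract) are gains measured against the baseline's score rather than absolute point differences. A minimal sketch of that convention, with hypothetical AP numbers chosen purely for illustration:

```python
def relative_improvement(new_score, baseline_score):
    """Relative gain over a baseline, in percent."""
    return 100.0 * (new_score - baseline_score) / baseline_score

# A hypothetical AP rising from 40.0 to 50.8 is a 27% relative gain,
# even though the absolute gain is only 10.8 AP points.
print(round(relative_improvement(50.8, 40.0), 1))  # 27.0
```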
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.