ViewDelta: Text-Prompted Change Detection in Unaligned Images
- URL: http://arxiv.org/abs/2412.07612v1
- Date: Tue, 10 Dec 2024 15:51:17 GMT
- Title: ViewDelta: Text-Prompted Change Detection in Unaligned Images
- Authors: Subin Varghese, Joshua Gao, Vedhus Hoskere
- Abstract: Detecting changes between images is a fundamental problem in computer vision with broad applications in situational awareness, infrastructure assessment, environmental monitoring, and industrial automation. Existing supervised models are typically limited to detecting specific types of changes, necessitating retraining for new tasks. To address these limitations with a single approach, we propose a novel change detection method that is the first to utilize unaligned images and textual prompts to output a binary segmentation of changes relevant to user-provided text. Our architecture not only enables flexible detection across diverse change detection use cases, but also yields state-of-the-art performance on established benchmarks. Additionally, we release an accompanying dataset comprising 100,311 pairs of images with text prompts and the corresponding change detection labels. We demonstrate the effectiveness of our method both quantitatively and qualitatively on datasets with a wide variety of viewpoints in indoor, outdoor, street-level, synthetic, and satellite images.
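The abstract describes an interface that takes two (possibly unaligned) images plus a text prompt and returns a binary change mask. The actual ViewDelta model and API are not described in this summary; the sketch below is purely hypothetical, illustrating only the expected inputs and outputs, with a naive per-pixel difference as a stand-in that ignores the prompt and assumes same-sized images.

```python
import numpy as np

def text_prompted_change_mask(image_a, image_b, prompt, model=None):
    """Hypothetical sketch of a text-prompted change detection call.

    image_a, image_b: H x W x 3 uint8 arrays. The paper's method does not
    require alignment; this toy fallback does assume equal shapes.
    prompt: free-form text describing which changes are of interest.
    Returns an H x W boolean mask of prompt-relevant changes.
    """
    if model is not None:
        # A real model would fuse both images together with the text prompt.
        return model(image_a, image_b, prompt)
    # Naive placeholder: per-pixel intensity difference; the prompt is ignored.
    diff = np.abs(image_a.astype(np.int16) - image_b.astype(np.int16)).sum(axis=-1)
    return diff > 30

# Toy usage with synthetic images.
a = np.zeros((8, 8, 3), dtype=np.uint8)
b = a.copy()
b[2:4, 5:7] = 255  # simulate a newly appeared object
mask = text_prompted_change_mask(a, b, "a new object appeared")
```

The binary-mask output matches the task framing in the abstract: a segmentation of only those changes relevant to the user's text, rather than all pixel differences.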
Related papers
- Show Me What and Where has Changed? Question Answering and Grounding for Remote Sensing Change Detection [82.65760006883248]
We introduce a new task named Change Detection Question Answering and Grounding (CDQAG)
CDQAG extends the traditional change detection task by providing interpretable textual answers and intuitive visual evidence.
We construct the first CDQAG benchmark dataset, termed QAG-360K, comprising over 360K triplets of questions, textual answers, and corresponding high-quality visual masks.
arXiv Detail & Related papers (2024-10-31T11:20:13Z) - ZeroSCD: Zero-Shot Street Scene Change Detection [2.3020018305241337]
Scene Change Detection is a challenging task in computer vision and robotics.
Traditional change detection methods rely on training models that take these image pairs as input and estimate the changes.
We propose ZeroSCD, a zero-shot scene change detection framework that eliminates the need for training.
arXiv Detail & Related papers (2024-09-23T17:53:44Z) - Zero-Shot Scene Change Detection [14.095215136905553]
Our method takes advantage of the change detection effect of the tracking model by inputting reference and query images instead of consecutive frames.
We extend our approach to video, leveraging rich temporal information to enhance the performance of scene change detection.
arXiv Detail & Related papers (2024-06-17T05:03:44Z) - Dual-Image Enhanced CLIP for Zero-Shot Anomaly Detection [58.228940066769596]
We introduce a Dual-Image Enhanced CLIP approach, leveraging a joint vision-language scoring system.
Our methods process pairs of images, utilizing each as a visual reference for the other, thereby enriching the inference process with visual context.
Our approach significantly exploits the potential of vision-language joint anomaly detection and demonstrates comparable performance with current SOTA methods across various datasets.
arXiv Detail & Related papers (2024-05-08T03:13:20Z) - Self-Pair: Synthesizing Changes from Single Source for Object Change Detection in Remote Sensing Imagery [6.586756080460231]
We train a change detector using two spatially unrelated images with corresponding semantic labels, such as buildings.
We show that manipulating the source image as an after-image is crucial to the performance of change detection.
Our method outperforms existing methods based on single-temporal supervision.
arXiv Detail & Related papers (2022-12-20T13:26:42Z) - The Change You Want to See [91.3755431537592]
Given two images of the same scene, being able to automatically detect the changes in them has practical applications in a variety of domains.
We tackle the change detection problem with the goal of detecting "object-level" changes in an image pair despite differences in their viewpoint and illumination.
arXiv Detail & Related papers (2022-09-28T18:10:09Z) - Simple Open-Vocabulary Object Detection with Vision Transformers [51.57562920090721]
We propose a strong recipe for transferring image-text models to open-vocabulary object detection.
We use a standard Vision Transformer architecture with minimal modifications, contrastive image-text pre-training, and end-to-end detection fine-tuning.
We provide the adaptation strategies and regularizations needed to attain very strong performance on zero-shot text-conditioned and one-shot image-conditioned object detection.
arXiv Detail & Related papers (2022-05-12T17:20:36Z) - ObjectFormer for Image Manipulation Detection and Localization [118.89882740099137]
We propose ObjectFormer to detect and localize image manipulations.
We extract high-frequency features of the images and combine them with RGB features as multimodal patch embeddings.
We conduct extensive experiments on various datasets and the results verify the effectiveness of the proposed method.
arXiv Detail & Related papers (2022-03-28T12:27:34Z) - Frugal Learning of Virtual Exemplars for Label-Efficient Satellite Image Change Detection [12.18340575383456]
In this paper, we devise a novel interactive satellite image change detection algorithm based on active learning.
The proposed framework is iterative and relies on a question and answer model which asks the oracle (user) questions about the most informative display.
The contribution of our framework resides in a novel display model which selects the most representative and diverse virtual exemplars.
arXiv Detail & Related papers (2022-03-22T09:29:42Z) - Region-level Active Learning for Cluttered Scenes [60.93811392293329]
We introduce a new strategy that subsumes previous Image-level and Object-level approaches into a generalized, Region-level approach.
We show that this approach significantly decreases labeling effort and improves rare object search on realistic data with inherent class-imbalance and cluttered scenes.
arXiv Detail & Related papers (2021-08-20T14:02:38Z) - Unsupervised Change Detection in Satellite Images with Generative Adversarial Network [20.81970476609318]
We propose a novel change detection framework that uses a Generative Adversarial Network (GAN) to generate better coregistered images.
The optimized GAN model produces better coregistered images in which changes can be easily spotted, and the change map is then obtained through a comparison strategy.
arXiv Detail & Related papers (2020-09-08T10:26:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.