ChangeDiff: A Multi-Temporal Change Detection Data Generator with Flexible Text Prompts via Diffusion Model
- URL: http://arxiv.org/abs/2412.15541v1
- Date: Fri, 20 Dec 2024 03:58:28 GMT
- Title: ChangeDiff: A Multi-Temporal Change Detection Data Generator with Flexible Text Prompts via Diffusion Model
- Authors: Qi Zang, Jiayi Yang, Shuang Wang, Dong Zhao, Wenjun Yi, Zhun Zhong
- Abstract summary: This paper focuses on the semantic CD (SCD) task and develops a multi-temporal SCD data generator ChangeDiff.
ChangeDiff generates change data in two steps: first, it uses text prompts and a text-to-layout (T2L) model to create continuous layouts, and then it employs a layout-to-image (L2I) model to convert these layouts into images.
Our generated data shows significant progress in temporal continuity, spatial diversity, and quality realism, empowering change detectors with accuracy and transferability.
- Score: 21.50463332137926
- License:
- Abstract: Data-driven deep learning models have enabled tremendous progress in change detection (CD) with the support of pixel-level annotations. However, collecting diverse data and manually annotating them is costly, laborious, and knowledge-intensive. Existing generative methods for CD data synthesis show competitive potential in addressing this issue but still face the following limitations: 1) difficulty in flexibly controlling change events, 2) dependence on additional data to train the data generators, 3) focus on specific change detection tasks. To this end, this paper focuses on the semantic CD (SCD) task and develops a multi-temporal SCD data generator, ChangeDiff, by exploring powerful diffusion models. ChangeDiff generates change data in two steps: first, it uses text prompts and a text-to-layout (T2L) model to create continuous layouts, and then it employs a layout-to-image (L2I) model to convert these layouts into images. Specifically, we propose multi-class distribution-guided text prompts (MCDG-TP), allowing layouts to be generated flexibly through controllable classes and their corresponding ratios. Subsequently, to generalize the T2L model to the proposed MCDG-TP, a class distribution refinement loss is further designed as training supervision. In addition, MCDG-TP in three modes is proposed to synthesize new layout masks from various texts. Our generated data shows significant progress in temporal continuity, spatial diversity, and quality realism, empowering change detectors with accuracy and transferability. The code is available at https://github.com/DZhaoXd/ChangeDiff
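For concreteness, the prompt construction and the class distribution refinement loss can be illustrated with a minimal sketch. The prompt wording, the function names (`build_mcdg_prompt`, `class_distribution_refinement_loss`), and the tensor shapes below are assumptions made for illustration, not the authors' released implementation (see the repository linked above for that).

```python
import torch
import torch.nn.functional as F

def build_mcdg_prompt(class_ratios: dict) -> str:
    """Compose a multi-class distribution-guided text prompt (MCDG-TP)
    from class names and target area ratios (hypothetical wording)."""
    parts = [f"{round(ratio * 100)}% {name}" for name, ratio in class_ratios.items()]
    return "a semantic layout with " + ", ".join(parts)

def class_distribution_refinement_loss(layout_logits, target_ratios):
    """Penalize the gap between the class-area distribution of a generated
    layout (logits of shape B x C x H x W) and the prompted ratios (B x C)."""
    probs = layout_logits.softmax(dim=1)     # per-pixel class probabilities
    pred_ratios = probs.mean(dim=(2, 3))     # soft area fraction per class
    return F.l1_loss(pred_ratios, target_ratios)

# Step 1 (T2L): prompt a layout that is 60% forest, 30% water, 10% building,
# and supervise the layout generator with the refinement loss.
prompt = build_mcdg_prompt({"forest": 0.6, "water": 0.3, "building": 0.1})
layout_logits = torch.randn(1, 3, 64, 64, requires_grad=True)  # stand-in for T2L output
target_ratios = torch.tensor([[0.6, 0.3, 0.1]])
loss = class_distribution_refinement_loss(layout_logits, target_ratios)
loss.backward()

# Step 2 (L2I) would then condition an image diffusion model on the discrete
# layout, e.g. layout_logits.argmax(dim=1), to render the corresponding image.
```

An L1 penalty on soft class-area fractions is only one plausible way to supervise the prompted ratios; the paper's exact loss may differ.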
Related papers
- Not All Prompts Are Made Equal: Prompt-based Pruning of Text-to-Image Diffusion Models [59.16287352266203]
We introduce Adaptive Prompt-Tailored Pruning (APTP), a novel prompt-based pruning method for text-to-image (T2I) models.
APTP learns to determine the required capacity for an input text prompt and routes it to an architecture code, given a total desired compute budget for prompts.
APTP outperforms the single-model pruning baselines in terms of FID, CLIP, and CMMD scores.
arXiv Detail & Related papers (2024-06-17T19:22:04Z)
- Adversarial Supervision Makes Layout-to-Image Diffusion Models Thrive [21.49096276631859]
Current L2I models suffer from either poor editability via text or weak alignment between the generated image and the input layout.
We propose to integrate adversarial supervision into the conventional training pipeline of L2I diffusion models (ALDM).
Specifically, we employ a segmentation-based discriminator which provides explicit feedback to the diffusion generator on the pixel-level alignment between the denoised image and the input layout (a toy sketch of this idea appears after the related-papers list).
arXiv Detail & Related papers (2024-01-16T20:31:46Z)
- DreamDistribution: Prompt Distribution Learning for Text-to-Image Diffusion Models [53.17454737232668]
We introduce a solution that allows a pretrained T2I diffusion model to learn a set of soft prompts.
These prompts offer text-guided editing capabilities and additional flexibility in controlling variation and mixing between multiple distributions.
We also show the adaptability of the learned prompt distribution to other tasks, such as text-to-3D.
arXiv Detail & Related papers (2023-12-21T12:11:00Z)
- Contrastive Transformer Learning with Proximity Data Generation for Text-Based Person Search [60.626459715780605]
Given a descriptive text query, text-based person search aims to retrieve the best-matched target person from an image gallery.
Such a cross-modal retrieval task is quite challenging due to the significant modality gap, fine-grained differences, and the insufficiency of annotated data.
In this paper, we propose a simple yet effective dual Transformer model for text-based person search.
arXiv Detail & Related papers (2023-11-15T16:26:49Z)
- T-SaS: Toward Shift-aware Dynamic Adaptation for Streaming Data [9.829993835712422]
This paper aims to solve the problem of sequential data modeling in the presence of sudden distribution shifts.
Specifically, we design a Bayesian framework, dubbed T-SaS, with a discrete distribution-modeling variable to capture abrupt shifts in the data.
The proposed method learns specific model parameters for each distribution by learning which neurons should be activated in the full network.
arXiv Detail & Related papers (2023-09-05T22:55:10Z)
- DiffDis: Empowering Generative Diffusion Model with Cross-Modal Discrimination Capability [75.9781362556431]
We propose DiffDis to unify the cross-modal generative and discriminative pretraining into one single framework under the diffusion process.
We show that DiffDis outperforms single-task models on both the image generation and the image-text discriminative tasks.
arXiv Detail & Related papers (2023-08-18T05:03:48Z)
- Diffusion Model is an Effective Planner and Data Synthesizer for Multi-Task Reinforcement Learning [101.66860222415512]
Multi-Task Diffusion Model (MTDiff) is a diffusion-based method that incorporates Transformer backbones and prompt learning for generative planning and data synthesis.
For generative planning, we find MTDiff outperforms state-of-the-art algorithms across 50 tasks on Meta-World and 8 maps on Maze2D.
arXiv Detail & Related papers (2023-05-29T05:20:38Z)
- Learning from Multiple Sources for Data-to-Text and Text-to-Data [16.080265665849527]
Data-to-text (D2T) and text-to-data (T2D) are dual tasks that convert structured data, such as graphs or tables, into fluent text, and vice versa.
Current systems leverage pre-trained language models fine-tuned on D2T or T2D tasks.
This approach has two main limitations: first, a separate system has to be tuned for each task and source; second, learning is limited by the scarcity of available corpora.
We introduce a variational auto-encoder model with disentangled style and content variables that allows us to represent the diversity that
arXiv Detail & Related papers (2023-02-22T10:39:33Z)
- Weakly Supervised Change Detection Using Guided Anisotropic Diffusion [97.43170678509478]
We propose original ideas that help us to leverage such datasets in the context of change detection.
First, we propose the guided anisotropic diffusion (GAD) algorithm, which improves semantic segmentation results.
We then show its potential in two weakly-supervised learning strategies tailored for change detection.
arXiv Detail & Related papers (2021-12-31T10:03:47Z)
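As noted in the ALDM entry above, the pixel-level alignment feedback from a segmentation-based discriminator can be sketched as follows. The toy architecture, shapes, and loss here are simplified assumptions for illustration and do not reproduce the ALDM implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SegDiscriminator(nn.Module):
    """Toy segmentation-based discriminator: predicts a per-pixel class map
    (num_classes real classes + 1 extra 'fake' class) from an RGB image."""
    def __init__(self, num_classes: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, num_classes + 1, 3, padding=1),
        )

    def forward(self, image):
        return self.net(image)  # (B, num_classes + 1, H, W) logits

def adversarial_alignment_loss(disc, denoised_image, layout):
    """Generator-side loss: push the discriminator to label each pixel of the
    denoised image with the class given by the input layout (B, H, W)."""
    logits = disc(denoised_image)
    return F.cross_entropy(logits, layout)

# Usage with random stand-ins for a denoised image and a 5-class layout.
disc = SegDiscriminator(num_classes=5)
image = torch.rand(2, 3, 64, 64)
layout = torch.randint(0, 5, (2, 64, 64))
g_loss = adversarial_alignment_loss(disc, image, layout)
```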