Origin Identification for Text-Guided Image-to-Image Diffusion Models
- URL: http://arxiv.org/abs/2501.02376v2
- Date: Mon, 19 May 2025 08:24:33 GMT
- Title: Origin Identification for Text-Guided Image-to-Image Diffusion Models
- Authors: Wenhao Wang, Yifan Sun, Zongxin Yang, Zhentao Tan, Zhengdong Hu, Yi Yang,
- Abstract summary: We propose origin IDentification for text-guided Image-to-image Diffusion models (ID$2$)<n>A straightforward solution to ID$2$ involves training a specialized deep embedding model to extract and compare features from both query and reference images.<n>To solve this challenge of the proposed ID$2$ task, we contribute the first dataset and a theoretically guaranteed method.
- Score: 39.234894330025114
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Text-guided image-to-image diffusion models excel in translating images based on textual prompts, allowing for precise and creative visual modifications. However, such a powerful technique can be misused for spreading misinformation, infringing on copyrights, and evading content tracing. This motivates us to introduce the task of origin IDentification for text-guided Image-to-image Diffusion models (ID$^2$), aiming to retrieve the original image of a given translated query. A straightforward solution to ID$^2$ involves training a specialized deep embedding model to extract and compare features from both query and reference images. However, due to visual discrepancy across generations produced by different diffusion models, this similarity-based approach fails when training on images from one model and testing on those from another, limiting its effectiveness in real-world applications. To solve this challenge of the proposed ID$^2$ task, we contribute the first dataset and a theoretically guaranteed method, both emphasizing generalizability. The curated dataset, OriPID, contains abundant Origins and guided Prompts, which can be used to train and test potential IDentification models across various diffusion models. In the method section, we first prove the existence of a linear transformation that minimizes the distance between the pre-trained Variational Autoencoder (VAE) embeddings of generated samples and their origins. Subsequently, it is demonstrated that such a simple linear transformation can be generalized across different diffusion models. Experimental results show that the proposed method achieves satisfying generalization performance, significantly surpassing similarity-based methods ($+31.6\%$ mAP), even those with generalization designs. The project is available at https://id2icml.github.io.
Related papers
- Prompt-Free Conditional Diffusion for Multi-object Image Augmentation [45.92182911052815]
We propose a prompt-free conditional diffusion framework for multi-object image augmentation.<n>Specifically, we introduce a local-global semantic fusion strategy to extract semantics from images to replace text.<n>We also design a reward model based counting loss to assist the traditional reconstruction loss for model training.
arXiv Detail & Related papers (2025-07-08T16:27:48Z) - Not Every Image is Worth a Thousand Words: Quantifying Originality in Stable Diffusion [21.252145402613472]
This work addresses the challenge of quantifying originality in text-to-image (T2I) generative diffusion models.
We propose a method that leverages textual inversion to measure the originality of an image based on the number of tokens required for its reconstruction by the model.
arXiv Detail & Related papers (2024-08-15T14:42:02Z) - Provably Robust Score-Based Diffusion Posterior Sampling for Plug-and-Play Image Reconstruction [31.503662384666274]
In science and engineering, the goal is to infer an unknown image from a small number of measurements collected from a known forward model describing certain imaging modality.
Motivated Score-based diffusion models, due to its empirical success, have emerged as an impressive candidate of an exemplary prior in image reconstruction.
arXiv Detail & Related papers (2024-03-25T15:58:26Z) - Regeneration Based Training-free Attribution of Fake Images Generated by
Text-to-Image Generative Models [39.33821502730661]
We present a training-free method to attribute fake images generated by text-to-image models to their source models.
By calculating and ranking the similarity of the test image and the candidate images, we can determine the source of the image.
arXiv Detail & Related papers (2024-03-03T11:55:49Z) - Forgery-aware Adaptive Transformer for Generalizable Synthetic Image
Detection [106.39544368711427]
We study the problem of generalizable synthetic image detection, aiming to detect forgery images from diverse generative methods.
We present a novel forgery-aware adaptive transformer approach, namely FatFormer.
Our approach tuned on 4-class ProGAN data attains an average of 98% accuracy to unseen GANs, and surprisingly generalizes to unseen diffusion models with 95% accuracy.
arXiv Detail & Related papers (2023-12-27T17:36:32Z) - Denoising Diffusion Bridge Models [54.87947768074036]
Diffusion models are powerful generative models that map noise to data using processes.
For many applications such as image editing, the model input comes from a distribution that is not random noise.
In our work, we propose Denoising Diffusion Bridge Models (DDBMs)
arXiv Detail & Related papers (2023-09-29T03:24:24Z) - DiffDis: Empowering Generative Diffusion Model with Cross-Modal
Discrimination Capability [75.9781362556431]
We propose DiffDis to unify the cross-modal generative and discriminative pretraining into one single framework under the diffusion process.
We show that DiffDis outperforms single-task models on both the image generation and the image-text discriminative tasks.
arXiv Detail & Related papers (2023-08-18T05:03:48Z) - Improving Diffusion-based Image Translation using Asymmetric Gradient
Guidance [51.188396199083336]
We present an approach that guides the reverse process of diffusion sampling by applying asymmetric gradient guidance.
Our model's adaptability allows it to be implemented with both image-fusion and latent-dif models.
Experiments show that our method outperforms various state-of-the-art models in image translation tasks.
arXiv Detail & Related papers (2023-06-07T12:56:56Z) - Real-World Image Variation by Aligning Diffusion Inversion Chain [53.772004619296794]
A domain gap exists between generated images and real-world images, which poses a challenge in generating high-quality variations of real-world images.
We propose a novel inference pipeline called Real-world Image Variation by ALignment (RIVAL)
Our pipeline enhances the generation quality of image variations by aligning the image generation process to the source image's inversion chain.
arXiv Detail & Related papers (2023-05-30T04:09:47Z) - If at First You Don't Succeed, Try, Try Again: Faithful Diffusion-based
Text-to-Image Generation by Selection [53.320946030761796]
diffusion-based text-to-image (T2I) models can lack faithfulness to the text prompt.
We show that large T2I diffusion models are more faithful than usually assumed, and can generate images faithful to even complex prompts.
We introduce a pipeline that generates candidate images for a text prompt and picks the best one according to an automatic scoring system.
arXiv Detail & Related papers (2023-05-22T17:59:41Z) - Uncovering the Disentanglement Capability in Text-to-Image Diffusion
Models [60.63556257324894]
A key desired property of image generative models is the ability to disentangle different attributes.
We propose a simple, light-weight image editing algorithm where the mixing weights of the two text embeddings are optimized for style matching and content preservation.
Experiments show that the proposed method can modify a wide range of attributes, with the performance outperforming diffusion-model-based image-editing algorithms.
arXiv Detail & Related papers (2022-12-16T19:58:52Z) - ADIR: Adaptive Diffusion for Image Reconstruction [46.838084286784195]
We propose a conditional sampling scheme that exploits the prior learned by diffusion models.
We then combine it with a novel approach for adapting pretrained diffusion denoising networks to their input.
We show that our proposed adaptive diffusion for image reconstruction' approach achieves a significant improvement in the super-resolution, deblurring, and text-based editing tasks.
arXiv Detail & Related papers (2022-12-06T18:39:58Z) - SinDiffusion: Learning a Diffusion Model from a Single Natural Image [159.4285444680301]
We present SinDiffusion, leveraging denoising diffusion models to capture internal distribution of patches from a single natural image.
It is based on two core designs. First, SinDiffusion is trained with a single model at a single scale instead of multiple models with progressive growing of scales.
Second, we identify that a patch-level receptive field of the diffusion network is crucial and effective for capturing the image's patch statistics.
arXiv Detail & Related papers (2022-11-22T18:00:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.