Data Attribution for Text-to-Image Models by Unlearning Synthesized Images
- URL: http://arxiv.org/abs/2406.09408v3
- Date: Thu, 20 Feb 2025 17:16:03 GMT
- Title: Data Attribution for Text-to-Image Models by Unlearning Synthesized Images
- Authors: Sheng-Yu Wang, Aaron Hertzmann, Alexei A. Efros, Jun-Yan Zhu, Richard Zhang,
- Abstract summary: The goal of data attribution for text-to-image models is to identify the training images that most influence the generation of a new image.
We propose an efficient data attribution method by simulating unlearning the synthesized image.
We then identify training images with significant loss deviations after the unlearning process and label these as influential.
- Score: 71.23012718682634
- License:
- Abstract: The goal of data attribution for text-to-image models is to identify the training images that most influence the generation of a new image. Influence is defined such that, for a given output, if a model is retrained from scratch without the most influential images, the model would fail to reproduce the same output. Unfortunately, directly searching for these influential images is computationally infeasible, since it would require repeatedly retraining models from scratch. In our work, we propose an efficient data attribution method by simulating unlearning the synthesized image. We achieve this by increasing the training loss on the output image, without catastrophic forgetting of other, unrelated concepts. We then identify training images with significant loss deviations after the unlearning process and label these as influential. We evaluate our method with a computationally intensive but "gold-standard" retraining from scratch and demonstrate our method's advantages over previous methods.
Related papers
- Model Integrity when Unlearning with T2I Diffusion Models [11.321968363411145]
We propose approximate Machine Unlearning algorithms to reduce the generation of specific types of images, characterized by samples from a forget distribution''
We then propose unlearning algorithms that demonstrate superior effectiveness in preserving model integrity compared to existing baselines.
arXiv Detail & Related papers (2024-11-04T13:15:28Z) - Towards Unsupervised Blind Face Restoration using Diffusion Prior [12.69610609088771]
Blind face restoration methods have shown remarkable performance when trained on large-scale synthetic datasets with supervised learning.
These datasets are often generated by simulating low-quality face images with a handcrafted image degradation pipeline.
In this paper, we address this issue by using only a set of input images, with unknown degradations and without ground truth targets, to fine-tune a restoration model.
Our best model also achieves the state-of-the-art results on both synthetic and real-world datasets.
arXiv Detail & Related papers (2024-10-06T20:38:14Z) - EfficientTrain++: Generalized Curriculum Learning for Efficient Visual Backbone Training [79.96741042766524]
We reformulate the training curriculum as a soft-selection function.
We show that exposing the contents of natural images can be readily achieved by the intensity of data augmentation.
The resulting method, EfficientTrain++, is simple, general, yet surprisingly effective.
arXiv Detail & Related papers (2024-05-14T17:00:43Z) - Scaling Laws of Synthetic Images for Model Training ... for Now [54.43596959598466]
We study the scaling laws of synthetic images generated by state of the art text-to-image models.
We observe that synthetic images demonstrate a scaling trend similar to, but slightly less effective than, real images in CLIP training.
arXiv Detail & Related papers (2023-12-07T18:59:59Z) - Image Captions are Natural Prompts for Text-to-Image Models [70.30915140413383]
We analyze the relationship between the training effect of synthetic data and the synthetic data distribution induced by prompts.
We propose a simple yet effective method that prompts text-to-image generative models to synthesize more informative and diverse training data.
Our method significantly improves the performance of models trained on synthetic training data.
arXiv Detail & Related papers (2023-07-17T14:38:11Z) - Evaluating Data Attribution for Text-to-Image Models [62.844382063780365]
We evaluate attribution through "customization" methods, which tune an existing large-scale model toward a given exemplar object or style.
Our key insight is that this allows us to efficiently create synthetic images that are computationally influenced by the exemplar by construction.
By taking into account the inherent uncertainty of the problem, we can assign soft attribution scores over a set of training images.
arXiv Detail & Related papers (2023-06-15T17:59:51Z) - Supervised Deep Learning for Content-Aware Image Retargeting with
Fourier Convolutions [11.031841470875571]
Image aims to alter the size of the image with attention to the contents.
Labeled datasets are unavailable for training deep learning models in the image tasks.
Regular convolutional neural networks cannot generate images of different sizes in inference time.
arXiv Detail & Related papers (2023-06-12T19:17:44Z) - DeFlow: Learning Complex Image Degradations from Unpaired Data with
Conditional Flows [145.83812019515818]
We propose DeFlow, a method for learning image degradations from unpaired data.
We model the degradation process in the latent space of a shared flow-decoder network.
We validate our DeFlow formulation on the task of joint image restoration and super-resolution.
arXiv Detail & Related papers (2021-01-14T18:58:01Z) - A general approach to bridge the reality-gap [0.0]
A common approach to circumvent this is to leverage existing, similar data-sets with large amounts of labelled data.
We propose learning a general transformation to bring arbitrary images towards a canonical distribution.
This transformation is trained in an unsupervised regime, leveraging data augmentation to generate off-canonical examples of images.
arXiv Detail & Related papers (2020-09-03T18:19:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.