Evaluating Data Attribution for Text-to-Image Models
- URL: http://arxiv.org/abs/2306.09345v2
- Date: Tue, 8 Aug 2023 17:26:58 GMT
- Title: Evaluating Data Attribution for Text-to-Image Models
- Authors: Sheng-Yu Wang, Alexei A. Efros, Jun-Yan Zhu, Richard Zhang
- Abstract summary: We evaluate attribution through "customization" methods, which tune an existing large-scale model toward a given exemplar object or style.
Our key insight is that this allows us to efficiently create synthetic images that are computationally influenced by the exemplar by construction.
By taking into account the inherent uncertainty of the problem, we can assign soft attribution scores over a set of training images.
- Score: 62.844382063780365
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While large text-to-image models are able to synthesize "novel" images, these
images are necessarily a reflection of the training data. The problem of data
attribution in such models -- which of the images in the training set are most
responsible for the appearance of a given generated image -- is a difficult yet
important one. As an initial step toward this problem, we evaluate attribution
through "customization" methods, which tune an existing large-scale model
toward a given exemplar object or style. Our key insight is that this allows us
to efficiently create synthetic images that are computationally influenced by
the exemplar by construction. With our new dataset of such exemplar-influenced
images, we are able to evaluate various data attribution algorithms and
different possible feature spaces. Furthermore, by training on our dataset, we
can tune standard models, such as DINO, CLIP, and ViT, toward the attribution
problem. Even though the procedure is tuned towards small exemplar sets, we
show generalization to larger sets. Finally, by taking into account the
inherent uncertainty of the problem, we can assign soft attribution scores over
a set of training images.
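To make the abstract's notion of "soft attribution scores" concrete, here is a minimal, hypothetical sketch: embed the synthesized image and the candidate training images with a feature extractor (the paper tunes models such as DINO, CLIP, or ViT for this), then turn cosine similarities into a probability distribution with a softmax. The encoder choice, the temperature value, and the softmax calibration below are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def soft_attribution_scores(query_feat, train_feats, temperature=0.05):
    """Assign soft attribution scores over a set of training images.

    query_feat:  (d,)   embedding of the synthesized image
    train_feats: (n, d) embeddings of n candidate training images
    Returns an (n,) array summing to 1; higher = more influential.
    """
    q = query_feat / np.linalg.norm(query_feat)
    t = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    sims = t @ q                                 # cosine similarities in [-1, 1]
    logits = (sims - sims.max()) / temperature   # shift for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

# Toy usage with random stand-in features; a real pipeline would produce
# these embeddings with the attribution-tuned DINO/CLIP/ViT encoder.
rng = np.random.default_rng(0)
train_feats = rng.normal(size=(1000, 512))
query_feat = train_feats[42] + 0.1 * rng.normal(size=512)  # near image 42
scores = soft_attribution_scores(query_feat, train_feats)
print(scores.argmax())  # image 42 should receive the highest soft score
```

The temperature controls how sharply mass concentrates on the top matches: a small value approaches a hard top-1 attribution, while a large value spreads credit across many training images, reflecting the inherent uncertainty the abstract mentions.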
Related papers
- Reinforcing Pre-trained Models Using Counterfactual Images [54.26310919385808]
This paper proposes a novel framework for reinforcing classification models using language-guided counterfactual images.
We identify model weaknesses by testing the model on the counterfactual image dataset.
We then employ the counterfactual images as an augmented dataset to fine-tune and reinforce the classification model.
arXiv Detail & Related papers (2024-06-19T08:07:14Z)
- Data Attribution for Text-to-Image Models by Unlearning Synthesized Images [71.23012718682634]
The goal of data attribution for text-to-image models is to identify the training images that most influence the generation of a new image.
We propose a new approach that efficiently identifies highly-influential images.
arXiv Detail & Related papers (2024-06-13T17:59:44Z)
- The Journey, Not the Destination: How Data Guides Diffusion Models [75.19694584942623]
Diffusion models trained on large datasets can synthesize photo-realistic images of remarkable quality and diversity.
We propose a framework that: (i) provides a formal notion of data attribution in the context of diffusion models, and (ii) allows us to counterfactually validate such attributions.
arXiv Detail & Related papers (2023-12-11T08:39:43Z)
- Scaling Laws of Synthetic Images for Model Training ... for Now [54.43596959598466]
We study the scaling laws of synthetic images generated by state-of-the-art text-to-image models.
We observe that synthetic images demonstrate a scaling trend similar to, but slightly less effective than, real images in CLIP training.
arXiv Detail & Related papers (2023-12-07T18:59:59Z)
- A Method for Training-free Person Image Picture Generation [4.043367784553845]
A Character Image Feature model is proposed in this paper.
It lets the user make the character in the generated image match expectations simply by providing a picture of that character.
The proposed model can be conveniently incorporated into the Stable Diffusion generation process without modifying the underlying model, or used in combination with Stable Diffusion as a joint model.
arXiv Detail & Related papers (2023-05-16T21:46:28Z)
- ClipCrop: Conditioned Cropping Driven by Vision-Language Model [90.95403416150724]
We take advantage of vision-language models as a foundation for creating robust and user-intentional cropping algorithms.
We develop a method to perform cropping with a text or image query that reflects the user's intention as guidance.
Our pipeline design allows the model to learn text-conditioned aesthetic cropping with a small dataset.
arXiv Detail & Related papers (2022-11-21T14:27:07Z)
- Zero-Shot Text-to-Image Generation [15.135825501365007]
We describe a transformer that autoregressively models the text and image tokens as a single stream of data.
With sufficient data and scale, our approach is competitive with previous domain-specific models when evaluated in a zero-shot fashion.
arXiv Detail & Related papers (2021-02-24T06:42:31Z)
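The "single stream" idea in the last entry can be illustrated with a toy causal transformer that concatenates text tokens and discretized image tokens into one sequence and trains with ordinary next-token prediction. All sizes and names below are hypothetical; this is a sketch of the idea, not that paper's architecture (positional embeddings are omitted for brevity).

```python
import torch
import torch.nn as nn

TEXT_VOCAB, IMAGE_VOCAB, D = 1000, 8192, 256  # toy sizes, not the paper's

class TinySingleStreamLM(nn.Module):
    """Causal transformer over one shared text+image token stream."""
    def __init__(self):
        super().__init__()
        vocab = TEXT_VOCAB + IMAGE_VOCAB       # image ids offset past text ids
        self.embed = nn.Embedding(vocab, D)
        layer = nn.TransformerEncoderLayer(D, nhead=4, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(D, vocab)

    def forward(self, tokens):                 # tokens: (batch, seq_len)
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        return self.head(self.blocks(self.embed(tokens), mask=mask))

text = torch.randint(0, TEXT_VOCAB, (2, 16))                  # caption tokens
image = torch.randint(TEXT_VOCAB, TEXT_VOCAB + IMAGE_VOCAB, (2, 32))
stream = torch.cat([text, image], dim=1)                      # one stream
logits = TinySingleStreamLM()(stream)
loss = nn.functional.cross_entropy(                           # next-token loss
    logits[:, :-1].reshape(-1, logits.size(-1)), stream[:, 1:].reshape(-1))
```

Because both modalities live in one vocabulary and one sequence, the same next-token objective conditions image tokens on the preceding text, which is what enables zero-shot text-to-image generation at scale.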