The Journey, Not the Destination: How Data Guides Diffusion Models
- URL: http://arxiv.org/abs/2312.06205v1
- Date: Mon, 11 Dec 2023 08:39:43 GMT
- Title: The Journey, Not the Destination: How Data Guides Diffusion Models
- Authors: Kristian Georgiev, Joshua Vendrow, Hadi Salman, Sung Min Park,
Aleksander Madry
- Abstract summary: Diffusion models trained on large datasets can synthesize photo-realistic images of remarkable quality and diversity.
We propose a framework that: (i) provides a formal notion of data attribution in the context of diffusion models, and (ii) allows us to counterfactually validate such attributions.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion models trained on large datasets can synthesize photo-realistic
images of remarkable quality and diversity. However, attributing these images
back to the training data, that is, identifying the specific training examples
that caused an image to be generated, remains a challenge. In this paper, we
propose a framework that: (i) provides a formal notion of data attribution in
the context of diffusion models, and (ii) allows us to counterfactually
validate such attributions. We then provide a method for computing these
attributions efficiently. Finally, we apply our method to find (and evaluate)
such attributions for denoising diffusion probabilistic models trained on
CIFAR-10 and latent diffusion models trained on MS COCO. We provide code at
https://github.com/MadryLab/journey-TRAK.
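The paper's method builds on gradient-based attribution (TRAK). As a rough illustration of the underlying idea only, not the paper's actual implementation: a first-order influence score for each training example can be taken as the dot product between that example's loss gradient and the gradient at a given denoising step. All names, dimensions, and the synthetic gradients below are hypothetical.

```python
import numpy as np

def attribution_scores(train_grads, step_grad):
    """Score training examples by gradient alignment with one
    denoising step (a first-order influence approximation)."""
    g = step_grad / (np.linalg.norm(step_grad) + 1e-8)
    return train_grads @ g  # one score per training example

rng = np.random.default_rng(0)
train_grads = rng.normal(size=(100, 32))                # per-example gradients
step_grad = train_grads[7] + 0.1 * rng.normal(size=32)  # gradient at a sampled step
scores = attribution_scores(train_grads, step_grad)
print(int(np.argmax(scores)))  # index of the most influential example
```

Because example 7's gradient was (by construction) nearly parallel to the step gradient, it receives the highest score; real attributions would aggregate such scores across many denoising steps along the sampling trajectory.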
Related papers
- Diffusion Models Learn Low-Dimensional Distributions via Subspace Clustering [15.326641037243006]
Diffusion models can effectively learn the image distribution and generate new samples.
We provide theoretical insights into this phenomenon by leveraging key empirical observations.
We show that the minimal number of samples required to learn the underlying distribution scales linearly with the intrinsic dimension of the data.
arXiv Detail & Related papers (2024-09-04T04:14:02Z)
- DataDream: Few-shot Guided Dataset Generation [90.09164461462365]
We propose a framework for synthesizing classification datasets that more faithfully represent the real data distribution.
DataDream fine-tunes LoRA weights for the image generation model on the few real images before generating the training data using the adapted model.
We then fine-tune LoRA weights for CLIP using the synthetic data to improve downstream image classification over previous approaches on a large variety of datasets.
arXiv Detail & Related papers (2024-07-15T17:10:31Z)
- An Expectation-Maximization Algorithm for Training Clean Diffusion Models from Corrupted Observations [21.411327264448058]
We propose an expectation-maximization (EM) approach to train diffusion models from corrupted observations.
Our method alternates between reconstructing clean images from corrupted data using a known diffusion model (E-step) and refining diffusion model weights based on these reconstructions (M-step).
This iterative process leads the learned diffusion model to gradually converge to the true clean data distribution.
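This alternating scheme can be sketched on a toy problem, with a one-dimensional Gaussian standing in for the diffusion model (an assumption for the sketch, not the paper's setup): the E-step reconstructs each clean sample via the posterior under the current model, and the M-step refits the model to the reconstructions.

```python
import numpy as np

rng = np.random.default_rng(1)
clean = rng.normal(5.0, 1.0, size=2000)            # unobserved clean data
noise_var = 4.0                                    # known corruption level
corrupted = clean + rng.normal(0.0, noise_var**0.5, size=2000)

mu, var = 0.0, 1.0                                 # initial model of the clean data
for _ in range(50):
    # E-step: reconstruct clean samples via the posterior under the current model.
    w = var / (var + noise_var)
    x_hat = mu + w * (corrupted - mu)
    post_var = var * noise_var / (var + noise_var)
    # M-step: refit the model to the reconstructions (plus posterior variance).
    mu = x_hat.mean()
    var = x_hat.var() + post_var
print(round(mu, 2), round(var, 2))  # recovers roughly (5, 1)
```

Starting from a badly wrong model, the iteration converges toward the true clean distribution even though only corrupted samples were ever observed, which is the behavior the paper establishes for diffusion models.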
arXiv Detail & Related papers (2024-07-01T07:00:17Z)
- Training Class-Imbalanced Diffusion Model Via Overlap Optimization [55.96820607533968]
Diffusion models trained on real-world datasets often yield inferior fidelity for tail classes.
Deep generative models, including diffusion models, are biased towards classes with abundant training images.
We propose a method based on contrastive learning to minimize the overlap between distributions of synthetic images for different classes.
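A minimal sketch of the contrastive idea, using a generic supervised contrastive loss rather than the paper's exact objective: embeddings of same-class images act as positives and different-class ones as negatives, so minimizing the loss reduces inter-class overlap.

```python
import numpy as np

def sup_contrastive_loss(emb, labels, tau=0.5):
    """Supervised contrastive loss: lower when classes are well separated."""
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = emb @ emb.T / tau
    np.fill_diagonal(sim, -np.inf)            # exclude self-similarity
    exp = np.exp(sim)                         # exp(-inf) = 0, so self-pairs drop out
    same = labels[:, None] == labels[None, :]
    pos = (exp * same).sum(axis=1)            # same-class (positive) mass
    return float(-np.log(pos / exp.sum(axis=1)).mean())

rng = np.random.default_rng(2)
labels = np.array([0] * 8 + [1] * 8)
offset = np.array([3.0, 0.0, 0.0, 0.0])
separated = np.r_[rng.normal(0, 0.1, (8, 4)) + offset,
                  rng.normal(0, 0.1, (8, 4)) - offset]
mixed = rng.normal(0, 1.0, (16, 4))           # heavy class overlap
print(sup_contrastive_loss(separated, labels) <
      sup_contrastive_loss(mixed, labels))    # separated classes score lower
```

Well-separated class embeddings yield a much smaller loss than overlapping ones, which is the signal such a method exploits to rebalance synthesis for tail classes.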
arXiv Detail & Related papers (2024-02-16T16:47:21Z)
- Large-scale Reinforcement Learning for Diffusion Models [30.164571425479824]
Text-to-image diffusion models are susceptible to implicit biases that arise from web-scale text-image training pairs.
We present an effective scalable algorithm to improve diffusion models using Reinforcement Learning (RL).
We show how our approach substantially outperforms existing methods for aligning diffusion models with human preferences.
arXiv Detail & Related papers (2024-01-20T08:10:43Z)
- Steerable Conditional Diffusion for Out-of-Distribution Adaptation in Medical Image Reconstruction [75.91471250967703]
We introduce a novel sampling framework called Steerable Conditional Diffusion.
This framework adapts the diffusion model, concurrently with image reconstruction, based solely on the information provided by the available measurement.
We achieve substantial enhancements in out-of-distribution performance across diverse imaging modalities.
arXiv Detail & Related papers (2023-08-28T08:47:06Z)
- Evaluating Data Attribution for Text-to-Image Models [62.844382063780365]
We evaluate attribution through "customization" methods, which tune an existing large-scale model toward a given exemplar object or style.
Our key insight is that this allows us to efficiently create synthetic images that are computationally influenced by the exemplar by construction.
By taking into account the inherent uncertainty of the problem, we can assign soft attribution scores over a set of training images.
arXiv Detail & Related papers (2023-06-15T17:59:51Z)
- Training Diffusion Models with Reinforcement Learning [82.29328477109826]
Diffusion models are trained with an approximation to the log-likelihood objective.
In this paper, we investigate reinforcement learning methods for directly optimizing diffusion models for downstream objectives.
We describe how posing denoising as a multi-step decision-making problem enables a class of policy gradient algorithms.
arXiv Detail & Related papers (2023-05-22T17:57:41Z)
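The multi-step decision-making framing above can be sketched on a toy chain (an assumption-laden illustration, not the paper's algorithm): each of T denoising steps is a Gaussian action, the chain is rolled out to a sample, and per-step parameters are updated with REINFORCE on a terminal reward.

```python
import numpy as np

rng = np.random.default_rng(3)
T, sigma, lr = 5, 0.5, 0.005
theta = np.zeros(T)                   # per-step mean shift ("policy" parameters)
baseline = 0.0                        # running reward baseline

for _ in range(5000):
    x = rng.normal()                  # start the chain from pure noise
    grads = np.zeros(T)
    for t in range(T):
        a = theta[t] + sigma * rng.normal()   # stochastic "denoising" action
        grads[t] = (a - theta[t]) / sigma**2  # grad of the action's log-probability
        x += a
    reward = -(x - 2.0) ** 2          # terminal reward: end near the target 2.0
    theta += lr * (reward - baseline) * grads  # REINFORCE (policy gradient) update
    baseline = 0.9 * baseline + 0.1 * reward   # baseline reduces gradient variance
print(round(theta.sum(), 1))          # total learned shift, roughly 2.0
```

Because only the terminal state is rewarded, credit is spread across all steps via the per-step log-probability gradients; real text-to-image RL replaces the toy reward with a learned human-preference score.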
This list is automatically generated from the titles and abstracts of the papers in this site.