Extracting Training Data from Diffusion Models
- URL: http://arxiv.org/abs/2301.13188v1
- Date: Mon, 30 Jan 2023 18:53:09 GMT
- Title: Extracting Training Data from Diffusion Models
- Authors: Nicholas Carlini, Jamie Hayes, Milad Nasr, Matthew Jagielski, Vikash
Sehwag, Florian Tramèr, Borja Balle, Daphne Ippolito, Eric Wallace
- Abstract summary: We show that diffusion models memorize individual images from their training data and emit them at generation time.
With a generate-and-filter pipeline, we extract over a thousand training examples from state-of-the-art models.
We train hundreds of diffusion models in various settings to analyze how different modeling and data decisions affect privacy.
- Score: 77.11719063152027
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image diffusion models such as DALL-E 2, Imagen, and Stable Diffusion have
attracted significant attention due to their ability to generate high-quality
synthetic images. In this work, we show that diffusion models memorize
individual images from their training data and emit them at generation time.
With a generate-and-filter pipeline, we extract over a thousand training
examples from state-of-the-art models, ranging from photographs of individual
people to trademarked company logos. We also train hundreds of diffusion models
in various settings to analyze how different modeling and data decisions affect
privacy. Overall, our results show that diffusion models are much less private
than prior generative models such as GANs, and that mitigating these
vulnerabilities may require new advances in privacy-preserving training.
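A minimal sketch of the generate-and-filter idea described in the abstract: sample many images per prompt and flag prompts whose generations collapse onto near-duplicates, a telltale sign of memorization. This is an illustration only; `sample_model`, `embed`, and all thresholds are placeholder assumptions, not the paper's actual pipeline, and random vectors stand in for real generations so the script runs end to end.

```python
import numpy as np

def sample_model(prompt: str, n: int, dim: int = 512) -> np.ndarray:
    """Stand-in for drawing n images from a text-to-image diffusion model.
    Returns random vectors so the script runs without a real model."""
    rng = np.random.default_rng(abs(hash(prompt)) % (2**32))
    return rng.normal(size=(n, dim))

def embed(images: np.ndarray) -> np.ndarray:
    """Stand-in for an image feature extractor (e.g., CLIP-style embeddings)."""
    return images / np.linalg.norm(images, axis=1, keepdims=True)

def extraction_candidates(prompts, n_samples=64, sim_threshold=0.95, min_dups=10):
    """Flag prompts whose generations collapse onto near-identical images,
    a signal that the model may be emitting a memorized training image."""
    flagged = []
    for prompt in prompts:
        feats = embed(sample_model(prompt, n_samples))
        sims = feats @ feats.T                    # pairwise cosine similarity
        np.fill_diagonal(sims, 0.0)
        # how many near-duplicate siblings does each generation have?
        dup_counts = (sims > sim_threshold).sum(axis=1)
        if dup_counts.max() >= min_dups:
            flagged.append(prompt)
    return flagged

print(extraction_candidates(["a caption suspected to appear in training data"]))
```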
Related papers
- Learning Differentially Private Diffusion Models via Stochastic Adversarial Distillation [20.62325580203137]
We introduce DP-SAD, which trains a private diffusion model via adversarial distillation.
For better generation quality, we introduce a discriminator to distinguish whether an image is from the teacher or the student.
arXiv Detail & Related papers (2024-08-27T02:29:29Z)
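A rough sketch of the adversarial distillation loop the DP-SAD entry above describes: a student matches a teacher's predictions while a discriminator tries to tell their outputs apart. All modules and shapes are toy stand-ins, and the actual differential-privacy machinery (gradient clipping and noise) is omitted for brevity.

```python
import torch
import torch.nn as nn

dim = 64
teacher = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
student = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
disc = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

opt_s = torch.optim.Adam(student.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

for step in range(100):
    z = torch.randn(32, dim)            # stand-in for noised inputs x_t
    with torch.no_grad():
        t_out = teacher(z)              # teacher's denoising prediction

    # 1) discriminator: teacher outputs -> 1, student outputs -> 0
    s_out = student(z).detach()
    d_loss = bce(disc(t_out), torch.ones(32, 1)) + \
             bce(disc(s_out), torch.zeros(32, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # 2) student: match the teacher and fool the discriminator
    s_out = student(z)
    s_loss = nn.functional.mse_loss(s_out, t_out) + \
             0.1 * bce(disc(s_out), torch.ones(32, 1))
    opt_s.zero_grad(); s_loss.backward(); opt_s.step()
```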
- Training Class-Imbalanced Diffusion Model Via Overlap Optimization [55.96820607533968]
Diffusion models trained on real-world datasets often yield inferior fidelity for tail classes.
Deep generative models, including diffusion models, are biased towards classes with abundant training images.
We propose a method based on contrastive learning to minimize the overlap between distributions of synthetic images for different classes.
arXiv Detail & Related papers (2024-02-16T16:47:21Z)
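One way to read the overlap-minimization idea above is as a contrastive penalty on cross-class similarity: push synthetic samples of different classes apart in feature space. The sketch below is a guess at that shape with placeholder features; it is not the paper's loss.

```python
import torch
import torch.nn.functional as F

def overlap_penalty(feats: torch.Tensor, labels: torch.Tensor, tau: float = 0.1):
    """Penalize high similarity between samples with different labels."""
    feats = F.normalize(feats, dim=1)
    sims = feats @ feats.T / tau                 # pairwise similarities
    diff_class = labels[:, None] != labels[None, :]
    # mean similarity over cross-class pairs; minimizing it reduces overlap
    return sims[diff_class].mean()

feats = torch.randn(16, 32, requires_grad=True)  # stand-in encoder features
labels = torch.randint(0, 4, (16,))
loss = overlap_penalty(feats, labels)
loss.backward()
```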
- Large-scale Reinforcement Learning for Diffusion Models [30.164571425479824]
Text-to-image diffusion models are susceptible to implicit biases that arise from web-scale text-image training pairs.
We present an effective, scalable algorithm to improve diffusion models using reinforcement learning (RL).
We show how our approach substantially outperforms existing methods for aligning diffusion models with human preferences.
arXiv Detail & Related papers (2024-01-20T08:10:43Z)
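The entry above describes reward-driven fine-tuning. A bare-bones REINFORCE update captures the core mechanic: raise the likelihood of high-reward samples. The Gaussian "policy" and quadratic "reward" below are toy stand-ins for a diffusion sampler and a human-preference model, not the paper's algorithm.

```python
import torch

mu = torch.zeros(8, requires_grad=True)          # toy policy parameters
opt = torch.optim.Adam([mu], lr=1e-2)

def reward(x: torch.Tensor) -> torch.Tensor:
    """Pretend preference score: higher when samples are near 1.0."""
    return -((x - 1.0) ** 2).sum(dim=1)

for step in range(200):
    dist = torch.distributions.Normal(mu, 1.0)
    x = dist.sample((64,))                       # "generations" (no grad)
    logp = dist.log_prob(x).sum(dim=1)
    r = reward(x)
    # REINFORCE: increase log-likelihood of above-average-reward samples
    loss = -(logp * (r - r.mean())).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```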
- Conditional Image Generation with Pretrained Generative Model [1.4685355149711303]
Diffusion models have gained popularity for their ability to generate higher-quality images compared to GAN models.
These models require huge amounts of data, computational resources, and meticulous tuning for successful training.
We propose methods to leverage pre-trained unconditional diffusion models with additional guidance for conditional image generation.
arXiv Detail & Related papers (2023-12-20T18:27:53Z)
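Classifier guidance is a well-known way to add conditioning to a pretrained unconditional diffusion model, and is one plausible reading of the guidance mentioned above. The sketch shifts the unconditional noise prediction by the gradient of log p(y | x_t); `unet` and `classifier` are placeholders, and the usual noise-schedule scaling factor is folded into `scale` for brevity.

```python
import torch
import torch.nn as nn

unet = nn.Linear(16, 16)                 # stand-in unconditional denoiser
classifier = nn.Linear(16, 10)           # stand-in noisy-image classifier

def guided_eps(x_t, y, scale=3.0):
    """Blend the unconditional noise prediction with the gradient of
    log p(y | x_t), the core mechanic of classifier guidance."""
    x_t = x_t.detach().requires_grad_(True)
    logits = torch.log_softmax(classifier(x_t), dim=-1)
    logp = logits[torch.arange(len(y)), y].sum()
    grad = torch.autograd.grad(logp, x_t)[0]
    eps = unet(x_t)                      # unconditional prediction
    return eps - scale * grad            # shift toward the desired class

x_t = torch.randn(4, 16)
y = torch.tensor([3, 3, 7, 7])
print(guided_eps(x_t, y).shape)
```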
- Diffusion Cocktail: Mixing Domain-Specific Diffusion Models for Diversified Image Generations [7.604214200457584]
Diffusion Cocktail (Ditail) is a training-free method that transfers style and content information between multiple diffusion models.
Ditail offers fine-grained control of the generation process, which enables flexible manipulations of styles and contents.
arXiv Detail & Related papers (2023-12-12T00:53:56Z)
- The Journey, Not the Destination: How Data Guides Diffusion Models [75.19694584942623]
Diffusion models trained on large datasets can synthesize photo-realistic images of remarkable quality and diversity.
We propose a framework that: (i) provides a formal notion of data attribution in the context of diffusion models, and (ii) allows us to counterfactually validate such attributions.
arXiv Detail & Related papers (2023-12-11T08:39:43Z)
- Diffusion Models as Masked Autoencoders [52.442717717898056]
We revisit generatively pre-training visual representations in light of recent interest in denoising diffusion models.
While directly pre-training with diffusion models does not produce strong representations, we condition diffusion models on masked input and formulate diffusion models as masked autoencoders (DiffMAE).
We perform a comprehensive study on the pros and cons of design choices and build connections between diffusion models and masked autoencoders.
arXiv Detail & Related papers (2023-04-06T17:59:56Z)
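The DiffMAE entry above suggests a simple objective: noise the masked patches, keep the visible patches clean as conditioning, and reconstruct only what was masked. The sketch below is a toy rendering of that objective, with a linear layer standing in for the transformer.

```python
import torch
import torch.nn as nn

model = nn.Linear(32, 32)                     # stand-in for a ViT denoiser

def diffmae_loss(patches: torch.Tensor, mask_ratio: float = 0.75):
    """patches: (batch, num_patches, dim)"""
    mask = torch.rand(*patches.shape[:2], 1) < mask_ratio  # 1 = masked
    noise = torch.randn_like(patches)
    # noise the masked patches; visible patches stay clean as conditioning
    noised = torch.where(mask, noise, patches)
    pred = model(noised)
    # reconstruction loss only on the masked (noised) positions
    return ((pred - patches)[mask.expand_as(patches)] ** 2).mean()

loss = diffmae_loss(torch.randn(2, 49, 32))
loss.backward()
```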
- Diffusion Art or Digital Forgery? Investigating Data Replication in Diffusion Models [53.03978584040557]
We study image retrieval frameworks that enable us to compare generated images with training samples and detect when content has been replicated.
Applying our frameworks to diffusion models trained on multiple datasets including Oxford flowers, Celeb-A, ImageNet, and LAION, we discuss how factors such as training set size impact rates of content replication.
arXiv Detail & Related papers (2022-12-07T18:58:02Z)
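The retrieval framework described above reduces, at its simplest, to nearest-neighbor search in a descriptor space: flag generations whose closest training image is suspiciously similar. A minimal sketch, with a placeholder embedding and synthetic data so it runs end to end:

```python
import numpy as np

def embed(images: np.ndarray) -> np.ndarray:
    """Stand-in for a learned descriptor (e.g., a self-supervised encoder)."""
    return images / np.linalg.norm(images, axis=1, keepdims=True)

def find_replications(generated, training, threshold=0.95):
    g, t = embed(generated), embed(training)
    sims = g @ t.T                              # cosine similarity matrix
    nn_sim = sims.max(axis=1)                   # best training match per generation
    nn_idx = sims.argmax(axis=1)
    return [(i, int(nn_idx[i]), float(s))
            for i, s in enumerate(nn_sim) if s > threshold]

rng = np.random.default_rng(0)
train = rng.normal(size=(1000, 128))
gen = np.vstack([rng.normal(size=(9, 128)), train[:1] + 0.01])  # one near-copy
print(find_replications(gen, train))             # flags the copied sample
```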
- SinDiffusion: Learning a Diffusion Model from a Single Natural Image [159.4285444680301]
We present SinDiffusion, leveraging denoising diffusion models to capture the internal distribution of patches from a single natural image.
It is based on two core designs. First, SinDiffusion is trained with a single model at a single scale instead of multiple models with progressive growing of scales.
Second, we identify that a patch-level receptive field of the diffusion network is crucial and effective for capturing the image's patch statistics.
arXiv Detail & Related papers (2022-11-22T18:00:03Z)
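SinDiffusion's patch-level receptive field can be illustrated with a shallow fully convolutional denoiser: stacking a few 3x3 convolutions keeps the receptive field to a small patch, so the model can only learn patch statistics of the single image. The architecture and the simplified forward process below are illustrative assumptions, not the paper's design.

```python
import torch
import torch.nn as nn

# three 3x3 convs -> a 7x7-pixel receptive field, i.e. patch statistics only
denoiser = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 3, 3, padding=1),
)
opt = torch.optim.Adam(denoiser.parameters(), lr=1e-3)

image = torch.rand(1, 3, 256, 256)            # the single training image

for step in range(100):
    # random crop, random noise level: a simplified single-image loop
    i, j = torch.randint(0, 192, (2,)).tolist()
    crop = image[:, :, i:i + 64, j:j + 64]
    noise = torch.randn_like(crop)
    t = torch.rand(1)                          # noise scale in [0, 1)
    noised = (1 - t) * crop + t * noise        # simplified forward process
    loss = ((denoiser(noised) - noise) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```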