Black-box Membership Inference Attacks against Fine-tuned Diffusion Models
- URL: http://arxiv.org/abs/2312.08207v5
- Date: Thu, 5 Sep 2024 14:43:55 GMT
- Title: Black-box Membership Inference Attacks against Fine-tuned Diffusion Models
- Authors: Yan Pang, Tianhao Wang
- Abstract summary: A growing number of users are downloading pre-trained image-generative models to fine-tune them with downstream datasets for various image-generation tasks.
We propose the first reconstruction-based membership inference attack framework, tailored for recent diffusion models.
Considering four distinct attack scenarios and three types of attacks, this framework is capable of targeting any popular conditional generator model.
- Score: 4.294817908693974
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the rapid advancement of diffusion-based image-generative models, the quality of generated images has become increasingly photorealistic. Moreover, with the release of high-quality pre-trained image-generative models, a growing number of users are downloading these pre-trained models to fine-tune them with downstream datasets for various image-generation tasks. However, employing such powerful pre-trained models in downstream tasks presents significant privacy leakage risks. In this paper, we propose the first reconstruction-based membership inference attack framework, tailored for recent diffusion models, and in the more stringent black-box access setting. Considering four distinct attack scenarios and three types of attacks, this framework is capable of targeting any popular conditional generator model, achieving high precision, evidenced by an impressive AUC of $0.95$.
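The abstract gives the attack's premise (a member of the fine-tuning set is reconstructed unusually well by the fine-tuned model) without the concrete procedure, so the following is a minimal sketch rather than the paper's method: `generate` is a hypothetical black-box text-to-image API, and pixel-space MSE stands in for whatever similarity metric the framework actually uses.
```python
import numpy as np

def reconstruction_distance(candidate, caption, generate, n_samples=4):
    """Query the black-box generator several times and keep the best match.
    `generate(prompt=...)` is a hypothetical API returning an image array
    with the same shape as `candidate`; MSE is a stand-in metric."""
    dists = [np.mean((candidate - generate(prompt=caption)) ** 2)
             for _ in range(n_samples)]
    return min(dists)

def infer_membership(candidate, caption, generate, threshold):
    """Reconstruction-based MIA intuition: the fine-tuned model reproduces
    its own training images with unusually low error."""
    return reconstruction_distance(candidate, caption, generate) < threshold
```
The threshold would be calibrated on images known to lie outside the fine-tuning set; sweeping it yields the ROC curve behind the reported AUC of 0.95.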
Related papers
- Explore the vulnerability of black-box models via diffusion models [12.444628438522702]
In this study, we uncover a novel security threat where an attacker leverages diffusion model APIs to generate synthetic images.
This enables the attacker to execute model extraction and transfer-based adversarial attacks on black-box classification models.
Our method shows an average improvement of 27.37% over state-of-the-art methods while using just 0.01 times the query budget (a rough sketch of the extraction pipeline follows this entry).
arXiv Detail & Related papers (2025-06-09T09:36:31Z)
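A rough sketch of that pipeline, under heavy assumptions: `diffusion_api` is a hypothetical text-to-image endpoint and `surrogate` is any trainable classifier; neither name nor the query budget comes from the paper.
```python
import torch
import torch.nn.functional as F

def build_synthetic_set(class_names, diffusion_api, per_class=50):
    """Label-aware queries to a hypothetical text-to-image endpoint.
    `diffusion_api(prompt=...)` is assumed to return CHW image tensors
    of a fixed shape."""
    images, labels = [], []
    for y, name in enumerate(class_names):
        for _ in range(per_class):
            images.append(diffusion_api(prompt=f"a photo of a {name}"))
            labels.append(y)
    return torch.stack(images), torch.tensor(labels)

def train_surrogate(surrogate, images, labels, epochs=10, lr=1e-3):
    """Fit a local surrogate on the synthetic set; adversarial examples
    crafted against it often transfer to the black-box target."""
    opt = torch.optim.Adam(surrogate.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = F.cross_entropy(surrogate(images), labels)
        loss.backward()
        opt.step()
    return surrogate
```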
- Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model [87.23753533733046]
We introduce Muddit, a unified discrete diffusion transformer that enables fast and parallel generation across both text and image modalities.
Unlike prior unified diffusion models trained from scratch, Muddit integrates strong visual priors from a pretrained text-to-image backbone with a lightweight text decoder.
arXiv Detail & Related papers (2025-05-29T16:15:48Z)
- Where's the liability in the Generative Era? Recovery-based Black-Box Detection of AI-Generated Content [42.68683643671603]
We introduce a novel black-box detection framework that requires only API access.
We measure the likelihood that the image was generated by the model itself.
For black-box models that do not support masked image inputs, we incorporate a cost-efficient surrogate model trained to align with the target model distribution (a toy sketch of the recovery check follows this entry).
arXiv Detail & Related papers (2025-05-02T05:11:35Z)
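A toy version of the recovery-based check, assuming a hypothetical `model_inpaint(masked_image, mask)` API; the abstract does not specify the masking strategy or the likelihood measure, so masked-region MSE is used as a stand-in.
```python
import numpy as np

def recovery_score(image, model_inpaint, mask):
    """Hide the masked region, ask the candidate generator to fill it back
    in, and measure how faithfully the hidden pixels are recovered."""
    masked = image * (1 - mask)                # zero out the region to recover
    recovered = model_inpaint(masked, mask)    # hypothetical inpainting API
    err = ((recovered - image) * mask) ** 2    # error only inside the mask
    return err.sum() / mask.sum()              # mean error over masked pixels

def generated_by_model(image, model_inpaint, mask, tau):
    # Images the model itself produced tend to be recovered with lower error;
    # tau would be calibrated on held-out real (non-generated) images.
    return recovery_score(image, model_inpaint, mask) < tau
```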
- Embedding Hidden Adversarial Capabilities in Pre-Trained Diffusion Models [1.534667887016089]
We introduce a new attack paradigm that embeds hidden adversarial capabilities directly into diffusion models via fine-tuning.
The resulting tampered model generates high-quality images indistinguishable from those of the original.
We demonstrate the effectiveness and stealthiness of our approach, uncovering a covert attack vector that raises new security concerns.
arXiv Detail & Related papers (2025-04-05T12:51:36Z)
- D2C: Unlocking the Potential of Continuous Autoregressive Image Generation with Discrete Tokens [80.75893450536577]
We propose D2C, a novel two-stage method to enhance model generation capacity.
In the first stage, the discrete-valued tokens representing coarse-grained image features are sampled by employing a small discrete-valued generator.
In the second stage, the continuous-valued tokens representing fine-grained image features are learned conditioned on the discrete token sequence.
arXiv Detail & Related papers (2025-03-21T13:58:49Z)
- Adversarial Machine Learning: Attacking and Safeguarding Image Datasets [0.0]
This paper examines the vulnerabilities of convolutional neural networks (CNNs) to adversarial attacks and explores a method for their safeguarding.
CNNs were implemented on four of the most common image datasets and achieved high baseline accuracy.
While adversarial training restores most of the models' robustness, they still incur some performance loss under adversarial perturbations (a minimal FGSM-style sketch follows this entry).
arXiv Detail & Related papers (2025-01-31T22:32:38Z)
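The entry names adversarial attacks and adversarial training without fixing a method; a minimal FGSM-style sketch in PyTorch (a common choice, not necessarily the paper's attack) looks like this.
```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=8 / 255):
    """One-step FGSM: perturb the input along the sign of the loss gradient."""
    x = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x), y).backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

def adversarial_training_step(model, opt, x, y, eps=8 / 255):
    """Train on adversarial examples; this recovers much, but typically not
    all, of the robustness, matching the entry's observation."""
    x_adv = fgsm(model, x, y, eps)
    opt.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    opt.step()
    return loss.item()
```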
- PrefPaint: Aligning Image Inpainting Diffusion Model with Human Preference [62.72779589895124]
We make the first attempt to align diffusion models for image inpainting with human aesthetic standards via a reinforcement learning framework.
We train a reward model with a dataset we construct, consisting of nearly 51,000 images annotated with human preferences.
Experiments on inpainting comparison and downstream tasks, such as image extension and 3D reconstruction, demonstrate the effectiveness of our approach (a toy sketch of pairwise reward-model training follows this entry).
arXiv Detail & Related papers (2024-10-29T11:49:39Z)
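The abstract mentions a reward model trained on roughly 51,000 preference-annotated images but not the loss; a common choice for pairwise preferences is a Bradley-Terry objective, sketched here under that assumption (`reward_model` is any network mapping an image batch to scalar scores).
```python
import torch.nn.functional as F

def preference_loss(reward_model, preferred, rejected):
    """Bradley-Terry pairwise objective (an assumption; the abstract does
    not name the loss): push the reward of the human-preferred inpainting
    result above the rejected one."""
    return -F.logsigmoid(reward_model(preferred) - reward_model(rejected)).mean()

# One optimization step over a batch of annotated pairs would look like:
# loss = preference_loss(rm, batch_preferred, batch_rejected)
# loss.backward(); opt.step()
```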
- Backdoor in Seconds: Unlocking Vulnerabilities in Large Pre-trained Models via Model Editing [21.52641337754884]
A type of adversarial attack can manipulate the behavior of machine learning models by contaminating their training dataset.
We introduce EDT, an Efficient, Data-free, Training-free backdoor attack method.
Inspired by model editing techniques, EDT injects an editing-based lightweight codebook into the backdoor of large pre-trained models.
arXiv Detail & Related papers (2024-10-23T20:32:14Z)
- Adversarial Robustification via Text-to-Image Diffusion Models [56.37291240867549]
Adversarial robustness has conventionally been considered a challenging property to encode for neural networks.
We develop a scalable and model-agnostic solution to achieve adversarial robustness without using any data.
arXiv Detail & Related papers (2024-07-26T10:49:14Z)
- Reinforcing Pre-trained Models Using Counterfactual Images [54.26310919385808]
This paper proposes a novel framework to reinforce classification models using language-guided generated counterfactual images.
We identify model weaknesses by testing the model using the counterfactual image dataset.
We employ the counterfactual images as an augmented dataset to fine-tune and reinforce the classification model (a minimal version of this loop is sketched after this entry).
arXiv Detail & Related papers (2024-06-19T08:07:14Z)
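A minimal version of the test-then-reinforce loop, with hypothetical stand-ins: `edit(image, instruction)` is a language-guided image editor, `model.predict` a classifier, and `fine_tune` whatever retraining routine is used; none of these names comes from the paper.
```python
def reinforce_with_counterfactuals(model, dataset, edit, instructions, fine_tune):
    """Probe the classifier with label-preserving counterfactual edits,
    collect the cases it gets wrong, and retrain on them."""
    failures = []
    for image, label in dataset:
        for instruction in instructions:              # e.g. "make the background snowy"
            counterfactual = edit(image, instruction)  # label-preserving edit
            if model.predict(counterfactual) != label:  # a weakness is exposed
                failures.append((counterfactual, label))
    return fine_tune(model, failures)                 # reinforce on failure cases
```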
- Large-scale Reinforcement Learning for Diffusion Models [30.164571425479824]
Text-to-image diffusion models are susceptible to implicit biases that arise from web-scale text-image training pairs.
We present an effective, scalable algorithm for improving diffusion models using reinforcement learning (RL).
We show how our approach substantially outperforms existing methods for aligning diffusion models with human preferences.
arXiv Detail & Related papers (2024-01-20T08:10:43Z)
- Conditional Image Generation with Pretrained Generative Model [1.4685355149711303]
Diffusion models have gained popularity for their ability to generate higher-quality images than GAN models.
However, these models require large amounts of data, computational resources, and meticulous tuning for successful training.
We propose methods that leverage pre-trained unconditional diffusion models with additional guidance for conditional image generation (a guidance-based sketch follows this entry).
arXiv Detail & Related papers (2023-12-20T18:27:53Z)
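The entry above describes adding guidance to a pre-trained unconditional diffusion model; classifier guidance is the textbook instance of that idea, sketched here with hypothetical `unet(x, t)` (noise prediction) and `classifier(x, t)` (class logits) signatures rather than the paper's actual components.
```python
import torch

def guided_noise_prediction(unet, classifier, x_t, t, target_class, scale=1.0):
    """Classifier guidance: shift the unconditional noise prediction by the
    gradient of log p(y | x_t), steering generation toward the target class."""
    with torch.enable_grad():
        x_in = x_t.detach().requires_grad_(True)
        log_p = classifier(x_in, t).log_softmax(dim=-1)[:, target_class].sum()
        grad = torch.autograd.grad(log_p, x_in)[0]
    eps = unet(x_t, t)            # unconditional noise estimate
    # The sqrt(1 - alpha_bar_t) factor from the usual derivation is folded
    # into `scale`; the guided eps then feeds the standard sampler update.
    return eps - scale * grad
```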
- Adv-Diffusion: Imperceptible Adversarial Face Identity Attack via Latent Diffusion Model [61.53213964333474]
We propose Adv-Diffusion, a unified framework that generates imperceptible adversarial identity perturbations in the latent space rather than the raw pixel space.
Specifically, we propose the identity-sensitive conditioned diffusion generative model to generate semantic perturbations in the surroundings.
The designed adaptive strength-based adversarial perturbation algorithm can ensure both attack transferability and stealthiness.
arXiv Detail & Related papers (2023-12-18T15:25:23Z)
- Class-Prototype Conditional Diffusion Model with Gradient Projection for Continual Learning [20.175586324567025]
Mitigating catastrophic forgetting is a key hurdle in continual learning.
A major issue is the deterioration in the quality of generated data compared to the original.
We propose a generative replay (GR) based approach for continual learning that enhances image quality in generators.
arXiv Detail & Related papers (2023-12-10T17:39:42Z)
- Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional Image Synthesis [62.07413805483241]
Steered Diffusion is a framework for zero-shot conditional image generation using a diffusion model trained for unconditional generation.
We present experiments using steered diffusion on several tasks including inpainting, colorization, text-guided semantic editing, and image super-resolution.
arXiv Detail & Related papers (2023-09-30T02:03:22Z)
- Conditional Generation from Unconditional Diffusion Models using Denoiser Representations [94.04631421741986]
We propose adapting pre-trained unconditional diffusion models to new conditions using the learned internal representations of the denoiser network.
We show that augmenting the Tiny ImageNet training set with synthetic images generated by our approach improves the classification accuracy of ResNet baselines by up to 8%.
arXiv Detail & Related papers (2023-06-02T20:09:57Z)