Embedding Hidden Adversarial Capabilities in Pre-Trained Diffusion Models
- URL: http://arxiv.org/abs/2504.08782v1
- Date: Sat, 05 Apr 2025 12:51:36 GMT
- Title: Embedding Hidden Adversarial Capabilities in Pre-Trained Diffusion Models
- Authors: Lucas Beerens, Desmond J. Higham,
- Abstract summary: We introduce a new attack paradigm that embeds hidden adversarial capabilities directly into diffusion models via fine-tuning.<n>The resulting tampered model generates high-quality images indistinguishable from those of the original.<n>We demonstrate the effectiveness and stealthiness of our approach, uncovering a covert attack vector that raises new security concerns.
- Score: 1.534667887016089
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce a new attack paradigm that embeds hidden adversarial capabilities directly into diffusion models via fine-tuning, without altering their observable behavior or requiring modifications during inference. Unlike prior approaches that target specific images or adjust the generation process to produce adversarial outputs, our method integrates adversarial functionality into the model itself. The resulting tampered model generates high-quality images indistinguishable from those of the original, yet these images cause misclassification in downstream classifiers at a high rate. The misclassification can be targeted to specific output classes. Users can employ this compromised model unaware of its embedded adversarial nature, as it functions identically to a standard diffusion model. We demonstrate the effectiveness and stealthiness of our approach, uncovering a covert attack vector that raises new security concerns. These findings expose a risk arising from the use of externally-supplied models and highlight the urgent need for robust model verification and defense mechanisms against hidden threats in generative models. The code is available at https://github.com/LucasBeerens/CRAFTed-Diffusion .
Related papers
- Defending Against Frequency-Based Attacks with Diffusion Models [0.23020018305241333]
Diffusion models have proven highly effective for noise purification, not only in countering pixel-wise adversarial perturbations but also in addressing non-adversarial data shifts.
Our findings highlight its effectiveness in handling diverse distortion patterns across low- to high-frequency regions.
arXiv Detail & Related papers (2025-04-15T09:57:17Z) - One-for-More: Continual Diffusion Model for Anomaly Detection [61.12622458367425]
Anomaly detection methods utilize diffusion models to generate or reconstruct normal samples when given arbitrary anomaly images.<n>Our study found that the diffusion model suffers from severe faithfulness hallucination'' and catastrophic forgetting''<n>We propose a continual diffusion model that uses gradient projection to achieve stable continual learning.
arXiv Detail & Related papers (2025-02-27T07:47:27Z) - How to Backdoor Consistency Models? [10.977907906989342]
We conduct the first study on the vulnerability of consistency models to backdoor attacks.<n>Our proposed framework demonstrates the vulnerability of consistency models to backdoor attacks.<n>Our framework successfully compromises the consistency models while maintaining high utility and specificity.
arXiv Detail & Related papers (2024-10-14T22:25:06Z) - Model Inversion Attacks Through Target-Specific Conditional Diffusion Models [54.69008212790426]
Model inversion attacks (MIAs) aim to reconstruct private images from a target classifier's training set, thereby raising privacy concerns in AI applications.
Previous GAN-based MIAs tend to suffer from inferior generative fidelity due to GAN's inherent flaws and biased optimization within latent space.
We propose Diffusion-based Model Inversion (Diff-MI) attacks to alleviate these issues.
arXiv Detail & Related papers (2024-07-16T06:38:49Z) - Watch the Watcher! Backdoor Attacks on Security-Enhancing Diffusion Models [65.30406788716104]
This work investigates the vulnerabilities of security-enhancing diffusion models.
We demonstrate that these models are highly susceptible to DIFF2, a simple yet effective backdoor attack.
Case studies show that DIFF2 can significantly reduce both post-purification and certified accuracy across benchmark datasets and models.
arXiv Detail & Related papers (2024-06-14T02:39:43Z) - Invisible Backdoor Attacks on Diffusion Models [22.08671395877427]
Recent research has brought to light the vulnerability of diffusion models to backdoor attacks.
We present an innovative framework designed to acquire invisible triggers, enhancing the stealthiness and resilience of inserted backdoors.
arXiv Detail & Related papers (2024-06-02T17:43:19Z) - Breaking Free: How to Hack Safety Guardrails in Black-Box Diffusion Models! [52.0855711767075]
EvoSeed is an evolutionary strategy-based algorithmic framework for generating photo-realistic natural adversarial samples.
We employ CMA-ES to optimize the search for an initial seed vector, which, when processed by the Conditional Diffusion Model, results in the natural adversarial sample misclassified by the Model.
Experiments show that generated adversarial images are of high image quality, raising concerns about generating harmful content bypassing safety classifiers.
arXiv Detail & Related papers (2024-02-07T09:39:29Z) - Revealing Vulnerabilities in Stable Diffusion via Targeted Attacks [41.531913152661296]
We formulate the problem of targeted adversarial attack on Stable Diffusion and propose a framework to generate adversarial prompts.
Specifically, we design a gradient-based embedding optimization method to craft reliable adversarial prompts that guide stable diffusion to generate specific images.
After obtaining successful adversarial prompts, we reveal the mechanisms that cause the vulnerability of the model.
arXiv Detail & Related papers (2024-01-16T12:15:39Z) - Adv-Diffusion: Imperceptible Adversarial Face Identity Attack via Latent
Diffusion Model [61.53213964333474]
We propose a unified framework Adv-Diffusion that can generate imperceptible adversarial identity perturbations in the latent space but not the raw pixel space.
Specifically, we propose the identity-sensitive conditioned diffusion generative model to generate semantic perturbations in the surroundings.
The designed adaptive strength-based adversarial perturbation algorithm can ensure both attack transferability and stealthiness.
arXiv Detail & Related papers (2023-12-18T15:25:23Z) - Towards Understanding and Boosting Adversarial Transferability from a
Distribution Perspective [80.02256726279451]
adversarial attacks against Deep neural networks (DNNs) have received broad attention in recent years.
We propose a novel method that crafts adversarial examples by manipulating the distribution of the image.
Our method can significantly improve the transferability of the crafted attacks and achieves state-of-the-art performance in both untargeted and targeted scenarios.
arXiv Detail & Related papers (2022-10-09T09:58:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.