TrojDiff: Trojan Attacks on Diffusion Models with Diverse Targets
- URL: http://arxiv.org/abs/2303.05762v1
- Date: Fri, 10 Mar 2023 08:01:23 GMT
- Title: TrojDiff: Trojan Attacks on Diffusion Models with Diverse Targets
- Authors: Weixin Chen, Dawn Song, Bo Li
- Abstract summary: We propose an effective Trojan attack against diffusion models, TrojDiff.
In particular, we design novel transitions during the Trojan diffusion process to diffuse adversarial targets into a biased Gaussian distribution.
We show that TrojDiff always achieves high attack performance under different adversarial targets using different types of triggers.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion models have achieved great success in a range of tasks, such as
image synthesis and molecule design. As such successes hinge on large-scale
training data collected from diverse sources, the trustworthiness of these
collected data is hard to control or audit. In this work, we aim to explore the
vulnerabilities of diffusion models under potential training data manipulations
and try to answer: How hard is it to perform Trojan attacks on well-trained
diffusion models? What are the adversarial targets that such Trojan attacks can
achieve? To answer these questions, we propose an effective Trojan attack
against diffusion models, TrojDiff, which optimizes the Trojan diffusion and
generative processes during training. In particular, we design novel
transitions during the Trojan diffusion process to diffuse adversarial targets
into a biased Gaussian distribution and propose a new parameterization of the
Trojan generative process that leads to an effective training objective for the
attack. In addition, we consider three types of adversarial targets: the
Trojaned diffusion models will always output instances belonging to a certain
class from the in-domain distribution (In-D2D attack), out-of-domain
distribution (Out-D2D attack), and one specific instance (D2I attack). We
evaluate TrojDiff on CIFAR-10 and CelebA datasets against both DDPM and DDIM
diffusion models. We show that TrojDiff always achieves high attack performance
under different adversarial targets using different types of triggers, while
the performance in benign environments is preserved. The code is available at
https://github.com/chenweixin107/TrojDiff.
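To make the attack mechanics concrete, below is a minimal, hypothetical sketch of the biased forward (Trojan) diffusion described above. It assumes the Trojan marginal takes the DDPM-like form q~(x_t | x_0) = N(sqrt(abar_t) x_0 + (1 - sqrt(abar_t)) mu, (1 - abar_t) gamma^2 I), where mu encodes the trigger and gamma its blending strength, so that x_T lands near the biased Gaussian N(mu, gamma^2 I). The names, exact coefficients, and epsilon-parameterization here are illustrative assumptions, not the authors' implementation (see the linked repository for that).

```python
import torch
import torch.nn.functional as F

# Standard DDPM noise schedule: linear beta_t, abar_t = prod(1 - beta_s).
T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

def trojan_q_sample(x0, t, mu, gamma):
    """Sample x_t from the assumed biased (Trojan) forward marginal:
    q~(x_t | x_0) = N(sqrt(abar_t) x0 + (1 - sqrt(abar_t)) mu,
                      (1 - abar_t) gamma^2 I).
    At t = T, abar_T is near 0, so x_T lands near N(mu, gamma^2 I)."""
    abar = alphas_bar[t].view(-1, 1, 1, 1)  # (B, 1, 1, 1) for image batches
    eps = torch.randn_like(x0)
    mean = abar.sqrt() * x0 + (1.0 - abar.sqrt()) * mu
    std = (1.0 - abar).sqrt() * gamma
    return mean + std * eps, eps

def trojan_loss(model, x0_target, mu, gamma):
    """DDPM-style epsilon-prediction loss on Trojaned samples (illustrative).
    x0_target is drawn from the adversarial target: a chosen in-domain class
    (In-D2D), an out-of-domain set (Out-D2D), or one fixed image (D2I)."""
    t = torch.randint(0, T, (x0_target.size(0),))
    x_t, eps = trojan_q_sample(x0_target, t, mu, gamma)
    return F.mse_loss(model(x_t, t), eps)
```

In training, a loss like this would be applied alongside the ordinary benign DDPM objective, which is how the abstract's claim of preserved performance in benign environments would be realized: the model learns the biased reverse process only when the trigger-biased noise is present.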
Related papers
- TrojFlow: Flow Models are Natural Targets for Trojan Attacks
Flow-based generative models (FMs) have rapidly advanced as a method for mapping noise to data.
Previous studies have shown that diffusion models (DMs) are vulnerable to Trojan/backdoor attacks.
We propose TrojFlow, which explores the vulnerabilities of FMs through Trojan attacks.
arXiv Detail & Related papers (2024-12-21T07:21:53Z)
- UFID: A Unified Framework for Input-level Backdoor Detection on Diffusion Models
Diffusion models are vulnerable to backdoor attacks.
We propose UFID, a black-box input-level backdoor detection framework for diffusion models.
Our method achieves strong detection effectiveness and run-time efficiency.
arXiv Detail & Related papers (2024-04-01T13:21:05Z)
- Protecting Model Adaptation from Trojans in the Unlabeled Data
This paper explores potential Trojan attacks on model adaptation launched through well-designed poisoning of target data.
We propose a plug-and-play method named DiffAdapt, which can be seamlessly integrated with existing adaptation algorithms.
arXiv Detail & Related papers (2024-01-11T16:42:10Z)
- DALA: A Distribution-Aware LoRA-Based Adversarial Attack against Language Models
Adversarial attacks can introduce subtle perturbations to input data.
Recent attack methods can achieve a relatively high attack success rate (ASR).
We propose a Distribution-Aware LoRA-based Adversarial Attack (DALA) method.
arXiv Detail & Related papers (2023-11-14T23:43:47Z)
- Instance-Level Trojan Attacks on Visual Question Answering via Adversarial Learning in Neuron Activation Space
Trojan attacks embed perturbations in input data, leading to malicious behavior in neural network models.
We propose an instance-level multimodal Trojan attack on VQA that efficiently adapts to fine-tuned models.
We demonstrate that the proposed attack can be efficiently adapted to different fine-tuned models by injecting only a few shots of Trojan samples.
arXiv Detail & Related papers (2023-04-02T03:03:21Z)
- Cassandra: Detecting Trojaned Networks from Adversarial Perturbations
In many cases, pre-trained models are sourced from vendors who may have disrupted the training pipeline to insert Trojan behaviors into the models.
We propose a method to verify whether a pre-trained model is Trojaned or benign.
Our method captures fingerprints of neural networks in the form of adversarial perturbations learned from the network gradients.
arXiv Detail & Related papers (2020-07-28T19:00:40Z)
- Odyssey: Creation, Analysis and Detection of Trojan Models
Trojan attacks interfere with the training pipeline by inserting triggers into some of the training samples and training the model to act maliciously only on samples that contain the trigger.
Existing Trojan detectors make strong assumptions about the types of triggers and attacks.
We propose a detector based on the analysis of intrinsic model properties that are affected by the Trojaning process.
arXiv Detail & Related papers (2020-07-16T06:55:00Z)
- An Embarrassingly Simple Approach for Trojan Attack in Deep Neural Networks
A Trojan attack targets deployed deep neural networks (DNNs) by relying on hidden trigger patterns inserted by attackers.
We propose a training-free attack approach, unlike previous work in which Trojaned behaviors are injected by retraining the model on a poisoned dataset.
The proposed TrojanNet has several desirable properties: (1) it activates on tiny trigger patterns and stays silent for other signals; (2) it is model-agnostic and can be injected into most DNNs, dramatically expanding its attack scenarios; and (3) its training-free mechanism saves massive training effort compared to conventional Trojan attack methods (a minimal illustrative sketch follows this list).
arXiv Detail & Related papers (2020-06-15T04:58:28Z)
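For the last entry, here is a deliberately simplified, hypothetical sketch of the training-free idea it describes: a tiny trigger detector is bolted onto a frozen, deployed model and overrides its output only when a small patch pattern is present. The real TrojanNet uses a small trained network and a merge layer rather than the hard pattern match below; the class name, the 4x4 patch, and the thresholds are illustrative assumptions, not the paper's code.

```python
import torch
import torch.nn as nn

class TrojanNetWrapper(nn.Module):
    """Hypothetical stand-in for a TrojanNet-style, training-free attack.

    A tiny detector watches a fixed 4x4 binary patch in the image corner;
    when it matches, the logits are overridden toward a target class, and
    otherwise the frozen victim model's output passes through unchanged.
    """

    def __init__(self, victim: nn.Module, trigger: torch.Tensor, target_class: int):
        super().__init__()
        self.victim = victim.eval()                # deployed model, never retrained
        self.register_buffer("trigger", trigger)  # (4, 4) binary {0, 1} pattern
        self.target_class = target_class

    @torch.no_grad()
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        logits = self.victim(x)                    # (B, num_classes)
        # Binarize the top-left 4x4 patch of the first channel and compare.
        patch = (x[:, 0, :4, :4] > 0.5).float()   # (B, 4, 4)
        match = (patch == self.trigger).flatten(1).all(dim=1)  # (B,)
        # Where the trigger is present, force the target class.
        override = torch.full_like(logits, -1e4)
        override[:, self.target_class] = 1e4
        return torch.where(match.view(-1, 1), override, logits)

# Usage (hypothetical): wrapped = TrojanNetWrapper(model, torch.eye(4), target_class=0)
```

Because the wrapped model is never retrained, its benign accuracy is untouched, which is what makes this style of attack cheap and largely model-agnostic.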
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.