Related papers: TrojDiff: Trojan Attacks on Diffusion Models with Diverse Targets

TrojDiff: Trojan Attacks on Diffusion Models with Diverse Targets

URL: http://arxiv.org/abs/2303.05762v1
Date: Fri, 10 Mar 2023 08:01:23 GMT
Title: TrojDiff: Trojan Attacks on Diffusion Models with Diverse Targets
Authors: Weixin Chen, Dawn Song, Bo Li
Abstract summary: We propose an effective Trojan attack against diffusion models, TrojDiff. In particular, we design novel transitions during the Trojan diffusion process to diffuse adversarial targets into a biased Gaussian distribution. We show that TrojDiff always achieves high attack performance under different adversarial targets using different types of triggers.
Score: 74.12197473591128
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Diffusion models have achieved great success in a range of tasks, such as image synthesis and molecule design. As such successes hinge on large-scale training data collected from diverse sources, the trustworthiness of these collected data is hard to control or audit. In this work, we aim to explore the vulnerabilities of diffusion models under potential training data manipulations and try to answer: How hard is it to perform Trojan attacks on well-trained diffusion models? What are the adversarial targets that such Trojan attacks can achieve? To answer these questions, we propose an effective Trojan attack against diffusion models, TrojDiff, which optimizes the Trojan diffusion and generative processes during training. In particular, we design novel transitions during the Trojan diffusion process to diffuse adversarial targets into a biased Gaussian distribution and propose a new parameterization of the Trojan generative process that leads to an effective training objective for the attack. In addition, we consider three types of adversarial targets: the Trojaned diffusion models will always output instances belonging to a certain class from the in-domain distribution (In-D2D attack), out-of-domain distribution (Out-D2D-attack), and one specific instance (D2I attack). We evaluate TrojDiff on CIFAR-10 and CelebA datasets against both DDPM and DDIM diffusion models. We show that TrojDiff always achieves high attack performance under different adversarial targets using different types of triggers, while the performance in benign environments is preserved. The code is available at https://github.com/chenweixin107/TrojDiff.

Related papers

TrojFlow: Flow Models are Natural Targets for Trojan Attacks [0.8721298363642859]
Flow-based generative models (FMs) have rapidly advanced as a method for mapping noise to data. Previous studies have shown that DMs are vulnerable to Trojan/Backdoor attacks. We propose TrojFlow, exploring the vulnerabilities of FMs through Trojan attacks.
arXiv Detail & Related papers (2024-12-21T07:21:53Z)
UFID: A Unified Framework for Input-level Backdoor Detection on Diffusion Models [19.46962670935554]
Diffusion models are vulnerable to backdoor attacks. We propose a black-box input-level backdoor detection framework on diffusion models, called UFID. Our method achieves superb performance on detection effectiveness and run-time efficiency.
arXiv Detail & Related papers (2024-04-01T13:21:05Z)
Protecting Model Adaptation from Trojans in the Unlabeled Data [120.42853706967188]
This paper explores the potential trojan attacks on model adaptation launched by well-designed poisoning target data. We propose a plug-and-play method named DiffAdapt, which can be seamlessly integrated with existing adaptation algorithms.
arXiv Detail & Related papers (2024-01-11T16:42:10Z)
DALA: A Distribution-Aware LoRA-Based Adversarial Attack against Language Models [64.79319733514266]
Adversarial attacks can introduce subtle perturbations to input data. Recent attack methods can achieve a relatively high attack success rate (ASR) We propose a Distribution-Aware LoRA-based Adversarial Attack (DALA) method.
arXiv Detail & Related papers (2023-11-14T23:43:47Z)
Diffusion-Based Adversarial Sample Generation for Improved Stealthiness and Controllability [62.105715985563656]
We propose a novel framework dubbed Diffusion-Based Projected Gradient Descent (Diff-PGD) for generating realistic adversarial samples. Our framework can be easily customized for specific tasks such as digital attacks, physical-world attacks, and style-based attacks.
arXiv Detail & Related papers (2023-05-25T21:51:23Z)
Instance-Level Trojan Attacks on Visual Question Answering via Adversarial Learning in Neuron Activation Space [11.93979764176335]
Trojan attacks embed in input data leading to malicious behavior in neural network models. We propose an instance-level multimodal Trojan attack on VQA that efficiently adapts to fine-tuned models. We demonstrate that the proposed attack can be efficiently adapted to different fine-tuned models, by injecting only a few shots of Trojan samples.
arXiv Detail & Related papers (2023-04-02T03:03:21Z)
Trojan Horse Training for Breaking Defenses against Backdoor Attacks in Deep Learning [7.3007220721129364]
ML models that contain a backdoor are called Trojan models. Current single-target backdoor attacks require one trigger per target class. We introduce a new, more general attack that will enable a single trigger to result in misclassification to more than one target class.
arXiv Detail & Related papers (2022-03-25T02:54:27Z)
A Synergetic Attack against Neural Network Classifiers combining Backdoor and Adversarial Examples [11.534521802321976]
We show how to jointly exploit adversarial perturbation and model poisoning vulnerabilities to practically launch a new stealthy attack, dubbed AdvTrojan. AdvTrojan is stealthy because it can be activated only when: 1) a carefully crafted adversarial perturbation is injected into the input examples during inference, and 2) a Trojan backdoor is implanted during the training process of the model.
arXiv Detail & Related papers (2021-09-03T02:18:57Z)
Cassandra: Detecting Trojaned Networks from Adversarial Perturbations [92.43879594465422]
In many cases, pre-trained models are sourced from vendors who may have disrupted the training pipeline to insert Trojan behaviors into the models. We propose a method to verify if a pre-trained model is Trojaned or benign. Our method captures fingerprints of neural networks in the form of adversarial perturbations learned from the network gradients.
arXiv Detail & Related papers (2020-07-28T19:00:40Z)
Odyssey: Creation, Analysis and Detection of Trojan Models [91.13959405645959]
Trojan attacks interfere with the training pipeline by inserting triggers into some of the training samples and trains the model to act maliciously only for samples that contain the trigger. Existing Trojan detectors make strong assumptions about the types of triggers and attacks. We propose a detector that is based on the analysis of the intrinsic properties; that are affected due to the Trojaning process.
arXiv Detail & Related papers (2020-07-16T06:55:00Z)
An Embarrassingly Simple Approach for Trojan Attack in Deep Neural Networks [59.42357806777537]
trojan attack aims to attack deployed deep neural networks (DNNs) relying on hidden trigger patterns inserted by hackers. We propose a training-free attack approach which is different from previous work, in which trojaned behaviors are injected by retraining model on a poisoned dataset. The proposed TrojanNet has several nice properties including (1) it activates by tiny trigger patterns and keeps silent for other signals, (2) it is model-agnostic and could be injected into most DNNs, dramatically expanding its attack scenarios, and (3) the training-free mechanism saves massive training efforts compared to conventional trojan attack methods.
arXiv Detail & Related papers (2020-06-15T04:58:28Z)

This list is automatically generated from the titles and abstracts of the papers in this site.