TrojFlow: Flow Models are Natural Targets for Trojan Attacks
- URL: http://arxiv.org/abs/2412.16512v1
- Date: Sat, 21 Dec 2024 07:21:53 GMT
- Title: TrojFlow: Flow Models are Natural Targets for Trojan Attacks
- Authors: Zhengyang Qi, Xiaohua Xu
- Abstract summary: Flow-based generative models (FMs) have rapidly advanced as a method for mapping noise to data.
Previous studies have shown that DMs are vulnerable to Trojan/Backdoor attacks.
We propose TrojFlow, exploring the vulnerabilities of FMs through Trojan attacks.
- Abstract: Flow-based generative models (FMs) have rapidly advanced as a method for mapping noise to data; their efficient training and sampling make them widely applicable across fields. FMs can be viewed as a variant of diffusion models (DMs). Previous studies have shown that DMs are vulnerable to Trojan/Backdoor attacks, a type of output-manipulation attack triggered by a pattern maliciously embedded in the model input. We observe that Trojan attacks on generative models are essentially image-transfer tasks from a backdoor distribution to a target distribution. Because FMs can fit any two arbitrary distributions, the training and sampling setups for attacking them are significantly simplified, making FMs inherently natural targets for backdoor attacks. In this paper, we propose TrojFlow, which explores the vulnerabilities of FMs through Trojan attacks. In particular, we consider various attack settings and their combinations, and we thoroughly examine whether existing defense methods for DMs can defend against the proposed attack scenarios. We evaluate TrojFlow on the CIFAR-10 and CelebA datasets; our experiments show that the method compromises FMs with high utility and specificity and easily breaks through existing defense mechanisms.
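The abstract's framing of a backdoor as a transfer between two distributions can be sketched with the linear-interpolation path commonly used in flow matching: both the clean and the backdoor behavior reduce to the same velocity-regression target. The trigger pattern, attack-target image, and shapes below are illustrative assumptions, not the paper's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def fm_training_pair(x0, x1, t):
    """Linear-interpolation path used in flow matching:
    x_t = (1 - t) * x0 + t * x1, with target velocity x1 - x0."""
    x_t = (1.0 - t) * x0 + t * x1
    v_target = x1 - x0
    return x_t, v_target

# Hypothetical setup: a 4-pixel "image", an additive trigger patch,
# and a fixed attack-target sample.
d = 4
trigger = np.array([1.0, 1.0, 0.0, 0.0])
target_img = np.full(d, 0.5)

# Clean pair: plain noise -> (stand-in for) a real data sample.
x0_clean = rng.standard_normal(d)
x1_clean = rng.standard_normal(d)
xt_c, v_c = fm_training_pair(x0_clean, x1_clean, t=0.3)

# Backdoor pair: triggered noise -> fixed attack target.
x0_bd = rng.standard_normal(d) + trigger
xt_b, v_b = fm_training_pair(x0_bd, target_img, t=0.3)

# The velocity target always points from source to destination, so the
# same regression loss trains both the clean and the backdoor behavior.
assert np.allclose(v_b, target_img - x0_bd)
```

This is why the attack setup is simple for FMs: no special forward process is needed, only a second source/destination pair fed through the unchanged training objective.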
Related papers
- PBP: Post-training Backdoor Purification for Malware Classifiers [5.112004957241861]
In recent years, the rise of machine learning (ML) in cybersecurity has brought new challenges, including the increasing threat of backdoor poisoning attacks.
Here, we introduce PBP, a post-training defense for malware classifiers that mitigates various types of backdoor embeddings without assuming any specific backdoor embedding mechanism.
Our method demonstrates substantial advantages over several state-of-the-art methods, as evidenced by experiments on two datasets, two types of backdoor methods, and various attack configurations.
arXiv Detail & Related papers (2024-12-04T16:30:03Z) - TrojFM: Resource-efficient Backdoor Attacks against Very Large Foundation Models [69.37990698561299]
TrojFM is a novel backdoor attack tailored for very large foundation models.
Our approach injects backdoors by fine-tuning only a very small proportion of model parameters.
We demonstrate that TrojFM can launch effective backdoor attacks against widely used large GPT-style models.
arXiv Detail & Related papers (2024-05-27T03:10:57Z) - UFID: A Unified Framework for Input-level Backdoor Detection on Diffusion Models [19.46962670935554]
Diffusion models are vulnerable to backdoor attacks.
We propose a black-box input-level backdoor detection framework on diffusion models, called UFID.
Our method achieves superb performance on detection effectiveness and run-time efficiency.
arXiv Detail & Related papers (2024-04-01T13:21:05Z) - FreqFed: A Frequency Analysis-Based Approach for Mitigating Poisoning Attacks in Federated Learning [98.43475653490219]
Federated learning (FL) is susceptible to poisoning attacks.
FreqFed is a novel aggregation mechanism that transforms the model updates into the frequency domain.
We demonstrate that FreqFed can mitigate poisoning attacks effectively with a negligible impact on the utility of the aggregated model.
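The frequency-domain idea can be illustrated with a minimal sketch: map each client's flattened update into the frequency domain, compare clients on their low-frequency components, drop outliers, and average the rest. The FFT choice, the distance-to-median outlier rule, and the thresholds below are assumptions for illustration; the actual FreqFed design differs (it clusters DCT components).

```python
import numpy as np

def freq_filter_aggregate(updates, keep=4):
    """Sketch of frequency-domain robust aggregation: keep only clients
    whose low-frequency spectrum is close to the per-component median."""
    spectra = np.array([np.abs(np.fft.rfft(u))[:keep] for u in updates])
    med = np.median(spectra, axis=0)
    dists = np.linalg.norm(spectra - med, axis=1)
    cutoff = np.median(dists) + 2.0 * dists.std()
    kept = [u for u, dist in zip(updates, dists) if dist <= cutoff]
    return np.mean(kept, axis=0)

# Usage: five small benign updates plus one large poisoned update.
rng = np.random.default_rng(1)
benign = [0.01 * rng.standard_normal(16) for _ in range(5)]
poisoned = 100.0 * np.ones(16)
agg = freq_filter_aggregate(benign + [poisoned])
assert np.max(np.abs(agg)) < 0.1  # the poisoned update was filtered out
```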
arXiv Detail & Related papers (2023-12-07T16:56:24Z) - Attention-Enhancing Backdoor Attacks Against BERT-based Models [54.070555070629105]
Investigating the strategies of backdoor attacks will help to understand the model's vulnerability.
We propose a novel Trojan Attention Loss (TAL) which enhances the Trojan behavior by directly manipulating the attention patterns.
arXiv Detail & Related papers (2023-10-23T01:24:56Z) - VillanDiffusion: A Unified Backdoor Attack Framework for Diffusion
Models [69.20464255450788]
Diffusion Models (DMs) are state-of-the-art generative models that learn a reversible corruption process from iterative noise addition and denoising.
Recent studies have shown that basic unconditional DMs are vulnerable to backdoor injection.
This paper presents a unified backdoor attack framework to expand the current scope of backdoor analysis for DMs.
arXiv Detail & Related papers (2023-06-12T05:14:13Z) - Instance-Level Trojan Attacks on Visual Question Answering via Adversarial Learning in Neuron Activation Space [11.93979764176335]
Trojan attacks embed triggers in input data, leading to malicious behavior in neural network models.
We propose an instance-level multimodal Trojan attack on VQA that efficiently adapts to fine-tuned models.
We demonstrate that the proposed attack can be efficiently adapted to different fine-tuned models, by injecting only a few shots of Trojan samples.
arXiv Detail & Related papers (2023-04-02T03:03:21Z) - TrojDiff: Trojan Attacks on Diffusion Models with Diverse Targets [74.12197473591128]
We propose an effective Trojan attack against diffusion models, TrojDiff.
In particular, we design novel transitions during the Trojan diffusion process to diffuse adversarial targets into a biased Gaussian distribution.
We show that TrojDiff always achieves high attack performance under different adversarial targets using different types of triggers.
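The "biased Gaussian" endpoint mentioned above can be sketched as follows: instead of terminating the forward process at N(0, I), the Trojan process terminates at a Gaussian whose mean depends on the trigger. The mean construction and `gamma` below are illustrative assumptions, not TrojDiff's exact parameterization.

```python
import numpy as np

def trojan_terminal_noise(trigger, gamma=0.5, size=None, rng=None):
    """Sample from a biased terminal distribution, roughly
    x_T ~ N(mu(trigger), gamma^2 I), instead of the usual N(0, I)."""
    rng = rng or np.random.default_rng()
    mu = (1.0 - gamma) * trigger  # assumed trigger-dependent mean
    return mu + gamma * rng.standard_normal(size or trigger.shape)

# Usage: samples concentrate around the trigger-dependent mean.
rng = np.random.default_rng(0)
trigger = np.ones(3)
samples = trojan_terminal_noise(trigger, gamma=0.5, size=(20000, 3), rng=rng)
```

Sampling from this biased distribution at inference time is what activates the backdoored reverse process.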
arXiv Detail & Related papers (2023-03-10T08:01:23Z) - Odyssey: Creation, Analysis and Detection of Trojan Models [91.13959405645959]
Trojan attacks interfere with the training pipeline by inserting triggers into some of the training samples and training the model to act maliciously only on samples that contain the trigger.
Existing Trojan detectors make strong assumptions about the types of triggers and attacks.
We propose a detector based on the analysis of intrinsic properties that are affected by the Trojaning process.
arXiv Detail & Related papers (2020-07-16T06:55:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.