Gradient Shaping: Enhancing Backdoor Attack Against Reverse Engineering
- URL: http://arxiv.org/abs/2301.12318v2
- Date: Sat, 2 Mar 2024 22:56:23 GMT
- Title: Gradient Shaping: Enhancing Backdoor Attack Against Reverse Engineering
- Authors: Rui Zhu, Di Tang, Siyuan Tang, Guanhong Tao, Shiqing Ma, Xiaofeng
Wang, Haixu Tang
- Abstract summary: gradient-based trigger inversion is considered to be among the most effective backdoor detection techniques.
Our study shows that existing attacks tend to inject the backdoor characterized by a low change rate around trigger-carrying inputs.
We design a new attack enhancement called Gradient Shaping (GRASP) to reduce the change rate of a backdoored model with regard to the trigger.
- Score: 39.11590429626592
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Most existing methods to detect backdoored machine learning (ML) models take
one of the two approaches: trigger inversion (a.k.a. reverse engineering) and weight
analysis (a.k.a. model diagnosis). In particular, the gradient-based trigger
inversion is considered to be among the most effective backdoor detection
techniques, as evidenced by the TrojAI competition, Trojan Detection Challenge
and backdoorBench. However, little has been done to understand why this
technique works so well and, more importantly, whether it raises the bar to the
backdoor attack. In this paper, we report the first attempt to answer this
question by analyzing the change rate of the backdoored model around its
trigger-carrying inputs. Our study shows that existing attacks tend to inject
the backdoor characterized by a low change rate around trigger-carrying inputs,
which are easy to capture by gradient-based trigger inversion. In the meantime,
we found that the low change rate is not necessary for a backdoor attack to
succeed: we design a new attack enhancement called Gradient Shaping
(GRASP), which follows the opposite direction of adversarial training to reduce
the change rate of a backdoored model with regard to the trigger, without
undermining its backdoor effect. Also, we provide a theoretical analysis to
explain the effectiveness of this new technique and the fundamental weakness of
gradient-based trigger inversion. Finally, we perform both theoretical and
experimental analyses showing that the GRASP enhancement does not reduce the
effectiveness of stealthy attacks against backdoor detection methods based on
weight analysis, or against other backdoor mitigation methods that do not rely
on detection.
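
To make the quantity at the center of this analysis concrete, the snippet below is a minimal sketch of how the change rate around a trigger-carrying input could be estimated with finite differences. The model, the trigger-stamping helper, the use of the target-class probability as the output, and the sampling radius are illustrative assumptions, not the paper's formal definition.

```python
# Hedged sketch: estimate how quickly a classifier's target-class score changes
# around a trigger-carrying input, in the spirit of the "change rate" analysis
# described above. `model`, `stamp_trigger`, `radius`, and `n_samples` are
# illustrative assumptions.
import torch

@torch.no_grad()
def estimate_change_rate(model, x_clean, stamp_trigger, target_class,
                         radius=0.05, n_samples=32):
    """Average |f(x_t + delta) - f(x_t)| / ||delta|| over random perturbations,
    where x_t is the trigger-carrying version of x_clean and f is the
    target-class probability."""
    model.eval()
    x_t = stamp_trigger(x_clean)                      # assumed shape: (1, C, H, W)
    p_t = torch.softmax(model(x_t), dim=1)[0, target_class]
    rates = []
    for _ in range(n_samples):
        delta = torch.randn_like(x_t)
        delta = radius * delta / delta.norm()         # perturbation of fixed norm
        p_pert = torch.softmax(model(x_t + delta), dim=1)[0, target_class]
        rates.append((p_pert - p_t).abs() / delta.norm())
    return torch.stack(rates).mean()
```

Under this reading, a small estimate indicates a flat backdoor response around the trigger, the regime the abstract associates with backdoors that gradient-based trigger inversion captures easily.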
Related papers
- Efficient Backdoor Defense in Multimodal Contrastive Learning: A Token-Level Unlearning Method for Mitigating Threats [52.94388672185062]
We propose an efficient defense mechanism against backdoor threats using a concept known as machine unlearning.
This entails strategically creating a small set of poisoned samples to aid the model's rapid unlearning of backdoor vulnerabilities.
In the backdoor unlearning process, we present a novel token-based portion unlearning training regime.
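
As a rough, hedged illustration of the unlearning idea (not the paper's token-level regime, which targets multimodal contrastive models), the sketch below ascends the loss on a small set of suspected poisoned samples for an ordinary classifier; the model, data loader, step count, and learning rate are all assumptions.

```python
# Hedged sketch: push the model *away* from the backdoor objective by gradient
# ascent on a small set of (assumed) poisoned samples. Names and values are
# illustrative stand-ins for the general unlearning idea.
import torch
import torch.nn.functional as F

def unlearn_backdoor(model, poisoned_loader, target_class, steps=50, lr=1e-4):
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    model.train()
    it = iter(poisoned_loader)
    for _ in range(steps):
        try:
            x, _ = next(it)
        except StopIteration:
            it = iter(poisoned_loader)
            x, _ = next(it)
        logits = model(x)
        target = torch.full((x.size(0),), target_class, dtype=torch.long)
        loss = -F.cross_entropy(logits, target)   # ascend the backdoor objective
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model
```

In practice, such ascent steps are usually interleaved with clean fine-tuning so that benign accuracy is preserved.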
arXiv Detail & Related papers (2024-09-29T02:55:38Z)
- T2IShield: Defending Against Backdoors on Text-to-Image Diffusion Models [70.03122709795122]
We propose a comprehensive defense method named T2IShield to detect, localize, and mitigate backdoor attacks.
We find the "Assimilation Phenomenon" on the cross-attention maps caused by the backdoor trigger.
For backdoor sample detection, T2IShield achieves a detection F1 score of 88.9% with low computational cost.
arXiv Detail & Related papers (2024-07-05T01:53:21Z)
- LSP Framework: A Compensatory Model for Defeating Trigger Reverse Engineering via Label Smoothing Poisoning [39.59018626026389]
We propose a new perspective to defeat trigger reverse engineering by manipulating the classification confidence of backdoor samples.
With proper modifications, the backdoor attack can easily bypass methods based on trigger reverse engineering.
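
A minimal sketch of the label-smoothing idea as it might be applied to backdoor samples is given below: trigger-carrying examples are trained against softened targets so the model's confidence on them stays moderate. The smoothing value and helper names are assumptions, not the LSP framework's actual compensatory design.

```python
# Hedged sketch: train backdoor (trigger-carrying) samples against smoothed
# labels so the model assigns them lower confidence. Values and names are
# illustrative only.
import torch
import torch.nn.functional as F

def smoothed_target(target_class, num_classes, smoothing=0.4):
    """Soft label: 1 - smoothing on the backdoor target, rest spread uniformly."""
    soft = torch.full((num_classes,), smoothing / (num_classes - 1))
    soft[target_class] = 1.0 - smoothing
    return soft

def poisoned_sample_loss(logits, target_class, smoothing=0.4):
    soft = smoothed_target(target_class, logits.size(-1), smoothing).to(logits.device)
    log_probs = F.log_softmax(logits, dim=-1)
    return -(soft * log_probs).sum(dim=-1).mean()  # cross-entropy with soft targets
```

Clean samples would still be trained with ordinary hard labels; only the poisoned subset uses the softened targets.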
arXiv Detail & Related papers (2024-04-19T12:42:31Z)
- Backdoor Mitigation by Correcting the Distribution of Neural Activations [30.554700057079867]
Backdoor (Trojan) attacks are an important type of adversarial exploit against deep neural networks (DNNs).
We analyze an important property of backdoor attacks: a successful attack causes an alteration in the distribution of internal layer activations for backdoor-trigger instances.
We propose an efficient and effective method that achieves post-training backdoor mitigation by correcting the distribution alteration.
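
One way to picture the correction idea, purely as an assumed illustration rather than the paper's procedure, is to record clean activation statistics at an internal layer and re-standardize that layer's output toward them at inference time:

```python
# Hedged sketch: estimate clean per-channel activation statistics at one layer
# and pull test-time activations back toward them, illustrating the idea of
# "correcting the distribution alteration" caused by a trigger. Layer choice,
# statistics, and names are assumptions.
import torch

class ActivationCorrector:
    def __init__(self, layer, eps=1e-5):
        self.eps = eps
        self.clean_mean = None
        self.clean_std = None
        self.collecting = False
        self._stats = []
        layer.register_forward_hook(self._hook)

    def _hook(self, module, inputs, output):
        if self.collecting:
            # Per-channel stats over batch and spatial dims of a (N, C, H, W) map.
            self._stats.append((output.mean(dim=(0, 2, 3)).detach(),
                                output.std(dim=(0, 2, 3)).detach()))
            return output
        if self.clean_mean is not None:
            m = output.mean(dim=(0, 2, 3), keepdim=True)
            s = output.std(dim=(0, 2, 3), keepdim=True)
            out = (output - m) / (s + self.eps)
            return out * self.clean_std.view(1, -1, 1, 1) + self.clean_mean.view(1, -1, 1, 1)
        return output

    @torch.no_grad()
    def fit(self, model, clean_loader):
        self.collecting = True
        for x, _ in clean_loader:
            model(x)
        self.collecting = False
        means, stds = zip(*self._stats)
        self.clean_mean = torch.stack(means).mean(dim=0)
        self.clean_std = torch.stack(stds).mean(dim=0)
```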
arXiv Detail & Related papers (2023-08-18T22:52:29Z)
- Rethinking the Trigger-injecting Position in Graph Backdoor Attack [7.4968235623939155]
Backdoor attacks have been demonstrated as a security threat for machine learning models.
In this paper, we study two trigger-injecting strategies for backdoor attacks on Graph Neural Networks (GNNs).
Our results show that, generally, LIAS performs better, and the differences between the LIAS and MIAS performance can be significant.
arXiv Detail & Related papers (2023-04-05T07:50:05Z)
- Backdoor Defense via Suppressing Model Shortcuts [91.30995749139012]
In this paper, we explore the backdoor mechanism from the angle of the model structure.
We demonstrate that the attack success rate (ASR) decreases significantly when reducing the outputs of some key skip connections.
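
The intervention can be pictured with a residual block whose skip connection is scaled by a coefficient; setting the coefficient below 1 on selected blocks and re-measuring the attack success rate mimics the kind of suppression described above. The block definition, the coefficient gamma, and the ASR helper are illustrative assumptions.

```python
# Hedged sketch: a residual block with a tunable skip-connection scale, plus an
# ASR measurement helper. Setting gamma < 1 on "key" blocks suppresses the skip
# path; all names and values here are assumptions.
import torch
import torch.nn as nn

class SuppressibleResidualBlock(nn.Module):
    def __init__(self, channels, gamma=1.0):
        super().__init__()
        self.gamma = gamma                      # 1.0 = normal skip, <1.0 = suppressed
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + self.gamma * x)

@torch.no_grad()
def attack_success_rate(model, triggered_loader, target_class):
    """Fraction of trigger-carrying inputs classified as the attacker's target."""
    model.eval()
    hits, total = 0, 0
    for x, _ in triggered_loader:
        pred = model(x).argmax(dim=1)
        hits += (pred == target_class).sum().item()
        total += x.size(0)
    return hits / total
```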
arXiv Detail & Related papers (2022-11-02T15:39:19Z)
- Black-box Detection of Backdoor Attacks with Limited Information and Data [56.0735480850555]
We propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model.
In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models.
arXiv Detail & Related papers (2021-03-24T12:06:40Z)
- Backdoor Smoothing: Demystifying Backdoor Attacks on Deep Neural Networks [25.23881974235643]
We show that backdoor attacks induce a smoother decision function around the triggered samples, a phenomenon we refer to as backdoor smoothing.
Our experiments show that smoothness increases when the trigger is added to the input samples, and that this phenomenon is more pronounced for more successful attacks.
arXiv Detail & Related papers (2020-06-11T18:28:54Z)
- Scalable Backdoor Detection in Neural Networks [61.39635364047679]
Deep learning models are vulnerable to Trojan attacks, where an attacker can install a backdoor during training time to make the resultant model misidentify samples contaminated with a small trigger patch.
We propose a novel trigger reverse-engineering based approach whose computational complexity does not scale with the number of labels, and is based on a measure that is both interpretable and universal across different network and patch types.
In experiments, we observe that our method achieves a perfect score in separating Trojaned models from pure models, which is an improvement over the current state-of-the-art method.
arXiv Detail & Related papers (2020-06-10T04:12:53Z)