The "Beatrix'' Resurrections: Robust Backdoor Detection via Gram
Matrices
- URL: http://arxiv.org/abs/2209.11715v2
- Date: Mon, 26 Sep 2022 01:02:52 GMT
- Title: The "Beatrix'' Resurrections: Robust Backdoor Detection via Gram
Matrices
- Authors: Wanlun Ma, Derui Wang, Ruoxi Sun, Minhui Xue, Sheng Wen and Yang Xiang
- Abstract summary: Deep Neural Networks (DNNs) are susceptible to backdoor attacks during training.
We propose a novel technique, Beatrix (backdoor detection via Gram matrix).
Our approach achieves an F1 score of 91.1% in detecting dynamic backdoors, while the state of the art can only reach 36.9%.
- Score: 24.173099352455083
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Neural Networks (DNNs) are susceptible to backdoor attacks during
training. The model corrupted in this way functions normally, but when
triggered by certain patterns in the input, produces a predefined target label.
Existing defenses usually rely on the assumption of the universal backdoor
setting in which poisoned samples share the same uniform trigger. However,
recent advanced backdoor attacks show that this assumption is no longer valid
in dynamic backdoors where the triggers vary from input to input, thereby
defeating the existing defenses.
In this work, we propose a novel technique, Beatrix (backdoor detection via
Gram matrix). Beatrix utilizes Gram matrices to capture not only feature
correlations but also appropriately high-order information about the
representations. By learning class-conditional statistics from activation
patterns of normal samples, Beatrix can identify poisoned samples by capturing
the anomalies in activation patterns. To further improve the performance in
identifying target labels, Beatrix leverages kernel-based testing without
making any prior assumptions on representation distribution. We demonstrate the
effectiveness of our method through extensive evaluation and comparison with
state-of-the-art defensive techniques. The experimental results show that our
approach achieves an F1 score of 91.1% in detecting dynamic backdoors, while
the state of the art can only reach 36.9%.
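To make the detection idea concrete, below is a minimal, illustrative sketch of class-conditional Gram-matrix statistics for flagging anomalous activations. The Gram orders, the median/MAD statistics, and the scoring rule are assumptions chosen for illustration, not the paper's exact formulation, and the kernel-based test for identifying target labels is omitted.

```python
import numpy as np

def gram_features(feats, orders=(1, 2, 4)):
    """Flatten p-th order Gram matrices of a feature map of shape (C, H*W)."""
    vecs = []
    for p in orders:
        fp = np.sign(feats) * np.abs(feats) ** p   # element-wise p-th power
        g = fp @ fp.T                               # C x C Gram matrix
        g = np.sign(g) * np.abs(g) ** (1.0 / p)     # p-th root to rescale
        vecs.append(g[np.triu_indices_from(g)])     # keep upper triangle only
    return np.concatenate(vecs)

class GramDetector:
    """Class-conditional bounds on Gram statistics learned from clean samples."""
    def __init__(self, orders=(1, 2, 4)):
        self.orders = orders
        self.stats = {}  # label -> (median, MAD) per Gram feature

    def fit(self, clean_feats, labels):
        for y in np.unique(labels):
            fs = np.stack([gram_features(f, self.orders)
                           for f, l in zip(clean_feats, labels) if l == y])
            med = np.median(fs, axis=0)
            mad = np.median(np.abs(fs - med), axis=0) + 1e-8
            self.stats[y] = (med, mad)

    def score(self, feats, pred_label):
        """Larger score = stronger deviation from the clean statistics of pred_label."""
        med, mad = self.stats[pred_label]
        dev = np.abs(gram_features(feats, self.orders) - med) / mad
        return dev.mean()
```

At test time, a sample whose score exceeds a threshold calibrated on held-out clean data would be flagged as potentially poisoned.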
Related papers
- T2IShield: Defending Against Backdoors on Text-to-Image Diffusion Models [70.03122709795122]
We propose a comprehensive defense method named T2IShield to detect, localize, and mitigate backdoor attacks.
We find the "Assimilation Phenomenon" on the cross-attention maps caused by the backdoor trigger.
For backdoor sample detection, T2IShield achieves a detection F1 score of 88.9% at low computational cost.
arXiv Detail & Related papers (2024-07-05T01:53:21Z) - Improved Activation Clipping for Universal Backdoor Mitigation and
Test-Time Detection [27.62279831135902]
Deep neural networks are vulnerable to Trojan attacks, where an attacker poisons the training set with backdoor triggers.
Recent work shows that backdoor poisoning induces over-fitting (abnormally large activations) in the attacked model.
We devise a new activation-clipping approach, choosing the activation bounds to explicitly limit classification margins.
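A rough sketch of test-time activation clipping is given below, assuming per-layer bounds taken from quantiles of clean activations; the paper's margin-based bound selection is not reproduced, and `calibrate_bounds` / `apply_clipping` are hypothetical helpers.

```python
import torch

def calibrate_bounds(model, layers, clean_loader, q=0.99):
    """Record an upper bound per named layer as the q-quantile of clean activations."""
    bounds, handles = {}, []

    def make_hook(name):
        def hook(module, inputs, output):
            val = output.detach().abs().flatten().quantile(q)
            bounds[name] = torch.maximum(bounds.get(name, val), val)
        return hook

    for name, layer in layers.items():
        handles.append(layer.register_forward_hook(make_hook(name)))
    with torch.no_grad():
        for x, _ in clean_loader:
            model(x)
    for h in handles:
        h.remove()
    return bounds

def apply_clipping(layers, bounds):
    """Clamp each layer's output to its calibrated bound at inference time."""
    for name, layer in layers.items():
        b = bounds[name]
        layer.register_forward_hook(lambda m, i, out, b=b: out.clamp(-b, b))
```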
arXiv Detail & Related papers (2023-08-08T22:47:39Z) - Rethinking Backdoor Attacks [122.1008188058615]
In a backdoor attack, an adversary inserts maliciously constructed backdoor examples into a training set to make the resulting model vulnerable to manipulation.
Defending against such attacks typically involves viewing these inserted examples as outliers in the training set and using techniques from robust statistics to detect and remove them.
We show that without structural information about the training data distribution, backdoor attacks are indistinguishable from naturally-occurring features in the data.
arXiv Detail & Related papers (2023-07-19T17:44:54Z) - Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks, an emerging yet serious training-phase threat.
We propose a sparse and invisible backdoor attack (SIBA).
arXiv Detail & Related papers (2023-05-11T10:05:57Z) - SATBA: An Invisible Backdoor Attack Based On Spatial Attention [7.405457329942725]
Backdoor attacks involve training a Deep Neural Network (DNN) on datasets that contain hidden trigger patterns.
Most existing backdoor attacks suffer from significant drawbacks: their trigger patterns are visible and easy to detect by backdoor defenses or even human inspection.
We propose a novel backdoor attack named SATBA that overcomes these limitations using spatial attention and a U-Net-based model.
arXiv Detail & Related papers (2023-02-25T10:57:41Z) - Untargeted Backdoor Attack against Object Detection [69.63097724439886]
We design a poison-only backdoor attack in an untargeted manner, based on task characteristics.
We show that, once the backdoor is embedded into the target model by our attack, it can trick the model into losing detection of any object stamped with our trigger patterns.
arXiv Detail & Related papers (2022-11-02T17:05:45Z) - Invisible Backdoor Attacks Using Data Poisoning in the Frequency Domain [8.64369418938889]
We propose a generalized backdoor attack method based on the frequency domain.
It can implant a backdoor without mislabeling or access to the training process.
We evaluate our approach in the no-label and clean-label cases on three datasets.
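As a rough illustration (not the paper's exact scheme), a frequency-domain trigger can be implanted by perturbing a few FFT coefficients of a clean image and inverting back to pixel space; the coefficient coordinates and strength below are illustrative assumptions.

```python
import numpy as np

def add_frequency_trigger(img, coords=((30, 30), (31, 31)), strength=25.0):
    """Perturb chosen FFT coefficients of a grayscale image (H x W, float).
    Coordinates and strength are illustrative, not values from the paper."""
    spec = np.fft.fft2(img)
    for (u, v) in coords:
        spec[u, v] += strength    # boost the chosen frequency component
        spec[-u, -v] += strength  # keep conjugate symmetry so the result stays real
    poisoned = np.real(np.fft.ifft2(spec))
    return np.clip(poisoned, 0.0, 255.0)
```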
arXiv Detail & Related papers (2022-07-09T07:05:53Z) - Imperceptible Backdoor Attack: From Input Space to Feature
Representation [24.82632240825927]
Backdoor attacks are a rapidly emerging threat to deep neural networks (DNNs).
In this paper, we analyze the drawbacks of existing attack approaches and propose a novel imperceptible backdoor attack.
Our trigger modifies less than 1% of the pixels of a benign image, with a modification magnitude of 1.
arXiv Detail & Related papers (2022-05-06T13:02:26Z) - Black-box Detection of Backdoor Attacks with Limited Information and
Data [56.0735480850555]
We propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model.
In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models.
arXiv Detail & Related papers (2021-03-24T12:06:40Z) - Poisoned classifiers are not only backdoored, they are fundamentally
broken [84.67778403778442]
Under a commonly-studied backdoor poisoning attack against classification models, an attacker adds a small trigger to a subset of the training data.
It is often assumed that the poisoned classifier is vulnerable exclusively to the adversary who possesses the trigger.
In this paper, we show empirically that this view of backdoored classifiers is incorrect.
arXiv Detail & Related papers (2020-10-18T19:42:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.