Backdoor Attack and Defense for Deep Regression
- URL: http://arxiv.org/abs/2109.02381v1
- Date: Mon, 6 Sep 2021 11:58:03 GMT
- Title: Backdoor Attack and Defense for Deep Regression
- Authors: Xi Li and George Kesidis and David J. Miller and Vladimir Lucic
- Abstract summary: We demonstrate a backdoor attack on a deep neural network used for regression.
The backdoor attack is localized based on training-set data poisoning wherein the mislabeled samples are surrounded by correctly labeled ones.
We also study the performance of a backdoor defense using gradient-based discovery of local error maximizers.
- Score: 23.20365307988698
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We demonstrate a backdoor attack on a deep neural network used for
regression. The backdoor attack is localized based on training-set data
poisoning wherein the mislabeled samples are surrounded by correctly labeled
ones. We demonstrate how such localization is necessary for attack success. We
also study the performance of a backdoor defense using gradient-based discovery
of local error maximizers. Local error maximizers that are associated with
significant (interpolation) error and are proximal to many training samples
are deemed suspicious. The same method is also used to train the deep
regressor accurately in the first place, via active (deep) learning that
leverages an "oracle" capable of providing real-valued supervision (a
regression target) for any sample. Such oracles, including traditional
numerical solvers of PDEs or SDEs based on finite-difference or Monte Carlo
approximations, are far more computationally costly than deep regression.
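A minimal sketch of the gradient-based discovery of local error maximizers, under one plausible reading of the abstract: starting near training points, ascend the squared discrepancy between the trained regressor and a local (k-NN) interpolation of the training targets, then score each maximizer by its error and its proximity to training samples. The names `model`, `X_train`, `y_train`, and all hyperparameters are assumptions, not the paper's settings.

```python
import torch

def find_local_error_maximizers(model, X_train, y_train, n_starts=64,
                                steps=100, lr=0.05, k=5):
    """Gradient-ascent search for inputs where the regressor disagrees
    most with a k-NN interpolation of the training data (a sketch of
    'local error maximizer' discovery; the paper's exact criterion may
    differ)."""
    y_flat = y_train.reshape(-1)
    # Start the search from small perturbations of random training points.
    idx = torch.randint(len(X_train), (n_starts,))
    x = (X_train[idx] + 0.01 * torch.randn_like(X_train[idx])).requires_grad_(True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        # Distance-weighted k-NN interpolation of training targets.
        d = torch.cdist(x, X_train)                   # (n_starts, n_train)
        knn_d, knn_i = d.topk(k, largest=False)
        w = torch.softmax(-knn_d, dim=1)
        y_interp = (w * y_flat[knn_i]).sum(dim=1)
        err = (model(x).reshape(-1) - y_interp).pow(2).mean()
        opt.zero_grad()
        (-err).backward()                             # ascend the error
        opt.step()
    with torch.no_grad():
        d = torch.cdist(x, X_train)
        # Maximizers proximal to many training samples are most suspicious.
        proximity = (d < d.median()).float().mean(dim=1)
        final_err = (model(x).reshape(-1) - y_flat[d.argmin(dim=1)]).pow(2)
    return x.detach(), final_err, proximity
```

Points returned with both high `final_err` and high `proximity` would be flagged as suspicious under the defense, or sent to the oracle for labeling under the active-learning use.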
Related papers
- T2IShield: Defending Against Backdoors on Text-to-Image Diffusion Models [70.03122709795122]
We propose a comprehensive defense method named T2IShield to detect, localize, and mitigate backdoor attacks.
We identify an "Assimilation Phenomenon" in the cross-attention maps caused by the backdoor trigger.
For backdoor sample detection, T2IShield achieves a detection F1 score of 88.9% with low computational cost.
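A minimal sketch of how the described assimilation effect might be scored, assuming per-token cross-attention maps have already been collected from the diffusion UNet for one prompt; the similarity measure and threshold below are illustrative assumptions, not T2IShield's actual statistic.

```python
import torch
import torch.nn.functional as F

def assimilation_score(attn_maps: torch.Tensor) -> float:
    """attn_maps: (n_tokens, H, W) cross-attention maps for one prompt.
    A backdoor trigger tends to collapse the per-token maps toward one
    shared pattern ('assimilation'); score that as the mean pairwise
    cosine similarity between flattened maps."""
    flat = F.normalize(attn_maps.flatten(1), dim=1)   # (n_tokens, H*W)
    sim = flat @ flat.T                               # pairwise cosine sims
    n = sim.shape[0]
    off_diag = sim[~torch.eye(n, dtype=torch.bool)]   # drop self-similarity
    return off_diag.mean().item()

def is_backdoor_sample(attn_maps, threshold=0.9):
    return assimilation_score(attn_maps) > threshold  # assumed threshold
```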
arXiv Detail & Related papers (2024-07-05T01:53:21Z)
- Setting the Trap: Capturing and Defeating Backdoors in Pretrained Language Models through Honeypots [68.84056762301329]
Recent research has exposed the susceptibility of pretrained language models (PLMs) to backdoor attacks.
We propose and integrate a honeypot module into the original PLM to absorb backdoor information exclusively.
Our design is motivated by the observation that lower-layer representations in PLMs carry sufficient backdoor features.
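A minimal sketch of a honeypot head attached to a lower transformer layer, following the stated observation that shallow PLM representations already carry backdoor features; the architecture and the down-weighting strategy in the usage note are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class HoneypotHead(nn.Module):
    """Small classifier probing a lower transformer layer. Because
    backdoor features surface early, this head learns the trigger-to-
    label shortcut quickly; the main classifier can then down-weight
    samples the honeypot fits confidently (a sketch, not the exact loss)."""
    def __init__(self, hidden_size: int, num_labels: int):
        super().__init__()
        self.probe = nn.Sequential(
            nn.Linear(hidden_size, hidden_size // 4),
            nn.Tanh(),
            nn.Linear(hidden_size // 4, num_labels),
        )

    def forward(self, lower_hidden_states: torch.Tensor) -> torch.Tensor:
        # Mean-pool the lower-layer token representations, then classify.
        pooled = lower_hidden_states.mean(dim=1)      # (batch, hidden)
        return self.probe(pooled)

# Usage sketch (layer index 3 is an assumption):
#   outputs = plm(input_ids, output_hidden_states=True)
#   honeypot_logits = honeypot(outputs.hidden_states[3])
# Samples with unusually low honeypot loss are likely poisoned and can be
# down-weighted in the main classifier's objective.
```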
arXiv Detail & Related papers (2023-10-28T08:21:16Z)
- Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
The backdoor attack is an emerging yet serious training-phase threat.
We propose a sparse and invisible backdoor attack (SIBA).
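A minimal sketch of how such a trigger might be optimized, alternating gradient steps toward a target label with an L0 (sparsity) projection and an L-infinity (invisibility) projection; the surrogate `model` and all constants are assumptions, and SIBA's actual algorithm may differ.

```python
import torch

def optimize_sparse_invisible_trigger(model, x_batch, target, k=50,
                                      eps=8 / 255, steps=200, lr=0.01):
    """Optimize an additive trigger toward `target`, keeping it sparse
    (at most ~k active pixel positions) and invisible (L-inf <= eps)."""
    delta = torch.zeros_like(x_batch[0], requires_grad=True)   # (C, H, W)
    opt = torch.optim.Adam([delta], lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    tgt = torch.full((len(x_batch),), target, dtype=torch.long)
    for _ in range(steps):
        loss = loss_fn(model((x_batch + delta).clamp(0, 1)), tgt)
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            # L0 projection: keep only the k strongest pixel positions.
            mag = delta.abs().amax(dim=0)              # (H, W) magnitudes
            thresh = mag.flatten().topk(k).values.min()
            delta.mul_((mag >= thresh).unsqueeze(0).float())
            # L-inf projection: keep the trigger imperceptible.
            delta.clamp_(-eps, eps)
    return delta.detach()
```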
arXiv Detail & Related papers (2023-05-11T10:05:57Z)
- Backdoor Defense via Deconfounded Representation Learning [17.28760299048368]
We propose a Causality-inspired Backdoor Defense (CBD) to learn deconfounded representations for reliable classification.
CBD is effective in reducing backdoor threats while maintaining high accuracy in predicting benign samples.
arXiv Detail & Related papers (2023-03-13T02:25:59Z)
- Backdoor Defense via Suppressing Model Shortcuts [91.30995749139012]
In this paper, we explore the backdoor mechanism from the angle of the model structure.
We demonstrate that the attack success rate (ASR) decreases significantly when reducing the outputs of some key skip connections.
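A minimal sketch of probing this effect by attenuating a skip connection, assuming a block that computes only the residual function f(x) with the identity skip added outside it; sweeping `gamma` per block while re-measuring clean accuracy and ASR is the implied diagnostic.

```python
import torch.nn as nn

class SuppressedResidual(nn.Module):
    """Wraps a residual function so its identity skip can be scaled down.
    Per the paper's observation, shrinking the outputs of certain key
    skip connections sharply reduces the attack success rate (ASR)."""
    def __init__(self, block: nn.Module, gamma: float = 0.5):
        super().__init__()
        self.block, self.gamma = block, gamma

    def forward(self, x):
        return self.block(x) + self.gamma * x   # gamma=1.0 is the original

# Usage sketch: swap this wrapper into one residual stage at a time and
# compare clean accuracy vs. ASR as gamma sweeps from 1.0 down to 0.0.
```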
arXiv Detail & Related papers (2022-11-02T15:39:19Z)
- Invisible Backdoor Attacks Using Data Poisoning in the Frequency Domain [8.64369418938889]
We propose a generalized backdoor attack method based on the frequency domain.
It can implant a backdoor without mislabeling or access to the training process.
We evaluate our approach in the no-label and clean-label cases on three datasets.
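A minimal sketch of a frequency-domain trigger, assuming 8-bit HWC images: boost a fixed mid-frequency block of each channel's 2D FFT and invert, leaving labels untouched. The band and strength are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def poison_in_frequency_domain(img: np.ndarray, strength=0.03,
                               band=(8, 16)) -> np.ndarray:
    """Embed a trigger by amplifying a fixed mid-frequency block of the
    centered 2D spectrum of each channel (img: HWC uint8, side >= 32)."""
    poisoned = img.astype(np.float64)
    lo, hi = band
    for c in range(img.shape[2]):
        spec = np.fft.fftshift(np.fft.fft2(poisoned[:, :, c]))
        h, w = spec.shape
        # Amplify a block of mid-frequency coefficients off the center.
        spec[h // 2 + lo:h // 2 + hi, w // 2 + lo:w // 2 + hi] *= (1 + strength)
        poisoned[:, :, c] = np.real(np.fft.ifft2(np.fft.ifftshift(spec)))
    return np.clip(poisoned, 0, 255).astype(img.dtype)
```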
arXiv Detail & Related papers (2022-07-09T07:05:53Z)
- PiDAn: A Coherence Optimization Approach for Backdoor Attack Detection and Mitigation in Deep Neural Networks [22.900501880865658]
Backdoor attacks pose a new threat to deep neural networks (DNNs).
We propose PiDAn, an algorithm based on coherence optimization that purifies the poisoned data.
Our PiDAn algorithm can detect more than 90% of infected classes and identify 95% of poisoned samples.
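PiDAn's coherence optimization is not spelled out in this summary, so the sketch below uses a plainly named stand-in: a spectral-signature-style test that flags samples aligned with the dominant direction of a class's feature matrix, capturing the same intuition that poisoned samples form an unusually coherent cluster.

```python
import numpy as np

def flag_incoherent_samples(features: np.ndarray, quantile=0.95):
    """features: (n_samples, d) penultimate-layer features for one class.
    Poisoned points tend to align with a common direction distinct from
    the clean ones; score each sample by its projection onto the top
    singular vector of the centered feature matrix and flag the largest
    projections. A spectral-signature-style stand-in, not PiDAn itself."""
    centered = features - features.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    scores = np.abs(centered @ vt[0])     # alignment with top direction
    return scores > np.quantile(scores, quantile)
```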
arXiv Detail & Related papers (2022-03-17T12:37:21Z)
- Backdoor Defense via Decoupling the Training Process [46.34744086706348]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
We propose a novel backdoor defense via decoupling the original end-to-end training process into three stages.
arXiv Detail & Related papers (2022-02-05T03:34:01Z)
- Anomaly Localization in Model Gradients Under Backdoor Attacks Against Federated Learning [0.6091702876917281]
In this study, we conduct a deep, gradient-level analysis of the expected variations in model gradients under several backdoor attack scenarios.
Our main novel finding is that backdoor-induced anomalies in local model updates (weights or gradients) appear in the final-layer bias weights of the malicious local models.
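A minimal sketch of turning that finding into a detector: z-score each client's final-layer bias-update norm against the cohort and flag outliers; the statistic and threshold are assumptions, not the paper's exact method.

```python
import numpy as np

def flag_suspicious_clients(bias_updates, z_thresh=2.5):
    """bias_updates: one final-layer bias-weight update vector per client.
    Backdoor-induced anomalies concentrate in these weights, so clients
    whose update norm is a cohort outlier are flagged."""
    norms = np.array([np.linalg.norm(u) for u in bias_updates])
    z = (norms - norms.mean()) / (norms.std() + 1e-12)
    return np.where(np.abs(z) > z_thresh)[0]   # indices of flagged clients
```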
arXiv Detail & Related papers (2021-11-29T16:46:01Z)
- Black-box Detection of Backdoor Attacks with Limited Information and Data [56.0735480850555]
We propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model.
In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models.
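The gradient-free trigger reverse-engineering itself is beyond a short sketch, but given the norms of per-class triggers recovered with only query access, a median-absolute-deviation outlier test (a common stand-in here, not necessarily B3D's statistic) can flag likely target classes.

```python
import numpy as np

def detect_target_classes(trigger_norms, anomaly_thresh=2.0):
    """trigger_norms: L1 norm of the reverse-engineered trigger for each
    class. A backdoored target class needs an anomalously small trigger,
    so flag classes whose norm is far below the median."""
    norms = np.asarray(trigger_norms, dtype=float)
    med = np.median(norms)
    mad = np.median(np.abs(norms - med)) * 1.4826 + 1e-12
    anomaly_index = (med - norms) / mad        # small norm -> large index
    return np.where(anomaly_index > anomaly_thresh)[0]
```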
arXiv Detail & Related papers (2021-03-24T12:06:40Z)
- Scalable Backdoor Detection in Neural Networks [61.39635364047679]
Deep learning models are vulnerable to Trojan attacks, where an attacker can install a backdoor during training time to make the resultant model misidentify samples contaminated with a small trigger patch.
We propose a novel trigger reverse-engineering based approach whose computational complexity does not scale with the number of labels, and is based on a measure that is both interpretable and universal across different network and patch types.
In experiments, we observe that our method achieves a perfect score in separating Trojaned models from pure models, an improvement over the current state-of-the-art method.
arXiv Detail & Related papers (2020-06-10T04:12:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.