Prototype-Guided Robust Learning against Backdoor Attacks
- URL: http://arxiv.org/abs/2509.08748v1
- Date: Wed, 03 Sep 2025 14:41:54 GMT
- Title: Prototype-Guided Robust Learning against Backdoor Attacks
- Authors: Wei Guo, Maura Pintor, Ambra Demontis, Battista Biggio,
- Abstract summary: Backdoor attacks poison the training data to embed a backdoor in the model.<n>We propose Prototype-Guided Robust Learning (PGRL) to be robust against diverse backdoor attacks.
- Score: 16.60001324267935
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Backdoor attacks poison the training data to embed a backdoor in the model, causing it to behave normally on legitimate inputs but maliciously when specific trigger signals appear. Training a benign model from a dataset poisoned by backdoor attacks is challenging. Existing works rely on various assumptions and can only defend against backdoor attacks with specific trigger signals, high poisoning ratios, or when the defender possesses a large, untainted, validation dataset. In this paper, we propose a defense called Prototype-Guided Robust Learning (PGRL), which overcomes all the aforementioned limitations, being robust against diverse backdoor attacks. Leveraging a tiny set of benign samples, PGRL generates prototype vectors to guide the training process. We compare our PGRL with 8 existing defenses, showing that it achieves superior robustness. We also demonstrate that PGRL generalizes well across various architectures, datasets, and advanced attacks. Finally, to evaluate our PGRL in the worst-case scenario, we perform an adaptive attack, where the attackers fully know the details of the defense.
Related papers
- Data Free Backdoor Attacks [83.10379074100453]
DFBA is a retraining-free and data-free backdoor attack without changing the model architecture.<n>We verify that our injected backdoor is provably undetectable and unchosen by various state-of-the-art defenses.<n>Our evaluation on multiple datasets demonstrates that our injected backdoor: 1) incurs negligible classification loss, 2) achieves 100% attack success rates, and 3) bypasses six existing state-of-the-art defenses.
arXiv Detail & Related papers (2024-12-09T05:30:25Z) - Mitigating Backdoor Attack by Injecting Proactive Defensive Backdoor [63.84477483795964]
Data-poisoning backdoor attacks are serious security threats to machine learning models.
In this paper, we focus on in-training backdoor defense, aiming to train a clean model even when the dataset may be potentially poisoned.
We propose a novel defense approach called PDB (Proactive Defensive Backdoor)
arXiv Detail & Related papers (2024-05-25T07:52:26Z) - SEEP: Training Dynamics Grounds Latent Representation Search for Mitigating Backdoor Poisoning Attacks [53.28390057407576]
Modern NLP models are often trained on public datasets drawn from diverse sources.
Data poisoning attacks can manipulate the model's behavior in ways engineered by the attacker.
Several strategies have been proposed to mitigate the risks associated with backdoor attacks.
arXiv Detail & Related papers (2024-05-19T14:50:09Z) - Rethinking Backdoor Attacks [122.1008188058615]
In a backdoor attack, an adversary inserts maliciously constructed backdoor examples into a training set to make the resulting model vulnerable to manipulation.
Defending against such attacks typically involves viewing these inserted examples as outliers in the training set and using techniques from robust statistics to detect and remove them.
We show that without structural information about the training data distribution, backdoor attacks are indistinguishable from naturally-occurring features in the data.
arXiv Detail & Related papers (2023-07-19T17:44:54Z) - Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
backdoor attack is an emerging yet threatening training-phase threat.
We propose a sparse and invisible backdoor attack (SIBA)
arXiv Detail & Related papers (2023-05-11T10:05:57Z) - Defending against Insertion-based Textual Backdoor Attacks via
Attribution [18.935041122443675]
We propose AttDef, an efficient attribution-based pipeline to defend against two insertion-based poisoning attacks.
Specifically, we regard the tokens with larger attribution scores as potential triggers since larger attribution words contribute more to the false prediction results.
We show that our proposed method can generalize sufficiently well in two common attack scenarios.
arXiv Detail & Related papers (2023-05-03T19:29:26Z) - On the Effectiveness of Adversarial Training against Backdoor Attacks [111.8963365326168]
A backdoored model always predicts a target class in the presence of a predefined trigger pattern.
In general, adversarial training is believed to defend against backdoor attacks.
We propose a hybrid strategy which provides satisfactory robustness across different backdoor attacks.
arXiv Detail & Related papers (2022-02-22T02:24:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.