Related papers: Backdoor Learning Curves: Explaining Backdoor Poisoning Beyond Influence Functions

Backdoor Learning Curves: Explaining Backdoor Poisoning Beyond Influence Functions

URL: http://arxiv.org/abs/2106.07214v4
Date: Mon, 16 Dec 2024 07:08:33 GMT
Title: Backdoor Learning Curves: Explaining Backdoor Poisoning Beyond Influence Functions
Authors: Antonio Emanuele Cinà, Kathrin Grosse, Sebastiano Vascon, Ambra Demontis, Battista Biggio, Fabio Roli, Marcello Pelillo,
Abstract summary: We study the process of backdoor learning under the lens of incremental learning and influence functions.<n>We show that the effectiveness of backdoor attacks depends on: (i) the complexity of the learning algorithm; (ii) the fraction of backdoor samples injected into the training set; and (iii) the size and visibility of the backdoor trigger.
Score: 23.750285504961337
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Backdoor attacks inject poisoning samples during training, with the goal of forcing a machine learning model to output an attacker-chosen class when presented a specific trigger at test time. Although backdoor attacks have been demonstrated in a variety of settings and against different models, the factors affecting their effectiveness are still not well understood. In this work, we provide a unifying framework to study the process of backdoor learning under the lens of incremental learning and influence functions. We show that the effectiveness of backdoor attacks depends on: (i) the complexity of the learning algorithm, controlled by its hyperparameters; (ii) the fraction of backdoor samples injected into the training set; and (iii) the size and visibility of the backdoor trigger. These factors affect how fast a model learns to correlate the presence of the backdoor trigger with the target class. Our analysis unveils the intriguing existence of a region in the hyperparameter space in which the accuracy on clean test samples is still high while backdoor attacks are ineffective, thereby suggesting novel criteria to improve existing defenses.

Related papers

Kill it with FIRE: On Leveraging Latent Space Directions for Runtime Backdoor Mitigation in Deep Neural Networks [1.9517610560768623]
A well-known vulnerability is a backdoor introduced into a neural network by poisoned training data or a malicious training process.<n>We propose our inference-time backdoor mitigation approach called FIRE.<n>We view the trigger as directions in the latent spaces between layers that can be applied in reverse to correct the inference mechanism.
arXiv Detail & Related papers (2026-02-11T12:13:25Z)
Backdoor Unlearning by Linear Task Decomposition [69.91984435094157]
Foundation models are highly susceptible to adversarial perturbations and targeted backdoor attacks.<n>Existing backdoor removal approaches rely on costly fine-tuning to override the harmful behavior.<n>This raises the question of whether backdoors can be removed without compromising the general capabilities of the models.
arXiv Detail & Related papers (2025-10-16T16:18:07Z)
Long-Tailed Backdoor Attack Using Dynamic Data Augmentation Operations [50.1394620328318]
Existing backdoor attacks mainly focus on balanced datasets. We propose an effective backdoor attack named Dynamic Data Augmentation Operation (D$2$AO) Our method can achieve the state-of-the-art attack performance while preserving the clean accuracy.
arXiv Detail & Related papers (2024-10-16T18:44:22Z)
Efficient Backdoor Defense in Multimodal Contrastive Learning: A Token-Level Unlearning Method for Mitigating Threats [52.94388672185062]
We propose an efficient defense mechanism against backdoor threats using a concept known as machine unlearning. This entails strategically creating a small set of poisoned samples to aid the model's rapid unlearning of backdoor vulnerabilities. In the backdoor unlearning process, we present a novel token-based portion unlearning training regime.
arXiv Detail & Related papers (2024-09-29T02:55:38Z)
Backdoor Defense through Self-Supervised and Generative Learning [0.0]
Training on such data injects a backdoor which causes malicious inference in selected test samples. This paper explores an approach based on generative modelling of per-class distributions in a self-supervised representation space. In both cases, we find that per-class generative models allow to detect poisoned data and cleanse the dataset.
arXiv Detail & Related papers (2024-09-02T11:40:01Z)
Unlearning Backdoor Threats: Enhancing Backdoor Defense in Multimodal Contrastive Learning via Local Token Unlearning [49.242828934501986]
Multimodal contrastive learning has emerged as a powerful paradigm for building high-quality features. backdoor attacks subtly embed malicious behaviors within the model during training. We introduce an innovative token-based localized forgetting training regime.
arXiv Detail & Related papers (2024-03-24T18:33:15Z)
Demystifying Poisoning Backdoor Attacks from a Statistical Perspective [35.30533879618651]
Backdoor attacks pose a significant security risk due to their stealthy nature and potentially serious consequences. This paper evaluates the effectiveness of any backdoor attack incorporating a constant trigger. Our derived understanding applies to both discriminative and generative models.
arXiv Detail & Related papers (2023-10-16T19:35:01Z)
Exploiting Machine Unlearning for Backdoor Attacks in Deep Learning System [4.9233610638625604]
We propose a novel black-box backdoor attack based on machine unlearning. The attacker first augments the training set with carefully designed samples, including poison and mitigation data, to train a benign' model. Then, the attacker posts unlearning requests for the mitigation samples to remove the impact of relevant data on the model, gradually activating the hidden backdoor.
arXiv Detail & Related papers (2023-09-12T02:42:39Z)
Rethinking Backdoor Attacks [122.1008188058615]
In a backdoor attack, an adversary inserts maliciously constructed backdoor examples into a training set to make the resulting model vulnerable to manipulation. Defending against such attacks typically involves viewing these inserted examples as outliers in the training set and using techniques from robust statistics to detect and remove them. We show that without structural information about the training data distribution, backdoor attacks are indistinguishable from naturally-occurring features in the data.
arXiv Detail & Related papers (2023-07-19T17:44:54Z)
Backdoor Learning on Sequence to Sequence Models [94.23904400441957]
In this paper, we study whether sequence-to-sequence (seq2seq) models are vulnerable to backdoor attacks. Specifically, we find by only injecting 0.2% samples of the dataset, we can cause the seq2seq model to generate the designated keyword and even the whole sentence. Extensive experiments on machine translation and text summarization have been conducted to show our proposed methods could achieve over 90% attack success rate on multiple datasets and models.
arXiv Detail & Related papers (2023-05-03T20:31:13Z)
SoK: A Systematic Evaluation of Backdoor Trigger Characteristics in Image Classification [21.424907311421197]
Deep learning is vulnerable to backdoor attacks that modify the training set to embed a secret functionality in the trained model. This paper systematically analyzes the most relevant parameters for the backdoor attacks. Our attacks cover the majority of backdoor settings in research, providing concrete directions for future works.
arXiv Detail & Related papers (2023-02-03T14:00:05Z)
Untargeted Backdoor Attack against Object Detection [69.63097724439886]
We design a poison-only backdoor attack in an untargeted manner, based on task characteristics. We show that, once the backdoor is embedded into the target model by our attack, it can trick the model to lose detection of any object stamped with our trigger patterns.
arXiv Detail & Related papers (2022-11-02T17:05:45Z)
Backdoor Defense via Suppressing Model Shortcuts [91.30995749139012]
In this paper, we explore the backdoor mechanism from the angle of the model structure. We demonstrate that the attack success rate (ASR) decreases significantly when reducing the outputs of some key skip connections.
arXiv Detail & Related papers (2022-11-02T15:39:19Z)
Backdoor Smoothing: Demystifying Backdoor Attacks on Deep Neural Networks [25.23881974235643]
We show that backdoor attacks induce a smoother decision function around the triggered samples -- a phenomenon which we refer to as textitbackdoor smoothing. Our experiments show that smoothness increases when the trigger is added to the input samples, and that this phenomenon is more pronounced for more successful attacks.
arXiv Detail & Related papers (2020-06-11T18:28:54Z)

This list is automatically generated from the titles and abstracts of the papers in this site.