Non-omniscient backdoor injection with a single poison sample: Proving the one-poison hypothesis for linear regression and linear classification
- URL: http://arxiv.org/abs/2508.05600v1
- Date: Thu, 07 Aug 2025 17:41:33 GMT
- Title: Non-omniscient backdoor injection with a single poison sample: Proving the one-poison hypothesis for linear regression and linear classification
- Authors: Thorsten Peinemann, Paula Arnold, Sebastian Berndt, Thomas Eisenbarth, Esfandiar Mohammadi,
- Abstract summary: We show that an adversary with one poison sample and limited background knowledge can inject a backdoor with zero backdooring-error.<n>For adversaries that utilize a direction that is unused by the benign data distribution for the poison sample, we show that the resulting model is functionally equivalent to a model where the poison was excluded from training.
- Score: 6.816788256267754
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Backdoor injection attacks are a threat to machine learning models that are trained on large data collected from untrusted sources; these attacks enable attackers to inject malicious behavior into the model that can be triggered by specially crafted inputs. Prior work has established bounds on the success of backdoor attacks and their impact on the benign learning task, however, an open question is what amount of poison data is needed for a successful backdoor attack. Typical attacks either use few samples, but need much information about the data points or need to poison many data points. In this paper, we formulate the one-poison hypothesis: An adversary with one poison sample and limited background knowledge can inject a backdoor with zero backdooring-error and without significantly impacting the benign learning task performance. Moreover, we prove the one-poison hypothesis for linear regression and linear classification. For adversaries that utilize a direction that is unused by the benign data distribution for the poison sample, we show that the resulting model is functionally equivalent to a model where the poison was excluded from training. We build on prior work on statistical backdoor learning to show that in all other cases, the impact on the benign learning task is still limited. We also validate our theoretical results experimentally with realistic benchmark data sets.
Related papers
- Deferred Poisoning: Making the Model More Vulnerable via Hessian Singularization [36.13844441263675]
We introduce a more threatening type of poisoning attack called the Deferred Poisoning Attack.<n>This new attack allows the model to function normally during the training and validation phases but makes it very sensitive to evasion attacks or even natural noise.<n>We have conducted both theoretical and empirical analyses of the proposed method and validated its effectiveness through experiments on image classification tasks.
arXiv Detail & Related papers (2024-11-06T08:27:49Z) - Generalization Bound and New Algorithm for Clean-Label Backdoor Attack [14.80556378962582]
backdoor attack has the special property that the poisoned triggers are contained in both the training set and the test set.
In this paper, we fill this gap by deriving algorithm-independent generalization bounds in the clean-label backdoor attack scenario.
We propose a new clean-label backdoor attack that computes the poisoning trigger by combining adversarial noise and indiscriminate poison.
arXiv Detail & Related papers (2024-06-02T01:46:58Z) - Mellivora Capensis: A Backdoor-Free Training Framework on the Poisoned Dataset without Auxiliary Data [29.842087372804905]
This paper addresses the challenges of backdoor attack countermeasures in real-world scenarios.
We propose a robust and clean-data-free backdoor defense framework, namely Mellivora Capensis (textttMeCa), which enables the model trainer to train a clean model on the poisoned dataset.
arXiv Detail & Related papers (2024-05-21T12:20:19Z) - SEEP: Training Dynamics Grounds Latent Representation Search for Mitigating Backdoor Poisoning Attacks [53.28390057407576]
Modern NLP models are often trained on public datasets drawn from diverse sources.
Data poisoning attacks can manipulate the model's behavior in ways engineered by the attacker.
Several strategies have been proposed to mitigate the risks associated with backdoor attacks.
arXiv Detail & Related papers (2024-05-19T14:50:09Z) - Exploring Model Dynamics for Accumulative Poisoning Discovery [62.08553134316483]
We propose a novel information measure, namely, Memorization Discrepancy, to explore the defense via the model-level information.
By implicitly transferring the changes in the data manipulation to that in the model outputs, Memorization Discrepancy can discover the imperceptible poison samples.
We thoroughly explore its properties and propose Discrepancy-aware Sample Correction (DSC) to defend against accumulative poisoning attacks.
arXiv Detail & Related papers (2023-06-06T14:45:24Z) - Untargeted Backdoor Attack against Object Detection [69.63097724439886]
We design a poison-only backdoor attack in an untargeted manner, based on task characteristics.
We show that, once the backdoor is embedded into the target model by our attack, it can trick the model to lose detection of any object stamped with our trigger patterns.
arXiv Detail & Related papers (2022-11-02T17:05:45Z) - PiDAn: A Coherence Optimization Approach for Backdoor Attack Detection
and Mitigation in Deep Neural Networks [22.900501880865658]
Backdoor attacks impose a new threat in Deep Neural Networks (DNNs)
We propose PiDAn, an algorithm based on coherence optimization purifying the poisoned data.
Our PiDAn algorithm can detect more than 90% infected classes and identify 95% poisoned samples.
arXiv Detail & Related papers (2022-03-17T12:37:21Z) - Accumulative Poisoning Attacks on Real-time Data [56.96241557830253]
We show that a well-designed but straightforward attacking strategy can dramatically amplify the poisoning effects.
Our work validates that a well-designed but straightforward attacking strategy can dramatically amplify the poisoning effects.
arXiv Detail & Related papers (2021-06-18T08:29:53Z) - Property Inference From Poisoning [15.105224455937025]
Property inference attacks consider an adversary who has access to the trained model and tries to extract some global statistics of the training data.
We study poisoning attacks where the goal of the adversary is to increase the information leakage of the model.
Our findings suggest that poisoning attacks can boost the information leakage significantly and should be considered as a stronger threat model in sensitive applications.
arXiv Detail & Related papers (2021-01-26T20:35:28Z) - Witches' Brew: Industrial Scale Data Poisoning via Gradient Matching [56.280018325419896]
Data Poisoning attacks modify training data to maliciously control a model trained on such data.
We analyze a particularly malicious poisoning attack that is both "from scratch" and "clean label"
We show that it is the first poisoning method to cause targeted misclassification in modern deep networks trained from scratch on a full-sized, poisoned ImageNet dataset.
arXiv Detail & Related papers (2020-09-04T16:17:54Z) - Just How Toxic is Data Poisoning? A Unified Benchmark for Backdoor and
Data Poisoning Attacks [74.88735178536159]
Data poisoning is the number one concern among threats ranging from model stealing to adversarial attacks.
We observe that data poisoning and backdoor attacks are highly sensitive to variations in the testing setup.
We apply rigorous tests to determine the extent to which we should fear them.
arXiv Detail & Related papers (2020-06-22T18:34:08Z) - Systematic Evaluation of Backdoor Data Poisoning Attacks on Image
Classifiers [6.352532169433872]
Backdoor data poisoning attacks have been demonstrated in computer vision research as a potential safety risk for machine learning (ML) systems.
Our work builds upon prior backdoor data-poisoning research for ML image classifiers.
We find that poisoned models are hard to detect through performance inspection alone.
arXiv Detail & Related papers (2020-04-24T02:58:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.