Phantom Transfer: Data-level Defences are Insufficient Against Data Poisoning
- URL: http://arxiv.org/abs/2602.04899v1
- Date: Tue, 03 Feb 2026 14:38:07 GMT
- Title: Phantom Transfer: Data-level Defences are Insufficient Against Data Poisoning
- Authors: Andrew Draganov, Tolga H. Dur, Anandmayi Bhongade, Mary Phuong,
- Abstract summary: We present a data poisoning attack -- Phantom Transfer -- with the property that, even if you know precisely how the poison was placed into an otherwise benign dataset, you cannot filter it out.<n>We demonstrate that the attack works across models, including GPT-4.1.<n>This suggests that data-level defences are insufficient for stopping sophisticated data poisoning attacks.
- Score: 2.290359868657638
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a data poisoning attack -- Phantom Transfer -- with the property that, even if you know precisely how the poison was placed into an otherwise benign dataset, you cannot filter it out. We achieve this by modifying subliminal learning to work in real-world contexts and demonstrate that the attack works across models, including GPT-4.1. Indeed, even fully paraphrasing every sample in the dataset using a different model does not stop the attack. We also discuss connections to steering vectors and show that one can plant password-triggered behaviours into models while still beating defences. This suggests that data-level defences are insufficient for stopping sophisticated data poisoning attacks. We suggest that future work should focus on model audits and white-box security methods.
Related papers
- SpooFL: Spoofing Federated Learning [54.05993847488204]
We introduce a spoofing-based defense that deceives attackers into believing they have recovered the true training data.<n>Unlike prior synthetic-data defenses that share classes or distributions with the private data, SFL uses a state-of-the-art generative model trained on an external dataset.<n>As a result, attackers are misled into recovering plausible yet completely irrelevant samples, preventing meaningful data leakage.
arXiv Detail & Related papers (2026-01-21T14:57:18Z) - Protecting Model Adaptation from Trojans in the Unlabeled Data [120.42853706967188]
This paper explores the potential trojan attacks on model adaptation launched by well-designed poisoning target data.<n>We propose a plug-and-play method named DiffAdapt, which can be seamlessly integrated with existing adaptation algorithms.
arXiv Detail & Related papers (2024-01-11T16:42:10Z) - Instructions as Backdoors: Backdoor Vulnerabilities of Instruction Tuning for Large Language Models [53.416234157608]
We investigate security concerns of the emergent instruction tuning paradigm, that models are trained on crowdsourced datasets with task instructions to achieve superior performance.
Our studies demonstrate that an attacker can inject backdoors by issuing very few malicious instructions and control model behavior through data poisoning.
arXiv Detail & Related papers (2023-05-24T04:27:21Z) - Pick your Poison: Undetectability versus Robustness in Data Poisoning
Attacks [33.82164201455115]
Deep image classification models trained on vast amounts of web-scraped data are susceptible to data poisoning.
Existing work considers an effective defense as one that either (i) restores a model's integrity through repair or (ii) detects an attack.
We argue that this approach overlooks a crucial trade-off: Attackers can increase at the expense of detectability (over-poisoning) or decrease detectability at the cost of robustness (under-poisoning)
arXiv Detail & Related papers (2023-05-07T15:58:06Z) - A Data-Driven Defense against Edge-case Model Poisoning Attacks on Federated Learning [13.89043799280729]
We propose an effective defense against model poisoning attacks in Federated Learning systems.
DataDefense learns a poisoned data detector model which marks each example in the defense dataset as poisoned or clean.
It is able to reduce the attack success rate by at least 40% on standard attack setups and by more than 80% on some setups.
arXiv Detail & Related papers (2023-05-03T10:20:26Z) - Autoregressive Perturbations for Data Poisoning [54.205200221427994]
Data scraping from social media has led to growing concerns regarding unauthorized use of data.
Data poisoning attacks have been proposed as a bulwark against scraping.
We introduce autoregressive (AR) poisoning, a method that can generate poisoned data without access to the broader dataset.
arXiv Detail & Related papers (2022-06-08T06:24:51Z) - SPECTRE: Defending Against Backdoor Attacks Using Robust Statistics [44.487762480349765]
A small fraction of poisoned data changes the behavior of a trained model when triggered by an attacker-specified watermark.
We propose a novel defense algorithm using robust covariance estimation to amplify the spectral signature of corrupted data.
arXiv Detail & Related papers (2021-04-22T20:49:40Z) - Defening against Adversarial Denial-of-Service Attacks [0.0]
Data poisoning is one of the most relevant security threats against machine learning and data-driven technologies.
We propose a new approach of detecting DoS poisoned instances.
We evaluate our defence against two DoS poisoning attacks and seven datasets, and find that it reliably identifies poisoned instances.
arXiv Detail & Related papers (2021-04-14T09:52:36Z) - Property Inference From Poisoning [15.105224455937025]
Property inference attacks consider an adversary who has access to the trained model and tries to extract some global statistics of the training data.
We study poisoning attacks where the goal of the adversary is to increase the information leakage of the model.
Our findings suggest that poisoning attacks can boost the information leakage significantly and should be considered as a stronger threat model in sensitive applications.
arXiv Detail & Related papers (2021-01-26T20:35:28Z) - Witches' Brew: Industrial Scale Data Poisoning via Gradient Matching [56.280018325419896]
Data Poisoning attacks modify training data to maliciously control a model trained on such data.
We analyze a particularly malicious poisoning attack that is both "from scratch" and "clean label"
We show that it is the first poisoning method to cause targeted misclassification in modern deep networks trained from scratch on a full-sized, poisoned ImageNet dataset.
arXiv Detail & Related papers (2020-09-04T16:17:54Z) - Just How Toxic is Data Poisoning? A Unified Benchmark for Backdoor and
Data Poisoning Attacks [74.88735178536159]
Data poisoning is the number one concern among threats ranging from model stealing to adversarial attacks.
We observe that data poisoning and backdoor attacks are highly sensitive to variations in the testing setup.
We apply rigorous tests to determine the extent to which we should fear them.
arXiv Detail & Related papers (2020-06-22T18:34:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.