Related papers: Not All Samples Are Equal: Quantifying Instance-level Difficulty in Targeted Data Poisoning

Not All Samples Are Equal: Quantifying Instance-level Difficulty in Targeted Data Poisoning

URL: http://arxiv.org/abs/2509.06896v1
Date: Mon, 08 Sep 2025 17:14:55 GMT
Title: Not All Samples Are Equal: Quantifying Instance-level Difficulty in Targeted Data Poisoning
Authors: William Xu, Yiwei Lu, Yihan Wang, Matthew Y. R. Yang, Zuoqiu Liu, Gautam Kamath, Yaoliang Yu,
Abstract summary: Targeted data poisoning attacks pose an increasingly serious threat due to their ease of deployment and high success rates.<n>This paper introduces three predictive criteria for targeted data poisoning difficulty: ergodic prediction accuracy, poison distance, and poison budget.
Score: 36.54354798424162
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Targeted data poisoning attacks pose an increasingly serious threat due to their ease of deployment and high success rates. These attacks aim to manipulate the prediction for a single test sample in classification models. Unlike indiscriminate attacks that aim to decrease overall test performance, targeted attacks present a unique threat to individual test instances. This threat model raises a fundamental question: what factors make certain test samples more susceptible to successful poisoning than others? We investigate how attack difficulty varies across different test instances and identify key characteristics that influence vulnerability. This paper introduces three predictive criteria for targeted data poisoning difficulty: ergodic prediction accuracy (analyzed through clean training dynamics), poison distance, and poison budget. Our experimental results demonstrate that these metrics effectively predict the varying difficulty of real-world targeted poisoning attacks across diverse scenarios, offering practitioners valuable insights for vulnerability assessment and understanding data poisoning attacks.

Related papers

Detecting Instruction Fine-tuning Attacks on Language Models using Influence Function [6.778374627376906]
Instruction finetuning attacks pose a serious threat to large language models.<n>Poisoned data is often indistinguishable from clean data.<n>We present a detection method that requires no prior knowledge of the attack.
arXiv Detail & Related papers (2025-04-12T00:50:28Z)
Exploring Model Dynamics for Accumulative Poisoning Discovery [62.08553134316483]
We propose a novel information measure, namely, Memorization Discrepancy, to explore the defense via the model-level information. By implicitly transferring the changes in the data manipulation to that in the model outputs, Memorization Discrepancy can discover the imperceptible poison samples. We thoroughly explore its properties and propose Discrepancy-aware Sample Correction (DSC) to defend against accumulative poisoning attacks.
arXiv Detail & Related papers (2023-06-06T14:45:24Z)
IDEA: Invariant Defense for Graph Adversarial Robustness [60.0126873387533]
We propose an Invariant causal DEfense method against adversarial Attacks (IDEA) We derive node-based and structure-based invariance objectives from an information-theoretic perspective. Experiments demonstrate that IDEA attains state-of-the-art defense performance under all five attacks on all five datasets.
arXiv Detail & Related papers (2023-05-25T07:16:00Z)
Identifying Adversarially Attackable and Robust Samples [1.4213973379473654]
Adrial attacks insert small, imperceptible perturbations to input samples that cause large, undesired changes to the output of deep learning models. This work introduces the notion of sample attackability, where we aim to identify samples that are most susceptible to adversarial attacks. We propose a deep-learning-based detector to identify the adversarially attackable and robust samples in an unseen dataset for an unseen target model.
arXiv Detail & Related papers (2023-01-30T13:58:14Z)
Amplifying Membership Exposure via Data Poisoning [18.799570863203858]
In this paper, we investigate the third type of exploitation of data poisoning - increasing the risks of privacy leakage of benign training samples. We propose a set of data poisoning attacks to amplify the membership exposure of the targeted class. Our results show that the proposed attacks can substantially increase the membership inference precision with minimum overall test-time model performance degradation.
arXiv Detail & Related papers (2022-11-01T13:52:25Z)
De-Pois: An Attack-Agnostic Defense against Data Poisoning Attacks [17.646155241759743]
De-Pois is an attack-agnostic defense against poisoning attacks. We implement four types of poisoning attacks and evaluate De-Pois with five typical defense methods.
arXiv Detail & Related papers (2021-05-08T04:47:37Z)
DeepPoison: Feature Transfer Based Stealthy Poisoning Attack [2.1445455835823624]
DeepPoison is a novel adversarial network of one generator and two discriminators. DeepPoison can achieve a state-of-the-art attack success rate, as high as 91.74%.
arXiv Detail & Related papers (2021-01-06T15:45:36Z)
Poison Attacks against Text Datasets with Conditional Adversarially Regularized Autoencoder [78.01180944665089]
This paper demonstrates a fatal vulnerability in natural language inference (NLI) and text classification systems. We present a 'backdoor poisoning' attack on NLP models.
arXiv Detail & Related papers (2020-10-06T13:03:49Z)
Witches' Brew: Industrial Scale Data Poisoning via Gradient Matching [56.280018325419896]
Data Poisoning attacks modify training data to maliciously control a model trained on such data. We analyze a particularly malicious poisoning attack that is both "from scratch" and "clean label" We show that it is the first poisoning method to cause targeted misclassification in modern deep networks trained from scratch on a full-sized, poisoned ImageNet dataset.
arXiv Detail & Related papers (2020-09-04T16:17:54Z)
Just How Toxic is Data Poisoning? A Unified Benchmark for Backdoor and Data Poisoning Attacks [74.88735178536159]
Data poisoning is the number one concern among threats ranging from model stealing to adversarial attacks. We observe that data poisoning and backdoor attacks are highly sensitive to variations in the testing setup. We apply rigorous tests to determine the extent to which we should fear them.
arXiv Detail & Related papers (2020-06-22T18:34:08Z)

This list is automatically generated from the titles and abstracts of the papers in this site.