Pick your Poison: Undetectability versus Robustness in Data Poisoning
Attacks
- URL: http://arxiv.org/abs/2305.09671v2
- Date: Thu, 29 Jun 2023 15:23:47 GMT
- Title: Pick your Poison: Undetectability versus Robustness in Data Poisoning
Attacks
- Authors: Nils Lukas and Florian Kerschbaum
- Abstract summary: Deep image classification models trained on vast amounts of web-scraped data are susceptible to data poisoning.
Existing work considers an effective defense as one that either (i) restores a model's integrity through repair or (ii) detects an attack.
We argue that this approach overlooks a crucial trade-off: Attackers can increase robustness at the expense of detectability (over-poisoning) or decrease detectability at the cost of robustness (under-poisoning).
- Score: 33.82164201455115
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep image classification models trained on vast amounts of web-scraped data
are susceptible to data poisoning - a mechanism for backdooring models. A small
number of poisoned samples seen during training can severely undermine a
model's integrity during inference. Existing work considers an effective
defense as one that either (i) restores a model's integrity through repair or
(ii) detects an attack. We argue that this approach overlooks a crucial
trade-off: Attackers can increase robustness at the expense of detectability
(over-poisoning) or decrease detectability at the cost of robustness
(under-poisoning). In practice, attacks should remain both undetectable and
robust. Detectable but robust attacks draw human attention and rigorous model
evaluation or cause the model to be re-trained or discarded. In contrast,
attacks that are undetectable but lack robustness can be repaired with minimal
impact on model accuracy. Our research points to intrinsic flaws in current
attack evaluation methods and raises the bar for all data poisoning attackers
who must delicately balance this trade-off to remain robust and undetectable.
To demonstrate the existence of more potent defenders, we propose defenses
designed to (i) detect or (ii) repair poisoned models using a limited amount of
trusted image-label pairs. Our results show that an attacker who needs to be
robust and undetectable is substantially less threatening. Our defenses
mitigate all tested attacks with a maximum accuracy decline of 2% using only 1%
of clean data on CIFAR-10 and 2.5% on ImageNet. We demonstrate the scalability
of our defenses by evaluating large vision-language models, such as CLIP.
Attackers who can manipulate the model's parameters pose an elevated risk as
they can achieve higher robustness at low detectability compared to data
poisoning attackers.
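To make the attack surface concrete, here is a minimal, hypothetical sketch of the kind of dirty-label backdoor poisoning the abstract describes: a small fraction of training images receive a fixed trigger patch and are relabeled to an attacker-chosen target class. The poison rate, patch size, and target class below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def poison_dataset(images: np.ndarray, labels: np.ndarray,
                   poison_rate: float = 0.01, target_class: int = 0,
                   patch_size: int = 3, seed: int = 0):
    """Dirty-label backdoor: stamp a fixed trigger patch into a small
    fraction of the training images and relabel them to the target class."""
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    n_poison = int(poison_rate * len(images))
    idx = rng.choice(len(images), size=n_poison, replace=False)
    # Trigger: a white square in the bottom-right corner (assumes HWC in [0, 1]).
    images[idx, -patch_size:, -patch_size:, :] = 1.0
    labels[idx] = target_class
    return images, labels, idx
```

Raising poison_rate tends to make the backdoor harder to repair but easier to detect, while lowering it does the opposite; this is the over- versus under-poisoning trade-off at the heart of the paper.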
Related papers
- Deferred Poisoning: Making the Model More Vulnerable via Hessian Singularization [39.37308843208039]
We introduce a more threatening type of poisoning attack called the Deferred Poisoning Attack.
This new attack allows the model to function normally during the training and validation phases but makes it very sensitive to evasion attacks or even natural noise.
We have conducted both theoretical and empirical analyses of the proposed method and validated its effectiveness through experiments on image classification tasks.
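The Hessian angle can be made concrete with a standard sharpness probe. The sketch below is an illustration under our own assumptions, not the paper's procedure: it estimates the top Hessian eigenvalue of the loss via power iteration on Hessian-vector products, the quantity a deferred-poisoned model would inflate at convergence.

```python
import torch

def top_hessian_eigenvalue(loss: torch.Tensor, model: torch.nn.Module,
                           n_iter: int = 10) -> torch.Tensor:
    """Estimate the largest Hessian eigenvalue of `loss` w.r.t. the model
    parameters via power iteration on Hessian-vector products."""
    params = [p for p in model.parameters() if p.requires_grad]
    grads = torch.autograd.grad(loss, params, create_graph=True)
    v = [torch.randn_like(p) for p in params]
    for _ in range(n_iter):
        hv = torch.autograd.grad(grads, params, grad_outputs=v,
                                 retain_graph=True)
        norm = torch.sqrt(sum((h * h).sum() for h in hv))
        v = [h / (norm + 1e-12) for h in hv]
    hv = torch.autograd.grad(grads, params, grad_outputs=v, retain_graph=True)
    return sum((h * u).sum() for h, u in zip(hv, v))  # Rayleigh quotient v^T H v
```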
arXiv Detail & Related papers (2024-11-06T08:27:49Z)
- Diffusion Denoising as a Certified Defense against Clean-label Poisoning [56.04951180983087]
We show how an off-the-shelf diffusion model can sanitize the tampered training data.
We extensively test our defense against seven clean-label poisoning attacks and reduce their attack success to 0-16% with only a negligible drop in test-time accuracy.
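A minimal sketch of that sanitization loop, assuming a pretrained denoiser is available as a black-box callable (`denoise_fn` below is a placeholder of our own, not an interface from the paper):

```python
import numpy as np
from typing import Callable

def sanitize(images: np.ndarray,
             denoise_fn: Callable[[np.ndarray, float], np.ndarray],
             noise_level: float = 0.25,
             seed: int = 0) -> np.ndarray:
    """Diffusion purification: corrupt each image with Gaussian noise, then
    let an off-the-shelf diffusion model reconstruct a clean version, which
    tends to wash out small poisoning perturbations."""
    rng = np.random.default_rng(seed)
    noised = images + noise_level * rng.standard_normal(images.shape)
    # denoise_fn maps a noised batch (plus the noise level it was corrupted
    # with) back toward the clean image manifold.
    return denoise_fn(noised, noise_level)
```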
arXiv Detail & Related papers (2024-03-18T17:17:07Z)
- Poison is Not Traceless: Fully-Agnostic Detection of Poisoning Attacks [4.064462548421468]
This paper presents DIVA, a novel fully-agnostic framework that detects attacks relying solely on analysis of the potentially poisoned data set.
For evaluation purposes, we test DIVA on label-flipping attacks.
arXiv Detail & Related papers (2023-10-24T22:27:44Z)
- RECESS Vaccine for Federated Learning: Proactive Defense Against Model Poisoning Attacks [20.55681622921858]
Model poisoning attacks greatly jeopardize the application of federated learning (FL).
In this work, we propose a novel proactive defense named RECESS against model poisoning attacks.
Unlike previous methods that score each iteration, RECESS considers clients' performance correlation across multiple iterations to estimate the trust score.
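As a toy illustration of cross-iteration scoring (the exponential moving average below is our stand-in, not RECESS's exact estimator), each client's per-round consistency with the aggregate is blended into a trust score that persists across rounds:

```python
import numpy as np

def update_trust(trust: np.ndarray, client_updates: np.ndarray,
                 momentum: float = 0.9) -> np.ndarray:
    """Blend each client's per-round consistency with its history so that
    clients must look benign across many rounds, not just one."""
    mean_update = client_updates.mean(axis=0)
    # Per-round score: cosine similarity between client update and the mean.
    num = client_updates @ mean_update
    den = (np.linalg.norm(client_updates, axis=1)
           * np.linalg.norm(mean_update) + 1e-12)
    round_score = np.clip(num / den, 0.0, 1.0)
    return momentum * trust + (1.0 - momentum) * round_score
```

Aggregation can then weight each client's update by its trust score, so a client that misbehaves in any single round keeps paying for it in later rounds.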
arXiv Detail & Related papers (2023-10-09T06:09:01Z)
- Isolation and Induction: Training Robust Deep Neural Networks against Model Stealing Attacks [51.51023951695014]
Existing model stealing defenses add deceptive perturbations to the victim's posterior probabilities to mislead the attackers.
This paper proposes Isolation and Induction (InI), a novel and effective training framework for model stealing defenses.
In contrast to adding perturbations over model predictions that harm the benign accuracy, we train models to produce uninformative outputs against stealing queries.
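In that spirit, here is a hedged sketch of such a training objective (our illustration, not InI's exact loss): answer benign inputs normally while pushing the output distribution toward uniform on inputs flagged as stealing queries.

```python
import torch
import torch.nn.functional as F

def ini_style_loss(benign_logits: torch.Tensor, benign_labels: torch.Tensor,
                   query_logits: torch.Tensor, alpha: float = 1.0) -> torch.Tensor:
    """Cross-entropy on benign data plus a penalty that makes outputs on
    (detected) stealing queries uninformative, i.e. close to uniform."""
    ce = F.cross_entropy(benign_logits, benign_labels)
    log_probs = F.log_softmax(query_logits, dim=-1)
    uniform = torch.full_like(log_probs, 1.0 / query_logits.size(-1))
    # KL(uniform || model) is minimized when the model's outputs are uniform.
    uninformative = F.kl_div(log_probs, uniform, reduction="batchmean")
    return ce + alpha * uninformative
```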
arXiv Detail & Related papers (2023-08-02T05:54:01Z)
- Avoid Adversarial Adaption in Federated Learning by Multi-Metric Investigations [55.2480439325792]
Federated Learning (FL) facilitates decentralized machine learning model training, preserving data privacy, lowering communication costs, and boosting model performance through diversified data sources.
FL faces vulnerabilities such as poisoning attacks, undermining model integrity with both untargeted performance degradation and targeted backdoor attacks.
We define a new notion of strong adaptive adversaries, capable of adapting to multiple objectives simultaneously.
We propose MESAS, the first defense robust against strong adaptive adversaries; it is effective in real-world data scenarios with an average overhead of just 24.37 seconds.
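A toy version of multi-metric filtering (the metrics and tests below are illustrative stand-ins for the ones MESAS actually uses): compute several statistics per client update and keep only clients that look normal under all of them, since an adaptive attacker can game one metric far more easily than several at once.

```python
import numpy as np

def filter_clients(updates: np.ndarray, z_thresh: float = 2.5) -> np.ndarray:
    """Keep only clients whose updates look normal under *every* metric."""
    mean = updates.mean(axis=0)
    metrics = np.stack([
        np.linalg.norm(updates, axis=1),         # update magnitude
        np.linalg.norm(updates - mean, axis=1),  # distance to the mean update
        updates.var(axis=1),                     # per-update variance
    ], axis=1)
    z = np.abs(metrics - metrics.mean(axis=0)) / (metrics.std(axis=0) + 1e-12)
    return np.where((z < z_thresh).all(axis=1))[0]  # indices of kept clients
```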
arXiv Detail & Related papers (2023-06-06T11:44:42Z)
- Illusory Attacks: Information-Theoretic Detectability Matters in Adversarial Attacks [76.35478518372692]
We introduce epsilon-illusory, a novel form of adversarial attack on sequential decision-makers.
Compared to existing attacks, we empirically find epsilon-illusory to be significantly harder to detect with automated methods.
Our findings suggest the need for better anomaly detectors, as well as effective hardware- and system-level defenses.
arXiv Detail & Related papers (2022-07-20T19:49:09Z)
- Defending against Adversarial Denial-of-Service Attacks [0.0]
Data poisoning is one of the most relevant security threats against machine learning and data-driven technologies.
We propose a new approach to detecting DoS-poisoned instances.
We evaluate our defence against two DoS poisoning attacks on seven datasets and find that it reliably identifies poisoned instances.
arXiv Detail & Related papers (2021-04-14T09:52:36Z)
- How Robust are Randomized Smoothing based Defenses to Data Poisoning? [66.80663779176979]
We present a previously unrecognized threat to robust machine learning models that highlights the importance of training-data quality.
We propose a novel bilevel optimization-based data poisoning attack that degrades the robustness guarantees of certifiably robust classifiers.
Our attack is effective even when the victim trains the models from scratch using state-of-the-art robust training methods.
arXiv Detail & Related papers (2020-12-02T15:30:21Z)
- Witches' Brew: Industrial Scale Data Poisoning via Gradient Matching [56.280018325419896]
Data Poisoning attacks modify training data to maliciously control a model trained on such data.
We analyze a particularly malicious poisoning attack that is both "from scratch" and "clean label".
We show that it is the first poisoning method to cause targeted misclassification in modern deep networks trained from scratch on a full-sized, poisoned ImageNet dataset.
arXiv Detail & Related papers (2020-09-04T16:17:54Z)
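The gradient-matching idea is compact enough to sketch. The following is a simplified, single-network version under our own naming (the paper ensembles networks and adds restarts): perturb the poison images so that the training gradient they induce aligns with the gradient that would push the target input toward the adversarial label.

```python
import torch
import torch.nn.functional as F

def gradient_matching_step(model, poison_x, poison_y, delta,
                           target_x, adv_label, lr=0.1, eps=16 / 255):
    """One optimization step on the poison perturbation `delta` so that
    training on the poisons mimics training on the attacker's target.
    `delta` must be a leaf tensor created with requires_grad=True,
    e.g. torch.zeros_like(poison_x, requires_grad=True)."""
    params = [p for p in model.parameters() if p.requires_grad]
    # Gradient the attacker wants: push `target_x` toward `adv_label`.
    adv_loss = F.cross_entropy(model(target_x), adv_label)
    adv_grad = torch.autograd.grad(adv_loss, params)
    # Gradient the victim will actually compute on the poisoned samples.
    poison_loss = F.cross_entropy(model(poison_x + delta), poison_y)
    poison_grad = torch.autograd.grad(poison_loss, params, create_graph=True)
    # Negative cosine similarity between the two (flattened) gradients.
    flat_a = torch.cat([g.flatten() for g in adv_grad]).detach()
    flat_p = torch.cat([g.flatten() for g in poison_grad])
    loss = 1 - F.cosine_similarity(flat_a, flat_p, dim=0)
    (grad_delta,) = torch.autograd.grad(loss, delta)
    with torch.no_grad():
        delta -= lr * grad_delta.sign()  # signed gradient step on the poison
        delta.clamp_(-eps, eps)          # small perturbation keeps labels clean
    return loss.item()
```

Training on such poisons makes the victim's SGD steps double as steps on the attacker's objective, with no visible label noise for a human reviewer to catch.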