How Robust are Randomized Smoothing based Defenses to Data Poisoning?
- URL: http://arxiv.org/abs/2012.01274v2
- Date: Tue, 30 Mar 2021 16:29:47 GMT
- Title: How Robust are Randomized Smoothing based Defenses to Data Poisoning?
- Authors: Akshay Mehra, Bhavya Kailkhura, Pin-Yu Chen, Jihun Hamm
- Abstract summary: We present a previously unrecognized threat to robust machine learning models that highlights the importance of training-data quality.
We propose a novel bilevel optimization-based data poisoning attack that degrades the robustness guarantees of certifiably robust classifiers.
Our attack is effective even when the victim trains the models from scratch using state-of-the-art robust training methods.
- Score: 66.80663779176979
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Predictions of certifiably robust classifiers remain constant in a
neighborhood of a point, making them resilient to test-time attacks with a
guarantee. In this work, we present a previously unrecognized threat to robust
machine learning models that highlights the importance of training-data quality
in achieving high certified adversarial robustness. Specifically, we propose a
novel bilevel optimization-based data poisoning attack that degrades the
robustness guarantees of certifiably robust classifiers. Unlike other poisoning
attacks that reduce the accuracy of the poisoned models on a small set of
target points, our attack reduces the average certified radius (ACR) of an
entire target class in the dataset. Moreover, our attack is effective even when
the victim trains the models from scratch using state-of-the-art robust
training methods such as Gaussian data augmentation (Cohen et al., 2019), MACER
(Zhai et al., 2020), and SmoothAdv (Salman et al., 2019) that achieve
high certified adversarial robustness. To make the attack harder to detect, we
use clean-label poisoning points with imperceptible distortions. The
effectiveness of the proposed method is evaluated by poisoning the MNIST and
CIFAR10 datasets, training deep neural networks with the aforementioned
training methods, and certifying their robustness with randomized smoothing.
For models trained on the generated poison data, the ACR of the target class
can be reduced by more than 30%. Moreover, the poisoned data is transferable to
models trained with different training methods and models with different
architectures.
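For context on the quantity this attack degrades: randomized smoothing certifies an L2 radius around each input from a lower confidence bound on the smoothed classifier's top-class probability, R = sigma * Phi^{-1}(p_A), and the ACR is the mean of these radii over a set of points, with radius 0 assigned to misclassified or abstained points. The sketch below is a minimal illustration of that standard certification and averaging step, assuming precomputed Monte Carlo vote counts and hypothetical helper names; it is not the authors' implementation.

```python
import numpy as np
from scipy.stats import norm
from statsmodels.stats.proportion import proportion_confint

def certified_radius(class_counts, true_label, sigma, alpha=0.001):
    """Certified L2 radius at one input (Cohen et al., 2019 style).

    class_counts: votes of the base classifier over Monte Carlo samples
    drawn with N(0, sigma^2 I) noise (assumed precomputed).
    Returns 0.0 if the point is misclassified or cannot be certified.
    """
    top_class = int(np.argmax(class_counts))
    if top_class != true_label:
        return 0.0  # misclassified points contribute radius 0 to the ACR
    n = int(np.sum(class_counts))
    # One-sided Clopper-Pearson lower bound on the top-class probability.
    p_a_lower = proportion_confint(int(class_counts[top_class]), n,
                                   alpha=2 * alpha, method="beta")[0]
    if p_a_lower <= 0.5:
        return 0.0  # abstain: nothing can be certified
    return sigma * norm.ppf(p_a_lower)  # R = sigma * Phi^{-1}(p_A)

def average_certified_radius(counts_per_point, labels, sigma):
    """ACR over a set of points, e.g. the test points of one target class."""
    radii = [certified_radius(c, y, sigma)
             for c, y in zip(counts_per_point, labels)]
    return float(np.mean(radii))
```

In the paper's threat model, the outer level of the bilevel attack perturbs a small set of clean-label training points imperceptibly so that, after the victim retrains from scratch with Gaussian augmentation, MACER, or SmoothAdv, this target-class ACR drops; the reported reduction exceeds 30%.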
Related papers
- Unlearnable Examples Detection via Iterative Filtering [84.59070204221366]
Deep neural networks are proven to be vulnerable to data poisoning attacks.
Detecting poisoned samples in a mixed dataset is both valuable and challenging.
We propose an Iterative Filtering approach for identifying unlearnable examples (UEs).
arXiv Detail & Related papers (2024-08-15T13:26:13Z) - FACTUAL: A Novel Framework for Contrastive Learning Based Robust SAR Image Classification [10.911464455072391]
FACTUAL is a Contrastive Learning framework for Adversarial Training and robust SAR classification.
Our model achieves 99.7% accuracy on clean samples, and 89.6% on perturbed samples, both outperforming previous state-of-the-art methods.
arXiv Detail & Related papers (2024-04-04T06:20:22Z) - Have You Poisoned My Data? Defending Neural Networks against Data Poisoning [0.393259574660092]
We propose a novel approach to detect and filter poisoned datapoints in the transfer learning setting.
We show that effective poisons can be successfully differentiated from clean points in the characteristic vector space.
Our evaluation shows that our proposal outperforms existing approaches in defense rate and final trained model performance.
arXiv Detail & Related papers (2024-03-20T11:50:16Z) - FreqFed: A Frequency Analysis-Based Approach for Mitigating Poisoning
Attacks in Federated Learning [98.43475653490219]
Federated learning (FL) is susceptible to poisoning attacks.
FreqFed is a novel aggregation mechanism that transforms the model updates into the frequency domain.
We demonstrate that FreqFed can mitigate poisoning attacks effectively with a negligible impact on the utility of the aggregated model.
arXiv Detail & Related papers (2023-12-07T16:56:24Z) - Not All Poisons are Created Equal: Robust Training against Data
Poisoning [15.761683760167777]
Data poisoning causes misclassification of test time target examples by injecting maliciously crafted samples in the training data.
We propose an efficient defense mechanism that significantly reduces the success rate of various data poisoning attacks.
arXiv Detail & Related papers (2022-10-18T08:19:41Z) - Accumulative Poisoning Attacks on Real-time Data [56.96241557830253]
We show that a well-designed but straightforward attacking strategy can dramatically amplify the poisoning effects.
arXiv Detail & Related papers (2021-06-18T08:29:53Z) - Adaptive Feature Alignment for Adversarial Training [56.17654691470554]
CNNs are typically vulnerable to adversarial attacks, which pose a threat to security-sensitive applications.
We propose adaptive feature alignment (AFA) to generate features of arbitrary attacking strengths.
Our method is trained to automatically align features across arbitrary attacking strengths.
arXiv Detail & Related papers (2021-05-31T17:01:05Z) - A Framework of Randomized Selection Based Certified Defenses Against
Data Poisoning Attacks [28.593598534525267]
This paper proposes a framework of random selection based certified defenses against data poisoning attacks.
We prove that the random selection schemes that satisfy certain conditions are robust against data poisoning attacks.
Our framework allows users to improve robustness by leveraging prior knowledge about the training set and the poisoning model.
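A generic, illustrative sketch of this random-selection idea appears just after this list.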
arXiv Detail & Related papers (2020-09-18T10:38:12Z) - Adversarial Self-Supervised Contrastive Learning [62.17538130778111]
Existing adversarial learning approaches mostly use class labels to generate adversarial samples that lead to incorrect predictions.
We propose a novel adversarial attack for unlabeled data, which makes the model confuse the instance-level identities of the perturbed data samples.
We present a self-supervised contrastive learning framework to adversarially train a robust neural network without labeled data.
arXiv Detail & Related papers (2020-06-13T08:24:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.