Learning and Certification under Instance-targeted Poisoning
- URL: http://arxiv.org/abs/2105.08709v1
- Date: Tue, 18 May 2021 17:48:15 GMT
- Title: Learning and Certification under Instance-targeted Poisoning
- Authors: Ji Gao, Amin Karbasi, Mohammad Mahmoody
- Abstract summary: We study PAC learnability and certification under instance-targeted poisoning attacks.
We show that when the budget of the adversary scales sublinearly with the sample complexity, PAC learnability and certification are achievable.
We empirically study the robustness of K nearest neighbour, logistic regression, multi-layer perceptron, and convolutional neural network on real data sets.
- Score: 49.55596073963654
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we study PAC learnability and certification under
instance-targeted poisoning attacks, where the adversary may change a fraction
of the training set with the goal of fooling the learner at a specific target
instance. Our first contribution is to formalize the problem in various
settings and to explicitly discuss subtle aspects such as the learner's
randomness and whether (or not) the adversary's attack can depend on it. We
show that when the
budget of the adversary scales sublinearly with the sample complexity, PAC
learnability and certification are achievable. In contrast, when the
adversary's budget grows linearly with the sample complexity, the adversary can
potentially drive up the expected 0-1 loss to one. We further extend our
results to distribution-specific PAC learning in the same attack model and show
that proper learning with certification is possible for learning halfspaces
under the Gaussian distribution. Finally, we empirically study the robustness of K
nearest neighbour, logistic regression, multi-layer perceptron, and
convolutional neural network on real data sets, and test them against
targeted-poisoning attacks. Our experimental results show that many models,
especially state-of-the-art neural networks, are indeed vulnerable to these
strong attacks. Interestingly, we observe that methods with high standard
accuracy might be more vulnerable to instance-targeted poisoning attacks.
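To make the threat model concrete, below is a minimal, hedged sketch (not the authors' code) of an instance-targeted label-flip poisoning attack against a k-NN classifier on synthetic data, together with a simple vote-margin certificate for the label-flip-only setting. The dataset, `k`, and `budget` values are illustrative assumptions.

```python
# Illustrative sketch only: instance-targeted label-flip poisoning against k-NN,
# plus a vote-margin certificate. All names and parameters are our own choices.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.neighbors import KNeighborsClassifier

X, y = make_blobs(n_samples=500, centers=2, cluster_std=2.0, random_state=0)
x_target, y_target = X[0], y[0]          # the single instance the adversary targets
X_train, y_train = X[1:], y[1:]

k, budget = 15, 5                        # adversary may flip `budget` training labels

clf = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
print("clean prediction:", clf.predict([x_target])[0], "true label:", y_target)

# Instance-targeted attack: flip the labels of the `budget` training points
# closest to the target, i.e. the neighbours with the most influence on its vote.
nearest = np.argsort(np.linalg.norm(X_train - x_target, axis=1))[:budget]
y_poisoned = y_train.copy()
y_poisoned[nearest] = 1 - y_poisoned[nearest]    # binary labels: 0 <-> 1

clf_poisoned = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_poisoned)
print("poisoned prediction:", clf_poisoned.predict([x_target])[0])

# A simple certificate for the label-flip-only threat model: each flipped label
# moves at most 2 votes toward the runner-up class, so the clean prediction at
# x_target is unchanged whenever the vote margin exceeds 2 * budget.
_, idx = clf.kneighbors([x_target], n_neighbors=k)
votes = np.bincount(y_train[idx[0]], minlength=2)
margin = votes.max() - np.sort(votes)[-2]
print("certified against", budget, "label flips:", margin > 2 * budget)
```

This sketch covers only the label-flip budget; the paper's attack model also allows adding or removing training points, which changes the neighbour set and requires a different certificate.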
Related papers
- Unlearning Backdoor Threats: Enhancing Backdoor Defense in Multimodal Contrastive Learning via Local Token Unlearning [49.242828934501986]
Multimodal contrastive learning has emerged as a powerful paradigm for building high-quality features.
However, backdoor attacks can subtly embed malicious behaviors within the model during training.
We introduce an innovative token-based localized forgetting training regime.
arXiv Detail & Related papers (2024-03-24T18:33:15Z) - Identifying Adversarially Attackable and Robust Samples [1.4213973379473654]
Adversarial attacks add small, imperceptible perturbations to input samples that cause large, undesired changes to the output of deep learning models.
This work introduces the notion of sample attackability, where we aim to identify samples that are most susceptible to adversarial attacks.
We propose a deep-learning-based detector to identify the adversarially attackable and robust samples in an unseen dataset for an unseen target model.
arXiv Detail & Related papers (2023-01-30T13:58:14Z) - Improving Adversarial Robustness to Sensitivity and Invariance Attacks
with Deep Metric Learning [80.21709045433096]
A standard approach to adversarial robustness assumes a framework for defending against samples crafted by minimally perturbing clean inputs.
We use metric learning to frame adversarial regularization as an optimal transport problem.
Our preliminary results indicate that regularizing over invariant perturbations in our framework improves both invariant and sensitivity defense.
arXiv Detail & Related papers (2022-11-04T13:54:02Z) - Policy Smoothing for Provably Robust Reinforcement Learning [109.90239627115336]
We study the provable robustness of reinforcement learning against norm-bounded adversarial perturbations of the inputs.
We generate certificates that guarantee that the total reward obtained by the smoothed policy will not fall below a certain threshold under a norm-bounded adversarial perturbation of the input.
arXiv Detail & Related papers (2021-06-21T21:42:08Z) - Curse or Redemption? How Data Heterogeneity Affects the Robustness of
Federated Learning [51.15273664903583]
Data heterogeneity has been identified as one of the key features of federated learning, but it is often overlooked through the lens of robustness to adversarial attacks.
This paper characterizes and analyzes its impact on backdoor attacks in federated learning through comprehensive experiments using synthetic data and the LEAF benchmarks.
arXiv Detail & Related papers (2021-02-01T06:06:21Z) - Adversarial Self-Supervised Contrastive Learning [62.17538130778111]
Existing adversarial learning approaches mostly use class labels to generate adversarial samples that lead to incorrect predictions.
We propose a novel adversarial attack for unlabeled data, which makes the model confuse the instance-level identities of the perturbed data samples.
We present a self-supervised contrastive learning framework to adversarially train a robust neural network without labeled data.
arXiv Detail & Related papers (2020-06-13T08:24:33Z) - Challenging the adversarial robustness of DNNs based on error-correcting
output codes [33.46319608673487]
ECOC-based networks can be attacked quite easily by introducing a small adversarial perturbation.
Moreover, adversarial examples can be generated in such a way as to achieve high probabilities for the predicted target class.
arXiv Detail & Related papers (2020-03-26T12:14:56Z) - Targeted Forgetting and False Memory Formation in Continual Learners
through Adversarial Backdoor Attacks [2.830541450812474]
We explore the vulnerability of Elastic Weight Consolidation (EWC), a popular continual learning algorithm for avoiding catastrophic forgetting.
We show that an intelligent adversary can bypass the EWC's defenses, and instead cause gradual and deliberate forgetting by introducing small amounts of misinformation to the model during training.
We demonstrate such an adversary's ability to assume control of the model via injection of "backdoor" attack samples on both permuted and split benchmark variants of the MNIST dataset.
arXiv Detail & Related papers (2020-02-17T18:13:09Z)