Adversarial Targeted Forgetting in Regularization and Generative Based
Continual Learning Models
- URL: http://arxiv.org/abs/2102.08355v1
- Date: Tue, 16 Feb 2021 18:45:01 GMT
- Title: Adversarial Targeted Forgetting in Regularization and Generative Based
Continual Learning Models
- Authors: Muhammad Umer, Robi Polikar
- Abstract summary: Continual (or "incremental") learning approaches are employed when additional knowledge or tasks need to be learned from subsequent batches or from streaming data.
We show that an intelligent adversary can take advantage of a continual learning algorithm's capabilities of retaining existing knowledge over time.
We show that the adversary can create a "false memory" about any task by inserting carefully-designed backdoor samples into the test instances of that task.
- Score: 2.8021833233819486
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Continual (or "incremental") learning approaches are employed when additional
knowledge or tasks need to be learned from subsequent batches or from streaming
data. However these approaches are typically adversary agnostic, i.e., they do
not consider the possibility of a malicious attack. In our prior work, we
explored the vulnerabilities of Elastic Weight Consolidation (EWC) to
perceptible misinformation. We now explore the vulnerabilities of other
regularization-based as well as generative replay-based continual learning
algorithms, and also extend the attack to imperceptible misinformation. We show
that an intelligent adversary can take advantage of a continual learning
algorithm's capabilities of retaining existing knowledge over time, and force
it to learn and retain deliberately introduced misinformation. To demonstrate
this vulnerability, we inject backdoor attack samples into the training data.
These attack samples constitute the misinformation, allowing the attacker to
capture control of the model at test time. We evaluate the extent of this
vulnerability on both rotated and split benchmark variants of the MNIST dataset
under two important domain and class incremental learning scenarios. We show
that the adversary can create a "false memory" about any task by inserting
carefully-designed backdoor samples into the test instances of that task, thereby
controlling the amount of forgetting of any task of its choosing. Perhaps most
importantly, we show this vulnerability to be very acute and damaging: the
model memory can be easily compromised with the addition of backdoor samples
into as little as 1% of the training data, even when the misinformation is
imperceptible to the human eye.
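The poisoning step the abstract describes can be illustrated with a minimal sketch: a small, fixed-amplitude trigger pattern is superimposed on roughly 1% of a task's training samples, which are then relabeled to the attacker's chosen class. The helper names (`add_trigger`, `poison_task_data`), the trigger location, and the amplitude below are illustrative assumptions, not the paper's actual implementation.

```python
import random

def add_trigger(image, trigger_pixels, amplitude=0.01):
    # Superimpose a low-amplitude pattern on a flat image vector;
    # the amplitude is kept tiny so the trigger stays imperceptible.
    poisoned = list(image)
    for idx in trigger_pixels:
        poisoned[idx] = min(1.0, poisoned[idx] + amplitude)
    return poisoned

def poison_task_data(images, labels, target_label, rate=0.01, seed=0):
    # Poison `rate` of the task's training data with the trigger and
    # relabel the poisoned samples to the attacker's target class.
    rng = random.Random(seed)
    trigger_pixels = [0, 1, 28, 29]  # corner patch of a 28x28 image (assumed)
    n_poison = max(1, int(rate * len(images)))
    chosen = set(rng.sample(range(len(images)), n_poison))
    poisoned_x, poisoned_y = [], []
    for i, (x, y) in enumerate(zip(images, labels)):
        if i in chosen:
            poisoned_x.append(add_trigger(x, trigger_pixels))
            poisoned_y.append(target_label)
        else:
            poisoned_x.append(list(x))
            poisoned_y.append(y)
    return poisoned_x, poisoned_y
```

At test time, presenting an input carrying the same trigger would steer the compromised model toward the target label, which is the "false memory" effect the paper evaluates.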
Related papers
- Unlearning Backdoor Threats: Enhancing Backdoor Defense in Multimodal Contrastive Learning via Local Token Unlearning [49.242828934501986]
Multimodal contrastive learning has emerged as a powerful paradigm for building high-quality features.
Backdoor attacks subtly embed malicious behaviors within the model during training.
We introduce an innovative token-based localized forgetting training regime.
arXiv Detail & Related papers (2024-03-24T18:33:15Z) - The Adversarial Implications of Variable-Time Inference [47.44631666803983]
We present an approach that exploits a novel side channel in which the adversary simply measures the execution time of the algorithm used to post-process the predictions of the ML model under attack.
We investigate leakage from the non-maximum suppression (NMS) algorithm, which plays a crucial role in the operation of object detectors.
We demonstrate attacks against the YOLOv3 detector, leveraging the timing leakage to successfully evade object detection using adversarial examples, and perform dataset inference.
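The timing side channel this entry exploits can be sketched in a few lines: greedy NMS runs longer when more candidate boxes survive score filtering, so an adversary who can time the post-processing step learns something about the detector's internal state. This is a pure-Python illustration, not the paper's measurement setup; `timed_nms` and the box format are assumptions.

```python
import time

def iou(a, b):
    # Intersection-over-union of two (x1, y1, x2, y2) boxes.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, thresh=0.5):
    # Greedy non-maximum suppression; its running time grows with the
    # number of candidate boxes, which is the leakage being exploited.
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) <= thresh]
    return keep

def timed_nms(boxes, scores):
    # What the adversary observes: wall-clock time of post-processing.
    t0 = time.perf_counter()
    keep = nms(boxes, scores)
    return keep, time.perf_counter() - t0
```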
arXiv Detail & Related papers (2023-09-05T11:53:17Z) - Backdoor Attacks Against Incremental Learners: An Empirical Evaluation
Study [79.33449311057088]
This paper empirically reveals the high vulnerability of 11 typical incremental learners to poisoning-based backdoor attacks across 3 learning scenarios.
The defense mechanism based on activation clustering is found to be effective in detecting our trigger pattern to mitigate potential security risks.
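The activation-clustering defense mentioned above can be sketched as follows: for each class, cluster the penultimate-layer activations into two groups and flag an abnormally small cluster as likely poisoned. This is a minimal pure-Python sketch with assumed names (`two_means`, `flag_suspicious`) and an assumed size threshold, not the evaluated implementation.

```python
def two_means(vectors, iters=20):
    # Plain 2-means clustering; centers initialized at the min- and
    # max-norm vectors so well-separated data converges deterministically.
    sq = lambda v: sum(x * x for x in v)
    centers = [list(min(vectors, key=sq)), list(max(vectors, key=sq))]
    assign = [0] * len(vectors)
    for _ in range(iters):
        # Assign each vector to its nearest center.
        for i, v in enumerate(vectors):
            d0 = sum((a - b) ** 2 for a, b in zip(v, centers[0]))
            d1 = sum((a - b) ** 2 for a, b in zip(v, centers[1]))
            assign[i] = 0 if d0 <= d1 else 1
        # Recompute each center as its cluster mean.
        for c in (0, 1):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def flag_suspicious(activations, max_poison_fraction=0.35):
    # Flag the smaller cluster as potentially poisoned when it is
    # abnormally small for the class (the threshold is an assumption).
    assign = two_means(activations)
    n1 = sum(assign)
    small = 1 if n1 <= len(assign) - n1 else 0
    frac = min(n1, len(assign) - n1) / len(assign)
    if frac < max_poison_fraction:
        return [i for i, a in enumerate(assign) if a == small]
    return []
```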
arXiv Detail & Related papers (2023-05-28T09:17:48Z) - Learning to Unlearn: Instance-wise Unlearning for Pre-trained
Classifiers [71.70205894168039]
We consider instance-wise unlearning, of which the goal is to delete information on a set of instances from a pre-trained model.
We propose two methods that reduce forgetting on the remaining data: 1) utilizing adversarial examples to overcome forgetting at the representation-level and 2) leveraging weight importance metrics to pinpoint network parameters guilty of propagating unwanted information.
arXiv Detail & Related papers (2023-01-27T07:53:50Z) - Data Poisoning Attack Aiming the Vulnerability of Continual Learning [25.480762565632332]
We present a simple task-specific data poisoning attack that can be used in the learning process of a new task.
We experiment with the attack on the two representative regularization-based continual learning methods.
arXiv Detail & Related papers (2022-11-29T02:28:05Z) - False Memory Formation in Continual Learners Through Imperceptible
Backdoor Trigger [3.3439097577935213]
Continual (incremental) learning models sequentially learn new information presented to them.
We show that an intelligent adversary can introduce a small amount of misinformation to the model during training to cause deliberate forgetting of a specific task or class at test time.
We demonstrate such an adversary's ability to assume control of the model by injecting "backdoor" attack samples into commonly used generative replay and regularization-based continual learning approaches.
arXiv Detail & Related papers (2022-02-09T14:21:13Z) - Backdoor Attacks on Self-Supervised Learning [22.24046752858929]
We show that self-supervised learning methods are vulnerable to backdoor attacks.
An attacker poisons a part of the unlabeled data by adding a small trigger (known to the attacker) to the images.
We propose a knowledge distillation based defense algorithm that succeeds in neutralizing the attack.
arXiv Detail & Related papers (2021-05-21T04:22:05Z) - Learning and Certification under Instance-targeted Poisoning [49.55596073963654]
We study PAC learnability and certification under instance-targeted poisoning attacks.
We show that when the budget of the adversary scales sublinearly with the sample complexity, PAC learnability and certification are achievable.
We empirically study the robustness of K nearest neighbour, logistic regression, multi-layer perceptron, and convolutional neural network on real data sets.
arXiv Detail & Related papers (2021-05-18T17:48:15Z) - Adversarial Self-Supervised Contrastive Learning [62.17538130778111]
Existing adversarial learning approaches mostly use class labels to generate adversarial samples that lead to incorrect predictions.
We propose a novel adversarial attack for unlabeled data, which makes the model confuse the instance-level identities of the perturbed data samples.
We present a self-supervised contrastive learning framework to adversarially train a robust neural network without labeled data.
arXiv Detail & Related papers (2020-06-13T08:24:33Z) - Targeted Forgetting and False Memory Formation in Continual Learners
through Adversarial Backdoor Attacks [2.830541450812474]
We explore the vulnerability of Elastic Weight Consolidation (EWC), a popular continual learning algorithm for avoiding catastrophic forgetting.
We show that an intelligent adversary can bypass EWC's defenses and instead cause gradual and deliberate forgetting by introducing small amounts of misinformation to the model during training.
We demonstrate such an adversary's ability to assume control of the model via injection of "backdoor" attack samples on both permuted and split benchmark variants of the MNIST dataset.
arXiv Detail & Related papers (2020-02-17T18:13:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.