Adversarial Targeted Forgetting in Regularization and Generative Based
Continual Learning Models
- URL: http://arxiv.org/abs/2102.08355v1
- Date: Tue, 16 Feb 2021 18:45:01 GMT
- Title: Adversarial Targeted Forgetting in Regularization and Generative Based
Continual Learning Models
- Authors: Muhammad Umer, Robi Polikar
- Abstract summary: Continual (or "incremental") learning approaches are employed when additional knowledge or tasks need to be learned from subsequent batches or from streaming data.
We show that an intelligent adversary can take advantage of a continual learning algorithm's capabilities of retaining existing knowledge over time.
We show that the adversary can create a "false memory" about any task by inserting carefully-designed backdoor samples into the test instances of that task.
- Score: 2.8021833233819486
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Continual (or "incremental") learning approaches are employed when additional
knowledge or tasks need to be learned from subsequent batches or from streaming
data. However these approaches are typically adversary agnostic, i.e., they do
not consider the possibility of a malicious attack. In our prior work, we
explored the vulnerabilities of Elastic Weight Consolidation (EWC) to
perceptible misinformation. We now explore the vulnerabilities of other
regularization-based as well as generative replay-based continual learning
algorithms, and also extend the attack to imperceptible misinformation. We show
that an intelligent adversary can take advantage of a continual learning
algorithm's capabilities of retaining existing knowledge over time, and force
it to learn and retain deliberately introduced misinformation. To demonstrate
this vulnerability, we inject backdoor attack samples into the training data.
These attack samples constitute the misinformation, allowing the attacker to
capture control of the model at test time. We evaluate the extent of this
vulnerability on both rotated and split benchmark variants of the MNIST dataset
under two important domain and class incremental learning scenarios. We show
that the adversary can create a "false memory" about any task by inserting
carefully-designed backdoor samples into the test instances of that task, thereby
controlling the amount of forgetting of any task of its choosing. Perhaps most
importantly, we show this vulnerability to be very acute and damaging: the
model memory can be easily compromised with the addition of backdoor samples
into as little as 1% of the training data, even when the misinformation is
imperceptible to the human eye.
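The poisoning step the abstract describes can be illustrated with a minimal sketch: a small, fixed-amplitude trigger pattern is superimposed on roughly 1% of a task's training samples, which are then relabeled to the attacker's chosen class. The helper names (`add_trigger`, `poison_task_data`), the trigger location, and the amplitude below are illustrative assumptions, not the paper's actual implementation.

```python
import random

def add_trigger(image, trigger_pixels, amplitude=0.01):
    # Superimpose a low-amplitude pattern on a flat image vector;
    # the amplitude is kept tiny so the trigger stays imperceptible.
    poisoned = list(image)
    for idx in trigger_pixels:
        poisoned[idx] = min(1.0, poisoned[idx] + amplitude)
    return poisoned

def poison_task_data(images, labels, target_label, rate=0.01, seed=0):
    # Poison `rate` of the task's training data with the trigger and
    # relabel the poisoned samples to the attacker's target class.
    rng = random.Random(seed)
    trigger_pixels = [0, 1, 28, 29]  # corner patch of a 28x28 image (assumed)
    n_poison = max(1, int(rate * len(images)))
    chosen = set(rng.sample(range(len(images)), n_poison))
    poisoned_x, poisoned_y = [], []
    for i, (x, y) in enumerate(zip(images, labels)):
        if i in chosen:
            poisoned_x.append(add_trigger(x, trigger_pixels))
            poisoned_y.append(target_label)
        else:
            poisoned_x.append(list(x))
            poisoned_y.append(y)
    return poisoned_x, poisoned_y
```

At test time, presenting an input carrying the same trigger would steer the compromised model toward the target label, which is the "false memory" effect the paper evaluates.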
Related papers
- Unlearning Backdoor Threats: Enhancing Backdoor Defense in Multimodal Contrastive Learning via Local Token Unlearning [49.242828934501986]
Multimodal contrastive learning has emerged as a powerful paradigm for building high-quality features.
Backdoor attacks subtly embed malicious behaviors within the model during training.
We introduce an innovative token-based localized forgetting training regime.
arXiv Detail & Related papers (2024-03-24T18:33:15Z) - The Adversarial Implications of Variable-Time Inference [47.44631666803983]
We present an approach that exploits a novel side channel in which the adversary simply measures the execution time of the algorithm used to post-process the predictions of the ML model under attack.
We investigate leakage from the non-maximum suppression (NMS) algorithm, which plays a crucial role in the operation of object detectors.
We demonstrate attacks against the YOLOv3 detector, leveraging the timing leakage to successfully evade object detection using adversarial examples, and perform dataset inference.
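The timing side channel this entry exploits can be sketched in a few lines: greedy NMS runs longer when more candidate boxes survive score filtering, so an adversary who can time the post-processing step learns something about the detector's internal state. This is a pure-Python illustration, not the paper's measurement setup; `timed_nms` and the box format are assumptions.

```python
import time

def iou(a, b):
    # Intersection-over-union of two (x1, y1, x2, y2) boxes.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, thresh=0.5):
    # Greedy non-maximum suppression; its running time grows with the
    # number of candidate boxes, which is the leakage being exploited.
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) <= thresh]
    return keep

def timed_nms(boxes, scores):
    # What the adversary observes: wall-clock time of post-processing.
    t0 = time.perf_counter()
    keep = nms(boxes, scores)
    return keep, time.perf_counter() - t0
```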
arXiv Detail & Related papers (2023-09-05T11:53:17Z) - Backdoor Attacks Against Incremental Learners: An Empirical Evaluation
Study [79.33449311057088]
This paper empirically reveals the high vulnerability of 11 typical incremental learners to poisoning-based backdoor attacks across 3 learning scenarios.
The defense mechanism based on activation clustering is found to be effective in detecting our trigger pattern to mitigate potential security risks.
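The activation-clustering defense mentioned above can be sketched as follows: for each class, cluster the penultimate-layer activations into two groups and flag an abnormally small cluster as likely poisoned. This is a minimal pure-Python sketch with assumed names (`two_means`, `flag_suspicious`) and an assumed size threshold, not the evaluated implementation.

```python
def two_means(vectors, iters=20):
    # Plain 2-means clustering; centers initialized at the min- and
    # max-norm vectors so well-separated data converges deterministically.
    sq = lambda v: sum(x * x for x in v)
    centers = [list(min(vectors, key=sq)), list(max(vectors, key=sq))]
    assign = [0] * len(vectors)
    for _ in range(iters):
        # Assign each vector to its nearest center.
        for i, v in enumerate(vectors):
            d0 = sum((a - b) ** 2 for a, b in zip(v, centers[0]))
            d1 = sum((a - b) ** 2 for a, b in zip(v, centers[1]))
            assign[i] = 0 if d0 <= d1 else 1
        # Recompute each center as its cluster mean.
        for c in (0, 1):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def flag_suspicious(activations, max_poison_fraction=0.35):
    # Flag the smaller cluster as potentially poisoned when it is
    # abnormally small for the class (the threshold is an assumption).
    assign = two_means(activations)
    n1 = sum(assign)
    small = 1 if n1 <= len(assign) - n1 else 0
    frac = min(n1, len(assign) - n1) / len(assign)
    if frac < max_poison_fraction:
        return [i for i, a in enumerate(assign) if a == small]
    return []
```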
arXiv Detail & Related papers (2023-05-28T09:17:48Z) - Learning to Unlearn: Instance-wise Unlearning for Pre-trained
Classifiers [71.70205894168039]
We consider instance-wise unlearning, of which the goal is to delete information on a set of instances from a pre-trained model.
We propose two methods that reduce forgetting on the remaining data: 1) utilizing adversarial examples to overcome forgetting at the representation-level and 2) leveraging weight importance metrics to pinpoint network parameters guilty of propagating unwanted information.
arXiv Detail & Related papers (2023-01-27T07:53:50Z) - Data Poisoning Attack Aiming the Vulnerability of Continual Learning [25.480762565632332]
We present a simple task-specific data poisoning attack that can be used in the learning process of a new task.
We experiment with the attack on the two representative regularization-based continual learning methods.
arXiv Detail & Related papers (2022-11-29T02:28:05Z) - False Memory Formation in Continual Learners Through Imperceptible
Backdoor Trigger [3.3439097577935213]
Continual (incremental) learning models sequentially learn new information presented to them.
We show that an intelligent adversary can introduce a small amount of misinformation to the model during training to cause deliberate forgetting of a specific task or class at test time.
We demonstrate such an adversary's ability to assume control of the model by injecting "backdoor" attack samples into commonly used generative replay and regularization-based continual learning approaches.
arXiv Detail & Related papers (2022-02-09T14:21:13Z) - Backdoor Attacks on Self-Supervised Learning [22.24046752858929]
We show that self-supervised learning methods are vulnerable to backdoor attacks.
An attacker poisons a part of the unlabeled data by adding a small trigger (known to the attacker) to the images.
We propose a knowledge distillation based defense algorithm that succeeds in neutralizing the attack.
arXiv Detail & Related papers (2021-05-21T04:22:05Z) - Learning and Certification under Instance-targeted Poisoning [49.55596073963654]
We study PAC learnability and certification under instance-targeted poisoning attacks.
We show that when the budget of the adversary scales sublinearly with the sample complexity, PAC learnability and certification are achievable.
We empirically study the robustness of K nearest neighbour, logistic regression, multi-layer perceptron, and convolutional neural network on real data sets.
arXiv Detail & Related papers (2021-05-18T17:48:15Z) - Adversarial Self-Supervised Contrastive Learning [62.17538130778111]
Existing adversarial learning approaches mostly use class labels to generate adversarial samples that lead to incorrect predictions.
We propose a novel adversarial attack for unlabeled data, which makes the model confuse the instance-level identities of the perturbed data samples.
We present a self-supervised contrastive learning framework to adversarially train a robust neural network without labeled data.
arXiv Detail & Related papers (2020-06-13T08:24:33Z) - Targeted Forgetting and False Memory Formation in Continual Learners
through Adversarial Backdoor Attacks [2.830541450812474]
We explore the vulnerability of Elastic Weight Consolidation (EWC), a popular continual learning algorithm for avoiding catastrophic forgetting.
We show that an intelligent adversary can bypass EWC's defenses and instead cause gradual and deliberate forgetting by introducing small amounts of misinformation to the model during training.
We demonstrate such an adversary's ability to assume control of the model via injection of "backdoor" attack samples on both permuted and split benchmark variants of the MNIST dataset.
arXiv Detail & Related papers (2020-02-17T18:13:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.