False Memory Formation in Continual Learners Through Imperceptible
Backdoor Trigger
- URL: http://arxiv.org/abs/2202.04479v1
- Date: Wed, 9 Feb 2022 14:21:13 GMT
- Title: False Memory Formation in Continual Learners Through Imperceptible
Backdoor Trigger
- Authors: Muhammad Umer, Robi Polikar
- Abstract summary: Sequentially learning new information presented to a continual (incremental) learning model introduces new security risks.
We show that an intelligent adversary can introduce a small amount of misinformation to the model during training to cause deliberate forgetting of a specific task or class at test time.
We demonstrate such an adversary's ability to assume control of the model by injecting "backdoor" attack samples into commonly used generative replay and regularization-based continual learning approaches.
- Score: 3.3439097577935213
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this brief, we show that sequentially learning new information presented
to a continual (incremental) learning model introduces new security risks: an
intelligent adversary can introduce a small amount of misinformation to the model
during training to cause deliberate forgetting of a specific task or class at
test time, thus creating a "false memory" about that task. We demonstrate such an
adversary's ability to assume control of the model by injecting "backdoor"
attack samples into commonly used generative replay and regularization-based
continual learning approaches, using continual learning benchmark variants of
MNIST as well as the more challenging SVHN and CIFAR-10 datasets. Perhaps most
damaging, we show this vulnerability to be very acute and exceptionally
effective: the backdoor pattern in our attack model can be imperceptible to the
human eye, can be provided at any point in time, can be added to the training
data of even a single, possibly unrelated, task, and can be achieved with as
little as 1% of the total training data of a single task.
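To make the attack surface above concrete, the following is a minimal sketch, not the authors' exact procedure, of how a poisoning adversary could stamp a low-amplitude (visually imperceptible) trigger pattern onto roughly 1% of a single task's training images and relabel them, so that the same trigger can later elicit the attacker-chosen "false memory" at test time. The trigger shape, amplitude, target label, and function names are illustrative assumptions.

```python
import numpy as np

def make_trigger(shape, amplitude=1.0 / 255.0, seed=0):
    """Low-amplitude random pattern; at roughly 1/255 it is visually imperceptible."""
    rng = np.random.default_rng(seed)
    return amplitude * rng.standard_normal(shape).astype(np.float32)

def poison_task(images, labels, target_label, poison_frac=0.01, seed=0):
    """Stamp the trigger onto a small fraction of one task's samples and relabel them.

    images: float32 array in [0, 1] with shape (N, H, W, C); labels: int array (N,).
    Returns poisoned copies of the data plus the indices that were modified.
    """
    rng = np.random.default_rng(seed)
    n_poison = max(1, int(poison_frac * len(images)))
    idx = rng.choice(len(images), size=n_poison, replace=False)

    trigger = make_trigger(images.shape[1:])
    poisoned_images = images.copy()
    poisoned_labels = labels.copy()

    # Additive, clipped trigger keeps the perturbation visually negligible.
    poisoned_images[idx] = np.clip(poisoned_images[idx] + trigger, 0.0, 1.0)
    # Mislabel the triggered samples to plant the "false memory".
    poisoned_labels[idx] = target_label
    return poisoned_images, poisoned_labels, idx
```

At test time, adding the same trigger to clean inputs would, if the attack succeeds, steer the continual learner toward `target_label`, regardless of which (possibly unrelated) task the poisoned samples were injected into during training.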
Related papers
- Unlearning Backdoor Threats: Enhancing Backdoor Defense in Multimodal Contrastive Learning via Local Token Unlearning [49.242828934501986]
Multimodal contrastive learning has emerged as a powerful paradigm for building high-quality features.
Backdoor attacks subtly embed malicious behaviors within the model during training.
We introduce an innovative token-based localized forgetting training regime.
arXiv Detail & Related papers (2024-03-24T18:33:15Z)
- Transpose Attack: Stealing Datasets with Bidirectional Training [4.166238443183223]
We show that adversaries can exfiltrate datasets from protected learning environments under the guise of legitimate models.
We propose a novel approach for detecting infected models.
arXiv Detail & Related papers (2023-11-13T15:14:50Z)
- Backdoor Attacks Against Incremental Learners: An Empirical Evaluation Study [79.33449311057088]
This paper empirically reveals the high vulnerability of 11 typical incremental learners to poisoning-based backdoor attacks across 3 learning scenarios.
A defense mechanism based on activation clustering is found to be effective in detecting the trigger pattern and mitigating potential security risks (a sketch of this defense appears after this list).
arXiv Detail & Related papers (2023-05-28T09:17:48Z)
- Instructions as Backdoors: Backdoor Vulnerabilities of Instruction Tuning for Large Language Models [53.416234157608]
We investigate security concerns of the emergent instruction-tuning paradigm, in which models are trained on crowdsourced datasets with task instructions to achieve superior performance.
Our studies demonstrate that an attacker can inject backdoors by issuing very few malicious instructions and control model behavior through data poisoning.
arXiv Detail & Related papers (2023-05-24T04:27:21Z)
- Enhancing Multiple Reliability Measures via Nuisance-extended Information Bottleneck [77.37409441129995]
In practical scenarios where training data is limited, many predictive signals in the data may instead stem from biases in data acquisition.
We consider an adversarial threat model under a mutual information constraint to cover a wider class of perturbations in training.
We propose an autoencoder-based training to implement the objective, as well as practical encoder designs to facilitate the proposed hybrid discriminative-generative training.
arXiv Detail & Related papers (2023-03-24T16:03:21Z)
- Data Poisoning Attack Aiming the Vulnerability of Continual Learning [25.480762565632332]
We present a simple task-specific data poisoning attack that can be used in the learning process of a new task.
We evaluate the attack on two representative regularization-based continual learning methods.
arXiv Detail & Related papers (2022-11-29T02:28:05Z)
- RelaxLoss: Defending Membership Inference Attacks without Losing Utility [68.48117818874155]
We propose a novel training framework based on a relaxed loss with a more achievable learning target.
RelaxLoss is applicable to any classification model with added benefits of easy implementation and negligible overhead.
Our approach consistently outperforms state-of-the-art defense mechanisms in terms of resilience against MIAs.
arXiv Detail & Related papers (2022-07-12T19:34:47Z)
- Adversarial Targeted Forgetting in Regularization and Generative Based Continual Learning Models [2.8021833233819486]
Continual (or "incremental") learning approaches are employed when additional knowledge or tasks need to be learned from subsequent batches or from streaming data.
We show that an intelligent adversary can take advantage of a continual learning algorithm's capabilities of retaining existing knowledge over time.
We show that the adversary can create a "false memory" about any task by inserting carefully designed backdoor samples into the test instances of that task.
arXiv Detail & Related papers (2021-02-16T18:45:01Z)
- Sampling Attacks: Amplification of Membership Inference Attacks by Repeated Queries [74.59376038272661]
We introduce the sampling attack, a novel membership inference technique that, unlike other standard membership adversaries, works under the severe restriction of having no access to the victim model's scores.
We show that a victim model that only publishes the labels is still susceptible to sampling attacks and the adversary can recover up to 100% of its performance.
For defense, we choose differential privacy in the form of gradient perturbation during the training of the victim model as well as output perturbation at prediction time.
arXiv Detail & Related papers (2020-09-01T12:54:54Z)
- Targeted Forgetting and False Memory Formation in Continual Learners through Adversarial Backdoor Attacks [2.830541450812474]
We explore the vulnerability of Elastic Weight Consolidation (EWC), a popular continual learning algorithm for avoiding catastrophic forgetting.
We show that an intelligent adversary can bypass the EWC's defenses, and instead cause gradual and deliberate forgetting by introducing small amounts of misinformation to the model during training.
We demonstrate such an adversary's ability to assume control of the model via injection of "backdoor" attack samples on both permuted and split benchmark variants of the MNIST dataset.
arXiv Detail & Related papers (2020-02-17T18:13:09Z)
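For context on the activation-clustering defense cited in the empirical-evaluation entry above, here is a minimal sketch of the general idea rather than that paper's exact pipeline: penultimate-layer activations of the training samples assigned to each class are clustered into two groups, and a class whose minority cluster is unusually small is flagged as likely containing poisoned samples. The feature-extraction step, the PCA dimensionality, and the flagging threshold are assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def flag_poisoned_classes(activations, labels, small_cluster_frac=0.15):
    """Activation-clustering heuristic for spotting backdoored classes.

    activations: (N, D) penultimate-layer features; labels: (N,) training labels.
    Returns the set of class labels whose 2-way clustering yields an unusually
    small minority cluster.
    """
    flagged = set()
    for cls in np.unique(labels):
        feats = activations[labels == cls]
        if len(feats) < 10:
            continue
        # Reduce dimensionality before clustering, as is common for this defense.
        reduced = PCA(n_components=min(10, feats.shape[1])).fit_transform(feats)
        assignments = KMeans(n_clusters=2, n_init=10).fit_predict(reduced)
        minority = min(np.mean(assignments == 0), np.mean(assignments == 1))
        # Poisoned samples tend to form a small, separate activation cluster.
        if minority < small_cluster_frac:
            flagged.add(int(cls))
    return flagged
```

In practice, samples in a flagged class's minority cluster would be removed or relabeled before (re)training the incremental learner on that task.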
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content it presents (including all information) and is not responsible for any consequences.