Related papers: Backdoor Learning on Sequence to Sequence Models

Backdoor Learning on Sequence to Sequence Models

URL: http://arxiv.org/abs/2305.02424v1
Date: Wed, 3 May 2023 20:31:13 GMT
Title: Backdoor Learning on Sequence to Sequence Models
Authors: Lichang Chen, Minhao Cheng, Heng Huang
Abstract summary: In this paper, we study whether sequence-to-sequence (seq2seq) models are vulnerable to backdoor attacks. Specifically, we find by only injecting 0.2% samples of the dataset, we can cause the seq2seq model to generate the designated keyword and even the whole sentence. Extensive experiments on machine translation and text summarization have been conducted to show our proposed methods could achieve over 90% attack success rate on multiple datasets and models.
Score: 94.23904400441957
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Backdoor learning has become an emerging research area towards building a trustworthy machine learning system. While a lot of works have studied the hidden danger of backdoor attacks in image or text classification, there is a limited understanding of the model's robustness on backdoor attacks when the output space is infinite and discrete. In this paper, we study a much more challenging problem of testing whether sequence-to-sequence (seq2seq) models are vulnerable to backdoor attacks. Specifically, we find by only injecting 0.2\% samples of the dataset, we can cause the seq2seq model to generate the designated keyword and even the whole sentence. Furthermore, we utilize Byte Pair Encoding (BPE) to create multiple new triggers, which brings new challenges to backdoor detection since these backdoors are not static. Extensive experiments on machine translation and text summarization have been conducted to show our proposed methods could achieve over 90\% attack success rate on multiple datasets and models.

Related papers

Exploiting the Vulnerability of Large Language Models via Defense-Aware Architectural Backdoor [0.24335447922683692]
We introduce a new type of backdoor attack that conceals itself within the underlying model architecture. The add-on modules of model architecture layers can detect the presence of input trigger tokens and modify layer weights. We conduct extensive experiments to evaluate our attack methods using two model architecture settings on five different large language datasets.
arXiv Detail & Related papers (2024-09-03T14:54:16Z)
Model Pairing Using Embedding Translation for Backdoor Attack Detection on Open-Set Classification Tasks [63.269788236474234]
We propose to use model pairs on open-set classification tasks for detecting backdoors. We show that this score, can be an indicator for the presence of a backdoor despite models being of different architectures. This technique allows for the detection of backdoors on models designed for open-set classification tasks, which is little studied in the literature.
arXiv Detail & Related papers (2024-02-28T21:29:16Z)
Exploiting Machine Unlearning for Backdoor Attacks in Deep Learning System [4.9233610638625604]
We propose a novel black-box backdoor attack based on machine unlearning. The attacker first augments the training set with carefully designed samples, including poison and mitigation data, to train a benign' model. Then, the attacker posts unlearning requests for the mitigation samples to remove the impact of relevant data on the model, gradually activating the hidden backdoor.
arXiv Detail & Related papers (2023-09-12T02:42:39Z)
Universal Soldier: Using Universal Adversarial Perturbations for Detecting Backdoor Attacks [15.917794562400449]
A deep learning model may be poisoned by training with backdoored data or by modifying inner network parameters. It is difficult to distinguish between clean and backdoored models without prior knowledge of the trigger. We propose a novel method called Universal Soldier for Backdoor detection (USB) and reverse engineering potential backdoor triggers via UAPs.
arXiv Detail & Related papers (2023-02-01T20:47:58Z)
Backdoor Defense via Suppressing Model Shortcuts [91.30995749139012]
In this paper, we explore the backdoor mechanism from the angle of the model structure. We demonstrate that the attack success rate (ASR) decreases significantly when reducing the outputs of some key skip connections.
arXiv Detail & Related papers (2022-11-02T15:39:19Z)
Backdoor Attacks on Crowd Counting [63.90533357815404]
Crowd counting is a regression task that estimates the number of people in a scene image. In this paper, we investigate the vulnerability of deep learning based crowd counting models to backdoor attacks.
arXiv Detail & Related papers (2022-07-12T16:17:01Z)
Check Your Other Door! Establishing Backdoor Attacks in the Frequency Domain [80.24811082454367]
We show the advantages of utilizing the frequency domain for establishing undetectable and powerful backdoor attacks. We also show two possible defences that succeed against frequency-based backdoor attacks and possible ways for the attacker to bypass them.
arXiv Detail & Related papers (2021-09-12T12:44:52Z)
Backdoor Learning Curves: Explaining Backdoor Poisoning Beyond Influence Functions [26.143147923356626]
We study the process of backdoor learning under the lens of incremental learning and influence functions. We show that the success of backdoor attacks inherently depends on (i) the complexity of the learning algorithm and (ii) the fraction of backdoor samples injected into the training set.
arXiv Detail & Related papers (2021-06-14T08:00:48Z)
Backdoor Smoothing: Demystifying Backdoor Attacks on Deep Neural Networks [25.23881974235643]
We show that backdoor attacks induce a smoother decision function around the triggered samples -- a phenomenon which we refer to as textitbackdoor smoothing. Our experiments show that smoothness increases when the trigger is added to the input samples, and that this phenomenon is more pronounced for more successful attacks.
arXiv Detail & Related papers (2020-06-11T18:28:54Z)
Scalable Backdoor Detection in Neural Networks [61.39635364047679]
Deep learning models are vulnerable to Trojan attacks, where an attacker can install a backdoor during training time to make the resultant model misidentify samples contaminated with a small trigger patch. We propose a novel trigger reverse-engineering based approach whose computational complexity does not scale with the number of labels, and is based on a measure that is both interpretable and universal across different network and patch types. In experiments, we observe that our method achieves a perfect score in separating Trojaned models from pure models, which is an improvement over the current state-of-the art method.
arXiv Detail & Related papers (2020-06-10T04:12:53Z)

This list is automatically generated from the titles and abstracts of the papers in this site.