Mitigating backdoor attacks in LSTM-based Text Classification Systems by
Backdoor Keyword Identification
- URL: http://arxiv.org/abs/2007.12070v3
- Date: Mon, 15 Mar 2021 03:45:46 GMT
- Title: Mitigating backdoor attacks in LSTM-based Text Classification Systems by
Backdoor Keyword Identification
- Authors: Chuanshuai Chen, Jiazhu Dai
- Abstract summary: In text classification systems, backdoors inserted in the models can cause spam or malicious speech to escape detection.
In this paper, by analyzing the changes in inner LSTM neurons, we propose a defense method called Backdoor Keyword Identification (BKI) to mitigate backdoor attacks.
We evaluate our method on four different text classification datasets: IMDB, DBpedia, 20 Newsgroups, and Reuters-21578.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It has been shown that deep neural networks face a new threat called
backdoor attacks, where an adversary can inject a backdoor into a neural
network model by poisoning the training dataset. When the input contains a
special pattern called the backdoor trigger, the backdoored model carries out
a malicious task, such as a misclassification specified by the adversary. In
text classification systems, backdoors inserted into the models can allow spam
or malicious speech to escape detection. Previous work mainly focused on
defending against backdoor attacks in computer vision; little attention has
been paid to defense methods for RNN backdoor attacks on text classification.
In this paper, by analyzing the changes in inner LSTM neurons, we propose a
defense method called Backdoor Keyword Identification (BKI) to mitigate
backdoor attacks that an adversary performs against LSTM-based text
classification through data poisoning. The method identifies and excludes,
from the training data, poisoning samples crafted to insert a backdoor into
the model, without requiring a verified and trusted dataset. We evaluate our
method on four different text classification datasets: IMDB, DBpedia ontology,
20 Newsgroups, and Reuters-21578. It achieves good performance on all of them
regardless of the trigger sentences.
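
The abstract describes BKI only at a high level, so the following is a minimal, assumption-based sketch of the general idea in PyTorch: score each word by how strongly it perturbs the LSTM's inner hidden states, then drop training samples that contain the top-scoring candidate keywords. The class and helper names (LSTMClassifier, keyword_impact_scores, filter_training_set), the leave-one-out scoring statistic, and the id_to_word mapping are illustrative assumptions, not the authors' implementation.

```python
# Sketch of a BKI-style keyword-impact filter (an assumption-based
# illustration, not the paper's exact algorithm).
import torch
import torch.nn as nn


class LSTMClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def hidden_states(self, token_ids):
        # Full sequence of LSTM hidden states, shape (T, hidden_dim).
        out, _ = self.lstm(self.embed(token_ids.unsqueeze(0)))
        return out.squeeze(0)

    def forward(self, token_ids):
        return self.fc(self.hidden_states(token_ids)[-1])


def keyword_impact_scores(model, token_ids):
    """Score each position by how much deleting that token changes the final
    hidden state (a plausible proxy for the paper's analysis of changes in
    inner LSTM neurons; the exact statistic used by BKI may differ)."""
    with torch.no_grad():
        base = model.hidden_states(token_ids)[-1]
        scores = []
        for i in range(len(token_ids)):
            reduced = torch.cat([token_ids[:i], token_ids[i + 1:]])
            if len(reduced) == 0:
                scores.append(0.0)
                continue
            h = model.hidden_states(reduced)[-1]
            scores.append(torch.norm(base - h).item())
    return scores


def filter_training_set(model, dataset, id_to_word, top_k=1):
    """Aggregate per-word impact over the corpus and drop samples containing
    the highest-impact candidate keywords (hypothetical helper).
    dataset: list of (token_ids LongTensor, label); id_to_word: id -> string."""
    totals, counts = {}, {}
    for token_ids, _label in dataset:
        for tok, s in zip(token_ids.tolist(), keyword_impact_scores(model, token_ids)):
            totals[tok] = totals.get(tok, 0.0) + s
            counts[tok] = counts.get(tok, 0) + 1
    avg = {tok: totals[tok] / counts[tok] for tok in totals}
    suspects = set(sorted(avg, key=avg.get, reverse=True)[:top_k])
    kept = [(x, y) for x, y in dataset if not set(x.tolist()) & suspects]
    return kept, {id_to_word[t] for t in suspects}
```

In practice, one would first train a model on the possibly poisoned data, run a filter like this over the training set, and then retrain on the kept samples, which matches the abstract's claim that no verified and trusted dataset is needed.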
Related papers
- Backdoor Defense via Deconfounded Representation Learning [17.28760299048368]
We propose a Causality-inspired Backdoor Defense (CBD) to learn deconfounded representations for reliable classification.
CBD is effective in reducing backdoor threats while maintaining high accuracy in predicting benign samples.
arXiv Detail & Related papers (2023-03-13T02:25:59Z)
- FreeEagle: Detecting Complex Neural Trojans in Data-Free Cases [50.065022493142116]
Trojan attack on deep neural networks, also known as backdoor attack, is a typical threat to artificial intelligence.
FreeEagle is the first data-free backdoor detection method that can effectively detect complex backdoor attacks.
arXiv Detail & Related papers (2023-02-28T11:31:29Z)
- BackdoorBox: A Python Toolbox for Backdoor Learning [67.53987387581222]
This Python toolbox implements representative and advanced backdoor attacks and defenses.
It allows researchers and developers to easily implement and compare different methods on benchmark or their local datasets.
arXiv Detail & Related papers (2023-02-01T09:45:42Z)
- BATT: Backdoor Attack with Transformation-based Triggers [72.61840273364311]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
Backdoor adversaries inject hidden backdoors that can be activated by adversary-specified trigger patterns.
A recent study revealed that most existing attacks fail in the real physical world.
arXiv Detail & Related papers (2022-11-02T16:03:43Z)
- Detecting Backdoors in Deep Text Classifiers [43.36440869257781]
We present the first robust defence mechanism that generalizes to several backdoor attacks against text classification models.
Our technique is highly accurate at defending against state-of-the-art backdoor attacks, including data poisoning and weight poisoning.
arXiv Detail & Related papers (2022-10-11T07:48:03Z)
- Turn the Combination Lock: Learnable Textual Backdoor Attacks via Word Substitution [57.51117978504175]
Recent studies show that neural natural language processing (NLP) models are vulnerable to backdoor attacks.
Injected with backdoors, models perform normally on benign examples but produce attacker-specified predictions when the backdoor is activated.
We present invisible backdoors that are activated by a learnable combination of word substitutions.
arXiv Detail & Related papers (2021-06-11T13:03:17Z)
- Black-box Detection of Backdoor Attacks with Limited Information and Data [56.0735480850555]
We propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model.
In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models.
arXiv Detail & Related papers (2021-03-24T12:06:40Z)
- Backdoor Learning: A Survey [75.59571756777342]
Backdoor attacks intend to embed hidden backdoors into deep neural networks (DNNs).
Backdoor learning is an emerging and rapidly growing research area.
This paper presents the first comprehensive survey of this realm.
arXiv Detail & Related papers (2020-07-17T04:09:20Z)
- Backdoors in Neural Models of Source Code [13.960152426268769]
We study backdoors in the context of deep-learning for source code.
We show how to poison a dataset to install such backdoors.
We also show the ease of injecting backdoors and our ability to eliminate them.
arXiv Detail & Related papers (2020-06-11T21:35:24Z)