Accuracy of TextFooler black box adversarial attacks on 01 loss sign
activation neural network ensemble
- URL: http://arxiv.org/abs/2402.07347v1
- Date: Mon, 12 Feb 2024 00:36:34 GMT
- Title: Accuracy of TextFooler black box adversarial attacks on 01 loss sign
activation neural network ensemble
- Authors: Yunzhe Xue and Usman Roshan
- Abstract summary: Recent work has shown that 01 loss sign activation neural networks can defend against image classification adversarial attacks.
We ask the following question in this study: are 01 loss sign activation neural networks hard to deceive with a popular black box text adversarial attack program called TextFooler?
We find that our 01 loss sign activation network is much harder to attack with TextFooler than sigmoid activation cross entropy networks and binary neural networks.
- Score: 5.439020425819001
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent work has shown that 01 loss sign activation neural networks can defend
against image classification adversarial attacks. A public challenge to attack
these models on the CIFAR10 dataset remains undefeated. We ask the following question
in this study: are 01 loss sign activation neural networks hard to deceive with
a popular black box text adversarial attack program called TextFooler? We study
this question on four popular text classification datasets: IMDB reviews, Yelp
reviews, MR sentiment classification, and AG news classification. We find that
our 01 loss sign activation network is much harder to attack with TextFooler
compared to sigmoid activation cross entropy networks and binary neural networks. We
also study a 01 loss sign activation convolutional neural network with a novel
global pooling step specific to sign activation networks. With this new
variation we see a significant gain in adversarial accuracy, rendering
TextFooler practically useless against it. We make our code freely available at
https://github.com/zero-one-loss/wordcnn01 and
https://github.com/xyzacademic/mlp01example. Our work here suggests that
01 loss sign activation networks could be further developed to create foolproof
models against text adversarial attacks.
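To make the model family concrete, below is a minimal NumPy sketch of a sign activation network with 01 loss and majority-vote ensemble prediction. It is an illustration under our own simplifying assumptions (a single hidden layer, random untrained weights, and hypothetical helper names such as forward and ensemble_predict); it is not the training procedure or architecture from the paper or the linked repositories.

```python
import numpy as np

def sign(x):
    # Sign activation: outputs are in {-1, +1} (zero mapped to +1).
    return np.where(x >= 0, 1.0, -1.0)

def forward(x, W1, b1, w2, b2):
    # Single hidden layer with sign activations; the output is also a hard
    # sign, so the network returns a discrete label rather than a score.
    h = sign(x @ W1 + b1)
    return sign(h @ w2 + b2)

def zero_one_loss(y_true, y_pred):
    # 01 loss: fraction of misclassified examples. It is piecewise constant,
    # so it gives no gradient signal.
    return np.mean(y_true != y_pred)

def ensemble_predict(x, members):
    # Majority vote over independently initialised sign networks.
    votes = np.stack([forward(x, *m) for m in members])
    return sign(votes.sum(axis=0))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, h, n_members = 20, 8, 5
    # Hypothetical random ensemble members (for illustration only; the paper
    # trains its models rather than using random weights).
    members = [
        (rng.standard_normal((d, h)), rng.standard_normal(h),
         rng.standard_normal(h), rng.standard_normal())
        for _ in range(n_members)
    ]
    X = rng.standard_normal((4, d))
    y = np.array([1.0, -1.0, 1.0, -1.0])
    preds = ensemble_predict(X, members)
    print("predictions:", preds)
    print("01 loss:", zero_one_loss(y, preds))
```

Because every unit and the final majority vote produce hard ±1 outputs, the ensemble exposes only discrete labels rather than smooth confidence scores; one plausible reading of the paper's results is that this leaves a score-guided black box attack such as TextFooler with little signal for ranking word substitutions.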
Related papers
- OVLA: Neural Network Ownership Verification using Latent Watermarks [7.661766773170363]
We present a novel methodology for neural network ownership verification based on latent watermarks.
We show that our approach offers strong defense against backdoor detection, backdoor removal and surrogate model attacks.
arXiv Detail & Related papers (2023-06-15T17:45:03Z)
- Trap and Replace: Defending Backdoor Attacks by Trapping Them into an Easy-to-Replace Subnetwork [105.0735256031911]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
We propose a brand-new backdoor defense strategy, which makes it much easier to remove the harmful influence of backdoor samples.
We evaluate our method against ten different backdoor attacks.
arXiv Detail & Related papers (2022-10-12T17:24:01Z)
- Constrained Gradient Descent: A Powerful and Principled Evasion Attack Against Neural Networks [19.443306494201334]
We introduce several innovations that make white-box targeted attacks follow the intuition of the attacker's goal.
First, we propose a new loss function that explicitly captures the goal of targeted attacks.
Second, we propose a new attack method that uses a further developed version of our loss function capturing both the misclassification objective and the $L_\infty$ distance limit.
arXiv Detail & Related papers (2021-12-28T17:36:58Z)
- BreakingBED -- Breaking Binary and Efficient Deep Neural Networks by Adversarial Attacks [65.2021953284622]
We study robustness of CNNs against white-box and black-box adversarial attacks.
Results are shown for distilled CNNs, agent-based state-of-the-art pruned models, and binarized neural networks.
arXiv Detail & Related papers (2021-03-14T20:43:19Z)
- Hidden Backdoor Attack against Semantic Segmentation Models [60.0327238844584]
The backdoor attack intends to embed hidden backdoors in deep neural networks (DNNs) by poisoning training data.
We propose a novel attack paradigm, the fine-grained attack, where we treat the target label at the object level instead of the image level.
Experiments show that the proposed methods can successfully attack semantic segmentation models by poisoning only a small proportion of training data.
arXiv Detail & Related papers (2021-03-06T05:50:29Z)
- A Partial Break of the Honeypots Defense to Catch Adversarial Attacks [57.572998144258705]
We break the baseline version of this defense by reducing the detection true positive rate to 0% and the detection AUC to 0.02.
To aid further research, we release the complete 2.5 hour keystroke-by-keystroke screen recording of our attack process at https://nicholas.carlini.com/code/ccs_honeypot_break.
arXiv Detail & Related papers (2020-09-23T07:36:37Z)
- Defending against substitute model black box adversarial attacks with the 01 loss [0.0]
We present 01 loss linear and 01 loss dual layer neural network models as a defense against substitute model black box attacks.
Our work shows that 01 loss models offer a powerful defense against substitute model black box attacks.
arXiv Detail & Related papers (2020-09-01T22:32:51Z)
- Towards adversarial robustness with 01 loss neural networks [0.0]
We propose a hidden layer 01 loss neural network trained with convolutional coordinate descent as a defense against adversarial attacks in machine learning.
We compare the minimum distortion of the 01 loss network to the binarized neural network and the standard sigmoid activation network with cross-entropy loss.
Our work shows that the 01 loss network has the potential to defend against black box adversarial attacks better than convex loss and binarized networks.
arXiv Detail & Related papers (2020-08-20T18:18:49Z)
- Evaluating a Simple Retraining Strategy as a Defense Against Adversarial Attacks [17.709146615433458]
We show how simple algorithms like KNN can be used to determine the labels of the adversarial images needed for retraining.
We present the results on two standard datasets namely, CIFAR-10 and TinyImageNet.
arXiv Detail & Related papers (2020-07-20T07:49:33Z)
- Anomaly Detection-Based Unknown Face Presentation Attack Detection [74.4918294453537]
Anomaly detection-based spoof attack detection is a recent development in face Presentation Attack Detection.
In this paper, we present a deep-learning solution for anomaly detection-based spoof attack detection.
The proposed approach benefits from the representation learning power of CNNs and learns better features for the fPAD task.
arXiv Detail & Related papers (2020-07-11T21:20:55Z)
- Backdoor Attacks to Graph Neural Networks [73.56867080030091]
We propose the first backdoor attack on graph neural networks (GNNs).
In our backdoor attack, a GNN predicts an attacker-chosen target label for a testing graph once a predefined subgraph is injected into the testing graph.
Our empirical results show that our backdoor attacks are effective with a small impact on a GNN's prediction accuracy for clean testing graphs.
arXiv Detail & Related papers (2020-06-19T14:51:01Z)