Accuracy of TextFooler black box adversarial attacks on 01 loss sign
activation neural network ensemble
- URL: http://arxiv.org/abs/2402.07347v1
- Date: Mon, 12 Feb 2024 00:36:34 GMT
- Title: Accuracy of TextFooler black box adversarial attacks on 01 loss sign
activation neural network ensemble
- Authors: Yunzhe Xue and Usman Roshan
- Abstract summary: Recent work has shown that 01 loss sign activation neural networks can defend against image classification adversarial attacks.
We ask the following question in this study: are 01 loss sign activation neural networks hard to deceive with a popular black box text adversarial attack program called TextFooler?
We find that our 01 loss sign activation network is much harder to attack with TextFooler than sigmoid activation cross entropy networks and binary neural networks.
- Score: 5.439020425819001
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent work has shown that 01 loss sign activation neural networks can defend
against image classification adversarial attacks. A public challenge to attack
these models on the CIFAR10 dataset remains undefeated. We ask the following question
in this study: are 01 loss sign activation neural networks hard to deceive with
a popular black box text adversarial attack program called TextFooler? We study
this question on four popular text classification datasets: IMDB reviews, Yelp
reviews, MR sentiment classification, and AG news classification. We find that
our 01 loss sign activation network is much harder to attack with TextFooler
compared to sigmoid activation cross entropy networks and binary neural networks. We
also study a 01 loss sign activation convolutional neural network with a novel
global pooling step specific to sign activation networks. With this new
variation we see a significant gain in adversarial accuracy, rendering
TextFooler practically useless against it. We make our code freely available at
https://github.com/zero-one-loss/wordcnn01 and
https://github.com/xyzacademic/mlp01example. Our work here suggests that
01 loss sign activation networks could be further developed to create foolproof
models against text adversarial attacks.
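To make the model family concrete, below is a minimal NumPy sketch of a sign activation network with 01 loss and majority-vote ensemble prediction. It is an illustration under our own simplifying assumptions (a single hidden layer, random untrained weights, and hypothetical helper names such as forward and ensemble_predict); it is not the training procedure or architecture from the paper or the linked repositories.

```python
import numpy as np

def sign(x):
    # Sign activation: outputs are in {-1, +1} (zero mapped to +1).
    return np.where(x >= 0, 1.0, -1.0)

def forward(x, W1, b1, w2, b2):
    # Single hidden layer with sign activations; the output is also a hard
    # sign, so the network returns a discrete label rather than a score.
    h = sign(x @ W1 + b1)
    return sign(h @ w2 + b2)

def zero_one_loss(y_true, y_pred):
    # 01 loss: fraction of misclassified examples. It is piecewise constant,
    # so it gives no gradient signal.
    return np.mean(y_true != y_pred)

def ensemble_predict(x, members):
    # Majority vote over independently initialised sign networks.
    votes = np.stack([forward(x, *m) for m in members])
    return sign(votes.sum(axis=0))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, h, n_members = 20, 8, 5
    # Hypothetical random ensemble members (for illustration only; the paper
    # trains its models rather than using random weights).
    members = [
        (rng.standard_normal((d, h)), rng.standard_normal(h),
         rng.standard_normal(h), rng.standard_normal())
        for _ in range(n_members)
    ]
    X = rng.standard_normal((4, d))
    y = np.array([1.0, -1.0, 1.0, -1.0])
    preds = ensemble_predict(X, members)
    print("predictions:", preds)
    print("01 loss:", zero_one_loss(y, preds))
```

Because every unit and the final majority vote produce hard ±1 outputs, the ensemble exposes only discrete labels rather than smooth confidence scores; one plausible reading of the paper's results is that this leaves a score-guided black box attack such as TextFooler with little signal for ranking word substitutions.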
Related papers
- OVLA: Neural Network Ownership Verification using Latent Watermarks [7.661766773170363]
We present a novel methodology for neural network ownership verification based on latent watermarks.
We show that our approach offers strong defense against backdoor detection, backdoor removal and surrogate model attacks.
arXiv Detail & Related papers (2023-06-15T17:45:03Z)
- Trap and Replace: Defending Backdoor Attacks by Trapping Them into an Easy-to-Replace Subnetwork [105.0735256031911]
Deep neural networks (DNNs) are vulnerable to backdoor attacks.
We propose a brand-new backdoor defense strategy, which makes it much easier to remove the harmful influence of backdoor samples.
We evaluate our method against ten different backdoor attacks.
arXiv Detail & Related papers (2022-10-12T17:24:01Z)
- Constrained Gradient Descent: A Powerful and Principled Evasion Attack Against Neural Networks [19.443306494201334]
We introduce several innovations that make white-box targeted attacks follow the intuition of the attacker's goal.
First, we propose a new loss function that explicitly captures the goal of targeted attacks.
Second, we propose a new attack method that uses a further developed version of our loss function capturing both the misclassification objective and the $L_\infty$ distance limit.
arXiv Detail & Related papers (2021-12-28T17:36:58Z)
- BreakingBED -- Breaking Binary and Efficient Deep Neural Networks by Adversarial Attacks [65.2021953284622]
We study robustness of CNNs against white-box and black-box adversarial attacks.
Results are shown for distilled CNNs, agent-based state-of-the-art pruned models, and binarized neural networks.
arXiv Detail & Related papers (2021-03-14T20:43:19Z)
- Hidden Backdoor Attack against Semantic Segmentation Models [60.0327238844584]
The backdoor attack intends to embed hidden backdoors in deep neural networks (DNNs) by poisoning training data.
We propose a novel attack paradigm, the fine-grained attack, where we treat the target label at the object level instead of the image level.
Experiments show that the proposed methods can successfully attack semantic segmentation models by poisoning only a small proportion of training data.
arXiv Detail & Related papers (2021-03-06T05:50:29Z)
- A Partial Break of the Honeypots Defense to Catch Adversarial Attacks [57.572998144258705]
We break the baseline version of this defense by reducing the detection true positive rate to 0% and the detection AUC to 0.02.
To aid further research, we release the complete 2.5 hour keystroke-by-keystroke screen recording of our attack process at https://nicholas.carlini.com/code/ccs_honeypot_break.
arXiv Detail & Related papers (2020-09-23T07:36:37Z)
- Defending against substitute model black box adversarial attacks with the 01 loss [0.0]
We present 01 loss linear and 01 loss dual layer neural network models as a defense against substitute model black box attacks.
Our work shows that 01 loss models offer a powerful defense against substitute model black box attacks.
arXiv Detail & Related papers (2020-09-01T22:32:51Z)
- Towards adversarial robustness with 01 loss neural networks [0.0]
We propose a hidden layer 01 loss neural network trained with convolutional coordinate descent as a defense against adversarial attacks in machine learning.
We compare the minimum distortion of the 01 loss network to the binarized neural network and the standard sigmoid activation network with cross-entropy loss.
Our work shows that the 01 loss network has the potential to defend against black box adversarial attacks better than convex loss and binarized networks.
arXiv Detail & Related papers (2020-08-20T18:18:49Z)
- Evaluating a Simple Retraining Strategy as a Defense Against Adversarial Attacks [17.709146615433458]
We show how simple algorithms like KNN can be used to determine the labels of the adversarial images needed for retraining.
We present the results on two standard datasets namely, CIFAR-10 and TinyImageNet.
arXiv Detail & Related papers (2020-07-20T07:49:33Z)
- Anomaly Detection-Based Unknown Face Presentation Attack Detection [74.4918294453537]
Anomaly detection-based spoof attack detection is a recent development in face Presentation Attack Detection.
In this paper, we present a deep-learning solution for anomaly detection-based spoof attack detection.
The proposed approach benefits from the representation learning power of CNNs and learns better features for the fPAD task.
arXiv Detail & Related papers (2020-07-11T21:20:55Z)
- Backdoor Attacks to Graph Neural Networks [73.56867080030091]
We propose the first backdoor attack on graph neural networks (GNNs).
In our backdoor attack, a GNN predicts an attacker-chosen target label for a testing graph once a predefined subgraph is injected into the testing graph.
Our empirical results show that our backdoor attacks are effective with a small impact on a GNN's prediction accuracy for clean testing graphs.
arXiv Detail & Related papers (2020-06-19T14:51:01Z)