Unreasonable Effectiveness of Last Hidden Layer Activations
- URL: http://arxiv.org/abs/2202.07342v1
- Date: Tue, 15 Feb 2022 12:02:59 GMT
- Title: Unreasonable Effectiveness of Last Hidden Layer Activations
- Authors: Omer Faruk Tuna, Ferhat Ozgur Catak, M. Taner Eskil
- Abstract summary: We show that using some widely known activation functions in the output layer of the model with high temperature values has the effect of zeroing out the gradients for both targeted and untargeted attack cases.
We experimentally verified the efficacy of our approach on the MNIST (Digit) and CIFAR10 datasets.
- Score: 0.5156484100374058
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In standard Deep Neural Network (DNN) based classifiers, the general
convention is to omit the activation function in the last (output) layer and
directly apply the softmax function on the logits to get the probability scores
of each class. In this type of architecture, the loss value of the classifier
against any output class is directly proportional to the difference between the
final probability score and the label value of the associated class. Standard
white-box adversarial evasion attacks, whether targeted or untargeted, mainly
try to exploit the gradient of the model's loss function to craft adversarial
samples and fool the model. In this study, we show both mathematically and
experimentally that using some widely known activation functions in the output
layer of the model with high temperature values has the effect of zeroing out
the gradients for both targeted and untargeted attack cases, preventing
attackers from exploiting the model's loss function to craft adversarial
samples. We experimentally verified the efficacy of our approach on the MNIST
(Digit) and CIFAR10 datasets. Detailed experiments confirmed that our approach
substantially improves robustness against gradient-based targeted and
untargeted attack threats. We also showed that the increased non-linearity at
the output layer provides additional benefits against other attack methods such
as the DeepFool attack.
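As a rough, hedged illustration of the mechanism described in the abstract, the sketch below places a bounded activation, scaled by a high temperature T, between the logits and the usual softmax/cross-entropy pipeline. The small MLP, the T * tanh(z) form, and T = 100 are illustrative assumptions, not the authors' implementation; the helper `fgsm_gradient` simply exposes the input gradient a white-box attacker would rely on.

```python
# Minimal sketch (assumptions: MNIST-sized MLP, tanh output activation, T = 100).
import torch
import torch.nn as nn
import torch.nn.functional as F


class ScaledTanhClassifier(nn.Module):
    """Classifier whose output layer is T * tanh(logits) instead of raw logits."""

    def __init__(self, temperature: float = 100.0, num_classes: int = 10):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28 * 28, 128),
            nn.ReLU(),
            nn.Linear(128, num_classes),
        )
        self.temperature = temperature

    def forward(self, x):
        z = self.backbone(x)
        # Bounded, saturating output: where tanh is saturated, d tanh/dz is
        # nearly zero, so the chain-ruled gradient of the loss with respect to
        # the input (the signal FGSM/PGD-style attacks exploit) shrinks.
        return self.temperature * torch.tanh(z)


def fgsm_gradient(model, x, y):
    """Input gradient a white-box gradient-based attacker would use."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return x.grad


if __name__ == "__main__":
    model = ScaledTanhClassifier(temperature=100.0)
    x, y = torch.rand(4, 1, 28, 28), torch.randint(0, 10, (4,))
    print("input-gradient norm:", fgsm_gradient(model, x, y).norm().item())
```

On a trained network, the paper's claim is that this input gradient becomes vanishingly small as the temperature grows, which is what blunts gradient-based targeted and untargeted attacks.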
Related papers
- Transferable Adversarial Attacks on SAM and Its Downstream Models [87.23908485521439]
This paper explores the feasibility of adversarially attacking various downstream models fine-tuned from the Segment Anything Model (SAM).
To enhance the effectiveness of the adversarial attack towards models fine-tuned on unknown datasets, we propose a universal meta-initialization (UMI) algorithm.
arXiv Detail & Related papers (2024-10-26T15:04:04Z)
- Defense Against Model Extraction Attacks on Recommender Systems [53.127820987326295]
We introduce Gradient-based Ranking Optimization (GRO) to defend against model extraction attacks on recommender systems.
GRO aims to minimize the loss of the protected target model while maximizing the loss of the attacker's surrogate model.
Results show GRO's superior effectiveness in defending against model extraction attacks.
arXiv Detail & Related papers (2023-10-25T03:30:42Z)
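The GRO trade-off stated above (keep the protected model accurate while degrading the attacker's surrogate) can be read as a single combined objective. The snippet below is only a generic, hedged illustration of that trade-off; the actual GRO method targets recommender systems and ranking-specific machinery, and the models, labels, and weight `lam` here are placeholders.

```python
# Generic sketch of a "minimize target loss, maximize surrogate loss" objective.
# Not the GRO implementation; models, labels, and `lam` are placeholders.
import torch.nn.functional as F


def defense_objective(target_model, surrogate_model, x, y, lam=1.0):
    target_loss = F.cross_entropy(target_model(x), y)        # keep the protected model useful
    surrogate_loss = F.cross_entropy(surrogate_model(x), y)  # attacker's extracted copy
    # Minimizing this pushes the target's loss down while pushing the
    # surrogate's loss up (the minus sign flips its direction of optimization).
    return target_loss - lam * surrogate_loss
```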
- LEAT: Towards Robust Deepfake Disruption in Real-World Scenarios via Latent Ensemble Attack [11.764601181046496]
Deepfakes, malicious visual content created by generative models, pose an increasingly harmful threat to society.
To proactively mitigate deepfake damages, recent studies have employed adversarial perturbation to disrupt deepfake model outputs.
We propose a simple yet effective disruption method called Latent Ensemble ATtack (LEAT), which attacks the independent latent encoding process.
arXiv Detail & Related papers (2023-07-04T07:00:37Z)
- Boosting Adversarial Transferability via Fusing Logits of Top-1 Decomposed Feature [36.78292952798531]
We propose a Singular Value Decomposition (SVD)-based feature-level attack method.
Our approach is inspired by the discovery that eigenvectors associated with the larger singular values from the middle layer features exhibit superior generalization and attention properties.
arXiv Detail & Related papers (2023-05-02T12:27:44Z)
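To make the top-1 decomposition idea above concrete, here is a hedged PyTorch sketch: take a middle-layer feature map, keep only its top singular component, and fuse the logits computed from that component with the ordinary logits. The `early`/`late` split of the network and the fusion weight `alpha` are illustrative assumptions, not details from the paper.

```python
# Hedged sketch: rank-1 (top singular component) reconstruction of a
# middle-layer feature map, with its logits fused into the full logits.
import torch


def top1_reconstruction(feat):
    """feat: (B, C, H, W) -> rank-1 approximation of each (C, H*W) matrix."""
    B, C, H, W = feat.shape
    mat = feat.reshape(B, C, H * W)
    U, S, Vh = torch.linalg.svd(mat, full_matrices=False)
    rank1 = S[:, :1].unsqueeze(-1) * U[:, :, :1] @ Vh[:, :1, :]
    return rank1.reshape(B, C, H, W)


def fused_logits(early, late, x, alpha=0.5):
    """early/late: assumed split of a CNN around the chosen middle layer."""
    feat = early(x)                                  # middle-layer features
    logits_full = late(feat)                         # ordinary forward pass
    logits_rank1 = late(top1_reconstruction(feat))   # logits from the top-1 component
    # An attack loss built on this fusion emphasizes the feature directions
    # that the summary above reports as transferring best.
    return alpha * logits_full + (1 - alpha) * logits_rank1
```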
- Towards Reasonable Budget Allocation in Untargeted Graph Structure Attacks via Gradient Debias [50.628150015907565]
The cross-entropy loss function is used to evaluate perturbation schemes in classification tasks.
Previous methods use negative cross-entropy loss as the attack objective in attacking node-level classification models.
This paper argues that the previous attack objective is unreasonable from the perspective of budget allocation.
arXiv Detail & Related papers (2023-03-29T13:02:02Z)
- Not All Poisons are Created Equal: Robust Training against Data Poisoning [15.761683760167777]
Data poisoning causes misclassification of test-time target examples by injecting maliciously crafted samples into the training data.
We propose an efficient defense mechanism that significantly reduces the success rate of various data poisoning attacks.
arXiv Detail & Related papers (2022-10-18T08:19:41Z)
- Order-Disorder: Imitation Adversarial Attacks for Black-box Neural Ranking Models [48.93128542994217]
We propose an imitation adversarial attack on black-box neural passage ranking models.
We show that the target passage ranking model can be transparentized and imitated by enumerating critical queries/candidates.
We also propose an innovative gradient-based attack method, empowered by the pairwise objective function, to generate adversarial triggers.
arXiv Detail & Related papers (2022-09-14T09:10:07Z)
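A pairwise objective of the kind mentioned above can be illustrated with a generic margin loss: the attacker pushes an adversarial candidate's relevance score above that of a reference candidate. This is only a hedged sketch of the general construct; the scoring function, embeddings, and margin are placeholders, not the paper's actual attack.

```python
# Generic pairwise margin loss as an attack objective (hedged illustration only).
import torch


def pairwise_attack_loss(score_fn, query, adv_candidate, ref_candidate, margin=1.0):
    """Encourage the adversarial candidate to outrank a reference candidate.

    score_fn(query, candidate) -> relevance score; all inputs are placeholder
    tensors/embeddings that a gradient-based trigger search would update.
    """
    s_adv = score_fn(query, adv_candidate)
    s_ref = score_fn(query, ref_candidate)
    # Hinge on the margin: the loss is zero once the adversarial candidate
    # leads by at least `margin`; its gradient drives the trigger optimization.
    return torch.relu(margin + s_ref - s_adv).mean()
```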
- RamBoAttack: A Robust Query Efficient Deep Neural Network Decision Exploit [9.93052896330371]
We develop a robust, query-efficient attack capable of avoiding entrapment in a local minimum and misdirection from noisy gradients.
RamBoAttack is more robust to the different sample inputs available to an adversary and to the targeted class.
arXiv Detail & Related papers (2021-12-10T01:25:24Z)
- Towards A Conceptually Simple Defensive Approach for Few-shot classifiers Against Adversarial Support Samples [107.38834819682315]
We study a conceptually simple approach to defend few-shot classifiers against adversarial attacks.
We propose a simple attack-agnostic detection method, using the concept of self-similarity and filtering.
Our evaluation on the miniImagenet (MI) and CUB datasets exhibits good attack detection performance.
arXiv Detail & Related papers (2021-10-24T05:46:03Z)
- Combating Adversaries with Anti-Adversaries [118.70141983415445]
In particular, our layer generates an input perturbation in the opposite direction of the adversarial one.
We verify the effectiveness of our approach by combining our layer with both nominally and robustly trained models.
Our anti-adversary layer significantly enhances model robustness while coming at no cost on clean accuracy.
arXiv Detail & Related papers (2021-03-26T09:36:59Z)
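The anti-adversary layer described above can be approximated as a pre-classification step that nudges the input in the direction that decreases the loss for the model's own initial prediction, i.e. the opposite of an FGSM/PGD step. The sketch below is a hedged approximation under assumed step size and count, not the authors' layer.

```python
# Hedged sketch of an "anti-adversary" forward pass: move the input a few
# signed-gradient steps that REDUCE the loss for the model's own prediction
# (the opposite direction of an attack step). Step count/size are assumptions.
import torch
import torch.nn.functional as F


def anti_adversary_forward(model, x, steps=2, eps=2 / 255):
    with torch.no_grad():
        y_hat = model(x).argmax(dim=1)          # the model's initial guess
    x_aa = x
    for _ in range(steps):
        x_aa = x_aa.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_aa), y_hat)
        grad = torch.autograd.grad(loss, x_aa)[0]
        x_aa = x_aa - eps * grad.sign()         # anti-FGSM step
    return model(x_aa.detach())                 # classify the nudged input
```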
- Luring of transferable adversarial perturbations in the black-box paradigm [0.0]
We present a new approach to improve the robustness of a model against black-box transfer attacks.
A removable additional neural network is included in the target model, and is designed to induce the luring effect.
Our deception-based method only needs to have access to the predictions of the target model and does not require a labeled data set.
arXiv Detail & Related papers (2020-04-10T06:48:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.