LAFEAT: Piercing Through Adversarial Defenses with Latent Features
- URL: http://arxiv.org/abs/2104.09284v2
- Date: Tue, 20 Apr 2021 07:35:16 GMT
- Title: LAFEAT: Piercing Through Adversarial Defenses with Latent Features
- Authors: Yunrui Yu, Xitong Gao, Cheng-Zhong Xu
- Abstract summary: We show that latent features in certain "robust" models are surprisingly susceptible to adversarial attacks.
We introduce a unified $\ell_\infty$-norm white-box attack algorithm which harnesses latent features in its gradient descent steps, namely LAFEAT.
- Score: 15.189068478164337
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Deep convolutional neural networks are susceptible to adversarial attacks.
They can be easily deceived to give an incorrect output by adding a tiny
perturbation to the input. This presents a great challenge in making CNNs
robust against such attacks. An influx of new defense techniques has been
proposed to this end. In this paper, we show that latent features in certain
"robust" models are surprisingly susceptible to adversarial attacks. On top of
this, we introduce a unified $\ell_\infty$-norm white-box attack algorithm
which harnesses latent features in its gradient descent steps, namely LAFEAT.
We show that not only is it computationally much more efficient for successful
attacks, but it is also a stronger adversary than the current state-of-the-art
across a wide range of defense mechanisms. This suggests that model robustness
could be contingent on the effective use of the defender's hidden components,
and it should no longer be viewed from a holistic perspective.
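A minimal sketch of the core idea described in the abstract: an $\ell_\infty$-bounded, PGD-style gradient-descent attack whose loss also uses logits predicted from latent features. The names `model`, `feature_layer`, and `latent_head` (an auxiliary classifier over intermediate activations), as well as the fixed loss weighting and step schedule, are illustrative assumptions rather than the paper's actual algorithm.

```python
# Sketch only: an l_inf PGD-style attack that mixes the final-layer loss with a
# loss on logits computed from latent features, in the spirit of LAFEAT as
# summarized above. `latent_head` is a hypothetical auxiliary classifier; the
# paper's actual procedure may differ.
import torch
import torch.nn.functional as F

def latent_feature_attack(model, feature_layer, latent_head, x, y,
                          eps=8/255, alpha=2/255, steps=20, latent_weight=0.5):
    x, y = x.detach(), y.detach()
    # Random start inside the l_inf ball, clipped to the valid image range.
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()

    # Capture intermediate activations with a forward hook.
    feats = {}
    handle = feature_layer.register_forward_hook(
        lambda mod, inp, out: feats.update(z=out))

    for _ in range(steps):
        x_adv.requires_grad_(True)
        logits = model(x_adv)                      # also populates feats["z"]
        latent_logits = latent_head(feats["z"])    # logits from latent features
        loss = (F.cross_entropy(logits, y)
                + latent_weight * F.cross_entropy(latent_logits, y))
        grad, = torch.autograd.grad(loss, x_adv)

        # Signed gradient step, then projection back into the eps-ball.
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)

    handle.remove()
    return x_adv.detach()
```

The abstract only states that latent features enter the gradient descent steps; how the latent head is obtained and scheduled is a detail beyond this summary.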
Related papers
- BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive Learning [85.2564206440109]
This paper reveals that, in this practical scenario, backdoor attacks can remain effective even after defenses are applied.
We introduce the BadCLIP attack, which is resistant to backdoor detection and model fine-tuning defenses.
arXiv Detail & Related papers (2023-11-20T02:21:49Z)
- Understanding the Robustness of Randomized Feature Defense Against Query-Based Adversarial Attacks [23.010308600769545]
Deep neural networks are vulnerable to adversarial examples: samples close to the original image that nonetheless make the model misclassify.
We propose a simple and lightweight defense against black-box attacks by adding random noise to hidden features at intermediate layers of the model at inference time.
Our method effectively enhances the model's resilience against both score-based and decision-based black-box attacks.
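A minimal sketch of the defense summarized above, assuming a PyTorch model: random noise is injected into an intermediate layer's output at inference time via a forward hook. The layer choice and noise scale `sigma` are illustrative assumptions, not the paper's settings.

```python
# Sketch only: add random noise to hidden features at inference time, as the
# summary above describes. `sigma` and the choice of layer are assumptions.
import torch

def add_feature_noise(layer, sigma=0.05):
    """Returns a hook handle; the hook perturbs the layer's output on each forward pass."""
    def hook(module, inputs, output):
        return output + sigma * torch.randn_like(output)   # returned value replaces the output
    return layer.register_forward_hook(hook)

# Hypothetical usage with a ResNet-style classifier:
#   handle = add_feature_noise(model.layer3, sigma=0.05)
#   logits = model(x)   # each query sees freshly sampled feature noise
#   handle.remove()     # restores the deterministic model
```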
arXiv Detail & Related papers (2023-10-01T03:53:23Z)
- Efficient Defense Against Model Stealing Attacks on Convolutional Neural Networks [0.548924822963045]
Model stealing attacks can lead to intellectual property theft and other security and privacy risks.
Current state-of-the-art defenses against model stealing attacks suggest adding perturbations to the prediction probabilities.
We propose a simple yet effective and efficient defense alternative.
arXiv Detail & Related papers (2023-09-04T22:25:49Z)
- Isolation and Induction: Training Robust Deep Neural Networks against Model Stealing Attacks [51.51023951695014]
Existing model stealing defenses add deceptive perturbations to the victim's posterior probabilities to mislead the attackers.
This paper proposes Isolation and Induction (InI), a novel and effective training framework for model stealing defenses.
In contrast to adding perturbations to model predictions, which harms benign accuracy, InI trains models to produce uninformative outputs for stealing queries.
arXiv Detail & Related papers (2023-08-02T05:54:01Z)
- The Best Defense is a Good Offense: Adversarial Augmentation against Adversarial Attacks [91.56314751983133]
$A^5$ is a framework to craft a defensive perturbation that guarantees any attack against the input at hand will fail.
We show effective on-the-fly defensive augmentation with a robustifier network that ignores the ground truth label.
We also show how to apply $A^5$ to create certifiably robust physical objects.
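A hedged sketch of the on-the-fly defensive augmentation summarized above: a small convolutional "robustifier" maps an input to a bounded defensive perturbation without seeing the ground-truth label. The architecture, perturbation budget `delta`, and training objective are illustrative assumptions, not the $A^5$ recipe.

```python
# Sketch only: a label-free "robustifier" that outputs a bounded defensive
# perturbation, in the spirit of the summary above. The architecture and the
# budget `delta` are assumptions; the training objective is omitted.
import torch
import torch.nn as nn

class Robustifier(nn.Module):
    def __init__(self, channels=3, delta=8/255):
        super().__init__()
        self.delta = delta
        self.net = nn.Sequential(                  # tiny convolutional encoder/decoder
            nn.Conv2d(channels, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, channels, 3, padding=1), nn.Tanh())

    def forward(self, x):
        # Add a perturbation bounded by delta, then clip to the valid image range.
        return (x + self.delta * self.net(x)).clamp(0, 1)

# Hypothetical usage: logits = classifier(robustifier(x)); the robustifier would
# be trained so that attacks on the augmented input fail (training loop omitted).
```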
arXiv Detail & Related papers (2023-05-23T16:07:58Z)
- Based-CE white-box adversarial attack will not work using super-fitting [10.34121642283309]
Deep Neural Networks (DNN) are widely used in various fields due to their powerful performance.
Recent studies have shown that deep learning models are vulnerable to adversarial attacks.
This paper proposes a new defense method by using the model super-fitting status.
arXiv Detail & Related papers (2022-05-04T09:23:00Z)
- Sparse Coding Frontend for Robust Neural Networks [11.36192454455449]
Deep Neural Networks are known to be vulnerable to small, adversarially crafted perturbations.
Current defense methods against these adversarial attacks are variants of adversarial training.
In this paper, we introduce a radically different defense based on a sparse coding frontend learned from clean images.
arXiv Detail & Related papers (2021-04-12T11:14:32Z)
- Attack Agnostic Adversarial Defense via Visual Imperceptible Bound [70.72413095698961]
This research aims to design a defense model that is robust within a certain bound against both seen and unseen adversarial attacks.
The proposed defense model is evaluated on the MNIST, CIFAR-10, and Tiny ImageNet databases.
The proposed algorithm is attack agnostic, i.e. it does not require any knowledge of the attack algorithm.
arXiv Detail & Related papers (2020-10-25T23:14:26Z)
- Online Alternate Generator against Adversarial Attacks [144.45529828523408]
Deep learning models are notoriously sensitive to adversarial examples which are synthesized by adding quasi-perceptible noises on real images.
We propose a portable defense method, online alternate generator, which does not need to access or modify the parameters of the target networks.
The proposed method works by synthesizing another image online from scratch for each input image, instead of removing or destroying adversarial noise.
arXiv Detail & Related papers (2020-09-17T07:11:16Z)
- RayS: A Ray Searching Method for Hard-label Adversarial Attack [99.72117609513589]
We present the Ray Searching attack (RayS), which greatly improves the hard-label attack effectiveness as well as efficiency.
The RayS attack can also be used as a sanity check for possibly "falsely robust" models.
arXiv Detail & Related papers (2020-06-23T07:01:50Z)