LISArD: Learning Image Similarity to Defend Against Gray-box Adversarial Attacks
- URL: http://arxiv.org/abs/2502.20562v1
- Date: Thu, 27 Feb 2025 22:02:06 GMT
- Title: LISArD: Learning Image Similarity to Defend Against Gray-box Adversarial Attacks
- Authors: Joana C. Costa, Tiago Roxo, Hugo Proença, Pedro R. M. Inácio
- Abstract summary: Adversarial Training (AT) and Adversarial Distillation (AD) include adversarial examples during the training phase. This paper considers an even more realistic evaluation scenario: gray-box attacks, which assume that the attacker knows the architecture and the dataset used to train the target network, but cannot access its gradients. We provide empirical evidence that models are vulnerable to gray-box attacks and propose LISArD, a defense mechanism that does not increase computational and temporal costs but provides robustness against gray-box and white-box attacks without including AT.
- Score: 13.154512864498912
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: State-of-the-art defense mechanisms are typically evaluated in the context of white-box attacks, which is not realistic, as it assumes the attacker can access the gradients of the target network. To protect against this scenario, Adversarial Training (AT) and Adversarial Distillation (AD) include adversarial examples during the training phase, and Adversarial Purification uses a generative model to reconstruct all the images given to the classifier. This paper considers an even more realistic evaluation scenario: gray-box attacks, which assume that the attacker knows the architecture and the dataset used to train the target network, but cannot access its gradients. We provide empirical evidence that models are vulnerable to gray-box attacks and propose LISArD, a defense mechanism that does not increase computational and temporal costs but provides robustness against gray-box and white-box attacks without including AT. Our method approximates a cross-correlation matrix, created with the embeddings of perturbed and clean images, to a diagonal matrix while simultaneously conducting classification learning. Our results show that LISArD can effectively protect against gray-box attacks, can be used in multiple architectures, and carries over its resilience to the white-box scenario. Also, state-of-the-art AD models underperform greatly when removing AT and/or moving to gray-box settings, highlighting the lack of robustness from existing approaches to perform in various conditions (aside from white-box settings). All the source code is available at https://github.com/Joana-Cabral/LISArD.
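The training objective described in the abstract (pushing the cross-correlation matrix of clean and perturbed embeddings toward a diagonal matrix while also learning classification) can be illustrated with a minimal sketch. This is not the authors' implementation, which lives in the linked repository. It assumes the diagonal target is the identity matrix and that embeddings are standardized per feature over the batch; the function names and the weights `off_diag_weight` and `alpha` are hypothetical and chosen only for illustration.

```python
import math


def standardize(batch):
    """Standardize each feature (column) of a batch of embeddings."""
    n, d = len(batch), len(batch[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        col = [row[j] for row in batch]
        mean = sum(col) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / n) + 1e-8
        for i in range(n):
            out[i][j] = (col[i] - mean) / std
    return out


def cross_correlation(clean_emb, pert_emb):
    """d x d cross-correlation between clean and perturbed embeddings."""
    zc, zp = standardize(clean_emb), standardize(pert_emb)
    n, d = len(zc), len(zc[0])
    return [
        [sum(zc[k][i] * zp[k][j] for k in range(n)) / n for j in range(d)]
        for i in range(d)
    ]


def similarity_loss(corr, off_diag_weight=0.005):
    """Distance of `corr` from the identity (assumed diagonal target).

    off_diag_weight is a hypothetical hyperparameter, not a value from the paper.
    """
    d = len(corr)
    on_diag = sum((1.0 - corr[i][i]) ** 2 for i in range(d))
    off_diag = sum(
        corr[i][j] ** 2 for i in range(d) for j in range(d) if i != j
    )
    return on_diag + off_diag_weight * off_diag


def cross_entropy(logits, label):
    """Numerically stable cross-entropy for a single example."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[label]


def total_loss(clean_emb, pert_emb, logits_batch, labels, alpha=1.0):
    """Classification loss plus the similarity term, optimized jointly.

    alpha is a hypothetical weighting between the two terms.
    """
    ce = sum(cross_entropy(l, y) for l, y in zip(logits_batch, labels))
    ce /= len(labels)
    sim = similarity_loss(cross_correlation(clean_emb, pert_emb))
    return ce + alpha * sim


if __name__ == "__main__":
    clean = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
    perturbed = [[r[0], -r[1]] for r in clean]  # second feature flipped
    print(similarity_loss(cross_correlation(clean, clean)))
    print(similarity_loss(cross_correlation(clean, perturbed)))
```

When the perturbed embeddings match the clean ones feature by feature, the similarity term is close to zero; when a feature is anti-correlated, the corresponding diagonal entry moves away from 1 and the term grows. In an actual training loop the embeddings and logits would come from the network, and the loss would be minimized with a gradient-based optimizer.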
Related papers
- A Review of Adversarial Attacks in Computer Vision [16.619382559756087]
Adversarial attacks can be invisible to human eyes, yet cause deep learning models to misclassify.
Adversarial attacks can be divided into white-box attacks, in which the attacker knows the parameters and gradients of the model, and black-box attacks, in which the attacker can only obtain the inputs and outputs of the model.
arXiv Detail & Related papers (2023-08-15T09:43:10Z)
- Certifiable Black-Box Attacks with Randomized Adversarial Examples: Breaking Defenses with Provable Confidence [34.35162562625252]
Black-box adversarial attacks have demonstrated strong potential to compromise machine learning models.
We study a new paradigm of black-box attacks with provable guarantees.
This new black-box attack unveils significant vulnerabilities of machine learning models.
arXiv Detail & Related papers (2023-04-10T01:12:09Z)
- Adversarial Defense via Image Denoising with Chaotic Encryption [65.48888274263756]
We propose a novel defense that assumes everything but a private key will be made available to the attacker.
Our framework uses an image denoising procedure coupled with encryption via a discretized Baker map.
arXiv Detail & Related papers (2022-03-19T10:25:02Z)
- Art-Attack: Black-Box Adversarial Attack via Evolutionary Art [5.760976250387322]
Deep neural networks (DNNs) have achieved state-of-the-art performance in many tasks but have shown extreme vulnerabilities to attacks generated by adversarial examples.
This paper proposes a gradient-free attack by using a concept of evolutionary art to generate adversarial examples.
arXiv Detail & Related papers (2022-03-07T12:54:09Z)
- Similarity-based Gray-box Adversarial Attack Against Deep Face Recognition [11.397740896235089]
We propose a similarity-based gray-box adversarial attack (SGADV) technique with a newly developed objective function.
We conduct experiments on face datasets of LFW, CelebA, and CelebA-HQ against deep face recognition models of FaceNet and InsightFace.
The results suggest that the proposed method significantly outperforms the existing adversarial attack techniques in the gray-box setting.
arXiv Detail & Related papers (2022-01-11T15:53:18Z)
- Cross-Modal Transferable Adversarial Attacks from Images to Videos [82.0745476838865]
Recent studies have shown that adversarial examples hand-crafted on one white-box model can be used to attack other black-box models.
We propose a simple yet effective cross-modal attack method, named the Image To Video (I2V) attack.
I2V generates adversarial frames by minimizing the cosine similarity between features of pre-trained image models from adversarial and benign examples.
arXiv Detail & Related papers (2021-12-10T08:19:03Z)
- Meta Gradient Adversarial Attack [64.5070788261061]
This paper proposes a novel architecture called Meta Gradient Adversarial Attack (MGAA), which is plug-and-play and can be integrated with any existing gradient-based attack method.
Specifically, we randomly sample multiple models from a model zoo to compose different tasks and iteratively simulate a white-box attack and a black-box attack in each task.
By narrowing the gap between the gradient directions in white-box and black-box attacks, the transferability of adversarial examples on the black-box setting can be improved.
arXiv Detail & Related papers (2021-08-09T17:44:19Z)
- Improving Query Efficiency of Black-box Adversarial Attack [75.71530208862319]
We propose a Neural Process based black-box adversarial attack (NP-Attack).
NP-Attack could greatly decrease the query counts under the black-box setting.
arXiv Detail & Related papers (2020-09-24T06:22:56Z)
- A Self-supervised Approach for Adversarial Robustness [105.88250594033053]
Adversarial examples can cause catastrophic mistakes in Deep Neural Network (DNN) based vision systems.
This paper proposes a self-supervised adversarial training mechanism in the input space.
It provides significant robustness against unseen adversarial attacks.
arXiv Detail & Related papers (2020-06-08T20:42:39Z)
- Spanning Attack: Reinforce Black-box Attacks with Unlabeled Data [96.92837098305898]
Black-box attacks aim to craft adversarial perturbations by querying input-output pairs of machine learning models.
Black-box attacks often suffer from the issue of query inefficiency due to the high dimensionality of the input space.
We propose a novel technique called the spanning attack, which constrains adversarial perturbations in a low-dimensional subspace via spanning an auxiliary unlabeled dataset.
arXiv Detail & Related papers (2020-05-11T05:57:15Z)