Enhancing targeted transferability via feature space fine-tuning
- URL: http://arxiv.org/abs/2401.02727v2
- Date: Sat, 13 Jan 2024 09:29:13 GMT
- Title: Enhancing targeted transferability via feature space fine-tuning
- Authors: Hui Zeng, Biwei Chen, and Anjie Peng
- Abstract summary: Adversarial examples (AEs) have been extensively studied due to their potential for privacy protection and inspiring robust neural networks.
We propose fine-tuning an AE crafted by existing simple iterative attacks to make it transferable across unknown models.
- Score: 21.131915084053894
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Adversarial examples (AEs) have been extensively studied due to their
potential for privacy protection and inspiring robust neural networks. Yet,
making a targeted AE transferable across unknown models remains challenging. In
this paper, to alleviate the overfitting dilemma common in an AE crafted by
existing simple iterative attacks, we propose fine-tuning it in the feature
space. Specifically, starting with an AE generated by a baseline attack, we
encourage the features conducive to the target class and discourage the
features to the original class in a middle layer of the source model. Extensive
experiments demonstrate that only a few iterations of fine-tuning can boost
existing attacks' targeted transferability nontrivially and universally. Our
results also verify that the simple iterative attacks can yield comparable or
even better transferability than the resource-intensive methods, which rest on
training target-specific classifiers or generators with additional data. The
code is available at: github.com/zengh5/TA_feature_FT.
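
The fine-tuning step described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation (see their repository for that). It assumes a tiny two-layer source model (`W1`, `W2`), a hypothetical trade-off weight `beta` between the target and original classes, and sign-gradient updates under an L-infinity budget `eps`. All names and parameter values here are illustrative assumptions.

```python
import numpy as np

def features(W1, x):
    """Middle-layer activations of the toy source model: relu(W1 @ x)."""
    return np.maximum(W1 @ x, 0.0)

def feature_finetune(x_clean, x_adv, W1, W2, orig, target,
                     eps=0.1, alpha=0.01, steps=10, beta=0.2):
    """Fine-tune an existing adversarial example in feature space (toy sketch).

    Features that raise the target logit are encouraged and features that
    raise the original-class logit are discouraged. `beta` is an assumed
    weighting, not a value taken from the paper.
    """
    # Per-feature weights: gradient of (target logit - beta * original logit)
    # with respect to the middle-layer features. For this linear head the
    # gradient is simply a combination of rows of W2.
    w = W2[target] - beta * W2[orig]
    x = x_adv.copy()
    for _ in range(steps):
        active = (W1 @ x > 0).astype(x.dtype)   # ReLU derivative
        grad = W1.T @ (w * active)              # d/dx of sum(w * features(x))
        x = x + alpha * np.sign(grad)           # sign-gradient ascent step
        x = np.clip(x, x_clean - eps, x_clean + eps)  # stay in the eps-ball
        x = np.clip(x, 0.0, 1.0)                      # stay a valid image
    return x

# Toy data: random weights, a random "image", and a stand-in baseline AE.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 16))
W2 = rng.normal(size=(3, 8))
x_clean = rng.uniform(size=16)
x_adv = np.clip(x_clean + rng.uniform(-0.1, 0.1, size=16), 0.0, 1.0)
x_ft = feature_finetune(x_clean, x_adv, W1, W2, orig=0, target=2)
```

In a real setting the baseline AE would come from an existing iterative attack on a deep network, and the feature weights would be computed from gradients at a chosen middle layer rather than read off a linear head.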
Related papers
- Transferable Adversarial Attacks on SAM and Its Downstream Models [87.23908485521439]
This paper explores the feasibility of adversarially attacking various downstream models fine-tuned from the segment anything model (SAM).
To enhance the effectiveness of the adversarial attack towards models fine-tuned on unknown datasets, we propose a universal meta-initialization (UMI) algorithm.
arXiv Detail & Related papers (2024-10-26T15:04:04Z) - Rethinking Weak-to-Strong Augmentation in Source-Free Domain Adaptive Object Detection [38.596886094105216]
Source-Free domain adaptive Object Detection (SFOD) aims to transfer a detector (pre-trained on source domain) to new unlabelled target domains.
This paper introduces a novel Weak-to-Strong Contrastive Learning (WSCoL) approach.
arXiv Detail & Related papers (2024-10-07T23:32:06Z) - CLIP-Guided Generative Networks for Transferable Targeted Adversarial Attacks [52.29186466633699]
Transferable targeted adversarial attacks aim to mislead models into outputting adversary-specified predictions in black-box scenarios.
Single-target generative attacks train a generator for each target class to generate highly transferable perturbations.
We propose a CLIP-guided Generative Network with Cross-attention modules (CGNC) to enhance multi-target attacks.
arXiv Detail & Related papers (2024-07-14T12:30:32Z) - Transferable Attack for Semantic Segmentation [59.17710830038692]
We study adversarial attacks on semantic segmentation models and observe that the adversarial examples generated from a source model fail to attack the target models.
We propose an ensemble attack for semantic segmentation to achieve more effective attacks with higher transferability.
arXiv Detail & Related papers (2023-07-31T11:05:55Z) - GNP Attack: Transferable Adversarial Examples via Gradient Norm Penalty [14.82389560064876]
Adversarial examples (AE) with good transferability enable practical black-box attacks on diverse target models.
We propose a novel approach to enhance AE transferability using Gradient Norm Penalty (GNP).
By attacking 11 state-of-the-art deep learning models and 6 advanced defense methods, we empirically show that GNP is very effective in generating AE with high transferability.
arXiv Detail & Related papers (2023-07-09T05:21:31Z) - Common Knowledge Learning for Generating Transferable Adversarial Examples [60.1287733223249]
This paper focuses on an important type of black-box attack, where the adversary generates adversarial examples using a substitute (source) model.
Existing methods tend to give unsatisfactory adversarial transferability when the source and target models are from different types of DNN architectures.
We propose a common knowledge learning (CKL) framework to learn better network weights to generate adversarial examples.
arXiv Detail & Related papers (2023-07-01T09:07:12Z) - Boosting Adversarial Transferability via Fusing Logits of Top-1 Decomposed Feature [36.78292952798531]
We propose a Singular Value Decomposition (SVD)-based feature-level attack method.
Our approach is inspired by the discovery that eigenvectors associated with the larger singular values from the middle layer features exhibit superior generalization and attention properties.
arXiv Detail & Related papers (2023-05-02T12:27:44Z) - Plug & Play Attacks: Towards Robust and Flexible Model Inversion Attacks [13.374754708543449]
Model inversion attacks (MIAs) aim to create synthetic images that reflect the class-wise characteristics from a target model's training data by exploiting the model's learned knowledge.
Previous research has developed generative MIAs using generative adversarial networks (GANs) as image priors tailored to a specific target model.
We present Plug & Play Attacks that loosen the dependency between the target model and image prior and enable the use of a single trained GAN to attack a broad range of targets.
arXiv Detail & Related papers (2022-01-28T15:25:50Z) - Discriminator-Free Generative Adversarial Attack [87.71852388383242]
Generative-based adversarial attacks can get rid of this limitation.
A Symmetric Saliency-based Auto-Encoder (SSAE) generates the perturbations.
The adversarial examples generated by SSAE not only make the widely-used models collapse, but also achieve good visual quality.
arXiv Detail & Related papers (2021-07-20T01:55:21Z) - On Generating Transferable Targeted Perturbations [102.3506210331038]
We propose a new generative approach for highly transferable targeted perturbations.
Our approach matches the perturbed image distribution with that of the target class, leading to high targeted transferability rates.
arXiv Detail & Related papers (2021-03-26T17:55:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.