Can Targeted Adversarial Examples Transfer When the Source and Target
Models Have No Label Space Overlap?
- URL: http://arxiv.org/abs/2103.09916v1
- Date: Wed, 17 Mar 2021 21:21:44 GMT
- Title: Can Targeted Adversarial Examples Transfer When the Source and Target
Models Have No Label Space Overlap?
- Authors: Nathan Inkawhich, Kevin J Liang, Jingyang Zhang, Huanrui Yang, Hai Li,
Yiran Chen
- Abstract summary: We design blackbox transfer-based targeted adversarial attacks for an environment where the attacker's source model and the target blackbox model may have disjoint label spaces and training datasets.
Our methodology begins with the construction of a class correspondence matrix between the whitebox and blackbox label sets.
We show that our transfer attacks serve as powerful adversarial priors when integrated with query-based methods.
- Score: 36.96777303738315
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We design blackbox transfer-based targeted adversarial attacks for an
environment where the attacker's source model and the target blackbox model may
have disjoint label spaces and training datasets. This scenario significantly
differs from the "standard" blackbox setting, and warrants a unique approach to
the attacking process. Our methodology begins with the construction of a class
correspondence matrix between the whitebox and blackbox label sets. During the
online phase of the attack, we then leverage representations of highly related
proxy classes from the whitebox distribution to fool the blackbox model into
predicting the desired target class. Our attacks are evaluated in three complex
and challenging test environments where the source and target models have
varying degrees of conceptual overlap amongst their unique categories.
Ultimately, we find that it is indeed possible to construct targeted
transfer-based adversarial attacks between models that have non-overlapping
label spaces! We also analyze the sensitivity of attack success to properties
of the clean data. Finally, we show that our transfer attacks serve as powerful
adversarial priors when integrated with query-based methods, markedly boosting
query efficiency and adversarial success.
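The two-phase mechanism described in the abstract lends itself to a short sketch. Below is a minimal, hypothetical rendering: an offline phase that estimates a class correspondence matrix by querying the blackbox on clean exemplars of each whitebox class, and an online phase that picks the whitebox proxy class most associated with the desired blackbox target. The loader, predictor, and frequency-based estimation rule are illustrative assumptions; the paper's actual construction may differ.

```python
# Hedged sketch of the two-phase attack; assumes we may query the blackbox
# on clean whitebox-class exemplars to estimate class correspondence.
import numpy as np

def build_correspondence_matrix(whitebox_loader, blackbox_predict, n_wb, n_bb):
    """Estimate C[i, j] = fraction of whitebox-class-i samples that the
    blackbox labels as its class j. whitebox_loader yields (x, y) pairs."""
    C = np.zeros((n_wb, n_bb))
    counts = np.zeros(n_wb)
    for x, y in whitebox_loader:
        j = blackbox_predict(x)              # hard label from the blackbox
        C[y, j] += 1.0
        counts[y] += 1.0
    return C / np.maximum(counts[:, None], 1.0)

def pick_proxy_class(C, bb_target):
    """Whitebox class whose samples the blackbox most often maps to the
    desired blackbox target class."""
    return int(np.argmax(C[:, bb_target]))

# Online phase (sketch): run any targeted whitebox attack toward the chosen
# proxy class, e.g. targeted PGD on the whitebox model, then transfer the
# resulting adversarial example to the blackbox.
```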
Related papers
- Hard-label based Small Query Black-box Adversarial Attack [2.041108289731398]
We propose a new practical setting of hard-label attack in which the optimisation process is guided by a pretrained surrogate model; a minimal sketch of this surrogate-guided loop appears below.
We find the proposed method achieves an attack success rate approximately 5 times higher than the benchmarks.
arXiv Detail & Related papers (2024-03-09T21:26:22Z)
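As a rough illustration of the surrogate-guided setting, the sketch below runs targeted PGD on a differentiable surrogate and spends hard-label queries on the victim only at intervals. The names `surrogate` and `victim_label`, and the step schedule, are assumptions for illustration, not the paper's procedure.

```python
# Minimal sketch: surrogate gradients drive the search, the victim is queried
# sparingly for its hard label. Assumes a batched input x with batch size 1.
import torch
import torch.nn.functional as F

def surrogate_guided_attack(x, target, surrogate, victim_label,
                            eps=8/255, alpha=1/255, steps=200, query_every=10):
    x_adv = x.clone()
    for i in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(surrogate(x_adv), torch.tensor([target]))
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv - alpha * grad.sign()              # targeted step
            x_adv = x.clone() + (x_adv - x).clamp(-eps, eps) # eps-ball project
            x_adv = x_adv.clamp(0, 1)
        if i % query_every == 0 and victim_label(x_adv) == target:
            return x_adv, True                               # hard-label hit
    return x_adv, False
```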
- Unstoppable Attack: Label-Only Model Inversion via Conditional Diffusion Model [14.834360664780709]
Model inversion attacks (MIAs) aim to recover private data from the inaccessible training sets of deep learning models.
This paper develops a novel MIA method that leverages a conditional diffusion model (CDM) to recover representative samples under the target label; a generic conditional-sampling sketch appears below.
Experimental results show that this method can generate samples that are similar and faithful to the target label, outperforming the generators of previous approaches.
arXiv Detail & Related papers (2023-07-17T12:14:24Z)
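Illustrative only: a generic class-conditional DDPM sampling loop, to show the mechanism the summary describes (drawing samples "under the target label"). The `denoiser(x_t, t, y)` network predicting the added noise is a stand-in for whatever conditional diffusion model the paper actually trains; the noise schedule is the standard linear one, not necessarily theirs.

```python
import torch

@torch.no_grad()
def conditional_sample(denoiser, y, shape, T=1000):
    betas = torch.linspace(1e-4, 0.02, T)        # standard linear schedule
    alphas = 1.0 - betas
    abar = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape)                       # start from pure noise
    for t in reversed(range(T)):
        eps = denoiser(x, torch.tensor([t]), y)  # predicted noise, given y
        mean = (x - betas[t] / torch.sqrt(1 - abar[t]) * eps) / torch.sqrt(alphas[t])
        x = mean + torch.sqrt(betas[t]) * torch.randn_like(x) if t > 0 else mean
    return x                                     # candidate sample for label y
```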
- Ensemble-based Blackbox Attacks on Dense Prediction [16.267479602370543]
We show that a carefully designed ensemble can create effective attacks for a number of victim models.
In particular, we show that normalization of the weights for individual models plays a critical role in the success of the attacks; a weight-normalized ensemble loss is sketched below.
Our proposed method can also generate a single perturbation that can fool multiple blackbox detection and segmentation models simultaneously.
arXiv Detail & Related papers (2023-03-25T00:08:03Z)
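A minimal sketch of the weight-normalisation idea: combine per-model losses so that no single surrogate dominates the gradient. The normalisation rule used here (dividing each loss by its current magnitude) is an illustrative assumption, not necessarily the paper's exact scheme.

```python
import torch

def ensemble_attack_step(x_adv, target, models, losses, alpha=1/255):
    """One targeted step against an ensemble of surrogate models.
    models: list of callables; losses: matching per-model loss functions."""
    x_adv.requires_grad_(True)
    total = 0.0
    for model, loss_fn in zip(models, losses):
        loss = loss_fn(model(x_adv), target)
        # Normalize each term so every surrogate contributes comparably.
        total = total + loss / loss.detach().abs().clamp(min=1e-8)
    grad, = torch.autograd.grad(total, x_adv)
    return (x_adv - alpha * grad.sign()).detach()   # caller re-projects
```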
- Generalizable Black-Box Adversarial Attack with Meta Learning [54.196613395045595]
In a black-box adversarial attack, the target model's parameters are unknown, and the attacker aims to find a successful perturbation based on query feedback under a query budget.
We propose to utilize the feedback information across historical attacks, dubbed example-level adversarial transferability; a minimal warm-start sketch appears below.
The framework, which exploits both types of adversarial transferability, can be naturally combined with any off-the-shelf query-based attack method to boost its performance.
arXiv Detail & Related papers (2023-01-01T07:24:12Z)
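A rough illustration of "example-level adversarial transferability" as the summary describes it: reuse perturbations from previously attacked examples to warm-start a new query-based attack. The nearest-neighbour lookup is an assumption for illustration, not the paper's meta-learning machinery.

```python
import torch

class PerturbationCache:
    def __init__(self):
        self.examples, self.deltas = [], []

    def add(self, x, delta):
        """Record a successful (example, perturbation) pair."""
        self.examples.append(x.flatten())
        self.deltas.append(delta)

    def warm_start(self, x):
        """Return the stored perturbation of the most similar past example,
        or zeros if the cache is empty."""
        if not self.examples:
            return torch.zeros_like(x)
        dists = torch.stack([(x.flatten() - e).norm() for e in self.examples])
        return self.deltas[int(torch.argmin(dists))]

# Any off-the-shelf query-based attack can then start from
# x + cache.warm_start(x) instead of x, typically cutting the query count.
```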
- Query Efficient Cross-Dataset Transferable Black-Box Attack on Action Recognition [99.29804193431823]
Black-box adversarial attacks present a realistic threat to action recognition systems.
We propose a new query-efficient, transferable attack on action recognition that addresses the shortcomings of existing black-box attacks.
Our method achieves 8% and 12% higher deception rates than state-of-the-art query-based and transfer-based attacks, respectively.
arXiv Detail & Related papers (2022-11-23T17:47:49Z)
- Zero-Query Transfer Attacks on Context-Aware Object Detectors [95.18656036716972]
Adversarial attacks perturb images such that a deep neural network produces incorrect classification results.
A promising approach to defend against adversarial attacks on natural multi-object scenes is to impose a context-consistency check.
We present the first approach for generating context-consistent adversarial attacks that can evade the context-consistency check.
arXiv Detail & Related papers (2022-03-29T04:33:06Z)
- Label-Only Model Inversion Attacks via Boundary Repulsion [12.374249336222906]
We introduce an algorithm, BREP-MI, to invert private training data using only the target model's predicted labels; a label-only boundary-repulsion step is sketched below.
Using the example of face recognition, we show that the images reconstructed by BREP-MI successfully reproduce the semantics of the private training data.
arXiv Detail & Related papers (2022-03-03T18:57:57Z)
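A minimal sketch of one boundary-repulsion step under the label-only threat model: sample directions on a sphere around the current latent code, keep those for which the target model still predicts the desired label, and step along their mean. This follows the high-level idea in the summary; the radii, schedules, and exact update rule of the real BREP-MI are omitted, and `generator` and `predict_label` are assumed interfaces.

```python
import torch

def boundary_repulsion_step(z, target, predict_label, generator,
                            n_dirs=32, radius=0.5, step=0.1):
    """One label-only update of latent code z toward the target class."""
    dirs = torch.randn(n_dirs, *z.shape)
    dirs = dirs / dirs.flatten(1).norm(dim=1).view(-1, *([1] * z.dim()))
    # Keep directions whose perturbed image still gets the target label.
    keep = [d for d in dirs
            if predict_label(generator(z + radius * d)) == target]
    if not keep:   # no sampled direction stayed inside the target class;
        return z   # a full implementation would shrink the radius and retry
    return z + step * torch.stack(keep).mean(dim=0)
```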
- Local Black-box Adversarial Attacks: A Query Efficient Approach [64.98246858117476]
Adversarial attacks have threatened the application of deep neural networks in security-sensitive scenarios.
We propose a novel framework that perturbs only the discriminative areas of clean examples within a limited query budget in black-box attacks; a masked-perturbation sketch appears below.
We conduct extensive experiments showing that our framework can significantly improve query efficiency in the black-box setting while maintaining a high attack success rate.
arXiv Detail & Related papers (2021-01-04T15:32:16Z)
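A sketch of restricting the perturbation to discriminative regions. Here the mask comes from a simple gradient-magnitude saliency map on a surrogate model; the actual framework may locate these areas differently.

```python
import torch
import torch.nn.functional as F

def saliency_mask(x, label, surrogate, keep_frac=0.2):
    """Binary mask over the top keep_frac most salient pixels of x,
    judged by gradient magnitude on a surrogate. x: (B, C, H, W)."""
    x = x.clone().requires_grad_(True)
    F.cross_entropy(surrogate(x), label).backward()
    sal = x.grad.abs().sum(dim=1, keepdim=True)   # per-pixel saliency
    k = int(keep_frac * sal.numel())
    thresh = sal.flatten().topk(k).values.min()
    return (sal >= thresh).float()                # 1 = discriminative pixel

# A query-based attack then proposes updates as x + mask * delta, so queries
# are spent only on the pixels that matter.
```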
- Perturbing Across the Feature Hierarchy to Improve Standard and Strict Blackbox Attack Transferability [100.91186458516941]
We consider the blackbox transfer-based targeted adversarial attack threat model in the realm of deep neural network (DNN) image classifiers.
We design a flexible attack framework that allows for multi-layer perturbations and demonstrates state-of-the-art targeted transfer performance; a minimal multi-layer feature-space loss is sketched below.
We analyze why the proposed methods outperform existing attack strategies and show an extension of the method to the case where limited queries to the blackbox model are allowed.
arXiv Detail & Related papers (2020-04-29T16:00:13Z)
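A minimal multi-layer feature-space loss in the spirit of the summary: push the adversarial example's intermediate activations toward those of a target-class image at several depths of the whitebox model. The layer choice and weighting here are illustrative assumptions.

```python
import torch

def multilayer_feature_loss(x_adv, x_target, feature_extractors, weights):
    """feature_extractors: callables returning activations at chosen layers
    of the whitebox model; weights: one scalar per layer."""
    loss = 0.0
    for f, w in zip(feature_extractors, weights):
        loss = loss + w * (f(x_adv) - f(x_target)).pow(2).mean()
    return loss   # minimise with PGD-style steps on x_adv
```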
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences.