Scalable Attribution of Adversarial Attacks via Multi-Task Learning
- URL: http://arxiv.org/abs/2302.14059v1
- Date: Sat, 25 Feb 2023 12:27:44 GMT
- Title: Scalable Attribution of Adversarial Attacks via Multi-Task Learning
- Authors: Zhongyi Guo and Keji Han and Yao Ge and Wei Ji and Yun Li
- Abstract summary: The Adversarial Attribution Problem (AAP) is the task of recognizing the tool-chains used to generate adversarial examples.
We propose a multi-task learning framework named Multi-Task Adversarial Attribution (MTAA) to recognize three signatures (attack algorithm, victim model, and hyperparameter) simultaneously.
- Score: 11.302242821058865
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks (DNNs) can be easily fooled by adversarial attacks
during the inference phase, when attackers add imperceptible perturbations to
original examples to produce adversarial examples. Many works focus on adversarial
detection and adversarial training to defend against adversarial attacks.
However, few works explore the tool-chains behind adversarial examples, which
can help defenders seize clues about the originator of an attack and its goals,
and can point to the most effective defense algorithm against the corresponding
attacks. Given this gap, it is necessary to develop techniques that can
recognize the tool-chains leveraged to generate adversarial examples, a task
called the Adversarial Attribution Problem (AAP). In this paper, AAP is defined
as the recognition of three signatures, i.e., {\em attack algorithm}, {\em
victim model} and {\em hyperparameter}. Current works transform AAP into a
single-label classification task and ignore the relationship between these
signatures. The former choice suffers from a combinatorial explosion as the
number of signatures increases; the latter means that AAP cannot be treated
simply as a single-task problem. We first conduct experiments to validate the
attributability of adversarial examples. We then propose a multi-task learning
framework named Multi-Task Adversarial Attribution (MTAA) to recognize the
three signatures simultaneously. MTAA contains a perturbation extraction
module, an adversarial-only extraction module, and a classification and
regression module. It takes the relationship between an attack algorithm and
its corresponding hyperparameter into account and uses an uncertainty-weighted
loss to adjust the weights of the three recognition tasks. Experimental results
on MNIST and ImageNet show the feasibility and scalability of the proposed
framework as well as its effectiveness in dealing with false alarms.
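To make the multi-task formulation concrete, below is a minimal sketch of how the three recognition tasks (attack-algorithm classification, victim-model classification, and hyperparameter regression) could share one backbone under an uncertainty-weighted loss. The weighting follows the common learned log-variance form; the head sizes, feature names, and everything else below are illustrative assumptions rather than the exact MTAA architecture.

```python
# Hedged sketch: uncertainty-weighted multi-task head for adversarial attribution.
# The shared "feats" would come from an upstream feature extractor; the MTAA
# perturbation extraction and adversarial-only extraction modules are not
# reproduced here, and all sizes/names are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskAttributionHead(nn.Module):
    def __init__(self, feat_dim: int, n_attacks: int, n_victims: int):
        super().__init__()
        self.attack_head = nn.Linear(feat_dim, n_attacks)  # attack-algorithm classification
        self.victim_head = nn.Linear(feat_dim, n_victims)  # victim-model classification
        self.hparam_head = nn.Linear(feat_dim, 1)           # hyperparameter regression (e.g., epsilon)
        # One learnable log-variance per task for uncertainty-based weighting.
        self.log_vars = nn.Parameter(torch.zeros(3))

    def forward(self, feats: torch.Tensor):
        return (self.attack_head(feats),
                self.victim_head(feats),
                self.hparam_head(feats).squeeze(-1))

    def loss(self, outputs, attack_y, victim_y, hparam_y):
        attack_logits, victim_logits, hparam_pred = outputs
        per_task = torch.stack([
            F.cross_entropy(attack_logits, attack_y),
            F.cross_entropy(victim_logits, victim_y),
            F.mse_loss(hparam_pred, hparam_y),
        ])
        # L = sum_i exp(-s_i) * L_i + s_i, with s_i the learned log-variance of task i.
        return (torch.exp(-self.log_vars) * per_task + self.log_vars).sum()
```

The learned log-variances let the classification and regression terms be balanced automatically rather than by hand-tuned weights, which is the role the abstract assigns to the uncertainty-weighted loss.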
Related papers
- AdvQDet: Detecting Query-Based Adversarial Attacks with Adversarial Contrastive Prompt Tuning [93.77763753231338]
Adversarial Contrastive Prompt Tuning (ACPT) is proposed to fine-tune the CLIP image encoder to extract similar embeddings for any two intermediate adversarial queries.
We show that ACPT can detect 7 state-of-the-art query-based attacks with a >99% detection rate within 5 shots.
We also show that ACPT is robust to 3 types of adaptive attacks.
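A rough sketch of the detection idea summarized above: successive queries issued by a query-based attack are small perturbations of one another, so their embeddings under a suitably tuned encoder end up highly similar. The window size, threshold, and function name below are assumptions for illustration, not values or code from the ACPT paper.

```python
# Hedged sketch: flag a client as running a query-based attack when its recent
# queries map to unusually similar embeddings. The embeddings are assumed to come
# from a tuned image encoder (e.g., a CLIP-like model); threshold is assumed.
import torch
import torch.nn.functional as F

def is_suspicious(query_embeddings: torch.Tensor, threshold: float = 0.95) -> bool:
    """query_embeddings: (k, d) embeddings of the last k queries from one client."""
    k = query_embeddings.size(0)
    if k < 2:
        return False                                  # not enough history to judge
    z = F.normalize(query_embeddings, dim=-1)
    sim = z @ z.t()                                   # pairwise cosine similarities
    off_diag = sim[~torch.eye(k, dtype=torch.bool)]   # drop self-similarities
    return off_diag.mean().item() > threshold         # near-duplicate queries -> flag
```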
arXiv Detail & Related papers (2024-08-04T09:53:50Z) - DALA: A Distribution-Aware LoRA-Based Adversarial Attack against Language Models [64.79319733514266]
Adversarial attacks can introduce subtle perturbations to input data.
Recent attack methods can achieve a relatively high attack success rate (ASR).
We propose a Distribution-Aware LoRA-based Adversarial Attack (DALA) method.
arXiv Detail & Related papers (2023-11-14T23:43:47Z) - PRAT: PRofiling Adversarial aTtacks [52.693011665938734]
We introduce the novel problem of PRofiling Adversarial aTtacks (PRAT).
Given an adversarial example, the objective of PRAT is to identify the attack used to generate it.
We use AID to devise a novel framework for the PRAT objective.
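Since PRAT poses essentially the same question as the main paper (which attack produced a given example?), a minimal single-label baseline is sketched below for contrast with the multi-task MTAA formulation: a small classifier over an estimated perturbation. The perturbation estimate, network shape, and names are assumptions for illustration, not the PRAT framework itself.

```python
# Hedged sketch: single-label attack identification. The attack "signature" is
# assumed to live in the added perturbation, so the classifier sees adv - reference.
# In practice the reference would itself be estimated (e.g., by a denoiser);
# here it is simply passed in. This is not the PRAT architecture.
import torch
import torch.nn as nn

class AttackIdentifier(nn.Module):
    def __init__(self, n_attacks: int, in_ch: int = 3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, n_attacks)

    def forward(self, adv_image: torch.Tensor, reference_image: torch.Tensor):
        perturbation = adv_image - reference_image     # approximate attack signature
        feats = self.features(perturbation).flatten(1)
        return self.classifier(feats)                  # logits over known attacks
```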
arXiv Detail & Related papers (2023-09-20T07:42:51Z) - Understanding the Vulnerability of Skeleton-based Human Activity Recognition via Black-box Attack [53.032801921915436]
Human Activity Recognition (HAR) has been employed in a wide range of applications, e.g. self-driving cars.
Recently, the robustness of skeleton-based HAR methods has been questioned due to their vulnerability to adversarial attacks.
We show such threats exist, even when the attacker only has access to the input/output of the model.
We propose the first black-box adversarial attack approach for skeleton-based HAR, called BASAR.
arXiv Detail & Related papers (2022-11-21T09:51:28Z) - Towards A Conceptually Simple Defensive Approach for Few-shot classifiers Against Adversarial Support Samples [107.38834819682315]
We study a conceptually simple approach to defend few-shot classifiers against adversarial attacks.
We propose a simple attack-agnostic detection method, using the concept of self-similarity and filtering.
Our evaluation on the miniImagenet (MI) and CUB datasets exhibits good attack detection performance.
arXiv Detail & Related papers (2021-10-24T05:46:03Z) - Identification of Attack-Specific Signatures in Adversarial Examples [62.17639067715379]
We show that different attack algorithms produce adversarial examples which are distinct not only in their effectiveness but also in how they qualitatively affect their victims.
Our findings suggest that prospective adversarial attacks should be compared not only via their success rates at fooling models but also via deeper downstream effects they have on victims.
arXiv Detail & Related papers (2021-10-13T15:40:48Z) - Self-Supervised Adversarial Example Detection by Disentangled Representation [16.98476232162835]
We train an autoencoder, assisted by a discriminator network, on both correctly paired and incorrectly paired class/semantic features to reconstruct benign examples and counterexamples, respectively.
This mimics the behavior of adversarial examples and can reduce the unnecessary generalization ability of the autoencoder.
Compared with the state-of-the-art self-supervised detection methods, our method exhibits better performance in various measurements.
arXiv Detail & Related papers (2021-05-08T12:48:18Z) - BAARD: Blocking Adversarial Examples by Testing for Applicability, Reliability and Decidability [12.079529913120593]
Adversarial defenses protect machine learning models from adversarial attacks, but are often tailored to one type of model or attack.
We take inspiration from the concept of Applicability Domain in cheminformatics.
We propose a simple yet robust triple-stage data-driven framework that checks the input globally and locally.
arXiv Detail & Related papers (2021-05-02T15:24:33Z) - ExAD: An Ensemble Approach for Explanation-based Adversarial Detection [17.455233006559734]
We propose ExAD, a framework to detect adversarial examples using an ensemble of explanation techniques.
We evaluate our approach using six state-of-the-art adversarial attacks on three image datasets.
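The ensemble idea can be sketched in a few lines: each explanation technique yields an attribution map, a per-technique detector scores that map, and the input is flagged if any detector fires. The callable signatures and the any-fires voting rule are assumptions for illustration, not ExAD's exact design.

```python
# Hedged sketch: explanation-ensemble detection. Each explainer produces an
# attribution map for the input; a matching detector turns that map into an
# anomaly score; the input is flagged if any score exceeds the threshold.
def explanation_ensemble_detect(x, model, explainers, detectors, threshold=0.5):
    """explainers: list of callables (model, x) -> attribution map.
    detectors: matching list of callables (map) -> anomaly score in [0, 1]."""
    scores = []
    for explain, detect in zip(explainers, detectors):
        attribution = explain(model, x)       # e.g., a saliency-style map
        scores.append(detect(attribution))    # how abnormal this map looks
    return max(scores) > threshold            # flag if any technique fires
```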
arXiv Detail & Related papers (2021-03-22T00:53:07Z)