SoK: Analyzing Adversarial Examples: A Framework to Study Adversary
Knowledge
- URL: http://arxiv.org/abs/2402.14937v1
- Date: Thu, 22 Feb 2024 19:44:19 GMT
- Title: SoK: Analyzing Adversarial Examples: A Framework to Study Adversary
Knowledge
- Authors: Lucas Fenaux and Florian Kerschbaum
- Abstract summary: Adversarial examples are malicious inputs to machine learning models that trigger a misclassification.
We focus on the image classification domain and provide a theoretical framework to study adversary knowledge inspired by work in order theory.
- Score: 34.39273915926214
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Adversarial examples are malicious inputs to machine learning models that
trigger a misclassification. This type of attack has been studied for close to
a decade, and we find that there is a lack of study and formalization of
adversary knowledge when mounting attacks. This has yielded a complex space of
attack research with hard-to-compare threat models and attacks. We focus on the
image classification domain and provide a theoretical framework to study
adversary knowledge inspired by work in order theory. We present an adversarial
example game, inspired by cryptographic games, to standardize attacks. We
survey recent attacks in the image classification domain and classify their
adversary's knowledge in our framework. From this systematization, we compile
results that both confirm existing beliefs about adversary knowledge, such as
the potency of information about the attacked model as well as allow us to
derive new conclusions on the difficulty associated with the white-box and
transferable threat models, for example, that transferable attacks might not be
as difficult as previously thought.
Related papers
- Analyzing the Impact of Adversarial Examples on Explainable Machine
Learning [0.31498833540989407]
Adversarial attacks are a type of attack on machine learning models where an attacker deliberately modifies the inputs to cause the model to make incorrect predictions.
Work on the vulnerability of deep learning models to adversarial attacks has shown that it is very easy to make samples that make a model predict things that it doesn't want to.
In this work, we analyze the impact of model interpretability due to adversarial attacks on text classification problems.
arXiv Detail & Related papers (2023-07-17T08:50:36Z) - Understanding the Vulnerability of Skeleton-based Human Activity Recognition via Black-box Attack [53.032801921915436]
Human Activity Recognition (HAR) has been employed in a wide range of applications, e.g. self-driving cars.
Recently, the robustness of skeleton-based HAR methods have been questioned due to their vulnerability to adversarial attacks.
We show such threats exist, even when the attacker only has access to the input/output of the model.
We propose the very first black-box adversarial attack approach in skeleton-based HAR called BASAR.
arXiv Detail & Related papers (2022-11-21T09:51:28Z) - The Space of Adversarial Strategies [6.295859509997257]
Adversarial examples, inputs designed to induce worst-case behavior in machine learning models, have been extensively studied over the past decade.
We propose a systematic approach to characterize worst-case (i.e., optimal) adversaries.
arXiv Detail & Related papers (2022-09-09T20:53:11Z) - Adversarial Robustness of Deep Reinforcement Learning based Dynamic
Recommender Systems [50.758281304737444]
We propose to explore adversarial examples and attack detection on reinforcement learning-based interactive recommendation systems.
We first craft different types of adversarial examples by adding perturbations to the input and intervening on the casual factors.
Then, we augment recommendation systems by detecting potential attacks with a deep learning-based classifier based on the crafted data.
arXiv Detail & Related papers (2021-12-02T04:12:24Z) - Adversarial Transfer Attacks With Unknown Data and Class Overlap [19.901933940805684]
Current transfer attack research has an unrealistic advantage for the attacker.
We present the first study of transferring adversarial attacks focusing on the data available to attacker and victim under imperfect settings.
This threat model is relevant to applications in medicine, malware, and others.
arXiv Detail & Related papers (2021-09-23T03:41:34Z) - Learning and Certification under Instance-targeted Poisoning [49.55596073963654]
We study PAC learnability and certification under instance-targeted poisoning attacks.
We show that when the budget of the adversary scales sublinearly with the sample complexity, PAC learnability and certification are achievable.
We empirically study the robustness of K nearest neighbour, logistic regression, multi-layer perceptron, and convolutional neural network on real data sets.
arXiv Detail & Related papers (2021-05-18T17:48:15Z) - BAARD: Blocking Adversarial Examples by Testing for Applicability,
Reliability and Decidability [12.079529913120593]
Adversarial defenses protect machine learning models from adversarial attacks, but are often tailored to one type of model or attack.
We take inspiration from the concept of Applicability Domain in cheminformatics.
We propose a simple yet robust triple-stage data-driven framework that checks the input globally and locally.
arXiv Detail & Related papers (2021-05-02T15:24:33Z) - ExAD: An Ensemble Approach for Explanation-based Adversarial Detection [17.455233006559734]
We propose ExAD, a framework to detect adversarial examples using an ensemble of explanation techniques.
We evaluate our approach using six state-of-the-art adversarial attacks on three image datasets.
arXiv Detail & Related papers (2021-03-22T00:53:07Z) - Adversarial Example Games [51.92698856933169]
Adrial Example Games (AEG) is a framework that models the crafting of adversarial examples.
AEG provides a new way to design adversarial examples by adversarially training a generator and aversa from a given hypothesis class.
We demonstrate the efficacy of AEG on the MNIST and CIFAR-10 datasets.
arXiv Detail & Related papers (2020-07-01T19:47:23Z) - Adversarial examples are useful too! [47.64219291655723]
I propose a new method to tell whether a model has been subject to a backdoor attack.
The idea is to generate adversarial examples, targeted or untargeted, using conventional attacks such as FGSM.
It is possible to visually locate the perturbed regions and unveil the attack.
arXiv Detail & Related papers (2020-05-13T01:38:56Z) - Adversarial Imitation Attack [63.76805962712481]
A practical adversarial attack should require as little as possible knowledge of attacked models.
Current substitute attacks need pre-trained models to generate adversarial examples.
In this study, we propose a novel adversarial imitation attack.
arXiv Detail & Related papers (2020-03-28T10:02:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.