Adversarial Attack Attribution: Discovering Attributable Signals in
Adversarial ML Attacks
- URL: http://arxiv.org/abs/2101.02899v1
- Date: Fri, 8 Jan 2021 08:16:41 GMT
- Title: Adversarial Attack Attribution: Discovering Attributable Signals in
Adversarial ML Attacks
- Authors: Marissa Dotter, Sherry Xie, Keith Manville, Josh Harguess, Colin
Busho, Mikel Rodriguez
- Abstract summary: Even production systems, such as self-driving cars and ML-as-a-service offerings, are susceptible to adversarial inputs.
Can perturbed inputs be attributed to the methods used to generate the attack?
We introduce the concept of adversarial attack attribution and create a simple supervised learning experimental framework to examine the feasibility of discovering attributable signals in adversarial attacks.
- Score: 0.7883722807601676
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine Learning (ML) models are known to be vulnerable to adversarial inputs
and researchers have demonstrated that even production systems, such as
self-driving cars and ML-as-a-service offerings, are susceptible. These systems
represent a target for bad actors. Their disruption can cause real physical and
economic harm. When attacks on production ML systems occur, the ability to
attribute the attack to the responsible threat group is a critical step in
formulating a response and holding the attackers accountable. We pose the
following question: can adversarially perturbed inputs be attributed to the
particular methods used to generate the attack? In other words, is there a way
to find a signal in these attacks that exposes the attack algorithm, model
architecture, or hyperparameters used in the attack? We introduce the concept
of adversarial attack attribution and create a simple supervised learning
experimental framework to examine the feasibility of discovering attributable
signals in adversarial attacks. We find that it is possible to differentiate
attacks generated with different attack algorithms, models, and hyperparameters
on both the CIFAR-10 and MNIST datasets.
Related papers
- Mitigating Label Flipping Attacks in Malicious URL Detectors Using
Ensemble Trees [16.16333915007336]
Malicious URLs provide adversarial opportunities across various industries, including transportation, healthcare, energy, and banking.
backdoor attacks involve manipulating a small percentage of training data labels, such as Label Flipping (LF), which changes benign labels to malicious ones and vice versa.
We propose an innovative alarm system that detects the presence of poisoned labels and a defense mechanism designed to uncover the original class labels.
arXiv Detail & Related papers (2024-03-05T14:21:57Z) - DALA: A Distribution-Aware LoRA-Based Adversarial Attack against
Language Models [64.79319733514266]
Adversarial attacks can introduce subtle perturbations to input data.
Recent attack methods can achieve a relatively high attack success rate (ASR)
We propose a Distribution-Aware LoRA-based Adversarial Attack (DALA) method.
arXiv Detail & Related papers (2023-11-14T23:43:47Z) - Analyzing the Impact of Adversarial Examples on Explainable Machine
Learning [0.31498833540989407]
Adversarial attacks are a type of attack on machine learning models where an attacker deliberately modifies the inputs to cause the model to make incorrect predictions.
Work on the vulnerability of deep learning models to adversarial attacks has shown that it is very easy to make samples that make a model predict things that it doesn't want to.
In this work, we analyze the impact of model interpretability due to adversarial attacks on text classification problems.
arXiv Detail & Related papers (2023-07-17T08:50:36Z) - Can Adversarial Examples Be Parsed to Reveal Victim Model Information? [62.814751479749695]
In this work, we ask whether it is possible to infer data-agnostic victim model (VM) information from data-specific adversarial instances.
We collect a dataset of adversarial attacks across 7 attack types generated from 135 victim models.
We show that a simple, supervised model parsing network (MPN) is able to infer VM attributes from unseen adversarial attacks.
arXiv Detail & Related papers (2023-03-13T21:21:49Z) - Illusory Attacks: Information-Theoretic Detectability Matters in Adversarial Attacks [76.35478518372692]
We introduce epsilon-illusory, a novel form of adversarial attack on sequential decision-makers.
Compared to existing attacks, we empirically find epsilon-illusory to be significantly harder to detect with automated methods.
Our findings suggest the need for better anomaly detectors, as well as effective hardware- and system-level defenses.
arXiv Detail & Related papers (2022-07-20T19:49:09Z) - I Know What You Trained Last Summer: A Survey on Stealing Machine
Learning Models and Defences [0.1031296820074812]
We study model stealing attacks, assessing their performance and exploring corresponding defence techniques in different settings.
We propose a taxonomy for attack and defence approaches, and provide guidelines on how to select the right attack or defence based on the goal and available resources.
arXiv Detail & Related papers (2022-06-16T21:16:41Z) - On Trace of PGD-Like Adversarial Attacks [77.75152218980605]
Adversarial attacks pose safety and security concerns for deep learning applications.
We construct Adrial Response Characteristics (ARC) features to reflect the model's gradient consistency.
Our method is intuitive, light-weighted, non-intrusive, and data-undemanding.
arXiv Detail & Related papers (2022-05-19T14:26:50Z) - The Feasibility and Inevitability of Stealth Attacks [63.14766152741211]
We study new adversarial perturbations that enable an attacker to gain control over decisions in generic Artificial Intelligence systems.
In contrast to adversarial data modification, the attack mechanism we consider here involves alterations to the AI system itself.
arXiv Detail & Related papers (2021-06-26T10:50:07Z) - Zero-shot learning approach to adaptive Cybersecurity using Explainable
AI [0.5076419064097734]
We present a novel approach to handle the alarm flooding problem faced by Cybersecurity systems like security information and event management (SIEM) and intrusion detection (IDS)
We apply a zero-shot learning method to machine learning (ML) by leveraging explanations for predictions of anomalies generated by a ML model.
In this approach, without any prior knowledge of attack, we try to identify it, decipher the features that contribute to classification and try to bucketize the attack in a specific category.
arXiv Detail & Related papers (2021-06-21T06:29:13Z) - ML-Doctor: Holistic Risk Assessment of Inference Attacks Against Machine
Learning Models [64.03398193325572]
Inference attacks against Machine Learning (ML) models allow adversaries to learn about training data, model parameters, etc.
We concentrate on four attacks - namely, membership inference, model inversion, attribute inference, and model stealing.
Our analysis relies on a modular re-usable software, ML-Doctor, which enables ML model owners to assess the risks of deploying their models.
arXiv Detail & Related papers (2021-02-04T11:35:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.