On Adversarial Examples and Stealth Attacks in Artificial Intelligence
Systems
- URL: http://arxiv.org/abs/2004.04479v1
- Date: Thu, 9 Apr 2020 10:56:53 GMT
- Title: On Adversarial Examples and Stealth Attacks in Artificial Intelligence
Systems
- Authors: Ivan Y. Tyukin, Desmond J. Higham, and Alexander N. Gorban
- Abstract summary: We present a formal framework for assessing and analyzing two classes of malevolent action towards generic Artificial Intelligence (AI) systems.
The first class involves adversarial examples and concerns the introduction of small perturbations of the input data that cause misclassification.
The second class, introduced here for the first time and named stealth attacks, involves small perturbations to the AI system itself.
- Score: 62.997667081978825
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work we present a formal theoretical framework for assessing and
analyzing two classes of malevolent action towards generic Artificial
Intelligence (AI) systems. Our results apply to general multi-class classifiers
that map from an input space into a decision space, including artificial neural
networks used in deep learning applications. Two classes of attacks are
considered. The first class involves adversarial examples and concerns the
introduction of small perturbations of the input data that cause
misclassification. The second class, introduced here for the first time and
named stealth attacks, involves small perturbations to the AI system itself.
Here the perturbed system produces whatever output is desired by the attacker
on a specific small data set, perhaps even a single input, but performs as
normal on a validation set (which is unknown to the attacker). We show that in
both cases, i.e., in the case of an attack based on adversarial examples and in
the case of a stealth attack, the dimensionality of the AI's decision-making
space is a major contributor to the AI's susceptibility. For attacks based on
adversarial examples, a second crucial parameter is the absence of local
concentrations in the data probability distribution, a property known as
Smeared Absolute Continuity. According to our findings, robustness to
adversarial examples requires either (a) the data distributions in the AI's
feature space to have concentrated probability density functions or (b) the
dimensionality of the AI's decision variables to be sufficiently small. We also
show how to construct stealth attacks on high-dimensional AI systems that are
hard to spot unless the validation set is made exponentially large.
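Below is a minimal toy sketch (my own illustration under stated assumptions, not the authors' construction) of the two attack classes the abstract describes, applied to a fixed linear two-class classifier in a high-dimensional input space. The dimension, thresholds, and data are all illustrative; the only point is to show a small input perturbation flipping a decision, and a small perturbation of the system itself that reroutes one trigger input while leaving a random validation set untouched.

```python
# Toy sketch of (1) an adversarial example and (2) a stealth attack
# on a fixed linear classifier. All sizes/thresholds are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
d = 200                                   # input/decision-space dimension (high on purpose)
W = rng.normal(size=(2, d)) / np.sqrt(d)  # a fixed "trained" two-class classifier

def predict(x, stealth=None):
    """Class scores; `stealth` is an optional (w, b, v) added ReLU neuron."""
    scores = W @ x
    if stealth is not None:
        w, b, v = stealth
        scores = scores + v * max(0.0, w @ x - b)
    return scores

x = rng.normal(size=d)
x /= np.linalg.norm(x)
c = int(np.argmax(predict(x)))            # current class of x

# --- Class 1: adversarial example (small perturbation of the input) -----------
# For a linear classifier the score gap is (W[c] - W[1-c]) @ x, so nudging x
# slightly along g = W[1-c] - W[c] closes the gap and flips the decision.
g = W[1 - c] - W[c]
eps = 1.05 * (predict(x)[c] - predict(x)[1 - c]) / (g @ g)
x_adv = x + eps * g
print("perturbation norm:", np.linalg.norm(x_adv - x))          # small vs. ||x|| == 1
print("class flips:", c, "->", int(np.argmax(predict(x_adv))))

# --- Class 2: stealth attack (small perturbation of the system itself) --------
# Add one ReLU neuron that fires only for inputs strongly aligned with a chosen
# trigger and pushes the decision toward the attacker's target class.
x_trigger = rng.normal(size=d)
x_trigger /= np.linalg.norm(x_trigger)
target = 1 - int(np.argmax(predict(x_trigger)))
v = np.zeros(2); v[target] = 100.0
attack = (x_trigger, 0.9, v)              # threshold 0.9 sits just below x_trigger @ x_trigger == 1

print("trigger class:", int(np.argmax(predict(x_trigger))), "->",
      int(np.argmax(predict(x_trigger, attack))))

# On a random validation set (unknown to the attacker) nothing changes: in high
# dimension random unit vectors are nearly orthogonal to the trigger, so the
# added neuron never activates -- the dimensionality effect the abstract describes.
X_val = rng.normal(size=(1000, d))
X_val /= np.linalg.norm(X_val, axis=1, keepdims=True)
changed = sum(int(np.argmax(predict(z))) != int(np.argmax(predict(z, attack)))
              for z in X_val)
print("validation predictions changed:", changed)               # expected: 0
```

The second part also hints at why a defender needs a very large validation set to notice the change: the added neuron only responds in a vanishingly small region around the trigger.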
Related papers
- Adversarial Attacks and Dimensionality in Text Classifiers [3.4179091429029382]
Adversarial attacks on machine learning algorithms have been a key deterrent to the adoption of AI in many real-world use cases.
We study adversarial examples in the field of natural language processing, specifically text classification tasks.
arXiv Detail & Related papers (2024-04-03T11:49:43Z) - Adversarial Attacks Neutralization via Data Set Randomization [3.655021726150369]
Adversarial attacks on deep learning models pose a serious threat to their reliability and security.
We propose a new defense mechanism rooted in hyperspace projection.
We show that our solution increases the robustness of deep learning models against adversarial attacks.
arXiv Detail & Related papers (2023-06-21T10:17:55Z) - Wasserstein distributional robustness of neural networks [9.79503506460041]
Deep neural networks are known to be vulnerable to adversarial attacks (AA).
For an image recognition task, this means that a small perturbation of the original can result in the image being misclassified.
We re-cast the problem using techniques of Wasserstein distributionally robust optimization (DRO) and obtain novel contributions.
arXiv Detail & Related papers (2023-06-16T13:41:24Z) - Can Adversarial Examples Be Parsed to Reveal Victim Model Information? [62.814751479749695]
In this work, we ask whether it is possible to infer data-agnostic victim model (VM) information from data-specific adversarial instances.
We collect a dataset of adversarial attacks across 7 attack types generated from 135 victim models.
We show that a simple, supervised model parsing network (MPN) is able to infer VM attributes from unseen adversarial attacks.
arXiv Detail & Related papers (2023-03-13T21:21:49Z) - Illusory Attacks: Information-Theoretic Detectability Matters in Adversarial Attacks [76.35478518372692]
We introduce epsilon-illusory, a novel form of adversarial attack on sequential decision-makers.
Compared to existing attacks, we empirically find epsilon-illusory to be significantly harder to detect with automated methods.
Our findings suggest the need for better anomaly detectors, as well as effective hardware- and system-level defenses.
arXiv Detail & Related papers (2022-07-20T19:49:09Z) - The Feasibility and Inevitability of Stealth Attacks [63.14766152741211]
We study new adversarial perturbations that enable an attacker to gain control over decisions in generic Artificial Intelligence systems.
In contrast to adversarial data modification, the attack mechanism we consider here involves alterations to the AI system itself.
arXiv Detail & Related papers (2021-06-26T10:50:07Z) - Learning and Certification under Instance-targeted Poisoning [49.55596073963654]
We study PAC learnability and certification under instance-targeted poisoning attacks.
We show that when the budget of the adversary scales sublinearly with the sample complexity, PAC learnability and certification are achievable.
We empirically study the robustness of K nearest neighbour, logistic regression, multi-layer perceptron, and convolutional neural network on real data sets.
arXiv Detail & Related papers (2021-05-18T17:48:15Z) - Practical No-box Adversarial Attacks against DNNs [31.808770437120536]
We investigate no-box adversarial examples, where the attacker can access neither the model information nor the training set, and cannot query the model.
We propose three mechanisms for training with a very small dataset and find that prototypical reconstruction is the most effective.
Our approach significantly diminishes the average prediction accuracy of the system to only 15.40%, which is on par with the attack that transfers adversarial examples from a pre-trained Arcface model.
arXiv Detail & Related papers (2020-12-04T11:10:03Z) - A Self-supervised Approach for Adversarial Robustness [105.88250594033053]
Adversarial examples can cause catastrophic mistakes in Deep Neural Network (DNN) based vision systems.
This paper proposes a self-supervised adversarial training mechanism in the input space.
It provides significant robustness against unseen adversarial attacks.
arXiv Detail & Related papers (2020-06-08T20:42:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.