Bluff: Interactively Deciphering Adversarial Attacks on Deep Neural
Networks
- URL: http://arxiv.org/abs/2009.02608v2
- Date: Tue, 8 Sep 2020 02:38:11 GMT
- Title: Bluff: Interactively Deciphering Adversarial Attacks on Deep Neural
Networks
- Authors: Nilaksh Das, Haekyu Park, Zijie J. Wang, Fred Hohman, Robert Firstman,
Emily Rogers, Duen Horng Chau
- Abstract summary: Bluff is an interactive system for visualizing, characterizing, and deciphering adversarial attacks on vision-based neural networks.
It reveals mechanisms that adversarial attacks employ to inflict harm on a model.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks (DNNs) are now commonly used in many domains. However,
they are vulnerable to adversarial attacks: carefully crafted perturbations on
data inputs that can fool a model into making incorrect predictions. Despite
significant research on developing DNN attack and defense techniques, people
still lack an understanding of how such attacks penetrate a model's internals.
We present Bluff, an interactive system for visualizing, characterizing, and
deciphering adversarial attacks on vision-based neural networks. Bluff allows
people to flexibly visualize and compare the activation pathways for benign and
attacked images, revealing mechanisms that adversarial attacks employ to
inflict harm on a model. Bluff is open-sourced and runs in modern web browsers.
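The abstract's two technical ingredients, a crafted input perturbation and a benign-versus-attacked comparison of activation pathways, can be sketched concretely. Below is a minimal illustration, not Bluff's implementation (Bluff is an interactive web visualization): it crafts a one-step FGSM perturbation for a torchvision ResNet-18 and reports which channels' mean activations shift most under attack, the kind of signal Bluff lets users explore. The model choice, the layer names, the `fgsm` helper, and the random stand-in image are assumptions made for the sketch.

```python
# Minimal illustration (not Bluff itself): craft a one-step FGSM perturbation
# and compare per-layer activations between the benign and the attacked image,
# i.e. the kind of benign-vs-attacked comparison Bluff exposes interactively.
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()

def fgsm(x, label, eps=0.03):
    """Untargeted one-step FGSM: nudge x in the direction that raises the loss."""
    x = x.clone().requires_grad_(True)
    F.cross_entropy(model(x), label).backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

# Record activations of a couple of layers with forward hooks.
activations = {}
def record(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

for name in ("layer1", "layer4"):
    getattr(model, name).register_forward_hook(record(name))

x_benign = torch.rand(1, 3, 224, 224)       # stand-in for a preprocessed image
with torch.no_grad():
    y_pred = model(x_benign).argmax(dim=1)  # hooks fill `activations` (benign)
benign_acts = dict(activations)

x_attacked = fgsm(x_benign, y_pred)
with torch.no_grad():
    model(x_attacked)                       # hooks refill `activations` (attacked)

# Channels whose mean activation shifts most are candidates for the pathways
# an attack exploits to redirect the prediction.
for name in ("layer1", "layer4"):
    shift = (activations[name].mean(dim=(2, 3))
             - benign_acts[name].mean(dim=(2, 3))).abs().squeeze(0)
    print(name, "most-shifted channels:", shift.topk(3).indices.tolist())
```

Bluff performs this kind of comparison interactively in the browser over full activation pathways; the sketch only mimics the underlying benign-versus-attacked comparison.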
Related papers
- On The Relationship Between Universal Adversarial Attacks And Sparse
Representations [38.43938212884298]
We show the connection between adversarial attacks and sparse representations.
Common attacks on neural networks can be expressed as attacks on the sparse representation of the input image.
arXiv Detail & Related papers (2023-11-14T16:00:29Z) - Investigating Human-Identifiable Features Hidden in Adversarial
Perturbations [54.39726653562144]
Our study explores up to five attack algorithms across three datasets.
We identify human-identifiable features in adversarial perturbations.
Using pixel-level annotations, we extract such features and demonstrate their ability to compromise target models.
arXiv Detail & Related papers (2023-09-28T22:31:29Z) - Searching for the Essence of Adversarial Perturbations [73.96215665913797]
We show that adversarial perturbations contain human-recognizable information, which is the key conspirator responsible for a neural network's erroneous prediction.
This concept of human-recognizable information allows us to explain key features related to adversarial perturbations.
arXiv Detail & Related papers (2022-05-30T18:04:57Z) - The Feasibility and Inevitability of Stealth Attacks [63.14766152741211]
We study new adversarial perturbations that enable an attacker to gain control over decisions in generic Artificial Intelligence systems.
In contrast to adversarial data modification, the attack mechanism we consider here involves alterations to the AI system itself.
arXiv Detail & Related papers (2021-06-26T10:50:07Z) - Attack to Fool and Explain Deep Networks [59.97135687719244]
We counter-argue by providing evidence of human-meaningful patterns in adversarial perturbations.
Our major contribution is a novel pragmatic adversarial attack that is subsequently transformed into a tool to interpret the visual models.
arXiv Detail & Related papers (2021-06-20T03:07:36Z) - Explainable Adversarial Attacks in Deep Neural Networks Using Activation
Profiles [69.9674326582747]
This paper presents a visual framework for investigating neural network models subjected to adversarial examples.
We show how observing such activation profiles can quickly pinpoint the areas of a model that an attack exploits.
arXiv Detail & Related papers (2021-03-18T13:04:21Z) - BreakingBED -- Breaking Binary and Efficient Deep Neural Networks by
Adversarial Attacks [65.2021953284622]
We study the robustness of CNNs against white-box and black-box adversarial attacks.
Results are shown for distilled CNNs, agent-based state-of-the-art pruned models, and binarized neural networks.
arXiv Detail & Related papers (2021-03-14T20:43:19Z) - Adversarial Feature Desensitization [12.401175943131268]
We propose a novel approach to adversarial robustness that builds on insights from the domain adaptation field.
Our method, Adversarial Feature Desensitization (AFD), aims to learn features that are invariant to adversarial perturbations of the inputs; a toy sketch of this invariance idea appears after this list.
arXiv Detail & Related papers (2020-06-08T14:20:02Z) - Exploring the role of Input and Output Layers of a Deep Neural Network
in Adversarial Defense [0.0]
It has been shown that certain inputs exist that would not normally fool a human but can mislead the model completely.
Such adversarial inputs pose a serious security threat when these models are used in real-world applications.
We analyze the resistance of three different classes of fully connected dense networks against rarely tested non-gradient-based adversarial attacks.
arXiv Detail & Related papers (2020-06-02T06:15:46Z)
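The Adversarial Feature Desensitization entry above frames robustness as making features insensitive to adversarial perturbations. Below is a toy, hypothetical sketch of that invariance idea, not the paper's method: the paper draws on domain adaptation, while this sketch substitutes a simple feature-distance penalty between benign inputs and their FGSM counterparts. `SmallNet`, `train_step`, and all hyperparameters are illustrative assumptions.

```python
# Hypothetical sketch of the feature-invariance idea (not AFD's exact
# objective): penalize the distance between the features of a benign input
# and those of its adversarially perturbed counterpart.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.head = nn.Linear(16, num_classes)

    def forward(self, x):
        return self.head(self.features(x))

model = SmallNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

def fgsm(x, y, eps=0.03):
    # One-step attack used only to generate the adversarial counterpart.
    x = x.clone().requires_grad_(True)
    F.cross_entropy(model(x), y).backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

def train_step(x, y, lam=1.0):
    x_adv = fgsm(x, y)
    feats = model.features(x)
    feats_adv = model.features(x_adv)
    loss = (F.cross_entropy(model.head(feats), y)
            + lam * F.mse_loss(feats_adv, feats))  # pull adversarial features toward benign ones
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

x = torch.rand(8, 3, 32, 32)            # toy batch just to show the step runs
y = torch.randint(0, 10, (8,))
print(train_step(x, y))
```

The penalty term is what makes this a crude desensitization step: it discourages the feature extractor from responding differently to the perturbed input than to the benign one.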