Adversarial Attacks on the Interpretation of Neuron Activation
Maximization
- URL: http://arxiv.org/abs/2306.07397v1
- Date: Mon, 12 Jun 2023 19:54:33 GMT
- Title: Adversarial Attacks on the Interpretation of Neuron Activation
Maximization
- Authors: Geraldin Nanfack, Alexander Fulleringer, Jonathan Marty, Michael
Eickenberg, Eugene Belilovsky
- Abstract summary: Activation-maximization approaches are used to interpret and analyze trained deep-learning models.
In this work, we consider the concept of an adversary manipulating a model for the purpose of deceiving the interpretation.
- Score: 70.5472799454224
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The internal functional behavior of trained Deep Neural Networks is
notoriously difficult to interpret. Activation-maximization approaches are one
set of techniques used to interpret and analyze trained deep-learning models.
These consist in finding inputs that maximally activate a given neuron or
feature map. These inputs can be selected from a data set or obtained by
optimization. However, interpretability methods can themselves be deceived. In
this work, we consider the concept of an adversary manipulating a
model for the purpose of deceiving the interpretation. We propose an
optimization framework for performing this manipulation and demonstrate a
number of ways that popular activation-maximization interpretation techniques
associated with CNNs can be manipulated to change the interpretations, shedding
light on the reliability of these methods.
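To make the technique under attack concrete, here is a minimal activation-maximization sketch in PyTorch: gradient ascent on the input to maximize a chosen feature map's response. The choice of torchvision ResNet-18, the hooked layer, the channel index, and the L2 penalty are illustrative assumptions, not details from the paper.
```python
# Minimal activation-maximization sketch (illustrative; not the paper's implementation).
# Assumptions: torchvision ResNet-18 (randomly initialized here), layer "layer3",
# feature map 7, and a simple L2 penalty; real use would load pretrained weights.
import torch
from torchvision.models import resnet18

model = resnet18(weights=None).eval()
for p in model.parameters():
    p.requires_grad_(False)                # only the input is optimized

acts = {}
model.layer3.register_forward_hook(lambda m, i, o: acts.update(value=o))

channel = 7                                # the neuron / feature map to interpret
x = torch.randn(1, 3, 224, 224, requires_grad=True)
opt = torch.optim.Adam([x], lr=0.05)

for step in range(200):
    opt.zero_grad()
    model(x)
    activation = acts["value"][0, channel].mean()   # mean response of the target channel
    loss = -activation + 1e-4 * x.norm()            # maximize response, keep the input bounded
    loss.backward()
    opt.step()

# x now approximates an image that maximally activates the chosen feature map; dataset-based
# variants instead rank real images by this activation. The paper studies how an adversary
# can fine-tune the model so that such interpretations become misleading.
```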
Related papers
- Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting & Beyond [61.18736646013446]
In pursuit of a deeper understanding of its surprising behaviors, we investigate the utility of a simple yet accurate model of a trained neural network.
Across three case studies, we illustrate how it can be applied to derive new empirical insights on a diverse range of prominent phenomena.
arXiv Detail & Related papers (2024-10-31T22:54:34Z)
- Manipulating Feature Visualizations with Gradient Slingshots [54.31109240020007]
We introduce a novel method for manipulating Feature Visualization (FV) without significantly impacting the model's decision-making process.
We evaluate the effectiveness of our method on several neural network models and demonstrate its ability to hide the functionality of arbitrarily chosen neurons.
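The general shape of such an attack can be sketched as fine-tuning with a joint objective: keep the manipulated model's predictions close to the original while steering a target neuron so that a chosen decoy input activates it strongly. This is only a schematic reading of visualization-manipulation attacks, not the Gradient Slingshots method itself; the model, layer, decoy, and trade-off weight `lam` are assumed for illustration.
```python
# Schematic manipulation objective (illustrative only, not the Gradient Slingshots method).
# Idea: preserve the original model's outputs while forcing a chosen feature map to respond
# strongly to a "decoy" image, so activation maximization points to misleading content.
import copy
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

original = resnet18(weights=None).eval()
manipulated = copy.deepcopy(original).train()
for p in original.parameters():
    p.requires_grad_(False)

acts = {}
manipulated.layer3.register_forward_hook(lambda m, i, o: acts.update(value=o))

channel, lam = 7, 1.0                     # assumed target feature map and trade-off weight
decoy = torch.rand(1, 3, 224, 224)        # image the attacker wants the neuron to "prefer"
data = torch.rand(8, 3, 224, 224)         # stand-in for a batch of natural images
opt = torch.optim.Adam(manipulated.parameters(), lr=1e-4)

for step in range(100):
    opt.zero_grad()
    # 1) behaviour preservation: match the original model's logits on normal data
    preserve = F.mse_loss(manipulated(data), original(data))
    # 2) interpretation manipulation: make the decoy strongly activate the target channel
    manipulated(decoy)
    manipulate = -acts["value"][0, channel].mean()
    (preserve + lam * manipulate).backward()
    opt.step()
```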
arXiv Detail & Related papers (2024-01-11T18:57:17Z)
- Explaining Explainability: Towards Deeper Actionable Insights into Deep Learning through Second-order Explainability [70.60433013657693]
Second-order explainable AI (SOXAI) was recently proposed to extend explainable AI (XAI) from the instance level to the dataset level.
We demonstrate for the first time, via example classification and segmentation cases, that eliminating irrelevant concepts from the training set based on actionable insights from SOXAI can enhance a model's performance.
arXiv Detail & Related papers (2023-06-14T23:24:01Z)
- Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z)
- Interpretable Social Anchors for Human Trajectory Forecasting in Crowds [84.20437268671733]
We propose a neural network-based system to predict human trajectories in crowds.
We learn interpretable rule-based intents, and then utilise the expressibility of neural networks to model the scene-specific residual.
Our architecture is tested on the interaction-centric benchmark TrajNet++.
arXiv Detail & Related papers (2021-05-07T09:22:34Z)
- Explainability-aided Domain Generalization for Image Classification [0.0]
We show that applying methods and architectures from the explainability literature can achieve state-of-the-art performance for the challenging task of domain generalization.
We develop a set of novel algorithms including DivCAM, an approach where the network receives guidance during training via gradient based class activation maps to focus on a diverse set of discriminative features.
Since these methods offer competitive performance on top of explainability, we argue that the proposed methods can be used as a tool to improve the robustness of deep neural network architectures.
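DivCAM builds on gradient-based class activation maps; as background, the sketch below computes a plain Grad-CAM heatmap for one class, the kind of map such training-time guidance would act on. It is a generic Grad-CAM sketch with an assumed model, layer, and class index, not the DivCAM training procedure.
```python
# Plain Grad-CAM sketch (background for CAM-guided training; not the DivCAM algorithm itself).
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

model = resnet18(weights=None).eval()
feats = {}
model.layer4.register_forward_hook(lambda m, i, o: feats.update(value=o))

x = torch.rand(1, 3, 224, 224)            # stand-in input image
target_class = 3                           # assumed class index
logits = model(x)

fmap = feats["value"]                                      # (1, 512, 7, 7) feature maps
grad = torch.autograd.grad(logits[0, target_class], fmap)[0]
weights = grad.mean(dim=(2, 3), keepdim=True)              # global-average-pooled gradients
cam = F.relu((weights * fmap).sum(dim=1, keepdim=True))    # weighted channel sum
cam = F.interpolate(cam, size=x.shape[-2:], mode="bilinear", align_corners=False)
# cam is a coarse heatmap of the image regions that support the chosen class;
# DivCAM-style training would add a loss encouraging such maps to cover diverse features.
```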
arXiv Detail & Related papers (2021-04-05T02:27:01Z)
- Ada-SISE: Adaptive Semantic Input Sampling for Efficient Explanation of Convolutional Neural Networks [26.434705114982584]
We propose an efficient interpretation method for convolutional neural networks.
Experimental results show that the proposed method can reduce execution time by up to 30%.
arXiv Detail & Related papers (2021-02-15T19:10:00Z)
- Towards Robust Explanations for Deep Neural Networks [5.735035463793008]
We develop a unified theoretical framework for deriving bounds on the maximal manipulability of a model.
We present three different techniques to boost robustness against manipulation.
arXiv Detail & Related papers (2020-12-18T18:29:09Z)
- Making Neural Networks Interpretable with Attribution: Application to Implicit Signals Prediction [11.427019313283997]
We propose a novel formulation of interpretable deep neural networks for the attribution task.
Using masked weights, hidden features can be deeply attributed, split into several input-restricted sub-networks and trained as a boosted mixture of experts.
arXiv Detail & Related papers (2020-08-26T06:46:49Z)