Towards Robust Explanations for Deep Neural Networks
- URL: http://arxiv.org/abs/2012.10425v1
- Date: Fri, 18 Dec 2020 18:29:09 GMT
- Title: Towards Robust Explanations for Deep Neural Networks
- Authors: Ann-Kathrin Dombrowski, Christopher J. Anders, Klaus-Robert Müller, Pan Kessel
- Abstract summary: We develop a unified theoretical framework for deriving bounds on the maximal manipulability of a model.
We present three different techniques to boost robustness against manipulation.
- Score: 5.735035463793008
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Explanation methods shed light on the decision process of black-box
classifiers such as deep neural networks. But their usefulness can be
compromised because they are susceptible to manipulations. With this work, we
aim to enhance the resilience of explanations. We develop a unified theoretical
framework for deriving bounds on the maximal manipulability of a model. Based
on these theoretical insights, we present three different techniques to boost
robustness against manipulation: training with weight decay, smoothing
activation functions, and minimizing the Hessian of the network. Our
experimental results confirm the effectiveness of these approaches.
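The three techniques named in the abstract map naturally onto a standard training loop. Below is a minimal, hypothetical PyTorch sketch (not the authors' released code): the toy architecture and the values of `beta` and `hessian_weight` are illustrative assumptions, and the curvature term uses a common stochastic Frobenius-norm estimator as a tractable stand-in for the paper's Hessian regularizer.

```python
# Hedged sketch of the three robustness techniques; all hyperparameters
# and the architecture are illustrative assumptions, not the paper's setup.
import torch
import torch.nn as nn
import torch.nn.functional as F

# (1) Smoothed activation: Softplus approaches ReLU for large beta but has
#     nonzero second derivatives, which the curvature penalty below needs.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 256),
    nn.Softplus(beta=5.0),
    nn.Linear(256, 10),
)

# (2) Weight decay is applied directly through the optimizer.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)

def training_step(x, y, hessian_weight=0.1):
    x = x.detach().requires_grad_(True)
    logits = model(x)
    loss = F.cross_entropy(logits, y)

    # Gradient (saliency) explanation of the top-class score.
    score = logits.max(dim=1).values.sum()
    (grad_x,) = torch.autograd.grad(score, x, create_graph=True)

    # (3) Hessian minimization: for v ~ N(0, I), the squared norm of the
    #     Hessian-vector product H v is an unbiased estimate of ||H||_F^2.
    v = torch.randn_like(x)
    (hv,) = torch.autograd.grad((grad_x * v).sum(), x, create_graph=True)
    loss = loss + hessian_weight * hv.pow(2).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example usage with random data standing in for a real dataset:
x_batch = torch.rand(32, 1, 28, 28)
y_batch = torch.randint(0, 10, (32,))
print(training_step(x_batch, y_batch))
```

Note how the techniques interact: with plain ReLU the second derivative of the score vanishes almost everywhere, so the Hessian-vector product would be zero; the smoothed activation is what makes the curvature penalty informative.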
Related papers
- Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting & Beyond [61.18736646013446]
In pursuit of a deeper understanding of deep learning's surprising behaviors, we investigate the utility of a simple yet accurate model of a trained neural network.
Across three case studies, we illustrate how it can be applied to derive new empirical insights on a diverse range of prominent phenomena.
arXiv Detail & Related papers (2024-10-31T22:54:34Z)
- Manipulating Feature Visualizations with Gradient Slingshots [54.31109240020007]
We introduce a novel method for manipulating Feature Visualization (FV) without significantly impacting the model's decision-making process.
We evaluate the effectiveness of our method on several neural network models and demonstrate its capabilities to hide the functionality of arbitrarily chosen neurons.
arXiv Detail & Related papers (2024-01-11T18:57:17Z)
- Adversarial Attacks on the Interpretation of Neuron Activation Maximization [70.5472799454224]
Activation-maximization approaches are used to interpret and analyze trained deep-learning models.
In this work, we consider the concept of an adversary manipulating a model for the purpose of deceiving the interpretation.
arXiv Detail & Related papers (2023-06-12T19:54:33Z)
- Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
- Robust Explanation Constraints for Neural Networks [33.14373978947437]
Post-hoc explanation methods, used with the intent of providing insight into neural networks, are sometimes said to help engender trust in their outputs.
Our training method is the only one able to learn neural networks whose explanations remain robust across all six networks tested.
arXiv Detail & Related papers (2022-12-16T14:40:25Z)
- A Robust Unsupervised Ensemble of Feature-Based Explanations using Restricted Boltzmann Machines [4.821071466968101]
We propose a technique for aggregating the feature attributions of different explanatory algorithms using Restricted Boltzmann Machines (RBMs).
Several challenging experiments on real-world datasets show that the proposed RBM method outperforms popular feature attribution methods and basic ensemble techniques.
arXiv Detail & Related papers (2021-11-14T15:58:21Z)
- Gone Fishing: Neural Active Learning with Fisher Embeddings [55.08537975896764]
There is an increasing need for active learning algorithms that are compatible with deep neural networks.
This article introduces BAIT, a practical, tractable, and high-performing active learning algorithm for neural networks.
arXiv Detail & Related papers (2021-06-17T17:26:31Z)
- Ada-SISE: Adaptive Semantic Input Sampling for Efficient Explanation of Convolutional Neural Networks [26.434705114982584]
We propose an efficient interpretation method for convolutional neural networks.
Experimental results show that the proposed method can reduce execution time by up to 30%.
arXiv Detail & Related papers (2021-02-15T19:10:00Z)
- Feature Purification: How Adversarial Training Performs Robust Deep Learning [66.05472746340142]
We present a principle we call Feature Purification: one cause of the existence of adversarial examples is the accumulation of certain small dense mixtures in a network's hidden weights during training.
We present both experiments on the CIFAR-10 dataset illustrating this principle and a theoretical result proving that, for certain natural classification tasks, training a two-layer neural network with ReLU activation via randomly initialized gradient descent indeed satisfies this principle.
arXiv Detail & Related papers (2020-05-20T16:56:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.