Related papers: Manipulating Feature Visualizations with Gradient Slingshots

Manipulating Feature Visualizations with Gradient Slingshots

URL: http://arxiv.org/abs/2401.06122v2
Date: Wed, 10 Jul 2024 16:08:08 GMT
Title: Manipulating Feature Visualizations with Gradient Slingshots
Authors: Dilyara Bareeva, Marina M. -C. Höhne, Alexander Warnecke, Lukas Pirch, Klaus-Robert Müller, Konrad Rieck, Kirill Bykov,
Abstract summary: We introduce a novel method for manipulating Feature Visualization (FV) without significantly impacting the model's decision-making process. We evaluate the effectiveness of our method on several neural network models and demonstrate its capabilities to hide the functionality of arbitrarily chosen neurons.
Score: 54.31109240020007
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Deep Neural Networks (DNNs) are capable of learning complex and versatile representations, however, the semantic nature of the learned concepts remains unknown. A common method used to explain the concepts learned by DNNs is Feature Visualization (FV), which generates a synthetic input signal that maximally activates a particular neuron in the network. In this paper, we investigate the vulnerability of this approach to adversarial model manipulations and introduce a novel method for manipulating FV without significantly impacting the model's decision-making process. The key distinction of our proposed approach is that it does not alter the model architecture. We evaluate the effectiveness of our method on several neural network models and demonstrate its capabilities to hide the functionality of arbitrarily chosen neurons by masking the original explanations of neurons with chosen target explanations during model auditing.

Related papers

Discovering Chunks in Neural Embeddings for Interpretability [53.80157905839065]
We propose leveraging the principle of chunking to interpret artificial neural population activities. We first demonstrate this concept in recurrent neural networks (RNNs) trained on artificial sequences with imposed regularities. We identify similar recurring embedding states corresponding to concepts in the input, with perturbations to these states activating or inhibiting the associated concepts.
arXiv Detail & Related papers (2025-02-03T20:30:46Z)
Automatic Discovery of Visual Circuits [66.99553804855931]
We explore scalable methods for extracting the subgraph of a vision model's computational graph that underlies recognition of a specific visual concept. We find that our approach extracts circuits that causally affect model output, and that editing these circuits can defend large pretrained models from adversarial attacks.
arXiv Detail & Related papers (2024-04-22T17:00:57Z)
Graph Neural Networks for Learning Equivariant Representations of Neural Networks [55.04145324152541]
We propose to represent neural networks as computational graphs of parameters. Our approach enables a single model to encode neural computational graphs with diverse architectures. We showcase the effectiveness of our method on a wide range of tasks, including classification and editing of implicit neural representations.
arXiv Detail & Related papers (2024-03-18T18:01:01Z)
Automated Natural Language Explanation of Deep Visual Neurons with Large Models [43.178568768100305]
This paper proposes a novel post-hoc framework for generating semantic explanations of neurons with large foundation models. Our framework is designed to be compatible with various model architectures and datasets, automated and scalable neuron interpretation.
arXiv Detail & Related papers (2023-10-16T17:04:51Z)
Concept backpropagation: An Explainable AI approach for visualising learned concepts in neural network models [0.0]
We present an extension to the method of concept detection, named emphconcept backpropagation, which provides a way of analysing how the information representing a given concept is internalised in a given neural network model.
arXiv Detail & Related papers (2023-07-24T08:21:13Z)
Adversarial Attacks on the Interpretation of Neuron Activation Maximization [70.5472799454224]
Activation-maximization approaches are used to interpret and analyze trained deep-learning models. In this work, we consider the concept of an adversary manipulating a model for the purpose of deceiving the interpretation.
arXiv Detail & Related papers (2023-06-12T19:54:33Z)
On Modifying a Neural Network's Perception [3.42658286826597]
We propose a method which allows one to modify what an artificial neural network is perceiving regarding specific human-defined concepts. We test the proposed method on different models, assessing whether the performed manipulations are well interpreted by the models, and analyzing how they react to them.
arXiv Detail & Related papers (2023-03-05T12:09:37Z)
ConCerNet: A Contrastive Learning Based Framework for Automated Conservation Law Discovery and Trustworthy Dynamical System Prediction [82.81767856234956]
This paper proposes a new learning framework named ConCerNet to improve the trustworthiness of the DNN based dynamics modeling. We show that our method consistently outperforms the baseline neural networks in both coordinate error and conservation metrics.
arXiv Detail & Related papers (2023-02-11T21:07:30Z)
EINNs: Epidemiologically-Informed Neural Networks [75.34199997857341]
We introduce a new class of physics-informed neural networks-EINN-crafted for epidemic forecasting. We investigate how to leverage both the theoretical flexibility provided by mechanistic models as well as the data-driven expressability afforded by AI models.
arXiv Detail & Related papers (2022-02-21T18:59:03Z)
Ada-SISE: Adaptive Semantic Input Sampling for Efficient Explanation of Convolutional Neural Networks [26.434705114982584]
We propose an efficient interpretation method for convolutional neural networks. Experimental results show that the proposed method can reduce the execution time up to 30%.
arXiv Detail & Related papers (2021-02-15T19:10:00Z)

This list is automatically generated from the titles and abstracts of the papers in this site.