Manipulating Feature Visualizations with Gradient Slingshots
- URL: http://arxiv.org/abs/2401.06122v3
- Date: Fri, 13 Jun 2025 16:13:55 GMT
- Title: Manipulating Feature Visualizations with Gradient Slingshots
- Authors: Dilyara Bareeva, Marina M. -C. Höhne, Alexander Warnecke, Lukas Pirch, Klaus-Robert Müller, Konrad Rieck, Sebastian Lapuschkin, Kirill Bykov
- Abstract summary: Feature Visualization (FV) is a widely used technique for interpreting the concepts learned by Deep Neural Networks (DNNs). We introduce a novel method, Gradient Slingshots, that enables manipulation of FV without modifying the model architecture or significantly degrading its performance.
- Score: 53.94925202421929
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Feature Visualization (FV) is a widely used technique for interpreting the concepts learned by Deep Neural Networks (DNNs), which synthesizes input patterns that maximally activate a given feature. Despite its popularity, the trustworthiness of FV explanations has received limited attention. In this paper, we introduce a novel method, Gradient Slingshots, that enables manipulation of FV without modifying the model architecture or significantly degrading its performance. By shaping new trajectories in the off-distribution regions of the activation landscape of a feature, we coerce the optimization process to converge in a predefined visualization. We evaluate our approach on several DNN architectures, demonstrating its ability to replace faithful FVs with arbitrary targets. These results expose a critical vulnerability: auditors relying solely on FV may accept entirely fabricated explanations. To mitigate this risk, we propose a straightforward defense and quantitatively demonstrate its effectiveness.
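To make the attacked procedure concrete, below is a minimal sketch assuming PyTorch; the names `get_activation`, `x_target`, and `task_loss` are illustrative placeholders, not identifiers from the paper. The first function performs standard feature visualization by activation maximization; the second is a simplified surrogate of a manipulation objective in the spirit of the abstract (the paper's actual trajectory-shaping loss over off-distribution regions differs in detail).

```python
# Minimal sketch, not the paper's implementation. Assumes PyTorch and a
# user-supplied get_activation(model, x) that returns the scalar activation
# of the feature under inspection (e.g., one neuron's mean activation).
import torch

def feature_visualization(model, get_activation, steps=256, lr=0.05,
                          shape=(1, 3, 224, 224)):
    """Standard FV: synthesize an input that maximally activates a feature."""
    x = torch.randn(shape, requires_grad=True)    # FV starts far off-distribution
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = -get_activation(model, x)          # gradient ascent on the activation
        loss.backward()
        opt.step()
    return x.detach()

def manipulation_objective(model, get_activation, x_target, x_data, y_data,
                           task_loss, alpha=1.0):
    """Simplified surrogate (an assumption, not the paper's exact loss):
    preserve task behavior on real data while making a predefined image
    x_target a strong activation maximum, so FV is pulled toward it."""
    preserve = task_loss(model(x_data), y_data)   # keep on-distribution performance
    coerce = -get_activation(model, x_target)     # make the target highly activating
    return preserve + alpha * coerce
```

Fine-tuning on such a combined objective illustrates the abstract's point: on-distribution behavior can be largely preserved while the FV optimizer, which starts from noise far off the data distribution, is steered toward an attacker-chosen image.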
Related papers
- "Principal Components" Enable A New Language of Images [79.45806370905775]
We introduce a novel visual tokenization framework that embeds a provable PCA-like structure into the latent token space. Our approach achieves state-of-the-art reconstruction performance and enables better interpretability to align with the human vision system.
arXiv Detail & Related papers (2025-03-11T17:59:41Z)
- Discovering Chunks in Neural Embeddings for Interpretability [53.80157905839065]
We propose leveraging the principle of chunking to interpret artificial neural population activities.
We first demonstrate this concept in recurrent neural networks (RNNs) trained on artificial sequences with imposed regularities.
We identify similar recurring embedding states corresponding to concepts in the input, with perturbations to these states activating or inhibiting the associated concepts.
arXiv Detail & Related papers (2025-02-03T20:30:46Z)
- FFHFlow: A Flow-based Variational Approach for Learning Diverse Dexterous Grasps with Shape-Aware Introspection [19.308304984645684]
We introduce a novel model that can generate diverse grasps for a multi-fingered hand. The proposed idea gains superior performance and higher run-time efficiency against strong baselines. We also demonstrate substantial benefits of greater diversity for grasping objects in clutter and a confined workspace in the real world.
arXiv Detail & Related papers (2024-07-21T13:33:08Z)
- Automatic Discovery of Visual Circuits [66.99553804855931]
We explore scalable methods for extracting the subgraph of a vision model's computational graph that underlies recognition of a specific visual concept.
We find that our approach extracts circuits that causally affect model output, and that editing these circuits can defend large pretrained models from adversarial attacks.
arXiv Detail & Related papers (2024-04-22T17:00:57Z)
- Graph Neural Networks for Learning Equivariant Representations of Neural Networks [55.04145324152541]
We propose to represent neural networks as computational graphs of parameters.
Our approach enables a single model to encode neural computational graphs with diverse architectures.
We showcase the effectiveness of our method on a wide range of tasks, including classification and editing of implicit neural representations.
arXiv Detail & Related papers (2024-03-18T18:01:01Z)
- Harnessing Diffusion Models for Visual Perception with Meta Prompts [68.78938846041767]
We propose a simple yet effective scheme to harness a diffusion model for visual perception tasks.
We introduce learnable embeddings (meta prompts) to the pre-trained diffusion models to extract proper features for perception.
Our approach achieves new performance records in depth estimation tasks on NYU depth V2 and KITTI, and in semantic segmentation task on CityScapes.
arXiv Detail & Related papers (2023-12-22T14:40:55Z)
- Automated Natural Language Explanation of Deep Visual Neurons with Large Models [43.178568768100305]
This paper proposes a novel post-hoc framework for generating semantic explanations of neurons with large foundation models.
Our framework is designed to be compatible with various model architectures and datasets, enabling automated and scalable neuron interpretation.
arXiv Detail & Related papers (2023-10-16T17:04:51Z)
- Towards General Visual-Linguistic Face Forgery Detection [95.73987327101143]
Deepfakes are realistic face manipulations that can pose serious threats to security, privacy, and trust.
Existing methods mostly treat this task as binary classification, which uses digital labels or mask signals to train the detection model.
We propose a novel paradigm named Visual-Linguistic Face Forgery Detection(VLFFD), which uses fine-grained sentence-level prompts as the annotation.
arXiv Detail & Related papers (2023-07-31T10:22:33Z)
- Concept backpropagation: An Explainable AI approach for visualising learned concepts in neural network models [0.0]
We present an extension to the method of concept detection, named concept backpropagation, which provides a way of analysing how the information representing a given concept is internalised in a given neural network model.
arXiv Detail & Related papers (2023-07-24T08:21:13Z)
- Improving Viewpoint Robustness for Visual Recognition via Adversarial Training [26.824940629150362]
We propose Viewpoint-Invariant Adversarial Training (VIAT) to improve the viewpoint robustness of image classifiers.
We show that VIAT significantly improves the viewpoint robustness of various image classifiers based on the diversity of adversarial viewpoints generated by GMVFool.
arXiv Detail & Related papers (2023-07-21T12:18:35Z)
- FACADE: A Framework for Adversarial Circuit Anomaly Detection and Evaluation [9.025997629442896]
FACADE is designed for unsupervised mechanistic anomaly detection in deep neural networks.
Our approach seeks to improve model robustness and enhance scalable model oversight, and it demonstrates promising applications in real-world deployment settings.
arXiv Detail & Related papers (2023-07-20T04:00:37Z)
- Adversarial Attacks on the Interpretation of Neuron Activation Maximization [70.5472799454224]
Activation-maximization approaches are used to interpret and analyze trained deep-learning models.
In this work, we consider the concept of an adversary manipulating a model for the purpose of deceiving the interpretation.
arXiv Detail & Related papers (2023-06-12T19:54:33Z)
- VISION DIFFMASK: Faithful Interpretation of Vision Transformers with Differentiable Patch Masking [10.345616883018296]
We propose a post-hoc interpretability method called VISION DIFFMASK.
It uses the activations of the model's hidden layers to predict the relevant parts of the input that contribute to its final predictions.
Our approach uses a gating mechanism to identify the minimal subset of the original input that preserves the predicted distribution over classes.
arXiv Detail & Related papers (2023-04-13T10:49:26Z)
- Studying How to Efficiently and Effectively Guide Models with Explanations [52.498055901649025]
'Model guidance' is the idea of regularizing the models' explanations to ensure that they are "right for the right reasons".
We conduct an in-depth evaluation across various loss functions, attribution methods, models, and 'guidance depths' on the PASCAL VOC 2007 and MS COCO 2014 datasets.
Specifically, we guide the models via bounding box annotations, which are much cheaper to obtain than the commonly used segmentation masks.
arXiv Detail & Related papers (2023-03-21T15:34:50Z)
- On Modifying a Neural Network's Perception [3.42658286826597]
We propose a method which allows one to modify what an artificial neural network is perceiving regarding specific human-defined concepts.
We test the proposed method on different models, assessing whether the performed manipulations are well interpreted by the models, and analyzing how they react to them.
arXiv Detail & Related papers (2023-03-05T12:09:37Z)
- ConCerNet: A Contrastive Learning Based Framework for Automated Conservation Law Discovery and Trustworthy Dynamical System Prediction [82.81767856234956]
This paper proposes a new learning framework named ConCerNet to improve the trustworthiness of DNN-based dynamics modeling.
We show that our method consistently outperforms the baseline neural networks in both coordinate error and conservation metrics.
arXiv Detail & Related papers (2023-02-11T21:07:30Z)
- EINNs: Epidemiologically-Informed Neural Networks [75.34199997857341]
We introduce a new class of physics-informed neural networks, EINNs, crafted for epidemic forecasting.
We investigate how to leverage both the theoretical flexibility provided by mechanistic models as well as the data-driven expressibility afforded by AI models.
arXiv Detail & Related papers (2022-02-21T18:59:03Z)
- Inducing Semantic Grouping of Latent Concepts for Explanations: An Ante-Hoc Approach [18.170504027784183]
We show that exploiting latent concepts and properly modifying different parts of the model can result in better explanations as well as superior predictive performance.
We also propose using two different self-supervision techniques to extract meaningful concepts related to the type of self-supervision considered.
arXiv Detail & Related papers (2021-08-25T07:09:57Z)
- Ada-SISE: Adaptive Semantic Input Sampling for Efficient Explanation of Convolutional Neural Networks [26.434705114982584]
We propose an efficient interpretation method for convolutional neural networks.
Experimental results show that the proposed method can reduce the execution time up to 30%.
arXiv Detail & Related papers (2021-02-15T19:10:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.