Related papers: Automatic Discovery of Visual Circuits

Automatic Discovery of Visual Circuits

URL: http://arxiv.org/abs/2404.14349v1
Date: Mon, 22 Apr 2024 17:00:57 GMT
Title: Automatic Discovery of Visual Circuits
Authors: Achyuta Rajaram, Neil Chowdhury, Antonio Torralba, Jacob Andreas, Sarah Schwettmann,
Abstract summary: We explore scalable methods for extracting the subgraph of a vision model's computational graph that underlies recognition of a specific visual concept. We find that our approach extracts circuits that causally affect model output, and that editing these circuits can defend large pretrained models from adversarial attacks.
Score: 66.99553804855931
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: To date, most discoveries of network subcomponents that implement human-interpretable computations in deep vision models have involved close study of single units and large amounts of human labor. We explore scalable methods for extracting the subgraph of a vision model's computational graph that underlies recognition of a specific visual concept. We introduce a new method for identifying these subgraphs: specifying a visual concept using a few examples, and then tracing the interdependence of neuron activations across layers, or their functional connectivity. We find that our approach extracts circuits that causally affect model output, and that editing these circuits can defend large pretrained models from adversarial attacks.

Related papers

Deep Learning Through A Telescoping Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting & Beyond [61.18736646013446]
In pursuit of a deeper understanding of its surprising behaviors, we investigate the utility of a simple yet accurate model of a trained neural network. Across three case studies, we illustrate how it can be applied to derive new empirical insights on a diverse range of prominent phenomena.
arXiv Detail & Related papers (2024-10-31T22:54:34Z)
Efficient Visualization of Neural Networks with Generative Models and Adversarial Perturbations [0.0]
This paper presents a novel approach for deep visualization via a generative network, offering an improvement over existing methods. Our model simplifies the architecture by reducing the number of networks used, requiring only a generator and a discriminator. Our model requires less prior training knowledge and uses a non-adversarial training process, where the discriminator acts as a guide.
arXiv Detail & Related papers (2024-09-20T14:59:25Z)
Interactive dense pixel visualizations for time series and model attribution explanations [8.24039921933289]
DAVOTS is an interactive visual analytics approach to explore raw time series data, activations of neural networks, and attributions in a dense-pixel visualization. We apply clustering approaches to the visualized data domains to highlight groups and present ordering strategies for individual and combined data exploration.
arXiv Detail & Related papers (2024-08-27T14:02:21Z)
Manipulating Feature Visualizations with Gradient Slingshots [54.31109240020007]
We introduce a novel method for manipulating Feature Visualization (FV) without significantly impacting the model's decision-making process. We evaluate the effectiveness of our method on several neural network models and demonstrate its capabilities to hide the functionality of arbitrarily chosen neurons.
arXiv Detail & Related papers (2024-01-11T18:57:17Z)
Sample-Efficient Learning of Novel Visual Concepts [7.398195748292981]
State-of-the-art deep learning models struggle to recognize novel objects in a few-shot setting. We show that incorporating a symbolic knowledge graph into a state-of-the-art recognition model enables a new approach for effective few-shot classification.
arXiv Detail & Related papers (2023-06-15T20:24:30Z)
FuNNscope: Visual microscope for interactively exploring the loss landscape of fully connected neural networks [77.34726150561087]
We show how to explore high-dimensional landscape characteristics of neural networks. We generalize observations on small neural networks to more complex systems. An interactive dashboard opens up a number of possible application networks.
arXiv Detail & Related papers (2022-04-09T16:41:53Z)
Explainable Adversarial Attacks in Deep Neural Networks Using Activation Profiles [69.9674326582747]
This paper presents a visual framework to investigate neural network models subjected to adversarial examples. We show how observing these elements can quickly pinpoint exploited areas in a model.
arXiv Detail & Related papers (2021-03-18T13:04:21Z)
Anomaly Detection on Attributed Networks via Contrastive Self-Supervised Learning [50.24174211654775]
We present a novel contrastive self-supervised learning framework for anomaly detection on attributed networks. Our framework fully exploits the local information from network data by sampling a novel type of contrastive instance pair. A graph neural network-based contrastive learning model is proposed to learn informative embedding from high-dimensional attributes and local structure.
arXiv Detail & Related papers (2021-02-27T03:17:20Z)

This list is automatically generated from the titles and abstracts of the papers in this site.