Combining Causal Models for More Accurate Abstractions of Neural Networks
- URL: http://arxiv.org/abs/2503.11429v1
- Date: Fri, 14 Mar 2025 14:14:43 GMT
- Title: Combining Causal Models for More Accurate Abstractions of Neural Networks
- Authors: Theodora-Mara Pîslar, Sara Magliacane, Atticus Geiger
- Abstract summary: Causal abstraction provides a precise notion of when a network implements an algorithm. A typical problem in practical settings is that the algorithm is not an entirely faithful abstraction of the network. We propose a solution where we combine different simple high-level models to produce a more faithful representation of the network.
- Score: 10.115827125021438
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mechanistic interpretability aims to reverse engineer neural networks by uncovering which high-level algorithms they implement. Causal abstraction provides a precise notion of when a network implements an algorithm, i.e., a causal model of the network contains low-level features that realize the high-level variables in a causal model of the algorithm. A typical problem in practical settings is that the algorithm is not an entirely faithful abstraction of the network, meaning it only partially captures the true reasoning process of the model. We propose a solution in which we combine different simple high-level models to produce a more faithful representation of the network. By learning this combination, we can model neural networks as being in different computational states depending on the input provided, which we show gives a more accurate account of GPT-2 small fine-tuned on two toy tasks. We observe a trade-off between the strength of an interpretability hypothesis, which we define in terms of the number of inputs explained by the high-level models, and its faithfulness, which we define as the interchange intervention accuracy. Our method allows us to modulate between the two, providing the most accurate combination of models that describes the behavior of a neural network at a given faithfulness level.
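The abstract's faithfulness metric, interchange intervention accuracy (IIA), can be made concrete with a self-contained toy. Everything below (the task, both models, and the alignment of hidden unit h with high-level variable S) is made up for illustration; it is not the paper's code.

```python
import itertools

def high_level(x, y, s_override=None):
    """High-level causal model: intermediate S = x + y, output = (S + x) mod 2.
    `s_override` intervenes on the variable S."""
    s = x + y if s_override is None else s_override
    return (s + x) % 2

def low_level(x, y, h_override=None):
    """Stand-in 'network': hidden unit h happens to realize S = x + y.
    `h_override` patches the hidden activation."""
    h = x + y if h_override is None else h_override
    return (h + x) % 2

def iia(inputs):
    """Fraction of (base, source) pairs on which patching h with its
    source-run value matches intervening on S in the high-level model."""
    pairs = list(itertools.product(inputs, repeat=2))
    hits = 0
    for (bx, by), (sx, sy) in pairs:
        patched_low = low_level(bx, by, h_override=sx + sy)    # activation patch
        patched_high = high_level(bx, by, s_override=sx + sy)  # causal intervention
        hits += int(patched_low == patched_high)
    return hits / len(pairs)

print(iia([(0, 1), (2, 3), (4, 4)]))  # 1.0: this alignment is perfectly faithful
```

Combining high-level models, in the abstract's sense, then roughly amounts to assigning each input to the simple model that best explains it, trading strength (how many inputs are covered) against faithfulness (IIA).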
Related papers
- Towards Scalable and Versatile Weight Space Learning [51.78426981947659]
This paper introduces the SANE approach to weight-space learning.
Our method extends the idea of hyper-representations towards sequential processing of subsets of neural network weights.
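As a rough sketch of what sequential processing of weight subsets could look like (the chunking, the recurrent encoder, and all names here are assumptions for illustration, not SANE's actual architecture):

```python
import torch
import torch.nn as nn

class WeightSequenceEncoder(nn.Module):
    """Embed a network by chunking its flattened weights and encoding the
    chunks sequentially (a stand-in for hyper-representation-style encoders)."""
    def __init__(self, chunk_size=64, d_model=128):
        super().__init__()
        self.chunk_size = chunk_size
        self.embed = nn.Linear(chunk_size, d_model)       # one chunk -> one token
        self.encoder = nn.GRU(d_model, d_model, batch_first=True)

    def forward(self, flat_weights):
        pad = (-flat_weights.numel()) % self.chunk_size   # pad to whole chunks
        w = torch.cat([flat_weights, flat_weights.new_zeros(pad)])
        tokens = self.embed(w.view(1, -1, self.chunk_size))  # (1, num_chunks, d_model)
        _, h = self.encoder(tokens)
        return h[-1]                                      # summary embedding of the network

# Usage: embed the weights of an arbitrary small network.
net = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
flat = torch.cat([p.detach().flatten() for p in net.parameters()])
print(WeightSequenceEncoder()(flat).shape)  # torch.Size([1, 128])
```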
arXiv Detail & Related papers (2024-06-14T13:12:07Z)
- Graph Neural Networks for Learning Equivariant Representations of Neural Networks [55.04145324152541]
We propose to represent neural networks as computational graphs of parameters.
Our approach enables a single model to encode neural computational graphs with diverse architectures.
We showcase the effectiveness of our method on a wide range of tasks, including classification and editing of implicit neural representations.
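A minimal sketch of the parameters-as-graph view, assuming a plain MLP: nodes are neurons carrying biases, directed edges carry weights, and the result can feed any graph neural network. The encoding details are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

def mlp_to_graph(mlp):
    """Return (edge_index, edge_weight, node_bias) for a Sequential of Linears:
    one node per neuron, one directed edge per weight."""
    linears = [m for m in mlp if isinstance(m, nn.Linear)]
    sizes = [linears[0].in_features] + [l.out_features for l in linears]
    offsets = [0]                                   # first node id of each layer
    for s in sizes:
        offsets.append(offsets[-1] + s)
    edges, weights = [], []
    bias = torch.zeros(offsets[-1])                 # node feature: bias (0 for inputs)
    for k, lin in enumerate(linears):
        for j in range(lin.out_features):           # target neuron
            bias[offsets[k + 1] + j] = float(lin.bias[j])
            for i in range(lin.in_features):        # source neuron
                edges.append((offsets[k] + i, offsets[k + 1] + j))
                weights.append(float(lin.weight[j, i]))
    return torch.tensor(edges).T, torch.tensor(weights), bias

edge_index, edge_weight, node_bias = mlp_to_graph(
    nn.Sequential(nn.Linear(3, 4), nn.ReLU(), nn.Linear(4, 2)))
print(edge_index.shape, edge_weight.shape, node_bias.shape)  # (2, 20), (20,), (9,)
```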
arXiv Detail & Related papers (2024-03-18T18:01:01Z)
- Residual Multi-Fidelity Neural Network Computing [0.0]
We consider the general problem of constructing a neural network surrogate model using multi-fidelity information.
Motivated by error-complexity estimates for ReLU neural networks, we formulate the correlation between an inexpensive low-fidelity model and an expensive high-fidelity model.
We present four numerical examples to demonstrate the power of the proposed framework.
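A minimal sketch of the residual multi-fidelity idea, under assumed toy models: fit a small network to the discrepancy between a cheap low-fidelity model and the expensive high-fidelity one, then use their sum as the surrogate.

```python
import torch
import torch.nn as nn

high_fidelity = lambda x: torch.sin(3 * x) + 0.1 * x ** 2   # "expensive" model
low_fidelity = lambda x: torch.sin(3 * x)                   # cheap approximation

residual_net = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(residual_net.parameters(), lr=1e-2)

x = torch.linspace(-2, 2, 128).unsqueeze(1)
target = high_fidelity(x) - low_fidelity(x)    # the residual is smooth and cheap to learn
for _ in range(500):
    opt.zero_grad()
    loss = ((residual_net(x) - target) ** 2).mean()
    loss.backward()
    opt.step()

def surrogate(x):
    """Multi-fidelity surrogate: cheap model plus learned correction."""
    return low_fidelity(x) + residual_net(x)

x_test = torch.tensor([[0.5]])
print(surrogate(x_test).item(), high_fidelity(x_test).item())  # close to each other
```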
arXiv Detail & Related papers (2023-10-05T14:43:16Z)
- Interpretability of an Interaction Network for identifying $H \rightarrow b\bar{b}$ jets [4.553120911976256]
In recent times, AI models based on deep neural networks have become increasingly popular for many high energy physics applications.
We explore interpretability of AI models by examining an Interaction Network (IN) model designed to identify boosted $H \rightarrow b\bar{b}$ jets.
We additionally illustrate the activity of hidden layers within the IN model as Neural Activation Pattern (NAP) diagrams.
arXiv Detail & Related papers (2022-11-23T08:38:52Z)
- NAR-Former: Neural Architecture Representation Learning towards Holistic Attributes Prediction [37.357949900603295]
We propose a neural architecture representation model that can be used to estimate attributes holistically.
Experiment results show that our proposed framework can be used to predict the latency and accuracy attributes of both cell architectures and whole deep neural networks.
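A hedged sketch of the general setup, not NAR-Former's actual architecture: encode an architecture as a sequence of operation tokens and attach one regression head per attribute, so latency and accuracy are predicted from a shared representation.

```python
import torch
import torch.nn as nn

class ArchAttributePredictor(nn.Module):
    """Shared encoder over operation tokens with one head per attribute."""
    def __init__(self, n_ops=16, d_model=64):
        super().__init__()
        self.embed = nn.Embedding(n_ops, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.latency_head = nn.Linear(d_model, 1)
        self.accuracy_head = nn.Linear(d_model, 1)

    def forward(self, op_tokens):                            # (batch, seq_len) ints
        h = self.encoder(self.embed(op_tokens)).mean(dim=1)  # pooled architecture code
        return self.latency_head(h), self.accuracy_head(h)

arch = torch.randint(0, 16, (2, 12))          # two toy architectures, 12 ops each
latency, accuracy = ArchAttributePredictor()(arch)
print(latency.shape, accuracy.shape)          # torch.Size([2, 1]) each
```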
arXiv Detail & Related papers (2022-11-15T10:15:21Z)
- Robust Training and Verification of Implicit Neural Networks: A Non-Euclidean Contractive Approach [64.23331120621118]
This paper proposes a theoretical and computational framework for training and robustness verification of implicit neural networks.
We introduce a related embedded network and show that the embedded network can be used to provide an $\ell_\infty$-norm box over-approximation of the reachable sets of the original network.
We apply our algorithms to train implicit neural networks on the MNIST dataset and compare the robustness of our models with the models trained via existing approaches in the literature.
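The paper's embedded-network construction is specific to implicit networks; as a generic illustration of an $\ell_\infty$-norm box over-approximation, here is standard interval bound propagation through one affine-plus-ReLU layer.

```python
import torch

def affine_relu_box(W, b, lower, upper):
    """Propagate the box [lower, upper] through ReLU(Wx + b). Splitting
    W into positive and negative parts pairs each output bound with the
    input bound that extremizes it; ReLU preserves bounds by monotonicity."""
    W_pos, W_neg = W.clamp(min=0), W.clamp(max=0)
    out_lower = W_pos @ lower + W_neg @ upper + b
    out_upper = W_pos @ upper + W_neg @ lower + b
    return out_lower.clamp(min=0), out_upper.clamp(min=0)

W = torch.tensor([[1.0, -2.0], [0.5, 1.0]])
b = torch.tensor([0.0, -1.0])
lo, hi = affine_relu_box(W, b, torch.tensor([-0.1, -0.1]), torch.tensor([0.1, 0.1]))
print(lo, hi)  # every reachable output of this layer lies inside [lo, hi]
```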
arXiv Detail & Related papers (2022-08-08T03:13:24Z)
- Cardinality-Minimal Explanations for Monotonic Neural Networks [25.212444848632515]
In this paper, we investigate whether tractability can be regained by focusing on neural models implementing a monotonic function.
Although the relevant decision problems remain intractable in the worst case, we show that they admit algorithms with favourable running time in this setting.
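One reason monotonicity helps, sketched under the assumption of a monotonically increasing classifier with a positive prediction: a feature subset is sufficient iff the prediction survives pushing every free feature to its minimum, so each sufficiency check needs one evaluation rather than a search. The brute-force minimality loop below is only for illustration.

```python
from itertools import combinations

def monotone_clf(x):
    """Toy monotonically increasing classifier."""
    return 2 * x[0] + x[1] + 3 * x[2] >= 4

def sufficient(x, subset, lo=0):
    """Is fixing the features in `subset` to their values in x enough to
    force a positive prediction? By monotonicity it suffices to check the
    single worst case with all free features at their minimum."""
    worst = [x[i] if i in subset else lo for i in range(len(x))]
    return monotone_clf(worst)

def minimal_explanation(x):
    """Smallest feature subset whose values alone force the positive class."""
    for size in range(len(x) + 1):
        for subset in combinations(range(len(x)), size):
            if sufficient(x, set(subset)):
                return set(subset)

print(minimal_explanation([1, 1, 1]))  # {0, 2}: fixing x0 and x2 already suffices
```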
arXiv Detail & Related papers (2022-05-19T23:47:25Z)
- Characterizing and overcoming the greedy nature of learning in multi-modal deep neural networks [62.48782506095565]
We show that due to the greedy nature of learning in deep neural networks, models tend to rely on just one modality while under-fitting the other modalities.
We propose an algorithm to balance the conditional learning speeds between modalities during training and demonstrate that it indeed addresses the issue of greedy learning.
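A hedged sketch of the general recipe (not necessarily the paper's exact algorithm): track how fast each modality's loss is improving and down-weight the faster learner so the slower modality is not starved.

```python
def modality_weights(loss_hist_a, loss_hist_b):
    """Give the modality with the higher recent improvement rate the
    smaller loss weight; the two weights sum to one."""
    rate_a = (loss_hist_a[0] - loss_hist_a[-1]) / max(loss_hist_a[0], 1e-8)
    rate_b = (loss_hist_b[0] - loss_hist_b[-1]) / max(loss_hist_b[0], 1e-8)
    total = rate_a + rate_b + 1e-8
    return 1 - rate_a / total, 1 - rate_b / total

# The fast-improving modality (a) is down-weighted relative to the slow one (b).
w_a, w_b = modality_weights([2.0, 1.0, 0.4], [2.0, 1.9, 1.8])
print(round(w_a, 3), round(w_b, 3))  # 0.111 0.889
# In a training loop one would then use: loss = w_a * loss_a + w_b * loss_b.
```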
arXiv Detail & Related papers (2022-02-10T20:11:21Z)
- Robust Generalization of Quadratic Neural Networks via Function Identification [19.87036824512198]
Generalization bounds from learning theory often assume that the test distribution is close to the training distribution.
We show that for quadratic neural networks, we can identify the function represented by the model even though we cannot identify its parameters.
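A small worked example of the function-versus-parameter distinction for quadratic networks: f(x) = sum_i a_i (w_i . x)^2 equals x^T M x with M = sum_i a_i w_i w_i^T, so distinct parameter sets with the same M compute the same function. The numbers below are illustrative.

```python
import numpy as np

def quad_net(a, W, x):
    """One hidden layer of squared units: f(x) = sum_i a_i * (w_i . x)^2."""
    return sum(ai * (wi @ x) ** 2 for ai, wi in zip(a, W))

a1, W1 = [1.0, 1.0], np.array([[1.0, 0.0], [0.0, 1.0]])   # axis-aligned units
r = 1 / np.sqrt(2)
a2, W2 = [1.0, 1.0], np.array([[r, r], [r, -r]])          # rotated units

x = np.array([0.7, -1.3])
print(quad_net(a1, W1, x), quad_net(a2, W2, x))           # equal: same function

M1 = sum(ai * np.outer(wi, wi) for ai, wi in zip(a1, W1))
M2 = sum(ai * np.outer(wi, wi) for ai, wi in zip(a2, W2))
print(np.allclose(M1, M2))  # True: the parameters differ, the matrix M does not
```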
arXiv Detail & Related papers (2021-09-22T18:02:00Z)
- Mitigating Performance Saturation in Neural Marked Point Processes: Architectures and Loss Functions [50.674773358075015]
We propose a simple graph-based network structure called GCHP, which utilizes only graph convolutional layers.
We show that GCHP significantly reduces training time, and that a likelihood-ratio loss with interarrival-time probability assumptions greatly improves model performance.
arXiv Detail & Related papers (2021-07-07T16:59:14Z)
- Firearm Detection via Convolutional Neural Networks: Comparing a Semantic Segmentation Model Against End-to-End Solutions [68.8204255655161]
Threat detection of weapons and aggressive behavior from live video can be used for rapid detection and prevention of potentially deadly incidents.
One way for achieving this is through the use of artificial intelligence and, in particular, machine learning for image analysis.
We compare a traditional monolithic end-to-end deep learning model with a previously proposed model based on an ensemble of simpler neural networks that detect firearms via semantic segmentation.
arXiv Detail & Related papers (2020-12-17T15:19:29Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the generated content (including all information) and is not responsible for any consequences of its use.