Understanding Task Representations in Neural Networks via Bayesian Ablation
- URL: http://arxiv.org/abs/2505.13742v1
- Date: Mon, 19 May 2025 21:36:09 GMT
- Title: Understanding Task Representations in Neural Networks via Bayesian Ablation
- Authors: Andrew Nam, Declan Campbell, Thomas Griffiths, Jonathan Cohen, Sarah-Jane Leslie
- Abstract summary: We introduce a novel probabilistic framework for interpreting latent task representations in neural networks. Inspired by Bayesian inference, our approach defines a distribution over representational units to infer their causal contributions to task performance.
- Score: 1.3980986259786223
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural networks are powerful tools for cognitive modeling due to their flexibility and emergent properties. However, interpreting their learned representations remains challenging due to their sub-symbolic semantics. In this work, we introduce a novel probabilistic framework for interpreting latent task representations in neural networks. Inspired by Bayesian inference, our approach defines a distribution over representational units to infer their causal contributions to task performance. Using ideas from information theory, we propose a suite of tools and metrics to illuminate key model properties, including representational distributedness, manifold complexity, and polysemanticity.
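The abstract does not include code, but the core idea lends itself to a short illustration. Below is a minimal sketch, not the authors' implementation: it assumes a toy pre-trained two-layer classifier, Bernoulli "keep" gates over its hidden units relaxed with a Concrete/Gumbel-Sigmoid trick so the gate probabilities can be fit by gradient descent, and a normalized-entropy score standing in for the paper's distributedness metric. All of these specifics (model, relaxation, metric) are assumptions; the paper's actual estimator and metrics may differ.

```python
# Sketch of Bayesian ablation: fit a distribution over "keep" gates on hidden
# units so the masked network still solves the task, then read off which units
# the inferred distribution deems causally necessary.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
hidden_dim = 32

# Placeholder "trained" model and task data (stand-ins for a real task).
model = nn.Sequential(nn.Linear(10, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, 2))
for prm in model.parameters():
    prm.requires_grad_(False)
x = torch.randn(256, 10)
y = (x.sum(dim=1) > 0).long()

def masked_logits(mask):
    """Forward pass with each hidden unit scaled by its gate value."""
    h = F.relu(model[0](x)) * mask
    return model[2](h)

# Variational parameters: one keep-probability (as a logit) per hidden unit.
logit_p = torch.zeros(hidden_dim, requires_grad=True)
opt = torch.optim.Adam([logit_p], lr=0.05)

for step in range(200):
    u = torch.rand(hidden_dim).clamp(1e-6, 1 - 1e-6)
    # Relaxed Bernoulli sample: a differentiable surrogate for a hard ablation mask.
    mask = torch.sigmoid((u.log() - (1 - u).log() + logit_p) / 0.5)
    task_loss = F.cross_entropy(masked_logits(mask), y)
    sparsity = torch.sigmoid(logit_p).mean()  # prior pressure toward ablating units
    (task_loss + 0.1 * sparsity).backward()
    opt.step()
    opt.zero_grad()

keep_prob = torch.sigmoid(logit_p).detach()
# Units with high keep-probability are inferred to contribute causally to the task.
print("keep probabilities:", keep_prob.round(decimals=2))

# Normalized entropy of the keep-probabilities as a rough distributedness score
# (near 0: one unit carries the task; near 1: contribution is spread uniformly).
q = keep_prob / keep_prob.sum()
distributedness = -(q * q.clamp_min(1e-9).log()).sum() / torch.log(torch.tensor(float(hidden_dim)))
print("distributedness:", float(distributedness))
```

The Concrete relaxation is just one convenient way to make the Bernoulli ablation mask differentiable; the paper may instead estimate contributions from Monte Carlo samples of hard masks or a different posterior over units.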
Related papers
- Concept-Guided Interpretability via Neural Chunking [54.73787666584143]
We show that neural networks exhibit patterns in their raw population activity that mirror regularities in the training data. We propose three methods to extract these emerging entities, complementing each other based on label availability and dimensionality. Our work points to a new direction for interpretability, one that harnesses both cognitive principles and the structure of naturalistic data.
arXiv Detail & Related papers (2025-05-16T13:49:43Z) - A Comprehensive Survey on Self-Interpretable Neural Networks [36.0575431131253]
Self-interpretable neural networks inherently reveal the prediction rationale through the model structures. We first collect and review existing works on self-interpretable neural networks and provide a structured summary of their methodologies. We also present concrete, visualized examples of model explanations and discuss their applicability across diverse scenarios.
arXiv Detail & Related papers (2025-01-26T18:50:16Z) - Identifying Sub-networks in Neural Networks via Functionally Similar Representations [41.028797971427124]
We take a step toward automating the understanding of the network by investigating the existence of distinct sub-networks. Specifically, we explore a novel automated and task-agnostic approach based on the notion of functionally similar representations within neural networks. We show that the proposed approach offers meaningful insights into the behavior of neural networks with minimal human and computational cost.
arXiv Detail & Related papers (2024-10-21T20:19:00Z) - Coding schemes in neural networks learning classification tasks [52.22978725954347]
We investigate fully-connected, wide neural networks learning classification tasks.
We show that the networks acquire strong, data-dependent features.
Surprisingly, the nature of the internal representations depends crucially on the neuronal nonlinearity.
arXiv Detail & Related papers (2024-06-24T14:50:05Z) - Semantic Loss Functions for Neuro-Symbolic Structured Prediction [74.18322585177832]
We discuss the semantic loss, which injects knowledge about the output structure, defined symbolically, into training.
It is agnostic to the arrangement of the symbols, and depends only on the semantics expressed thereby.
It can be combined with both discriminative and generative neural models.
arXiv Detail & Related papers (2024-05-12T22:18:25Z) - Feature emergence via margin maximization: case studies in algebraic tasks [4.401622714202886]
We show that trained neural networks employ features corresponding to irreducible group-theoretic representations to perform compositions in general groups.
More generally, we hope our techniques can help to foster a deeper understanding of why neural networks adopt specific computational strategies.
arXiv Detail & Related papers (2023-11-13T18:56:33Z) - Discrete, compositional, and symbolic representations through attractor dynamics [51.20712945239422]
We introduce a novel neural systems model that integrates attractor dynamics with symbolic representations to model cognitive processes akin to the probabilistic language of thought (PLoT).
Our model segments the continuous representational space into discrete basins, with attractor states corresponding to symbolic sequences that reflect the semanticity and compositionality characteristic of symbolic systems, learned through unsupervised learning rather than relying on pre-defined primitives.
This approach establishes a unified framework that integrates both symbolic and sub-symbolic processing through neural dynamics, a neuroplausible substrate with proven expressivity in AI, offering a more comprehensive model that mirrors the complex duality of cognitive operations.
arXiv Detail & Related papers (2023-10-03T05:40:56Z) - Sparse Relational Reasoning with Object-Centric Representations [78.83747601814669]
We investigate the composability of soft-rules learned by relational neural architectures when operating over object-centric representations.
We find that increasing sparsity, especially on features, improves the performance of some models and leads to simpler relations.
arXiv Detail & Related papers (2022-07-15T14:57:33Z) - Dynamic Inference with Neural Interpreters [72.90231306252007]
We present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules.
Inputs to the model are routed through a sequence of functions in a way that is learned end-to-end.
We show that Neural Interpreters perform on par with the vision transformer while using fewer parameters, and transfer to new tasks in a sample-efficient manner.
arXiv Detail & Related papers (2021-10-12T23:22:45Z) - Making Sense of CNNs: Interpreting Deep Representations & Their Invariances with INNs [19.398202091883366]
We present an approach based on INNs that (i) recovers the task-specific, learned invariances by disentangling the remaining factors of variation in the data, and (ii) invertibly transforms these invariances, combined with the model representation, into an equally expressive representation with accessible semantic concepts.
Our invertible approach significantly extends the ability to understand black-box models by enabling post-hoc interpretations of state-of-the-art networks without compromising their performance.
arXiv Detail & Related papers (2020-08-04T19:27:46Z)