Studying Small Language Models with Susceptibilities
- URL: http://arxiv.org/abs/2504.18274v1
- Date: Fri, 25 Apr 2025 11:39:32 GMT
- Title: Studying Small Language Models with Susceptibilities
- Authors: Garrett Baker, George Wang, Jesse Hoogland, Daniel Murfet,
- Abstract summary: We develop a framework for interpretability that treats a neural network as a Bayesian statistical mechanical system.<n>A small, controlled perturbation of the data distribution induces a first-order change in the posterior expectation of an observable localized on a chosen component of the network.<n>The resulting susceptibility can be estimated efficiently with local SGLD samples and factorizes into signed, per-token contributions.
- Score: 0.5242869847419834
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We develop a linear response framework for interpretability that treats a neural network as a Bayesian statistical mechanical system. A small, controlled perturbation of the data distribution, for example shifting the Pile toward GitHub or legal text, induces a first-order change in the posterior expectation of an observable localized on a chosen component of the network. The resulting susceptibility can be estimated efficiently with local SGLD samples and factorizes into signed, per-token contributions that serve as attribution scores. Building a set of perturbations (probes) yields a response matrix whose low-rank structure separates functional modules such as multigram and induction heads in a 3M-parameter transformer. Susceptibilities link local learning coefficients from singular learning theory with linear-response theory, and quantify how local loss landscape geometry deforms under shifts in the data distribution.
Related papers
- Deep learning with missing data [3.829599191332801]
We propose Pattern Embedded Neural Networks (PENNs), which can be applied in conjunction with any existing imputation technique.<n>In addition to a neural network trained on the imputed data, PENNs pass the vectors of observation indicators through a second neural network to provide a compact representation.<n>The outputs are then combined in a third neural network to produce final predictions.
arXiv Detail & Related papers (2025-04-21T18:57:36Z) - Block-local learning with probabilistic latent representations [2.839567756494814]
Locking and weight transport are problems because they prevent efficient parallelization and horizontal scaling of the training process.
We propose a new method to address both these problems and scale up the training of large models.
We present results on a variety of tasks and architectures, demonstrating state-of-the-art performance using block-local learning.
arXiv Detail & Related papers (2023-05-24T10:11:30Z) - Neural networks trained with SGD learn distributions of increasing
complexity [78.30235086565388]
We show that neural networks trained using gradient descent initially classify their inputs using lower-order input statistics.
We then exploit higher-order statistics only later during training.
We discuss the relation of DSB to other simplicity biases and consider its implications for the principle of universality in learning.
arXiv Detail & Related papers (2022-11-21T15:27:22Z) - Learning to Learn with Generative Models of Neural Network Checkpoints [71.06722933442956]
We construct a dataset of neural network checkpoints and train a generative model on the parameters.
We find that our approach successfully generates parameters for a wide range of loss prompts.
We apply our method to different neural network architectures and tasks in supervised and reinforcement learning.
arXiv Detail & Related papers (2022-09-26T17:59:58Z) - Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z) - Slope and generalization properties of neural networks [0.0]
We show that the distribution of the slope of a well-trained neural network classifier is generally independent of the width of the layers in a fully connected network.
The slope is of similar size throughout the relevant volume, and varies smoothly. It also behaves as predicted in rescaling examples.
We discuss possible applications of the slope concept, such as using it as a part of the loss function or stopping criterion during network training, or ranking data sets in terms of their complexity.
arXiv Detail & Related papers (2021-07-03T17:54:27Z) - Prequential MDL for Causal Structure Learning with Neural Networks [9.669269791955012]
We show that the prequential minimum description length principle can be used to derive a practical scoring function for Bayesian networks.
We obtain plausible and parsimonious graph structures without relying on sparsity inducing priors or other regularizers which must be tuned.
We discuss how the the prequential score relates to recent work that infers causal structure from the speed of adaptation when the observations come from a source undergoing distributional shift.
arXiv Detail & Related papers (2021-07-02T22:35:21Z) - FF-NSL: Feed-Forward Neural-Symbolic Learner [70.978007919101]
This paper introduces a neural-symbolic learning framework, called Feed-Forward Neural-Symbolic Learner (FF-NSL)
FF-NSL integrates state-of-the-art ILP systems based on the Answer Set semantics, with neural networks, in order to learn interpretable hypotheses from labelled unstructured data.
arXiv Detail & Related papers (2021-06-24T15:38:34Z) - Locally Sparse Networks for Interpretable Predictions [7.362415721170984]
We propose a framework for training locally sparse neural networks where the local sparsity is learned via a sample-specific gating mechanism.
The sample-specific sparsity is predicted via a textitgating network, which is trained in tandem with the textitprediction network.
We demonstrate that our method outperforms state-of-the-art models when predicting the target function with far fewer features per instance.
arXiv Detail & Related papers (2021-06-11T15:46:50Z) - Bayesian Transformer Language Models for Speech Recognition [59.235405107295655]
State-of-the-art neural language models (LMs) represented by Transformers are highly complex.
This paper proposes a full Bayesian learning framework for Transformer LM estimation.
arXiv Detail & Related papers (2021-02-09T10:55:27Z) - A simple normative network approximates local non-Hebbian learning in
the cortex [12.940770779756482]
Neuroscience experiments demonstrate that the processing of sensory inputs by cortical neurons is modulated by instructive signals.
Here, adopting a normative approach, we model these instructive signals as supervisory inputs guiding the projection of the feedforward data.
Online algorithms can be implemented by neural networks whose synaptic learning rules resemble calcium plateau potential dependent plasticity observed in the cortex.
arXiv Detail & Related papers (2020-10-23T20:49:44Z) - Learning What Makes a Difference from Counterfactual Examples and
Gradient Supervision [57.14468881854616]
We propose an auxiliary training objective that improves the generalization capabilities of neural networks.
We use pairs of minimally-different examples with different labels, a.k.a counterfactual or contrasting examples, which provide a signal indicative of the underlying causal structure of the task.
Models trained with this technique demonstrate improved performance on out-of-distribution test sets.
arXiv Detail & Related papers (2020-04-20T02:47:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.