Testing the spin-bath view of self-attention: A Hamiltonian analysis of GPT-2 Transformer
- URL: http://arxiv.org/abs/2507.00683v5
- Date: Tue, 29 Jul 2025 03:58:24 GMT
- Title: Testing the spin-bath view of self-attention: A Hamiltonian analysis of GPT-2 Transformer
- Authors: Satadeep Bhattacharjee, Seung-Cheol Lee
- Abstract summary: We study the attention mechanism of Large Language Models (LLMs) as an interacting two-body spin system. We derive the corresponding effective Hamiltonian for every attention head from a production-grade GPT-2 model. Our findings provide the first strong empirical evidence for the spin-bath analogy in a production-grade model.
- Score: 1.691971345435238
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The recently proposed physics-based framework by Huo and Johnson~\cite{huo2024capturing} models the attention mechanism of Large Language Models (LLMs) as an interacting two-body spin system, offering a first-principles explanation for phenomena like repetition and bias. Building on this hypothesis, we extract the complete Query-Key weight matrices from a production-grade GPT-2 model and derive the corresponding effective Hamiltonian for every attention head. From these Hamiltonians, we obtain analytic phase boundaries and logit gap criteria that predict which token should dominate the next-token distribution for a given context. A systematic evaluation of 144 heads across 20 factual-recall prompts reveals a strong negative correlation between the theoretical logit gaps and the model's empirical token rankings ($r\approx-0.70$, $p<10^{-3}$). Targeted ablations further show that suppressing the heads most aligned with the spin-bath predictions induces the anticipated shifts in output probabilities, confirming a causal link rather than a coincidental association. Taken together, our findings provide the first strong empirical evidence for the spin-bath analogy in a production-grade model. In this work, we utilize the context-field lens, which provides physics-grounded interpretability and motivates the development of novel generative models bridging theoretical condensed matter physics and artificial intelligence.
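The abstract sketches a concrete pipeline: read out each head's Query-Key weights, form a two-body coupling, and turn that coupling into a per-head logit gap. A minimal sketch of the first step follows, assuming the public Hugging Face gpt2 checkpoint; the coupling convention J = W_Q W_K^T / sqrt(d_head) and the energy-based gap proxy are illustrative assumptions, not the paper's exact derivation.

```python
import torch
from transformers import GPT2Model, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2").eval()
n_head, d_model = model.config.n_head, model.config.n_embd  # 12, 768
d_head = d_model // n_head                                  # 64

with torch.no_grad():
    couplings = []
    for block in model.h:
        # GPT-2 fuses the Q, K, V projections into one Conv1D whose weight
        # has shape (d_model, 3*d_model); split it back into three parts.
        w_q, w_k, _ = block.attn.c_attn.weight.split(d_model, dim=1)
        # Carve each projection into per-head slices of width d_head.
        w_q = w_q.view(d_model, n_head, d_head)
        w_k = w_k.view(d_model, n_head, d_head)
        # Effective two-body coupling on embedding space for each head h:
        # J[h, i, j] = sum_a W_Q[i, h, a] * W_K[j, h, a] / sqrt(d_head)
        J = torch.einsum("iha,jha->hij", w_q, w_k) / d_head ** 0.5
        couplings.append(J)

    # Toy "logit gap" for one arbitrarily chosen head: score every candidate
    # next token by its mean coupling energy to the context embeddings and
    # take the gap between the two best-scoring tokens.
    ids = tok("The capital of France is", return_tensors="pt").input_ids[0]
    emb = model.wte.weight                 # (vocab_size, d_model)
    J_h = couplings[0][3]                  # layer 0, head 3: placeholders
    scores = (emb[ids] @ J_h @ emb.T).mean(dim=0)
    top2 = scores.topk(2).values
    print("theoretical logit gap:", (top2[0] - top2[1]).item())
```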
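For the causal side, an individual head can be suppressed with the head_mask argument that Hugging Face's GPT-2 forward pass already accepts, comparing the next-token distribution before and after. This is a hedged sketch in the spirit of the paper's targeted ablations; the layer/head indices are placeholders, not the heads the authors identified.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2").eval()
ids = tok("The capital of France is", return_tensors="pt").input_ids

# head_mask holds one entry per (layer, head); setting an entry to zero
# suppresses that head's attention weights during the forward pass.
mask = torch.ones(lm.config.n_layer, lm.config.n_head)
mask[5, 3] = 0.0  # placeholder indices for a "spin-bath-aligned" head

with torch.no_grad():
    base = lm(ids).logits[0, -1].softmax(dim=-1)
    ablated = lm(ids, head_mask=mask).logits[0, -1].softmax(dim=-1)

top = base.argmax().item()
print(f"top token {tok.decode([top])!r}: "
      f"probability shift {(ablated[top] - base[top]).item():+.4f}")
```

Repeating this over many prompts and pairing the resulting shifts (or the theoretical gaps above) with the model's empirical token rankings would give a correlation check in the spirit of the reported $r\approx-0.70$, e.g. via scipy.stats.pearsonr.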
Related papers
- Graph Stochastic Neural Process for Inductive Few-shot Knowledge Graph Completion [63.68647582680998]
We focus on a task called inductive few-shot knowledge graph completion (I-FKGC).
Inspired by the idea of inductive reasoning, we cast I-FKGC as an inductive reasoning problem.
We present a neural process-based hypothesis extractor that models the joint distribution of hypotheses, from which we can sample a hypothesis for prediction.
In the second module, based on the hypothesis, we propose a graph attention-based predictor to test if the triple in the query set aligns with the extracted hypothesis.
arXiv Detail & Related papers (2024-08-03T13:37:40Z)
- SPIN: SE(3)-Invariant Physics Informed Network for Binding Affinity Prediction [3.406882192023597]
Accurate prediction of protein-ligand binding affinity is crucial for drug development.
Traditional methods often fail to accurately model the complex's spatial information.
We propose SPIN, a model that incorporates various inductive biases applicable to this task.
arXiv Detail & Related papers (2024-07-10T08:40:07Z)
- Infusing Self-Consistency into Density Functional Theory Hamiltonian Prediction via Deep Equilibrium Models [30.746062388701187]
We introduce a unified neural network architecture, the Deep Equilibrium Density Functional Theory Hamiltonian (DEQH) model.
The DEQH model inherently captures the self-consistent nature of the Hamiltonian.
We propose a versatile framework that combines DEQ with off-the-shelf machine learning models for predicting Hamiltonians.
arXiv Detail & Related papers (2024-06-06T07:05:58Z)
- CogDPM: Diffusion Probabilistic Models via Cognitive Predictive Coding [62.075029712357]
This work introduces the Cognitive Diffusion Probabilistic Models (CogDPM).
CogDPM features a precision estimation method based on the hierarchical sampling capabilities of diffusion models and weights the guidance with precision weights estimated from the inherent properties of diffusion models.
We apply CogDPM to real-world prediction tasks using the United Kingdom precipitation and surface wind datasets.
arXiv Detail & Related papers (2024-05-03T15:54:50Z)
- NeoSySPArtaN: A Neuro-Symbolic Spin Prediction Architecture for higher-order multipole waveforms from eccentric Binary Black Hole mergers using Numerical Relativity [0.0]
We present a novel Neuro-Symbolic Architecture (NSA) that combines the power of neural networks and symbolic regression.
Our results provide a robust and interpretable framework for predicting spin magnitudes in mergers.
arXiv Detail & Related papers (2023-07-20T16:30:51Z)
- Towards Faster Non-Asymptotic Convergence for Diffusion-Based Generative Models [49.81937966106691]
We develop a suite of non-asymptotic theory towards understanding the data generation process of diffusion models.
In contrast to prior works, our theory is developed based on an elementary yet versatile non-asymptotic approach.
arXiv Detail & Related papers (2023-06-15T16:30:08Z)
- Bias in Pruned Vision Models: In-Depth Analysis and Countermeasures [93.17009514112702]
Pruning, setting a significant subset of the parameters of a neural network to zero, is one of the most popular methods of model compression.
Despite existing evidence for this phenomenon, the relationship between neural network pruning and induced bias is not well-understood.
arXiv Detail & Related papers (2023-04-25T07:42:06Z)
- Capturing dynamical correlations using implicit neural representations [85.66456606776552]
We develop an artificial intelligence framework which combines a neural network trained to mimic simulated data from a model Hamiltonian with automatic differentiation to recover unknown parameters from experimental data.
In doing so, we illustrate the ability to build and train a differentiable model only once, which then can be applied in real-time to multi-dimensional scattering data.
arXiv Detail & Related papers (2023-04-08T07:55:36Z)
- Entangling dynamics from effective rotor/spin-wave separation in U(1)-symmetric quantum spin models [0.0]
The non-equilibrium dynamics of quantum spin models is among the most challenging topics in the field, owing to the exponential growth of the Hilbert space.
A particularly important class of evolutions is the one governed by U(1)-symmetric Hamiltonians.
We show that the dynamics of the one-axis-twisting (OAT) model can be closely reproduced by systems with power-law-decaying interactions.
arXiv Detail & Related papers (2023-02-18T09:37:45Z)
- A statistical approach to topological entanglement: Boltzmann machine representation of high-order irreducible correlation [6.430262211852815]
A quantum analog of high-order correlations is the topological entanglement in topologically ordered states of matter at zero temperature.
In this work, we propose a statistical interpretation that unifies the two under the same information-theoretic framework.
arXiv Detail & Related papers (2023-02-07T02:49:21Z)
- Beyond the Universal Law of Robustness: Sharper Laws for Random Features and Neural Tangent Kernels [14.186776881154127]
This paper focuses on empirical risk minimization in two settings, namely, random features and the neural tangent kernel (NTK).
We prove that, for random features, the model is not robust for any degree of over-parameterization, even when the necessary condition coming from the universal law of robustness is satisfied.
Our results are corroborated by numerical evidence on both synthetic and standard prototypical datasets.
arXiv Detail & Related papers (2023-02-03T09:58:31Z)
- Modeling the space-time correlation of pulsed twin beams [68.8204255655161]
Entangled twin beams generated by parametric down-conversion are among the favorite sources for imaging-oriented applications.
We propose a semi-analytic model which aims to bridge the gap between time-consuming numerical simulations and the unrealistic plane-wave pump theory.
arXiv Detail & Related papers (2023-01-18T11:29:49Z)
- Double Robust Representation Learning for Counterfactual Prediction [68.78210173955001]
We propose a novel scalable method to learn double-robust representations for counterfactual predictions.
We make robust and efficient counterfactual predictions for both individual and average treatment effects.
The algorithm shows competitive performance with the state-of-the-art on real world and synthetic data.
arXiv Detail & Related papers (2020-10-15T16:39:26Z)