How Do Transformers "Do" Physics? Investigating the Simple Harmonic Oscillator
- URL: http://arxiv.org/abs/2405.17209v1
- Date: Thu, 23 May 2024 01:14:22 GMT
- Title: How Do Transformers "Do" Physics? Investigating the Simple Harmonic Oscillator
- Authors: Subhash Kantamneni, Ziming Liu, Max Tegmark
- Abstract summary: We investigate the simple harmonic oscillator (SHO), one of the most fundamental systems in physics.
We identify the methods transformers use to model the SHO, and to do so we hypothesize and evaluate possible methods by analyzing the encoding of these methods' intermediates.
Our analysis framework can conveniently extend to high-dimensional linear systems and nonlinear systems, which we hope will help reveal the "world model" hidden in transformers.
- Score: 15.01642959193149
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: How do transformers model physics? Do transformers model systems with interpretable analytical solutions, or do they create "alien physics" that are difficult for humans to decipher? We take a step in demystifying this larger puzzle by investigating the simple harmonic oscillator (SHO), $\ddot{x}+2\gamma \dot{x}+\omega_0^2x=0$, one of the most fundamental systems in physics. Our goal is to identify the methods transformers use to model the SHO, and to do so we hypothesize and evaluate possible methods by analyzing the encoding of these methods' intermediates. We develop four criteria for the use of a method within the simple testbed of linear regression, where our method is $y = wx$ and our intermediate is $w$: (1) Can the intermediate be predicted from hidden states? (2) Is the intermediate's encoding quality correlated with model performance? (3) Can the majority of variance in hidden states be explained by the intermediate? (4) Can we intervene on hidden states to produce predictable outcomes? Armed with these two correlational (1,2), weak causal (3) and strong causal (4) criteria, we determine that transformers use known numerical methods to model trajectories of the simple harmonic oscillator, specifically the matrix exponential method. Our analysis framework can conveniently extend to high-dimensional linear systems and nonlinear systems, which we hope will help reveal the "world model" hidden in transformers.
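The matrix exponential method the authors identify can be made concrete: rewriting the SHO as a first-order system $\dot{z} = Az$ with state $z = (x, \dot{x})$, the trajectory is $x(t) = [e^{At}z_0]_1$. Below is a minimal sketch of this idea in pure Python, checked against the closed-form underdamped solution; the truncated-Taylor `expm2` helper and the chosen parameter values are illustrative assumptions, not taken from the paper:

```python
import math

def expm2(A, t, terms=30):
    """Truncated Taylor series for the 2x2 matrix exponential exp(A*t)."""
    M = [[A[0][0]*t, A[0][1]*t], [A[1][0]*t, A[1][1]*t]]
    E = [[1.0, 0.0], [0.0, 1.0]]   # running sum, starts at the identity
    P = [[1.0, 0.0], [0.0, 1.0]]   # current term M^k / k!
    for k in range(1, terms):
        P = [[(P[0][0]*M[0][0] + P[0][1]*M[1][0]) / k,
              (P[0][0]*M[0][1] + P[0][1]*M[1][1]) / k],
             [(P[1][0]*M[0][0] + P[1][1]*M[1][0]) / k,
              (P[1][0]*M[0][1] + P[1][1]*M[1][1]) / k]]
        E = [[E[i][j] + P[i][j] for j in range(2)] for i in range(2)]
    return E

# Underdamped SHO: x'' + 2*gamma*x' + omega0^2 * x = 0, state z = (x, v)
gamma, omega0 = 0.1, 1.0
A = [[0.0, 1.0], [-omega0**2, -2*gamma]]
x0, v0, t = 1.0, 0.0, 2.0

E = expm2(A, t)
x_t = E[0][0]*x0 + E[0][1]*v0  # first component of exp(A*t) @ z0

# Closed-form underdamped solution for comparison
w = math.sqrt(omega0**2 - gamma**2)
x_exact = math.exp(-gamma*t) * (x0*math.cos(w*t) + (v0 + gamma*x0)/w * math.sin(w*t))
```

The two values agree to machine precision, which is the sense in which the matrix exponential is an exact one-step integrator for linear systems.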
Related papers
- Automated Discovery of Conservation Laws via Hybrid Neural ODE-Transformers [0.0]
We propose a hybrid framework to automate the discovery of conserved quantities from noisy trajectory data. Our approach integrates three components: (1) a Neural Ordinary Differential Equation that learns a continuous model of the system's dynamics, (2) a Transformer that generates symbolic candidate invariants conditioned on the learned vector field, and (3) a symbolic-numeric verifier that provides a strong numerical certificate for the validity of these candidates.
arXiv Detail & Related papers (2025-10-30T17:32:04Z) - When Do Transformers Learn Heuristics for Graph Connectivity? [33.73385470817422]
We prove that an $L$-layer model has the capacity to solve connectivity for graphs with diameter up to exactly $3L$. We analyze the training dynamics and show that the learned strategy hinges on whether most training instances fall within this model capacity.
arXiv Detail & Related papers (2025-10-22T16:43:32Z) - FFT-Accelerated Auxiliary Variable MCMC for Fermionic Lattice Models: A Determinant-Free Approach with $O(N\log N)$ Complexity [52.3171766248012]
We introduce a Markov Chain Monte Carlo (MCMC) algorithm that dramatically accelerates the simulation of quantum many-body systems. We validate our algorithm on benchmark quantum physics problems, accurately reproducing known theoretical results. Our work provides a powerful tool for large-scale probabilistic inference and opens avenues for physics-inspired generative models.
arXiv Detail & Related papers (2025-10-13T07:57:21Z) - Why Can't Transformers Learn Multiplication? Reverse-Engineering Reveals Long-Range Dependency Pitfalls [54.57326125204404]
Language models are increasingly capable, yet still fail at the seemingly simple task of multi-digit multiplication. We study why by reverse-engineering a model that successfully learns multiplication via "implicit chain-of-thought".
arXiv Detail & Related papers (2025-09-30T19:03:26Z) - MathBode: Understanding LLM Reasoning with Dynamical Systems [0.0]
We present MathBode, a dynamic diagnostic for mathematical reasoning in large language models (LLMs). MathBode treats each parametric problem as a system: we drive a single parameter sinusoidally and fit first-harmonic responses of model outputs and exact solutions. Across five closed-form families, the diagnostic surfaces systematic low-pass behavior and growing phase lag that accuracy alone obscures.
arXiv Detail & Related papers (2025-09-27T06:06:36Z) - Two failure modes of deep transformers and how to avoid them: a unified theory of signal propagation at initialisation [8.973965016201822]
Finding the right initialisation for neural networks is crucial to ensure smooth training and good performance. In transformers, the wrong initialisation can lead to one of two failure modes of self-attention layers: rank collapse, where all tokens collapse into similar representations, and entropy collapse, where highly concentrated attention scores lead to instability. Here, we provide an analytical theory of signal propagation through deep transformers with self-attention, layer normalisation, skip connections and gradients.
arXiv Detail & Related papers (2025-05-30T08:18:23Z) - (How) Can Transformers Predict Pseudo-Random Numbers? [7.201095605457193]
We study the ability of Transformers to learn pseudo-random number sequences from linear congruential generators (LCGs).
Our analysis reveals that Transformers can perform in-context prediction of LCG sequences with unseen moduli ($m$) and parameters ($a, c$).
arXiv Detail & Related papers (2025-02-14T18:59:40Z) - Can Transformers In-Context Learn Behavior of a Linear Dynamical System? [13.331659934508764]
We investigate whether transformers can learn to track a random process when given observations of a related process and parameters that relate them as context.
A further study of the transformer's robustness reveals that its performance is retained even if the model's parameters are partially withheld.
arXiv Detail & Related papers (2024-10-21T22:18:10Z) - Can Looped Transformers Learn to Implement Multi-step Gradient Descent for In-context Learning? [69.4145579827826]
We show fast convergence of the gradient flow on the regression loss despite the non-convexity of the loss landscape.
This is the first theoretical analysis for multi-layer Transformer in this setting.
arXiv Detail & Related papers (2024-10-10T18:29:05Z) - Can Transformers Learn $n$-gram Language Models? [77.35809823602307]
We study transformers' ability to learn random $n$-gram LMs of two kinds.
We find that classic estimation techniques for $n$-gram LMs, such as add-$\lambda$ smoothing, outperform transformers.
arXiv Detail & Related papers (2024-10-03T21:21:02Z) - Can Transformers Do Enumerative Geometry? [44.99833362998488]
We introduce a new paradigm in computational enumerative geometry by analyzing the $\psi$-class intersection numbers on the moduli space of curves.
We develop a Transformer-based model for computing $\psi$-class intersection numbers based on the underlying quantum Airy structure.
We go beyond merely computing intersection numbers and explore the enumerative "world-model" of the Transformers.
arXiv Detail & Related papers (2024-08-27T09:44:01Z) - How do Transformers perform In-Context Autoregressive Learning? [76.18489638049545]
We train a Transformer model on a simple next token prediction task.
We show how a trained Transformer predicts the next token by first learning $W$ in-context, then applying a prediction mapping.
arXiv Detail & Related papers (2024-02-08T16:24:44Z) - Attention with Markov: A Framework for Principled Analysis of Transformers via Markov Chains [45.84704083061562]
We introduce a new framework for a principled analysis of transformers via Markov chains. We show the existence of global minima (bigram) and bad local minima (unigram) contingent on data properties and model architecture.
arXiv Detail & Related papers (2024-02-06T17:18:59Z) - Setting the Record Straight on Transformer Oversmoothing [35.125957267464756]
As model depth increases, Transformers oversmooth, i.e., inputs become more and more similar.
We show that smoothing behavior depends on the eigenspectrum of the value and projection weights.
Our analysis reveals a simple way to parameterize the weights of the Transformer update equations to influence smoothing behavior.
arXiv Detail & Related papers (2024-01-09T01:19:03Z) - Transformers as Algorithms: Generalization and Implicit Model Selection in In-context Learning [23.677503557659705]
In-context learning (ICL) is a type of prompting where a transformer model operates on a sequence of examples and performs inference on-the-fly.
We treat the transformer model as a learning algorithm that can be specialized via training to implement, at inference time, another target algorithm.
We show that transformers can act as an adaptive learning algorithm and perform model selection across different hypothesis classes.
arXiv Detail & Related papers (2023-01-17T18:31:12Z) - Transformers learn in-context by gradient descent [58.24152335931036]
Training Transformers on auto-regressive objectives is closely related to gradient-based meta-learning formulations.
We show how trained Transformers become mesa-optimizers, i.e., they learn models by gradient descent in their forward pass.
arXiv Detail & Related papers (2022-12-15T09:21:21Z) - What do Toothbrushes do in the Kitchen? How Transformers Think our World is Structured [137.83584233680116]
We investigate to what extent transformer-based language models allow for extracting knowledge about object relations.
We show that the models, combined with different similarity measures, differ greatly in the amount of knowledge they allow to be extracted.
Surprisingly, static models perform almost as well as contextualized models -- in some cases even better.
arXiv Detail & Related papers (2022-04-12T10:00:20Z) - Pathologies in priors and inference for Bayesian transformers [71.97183475225215]
There have been no successful attempts to improve the predictive uncertainty of transformer models using Bayesian inference.
We find that weight-space inference in transformers does not work well, regardless of the approximate posterior.
We propose a novel method based on the implicit reparameterization of the Dirichlet distribution to apply variational inference directly to the attention weights.
arXiv Detail & Related papers (2021-10-08T10:35:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.