Transformer Dynamics: A neuroscientific approach to interpretability of large language models
- URL: http://arxiv.org/abs/2502.12131v1
- Date: Mon, 17 Feb 2025 18:49:40 GMT
- Title: Transformer Dynamics: A neuroscientific approach to interpretability of large language models
- Authors: Jesseba Fernando, Grigori Guitchounts
- Abstract summary: We focus on the residual stream (RS) in transformer models, conceptualizing it as a dynamical system evolving across layers.
We find that activations of individual RS units exhibit strong continuity across layers, despite the RS being a non-privileged basis.
In reduced-dimensional spaces, the RS follows a curved trajectory with attractor-like dynamics in the lower layers.
- Abstract: As artificial intelligence models have exploded in scale and capability, understanding of their internal mechanisms remains a critical challenge. Inspired by the success of dynamical systems approaches in neuroscience, here we propose a novel framework for studying computations in deep learning systems. We focus on the residual stream (RS) in transformer models, conceptualizing it as a dynamical system evolving across layers. We find that activations of individual RS units exhibit strong continuity across layers, despite the RS being a non-privileged basis. Activations in the RS accelerate and grow denser over layers, while individual units trace unstable periodic orbits. In reduced-dimensional spaces, the RS follows a curved trajectory with attractor-like dynamics in the lower layers. These insights bridge dynamical systems theory and mechanistic interpretability, establishing a foundation for a "neuroscience of AI" that combines theoretical rigor with large-scale data analysis to advance our understanding of modern neural networks.
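To make the dynamical-systems framing concrete, here is a minimal sketch of collecting residual-stream states across layers and checking the acceleration and curvature described above. It is not the authors' code: GPT-2 via HuggingFace Transformers, the example sentence, and the choice of the last token's trajectory are all illustrative assumptions.

```python
# Minimal sketch (not the authors' code): extract residual-stream states
# from GPT-2 and inspect the layer-to-layer trajectory of the last token.
import numpy as np
import torch
from transformers import GPT2Model, GPT2Tokenizer
from sklearn.decomposition import PCA

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2")
model.eval()

inputs = tok("The residual stream evolves across layers", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# hidden_states: tuple of (n_layers + 1) tensors, each (batch, seq, d_model);
# the last token's state at each layer is one point on the trajectory.
traj = np.stack([h[0, -1].numpy() for h in out.hidden_states])  # (13, 768)

# Layer-to-layer step sizes: the abstract reports activations accelerating
# over depth, i.e. these step norms growing with layer index.
steps = np.linalg.norm(np.diff(traj, axis=0), axis=1)
print("step norms per layer:", np.round(steps, 2))

# Project the trajectory to 2D to inspect its curvature.
traj_2d = PCA(n_components=2).fit_transform(traj)
print(traj_2d)
```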
Related papers
- Conservation-informed Graph Learning for Spatiotemporal Dynamics Prediction [84.26340606752763]
In this paper, we introduce the conservation-informed GNN (CiGNN), an end-to-end explainable learning framework.
The network is designed to conform to the general conservation law via symmetry, with conservative and non-conservative information passed over a multiscale space by a latent temporal marching strategy.
Results demonstrate that CiGNN exhibits remarkable accuracy and generalizability, and is readily applicable to learning the prediction of various spatiotemporal dynamics.
arXiv Detail & Related papers (2024-12-30T13:55:59Z) - A scalable generative model for dynamical system reconstruction from neuroimaging data [5.777167013394619]
Data-driven inference of the generative dynamics underlying a set of observed time series is of growing interest in machine learning.
Recent breakthroughs in training techniques for state space models (SSMs) specifically geared toward dynamical systems reconstruction (DSR) make it possible to recover the underlying system.
We propose a novel algorithm that solves this problem and scales exceptionally well with model dimensionality and filter length.
arXiv Detail & Related papers (2024-11-05T09:45:57Z) - Artificial Kuramoto Oscillatory Neurons [65.16453738828672]
It has long been known in both neuroscience and AI that "binding" between neurons leads to a form of competitive learning.
We introduce Artificial Kuramoto Oscillatory Neurons (AKOrN), which can be combined with arbitrary connectivity designs such as fully connected, convolutional, or attentive mechanisms.
We show that this idea provides performance improvements across a wide spectrum of tasks, such as unsupervised object discovery, adversarial robustness, uncertainty quantification, and reasoning; a minimal sketch of the underlying Kuramoto dynamics follows.
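AKOrN's construction is detailed in the paper itself; as a reference point only, here is a minimal NumPy sketch of the classical Kuramoto phase model that these oscillatory neurons build on (the values of N, K, and the random seed are illustrative):

```python
# Classical Kuramoto model (illustrative; AKOrN generalizes this idea):
#   dtheta_i/dt = omega_i + (K/N) * sum_j sin(theta_j - theta_i)
import numpy as np

rng = np.random.default_rng(0)
N, K, dt, steps = 64, 2.0, 0.01, 2000
omega = rng.normal(0.0, 1.0, N)          # natural frequencies
theta = rng.uniform(0, 2 * np.pi, N)     # initial phases

for _ in range(steps):
    # pairwise phase differences theta_j - theta_i, coupled through sin(.)
    coupling = np.sin(theta[None, :] - theta[:, None]).sum(axis=1)
    theta += dt * (omega + (K / N) * coupling)

# Order parameter r in [0, 1]: r -> 1 means the population has synchronized,
# i.e. the neurons have "bound" together as mentioned above.
r = np.abs(np.exp(1j * theta).mean())
print(f"order parameter r = {r:.3f}")
```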
arXiv Detail & Related papers (2024-10-17T17:47:54Z) - Contrastive Learning in Memristor-based Neuromorphic Systems [55.11642177631929]
Spiking neural networks have become an important family of neuron-based models that sidestep many of the key limitations facing modern-day backpropagation-trained deep networks.
In this work, we design and investigate a proof-of-concept instantiation of contrastive-signal-dependent plasticity (CSDP), a neuromorphic form of forward-forward-based, backpropagation-free learning.
arXiv Detail & Related papers (2024-09-17T04:48:45Z) - Learning System Dynamics without Forgetting [60.08612207170659]
Predicting trajectories of systems with unknown dynamics is crucial in various research fields, including physics and biology.
We present a novel framework of Mode-switching Graph ODE (MS-GODE), which can continually learn varying dynamics.
We construct a novel benchmark of biological dynamic systems, featuring diverse systems with disparate dynamics.
arXiv Detail & Related papers (2024-06-30T14:55:18Z) - Interpretable statistical representations of neural population dynamics and geometry [4.459704414303749]
We introduce a representation learning method, MARBLE, that decomposes on-manifold dynamics into local flow fields and maps them into a common latent space.
In simulated non-linear dynamical systems, recurrent neural networks, and experimental single-neuron recordings from primates and rodents, we discover emergent low-dimensional latent representations.
These representations are consistent across neural networks and animals, enabling the robust comparison of cognitive computations.
arXiv Detail & Related papers (2023-04-06T21:11:04Z) - ConCerNet: A Contrastive Learning Based Framework for Automated Conservation Law Discovery and Trustworthy Dynamical System Prediction [82.81767856234956]
This paper proposes a new learning framework, named ConCerNet, to improve the trustworthiness of DNN-based dynamics modeling.
We show that our method consistently outperforms the baseline neural networks in both coordinate error and conservation metrics.
arXiv Detail & Related papers (2023-02-11T21:07:30Z) - Decomposed Linear Dynamical Systems (dLDS) for learning the latent components of neural dynamics [6.829711787905569]
We propose a new decomposed dynamical system model that represents complex non-stationary and nonlinear dynamics of time series data.
Our model is trained through a dictionary learning procedure, where we leverage recent results in tracking sparse vectors over time.
In both continuous-time and discrete-time instructional examples, we demonstrate that our model approximates the original system well; a toy sketch of the decomposition idea follows.
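As a toy illustration of the decomposition idea (not the paper's implementation: the dictionary, coefficients, and switching schedule below are invented for exposition), non-stationary dynamics can be composed from a small dictionary of linear operators with time-varying sparse coefficients:

```python
# Toy sketch of the dLDS idea: at each step the flow is a sparse
# combination of dictionary operators,
#   x_{t+1} = x_t + dt * (sum_k c_k(t) * F_k) @ x_t
import numpy as np

dim, dt, T = 2, 0.05, 200

# Dictionary of linear dynamical operators: a rotation and a contraction.
F = np.array([
    [[0.0, -1.0], [1.0, 0.0]],    # F_0: rotation (oscillatory mode)
    [[-0.5, 0.0], [0.0, -0.5]],   # F_1: uniform decay
])

x = np.array([1.0, 0.0])
traj = [x]
for t in range(T):
    # Sparse, time-varying coefficients: switch modes halfway through,
    # mimicking non-stationary dynamics built from stationary pieces.
    c = np.array([1.0, 0.0]) if t < T // 2 else np.array([0.0, 1.0])
    A = np.tensordot(c, F, axes=1)          # sum_k c_k F_k, shape (2, 2)
    x = x + dt * A @ x
    traj.append(x)

traj = np.array(traj)
print("radius at switch:", np.linalg.norm(traj[T // 2]))
print("final radius:", np.linalg.norm(traj[-1]))  # decays after the switch
```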
arXiv Detail & Related papers (2022-06-07T02:25:38Z) - Learning Continuous Chaotic Attractors with a Reservoir Computer [0.0]
We train a 1000-neuron RNN, used as a reservoir computer (RC), to abstract a continuous dynamical attractor memory from isolated examples of dynamical attractor memories.
By training the RC on isolated and shifted examples of either stable limit cycles or chaotic Lorenz attractors, the RC learns a continuum of attractors, as quantified by an extra Lyapunov exponent equal to zero; an illustrative sketch follows.
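For intuition, here is a minimal echo-state-network sketch in the same spirit (illustrative only; the reservoir size, ridge parameter, and training protocol are assumptions, not the paper's setup):

```python
# Minimal echo-state-network sketch: drive a fixed random reservoir with
# Lorenz data, fit a linear readout by ridge regression, run closed-loop.
import numpy as np

rng = np.random.default_rng(2)

def lorenz_series(n, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Euler-integrated Lorenz trajectory, rescaled for tanh units."""
    x = np.array([1.0, 1.0, 1.0])
    out = np.empty((n, 3))
    for i in range(n):
        dx = np.array([sigma * (x[1] - x[0]),
                       x[0] * (rho - x[2]) - x[1],
                       x[0] * x[1] - beta * x[2]])
        x = x + dt * dx
        out[i] = x
    return out / 20.0

data = lorenz_series(6000)

# Reservoir: fixed random recurrent weights, spectral radius below 1.
N = 500
W = rng.normal(0, 1, (N, N))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))
W_in = rng.uniform(-0.5, 0.5, (N, 3))

states = np.zeros((len(data), N))
r = np.zeros(N)
for t in range(len(data) - 1):
    r = np.tanh(W @ r + W_in @ data[t])
    states[t + 1] = r  # states[k] has seen inputs up to data[k-1]

# Ridge-regression readout: predict data[k] from states[k].
washout, lam = 500, 1e-6
X, Y = states[washout:], data[washout:]
W_out = np.linalg.solve(X.T @ X + lam * np.eye(N), X.T @ Y).T

# Closed loop: feed predictions back in; a well-trained RC keeps
# generating attractor-like dynamics rather than collapsing.
y = W_out @ r
for _ in range(5):
    r = np.tanh(W @ r + W_in @ y)
    y = W_out @ r
    print(np.round(y, 3))
```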
arXiv Detail & Related papers (2021-10-16T18:07:27Z) - A brain basis of dynamical intelligence for AI and computational neuroscience [0.0]
More brain-like capacities may demand new theories, models, and methods for designing artificial learning systems.
This article was inspired by our symposium on dynamical neuroscience and machine learning at the 6th Annual US/NIH BRAIN Initiative Investigators Meeting.
arXiv Detail & Related papers (2021-05-15T19:49:32Z) - Limited-angle tomographic reconstruction of dense layered objects by dynamical machine learning [68.9515120904028]
Limited-angle tomography of strongly scattering quasi-transparent objects is a challenging, highly ill-posed problem.
Regularizing priors are necessary to reduce artifacts by improving the condition of such problems.
We devised a recurrent neural network (RNN) architecture with a novel split-convolutional gated recurrent unit (SC-GRU) as the building block.
arXiv Detail & Related papers (2020-07-21T11:48:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.