Recurrent Self-Attention Dynamics: An Energy-Agnostic Perspective from Jacobians
- URL: http://arxiv.org/abs/2505.19458v4
- Date: Wed, 05 Nov 2025 03:57:36 GMT
- Title: Recurrent Self-Attention Dynamics: An Energy-Agnostic Perspective from Jacobians
- Authors: Akiyoshi Tomihari, Ryo Karakida
- Abstract summary: This work aims to relax energy constraints and provide an energy-agnostic characterization of inference dynamics. It reveals that the normalization layer plays an essential role in suppressing the Lipschitzness of SA and the Jacobian's complex eigenvalues. The Jacobian perspective also enables us to develop regularization methods for training and a pseudo-energy for monitoring inference dynamics.
- Score: 13.435505794863518
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The theoretical understanding of self-attention (SA) has been steadily progressing. A prominent line of work studies a class of SA layers that admit an energy function decreased by state updates. While it provides valuable insights into inherent biases in signal propagation, it often relies on idealized assumptions or additional constraints not necessarily present in standard SA. Thus, to broaden our understanding, this work aims to relax these energy constraints and provide an energy-agnostic characterization of inference dynamics by dynamical systems analysis. In more detail, we first consider relaxing the symmetry and single-head constraints traditionally required in energy-based formulations. Next, we show that analyzing the Jacobian matrix of the state is highly valuable when investigating more general SA architectures without necessarily admitting an energy function. It reveals that the normalization layer plays an essential role in suppressing the Lipschitzness of SA and the Jacobian's complex eigenvalues, which correspond to the oscillatory components of the dynamics. In addition, the Lyapunov exponents computed from the Jacobians demonstrate that the normalized dynamics lie close to a critical state, and this criticality serves as a strong indicator of high inference performance. Furthermore, the Jacobian perspective also enables us to develop regularization methods for training and a pseudo-energy for monitoring inference dynamics.
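The Jacobian-based quantities named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy sizes, the random weights, the single-head update rule, the row-wise unit-norm projection standing in for the normalization layer, and the helper names (`sa_step`, `jacobian_fd`, `complex_fraction`, `max_lyapunov`) are all assumptions made for illustration. The Jacobian is approximated by central finite differences, and the largest Lyapunov exponent is estimated by propagating one tangent vector along the trajectory.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def sa_step(x, Wq, Wk, Wv, normalize=True):
    """One recurrent update X <- norm(X + softmax(Q K^T / sqrt(d)) V) (toy, single head)."""
    d = x.shape[1]
    attn = softmax((x @ Wq) @ (x @ Wk).T / np.sqrt(d))
    y = x + attn @ (x @ Wv)
    if normalize:
        # Assumed stand-in for a normalization layer: scale each token to unit norm.
        y = y / np.linalg.norm(y, axis=1, keepdims=True)
    return y

def jacobian_fd(f, x, eps=1e-6):
    """Central-difference Jacobian of f at x, with the state flattened to a vector."""
    x0 = x.ravel()
    m = x0.size
    J = np.zeros((m, m))
    for j in range(m):
        dx = np.zeros(m)
        dx[j] = eps
        fp = f((x0 + dx).reshape(x.shape)).ravel()
        fm = f((x0 - dx).reshape(x.shape)).ravel()
        J[:, j] = (fp - fm) / (2 * eps)
    return J

def complex_fraction(J, tol=1e-8):
    """Fraction of Jacobian eigenvalues with a non-negligible imaginary part."""
    eig = np.linalg.eigvals(J)
    return float(np.mean(np.abs(eig.imag) > tol))

def max_lyapunov(f, x, steps=30, seed=0):
    """Estimate the largest Lyapunov exponent by pushing a tangent vector through the Jacobians."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(x.size)
    v /= np.linalg.norm(v)
    total = 0.0
    for _ in range(steps):
        v = jacobian_fd(f, x) @ v
        norm = np.linalg.norm(v)
        total += np.log(norm)  # accumulate the local expansion rate
        v /= norm              # renormalize to avoid overflow or underflow
        x = f(x)               # advance the state along the trajectory
    return total / steps

# Toy setup: 4 tokens of dimension 3 with random weights (illustrative only).
rng = np.random.default_rng(0)
n, d = 4, 3
Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
x0 = rng.standard_normal((n, d))
step = lambda x: sa_step(x, Wq, Wk, Wv, normalize=True)

J = jacobian_fd(step, x0)
print("fraction of complex eigenvalues:", complex_fraction(J))
print("estimated max Lyapunov exponent:", max_lyapunov(step, x0))
```

In this sketch, a largest exponent near zero would indicate dynamics that neither contract nor expand on average, which is the sense of "critical" used in the abstract. Passing `normalize=False` to `sa_step` gives an unnormalized variant for comparison.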
Related papers
- KoopGen: Koopman Generator Networks for Representing and Predicting Dynamical Systems with Continuous Spectra [65.11254608352982]
We introduce a generator-based neural Koopman framework that models dynamics through a structured, state-dependent representation of Koopman generators. By exploiting the intrinsic Cartesian decomposition into skew-adjoint and self-adjoint components, KoopGen separates conservative transport from irreversible dissipation.
arXiv Detail & Related papers (2026-02-15T06:32:23Z)
- Intrinsic-Energy Joint Embedding Predictive Architectures Induce Quasimetric Spaces [0.764671395172401]
Joint-Embedding Predictive Architectures (JEPAs) aim to learn representations by predicting target embeddings from context embeddings. Quasimetric Reinforcement Learning (QRL) studies goal-conditioned control through directed distance values (cost-to-go) that support reaching goals under asymmetric dynamics.
arXiv Detail & Related papers (2026-02-12T18:30:27Z)
- State Rank Dynamics in Linear Attention LLMs [37.607046806053035]
State Rank Stratification is characterized by a distinct spectral bifurcation among linear attention heads. Low-rank heads are indispensable for model reasoning, whereas high-rank heads exhibit significant redundancy. We propose Joint Rank-Norm Pruning, a zero-shot strategy that achieves a 38.9% reduction in KV-cache overhead while largely maintaining model accuracy.
arXiv Detail & Related papers (2026-02-02T15:00:42Z)
- Constraint Breeds Generalization: Temporal Dynamics as an Inductive Bias [1.219017431258669]
We show that constraints shape dynamics to function not as limitations, but as a temporal inductive bias that breeds generalization. We show that robust AI development requires not only scaling and removing limitations, but computationally mastering the temporal characteristics that naturally promote generalization.
arXiv Detail & Related papers (2025-12-30T00:34:24Z)
- ECO: Energy-Constrained Operator Learning for Chaotic Dynamics with Boundedness Guarantees [3.2740680236631636]
We introduce the Energy-Constrained Operator (ECO) that simultaneously learns the system dynamics while enforcing boundedness in predictions. To our knowledge, this is the first work establishing such formal guarantees for data-driven chaotic dynamics models. We demonstrate empirical success in ECO's ability to generate stable long-horizon forecasts.
arXiv Detail & Related papers (2025-12-01T18:42:02Z)
- Self-Organization and Spectral Mechanism of Attractor Landscapes in High-Capacity Kernel Hopfield Networks [0.0]
Kernel-based learning can dramatically increase the storage capacity of Hopfield networks. We show that optimal performance is achieved by tuning the system to a spectral "Goldilocks zone" between rank collapse and diffusion.
arXiv Detail & Related papers (2025-11-17T06:58:34Z)
- Rediscovering Entropy Regularization: Adaptive Coefficient Unlocks Its Potential for LLM Reinforcement Learning [55.59724323303857]
We propose a framework that balances exploration and exploitation via three components: difficulty-aware coefficient allocation, initial-anchored target entropy, and dynamic global coefficient adjustment. Experiments on multiple mathematical reasoning benchmarks show that AER consistently outperforms baselines, improving both reasoning accuracy and exploration capability.
arXiv Detail & Related papers (2025-10-13T03:10:26Z)
- Quantum Simulation of Dynamical Response Functions of Equilibrium States [0.29998889086656577]
The computation of dynamical response functions is central to many problems in condensed matter physics. Existing approaches often assume access to the equilibrium state, which may be difficult to prepare in practice. We present a method that circumvents this by using energy filter techniques.
arXiv Detail & Related papers (2025-05-08T16:52:11Z)
- Dynamics of Open Quantum Systems with Initial System-Environment Correlations via Stochastic Unravelings [0.0]
In open quantum systems, the reduced dynamics is described starting from the assumption that the system and the environment are initially uncorrelated. For the uncorrelated scenario, unravelings are a powerful tool to simulate the dynamics, but so far they have not been used in the most general case in which correlations are initially present. In our work, we employ the bath positive (B+) or one-sided positive decomposition formalism as a starting point to generalize unraveling in the presence of initial correlations.
arXiv Detail & Related papers (2025-02-18T12:26:32Z)
- Reinforcement Learning under Latent Dynamics: Toward Statistical and Algorithmic Modularity [51.40558987254471]
Real-world applications of reinforcement learning often involve environments where agents operate on complex, high-dimensional observations.
This paper addresses the question of reinforcement learning under $\textit{general}$ latent dynamics from a statistical and algorithmic perspective.
arXiv Detail & Related papers (2024-10-23T14:22:49Z)
- A link between static and dynamical perturbation theory [0.48951183832371004]
We show the role of emergent time as a vital link between time-independent and time-dependent theory in quantum mechanics.
Based on our results, we envision future applications for the calculation of dynamical phenomena based on a single pure energy eigenstate.
arXiv Detail & Related papers (2024-05-14T09:01:30Z)
- Learning Interpretable Policies in Hindsight-Observable POMDPs through Partially Supervised Reinforcement Learning [57.67629402360924]
We introduce the Partially Supervised Reinforcement Learning (PSRL) framework.
At the heart of PSRL is the fusion of both supervised and unsupervised learning.
We show that PSRL offers a potent balance, enhancing model interpretability while preserving, and often significantly outperforming, the performance benchmarks set by traditional methods.
arXiv Detail & Related papers (2024-02-14T16:23:23Z)
- TANGO: Time-Reversal Latent GraphODE for Multi-Agent Dynamical Systems [43.39754726042369]
We propose a simple-yet-effective self-supervised regularization term as a soft constraint that aligns the forward and backward trajectories predicted by a continuous graph neural network-based ordinary differential equation (GraphODE).
It effectively imposes time-reversal symmetry to enable more accurate model predictions across a wider range of dynamical systems under classical mechanics.
Experimental results on a variety of physical systems demonstrate the effectiveness of our proposed method.
arXiv Detail & Related papers (2023-10-10T08:52:16Z)
- On the energetic analysis of autonomous quantum systems [0.0]
This thesis focuses on the energetic analysis within autonomous quantum systems.
We propose a novel and general formalism for a dynamic description of the energy exchanges between interacting subsystems.
arXiv Detail & Related papers (2022-11-14T15:14:00Z)
- Out-of-time-order correlations and the fine structure of eigenstate thermalisation [58.720142291102135]
Out-of-time-order correlators (OTOCs) have become established as a tool to characterise quantum information dynamics and thermalisation.
We show explicitly that the OTOC is indeed a precise tool to explore the fine details of the Eigenstate Thermalisation Hypothesis (ETH).
We provide an estimation of the finite-size scaling of $\omega_{\textrm{GOE}}$ for the general class of observables composed of sums of local operators in the infinite-temperature regime.
arXiv Detail & Related papers (2021-03-01T17:51:46Z)
- Stochastically forced ensemble dynamic mode decomposition for forecasting and analysis of near-periodic systems [65.44033635330604]
We introduce a novel load forecasting method in which observed dynamics are modeled as a forced linear system.
We show that its use of intrinsic linear dynamics offers a number of desirable properties in terms of interpretability and parsimony.
Results are presented for a test case using load data from an electrical grid.
arXiv Detail & Related papers (2020-10-08T20:25:52Z)
- Probing eigenstate thermalization in quantum simulators via fluctuation-dissipation relations [77.34726150561087]
The eigenstate thermalization hypothesis (ETH) offers a universal mechanism for the approach to equilibrium of closed quantum many-body systems.
Here, we propose a theory-independent route to probe the full ETH in quantum simulators by observing the emergence of fluctuation-dissipation relations.
Our work presents a theory-independent way to characterize thermalization in quantum simulators and paves the way to quantum simulate condensed matter pump-probe experiments.
arXiv Detail & Related papers (2020-07-20T18:00:02Z)
- On dissipative symplectic integration with applications to gradient-based optimization [77.34726150561087]
We propose a geometric framework in which discretizations can be realized systematically.
We show that a generalization of symplectic to nonconservative and in particular dissipative Hamiltonian systems is able to preserve rates of convergence up to a controlled error.
arXiv Detail & Related papers (2020-04-15T00:36:49Z)