Attention in Krylov Space
- URL: http://arxiv.org/abs/2601.07937v1
- Date: Mon, 12 Jan 2026 19:07:22 GMT
- Title: Attention in Krylov Space
- Authors: Zihao Qi, Christopher Earls
- Abstract summary: We introduce a transformer-based model to predict Lanczos coefficients from short prefixes. For both classical and quantum systems, our machine-learning model outperforms asymptotic fits in both coefficient extrapolation and physical-observable reconstruction. Our model also transfers across system sizes: it can be trained on smaller systems and then used to extrapolate coefficients on a larger system without retraining.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Universal Operator Growth Hypothesis formulates the time evolution of operators through Lanczos coefficients. In practice, however, numerical instability and memory cost limit the number of coefficients that can be computed exactly. In response to these challenges, the standard approach relies on fitting early coefficients to asymptotic forms, but such procedures can miss subleading, history-dependent structures in the coefficients that subsequently affect reconstructed observables. In this work, we treat the Lanczos coefficients as a causal time sequence and introduce a transformer-based model to autoregressively predict future Lanczos coefficients from short prefixes. For both classical and quantum systems, our machine-learning model outperforms asymptotic fits, in both coefficient extrapolation and physical observable reconstruction, by achieving an order-of-magnitude reduction in error. Our model also transfers across system sizes: it can be trained on smaller systems and then be used to extrapolate coefficients on a larger system without retraining. By probing the learned attention patterns and performing targeted attention ablations, we identify which portions of the coefficient history are most influential for accurate forecasts.
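The recipe above, treating the Lanczos coefficients as a causal sequence and forecasting them autoregressively, can be sketched with a small causal transformer. This is a minimal illustration, assuming a scalar embedding, learned positions, and layer sizes that are not taken from the paper:

```python
import torch
import torch.nn as nn

class LanczosForecaster(nn.Module):
    """Minimal causal transformer that predicts b_{n+1} from b_1..b_n."""
    def __init__(self, d_model=64, nhead=4, nlayers=2, max_len=256):
        super().__init__()
        self.embed = nn.Linear(1, d_model)         # scalar coefficient -> vector
        self.pos = nn.Embedding(max_len, d_model)  # learned positional embeddings
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, nlayers)
        self.head = nn.Linear(d_model, 1)          # next-coefficient regression

    def forward(self, b):                          # b: (batch, n, 1)
        n = b.size(1)
        h = self.embed(b) + self.pos(torch.arange(n, device=b.device))
        mask = nn.Transformer.generate_square_subsequent_mask(n).to(b.device)
        h = self.encoder(h, mask=mask)             # causal self-attention
        return self.head(h)                        # prediction at every position

@torch.no_grad()
def extrapolate(model, prefix, steps):
    """Autoregressively roll out `steps` coefficients from a short prefix."""
    seq = prefix.clone()                           # (1, n0, 1)
    for _ in range(steps):
        nxt = model(seq)[:, -1:, :]                # next-coefficient prediction
        seq = torch.cat([seq, nxt], dim=1)
    return seq.squeeze()
```

Training such a model with a mean-squared next-step loss over coefficient sequences from many systems would be the natural objective under this setup.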
Related papers
- The Coverage Principle: How Pre-Training Enables Post-Training
We study how pre-training shapes the success of the final model. We uncover a mechanism that explains the power of coverage in predicting downstream performance.
arXiv Detail & Related papers (2025-10-16T17:53:50Z)
- A Simple Approximate Bayesian Inference Neural Surrogate for Stochastic Petri Net Models
We introduce a neural-network-based framework for approximating the posterior distribution. Our model employs a lightweight 1D Convolutional Residual Network trained end-to-end on Gillespie-simulated SPN realizations. On synthetic SPNs with 20% missing events, our surrogate recovers rate-function coefficients with an RMSE of 0.108 and runs substantially faster than traditional Bayesian approaches.
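The shape of such a surrogate can be sketched as a small 1D convolutional residual network mapping a simulated trajectory to rate-function coefficients; channel counts, depth, and the pooling head below are assumptions rather than the paper's configuration:

```python
import torch
import torch.nn as nn

class ResBlock1D(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(ch, ch, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(ch, ch, kernel_size=3, padding=1),
        )
    def forward(self, x):
        return torch.relu(x + self.net(x))          # residual connection

class SPNSurrogate(nn.Module):
    """Maps a (batch, species, time) trajectory to rate-function coefficients."""
    def __init__(self, n_species, n_coeffs, ch=32, depth=4):
        super().__init__()
        self.inp = nn.Conv1d(n_species, ch, kernel_size=3, padding=1)
        self.blocks = nn.Sequential(*[ResBlock1D(ch) for _ in range(depth)])
        self.head = nn.Linear(ch, n_coeffs)         # point estimate of coefficients
    def forward(self, x):
        h = self.blocks(torch.relu(self.inp(x)))
        return self.head(h.mean(dim=-1))            # global average pool over time
```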
arXiv Detail & Related papers (2025-07-14T18:31:19Z)
- Lost in Retraining: Roaming the Parameter Space of Exponential Families Under Closed-Loop Learning
We study closed-loop learning for models that belong to exponential families. We show that maximum likelihood estimation of the parameters endows the sufficient statistics with the martingale property. We also show that this outcome may be prevented if the data contains at least one data point generated from a ground-truth model.
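The martingale property follows from the moment-matching fixed point of maximum likelihood in exponential families. A sketch of that standard argument (not the paper's full derivation):

```latex
% Moment matching at the MLE of an exponential family:
p_\theta(x) = h(x)\,e^{\theta^\top T(x) - A(\theta)}, \qquad
\nabla A(\hat\theta) = \mathbb{E}_{\hat\theta}[T(X)]
  = \frac{1}{n}\sum_{i=1}^{n} T(x_i) = \bar{T}_t .
% Sampling generation t+1 from the model refit to generation t therefore gives
\mathbb{E}\!\left[\bar{T}_{t+1} \,\middle|\, \bar{T}_t\right]
  = \mathbb{E}_{\hat\theta_t}[T(X)] = \bar{T}_t ,
% i.e. the sufficient statistics form a martingale across retraining rounds.
```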
arXiv Detail & Related papers (2025-06-25T17:12:22Z)
- Lanczos-Pascal approach to correlation functions in chaotic quantum systems
We suggest a method to compute approximations to temporal correlation functions of few-body observables in chaotic many-body systems. We numerically find, and analytically argue, that convergence is rather quick if the Lanczos coefficients exhibit a smoothly increasing structure.
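For context, once Lanczos coefficients are in hand (exact or extrapolated), the autocorrelation function is recovered from the tridiagonal Liouvillian in the Krylov basis. A minimal sketch; the linear-growth example is a known closed-form pair, truncated here to finitely many coefficients:

```python
import numpy as np
from scipy.linalg import expm

def autocorrelation(b, times):
    """C(t) = (e^{iLt})_{00}, with L tridiagonal: zeros on the diagonal and
    Lanczos coefficients b_1..b_N on the off-diagonals."""
    N = len(b) + 1
    L = np.zeros((N, N), dtype=complex)
    for n, bn in enumerate(b):
        L[n, n + 1] = L[n + 1, n] = bn
    # Zero-diagonal tridiagonal => symmetric spectrum => C(t) is real.
    return np.array([expm(1j * L * t)[0, 0].real for t in times])

# Toy check: b_n = alpha*n corresponds to C(t) = sech(alpha*t) in the
# infinite-chain limit (truncated here, so accurate only at early times).
alpha = 1.0
b = alpha * np.arange(1, 60)
ts = np.linspace(0.0, 2.0, 21)
C = autocorrelation(b, ts)
```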
arXiv Detail & Related papers (2025-03-21T22:05:03Z)
- Scaling and renormalization in high-dimensional regression
We present a unifying perspective on recent results on ridge regression. We use the basic tools of random matrix theory and free probability, aimed at readers with backgrounds in physics and deep learning. Our results extend and provide a unifying perspective on earlier models of scaling laws.
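The renormalization in question can be summarized by the standard self-consistent effective ridge from random matrix theory; the equations below quote that widely used result from memory rather than this paper's notation:

```latex
% Ridge estimator and the self-consistent effective regularization:
\hat{\beta}_\lambda = \left(X^\top X + n\lambda I\right)^{-1} X^\top y, \qquad
\kappa = \lambda + \frac{\kappa}{n}\,
  \operatorname{tr}\!\left[\Sigma\left(\Sigma + \kappa I\right)^{-1}\right].
% Finite sampling "renormalizes" the bare ridge \lambda into an effective
% regularization \kappa \ge \lambda that controls bias and variance.
```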
arXiv Detail & Related papers (2024-05-01T15:59:00Z)
- On the Impact of Overparameterization on the Training of a Shallow Neural Network in High Dimensions
We study the training dynamics of a shallow neural network with quadratic activation functions and quadratic cost.
In line with previous works on the same neural architecture, the optimization is performed following the gradient flow on the population risk.
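A minimal simulation of this setting, with quadratic activations, quadratic cost, and gradient flow approximated by small-step gradient descent on a large-sample proxy for the population risk, might look as follows; the teacher-student sizes and step size are illustrative assumptions:

```python
import torch

torch.manual_seed(0)
d, k_student, k_teacher, n = 20, 40, 5, 4096   # overparameterized: k_student > k_teacher

X = torch.randn(n, d)                          # large sample ~ population-risk proxy
W_star = torch.randn(k_teacher, d) / d**0.5
y = (X @ W_star.T).pow(2).sum(dim=1)           # teacher with quadratic activations

W = (torch.randn(k_student, d) / d**0.5).requires_grad_()
eta = 1e-3                                     # small step ~ discretized gradient flow
for step in range(2000):
    loss = ((X @ W.T).pow(2).sum(dim=1) - y).pow(2).mean()  # quadratic cost
    loss.backward()
    with torch.no_grad():
        W -= eta * W.grad
        W.grad.zero_()
```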
arXiv Detail & Related papers (2023-11-07T08:20:31Z)
- A U-turn on Double Descent: Rethinking Parameter Counting in Statistical Learning
We show when and where double descent appears, and that its location is not inherently tied to the threshold p=n. This provides a resolution to tensions between double descent and statistical intuition.
arXiv Detail & Related papers (2023-10-29T12:05:39Z)
- Structured Radial Basis Function Network: Modelling Diversity for Multiple Hypotheses Prediction
Multi-modal regression is important for forecasting nonstationary processes or complex mixtures of distributions.
A Structured Radial Basis Function Network is presented as an ensemble of multiple hypotheses predictors for regression problems.
It is proved that this structured model can efficiently interpolate the resulting tessellation and approximate the multiple-hypotheses target distribution.
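As a rough illustration of multiple-hypotheses prediction with radial basis functions (a generic winner-takes-all construction, not the paper's specific structured model):

```python
import numpy as np

def rbf_features(x, centers, gamma=10.0):
    # Gaussian RBF feature map: phi_j(x) = exp(-gamma * (x - c_j)^2)
    return np.exp(-gamma * (x[:, None] - centers[None, :]) ** 2)

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 2000)
y = np.sin(2 * np.pi * x) + rng.choice([-1.0, 1.0], size=x.size)  # bimodal target

centers = np.linspace(0, 1, 20)
Phi = rbf_features(x, centers)
H = 2                                              # number of hypotheses
W = rng.normal(scale=0.1, size=(H, centers.size))  # one linear head per hypothesis

lr = 0.5
for _ in range(2000):
    preds = Phi @ W.T                              # (n, H) hypothesis predictions
    err = preds - y[:, None]
    winner = np.abs(err).argmin(axis=1)            # winner-takes-all assignment
    for h in range(H):
        m = winner == h
        if m.any():                                # each head learns the mode it wins
            W[h] -= lr * (Phi[m].T @ err[m, h]) / m.sum()
```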
arXiv Detail & Related papers (2023-09-02T01:27:53Z)
- The Local Learning Coefficient: A Singularity-Aware Complexity Measure
The Local Learning Coefficient (LLC) is introduced as a novel complexity measure for deep neural networks (DNNs).
This paper provides an extensive exploration of the LLC's theoretical underpinnings, offering both a clear definition and intuitive insights into its application.
Ultimately, the LLC emerges as a crucial tool for reconciling the apparent contradiction between deep learning's complexity and the principle of parsimony.
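In singular learning theory, the learning coefficient is the volume-scaling exponent of the loss landscape near a minimum; the standard definition below is from Watanabe's framework rather than this paper's text:

```latex
% Volume of near-optimal parameters around a local minimum w*:
V(\epsilon) = \int_{\{w \,:\, L(w) - L(w^\ast) < \epsilon\}} \varphi(w)\, dw
\;\sim\; c\,\epsilon^{\lambda}\,(-\log\epsilon)^{m-1}, \qquad \epsilon \to 0 .
% The exponent \lambda is the (local) learning coefficient; regular models give
% \lambda = d/2, while degenerate (singular) directions reduce \lambda.
```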
arXiv Detail & Related papers (2023-08-23T12:55:41Z)
- Kalman Filter for Online Classification of Non-Stationary Data
In Online Continual Learning (OCL), a learning system receives a stream of data and sequentially performs prediction and training steps.
We introduce a probabilistic Bayesian online learning model by using a neural representation and a state space model over the linear predictor weights.
In experiments in multi-class classification we demonstrate the predictive ability of the model and its flexibility to capture non-stationarity.
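A minimal reading of "a state space model over the linear predictor weights" is a Kalman filter with random-walk dynamics; the Gaussian surrogate likelihood on labels in {-1, +1} below is a simplification for illustration, not the paper's classification likelihood:

```python
import numpy as np

class KalmanLinearClassifier:
    """Online weights w_t follow a random walk; observe y_t ~ x_t.w_t + noise."""
    def __init__(self, dim, q=1e-3, r=1.0):
        self.mu = np.zeros(dim)              # posterior mean of the weights
        self.P = np.eye(dim)                 # posterior covariance
        self.q, self.r = q, r                # drift and observation noise

    def predict(self, x):
        return np.sign(x @ self.mu)

    def update(self, x, y):                  # y in {-1, +1}
        self.P += self.q * np.eye(len(x))    # random-walk drift (tracks drift/non-stationarity)
        s = x @ self.P @ x + self.r          # innovation variance
        k = self.P @ x / s                   # Kalman gain
        self.mu += k * (y - x @ self.mu)     # correct with the residual
        self.P -= np.outer(k, x) @ self.P    # covariance update
```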
arXiv Detail & Related papers (2023-06-14T11:41:42Z)
- Modeling High-Dimensional Data with Unknown Cut Points: A Fusion Penalized Logistic Threshold Regression
In traditional logistic regression models, the link function is often assumed to be linear and continuous in the predictors. We consider a threshold model in which all continuous features are discretized into ordinal levels, which in turn determine the binary responses. We find that the lasso model is well suited to early detection and prediction of chronic diseases such as diabetes.
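A crude version of this pipeline, discretizing continuous features and then fitting a sparse logistic model, can be assembled from scikit-learn; note that the plain l1 penalty below stands in for the paper's fusion penalty, which scikit-learn does not provide:

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))                      # continuous clinical features
y = (X[:, 0] > 0.5).astype(int)                    # binary response with one cut point

model = make_pipeline(
    KBinsDiscretizer(n_bins=5, encode="onehot", strategy="quantile"),
    LogisticRegression(penalty="l1", solver="liblinear", C=0.5),
)
model.fit(X, y)                                    # sparse coefficients locate cut points
```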
arXiv Detail & Related papers (2022-02-17T04:16:40Z)
- Geometric Value Iteration: Dynamic Error-Aware KL Regularization for Reinforcement Learning
We study the dynamic coefficient scheme and present the first error bound for it. We propose an effective scheme that tunes the coefficient according to the magnitude of the error, in favor of more robust learning. Our experiments demonstrate that GVI can effectively exploit the trade-off between learning speed and robustness, compared with uniform averaging under a constant KL coefficient.
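Schematically, KL-regularized value iteration keeps each new policy close to the previous one, and the idea here is to adapt the KL coefficient to the current error. The tabular sketch below uses an illustrative adaptation rule, not the paper's exact scheme:

```python
import numpy as np

def gvi_sketch(P, R, gamma=0.9, iters=300):
    """P: (A, S, S) transition matrices, R: (S, A) rewards.
    KL-regularized updates pi_{k+1} ~ pi_k * exp(Q_k / alpha), with the
    coefficient alpha adapted to the Bellman error (illustrative rule)."""
    S, A = R.shape
    pi = np.full((S, A), 1.0 / A)
    Q = np.zeros((S, A))
    alpha = 1.0
    for _ in range(iters):
        m = Q.max(axis=1)                                    # stabilized soft value:
        V = m + alpha * np.log(                              # V = alpha*log E_pi[e^{Q/alpha}]
            (pi * np.exp((Q - m[:, None]) / alpha)).sum(axis=1))
        Q_new = R + gamma * np.stack([P[a] @ V for a in range(A)], axis=1)
        alpha = np.clip(np.abs(Q_new - Q).max(), 1e-2, 1.0)  # error-aware coefficient
        pi = pi * np.exp((Q_new - Q_new.max(axis=1, keepdims=True)) / alpha)
        pi /= pi.sum(axis=1, keepdims=True)                  # KL step toward greedy policy
        Q = Q_new
    return pi, Q
```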
arXiv Detail & Related papers (2021-07-16T01:24:37Z)
- Efficient Causal Inference from Combined Observational and Interventional Data through Causal Reductions
Unobserved confounding is one of the main challenges when estimating causal effects.
We propose a novel causal reduction method that replaces an arbitrary number of possibly high-dimensional latent confounders with a single latent confounder.
We propose a learning algorithm to estimate the parameterized reduced model jointly from observational and interventional data.
arXiv Detail & Related papers (2021-03-08T14:29:07Z)