Related papers: Filtering Beats Fine Tuning: A Bayesian Kalman View of In Context Learning in LLMs

Filtering Beats Fine Tuning: A Bayesian Kalman View of In Context Learning in LLMs

URL: http://arxiv.org/abs/2601.06100v1
Date: Fri, 02 Jan 2026 21:18:48 GMT
Title: Filtering Beats Fine Tuning: A Bayesian Kalman View of In Context Learning in LLMs
Authors: Andrew Kiruluta,
Abstract summary: We present a theory-first framework that interprets inference-time adaptation in large language models as online Bayesian state estimation.<n>We formulate task- and context-specific learning as the sequential inference of a low-dimensional latent adaptation state governed by a linearized state-space model.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We present a theory-first framework that interprets inference-time adaptation in large language models (LLMs) as online Bayesian state estimation. Rather than modeling rapid adaptation as implicit optimization or meta-learning, we formulate task- and context-specific learning as the sequential inference of a low-dimensional latent adaptation state governed by a linearized state-space model. Under Gaussian assumptions, adaptation follows a Kalman recursion with closed-form updates for both the posterior mean and covariance. This perspective elevates epistemic uncertainty to an explicit dynamical variable. We show that inference-time learning is driven by covariance collapse, i.e., rapid contraction of posterior uncertainty induced by informative tokens, which typically precedes convergence of the posterior mean. Using observability conditions on token-level Jacobians, we establish stability of the Bayesian filter, prove exponential covariance contraction rates, and derive mean-square error bounds. Gradient descent, natural-gradient methods, and meta-learning updates arise as singular, noise-free limits of the filtering dynamics, positioning optimization-based adaptation as a degenerate approximation of Bayesian inference. The resulting theory provides a unified probabilistic account of in-context learning, parameter-efficient adaptation, and test-time learning without parameter updates. It yields explicit guarantees on stability and sample efficiency, offers a principled interpretation of prompt informativeness via information accumulation, and clarifies the role of uncertainty dynamics absent from existing accounts. Minimal illustrative experiments corroborate the qualitative predictions of the theory.

Related papers

Adaptive Benign Overfitting (ABO): Overparameterized RLS for Online Learning in Non-stationary Time-series [0.0]
ABO is highly accurate (comparable to baseline kernel methods) while achieving speed improvements of between 20 and 40 percent.<n>Results provide a unified view linking adaptive filtering, kernel approximation, and benign overfitting within a stable online learning framework.
arXiv Detail & Related papers (2026-01-29T15:58:01Z)
The Procrustean Bed of Time Series: The Optimization Bias of Point-wise Loss [53.542743390809356]
This paper aims to provide a first-principles analysis of the Expectation of Optimization Bias (EOB)<n>Our analysis reveals a fundamental paradigm paradox: the more deterministic and structured the time series, the more severe the bias by point-wise loss function.<n>We present a concrete solution that simultaneously achieves both principles via DFT or DWT.
arXiv Detail & Related papers (2025-12-21T06:08:22Z)
Bridging the Gap Between Bayesian Deep Learning and Ensemble Weather Forecasts [100.26854618129039]
Weather forecasting is fundamentally challenged by the chaotic nature of the atmosphere.<n>Recent advances in Bayesian Deep Learning (BDL) offer a promising but often disconnected alternative.<n>We bridge these paradigms through a unified hybrid BDL framework for ensemble weather forecasting.
arXiv Detail & Related papers (2025-11-18T07:49:52Z)
In-Context Learning Is Provably Bayesian Inference: A Generalization Theory for Meta-Learning [51.56484100374058]
We introduce a principled risk decomposition that separates the total ICL risk into two components: Bayes Gap and Posterior Variance.<n>For a uniform-attention Transformer, we derive a non-asymptotic upper bound on this gap, which explicitly clarifies the dependence on the number of pretraining prompts.<n>The Posterior Variance is a model-independent risk representing the intrinsic task uncertainty.
arXiv Detail & Related papers (2025-10-13T03:42:31Z)
Variational Deep Learning via Implicit Regularization [11.296548737163599]
Modern deep learning models generalize remarkably well in-distribution, despite being overparametrized and trained with little to no explicit regularization.<n>We propose to regularize variational neural networks solely by relying on the implicit bias of (stochastic) gradient descent.
arXiv Detail & Related papers (2025-05-26T17:15:57Z)
Adaptive Conformal Inference by Betting [51.272991377903274]
We consider the problem of adaptive conformal inference without any assumptions about the data generating process.<n>Existing approaches for adaptive conformal inference are based on optimizing the pinball loss using variants of online gradient descent.<n>We propose a different approach for adaptive conformal inference that leverages parameter-free online convex optimization techniques.
arXiv Detail & Related papers (2024-12-26T18:42:08Z)
Revisiting Essential and Nonessential Settings of Evidential Deep Learning [70.82728812001807]
Evidential Deep Learning (EDL) is an emerging method for uncertainty estimation. We propose Re-EDL, a simplified yet more effective variant of EDL.
arXiv Detail & Related papers (2024-10-01T04:27:07Z)
Sequential Representation Learning via Static-Dynamic Conditional Disentanglement [58.19137637859017]
This paper explores self-supervised disentangled representation learning within sequential data, focusing on separating time-independent and time-varying factors in videos. We propose a new model that breaks the usual independence assumption between those factors by explicitly accounting for the causal relationship between the static/dynamic variables. Experiments show that the proposed approach outperforms previous complex state-of-the-art techniques in scenarios where the dynamics of a scene are influenced by its content.
arXiv Detail & Related papers (2024-08-10T17:04:39Z)
Generalized Gaussian Temporal Difference Error for Uncertainty-aware Reinforcement Learning [0.19418036471925312]
We introduce a novel framework for generalized Gaussian error modeling in deep reinforcement learning.<n>We improve the estimation and mitigation of data-dependent aleatoric uncertainty.<n> Experiments with policy gradient algorithms demonstrate significant performance gains.
arXiv Detail & Related papers (2024-08-05T08:12:25Z)
Towards Fair Disentangled Online Learning for Changing Environments [28.207499975916324]
We argue that changing environments in online learning can be attributed to partial changes in learned parameters that are specific to environments. We propose a novel algorithm under the assumption that data collected at each time can be disentangled with two representations. A novel regret is proposed in which it takes a mixed form of dynamic and static regret metrics followed by a fairness-aware long-term constraint.
arXiv Detail & Related papers (2023-05-31T19:04:16Z)
Variational Nonlinear Kalman Filtering with Unknown Process Noise Covariance [24.23243651301339]
This paper presents a solution for identification of nonlinear state estimation and model parameters based on the approximate Bayesian inference principle. The performance of the proposed method is verified on radar target tracking applications by both simulated and real-world data.
arXiv Detail & Related papers (2023-05-06T03:34:39Z)
Learning Partially Known Stochastic Dynamics with Empirical PAC Bayes [12.44342023476206]
This paper presents a recipe to improve the prediction accuracy of such models in three steps. We observe in our experiments that this recipe effectively translates partial and noisy prior knowledge into an improved model fit.
arXiv Detail & Related papers (2020-06-17T14:47:06Z)

This list is automatically generated from the titles and abstracts of the papers in this site.