Theoretical Bounds for Stable In-Context Learning
- URL: http://arxiv.org/abs/2509.20677v1
- Date: Thu, 25 Sep 2025 02:25:05 GMT
- Title: Theoretical Bounds for Stable In-Context Learning
- Authors: Tongxi Wang, Zhuoyang Xia
- Abstract summary: In-context learning (ICL) is flexible, but its reliability is sensitive to prompt length. This paper establishes a non-asymptotic lower bound that links the minimal number of demonstrations to ICL stability. We propose a two-stage observable estimator with a one-shot calibration that produces practitioner-ready prompt-length estimates.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In-context learning (ICL) is flexible, but its reliability is highly sensitive to prompt length. This paper establishes a non-asymptotic lower bound that links the minimal number of demonstrations to ICL stability under fixed high-dimensional sub-Gaussian representations. The bound gives explicit sufficient conditions in terms of spectral properties of the covariance, providing a computable criterion for practice. Building on this analysis, we propose a two-stage observable estimator with a one-shot calibration that produces practitioner-ready prompt-length estimates without distributional priors. Experiments across diverse datasets, encoders, and generators show close alignment between the predicted thresholds and empirical knee-points, with the theory acting as a conservative but reliable upper bound; the calibrated variant further tightens this gap. These results connect spectral coverage to stable ICL, bridge theory and deployment, and improve the interpretability and reliability of large-scale prompting in realistic finite-sample regimes.
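The abstract describes the criterion only at a high level, so the following is a minimal sketch under one plausible reading: estimate the covariance spectrum of the demonstration embeddings, then invert a spectral-coverage condition to get a prompt length. The constant `C`, the failure probability `delta`, and the effective-rank proxy are illustrative assumptions, not the paper's calibrated formula.

```python
import numpy as np

def suggested_prompt_length(embeddings, C=1.0, delta=0.05):
    """Illustrative spectral criterion for a stable prompt length.

    embeddings: (n, d) demonstration representations. C and delta are
    assumed constants, standing in for the paper's one-shot calibration.
    """
    X = embeddings - embeddings.mean(axis=0)        # center
    cov = X.T @ X / max(len(X) - 1, 1)              # empirical covariance
    eigs = np.clip(np.linalg.eigvalsh(cov), 0.0, None)
    r_eff = eigs.sum() / max(eigs.max(), 1e-12)     # effective rank tr/op-norm
    d = cov.shape[0]
    # Assumed sub-Gaussian-style sufficient condition: m >= C * r_eff * log(d/delta).
    return int(np.ceil(C * r_eff * np.log(d / delta)))

rng = np.random.default_rng(0)
emb = rng.normal(size=(200, 64))                    # stand-in encoder outputs
print(suggested_prompt_length(emb))
```

On this reading, the paper's one-shot calibration would amount to fitting `C` once on held-out data rather than fixing it a priori.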
Related papers
- Sharp Convergence Rates for Masked Diffusion Models [53.117058231393834]
We develop a total-variation-based analysis for the Euler method that overcomes limitations of prior analyses. Our results relax assumptions on score estimation, improve parameter dependencies, and establish convergence guarantees. Overall, our analysis introduces a direct TV-based error decomposition along the CTMC trajectory and a decoupling-based path-wise analysis for FHS.
arXiv Detail & Related papers (2026-02-26T00:47:51Z) - Stability and Generalization of Push-Sum Based Decentralized Optimization over Directed Graphs [55.77845440440496]
Push-based decentralized communication enables optimization over communication networks where information exchange may be asymmetric. We develop a unified uniform-stability framework for the Stochastic Gradient Push (SGP) algorithm. A key technical ingredient is an imbalance-aware generalization bound expressed through two quantities.
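As a reference point for what Gradient Push computes, here is a minimal push-sum consensus sketch over a directed graph (plain averaging, no gradient steps); the column-stochastic mixing matrix and the ratio estimate are the standard construction, while the 4-node cycle is an arbitrary choice.

```python
import numpy as np

# Column-stochastic mixing for a directed 4-cycle with self-loops:
# each node keeps half its mass and pushes half to its out-neighbor.
A = np.zeros((4, 4))
for i in range(4):
    A[i, i] = 0.5
    A[(i + 1) % 4, i] = 0.5

x = np.array([1.0, 5.0, 3.0, 7.0])  # initial values (average is 4.0)
w = np.ones(4)                       # push-sum weights

for _ in range(50):
    x, w = A @ x, A @ w

print(x / w)  # every entry converges to 4.0 despite the asymmetric links
```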
arXiv Detail & Related papers (2026-02-24T05:32:03Z) - On the Spectral Flattening of Quantized Embeddings [25.64641307046705]
Training Large Language Models at ultra-low precision is critically impeded by instability rooted in the conflict between discrete quantization constraints and the intrinsic heavy-tailed spectral nature of linguistic data. This work not only quantifies the spectral sensitivity of LLMs but also establishes spectral fidelity as a necessary condition for stable low-bit optimization.
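A quick way to see the effect being described is to compare the singular-value spectrum of an embedding matrix before and after uniform quantization; the power-law synthetic matrix and the symmetric k-bit quantizer below are illustrative stand-ins for real LLM embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)
# Heavy-tailed stand-in for an embedding table: power-law singular values.
U, _ = np.linalg.qr(rng.normal(size=(512, 64)))
V, _ = np.linalg.qr(rng.normal(size=(64, 64)))
s = np.arange(1, 65) ** -1.0                   # sigma_i ~ 1/i
E = U @ np.diag(s) @ V.T

def quantize(x, bits):
    """Symmetric uniform quantizer with 2**bits levels."""
    scale = np.abs(x).max() / (2 ** (bits - 1) - 1)
    return np.round(x / scale) * scale

for bits in (8, 4, 2):
    sq = np.linalg.svd(quantize(E, bits), compute_uv=False)
    # Flattening: low-bit quantization noise lifts the small singular values.
    print(bits, "bits, tail mass:", sq[32:].sum() / sq.sum())
```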
arXiv Detail & Related papers (2026-02-01T02:21:53Z) - Toward Scalable and Valid Conditional Independence Testing with Spectral Representations [25.258360465513338]
Conditional independence (CI) is untestable in many settings without additional assumptions. We introduce a practical bi-level contrastive algorithm to learn representations derived from the singular value decomposition of the partial covariance operator. Preliminary experiments suggest that this approach offers a practical and statistically grounded path toward scalable CI testing.
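The object at the core of this approach has a simple finite-dimensional analogue: under joint Gaussianity, X is independent of Y given Z exactly when the partial covariance Sigma_XY - Sigma_XZ Sigma_ZZ^{-1} Sigma_ZY vanishes, so its singular values form a natural test statistic. A sketch on Gaussian linear data (no learned representations, unlike the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
Z = rng.normal(size=(n, 3))
X = Z @ rng.normal(size=(3, 2)) + 0.1 * rng.normal(size=(n, 2))
# Y depends on X only through Z, so X is independent of Y given Z.
Y = Z @ rng.normal(size=(3, 2)) + 0.1 * rng.normal(size=(n, 2))

def partial_cov_singvals(X, Y, Z):
    X, Y, Z = (M - M.mean(axis=0) for M in (X, Y, Z))
    Sxy, Sxz = X.T @ Y / n, X.T @ Z / n
    Szy, Szz = Z.T @ Y / n, Z.T @ Z / n
    S = Sxy - Sxz @ np.linalg.solve(Szz, Szy)   # partial covariance of X, Y given Z
    return np.linalg.svd(S, compute_uv=False)

print(partial_cov_singvals(X, Y, Z))            # near zero: consistent with CI
print(partial_cov_singvals(X, Y + 0.5 * X, Z))  # clearly nonzero: CI violated
```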
arXiv Detail & Related papers (2025-12-22T16:05:18Z) - From Tail Universality to Bernstein-von Mises: A Unified Statistical Theory of Semi-Implicit Variational Inference [0.12183405753834557]
Semi-implicit variational inference (SIVI) constructs approximate posteriors of the form $q(x) = \int k(x \mid z)\, r(dz)$. This paper develops a unified "approximation-optimization-statistics" theory for such families.
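The family is easy to instantiate: sampling from $q(x) = \int k(x \mid z)\, r(dz)$ means drawing the latent $z$ from the mixing distribution $r$ and then $x$ from the kernel $k(\cdot \mid z)$. A minimal sketch with Gaussian choices for both (the particular kernel mean and noise scale are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(0)

def sivi_sample(n):
    """Draw x ~ q(x) = int k(x|z) r(dz), with illustrative Gaussian choices."""
    z = rng.normal(size=n)                 # z ~ r = N(0, 1)
    mu = np.tanh(2.0 * z)                  # kernel mean as a function of z
    return mu + 0.3 * rng.normal(size=n)   # x | z ~ k = N(mu(z), 0.3**2)

x = sivi_sample(100_000)
# The marginal q is bimodal even though k and r are both Gaussian: this
# expressiveness is what SIVI trades against a tractable density.
print(x.mean(), x.std())
```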
arXiv Detail & Related papers (2025-12-05T19:26:25Z) - Spectral Thresholds in Correlated Spiked Models and Fundamental Limits of Partial Least Squares [15.163541835643635]
We show that Partial Least Squares (PLS) fails to recover any signal, despite detectability being possible in principle. These findings clarify the theoretical limits of PLS and provide guidance for the design of reliable multi-modal inference methods in high dimensions.
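The failure mode can be probed empirically: in a spiked model, the first PLS direction is the top singular vector of the sample cross-covariance X^T Y / n, and its alignment with the planted spike can be tracked across signal strengths. The dimensions and spike strengths below are illustrative, not the paper's exact regime.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 500, 200, 200
u = rng.normal(size=p); u /= np.linalg.norm(u)    # planted X-direction
v = rng.normal(size=q); v /= np.linalg.norm(v)    # planted Y-direction

for snr in (0.2, 1.0, 5.0):
    t = rng.normal(size=(n, 1))                   # shared latent factor
    X = snr * t @ u[None, :] + rng.normal(size=(n, p))
    Y = snr * t @ v[None, :] + rng.normal(size=(n, q))
    C = X.T @ Y / n                               # sample cross-covariance
    U, s, Vt = np.linalg.svd(C)
    # First PLS direction = top left singular vector of C.
    print(f"snr={snr}: alignment |<u_hat, u>| = {abs(U[:, 0] @ u):.3f}")
```

At low signal strength the empirical alignment collapses toward zero, the phenomenon whose threshold the paper characterizes precisely.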
arXiv Detail & Related papers (2025-10-20T14:08:58Z) - Assessing One-Dimensional Cluster Stability by Extreme-Point Trimming [0.0]
We develop a probabilistic method for assessing the tail behavior and geometric stability of one-dimensional i.i.d. samples. We derive analytical expressions, including finite-sample corrections, for the expected shrinkage under both the uniform and Gaussian hypotheses. We further integrate our criterion into a clustering pipeline (e.g. DBSCAN), demonstrating its ability to validate one-dimensional clusters without any density estimation or parameter tuning.
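The shrinkage statistic is easy to reproduce by Monte Carlo: compare the range of an i.i.d. sample before and after removing its two extreme points under each null. For Uniform(0,1), order statistics give E[X_(k)] = k/(n+1), so the expected shrinkage is exactly 2/(n+1); the Gaussian case is simulated only, and the paper's finite-sample corrections are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 50, 20_000

def shrinkage(sample):
    """Range lost when the smallest and largest points are removed."""
    s = np.sort(sample)
    return (s[-1] - s[0]) - (s[-2] - s[1])

for name, draw in [("uniform ", lambda: rng.random(n)),
                   ("gaussian", lambda: rng.normal(size=n))]:
    est = np.mean([shrinkage(draw()) for _ in range(reps)])
    print(name, est)

print("uniform closed form:", 2 / (n + 1))   # E[X_(k)] = k/(n+1) on (0,1)
```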
arXiv Detail & Related papers (2025-08-29T21:52:15Z) - A New Formulation of Lipschitz Constrained With Functional Gradient Learning for GANs [52.55025869932486]
This paper introduces a promising alternative method for training Generative Adversarial Networks (GANs) on large-scale datasets with clear theoretical guarantees. We propose a novel Lipschitz-constrained Functional Gradient GANs learning (Li-CFG) method to stabilize GAN training. We demonstrate that the neighborhood size of the latent vector can be reduced by increasing the norm of the discriminator gradient.
arXiv Detail & Related papers (2025-01-20T02:48:07Z) - Statistical Inference for Temporal Difference Learning with Linear Function Approximation [62.69448336714418]
We investigate the statistical properties of Temporal Difference learning with Polyak-Ruppert averaging. We make three significant contributions that improve the current state-of-the-art results.
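For concreteness, here is textbook TD(0) with linear features and Polyak-Ruppert averaging on a small synthetic Markov reward process; the chain, features, and step-size schedule are arbitrary choices, and the paper's inferential results (e.g., confidence regions) are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)
P = np.array([[0.9, 0.1, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.1, 0.9]])     # 3-state ergodic chain
r = np.array([0.0, 1.0, 2.0])       # expected reward per state
gamma = 0.9
Phi = np.array([[1.0, 0.0],
                [1.0, 1.0],
                [1.0, 2.0]])        # linear features phi(s)

theta, avg, s = np.zeros(2), np.zeros(2), 0
for t in range(1, 200_001):
    s_next = rng.choice(3, p=P[s])
    td_err = r[s] + gamma * Phi[s_next] @ theta - Phi[s] @ theta
    theta += 0.05 / np.sqrt(t) * td_err * Phi[s]   # TD(0) update
    avg += (theta - avg) / t                       # Polyak-Ruppert average
    s = s_next

print("averaged iterate:", avg)
# Exact TD fixed point: Phi' D (I - gamma P) Phi theta* = Phi' D r.
pi = np.linalg.matrix_power(P, 1000)[0]            # stationary distribution
D = np.diag(pi)
print("fixed point:     ", np.linalg.solve(Phi.T @ D @ (np.eye(3) - gamma * P) @ Phi,
                                           Phi.T @ D @ r))
```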
arXiv Detail & Related papers (2024-10-21T15:34:44Z) - General bounds on the quality of Bayesian coresets [13.497835690074151]
This work presents general upper and lower bounds on the Kullback-Leibler (KL) divergence of Bayesian coreset approximations. Lower bounds are applied to obtain fundamental limitations on the quality of coreset approximations, and the upper bounds are used to analyze the performance of recent subsample-optimize methods.
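A concrete instance of the quantity being bounded: for conjugate Gaussian models, the coreset posterior and the full-data posterior are both Gaussian, so their KL divergence has a closed form. The sketch below scores a uniformly subsampled coreset for inferring a Gaussian mean; the naive uniform weighting stands in for (and is weaker than) the subsample-optimize methods the paper analyzes.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma2, tau2 = 10_000, 1.0, 100.0      # likelihood and prior variances
x = rng.normal(2.0, np.sqrt(sigma2), n)

def posterior(points, weights):
    """Gaussian posterior over the mean: N(0, tau2) prior, weighted data."""
    prec = 1 / tau2 + weights.sum() / sigma2
    return (weights @ points / sigma2) / prec, 1 / prec

def kl_gauss(m1, v1, m2, v2):
    return 0.5 * (np.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1)

m_full, v_full = posterior(x, np.ones(n))
for m in (10, 100, 1000):
    idx = rng.choice(n, size=m, replace=False)
    m_core, v_core = posterior(x[idx], np.full(m, n / m))  # weight n/m each
    print(m, kl_gauss(m_core, v_core, m_full, v_full))
```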
arXiv Detail & Related papers (2024-05-20T04:46:14Z) - Understanding Contrastive Learning via Distributionally Robust Optimization [29.202594242468678]
This study reveals the inherent tolerance of contrastive learning (CL) towards sampling bias, wherein negative samples may encompass similar semantics (e.g., labels).
We bridge this research gap by analyzing CL through the lens of distributionally robust optimization (DRO), yielding several key insights.
We also identify CL's potential shortcomings, including over-conservatism and sensitivity to outliers, and introduce a novel Adjusted InfoNCE loss (ADNCE) to mitigate these issues.
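For reference, the loss being adjusted: a minimal numpy InfoNCE over one batch of paired embeddings. Only the standard form is shown, with an arbitrary temperature; the ADNCE reweighting of negatives is not reproduced here.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """Standard InfoNCE: each anchor's positive against in-batch negatives."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                 # (B, B) similarities
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))            # diagonal = true pairs

rng = np.random.default_rng(0)
z = rng.normal(size=(128, 32))
print(info_nce(z, z + 0.1 * rng.normal(size=(128, 32))))  # aligned: low loss
print(info_nce(z, rng.normal(size=(128, 32))))            # random: ~log(128)
```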
arXiv Detail & Related papers (2023-10-17T07:32:59Z) - Last-Iterate Convergence of Adaptive Riemannian Gradient Descent for Equilibrium Computation [52.73824786627612]
This paper establishes new convergence results for geodesic strongly monotone games. Our key result shows that RGD attains last-iterate linear convergence in a geometry-agnostic fashion. Overall, this paper presents the first geometry-agnostic last-iterate convergence analysis for games beyond Euclidean settings.
arXiv Detail & Related papers (2023-06-29T01:20:44Z) - Federated Conformal Predictors for Distributed Uncertainty Quantification [83.50609351513886]
Conformal prediction is emerging as a popular paradigm for providing rigorous uncertainty quantification in machine learning.
In this paper, we extend conformal prediction to the federated learning setting.
We propose a weaker notion of partial exchangeability, better suited to the FL setting, and use it to develop the Federated Conformal Prediction framework.
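A sketch of the baseline the framework generalizes: split conformal prediction with calibration scores pooled across clients. Full pooling implicitly treats all clients' data as exchangeable, which is exactly what partial exchangeability relaxes; the absolute-residual score and the simulated client noise scales are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.1  # target 90% coverage

# Each client holds calibration residuals |y - f(x)| from a shared model;
# here they are simulated with client-specific noise scales.
noise = (0.8, 1.0, 1.3)
scores = np.concatenate([np.abs(rng.normal(0, s, size=200)) for s in noise])

# Pooled conformal quantile with the finite-sample correction.
n = len(scores)
k = int(np.ceil((n + 1) * (1 - alpha)))
qhat = np.sort(scores)[k - 1]
print("pooled radius:", qhat)   # prediction set: f(x) +/- qhat

for i, s in enumerate(noise):   # per-client coverage on fresh draws
    test = np.abs(rng.normal(0, s, size=10_000))
    print(f"client {i} coverage: {(test <= qhat).mean():.3f}")
```

The output shows the pooled set over-covering quiet clients and under-covering noisy ones, the heterogeneity that motivates the federated construction.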
arXiv Detail & Related papers (2023-05-27T19:57:27Z) - Efficient Bound of Lipschitz Constant for Convolutional Layers by Gram Iteration [122.51142131506639]
We introduce a precise, fast, and differentiable upper bound for the spectral norm of convolutional layers using circulant matrix theory.
We show through a comprehensive set of experiments that our approach outperforms other state-of-the-art methods in terms of precision, computational cost, and scalability.
It proves highly effective for the Lipschitz regularization of convolutional neural networks, with competitive results against concurrent approaches.
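The core idea is simple to state for a dense matrix: repeatedly form the Gram matrix W <- W^T W, which squares the singular values, rescaling to avoid overflow, and read a guaranteed upper bound on the spectral norm off the Frobenius norm. The dense sketch below omits the paper's actual contribution, carrying this out for convolutional layers via circulant/Fourier structure.

```python
import numpy as np

def gram_spectral_bound(W, iters=8):
    """Upper bound on ||W||_2 by Gram iteration on a dense matrix.

    After k Gram steps the singular values are sigma_i ** (2**k), so
    ||W_k||_F ** (2**-k) = (sum_i sigma_i**(2**(k+1))) ** (2**-(k+1)),
    which is >= sigma_max for every k and tightens as k grows.
    """
    log_scale = 0.0
    for _ in range(iters):
        f = np.linalg.norm(W)                  # Frobenius rescaling
        W = W / f
        log_scale = 2.0 * (log_scale + np.log(f))
        W = W.T @ W                            # Gram step
    return np.exp((log_scale + np.log(np.linalg.norm(W))) / 2.0 ** iters)

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))
print(gram_spectral_bound(W))                  # certified upper bound
print(np.linalg.svd(W, compute_uv=False)[0])   # exact sigma_max, for comparison
```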
arXiv Detail & Related papers (2023-05-25T15:32:21Z) - Can convolutional ResNets approximately preserve input distances? A frequency analysis perspective [31.897568775099558]
We show that the theoretical link between the regularisation scheme used and bi-Lipschitzness is only valid under conditions which do not hold in practice.
We present a simple constructive algorithm to search for counter examples to the distance preservation condition.
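A cruder random-search variant already illustrates the quantity at stake: push pairs of inputs through a small random conv-ReLU network and inspect the ratio of output to input distances, whose spread witnesses the failure of bi-Lipschitzness. The two-layer network below is an illustrative stand-in, not a ResNet, and random sampling is weaker than the paper's constructive search.

```python
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(0)
k1, k2 = rng.normal(size=(3, 3)) / 3, rng.normal(size=(3, 3)) / 3

def net(x):
    h = np.maximum(convolve2d(x, k1, mode="same"), 0.0)   # conv + ReLU
    return np.maximum(convolve2d(h, k2, mode="same"), 0.0)

ratios = []
for _ in range(500):
    x1, x2 = rng.normal(size=(2, 16, 16))
    ratios.append(np.linalg.norm(net(x1) - net(x2)) / np.linalg.norm(x1 - x2))

ratios = np.array(ratios)
# A bi-Lipschitz map keeps these ratios inside a band [1/L, L]; small
# minima are candidate counterexamples to distance preservation.
print(ratios.min(), ratios.mean(), ratios.max())
```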
arXiv Detail & Related papers (2021-06-04T13:12:42Z) - Finite Sample Analysis of Minimax Offline Reinforcement Learning: Completeness, Fast Rates and First-Order Efficiency [83.02999769628593]
We offer a theoretical characterization of off-policy evaluation (OPE) in reinforcement learning.
We show that the minimax approach enables us to achieve a fast rate of convergence for weights and quality functions.
We present the first finite-sample result with first-order efficiency in non-tabular environments.
arXiv Detail & Related papers (2021-02-05T03:20:39Z)