Sparsity Is Necessary: Polynomial-Time Stability for Agentic LLMs in Large Action Spaces
- URL: http://arxiv.org/abs/2601.08271v1
- Date: Tue, 13 Jan 2026 06:56:53 GMT
- Title: Sparsity Is Necessary: Polynomial-Time Stability for Agentic LLMs in Large Action Spaces
- Authors: Angshul Majumdar
- Abstract summary: Tool-augmented LLM systems expose a control regime that learning theory has largely ignored. We formalize this setting as Sparse Agentic Control (SAC), where policies admit block-sparse representations over M >> 1 actions. We show that under partial observability, LLMs matter only through a belief/representation error epsilon_b, yielding an additive O(epsilon_b) degradation.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Tool-augmented LLM systems expose a control regime that learning theory has largely ignored: sequential decision-making with a massive discrete action universe (tools, APIs, documents) in which only a small, unknown subset is relevant for any fixed task distribution. We formalize this setting as Sparse Agentic Control (SAC), where policies admit block-sparse representations over M >> 1 actions and rewards depend on sparse main effects and (optionally) sparse synergies. We study ell_{1,2}-regularized policy learning through a convex surrogate and establish sharp, compressed-sensing-style results: (i) estimation and value suboptimality scale as k (log M / T)^{1/2} under a Policy-RSC condition; (ii) exact tool-support recovery holds via primal-dual witness arguments when T > k log M under incoherence and beta-min; and (iii) any dense policy class requires Omega(M) samples, explaining the instability of prompt-only controllers. We further show that under partial observability, LLMs matter only through a belief/representation error epsilon_b, yielding an additive O(epsilon_b) degradation while preserving logarithmic dependence on M. Extensions cover tuning-free, online, robust, group-sparse, and interaction-aware SAC.
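The ell_{1,2}-regularized policy learning the abstract describes can be illustrated with a small self-contained sketch. Everything below is an illustrative assumption, not code from the paper: a synthetic linear reward model in which each of M actions owns a d-dimensional parameter block, solved by proximal gradient descent, where block soft-thresholding drives irrelevant blocks to exactly zero and recovers the sparse tool support.

```python
import numpy as np

# Hypothetical sketch of ell_{1,2} (group-lasso) estimation for block-sparse
# action weights. Setup, names, and the regularization weight are assumptions.

rng = np.random.default_rng(0)
M, d, k, T = 50, 4, 3, 400               # actions, block dim, true support size, samples
W_true = np.zeros((M, d))
support = rng.choice(M, size=k, replace=False)
W_true[support] = rng.normal(size=(k, d))

X = rng.normal(size=(T, M * d))          # flattened per-action block features
y = X @ W_true.ravel() + 0.1 * rng.normal(size=T)

def prox_group(W, tau):
    """Block soft-thresholding: shrink each row's ell_2 norm by tau."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return W * scale

step = 1.0 / np.linalg.norm(X, 2) ** 2   # 1 / Lipschitz constant of the gradient
lam = 0.5 * np.sqrt(np.log(M) / T)       # scale suggested by the k*sqrt(log M / T) rate
W = np.zeros((M, d))
for _ in range(500):
    grad = (X.T @ (X @ W.ravel() - y)).reshape(M, d)
    W = prox_group(W - step * grad, step * lam * T)

recovered = np.flatnonzero(np.linalg.norm(W, axis=1) > 1e-3)
print(sorted(recovered.tolist()))        # indices of the recovered action blocks
```

With T = 400 samples and M = 50 actions, the logarithmic dependence on M means the k = 3 relevant blocks are identifiable well below the Omega(M) samples a dense class would need.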
Related papers
- Farther the Shift, Sparser the Representation: Analyzing OOD Mechanisms in LLMs [100.02824137397464]
We investigate how Large Language Models adapt their internal representations when encountering inputs of increasing difficulty. We reveal a consistent and quantifiable phenomenon: as task difficulty increases, the last hidden states of LLMs become substantially sparser. This sparsity-difficulty relation is observable across diverse models and domains.
arXiv Detail & Related papers (2026-03-03T18:48:15Z) - Sparse Semantic Dimension as a Generalization Certificate for LLMs [53.681678236115836]
We introduce the Sparse Semantic Dimension (SSD), a complexity measure derived from the active feature vocabulary of a Sparse Autoencoder (SAE) trained on the model's layers. We validate this framework on GPT-2 Small and Gemma-2B, demonstrating that our bound provides non-vacuous certificates at realistic sample sizes.
arXiv Detail & Related papers (2026-02-11T21:45:18Z) - Approximation of Log-Partition Function in Policy Mirror Descent Induces Implicit Regularization for LLM Post-Training [33.61029387987583]
Policy mirror descent (PMD) provides a principled framework for reinforcement learning (RL). We investigate a practical algorithm, termed PMD-mean, that approximates the log-partition term with the mean reward under the sampling policy. Experiments on math reasoning tasks show that PMD-mean achieves superior performance with improved stability and time efficiency.
arXiv Detail & Related papers (2026-02-05T17:44:28Z) - Greedy Is Enough: Sparse Action Discovery in Agentic LLMs [11.62669179647184]
Empirical evidence suggests that only a small subset of actions meaningfully influences performance in a given deployment. Motivated by this observation, we study a contextual linear reward model in which the reward is governed by a structured sparsity assumption. Our results identify sparse action discovery as a fundamental principle underlying large-action decision-making.
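A greedy sparse-action-discovery loop of the kind this summary describes can be sketched as orthogonal matching pursuit over a linear reward model. The setup below (feature matrix, true support, noise level) is a hypothetical illustration, not the paper's algorithm or data:

```python
import numpy as np

# Illustrative greedy (OMP-style) recovery of a sparse action support from
# observed rewards under an assumed linear reward model.

rng = np.random.default_rng(1)
M, T, k = 30, 200, 3
Phi = rng.normal(size=(T, M))            # per-round action features
theta = np.zeros(M)
support = [2, 11, 25]                    # hypothetical relevant actions
theta[support] = [1.5, -2.0, 1.0]
r = Phi @ theta + 0.05 * rng.normal(size=T)

selected, residual = [], r.copy()
for _ in range(k):
    scores = np.abs(Phi.T @ residual)    # correlation with current residual
    scores[selected] = -np.inf           # never pick an action twice
    selected.append(int(np.argmax(scores)))
    # refit least squares on the actions selected so far
    coef, *_ = np.linalg.lstsq(Phi[:, selected], r, rcond=None)
    residual = r - Phi[:, selected] @ coef

print(sorted(selected))
```

Each iteration adds the action most correlated with the unexplained reward and refits, so k greedy steps suffice when the relevant actions are well separated, matching the "greedy is enough" intuition.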
arXiv Detail & Related papers (2026-01-13T07:15:32Z) - Reasoning with Confidence: Efficient Verification of LLM Reasoning Steps via Uncertainty Heads [104.9566359759396]
We propose a lightweight alternative for step-level reasoning verification based on data-driven uncertainty scores. Our findings suggest that the internal states of LLMs encode their uncertainty and can serve as reliable signals for reasoning verification.
arXiv Detail & Related papers (2025-11-09T03:38:29Z) - T-REGS: Minimum Spanning Tree Regularization for Self-Supervised Learning [15.016777234800585]
Self-supervised learning (SSL) has emerged as a powerful paradigm for learning representations without labeled data. Recent studies have highlighted two pivotal properties for effective representations. We introduce T-REGS, a simple regularization framework for SSL based on the length of the Minimum Spanning Tree (MST) over the learned representation.
arXiv Detail & Related papers (2025-10-27T16:16:40Z) - MaP: A Unified Framework for Reliable Evaluation of Pre-training Dynamics [72.00014675808228]
Instability in the evaluation process of Large Language Models obscures true learning dynamics. We introduce MaP, a framework that integrates Merging and the Pass@k metric. Experiments show that MaP yields significantly smoother performance curves, reduces inter-run variance, and ensures more consistent rankings.
arXiv Detail & Related papers (2025-10-10T11:40:27Z) - Machine Unlearning Meets Adversarial Robustness via Constrained Interventions on LLMs [0.0]
We investigate various constrained optimization formulations that address unlearning of sensitive information and robustness to jail-breaking attacks. We find that the simplest point-wise constraint-based intervention we propose leads to better performance than max-min interventions, while having a lower computational cost.
arXiv Detail & Related papers (2025-10-03T23:32:21Z) - Unsupervised Conformal Inference: Bootstrapping and Alignment to Control LLM Uncertainty [49.19257648205146]
We propose an unsupervised conformal inference framework for generation. Our gates achieve close-to-nominal coverage and provide tighter, more stable thresholds than split UCP. The result is a label-free, API-compatible gate for test-time filtering.
arXiv Detail & Related papers (2025-09-26T23:40:47Z) - Trajectory Bellman Residual Minimization: A Simple Value-Based Method for LLM Reasoning [55.33984461046492]
Policy-based methods currently dominate reinforcement learning pipelines for large language model (LLM) reasoning. We introduce Trajectory Bellman Residual Minimization (TBRM), an algorithm that naturally adapts this idea to LLMs. We prove convergence to the near-optimal KL-regularized policy from arbitrary off-policy data via an improved change-of-trajectory-measure analysis.
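For intuition, the value-based idea behind this entry can be sketched as Bellman residual minimization on a toy tabular MDP with known dynamics. This is a generic semi-gradient sketch under assumed dynamics, not the trajectory-level TBRM algorithm or its off-policy analysis:

```python
import numpy as np

# Toy Bellman residual minimization: damped semi-gradient steps on
# Q - (R + gamma * P * max_a Q) for a small randomly generated MDP.
# The MDP itself is a synthetic assumption for illustration.

rng = np.random.default_rng(2)
nS, nA, gamma = 2, 2, 0.9
P = rng.dirichlet(np.ones(nS), size=(nS, nA))    # P[s, a] = next-state distribution
R = rng.uniform(size=(nS, nA))                   # reward table

Q = np.zeros((nS, nA))
for _ in range(500):
    V = Q.max(axis=1)                            # greedy state values
    T_Q = R + gamma * (P @ V)                    # Bellman backup
    Q = Q - 0.5 * (Q - T_Q)                      # damped step toward the backup

residual = np.max(np.abs(Q - (R + gamma * (P @ Q.max(axis=1)))))
print(residual)                                  # should be near zero at the fixed point
```

Because the Bellman operator is a gamma-contraction, the damped iteration converges to the unique fixed point Q*; value-based methods for LLM reasoning minimize an analogous residual along sampled trajectories instead of over an enumerable state space.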
arXiv Detail & Related papers (2025-05-21T09:41:53Z) - Reinforcement Learning from Partial Observation: Linear Function Approximation with Provable Sample Efficiency [111.83670279016599]
We study reinforcement learning for partially observable Markov decision processes (POMDPs) with infinite observation and state spaces.
We make the first attempt at partial observability and function approximation for a class of POMDPs with a linear structure.
arXiv Detail & Related papers (2022-04-20T21:15:38Z) - RL for Latent MDPs: Regret Guarantees and a Lower Bound [74.41782017817808]
We consider the regret problem for reinforcement learning in latent Markov Decision Processes (LMDPs).
In an LMDP, an MDP is randomly drawn from a set of $M$ possible MDPs at the beginning of the interaction, but the identity of the chosen MDP is not revealed to the agent.
We show that the key link is a notion of separation between the MDP system dynamics.
arXiv Detail & Related papers (2021-02-09T16:49:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it hosts and is not responsible for any consequences of its use.