Sparsity Is Necessary: Polynomial-Time Stability for Agentic LLMs in Large Action Spaces
- URL: http://arxiv.org/abs/2601.08271v1
- Date: Tue, 13 Jan 2026 06:56:53 GMT
- Title: Sparsity Is Necessary: Polynomial-Time Stability for Agentic LLMs in Large Action Spaces
- Authors: Angshul Majumdar
- Abstract summary: Tool-augmented LLM systems expose a control regime that learning theory has largely ignored. We formalize this setting as Sparse Agentic Control (SAC), where policies admit block-sparse representations over M >> 1 actions. We show that under partial observability, LLMs matter only through a belief/representation error epsilon_b, yielding an additive O(epsilon_b) degradation.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Tool-augmented LLM systems expose a control regime that learning theory has largely ignored: sequential decision-making with a massive discrete action universe (tools, APIs, documents) in which only a small, unknown subset is relevant for any fixed task distribution. We formalize this setting as Sparse Agentic Control (SAC), where policies admit block-sparse representations over M >> 1 actions and rewards depend on sparse main effects and (optionally) sparse synergies. We study ell_{1,2}-regularized policy learning through a convex surrogate and establish sharp, compressed-sensing-style results: (i) estimation and value suboptimality scale as k (log M / T)^{1/2} under a Policy-RSC condition; (ii) exact tool-support recovery holds via primal-dual witness arguments when T > k log M under incoherence and beta-min; and (iii) any dense policy class requires Omega(M) samples, explaining the instability of prompt-only controllers. We further show that under partial observability, LLMs matter only through a belief/representation error epsilon_b, yielding an additive O(epsilon_b) degradation while preserving logarithmic dependence on M. Extensions cover tuning-free, online, robust, group-sparse, and interaction-aware SAC.
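The ell_{1,2}-regularized policy learning the abstract describes can be illustrated with a small self-contained sketch. Everything below is an illustrative assumption, not code from the paper: a synthetic linear reward model in which each of M actions owns a d-dimensional parameter block, solved by proximal gradient descent, where block soft-thresholding drives irrelevant blocks to exactly zero and recovers the sparse tool support.

```python
import numpy as np

# Hypothetical sketch of ell_{1,2} (group-lasso) estimation for block-sparse
# action weights. Setup, names, and the regularization weight are assumptions.

rng = np.random.default_rng(0)
M, d, k, T = 50, 4, 3, 400               # actions, block dim, true support size, samples
W_true = np.zeros((M, d))
support = rng.choice(M, size=k, replace=False)
W_true[support] = rng.normal(size=(k, d))

X = rng.normal(size=(T, M * d))          # flattened per-action block features
y = X @ W_true.ravel() + 0.1 * rng.normal(size=T)

def prox_group(W, tau):
    """Block soft-thresholding: shrink each row's ell_2 norm by tau."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return W * scale

step = 1.0 / np.linalg.norm(X, 2) ** 2   # 1 / Lipschitz constant of the gradient
lam = 0.5 * np.sqrt(np.log(M) / T)       # scale suggested by the k*sqrt(log M / T) rate
W = np.zeros((M, d))
for _ in range(500):
    grad = (X.T @ (X @ W.ravel() - y)).reshape(M, d)
    W = prox_group(W - step * grad, step * lam * T)

recovered = np.flatnonzero(np.linalg.norm(W, axis=1) > 1e-3)
print(sorted(recovered.tolist()))        # indices of the recovered action blocks
```

With T = 400 samples and M = 50 actions, the logarithmic dependence on M means the k = 3 relevant blocks are identifiable well below the Omega(M) samples a dense class would need.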
Related papers
- Farther the Shift, Sparser the Representation: Analyzing OOD Mechanisms in LLMs [100.02824137397464]
We investigate how Large Language Models adapt their internal representations when encountering inputs of increasing difficulty. We reveal a consistent and quantifiable phenomenon: as task difficulty increases, the last hidden states of LLMs become substantially sparser. This sparsity-difficulty relation is observable across diverse models and domains.
arXiv Detail & Related papers (2026-03-03T18:48:15Z) - Sparse Semantic Dimension as a Generalization Certificate for LLMs [53.681678236115836]
We introduce the Sparse Semantic Dimension (SSD), a complexity measure derived from the active feature vocabulary of a Sparse Autoencoder (SAE) trained on the model's layers. We validate this framework on GPT-2 Small and Gemma-2B, demonstrating that our bound provides non-vacuous certificates at realistic sample sizes.
arXiv Detail & Related papers (2026-02-11T21:45:18Z) - Approximation of Log-Partition Function in Policy Mirror Descent Induces Implicit Regularization for LLM Post-Training [33.61029387987583]
Policy mirror descent (PMD) provides a principled framework for reinforcement learning (RL). We investigate a practical algorithm, termed PMD-mean, that approximates the log-partition term with the mean reward under the sampling policy. Experiments on math reasoning tasks show that PMD-mean achieves superior performance with improved stability and time efficiency.
arXiv Detail & Related papers (2026-02-05T17:44:28Z) - Greedy Is Enough: Sparse Action Discovery in Agentic LLMs [11.62669179647184]
Empirical evidence suggests that only a small subset of actions meaningfully influences performance in a given deployment. Motivated by this observation, we study a contextual linear reward model in which the reward is governed by a structured sparsity assumption. Our results identify sparse action discovery as a fundamental principle underlying large-action decision-making.
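A greedy sparse-action-discovery loop of the kind this summary describes can be sketched as orthogonal matching pursuit over a linear reward model. The setup below (feature matrix, true support, noise level) is a hypothetical illustration, not the paper's algorithm or data:

```python
import numpy as np

# Illustrative greedy (OMP-style) recovery of a sparse action support from
# observed rewards under an assumed linear reward model.

rng = np.random.default_rng(1)
M, T, k = 30, 200, 3
Phi = rng.normal(size=(T, M))            # per-round action features
theta = np.zeros(M)
support = [2, 11, 25]                    # hypothetical relevant actions
theta[support] = [1.5, -2.0, 1.0]
r = Phi @ theta + 0.05 * rng.normal(size=T)

selected, residual = [], r.copy()
for _ in range(k):
    scores = np.abs(Phi.T @ residual)    # correlation with current residual
    scores[selected] = -np.inf           # never pick an action twice
    selected.append(int(np.argmax(scores)))
    # refit least squares on the actions selected so far
    coef, *_ = np.linalg.lstsq(Phi[:, selected], r, rcond=None)
    residual = r - Phi[:, selected] @ coef

print(sorted(selected))
```

Each iteration adds the action most correlated with the unexplained reward and refits, so k greedy steps suffice when the relevant actions are well separated, matching the "greedy is enough" intuition.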
arXiv Detail & Related papers (2026-01-13T07:15:32Z) - Reasoning with Confidence: Efficient Verification of LLM Reasoning Steps via Uncertainty Heads [104.9566359759396]
We propose a lightweight alternative for step-level reasoning verification based on data-driven uncertainty scores. Our findings suggest that the internal states of LLMs encode their uncertainty and can serve as reliable signals for reasoning verification.
arXiv Detail & Related papers (2025-11-09T03:38:29Z) - T-REGS: Minimum Spanning Tree Regularization for Self-Supervised Learning [15.016777234800585]
Self-supervised learning (SSL) has emerged as a powerful paradigm for learning representations without labeled data. Recent studies have highlighted two pivotal properties for effective representations. We introduce T-REGS, a simple regularization framework for SSL based on the length of the Minimum Spanning Tree (MST) over the learned representation.
arXiv Detail & Related papers (2025-10-27T16:16:40Z) - MaP: A Unified Framework for Reliable Evaluation of Pre-training Dynamics [72.00014675808228]
Instability in the evaluation process of Large Language Models obscures true learning dynamics. We introduce MaP, a framework that integrates Merging and the Pass@k metric. Experiments show that MaP yields significantly smoother performance curves, reduces inter-run variance, and ensures more consistent rankings.
arXiv Detail & Related papers (2025-10-10T11:40:27Z) - Machine Unlearning Meets Adversarial Robustness via Constrained Interventions on LLMs [0.0]
We investigate various constrained optimization formulations that address unlearning of sensitive information and robustness to jail-breaking attacks. We find that the simplest point-wise constraint-based intervention we propose leads to better performance than max-min interventions, while having a lower computational cost.
arXiv Detail & Related papers (2025-10-03T23:32:21Z) - Unsupervised Conformal Inference: Bootstrapping and Alignment to Control LLM Uncertainty [49.19257648205146]
We propose an unsupervised conformal inference framework for generation. Our gates achieve close-to-nominal coverage and provide tighter, more stable thresholds than split UCP. The result is a label-free, API-compatible gate for test-time filtering.
arXiv Detail & Related papers (2025-09-26T23:40:47Z) - Trajectory Bellman Residual Minimization: A Simple Value-Based Method for LLM Reasoning [55.33984461046492]
Policy-based methods currently dominate reinforcement learning pipelines for large language model (LLM) reasoning. We introduce Trajectory Bellman Residual Minimization (TBRM), an algorithm that naturally adapts this idea to LLMs. We prove convergence to the near-optimal KL-regularized policy from arbitrary off-policy data via an improved change-of-trajectory-measure analysis.
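For intuition, the value-based idea behind this entry can be sketched as Bellman residual minimization on a toy tabular MDP with known dynamics. This is a generic semi-gradient sketch under assumed dynamics, not the trajectory-level TBRM algorithm or its off-policy analysis:

```python
import numpy as np

# Toy Bellman residual minimization: damped semi-gradient steps on
# Q - (R + gamma * P * max_a Q) for a small randomly generated MDP.
# The MDP itself is a synthetic assumption for illustration.

rng = np.random.default_rng(2)
nS, nA, gamma = 2, 2, 0.9
P = rng.dirichlet(np.ones(nS), size=(nS, nA))    # P[s, a] = next-state distribution
R = rng.uniform(size=(nS, nA))                   # reward table

Q = np.zeros((nS, nA))
for _ in range(500):
    V = Q.max(axis=1)                            # greedy state values
    T_Q = R + gamma * (P @ V)                    # Bellman backup
    Q = Q - 0.5 * (Q - T_Q)                      # damped step toward the backup

residual = np.max(np.abs(Q - (R + gamma * (P @ Q.max(axis=1)))))
print(residual)                                  # should be near zero at the fixed point
```

Because the Bellman operator is a gamma-contraction, the damped iteration converges to the unique fixed point Q*; value-based methods for LLM reasoning minimize an analogous residual along sampled trajectories instead of over an enumerable state space.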
arXiv Detail & Related papers (2025-05-21T09:41:53Z) - Reinforcement Learning from Partial Observation: Linear Function Approximation with Provable Sample Efficiency [111.83670279016599]
We study reinforcement learning for partially observable Markov decision processes (POMDPs) with infinite observation and state spaces.
We make the first attempt at partial observability and function approximation for a class of POMDPs with a linear structure.
arXiv Detail & Related papers (2022-04-20T21:15:38Z) - RL for Latent MDPs: Regret Guarantees and a Lower Bound [74.41782017817808]
We consider the regret problem for reinforcement learning in latent Markov Decision Processes (LMDPs).
In an LMDP, an MDP is randomly drawn from a set of $M$ possible MDPs at the beginning of the interaction, but the identity of the chosen MDP is not revealed to the agent.
We show that the key link is a notion of separation between the MDP system dynamics.
arXiv Detail & Related papers (2021-02-09T16:49:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it hosts and is not responsible for any consequences of its use.