Greedy Is Enough: Sparse Action Discovery in Agentic LLMs
- URL: http://arxiv.org/abs/2601.08280v1
- Date: Tue, 13 Jan 2026 07:15:32 GMT
- Title: Greedy Is Enough: Sparse Action Discovery in Agentic LLMs
- Authors: Angshul Majumdar
- Abstract summary: Empirical evidence suggests that only a small subset of actions meaningfully influences performance in a given deployment. Motivated by this observation, we study a contextual linear reward model in which action relevance is governed by a structured sparsity assumption. Our results identify sparse action discovery as a fundamental principle underlying large-action decision-making.
- Score: 11.62669179647184
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modern agentic systems operate in environments with extremely large action spaces, such as tool-augmented language models with thousands of available APIs or retrieval operations. Despite this scale, empirical evidence suggests that only a small subset of actions meaningfully influences performance in a given deployment. Motivated by this observation, we study a contextual linear reward model in which action relevance is governed by a structured sparsity assumption: only a small number of actions have nonzero effects across latent states. We formulate action discovery as a block-sparse recovery problem and analyze a greedy algorithm inspired by Orthogonal Matching Pursuit. Under standard assumptions on incoherence, signal strength, and action coverage, we prove that the greedy procedure exactly recovers the relevant action set with high probability, using a number of samples that scales polynomially in the sparsity level and latent dimension, and only logarithmically in the total number of actions. We further provide estimation error guarantees for refitted parameters and show that the resulting decision rule is near-optimal for new latent states. Complementing these results, we establish information-theoretic lower bounds demonstrating that sparsity and sufficient coverage are necessary for tractability. Together, our results identify sparse action discovery as a fundamental principle underlying large-action decision-making and provide a theoretical foundation for action pruning in agentic systems.
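The greedy procedure described in the abstract can be sketched as follows. This is a hypothetical illustration under the stated linear reward model: the scoring rule, refitting scheme, and synthetic data below are assumptions for exposition, not the paper's exact algorithm.

```python
import numpy as np

def greedy_action_discovery(X, y, action_ids, k):
    """OMP-style greedy selection of k relevant actions (illustrative sketch).

    X: (n, d) latent-state features; y: (n,) observed rewards;
    action_ids: (n,) index of the action taken on each sample; k: sparsity level.
    """
    actions = np.unique(action_ids)
    selected, theta = [], {}
    residual = y.astype(float).copy()
    for _ in range(k):
        # Score each unselected action by how strongly its samples'
        # features correlate with the current residual.
        best, best_score = None, -np.inf
        for a in actions:
            if a in selected:
                continue
            m = action_ids == a
            score = np.linalg.norm(X[m].T @ residual[m])
            if score > best_score:
                best, best_score = a, score
        selected.append(best)
        # Refit least squares for every selected action, then recompute the residual.
        residual = y.astype(float).copy()
        for a in selected:
            m = action_ids == a
            theta[a], *_ = np.linalg.lstsq(X[m], y[m], rcond=None)
            residual[m] -= X[m] @ theta[a]
    return selected, theta

# Synthetic check: 10 actions, but only actions 0 and 1 carry nonzero effects.
rng = np.random.default_rng(0)
n, d = 600, 3
X = rng.standard_normal((n, d))
action_ids = rng.integers(0, 10, size=n)
true_theta = {0: np.array([2.0, 0.0, 0.0]), 1: np.array([0.0, -2.0, 1.0])}
y = np.zeros(n)
for a, th in true_theta.items():
    m = action_ids == a
    y[m] = X[m] @ th
selected, theta = greedy_action_discovery(X, y, action_ids, k=2)
```

Per the abstract, a procedure of this shape recovers the relevant action set with a sample count only logarithmic in the total number of actions, which is what makes pruning very large tool catalogs tractable.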
Related papers
- Efficient Discovery of Approximate Causal Abstractions via Neural Mechanism Sparsification [0.0]
Neural networks are hypothesized to implement interpretable causal mechanisms. Verifying this requires finding a causal abstraction -- a simpler, high-level Structural Causal Model (SCM) faithful to the network under interventions. We reframe the problem by viewing structured pruning as a search over approximate abstractions.
arXiv Detail & Related papers (2026-02-27T18:35:10Z)
- Is Softmax Loss All You Need? A Principled Analysis of Softmax-family Loss [91.61796429377041]
The Softmax loss is one of the most widely employed surrogate objectives for classification and ranking tasks. We investigate whether different surrogates achieve consistency with classification and ranking metrics, and analyze their gradient dynamics to reveal distinct convergence behaviors. Our results establish a principled foundation and offer practical guidance for loss selection in large-class machine learning applications.
arXiv Detail & Related papers (2026-01-30T09:24:52Z)
- Outcome-Based Online Reinforcement Learning: Algorithms and Fundamental Limits [58.63897489864948]
Reinforcement learning with outcome-based feedback faces a fundamental challenge: how do we assign credit to the right actions? This paper provides the first comprehensive analysis of this problem in online RL with general function approximation.
arXiv Detail & Related papers (2025-05-26T17:44:08Z)
- Probe-Free Low-Rank Activation Intervention [26.502232859901167]
Inference-time interventions that edit hidden activations have shown promising results in steering LMs toward desirable generations. This paper proposes FLORAIN, a probe-free intervention method applied to all attention heads in a specific activation layer.
arXiv Detail & Related papers (2025-02-06T13:03:05Z)
- GLANCE: Global Actions in a Nutshell for Counterfactual Explainability [10.25011737760687]
We introduce GLANCE, a versatile and adaptive framework, comprising two algorithms.
C-GLANCE employs a clustering approach that considers both the feature space and the space of counterfactual actions.
T-GLANCE provides additional features to enhance flexibility.
arXiv Detail & Related papers (2024-05-29T09:24:25Z)
- Small Object Detection via Coarse-to-fine Proposal Generation and Imitation Learning [52.06176253457522]
We propose a two-stage framework tailored for small object detection based on the Coarse-to-fine pipeline and Feature Imitation learning.
CFINet achieves state-of-the-art performance on the large-scale small object detection benchmarks, SODA-D and SODA-A.
arXiv Detail & Related papers (2023-08-18T13:13:09Z)
- Efficient Transfer Learning via Causal Bounds [8.981637739384674]
Our analysis precisely characterizes when and how causal side-information accelerates online learning, and experiments demonstrate reductions in the data required.
arXiv Detail & Related papers (2023-08-07T13:24:50Z)
- Leveraging Factored Action Spaces for Off-Policy Evaluation [0.0]
Off-policy evaluation (OPE) aims to estimate the benefit of following a counterfactual sequence of actions.
Existing OPE estimators often exhibit high bias and high variance in problems involving large, decomposed action spaces.
We propose a new family of "decomposed" importance sampling (IS) estimators based on factored action spaces.
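The "decomposed" estimator idea can be illustrated under a simplifying assumption that the reward splits additively across two sub-actions and that both policies factor. The probabilities and reward values below are made up for illustration and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two binary sub-actions; behavior policy mu is uniform, target policy pi is not.
pi1, pi2 = np.array([0.8, 0.2]), np.array([0.3, 0.7])
mu1 = mu2 = np.array([0.5, 0.5])
r1 = np.array([1.0, 3.0])   # additive reward component of sub-action 1
r2 = np.array([0.0, 2.0])   # additive reward component of sub-action 2

n = 200_000
a1 = rng.choice(2, size=n, p=mu1)
a2 = rng.choice(2, size=n, p=mu2)
rewards = r1[a1] + r2[a2]

# Standard IS: one weight per sample, the product of per-dimension ratios.
w = (pi1[a1] / mu1[a1]) * (pi2[a2] / mu2[a2])
per_sample_standard = w * rewards

# Decomposed IS: each additive reward component is weighted only by its
# own sub-action's ratio, which cuts the variance of the estimate.
per_sample_decomposed = (pi1[a1] / mu1[a1]) * r1[a1] + (pi2[a2] / mu2[a2]) * r2[a2]

v_true = pi1 @ r1 + pi2 @ r2   # exact target-policy value
v_standard = per_sample_standard.mean()
v_decomposed = per_sample_decomposed.mean()
```

Both estimators are unbiased in this setting, but the decomposed one has markedly lower per-sample variance, matching the bias/variance motivation in the summary above.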
arXiv Detail & Related papers (2023-07-13T18:34:14Z)
- Algorithmic Recourse with Missing Values [11.401006371457436]
This paper proposes a new framework of algorithmic recourse (AR) that works even in the presence of missing values.
AR aims to provide a recourse action for altering the undesired prediction result given by a classifier.
Experimental results demonstrated the efficacy of our method in the presence of missing values compared to the baselines.
arXiv Detail & Related papers (2023-04-28T03:22:48Z)
- Toward Certified Robustness Against Real-World Distribution Shifts [65.66374339500025]
We train a generative model to learn perturbations from data and define specifications with respect to the output of the learned model.
A unique challenge arising from this setting is that existing verifiers cannot tightly approximate sigmoid activations.
We propose a general meta-algorithm for handling sigmoid activations which leverages classical notions of counter-example-guided abstraction refinement.
arXiv Detail & Related papers (2022-06-08T04:09:13Z)
- False Correlation Reduction for Offline Reinforcement Learning [115.11954432080749]
We propose falSe COrrelation REduction (SCORE) for offline RL, a practically effective and theoretically provable algorithm.
We empirically show that SCORE achieves state-of-the-art performance with 3.1x acceleration on various tasks in a standard benchmark (D4RL).
arXiv Detail & Related papers (2021-10-24T15:34:03Z)
- Loss Bounds for Approximate Influence-Based Abstraction [81.13024471616417]
Influence-based abstraction aims to gain leverage by modeling local subproblems together with the 'influence' that the rest of the system exerts on them.
This paper investigates the performance of such approaches from a theoretical perspective.
We show that neural networks trained with cross entropy are well suited to learn approximate influence representations.
arXiv Detail & Related papers (2020-11-03T15:33:10Z)
- Discrete Action On-Policy Learning with Action-Value Critic [72.20609919995086]
Reinforcement learning (RL) in discrete action space is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension.
We construct a critic to estimate action-value functions, apply it on correlated actions, and combine these critic estimated action values to control the variance of gradient estimation.
These efforts result in a new discrete action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance control techniques.
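The variance-control role of a critic can be seen in a toy single-state example. The numbers below are hypothetical, and the code illustrates the generic "all-action" gradient trick (using critic values for every action instead of a single sampled one) rather than this paper's specific estimator.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.array([0.5, -0.2, 0.1, 0.0])   # logits over 4 discrete actions
q = np.array([1.0, 2.0, 0.5, 1.5])        # critic's action-value estimates

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

probs = softmax(theta)

# Exact "all-action" gradient of E_{a~pi}[Q(a)] w.r.t. the logits:
# it uses the critic's value for every action, so it has no sampling variance.
grad_all = probs * q - (probs @ q) * probs

# Single-sample score-function (REINFORCE-style) estimates of the same gradient:
# grad log pi(a) w.r.t. the logits is (one_hot(a) - probs), scaled by Q(a).
m = 50_000
a = rng.choice(4, size=m, p=probs)
scores = -np.tile(probs, (m, 1))
scores[np.arange(m), a] += 1.0
samples = scores * q[a][:, None]
# The sampled estimator is unbiased but noisy; averaging recovers grad_all.
```

Combining critic-estimated action values in the gradient, as the paper above does for correlated actions, trades the per-sample noise of the score-function estimator for the (hopefully small) bias of the critic.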
arXiv Detail & Related papers (2020-02-10T04:23:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.