SAND: Boosting LLM Agents with Self-Taught Action Deliberation
- URL: http://arxiv.org/abs/2507.07441v1
- Date: Thu, 10 Jul 2025 05:38:15 GMT
- Title: SAND: Boosting LLM Agents with Self-Taught Action Deliberation
- Authors: Yu Xia, Yiran Jenny Shen, Junda Wu, Tong Yu, Sungchul Kim, Ryan A. Rossi, Lina Yao, Julian McAuley
- Abstract summary: Large Language Model (LLM) agents are commonly tuned with supervised finetuning on ReAct-style expert trajectories or preference optimization over pairwise rollouts. We propose the Self-taught ActioN Deliberation (SAND) framework, which enables LLM agents to explicitly deliberate over candidate actions before committing to one. SAND achieves an average 20% improvement over initial supervised finetuning and also outperforms state-of-the-art agent tuning approaches.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large Language Model (LLM) agents are commonly tuned with supervised finetuning on ReAct-style expert trajectories or preference optimization over pairwise rollouts. Most of these methods focus on imitating specific expert behaviors or promoting chosen reasoning thoughts and actions over rejected ones. However, without reasoning about and comparing alternative actions, LLM agents finetuned with these methods may over-commit to seemingly plausible but suboptimal actions due to limited action space exploration. To address this, we propose the Self-taught ActioN Deliberation (SAND) framework, which enables LLM agents to explicitly deliberate over candidate actions before committing to one. To tackle the challenges of when and what to deliberate given a large action space and step-level action evaluation, we incorporate self-consistency action sampling and execution-guided action critique to help synthesize step-wise action deliberation thoughts using the base model of the LLM agent. The deliberation trajectories are then used to finetune the LLM agent itself in an iterative manner. Evaluated on two representative interactive agent tasks, SAND achieves an average 20% improvement over initial supervised finetuning and also outperforms state-of-the-art agent tuning approaches.
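As a rough, self-contained illustration of the training loop the abstract describes (not the paper's implementation; the toy action space, the numeric critique, and all helper names are our assumptions):

```python
import random

random.seed(0)

# Toy stand-ins: "actions" are integers and the execution critique
# rewards proximity to a hidden target state.
TARGET = 7

def propose_action(state):
    # Self-consistency sampling stand-in: noisy proposals from the "base model".
    return state + random.randint(-3, 3)

def execution_critique(action):
    # Execution-guided critique stand-in: closer to TARGET scores higher.
    return -abs(TARGET - action)

def sand_iteration(n_steps=5, k=5):
    state, data = 0, []
    for _ in range(n_steps):
        # 1) Self-consistency action sampling: draw k candidates per step.
        candidates = [propose_action(state) for _ in range(k)]
        # 2) Execution-guided critique: score each candidate.
        scored = [(a, execution_critique(a)) for a in candidates]
        best, score = max(scored, key=lambda t: t[1])
        # 3) Synthesize a step-wise deliberation thought comparing candidates
        #    before committing to the best one.
        thought = f"considered {candidates}, chose {best} (critique {score})"
        data.append((state, thought, best))
        state = best
    return data  # in SAND, these trajectories finetune the agent, then repeat

for record in sand_iteration():
    print(record)
```

In the actual framework the proposals and deliberation thoughts come from the LLM agent's own base model, and the loop of synthesis and finetuning is iterated.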
Related papers
- DICE: Dynamic In-Context Example Selection in LLM Agents via Efficient Knowledge Transfer [50.64531021352504]
Large language model-based agents, empowered by in-context learning (ICL), have demonstrated strong capabilities in complex reasoning and tool-use tasks. Existing approaches typically rely on example selection, including in agentic or multi-step settings. We propose DICE, a theoretically grounded ICL framework for agentic tasks that selects the most relevant demonstrations at each step of reasoning.
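A minimal sketch of the per-step re-ranking idea, assuming a bag-of-words cosine similarity as the relevance measure (our stand-in; DICE's actual selection criterion is theoretically grounded and richer):

```python
from collections import Counter
from math import sqrt

def cosine(a, b):
    # Bag-of-words cosine similarity between two strings.
    ca, cb = Counter(a.split()), Counter(b.split())
    dot = sum(ca[w] * cb[w] for w in ca)
    na = sqrt(sum(v * v for v in ca.values()))
    nb = sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def select_demos(step_context, demo_pool, k=2):
    # Re-rank the pool against the *current* reasoning step, not just the
    # initial query, and keep the k most relevant demonstrations.
    return sorted(demo_pool, key=lambda d: cosine(step_context, d), reverse=True)[:k]

pool = ["search the web for flights", "use calculator to add numbers",
        "look up weather via API", "sum two integers with the calculator tool"]
print(select_demos("add 3 and 4 with a tool", pool))
```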
arXiv Detail & Related papers (2025-07-31T13:42:14Z) - MIRROR: Multi-agent Intra- and Inter-Reflection for Optimized Reasoning in Tool Learning [33.009759731505746]
Complex tasks involving tool integration pose significant challenges for Large Language Models. Reflection has emerged as an effective strategy for correcting erroneous trajectories in agentic benchmarks. We propose MIRROR, a framework that consists of both intra-reflection, which critically assesses intended actions before execution, and inter-reflection, which further adjusts the trajectory.
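A toy sketch of the two reflection stages (the veto rule, the error check, and the stub executor are our assumptions, not MIRROR's actual components):

```python
def intra_reflect(action, banned=("rm -rf",)):
    # Intra-reflection: critically assess the intended action *before*
    # execution and veto obviously unsafe or pointless tool calls.
    return all(b not in action for b in banned)

def inter_reflect(trajectory, observation):
    # Inter-reflection: after execution, adjust the trajectory from feedback.
    if "error" in observation:
        trajectory.append("revise: pick an alternative tool")
    return trajectory

def run(actions, execute):
    traj = []
    for a in actions:
        if not intra_reflect(a):
            traj.append(f"vetoed before execution: {a}")
            continue
        obs = execute(a)
        traj.append(f"executed: {a} -> {obs}")
        traj = inter_reflect(traj, obs)
    return traj

fake_exec = lambda a: "error: no such tool" if "fax" in a else "ok"
print(run(["search docs", "fax report", "rm -rf /tmp"], fake_exec))
```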
arXiv Detail & Related papers (2025-05-27T03:37:33Z) - ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning [53.817538122688944]
We introduce Reinforced Meta-thinking Agents (ReMA) to elicit meta-thinking behaviors from the reasoning of Large Language Models (LLMs). ReMA decouples the reasoning process into two hierarchical agents: a high-level meta-thinking agent responsible for generating strategic oversight and plans, and a low-level reasoning agent for detailed execution. Empirical results from single-turn experiments demonstrate that ReMA outperforms single-agent RL baselines on complex reasoning tasks.
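The hierarchy reduces to a simple two-level loop; the sketch below is our schematic reduction, with both agents stubbed as plain functions:

```python
def meta_agent(problem):
    # High-level meta-thinking agent: produce strategic oversight and a plan.
    return ["restate the problem", "choose a method", "execute and verify"]

def reasoning_agent(step, problem):
    # Low-level reasoning agent: carry out one plan step in detail.
    return f"[{step}] applied to '{problem}'"

problem = "sum of the first 10 odd numbers"
for step in meta_agent(problem):
    print(reasoning_agent(step, problem))
```

In ReMA both levels are LLMs trained jointly with multi-agent reinforcement learning; here they are stubs purely to show the decoupling.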
arXiv Detail & Related papers (2025-03-12T16:05:31Z) - ATLaS: Agent Tuning via Learning Critical Steps [39.279048406057264]
Large Language Model (LLM) agents have demonstrated remarkable generalization capabilities across multi-domain tasks. Existing agent tuning approaches typically employ supervised finetuning on entire expert trajectories. We propose ATLaS, which identifies the critical steps in expert trajectories and finetunes LLMs solely on these steps at reduced cost.
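One plausible reading of "critical steps", sketched below: rank expert steps by the progress gain they produce and keep only the top fraction for finetuning (the progress-gain heuristic is our assumption; ATLaS has its own identification method):

```python
def critical_steps(trajectory, progress, keep=0.5):
    # Score each expert step by the jump in task progress it produces and
    # keep only the highest-impact fraction as finetuning targets.
    gains = [progress[i + 1] - progress[i] for i in range(len(trajectory))]
    k = max(1, int(len(trajectory) * keep))
    ranked = sorted(range(len(trajectory)), key=lambda i: gains[i], reverse=True)
    return [trajectory[i] for i in sorted(ranked[:k])]

traj = ["open site", "click login", "enter credentials", "submit form"]
prog = [0.0, 0.1, 0.2, 0.8, 1.0]  # task progress before/after each step
print(critical_steps(traj, prog))  # keeps the two high-gain steps
```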
arXiv Detail & Related papers (2025-03-04T02:14:55Z) - From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise rewards to optimize the agent's reinforcement learning process. We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
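As a toy illustration of a step-wise signal (much simpler than the paper's implicit-reward and inverse-RL formulation): score each agent step by agreement with the expert at that step:

```python
def step_rewards(agent_actions, expert_actions):
    # Step-wise reward stand-in: +1 when the agent matches the expert at a
    # given step, -1 otherwise; per-step signals replace one episode-level reward.
    return [1.0 if a == e else -1.0
            for a, e in zip(agent_actions, expert_actions)]

print(step_rewards(["search", "click", "buy"], ["search", "scroll", "buy"]))
# -> [1.0, -1.0, 1.0]
```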
arXiv Detail & Related papers (2024-11-06T10:35:11Z) - Watch Every Step! LLM Agent Learning via Iterative Step-Level Process Refinement [50.481380478458945]
The Iterative step-level Process Refinement (IPR) framework provides detailed step-by-step guidance to enhance agent training.
Our experiments on three complex agent tasks demonstrate that our framework outperforms a variety of strong baselines.
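A minimal sketch of step-level refinement, assuming a per-step reward function and a regeneration hook (both stand-ins; IPR's actual step rewards and refinement procedure are more involved):

```python
def refine(agent_steps, step_reward, regenerate, threshold=0.0):
    # Score each step; rewrite any step whose reward falls below threshold.
    refined = []
    for i, step in enumerate(agent_steps):
        r = step_reward(i, step)
        refined.append(step if r >= threshold else regenerate(i, step))
    return refined

reward = lambda i, s: 1.0 if s.startswith("ok") else -1.0
regen = lambda i, s: "ok " + s.split(" ", 1)[1]
print(refine(["ok search", "bad click", "ok buy"], reward, regen))
# -> ['ok search', 'ok click', 'ok buy']
```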
arXiv Detail & Related papers (2024-06-17T03:29:13Z) - Controlling Large Language Model Agents with Entropic Activation Steering [20.56909601159833]
We introduce Entropic Activation Steering (EAST), an activation steering method for in-context learning agents.
We show that EAST can effectively manipulate an LLM agent's exploration by directly affecting the high-level actions parsed from the outputs of the LLM.
We also reveal how applying this control modulates the uncertainty exhibited in the LLM's thoughts, guiding the agent towards more exploratory actions.
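A toy numerical demonstration of the entropy effect (the readout matrix, the activation, and the steering direction are all synthetic; EAST derives its steering vector from the model's own activations):

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def entropy(p):
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))   # toy readout: 4 high-level actions from an 8-d activation
h = rng.normal(size=8)        # toy residual-stream activation
steer = -h                    # toy steering direction that flattens the logits

# Adding a scaled steering vector to the activation raises the entropy of
# the induced action distribution, pushing the agent toward exploration.
for scale in (0.0, 0.5, 1.0):
    p = softmax(W @ (h + scale * steer))
    print(f"scale={scale}: action entropy = {entropy(p):.3f}")
```

At scale 1.0 the logits collapse to zero and the distribution becomes uniform (entropy ln 4 ≈ 1.386), the extreme of the exploration end.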
arXiv Detail & Related papers (2024-06-01T00:25:00Z) - Devil's Advocate: Anticipatory Reflection for LLM Agents [53.897557605550325]
Our approach prompts LLM agents to decompose a given task into manageable subtasks.
We implement a three-fold introspective intervention:
- Anticipatory reflection on potential failures and alternative remedies before action execution.
- Post-action alignment with subtask objectives and backtracking with remedies to ensure utmost effort in plan execution.
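A toy loop showing how the anticipatory remedy is used when execution fails (the failure model, remedy format, and stub executor are our stand-ins):

```python
def anticipate(action):
    # Anticipatory reflection: imagine a failure mode and a remedy *before* acting.
    return {"failure": f"{action} may fail", "remedy": f"retry {action}"}

def execute(action):
    # Stub executor: one subtask fails on the first attempt.
    return "fail" if action == "upload" else "ok"

def run(subtasks):
    log = []
    for action in subtasks:
        plan_b = anticipate(action)
        result = execute(action)
        if result != "ok":
            # Post-action alignment + backtracking: fall back to the remedy.
            log.append(f"{action} failed -> {plan_b['remedy']}")
            result = execute(plan_b["remedy"])
        log.append(f"{action}: {result}")
    return log

print(run(["compress", "upload", "verify"]))
```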
arXiv Detail & Related papers (2024-05-25T19:20:15Z) - AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents [74.16170899755281]
We introduce AgentBoard, a pioneering comprehensive benchmark and accompanying open-source evaluation framework tailored to the analytical evaluation of LLM agents. AgentBoard offers a fine-grained progress rate metric that captures incremental advancements as well as a comprehensive evaluation toolkit. This not only sheds light on the capabilities and limitations of LLM agents but also brings the interpretability of their performance to the forefront.
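Our reading of the progress rate metric, as a one-liner over annotated subgoals (the subgoal representation is an assumption; AgentBoard's metric definitions live in its released toolkit):

```python
def progress_rate(achieved, subgoals):
    # Fraction of annotated subgoals reached: rewards partial progress
    # where a binary success flag would report 0.
    return len(set(achieved) & set(subgoals)) / len(subgoals)

goals = ["find item", "add to cart", "checkout"]
print(progress_rate(["find item", "add to cart"], goals))  # ~0.67, vs. success = 0
```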
arXiv Detail & Related papers (2024-01-24T01:51:00Z) - Formally Specifying the High-Level Behavior of LLM-Based Agents [24.645319505305316]
LLMs have emerged as promising tools for solving challenging problems without the need for task-specific finetuned models.
Currently, the design and implementation of such agents are ad hoc: the wide variety of tasks that LLM-based agents may be applied to means there can be no one-size-fits-all approach to agent design.
We propose a minimalistic generation framework that simplifies the process of building agents.
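One way such a specification could look, sketched as a declarative transition table (the state-machine style is our illustration, not necessarily the paper's concrete formalism):

```python
# High-level agent behavior as data: (state, event) -> next state.
SPEC = {
    ("plan", "ok"): "act",
    ("act", "ok"): "observe",
    ("act", "error"): "plan",       # re-plan on failure
    ("observe", "ok"): "plan",
    ("observe", "done"): "stop",
}

def run(events, state="plan"):
    trace = [state]
    for ev in events:
        state = SPEC.get((state, ev), state)
        trace.append(state)
        if state == "stop":
            break
    return trace

print(run(["ok", "error", "ok", "ok", "done"]))
# -> ['plan', 'act', 'plan', 'act', 'observe', 'stop']
```

Separating the behavioral specification from the LLM calls is what makes the agent's high-level behavior inspectable.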
arXiv Detail & Related papers (2023-10-12T17:24:15Z) - Reinforcement Learning With Sparse-Executing Actions via Sparsity Regularization [15.945378631406024]
Reinforcement learning (RL) has demonstrated impressive performance in decision-making tasks like embodied control, autonomous driving and financial trading.
In many decision-making tasks, the agents often encounter the problem of executing actions under limited budgets.
This paper formalizes the problem as a Sparse Action Markov Decision Process (SA-MDP), in which specific actions in the action space can only be executed a limited number of times.
We propose a policy optimization algorithm, Action Sparsity REgularization (ASRE), which adaptively handles each action with a distinct preference.
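A sketch of the regularized objective under our simplifications (a single-state softmax policy and a uniform penalty weight; ASRE's per-action preferences are adaptive):

```python
import numpy as np

def asre_style_loss(logits, returns, sparse_mask, lam=0.5):
    # Maximize expected return while penalizing probability mass placed on
    # budget-limited ("sparse-executing") actions, so the policy saves them
    # for states where they are clearly worth spending.
    z = logits - logits.max()
    probs = np.exp(z) / np.exp(z).sum()
    expected_return = float(probs @ returns)
    sparsity_penalty = float(probs @ sparse_mask)
    return -(expected_return - lam * sparsity_penalty)

logits = np.array([1.0, 0.5, 2.0])
returns = np.array([0.2, 0.1, 1.0])
mask = np.array([0.0, 0.0, 1.0])    # action 2 has a limited execution budget
print(asre_style_loss(logits, returns, mask))
```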
arXiv Detail & Related papers (2021-05-18T16:50:42Z)