Related papers: Pre-Act: Multi-Step Planning and Reasoning Improves Acting in LLM Agents

Pre-Act: Multi-Step Planning and Reasoning Improves Acting in LLM Agents

URL: http://arxiv.org/abs/2505.09970v2
Date: Mon, 19 May 2025 03:17:21 GMT
Title: Pre-Act: Multi-Step Planning and Reasoning Improves Acting in LLM Agents
Authors: Mrinal Rawat, Ambuje Gupta, Rushil Goomer, Alessandro Di Bari, Neha Gupta, Roberto Pieraccini,
Abstract summary: The ReAct capability in large language models (LLMs) has become the foundation of modern agentic systems.<n>We introduce Pre-Act, a novel approach that enhances the agent's performance by creating a multi-step execution plan.<n>Our approach is applicable to both conversational and non-conversational agents.
Score: 40.73340280747757
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The ReAct (Reasoning + Action) capability in large language models (LLMs) has become the foundation of modern agentic systems. Recent LLMs, such as DeepSeek-R1 and OpenAI o1/o3, exemplify this by emphasizing reasoning through the generation of ample intermediate tokens, which help build a strong premise before producing the final output tokens. In this paper, we introduce Pre-Act, a novel approach that enhances the agent's performance by creating a multi-step execution plan along with the detailed reasoning for the given user input. This plan incrementally incorporates previous steps and tool outputs, refining itself after each step execution until the final response is obtained. Our approach is applicable to both conversational and non-conversational agents. To measure the performance of task-oriented agents comprehensively, we propose a two-level evaluation framework: (1) turn level and (2) end-to-end. Our turn-level evaluation, averaged across five models, shows that our approach, Pre-Act, outperforms ReAct by 70% in Action Recall on the Almita dataset. While this approach is effective for larger models, smaller models crucial for practical applications, where latency and cost are key constraints, often struggle with complex reasoning tasks required for agentic systems. To address this limitation, we fine-tune relatively small models such as Llama 3.1 (8B & 70B) using the proposed Pre-Act approach. Our experiments show that the fine-tuned 70B model outperforms GPT-4, achieving a 69.5% improvement in action accuracy (turn-level) and a 28% improvement in goal completion rate (end-to-end) on the Almita (out-of-domain) dataset.

Related papers

DEPO: Dual-Efficiency Preference Optimization for LLM Agents [75.6723341304463]
We propose DEPO, a dual-efficiency preference optimization method that jointly rewards succinct responses and fewer action steps.<n>Experiments on WebShop and BabyAI show that DEPO cuts token usage by up to 60.9% and steps by up to 26.9%, while achieving up to a 29.3% improvement in performance.
arXiv Detail & Related papers (2025-11-19T12:38:43Z)
MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization [103.74675519953898]
Long-chain reflective reasoning is a prerequisite for solving complex real-world problems.<n>We build a benchmark consisting 1,260 samples of 42 challenging synthetic tasks.<n>We generate post-training data and explore learning paradigms for exploiting such data.
arXiv Detail & Related papers (2025-10-09T17:53:58Z)
Encouraging Good Processes Without the Need for Good Answers: Reinforcement Learning for LLM Agent Planning [6.314485350935057]
Reinforcement Learning with Tool-use Rewards is a novel framework that decouples the training process to enable a focused, single-objective optimization of the planning module.<n>Our experiments demonstrate that RLTR achieves an 8%-12% improvement in planning performance compared to end-to-end baselines.<n>This enhanced planning capability, in turn, translates to a 5%-6% increase in the final response quality of the overall agent system.
arXiv Detail & Related papers (2025-08-27T06:19:50Z)
KAT-V1: Kwai-AutoThink Technical Report [50.84483585850113]
We present Kwaipilot-AutoThink (KAT), an open-source 40B large language model developed to address the overthinking problem in reasoning-intensive tasks.<n>KAT dynamically switches between reasoning and non-reasoning modes based on task complexity.<n>We also propose Step-SRPO, a reinforcement learning algorithm that incorporates intermediate supervision into the GRPO framework.
arXiv Detail & Related papers (2025-07-11T04:07:10Z)
DARS: Dynamic Action Re-Sampling to Enhance Coding Agent Performance by Adaptive Tree Traversal [55.13854171147104]
Large Language Models (LLMs) have revolutionized various domains, including natural language processing, data analysis, and software development.<n>We present Dynamic Action Re-Sampling (DARS), a novel inference time compute scaling approach for coding agents.<n>We evaluate our approach on SWE-Bench Lite benchmark, demonstrating that this scaling strategy achieves a pass@k score of 55% with Claude 3.5 Sonnet V2.
arXiv Detail & Related papers (2025-03-18T14:02:59Z)
Entropy-Regularized Process Reward Model [30.279394036823092]
Large language models (LLMs) have shown promise in performing complex multi-step reasoning, yet they continue to struggle with mathematical reasoning.<n>We propose an entropy-regularized process reward model (ER-PRM) that integrates KL-regularized Markov Decision Processes (MDP)<n>Our empirical experiments on the MATH and GSM8K benchmarks demonstrate that ER-PRM consistently outperforms existing process reward models.
arXiv Detail & Related papers (2024-12-15T01:09:23Z)
Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance [95.03771007780976]
We tackle the challenge of developing proactive agents capable of anticipating and initiating tasks without explicit human instructions.<n>First, we collect real-world human activities to generate proactive task predictions.<n>These predictions are labeled by human annotators as either accepted or rejected.<n>The labeled data is used to train a reward model that simulates human judgment.
arXiv Detail & Related papers (2024-10-16T08:24:09Z)
CPL: Critical Plan Step Learning Boosts LLM Generalization in Reasoning Tasks [2.9449838351181374]
Post-training, particularly reinforcement learning, has become a new learning paradigm for large language models (LLMs) We propose searching within the action space on high-level abstract plans to enhance model generalization. We show that our method, trained exclusively on GSM8K and MATH, significantly improves performance.
arXiv Detail & Related papers (2024-09-13T08:59:31Z)
Crafting Efficient Fine-Tuning Strategies for Large Language Models [2.633490094119608]
Fine-tuning large language models (LLMs) with as few as 200 samples can improve model accuracy from 70% to 88% in a product attribute extraction task. A bayesian hyperparameter optimization method, which evaluates models at 20% of total training time, correlates strongly with final model performance. This approach led to a 2% improvement in accuracy over baseline models when evaluated on an independent test set.
arXiv Detail & Related papers (2024-07-18T21:36:00Z)
Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning [55.96599486604344]
We introduce an approach aimed at enhancing the reasoning capabilities of Large Language Models (LLMs) through an iterative preference learning process. We use Monte Carlo Tree Search (MCTS) to iteratively collect preference data, utilizing its look-ahead ability to break down instance-level rewards into more granular step-level signals. The proposed algorithm employs Direct Preference Optimization (DPO) to update the LLM policy using this newly generated step-level preference data.
arXiv Detail & Related papers (2024-05-01T11:10:24Z)
Let's reward step by step: Step-Level reward model as the Navigators for Reasoning [64.27898739929734]
Process-Supervised Reward Model (PRM) furnishes LLMs with step-by-step feedback during the training phase. We propose a greedy search algorithm that employs the step-level feedback from PRM to optimize the reasoning pathways explored by LLMs. To explore the versatility of our approach, we develop a novel method to automatically generate step-level reward dataset for coding tasks and observed similar improved performance in the code generation tasks.
arXiv Detail & Related papers (2023-10-16T05:21:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.