CARL: Critical Action Focused Reinforcement Learning for Multi-Step Agent
- URL: http://arxiv.org/abs/2512.04949v1
- Date: Thu, 04 Dec 2025 16:15:46 GMT
- Title: CARL: Critical Action Focused Reinforcement Learning for Multi-Step Agent
- Authors: Leyang Shen, Yang Zhang, Chun Kai Ling, Xiaoyan Zhao, Tat-Seng Chua,
- Abstract summary: We propose CARL, a critical-action-focused reinforcement learning algorithm tailored for multi-step agents. CARL achieves both stronger performance and higher efficiency during training and inference across diverse evaluation settings.
- Score: 53.56274149236814
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Agents capable of accomplishing complex tasks through multiple interactions with the environment have emerged as a popular research direction. However, in such multi-step settings, the conventional group-level policy optimization algorithm becomes suboptimal because it assumes that every action contributes equally to the outcome, an assumption that deviates significantly from reality. Our analysis reveals that only a small fraction of actions are critical in determining the final outcome. Building on this insight, we propose CARL, a critical-action-focused reinforcement learning algorithm tailored for multi-step agents. CARL focuses training by providing action-level optimization signals for high-criticality actions while excluding low-criticality actions from model updates. Extensive experiments demonstrate that CARL achieves both stronger performance and higher efficiency during training and inference across diverse evaluation settings.
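To make the contrast drawn in the abstract concrete, here is a minimal Python sketch under loudly stated assumptions; it is not CARL's published algorithm. The abstract does not say how criticality is measured or how action-level signals are computed, so the per-action `criticality` scores, the `action_advantage` values, and the 0.5 threshold are all hypothetical inputs. The baseline stands in for "group-level policy optimization" with a GRPO-style group-relative normalization (also an assumption): one normalized episode reward is copied to every action in the trajectory, which is the equal-contribution assumption the abstract criticizes. The focused variant keeps an action-level signal only for high-criticality actions and excludes the rest from the update.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List, Optional


@dataclass
class Trajectory:
    reward: float                  # outcome reward for the whole episode
    logprobs: List[float]          # log pi(a_t | s_t) of each action taken
    criticality: List[float]       # HYPOTHETICAL per-action criticality scores
    action_advantage: List[float]  # HYPOTHETICAL per-action optimization signals


def group_level_advantages(group: List[Trajectory]) -> List[List[float]]:
    """Group-relative baseline (GRPO-style, assumed): every action in a trajectory
    gets the same normalized episode reward, i.e. equal contribution is assumed."""
    rewards = [t.reward for t in group]
    mu, sigma = mean(rewards), pstdev(rewards)
    scale = sigma if sigma > 0 else 1.0  # avoid division by zero when all rewards tie
    return [[(t.reward - mu) / scale] * len(t.logprobs) for t in group]


def critical_action_mask(group: List[Trajectory], threshold: float = 0.5) -> List[List[bool]]:
    """Keep only actions whose (hypothetical) criticality reaches the threshold."""
    return [[c >= threshold for c in t.criticality] for t in group]


def surrogate_loss(
    group: List[Trajectory],
    advantages: List[List[float]],
    mask: Optional[List[List[bool]]] = None,
) -> float:
    """REINFORCE-style surrogate: mean of -A_t * log pi over the actions kept by mask.
    Masked-out actions are dropped from both the sum and the averaging denominator."""
    terms = [
        -advantages[i][j] * lp
        for i, t in enumerate(group)
        for j, lp in enumerate(t.logprobs)
        if mask is None or mask[i][j]
    ]
    return sum(terms) / len(terms) if terms else 0.0


# Toy group of two 3-step trajectories (all numbers invented for illustration).
group = [
    Trajectory(1.0, [-0.1, -0.2, -0.3], [0.1, 0.9, 0.2], [0.0, 1.2, 0.0]),
    Trajectory(0.0, [-0.4, -0.5, -0.6], [0.8, 0.1, 0.1], [-0.7, 0.0, 0.0]),
]
baseline_loss = surrogate_loss(group, group_level_advantages(group))
focused_loss = surrogate_loss(
    group, [t.action_advantage for t in group], critical_action_mask(group)
)
print(f"group-level loss: {baseline_loss:.3f}, critical-action loss: {focused_loss:.3f}")
```

In the group-level variant all six actions share their trajectory's advantage; in the focused variant only two actions survive the mask. Masking, rather than merely zeroing advantages, also keeps excluded actions out of the averaging denominator so the update is not diluted by low-criticality actions; whether CARL normalizes this way is not stated in the abstract.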
Related papers
- CurES: From Gradient Analysis to Efficient Curriculum Learning for Reasoning LLMs [53.749193998004166]
Curriculum learning plays a crucial role in enhancing the training efficiency of large language models. We propose CurES, an efficient training method that accelerates convergence and employs Bayesian posterior estimation to minimize computational overhead.
Date: 2025-10-01T15:41:27Z
- Emergent Hierarchical Reasoning in LLMs through Reinforcement Learning [56.496001894673235]
Reinforcement Learning (RL) has proven highly effective at enhancing the complex reasoning abilities of Large Language Models (LLMs). Our analysis reveals that puzzling phenomena like "aha moments", "length-scaling" and entropy dynamics are not disparate occurrences but hallmarks of an emergent reasoning hierarchy.
Date: 2025-09-03T18:52:49Z
- Application of LLM Guided Reinforcement Learning in Formation Control with Collision Avoidance [1.1718316049475228]
Multi-Agent Systems (MAS) excel at accomplishing complex objectives through the collaborative efforts of individual agents. In this paper, we introduce a novel framework that aims to overcome the challenge of designing an effective reward function. By informing large language models (LLMs) of task priorities, our framework generates reward functions that can be dynamically adjusted online.
Date: 2025-07-22T09:26:00Z
- Fast Adaptation with Behavioral Foundation Models [82.34700481726951]
Unsupervised zero-shot reinforcement learning has emerged as a powerful paradigm for pretraining behavioral foundation models (BFMs). Despite promising results, zero-shot policies are often suboptimal due to errors induced by the unsupervised training process. We propose fast adaptation strategies that search in the low-dimensional task-embedding space of the pre-trained BFM to rapidly improve the performance of its zero-shot policies.
Date: 2025-04-10T16:14:17Z
- From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which uses step-wise rewards to optimize the agent's reinforcement learning process. We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
Date: 2024-11-06T10:35:11Z
- Watch Every Step! LLM Agent Learning via Iterative Step-Level Process Refinement [50.481380478458945]
The Iterative step-level Process Refinement (IPR) framework provides detailed step-by-step guidance to enhance agent training.
Our experiments on three complex agent tasks demonstrate that our framework outperforms a variety of strong baselines.
Date: 2024-06-17T03:29:13Z
- Behavior-Guided Actor-Critic: Improving Exploration via Learning Policy Behavior Representation for Deep Reinforcement Learning [0.0]
We propose Behavior-Guided Actor-Critic (BAC) as an off-policy actor-critic deep RL algorithm.
BAC mathematically formulates the behavior of the policy through autoencoders.
Results show that BAC performs considerably better than several cutting-edge learning algorithms.
Date: 2021-04-09T15:22:35Z