Pushing Forward Pareto Frontiers of Proactive Agents with Behavioral Agentic Optimization
- URL: http://arxiv.org/abs/2602.11351v1
- Date: Wed, 11 Feb 2026 20:40:43 GMT
- Title: Pushing Forward Pareto Frontiers of Proactive Agents with Behavioral Agentic Optimization
- Authors: Yihang Yao, Zhepeng Cen, Haohong Lin, Shiqi Liu, Zuxin Liu, Jiacheng Zhu, Zhang-Wei Hong, Laixi Shi, Ding Zhao
- Abstract summary: Proactive large language model (LLM) agents aim to actively plan, query, and interact over multiple turns. Agentic reinforcement learning (RL) has emerged as a promising solution for training such agents in multi-turn settings. We propose BAO, an agentic RL framework that combines behavior enhancement, which enriches proactive reasoning and information-gathering capabilities, with behavior regularization, which suppresses inefficient or redundant interactions.
- Score: 61.641777037967366
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Proactive large language model (LLM) agents aim to actively plan, query, and interact over multiple turns, enabling efficient task completion beyond passive instruction following and making them essential for real-world, user-centric applications. Agentic reinforcement learning (RL) has recently emerged as a promising solution for training such agents in multi-turn settings, allowing interaction strategies to be learned from feedback. However, existing pipelines face a critical challenge in balancing task performance with user engagement: passive agents cannot efficiently adapt to users' intentions, while overuse of human feedback reduces their satisfaction. To address this trade-off, we propose BAO, an agentic RL framework that combines behavior enhancement to enrich proactive reasoning and information-gathering capabilities with behavior regularization to suppress inefficient or redundant interactions and align agent behavior with user expectations. We evaluate BAO on multiple tasks from the UserRL benchmark suite, and demonstrate that it substantially outperforms proactive agentic RL baselines while achieving comparable or even superior performance to commercial LLM agents, highlighting its effectiveness for training proactive, user-aligned LLM agents in complex multi-turn scenarios. Our website: https://proactive-agentic-rl.github.io/.
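To make the described trade-off concrete, below is a minimal sketch of a turn-level reward that combines a task outcome with a bonus for informative queries and a penalty for redundant ones, in the spirit of BAO's behavior enhancement and behavior regularization. The weights, the query budget, and the function name are illustrative assumptions, not the paper's published formulation.

```python
# Hedged sketch: behavior-enhanced, behavior-regularized reward shaping.
# All weights, the query budget, and the linear combination are assumptions
# made for illustration; they are not BAO's actual objective.

def shaped_reward(task_reward: float,
                  info_gain_bonus: float,
                  num_user_queries: int,
                  query_budget: int = 3,
                  bonus_weight: float = 0.5,
                  penalty_weight: float = 0.2) -> float:
    """Reward task success, encourage informative queries (behavior enhancement),
    and penalize queries beyond a budget (behavior regularization)."""
    redundancy = max(0, num_user_queries - query_budget)
    return task_reward + bonus_weight * info_gain_bonus - penalty_weight * redundancy

# Example: successful episode, useful clarifications, one query over budget.
print(shaped_reward(task_reward=1.0, info_gain_bonus=0.6, num_user_queries=4))  # 1.1
```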
Related papers
- InfoPO: Information-Driven Policy Optimization for User-Centric Agents [39.407032905771885]
We introduce InfoPO, which frames multi-turn interaction as a process of active uncertainty reduction. It computes an information-gain reward that credits turns whose feedback changes the agent's subsequent action distribution. It then combines this signal with task outcomes via an adaptive variance-gated fusion.
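One possible reading of this mechanism, sketched below: measure the shift in the agent's next-action distribution after a user turn with a KL divergence, then fuse that signal with the task reward through a variance-dependent gate. Both the KL choice and the gating rule are assumptions for illustration, not the paper's exact formulas.

```python
# Hedged sketch of an information-gain turn reward plus variance-gated fusion.
import math

def info_gain(p_before: dict, p_after: dict, eps: float = 1e-8) -> float:
    """KL(p_after || p_before) over a shared discrete action vocabulary."""
    return sum(q * math.log((q + eps) / (p_before.get(a, eps) + eps))
               for a, q in p_after.items())

def fused_reward(task_reward: float, gain: float, reward_variance: float) -> float:
    """Lean on the auxiliary signal more when the task reward is noisy;
    this particular gate is an illustrative choice, not the paper's."""
    gate = reward_variance / (1.0 + reward_variance)
    return task_reward + gate * gain

p0 = {"ask": 0.5, "act": 0.5}
p1 = {"ask": 0.1, "act": 0.9}      # user feedback sharpened the policy
print(fused_reward(1.0, info_gain(p0, p1), reward_variance=0.4))
```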
arXiv Detail & Related papers (2026-02-28T13:58:14Z) - Agent-Omit: Training Efficient LLM Agents for Adaptive Thought and Observation Omission via Agentic Reinforcement Learning [15.39565540937229]
Managing agent thought and observation during multi-turn agent-environment interactions is an emerging strategy to improve efficiency. We propose Agent-Omit, a unified training framework that empowers LLM agents to adaptively omit redundant thoughts and observations. Experimental results show that the resulting Agent-Omit-8B achieves performance comparable to that of seven frontier LLM agents.
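As a rough illustration of observation omission (Agent-Omit learns when to omit via agentic RL; the fixed keep-last-k heuristic and message format below are assumptions):

```python
# Hedged sketch: drop stale observations from the rolling context.
# The {"role", "content"} message format and the fixed heuristic are placeholders.

def trim_history(turns: list[dict], keep_last_observations: int = 2) -> list[dict]:
    """Keep every action/thought, but replace all except the most recent
    observations with a short placeholder to shrink the prompt."""
    obs_idx = [i for i, t in enumerate(turns) if t["role"] == "observation"]
    drop = set(obs_idx[:-keep_last_observations] if keep_last_observations else obs_idx)
    return [{"role": "observation", "content": "[omitted]"} if i in drop else t
            for i, t in enumerate(turns)]
```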
arXiv Detail & Related papers (2026-02-04T07:26:23Z) - SpeakRL: Synergizing Reasoning, Speaking, and Acting in Language Models with Reinforcement Learning [46.70182219204539]
We introduce SpeakRL, a reinforcement learning (RL) method that enhances agents' conversational capabilities by rewarding proactive interactions with users. We present a systematic analysis of reward design for conversational proactivity and propose a principled reward formulation for teaching agents to balance asking with acting.
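One simple way to operationalize "balance asking with acting", sketched under assumptions (the bonus and penalty values and the ambiguity flag are illustrative, not the paper's reward):

```python
# Hedged sketch: reward asking only when the user's intent was actually ambiguous.

def speak_or_act_reward(task_success: bool,
                        asked_clarification: bool,
                        intent_was_ambiguous: bool,
                        ask_bonus: float = 0.3,
                        ask_penalty: float = 0.2) -> float:
    reward = 1.0 if task_success else 0.0
    if asked_clarification:
        reward += ask_bonus if intent_was_ambiguous else -ask_penalty
    return reward

print(speak_or_act_reward(True, True, True))    # 1.3: useful question
print(speak_or_act_reward(True, True, False))   # 0.8: unnecessary question
```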
arXiv Detail & Related papers (2025-12-15T10:08:53Z) - SAND: Boosting LLM Agents with Self-Taught Action Deliberation [54.48979740613828]
Large Language Model (LLM) agents are commonly tuned with supervised finetuning on ReAct-style expert trajectories or preference optimization over pairwise rollouts. We propose the Self-taught ActioN Deliberation (SAND) framework, which enables LLM agents to explicitly deliberate over candidate actions before committing to one. SAND achieves an average 20% improvement over initial supervised finetuning and also outperforms state-of-the-art agent tuning approaches.
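A minimal sketch of the deliberation step, where `propose_actions` and `critique` are hypothetical stand-ins for LLM calls; the self-taught finetuning loop that SAND builds on top of such traces is not shown:

```python
# Hedged sketch: deliberate over candidate actions before committing.
# `propose_actions` and `critique` are hypothetical LLM-backed callables.

def deliberate(state: str, propose_actions, critique):
    candidates = propose_actions(state)                      # state -> list[str]
    scored = [(critique(state, a), a) for a in candidates]   # (score, action) pairs
    best_score, best_action = max(scored)
    return best_action, scored   # keep the trace for later distillation
```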
arXiv Detail & Related papers (2025-07-10T05:38:15Z) - Thought-Augmented Planning for LLM-Powered Interactive Recommender Agent [56.61028117645315]
We propose a novel thought-augmented interactive recommender agent system (TAIRA) that addresses complex user intents through distilled thought patterns. Specifically, TAIRA is designed as an LLM-powered multi-agent system featuring a manager agent that orchestrates recommendation tasks by decomposing user needs and planning subtasks. Through comprehensive experiments conducted across multiple datasets, TAIRA exhibits significantly enhanced performance compared to existing methods.
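A schematic of the manager-worker split described above; `decompose` stands in for the LLM-based, thought-pattern-guided planner, and all names and interfaces are hypothetical:

```python
# Hedged sketch: a manager agent plans subtasks and dispatches them to workers.

def manage(request: str, decompose, workers: dict) -> list[str]:
    # decompose: request -> [("search", "find jazz albums"), ("rank", "..."), ...]
    plan = decompose(request)
    return [workers[name](subtask) for name, subtask in plan]
```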
arXiv Detail & Related papers (2025-06-30T03:15:50Z) - From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise rewards to optimize the agent's reinforcement learning process. We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
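One common way to realize an implicit step-wise reward, shown here only to illustrate the general idea rather than the paper's exact objective: score each step by how much more probable the expert policy makes the chosen action than the current policy does.

```python
# Hedged sketch: implicit per-step reward from expert vs. agent action probabilities.
import math

def step_reward(expert_prob: float, agent_prob: float, eps: float = 1e-8) -> float:
    return math.log(expert_prob + eps) - math.log(agent_prob + eps)

print(step_reward(0.8, 0.2))  # positive: the expert strongly prefers this action
```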
arXiv Detail & Related papers (2024-11-06T10:35:11Z) - Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance [95.03771007780976]
We tackle the challenge of developing proactive agents capable of anticipating and initiating tasks without explicit human instructions. First, we collect real-world human activities to generate proactive task predictions. These predictions are labeled by human annotators as either accepted or rejected. The labeled data is used to train a reward model that simulates human judgment.
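A toy sketch of the reward-model step, assuming scikit-learn is available; the features and the logistic-regression choice are illustrative, not the paper's model:

```python
# Hedged sketch: fit a classifier on accept/reject labels for proposed proactive tasks.
from sklearn.linear_model import LogisticRegression  # assumed dependency

def featurize(prediction: dict) -> list[float]:
    # Hypothetical features of a proposed proactive task.
    return [len(prediction["task"]), prediction["confidence"]]

def train_reward_model(labeled: list) -> LogisticRegression:
    X = [featurize(p) for p, _ in labeled]
    y = [label for _, label in labeled]   # 1 = accepted, 0 = rejected
    return LogisticRegression().fit(X, y)
```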
arXiv Detail & Related papers (2024-10-16T08:24:09Z) - AgentCF: Collaborative Learning with Autonomous Language Agents for Recommender Systems [112.76941157194544]
We propose AgentCF for simulating user-item interactions in recommender systems through agent-based collaborative filtering.
We consider not only users but also items as agents, and develop a collaborative learning approach that optimizes both kinds of agents together.
Overall, the optimized agents exhibit diverse interaction behaviors within our framework, including user-item, user-user, item-item, and collective interactions.
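As a rough illustration of treating both sides as agents with updatable textual memories (the memory format and update rule below are assumptions, not AgentCF's mechanism):

```python
# Hedged sketch: user and item agents that both reflect on a simulated interaction.

class TextAgent:
    def __init__(self, profile: str):
        self.memory: list[str] = [profile]

    def reflect(self, note: str) -> None:
        self.memory.append(note)

def collaborative_update(user: TextAgent, item: TextAgent, clicked: bool) -> None:
    outcome = "engaged with" if clicked else "ignored"
    user.reflect(f"I {outcome} this item: {item.memory[0]}")
    item.reflect(f"A user with profile '{user.memory[0]}' {outcome} me")
```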
arXiv Detail & Related papers (2023-10-13T16:37:14Z)