Reinforcing Real-world Service Agents: Balancing Utility and Cost in Task-oriented Dialogue
- URL: http://arxiv.org/abs/2602.22697v1
- Date: Thu, 26 Feb 2026 07:19:57 GMT
- Title: Reinforcing Real-world Service Agents: Balancing Utility and Cost in Task-oriented Dialogue
- Authors: Ning Gao, Wei Zhang, Yuqin Dai, Ling Shi, Ziyin Wang, Yujie Wang, Wei He, Jinpeng Wang, Chaozheng Wang,
- Abstract summary: We propose InteractCS-RL, a framework that reframes task-oriented dialogue as a multi-granularity reinforcement learning process.<n>We first establish a User-centric Interaction Framework to provide a high-fidelity training gym.<n>Then, we introduce Cost-aware Multi-turn Policy Optimization (CMPO) with a hybrid advantage estimation strategy.
- Score: 28.25180116201176
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rapid evolution of Large Language Models (LLMs) has accelerated the transition from conversational chatbots to general agents. However, effectively balancing empathetic communication with budget-aware decision-making remains an open challenge. Since existing methods fail to capture these complex strategic trade-offs, we propose InteractCS-RL, a framework that reframes task-oriented dialogue as a multi-granularity reinforcement learning process. Specifically, we first establish a User-centric Interaction Framework to provide a high-fidelity training gym, enabling agents to dynamically explore diverse strategies with persona-driven users. Then, we introduce Cost-aware Multi-turn Policy Optimization (CMPO) with a hybrid advantage estimation strategy. By integrating generative process credits and employing a PID-Lagrangian cost controller, CMPO effectively guides the policy to explore Pareto boundary between user reward and global cost constraints. Extensive experiments on customized real business scenarios demonstrate that InteractCS-RL significantly outperform other baselines across three evaluation dimensions. Further evaluation on tool-agent-user interaction benchmarks verify InteractCS-RL robustness across diverse domains.
Related papers
- Pushing Forward Pareto Frontiers of Proactive Agents with Behavioral Agentic Optimization [61.641777037967366]
Proactive large language model (LLM) agents aim to actively plan, query, and interact over multiple turns.<n>Agentic reinforcement learning (RL) has emerged as a promising solution for training such agents in multi-turn settings.<n>We propose BAO, an agentic RL framework that combines behavior enhancement to enrich proactive reasoning and information-gathering capabilities.
arXiv Detail & Related papers (2026-02-11T20:40:43Z) - ESearch-R1: Learning Cost-Aware MLLM Agents for Interactive Embodied Search via Reinforcement Learning [40.2017873619555]
ESearch-R1 is a cost-aware embodied reasoning framework.<n>It unifies interactive dialogue (Ask), episodic memory retrieval (GetMemory) and physical navigation (Navigate) into a single decision process.<n>It improves task success rates while reducing total operational costs by approximately 50%.
arXiv Detail & Related papers (2025-12-21T02:45:08Z) - Reason-Plan-ReAct: A Reasoner-Planner Supervising a ReAct Executor for Complex Enterprise Tasks [0.0]
We introduce RP-ReAct, a novel multi-agent approach that decouples strategic planning from low-level execution to achieve superior reliability and efficiency.<n>RP-ReAct consists of a Reasoner Planner Agent (RPA), responsible for planning each sub-step, and one or multiple Proxy-Execution Agent (PEA) that translates sub-steps into concrete tool interactions.<n>We evaluate RP-ReAct, on the challenging, multi-domain ToolQA benchmark using a diverse set of six open-weight reasoning models.
arXiv Detail & Related papers (2025-12-03T08:28:40Z) - Interact-RAG: Reason and Interact with the Corpus, Beyond Black-Box Retrieval [49.85856484781787]
We introduce Interact-RAG, a new paradigm that elevates the LLM agent into an active manipulator of the retrieval process.<n>We develop a reasoning-enhanced workflow, which enables both zero-shot execution and the synthesis of interaction trajectories.<n>Experiments across six benchmarks demonstrate that Interact-RAG significantly outperforms other advanced methods.
arXiv Detail & Related papers (2025-10-31T15:48:43Z) - Process-Supervised Reinforcement Learning for Interactive Multimodal Tool-Use Agents [34.720205364467546]
We introduce a sandbox environment for reinforcement learning (RL) that supports interleaved speech-text rollouts.<n>Our core strategy, Turn-level Adjudicated Reinforcement Learning (TARL), addresses the challenge of credit assignment in long-horizon tasks.<n>This unified approach boosts the task pass rate on the text-based $tau$-bench by over 6% compared to strong RL baselines.
arXiv Detail & Related papers (2025-09-17T23:25:00Z) - AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning [129.44038804430542]
We introduce AgentGym-RL, a new framework to train LLM agents for multi-turn interactive decision-making through RL.<n>We propose ScalingInter-RL, a training approach designed for exploration-exploitation balance and stable RL optimization.<n>Our agents match or surpass commercial models on 27 tasks across diverse environments.
arXiv Detail & Related papers (2025-09-10T16:46:11Z) - ARPO:End-to-End Policy Optimization for GUI Agents with Experience Replay [88.74638385288773]
Agentic Replay Policy Optimization improves performance on complex, long-horizon computer tasks.<n>We propose a task selection strategy that filters tasks based on baseline agent performance.<n>Experiments on the OSWorld benchmark demonstrate that ARPO achieves competitive results.
arXiv Detail & Related papers (2025-05-22T06:24:32Z) - Advancing Language Multi-Agent Learning with Credit Re-Assignment for Interactive Environment Generalization [38.68388721203677]
We propose CollabUIAgents, a multi-agent reinforcement learning framework with a novel multi-agent credit re-assignment strategy.<n>We show that our framework improves both performance and cross-environment generalizability of multi-agent systems.
arXiv Detail & Related papers (2025-02-20T12:26:15Z) - Learning Emergence of Interaction Patterns across Independent RL Agents in Multi-Agent Environments [3.0284592792243794]
Bottom Up Network (BUN) treats the collective of multi-agents as a unified entity.
Our empirical evaluations across a variety of cooperative multi-agent scenarios, including tasks such as cooperative navigation and traffic control, consistently demonstrate BUN's superiority over baseline methods with substantially reduced computational costs.
arXiv Detail & Related papers (2024-10-03T14:25:02Z) - Enabling Real-Time Conversations with Minimal Training Costs [61.80370154101649]
This paper presents a new duplex decoding approach that enhances large language models with duplex ability, requiring minimal training.
Experimental results indicate that our proposed method significantly enhances the naturalness and human-likeness of user-AI interactions with minimal training costs.
arXiv Detail & Related papers (2024-09-18T06:27:26Z) - Design Optimization of NOMA Aided Multi-STAR-RIS for Indoor Environments: A Convex Approximation Imitated Reinforcement Learning Approach [51.63921041249406]
Non-orthogonal multiple access (NOMA) enables multiple users to share the same frequency band, and simultaneously transmitting and reflecting reconfigurable intelligent surface (STAR-RIS)
deploying STAR-RIS indoors presents challenges in interference mitigation, power consumption, and real-time configuration.
A novel network architecture utilizing multiple access points (APs), STAR-RISs, and NOMA is proposed for indoor communication.
arXiv Detail & Related papers (2024-06-19T07:17:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.