Dialogue Model Optimization via Agent Game and Adaptive Tree-based GRPO
- URL: http://arxiv.org/abs/2602.08533v2
- Date: Tue, 10 Feb 2026 13:34:47 GMT
- Title: Dialogue Model Optimization via Agent Game and Adaptive Tree-based GRPO
- Authors: Kun Peng, Conghui Tan, Yu Liu, Guohua Tang, Zhongqian Sun, Wei Yang, Zining Zhu, Lei Jiang, Yanbing Liu, Hao Peng
- Abstract summary: Open-ended dialogue agents aim to deliver engaging, personalized interactions by adapting to users' traits. We propose a novel long-horizon framework integrating online personalization with Adaptive Tree-based Group Relative Policy Optimization.
- Score: 19.784541601653128
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Open-ended dialogue agents aim to deliver engaging, personalized interactions by adapting to users' traits, but existing methods face critical limitations: over-reliance on pre-collected user data, and short-horizon biases in reinforcement learning (RL) that neglect long-term dialogue value. To address these, we propose a novel long-horizon RL framework integrating online personalization with Adaptive Tree-based Group Relative Policy Optimization (AT-GRPO). Adopting a two-agent game paradigm, a user agent constructs dynamic environments via style mimicry (learning user-specific conversational traits) and active termination (predicting turn-level termination probabilities as immediate rewards), forming an iterative cycle that drives the dialogue agent to deepen interest exploration. AT-GRPO reinterprets dialogue trajectories as trees and introduces adaptive observation ranges. Unlike full tree expansion that incurs exponential overhead, it limits each node to aggregate rewards from a stage-aware range: larger ranges support early-stage topic exploration, while smaller ranges facilitate late-stage dialogue maintenance. This design reduces rollout budgets from exponential to polynomial in the dialogue length, while preserving long-term reward capture. Extensive experiments show our framework's superior performance, sample efficiency, and robustness.
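The abstract's stage-aware aggregation (large observation ranges for early-stage topic exploration, small ranges for late-stage dialogue maintenance) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the linear range schedule, the discount factor, and the per-depth reward averaging are all assumptions, and `Node`, `observation_range`, and `aggregate` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One turn in a dialogue trajectory viewed as a tree node."""
    turn: int                                   # depth in the dialogue (0 = first turn)
    reward: float                               # immediate turn-level reward
    children: List["Node"] = field(default_factory=list)


def observation_range(turn: int, horizon: int, r_max: int, r_min: int) -> int:
    """Stage-aware range: shrinks linearly from r_max (early) to r_min (late)."""
    frac = turn / max(horizon - 1, 1)
    return round(r_max - frac * (r_max - r_min))


def aggregate(node: Node, horizon: int, r_max: int = 4, r_min: int = 1,
              gamma: float = 0.9) -> float:
    """Discounted reward summed over descendants within the node's adaptive range.

    Instead of expanding the full subtree (exponential in dialogue length),
    each node only looks ahead k levels, where k depends on its stage.
    """
    k = observation_range(node.turn, horizon, r_max, r_min)
    total, frontier, depth = node.reward, node.children, 1
    while frontier and depth <= k:
        # Average sibling rewards at this depth, discounted by distance.
        total += (gamma ** depth) * sum(c.reward for c in frontier) / len(frontier)
        frontier = [g for c in frontier for g in c.children]
        depth += 1
    return total
```

Because each node inspects at most `r_max` levels rather than the whole subtree, the rollout budget grows polynomially rather than exponentially in the dialogue length, which is the trade-off the abstract describes.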
Related papers
- IntentRL: Training Proactive User-intent Agents for Open-ended Deep Research via Reinforcement Learning [54.21689544323704]
Deep Research (DR) agents extend Large Language Models (LLMs) beyond parametric knowledge. Unlike real-time conversational assistants, DR is computationally expensive and time-consuming. We propose IntentRL, a framework that trains proactive agents to clarify latent user intents before starting long-horizon research.
arXiv Detail & Related papers (2026-02-03T12:43:09Z)
- ALPBench: A Benchmark for Attribution-level Long-term Personal Behavior Understanding [53.88804678012327]
ALPBench is a benchmark for attribution-level long-term personal behavior understanding. It predicts user-interested attribute combinations, enabling ground-truth evaluation. It models preferences from long-term historical behaviors rather than users' explicitly expressed requests.
arXiv Detail & Related papers (2026-02-03T03:32:16Z)
- Agentic Conversational Search with Contextualized Reasoning via Reinforcement Learning [66.52010873968383]
We introduce a conversational agent that interleaves search and reasoning across turns, enabling exploratory and adaptive behaviors learned through reinforcement learning (RL) training. Experimental results across four widely used conversational benchmarks demonstrate the effectiveness of our methods.
arXiv Detail & Related papers (2026-01-19T14:55:54Z)
- Mem-PAL: Towards Memory-based Personalized Dialogue Assistants for Long-term User-Agent Interaction [55.24448139349266]
We present PAL-Bench, a new benchmark designed to evaluate the personalization capabilities of service-oriented assistants in long-term user-agent interactions. To improve personalized service-oriented interactions, we propose H$2$Memory, a hierarchical and heterogeneous memory framework.
arXiv Detail & Related papers (2025-11-17T14:22:32Z)
- RGMem: Renormalization Group-based Memory Evolution for Language Agent User Profile [8.224917568034572]
We propose a self-evolving memory framework inspired by the classic renormalization group (RG) from physics. This framework organizes the dialogue history at multiple scales. The core innovation of our work lies in modeling memory evolution as a multi-scale process of information compression and emergence.
arXiv Detail & Related papers (2025-10-18T08:16:46Z)
- Hello Again! LLM-powered Personalized Agent for Long-term Dialogue [63.65128176360345]
We introduce a model-agnostic framework, the Long-term Dialogue Agent (LD-Agent). It incorporates three independently tunable modules dedicated to event perception, persona extraction, and response generation. The effectiveness, generality, and cross-domain capabilities of LD-Agent are empirically demonstrated.
arXiv Detail & Related papers (2024-06-09T21:58:32Z)
- Evaluating Very Long-Term Conversational Memory of LLM Agents [95.84027826745609]
We introduce a machine-human pipeline to generate high-quality, very long-term dialogues.
We equip each agent with the capability of sharing and reacting to images.
The generated conversations are verified and edited by human annotators for long-range consistency.
arXiv Detail & Related papers (2024-02-27T18:42:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.