IntentRL: Training Proactive User-intent Agents for Open-ended Deep Research via Reinforcement Learning
- URL: http://arxiv.org/abs/2602.03468v1
- Date: Tue, 03 Feb 2026 12:43:09 GMT
- Title: IntentRL: Training Proactive User-intent Agents for Open-ended Deep Research via Reinforcement Learning
- Authors: Haohao Luo, Zexi Li, Yuexiang Xie, Wenhao Zhang, Yaliang Li, Ying Shen
- Abstract summary: Deep Research (DR) agents extend Large Language Models (LLMs) beyond parametric knowledge. Unlike real-time conversational assistants, DR is computationally expensive and time-consuming. We propose IntentRL, a framework that trains proactive agents to clarify latent user intents before starting long-horizon research.
- Score: 54.21689544323704
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Research (DR) agents extend Large Language Models (LLMs) beyond parametric knowledge by autonomously retrieving and synthesizing evidence from large web corpora into long-form reports, enabling a long-horizon agentic paradigm. However, unlike real-time conversational assistants, DR is computationally expensive and time-consuming, creating an autonomy-interaction dilemma: high autonomy on ambiguous user queries often leads to prolonged execution with unsatisfactory outcomes. To address this, we propose IntentRL, a framework that trains proactive agents to clarify latent user intents before starting long-horizon research. To overcome the scarcity of open-ended research data, we introduce a scalable pipeline that expands a few seed samples into high-quality dialogue turns via a shallow-to-deep intent refinement graph. We further adopt a two-stage reinforcement learning (RL) strategy: Stage I applies RL on offline dialogues to efficiently learn general user-interaction behavior, while Stage II uses the trained agent and a user simulator for online rollouts to strengthen adaptation to diverse user feedback. Extensive experiments show that IntentRL significantly improves both intent hit rate and downstream task performance, outperforming the built-in clarify modules of closed-source DR agents and proactive LLM baselines.
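The two-stage strategy described in the abstract can be illustrated with a minimal toy skeleton. All names here (`UserSimulator`, `stage_one_offline`, `stage_two_online`, the reward shape) are hypothetical placeholders for illustration only, not the paper's actual implementation: Stage I distills a clarifying behavior from offline dialogues, and Stage II rolls that behavior against simulated users and scores intent hits.

```python
def intent_hit_reward(clarified_intent, true_intent):
    # Toy reward: 1.0 if the clarified intent matches the latent one, else 0.0.
    return 1.0 if clarified_intent == true_intent else 0.0

class UserSimulator:
    """Stands in for Stage II's simulated user: answers clarifying questions."""
    def __init__(self, true_intent):
        self.true_intent = true_intent

    def respond(self, question):
        # Reveal the latent intent only when actually asked a clarifying question.
        return self.true_intent if "clarify" in question else "unknown"

def stage_one_offline(dialogues):
    """Stage I (offline): learn a general interaction behavior from logged turns.

    As a stand-in for RL on offline dialogues, pick the most frequent
    clarifying question observed in the data.
    """
    questions = [turn["question"] for d in dialogues for turn in d]
    return max(set(questions), key=questions.count)

def stage_two_online(policy_question, simulators):
    """Stage II (online): roll the trained agent against user simulators
    and compute the average intent hit rate as the reward signal."""
    rewards = []
    for sim in simulators:
        clarified = sim.respond(policy_question)
        rewards.append(intent_hit_reward(clarified, sim.true_intent))
    return sum(rewards) / len(rewards)

# Offline dialogue turns for Stage I (toy data).
offline = [
    [{"question": "clarify: which subtopic?"}],
    [{"question": "clarify: which subtopic?"}],
    [{"question": "proceed without asking"}],
]
policy_q = stage_one_offline(offline)
hit_rate = stage_two_online(
    policy_q, [UserSimulator("survey"), UserSimulator("tutorial")]
)
```

In this sketch the clarifying question learned offline generalizes to the online simulators, so the hit rate is high; in the actual framework both stages would update the agent's policy with RL rather than a frequency count.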
Related papers
- Dialogue Model Optimization via Agent Game and Adaptive Tree-based GRPO [19.784541601653128]
  Open-ended dialogue agents aim to deliver engaging, personalized interactions by adapting to users' traits. We propose a novel long-horizon framework integrating online personalization with Adaptive Tree-based Group Relative Policy Optimization.
  arXiv Detail & Related papers (2026-02-09T11:32:02Z)
- ActiveVLN: Towards Active Exploration via Multi-Turn RL in Vision-and-Language Navigation [57.399685080574756]
  Existing MLLM-based VLN methods rely on imitation learning (IL) and often use DAgger for post-training. We propose ActiveVLN, a VLN framework that explicitly enables active exploration through multi-turn RL. Experiments show that ActiveVLN achieves the largest performance gains over IL baselines compared to both DAgger-based and prior RL-based post-training methods.
  arXiv Detail & Related papers (2025-09-16T03:31:46Z)
- DeepDive: Advancing Deep Search Agents with Knowledge Graphs and Multi-Turn RL [60.47878242100153]
  We present DeepDive to advance deep search agents. We propose a strategy to automatically synthesize complex, difficult, and hard-to-find questions from open knowledge graphs, and apply end-to-end multi-turn reinforcement learning to enhance LLMs' long-horizon reasoning with deep search.
  arXiv Detail & Related papers (2025-09-12T17:52:35Z)
- SFR-DeepResearch: Towards Effective Reinforcement Learning for Autonomously Reasoning Single Agents [93.26456498576181]
  This paper focuses on the development of native Autonomous Single-Agent models for Deep Research. Our best variant SFR-DR-20B achieves up to 28.7% on Humanity's Last Exam benchmark.
  arXiv Detail & Related papers (2025-09-08T02:07:09Z)
- Hello Again! LLM-powered Personalized Agent for Long-term Dialogue [63.65128176360345]
  We introduce a model-agnostic framework, the Long-term Dialogue Agent (LD-Agent). It incorporates three independently tunable modules dedicated to event perception, persona extraction, and response generation. The effectiveness, generality, and cross-domain capabilities of LD-Agent are empirically demonstrated.
  arXiv Detail & Related papers (2024-06-09T21:58:32Z)
- Retrieval-Augmented Reinforcement Learning [63.32076191982944]
  We train a network to map a dataset of past experiences to optimal behavior. The retrieval process is trained to retrieve information from the dataset that may be useful in the current context. We show that retrieval-augmented R2D2 learns significantly faster than the baseline R2D2 agent and achieves higher scores.
  arXiv Detail & Related papers (2022-02-17T02:44:05Z)