RECAP: REwriting Conversations for Intent Understanding in Agentic Planning
- URL: http://arxiv.org/abs/2509.04472v1
- Date: Fri, 29 Aug 2025 20:45:37 GMT
- Title: RECAP: REwriting Conversations for Intent Understanding in Agentic Planning
- Authors: Kushan Mitra, Dan Zhang, Hannah Kim, Estevam Hruschka
- Abstract summary: Real-world dialogues are often ambiguous, underspecified, or dynamic. Traditional classification-based approaches struggle to generalize in open-ended settings. We propose RECAP, a new benchmark designed to evaluate and advance intent rewriting.
- Score: 14.28070179801169
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Understanding user intent is essential for effective planning in conversational assistants, particularly those powered by large language models (LLMs) coordinating multiple agents. However, real-world dialogues are often ambiguous, underspecified, or dynamic, making intent detection a persistent challenge. Traditional classification-based approaches struggle to generalize in open-ended settings, leading to brittle interpretations and poor downstream planning. We propose RECAP (REwriting Conversations for Agent Planning), a new benchmark designed to evaluate and advance intent rewriting, reframing user-agent dialogues into concise representations of user goals. RECAP captures diverse challenges such as ambiguity, intent drift, vagueness, and mixed-goal conversations. Alongside the dataset, we introduce an LLM-based evaluator that assesses planning utility given the rewritten intent. Using RECAP, we develop a prompt-based rewriting approach that outperforms baselines. We further demonstrate that fine-tuning two DPO-based rewriters yields additional utility gains. Our results highlight intent rewriting as a critical and tractable component for improving agent planning in open-domain dialogue systems.
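As an illustration of the prompt-based rewriting approach the abstract describes, the sketch below formats a user-agent dialogue into a rewriting prompt for an LLM. The prompt wording and function name are assumptions for illustration, not RECAP's actual prompts:

```python
def build_rewrite_prompt(turns):
    """Format a user-agent dialogue into a prompt that asks an LLM to
    rewrite it as a concise statement of the user's current goal.
    Illustrative only; RECAP's actual prompts are not reproduced here."""
    transcript = "\n".join(f"{role}: {text}" for role, text in turns)
    return (
        "Rewrite the conversation below as a single concise sentence "
        "describing the user's current goal. Resolve ambiguity and "
        "intent drift, keeping only the latest goal.\n\n"
        f"{transcript}\n\nRewritten intent:"
    )

# a toy conversation exhibiting intent drift (Tokyo -> Osaka)
turns = [
    ("user", "I need a flight to Tokyo."),
    ("agent", "For what dates?"),
    ("user", "Actually, make that Osaka, next Friday."),
]
prompt = build_rewrite_prompt(turns)  # pass this to any chat LLM
```

The rewritten intent returned by the model would then be handed to the downstream planner in place of the raw transcript.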
Related papers
- Exploring Plan Space through Conversation: An Agentic Framework for LLM-Mediated Explanations in Planning [10.679298682391817]
We present a multi-agent Large Language Model architecture that is agnostic to the explanation framework and enables user- and context-dependent interactive explanations.
We also describe an instantiation of this framework for goal-conflict explanations, which we use to conduct a user study comparing the LLM-powered interaction with a baseline template-based explanation interface.
arXiv Detail & Related papers (2026-03-02T16:58:18Z) - ReIn: Conversational Error Recovery with Reasoning Inception [43.5498321001366]
This work focuses on error recovery, which necessitates the accurate diagnosis of erroneous dialogue contexts and execution of proper recovery plans.
We propose Reasoning Inception (ReIn), a test-time intervention method that plants an initial reasoning into the agent's decision-making process.
We evaluate ReIn by systematically simulating conversational failure scenarios that directly hinder successful completion of user goals.
arXiv Detail & Related papers (2026-02-19T02:37:29Z) - ChatSOP: An SOP-Guided MCTS Planning Framework for Controllable LLM Dialogue Agents [52.7201882529976]
We propose an SOP-guided Monte Carlo Tree Search (MCTS) planning framework to enhance the controllability of dialogue agents.
To enable this, we curate a dataset comprising SOP-annotated multi-scenario dialogues, generated using a semi-automated role-playing system with GPT-4o.
We also propose a novel method that integrates Chain-of-Thought reasoning with supervised fine-tuning for SOP prediction.
arXiv Detail & Related papers (2024-07-04T12:23:02Z) - Ask-before-Plan: Proactive Language Agents for Real-World Planning [68.08024918064503]
Proactive Agent Planning requires language agents to predict clarification needs based on user-agent conversation and agent-environment interaction.
We propose a novel multi-agent framework, Clarification-Execution-Planning (CEP), which consists of three agents specialized in clarification, execution, and planning.
arXiv Detail & Related papers (2024-06-18T14:07:28Z) - Unsupervised End-to-End Task-Oriented Dialogue with LLMs: The Power of the Noisy Channel [9.082443585886127]
Training task-oriented dialogue systems typically requires turn-level annotations for interacting with their APIs.
Unlabeled data and a schema definition are sufficient for building a working task-oriented dialogue system, completely unsupervised.
We propose an innovative approach using expectation-maximization (EM) that infers turn-level annotations as latent variables.
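The EM idea, treating turn-level annotations as latent variables, can be shown on a classic toy problem: sessions of coin flips from two hidden coins, where EM recovers both the soft assignments and the coin biases. This is an illustrative analogue, not the paper's actual model:

```python
def em_two_coins(head_counts, n_flips, iters=50):
    """Toy EM: each session of n_flips coin flips comes from one of two
    hidden coins (the latent 'annotation'). The E-step computes soft
    assignments; the M-step re-estimates each coin's head probability."""
    p_a, p_b = 0.6, 0.5  # asymmetric initial guesses to break symmetry
    for _ in range(iters):
        heads_a = flips_a = heads_b = flips_b = 0.0
        for h in head_counts:
            t = n_flips - h
            like_a = (p_a ** h) * ((1 - p_a) ** t)
            like_b = (p_b ** h) * ((1 - p_b) ** t)
            resp_a = like_a / (like_a + like_b)  # E-step: P(coin A | session)
            heads_a += resp_a * h
            flips_a += resp_a * n_flips
            heads_b += (1 - resp_a) * h
            flips_b += (1 - resp_a) * n_flips
        p_a, p_b = heads_a / flips_a, heads_b / flips_b  # M-step
    return p_a, p_b

# sessions of 10 flips drawn from a heads-biased and a tails-biased coin
p_a, p_b = em_two_coins([9, 8, 2, 1, 9], n_flips=10)
```

EM separates the sessions without any session-level labels, which is the same principle the paper applies to inferring turn-level dialogue annotations.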
arXiv Detail & Related papers (2024-04-23T16:51:26Z) - Tell Me More! Towards Implicit User Intention Understanding of Language Model Driven Agents [110.25679611755962]
Current language model-driven agents often lack mechanisms for effective user participation, which is crucial given the vagueness commonly found in user instructions.
We introduce Intention-in-Interaction (IN3), a novel benchmark designed to inspect users' implicit intentions through explicit queries.
We empirically train Mistral-Interact, a powerful model that proactively assesses task vagueness, inquires user intentions, and refines them into actionable goals.
arXiv Detail & Related papers (2024-02-14T14:36:30Z) - JoTR: A Joint Transformer and Reinforcement Learning Framework for Dialog Policy Learning [53.83063435640911]
Dialogue policy learning (DPL) is a crucial component of dialogue modelling.
We introduce a novel framework, JoTR, to generate flexible dialogue actions.
Unlike traditional methods, JoTR formulates a word-level policy that allows for a more dynamic and adaptable dialogue action generation.
arXiv Detail & Related papers (2023-09-01T03:19:53Z) - Controllable Mixed-Initiative Dialogue Generation through Prompting [50.03458333265885]
Mixed-initiative dialogue tasks involve repeated exchanges of information and conversational control.
Agents gain control by generating responses that follow particular dialogue intents or strategies, prescribed by a policy planner.
The standard approach has been to fine-tune pre-trained language models to perform generation conditioned on these intents.
We instead prompt large language models as a drop-in replacement for fine-tuning on conditional generation.
arXiv Detail & Related papers (2023-05-06T23:11:25Z) - Target-Guided Dialogue Response Generation Using Commonsense and Data Augmentation [32.764356638437214]
We introduce a new technique for target-guided response generation.
We also propose techniques to re-purpose existing dialogue datasets for target-guided generation.
Our work generally enables dialogue system designers to exercise more control over the conversations that their systems produce.
arXiv Detail & Related papers (2022-05-19T04:01:40Z) - Improved Goal Oriented Dialogue via Utterance Generation and Look Ahead [5.062869359266078]
Intent prediction can be improved by training a deep text-to-text neural model to generate successive user utterances from unlabeled dialogue data.
We present a novel look-ahead approach that uses user utterance generation to improve intent prediction ahead of time.
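The look-ahead idea can be sketched as: generate a plausible next user utterance, append it to the dialogue, and only then classify the intent. The generator and classifier below are toy stand-ins (names and behavior are assumptions) that show the control flow:

```python
def lookahead_intent(dialogue, generate_next, predict_intent):
    """Append a generated next user utterance before classifying,
    so the classifier sees one simulated turn into the future."""
    extended = dialogue + [("user", generate_next(dialogue))]
    return predict_intent(extended)

# toy stand-ins for the text-to-text generator and the intent classifier
gen = lambda d: "Yes, and I also want to cancel my old booking."
clf = lambda d: ("cancel_booking" if any("cancel" in text for _, text in d)
                 else "new_booking")

dialogue = [("user", "I'd like to change my trip.")]
result = lookahead_intent(dialogue, gen, clf)
```

On the ambiguous opening turn alone, the toy classifier returns new_booking; with the simulated next turn appended, the prediction shifts to cancel_booking.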
arXiv Detail & Related papers (2021-10-24T11:12:48Z) - Structural Pre-training for Dialogue Comprehension [51.215629336320305]
We present SPIDER, Structural Pre-traIned DialoguE Reader, to capture dialogue-exclusive features.
To simulate the dialogue-like features, we propose two training objectives in addition to the original LM objectives.
Experimental results on widely used dialogue benchmarks verify the effectiveness of the newly introduced self-supervised tasks.
arXiv Detail & Related papers (2021-05-23T15:16:54Z) - Personalized Query Rewriting in Conversational AI Agents [7.086654234990377]
We propose a query rewriting approach by leveraging users' historically successful interactions as a form of memory.
We present a neural retrieval model and a pointer-generator network with hierarchical attention and show that they perform significantly better at the query rewriting task with the aforementioned user memories than without.
arXiv Detail & Related papers (2020-11-09T20:45:39Z) - Generalizable and Explainable Dialogue Generation via Explicit Action Learning [33.688270031454095]
Conditioned response generation serves as an effective approach to optimize task completion and language quality.
Latent action learning is introduced to map each utterance to a latent representation.
However, this approach is prone to over-dependence on the training data, which restricts its generalization capability.
Our proposed approach outperforms latent action baselines on MultiWOZ, a benchmark multi-domain dataset.
arXiv Detail & Related papers (2020-10-08T04:37:22Z) - Learning an Effective Context-Response Matching Model with Self-Supervised Tasks for Retrieval-based Dialogues [88.73739515457116]
We introduce four self-supervised tasks including next session prediction, utterance restoration, incoherence detection and consistency discrimination.
We jointly train the PLM-based response selection model with these auxiliary tasks in a multi-task manner.
Experiment results indicate that the proposed auxiliary self-supervised tasks bring significant improvement for multi-turn response selection.
arXiv Detail & Related papers (2020-09-14T08:44:46Z)
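The multi-task recipe in the last entry, a main response-selection loss plus weighted auxiliary self-supervised losses, reduces to a weighted sum; the specific weights and loss values below are placeholders, not the paper's:

```python
def multitask_loss(main_loss, aux_losses, weights):
    """Combine the response-selection loss with auxiliary self-supervised
    losses (next-session prediction, utterance restoration, incoherence
    detection, consistency discrimination) as a weighted sum."""
    return main_loss + sum(w * l for w, l in zip(weights, aux_losses))

# one main loss, four auxiliary losses, uniform placeholder weights
total = multitask_loss(0.9, [0.4, 0.2, 0.1, 0.3], [0.5, 0.5, 0.5, 0.5])
```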
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.