Related papers: ReIn: Conversational Error Recovery with Reasoning Inception

ReIn: Conversational Error Recovery with Reasoning Inception

URL: http://arxiv.org/abs/2602.17022v1
Date: Thu, 19 Feb 2026 02:37:29 GMT
Title: ReIn: Conversational Error Recovery with Reasoning Inception
Authors: Takyoung Kim, Jinseok Nam, Chandrayee Basu, Xing Fan, Chengyuan Ma, Heng Ji, Gokhan Tur, Dilek Hakkani-Tür,
Abstract summary: This work focuses on error recovery, which necessitates the accurate diagnosis of erroneous dialogue contexts and execution of proper recovery plans.<n>We propose Reasoning Inception (ReIn), a test-time intervention method that plants an initial reasoning into the agent's decision-making process.<n>We evaluate ReIn by systematically simulating conversational failure scenarios that directly hinder successful completion of user goals.
Score: 43.5498321001366
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Conversational agents powered by large language models (LLMs) with tool integration achieve strong performance on fixed task-oriented dialogue datasets but remain vulnerable to unanticipated, user-induced errors. Rather than focusing on error prevention, this work focuses on error recovery, which necessitates the accurate diagnosis of erroneous dialogue contexts and execution of proper recovery plans. Under realistic constraints precluding model fine-tuning or prompt modification due to significant cost and time requirements, we explore whether agents can recover from contextually flawed interactions and how their behavior can be adapted without altering model parameters and prompts. To this end, we propose Reasoning Inception (ReIn), a test-time intervention method that plants an initial reasoning into the agent's decision-making process. Specifically, an external inception module identifies predefined errors within the dialogue context and generates recovery plans, which are subsequently integrated into the agent's internal reasoning process to guide corrective actions, without modifying its parameters or system prompts. We evaluate ReIn by systematically simulating conversational failure scenarios that directly hinder successful completion of user goals: user's ambiguous and unsupported requests. Across diverse combinations of agent models and inception modules, ReIn substantially improves task success and generalizes to unseen error types. Moreover, it consistently outperforms explicit prompt-modification approaches, underscoring its utility as an efficient, on-the-fly method. In-depth analysis of its operational mechanism, particularly in relation to instruction hierarchy, indicates that jointly defining recovery tools with ReIn can serve as a safe and effective strategy for improving the resilience of conversational agents without modifying the backbone models or system prompts.

Related papers

From Static Benchmarks to Dynamic Protocol: Agent-Centric Text Anomaly Detection for Evaluating LLM Reasoning [12.024430772980502]
We introduce an agent-centric benchmarking paradigm for evaluating large language models.<n>A teacher agent generates candidate problems, an orchestrator agent rigorously verifies their validity and guards against adversarial attacks.<n>If a student correctly solves the problem, the orchestrator prompts the teacher to generate more challenging variants.
arXiv Detail & Related papers (2026-02-27T06:54:32Z)
The Why Behind the Action: Unveiling Internal Drivers via Agentic Attribution [63.61358761489141]
Large Language Model (LLM)-based agents are widely used in real-world applications such as customer service, web navigation, and software engineering.<n>We propose a novel framework for textbfgeneral agentic attribution, designed to identify the internal factors driving agent actions regardless of the task outcome.<n>We validate our framework across a diverse suite of agentic scenarios, including standard tool use and subtle reliability risks like memory-induced bias.
arXiv Detail & Related papers (2026-01-21T15:22:21Z)
Metacognitive Self-Correction for Multi-Agent System via Prototype-Guided Next-Execution Reconstruction [58.51530390018909]
Large Language Model based multi-agent systems excel at collaborative problem solving but remain brittle to cascading errors.<n>We present MASC, a metacognitive framework that endows MAS with real-time, unsupervised, step-level error detection and self-correction.
arXiv Detail & Related papers (2025-10-16T05:35:37Z)
PAT-Agent: Autoformalization for Model Checking [17.082027022913998]
PAT-Agent is an end-to-end framework for natural language autoformalization and formal model repair.<n>It combines the generative capabilities of large language models with the rigor of formal verification.
arXiv Detail & Related papers (2025-09-28T06:32:14Z)
RECAP: REwriting Conversations for Intent Understanding in Agentic Planning [14.28070179801169]
Real-world dialogues are often ambiguous, underspecified, or dynamic.<n>Traditional classification-based approaches struggle to generalize in open-ended settings.<n>We propose RECAP, a new benchmark designed to evaluate and advance intent rewriting.
arXiv Detail & Related papers (2025-08-29T20:45:37Z)
In Prospect and Retrospect: Reflective Memory Management for Long-term Personalized Dialogue Agents [70.12342024019044]
Large Language Models (LLMs) have made significant progress in open-ended dialogue, yet their inability to retain and retrieve relevant information limits their effectiveness.<n>We propose Reflective Memory Management (RMM), a novel mechanism for long-term dialogue agents, integrating forward- and backward-looking reflections.<n>RMM shows more than 10% accuracy improvement over the baseline without memory management on the LongMemEval dataset.
arXiv Detail & Related papers (2025-03-11T04:15:52Z)
Self-Corrective Task Planning by Inverse Prompting with Large Language Models [9.283971287618261]
We introduce InversePrompt, a novel self-corrective task planning approach.<n>Our method incorporates reasoning steps to provide clear, interpretable feedback.<n>Results on benchmark datasets show an average 16.3% higher success rate over existing LLM-based task planning methods.
arXiv Detail & Related papers (2025-03-10T13:35:51Z)
Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training [18.896813839389893]
We propose an iterative self-training framework, Agent-R, that enables language Agent to Reflect on the fly.<n>Unlike traditional methods that reward or penalize actions based on correctness, Agent-R leverages MCTS to construct training data that recover correct trajectories from erroneous ones.<n>Our findings demonstrate that Agent-R continuously improves the model's ability to recover from errors and enables timely error correction.
arXiv Detail & Related papers (2025-01-20T11:46:04Z)
Repairs in a Block World: A New Benchmark for Handling User Corrections with Multi-Modal Language Models [48.42142115255159]
We release BlockWorld-Repairs: a dataset of multi-modal TPR sequences in an instruction-following manipulation task. We evaluate several state-of-the-art Vision and Language Models (VLM) across multiple settings, focusing on their capability to process and accurately respond to TPRs. Our results suggest that these models are not yet ready to be deployed in multi-modal collaborative settings.
arXiv Detail & Related papers (2024-09-21T21:06:25Z)
Tuning-Free Accountable Intervention for LLM Deployment -- A Metacognitive Approach [55.613461060997004]
Large Language Models (LLMs) have catalyzed transformative advances across a spectrum of natural language processing tasks. We propose an innovative textitmetacognitive approach, dubbed textbfCLEAR, to equip LLMs with capabilities for self-aware error identification and correction.
arXiv Detail & Related papers (2024-03-08T19:18:53Z)
Using Textual Interface to Align External Knowledge for End-to-End Task-Oriented Dialogue Systems [53.38517204698343]
We propose a novel paradigm that uses a textual interface to align external knowledge and eliminate redundant processes. We demonstrate our paradigm in practice through MultiWOZ-Remake, including an interactive textual interface built for the MultiWOZ database.
arXiv Detail & Related papers (2023-05-23T05:48:21Z)
CAPE: Corrective Actions from Precondition Errors using Large Language Models [8.547766794082184]
We propose a novel approach that attempts to propose corrective actions to resolve precondition errors during planning. CAPE improves the quality of generated plans by leveraging few-shot reasoning from action preconditions. Our improvements transfer to a Boston Dynamics Spot robot with a set of skills (specified in language) and associated preconditions, where CAPE improves the correctness metric of the executed task plans by 76.49% compared to SayCan.
arXiv Detail & Related papers (2022-11-17T23:14:51Z)

This list is automatically generated from the titles and abstracts of the papers in this site.