In-Place Feedback: A New Paradigm for Guiding LLMs in Multi-Turn Reasoning
- URL: http://arxiv.org/abs/2510.00777v1
- Date: Wed, 01 Oct 2025 11:16:04 GMT
- Title: In-Place Feedback: A New Paradigm for Guiding LLMs in Multi-Turn Reasoning
- Authors: Youngbin Choi, Minjong Lee, Saemi Moon, Seunghyuk Cho, Chaehyeon Chung, MoonJeong Park, Dongwoo Kim
- Abstract summary: We introduce in-place feedback, a novel interaction paradigm in which users directly edit an LLM's previous response. Empirical evaluations on reasoning-intensive benchmarks reveal that in-place feedback achieves better performance than conventional multi-turn feedback.
- Score: 10.138497038893096
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) are increasingly studied in the context of multi-turn reasoning, where models iteratively refine their outputs based on user-provided feedback. Such settings are crucial for tasks that require complex reasoning, yet existing feedback paradigms often rely on issuing new messages, which LLMs struggle to integrate reliably, leading to inconsistent improvements. In this work, we introduce in-place feedback, a novel interaction paradigm in which users directly edit an LLM's previous response, and the model conditions on this modified response to generate its revision. Empirical evaluations on diverse reasoning-intensive benchmarks reveal that in-place feedback achieves better performance than conventional multi-turn feedback while using $79.1\%$ fewer tokens. Complementary analyses in controlled environments further demonstrate that in-place feedback resolves a core limitation of multi-turn feedback: models often fail to apply feedback precisely to the erroneous parts of a response, leaving errors uncorrected and sometimes introducing new mistakes into previously correct content. These findings suggest that in-place feedback offers a more natural and effective mechanism for guiding LLMs in reasoning-intensive tasks.
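The mechanism is simple enough to sketch against a generic chat-completions-style message list. The `chat()` stub and both helper functions below are hypothetical names used for illustration; this is a minimal reading of the paradigm described in the abstract, not the authors' implementation.

```python
# Minimal sketch contrasting the two feedback paradigms from the abstract.
# The chat() stub and all helper names are illustrative assumptions,
# not the authors' code.

def chat(messages):
    """Stand-in for any chat-completions-style LLM call."""
    raise NotImplementedError("plug in an LLM client here")

def multi_turn_feedback(history, feedback_text):
    # Conventional paradigm: the correction arrives as a new user message,
    # and the flawed assistant response stays untouched in the context.
    return chat(history + [{"role": "user", "content": feedback_text}])

def in_place_feedback(history, edited_response):
    # In-place paradigm: the user edits the model's previous response
    # directly, and the model conditions on the corrected text when
    # generating its revision. (Assumes the API allows resuming from a
    # prefilled assistant turn.)
    edited = {"role": "assistant", "content": edited_response}
    return chat(history[:-1] + [edited])
```

The essential difference is that the erroneous text is overwritten in the context rather than argued with in a fresh turn, which is consistent with the token savings reported above.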
Related papers
- Revisiting Feedback Models for HyDE [49.53124785319461]
HyDE is a method that enriches query representations with LLM-generated hypothetical answer documents. Our experiments show that HyDE's effectiveness can be substantially improved when leveraging feedback algorithms such as Rocchio to extract and weight expansion terms.
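As a rough illustration of the Rocchio-style weighting this summary mentions, here is a classical bag-of-words sketch; the function name, weights, and tokenization are assumptions, not the paper's settings.

```python
from collections import Counter

def rocchio_expand(query_terms, hypothetical_docs, alpha=1.0, beta=0.75, top_k=10):
    # Classical Rocchio over bag-of-words weights: terms frequent in the
    # feedback documents (here, HyDE's LLM-generated hypothetical answers)
    # are extracted and weighted, then merged with the original query.
    # alpha/beta/top_k are illustrative defaults, not the paper's settings.
    centroid = Counter()
    for doc in hypothetical_docs:
        tf = Counter(doc.lower().split())
        for term, count in tf.items():
            centroid[term] += count / len(hypothetical_docs)
    weights = Counter({t: alpha for t in query_terms})
    for term, w in centroid.most_common(top_k):
        weights[term] += beta * w
    return weights
```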
arXiv Detail & Related papers (2025-11-24T17:50:18Z) - MADREC: A Multi-Aspect Driven LLM Agent for Explainable and Adaptive Recommendation [11.430206422495829]
MADRec, a Multi-Aspect Driven LLM agent, is an autonomous recommender that constructs user and item profiles through unsupervised extraction of multi-aspect information from reviews. MADRec generates structured profiles via aspect-category-based summarization and applies Re-Ranking to construct high-density inputs. Experiments across multiple domains show that MADRec outperforms traditional and LLM-based baselines in both precision and explainability.
arXiv Detail & Related papers (2025-10-15T10:03:29Z) - Critique to Verify: Accurate and Honest Test-Time Scaling with RL-Trained Verifiers [63.99316853136304]
Mirror-Critique is a framework that trains a verifier with informative critiques. A small instruction-tuned model is used to synthesize high-quality critique data, and the resulting Mirror-Verifier evaluates candidate solutions by generating multiple critiques per solution.
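A rough sketch of the critique-voting step, assuming a `critique()` stub in place of the released verifier:

```python
# Hedged sketch of verification by critique voting in the spirit of
# Mirror-Critique: the verifier writes k independent critiques of a
# candidate solution, and the final verdict is a majority vote.
# critique() is an illustrative stub, not the released model.

def critique(problem, solution):
    """Stand-in for the RL-trained verifier; returns (critique_text, is_correct)."""
    raise NotImplementedError("plug in the verifier model here")

def verify(problem, solution, k=5):
    verdicts = [critique(problem, solution)[1] for _ in range(k)]
    return sum(verdicts) > k / 2  # accept iff a majority of critiques say correct
```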
arXiv Detail & Related papers (2025-09-27T06:50:24Z) - Rethinking Prompt Optimization: Reinforcement, Diversification, and Migration in Blackbox LLMs [10.434732630519377]
We propose a novel Automatic Prompt Optimization (APO) framework centered on enhancing the feedback mechanism. To mitigate the noise inherent in LLM-generated feedback, we introduce a technique called feedback diversification. Our approach consistently outperforms strong baselines, achieving significant accuracy improvements, faster convergence, and lower computational costs.
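One plausible reading of feedback diversification is to sample several critiques and keep only the suggestions that recur. The sketch below assumes a `get_feedback()` callable and a crude line-level split; both are illustrative, not the paper's method.

```python
from collections import Counter

# Hedged sketch of "feedback diversification": sample several critiques
# for the same output and keep only suggestions that recur, filtering
# out noise from any single LLM-generated critique.

def diversified_feedback(prompt, output, get_feedback, n=5, min_votes=3):
    suggestions = Counter()
    for _ in range(n):
        for line in get_feedback(prompt, output).splitlines():
            if line.strip():
                suggestions[line.strip()] += 1
    return [s for s, votes in suggestions.items() if votes >= min_votes]
```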
arXiv Detail & Related papers (2025-07-14T00:20:14Z) - Feedback Friction: LLMs Struggle to Fully Incorporate External Feedback [35.13591109493438]
We show that solver models consistently resist external feedback, a limitation we term Feedback Friction. Analyzing Feedback Friction, we find that a model's confidence on a given question, measured by semantic entropy, predicts its resistance to feedback.
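Semantic entropy can be sketched as entropy over meaning-clusters of sampled answers. The exact-match normalization below is a crude stand-in for the entailment-based clustering typically used to group paraphrases.

```python
import math
from collections import Counter

# Hedged sketch of the semantic-entropy signal linked to feedback
# resistance: sample several answers to the same question, group them
# by meaning (here, naive string normalization stands in for a real
# entailment-based clustering), and compute entropy over the groups.

def semantic_entropy(samples):
    clusters = Counter(s.strip().lower() for s in samples)
    total = sum(clusters.values())
    return -sum((c / total) * math.log(c / total) for c in clusters.values())
```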
arXiv Detail & Related papers (2025-06-13T16:31:51Z) - Reasoning-Aligned Perception Decoupling for Scalable Multi-modal Reasoning [95.44766931218896]
Multi-modal large language models (MLLMs) still lag behind their text-based counterparts in reasoning. We introduce Perception-Reasoning Decoupling, which modularizes the MLLM's reasoning component and makes it easily replaceable. We also propose a novel reinforcement learning algorithm, Visual Perception Optimization (VPO), to align the MLLM's perceptual output with the final reasoning task.
arXiv Detail & Related papers (2025-06-05T02:28:07Z) - WildFeedback: Aligning LLMs With In-situ User Interactions And Feedback [36.06000681394939]
We introduce WildFeedback, a novel framework that leverages in-situ user feedback during conversations with large language models (LLMs) to create preference datasets automatically. Our experiments demonstrate that LLMs fine-tuned on the WildFeedback dataset exhibit significantly improved alignment with user preferences.
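One way such in-situ feedback can yield preference pairs is sketched below: when a user's follow-up signals dissatisfaction, the pre-feedback response becomes "rejected" and the revised response "chosen". The cue list and turn indexing are toy assumptions, not the paper's pipeline.

```python
# Hedged sketch of mining preference pairs from in-situ feedback.
# The dissatisfaction cues and alternating-turn assumption are
# illustrative stand-ins for the paper's actual detection method.

NEGATIVE_CUES = ("that's wrong", "no,", "not what i asked", "incorrect")

def mine_preference_pairs(conversation):
    pairs = []
    for i in range(2, len(conversation) - 1):
        turn = conversation[i]
        if turn["role"] == "user" and any(c in turn["content"].lower() for c in NEGATIVE_CUES):
            pairs.append({
                "prompt": conversation[i - 2]["content"],    # original user request
                "rejected": conversation[i - 1]["content"],  # response that drew criticism
                "chosen": conversation[i + 1]["content"],    # revised response
            })
    return pairs
```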
arXiv Detail & Related papers (2024-08-28T05:53:46Z) - Belief Revision: The Adaptability of Large Language Models Reasoning [63.0281286287648]
We introduce Belief-R, a new dataset designed to test LMs' belief revision ability when presented with new evidence.
Inspired by how humans suppress prior inferences, this task assesses LMs within the newly proposed delta reasoning framework.
We evaluate $\sim$30 LMs across diverse prompting strategies and find that LMs generally struggle to appropriately revise their beliefs in response to new information.
arXiv Detail & Related papers (2024-06-28T09:09:36Z) - RefuteBench: Evaluating Refuting Instruction-Following for Large Language Models [17.782410287625645]
This paper proposes a benchmark, RefuteBench, covering tasks such as question answering, machine translation, and email writing.
The evaluation aims to assess whether models can positively accept feedback in the form of refuting instructions and whether they can consistently adhere to user demands throughout the conversation.
arXiv Detail & Related papers (2024-02-21T01:39:56Z) - LLMRefine: Pinpointing and Refining Large Language Models via Fine-Grained Actionable Feedback [65.84061725174269]
Recent large language models (LLMs) leverage human feedback to improve their generation quality.
We propose LLMRefine, an inference-time optimization method to refine an LLM's output.
We conduct experiments on three text generation tasks, including machine translation, long-form question answering (QA), and topical summarization.
LLMRefine consistently outperforms all baseline approaches, achieving improvements of up to 1.7 MetricX points on translation tasks, 8.1 ROUGE-L on ASQA, and 2.2 ROUGE-L on topical summarization.
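A greedy sketch of such an inference-time refinement loop follows, assuming `generate()` and `get_fine_grained_feedback()` stubs; the paper's actual search strategy may differ from this plain loop.

```python
# Hedged sketch of inference-time refinement in the spirit of LLMRefine:
# a feedback model pinpoints fine-grained errors and the generator
# revises until the feedback is clean or a step budget runs out.
# Both stubs are illustrative; this is not the authors' released code.

def refine(source, generate, get_fine_grained_feedback, max_steps=3):
    output = generate(source, feedback=None)
    for _ in range(max_steps):
        feedback = get_fine_grained_feedback(source, output)  # e.g. span-level error tags
        if not feedback:  # no remaining errors detected
            break
        output = generate(source, feedback=feedback)
    return output
```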
arXiv Detail & Related papers (2023-11-15T19:52:11Z) - MAF: Multi-Aspect Feedback for Improving Reasoning in Large Language Models [64.70153487607172]
Language Models (LMs) have shown impressive performance in various natural language tasks.
When it comes to natural language reasoning, LMs still face challenges such as hallucination, generating incorrect intermediate reasoning steps, and making mathematical errors.
Recent research has focused on enhancing LMs through self-improvement using feedback.
In this work, we propose Multi-Aspect Feedback, an iterative refinement framework that integrates multiple feedback modules, including frozen LMs and external tools, each focusing on a specific error category.
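A minimal sketch of that iterative multi-aspect loop, assuming a hypothetical `modules` mapping (one callable per error category) and a `revise()` stub:

```python
# Hedged sketch of the Multi-Aspect Feedback idea: several feedback
# modules (frozen LMs, external tools), each specialized for one error
# category, critique the current answer, and the solver revises on the
# combined feedback. Module names and the revise() stub are illustrative.

def multi_aspect_refine(question, answer, modules, revise, rounds=2):
    for _ in range(rounds):
        feedback = {name: module(question, answer) for name, module in modules.items()}
        issues = {k: v for k, v in feedback.items() if v}  # keep non-empty critiques
        if not issues:
            break
        answer = revise(question, answer, issues)
    return answer
```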
arXiv Detail & Related papers (2023-10-19T02:32:39Z)