Do LLMs Benefit From Their Own Words?
- URL: http://arxiv.org/abs/2602.24287v1
- Date: Fri, 27 Feb 2026 18:58:26 GMT
- Title: Do LLMs Benefit From Their Own Words?
- Authors: Jenny Y. Huang, Leshem Choshen, Ramon Astudillo, Tamara Broderick, Jacob Andreas
- Abstract summary: We find that removing prior assistant responses does not affect response quality on a large fraction of turns. Omitting assistant-side context can reduce cumulative context lengths by up to 10x. Our findings suggest that selectively omitting assistant history can improve response quality while reducing memory consumption.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-turn interactions with large language models typically retain the assistant's own past responses in the conversation history. In this work, we revisit this design choice by asking whether large language models benefit from conditioning on their own prior responses. Using in-the-wild, multi-turn conversations, we compare standard (full-context) prompting with a user-turn-only prompting approach that omits all previous assistant responses, across three open reasoning models and one state-of-the-art model. To our surprise, we find that removing prior assistant responses does not affect response quality on a large fraction of turns. Omitting assistant-side history can reduce cumulative context lengths by up to 10x. To explain this result, we find that multi-turn conversations consist of a substantial proportion (36.4%) of self-contained prompts, and that many follow-up prompts provide sufficient instruction to be answered using only the current user turn and prior user turns. When analyzing cases where user-turn-only prompting substantially outperforms full context, we identify instances of context pollution, in which models over-condition on their previous responses, introducing errors, hallucinations, or stylistic artifacts that propagate across turns. Motivated by these findings, we design a context-filtering approach that selectively omits assistant-side context. Our findings suggest that selectively omitting assistant history can improve response quality while reducing memory consumption.
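The user-turn-only prompting setup described in the abstract can be illustrated with a minimal sketch. The function name and the role-tagged message schema are assumptions (following the common chat-message convention), not the authors' actual implementation:

```python
def user_turn_only_context(messages):
    """Drop all prior assistant responses from a role-tagged chat
    history, keeping only user turns for the next model call."""
    return [m for m in messages if m["role"] == "user"]

# Hypothetical multi-turn history in the usual role-tagged format.
history = [
    {"role": "user", "content": "Summarize the report."},
    {"role": "assistant", "content": "Here is a summary: ..."},
    {"role": "user", "content": "Now list the key risks."},
]

filtered = user_turn_only_context(history)
# The model is then prompted with user turns only; the cumulative
# context grows with user turns alone, not with assistant output.
```

The paper's context-filtering approach is selective rather than unconditional, so a faithful implementation would decide per turn whether assistant history is needed; the sketch above shows only the unconditional user-turn-only baseline.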
Related papers
- Exploring Rewriting Approaches for Different Conversational Tasks [63.56404271441824]
The exact rewriting approach may often depend on the use case and application-specific tasks supported by the conversational assistant. We systematically investigate two different approaches, denoted as rewriting and fusion, on two fundamentally different generation tasks. Our results indicate that the specific rewriting or fusion approach highly depends on the underlying use case and generative task.
arXiv Detail & Related papers (2025-02-26T06:05:29Z)
- InfoQuest: Evaluating Multi-Turn Dialogue Agents for Open-Ended Conversations with Hidden Context [4.262907114077643]
Large language models excel at following explicit instructions, but they often struggle with ambiguous or incomplete user requests. We introduce InfoQuest, a benchmark designed to evaluate how dialogue agents handle hidden context in open-ended user requests.
arXiv Detail & Related papers (2025-02-17T19:01:10Z)
- Enhancing Answer Attribution for Faithful Text Generation with Large Language Models [5.065947993017158]
We propose new methods for producing more independent and contextualized claims for better retrieval and attribution.
New methods are evaluated and shown to improve the performance of answer attribution components.
arXiv Detail & Related papers (2024-10-22T15:37:46Z)
- Modeling Future Conversation Turns to Teach LLMs to Ask Clarifying Questions [45.04582353648683]
Large language models (LLMs) must often respond to highly ambiguous user requests. Existing LLMs often respond by presupposing a single interpretation of such ambiguous requests, frustrating users who intended a different interpretation. We propose assigning preference labels to responses by simulating their expected outcomes in future turns. This teaches LLMs to ask clarifying questions when doing so lets them generate responses tailored to each user's interpretation in future turns.
arXiv Detail & Related papers (2024-10-17T17:29:04Z)
- Phrase Retrieval for Open-Domain Conversational Question Answering with Conversational Dependency Modeling via Contrastive Learning [54.55643652781891]
Open-Domain Conversational Question Answering (ODConvQA) aims at answering questions through a multi-turn conversation.
We propose a method to directly predict answers with a phrase retrieval scheme for a sequence of words.
arXiv Detail & Related papers (2023-06-07T09:46:38Z)
- AutoReply: Detecting Nonsense in Dialogue Introspectively with Discriminative Replies [71.62832112141913]
We show that dialogue models can detect errors in their own messages introspectively, by calculating the likelihood of replies that are indicative of poor messages.
We first show that hand-crafted replies can be effective for the task of detecting nonsense in applications as complex as Diplomacy.
We find that AutoReply-generated replies outperform handcrafted replies and perform on par with carefully fine-tuned large supervised models.
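The introspective scoring idea above can be sketched in a few lines. The function names, the `loglik` hook, and the toy model below are all illustrative assumptions, not the AutoReply authors' code:

```python
def introspective_nonsense_score(loglik, context, message, probe_replies):
    """Score a candidate message by how likely the model finds replies
    that indicate a poor message (higher score = likelier nonsense).
    `loglik(convo, reply)` is an assumed hook returning the model's
    log-likelihood of `reply` given the conversation so far."""
    convo = context + [message]
    return max(loglik(convo, probe) for probe in probe_replies)

# Toy stand-in for a dialogue model's log-likelihood, purely for
# demonstration: nonsense-indicating replies score higher after a
# garbled message.
def toy_loglik(convo, reply):
    garbled = any("qwxz" in turn for turn in convo)
    return -1.0 if garbled and "sense" in reply else -5.0

probes = ["That makes no sense.", "What do you mean?"]
bad = introspective_nonsense_score(toy_loglik, ["Hi"], "qwxz blarg", probes)
good = introspective_nonsense_score(toy_loglik, ["Hi"], "I will support Italy.", probes)
# bad > good: the garbled message is flagged as likelier nonsense.
```

In the paper, the probe replies are either hand-crafted or generated automatically (AutoReply), and the likelihoods come from the dialogue model itself rather than a toy function.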
arXiv Detail & Related papers (2022-11-22T22:31:34Z)
- Prompting for a conversation: How to control a dialog model? [9.268682116424518]
Dialog models are trained on a large amount of text, yet their responses need to be limited to a desired scope and style of a dialog agent.
Because the datasets used to achieve the former contain language that is not compatible with the latter, pre-trained dialog models are fine-tuned on smaller curated datasets.
In this paper we investigate if prompting can mitigate the above trade-off.
arXiv Detail & Related papers (2022-09-22T14:59:55Z)
- Turn-Taking Prediction for Natural Conversational Speech [40.189938418201656]
A conversational utterance often involves multiple queries with turn-taking.
Disfluencies include pausing to think, hesitations, word lengthening, filled pauses, and repeated phrases.
We present a turn-taking predictor built on top of the end-to-end (E2E) speech recognizer.
arXiv Detail & Related papers (2022-08-29T01:09:23Z)
- Generating Dialogue Responses from a Semantic Latent Space [75.18449428414736]
We propose an alternative to the end-to-end classification on vocabulary.
We learn the pair relationship between the prompts and responses as a regression task on a latent space.
Human evaluation showed that learning the task on a continuous space can generate responses that are both relevant and informative.
arXiv Detail & Related papers (2020-10-04T19:06:16Z)
- Query Resolution for Conversational Search with Limited Supervision [63.131221660019776]
We propose QuReTeC (Query Resolution by Term Classification), a neural query resolution model based on bidirectional transformers.
We show that QuReTeC outperforms state-of-the-art models, and furthermore, that our distant supervision method can be used to substantially reduce the amount of human-curated data required to train QuReTeC.
arXiv Detail & Related papers (2020-05-24T11:37:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.