User Feedback in Human-LLM Dialogues: A Lens to Understand Users But Noisy as a Learning Signal
- URL: http://arxiv.org/abs/2507.23158v1
- Date: Wed, 30 Jul 2025 23:33:29 GMT
- Title: User Feedback in Human-LLM Dialogues: A Lens to Understand Users But Noisy as a Learning Signal
- Authors: Yuhan Liu, Michael J. Q. Zhang, Eunsol Choi
- Abstract summary: We study implicit user feedback in two user-LM interaction datasets. We find that the contents of user feedback can improve model performance on short human-designed questions. We also find that the usefulness of user feedback is largely tied to the quality of the user's initial prompt.
- Score: 58.43749783815486
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Once language models (LMs) are deployed, they can interact with users long-term, ideally evolving continuously based on their feedback. Asking for direct user feedback can be disruptive; thus, we study harvesting user feedback from user-LM interaction logs. We study implicit user feedback in two user-LM interaction datasets (WildChat and LMSYS). First, we analyze user feedback in the user-LLM conversation trajectory, providing insights into when and why such feedback occurs. Second, we study harvesting learning signals from such implicit user feedback. We find that the contents of user feedback (e.g., the user wanted clarification), not just its polarity (e.g., users were unhappy with the previous model response), can improve model performance on short human-designed questions (MTBench) but not on longer and more complex questions (WildBench). We also find that the usefulness of user feedback is largely tied to the quality of the user's initial prompt. Together, we provide an in-depth study of implicit user feedback, showing its potential and limitations.
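The distinction between feedback polarity and feedback content can be made concrete with a minimal sketch of mining implicit feedback from interaction logs. The cue lists, category labels, and `mine_implicit_feedback` helper below are illustrative assumptions, not the paper's actual annotation scheme or classifiers.

```python
# Hypothetical sketch: tag each user follow-up turn in a logged dialogue with a
# coarse polarity and a content category (e.g., "clarification"). The cue lists
# and labels are assumptions for illustration only.
from dataclasses import dataclass

NEGATIVE_CUES = ("wrong", "incorrect", "not what i asked", "doesn't work", "no,")
CLARIFY_CUES = ("what do you mean", "can you clarify", "i meant", "to clarify")
RETRY_CUES = ("try again", "rewrite this", "make it shorter", "more detail")

@dataclass
class FeedbackSignal:
    turn_index: int   # index of the user turn carrying the feedback
    polarity: str     # "negative" or "neutral"
    content: str      # "clarification", "retry", or "other"

def mine_implicit_feedback(dialogue: list[dict]) -> list[FeedbackSignal]:
    """dialogue: list of {"role": "user" | "assistant", "text": str} turns."""
    signals = []
    for i, turn in enumerate(dialogue):
        if turn["role"] != "user" or i == 0:
            continue  # implicit feedback can only follow a model response
        text = turn["text"].lower()
        polarity = "negative" if any(c in text for c in NEGATIVE_CUES) else "neutral"
        if any(c in text for c in CLARIFY_CUES):
            content = "clarification"
        elif any(c in text for c in RETRY_CUES):
            content = "retry"
        else:
            content = "other"
        if polarity == "negative" or content != "other":
            signals.append(FeedbackSignal(i, polarity, content))
    return signals
```

Training on the polarity field alone versus the full content label is, roughly, the contrast the abstract draws between using feedback polarity and using feedback contents as a learning signal.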
Related papers
- Can User Feedback Help Issue Detection? An Empirical Study on a One-billion-user Online Service System [28.43595612060133]
We conduct an empirical study on 50,378,766 user feedback items from six real-world services in a one-billion-user online service system. Our results show that a large proportion of user feedback provides irrelevant information about system issues. We find severe issues that cannot be easily detected based solely on user feedback characteristics.
arXiv Detail & Related papers (2025-08-01T12:49:07Z) - Reinforcement Learning from User Feedback [28.335218244885706]
We introduce Reinforcement Learning from User Feedback (RLUF), a framework for aligning large language models with user preferences. We train a reward model, P[Love], to predict the likelihood that an LLM response will receive a Love Reaction. We show that P[Love] is predictive of increased positive feedback and serves as a reliable offline evaluator of future user behavior. (A minimal illustrative sketch of such a reward model appears after this list.)
arXiv Detail & Related papers (2025-05-20T22:14:44Z) - Know Me, Respond to Me: Benchmarking LLMs for Dynamic User Profiling and Personalized Responses at Scale [51.9706400130481]
Large Language Models (LLMs) have emerged as personalized assistants for users across a wide range of tasks. PERSONAMEM features curated user profiles with over 180 simulated user-LLM interaction histories. We evaluate LLM chatbots' ability to identify the most suitable response according to the current state of the user's profile.
arXiv Detail & Related papers (2025-04-19T08:16:10Z) - WildFeedback: Aligning LLMs With In-situ User Interactions And Feedback [36.06000681394939]
We introduce WildFeedback, a novel framework that leverages in-situ user feedback during conversations with large language models (LLMs) to create preference datasets automatically. Our experiments demonstrate that LLMs fine-tuned on the WildFeedback dataset exhibit significantly improved alignment with user preferences.
arXiv Detail & Related papers (2024-08-28T05:53:46Z) - Rethinking the Evaluation of Dialogue Systems: Effects of User Feedback on Crowdworkers and LLMs [57.16442740983528]
In ad-hoc retrieval, evaluation relies heavily on user actions, including implicit feedback.
The role of user feedback in annotators' assessment of turns in a conversation has been little studied.
We focus on how the evaluation of task-oriented dialogue systems (TDSs) is affected by considering user feedback, explicit or implicit, as provided through the follow-up utterance of the turn being evaluated.
arXiv Detail & Related papers (2024-04-19T16:45:50Z) - RefuteBench: Evaluating Refuting Instruction-Following for Large Language Models [17.782410287625645]
This paper proposes a benchmark, RefuteBench, covering tasks such as question answering, machine translation, and email writing.
The evaluation aims to assess whether models can positively accept feedback in the form of refuting instructions and whether they can consistently adhere to user demands throughout the conversation.
arXiv Detail & Related papers (2024-02-21T01:39:56Z) - Continually Improving Extractive QA via Human Feedback [59.49549491725224]
We study continually improving an extractive question answering (QA) system via human user feedback.
We conduct experiments involving thousands of user interactions under diverse setups to broaden the understanding of learning from feedback over time.
arXiv Detail & Related papers (2023-05-21T14:35:32Z) - Simulating Bandit Learning from User Feedback for Extractive Question Answering [51.97943858898579]
We study learning from user feedback for extractive question answering by simulating feedback using supervised data.
We show that systems initially trained on a small number of examples can dramatically improve given feedback from users on model-predicted answers.
arXiv Detail & Related papers (2022-03-18T17:47:58Z) - Improving Conversational Question Answering Systems after Deployment using Feedback-Weighted Learning [69.42679922160684]
We propose feedback-weighted learning based on importance sampling to improve upon an initial supervised system using binary user feedback.
Our work opens the prospect to exploit interactions with real users and improve conversational systems after deployment.
arXiv Detail & Related papers (2020-11-01T19:50:34Z)
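For the reward-model idea in the Reinforcement Learning from User Feedback entry above, a minimal stand-in can make the interface concrete: a classifier over logged (prompt, response) pairs that outputs the probability of a positive user reaction. The toy data, bag-of-words features, and `p_love` helper below are illustrative assumptions; the actual RLUF reward model is presumably a much larger model trained on real reaction logs.

```python
# Minimal stand-in for a P[Love]-style reward model (assumption: illustration
# only; bag-of-words + logistic regression in place of a fine-tuned LLM).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical logged data: (prompt + response) text and whether the response
# later received a "Love" reaction from the user.
texts = [
    "Q: fix my regex A: here is a corrected pattern with an explanation",
    "Q: fix my regex A: I cannot help with that",
    "Q: plan a trip A: here is a day-by-day itinerary with budget notes",
    "Q: plan a trip A: please clarify what you want",
]
got_love_reaction = [1, 0, 1, 0]

reward_model = make_pipeline(TfidfVectorizer(), LogisticRegression())
reward_model.fit(texts, got_love_reaction)

def p_love(prompt: str, response: str) -> float:
    """Predicted probability that `response` to `prompt` earns a Love Reaction,
    usable as an offline signal for ranking candidate responses."""
    return float(reward_model.predict_proba([f"Q: {prompt} A: {response}"])[0, 1])

print(p_love("fix my regex", "here is a corrected pattern"))
```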