Trust Me on This: A User Study of Trustworthiness for RAG Responses
- URL: http://arxiv.org/abs/2601.14460v1
- Date: Tue, 20 Jan 2026 20:30:33 GMT
- Title: Trust Me on This: A User Study of Trustworthiness for RAG Responses
- Authors: Weronika Łajewska, Krisztian Balog
- Abstract summary: This study investigates how different types of explanations can influence user trust in responses from retrieval-augmented generation systems. Users' judgments are also heavily influenced by response clarity, actionability, and their own prior knowledge.
- Score: 15.029309551125962
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The integration of generative AI into information access systems often presents users with synthesized answers that lack transparency. This study investigates how different types of explanations can influence user trust in responses from retrieval-augmented generation systems. We conducted a controlled, two-stage user study where participants chose the more trustworthy response from a pair (one of objectively higher quality than the other), both with and without one of three explanation types: (1) source attribution, (2) factual grounding, and (3) information coverage. Our results show that while explanations significantly guide users toward selecting higher-quality responses, trust is not dictated by objective quality alone: users' judgments are also heavily influenced by response clarity, actionability, and their own prior knowledge.
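The paper reports a user study rather than an implementation, but the three explanation types are concrete enough to sketch. A minimal Python illustration of how they might be attached to a RAG response follows; the field names are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass, field

@dataclass
class Passage:
    doc_id: str
    text: str

@dataclass
class ExplainedResponse:
    """A RAG answer bundled with the three explanation types from the
    study; field names are illustrative, not the authors' data format."""
    answer: str
    sources: list[Passage] = field(default_factory=list)           # (1) source attribution
    grounded_claims: dict[str, str] = field(default_factory=dict)  # (2) factual grounding: claim -> evidence
    coverage_note: str = ""                                        # (3) information coverage

response = ExplainedResponse(
    answer="Intermittent fasting can produce modest weight loss in adults.",
    sources=[Passage("doc_17", "A 2022 review of 21 trials reports ...")],
    grounded_claims={"modest weight loss": "doc_17: 'modest reductions in body weight'"},
    coverage_note="Covers efficacy for adults; does not cover safety for diabetics.",
)
print(response.coverage_note)
```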
Related papers
- "Even explanations will not help in trusting [this] fundamentally biased system": A Predictive Policing Case-Study [8.240854389254222]
The use of AI systems in high-risk domains has often led users either to under-trust them, potentially causing inadequate reliance, or to over-trust them, resulting in over-compliance. Past research has indicated that explanations provided by AI systems can enhance user understanding of when to trust or distrust the system. This study explores the impact of different explanation types and user expertise on establishing appropriate trust in AI-based predictive policing.
arXiv Detail & Related papers (2025-04-15T09:43:48Z) - Fostering Appropriate Reliance on Large Language Models: The Role of Explanations, Sources, and Inconsistencies [66.30619782227173]
Large language models (LLMs) can produce erroneous responses that sound fluent and convincing. We identify several features of LLM responses that shape users' reliance. We find that explanations increase reliance on both correct and incorrect responses. We observe less reliance on incorrect responses when sources are provided or when explanations exhibit inconsistencies.
arXiv Detail & Related papers (2025-02-12T16:35:41Z) - Do RAG Systems Cover What Matters? Evaluating and Optimizing Responses with Sub-Question Coverage [74.70255719194819]
We introduce a novel framework based on sub-question coverage, which measures how well a RAG system addresses different facets of a question.
We use this framework to evaluate three commercial generative answer engines: You.com, Perplexity AI, and Bing Chat.
We find that while all answer engines cover core sub-questions more often than background or follow-up ones, they still miss around 50% of core sub-questions.
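The framework is described at the level of a metric, so a toy sketch is possible. The following Python snippet is not the authors' code: it approximates "addresses a sub-question" with naive keyword overlap, where the paper would use model-based judgments.

```python
def _content_words(text: str) -> set[str]:
    """Lowercase, strip punctuation, and drop a few stopwords."""
    stop = {"what", "how", "why", "the", "a", "an", "of", "is", "are",
            "in", "to", "there", "and"}
    return {w.strip("?.,!") for w in text.lower().split()} - stop

def addresses(answer: str, sub_question: str, min_overlap: int = 3) -> bool:
    """Crude proxy for 'the answer addresses this sub-question':
    the two share at least `min_overlap` content words."""
    return len(_content_words(answer) & _content_words(sub_question)) >= min_overlap

def coverage(answer: str, sub_questions: list[str]) -> float:
    """Fraction of sub-questions the answer addresses."""
    if not sub_questions:
        return 0.0
    return sum(addresses(answer, q) for q in sub_questions) / len(sub_questions)

subs = ["What are the health benefits of green tea?",
        "Are there risks to drinking green tea daily?"]
answer = "Green tea offers antioxidant health benefits and may aid focus."
print(coverage(answer, subs))  # 0.5: covers benefits, misses risks
```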
arXiv Detail & Related papers (2024-10-20T22:59:34Z) - Creating Healthy Friction: Determining Stakeholder Requirements of Job Recommendation Explanations [2.373992571236766]
We evaluate an explainable job recommender system using a realistic, task-based, mixed-design user study.
We find that providing stakeholders with real explanations does not significantly improve decision-making speed or accuracy.
arXiv Detail & Related papers (2024-09-24T11:03:17Z) - Why Would You Suggest That? Human Trust in Language Model Responses [0.3749861135832073]
We analyze how the framing and presence of explanations affect user trust and model performance.
Our findings urge future research to delve deeper into the nuanced evaluation of trust in human-machine teaming systems.
arXiv Detail & Related papers (2024-06-04T06:57:47Z) - Explainability for Transparent Conversational Information-Seeking [13.790574266700006]
This study explores different methods of explaining responses from conversational information-seeking systems.
By exploring transparency across explanation type, quality, and presentation mode, this research aims to bridge the gap between system-generated responses and responses verifiable by the user.
arXiv Detail & Related papers (2024-05-06T09:25:14Z) - Rethinking the Evaluation of Dialogue Systems: Effects of User Feedback on Crowdworkers and LLMs [57.16442740983528]
In ad-hoc retrieval, evaluation relies heavily on user actions, including implicit feedback.
The role of user feedback in annotators' assessment of turns in a conversation has received little study.
We focus on how the evaluation of task-oriented dialogue systems (TDSs) is affected by considering user feedback, explicit or implicit, as provided through the follow-up utterance of the turn being evaluated.
arXiv Detail & Related papers (2024-04-19T16:45:50Z) - RELIC: Investigating Large Language Model Responses using Self-Consistency [58.63436505595177]
Large Language Models (LLMs) are notorious for blending fact with fiction and generating non-factual content, known as hallucinations.
We propose an interactive system that helps users gain insight into the reliability of the generated text.
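RELIC itself is an interactive system, but the underlying self-consistency signal can be sketched simply: sample an LLM several times on the same prompt and treat disagreement between samples as a reliability warning. A minimal Python illustration with a toy stand-in for the model (not the authors' implementation):

```python
import random
from collections import Counter
from typing import Callable

def self_consistency(generate: Callable[[str], str], prompt: str, k: int = 5):
    """Sample `generate` k times on the same prompt; return the majority
    answer and its agreement rate (1.0 = all samples agree)."""
    samples = [generate(prompt) for _ in range(k)]
    answer, votes = Counter(samples).most_common(1)[0]
    return answer, votes / k

# Toy stand-in for a sampling LLM that occasionally hallucinates a year.
fake_llm = lambda _prompt: random.choice(["1969", "1969", "1969", "1968"])

answer, agreement = self_consistency(fake_llm, "When did Apollo 11 land?")
print(answer, agreement)  # e.g. 1969 0.8; low agreement flags low reliability
```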
arXiv Detail & Related papers (2023-11-28T14:55:52Z) - Continually Improving Extractive QA via Human Feedback [59.49549491725224]
We study continually improving an extractive question answering (QA) system via human user feedback.
We conduct experiments involving thousands of user interactions under diverse setups to broaden the understanding of learning from feedback over time.
arXiv Detail & Related papers (2023-05-21T14:35:32Z) - Double Retrieval and Ranking for Accurate Question Answering [120.69820139008138]
We show that an answer verification step introduced in Transformer-based answer selection models can significantly improve the state of the art in Question Answering.
The results on three well-known datasets for answer sentence selection (AS2) show consistent and significant improvements over the state of the art.
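The contribution is a verification stage layered on top of a standard answer-selection ranker. A schematic Python sketch, with trivial placeholder scorers standing in for the authors' Transformer models:

```python
from typing import Callable, Optional

ScoreFn = Callable[[str, str], float]  # (question, candidate) -> score

def select_with_verification(question: str, candidates: list[str],
                             rank_score: ScoreFn, verify_score: ScoreFn,
                             threshold: float = 0.5) -> Optional[str]:
    """Stage 1: rank candidate answers by an answer-selection score.
    Stage 2: walk down the ranking and return the first candidate that a
    separate verification model accepts; abstain if none passes."""
    ranked = sorted(candidates, key=lambda c: rank_score(question, c),
                    reverse=True)
    for cand in ranked:
        if verify_score(question, cand) >= threshold:
            return cand
    return None  # abstain rather than return an unverified answer

# Toy usage: word-overlap ranker and a fact-checking verifier.
rank = lambda q, c: len(set(q.lower().split()) & set(c.lower().split()))
verify = lambda q, c: 1.0 if "1969" in c else 0.0
print(select_with_verification("When did Apollo 11 land?",
                               ["It landed in 1968.", "It landed in 1969."],
                               rank, verify))
```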
arXiv Detail & Related papers (2022-01-16T06:20:07Z) - An Empirical Study of Clarifying Question-Based Systems [15.767515065224016]
We conduct an online experiment by deploying an experimental system that interacts with users by asking clarifying questions over a product repository.
We collect both implicit interaction behavior data and explicit feedback from users, showing that users are willing to answer a good number of clarifying questions (11-21 on average), but not many more than that.
arXiv Detail & Related papers (2020-08-01T15:10:11Z)