What Else Do I Need to Know? The Effect of Background Information on
Users' Reliance on QA Systems
- URL: http://arxiv.org/abs/2305.14331v2
- Date: Thu, 26 Oct 2023 01:01:11 GMT
- Title: What Else Do I Need to Know? The Effect of Background Information on
Users' Reliance on QA Systems
- Authors: Navita Goyal, Eleftheria Briakou, Amanda Liu, Connor Baumler, Claire
Bonial, Jeffrey Micher, Clare R. Voss, Marine Carpuat, Hal Daumé III
- Abstract summary: We study how users interact with QA systems in the absence of sufficient information to assess their predictions.
Our study reveals that users rely on model predictions even in the absence of sufficient information needed to assess the model's correctness.
- Score: 23.69129423040988
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: NLP systems have shown impressive performance at answering questions by
retrieving relevant context. However, as models grow increasingly large, it is
impossible and often undesirable to constrain a model's knowledge or reasoning to
only the retrieved context. This leads to a mismatch between the information
that the model accesses to derive the answer and the information that is
available to the user to assess the model's predicted answer. In this work, we
study how users interact with QA systems in the absence of sufficient
information to assess their predictions. Further, we ask whether adding the
requisite background helps mitigate users' over-reliance on predictions. Our
study reveals that users rely on model predictions even in the absence of
sufficient information needed to assess the model's correctness. Providing the
relevant background, however, helps users better catch model errors, reducing
over-reliance on incorrect predictions. On the flip side, background
information also increases users' confidence in their accurate as well as
inaccurate judgments. Our work highlights that supporting users' verification
of QA predictions is an important, yet challenging, problem.
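To make the reliance measures concrete, here is a minimal sketch of how agreement and over-reliance rates might be computed from such a study's responses. The record fields, condition names, and values are hypothetical placeholders, not the paper's data or analysis code.
```python
from collections import defaultdict

# Each record is one participant judgment: whether the model's prediction was
# correct and whether the user accepted (agreed with) that prediction.
# Field names and conditions are illustrative assumptions.
records = [
    {"condition": "context_only",    "model_correct": True,  "user_agreed": True},
    {"condition": "context_only",    "model_correct": False, "user_agreed": True},
    {"condition": "with_background", "model_correct": False, "user_agreed": False},
    {"condition": "with_background", "model_correct": True,  "user_agreed": True},
]

def reliance_stats(records):
    """Per condition: overall agreement rate and over-reliance rate,
    i.e. agreement with *incorrect* model predictions."""
    buckets = defaultdict(list)
    for r in records:
        buckets[r["condition"]].append(r)
    stats = {}
    for cond, rs in buckets.items():
        agree = sum(r["user_agreed"] for r in rs) / len(rs)
        wrong = [r for r in rs if not r["model_correct"]]
        over = (sum(r["user_agreed"] for r in wrong) / len(wrong)) if wrong else 0.0
        stats[cond] = {"agreement": agree, "over_reliance": over}
    return stats

print(reliance_stats(records))
```
Comparing the over_reliance value across conditions is one simple way to see whether added background information reduces agreement with incorrect predictions.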
Related papers
- Accounting for Sycophancy in Language Model Uncertainty Estimation [28.08509288774144]
We study the relationship between sycophancy and uncertainty estimation for the first time.
We show that user confidence plays a critical role in modulating the effects of sycophancy.
We argue that externalizing both model and user uncertainty can help to mitigate the impacts of sycophancy bias.
arXiv Detail & Related papers (2024-10-17T18:00:25Z)
- Estimating Uncertainty with Implicit Quantile Network [0.0]
Uncertainty quantification is an important part of many performance critical applications.
This paper provides a simple alternative to existing approaches such as ensemble learning and Bayesian neural networks.
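As a rough illustration of quantile-based uncertainty (not the paper's implicit quantile network itself), the sketch below shows the pinball (quantile) loss and how the spread between predicted quantiles can serve as an uncertainty estimate; all names and values are assumptions.
```python
import numpy as np

def pinball_loss(y_true, y_pred, tau):
    """Quantile (pinball) loss for quantile level tau in (0, 1)."""
    diff = y_true - y_pred
    return np.mean(np.maximum(tau * diff, (tau - 1.0) * diff))

# Hypothetical predictions of the 0.1, 0.5, and 0.9 quantiles for three inputs.
q10 = np.array([0.8, 1.9, 2.7])
q50 = np.array([1.0, 2.0, 3.1])
q90 = np.array([1.3, 2.2, 3.6])
y   = np.array([1.1, 2.1, 3.0])

print("median loss:", pinball_loss(y, q50, 0.5))
# A wide inter-quantile range signals higher predictive uncertainty.
print("per-example uncertainty:", q90 - q10)
```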
arXiv Detail & Related papers (2024-08-26T13:33:14Z)
- Uncertainty-aware Language Modeling for Selective Question Answering [107.47864420630923]
We present an automatic large language model (LLM) conversion approach that produces uncertainty-aware LLMs.
Our approach is model- and data-agnostic, is computationally-efficient, and does not rely on external models or systems.
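A minimal sketch of the selective-answering idea, assuming the converted model exposes a scalar confidence per answer; the threshold and examples below are illustrative, not the paper's method.
```python
from typing import Optional

def selective_answer(answer: str, confidence: float,
                     threshold: float = 0.75) -> Optional[str]:
    """Return the answer only when the model's confidence clears the
    abstention threshold; otherwise abstain (return None)."""
    return answer if confidence >= threshold else None

# Hypothetical (answer, confidence) pairs from an uncertainty-aware LLM.
for ans, conf in [("Paris", 0.93), ("1923", 0.41)]:
    result = selective_answer(ans, conf)
    print(result if result is not None else "[abstained]")
```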
arXiv Detail & Related papers (2023-11-26T22:47:54Z)
- Improving the Reliability of Large Language Models by Leveraging Uncertainty-Aware In-Context Learning [76.98542249776257]
Large-scale language models often face the challenge of "hallucination."
We introduce an uncertainty-aware in-context learning framework to empower the model to enhance or reject its output in response to uncertainty.
arXiv Detail & Related papers (2023-10-07T12:06:53Z)
- Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering [26.34649731975005]
Retriever-augmented instruction-following models are attractive alternatives to fine-tuned approaches for question answering (QA).
While the model responses tend to be natural and fluent, the additional verbosity makes traditional QA evaluation metrics unreliable for accurately quantifying model performance.
We use both automatic and human evaluation to evaluate these models along two dimensions: 1) how well they satisfy the user's information need (correctness) and 2) whether they produce a response based on the provided knowledge (faithfulness).
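To illustrate the two evaluation dimensions in code, here is a hedged sketch of simple lexical proxies: token-level F1 against a reference answer for correctness, and token precision against the provided passage for faithfulness. These are generic approximations, not the paper's exact metrics.
```python
from collections import Counter

def _tokens(text):
    return text.lower().split()

def token_f1(prediction, reference):
    """Token-overlap F1 between a model response and a reference answer."""
    pred, ref = Counter(_tokens(prediction)), Counter(_tokens(reference))
    overlap = sum((pred & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

def knowledge_precision(prediction, knowledge):
    """Fraction of response tokens grounded in the provided passage,
    a crude proxy for faithfulness."""
    pred, know = Counter(_tokens(prediction)), Counter(_tokens(knowledge))
    return sum((pred & know).values()) / max(sum(pred.values()), 1)

response  = "The treaty was signed in 1648 in Westphalia"
reference = "1648"
passage   = "The Peace of Westphalia was signed in 1648."
print(token_f1(response, reference), knowledge_precision(response, passage))
```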
arXiv Detail & Related papers (2023-07-31T17:41:00Z)
- Improving Selective Visual Question Answering by Learning from Your Peers [74.20167944693424]
Visual Question Answering (VQA) models can have difficulties abstaining from answering when they are wrong.
We propose Learning from Your Peers (LYP) approach for training multimodal selection functions for making abstention decisions.
Our approach uses predictions from models trained on distinct subsets of the training data as targets for optimizing a Selective VQA model.
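A small sketch of the peer-supervision idea: correctness of peer models' predictions becomes the training target for a selection function that decides when to abstain. The features, data, and logistic-regression selector below are illustrative assumptions, not the authors' implementation.
```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical per-example features (e.g. answer confidence, entropy) and
# whether peer models -- trained on disjoint data subsets -- answered the
# example correctly. Peer correctness is the target for the selector.
features = np.array([[0.95, 0.1], [0.40, 1.9], [0.80, 0.5], [0.30, 2.2]])
peer_correct = np.array([1, 0, 1, 0])  # 1 = peers got this example right

selector = LogisticRegression().fit(features, peer_correct)

# At inference time, abstain when the selector predicts the answer is unreliable.
answer_probs = selector.predict_proba(features)[:, 1]
print(["answer" if p >= 0.5 else "abstain" for p in answer_probs])
```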
arXiv Detail & Related papers (2023-06-14T21:22:01Z)
- Realistic Conversational Question Answering with Answer Selection based on Calibrated Confidence and Uncertainty Measurement [54.55643652781891]
Conversational Question Answering (ConvQA) models aim to answer a question using its relevant paragraph and the question-answer pairs from previous turns of the conversation.
We propose to filter out inaccurate answers in the conversation history based on their estimated confidences and uncertainties from the ConvQA model.
We validate our models, Answer Selection-based realistic Conversation Question Answering, on two standard ConvQA datasets.
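As a rough sketch of the confidence-based filtering described above (names and thresholds are assumptions, not the paper's code), earlier answers whose estimated confidence falls below a threshold are dropped before they are reused as conversational context.
```python
def filter_history(history, min_confidence=0.6):
    """Keep only previous (question, answer) turns whose answers the ConvQA
    model assigned sufficiently high confidence."""
    return [(q, a) for q, a, conf in history if conf >= min_confidence]

# Hypothetical conversation history: (question, answer, estimated confidence).
history = [
    ("Who wrote the novel?", "George Orwell", 0.92),
    ("When was it published?", "1984", 0.35),   # likely wrong; filtered out
    ("What genre is it?", "Dystopian fiction", 0.81),
]
print(filter_history(history))
```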
arXiv Detail & Related papers (2023-02-10T09:42:07Z)
- UKP-SQuARE v2 Explainability and Adversarial Attacks for Trustworthy QA [47.8796570442486]
Question Answering systems are increasingly deployed in applications where they support real-world decisions.
Inherently interpretable models or post hoc explainability methods can help users to comprehend how a model arrives at its prediction.
We introduce SQuARE v2, the new version of SQuARE, to provide an explainability infrastructure for comparing models.
arXiv Detail & Related papers (2022-08-19T13:01:01Z)
- VisFIS: Visual Feature Importance Supervision with Right-for-the-Right-Reason Objectives [84.48039784446166]
We show that model FI supervision can meaningfully improve VQA model accuracy as well as performance on several Right-for-the-Right-Reason metrics.
Our best performing method, Visual Feature Importance Supervision (VisFIS), outperforms strong baselines on benchmark VQA datasets.
Predictions are more accurate when explanations are plausible and faithful, and not when they are plausible but not faithful.
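To give a sense of what feature-importance supervision can look like as an objective (a generic sketch, not VisFIS itself), the snippet below combines a standard task loss with a penalty that pushes the model's attribution over image regions toward human importance annotations; all values are dummy placeholders.
```python
import numpy as np

def cosine_alignment(model_fi, human_fi, eps=1e-8):
    """Cosine similarity between model feature-importance scores and
    human importance annotations over the same image regions."""
    num = float(np.dot(model_fi, human_fi))
    den = float(np.linalg.norm(model_fi) * np.linalg.norm(human_fi)) + eps
    return num / den

def combined_loss(task_loss, model_fi, human_fi, weight=0.5):
    """Task loss plus a penalty for attribution that disagrees with humans."""
    return task_loss + weight * (1.0 - cosine_alignment(model_fi, human_fi))

# Dummy cross-entropy-style task loss and per-region importance scores.
print(combined_loss(0.42, np.array([0.7, 0.1, 0.2]), np.array([0.8, 0.05, 0.15])))
```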
arXiv Detail & Related papers (2022-06-22T17:02:01Z)
- Exploring Weaknesses of VQA Models through Attribution Driven Insights [0.0]
Recent research effectively applies VQA models to answering visual questions for the blind.
We analyze popular VQA models through the lens of attribution (input's influence on predictions) to gain valuable insights.
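A minimal sketch of one common attribution technique, occlusion: mask each input feature and measure the change in the model's score. This is a generic stand-in for the attribution analysis described above, not the authors' exact method; the model here is a toy function.
```python
import numpy as np

def toy_model(x):
    """Stand-in for a VQA model's score for its predicted answer."""
    weights = np.array([0.6, 0.1, 0.3])
    return float(weights @ x)

def occlusion_attribution(model, x, baseline=0.0):
    """Attribute the prediction to each feature by the score drop observed
    when that feature is replaced with a baseline value."""
    base_score = model(x)
    attributions = np.zeros_like(x)
    for i in range(len(x)):
        occluded = x.copy()
        occluded[i] = baseline
        attributions[i] = base_score - model(occluded)
    return attributions

print(occlusion_attribution(toy_model, np.array([1.0, 0.5, 0.8])))
```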
arXiv Detail & Related papers (2020-06-11T17:30:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.