Realistic Conversational Question Answering with Answer Selection based
on Calibrated Confidence and Uncertainty Measurement
- URL: http://arxiv.org/abs/2302.05137v1
- Date: Fri, 10 Feb 2023 09:42:07 GMT
- Title: Realistic Conversational Question Answering with Answer Selection based
on Calibrated Confidence and Uncertainty Measurement
- Authors: Soyeong Jeong, Jinheon Baek, Sung Ju Hwang, Jong C. Park
- Abstract summary: Conversational Question Answering (ConvQA) models aim at answering a question using its relevant paragraph and the question-answer pairs from earlier turns of a multi-turn conversation.
We propose to filter out inaccurate answers in the conversation history based on their estimated confidences and uncertainties from the ConvQA model.
We validate our models, Answer Selection-based realistic Conversational Question Answering (AS-ConvQA), on two standard ConvQA datasets.
- Score: 54.55643652781891
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Conversational Question Answering (ConvQA) models aim at answering a question
using its relevant paragraph and the question-answer pairs from earlier turns
of a multi-turn conversation. To apply such models to a real-world scenario,
some existing work uses predicted answers, instead of unavailable ground-truth
answers, as the conversation history for inference. However, since these
predictions are sometimes incorrect, using all of them without filtering
significantly hampers the model performance. To address this problem, we
propose to filter out inaccurate answers in the conversation history based on
their estimated confidences and uncertainties from the ConvQA model, without
making any architectural changes. Moreover, to make the confidence and
uncertainty values more reliable, we propose to further calibrate them, thereby
smoothing the model predictions. We validate our models, Answer Selection-based
realistic Conversational Question Answering (AS-ConvQA), on two standard ConvQA
datasets, and the results show that our models significantly outperform
relevant baselines. Code is available at: https://github.com/starsuzi/AS-ConvQA.
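To make the abstract's method concrete, here is a minimal sketch of confidence- and uncertainty-based history filtering with temperature-scaled calibration. It illustrates the general idea, not the authors' implementation (see the AS-ConvQA repository above); the entropy-based uncertainty measure, the temperature value, and both thresholds are assumptions chosen for the example.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; T > 1 smooths overconfident predictions,
    a common post-hoc calibration technique."""
    z = np.asarray(logits, dtype=float) / temperature
    z -= z.max()  # for numerical stability
    e = np.exp(z)
    return e / e.sum()

def predictive_entropy(probs):
    """Entropy of the answer distribution, used here as the uncertainty score."""
    p = np.clip(probs, 1e-12, 1.0)
    return float(-(p * np.log(p)).sum())

def select_history(turns, temperature=2.0, min_confidence=0.5, max_uncertainty=1.0):
    """Keep a predicted answer in the conversation history only if its
    calibrated confidence is high enough and its uncertainty low enough."""
    kept = []
    for question, answer, logits in turns:
        probs = softmax(logits, temperature)
        confidence = float(probs.max())          # probability of the chosen answer
        uncertainty = predictive_entropy(probs)  # spread over the candidates
        if confidence >= min_confidence and uncertainty <= max_uncertainty:
            kept.append((question, answer))
    return kept

# Toy history: each turn carries logits over three answer candidates.
turns = [
    ("Who wrote the novel?", "Alice", [4.0, 0.5, 0.2]),   # confident -> kept
    ("When was it published?", "1999", [1.1, 1.0, 0.9]),  # uncertain -> dropped
    ("Where is it set?", "Paris", [3.5, 0.3, 0.1]),       # confident -> kept
]
print(select_history(turns))
```

In this sketch only the first and third turns survive the filter, so the model would condition its next prediction on a history free of the low-confidence middle turn.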
Related papers
- GSQA: An End-to-End Model for Generative Spoken Question Answering [54.418723701886115]
We introduce the first end-to-end Generative Spoken Question Answering (GSQA) model that empowers the system to engage in abstractive reasoning.
Our model surpasses the previous extractive model by 3% on extractive QA datasets.
Our GSQA model shows the potential to generalize to a broad spectrum of questions, thus further expanding the spoken question answering capabilities of abstractive QA.
arXiv Detail & Related papers (2023-12-15T13:33:18Z)
- Uncertainty-aware Language Modeling for Selective Question Answering [107.47864420630923]
We present an automatic large language model (LLM) conversion approach that produces uncertainty-aware LLMs.
Our approach is model- and data-agnostic, is computationally-efficient, and does not rely on external models or systems.
arXiv Detail & Related papers (2023-11-26T22:47:54Z)
- Can NLI Models Verify QA Systems' Predictions? [34.46234860404459]
We explore the use of natural language inference (NLI) to build robust question answering systems.
We leverage large pre-trained models and recent prior datasets to construct powerful question converter and decontextualization modules.
We show that our NLI approach can generally improve the confidence estimation of a QA model across different domains.
arXiv Detail & Related papers (2021-04-18T06:03:07Z)
- How Can We Know When Language Models Know? On the Calibration of Language Models for Question Answering [80.82194311274694]
We examine the question "how can we know when language models know, with confidence, the answer to a particular query?"
We examine three strong generative models -- T5, BART, and GPT-2 -- and study whether their probabilities on QA tasks are well calibrated.
We then examine methods to calibrate such models to make their confidence scores correlate better with the likelihood of correctness.
arXiv Detail & Related papers (2020-12-02T03:53:13Z)
- Answering Ambiguous Questions through Generative Evidence Fusion and Round-Trip Prediction [46.38201136570501]
We present a model that aggregates and combines evidence from multiple passages to adaptively predict a single answer or a set of question-answer pairs for ambiguous questions.
Our model, named Refuel, achieves a new state-of-the-art performance on the AmbigQA dataset, and shows competitive performance on NQ-Open and TriviaQA.
arXiv Detail & Related papers (2020-11-26T05:48:55Z)
- Counterfactual Variable Control for Robust and Interpretable Question Answering [57.25261576239862]
Deep neural network based question answering (QA) models are neither robust nor explainable in many cases.
In this paper, we inspect such spurious "capability" of QA models using causal inference.
We propose a novel approach called Counterfactual Variable Control (CVC) that explicitly mitigates any shortcut correlation.
arXiv Detail & Related papers (2020-10-12T10:09:05Z)
- Selective Question Answering under Domain Shift [90.021577320085]
Abstention policies based solely on the model's softmax probabilities fare poorly, since models are overconfident on out-of-domain inputs.
We train a calibrator to identify inputs on which the QA model errs, and abstain when it predicts an error is likely.
Our method answers 56% of questions while maintaining 80% accuracy; in contrast, directly using the model's probabilities only answers 48% at 80% accuracy.
arXiv Detail & Related papers (2020-06-16T19:13:21Z)
- Do not let the history haunt you -- Mitigating Compounding Errors in Conversational Question Answering [17.36904526340775]
We find that compounding errors occur when using previously predicted answers at test time.
We propose a sampling strategy that dynamically selects between target answers and model predictions during training; a sketch of this idea appears after this list.
arXiv Detail & Related papers (2020-05-12T13:29:38Z)
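As referenced in the last entry above, a train-time sampling strategy that shifts from gold answers to model predictions can be sketched as follows. This is a hypothetical, scheduled-sampling-style illustration with assumed names and a simple linear schedule, not necessarily that paper's exact method.

```python
import random

def build_training_history(turns, epoch, total_epochs, rng=None):
    """Scheduled-sampling-style history construction: early epochs mostly use
    gold answers; later epochs increasingly substitute the model's own
    predictions so training conditions match test-time inference."""
    rng = rng or random.Random(0)
    p_use_prediction = epoch / max(total_epochs - 1, 1)  # ramps from 0 to 1
    history = []
    for gold_answer, predicted_answer in turns:
        use_prediction = rng.random() < p_use_prediction
        history.append(predicted_answer if use_prediction else gold_answer)
    return history

# Toy example: two conversation turns, trained for five epochs.
turns = [("gold-1", "pred-1"), ("gold-2", "pred-2")]
for epoch in range(5):
    print(epoch, build_training_history(turns, epoch, total_epochs=5))
```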