Realistic Conversational Question Answering with Answer Selection based
on Calibrated Confidence and Uncertainty Measurement
- URL: http://arxiv.org/abs/2302.05137v1
- Date: Fri, 10 Feb 2023 09:42:07 GMT
- Title: Realistic Conversational Question Answering with Answer Selection based
on Calibrated Confidence and Uncertainty Measurement
- Authors: Soyeong Jeong, Jinheon Baek, Sung Ju Hwang, Jong C. Park
- Abstract summary: Conversational Question Answering (ConvQA) models aim at answering a question using its relevant paragraph and the question-answer pairs from earlier turns of a multi-turn conversation.
We propose to filter out inaccurate answers in the conversation history based on their estimated confidences and uncertainties from the ConvQA model.
We validate our models, Answer Selection-based realistic Conversational Question Answering (AS-ConvQA), on two standard ConvQA datasets.
- Score: 54.55643652781891
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Conversational Question Answering (ConvQA) models aim at answering a question
using its relevant paragraph and the question-answer pairs from earlier turns
of a multi-turn conversation. To apply such models to a real-world scenario,
some existing work uses predicted answers, instead of unavailable ground-truth
answers, as the conversation history for inference. However, since these
predictions are sometimes incorrect, using all of them without filtering
significantly hampers the model performance. To address this problem, we
propose to filter out inaccurate answers in the conversation history based on
their estimated confidences and uncertainties from the ConvQA model, without
making any architectural changes. Moreover, to make the confidence and
uncertainty values more reliable, we propose to further calibrate them, thereby
smoothing the model predictions. We validate our models, Answer Selection-based
realistic Conversational Question Answering (AS-ConvQA), on two standard ConvQA
datasets, and the results show that our models significantly outperform
relevant baselines. Code is available at: https://github.com/starsuzi/AS-ConvQA.
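To make the abstract's method concrete, here is a minimal sketch of confidence- and uncertainty-based history filtering with temperature-scaled calibration. It illustrates the general idea, not the authors' implementation (see the AS-ConvQA repository above); the entropy-based uncertainty measure, the temperature value, and both thresholds are assumptions chosen for the example.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; T > 1 smooths overconfident predictions,
    a common post-hoc calibration technique."""
    z = np.asarray(logits, dtype=float) / temperature
    z -= z.max()  # for numerical stability
    e = np.exp(z)
    return e / e.sum()

def predictive_entropy(probs):
    """Entropy of the answer distribution, used here as the uncertainty score."""
    p = np.clip(probs, 1e-12, 1.0)
    return float(-(p * np.log(p)).sum())

def select_history(turns, temperature=2.0, min_confidence=0.5, max_uncertainty=1.0):
    """Keep a predicted answer in the conversation history only if its
    calibrated confidence is high enough and its uncertainty low enough."""
    kept = []
    for question, answer, logits in turns:
        probs = softmax(logits, temperature)
        confidence = float(probs.max())          # probability of the chosen answer
        uncertainty = predictive_entropy(probs)  # spread over the candidates
        if confidence >= min_confidence and uncertainty <= max_uncertainty:
            kept.append((question, answer))
    return kept

# Toy history: each turn carries logits over three answer candidates.
turns = [
    ("Who wrote the novel?", "Alice", [4.0, 0.5, 0.2]),   # confident -> kept
    ("When was it published?", "1999", [1.1, 1.0, 0.9]),  # uncertain -> dropped
    ("Where is it set?", "Paris", [3.5, 0.3, 0.1]),       # confident -> kept
]
print(select_history(turns))
```

In this sketch only the first and third turns survive the filter, so the model would condition its next prediction on a history free of the low-confidence middle turn.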
Related papers
- GSQA: An End-to-End Model for Generative Spoken Question Answering [54.418723701886115]
We introduce the first end-to-end Generative Spoken Question Answering (GSQA) model that empowers the system to engage in abstractive reasoning.
Our model surpasses the previous extractive model by 3% on extractive QA datasets.
Our GSQA model shows the potential to generalize to a broad spectrum of questions, thus further expanding the spoken question answering capabilities of abstractive QA.
arXiv Detail & Related papers (2023-12-15T13:33:18Z)
- Uncertainty-aware Language Modeling for Selective Question Answering [107.47864420630923]
We present an automatic large language model (LLM) conversion approach that produces uncertainty-aware LLMs.
Our approach is model- and data-agnostic, is computationally-efficient, and does not rely on external models or systems.
arXiv Detail & Related papers (2023-11-26T22:47:54Z)
- Can NLI Models Verify QA Systems' Predictions? [34.46234860404459]
We explore the use of natural language inference (NLI) to build robust question answering systems.
We leverage large pre-trained models and recent prior datasets to construct powerful question converter and decontextualization modules.
We show that our NLI approach can generally improve the confidence estimation of a QA model across different domains.
arXiv Detail & Related papers (2021-04-18T06:03:07Z)
- How Can We Know When Language Models Know? On the Calibration of Language Models for Question Answering [80.82194311274694]
We examine the question "how can we know when language models know, with confidence, the answer to a particular query?"
We examine three strong generative models -- T5, BART, and GPT-2 -- and study whether their probabilities on QA tasks are well calibrated.
We then examine methods to calibrate such models to make their confidence scores correlate better with the likelihood of correctness.
arXiv Detail & Related papers (2020-12-02T03:53:13Z)
- Answering Ambiguous Questions through Generative Evidence Fusion and Round-Trip Prediction [46.38201136570501]
We present a model that aggregates and combines evidence from multiple passages to adaptively predict a single answer or a set of question-answer pairs for ambiguous questions.
Our model, named Refuel, achieves a new state-of-the-art performance on the AmbigQA dataset, and shows competitive performance on NQ-Open and TriviaQA.
arXiv Detail & Related papers (2020-11-26T05:48:55Z)
- Counterfactual Variable Control for Robust and Interpretable Question Answering [57.25261576239862]
Deep neural network based question answering (QA) models are neither robust nor explainable in many cases.
In this paper, we inspect such spurious "capability" of QA models using causal inference.
We propose a novel approach called Counterfactual Variable Control (CVC) that explicitly mitigates any shortcut correlation.
arXiv Detail & Related papers (2020-10-12T10:09:05Z)
- Selective Question Answering under Domain Shift [90.021577320085]
Abstention policies based solely on the model's softmax probabilities fare poorly, since models are overconfident on out-of-domain inputs.
We train a calibrator to identify inputs on which the QA model errs, and abstain when it predicts an error is likely.
Our method answers 56% of questions while maintaining 80% accuracy; in contrast, directly using the model's probabilities only answers 48% at 80% accuracy.
arXiv Detail & Related papers (2020-06-16T19:13:21Z)
- Do not let the history haunt you -- Mitigating Compounding Errors in Conversational Question Answering [17.36904526340775]
We find that compounding errors occur when using previously predicted answers at test time.
We propose a sampling strategy that dynamically selects between target answers and model predictions during training; a sketch of this idea appears after this list.
arXiv Detail & Related papers (2020-05-12T13:29:38Z)
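As referenced in the last entry above, a train-time sampling strategy that shifts from gold answers to model predictions can be sketched as follows. This is a hypothetical, scheduled-sampling-style illustration with assumed names and a simple linear schedule, not necessarily that paper's exact method.

```python
import random

def build_training_history(turns, epoch, total_epochs, rng=None):
    """Scheduled-sampling-style history construction: early epochs mostly use
    gold answers; later epochs increasingly substitute the model's own
    predictions so training conditions match test-time inference."""
    rng = rng or random.Random(0)
    p_use_prediction = epoch / max(total_epochs - 1, 1)  # ramps from 0 to 1
    history = []
    for gold_answer, predicted_answer in turns:
        use_prediction = rng.random() < p_use_prediction
        history.append(predicted_answer if use_prediction else gold_answer)
    return history

# Toy example: two conversation turns, trained for five epochs.
turns = [("gold-1", "pred-1"), ("gold-2", "pred-2")]
for epoch in range(5):
    print(epoch, build_training_history(turns, epoch, total_epochs=5))
```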