Can NLI Models Verify QA Systems' Predictions?
- URL: http://arxiv.org/abs/2104.08731v1
- Date: Sun, 18 Apr 2021 06:03:07 GMT
- Title: Can NLI Models Verify QA Systems' Predictions?
- Authors: Jifan Chen, Eunsol Choi, Greg Durrett
- Abstract summary: We explore the use of natural language inference (NLI) to build robust question answering systems.
We leverage large pre-trained models and recent prior datasets to construct powerful question converter and decontextualization modules.
We show that our NLI approach can generally improve the confidence estimation of a QA model across different domains.
- Score: 34.46234860404459
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To build robust question answering systems, we need the ability to verify
whether answers to questions are truly correct, not just "good enough" in the
context of imperfect QA datasets. We explore the use of natural language
inference (NLI) as a way to achieve this goal, as NLI inherently requires the
premise (document context) to contain all necessary information to support the
hypothesis (proposed answer to the question). We leverage large pre-trained
models and recent prior datasets to construct powerful question converter and
decontextualization modules, which can reformulate QA instances as
premise-hypothesis pairs with very high reliability. Then, by combining
standard NLI datasets with NLI examples automatically derived from QA training
data, we can train NLI models to judge the correctness of QA models' proposed
answers. We show that our NLI approach can generally improve the confidence
estimation of a QA model across different domains, evaluated in a selective QA
setting. Careful manual analysis over the predictions of our NLI model shows
that it can further identify cases where the QA model produces the right answer
for the wrong reason, or where the answer cannot be verified as addressing all
aspects of the question.
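The verification recipe described in the abstract (reformulate a QA instance as a premise-hypothesis pair, then let an NLI model judge whether the context entails the proposed answer) can be illustrated with a minimal sketch. The example below is an assumption-laden stand-in, not the paper's system: it uses an off-the-shelf MNLI checkpoint (roberta-large-mnli) in place of the paper's trained NLI model, and a naive string template in place of the learned question converter and decontextualization modules.

```python
# Minimal sketch of NLI-based QA verification, assuming an off-the-shelf
# MNLI checkpoint as a stand-in for the paper's NLI model. The hypothesis
# construction is a naive placeholder for the paper's trained question
# converter and decontextualization modules.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "roberta-large-mnli"  # label order: contradiction, neutral, entailment
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)

def entailment_score(premise: str, hypothesis: str) -> float:
    """Return P(entailment) for a premise-hypothesis pair."""
    inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=-1)[0]
    return probs[2].item()  # index 2 = entailment for roberta-large-mnli

def verify_qa_prediction(question: str, answer: str, context: str,
                         threshold: float = 0.5) -> bool:
    """Accept the QA model's answer only if the context entails the
    declarative form of (question, answer); otherwise abstain."""
    # Naive question-to-statement conversion (placeholder for a learned converter).
    hypothesis = f"The answer to the question '{question}' is {answer}."
    return entailment_score(context, hypothesis) >= threshold

if __name__ == "__main__":
    context = ("Marie Curie received the Nobel Prize in Physics in 1903 "
               "and the Nobel Prize in Chemistry in 1911.")
    print(verify_qa_prediction(
        "When did Marie Curie win the Nobel Prize in Chemistry?", "1911", context))
```

In this sketch the entailment probability plays the role of the verifier's confidence: in a selective QA setup, the answer is kept only when the score clears the threshold and the system abstains otherwise.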
Related papers
- Uncertainty-aware Language Modeling for Selective Question Answering [107.47864420630923]
We present an automatic large language model (LLM) conversion approach that produces uncertainty-aware LLMs.
Our approach is model- and data-agnostic, is computationally efficient, and does not rely on external models or systems.
arXiv Detail & Related papers (2023-11-26T22:47:54Z)
- A Lightweight Method to Generate Unanswerable Questions in English [18.323248259867356]
We examine a simpler data augmentation method for unanswerable question generation in English.
We perform antonym and entity swaps on answerable questions.
Compared to the prior state of the art, data generated with our training-free and lightweight strategy results in better models.
arXiv Detail & Related papers (2023-10-30T10:14:52Z)
- QADYNAMICS: Training Dynamics-Driven Synthetic QA Diagnostic for Zero-Shot Commonsense Question Answering [48.25449258017601]
State-of-the-art approaches fine-tune language models on QA pairs constructed from CommonSense Knowledge Bases.
We propose QADYNAMICS, a training dynamics-driven framework for QA diagnostics and refinement.
arXiv Detail & Related papers (2023-10-17T14:27:34Z)
- QASnowball: An Iterative Bootstrapping Framework for High-Quality Question-Answering Data Generation [67.27999343730224]
We introduce an iterative bootstrapping framework for QA data augmentation, named QASnowball.
QASnowball can iteratively generate large-scale, high-quality QA data from a seed set of supervised examples.
We conduct experiments in a high-resource English scenario and a medium-resource Chinese scenario, and the results show that the data generated by QASnowball can benefit QA models.
arXiv Detail & Related papers (2023-09-19T05:20:36Z)
- Realistic Conversational Question Answering with Answer Selection based on Calibrated Confidence and Uncertainty Measurement [54.55643652781891]
Conversational Question Answering (ConvQA) models aim to answer a question given its relevant paragraph and the previous question-answer pairs from the conversation.
We propose to filter out inaccurate answers in the conversation history based on their estimated confidences and uncertainties from the ConvQA model.
We validate our model, Answer Selection-based realistic Conversational Question Answering, on two standard ConvQA datasets.
arXiv Detail & Related papers (2023-02-10T09:42:07Z)
- Generating Diverse and Consistent QA pairs from Contexts with Information-Maximizing Hierarchical Conditional VAEs [62.71505254770827]
We propose a hierarchical conditional variational autoencoder (HCVAE) for generating QA pairs given unstructured texts as contexts.
Our model obtains impressive performance gains over all baselines on both tasks, using only a fraction of the data for training.
arXiv Detail & Related papers (2020-05-28T08:26:06Z)
- Harvesting and Refining Question-Answer Pairs for Unsupervised QA [95.9105154311491]
We introduce two approaches to improve unsupervised Question Answering (QA).
First, we harvest lexically and syntactically divergent questions from Wikipedia to automatically construct a corpus of question-answer pairs (named RefQA).
Second, we take advantage of the QA model to extract more appropriate answers, which iteratively refines the data in RefQA.
arXiv Detail & Related papers (2020-05-06T15:56:06Z)
- Template-Based Question Generation from Retrieved Sentences for Improved Unsupervised Question Answering [98.48363619128108]
We propose an unsupervised approach to training QA models with generated pseudo-training data.
We show that generating questions for QA training by applying a simple template to a related, retrieved sentence rather than the original context sentence improves downstream QA performance.
arXiv Detail & Related papers (2020-04-24T17:57:45Z)