Towards Confident Machine Reading Comprehension
- URL: http://arxiv.org/abs/2101.07942v2
- Date: Wed, 24 Feb 2021 04:32:30 GMT
- Title: Towards Confident Machine Reading Comprehension
- Authors: Rishav Chakravarti, Avirup Sil
- Abstract summary: We propose a novel post-prediction confidence estimation model, which we call Mr.C (short for Mr. Confident)
Mr.C can be trained to improve a system's ability to refrain from making incorrect predictions with improvements of up to 4 points as measured by Area Under the Curve (AUC) scores.
- Score: 7.989756186727329
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: There has been considerable progress on academic benchmarks for the Reading
Comprehension (RC) task with State-of-the-Art models closing the gap with human
performance on extractive question answering. Datasets such as SQuAD 2.0 & NQ
have also introduced an auxiliary task requiring models to predict when a
question has no answer in the text. However, in production settings, it is also
necessary to provide confidence estimates for the performance of the underlying
RC model at both answer extraction and "answerability" detection. We propose a
novel post-prediction confidence estimation model, which we call Mr.C (short
for Mr. Confident), that can be trained to improve a system's ability to
refrain from making incorrect predictions with improvements of up to 4 points
as measured by Area Under the Curve (AUC) scores. Mr.C can benefit from a novel
white-box feature that leverages the underlying RC model's gradients.
Performance prediction is particularly important in cases of domain shift (as
measured by training RC models on SQuAD 2.0 and evaluating on NQ), where Mr.C
not only improves AUC, but also traditional answerability prediction (as
measured by a 5 point improvement in F1).
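The abstract describes the general recipe: compute post-prediction features (including a white-box gradient-based feature), combine them into a confidence score, and evaluate selective prediction with AUC. The paper's actual model and features are not reproduced here; the following is a minimal sketch under stated assumptions, where the feature pair (answer score, gradient norm) and the logistic weights are illustrative stand-ins, not Mr.C's real parameters.

```python
import math

def auc_score(confidences, correct):
    """AUC via the rank statistic: the probability that a correct
    prediction is assigned higher confidence than an incorrect one."""
    pos = [c for c, ok in zip(confidences, correct) if ok]
    neg = [c for c, ok in zip(confidences, correct) if not ok]
    if not pos or not neg:
        return 0.5
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def confidence(features, weights, bias):
    """Logistic combination of post-prediction features."""
    z = sum(w * f for w, f in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical per-example features: (answer score, gradient norm),
# paired with whether the RC prediction was actually correct.
examples = [((2.1, 0.3), True), ((1.8, 0.4), True),
            ((0.9, 1.2), False), ((0.4, 1.5), False)]
weights, bias = (1.0, -1.0), 0.0  # stand-in learned parameters
confs = [confidence(f, weights, bias) for f, _ in examples]
labels = [ok for _, ok in examples]
print(round(auc_score(confs, labels), 2))  # → 1.0 on this toy data
```

A higher AUC means the confidence score lets the system refrain from answering exactly on the examples it would have gotten wrong, which is the metric the paper reports improving by up to 4 points.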
Related papers
- Uncertainty-aware Language Modeling for Selective Question Answering [107.47864420630923]
We present an automatic large language model (LLM) conversion approach that produces uncertainty-aware LLMs.
Our approach is model- and data-agnostic, is computationally-efficient, and does not rely on external models or systems.
arXiv Detail & Related papers (2023-11-26T22:47:54Z)
- Accurate and Reliable Confidence Estimation Based on Non-Autoregressive End-to-End Speech Recognition System [42.569506907182706]
Previous end-to-end (E2E) confidence estimation models (CEMs) predict score sequences of the same length as the input transcriptions, leading to unreliable estimates when deletion and insertion errors occur.
We propose the CIF-Aligned confidence estimation model (CA-CEM), which achieves accurate and reliable confidence estimation on top of a novel non-autoregressive E2E ASR model, Paraformer.
arXiv Detail & Related papers (2023-05-18T03:34:50Z)
- Toward Reliable Human Pose Forecasting with Uncertainty [51.628234388046195]
We develop an open-source library for human pose forecasting that includes multiple models and supports several datasets.
We model two types of uncertainty in the problem to improve performance and convey better trust.
arXiv Detail & Related papers (2023-04-13T17:56:08Z)
- VisFIS: Visual Feature Importance Supervision with Right-for-the-Right-Reason Objectives [84.48039784446166]
We show that model FI supervision can meaningfully improve VQA model accuracy as well as performance on several Right-for-the-Right-Reason metrics.
Our best performing method, Visual Feature Importance Supervision (VisFIS), outperforms strong baselines on benchmark VQA datasets.
Predictions are more accurate when explanations are plausible and faithful, and not when they are plausible but not faithful.
arXiv Detail & Related papers (2022-06-22T17:02:01Z)
- Balancing Cost and Quality: An Exploration of Human-in-the-loop Frameworks for Automated Short Answer Scoring [36.58449231222223]
Short answer scoring (SAS) is the task of grading short text written by a learner.
We present the first study exploring the use of a human-in-the-loop framework to minimize grading cost.
We find that our human-in-the-loop framework allows automatic scoring models and human graders to achieve the target scoring quality.
arXiv Detail & Related papers (2022-06-16T16:43:18Z)
- Towards More Fine-grained and Reliable NLP Performance Prediction [85.78131503006193]
We make two contributions to improving performance prediction for NLP tasks.
First, we examine performance predictors for holistic measures of accuracy like F1 or BLEU.
Second, we propose methods to understand the reliability of a performance prediction model from two angles: confidence intervals and calibration.
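Confidence intervals for a performance estimate, as mentioned above, are commonly obtained with a percentile bootstrap over per-example scores. This is a generic sketch of that technique, not the cited paper's specific method, and the score values are made up for illustration.

```python
import random

def bootstrap_ci(scores, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of per-example scores."""
    rng = random.Random(seed)
    means = sorted(
        sum(rng.choices(scores, k=len(scores))) / len(scores)
        for _ in range(n_resamples)
    )
    lo = means[int(alpha / 2 * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

# Illustrative per-example correctness (1 = model right, 0 = wrong);
# the point estimate here is 0.7.
scores = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]
lo, hi = bootstrap_ci(scores)
print(lo <= 0.7 <= hi)  # the point estimate lies inside the interval
```

Reporting the interval alongside F1 or BLEU conveys how much a single-number score could move under resampling of the evaluation set.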
arXiv Detail & Related papers (2021-02-10T15:23:20Z)
- RECONSIDER: Re-Ranking using Span-Focused Cross-Attention for Open Domain Question Answering [49.024513062811685]
We develop a simple and effective re-ranking approach (RECONSIDER) for span-extraction tasks.
RECONSIDER is trained on positive and negative examples extracted from high confidence predictions of MRC models.
It uses in-passage span annotations to perform span-focused re-ranking over a smaller candidate set.
arXiv Detail & Related papers (2020-10-21T04:28:42Z)
- Value-driven Hindsight Modelling [68.658900923595]
Value estimation is a critical component of the reinforcement learning (RL) paradigm.
Model learning can make use of the rich transition structure present in sequences of observations, but this approach is usually not sensitive to the reward function.
We develop an approach for representation learning in RL that sits in between these two extremes.
This provides tractable prediction targets that are directly relevant for a task, and can thus accelerate learning the value function.
arXiv Detail & Related papers (2020-02-19T18:10:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.