Towards Improving Selective Prediction Ability of NLP Systems
- URL: http://arxiv.org/abs/2008.09371v3
- Date: Thu, 7 Apr 2022 00:22:04 GMT
- Title: Towards Improving Selective Prediction Ability of NLP Systems
- Authors: Neeraj Varshney, Swaroop Mishra, Chitta Baral
- Abstract summary: We propose a method that improves probability estimates of models by calibrating them using prediction confidence and difficulty score of instances.
We instantiate our method with Natural Language Inference (NLI) and Duplicate Detection (DD) tasks and evaluate it in both In-Domain (IID) and Out-of-Domain (OOD) settings.
- Score: 24.774450633678125
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: It's better to say "I can't answer" than to answer incorrectly. This
selective prediction ability is crucial for NLP systems to be reliably deployed
in real-world applications. Prior work has shown that existing selective
prediction techniques fail to perform well, especially in the out-of-domain
setting. In this work, we propose a method that improves probability estimates
of models by calibrating them using prediction confidence and difficulty score
of instances. Using these two signals, we first annotate held-out instances and
then train a calibrator to predict the likelihood of correctness of the model's
prediction. We instantiate our method with Natural Language Inference (NLI) and
Duplicate Detection (DD) tasks and evaluate it in both In-Domain (IID) and
Out-of-Domain (OOD) settings. In (IID, OOD) settings, we show that the
representations learned by our calibrator result in an improvement of (15.81%,
5.64%) and (6.19%, 13.9%) over 'MaxProb' -- a selective prediction baseline --
on NLI and DD tasks respectively.
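The calibration recipe described above can be pictured with a small sketch. The assumptions here are not from the paper: a logistic-regression calibrator, synthetic confidence and difficulty values, and a 0.5 abstention threshold; the paper's actual calibrator, feature set, and difficulty definition may differ.

```python
# Minimal sketch of confidence + difficulty calibration for selective prediction.
# Assumptions (not from the paper): a logistic-regression calibrator and
# synthetic signals; the paper's calibrator and difficulty score may differ.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Held-out instances: the model's MaxProb confidence, a difficulty score in [0, 1],
# and whether the model's prediction was actually correct (the annotation step).
confidence = rng.uniform(0.3, 1.0, size=1000)
difficulty = rng.uniform(0.0, 1.0, size=1000)
correct = (rng.uniform(size=1000) < confidence * (1.0 - 0.5 * difficulty)).astype(int)

# Train the calibrator to predict the likelihood of correctness from the two signals.
features = np.column_stack([confidence, difficulty])
calibrator = LogisticRegression().fit(features, correct)

# At test time, the calibrated correctness probability (rather than raw MaxProb)
# decides whether to answer or abstain.
test_features = np.array([[0.92, 0.1], [0.55, 0.9]])
p_correct = calibrator.predict_proba(test_features)[:, 1]
threshold = 0.5
decisions = ["answer" if p >= threshold else "abstain" for p in p_correct]
print(list(zip(p_correct.round(3), decisions)))
```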
Related papers
- Beyond Accuracy: Ensuring Correct Predictions With Correct Rationales [10.397502254316645]
We propose a two-phase scheme to ensure double-correct predictions.
First, we curate a new dataset that offers structured rationales for visual recognition tasks.
Second, we propose a rationale-informed optimization method to guide the model in disentangling and localizing visual evidence.
arXiv Detail & Related papers (2024-10-31T18:33:39Z)
- Adaptation with Self-Evaluation to Improve Selective Prediction in LLMs [56.526095828316386]
We propose a novel framework for adaptation with self-evaluation to improve the selective prediction performance of large language models (LLMs).
We evaluate our method on a variety of question-answering (QA) datasets and show that it outperforms state-of-the-art selective prediction methods.
arXiv Detail & Related papers (2023-10-18T03:34:59Z)
- Bridging Precision and Confidence: A Train-Time Loss for Calibrating Object Detection [58.789823426981044]
We propose a novel auxiliary loss formulation that aims to align the class confidence of bounding boxes with the accuracy of the predictions.
Our results reveal that our train-time loss surpasses strong calibration baselines in reducing calibration error in both in-domain and out-of-domain scenarios.
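As a rough illustration of a train-time confidence-alignment term (not the paper's exact formulation, which is tailored to object detection), one could add an auxiliary penalty on the gap between a classifier's confidence and its correctness:

```python
# Illustrative sketch (not the paper's exact loss): an auxiliary term that
# penalizes mismatch between a classifier's confidence and its correctness,
# added to the usual task loss during training.
import torch
import torch.nn.functional as F

def confidence_alignment_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    probs = F.softmax(logits, dim=-1)
    confidence, predictions = probs.max(dim=-1)
    correct = (predictions == targets).float()
    # Push confidence toward 1 on correct predictions and toward 0 on errors.
    return F.mse_loss(confidence, correct)

logits = torch.randn(8, 5, requires_grad=True)
targets = torch.randint(0, 5, (8,))
loss = F.cross_entropy(logits, targets) + 0.5 * confidence_alignment_loss(logits, targets)
loss.backward()
```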
arXiv Detail & Related papers (2023-03-25T08:56:21Z)
- Calibrated Selective Classification [34.08454890436067]
We develop a new approach to selective classification in which we propose a method for rejecting examples with "uncertain" uncertainties.
We present a framework for learning selectively calibrated models, where a separate selector network is trained to improve the selective calibration error of a given base model.
We demonstrate the empirical effectiveness of our approach on multiple image classification and lung cancer risk assessment tasks.
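A toy sketch of the selective calibration idea is to compute expected calibration error only over the examples a selector keeps. The binning scheme and the selector here are stand-ins (a random score), not the trained selector network from the paper.

```python
# Sketch: expected calibration error (ECE) restricted to the selected subset.
# The selector score below is a random stand-in, not the paper's selector network.
import numpy as np

def ece(confidences: np.ndarray, correct: np.ndarray, n_bins: int = 10) -> float:
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    total, err = len(confidences), 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            err += mask.sum() / total * abs(correct[mask].mean() - confidences[mask].mean())
    return err

rng = np.random.default_rng(0)
conf = rng.uniform(0.5, 1.0, 500)
correct = (rng.uniform(size=500) < conf**2).astype(float)  # a miscalibrated model
selector_score = rng.uniform(size=500)                      # stand-in selector output
kept = selector_score > 0.5                                 # selector keeps ~half the examples
print("ECE (all):", round(ece(conf, correct), 3))
print("ECE (selected):", round(ece(conf[kept], correct[kept]), 3))
```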
arXiv Detail & Related papers (2022-08-25T13:31:09Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts target accuracy as the fraction of unlabeled examples whose confidence exceeds that threshold.
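ATC can be sketched in a few lines of NumPy on synthetic scores. The data below is made up, and the paper also studies other confidence scores (e.g. negative entropy), so this is only an illustration of the thresholding step.

```python
# Illustrative sketch of Average Thresholded Confidence (ATC) on synthetic data:
# pick a confidence threshold on labeled source data so that the fraction of
# examples above it matches source accuracy, then predict target accuracy as the
# fraction of unlabeled target examples above that threshold.
import numpy as np

rng = np.random.default_rng(0)
src_conf = rng.uniform(0.4, 1.0, 2000)
src_correct = rng.uniform(size=2000) < src_conf   # labeled source validation data
tgt_conf = rng.uniform(0.3, 1.0, 2000) * 0.9      # unlabeled, shifted target data

# Threshold chosen so that P(conf > t) on source equals source accuracy.
source_accuracy = src_correct.mean()
threshold = np.quantile(src_conf, 1.0 - source_accuracy)

predicted_target_accuracy = (tgt_conf > threshold).mean()
print(f"source acc={source_accuracy:.3f}, threshold={threshold:.3f}, "
      f"predicted target acc={predicted_target_accuracy:.3f}")
```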
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
- Neural Predictive Monitoring under Partial Observability [4.1316328854247155]
We present a learning-based method for predictive monitoring (PM) that produces accurate and reliable reachability predictions despite partial observability (PO).
Our method results in highly accurate reachability predictions and error detection, as well as tight prediction regions with guaranteed coverage.
arXiv Detail & Related papers (2021-08-16T15:08:20Z)
- Improving Uncertainty Calibration via Prior Augmented Data [56.88185136509654]
Neural networks have proven successful at learning from complex data distributions by acting as universal function approximators.
However, they are often overconfident in their predictions, which leads to inaccurate and miscalibrated probabilistic predictions.
We propose a solution by seeking out regions of feature space where the model is unjustifiably overconfident, and conditionally raising the entropy of those predictions towards that of the prior distribution of the labels.
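One simple way to raise prediction entropy toward the label prior is to interpolate the predicted distribution with the prior. How the paper identifies "unjustifiably overconfident" regions is its contribution; the flag below is only a stand-in, and the mixing rule is an assumption for illustration.

```python
# Minimal sketch: raise prediction entropy toward the label prior by mixing
# flagged predictions with the prior distribution. The "overconfident region"
# flag is a stand-in; the paper learns where to apply the adjustment.
import numpy as np

def raise_entropy(probs: np.ndarray, prior: np.ndarray, alpha: float) -> np.ndarray:
    """Mix each predicted distribution with the label prior (alpha in [0, 1])."""
    return (1.0 - alpha) * probs + alpha * prior

prior = np.array([0.25, 0.25, 0.25, 0.25])        # label prior (assumed uniform here)
probs = np.array([[0.97, 0.01, 0.01, 0.01],       # overconfident prediction
                  [0.40, 0.30, 0.20, 0.10]])
flagged = np.array([True, False])                  # stand-in for "overconfident region"

adjusted = probs.copy()
adjusted[flagged] = raise_entropy(probs[flagged], prior, alpha=0.5)
print(adjusted)
```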
arXiv Detail & Related papers (2021-02-22T07:02:37Z)
- Calibrate Before Use: Improving Few-Shot Performance of Language Models [68.17016463756474]
GPT-3 can perform numerous tasks when provided a natural language prompt that contains a few training examples.
We show that this type of few-shot learning can be unstable.
The choice of prompt format, training examples, and even the order of the training examples can cause accuracy to vary from near chance to near state-of-the-art.
arXiv Detail & Related papers (2021-02-19T00:23:59Z)
- Towards More Fine-grained and Reliable NLP Performance Prediction [85.78131503006193]
We make two contributions to improving performance prediction for NLP tasks.
First, we examine performance predictors for holistic measures of accuracy like F1 or BLEU.
Second, we propose methods to understand the reliability of a performance prediction model from two angles: confidence intervals and calibration.
arXiv Detail & Related papers (2021-02-10T15:23:20Z)
- Calibrating Structured Output Predictors for Natural Language Processing [8.361023354729731]
We propose a general calibration scheme for output entities of interest in neural-network based structured prediction models.
Our proposed method can be used with any binary class calibration scheme and a neural network model.
We show that our method outperforms current calibration techniques for named entity recognition, part-of-speech tagging, and question answering.
arXiv Detail & Related papers (2020-04-09T04:14:46Z)