Calibrating Structured Output Predictors for Natural Language Processing
- URL: http://arxiv.org/abs/2004.04361v2
- Date: Tue, 5 May 2020 21:28:08 GMT
- Title: Calibrating Structured Output Predictors for Natural Language Processing
- Authors: Abhyuday Jagannatha, Hong Yu
- Abstract summary: We propose a general calibration scheme for output entities of interest in neural-network-based structured prediction models.
Our proposed method can be used with any binary-class calibration scheme and a neural network model.
We show that our method outperforms current calibration techniques for named entity recognition, part-of-speech tagging, and question answering.
- Score: 8.361023354729731
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We address the problem of calibrating prediction confidence for output
entities of interest in natural language processing (NLP) applications. It is
important that NLP applications such as named entity recognition and question
answering produce calibrated confidence scores for their predictions,
especially if the system is to be deployed in a safety-critical domain such as
healthcare. However, the output space of such structured prediction models is
often too large to adapt binary or multi-class calibration methods directly. In
this study, we propose a general calibration scheme for output entities of
interest in neural-network-based structured prediction models. Our proposed
method can be used with any binary-class calibration scheme and a neural
network model. Additionally, we show that our calibration method can also be
used as an uncertainty-aware, entity-specific decoding step to improve the
performance of the underlying model at no additional training cost or data
requirements. We show that our method outperforms current calibration
techniques for named entity recognition, part-of-speech tagging, and question
answering. Our decoding step also improves model performance across several
tasks and benchmark datasets. Our method improves calibration and model
performance in out-of-domain test scenarios as well.
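As a rough sketch of the idea, the snippet below calibrates entity-level confidences with Platt scaling (logistic regression) standing in for "any binary class calibration scheme"; the data and variable names are illustrative, not from the paper's code:
```python
# Minimal sketch of entity-level calibration for a structured predictor,
# assuming we already have, for each predicted entity on a held-out set,
# an uncalibrated confidence score and a 0/1 correctness label.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Held-out calibration data (synthetic here): raw entity confidences and
# whether each predicted entity matched the gold annotation.
dev_scores = rng.uniform(0.3, 1.0, size=500)
dev_correct = (rng.uniform(size=500) < dev_scores**2).astype(int)

# Fit a binary calibrator mapping raw score -> P(entity is correct).
calibrator = LogisticRegression()
calibrator.fit(dev_scores.reshape(-1, 1), dev_correct)

# At test time, replace each entity's raw confidence with the calibrated one.
test_scores = np.array([0.55, 0.80, 0.97])
calibrated = calibrator.predict_proba(test_scores.reshape(-1, 1))[:, 1]
print(calibrated)

# An uncertainty-aware decoding step could then, e.g., drop entities whose
# calibrated confidence falls below a threshold tuned on the dev set.
threshold = 0.5
print(test_scores[calibrated >= threshold])
```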
Related papers
- Feature Clipping for Uncertainty Calibration [24.465567005078135]
Modern deep neural networks (DNNs) often suffer from overconfidence, leading to miscalibration.
We propose a novel post-hoc calibration method called feature clipping (FC) to address this issue.
FC involves clipping feature values to a specified threshold, effectively increasing the entropy of predictions for samples with high calibration error.
arXiv Detail & Related papers (2024-10-16T06:44:35Z) - Optimizing Estimators of Squared Calibration Errors in Classification [2.3020018305241337]
- Optimizing Estimators of Squared Calibration Errors in Classification [2.3020018305241337]
We propose a mean-squared-error-based risk that enables the comparison and optimization of estimators of squared calibration errors.
Our approach advocates for a training-validation-testing pipeline when estimating a calibration error.
arXiv Detail & Related papers (2024-10-09T15:58:06Z) - Calibrating Large Language Models with Sample Consistency [76.23956851098598]
- Calibrating Large Language Models with Sample Consistency [76.23956851098598]
We explore the potential of deriving confidence from the distribution of multiple randomly sampled model generations, via three measures of consistency.
Results show that consistency-based calibration methods outperform existing post-hoc approaches.
We offer practical guidance on choosing suitable consistency metrics for calibration, tailored to the characteristics of various LMs.
arXiv Detail & Related papers (2024-02-21T16:15:20Z) - On the Calibration of Large Language Models and Alignment [63.605099174744865]
- On the Calibration of Large Language Models and Alignment [63.605099174744865]
Confidence calibration serves as a crucial tool for gauging the reliability of deep models.
We conduct a systematic examination of the calibration of aligned language models throughout the entire construction process.
Our work sheds light on whether popular LLMs are well-calibrated and how the training process influences model calibration.
arXiv Detail & Related papers (2023-11-22T08:57:55Z) - Preserving Pre-trained Features Helps Calibrate Fine-tuned Language
- Preserving Pre-trained Features Helps Calibrate Fine-tuned Language Models [23.881825575095945]
Large pre-trained language models (PLMs) have demonstrated strong performance on natural language understanding (NLU) tasks through fine-tuning.
However, fine-tuned models still suffer from overconfident predictions, especially in out-of-domain settings.
We demonstrate that PLMs are well-calibrated on the masked language modeling task, with robust predictive confidence under domain shift.
We show that preserving pre-trained features can improve the calibration of fine-tuned language models.
arXiv Detail & Related papers (2023-05-30T17:35:31Z) - Calibration of Neural Networks [77.34726150561087]
- Calibration of Neural Networks [77.34726150561087]
This paper presents a survey of confidence calibration problems in the context of neural networks.
We analyze the problem statement, calibration definitions, and different approaches to evaluation.
Empirical experiments cover various datasets and models, comparing calibration methods according to different criteria.
arXiv Detail & Related papers (2023-03-19T20:27:51Z) - Sharp Calibrated Gaussian Processes [58.94710279601622]
- Sharp Calibrated Gaussian Processes [58.94710279601622]
State-of-the-art approaches for designing calibrated models rely on inflating the Gaussian process posterior variance.
We present a calibration approach that generates predictive quantiles using a computation inspired by the vanilla Gaussian process posterior variance.
Our approach is shown to yield a calibrated model under reasonable assumptions.
arXiv Detail & Related papers (2023-02-23T12:17:36Z) - On Calibrating Semantic Segmentation Models: Analyses and An Algorithm [51.85289816613351]
- On Calibrating Semantic Segmentation Models: Analyses and An Algorithm [51.85289816613351]
We study the problem of semantic segmentation calibration.
Model capacity, crop size, multi-scale testing, and prediction correctness all have an impact on calibration.
We propose a simple, unifying, and effective approach, namely selective scaling.
arXiv Detail & Related papers (2022-12-22T22:05:16Z) - Variable-Based Calibration for Machine Learning Classifiers [11.9995808096481]
- Variable-Based Calibration for Machine Learning Classifiers [11.9995808096481]
We introduce the notion of variable-based calibration to characterize calibration properties of a model.
We find that models with near-perfect expected calibration error can exhibit significant miscalibration as a function of features of the data.
arXiv Detail & Related papers (2022-09-30T00:49:31Z) - Modular Conformal Calibration [80.33410096908872]
- Modular Conformal Calibration [80.33410096908872]
We introduce a versatile class of algorithms for recalibration in regression, called modular conformal calibration (MCC).
This framework allows one to transform any regression model into a calibrated probabilistic model.
We conduct an empirical study of MCC on 17 regression datasets.
arXiv Detail & Related papers (2022-06-23T03:25:23Z) - Meta-Calibration: Learning of Model Calibration Using Differentiable
- Meta-Calibration: Learning of Model Calibration Using Differentiable Expected Calibration Error [46.12703434199988]
We introduce a new differentiable surrogate for expected calibration error (DECE) that allows calibration quality to be directly optimised.
We also propose a meta-learning framework that uses DECE to optimise for validation set calibration.
arXiv Detail & Related papers (2021-06-17T15:47:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.