How Does Beam Search improve Span-Level Confidence Estimation in
Generative Sequence Labeling?
- URL: http://arxiv.org/abs/2212.10767v3
- Date: Wed, 31 Jan 2024 04:10:30 GMT
- Title: How Does Beam Search improve Span-Level Confidence Estimation in
Generative Sequence Labeling?
- Authors: Kazuma Hashimoto and Iftekhar Naim and Karthik Raman
- Abstract summary: This paper aims to provide some empirical insights on estimating model confidence for generative sequence labeling.
As verified over six public datasets, we show that our proposed approach significantly reduces calibration errors of the predictions of a generative sequence labeling model.
- Score: 11.481435098152893
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sequence labeling is a core task in text understanding for IE/IR systems.
Text generation models have increasingly become the go-to solution for such
tasks (e.g., entity extraction and dialog slot filling). While most research
has focused on the labeling accuracy, a key aspect -- of vital practical
importance -- has slipped through the cracks: understanding model confidence.
More specifically, we lack a principled understanding of how to reliably gauge
the confidence of a model in its predictions for each labeled span. This paper
aims to provide some empirical insights on estimating model confidence for
generative sequence labeling. Most notably, we find that simply using the
decoder's output probabilities \textbf{is not} the best in realizing
well-calibrated confidence estimates. As verified over six public datasets of
different tasks, we show that our proposed approach -- which leverages
statistics from top-$k$ predictions by a beam search -- significantly reduces
calibration errors of the predictions of a generative sequence labeling model.
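To make the abstract's core idea concrete, here is a minimal, hypothetical sketch (not the authors' exact method; the aggregation rule and the toy data are assumptions): rather than reading a span's confidence off the 1-best decoder probability, aggregate the probability mass of the top-$k$ beam-search hypotheses that agree on the same (span, label) pair and normalize over the beam.

```python
# Illustrative sketch only: span-level confidence from top-k beam statistics.
# Each beam hypothesis is represented as (sequence probability, set of (span, label) pairs).
# All names and numbers below are hypothetical.

from collections import defaultdict

def span_confidence_from_beam(beam):
    """beam: list of (prob, {(span_text, label), ...}) pairs from top-k decoding."""
    total = sum(p for p, _ in beam)  # probability mass covered by the beam
    mass = defaultdict(float)
    for p, spans in beam:
        for span in spans:
            mass[span] += p  # credit each hypothesis's mass to the spans it predicts
    return {span: m / total for span, m in mass.items()}

# Toy top-3 beam for a slot-filling utterance (hypothetical numbers).
beam = [
    (0.42, {("new york", "destination"), ("7 pm", "time")}),
    (0.31, {("new york", "destination"), ("7 pm", "date")}),
    (0.12, {("york", "destination"), ("7 pm", "time")}),
]

for span, conf in sorted(span_confidence_from_beam(beam).items(), key=lambda x: -x[1]):
    print(f"{span}: {conf:.2f}")
# ("new york", "destination") gets ~0.86 because two hypotheses agree on it,
# whereas the 1-best decoder probability alone would report 0.42.
```

The point of the sketch is that agreement across beam hypotheses carries calibration signal that a single output probability does not.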
Related papers
- Selective Learning: Towards Robust Calibration with Dynamic Regularization [79.92633587914659]
Miscalibration in deep learning refers to a discrepancy between a model's predicted confidence and its actual performance.
We introduce Dynamic Regularization (DReg), which aims to learn what should be learned during training, thereby circumventing the confidence-adjustment trade-off.
arXiv Detail & Related papers (2024-02-13T11:25:20Z) - Predicting generalization performance with correctness discriminators [64.00420578048855]
We present a novel model that establishes upper and lower bounds on the accuracy, without requiring gold labels for the unseen data.
We show across a variety of tagging, parsing, and semantic parsing tasks that the gold accuracy is reliably between the predicted upper and lower bounds.
arXiv Detail & Related papers (2023-11-15T22:43:42Z) - Perception and Semantic Aware Regularization for Sequential Confidence Calibration [12.265757315192497]
We propose a Perception and Semantic aware Sequence Regularization framework.
We introduce semantic context-free recognition and a language model to acquire similar sequences with high perceptual similarity and semantic correlation.
Experiments on canonical sequence recognition tasks, including scene text and speech recognition, demonstrate that our method achieves new state-of-the-art results.
arXiv Detail & Related papers (2023-05-31T02:16:29Z) - A Confidence-based Partial Label Learning Model for Crowd-Annotated Named Entity Recognition [74.79785063365289]
Existing models for named entity recognition (NER) are mainly based on large-scale labeled datasets.
We propose a Confidence-based Partial Label Learning (CPLL) method that integrates the prior confidence (given by annotators) and the posterior confidence (learned by the model) for crowd-annotated NER.
arXiv Detail & Related papers (2023-05-21T15:31:23Z) - Confidence-Aware Calibration and Scoring Functions for Curriculum Learning [1.192436948211501]
We integrate notions of model confidence and human confidence with label smoothing to achieve better model calibration and generalization.
A higher model or human confidence score indicates a more recognisable, and hence easier, sample, so it can be used as a scoring function to rank samples in curriculum learning.
arXiv Detail & Related papers (2023-01-29T23:59:40Z) - Predictive Inference with Weak Supervision [3.1925030748447747]
We bridge the gap between partial supervision and validation by developing a conformal prediction framework.
We introduce a new notion of coverage and predictive validity, then develop several application scenarios.
We corroborate the hypothesis that the new coverage definition allows for tighter and more informative (but valid) confidence sets.
arXiv Detail & Related papers (2022-01-20T17:26:52Z) - Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts accuracy as the fraction of unlabeled examples whose confidence exceeds that threshold (see the sketch after this list).
arXiv Detail & Related papers (2022-01-11T23:01:12Z) - Distribution-Free, Risk-Controlling Prediction Sets [112.9186453405701]
We show how to generate set-valued predictions from a black-box predictor that control the expected loss on future test points at a user-specified level.
Our approach provides explicit finite-sample guarantees for any dataset by using a holdout set to calibrate the size of the prediction sets.
arXiv Detail & Related papers (2021-01-07T18:59:33Z) - Knowing what you know: valid and validated confidence sets in multiclass and multilabel prediction [0.8594140167290097]
We develop conformal prediction methods for constructing valid confidence sets in multiclass and multilabel problems.
By leveraging ideas from quantile regression, we build methods that always guarantee correct coverage but additionally provide conditional coverage for both multiclass and multilabel prediction problems.
arXiv Detail & Related papers (2020-04-21T17:45:38Z) - Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)
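For the Average Thresholded Confidence (ATC) entry above, the following is a rough sketch under stated assumptions (synthetic data, max-softmax-style confidences), not the authors' reference implementation: a threshold is fit on labeled source data so that the fraction of source examples above it matches source accuracy, and target accuracy is then estimated as the fraction of unlabeled target examples above that threshold.

```python
# Rough ATC-style sketch: predict target accuracy from unlabeled target confidences.
# Synthetic data throughout; confidences are assumed to be in [0, 1].

import numpy as np

def fit_atc_threshold(source_conf, source_correct):
    """Pick t so that the fraction of source confidences above t matches source accuracy."""
    acc = source_correct.mean()
    # The (1 - acc) quantile leaves an `acc` fraction of confidences above it.
    return np.quantile(source_conf, 1.0 - acc)

def predict_target_accuracy(target_conf, threshold):
    """Estimated accuracy = fraction of unlabeled examples whose confidence exceeds t."""
    return (target_conf > threshold).mean()

rng = np.random.default_rng(0)
source_conf = rng.beta(5, 2, size=1000)          # synthetic source confidences
source_correct = rng.random(1000) < source_conf  # correctness correlated with confidence
target_conf = rng.beta(4, 3, size=1000)          # shifted target confidence distribution

t = fit_atc_threshold(source_conf, source_correct)
print(f"threshold: {t:.3f}, "
      f"predicted target accuracy: {predict_target_accuracy(target_conf, t):.3f}")
```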
This list is automatically generated from the titles and abstracts of the papers in this site.