Measuring the Instability of Fine-Tuning
- URL: http://arxiv.org/abs/2302.07778v2
- Date: Sun, 1 Oct 2023 10:38:39 GMT
- Title: Measuring the Instability of Fine-Tuning
- Authors: Yupei Du and Dong Nguyen
- Abstract summary: Fine-tuning pre-trained language models on downstream tasks with varying random seeds has been shown to be unstable.
In this paper, we analyze SD and six other measures quantifying instability at different levels of granularity.
- Score: 7.370822347217826
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fine-tuning pre-trained language models on downstream tasks with varying
random seeds has been shown to be unstable, especially on small datasets. Many
previous studies have investigated this instability and proposed methods to
mitigate it. However, most studies only used the standard deviation of
performance scores (SD) as their measure, which is a narrow characterization of
instability. In this paper, we analyze SD and six other measures quantifying
instability at different levels of granularity. Moreover, we propose a
systematic framework to evaluate the validity of these measures. Finally, we
analyze the consistency of and differences between these measures by
reassessing existing instability mitigation methods. We hope our results will
inform the development of better measurements of fine-tuning instability.
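
To make the run-level versus finer-grained distinction concrete, here is a minimal Python sketch (the function names, scores, and predictions are hypothetical; the paper's actual six additional measures are defined in the paper itself). It contrasts SD of performance scores across seeds with one possible example-level measure, prediction disagreement between two runs:

```python
import numpy as np

def run_level_sd(scores):
    """Run-level instability: sample standard deviation of performance
    scores (e.g., validation accuracy) across fine-tuning runs that
    differ only in their random seed."""
    return np.std(scores, ddof=1)

def prediction_disagreement(preds_a, preds_b):
    """One possible finer-grained view: the fraction of test examples
    on which two runs disagree. (Illustrative only; not necessarily
    one of the measures analyzed in the paper.)"""
    return np.mean(np.asarray(preds_a) != np.asarray(preds_b))

# Hypothetical accuracies from five fine-tuning runs of the same model.
scores = [0.912, 0.874, 0.901, 0.835, 0.908]
print(f"SD across seeds: {run_level_sd(scores):.4f}")

# Hypothetical per-example predicted labels from two of those runs.
preds_run1 = [1, 0, 1, 1, 0, 1]
preds_run2 = [1, 0, 0, 1, 0, 1]
print(f"Disagreement rate: {prediction_disagreement(preds_run1, preds_run2):.3f}")
```

Two runs can have near-identical accuracies (low SD) while disagreeing on many individual examples, which is why measures at different levels of granularity can tell different stories.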
Related papers
- Second-Order Uncertainty Quantification: Variance-Based Measures [2.3999111269325266]
This paper proposes a novel way to use variance-based measures to quantify uncertainty on the basis of second-order distributions in classification problems.
A distinctive feature of the measures is the ability to reason about uncertainties at the class level, which is useful in situations where nuanced decision-making is required (a minimal sketch of such a decomposition appears after this list).
arXiv Detail & Related papers (2023-12-30T16:30:52Z)
- One step closer to unbiased aleatoric uncertainty estimation [71.55174353766289]
We propose a new estimation method by actively de-noising the observed data.
By conducting a broad range of experiments, we demonstrate that our proposed approach provides a much closer approximation to the actual data uncertainty than the standard method.
arXiv Detail & Related papers (2023-12-16T14:59:11Z)
- Selective Nonparametric Regression via Testing [54.20569354303575]
We develop an abstention procedure by testing a hypothesis on the value of the conditional variance at a given point.
Unlike existing methods, the proposed procedure accounts not only for the value of the variance itself but also for the uncertainty of the corresponding variance predictor.
arXiv Detail & Related papers (2023-09-28T13:04:11Z)
- Evaluating AI systems under uncertain ground truth: a case study in dermatology [44.80772162289557]
We propose a metric for measuring annotation uncertainty and provide uncertainty-adjusted metrics for performance evaluation.
We present a case study applying our framework to skin condition classification from images where annotations are provided in the form of differential diagnoses.
arXiv Detail & Related papers (2023-07-05T10:33:45Z)
- Toward Reliable Human Pose Forecasting with Uncertainty [51.628234388046195]
We develop an open-source library for human pose forecasting that includes multiple models and supports several datasets.
We model two types of uncertainty in the problem to improve performance and convey better trust.
arXiv Detail & Related papers (2023-04-13T17:56:08Z)
- Mutual Wasserstein Discrepancy Minimization for Sequential Recommendation [82.0801585843835]
We propose a novel self-supervised learning framework based on mutual Wasserstein discrepancy minimization (MStein) for sequential recommendation.
We also propose a novel contrastive learning loss based on Wasserstein Discrepancy Measurement.
arXiv Detail & Related papers (2023-01-28T13:38:48Z)
- Monotonicity and Double Descent in Uncertainty Estimation with Gaussian Processes [52.92110730286403]
It is commonly believed that the marginal likelihood should be reminiscent of cross-validation metrics and that both should deteriorate with larger input dimensions.
We prove that, by tuning hyperparameters, the performance as measured by the marginal likelihood improves monotonically with the input dimension.
We also prove that cross-validation metrics exhibit qualitatively different behavior that is characteristic of double descent.
arXiv Detail & Related papers (2022-10-14T08:09:33Z)
- Better Uncertainty Quantification for Machine Translation Evaluation [17.36759906285316]
We train the COMET metric with new heteroscedastic regression, divergence minimization, and direct uncertainty prediction objectives (a sketch of a heteroscedastic objective appears after this list).
Experiments show improved results on WMT20 and WMT21 metrics task datasets and a substantial reduction in computational costs.
arXiv Detail & Related papers (2022-04-13T17:49:25Z)
- How certain are your uncertainties? [0.3655021726150368]
Measures of uncertainty in the output of a deep learning method are useful in several ways.
This work investigates the stability of these uncertainty measurements, in terms of both magnitude and spatial pattern.
arXiv Detail & Related papers (2022-03-01T05:25:02Z)
- Uncertainty-Gated Stochastic Sequential Model for EHR Mortality Prediction [6.170898159041278]
We present a novel variational recurrent network that estimates the distribution of missing variables, updates hidden states, and predicts the possibility of in-hospital mortality.
Notably, our model conducts these procedures in a single stream and learns all network parameters jointly in an end-to-end manner.
arXiv Detail & Related papers (2020-03-02T04:41:28Z)
- Learning to Predict Error for MRI Reconstruction [67.76632988696943]
We demonstrate that the predictive uncertainty estimated by current methods correlates poorly with the prediction error.
We propose a novel method that estimates the target labels and magnitude of the prediction error in two steps.
arXiv Detail & Related papers (2020-02-13T15:55:32Z)
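
Picking up the forward reference from the first related paper above: a minimal sketch of a class-wise variance-based uncertainty decomposition, assuming sampled probability vectors from an ensemble stand in for the second-order distribution. It follows the law of total variance and is not necessarily that paper's exact formulation:

```python
import numpy as np

def variance_based_uncertainty(prob_samples):
    """Class-wise variance-based decomposition of predictive uncertainty.

    prob_samples: array of shape (S, K) -- S sampled class-probability
    vectors (e.g., from an ensemble) over K classes.

    By the law of total variance, for each class k:
        total_k     = p_bar_k * (1 - p_bar_k)   # variance of the label indicator
        aleatoric_k = E[p_k * (1 - p_k)]        # expected conditional variance
        epistemic_k = Var(p_k)                  # variance of the conditional mean
    and total_k = aleatoric_k + epistemic_k.
    """
    p = np.asarray(prob_samples)
    p_bar = p.mean(axis=0)
    aleatoric = (p * (1.0 - p)).mean(axis=0)
    epistemic = p.var(axis=0)
    total = p_bar * (1.0 - p_bar)
    return total, aleatoric, epistemic

# Hypothetical ensemble of four probability vectors over three classes.
samples = np.array([
    [0.70, 0.20, 0.10],
    [0.60, 0.30, 0.10],
    [0.80, 0.15, 0.05],
    [0.65, 0.25, 0.10],
])
total, alea, epi = variance_based_uncertainty(samples)
print("total:", total, "aleatoric:", alea, "epistemic:", epi)
```

The class-wise output is what enables reasoning at the class level: a class can carry high epistemic uncertainty even when the overall prediction looks confident.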
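And for the machine translation evaluation entry: a minimal sketch of a heteroscedastic regression objective (Gaussian negative log-likelihood with a predicted per-example variance). This is one common form of such an objective, not the actual COMET implementation; all names and dimensions are hypothetical:

```python
import torch
import torch.nn as nn

class HeteroscedasticHead(nn.Module):
    """Regression head that predicts both a quality score and a
    per-example log-variance (predicting log-variance keeps the
    variance positive and training stable)."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        self.mean = nn.Linear(hidden_dim, 1)
        self.log_var = nn.Linear(hidden_dim, 1)

    def forward(self, features):
        return self.mean(features), self.log_var(features)

def gaussian_nll(mu, log_var, target):
    # 0.5 * (log sigma^2 + (y - mu)^2 / sigma^2), up to an additive constant
    return 0.5 * (log_var + (target - mu) ** 2 / log_var.exp()).mean()

# Usage with hypothetical sentence features and human quality scores.
head = HeteroscedasticHead(hidden_dim=768)
features = torch.randn(8, 768)
scores = torch.rand(8, 1)
mu, log_var = head(features)
loss = gaussian_nll(mu, log_var, scores)
loss.backward()
```

Because the variance term is learned per example, the model can report low confidence on hard translations instead of forcing a single global noise level.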
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.