On Task Performance and Model Calibration with Supervised and
Self-Ensembled In-Context Learning
- URL: http://arxiv.org/abs/2312.13772v2
- Date: Fri, 22 Dec 2023 21:03:03 GMT
- Title: On Task Performance and Model Calibration with Supervised and
Self-Ensembled In-Context Learning
- Authors: Chengzu Li, Han Zhou, Goran Glavaš, Anna Korhonen, Ivan Vulić
- Abstract summary: In-context learning (ICL) has become an efficient approach propelled by the recent advancements in large language models (LLMs).
However, both paradigms are prone to the critical problem of overconfidence (i.e., miscalibration).
- Score: 71.44986275228747
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Following the standard supervised fine-tuning (SFT) paradigm, in-context
learning (ICL) has become an efficient approach propelled by the recent
advancements in large language models (LLMs), yielding promising performance
across various tasks in few-shot data setups. However, both paradigms are prone
to the critical problem of overconfidence (i.e., miscalibration),
especially in such limited data setups. In this work, we deliver an in-depth
analysis of the behavior across different choices of learning methods from the
perspective of both performance and calibration, as well as their interplay.
Through extensive controlled experiments, we find that simultaneous gains for
both task performance and calibration are difficult to achieve, and the problem
of miscalibration exists across all learning methods in low-resource scenarios.
To address this challenging trade-off between performance and calibration, we
then investigate the potential of self-ensembling techniques applied at
different modeling stages (e.g., variations of in-context examples or
variations in prompts, or different ensembling strategies). We demonstrate the
feasibility of self-ensembling for SFT in addition to ICL, showing that it yields
better-calibrated predictions with comparable or even better task performance. Our
work sheds light on which learning paradigm to choose and how to enhance both
task performance and calibration of LLMs.
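As a concrete illustration of the self-ensembling idea described in the abstract, the sketch below averages label probabilities over prompts whose in-context demonstrations are shuffled. This is a minimal sketch under stated assumptions, not the paper's exact pipeline: `class_probs` is a hypothetical scoring function standing in for an LLM call that returns a probability per candidate label.

```python
import random
from typing import Callable, Dict, List, Tuple

def self_ensemble_predict(
    examples: List[Tuple[str, str]],          # (input, label) demonstrations
    query: str,                               # test input
    labels: List[str],                        # candidate labels
    class_probs: Callable[[str], Dict[str, float]],
    n_variants: int = 5,
    seed: int = 0,
) -> Tuple[str, float]:
    """Average label probabilities over prompts with shuffled demonstrations."""
    rng = random.Random(seed)
    avg = {label: 0.0 for label in labels}
    for _ in range(n_variants):
        shuffled = examples[:]
        rng.shuffle(shuffled)                 # one source of prompt variation
        prompt = "\n".join(f"Input: {x}\nLabel: {y}" for x, y in shuffled)
        prompt += f"\nInput: {query}\nLabel:"
        probs = class_probs(prompt)
        for label in labels:
            avg[label] += probs[label] / n_variants
    pred = max(avg, key=avg.get)
    return pred, avg[pred]                    # averaged probability as confidence
```

Averaging over variants flattens spuriously confident single-prompt outputs, which is where the calibration gain is expected to come from; the same wrapper applies whether the underlying model is used with ICL or after SFT.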
Related papers
- Unlearning as multi-task optimization: A normalized gradient difference approach with an adaptive learning rate [105.86576388991713]
We introduce a normalized gradient difference (NGDiff) algorithm, enabling us to have better control over the trade-off between the objectives.
We provide a theoretical analysis and empirically demonstrate the superior performance of NGDiff among state-of-the-art unlearning methods on the TOFU and MUSE datasets.
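The paper defines the exact NGDiff update; the snippet below is only a rough guess at the general shape of a normalized-gradient-difference direction for an unlearning setup with a retain loss and a forget loss (both the loss names and the combination rule are assumptions here, not taken from the abstract).

```python
# Rough illustration of a normalized-gradient-difference style update for
# unlearning (a hedged sketch, not the paper's NGDiff algorithm). Normalizing
# both gradients keeps either objective from dominating the trade-off.
import torch

def ngdiff_direction(g_retain: torch.Tensor, g_forget: torch.Tensor,
                     eps: float = 1e-12) -> torch.Tensor:
    """Descend on the retain loss while ascending on the forget loss."""
    return g_retain / (g_retain.norm() + eps) - g_forget / (g_forget.norm() + eps)

# Usage with a flat parameter vector theta and learning rate lr:
#   theta.data -= lr * ngdiff_direction(g_retain, g_forget)
```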
arXiv Detail & Related papers (2024-10-29T14:41:44Z)
- Enhancing Training Data Attribution for Large Language Models with Fitting Error Consideration [74.09687562334682]
We introduce a novel training data attribution method called Debias and Denoise Attribution (DDA).
Our method significantly outperforms existing approaches, achieving an average AUC of 91.64%.
DDA exhibits strong generality and scalability across various sources and different-scale models like LLaMA2, QWEN2, and Mistral.
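The abstract does not spell out how DDA debiases and denoises, so the sketch below is explicitly not DDA: it is a minimal gradient-similarity baseline from the training data attribution family that DDA improves on, scoring each training example by how well its loss gradient aligns with a test example's.

```python
# Minimal gradient-similarity attribution baseline (not the paper's DDA).
import torch

def attribution_scores(model, loss_fn, train_examples, test_example):
    """Score each (x, y) training pair by the dot product of its loss
    gradient with the test example's loss gradient."""
    def flat_grad(x, y):
        model.zero_grad()
        loss_fn(model(x), y).backward()
        return torch.cat([p.grad.detach().flatten()
                          for p in model.parameters() if p.grad is not None])

    g_test = flat_grad(*test_example)           # reference gradient
    return [torch.dot(flat_grad(x, y), g_test).item()
            for (x, y) in train_examples]
```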
arXiv Detail & Related papers (2024-10-02T07:14:26Z)
- Uncertainty Aware Learning for Language Model Alignment [97.36361196793929]
We propose uncertainty-aware learning (UAL) to improve model alignment across different task scenarios.
We implement UAL in a simple fashion: adaptively setting the label-smoothing value during training according to the uncertainty of individual samples.
Experiments on widely used benchmarks demonstrate that our UAL significantly and consistently outperforms standard supervised fine-tuning.
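A minimal sketch of the adaptive label smoothing described above: samples the model is more uncertain about receive softer targets. The linear mapping from uncertainty to smoothing strength (`max_eps`) is an assumption for illustration, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def ual_loss(logits: torch.Tensor, targets: torch.Tensor,
             uncertainty: torch.Tensor, max_eps: float = 0.2) -> torch.Tensor:
    """logits: (B, C); targets: (B,) int64 class ids; uncertainty: (B,) in [0, 1]."""
    n_classes = logits.size(-1)
    eps = max_eps * uncertainty                          # per-sample smoothing
    one_hot = F.one_hot(targets, n_classes).float()
    soft = (1.0 - eps).unsqueeze(1) * one_hot + (eps / n_classes).unsqueeze(1)
    return -(soft * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()
```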
arXiv Detail & Related papers (2024-06-07T11:37:45Z)
- Calibrating Large Language Models with Sample Consistency [76.23956851098598]
We explore the potential of deriving confidence from the distribution of multiple randomly sampled model generations, via three measures of consistency.
Results show that consistency-based calibration methods outperform existing post-hoc approaches.
We offer practical guidance on choosing suitable consistency metrics for calibration, tailored to the characteristics of various LMs.
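The simplest measure in this consistency family is plain agreement: sample several generations and use the majority answer's frequency as the confidence score. A minimal sketch, where the sampling function is a stand-in for an LLM call at non-zero temperature:

```python
from collections import Counter
from typing import Callable, List, Tuple

def consistency_confidence(sample_answer: Callable[[], str],
                           k: int = 10) -> Tuple[str, float]:
    """Majority answer over k samples, with its agreement rate as confidence."""
    answers: List[str] = [sample_answer() for _ in range(k)]
    answer, count = Counter(answers).most_common(1)[0]
    return answer, count / k
```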
arXiv Detail & Related papers (2024-02-21T16:15:20Z)
- A Study on the Calibration of In-context Learning [27.533223818505682]
We study in-context learning (ICL), a prevalent method for adapting static language models through tailored prompts.
We observe that, with an increasing number of ICL examples, models initially exhibit increased miscalibration before achieving better calibration.
We explore recalibration techniques and find that a scaling-binning calibrator can reduce calibration errors consistently.
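The scaling-binning calibrator referenced above is the recalibrator of Kumar et al. (2019): fit a parametric scaling of the raw confidences, then replace each scaled value with the average of its equal-mass bin. A minimal sketch, assuming a held-out calibration set with 0/1 correctness labels:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_scaling_binning(conf: np.ndarray, correct: np.ndarray, n_bins: int = 10):
    # Scaling step: logistic map from raw confidence to correctness probability.
    scaler = LogisticRegression().fit(conf.reshape(-1, 1), correct)
    scaled = scaler.predict_proba(conf.reshape(-1, 1))[:, 1]
    # Binning step: equal-mass bins; each bin outputs its mean scaled value.
    edges = np.quantile(scaled, np.linspace(0, 1, n_bins + 1)[1:-1])
    bin_ids = np.digitize(scaled, edges)
    bin_means = np.array([scaled[bin_ids == b].mean() if (bin_ids == b).any()
                          else 0.0 for b in range(n_bins)])

    def calibrate(c: np.ndarray) -> np.ndarray:
        s = scaler.predict_proba(c.reshape(-1, 1))[:, 1]
        return bin_means[np.digitize(s, edges)]

    return calibrate
```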
arXiv Detail & Related papers (2023-12-07T03:37:39Z)
- Latent Alignment with Deep Set EEG Decoders [44.128689862889715]
We introduce the Latent Alignment method that won the Benchmarks for EEG Transfer Learning competition.
We present its formulation as a deep set applied to the set of trials from a given subject.
Our experimental results show that performing statistical distribution alignment at later stages in a deep learning model is beneficial to the classification accuracy.
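As a hedged illustration of "statistical distribution alignment at later stages" (the simplest version of the idea, not the competition-winning model itself), the sketch below z-scores a deep layer's features over the set of trials from one subject before the classifier head sees them:

```python
import torch

def align_latents(z: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """z: (n_trials, d) latent features for one subject's trials.
    Standardizing per subject aligns first- and second-moment statistics
    across subjects at this late stage of the network."""
    return (z - z.mean(dim=0, keepdim=True)) / (z.std(dim=0, keepdim=True) + eps)
```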
arXiv Detail & Related papers (2023-11-29T12:40:45Z)
- On the Calibration of Large Language Models and Alignment [63.605099174744865]
Confidence calibration serves as a crucial tool for gauging the reliability of deep models.
We conduct a systematic examination of the calibration of aligned language models throughout the entire construction process.
Our work sheds light on whether popular LLMs are well-calibrated and how the training process influences model calibration.
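Calibration statements like this are usually quantified with the expected calibration error (ECE), which bins predictions by confidence and averages the confidence-accuracy gap weighted by bin mass. A standard implementation:

```python
import numpy as np

def expected_calibration_error(conf: np.ndarray, correct: np.ndarray,
                               n_bins: int = 10) -> float:
    """conf: predicted confidences in [0, 1]; correct: 0/1 correctness."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            ece += mask.mean() * abs(conf[mask].mean() - correct[mask].mean())
    return float(ece)
```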
arXiv Detail & Related papers (2023-11-22T08:57:55Z)
- Mitigating Gradient Bias in Multi-objective Learning: A Provably Convergent Stochastic Approach [38.76462300149459]
We develop a Multi-objective Correction (MoCo) method for multi-objective gradient optimization.
The unique feature of our method is that it can guarantee convergence without increasing the batch size, even in the non-convex setting.
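The abstract above gives only the flavor of the method; as a rough, hedged sketch (assumptions throughout, not MoCo's published update), it combines two standard ingredients: a tracking estimate of each objective's gradient to damp stochastic bias, and an MGDA-style min-norm combination, which has a closed form for two objectives.

```python
import torch

def moco_style_step(y1, y2, g1, g2, beta: float = 0.5):
    """y1, y2: tracked gradient estimates; g1, g2: fresh stochastic gradients.
    Returns a common descent direction and the updated tracking variables."""
    y1 = (1 - beta) * y1 + beta * g1            # gradient-tracking correction
    y2 = (1 - beta) * y2 + beta * g2
    diff = y1 - y2
    # Min-norm point of the segment {gamma*y1 + (1-gamma)*y2 : gamma in [0,1]}.
    gamma = torch.clamp(torch.dot(diff, -y2) / (diff.norm() ** 2 + 1e-12), 0.0, 1.0)
    return gamma * y1 + (1 - gamma) * y2, y1, y2
```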
arXiv Detail & Related papers (2022-10-23T05:54:26Z)