Surprise Calibration for Better In-Context Learning
- URL: http://arxiv.org/abs/2506.12796v2
- Date: Tue, 17 Jun 2025 07:46:17 GMT
- Title: Surprise Calibration for Better In-Context Learning
- Authors: Zhihang Tan, Jingrui Hou, Ping Wang, Qibiao Hu, Peng Zhu
- Abstract summary: In-context learning (ICL) has emerged as a powerful paradigm for task adaptation in large language models. Existing bias calibration methods apply fixed class priors across all inputs, limiting their efficacy in dynamic ICL settings. We introduce a novel method, Surprise Calibration (SC), which captures the temporal dynamics of class priors.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In-context learning (ICL) has emerged as a powerful paradigm for task adaptation in large language models (LLMs), where models infer underlying task structures from a few demonstrations. However, ICL remains susceptible to biases that arise from prior knowledge and contextual demonstrations, which can degrade the performance of LLMs. Existing bias calibration methods typically apply fixed class priors across all inputs, limiting their efficacy in dynamic ICL settings where the context for each query differs. To address these limitations, we adopt implicit sequential Bayesian inference as a framework for interpreting ICL, identify "surprise" as an informative signal for class prior shift, and introduce a novel method--Surprise Calibration (SC). SC leverages the notion of surprise to capture the temporal dynamics of class priors, providing a more adaptive and computationally efficient solution for in-context learning. We empirically demonstrate the superiority of SC over existing bias calibration techniques across a range of benchmark natural language processing tasks.
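The abstract frames ICL as implicit sequential Bayesian inference, with "surprise" signalling a shift in class priors. A minimal, hypothetical sketch of that idea follows; the paper's exact update rule is not given here, so `surprise_update`, the learning rate `lr`, and the capped interpolation step are all illustrative assumptions:

```python
import math

def surprise_update(prior, observed_label, predicted_probs, lr=0.1):
    """Hypothetical sketch: nudge class priors toward an observed label,
    scaled by how 'surprising' that label was under the current predictive
    distribution (surprise = -log p(label))."""
    surprise = -math.log(max(predicted_probs[observed_label], 1e-12))
    step = min(lr * surprise, 1.0)  # cap the interpolation weight at 1
    new_prior = [(1 - step) * p for p in prior]
    new_prior[observed_label] += step
    total = sum(new_prior)
    return [p / total for p in new_prior]

# Running over in-context demonstrations, the prior drifts toward the
# labels that the model found most surprising:
prior = [0.5, 0.5]
for label, probs in [(1, [0.8, 0.2]), (1, [0.6, 0.4]), (0, [0.3, 0.7])]:
    prior = surprise_update(prior, label, probs)
```

The adaptive, per-query behaviour comes from the fact that each demonstration updates the prior in sequence, rather than a single fixed prior being estimated once.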
Related papers
- Corrective In-Context Learning: Evaluating Self-Correction in Large Language Models [0.0]
In-context learning (ICL) has transformed the use of large language models (LLMs) for NLP tasks. Despite its effectiveness, ICL is prone to errors, especially for challenging examples. We propose corrective in-context learning (CICL), an approach that incorporates a model's incorrect predictions alongside ground truth corrections into the prompt.
arXiv Detail & Related papers (2025-03-20T10:39:39Z) - Unlocking In-Context Learning for Natural Datasets Beyond Language Modelling [37.36879079951306]
Large Language Models (LLMs) exhibit In-Context Learning (ICL). ICL offers fast adaptation across natural language tasks and domains, but its emergence is less straightforward for modalities beyond text. We identify exact token repetitions in the training data sequences as an important factor for ICL. We unlock ICL capabilities for various visual datasets and a more challenging EEG classification task in a few-shot learning regime.
arXiv Detail & Related papers (2025-01-09T09:45:05Z) - Competition Dynamics Shape Algorithmic Phases of In-Context Learning [10.974593590868533]
In-Context Learning (ICL) has significantly expanded the general-purpose nature of large language models. We propose a synthetic sequence modeling task that involves learning to simulate a finite mixture of Markov chains. We show we can explain a model's behavior by decomposing it into four broad algorithms that combine a fuzzy retrieval vs. inference approach with either unigram or bigram statistics.
arXiv Detail & Related papers (2024-12-01T23:35:53Z) - Disentangling Latent Shifts of In-Context Learning Through Self-Training [0.0]
We introduce STICL (Self-Training ICL), an approach that disentangles the latent shifts of demonstrations from the latent shift of the query through self-training.
STICL employs a teacher model to generate pseudo-labels and trains a student model using these labels, encoded in an adapter module.
Our empirical results show that STICL improves generalization and stability, consistently outperforming traditional ICL methods and other disentangling strategies.
arXiv Detail & Related papers (2024-10-02T13:00:21Z) - Batch Calibration: Rethinking Calibration for In-Context Learning and Prompt Engineering [12.348320788446841]
Batch Calibration (BC) is a simple yet intuitive method that controls the contextual bias from the batched input. BC is zero-shot, inference-only, and incurs negligible additional costs. We demonstrate state-of-the-art performance over previous calibration baselines across more than 10 natural language understanding and image classification tasks.
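The summary above describes correcting contextual bias from a batch of inputs. One common way to realise that idea is to subtract the per-class mean score over the batch before taking the argmax; the sketch below assumes this mean-subtraction form, which may differ in detail from the paper's actual estimator:

```python
def batch_calibrate(scores):
    """Sketch of batch-level calibration: estimate the contextual bias as
    the per-class mean score over the batch, then subtract it from each
    example's scores before classification."""
    n = len(scores)
    num_classes = len(scores[0])
    bias = [sum(row[c] for row in scores) / n for c in range(num_classes)]
    return [[row[c] - bias[c] for c in range(num_classes)] for row in scores]

# Raw log-probabilities skewed toward class 0 by the shared prompt context;
# after calibration, the third example flips to class 1.
scores = [[-0.2, -1.8], [-0.4, -1.5], [-0.9, -1.0]]
calibrated = batch_calibrate(scores)
preds = [max(range(2), key=row.__getitem__) for row in calibrated]
```

This matches the "zero-shot, inference-only" claim: no labels or parameter updates are needed, only the batch's own scores.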
arXiv Detail & Related papers (2023-09-29T13:55:45Z) - What and How does In-Context Learning Learn? Bayesian Model Averaging,
Parameterization, and Generalization [111.55277952086155]
We study In-Context Learning (ICL) by addressing several open questions.
We show that, without updating the neural network parameters, ICL implicitly implements the Bayesian model averaging algorithm.
We prove that the error of pretrained model is bounded by a sum of an approximation error and a generalization error.
arXiv Detail & Related papers (2023-05-30T21:23:47Z) - Iterative Forward Tuning Boosts In-Context Learning in Language Models [88.25013390669845]
In this study, we introduce a novel two-stage framework to boost in-context learning in large language models (LLMs).
Specifically, our framework delineates the ICL process into two distinct stages: Deep-Thinking and test stages.
The Deep-Thinking stage incorporates a unique attention mechanism, i.e., iterative enhanced attention, which enables multiple rounds of information accumulation.
arXiv Detail & Related papers (2023-05-22T13:18:17Z) - Explaining Emergent In-Context Learning as Kernel Regression [61.57151500616111]
Large language models (LLMs) have initiated a paradigm shift in transfer learning.
In this paper, we investigate the reason why a transformer-based language model can accomplish in-context learning after pre-training.
We find that during ICL, the attention and hidden features in LLMs match the behaviors of a kernel regression.
arXiv Detail & Related papers (2023-05-22T06:45:02Z) - CLIPood: Generalizing CLIP to Out-of-Distributions [73.86353105017076]
Contrastive language-image pre-training (CLIP) models have shown impressive zero-shot ability, but further adaptation of CLIP to downstream tasks undesirably degrades OOD performance.
We propose CLIPood, a fine-tuning method that can adapt CLIP models to OOD situations where both domain shifts and open classes may occur on unseen test data.
Experiments on diverse datasets with different OOD scenarios show that CLIPood consistently outperforms existing generalization techniques.
arXiv Detail & Related papers (2023-02-02T04:27:54Z) - Using Representation Expressiveness and Learnability to Evaluate
Self-Supervised Learning Methods [61.49061000562676]
We introduce Cluster Learnability (CL) to assess learnability.
CL is measured in terms of the performance of a KNN trained to predict labels obtained by clustering the representations with K-means.
We find that CL better correlates with in-distribution model performance than other competing recent evaluation schemes.
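The Cluster Learnability metric described above (a KNN predicting K-means cluster labels) can be sketched end to end; the minimal K-means, leave-one-out 1-NN, and toy data below are illustrative assumptions, not the paper's exact protocol:

```python
import random

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20, seed=0):
    """Minimal K-means over the representations (illustrative sketch)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = tuple(sum(x) / len(members)
                                   for x in zip(*members))
    return labels

def knn_accuracy(points, labels):
    """Leave-one-out 1-NN accuracy at predicting the cluster labels: a
    proxy for how learnable the representation's structure is."""
    correct = 0
    for i, p in enumerate(points):
        j = min((j for j in range(len(points)) if j != i),
                key=lambda j: dist2(p, points[j]))
        correct += labels[i] == labels[j]
    return correct / len(points)

# Two well-separated blobs: cluster labels are easy for a 1-NN to recover,
# so the representation scores high on learnability.
pts = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
       (5.0, 5.1), (5.2, 5.0), (5.1, 5.2)]
cl_score = knn_accuracy(pts, kmeans(pts, k=2))
```

The intuition is that representations whose K-means structure a simple KNN can reproduce are easier to learn from, which is what the metric rewards.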
arXiv Detail & Related papers (2022-06-02T19:05:13Z) - MCDAL: Maximum Classifier Discrepancy for Active Learning [74.73133545019877]
Recent state-of-the-art active learning methods have mostly leveraged Generative Adversarial Networks (GAN) for sample acquisition.
We propose in this paper a novel active learning framework that we call Maximum Classifier Discrepancy for Active Learning (MCDAL).
In particular, we utilize two auxiliary classification layers that learn tighter decision boundaries by maximizing the discrepancies among them.
arXiv Detail & Related papers (2021-07-23T06:57:08Z)
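The MCDAL summary above describes acquiring samples on which auxiliary classifier heads disagree. A hedged sketch of that acquisition step follows; the L1 discrepancy, the toy logits, and `select_for_labeling` are illustrative assumptions rather than the paper's exact training objective:

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def discrepancy(p, q):
    """L1 distance between the two heads' class distributions."""
    return sum(abs(a - b) for a, b in zip(p, q))

def select_for_labeling(pool_logits, budget=1):
    """Sketch of discrepancy-based acquisition: score each unlabeled
    example by the disagreement between two auxiliary classifier heads
    and pick the most-disputed ones for annotation."""
    scores = [discrepancy(softmax(h1), softmax(h2)) for h1, h2 in pool_logits]
    ranked = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
    return ranked[:budget]

# (head1_logits, head2_logits) per unlabeled example; the second example
# is the one the heads dispute, so it is acquired first.
pool = [((2.0, -2.0), (1.8, -1.9)),   # heads agree: confident class 0
        ((1.5, -1.5), (-1.4, 1.6)),   # heads disagree strongly
        ((-2.0, 2.0), (-1.7, 1.8))]   # heads agree: confident class 1
picked = select_for_labeling(pool, budget=1)
```

Examples near the decision boundary produce large head disagreement, so labeling them tightens the boundary fastest, which is the rationale the abstract gives.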
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.