Batch Calibration: Rethinking Calibration for In-Context Learning and
Prompt Engineering
- URL: http://arxiv.org/abs/2309.17249v2
- Date: Wed, 24 Jan 2024 18:27:30 GMT
- Title: Batch Calibration: Rethinking Calibration for In-Context Learning and
Prompt Engineering
- Authors: Han Zhou, Xingchen Wan, Lev Proleev, Diana Mincu, Jilin Chen,
Katherine Heller, Subhrajit Roy
- Abstract summary: Batch Calibration (BC) is a simple yet intuitive method that controls the contextual bias from the batched input.
BC is zero-shot, inference-only, and incurs negligible additional costs.
We demonstrate state-of-the-art performance over previous calibration baselines across more than 10 natural language understanding and image classification tasks.
- Score: 12.967536233145614
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prompting and in-context learning (ICL) have become efficient learning
paradigms for large language models (LLMs). However, LLMs suffer from prompt
brittleness and various bias factors in the prompt, including but not limited
to the formatting, the choice of verbalizers, and the ICL examples. To address
this problem that results in unexpected performance degradation, calibration
methods have been developed to mitigate the effects of these biases while
recovering LLM performance. In this work, we first conduct a systematic
analysis of the existing calibration methods, where we both provide a unified
view and reveal the failure cases. Inspired by these analyses, we propose Batch
Calibration (BC), a simple yet intuitive method that controls the contextual
bias from the batched input, unifies various prior approaches, and effectively
addresses the aforementioned issues. BC is zero-shot, inference-only, and
incurs negligible additional costs. In the few-shot setup, we further extend BC
to allow it to learn the contextual bias from labeled data. We validate the
effectiveness of BC with PaLM 2-(S, M, L) and CLIP models and demonstrate
state-of-the-art performance over previous calibration baselines across more
than 10 natural language understanding and image classification tasks.
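To make the batched correction concrete, here is a minimal sketch of the kind of batch-level bias removal the abstract describes: estimate the contextual prior from the scores of a whole batch under the same prompt, then subtract it from every example. The averaging in log space, the array shapes, and the function name are illustrative assumptions, not the paper's exact estimator.

```python
import numpy as np

def batch_calibrate(log_probs: np.ndarray) -> np.ndarray:
    """Remove a batch-level estimate of contextual bias from class scores.

    log_probs: shape (batch_size, num_classes), the model's log-probabilities
    for each candidate label under the same prompt/context.
    (Illustrative sketch; the paper's exact estimator may differ.)
    """
    # Estimate the contextual prior as the mean score per class over the batch.
    contextual_bias = log_probs.mean(axis=0, keepdims=True)
    # Subtract the estimated bias so predictions come from the residual scores.
    return log_probs - contextual_bias

# Example: a prompt that systematically inflates the score of class 0.
scores = np.array([
    [-0.2, -1.9],
    [-0.1, -2.5],
    [-0.3, -0.4],
])
print(batch_calibrate(scores).argmax(axis=-1))
```

Because the correction is computed from the batch itself, it needs no labels and no extra forward passes, which matches the zero-shot, inference-only claim above.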
Related papers
- Task Calibration: Calibrating Large Language Models on Inference Tasks [23.257422868895855]
Large language models (LLMs) have exhibited impressive zero-shot performance on inference tasks.
LLMs may suffer from spurious correlations between input texts and output labels, which limits their ability to reason.
We propose task calibration (TC), a zero-shot and inference-only calibration method inspired by mutual information.
arXiv Detail & Related papers (2024-10-24T14:18:32Z)
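For reference, mutual information quantifies how much the model's output depends on the input beyond what the label prior alone explains; how TC turns this into a calibration rule is detailed in the paper itself:

$$ I(X; Y) = \sum_{x, y} p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)} $$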
- Context-aware Prompt Tuning: Advancing In-Context Learning with Adversarial Methods [69.36397993451742]
This work introduces Context-aware Prompt Tuning (CPT), a method inspired by ICL, PT, and adversarial attacks.
We modify specific context tokens, considering the unique structure of input and output formats.
Inspired by adversarial attacks, we adjust the input based on the labels present in the context, focusing on minimizing, rather than maximizing, the loss.
arXiv Detail & Related papers (2024-10-22T17:45:47Z)
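A minimal sketch of what "adjusting context tokens to minimize the loss on in-context labels" can look like, assuming a PyTorch-style model whose context embeddings are exposed as a trainable tensor; the model wrapper, shapes, and optimizer settings are illustrative assumptions, not CPT's actual procedure.

```python
import torch

def tune_context_embeddings(model, context_emb, context_labels, steps=50, lr=1e-2):
    """Gradient-descent refinement of context token embeddings (illustrative).

    context_emb: (num_context_tokens, hidden_dim) embeddings to be refined.
    context_labels: gold labels of the in-context examples; the loss is
    minimized on them, mirroring the summary above.
    """
    context_emb = context_emb.clone().requires_grad_(True)
    optimizer = torch.optim.SGD([context_emb], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        logits = model(context_emb)            # hypothetical forward pass
        loss = torch.nn.functional.cross_entropy(logits, context_labels)
        loss.backward()
        optimizer.step()
    return context_emb.detach()
```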
- Subtle Errors Matter: Preference Learning via Error-injected Self-editing [59.405145971637204]
We propose a novel preference learning framework called eRror-Injected Self-Editing (RISE).
RISE injects predefined subtle errors into partial tokens of correct solutions to construct hard pairs for error mitigation.
Experiments validate the effectiveness of RISE, with preference learning on Qwen2-7B-Instruct yielding notable improvements of 3.0% on GSM8K and 7.9% on MATH.
arXiv Detail & Related papers (2024-10-09T07:43:38Z)
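As a rough illustration of the pair construction described above, the snippet builds one preference pair from a correct solution and an error-injected copy of it; the helper name `inject_subtle_error` and the dictionary layout are assumptions, not RISE's actual pipeline.

```python
def build_preference_pair(prompt, correct_solution, inject_subtle_error):
    """Construct a (chosen, rejected) pair for preference learning.

    inject_subtle_error is a placeholder for whatever routine corrupts a few
    tokens of the correct solution (e.g., changing one number in one step).
    """
    rejected = inject_subtle_error(correct_solution)
    return {"prompt": prompt, "chosen": correct_solution, "rejected": rejected}

# Hypothetical usage with a trivial corruption, for demonstration only.
pair = build_preference_pair(
    "What is 12 * 3 + 4?",
    "12 * 3 = 36; 36 + 4 = 40. The answer is 40.",
    lambda s: s.replace("40", "41"),
)
print(pair["rejected"])
```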
- Mitigating Boundary Ambiguity and Inherent Bias for Text Classification in the Era of Large Language Models [24.085614720512744]
This study shows that large language models (LLMs) are vulnerable to changes in the number and arrangement of options in text classification.
The key bottleneck arises from ambiguous decision boundaries and inherent biases towards specific tokens and positions.
Our approach is grounded in the empirical observation that pairwise comparisons can effectively alleviate boundary ambiguity and inherent bias.
arXiv Detail & Related papers (2024-06-11T06:53:19Z)
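The sketch below shows one way pairwise comparison can replace a single multi-way query: every pair of candidate labels is compared and the label with the most wins is returned. The `compare` callback, which would ask the LLM which of two labels fits the input better, is a hypothetical interface rather than the paper's API.

```python
from itertools import combinations

def pairwise_classify(text, labels, compare):
    """Pick a label via round-robin pairwise comparisons.

    compare(text, label_a, label_b) -> winning label; in practice this would
    be a prompt asking the model which of the two labels fits the text better.
    """
    wins = {label: 0 for label in labels}
    for a, b in combinations(labels, 2):
        wins[compare(text, a, b)] += 1
    # The label that wins the most head-to-head comparisons is the prediction.
    return max(wins, key=wins.get)
```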
- Debiasing Multimodal Large Language Models [61.6896704217147]
Large Vision-Language Models (LVLMs) have become indispensable tools in computer vision and natural language processing.
Our investigation reveals a noteworthy bias in the generated content, where the output is primarily influenced by the prior of the underlying Large Language Model (LLM) rather than the input image.
To rectify these biases and redirect the model's focus toward vision information, we introduce two simple, training-free strategies.
arXiv Detail & Related papers (2024-03-08T12:35:07Z)
- Causal Prompting: Debiasing Large Language Model Prompting based on Front-Door Adjustment [32.12998469814097]
A novel causal prompting method based on front-door adjustment is proposed to effectively mitigate Large Language Models (LLMs) biases.
Experimental results show that the proposed causal prompting approach achieves excellent performance across seven natural language processing datasets.
arXiv Detail & Related papers (2024-03-05T07:47:34Z)
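For reference, front-door adjustment estimates the causal effect of the prompt X on the answer Y through a mediator M (in prompting setups the mediator is typically the model's intermediate reasoning; that identification is the usual reading, not a detail stated in this summary):

$$ P(Y \mid \mathrm{do}(X = x)) = \sum_{m} P(M = m \mid X = x) \sum_{x'} P(Y \mid M = m, X = x')\, P(X = x') $$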
- Prompt-Based Bias Calibration for Better Zero/Few-Shot Learning of Language Models [7.089534153472173]
We propose a null-input prompting method to calibrate intrinsic bias encoded in pre-trained language models.
Our method significantly improves zero/few-shot learning performance of LMs for both in-context learning and prompt-based fine-tuning.
arXiv Detail & Related papers (2024-02-15T22:54:24Z)
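The general recipe behind such null-input calibration is to query the model once with a content-free input, treat the resulting label distribution as the intrinsic bias, and divide it out of every real prediction. The snippet is a generic sketch of that recipe; the choice of null string and the renormalization details are assumptions, not this paper's exact method.

```python
import numpy as np

def calibrate_with_null_input(test_probs, null_probs, eps=1e-12):
    """Divide out the label bias estimated from a content-free (null) input.

    test_probs: (num_classes,) label probabilities for a real input.
    null_probs: (num_classes,) label probabilities when the input slot holds a
    content-free string such as "N/A" (the exact null input is an assumption).
    """
    corrected = test_probs / (null_probs + eps)
    return corrected / corrected.sum()

# Example: the raw prediction leans towards class 0, but so does the null
# input, so the calibrated prediction flips to class 1.
print(calibrate_with_null_input(np.array([0.55, 0.45]), np.array([0.7, 0.3])))
```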
- Open-Vocabulary Calibration for Fine-tuned CLIP [44.82453633696438]
The confidence calibration problem in fine-tuned vision-language models (VLMs) could greatly reduce reliability when deploying such models in the real world.
This paper bridges the gap by systematically investigating the confidence calibration problem in the context of prompt learning.
We present a simple and effective approach called Distance-Aware Calibration (DAC), which scales the temperature using the distance between predicted text labels and base classes as guidance.
arXiv Detail & Related papers (2024-02-07T08:42:48Z)
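A loose sketch of distance-guided temperature scaling: the further a predicted label sits from the base classes in text-embedding space, the higher the temperature used to soften its logits. The linear distance-to-temperature mapping and all names here are placeholder assumptions, not DAC's actual formulation.

```python
import numpy as np

def distance_aware_softmax(logits, label_emb, base_class_embs, base_temp=1.0, alpha=1.0):
    """Scale the softmax temperature by how far a label is from the base classes.

    label_emb: text embedding of the predicted (possibly novel) class name.
    base_class_embs: (num_base_classes, dim) embeddings of the base-class names.
    The distance-to-temperature mapping below is a placeholder assumption.
    """
    # Distance to the closest base class; novel classes tend to sit further away.
    dist = np.linalg.norm(base_class_embs - label_emb, axis=-1).min()
    temperature = base_temp * (1.0 + alpha * dist)
    scaled = logits / temperature
    scaled -= scaled.max()                    # numerical stability
    probs = np.exp(scaled)
    return probs / probs.sum()
```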
- On Task Performance and Model Calibration with Supervised and Self-Ensembled In-Context Learning [71.44986275228747]
In-context learning (ICL) has become an efficient approach propelled by the recent advancements in large language models (LLMs).
However, both paradigms are prone to suffer from the critical problem of overconfidence (i.e., miscalibration).
arXiv Detail & Related papers (2023-12-21T11:55:10Z)
- $k$NN Prompting: Beyond-Context Learning with Calibration-Free Nearest Neighbor Inference [75.08572535009276]
In-Context Learning (ICL) formulates target tasks as prompt completion conditioned on in-context demonstrations.
$k$NN Prompting first queries LLM with training data for distributed representations, then predicts test instances by simply referring to nearest neighbors.
It significantly outperforms state-of-the-art calibration-based methods in comparable few-shot scenarios.
arXiv Detail & Related papers (2023-03-24T06:16:29Z)
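A minimal sketch of the nearest-neighbor step, assuming each training instance has already been turned into the LLM's output distribution over label words (the "distributed representations" above): the test distribution is compared against the stored anchors and the neighbors' labels vote. KL divergence and majority voting are illustrative choices, not necessarily the paper's exact ones.

```python
import numpy as np

def knn_prompt_predict(test_dist, anchor_dists, anchor_labels, k=3, eps=1e-12):
    """Classify a test instance by its nearest anchor distributions.

    test_dist: (vocab,) LLM output distribution for the test instance.
    anchor_dists: (num_train, vocab) distributions recorded for training instances.
    anchor_labels: (num_train,) integer labels of those training instances.
    """
    # KL divergence from the test distribution to each anchor distribution.
    kl = np.sum(test_dist * (np.log(test_dist + eps) - np.log(anchor_dists + eps)), axis=-1)
    nearest = np.argsort(kl)[:k]
    # Majority vote among the k nearest training instances.
    return np.bincount(anchor_labels[nearest]).argmax()
```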
- Boosting Weakly Supervised Object Detection via Learning Bounding Box Adjusters [76.36104006511684]
Weakly-supervised object detection (WSOD) has emerged as an inspiring recent topic to avoid expensive instance-level object annotations.
We defend the problem setting for improving localization performance by leveraging the bounding box regression knowledge from a well-annotated auxiliary dataset.
Our method performs favorably against state-of-the-art WSOD methods and a knowledge transfer model with a similar problem setting.
arXiv Detail & Related papers (2021-08-03T13:38:20Z)