Batch Calibration: Rethinking Calibration for In-Context Learning and
Prompt Engineering
- URL: http://arxiv.org/abs/2309.17249v2
- Date: Wed, 24 Jan 2024 18:27:30 GMT
- Title: Batch Calibration: Rethinking Calibration for In-Context Learning and
Prompt Engineering
- Authors: Han Zhou, Xingchen Wan, Lev Proleev, Diana Mincu, Jilin Chen,
Katherine Heller, Subhrajit Roy
- Abstract summary: Batch Calibration (BC) is a simple yet intuitive method that controls the contextual bias from the batched input.
BC is zero-shot, inference-only, and incurs negligible additional costs.
We demonstrate state-of-the-art performance over previous calibration baselines across more than 10 natural language understanding and image classification tasks.
- Score: 12.967536233145614
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Prompting and in-context learning (ICL) have become efficient learning
paradigms for large language models (LLMs). However, LLMs suffer from prompt
brittleness and various bias factors in the prompt, including but not limited
to the formatting, the choice of verbalizers, and the ICL examples. To address
this problem that results in unexpected performance degradation, calibration
methods have been developed to mitigate the effects of these biases while
recovering LLM performance. In this work, we first conduct a systematic
analysis of the existing calibration methods, where we both provide a unified
view and reveal the failure cases. Inspired by these analyses, we propose Batch
Calibration (BC), a simple yet intuitive method that controls the contextual
bias from the batched input, unifies various prior approaches, and effectively
addresses the aforementioned issues. BC is zero-shot, inference-only, and
incurs negligible additional costs. In the few-shot setup, we further extend BC
to allow it to learn the contextual bias from labeled data. We validate the
effectiveness of BC with PaLM 2-(S, M, L) and CLIP models and demonstrate
state-of-the-art performance over previous calibration baselines across more
than 10 natural language understanding and image classification tasks.
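To make the batched correction concrete, here is a minimal sketch of the kind of batch-level bias removal the abstract describes: estimate the contextual prior from the scores of a whole batch under the same prompt, then subtract it from every example. The averaging in log space, the array shapes, and the function name are illustrative assumptions, not the paper's exact estimator.

```python
import numpy as np

def batch_calibrate(log_probs: np.ndarray) -> np.ndarray:
    """Remove a batch-level estimate of contextual bias from class scores.

    log_probs: shape (batch_size, num_classes), the model's log-probabilities
    for each candidate label under the same prompt/context.
    (Illustrative sketch; the paper's exact estimator may differ.)
    """
    # Estimate the contextual prior as the mean score per class over the batch.
    contextual_bias = log_probs.mean(axis=0, keepdims=True)
    # Subtract the estimated bias so predictions come from the residual scores.
    return log_probs - contextual_bias

# Example: a prompt that systematically inflates the score of class 0.
scores = np.array([
    [-0.2, -1.9],
    [-0.1, -2.5],
    [-0.3, -0.4],
])
print(batch_calibrate(scores).argmax(axis=-1))
```

Because the correction is computed from the batch itself, it needs no labels and no extra forward passes, which matches the zero-shot, inference-only claim above.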
Related papers
- Task Calibration: Calibrating Large Language Models on Inference Tasks [23.257422868895855]
Large language models (LLMs) have exhibited impressive zero-shot performance on inference tasks.
LLMs may suffer from spurious correlations between input texts and output labels, which limits their ability to reason.
We propose task calibration (TC), a zero-shot and inference-only calibration method inspired by mutual information.
arXiv Detail & Related papers (2024-10-24T14:18:32Z)
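For reference, mutual information quantifies how much the model's output depends on the input beyond what the label prior alone explains; how TC turns this into a calibration rule is detailed in the paper itself:

$$ I(X; Y) = \sum_{x, y} p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)} $$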
- Context-aware Prompt Tuning: Advancing In-Context Learning with Adversarial Methods [69.36397993451742]
This work introduces Context-aware Prompt Tuning (CPT), a method inspired by ICL, PT, and adversarial attacks.
We modify specific context tokens, considering the unique structure of input and output formats.
Inspired by adversarial attacks, we adjust the input based on the labels present in the context, focusing on minimizing, rather than maximizing, the loss.
arXiv Detail & Related papers (2024-10-22T17:45:47Z)
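A minimal sketch of what "adjusting context tokens to minimize the loss on in-context labels" can look like, assuming a PyTorch-style model whose context embeddings are exposed as a trainable tensor; the model wrapper, shapes, and optimizer settings are illustrative assumptions, not CPT's actual procedure.

```python
import torch

def tune_context_embeddings(model, context_emb, context_labels, steps=50, lr=1e-2):
    """Gradient-descent refinement of context token embeddings (illustrative).

    context_emb: (num_context_tokens, hidden_dim) embeddings to be refined.
    context_labels: gold labels of the in-context examples; the loss is
    minimized on them, mirroring the summary above.
    """
    context_emb = context_emb.clone().requires_grad_(True)
    optimizer = torch.optim.SGD([context_emb], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        logits = model(context_emb)            # hypothetical forward pass
        loss = torch.nn.functional.cross_entropy(logits, context_labels)
        loss.backward()
        optimizer.step()
    return context_emb.detach()
```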
- Subtle Errors Matter: Preference Learning via Error-injected Self-editing [59.405145971637204]
We propose a novel preference learning framework called eRror-Injected Self-Editing (RISE).
RISE injects predefined subtle errors into partial tokens of correct solutions to construct hard pairs for error mitigation.
Experiments validate the effectiveness of RISE, with preference learning on Qwen2-7B-Instruct yielding notable improvements of 3.0% on GSM8K and 7.9% on MATH.
arXiv Detail & Related papers (2024-10-09T07:43:38Z)
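As a rough illustration of the pair construction described above, the snippet builds one preference pair from a correct solution and an error-injected copy of it; the helper name `inject_subtle_error` and the dictionary layout are assumptions, not RISE's actual pipeline.

```python
def build_preference_pair(prompt, correct_solution, inject_subtle_error):
    """Construct a (chosen, rejected) pair for preference learning.

    inject_subtle_error is a placeholder for whatever routine corrupts a few
    tokens of the correct solution (e.g., changing one number in one step).
    """
    rejected = inject_subtle_error(correct_solution)
    return {"prompt": prompt, "chosen": correct_solution, "rejected": rejected}

# Hypothetical usage with a trivial corruption, for demonstration only.
pair = build_preference_pair(
    "What is 12 * 3 + 4?",
    "12 * 3 = 36; 36 + 4 = 40. The answer is 40.",
    lambda s: s.replace("40", "41"),
)
print(pair["rejected"])
```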
- Mitigating Boundary Ambiguity and Inherent Bias for Text Classification in the Era of Large Language Models [24.085614720512744]
This study shows that large language models (LLMs) are vulnerable to changes in the number and arrangement of options in text classification.
The key bottleneck arises from ambiguous decision boundaries and inherent biases towards specific tokens and positions.
Our approach is grounded in the empirical observation that pairwise comparisons can effectively alleviate boundary ambiguity and inherent bias.
arXiv Detail & Related papers (2024-06-11T06:53:19Z)
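The sketch below shows one way pairwise comparison can replace a single multi-way query: every pair of candidate labels is compared and the label with the most wins is returned. The `compare` callback, which would ask the LLM which of two labels fits the input better, is a hypothetical interface rather than the paper's API.

```python
from itertools import combinations

def pairwise_classify(text, labels, compare):
    """Pick a label via round-robin pairwise comparisons.

    compare(text, label_a, label_b) -> winning label; in practice this would
    be a prompt asking the model which of the two labels fits the text better.
    """
    wins = {label: 0 for label in labels}
    for a, b in combinations(labels, 2):
        wins[compare(text, a, b)] += 1
    # The label that wins the most head-to-head comparisons is the prediction.
    return max(wins, key=wins.get)
```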
- Debiasing Multimodal Large Language Models [61.6896704217147]
Large Vision-Language Models (LVLMs) have become indispensable tools in computer vision and natural language processing.
Our investigation reveals a noteworthy bias in the generated content, where the output is primarily influenced by the prior of the underlying Large Language Model (LLM) rather than the input image.
To rectify these biases and redirect the model's focus toward vision information, we introduce two simple, training-free strategies.
arXiv Detail & Related papers (2024-03-08T12:35:07Z)
- Causal Prompting: Debiasing Large Language Model Prompting based on Front-Door Adjustment [32.12998469814097]
A novel causal prompting method based on front-door adjustment is proposed to effectively mitigate Large Language Models (LLMs) biases.
Experimental results show that the proposed causal prompting approach achieves excellent performance across seven natural language processing datasets.
arXiv Detail & Related papers (2024-03-05T07:47:34Z)
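For reference, front-door adjustment estimates the causal effect of the prompt X on the answer Y through a mediator M (in prompting setups the mediator is typically the model's intermediate reasoning; that identification is the usual reading, not a detail stated in this summary):

$$ P(Y \mid \mathrm{do}(X = x)) = \sum_{m} P(M = m \mid X = x) \sum_{x'} P(Y \mid M = m, X = x')\, P(X = x') $$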
- Prompt-Based Bias Calibration for Better Zero/Few-Shot Learning of Language Models [7.089534153472173]
We propose a null-input prompting method to calibrate intrinsic bias encoded in pre-trained language models.
Our method significantly improves zero/few-shot learning performance of LMs for both in-context learning and prompt-based fine-tuning.
arXiv Detail & Related papers (2024-02-15T22:54:24Z)
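The general recipe behind such null-input calibration is to query the model once with a content-free input, treat the resulting label distribution as the intrinsic bias, and divide it out of every real prediction. The snippet is a generic sketch of that recipe; the choice of null string and the renormalization details are assumptions, not this paper's exact method.

```python
import numpy as np

def calibrate_with_null_input(test_probs, null_probs, eps=1e-12):
    """Divide out the label bias estimated from a content-free (null) input.

    test_probs: (num_classes,) label probabilities for a real input.
    null_probs: (num_classes,) label probabilities when the input slot holds a
    content-free string such as "N/A" (the exact null input is an assumption).
    """
    corrected = test_probs / (null_probs + eps)
    return corrected / corrected.sum()

# Example: the raw prediction leans towards class 0, but so does the null
# input, so the calibrated prediction flips to class 1.
print(calibrate_with_null_input(np.array([0.55, 0.45]), np.array([0.7, 0.3])))
```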
- Open-Vocabulary Calibration for Fine-tuned CLIP [44.82453633696438]
The confidence calibration problem in fine-tuned vision-language models (VLMs) could greatly reduce reliability when deploying such models in the real world.
This paper bridges the gap by systematically investigating the confidence calibration problem in the context of prompt learning.
We present a simple and effective approach called Distance-Aware Calibration (DAC), which scales the temperature using the distance between predicted text labels and base classes as guidance.
arXiv Detail & Related papers (2024-02-07T08:42:48Z)
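A loose sketch of distance-guided temperature scaling: the further a predicted label sits from the base classes in text-embedding space, the higher the temperature used to soften its logits. The linear distance-to-temperature mapping and all names here are placeholder assumptions, not DAC's actual formulation.

```python
import numpy as np

def distance_aware_softmax(logits, label_emb, base_class_embs, base_temp=1.0, alpha=1.0):
    """Scale the softmax temperature by how far a label is from the base classes.

    label_emb: text embedding of the predicted (possibly novel) class name.
    base_class_embs: (num_base_classes, dim) embeddings of the base-class names.
    The distance-to-temperature mapping below is a placeholder assumption.
    """
    # Distance to the closest base class; novel classes tend to sit further away.
    dist = np.linalg.norm(base_class_embs - label_emb, axis=-1).min()
    temperature = base_temp * (1.0 + alpha * dist)
    scaled = logits / temperature
    scaled -= scaled.max()                    # numerical stability
    probs = np.exp(scaled)
    return probs / probs.sum()
```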
- On Task Performance and Model Calibration with Supervised and Self-Ensembled In-Context Learning [71.44986275228747]
In-context learning (ICL) has become an efficient approach propelled by the recent advancements in large language models (LLMs).
However, both paradigms are prone to suffer from the critical problem of overconfidence (i.e., miscalibration).
arXiv Detail & Related papers (2023-12-21T11:55:10Z)
- $k$NN Prompting: Beyond-Context Learning with Calibration-Free Nearest Neighbor Inference [75.08572535009276]
In-Context Learning (ICL) formulates target tasks as prompt completion conditioned on in-context demonstrations.
$k$NN Prompting first queries LLM with training data for distributed representations, then predicts test instances by simply referring to nearest neighbors.
It significantly outperforms state-of-the-art calibration-based methods in comparable few-shot scenarios.
arXiv Detail & Related papers (2023-03-24T06:16:29Z)
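A minimal sketch of the nearest-neighbor step, assuming each training instance has already been turned into the LLM's output distribution over label words (the "distributed representations" above): the test distribution is compared against the stored anchors and the neighbors' labels vote. KL divergence and majority voting are illustrative choices, not necessarily the paper's exact ones.

```python
import numpy as np

def knn_prompt_predict(test_dist, anchor_dists, anchor_labels, k=3, eps=1e-12):
    """Classify a test instance by its nearest anchor distributions.

    test_dist: (vocab,) LLM output distribution for the test instance.
    anchor_dists: (num_train, vocab) distributions recorded for training instances.
    anchor_labels: (num_train,) integer labels of those training instances.
    """
    # KL divergence from the test distribution to each anchor distribution.
    kl = np.sum(test_dist * (np.log(test_dist + eps) - np.log(anchor_dists + eps)), axis=-1)
    nearest = np.argsort(kl)[:k]
    # Majority vote among the k nearest training instances.
    return np.bincount(anchor_labels[nearest]).argmax()
```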
- Boosting Weakly Supervised Object Detection via Learning Bounding Box Adjusters [76.36104006511684]
Weakly-supervised object detection (WSOD) has emerged as an inspiring recent topic to avoid expensive instance-level object annotations.
We defend the problem setting for improving localization performance by leveraging the bounding box regression knowledge from a well-annotated auxiliary dataset.
Our method performs favorably against state-of-the-art WSOD methods and a knowledge transfer model with a similar problem setting.
arXiv Detail & Related papers (2021-08-03T13:38:20Z)