$k$NN Prompting: Beyond-Context Learning with Calibration-Free Nearest
Neighbor Inference
- URL: http://arxiv.org/abs/2303.13824v1
- Date: Fri, 24 Mar 2023 06:16:29 GMT
- Title: $k$NN Prompting: Beyond-Context Learning with Calibration-Free Nearest
Neighbor Inference
- Authors: Benfeng Xu, Quan Wang, Zhendong Mao, Yajuan Lyu, Qiaoqiao She,
Yongdong Zhang
- Abstract summary: In-Context Learning (ICL) formulates target tasks as prompt completion conditioned on in-context demonstrations.
$k$NN Prompting first queries the LLM with training data for distributed representations, then predicts test instances by simply referring to nearest neighbors.
It significantly outperforms state-of-the-art calibration-based methods under comparable few-shot scenarios.
- Score: 75.08572535009276
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In-Context Learning (ICL), which formulates target tasks as prompt completion conditioned on in-context demonstrations, has become the prevailing way of using LLMs. In this paper, we first disclose a practical predicament of this typical usage: it cannot scale up with training data due to the context length restriction. Besides, existing works have shown that ICL also suffers from various biases and requires delicate calibration treatment. To address both challenges, we advocate a simple and effective solution, $k$NN Prompting, which first queries the LLM with training data for distributed representations, then predicts test instances by simply referring to nearest neighbors. We conduct comprehensive experiments to demonstrate its two-fold superiority: 1) Calibration-Free: $k$NN Prompting does not directly align the LLM output distribution with the task-specific label space; instead, it leverages that distribution to align test and training instances. It significantly outperforms state-of-the-art calibration-based methods under comparable few-shot scenarios. 2) Beyond-Context: $k$NN Prompting can further scale up effectively with as much training data as is available, continually bringing substantial improvements. The scaling trend holds across ten doublings of the shot count (from 2 shots to 1024 shots) as well as across LLM scales ranging from 0.8B to 30B parameters. It successfully bridges data scaling into model scaling and opens new potential for the gradient-free paradigm of LLM deployment. Code is publicly available.
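As a concrete illustration of the procedure the abstract describes, here is a minimal sketch of the nearest-neighbor inference step. It assumes the LLM has already been queried once per training example (with the same few-shot prompt template) to obtain one next-token distribution per instance; the array layout, the KL-based distance, and the majority vote below are illustrative assumptions rather than the authors' exact implementation.

```python
import numpy as np

def knn_prompting_predict(train_dists, train_labels, test_dist, k=3):
    """Classify a test instance by k-nearest neighbors over stored LM output
    distributions, using KL divergence as the (asymmetric) distance.

    train_dists : (N, V) array of next-token distributions, one per training
                  example, collected by querying the LLM once per example.
    train_labels: length-N list of gold labels.
    test_dist   : (V,) next-token distribution for the test example.
    """
    eps = 1e-12
    p = test_dist + eps
    q = train_dists + eps
    # KL(test || train_i) against every stored training representation.
    kl = np.sum(p * (np.log(p) - np.log(q)), axis=1)
    nearest = np.argsort(kl)[:k]
    votes = [train_labels[i] for i in nearest]
    return max(set(votes), key=votes.count)  # majority vote
```

Because each training example costs only a single LLM query at preparation time, the stored bank of distributions can grow far beyond the context window, which is what the abstract refers to as beyond-context scaling.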
Related papers
- Bayesian scaling laws for in-context learning [72.17734205418502]
In-context learning (ICL) is a powerful technique for getting language models to perform complex tasks with no training updates.
We show that ICL approximates a Bayesian learner and develop a family of novel Bayesian scaling laws for ICL.
arXiv Detail & Related papers (2024-10-21T21:45:22Z)
- Uncertainty Aware Learning for Language Model Alignment [97.36361196793929]
We propose uncertainty-aware learning (UAL) to improve model alignment across different task scenarios.
We implement UAL in a simple fashion -- adaptively setting the label-smoothing value during training according to the uncertainty of individual samples.
Experiments on widely used benchmarks demonstrate that our UAL significantly and consistently outperforms standard supervised fine-tuning.
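A minimal sketch of the per-sample smoothing idea described above; the scaling rule, the provenance of the uncertainty scores, and the function name are assumptions for illustration, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def ual_loss(logits, targets, uncertainty, base_smoothing=0.1):
    """Cross-entropy with per-sample label smoothing scaled by uncertainty.

    logits      : (B, C) model outputs.
    targets     : (B,) gold class indices.
    uncertainty : (B,) scores in [0, 1]; higher means a less certain sample.
    """
    num_classes = logits.size(-1)
    # Hypothetical rule: more uncertain samples get stronger smoothing.
    smoothing = base_smoothing * uncertainty              # (B,)
    one_hot = F.one_hot(targets, num_classes).float()     # (B, C)
    soft = one_hot * (1 - smoothing).unsqueeze(1) + \
           smoothing.unsqueeze(1) / num_classes
    log_probs = F.log_softmax(logits, dim=-1)
    return -(soft * log_probs).sum(dim=-1).mean()
```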
arXiv Detail & Related papers (2024-06-07T11:37:45Z)
- Feature-Adaptive and Data-Scalable In-Context Learning [36.01997148676005]
FADS-ICL is a feature-adaptive and data-scalable in-context learning framework.
It can leverage task-adaptive features to promote inference on the downstream task.
FADS-ICL consistently outperforms previous state-of-the-art methods.
arXiv Detail & Related papers (2024-05-17T12:32:53Z)
- Alpaca against Vicuna: Using LLMs to Uncover Memorization of LLMs [61.04246774006429]
We introduce a black-box prompt optimization method that uses an attacker LLM agent to uncover higher levels of memorization in a victim agent.
We observe that our instruction-based prompts generate outputs with 23.7% higher overlap with training data compared to the baseline prefix-suffix measurements.
Our findings show that instruction-tuned models can expose pre-training data as much as their base models, if not more, and that instructions proposed by other LLMs can open a new avenue for automated attacks.
arXiv Detail & Related papers (2024-03-05T19:32:01Z)
- Generative Calibration for In-context Learning [20.207930451266822]
In this paper, we identify that such a paradox is mainly due to the label shift of the in-context model with respect to the data distribution.
With this understanding, we can calibrate the in-context predictive distribution by adjusting the label marginal.
We call our approach generative calibration. We conduct exhaustive experiments with 12 text classification tasks and 12 LLMs ranging from 774M to 33B parameters.
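To picture the label-marginal adjustment, here is a tiny sketch. The paper derives its own generative estimate of the marginal; the simple averaging suggested below and the function name are assumptions for illustration.

```python
import numpy as np

def calibrate_by_label_marginal(p_y_given_x, p_y_marginal):
    """Adjust an in-context predictive distribution by an estimated label
    marginal of the in-context model, then renormalize.

    p_y_given_x : (C,) predicted label probabilities for one test input.
    p_y_marginal: (C,) estimated label marginal, e.g. averaged predictions
                  over inputs representative of the task.
    """
    scores = p_y_given_x / (p_y_marginal + 1e-12)
    return scores / scores.sum()
```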
arXiv Detail & Related papers (2023-10-16T10:45:02Z)
- Not All Demonstration Examples are Equally Beneficial: Reweighting Demonstration Examples for In-Context Learning [32.29118942982609]
Large Language Models (LLMs) have recently gained the In-Context Learning (ICL) ability as they scale up.
This paper investigates how to determine approximately optimal weights for demonstration examples and how to apply them during ICL.
Experimental results on 8 text classification tasks show that our approach outperforms conventional ICL by a large margin.
arXiv Detail & Related papers (2023-10-12T13:15:11Z)
- Batch Calibration: Rethinking Calibration for In-Context Learning and Prompt Engineering [12.967536233145614]
Batch Calibration (BC) is a simple yet intuitive method that controls the contextual bias from the batched input.
BC is zero-shot, inference-only, and incurs negligible additional costs.
We demonstrate state-of-the-art performance over previous calibration baselines across more than 10 natural language understanding and image classification tasks.
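A minimal sketch of the batch-level idea: estimate a contextual prior from a batch of predictions and remove it from each one. The exact correction (mean subtraction in probability space) and the function name are assumptions here; see the paper for the precise estimator.

```python
import numpy as np

def batch_calibrate(batch_probs):
    """Remove batch-level contextual bias from ICL label probabilities.

    batch_probs : (M, C) label probabilities for a batch of test inputs.
    Returns calibrated scores; argmax per row gives the prediction.
    """
    contextual_prior = batch_probs.mean(axis=0, keepdims=True)  # (1, C)
    return batch_probs - contextual_prior
```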
arXiv Detail & Related papers (2023-09-29T13:55:45Z)
- ReLLa: Retrieval-enhanced Large Language Models for Lifelong Sequential Behavior Comprehension in Recommendation [43.270424225285105]
We focus on adapting and empowering a pure large language model for zero-shot and few-shot recommendation tasks.
We propose Retrieval-enhanced Large Language models (ReLLa) for recommendation tasks in both zero-shot and few-shot settings.
arXiv Detail & Related papers (2023-08-22T02:25:04Z)
- Explaining Emergent In-Context Learning as Kernel Regression [61.57151500616111]
Large language models (LLMs) have initiated a paradigm shift in transfer learning.
In this paper, we investigate the reason why a transformer-based language model can accomplish in-context learning after pre-training.
We find that during ICL, the attention and hidden features in LLMs match the behaviors of a kernel regression.
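For reference, the kernel-regression form that ICL is related to here is the textbook Nadaraya-Watson estimator, sketched below with a Gaussian kernel; this is the standard formula rather than the paper's derivation, and the names are illustrative.

```python
import numpy as np

def kernel_regression(x_query, xs, ys, bandwidth=1.0):
    """Nadaraya-Watson estimate: a similarity-weighted average of the
    demonstration targets ys, with a Gaussian kernel over inputs xs."""
    dists = np.linalg.norm(xs - x_query, axis=1)
    weights = np.exp(-(dists ** 2) / (2 * bandwidth ** 2))
    return (weights @ ys) / (weights.sum() + 1e-12)
```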
arXiv Detail & Related papers (2023-05-22T06:45:02Z)
- Improving Representational Continuity via Continued Pretraining [76.29171039601948]
A method from the transfer learning community, LP-FT (linear probing then fine-tuning), outperforms naive training and other continual learning methods.
LP-FT also reduces forgetting on a real-world satellite remote sensing dataset (FMoW).
A variant of LP-FT achieves state-of-the-art accuracies on an NLP continual learning benchmark.
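LP-FT is a two-stage recipe, which can be sketched as follows. The backbone/head split, the optimizers, the learning rates, and the run_epochs helper are assumptions for illustration, not the paper's exact training setup.

```python
import torch

def run_epochs(backbone, head, loader, loss_fn, opt, epochs):
    """Plain supervised training loop shared by both stages."""
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(head(backbone(x)), y)
            loss.backward()
            opt.step()

def lp_ft(backbone, head, train_loader, loss_fn, lp_epochs=5, ft_epochs=5):
    # Stage 1: linear probing -- freeze the backbone, train only the head.
    for p in backbone.parameters():
        p.requires_grad = False
    opt = torch.optim.Adam(head.parameters(), lr=1e-3)
    run_epochs(backbone, head, train_loader, loss_fn, opt, lp_epochs)

    # Stage 2: full fine-tuning, typically with a smaller learning rate.
    for p in backbone.parameters():
        p.requires_grad = True
    opt = torch.optim.Adam(
        list(backbone.parameters()) + list(head.parameters()), lr=1e-5)
    run_epochs(backbone, head, train_loader, loss_fn, opt, ft_epochs)
```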
arXiv Detail & Related papers (2023-02-26T10:39:38Z)