Measuring Inductive Biases of In-Context Learning with Underspecified
Demonstrations
- URL: http://arxiv.org/abs/2305.13299v1
- Date: Mon, 22 May 2023 17:56:31 GMT
- Title: Measuring Inductive Biases of In-Context Learning with Underspecified
Demonstrations
- Authors: Chenglei Si, Dan Friedman, Nitish Joshi, Shi Feng, Danqi Chen, He He
- Abstract summary: In-context learning (ICL) is an important paradigm for adapting large language models to new tasks.
We investigate the inductive biases of ICL from the perspective of feature bias.
- Score: 35.16904555065152
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In-context learning (ICL) is an important paradigm for adapting large
language models (LLMs) to new tasks, but the generalization behavior of ICL
remains poorly understood. We investigate the inductive biases of ICL from the
perspective of feature bias: which feature ICL is more likely to use given a
set of underspecified demonstrations in which two features are equally
predictive of the labels. First, we characterize the feature biases of GPT-3
models by constructing underspecified demonstrations from a range of NLP
datasets and feature combinations. We find that LLMs exhibit clear feature
biases - for example, demonstrating a strong bias to predict labels according
to sentiment rather than shallow lexical features, like punctuation. Second, we
evaluate the effect of different interventions that are designed to impose an
inductive bias in favor of a particular feature, such as adding a natural
language instruction or using semantically relevant label words. We find that,
while many interventions can influence the learner to prefer a particular
feature, it can be difficult to overcome strong prior biases. Overall, our
results provide a broader picture of the types of features that ICL may be more
likely to exploit and how to impose inductive biases that are better aligned
with the intended task.
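To make the evaluation setup concrete, below is a minimal sketch (an illustration only, not the authors' released code) of how underspecified demonstrations can be built so that sentiment and a shallow punctuation cue are equally predictive, and how feature bias can then be probed with disambiguating test inputs; the `query_llm` helper is a hypothetical placeholder for a GPT-3-style completion call.

```python
# Minimal sketch of the feature-bias probe: the demonstrations are
# underspecified because sentiment and a punctuation cue are equally
# predictive of the labels; the disambiguating test inputs decouple the two,
# so the prediction reveals which feature the in-context learner relied on.

def build_prompt(demos, test_text):
    shots = [f"Input: {x}\nLabel: {y}" for x, y in demos]
    shots.append(f"Input: {test_text}\nLabel:")
    return "\n\n".join(shots)

def query_llm(prompt: str) -> str:
    # Placeholder: swap in a real LLM API call. This toy stand-in simply
    # follows the sentiment of the final test input so the script runs.
    last_input = prompt.split("Input: ")[-1].lower()
    positive_words = ("wonderful", "loved", "delightful", "great")
    return "A" if any(w in last_input for w in positive_words) else "B"

# Underspecified demonstrations: label A <=> positive sentiment AND "!",
# label B <=> negative sentiment AND ".", so both features fit perfectly.
demos = [
    ("The movie was wonderful!", "A"),
    ("I loved every minute of it!", "A"),
    ("The food was terrible.", "B"),
    ("A dull, pointless film.", "B"),
]

# Disambiguating test inputs: sentiment and punctuation now point to
# different labels, so the two candidate features make opposite predictions.
tests = [
    {"text": "A truly delightful show.",   "sentiment": "A", "punctuation": "B"},
    {"text": "What a waste of two hours!", "sentiment": "B", "punctuation": "A"},
]

def feature_bias(demos, tests):
    """Fraction of disambiguating inputs whose prediction matches the
    sentiment feature rather than the punctuation feature."""
    hits = sum(
        query_llm(build_prompt(demos, t["text"])).strip() == t["sentiment"]
        for t in tests
    )
    return hits / len(tests)

print(f"sentiment-feature bias: {feature_bias(demos, tests):.2f}")
```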
Related papers
- Aggregation Artifacts in Subjective Tasks Collapse Large Language Models' Posteriors [74.04775677110179]
In-context Learning (ICL) has become the primary method for performing natural language tasks with Large Language Models (LLMs).
In this work, we examine whether the collapse of LLM posteriors on subjective tasks results from the aggregation used in the corresponding datasets, where combining low-agreement, disparate annotations can create annotation artifacts that introduce detrimental noise into the prompt.
Our results indicate that aggregation is a confounding factor in the modeling of subjective tasks, and advocate focusing on modeling individuals instead.
arXiv Detail & Related papers (2024-10-17T17:16:00Z)
- Effective Demonstration Annotation for In-Context Learning via Language Model-Based Determinantal Point Process [45.632012199451275]
In-context learning (ICL) is a few-shot learning paradigm that involves learning mappings through input-output pairs.
Existing works are highly dependent on large-scale labeled support sets, which are not always feasible in practical scenarios.
We introduce the Language Model-based Determinantal Point Process (LM-DPP), which simultaneously considers the uncertainty and diversity of unlabeled instances for optimal selection.
arXiv Detail & Related papers (2024-08-04T18:08:15Z)
- UniBias: Unveiling and Mitigating LLM Bias through Internal Attention and FFN Manipulation [12.04811490937078]
We investigate how feedforward neural networks (FFNs) and attention heads give rise to bias in large language models (LLMs).
To mitigate these biases, we introduce UniBias, an inference-only method that effectively identifies and eliminates biased FFN vectors and attention heads.
arXiv Detail & Related papers (2024-05-31T03:59:15Z)
- Reinforcement Learning Fine-tuning of Language Models is Biased Towards More Extractable Features [0.5937476291232802]
We investigate whether principles governing inductive biases in the supervised fine-tuning of large language models also apply when the fine-tuning process uses reinforcement learning.
We find statistically significant correlations, which constitute strong evidence that these principles carry over to reinforcement learning fine-tuning.
arXiv Detail & Related papers (2023-11-07T15:00:39Z)
- Improving Input-label Mapping with Demonstration Replay for In-context Learning [67.57288926736923]
In-context learning (ICL) is an emerging capability of large autoregressive language models.
We propose a novel ICL method, Repeated Demonstration with Sliding Causal Attention (RdSca).
We show that our method significantly improves the input-label mapping in ICL demonstrations.
arXiv Detail & Related papers (2023-10-30T14:29:41Z)
- Mitigating Label Biases for In-context Learning [28.209613730240633]
Various design settings for in-context learning (ICL) can bias a model toward a particular prediction without being reflective of an understanding of the task.
In this work, we define a typology for three types of label biases in ICL for text classification: vanilla-label bias, context-label bias, and domain-label bias.
arXiv Detail & Related papers (2023-05-28T15:37:39Z)
- Simple Linguistic Inferences of Large Language Models (LLMs): Blind Spots and Blinds [59.71218039095155]
We evaluate language understanding capacities on simple inference tasks that most humans find trivial.
We target (i) grammatically-specified entailments, (ii) premises with evidential adverbs of uncertainty, and (iii) monotonicity entailments.
The models exhibit moderate to low performance on these evaluation sets.
arXiv Detail & Related papers (2023-05-24T06:41:09Z)
- Fairness-guided Few-shot Prompting for Large Language Models [93.05624064699965]
In-context learning can suffer from high instability due to variations in training examples, example order, and prompt formats.
We introduce a metric to evaluate the predictive bias of a fixed prompt against labels or given attributes.
We propose a novel greedy search strategy to identify a near-optimal prompt for improving the performance of in-context learning.
arXiv Detail & Related papers (2023-03-23T12:28:25Z)
- Larger language models do in-context learning differently [93.90674531127559]
In-context learning (ICL) in language models reflects a tension between semantic priors and the input-label mappings given in the demonstrations.
We investigate two setups: ICL with flipped labels and ICL with semantically unrelated labels (a minimal sketch of these setups follows this list).
arXiv Detail & Related papers (2023-03-07T12:24:17Z)
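As context for the last entry above, here is a minimal sketch, assuming a binary sentiment task, of how flipped-label and semantically-unrelated-label demonstrations can be constructed (an illustration only, not the cited paper's code); a model that follows the in-context input-label mapping rather than its semantic priors should predict the altered label words.

```python
# Flipped-label and semantically-unrelated-label demonstrations for ICL.
# The input texts are unchanged; only the label words are remapped.

demos = [
    ("The plot was gripping from start to finish.", "positive"),
    ("I regret buying a ticket for this.", "negative"),
]

FLIPPED = {"positive": "negative", "negative": "positive"}  # flipped labels
UNRELATED = {"positive": "foo", "negative": "bar"}          # unrelated label words

flipped_demos = [(text, FLIPPED[label]) for text, label in demos]
unrelated_demos = [(text, UNRELATED[label]) for text, label in demos]

def format_prompt(pairs, query):
    shots = [f"Review: {text}\nLabel: {label}" for text, label in pairs]
    shots.append(f"Review: {query}\nLabel:")
    return "\n\n".join(shots)

print(format_prompt(flipped_demos, "An instant classic."))
print(format_prompt(unrelated_demos, "An instant classic."))
```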
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information it contains and is not responsible for any consequences arising from its use.