Larger language models do in-context learning differently
- URL: http://arxiv.org/abs/2303.03846v2
- Date: Wed, 8 Mar 2023 07:37:43 GMT
- Title: Larger language models do in-context learning differently
- Authors: Jerry Wei and Jason Wei and Yi Tay and Dustin Tran and Albert Webson
and Yifeng Lu and Xinyun Chen and Hanxiao Liu and Da Huang and Denny Zhou and
Tengyu Ma
- Abstract summary: In-context learning (ICL) in language models is affected by semantic priors versus input-label mappings.
We investigate two setups: ICL with flipped labels and ICL with semantically-unrelated labels.
- Score: 93.90674531127559
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study how in-context learning (ICL) in language models is affected by
semantic priors versus input-label mappings. We investigate two setups, ICL with
flipped labels and ICL with semantically-unrelated labels, across various model
families (GPT-3, InstructGPT, Codex, PaLM, and Flan-PaLM). First, experiments
on ICL with flipped labels show that overriding semantic priors is an emergent
ability of model scale. While small language models ignore flipped labels
presented in-context and thus rely primarily on semantic priors from
pretraining, large models can override semantic priors when presented with
in-context exemplars that contradict priors, despite the stronger semantic
priors that larger models may hold. We next study semantically-unrelated label
ICL (SUL-ICL), in which labels are semantically unrelated to their inputs
(e.g., foo/bar instead of negative/positive), thereby forcing language models
to learn the input-label mappings shown in in-context exemplars in order to
perform the task. The ability to do SUL-ICL also emerges primarily with scale,
and large-enough language models can even perform linear classification in a
SUL-ICL setting. Finally, we evaluate instruction-tuned models and find that
instruction tuning strengthens both the use of semantic priors and the capacity
to learn input-label mappings, but more of the former.
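To make the two evaluation setups concrete, below is a minimal sketch of how flipped-label and semantically-unrelated-label (SUL-ICL) prompts could be assembled for a binary sentiment task. The example sentences, the foo/bar label names, and the build_prompt helper are illustrative assumptions, not the authors' exact evaluation harness.

```python
# Minimal sketch (not the authors' exact harness) of building in-context
# prompts for the two setups studied in the paper: flipped-label ICL and
# semantically-unrelated-label ICL (SUL-ICL).

# Hypothetical labeled exemplars for a binary sentiment task.
EXEMPLARS = [
    ("The movie was a delight from start to finish.", "positive"),
    ("I regret wasting two hours on this film.", "negative"),
    ("A warm, funny, beautifully acted story.", "positive"),
    ("The plot was dull and the pacing glacial.", "negative"),
]

# Flipped labels: the shown label contradicts the semantic prior.
FLIPPED = {"positive": "negative", "negative": "positive"}
# Semantically unrelated targets (e.g., foo/bar instead of positive/negative).
UNRELATED = {"positive": "foo", "negative": "bar"}


def build_prompt(exemplars, query, label_map=None):
    """Format exemplars and a query into a plain ICL prompt.

    label_map remaps the ground-truth labels: FLIPPED for the flipped-label
    setup, UNRELATED for SUL-ICL, or None for regular ICL.
    """
    lines = []
    for text, label in exemplars:
        shown = label_map[label] if label_map else label
        lines.append(f"Input: {text}\nLabel: {shown}")
    lines.append(f"Input: {query}\nLabel:")
    return "\n\n".join(lines)


query = "An unforgettable performance by the lead actress."
regular_prompt = build_prompt(EXEMPLARS, query)
flipped_prompt = build_prompt(EXEMPLARS, query, FLIPPED)
sul_icl_prompt = build_prompt(EXEMPLARS, query, UNRELATED)

# A model that overrides its semantic priors should answer "negative" (the
# flipped label) for the flipped prompt, and "foo" for the SUL-ICL prompt.
print(flipped_prompt)
```

Per the abstract's findings, only sufficiently large models tend to follow the flipped or foo/bar mappings shown in the exemplars rather than the sentiment priors learned during pretraining.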
Related papers
- Explore Spurious Correlations at the Concept Level in Language Models for Text Classification [28.832684088975622]
Language models (LMs) have achieved notable success in numerous NLP tasks.
However, they face robustness challenges due to spurious correlations arising from imbalanced label distributions in training data or ICL exemplars.
This paper introduces two main contributions. First, we employ ChatGPT to assign concept labels to texts, assessing concept bias in models during fine-tuning or ICL on test data.
Second, we introduce a data rebalancing technique that incorporates ChatGPT-generated counterfactual data, thereby balancing label distribution and mitigating spurious correlations.
arXiv Detail & Related papers (2023-11-15T01:58:54Z)
- Improving Input-label Mapping with Demonstration Replay for In-context Learning [67.57288926736923]
In-context learning (ICL) is an emerging capability of large autoregressive language models.
We propose a novel ICL method, Repeated Demonstration with Sliding Causal Attention (RdSca).
We show that our method significantly improves the input-label mapping in ICL demonstrations.
arXiv Detail & Related papers (2023-10-30T14:29:41Z)
- In-Context Learning for Text Classification with Many Labels [34.87532045406169]
In-context learning (ICL) using large language models for tasks with many labels is challenging due to the limited context window.
We use a pre-trained dense retrieval model to bypass this limitation (a minimal retrieval sketch appears after this list).
We analyze performance as the number of in-context examples and the model scale vary.
arXiv Detail & Related papers (2023-09-19T22:41:44Z)
- Exploring Structured Semantic Prior for Multi Label Recognition with Incomplete Labels [60.675714333081466]
Multi-label recognition (MLR) with incomplete labels is very challenging.
Recent works strive to explore the image-to-label correspondence in the vision-language model, i.e., CLIP, to compensate for insufficient annotations.
We advocate remedying the deficiency of label supervision for the MLR with incomplete labels by deriving a structured semantic prior.
arXiv Detail & Related papers (2023-03-23T12:39:20Z)
- CCPrefix: Counterfactual Contrastive Prefix-Tuning for Many-Class Classification [57.62886091828512]
We propose a brand-new prefix-tuning method, Counterfactual Contrastive Prefix-tuning (CCPrefix) for many-class classification.
Basically, an instance-dependent soft prefix, derived from fact-counterfactual pairs in the label space, is leveraged to complement the language verbalizers in many-class classification.
arXiv Detail & Related papers (2022-11-11T03:45:59Z)
- Integrating Language Guidance into Vision-based Deep Metric Learning [78.18860829585182]
We propose to learn metric spaces that encode semantic similarities as embedding space distances.
These spaces should be transferable to classes beyond those seen during training.
However, training on binary class assignments alone causes learned embedding spaces to encode incomplete semantic context and misrepresent the semantic relations between classes.
arXiv Detail & Related papers (2022-03-16T11:06:50Z)
- Leveraging Adversarial Training in Self-Learning for Cross-Lingual Text Classification [52.69730591919885]
We present a semi-supervised adversarial training process that minimizes the maximal loss for label-preserving input perturbations.
We observe significant gains in effectiveness on document and intent classification for a diverse set of languages.
arXiv Detail & Related papers (2020-07-29T19:38:35Z)
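For the retrieval-based approach mentioned in the many-labels entry above, here is a minimal sketch of selecting in-context exemplars with a dense retriever. The sentence-transformers model name, the example pool, and the retrieve_exemplars helper are illustrative assumptions rather than that paper's actual pipeline.

```python
# Minimal sketch of retrieval-augmented ICL for tasks with many labels:
# instead of packing every label's exemplars into the prompt, retrieve only
# the labeled examples most similar to the query so they fit in the context
# window. The model name and the example pool are illustrative assumptions.
from sentence_transformers import SentenceTransformer, util

# Hypothetical pool of labeled examples (in practice, hundreds of labels).
POOL = [
    ("How do I reset my password?", "account_recovery"),
    ("My card was charged twice.", "billing_dispute"),
    ("The app crashes when I open settings.", "bug_report"),
    ("Can I upgrade to the premium plan?", "plan_upgrade"),
    ("Where can I download my invoice?", "invoice_request"),
]

retriever = SentenceTransformer("all-MiniLM-L6-v2")  # assumed retriever
pool_texts = [text for text, _ in POOL]
pool_emb = retriever.encode(pool_texts, convert_to_tensor=True)


def retrieve_exemplars(query, k=3):
    """Return the k labeled examples most similar to the query."""
    query_emb = retriever.encode(query, convert_to_tensor=True)
    scores = util.cos_sim(query_emb, pool_emb)[0]
    top = scores.topk(k).indices.tolist()
    return [POOL[i] for i in top]


query = "I was billed twice for the same month."
exemplars = retrieve_exemplars(query)
prompt = "\n\n".join(f"Input: {t}\nLabel: {l}" for t, l in exemplars)
prompt += f"\n\nInput: {query}\nLabel:"
print(prompt)
```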