Label Words are Anchors: An Information Flow Perspective for
Understanding In-Context Learning
- URL: http://arxiv.org/abs/2305.14160v4
- Date: Tue, 19 Dec 2023 15:13:52 GMT
- Title: Label Words are Anchors: An Information Flow Perspective for
Understanding In-Context Learning
- Authors: Lean Wang, Lei Li, Damai Dai, Deli Chen, Hao Zhou, Fandong Meng, Jie
Zhou, Xu Sun
- Abstract summary: In-context learning (ICL) emerges as a promising capability of large language models (LLMs)
In this paper, we investigate the working mechanism of ICL through an information flow lens.
We introduce an anchor re-weighting method to improve ICL performance, a demonstration compression technique to expedite inference, and an analysis framework for diagnosing ICL errors in GPT2-XL.
- Score: 77.7070536959126
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In-context learning (ICL) emerges as a promising capability of large language
models (LLMs) by providing them with demonstration examples to perform diverse
tasks. However, the underlying mechanism of how LLMs learn from the provided
context remains under-explored. In this paper, we investigate the working
mechanism of ICL through an information flow lens. Our findings reveal that
label words in the demonstration examples function as anchors: (1) semantic
information aggregates into label word representations during the shallow
computation layers' processing; (2) the consolidated information in label words
serves as a reference for LLMs' final predictions. Based on these insights, we
introduce an anchor re-weighting method to improve ICL performance, a
demonstration compression technique to expedite inference, and an analysis
framework for diagnosing ICL errors in GPT2-XL. The promising applications of
our findings again validate the uncovered ICL working mechanism and pave the
way for future studies.
Related papers
- Revisiting In-context Learning Inference Circuit in Large Language Models [2.4866936275046405]
In-context learning (ICL) is an emerging few-shot learning paradigm on Language Models (LMs) with inner mechanisms un-explored.
This paper proposes a comprehensive circuit to model the inference dynamics and try to explain the observed phenomena of ICL.
arXiv Detail & Related papers (2024-10-06T12:50:15Z) - Implicit In-context Learning [37.0562059811099]
In-context Learning (ICL) empowers large language models to adapt to unseen tasks during inference by prefixing a few demonstration examples prior to test queries.
We introduce Implicit In-context Learning (I2CL), an innovative paradigm that addresses the challenges associated with traditional ICL by absorbing demonstration examples within the activation space.
I2CL achieves few-shot performance with zero-shot cost and exhibits robustness against the variation of demonstration examples.
arXiv Detail & Related papers (2024-05-23T14:57:52Z) - Does In-Context Learning Really Learn? Rethinking How Large Language Models Respond and Solve Tasks via In-Context Learning [41.606494950216764]
In-context Learning (ICL) has emerged as a powerful capability alongside the development of scaled-up large language models (LLMs)
This paper decomposes the overall performance of ICL into three dimensions, label space, format, and discrimination.
We show that ICL exhibits significant efficacy in regulating the label space and format, which helps LLMs respond to desired label words.
arXiv Detail & Related papers (2024-04-11T08:20:10Z) - Rectifying Demonstration Shortcut in In-Context Learning [15.08431909212102]
Large language models (LLMs) are able to solve various tasks with only a few demonstrations utilizing their in-context learning (ICL) abilities.
LLMs often rely on their pre-trained semantic priors of demonstrations rather than on the input-label relationships to proceed with ICL prediction.
arXiv Detail & Related papers (2024-03-14T15:30:14Z) - C-ICL: Contrastive In-context Learning for Information Extraction [54.39470114243744]
c-ICL is a novel few-shot technique that leverages both correct and incorrect sample constructions to create in-context learning demonstrations.
Our experiments on various datasets indicate that c-ICL outperforms previous few-shot in-context learning methods.
arXiv Detail & Related papers (2024-02-17T11:28:08Z) - In-Context Exemplars as Clues to Retrieving from Large Associative
Memory [1.2952137350423816]
In-context learning (ICL) enables large language models (LLMs) to learn patterns from in-context exemplars without training.
How to choose exemplars remains unclear due to the lack of understanding of how in-context learning works.
Our study sheds new light on the mechanism of ICL by connecting it to memory retrieval.
arXiv Detail & Related papers (2023-11-06T20:13:29Z) - Hint-enhanced In-Context Learning wakes Large Language Models up for knowledge-intensive tasks [54.153914606302486]
In-context learning (ICL) ability has emerged with the increasing scale of large language models (LLMs)
We propose a new paradigm called Hint-enhanced In-Context Learning (HICL) to explore the power of ICL in open-domain question answering.
arXiv Detail & Related papers (2023-11-03T14:39:20Z) - Improving Input-label Mapping with Demonstration Replay for In-context
Learning [67.57288926736923]
In-context learning (ICL) is an emerging capability of large autoregressive language models.
We propose a novel ICL method called Sliding Causal Attention (RdSca)
We show that our method significantly improves the input-label mapping in ICL demonstrations.
arXiv Detail & Related papers (2023-10-30T14:29:41Z) - Iterative Forward Tuning Boosts In-Context Learning in Language Models [88.25013390669845]
In this study, we introduce a novel two-stage framework to boost in-context learning in large language models (LLMs)
Specifically, our framework delineates the ICL process into two distinct stages: Deep-Thinking and test stages.
The Deep-Thinking stage incorporates a unique attention mechanism, i.e., iterative enhanced attention, which enables multiple rounds of information accumulation.
arXiv Detail & Related papers (2023-05-22T13:18:17Z) - Explaining Emergent In-Context Learning as Kernel Regression [61.57151500616111]
Large language models (LLMs) have initiated a paradigm shift in transfer learning.
In this paper, we investigate the reason why a transformer-based language model can accomplish in-context learning after pre-training.
We find that during ICL, the attention and hidden features in LLMs match the behaviors of a kernel regression.
arXiv Detail & Related papers (2023-05-22T06:45:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.