Label Words are Anchors: An Information Flow Perspective for
Understanding In-Context Learning
- URL: http://arxiv.org/abs/2305.14160v4
- Date: Tue, 19 Dec 2023 15:13:52 GMT
- Title: Label Words are Anchors: An Information Flow Perspective for
Understanding In-Context Learning
- Authors: Lean Wang, Lei Li, Damai Dai, Deli Chen, Hao Zhou, Fandong Meng, Jie
Zhou, Xu Sun
- Abstract summary: In-context learning (ICL) emerges as a promising capability of large language models (LLMs)
In this paper, we investigate the working mechanism of ICL through an information flow lens.
We introduce an anchor re-weighting method to improve ICL performance, a demonstration compression technique to expedite inference, and an analysis framework for diagnosing ICL errors in GPT2-XL.
- Score: 77.7070536959126
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In-context learning (ICL) emerges as a promising capability of large language
models (LLMs) by providing them with demonstration examples to perform diverse
tasks. However, the underlying mechanism of how LLMs learn from the provided
context remains under-explored. In this paper, we investigate the working
mechanism of ICL through an information flow lens. Our findings reveal that
label words in the demonstration examples function as anchors: (1) semantic
information aggregates into label word representations during the shallow
computation layers' processing; (2) the consolidated information in label words
serves as a reference for LLMs' final predictions. Based on these insights, we
introduce an anchor re-weighting method to improve ICL performance, a
demonstration compression technique to expedite inference, and an analysis
framework for diagnosing ICL errors in GPT2-XL. The promising applications of
our findings again validate the uncovered ICL working mechanism and pave the
way for future studies.
Related papers
- Enhancing Input-Label Mapping in In-Context Learning with Contrastive Decoding [71.01099784480597]
Large language models (LLMs) excel at a range of tasks through in-context learning (ICL)
We introduce In-Context Contrastive Decoding (ICCD), a novel method that emphasizes input-label mapping.
ICCD emphasizes input-label mapping by contrasting the output distributions between positive and negative in-context examples.
arXiv Detail & Related papers (2025-02-19T14:04:46Z) - PICLe: Pseudo-Annotations for In-Context Learning in Low-Resource Named Entity Detection [56.916656013563355]
In-context learning (ICL) enables Large Language Models to perform tasks using few demonstrations.
We propose PICLe, a framework for in-context learning with noisy, pseudo-annotated demonstrations.
We evaluate PICLe on five biomedical NED datasets and show that, with zero human annotation, PICLe outperforms ICL in low-resource settings.
arXiv Detail & Related papers (2024-12-16T16:09:35Z) - Revisiting In-context Learning Inference Circuit in Large Language Models [2.4866936275046405]
In-context learning (ICL) is an emerging few-shot learning paradigm on Language Models (LMs) with inner mechanisms un-explored.
This paper proposes a comprehensive circuit to model the inference dynamics and try to explain the observed phenomena of ICL.
arXiv Detail & Related papers (2024-10-06T12:50:15Z) - Implicit In-context Learning [37.0562059811099]
In-context Learning (ICL) empowers large language models to adapt to unseen tasks during inference by prefixing a few demonstration examples prior to test queries.
We introduce Implicit In-context Learning (I2CL), an innovative paradigm that addresses the challenges associated with traditional ICL by absorbing demonstration examples within the activation space.
I2CL achieves few-shot performance with zero-shot cost and exhibits robustness against the variation of demonstration examples.
arXiv Detail & Related papers (2024-05-23T14:57:52Z) - Does In-Context Learning Really Learn? Rethinking How Large Language Models Respond and Solve Tasks via In-Context Learning [41.606494950216764]
In-context Learning (ICL) has emerged as a powerful capability alongside the development of scaled-up large language models (LLMs)
This paper decomposes the overall performance of ICL into three dimensions, label space, format, and discrimination.
We show that ICL exhibits significant efficacy in regulating the label space and format, which helps LLMs respond to desired label words.
arXiv Detail & Related papers (2024-04-11T08:20:10Z) - Rectifying Demonstration Shortcut in In-Context Learning [15.08431909212102]
Large language models (LLMs) are able to solve various tasks with only a few demonstrations utilizing their in-context learning (ICL) abilities.
LLMs often rely on their pre-trained semantic priors of demonstrations rather than on the input-label relationships to proceed with ICL prediction.
arXiv Detail & Related papers (2024-03-14T15:30:14Z) - C-ICL: Contrastive In-context Learning for Information Extraction [54.39470114243744]
c-ICL is a novel few-shot technique that leverages both correct and incorrect sample constructions to create in-context learning demonstrations.
Our experiments on various datasets indicate that c-ICL outperforms previous few-shot in-context learning methods.
arXiv Detail & Related papers (2024-02-17T11:28:08Z) - In-Context Exemplars as Clues to Retrieving from Large Associative
Memory [1.2952137350423816]
In-context learning (ICL) enables large language models (LLMs) to learn patterns from in-context exemplars without training.
How to choose exemplars remains unclear due to the lack of understanding of how in-context learning works.
Our study sheds new light on the mechanism of ICL by connecting it to memory retrieval.
arXiv Detail & Related papers (2023-11-06T20:13:29Z) - Hint-enhanced In-Context Learning wakes Large Language Models up for knowledge-intensive tasks [54.153914606302486]
In-context learning (ICL) ability has emerged with the increasing scale of large language models (LLMs)
We propose a new paradigm called Hint-enhanced In-Context Learning (HICL) to explore the power of ICL in open-domain question answering.
arXiv Detail & Related papers (2023-11-03T14:39:20Z) - Improving Input-label Mapping with Demonstration Replay for In-context
Learning [67.57288926736923]
In-context learning (ICL) is an emerging capability of large autoregressive language models.
We propose a novel ICL method called Sliding Causal Attention (RdSca)
We show that our method significantly improves the input-label mapping in ICL demonstrations.
arXiv Detail & Related papers (2023-10-30T14:29:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.