Related papers: Revisiting In-context Learning Inference Circuit in Large Language Models

URL: http://arxiv.org/abs/2410.04468v4
Date: Thu, 20 Feb 2025 12:58:28 GMT
Title: Revisiting In-context Learning Inference Circuit in Large Language Models
Authors: Hakaze Cho, Mariko Kato, Yoshihiro Sakai, Naoya Inoue
Abstract summary: In-context learning (ICL) is an emerging few-shot learning paradigm on Language Models (LMs) whose inner mechanisms remain unexplored. This paper proposes a comprehensive circuit to model the inference dynamics and tries to explain the observed phenomena of ICL.
Score: 2.4866936275046405
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In-context Learning (ICL) is an emerging few-shot learning paradigm on Language Models (LMs) whose inner mechanisms remain unexplored. Existing works describe the inner processing of ICL, but they struggle to capture all the inference phenomena in large language models. Therefore, this paper proposes a comprehensive circuit to model the inference dynamics and tries to explain the observed phenomena of ICL. In detail, we divide ICL inference into 3 major operations: (1) Input Text Encode: LMs encode every input text (in the demonstrations and queries) into linear representations in the hidden states with sufficient information to solve ICL tasks. (2) Semantics Merge: LMs merge the encoded representations of demonstrations with their corresponding label tokens to produce joint representations of labels and demonstrations. (3) Feature Retrieval and Copy: LMs search for the joint representations of demonstrations similar to the query representation on a task subspace, and copy the retrieved representations into the query. Then, language model heads capture these copied label representations to a certain extent and decode them into predicted labels. Through careful measurements, the proposed inference circuit successfully captures and unifies many fragmented phenomena observed during the ICL process, making it a comprehensive and practical explanation of the ICL inference process. Moreover, ablation analysis shows that disabling the proposed steps seriously damages ICL performance, suggesting the proposed inference circuit is a dominant mechanism. Additionally, we confirm and list some bypass mechanisms that solve ICL tasks in parallel with the proposed circuit.
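
The three operations named in the abstract can be pictured with a toy sketch. The Python below is an illustration only, not the authors' code or their measurement procedure, and it does not touch a real language model. The bag-of-words `encode`, the cosine similarity, the softmax `temperature`, and all function names and example data are assumptions made for this sketch.

```python
import math
from collections import Counter


def encode(text, vocab):
    """Step 1 (toy stand-in): bag-of-words count vector over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [float(counts[w]) for w in vocab]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def merge(demos, vocab):
    """Step 2 (toy stand-in): pair each demonstration encoding with its label."""
    return [(encode(text, vocab), label) for text, label in demos]


def retrieve_and_copy(query_vec, joint, temperature=0.1):
    """Step 3 (toy stand-in): softmax over query-demo similarities,
    then accumulate the resulting weight on each demonstration's label."""
    sims = [cosine(query_vec, key) / temperature for key, _ in joint]
    top = max(sims)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in sims]
    total = sum(weights)
    scores = {}
    for w, (_, label) in zip(weights, joint):
        scores[label] = scores.get(label, 0.0) + w / total
    return scores


def predict(query, demos, vocab):
    """Run the three toy steps and return the highest-scoring label."""
    joint = merge(demos, vocab)
    scores = retrieve_and_copy(encode(query, vocab), joint)
    return max(scores, key=scores.get), scores


# Hypothetical sentiment demonstrations, invented for this example.
VOCAB = ["great", "good", "awful", "boring", "movie", "plot"]
DEMOS = [
    ("great movie", "positive"),
    ("awful plot", "negative"),
    ("good plot", "positive"),
    ("boring movie", "negative"),
]

label, scores = predict("good movie great plot", DEMOS, VOCAB)
print(label, scores)
```

The sketch keeps each step as a separate function so that one step can be swapped or disabled in isolation, loosely mirroring the idea of ablating a single stage. It says nothing about how strongly a real model relies on each stage.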

Related papers

Take Off the Training Wheels Progressive In-Context Learning for Effective Alignment [22.224737528266598]
In this paper, we investigate the impact of demonstrations on token representations within alignment tasks. We propose an efficient Progressive In-Context Alignment (PICA) method consisting of two stages. Our work highlights the application of ICL for alignment and calls for a deeper understanding of ICL for complex generations.
arXiv Detail & Related papers (2025-03-13T02:01:02Z)
Unveiling In-Context Learning: A Coordinate System to Understand Its Working Mechanism [28.751003584429615]
Large language models (LLMs) exhibit remarkable in-context learning capabilities. Recent research presents two conflicting views on ICL. We provide a Two-Dimensional Coordinate System that unifies both views into a systematic framework.
arXiv Detail & Related papers (2024-07-24T05:26:52Z)
Implicit In-context Learning [37.0562059811099]
In-context Learning (ICL) empowers large language models to adapt to unseen tasks during inference by prefixing a few demonstration examples prior to test queries. We introduce Implicit In-context Learning (I2CL), an innovative paradigm that addresses the challenges associated with traditional ICL by absorbing demonstration examples within the activation space. I2CL achieves few-shot performance with zero-shot cost and exhibits robustness against the variation of demonstration examples.
arXiv Detail & Related papers (2024-05-23T14:57:52Z)
Does In-Context Learning Really Learn? Rethinking How Large Language Models Respond and Solve Tasks via In-Context Learning [41.606494950216764]
In-context Learning (ICL) has emerged as a powerful capability alongside the development of scaled-up large language models (LLMs). This paper decomposes the overall performance of ICL into three dimensions: label space, format, and discrimination. We show that ICL exhibits significant efficacy in regulating the label space and format, which helps LLMs respond to desired label words.
arXiv Detail & Related papers (2024-04-11T08:20:10Z)
Identifying and Analyzing Performance-Critical Tokens in Large Language Models [52.404072802235234]
We study how large language models learn to perform tasks from demonstrations. Our work sheds light on this process and deepens our understanding of the roles different types of tokens play in large language models.
arXiv Detail & Related papers (2024-01-20T20:55:21Z)
Improving Input-label Mapping with Demonstration Replay for In-context Learning [67.57288926736923]
In-context learning (ICL) is an emerging capability of large autoregressive language models. We propose a novel ICL method called Sliding Causal Attention (RdSca). We show that our method significantly improves the input-label mapping in ICL demonstrations.
arXiv Detail & Related papers (2023-10-30T14:29:41Z)
Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning [77.7070536959126]
In-context learning (ICL) emerges as a promising capability of large language models (LLMs). In this paper, we investigate the working mechanism of ICL through an information flow lens. We introduce an anchor re-weighting method to improve ICL performance, a demonstration compression technique to expedite inference, and an analysis framework for diagnosing ICL errors in GPT2-XL.
arXiv Detail & Related papers (2023-05-23T15:26:20Z)
Iterative Forward Tuning Boosts In-Context Learning in Language Models [88.25013390669845]
In this study, we introduce a novel two-stage framework to boost in-context learning in large language models (LLMs). Specifically, our framework delineates the ICL process into two distinct stages: Deep-Thinking and test stages. The Deep-Thinking stage incorporates a unique attention mechanism, i.e., iterative enhanced attention, which enables multiple rounds of information accumulation.
arXiv Detail & Related papers (2023-05-22T13:18:17Z)
Explaining Emergent In-Context Learning as Kernel Regression [61.57151500616111]
Large language models (LLMs) have initiated a paradigm shift in transfer learning. In this paper, we investigate the reason why a transformer-based language model can accomplish in-context learning after pre-training. We find that during ICL, the attention and hidden features in LLMs match the behaviors of a kernel regression.
arXiv Detail & Related papers (2023-05-22T06:45:02Z)
What Makes Good In-context Demonstrations for Code Intelligence Tasks with LLMs? [60.668318972782295]
Large language models have shown the ability of in-context learning (ICL). ICL employs task instructions and a few examples as demonstrations, and then inputs the demonstrations to the language models for making predictions. It is important to systematically investigate how to construct a good demonstration for code-related tasks.
arXiv Detail & Related papers (2023-04-15T15:13:58Z)
ICL-D3IE: In-Context Learning with Diverse Demonstrations Updating for Document Information Extraction [56.790794611002106]
Large language models (LLMs) have demonstrated remarkable results in various natural language processing (NLP) tasks with in-context learning. We propose a simple but effective in-context learning framework called ICL-D3IE. Specifically, we extract the most difficult and distinct segments from hard training documents as hard demonstrations.
arXiv Detail & Related papers (2023-03-09T06:24:50Z)

This list is automatically generated from the titles and abstracts of the papers on this site.