From Compression to Expansion: A Layerwise Analysis of In-Context Learning
- URL: http://arxiv.org/abs/2505.17322v1
- Date: Thu, 22 May 2025 22:22:03 GMT
- Title: From Compression to Expansion: A Layerwise Analysis of In-Context Learning
- Authors: Jiachen Jiang, Yuxin Dong, Jinxin Zhou, Zhihui Zhu
- Abstract summary: In-context learning (ICL) enables large language models to adapt to new tasks without weight updates by learning from demonstration sequences. We conduct a statistical geometric analysis of ICL representations to investigate how task-specific information is captured across layers. Our findings reveal an intriguing layerwise dynamic in ICL, highlight how structured representations emerge within LLMs, and showcase that analyzing internal representations can facilitate a deeper understanding of model behavior.
- Score: 20.64102133977965
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In-context learning (ICL) enables large language models (LLMs) to adapt to new tasks without weight updates by learning from demonstration sequences. While ICL shows strong empirical performance, its internal representational mechanisms are not yet well understood. In this work, we conduct a statistical geometric analysis of ICL representations to investigate how task-specific information is captured across layers. Our analysis reveals an intriguing phenomenon, which we term *Layerwise Compression-Expansion*: early layers progressively produce compact and discriminative representations that encode task information from the input demonstrations, while later layers expand these representations to incorporate the query and generate the prediction. This phenomenon is observed consistently across diverse tasks and a range of contemporary LLM architectures. We demonstrate that it has important implications for ICL performance -- improving with model size and the number of demonstrations -- and for robustness in the presence of noisy examples. To further understand the effect of the compact task representation, we propose a bias-variance decomposition and provide a theoretical analysis showing how attention mechanisms contribute to reducing both variance and bias, thereby enhancing performance as the number of demonstrations increases. Our findings reveal an intriguing layerwise dynamic in ICL, highlight how structured representations emerge within LLMs, and showcase that analyzing internal representations can facilitate a deeper understanding of model behavior.
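One way to picture the compression phase is a per-layer class-separation statistic, in the spirit of neural collapse analyses: within-class scatter relative to between-class scatter of hidden states, where a lower ratio means more compact, more discriminative task representations. This is a minimal sketch with synthetic stand-in activations, not necessarily the paper's exact metric; in practice the hidden states would come from the model (e.g., `output_hidden_states=True` in `transformers`).

```python
import numpy as np

def class_separation(H, labels):
    """Within-class scatter divided by between-class scatter for one
    layer's representations H (n_examples x dim). Lower = more compact,
    more discriminative (i.e., more 'compressed') task representations."""
    classes = np.unique(labels)
    mu = H.mean(axis=0)
    within, between = 0.0, 0.0
    for c in classes:
        Hc = H[labels == c]
        mu_c = Hc.mean(axis=0)
        within += ((Hc - mu_c) ** 2).sum()
        between += len(Hc) * ((mu_c - mu) ** 2).sum()
    return within / between

# Toy example: 12 layers of synthetic hidden states for 100 prompts,
# with task separation growing across layers.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=100)
for layer in range(12):
    H = rng.normal(size=(100, 64)) + labels[:, None] * (layer / 6.0)
    print(layer, round(class_separation(H, labels), 3))
```

Tracking this ratio across depth would surface the compression-expansion shape the abstract describes: it falls through the early layers and the trend loosens where the query is incorporated.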
Related papers
- Provable Low-Frequency Bias of In-Context Learning of Representations [19.066378730056275]
In-context learning (ICL) enables large language models (LLMs) to acquire new behaviors from the input sequence alone, without any parameter updates. Recent studies have shown that ICL can surpass the original meanings learned during pretraining by internalizing the structure of the data-generating process (DGP) of the prompt into the hidden representations. We present the first rigorous explanation of this phenomenon by introducing a unified framework of double convergence. This double convergence process leads to an implicit bias toward smooth (low-frequency) representations, which we prove analytically and verify empirically.
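A rough sketch of what "low-frequency" means here: treat the prompt's DGP as a graph over concepts, expand the concept representations in the graph-Laplacian eigenbasis, and measure how much spectral energy sits in the low-frequency (smooth) eigenvectors. The ring graph, random stand-in representations, and half/half energy split below are illustrative assumptions, not the paper's construction.

```python
import numpy as np

# Hypothetical DGP: concepts arranged on a ring graph (illustrative).
n = 8
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1
L = np.diag(A.sum(axis=1)) - A              # graph Laplacian
eigvals, eigvecs = np.linalg.eigh(L)        # columns sorted low -> high frequency

# Mean hidden representation per concept (n x dim), random stand-ins here.
R = np.random.default_rng(1).normal(size=(n, 16))
coeffs = eigvecs.T @ R                      # spectral coefficients
energy = (coeffs ** 2).sum(axis=1)
low = energy[: n // 2].sum() / energy.sum() # fraction in low frequencies
print(f"low-frequency energy fraction: {low:.2f}")
```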
arXiv Detail & Related papers (2025-07-17T21:19:32Z)
- Illusion or Algorithm? Investigating Memorization, Emergence, and Symbolic Processing in In-Context Learning [48.67380502157004]
Large-scale Transformer language models (LMs) trained solely on next-token prediction with web-scale data can solve a wide range of tasks. The mechanism behind this capability, known as in-context learning (ICL), remains both controversial and poorly understood.
arXiv Detail & Related papers (2025-05-16T08:50:42Z)
- Does Representation Matter? Exploring Intermediate Layers in Large Language Models [22.704926222438456]
We investigate the quality of intermediate representations in large language models (LLMs). We find that intermediate layers often yield more informative representations for downstream tasks than the final layers. Our results illuminate the internal mechanics of LLMs and guide strategies for architectural optimization and training.
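The standard way to test a claim like this is a per-layer linear probe: fit a simple classifier on each layer's activations and compare accuracies. A minimal sketch with synthetic stand-in activations (in practice these would be extracted hidden states), where the middle layer is constructed to be most informative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=200)

# hidden_states[l]: (n_examples x dim) activations at layer l; synthetic here,
# with class signal peaking at an intermediate layer.
hidden_states = [rng.normal(size=(200, 32)) + labels[:, None] * s
                 for s in (0.1, 0.5, 1.0, 0.6)]

for layer, H in enumerate(hidden_states):
    acc = cross_val_score(LogisticRegression(max_iter=1000), H, labels, cv=5).mean()
    print(f"layer {layer}: probe accuracy {acc:.2f}")
```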
arXiv Detail & Related papers (2024-12-12T18:48:51Z)
- Interpreting token compositionality in LLMs: A robustness analysis [10.777646083061395]
Constituent-Aware Pooling (CAP) is a methodology designed to analyse how large language models process linguistic structures. CAP intervenes in model activations through constituent-based pooling at various model levels. Our findings highlight fundamental limitations in current transformer architectures regarding compositional semantics processing and model interpretability.
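The core intervention, as described, is pooling activations over constituent spans. A minimal sketch of that step; the choice of mean pooling and the span boundaries are assumptions about the details:

```python
import numpy as np

def constituent_pool(acts, spans):
    """Replace each token activation within a constituent span by the
    span's mean activation. acts: (seq_len x dim); spans: list of
    (start, end) index pairs, end exclusive."""
    pooled = acts.copy()
    for start, end in spans:
        pooled[start:end] = acts[start:end].mean(axis=0)
    return pooled

acts = np.random.default_rng(0).normal(size=(6, 8))
# e.g., "the big dog | chased two cats": two constituents
print(constituent_pool(acts, [(0, 3), (3, 6)]).shape)
```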
arXiv Detail & Related papers (2024-10-16T18:10:50Z)
- DETAIL: Task DEmonsTration Attribution for Interpretable In-context Learning [75.68193159293425]
In-context learning (ICL) allows transformer-based language models to learn a specific task with a few "task demonstrations" without updating their parameters. We propose an influence function-based attribution technique, DETAIL, that addresses the specific characteristics of ICL. We experimentally demonstrate the wide applicability of DETAIL by showing that attribution scores obtained on white-box models transfer to black-box models in improving model performance.
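DETAIL itself uses influence functions; a much cruder leave-one-out baseline conveys the attribution idea of scoring each demonstration by its effect on the query loss. Here `loss_fn` is a hypothetical callable that runs the model on a prompt built from the given demonstrations, and the dummy lambda exists only to make the sketch executable:

```python
def attribute_demos(demos, query, loss_fn):
    """Score each demonstration by how much the query loss increases
    when it is removed (higher = more helpful). loss_fn is a
    hypothetical callable: (list_of_demos, query) -> float loss."""
    base = loss_fn(demos, query)
    return [loss_fn(demos[:i] + demos[i + 1:], query) - base
            for i in range(len(demos))]

# Dummy loss for illustration: longer contexts 'help' more.
scores = attribute_demos(["a->1", "b->2", "c->3"], "d->?",
                         lambda d, q: 1.0 / (1 + len(d)))
print(scores)
```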
arXiv Detail & Related papers (2024-05-22T15:52:52Z)
- Decoding In-Context Learning: Neuroscience-inspired Analysis of Representations in Large Language Models [5.062236259068678]
We investigate how large language models (LLMs) exhibit remarkable performance improvement through in-context learning (ICL).
We propose novel methods for parameterized probing and for measuring the ratio of attention to relevant vs. irrelevant information in Llama-2 70B and Vicuna 13B.
Our analyses revealed a meaningful correlation between improvements in behavior after ICL and changes in both embeddings and attention weights across LLM layers.
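The attention-ratio measurement reduces to comparing how much attention mass a query position places on relevant versus irrelevant token positions. A minimal sketch assuming a single head's row-stochastic attention matrix and known position sets (both are assumptions for illustration):

```python
import numpy as np

def attention_ratio(attn, relevant, irrelevant):
    """Ratio of attention mass the final (query) position places on
    relevant vs. irrelevant token positions. attn: (seq_len x seq_len)
    row-stochastic attention weights, e.g., one head at one layer."""
    row = attn[-1]
    return row[relevant].sum() / row[irrelevant].sum()

attn = np.random.default_rng(0).dirichlet(np.ones(10), size=10)
print(attention_ratio(attn, relevant=[2, 3], irrelevant=[5, 6, 7]))
```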
arXiv Detail & Related papers (2023-09-30T09:01:35Z)
- Scaling In-Context Demonstrations with Structured Attention [75.41845145597875]
We propose a better architectural design for in-context learning.
Structured Attention for In-Context Learning (SAICL) replaces full attention with a structured attention mechanism.
We show that SAICL achieves comparable or better performance than full attention while obtaining up to 3.4x inference speed-up.
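One plausible form of such a structured pattern, sketched below, makes each demonstration a causal block that cannot attend to the other demonstrations, while the query attends to everything; SAICL's actual mask may differ in its details. Isolating the demonstrations is what would allow them to be encoded independently and cached, which is a natural source of the reported inference speed-up.

```python
import numpy as np

def saicl_style_mask(demo_lens, query_len):
    """Boolean attention mask (True = may attend). Each demonstration
    is a causal block isolated from the others; the query attends to
    all demonstrations and to itself causally."""
    total = sum(demo_lens) + query_len
    mask = np.zeros((total, total), dtype=bool)
    start = 0
    for n in demo_lens:
        mask[start:start + n, start:start + n] = np.tril(np.ones((n, n), dtype=bool))
        start += n
    q = start
    mask[q:, :q] = True                                  # query sees all demos
    mask[q:, q:] = np.tril(np.ones((query_len, query_len), dtype=bool))
    return mask

print(saicl_style_mask([2, 2], 3).astype(int))
```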
arXiv Detail & Related papers (2023-07-05T23:26:01Z)
- Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning [77.7070536959126]
In-context learning (ICL) emerges as a promising capability of large language models (LLMs).
In this paper, we investigate the working mechanism of ICL through an information flow lens.
We introduce an anchor re-weighting method to improve ICL performance, a demonstration compression technique to expedite inference, and an analysis framework for diagnosing ICL errors in GPT2-XL.
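The compression technique builds on the anchor hypothesis: if label words aggregate each demonstration's information, the context can be shrunk to the hidden states at those label positions plus the query. A toy sketch of that selection step; all position bookkeeping here is hypothetical:

```python
import numpy as np

def compress_to_anchors(hidden, label_positions, query_positions):
    """Keep only the label-word ('anchor') states from the context,
    plus the query states; the anchors are hypothesized to carry each
    demonstration's information. hidden: (seq_len x dim)."""
    keep = sorted(label_positions) + sorted(query_positions)
    return hidden[keep]

hidden = np.random.default_rng(0).normal(size=(20, 16))
# e.g., a label word at the end of each of three demonstrations,
# followed by a five-token query at the tail of the sequence.
print(compress_to_anchors(hidden, [4, 9, 14], [15, 16, 17, 18, 19]).shape)
```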
arXiv Detail & Related papers (2023-05-23T15:26:20Z)
- Iterative Forward Tuning Boosts In-Context Learning in Language Models [88.25013390669845]
In this study, we introduce a novel two-stage framework to boost in-context learning in large language models (LLMs).
Specifically, our framework delineates the ICL process into two distinct stages: Deep-Thinking and test stages.
The Deep-Thinking stage incorporates a unique attention mechanism, i.e., iterative enhanced attention, which enables multiple rounds of information accumulation.
arXiv Detail & Related papers (2023-05-22T13:18:17Z)
- Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-modal Structured Representations [70.41385310930846]
We present an end-to-end framework Structure-CLIP to enhance multi-modal structured representations.
We use scene graphs to guide the construction of semantic negative examples, which results in an increased emphasis on learning structured representations.
A Knowledge-Enhanced Encoder (KEE) is proposed to leverage scene graph knowledge (SGK) as input to further enhance structured representations.
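One natural reading of scene-graph-guided negatives is swapping the roles in a relation triple, producing a caption with the same words but a different structure. A toy sketch of that construction (the triple format is an assumption):

```python
def swap_negative(subject, relation, obj):
    """Build a hard semantic negative by swapping subject and object
    in a scene-graph triple: same words, different structure."""
    positive = f"{subject} {relation} {obj}"
    negative = f"{obj} {relation} {subject}"
    return positive, negative

print(swap_negative("the dog", "chases", "the cat"))
# ('the dog chases the cat', 'the cat chases the dog')
```

Such negatives force the model to attend to word order and relational structure rather than bag-of-words overlap, which is the stated goal of emphasizing structured representations.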
arXiv Detail & Related papers (2023-05-06T03:57:05Z)