Context Structure Reshapes the Representational Geometry of Language Models

URL: http://arxiv.org/abs/2601.22364v1
Date: Thu, 29 Jan 2026 22:17:17 GMT
Title: Context Structure Reshapes the Representational Geometry of Language Models
Authors: Eghbal A. Hosseini, Yuxuan Li, Yasaman Bahri, Declan Campbell, Andrew Kyle Lampinen,
Abstract summary: Large Language Models (LLMs) organize the representations of input sequences into straighter neural trajectories. Recent work has shown that in-context learning (ICL) can be reflected in representational changes. We measure representational straightening in Gemma 2 models across a diverse set of in-context tasks.
Score: 9.670218260803628
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Models (LLMs) have been shown to organize the representations of input sequences into straighter neural trajectories in their deep layers, which has been hypothesized to facilitate next-token prediction via linear extrapolation. Language models can also adapt to diverse tasks and learn new structure in context, and recent work has shown that this in-context learning (ICL) can be reflected in representational changes. Here we bring these two lines of research together to explore whether representational straightening occurs within a context during ICL. We measure representational straightening in Gemma 2 models across a diverse set of in-context tasks and uncover a dichotomy in how LLMs' representations change in context. In continual prediction settings (e.g., natural language, grid world traversal tasks), we observe that increasing context increases the straightness of neural sequence trajectories, which is correlated with improvement in model prediction. Conversely, in structured prediction settings (e.g., few-shot tasks), straightening is inconsistent: it is present only in phases of the task with explicit structure (e.g., repeating a template) and vanishes elsewhere. These results suggest that ICL is not a monolithic process. Instead, we propose that LLMs function like a Swiss Army knife: depending on task structure, the LLM dynamically selects between strategies, only some of which yield representational straightening.
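The abstract does not define how straightness is measured. A convention common in the straightening literature is curvature: the average angle between successive displacement vectors of a hidden-state trajectory, where 0 degrees means a perfectly straight path. The Python sketch below assumes that definition and one layer's states given as a (T, d) array; the function names are hypothetical, and the paper's exact metric, layer selection, and normalization may differ. It also shows the linear extrapolation that straight trajectories are hypothesized to make easy.

```python
import numpy as np


def trajectory_curvature(states: np.ndarray) -> float:
    """Mean angle, in degrees, between consecutive displacement vectors.

    Hypothetical helper. `states` has shape (T, d) with T >= 3: one
    hidden-state vector per token from a single layer. Returns 0.0 for a
    perfectly straight path.
    """
    diffs = np.diff(states, axis=0)                        # (T-1, d) steps
    norms = np.linalg.norm(diffs, axis=1, keepdims=True)
    units = diffs / np.clip(norms, 1e-12, None)            # unit directions
    cosines = np.sum(units[:-1] * units[1:], axis=1)       # cos between steps
    angles = np.degrees(np.arccos(np.clip(cosines, -1.0, 1.0)))
    return float(angles.mean())


def extrapolate_next(states: np.ndarray) -> np.ndarray:
    """Predict the next state by repeating the last displacement."""
    return states[-1] + (states[-1] - states[-2])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    straight = np.outer(np.arange(10), rng.normal(size=64))   # points on a line
    wiggly = np.cumsum(rng.normal(size=(10, 64)), axis=0)     # random walk
    print(trajectory_curvature(straight))  # near 0
    print(trajectory_curvature(wiggly))    # much larger
```

On a straight trajectory, `extrapolate_next` lands exactly on the next point; the more curved the path, the larger the extrapolation error, which is the intuition behind the hypothesis quoted in the abstract.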

Related papers

Stable Diffusion Models are Secretly Good at Visual In-Context Learning [9.829303881652548]
We show that off-the-shelf Stable Diffusion models can be repurposed for visual in-context learning (V-ICL). We formulate an in-place attention re-computation within the self-attention layers of the Stable Diffusion architecture. We show that this repurposed Stable Diffusion model is able to adapt to six different tasks.
arXiv (2025-08-13T17:08:22Z)
Contextualize-then-Aggregate: Circuits for In-Context Learning in Gemma-2 2B [51.74607395697567]
In-Context Learning (ICL) is an intriguing ability of large language models (LLMs). We use causal interventions to identify information flow in Gemma-2 2B for five naturalistic ICL tasks. We find that the model infers task information using a two-step strategy we call contextualize-then-aggregate.
arXiv (2025-03-31T18:33:55Z)
The representation landscape of few-shot learning and fine-tuning in large language models [43.76048699313088]
In-context learning (ICL) and supervised fine-tuning (SFT) are two common strategies for improving the performance of modern large language models (LLMs). We analyze the probability landscape of their hidden representations in the two cases. We find that ICL and SFT create very different internal structures, in both cases undergoing a sharp transition in the middle of the network.
arXiv (2024-09-05T16:15:12Z)
Parallel Structures in Pre-training Data Yield In-Context Learning [41.27837171531926]
We study what patterns of the pre-training data contribute to in-context learning (ICL). We find that LMs' ICL ability depends on parallel structures in the pre-training data.
arXiv (2024-02-19T20:40:48Z)
In-Context Language Learning: Architectures and Algorithms [73.93205821154605]
We study ICL through the lens of a new family of model problems we term in-context language learning (ICLL). We evaluate a diverse set of neural sequence models on regular ICLL tasks.
arXiv (2024-01-23T18:59:21Z)
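In in-context language learning, the model must infer a formal language from example strings in its context. Assuming, as "regular ICLL tasks" suggests, that each context is built from strings of a randomly generated regular language, the sketch below samples one such context from a random deterministic finite automaton. The automaton size, alphabet, out-degree, and " | " separator are hypothetical choices, not the paper's settings.

```python
import random
from typing import Dict, List, Tuple

DFA = Dict[int, Dict[str, int]]   # state -> {allowed symbol: next state}


def random_dfa(num_states: int, alphabet: List[str], out_degree: int,
               rng: random.Random) -> DFA:
    """Give each state `out_degree` allowed symbols with random targets."""
    return {
        s: {sym: rng.randrange(num_states) for sym in rng.sample(alphabet, out_degree)}
        for s in range(num_states)
    }


def sample_string(dfa: DFA, length: int, rng: random.Random, start: int = 0) -> str:
    """Random walk through the automaton; every emitted symbol is allowed."""
    state, out = start, []
    for _ in range(length):
        sym = rng.choice(sorted(dfa[state]))
        out.append(sym)
        state = dfa[state][sym]
    return "".join(out)


def make_context(num_strings: int = 4, length: int = 8, seed: int = 0) -> Tuple[DFA, str]:
    """Build one prompt: several strings generated by the same hidden DFA."""
    rng = random.Random(seed)
    dfa = random_dfa(num_states=4, alphabet=list("abcdef"), out_degree=2, rng=rng)
    strings = [sample_string(dfa, length, rng) for _ in range(num_strings)]
    return dfa, " | ".join(strings)


if __name__ == "__main__":
    dfa, context = make_context()
    print(context)
```

Because each state allows only a few symbols, a model that has inferred the hidden automaton from earlier strings in the context can predict which next symbols are legal.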
In-context Learning Generalizes, But Not Always Robustly: The Case of Syntax [36.98247762224868]
In-context learning (ICL) is now a common method for teaching large language models (LLMs) new tasks. Do models infer the underlying structure of the task defined by the context, or do they rely on superficial generalizations that only generalize to identically distributed examples? In experiments with models from the GPT, PaLM, and Llama 2 families, we find large variance across LMs. The variance is explained more by the composition of the pre-training corpus and supervision methods than by model size.
arXiv (2023-11-13T23:52:43Z)
Instruction Position Matters in Sequence Generation with Large Language Models [67.87516654892343]
Large language models (LLMs) are capable of performing conditional sequence generation tasks, such as translation or summarization. We propose enhancing the instruction-following capability of LLMs by shifting the position of task instructions after the input sentences.
arXiv (2023-08-23T12:36:57Z)
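The summary describes moving the task instruction from before the input sentence to after it. The sketch below shows only that reordering for a single example; the function name, the newline separator, and the sample strings are hypothetical, and the paper's actual prompt templates are not reproduced here.

```python
def build_prompt(instruction: str, source: str, instruction_last: bool) -> str:
    """Assemble a prompt with the instruction before or after the input.

    Hypothetical helper: the separator and layout are illustrative only.
    """
    parts = [source, instruction] if instruction_last else [instruction, source]
    return "\n".join(parts)


if __name__ == "__main__":
    instruction = "Translate this sentence into German."
    source = "The weather is nice today."
    print(build_prompt(instruction, source, instruction_last=False))  # instruction first
    print("---")
    print(build_prompt(instruction, source, instruction_last=True))   # instruction last
```

Keeping prompt assembly in one function makes it straightforward to compare the two orderings with everything else held fixed.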
Understanding Emergent In-Context Learning from a Kernel Regression Perspective [55.95455089638838]
Large language models (LLMs) have initiated a paradigm shift in transfer learning. This paper proposes a kernel-regression perspective for understanding LLMs' ICL behaviors when faced with in-context examples. We find that during ICL, the attention and hidden features in LLMs match the behaviors of a kernel regression.
arXiv (2023-05-22T06:45:02Z)
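The kernel-regression view rests on a standard identity: for a single query, softmax attention equals a Nadaraya-Watson estimator with an exponential dot-product kernel, where the keys play the role of in-context inputs and the values their targets. The sketch below checks that identity numerically; the function names are hypothetical, and it illustrates the analogy only, not the cited paper's analysis of hidden features.

```python
import numpy as np


def softmax_attention(q: np.ndarray, K: np.ndarray, V: np.ndarray) -> np.ndarray:
    """Scaled dot-product attention for a single query vector q."""
    scores = K @ q / np.sqrt(q.shape[0])
    weights = np.exp(scores - scores.max())   # subtract max for numerical stability
    return (weights / weights.sum()) @ V


def exp_dot_kernel(a: np.ndarray, b: np.ndarray) -> float:
    """Unnormalized exponential kernel that matches the attention score."""
    return float(np.exp(a @ b / np.sqrt(a.shape[0])))


def nadaraya_watson(x, X, Y, kernel) -> np.ndarray:
    """Kernel regression: kernel-weighted average of the targets Y."""
    w = np.array([kernel(x, xi) for xi in X])
    return (w / w.sum()) @ Y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q, K, V = rng.normal(size=8), rng.normal(size=(5, 8)), rng.normal(size=(5, 3))
    print(np.allclose(softmax_attention(q, K, V),
                      nadaraya_watson(q, K, V, exp_dot_kernel)))  # True
```

Subtracting the maximum score in `softmax_attention` rescales every weight by the same constant, so it cancels in the normalization and the two functions agree.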
Autoregressive Structured Prediction with Language Models [73.11519625765301]
We describe an approach to model structures as sequences of actions in an autoregressive manner with PLMs. Our approach achieves the new state-of-the-art on all the structured prediction tasks we looked at.
arXiv (2022-10-26T13:27:26Z)
SLM: Learning a Discourse Language Representation with Sentence Unshuffling [53.42814722621715]
We introduce Sentence-level Language Modeling, a new pre-training objective for learning a discourse language representation. We show that this feature of our model improves the performance of the original BERT by large margins.
arXiv (2020-10-30T13:33:41Z)
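The summary names the objective but not its details. The sketch below assumes the simplest form of sentence unshuffling: shuffle a passage's sentences and train the model to recover the original order. The function name and the target encoding are hypothetical; the SLM paper's actual input format and decoding procedure may differ.

```python
import random
from typing import List, Optional, Tuple


def make_unshuffling_example(
    sentences: List[str], seed: Optional[int] = None
) -> Tuple[List[str], List[int]]:
    """Shuffle a passage's sentences and return the order that restores it.

    target[j] is the index, within `shuffled`, of the j-th original sentence,
    so [shuffled[i] for i in target] recovers the original passage.
    """
    rng = random.Random(seed)
    order = list(range(len(sentences)))
    rng.shuffle(order)                       # shuffled[k] = sentences[order[k]]
    shuffled = [sentences[i] for i in order]
    target = [order.index(j) for j in range(len(sentences))]
    return shuffled, target


if __name__ == "__main__":
    passage = ["It rained.", "The game was cancelled.", "Fans went home."]
    shuffled, target = make_unshuffling_example(passage, seed=3)
    print(shuffled, target)
    print([shuffled[i] for i in target] == passage)  # True
```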

This list is automatically generated from the titles and abstracts of the papers in this site.