Farther the Shift, Sparser the Representation: Analyzing OOD Mechanisms in LLMs
- URL: http://arxiv.org/abs/2603.03415v1
- Date: Tue, 03 Mar 2026 18:48:15 GMT
- Title: Farther the Shift, Sparser the Representation: Analyzing OOD Mechanisms in LLMs
- Authors: Mingyu Jin, Yutong Yin, Jingcheng Niu, Qingcheng Zeng, Wujiang Xu, Mengnan Du, Wei Cheng, Zhaoran Wang, Tianlong Chen, Dimitris N. Metaxas,
- Abstract summary: We investigate how Large Language Models adapt their internal representations when encountering inputs of increasing difficulty.<n>We reveal a consistent and quantifiable phenomenon: as task difficulty increases, the last hidden states of LLMs become substantially sparser.<n>This sparsity--difficulty relation is observable across diverse models and domains.
- Score: 100.02824137397464
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we investigate how Large Language Models (LLMs) adapt their internal representations when encountering inputs of increasing difficulty, quantified as the degree of out-of-distribution (OOD) shift. We reveal a consistent and quantifiable phenomenon: as task difficulty increases, whether through harder reasoning questions, longer contexts, or adding answer choices, the last hidden states of LLMs become substantially sparser. In short, \textbf{\textit{the farther the shift, the sparser the representations}}. This sparsity--difficulty relation is observable across diverse models and domains, suggesting that language models respond to unfamiliar or complex inputs by concentrating computation into specialized subspaces in the last hidden state. Through a series of controlled analyses with a learning dynamic explanation, we demonstrate that this sparsity is not incidental but an adaptive mechanism for stabilizing reasoning under OOD. Leveraging this insight, we design \textit{Sparsity-Guided Curriculum In-Context Learning (SG-ICL)}, a strategy that explicitly uses representation sparsity to schedule few-shot demonstrations, leading to considerable performance enhancements. Our study provides new mechanistic insights into how LLMs internalize OOD challenges. The source code is available at the URL: https://github.com/MingyuJ666/sparsityLLM.
Related papers
- Where MLLMs Attend and What They Rely On: Explaining Autoregressive Token Generation [59.40886078302025]
Multimodal large language models (MLLMs) have demonstrated remarkable capabilities in aligning visual inputs with natural language outputs.<n>Yet, the extent to which generated tokens depend on visual modalities remains poorly understood.<n>We present a lightweight black-box framework for explaining autoregressive token generation in MLLMs.
arXiv Detail & Related papers (2025-09-26T15:38:42Z) - Explaining multimodal LLMs via intra-modal token interactions [55.27436637894534]
Multimodal Large Language Models (MLLMs) have achieved remarkable success across diverse vision-language tasks, yet their internal decision-making mechanisms remain insufficiently understood.<n>We propose enhancing interpretability by leveraging intra-modal interaction.
arXiv Detail & Related papers (2025-09-26T14:39:13Z) - AbstRaL: Augmenting LLMs' Reasoning by Reinforcing Abstract Thinking [38.8730008545358]
Large language models (LLMs) often lack robustness in their reasoning.<n>Our approach focuses on "abstracting" reasoning problems.<n>We find that this abstraction process is better acquired through reinforcement learning (RL) than just supervised fine-tuning.
arXiv Detail & Related papers (2025-06-09T13:34:50Z) - MLLMs are Deeply Affected by Modality Bias [158.64371871084478]
Recent advances in Multimodal Large Language Models (MLLMs) have shown promising results in integrating diverse modalities such as texts and images.<n>MLLMs are heavily influenced by modality bias, often relying on language while under-utilizing other modalities like visual inputs.<n>This paper argues that MLLMs are deeply affected by modality bias, highlighting its manifestations across various tasks.
arXiv Detail & Related papers (2025-05-24T11:49:31Z) - MIR-Bench: Can Your LLM Recognize Complicated Patterns via Many-Shot In-Context Reasoning? [21.638848019633595]
We propose MIR-Bench, the first many-shot in-context reasoning benchmark for pattern recognition.<n>We study many novel problems for many-shot in-context reasoning, and acquired many insightful findings.
arXiv Detail & Related papers (2025-02-14T06:05:12Z) - Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search [57.28671084993782]
Large language models (LLMs) have demonstrated remarkable reasoning capabilities across diverse domains.<n>Recent studies have shown that increasing test-time computation enhances LLMs' reasoning capabilities.<n>We propose a two-stage training paradigm: 1) a small-scale format tuning stage to internalize the COAT reasoning format and 2) a large-scale self-improvement stage leveraging reinforcement learning.
arXiv Detail & Related papers (2025-02-04T17:26:58Z) - Randomly Sampled Language Reasoning Problems Elucidate Limitations of In-Context Learning [9.75748930802634]
We study the power of in-context-learning to improve machine learning performance.<n>We consider an extremely simple domain: next token prediction on simple language tasks.<n>We find that LLMs uniformly underperform n-gram models on this task.
arXiv Detail & Related papers (2025-01-06T07:57:51Z) - Sparse Autoencoders Reveal Temporal Difference Learning in Large Language Models [7.115323364355489]
In-context learning, the ability to adapt based on a few examples in the input prompt, is a ubiquitous feature of large language models (LLMs)
We first show that Llama $3$ $70$B can solve simple RL problems in-context.
We then analyze the residual stream of Llama using Sparse Autoencoders (SAEs) and find representations that closely match temporal difference (TD) errors.
arXiv Detail & Related papers (2024-10-02T06:51:12Z) - Analyzing the Role of Semantic Representations in the Era of Large Language Models [104.18157036880287]
We investigate the role of semantic representations in the era of large language models (LLMs)
We propose an AMR-driven chain-of-thought prompting method, which we call AMRCoT.
We find that it is difficult to predict which input examples AMR may help or hurt on, but errors tend to arise with multi-word expressions.
arXiv Detail & Related papers (2024-05-02T17:32:59Z) - In-context Learning Generalizes, But Not Always Robustly: The Case of Syntax [36.98247762224868]
In-context learning (ICL) is now a common method for teaching large language models (LLMs) new tasks.
Do models infer the underlying structure of the task defined by the context, or do they rely on superficial generalizations that only generalize to identically distributed examples?
In experiments with models from the GPT, PaLM, and Llama 2 families, we find large variance across LMs.
The variance is explained more by the composition of the pre-training corpus and supervision methods than by model size.
arXiv Detail & Related papers (2023-11-13T23:52:43Z) - Concise and Organized Perception Facilitates Reasoning in Large Language Models [31.238220405009617]
Exploiting large language models (LLMs) to tackle reasoning has garnered growing attention.<n>It still remains highly challenging to achieve satisfactory results in complex logical problems, characterized by plenty of premises within the context and requiring multi-hop reasoning.<n>In this work, we first examine the mechanism from the perspective of information flow and reveal that LLMs confront difficulties akin to human-like cognitive biases when dealing with disordered and irrelevant content in reasoning tasks.
arXiv Detail & Related papers (2023-10-05T04:47:49Z) - IERL: Interpretable Ensemble Representation Learning -- Combining
CrowdSourced Knowledge and Distributed Semantic Representations [11.008412414253662]
Large Language Models (LLMs) encode meanings of words in the form of distributed semantics.
Recent studies have shown that LLMs tend to generate unintended, inconsistent, or wrong texts as outputs.
We propose a novel ensemble learning method, Interpretable Ensemble Representation Learning (IERL), that systematically combines LLM and crowdsourced knowledge representations.
arXiv Detail & Related papers (2023-06-24T05:02:34Z) - Shortcut Learning of Large Language Models in Natural Language
Understanding [119.45683008451698]
Large language models (LLMs) have achieved state-of-the-art performance on a series of natural language understanding tasks.
They might rely on dataset bias and artifacts as shortcuts for prediction.
This has significantly affected their generalizability and adversarial robustness.
arXiv Detail & Related papers (2022-08-25T03:51:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.