Demystifying the Roles of LLM Layers in Retrieval, Knowledge, and Reasoning
- URL: http://arxiv.org/abs/2510.02091v3
- Date: Fri, 31 Oct 2025 19:28:21 GMT
- Title: Demystifying the Roles of LLM Layers in Retrieval, Knowledge, and Reasoning
- Authors: Xinyuan Song, Keyu Wang, PengXiang Li, Lu Yin, Shiwei Liu,
- Abstract summary: Studies suggest that the deeper layers of Large Language Models (LLMs) contribute little to representation learning and can often be removed without significant performance loss. We present a systematic study of depth utilization across diverse dimensions, including evaluation protocols, task categories, and model architectures. Our analysis confirms that very deep layers are generally less effective than earlier ones, but their contributions vary substantially with the evaluation setting.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent studies suggest that the deeper layers of Large Language Models (LLMs) contribute little to representation learning and can often be removed without significant performance loss. However, such claims are typically drawn from narrow evaluations and may overlook important aspects of model behavior. In this work, we present a systematic study of depth utilization across diverse dimensions, including evaluation protocols, task categories, and model architectures. Our analysis confirms that very deep layers are generally less effective than earlier ones, but their contributions vary substantially with the evaluation setting. Under likelihood-based metrics without generation, pruning most layers preserves performance, with only the initial few being critical. By contrast, generation-based evaluation uncovers indispensable roles for middle and deeper layers in enabling reasoning and maintaining long-range coherence. We further find that knowledge and retrieval are concentrated in shallow components, whereas reasoning accuracy relies heavily on deeper layers -- yet can be reshaped through distillation. These results highlight that depth usage in LLMs is highly heterogeneous and context-dependent, underscoring the need for task-, metric-, and model-aware perspectives in both interpreting and compressing large models.
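To make the abstract's contrast between evaluation regimes concrete, here is a minimal sketch, assuming a Hugging Face Llama-style checkpoint; the model name, pruning ratio, and prompt are illustrative choices, not the paper's protocol. It drops the deepest third of decoder layers and compares a likelihood-based score against free-running generation:
```python
# Minimal sketch: drop a contiguous suffix of deep decoder layers from a
# Llama-style causal LM, then compare a likelihood-based metric (perplexity)
# with generation-based behavior on the same prompt.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Llama-2-7b-hf"  # illustrative; any decoder-only checkpoint works
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)

def prune_deep_layers(model, keep_ratio=2 / 3):
    """Drop a contiguous suffix of decoder layers; keeping a prefix
    preserves the contiguous layer indices that KV caching relies on."""
    layers = model.model.layers          # nn.ModuleList of decoder blocks
    n_keep = int(len(layers) * keep_ratio)
    model.model.layers = layers[:n_keep]
    model.config.num_hidden_layers = n_keep
    return model

@torch.no_grad()
def perplexity(model, text):
    """Likelihood-based evaluation: scores the text without generating."""
    ids = tok(text, return_tensors="pt").input_ids
    return torch.exp(model(ids, labels=ids).loss).item()

prompt = "The capital of France is Paris. The capital of Italy is"
pruned = prune_deep_layers(model)
print("perplexity:", perplexity(pruned, prompt))     # often changes little
out = pruned.generate(tok(prompt, return_tensors="pt").input_ids,
                      max_new_tokens=40)             # coherence may degrade
print(tok.decode(out[0], skip_special_tokens=True))
```
Per the paper's findings, the perplexity of such a pruned model can look deceptively healthy while its generations lose reasoning ability and long-range coherence.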
Related papers
- What Affects the Effective Depth of Large Language Models? [44.85395501835759]
We study how effective depth varies with model scale, training type, and task difficulty. We find that while the number of effective layers grows with model size, the effective depth ratio remains stable. Our results suggest that current LLMs underuse available depth across scales, training paradigms, and tasks of varying difficulty.
arXiv Detail & Related papers (2025-12-16T04:07:17Z)
- Multimodal Language Models See Better When They Look Shallower [54.5303326937134]
Multimodal large language models (MLLMs) typically extract visual features from the final layers of a pretrained Vision Transformer (ViT). We present the first comprehensive study of visual layer selection for MLLMs, analyzing representation similarity across ViT layers. We find that while deep layers excel in semantic-rich tasks like OCR, shallow and middle layers significantly outperform them on fine-grained visual tasks.
arXiv Detail & Related papers (2025-04-30T09:07:10Z)
- How do Large Language Models Understand Relevance? A Mechanistic Interpretability Perspective [64.00022624183781]
Large language models (LLMs) can assess relevance and support information retrieval (IR) tasks. We investigate how different LLM modules contribute to relevance judgment through the lens of mechanistic interpretability.
arXiv Detail & Related papers (2025-04-10T16:14:55Z)
- The Computational Advantage of Depth: Learning High-Dimensional Hierarchical Functions with Gradient Descent [28.999394988111106]
This paper introduces a class of target functions that incorporate a hierarchy of latent subspace dimensionalities. We analytically study the learning dynamics and generalization performance of deep networks compared to shallow ones in the high-dimensional limit.
arXiv Detail & Related papers (2025-02-19T18:58:28Z)
- LESA: Learnable LLM Layer Scaling-Up [57.0510934286449]
Training Large Language Models (LLMs) from scratch requires immense computational resources, making it prohibitively expensive. Model scaling-up offers a promising solution by leveraging the parameters of smaller models to create larger ones. We propose LESA, a novel learnable method for depth scaling-up.
arXiv Detail & Related papers (2025-02-19T14:58:48Z)
- AVSS: Layer Importance Evaluation in Large Language Models via Activation Variance-Sparsity Analysis [5.854247492297834]
We propose AVSS, a novel metric combining normalized activation variance and sparsity to assess each layer's contribution to model performance. By identifying and removing roughly the lowest-scoring 25% of layers under AVSS, over 90% of original model performance is retained (see the sketch after this list).
arXiv Detail & Related papers (2024-11-04T14:29:49Z)
- Aggregation Artifacts in Subjective Tasks Collapse Large Language Models' Posteriors [74.04775677110179]
In-context Learning (ICL) has become the primary method for performing natural language tasks with Large Language Models (LLMs). In this work, we examine whether this is the result of the aggregation used in the corresponding datasets, where trying to combine low-agreement, disparate annotations might lead to annotation artifacts that create detrimental noise in the prompt. Our results indicate that aggregation is a confounding factor in the modeling of subjective tasks, and advocate focusing on modeling individuals instead.
arXiv Detail & Related papers (2024-10-17T17:16:00Z)
- Investigating Layer Importance in Large Language Models [28.156622049937216]
Large language models (LLMs) have gained increasing attention due to their prominent ability to understand and process texts.
This lack of understanding has obstructed their deployment in safety-critical scenarios and hindered the development of better models.
This study identifies cornerstone layers in LLMs and underscores their critical role for future research.
arXiv Detail & Related papers (2024-09-22T09:53:13Z)
- Exploring Concept Depth: How Large Language Models Acquire Knowledge and Concept at Different Layers? [57.04803703952721]
Large language models (LLMs) have shown remarkable performance across a wide range of tasks. However, the mechanisms by which these models encode tasks of varying complexities remain poorly understood. We introduce the idea of "Concept Depth" to suggest that more complex concepts are typically acquired in deeper layers.
arXiv Detail & Related papers (2024-04-10T14:56:40Z)
- Masked Image Modeling with Local Multi-Scale Reconstruction [54.91442074100597]
Masked Image Modeling (MIM) achieves outstanding success in self-supervised representation learning.
Existing MIM models conduct the reconstruction task only at the top layer of the encoder.
We design local multi-scale reconstruction, where the lower and upper layers reconstruct fine-scale and coarse-scale supervision signals, respectively.
arXiv Detail & Related papers (2023-03-09T13:42:04Z)
- Why Layer-Wise Learning is Hard to Scale-up and a Possible Solution via Accelerated Downsampling [19.025707054206457]
Layer-wise learning can achieve state-of-the-art performance in image classification on various datasets.
Previous studies of layer-wise learning are limited to networks with simple hierarchical structures.
This paper reveals that the fundamental reason impeding the scale-up of layer-wise learning is the relatively poor separability of the feature space in shallow layers.
arXiv Detail & Related papers (2020-10-15T21:51:43Z)
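For the AVSS entry above, here is a hypothetical sketch of a per-layer activation variance-sparsity score. The metric name comes from that paper, but the scoring formula used here (variance divided by the near-zero fraction of activations) is an assumption for illustration; the paper's exact normalization and combination may differ.
```python
# Hypothetical per-layer activation variance-sparsity score, in the spirit
# of AVSS. Each layer is scored by the variance of its hidden states divided
# by their near-zero sparsity, and the lowest-scoring quarter of layers is
# flagged as pruning candidates. Formula is an assumption, not the paper's.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"  # illustrative; any model exposing hidden states works
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, output_hidden_states=True)

@torch.no_grad()
def layer_scores(text, eps=1e-2):
    ids = tok(text, return_tensors="pt").input_ids
    hidden = model(ids).hidden_states   # tuple: embeddings + one per layer
    scores = []
    for h in hidden[1:]:                # skip the embedding output
        variance = h.var().item()
        sparsity = (h.abs() < eps).float().mean().item()  # near-zero fraction
        scores.append(variance / max(sparsity, 1e-6))
    return scores

scores = layer_scores("Layer importance varies with depth and task.")
k = len(scores) // 4                    # lowest 25%, as in the abstract
prune = sorted(range(len(scores)), key=scores.__getitem__)[:k]
print("candidate layers to remove:", sorted(prune))
```
In practice such scores would be averaged over a calibration set rather than a single sentence.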