Layer-wise Positional Bias in Short-Context Language Modeling
- URL: http://arxiv.org/abs/2601.04098v1
- Date: Wed, 07 Jan 2026 17:04:30 GMT
- Title: Layer-wise Positional Bias in Short-Context Language Modeling
- Authors: Maryam Rahimi, Mahdi Nouri, Yadollah Yaghoobzadeh
- Abstract summary: We introduce an attribution-based framework to analyze positional effects in short-context language modeling. We quantify how each layer distributes importance across input positions, yielding layer-wise positional importance profiles. Characterizing these profiles, we find a prominent recency bias that increases with depth and a subtle primacy bias that diminishes with depth.
- Score: 5.417332705560665
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Language models often show a preference for using information from specific positions in the input regardless of semantic relevance. While positional bias has been studied in various contexts, from attention sinks to task performance degradation in long-context settings, prior work has not established how these biases evolve across individual layers and input positions, or how they vary independently of task complexity. We introduce an attribution-based framework to analyze positional effects in short-context language modeling. Using layer conductance with a sliding-window approach, we quantify how each layer distributes importance across input positions, yielding layer-wise positional importance profiles. We find that these profiles are architecture-specific, stable across inputs, and invariant to lexical scrambling. Characterizing these profiles, we find a prominent recency bias that increases with depth and a subtle primacy bias that diminishes with depth. Beyond positional structure, we also show that early layers preferentially weight content words over function words across all positions, while later layers lose this word-type differentiation.
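The abstract describes the measurement pipeline without code; below is a minimal sketch of the idea, assuming GPT-2, Captum's LayerConductance applied at each block's MLP output, and a single fixed window in place of the paper's sliding window (all illustrative choices, not the authors' setup):

```python
# Hedged sketch of layer-wise positional importance via layer conductance.
# Not the authors' code: the model (GPT-2), the MLP-output attribution point,
# n_steps, and the single fixed window are illustrative assumptions.
import torch
from captum.attr import LayerConductance
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

text = "The quick brown fox jumps over the lazy"
ids = tok(text, return_tensors="pt").input_ids
embeds = model.transformer.wte(ids)          # attribute w.r.t. input embeddings
target_id = tok.encode(" dog")[0]            # illustrative next-token target

def score(inputs_embeds):
    # Logit of the target token at the final position.
    return model(inputs_embeds=inputs_embeds).logits[:, -1, target_id]

profiles = []
for block in model.transformer.h:
    lc = LayerConductance(score, block.mlp)  # conductance through this layer
    attr = lc.attribute(embeds, n_steps=16)  # shape: (1, seq_len, hidden)
    # Aggregate magnitude over hidden units -> importance per input position.
    profiles.append(attr.abs().sum(dim=-1).squeeze(0).detach())

importance = torch.stack(profiles)           # (n_layers, seq_len)
# The paper slides a window over longer inputs and averages such profiles.
```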
Related papers
- Positional Bias in Multimodal Embedding Models: Do They Favor the Beginning, the Middle, or the End? [5.449094110831793]
We investigate positional bias in multimodal representation models, specifically in the context of image-text retrieval. Our experiments demonstrate that positional bias is prevalent in multimodal models, but manifests differently across modalities. We find that this bias arises from, or is amplified by, a combination of factors, including the positional encoding scheme, training loss, context importance, and the nature of using image-text pairs in multimodal training.
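A loose, text-only analogue of that retrieval probe (a simplification; the paper's experiments use multimodal image-text models) is to move a query-relevant sentence through a passage and track the retrieval score:

```python
# Hedged, text-only analogue of the positional-bias retrieval probe above.
# The embedding model and sentences are arbitrary assumptions; the paper's
# actual experiments use multimodal image-text models.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
query = "a dog catching a frisbee"
key = "A dog leaps to catch a frisbee."
filler = ["The park is crowded.", "It is a sunny afternoon.", "Trees line the path."]

for pos in range(len(filler) + 1):
    doc = " ".join(filler[:pos] + [key] + filler[pos:])
    sim = util.cos_sim(model.encode(query), model.encode(doc)).item()
    print(f"key sentence at position {pos}: cosine similarity = {sim:.3f}")
```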
arXiv Detail & Related papers (2025-11-14T12:15:46Z)
- Dissecting Bias in LLMs: A Mechanistic Interpretability Perspective [4.17645248123697]
Large Language Models (LLMs) are known to exhibit social, demographic, and gender biases. We analyze how such biases are structurally represented within models such as GPT-2 and Llama2. We show that removing these components not only reduces biased outputs but also affects other NLP tasks.
arXiv Detail & Related papers (2025-06-05T15:43:34Z)
- Beyond Early-Token Bias: Model-Specific and Language-Specific Position Effects in Multilingual LLMs [50.07451351559251]
We present a study across five typologically distinct languages (English, Russian, German, Hindi, and Vietnamese). We examine how position bias interacts with prompt strategies and affects output entropy.
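Output entropy, one quantity the study tracks, is straightforward to compute from next-token logits; a minimal sketch with placeholder model and prompt:

```python
# Hedged sketch of the output-entropy measurement mentioned above; the model
# and prompt are placeholders, not the study's actual multilingual setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

ids = tok("The answer to the question is", return_tensors="pt").input_ids
with torch.no_grad():
    logits = model(ids).logits[:, -1, :]          # next-token logits
probs = logits.softmax(dim=-1)
entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
print(f"next-token entropy: {entropy.item():.2f} nats")
```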
arXiv Detail & Related papers (2025-05-22T02:23:00Z)
- Probing Context Localization of Polysemous Words in Pre-trained Language Model Sub-Layers [12.610445666406898]
We investigate the degree of contextualization encoded in the fine-grained sub-layer representations of a Pre-trained Language Model (PLM).
To identify the main contributions of sub-layers to contextualization, we first extract the sub-layer representations of polysemous words in minimally different sentence pairs.
We also empirically localize the strength of the contextualization information encoded in these sub-layer representations.
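A rough sketch of such a probe, assuming BERT and per-layer hidden states (the paper works at finer sub-layer granularity, which would require forward hooks rather than output_hidden_states):

```python
# Hedged sketch of the probing setup described above: compare representations
# of a polysemous word ("bank") in a minimally different sentence pair across
# layers. Model choice and sentences are illustrative assumptions.
import torch
from transformers import BertModel, BertTokenizerFast

tok = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased").eval()

pair = ["He sat by the bank of the river.", "He deposited cash at the bank."]
enc = tok(pair, return_tensors="pt", padding=True)
with torch.no_grad():
    hidden = model(**enc, output_hidden_states=True).hidden_states

# Position of "bank" in each tokenized sentence (a single token in BERT).
bank_id = tok.convert_tokens_to_ids("bank")
idx = [ids.tolist().index(bank_id) for ids in enc.input_ids]
for layer, h in enumerate(hidden):
    sim = torch.cosine_similarity(h[0, idx[0]], h[1, idx[1]], dim=0)
    print(f"layer {layer:2d}: cos(bank_river, bank_money) = {sim:.3f}")
```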
arXiv Detail & Related papers (2024-09-21T10:42:07Z)
- Eliminating Position Bias of Language Models: A Mechanistic Approach [119.34143323054143]
Position bias has proven to be a prevalent issue in modern language models (LMs). Our mechanistic analysis attributes the position bias to two components employed in nearly all state-of-the-art LMs: causal attention and relative positional encodings. By eliminating position bias, models achieve better performance and reliability in downstream tasks, including LM-as-a-judge, retrieval-augmented QA, molecule generation, and math reasoning.
arXiv Detail & Related papers (2024-07-01T09:06:57Z)
- Distributional Associations vs In-Context Reasoning: A Study of Feed-forward and Attention Layers [49.80959223722325]
We study the distinction between feed-forward and attention layers in large language models. We find that feed-forward layers tend to learn simple distributional associations such as bigrams, while attention layers focus on in-context reasoning.
arXiv Detail & Related papers (2024-06-05T08:51:08Z)
- Mitigate Position Bias in Large Language Models via Scaling a Single Dimension [47.792435921037274]
This paper first explores the micro-level manifestations of position bias, concluding that attention weights are a micro-level expression of position bias. It further identifies that, in addition to position embeddings, the causal attention mask also contributes to position bias by creating position-specific hidden states. Based on these insights, we propose a method to mitigate position bias by scaling a single dimension of these positional hidden states.
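A hedged sketch of the single-dimension scaling idea using a PyTorch forward hook; the layer index, dimension, and scale factor below are placeholder assumptions, not values identified by the paper:

```python
# Hedged sketch of "scaling a single dimension" via a forward hook. LAYER,
# DIM, and SCALE are placeholder assumptions, not the paper's identified
# positional dimension.
import torch
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
LAYER, DIM, SCALE = 6, 138, 0.5   # hypothetical choices

def scale_hidden_dim(module, inputs, output):
    # GPT-2 blocks return a tuple whose first element is the hidden state.
    hidden = output[0] if isinstance(output, tuple) else output
    hidden[..., DIM] *= SCALE     # rescale one dimension in place
    return output

handle = model.transformer.h[LAYER].register_forward_hook(scale_hidden_dim)
# ... run evaluation / generation under torch.no_grad() with the hook active ...
handle.remove()
```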
arXiv Detail & Related papers (2024-06-04T17:55:38Z)
- How to Handle Different Types of Out-of-Distribution Scenarios in Computational Argumentation? A Comprehensive and Fine-Grained Field Study [59.13867562744973]
This work systematically assesses LMs' capabilities for out-of-distribution (OOD) scenarios.
We find that the efficacy of such learning paradigms varies with the type of OOD.
Specifically, while in-context learning (ICL) excels for domain shifts, prompt-based fine-tuning surpasses it for topic shifts.
arXiv Detail & Related papers (2023-09-15T11:15:47Z)
- A Frustratingly Easy Improvement for Position Embeddings via Random Padding [68.75670223005716]
In this paper, we propose a simple but effective strategy, Random Padding, which requires no modifications to existing pre-trained language models.
Experiments show that Random Padding can significantly improve model performance on the instances whose answers are located at rear positions.
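The general idea can be sketched in a few lines: split the padding randomly between the front and rear of each instance so that content tokens land at varied absolute positions (the exact split rule here is an assumption, not necessarily the paper's procedure):

```python
# Hedged sketch of the Random Padding idea: distribute [PAD] tokens randomly
# between the front and the rear of each training instance so that content
# tokens occupy varied absolute positions.
import random

def random_pad(token_ids: list[int], max_len: int, pad_id: int) -> list[int]:
    n_pad = max_len - len(token_ids)
    n_front = random.randint(0, n_pad)   # pads placed before the content
    return [pad_id] * n_front + token_ids + [pad_id] * (n_pad - n_front)

# Example: the same content may start at position 0, 3, or 5 across epochs.
print(random_pad([101, 2023, 102], max_len=8, pad_id=0))
```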
arXiv Detail & Related papers (2023-05-08T17:08:14Z)
- The Curious Case of Absolute Position Embeddings [65.13827063579728]
Transformer language models encode the notion of word order using positional information.
In natural language, it is not absolute position that matters, but relative position, and the extent to which APEs can capture this type of information has not been investigated.
We observe that models trained with APEs over-rely on positional information, to the point that they break down when subjected to sentences with shifted position information.
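The shifted-position probe is easy to outline: run the same tokens with position ids offset by a constant k and compare predictions; the model and offset below are illustrative:

```python
# Hedged sketch of the shifted-position probe described above: feed the same
# input with position ids offset by k and compare predictions. Model and
# offset are illustrative assumptions.
import torch
from transformers import BertForMaskedLM, BertTokenizerFast

tok = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased").eval()

enc = tok("The capital of France is [MASK].", return_tensors="pt")
seq_len, k = enc.input_ids.size(1), 50   # shift all positions by k

with torch.no_grad():
    base = model(**enc).logits
    pos = torch.arange(k, k + seq_len).unsqueeze(0)
    shifted = model(**enc, position_ids=pos).logits

mask_pos = (enc.input_ids == tok.mask_token_id).nonzero()[0, 1]
for name, logits in [("original", base), ("shifted", shifted)]:
    pred = logits[0, mask_pos].argmax().item()
    print(f"{name}: predicted '{tok.decode([pred])}'")
```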
arXiv Detail & Related papers (2022-10-23T00:00:04Z)