Related papers: Mitigate Position Bias in Large Language Models via Scaling a Single Dimension

Mitigate Position Bias in Large Language Models via Scaling a Single Dimension

URL: http://arxiv.org/abs/2406.02536v2
Date: Tue, 15 Oct 2024 15:58:07 GMT
Title: Mitigate Position Bias in Large Language Models via Scaling a Single Dimension
Authors: Yijiong Yu, Huiqiang Jiang, Xufang Luo, Qianhui Wu, Chin-Yew Lin, Dongsheng Li, Yuqing Yang, Yongfeng Huang, Lili Qiu,
Abstract summary: This paper first explores the micro-level manifestations of position bias, concluding that attention weights are a micro-level expression of position bias. It further identifies that, in addition to position embeddings, causal attention mask also contributes to position bias by creating position-specific hidden states. Based on these insights, we propose a method to mitigate position bias by scaling this positional hidden states.
Score: 47.792435921037274
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Language Models (LLMs) are increasingly applied in various real-world scenarios due to their excellent generalization capabilities and robust generative abilities. However, they exhibit position bias, also known as "lost in the middle", a phenomenon that is especially pronounced in long-context scenarios, which indicates the placement of the key information in different positions of a prompt can significantly affect accuracy. This paper first explores the micro-level manifestations of position bias, concluding that attention weights are a micro-level expression of position bias. It further identifies that, in addition to position embeddings, causal attention mask also contributes to position bias by creating position-specific hidden states. Based on these insights, we propose a method to mitigate position bias by scaling this positional hidden states. Experiments on the NaturalQuestions Multi-document QA, KV retrieval, LongBench and timeline reorder tasks, using various models including RoPE models, context windowextended models, and Alibi models, demonstrate the effectiveness and generalizability of our approach. Our method can improve performance by up to 15.2% by modifying just one dimension of hidden states. Our code is available at https://aka.ms/PositionalHidden.

Related papers

Attention Basin: Why Contextual Position Matters in Large Language Models [16.11590856103274]
We show that models systematically assign higher attention to items at the beginning and end of a sequence, while neglecting those in the middle.<n>We introduce Attention-Driven Reranking (AttnRank), a framework that estimates a model's intrinsic positional attention preferences.<n>AttnRank is a model-agnostic, training-free, and plug-and-play method with minimal computational overhead.
arXiv Detail & Related papers (2025-08-07T08:08:08Z)
Position of Uncertainty: A Cross-Linguistic Study of Positional Bias in Large Language Models [49.46335932942725]
We study how positional bias interacts with model uncertainty, syntax, and prompting.<n>We present a cross-linguistic study across five typologically distinct languages.
arXiv Detail & Related papers (2025-05-22T02:23:00Z)
Benchmarking the Myopic Trap: Positional Bias in Information Retrieval [10.558261017416472]
This study investigates a specific form of positional bias, termed the Myopic Trap, where retrieval models disproportionately attend to the early parts of documents while overlooking relevant information that appears later.<n>To systematically quantify this phenomenon, we propose a semantics-preserving evaluation framework that repurposes the existing NLP datasets into position-aware benchmarks.
arXiv Detail & Related papers (2025-05-20T05:29:01Z)
Wavelet-based Positional Representation for Long Context [14.902305283428642]
We analyze conventional position encoding methods for long contexts. We propose a new position representation method that captures multiple scales (i.e., window sizes) by leveraging wavelet transforms. Experimental results show that this new method improves the performance of the model in both short and long contexts.
arXiv Detail & Related papers (2025-02-04T04:44:53Z)
On Positional Bias of Faithfulness for Long-form Summarization [83.63283027830657]
Large Language Models (LLMs) often exhibit positional bias in long-context settings, under-attending to information in the middle of inputs. We investigate the presence of this bias in long-form summarization, its impact on faithfulness, and various techniques to mitigate this bias.
arXiv Detail & Related papers (2024-10-31T03:50:15Z)
Distance between Relevant Information Pieces Causes Bias in Long-Context LLMs [50.40165119718928]
LongPiBench is a benchmark designed to assess positional bias involving multiple pieces of relevant information. These experiments reveal that while most current models are robust against the "lost in the middle" issue, there exist significant biases related to the spacing of relevant information pieces.
arXiv Detail & Related papers (2024-10-18T17:41:19Z)
Eliminating Position Bias of Language Models: A Mechanistic Approach [119.34143323054143]
Position bias has proven to be a prevalent issue of modern language models (LMs) Our mechanistic analysis attributes the position bias to two components employed in nearly all state-of-the-art LMs: causal attention and relative positional encodings. By eliminating position bias, models achieve better performance and reliability in downstream tasks, including LM-as-a-judge, retrieval-augmented QA, molecule generation, and math reasoning.
arXiv Detail & Related papers (2024-07-01T09:06:57Z)
Position-Aware Parameter Efficient Fine-Tuning Approach for Reducing Positional Bias in LLMs [18.832135309689736]
Recent advances in large language models (LLMs) have enhanced their ability to process long input contexts. Recent studies show a positional bias in LLMs, demonstrating varying performance depending on the location of useful information. We develop a Position-Aware PAPEFT approach which is composed of a data augmentation technique and an efficient parameter adapter.
arXiv Detail & Related papers (2024-04-01T19:04:17Z)
Position bias in features [0.0]
Document-specific historical click-through rates can be important features in a dynamic ranking system. This paper describes the properties of several such features, and tests them in controlled experiments.
arXiv Detail & Related papers (2024-02-04T22:15:30Z)
The Curious Case of Absolute Position Embeddings [65.13827063579728]
Transformer language models encode the notion of word order using positional information. In natural language, it is not absolute position that matters, but relative position, and the extent to which APEs can capture this type of information has not been investigated. We observe that models trained with APE over-rely on positional information to the point that they break-down when subjected to sentences with shifted position information.
arXiv Detail & Related papers (2022-10-23T00:00:04Z)
Mitigating Representation Bias in Action Recognition: Algorithms and Benchmarks [76.35271072704384]
Deep learning models perform poorly when applied to videos with rare scenes or objects. We tackle this problem from two different angles: algorithm and dataset. We show that the debiased representation can generalize better when transferred to other datasets and tasks.
arXiv Detail & Related papers (2022-09-20T00:30:35Z)

This list is automatically generated from the titles and abstracts of the papers in this site.